{ "/scratch/micpie/export/uniprot_binding_sites_multiple/test_0-1.jsonl": "{"text":"Task: Create a molecule that binds to the given site in the protein.\nProtein: MRSVSGQVVCVTGAGGFIASWLVKILLEKGYTVRGTVRNPDDPKNGHLRELEGAKERLTLCKADLLDYQSLREAINGCDGVFHTASPVTDDPEQMVEPAVIGTKNVINAAAEANVRRVVFTSSIGAVYMDPNRDPETVVDETCWSDPDFCKNTKNWYCYGKMVAEQAAWEEAKEKGVDLVVINPVLVQGPLLQTTVNASVLHILKYLTGSAKTYANSVQAYVDVKDVALAHILLYETPEASGRYLCAESVLHRGDVVEILSKFFPEYPIPTKCSDVTKPRVKPYKFSNQKLKDLGLEFTPVKQCLYETVKSLQEKGHLPIPTQKDEPIIRIQP\nBinding site position: 13-19\nOutput: [O][=C][Branch1][C][O][C][C][C@H1][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][S][Ring1][=Branch2][C][=Branch1][C][=O][N][C][C][O][C][C][Ring1][Branch1][=O]"} {"text":"Task: Come up with a chemical that binds to the given in the protein.\nPeptide sequence: MFFNRERELEKLLRLVSTEPNLITFVYGPINSGKTALMMEFIKKLPDDHIAFYINLRRTPITSYSDFVDVLFSVEFRNKVKTLKEAVSLVLSAGKETFGFPVPTELLARITKEKKPKNAFAYIVTLMEEVRKAGKRPILILDELQVIGDLKVDGSLIYELFNFFIHLTKESHLSHVFVVTSDSLFIERVYSEAMLQGRAEYFLVDDFKRETALRFLKNNGLSDDEAELVWNYFGGKPVYLAETIKHRDELKEWCERMLKLRTSQILDELYALEKELFEKVVKLFFAFEKQESVPYRSLSEEILWAVKRNILFAEPVDRVLRPQGRLELLAIKRILDIIE\nBinding site position: 28-35\nOutput: CCN(CC)c1cc(OC)c(C(=O)NCC2CN(Cc3ccccc3)CCO2)cc1Cl.O=C(O)C(=O)O"}", "/scratch/micpie/export/uniprot_binding_sites_multiple/valid_0-0.jsonl": "{"text":"Task: Come up with a binding site for the compound in the amino acid sequence.\nPeptide sequence: MIKHFLEDNSDDAELSKFVKDFPGSEPCHPTESKTRVARPQILEPRPQSPDLCDDDVEFRATLWSQPSDSQQYFCPPAPLSPSSRPRSPWGKLDPYDSSEDDKEYVGFATLPNQVHRKSVKKGFDFTLMVAGESGLGKSTLVNSLFLTDLYRDRKLLGAEERIMQTVEITKHAVDIEEKGVRLRLTIVDTPGFGDAVNNTECWRPVAEYIDQQFEQYFRDESGLNRKNIQDNRVHCCLYFISPFGHGLRPLDVEFMKALHQRVNIVPILAKADTLTPSEVDRKKCKIREEIEHFGIKIYQFPDCDSDEDEDFKLQDQALKESIPFAVIGSNTVVEARGRRVRGRLYPWGIVEVENPGHCDFVKLRTMLVRTHMQDLKDVTRETHYENYRAQCIQSMTRLVVKERNRNKLTRESGTDFPIPAVPPGTDPETEKLIREKDEELRRMQEMLHKIQRQMKETH\nInChI representation: 
InChI=1S\/C25H25N7O\/c1-3-7-21-27-22-17(2)14-15-26-25(22)32(21)16-18-10-12-20(13-11-18)33-23(24-28-30-31-29-24)19-8-5-4-6-9-19\/h4-6,8-15,23H,3,7,16H2,1-2H3,(H,28,29,30,31)\nResult: 132-139"} {"text":"Task: Find a binding site for the compound in the protein.\nPeptide sequence: MVQTVYKNSDQTVFEDAKALFQLNKNILLKGPTGSGKTKLAETLSNVMKLPMHQVNCSVDLDTESLLGFKTIHTNEEGHQEIVFIDGPVIKAMKEGHILYIDEINMAKPETLPILNGVLDYRRQLTNPYTGEVIKAAPGFNVIAAINEGYVGTLPMNEALKNRFIVIEVDYIDGDILKTVIKEQSKLQDEQLIQHIVKFNEDLRTMTKQGQISEEAASIRALIDLSDLATVMPIERAVQRTIIDKLEDEREQQAILNAIELNF\ncanonical SMILES: CCN(CC)c1cc(OC)c(C(=O)NCC2CN(Cc3ccccc3)CCO2)cc1Cl.O=C(O)C(=O)O\nOutput: 31-38"}", "/scratch/micpie/export/uniprot_binding_sites_multiple/test_0-2.jsonl": "{"text":"Question: Can you give me one example of a binding site of the compound with the SELFIES representation [O][=C][Branch1][C][O][C][C][C@H1][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][S][Ring1][=Branch2][C][=Branch1][C][=O][N][C][C][O][C][C][Ring1][Branch1][=O] in this peptide sequence MRSVSGQVVCVTGAGGFIASWLVKILLEKGYTVRGTVRNPDDPKNGHLRELEGAKERLTLCKADLLDYQSLREAINGCDGVFHTASPVTDDPEQMVEPAVIGTKNVINAAAEANVRRVVFTSSIGAVYMDPNRDPETVVDETCWSDPDFCKNTKNWYCYGKMVAEQAAWEEAKEKGVDLVVINPVLVQGPLLQTTVNASVLHILKYLTGSAKTYANSVQAYVDVKDVALAHILLYETPEASGRYLCAESVLHRGDVVEILSKFFPEYPIPTKCSDVTKPRVKPYKFSNQKLKDLGLEFTPVKQCLYETVKSLQEKGHLPIPTQKDEPIIRIQP?\nAnswer: One site for the chemical is 13-19."} {"text":"Question: Can you find one binding site of the compound with the InChI InChI=1S\/C24H32ClN3O3.C2H2O4\/c1-4-28(5-2)22-14-23(30-3)20(13-21(22)25)24(29)26-15-19-17-27(11-12-31-19)16-18-9-7-6-8-10-18;3-1(4)2(5)6\/h6-10,13-14,19H,4-5,11-12,15-17H2,1-3H3,(H,26,29);(H,3,4)(H,5,6) in this peptide sequence 
MFFNRERELEKLLRLVSTEPNLITFVYGPINSGKTALMMEFIKKLPDDHIAFYINLRRTPITSYSDFVDVLFSVEFRNKVKTLKEAVSLVLSAGKETFGFPVPTELLARITKEKKPKNAFAYIVTLMEEVRKAGKRPILILDELQVIGDLKVDGSLIYELFNFFIHLTKESHLSHVFVVTSDSLFIERVYSEAMLQGRAEYFLVDDFKRETALRFLKNNGLSDDEAELVWNYFGGKPVYLAETIKHRDELKEWCERMLKLRTSQILDELYALEKELFEKVVKLFFAFEKQESVPYRSLSEEILWAVKRNILFAEPVDRVLRPQGRLELLAIKRILDIIE?\nAnswer: One site for the molecule is 28 to 35."}", "/scratch/micpie/export/uniprot_binding_sites_multiple/test_0-0.jsonl": "{"text":"Task: Come up with a binding site for the compound in the peptide sequence.\nProtein: MRSVSGQVVCVTGAGGFIASWLVKILLEKGYTVRGTVRNPDDPKNGHLRELEGAKERLTLCKADLLDYQSLREAINGCDGVFHTASPVTDDPEQMVEPAVIGTKNVINAAAEANVRRVVFTSSIGAVYMDPNRDPETVVDETCWSDPDFCKNTKNWYCYGKMVAEQAAWEEAKEKGVDLVVINPVLVQGPLLQTTVNASVLHILKYLTGSAKTYANSVQAYVDVKDVALAHILLYETPEASGRYLCAESVLHRGDVVEILSKFFPEYPIPTKCSDVTKPRVKPYKFSNQKLKDLGLEFTPVKQCLYETVKSLQEKGHLPIPTQKDEPIIRIQP\nSMILES representation: O=C(O)CC[C@H](NC(=O)c1cc2ccccc2s1)C(=O)NC1COCC1=O\nOutput: 13-19"} {"text":"Task: Identify a binding site for the chemical in the AA sequence.\nProtein: MFFNRERELEKLLRLVSTEPNLITFVYGPINSGKTALMMEFIKKLPDDHIAFYINLRRTPITSYSDFVDVLFSVEFRNKVKTLKEAVSLVLSAGKETFGFPVPTELLARITKEKKPKNAFAYIVTLMEEVRKAGKRPILILDELQVIGDLKVDGSLIYELFNFFIHLTKESHLSHVFVVTSDSLFIERVYSEAMLQGRAEYFLVDDFKRETALRFLKNNGLSDDEAELVWNYFGGKPVYLAETIKHRDELKEWCERMLKLRTSQILDELYALEKELFEKVVKLFFAFEKQESVPYRSLSEEILWAVKRNILFAEPVDRVLRPQGRLELLAIKRILDIIE\nInChI representation: InChI=1S\/C24H32ClN3O3.C2H2O4\/c1-4-28(5-2)22-14-23(30-3)20(13-21(22)25)24(29)26-15-19-17-27(11-12-31-19)16-18-9-7-6-8-10-18;3-1(4)2(5)6\/h6-10,13-14,19H,4-5,11-12,15-17H2,1-3H3,(H,26,29);(H,3,4)(H,5,6)\nResult: 28-35"}", "/scratch/micpie/export/uniprot_binding_sites_multiple/test_0-3.jsonl": "{"text":"Question: What molecule can possibly bind to the site at 13-19 in the amino acid sequence below?\nSequence: 
MRSVSGQVVCVTGAGGFIASWLVKILLEKGYTVRGTVRNPDDPKNGHLRELEGAKERLTLCKADLLDYQSLREAINGCDGVFHTASPVTDDPEQMVEPAVIGTKNVINAAAEANVRRVVFTSSIGAVYMDPNRDPETVVDETCWSDPDFCKNTKNWYCYGKMVAEQAAWEEAKEKGVDLVVINPVLVQGPLLQTTVNASVLHILKYLTGSAKTYANSVQAYVDVKDVALAHILLYETPEASGRYLCAESVLHRGDVVEILSKFFPEYPIPTKCSDVTKPRVKPYKFSNQKLKDLGLEFTPVKQCLYETVKSLQEKGHLPIPTQKDEPIIRIQP\nAnswer: InChI=1S\/C18H18N2O6S\/c21-13-9-26-8-12(13)20-17(24)11(5-6-16(22)23)19-18(25)15-7-10-3-1-2-4-14(10)27-15\/h1-4,7,11-12H,5-6,8-9H2,(H,19,25)(H,20,24)(H,22,23)\/t11-,12?\/m0\/s1"} {"text":"Question: What chemical can possibly bind to the binding site at the position 28 to 35 in the amino acid sequence?\nSequence: MFFNRERELEKLLRLVSTEPNLITFVYGPINSGKTALMMEFIKKLPDDHIAFYINLRRTPITSYSDFVDVLFSVEFRNKVKTLKEAVSLVLSAGKETFGFPVPTELLARITKEKKPKNAFAYIVTLMEEVRKAGKRPILILDELQVIGDLKVDGSLIYELFNFFIHLTKESHLSHVFVVTSDSLFIERVYSEAMLQGRAEYFLVDDFKRETALRFLKNNGLSDDEAELVWNYFGGKPVYLAETIKHRDELKEWCERMLKLRTSQILDELYALEKELFEKVVKLFFAFEKQESVPYRSLSEEILWAVKRNILFAEPVDRVLRPQGRLELLAIKRILDIIE\nAnswer: CCN(CC)c1cc(OC)c(C(=O)NCC2CN(Cc3ccccc3)CCO2)cc1Cl.O=C(O)C(=O)O"}", "/scratch/micpie/export/uniprot_binding_sites_multiple/train_0-0.jsonl": "{"text":"Task: Find a binding site for the molecule in the amino acid sequence.\nPeptide sequence: 
MKNPKKKSGGFRIVNMLKRGVARVSPFGGLKRLPAGLLLGHGPIRMVLAILAFLRFTAIKPSLGLINRWGSVGKKEAMEIIKKFKKDLAAMLRIINARKEKKRRGADTSVGIVGLLLTTAMAAEVTRRGSAYYMYLDRNDAGEAISFPTTLGMNKCYIQIMDLGHMCDATMSYECPMLDEGVEPDDVDCWCNTTSTWVVYGTCHHKKGEARRSRRAVTLPSHSTRKLQTRSQTWLESREYTKHLIRVENWIFRNPGFALAAAAIAWLLGSSTSQKVIYLVMILLIAPAYSIRCIGVSNRDFVEGMSGGTWVDVVLEHGGCVTVMAQDKPTVDIELVTTTVSNMAEVRSYCYEASISDMASDSRCPTQGEAYLDKQSDTQYVCKRTLVDRGWGNGCGLFGKGSLVTCAKFACSKKMTGKSIQPENLEYRIMLSVHGSQHSGMIVNDTGHETDENRAKVEITPNSPRAEATLGGFGSLGLDCEPRTGLDFSDLYYLTMNNKHWLVHKEWFHDIPLPWHAGADTGTPHWNNKEALVEFKDAHAKRQTVVVLGSQEGAVHTALAGALEAEMDGAKGRLSSGHLKCRLKMDKLRLKGVSYSLCTAAFTFTKIPAETLHGTVTVEVQYAGTDGPCKVPAQMAVDMQTLTPVGRLITANPVITESTENSKMMLELDPPFGDSYIVIGVGEKKITHHWHRSGSTIGKAFEATVRGAKRMAVLGDTAWDFGSVGGALNSLGKGIHQIFGAAFKSLFGGMSWFSQILIGTLLMWLGLNTKNGSISLMCLALGGVLIFLSTAVSADVGCSVDFSKKETRCGTGVFVYNDVEAWRDRYKYHPDSPRRLAAAVKQAWEDGICGISSVSRMENIMWRSVEGELNAILEENGVQLTVVVGSVKNPMWRGPQRLPVPVNELPHGWKAWGKSYFVRAAKTNNSFVVDGDTLKECPLKHRAWNSFLVEDHGFGVFHTSVWLKVREDYSLECDPAVIGTAVKGKEAVHSDLGYWIESEKNDTWRLKRAHLIEMKTCEWPKSHTLWTDGIEESDLIIPKSLAGPLSHHNTREGYRTQMKGPWHSEELEIRFEECPGTKVHVEETCGTRGPSLRSTTASGRVIEEWCCRECTMPPLSFRAKDGCWYGMEIRPRKEPESNLVRSMVTAGSTDHMDHFSLGVLVILLMVQEGLKKRMTTKIIISTSMAVLVAMILGGFSMSDLAKLAILMGATFAEMNTGGDVAHLALIAAFKVRPALLVSFIFRANWTPRESMLLALASCLLQTAISALEGDLMVLINGFALAWLAIRAMVVPRTDNITLAILAALTPLARGTLLVAWRAGLATCGGFMLLSLKGKGSVKKNLPFVMALGLTAVRLVDPINVVGLLLLTRSGKRSWPPSEVLTAVGLICALAGGFAKADIEMAGPMAAVGLLIVSYVVSGKSVDMYIERAGDITWEKDAEVTGNSPRLDVALDESGDFSLVEDDGPPMREIILKVVLMTICGMNPIAIPFAAGAWYVYVKTGKRSGALWDVPAPKEVKKGETTDGVYRVMTRRLLGSTQVGVGVMQEGVFHTMWHVTKGSALRSGEGRLDPYWGDVKQDLVSYCGPWKLDAAWDGHSEVQLLAVPPGERARNIQTLPGIFKTKDGDIGAVALDYPAGTSGSPILDKCGRVIGLYGNGVVIKNGSYVSAITQGRREEETPVECFEPSMLKKKQLTVLDLHPGAGKTRRVLPEIVREAIKTRLRTVILAPTRVVAAEMEEALRGLPVRYMTTAVNVTHSGTEIVDLMCHATFTSRLLQPIRVPNYNLYIMDEAHFTDPSSIAARGYISTRVEMGEAAAIFMTATPPGTRDAFPDSNSPIMDTEVEVPERAWSSGFDWVTDHSGKTVWFVPSVRNGNEIAACLTKAGKRVIQLSRKTFETEFQKTKHQEWDFVVTTDISEMGANFKADRVIDSRRCLKPVILDGERVILAGPMPVTHASAAQRRGRIGRNPNKPGDEYLYGGGCAETDEDHAHWLEARMLLDNIY
LQDGLIASLYRPEADKVAAIEGEFKLRTEQRKTFVELMKRGDLPVWLAYQVASAGITYTDRRWCFDGTTNNTIMEDSVPAEVWTRHGEKRVLKPRWMDARVCSDHAALKSFKEFAAGKRGAAFGVMEALGTLPGHMTERFQEAIDNLAVLMRAETGSRPYKAAAAQLPETLETIMLLGLLGTVSLGIFFVLMRNKGIGKMGFGMVTLGASAWLMWLSEIEPARIACVLIVVFLLLVVLIPEPEKQRSPQDNQMAIIIMVAVGLLGLITANELGWLERTKSDLSHLMGRREEGATIGFSMDIDLRPASAWAIYAALTTFITPAVQHAVTTSYNNYSLMAMATQAGVLFGMGKGMPFYAWDFGVPLLMIGCYSQLTPLTLIVAIILLVAHYMYLIPGLQAAAARAAQKRTAAGIMKNPVVDGIVVTDIDTMTIDPQVEKKMGQVLLIAVAVSSAILSRTAWGWGEAGALITAATSTLWEGSPNKYWNSSTATSLCNIFRGSYLAGASLIYTVTRNAGLVKRRGGGTGETLGEKWKARLNQMSALEFYSYKKSGITEVCREEARRALKDGVATGGHAVSRGSAKLRWLVERGYLQPYGKVIDLGCGRGGWSYYAATIRKVQEVKGYTKGGPGHEEPMLVQSYGWNIVRLKSGVDVFHMAAEPCDTLLCDIGESSSSPEVEEARTLRVLSMVGDWLEKRPGAFCIKVLCPYTSTMMETLERLQRRYGGGLVRVPLSRNSTHEMYWVSGAKSNTIKSVSTTSQLLLGRMDGPRRPVKYEEDVNLGSGTRAVVSCAEAPNMKIIGNRIERIRSEHAETWFFDENHPYRTWAYHGSYEAPTQGSASSLINGVVRLLSKPWDVVTGVTGIAMTDTTPYGQQRVFKEKVDTRVPDPQEGTRQVMSMVSSWLWKELGKHKRPRVCTKEEFINKVRSNAALGAIFEEEKEWKTAVEAVNDPRFWALVDKEREHHLRGECQSCVYNMMGKREKKQGEFGKAKGSRAIWYMWLGARFLEFEALGFLNEDHWMGRENSGGGVEGLGLQRLGYVLEEMSRIPGGRMYADDTAGWDTRISRFDLENEALITNQMEKGHRALALAIIKYTYQNKVVKVLRPAEKGKTVMDIISRQDQRGSGQVVTYALNTFTNLVVQLIRNMEAEEVLEMQDLWLLRRSEKVTNWLQSNGWDRLKRMAVSGDDCVVKPIDDRFAHALRFLNDMGKVRKDTQEWKPSTGWDNWEEVPFCSHHFNKLHLKDGRSIVVPCRHQDELIGRARVSPGAGWSIRETACLAKSYAQMWQLLYFHRRDLRLMANAICSSVPVDWVPTGRTTWSIHGKGEWMTTEDMLVVWNRVWIEENDHMEDKTPVTKWTDIPYLGKREDLWCGSLIGHRPRTTWAENIKNTVNMVRRIIGDEEKYMDYLSTQVRYLGEEGSTPGVL\nSELFIES: [C][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][Ring1][O][C][=C][Branch2][Ring1][#Branch2][C][=Branch1][C][=O][N][C][C][C][N][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][O][Ring1][=N][C][=C][Ring2][Ring1][=Branch2][Cl].[O][=C][Branch1][C][O][C][=Branch1][C][=O][O]\nOutput: 1696-1703"} {"text":"Task: Find a binding site for the molecule in the protein.\nPeptide sequence: 
MVDRVKTGIPGMDDILYGGIPRRNVVLLSGGPGTGKSIFSYQYLWNGLREGEPGVFVALEEHPVQVRINMAQFGWDVREYERQGLFAVVDAFTSGIGEAAKKERYVVTDPEDVGLLIDVLKEAIRDVGAKRVAVDSVSTLYLAKPVLARRTVMLLKRVLSGLGTTSILVSQVSVTERGFGGPGVEHAADGIIRLDLDEVDGELVRSLIIWKMRGTKHSMRRHPFEITDKGIIVYPDKVVRIGRRVSIE\nSMILES representation: CCN(CC)c1cc(OC)c(C(=O)NCC2CN(Cc3ccccc3)CCO2)cc1Cl.O=C(O)C(=O)O\nResult: 30-37"}", "/scratch/micpie/export/uniprot_binding_sites_multiple/train_0-3.jsonl": "{"text":"Question: What chemical can bind to the site at the position 1696 to 1703 in the given amino acid sequence?\nSequence: MKNPKKKSGGFRIVNMLKRGVARVSPFGGLKRLPAGLLLGHGPIRMVLAILAFLRFTAIKPSLGLINRWGSVGKKEAMEIIKKFKKDLAAMLRIINARKEKKRRGADTSVGIVGLLLTTAMAAEVTRRGSAYYMYLDRNDAGEAISFPTTLGMNKCYIQIMDLGHMCDATMSYECPMLDEGVEPDDVDCWCNTTSTWVVYGTCHHKKGEARRSRRAVTLPSHSTRKLQTRSQTWLESREYTKHLIRVENWIFRNPGFALAAAAIAWLLGSSTSQKVIYLVMILLIAPAYSIRCIGVSNRDFVEGMSGGTWVDVVLEHGGCVTVMAQDKPTVDIELVTTTVSNMAEVRSYCYEASISDMASDSRCPTQGEAYLDKQSDTQYVCKRTLVDRGWGNGCGLFGKGSLVTCAKFACSKKMTGKSIQPENLEYRIMLSVHGSQHSGMIVNDTGHETDENRAKVEITPNSPRAEATLGGFGSLGLDCEPRTGLDFSDLYYLTMNNKHWLVHKEWFHDIPLPWHAGADTGTPHWNNKEALVEFKDAHAKRQTVVVLGSQEGAVHTALAGALEAEMDGAKGRLSSGHLKCRLKMDKLRLKGVSYSLCTAAFTFTKIPAETLHGTVTVEVQYAGTDGPCKVPAQMAVDMQTLTPVGRLITANPVITESTENSKMMLELDPPFGDSYIVIGVGEKKITHHWHRSGSTIGKAFEATVRGAKRMAVLGDTAWDFGSVGGALNSLGKGIHQIFGAAFKSLFGGMSWFSQILIGTLLMWLGLNTKNGSISLMCLALGGVLIFLSTAVSADVGCSVDFSKKETRCGTGVFVYNDVEAWRDRYKYHPDSPRRLAAAVKQAWEDGICGISSVSRMENIMWRSVEGELNAILEENGVQLTVVVGSVKNPMWRGPQRLPVPVNELPHGWKAWGKSYFVRAAKTNNSFVVDGDTLKECPLKHRAWNSFLVEDHGFGVFHTSVWLKVREDYSLECDPAVIGTAVKGKEAVHSDLGYWIESEKNDTWRLKRAHLIEMKTCEWPKSHTLWTDGIEESDLIIPKSLAGPLSHHNTREGYRTQMKGPWHSEELEIRFEECPGTKVHVEETCGTRGPSLRSTTASGRVIEEWCCRECTMPPLSFRAKDGCWYGMEIRPRKEPESNLVRSMVTAGSTDHMDHFSLGVLVILLMVQEGLKKRMTTKIIISTSMAVLVAMILGGFSMSDLAKLAILMGATFAEMNTGGDVAHLALIAAFKVRPALLVSFIFRANWTPRESMLLALASCLLQTAISALEGDLMVLINGFALAWLAIRAMVVPRTDNITLAILAALTPLARGTLLVAWRAGLATCGGFMLLSLKGKGSVKKNLPFVMALGLTAVRLVDPINVVGLLLLTRSGKRSWPPSEVLTAVGLICALAGGFAKADIEMAGPMAAVGLLIVSYVVSGKSVDMYIERAGDITWEKDAEVTGNS
PRLDVALDESGDFSLVEDDGPPMREIILKVVLMTICGMNPIAIPFAAGAWYVYVKTGKRSGALWDVPAPKEVKKGETTDGVYRVMTRRLLGSTQVGVGVMQEGVFHTMWHVTKGSALRSGEGRLDPYWGDVKQDLVSYCGPWKLDAAWDGHSEVQLLAVPPGERARNIQTLPGIFKTKDGDIGAVALDYPAGTSGSPILDKCGRVIGLYGNGVVIKNGSYVSAITQGRREEETPVECFEPSMLKKKQLTVLDLHPGAGKTRRVLPEIVREAIKTRLRTVILAPTRVVAAEMEEALRGLPVRYMTTAVNVTHSGTEIVDLMCHATFTSRLLQPIRVPNYNLYIMDEAHFTDPSSIAARGYISTRVEMGEAAAIFMTATPPGTRDAFPDSNSPIMDTEVEVPERAWSSGFDWVTDHSGKTVWFVPSVRNGNEIAACLTKAGKRVIQLSRKTFETEFQKTKHQEWDFVVTTDISEMGANFKADRVIDSRRCLKPVILDGERVILAGPMPVTHASAAQRRGRIGRNPNKPGDEYLYGGGCAETDEDHAHWLEARMLLDNIYLQDGLIASLYRPEADKVAAIEGEFKLRTEQRKTFVELMKRGDLPVWLAYQVASAGITYTDRRWCFDGTTNNTIMEDSVPAEVWTRHGEKRVLKPRWMDARVCSDHAALKSFKEFAAGKRGAAFGVMEALGTLPGHMTERFQEAIDNLAVLMRAETGSRPYKAAAAQLPETLETIMLLGLLGTVSLGIFFVLMRNKGIGKMGFGMVTLGASAWLMWLSEIEPARIACVLIVVFLLLVVLIPEPEKQRSPQDNQMAIIIMVAVGLLGLITANELGWLERTKSDLSHLMGRREEGATIGFSMDIDLRPASAWAIYAALTTFITPAVQHAVTTSYNNYSLMAMATQAGVLFGMGKGMPFYAWDFGVPLLMIGCYSQLTPLTLIVAIILLVAHYMYLIPGLQAAAARAAQKRTAAGIMKNPVVDGIVVTDIDTMTIDPQVEKKMGQVLLIAVAVSSAILSRTAWGWGEAGALITAATSTLWEGSPNKYWNSSTATSLCNIFRGSYLAGASLIYTVTRNAGLVKRRGGGTGETLGEKWKARLNQMSALEFYSYKKSGITEVCREEARRALKDGVATGGHAVSRGSAKLRWLVERGYLQPYGKVIDLGCGRGGWSYYAATIRKVQEVKGYTKGGPGHEEPMLVQSYGWNIVRLKSGVDVFHMAAEPCDTLLCDIGESSSSPEVEEARTLRVLSMVGDWLEKRPGAFCIKVLCPYTSTMMETLERLQRRYGGGLVRVPLSRNSTHEMYWVSGAKSNTIKSVSTTSQLLLGRMDGPRRPVKYEEDVNLGSGTRAVVSCAEAPNMKIIGNRIERIRSEHAETWFFDENHPYRTWAYHGSYEAPTQGSASSLINGVVRLLSKPWDVVTGVTGIAMTDTTPYGQQRVFKEKVDTRVPDPQEGTRQVMSMVSSWLWKELGKHKRPRVCTKEEFINKVRSNAALGAIFEEEKEWKTAVEAVNDPRFWALVDKEREHHLRGECQSCVYNMMGKREKKQGEFGKAKGSRAIWYMWLGARFLEFEALGFLNEDHWMGRENSGGGVEGLGLQRLGYVLEEMSRIPGGRMYADDTAGWDTRISRFDLENEALITNQMEKGHRALALAIIKYTYQNKVVKVLRPAEKGKTVMDIISRQDQRGSGQVVTYALNTFTNLVVQLIRNMEAEEVLEMQDLWLLRRSEKVTNWLQSNGWDRLKRMAVSGDDCVVKPIDDRFAHALRFLNDMGKVRKDTQEWKPSTGWDNWEEVPFCSHHFNKLHLKDGRSIVVPCRHQDELIGRARVSPGAGWSIRETACLAKSYAQMWQLLYFHRRDLRLMANAICSSVPVDWVPTGRTTWSIHGKGEWMTTEDMLVVWNRVWIEENDHMEDKTPVTKWTDIPYLGKREDLWCGSLIGHRPRTTWAENIKNTVNMVRRIIGDEEKYMDYLSTQVRYLGEEGSTPGVL\nAnswer: 
CCN(CC)c1cc(OC)c(C(=O)NCC2CN(Cc3ccccc3)CCO2)cc1Cl.O=C(O)C(=O)O"} {"text":"Question: What chemical can possibly bind to the binding site at 30-37 in the given protein sequence below?\nSequence: MVDRVKTGIPGMDDILYGGIPRRNVVLLSGGPGTGKSIFSYQYLWNGLREGEPGVFVALEEHPVQVRINMAQFGWDVREYERQGLFAVVDAFTSGIGEAAKKERYVVTDPEDVGLLIDVLKEAIRDVGAKRVAVDSVSTLYLAKPVLARRTVMLLKRVLSGLGTTSILVSQVSVTERGFGGPGVEHAADGIIRLDLDEVDGELVRSLIIWKMRGTKHSMRRHPFEITDKGIIVYPDKVVRIGRRVSIE\nAnswer: CCN(CC)c1cc(OC)c(C(=O)NCC2CN(Cc3ccccc3)CCO2)cc1Cl.O=C(O)C(=O)O"}", "/scratch/micpie/export/uniprot_binding_sites_multiple/valid_0-2.jsonl": "{"text":"Question: Can you give me one example of a binding site of the compound with the SELFIES representation [C][C][C][C][=N][C][=C][Branch1][C][C][C][=C][N][=C][Ring1][#Branch1][N][Ring1][#Branch2][C][C][=C][C][=C][Branch2][Ring1][Ring2][O][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=N][NH1][Ring1][Branch1][C][=C][Ring2][Ring1][Ring1] in this protein MIKHFLEDNSDDAELSKFVKDFPGSEPCHPTESKTRVARPQILEPRPQSPDLCDDDVEFRATLWSQPSDSQQYFCPPAPLSPSSRPRSPWGKLDPYDSSEDDKEYVGFATLPNQVHRKSVKKGFDFTLMVAGESGLGKSTLVNSLFLTDLYRDRKLLGAEERIMQTVEITKHAVDIEEKGVRLRLTIVDTPGFGDAVNNTECWRPVAEYIDQQFEQYFRDESGLNRKNIQDNRVHCCLYFISPFGHGLRPLDVEFMKALHQRVNIVPILAKADTLTPSEVDRKKCKIREEIEHFGIKIYQFPDCDSDEDEDFKLQDQALKESIPFAVIGSNTVVEARGRRVRGRLYPWGIVEVENPGHCDFVKLRTMLVRTHMQDLKDVTRETHYENYRAQCIQSMTRLVVKERNRNKLTRESGTDFPIPAVPPGTDPETEKLIREKDEELRRMQEMLHKIQRQMKETH?\nAnswer: One binding site for the molecule is 132 to 139."} {"text":"Question: Can you find one binding site of the molecule with the InChI InChI=1S\/C24H32ClN3O3.C2H2O4\/c1-4-28(5-2)22-14-23(30-3)20(13-21(22)25)24(29)26-15-19-17-27(11-12-31-19)16-18-9-7-6-8-10-18;3-1(4)2(5)6\/h6-10,13-14,19H,4-5,11-12,15-17H2,1-3H3,(H,26,29);(H,3,4)(H,5,6) in this amino acid sequence 
MVQTVYKNSDQTVFEDAKALFQLNKNILLKGPTGSGKTKLAETLSNVMKLPMHQVNCSVDLDTESLLGFKTIHTNEEGHQEIVFIDGPVIKAMKEGHILYIDEINMAKPETLPILNGVLDYRRQLTNPYTGEVIKAAPGFNVIAAINEGYVGTLPMNEALKNRFIVIEVDYIDGDILKTVIKEQSKLQDEQLIQHIVKFNEDLRTMTKQGQISEEAASIRALIDLSDLATVMPIERAVQRTIIDKLEDEREQQAILNAIELNF?\nAnswer: One binding site for the compound is 31-38."}", "/scratch/micpie/export/uniprot_binding_sites_multiple/valid_0-1.jsonl": "{"text":"Task: Come up with a molecule that binds to the given in the peptide sequence.\nAA sequence: MIKHFLEDNSDDAELSKFVKDFPGSEPCHPTESKTRVARPQILEPRPQSPDLCDDDVEFRATLWSQPSDSQQYFCPPAPLSPSSRPRSPWGKLDPYDSSEDDKEYVGFATLPNQVHRKSVKKGFDFTLMVAGESGLGKSTLVNSLFLTDLYRDRKLLGAEERIMQTVEITKHAVDIEEKGVRLRLTIVDTPGFGDAVNNTECWRPVAEYIDQQFEQYFRDESGLNRKNIQDNRVHCCLYFISPFGHGLRPLDVEFMKALHQRVNIVPILAKADTLTPSEVDRKKCKIREEIEHFGIKIYQFPDCDSDEDEDFKLQDQALKESIPFAVIGSNTVVEARGRRVRGRLYPWGIVEVENPGHCDFVKLRTMLVRTHMQDLKDVTRETHYENYRAQCIQSMTRLVVKERNRNKLTRESGTDFPIPAVPPGTDPETEKLIREKDEELRRMQEMLHKIQRQMKETH\nBinding site: 132 to 139\nResult: CCCcnccC)ccnc6n9CccccOCcccccc6))))))cnnn[nH]5)))))))cc6"} {"text":"Task: Create a chemical that binds to the given binding site in the peptide sequence.\nPeptide sequence: MVQTVYKNSDQTVFEDAKALFQLNKNILLKGPTGSGKTKLAETLSNVMKLPMHQVNCSVDLDTESLLGFKTIHTNEEGHQEIVFIDGPVIKAMKEGHILYIDEINMAKPETLPILNGVLDYRRQLTNPYTGEVIKAAPGFNVIAAINEGYVGTLPMNEALKNRFIVIEVDYIDGDILKTVIKEQSKLQDEQLIQHIVKFNEDLRTMTKQGQISEEAASIRALIDLSDLATVMPIERAVQRTIIDKLEDEREQQAILNAIELNF\nBinding site: 31 to 38\nOutput: CCNCC))cccOC))cC=O)NCCCNCcccccc6)))))))CCO6)))))))))cc6Cl.O=CO)C=O)O"}", "/scratch/micpie/export/uniprot_binding_sites_multiple/train_0-2.jsonl": "{"text":"Question: Can you give me one example of a binding site of the compound with the SMILES representation CCN(CC)c1cc(OC)c(C(=O)NCC2CN(Cc3ccccc3)CCO2)cc1Cl.O=C(O)C(=O)O in this protein 
MKNPKKKSGGFRIVNMLKRGVARVSPFGGLKRLPAGLLLGHGPIRMVLAILAFLRFTAIKPSLGLINRWGSVGKKEAMEIIKKFKKDLAAMLRIINARKEKKRRGADTSVGIVGLLLTTAMAAEVTRRGSAYYMYLDRNDAGEAISFPTTLGMNKCYIQIMDLGHMCDATMSYECPMLDEGVEPDDVDCWCNTTSTWVVYGTCHHKKGEARRSRRAVTLPSHSTRKLQTRSQTWLESREYTKHLIRVENWIFRNPGFALAAAAIAWLLGSSTSQKVIYLVMILLIAPAYSIRCIGVSNRDFVEGMSGGTWVDVVLEHGGCVTVMAQDKPTVDIELVTTTVSNMAEVRSYCYEASISDMASDSRCPTQGEAYLDKQSDTQYVCKRTLVDRGWGNGCGLFGKGSLVTCAKFACSKKMTGKSIQPENLEYRIMLSVHGSQHSGMIVNDTGHETDENRAKVEITPNSPRAEATLGGFGSLGLDCEPRTGLDFSDLYYLTMNNKHWLVHKEWFHDIPLPWHAGADTGTPHWNNKEALVEFKDAHAKRQTVVVLGSQEGAVHTALAGALEAEMDGAKGRLSSGHLKCRLKMDKLRLKGVSYSLCTAAFTFTKIPAETLHGTVTVEVQYAGTDGPCKVPAQMAVDMQTLTPVGRLITANPVITESTENSKMMLELDPPFGDSYIVIGVGEKKITHHWHRSGSTIGKAFEATVRGAKRMAVLGDTAWDFGSVGGALNSLGKGIHQIFGAAFKSLFGGMSWFSQILIGTLLMWLGLNTKNGSISLMCLALGGVLIFLSTAVSADVGCSVDFSKKETRCGTGVFVYNDVEAWRDRYKYHPDSPRRLAAAVKQAWEDGICGISSVSRMENIMWRSVEGELNAILEENGVQLTVVVGSVKNPMWRGPQRLPVPVNELPHGWKAWGKSYFVRAAKTNNSFVVDGDTLKECPLKHRAWNSFLVEDHGFGVFHTSVWLKVREDYSLECDPAVIGTAVKGKEAVHSDLGYWIESEKNDTWRLKRAHLIEMKTCEWPKSHTLWTDGIEESDLIIPKSLAGPLSHHNTREGYRTQMKGPWHSEELEIRFEECPGTKVHVEETCGTRGPSLRSTTASGRVIEEWCCRECTMPPLSFRAKDGCWYGMEIRPRKEPESNLVRSMVTAGSTDHMDHFSLGVLVILLMVQEGLKKRMTTKIIISTSMAVLVAMILGGFSMSDLAKLAILMGATFAEMNTGGDVAHLALIAAFKVRPALLVSFIFRANWTPRESMLLALASCLLQTAISALEGDLMVLINGFALAWLAIRAMVVPRTDNITLAILAALTPLARGTLLVAWRAGLATCGGFMLLSLKGKGSVKKNLPFVMALGLTAVRLVDPINVVGLLLLTRSGKRSWPPSEVLTAVGLICALAGGFAKADIEMAGPMAAVGLLIVSYVVSGKSVDMYIERAGDITWEKDAEVTGNSPRLDVALDESGDFSLVEDDGPPMREIILKVVLMTICGMNPIAIPFAAGAWYVYVKTGKRSGALWDVPAPKEVKKGETTDGVYRVMTRRLLGSTQVGVGVMQEGVFHTMWHVTKGSALRSGEGRLDPYWGDVKQDLVSYCGPWKLDAAWDGHSEVQLLAVPPGERARNIQTLPGIFKTKDGDIGAVALDYPAGTSGSPILDKCGRVIGLYGNGVVIKNGSYVSAITQGRREEETPVECFEPSMLKKKQLTVLDLHPGAGKTRRVLPEIVREAIKTRLRTVILAPTRVVAAEMEEALRGLPVRYMTTAVNVTHSGTEIVDLMCHATFTSRLLQPIRVPNYNLYIMDEAHFTDPSSIAARGYISTRVEMGEAAAIFMTATPPGTRDAFPDSNSPIMDTEVEVPERAWSSGFDWVTDHSGKTVWFVPSVRNGNEIAACLTKAGKRVIQLSRKTFETEFQKTKHQEWDFVVTTDISEMGANFKADRVIDSRRCLKPVILDGERVILAGPMPVTHASAAQRRGRIGRNPNKPGDEYLYGGGCAETDEDHAHWLEARMLLDNIY
LQDGLIASLYRPEADKVAAIEGEFKLRTEQRKTFVELMKRGDLPVWLAYQVASAGITYTDRRWCFDGTTNNTIMEDSVPAEVWTRHGEKRVLKPRWMDARVCSDHAALKSFKEFAAGKRGAAFGVMEALGTLPGHMTERFQEAIDNLAVLMRAETGSRPYKAAAAQLPETLETIMLLGLLGTVSLGIFFVLMRNKGIGKMGFGMVTLGASAWLMWLSEIEPARIACVLIVVFLLLVVLIPEPEKQRSPQDNQMAIIIMVAVGLLGLITANELGWLERTKSDLSHLMGRREEGATIGFSMDIDLRPASAWAIYAALTTFITPAVQHAVTTSYNNYSLMAMATQAGVLFGMGKGMPFYAWDFGVPLLMIGCYSQLTPLTLIVAIILLVAHYMYLIPGLQAAAARAAQKRTAAGIMKNPVVDGIVVTDIDTMTIDPQVEKKMGQVLLIAVAVSSAILSRTAWGWGEAGALITAATSTLWEGSPNKYWNSSTATSLCNIFRGSYLAGASLIYTVTRNAGLVKRRGGGTGETLGEKWKARLNQMSALEFYSYKKSGITEVCREEARRALKDGVATGGHAVSRGSAKLRWLVERGYLQPYGKVIDLGCGRGGWSYYAATIRKVQEVKGYTKGGPGHEEPMLVQSYGWNIVRLKSGVDVFHMAAEPCDTLLCDIGESSSSPEVEEARTLRVLSMVGDWLEKRPGAFCIKVLCPYTSTMMETLERLQRRYGGGLVRVPLSRNSTHEMYWVSGAKSNTIKSVSTTSQLLLGRMDGPRRPVKYEEDVNLGSGTRAVVSCAEAPNMKIIGNRIERIRSEHAETWFFDENHPYRTWAYHGSYEAPTQGSASSLINGVVRLLSKPWDVVTGVTGIAMTDTTPYGQQRVFKEKVDTRVPDPQEGTRQVMSMVSSWLWKELGKHKRPRVCTKEEFINKVRSNAALGAIFEEEKEWKTAVEAVNDPRFWALVDKEREHHLRGECQSCVYNMMGKREKKQGEFGKAKGSRAIWYMWLGARFLEFEALGFLNEDHWMGRENSGGGVEGLGLQRLGYVLEEMSRIPGGRMYADDTAGWDTRISRFDLENEALITNQMEKGHRALALAIIKYTYQNKVVKVLRPAEKGKTVMDIISRQDQRGSGQVVTYALNTFTNLVVQLIRNMEAEEVLEMQDLWLLRRSEKVTNWLQSNGWDRLKRMAVSGDDCVVKPIDDRFAHALRFLNDMGKVRKDTQEWKPSTGWDNWEEVPFCSHHFNKLHLKDGRSIVVPCRHQDELIGRARVSPGAGWSIRETACLAKSYAQMWQLLYFHRRDLRLMANAICSSVPVDWVPTGRTTWSIHGKGEWMTTEDMLVVWNRVWIEENDHMEDKTPVTKWTDIPYLGKREDLWCGSLIGHRPRTTWAENIKNTVNMVRRIIGDEEKYMDYLSTQVRYLGEEGSTPGVL?\nAnswer: One binding site for the molecule is 1696 to 1703."} {"text":"Question: Can you give me one example of a binding site of the chemical with the DeepSMILES representation CCNCC))cccOC))cC=O)NCCCNCcccccc6)))))))CCO6)))))))))cc6Cl.O=CO)C=O)O in this amino acid sequence MVDRVKTGIPGMDDILYGGIPRRNVVLLSGGPGTGKSIFSYQYLWNGLREGEPGVFVALEEHPVQVRINMAQFGWDVREYERQGLFAVVDAFTSGIGEAAKKERYVVTDPEDVGLLIDVLKEAIRDVGAKRVAVDSVSTLYLAKPVLARRTVMLLKRVLSGLGTTSILVSQVSVTERGFGGPGVEHAADGIIRLDLDEVDGELVRSLIIWKMRGTKHSMRRHPFEITDKGIIVYPDKVVRIGRRVSIE?\nAnswer: One possible site for the molecule is 30 
to 37."}", "/scratch/micpie/export/uniprot_binding_sites_multiple/train_0-1.jsonl": "{"text":"Task: Create a compound that binds to the given binding site in the amino acid sequence.\nPeptide sequence: MKNPKKKSGGFRIVNMLKRGVARVSPFGGLKRLPAGLLLGHGPIRMVLAILAFLRFTAIKPSLGLINRWGSVGKKEAMEIIKKFKKDLAAMLRIINARKEKKRRGADTSVGIVGLLLTTAMAAEVTRRGSAYYMYLDRNDAGEAISFPTTLGMNKCYIQIMDLGHMCDATMSYECPMLDEGVEPDDVDCWCNTTSTWVVYGTCHHKKGEARRSRRAVTLPSHSTRKLQTRSQTWLESREYTKHLIRVENWIFRNPGFALAAAAIAWLLGSSTSQKVIYLVMILLIAPAYSIRCIGVSNRDFVEGMSGGTWVDVVLEHGGCVTVMAQDKPTVDIELVTTTVSNMAEVRSYCYEASISDMASDSRCPTQGEAYLDKQSDTQYVCKRTLVDRGWGNGCGLFGKGSLVTCAKFACSKKMTGKSIQPENLEYRIMLSVHGSQHSGMIVNDTGHETDENRAKVEITPNSPRAEATLGGFGSLGLDCEPRTGLDFSDLYYLTMNNKHWLVHKEWFHDIPLPWHAGADTGTPHWNNKEALVEFKDAHAKRQTVVVLGSQEGAVHTALAGALEAEMDGAKGRLSSGHLKCRLKMDKLRLKGVSYSLCTAAFTFTKIPAETLHGTVTVEVQYAGTDGPCKVPAQMAVDMQTLTPVGRLITANPVITESTENSKMMLELDPPFGDSYIVIGVGEKKITHHWHRSGSTIGKAFEATVRGAKRMAVLGDTAWDFGSVGGALNSLGKGIHQIFGAAFKSLFGGMSWFSQILIGTLLMWLGLNTKNGSISLMCLALGGVLIFLSTAVSADVGCSVDFSKKETRCGTGVFVYNDVEAWRDRYKYHPDSPRRLAAAVKQAWEDGICGISSVSRMENIMWRSVEGELNAILEENGVQLTVVVGSVKNPMWRGPQRLPVPVNELPHGWKAWGKSYFVRAAKTNNSFVVDGDTLKECPLKHRAWNSFLVEDHGFGVFHTSVWLKVREDYSLECDPAVIGTAVKGKEAVHSDLGYWIESEKNDTWRLKRAHLIEMKTCEWPKSHTLWTDGIEESDLIIPKSLAGPLSHHNTREGYRTQMKGPWHSEELEIRFEECPGTKVHVEETCGTRGPSLRSTTASGRVIEEWCCRECTMPPLSFRAKDGCWYGMEIRPRKEPESNLVRSMVTAGSTDHMDHFSLGVLVILLMVQEGLKKRMTTKIIISTSMAVLVAMILGGFSMSDLAKLAILMGATFAEMNTGGDVAHLALIAAFKVRPALLVSFIFRANWTPRESMLLALASCLLQTAISALEGDLMVLINGFALAWLAIRAMVVPRTDNITLAILAALTPLARGTLLVAWRAGLATCGGFMLLSLKGKGSVKKNLPFVMALGLTAVRLVDPINVVGLLLLTRSGKRSWPPSEVLTAVGLICALAGGFAKADIEMAGPMAAVGLLIVSYVVSGKSVDMYIERAGDITWEKDAEVTGNSPRLDVALDESGDFSLVEDDGPPMREIILKVVLMTICGMNPIAIPFAAGAWYVYVKTGKRSGALWDVPAPKEVKKGETTDGVYRVMTRRLLGSTQVGVGVMQEGVFHTMWHVTKGSALRSGEGRLDPYWGDVKQDLVSYCGPWKLDAAWDGHSEVQLLAVPPGERARNIQTLPGIFKTKDGDIGAVALDYPAGTSGSPILDKCGRVIGLYGNGVVIKNGSYVSAITQGRREEETPVECFEPSMLKKKQLTVLDLHPGAGKTRRVLPEIVREAIKTRLRTVILAPTRVVAAEMEEALRGLPVRYMTTAVNVTHSGTEIVDLMCHATFTSRLLQPIRVPNYNLYIMDEAHFTDPSSIA
ARGYISTRVEMGEAAAIFMTATPPGTRDAFPDSNSPIMDTEVEVPERAWSSGFDWVTDHSGKTVWFVPSVRNGNEIAACLTKAGKRVIQLSRKTFETEFQKTKHQEWDFVVTTDISEMGANFKADRVIDSRRCLKPVILDGERVILAGPMPVTHASAAQRRGRIGRNPNKPGDEYLYGGGCAETDEDHAHWLEARMLLDNIYLQDGLIASLYRPEADKVAAIEGEFKLRTEQRKTFVELMKRGDLPVWLAYQVASAGITYTDRRWCFDGTTNNTIMEDSVPAEVWTRHGEKRVLKPRWMDARVCSDHAALKSFKEFAAGKRGAAFGVMEALGTLPGHMTERFQEAIDNLAVLMRAETGSRPYKAAAAQLPETLETIMLLGLLGTVSLGIFFVLMRNKGIGKMGFGMVTLGASAWLMWLSEIEPARIACVLIVVFLLLVVLIPEPEKQRSPQDNQMAIIIMVAVGLLGLITANELGWLERTKSDLSHLMGRREEGATIGFSMDIDLRPASAWAIYAALTTFITPAVQHAVTTSYNNYSLMAMATQAGVLFGMGKGMPFYAWDFGVPLLMIGCYSQLTPLTLIVAIILLVAHYMYLIPGLQAAAARAAQKRTAAGIMKNPVVDGIVVTDIDTMTIDPQVEKKMGQVLLIAVAVSSAILSRTAWGWGEAGALITAATSTLWEGSPNKYWNSSTATSLCNIFRGSYLAGASLIYTVTRNAGLVKRRGGGTGETLGEKWKARLNQMSALEFYSYKKSGITEVCREEARRALKDGVATGGHAVSRGSAKLRWLVERGYLQPYGKVIDLGCGRGGWSYYAATIRKVQEVKGYTKGGPGHEEPMLVQSYGWNIVRLKSGVDVFHMAAEPCDTLLCDIGESSSSPEVEEARTLRVLSMVGDWLEKRPGAFCIKVLCPYTSTMMETLERLQRRYGGGLVRVPLSRNSTHEMYWVSGAKSNTIKSVSTTSQLLLGRMDGPRRPVKYEEDVNLGSGTRAVVSCAEAPNMKIIGNRIERIRSEHAETWFFDENHPYRTWAYHGSYEAPTQGSASSLINGVVRLLSKPWDVVTGVTGIAMTDTTPYGQQRVFKEKVDTRVPDPQEGTRQVMSMVSSWLWKELGKHKRPRVCTKEEFINKVRSNAALGAIFEEEKEWKTAVEAVNDPRFWALVDKEREHHLRGECQSCVYNMMGKREKKQGEFGKAKGSRAIWYMWLGARFLEFEALGFLNEDHWMGRENSGGGVEGLGLQRLGYVLEEMSRIPGGRMYADDTAGWDTRISRFDLENEALITNQMEKGHRALALAIIKYTYQNKVVKVLRPAEKGKTVMDIISRQDQRGSGQVVTYALNTFTNLVVQLIRNMEAEEVLEMQDLWLLRRSEKVTNWLQSNGWDRLKRMAVSGDDCVVKPIDDRFAHALRFLNDMGKVRKDTQEWKPSTGWDNWEEVPFCSHHFNKLHLKDGRSIVVPCRHQDELIGRARVSPGAGWSIRETACLAKSYAQMWQLLYFHRRDLRLMANAICSSVPVDWVPTGRTTWSIHGKGEWMTTEDMLVVWNRVWIEENDHMEDKTPVTKWTDIPYLGKREDLWCGSLIGHRPRTTWAENIKNTVNMVRRIIGDEEKYMDYLSTQVRYLGEEGSTPGVL\nBinding site: 1696 to 1703\nOutput: [C][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][Ring1][O][C][=C][Branch2][Ring1][#Branch2][C][=Branch1][C][=O][N][C][C][C][N][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][O][Ring1][=N][C][=C][Ring2][Ring1][=Branch2][Cl].[O][=C][Branch1][C][O][C][=Branch1][C][=O][O]"} {"text":"Task: Design a chemical that binds 
to the given binding site in the AA sequence.\nAmino acid sequence: MVDRVKTGIPGMDDILYGGIPRRNVVLLSGGPGTGKSIFSYQYLWNGLREGEPGVFVALEEHPVQVRINMAQFGWDVREYERQGLFAVVDAFTSGIGEAAKKERYVVTDPEDVGLLIDVLKEAIRDVGAKRVAVDSVSTLYLAKPVLARRTVMLLKRVLSGLGTTSILVSQVSVTERGFGGPGVEHAADGIIRLDLDEVDGELVRSLIIWKMRGTKHSMRRHPFEITDKGIIVYPDKVVRIGRRVSIE\nBinding site: 30-37\nResult: InChI=1S\/C24H32ClN3O3.C2H2O4\/c1-4-28(5-2)22-14-23(30-3)20(13-21(22)25)24(29)26-15-19-17-27(11-12-31-19)16-18-9-7-6-8-10-18;3-1(4)2(5)6\/h6-10,13-14,19H,4-5,11-12,15-17H2,1-3H3,(H,26,29);(H,3,4)(H,5,6)"}", "/scratch/micpie/export/uniprot_binding_sites_multiple/valid_0-3.jsonl": "{"text":"Question: What compound can possibly bind to the binding site at 132-139 in the given protein sequence?\nSequence: MIKHFLEDNSDDAELSKFVKDFPGSEPCHPTESKTRVARPQILEPRPQSPDLCDDDVEFRATLWSQPSDSQQYFCPPAPLSPSSRPRSPWGKLDPYDSSEDDKEYVGFATLPNQVHRKSVKKGFDFTLMVAGESGLGKSTLVNSLFLTDLYRDRKLLGAEERIMQTVEITKHAVDIEEKGVRLRLTIVDTPGFGDAVNNTECWRPVAEYIDQQFEQYFRDESGLNRKNIQDNRVHCCLYFISPFGHGLRPLDVEFMKALHQRVNIVPILAKADTLTPSEVDRKKCKIREEIEHFGIKIYQFPDCDSDEDEDFKLQDQALKESIPFAVIGSNTVVEARGRRVRGRLYPWGIVEVENPGHCDFVKLRTMLVRTHMQDLKDVTRETHYENYRAQCIQSMTRLVVKERNRNKLTRESGTDFPIPAVPPGTDPETEKLIREKDEELRRMQEMLHKIQRQMKETH\nAnswer: InChI=1S\/C25H25N7O\/c1-3-7-21-27-22-17(2)14-15-26-25(22)32(21)16-18-10-12-20(13-11-18)33-23(24-28-30-31-29-24)19-8-5-4-6-9-19\/h4-6,8-15,23H,3,7,16H2,1-2H3,(H,28,29,30,31)"} {"text":"Question: What molecule can possibly bind to the binding site at 31-38 in the given AA sequence below?\nSequence: MVQTVYKNSDQTVFEDAKALFQLNKNILLKGPTGSGKTKLAETLSNVMKLPMHQVNCSVDLDTESLLGFKTIHTNEEGHQEIVFIDGPVIKAMKEGHILYIDEINMAKPETLPILNGVLDYRRQLTNPYTGEVIKAAPGFNVIAAINEGYVGTLPMNEALKNRFIVIEVDYIDGDILKTVIKEQSKLQDEQLIQHIVKFNEDLRTMTKQGQISEEAASIRALIDLSDLATVMPIERAVQRTIIDKLEDEREQQAILNAIELNF\nAnswer: CCN(CC)c1cc(OC)c(C(=O)NCC2CN(Cc3ccccc3)CCO2)cc1Cl.O=C(O)C(=O)O"}", "/scratch/micpie/export/MUV_713/valid_0-0.jsonl": "{"text":"The chemical compound with the InChI 
InChI=1S\/C14H12O3S\/c1-2-12(15)10-5-7-11(8-6-10)17-14(16)13-4-3-9-18-13\/h3-9H,2H2,1H3 is not an inhibitor of the estrogen receptor-alpha-coactivator binding."} {"text":"The compound with the canonical SMILES representation of COc1ccccc1-c1c(C)nn2c(N3CCC4(CC3)OCCO4)cc(C)nc12 is not an inhibitor of the estrogen receptor-alpha-coactivator binding."}", "/scratch/micpie/export/MUV_713/test_0-0.jsonl": "{"text":"The compound with the canonical SMILES representation of COc1ccc(N(CC(=O)Nc2cccc(C(=O)O)c2)S(=O)(=O)c2c(C)noc2C)cc1 is not an inhibitor of the estrogen receptor-alpha-coactivator binding."} {"text":"The molecular species with the SELFIES representation of ['[C][C][=C][Branch1][C][C][C][=Branch1][C][=O][C][Branch1][=C][C][C][C][C][C][#C][C][C][C][C][#C][C][O][=C][Branch1][C][C][C][Ring2][Ring1][=Branch1][=O]'] is not an inhibitor of the estrogen receptor-alpha-coactivator binding."}", "/scratch/micpie/export/MUV_713/train_0-0.jsonl": "{"text":"The molecule with the SMILES representation of CCn1c(CSc2nccn2C)nc2cc(C(=O)O)ccc21 is not an inhibitor of the ER-alpha-coact. binding."} {"text":"The molecular species with the DeepSMILES CCOC=O)CC)ncncccnn5-ccccCl)cc6)))))))))c6=O is not an inhibitor of the ER-alpha-coact. 
binding."}", "/scratch/micpie/export/MUV_692/valid_0-0.jsonl": "{"text":"The molecular species with the SMILES Cn1ccnc1SCC(=O)Nc1ccc(Oc2ccccc2)cc1 is not an agonist of the steroidogenic factor 1 (SF-1)."} {"text":"The chemical with the InChI InChI=1S\/C17H19N3O2S2\/c1-12-11-23-17-19-15(10-20(12)17)9-18-24(21,22)16-7-6-13-4-2-3-5-14(13)8-16\/h6-8,10-11,18H,2-5,9H2,1H3 is not an agonist of the steroidogenic factor 1 (SF-1)."}", "/scratch/micpie/export/MUV_692/test_0-0.jsonl": "{"text":"The chemical with the canonical SMILES NC(=O)NC(Cc1ccccc1)C(=O)O is not an agonist of SF-1."} {"text":"The molecular species with the SMILES representation of Cc1cc(C(=O)CN(Cc2ccc(F)cc2)S(=O)(=O)c2ccc3c(c2)OCCO3)c(C)n1C is not an agonist of the steroidogenic factor 1 (SF-1)."}", "/scratch/micpie/export/MUV_692/train_0-0.jsonl": "{"text":"The molecular species with the DeepSMILES CCncCScnccn5C))))))))ncccC=O)O))ccc69 is not an agonist of SF-1."} {"text":"The chemical compound with the InChI representation of InChI=1S\/C19H22N4O3S2\/c1-13-16(6-8-25-13)18-21-22-19(23(18)11-14-4-2-7-26-14)28-12-17(24)20-10-15-5-3-9-27-15\/h3,5-6,8-9,14H,2,4,7,10-12H2,1H3,(H,20,24) is not an agonist of the steroidogenic factor 1 (SF-1)."}", "/scratch/micpie/export/bio_ner_9/valid_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: AIF is required for oxidative phosphorylation and for the assembly and\/or stabilization of respiratory complex I. 41 Upon induction of apoptosis, AIF is cleaved and released into the cytosol, where it translocates to the nucleus and mediates chromatin condensation and large-scale DNA fragmentation. 41 However, this well-known pro-apoptotic action of AIF is in conflict with the observation that AIF is essential for the maintenance of normal heart function and its inactivation results in dilated C. 
42 Moreover, cardiac myocytes isolated from a mouse model with 80% reduction in AIF levels manifested increased cell death induced by oxidative stress, and the hearts of these mice displayed enhanced ischemic damage after in vivo I\/R. 43 Although it has been described that AIF is released from cardiac myocyte mitochondria during I\/R, its contribution to I\/R-induced apoptosis was discounted. 38 However, AIF has been implicated in cardiac myocyte death induced by oxidative stress and HF. 44.\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: cytosol,185,192,Anatomy\nnucleus,223,230,Anatomy\nchromatin,244,253,Anatomy\nheart,452,457,Anatomy\ncardiac myocytes,523,539,Anatomy\ncell,622,626,Anatomy\nhearts,670,676,Anatomy\ncardiac myocyte mitochondria,807,835,Anatomy\ncardiac myocyte,951,966,Anatomy"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Table1Summary of the experimental set-up used for the processing of a lactate-rich medium to methaneMethane-yielding microbial communities processing lactate-rich medium to methaneM1AM1BThe seed methanogenic inoculumActivated sludge from a municipal waste treatment plant Warszawa Poudnie in Warsaw, Poland, sampled in the winteraMethane-yielding sludge from the 50-L-UASB bioreactor processing acidic effluent from molasses fermentation [ 19] inoculated with activated sludge from a municipal waste treatment plant Warszawa Poudnie in Warsaw, Poland, sampled in the autumnaLactate-rich mediummodified M9 (containing MgCl2 instead of MgSO4, without glucose) Sodium lactate 8.26g\/L, butyric acid 1.06g\/L, propionic acid 0.97g\/L, acetic acid 1.54g\/LSodium lactate 7g\/L, sodium butyrate 1.3g\/L, propionic acid 0.99g\/L, acetic acid 1.05g\/L, yeast extract 0.5g\/LHydraulic retention time (HRT), days77Culture history-Incubation at room temperature (2025C) for 26days 
after inoculation-27th57th day of cultivationneutralized medium continuously supplied to the bioreactor-Since 58th day of cultivationnon-neutralized medium supplied to the bioreactorIncubation at room temperature (2025C) for 10days after inoculation..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: lactate - rich,72,86,state\nlactate - rich,156,170,state\nmethanogenic,203,215,state\n50 - L,373,379,state\nfermentation,439,451,state\nroom temperature,971,987,state\n2025C,990,995,state\nroom temperature,1210,1226,state\n2025C,1229,1234,state"}", "/scratch/micpie/export/bio_ner_9/test_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: The tree\/fruit can be divided into several anatomical compartments: (1) seed, (2) juice, (3) peel, (4) leaf, (5) flower, (6) bark, and (7) roots, each of which has interesting pharmacologic activity..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: fruit,11,16,Anatomy\ncompartments,56,68,Anatomy\nseed,75,79,Anatomy\njuice,86,91,Anatomy\npeel,98,102,Anatomy\nleaf,109,113,Anatomy\nflower,120,126,Anatomy\nbark,133,137,Anatomy\nroots,148,153,Anatomy"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Four reproductive tissues were isolated: the testes and male accessory glands (MAGs) from males, and the ovaries and lower reproductive tract (LRT, which comprises the atrium, the spermatheca and the parovarium) from females..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: testes,45,51,body-site\nmale accessory glands,56,77,body-site\nMAGs,80,84,body-site\novaries,106,113,body-site\nlower reproductive 
tract,118,142,body-site\nLRT,145,148,body-site\natrium,170,176,body-site\nspermatheca,182,193,body-site\nparovarium,202,212,body-site"}", "/scratch/micpie/export/bio_ner_9/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: The motivational neural reward anticipation signal presumably activates the reward network including the amygdala, orbitofrontal cortex, the more ventral and dorsal striatum (nucleus accumbens, putamen, and caudate), leading to an release of caudate\/SNr inhibition on the executive oculomotor structure superior colliculus..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: neural,17,23,Anatomy\namygdala,105,113,Anatomy\norbitofrontal cortex,115,135,Anatomy\nventral,146,153,Anatomy\ndorsal striatum,158,173,Anatomy\nnucleus accumbens,176,193,Anatomy\nputamen,195,202,Anatomy\ncaudate,208,215,Anatomy\noculomotor structure superior colliculus,285,325,Anatomy"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Dams were maintained on a high-fat diet (HFD) consisting of 36% fat (from porcine and poultry fat, corn and fish oil; TAD Primate Diet 5LOP, Test Diet, St. 
Louis, MO) that is supplemented with calorically dense treats (consisting of Glaxo powder\/TAD pellets, peanut butter, honey, banana, and cornstarch)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: TAD Primate Diet 5LOP,122,143,treatment\nTest Diet,145,154,treatment\ncalorically dense treats,197,221,treatment\nGlaxo powder,238,250,treatment\nTAD pellets,253,264,treatment\npeanut butter,266,279,treatment\nhoney,281,286,treatment\nbanana,288,294,treatment\ncornstarch,300,310,treatment"}", "/scratch/micpie/export/chemdner/test_0-1.jsonl": "{"text":"User: Does the following text contain mentions of chemicals? Can you output matches?\nText: Mercury induces the expression of cyclooxygenase-2 and inducible nitric oxide synthase.\nAssistant: There is nitric oxide and Mercury."} {"text":"User: Does the following text contain mentions of chemical substances?Please return matches\nGain-of-function mutations in the canonical transient receptor potential 6 (TRPC6) gene are a cause of autosomal dominant focal segmental glomerulosclerosis (FSGS4). The mechanisms whereby abnormal TRPC6 activity results in proteinuria remain unknown. The Erk1\/2 MAP kinases are activated in glomeruli and podocytes in several proteinuric disease models. We therefore examined whether FSGS-associated mutations in TRPC6 result in activation of these kinases. In 293T cells and cultured podocytes, overexpression of gain-of-function TRPC6 mutants resulted in increased Erk1\/2 phosphorylation, an effect dependent upon channel function. Pharmacologic inhibitor studies implicated several signaling mediators, including calmodulin and calcineurin, supporting the importance of TRPC6-mediated calcium influx in this process. 
Through media transfer experiments, we uncovered two distinct mechanisms for Erk activation by mutant TRPC6, a cell autonomous, EGFR-independent mechanism and a cell non-autonomous mechanism involving metalloprotease-mediated release of a presumed EGFR ligand. The inhibitors KN-92 and H89 were able to block both pathways in mutant TRPC6 expressing cells, as well as the prolonged elevation of intracellular calcium levels upon carbachol stimulation seen in these cells. However, these effects appear independent of their effects on CaMKII and PKA, respectively. Phosphorylation of T70, S282 and Y31\/Y285 were not necessary for Erk activation by mutant TRPC6, though a phosphomimetic TRPC6 S282E mutant was capable of Erk activation. Taken together, these results identify two pathways downstream of mutant TRPC6 leading to Erk activation that may play a role in the development of FSGS.\nAssistant: I found no match."}", "/scratch/micpie/export/chemdner/valid_0-0.jsonl": "{"text":"Task: Find all the mentions of chemicals in the following sentence. Return the matching words. If there is no mention of a chemical, return `no match`.\nSentence: We aimed to determine the genotoxic potential of essential oil (EO) obtained from Nepeta nuda. The chemical content of EO was measured via gas chromatography\/mass spectrometry. The most abundant contents were 4aα,7β,7aα-nepetalactone (18.10%), germacrene (15.68%) and elemol (14.38%). For genotoxic effects of EO, Zea mays' seeds were exposed to four different concentrations of this oil. Inhibition of root and stem growth were observed with an increase in EO concentrations. Randomly amplified polymorphic DNA (RAPD) method was used to determine the genotoxic effects of EO. Some changes occurred in RAPD profiles of germinated EO-treated seeds. 
Even though total soluble protein quantity vary, the data observed from the protein profiles of sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE) showed that there was a little differentiation between band profiles of treated samples and control group. We concluded that the basis of interactions between plants, like allelopathy, may be related with genotoxic effects of EO.\nAnswer: sodium dodecyl sulphate, 4aα,7β,7aα-nepetalactone, elemol, germacrene, polyacrylamide, and SDS"} {"text":"Task: Find all the mentions of chemical substances in the subsequent text. Return the matching words. If there is no matching entity, return `no match`.\nDescription: In cotranslational translocation, the ribosome funnel and the channel of the protein translocation complex SecYEG are aligned. For the nascent chain to enter the channel immediately after synthesis, a yet unidentified signal triggers displacement of the SecYEG sealing plug from the pore. Here we show that ribosome binding to the resting SecYEG channel triggers this conformational transition. The purified and reconstituted SecYEG channel opens to form a large ion-conducting channel which has the conductivity of the plug deletion mutant. The number of ion conducting channels inserted into the planar bilayer per fusion event roughly equals the number of SecYEG channels counted by fluorescence correlation spectroscopy in a single proteoliposome. Thus, the open probability of the channel must be close to unity. To prevent the otherwise lethal proton leak, a closed post-translational conformation of the SecYEG complex bound to a ribosome must exist.\nAnswer: no match"}", "/scratch/micpie/export/chemdner/test_0-0.jsonl": "{"text":"Task: Find all the mentions of chemical compounds in the following text. Return the matching entities. 
If there is no matching entity, return `no match`.\nSentence: Mercury induces the expression of cyclooxygenase-2 and inducible nitric oxide synthase.\nAnswer: nitric oxide and Mercury"} {"text":"Task: Find all the mentions of chemicals in the following text. Return the matching words. If there is no mention of a chemical, return `no match`.\nSentence: Gain-of-function mutations in the canonical transient receptor potential 6 (TRPC6) gene are a cause of autosomal dominant focal segmental glomerulosclerosis (FSGS4). The mechanisms whereby abnormal TRPC6 activity results in proteinuria remain unknown. The Erk1\/2 MAP kinases are activated in glomeruli and podocytes in several proteinuric disease models. We therefore examined whether FSGS-associated mutations in TRPC6 result in activation of these kinases. In 293T cells and cultured podocytes, overexpression of gain-of-function TRPC6 mutants resulted in increased Erk1\/2 phosphorylation, an effect dependent upon channel function. Pharmacologic inhibitor studies implicated several signaling mediators, including calmodulin and calcineurin, supporting the importance of TRPC6-mediated calcium influx in this process. Through media transfer experiments, we uncovered two distinct mechanisms for Erk activation by mutant TRPC6, a cell autonomous, EGFR-independent mechanism and a cell non-autonomous mechanism involving metalloprotease-mediated release of a presumed EGFR ligand. The inhibitors KN-92 and H89 were able to block both pathways in mutant TRPC6 expressing cells, as well as the prolonged elevation of intracellular calcium levels upon carbachol stimulation seen in these cells. However, these effects appear independent of their effects on CaMKII and PKA, respectively. Phosphorylation of T70, S282 and Y31\/Y285 were not necessary for Erk activation by mutant TRPC6, though a phosphomimetic TRPC6 S282E mutant was capable of Erk activation. 
Taken together, these results identify two pathways downstream of mutant TRPC6 leading to Erk activation that may play a role in the development of FSGS.\nAnswer: no match"}", "/scratch/micpie/export/chemdner/train_0-0.jsonl": "{"text":"Task: Find all the mentions of chemical substances in the subsequent text. Return the matching entities. If there is no mention of a chemical, return `no match`.\nSentence: DPP6 as a candidate gene for neuroleptic-induced tardive dyskinesia.\nAnswer: no match"} {"text":"Task: Find all the mentions of chemical substances in the subsequent sentence. Return the matching entities. If there is no mention of a chemical, return `no match`.\nDescription: Gain-of-function mutations in transient receptor potential C6 (TRPC6) activate extracellular-signal-regulated kinases Erk1\/2.\nAnswer: no match"}", "/scratch/micpie/export/chemdner/valid_0-1.jsonl": "{"text":"User: Does the following text contain mentions of chemical compounds?Please return matches\nWe aimed to determine the genotoxic potential of essential oil (EO) obtained from Nepeta nuda. The chemical content of EO was measured via gas chromatography\/mass spectrometry. The most abundant contents were 4aα,7β,7aα-nepetalactone (18.10%), germacrene (15.68%) and elemol (14.38%). For genotoxic effects of EO, Zea mays' seeds were exposed to four different concentrations of this oil. Inhibition of root and stem growth were observed with an increase in EO concentrations. Randomly amplified polymorphic DNA (RAPD) method was used to determine the genotoxic effects of EO. Some changes occurred in RAPD profiles of germinated EO-treated seeds. Even though total soluble protein quantity vary, the data observed from the protein profiles of sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE) showed that there was a little differentiation between band profiles of treated samples and control group. 
We concluded that the basis of interactions between plants, like allelopathy, may be related with genotoxic effects of EO.\nAssistant: There is sodium dodecyl sulphate, 4aα,7β,7aα-nepetalactone, elemol, germacrene, polyacrylamide, and SDS."} {"text":"User: Does the following text contain mentions of chemicals?Please return matches\nIn cotranslational translocation, the ribosome funnel and the channel of the protein translocation complex SecYEG are aligned. For the nascent chain to enter the channel immediately after synthesis, a yet unidentified signal triggers displacement of the SecYEG sealing plug from the pore. Here we show that ribosome binding to the resting SecYEG channel triggers this conformational transition. The purified and reconstituted SecYEG channel opens to form a large ion-conducting channel which has the conductivity of the plug deletion mutant. The number of ion conducting channels inserted into the planar bilayer per fusion event roughly equals the number of SecYEG channels counted by fluorescence correlation spectroscopy in a single proteoliposome. Thus, the open probability of the channel must be close to unity. To prevent the otherwise lethal proton leak, a closed post-translational conformation of the SecYEG complex bound to a ribosome must exist.\nAssistant: I found no match."}", "/scratch/micpie/export/chemdner/train_0-1.jsonl": "{"text":"User: Does the following text contain mentions of chemical compounds? Can you output matches?\nDPP6 as a candidate gene for neuroleptic-induced tardive dyskinesia.\nAssistant: I found no match."} {"text":"User: Does the following text contain mentions of chemical compounds? 
Can you output matches?\nGain-of-function mutations in transient receptor potential C6 (TRPC6) activate extracellular-signal-regulated kinases Erk1\/2.\nAssistant: I found no match."}", "/scratch/micpie/export/sr_mmp_tox21/test_0-10.jsonl": "{"text":"User: Can you generate the SMILES of a molecule that is not toxic in the Mitochondrial membrane potential assay?\nAssistant: Of course, here you go: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"User: Can you generate the SMILES of a molecule that is not toxic in the Mitochondrial membrane potential assay?\nAssistant: Of course, here you go: CC(C)CCCCCOC(=O)c1ccccc1C(=O)OCCCCCC(C)C"}", "/scratch/micpie/export/sr_mmp_tox21/valid_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the canonical SMILES Cc1nc(C)c(C)nc1C is toxic in the SR-Mitochondrial membrane potential assay?\nAssistant: No, this molecule is not toxic in the SR-Mitochondrial membrane potential assay."} {"text":"User: Can you figure out if the molecule with the InChI InChI=1S\/C8H10N4O2\/c1-2-3-12-6-5(9-4-10-6)7(13)11-8(12)14\/h4H,2-3H2,1H3,(H,9,10)(H,11,13,14) is toxic in the SR-MMP assay?\nAssistant: No, this molecule is not toxic in the SR-MMP assay."}", "/scratch/micpie/export/sr_mmp_tox21/test_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the SR-MMP assay?\nConstraint: You must select none, one or more options from A or B without using any other words.\nOptions:\nA CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C\nB C=C(Cl)CCl\nAnswer: A, B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the SR-MMP assay?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any additional words.\nOptions:\n1 CCC)CCCCCOC=O)cccccc6C=O)OCCCCCCC)C\n2 NcccCCl)=CCl)Cl)))cSN)=O)=O))cc6SN)=O)=O\n3 OccI)ccI)ccccnc%106\nAnswer: 1, 2, 3"}", "/scratch/micpie/export/sr_mmp_tox21/train_0-8.jsonl": "{"text":"User: Can you estimate 
if the molecule with the DeepSMILES CCOccccncSN)=O)=O))sc5c9 is toxic in the Mitochondrial membrane potential assay?\nAssistant: No, this molecule is not toxic in the Mitochondrial membrane potential assay."} {"text":"User: Can you tell me if the molecule with the DeepSMILES CCC)NCC)C is toxic in the Mitochondrial membrane potential assay?\nAssistant: No, this molecule is not toxic in the Mitochondrial membrane potential assay."}", "/scratch/micpie/export/sr_mmp_tox21/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Mitochondrial membrane potential assay.\nMolecule canonical SMILES: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-Mitochondrial membrane potential assay.\nSELFIES: [C][C][Branch1][C][C][C][C][C][C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][O][C][C][C][C][C][C][Branch1][C][C][C]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"}", "/scratch/micpie/export/sr_mmp_tox21/valid_0-9.jsonl": "{"text":"User: Is the molecule with the SMILES Cc1nc(C)c(C)nc1C toxic in the SR-Mitochondrial membrane potential assay?\nAssistant: No, it is not toxic in the SR-Mitochondrial membrane potential assay."} {"text":"User: Is the molecule with the SELFIES [C][C][C][N][C][=Branch1][C][=O][NH1][C][=Branch1][C][=O][C][NH1][C][=N][C][=Ring1][Branch1][Ring1][O] toxic in the SR-Mitochondrial membrane potential assay?\nAssistant: No, it is not toxic in the SR-Mitochondrial membrane potential assay."}", "/scratch/micpie/export/sr_mmp_tox21/test_0-1.jsonl": "{"text":"The molecule with the InChI 
InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20) is not showing SR-MMP toxicity."} {"text":"The molecule with the SELFIES representation of [C][C][Branch1][C][C][C][C][C][C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][O][C][C][C][C][C][C][Branch1][C][C][C] is not showing SR-MMP toxicity."}", "/scratch/micpie/export/sr_mmp_tox21/valid_0-0.jsonl": "{"text":"The molecule with the canonical SMILES Cc1nc(C)c(C)nc1C is not toxic in the SR-Mitochondrial membrane potential assay."} {"text":"The molecule with the SMILES representation of CCCn1c(=O)[nH]c(=O)c2[nH]cnc21 is not toxic in the SR-Mitochondrial membrane potential assay."}", "/scratch/micpie/export/sr_mmp_tox21/test_0-2.jsonl": "{"text":"Based on the InChI representation InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20), the molecule has no Mitochondrial membrane potential toxicity features."} {"text":"Based on the InChI representation InChI=1S\/C24H38O4\/c1-19(2)13-7-5-11-17-27-23(25)21-15-9-10-16-22(21)24(26)28-18-12-6-8-14-20(3)4\/h9-10,15-16,19-20H,5-8,11-14,17-18H2,1-4H3, the molecule has no SR-MMP toxicity features."}", "/scratch/micpie/export/sr_mmp_tox21/valid_0-10.jsonl": "{"text":"User: Can you give me the SMILES of a molecule that is not toxic in the SR-Mitochondrial membrane potential assay?\nAssistant: Of course, here you go: Cc1nc(C)c(C)nc1C"} {"text":"User: Can you generate the DeepSMILES of a molecule that is not toxic in the SR-MMP assay?\nAssistant: Of course, here you go: CCCnc=O)[nH]c=O)c[nH]cnc59"}", "/scratch/micpie/export/sr_mmp_tox21/train_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Mitochondrial membrane potential assay.\nMolecule SELFIES: 
[C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the Mitochondrial membrane potential assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-MMP assay.\nMolecule SELFIES: [C][C][Branch1][C][C][N][C][Branch1][C][C][C]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the SR-MMP assay."}", "/scratch/micpie/export/sr_mmp_tox21/valid_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-Mitochondrial membrane potential assay.\nMolecule SELFIES: [C][C][=N][C][Branch1][C][C][=C][Branch1][C][C][N][=C][Ring1][Branch2][C]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the SR-Mitochondrial membrane potential assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Mitochondrial membrane potential assay.\nDeepSMILES: CCCnc=O)[nH]c=O)c[nH]cnc59\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the Mitochondrial membrane potential assay."}", "/scratch/micpie/export/sr_mmp_tox21/test_0-9.jsonl": "{"text":"User: Is the molecule with the DeepSMILES CCCNCC))CCC))C=O)NccC)cccc6C toxic in the Mitochondrial membrane potential assay?\nAssistant: No, it is not toxic in the Mitochondrial membrane potential assay."} {"text":"User: Is the molecule with the DeepSMILES CCC)CCCCCOC=O)cccccc6C=O)OCCCCCCC)C toxic in the Mitochondrial membrane potential assay?\nAssistant: No, it is not toxic in the Mitochondrial membrane potential assay."}", "/scratch/micpie/export/sr_mmp_tox21/test_0-0.jsonl": "{"text":"The molecule with the InChI representation of 
InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20) is not toxic in the SR-Mitochondrial membrane potential assay."} {"text":"The molecule with the canonical SMILES CC(C)CCCCCOC(=O)c1ccccc1C(=O)OCCCCCC(C)C is not toxic in the Mitochondrial membrane potential assay."}", "/scratch/micpie/export/sr_mmp_tox21/valid_0-7.jsonl": "{"text":"Task: Please give me a SELFIES based on the text description below.\nDescription: A molecule that is toxic in the SR-MMP assay.\nResult: [C][C][=N][C][Branch1][C][C][=C][Branch1][C][C][N][=C][Ring1][Branch2][C]"} {"text":"Task: Please create a molecule DeepSMILES based on the description below.\nDescription: A molecule that is toxic in the SR-MMP assay.\nResult: CCCnc=O)[nH]c=O)c[nH]cnc59"}", "/scratch/micpie/export/sr_mmp_tox21/test_0-3.jsonl": "{"text":"The SELFIES [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C] is from a molecule that is not identified as toxic in the SR-MMP assay."} {"text":"The SELFIES [C][C][Branch1][C][C][C][C][C][C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][O][C][C][C][C][C][C][Branch1][C][C][C] represents a molecule that is not identified as toxic in the SR-Mitochondrial membrane potential assay."}", "/scratch/micpie/export/sr_mmp_tox21/valid_0-11.jsonl": "{"text":"User: I'm looking for the SELFIES of a molecule that is not toxic in the SR-MMP assay?\nAssistant: This is a molecule that is not toxic in the SR-MMP assay: [C][C][=N][C][Branch1][C][C][=C][Branch1][C][C][N][=C][Ring1][Branch2][C]"} {"text":"User: I'm looking for the DeepSMILES of a molecule that is not toxic in the SR-MMP assay?\nAssistant: This is a molecule that is not toxic in the SR-MMP assay: CCCnc=O)[nH]c=O)c[nH]cnc59"}", "/scratch/micpie/export/sr_mmp_tox21/train_0-0.jsonl": "{"text":"The molecule with the DeepSMILES CCOccccncSN)=O)=O))sc5c9 is not 
toxic in the SR-Mitochondrial membrane potential assay."} {"text":"The molecule with the SMILES representation of CC(C)NC(C)C is not toxic in the SR-MMP assay."}", "/scratch/micpie/export/sr_mmp_tox21/test_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Mitochondrial membrane potential assay.\nMolecule SELFIES: [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C]\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the Mitochondrial membrane potential assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-MMP assay.\nMolecule canonical SMILES: CC(C)CCCCCOC(=O)c1ccccc1C(=O)OCCCCCC(C)C\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the SR-MMP assay."}", "/scratch/micpie/export/sr_mmp_tox21/train_0-10.jsonl": "{"text":"User: Can you create the SELFIES of a molecule that is not toxic in the SR-Mitochondrial membrane potential assay?\nAssistant: Of course, here you go: [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N]"} {"text":"User: Can you give me the SELFIES of a molecule that is not toxic in the SR-MMP assay?\nAssistant: Yes, here you go: [C][C][Branch1][C][C][N][C][Branch1][C][C][C]"}", "/scratch/micpie/export/sr_mmp_tox21/train_0-3.jsonl": "{"text":"The canonical SMILES CCOc1ccc2nc(S(N)(=O)=O)sc2c1 represents a molecule that is not identified as toxic in the SR-MMP assay."} {"text":"The SELFIES [C][C][Branch1][C][C][N][C][Branch1][C][C][C] represents a molecule that is not identified as toxic in the SR-MMP assay."}", "/scratch/micpie/export/sr_mmp_tox21/train_0-12.jsonl": "{"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very interesting. 
Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be toxic in the Mitochondrial membrane potential assay.\nAssistant: Got it, this canonical SMILES is not toxic in the Mitochondrial membrane potential assay: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} {"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be toxic in the Mitochondrial membrane potential assay.\nAssistant: Got it, this canonical SMILES is not toxic in the Mitochondrial membrane potential assay: CC(C)NC(C)C"}", "/scratch/micpie/export/sr_mmp_tox21/test_0-13.jsonl": "{"text":"User: I want to generate a SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the SR-MMP assay.\nAssistant: Ok, this SMILES is not toxic in the SR-MMP assay: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"User: I want to generate a SMILES.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should not be toxic in the SR-MMP assay.\nAssistant: Ok, this SMILES is not toxic in the SR-MMP assay: CC(C)CCCCCOC(=O)c1ccccc1C(=O)OCCCCCC(C)C"}", "/scratch/micpie/export/sr_mmp_tox21/valid_0-2.jsonl": "{"text":"Based on the DeepSMILES CcncC)cC)nc6C, the molecule has no SR-MMP toxicity characteristics."} {"text":"Based on the SMILES CCCn1c(=O)[nH]c(=O)c2[nH]cnc21, the molecule has no SR-MMP toxicity features."}", "/scratch/micpie/export/sr_mmp_tox21/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES representation of CCOccccncSN)=O)=O))sc5c9 toxic in the SR-MMP assay?\nConstraint: Even if you are uncertain, you must pick either A or B without using any other words.\nOptions:\nA: False\nB: True\nAnswer: A"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES CC(C)NC(C)C toxic in the SR-Mitochondrial membrane potential assay?\nConstraint: Even if you are uncertain, you must pick either a or b without using any other words.\nOptions:\na. False\nb. True\nAnswer: a"}", "/scratch/micpie/export/sr_mmp_tox21/valid_0-1.jsonl": "{"text":"The molecule with the SMILES Cc1nc(C)c(C)nc1C is not showing SR-MMP toxicity."} {"text":"The molecule with the InChI InChI=1S\/C8H10N4O2\/c1-2-3-12-6-5(9-4-10-6)7(13)11-8(12)14\/h4H,2-3H2,1H3,(H,9,10)(H,11,13,14) is not showing SR-MMP toxicity."}", "/scratch/micpie/export/sr_mmp_tox21/valid_0-13.jsonl": "{"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the Mitochondrial membrane potential assay.\nAssistant: Got it, this DeepSMILES is not toxic in the Mitochondrial membrane potential assay: CcncC)cC)nc6C"} {"text":"User: I want to create a molecule canonical SMILES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the SR-MMP assay.\nAssistant: Got it, this canonical SMILES is not toxic in the SR-MMP assay: CCCn1c(=O)[nH]c(=O)c2[nH]cnc21"}", "/scratch/micpie/export/sr_mmp_tox21/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-Mitochondrial membrane potential assay.\ncanonical SMILES: Cc1nc(C)c(C)nc1C\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-MMP assay.\nMolecule SMILES: CCCn1c(=O)[nH]c(=O)c2[nH]cnc21\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"}", "/scratch/micpie/export/sr_mmp_tox21/train_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the SR-Mitochondrial membrane potential assay?\nConstraint: You must select none, one or more options from A, B, C, or D without using any other words.\nOptions:\nA: CCOccccncSN)=O)=O))sc5c9\nB: CCCC)CC=O)NC=O)C6\nC: CCCCOCCOC=O)cccccc6C=O)OCCOCCCC\nD: CCC)C)cccccO)c6\nAnswer: A, B, C"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the SR-MMP assay?\nConstraint: You must select none, one or more options from a, b, c, d, or e without using any other words.\nOptions:\na: InChI=1S\/C8H14O4\/c1-5-4-8(11-6(2)9)12-7(3)10-5\/h5,7-8H,4H2,1-3H3\nb: InChI=1S\/C7H10N\/c1-2-8-6-4-3-5-7-8\/h3-7H,2H2,1H3\/q+1\nc: InChI=1S\/C6H15N\/c1-5(2)7-6(3)4\/h5-7H,1-4H3\nd: InChI=1S\/C23H27FN4O3\/c1-14-17(23(30)28-9-2-3-19(29)22(28)25-14)8-12-27-10-6-15(7-11-27)21-18-5-4-16(24)13-20(18)31-26-21\/h4-5,13,15,19,29H,2-3,6-12H2,1H3\/t19-\/m1\/s1\ne: 
InChI=1S\/C6H6N2\/c1-6(5-8)3-2-4-7\/h1-3H2\nAnswer: a, b, c, d, e"}", "/scratch/micpie/export/sr_mmp_tox21/valid_0-4.jsonl": "{"text":"The canonical SMILES Cc1nc(C)c(C)nc1C is not toxic in the Mitochondrial membrane potential assay."} {"text":"The SELFIES [C][C][C][N][C][=Branch1][C][=O][NH1][C][=Branch1][C][=O][C][NH1][C][=N][C][=Ring1][Branch1][Ring1][O] is not toxic in the SR-Mitochondrial membrane potential assay."}", "/scratch/micpie/export/sr_mmp_tox21/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-MMP assay.\nMolecule SMILES: CCOc1ccc2nc(S(N)(=O)=O)sc2c1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-MMP assay.\nMolecule InChI: InChI=1S\/C6H15N\/c1-5(2)7-6(3)4\/h5-7H,1-4H3\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/sr_mmp_tox21/valid_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the Mitochondrial membrane potential assay?\nConstraint: You must select none, one or more options from a, b, or c without using any additional words.\nOptions:\n[a] Cc1nc(C)c(C)nc1C\n[b] NS(=O)(=O)c1ccc(NC(=O)CCC(=O)O)cc1\n[c] Cc1cn([C@H]2C[C@H](O)[C@@H](CO)O2)c(=O)[nH]c1=O\nAnswer: a, b, c"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the Mitochondrial membrane potential assay?\nConstraint: You must select none, one or more options from 1 or 2 without using any additional words.\nOptions:\n1. CCCn1c(=O)[nH]c(=O)c2[nH]cnc21\n2. 
CCCCC(=O)O\nAnswer: 1, 2"}", "/scratch/micpie/export/sr_mmp_tox21/valid_0-12.jsonl": "{"text":"User: I want to create a InChI.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be toxic in the SR-Mitochondrial membrane potential assay.\nAssistant: Ok, here you go, this InChI is not toxic in the SR-Mitochondrial membrane potential assay: InChI=1S\/C8H12N2\/c1-5-6(2)10-8(4)7(3)9-5\/h1-4H3"} {"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be toxic in the SR-Mitochondrial membrane potential assay.\nAssistant: Ok, this SMILES is not toxic in the SR-Mitochondrial membrane potential assay: CCCn1c(=O)[nH]c(=O)c2[nH]cnc21"}", "/scratch/micpie/export/sr_mmp_tox21/train_0-2.jsonl": "{"text":"Based on the SMILES representation CCOc1ccc2nc(S(N)(=O)=O)sc2c1, the molecule has no SR-MMP toxicity characteristics."} {"text":"Based on the SELFIES representation [C][C][Branch1][C][C][N][C][Branch1][C][C][C], the molecule has no SR-MMP toxicity characteristics."}", "/scratch/micpie/export/sr_mmp_tox21/test_0-11.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that is not toxic in the Mitochondrial membrane potential assay?\nAssistant: This is a molecule that is not toxic in the Mitochondrial membrane potential assay: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"User: I'm looking for the canonical SMILES of a molecule that is not toxic in the SR-MMP assay?\nAssistant: This is a molecule that is not toxic in the SR-MMP assay: CC(C)CCCCCOC(=O)c1ccccc1C(=O)OCCCCCC(C)C"}", "/scratch/micpie/export/sr_mmp_tox21/train_0-7.jsonl": "{"text":"Task: Please generate a molecule SELFIES based on the text description below.\nDescription: A molecule that is toxic in the SR-Mitochondrial membrane potential assay.\nResult: 
[C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N]"} {"text":"Task: Please generate a molecule canonical SMILES based on the text description below.\nDescription: A molecule that is toxic in the SR-MMP assay.\nResult: CC(C)NC(C)C"}", "/scratch/micpie/export/sr_mmp_tox21/train_0-11.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that is not toxic in the Mitochondrial membrane potential assay?\nAssistant: This is a molecule that is not toxic in the Mitochondrial membrane potential assay: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} {"text":"User: I'm searching for the DeepSMILES of a molecule that is not toxic in the SR-MMP assay?\nAssistant: This is a molecule that is not toxic in the SR-MMP assay: CCC)NCC)C"}", "/scratch/micpie/export/sr_mmp_tox21/train_0-1.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13) is not showing SR-MMP toxicity."} {"text":"The molecule with the SELFIES [C][C][Branch1][C][C][N][C][Branch1][C][C][C] is not showing SR-MMP toxicity."}", "/scratch/micpie/export/sr_mmp_tox21/train_0-13.jsonl": "{"text":"User: I want to come up with a InChI.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the SR-MMP assay.\nAssistant: Understood, this InChI is not toxic in the SR-MMP assay: InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13)"} {"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the SR-MMP assay.\nAssistant: Understood, this DeepSMILES is not toxic in the SR-MMP assay: CCC)NCC)C"}", "/scratch/micpie/export/sr_mmp_tox21/train_0-4.jsonl": "{"text":"The InChI InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13) is not toxic in the SR-MMP assay."} {"text":"The InChI InChI=1S\/C6H15N\/c1-5(2)7-6(3)4\/h5-7H,1-4H3 is not toxic in the SR-MMP assay."}", "/scratch/micpie/export/sr_mmp_tox21/test_0-7.jsonl": "{"text":"Task: Please give me a canonical SMILES based on the text description below.\nDescription: A molecule that is toxic in the Mitochondrial membrane potential assay.\nResult: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"Task: Please give me a DeepSMILES based on the text description below.\nDescription: A molecule that is toxic in the SR-Mitochondrial membrane potential assay.\nResult: CCC)CCCCCOC=O)cccccc6C=O)OCCCCCCC)C"}", "/scratch/micpie/export/sr_mmp_tox21/train_0-9.jsonl": "{"text":"User: Is the molecule with the DeepSMILES CCOccccncSN)=O)=O))sc5c9 toxic in the Mitochondrial membrane potential assay?\nAssistant: No, it is not toxic in the Mitochondrial membrane potential assay."} {"text":"User: Is the molecule with the SMILES CC(C)NC(C)C toxic in the Mitochondrial membrane potential assay?\nAssistant: No, it is not toxic in the Mitochondrial membrane potential assay."}", "/scratch/micpie/export/sr_mmp_tox21/valid_0-3.jsonl": "{"text":"The InChI InChI=1S\/C8H12N2\/c1-5-6(2)10-8(4)7(3)9-5\/h1-4H3 represents a molecule that is not identified as toxic in the SR-Mitochondrial membrane potential assay."} {"text":"The DeepSMILES CCCnc=O)[nH]c=O)c[nH]cnc59 represents a molecule that is not identified as toxic in the SR-Mitochondrial membrane potential assay."}", "/scratch/micpie/export/sr_mmp_tox21/test_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the canonical SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C is 
toxic in the SR-Mitochondrial membrane potential assay?\nAssistant: No, this molecule is not toxic in the SR-Mitochondrial membrane potential assay."} {"text":"User: Can you estimate if the molecule with the SMILES CC(C)CCCCCOC(=O)c1ccccc1C(=O)OCCCCCC(C)C is toxic in the SR-Mitochondrial membrane potential assay?\nAssistant: No, this molecule is not toxic in the SR-Mitochondrial membrane potential assay."}", "/scratch/micpie/export/sr_mmp_tox21/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES CCCNCC))CCC))C=O)NccC)cccc6C toxic in the SR-Mitochondrial membrane potential assay?\nConstraint: Even if you are not sure, you must pick either a or b without using any additional words.\nOptions:\n(a) True\n(b) False\nAnswer: b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES CCC)CCCCCOC=O)cccccc6C=O)OCCCCCCC)C toxic in the Mitochondrial membrane potential assay?\nConstraint: Even if you are uncertain, you must pick either A or B without using any additional words.\nOptions:\nA.) False\nB.) 
True\nAnswer: A"}", "/scratch/micpie/export/sr_mmp_tox21/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [C][C][=N][C][Branch1][C][C][=C][Branch1][C][C][N][=C][Ring1][Branch2][C] toxic in the Mitochondrial membrane potential assay?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any other words.\nOptions:\n[1] True\n[2] False\nAnswer: 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES representation of CCCn1c(=O)[nH]c(=O)c2[nH]cnc21 toxic in the SR-Mitochondrial membrane potential assay?\nConstraint: Even if you are uncertain, you must pick either A or B without using any additional words.\nOptions:\nA) False\nB) True\nAnswer: A"}", "/scratch/micpie/export/sr_mmp_tox21/test_0-4.jsonl": "{"text":"The molecule SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C is not toxic in the Mitochondrial membrane potential assay."} {"text":"The InChI InChI=1S\/C24H38O4\/c1-19(2)13-7-5-11-17-27-23(25)21-15-9-10-16-22(21)24(26)28-18-12-6-8-14-20(3)4\/h9-10,15-16,19-20H,5-8,11-14,17-18H2,1-4H3 is not toxic in the SR-Mitochondrial membrane potential assay."}", "/scratch/micpie/export/sr_mmp_tox21/test_0-12.jsonl": "{"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very curious. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be toxic in the SR-Mitochondrial membrane potential assay.\nAssistant: Ok, here you go, this DeepSMILES is not toxic in the SR-Mitochondrial membrane potential assay: CCCNCC))CCC))C=O)NccC)cccc6C"} {"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very curious. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should not be toxic in the SR-Mitochondrial membrane potential assay.\nAssistant: Ok, here you go, this SMILES is not toxic in the SR-Mitochondrial membrane potential assay: CC(C)CCCCCOC(=O)c1ccccc1C(=O)OCCCCCC(C)C"}", "/scratch/micpie/export/MUV_832/valid_0-0.jsonl": "{"text":"The molecular species with the DeepSMILES representation of Cnccnc5SCC=O)NccccOcccccc6)))))))cc6 is not an inhibitor of the Cathepsin G protease."} {"text":"The chemical with the DeepSMILES representation of CCNCC))S=O)=O)cccc-cnncSCC=O)Ncccccc6)OCO5))))))))))))o5)))))cc6 is not an inhibitor of the Cathepsin G protease."}", "/scratch/micpie/export/MUV_832/test_0-0.jsonl": "{"text":"The molecule with the SMILES representation of COc1ccccc1Nc1nc2c(s1)CCC2 is not an inhibitor of the Cathepsin G protease."} {"text":"The compound with the SMILES CC1=C(C)C(=O)C(CCCCC#CCCCC#CCO)=C(C)C1=O is not an inhibitor of the Cathepsin G protease."}", "/scratch/micpie/export/MUV_832/train_0-0.jsonl": "{"text":"The molecular species with the SELFIES [C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][Ring1][=Branch1][C][C][=Branch1][C][=O][N][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][Ring1][=C][C][C][C][C][N][Ring1][Branch1][Ring2][Ring1][Ring2] is not an inhibitor of the Cathepsin G protease."} {"text":"The chemical with the SMILES representation of CS(=O)(=O)N(CC(=O)NCCSCc1c(F)cccc1Cl)c1ccc2c(c1)OCO2 is not an inhibitor of the Cathepsin G protease."}", "/scratch/micpie/export/rdkit_features/valid_14-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C is 4.53."} {"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O is 2.72."}", "/scratch/micpie/export/rdkit_features/train_104-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the 
chemical with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_5-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C is 6."} {"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+] is 4."}", "/scratch/micpie/export/rdkit_features/test_22-16.jsonl": "{"text":"Question: What is the count of rings of the chemical with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O?\nAnswer: 3"} {"text":"Question: What is the number of rings of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_6-8.jsonl": "{"text":"The count of basic groups of the molecule with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+] is 2."} {"text":"The number of basic groups of the molecule with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F is 0."}", "/scratch/micpie/export/rdkit_features/train_108-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2?\nAnswer: 11"} {"text":"Question: What is the aromatic bond count of the chemical with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_107-5.jsonl": "{"text":"The number of rotatable bonds of the chemical with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C is 6."} {"text":"The rotatable bond count of the compound with SMILES CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C is 8."}", 
"/scratch/micpie/export/rdkit_features/test_6-4.jsonl": "{"text":"The count of rings of the chemical with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3 is 3."} {"text":"The number of rings of the chemical with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C is 4."}", "/scratch/micpie/export/rdkit_features/train_32-4.jsonl": "{"text":"The count of rings of the chemical with SMILES C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C is 2."} {"text":"The number of rings of the molecule with SMILES C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2 is 2."}", "/scratch/micpie/export/rdkit_features/test_104-5.jsonl": "{"text":"The number of rotatable bonds of the chemical with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2 is 1."} {"text":"The number of rotatable bonds of the chemical with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3 is 5."}", "/scratch/micpie/export/rdkit_features/valid_105-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F?\nAnswer: 5"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_30-17.jsonl": "{"text":"Question: What is the rotatable bond count of the chemical with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3?\nAnswer: 6"} {"text":"Question: What is the number of rotatable bonds of the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_17-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the chemical with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S is 4."} {"text":"The count of hydrogen bond acceptors of the molecule with SMILES 
Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 9."}", "/scratch/micpie/export/rdkit_features/valid_9-8.jsonl": "{"text":"The number of basic groups of the molecule with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2 is 1."} {"text":"The basic group count of the molecule with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4 is 0."}", "/scratch/micpie/export/rdkit_features/train_29-5.jsonl": "{"text":"The number of rotatable bonds of the compound with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F is 4."} {"text":"The rotatable bond count of the compound with SMILES CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C is 6."}", "/scratch/micpie/export/rdkit_features/test_104-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2 is 42.67."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3 is 49.50."}", "/scratch/micpie/export/rdkit_features/test_107-16.jsonl": "{"text":"Question: What is the number of rings of the molecule with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC?\nAnswer: 2"} {"text":"Question: What is the ring count of the chemical with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_11-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl?\nAnswer: 58.75"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C?\nAnswer: 50.18"}", "/scratch/micpie/export/rdkit_features/valid_7-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES 
COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the compound with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_29-23.jsonl": "{"text":"User: I want to design a chemical with a number of hydrogen bond donor sites of 2, a count of hydrogen bond acceptors of 2 and a Wildman-Crippen LogP value computed using RDKit of 3.75.\nAssistant: That is a very interesting question, do you have some additional conditions that help me narrow down the search?\nUser: Yeah, I want the chemical formula to be C20H26FN3O2.\nAssistant: In that scenario, I suggest the chemical with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F."} {"text":"User: I want to design a molecule with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptor sites of 2 and a LogP value computed using the Wildman-Crippen method of 2.52.\nAssistant: Cool, do you have some additional requirements that help me narrow down the search?\nUser: Yea, I want the molecular formula to be C22H36N3O2+.\nAssistant: I suggest the molecule with SMILES CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C."}", "/scratch/micpie/export/rdkit_features/valid_120-4.jsonl": "{"text":"The ring count of the chemical with SMILES c1cc(oc1)CNC(=O)CSc2nc3ccc(cc3s2)n4c(c5c(c4O)[C@@H]6C[C@@H]5C=C6)O is 6."} {"text":"The count of rings of the compound with SMILES CN(c1nc2cc(ccc2s1)Cl)C(=O)CSc3ccc(c(c3)F)F is 3."}", "/scratch/micpie/export/rdkit_features/valid_106-22.jsonl": "{"text":"User: I want to make a compound with a number of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 5.\nAssistant: That is a very interesting question, do you have some additional requirements I should take into account?\nUser: Yeah, I want the number of heteroatoms to be 8.\nAssistant: In that scenario, I suggest the compound with SMILES 
c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl."} {"text":"User: I want to design a chemical with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptors of 3.\nAssistant: Nice, do you have some additional ?\nUser: I want the number of heteroatoms to be 11.\nAssistant: In that situation, I the chemical with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F."}", "/scratch/micpie/export/rdkit_features/valid_101-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_11-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl?\nAnswer: C20H28ClNO2"} {"text":"Question: What is the formula of the chemical with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C?\nAnswer: C17H21Cl2NO"}", "/scratch/micpie/export/rdkit_features/valid_118-19.jsonl": "{"text":"Question: What is the acid group count of the chemical with SMILES CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O?\nAnswer: 0"} {"text":"Question: What is the acid group count of the compound with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_104-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3 is 9."} {"text":"The number of heteroatoms of the compound with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C is 5."}", "/scratch/micpie/export/rdkit_features/train_10-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the chemical with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4 is 5."} 
{"text":"The count of hydrogen bond acceptors of the molecule with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl is 2."}", "/scratch/micpie/export/rdkit_features/train_6-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the compound with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4?\nAnswer: 9"} {"text":"Question: What is the count of heteroatoms of the compound with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_13-20.jsonl": "{"text":"Question: What is the basic group count of the chemical with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the chemical with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_120-1.jsonl": "{"text":"The number of hydrogen bond donors of the chemical with SMILES Cc1ccc(cc1)NC(=O)CSc2nnc(n2C)C=NC(=O)CCc3ccccc3Cl is 1."} {"text":"The count of hydrogen bond donors of the chemical with SMILES Cc1c(cnn1c2ccccc2)N[C@@H](C)C(=O)Nc3cccc(c3Cl)Cl is 2."}", "/scratch/micpie/export/rdkit_features/test_114-8.jsonl": "{"text":"The number of basic groups of the molecule with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C is 0."} {"text":"The count of basic groups of the compound with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5 is 0."}", "/scratch/micpie/export/rdkit_features/test_118-0.jsonl": "{"text":"The chemical formula of the compound with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O is C14H23FNO+."} {"text":"The molecular formula of the compound with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N is C14H12FN5O5."}", "/scratch/micpie/export/rdkit_features/test_14-20.jsonl": "{"text":"Question: What is the number of basic groups of the chemical with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3?\nAnswer: 0"} {"text":"Question: 
What is the number of basic groups of the compound with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_107-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC is 2.24."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC is 2.28."}", "/scratch/micpie/export/rdkit_features/valid_12-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl is 5."} {"text":"The number of rotatable bonds of the compound with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3 is 7."}", "/scratch/micpie/export/rdkit_features/test_20-23.jsonl": "{"text":"User: I want to synthesize a compound with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 2 and a Wildman-Crippen LogP value computed using RDKit of 3.95.\nAssistant: That's interesting, do you have some additional ?\nUser: I want the molecular formula to be C19H24BrFN2O2.\nAssistant: I advise the compound with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br."} {"text":"User: I want to design a molecule with a count of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 6 and a Wildman-Crippen LogP value computed using RDKit of 1.04.\nAssistant: That is a very interesting question, do you have some additional limitations I should consider?\nUser: Yes, I want the molecular formula to be C17H15ClNO6S-.\nAssistant: In that situation, I propose the molecule with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-]."}", "/scratch/micpie/export/rdkit_features/train_31-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F?\nAnswer: 5"} 
{"text":"Question: What is the rotatable bond count of the chemical with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_28-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br is 1."} {"text":"The number of hydrogen bond acceptors of the molecule with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C is 2."}", "/scratch/micpie/export/rdkit_features/train_19-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC?\nAnswer: C23H31N3O4"} {"text":"Question: What is the formula of the molecule with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3?\nAnswer: C25H33N3O3"}", "/scratch/micpie/export/rdkit_features/train_111-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the compound with SMILES CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C is 8."} {"text":"The number of hydrogen bond acceptor sites of the compound with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC is 10."}", "/scratch/micpie/export/rdkit_features/valid_107-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C is 2.20."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C is 2.25."}", "/scratch/micpie/export/rdkit_features/train_108-11.jsonl": "{"text":"User: I want to make a compound with a chemical formula of C17H21F3N6O2S.\nAssistant: That's interesting, do you have some additional that I should consider?\nUser: Yea, I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 7.\nAssistant: In that situation, I the compound with SMILES 
CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2."} {"text":"User: I want to create a compound with a formula of C23H32N4O4.\nAssistant: Interesting, do you have some additional constraints I should take into account?\nUser: Yep, I want the count of hydrogen bond donors to be 3, the number of hydrogen bond acceptors to be 4.\nAssistant: Then, I propose the compound with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3."}", "/scratch/micpie/export/rdkit_features/train_27-23.jsonl": "{"text":"User: I want to create a chemical with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 3 and a LogP value computed using the Wildman-Crippen method of 3.15.\nAssistant: Cool, do you have some additional requirements that I should consider?\nUser: I want the chemical formula to be C21H31N4O+.\nAssistant: In that case, I recommend the chemical with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C."} {"text":"User: I want to synthesize a molecule with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 4 and a Wildman-Crippen LogP value computed using RDKit of 3.34.\nAssistant: That's interesting, do you have some additional conditions?\nUser: Yeah, I want the chemical formula to be C21H23N3O3.\nAssistant: In that case, I suggest the molecule with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4."}", "/scratch/micpie/export/rdkit_features/valid_101-6.jsonl": "{"text":"The count of aromatic bonds of the compound with SMILES COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3 is 16."} {"text":"The number of aromatic bonds of the chemical with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1 is 18."}", "/scratch/micpie/export/rdkit_features/valid_116-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the molecule with SMILES CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 6"} {"text":"Question: What is the count of 
heteroatoms of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_107-23.jsonl": "{"text":"User: I want to analyze a molecule with a count of hydrogen bond donors of 2, a count of hydrogen bond acceptors of 6 and a LogP value computed using the Wildman-Crippen method of 2.24.\nAssistant: Interesting, do you have some additional limitations I should consider?\nUser: Indeed, I want the chemical formula to be C22H27N3O5S.\nAssistant: In that case, I propose the molecule with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC."} {"text":"User: I want to analyze a molecule with a number of hydrogen bond donors of 3, a number of hydrogen bond acceptor sites of 7 and a LogP value computed using the Wildman-Crippen method of 2.28.\nAssistant: Nice, do you have some additional conditions I should consider?\nUser: I want the formula to be C23H29N3O6.\nAssistant: In that case, I the molecule with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC."}", "/scratch/micpie/export/rdkit_features/test_112-11.jsonl": "{"text":"User: I want to analyze a molecule with a molecular formula of C21H26N6O3S.\nAssistant: That's interesting, do you have some additional requirements?\nUser: Indeed, I want the number of hydrogen bond donor sites to be 0, the number of hydrogen bond acceptor sites to be 9.\nAssistant: I propose the molecule with SMILES C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC."} {"text":"User: I want to analyze a compound with a molecular formula of C22H27N6O2S+.\nAssistant: That is a very interesting question, do you have some additional that I should consider?\nUser: Yes, I want the number of hydrogen bond donors to be 3, the count of hydrogen bond acceptors to be 5.\nAssistant: In that scenario, I suggest the compound with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N."}", 
"/scratch/micpie/export/rdkit_features/valid_24-0.jsonl": "{"text":"The chemical formula of the compound with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl is C18H13ClN7O2-."} {"text":"The chemical formula of the chemical with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C is C21H39N6O+."}", "/scratch/micpie/export/rdkit_features/test_10-15.jsonl": "{"text":"Question: What is the heteroatom count of the molecule with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4?\nAnswer: 6"} {"text":"Question: What is the count of heteroatoms of the chemical with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_18-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O?\nAnswer: 7"} {"text":"Question: What is the count of rotatable bonds of the molecule with SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_104-7.jsonl": "{"text":"The acid group count of the molecule with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3 is 0."} {"text":"The count of acid groups of the molecule with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C is 0."}", "/scratch/micpie/export/rdkit_features/valid_103-7.jsonl": "{"text":"The number of acid groups of the molecule with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO is 0."} {"text":"The number of acid groups of the chemical with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N is 0."}", "/scratch/micpie/export/rdkit_features/train_15-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the compound with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC is 50.59."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES 
Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C is 55.59."}", "/scratch/micpie/export/rdkit_features/valid_1-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_31-20.jsonl": "{"text":"Question: What is the count of basic groups of the chemical with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the molecule with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_22-19.jsonl": "{"text":"Question: What is the count of acid groups of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_1-8.jsonl": "{"text":"The number of basic groups of the molecule with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4 is 1."} {"text":"The number of basic groups of the compound with SMILES c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl is 0."}", "/scratch/micpie/export/rdkit_features/test_15-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O?\nAnswer: 38.77"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2?\nAnswer: 52.50"}", "/scratch/micpie/export/rdkit_features/train_114-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the compound 
with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_115-4.jsonl": "{"text":"The number of rings of the compound with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4 is 4."} {"text":"The count of rings of the chemical with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F is 6."}", "/scratch/micpie/export/rdkit_features/test_8-11.jsonl": "{"text":"User: I want to make a chemical with a formula of C22H26N4O4.\nAssistant: Do you have some additional constraints that I should consider?\nUser: Yes, I want the count of hydrogen bond donors to be 0, the number of hydrogen bond acceptors to be 7.\nAssistant: In that situation, I advise the chemical with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C."} {"text":"User: I want to design a chemical with a molecular formula of C22H24F2N2O3.\nAssistant: Interesting, do you have some additional limitations I should take into account?\nUser: Indeed, I want the count of hydrogen bond donors to be 0, the number of hydrogen bond acceptors to be 4.\nAssistant: In that scenario, I propose the chemical with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F."}", "/scratch/micpie/export/rdkit_features/test_2-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the molecule with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C?\nAnswer: 8"} {"text":"Question: What is the number of heteroatoms of the chemical with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_1-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES 
Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4?\nAnswer: 64.23"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC?\nAnswer: 63.72"}", "/scratch/micpie/export/rdkit_features/valid_105-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_116-18.jsonl": "{"text":"Question: What is the aromatic bond count of the compound with SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1?\nAnswer: 21"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl?\nAnswer: 16"}", "/scratch/micpie/export/rdkit_features/train_15-5.jsonl": "{"text":"The number of rotatable bonds of the compound with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC is 6."} {"text":"The rotatable bond count of the chemical with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C is 6."}", "/scratch/micpie/export/rdkit_features/test_119-11.jsonl": "{"text":"User: I want to synthesize a chemical with a formula of C15H16FN5O3.\nAssistant: Nice, do you have some additional conditions that help me narrow down the search?\nUser: Yeah, I want the number of hydrogen bond donor sites to be 2, the count of hydrogen bond acceptors to be 6.\nAssistant: In that scenario, I the chemical with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O."} {"text":"User: I want to synthesize a chemical with a chemical formula of C22H29BrN2O4S.\nAssistant: That is a very interesting question, do you have some additional conditions I should take into account?\nUser: Yea, I want the number of 
hydrogen bond donor sites to be 1, the number of hydrogen bond acceptors to be 4.\nAssistant: In that case, I advise the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C."}", "/scratch/micpie/export/rdkit_features/valid_15-4.jsonl": "{"text":"The count of rings of the molecule with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N is 2."} {"text":"The count of rings of the molecule with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C is 1."}", "/scratch/micpie/export/rdkit_features/valid_31-12.jsonl": "{"text":"Question: What is the chemical formula of the compound with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC?\nAnswer: C24H27FN2O3"} {"text":"Question: What is the formula of the chemical with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3?\nAnswer: C20H27FN2O4"}", "/scratch/micpie/export/rdkit_features/test_103-12.jsonl": "{"text":"Question: What is the chemical formula of the compound with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1?\nAnswer: C22H16N2O2"} {"text":"Question: What is the chemical formula of the molecule with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: C14H21N5O4"}", "/scratch/micpie/export/rdkit_features/train_11-5.jsonl": "{"text":"The number of rotatable bonds of the compound with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl is 3."} {"text":"The rotatable bond count of the chemical with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C is 3."}", "/scratch/micpie/export/rdkit_features/train_101-0.jsonl": "{"text":"The formula of the compound with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3 is C18H18N4O2."} {"text":"The chemical formula of the molecule with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1 is C25H23N3O4."}", "/scratch/micpie/export/rdkit_features/valid_109-6.jsonl": "{"text":"The count of aromatic bonds of the molecule with SMILES 
C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl is 12."} {"text":"The number of aromatic bonds of the compound with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F is 17."}", "/scratch/micpie/export/rdkit_features/test_15-19.jsonl": "{"text":"Question: What is the acid group count of the compound with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the chemical with SMILES C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_22-7.jsonl": "{"text":"The acid group count of the molecule with SMILES C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O is 0."} {"text":"The acid group count of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4 is 0."}", "/scratch/micpie/export/rdkit_features/valid_111-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_120-8.jsonl": "{"text":"The count of basic groups of the compound with SMILES Cc1ccc(cc1)NC(=O)CSc2nnc(n2C)C=NC(=O)CCc3ccccc3Cl is 0."} {"text":"The number of basic groups of the molecule with SMILES Cc1c(cnn1c2ccccc2)N[C@@H](C)C(=O)Nc3cccc(c3Cl)Cl is 0."}", "/scratch/micpie/export/rdkit_features/valid_15-0.jsonl": "{"text":"The formula of the compound with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N is C18H20N4O."} {"text":"The chemical formula of the chemical with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C is C18H35N2O2+."}", "/scratch/micpie/export/rdkit_features/test_11-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES 
CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_118-17.jsonl": "{"text":"Question: What is the rotatable bond count of the chemical with SMILES CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O?\nAnswer: 6"} {"text":"Question: What is the count of rotatable bonds of the molecule with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_14-7.jsonl": "{"text":"The count of acid groups of the compound with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C is 0."} {"text":"The number of acid groups of the chemical with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O is 0."}", "/scratch/micpie/export/rdkit_features/valid_19-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the compound with SMILES C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC?\nAnswer: 7"} {"text":"Question: What is the count of heteroatoms of the chemical with SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/valid_114-5.jsonl": "{"text":"The rotatable bond count of the molecule with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl is 5."} {"text":"The number of rotatable bonds of the compound with SMILES C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4 is 8."}", "/scratch/micpie/export/rdkit_features/train_14-11.jsonl": "{"text":"User: I want to analyze a compound with a chemical formula of C20H30N4O.\nAssistant: Nice, do you have some additional requirements I should take into account?\nUser: Yes, I want the number of hydrogen bond donor sites to be 2, the number of hydrogen bond acceptors to be 3.\nAssistant: In that case, I the compound with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C."} 
{"text":"User: I want to create a molecule with a chemical formula of C14H14BrNO3.\nAssistant: That is a very interesting question, do you have some additional conditions I should take into account?\nUser: Yes, I want the number of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 4.\nAssistant: Then, I recommend the molecule with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O."}", "/scratch/micpie/export/rdkit_features/valid_106-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl?\nAnswer: 5"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_16-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the chemical with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C?\nAnswer: 4"} {"text":"Question: What is the count of heteroatoms of the chemical with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/train_103-12.jsonl": "{"text":"Question: What is the formula of the molecule with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1?\nAnswer: C18H14FNO2"} {"text":"Question: What is the chemical formula of the molecule with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: C14H21N5O4"}", "/scratch/micpie/export/rdkit_features/valid_31-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC is C24H27FN2O3."} {"text":"The chemical formula of the chemical with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3 is C20H27FN2O4."}", "/scratch/micpie/export/rdkit_features/test_26-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES 
c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3 is 0."} {"text":"The number of hydrogen bond donors of the compound with SMILES COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br is 2."}", "/scratch/micpie/export/rdkit_features/valid_21-23.jsonl": "{"text":"User: I want to synthesize a molecule with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 6 and a Wildman-Crippen LogP value computed using RDKit of 0.08.\nAssistant: Interesting, do you have some additional I should take into account?\nUser: Yes, I want the formula to be C18H26N2O5S.\nAssistant: Then, I recommend the molecule with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-]."} {"text":"User: I want to synthesize a molecule with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptor sites of 4 and a Wildman-Crippen LogP value computed using RDKit of 2.49.\nAssistant: Cool, do you have some additional conditions that I should consider?\nUser: Yes, I want the molecular formula to be C22H33N3O3.\nAssistant: In that situation, I recommend the molecule with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3."}", "/scratch/micpie/export/rdkit_features/test_8-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C is 0."} {"text":"The count of hydrogen bond donors of the molecule with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F is 0."}", "/scratch/micpie/export/rdkit_features/test_26-20.jsonl": "{"text":"Question: What is the number of basic groups of the compound with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3?\nAnswer: 0"} {"text":"Question: What is the basic group count of the molecule with SMILES COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_25-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES 
CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4?\nAnswer: 6"} {"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_118-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the chemical with SMILES Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F?\nAnswer: 3"} {"text":"Question: What is the count of heteroatoms of the chemical with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/test_33-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the chemical with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O?\nAnswer: 4"} {"text":"Question: What is the rotatable bond count of the molecule with SMILES Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_104-19.jsonl": "{"text":"Question: What is the acid group count of the chemical with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 0"} {"text":"Question: What is the acid group count of the molecule with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_106-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl is 38.66."} {"text":"The total sum of atomic polarizabilities of the molecule with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F is 58.52."}", "/scratch/micpie/export/rdkit_features/test_21-6.jsonl": "{"text":"The count of aromatic bonds of the molecule with SMILES Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-] is 12."} {"text":"The count of aromatic bonds of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O is 5."}", 
"/scratch/micpie/export/rdkit_features/test_108-16.jsonl": "{"text":"Question: What is the ring count of the chemical with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5?\nAnswer: 5"} {"text":"Question: What is the number of rings of the compound with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_11-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2?\nAnswer: 51.26"} {"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C?\nAnswer: 62.09"}", "/scratch/micpie/export/rdkit_features/valid_114-7.jsonl": "{"text":"The number of acid groups of the molecule with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl is 0."} {"text":"The count of acid groups of the chemical with SMILES C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4 is 0."}", "/scratch/micpie/export/rdkit_features/valid_32-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3 is 1."} {"text":"The number of hydrogen bond donor sites of the compound with SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O is 2."}", "/scratch/micpie/export/rdkit_features/valid_20-11.jsonl": "{"text":"User: I want to make a chemical with a chemical formula of C25H33N3O3.\nAssistant: Interesting, do you have some additional constraints I should take into account?\nUser: Yeah, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 3.\nAssistant: In that scenario, I the chemical with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3."} {"text":"User: I want to make a molecule with a molecular formula of C16H19ClNO6S-.\nAssistant: That is a very interesting question, do you 
have some additional conditions I should take into account?\nUser: Yep, I want the count of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 6.\nAssistant: In that scenario, I advise the molecule with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-]."}", "/scratch/micpie/export/rdkit_features/valid_19-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the molecule with SMILES C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC is 4."} {"text":"The number of hydrogen bond acceptors of the chemical with SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F is 5."}", "/scratch/micpie/export/rdkit_features/test_31-20.jsonl": "{"text":"Question: What is the count of basic groups of the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the molecule with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_30-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the molecule with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_2-4.jsonl": "{"text":"The count of rings of the molecule with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C is 3."} {"text":"The ring count of the compound with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4 is 4."}", "/scratch/micpie/export/rdkit_features/train_1-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the chemical with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4?\nAnswer: 7"} {"text":"Question: What is the rotatable bond count of the chemical with SMILES 
c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_32-16.jsonl": "{"text":"Question: What is the number of rings of the compound with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3?\nAnswer: 3"} {"text":"Question: What is the number of rings of the molecule with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_12-11.jsonl": "{"text":"User: I want to analyze a compound with a formula of C18H18FN3OS.\nAssistant: That is a very interesting question, do you have some additional conditions that help me narrow down the search?\nUser: Yeah, I want the number of hydrogen bond donor sites to be 2, the number of hydrogen bond acceptor sites to be 3.\nAssistant: I the compound with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C."} {"text":"User: I want to synthesize a molecule with a chemical formula of C21H29FN2O.\nAssistant: Interesting, do you have some additional limitations that help me narrow down the search?\nUser: Yea, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 2.\nAssistant: I recommend the molecule with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4."}", "/scratch/micpie/export/rdkit_features/train_107-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl is 68.15."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5 is 63.23."}", "/scratch/micpie/export/rdkit_features/train_120-12.jsonl": "{"text":"Question: What is the formula of the compound with SMILES Cc1ccc(cc1)NC(=O)CSc2nnc(n2C)C=NC(=O)CCc3ccccc3Cl?\nAnswer: C22H22ClN5O2S"} {"text":"Question: What is the molecular formula of the compound with SMILES Cc1c(cnn1c2ccccc2)N[C@@H](C)C(=O)Nc3cccc(c3Cl)Cl?\nAnswer: C19H18Cl2N4O"}", 
"/scratch/micpie/export/rdkit_features/train_33-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2?\nAnswer: C17H24F3N4O3+"} {"text":"Question: What is the molecular formula of the compound with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br?\nAnswer: C15H20BrN4O2S+"}", "/scratch/micpie/export/rdkit_features/train_116-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the chemical with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 1."} {"text":"The count of hydrogen bond donors of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br is 0."}", "/scratch/micpie/export/rdkit_features/valid_114-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl is C20H17Cl2N3O2S2."} {"text":"The chemical formula of the chemical with SMILES C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4 is C27H24ClN3O2."}", "/scratch/micpie/export/rdkit_features/valid_113-7.jsonl": "{"text":"The acid group count of the chemical with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 0."} {"text":"The number of acid groups of the compound with SMILES Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 0."}", "/scratch/micpie/export/rdkit_features/valid_25-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4?\nAnswer: 62.44"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3?\nAnswer: 58.87"}", "/scratch/micpie/export/rdkit_features/valid_110-3.jsonl": "{"text":"The heteroatom count of the compound with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O is 8."} {"text":"The count of heteroatoms of the 
compound with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C is 10."}", "/scratch/micpie/export/rdkit_features/test_103-1.jsonl": "{"text":"The number of hydrogen bond donors of the molecule with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1 is 2."} {"text":"The number of hydrogen bond donor sites of the compound with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 1."}", "/scratch/micpie/export/rdkit_features/test_117-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc5cc(ccc5c4)Br is C26H25BrN2O3."} {"text":"The molecular formula of the molecule with SMILES C[C@H](C(C)(C)OC)[NH2+]CCNc1ccccc1 is C14H25N2O+."}", "/scratch/micpie/export/rdkit_features/train_101-11.jsonl": "{"text":"User: I want to analyze a compound with a chemical formula of C18H18N4O2.\nAssistant: Nice, do you have some additional conditions I should take into account?\nUser: Yes, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 5.\nAssistant: Then, I recommend the compound with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3."} {"text":"User: I want to synthesize a chemical with a chemical formula of C25H23N3O4.\nAssistant: Cool, do you have some additional that help me narrow down the search?\nUser: I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 7.\nAssistant: In that case, I suggest the chemical with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1."}", "/scratch/micpie/export/rdkit_features/train_2-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6 is 7."} {"text":"The number of hydrogen bond acceptors of the chemical with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O is 5."}", "/scratch/micpie/export/rdkit_features/test_29-13.jsonl": "{"text":"Question: What is the number of hydrogen 
bond donor sites of the chemical with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4?\nAnswer: 2"} {"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_22-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O?\nAnswer: 3"} {"text":"Question: What is the ring count of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_118-11.jsonl": "{"text":"User: I want to design a molecule with a chemical formula of C14H23FNO+.\nAssistant: That's interesting, do you have some additional limitations?\nUser: Indeed, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 1.\nAssistant: In that case, I recommend the molecule with SMILES CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O."} {"text":"User: I want to analyze a chemical with a chemical formula of C15H20FN3O4.\nAssistant: That's interesting, do you have some additional limitations?\nUser: I want the count of hydrogen bond donors to be 3, the number of hydrogen bond acceptors to be 5.\nAssistant: I propose the chemical with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO."}", "/scratch/micpie/export/rdkit_features/train_109-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3 is 5."} {"text":"The count of hydrogen bond acceptors of the molecule with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C is 8."}", "/scratch/micpie/export/rdkit_features/train_28-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl is C14H9BrClNO2S."} {"text":"The molecular formula of the chemical with SMILES 
c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F is C20H26FN3O2."}", "/scratch/micpie/export/rdkit_features/train_120-11.jsonl": "{"text":"User: I want to make a compound with a chemical formula of C22H22ClN5O2S.\nAssistant: That's interesting, do you have some additional requirements I should take into account?\nUser: Yea, I want the number of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 6.\nAssistant: In that scenario, I the compound with SMILES Cc1ccc(cc1)NC(=O)CSc2nnc(n2C)C=NC(=O)CCc3ccccc3Cl."} {"text":"User: I want to design a chemical with a chemical formula of C19H18Cl2N4O.\nAssistant: Cool, do you have some additional limitations I should take into account?\nUser: Yes, I want the number of hydrogen bond donor sites to be 2, the number of hydrogen bond acceptors to be 4.\nAssistant: In that situation, I recommend the chemical with SMILES Cc1c(cnn1c2ccccc2)N[C@@H](C)C(=O)Nc3cccc(c3Cl)Cl."}", "/scratch/micpie/export/rdkit_features/test_105-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N is 5."} {"text":"The count of heteroatoms of the molecule with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O is 7."}", "/scratch/micpie/export/rdkit_features/test_107-4.jsonl": "{"text":"The count of rings of the molecule with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC is 2."} {"text":"The ring count of the chemical with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC is 3."}", "/scratch/micpie/export/rdkit_features/test_4-0.jsonl": "{"text":"The molecular formula of the compound with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4 is C23H20ClN3O3."} {"text":"The molecular formula of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3 is C23H29N2O3S+."}", "/scratch/micpie/export/rdkit_features/test_109-2.jsonl": "{"text":"The number of hydrogen bond 
acceptors of the compound with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl is 5."} {"text":"The number of hydrogen bond acceptors of the molecule with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N is 5."}", "/scratch/micpie/export/rdkit_features/train_14-23.jsonl": "{"text":"User: I want to design a chemical with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptors of 3 and a LogP value computed using the Wildman-Crippen method of 4.83.\nAssistant: That's interesting, do you have some additional requirements that I should consider?\nUser: Yes, I want the molecular formula to be C20H30N4O.\nAssistant: In that case, I propose the chemical with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C."} {"text":"User: I want to make a compound with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 4 and a Wildman-Crippen LogP value of 2.72.\nAssistant: That's interesting, do you have some additional conditions that help me narrow down the search?\nUser: Yes, I want the formula to be C14H14BrNO3.\nAssistant: In that case, I advise the compound with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O."}", "/scratch/micpie/export/rdkit_features/valid_106-23.jsonl": "{"text":"User: I want to analyze a molecule with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptor sites of 5 and a Wildman-Crippen LogP value of 2.32.\nAssistant: Cool, do you have some additional constraints that I should consider?\nUser: Yep, I want the chemical formula to be C13H11ClFN3O3.\nAssistant: I the molecule with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl."} {"text":"User: I want to make a compound with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptor sites of 3 and a Wildman-Crippen LogP value computed using RDKit of 0.70.\nAssistant: Interesting, do you have some additional I should take into account?\nUser: I want the molecular formula 
to be C18H24F4N3O3S+.\nAssistant: Then, I suggest the compound with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F."}", "/scratch/micpie/export/rdkit_features/valid_13-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4 is 3."} {"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C is 4."}", "/scratch/micpie/export/rdkit_features/train_119-15.jsonl": "{"text":"Question: What is the heteroatom count of the compound with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO?\nAnswer: 9"} {"text":"Question: What is the number of heteroatoms of the molecule with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl?\nAnswer: 10"}", "/scratch/micpie/export/rdkit_features/test_32-7.jsonl": "{"text":"The acid group count of the chemical with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3 is 2."} {"text":"The count of acid groups of the compound with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3 is 0."}", "/scratch/micpie/export/rdkit_features/train_0-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the chemical with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1?\nAnswer: 3"} {"text":"Question: What is the number of rotatable bonds of the compound with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/valid_102-20.jsonl": "{"text":"Question: What is the number of basic groups of the molecule with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the compound with SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_0-16.jsonl": "{"text":"Question: What is the ring count of the molecule with 
SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1?\nAnswer: 6"} {"text":"Question: What is the ring count of the chemical with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_116-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the molecule with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 17"} {"text":"Question: What is the number of aromatic bonds of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br?\nAnswer: 17"}", "/scratch/micpie/export/rdkit_features/test_22-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O?\nAnswer: 64.67"} {"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4?\nAnswer: 71.03"}", "/scratch/micpie/export/rdkit_features/valid_18-0.jsonl": "{"text":"The formula of the compound with SMILES c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is C22H23N5O7."} {"text":"The formula of the chemical with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O is C12H14Br2F3NO."}", "/scratch/micpie/export/rdkit_features/test_28-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC is 53.59."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C is 58.17."}", "/scratch/micpie/export/rdkit_features/train_30-20.jsonl": "{"text":"Question: What is the basic group count of the chemical with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the molecule with SMILES 
CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_1-6.jsonl": "{"text":"The count of aromatic bonds of the chemical with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5 is 21."} {"text":"The aromatic bond count of the molecule with SMILES Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F is 17."}", "/scratch/micpie/export/rdkit_features/train_26-20.jsonl": "{"text":"Question: What is the number of basic groups of the chemical with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F?\nAnswer: 0"} {"text":"Question: What is the basic group count of the compound with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_26-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the molecule with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3?\nAnswer: 7"} {"text":"Question: What is the heteroatom count of the molecule with SMILES COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_32-7.jsonl": "{"text":"The acid group count of the molecule with SMILES C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C is 0."} {"text":"The number of acid groups of the chemical with SMILES C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2 is 0."}", "/scratch/micpie/export/rdkit_features/valid_17-7.jsonl": "{"text":"The number of acid groups of the chemical with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S is 0."} {"text":"The count of acid groups of the molecule with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 0."}", "/scratch/micpie/export/rdkit_features/train_5-6.jsonl": "{"text":"The aromatic bond count of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C is 16."} {"text":"The number of aromatic bonds of the chemical with SMILES 
C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C is 6."}", "/scratch/micpie/export/rdkit_features/train_13-12.jsonl": "{"text":"Question: What is the chemical formula of the compound with SMILES CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: C21H32FNO2"} {"text":"Question: What is the formula of the chemical with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C?\nAnswer: C20H30N4O"}", "/scratch/micpie/export/rdkit_features/test_19-11.jsonl": "{"text":"User: I want to make a compound with a molecular formula of C21H26ClN3O4.\nAssistant: That's interesting, do you have some additional requirements that help me narrow down the search?\nUser: Yep, I want the number of hydrogen bond donor sites to be 2, the count of hydrogen bond acceptors to be 4.\nAssistant: Then, I advise the compound with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl."} {"text":"User: I want to design a chemical with a formula of C21H34FN4O2S+.\nAssistant: That's interesting, do you have some additional requirements I should take into account?\nUser: Yeah, I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 4.\nAssistant: I propose the chemical with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4."}", "/scratch/micpie/export/rdkit_features/valid_9-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_1-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4?\nAnswer: C25H33N2O3+"} {"text":"Question: What is the formula of the molecule with SMILES 
c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl?\nAnswer: C17H22ClF3N2O2S"}", "/scratch/micpie/export/rdkit_features/train_17-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the chemical with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F?\nAnswer: 6"} {"text":"Question: What is the rotatable bond count of the compound with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_26-11.jsonl": "{"text":"User: I want to analyze a chemical with a formula of C18H24F3NO3.\nAssistant: Interesting, do you have some additional limitations I should consider?\nUser: Yea, I want the count of hydrogen bond donors to be 0, the number of hydrogen bond acceptor sites to be 3.\nAssistant: In that scenario, I propose the chemical with SMILES CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2."} {"text":"User: I want to create a compound with a chemical formula of C19H20N4O4.\nAssistant: That is a very interesting question, do you have some additional limitations I should consider?\nUser: Yes, I want the number of hydrogen bond donor sites to be 3, the count of hydrogen bond acceptors to be 6.\nAssistant: In that situation, I propose the compound with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC."}", "/scratch/micpie/export/rdkit_features/valid_11-4.jsonl": "{"text":"The ring count of the molecule with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl is 3."} {"text":"The number of rings of the chemical with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C is 3."}", "/scratch/micpie/export/rdkit_features/test_111-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES 
C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_112-23.jsonl": "{"text":"User: I want to make a molecule with a number of hydrogen bond donor sites of 0, a number of hydrogen bond acceptors of 9 and a Wildman-Crippen LogP value computed using RDKit of 2.25.\nAssistant: Nice, do you have some additional requirements that help me narrow down the search?\nUser: Yep, I want the chemical formula to be C21H26N6O3S.\nAssistant: In that case, I advise the molecule with SMILES Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC."} {"text":"User: I want to design a compound with a number of hydrogen bond donor sites of 3, a number of hydrogen bond acceptor sites of 5 and a LogP value computed using the Wildman-Crippen method of 0.31.\nAssistant: Nice, do you have some additional requirements?\nUser: I want the chemical formula to be C22H27N6O2S+.\nAssistant: In that case, I suggest the compound with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N."}", "/scratch/micpie/export/rdkit_features/valid_14-16.jsonl": "{"text":"Question: What is the ring count of the compound with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C?\nAnswer: 3"} {"text":"Question: What is the number of rings of the molecule with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_33-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the compound with SMILES c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6 is 2.29."} {"text":"The Wildman-Crippen LogP value of the compound with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br is 1.03."}", "/scratch/micpie/export/rdkit_features/test_13-3.jsonl": "{"text":"The count of heteroatoms of the molecule with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC is 5."} {"text":"The count of heteroatoms of the molecule with SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4 is 
3."}", "/scratch/micpie/export/rdkit_features/valid_10-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C?\nAnswer: 10"} {"text":"Question: What is the aromatic bond count of the chemical with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_29-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4?\nAnswer: 2"} {"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_28-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the compound with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC is 6."} {"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C is 3."}", "/scratch/micpie/export/rdkit_features/valid_19-22.jsonl": "{"text":"User: I want to create a chemical with a count of hydrogen bond donors of 3 and a number of hydrogen bond acceptors of 4.\nAssistant: That is a very interesting question, do you have some additional constraints I should take into account?\nUser: Indeed, I want the heteroatom count to be 7.\nAssistant: I advise the chemical with SMILES C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC."} {"text":"User: I want to synthesize a compound with a count of hydrogen bond donors of 2 and a number of hydrogen bond acceptor sites of 5.\nAssistant: Interesting, do you have some additional limitations I should take into account?\nUser: Yes, I want the number of heteroatoms to be 9.\nAssistant: In that situation, I recommend the compound with SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F."}", 
"/scratch/micpie/export/rdkit_features/test_102-3.jsonl": "{"text":"The heteroatom count of the chemical with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3 is 4."} {"text":"The count of heteroatoms of the chemical with SMILES Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1 is 8."}", "/scratch/micpie/export/rdkit_features/train_104-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 7"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_116-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 5.01."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br is 5.13."}", "/scratch/micpie/export/rdkit_features/train_116-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 71.83."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br is 70.09."}", "/scratch/micpie/export/rdkit_features/test_107-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_6-11.jsonl": "{"text":"User: I want to create a chemical with a formula of 
C25H26N3O3+.\nAssistant: Cool, do you have some additional constraints I should take into account?\nUser: I want the count of hydrogen bond donors to be 3, the number of hydrogen bond acceptor sites to be 4.\nAssistant: Then, I recommend the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+]."} {"text":"User: I want to analyze a chemical with a molecular formula of C22H24FN5O2.\nAssistant: Do you have some additional limitations?\nUser: I want the number of hydrogen bond donor sites to be 0, the number of hydrogen bond acceptor sites to be 6.\nAssistant: In that situation, I advise the chemical with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F."}", "/scratch/micpie/export/rdkit_features/train_20-23.jsonl": "{"text":"User: I want to synthesize a molecule with a number of hydrogen bond donors of 2, a count of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value computed using RDKit of 3.95.\nAssistant: Cool, do you have some additional requirements that I should consider?\nUser: Yeah, I want the chemical formula to be C25H33N3O3.\nAssistant: In that scenario, I propose the molecule with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3."} {"text":"User: I want to create a molecule with a number of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 6 and a LogP value computed using the Wildman-Crippen method of 0.69.\nAssistant: Nice, do you have some additional requirements that help me narrow down the search?\nUser: I want the chemical formula to be C17H15FNO6S-.\nAssistant: In that scenario, I propose the molecule with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC."}", "/scratch/micpie/export/rdkit_features/train_115-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F?\nAnswer: 5"} {"text":"Question: What is the count of rotatable bonds of the chemical with SMILES 
c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_101-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3?\nAnswer: 49.69"} {"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1?\nAnswer: 65.84"}", "/scratch/micpie/export/rdkit_features/valid_108-6.jsonl": "{"text":"The aromatic bond count of the compound with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O is 12."} {"text":"The aromatic bond count of the compound with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C is 6."}", "/scratch/micpie/export/rdkit_features/test_5-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5?\nAnswer: 1"} {"text":"Question: What is the number of basic groups of the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_28-20.jsonl": "{"text":"Question: What is the basic group count of the molecule with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl?\nAnswer: 0"} {"text":"Question: What is the basic group count of the molecule with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_4-4.jsonl": "{"text":"The number of rings of the molecule with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4 is 4."} {"text":"The number of rings of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3 is 4."}", "/scratch/micpie/export/rdkit_features/test_14-4.jsonl": "{"text":"The count of rings of the chemical with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3 is 3."} {"text":"The count 
of rings of the chemical with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O is 1."}", "/scratch/micpie/export/rdkit_features/train_26-12.jsonl": "{"text":"Question: What is the chemical formula of the compound with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F?\nAnswer: C15H15F6NO2"} {"text":"Question: What is the chemical formula of the chemical with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C?\nAnswer: C21H31N4O+"}", "/scratch/micpie/export/rdkit_features/test_27-19.jsonl": "{"text":"Question: What is the count of acid groups of the compound with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F?\nAnswer: 0"} {"text":"Question: What is the acid group count of the molecule with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_12-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the chemical with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C?\nAnswer: 4"} {"text":"Question: What is the count of heteroatoms of the chemical with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_33-1.jsonl": "{"text":"The number of hydrogen bond donors of the compound with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O is 3."} {"text":"The number of hydrogen bond donors of the compound with SMILES Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br is 0."}", "/scratch/micpie/export/rdkit_features/train_25-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C?\nAnswer: 3"} {"text":"Question: What is the ring count of the compound with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_104-1.jsonl": "{"text":"The number of hydrogen bond donors of the compound with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2 is 2."} {"text":"The count 
of hydrogen bond donors of the chemical with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3 is 1."}", "/scratch/micpie/export/rdkit_features/train_20-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3?\nAnswer: 2"} {"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_9-6.jsonl": "{"text":"The count of aromatic bonds of the molecule with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C is 6."} {"text":"The count of aromatic bonds of the molecule with SMILES C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4 is 10."}", "/scratch/micpie/export/rdkit_features/valid_30-11.jsonl": "{"text":"User: I want to design a chemical with a formula of C21H34N4O.\nAssistant: That's interesting, do you have some additional requirements I should take into account?\nUser: Yeah, I want the number of hydrogen bond donor sites to be 1, the number of hydrogen bond acceptors to be 4.\nAssistant: In that scenario, I recommend the chemical with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3."} {"text":"User: I want to analyze a compound with a molecular formula of C24H24F2N2O2.\nAssistant: Do you have some additional requirements I should take into account?\nUser: Yep, I want the count of hydrogen bond donors to be 0, the number of hydrogen bond acceptors to be 2.\nAssistant: I recommend the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F."}", "/scratch/micpie/export/rdkit_features/test_18-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O is 6."} {"text":"The count of aromatic bonds of the compound with SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C is 
11."}", "/scratch/micpie/export/rdkit_features/train_17-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F is 6."} {"text":"The aromatic bond count of the compound with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 12."}", "/scratch/micpie/export/rdkit_features/train_104-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 9."} {"text":"The count of heteroatoms of the molecule with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl is 6."}", "/scratch/micpie/export/rdkit_features/valid_26-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the compound with SMILES CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2?\nAnswer: 7"} {"text":"Question: What is the number of heteroatoms of the chemical with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/train_116-11.jsonl": "{"text":"User: I want to design a molecule with a formula of C27H25ClFN3O2.\nAssistant: Nice, do you have some additional requirements I should consider?\nUser: Yeah, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 3.\nAssistant: In that situation, I advise the molecule with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F."} {"text":"User: I want to synthesize a chemical with a molecular formula of C26H25BrN2O3.\nAssistant: Cool, do you have some additional conditions that I should consider?\nUser: Yes, I want the number of hydrogen bond donors to be 0, the number of hydrogen bond acceptor sites to be 3.\nAssistant: In that case, I recommend the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br."}", "/scratch/micpie/export/rdkit_features/test_109-20.jsonl": "{"text":"Question: What is the basic group count of the chemical 
with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl?\nAnswer: 0"} {"text":"Question: What is the basic group count of the compound with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_104-3.jsonl": "{"text":"The number of heteroatoms of the chemical with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2 is 10."} {"text":"The count of heteroatoms of the molecule with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3 is 7."}", "/scratch/micpie/export/rdkit_features/train_107-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl?\nAnswer: 68.15"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5?\nAnswer: 63.23"}", "/scratch/micpie/export/rdkit_features/train_12-16.jsonl": "{"text":"Question: What is the ring count of the compound with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C?\nAnswer: 3"} {"text":"Question: What is the count of rings of the chemical with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_12-7.jsonl": "{"text":"The acid group count of the chemical with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl is 0."} {"text":"The number of acid groups of the chemical with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3 is 0."}", "/scratch/micpie/export/rdkit_features/test_30-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the compound with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3?\nAnswer: 12"} {"text":"Question: What is the number of aromatic bonds of the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/valid_113-0.jsonl": 
"{"text":"The chemical formula of the molecule with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is C21H32N5O3S+."} {"text":"The molecular formula of the compound with SMILES Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is C21H19Cl2N3O2S2."}", "/scratch/micpie/export/rdkit_features/train_33-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2?\nAnswer: 54.40"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br?\nAnswer: 51.69"}", "/scratch/micpie/export/rdkit_features/valid_20-22.jsonl": "{"text":"User: I want to make a compound with a count of hydrogen bond donors of 2 and a number of hydrogen bond acceptor sites of 3.\nAssistant: That's interesting, do you have some additional limitations I should take into account?\nUser: Yeah, I want the number of heteroatoms to be 6.\nAssistant: In that case, I suggest the compound with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3."} {"text":"User: I want to synthesize a compound with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 6.\nAssistant: Cool, do you have some additional that I should consider?\nUser: Yep, I want the count of heteroatoms to be 9.\nAssistant: In that scenario, I propose the compound with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-]."}", "/scratch/micpie/export/rdkit_features/train_107-15.jsonl": "{"text":"Question: What is the heteroatom count of the compound with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl?\nAnswer: 10"} {"text":"Question: What is the count of heteroatoms of the compound with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/train_8-6.jsonl": "{"text":"The count of aromatic bonds of the molecule with 
SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N is 18."} {"text":"The number of aromatic bonds of the chemical with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO is 12."}", "/scratch/micpie/export/rdkit_features/valid_1-4.jsonl": "{"text":"The number of rings of the chemical with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5 is 5."} {"text":"The ring count of the compound with SMILES Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F is 5."}", "/scratch/micpie/export/rdkit_features/test_1-20.jsonl": "{"text":"Question: What is the basic group count of the chemical with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4?\nAnswer: 1"} {"text":"Question: What is the count of basic groups of the molecule with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_6-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+] is 67.04."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F is 62.38."}", "/scratch/micpie/export/rdkit_features/valid_5-1.jsonl": "{"text":"The number of hydrogen bond donors of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C is 1."} {"text":"The count of hydrogen bond donors of the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+] is 3."}", "/scratch/micpie/export/rdkit_features/train_11-4.jsonl": "{"text":"The count of rings of the molecule with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl is 3."} {"text":"The number of rings of the molecule with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C is 3."}", "/scratch/micpie/export/rdkit_features/train_31-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with SMILES 
Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F is 65.41."} {"text":"The sum of atomic polarizabilities of the compound with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4 is 61.53."}", "/scratch/micpie/export/rdkit_features/test_8-8.jsonl": "{"text":"The number of basic groups of the compound with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C is 0."} {"text":"The number of basic groups of the molecule with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F is 0."}", "/scratch/micpie/export/rdkit_features/valid_103-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO is 68.34."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N is 42.98."}", "/scratch/micpie/export/rdkit_features/valid_107-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the molecule with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C is 2."} {"text":"The count of hydrogen bond donors of the chemical with SMILES CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C is 3."}", "/scratch/micpie/export/rdkit_features/test_9-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C is 70.19."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4 is 74.85."}", "/scratch/micpie/export/rdkit_features/train_100-3.jsonl": "{"text":"The heteroatom count of the compound with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C is 4."} {"text":"The heteroatom count of the molecule with SMILES COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3 is 6."}", "/scratch/micpie/export/rdkit_features/test_116-20.jsonl": "{"text":"Question: What is the basic group count of the molecule 
with SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_105-20.jsonl": "{"text":"Question: What is the number of basic groups of the compound with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F?\nAnswer: 0"} {"text":"Question: What is the basic group count of the compound with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_13-19.jsonl": "{"text":"Question: What is the acid group count of the compound with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the molecule with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_17-19.jsonl": "{"text":"Question: What is the number of acid groups of the molecule with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the chemical with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_20-6.jsonl": "{"text":"The aromatic bond count of the molecule with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3 is 12."} {"text":"The aromatic bond count of the chemical with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-] is 6."}", "/scratch/micpie/export/rdkit_features/test_117-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc5cc(ccc5c4)Br?\nAnswer: 0"} {"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES C[C@H](C(C)(C)OC)[NH2+]CCNc1ccccc1?\nAnswer: 
2"}", "/scratch/micpie/export/rdkit_features/test_112-19.jsonl": "{"text":"Question: What is the count of acid groups of the compound with SMILES C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC?\nAnswer: 0"} {"text":"Question: What is the acid group count of the compound with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_32-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3?\nAnswer: C19H30N5O2S+"} {"text":"Question: What is the chemical formula of the compound with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3?\nAnswer: C21H24N4O3"}", "/scratch/micpie/export/rdkit_features/train_101-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3 is 5."} {"text":"The number of rotatable bonds of the molecule with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1 is 8."}", "/scratch/micpie/export/rdkit_features/train_17-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F is 64.87."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 67.30."}", "/scratch/micpie/export/rdkit_features/valid_10-22.jsonl": "{"text":"User: I want to design a compound with a count of hydrogen bond donors of 0 and a count of hydrogen bond acceptors of 7.\nAssistant: That is a very interesting question, do you have some additional requirements?\nUser: Indeed, I want the number of heteroatoms to be 10.\nAssistant: Then, I the compound with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C."} {"text":"User: I want to create a molecule with a number of hydrogen bond donor sites of 1 and a number of hydrogen bond 
acceptor sites of 2.\nAssistant: Do you have some additional I should take into account?\nUser: Yep, I want the heteroatom count to be 4.\nAssistant: I recommend the molecule with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3."}", "/scratch/micpie/export/rdkit_features/train_116-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the chemical with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 3."} {"text":"The count of hydrogen bond acceptors of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br is 3."}", "/scratch/micpie/export/rdkit_features/valid_14-19.jsonl": "{"text":"Question: What is the count of acid groups of the molecule with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the molecule with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_108-20.jsonl": "{"text":"Question: What is the basic group count of the chemical with SMILES CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the compound with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_120-3.jsonl": "{"text":"The count of heteroatoms of the compound with SMILES c1cc(oc1)CNC(=O)CSc2nc3ccc(cc3s2)n4c(c5c(c4O)[C@@H]6C[C@@H]5C=C6)O is 9."} {"text":"The number of heteroatoms of the molecule with SMILES CN(c1nc2cc(ccc2s1)Cl)C(=O)CSc3ccc(c(c3)F)F is 8."}", "/scratch/micpie/export/rdkit_features/test_2-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C?\nAnswer: 1"} {"text":"Question: What is the number of basic groups of the molecule with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4?\nAnswer: 0"}", 
"/scratch/micpie/export/rdkit_features/train_11-7.jsonl": "{"text":"The acid group count of the molecule with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl is 0."} {"text":"The count of acid groups of the molecule with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C is 0."}", "/scratch/micpie/export/rdkit_features/valid_7-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the molecule with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_113-0.jsonl": "{"text":"The chemical formula of the compound with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N is C20H24N5O2S2+."} {"text":"The chemical formula of the chemical with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is C20H17Cl2N3O3S2."}", "/scratch/micpie/export/rdkit_features/valid_100-6.jsonl": "{"text":"The count of aromatic bonds of the compound with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F is 5."} {"text":"The aromatic bond count of the molecule with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4 is 12."}", "/scratch/micpie/export/rdkit_features/valid_10-17.jsonl": "{"text":"Question: What is the rotatable bond count of the chemical with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C?\nAnswer: 7"} {"text":"Question: What is the count of rotatable bonds of the compound with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_105-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N is 1."} {"text":"The number of hydrogen bond donor sites of the compound with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O is 
2."}", "/scratch/micpie/export/rdkit_features/valid_29-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl is 2.38."} {"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3 is 2.31."}", "/scratch/micpie/export/rdkit_features/valid_117-12.jsonl": "{"text":"Question: What is the formula of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4OC)Cl?\nAnswer: C27H27ClN2O4"} {"text":"Question: What is the molecular formula of the molecule with SMILES Cc1cc(ccc1[C@@H](C)[NH2+]CC2(CC2)CO)F?\nAnswer: C14H21FNO+"}", "/scratch/micpie/export/rdkit_features/valid_111-0.jsonl": "{"text":"The formula of the chemical with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4 is C24H35N8+."} {"text":"The formula of the compound with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC is C22H29N7O3."}", "/scratch/micpie/export/rdkit_features/test_100-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F is 54.34."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br is 40.17."}", "/scratch/micpie/export/rdkit_features/valid_4-4.jsonl": "{"text":"The number of rings of the chemical with SMILES Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br is 3."} {"text":"The ring count of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C is 4."}", "/scratch/micpie/export/rdkit_features/train_29-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES 
CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_117-6.jsonl": "{"text":"The count of aromatic bonds of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc5cc(ccc5c4)Br is 17."} {"text":"The count of aromatic bonds of the molecule with SMILES C[C@H](C(C)(C)OC)[NH2+]CCNc1ccccc1 is 6."}", "/scratch/micpie/export/rdkit_features/train_108-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2 is 11."} {"text":"The aromatic bond count of the molecule with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3 is 6."}", "/scratch/micpie/export/rdkit_features/test_5-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5 is 16."} {"text":"The number of aromatic bonds of the molecule with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F is 12."}", "/scratch/micpie/export/rdkit_features/valid_8-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C is 5."} {"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br is 2."}", "/scratch/micpie/export/rdkit_features/train_32-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C is 5."} {"text":"The number of aromatic bonds of the molecule with SMILES C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2 is 11."}", "/scratch/micpie/export/rdkit_features/test_0-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1 is 7.58."} {"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES 
Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC is 3.66."}", "/scratch/micpie/export/rdkit_features/train_15-22.jsonl": "{"text":"User: I want to make a molecule with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 5.\nAssistant: That is a very interesting question, do you have some additional constraints I should take into account?\nUser: I want the number of heteroatoms to be 6.\nAssistant: In that case, I suggest the molecule with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC."} {"text":"User: I want to analyze a compound with a number of hydrogen bond donors of 2 and a number of hydrogen bond acceptor sites of 3.\nAssistant: Interesting, do you have some additional conditions that help me narrow down the search?\nUser: Yes, I want the count of heteroatoms to be 5.\nAssistant: In that case, I suggest the compound with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C."}", "/scratch/micpie/export/rdkit_features/test_32-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the molecule with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3 is 6."} {"text":"The count of hydrogen bond acceptors of the molecule with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3 is 4."}", "/scratch/micpie/export/rdkit_features/train_11-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the molecule with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl is 2."} {"text":"The number of hydrogen bond acceptors of the chemical with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C is 1."}", "/scratch/micpie/export/rdkit_features/valid_0-8.jsonl": "{"text":"The count of basic groups of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1 is 0."} {"text":"The count of basic groups of the compound with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4 is 0."}", "/scratch/micpie/export/rdkit_features/train_25-4.jsonl": "{"text":"The 
count of rings of the molecule with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C is 3."} {"text":"The ring count of the compound with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F is 2."}", "/scratch/micpie/export/rdkit_features/train_106-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O is 0."} {"text":"The number of hydrogen bond donor sites of the molecule with SMILES CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl is 2."}", "/scratch/micpie/export/rdkit_features/valid_17-15.jsonl": "{"text":"Question: What is the heteroatom count of the molecule with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S?\nAnswer: 9"} {"text":"Question: What is the heteroatom count of the compound with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 13"}", "/scratch/micpie/export/rdkit_features/test_22-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O?\nAnswer: C21H33N3O3"} {"text":"Question: What is the formula of the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4?\nAnswer: C21H40N6O+2"}", "/scratch/micpie/export/rdkit_features/train_9-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C is 2.14."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F is 3.83."}", "/scratch/micpie/export/rdkit_features/test_100-7.jsonl": "{"text":"The number of acid groups of the chemical with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F is 0."} {"text":"The count of acid groups of the chemical with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br is 0."}", "/scratch/micpie/export/rdkit_features/test_116-4.jsonl": 
"{"text":"The ring count of the molecule with SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1 is 7."} {"text":"The number of rings of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl is 5."}", "/scratch/micpie/export/rdkit_features/valid_101-8.jsonl": "{"text":"The basic group count of the molecule with SMILES COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3 is 0."} {"text":"The number of basic groups of the chemical with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1 is 0."}", "/scratch/micpie/export/rdkit_features/train_9-23.jsonl": "{"text":"User: I want to design a chemical with a number of hydrogen bond donors of 3, a number of hydrogen bond acceptors of 3 and a LogP value computed using the Wildman-Crippen method of 2.14.\nAssistant: Nice, do you have some additional requirements I should consider?\nUser: Yeah, I want the chemical formula to be C25H32N3O3+.\nAssistant: Then, I recommend the chemical with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C."} {"text":"User: I want to synthesize a molecule with a number of hydrogen bond donor sites of 0, a count of hydrogen bond acceptors of 5 and a LogP value computed using the Wildman-Crippen method of 3.83.\nAssistant: Interesting, do you have some additional conditions I should take into account?\nUser: Yeah, I want the chemical formula to be C22H32FN5O.\nAssistant: In that case, I advise the molecule with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F."}", "/scratch/micpie/export/rdkit_features/valid_110-7.jsonl": "{"text":"The acid group count of the molecule with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O is 0."} {"text":"The number of acid groups of the molecule with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C is 0."}", "/scratch/micpie/export/rdkit_features/test_17-11.jsonl": "{"text":"User: I want to analyze a molecule with a formula of 
C20H29F3N2O3.\nAssistant: Interesting, do you have some additional limitations that help me narrow down the search?\nUser: Yes, I want the number of hydrogen bond donor sites to be 1, the number of hydrogen bond acceptor sites to be 3.\nAssistant: In that scenario, I recommend the molecule with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F."} {"text":"User: I want to synthesize a chemical with a formula of C21H23N5O7.\nAssistant: That is a very interesting question, do you have some additional conditions that help me narrow down the search?\nUser: Yep, I want the count of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 9.\nAssistant: In that scenario, I suggest the chemical with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O."}", "/scratch/micpie/export/rdkit_features/test_6-12.jsonl": "{"text":"Question: What is the chemical formula of the compound with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3?\nAnswer: C22H34N4O4"} {"text":"Question: What is the molecular formula of the compound with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C?\nAnswer: C25H30N4O2"}", "/scratch/micpie/export/rdkit_features/train_5-15.jsonl": "{"text":"Question: What is the heteroatom count of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C?\nAnswer: 6"} {"text":"Question: What is the heteroatom count of the molecule with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/valid_112-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the chemical with SMILES Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC?\nAnswer: 7"} {"text":"Question: What is the rotatable bond count of the molecule with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 7"}", 
"/scratch/micpie/export/rdkit_features/train_105-20.jsonl": "{"text":"Question: What is the basic group count of the chemical with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the compound with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_12-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl is 6."} {"text":"The number of aromatic bonds of the compound with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3 is 5."}", "/scratch/micpie/export/rdkit_features/valid_32-5.jsonl": "{"text":"The number of rotatable bonds of the molecule with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3 is 9."} {"text":"The count of rotatable bonds of the chemical with SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O is 3."}", "/scratch/micpie/export/rdkit_features/test_10-4.jsonl": "{"text":"The count of rings of the compound with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4 is 4."} {"text":"The count of rings of the compound with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F is 2."}", "/scratch/micpie/export/rdkit_features/valid_102-7.jsonl": "{"text":"The count of acid groups of the molecule with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1 is 0."} {"text":"The acid group count of the chemical with SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2 is 0."}", "/scratch/micpie/export/rdkit_features/test_24-20.jsonl": "{"text":"Question: What is the number of basic groups of the compound with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl?\nAnswer: 0"} {"text":"Question: What is the basic group count of the molecule with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C?\nAnswer: 1"}", 
"/scratch/micpie/export/rdkit_features/train_111-8.jsonl": "{"text":"The number of basic groups of the molecule with SMILES CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C is 0."} {"text":"The number of basic groups of the compound with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC is 0."}", "/scratch/micpie/export/rdkit_features/valid_117-23.jsonl": "{"text":"User: I want to design a chemical with a number of hydrogen bond donor sites of 0, a number of hydrogen bond acceptor sites of 4 and a Wildman-Crippen LogP value computed using RDKit of 5.03.\nAssistant: That's interesting, do you have some additional conditions?\nUser: Yep, I want the formula to be C27H27ClN2O4.\nAssistant: Then, I propose the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4OC)Cl."} {"text":"User: I want to analyze a compound with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 1 and a Wildman-Crippen LogP value of 1.53.\nAssistant: Do you have some additional requirements that I should consider?\nUser: I want the molecular formula to be C14H21FNO+.\nAssistant: Then, I recommend the compound with SMILES Cc1cc(ccc1[C@@H](C)[NH2+]CC2(CC2)CO)F."}", "/scratch/micpie/export/rdkit_features/train_16-12.jsonl": "{"text":"Question: What is the formula of the compound with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C?\nAnswer: C17H28N3OS+"} {"text":"Question: What is the formula of the chemical with SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F?\nAnswer: C23H28F2N2O3"}", "/scratch/micpie/export/rdkit_features/valid_16-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C?\nAnswer: 1"} {"text":"Question: What is the number of basic groups of the chemical with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_27-15.jsonl": 
"{"text":"Question: What is the heteroatom count of the molecule with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C?\nAnswer: 5"} {"text":"Question: What is the number of heteroatoms of the molecule with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_16-16.jsonl": "{"text":"Question: What is the number of rings of the molecule with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C?\nAnswer: 2"} {"text":"Question: What is the count of rings of the molecule with SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_5-8.jsonl": "{"text":"The number of basic groups of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C is 1."} {"text":"The count of basic groups of the compound with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C is 0."}", "/scratch/micpie/export/rdkit_features/train_26-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F?\nAnswer: 0"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_21-16.jsonl": "{"text":"Question: What is the number of rings of the molecule with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC?\nAnswer: 2"} {"text":"Question: What is the number of rings of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_19-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC is 4."} {"text":"The number of hydrogen bond acceptors of the chemical with SMILES 
Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3 is 3."}", "/scratch/micpie/export/rdkit_features/test_0-22.jsonl": "{"text":"User: I want to design a molecule with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 9.\nAssistant: That's interesting, do you have some additional constraints that help me narrow down the search?\nUser: Yea, I want the count of heteroatoms to be 10.\nAssistant: In that scenario, I recommend the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1."} {"text":"User: I want to analyze a chemical with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 6.\nAssistant: That is a very interesting question, do you have some additional I should consider?\nUser: Indeed, I want the number of heteroatoms to be 7.\nAssistant: Then, I advise the chemical with SMILES Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC."}", "/scratch/micpie/export/rdkit_features/test_0-16.jsonl": "{"text":"Question: What is the number of rings of the compound with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1?\nAnswer: 7"} {"text":"Question: What is the number of rings of the compound with SMILES Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_25-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4?\nAnswer: 69.40"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3?\nAnswer: 51.58"}", "/scratch/micpie/export/rdkit_features/train_32-3.jsonl": "{"text":"The number of heteroatoms of the compound with SMILES C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C is 9."} {"text":"The heteroatom count of the compound with SMILES C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2 is 10."}", 
"/scratch/micpie/export/rdkit_features/train_15-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC?\nAnswer: C17H22N4O2"} {"text":"Question: What is the chemical formula of the compound with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C?\nAnswer: C17H28N3OS+"}", "/scratch/micpie/export/rdkit_features/train_14-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the chemical with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C is 3."} {"text":"The number of hydrogen bond acceptors of the chemical with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O is 4."}", "/scratch/micpie/export/rdkit_features/train_4-4.jsonl": "{"text":"The ring count of the chemical with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4 is 4."} {"text":"The number of rings of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C is 4."}", "/scratch/micpie/export/rdkit_features/test_1-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4 is 3."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC is 8."}", "/scratch/micpie/export/rdkit_features/train_117-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(cc4)Oc5ccc(cn5)Cl?\nAnswer: 8"} {"text":"Question: What is the number of heteroatoms of the chemical with SMILES C[C@@H](C(C)(C)OC)[NH2+]CCNc1ccccc1?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_100-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C is 59.25."} {"text":"The total sum of atomic polarizabilities of the molecule with SMILES COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3 is 
45.63."}", "/scratch/micpie/export/rdkit_features/train_119-4.jsonl": "{"text":"The count of rings of the chemical with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO is 2."} {"text":"The number of rings of the molecule with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl is 3."}", "/scratch/micpie/export/rdkit_features/valid_23-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4?\nAnswer: 6"} {"text":"Question: What is the number of rotatable bonds of the chemical with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_23-1.jsonl": "{"text":"The number of hydrogen bond donors of the chemical with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4 is 1."} {"text":"The number of hydrogen bond donors of the molecule with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3 is 2."}", "/scratch/micpie/export/rdkit_features/valid_5-18.jsonl": "{"text":"Question: What is the aromatic bond count of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C?\nAnswer: 21"} {"text":"Question: What is the count of aromatic bonds of the molecule with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+]?\nAnswer: 18"}", "/scratch/micpie/export/rdkit_features/test_5-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5 is 2.38."} {"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F is 3.88."}", "/scratch/micpie/export/rdkit_features/valid_117-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4OC)Cl is 5.03."} {"text":"The LogP value computed using the 
Wildman-Crippen method of the compound with SMILES Cc1cc(ccc1[C@@H](C)[NH2+]CC2(CC2)CO)F is 1.53."}", "/scratch/micpie/export/rdkit_features/valid_115-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4 is 67.42."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F is 71.32."}", "/scratch/micpie/export/rdkit_features/train_25-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C?\nAnswer: 6"} {"text":"Question: What is the count of rotatable bonds of the compound with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_11-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl is 58.75."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C is 50.18."}", "/scratch/micpie/export/rdkit_features/test_29-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4 is C21H33N4O+."} {"text":"The chemical formula of the molecule with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C is C21H38N2O2."}", "/scratch/micpie/export/rdkit_features/test_27-5.jsonl": "{"text":"The rotatable bond count of the compound with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F is 4."} {"text":"The number of rotatable bonds of the compound with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F is 5."}", "/scratch/micpie/export/rdkit_features/test_33-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O is 0."} 
{"text":"The count of acid groups of the chemical with SMILES Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br is 0."}", "/scratch/micpie/export/rdkit_features/train_8-19.jsonl": "{"text":"Question: What is the count of acid groups of the chemical with SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the molecule with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_111-19.jsonl": "{"text":"Question: What is the number of acid groups of the compound with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the compound with SMILES C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_4-8.jsonl": "{"text":"The count of basic groups of the chemical with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4 is 1."} {"text":"The basic group count of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C is 1."}", "/scratch/micpie/export/rdkit_features/train_20-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3 is 10."} {"text":"The number of rotatable bonds of the chemical with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC is 6."}", "/scratch/micpie/export/rdkit_features/train_19-7.jsonl": "{"text":"The count of acid groups of the compound with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC is 0."} {"text":"The acid group count of the molecule with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3 is 0."}", "/scratch/micpie/export/rdkit_features/train_16-19.jsonl": "{"text":"Question: What is the number of acid groups of the compound with SMILES 
Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the compound with SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_18-12.jsonl": "{"text":"Question: What is the molecular formula of the molecule with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O?\nAnswer: C20H25N3O9S"} {"text":"Question: What is the chemical formula of the molecule with SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C?\nAnswer: C21H28N2O3S2"}", "/scratch/micpie/export/rdkit_features/train_26-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F is 6."} {"text":"The aromatic bond count of the chemical with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C is 11."}", "/scratch/micpie/export/rdkit_features/valid_24-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl is 2."} {"text":"The number of hydrogen bond donors of the chemical with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C is 1."}", "/scratch/micpie/export/rdkit_features/valid_102-4.jsonl": "{"text":"The number of rings of the compound with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1 is 3."} {"text":"The number of rings of the compound with SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2 is 4."}", "/scratch/micpie/export/rdkit_features/valid_16-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C?\nAnswer: 2"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3?\nAnswer: 4"}", 
"/scratch/micpie/export/rdkit_features/valid_29-15.jsonl": "{"text":"Question: What is the heteroatom count of the compound with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl?\nAnswer: 5"} {"text":"Question: What is the count of heteroatoms of the molecule with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_12-6.jsonl": "{"text":"The count of aromatic bonds of the molecule with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C is 15."} {"text":"The number of aromatic bonds of the molecule with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4 is 6."}", "/scratch/micpie/export/rdkit_features/train_28-23.jsonl": "{"text":"User: I want to analyze a chemical with a count of hydrogen bond donors of 0, a count of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value of 3.94.\nAssistant: Interesting, do you have some additional constraints that help me narrow down the search?\nUser: Yea, I want the chemical formula to be C14H9BrClNO2S.\nAssistant: Then, I advise the chemical with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl."} {"text":"User: I want to design a molecule with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptor sites of 2 and a Wildman-Crippen LogP value computed using RDKit of 3.75.\nAssistant: Interesting, do you have some additional limitations?\nUser: Yes, I want the chemical formula to be C20H26FN3O2.\nAssistant: I advise the molecule with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F."}", "/scratch/micpie/export/rdkit_features/test_3-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the compound with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC is 7."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F is 4."}", "/scratch/micpie/export/rdkit_features/test_1-5.jsonl": "{"text":"The count of 
rotatable bonds of the chemical with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4 is 5."} {"text":"The rotatable bond count of the compound with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC is 6."}", "/scratch/micpie/export/rdkit_features/test_30-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the compound with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3 is 3.58."} {"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC is 3.76."}", "/scratch/micpie/export/rdkit_features/valid_14-0.jsonl": "{"text":"The chemical formula of the compound with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C is C20H28N4O."} {"text":"The molecular formula of the chemical with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O is C14H14BrNO3."}", "/scratch/micpie/export/rdkit_features/train_2-23.jsonl": "{"text":"User: I want to analyze a compound with a number of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 7 and a LogP value computed using the Wildman-Crippen method of 2.39.\nAssistant: That's interesting, do you have some additional constraints I should consider?\nUser: I want the chemical formula to be C22H24N5O2S+.\nAssistant: In that situation, I recommend the compound with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6."} {"text":"User: I want to create a chemical with a number of hydrogen bond donors of 2, a count of hydrogen bond acceptors of 5 and a LogP value computed using the Wildman-Crippen method of 4.28.\nAssistant: Cool, do you have some additional constraints?\nUser: Yes, I want the chemical formula to be C25H24N2O4.\nAssistant: In that situation, I recommend the chemical with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O."}", "/scratch/micpie/export/rdkit_features/test_31-23.jsonl": "{"text":"User: I want to analyze a chemical with a count of hydrogen bond donors of
0, a count of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value computed using RDKit of 3.74.\nAssistant: That's interesting, do you have some additional constraints I should take into account?\nUser: Yeah, I want the chemical formula to be C24H23F2N3O2.\nAssistant: In that situation, I recommend the chemical with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F."} {"text":"User: I want to make a compound with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value computed using RDKit of 2.43.\nAssistant: That is a very interesting question, do you have some additional constraints?\nUser: Yea, I want the chemical formula to be C20H27FN2O4.\nAssistant: In that scenario, I propose the compound with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3."}", "/scratch/micpie/export/rdkit_features/valid_25-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4 is 62.44."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3 is 58.87."}", "/scratch/micpie/export/rdkit_features/valid_119-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the compound with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O?\nAnswer: 9"} {"text":"Question: What is the number of heteroatoms of the molecule with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_114-3.jsonl": "{"text":"The number of heteroatoms of the compound with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C is 8."} {"text":"The number of heteroatoms of the molecule with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5 is 8."}", "/scratch/micpie/export/rdkit_features/train_12-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the compound with SMILES
C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C is 1."} {"text":"The number of hydrogen bond acceptors of the molecule with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC is 2."}", "/scratch/micpie/export/rdkit_features/test_105-11.jsonl": "{"text":"User: I want to create a chemical with a formula of C17H15BrN2O2.\nAssistant: Cool, do you have some additional conditions that help me narrow down the search?\nUser: I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 3.\nAssistant: In that scenario, I suggest the chemical with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N."} {"text":"User: I want to synthesize a compound with a chemical formula of C16H21N3O4.\nAssistant: That's interesting, do you have some additional requirements I should take into account?\nUser: Yeah, I want the count of hydrogen bond donors to be 2, the count of hydrogen bond acceptors to be 6.\nAssistant: In that situation, I recommend the compound with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O."}", "/scratch/micpie/export/rdkit_features/valid_1-23.jsonl": "{"text":"User: I want to analyze a chemical with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 7 and a LogP value computed using the Wildman-Crippen method of 3.79.\nAssistant: Nice, do you have some additional conditions I should consider?\nUser: Yeah, I want the formula to be C21H21N7OS.\nAssistant: In that scenario, I suggest the chemical with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5."} {"text":"User: I want to create a chemical with a count of hydrogen bond donors of 0, a number of hydrogen bond acceptor sites of 7 and a LogP value computed using the Wildman-Crippen method of 3.68.\nAssistant: Interesting, do you have some additional constraints I should take into account?\nUser: Yeah, I want the molecular formula to be C22H24FN5O2.\nAssistant: In that scenario, I recommend the chemical with SMILES
Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F."}", "/scratch/micpie/export/rdkit_features/valid_28-19.jsonl": "{"text":"Question: What is the number of acid groups of the compound with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the molecule with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_111-6.jsonl": "{"text":"The aromatic bond count of the molecule with SMILES CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C is 15."} {"text":"The count of aromatic bonds of the molecule with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC is 17."}", "/scratch/micpie/export/rdkit_features/train_102-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O?\nAnswer: 0"} {"text":"Question: What is the basic group count of the chemical with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_104-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the molecule with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 4"} {"text":"Question: What is the count of rotatable bonds of the compound with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_21-20.jsonl": "{"text":"Question: What is the number of basic groups of the chemical with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-]?\nAnswer: 1"} {"text":"Question: What is the basic group count of the compound with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_5-0.jsonl": "{"text":"The molecular formula of the molecule with SMILES 
Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C is C21H23N4O3S+."} {"text":"The formula of the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+] is C25H25N2O4+."}", "/scratch/micpie/export/rdkit_features/train_103-4.jsonl": "{"text":"The ring count of the compound with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1 is 3."} {"text":"The count of rings of the compound with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 3."}", "/scratch/micpie/export/rdkit_features/train_1-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4 is 7."} {"text":"The number of rotatable bonds of the compound with SMILES c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl is 3."}", "/scratch/micpie/export/rdkit_features/train_19-22.jsonl": "{"text":"User: I want to create a molecule with a count of hydrogen bond donors of 3 and a count of hydrogen bond acceptors of 4.\nAssistant: Do you have some additional conditions I should take into account?\nUser: I want the count of heteroatoms to be 7.\nAssistant: In that scenario, I suggest the molecule with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC."} {"text":"User: I want to create a chemical with a number of hydrogen bond donors of 2 and a number of hydrogen bond acceptors of 3.\nAssistant: Nice, do you have some additional limitations I should take into account?\nUser: Yea, I want the heteroatom count to be 6.\nAssistant: I recommend the chemical with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3."}", "/scratch/micpie/export/rdkit_features/train_23-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4?\nAnswer: 65.41"} {"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3?\nAnswer: 
41.87"}", "/scratch/micpie/export/rdkit_features/test_21-11.jsonl": "{"text":"User: I want to synthesize a molecule with a formula of C18H18NO6S-.\nAssistant: Do you have some additional limitations I should take into account?\nUser: I want the number of hydrogen bond donor sites to be 1, the number of hydrogen bond acceptors to be 6.\nAssistant: I propose the molecule with SMILES Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-]."} {"text":"User: I want to synthesize a chemical with a chemical formula of C20H32N4O3.\nAssistant: That's interesting, do you have some additional requirements that help me narrow down the search?\nUser: I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 5.\nAssistant: In that situation, I recommend the chemical with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O."}", "/scratch/micpie/export/rdkit_features/valid_100-5.jsonl": "{"text":"The rotatable bond count of the compound with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F is 5."} {"text":"The number of rotatable bonds of the compound with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4 is 3."}", "/scratch/micpie/export/rdkit_features/train_102-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the molecule with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O is 8."} {"text":"The count of hydrogen bond acceptors of the chemical with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1 is 3."}", "/scratch/micpie/export/rdkit_features/train_19-20.jsonl": "{"text":"Question: What is the number of basic groups of the molecule with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC?\nAnswer: 0"} {"text":"Question: What is the basic group count of the chemical with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_12-0.jsonl": "{"text":"The chemical formula of the
compound with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C is C17H21Cl2NO."} {"text":"The chemical formula of the chemical with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC is C21H32FNO2."}", "/scratch/micpie/export/rdkit_features/test_103-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1 is 3."} {"text":"The number of rotatable bonds of the compound with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 4."}", "/scratch/micpie/export/rdkit_features/test_28-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC?\nAnswer: 6"} {"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_29-11.jsonl": "{"text":"User: I want to create a chemical with a chemical formula of C20H26FN3O2.\nAssistant: Nice, do you have some additional constraints I should take into account?\nUser: I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 2.\nAssistant: I advise the chemical with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F."} {"text":"User: I want to design a compound with a chemical formula of C22H36N3O2+.\nAssistant: Nice, do you have some additional requirements?\nUser: Yes, I want the number of hydrogen bond donor sites to be 2, the count of hydrogen bond acceptors to be 2.\nAssistant: In that situation, I suggest the compound with SMILES CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C."}", "/scratch/micpie/export/rdkit_features/valid_4-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br is 3.94."} {"text":"The LogP value computed using the 
Wildman-Crippen method of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C is 0.88."}", "/scratch/micpie/export/rdkit_features/valid_5-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+]?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_12-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C is 3."} {"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4 is 2."}", "/scratch/micpie/export/rdkit_features/valid_112-3.jsonl": "{"text":"The heteroatom count of the compound with SMILES Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC is 10."} {"text":"The number of heteroatoms of the chemical with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N is 9."}", "/scratch/micpie/export/rdkit_features/train_7-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl is 1."} {"text":"The number of hydrogen bond donor sites of the molecule with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F is 2."}", "/scratch/micpie/export/rdkit_features/train_32-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C is 0.62."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2 is 0.85."}", "/scratch/micpie/export/rdkit_features/valid_109-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES 
C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl is 6."} {"text":"The rotatable bond count of the molecule with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F is 5."}", "/scratch/micpie/export/rdkit_features/valid_100-0.jsonl": "{"text":"The formula of the chemical with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F is C18H27FN3O+."} {"text":"The chemical formula of the compound with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4 is C20H24N3O+."}", "/scratch/micpie/export/rdkit_features/train_2-15.jsonl": "{"text":"Question: What is the heteroatom count of the compound with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6?\nAnswer: 8"} {"text":"Question: What is the heteroatom count of the compound with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_30-1.jsonl": "{"text":"The number of hydrogen bond donors of the compound with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3 is 1."} {"text":"The count of hydrogen bond donors of the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC is 0."}", "/scratch/micpie/export/rdkit_features/valid_113-1.jsonl": "{"text":"The number of hydrogen bond donors of the compound with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 3."} {"text":"The number of hydrogen bond donor sites of the compound with SMILES Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 3."}", "/scratch/micpie/export/rdkit_features/train_23-20.jsonl": "{"text":"Question: What is the count of basic groups of the chemical with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4?\nAnswer: 1"} {"text":"Question: What is the number of basic groups of the compound with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_28-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule 
with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC is 1."} {"text":"The count of hydrogen bond donors of the molecule with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C is 3."}", "/scratch/micpie/export/rdkit_features/train_4-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4?\nAnswer: 66.62"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C?\nAnswer: 67.32"}", "/scratch/micpie/export/rdkit_features/train_5-12.jsonl": "{"text":"Question: What is the formula of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C?\nAnswer: C23H29N2O3S+"} {"text":"Question: What is the chemical formula of the chemical with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C?\nAnswer: C21H32N4O4"}", "/scratch/micpie/export/rdkit_features/valid_111-22.jsonl": "{"text":"User: I want to make a molecule with a number of hydrogen bond donor sites of 1 and a count of hydrogen bond acceptors of 7.\nAssistant: That is a very interesting question, do you have some additional constraints?\nUser: Yep, I want the heteroatom count to be 8.\nAssistant: Then, I advise the molecule with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4."} {"text":"User: I want to analyze a molecule with a number of hydrogen bond donor sites of 0 and a number of hydrogen bond acceptors of 10.\nAssistant: Interesting, do you have some additional requirements that help me narrow down the search?\nUser: Indeed, I want the count of heteroatoms to be 10.\nAssistant: In that case, I recommend the molecule with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC."}", "/scratch/micpie/export/rdkit_features/valid_19-23.jsonl": "{"text":"User: I want to make a molecule with a number of hydrogen bond donors of 3, a count of hydrogen bond
acceptors of 4 and a Wildman-Crippen LogP value of 3.70.\nAssistant: Do you have some additional constraints I should consider?\nUser: Yes, I want the chemical formula to be C23H31N3O4.\nAssistant: In that case, I advise the molecule with SMILES C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC."} {"text":"User: I want to make a chemical with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 5 and a Wildman-Crippen LogP value of 3.99.\nAssistant: Interesting, do you have some additional limitations that help me narrow down the search?\nUser: Yes, I want the formula to be C18H19FN2O4S2.\nAssistant: In that scenario, I recommend the chemical with SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F."}", "/scratch/micpie/export/rdkit_features/valid_27-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the chemical with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F is 5."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C is 1."}", "/scratch/micpie/export/rdkit_features/valid_112-8.jsonl": "{"text":"The basic group count of the molecule with SMILES Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC is 0."} {"text":"The count of basic groups of the chemical with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N is 2."}", "/scratch/micpie/export/rdkit_features/test_15-11.jsonl": "{"text":"User: I want to analyze a molecule with a chemical formula of C13H14BrNO3.\nAssistant: Interesting, do you have some additional conditions that I should consider?\nUser: Yeah, I want the number of hydrogen bond donor sites to be 1, the number of hydrogen bond acceptor sites to be 4.\nAssistant: In that case, I propose the molecule with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O."} {"text":"User: I want to design a molecule with a formula of C16H26N3OS+.\nAssistant: That's interesting, do you have some additional conditions I should
take into account?\nUser: Yeah, I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptors to be 3.\nAssistant: In that situation, I recommend the molecule with SMILES C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2."}", "/scratch/micpie/export/rdkit_features/test_6-8.jsonl": "{"text":"The count of basic groups of the chemical with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3 is 0."} {"text":"The number of basic groups of the molecule with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C is 0."}", "/scratch/micpie/export/rdkit_features/train_119-0.jsonl": "{"text":"The formula of the compound with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO is C15H14FN3O5."} {"text":"The molecular formula of the molecule with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl is C19H23Cl2N5O2S."}", "/scratch/micpie/export/rdkit_features/valid_118-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O is 1.92."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO is 1.99."}", "/scratch/micpie/export/rdkit_features/valid_115-16.jsonl": "{"text":"Question: What is the ring count of the chemical with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4?\nAnswer: 4"} {"text":"Question: What is the ring count of the molecule with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_108-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O is 2.37."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C is 2.08."}", 
"/scratch/micpie/export/rdkit_features/test_0-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1?\nAnswer: 10"} {"text":"Question: What is the number of heteroatoms of the compound with SMILES Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_11-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the compound with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2?\nAnswer: 5"} {"text":"Question: What is the number of heteroatoms of the molecule with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_104-4.jsonl": "{"text":"The number of rings of the chemical with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 3."} {"text":"The number of rings of the compound with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl is 3."}", "/scratch/micpie/export/rdkit_features/test_4-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4?\nAnswer: 4"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_16-4.jsonl": "{"text":"The count of rings of the molecule with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C is 1."} {"text":"The number of rings of the compound with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N is 3."}", "/scratch/micpie/export/rdkit_features/train_28-7.jsonl": "{"text":"The acid group count of the chemical with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl is 0."} {"text":"The acid group count of the molecule with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F is 
0."}", "/scratch/micpie/export/rdkit_features/train_26-22.jsonl": "{"text":"User: I want to make a chemical with a number of hydrogen bond donors of 0 and a number of hydrogen bond acceptors of 2.\nAssistant: Cool, do you have some additional limitations I should take into account?\nUser: Yea, I want the heteroatom count to be 9.\nAssistant: In that situation, I recommend the chemical with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F."} {"text":"User: I want to create a compound with a count of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 3.\nAssistant: That's interesting, do you have some additional limitations that help me narrow down the search?\nUser: Yeah, I want the count of heteroatoms to be 5.\nAssistant: In that situation, I recommend the compound with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C."}", "/scratch/micpie/export/rdkit_features/test_21-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-]?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_109-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3 is 69.79."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C is 57.87."}", "/scratch/micpie/export/rdkit_features/train_15-7.jsonl": "{"text":"The number of acid groups of the chemical with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC is 0."} {"text":"The acid group count of the molecule with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C is 0."}", "/scratch/micpie/export/rdkit_features/test_30-21.jsonl": "{"text":"Question: What is the total sum of atomic
polarizabilities of the molecule with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3?\nAnswer: 58.53"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC?\nAnswer: 65.41"}", "/scratch/micpie/export/rdkit_features/valid_6-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+]?\nAnswer: 67.04"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F?\nAnswer: 62.38"}", "/scratch/micpie/export/rdkit_features/valid_13-23.jsonl": "{"text":"User: I want to design a chemical with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 3 and a LogP value computed using the Wildman-Crippen method of 4.53.\nAssistant: Interesting, do you have some additional requirements?\nUser: Indeed, I want the chemical formula to be C21H20FN3O.\nAssistant: In that situation, I recommend the chemical with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4."} {"text":"User: I want to design a chemical with a count of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 4 and a Wildman-Crippen LogP value computed using RDKit of 4.56.\nAssistant: Cool, do you have some additional requirements that help me narrow down the search?\nUser: I want the chemical formula to be C21H20N4.\nAssistant: I recommend the chemical with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C."}", "/scratch/micpie/export/rdkit_features/test_28-3.jsonl": "{"text":"The number of heteroatoms of the compound with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC is 7."} {"text":"The count of heteroatoms of the molecule with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C is 6."}", "/scratch/micpie/export/rdkit_features/valid_106-1.jsonl": "{"text":"The count of hydrogen bond
donors of the compound with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl is 1."} {"text":"The number of hydrogen bond donors of the chemical with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F is 2."}", "/scratch/micpie/export/rdkit_features/train_27-8.jsonl": "{"text":"The count of basic groups of the molecule with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C is 1."} {"text":"The basic group count of the molecule with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4 is 0."}", "/scratch/micpie/export/rdkit_features/test_9-0.jsonl": "{"text":"The chemical formula of the compound with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C is C23H36N3O3+."} {"text":"The chemical formula of the molecule with SMILES C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4 is C23H40N7+."}", "/scratch/micpie/export/rdkit_features/test_13-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 52.88"} {"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4?\nAnswer: 62.09"}", "/scratch/micpie/export/rdkit_features/test_28-8.jsonl": "{"text":"The number of basic groups of the molecule with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC is 1."} {"text":"The count of basic groups of the compound with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C is 0."}", "/scratch/micpie/export/rdkit_features/train_119-19.jsonl": "{"text":"Question: What is the number of acid groups of the molecule with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the chemical with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_117-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the 
compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(cc4)Oc5ccc(cn5)Cl is 0."} {"text":"The number of hydrogen bond donors of the chemical with SMILES C[C@@H](C(C)(C)OC)[NH2+]CCNc1ccccc1 is 2."}", "/scratch/micpie/export/rdkit_features/train_4-19.jsonl": "{"text":"Question: What is the number of acid groups of the compound with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4?\nAnswer: 0"} {"text":"Question: What is the acid group count of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_103-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_29-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl is 3."} {"text":"The number of hydrogen bond donor sites of the compound with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3 is 3."}", "/scratch/micpie/export/rdkit_features/test_119-3.jsonl": "{"text":"The count of heteroatoms of the molecule with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O is 9."} {"text":"The heteroatom count of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C is 8."}", "/scratch/micpie/export/rdkit_features/valid_118-6.jsonl": "{"text":"The count of aromatic bonds of the molecule with SMILES CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O is 6."} {"text":"The aromatic bond count of the compound with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO is 6."}", "/scratch/micpie/export/rdkit_features/train_5-4.jsonl": "{"text":"The count of rings of the molecule with SMILES 
Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C is 4."} {"text":"The ring count of the compound with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C is 3."}", "/scratch/micpie/export/rdkit_features/valid_10-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C?\nAnswer: 0"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_7-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC is 3.75."} {"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC is 3.90."}", "/scratch/micpie/export/rdkit_features/train_118-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F?\nAnswer: 1"} {"text":"Question: What is the ring count of the molecule with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_24-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5?\nAnswer: 6"} {"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/test_110-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3 is 69.89."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4 is 68.44."}", 
"/scratch/micpie/export/rdkit_features/valid_116-20.jsonl": "{"text":"Question: What is the number of basic groups of the compound with SMILES CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 0"} {"text":"Question: What is the basic group count of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_119-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the molecule with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO is 1.94."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl is 3.59."}", "/scratch/micpie/export/rdkit_features/test_2-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C is 0."} {"text":"The count of acid groups of the molecule with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4 is 0."}", "/scratch/micpie/export/rdkit_features/valid_112-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC?\nAnswer: 9"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_109-15.jsonl": "{"text":"Question: What is the heteroatom count of the chemical with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl?\nAnswer: 8"} {"text":"Question: What is the count of heteroatoms of the molecule with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N?\nAnswer: 10"}", "/scratch/micpie/export/rdkit_features/test_21-12.jsonl": "{"text":"Question: What is the formula of the molecule with SMILES Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-]?\nAnswer: 
C18H18NO6S-"} {"text":"Question: What is the formula of the molecule with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O?\nAnswer: C20H32N4O3"}", "/scratch/micpie/export/rdkit_features/train_14-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C?\nAnswer: 60.41"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O?\nAnswer: 40.53"}", "/scratch/micpie/export/rdkit_features/train_103-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the compound with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1 is 4.24."} {"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is -1.05."}", "/scratch/micpie/export/rdkit_features/valid_118-22.jsonl": "{"text":"User: I want to make a compound with a count of hydrogen bond donors of 2 and a count of hydrogen bond acceptors of 1.\nAssistant: Cool, do you have some additional limitations I should take into account?\nUser: Yes, I want the count of heteroatoms to be 3.\nAssistant: In that case, I recommend the compound with SMILES CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O."} {"text":"User: I want to synthesize a molecule with a number of hydrogen bond donor sites of 3 and a number of hydrogen bond acceptors of 5.\nAssistant: Cool, do you have some additional requirements?\nUser: Indeed, I want the heteroatom count to be 8.\nAssistant: In that situation, I suggest the molecule with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO."}", "/scratch/micpie/export/rdkit_features/train_12-20.jsonl": "{"text":"Question: What is the basic group count of the chemical with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the compound with SMILES
CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_112-5.jsonl": "{"text":"The number of rotatable bonds of the molecule with SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC is 7."} {"text":"The count of rotatable bonds of the compound with SMILES c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 8."}", "/scratch/micpie/export/rdkit_features/test_101-1.jsonl": "{"text":"The number of hydrogen bond donors of the chemical with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F is 1."} {"text":"The count of hydrogen bond donors of the chemical with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1 is 2."}", "/scratch/micpie/export/rdkit_features/train_112-20.jsonl": "{"text":"Question: What is the basic group count of the chemical with SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the chemical with SMILES c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_8-12.jsonl": "{"text":"Question: What is the molecular formula of the molecule with SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N?\nAnswer: C20H13F2N3O5"} {"text":"Question: What is the chemical formula of the molecule with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO?\nAnswer: C23H32NO4S+"}", "/scratch/micpie/export/rdkit_features/valid_9-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2 is 6."} {"text":"The number of rotatable bonds of the molecule with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4 is 8."}", "/scratch/micpie/export/rdkit_features/train_15-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC?\nAnswer: 50.59"} 
{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C?\nAnswer: 55.59"}", "/scratch/micpie/export/rdkit_features/test_104-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2 is -2.78."} {"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3 is 4.36."}", "/scratch/micpie/export/rdkit_features/train_8-22.jsonl": "{"text":"User: I want to create a compound with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptors of 6.\nAssistant: Do you have some additional constraints I should take into account?\nUser: Indeed, I want the heteroatom count to be 10.\nAssistant: I propose the compound with SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N."} {"text":"User: I want to make a compound with a number of hydrogen bond donors of 2 and a number of hydrogen bond acceptor sites of 4.\nAssistant: Do you have some additional limitations that help me narrow down the search?\nUser: Indeed, I want the heteroatom count to be 6.\nAssistant: In that scenario, I recommend the compound with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO."}", "/scratch/micpie/export/rdkit_features/train_8-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N?\nAnswer: 18"} {"text":"Question: What is the number of aromatic bonds of the compound with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/train_0-8.jsonl": "{"text":"The basic group count of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1 is 0."} {"text":"The number of basic groups of the compound with SMILES 
c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4 is 1."}", "/scratch/micpie/export/rdkit_features/valid_108-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O is 8."} {"text":"The number of rotatable bonds of the compound with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C is 7."}", "/scratch/micpie/export/rdkit_features/test_20-11.jsonl": "{"text":"User: I want to make a compound with a chemical formula of C19H24BrFN2O2.\nAssistant: Do you have some additional limitations I should consider?\nUser: Yea, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 2.\nAssistant: In that situation, I propose the compound with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br."} {"text":"User: I want to create a chemical with a formula of C17H15ClNO6S-.\nAssistant: That's interesting, do you have some additional conditions I should take into account?\nUser: Indeed, I want the count of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 6.\nAssistant: In that case, I recommend the chemical with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-]."}", "/scratch/micpie/export/rdkit_features/train_103-16.jsonl": "{"text":"Question: What is the number of rings of the molecule with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1?\nAnswer: 3"} {"text":"Question: What is the ring count of the molecule with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_22-6.jsonl": "{"text":"The count of aromatic bonds of the compound with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O is 5."} {"text":"The count of aromatic bonds of the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4 is 5."}", "/scratch/micpie/export/rdkit_features/train_115-1.jsonl": "{"text":"The number of 
hydrogen bond donors of the chemical with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F is 2."} {"text":"The number of hydrogen bond donor sites of the chemical with SMILES c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 1."}", "/scratch/micpie/export/rdkit_features/test_6-7.jsonl": "{"text":"The count of acid groups of the compound with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3 is 0."} {"text":"The number of acid groups of the chemical with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C is 0."}", "/scratch/micpie/export/rdkit_features/valid_116-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_116-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1?\nAnswer: 4"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_15-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O is 6."} {"text":"The number of rotatable bonds of the chemical with SMILES C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2 is 6."}", "/scratch/micpie/export/rdkit_features/test_103-11.jsonl": "{"text":"User: I want to create a chemical with a formula of C22H16N2O2.\nAssistant: That is a very interesting question, do you have some additional that I should consider?\nUser: Yes, I want the number of hydrogen 
bond donors to be 2, the count of hydrogen bond acceptors to be 4.\nAssistant: In that scenario, I the chemical with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1."} {"text":"User: I want to synthesize a molecule with a formula of C14H21N5O4.\nAssistant: Nice, do you have some additional limitations I should take into account?\nUser: Yes, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 7.\nAssistant: I advise the molecule with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3."}", "/scratch/micpie/export/rdkit_features/valid_101-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3 is 4."} {"text":"The rotatable bond count of the compound with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1 is 6."}", "/scratch/micpie/export/rdkit_features/valid_9-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the compound with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2?\nAnswer: 6"} {"text":"Question: What is the number of aromatic bonds of the molecule with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_0-5.jsonl": "{"text":"The rotatable bond count of the compound with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1 is 3."} {"text":"The rotatable bond count of the molecule with SMILES Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC is 6."}", "/scratch/micpie/export/rdkit_features/valid_115-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F?\nAnswer: 2"}", 
"/scratch/micpie/export/rdkit_features/train_7-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the compound with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_109-0.jsonl": "{"text":"The formula of the chemical with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl is C22H26ClN3O4."} {"text":"The chemical formula of the chemical with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N is C16H13BrN4O4S."}", "/scratch/micpie/export/rdkit_features/test_115-16.jsonl": "{"text":"Question: What is the number of rings of the molecule with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F?\nAnswer: 4"} {"text":"Question: What is the count of rings of the molecule with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_119-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO is 6."} {"text":"The count of rotatable bonds of the molecule with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl is 7."}", "/scratch/micpie/export/rdkit_features/test_8-3.jsonl": "{"text":"The number of heteroatoms of the chemical with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C is 8."} {"text":"The count of heteroatoms of the chemical with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F is 7."}", "/scratch/micpie/export/rdkit_features/valid_103-1.jsonl": "{"text":"The count of hydrogen bond donors of the chemical with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO is 4."} {"text":"The count of hydrogen bond donors of the compound with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N is 2."}", 
"/scratch/micpie/export/rdkit_features/valid_7-7.jsonl": "{"text":"The number of acid groups of the molecule with SMILES COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC is 0."} {"text":"The count of acid groups of the compound with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC is 0."}", "/scratch/micpie/export/rdkit_features/test_1-18.jsonl": "{"text":"Question: What is the aromatic bond count of the molecule with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4?\nAnswer: 17"} {"text":"Question: What is the number of aromatic bonds of the chemical with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC?\nAnswer: 22"}", "/scratch/micpie/export/rdkit_features/test_105-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_14-19.jsonl": "{"text":"Question: What is the count of acid groups of the compound with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the molecule with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_22-20.jsonl": "{"text":"Question: What is the count of basic groups of the chemical with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_111-1.jsonl": "{"text":"The number of hydrogen bond donors of the chemical with SMILES CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C is 0."} {"text":"The number of hydrogen bond donors of the 
chemical with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC is 0."}", "/scratch/micpie/export/rdkit_features/test_31-3.jsonl": "{"text":"The number of heteroatoms of the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F is 7."} {"text":"The number of heteroatoms of the molecule with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3 is 7."}", "/scratch/micpie/export/rdkit_features/train_29-3.jsonl": "{"text":"The heteroatom count of the compound with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F is 6."} {"text":"The number of heteroatoms of the molecule with SMILES CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C is 5."}", "/scratch/micpie/export/rdkit_features/valid_21-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the compound with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-] is 58.13."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3 is 66.43."}", "/scratch/micpie/export/rdkit_features/valid_116-8.jsonl": "{"text":"The count of basic groups of the chemical with SMILES CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 0."} {"text":"The count of basic groups of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl is 0."}", "/scratch/micpie/export/rdkit_features/train_23-11.jsonl": "{"text":"User: I want to create a compound with a chemical formula of C20H33N6O2+.\nAssistant: Interesting, do you have some additional limitations that I should consider?\nUser: Yep, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 7.\nAssistant: In that situation, I propose the compound with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4."} {"text":"User: I want to analyze a chemical with a chemical formula of C12H12IN6O-.\nAssistant: 
Nice, do you have some additional constraints that help me narrow down the search?\nUser: Indeed, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptors to be 4.\nAssistant: I propose the chemical with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3."}", "/scratch/micpie/export/rdkit_features/train_10-6.jsonl": "{"text":"The count of aromatic bonds of the molecule with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4 is 5."} {"text":"The aromatic bond count of the compound with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl is 6."}", "/scratch/micpie/export/rdkit_features/valid_28-23.jsonl": "{"text":"User: I want to create a chemical with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptor sites of 1 and a Wildman-Crippen LogP value computed using RDKit of 3.67.\nAssistant: Do you have some additional conditions that I should consider?\nUser: Yeah, I want the molecular formula to be C16H20BrFN2O.\nAssistant: In that situation, I propose the chemical with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br."} {"text":"User: I want to create a chemical with a number of hydrogen bond donors of 0, a count of hydrogen bond acceptors of 2 and a LogP value computed using the Wildman-Crippen method of 3.88.\nAssistant: Interesting, do you have some additional requirements I should take into account?\nUser: Yes, I want the formula to be C22H26N2O2.\nAssistant: In that case, I advise the chemical with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C."}", "/scratch/micpie/export/rdkit_features/test_11-0.jsonl": "{"text":"The formula of the chemical with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2 is C18H24F3NO."} {"text":"The chemical formula of the compound with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C is C22H31NO2."}", "/scratch/micpie/export/rdkit_features/test_119-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the molecule
with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O is 2."} {"text":"The count of hydrogen bond donors of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C is 1."}", "/scratch/micpie/export/rdkit_features/valid_23-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4 is 5."} {"text":"The count of hydrogen bond acceptors of the molecule with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4 is 6."}", "/scratch/micpie/export/rdkit_features/valid_102-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1?\nAnswer: 3"} {"text":"Question: What is the count of rings of the molecule with SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_103-7.jsonl": "{"text":"The acid group count of the compound with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1 is 0."} {"text":"The acid group count of the compound with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 0."}", "/scratch/micpie/export/rdkit_features/test_17-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F?\nAnswer: C20H29F3N2O3"} {"text":"Question: What is the molecular formula of the compound with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: C21H23N5O7"}", "/scratch/micpie/export/rdkit_features/test_29-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the chemical with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4?\nAnswer: 2"} {"text":"Question: What is the rotatable bond count of the molecule with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C?\nAnswer: 4"}", 
"/scratch/micpie/export/rdkit_features/train_18-20.jsonl": "{"text":"Question: What is the count of basic groups of the molecule with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O?\nAnswer: 0"} {"text":"Question: What is the basic group count of the molecule with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_110-6.jsonl": "{"text":"The count of aromatic bonds of the chemical with SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O is 22."} {"text":"The number of aromatic bonds of the chemical with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C is 15."}", "/scratch/micpie/export/rdkit_features/test_31-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F?\nAnswer: 3"} {"text":"Question: What is the number of rotatable bonds of the compound with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_18-6.jsonl": "{"text":"The count of aromatic bonds of the molecule with SMILES c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 12."} {"text":"The number of aromatic bonds of the compound with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O is 6."}", "/scratch/micpie/export/rdkit_features/train_28-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl?\nAnswer: 11"} {"text":"Question: What is the number of aromatic bonds of the chemical with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_5-11.jsonl": "{"text":"User: I want to synthesize a molecule with a chemical formula of C21H23N4O3S+.\nAssistant: Interesting, do you have some additional that I should 
consider?\nUser: I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 6.\nAssistant: In that scenario, I suggest the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C."} {"text":"User: I want to make a compound with a formula of C25H25N2O4+.\nAssistant: Cool, do you have some additional limitations that help me narrow down the search?\nUser: Yep, I want the count of hydrogen bond donors to be 3, the number of hydrogen bond acceptor sites to be 4.\nAssistant: In that situation, I advise the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+]."}", "/scratch/micpie/export/rdkit_features/train_6-19.jsonl": "{"text":"Question: What is the acid group count of the chemical with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the compound with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_120-15.jsonl": "{"text":"Question: What is the heteroatom count of the molecule with SMILES Cc1ccc(cc1)NC(=O)CSc2nnc(n2C)C=NC(=O)CCc3ccccc3Cl?\nAnswer: 9"} {"text":"Question: What is the count of heteroatoms of the chemical with SMILES Cc1c(cnn1c2ccccc2)N[C@@H](C)C(=O)Nc3cccc(c3Cl)Cl?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_120-3.jsonl": "{"text":"The number of heteroatoms of the compound with SMILES c1ccc(cc1)CCN(CC(=O)Nc2ccc(cc2)Cl)S(=O)(=O)c3ccc4c(c3)OCCO4 is 9."} {"text":"The count of heteroatoms of the chemical with SMILES Cc1c(cnn1c2ccccc2)N[C@H](C)C(=O)Nc3cccc(c3Cl)Cl is 7."}", "/scratch/micpie/export/rdkit_features/train_118-11.jsonl": "{"text":"User: I want to design a compound with a chemical formula of C14H23FNO+.\nAssistant: Interesting, do you have some additional requirements that help me narrow down the search?\nUser: Yeah, I want the number of hydrogen bond donor sites to be 2, the count of hydrogen bond 
acceptors to be 1.\nAssistant: In that situation, I recommend the compound with SMILES Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F."} {"text":"User: I want to design a chemical with a chemical formula of C15H18FN3O5.\nAssistant: Nice, do you have some additional constraints that help me narrow down the search?\nUser: Indeed, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 6.\nAssistant: In that situation, I recommend the chemical with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3."}", "/scratch/micpie/export/rdkit_features/valid_23-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4 is 65.31."} {"text":"The total sum of atomic polarizabilities of the molecule with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4 is 58.75."}", "/scratch/micpie/export/rdkit_features/train_102-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O?\nAnswer: 2"} {"text":"Question: What is the count of rotatable bonds of the molecule with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_101-23.jsonl": "{"text":"User: I want to design a compound with a count of hydrogen bond donors of 2, a count of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value of 2.80.\nAssistant: Nice, do you have some additional constraints I should take into account?\nUser: I want the formula to be C15H13ClN4O2.\nAssistant: I propose the compound with SMILES COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3."} {"text":"User: I want to analyze a chemical with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 3 and a Wildman-Crippen LogP value computed using RDKit of 5.05.\nAssistant: Interesting, do you have some additional conditions I should 
take into account?\nUser: Indeed, I want the formula to be C25H23F2N3O2.\nAssistant: I advise the chemical with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1."}", "/scratch/micpie/export/rdkit_features/train_116-23.jsonl": "{"text":"User: I want to analyze a compound with a count of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 3 and a LogP value computed using the Wildman-Crippen method of 5.01.\nAssistant: Interesting, do you have some additional constraints?\nUser: Yep, I want the formula to be C27H25ClFN3O2.\nAssistant: In that situation, I suggest the compound with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F."} {"text":"User: I want to make a molecule with a count of hydrogen bond donors of 0, a count of hydrogen bond acceptors of 3 and a LogP value computed using the Wildman-Crippen method of 5.13.\nAssistant: Cool, do you have some additional conditions that I should consider?\nUser: Yes, I want the chemical formula to be C26H25BrN2O3.\nAssistant: In that situation, I suggest the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br."}", "/scratch/micpie/export/rdkit_features/valid_1-19.jsonl": "{"text":"Question: What is the count of acid groups of the compound with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the chemical with SMILES Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_29-8.jsonl": "{"text":"The count of basic groups of the compound with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F is 0."} {"text":"The basic group count of the compound with SMILES CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C is 1."}", "/scratch/micpie/export/rdkit_features/train_13-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES 
CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC is 8."} {"text":"The count of rotatable bonds of the molecule with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C is 5."}", "/scratch/micpie/export/rdkit_features/train_27-22.jsonl": "{"text":"User: I want to design a compound with a count of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 3.\nAssistant: Do you have some additional limitations that I should consider?\nUser: Yeah, I want the heteroatom count to be 5.\nAssistant: In that case, I suggest the compound with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C."} {"text":"User: I want to create a chemical with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 4.\nAssistant: Cool, do you have some additional constraints?\nUser: Indeed, I want the count of heteroatoms to be 6.\nAssistant: Then, I suggest the chemical with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4."}", "/scratch/micpie/export/rdkit_features/test_19-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl is 62.98."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4 is 69.09."}", "/scratch/micpie/export/rdkit_features/valid_102-11.jsonl": "{"text":"User: I want to analyze a compound with a chemical formula of C23H21F2N3O2.\nAssistant: Interesting, do you have some additional conditions that I should consider?\nUser: Yep, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 3.\nAssistant: Then, I suggest the compound with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1."} {"text":"User: I want to design a molecule with a chemical formula of C48H76N6O4.\nAssistant: That is a very interesting question, do you have some additional limitations I should take into account?\nUser: Indeed, I want the count of 
hydrogen bond donors to be 4, the number of hydrogen bond acceptor sites to be 6.\nAssistant: In that situation, I propose the molecule with SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2."}", "/scratch/micpie/export/rdkit_features/test_12-7.jsonl": "{"text":"The count of acid groups of the molecule with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C is 0."} {"text":"The count of acid groups of the chemical with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4 is 0."}", "/scratch/micpie/export/rdkit_features/valid_118-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O is 42.44."} {"text":"The sum of atomic polarizabilities of the compound with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO is 46.80."}", "/scratch/micpie/export/rdkit_features/train_109-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3?\nAnswer: C23H33N3O5"} {"text":"Question: What is the chemical formula of the chemical with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C?\nAnswer: C16H18N6O3S3"}", "/scratch/micpie/export/rdkit_features/valid_32-22.jsonl": "{"text":"User: I want to synthesize a molecule with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 6.\nAssistant: That's interesting, do you have some additional conditions that I should consider?\nUser: Indeed, I want the number of heteroatoms to be 8.\nAssistant: In that scenario, I the molecule with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3."} {"text":"User: I want to design a molecule with a count of hydrogen bond donors of 2 and a number of hydrogen bond acceptor sites of 5.\nAssistant: Nice, do you have some additional constraints I should take into account?\nUser: I want the heteroatom count to be 9.\nAssistant: In that case, I propose the molecule with 
SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O."}", "/scratch/micpie/export/rdkit_features/valid_118-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O?\nAnswer: C14H23FNO+"} {"text":"Question: What is the chemical formula of the compound with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO?\nAnswer: C15H20FN3O4"}", "/scratch/micpie/export/rdkit_features/test_107-7.jsonl": "{"text":"The count of acid groups of the compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC is 0."} {"text":"The count of acid groups of the molecule with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC is 0."}", "/scratch/micpie/export/rdkit_features/valid_115-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the compound with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4 is 8."} {"text":"The count of hydrogen bond acceptors of the molecule with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F is 2."}", "/scratch/micpie/export/rdkit_features/test_4-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4 is 61.70."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3 is 67.32."}", "/scratch/micpie/export/rdkit_features/valid_22-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O?\nAnswer: 3"} {"text":"Question: What is the rotatable bond count of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_101-4.jsonl": "{"text":"The count of rings of the compound with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F is 2."} {"text":"The count of rings 
of the compound with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1 is 5."}", "/scratch/micpie/export/rdkit_features/valid_0-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1 is 57.40."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4 is 69.89."}", "/scratch/micpie/export/rdkit_features/test_112-3.jsonl": "{"text":"The count of heteroatoms of the molecule with SMILES C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC is 10."} {"text":"The count of heteroatoms of the compound with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N is 9."}", "/scratch/micpie/export/rdkit_features/train_16-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C?\nAnswer: 6"} {"text":"Question: What is the count of rotatable bonds of the compound with SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_100-16.jsonl": "{"text":"Question: What is the number of rings of the compound with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C?\nAnswer: 3"} {"text":"Question: What is the ring count of the molecule with SMILES COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_19-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC is 3.70."} {"text":"The Wildman-Crippen LogP value of the chemical with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3 is 3.95."}", "/scratch/micpie/export/rdkit_features/test_18-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the compound with SMILES 
CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O is 0.35."} {"text":"The Wildman-Crippen LogP value of the chemical with SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C is 3.51."}", "/scratch/micpie/export/rdkit_features/train_20-16.jsonl": "{"text":"Question: What is the count of rings of the chemical with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3?\nAnswer: 3"} {"text":"Question: What is the count of rings of the molecule with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_111-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4 is 17."} {"text":"The count of aromatic bonds of the chemical with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC is 17."}", "/scratch/micpie/export/rdkit_features/train_114-17.jsonl": "{"text":"Question: What is the rotatable bond count of the compound with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C?\nAnswer: 5"} {"text":"Question: What is the count of rotatable bonds of the molecule with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_118-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F?\nAnswer: 6"} {"text":"Question: What is the aromatic bond count of the chemical with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_19-7.jsonl": "{"text":"The acid group count of the molecule with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl is 0."} {"text":"The count of acid groups of the molecule with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4 is 0."}", "/scratch/micpie/export/rdkit_features/train_17-10.jsonl": "{"text":"The 
Wildman-Crippen LogP value of the compound with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F is 2.82."} {"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 0.93."}", "/scratch/micpie/export/rdkit_features/train_11-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the molecule with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl?\nAnswer: 3"} {"text":"Question: What is the rotatable bond count of the chemical with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_17-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the molecule with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_22-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O is 64.67."} {"text":"The total sum of atomic polarizabilities of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4 is 71.03."}", "/scratch/micpie/export/rdkit_features/train_106-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O is 6."} {"text":"The aromatic bond count of the compound with SMILES CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl is 12."}", "/scratch/micpie/export/rdkit_features/test_108-11.jsonl": "{"text":"User: I want to analyze a compound with a chemical formula of C21H19N9OS.\nAssistant: Do you have some additional conditions I should consider?\nUser: I want the number of hydrogen bond donors 
to be 2, the count of hydrogen bond acceptors to be 10.\nAssistant: In that case, I recommend the compound with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5."} {"text":"User: I want to design a molecule with a molecular formula of C23H29N5O4.\nAssistant: Cool, do you have some additional constraints I should consider?\nUser: Yea, I want the count of hydrogen bond donors to be 3, the number of hydrogen bond acceptors to be 5.\nAssistant: I suggest the molecule with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3."}", "/scratch/micpie/export/rdkit_features/valid_115-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4 is 5.05."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F is 5.31."}", "/scratch/micpie/export/rdkit_features/valid_5-8.jsonl": "{"text":"The number of basic groups of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C is 1."} {"text":"The number of basic groups of the molecule with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+] is 1."}", "/scratch/micpie/export/rdkit_features/test_0-19.jsonl": "{"text":"Question: What is the count of acid groups of the chemical with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1?\nAnswer: 1"} {"text":"Question: What is the acid group count of the chemical with SMILES Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_31-7.jsonl": "{"text":"The acid group count of the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F is 0."} {"text":"The number of acid groups of the chemical with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3 is 0."}", 
"/scratch/micpie/export/rdkit_features/valid_113-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 5"} {"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_5-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C?\nAnswer: 62.00"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+]?\nAnswer: 66.08"}", "/scratch/micpie/export/rdkit_features/test_116-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1 is 0."} {"text":"The number of acid groups of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl is 0."}", "/scratch/micpie/export/rdkit_features/test_18-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_12-7.jsonl": "{"text":"The number of acid groups of the compound with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C is 0."} {"text":"The count of acid groups of the compound with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC is 0."}", "/scratch/micpie/export/rdkit_features/test_7-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES 
C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_26-7.jsonl": "{"text":"The acid group count of the chemical with SMILES CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2 is 0."} {"text":"The count of acid groups of the molecule with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC is 0."}", "/scratch/micpie/export/rdkit_features/test_20-4.jsonl": "{"text":"The ring count of the compound with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br is 4."} {"text":"The ring count of the molecule with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-] is 2."}", "/scratch/micpie/export/rdkit_features/valid_118-16.jsonl": "{"text":"Question: What is the count of rings of the chemical with SMILES CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O?\nAnswer: 1"} {"text":"Question: What is the ring count of the chemical with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_100-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the molecule with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F?\nAnswer: 5"} {"text":"Question: What is the count of heteroatoms of the chemical with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_103-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1 is C22H16N2O2."} {"text":"The formula of the compound with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is C14H21N5O4."}", "/scratch/micpie/export/rdkit_features/test_101-8.jsonl": "{"text":"The basic group count of the compound with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F is 0."} {"text":"The count of basic groups of the chemical with 
SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1 is 0."}", "/scratch/micpie/export/rdkit_features/test_118-20.jsonl": "{"text":"Question: What is the number of basic groups of the molecule with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O?\nAnswer: 1"} {"text":"Question: What is the number of basic groups of the molecule with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_30-1.jsonl": "{"text":"The number of hydrogen bond donors of the compound with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3 is 1."} {"text":"The count of hydrogen bond donors of the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F is 0."}", "/scratch/micpie/export/rdkit_features/train_102-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O?\nAnswer: 8"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_20-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the chemical with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3?\nAnswer: 10"} {"text":"Question: What is the rotatable bond count of the compound with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-]?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_115-19.jsonl": "{"text":"Question: What is the count of acid groups of the molecule with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the chemical with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_117-4.jsonl": "{"text":"The 
ring count of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4OC)Cl is 5."} {"text":"The number of rings of the chemical with SMILES Cc1cc(ccc1[C@@H](C)[NH2+]CC2(CC2)CO)F is 2."}", "/scratch/micpie/export/rdkit_features/valid_11-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the compound with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl is 1."} {"text":"The number of hydrogen bond donor sites of the compound with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C is 0."}", "/scratch/micpie/export/rdkit_features/test_10-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4?\nAnswer: 8"} {"text":"Question: What is the number of rotatable bonds of the molecule with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/train_107-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl?\nAnswer: 1"} {"text":"Question: What is the count of basic groups of the compound with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_31-8.jsonl": "{"text":"The number of basic groups of the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F is 0."} {"text":"The number of basic groups of the molecule with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3 is 0."}", "/scratch/micpie/export/rdkit_features/test_22-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O is 3."} {"text":"The number of rotatable bonds of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4 is 6."}", "/scratch/micpie/export/rdkit_features/train_105-23.jsonl": "{"text":"User: I 
want to design a compound with a number of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 5 and a LogP value computed using the Wildman-Crippen method of 4.30.\nAssistant: Interesting, do you have some additional limitations that I should consider?\nUser: Yep, I want the formula to be C21H15N3O4.\nAssistant: In that situation, I suggest the compound with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N."} {"text":"User: I want to synthesize a molecule with a number of hydrogen bond donors of 2, a count of hydrogen bond acceptors of 6 and a Wildman-Crippen LogP value of 2.49.\nAssistant: Cool, do you have some additional requirements I should consider?\nUser: Yeah, I want the chemical formula to be C16H21N3O4.\nAssistant: In that case, I advise the molecule with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O."}", "/scratch/micpie/export/rdkit_features/test_22-23.jsonl": "{"text":"User: I want to create a compound with a number of hydrogen bond donor sites of 1, a count of hydrogen bond acceptors of 4 and a LogP value computed using the Wildman-Crippen method of 2.26.\nAssistant: Cool, do you have some additional requirements that I should consider?\nUser: I want the chemical formula to be C21H33N3O3.\nAssistant: I propose the compound with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O."} {"text":"User: I want to design a chemical with a number of hydrogen bond donor sites of 2, a count of hydrogen bond acceptors of 5 and a LogP value computed using the Wildman-Crippen method of -0.65.\nAssistant: Do you have some additional constraints I should consider?\nUser: I want the formula to be C21H40N6O+2.\nAssistant: In that situation, I advise the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4."}", "/scratch/micpie/export/rdkit_features/valid_23-20.jsonl": "{"text":"Question: What is the number of basic groups of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4?\nAnswer: 1"} 
{"text":"Question: What is the count of basic groups of the compound with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_20-1.jsonl": "{"text":"The number of hydrogen bond donors of the compound with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3 is 2."} {"text":"The count of hydrogen bond donors of the molecule with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-] is 1."}", "/scratch/micpie/export/rdkit_features/train_18-7.jsonl": "{"text":"The acid group count of the compound with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O is 0."} {"text":"The acid group count of the compound with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl is 0."}", "/scratch/micpie/export/rdkit_features/valid_28-22.jsonl": "{"text":"User: I want to create a chemical with a number of hydrogen bond donor sites of 2 and a count of hydrogen bond acceptors of 1.\nAssistant: Do you have some additional conditions I should consider?\nUser: Yes, I want the count of heteroatoms to be 5.\nAssistant: Then, I advise the chemical with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br."} {"text":"User: I want to create a compound with a number of hydrogen bond donor sites of 0 and a count of hydrogen bond acceptors of 2.\nAssistant: Interesting, do you have some additional conditions I should take into account?\nUser: Yea, I want the heteroatom count to be 4.\nAssistant: In that situation, I propose the compound with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C."}", "/scratch/micpie/export/rdkit_features/train_23-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the molecule with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4 is 0.74."} {"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3 is 1.77."}", 
"/scratch/micpie/export/rdkit_features/train_114-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C?\nAnswer: 64.69"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F?\nAnswer: 71.42"}", "/scratch/micpie/export/rdkit_features/test_7-5.jsonl": "{"text":"The number of rotatable bonds of the compound with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC is 6."} {"text":"The number of rotatable bonds of the molecule with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC is 5."}", "/scratch/micpie/export/rdkit_features/test_18-8.jsonl": "{"text":"The count of basic groups of the molecule with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O is 0."} {"text":"The count of basic groups of the molecule with SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C is 0."}", "/scratch/micpie/export/rdkit_features/valid_101-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the molecule with SMILES COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3?\nAnswer: 4"} {"text":"Question: What is the number of rotatable bonds of the compound with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_111-12.jsonl": "{"text":"Question: What is the formula of the molecule with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4?\nAnswer: C22H34N5O3S+"} {"text":"Question: What is the chemical formula of the compound with SMILES C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC?\nAnswer: C22H31N5O4"}", "/scratch/micpie/export/rdkit_features/test_106-7.jsonl": "{"text":"The acid group count of the molecule with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O is 0."} {"text":"The count of acid groups of the 
compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC is 0."}", "/scratch/micpie/export/rdkit_features/train_105-6.jsonl": "{"text":"The count of aromatic bonds of the compound with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N is 18."} {"text":"The number of aromatic bonds of the compound with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O is 6."}", "/scratch/micpie/export/rdkit_features/test_31-12.jsonl": "{"text":"Question: What is the molecular formula of the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F?\nAnswer: C24H23F2N3O2"} {"text":"Question: What is the formula of the compound with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3?\nAnswer: C20H27FN2O4"}", "/scratch/micpie/export/rdkit_features/test_104-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2 is 0."} {"text":"The count of aromatic bonds of the molecule with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3 is 18."}", "/scratch/micpie/export/rdkit_features/test_26-23.jsonl": "{"text":"User: I want to design a compound with a number of hydrogen bond donor sites of 0, a number of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value computed using RDKit of 3.66.\nAssistant: That's interesting, do you have some additional conditions I should consider?\nUser: Yea, I want the chemical formula to be C18H22F3NO3.\nAssistant: In that scenario, I propose the compound with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3."} {"text":"User: I want to design a molecule with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptor sites of 3 and a Wildman-Crippen LogP value of 3.75.\nAssistant: That's interesting, do you have some additional that help me narrow down the search?\nUser: Yep, I want the formula to be C15H13BrClNO3.\nAssistant: In that situation, I the molecule with SMILES 
COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br."}", "/scratch/micpie/export/rdkit_features/train_26-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F?\nAnswer: 2"} {"text":"Question: What is the ring count of the molecule with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_10-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the chemical with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C?\nAnswer: 10"} {"text":"Question: What is the heteroatom count of the chemical with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_102-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the chemical with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3 is 4."} {"text":"The count of hydrogen bond acceptors of the compound with SMILES Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1 is 6."}", "/scratch/micpie/export/rdkit_features/train_16-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the compound with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C is 3."} {"text":"The count of hydrogen bond acceptors of the molecule with SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F is 3."}", "/scratch/micpie/export/rdkit_features/test_14-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the compound with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3 is 3."} {"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O is 4."}", "/scratch/micpie/export/rdkit_features/valid_16-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the chemical with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C?\nAnswer: 9"} {"text":"Question: What is the number of rotatable bonds of 
the compound with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/valid_27-11.jsonl": "{"text":"User: I want to synthesize a compound with a chemical formula of C14H18F3N5OS.\nAssistant: Nice, do you have some additional that I should consider?\nUser: Yep, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 5.\nAssistant: Then, I suggest the compound with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F."} {"text":"User: I want to make a compound with a molecular formula of C20H31ClN3O+.\nAssistant: Nice, do you have some additional constraints I should consider?\nUser: Yea, I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 1.\nAssistant: In that situation, I propose the compound with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C."}", "/scratch/micpie/export/rdkit_features/train_104-5.jsonl": "{"text":"The number of rotatable bonds of the molecule with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 4."} {"text":"The count of rotatable bonds of the compound with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl is 3."}", "/scratch/micpie/export/rdkit_features/test_10-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4?\nAnswer: 74.79"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F?\nAnswer: 51.15"}", "/scratch/micpie/export/rdkit_features/train_109-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES 
CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_19-7.jsonl": "{"text":"The acid group count of the molecule with SMILES C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC is 0."} {"text":"The number of acid groups of the compound with SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F is 0."}", "/scratch/micpie/export/rdkit_features/valid_105-5.jsonl": "{"text":"The number of rotatable bonds of the molecule with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F is 5."} {"text":"The number of rotatable bonds of the molecule with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O is 2."}", "/scratch/micpie/export/rdkit_features/train_19-3.jsonl": "{"text":"The heteroatom count of the molecule with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC is 7."} {"text":"The number of heteroatoms of the compound with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3 is 6."}", "/scratch/micpie/export/rdkit_features/test_28-5.jsonl": "{"text":"The number of rotatable bonds of the compound with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC is 5."} {"text":"The count of rotatable bonds of the chemical with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C is 4."}", "/scratch/micpie/export/rdkit_features/train_119-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond donors of the molecule with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_25-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the chemical with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C?\nAnswer: 9"} {"text":"Question: What is the heteroatom count of the molecule with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F?\nAnswer: 
9"}", "/scratch/micpie/export/rdkit_features/test_106-16.jsonl": "{"text":"Question: What is the number of rings of the molecule with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O?\nAnswer: 2"} {"text":"Question: What is the count of rings of the compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_14-22.jsonl": "{"text":"User: I want to make a molecule with a count of hydrogen bond donors of 2 and a number of hydrogen bond acceptor sites of 3.\nAssistant: Do you have some additional constraints I should take into account?\nUser: Yea, I want the number of heteroatoms to be 5.\nAssistant: I advise the molecule with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C."} {"text":"User: I want to make a chemical with a count of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 4.\nAssistant: That is a very interesting question, do you have some additional requirements?\nUser: I want the number of heteroatoms to be 5.\nAssistant: In that situation, I the chemical with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O."}", "/scratch/micpie/export/rdkit_features/test_24-0.jsonl": "{"text":"The chemical formula of the compound with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl is C14H7Cl2N8O-."} {"text":"The formula of the compound with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C is C21H39N6O+."}", "/scratch/micpie/export/rdkit_features/train_100-0.jsonl": "{"text":"The formula of the chemical with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C is C19H33N2O2+."} {"text":"The molecular formula of the compound with SMILES COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3 is C17H15N3O3."}", "/scratch/micpie/export/rdkit_features/valid_103-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO?\nAnswer: C22H36O7"} 
{"text":"Question: What is the formula of the compound with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N?\nAnswer: C14H14N6O3"}", "/scratch/micpie/export/rdkit_features/train_13-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the molecule with SMILES CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC is 4.74."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C is 4.83."}", "/scratch/micpie/export/rdkit_features/valid_9-1.jsonl": "{"text":"The count of hydrogen bond donors of the chemical with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2 is 1."} {"text":"The count of hydrogen bond donors of the chemical with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4 is 1."}", "/scratch/micpie/export/rdkit_features/train_13-3.jsonl": "{"text":"The heteroatom count of the molecule with SMILES CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC is 4."} {"text":"The count of heteroatoms of the compound with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C is 5."}", "/scratch/micpie/export/rdkit_features/test_102-8.jsonl": "{"text":"The count of basic groups of the chemical with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3 is 1."} {"text":"The number of basic groups of the molecule with SMILES Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1 is 0."}", "/scratch/micpie/export/rdkit_features/train_106-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O?\nAnswer: 39.71"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl?\nAnswer: 63.49"}", "/scratch/micpie/export/rdkit_features/valid_104-16.jsonl": "{"text":"Question: What is the count of rings of the chemical with SMILES 
CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3?\nAnswer: 3"} {"text":"Question: What is the number of rings of the molecule with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_102-15.jsonl": "{"text":"Question: What is the heteroatom count of the chemical with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3?\nAnswer: 4"} {"text":"Question: What is the number of heteroatoms of the molecule with SMILES Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/test_106-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O?\nAnswer: 0"} {"text":"Question: What is the acid group count of the compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_20-20.jsonl": "{"text":"Question: What is the number of basic groups of the chemical with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the compound with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-]?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_23-20.jsonl": "{"text":"Question: What is the number of basic groups of the compound with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4?\nAnswer: 1"} {"text":"Question: What is the count of basic groups of the molecule with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_15-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O?\nAnswer: C13H14BrNO3"} {"text":"Question: What is the chemical formula of the chemical with SMILES C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2?\nAnswer: 
C16H26N3OS+"}", "/scratch/micpie/export/rdkit_features/test_119-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O?\nAnswer: 6"} {"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_26-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the molecule with SMILES COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_2-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the compound with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C?\nAnswer: 11"} {"text":"Question: What is the number of aromatic bonds of the molecule with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4?\nAnswer: 18"}", "/scratch/micpie/export/rdkit_features/train_30-12.jsonl": "{"text":"Question: What is the chemical formula of the compound with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F?\nAnswer: C19H27F3N2O2"} {"text":"Question: What is the chemical formula of the molecule with SMILES CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC?\nAnswer: C24H27FN2O3"}", "/scratch/micpie/export/rdkit_features/test_17-19.jsonl": "{"text":"Question: What is the count of acid groups of the chemical with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the chemical with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_104-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the chemical 
with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2?\nAnswer: 5"} {"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_110-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O is 7."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C is 8."}", "/scratch/micpie/export/rdkit_features/valid_12-3.jsonl": "{"text":"The number of heteroatoms of the chemical with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl is 4."} {"text":"The number of heteroatoms of the compound with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3 is 5."}", "/scratch/micpie/export/rdkit_features/test_23-8.jsonl": "{"text":"The basic group count of the compound with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4 is 1."} {"text":"The number of basic groups of the molecule with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4 is 1."}", "/scratch/micpie/export/rdkit_features/valid_111-8.jsonl": "{"text":"The basic group count of the compound with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4 is 1."} {"text":"The count of basic groups of the chemical with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC is 0."}", "/scratch/micpie/export/rdkit_features/train_26-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F?\nAnswer: 6"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/test_116-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the compound with 
SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_114-12.jsonl": "{"text":"Question: What is the formula of the chemical with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C?\nAnswer: C21H19Cl2N3O2S2"} {"text":"Question: What is the chemical formula of the chemical with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F?\nAnswer: C27H26FN4O2+"}", "/scratch/micpie/export/rdkit_features/train_4-11.jsonl": "{"text":"User: I want to design a molecule with a formula of C24H28N3O3+.\nAssistant: Do you have some additional requirements I should take into account?\nUser: I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 5.\nAssistant: Then, I advise the molecule with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4."} {"text":"User: I want to synthesize a chemical with a molecular formula of C23H29N2O3S+.\nAssistant: That is a very interesting question, do you have some additional requirements that I should consider?\nUser: Indeed, I want the number of hydrogen bond donor sites to be 2, the count of hydrogen bond acceptors to be 4.\nAssistant: Then, I suggest the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C."}", "/scratch/micpie/export/rdkit_features/valid_2-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the compound with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4 is 62.67."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3 is 64.53."}", "/scratch/micpie/export/rdkit_features/valid_118-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES 
CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O?\nAnswer: 42.44"} {"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO?\nAnswer: 46.80"}", "/scratch/micpie/export/rdkit_features/train_20-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3 is 71.71."} {"text":"The sum of atomic polarizabilities of the compound with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC is 49.29."}", "/scratch/micpie/export/rdkit_features/train_28-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl?\nAnswer: C14H9BrClNO2S"} {"text":"Question: What is the chemical formula of the compound with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F?\nAnswer: C20H26FN3O2"}", "/scratch/micpie/export/rdkit_features/valid_27-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the chemical with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F?\nAnswer: 10"} {"text":"Question: What is the number of heteroatoms of the chemical with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_14-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the compound with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C?\nAnswer: 5"} {"text":"Question: What is the number of heteroatoms of the molecule with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_20-19.jsonl": "{"text":"Question: What is the acid group count of the chemical with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the compound with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC?\nAnswer: 1"}", 
"/scratch/micpie/export/rdkit_features/valid_31-23.jsonl": "{"text":"User: I want to create a chemical with a number of hydrogen bond donor sites of 0, a number of hydrogen bond acceptor sites of 3 and a Wildman-Crippen LogP value of 3.59.\nAssistant: Cool, do you have some additional constraints that help me narrow down the search?\nUser: Yeah, I want the molecular formula to be C24H27FN2O3.\nAssistant: In that case, I recommend the chemical with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC."} {"text":"User: I want to synthesize a compound with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 4 and a Wildman-Crippen LogP value computed using RDKit of 2.43.\nAssistant: That's interesting, do you have some additional limitations I should take into account?\nUser: Indeed, I want the formula to be C20H27FN2O4.\nAssistant: In that scenario, I the compound with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3."}", "/scratch/micpie/export/rdkit_features/valid_23-23.jsonl": "{"text":"User: I want to analyze a compound with a number of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 5 and a Wildman-Crippen LogP value computed using RDKit of 0.96.\nAssistant: Nice, do you have some additional requirements that help me narrow down the search?\nUser: Yeah, I want the chemical formula to be C19H34N5OS+.\nAssistant: In that case, I advise the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4."} {"text":"User: I want to analyze a chemical with a count of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 6 and a LogP value computed using the Wildman-Crippen method of 1.79.\nAssistant: Do you have some additional limitations that I should consider?\nUser: Yep, I want the chemical formula to be C19H24N7O2-.\nAssistant: In that case, I propose the chemical with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4."}", 
"/scratch/micpie/export/rdkit_features/test_108-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5 is C21H19N9OS."} {"text":"The chemical formula of the molecule with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3 is C23H29N5O4."}", "/scratch/micpie/export/rdkit_features/test_6-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3 is 6."} {"text":"The number of aromatic bonds of the compound with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C is 17."}", "/scratch/micpie/export/rdkit_features/train_1-16.jsonl": "{"text":"Question: What is the number of rings of the molecule with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4?\nAnswer: 4"} {"text":"Question: What is the ring count of the chemical with SMILES c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_29-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl is C20H31ClN3O+."} {"text":"The chemical formula of the compound with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3 is C21H25F2N2O2+."}", "/scratch/micpie/export/rdkit_features/valid_30-5.jsonl": "{"text":"The number of rotatable bonds of the molecule with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3 is 4."} {"text":"The rotatable bond count of the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F is 3."}", "/scratch/micpie/export/rdkit_features/train_12-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C is 3."} {"text":"The count of rotatable bonds of the chemical with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC is 8."}", 
"/scratch/micpie/export/rdkit_features/valid_13-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4 is 1."} {"text":"The number of hydrogen bond donor sites of the compound with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C is 1."}", "/scratch/micpie/export/rdkit_features/valid_8-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C?\nAnswer: 70.69"} {"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br?\nAnswer: 35.06"}", "/scratch/micpie/export/rdkit_features/valid_3-0.jsonl": "{"text":"The chemical formula of the compound with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N is C22H23N5O2S."} {"text":"The molecular formula of the compound with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC is C20H22ClN3O4."}", "/scratch/micpie/export/rdkit_features/valid_24-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_100-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F?\nAnswer: C18H27FN3O+"} {"text":"Question: What is the molecular formula of the compound with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4?\nAnswer: C20H24N3O+"}", "/scratch/micpie/export/rdkit_features/train_109-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3 is 6."} {"text":"The count of aromatic 
bonds of the molecule with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C is 16."}", "/scratch/micpie/export/rdkit_features/valid_106-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the chemical with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl is 2.32."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F is 0.70."}", "/scratch/micpie/export/rdkit_features/valid_16-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C?\nAnswer: 58.82"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3?\nAnswer: 69.89"}", "/scratch/micpie/export/rdkit_features/train_113-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the compound with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 9"} {"text":"Question: What is the number of heteroatoms of the molecule with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/valid_2-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4?\nAnswer: 0"} {"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_18-1.jsonl": "{"text":"The number of hydrogen bond donors of the molecule with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O is 1."} {"text":"The count of hydrogen bond donors of the chemical with SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C is 1."}", 
"/scratch/micpie/export/rdkit_features/test_24-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl is 21."} {"text":"The number of aromatic bonds of the molecule with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C is 5."}", "/scratch/micpie/export/rdkit_features/test_112-8.jsonl": "{"text":"The basic group count of the chemical with SMILES C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC is 0."} {"text":"The basic group count of the chemical with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N is 2."}", "/scratch/micpie/export/rdkit_features/test_21-3.jsonl": "{"text":"The heteroatom count of the molecule with SMILES Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-] is 8."} {"text":"The count of heteroatoms of the molecule with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O is 7."}", "/scratch/micpie/export/rdkit_features/train_10-8.jsonl": "{"text":"The basic group count of the chemical with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4 is 1."} {"text":"The count of basic groups of the chemical with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl is 0."}", "/scratch/micpie/export/rdkit_features/valid_2-6.jsonl": "{"text":"The aromatic bond count of the molecule with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4 is 17."} {"text":"The aromatic bond count of the molecule with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3 is 11."}", "/scratch/micpie/export/rdkit_features/train_18-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O is 1."} {"text":"The count of hydrogen bond donors of the chemical with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl is 2."}", "/scratch/micpie/export/rdkit_features/train_17-2.jsonl": "{"text":"The 
number of hydrogen bond acceptor sites of the chemical with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F is 3."} {"text":"The count of hydrogen bond acceptors of the compound with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 9."}", "/scratch/micpie/export/rdkit_features/valid_109-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl?\nAnswer: 3"} {"text":"Question: What is the ring count of the compound with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_102-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3?\nAnswer: 60.66"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1?\nAnswer: 73.63"}", "/scratch/micpie/export/rdkit_features/test_0-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1 is 1."} {"text":"The number of hydrogen bond donors of the compound with SMILES Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC is 1."}", "/scratch/micpie/export/rdkit_features/train_119-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO?\nAnswer: 43.60"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl?\nAnswer: 63.14"}", "/scratch/micpie/export/rdkit_features/test_16-23.jsonl": "{"text":"User: I want to synthesize a chemical with a count of hydrogen bond donors of 2, a count of hydrogen bond acceptors of 2 and a Wildman-Crippen LogP value of 
1.42.\nAssistant: That's interesting, do you have some additional requirements I should consider?\nUser: I want the chemical formula to be C18H35N2O2+.\nAssistant: In that scenario, I recommend the chemical with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C."} {"text":"User: I want to design a chemical with a count of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 5 and a Wildman-Crippen LogP value computed using RDKit of 2.95.\nAssistant: That is a very interesting question, do you have some additional conditions?\nUser: Yes, I want the formula to be C17H19Cl2N3O3S.\nAssistant: In that situation, I the chemical with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N."}", "/scratch/micpie/export/rdkit_features/train_104-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 7."} {"text":"The count of hydrogen bond acceptors of the chemical with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl is 4."}", "/scratch/micpie/export/rdkit_features/test_33-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O is 6."} {"text":"The aromatic bond count of the molecule with SMILES Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br is 16."}", "/scratch/micpie/export/rdkit_features/test_108-4.jsonl": "{"text":"The number of rings of the chemical with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5 is 5."} {"text":"The ring count of the chemical with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3 is 3."}", "/scratch/micpie/export/rdkit_features/train_24-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES 
CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_26-0.jsonl": "{"text":"The chemical formula of the compound with SMILES CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2 is C18H24F3NO3."} {"text":"The chemical formula of the chemical with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC is C19H20N4O4."}", "/scratch/micpie/export/rdkit_features/test_7-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC?\nAnswer: C24H26N4O3"} {"text":"Question: What is the formula of the chemical with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC?\nAnswer: C21H18N2O5S"}", "/scratch/micpie/export/rdkit_features/valid_104-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3?\nAnswer: 6"} {"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_109-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3 is 8."} {"text":"The heteroatom count of the compound with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C is 12."}", "/scratch/micpie/export/rdkit_features/train_118-3.jsonl": "{"text":"The count of heteroatoms of the compound with SMILES Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F is 3."} {"text":"The heteroatom count of the molecule with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3 is 9."}", "/scratch/micpie/export/rdkit_features/valid_29-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the molecule with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl is 1."} {"text":"The number of hydrogen bond acceptors of the chemical with SMILES 
C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3 is 2."}", "/scratch/micpie/export/rdkit_features/train_107-0.jsonl": "{"text":"The chemical formula of the compound with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl is C21H30ClN6O3+."} {"text":"The chemical formula of the chemical with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5 is C21H19N9OS."}", "/scratch/micpie/export/rdkit_features/test_107-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC?\nAnswer: 9"} {"text":"Question: What is the number of heteroatoms of the molecule with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/test_11-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2 is 51.26."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C is 62.09."}", "/scratch/micpie/export/rdkit_features/test_25-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4 is 7."} {"text":"The count of heteroatoms of the compound with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3 is 7."}", "/scratch/micpie/export/rdkit_features/train_1-23.jsonl": "{"text":"User: I want to analyze a chemical with a number of hydrogen bond donors of 2, a count of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value computed using RDKit of 2.54.\nAssistant: Do you have some additional limitations I should consider?\nUser: Yep, I want the formula to be C25H33N2O3+.\nAssistant: In that case, I advise the chemical with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4."} {"text":"User: I want to make a molecule with a number of hydrogen bond donors 
of 0, a count of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value computed using RDKit of 3.91.\nAssistant: Interesting, do you have some additional requirements?\nUser: Yep, I want the chemical formula to be C17H22ClF3N2O2S.\nAssistant: In that scenario, I propose the molecule with SMILES c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl."}", "/scratch/micpie/export/rdkit_features/valid_13-16.jsonl": "{"text":"Question: What is the number of rings of the compound with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4?\nAnswer: 4"} {"text":"Question: What is the count of rings of the molecule with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_120-5.jsonl": "{"text":"The rotatable bond count of the compound with SMILES c1ccc(cc1)CCN(CC(=O)Nc2ccc(cc2)Cl)S(=O)(=O)c3ccc4c(c3)OCCO4 is 8."} {"text":"The rotatable bond count of the compound with SMILES Cc1c(cnn1c2ccccc2)N[C@H](C)C(=O)Nc3cccc(c3Cl)Cl is 5."}", "/scratch/micpie/export/rdkit_features/train_28-8.jsonl": "{"text":"The count of basic groups of the chemical with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl is 0."} {"text":"The count of basic groups of the molecule with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F is 0."}", "/scratch/micpie/export/rdkit_features/valid_20-23.jsonl": "{"text":"User: I want to create a chemical with a count of hydrogen bond donors of 2, a count of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value of 3.95.\nAssistant: Cool, do you have some additional I should take into account?\nUser: Yeah, I want the chemical formula to be C25H33N3O3.\nAssistant: In that case, I suggest the chemical with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3."} {"text":"User: I want to design a chemical with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptor sites of 6 and a Wildman-Crippen LogP value of 1.10.\nAssistant: Cool, do you have some 
additional conditions I should take into account?\nUser: Yes, I want the chemical formula to be C16H19ClNO6S-.\nAssistant: In that case, I suggest the chemical with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-]."}", "/scratch/micpie/export/rdkit_features/test_6-3.jsonl": "{"text":"The number of heteroatoms of the chemical with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3 is 8."} {"text":"The number of heteroatoms of the chemical with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C is 6."}", "/scratch/micpie/export/rdkit_features/test_20-18.jsonl": "{"text":"Question: What is the aromatic bond count of the molecule with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br?\nAnswer: 6"} {"text":"Question: What is the aromatic bond count of the compound with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-]?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/train_10-20.jsonl": "{"text":"Question: What is the number of basic groups of the chemical with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4?\nAnswer: 1"} {"text":"Question: What is the basic group count of the compound with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_27-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F?\nAnswer: 5"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_28-0.jsonl": "{"text":"The formula of the molecule with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC is C17H19N4OS2+."} {"text":"The molecular formula of the molecule with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C is C20H25N5O."}", 
"/scratch/micpie/export/rdkit_features/test_8-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the molecule with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C?\nAnswer: 8"} {"text":"Question: What is the count of rotatable bonds of the chemical with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_29-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_15-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC is 2.53."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C is 1.57."}", "/scratch/micpie/export/rdkit_features/test_102-4.jsonl": "{"text":"The number of rings of the compound with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3 is 5."} {"text":"The number of rings of the chemical with SMILES Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1 is 4."}", "/scratch/micpie/export/rdkit_features/train_29-4.jsonl": "{"text":"The ring count of the chemical with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F is 4."} {"text":"The ring count of the compound with SMILES CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C is 3."}", "/scratch/micpie/export/rdkit_features/test_111-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4 is C22H34N5O3S+."} {"text":"The molecular formula of the chemical with SMILES 
C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC is C22H31N5O4."}", "/scratch/micpie/export/rdkit_features/train_109-23.jsonl": "{"text":"User: I want to analyze a molecule with a number of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 5 and a LogP value computed using the Wildman-Crippen method of 2.25.\nAssistant: Nice, do you have some additional requirements I should take into account?\nUser: Indeed, I want the formula to be C23H33N3O5.\nAssistant: In that situation, I the molecule with SMILES Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3."} {"text":"User: I want to make a molecule with a number of hydrogen bond donors of 3, a count of hydrogen bond acceptors of 8 and a Wildman-Crippen LogP value computed using RDKit of 2.46.\nAssistant: Do you have some additional requirements that I should consider?\nUser: Indeed, I want the chemical formula to be C16H18N6O3S3.\nAssistant: In that scenario, I propose the molecule with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C."}", "/scratch/micpie/export/rdkit_features/test_31-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F?\nAnswer: 0"} {"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_0-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the compound with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1?\nAnswer: 25"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC?\nAnswer: 23"}", "/scratch/micpie/export/rdkit_features/train_22-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the molecule with SMILES C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O is 
1."} {"text":"The number of hydrogen bond donors of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4 is 1."}", "/scratch/micpie/export/rdkit_features/valid_104-22.jsonl": "{"text":"User: I want to create a compound with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptors of 6.\nAssistant: That's interesting, do you have some additional requirements that I should consider?\nUser: Yeah, I want the number of heteroatoms to be 9.\nAssistant: In that case, I the compound with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3."} {"text":"User: I want to create a chemical with a count of hydrogen bond donors of 3 and a count of hydrogen bond acceptors of 2.\nAssistant: Do you have some additional constraints I should consider?\nUser: Yep, I want the number of heteroatoms to be 5.\nAssistant: I the chemical with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C."}", "/scratch/micpie/export/rdkit_features/valid_112-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the chemical with SMILES Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC is 9."} {"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N is 5."}", "/scratch/micpie/export/rdkit_features/train_3-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O is 65.41."} {"text":"The total sum of atomic polarizabilities of the molecule with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4 is 66.62."}", "/scratch/micpie/export/rdkit_features/valid_118-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O is C14H23FNO+."} {"text":"The chemical formula of the chemical with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO is C15H20FN3O4."}", 
"/scratch/micpie/export/rdkit_features/test_24-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl is 7."} {"text":"The count of hydrogen bond acceptors of the chemical with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C is 5."}", "/scratch/micpie/export/rdkit_features/valid_5-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C?\nAnswer: 8"} {"text":"Question: What is the number of heteroatoms of the molecule with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+]?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_118-18.jsonl": "{"text":"Question: What is the aromatic bond count of the molecule with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O?\nAnswer: 6"} {"text":"Question: What is the aromatic bond count of the chemical with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/valid_20-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3?\nAnswer: 71.71"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-]?\nAnswer: 51.82"}", "/scratch/micpie/export/rdkit_features/train_107-8.jsonl": "{"text":"The basic group count of the chemical with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl is 1."} {"text":"The count of basic groups of the chemical with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5 is 0."}", "/scratch/micpie/export/rdkit_features/test_103-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1?\nAnswer: 3"} {"text":"Question: What is the rotatable bond 
count of the chemical with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_30-0.jsonl": "{"text":"The molecular formula of the chemical with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3 is C21H34N4O."} {"text":"The formula of the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F is C24H24F2N2O2."}", "/scratch/micpie/export/rdkit_features/test_28-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC?\nAnswer: 4"} {"text":"Question: What is the count of rings of the chemical with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_22-22.jsonl": "{"text":"User: I want to analyze a compound with a number of hydrogen bond donor sites of 1 and a number of hydrogen bond acceptor sites of 4.\nAssistant: Cool, do you have some additional constraints I should consider?\nUser: Yeah, I want the heteroatom count to be 8.\nAssistant: In that case, I the compound with SMILES C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O."} {"text":"User: I want to analyze a compound with a number of hydrogen bond donor sites of 1 and a number of hydrogen bond acceptors of 7.\nAssistant: That is a very interesting question, do you have some additional requirements I should take into account?\nUser: I want the number of heteroatoms to be 8.\nAssistant: In that case, I recommend the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4."}", "/scratch/micpie/export/rdkit_features/train_105-4.jsonl": "{"text":"The ring count of the compound with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N is 3."} {"text":"The ring count of the compound with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O is 5."}", "/scratch/micpie/export/rdkit_features/valid_100-9.jsonl": "{"text":"The total sum of atomic 
polarizabilities of the compound with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F is 54.34."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4 is 55.31."}", "/scratch/micpie/export/rdkit_features/valid_17-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the chemical with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S?\nAnswer: 0"} {"text":"Question: What is the number of aromatic bonds of the molecule with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/test_21-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-] is C18H18NO6S-."} {"text":"The molecular formula of the compound with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O is C20H32N4O3."}", "/scratch/micpie/export/rdkit_features/test_111-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4 is 1.06."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC is 2.06."}", "/scratch/micpie/export/rdkit_features/test_5-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donors of the molecule with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_107-1.jsonl": "{"text":"The number of hydrogen bond donors of the chemical with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC is 2."} {"text":"The count of hydrogen bond donors of 
the molecule with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC is 3."}", "/scratch/micpie/export/rdkit_features/test_22-8.jsonl": "{"text":"The basic group count of the compound with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O is 0."} {"text":"The count of basic groups of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4 is 2."}", "/scratch/micpie/export/rdkit_features/train_106-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O?\nAnswer: C12H16F2N4O3"} {"text":"Question: What is the formula of the molecule with SMILES CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl?\nAnswer: C20H24ClN7O3"}", "/scratch/micpie/export/rdkit_features/valid_108-22.jsonl": "{"text":"User: I want to synthesize a compound with a count of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 8.\nAssistant: That is a very interesting question, do you have some additional I should consider?\nUser: Yep, I want the heteroatom count to be 12.\nAssistant: In that scenario, I recommend the compound with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O."} {"text":"User: I want to make a compound with a number of hydrogen bond donors of 3 and a count of hydrogen bond acceptors of 4.\nAssistant: Cool, do you have some additional limitations I should take into account?\nUser: Yes, I want the heteroatom count to be 8.\nAssistant: In that scenario, I propose the compound with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C."}", "/scratch/micpie/export/rdkit_features/valid_8-12.jsonl": "{"text":"Question: What is the molecular formula of the molecule with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C?\nAnswer: C23H36N2O5"} {"text":"Question: What is the molecular formula of the chemical with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br?\nAnswer: 
C11H7Br2F4NO2"}", "/scratch/micpie/export/rdkit_features/train_26-11.jsonl": "{"text":"User: I want to design a chemical with a chemical formula of C15H15F6NO2.\nAssistant: Cool, do you have some additional limitations I should consider?\nUser: Yea, I want the count of hydrogen bond donors to be 0, the count of hydrogen bond acceptors to be 2.\nAssistant: In that case, I recommend the chemical with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F."} {"text":"User: I want to make a chemical with a formula of C21H31N4O+.\nAssistant: Cool, do you have some additional conditions I should take into account?\nUser: Yea, I want the number of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 3.\nAssistant: In that case, I the chemical with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C."}", "/scratch/micpie/export/rdkit_features/test_15-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the chemical with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O?\nAnswer: 6"} {"text":"Question: What is the number of rotatable bonds of the molecule with SMILES C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_10-0.jsonl": "{"text":"The molecular formula of the chemical with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C is C17H25F3N6S."} {"text":"The chemical formula of the chemical with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3 is C20H26ClNO2."}", "/scratch/micpie/export/rdkit_features/train_2-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6?\nAnswer: 1"} {"text":"Question: What is the count of basic groups of the compound with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_116-7.jsonl": "{"text":"The count of acid groups of the molecule with SMILES 
CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 0."} {"text":"The count of acid groups of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl is 0."}", "/scratch/micpie/export/rdkit_features/test_28-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC?\nAnswer: 1"} {"text":"Question: What is the number of basic groups of the compound with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_112-23.jsonl": "{"text":"User: I want to design a chemical with a count of hydrogen bond donors of 0, a number of hydrogen bond acceptors of 8 and a LogP value computed using the Wildman-Crippen method of 2.20.\nAssistant: Cool, do you have some additional conditions that I should consider?\nUser: Indeed, I want the molecular formula to be C22H31N5O4.\nAssistant: Then, I recommend the chemical with SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC."} {"text":"User: I want to synthesize a chemical with a number of hydrogen bond donor sites of 3, a number of hydrogen bond acceptor sites of 5 and a LogP value computed using the Wildman-Crippen method of 0.24.\nAssistant: Cool, do you have some additional constraints?\nUser: Yes, I want the chemical formula to be C21H32N5O3S+.\nAssistant: In that case, I suggest the chemical with SMILES c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N."}", "/scratch/micpie/export/rdkit_features/test_104-22.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptors of 5.\nAssistant: That's interesting, do you have some additional I should consider?\nUser: Yea, I want the number of heteroatoms to be 10.\nAssistant: In that scenario, I suggest the chemical with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2."} {"text":"User: I want 
to create a chemical with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 6.\nAssistant: Do you have some additional requirements?\nUser: Yea, I want the number of heteroatoms to be 7.\nAssistant: I suggest the chemical with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3."}", "/scratch/micpie/export/rdkit_features/train_11-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl?\nAnswer: 58.75"} {"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C?\nAnswer: 50.18"}", "/scratch/micpie/export/rdkit_features/valid_7-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC is 9."} {"text":"The number of rotatable bonds of the molecule with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC is 7."}", "/scratch/micpie/export/rdkit_features/test_33-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O is 55.06."} {"text":"The sum of atomic polarizabilities of the compound with SMILES Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br is 43.97."}", "/scratch/micpie/export/rdkit_features/test_5-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5 is C23H27N2O3S+."} {"text":"The chemical formula of the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F is C22H22F2N2O3."}", "/scratch/micpie/export/rdkit_features/valid_19-17.jsonl": "{"text":"Question: What is the rotatable bond count of the compound with SMILES C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC?\nAnswer: 9"} {"text":"Question: What is the number of rotatable bonds of the compound with SMILES 
CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_2-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C is C20H27ClFN4O2+."} {"text":"The chemical formula of the compound with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4 is C24H25N3O3."}", "/scratch/micpie/export/rdkit_features/test_24-22.jsonl": "{"text":"User: I want to analyze a chemical with a number of hydrogen bond donor sites of 1 and a number of hydrogen bond acceptor sites of 7.\nAssistant: That is a very interesting question, do you have some additional conditions I should take into account?\nUser: Yeah, I want the count of heteroatoms to be 11.\nAssistant: In that case, I advise the chemical with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl."} {"text":"User: I want to design a chemical with a count of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 5.\nAssistant: That's interesting, do you have some additional conditions?\nUser: Yea, I want the heteroatom count to be 7.\nAssistant: In that case, I the chemical with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C."}", "/scratch/micpie/export/rdkit_features/train_9-18.jsonl": "{"text":"Question: What is the aromatic bond count of the chemical with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C?\nAnswer: 12"} {"text":"Question: What is the aromatic bond count of the compound with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/valid_108-7.jsonl": "{"text":"The number of acid groups of the molecule with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O is 0."} {"text":"The number of acid groups of the compound with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C is 0."}", "/scratch/micpie/export/rdkit_features/test_29-20.jsonl": 
"{"text":"Question: What is the basic group count of the compound with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4?\nAnswer: 1"} {"text":"Question: What is the number of basic groups of the molecule with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_112-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the molecule with SMILES Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC?\nAnswer: 10"} {"text":"Question: What is the number of heteroatoms of the molecule with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/valid_13-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the compound with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4 is 4.53."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C is 4.56."}", "/scratch/micpie/export/rdkit_features/valid_27-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F is 4.23."} {"text":"The Wildman-Crippen LogP value of the chemical with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C is 2.54."}", "/scratch/micpie/export/rdkit_features/valid_4-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_114-20.jsonl": "{"text":"Question: What is the number of basic groups of the compound with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the chemical with SMILES 
C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_16-20.jsonl": "{"text":"Question: What is the basic group count of the chemical with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C?\nAnswer: 1"} {"text":"Question: What is the number of basic groups of the compound with SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_23-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the compound with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4?\nAnswer: 7"} {"text":"Question: What is the heteroatom count of the molecule with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4?\nAnswer: 10"}", "/scratch/micpie/export/rdkit_features/valid_111-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4 is 0."} {"text":"The acid group count of the compound with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC is 0."}", "/scratch/micpie/export/rdkit_features/train_9-4.jsonl": "{"text":"The count of rings of the chemical with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C is 4."} {"text":"The count of rings of the compound with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F is 3."}", "/scratch/micpie/export/rdkit_features/train_104-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 5."} {"text":"The aromatic bond count of the chemical with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl is 12."}", "/scratch/micpie/export/rdkit_features/train_16-18.jsonl": "{"text":"Question: What is the aromatic bond count of the compound with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C?\nAnswer: 5"} {"text":"Question: What is the aromatic bond count of the molecule with 
SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_116-18.jsonl": "{"text":"Question: What is the aromatic bond count of the molecule with SMILES CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 17"} {"text":"Question: What is the number of aromatic bonds of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl?\nAnswer: 17"}", "/scratch/micpie/export/rdkit_features/train_108-23.jsonl": "{"text":"User: I want to make a chemical with a number of hydrogen bond donor sites of 2, a count of hydrogen bond acceptors of 7 and a Wildman-Crippen LogP value of 2.03.\nAssistant: Cool, do you have some additional conditions I should take into account?\nUser: Yes, I want the molecular formula to be C17H21F3N6O2S.\nAssistant: I recommend the chemical with SMILES CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2."} {"text":"User: I want to analyze a molecule with a number of hydrogen bond donor sites of 3, a number of hydrogen bond acceptor sites of 4 and a Wildman-Crippen LogP value of 2.06.\nAssistant: Do you have some additional requirements I should consider?\nUser: Yep, I want the molecular formula to be C23H32N4O4.\nAssistant: In that case, I propose the molecule with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3."}", "/scratch/micpie/export/rdkit_features/valid_103-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO?\nAnswer: 7"} {"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_32-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES 
C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C is 63.82."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2 is 51.73."}", "/scratch/micpie/export/rdkit_features/valid_102-22.jsonl": "{"text":"User: I want to synthesize a compound with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 3.\nAssistant: Do you have some additional requirements I should consider?\nUser: Indeed, I want the heteroatom count to be 7.\nAssistant: In that situation, I recommend the compound with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1."} {"text":"User: I want to synthesize a compound with a count of hydrogen bond donors of 4 and a count of hydrogen bond acceptors of 6.\nAssistant: That's interesting, do you have some additional requirements?\nUser: I want the number of heteroatoms to be 10.\nAssistant: I advise the compound with SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2."}", "/scratch/micpie/export/rdkit_features/valid_5-22.jsonl": "{"text":"User: I want to make a chemical with a number of hydrogen bond donor sites of 1 and a count of hydrogen bond acceptors of 6.\nAssistant: Cool, do you have some additional limitations that help me narrow down the search?\nUser: I want the number of heteroatoms to be 8.\nAssistant: In that scenario, I recommend the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C."} {"text":"User: I want to analyze a molecule with a number of hydrogen bond donor sites of 3 and a number of hydrogen bond acceptors of 4.\nAssistant: That is a very interesting question, do you have some additional limitations I should take into account?\nUser: Yea, I want the heteroatom count to be 6.\nAssistant: In that situation, I recommend the molecule with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+]."}", "/scratch/micpie/export/rdkit_features/test_103-20.jsonl": 
"{"text":"Question: What is the count of basic groups of the molecule with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the chemical with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_113-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 3."} {"text":"The count of hydrogen bond donors of the compound with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 3."}", "/scratch/micpie/export/rdkit_features/train_31-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F is C24H27FN2O3."} {"text":"The molecular formula of the chemical with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4 is C22H24N4O3."}", "/scratch/micpie/export/rdkit_features/test_102-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3?\nAnswer: 4"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_105-12.jsonl": "{"text":"Question: What is the formula of the molecule with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F?\nAnswer: C18H14FN3OS2"} {"text":"Question: What is the formula of the molecule with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O?\nAnswer: C13H12ClFN3O3+"}", "/scratch/micpie/export/rdkit_features/train_30-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F?\nAnswer: 56.92"} {"text":"Question: What is the sum of atomic polarizabilities of the chemical with 
SMILES CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC?\nAnswer: 65.41"}", "/scratch/micpie/export/rdkit_features/train_118-19.jsonl": "{"text":"Question: What is the acid group count of the compound with SMILES Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F?\nAnswer: 0"} {"text":"Question: What is the acid group count of the compound with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_12-3.jsonl": "{"text":"The number of heteroatoms of the compound with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C is 6."} {"text":"The heteroatom count of the compound with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4 is 4."}", "/scratch/micpie/export/rdkit_features/valid_26-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2?\nAnswer: 0"} {"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_33-16.jsonl": "{"text":"Question: What is the count of rings of the chemical with SMILES c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6?\nAnswer: 6"} {"text":"Question: What is the count of rings of the chemical with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_18-7.jsonl": "{"text":"The acid group count of the chemical with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O is 0."} {"text":"The number of acid groups of the compound with SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C is 0."}", "/scratch/micpie/export/rdkit_features/test_24-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl is 43.27."} {"text":"The sum of atomic polarizabilities 
of the chemical with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C is 70.37."}", "/scratch/micpie/export/rdkit_features/test_120-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the compound with SMILES c1ccc(cc1)CCN(CC(=O)Nc2ccc(cc2)Cl)S(=O)(=O)c3ccc4c(c3)OCCO4?\nAnswer: 9"} {"text":"Question: What is the number of heteroatoms of the chemical with SMILES Cc1c(cnn1c2ccccc2)N[C@H](C)C(=O)Nc3cccc(c3Cl)Cl?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_19-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the compound with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl?\nAnswer: 8"} {"text":"Question: What is the number of heteroatoms of the compound with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/test_101-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_7-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the compound with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC?\nAnswer: 17"} {"text":"Question: What is the number of aromatic bonds of the chemical with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC?\nAnswer: 16"}", "/scratch/micpie/export/rdkit_features/test_22-4.jsonl": "{"text":"The number of rings of the molecule with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O is 3."} {"text":"The number of rings of the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4 is 4."}", "/scratch/micpie/export/rdkit_features/valid_9-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with 
SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2?\nAnswer: 2"} {"text":"Question: What is the ring count of the chemical with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_103-11.jsonl": "{"text":"User: I want to synthesize a molecule with a molecular formula of C18H14FNO2.\nAssistant: That's interesting, do you have some additional conditions I should take into account?\nUser: Yes, I want the number of hydrogen bond donor sites to be 1, the number of hydrogen bond acceptor sites to be 3.\nAssistant: In that situation, I recommend the molecule with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1."} {"text":"User: I want to synthesize a molecule with a formula of C14H21N5O4.\nAssistant: Nice, do you have some additional requirements?\nUser: Yeah, I want the count of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 7.\nAssistant: In that scenario, I recommend the molecule with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3."}", "/scratch/micpie/export/rdkit_features/train_7-19.jsonl": "{"text":"Question: What is the count of acid groups of the chemical with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl?\nAnswer: 0"} {"text":"Question: What is the acid group count of the molecule with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_33-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6?\nAnswer: 62.06"} {"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br?\nAnswer: 51.69"}", "/scratch/micpie/export/rdkit_features/valid_2-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the molecule with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4 is 
7."} {"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3 is 4."}", "/scratch/micpie/export/rdkit_features/valid_23-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4 is 0."} {"text":"The number of acid groups of the chemical with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4 is 3."}", "/scratch/micpie/export/rdkit_features/valid_105-7.jsonl": "{"text":"The number of acid groups of the molecule with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F is 0."} {"text":"The number of acid groups of the molecule with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O is 0."}", "/scratch/micpie/export/rdkit_features/valid_115-8.jsonl": "{"text":"The count of basic groups of the compound with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4 is 0."} {"text":"The basic group count of the chemical with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F is 0."}", "/scratch/micpie/export/rdkit_features/test_10-19.jsonl": "{"text":"Question: What is the count of acid groups of the compound with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the chemical with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_31-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the chemical with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F?\nAnswer: 7"} {"text":"Question: What is the count of heteroatoms of the compound with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/valid_33-22.jsonl": "{"text":"User: I want to make a molecule with a number of hydrogen bond donors of 0 and a number of hydrogen bond acceptor sites of 
4.\nAssistant: Nice, do you have some additional requirements I should take into account?\nUser: Yes, I want the count of heteroatoms to be 6.\nAssistant: In that situation, I recommend the molecule with SMILES c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6."} {"text":"User: I want to synthesize a molecule with a number of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 6.\nAssistant: That's interesting, do you have some additional conditions that I should consider?\nUser: Indeed, I want the count of heteroatoms to be 8.\nAssistant: Then, I suggest the molecule with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br."}", "/scratch/micpie/export/rdkit_features/valid_21-8.jsonl": "{"text":"The number of basic groups of the chemical with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-] is 1."} {"text":"The number of basic groups of the chemical with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3 is 0."}", "/scratch/micpie/export/rdkit_features/valid_117-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4OC)Cl is 73.11."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES Cc1cc(ccc1[C@@H](C)[NH2+]CC2(CC2)CO)F is 41.10."}", "/scratch/micpie/export/rdkit_features/valid_105-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F?\nAnswer: 5"} {"text":"Question: What is the rotatable bond count of the compound with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_31-7.jsonl": "{"text":"The count of acid groups of the compound with SMILES Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F is 0."} {"text":"The count of acid groups of the molecule with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4 is 0."}", 
"/scratch/micpie/export/rdkit_features/train_117-22.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donors of 0 and a count of hydrogen bond acceptors of 5.\nAssistant: That's interesting, do you have some additional conditions I should take into account?\nUser: Yeah, I want the number of heteroatoms to be 8.\nAssistant: Then, I suggest the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(cc4)Oc5ccc(cn5)Cl."} {"text":"User: I want to synthesize a molecule with a number of hydrogen bond donors of 2 and a number of hydrogen bond acceptors of 2.\nAssistant: Nice, do you have some additional constraints I should take into account?\nUser: Yeah, I want the heteroatom count to be 3.\nAssistant: In that situation, I advise the molecule with SMILES C[C@@H](C(C)(C)OC)[NH2+]CCNc1ccccc1."}", "/scratch/micpie/export/rdkit_features/train_16-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the molecule with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C is 2."} {"text":"The count of hydrogen bond donors of the compound with SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F is 1."}", "/scratch/micpie/export/rdkit_features/test_116-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1?\nAnswer: C29H26FN3O2S"} {"text":"Question: What is the formula of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl?\nAnswer: C24H23ClN2O3S"}", "/scratch/micpie/export/rdkit_features/valid_14-22.jsonl": "{"text":"User: I want to make a molecule with a number of hydrogen bond donor sites of 1 and a count of hydrogen bond acceptors of 3.\nAssistant: Do you have some additional conditions I should take into account?\nUser: Yes, I want the number of heteroatoms to be 5.\nAssistant: In that situation, I recommend the molecule with SMILES 
C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C."} {"text":"User: I want to analyze a compound with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 4.\nAssistant: Nice, do you have some additional conditions I should take into account?\nUser: Yep, I want the heteroatom count to be 5.\nAssistant: In that situation, I advise the compound with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O."}", "/scratch/micpie/export/rdkit_features/train_113-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 5."} {"text":"The count of hydrogen bond acceptors of the chemical with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 3."}", "/scratch/micpie/export/rdkit_features/train_25-19.jsonl": "{"text":"Question: What is the acid group count of the compound with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the molecule with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_0-0.jsonl": "{"text":"The molecular formula of the chemical with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1 is C22H10N2O5S2."} {"text":"The formula of the chemical with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4 is C24H35NO4."}", "/scratch/micpie/export/rdkit_features/test_23-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_13-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES 
CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 61.56"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C?\nAnswer: 60.41"}", "/scratch/micpie/export/rdkit_features/valid_3-3.jsonl": "{"text":"The number of heteroatoms of the compound with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N is 8."} {"text":"The heteroatom count of the molecule with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC is 8."}", "/scratch/micpie/export/rdkit_features/test_25-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4?\nAnswer: 1"} {"text":"Question: What is the number of basic groups of the chemical with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_114-5.jsonl": "{"text":"The rotatable bond count of the compound with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C is 5."} {"text":"The rotatable bond count of the compound with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F is 5."}", "/scratch/micpie/export/rdkit_features/test_7-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC?\nAnswer: 6"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_110-6.jsonl": "{"text":"The aromatic bond count of the molecule with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3 is 11."} {"text":"The number of aromatic bonds of the compound with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4 is 11."}", "/scratch/micpie/export/rdkit_features/test_17-7.jsonl": "{"text":"The count of 
acid groups of the compound with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F is 0."} {"text":"The acid group count of the molecule with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 0."}", "/scratch/micpie/export/rdkit_features/train_22-20.jsonl": "{"text":"Question: What is the number of basic groups of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_3-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_6-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the molecule with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4?\nAnswer: 4"} {"text":"Question: What is the count of rotatable bonds of the molecule with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_15-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC?\nAnswer: 0"} {"text":"Question: What is the acid group count of the compound with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_15-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O?\nAnswer: 0"} {"text":"Question: What is the basic group count of the molecule with SMILES 
C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_102-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the chemical with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1 is 4.52."} {"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2 is 8.36."}", "/scratch/micpie/export/rdkit_features/valid_112-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC is 2.25."} {"text":"The Wildman-Crippen LogP value of the chemical with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N is 0.31."}", "/scratch/micpie/export/rdkit_features/test_33-15.jsonl": "{"text":"Question: What is the heteroatom count of the chemical with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O?\nAnswer: 9"} {"text":"Question: What is the number of heteroatoms of the compound with SMILES Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/train_14-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C is 5."} {"text":"The number of heteroatoms of the chemical with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O is 5."}", "/scratch/micpie/export/rdkit_features/test_16-1.jsonl": "{"text":"The number of hydrogen bond donors of the chemical with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C is 2."} {"text":"The number of hydrogen bond donors of the compound with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N is 1."}", "/scratch/micpie/export/rdkit_features/train_33-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2?\nAnswer: 2"} {"text":"Question: 
What is the ring count of the compound with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_25-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the molecule with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4?\nAnswer: 8"} {"text":"Question: What is the heteroatom count of the molecule with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_16-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the chemical with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C?\nAnswer: 5"} {"text":"Question: What is the number of heteroatoms of the compound with SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/train_13-7.jsonl": "{"text":"The acid group count of the chemical with SMILES CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC is 0."} {"text":"The count of acid groups of the molecule with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C is 0."}", "/scratch/micpie/export/rdkit_features/train_109-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3?\nAnswer: 5"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/valid_100-7.jsonl": "{"text":"The acid group count of the chemical with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F is 0."} {"text":"The number of acid groups of the molecule with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4 is 0."}", "/scratch/micpie/export/rdkit_features/test_118-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES 
CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O?\nAnswer: 6"} {"text":"Question: What is the number of rotatable bonds of the compound with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_21-19.jsonl": "{"text":"Question: What is the count of acid groups of the chemical with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC?\nAnswer: 1"} {"text":"Question: What is the number of acid groups of the molecule with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_15-23.jsonl": "{"text":"User: I want to design a chemical with a count of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 4 and a Wildman-Crippen LogP value of 2.82.\nAssistant: Do you have some additional conditions I should take into account?\nUser: Yeah, I want the formula to be C18H20N4O.\nAssistant: In that case, I advise the chemical with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N."} {"text":"User: I want to design a compound with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptors of 2 and a Wildman-Crippen LogP value computed using RDKit of 1.28.\nAssistant: Nice, do you have some additional constraints I should consider?\nUser: I want the molecular formula to be C18H35N2O2+.\nAssistant: Then, I suggest the compound with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C."}", "/scratch/micpie/export/rdkit_features/valid_22-3.jsonl": "{"text":"The count of heteroatoms of the chemical with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O is 8."} {"text":"The count of heteroatoms of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4 is 8."}", "/scratch/micpie/export/rdkit_features/valid_16-5.jsonl": "{"text":"The rotatable bond count of the chemical with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C is 9."} {"text":"The number of rotatable bonds of the 
chemical with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3 is 9."}", "/scratch/micpie/export/rdkit_features/test_19-18.jsonl": "{"text":"Question: What is the aromatic bond count of the molecule with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl?\nAnswer: 11"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_13-3.jsonl": "{"text":"The heteroatom count of the chemical with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4 is 5."} {"text":"The number of heteroatoms of the molecule with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C is 4."}", "/scratch/micpie/export/rdkit_features/valid_103-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the chemical with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO?\nAnswer: 0"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/test_25-11.jsonl": "{"text":"User: I want to make a compound with a chemical formula of C21H38N5O2+.\nAssistant: That is a very interesting question, do you have some additional conditions that help me narrow down the search?\nUser: Yeah, I want the number of hydrogen bond donor sites to be 1, the number of hydrogen bond acceptor sites to be 6.\nAssistant: In that case, I advise the compound with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4."} {"text":"User: I want to analyze a chemical with a molecular formula of C19H19F3N2O2.\nAssistant: Interesting, do you have some additional conditions I should consider?\nUser: Yea, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptors to be 3.\nAssistant: In that scenario, I recommend the chemical with SMILES 
c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3."}", "/scratch/micpie/export/rdkit_features/test_31-11.jsonl": "{"text":"User: I want to analyze a molecule with a chemical formula of C24H23F2N3O2.\nAssistant: Nice, do you have some additional that I should consider?\nUser: Yeah, I want the count of hydrogen bond donors to be 0, the count of hydrogen bond acceptors to be 3.\nAssistant: In that scenario, I recommend the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F."} {"text":"User: I want to create a molecule with a chemical formula of C20H27FN2O4.\nAssistant: That's interesting, do you have some additional constraints that I should consider?\nUser: Yea, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 4.\nAssistant: In that situation, I recommend the molecule with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3."}", "/scratch/micpie/export/rdkit_features/valid_21-4.jsonl": "{"text":"The count of rings of the chemical with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-] is 2."} {"text":"The count of rings of the compound with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3 is 3."}", "/scratch/micpie/export/rdkit_features/valid_28-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br is 48.10."} {"text":"The sum of atomic polarizabilities of the compound with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C is 59.86."}", "/scratch/micpie/export/rdkit_features/test_119-16.jsonl": "{"text":"Question: What is the count of rings of the molecule with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O?\nAnswer: 2"} {"text":"Question: What is the number of rings of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_16-5.jsonl": "{"text":"The number of rotatable bonds 
of the chemical with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C is 10."} {"text":"The number of rotatable bonds of the chemical with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N is 5."}", "/scratch/micpie/export/rdkit_features/valid_113-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 69.10"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 64.69"}", "/scratch/micpie/export/rdkit_features/test_11-22.jsonl": "{"text":"User: I want to synthesize a compound with a number of hydrogen bond donor sites of 0 and a number of hydrogen bond acceptor sites of 1.\nAssistant: Interesting, do you have some additional constraints I should consider?\nUser: Yeah, I want the count of heteroatoms to be 5.\nAssistant: Then, I suggest the compound with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2."} {"text":"User: I want to synthesize a compound with a count of hydrogen bond donors of 0 and a number of hydrogen bond acceptors of 2.\nAssistant: Nice, do you have some additional limitations I should consider?\nUser: I want the heteroatom count to be 3.\nAssistant: In that situation, I the compound with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C."}", "/scratch/micpie/export/rdkit_features/train_118-7.jsonl": "{"text":"The number of acid groups of the chemical with SMILES Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F is 0."} {"text":"The count of acid groups of the compound with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3 is 0."}", "/scratch/micpie/export/rdkit_features/valid_105-11.jsonl": "{"text":"User: I want to design a molecule with a molecular formula of C18H14FN3OS2.\nAssistant: That's interesting, do you have some additional limitations I should consider?\nUser: Yes, I want 
the number of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 5.\nAssistant: I the molecule with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F."} {"text":"User: I want to make a chemical with a formula of C13H12ClFN3O3+.\nAssistant: Interesting, do you have some additional limitations that help me narrow down the search?\nUser: Yes, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 4.\nAssistant: In that situation, I suggest the chemical with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O."}", "/scratch/micpie/export/rdkit_features/train_111-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C is 10."} {"text":"The count of heteroatoms of the molecule with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC is 10."}", "/scratch/micpie/export/rdkit_features/valid_18-3.jsonl": "{"text":"The number of heteroatoms of the compound with SMILES c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 12."} {"text":"The count of heteroatoms of the compound with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O is 7."}", "/scratch/micpie/export/rdkit_features/test_108-23.jsonl": "{"text":"User: I want to synthesize a compound with a number of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 10 and a LogP value computed using the Wildman-Crippen method of 2.55.\nAssistant: That is a very interesting question, do you have some additional limitations that I should consider?\nUser: Indeed, I want the chemical formula to be C21H19N9OS.\nAssistant: In that scenario, I advise the compound with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5."} {"text":"User: I want to design a compound with a number of hydrogen bond donor sites of 3, a number of hydrogen bond acceptors of 5 and a Wildman-Crippen LogP value of 1.70.\nAssistant: That's interesting, do you have 
some additional constraints that help me narrow down the search?\nUser: Yea, I want the chemical formula to be C23H29N5O4.\nAssistant: In that case, I suggest the compound with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3."}", "/scratch/micpie/export/rdkit_features/train_103-8.jsonl": "{"text":"The number of basic groups of the molecule with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1 is 0."} {"text":"The basic group count of the molecule with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 0."}", "/scratch/micpie/export/rdkit_features/valid_104-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the molecule with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3 is 2."} {"text":"The number of hydrogen bond donors of the molecule with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C is 3."}", "/scratch/micpie/export/rdkit_features/valid_3-8.jsonl": "{"text":"The number of basic groups of the chemical with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N is 0."} {"text":"The number of basic groups of the compound with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC is 0."}", "/scratch/micpie/export/rdkit_features/train_114-6.jsonl": "{"text":"The aromatic bond count of the compound with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C is 18."} {"text":"The count of aromatic bonds of the compound with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F is 23."}", "/scratch/micpie/export/rdkit_features/test_25-0.jsonl": "{"text":"The chemical formula of the compound with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4 is C21H38N5O2+."} {"text":"The chemical formula of the compound with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3 is C19H19F3N2O2."}", "/scratch/micpie/export/rdkit_features/train_10-23.jsonl": "{"text":"User: I want to analyze a molecule with a number of hydrogen bond donor sites of 1, a 
number of hydrogen bond acceptor sites of 5 and a Wildman-Crippen LogP value of 2.36.\nAssistant: Do you have some additional constraints?\nUser: Indeed, I want the chemical formula to be C23H42N5O+.\nAssistant: I suggest the molecule with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4."} {"text":"User: I want to create a molecule with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptor sites of 2 and a Wildman-Crippen LogP value of 4.69.\nAssistant: Nice, do you have some additional conditions that help me narrow down the search?\nUser: Indeed, I want the chemical formula to be C20H28ClNO2.\nAssistant: I advise the molecule with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl."}", "/scratch/micpie/export/rdkit_features/valid_8-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C is 70.69."} {"text":"The sum of atomic polarizabilities of the compound with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br is 35.06."}", "/scratch/micpie/export/rdkit_features/valid_15-18.jsonl": "{"text":"Question: What is the aromatic bond count of the molecule with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N?\nAnswer: 11"} {"text":"Question: What is the number of aromatic bonds of the chemical with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_5-3.jsonl": "{"text":"The count of heteroatoms of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C is 6."} {"text":"The heteroatom count of the compound with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C is 8."}", "/scratch/micpie/export/rdkit_features/train_1-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4?\nAnswer: 70.61"} 
{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl?\nAnswer: 55.14"}", "/scratch/micpie/export/rdkit_features/valid_3-23.jsonl": "{"text":"User: I want to analyze a chemical with a number of hydrogen bond donor sites of 3, a count of hydrogen bond acceptors of 5 and a Wildman-Crippen LogP value computed using RDKit of 3.62.\nAssistant: Interesting, do you have some additional limitations I should take into account?\nUser: I want the chemical formula to be C22H23N5O2S.\nAssistant: In that scenario, I advise the chemical with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N."} {"text":"User: I want to create a molecule with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptor sites of 5 and a Wildman-Crippen LogP value computed using RDKit of 3.59.\nAssistant: Interesting, do you have some additional limitations?\nUser: Yes, I want the chemical formula to be C20H22ClN3O4.\nAssistant: I suggest the molecule with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC."}", "/scratch/micpie/export/rdkit_features/valid_6-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+]?\nAnswer: 4"} {"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_108-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2?\nAnswer: 7"} {"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_118-3.jsonl": "{"text":"The count of 
heteroatoms of the compound with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O is 3."} {"text":"The heteroatom count of the chemical with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N is 11."}", "/scratch/micpie/export/rdkit_features/train_12-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the compound with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C?\nAnswer: 6"} {"text":"Question: What is the number of aromatic bonds of the chemical with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_114-8.jsonl": "{"text":"The count of basic groups of the molecule with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl is 0."} {"text":"The basic group count of the compound with SMILES C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4 is 0."}", "/scratch/micpie/export/rdkit_features/train_22-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O is 58.55."} {"text":"The sum of atomic polarizabilities of the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4 is 65.41."}", "/scratch/micpie/export/rdkit_features/train_6-1.jsonl": "{"text":"The number of hydrogen bond donors of the compound with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4 is 2."} {"text":"The count of hydrogen bond donors of the compound with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C is 1."}", "/scratch/micpie/export/rdkit_features/valid_22-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the chemical with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O?\nAnswer: 8"} {"text":"Question: What is the heteroatom count of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/train_100-19.jsonl": "{"text":"Question: What is 
the number of acid groups of the compound with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the chemical with SMILES COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_21-23.jsonl": "{"text":"User: I want to make a chemical with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptor sites of 6 and a Wildman-Crippen LogP value computed using RDKit of 0.86.\nAssistant: That's interesting, do you have some additional requirements I should consider?\nUser: I want the chemical formula to be C18H18NO6S-.\nAssistant: I propose the chemical with SMILES Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-]."} {"text":"User: I want to analyze a molecule with a count of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 5 and a Wildman-Crippen LogP value of 2.08.\nAssistant: That's interesting, do you have some additional I should consider?\nUser: Yep, I want the formula to be C20H32N4O3.\nAssistant: In that scenario, I the molecule with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O."}", "/scratch/micpie/export/rdkit_features/train_7-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the molecule with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl is 5."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F is 5."}", "/scratch/micpie/export/rdkit_features/test_12-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C?\nAnswer: 2"} {"text":"Question: What is the rotatable bond count of the molecule with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_118-18.jsonl": "{"text":"Question: What is the aromatic bond count of the compound with SMILES 
CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O?\nAnswer: 6"} {"text":"Question: What is the aromatic bond count of the compound with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_114-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the molecule with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C?\nAnswer: 8"} {"text":"Question: What is the count of heteroatoms of the compound with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/valid_23-22.jsonl": "{"text":"User: I want to analyze a compound with a number of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 5.\nAssistant: Nice, do you have some additional requirements I should consider?\nUser: Indeed, I want the heteroatom count to be 7.\nAssistant: Then, I advise the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4."} {"text":"User: I want to make a compound with a number of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 6.\nAssistant: That is a very interesting question, do you have some additional limitations that I should consider?\nUser: Indeed, I want the heteroatom count to be 9.\nAssistant: I propose the compound with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4."}", "/scratch/micpie/export/rdkit_features/train_113-3.jsonl": "{"text":"The count of heteroatoms of the chemical with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 9."} {"text":"The heteroatom count of the molecule with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 9."}", "/scratch/micpie/export/rdkit_features/valid_102-6.jsonl": "{"text":"The aromatic bond count of the compound with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1 is 18."} {"text":"The count of aromatic bonds of the compound with SMILES 
CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2 is 12."}", "/scratch/micpie/export/rdkit_features/test_114-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C is 5.74."} {"text":"The Wildman-Crippen LogP value of the compound with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5 is 5.07."}", "/scratch/micpie/export/rdkit_features/train_105-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the chemical with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N?\nAnswer: 6"} {"text":"Question: What is the count of rotatable bonds of the molecule with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_31-23.jsonl": "{"text":"User: I want to make a molecule with a count of hydrogen bond donors of 0, a count of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value of 3.73.\nAssistant: Do you have some additional limitations I should take into account?\nUser: I want the chemical formula to be C24H27FN2O3.\nAssistant: I propose the molecule with SMILES Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F."} {"text":"User: I want to synthesize a compound with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptors of 4 and a LogP value computed using the Wildman-Crippen method of 2.47.\nAssistant: That is a very interesting question, do you have some additional requirements?\nUser: Indeed, I want the formula to be C22H24N4O3.\nAssistant: I recommend the compound with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4."}", "/scratch/micpie/export/rdkit_features/valid_106-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl?\nAnswer: 38.66"} {"text":"Question: What is the sum of 
atomic polarizabilities of the molecule with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F?\nAnswer: 58.52"}", "/scratch/micpie/export/rdkit_features/test_17-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the chemical with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F?\nAnswer: 0"} {"text":"Question: What is the count of aromatic bonds of the compound with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/train_107-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl?\nAnswer: 6"} {"text":"Question: What is the number of aromatic bonds of the chemical with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5?\nAnswer: 27"}", "/scratch/micpie/export/rdkit_features/valid_25-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4 is 0."} {"text":"The acid group count of the chemical with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3 is 0."}", "/scratch/micpie/export/rdkit_features/train_6-11.jsonl": "{"text":"User: I want to design a chemical with a formula of C20H24N4O4S.\nAssistant: That's interesting, do you have some additional constraints that help me narrow down the search?\nUser: Yes, I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 6.\nAssistant: Then, I recommend the chemical with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4."} {"text":"User: I want to synthesize a chemical with a molecular formula of C25H30N4O2.\nAssistant: That is a very interesting question, do you have some additional that I should consider?\nUser: Yeah, I want the number of hydrogen bond donor sites to be 1, the number of hydrogen bond acceptor sites to be 5.\nAssistant: Then, 
I the chemical with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C."}", "/scratch/micpie/export/rdkit_features/train_114-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C is 64.69."} {"text":"The total sum of atomic polarizabilities of the molecule with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F is 71.42."}", "/scratch/micpie/export/rdkit_features/test_105-6.jsonl": "{"text":"The aromatic bond count of the chemical with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N is 12."} {"text":"The count of aromatic bonds of the chemical with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O is 6."}", "/scratch/micpie/export/rdkit_features/train_17-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the molecule with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F?\nAnswer: 7"} {"text":"Question: What is the number of heteroatoms of the molecule with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/valid_106-20.jsonl": "{"text":"Question: What is the number of basic groups of the chemical with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the compound with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_112-6.jsonl": "{"text":"The aromatic bond count of the chemical with SMILES C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC is 16."} {"text":"The count of aromatic bonds of the compound with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N is 12."}", "/scratch/micpie/export/rdkit_features/test_112-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES 
C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC is 0."} {"text":"The acid group count of the chemical with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N is 0."}", "/scratch/micpie/export/rdkit_features/train_30-3.jsonl": "{"text":"The number of heteroatoms of the compound with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F is 7."} {"text":"The count of heteroatoms of the molecule with SMILES CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC is 6."}", "/scratch/micpie/export/rdkit_features/test_118-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O is 1.92."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N is 1.07."}", "/scratch/micpie/export/rdkit_features/test_119-22.jsonl": "{"text":"User: I want to create a molecule with a count of hydrogen bond donors of 2 and a number of hydrogen bond acceptors of 6.\nAssistant: Interesting, do you have some additional I should consider?\nUser: Yes, I want the heteroatom count to be 9.\nAssistant: Then, I the molecule with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O."} {"text":"User: I want to make a molecule with a number of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 4.\nAssistant: Interesting, do you have some additional requirements that help me narrow down the search?\nUser: Yes, I want the number of heteroatoms to be 8.\nAssistant: In that scenario, I recommend the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C."}", "/scratch/micpie/export/rdkit_features/test_1-7.jsonl": "{"text":"The number of acid groups of the molecule with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4 is 0."} {"text":"The number of acid groups of the compound with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC is 0."}", 
"/scratch/micpie/export/rdkit_features/valid_32-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the chemical with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3?\nAnswer: 9"} {"text":"Question: What is the number of rotatable bonds of the molecule with SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_12-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C?\nAnswer: 0"} {"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_29-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F is 2."} {"text":"The number of hydrogen bond acceptors of the chemical with SMILES CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C is 2."}", "/scratch/micpie/export/rdkit_features/train_17-18.jsonl": "{"text":"Question: What is the aromatic bond count of the chemical with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F?\nAnswer: 6"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/test_107-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC is 66.93."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC is 67.93."}", "/scratch/micpie/export/rdkit_features/valid_20-3.jsonl": "{"text":"The heteroatom count of the compound with SMILES 
Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3 is 6."} {"text":"The number of heteroatoms of the molecule with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-] is 9."}", "/scratch/micpie/export/rdkit_features/train_111-18.jsonl": "{"text":"Question: What is the aromatic bond count of the compound with SMILES CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 15"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC?\nAnswer: 17"}", "/scratch/micpie/export/rdkit_features/train_3-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O is 5."} {"text":"The number of rotatable bonds of the chemical with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4 is 6."}", "/scratch/micpie/export/rdkit_features/train_28-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl is 41.48."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F is 58.00."}", "/scratch/micpie/export/rdkit_features/valid_33-18.jsonl": "{"text":"Question: What is the aromatic bond count of the molecule with SMILES c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6?\nAnswer: 10"} {"text":"Question: What is the count of aromatic bonds of the compound with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/valid_7-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC?\nAnswer: 2"}", 
"/scratch/micpie/export/rdkit_features/valid_14-3.jsonl": "{"text":"The heteroatom count of the molecule with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C is 5."} {"text":"The count of heteroatoms of the compound with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O is 5."}", "/scratch/micpie/export/rdkit_features/valid_104-0.jsonl": "{"text":"The formula of the compound with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3 is C14H21N5O4."} {"text":"The chemical formula of the chemical with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C is C21H34N3O2+."}", "/scratch/micpie/export/rdkit_features/train_33-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the compound with SMILES COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2?\nAnswer: 10"} {"text":"Question: What is the count of heteroatoms of the compound with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/train_2-7.jsonl": "{"text":"The number of acid groups of the molecule with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6 is 0."} {"text":"The acid group count of the compound with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O is 0."}", "/scratch/micpie/export/rdkit_features/test_5-5.jsonl": "{"text":"The rotatable bond count of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5 is 4."} {"text":"The count of rotatable bonds of the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F is 6."}", "/scratch/micpie/export/rdkit_features/train_104-11.jsonl": "{"text":"User: I want to synthesize a compound with a formula of C14H21N5O4.\nAssistant: Do you have some additional constraints that I should consider?\nUser: Indeed, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 7.\nAssistant: In that situation, I recommend the compound with SMILES 
Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3."} {"text":"User: I want to design a chemical with a chemical formula of C19H16ClN2O2S-.\nAssistant: Cool, do you have some additional that help me narrow down the search?\nUser: Yes, I want the number of hydrogen bond donor sites to be 0, the number of hydrogen bond acceptors to be 4.\nAssistant: In that situation, I propose the chemical with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl."}", "/scratch/micpie/export/rdkit_features/train_119-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO?\nAnswer: C15H14FN3O5"} {"text":"Question: What is the chemical formula of the chemical with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl?\nAnswer: C19H23Cl2N5O2S"}", "/scratch/micpie/export/rdkit_features/test_101-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F is 41.46."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1 is 69.40."}", "/scratch/micpie/export/rdkit_features/valid_113-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 5."} {"text":"The number of hydrogen bond acceptors of the molecule with SMILES Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 3."}", "/scratch/micpie/export/rdkit_features/valid_28-3.jsonl": "{"text":"The count of heteroatoms of the compound with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br is 5."} {"text":"The count of heteroatoms of the molecule with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C is 4."}", "/scratch/micpie/export/rdkit_features/test_112-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES 
C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC?\nAnswer: 66.20"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 67.83"}", "/scratch/micpie/export/rdkit_features/test_7-8.jsonl": "{"text":"The number of basic groups of the molecule with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC is 0."} {"text":"The number of basic groups of the chemical with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC is 0."}", "/scratch/micpie/export/rdkit_features/train_26-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the chemical with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F?\nAnswer: 3"} {"text":"Question: What is the number of rotatable bonds of the chemical with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_13-16.jsonl": "{"text":"Question: What is the ring count of the chemical with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 2"} {"text":"Question: What is the ring count of the compound with SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_4-3.jsonl": "{"text":"The count of heteroatoms of the compound with SMILES Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br is 7."} {"text":"The number of heteroatoms of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C is 6."}", "/scratch/micpie/export/rdkit_features/test_12-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_106-2.jsonl": "{"text":"The number 
of hydrogen bond acceptors of the compound with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl is 5."} {"text":"The count of hydrogen bond acceptors of the compound with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F is 3."}", "/scratch/micpie/export/rdkit_features/test_102-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3?\nAnswer: C23H25NO3"} {"text":"Question: What is the chemical formula of the molecule with SMILES Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1?\nAnswer: C28H26N2O6"}", "/scratch/micpie/export/rdkit_features/valid_10-11.jsonl": "{"text":"User: I want to analyze a molecule with a formula of C17H25F3N6S.\nAssistant: Interesting, do you have some additional limitations I should take into account?\nUser: Yeah, I want the number of hydrogen bond donors to be 0, the count of hydrogen bond acceptors to be 7.\nAssistant: Then, I recommend the molecule with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C."} {"text":"User: I want to design a chemical with a formula of C20H26ClNO2.\nAssistant: Cool, do you have some additional constraints I should take into account?\nUser: Yes, I want the number of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 2.\nAssistant: In that scenario, I advise the chemical with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3."}", "/scratch/micpie/export/rdkit_features/test_120-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES c1ccc(cc1)CCN(CC(=O)Nc2ccc(cc2)Cl)S(=O)(=O)c3ccc4c(c3)OCCO4 is 3.98."} {"text":"The Wildman-Crippen LogP value of the chemical with SMILES Cc1c(cnn1c2ccccc2)N[C@H](C)C(=O)Nc3cccc(c3Cl)Cl is 4.93."}", "/scratch/micpie/export/rdkit_features/valid_18-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES 
c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 12"} {"text":"Question: What is the aromatic bond count of the molecule with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_24-18.jsonl": "{"text":"Question: What is the aromatic bond count of the molecule with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl?\nAnswer: 22"} {"text":"Question: What is the count of aromatic bonds of the molecule with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_25-8.jsonl": "{"text":"The number of basic groups of the compound with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C is 0."} {"text":"The basic group count of the compound with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F is 0."}", "/scratch/micpie/export/rdkit_features/train_1-3.jsonl": "{"text":"The heteroatom count of the compound with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4 is 5."} {"text":"The number of heteroatoms of the molecule with SMILES c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl is 9."}", "/scratch/micpie/export/rdkit_features/valid_118-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donors of the molecule with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_109-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl is 6."} {"text":"The rotatable bond count of the chemical with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N is 4."}", "/scratch/micpie/export/rdkit_features/train_11-20.jsonl": "{"text":"Question: What is the basic group count of the compound with 
SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the compound with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_31-4.jsonl": "{"text":"The ring count of the chemical with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F is 5."} {"text":"The ring count of the chemical with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3 is 3."}", "/scratch/micpie/export/rdkit_features/train_19-4.jsonl": "{"text":"The count of rings of the chemical with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC is 2."} {"text":"The number of rings of the molecule with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3 is 3."}", "/scratch/micpie/export/rdkit_features/valid_29-5.jsonl": "{"text":"The rotatable bond count of the compound with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl is 7."} {"text":"The count of rotatable bonds of the compound with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3 is 5."}", "/scratch/micpie/export/rdkit_features/train_5-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C?\nAnswer: 16"} {"text":"Question: What is the number of aromatic bonds of the compound with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_101-7.jsonl": "{"text":"The count of acid groups of the molecule with SMILES COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3 is 0."} {"text":"The acid group count of the compound with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1 is 0."}", "/scratch/micpie/export/rdkit_features/test_20-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES 
C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br?\nAnswer: 5"} {"text":"Question: What is the number of rotatable bonds of the molecule with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-]?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_27-20.jsonl": "{"text":"Question: What is the number of basic groups of the chemical with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F?\nAnswer: 1"} {"text":"Question: What is the number of basic groups of the molecule with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_27-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C is 7."} {"text":"The number of rotatable bonds of the compound with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4 is 6."}", "/scratch/micpie/export/rdkit_features/test_20-5.jsonl": "{"text":"The number of rotatable bonds of the chemical with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br is 5."} {"text":"The number of rotatable bonds of the compound with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-] is 7."}", "/scratch/micpie/export/rdkit_features/test_7-23.jsonl": "{"text":"User: I want to analyze a chemical with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 6 and a Wildman-Crippen LogP value computed using RDKit of 3.75.\nAssistant: That is a very interesting question, do you have some additional requirements I should take into account?\nUser: Yep, I want the molecular formula to be C24H26N4O3.\nAssistant: In that scenario, I recommend the chemical with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC."} {"text":"User: I want to analyze a chemical with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptors of 7 and a Wildman-Crippen LogP value of 3.90.\nAssistant: Cool, do you have some 
additional requirements?\nUser: Yea, I want the chemical formula to be C21H18N2O5S.\nAssistant: I propose the chemical with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC."}", "/scratch/micpie/export/rdkit_features/test_15-23.jsonl": "{"text":"User: I want to design a chemical with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value computed using RDKit of 2.72.\nAssistant: That is a very interesting question, do you have some additional constraints I should consider?\nUser: Yes, I want the molecular formula to be C13H14BrNO3.\nAssistant: In that case, I recommend the chemical with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O."} {"text":"User: I want to synthesize a compound with a number of hydrogen bond donor sites of 2, a count of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value of 1.26.\nAssistant: Cool, do you have some additional constraints that I should consider?\nUser: Yep, I want the molecular formula to be C16H26N3OS+.\nAssistant: I recommend the compound with SMILES C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2."}", "/scratch/micpie/export/rdkit_features/train_23-19.jsonl": "{"text":"Question: What is the count of acid groups of the molecule with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the compound with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_105-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_108-19.jsonl": "{"text":"Question: What is the acid group count of the chemical with SMILES 
CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2?\nAnswer: 0"} {"text":"Question: What is the acid group count of the chemical with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_5-7.jsonl": "{"text":"The count of acid groups of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C is 0."} {"text":"The number of acid groups of the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+] is 0."}", "/scratch/micpie/export/rdkit_features/train_113-11.jsonl": "{"text":"User: I want to create a molecule with a chemical formula of C21H32N5O3S+.\nAssistant: That's interesting, do you have some additional constraints I should take into account?\nUser: Indeed, I want the number of hydrogen bond donors to be 3, the number of hydrogen bond acceptors to be 5.\nAssistant: Then, I advise the molecule with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N."} {"text":"User: I want to design a molecule with a chemical formula of C21H19Cl2N3O2S2.\nAssistant: Interesting, do you have some additional constraints that I should consider?\nUser: Yep, I want the count of hydrogen bond donors to be 3, the count of hydrogen bond acceptors to be 3.\nAssistant: I recommend the molecule with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl."}", "/scratch/micpie/export/rdkit_features/valid_120-19.jsonl": "{"text":"Question: What is the number of acid groups of the molecule with SMILES c1cc(oc1)CNC(=O)CSc2nc3ccc(cc3s2)n4c(c5c(c4O)[C@@H]6C[C@@H]5C=C6)O?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the molecule with SMILES CN(c1nc2cc(ccc2s1)Cl)C(=O)CSc3ccc(c(c3)F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_12-15.jsonl": "{"text":"Question: What is the heteroatom count of the chemical with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C?\nAnswer: 6"} {"text":"Question: What is the heteroatom 
count of the molecule with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_16-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the molecule with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C is 1.57."} {"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F is 2.82."}", "/scratch/micpie/export/rdkit_features/valid_107-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C is 0."} {"text":"The number of acid groups of the compound with SMILES CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C is 0."}", "/scratch/micpie/export/rdkit_features/test_31-22.jsonl": "{"text":"User: I want to design a molecule with a number of hydrogen bond donors of 0 and a number of hydrogen bond acceptors of 3.\nAssistant: Cool, do you have some additional limitations I should consider?\nUser: I want the count of heteroatoms to be 7.\nAssistant: In that case, I advise the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F."} {"text":"User: I want to make a molecule with a number of hydrogen bond donor sites of 1 and a number of hydrogen bond acceptors of 4.\nAssistant: That is a very interesting question, do you have some additional constraints?\nUser: Yes, I want the heteroatom count to be 7.\nAssistant: Then, I propose the molecule with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3."}", "/scratch/micpie/export/rdkit_features/test_14-22.jsonl": "{"text":"User: I want to design a chemical with a number of hydrogen bond donor sites of 1 and a number of hydrogen bond acceptors of 3.\nAssistant: Interesting, do you have some additional conditions I should take into account?\nUser: Yea, I want the number of heteroatoms to be 5.\nAssistant: In that case, I recommend the 
chemical with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3."} {"text":"User: I want to make a molecule with a number of hydrogen bond donor sites of 1 and a count of hydrogen bond acceptors of 4.\nAssistant: Cool, do you have some additional conditions that I should consider?\nUser: I want the number of heteroatoms to be 5.\nAssistant: I recommend the molecule with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O."}", "/scratch/micpie/export/rdkit_features/train_115-15.jsonl": "{"text":"Question: What is the heteroatom count of the chemical with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F?\nAnswer: 8"} {"text":"Question: What is the heteroatom count of the molecule with SMILES c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/train_100-20.jsonl": "{"text":"Question: What is the basic group count of the molecule with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C?\nAnswer: 1"} {"text":"Question: What is the count of basic groups of the molecule with SMILES COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_7-0.jsonl": "{"text":"The chemical formula of the compound with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC is C24H26N4O3."} {"text":"The chemical formula of the molecule with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC is C21H18N2O5S."}", "/scratch/micpie/export/rdkit_features/valid_30-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3 is 64.83."} {"text":"The total sum of atomic polarizabilities of the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F is 63.16."}", "/scratch/micpie/export/rdkit_features/test_112-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the molecule with SMILES 
C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC?\nAnswer: 10"} {"text":"Question: What is the number of heteroatoms of the chemical with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/test_14-3.jsonl": "{"text":"The count of heteroatoms of the molecule with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3 is 5."} {"text":"The heteroatom count of the molecule with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O is 5."}", "/scratch/micpie/export/rdkit_features/valid_109-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl is 5."} {"text":"The count of hydrogen bond acceptors of the chemical with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F is 5."}", "/scratch/micpie/export/rdkit_features/test_8-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C is 7."} {"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F is 4."}", "/scratch/micpie/export/rdkit_features/test_19-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_22-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O?\nAnswer: 58.55"} {"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4?\nAnswer: 65.41"}", 
"/scratch/micpie/export/rdkit_features/train_33-3.jsonl": "{"text":"The number of heteroatoms of the compound with SMILES COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2 is 10."} {"text":"The count of heteroatoms of the molecule with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br is 8."}", "/scratch/micpie/export/rdkit_features/train_2-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6?\nAnswer: 64.73"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O?\nAnswer: 65.41"}", "/scratch/micpie/export/rdkit_features/train_118-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F?\nAnswer: 1"} {"text":"Question: What is the count of basic groups of the molecule with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_19-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC?\nAnswer: 2"} {"text":"Question: What is the number of rings of the chemical with SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_109-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the compound with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl?\nAnswer: 6"} {"text":"Question: What is the count of rotatable bonds of the compound with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_106-19.jsonl": "{"text":"Question: What is the acid group count of the compound with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O?\nAnswer: 0"} {"text":"Question: What is the count of acid 
groups of the compound with SMILES CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_109-4.jsonl": "{"text":"The count of rings of the compound with SMILES Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3 is 3."} {"text":"The count of rings of the chemical with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C is 3."}", "/scratch/micpie/export/rdkit_features/train_8-20.jsonl": "{"text":"Question: What is the count of basic groups of the chemical with SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the chemical with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_106-19.jsonl": "{"text":"Question: What is the acid group count of the chemical with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl?\nAnswer: 0"} {"text":"Question: What is the acid group count of the molecule with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_18-0.jsonl": "{"text":"The molecular formula of the compound with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O is C20H25N3O9S."} {"text":"The formula of the chemical with SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C is C21H28N2O3S2."}", "/scratch/micpie/export/rdkit_features/valid_31-22.jsonl": "{"text":"User: I want to synthesize a molecule with a count of hydrogen bond donors of 0 and a number of hydrogen bond acceptors of 3.\nAssistant: That is a very interesting question, do you have some additional limitations?\nUser: Indeed, I want the heteroatom count to be 6.\nAssistant: I recommend the molecule with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC."} {"text":"User: I want to synthesize a molecule with a number of hydrogen bond donors of 
1 and a number of hydrogen bond acceptors of 4.\nAssistant: Do you have some additional constraints that help me narrow down the search?\nUser: Yep, I want the count of heteroatoms to be 7.\nAssistant: In that scenario, I propose the molecule with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3."}", "/scratch/micpie/export/rdkit_features/train_116-6.jsonl": "{"text":"The aromatic bond count of the chemical with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 17."} {"text":"The number of aromatic bonds of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br is 17."}", "/scratch/micpie/export/rdkit_features/valid_28-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_110-0.jsonl": "{"text":"The molecular formula of the compound with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O is C24H29N4O4+."} {"text":"The chemical formula of the chemical with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C is C19H25N5O3S2."}", "/scratch/micpie/export/rdkit_features/test_0-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1?\nAnswer: 69.35"} {"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC?\nAnswer: 62.75"}", "/scratch/micpie/export/rdkit_features/test_28-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC?\nAnswer: 53.59"} {"text":"Question: What 
is the total sum of atomic polarizabilities of the chemical with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C?\nAnswer: 58.17"}", "/scratch/micpie/export/rdkit_features/valid_113-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the molecule with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 9"} {"text":"Question: What is the count of heteroatoms of the compound with SMILES Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/train_101-17.jsonl": "{"text":"Question: What is the rotatable bond count of the chemical with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3?\nAnswer: 5"} {"text":"Question: What is the rotatable bond count of the chemical with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/train_32-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C?\nAnswer: 4"} {"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_12-4.jsonl": "{"text":"The count of rings of the compound with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C is 4."} {"text":"The count of rings of the chemical with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4 is 4."}", "/scratch/micpie/export/rdkit_features/train_103-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1?\nAnswer: 4"} {"text":"Question: What is the rotatable bond count of the chemical with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_31-4.jsonl": "{"text":"The count of rings of the compound with SMILES 
Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F is 4."} {"text":"The number of rings of the chemical with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4 is 4."}", "/scratch/micpie/export/rdkit_features/train_18-22.jsonl": "{"text":"User: I want to design a chemical with a number of hydrogen bond donor sites of 1 and a number of hydrogen bond acceptor sites of 9.\nAssistant: That is a very interesting question, do you have some additional constraints I should consider?\nUser: Yeah, I want the heteroatom count to be 13.\nAssistant: In that scenario, I suggest the chemical with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O."} {"text":"User: I want to create a chemical with a number of hydrogen bond donors of 2 and a count of hydrogen bond acceptors of 4.\nAssistant: That's interesting, do you have some additional constraints?\nUser: Yep, I want the count of heteroatoms to be 9.\nAssistant: In that scenario, I advise the chemical with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl."}", "/scratch/micpie/export/rdkit_features/train_22-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donors of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_23-4.jsonl": "{"text":"The number of rings of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4 is 4."} {"text":"The count of rings of the compound with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4 is 4."}", "/scratch/micpie/export/rdkit_features/train_117-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(cc4)Oc5ccc(cn5)Cl?\nAnswer: 5"} {"text":"Question: What is the 
ring count of the chemical with SMILES C[C@@H](C(C)(C)OC)[NH2+]CCNc1ccccc1?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_31-16.jsonl": "{"text":"Question: What is the ring count of the chemical with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F?\nAnswer: 5"} {"text":"Question: What is the count of rings of the chemical with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_119-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the compound with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO is 6."} {"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl is 6."}", "/scratch/micpie/export/rdkit_features/train_24-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the compound with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5?\nAnswer: 5"} {"text":"Question: What is the count of rotatable bonds of the molecule with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_1-8.jsonl": "{"text":"The basic group count of the compound with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5 is 0."} {"text":"The number of basic groups of the chemical with SMILES Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F is 0."}", "/scratch/micpie/export/rdkit_features/test_100-12.jsonl": "{"text":"Question: What is the molecular formula of the molecule with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F?\nAnswer: C18H27FN3O+"} {"text":"Question: What is the formula of the chemical with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br?\nAnswer: C13H14BrN3O2"}", "/scratch/micpie/export/rdkit_features/train_20-1.jsonl": "{"text":"The count of hydrogen bond donors of the chemical with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3 is 2."} {"text":"The 
count of hydrogen bond donors of the compound with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC is 1."}", "/scratch/micpie/export/rdkit_features/train_20-8.jsonl": "{"text":"The number of basic groups of the compound with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3 is 0."} {"text":"The number of basic groups of the chemical with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC is 0."}", "/scratch/micpie/export/rdkit_features/valid_21-22.jsonl": "{"text":"User: I want to create a compound with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptor sites of 6.\nAssistant: Do you have some additional conditions that I should consider?\nUser: Yep, I want the count of heteroatoms to be 8.\nAssistant: In that case, I propose the compound with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-]."} {"text":"User: I want to design a molecule with a number of hydrogen bond donor sites of 2 and a count of hydrogen bond acceptors of 4.\nAssistant: Do you have some additional limitations I should take into account?\nUser: I want the heteroatom count to be 6.\nAssistant: In that case, I recommend the molecule with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3."}", "/scratch/micpie/export/rdkit_features/test_104-19.jsonl": "{"text":"Question: What is the count of acid groups of the compound with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2?\nAnswer: 0"} {"text":"Question: What is the acid group count of the chemical with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_22-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donors of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4?\nAnswer: 1"}", 
"/scratch/micpie/export/rdkit_features/valid_23-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_31-0.jsonl": "{"text":"The molecular formula of the chemical with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F is C24H23F2N3O2."} {"text":"The chemical formula of the compound with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3 is C20H27FN2O4."}", "/scratch/micpie/export/rdkit_features/test_0-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1 is 9."} {"text":"The number of hydrogen bond acceptors of the chemical with SMILES Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC is 6."}", "/scratch/micpie/export/rdkit_features/test_3-0.jsonl": "{"text":"The molecular formula of the compound with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC is C22H26N6O3."} {"text":"The molecular formula of the molecule with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F is C20H20F2N4O3."}", "/scratch/micpie/export/rdkit_features/valid_104-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the chemical with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_103-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the compound with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO is 7."} {"text":"The number of hydrogen bond acceptors of the 
compound with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N is 6."}", "/scratch/micpie/export/rdkit_features/test_112-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC is 9."} {"text":"The count of hydrogen bond acceptors of the molecule with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N is 5."}", "/scratch/micpie/export/rdkit_features/valid_12-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the compound with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl is 4.53."} {"text":"The Wildman-Crippen LogP value of the chemical with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3 is 4.56."}", "/scratch/micpie/export/rdkit_features/valid_14-23.jsonl": "{"text":"User: I want to create a compound with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value of 4.53.\nAssistant: That is a very interesting question, do you have some additional limitations I should consider?\nUser: Yes, I want the molecular formula to be C20H28N4O.\nAssistant: In that case, I advise the compound with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C."} {"text":"User: I want to make a molecule with a count of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 4 and a LogP value computed using the Wildman-Crippen method of 2.72.\nAssistant: Cool, do you have some additional conditions I should take into account?\nUser: Yea, I want the chemical formula to be C14H14BrNO3.\nAssistant: Then, I propose the molecule with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O."}", "/scratch/micpie/export/rdkit_features/train_32-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C is 8."} {"text":"The count of rotatable bonds of the compound with SMILES 
C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2 is 7."}", "/scratch/micpie/export/rdkit_features/test_112-16.jsonl": "{"text":"Question: What is the number of rings of the compound with SMILES C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC?\nAnswer: 4"} {"text":"Question: What is the ring count of the compound with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_32-16.jsonl": "{"text":"Question: What is the number of rings of the compound with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3?\nAnswer: 3"} {"text":"Question: What is the ring count of the molecule with SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_111-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4 is 8."} {"text":"The number of rotatable bonds of the compound with SMILES C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC is 6."}", "/scratch/micpie/export/rdkit_features/train_118-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F?\nAnswer: 42.44"} {"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3?\nAnswer: 46.27"}", "/scratch/micpie/export/rdkit_features/valid_21-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-] is 0.08."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3 is 2.49."}", "/scratch/micpie/export/rdkit_features/valid_26-3.jsonl": "{"text":"The heteroatom count of the compound with SMILES 
CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2 is 7."} {"text":"The count of heteroatoms of the compound with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC is 8."}", "/scratch/micpie/export/rdkit_features/test_10-16.jsonl": "{"text":"Question: What is the ring count of the compound with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4?\nAnswer: 4"} {"text":"Question: What is the ring count of the molecule with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_18-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_24-11.jsonl": "{"text":"User: I want to design a molecule with a formula of C14H7Cl2N8O-.\nAssistant: Interesting, do you have some additional requirements I should take into account?\nUser: Yeah, I want the number of hydrogen bond donor sites to be 1, the count of hydrogen bond acceptors to be 7.\nAssistant: In that situation, I advise the molecule with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl."} {"text":"User: I want to synthesize a molecule with a formula of C21H39N6O+.\nAssistant: That is a very interesting question, do you have some additional I should consider?\nUser: I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 5.\nAssistant: In that case, I advise the molecule with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C."}", "/scratch/micpie/export/rdkit_features/valid_110-22.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donors of 2 and a count of hydrogen bond acceptors of 4.\nAssistant: Interesting, do you have some 
additional I should consider?\nUser: Yea, I want the count of heteroatoms to be 8.\nAssistant: In that case, I propose the chemical with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O."} {"text":"User: I want to analyze a molecule with a number of hydrogen bond donor sites of 0 and a number of hydrogen bond acceptors of 8.\nAssistant: Interesting, do you have some additional constraints I should take into account?\nUser: Indeed, I want the count of heteroatoms to be 10.\nAssistant: In that scenario, I suggest the molecule with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C."}", "/scratch/micpie/export/rdkit_features/valid_110-12.jsonl": "{"text":"Question: What is the molecular formula of the compound with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O?\nAnswer: C24H29N4O4+"} {"text":"Question: What is the formula of the compound with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: C19H25N5O3S2"}", "/scratch/micpie/export/rdkit_features/valid_119-23.jsonl": "{"text":"User: I want to design a molecule with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptor sites of 6 and a Wildman-Crippen LogP value of 1.24.\nAssistant: Cool, do you have some additional constraints I should take into account?\nUser: Indeed, I want the molecular formula to be C14H18FN3O5.\nAssistant: In that scenario, I the molecule with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O."} {"text":"User: I want to design a compound with a number of hydrogen bond donor sites of 0, a number of hydrogen bond acceptor sites of 7 and a LogP value computed using the Wildman-Crippen method of 3.71.\nAssistant: Cool, do you have some additional requirements I should take into account?\nUser: Indeed, I want the formula to be C28H23N3O4.\nAssistant: In that scenario, I the compound with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6."}", 
"/scratch/micpie/export/rdkit_features/valid_115-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4?\nAnswer: 67.42"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F?\nAnswer: 71.32"}", "/scratch/micpie/export/rdkit_features/train_1-0.jsonl": "{"text":"The molecular formula of the compound with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4 is C25H33N2O3+."} {"text":"The molecular formula of the chemical with SMILES c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl is C17H22ClF3N2O2S."}", "/scratch/micpie/export/rdkit_features/train_4-23.jsonl": "{"text":"User: I want to analyze a compound with a count of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 5 and a LogP value computed using the Wildman-Crippen method of 2.25.\nAssistant: Do you have some additional requirements that I should consider?\nUser: Yes, I want the molecular formula to be C24H28N3O3+.\nAssistant: In that situation, I propose the compound with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4."} {"text":"User: I want to analyze a molecule with a number of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value of 2.36.\nAssistant: Interesting, do you have some additional conditions I should consider?\nUser: Yes, I want the molecular formula to be C23H29N2O3S+.\nAssistant: In that scenario, I recommend the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C."}", "/scratch/micpie/export/rdkit_features/test_33-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O is 9."} {"text":"The heteroatom count of the chemical with SMILES 
Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br is 9."}", "/scratch/micpie/export/rdkit_features/test_21-20.jsonl": "{"text":"Question: What is the number of basic groups of the chemical with SMILES Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-]?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the molecule with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_19-1.jsonl": "{"text":"The number of hydrogen bond donors of the chemical with SMILES C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC is 3."} {"text":"The number of hydrogen bond donor sites of the compound with SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F is 2."}", "/scratch/micpie/export/rdkit_features/test_119-4.jsonl": "{"text":"The number of rings of the compound with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O is 2."} {"text":"The count of rings of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C is 2."}", "/scratch/micpie/export/rdkit_features/test_5-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5 is 4."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F is 4."}", "/scratch/micpie/export/rdkit_features/test_20-7.jsonl": "{"text":"The count of acid groups of the compound with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br is 0."} {"text":"The count of acid groups of the molecule with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-] is 1."}", "/scratch/micpie/export/rdkit_features/train_30-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the chemical with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F?\nAnswer: 8"} {"text":"Question: What is the number of rotatable bonds of the molecule with SMILES 
CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_13-22.jsonl": "{"text":"User: I want to make a compound with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 3.\nAssistant: Nice, do you have some additional that help me narrow down the search?\nUser: Indeed, I want the heteroatom count to be 5.\nAssistant: Then, I suggest the compound with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4."} {"text":"User: I want to design a compound with a number of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 4.\nAssistant: Cool, do you have some additional requirements I should take into account?\nUser: Yes, I want the heteroatom count to be 4.\nAssistant: In that case, I propose the compound with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C."}", "/scratch/micpie/export/rdkit_features/train_28-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the chemical with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl is 3.94."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F is 3.75."}", "/scratch/micpie/export/rdkit_features/test_22-22.jsonl": "{"text":"User: I want to design a compound with a number of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 4.\nAssistant: Do you have some additional requirements that I should consider?\nUser: Yeah, I want the number of heteroatoms to be 6.\nAssistant: I propose the compound with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O."} {"text":"User: I want to synthesize a chemical with a count of hydrogen bond donors of 2 and a number of hydrogen bond acceptor sites of 5.\nAssistant: Do you have some additional limitations I should take into account?\nUser: Indeed, I want the count of heteroatoms to be 7.\nAssistant: In that situation, I propose the chemical with SMILES 
CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4."}", "/scratch/micpie/export/rdkit_features/test_16-20.jsonl": "{"text":"Question: What is the basic group count of the chemical with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C?\nAnswer: 1"} {"text":"Question: What is the count of basic groups of the compound with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_18-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the chemical with SMILES c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 6"} {"text":"Question: What is the number of rotatable bonds of the molecule with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_0-22.jsonl": "{"text":"User: I want to create a compound with a number of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 8.\nAssistant: Cool, do you have some additional I should take into account?\nUser: I want the number of heteroatoms to be 11.\nAssistant: In that scenario, I advise the compound with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1."} {"text":"User: I want to create a molecule with a number of hydrogen bond donor sites of 2 and a count of hydrogen bond acceptors of 3.\nAssistant: Interesting, do you have some additional limitations I should take into account?\nUser: I want the count of heteroatoms to be 5.\nAssistant: In that scenario, I recommend the molecule with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4."}", "/scratch/micpie/export/rdkit_features/test_11-8.jsonl": "{"text":"The number of basic groups of the compound with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2 is 0."} {"text":"The count of basic groups of the molecule with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C is 0."}", "/scratch/micpie/export/rdkit_features/test_17-8.jsonl": 
"{"text":"The basic group count of the chemical with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F is 0."} {"text":"The basic group count of the chemical with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 0."}", "/scratch/micpie/export/rdkit_features/valid_108-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the compound with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_115-23.jsonl": "{"text":"User: I want to analyze a compound with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 5 and a Wildman-Crippen LogP value of 5.31.\nAssistant: Nice, do you have some additional constraints that help me narrow down the search?\nUser: Yeah, I want the formula to be C27H22FN3O4.\nAssistant: In that case, I propose the compound with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F."} {"text":"User: I want to analyze a chemical with a count of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value of 5.18.\nAssistant: That's interesting, do you have some additional requirements?\nUser: Yep, I want the molecular formula to be C30H30FN3O2.\nAssistant: In that scenario, I advise the chemical with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F."}", "/scratch/micpie/export/rdkit_features/valid_15-3.jsonl": "{"text":"The number of heteroatoms of the compound with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N is 5."} {"text":"The heteroatom count of the chemical with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C is 4."}", "/scratch/micpie/export/rdkit_features/train_27-6.jsonl": "{"text":"The count of aromatic bonds of the molecule 
with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C is 11."} {"text":"The number of aromatic bonds of the chemical with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4 is 16."}", "/scratch/micpie/export/rdkit_features/test_113-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N is 3."} {"text":"The count of hydrogen bond donors of the compound with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 3."}", "/scratch/micpie/export/rdkit_features/valid_18-12.jsonl": "{"text":"Question: What is the formula of the compound with SMILES c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: C22H23N5O7"} {"text":"Question: What is the formula of the compound with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O?\nAnswer: C12H14Br2F3NO"}", "/scratch/micpie/export/rdkit_features/train_19-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC?\nAnswer: 9"} {"text":"Question: What is the number of rotatable bonds of the compound with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3?\nAnswer: 10"}", "/scratch/micpie/export/rdkit_features/train_110-7.jsonl": "{"text":"The count of acid groups of the compound with SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O is 0."} {"text":"The acid group count of the molecule with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C is 0."}", "/scratch/micpie/export/rdkit_features/train_20-20.jsonl": "{"text":"Question: What is the number of basic groups of the chemical with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the compound with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC?\nAnswer: 0"}", 
"/scratch/micpie/export/rdkit_features/train_18-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the compound with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O?\nAnswer: 13"} {"text":"Question: What is the number of heteroatoms of the chemical with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/train_21-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC?\nAnswer: 6"} {"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_27-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C?\nAnswer: 3"} {"text":"Question: What is the ring count of the compound with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_106-3.jsonl": "{"text":"The count of heteroatoms of the compound with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O is 9."} {"text":"The count of heteroatoms of the compound with SMILES CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl is 11."}", "/scratch/micpie/export/rdkit_features/test_18-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O?\nAnswer: 65.29"} {"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C?\nAnswer: 66.04"}", "/scratch/micpie/export/rdkit_features/test_31-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with SMILES 
Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F is 63.59."} {"text":"The total sum of atomic polarizabilities of the molecule with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3 is 59.17."}", "/scratch/micpie/export/rdkit_features/train_108-16.jsonl": "{"text":"Question: What is the ring count of the compound with SMILES CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2?\nAnswer: 2"} {"text":"Question: What is the count of rings of the molecule with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_1-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the chemical with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5?\nAnswer: 9"} {"text":"Question: What is the count of heteroatoms of the molecule with SMILES Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/valid_12-22.jsonl": "{"text":"User: I want to create a molecule with a count of hydrogen bond donors of 0 and a count of hydrogen bond acceptors of 2.\nAssistant: That is a very interesting question, do you have some additional requirements that help me narrow down the search?\nUser: I want the count of heteroatoms to be 4.\nAssistant: In that case, I propose the molecule with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl."} {"text":"User: I want to make a compound with a count of hydrogen bond donors of 0 and a number of hydrogen bond acceptor sites of 4.\nAssistant: Do you have some additional constraints?\nUser: Yeah, I want the count of heteroatoms to be 5.\nAssistant: In that situation, I the compound with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3."}", "/scratch/micpie/export/rdkit_features/valid_19-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC is 12."} {"text":"The aromatic bond count of the compound with 
SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F is 15."}", "/scratch/micpie/export/rdkit_features/valid_29-4.jsonl": "{"text":"The count of rings of the molecule with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl is 3."} {"text":"The count of rings of the molecule with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3 is 3."}", "/scratch/micpie/export/rdkit_features/train_10-4.jsonl": "{"text":"The number of rings of the chemical with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4 is 4."} {"text":"The ring count of the chemical with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl is 3."}", "/scratch/micpie/export/rdkit_features/train_16-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C?\nAnswer: 55.59"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F?\nAnswer: 64.87"}", "/scratch/micpie/export/rdkit_features/train_22-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O?\nAnswer: 3"} {"text":"Question: What is the count of rotatable bonds of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_15-18.jsonl": "{"text":"Question: What is the aromatic bond count of the compound with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O?\nAnswer: 6"} {"text":"Question: What is the aromatic bond count of the compound with SMILES C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_101-11.jsonl": "{"text":"User: I want to create a molecule with a chemical formula of C15H13ClN4O2.\nAssistant: Do you have some additional requirements that I 
should consider?\nUser: Yea, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 4.\nAssistant: In that situation, I advise the molecule with SMILES COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3."} {"text":"User: I want to make a molecule with a formula of C25H23F2N3O2.\nAssistant: Cool, do you have some additional ?\nUser: Yep, I want the count of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 3.\nAssistant: Then, I advise the molecule with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1."}", "/scratch/micpie/export/rdkit_features/valid_109-4.jsonl": "{"text":"The number of rings of the chemical with SMILES C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl is 3."} {"text":"The number of rings of the compound with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F is 4."}", "/scratch/micpie/export/rdkit_features/valid_118-3.jsonl": "{"text":"The count of heteroatoms of the molecule with SMILES CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O is 3."} {"text":"The heteroatom count of the molecule with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO is 8."}", "/scratch/micpie/export/rdkit_features/valid_9-23.jsonl": "{"text":"User: I want to synthesize a chemical with a count of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value of 3.97.\nAssistant: That is a very interesting question, do you have some additional requirements I should take into account?\nUser: Yes, I want the formula to be C21H31F3N2O2.\nAssistant: In that case, I recommend the chemical with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2."} {"text":"User: I want to analyze a chemical with a count of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 6 and a Wildman-Crippen LogP value of 3.70.\nAssistant: Nice, do you have some additional requirements I should consider?\nUser: I want the chemical formula to be 
C23H39N5O2.\nAssistant: In that case, I suggest the chemical with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4."}", "/scratch/micpie/export/rdkit_features/train_10-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the molecule with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4?\nAnswer: 6"} {"text":"Question: What is the heteroatom count of the chemical with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_2-3.jsonl": "{"text":"The heteroatom count of the molecule with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6 is 8."} {"text":"The number of heteroatoms of the chemical with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O is 6."}", "/scratch/micpie/export/rdkit_features/valid_33-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6?\nAnswer: 4"} {"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_117-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc5cc(ccc5c4)Br?\nAnswer: 5"} {"text":"Question: What is the count of rings of the chemical with SMILES C[C@H](C(C)(C)OC)[NH2+]CCNc1ccccc1?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_0-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the chemical with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1 is 5.68."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4 is 3.85."}", "/scratch/micpie/export/rdkit_features/valid_104-5.jsonl": "{"text":"The count of rotatable bonds of the 
compound with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3 is 5."} {"text":"The number of rotatable bonds of the molecule with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C is 6."}", "/scratch/micpie/export/rdkit_features/test_6-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the molecule with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_102-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the chemical with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1?\nAnswer: 7"} {"text":"Question: What is the heteroatom count of the chemical with SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2?\nAnswer: 10"}", "/scratch/micpie/export/rdkit_features/valid_100-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the chemical with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F?\nAnswer: 5"} {"text":"Question: What is the count of rotatable bonds of the chemical with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_26-4.jsonl": "{"text":"The count of rings of the chemical with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F is 2."} {"text":"The count of rings of the compound with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C is 3."}", "/scratch/micpie/export/rdkit_features/valid_18-4.jsonl": "{"text":"The count of rings of the chemical with SMILES c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 5."} {"text":"The ring count of the chemical with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O is 1."}", "/scratch/micpie/export/rdkit_features/train_20-10.jsonl": "{"text":"The 
Wildman-Crippen LogP value of the compound with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3 is 3.95."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC is 0.69."}", "/scratch/micpie/export/rdkit_features/valid_16-8.jsonl": "{"text":"The number of basic groups of the compound with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C is 1."} {"text":"The basic group count of the chemical with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3 is 0."}", "/scratch/micpie/export/rdkit_features/test_29-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the chemical with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4 is 2."} {"text":"The count of hydrogen bond donors of the molecule with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C is 1."}", "/scratch/micpie/export/rdkit_features/train_103-23.jsonl": "{"text":"User: I want to make a molecule with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptors of 3 and a LogP value computed using the Wildman-Crippen method of 4.24.\nAssistant: Cool, do you have some additional constraints that help me narrow down the search?\nUser: I want the chemical formula to be C18H14FNO2.\nAssistant: In that case, I suggest the molecule with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1."} {"text":"User: I want to design a molecule with a number of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 7 and a Wildman-Crippen LogP value computed using RDKit of -1.05.\nAssistant: Nice, do you have some additional constraints I should take into account?\nUser: Yes, I want the formula to be C14H21N5O4.\nAssistant: Then, I suggest the molecule with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3."}", "/scratch/micpie/export/rdkit_features/train_118-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES
Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F is 1.92."} {"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3 is 1.73."}", "/scratch/micpie/export/rdkit_features/valid_119-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the compound with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O is 6."} {"text":"The count of hydrogen bond acceptors of the chemical with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6 is 7."}", "/scratch/micpie/export/rdkit_features/valid_6-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the molecule with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+]?\nAnswer: 18"} {"text":"Question: What is the count of aromatic bonds of the compound with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F?\nAnswer: 17"}", "/scratch/micpie/export/rdkit_features/test_4-11.jsonl": "{"text":"User: I want to synthesize a molecule with a chemical formula of C23H20ClN3O3.\nAssistant: Interesting, do you have some additional limitations?\nUser: I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 4.\nAssistant: In that case, I suggest the molecule with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4."} {"text":"User: I want to make a molecule with a chemical formula of C23H29N2O3S+.\nAssistant: Do you have some additional conditions I should consider?\nUser: I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 4.\nAssistant: In that case, I suggest the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3."}", "/scratch/micpie/export/rdkit_features/test_33-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O?\nAnswer: 0"} {"text":"Question: What is the count of acid groups
of the molecule with SMILES Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_31-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F is 5."} {"text":"The count of rotatable bonds of the molecule with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4 is 6."}", "/scratch/micpie/export/rdkit_features/test_107-8.jsonl": "{"text":"The number of basic groups of the compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC is 0."} {"text":"The basic group count of the molecule with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC is 0."}", "/scratch/micpie/export/rdkit_features/train_0-6.jsonl": "{"text":"The aromatic bond count of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1 is 28."} {"text":"The aromatic bond count of the compound with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4 is 12."}", "/scratch/micpie/export/rdkit_features/valid_113-23.jsonl": "{"text":"User: I want to analyze a chemical with a number of hydrogen bond donor sites of 3, a number of hydrogen bond acceptor sites of 5 and a Wildman-Crippen LogP value computed using RDKit of 0.24.\nAssistant: That's interesting, do you have some additional constraints I should consider?\nUser: I want the molecular formula to be C21H32N5O3S+.\nAssistant: Then, I recommend the chemical with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N."} {"text":"User: I want to design a molecule with a number of hydrogen bond donor sites of 3, a count of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value computed using RDKit of 6.22.\nAssistant: That's interesting, do you have some additional conditions I should take into account?\nUser: Yes, I want the molecular formula to be C21H19Cl2N3O2S2.\nAssistant: Then, I suggest the molecule with SMILES 
Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl."}", "/scratch/micpie/export/rdkit_features/train_106-4.jsonl": "{"text":"The count of rings of the compound with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O is 2."} {"text":"The ring count of the molecule with SMILES CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl is 4."}", "/scratch/micpie/export/rdkit_features/train_101-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3 is 6."} {"text":"The heteroatom count of the compound with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1 is 7."}", "/scratch/micpie/export/rdkit_features/train_5-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C?\nAnswer: 4"} {"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_104-11.jsonl": "{"text":"User: I want to design a chemical with a formula of C11H18N4O5S.\nAssistant: That's interesting, do you have some additional limitations that help me narrow down the search?\nUser: Yes, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 5.\nAssistant: I advise the chemical with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2."} {"text":"User: I want to analyze a molecule with a molecular formula of C18H14ClN5O.\nAssistant: Interesting, do you have some additional limitations that I should consider?\nUser: Yea, I want the number of hydrogen bond donor sites to be 1, the number of hydrogen bond acceptors to be 6.\nAssistant: I suggest the molecule with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3."}", "/scratch/micpie/export/rdkit_features/train_14-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES
C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C is 11."} {"text":"The count of aromatic bonds of the compound with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O is 6."}", "/scratch/micpie/export/rdkit_features/valid_0-6.jsonl": "{"text":"The aromatic bond count of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1 is 25."} {"text":"The number of aromatic bonds of the chemical with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4 is 6."}", "/scratch/micpie/export/rdkit_features/test_32-6.jsonl": "{"text":"The count of aromatic bonds of the chemical with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3 is 11."} {"text":"The number of aromatic bonds of the molecule with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3 is 12."}", "/scratch/micpie/export/rdkit_features/test_16-3.jsonl": "{"text":"The count of heteroatoms of the compound with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C is 4."} {"text":"The number of heteroatoms of the chemical with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N is 9."}", "/scratch/micpie/export/rdkit_features/test_13-4.jsonl": "{"text":"The number of rings of the compound with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC is 2."} {"text":"The ring count of the molecule with SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4 is 4."}", "/scratch/micpie/export/rdkit_features/train_5-11.jsonl": "{"text":"User: I want to analyze a molecule with a formula of C23H29N2O3S+.\nAssistant: Cool, do you have some additional constraints I should consider?\nUser: Yep, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptors to be 4.\nAssistant: I propose the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C."} {"text":"User: I want to create a compound with a chemical formula of C21H32N4O4.\nAssistant: Nice, do you have some additional constraints I should consider?\nUser: Yes, I want the number of hydrogen bond
donor sites to be 3, the number of hydrogen bond acceptor sites to be 4.\nAssistant: Then, I recommend the compound with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C."}", "/scratch/micpie/export/rdkit_features/valid_13-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4 is 54.95."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C is 54.70."}", "/scratch/micpie/export/rdkit_features/test_112-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC is 2.44."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N is 0.31."}", "/scratch/micpie/export/rdkit_features/train_103-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the molecule with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1 is 3."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 7."}", "/scratch/micpie/export/rdkit_features/train_6-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4 is 61.71."} {"text":"The sum of atomic polarizabilities of the compound with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C is 70.01."}", "/scratch/micpie/export/rdkit_features/train_119-23.jsonl": "{"text":"User: I want to make a molecule with a number of hydrogen bond donors of 3, a number of hydrogen bond acceptor sites of 6 and a Wildman-Crippen LogP value computed using RDKit of 1.94.\nAssistant: That's interesting, do you have some additional requirements?\nUser: Yes, I want the chemical formula to be C15H14FN3O5.\nAssistant: In that case, I suggest the 
molecule with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO."} {"text":"User: I want to make a chemical with a number of hydrogen bond donor sites of 2, a count of hydrogen bond acceptors of 6 and a LogP value computed using the Wildman-Crippen method of 3.59.\nAssistant: That's interesting, do you have some additional limitations that I should consider?\nUser: I want the chemical formula to be C19H23Cl2N5O2S.\nAssistant: Then, I propose the chemical with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl."}", "/scratch/micpie/export/rdkit_features/test_106-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O is 44.25."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC is 66.93."}", "/scratch/micpie/export/rdkit_features/valid_102-8.jsonl": "{"text":"The basic group count of the compound with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1 is 0."} {"text":"The basic group count of the molecule with SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2 is 2."}", "/scratch/micpie/export/rdkit_features/valid_15-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the chemical with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N is 1."} {"text":"The number of hydrogen bond donor sites of the compound with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C is 2."}", "/scratch/micpie/export/rdkit_features/test_106-6.jsonl": "{"text":"The aromatic bond count of the compound with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O is 10."} {"text":"The aromatic bond count of the chemical with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC is 11."}", "/scratch/micpie/export/rdkit_features/train_25-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with 
SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C?\nAnswer: 63.41"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F?\nAnswer: 42.45"}", "/scratch/micpie/export/rdkit_features/train_6-23.jsonl": "{"text":"User: I want to make a compound with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptor sites of 6 and a LogP value computed using the Wildman-Crippen method of 3.26.\nAssistant: Do you have some additional constraints I should take into account?\nUser: I want the chemical formula to be C20H24N4O4S.\nAssistant: Then, I recommend the compound with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4."} {"text":"User: I want to synthesize a chemical with a count of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 5 and a LogP value computed using the Wildman-Crippen method of 3.62.\nAssistant: Nice, do you have some additional conditions I should take into account?\nUser: Yeah, I want the formula to be C25H30N4O2.\nAssistant: In that case, I recommend the chemical with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C."}", "/scratch/micpie/export/rdkit_features/test_108-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the compound with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5 is 2."} {"text":"The number of hydrogen bond donors of the molecule with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3 is 3."}", "/scratch/micpie/export/rdkit_features/train_13-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_1-2.jsonl": "{"text":"The count of hydrogen
bond acceptors of the chemical with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5 is 7."} {"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F is 7."}", "/scratch/micpie/export/rdkit_features/valid_21-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-]?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donors of the molecule with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_102-8.jsonl": "{"text":"The basic group count of the molecule with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O is 0."} {"text":"The basic group count of the molecule with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1 is 0."}", "/scratch/micpie/export/rdkit_features/train_31-12.jsonl": "{"text":"Question: What is the chemical formula of the compound with SMILES Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F?\nAnswer: C24H27FN2O3"} {"text":"Question: What is the molecular formula of the compound with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4?\nAnswer: C22H24N4O3"}", "/scratch/micpie/export/rdkit_features/test_113-16.jsonl": "{"text":"Question: What is the ring count of the compound with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 3"} {"text":"Question: What is the count of rings of the molecule with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_15-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the compound with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O is 4."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES 
C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2 is 3."}", "/scratch/micpie/export/rdkit_features/test_24-23.jsonl": "{"text":"User: I want to synthesize a molecule with a number of hydrogen bond donor sites of 1, a count of hydrogen bond acceptors of 7 and a Wildman-Crippen LogP value of 2.10.\nAssistant: Do you have some additional requirements I should take into account?\nUser: Yeah, I want the molecular formula to be C14H7Cl2N8O-.\nAssistant: In that case, I suggest the molecule with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl."} {"text":"User: I want to make a compound with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 5 and a LogP value computed using the Wildman-Crippen method of 0.99.\nAssistant: Cool, do you have some additional limitations I should take into account?\nUser: Yes, I want the chemical formula to be C21H39N6O+.\nAssistant: Then, I suggest the compound with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C."}", "/scratch/micpie/export/rdkit_features/train_6-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4?\nAnswer: 6"} {"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_20-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br is 2."} {"text":"The number of hydrogen bond donor sites of the chemical with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-] is 1."}", "/scratch/micpie/export/rdkit_features/valid_117-7.jsonl": "{"text":"The count of acid groups of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4OC)Cl is 0."} {"text":"The count of acid groups of the molecule with SMILES
Cc1cc(ccc1[C@@H](C)[NH2+]CC2(CC2)CO)F is 0."}", "/scratch/micpie/export/rdkit_features/test_105-8.jsonl": "{"text":"The basic group count of the compound with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N is 0."} {"text":"The basic group count of the molecule with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O is 0."}", "/scratch/micpie/export/rdkit_features/test_119-17.jsonl": "{"text":"Question: What is the rotatable bond count of the compound with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O?\nAnswer: 5"} {"text":"Question: What is the number of rotatable bonds of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/valid_20-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-]?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_14-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3 is 1."} {"text":"The number of hydrogen bond donor sites of the compound with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O is 1."}", "/scratch/micpie/export/rdkit_features/train_8-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the compound with SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N?\nAnswer: 5"} {"text":"Question: What is the rotatable bond count of the molecule with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO?\nAnswer: 10"}", "/scratch/micpie/export/rdkit_features/test_3-20.jsonl": "{"text":"Question: What is the number of basic groups of the compound with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC?\nAnswer: 0"} {"text":"Question: What is the 
number of basic groups of the chemical with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_7-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the molecule with SMILES COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC is 4."} {"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC is 5."}", "/scratch/micpie/export/rdkit_features/valid_112-7.jsonl": "{"text":"The acid group count of the molecule with SMILES Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC is 0."} {"text":"The count of acid groups of the chemical with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N is 0."}", "/scratch/micpie/export/rdkit_features/train_24-23.jsonl": "{"text":"User: I want to analyze a molecule with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 6 and a Wildman-Crippen LogP value of 1.65.\nAssistant: Cool, do you have some additional conditions I should take into account?\nUser: Yeah, I want the chemical formula to be C20H16N7O2-.\nAssistant: In that scenario, I recommend the molecule with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5."} {"text":"User: I want to synthesize a compound with a count of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 8 and a Wildman-Crippen LogP value of 2.44.\nAssistant: Nice, do you have some additional constraints that help me narrow down the search?\nUser: Indeed, I want the molecular formula to be C19H31N7O2.\nAssistant: Then, I suggest the compound with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C."}", "/scratch/micpie/export/rdkit_features/test_18-19.jsonl": "{"text":"Question: What is the acid group count of the compound with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the chemical with
SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_117-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(cc4)Oc5ccc(cn5)Cl is 73.54."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES C[C@@H](C(C)(C)OC)[NH2+]CCNc1ccccc1 is 44.31."}", "/scratch/micpie/export/rdkit_features/valid_12-12.jsonl": "{"text":"Question: What is the molecular formula of the molecule with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl?\nAnswer: C20H26ClNO2"} {"text":"Question: What is the chemical formula of the molecule with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3?\nAnswer: C19H33FN4"}", "/scratch/micpie/export/rdkit_features/train_103-19.jsonl": "{"text":"Question: What is the count of acid groups of the molecule with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1?\nAnswer: 0"} {"text":"Question: What is the acid group count of the compound with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_104-8.jsonl": "{"text":"The count of basic groups of the molecule with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3 is 0."} {"text":"The basic group count of the chemical with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C is 1."}", "/scratch/micpie/export/rdkit_features/test_27-16.jsonl": "{"text":"Question: What is the count of rings of the molecule with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F?\nAnswer: 2"} {"text":"Question: What is the ring count of the compound with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_109-19.jsonl": "{"text":"Question: What is the count of acid groups of the chemical with SMILES C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl?\nAnswer: 0"} {"text":"Question: What is the 
acid group count of the compound with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_110-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O?\nAnswer: 6"} {"text":"Question: What is the rotatable bond count of the chemical with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_31-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F?\nAnswer: 0"} {"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_20-19.jsonl": "{"text":"Question: What is the acid group count of the chemical with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the chemical with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-]?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_104-23.jsonl": "{"text":"User: I want to create a chemical with a number of hydrogen bond donors of 2, a count of hydrogen bond acceptors of 5 and a Wildman-Crippen LogP value computed using RDKit of -2.78.\nAssistant: That is a very interesting question, do you have some additional requirements I should take into account?\nUser: Yep, I want the molecular formula to be C11H18N4O5S.\nAssistant: I recommend the chemical with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2."} {"text":"User: I want to analyze a chemical with a count of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 6 and a LogP value computed using the Wildman-Crippen method of 4.36.\nAssistant: Nice, do 
you have some additional conditions I should take into account?\nUser: Yeah, I want the chemical formula to be C18H14ClN5O.\nAssistant: In that situation, I suggest the chemical with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3."}", "/scratch/micpie/export/rdkit_features/test_105-23.jsonl": "{"text":"User: I want to synthesize a compound with a number of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value of 4.04.\nAssistant: Nice, do you have some additional constraints I should take into account?\nUser: Yep, I want the chemical formula to be C17H15BrN2O2.\nAssistant: I recommend the compound with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N."} {"text":"User: I want to make a molecule with a count of hydrogen bond donors of 2, a count of hydrogen bond acceptors of 6 and a Wildman-Crippen LogP value computed using RDKit of 2.49.\nAssistant: That's interesting, do you have some additional requirements I should take into account?\nUser: Yea, I want the chemical formula to be C16H21N3O4.\nAssistant: Then, I propose the molecule with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O."}", "/scratch/micpie/export/rdkit_features/test_19-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl?\nAnswer: 62.98"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4?\nAnswer: 69.09"}", "/scratch/micpie/export/rdkit_features/valid_10-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C?\nAnswer: 57.76"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3?\nAnswer: 57.42"}", 
"/scratch/micpie/export/rdkit_features/test_25-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4 is 7."} {"text":"The number of rotatable bonds of the chemical with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3 is 6."}", "/scratch/micpie/export/rdkit_features/train_117-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(cc4)Oc5ccc(cn5)Cl is 18."} {"text":"The count of aromatic bonds of the molecule with SMILES C[C@@H](C(C)(C)OC)[NH2+]CCNc1ccccc1 is 6."}", "/scratch/micpie/export/rdkit_features/valid_31-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_12-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl is 57.42."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3 is 60.40."}", "/scratch/micpie/export/rdkit_features/test_16-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_6-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3 is 69.00."} {"text":"The sum of atomic polarizabilities of the 
molecule with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C is 70.01."}", "/scratch/micpie/export/rdkit_features/test_23-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4?\nAnswer: 6"} {"text":"Question: What is the rotatable bond count of the chemical with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_22-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O is 0."} {"text":"The acid group count of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4 is 0."}", "/scratch/micpie/export/rdkit_features/test_30-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3 is 0."} {"text":"The count of acid groups of the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC is 0."}", "/scratch/micpie/export/rdkit_features/test_108-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5?\nAnswer: 10"} {"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_105-16.jsonl": "{"text":"Question: What is the number of rings of the molecule with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N?\nAnswer: 2"} {"text":"Question: What is the number of rings of the chemical with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_111-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES 
CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4?\nAnswer: 1"} {"text":"Question: What is the basic group count of the compound with SMILES C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_102-5.jsonl": "{"text":"The number of rotatable bonds of the molecule with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O is 2."} {"text":"The number of rotatable bonds of the chemical with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1 is 4."}", "/scratch/micpie/export/rdkit_features/train_20-22.jsonl": "{"text":"User: I want to design a molecule with a number of hydrogen bond donor sites of 2 and a count of hydrogen bond acceptors of 3.\nAssistant: Nice, do you have some additional constraints I should consider?\nUser: I want the heteroatom count to be 6.\nAssistant: I propose the molecule with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3."} {"text":"User: I want to synthesize a chemical with a number of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 6.\nAssistant: Interesting, do you have some additional constraints that I should consider?\nUser: Yep, I want the number of heteroatoms to be 9.\nAssistant: I propose the chemical with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC."}", "/scratch/micpie/export/rdkit_features/train_3-8.jsonl": "{"text":"The number of basic groups of the molecule with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O is 0."} {"text":"The count of basic groups of the chemical with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4 is 1."}", "/scratch/micpie/export/rdkit_features/train_15-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC?\nAnswer: 0"} {"text":"Question: What is the basic group count of the chemical with SMILES 
Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_22-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O is 2.49."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4 is 0.74."}", "/scratch/micpie/export/rdkit_features/train_27-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_104-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the compound with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 9"} {"text":"Question: What is the heteroatom count of the chemical with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_110-1.jsonl": "{"text":"The number of hydrogen bond donors of the molecule with SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O is 3."} {"text":"The number of hydrogen bond donor sites of the molecule with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C is 0."}", "/scratch/micpie/export/rdkit_features/test_119-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O?\nAnswer: 2"} {"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_106-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES 
C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O?\nAnswer: 2"} {"text":"Question: What is the ring count of the chemical with SMILES CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_106-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O is 2."} {"text":"The number of hydrogen bond donors of the compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC is 2."}", "/scratch/micpie/export/rdkit_features/test_110-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the chemical with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3?\nAnswer: 8"} {"text":"Question: What is the count of heteroatoms of the chemical with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/test_19-4.jsonl": "{"text":"The count of rings of the molecule with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl is 3."} {"text":"The number of rings of the compound with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4 is 4."}", "/scratch/micpie/export/rdkit_features/train_104-16.jsonl": "{"text":"Question: What is the count of rings of the molecule with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 3"} {"text":"Question: What is the number of rings of the molecule with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_16-15.jsonl": "{"text":"Question: What is the heteroatom count of the compound with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C?\nAnswer: 4"} {"text":"Question: What is the number of heteroatoms of the compound with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_120-13.jsonl": "{"text":"Question: What is 
the count of hydrogen bond donors of the chemical with SMILES c1ccc(cc1)CCN(CC(=O)Nc2ccc(cc2)Cl)S(=O)(=O)c3ccc4c(c3)OCCO4?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES Cc1c(cnn1c2ccccc2)N[C@H](C)C(=O)Nc3cccc(c3Cl)Cl?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_120-4.jsonl": "{"text":"The ring count of the chemical with SMILES c1ccc(cc1)CCN(CC(=O)Nc2ccc(cc2)Cl)S(=O)(=O)c3ccc4c(c3)OCCO4 is 4."} {"text":"The count of rings of the chemical with SMILES Cc1c(cnn1c2ccccc2)N[C@H](C)C(=O)Nc3cccc(c3Cl)Cl is 3."}", "/scratch/micpie/export/rdkit_features/train_0-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1?\nAnswer: 58.28"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4?\nAnswer: 70.61"}", "/scratch/micpie/export/rdkit_features/valid_116-16.jsonl": "{"text":"Question: What is the count of rings of the chemical with SMILES CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 6"} {"text":"Question: What is the number of rings of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_105-8.jsonl": "{"text":"The number of basic groups of the compound with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N is 0."} {"text":"The basic group count of the chemical with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O is 0."}", "/scratch/micpie/export/rdkit_features/test_3-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC?\nAnswer: 7"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES 
CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_105-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the compound with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F is 5."} {"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O is 4."}", "/scratch/micpie/export/rdkit_features/train_5-19.jsonl": "{"text":"Question: What is the count of acid groups of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the chemical with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_26-11.jsonl": "{"text":"User: I want to analyze a molecule with a molecular formula of C18H22F3NO3.\nAssistant: Do you have some additional limitations I should consider?\nUser: Yeah, I want the number of hydrogen bond donor sites to be 0, the count of hydrogen bond acceptors to be 3.\nAssistant: In that case, I suggest the molecule with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3."} {"text":"User: I want to design a molecule with a molecular formula of C15H13BrClNO3.\nAssistant: Do you have some additional limitations that I should consider?\nUser: Yep, I want the count of hydrogen bond donors to be 2, the count of hydrogen bond acceptors to be 3.\nAssistant: In that case, I advise the molecule with SMILES COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br."}", "/scratch/micpie/export/rdkit_features/valid_13-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4 is 3."} {"text":"The count of rotatable bonds of the molecule with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C is 4."}", "/scratch/micpie/export/rdkit_features/train_21-18.jsonl": "{"text":"Question: What is the count of aromatic 
bonds of the molecule with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC?\nAnswer: 12"} {"text":"Question: What is the count of aromatic bonds of the molecule with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_3-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC is 65.06."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F is 56.46."}", "/scratch/micpie/export/rdkit_features/valid_2-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4?\nAnswer: 17"} {"text":"Question: What is the count of aromatic bonds of the compound with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/train_100-22.jsonl": "{"text":"User: I want to make a compound with a number of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 2.\nAssistant: Do you have some additional limitations that I should consider?\nUser: I want the heteroatom count to be 4.\nAssistant: I advise the compound with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C."} {"text":"User: I want to synthesize a chemical with a number of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 5.\nAssistant: Nice, do you have some additional conditions that help me narrow down the search?\nUser: Yea, I want the heteroatom count to be 6.\nAssistant: I advise the chemical with SMILES COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3."}", "/scratch/micpie/export/rdkit_features/valid_9-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2 is 0."} {"text":"The acid group count of the molecule with SMILES 
C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4 is 0."}", "/scratch/micpie/export/rdkit_features/test_108-20.jsonl": "{"text":"Question: What is the count of basic groups of the molecule with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the molecule with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_13-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4?\nAnswer: 3"} {"text":"Question: What is the count of rotatable bonds of the compound with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_16-22.jsonl": "{"text":"User: I want to create a chemical with a count of hydrogen bond donors of 2 and a number of hydrogen bond acceptor sites of 2.\nAssistant: Interesting, do you have some additional requirements that I should consider?\nUser: I want the count of heteroatoms to be 4.\nAssistant: In that case, I advise the chemical with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C."} {"text":"User: I want to analyze a molecule with a number of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 5.\nAssistant: Nice, do you have some additional requirements I should take into account?\nUser: Indeed, I want the heteroatom count to be 9.\nAssistant: In that scenario, I the molecule with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N."}", "/scratch/micpie/export/rdkit_features/test_5-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5?\nAnswer: 16"} {"text":"Question: What is the aromatic bond count of the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F?\nAnswer: 
12"}", "/scratch/micpie/export/rdkit_features/test_112-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC?\nAnswer: 0"} {"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_107-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C is 8."} {"text":"The heteroatom count of the chemical with SMILES CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C is 9."}", "/scratch/micpie/export/rdkit_features/test_115-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F is C27H22FN3O4."} {"text":"The chemical formula of the compound with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is C30H30FN3O2."}", "/scratch/micpie/export/rdkit_features/valid_105-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F is 16."} {"text":"The aromatic bond count of the compound with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O is 11."}", "/scratch/micpie/export/rdkit_features/valid_20-4.jsonl": "{"text":"The number of rings of the chemical with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3 is 3."} {"text":"The number of rings of the molecule with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-] is 2."}", "/scratch/micpie/export/rdkit_features/train_104-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 0."} {"text":"The count of acid groups of the molecule with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl is 1."}", 
"/scratch/micpie/export/rdkit_features/test_32-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3 is 9."} {"text":"The count of rotatable bonds of the compound with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3 is 6."}", "/scratch/micpie/export/rdkit_features/train_18-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the molecule with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O is 0.60."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl is 3.70."}", "/scratch/micpie/export/rdkit_features/test_112-4.jsonl": "{"text":"The ring count of the molecule with SMILES C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC is 4."} {"text":"The number of rings of the compound with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N is 3."}", "/scratch/micpie/export/rdkit_features/valid_17-1.jsonl": "{"text":"The count of hydrogen bond donors of the chemical with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S is 2."} {"text":"The count of hydrogen bond donors of the compound with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 2."}", "/scratch/micpie/export/rdkit_features/valid_29-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl is 0."} {"text":"The number of acid groups of the chemical with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3 is 0."}", "/scratch/micpie/export/rdkit_features/test_116-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1 is 1."} {"text":"The number of hydrogen bond donor sites of the chemical with SMILES 
CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl is 0."}", "/scratch/micpie/export/rdkit_features/valid_101-16.jsonl": "{"text":"Question: What is the number of rings of the compound with SMILES COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3?\nAnswer: 3"} {"text":"Question: What is the ring count of the compound with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_119-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the compound with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O is 44.51."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6 is 71.12."}", "/scratch/micpie/export/rdkit_features/train_116-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the compound with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 7"} {"text":"Question: What is the heteroatom count of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_8-22.jsonl": "{"text":"User: I want to analyze a chemical with a number of hydrogen bond donor sites of 1 and a number of hydrogen bond acceptor sites of 5.\nAssistant: Cool, do you have some additional ?\nUser: Yea, I want the count of heteroatoms to be 7.\nAssistant: Then, I advise the chemical with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C."} {"text":"User: I want to synthesize a compound with a number of hydrogen bond donors of 0 and a number of hydrogen bond acceptors of 2.\nAssistant: Do you have some additional conditions I should take into account?\nUser: Yes, I want the heteroatom count to be 9.\nAssistant: In that scenario, I advise the compound with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br."}", "/scratch/micpie/export/rdkit_features/test_15-3.jsonl": "{"text":"The 
number of heteroatoms of the chemical with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O is 5."} {"text":"The count of heteroatoms of the compound with SMILES C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2 is 5."}", "/scratch/micpie/export/rdkit_features/test_14-5.jsonl": "{"text":"The number of rotatable bonds of the chemical with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3 is 6."} {"text":"The rotatable bond count of the compound with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O is 6."}", "/scratch/micpie/export/rdkit_features/valid_117-17.jsonl": "{"text":"Question: What is the rotatable bond count of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4OC)Cl?\nAnswer: 5"} {"text":"Question: What is the count of rotatable bonds of the compound with SMILES Cc1cc(ccc1[C@@H](C)[NH2+]CC2(CC2)CO)F?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_13-23.jsonl": "{"text":"User: I want to create a chemical with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 2 and a LogP value computed using the Wildman-Crippen method of 4.74.\nAssistant: That's interesting, do you have some additional requirements I should consider?\nUser: Yes, I want the formula to be C21H32FNO2.\nAssistant: Then, I advise the chemical with SMILES CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC."} {"text":"User: I want to make a chemical with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptors of 3 and a LogP value computed using the Wildman-Crippen method of 4.83.\nAssistant: Cool, do you have some additional requirements I should consider?\nUser: Yeah, I want the molecular formula to be C20H30N4O.\nAssistant: In that case, I the chemical with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C."}", "/scratch/micpie/export/rdkit_features/valid_115-12.jsonl": "{"text":"Question: What is the chemical formula of the compound with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4?\nAnswer: 
C24H22N4O4S"} {"text":"Question: What is the chemical formula of the molecule with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F?\nAnswer: C27H25F4N3O2"}", "/scratch/micpie/export/rdkit_features/valid_12-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl?\nAnswer: 4"} {"text":"Question: What is the number of rings of the compound with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_27-4.jsonl": "{"text":"The count of rings of the chemical with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F is 2."} {"text":"The number of rings of the chemical with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C is 3."}", "/scratch/micpie/export/rdkit_features/test_24-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the chemical with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl?\nAnswer: 21"} {"text":"Question: What is the aromatic bond count of the compound with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_114-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the molecule with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C is 3."} {"text":"The number of hydrogen bond acceptor sites of the compound with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F is 4."}", "/scratch/micpie/export/rdkit_features/test_106-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O is 2.47."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC is 2.24."}", "/scratch/micpie/export/rdkit_features/train_4-17.jsonl": "{"text":"Question: What is the 
rotatable bond count of the chemical with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4?\nAnswer: 6"} {"text":"Question: What is the number of rotatable bonds of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_120-11.jsonl": "{"text":"User: I want to analyze a molecule with a chemical formula of C23H19N3O4S2.\nAssistant: That's interesting, do you have some additional limitations I should consider?\nUser: I want the number of hydrogen bond donor sites to be 3, the count of hydrogen bond acceptors to be 8.\nAssistant: In that situation, I suggest the molecule with SMILES c1cc(oc1)CNC(=O)CSc2nc3ccc(cc3s2)n4c(c5c(c4O)[C@@H]6C[C@@H]5C=C6)O."} {"text":"User: I want to design a molecule with a chemical formula of C16H11ClF2N2OS2.\nAssistant: That is a very interesting question, do you have some additional limitations I should take into account?\nUser: Yep, I want the count of hydrogen bond donors to be 0, the count of hydrogen bond acceptors to be 4.\nAssistant: In that situation, I recommend the molecule with SMILES CN(c1nc2cc(ccc2s1)Cl)C(=O)CSc3ccc(c(c3)F)F."}", "/scratch/micpie/export/rdkit_features/test_104-0.jsonl": "{"text":"The formula of the molecule with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2 is C11H18N4O5S."} {"text":"The formula of the compound with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3 is C18H14ClN5O."}", "/scratch/micpie/export/rdkit_features/valid_28-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br is 3.67."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C is 3.88."}", "/scratch/micpie/export/rdkit_features/train_108-5.jsonl": "{"text":"The number of rotatable bonds of the chemical with SMILES 
CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2 is 7."} {"text":"The count of rotatable bonds of the molecule with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3 is 6."}", "/scratch/micpie/export/rdkit_features/test_115-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the chemical with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F is 5.31."} {"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 5.18."}", "/scratch/micpie/export/rdkit_features/test_14-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3?\nAnswer: C21H24N4O"} {"text":"Question: What is the chemical formula of the chemical with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O?\nAnswer: C13H14BrNO3"}", "/scratch/micpie/export/rdkit_features/test_14-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3?\nAnswer: 6"} {"text":"Question: What is the count of rotatable bonds of the molecule with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_17-0.jsonl": "{"text":"The formula of the chemical with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F is C20H29F3N2O3."} {"text":"The formula of the compound with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is C21H23N5O7."}", "/scratch/micpie/export/rdkit_features/valid_12-8.jsonl": "{"text":"The basic group count of the molecule with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl is 0."} {"text":"The basic group count of the molecule with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3 is 0."}", "/scratch/micpie/export/rdkit_features/train_2-10.jsonl": "{"text":"The LogP value computed using the 
Wildman-Crippen method of the compound with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6 is 2.39."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O is 4.28."}", "/scratch/micpie/export/rdkit_features/train_117-3.jsonl": "{"text":"The heteroatom count of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(cc4)Oc5ccc(cn5)Cl is 8."} {"text":"The count of heteroatoms of the chemical with SMILES C[C@@H](C(C)(C)OC)[NH2+]CCNc1ccccc1 is 3."}", "/scratch/micpie/export/rdkit_features/test_25-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the chemical with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_112-19.jsonl": "{"text":"Question: What is the acid group count of the chemical with SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the molecule with SMILES c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_3-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the molecule with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O?\nAnswer: 5"} {"text":"Question: What is the rotatable bond count of the compound with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_31-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the molecule with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC is 3."} {"text":"The count of hydrogen bond acceptors of the compound with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3 is 4."}", 
"/scratch/micpie/export/rdkit_features/train_25-6.jsonl": "{"text":"The count of aromatic bonds of the molecule with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C is 10."} {"text":"The count of aromatic bonds of the molecule with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F is 6."}", "/scratch/micpie/export/rdkit_features/valid_102-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1 is C23H21F2N3O2."} {"text":"The chemical formula of the molecule with SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2 is C48H76N6O4."}", "/scratch/micpie/export/rdkit_features/train_16-11.jsonl": "{"text":"User: I want to create a compound with a chemical formula of C17H28N3OS+.\nAssistant: That's interesting, do you have some additional constraints I should take into account?\nUser: Indeed, I want the number of hydrogen bond donor sites to be 2, the count of hydrogen bond acceptors to be 3.\nAssistant: I recommend the compound with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C."} {"text":"User: I want to create a compound with a formula of C23H28F2N2O3.\nAssistant: Nice, do you have some additional requirements I should consider?\nUser: Yes, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 3.\nAssistant: In that situation, I recommend the compound with SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F."}", "/scratch/micpie/export/rdkit_features/test_16-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C is 1.42."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N is 2.95."}", "/scratch/micpie/export/rdkit_features/valid_104-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the 
chemical with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3?\nAnswer: 5"} {"text":"Question: What is the aromatic bond count of the molecule with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_115-8.jsonl": "{"text":"The count of basic groups of the chemical with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F is 0."} {"text":"The number of basic groups of the chemical with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 0."}", "/scratch/micpie/export/rdkit_features/train_5-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C?\nAnswer: 1"} {"text":"Question: What is the basic group count of the molecule with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_2-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C is 11."} {"text":"The number of aromatic bonds of the chemical with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4 is 18."}", "/scratch/micpie/export/rdkit_features/valid_5-5.jsonl": "{"text":"The rotatable bond count of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C is 6."} {"text":"The number of rotatable bonds of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+] is 8."}", "/scratch/micpie/export/rdkit_features/test_109-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl?\nAnswer: 64.74"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N?\nAnswer: 50.39"}", 
"/scratch/micpie/export/rdkit_features/train_7-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_29-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the molecule with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_23-15.jsonl": "{"text":"Question: What is the heteroatom count of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4?\nAnswer: 7"} {"text":"Question: What is the number of heteroatoms of the compound with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/train_17-22.jsonl": "{"text":"User: I want to make a molecule with a number of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 3.\nAssistant: That's interesting, do you have some additional limitations I should take into account?\nUser: Yep, I want the count of heteroatoms to be 7.\nAssistant: I the molecule with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F."} {"text":"User: I want to synthesize a chemical with a number of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 9.\nAssistant: Nice, do you have some additional requirements?\nUser: Yea, I want the heteroatom count to be 12.\nAssistant: I recommend the chemical with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O."}", 
"/scratch/micpie/export/rdkit_features/test_115-15.jsonl": "{"text":"Question: What is the heteroatom count of the compound with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F?\nAnswer: 8"} {"text":"Question: What is the number of heteroatoms of the compound with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_11-6.jsonl": "{"text":"The aromatic bond count of the molecule with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl is 6."} {"text":"The aromatic bond count of the molecule with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C is 6."}", "/scratch/micpie/export/rdkit_features/test_105-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N is 3."} {"text":"The number of hydrogen bond acceptors of the molecule with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O is 6."}", "/scratch/micpie/export/rdkit_features/train_2-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_108-4.jsonl": "{"text":"The ring count of the compound with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O is 2."} {"text":"The ring count of the compound with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C is 2."}", "/scratch/micpie/export/rdkit_features/train_116-22.jsonl": "{"text":"User: I want to synthesize a compound with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 3.\nAssistant: Cool, do you have some additional constraints I should take into account?\nUser: 
Yeah, I want the count of heteroatoms to be 7.\nAssistant: In that situation, I recommend the compound with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F."} {"text":"User: I want to synthesize a compound with a number of hydrogen bond donors of 0 and a number of hydrogen bond acceptors of 3.\nAssistant: Nice, do you have some additional requirements?\nUser: Indeed, I want the count of heteroatoms to be 6.\nAssistant: In that scenario, I recommend the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br."}", "/scratch/micpie/export/rdkit_features/valid_106-0.jsonl": "{"text":"The formula of the chemical with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl is C13H11ClFN3O3."} {"text":"The chemical formula of the molecule with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F is C18H24F4N3O3S+."}", "/scratch/micpie/export/rdkit_features/train_21-0.jsonl": "{"text":"The chemical formula of the compound with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC is C17H15FNO6S-."} {"text":"The chemical formula of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O is C19H25ClFN3O3."}", "/scratch/micpie/export/rdkit_features/valid_5-23.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 6 and a LogP value computed using the Wildman-Crippen method of 2.09.\nAssistant: That's interesting, do you have some additional conditions I should take into account?\nUser: Yeah, I want the chemical formula to be C21H23N4O3S+.\nAssistant: In that situation, I propose the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C."} {"text":"User: I want to make a compound with a number of hydrogen bond donor sites of 3, a number of hydrogen bond acceptor sites of 4 and a LogP value computed using the Wildman-Crippen method of 3.05.\nAssistant: That is a very 
interesting question, do you have some additional constraints?\nUser: Yep, I want the chemical formula to be C25H25N2O4+.\nAssistant: Then, I advise the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+]."}", "/scratch/micpie/export/rdkit_features/valid_17-11.jsonl": "{"text":"User: I want to make a molecule with a chemical formula of C19H29F3N2O3S.\nAssistant: That's interesting, do you have some additional constraints I should take into account?\nUser: Yes, I want the number of hydrogen bond donor sites to be 2, the number of hydrogen bond acceptor sites to be 4.\nAssistant: In that case, I recommend the molecule with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S."} {"text":"User: I want to synthesize a compound with a chemical formula of C20H22N6O7.\nAssistant: Cool, do you have some additional constraints I should take into account?\nUser: Indeed, I want the number of hydrogen bond donor sites to be 2, the number of hydrogen bond acceptors to be 9.\nAssistant: I recommend the compound with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O."}", "/scratch/micpie/export/rdkit_features/train_30-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_103-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the compound with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1 is 4.89."} {"text":"The Wildman-Crippen LogP value of the chemical with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is -1.05."}", "/scratch/micpie/export/rdkit_features/test_116-0.jsonl": "{"text":"The molecular formula of the molecule with SMILES 
c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1 is C29H26FN3O2S."} {"text":"The chemical formula of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl is C24H23ClN2O3S."}", "/scratch/micpie/export/rdkit_features/test_22-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O is 4."} {"text":"The number of hydrogen bond acceptor sites of the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4 is 5."}", "/scratch/micpie/export/rdkit_features/train_115-4.jsonl": "{"text":"The number of rings of the compound with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F is 3."} {"text":"The count of rings of the compound with SMILES c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 6."}", "/scratch/micpie/export/rdkit_features/valid_14-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the molecule with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_110-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the chemical with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3?\nAnswer: 11"} {"text":"Question: What is the aromatic bond count of the chemical with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/valid_14-8.jsonl": "{"text":"The count of basic groups of the molecule with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C is 0."} {"text":"The number of basic groups of the compound with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O is 0."}", "/scratch/micpie/export/rdkit_features/valid_31-8.jsonl": "{"text":"The 
count of basic groups of the compound with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC is 0."} {"text":"The basic group count of the molecule with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3 is 0."}", "/scratch/micpie/export/rdkit_features/train_100-4.jsonl": "{"text":"The count of rings of the chemical with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C is 3."} {"text":"The number of rings of the molecule with SMILES COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3 is 3."}", "/scratch/micpie/export/rdkit_features/test_17-3.jsonl": "{"text":"The heteroatom count of the compound with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F is 8."} {"text":"The heteroatom count of the molecule with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 12."}", "/scratch/micpie/export/rdkit_features/train_21-12.jsonl": "{"text":"Question: What is the formula of the chemical with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC?\nAnswer: C17H15FNO6S-"} {"text":"Question: What is the molecular formula of the compound with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O?\nAnswer: C19H25ClFN3O3"}", "/scratch/micpie/export/rdkit_features/test_4-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4?\nAnswer: 61.70"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3?\nAnswer: 67.32"}", "/scratch/micpie/export/rdkit_features/train_106-8.jsonl": "{"text":"The basic group count of the molecule with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O is 0."} {"text":"The count of basic groups of the chemical with SMILES CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl is 0."}", "/scratch/micpie/export/rdkit_features/test_33-12.jsonl": 
"{"text":"Question: What is the formula of the chemical with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O?\nAnswer: C18H24F3N3O3"} {"text":"Question: What is the chemical formula of the molecule with SMILES Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br?\nAnswer: C14H13BrN4O4"}", "/scratch/micpie/export/rdkit_features/valid_11-20.jsonl": "{"text":"Question: What is the number of basic groups of the compound with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the compound with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_2-11.jsonl": "{"text":"User: I want to make a chemical with a chemical formula of C22H24N5O2S+.\nAssistant: Interesting, do you have some additional I should take into account?\nUser: Yeah, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 7.\nAssistant: Then, I the chemical with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6."} {"text":"User: I want to analyze a molecule with a chemical formula of C25H24N2O4.\nAssistant: Do you have some additional conditions that I should consider?\nUser: I want the number of hydrogen bond donor sites to be 2, the count of hydrogen bond acceptors to be 5.\nAssistant: In that scenario, I recommend the molecule with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O."}", "/scratch/micpie/export/rdkit_features/train_15-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the chemical with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC is 5."} {"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C is 3."}", "/scratch/micpie/export/rdkit_features/train_22-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES 
C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O?\nAnswer: C19H25ClFN3O3"} {"text":"Question: What is the formula of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4?\nAnswer: C20H33N6O2+"}", "/scratch/micpie/export/rdkit_features/test_118-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the chemical with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O is 1."} {"text":"The count of hydrogen bond acceptors of the chemical with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N is 7."}", "/scratch/micpie/export/rdkit_features/valid_24-23.jsonl": "{"text":"User: I want to make a molecule with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptors of 6 and a LogP value computed using the Wildman-Crippen method of 1.94.\nAssistant: Interesting, do you have some additional requirements I should consider?\nUser: Indeed, I want the molecular formula to be C18H13ClN7O2-.\nAssistant: In that scenario, I advise the molecule with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl."} {"text":"User: I want to make a molecule with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptors of 5 and a Wildman-Crippen LogP value computed using RDKit of 0.99.\nAssistant: Do you have some additional limitations I should take into account?\nUser: I want the chemical formula to be C21H39N6O+.\nAssistant: In that situation, I recommend the molecule with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C."}", "/scratch/micpie/export/rdkit_features/valid_107-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_119-14.jsonl": 
"{"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O?\nAnswer: 6"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_102-11.jsonl": "{"text":"User: I want to analyze a compound with a molecular formula of C23H25NO3.\nAssistant: Do you have some additional constraints that I should consider?\nUser: Yep, I want the count of hydrogen bond donors to be 0, the number of hydrogen bond acceptors to be 4.\nAssistant: In that case, I the compound with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3."} {"text":"User: I want to create a molecule with a molecular formula of C28H26N2O6.\nAssistant: Cool, do you have some additional limitations I should take into account?\nUser: Yep, I want the number of hydrogen bond donor sites to be 2, the count of hydrogen bond acceptors to be 6.\nAssistant: Then, I recommend the molecule with SMILES Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1."}", "/scratch/micpie/export/rdkit_features/train_2-4.jsonl": "{"text":"The number of rings of the compound with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6 is 6."} {"text":"The number of rings of the molecule with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O is 5."}", "/scratch/micpie/export/rdkit_features/valid_120-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES c1cc(oc1)CNC(=O)CSc2nc3ccc(cc3s2)n4c(c5c(c4O)[C@@H]6C[C@@H]5C=C6)O is 4.64."} {"text":"The Wildman-Crippen LogP value of the chemical with SMILES CN(c1nc2cc(ccc2s1)Cl)C(=O)CSc3ccc(c(c3)F)F is 4.98."}", "/scratch/micpie/export/rdkit_features/test_21-5.jsonl": "{"text":"The rotatable bond count of the molecule with SMILES 
Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-] is 6."} {"text":"The count of rotatable bonds of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O is 4."}", "/scratch/micpie/export/rdkit_features/train_112-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC is 68.10."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 69.10."}", "/scratch/micpie/export/rdkit_features/valid_10-7.jsonl": "{"text":"The acid group count of the compound with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C is 0."} {"text":"The acid group count of the chemical with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3 is 0."}", "/scratch/micpie/export/rdkit_features/train_116-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_21-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the molecule with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-]?\nAnswer: 6"} {"text":"Question: What is the rotatable bond count of the compound with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_100-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F is 1.37."} {"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4 is 1.39."}", 
"/scratch/micpie/export/rdkit_features/test_25-22.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donor sites of 1 and a count of hydrogen bond acceptors of 6.\nAssistant: That is a very interesting question, do you have some additional constraints?\nUser: Yes, I want the count of heteroatoms to be 7.\nAssistant: I the chemical with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4."} {"text":"User: I want to create a compound with a count of hydrogen bond donors of 2 and a number of hydrogen bond acceptors of 3.\nAssistant: Do you have some additional conditions I should consider?\nUser: Yeah, I want the heteroatom count to be 7.\nAssistant: In that scenario, I propose the compound with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3."}", "/scratch/micpie/export/rdkit_features/test_4-20.jsonl": "{"text":"Question: What is the number of basic groups of the compound with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4?\nAnswer: 0"} {"text":"Question: What is the basic group count of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_1-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/train_18-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O is C21H27N3O9S."} {"text":"The chemical formula of the compound with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl is C17H18Cl2N2O4S."}", "/scratch/micpie/export/rdkit_features/valid_10-16.jsonl": 
"{"text":"Question: What is the count of rings of the chemical with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C?\nAnswer: 3"} {"text":"Question: What is the ring count of the molecule with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_7-22.jsonl": "{"text":"User: I want to create a compound with a count of hydrogen bond donors of 2 and a number of hydrogen bond acceptors of 4.\nAssistant: That is a very interesting question, do you have some additional I should take into account?\nUser: Indeed, I want the count of heteroatoms to be 6.\nAssistant: In that scenario, I the compound with SMILES COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC."} {"text":"User: I want to make a molecule with a number of hydrogen bond donors of 2 and a count of hydrogen bond acceptors of 5.\nAssistant: Do you have some additional limitations that I should consider?\nUser: Yes, I want the count of heteroatoms to be 7.\nAssistant: In that scenario, I suggest the molecule with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC."}", "/scratch/micpie/export/rdkit_features/test_9-12.jsonl": "{"text":"Question: What is the formula of the compound with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C?\nAnswer: C23H36N3O3+"} {"text":"Question: What is the chemical formula of the compound with SMILES C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4?\nAnswer: C23H40N7+"}", "/scratch/micpie/export/rdkit_features/valid_18-7.jsonl": "{"text":"The number of acid groups of the chemical with SMILES c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 0."} {"text":"The count of acid groups of the chemical with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O is 0."}", "/scratch/micpie/export/rdkit_features/valid_31-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES 
Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC is 3.59."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3 is 2.43."}", "/scratch/micpie/export/rdkit_features/train_110-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_113-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is C21H32N5O3S+."} {"text":"The chemical formula of the molecule with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is C21H19Cl2N3O2S2."}", "/scratch/micpie/export/rdkit_features/valid_108-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the compound with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O is 8."} {"text":"The number of hydrogen bond acceptors of the chemical with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C is 4."}", "/scratch/micpie/export/rdkit_features/test_115-12.jsonl": "{"text":"Question: What is the chemical formula of the compound with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F?\nAnswer: C27H22FN3O4"} {"text":"Question: What is the molecular formula of the compound with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: C30H30FN3O2"}", "/scratch/micpie/export/rdkit_features/test_104-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2?\nAnswer: 2"} {"text":"Question: What is the count of hydrogen bond donors of the chemical 
with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_22-1.jsonl": "{"text":"The count of hydrogen bond donors of the chemical with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O is 1."} {"text":"The number of hydrogen bond donors of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4 is 1."}", "/scratch/micpie/export/rdkit_features/train_30-0.jsonl": "{"text":"The molecular formula of the compound with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F is C19H27F3N2O2."} {"text":"The formula of the molecule with SMILES CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC is C24H27FN2O3."}", "/scratch/micpie/export/rdkit_features/test_118-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_115-19.jsonl": "{"text":"Question: What is the count of acid groups of the compound with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4?\nAnswer: 0"} {"text":"Question: What is the acid group count of the molecule with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_27-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the compound with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F?\nAnswer: 8"} {"text":"Question: What is the count of heteroatoms of the molecule with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/train_109-5.jsonl": "{"text":"The number of rotatable bonds of the compound with SMILES 
Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3 is 6."} {"text":"The count of rotatable bonds of the chemical with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C is 6."}", "/scratch/micpie/export/rdkit_features/valid_19-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC?\nAnswer: 0"} {"text":"Question: What is the basic group count of the molecule with SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_9-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the molecule with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2?\nAnswer: 6"} {"text":"Question: What is the count of rotatable bonds of the molecule with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/test_119-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O?\nAnswer: C15H16FN5O3"} {"text":"Question: What is the molecular formula of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C?\nAnswer: C22H29BrN2O4S"}", "/scratch/micpie/export/rdkit_features/train_112-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the compound with SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC?\nAnswer: 9"} {"text":"Question: What is the count of heteroatoms of the chemical with SMILES c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/train_107-7.jsonl": "{"text":"The acid group count of the chemical with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl is 0."} {"text":"The number of acid groups of the molecule with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5 is 0."}", 
"/scratch/micpie/export/rdkit_features/test_25-18.jsonl": "{"text":"Question: What is the aromatic bond count of the compound with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4?\nAnswer: 5"} {"text":"Question: What is the number of aromatic bonds of the chemical with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/valid_28-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br is 6."} {"text":"The count of aromatic bonds of the compound with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C is 12."}", "/scratch/micpie/export/rdkit_features/train_103-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1 is 4."} {"text":"The number of rotatable bonds of the chemical with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 4."}", "/scratch/micpie/export/rdkit_features/valid_112-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the compound with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_114-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the chemical with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl?\nAnswer: 18"} {"text":"Question: What is the aromatic bond count of the chemical with SMILES C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4?\nAnswer: 22"}", "/scratch/micpie/export/rdkit_features/test_112-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the molecule 
with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_18-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O?\nAnswer: 4"} {"text":"Question: What is the count of rings of the molecule with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_29-16.jsonl": "{"text":"Question: What is the number of rings of the molecule with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F?\nAnswer: 4"} {"text":"Question: What is the count of rings of the molecule with SMILES CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_19-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl?\nAnswer: 4"} {"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_116-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the chemical with SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1?\nAnswer: 7"} {"text":"Question: What is the count of heteroatoms of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/valid_103-5.jsonl": "{"text":"The rotatable bond count of the chemical with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO is 7."} {"text":"The rotatable bond count of the compound with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N is 2."}", 
"/scratch/micpie/export/rdkit_features/test_113-4.jsonl": "{"text":"The ring count of the molecule with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N is 3."} {"text":"The count of rings of the chemical with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 3."}", "/scratch/micpie/export/rdkit_features/valid_102-19.jsonl": "{"text":"Question: What is the number of acid groups of the compound with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1?\nAnswer: 0"} {"text":"Question: What is the acid group count of the compound with SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_110-23.jsonl": "{"text":"User: I want to analyze a chemical with a number of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value of 0.67.\nAssistant: That is a very interesting question, do you have some additional requirements I should take into account?\nUser: Indeed, I want the chemical formula to be C24H29N4O4+.\nAssistant: I advise the chemical with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O."} {"text":"User: I want to make a chemical with a number of hydrogen bond donors of 0, a number of hydrogen bond acceptors of 8 and a Wildman-Crippen LogP value computed using RDKit of 2.33.\nAssistant: Interesting, do you have some additional conditions I should take into account?\nUser: Yep, I want the molecular formula to be C19H25N5O3S2.\nAssistant: I recommend the chemical with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C."}", "/scratch/micpie/export/rdkit_features/test_14-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3?\nAnswer: 17"} {"text":"Question: What is the number of aromatic bonds of the chemical with SMILES 
c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_25-3.jsonl": "{"text":"The heteroatom count of the compound with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4 is 8."} {"text":"The number of heteroatoms of the compound with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3 is 5."}", "/scratch/micpie/export/rdkit_features/test_32-4.jsonl": "{"text":"The ring count of the chemical with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3 is 3."} {"text":"The number of rings of the compound with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3 is 3."}", "/scratch/micpie/export/rdkit_features/valid_103-6.jsonl": "{"text":"The aromatic bond count of the molecule with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO is 0."} {"text":"The count of aromatic bonds of the chemical with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N is 11."}", "/scratch/micpie/export/rdkit_features/valid_3-4.jsonl": "{"text":"The number of rings of the molecule with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N is 4."} {"text":"The count of rings of the chemical with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC is 4."}", "/scratch/micpie/export/rdkit_features/valid_107-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the chemical with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C?\nAnswer: 8"} {"text":"Question: What is the heteroatom count of the compound with SMILES CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/test_26-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3?\nAnswer: 0"} {"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br?\nAnswer: 
2"}", "/scratch/micpie/export/rdkit_features/test_104-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2?\nAnswer: 0"} {"text":"Question: What is the aromatic bond count of the molecule with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3?\nAnswer: 18"}", "/scratch/micpie/export/rdkit_features/valid_13-7.jsonl": "{"text":"The number of acid groups of the molecule with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4 is 0."} {"text":"The count of acid groups of the chemical with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C is 0."}", "/scratch/micpie/export/rdkit_features/test_26-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the molecule with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3 is 3."} {"text":"The number of hydrogen bond acceptor sites of the compound with SMILES COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br is 3."}", "/scratch/micpie/export/rdkit_features/valid_114-3.jsonl": "{"text":"The number of heteroatoms of the chemical with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl is 9."} {"text":"The number of heteroatoms of the compound with SMILES C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4 is 6."}", "/scratch/micpie/export/rdkit_features/test_101-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F?\nAnswer: 41.46"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1?\nAnswer: 69.40"}", "/scratch/micpie/export/rdkit_features/valid_26-5.jsonl": "{"text":"The number of rotatable bonds of the chemical with SMILES CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2 is 6."} {"text":"The number of rotatable bonds of the molecule with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC is 5."}", 
"/scratch/micpie/export/rdkit_features/train_101-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3?\nAnswer: C18H18N4O2"} {"text":"Question: What is the formula of the molecule with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1?\nAnswer: C25H23N3O4"}", "/scratch/micpie/export/rdkit_features/train_106-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O is 39.71."} {"text":"The sum of atomic polarizabilities of the compound with SMILES CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl is 63.49."}", "/scratch/micpie/export/rdkit_features/train_110-3.jsonl": "{"text":"The count of heteroatoms of the chemical with SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O is 10."} {"text":"The count of heteroatoms of the molecule with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C is 10."}", "/scratch/micpie/export/rdkit_features/train_28-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl?\nAnswer: 3"} {"text":"Question: What is the ring count of the compound with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_104-8.jsonl": "{"text":"The count of basic groups of the molecule with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2 is 0."} {"text":"The number of basic groups of the compound with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3 is 0."}", "/scratch/micpie/export/rdkit_features/valid_20-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3 is 71.71."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-] is 51.82."}", 
"/scratch/micpie/export/rdkit_features/train_116-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: C27H25ClFN3O2"} {"text":"Question: What is the chemical formula of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br?\nAnswer: C26H25BrN2O3"}", "/scratch/micpie/export/rdkit_features/valid_7-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC?\nAnswer: 66.32"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC?\nAnswer: 62.32"}", "/scratch/micpie/export/rdkit_features/train_25-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the chemical with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C?\nAnswer: 10"} {"text":"Question: What is the aromatic bond count of the molecule with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_110-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the molecule with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3?\nAnswer: 9"} {"text":"Question: What is the number of rotatable bonds of the chemical with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_11-7.jsonl": "{"text":"The number of acid groups of the compound with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2 is 0."} {"text":"The count of acid groups of the molecule with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C is 0."}", "/scratch/micpie/export/rdkit_features/valid_114-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES 
Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl is 5.91."} {"text":"The Wildman-Crippen LogP value of the chemical with SMILES C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4 is 5.19."}", "/scratch/micpie/export/rdkit_features/train_20-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3?\nAnswer: C25H33N3O3"} {"text":"Question: What is the formula of the molecule with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC?\nAnswer: C17H15FNO6S-"}", "/scratch/micpie/export/rdkit_features/test_29-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the molecule with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4 is 2.21."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C is 3.99."}", "/scratch/micpie/export/rdkit_features/train_110-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the molecule with SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O is 1.61."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C is 2.33."}", "/scratch/micpie/export/rdkit_features/valid_33-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6 is 3."} {"text":"The count of rotatable bonds of the molecule with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br is 4."}", "/scratch/micpie/export/rdkit_features/train_2-22.jsonl": "{"text":"User: I want to create a chemical with a number of hydrogen bond donor sites of 1 and a count of hydrogen bond acceptors of 7.\nAssistant: That's interesting, do you have some additional that help me narrow down the search?\nUser: Yep, I want the number of heteroatoms to be 8.\nAssistant: In that case, I 
recommend the chemical with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6."} {"text":"User: I want to make a compound with a number of hydrogen bond donors of 2 and a count of hydrogen bond acceptors of 5.\nAssistant: Cool, do you have some additional ?\nUser: Indeed, I want the count of heteroatoms to be 6.\nAssistant: In that scenario, I advise the compound with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O."}", "/scratch/micpie/export/rdkit_features/valid_24-22.jsonl": "{"text":"User: I want to synthesize a compound with a number of hydrogen bond donor sites of 2 and a count of hydrogen bond acceptors of 6.\nAssistant: Do you have some additional constraints I should take into account?\nUser: Indeed, I want the heteroatom count to be 10.\nAssistant: Then, I propose the compound with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl."} {"text":"User: I want to synthesize a compound with a number of hydrogen bond donor sites of 1 and a count of hydrogen bond acceptors of 5.\nAssistant: Cool, do you have some additional constraints I should take into account?\nUser: Yea, I want the heteroatom count to be 7.\nAssistant: In that situation, I the compound with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C."}", "/scratch/micpie/export/rdkit_features/test_10-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_20-15.jsonl": "{"text":"Question: What is the heteroatom count of the molecule with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3?\nAnswer: 6"} {"text":"Question: What is the number of heteroatoms of the molecule with SMILES 
COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-]?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/test_32-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the chemical with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3?\nAnswer: 9"} {"text":"Question: What is the rotatable bond count of the chemical with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_114-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C is 4."} {"text":"The number of hydrogen bond acceptors of the molecule with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5 is 6."}", "/scratch/micpie/export/rdkit_features/valid_11-19.jsonl": "{"text":"Question: What is the acid group count of the chemical with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the molecule with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_21-6.jsonl": "{"text":"The count of aromatic bonds of the chemical with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC is 12."} {"text":"The count of aromatic bonds of the molecule with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O is 6."}", "/scratch/micpie/export/rdkit_features/test_30-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3?\nAnswer: C21H25N3O2"} {"text":"Question: What is the chemical formula of the chemical with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC?\nAnswer: C24H27FN2O3"}", "/scratch/micpie/export/rdkit_features/train_22-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the molecule with SMILES 
C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O is 4."} {"text":"The number of hydrogen bond acceptors of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4 is 7."}", "/scratch/micpie/export/rdkit_features/test_10-5.jsonl": "{"text":"The rotatable bond count of the compound with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4 is 8."} {"text":"The count of rotatable bonds of the molecule with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F is 7."}", "/scratch/micpie/export/rdkit_features/test_23-23.jsonl": "{"text":"User: I want to design a molecule with a count of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 5 and a Wildman-Crippen LogP value of 0.96.\nAssistant: Cool, do you have some additional conditions I should take into account?\nUser: Indeed, I want the chemical formula to be C19H34N5OS+.\nAssistant: Then, I suggest the molecule with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4."} {"text":"User: I want to create a chemical with a number of hydrogen bond donors of 2, a count of hydrogen bond acceptors of 5 and a LogP value computed using the Wildman-Crippen method of 0.48.\nAssistant: Cool, do you have some additional limitations that help me narrow down the search?\nUser: Yep, I want the formula to be C18H19ClN8O.\nAssistant: In that situation, I propose the chemical with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4."}", "/scratch/micpie/export/rdkit_features/test_110-22.jsonl": "{"text":"User: I want to create a compound with a number of hydrogen bond donor sites of 2 and a count of hydrogen bond acceptors of 5.\nAssistant: That's interesting, do you have some additional constraints?\nUser: Indeed, I want the heteroatom count to be 8.\nAssistant: In that scenario, I the compound with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3."} {"text":"User: I want to make a molecule with a number of hydrogen bond donors of 0 and a number of 
hydrogen bond acceptors of 8.\nAssistant: That's interesting, do you have some additional requirements that help me narrow down the search?\nUser: Yep, I want the number of heteroatoms to be 9.\nAssistant: In that scenario, I advise the molecule with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4."}", "/scratch/micpie/export/rdkit_features/valid_114-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the compound with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl is 61.60."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4 is 70.61."}", "/scratch/micpie/export/rdkit_features/train_14-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_24-3.jsonl": "{"text":"The heteroatom count of the chemical with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5 is 9."} {"text":"The count of heteroatoms of the compound with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C is 9."}", "/scratch/micpie/export/rdkit_features/train_0-19.jsonl": "{"text":"Question: What is the count of acid groups of the compound with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1?\nAnswer: 1"} {"text":"Question: What is the acid group count of the molecule with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_110-1.jsonl": "{"text":"The count of hydrogen bond donors of the chemical with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O is 2."} {"text":"The number of hydrogen bond donors of the molecule with SMILES 
CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C is 0."}", "/scratch/micpie/export/rdkit_features/train_8-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N is 0."} {"text":"The number of acid groups of the chemical with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO is 0."}", "/scratch/micpie/export/rdkit_features/valid_25-0.jsonl": "{"text":"The formula of the molecule with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4 is C20H29N5O3."} {"text":"The formula of the molecule with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3 is C20H25N3OS."}", "/scratch/micpie/export/rdkit_features/train_111-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the compound with SMILES CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 6"} {"text":"Question: What is the rotatable bond count of the compound with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/test_22-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the chemical with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O is 2.26."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4 is -0.65."}", "/scratch/micpie/export/rdkit_features/train_104-22.jsonl": "{"text":"User: I want to design a compound with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 7.\nAssistant: Nice, do you have some additional constraints that I should consider?\nUser: I want the count of heteroatoms to be 9.\nAssistant: Then, I suggest the compound with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3."} {"text":"User: I want to analyze a chemical with a count of hydrogen bond donors of 0 and a count of hydrogen bond acceptors of 4.\nAssistant: Do you have some additional I should take into account?\nUser: 
Indeed, I want the number of heteroatoms to be 6.\nAssistant: In that situation, I recommend the chemical with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl."}", "/scratch/micpie/export/rdkit_features/valid_8-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C is 6."} {"text":"The aromatic bond count of the compound with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br is 6."}", "/scratch/micpie/export/rdkit_features/valid_12-23.jsonl": "{"text":"User: I want to synthesize a compound with a number of hydrogen bond donor sites of 0, a number of hydrogen bond acceptors of 2 and a LogP value computed using the Wildman-Crippen method of 4.53.\nAssistant: Interesting, do you have some additional constraints that I should consider?\nUser: Yep, I want the molecular formula to be C20H26ClNO2.\nAssistant: In that scenario, I the compound with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl."} {"text":"User: I want to analyze a compound with a number of hydrogen bond donors of 0, a number of hydrogen bond acceptor sites of 4 and a Wildman-Crippen LogP value computed using RDKit of 4.56.\nAssistant: Interesting, do you have some additional requirements I should take into account?\nUser: I want the molecular formula to be C19H33FN4.\nAssistant: Then, I propose the compound with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3."}", "/scratch/micpie/export/rdkit_features/train_24-20.jsonl": "{"text":"Question: What is the number of basic groups of the chemical with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the molecule with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_101-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES 
COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3?\nAnswer: 43.25"} {"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1?\nAnswer: 65.35"}", "/scratch/micpie/export/rdkit_features/train_104-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 1."} {"text":"The number of hydrogen bond donors of the compound with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl is 0."}", "/scratch/micpie/export/rdkit_features/valid_24-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl?\nAnswer: 6"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_32-11.jsonl": "{"text":"User: I want to analyze a compound with a molecular formula of C18H32N5O3S+.\nAssistant: Interesting, do you have some additional constraints I should take into account?\nUser: Indeed, I want the number of hydrogen bond donor sites to be 4, the count of hydrogen bond acceptors to be 5.\nAssistant: In that scenario, I advise the compound with SMILES C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C."} {"text":"User: I want to create a compound with a molecular formula of C17H20F3N4O3+.\nAssistant: Nice, do you have some additional ?\nUser: Yea, I want the number of hydrogen bond donors to be 3, the number of hydrogen bond acceptor sites to be 4.\nAssistant: In that situation, I propose the compound with SMILES C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2."}", "/scratch/micpie/export/rdkit_features/valid_109-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES 
C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl?\nAnswer: 6"} {"text":"Question: What is the rotatable bond count of the molecule with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_19-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl?\nAnswer: C21H26ClN3O4"} {"text":"Question: What is the chemical formula of the molecule with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4?\nAnswer: C21H34FN4O2S+"}", "/scratch/micpie/export/rdkit_features/test_100-4.jsonl": "{"text":"The count of rings of the compound with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F is 4."} {"text":"The number of rings of the chemical with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br is 2."}", "/scratch/micpie/export/rdkit_features/test_30-20.jsonl": "{"text":"Question: What is the count of basic groups of the molecule with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the chemical with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_33-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2?\nAnswer: 1"} {"text":"Question: What is the number of basic groups of the compound with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_21-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the molecule with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-]?\nAnswer: 8"} {"text":"Question: What is the heteroatom count of the chemical with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3?\nAnswer: 6"}", 
"/scratch/micpie/export/rdkit_features/valid_103-19.jsonl": "{"text":"Question: What is the acid group count of the chemical with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the molecule with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_28-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_14-19.jsonl": "{"text":"Question: What is the number of acid groups of the compound with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the chemical with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_114-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the chemical with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl?\nAnswer: 9"} {"text":"Question: What is the number of heteroatoms of the molecule with SMILES C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_6-20.jsonl": "{"text":"Question: What is the number of basic groups of the compound with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the compound with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_104-20.jsonl": "{"text":"Question: What is the number of basic groups of the chemical with SMILES 
C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the chemical with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_107-3.jsonl": "{"text":"The heteroatom count of the molecule with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC is 9."} {"text":"The count of heteroatoms of the chemical with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC is 9."}", "/scratch/micpie/export/rdkit_features/valid_107-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the molecule with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C is 5."} {"text":"The number of hydrogen bond acceptors of the chemical with SMILES CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C is 6."}", "/scratch/micpie/export/rdkit_features/valid_30-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_9-7.jsonl": "{"text":"The number of acid groups of the molecule with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C is 0."} {"text":"The count of acid groups of the compound with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F is 0."}", "/scratch/micpie/export/rdkit_features/valid_112-5.jsonl": "{"text":"The number of rotatable bonds of the chemical with SMILES Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC is 7."} {"text":"The rotatable bond count of the chemical with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N is 7."}", "/scratch/micpie/export/rdkit_features/train_107-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of 
the compound with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl?\nAnswer: 4"} {"text":"Question: What is the rotatable bond count of the molecule with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_14-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C is 2."} {"text":"The number of hydrogen bond donors of the molecule with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O is 1."}", "/scratch/micpie/export/rdkit_features/train_108-12.jsonl": "{"text":"Question: What is the chemical formula of the compound with SMILES CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2?\nAnswer: C17H21F3N6O2S"} {"text":"Question: What is the formula of the chemical with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3?\nAnswer: C23H32N4O4"}", "/scratch/micpie/export/rdkit_features/valid_104-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3 is 5."} {"text":"The aromatic bond count of the molecule with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C is 6."}", "/scratch/micpie/export/rdkit_features/test_101-22.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 4.\nAssistant: Nice, do you have some additional conditions that help me narrow down the search?\nUser: Yep, I want the heteroatom count to be 7.\nAssistant: In that scenario, I the chemical with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F."} {"text":"User: I want to make a compound with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptors of 6.\nAssistant: That is a very interesting question, do you have some additional that help me narrow down the search?\nUser: Yeah, I want the count of heteroatoms to be 8.\nAssistant: In that 
scenario, I recommend the compound with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1."}", "/scratch/micpie/export/rdkit_features/train_114-22.jsonl": "{"text":"User: I want to make a chemical with a number of hydrogen bond donors of 3 and a number of hydrogen bond acceptors of 3.\nAssistant: That's interesting, do you have some additional constraints I should take into account?\nUser: Indeed, I want the heteroatom count to be 9.\nAssistant: In that case, I recommend the chemical with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C."} {"text":"User: I want to synthesize a molecule with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptors of 4.\nAssistant: Interesting, do you have some additional conditions I should consider?\nUser: Indeed, I want the count of heteroatoms to be 7.\nAssistant: In that situation, I propose the molecule with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F."}", "/scratch/micpie/export/rdkit_features/train_116-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 6"} {"text":"Question: What is the ring count of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_114-17.jsonl": "{"text":"Question: What is the rotatable bond count of the compound with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C?\nAnswer: 7"} {"text":"Question: What is the rotatable bond count of the compound with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_0-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1 is 69.35."} {"text":"The total sum of atomic polarizabilities of the molecule with SMILES 
Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC is 62.75."}", "/scratch/micpie/export/rdkit_features/valid_19-11.jsonl": "{"text":"User: I want to synthesize a molecule with a chemical formula of C23H31N3O4.\nAssistant: That is a very interesting question, do you have some additional constraints?\nUser: Yeah, I want the count of hydrogen bond donors to be 3, the number of hydrogen bond acceptor sites to be 4.\nAssistant: In that situation, I propose the molecule with SMILES C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC."} {"text":"User: I want to analyze a molecule with a chemical formula of C18H19FN2O4S2.\nAssistant: Do you have some additional requirements that I should consider?\nUser: Indeed, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 5.\nAssistant: I recommend the molecule with SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F."}", "/scratch/micpie/export/rdkit_features/valid_17-12.jsonl": "{"text":"Question: What is the chemical formula of the compound with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S?\nAnswer: C19H29F3N2O3S"} {"text":"Question: What is the formula of the compound with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: C20H22N6O7"}", "/scratch/micpie/export/rdkit_features/test_29-16.jsonl": "{"text":"Question: What is the number of rings of the molecule with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4?\nAnswer: 4"} {"text":"Question: What is the count of rings of the chemical with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_1-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the compound with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4?\nAnswer: 5"} {"text":"Question: What is the count of rotatable bonds of the compound with SMILES 
Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_31-3.jsonl": "{"text":"The heteroatom count of the molecule with SMILES Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F is 6."} {"text":"The number of heteroatoms of the compound with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4 is 7."}", "/scratch/micpie/export/rdkit_features/test_27-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_14-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3 is 4.55."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O is 2.72."}", "/scratch/micpie/export/rdkit_features/valid_116-11.jsonl": "{"text":"User: I want to synthesize a compound with a chemical formula of C30H32FN3O2.\nAssistant: Do you have some additional requirements I should take into account?\nUser: Yeah, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 3.\nAssistant: In that situation, I recommend the compound with SMILES CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F."} {"text":"User: I want to make a chemical with a molecular formula of C26H25ClN2O4.\nAssistant: Nice, do you have some additional I should consider?\nUser: Yes, I want the number of hydrogen bond donors to be 0, the number of hydrogen bond acceptor sites to be 4.\nAssistant: In that scenario, I advise the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl."}", 
"/scratch/micpie/export/rdkit_features/test_19-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl is 8."} {"text":"The rotatable bond count of the chemical with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4 is 6."}", "/scratch/micpie/export/rdkit_features/test_21-22.jsonl": "{"text":"User: I want to design a compound with a number of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 6.\nAssistant: Nice, do you have some additional that help me narrow down the search?\nUser: Yes, I want the count of heteroatoms to be 8.\nAssistant: I the compound with SMILES Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-]."} {"text":"User: I want to synthesize a molecule with a number of hydrogen bond donor sites of 1 and a number of hydrogen bond acceptors of 5.\nAssistant: Interesting, do you have some additional conditions I should take into account?\nUser: Yeah, I want the count of heteroatoms to be 7.\nAssistant: In that scenario, I the molecule with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O."}", "/scratch/micpie/export/rdkit_features/test_32-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_112-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC?\nAnswer: 8"} {"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 5"}", 
"/scratch/micpie/export/rdkit_features/test_114-11.jsonl": "{"text":"User: I want to make a compound with a molecular formula of C24H27N3O3S2.\nAssistant: Do you have some additional conditions that I should consider?\nUser: Yeah, I want the number of hydrogen bond donor sites to be 3, the number of hydrogen bond acceptors to be 4.\nAssistant: In that scenario, I recommend the compound with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C."} {"text":"User: I want to design a chemical with a chemical formula of C26H22N4O4.\nAssistant: That's interesting, do you have some additional limitations that I should consider?\nUser: Yea, I want the number of hydrogen bond donor sites to be 2, the number of hydrogen bond acceptors to be 6.\nAssistant: In that scenario, I suggest the chemical with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5."}", "/scratch/micpie/export/rdkit_features/valid_115-23.jsonl": "{"text":"User: I want to analyze a chemical with a number of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 8 and a Wildman-Crippen LogP value of 5.05.\nAssistant: That is a very interesting question, do you have some additional constraints I should take into account?\nUser: I want the molecular formula to be C24H22N4O4S.\nAssistant: Then, I suggest the chemical with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4."} {"text":"User: I want to design a molecule with a number of hydrogen bond donors of 2, a number of hydrogen bond acceptor sites of 2 and a Wildman-Crippen LogP value of 5.31.\nAssistant: That is a very interesting question, do you have some additional requirements that help me narrow down the search?\nUser: Indeed, I want the molecular formula to be C27H25F4N3O2.\nAssistant: I suggest the molecule with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F."}", "/scratch/micpie/export/rdkit_features/train_104-9.jsonl": "{"text":"The total sum of atomic 
polarizabilities of the chemical with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 47.35."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl is 52.99."}", "/scratch/micpie/export/rdkit_features/valid_114-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl?\nAnswer: 61.60"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4?\nAnswer: 70.61"}", "/scratch/micpie/export/rdkit_features/train_21-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC is 6."} {"text":"The count of hydrogen bond acceptors of the molecule with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O is 4."}", "/scratch/micpie/export/rdkit_features/valid_25-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the chemical with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4 is 8."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3 is 4."}", "/scratch/micpie/export/rdkit_features/test_15-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O is 2.72."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2 is 1.26."}", "/scratch/micpie/export/rdkit_features/test_116-8.jsonl": "{"text":"The number of basic groups of the chemical with SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1 is 0."} {"text":"The number of basic groups of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl is 0."}", 
"/scratch/micpie/export/rdkit_features/valid_27-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F?\nAnswer: 2"} {"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_33-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_23-11.jsonl": "{"text":"User: I want to make a compound with a chemical formula of C19H34N5OS+.\nAssistant: Interesting, do you have some additional constraints?\nUser: Yep, I want the number of hydrogen bond donor sites to be 1, the number of hydrogen bond acceptors to be 5.\nAssistant: Then, I advise the compound with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4."} {"text":"User: I want to analyze a chemical with a chemical formula of C18H19ClN8O.\nAssistant: Cool, do you have some additional constraints?\nUser: I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 5.\nAssistant: In that situation, I advise the chemical with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4."}", "/scratch/micpie/export/rdkit_features/train_102-15.jsonl": "{"text":"Question: What is the heteroatom count of the chemical with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O?\nAnswer: 10"} {"text":"Question: What is the count of heteroatoms of the chemical with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_12-0.jsonl": 
"{"text":"The chemical formula of the chemical with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C is C18H18FN3OS."} {"text":"The chemical formula of the compound with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4 is C21H29FN2O."}", "/scratch/micpie/export/rdkit_features/test_28-7.jsonl": "{"text":"The number of acid groups of the compound with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC is 0."} {"text":"The number of acid groups of the compound with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C is 0."}", "/scratch/micpie/export/rdkit_features/valid_112-22.jsonl": "{"text":"User: I want to create a compound with a number of hydrogen bond donor sites of 0 and a number of hydrogen bond acceptors of 9.\nAssistant: That is a very interesting question, do you have some additional requirements that help me narrow down the search?\nUser: Yeah, I want the count of heteroatoms to be 10.\nAssistant: In that case, I the compound with SMILES Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC."} {"text":"User: I want to analyze a molecule with a number of hydrogen bond donors of 3 and a number of hydrogen bond acceptors of 5.\nAssistant: That's interesting, do you have some additional that I should consider?\nUser: Yep, I want the heteroatom count to be 9.\nAssistant: Then, I the molecule with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N."}", "/scratch/micpie/export/rdkit_features/valid_110-4.jsonl": "{"text":"The number of rings of the compound with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O is 5."} {"text":"The number of rings of the chemical with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C is 4."}", "/scratch/micpie/export/rdkit_features/test_26-8.jsonl": "{"text":"The basic group count of the molecule with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3 is 0."} {"text":"The basic group count of the chemical with SMILES COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br is 
0."}", "/scratch/micpie/export/rdkit_features/test_8-23.jsonl": "{"text":"User: I want to synthesize a molecule with a count of hydrogen bond donors of 0, a number of hydrogen bond acceptor sites of 7 and a Wildman-Crippen LogP value computed using RDKit of 3.72.\nAssistant: That is a very interesting question, do you have some additional constraints I should take into account?\nUser: Yep, I want the chemical formula to be C22H26N4O4.\nAssistant: In that case, I suggest the molecule with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C."} {"text":"User: I want to create a molecule with a count of hydrogen bond donors of 0, a number of hydrogen bond acceptor sites of 4 and a Wildman-Crippen LogP value of 3.85.\nAssistant: Nice, do you have some additional conditions I should take into account?\nUser: Indeed, I want the molecular formula to be C22H24F2N2O3.\nAssistant: Then, I suggest the molecule with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F."}", "/scratch/micpie/export/rdkit_features/test_116-3.jsonl": "{"text":"The count of heteroatoms of the compound with SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1 is 7."} {"text":"The number of heteroatoms of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl is 7."}", "/scratch/micpie/export/rdkit_features/test_4-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the chemical with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4?\nAnswer: 18"} {"text":"Question: What is the aromatic bond count of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3?\nAnswer: 16"}", "/scratch/micpie/export/rdkit_features/train_7-22.jsonl": "{"text":"User: I want to create a chemical with a number of hydrogen bond donor sites of 1 and a number of hydrogen bond acceptors of 5.\nAssistant: That's interesting, do you have some additional that help me narrow down the search?\nUser: 
Yeah, I want the count of heteroatoms to be 7.\nAssistant: In that situation, I recommend the chemical with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl."} {"text":"User: I want to design a molecule with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptor sites of 5.\nAssistant: Nice, do you have some additional conditions I should take into account?\nUser: Indeed, I want the count of heteroatoms to be 9.\nAssistant: I recommend the molecule with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F."}", "/scratch/micpie/export/rdkit_features/valid_17-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S?\nAnswer: 4"} {"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/valid_103-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the compound with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_108-1.jsonl": "{"text":"The count of hydrogen bond donors of the chemical with SMILES CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2 is 2."} {"text":"The number of hydrogen bond donor sites of the chemical with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3 is 3."}", "/scratch/micpie/export/rdkit_features/test_17-23.jsonl": "{"text":"User: I want to make a compound with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptor sites of 3 and a LogP value computed using the Wildman-Crippen method of 2.91.\nAssistant: Do you have some additional conditions that help me 
narrow down the search?\nUser: Yeah, I want the molecular formula to be C20H29F3N2O3.\nAssistant: In that situation, I advise the compound with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F."} {"text":"User: I want to synthesize a compound with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptors of 9 and a LogP value computed using the Wildman-Crippen method of 0.96.\nAssistant: Do you have some additional conditions that I should consider?\nUser: Yea, I want the chemical formula to be C21H23N5O7.\nAssistant: Then, I propose the compound with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O."}", "/scratch/micpie/export/rdkit_features/valid_108-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O is C17H20N4O6S2."} {"text":"The chemical formula of the molecule with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C is C23H34N4O4."}", "/scratch/micpie/export/rdkit_features/train_115-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F is 55.36."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 75.06."}", "/scratch/micpie/export/rdkit_features/test_18-3.jsonl": "{"text":"The number of heteroatoms of the compound with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O is 13."} {"text":"The heteroatom count of the compound with SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C is 7."}", "/scratch/micpie/export/rdkit_features/train_105-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the molecule with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N?\nAnswer: 7"} {"text":"Question: What is the heteroatom count of the molecule with 
SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/valid_21-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-] is 6."} {"text":"The count of rotatable bonds of the chemical with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3 is 5."}", "/scratch/micpie/export/rdkit_features/valid_109-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl is 2.40."} {"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F is 2.09."}", "/scratch/micpie/export/rdkit_features/test_102-22.jsonl": "{"text":"User: I want to design a chemical with a count of hydrogen bond donors of 0 and a number of hydrogen bond acceptors of 4.\nAssistant: Cool, do you have some additional I should take into account?\nUser: Yea, I want the number of heteroatoms to be 4.\nAssistant: In that situation, I advise the chemical with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3."} {"text":"User: I want to create a chemical with a number of hydrogen bond donors of 2 and a number of hydrogen bond acceptor sites of 6.\nAssistant: Do you have some additional conditions I should take into account?\nUser: Indeed, I want the count of heteroatoms to be 8.\nAssistant: I recommend the chemical with SMILES Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1."}", "/scratch/micpie/export/rdkit_features/test_13-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the chemical with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 12"} {"text":"Question: What is the count of aromatic bonds of the compound with SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4?\nAnswer: 6"}", 
"/scratch/micpie/export/rdkit_features/test_26-6.jsonl": "{"text":"The aromatic bond count of the chemical with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3 is 6."} {"text":"The number of aromatic bonds of the molecule with SMILES COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br is 12."}", "/scratch/micpie/export/rdkit_features/train_100-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C?\nAnswer: 4"} {"text":"Question: What is the rotatable bond count of the molecule with SMILES COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_103-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donors of the molecule with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_11-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the molecule with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl?\nAnswer: 4"} {"text":"Question: What is the heteroatom count of the chemical with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_13-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 6"} {"text":"Question: What is the rotatable bond count of the compound with SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_112-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC?\nAnswer: 7"} {"text":"Question: What is the rotatable bond count of the compound with SMILES 
c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/test_3-11.jsonl": "{"text":"User: I want to analyze a chemical with a molecular formula of C22H26N6O3.\nAssistant: Interesting, do you have some additional limitations I should take into account?\nUser: Indeed, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptors to be 7.\nAssistant: I the chemical with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC."} {"text":"User: I want to design a molecule with a chemical formula of C20H20F2N4O3.\nAssistant: That's interesting, do you have some additional limitations I should take into account?\nUser: Yep, I want the number of hydrogen bond donors to be 3, the count of hydrogen bond acceptors to be 4.\nAssistant: I suggest the molecule with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F."}", "/scratch/micpie/export/rdkit_features/train_3-7.jsonl": "{"text":"The count of acid groups of the molecule with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O is 0."} {"text":"The count of acid groups of the chemical with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4 is 0."}", "/scratch/micpie/export/rdkit_features/valid_112-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC?\nAnswer: 4"} {"text":"Question: What is the number of rings of the molecule with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_12-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the compound with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C is 50.18."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC is 61.56."}", "/scratch/micpie/export/rdkit_features/test_103-8.jsonl": 
"{"text":"The basic group count of the chemical with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1 is 0."} {"text":"The basic group count of the molecule with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 0."}", "/scratch/micpie/export/rdkit_features/valid_104-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the chemical with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3?\nAnswer: 5"} {"text":"Question: What is the rotatable bond count of the compound with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_14-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_11-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl is 58.75."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C is 50.18."}", "/scratch/micpie/export/rdkit_features/valid_30-22.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 4.\nAssistant: Interesting, do you have some additional that help me narrow down the search?\nUser: Yeah, I want the number of heteroatoms to be 5.\nAssistant: Then, I propose the chemical with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3."} {"text":"User: I want to make a molecule with a number of hydrogen bond donor sites of 0 and a count of hydrogen bond acceptors of 2.\nAssistant: Nice, do you have some additional requirements I should take into account?\nUser: Yes, I want the 
heteroatom count to be 6.\nAssistant: In that scenario, I suggest the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F."}", "/scratch/micpie/export/rdkit_features/test_117-1.jsonl": "{"text":"The number of hydrogen bond donors of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc5cc(ccc5c4)Br is 0."} {"text":"The number of hydrogen bond donor sites of the compound with SMILES C[C@H](C(C)(C)OC)[NH2+]CCNc1ccccc1 is 2."}", "/scratch/micpie/export/rdkit_features/train_102-0.jsonl": "{"text":"The chemical formula of the compound with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O is C20H30N2O6S2."} {"text":"The formula of the compound with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1 is C18H15NO2."}", "/scratch/micpie/export/rdkit_features/valid_11-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl?\nAnswer: C20H28ClNO2"} {"text":"Question: What is the formula of the compound with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C?\nAnswer: C17H21Cl2NO"}", "/scratch/micpie/export/rdkit_features/test_0-0.jsonl": "{"text":"The formula of the compound with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1 is C24H12N2O3S5."} {"text":"The chemical formula of the molecule with SMILES Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC is C24H21N3O4."}", "/scratch/micpie/export/rdkit_features/test_11-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the compound with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2 is 1."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C is 2."}", "/scratch/micpie/export/rdkit_features/valid_110-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES 
Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O?\nAnswer: 1"} {"text":"Question: What is the basic group count of the compound with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_28-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the chemical with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br?\nAnswer: 5"} {"text":"Question: What is the number of heteroatoms of the compound with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_101-6.jsonl": "{"text":"The count of aromatic bonds of the compound with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F is 12."} {"text":"The aromatic bond count of the compound with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1 is 28."}", "/scratch/micpie/export/rdkit_features/valid_118-15.jsonl": "{"text":"Question: What is the heteroatom count of the chemical with SMILES CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O?\nAnswer: 3"} {"text":"Question: What is the heteroatom count of the compound with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/train_14-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C is 5."} {"text":"The rotatable bond count of the compound with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O is 4."}", "/scratch/micpie/export/rdkit_features/train_112-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC?\nAnswer: 4"} {"text":"Question: What is the number of rings of the chemical with SMILES c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_113-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the molecule with 
SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 5"} {"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_103-8.jsonl": "{"text":"The basic group count of the chemical with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO is 0."} {"text":"The basic group count of the chemical with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N is 0."}", "/scratch/micpie/export/rdkit_features/train_17-4.jsonl": "{"text":"The count of rings of the chemical with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F is 3."} {"text":"The count of rings of the chemical with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 5."}", "/scratch/micpie/export/rdkit_features/test_1-8.jsonl": "{"text":"The basic group count of the molecule with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4 is 1."} {"text":"The basic group count of the molecule with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC is 0."}", "/scratch/micpie/export/rdkit_features/valid_104-12.jsonl": "{"text":"Question: What is the formula of the compound with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3?\nAnswer: C14H21N5O4"} {"text":"Question: What is the chemical formula of the compound with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C?\nAnswer: C21H34N3O2+"}", "/scratch/micpie/export/rdkit_features/train_102-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O?\nAnswer: 68.02"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1?\nAnswer: 44.39"}", 
"/scratch/micpie/export/rdkit_features/test_30-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the compound with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3 is 3."} {"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC is 3."}", "/scratch/micpie/export/rdkit_features/train_19-1.jsonl": "{"text":"The count of hydrogen bond donors of the chemical with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC is 3."} {"text":"The count of hydrogen bond donors of the chemical with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3 is 2."}", "/scratch/micpie/export/rdkit_features/test_13-11.jsonl": "{"text":"User: I want to design a molecule with a formula of C19H21ClFNO2.\nAssistant: Cool, do you have some additional constraints?\nUser: Yeah, I want the number of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 2.\nAssistant: Then, I propose the molecule with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC."} {"text":"User: I want to create a compound with a chemical formula of C22H31NO2.\nAssistant: Do you have some additional requirements I should consider?\nUser: Yea, I want the number of hydrogen bond donor sites to be 0, the number of hydrogen bond acceptors to be 2.\nAssistant: In that scenario, I the compound with SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4."}", "/scratch/micpie/export/rdkit_features/test_110-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3?\nAnswer: 69.89"} {"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4?\nAnswer: 68.44"}", "/scratch/micpie/export/rdkit_features/train_33-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES 
COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2?\nAnswer: 4"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_27-4.jsonl": "{"text":"The ring count of the chemical with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F is 2."} {"text":"The count of rings of the compound with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F is 3."}", "/scratch/micpie/export/rdkit_features/valid_12-0.jsonl": "{"text":"The chemical formula of the compound with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl is C20H26ClNO2."} {"text":"The chemical formula of the chemical with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3 is C19H33FN4."}", "/scratch/micpie/export/rdkit_features/train_120-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with SMILES Cc1ccc(cc1)NC(=O)CSc2nnc(n2C)C=NC(=O)CCc3ccccc3Cl is 65.57."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES Cc1c(cnn1c2ccccc2)N[C@@H](C)C(=O)Nc3cccc(c3Cl)Cl is 55.00."}", "/scratch/micpie/export/rdkit_features/valid_6-20.jsonl": "{"text":"Question: What is the basic group count of the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+]?\nAnswer: 2"} {"text":"Question: What is the basic group count of the compound with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_2-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C is 7."} {"text":"The count of rotatable bonds of the chemical with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4 is 5."}", "/scratch/micpie/export/rdkit_features/test_4-19.jsonl": "{"text":"Question: What is the count of acid groups of the compound with SMILES 
COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_110-23.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptors of 5 and a LogP value computed using the Wildman-Crippen method of 1.04.\nAssistant: Interesting, do you have some additional conditions that I should consider?\nUser: Yep, I want the chemical formula to be C23H30N3O4S+.\nAssistant: I propose the chemical with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3."} {"text":"User: I want to design a molecule with a count of hydrogen bond donors of 0, a number of hydrogen bond acceptor sites of 8 and a Wildman-Crippen LogP value of 2.17.\nAssistant: Do you have some additional requirements I should take into account?\nUser: I want the chemical formula to be C21H31N5O3S.\nAssistant: In that scenario, I propose the molecule with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4."}", "/scratch/micpie/export/rdkit_features/valid_33-15.jsonl": "{"text":"Question: What is the heteroatom count of the compound with SMILES c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6?\nAnswer: 6"} {"text":"Question: What is the number of heteroatoms of the compound with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/train_105-3.jsonl": "{"text":"The count of heteroatoms of the compound with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N is 7."} {"text":"The heteroatom count of the molecule with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O is 7."}", "/scratch/micpie/export/rdkit_features/test_6-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES 
C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3?\nAnswer: 3"} {"text":"Question: What is the ring count of the molecule with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_108-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O?\nAnswer: 8"} {"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_114-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the compound with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_11-0.jsonl": "{"text":"The molecular formula of the compound with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl is C20H28ClNO2."} {"text":"The molecular formula of the molecule with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C is C17H21Cl2NO."}", "/scratch/micpie/export/rdkit_features/valid_4-17.jsonl": "{"text":"Question: What is the rotatable bond count of the compound with SMILES Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br?\nAnswer: 4"} {"text":"Question: What is the rotatable bond count of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_32-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES 
CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_102-7.jsonl": "{"text":"The acid group count of the compound with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O is 0."} {"text":"The count of acid groups of the compound with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1 is 0."}", "/scratch/micpie/export/rdkit_features/valid_117-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4OC)Cl?\nAnswer: 5"} {"text":"Question: What is the count of rings of the molecule with SMILES Cc1cc(ccc1[C@@H](C)[NH2+]CC2(CC2)CO)F?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_105-11.jsonl": "{"text":"User: I want to analyze a molecule with a molecular formula of C21H15N3O4.\nAssistant: Do you have some additional I should take into account?\nUser: I want the number of hydrogen bond donor sites to be 1, the count of hydrogen bond acceptors to be 5.\nAssistant: Then, I the molecule with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N."} {"text":"User: I want to synthesize a molecule with a chemical formula of C16H21N3O4.\nAssistant: Interesting, do you have some additional requirements that I should consider?\nUser: I want the count of hydrogen bond donors to be 2, the count of hydrogen bond acceptors to be 6.\nAssistant: I recommend the molecule with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O."}", "/scratch/micpie/export/rdkit_features/valid_30-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_102-3.jsonl": 
"{"text":"The count of heteroatoms of the molecule with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O is 10."} {"text":"The heteroatom count of the molecule with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1 is 3."}", "/scratch/micpie/export/rdkit_features/test_4-23.jsonl": "{"text":"User: I want to make a compound with a number of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value computed using RDKit of 3.86.\nAssistant: Interesting, do you have some additional limitations I should take into account?\nUser: I want the molecular formula to be C23H20ClN3O3.\nAssistant: I advise the compound with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4."} {"text":"User: I want to analyze a compound with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptor sites of 4 and a LogP value computed using the Wildman-Crippen method of 2.50.\nAssistant: Do you have some additional that help me narrow down the search?\nUser: Yep, I want the formula to be C23H29N2O3S+.\nAssistant: Then, I propose the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3."}", "/scratch/micpie/export/rdkit_features/test_117-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc5cc(ccc5c4)Br is 5.13."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES C[C@H](C(C)(C)OC)[NH2+]CCNc1ccccc1 is 1.48."}", "/scratch/micpie/export/rdkit_features/train_8-11.jsonl": "{"text":"User: I want to create a compound with a chemical formula of C20H13F2N3O5.\nAssistant: That is a very interesting question, do you have some additional requirements that help me narrow down the search?\nUser: Indeed, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptors to be 6.\nAssistant: In that case, I advise the compound with 
SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N."} {"text":"User: I want to create a compound with a chemical formula of C23H32NO4S+.\nAssistant: Do you have some additional requirements that I should consider?\nUser: I want the count of hydrogen bond donors to be 2, the count of hydrogen bond acceptors to be 4.\nAssistant: In that scenario, I recommend the compound with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO."}", "/scratch/micpie/export/rdkit_features/test_32-23.jsonl": "{"text":"User: I want to design a chemical with a count of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 6 and a Wildman-Crippen LogP value computed using RDKit of 0.85.\nAssistant: Cool, do you have some additional requirements I should take into account?\nUser: I want the chemical formula to be C19H30N5O2S+.\nAssistant: Then, I propose the chemical with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3."} {"text":"User: I want to synthesize a molecule with a number of hydrogen bond donors of 3, a number of hydrogen bond acceptor sites of 4 and a LogP value computed using the Wildman-Crippen method of 2.42.\nAssistant: Do you have some additional limitations that I should consider?\nUser: Indeed, I want the chemical formula to be C21H24N4O3.\nAssistant: In that scenario, I advise the molecule with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3."}", "/scratch/micpie/export/rdkit_features/valid_19-3.jsonl": "{"text":"The count of heteroatoms of the molecule with SMILES C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC is 7."} {"text":"The heteroatom count of the molecule with SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F is 9."}", "/scratch/micpie/export/rdkit_features/test_6-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3 is C22H34N4O4."} {"text":"The formula of the compound with SMILES 
C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C is C25H30N4O2."}", "/scratch/micpie/export/rdkit_features/train_116-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 71.83"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br?\nAnswer: 70.09"}", "/scratch/micpie/export/rdkit_features/train_29-22.jsonl": "{"text":"User: I want to design a molecule with a number of hydrogen bond donors of 2 and a count of hydrogen bond acceptors of 2.\nAssistant: Interesting, do you have some additional conditions that help me narrow down the search?\nUser: Yep, I want the heteroatom count to be 6.\nAssistant: In that situation, I the molecule with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F."} {"text":"User: I want to make a compound with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptor sites of 2.\nAssistant: Interesting, do you have some additional limitations that I should consider?\nUser: I want the count of heteroatoms to be 5.\nAssistant: Then, I propose the compound with SMILES CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C."}", "/scratch/micpie/export/rdkit_features/test_23-18.jsonl": "{"text":"Question: What is the aromatic bond count of the chemical with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4?\nAnswer: 5"} {"text":"Question: What is the aromatic bond count of the compound with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4?\nAnswer: 17"}", "/scratch/micpie/export/rdkit_features/train_111-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 0"} {"text":"Question: What is the count of hydrogen bond donors of the 
compound with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_5-5.jsonl": "{"text":"The rotatable bond count of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C is 4."} {"text":"The count of rotatable bonds of the molecule with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C is 2."}", "/scratch/micpie/export/rdkit_features/test_108-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5?\nAnswer: C21H19N9OS"} {"text":"Question: What is the formula of the chemical with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3?\nAnswer: C23H29N5O4"}", "/scratch/micpie/export/rdkit_features/train_30-1.jsonl": "{"text":"The count of hydrogen bond donors of the chemical with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F is 1."} {"text":"The count of hydrogen bond donors of the molecule with SMILES CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC is 0."}", "/scratch/micpie/export/rdkit_features/test_18-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O is 9."} {"text":"The number of hydrogen bond acceptor sites of the compound with SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C is 4."}", "/scratch/micpie/export/rdkit_features/train_26-23.jsonl": "{"text":"User: I want to design a chemical with a number of hydrogen bond donor sites of 0, a number of hydrogen bond acceptor sites of 2 and a Wildman-Crippen LogP value of 3.89.\nAssistant: Nice, do you have some additional requirements I should take into account?\nUser: I want the chemical formula to be C15H15F6NO2.\nAssistant: In that situation, I recommend the chemical with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F."} 
{"text":"User: I want to synthesize a compound with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptor sites of 3 and a Wildman-Crippen LogP value computed using RDKit of 3.15.\nAssistant: Do you have some additional constraints?\nUser: I want the formula to be C21H31N4O+.\nAssistant: I the compound with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C."}", "/scratch/micpie/export/rdkit_features/train_10-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the chemical with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4?\nAnswer: 5"} {"text":"Question: What is the aromatic bond count of the molecule with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_10-12.jsonl": "{"text":"Question: What is the chemical formula of the compound with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4?\nAnswer: C23H42N5O+"} {"text":"Question: What is the molecular formula of the molecule with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl?\nAnswer: C20H28ClNO2"}", "/scratch/micpie/export/rdkit_features/test_23-16.jsonl": "{"text":"Question: What is the ring count of the compound with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4?\nAnswer: 4"} {"text":"Question: What is the ring count of the molecule with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_6-12.jsonl": "{"text":"Question: What is the chemical formula of the compound with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4?\nAnswer: C20H24N4O4S"} {"text":"Question: What is the chemical formula of the chemical with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C?\nAnswer: C25H30N4O2"}", "/scratch/micpie/export/rdkit_features/train_105-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with 
SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N is 53.47."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O is 48.67."}", "/scratch/micpie/export/rdkit_features/train_31-19.jsonl": "{"text":"Question: What is the acid group count of the compound with SMILES Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the compound with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_21-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-] is 5."} {"text":"The number of aromatic bonds of the compound with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3 is 6."}", "/scratch/micpie/export/rdkit_features/valid_0-16.jsonl": "{"text":"Question: What is the count of rings of the chemical with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1?\nAnswer: 5"} {"text":"Question: What is the count of rings of the molecule with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_17-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S?\nAnswer: 5"} {"text":"Question: What is the number of rotatable bonds of the chemical with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_6-23.jsonl": "{"text":"User: I want to analyze a compound with a count of hydrogen bond donors of 3, a number of hydrogen bond acceptor sites of 4 and a Wildman-Crippen LogP value of 3.27.\nAssistant: Interesting, do you have some additional limitations I should take into account?\nUser: Yep, I want the 
formula to be C22H34N4O4.\nAssistant: Then, I suggest the compound with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3."} {"text":"User: I want to design a molecule with a number of hydrogen bond donor sites of 1, a count of hydrogen bond acceptors of 5 and a Wildman-Crippen LogP value of 3.62.\nAssistant: Interesting, do you have some additional limitations that help me narrow down the search?\nUser: Yeah, I want the molecular formula to be C25H30N4O2.\nAssistant: In that case, I recommend the molecule with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C."}", "/scratch/micpie/export/rdkit_features/train_27-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C?\nAnswer: 62.83"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4?\nAnswer: 58.00"}", "/scratch/micpie/export/rdkit_features/train_11-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl is 6."} {"text":"The count of aromatic bonds of the molecule with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C is 6."}", "/scratch/micpie/export/rdkit_features/train_114-18.jsonl": "{"text":"Question: What is the aromatic bond count of the chemical with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C?\nAnswer: 18"} {"text":"Question: What is the aromatic bond count of the molecule with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F?\nAnswer: 23"}", "/scratch/micpie/export/rdkit_features/train_104-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: C14H21N5O4"} {"text":"Question: What is the formula of the molecule with SMILES 
c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl?\nAnswer: C19H16ClN2O2S-"}", "/scratch/micpie/export/rdkit_features/valid_116-19.jsonl": "{"text":"Question: What is the count of acid groups of the compound with SMILES CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_18-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O?\nAnswer: 9"} {"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_103-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the chemical with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1?\nAnswer: 4"} {"text":"Question: What is the heteroatom count of the molecule with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/train_4-20.jsonl": "{"text":"Question: What is the number of basic groups of the chemical with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4?\nAnswer: 1"} {"text":"Question: What is the number of basic groups of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_6-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+]?\nAnswer: 8"} {"text":"Question: What is the number of rotatable bonds of the compound with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F?\nAnswer: 5"}", 
"/scratch/micpie/export/rdkit_features/valid_16-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C is 2."} {"text":"The number of hydrogen bond acceptors of the chemical with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3 is 4."}", "/scratch/micpie/export/rdkit_features/train_111-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C is 64.81."} {"text":"The sum of atomic polarizabilities of the compound with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC is 68.16."}", "/scratch/micpie/export/rdkit_features/test_21-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the chemical with SMILES Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-] is 1."} {"text":"The number of hydrogen bond donor sites of the molecule with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O is 1."}", "/scratch/micpie/export/rdkit_features/train_106-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the molecule with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O is 2.27."} {"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl is 2.33."}", "/scratch/micpie/export/rdkit_features/valid_30-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3?\nAnswer: C21H34N4O"} {"text":"Question: What is the molecular formula of the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F?\nAnswer: C24H24F2N2O2"}", "/scratch/micpie/export/rdkit_features/train_118-17.jsonl": "{"text":"Question: What is the rotatable bond count of the compound with SMILES Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F?\nAnswer: 6"} {"text":"Question: What is the rotatable 
bond count of the compound with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_19-11.jsonl": "{"text":"User: I want to make a compound with a chemical formula of C23H31N3O4.\nAssistant: That is a very interesting question, do you have some additional conditions I should take into account?\nUser: Indeed, I want the number of hydrogen bond donors to be 3, the number of hydrogen bond acceptors to be 4.\nAssistant: In that case, I suggest the compound with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC."} {"text":"User: I want to analyze a chemical with a chemical formula of C25H33N3O3.\nAssistant: Interesting, do you have some additional limitations that I should consider?\nUser: Yeah, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 3.\nAssistant: In that situation, I the chemical with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3."}", "/scratch/micpie/export/rdkit_features/train_6-8.jsonl": "{"text":"The count of basic groups of the chemical with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4 is 0."} {"text":"The count of basic groups of the compound with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C is 0."}", "/scratch/micpie/export/rdkit_features/valid_2-19.jsonl": "{"text":"Question: What is the acid group count of the chemical with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the compound with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_106-5.jsonl": "{"text":"The rotatable bond count of the chemical with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O is 6."} {"text":"The number of rotatable bonds of the molecule with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC is 8."}", 
"/scratch/micpie/export/rdkit_features/valid_116-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is C30H32FN3O2."} {"text":"The chemical formula of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl is C26H25ClN2O4."}", "/scratch/micpie/export/rdkit_features/train_120-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES Cc1ccc(cc1)NC(=O)CSc2nnc(n2C)C=NC(=O)CCc3ccccc3Cl is 17."} {"text":"The aromatic bond count of the compound with SMILES Cc1c(cnn1c2ccccc2)N[C@@H](C)C(=O)Nc3cccc(c3Cl)Cl is 17."}", "/scratch/micpie/export/rdkit_features/test_16-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C?\nAnswer: 2"} {"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_115-5.jsonl": "{"text":"The rotatable bond count of the chemical with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F is 5."} {"text":"The number of rotatable bonds of the molecule with SMILES c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 5."}", "/scratch/micpie/export/rdkit_features/test_118-7.jsonl": "{"text":"The acid group count of the molecule with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O is 0."} {"text":"The acid group count of the chemical with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N is 0."}", "/scratch/micpie/export/rdkit_features/test_3-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC is 5."} {"text":"The number of rotatable bonds of the chemical with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F is 6."}", 
"/scratch/micpie/export/rdkit_features/train_2-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6 is C22H24N5O2S+."} {"text":"The chemical formula of the compound with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O is C25H24N2O4."}", "/scratch/micpie/export/rdkit_features/train_12-4.jsonl": "{"text":"The number of rings of the molecule with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C is 3."} {"text":"The number of rings of the molecule with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC is 2."}", "/scratch/micpie/export/rdkit_features/valid_109-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_9-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_108-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5 is 11."} {"text":"The number of heteroatoms of the molecule with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3 is 9."}", "/scratch/micpie/export/rdkit_features/test_18-23.jsonl": "{"text":"User: I want to make a chemical with a count of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 9 and a Wildman-Crippen LogP value of 0.35.\nAssistant: That's interesting, do you have 
some additional that help me narrow down the search?\nUser: Yes, I want the chemical formula to be C20H25N3O9S.\nAssistant: In that situation, I suggest the chemical with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O."} {"text":"User: I want to synthesize a compound with a number of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 4 and a LogP value computed using the Wildman-Crippen method of 3.51.\nAssistant: Cool, do you have some additional requirements I should take into account?\nUser: Yea, I want the formula to be C21H28N2O3S2.\nAssistant: Then, I propose the compound with SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C."}", "/scratch/micpie/export/rdkit_features/valid_30-4.jsonl": "{"text":"The number of rings of the chemical with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3 is 3."} {"text":"The count of rings of the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F is 5."}", "/scratch/micpie/export/rdkit_features/test_13-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the chemical with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC is 1."} {"text":"The number of hydrogen bond donors of the compound with SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4 is 0."}", "/scratch/micpie/export/rdkit_features/valid_2-22.jsonl": "{"text":"User: I want to make a molecule with a count of hydrogen bond donors of 0 and a number of hydrogen bond acceptors of 7.\nAssistant: Cool, do you have some additional conditions I should consider?\nUser: Yea, I want the count of heteroatoms to be 8.\nAssistant: In that scenario, I propose the molecule with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4."} {"text":"User: I want to analyze a molecule with a number of hydrogen bond donor sites of 1 and a number of hydrogen bond acceptors of 4.\nAssistant: Cool, do you have some additional limitations that help me narrow down the search?\nUser: Yea, 
I want the number of heteroatoms to be 7.\nAssistant: In that situation, I suggest the molecule with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3."}", "/scratch/micpie/export/rdkit_features/valid_0-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1 is 1."} {"text":"The acid group count of the compound with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4 is 0."}", "/scratch/micpie/export/rdkit_features/train_32-22.jsonl": "{"text":"User: I want to make a chemical with a number of hydrogen bond donors of 4 and a number of hydrogen bond acceptors of 5.\nAssistant: Nice, do you have some additional limitations that help me narrow down the search?\nUser: Yep, I want the count of heteroatoms to be 9.\nAssistant: In that scenario, I advise the chemical with SMILES C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C."} {"text":"User: I want to design a chemical with a count of hydrogen bond donors of 3 and a number of hydrogen bond acceptor sites of 4.\nAssistant: Nice, do you have some additional I should take into account?\nUser: I want the heteroatom count to be 10.\nAssistant: I propose the chemical with SMILES C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2."}", "/scratch/micpie/export/rdkit_features/train_112-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the molecule with SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC is 2.20."} {"text":"The Wildman-Crippen LogP value of the compound with SMILES c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 0.24."}", "/scratch/micpie/export/rdkit_features/valid_114-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the chemical with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl is 3."} {"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4 is 2."}", 
"/scratch/micpie/export/rdkit_features/train_6-22.jsonl": "{"text":"User: I want to make a compound with a number of hydrogen bond donor sites of 2 and a count of hydrogen bond acceptors of 6.\nAssistant: That's interesting, do you have some additional requirements I should consider?\nUser: Yea, I want the count of heteroatoms to be 9.\nAssistant: In that scenario, I advise the compound with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4."} {"text":"User: I want to design a molecule with a number of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 5.\nAssistant: Cool, do you have some additional limitations that I should consider?\nUser: Yea, I want the heteroatom count to be 6.\nAssistant: In that scenario, I the molecule with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C."}", "/scratch/micpie/export/rdkit_features/train_4-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4 is 66.62."} {"text":"The sum of atomic polarizabilities of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C is 67.32."}", "/scratch/micpie/export/rdkit_features/valid_1-18.jsonl": "{"text":"Question: What is the aromatic bond count of the molecule with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5?\nAnswer: 21"} {"text":"Question: What is the count of aromatic bonds of the compound with SMILES Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F?\nAnswer: 17"}", "/scratch/micpie/export/rdkit_features/valid_29-12.jsonl": "{"text":"Question: What is the molecular formula of the molecule with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl?\nAnswer: C20H31ClN3O+"} {"text":"Question: What is the chemical formula of the molecule with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3?\nAnswer: C21H25F2N2O2+"}", "/scratch/micpie/export/rdkit_features/train_103-3.jsonl": 
"{"text":"The heteroatom count of the compound with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1 is 4."} {"text":"The count of heteroatoms of the chemical with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 9."}", "/scratch/micpie/export/rdkit_features/valid_16-19.jsonl": "{"text":"Question: What is the count of acid groups of the compound with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C?\nAnswer: 0"} {"text":"Question: What is the acid group count of the compound with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_101-16.jsonl": "{"text":"Question: What is the count of rings of the molecule with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3?\nAnswer: 3"} {"text":"Question: What is the ring count of the compound with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_1-22.jsonl": "{"text":"User: I want to create a chemical with a number of hydrogen bond donors of 2 and a count of hydrogen bond acceptors of 3.\nAssistant: That is a very interesting question, do you have some additional constraints that I should consider?\nUser: Yea, I want the count of heteroatoms to be 6.\nAssistant: Then, I the chemical with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4."} {"text":"User: I want to create a molecule with a number of hydrogen bond donors of 0 and a number of hydrogen bond acceptor sites of 8.\nAssistant: That is a very interesting question, do you have some additional conditions that I should consider?\nUser: Yeah, I want the heteroatom count to be 8.\nAssistant: In that scenario, I suggest the molecule with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC."}", "/scratch/micpie/export/rdkit_features/train_117-23.jsonl": "{"text":"User: I want to make a chemical with a number of hydrogen bond donors of 0, a number of hydrogen bond acceptors of 5 and a 
Wildman-Crippen LogP value of 5.06.\nAssistant: That's interesting, do you have some additional ?\nUser: Yep, I want the chemical formula to be C27H26ClN3O4.\nAssistant: In that case, I recommend the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(cc4)Oc5ccc(cn5)Cl."} {"text":"User: I want to synthesize a chemical with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 2 and a Wildman-Crippen LogP value of 1.48.\nAssistant: Nice, do you have some additional I should consider?\nUser: Yeah, I want the chemical formula to be C14H25N2O+.\nAssistant: In that situation, I propose the chemical with SMILES C[C@@H](C(C)(C)OC)[NH2+]CCNc1ccccc1."}", "/scratch/micpie/export/rdkit_features/valid_15-5.jsonl": "{"text":"The rotatable bond count of the molecule with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N is 5."} {"text":"The number of rotatable bonds of the molecule with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C is 9."}", "/scratch/micpie/export/rdkit_features/valid_15-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the compound with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N is 50.22."} {"text":"The total sum of atomic polarizabilities of the molecule with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C is 58.82."}", "/scratch/micpie/export/rdkit_features/train_108-0.jsonl": "{"text":"The molecular formula of the chemical with SMILES CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2 is C17H21F3N6O2S."} {"text":"The chemical formula of the molecule with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3 is C23H32N4O4."}", "/scratch/micpie/export/rdkit_features/test_102-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3?\nAnswer: 3"} {"text":"Question: What is the number of rotatable bonds of the chemical with SMILES 
Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/valid_119-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O?\nAnswer: 44.51"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6?\nAnswer: 71.12"}", "/scratch/micpie/export/rdkit_features/valid_30-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the molecule with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3?\nAnswer: 4"} {"text":"Question: What is the number of rotatable bonds of the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_30-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the compound with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3 is 4."} {"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F is 2."}", "/scratch/micpie/export/rdkit_features/test_32-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3 is 0.85."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3 is 2.42."}", "/scratch/micpie/export/rdkit_features/train_30-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the molecule with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F?\nAnswer: 7"} {"text":"Question: What is the heteroatom count of the chemical with SMILES CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_13-5.jsonl": "{"text":"The count of 
rotatable bonds of the molecule with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC is 6."} {"text":"The number of rotatable bonds of the molecule with SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4 is 2."}", "/scratch/micpie/export/rdkit_features/train_1-7.jsonl": "{"text":"The acid group count of the molecule with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4 is 0."} {"text":"The count of acid groups of the compound with SMILES c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl is 0."}", "/scratch/micpie/export/rdkit_features/valid_2-12.jsonl": "{"text":"Question: What is the molecular formula of the molecule with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4?\nAnswer: C21H24N4O3S"} {"text":"Question: What is the molecular formula of the chemical with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3?\nAnswer: C21H28N4O2S"}", "/scratch/micpie/export/rdkit_features/valid_105-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F?\nAnswer: 3"} {"text":"Question: What is the count of rings of the molecule with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_9-6.jsonl": "{"text":"The count of aromatic bonds of the molecule with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C is 12."} {"text":"The count of aromatic bonds of the compound with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F is 11."}", "/scratch/micpie/export/rdkit_features/train_3-20.jsonl": "{"text":"Question: What is the basic group count of the molecule with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O?\nAnswer: 0"} {"text":"Question: What is the basic group count of the molecule with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_20-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor 
sites of the compound with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-]?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_108-19.jsonl": "{"text":"Question: What is the acid group count of the chemical with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the molecule with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_115-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F is 24."} {"text":"The aromatic bond count of the molecule with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 18."}", "/scratch/micpie/export/rdkit_features/test_118-22.jsonl": "{"text":"User: I want to create a chemical with a count of hydrogen bond donors of 2 and a number of hydrogen bond acceptors of 1.\nAssistant: Interesting, do you have some additional requirements that I should consider?\nUser: I want the heteroatom count to be 3.\nAssistant: I the chemical with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O."} {"text":"User: I want to synthesize a chemical with a number of hydrogen bond donor sites of 3 and a count of hydrogen bond acceptors of 7.\nAssistant: That's interesting, do you have some additional constraints?\nUser: Yes, I want the number of heteroatoms to be 11.\nAssistant: Then, I recommend the chemical with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N."}", "/scratch/micpie/export/rdkit_features/valid_23-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4 is 1."} {"text":"The count of hydrogen bond 
donors of the chemical with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4 is 1."}", "/scratch/micpie/export/rdkit_features/train_5-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C?\nAnswer: 67.32"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C?\nAnswer: 65.91"}", "/scratch/micpie/export/rdkit_features/train_117-11.jsonl": "{"text":"User: I want to create a molecule with a formula of C27H26ClN3O4.\nAssistant: Nice, do you have some additional requirements that I should consider?\nUser: Yep, I want the number of hydrogen bond donors to be 0, the number of hydrogen bond acceptor sites to be 5.\nAssistant: Then, I recommend the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(cc4)Oc5ccc(cn5)Cl."} {"text":"User: I want to create a molecule with a formula of C14H25N2O+.\nAssistant: Interesting, do you have some additional requirements I should consider?\nUser: Yeah, I want the count of hydrogen bond donors to be 2, the count of hydrogen bond acceptors to be 2.\nAssistant: Then, I suggest the molecule with SMILES C[C@@H](C(C)(C)OC)[NH2+]CCNc1ccccc1."}", "/scratch/micpie/export/rdkit_features/test_102-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3?\nAnswer: 0"} {"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_1-4.jsonl": "{"text":"The ring count of the compound with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4 is 4."} {"text":"The count of rings of the molecule with SMILES c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl is 
3."}", "/scratch/micpie/export/rdkit_features/test_102-5.jsonl": "{"text":"The number of rotatable bonds of the compound with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3 is 3."} {"text":"The count of rotatable bonds of the molecule with SMILES Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1 is 11."}", "/scratch/micpie/export/rdkit_features/train_19-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC?\nAnswer: 2"} {"text":"Question: What is the ring count of the molecule with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_2-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4 is C21H24N4O3S."} {"text":"The formula of the compound with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3 is C21H28N4O2S."}", "/scratch/micpie/export/rdkit_features/valid_26-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with SMILES CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2 is 52.86."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC is 54.38."}", "/scratch/micpie/export/rdkit_features/train_107-11.jsonl": "{"text":"User: I want to make a compound with a chemical formula of C21H30ClN6O3+.\nAssistant: Cool, do you have some additional constraints that help me narrow down the search?\nUser: I want the count of hydrogen bond donors to be 3, the number of hydrogen bond acceptors to be 3.\nAssistant: In that scenario, I propose the compound with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl."} {"text":"User: I want to design a molecule with a chemical formula of C21H19N9OS.\nAssistant: Cool, do you have some additional conditions that I should consider?\nUser: Yep, I want the number of hydrogen bond donors to be 
2, the number of hydrogen bond acceptor sites to be 10.\nAssistant: In that scenario, I propose the molecule with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5."}", "/scratch/micpie/export/rdkit_features/valid_27-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F is 47.52."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C is 62.15."}", "/scratch/micpie/export/rdkit_features/valid_108-11.jsonl": "{"text":"User: I want to design a compound with a chemical formula of C17H20N4O6S2.\nAssistant: Interesting, do you have some additional limitations?\nUser: Indeed, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 8.\nAssistant: I suggest the compound with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O."} {"text":"User: I want to analyze a chemical with a chemical formula of C23H34N4O4.\nAssistant: That is a very interesting question, do you have some additional conditions that help me narrow down the search?\nUser: Yeah, I want the number of hydrogen bond donors to be 3, the count of hydrogen bond acceptors to be 4.\nAssistant: I suggest the chemical with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C."}", "/scratch/micpie/export/rdkit_features/train_27-11.jsonl": "{"text":"User: I want to design a molecule with a chemical formula of C21H31N4O+.\nAssistant: That is a very interesting question, do you have some additional that I should consider?\nUser: Yes, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 3.\nAssistant: In that case, I recommend the molecule with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C."} {"text":"User: I want to design a molecule with a molecular formula of C21H23N3O3.\nAssistant: Do you have some additional limitations 
that I should consider?\nUser: Yeah, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 4.\nAssistant: Then, I the molecule with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4."}", "/scratch/micpie/export/rdkit_features/valid_3-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N?\nAnswer: 4"} {"text":"Question: What is the ring count of the compound with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_10-3.jsonl": "{"text":"The count of heteroatoms of the compound with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4 is 6."} {"text":"The number of heteroatoms of the molecule with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl is 4."}", "/scratch/micpie/export/rdkit_features/test_24-17.jsonl": "{"text":"Question: What is the rotatable bond count of the compound with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl?\nAnswer: 3"} {"text":"Question: What is the rotatable bond count of the compound with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/train_111-22.jsonl": "{"text":"User: I want to make a compound with a count of hydrogen bond donors of 0 and a count of hydrogen bond acceptors of 8.\nAssistant: Cool, do you have some additional conditions that help me narrow down the search?\nUser: Yep, I want the number of heteroatoms to be 10.\nAssistant: In that situation, I recommend the compound with SMILES CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C."} {"text":"User: I want to design a compound with a count of hydrogen bond donors of 0 and a count of hydrogen bond acceptors of 10.\nAssistant: Do you have some additional constraints I should take into account?\nUser: I want the heteroatom count to be 10.\nAssistant: 
In that situation, I advise the compound with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC."}", "/scratch/micpie/export/rdkit_features/valid_116-12.jsonl": "{"text":"Question: What is the formula of the molecule with SMILES CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: C30H32FN3O2"} {"text":"Question: What is the molecular formula of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl?\nAnswer: C26H25ClN2O4"}", "/scratch/micpie/export/rdkit_features/test_103-23.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value of 4.89.\nAssistant: That is a very interesting question, do you have some additional conditions I should consider?\nUser: Yeah, I want the formula to be C22H16N2O2.\nAssistant: In that case, I propose the chemical with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1."} {"text":"User: I want to design a molecule with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 7 and a Wildman-Crippen LogP value computed using RDKit of -1.05.\nAssistant: Do you have some additional conditions I should consider?\nUser: Indeed, I want the chemical formula to be C14H21N5O4.\nAssistant: Then, I advise the molecule with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3."}", "/scratch/micpie/export/rdkit_features/valid_108-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the molecule with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_111-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES 
CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 4"} {"text":"Question: What is the ring count of the compound with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_11-11.jsonl": "{"text":"User: I want to design a compound with a chemical formula of C18H24F3NO.\nAssistant: Do you have some additional requirements I should take into account?\nUser: Indeed, I want the number of hydrogen bond donors to be 0, the number of hydrogen bond acceptors to be 1.\nAssistant: In that scenario, I suggest the compound with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2."} {"text":"User: I want to create a compound with a chemical formula of C22H31NO2.\nAssistant: Do you have some additional that I should consider?\nUser: Indeed, I want the number of hydrogen bond donor sites to be 0, the number of hydrogen bond acceptor sites to be 2.\nAssistant: In that situation, I the compound with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C."}", "/scratch/micpie/export/rdkit_features/valid_14-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C is 59.07."} {"text":"The total sum of atomic polarizabilities of the molecule with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O is 40.53."}", "/scratch/micpie/export/rdkit_features/valid_9-4.jsonl": "{"text":"The number of rings of the compound with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2 is 2."} {"text":"The ring count of the chemical with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4 is 4."}", "/scratch/micpie/export/rdkit_features/test_7-1.jsonl": "{"text":"The number of hydrogen bond donors of the compound with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC is 1."} {"text":"The count of hydrogen bond donors of the chemical with SMILES 
CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC is 1."}", "/scratch/micpie/export/rdkit_features/test_32-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3?\nAnswer: 6"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_4-8.jsonl": "{"text":"The basic group count of the compound with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4 is 0."} {"text":"The basic group count of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3 is 1."}", "/scratch/micpie/export/rdkit_features/train_101-1.jsonl": "{"text":"The count of hydrogen bond donors of the chemical with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3 is 1."} {"text":"The number of hydrogen bond donors of the chemical with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1 is 1."}", "/scratch/micpie/export/rdkit_features/valid_32-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3?\nAnswer: 6"} {"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_13-20.jsonl": "{"text":"Question: What is the basic group count of the chemical with SMILES CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 0"} {"text":"Question: What is the basic group count of the molecule with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_7-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the compound with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl is 62.61."} 
{"text":"The total sum of atomic polarizabilities of the molecule with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F is 60.29."}", "/scratch/micpie/export/rdkit_features/test_109-11.jsonl": "{"text":"User: I want to make a compound with a molecular formula of C22H26ClN3O4.\nAssistant: That is a very interesting question, do you have some additional limitations I should take into account?\nUser: Yeah, I want the number of hydrogen bond donor sites to be 3, the number of hydrogen bond acceptor sites to be 5.\nAssistant: In that situation, I propose the compound with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl."} {"text":"User: I want to synthesize a chemical with a chemical formula of C16H13BrN4O4S.\nAssistant: Interesting, do you have some additional constraints I should consider?\nUser: I want the number of hydrogen bond donors to be 3, the count of hydrogen bond acceptors to be 5.\nAssistant: Then, I advise the chemical with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N."}", "/scratch/micpie/export/rdkit_features/train_117-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(cc4)Oc5ccc(cn5)Cl?\nAnswer: 5"} {"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES C[C@@H](C(C)(C)OC)[NH2+]CCNc1ccccc1?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_105-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the compound with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N?\nAnswer: 5"} {"text":"Question: What is the heteroatom count of the compound with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/train_112-0.jsonl": "{"text":"The molecular formula of the molecule with SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC is C22H31N5O4."} {"text":"The chemical formula of the molecule with SMILES 
c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is C21H32N5O3S+."}", "/scratch/micpie/export/rdkit_features/test_5-8.jsonl": "{"text":"The number of basic groups of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5 is 1."} {"text":"The basic group count of the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F is 1."}", "/scratch/micpie/export/rdkit_features/valid_3-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N?\nAnswer: 5"} {"text":"Question: What is the rotatable bond count of the chemical with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_22-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O?\nAnswer: 4"} {"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/valid_115-17.jsonl": "{"text":"Question: What is the rotatable bond count of the compound with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4?\nAnswer: 8"} {"text":"Question: What is the number of rotatable bonds of the chemical with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_106-5.jsonl": "{"text":"The number of rotatable bonds of the molecule with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl is 2."} {"text":"The rotatable bond count of the compound with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F is 5."}", "/scratch/micpie/export/rdkit_features/valid_18-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with 
SMILES c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the compound with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_29-3.jsonl": "{"text":"The number of heteroatoms of the chemical with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl is 5."} {"text":"The number of heteroatoms of the molecule with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3 is 6."}", "/scratch/micpie/export/rdkit_features/test_8-7.jsonl": "{"text":"The count of acid groups of the compound with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C is 0."} {"text":"The number of acid groups of the chemical with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F is 0."}", "/scratch/micpie/export/rdkit_features/valid_32-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3 is 63.45."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O is 50.73."}", "/scratch/micpie/export/rdkit_features/train_105-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N?\nAnswer: 53.47"} {"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O?\nAnswer: 48.67"}", "/scratch/micpie/export/rdkit_features/valid_32-8.jsonl": "{"text":"The number of basic groups of the molecule with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3 is 1."} {"text":"The number of basic groups of the molecule with SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O is 0."}", "/scratch/micpie/export/rdkit_features/test_6-14.jsonl": "{"text":"Question: What is the number of hydrogen 
bond acceptor sites of the compound with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3?\nAnswer: 4"} {"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_8-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the chemical with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C?\nAnswer: 8"} {"text":"Question: What is the count of heteroatoms of the chemical with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_11-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2?\nAnswer: 6"} {"text":"Question: What is the aromatic bond count of the compound with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_7-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the molecule with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl?\nAnswer: 16"} {"text":"Question: What is the aromatic bond count of the molecule with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/valid_12-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl?\nAnswer: 0"} {"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_10-5.jsonl": "{"text":"The rotatable bond count of the compound with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4 is 8."} {"text":"The count of rotatable bonds of the compound with SMILES 
C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl is 3."}", "/scratch/micpie/export/rdkit_features/train_7-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl?\nAnswer: C22H24ClN3O3"} {"text":"Question: What is the molecular formula of the chemical with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F?\nAnswer: C21H24F2N2O5"}", "/scratch/micpie/export/rdkit_features/valid_114-11.jsonl": "{"text":"User: I want to analyze a compound with a formula of C20H17Cl2N3O2S2.\nAssistant: Do you have some additional constraints that I should consider?\nUser: Yep, I want the number of hydrogen bond donors to be 3, the number of hydrogen bond acceptors to be 3.\nAssistant: I recommend the compound with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl."} {"text":"User: I want to design a molecule with a chemical formula of C27H24ClN3O2.\nAssistant: Do you have some additional conditions I should consider?\nUser: I want the number of hydrogen bond donor sites to be 3, the number of hydrogen bond acceptors to be 2.\nAssistant: In that case, I suggest the molecule with SMILES C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4."}", "/scratch/micpie/export/rdkit_features/train_20-7.jsonl": "{"text":"The number of acid groups of the compound with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3 is 0."} {"text":"The number of acid groups of the compound with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC is 1."}", "/scratch/micpie/export/rdkit_features/valid_21-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-] is 6."} {"text":"The count of hydrogen bond acceptors of the molecule with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3 is 4."}", "/scratch/micpie/export/rdkit_features/valid_101-2.jsonl": "{"text":"The 
number of hydrogen bond acceptors of the chemical with SMILES COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3 is 4."} {"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1 is 3."}", "/scratch/micpie/export/rdkit_features/test_109-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_106-4.jsonl": "{"text":"The number of rings of the molecule with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl is 3."} {"text":"The count of rings of the molecule with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F is 3."}", "/scratch/micpie/export/rdkit_features/train_110-0.jsonl": "{"text":"The formula of the molecule with SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O is C22H19N5O5."} {"text":"The chemical formula of the compound with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C is C19H25N5O3S2."}", "/scratch/micpie/export/rdkit_features/valid_103-22.jsonl": "{"text":"User: I want to synthesize a compound with a count of hydrogen bond donors of 4 and a number of hydrogen bond acceptors of 7.\nAssistant: Cool, do you have some additional limitations I should consider?\nUser: I want the count of heteroatoms to be 7.\nAssistant: I propose the compound with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO."} {"text":"User: I want to create a chemical with a number of hydrogen bond donors of 2 and a count of hydrogen bond acceptors of 6.\nAssistant: Interesting, do you have some additional conditions?\nUser: Yeah, I want the count of heteroatoms to be 9.\nAssistant: I propose the 
chemical with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N."}", "/scratch/micpie/export/rdkit_features/train_111-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C is 6."} {"text":"The number of rotatable bonds of the molecule with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC is 8."}", "/scratch/micpie/export/rdkit_features/test_11-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the chemical with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2 is 0."} {"text":"The count of hydrogen bond donors of the compound with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C is 0."}", "/scratch/micpie/export/rdkit_features/test_115-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F?\nAnswer: 5"} {"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_1-11.jsonl": "{"text":"User: I want to analyze a molecule with a formula of C23H25ClN3O2+.\nAssistant: Interesting, do you have some additional requirements I should take into account?\nUser: I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 3.\nAssistant: In that case, I propose the molecule with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4."} {"text":"User: I want to synthesize a compound with a molecular formula of C23H23N5O3.\nAssistant: That's interesting, do you have some additional requirements?\nUser: Indeed, I want the number of hydrogen bond donor sites to be 0, the number of hydrogen bond acceptors to be 8.\nAssistant: In that case, I suggest the compound with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC."}", 
"/scratch/micpie/export/rdkit_features/train_3-19.jsonl": "{"text":"Question: What is the number of acid groups of the compound with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the chemical with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_108-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5 is 63.23."} {"text":"The total sum of atomic polarizabilities of the molecule with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3 is 68.52."}", "/scratch/micpie/export/rdkit_features/train_2-19.jsonl": "{"text":"Question: What is the count of acid groups of the molecule with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6?\nAnswer: 0"} {"text":"Question: What is the acid group count of the chemical with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_6-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3 is 3."} {"text":"The rotatable bond count of the chemical with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C is 6."}", "/scratch/micpie/export/rdkit_features/test_114-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C?\nAnswer: 4"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_107-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES 
c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl?\nAnswer: 0"} {"text":"Question: What is the acid group count of the molecule with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_7-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl?\nAnswer: 5"} {"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_26-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3 is C18H22F3NO3."} {"text":"The formula of the molecule with SMILES COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br is C15H13BrClNO3."}", "/scratch/micpie/export/rdkit_features/test_13-7.jsonl": "{"text":"The acid group count of the chemical with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC is 0."} {"text":"The number of acid groups of the chemical with SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4 is 0."}", "/scratch/micpie/export/rdkit_features/train_115-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the compound with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F is 5.03."} {"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 5.01."}", "/scratch/micpie/export/rdkit_features/valid_101-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3?\nAnswer: 4"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_107-22.jsonl": 
"{"text":"User: I want to design a compound with a count of hydrogen bond donors of 2 and a number of hydrogen bond acceptor sites of 6.\nAssistant: Interesting, do you have some additional limitations that help me narrow down the search?\nUser: Yes, I want the count of heteroatoms to be 9.\nAssistant: I suggest the compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC."} {"text":"User: I want to make a chemical with a number of hydrogen bond donors of 3 and a number of hydrogen bond acceptor sites of 7.\nAssistant: That's interesting, do you have some additional limitations?\nUser: Indeed, I want the count of heteroatoms to be 9.\nAssistant: In that scenario, I the chemical with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC."}", "/scratch/micpie/export/rdkit_features/valid_19-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC?\nAnswer: 4"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_28-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the chemical with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br?\nAnswer: 6"} {"text":"Question: What is the number of aromatic bonds of the compound with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/test_23-6.jsonl": "{"text":"The aromatic bond count of the molecule with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4 is 5."} {"text":"The aromatic bond count of the molecule with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4 is 17."}", "/scratch/micpie/export/rdkit_features/test_21-4.jsonl": "{"text":"The count of rings of the compound with SMILES 
Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-] is 2."} {"text":"The number of rings of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O is 3."}", "/scratch/micpie/export/rdkit_features/train_115-20.jsonl": "{"text":"Question: What is the basic group count of the chemical with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F?\nAnswer: 0"} {"text":"Question: What is the basic group count of the molecule with SMILES c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_11-23.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 2 and a LogP value computed using the Wildman-Crippen method of 4.69.\nAssistant: Interesting, do you have some additional limitations that help me narrow down the search?\nUser: Yea, I want the chemical formula to be C20H28ClNO2.\nAssistant: I suggest the chemical with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl."} {"text":"User: I want to make a chemical with a count of hydrogen bond donors of 0, a count of hydrogen bond acceptors of 1 and a Wildman-Crippen LogP value of 4.74.\nAssistant: Do you have some additional limitations I should consider?\nUser: Yeah, I want the chemical formula to be C17H21Cl2NO.\nAssistant: In that scenario, I propose the chemical with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C."}", "/scratch/micpie/export/rdkit_features/valid_4-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the chemical with SMILES Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br is 3."} {"text":"The count of hydrogen bond acceptors of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C is 3."}", "/scratch/micpie/export/rdkit_features/train_109-0.jsonl": "{"text":"The chemical formula of the compound with SMILES 
Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3 is C23H33N3O5."} {"text":"The formula of the compound with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C is C16H18N6O3S3."}", "/scratch/micpie/export/rdkit_features/train_100-5.jsonl": "{"text":"The number of rotatable bonds of the molecule with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C is 4."} {"text":"The number of rotatable bonds of the compound with SMILES COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3 is 5."}", "/scratch/micpie/export/rdkit_features/test_30-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_15-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the molecule with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O is 1."} {"text":"The count of hydrogen bond donors of the chemical with SMILES C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2 is 2."}", "/scratch/micpie/export/rdkit_features/test_117-11.jsonl": "{"text":"User: I want to make a chemical with a chemical formula of C26H25BrN2O3.\nAssistant: That is a very interesting question, do you have some additional constraints?\nUser: Yes, I want the number of hydrogen bond donor sites to be 0, the number of hydrogen bond acceptors to be 3.\nAssistant: Then, I suggest the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc5cc(ccc5c4)Br."} {"text":"User: I want to create a compound with a molecular formula of C14H25N2O+.\nAssistant: That's interesting, do you have some additional constraints?\nUser: I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptors to be 2.\nAssistant: In that situation, I recommend the compound with SMILES 
C[C@H](C(C)(C)OC)[NH2+]CCNc1ccccc1."}", "/scratch/micpie/export/rdkit_features/test_5-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5?\nAnswer: 4"} {"text":"Question: What is the count of rotatable bonds of the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_7-3.jsonl": "{"text":"The count of heteroatoms of the compound with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC is 7."} {"text":"The count of heteroatoms of the compound with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC is 8."}", "/scratch/micpie/export/rdkit_features/valid_100-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_4-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the chemical with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4 is 2.25."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C is 2.36."}", "/scratch/micpie/export/rdkit_features/valid_107-8.jsonl": "{"text":"The basic group count of the molecule with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C is 0."} {"text":"The count of basic groups of the compound with SMILES CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C is 0."}", "/scratch/micpie/export/rdkit_features/test_33-11.jsonl": "{"text":"User: I want to analyze a chemical with a molecular formula of C18H24F3N3O3.\nAssistant: That is a very interesting question, do you have some additional requirements I should take into account?\nUser: 
Yea, I want the count of hydrogen bond donors to be 3, the number of hydrogen bond acceptor sites to be 3.\nAssistant: In that situation, I advise the chemical with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O."} {"text":"User: I want to design a chemical with a molecular formula of C14H13BrN4O4.\nAssistant: Interesting, do you have some additional requirements that help me narrow down the search?\nUser: Yea, I want the number of hydrogen bond donor sites to be 0, the number of hydrogen bond acceptor sites to be 8.\nAssistant: Then, I suggest the chemical with SMILES Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br."}", "/scratch/micpie/export/rdkit_features/valid_25-8.jsonl": "{"text":"The basic group count of the molecule with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4 is 0."} {"text":"The basic group count of the chemical with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3 is 0."}", "/scratch/micpie/export/rdkit_features/train_116-3.jsonl": "{"text":"The heteroatom count of the compound with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 7."} {"text":"The heteroatom count of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br is 6."}", "/scratch/micpie/export/rdkit_features/valid_17-23.jsonl": "{"text":"User: I want to design a compound with a number of hydrogen bond donor sites of 2, a count of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value of 2.65.\nAssistant: Do you have some additional limitations?\nUser: Yep, I want the molecular formula to be C19H29F3N2O3S.\nAssistant: In that case, I advise the compound with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S."} {"text":"User: I want to analyze a compound with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptor sites of 9 and a Wildman-Crippen LogP value computed using RDKit of 0.66.\nAssistant: Cool, do you have some additional requirements that 
I should consider?\nUser: Yep, I want the chemical formula to be C20H22N6O7.\nAssistant: In that case, I suggest the compound with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O."}", "/scratch/micpie/export/rdkit_features/valid_28-4.jsonl": "{"text":"The number of rings of the compound with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br is 3."} {"text":"The ring count of the compound with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C is 3."}", "/scratch/micpie/export/rdkit_features/valid_17-6.jsonl": "{"text":"The count of aromatic bonds of the molecule with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S is 0."} {"text":"The number of aromatic bonds of the molecule with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 11."}", "/scratch/micpie/export/rdkit_features/test_108-8.jsonl": "{"text":"The count of basic groups of the chemical with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5 is 0."} {"text":"The count of basic groups of the chemical with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3 is 0."}", "/scratch/micpie/export/rdkit_features/test_119-8.jsonl": "{"text":"The count of basic groups of the compound with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O is 0."} {"text":"The count of basic groups of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C is 0."}", "/scratch/micpie/export/rdkit_features/test_118-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the molecule with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O is 2."} {"text":"The number of hydrogen bond donor sites of the compound with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N is 3."}", "/scratch/micpie/export/rdkit_features/test_27-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F is C18H27F3N3O2+."} 
{"text":"The chemical formula of the molecule with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F is C19H27F2N3O2."}", "/scratch/micpie/export/rdkit_features/test_0-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1 is 10."} {"text":"The count of heteroatoms of the chemical with SMILES Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC is 7."}", "/scratch/micpie/export/rdkit_features/train_19-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC is 9."} {"text":"The rotatable bond count of the chemical with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3 is 10."}", "/scratch/micpie/export/rdkit_features/test_15-7.jsonl": "{"text":"The acid group count of the molecule with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O is 0."} {"text":"The number of acid groups of the chemical with SMILES C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2 is 0."}", "/scratch/micpie/export/rdkit_features/valid_29-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl?\nAnswer: 7"} {"text":"Question: What is the count of rotatable bonds of the chemical with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_28-3.jsonl": "{"text":"The count of heteroatoms of the chemical with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl is 6."} {"text":"The heteroatom count of the compound with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F is 6."}", "/scratch/micpie/export/rdkit_features/test_24-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl?\nAnswer: 4"} {"text":"Question: What is the count of rings of the compound with SMILES 
CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_20-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the chemical with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br is 3.95."} {"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-] is 1.04."}", "/scratch/micpie/export/rdkit_features/valid_30-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the molecule with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3?\nAnswer: 5"} {"text":"Question: What is the number of heteroatoms of the chemical with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_16-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C?\nAnswer: C18H35N2O2+"} {"text":"Question: What is the formula of the chemical with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3?\nAnswer: C23H36N2O4"}", "/scratch/micpie/export/rdkit_features/test_102-23.jsonl": "{"text":"User: I want to analyze a chemical with a count of hydrogen bond donors of 0, a number of hydrogen bond acceptors of 4 and a LogP value computed using the Wildman-Crippen method of 4.54.\nAssistant: That is a very interesting question, do you have some additional requirements I should take into account?\nUser: Yep, I want the chemical formula to be C23H25NO3.\nAssistant: In that scenario, I suggest the chemical with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3."} {"text":"User: I want to create a chemical with a number of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 6 and a Wildman-Crippen LogP value of 4.69.\nAssistant: That's interesting, do you have some additional conditions that help me narrow down the search?\nUser: Yes, I 
want the molecular formula to be C28H26N2O6.\nAssistant: In that case, I propose the chemical with SMILES Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1."}", "/scratch/micpie/export/rdkit_features/test_17-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the chemical with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F is 2.91."} {"text":"The Wildman-Crippen LogP value of the compound with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 0.96."}", "/scratch/micpie/export/rdkit_features/valid_31-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC is 0."} {"text":"The number of hydrogen bond donors of the chemical with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3 is 1."}", "/scratch/micpie/export/rdkit_features/valid_0-11.jsonl": "{"text":"User: I want to analyze a compound with a molecular formula of C22H10N2O5S2.\nAssistant: Interesting, do you have some additional constraints I should take into account?\nUser: I want the number of hydrogen bond donor sites to be 1, the number of hydrogen bond acceptors to be 8.\nAssistant: Then, I recommend the compound with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1."} {"text":"User: I want to analyze a molecule with a molecular formula of C24H35NO4.\nAssistant: Interesting, do you have some additional conditions?\nUser: I want the number of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 4.\nAssistant: I recommend the molecule with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4."}", "/scratch/micpie/export/rdkit_features/test_22-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES 
CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_27-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the compound with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C is 62.83."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4 is 58.00."}", "/scratch/micpie/export/rdkit_features/test_101-20.jsonl": "{"text":"Question: What is the number of basic groups of the chemical with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the chemical with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_106-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the chemical with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O is 6."} {"text":"The number of hydrogen bond acceptors of the molecule with SMILES CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl is 6."}", "/scratch/micpie/export/rdkit_features/train_120-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES Cc1ccc(cc1)NC(=O)CSc2nnc(n2C)C=NC(=O)CCc3ccccc3Cl is 9."} {"text":"The number of heteroatoms of the molecule with SMILES Cc1c(cnn1c2ccccc2)N[C@@H](C)C(=O)Nc3cccc(c3Cl)Cl is 7."}", "/scratch/micpie/export/rdkit_features/test_4-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4 is 18."} {"text":"The number of aromatic bonds of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3 is 16."}", "/scratch/micpie/export/rdkit_features/valid_105-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F is 51.47."} {"text":"The sum of atomic polarizabilities of the compound with SMILES 
c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O is 39.32."}", "/scratch/micpie/export/rdkit_features/valid_115-22.jsonl": "{"text":"User: I want to analyze a molecule with a number of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 8.\nAssistant: Nice, do you have some additional constraints that I should consider?\nUser: Yep, I want the count of heteroatoms to be 9.\nAssistant: In that scenario, I suggest the molecule with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4."} {"text":"User: I want to design a chemical with a number of hydrogen bond donors of 2 and a number of hydrogen bond acceptors of 2.\nAssistant: Interesting, do you have some additional conditions that help me narrow down the search?\nUser: Yep, I want the heteroatom count to be 9.\nAssistant: In that situation, I the chemical with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F."}", "/scratch/micpie/export/rdkit_features/train_0-20.jsonl": "{"text":"Question: What is the number of basic groups of the compound with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the molecule with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_118-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O is 6."} {"text":"The aromatic bond count of the molecule with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N is 12."}", "/scratch/micpie/export/rdkit_features/train_30-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the compound with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F is 2."} {"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC is 3."}", 
"/scratch/micpie/export/rdkit_features/test_20-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br?\nAnswer: C19H24BrFN2O2"} {"text":"Question: What is the chemical formula of the molecule with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-]?\nAnswer: C17H15ClNO6S-"}", "/scratch/micpie/export/rdkit_features/valid_120-20.jsonl": "{"text":"Question: What is the basic group count of the molecule with SMILES c1cc(oc1)CNC(=O)CSc2nc3ccc(cc3s2)n4c(c5c(c4O)[C@@H]6C[C@@H]5C=C6)O?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the molecule with SMILES CN(c1nc2cc(ccc2s1)Cl)C(=O)CSc3ccc(c(c3)F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_25-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4?\nAnswer: 4"} {"text":"Question: What is the number of rings of the compound with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_109-19.jsonl": "{"text":"Question: What is the number of acid groups of the molecule with SMILES Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the molecule with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_14-16.jsonl": "{"text":"Question: What is the ring count of the compound with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C?\nAnswer: 2"} {"text":"Question: What is the ring count of the compound with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_110-23.jsonl": "{"text":"User: I want to analyze a compound with a number of hydrogen bond donors of 3, a number of hydrogen bond acceptor sites of 7 and a LogP value computed using the 
Wildman-Crippen method of 1.61.\nAssistant: Nice, do you have some additional constraints I should take into account?\nUser: Yea, I want the formula to be C22H19N5O5.\nAssistant: I recommend the compound with SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O."} {"text":"User: I want to create a molecule with a number of hydrogen bond donor sites of 0, a number of hydrogen bond acceptor sites of 8 and a LogP value computed using the Wildman-Crippen method of 2.33.\nAssistant: Interesting, do you have some additional constraints that help me narrow down the search?\nUser: Yes, I want the molecular formula to be C19H25N5O3S2.\nAssistant: In that situation, I the molecule with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C."}", "/scratch/micpie/export/rdkit_features/test_106-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the molecule with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O?\nAnswer: 6"} {"text":"Question: What is the count of rotatable bonds of the compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/test_4-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4 is 4."} {"text":"The number of hydrogen bond acceptor sites of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3 is 4."}", "/scratch/micpie/export/rdkit_features/valid_107-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C?\nAnswer: 16"} {"text":"Question: What is the aromatic bond count of the compound with SMILES CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/test_107-20.jsonl": "{"text":"Question: What is the basic group count of the molecule with 
SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the molecule with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_31-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F?\nAnswer: 65.41"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4?\nAnswer: 61.53"}", "/scratch/micpie/export/rdkit_features/test_15-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the compound with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O?\nAnswer: 5"} {"text":"Question: What is the heteroatom count of the chemical with SMILES C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_119-20.jsonl": "{"text":"Question: What is the number of basic groups of the compound with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O?\nAnswer: 0"} {"text":"Question: What is the basic group count of the compound with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_104-8.jsonl": "{"text":"The count of basic groups of the chemical with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 0."} {"text":"The count of basic groups of the chemical with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl is 0."}", "/scratch/micpie/export/rdkit_features/train_1-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the compound with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4?\nAnswer: 5"} {"text":"Question: What is the number of heteroatoms of the molecule with SMILES 
c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/valid_9-22.jsonl": "{"text":"User: I want to make a molecule with a number of hydrogen bond donor sites of 1 and a count of hydrogen bond acceptors of 3.\nAssistant: That's interesting, do you have some additional requirements I should consider?\nUser: Yea, I want the count of heteroatoms to be 7.\nAssistant: I advise the molecule with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2."} {"text":"User: I want to make a chemical with a count of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 6.\nAssistant: That is a very interesting question, do you have some additional limitations I should take into account?\nUser: I want the number of heteroatoms to be 7.\nAssistant: In that scenario, I advise the chemical with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4."}", "/scratch/micpie/export/rdkit_features/test_29-23.jsonl": "{"text":"User: I want to make a compound with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptor sites of 2 and a Wildman-Crippen LogP value of 2.21.\nAssistant: That is a very interesting question, do you have some additional constraints I should consider?\nUser: Yeah, I want the formula to be C21H33N4O+.\nAssistant: In that scenario, I the compound with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4."} {"text":"User: I want to analyze a molecule with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 2 and a Wildman-Crippen LogP value of 3.99.\nAssistant: Nice, do you have some additional limitations I should take into account?\nUser: Yeah, I want the formula to be C21H38N2O2.\nAssistant: In that situation, I the molecule with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C."}", "/scratch/micpie/export/rdkit_features/valid_27-19.jsonl": "{"text":"Question: What is the number of acid groups of the 
chemical with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the molecule with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_2-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4 is 3.97."} {"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3 is 3.99."}", "/scratch/micpie/export/rdkit_features/test_31-5.jsonl": "{"text":"The number of rotatable bonds of the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F is 3."} {"text":"The count of rotatable bonds of the compound with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3 is 6."}", "/scratch/micpie/export/rdkit_features/train_7-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl?\nAnswer: 4"} {"text":"Question: What is the number of rings of the molecule with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_103-15.jsonl": "{"text":"Question: What is the heteroatom count of the chemical with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO?\nAnswer: 7"} {"text":"Question: What is the count of heteroatoms of the molecule with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/train_112-4.jsonl": "{"text":"The ring count of the chemical with SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC is 4."} {"text":"The count of rings of the molecule with SMILES c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 3."}", "/scratch/micpie/export/rdkit_features/valid_9-10.jsonl": 
"{"text":"The Wildman-Crippen LogP value of the chemical with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2 is 3.97."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4 is 3.70."}", "/scratch/micpie/export/rdkit_features/test_22-0.jsonl": "{"text":"The formula of the molecule with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O is C21H33N3O3."} {"text":"The chemical formula of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4 is C21H40N6O+2."}", "/scratch/micpie/export/rdkit_features/valid_20-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3 is C25H33N3O3."} {"text":"The molecular formula of the compound with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-] is C16H19ClNO6S-."}", "/scratch/micpie/export/rdkit_features/valid_8-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C is 6."} {"text":"The count of rotatable bonds of the compound with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br is 1."}", "/scratch/micpie/export/rdkit_features/train_102-23.jsonl": "{"text":"User: I want to create a compound with a count of hydrogen bond donors of 4, a number of hydrogen bond acceptor sites of 8 and a LogP value computed using the Wildman-Crippen method of -0.42.\nAssistant: Interesting, do you have some additional constraints I should take into account?\nUser: I want the chemical formula to be C20H30N2O6S2.\nAssistant: Then, I the compound with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O."} {"text":"User: I want to make a molecule with a number of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value of 
4.10.\nAssistant: Nice, do you have some additional limitations I should take into account?\nUser: Yeah, I want the molecular formula to be C18H15NO2.\nAssistant: In that case, I suggest the molecule with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1."}", "/scratch/micpie/export/rdkit_features/valid_27-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F?\nAnswer: 4"} {"text":"Question: What is the number of rotatable bonds of the molecule with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_114-7.jsonl": "{"text":"The acid group count of the molecule with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C is 0."} {"text":"The count of acid groups of the chemical with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F is 0."}", "/scratch/micpie/export/rdkit_features/valid_5-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C is 62.00."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+] is 66.08."}", "/scratch/micpie/export/rdkit_features/test_100-16.jsonl": "{"text":"Question: What is the ring count of the compound with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F?\nAnswer: 4"} {"text":"Question: What is the number of rings of the molecule with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_15-11.jsonl": "{"text":"User: I want to synthesize a compound with a chemical formula of C18H20N4O.\nAssistant: That is a very interesting question, do you have some additional constraints I should take into account?\nUser: Indeed, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 4.\nAssistant: 
Then, I recommend the compound with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N."} {"text":"User: I want to make a chemical with a chemical formula of C18H35N2O2+.\nAssistant: That's interesting, do you have some additional requirements that I should consider?\nUser: I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptors to be 2.\nAssistant: In that scenario, I advise the chemical with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C."}", "/scratch/micpie/export/rdkit_features/test_113-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_21-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the molecule with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC?\nAnswer: 9"} {"text":"Question: What is the count of heteroatoms of the compound with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/train_116-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 0"} {"text":"Question: What is the basic group count of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_24-11.jsonl": "{"text":"User: I want to design a chemical with a chemical formula of C20H16N7O2-.\nAssistant: Cool, do you have some additional limitations I should take into account?\nUser: Yeah, I want the number of hydrogen bond donor sites to be 2, the count of hydrogen bond acceptors to be 6.\nAssistant: I suggest the 
chemical with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5."} {"text":"User: I want to create a compound with a formula of C19H31N7O2.\nAssistant: That is a very interesting question, do you have some additional limitations I should take into account?\nUser: Yes, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 8.\nAssistant: Then, I suggest the compound with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C."}", "/scratch/micpie/export/rdkit_features/train_21-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC?\nAnswer: 49.29"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O?\nAnswer: 58.55"}", "/scratch/micpie/export/rdkit_features/valid_3-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the compound with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N is 5."} {"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC is 5."}", "/scratch/micpie/export/rdkit_features/test_29-7.jsonl": "{"text":"The acid group count of the chemical with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4 is 0."} {"text":"The number of acid groups of the molecule with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C is 0."}", "/scratch/micpie/export/rdkit_features/train_103-15.jsonl": "{"text":"Question: What is the heteroatom count of the compound with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1?\nAnswer: 4"} {"text":"Question: What is the number of heteroatoms of the chemical with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/test_115-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the molecule 
with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F is 5."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 3."}", "/scratch/micpie/export/rdkit_features/valid_102-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1 is 60.50."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2 is 144.96."}", "/scratch/micpie/export/rdkit_features/test_7-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC?\nAnswer: 4"} {"text":"Question: What is the number of rings of the compound with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_20-18.jsonl": "{"text":"Question: What is the aromatic bond count of the compound with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3?\nAnswer: 12"} {"text":"Question: What is the number of aromatic bonds of the molecule with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-]?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_7-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl is 3.60."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F is 3.87."}", "/scratch/micpie/export/rdkit_features/train_120-5.jsonl": "{"text":"The rotatable bond count of the chemical with SMILES Cc1ccc(cc1)NC(=O)CSc2nnc(n2C)C=NC(=O)CCc3ccccc3Cl is 8."} {"text":"The count of rotatable bonds of the molecule with SMILES Cc1c(cnn1c2ccccc2)N[C@@H](C)C(=O)Nc3cccc(c3Cl)Cl is 5."}", 
"/scratch/micpie/export/rdkit_features/test_13-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC is 12."} {"text":"The number of aromatic bonds of the compound with SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4 is 6."}", "/scratch/micpie/export/rdkit_features/test_23-7.jsonl": "{"text":"The count of acid groups of the compound with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4 is 0."} {"text":"The number of acid groups of the chemical with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4 is 3."}", "/scratch/micpie/export/rdkit_features/train_27-19.jsonl": "{"text":"Question: What is the count of acid groups of the molecule with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the molecule with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_19-20.jsonl": "{"text":"Question: What is the basic group count of the chemical with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the compound with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_23-8.jsonl": "{"text":"The count of basic groups of the compound with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4 is 1."} {"text":"The number of basic groups of the compound with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3 is 0."}", "/scratch/micpie/export/rdkit_features/train_10-17.jsonl": "{"text":"Question: What is the rotatable bond count of the compound with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4?\nAnswer: 8"} {"text":"Question: What is the rotatable bond count of the molecule with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl?\nAnswer: 3"}", 
"/scratch/micpie/export/rdkit_features/test_110-4.jsonl": "{"text":"The number of rings of the molecule with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3 is 3."} {"text":"The count of rings of the compound with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4 is 4."}", "/scratch/micpie/export/rdkit_features/train_107-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl?\nAnswer: 4"} {"text":"Question: What is the number of rings of the chemical with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_2-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_2-1.jsonl": "{"text":"The number of hydrogen bond donors of the molecule with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4 is 0."} {"text":"The number of hydrogen bond donor sites of the chemical with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3 is 1."}", "/scratch/micpie/export/rdkit_features/train_4-16.jsonl": "{"text":"Question: What is the ring count of the compound with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4?\nAnswer: 4"} {"text":"Question: What is the ring count of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_6-20.jsonl": "{"text":"Question: What is the number of basic groups of the molecule with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the 
molecule with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_19-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC is 9."} {"text":"The rotatable bond count of the molecule with SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F is 6."}", "/scratch/micpie/export/rdkit_features/valid_33-11.jsonl": "{"text":"User: I want to analyze a chemical with a formula of C22H26N4O2.\nAssistant: Cool, do you have some additional constraints that help me narrow down the search?\nUser: Indeed, I want the number of hydrogen bond donors to be 0, the number of hydrogen bond acceptor sites to be 4.\nAssistant: In that scenario, I suggest the chemical with SMILES c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6."} {"text":"User: I want to create a molecule with a chemical formula of C15H20BrN4O2S+.\nAssistant: Cool, do you have some additional conditions?\nUser: Indeed, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 6.\nAssistant: I propose the molecule with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br."}", "/scratch/micpie/export/rdkit_features/test_111-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4 is 7."} {"text":"The count of hydrogen bond acceptors of the molecule with SMILES C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC is 8."}", "/scratch/micpie/export/rdkit_features/test_18-16.jsonl": "{"text":"Question: What is the count of rings of the chemical with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O?\nAnswer: 4"} {"text":"Question: What is the ring count of the molecule with SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C?\nAnswer: 3"}", 
"/scratch/micpie/export/rdkit_features/test_117-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc5cc(ccc5c4)Br is 70.09."} {"text":"The sum of atomic polarizabilities of the compound with SMILES C[C@H](C(C)(C)OC)[NH2+]CCNc1ccccc1 is 44.31."}", "/scratch/micpie/export/rdkit_features/valid_103-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO?\nAnswer: 2"} {"text":"Question: What is the number of rings of the chemical with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_113-20.jsonl": "{"text":"Question: What is the basic group count of the molecule with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 2"} {"text":"Question: What is the count of basic groups of the compound with SMILES Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_14-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3?\nAnswer: 58.17"} {"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O?\nAnswer: 38.77"}", "/scratch/micpie/export/rdkit_features/train_3-11.jsonl": "{"text":"User: I want to analyze a molecule with a formula of C25H24N2O4.\nAssistant: That is a very interesting question, do you have some additional requirements?\nUser: Indeed, I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptors to be 5.\nAssistant: In that case, I the molecule with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O."} {"text":"User: I want to create a compound with a molecular formula of C24H28N3O3+.\nAssistant: Cool, do you have some 
additional constraints I should take into account?\nUser: Yeah, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 5.\nAssistant: In that situation, I suggest the compound with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4."}", "/scratch/micpie/export/rdkit_features/test_30-4.jsonl": "{"text":"The ring count of the compound with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3 is 3."} {"text":"The number of rings of the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC is 4."}", "/scratch/micpie/export/rdkit_features/test_115-4.jsonl": "{"text":"The count of rings of the compound with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F is 4."} {"text":"The count of rings of the chemical with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 6."}", "/scratch/micpie/export/rdkit_features/valid_9-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2 is 6."} {"text":"The number of aromatic bonds of the molecule with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4 is 5."}", "/scratch/micpie/export/rdkit_features/train_119-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO?\nAnswer: 6"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_15-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C?\nAnswer: 2"}", 
"/scratch/micpie/export/rdkit_features/train_17-16.jsonl": "{"text":"Question: What is the number of rings of the molecule with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F?\nAnswer: 3"} {"text":"Question: What is the number of rings of the compound with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_5-4.jsonl": "{"text":"The number of rings of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5 is 5."} {"text":"The number of rings of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F is 3."}", "/scratch/micpie/export/rdkit_features/train_106-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O?\nAnswer: 6"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_4-5.jsonl": "{"text":"The number of rotatable bonds of the chemical with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4 is 3."} {"text":"The number of rotatable bonds of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3 is 5."}", "/scratch/micpie/export/rdkit_features/valid_15-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N?\nAnswer: 50.22"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C?\nAnswer: 58.82"}", "/scratch/micpie/export/rdkit_features/train_18-12.jsonl": "{"text":"Question: What is the formula of the compound with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O?\nAnswer: 
C21H27N3O9S"} {"text":"Question: What is the molecular formula of the chemical with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl?\nAnswer: C17H18Cl2N2O4S"}", "/scratch/micpie/export/rdkit_features/valid_10-20.jsonl": "{"text":"Question: What is the number of basic groups of the chemical with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the molecule with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_7-3.jsonl": "{"text":"The count of heteroatoms of the chemical with SMILES COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC is 6."} {"text":"The heteroatom count of the molecule with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC is 7."}", "/scratch/micpie/export/rdkit_features/test_2-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C?\nAnswer: 61.94"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4?\nAnswer: 64.62"}", "/scratch/micpie/export/rdkit_features/train_115-22.jsonl": "{"text":"User: I want to analyze a compound with a number of hydrogen bond donor sites of 2 and a count of hydrogen bond acceptors of 3.\nAssistant: Nice, do you have some additional that I should consider?\nUser: Yep, I want the number of heteroatoms to be 8.\nAssistant: In that situation, I suggest the compound with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F."} {"text":"User: I want to make a chemical with a count of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 3.\nAssistant: Do you have some additional constraints that help me narrow down the search?\nUser: Yep, I want the heteroatom count to be 7.\nAssistant: I the chemical with SMILES 
c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F."}", "/scratch/micpie/export/rdkit_features/valid_23-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4?\nAnswer: 4"} {"text":"Question: What is the number of rings of the molecule with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_115-7.jsonl": "{"text":"The count of acid groups of the molecule with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4 is 0."} {"text":"The acid group count of the compound with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F is 0."}", "/scratch/micpie/export/rdkit_features/test_11-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the molecule with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2?\nAnswer: 5"} {"text":"Question: What is the count of rotatable bonds of the compound with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_6-19.jsonl": "{"text":"Question: What is the acid group count of the compound with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3?\nAnswer: 0"} {"text":"Question: What is the acid group count of the molecule with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_111-18.jsonl": "{"text":"Question: What is the aromatic bond count of the compound with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4?\nAnswer: 17"} {"text":"Question: What is the aromatic bond count of the compound with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC?\nAnswer: 17"}", "/scratch/micpie/export/rdkit_features/valid_4-0.jsonl": "{"text":"The molecular formula of the molecule with SMILES 
Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br is C18H17BrClN3O2."} {"text":"The chemical formula of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C is C23H31N3O2S+2."}", "/scratch/micpie/export/rdkit_features/valid_32-20.jsonl": "{"text":"Question: What is the number of basic groups of the molecule with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3?\nAnswer: 1"} {"text":"Question: What is the number of basic groups of the chemical with SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_106-3.jsonl": "{"text":"The heteroatom count of the compound with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O is 8."} {"text":"The heteroatom count of the chemical with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC is 9."}", "/scratch/micpie/export/rdkit_features/test_106-0.jsonl": "{"text":"The molecular formula of the chemical with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O is C14H18N4O4."} {"text":"The chemical formula of the compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC is C22H27N3O5S."}", "/scratch/micpie/export/rdkit_features/valid_106-15.jsonl": "{"text":"Question: What is the heteroatom count of the molecule with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl?\nAnswer: 8"} {"text":"Question: What is the count of heteroatoms of the compound with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/valid_3-12.jsonl": "{"text":"Question: What is the formula of the chemical with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N?\nAnswer: C22H23N5O2S"} {"text":"Question: What is the molecular formula of the compound with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC?\nAnswer: C20H22ClN3O4"}", "/scratch/micpie/export/rdkit_features/test_110-16.jsonl": 
"{"text":"Question: What is the number of rings of the chemical with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3?\nAnswer: 3"} {"text":"Question: What is the number of rings of the chemical with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_1-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5?\nAnswer: 62.36"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F?\nAnswer: 62.38"}", "/scratch/micpie/export/rdkit_features/valid_27-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F?\nAnswer: 47.52"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C?\nAnswer: 62.15"}", "/scratch/micpie/export/rdkit_features/train_104-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the compound with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is -1.05."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl is 3.68."}", "/scratch/micpie/export/rdkit_features/test_102-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3?\nAnswer: 5"} {"text":"Question: What is the number of rings of the molecule with SMILES Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_101-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the molecule with SMILES COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3 is 2."} {"text":"The number of hydrogen 
bond donor sites of the chemical with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1 is 1."}", "/scratch/micpie/export/rdkit_features/train_5-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C is 2."} {"text":"The count of hydrogen bond donors of the molecule with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C is 3."}", "/scratch/micpie/export/rdkit_features/test_11-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2?\nAnswer: 0"} {"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_102-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O is 68.02."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1 is 44.39."}", "/scratch/micpie/export/rdkit_features/valid_31-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the chemical with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC?\nAnswer: 6"} {"text":"Question: What is the heteroatom count of the molecule with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/train_100-12.jsonl": "{"text":"Question: What is the molecular formula of the compound with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C?\nAnswer: C19H33N2O2+"} {"text":"Question: What is the formula of the chemical with SMILES COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3?\nAnswer: C17H15N3O3"}", "/scratch/micpie/export/rdkit_features/train_12-21.jsonl": "{"text":"Question: What is the total sum of 
atomic polarizabilities of the compound with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C?\nAnswer: 50.18"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 61.56"}", "/scratch/micpie/export/rdkit_features/train_109-20.jsonl": "{"text":"Question: What is the count of basic groups of the chemical with SMILES Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the chemical with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_2-11.jsonl": "{"text":"User: I want to design a compound with a molecular formula of C21H24N4O3S.\nAssistant: Cool, do you have some additional I should take into account?\nUser: I want the number of hydrogen bond donors to be 0, the number of hydrogen bond acceptor sites to be 7.\nAssistant: Then, I the compound with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4."} {"text":"User: I want to create a chemical with a chemical formula of C21H28N4O2S.\nAssistant: Nice, do you have some additional conditions that help me narrow down the search?\nUser: Indeed, I want the number of hydrogen bond donor sites to be 1, the number of hydrogen bond acceptors to be 4.\nAssistant: In that scenario, I suggest the chemical with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3."}", "/scratch/micpie/export/rdkit_features/valid_17-8.jsonl": "{"text":"The count of basic groups of the molecule with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S is 0."} {"text":"The basic group count of the chemical with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 0."}", "/scratch/micpie/export/rdkit_features/valid_17-3.jsonl": "{"text":"The number of heteroatoms of the compound with SMILES 
CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S is 9."} {"text":"The number of heteroatoms of the compound with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 13."}", "/scratch/micpie/export/rdkit_features/valid_25-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4?\nAnswer: C20H29N5O3"} {"text":"Question: What is the chemical formula of the compound with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3?\nAnswer: C20H25N3OS"}", "/scratch/micpie/export/rdkit_features/train_8-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N?\nAnswer: 52.29"} {"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO?\nAnswer: 69.03"}", "/scratch/micpie/export/rdkit_features/train_19-19.jsonl": "{"text":"Question: What is the count of acid groups of the chemical with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the compound with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_28-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br?\nAnswer: C16H20BrFN2O"} {"text":"Question: What is the chemical formula of the compound with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C?\nAnswer: C22H26N2O2"}", "/scratch/micpie/export/rdkit_features/test_15-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O?\nAnswer: 4"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES 
C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_17-23.jsonl": "{"text":"User: I want to make a chemical with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 3 and a Wildman-Crippen LogP value computed using RDKit of 2.82.\nAssistant: That is a very interesting question, do you have some additional requirements I should take into account?\nUser: Yep, I want the chemical formula to be C23H28F2N2O3.\nAssistant: In that situation, I propose the chemical with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F."} {"text":"User: I want to synthesize a chemical with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 9 and a LogP value computed using the Wildman-Crippen method of 0.93.\nAssistant: Interesting, do you have some additional constraints that help me narrow down the search?\nUser: Yep, I want the molecular formula to be C23H24N4O8.\nAssistant: Then, I suggest the chemical with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O."}", "/scratch/micpie/export/rdkit_features/test_119-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O is C15H16FN5O3."} {"text":"The formula of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C is C22H29BrN2O4S."}", "/scratch/micpie/export/rdkit_features/train_4-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4?\nAnswer: 17"} {"text":"Question: What is the aromatic bond count of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C?\nAnswer: 16"}", "/scratch/micpie/export/rdkit_features/valid_120-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES 
c1cc(oc1)CNC(=O)CSc2nc3ccc(cc3s2)n4c(c5c(c4O)[C@@H]6C[C@@H]5C=C6)O?\nAnswer: 65.46"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES CN(c1nc2cc(ccc2s1)Cl)C(=O)CSc3ccc(c(c3)F)F?\nAnswer: 47.59"}", "/scratch/micpie/export/rdkit_features/train_102-22.jsonl": "{"text":"User: I want to analyze a compound with a number of hydrogen bond donor sites of 4 and a count of hydrogen bond acceptors of 8.\nAssistant: That's interesting, do you have some additional conditions that I should consider?\nUser: Yes, I want the heteroatom count to be 10.\nAssistant: I advise the compound with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O."} {"text":"User: I want to create a compound with a number of hydrogen bond donor sites of 1 and a number of hydrogen bond acceptors of 3.\nAssistant: Nice, do you have some additional constraints that I should consider?\nUser: Yes, I want the heteroatom count to be 3.\nAssistant: In that situation, I suggest the compound with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1."}", "/scratch/micpie/export/rdkit_features/train_115-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_23-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4 is 6."} {"text":"The rotatable bond count of the molecule with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4 is 4."}", "/scratch/micpie/export/rdkit_features/valid_24-19.jsonl": "{"text":"Question: What is the number of acid groups of the molecule with SMILES 
c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl?\nAnswer: 3"} {"text":"Question: What is the count of acid groups of the compound with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_15-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_33-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6 is 0."} {"text":"The count of acid groups of the chemical with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br is 0."}", "/scratch/micpie/export/rdkit_features/valid_22-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O is 5."} {"text":"The number of hydrogen bond acceptors of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4 is 7."}", "/scratch/micpie/export/rdkit_features/valid_8-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_118-8.jsonl": "{"text":"The basic group count of the molecule with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O is 1."} {"text":"The number of basic groups of the compound with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N is 0."}", "/scratch/micpie/export/rdkit_features/train_119-22.jsonl": 
"{"text":"User: I want to design a compound with a count of hydrogen bond donors of 3 and a number of hydrogen bond acceptor sites of 6.\nAssistant: Do you have some additional I should consider?\nUser: Yea, I want the heteroatom count to be 9.\nAssistant: I suggest the compound with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO."} {"text":"User: I want to create a compound with a number of hydrogen bond donors of 2 and a number of hydrogen bond acceptor sites of 6.\nAssistant: That's interesting, do you have some additional constraints I should consider?\nUser: Yep, I want the count of heteroatoms to be 10.\nAssistant: In that scenario, I recommend the compound with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl."}", "/scratch/micpie/export/rdkit_features/train_10-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_13-12.jsonl": "{"text":"Question: What is the formula of the molecule with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: C19H21ClFNO2"} {"text":"Question: What is the formula of the compound with SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4?\nAnswer: C22H31NO2"}", "/scratch/micpie/export/rdkit_features/valid_107-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C?\nAnswer: 68.15"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C?\nAnswer: 66.60"}", "/scratch/micpie/export/rdkit_features/valid_4-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES 
Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_14-11.jsonl": "{"text":"User: I want to design a compound with a chemical formula of C21H24N4O.\nAssistant: Interesting, do you have some additional limitations that I should consider?\nUser: Yes, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 3.\nAssistant: In that scenario, I propose the compound with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3."} {"text":"User: I want to make a molecule with a molecular formula of C13H14BrNO3.\nAssistant: That is a very interesting question, do you have some additional constraints I should take into account?\nUser: Indeed, I want the number of hydrogen bond donor sites to be 1, the number of hydrogen bond acceptors to be 4.\nAssistant: I suggest the molecule with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O."}", "/scratch/micpie/export/rdkit_features/test_105-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the molecule with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_23-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_13-15.jsonl": "{"text":"Question: What is the heteroatom count of the molecule with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4?\nAnswer: 5"} {"text":"Question: 
What is the number of heteroatoms of the compound with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_1-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5?\nAnswer: C21H21N7OS"} {"text":"Question: What is the formula of the chemical with SMILES Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F?\nAnswer: C22H24FN5O2"}", "/scratch/micpie/export/rdkit_features/valid_6-6.jsonl": "{"text":"The count of aromatic bonds of the molecule with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+] is 18."} {"text":"The count of aromatic bonds of the compound with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F is 17."}", "/scratch/micpie/export/rdkit_features/test_101-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F?\nAnswer: 6"} {"text":"Question: What is the number of rotatable bonds of the compound with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_8-3.jsonl": "{"text":"The count of heteroatoms of the chemical with SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N is 10."} {"text":"The number of heteroatoms of the chemical with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO is 6."}", "/scratch/micpie/export/rdkit_features/valid_31-16.jsonl": "{"text":"Question: What is the ring count of the compound with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC?\nAnswer: 4"} {"text":"Question: What is the count of rings of the molecule with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_109-7.jsonl": "{"text":"The number of acid groups of the molecule with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl is 0."} 
{"text":"The number of acid groups of the chemical with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N is 0."}", "/scratch/micpie/export/rdkit_features/test_21-7.jsonl": "{"text":"The number of acid groups of the molecule with SMILES Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-] is 1."} {"text":"The acid group count of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O is 0."}", "/scratch/micpie/export/rdkit_features/test_2-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C is 1."} {"text":"The number of hydrogen bond donor sites of the chemical with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4 is 0."}", "/scratch/micpie/export/rdkit_features/valid_101-4.jsonl": "{"text":"The count of rings of the chemical with SMILES COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3 is 3."} {"text":"The count of rings of the chemical with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1 is 4."}", "/scratch/micpie/export/rdkit_features/train_20-0.jsonl": "{"text":"The molecular formula of the compound with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3 is C25H33N3O3."} {"text":"The chemical formula of the compound with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC is C17H15FNO6S-."}", "/scratch/micpie/export/rdkit_features/valid_106-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the chemical with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl?\nAnswer: 2"} {"text":"Question: What is the count of rotatable bonds of the compound with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_110-8.jsonl": "{"text":"The number of basic groups of the compound with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3 is 1."} {"text":"The count of basic groups of the 
compound with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4 is 0."}", "/scratch/micpie/export/rdkit_features/valid_0-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1?\nAnswer: 0"} {"text":"Question: What is the basic group count of the molecule with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_5-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5?\nAnswer: 65.99"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F?\nAnswer: 59.11"}", "/scratch/micpie/export/rdkit_features/test_107-11.jsonl": "{"text":"User: I want to synthesize a compound with a chemical formula of C22H27N3O5S.\nAssistant: Nice, do you have some additional constraints that I should consider?\nUser: Yeah, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 6.\nAssistant: In that case, I advise the compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC."} {"text":"User: I want to make a chemical with a chemical formula of C23H29N3O6.\nAssistant: Interesting, do you have some additional constraints I should take into account?\nUser: Yeah, I want the count of hydrogen bond donors to be 3, the number of hydrogen bond acceptors to be 7.\nAssistant: Then, I recommend the chemical with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC."}", "/scratch/micpie/export/rdkit_features/train_4-12.jsonl": "{"text":"Question: What is the chemical formula of the compound with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4?\nAnswer: C24H28N3O3+"} {"text":"Question: What is the chemical formula of the chemical with SMILES 
Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C?\nAnswer: C23H29N2O3S+"}", "/scratch/micpie/export/rdkit_features/valid_11-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl is 4.69."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C is 4.74."}", "/scratch/micpie/export/rdkit_features/train_15-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC is 11."} {"text":"The aromatic bond count of the molecule with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C is 5."}", "/scratch/micpie/export/rdkit_features/train_119-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO?\nAnswer: 2"} {"text":"Question: What is the count of rings of the chemical with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_106-0.jsonl": "{"text":"The chemical formula of the compound with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O is C12H16F2N4O3."} {"text":"The chemical formula of the chemical with SMILES CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl is C20H24ClN7O3."}", "/scratch/micpie/export/rdkit_features/test_8-20.jsonl": "{"text":"Question: What is the basic group count of the chemical with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the compound with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_9-17.jsonl": "{"text":"Question: What is the rotatable bond count of the compound with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C?\nAnswer: 5"} {"text":"Question: 
What is the count of rotatable bonds of the molecule with SMILES C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/test_20-22.jsonl": "{"text":"User: I want to analyze a chemical with a count of hydrogen bond donors of 2 and a count of hydrogen bond acceptors of 2.\nAssistant: Interesting, do you have some additional ?\nUser: Yes, I want the heteroatom count to be 6.\nAssistant: I recommend the chemical with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br."} {"text":"User: I want to analyze a compound with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 6.\nAssistant: Cool, do you have some additional requirements I should take into account?\nUser: Indeed, I want the heteroatom count to be 9.\nAssistant: In that situation, I the compound with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-]."}", "/scratch/micpie/export/rdkit_features/valid_110-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O?\nAnswer: 4"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/test_21-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-] is 0.86."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O is 2.08."}", "/scratch/micpie/export/rdkit_features/valid_30-3.jsonl": "{"text":"The heteroatom count of the chemical with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3 is 5."} {"text":"The count of heteroatoms of the molecule with SMILES 
Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F is 6."}", "/scratch/micpie/export/rdkit_features/train_18-8.jsonl": "{"text":"The count of basic groups of the compound with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O is 0."} {"text":"The count of basic groups of the compound with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl is 0."}", "/scratch/micpie/export/rdkit_features/train_111-0.jsonl": "{"text":"The molecular formula of the chemical with SMILES CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C is C20H27N5O4S."} {"text":"The formula of the molecule with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC is C22H29N7O3."}", "/scratch/micpie/export/rdkit_features/valid_107-17.jsonl": "{"text":"Question: What is the rotatable bond count of the compound with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C?\nAnswer: 6"} {"text":"Question: What is the rotatable bond count of the chemical with SMILES CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/valid_13-8.jsonl": "{"text":"The number of basic groups of the molecule with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4 is 0."} {"text":"The count of basic groups of the chemical with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C is 0."}", "/scratch/micpie/export/rdkit_features/test_100-11.jsonl": "{"text":"User: I want to make a molecule with a chemical formula of C18H27FN3O+.\nAssistant: Interesting, do you have some additional conditions?\nUser: Yes, I want the number of hydrogen bond donor sites to be 2, the count of hydrogen bond acceptors to be 1.\nAssistant: In that case, I suggest the molecule with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F."} {"text":"User: I want to design a compound with a chemical formula of C13H14BrN3O2.\nAssistant: That's interesting, do you have some additional I should take into 
account?\nUser: Yeah, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 3.\nAssistant: In that case, I propose the compound with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br."}", "/scratch/micpie/export/rdkit_features/test_29-3.jsonl": "{"text":"The number of heteroatoms of the chemical with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4 is 5."} {"text":"The count of heteroatoms of the molecule with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C is 4."}", "/scratch/micpie/export/rdkit_features/test_105-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N is 1."} {"text":"The count of hydrogen bond donors of the chemical with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O is 2."}", "/scratch/micpie/export/rdkit_features/valid_116-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with SMILES CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 79.60."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl is 70.02."}", "/scratch/micpie/export/rdkit_features/test_107-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC is C22H27N3O5S."} {"text":"The molecular formula of the chemical with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC is C23H29N3O6."}", "/scratch/micpie/export/rdkit_features/train_9-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the chemical with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C is 3."} {"text":"The number of hydrogen bond acceptors of the molecule with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F is 5."}", "/scratch/micpie/export/rdkit_features/train_25-23.jsonl": "{"text":"User: I want to design a compound 
with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 8 and a Wildman-Crippen LogP value computed using RDKit of 2.44.\nAssistant: Do you have some additional conditions that I should consider?\nUser: Indeed, I want the molecular formula to be C19H31N7O2.\nAssistant: I advise the compound with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C."} {"text":"User: I want to make a chemical with a number of hydrogen bond donors of 0, a count of hydrogen bond acceptors of 2 and a Wildman-Crippen LogP value of 3.89.\nAssistant: That's interesting, do you have some additional limitations I should take into account?\nUser: I want the formula to be C15H15F6NO2.\nAssistant: Then, I recommend the chemical with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F."}", "/scratch/micpie/export/rdkit_features/train_115-18.jsonl": "{"text":"Question: What is the aromatic bond count of the molecule with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F?\nAnswer: 18"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 18"}", "/scratch/micpie/export/rdkit_features/train_29-7.jsonl": "{"text":"The acid group count of the molecule with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F is 0."} {"text":"The count of acid groups of the compound with SMILES CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C is 0."}", "/scratch/micpie/export/rdkit_features/valid_120-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the chemical with SMILES c1cc(oc1)CNC(=O)CSc2nc3ccc(cc3s2)n4c(c5c(c4O)[C@@H]6C[C@@H]5C=C6)O?\nAnswer: 9"} {"text":"Question: What is the heteroatom count of the chemical with SMILES CN(c1nc2cc(ccc2s1)Cl)C(=O)CSc3ccc(c(c3)F)F?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/train_31-14.jsonl": "{"text":"Question: What is the number of hydrogen bond 
acceptor sites of the chemical with SMILES Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_112-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the molecule with SMILES Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC is 0."} {"text":"The number of hydrogen bond donor sites of the compound with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N is 3."}", "/scratch/micpie/export/rdkit_features/train_102-4.jsonl": "{"text":"The count of rings of the compound with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O is 5."} {"text":"The ring count of the chemical with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1 is 3."}", "/scratch/micpie/export/rdkit_features/test_109-8.jsonl": "{"text":"The number of basic groups of the compound with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl is 0."} {"text":"The basic group count of the chemical with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N is 0."}", "/scratch/micpie/export/rdkit_features/test_5-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5 is 65.99."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F is 59.11."}", "/scratch/micpie/export/rdkit_features/test_31-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES 
C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_25-6.jsonl": "{"text":"The aromatic bond count of the chemical with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4 is 11."} {"text":"The aromatic bond count of the compound with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3 is 12."}", "/scratch/micpie/export/rdkit_features/valid_113-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 0.24."} {"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 6.22."}", "/scratch/micpie/export/rdkit_features/train_110-11.jsonl": "{"text":"User: I want to create a compound with a chemical formula of C22H19N5O5.\nAssistant: Do you have some additional constraints I should consider?\nUser: Yeah, I want the number of hydrogen bond donor sites to be 3, the number of hydrogen bond acceptors to be 7.\nAssistant: In that scenario, I recommend the compound with SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O."} {"text":"User: I want to create a molecule with a formula of C19H25N5O3S2.\nAssistant: That's interesting, do you have some additional limitations I should take into account?\nUser: Yep, I want the count of hydrogen bond donors to be 0, the number of hydrogen bond acceptors to be 8.\nAssistant: Then, I suggest the molecule with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C."}", "/scratch/micpie/export/rdkit_features/valid_116-1.jsonl": "{"text":"The number of hydrogen bond donors of the molecule with SMILES CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 1."} {"text":"The number of hydrogen bond donors of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl is 0."}", 
"/scratch/micpie/export/rdkit_features/train_120-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES Cc1ccc(cc1)NC(=O)CSc2nnc(n2C)C=NC(=O)CCc3ccccc3Cl is 4.09."} {"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES Cc1c(cnn1c2ccccc2)N[C@@H](C)C(=O)Nc3cccc(c3Cl)Cl is 4.93."}", "/scratch/micpie/export/rdkit_features/train_22-11.jsonl": "{"text":"User: I want to analyze a chemical with a chemical formula of C19H25ClFN3O3.\nAssistant: Cool, do you have some additional requirements I should take into account?\nUser: Indeed, I want the number of hydrogen bond donor sites to be 1, the number of hydrogen bond acceptors to be 4.\nAssistant: Then, I the chemical with SMILES C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O."} {"text":"User: I want to design a molecule with a chemical formula of C20H33N6O2+.\nAssistant: That's interesting, do you have some additional that help me narrow down the search?\nUser: Yea, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 7.\nAssistant: I propose the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4."}", "/scratch/micpie/export/rdkit_features/train_0-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1 is C21H6F2N2O3S4."} {"text":"The molecular formula of the chemical with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4 is C25H33N2O3+."}", "/scratch/micpie/export/rdkit_features/valid_30-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3?\nAnswer: 64.83"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F?\nAnswer: 63.16"}", 
"/scratch/micpie/export/rdkit_features/test_104-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2 is 5."} {"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3 is 6."}", "/scratch/micpie/export/rdkit_features/test_0-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1 is 25."} {"text":"The aromatic bond count of the compound with SMILES Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC is 23."}", "/scratch/micpie/export/rdkit_features/train_30-8.jsonl": "{"text":"The number of basic groups of the molecule with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F is 0."} {"text":"The basic group count of the molecule with SMILES CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC is 0."}", "/scratch/micpie/export/rdkit_features/valid_100-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the compound with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F?\nAnswer: 5"} {"text":"Question: What is the aromatic bond count of the compound with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/test_24-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl?\nAnswer: 43.27"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C?\nAnswer: 70.37"}", "/scratch/micpie/export/rdkit_features/train_23-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4 is 65.41."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES 
c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3 is 41.87."}", "/scratch/micpie/export/rdkit_features/valid_28-0.jsonl": "{"text":"The molecular formula of the chemical with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br is C16H20BrFN2O."} {"text":"The chemical formula of the chemical with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C is C22H26N2O2."}", "/scratch/micpie/export/rdkit_features/valid_4-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br?\nAnswer: 53.15"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C?\nAnswer: 68.95"}", "/scratch/micpie/export/rdkit_features/train_110-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O?\nAnswer: C22H19N5O5"} {"text":"Question: What is the chemical formula of the chemical with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: C19H25N5O3S2"}", "/scratch/micpie/export/rdkit_features/test_26-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3 is 51.53."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br is 43.80."}", "/scratch/micpie/export/rdkit_features/valid_17-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_118-14.jsonl": "{"text":"Question: What is the number of hydrogen 
bond acceptors of the chemical with SMILES Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_107-6.jsonl": "{"text":"The count of aromatic bonds of the molecule with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC is 11."} {"text":"The aromatic bond count of the compound with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC is 12."}", "/scratch/micpie/export/rdkit_features/train_0-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1 is 7.56."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4 is 2.54."}", "/scratch/micpie/export/rdkit_features/train_7-6.jsonl": "{"text":"The aromatic bond count of the chemical with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl is 16."} {"text":"The aromatic bond count of the compound with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F is 12."}", "/scratch/micpie/export/rdkit_features/valid_120-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES c1cc(oc1)CNC(=O)CSc2nc3ccc(cc3s2)n4c(c5c(c4O)[C@@H]6C[C@@H]5C=C6)O?\nAnswer: C23H19N3O4S2"} {"text":"Question: What is the molecular formula of the compound with SMILES CN(c1nc2cc(ccc2s1)Cl)C(=O)CSc3ccc(c(c3)F)F?\nAnswer: C16H11ClF2N2OS2"}", "/scratch/micpie/export/rdkit_features/valid_102-3.jsonl": "{"text":"The heteroatom count of the chemical with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1 is 7."} {"text":"The heteroatom count of the compound with SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2 is 10."}", "/scratch/micpie/export/rdkit_features/valid_17-0.jsonl": 
"{"text":"The formula of the compound with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S is C19H29F3N2O3S."} {"text":"The molecular formula of the chemical with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is C20H22N6O7."}", "/scratch/micpie/export/rdkit_features/valid_109-7.jsonl": "{"text":"The number of acid groups of the molecule with SMILES C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl is 0."} {"text":"The acid group count of the compound with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F is 0."}", "/scratch/micpie/export/rdkit_features/valid_12-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_3-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N is 3.62."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC is 3.59."}", "/scratch/micpie/export/rdkit_features/valid_31-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the compound with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC is 65.41."} {"text":"The sum of atomic polarizabilities of the compound with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3 is 59.17."}", "/scratch/micpie/export/rdkit_features/train_19-18.jsonl": "{"text":"Question: What is the aromatic bond count of the chemical with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC?\nAnswer: 12"} {"text":"Question: What is the number of aromatic bonds of the 
compound with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/train_101-19.jsonl": "{"text":"Question: What is the acid group count of the compound with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3?\nAnswer: 0"} {"text":"Question: What is the acid group count of the molecule with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_115-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4 is 1."} {"text":"The number of hydrogen bond donors of the chemical with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F is 2."}", "/scratch/micpie/export/rdkit_features/valid_8-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C?\nAnswer: 5"} {"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_8-23.jsonl": "{"text":"User: I want to create a compound with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptor sites of 6 and a LogP value computed using the Wildman-Crippen method of 3.55.\nAssistant: Do you have some additional requirements?\nUser: I want the chemical formula to be C20H13F2N3O5.\nAssistant: In that situation, I suggest the compound with SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N."} {"text":"User: I want to analyze a compound with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value computed using RDKit of 2.25.\nAssistant: That's interesting, do you have some additional limitations that help me narrow down the search?\nUser: Yes, I 
want the chemical formula to be C23H32NO4S+.\nAssistant: Then, I propose the compound with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO."}", "/scratch/micpie/export/rdkit_features/test_33-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_21-19.jsonl": "{"text":"Question: What is the count of acid groups of the chemical with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-]?\nAnswer: 1"} {"text":"Question: What is the number of acid groups of the molecule with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_33-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O?\nAnswer: 2"} {"text":"Question: What is the number of rings of the chemical with SMILES Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_1-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4?\nAnswer: 2"} {"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_24-12.jsonl": "{"text":"Question: What is the formula of the compound with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl?\nAnswer: C18H13ClN7O2-"} {"text":"Question: What is the chemical formula of the compound with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C?\nAnswer: C21H39N6O+"}", 
"/scratch/micpie/export/rdkit_features/train_0-3.jsonl": "{"text":"The heteroatom count of the chemical with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1 is 11."} {"text":"The count of heteroatoms of the molecule with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4 is 5."}", "/scratch/micpie/export/rdkit_features/valid_28-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br?\nAnswer: 48.10"} {"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C?\nAnswer: 59.86"}", "/scratch/micpie/export/rdkit_features/valid_5-20.jsonl": "{"text":"Question: What is the count of basic groups of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C?\nAnswer: 1"} {"text":"Question: What is the count of basic groups of the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+]?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_112-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC?\nAnswer: 66.20"} {"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 67.83"}", "/scratch/micpie/export/rdkit_features/train_31-8.jsonl": "{"text":"The basic group count of the chemical with SMILES Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F is 0."} {"text":"The count of basic groups of the chemical with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4 is 0."}", "/scratch/micpie/export/rdkit_features/test_106-8.jsonl": "{"text":"The count of basic groups of the molecule with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O is 0."} {"text":"The count of basic 
groups of the chemical with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC is 0."}", "/scratch/micpie/export/rdkit_features/valid_110-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O is 0.67."} {"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C is 2.33."}", "/scratch/micpie/export/rdkit_features/valid_118-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O?\nAnswer: 1"} {"text":"Question: What is the number of basic groups of the molecule with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_1-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4?\nAnswer: 2"} {"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_114-1.jsonl": "{"text":"The number of hydrogen bond donors of the compound with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl is 3."} {"text":"The count of hydrogen bond donors of the compound with SMILES C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4 is 3."}", "/scratch/micpie/export/rdkit_features/test_113-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the chemical with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 11"} {"text":"Question: What is the aromatic bond count of the compound with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 18"}", "/scratch/micpie/export/rdkit_features/train_0-23.jsonl": 
"{"text":"User: I want to analyze a molecule with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 8 and a Wildman-Crippen LogP value of 7.56.\nAssistant: Interesting, do you have some additional requirements that I should consider?\nUser: Indeed, I want the molecular formula to be C21H6F2N2O3S4.\nAssistant: Then, I propose the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1."} {"text":"User: I want to analyze a molecule with a count of hydrogen bond donors of 2, a count of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value of 2.54.\nAssistant: Do you have some additional constraints that help me narrow down the search?\nUser: I want the chemical formula to be C25H33N2O3+.\nAssistant: I suggest the molecule with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4."}", "/scratch/micpie/export/rdkit_features/test_23-12.jsonl": "{"text":"Question: What is the molecular formula of the compound with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4?\nAnswer: C19H34N5OS+"} {"text":"Question: What is the molecular formula of the molecule with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4?\nAnswer: C18H19ClN8O"}", "/scratch/micpie/export/rdkit_features/train_104-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 47.35"} {"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl?\nAnswer: 52.99"}", "/scratch/micpie/export/rdkit_features/valid_16-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the compound with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C?\nAnswer: 0"} {"text":"Question: What is the number of aromatic bonds of the molecule with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3?\nAnswer: 0"}", 
"/scratch/micpie/export/rdkit_features/train_120-19.jsonl": "{"text":"Question: What is the count of acid groups of the chemical with SMILES Cc1ccc(cc1)NC(=O)CSc2nnc(n2C)C=NC(=O)CCc3ccccc3Cl?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the compound with SMILES Cc1c(cnn1c2ccccc2)N[C@@H](C)C(=O)Nc3cccc(c3Cl)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_16-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_112-4.jsonl": "{"text":"The count of rings of the molecule with SMILES Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC is 4."} {"text":"The ring count of the compound with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N is 3."}", "/scratch/micpie/export/rdkit_features/train_32-23.jsonl": "{"text":"User: I want to analyze a chemical with a number of hydrogen bond donors of 4, a count of hydrogen bond acceptors of 5 and a Wildman-Crippen LogP value computed using RDKit of 0.62.\nAssistant: That is a very interesting question, do you have some additional constraints that I should consider?\nUser: Yea, I want the chemical formula to be C18H32N5O3S+.\nAssistant: Then, I recommend the chemical with SMILES C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C."} {"text":"User: I want to synthesize a molecule with a count of hydrogen bond donors of 3, a number of hydrogen bond acceptor sites of 4 and a Wildman-Crippen LogP value of 0.85.\nAssistant: Do you have some additional ?\nUser: Yea, I want the formula to be C17H20F3N4O3+.\nAssistant: In that scenario, I suggest the molecule with SMILES C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2."}", 
"/scratch/micpie/export/rdkit_features/valid_19-8.jsonl": "{"text":"The basic group count of the compound with SMILES C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC is 0."} {"text":"The count of basic groups of the compound with SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F is 0."}", "/scratch/micpie/export/rdkit_features/valid_2-7.jsonl": "{"text":"The acid group count of the molecule with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4 is 0."} {"text":"The count of acid groups of the compound with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3 is 0."}", "/scratch/micpie/export/rdkit_features/train_10-11.jsonl": "{"text":"User: I want to make a compound with a chemical formula of C23H42N5O+.\nAssistant: Interesting, do you have some additional constraints that help me narrow down the search?\nUser: Yeah, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 5.\nAssistant: I advise the compound with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4."} {"text":"User: I want to make a compound with a molecular formula of C20H28ClNO2.\nAssistant: Cool, do you have some additional constraints that I should consider?\nUser: I want the number of hydrogen bond donor sites to be 1, the number of hydrogen bond acceptor sites to be 2.\nAssistant: In that case, I advise the compound with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl."}", "/scratch/micpie/export/rdkit_features/test_23-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the compound with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4 is 1."} {"text":"The number of hydrogen bond donors of the chemical with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4 is 2."}", "/scratch/micpie/export/rdkit_features/test_108-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES 
Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5 is 10."} {"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3 is 5."}", "/scratch/micpie/export/rdkit_features/test_106-23.jsonl": "{"text":"User: I want to analyze a molecule with a count of hydrogen bond donors of 2, a count of hydrogen bond acceptors of 6 and a Wildman-Crippen LogP value computed using RDKit of 2.47.\nAssistant: Nice, do you have some additional constraints that help me narrow down the search?\nUser: Yep, I want the chemical formula to be C14H18N4O4.\nAssistant: I suggest the molecule with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O."} {"text":"User: I want to create a molecule with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptor sites of 6 and a LogP value computed using the Wildman-Crippen method of 2.24.\nAssistant: Cool, do you have some additional conditions I should take into account?\nUser: Yea, I want the chemical formula to be C22H27N3O5S.\nAssistant: I recommend the molecule with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC."}", "/scratch/micpie/export/rdkit_features/train_23-4.jsonl": "{"text":"The ring count of the compound with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4 is 4."} {"text":"The number of rings of the molecule with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3 is 3."}", "/scratch/micpie/export/rdkit_features/test_113-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N is 64.11."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 62.40."}", "/scratch/micpie/export/rdkit_features/test_27-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the molecule with SMILES 
Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_6-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the compound with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3 is 3.27."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C is 3.62."}", "/scratch/micpie/export/rdkit_features/valid_103-3.jsonl": "{"text":"The count of heteroatoms of the chemical with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO is 7."} {"text":"The count of heteroatoms of the chemical with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N is 9."}", "/scratch/micpie/export/rdkit_features/valid_16-22.jsonl": "{"text":"User: I want to create a chemical with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptor sites of 2.\nAssistant: Do you have some additional constraints that help me narrow down the search?\nUser: Yeah, I want the count of heteroatoms to be 4.\nAssistant: I advise the chemical with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C."} {"text":"User: I want to make a chemical with a number of hydrogen bond donor sites of 1 and a count of hydrogen bond acceptors of 4.\nAssistant: That is a very interesting question, do you have some additional constraints that help me narrow down the search?\nUser: Indeed, I want the heteroatom count to be 6.\nAssistant: In that scenario, I recommend the chemical with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3."}", "/scratch/micpie/export/rdkit_features/valid_113-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 6."} 
{"text":"The number of aromatic bonds of the chemical with SMILES Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 18."}", "/scratch/micpie/export/rdkit_features/test_20-15.jsonl": "{"text":"Question: What is the heteroatom count of the chemical with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br?\nAnswer: 6"} {"text":"Question: What is the count of heteroatoms of the chemical with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-]?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/train_16-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C is C17H28N3OS+."} {"text":"The chemical formula of the molecule with SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F is C23H28F2N2O3."}", "/scratch/micpie/export/rdkit_features/train_105-0.jsonl": "{"text":"The molecular formula of the chemical with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N is C21H15N3O4."} {"text":"The chemical formula of the compound with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O is C16H21N3O4."}", "/scratch/micpie/export/rdkit_features/valid_29-19.jsonl": "{"text":"Question: What is the count of acid groups of the molecule with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the compound with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_28-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl is 2."} {"text":"The number of rotatable bonds of the compound with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F is 4."}", "/scratch/micpie/export/rdkit_features/test_120-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES 
c1ccc(cc1)CCN(CC(=O)Nc2ccc(cc2)Cl)S(=O)(=O)c3ccc4c(c3)OCCO4?\nAnswer: 68.87"} {"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES Cc1c(cnn1c2ccccc2)N[C@H](C)C(=O)Nc3cccc(c3Cl)Cl?\nAnswer: 55.00"}", "/scratch/micpie/export/rdkit_features/train_115-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F?\nAnswer: 2"} {"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_102-7.jsonl": "{"text":"The count of acid groups of the compound with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3 is 0."} {"text":"The number of acid groups of the molecule with SMILES Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1 is 1."}", "/scratch/micpie/export/rdkit_features/test_1-4.jsonl": "{"text":"The ring count of the molecule with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4 is 4."} {"text":"The number of rings of the chemical with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC is 5."}", "/scratch/micpie/export/rdkit_features/train_33-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the compound with SMILES COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2 is 54.40."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br is 51.69."}", "/scratch/micpie/export/rdkit_features/test_109-3.jsonl": "{"text":"The count of heteroatoms of the compound with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl is 8."} {"text":"The count of heteroatoms of the compound with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N is 10."}", "/scratch/micpie/export/rdkit_features/test_6-17.jsonl": "{"text":"Question: What is 
the rotatable bond count of the molecule with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3?\nAnswer: 3"} {"text":"Question: What is the number of rotatable bonds of the chemical with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_6-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4?\nAnswer: 4"} {"text":"Question: What is the count of rings of the compound with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_22-8.jsonl": "{"text":"The basic group count of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O is 0."} {"text":"The basic group count of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4 is 1."}", "/scratch/micpie/export/rdkit_features/test_21-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-]?\nAnswer: 6"} {"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_12-3.jsonl": "{"text":"The heteroatom count of the chemical with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C is 4."} {"text":"The count of heteroatoms of the compound with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC is 4."}", "/scratch/micpie/export/rdkit_features/train_111-19.jsonl": "{"text":"Question: What is the count of acid groups of the molecule with SMILES CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the compound with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC?\nAnswer: 0"}", 
"/scratch/micpie/export/rdkit_features/valid_26-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the chemical with SMILES CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2?\nAnswer: 6"} {"text":"Question: What is the rotatable bond count of the chemical with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_30-20.jsonl": "{"text":"Question: What is the basic group count of the molecule with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3?\nAnswer: 0"} {"text":"Question: What is the basic group count of the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_108-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2 is 7."} {"text":"The count of hydrogen bond acceptors of the chemical with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3 is 4."}", "/scratch/micpie/export/rdkit_features/train_103-6.jsonl": "{"text":"The count of aromatic bonds of the compound with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1 is 17."} {"text":"The aromatic bond count of the molecule with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 5."}", "/scratch/micpie/export/rdkit_features/train_2-5.jsonl": "{"text":"The rotatable bond count of the chemical with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6 is 5."} {"text":"The number of rotatable bonds of the chemical with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O is 5."}", "/scratch/micpie/export/rdkit_features/valid_104-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3?\nAnswer: 2"} {"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES 
CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_113-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 3"} {"text":"Question: What is the ring count of the molecule with SMILES Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_29-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the compound with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl?\nAnswer: 6"} {"text":"Question: What is the aromatic bond count of the molecule with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/train_0-12.jsonl": "{"text":"Question: What is the formula of the chemical with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1?\nAnswer: C21H6F2N2O3S4"} {"text":"Question: What is the chemical formula of the chemical with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4?\nAnswer: C25H33N2O3+"}", "/scratch/micpie/export/rdkit_features/train_102-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the molecule with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O?\nAnswer: 0"} {"text":"Question: What is the number of aromatic bonds of the molecule with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1?\nAnswer: 17"}", "/scratch/micpie/export/rdkit_features/train_24-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the chemical with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5 is 2."} {"text":"The count of hydrogen bond donors of the chemical with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C is 1."}", "/scratch/micpie/export/rdkit_features/test_113-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the 
molecule with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 9"} {"text":"Question: What is the number of heteroatoms of the compound with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 10"}", "/scratch/micpie/export/rdkit_features/train_2-8.jsonl": "{"text":"The basic group count of the molecule with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6 is 1."} {"text":"The basic group count of the compound with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O is 0."}", "/scratch/micpie/export/rdkit_features/valid_120-0.jsonl": "{"text":"The molecular formula of the compound with SMILES c1cc(oc1)CNC(=O)CSc2nc3ccc(cc3s2)n4c(c5c(c4O)[C@@H]6C[C@@H]5C=C6)O is C23H19N3O4S2."} {"text":"The chemical formula of the compound with SMILES CN(c1nc2cc(ccc2s1)Cl)C(=O)CSc3ccc(c(c3)F)F is C16H11ClF2N2OS2."}", "/scratch/micpie/export/rdkit_features/test_100-22.jsonl": "{"text":"User: I want to design a compound with a count of hydrogen bond donors of 2 and a number of hydrogen bond acceptor sites of 1.\nAssistant: Do you have some additional conditions I should consider?\nUser: Yea, I want the number of heteroatoms to be 5.\nAssistant: In that case, I suggest the compound with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F."} {"text":"User: I want to create a compound with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptors of 3.\nAssistant: That is a very interesting question, do you have some additional limitations I should take into account?\nUser: Yea, I want the count of heteroatoms to be 6.\nAssistant: In that case, I recommend the compound with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br."}", "/scratch/micpie/export/rdkit_features/test_9-23.jsonl": "{"text":"User: I want to create a chemical with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptor sites of 3 and a LogP value computed using the Wildman-Crippen method 
of 2.94.\nAssistant: Cool, do you have some additional constraints I should take into account?\nUser: Yes, I want the chemical formula to be C23H36N3O3+.\nAssistant: I suggest the chemical with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C."} {"text":"User: I want to create a compound with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptor sites of 6 and a LogP value computed using the Wildman-Crippen method of 2.40.\nAssistant: That's interesting, do you have some additional limitations I should consider?\nUser: Yeah, I want the chemical formula to be C23H40N7+.\nAssistant: In that scenario, I recommend the compound with SMILES C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4."}", "/scratch/micpie/export/rdkit_features/train_33-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2?\nAnswer: 6"} {"text":"Question: What is the number of rotatable bonds of the compound with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_116-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1?\nAnswer: 76.74"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl?\nAnswer: 67.26"}", "/scratch/micpie/export/rdkit_features/train_31-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the chemical with SMILES Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F?\nAnswer: 12"} {"text":"Question: What is the aromatic bond count of the compound with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/test_13-14.jsonl": "{"text":"Question: What is the number of hydrogen bond 
acceptors of the molecule with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 2"} {"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_5-11.jsonl": "{"text":"User: I want to design a molecule with a chemical formula of C23H27N2O3S+.\nAssistant: Cool, do you have some additional requirements that help me narrow down the search?\nUser: Yes, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 4.\nAssistant: I advise the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5."} {"text":"User: I want to create a chemical with a chemical formula of C22H22F2N2O3.\nAssistant: Interesting, do you have some additional constraints that I should consider?\nUser: Yea, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptors to be 4.\nAssistant: In that situation, I suggest the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F."}", "/scratch/micpie/export/rdkit_features/train_24-6.jsonl": "{"text":"The aromatic bond count of the chemical with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5 is 22."} {"text":"The number of aromatic bonds of the molecule with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C is 10."}", "/scratch/micpie/export/rdkit_features/train_119-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO is 43.60."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl is 63.14."}", "/scratch/micpie/export/rdkit_features/train_30-11.jsonl": "{"text":"User: I want to create a molecule with a formula of C19H27F3N2O2.\nAssistant: Nice, do you have some additional constraints that help me narrow down the search?\nUser: Yep, I want the number of hydrogen bond 
donor sites to be 1, the number of hydrogen bond acceptors to be 2.\nAssistant: In that scenario, I suggest the molecule with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F."} {"text":"User: I want to design a chemical with a molecular formula of C24H27FN2O3.\nAssistant: Cool, do you have some additional constraints I should consider?\nUser: Yea, I want the count of hydrogen bond donors to be 0, the number of hydrogen bond acceptor sites to be 3.\nAssistant: In that case, I propose the chemical with SMILES CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC."}", "/scratch/micpie/export/rdkit_features/valid_120-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the chemical with SMILES c1cc(oc1)CNC(=O)CSc2nc3ccc(cc3s2)n4c(c5c(c4O)[C@@H]6C[C@@H]5C=C6)O is 8."} {"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES CN(c1nc2cc(ccc2s1)Cl)C(=O)CSc3ccc(c(c3)F)F is 4."}", "/scratch/micpie/export/rdkit_features/valid_24-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl is 51.83."} {"text":"The sum of atomic polarizabilities of the compound with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C is 70.37."}", "/scratch/micpie/export/rdkit_features/valid_27-23.jsonl": "{"text":"User: I want to create a compound with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 5 and a LogP value computed using the Wildman-Crippen method of 4.23.\nAssistant: Nice, do you have some additional constraints that help me narrow down the search?\nUser: Yea, I want the chemical formula to be C14H18F3N5OS.\nAssistant: In that situation, I advise the compound with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F."} {"text":"User: I want to make a molecule with a number of hydrogen bond donors of 2, a count of hydrogen bond acceptors of 1 and a Wildman-Crippen LogP value of 2.54.\nAssistant: Cool, do you have some additional 
limitations I should take into account?\nUser: Yep, I want the chemical formula to be C20H31ClN3O+.\nAssistant: Then, I suggest the molecule with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C."}", "/scratch/micpie/export/rdkit_features/test_22-11.jsonl": "{"text":"User: I want to analyze a molecule with a chemical formula of C21H33N3O3.\nAssistant: Interesting, do you have some additional limitations I should consider?\nUser: Yes, I want the number of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 4.\nAssistant: Then, I recommend the molecule with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O."} {"text":"User: I want to synthesize a molecule with a molecular formula of C21H40N6O+2.\nAssistant: That's interesting, do you have some additional conditions that I should consider?\nUser: Indeed, I want the count of hydrogen bond donors to be 2, the count of hydrogen bond acceptors to be 5.\nAssistant: In that situation, I propose the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4."}", "/scratch/micpie/export/rdkit_features/test_32-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3 is 63.45."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3 is 59.77."}", "/scratch/micpie/export/rdkit_features/train_112-12.jsonl": "{"text":"Question: What is the chemical formula of the compound with SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC?\nAnswer: C22H31N5O4"} {"text":"Question: What is the chemical formula of the chemical with SMILES c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: C21H32N5O3S+"}", "/scratch/micpie/export/rdkit_features/valid_25-5.jsonl": "{"text":"The rotatable bond count of the compound with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4 is 7."} {"text":"The number of 
rotatable bonds of the compound with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3 is 6."}", "/scratch/micpie/export/rdkit_features/valid_20-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3?\nAnswer: 3"} {"text":"Question: What is the count of rings of the molecule with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-]?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_119-23.jsonl": "{"text":"User: I want to analyze a compound with a count of hydrogen bond donors of 2, a count of hydrogen bond acceptors of 6 and a Wildman-Crippen LogP value computed using RDKit of 1.71.\nAssistant: Do you have some additional constraints?\nUser: I want the chemical formula to be C15H16FN5O3.\nAssistant: Then, I suggest the compound with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O."} {"text":"User: I want to create a compound with a number of hydrogen bond donor sites of 1, a count of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value of 3.88.\nAssistant: Cool, do you have some additional requirements I should take into account?\nUser: I want the chemical formula to be C22H29BrN2O4S.\nAssistant: In that situation, I recommend the compound with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C."}", "/scratch/micpie/export/rdkit_features/train_1-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4 is 70.61."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl is 55.14."}", "/scratch/micpie/export/rdkit_features/valid_119-8.jsonl": "{"text":"The number of basic groups of the compound with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O is 0."} {"text":"The basic group count of the chemical with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6 is 0."}", 
"/scratch/micpie/export/rdkit_features/valid_104-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the molecule with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3 is 6."} {"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C is 2."}", "/scratch/micpie/export/rdkit_features/test_1-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the molecule with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4 is 2."} {"text":"The number of hydrogen bond donor sites of the molecule with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC is 0."}", "/scratch/micpie/export/rdkit_features/valid_111-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4 is 7."} {"text":"The count of hydrogen bond acceptors of the chemical with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC is 10."}", "/scratch/micpie/export/rdkit_features/test_105-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N is 46.78."} {"text":"The sum of atomic polarizabilities of the compound with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O is 48.67."}", "/scratch/micpie/export/rdkit_features/valid_102-23.jsonl": "{"text":"User: I want to design a chemical with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value of 4.52.\nAssistant: Do you have some additional conditions I should take into account?\nUser: Yea, I want the formula to be C23H21F2N3O2.\nAssistant: Then, I propose the chemical with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1."} {"text":"User: I want to analyze a chemical with a count of hydrogen bond donors of 4, a number of hydrogen bond acceptor sites of 6 and a LogP value computed using the 
Wildman-Crippen method of 8.36.\nAssistant: Interesting, do you have some additional limitations that I should consider?\nUser: Yep, I want the chemical formula to be C48H76N6O4.\nAssistant: In that scenario, I recommend the chemical with SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2."}", "/scratch/micpie/export/rdkit_features/train_118-0.jsonl": "{"text":"The molecular formula of the chemical with SMILES Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F is C14H23FNO+."} {"text":"The chemical formula of the chemical with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3 is C15H18FN3O5."}", "/scratch/micpie/export/rdkit_features/test_114-7.jsonl": "{"text":"The number of acid groups of the molecule with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C is 0."} {"text":"The count of acid groups of the compound with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5 is 0."}", "/scratch/micpie/export/rdkit_features/valid_11-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_30-19.jsonl": "{"text":"Question: What is the acid group count of the compound with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_18-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O?\nAnswer: 7"} {"text":"Question: What is the rotatable bond count of the molecule 
with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_24-3.jsonl": "{"text":"The number of heteroatoms of the compound with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl is 11."} {"text":"The count of heteroatoms of the molecule with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C is 7."}", "/scratch/micpie/export/rdkit_features/valid_26-8.jsonl": "{"text":"The number of basic groups of the chemical with SMILES CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2 is 0."} {"text":"The count of basic groups of the compound with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC is 0."}", "/scratch/micpie/export/rdkit_features/valid_10-19.jsonl": "{"text":"Question: What is the number of acid groups of the compound with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the molecule with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_13-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 0"} {"text":"Question: What is the basic group count of the compound with SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_32-1.jsonl": "{"text":"The number of hydrogen bond donors of the molecule with SMILES C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C is 4."} {"text":"The number of hydrogen bond donor sites of the chemical with SMILES C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2 is 3."}", "/scratch/micpie/export/rdkit_features/test_107-5.jsonl": "{"text":"The rotatable bond count of the compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC is 8."} {"text":"The rotatable bond count of the molecule with SMILES 
C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC is 6."}", "/scratch/micpie/export/rdkit_features/test_32-19.jsonl": "{"text":"Question: What is the acid group count of the compound with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3?\nAnswer: 2"} {"text":"Question: What is the acid group count of the molecule with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_14-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C is 3."} {"text":"The count of rotatable bonds of the compound with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O is 5."}", "/scratch/micpie/export/rdkit_features/train_10-16.jsonl": "{"text":"Question: What is the number of rings of the compound with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4?\nAnswer: 4"} {"text":"Question: What is the count of rings of the compound with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_24-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl is 6."} {"text":"The number of hydrogen bond acceptors of the chemical with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C is 5."}", "/scratch/micpie/export/rdkit_features/train_6-4.jsonl": "{"text":"The number of rings of the compound with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4 is 4."} {"text":"The number of rings of the molecule with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C is 4."}", "/scratch/micpie/export/rdkit_features/test_100-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F is 1.37."} {"text":"The LogP value computed using 
the Wildman-Crippen method of the chemical with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br is 2.67."}", "/scratch/micpie/export/rdkit_features/valid_22-11.jsonl": "{"text":"User: I want to create a molecule with a formula of C17H24ClN3O3S.\nAssistant: Interesting, do you have some additional conditions that I should consider?\nUser: Yeah, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 5.\nAssistant: Then, I advise the molecule with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O."} {"text":"User: I want to design a compound with a molecular formula of C20H33N6O2+.\nAssistant: That is a very interesting question, do you have some additional constraints I should take into account?\nUser: Yeah, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 7.\nAssistant: In that case, I propose the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4."}", "/scratch/micpie/export/rdkit_features/valid_22-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O is 2.41."} {"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4 is 0.74."}", "/scratch/micpie/export/rdkit_features/test_3-8.jsonl": "{"text":"The basic group count of the molecule with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC is 0."} {"text":"The count of basic groups of the compound with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F is 0."}", "/scratch/micpie/export/rdkit_features/train_24-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5 is 55.17."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES 
CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C is 63.41."}", "/scratch/micpie/export/rdkit_features/train_11-11.jsonl": "{"text":"User: I want to make a compound with a molecular formula of C20H28ClNO2.\nAssistant: Do you have some additional constraints I should take into account?\nUser: I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 2.\nAssistant: In that situation, I advise the compound with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl."} {"text":"User: I want to synthesize a molecule with a molecular formula of C17H21Cl2NO.\nAssistant: That's interesting, do you have some additional requirements?\nUser: I want the count of hydrogen bond donors to be 0, the count of hydrogen bond acceptors to be 1.\nAssistant: In that scenario, I suggest the molecule with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C."}", "/scratch/micpie/export/rdkit_features/test_1-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4?\nAnswer: C23H25ClN3O2+"} {"text":"Question: What is the molecular formula of the compound with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC?\nAnswer: C23H23N5O3"}", "/scratch/micpie/export/rdkit_features/valid_3-11.jsonl": "{"text":"User: I want to create a compound with a molecular formula of C22H23N5O2S.\nAssistant: That is a very interesting question, do you have some additional that I should consider?\nUser: Indeed, I want the number of hydrogen bond donors to be 3, the number of hydrogen bond acceptor sites to be 5.\nAssistant: In that case, I advise the compound with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N."} {"text":"User: I want to synthesize a molecule with a molecular formula of C20H22ClN3O4.\nAssistant: Do you have some additional constraints I should take into account?\nUser: Yep, I want the number of hydrogen bond donors to be 1, 
the number of hydrogen bond acceptor sites to be 5.\nAssistant: In that case, I recommend the molecule with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC."}", "/scratch/micpie/export/rdkit_features/valid_9-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2 is C21H31F3N2O2."} {"text":"The formula of the compound with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4 is C23H39N5O2."}", "/scratch/micpie/export/rdkit_features/valid_31-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC?\nAnswer: 0"} {"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_119-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the molecule with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO?\nAnswer: 6"} {"text":"Question: What is the rotatable bond count of the molecule with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_115-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the molecule with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_11-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the molecule with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl?\nAnswer: 3"} {"text":"Question: What is the count of rotatable bonds of the compound with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C?\nAnswer: 3"}", 
"/scratch/micpie/export/rdkit_features/train_113-4.jsonl": "{"text":"The ring count of the compound with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 3."} {"text":"The count of rings of the chemical with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 3."}", "/scratch/micpie/export/rdkit_features/test_104-4.jsonl": "{"text":"The number of rings of the molecule with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2 is 2."} {"text":"The count of rings of the chemical with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3 is 3."}", "/scratch/micpie/export/rdkit_features/train_119-8.jsonl": "{"text":"The count of basic groups of the molecule with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO is 0."} {"text":"The basic group count of the compound with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl is 0."}", "/scratch/micpie/export/rdkit_features/train_5-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C is 4."} {"text":"The count of hydrogen bond acceptors of the compound with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C is 4."}", "/scratch/micpie/export/rdkit_features/valid_104-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3 is 47.35."} {"text":"The sum of atomic polarizabilities of the compound with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C is 64.53."}", "/scratch/micpie/export/rdkit_features/valid_19-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC?\nAnswer: 67.66"} {"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F?\nAnswer: 56.11"}", 
"/scratch/micpie/export/rdkit_features/train_101-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_116-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the molecule with SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1 is 4."} {"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl is 4."}", "/scratch/micpie/export/rdkit_features/valid_2-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4?\nAnswer: 4"} {"text":"Question: What is the number of rings of the compound with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_1-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5 is 3.79."} {"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F is 3.68."}", "/scratch/micpie/export/rdkit_features/test_120-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES c1ccc(cc1)CCN(CC(=O)Nc2ccc(cc2)Cl)S(=O)(=O)c3ccc4c(c3)OCCO4 is 68.87."} {"text":"The sum of atomic polarizabilities of the compound with SMILES Cc1c(cnn1c2ccccc2)N[C@H](C)C(=O)Nc3cccc(c3Cl)Cl is 55.00."}", "/scratch/micpie/export/rdkit_features/train_28-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the chemical with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl?\nAnswer: 6"} {"text":"Question: What is 
the number of heteroatoms of the chemical with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_101-8.jsonl": "{"text":"The number of basic groups of the chemical with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3 is 0."} {"text":"The basic group count of the compound with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1 is 0."}", "/scratch/micpie/export/rdkit_features/valid_113-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: C21H32N5O3S+"} {"text":"Question: What is the chemical formula of the chemical with SMILES Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: C21H19Cl2N3O2S2"}", "/scratch/micpie/export/rdkit_features/train_119-6.jsonl": "{"text":"The count of aromatic bonds of the chemical with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO is 12."} {"text":"The number of aromatic bonds of the compound with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl is 11."}", "/scratch/micpie/export/rdkit_features/train_23-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4?\nAnswer: 7"} {"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_25-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C?\nAnswer: 8"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_8-8.jsonl": "{"text":"The number of basic groups of the molecule with SMILES 
C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C is 0."} {"text":"The count of basic groups of the compound with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br is 0."}", "/scratch/micpie/export/rdkit_features/test_115-3.jsonl": "{"text":"The number of heteroatoms of the chemical with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F is 8."} {"text":"The heteroatom count of the chemical with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 6."}", "/scratch/micpie/export/rdkit_features/test_117-22.jsonl": "{"text":"User: I want to design a molecule with a number of hydrogen bond donors of 0 and a number of hydrogen bond acceptors of 3.\nAssistant: That is a very interesting question, do you have some additional that I should consider?\nUser: Yep, I want the heteroatom count to be 6.\nAssistant: In that case, I the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc5cc(ccc5c4)Br."} {"text":"User: I want to synthesize a molecule with a number of hydrogen bond donors of 2 and a number of hydrogen bond acceptor sites of 2.\nAssistant: That's interesting, do you have some additional conditions I should take into account?\nUser: Indeed, I want the heteroatom count to be 3.\nAssistant: In that situation, I the molecule with SMILES C[C@H](C(C)(C)OC)[NH2+]CCNc1ccccc1."}", "/scratch/micpie/export/rdkit_features/train_1-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4 is 12."} {"text":"The aromatic bond count of the compound with SMILES c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl is 6."}", "/scratch/micpie/export/rdkit_features/train_11-23.jsonl": "{"text":"User: I want to make a compound with a number of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 2 and a Wildman-Crippen LogP value of 4.69.\nAssistant: That's interesting, do you have some additional constraints I should take into 
account?\nUser: Yes, I want the chemical formula to be C20H28ClNO2.\nAssistant: In that scenario, I the compound with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl."} {"text":"User: I want to create a compound with a number of hydrogen bond donors of 0, a number of hydrogen bond acceptor sites of 1 and a Wildman-Crippen LogP value computed using RDKit of 4.74.\nAssistant: That is a very interesting question, do you have some additional I should consider?\nUser: Indeed, I want the chemical formula to be C17H21Cl2NO.\nAssistant: In that situation, I recommend the compound with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C."}", "/scratch/micpie/export/rdkit_features/train_115-8.jsonl": "{"text":"The number of basic groups of the compound with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F is 0."} {"text":"The number of basic groups of the molecule with SMILES c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 0."}", "/scratch/micpie/export/rdkit_features/train_26-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F?\nAnswer: 42.45"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C?\nAnswer: 62.83"}", "/scratch/micpie/export/rdkit_features/valid_102-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1 is 3."} {"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2 is 6."}", "/scratch/micpie/export/rdkit_features/valid_31-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the compound with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC?\nAnswer: 12"} {"text":"Question: 
What is the number of aromatic bonds of the molecule with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_111-19.jsonl": "{"text":"Question: What is the count of acid groups of the chemical with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4?\nAnswer: 0"} {"text":"Question: What is the acid group count of the compound with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_105-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the molecule with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N?\nAnswer: 18"} {"text":"Question: What is the number of aromatic bonds of the compound with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_8-1.jsonl": "{"text":"The number of hydrogen bond donors of the molecule with SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N is 2."} {"text":"The number of hydrogen bond donor sites of the molecule with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO is 2."}", "/scratch/micpie/export/rdkit_features/train_8-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N is C20H13F2N3O5."} {"text":"The molecular formula of the compound with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO is C23H32NO4S+."}", "/scratch/micpie/export/rdkit_features/valid_30-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3?\nAnswer: 3"} {"text":"Question: What is the count of rings of the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_108-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the molecule with 
SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O?\nAnswer: 12"} {"text":"Question: What is the count of heteroatoms of the chemical with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/train_26-1.jsonl": "{"text":"The number of hydrogen bond donors of the chemical with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F is 0."} {"text":"The number of hydrogen bond donor sites of the chemical with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C is 1."}", "/scratch/micpie/export/rdkit_features/test_3-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC?\nAnswer: 65.06"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F?\nAnswer: 56.46"}", "/scratch/micpie/export/rdkit_features/valid_117-11.jsonl": "{"text":"User: I want to analyze a compound with a molecular formula of C27H27ClN2O4.\nAssistant: That's interesting, do you have some additional limitations I should take into account?\nUser: Indeed, I want the count of hydrogen bond donors to be 0, the number of hydrogen bond acceptor sites to be 4.\nAssistant: I suggest the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4OC)Cl."} {"text":"User: I want to create a chemical with a chemical formula of C14H21FNO+.\nAssistant: Do you have some additional limitations I should take into account?\nUser: Yeah, I want the count of hydrogen bond donors to be 2, the count of hydrogen bond acceptors to be 1.\nAssistant: In that situation, I recommend the chemical with SMILES Cc1cc(ccc1[C@@H](C)[NH2+]CC2(CC2)CO)F."}", "/scratch/micpie/export/rdkit_features/test_10-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the molecule with SMILES 
C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4 is 1."} {"text":"The number of hydrogen bond donors of the molecule with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F is 0."}", "/scratch/micpie/export/rdkit_features/test_3-18.jsonl": "{"text":"Question: What is the aromatic bond count of the chemical with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC?\nAnswer: 17"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F?\nAnswer: 16"}", "/scratch/micpie/export/rdkit_features/train_109-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the molecule with SMILES Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3?\nAnswer: 8"} {"text":"Question: What is the count of heteroatoms of the chemical with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/valid_8-18.jsonl": "{"text":"Question: What is the aromatic bond count of the chemical with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C?\nAnswer: 6"} {"text":"Question: What is the number of aromatic bonds of the chemical with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_13-4.jsonl": "{"text":"The ring count of the compound with SMILES CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC is 2."} {"text":"The ring count of the chemical with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C is 2."}", "/scratch/micpie/export/rdkit_features/train_3-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4?\nAnswer: 1"}", 
"/scratch/micpie/export/rdkit_features/test_112-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC?\nAnswer: 7"} {"text":"Question: What is the number of rotatable bonds of the compound with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_20-0.jsonl": "{"text":"The molecular formula of the compound with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br is C19H24BrFN2O2."} {"text":"The formula of the molecule with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-] is C17H15ClNO6S-."}", "/scratch/micpie/export/rdkit_features/test_11-19.jsonl": "{"text":"Question: What is the count of acid groups of the compound with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the compound with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_5-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5 is 1."} {"text":"The number of hydrogen bond donor sites of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F is 2."}", "/scratch/micpie/export/rdkit_features/train_18-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_20-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the chemical with SMILES 
Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3 is 3."} {"text":"The number of hydrogen bond acceptors of the chemical with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC is 6."}", "/scratch/micpie/export/rdkit_features/train_113-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_114-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the chemical with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl?\nAnswer: 5"} {"text":"Question: What is the number of rotatable bonds of the compound with SMILES C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/valid_11-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_13-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4 is 17."} {"text":"The number of aromatic bonds of the chemical with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C is 22."}", "/scratch/micpie/export/rdkit_features/test_5-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5?\nAnswer: C23H27N2O3S+"} {"text":"Question: What is the chemical formula of the chemical with SMILES 
c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F?\nAnswer: C22H22F2N2O3"}", "/scratch/micpie/export/rdkit_features/valid_24-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl is 1.94."} {"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C is 0.99."}", "/scratch/micpie/export/rdkit_features/train_6-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4?\nAnswer: 61.71"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C?\nAnswer: 70.01"}", "/scratch/micpie/export/rdkit_features/train_100-23.jsonl": "{"text":"User: I want to make a molecule with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 2 and a LogP value computed using the Wildman-Crippen method of 1.13.\nAssistant: Interesting, do you have some additional constraints that help me narrow down the search?\nUser: Indeed, I want the chemical formula to be C19H33N2O2+.\nAssistant: In that case, I suggest the molecule with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C."} {"text":"User: I want to make a chemical with a count of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 5 and a Wildman-Crippen LogP value of 2.93.\nAssistant: That is a very interesting question, do you have some additional conditions that I should consider?\nUser: I want the formula to be C17H15N3O3.\nAssistant: In that case, I propose the chemical with SMILES COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3."}", "/scratch/micpie/export/rdkit_features/valid_102-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES 
CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_118-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O?\nAnswer: C14H23FNO+"} {"text":"Question: What is the chemical formula of the compound with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N?\nAnswer: C14H12FN5O5"}", "/scratch/micpie/export/rdkit_features/valid_103-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the compound with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO is 1.51."} {"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N is -1.16."}", "/scratch/micpie/export/rdkit_features/valid_15-7.jsonl": "{"text":"The count of acid groups of the compound with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N is 0."} {"text":"The count of acid groups of the molecule with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C is 0."}", "/scratch/micpie/export/rdkit_features/test_114-19.jsonl": "{"text":"Question: What is the count of acid groups of the compound with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the molecule with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_19-6.jsonl": "{"text":"The aromatic bond count of the molecule with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl is 11."} {"text":"The count of aromatic bonds of the molecule with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4 is 5."}", 
"/scratch/micpie/export/rdkit_features/valid_33-19.jsonl": "{"text":"Question: What is the acid group count of the chemical with SMILES c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6?\nAnswer: 0"} {"text":"Question: What is the acid group count of the chemical with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_18-19.jsonl": "{"text":"Question: What is the number of acid groups of the compound with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the chemical with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_22-23.jsonl": "{"text":"User: I want to design a chemical with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 4 and a Wildman-Crippen LogP value computed using RDKit of 2.49.\nAssistant: Interesting, do you have some additional I should take into account?\nUser: Yea, I want the chemical formula to be C19H25ClFN3O3.\nAssistant: I propose the chemical with SMILES C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O."} {"text":"User: I want to make a molecule with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptors of 7 and a Wildman-Crippen LogP value of 0.74.\nAssistant: That's interesting, do you have some additional constraints I should consider?\nUser: Yes, I want the formula to be C20H33N6O2+.\nAssistant: Then, I the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4."}", "/scratch/micpie/export/rdkit_features/test_4-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES 
Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_109-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl?\nAnswer: C22H26ClN3O4"} {"text":"Question: What is the chemical formula of the chemical with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N?\nAnswer: C16H13BrN4O4S"}", "/scratch/micpie/export/rdkit_features/test_110-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3 is 9."} {"text":"The rotatable bond count of the molecule with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4 is 7."}", "/scratch/micpie/export/rdkit_features/valid_26-23.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donors of 0, a number of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value computed using RDKit of 3.59.\nAssistant: Nice, do you have some additional constraints?\nUser: I want the formula to be C18H24F3NO3.\nAssistant: I the chemical with SMILES CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2."} {"text":"User: I want to create a molecule with a number of hydrogen bond donors of 3, a number of hydrogen bond acceptor sites of 6 and a Wildman-Crippen LogP value of 3.65.\nAssistant: Cool, do you have some additional requirements that help me narrow down the search?\nUser: Yep, I want the formula to be C19H20N4O4.\nAssistant: Then, I propose the molecule with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC."}", "/scratch/micpie/export/rdkit_features/valid_13-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4 is C21H20FN3O."} {"text":"The chemical formula of the compound with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C is C21H20N4."}", "/scratch/micpie/export/rdkit_features/valid_8-4.jsonl": 
"{"text":"The number of rings of the compound with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C is 2."} {"text":"The ring count of the chemical with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br is 2."}", "/scratch/micpie/export/rdkit_features/valid_107-23.jsonl": "{"text":"User: I want to make a compound with a number of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 5 and a Wildman-Crippen LogP value computed using RDKit of 2.20.\nAssistant: That is a very interesting question, do you have some additional conditions I should take into account?\nUser: Indeed, I want the chemical formula to be C24H27N5O3.\nAssistant: In that case, I propose the compound with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C."} {"text":"User: I want to analyze a compound with a number of hydrogen bond donors of 3, a number of hydrogen bond acceptors of 6 and a Wildman-Crippen LogP value computed using RDKit of 2.25.\nAssistant: That's interesting, do you have some additional conditions that help me narrow down the search?\nUser: Yep, I want the molecular formula to be C23H27N3O6.\nAssistant: I propose the compound with SMILES CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C."}", "/scratch/micpie/export/rdkit_features/train_4-1.jsonl": "{"text":"The number of hydrogen bond donors of the compound with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4 is 1."} {"text":"The number of hydrogen bond donors of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C is 2."}", "/scratch/micpie/export/rdkit_features/valid_111-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the chemical with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4 is 1.02."} {"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC is 2.11."}", 
"/scratch/micpie/export/rdkit_features/valid_24-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl?\nAnswer: 5"} {"text":"Question: What is the rotatable bond count of the molecule with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/train_113-23.jsonl": "{"text":"User: I want to make a compound with a number of hydrogen bond donor sites of 3, a number of hydrogen bond acceptor sites of 5 and a LogP value computed using the Wildman-Crippen method of 0.24.\nAssistant: Do you have some additional conditions?\nUser: Yea, I want the molecular formula to be C21H32N5O3S+.\nAssistant: In that situation, I recommend the compound with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N."} {"text":"User: I want to synthesize a chemical with a number of hydrogen bond donor sites of 3, a count of hydrogen bond acceptors of 3 and a LogP value computed using the Wildman-Crippen method of 6.22.\nAssistant: Interesting, do you have some additional limitations that I should consider?\nUser: Yeah, I want the chemical formula to be C21H19Cl2N3O2S2.\nAssistant: In that scenario, I the chemical with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl."}", "/scratch/micpie/export/rdkit_features/valid_2-8.jsonl": "{"text":"The basic group count of the chemical with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4 is 0."} {"text":"The basic group count of the compound with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3 is 0."}", "/scratch/micpie/export/rdkit_features/train_11-16.jsonl": "{"text":"Question: What is the ring count of the compound with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl?\nAnswer: 3"} {"text":"Question: What is the number of rings of the chemical with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C?\nAnswer: 3"}", 
"/scratch/micpie/export/rdkit_features/test_117-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc5cc(ccc5c4)Br is 3."} {"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES C[C@H](C(C)(C)OC)[NH2+]CCNc1ccccc1 is 2."}", "/scratch/micpie/export/rdkit_features/train_9-3.jsonl": "{"text":"The count of heteroatoms of the compound with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C is 6."} {"text":"The count of heteroatoms of the compound with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F is 7."}", "/scratch/micpie/export/rdkit_features/train_15-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the molecule with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC?\nAnswer: 6"} {"text":"Question: What is the count of rotatable bonds of the molecule with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_32-23.jsonl": "{"text":"User: I want to synthesize a chemical with a count of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 6 and a Wildman-Crippen LogP value of 0.85.\nAssistant: That is a very interesting question, do you have some additional constraints that I should consider?\nUser: Indeed, I want the chemical formula to be C19H30N5O2S+.\nAssistant: Then, I propose the chemical with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3."} {"text":"User: I want to create a chemical with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 5 and a LogP value computed using the Wildman-Crippen method of 2.09.\nAssistant: Do you have some additional conditions I should take into account?\nUser: I want the molecular formula to be C16H18Cl2N2O5.\nAssistant: Then, I propose the chemical with SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O."}", 
"/scratch/micpie/export/rdkit_features/test_4-7.jsonl": "{"text":"The acid group count of the chemical with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4 is 0."} {"text":"The number of acid groups of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3 is 0."}", "/scratch/micpie/export/rdkit_features/test_100-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the molecule with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F?\nAnswer: 5"} {"text":"Question: What is the number of aromatic bonds of the molecule with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/valid_2-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4?\nAnswer: 62.67"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3?\nAnswer: 64.53"}", "/scratch/micpie/export/rdkit_features/train_22-4.jsonl": "{"text":"The count of rings of the molecule with SMILES C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O is 3."} {"text":"The number of rings of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4 is 4."}", "/scratch/micpie/export/rdkit_features/test_101-11.jsonl": "{"text":"User: I want to create a compound with a molecular formula of C15H14F2N2O3.\nAssistant: Interesting, do you have some additional I should consider?\nUser: Indeed, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 4.\nAssistant: I suggest the compound with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F."} {"text":"User: I want to synthesize a compound with a chemical formula of C26H20N6OS.\nAssistant: Interesting, do you have some additional requirements I should take into account?\nUser: Yea, I want the number of hydrogen bond 
donor sites to be 2, the number of hydrogen bond acceptors to be 6.\nAssistant: In that scenario, I the compound with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1."}", "/scratch/micpie/export/rdkit_features/valid_111-20.jsonl": "{"text":"Question: What is the basic group count of the molecule with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4?\nAnswer: 1"} {"text":"Question: What is the basic group count of the molecule with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_26-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the chemical with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F is 2."} {"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C is 3."}", "/scratch/micpie/export/rdkit_features/test_103-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1?\nAnswer: 53.19"} {"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 47.35"}", "/scratch/micpie/export/rdkit_features/train_113-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 69.10"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 64.69"}", "/scratch/micpie/export/rdkit_features/test_110-0.jsonl": "{"text":"The molecular formula of the compound with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3 is C23H30N3O4S+."} {"text":"The chemical formula of the molecule with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4 is C21H31N5O3S."}", 
"/scratch/micpie/export/rdkit_features/test_3-3.jsonl": "{"text":"The number of heteroatoms of the chemical with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC is 9."} {"text":"The number of heteroatoms of the compound with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F is 9."}", "/scratch/micpie/export/rdkit_features/train_31-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F is 3.73."} {"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4 is 2.47."}", "/scratch/micpie/export/rdkit_features/test_32-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3?\nAnswer: 63.45"} {"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3?\nAnswer: 59.77"}", "/scratch/micpie/export/rdkit_features/valid_12-11.jsonl": "{"text":"User: I want to design a molecule with a chemical formula of C20H26ClNO2.\nAssistant: That is a very interesting question, do you have some additional conditions I should consider?\nUser: Yep, I want the number of hydrogen bond donors to be 0, the count of hydrogen bond acceptors to be 2.\nAssistant: I advise the molecule with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl."} {"text":"User: I want to make a molecule with a chemical formula of C19H33FN4.\nAssistant: Do you have some additional limitations I should consider?\nUser: Yes, I want the count of hydrogen bond donors to be 0, the number of hydrogen bond acceptors to be 4.\nAssistant: Then, I advise the molecule with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3."}", "/scratch/micpie/export/rdkit_features/test_101-10.jsonl": "{"text":"The 
Wildman-Crippen LogP value computed using RDKit of the compound with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F is 2.87."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1 is 4.81."}", "/scratch/micpie/export/rdkit_features/train_25-0.jsonl": "{"text":"The chemical formula of the compound with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C is C19H31N7O2."} {"text":"The chemical formula of the chemical with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F is C15H15F6NO2."}", "/scratch/micpie/export/rdkit_features/test_33-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/valid_32-11.jsonl": "{"text":"User: I want to make a chemical with a chemical formula of C19H30N5O2S+.\nAssistant: Cool, do you have some additional conditions I should consider?\nUser: Yep, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 6.\nAssistant: In that scenario, I recommend the chemical with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3."} {"text":"User: I want to synthesize a chemical with a chemical formula of C16H18Cl2N2O5.\nAssistant: Interesting, do you have some additional requirements I should consider?\nUser: I want the number of hydrogen bond donor sites to be 2, the count of hydrogen bond acceptors to be 5.\nAssistant: In that situation, I suggest the chemical with SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O."}", "/scratch/micpie/export/rdkit_features/valid_19-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES 
C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC is 67.66."} {"text":"The sum of atomic polarizabilities of the compound with SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F is 56.11."}", "/scratch/micpie/export/rdkit_features/train_108-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_106-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O?\nAnswer: 0"} {"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_20-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3?\nAnswer: 71.71"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC?\nAnswer: 49.29"}", "/scratch/micpie/export/rdkit_features/train_3-4.jsonl": "{"text":"The number of rings of the molecule with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O is 5."} {"text":"The number of rings of the chemical with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4 is 4."}", "/scratch/micpie/export/rdkit_features/valid_107-12.jsonl": "{"text":"Question: What is the formula of the chemical with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C?\nAnswer: C24H27N5O3"} {"text":"Question: What is the chemical formula of the molecule with SMILES 
CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C?\nAnswer: C23H27N3O6"}", "/scratch/micpie/export/rdkit_features/test_12-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C is 51.24."} {"text":"The sum of atomic polarizabilities of the compound with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4 is 59.86."}", "/scratch/micpie/export/rdkit_features/test_32-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the molecule with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3?\nAnswer: 11"} {"text":"Question: What is the aromatic bond count of the chemical with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/test_28-12.jsonl": "{"text":"Question: What is the formula of the compound with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC?\nAnswer: C17H19N4OS2+"} {"text":"Question: What is the chemical formula of the molecule with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C?\nAnswer: C20H25N5O"}", "/scratch/micpie/export/rdkit_features/valid_103-4.jsonl": "{"text":"The count of rings of the chemical with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO is 2."} {"text":"The count of rings of the molecule with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N is 3."}", "/scratch/micpie/export/rdkit_features/valid_1-3.jsonl": "{"text":"The heteroatom count of the compound with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5 is 9."} {"text":"The number of heteroatoms of the compound with SMILES Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F is 8."}", "/scratch/micpie/export/rdkit_features/train_8-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N is 5."} {"text":"The count of rotatable bonds of the chemical with SMILES 
COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO is 10."}", "/scratch/micpie/export/rdkit_features/train_9-11.jsonl": "{"text":"User: I want to create a compound with a chemical formula of C25H32N3O3+.\nAssistant: That is a very interesting question, do you have some additional requirements that help me narrow down the search?\nUser: Indeed, I want the number of hydrogen bond donors to be 3, the count of hydrogen bond acceptors to be 3.\nAssistant: In that scenario, I the compound with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C."} {"text":"User: I want to create a molecule with a chemical formula of C22H32FN5O.\nAssistant: That is a very interesting question, do you have some additional limitations I should take into account?\nUser: I want the count of hydrogen bond donors to be 0, the number of hydrogen bond acceptors to be 5.\nAssistant: In that situation, I advise the molecule with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F."}", "/scratch/micpie/export/rdkit_features/train_116-19.jsonl": "{"text":"Question: What is the count of acid groups of the molecule with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 0"} {"text":"Question: What is the acid group count of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_111-4.jsonl": "{"text":"The number of rings of the molecule with SMILES CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C is 4."} {"text":"The number of rings of the compound with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC is 4."}", "/scratch/micpie/export/rdkit_features/valid_120-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the molecule with SMILES c1cc(oc1)CNC(=O)CSc2nc3ccc(cc3s2)n4c(c5c(c4O)[C@@H]6C[C@@H]5C=C6)O is 3."} {"text":"The number of hydrogen bond donors of the compound with SMILES CN(c1nc2cc(ccc2s1)Cl)C(=O)CSc3ccc(c(c3)F)F is 
0."}", "/scratch/micpie/export/rdkit_features/train_109-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the molecule with SMILES Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3 is 2."} {"text":"The number of hydrogen bond donor sites of the compound with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C is 3."}", "/scratch/micpie/export/rdkit_features/valid_22-6.jsonl": "{"text":"The aromatic bond count of the chemical with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O is 5."} {"text":"The aromatic bond count of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4 is 10."}", "/scratch/micpie/export/rdkit_features/valid_109-12.jsonl": "{"text":"Question: What is the formula of the compound with SMILES C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl?\nAnswer: C22H26ClN3O4"} {"text":"Question: What is the formula of the molecule with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F?\nAnswer: C21H19FN4O3S"}", "/scratch/micpie/export/rdkit_features/valid_30-23.jsonl": "{"text":"User: I want to make a chemical with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value of 3.66.\nAssistant: Interesting, do you have some additional ?\nUser: Yeah, I want the formula to be C21H34N4O.\nAssistant: In that situation, I suggest the chemical with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3."} {"text":"User: I want to design a molecule with a number of hydrogen bond donors of 0, a count of hydrogen bond acceptors of 2 and a Wildman-Crippen LogP value of 3.89.\nAssistant: Cool, do you have some additional constraints I should take into account?\nUser: Indeed, I want the chemical formula to be C24H24F2N2O2.\nAssistant: In that scenario, I advise the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F."}", "/scratch/micpie/export/rdkit_features/test_9-4.jsonl": 
"{"text":"The number of rings of the molecule with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C is 3."} {"text":"The number of rings of the compound with SMILES C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4 is 4."}", "/scratch/micpie/export/rdkit_features/valid_115-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4 is 8."} {"text":"The rotatable bond count of the compound with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F is 4."}", "/scratch/micpie/export/rdkit_features/test_2-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C?\nAnswer: C20H27ClFN4O2+"} {"text":"Question: What is the chemical formula of the chemical with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4?\nAnswer: C24H25N3O3"}", "/scratch/micpie/export/rdkit_features/train_118-22.jsonl": "{"text":"User: I want to synthesize a compound with a count of hydrogen bond donors of 2 and a number of hydrogen bond acceptor sites of 1.\nAssistant: That is a very interesting question, do you have some additional conditions that help me narrow down the search?\nUser: Indeed, I want the count of heteroatoms to be 3.\nAssistant: I propose the compound with SMILES Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F."} {"text":"User: I want to analyze a chemical with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptor sites of 6.\nAssistant: Interesting, do you have some additional limitations that help me narrow down the search?\nUser: Yeah, I want the count of heteroatoms to be 9.\nAssistant: In that scenario, I suggest the chemical with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3."}", "/scratch/micpie/export/rdkit_features/test_0-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the 
chemical with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_115-3.jsonl": "{"text":"The heteroatom count of the chemical with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4 is 9."} {"text":"The number of heteroatoms of the chemical with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F is 9."}", "/scratch/micpie/export/rdkit_features/test_30-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3 is C21H25N3O2."} {"text":"The molecular formula of the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC is C24H27FN2O3."}", "/scratch/micpie/export/rdkit_features/train_107-23.jsonl": "{"text":"User: I want to analyze a compound with a number of hydrogen bond donors of 3, a number of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value of 0.61.\nAssistant: That is a very interesting question, do you have some additional conditions I should consider?\nUser: Yeah, I want the chemical formula to be C21H30ClN6O3+.\nAssistant: In that scenario, I propose the compound with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl."} {"text":"User: I want to synthesize a chemical with a number of hydrogen bond donors of 2, a number of hydrogen bond acceptor sites of 10 and a LogP value computed using the Wildman-Crippen method of 2.55.\nAssistant: Cool, do you have some additional conditions that help me narrow down the search?\nUser: Yeah, I want the chemical formula to be C21H19N9OS.\nAssistant: In that situation, I recommend the chemical with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5."}", "/scratch/micpie/export/rdkit_features/test_111-14.jsonl": 
"{"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4?\nAnswer: 7"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/train_8-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N is 52.29."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO is 69.03."}", "/scratch/micpie/export/rdkit_features/train_27-4.jsonl": "{"text":"The count of rings of the compound with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C is 3."} {"text":"The number of rings of the compound with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4 is 4."}", "/scratch/micpie/export/rdkit_features/train_13-1.jsonl": "{"text":"The count of hydrogen bond donors of the chemical with SMILES CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC is 1."} {"text":"The number of hydrogen bond donors of the compound with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C is 2."}", "/scratch/micpie/export/rdkit_features/train_31-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the molecule with SMILES Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F?\nAnswer: 6"} {"text":"Question: What is the number of heteroatoms of the compound with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_19-17.jsonl": "{"text":"Question: What is the rotatable bond count of the chemical with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl?\nAnswer: 8"} {"text":"Question: What is the rotatable bond count of the molecule with SMILES 
CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_108-8.jsonl": "{"text":"The number of basic groups of the chemical with SMILES CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2 is 0."} {"text":"The basic group count of the molecule with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3 is 0."}", "/scratch/micpie/export/rdkit_features/train_23-22.jsonl": "{"text":"User: I want to analyze a chemical with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 7.\nAssistant: Interesting, do you have some additional I should consider?\nUser: Yea, I want the number of heteroatoms to be 8.\nAssistant: I suggest the chemical with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4."} {"text":"User: I want to design a chemical with a number of hydrogen bond donors of 2 and a number of hydrogen bond acceptors of 4.\nAssistant: That's interesting, do you have some additional conditions that I should consider?\nUser: Yep, I want the number of heteroatoms to be 8.\nAssistant: In that case, I propose the chemical with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3."}", "/scratch/micpie/export/rdkit_features/train_110-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O?\nAnswer: 7"} {"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/test_19-1.jsonl": "{"text":"The number of hydrogen bond donors of the molecule with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl is 2."} {"text":"The number of hydrogen bond donor sites of the molecule with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4 is 2."}", 
"/scratch/micpie/export/rdkit_features/test_100-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F?\nAnswer: 5"} {"text":"Question: What is the number of rotatable bonds of the chemical with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_105-22.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 5.\nAssistant: That is a very interesting question, do you have some additional that help me narrow down the search?\nUser: Yes, I want the count of heteroatoms to be 7.\nAssistant: Then, I propose the chemical with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N."} {"text":"User: I want to design a chemical with a number of hydrogen bond donors of 2 and a number of hydrogen bond acceptors of 6.\nAssistant: Do you have some additional requirements I should take into account?\nUser: Yep, I want the count of heteroatoms to be 7.\nAssistant: In that scenario, I suggest the chemical with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O."}", "/scratch/micpie/export/rdkit_features/valid_1-22.jsonl": "{"text":"User: I want to make a compound with a number of hydrogen bond donor sites of 1 and a count of hydrogen bond acceptors of 7.\nAssistant: Interesting, do you have some additional ?\nUser: I want the heteroatom count to be 9.\nAssistant: I the compound with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5."} {"text":"User: I want to make a chemical with a count of hydrogen bond donors of 0 and a number of hydrogen bond acceptor sites of 7.\nAssistant: Nice, do you have some additional limitations that help me narrow down the search?\nUser: Yes, I want the heteroatom count to be 8.\nAssistant: In that scenario, I recommend the chemical with SMILES 
Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F."}", "/scratch/micpie/export/rdkit_features/valid_111-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4 is 9."} {"text":"The number of rotatable bonds of the molecule with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC is 8."}", "/scratch/micpie/export/rdkit_features/train_21-7.jsonl": "{"text":"The acid group count of the chemical with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC is 1."} {"text":"The number of acid groups of the compound with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O is 0."}", "/scratch/micpie/export/rdkit_features/train_26-0.jsonl": "{"text":"The molecular formula of the molecule with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F is C15H15F6NO2."} {"text":"The chemical formula of the compound with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C is C21H31N4O+."}", "/scratch/micpie/export/rdkit_features/valid_22-8.jsonl": "{"text":"The count of basic groups of the chemical with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O is 0."} {"text":"The count of basic groups of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4 is 1."}", "/scratch/micpie/export/rdkit_features/valid_110-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the compound with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O?\nAnswer: 11"} {"text":"Question: What is the number of aromatic bonds of the chemical with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 15"}", "/scratch/micpie/export/rdkit_features/train_11-0.jsonl": "{"text":"The molecular formula of the chemical with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl is C20H28ClNO2."} {"text":"The molecular formula of the compound with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C 
is C17H21Cl2NO."}", "/scratch/micpie/export/rdkit_features/test_101-3.jsonl": "{"text":"The count of heteroatoms of the molecule with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F is 7."} {"text":"The heteroatom count of the compound with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1 is 8."}", "/scratch/micpie/export/rdkit_features/test_111-8.jsonl": "{"text":"The basic group count of the chemical with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4 is 1."} {"text":"The count of basic groups of the molecule with SMILES C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC is 0."}", "/scratch/micpie/export/rdkit_features/test_117-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc5cc(ccc5c4)Br?\nAnswer: 4"} {"text":"Question: What is the rotatable bond count of the molecule with SMILES C[C@H](C(C)(C)OC)[NH2+]CCNc1ccccc1?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/train_26-3.jsonl": "{"text":"The heteroatom count of the chemical with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F is 9."} {"text":"The number of heteroatoms of the molecule with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C is 5."}", "/scratch/micpie/export/rdkit_features/train_108-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with SMILES CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2 is 56.70."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3 is 69.43."}", "/scratch/micpie/export/rdkit_features/valid_2-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4?\nAnswer: 7"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3?\nAnswer: 4"}", 
"/scratch/micpie/export/rdkit_features/test_120-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES c1ccc(cc1)CCN(CC(=O)Nc2ccc(cc2)Cl)S(=O)(=O)c3ccc4c(c3)OCCO4 is 5."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES Cc1c(cnn1c2ccccc2)N[C@H](C)C(=O)Nc3cccc(c3Cl)Cl is 4."}", "/scratch/micpie/export/rdkit_features/train_16-3.jsonl": "{"text":"The heteroatom count of the molecule with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C is 5."} {"text":"The heteroatom count of the compound with SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F is 7."}", "/scratch/micpie/export/rdkit_features/train_119-11.jsonl": "{"text":"User: I want to design a chemical with a molecular formula of C15H14FN3O5.\nAssistant: That's interesting, do you have some additional that I should consider?\nUser: Indeed, I want the number of hydrogen bond donors to be 3, the number of hydrogen bond acceptors to be 6.\nAssistant: In that situation, I propose the chemical with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO."} {"text":"User: I want to synthesize a molecule with a formula of C19H23Cl2N5O2S.\nAssistant: That is a very interesting question, do you have some additional limitations that help me narrow down the search?\nUser: Indeed, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 6.\nAssistant: I recommend the molecule with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl."}", "/scratch/micpie/export/rdkit_features/train_29-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the molecule with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F is 3.75."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C is 2.52."}", "/scratch/micpie/export/rdkit_features/valid_108-9.jsonl": "{"text":"The total sum of atomic 
polarizabilities of the molecule with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O is 58.27."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C is 70.76."}", "/scratch/micpie/export/rdkit_features/test_0-23.jsonl": "{"text":"User: I want to create a molecule with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 9 and a LogP value computed using the Wildman-Crippen method of 7.58.\nAssistant: That's interesting, do you have some additional conditions that I should consider?\nUser: Yep, I want the chemical formula to be C24H12N2O3S5.\nAssistant: I propose the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1."} {"text":"User: I want to create a compound with a number of hydrogen bond donor sites of 1, a count of hydrogen bond acceptors of 6 and a Wildman-Crippen LogP value of 3.66.\nAssistant: Cool, do you have some additional limitations I should consider?\nUser: Yea, I want the chemical formula to be C24H21N3O4.\nAssistant: I the compound with SMILES Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC."}", "/scratch/micpie/export/rdkit_features/test_112-1.jsonl": "{"text":"The count of hydrogen bond donors of the chemical with SMILES C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC is 0."} {"text":"The count of hydrogen bond donors of the compound with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N is 3."}", "/scratch/micpie/export/rdkit_features/valid_0-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1 is 8."} {"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4 is 4."}", "/scratch/micpie/export/rdkit_features/train_19-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the 
compound with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC is 67.66."} {"text":"The sum of atomic polarizabilities of the compound with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3 is 71.71."}", "/scratch/micpie/export/rdkit_features/train_25-5.jsonl": "{"text":"The rotatable bond count of the compound with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C is 6."} {"text":"The rotatable bond count of the compound with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F is 3."}", "/scratch/micpie/export/rdkit_features/valid_109-22.jsonl": "{"text":"User: I want to analyze a molecule with a number of hydrogen bond donors of 3 and a count of hydrogen bond acceptors of 5.\nAssistant: That is a very interesting question, do you have some additional constraints that I should consider?\nUser: Yep, I want the number of heteroatoms to be 8.\nAssistant: In that case, I propose the molecule with SMILES C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl."} {"text":"User: I want to create a chemical with a number of hydrogen bond donors of 3 and a number of hydrogen bond acceptors of 5.\nAssistant: Interesting, do you have some additional requirements I should take into account?\nUser: Indeed, I want the heteroatom count to be 9.\nAssistant: I advise the chemical with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F."}", "/scratch/micpie/export/rdkit_features/train_100-15.jsonl": "{"text":"Question: What is the heteroatom count of the molecule with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C?\nAnswer: 4"} {"text":"Question: What is the heteroatom count of the molecule with SMILES COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_4-11.jsonl": "{"text":"User: I want to analyze a chemical with a molecular formula of C18H17BrClN3O2.\nAssistant: That is a very interesting question, do you have some additional conditions that I should 
consider?\nUser: Yes, I want the number of hydrogen bond donor sites to be 1, the number of hydrogen bond acceptors to be 3.\nAssistant: In that case, I recommend the chemical with SMILES Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br."} {"text":"User: I want to analyze a compound with a formula of C23H31N3O2S+2.\nAssistant: Nice, do you have some additional ?\nUser: Yes, I want the number of hydrogen bond donor sites to be 2, the number of hydrogen bond acceptors to be 3.\nAssistant: In that situation, I recommend the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C."}", "/scratch/micpie/export/rdkit_features/test_100-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_9-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_23-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4 is 65.31."} {"text":"The sum of atomic polarizabilities of the compound with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4 is 56.13."}", "/scratch/micpie/export/rdkit_features/valid_119-11.jsonl": "{"text":"User: I want to create a chemical with a formula of C14H18FN3O5.\nAssistant: Do you have some additional requirements that help me narrow down the search?\nUser: Indeed, I want the number of hydrogen bond donors to be 2, the number of 
hydrogen bond acceptors to be 6.\nAssistant: In that situation, I suggest the chemical with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O."} {"text":"User: I want to design a chemical with a chemical formula of C28H23N3O4.\nAssistant: Nice, do you have some additional limitations I should take into account?\nUser: Yes, I want the count of hydrogen bond donors to be 0, the number of hydrogen bond acceptor sites to be 7.\nAssistant: Then, I advise the chemical with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6."}", "/scratch/micpie/export/rdkit_features/train_17-20.jsonl": "{"text":"Question: What is the number of basic groups of the chemical with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the molecule with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_109-0.jsonl": "{"text":"The chemical formula of the compound with SMILES C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl is C22H26ClN3O4."} {"text":"The chemical formula of the compound with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F is C21H19FN4O3S."}", "/scratch/micpie/export/rdkit_features/test_105-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the compound with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_15-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N?\nAnswer: C18H20N4O"} {"text":"Question: What is the formula of the chemical with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C?\nAnswer: C18H35N2O2+"}", 
"/scratch/micpie/export/rdkit_features/test_10-23.jsonl": "{"text":"User: I want to synthesize a molecule with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 5 and a Wildman-Crippen LogP value of 2.36.\nAssistant: Nice, do you have some additional constraints that I should consider?\nUser: Indeed, I want the formula to be C23H42N5O+.\nAssistant: In that scenario, I suggest the molecule with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4."} {"text":"User: I want to make a molecule with a number of hydrogen bond donor sites of 0, a number of hydrogen bond acceptor sites of 1 and a Wildman-Crippen LogP value computed using RDKit of 4.73.\nAssistant: That is a very interesting question, do you have some additional conditions I should consider?\nUser: Yep, I want the chemical formula to be C18H23F4NO.\nAssistant: In that case, I recommend the molecule with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F."}", "/scratch/micpie/export/rdkit_features/valid_105-1.jsonl": "{"text":"The number of hydrogen bond donors of the compound with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F is 1."} {"text":"The number of hydrogen bond donors of the molecule with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O is 1."}", "/scratch/micpie/export/rdkit_features/valid_6-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+]?\nAnswer: 3"} {"text":"Question: What is the count of rings of the compound with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_2-23.jsonl": "{"text":"User: I want to design a molecule with a count of hydrogen bond donors of 0, a number of hydrogen bond acceptors of 7 and a Wildman-Crippen LogP value computed using RDKit of 3.97.\nAssistant: Do you have some additional conditions that I should consider?\nUser: I want the chemical 
formula to be C21H24N4O3S.\nAssistant: In that scenario, I advise the molecule with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4."} {"text":"User: I want to make a chemical with a count of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value computed using RDKit of 3.99.\nAssistant: Interesting, do you have some additional that help me narrow down the search?\nUser: Yeah, I want the formula to be C21H28N4O2S.\nAssistant: In that scenario, I recommend the chemical with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3."}", "/scratch/micpie/export/rdkit_features/train_5-0.jsonl": "{"text":"The chemical formula of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C is C23H29N2O3S+."} {"text":"The molecular formula of the compound with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C is C21H32N4O4."}", "/scratch/micpie/export/rdkit_features/test_6-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the compound with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3 is 4."} {"text":"The count of hydrogen bond acceptors of the molecule with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C is 5."}", "/scratch/micpie/export/rdkit_features/test_116-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1 is 5.57."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl is 5.08."}", "/scratch/micpie/export/rdkit_features/test_30-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the compound with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3 is 58.53."} {"text":"The total sum of atomic polarizabilities of the molecule with SMILES 
Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC is 65.41."}", "/scratch/micpie/export/rdkit_features/test_105-7.jsonl": "{"text":"The acid group count of the chemical with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N is 0."} {"text":"The count of acid groups of the chemical with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O is 0."}", "/scratch/micpie/export/rdkit_features/train_26-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F is 3.89."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C is 3.15."}", "/scratch/micpie/export/rdkit_features/train_27-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_3-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC?\nAnswer: 5"} {"text":"Question: What is the count of rotatable bonds of the compound with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_12-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the compound with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C is 4.74."} {"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC is 4.74."}", "/scratch/micpie/export/rdkit_features/valid_0-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES 
N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1?\nAnswer: 57.40"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4?\nAnswer: 69.89"}", "/scratch/micpie/export/rdkit_features/train_111-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: C20H27N5O4S"} {"text":"Question: What is the chemical formula of the chemical with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC?\nAnswer: C22H29N7O3"}", "/scratch/micpie/export/rdkit_features/valid_117-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4OC)Cl?\nAnswer: 4"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES Cc1cc(ccc1[C@@H](C)[NH2+]CC2(CC2)CO)F?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_28-4.jsonl": "{"text":"The ring count of the chemical with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC is 4."} {"text":"The number of rings of the molecule with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C is 3."}", "/scratch/micpie/export/rdkit_features/valid_22-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O?\nAnswer: 0"} {"text":"Question: What is the acid group count of the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_100-20.jsonl": "{"text":"Question: What is the number of basic groups of the compound with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F?\nAnswer: 1"} {"text":"Question: What is the count of basic groups of the compound with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br?\nAnswer: 0"}", 
"/scratch/micpie/export/rdkit_features/train_107-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl is 6."} {"text":"The number of aromatic bonds of the compound with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5 is 27."}", "/scratch/micpie/export/rdkit_features/train_11-3.jsonl": "{"text":"The count of heteroatoms of the compound with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl is 4."} {"text":"The count of heteroatoms of the compound with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C is 4."}", "/scratch/micpie/export/rdkit_features/test_17-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_23-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4 is 0.96."} {"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4 is 0.48."}", "/scratch/micpie/export/rdkit_features/train_32-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the compound with SMILES C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C?\nAnswer: 9"} {"text":"Question: What is the count of heteroatoms of the molecule with SMILES C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2?\nAnswer: 10"}", "/scratch/micpie/export/rdkit_features/valid_18-22.jsonl": "{"text":"User: I want to create a molecule with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 9.\nAssistant: Cool, 
do you have some additional limitations that I should consider?\nUser: Yea, I want the number of heteroatoms to be 12.\nAssistant: In that scenario, I the molecule with SMILES c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O."} {"text":"User: I want to analyze a chemical with a number of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 2.\nAssistant: That is a very interesting question, do you have some additional conditions that I should consider?\nUser: Yea, I want the count of heteroatoms to be 7.\nAssistant: Then, I advise the chemical with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O."}", "/scratch/micpie/export/rdkit_features/valid_7-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC is 3.80."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC is 3.84."}", "/scratch/micpie/export/rdkit_features/valid_32-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3?\nAnswer: C19H30N5O2S+"} {"text":"Question: What is the molecular formula of the molecule with SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O?\nAnswer: C16H18Cl2N2O5"}", "/scratch/micpie/export/rdkit_features/train_3-12.jsonl": "{"text":"Question: What is the formula of the chemical with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O?\nAnswer: C25H24N2O4"} {"text":"Question: What is the chemical formula of the chemical with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4?\nAnswer: C24H28N3O3+"}", "/scratch/micpie/export/rdkit_features/test_23-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4 is 7."} {"text":"The heteroatom count of the chemical with SMILES 
c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4 is 10."}", "/scratch/micpie/export/rdkit_features/train_102-6.jsonl": "{"text":"The count of aromatic bonds of the chemical with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O is 0."} {"text":"The aromatic bond count of the molecule with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1 is 17."}", "/scratch/micpie/export/rdkit_features/test_31-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the chemical with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F is 0."} {"text":"The number of hydrogen bond donors of the molecule with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3 is 1."}", "/scratch/micpie/export/rdkit_features/test_28-23.jsonl": "{"text":"User: I want to make a molecule with a count of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 6 and a Wildman-Crippen LogP value computed using RDKit of 2.54.\nAssistant: That is a very interesting question, do you have some additional requirements I should consider?\nUser: Yea, I want the molecular formula to be C17H19N4OS2+.\nAssistant: I recommend the molecule with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC."} {"text":"User: I want to make a compound with a count of hydrogen bond donors of 3, a number of hydrogen bond acceptor sites of 3 and a Wildman-Crippen LogP value of 3.93.\nAssistant: Nice, do you have some additional limitations that help me narrow down the search?\nUser: Yeah, I want the chemical formula to be C20H25N5O.\nAssistant: Then, I suggest the compound with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C."}", "/scratch/micpie/export/rdkit_features/train_116-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond 
donors of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_117-3.jsonl": "{"text":"The number of heteroatoms of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc5cc(ccc5c4)Br is 6."} {"text":"The heteroatom count of the chemical with SMILES C[C@H](C(C)(C)OC)[NH2+]CCNc1ccccc1 is 3."}", "/scratch/micpie/export/rdkit_features/train_105-16.jsonl": "{"text":"Question: What is the count of rings of the chemical with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N?\nAnswer: 3"} {"text":"Question: What is the number of rings of the chemical with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_29-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl?\nAnswer: 1"} {"text":"Question: What is the count of basic groups of the compound with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_32-0.jsonl": "{"text":"The molecular formula of the compound with SMILES C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C is C18H32N5O3S+."} {"text":"The chemical formula of the compound with SMILES C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2 is C17H20F3N4O3+."}", "/scratch/micpie/export/rdkit_features/valid_111-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the molecule with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4?\nAnswer: 9"} {"text":"Question: What is the count of rotatable bonds of the molecule with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/valid_10-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C is 3.80."} 
{"text":"The Wildman-Crippen LogP value of the compound with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3 is 4.63."}", "/scratch/micpie/export/rdkit_features/valid_28-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_0-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1?\nAnswer: 8"} {"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_110-7.jsonl": "{"text":"The acid group count of the chemical with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3 is 0."} {"text":"The number of acid groups of the compound with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4 is 0."}", "/scratch/micpie/export/rdkit_features/valid_33-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6 is C22H26N4O2."} {"text":"The chemical formula of the chemical with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br is C15H20BrN4O2S+."}", "/scratch/micpie/export/rdkit_features/test_111-6.jsonl": "{"text":"The count of aromatic bonds of the chemical with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4 is 11."} {"text":"The aromatic bond count of the molecule with SMILES C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC is 11."}", "/scratch/micpie/export/rdkit_features/test_27-2.jsonl": "{"text":"The count of hydrogen bond acceptors of 
the molecule with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F is 2."} {"text":"The count of hydrogen bond acceptors of the chemical with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F is 3."}", "/scratch/micpie/export/rdkit_features/train_118-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F is 2."} {"text":"The number of hydrogen bond donors of the molecule with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3 is 2."}", "/scratch/micpie/export/rdkit_features/valid_3-18.jsonl": "{"text":"Question: What is the aromatic bond count of the compound with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N?\nAnswer: 17"} {"text":"Question: What is the count of aromatic bonds of the compound with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/valid_24-8.jsonl": "{"text":"The basic group count of the compound with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl is 0."} {"text":"The count of basic groups of the compound with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C is 1."}", "/scratch/micpie/export/rdkit_features/valid_109-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl?\nAnswer: 64.74"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F?\nAnswer: 59.89"}", "/scratch/micpie/export/rdkit_features/valid_5-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C?\nAnswer: 6"} {"text":"Question: What is the count of rotatable bonds of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+]?\nAnswer: 
8"}", "/scratch/micpie/export/rdkit_features/test_7-4.jsonl": "{"text":"The ring count of the compound with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC is 4."} {"text":"The number of rings of the compound with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC is 4."}", "/scratch/micpie/export/rdkit_features/train_20-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3?\nAnswer: 10"} {"text":"Question: What is the number of rotatable bonds of the molecule with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_110-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the chemical with SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O?\nAnswer: 10"} {"text":"Question: What is the count of heteroatoms of the chemical with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 10"}", "/scratch/micpie/export/rdkit_features/valid_114-23.jsonl": "{"text":"User: I want to analyze a compound with a number of hydrogen bond donor sites of 3, a number of hydrogen bond acceptors of 3 and a LogP value computed using the Wildman-Crippen method of 5.91.\nAssistant: Nice, do you have some additional that I should consider?\nUser: I want the chemical formula to be C20H17Cl2N3O2S2.\nAssistant: I suggest the compound with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl."} {"text":"User: I want to create a chemical with a number of hydrogen bond donors of 3, a count of hydrogen bond acceptors of 2 and a LogP value computed using the Wildman-Crippen method of 5.19.\nAssistant: That's interesting, do you have some additional conditions that I should consider?\nUser: Yep, I want the chemical formula to be C27H24ClN3O2.\nAssistant: In that situation, I suggest the chemical with SMILES 
C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4."}", "/scratch/micpie/export/rdkit_features/test_2-8.jsonl": "{"text":"The basic group count of the molecule with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C is 1."} {"text":"The number of basic groups of the molecule with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4 is 0."}", "/scratch/micpie/export/rdkit_features/valid_0-1.jsonl": "{"text":"The number of hydrogen bond donors of the chemical with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1 is 1."} {"text":"The number of hydrogen bond donors of the compound with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4 is 1."}", "/scratch/micpie/export/rdkit_features/train_120-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the compound with SMILES Cc1ccc(cc1)NC(=O)CSc2nnc(n2C)C=NC(=O)CCc3ccccc3Cl?\nAnswer: 8"} {"text":"Question: What is the count of rotatable bonds of the chemical with SMILES Cc1c(cnn1c2ccccc2)N[C@@H](C)C(=O)Nc3cccc(c3Cl)Cl?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_104-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2?\nAnswer: 1"} {"text":"Question: What is the number of rotatable bonds of the molecule with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_107-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the molecule with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl is 3."} {"text":"The count of hydrogen bond acceptors of the chemical with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5 is 10."}", "/scratch/micpie/export/rdkit_features/valid_10-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C?\nAnswer: 7"} {"text":"Question: 
What is the count of hydrogen bond acceptors of the molecule with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_27-12.jsonl": "{"text":"Question: What is the molecular formula of the molecule with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F?\nAnswer: C18H27F3N3O2+"} {"text":"Question: What is the molecular formula of the chemical with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F?\nAnswer: C19H27F2N3O2"}", "/scratch/micpie/export/rdkit_features/train_7-7.jsonl": "{"text":"The number of acid groups of the chemical with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl is 0."} {"text":"The acid group count of the compound with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F is 0."}", "/scratch/micpie/export/rdkit_features/test_109-19.jsonl": "{"text":"Question: What is the count of acid groups of the molecule with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl?\nAnswer: 0"} {"text":"Question: What is the acid group count of the chemical with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_9-22.jsonl": "{"text":"User: I want to design a chemical with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 3.\nAssistant: Interesting, do you have some additional conditions that help me narrow down the search?\nUser: Yes, I want the number of heteroatoms to be 6.\nAssistant: In that situation, I the chemical with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C."} {"text":"User: I want to make a compound with a count of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 6.\nAssistant: Interesting, do you have some additional limitations I should take into account?\nUser: Yes, I want the count of heteroatoms to be 7.\nAssistant: I suggest the compound with SMILES 
C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4."}", "/scratch/micpie/export/rdkit_features/test_111-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4 is 72.20."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC is 68.10."}", "/scratch/micpie/export/rdkit_features/valid_7-8.jsonl": "{"text":"The count of basic groups of the chemical with SMILES COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC is 0."} {"text":"The number of basic groups of the compound with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC is 0."}", "/scratch/micpie/export/rdkit_features/test_12-20.jsonl": "{"text":"Question: What is the basic group count of the chemical with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the chemical with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_20-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-]?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_25-3.jsonl": "{"text":"The heteroatom count of the compound with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C is 9."} {"text":"The heteroatom count of the chemical with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F is 9."}", "/scratch/micpie/export/rdkit_features/train_20-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the compound with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3?\nAnswer: 6"} {"text":"Question: What is 
the count of heteroatoms of the molecule with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/test_7-22.jsonl": "{"text":"User: I want to create a compound with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 6.\nAssistant: Nice, do you have some additional I should consider?\nUser: I want the number of heteroatoms to be 7.\nAssistant: I recommend the compound with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC."} {"text":"User: I want to create a molecule with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 7.\nAssistant: Interesting, do you have some additional requirements that I should consider?\nUser: Yes, I want the count of heteroatoms to be 8.\nAssistant: In that situation, I recommend the molecule with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC."}", "/scratch/micpie/export/rdkit_features/valid_105-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F is C18H14FN3OS2."} {"text":"The molecular formula of the chemical with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O is C13H12ClFN3O3+."}", "/scratch/micpie/export/rdkit_features/test_12-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the compound with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C is 2."} {"text":"The count of hydrogen bond donors of the molecule with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4 is 1."}", "/scratch/micpie/export/rdkit_features/valid_110-15.jsonl": "{"text":"Question: What is the heteroatom count of the compound with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O?\nAnswer: 8"} {"text":"Question: What is the count of heteroatoms of the chemical with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 10"}", "/scratch/micpie/export/rdkit_features/train_1-20.jsonl": 
"{"text":"Question: What is the basic group count of the chemical with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4?\nAnswer: 1"} {"text":"Question: What is the number of basic groups of the chemical with SMILES c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_27-7.jsonl": "{"text":"The count of acid groups of the molecule with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F is 0."} {"text":"The number of acid groups of the chemical with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F is 0."}", "/scratch/micpie/export/rdkit_features/test_8-18.jsonl": "{"text":"Question: What is the aromatic bond count of the molecule with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C?\nAnswer: 16"} {"text":"Question: What is the aromatic bond count of the compound with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/train_109-18.jsonl": "{"text":"Question: What is the aromatic bond count of the molecule with SMILES Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3?\nAnswer: 6"} {"text":"Question: What is the number of aromatic bonds of the chemical with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C?\nAnswer: 16"}", "/scratch/micpie/export/rdkit_features/train_101-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the chemical with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3 is 2.77."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1 is 4.44."}", "/scratch/micpie/export/rdkit_features/train_102-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O?\nAnswer: 4"} {"text":"Question: What is the number of hydrogen bond donor sites of the molecule with 
SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_17-16.jsonl": "{"text":"Question: What is the number of rings of the molecule with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F?\nAnswer: 3"} {"text":"Question: What is the number of rings of the compound with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_30-22.jsonl": "{"text":"User: I want to make a molecule with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 2.\nAssistant: Cool, do you have some additional I should consider?\nUser: Indeed, I want the count of heteroatoms to be 7.\nAssistant: Then, I advise the molecule with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F."} {"text":"User: I want to create a chemical with a count of hydrogen bond donors of 0 and a number of hydrogen bond acceptors of 3.\nAssistant: That is a very interesting question, do you have some additional conditions?\nUser: Indeed, I want the heteroatom count to be 6.\nAssistant: In that case, I suggest the chemical with SMILES CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC."}", "/scratch/micpie/export/rdkit_features/test_29-11.jsonl": "{"text":"User: I want to analyze a compound with a chemical formula of C21H33N4O+.\nAssistant: Nice, do you have some additional constraints?\nUser: Indeed, I want the number of hydrogen bond donors to be 2, the count of hydrogen bond acceptors to be 2.\nAssistant: I recommend the compound with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4."} {"text":"User: I want to design a molecule with a chemical formula of C21H38N2O2.\nAssistant: That's interesting, do you have some additional conditions I should take into account?\nUser: Yea, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 2.\nAssistant: Then, I advise the molecule 
with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C."}", "/scratch/micpie/export/rdkit_features/train_16-4.jsonl": "{"text":"The count of rings of the compound with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C is 2."} {"text":"The count of rings of the chemical with SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F is 3."}", "/scratch/micpie/export/rdkit_features/test_27-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the chemical with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F?\nAnswer: 4"} {"text":"Question: What is the number of rotatable bonds of the chemical with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_118-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_22-12.jsonl": "{"text":"Question: What is the chemical formula of the compound with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O?\nAnswer: C17H24ClN3O3S"} {"text":"Question: What is the formula of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4?\nAnswer: C20H33N6O2+"}", "/scratch/micpie/export/rdkit_features/test_24-15.jsonl": "{"text":"Question: What is the heteroatom count of the molecule with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl?\nAnswer: 11"} {"text":"Question: What is the count of heteroatoms of the molecule with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/train_104-0.jsonl": "{"text":"The formula of the chemical with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 
C14H21N5O4."} {"text":"The molecular formula of the compound with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl is C19H16ClN2O2S-."}", "/scratch/micpie/export/rdkit_features/train_15-11.jsonl": "{"text":"User: I want to make a compound with a formula of C17H22N4O2.\nAssistant: Interesting, do you have some additional that help me narrow down the search?\nUser: I want the number of hydrogen bond donor sites to be 1, the number of hydrogen bond acceptors to be 5.\nAssistant: In that scenario, I the compound with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC."} {"text":"User: I want to design a molecule with a chemical formula of C17H28N3OS+.\nAssistant: Interesting, do you have some additional that help me narrow down the search?\nUser: Indeed, I want the number of hydrogen bond donor sites to be 2, the number of hydrogen bond acceptor sites to be 3.\nAssistant: Then, I recommend the molecule with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C."}", "/scratch/micpie/export/rdkit_features/test_10-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the molecule with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4 is 2.36."} {"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F is 4.73."}", "/scratch/micpie/export/rdkit_features/train_11-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_22-19.jsonl": "{"text":"Question: What is the number of acid groups of the molecule with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O?\nAnswer: 0"} {"text":"Question: What is the acid group count of 
the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_107-12.jsonl": "{"text":"Question: What is the molecular formula of the compound with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl?\nAnswer: C21H30ClN6O3+"} {"text":"Question: What is the chemical formula of the compound with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5?\nAnswer: C21H19N9OS"}", "/scratch/micpie/export/rdkit_features/test_111-4.jsonl": "{"text":"The count of rings of the molecule with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4 is 4."} {"text":"The count of rings of the chemical with SMILES C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC is 4."}", "/scratch/micpie/export/rdkit_features/train_110-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the compound with SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O?\nAnswer: 22"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 15"}", "/scratch/micpie/export/rdkit_features/test_118-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/valid_107-0.jsonl": "{"text":"The formula of the compound with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C is C24H27N5O3."} {"text":"The chemical formula of the molecule with SMILES CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C is C23H27N3O6."}", "/scratch/micpie/export/rdkit_features/train_11-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the chemical with 
SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl?\nAnswer: 6"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_30-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the molecule with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F?\nAnswer: 6"} {"text":"Question: What is the count of aromatic bonds of the compound with SMILES CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/valid_119-6.jsonl": "{"text":"The count of aromatic bonds of the compound with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O is 6."} {"text":"The aromatic bond count of the chemical with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6 is 23."}", "/scratch/micpie/export/rdkit_features/train_107-22.jsonl": "{"text":"User: I want to analyze a molecule with a number of hydrogen bond donor sites of 3 and a number of hydrogen bond acceptors of 3.\nAssistant: Do you have some additional that help me narrow down the search?\nUser: Yeah, I want the number of heteroatoms to be 10.\nAssistant: In that scenario, I advise the molecule with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl."} {"text":"User: I want to create a compound with a count of hydrogen bond donors of 2 and a number of hydrogen bond acceptors of 10.\nAssistant: Cool, do you have some additional constraints I should take into account?\nUser: Yeah, I want the count of heteroatoms to be 11.\nAssistant: In that scenario, I advise the compound with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5."}", "/scratch/micpie/export/rdkit_features/train_113-6.jsonl": "{"text":"The count of aromatic bonds of the compound with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 6."} {"text":"The number 
of aromatic bonds of the molecule with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 18."}", "/scratch/micpie/export/rdkit_features/test_114-0.jsonl": "{"text":"The molecular formula of the chemical with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C is C24H27N3O3S2."} {"text":"The chemical formula of the chemical with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5 is C26H22N4O4."}", "/scratch/micpie/export/rdkit_features/train_29-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the compound with SMILES CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_110-19.jsonl": "{"text":"Question: What is the count of acid groups of the compound with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the chemical with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_15-0.jsonl": "{"text":"The chemical formula of the compound with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC is C17H22N4O2."} {"text":"The chemical formula of the compound with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C is C17H28N3OS+."}", "/scratch/micpie/export/rdkit_features/train_108-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2 is 2.03."} {"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3 is 2.06."}", "/scratch/micpie/export/rdkit_features/valid_112-12.jsonl": "{"text":"Question: What is the chemical formula of the compound 
with SMILES Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC?\nAnswer: C21H26N6O3S"} {"text":"Question: What is the chemical formula of the chemical with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: C22H27N6O2S+"}", "/scratch/micpie/export/rdkit_features/valid_6-22.jsonl": "{"text":"User: I want to make a compound with a count of hydrogen bond donors of 3 and a number of hydrogen bond acceptors of 4.\nAssistant: Interesting, do you have some additional conditions I should take into account?\nUser: Indeed, I want the heteroatom count to be 6.\nAssistant: In that situation, I suggest the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+]."} {"text":"User: I want to synthesize a chemical with a count of hydrogen bond donors of 0 and a count of hydrogen bond acceptors of 6.\nAssistant: Interesting, do you have some additional I should take into account?\nUser: I want the number of heteroatoms to be 8.\nAssistant: In that scenario, I propose the chemical with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F."}", "/scratch/micpie/export/rdkit_features/test_9-8.jsonl": "{"text":"The basic group count of the molecule with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C is 1."} {"text":"The basic group count of the molecule with SMILES C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4 is 1."}", "/scratch/micpie/export/rdkit_features/test_115-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F is 69.25."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 78.26."}", "/scratch/micpie/export/rdkit_features/train_101-7.jsonl": "{"text":"The count of acid groups of the molecule with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3 is 0."} {"text":"The count of acid 
groups of the compound with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1 is 0."}", "/scratch/micpie/export/rdkit_features/valid_2-4.jsonl": "{"text":"The count of rings of the chemical with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4 is 4."} {"text":"The number of rings of the chemical with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3 is 3."}", "/scratch/micpie/export/rdkit_features/valid_119-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the compound with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O?\nAnswer: 6"} {"text":"Question: What is the count of aromatic bonds of the molecule with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6?\nAnswer: 23"}", "/scratch/micpie/export/rdkit_features/train_28-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the chemical with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_109-20.jsonl": "{"text":"Question: What is the number of basic groups of the chemical with SMILES C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl?\nAnswer: 0"} {"text":"Question: What is the basic group count of the chemical with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_25-4.jsonl": "{"text":"The ring count of the molecule with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4 is 4."} {"text":"The count of rings of the molecule with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3 is 3."}", "/scratch/micpie/export/rdkit_features/train_13-22.jsonl": "{"text":"User: I want to make a compound with a number of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 2.\nAssistant: That is a very interesting question, do 
you have some additional constraints that help me narrow down the search?\nUser: Indeed, I want the heteroatom count to be 4.\nAssistant: In that scenario, I advise the compound with SMILES CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC."} {"text":"User: I want to synthesize a compound with a count of hydrogen bond donors of 2 and a number of hydrogen bond acceptors of 3.\nAssistant: Cool, do you have some additional limitations I should consider?\nUser: Yeah, I want the number of heteroatoms to be 5.\nAssistant: Then, I the compound with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C."}", "/scratch/micpie/export/rdkit_features/train_29-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the molecule with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F is 2."} {"text":"The number of hydrogen bond donor sites of the molecule with SMILES CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C is 2."}", "/scratch/micpie/export/rdkit_features/valid_115-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the chemical with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4?\nAnswer: 9"} {"text":"Question: What is the heteroatom count of the compound with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/test_120-8.jsonl": "{"text":"The count of basic groups of the chemical with SMILES c1ccc(cc1)CCN(CC(=O)Nc2ccc(cc2)Cl)S(=O)(=O)c3ccc4c(c3)OCCO4 is 0."} {"text":"The count of basic groups of the molecule with SMILES Cc1c(cnn1c2ccccc2)N[C@H](C)C(=O)Nc3cccc(c3Cl)Cl is 0."}", "/scratch/micpie/export/rdkit_features/train_115-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F?\nAnswer: C21H14BrF2N3O2"} {"text":"Question: What is the chemical formula of the chemical with SMILES 
c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: C29H27F2N3O2"}", "/scratch/micpie/export/rdkit_features/train_33-7.jsonl": "{"text":"The count of acid groups of the compound with SMILES COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2 is 0."} {"text":"The count of acid groups of the compound with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br is 0."}", "/scratch/micpie/export/rdkit_features/test_2-22.jsonl": "{"text":"User: I want to make a molecule with a count of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 4.\nAssistant: That's interesting, do you have some additional conditions that help me narrow down the search?\nUser: Indeed, I want the heteroatom count to be 8.\nAssistant: Then, I advise the molecule with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C."} {"text":"User: I want to create a chemical with a number of hydrogen bond donor sites of 0 and a number of hydrogen bond acceptors of 5.\nAssistant: Cool, do you have some additional conditions I should take into account?\nUser: Yes, I want the heteroatom count to be 6.\nAssistant: In that case, I suggest the chemical with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4."}", "/scratch/micpie/export/rdkit_features/train_21-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_103-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO?\nAnswer: 68.34"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES 
c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N?\nAnswer: 42.98"}", "/scratch/micpie/export/rdkit_features/valid_116-6.jsonl": "{"text":"The aromatic bond count of the molecule with SMILES CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 17."} {"text":"The count of aromatic bonds of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl is 17."}", "/scratch/micpie/export/rdkit_features/test_12-22.jsonl": "{"text":"User: I want to analyze a chemical with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptor sites of 3.\nAssistant: Do you have some additional constraints?\nUser: Indeed, I want the number of heteroatoms to be 6.\nAssistant: Then, I propose the chemical with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C."} {"text":"User: I want to create a chemical with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 2.\nAssistant: That is a very interesting question, do you have some additional I should take into account?\nUser: Yea, I want the number of heteroatoms to be 4.\nAssistant: In that scenario, I suggest the chemical with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4."}", "/scratch/micpie/export/rdkit_features/test_17-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the compound with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F is 1."} {"text":"The number of hydrogen bond donors of the chemical with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 1."}", "/scratch/micpie/export/rdkit_features/train_18-23.jsonl": "{"text":"User: I want to design a compound with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 9 and a LogP value computed using the Wildman-Crippen method of 0.60.\nAssistant: Do you have some additional constraints I should take into account?\nUser: Yeah, I want the formula to be C21H27N3O9S.\nAssistant: 
In that case, I propose the compound with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O."} {"text":"User: I want to create a chemical with a number of hydrogen bond donor sites of 2, a count of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value of 3.70.\nAssistant: Do you have some additional limitations?\nUser: Yea, I want the chemical formula to be C17H18Cl2N2O4S.\nAssistant: I advise the chemical with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl."}", "/scratch/micpie/export/rdkit_features/test_111-22.jsonl": "{"text":"User: I want to make a chemical with a number of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 7.\nAssistant: That is a very interesting question, do you have some additional constraints that help me narrow down the search?\nUser: Yeah, I want the count of heteroatoms to be 9.\nAssistant: In that scenario, I the chemical with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4."} {"text":"User: I want to make a compound with a number of hydrogen bond donors of 0 and a number of hydrogen bond acceptor sites of 8.\nAssistant: Nice, do you have some additional requirements?\nUser: Yes, I want the number of heteroatoms to be 9.\nAssistant: In that situation, I advise the compound with SMILES C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC."}", "/scratch/micpie/export/rdkit_features/test_13-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC is 52.88."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4 is 62.09."}", "/scratch/micpie/export/rdkit_features/train_100-6.jsonl": "{"text":"The count of aromatic bonds of the compound with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C is 0."} {"text":"The count of aromatic bonds of the chemical with SMILES 
COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3 is 17."}", "/scratch/micpie/export/rdkit_features/valid_5-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the molecule with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+]?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_30-23.jsonl": "{"text":"User: I want to make a chemical with a number of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 2 and a Wildman-Crippen LogP value of 3.65.\nAssistant: Cool, do you have some additional limitations I should take into account?\nUser: Yes, I want the formula to be C19H27F3N2O2.\nAssistant: In that case, I the chemical with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F."} {"text":"User: I want to make a compound with a number of hydrogen bond donors of 0, a number of hydrogen bond acceptor sites of 3 and a LogP value computed using the Wildman-Crippen method of 3.84.\nAssistant: Cool, do you have some additional conditions I should take into account?\nUser: Yea, I want the chemical formula to be C24H27FN2O3.\nAssistant: Then, I recommend the compound with SMILES CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC."}", "/scratch/micpie/export/rdkit_features/train_24-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5?\nAnswer: 3"} {"text":"Question: What is the count of acid groups of the compound with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_103-4.jsonl": "{"text":"The ring count of the molecule with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1 is 4."} {"text":"The number of rings of the compound with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 3."}", 
"/scratch/micpie/export/rdkit_features/test_26-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the compound with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3?\nAnswer: 6"} {"text":"Question: What is the aromatic bond count of the chemical with SMILES COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/train_110-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the molecule with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_13-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4?\nAnswer: C21H20FN3O"} {"text":"Question: What is the chemical formula of the molecule with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C?\nAnswer: C21H20N4"}", "/scratch/micpie/export/rdkit_features/test_10-22.jsonl": "{"text":"User: I want to create a chemical with a number of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 5.\nAssistant: Cool, do you have some additional constraints that help me narrow down the search?\nUser: Yep, I want the heteroatom count to be 6.\nAssistant: In that situation, I the chemical with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4."} {"text":"User: I want to analyze a compound with a count of hydrogen bond donors of 0 and a number of hydrogen bond acceptors of 1.\nAssistant: Do you have some additional requirements?\nUser: Yeah, I want the number of heteroatoms to be 6.\nAssistant: Then, I suggest the compound with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F."}", "/scratch/micpie/export/rdkit_features/valid_7-1.jsonl": "{"text":"The count of hydrogen bond donors of the chemical with SMILES 
COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC is 2."} {"text":"The number of hydrogen bond donors of the chemical with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC is 2."}", "/scratch/micpie/export/rdkit_features/valid_25-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the compound with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4 is 2.45."} {"text":"The Wildman-Crippen LogP value of the compound with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3 is 3.98."}", "/scratch/micpie/export/rdkit_features/train_25-11.jsonl": "{"text":"User: I want to analyze a molecule with a chemical formula of C19H31N7O2.\nAssistant: Cool, do you have some additional ?\nUser: Indeed, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 8.\nAssistant: In that scenario, I suggest the molecule with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C."} {"text":"User: I want to make a chemical with a formula of C15H15F6NO2.\nAssistant: Interesting, do you have some additional I should consider?\nUser: I want the count of hydrogen bond donors to be 0, the count of hydrogen bond acceptors to be 2.\nAssistant: I the chemical with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F."}", "/scratch/micpie/export/rdkit_features/valid_16-0.jsonl": "{"text":"The chemical formula of the compound with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C is C18H35N2O2+."} {"text":"The molecular formula of the compound with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3 is C23H36N2O4."}", "/scratch/micpie/export/rdkit_features/train_118-6.jsonl": "{"text":"The aromatic bond count of the chemical with SMILES Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F is 6."} {"text":"The number of aromatic bonds of the chemical with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3 is 6."}", "/scratch/micpie/export/rdkit_features/valid_22-4.jsonl": "{"text":"The count of rings of the chemical with SMILES 
C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O is 3."} {"text":"The count of rings of the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4 is 4."}", "/scratch/micpie/export/rdkit_features/train_105-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N is 5."} {"text":"The count of hydrogen bond acceptors of the molecule with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O is 6."}", "/scratch/micpie/export/rdkit_features/valid_17-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S is 5."} {"text":"The count of rotatable bonds of the chemical with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 5."}", "/scratch/micpie/export/rdkit_features/test_7-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC is 0."} {"text":"The count of acid groups of the compound with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC is 0."}", "/scratch/micpie/export/rdkit_features/valid_9-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_30-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F is 6."} {"text":"The aromatic bond count of the compound with SMILES CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC is 12."}", "/scratch/micpie/export/rdkit_features/train_25-12.jsonl": "{"text":"Question: What is the 
molecular formula of the compound with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C?\nAnswer: C19H31N7O2"} {"text":"Question: What is the formula of the molecule with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F?\nAnswer: C15H15F6NO2"}", "/scratch/micpie/export/rdkit_features/valid_0-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_115-20.jsonl": "{"text":"Question: What is the count of basic groups of the chemical with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the chemical with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_105-19.jsonl": "{"text":"Question: What is the number of acid groups of the chemical with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the chemical with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_110-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O is 69.18."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C is 63.82."}", "/scratch/micpie/export/rdkit_features/train_27-7.jsonl": "{"text":"The number of acid groups of the molecule with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C is 0."} {"text":"The count of acid groups of the molecule with SMILES 
CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4 is 0."}", "/scratch/micpie/export/rdkit_features/test_14-23.jsonl": "{"text":"User: I want to analyze a chemical with a number of hydrogen bond donor sites of 1, a count of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value computed using RDKit of 4.55.\nAssistant: Do you have some additional limitations that I should consider?\nUser: Yeah, I want the chemical formula to be C21H24N4O.\nAssistant: I the chemical with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3."} {"text":"User: I want to analyze a chemical with a count of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 4 and a LogP value computed using the Wildman-Crippen method of 2.72.\nAssistant: Interesting, do you have some additional constraints that help me narrow down the search?\nUser: Yes, I want the chemical formula to be C13H14BrNO3.\nAssistant: In that situation, I recommend the chemical with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O."}", "/scratch/micpie/export/rdkit_features/train_15-8.jsonl": "{"text":"The basic group count of the compound with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC is 0."} {"text":"The count of basic groups of the chemical with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C is 1."}", "/scratch/micpie/export/rdkit_features/valid_101-0.jsonl": "{"text":"The molecular formula of the molecule with SMILES COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3 is C15H13ClN4O2."} {"text":"The chemical formula of the molecule with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1 is C25H23F2N3O2."}", "/scratch/micpie/export/rdkit_features/train_21-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC is 49.29."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O is 58.55."}", 
"/scratch/micpie/export/rdkit_features/test_114-5.jsonl": "{"text":"The rotatable bond count of the chemical with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C is 7."} {"text":"The number of rotatable bonds of the chemical with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5 is 7."}", "/scratch/micpie/export/rdkit_features/train_30-7.jsonl": "{"text":"The acid group count of the chemical with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F is 0."} {"text":"The count of acid groups of the chemical with SMILES CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC is 0."}", "/scratch/micpie/export/rdkit_features/train_107-3.jsonl": "{"text":"The number of heteroatoms of the compound with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl is 10."} {"text":"The heteroatom count of the chemical with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5 is 11."}", "/scratch/micpie/export/rdkit_features/test_106-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O?\nAnswer: 2"} {"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_106-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O?\nAnswer: C14H18N4O4"} {"text":"Question: What is the chemical formula of the compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC?\nAnswer: C22H27N3O5S"}", "/scratch/micpie/export/rdkit_features/test_102-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3 is 60.66."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES 
Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1 is 73.63."}", "/scratch/micpie/export/rdkit_features/test_115-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F?\nAnswer: 69.25"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 78.26"}", "/scratch/micpie/export/rdkit_features/train_18-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the chemical with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O is 9."} {"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl is 4."}", "/scratch/micpie/export/rdkit_features/valid_12-18.jsonl": "{"text":"Question: What is the aromatic bond count of the chemical with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl?\nAnswer: 6"} {"text":"Question: What is the count of aromatic bonds of the compound with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_17-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S is 61.95."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 62.08."}", "/scratch/micpie/export/rdkit_features/train_12-11.jsonl": "{"text":"User: I want to design a chemical with a molecular formula of C17H21Cl2NO.\nAssistant: Nice, do you have some additional requirements that help me narrow down the search?\nUser: I want the count of hydrogen bond donors to be 0, the count of hydrogen bond acceptors to be 1.\nAssistant: In that case, I advise the chemical with SMILES 
C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C."} {"text":"User: I want to analyze a molecule with a chemical formula of C21H32FNO2.\nAssistant: That is a very interesting question, do you have some additional ?\nUser: Indeed, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 2.\nAssistant: I suggest the molecule with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC."}", "/scratch/micpie/export/rdkit_features/train_115-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F?\nAnswer: 55.36"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 75.06"}", "/scratch/micpie/export/rdkit_features/test_24-5.jsonl": "{"text":"The rotatable bond count of the compound with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl is 3."} {"text":"The rotatable bond count of the chemical with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C is 7."}", "/scratch/micpie/export/rdkit_features/valid_115-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4?\nAnswer: 8"} {"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_4-22.jsonl": "{"text":"User: I want to design a chemical with a count of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 3.\nAssistant: Nice, do you have some additional conditions?\nUser: Yes, I want the heteroatom count to be 7.\nAssistant: In that scenario, I suggest the chemical with SMILES Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br."} {"text":"User: I want 
to create a chemical with a count of hydrogen bond donors of 2 and a number of hydrogen bond acceptors of 3.\nAssistant: That's interesting, do you have some additional conditions that I should consider?\nUser: Yea, I want the count of heteroatoms to be 6.\nAssistant: In that case, I advise the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C."}", "/scratch/micpie/export/rdkit_features/valid_22-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O?\nAnswer: 5"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_30-3.jsonl": "{"text":"The heteroatom count of the compound with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3 is 5."} {"text":"The count of heteroatoms of the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC is 6."}", "/scratch/micpie/export/rdkit_features/valid_2-20.jsonl": "{"text":"Question: What is the number of basic groups of the molecule with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the compound with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_28-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl?\nAnswer: 41.48"} {"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F?\nAnswer: 58.00"}", "/scratch/micpie/export/rdkit_features/train_115-11.jsonl": "{"text":"User: I want to synthesize a chemical with a chemical formula of C21H14BrF2N3O2.\nAssistant: Do you have some additional 
limitations I should take into account?\nUser: Yeah, I want the number of hydrogen bond donor sites to be 2, the number of hydrogen bond acceptors to be 3.\nAssistant: Then, I advise the chemical with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F."} {"text":"User: I want to create a compound with a molecular formula of C29H27F2N3O2.\nAssistant: Nice, do you have some additional limitations that help me narrow down the search?\nUser: Yeah, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 3.\nAssistant: I advise the compound with SMILES c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F."}", "/scratch/micpie/export/rdkit_features/valid_3-15.jsonl": "{"text":"Question: What is the heteroatom count of the compound with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N?\nAnswer: 8"} {"text":"Question: What is the number of heteroatoms of the molecule with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/test_3-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC is 3.75."} {"text":"The Wildman-Crippen LogP value of the compound with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F is 3.54."}", "/scratch/micpie/export/rdkit_features/valid_4-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the chemical with SMILES Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br?\nAnswer: 12"} {"text":"Question: What is the number of aromatic bonds of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C?\nAnswer: 16"}", "/scratch/micpie/export/rdkit_features/test_100-5.jsonl": "{"text":"The number of rotatable bonds of the compound with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F is 5."} {"text":"The rotatable bond count of the 
compound with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br is 4."}", "/scratch/micpie/export/rdkit_features/train_14-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C?\nAnswer: 2"} {"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_105-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N is 4."} {"text":"The count of rotatable bonds of the chemical with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O is 4."}", "/scratch/micpie/export/rdkit_features/train_6-3.jsonl": "{"text":"The count of heteroatoms of the chemical with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4 is 9."} {"text":"The count of heteroatoms of the chemical with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C is 6."}", "/scratch/micpie/export/rdkit_features/train_10-1.jsonl": "{"text":"The number of hydrogen bond donors of the chemical with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4 is 1."} {"text":"The number of hydrogen bond donor sites of the compound with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl is 1."}", "/scratch/micpie/export/rdkit_features/train_5-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C?\nAnswer: 4"} {"text":"Question: What is the rotatable bond count of the chemical with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_21-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES 
CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-]?\nAnswer: 6"} {"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_11-23.jsonl": "{"text":"User: I want to synthesize a molecule with a count of hydrogen bond donors of 0, a count of hydrogen bond acceptors of 1 and a Wildman-Crippen LogP value of 4.62.\nAssistant: That is a very interesting question, do you have some additional requirements I should consider?\nUser: Yea, I want the formula to be C18H24F3NO.\nAssistant: I suggest the molecule with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2."} {"text":"User: I want to create a chemical with a number of hydrogen bond donors of 0, a number of hydrogen bond acceptor sites of 2 and a LogP value computed using the Wildman-Crippen method of 4.61.\nAssistant: Interesting, do you have some additional that help me narrow down the search?\nUser: Yeah, I want the formula to be C22H31NO2.\nAssistant: In that situation, I advise the chemical with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C."}", "/scratch/micpie/export/rdkit_features/valid_112-6.jsonl": "{"text":"The count of aromatic bonds of the chemical with SMILES Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC is 16."} {"text":"The number of aromatic bonds of the chemical with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N is 12."}", "/scratch/micpie/export/rdkit_features/test_120-11.jsonl": "{"text":"User: I want to synthesize a chemical with a formula of C24H23ClN2O5S.\nAssistant: Do you have some additional I should consider?\nUser: Indeed, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 5.\nAssistant: I advise the chemical with SMILES c1ccc(cc1)CCN(CC(=O)Nc2ccc(cc2)Cl)S(=O)(=O)c3ccc4c(c3)OCCO4."} {"text":"User: I want to create a compound 
with a chemical formula of C19H18Cl2N4O.\nAssistant: Interesting, do you have some additional conditions that help me narrow down the search?\nUser: Yeah, I want the count of hydrogen bond donors to be 2, the count of hydrogen bond acceptors to be 4.\nAssistant: In that situation, I recommend the compound with SMILES Cc1c(cnn1c2ccccc2)N[C@H](C)C(=O)Nc3cccc(c3Cl)Cl."}", "/scratch/micpie/export/rdkit_features/test_15-8.jsonl": "{"text":"The basic group count of the compound with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O is 0."} {"text":"The number of basic groups of the chemical with SMILES C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2 is 1."}", "/scratch/micpie/export/rdkit_features/test_119-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_113-8.jsonl": "{"text":"The number of basic groups of the molecule with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 2."} {"text":"The count of basic groups of the chemical with SMILES Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 0."}", "/scratch/micpie/export/rdkit_features/valid_105-8.jsonl": "{"text":"The count of basic groups of the compound with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F is 0."} {"text":"The basic group count of the compound with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O is 1."}", "/scratch/micpie/export/rdkit_features/train_120-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES Cc1ccc(cc1)NC(=O)CSc2nnc(n2C)C=NC(=O)CCc3ccccc3Cl?\nAnswer: 65.57"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES 
Cc1c(cnn1c2ccccc2)N[C@@H](C)C(=O)Nc3cccc(c3Cl)Cl?\nAnswer: 55.00"}", "/scratch/micpie/export/rdkit_features/test_11-4.jsonl": "{"text":"The count of rings of the chemical with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2 is 2."} {"text":"The number of rings of the molecule with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C is 3."}", "/scratch/micpie/export/rdkit_features/train_103-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_112-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC?\nAnswer: C21H26N6O3S"} {"text":"Question: What is the chemical formula of the compound with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: C22H27N6O2S+"}", "/scratch/micpie/export/rdkit_features/test_25-15.jsonl": "{"text":"Question: What is the heteroatom count of the molecule with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4?\nAnswer: 7"} {"text":"Question: What is the heteroatom count of the compound with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_118-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the chemical with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_117-4.jsonl": "{"text":"The count of rings of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(cc4)Oc5ccc(cn5)Cl is 5."} 
{"text":"The count of rings of the compound with SMILES C[C@@H](C(C)(C)OC)[NH2+]CCNc1ccccc1 is 1."}", "/scratch/micpie/export/rdkit_features/test_32-22.jsonl": "{"text":"User: I want to design a compound with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 6.\nAssistant: Cool, do you have some additional conditions that help me narrow down the search?\nUser: Yep, I want the count of heteroatoms to be 8.\nAssistant: In that case, I recommend the compound with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3."} {"text":"User: I want to create a chemical with a number of hydrogen bond donors of 3 and a number of hydrogen bond acceptors of 4.\nAssistant: Do you have some additional I should take into account?\nUser: Indeed, I want the number of heteroatoms to be 7.\nAssistant: In that situation, I the chemical with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3."}", "/scratch/micpie/export/rdkit_features/test_120-0.jsonl": "{"text":"The chemical formula of the compound with SMILES c1ccc(cc1)CCN(CC(=O)Nc2ccc(cc2)Cl)S(=O)(=O)c3ccc4c(c3)OCCO4 is C24H23ClN2O5S."} {"text":"The molecular formula of the molecule with SMILES Cc1c(cnn1c2ccccc2)N[C@H](C)C(=O)Nc3cccc(c3Cl)Cl is C19H18Cl2N4O."}", "/scratch/micpie/export/rdkit_features/test_10-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4?\nAnswer: C23H42N5O+"} {"text":"Question: What is the chemical formula of the compound with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F?\nAnswer: C18H23F4NO"}", "/scratch/micpie/export/rdkit_features/test_109-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl is 64.74."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N is 50.39."}", 
"/scratch/micpie/export/rdkit_features/train_32-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the compound with SMILES C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C?\nAnswer: 5"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/valid_25-11.jsonl": "{"text":"User: I want to design a chemical with a chemical formula of C20H29N5O3.\nAssistant: Nice, do you have some additional limitations I should take into account?\nUser: Yeah, I want the number of hydrogen bond donors to be 0, the count of hydrogen bond acceptors to be 8.\nAssistant: I the chemical with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4."} {"text":"User: I want to synthesize a molecule with a chemical formula of C20H25N3OS.\nAssistant: That is a very interesting question, do you have some additional constraints I should consider?\nUser: Yeah, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 4.\nAssistant: In that case, I the molecule with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3."}", "/scratch/micpie/export/rdkit_features/test_7-19.jsonl": "{"text":"Question: What is the acid group count of the compound with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the compound with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_23-5.jsonl": "{"text":"The rotatable bond count of the molecule with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4 is 6."} {"text":"The number of rotatable bonds of the chemical with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4 is 4."}", "/scratch/micpie/export/rdkit_features/test_101-12.jsonl": "{"text":"Question: What is the molecular formula of the molecule with 
SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F?\nAnswer: C15H14F2N2O3"} {"text":"Question: What is the formula of the molecule with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1?\nAnswer: C26H20N6OS"}", "/scratch/micpie/export/rdkit_features/train_32-16.jsonl": "{"text":"Question: What is the ring count of the chemical with SMILES C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C?\nAnswer: 2"} {"text":"Question: What is the ring count of the molecule with SMILES C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_120-17.jsonl": "{"text":"Question: What is the rotatable bond count of the compound with SMILES c1cc(oc1)CNC(=O)CSc2nc3ccc(cc3s2)n4c(c5c(c4O)[C@@H]6C[C@@H]5C=C6)O?\nAnswer: 6"} {"text":"Question: What is the number of rotatable bonds of the molecule with SMILES CN(c1nc2cc(ccc2s1)Cl)C(=O)CSc3ccc(c(c3)F)F?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_29-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F?\nAnswer: 58.00"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C?\nAnswer: 67.63"}", "/scratch/micpie/export/rdkit_features/train_2-1.jsonl": "{"text":"The number of hydrogen bond donors of the chemical with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6 is 1."} {"text":"The number of hydrogen bond donors of the molecule with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O is 2."}", "/scratch/micpie/export/rdkit_features/valid_17-22.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donors of 2 and a count of hydrogen bond acceptors of 4.\nAssistant: That is a very interesting question, do you have some additional limitations I should take into account?\nUser: Yep, I want the 
count of heteroatoms to be 9.\nAssistant: In that scenario, I advise the chemical with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S."} {"text":"User: I want to synthesize a chemical with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptor sites of 9.\nAssistant: That's interesting, do you have some additional limitations I should take into account?\nUser: Yes, I want the number of heteroatoms to be 13.\nAssistant: In that scenario, I advise the chemical with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O."}", "/scratch/micpie/export/rdkit_features/test_100-3.jsonl": "{"text":"The heteroatom count of the molecule with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F is 5."} {"text":"The count of heteroatoms of the chemical with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br is 6."}", "/scratch/micpie/export/rdkit_features/valid_104-4.jsonl": "{"text":"The count of rings of the compound with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3 is 3."} {"text":"The count of rings of the chemical with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C is 2."}", "/scratch/micpie/export/rdkit_features/valid_107-11.jsonl": "{"text":"User: I want to synthesize a molecule with a molecular formula of C24H27N5O3.\nAssistant: That's interesting, do you have some additional limitations?\nUser: Yea, I want the count of hydrogen bond donors to be 2, the count of hydrogen bond acceptors to be 5.\nAssistant: I propose the molecule with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C."} {"text":"User: I want to create a chemical with a molecular formula of C23H27N3O6.\nAssistant: Cool, do you have some additional constraints that I should consider?\nUser: Yep, I want the number of hydrogen bond donors to be 3, the number of hydrogen bond acceptors to be 6.\nAssistant: In that situation, I propose the chemical with SMILES 
CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C."}", "/scratch/micpie/export/rdkit_features/test_33-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O is 4."} {"text":"The number of rotatable bonds of the chemical with SMILES Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br is 4."}", "/scratch/micpie/export/rdkit_features/test_115-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F?\nAnswer: 2"} {"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_32-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3 is C19H30N5O2S+."} {"text":"The chemical formula of the chemical with SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O is C16H18Cl2N2O5."}", "/scratch/micpie/export/rdkit_features/valid_12-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the compound with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl is 2."} {"text":"The number of hydrogen bond acceptor sites of the compound with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3 is 4."}", "/scratch/micpie/export/rdkit_features/train_22-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O is C19H25ClFN3O3."} {"text":"The molecular formula of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4 is C20H33N6O2+."}", "/scratch/micpie/export/rdkit_features/train_13-11.jsonl": "{"text":"User: I want to create a compound with a chemical formula of C21H32FNO2.\nAssistant: Nice, do you have some additional requirements that help me narrow down the 
search?\nUser: I want the count of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 2.\nAssistant: In that scenario, I suggest the compound with SMILES CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC."} {"text":"User: I want to analyze a compound with a chemical formula of C20H30N4O.\nAssistant: That is a very interesting question, do you have some additional conditions that I should consider?\nUser: Indeed, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 3.\nAssistant: Then, I advise the compound with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C."}", "/scratch/micpie/export/rdkit_features/train_10-19.jsonl": "{"text":"Question: What is the acid group count of the compound with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4?\nAnswer: 0"} {"text":"Question: What is the acid group count of the chemical with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_1-1.jsonl": "{"text":"The number of hydrogen bond donors of the chemical with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5 is 1."} {"text":"The count of hydrogen bond donors of the compound with SMILES Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F is 0."}", "/scratch/micpie/export/rdkit_features/train_18-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O?\nAnswer: 9"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_100-11.jsonl": "{"text":"User: I want to synthesize a chemical with a molecular formula of C19H33N2O2+.\nAssistant: That's interesting, do you have some additional requirements that I should 
consider?\nUser: Yeah, I want the count of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 2.\nAssistant: In that scenario, I recommend the chemical with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C."} {"text":"User: I want to make a molecule with a molecular formula of C17H15N3O3.\nAssistant: That's interesting, do you have some additional requirements I should take into account?\nUser: Indeed, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 5.\nAssistant: I recommend the molecule with SMILES COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3."}", "/scratch/micpie/export/rdkit_features/test_8-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C is 63.66."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F is 60.44."}", "/scratch/micpie/export/rdkit_features/valid_107-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C?\nAnswer: 4"} {"text":"Question: What is the count of rings of the compound with SMILES CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_16-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the compound with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C is 2."} {"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N is 5."}", "/scratch/micpie/export/rdkit_features/test_120-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES c1ccc(cc1)CCN(CC(=O)Nc2ccc(cc2)Cl)S(=O)(=O)c3ccc4c(c3)OCCO4?\nAnswer: 5"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the compound 
with SMILES Cc1c(cnn1c2ccccc2)N[C@H](C)C(=O)Nc3cccc(c3Cl)Cl?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_15-23.jsonl": "{"text":"User: I want to create a compound with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptor sites of 5 and a LogP value computed using the Wildman-Crippen method of 2.53.\nAssistant: Do you have some additional constraints I should take into account?\nUser: Yea, I want the chemical formula to be C17H22N4O2.\nAssistant: In that case, I the compound with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC."} {"text":"User: I want to synthesize a chemical with a count of hydrogen bond donors of 2, a count of hydrogen bond acceptors of 3 and a LogP value computed using the Wildman-Crippen method of 1.57.\nAssistant: Nice, do you have some additional limitations that help me narrow down the search?\nUser: Indeed, I want the chemical formula to be C17H28N3OS+.\nAssistant: I advise the chemical with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C."}", "/scratch/micpie/export/rdkit_features/valid_26-6.jsonl": "{"text":"The count of aromatic bonds of the molecule with SMILES CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2 is 6."} {"text":"The aromatic bond count of the molecule with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC is 17."}", "/scratch/micpie/export/rdkit_features/train_109-22.jsonl": "{"text":"User: I want to create a compound with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptor sites of 5.\nAssistant: That's interesting, do you have some additional requirements that help me narrow down the search?\nUser: Yep, I want the count of heteroatoms to be 8.\nAssistant: In that scenario, I the compound with SMILES Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3."} {"text":"User: I want to analyze a chemical with a number of hydrogen bond donor sites of 3 and a number of hydrogen bond acceptor sites of 8.\nAssistant: Do you have some 
additional requirements that help me narrow down the search?\nUser: Yep, I want the count of heteroatoms to be 12.\nAssistant: I the chemical with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C."}", "/scratch/micpie/export/rdkit_features/test_119-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O?\nAnswer: 45.53"} {"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C?\nAnswer: 69.41"}", "/scratch/micpie/export/rdkit_features/test_102-18.jsonl": "{"text":"Question: What is the aromatic bond count of the chemical with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3?\nAnswer: 16"} {"text":"Question: What is the count of aromatic bonds of the molecule with SMILES Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1?\nAnswer: 22"}", "/scratch/micpie/export/rdkit_features/test_20-8.jsonl": "{"text":"The basic group count of the molecule with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br is 0."} {"text":"The basic group count of the chemical with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-] is 0."}", "/scratch/micpie/export/rdkit_features/valid_27-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the molecule with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F?\nAnswer: 10"} {"text":"Question: What is the aromatic bond count of the chemical with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_11-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the compound with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl is 2."} {"text":"The number of hydrogen bond acceptors of the chemical with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C is 1."}", 
"/scratch/micpie/export/rdkit_features/valid_32-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the molecule with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3?\nAnswer: 8"} {"text":"Question: What is the number of heteroatoms of the chemical with SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/train_2-6.jsonl": "{"text":"The count of aromatic bonds of the compound with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6 is 20."} {"text":"The aromatic bond count of the chemical with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O is 21."}", "/scratch/micpie/export/rdkit_features/valid_113-4.jsonl": "{"text":"The count of rings of the chemical with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 3."} {"text":"The count of rings of the compound with SMILES Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 3."}", "/scratch/micpie/export/rdkit_features/train_23-16.jsonl": "{"text":"Question: What is the number of rings of the molecule with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4?\nAnswer: 4"} {"text":"Question: What is the number of rings of the chemical with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_20-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the compound with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br is 2."} {"text":"The number of hydrogen bond acceptors of the chemical with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-] is 6."}", "/scratch/micpie/export/rdkit_features/valid_110-8.jsonl": "{"text":"The count of basic groups of the compound with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O is 1."} {"text":"The number of basic groups of the molecule with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C is 0."}", 
"/scratch/micpie/export/rdkit_features/test_24-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl is 2.10."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C is 0.99."}", "/scratch/micpie/export/rdkit_features/test_27-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the molecule with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F is 2.17."} {"text":"The Wildman-Crippen LogP value of the chemical with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F is 3.82."}", "/scratch/micpie/export/rdkit_features/test_101-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F?\nAnswer: 2"} {"text":"Question: What is the count of rings of the molecule with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_15-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O is 38.77."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2 is 52.50."}", "/scratch/micpie/export/rdkit_features/train_100-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the compound with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C is 2."} {"text":"The count of hydrogen bond acceptors of the molecule with SMILES COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3 is 5."}", "/scratch/micpie/export/rdkit_features/valid_110-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O?\nAnswer: 69.18"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES 
CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 63.82"}", "/scratch/micpie/export/rdkit_features/valid_3-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N is 5."} {"text":"The number of rotatable bonds of the molecule with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC is 6."}", "/scratch/micpie/export/rdkit_features/valid_8-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the compound with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C is 3.64."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br is 3.87."}", "/scratch/micpie/export/rdkit_features/valid_25-19.jsonl": "{"text":"Question: What is the count of acid groups of the chemical with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4?\nAnswer: 0"} {"text":"Question: What is the acid group count of the molecule with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_10-22.jsonl": "{"text":"User: I want to analyze a compound with a count of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 5.\nAssistant: Cool, do you have some additional constraints that help me narrow down the search?\nUser: Yea, I want the number of heteroatoms to be 6.\nAssistant: In that scenario, I propose the compound with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4."} {"text":"User: I want to synthesize a chemical with a number of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 2.\nAssistant: Do you have some additional that help me narrow down the search?\nUser: Yes, I want the heteroatom count to be 4.\nAssistant: I propose the chemical with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl."}", "/scratch/micpie/export/rdkit_features/train_25-9.jsonl": 
"{"text":"The total sum of atomic polarizabilities of the molecule with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C is 63.41."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F is 42.45."}", "/scratch/micpie/export/rdkit_features/test_110-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3 is 8."} {"text":"The number of heteroatoms of the compound with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4 is 9."}", "/scratch/micpie/export/rdkit_features/valid_17-4.jsonl": "{"text":"The ring count of the compound with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S is 3."} {"text":"The ring count of the chemical with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 5."}", "/scratch/micpie/export/rdkit_features/train_26-5.jsonl": "{"text":"The number of rotatable bonds of the chemical with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F is 3."} {"text":"The count of rotatable bonds of the molecule with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C is 7."}", "/scratch/micpie/export/rdkit_features/test_115-22.jsonl": "{"text":"User: I want to make a chemical with a number of hydrogen bond donor sites of 2 and a count of hydrogen bond acceptors of 5.\nAssistant: That's interesting, do you have some additional conditions that help me narrow down the search?\nUser: Indeed, I want the count of heteroatoms to be 8.\nAssistant: In that case, I propose the chemical with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F."} {"text":"User: I want to design a chemical with a number of hydrogen bond donor sites of 1 and a number of hydrogen bond acceptors of 3.\nAssistant: Cool, do you have some additional conditions?\nUser: I want the number of heteroatoms to be 6.\nAssistant: Then, I the chemical with SMILES 
Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F."}", "/scratch/micpie/export/rdkit_features/valid_32-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the molecule with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3 is 0.85."} {"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O is 2.09."}", "/scratch/micpie/export/rdkit_features/train_24-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5?\nAnswer: 55.17"} {"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C?\nAnswer: 63.41"}", "/scratch/micpie/export/rdkit_features/test_27-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the molecule with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F is 2."} {"text":"The number of hydrogen bond donors of the chemical with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F is 2."}", "/scratch/micpie/export/rdkit_features/train_117-7.jsonl": "{"text":"The acid group count of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(cc4)Oc5ccc(cn5)Cl is 0."} {"text":"The number of acid groups of the compound with SMILES C[C@@H](C(C)(C)OC)[NH2+]CCNc1ccccc1 is 0."}", "/scratch/micpie/export/rdkit_features/valid_3-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N is 17."} {"text":"The number of aromatic bonds of the compound with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC is 11."}", "/scratch/micpie/export/rdkit_features/valid_20-19.jsonl": "{"text":"Question: What is the acid group count of the chemical with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3?\nAnswer: 0"} 
{"text":"Question: What is the acid group count of the chemical with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-]?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_3-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_114-22.jsonl": "{"text":"User: I want to analyze a molecule with a number of hydrogen bond donors of 3 and a number of hydrogen bond acceptors of 3.\nAssistant: That's interesting, do you have some additional that help me narrow down the search?\nUser: Yep, I want the number of heteroatoms to be 9.\nAssistant: I recommend the molecule with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl."} {"text":"User: I want to analyze a chemical with a count of hydrogen bond donors of 3 and a number of hydrogen bond acceptors of 2.\nAssistant: That is a very interesting question, do you have some additional that help me narrow down the search?\nUser: Yep, I want the number of heteroatoms to be 6.\nAssistant: In that case, I propose the chemical with SMILES C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4."}", "/scratch/micpie/export/rdkit_features/train_111-23.jsonl": "{"text":"User: I want to make a compound with a number of hydrogen bond donor sites of 0, a number of hydrogen bond acceptors of 8 and a Wildman-Crippen LogP value computed using RDKit of 2.17.\nAssistant: Do you have some additional conditions that I should consider?\nUser: Yea, I want the formula to be C20H27N5O4S.\nAssistant: I propose the compound with SMILES CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C."} {"text":"User: I want to synthesize a molecule with a number of hydrogen bond donor sites of 0, a count 
of hydrogen bond acceptors of 10 and a LogP value computed using the Wildman-Crippen method of 2.11.\nAssistant: That is a very interesting question, do you have some additional requirements?\nUser: Yep, I want the formula to be C22H29N7O3.\nAssistant: In that case, I suggest the molecule with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC."}", "/scratch/micpie/export/rdkit_features/train_25-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the chemical with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C is 8."} {"text":"The number of hydrogen bond acceptor sites of the compound with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F is 2."}", "/scratch/micpie/export/rdkit_features/valid_33-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6?\nAnswer: C22H26N4O2"} {"text":"Question: What is the chemical formula of the molecule with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br?\nAnswer: C15H20BrN4O2S+"}", "/scratch/micpie/export/rdkit_features/test_114-6.jsonl": "{"text":"The count of aromatic bonds of the compound with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C is 18."} {"text":"The aromatic bond count of the chemical with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5 is 26."}", "/scratch/micpie/export/rdkit_features/train_33-22.jsonl": "{"text":"User: I want to make a compound with a number of hydrogen bond donor sites of 3 and a number of hydrogen bond acceptor sites of 4.\nAssistant: Nice, do you have some additional limitations that I should consider?\nUser: Yes, I want the number of heteroatoms to be 10.\nAssistant: In that scenario, I the compound with SMILES COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2."} {"text":"User: I want to design a chemical with a number of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 6.\nAssistant: That is a very 
interesting question, do you have some additional limitations I should take into account?\nUser: Yes, I want the heteroatom count to be 8.\nAssistant: I recommend the chemical with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br."}", "/scratch/micpie/export/rdkit_features/train_33-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the chemical with SMILES COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2 is 4."} {"text":"The number of hydrogen bond acceptors of the molecule with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br is 6."}", "/scratch/micpie/export/rdkit_features/valid_23-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4?\nAnswer: 65.31"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4?\nAnswer: 58.75"}", "/scratch/micpie/export/rdkit_features/test_105-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N?\nAnswer: 46.78"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O?\nAnswer: 48.67"}", "/scratch/micpie/export/rdkit_features/train_118-4.jsonl": "{"text":"The number of rings of the compound with SMILES Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F is 1."} {"text":"The count of rings of the chemical with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3 is 3."}", "/scratch/micpie/export/rdkit_features/test_25-4.jsonl": "{"text":"The ring count of the molecule with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4 is 4."} {"text":"The ring count of the chemical with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3 is 3."}", "/scratch/micpie/export/rdkit_features/valid_0-23.jsonl": "{"text":"User: I want to 
analyze a chemical with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptors of 8 and a Wildman-Crippen LogP value of 5.68.\nAssistant: That is a very interesting question, do you have some additional constraints I should take into account?\nUser: Yea, I want the chemical formula to be C22H10N2O5S2.\nAssistant: In that scenario, I the chemical with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1."} {"text":"User: I want to analyze a molecule with a number of hydrogen bond donor sites of 1, a count of hydrogen bond acceptors of 4 and a LogP value computed using the Wildman-Crippen method of 3.85.\nAssistant: Interesting, do you have some additional requirements I should take into account?\nUser: Yea, I want the chemical formula to be C24H35NO4.\nAssistant: Then, I propose the molecule with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4."}", "/scratch/micpie/export/rdkit_features/train_108-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2?\nAnswer: 7"} {"text":"Question: What is the rotatable bond count of the molecule with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_10-8.jsonl": "{"text":"The count of basic groups of the molecule with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C is 0."} {"text":"The basic group count of the compound with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3 is 0."}", "/scratch/micpie/export/rdkit_features/valid_100-20.jsonl": "{"text":"Question: What is the count of basic groups of the molecule with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F?\nAnswer: 1"} {"text":"Question: What is the count of basic groups of the chemical with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_120-14.jsonl": "{"text":"Question: 
What is the number of hydrogen bond acceptors of the compound with SMILES Cc1ccc(cc1)NC(=O)CSc2nnc(n2C)C=NC(=O)CCc3ccccc3Cl?\nAnswer: 6"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES Cc1c(cnn1c2ccccc2)N[C@@H](C)C(=O)Nc3cccc(c3Cl)Cl?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_30-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3 is 3.66."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F is 3.89."}", "/scratch/micpie/export/rdkit_features/valid_22-5.jsonl": "{"text":"The number of rotatable bonds of the chemical with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O is 3."} {"text":"The number of rotatable bonds of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4 is 6."}", "/scratch/micpie/export/rdkit_features/train_109-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3?\nAnswer: 69.79"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C?\nAnswer: 57.87"}", "/scratch/micpie/export/rdkit_features/test_3-1.jsonl": "{"text":"The number of hydrogen bond donors of the molecule with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC is 2."} {"text":"The count of hydrogen bond donors of the molecule with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F is 3."}", "/scratch/micpie/export/rdkit_features/test_104-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2 is 0."} {"text":"The number of acid groups of the molecule with SMILES 
C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3 is 0."}", "/scratch/micpie/export/rdkit_features/test_29-22.jsonl": "{"text":"User: I want to make a chemical with a number of hydrogen bond donors of 2 and a number of hydrogen bond acceptors of 2.\nAssistant: Interesting, do you have some additional conditions that I should consider?\nUser: Yeah, I want the number of heteroatoms to be 5.\nAssistant: Then, I recommend the chemical with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4."} {"text":"User: I want to synthesize a chemical with a number of hydrogen bond donor sites of 1 and a number of hydrogen bond acceptors of 2.\nAssistant: Nice, do you have some additional limitations I should take into account?\nUser: Yea, I want the number of heteroatoms to be 4.\nAssistant: In that situation, I recommend the chemical with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C."}", "/scratch/micpie/export/rdkit_features/train_22-16.jsonl": "{"text":"Question: What is the ring count of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O?\nAnswer: 3"} {"text":"Question: What is the count of rings of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_23-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4 is C20H33N6O2+."} {"text":"The chemical formula of the chemical with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3 is C12H12IN6O-."}", "/scratch/micpie/export/rdkit_features/valid_104-23.jsonl": "{"text":"User: I want to analyze a molecule with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptor sites of 6 and a Wildman-Crippen LogP value computed using RDKit of -1.13.\nAssistant: Interesting, do you have some additional limitations I should consider?\nUser: Yea, I want the chemical formula to be C14H21N5O4.\nAssistant: In that scenario, I recommend the 
molecule with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3."} {"text":"User: I want to design a chemical with a number of hydrogen bond donors of 3, a count of hydrogen bond acceptors of 2 and a LogP value computed using the Wildman-Crippen method of 2.75.\nAssistant: Do you have some additional constraints?\nUser: Yep, I want the chemical formula to be C21H34N3O2+.\nAssistant: In that case, I suggest the chemical with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C."}", "/scratch/micpie/export/rdkit_features/test_105-4.jsonl": "{"text":"The number of rings of the molecule with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N is 2."} {"text":"The ring count of the chemical with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O is 5."}", "/scratch/micpie/export/rdkit_features/valid_14-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_114-3.jsonl": "{"text":"The heteroatom count of the chemical with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C is 9."} {"text":"The count of heteroatoms of the molecule with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F is 7."}", "/scratch/micpie/export/rdkit_features/train_27-20.jsonl": "{"text":"Question: What is the number of basic groups of the chemical with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C?\nAnswer: 1"} {"text":"Question: What is the basic group count of the compound with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_12-8.jsonl": "{"text":"The basic group count of the compound with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C is 0."} 
{"text":"The basic group count of the molecule with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4 is 0."}", "/scratch/micpie/export/rdkit_features/test_103-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1 is 53.19."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 47.35."}", "/scratch/micpie/export/rdkit_features/train_29-6.jsonl": "{"text":"The count of aromatic bonds of the chemical with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F is 6."} {"text":"The count of aromatic bonds of the molecule with SMILES CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C is 6."}", "/scratch/micpie/export/rdkit_features/test_22-7.jsonl": "{"text":"The count of acid groups of the molecule with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O is 0."} {"text":"The number of acid groups of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4 is 0."}", "/scratch/micpie/export/rdkit_features/test_30-22.jsonl": "{"text":"User: I want to create a chemical with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 3.\nAssistant: That's interesting, do you have some additional constraints I should consider?\nUser: Yep, I want the number of heteroatoms to be 5.\nAssistant: Then, I recommend the chemical with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3."} {"text":"User: I want to make a chemical with a number of hydrogen bond donor sites of 0 and a count of hydrogen bond acceptors of 3.\nAssistant: Nice, do you have some additional constraints?\nUser: Yea, I want the number of heteroatoms to be 6.\nAssistant: In that case, I suggest the chemical with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC."}", "/scratch/micpie/export/rdkit_features/valid_32-4.jsonl": "{"text":"The number of rings of the molecule with 
SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3 is 3."} {"text":"The number of rings of the molecule with SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O is 2."}", "/scratch/micpie/export/rdkit_features/valid_0-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1 is 5."} {"text":"The number of rotatable bonds of the molecule with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4 is 9."}", "/scratch/micpie/export/rdkit_features/train_118-23.jsonl": "{"text":"User: I want to design a molecule with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 1 and a LogP value computed using the Wildman-Crippen method of 1.92.\nAssistant: Nice, do you have some additional constraints I should consider?\nUser: I want the formula to be C14H23FNO+.\nAssistant: In that situation, I propose the molecule with SMILES Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F."} {"text":"User: I want to synthesize a chemical with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptor sites of 6 and a Wildman-Crippen LogP value computed using RDKit of 1.73.\nAssistant: Nice, do you have some additional constraints I should consider?\nUser: Indeed, I want the molecular formula to be C15H18FN3O5.\nAssistant: In that scenario, I advise the chemical with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3."}", "/scratch/micpie/export/rdkit_features/train_17-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F is C23H28F2N2O3."} {"text":"The chemical formula of the chemical with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is C23H24N4O8."}", "/scratch/micpie/export/rdkit_features/valid_112-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC is 66.20."} {"text":"The sum of atomic 
polarizabilities of the compound with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N is 67.83."}", "/scratch/micpie/export/rdkit_features/train_100-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_118-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O is 42.44."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N is 42.71."}", "/scratch/micpie/export/rdkit_features/valid_106-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl?\nAnswer: 3"} {"text":"Question: What is the number of rings of the compound with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_108-22.jsonl": "{"text":"User: I want to make a chemical with a count of hydrogen bond donors of 2 and a number of hydrogen bond acceptor sites of 7.\nAssistant: Do you have some additional requirements I should take into account?\nUser: Indeed, I want the number of heteroatoms to be 12.\nAssistant: In that situation, I advise the chemical with SMILES CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2."} {"text":"User: I want to make a compound with a number of hydrogen bond donors of 3 and a number of hydrogen bond acceptor sites of 4.\nAssistant: Nice, do you have some additional constraints I should take into account?\nUser: Yes, I want the count of heteroatoms to be 8.\nAssistant: Then, I propose the compound with SMILES 
CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3."}", "/scratch/micpie/export/rdkit_features/test_101-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the compound with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F?\nAnswer: 12"} {"text":"Question: What is the number of aromatic bonds of the molecule with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1?\nAnswer: 28"}", "/scratch/micpie/export/rdkit_features/test_27-22.jsonl": "{"text":"User: I want to design a chemical with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptor sites of 2.\nAssistant: Nice, do you have some additional constraints?\nUser: Yep, I want the number of heteroatoms to be 8.\nAssistant: I recommend the chemical with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F."} {"text":"User: I want to analyze a chemical with a number of hydrogen bond donor sites of 2 and a count of hydrogen bond acceptors of 3.\nAssistant: That's interesting, do you have some additional requirements I should consider?\nUser: Yea, I want the heteroatom count to be 7.\nAssistant: I advise the chemical with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F."}", "/scratch/micpie/export/rdkit_features/train_110-16.jsonl": "{"text":"Question: What is the ring count of the compound with SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O?\nAnswer: 4"} {"text":"Question: What is the count of rings of the chemical with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_120-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES Cc1ccc(cc1)NC(=O)CSc2nnc(n2C)C=NC(=O)CCc3ccccc3Cl is 0."} {"text":"The acid group count of the molecule with SMILES Cc1c(cnn1c2ccccc2)N[C@@H](C)C(=O)Nc3cccc(c3Cl)Cl is 0."}", "/scratch/micpie/export/rdkit_features/test_100-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the 
compound with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F?\nAnswer: 54.34"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br?\nAnswer: 40.17"}", "/scratch/micpie/export/rdkit_features/train_115-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F is C21H14BrF2N3O2."} {"text":"The molecular formula of the chemical with SMILES c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is C29H27F2N3O2."}", "/scratch/micpie/export/rdkit_features/train_9-8.jsonl": "{"text":"The basic group count of the chemical with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C is 1."} {"text":"The basic group count of the compound with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F is 0."}", "/scratch/micpie/export/rdkit_features/test_102-0.jsonl": "{"text":"The formula of the chemical with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3 is C23H25NO3."} {"text":"The molecular formula of the compound with SMILES Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1 is C28H26N2O6."}", "/scratch/micpie/export/rdkit_features/test_11-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2?\nAnswer: C18H24F3NO"} {"text":"Question: What is the chemical formula of the compound with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C?\nAnswer: C22H31NO2"}", "/scratch/micpie/export/rdkit_features/train_23-5.jsonl": "{"text":"The number of rotatable bonds of the molecule with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4 is 6."} {"text":"The rotatable bond count of the molecule with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3 is 3."}", "/scratch/micpie/export/rdkit_features/train_0-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the 
molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1?\nAnswer: 11"} {"text":"Question: What is the count of heteroatoms of the molecule with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_29-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4?\nAnswer: 64.17"} {"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C?\nAnswer: 66.10"}", "/scratch/micpie/export/rdkit_features/valid_29-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl?\nAnswer: 62.15"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3?\nAnswer: 58.55"}", "/scratch/micpie/export/rdkit_features/train_111-11.jsonl": "{"text":"User: I want to design a chemical with a formula of C20H27N5O4S.\nAssistant: Interesting, do you have some additional requirements I should take into account?\nUser: Yeah, I want the number of hydrogen bond donors to be 0, the count of hydrogen bond acceptors to be 8.\nAssistant: In that situation, I recommend the chemical with SMILES CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C."} {"text":"User: I want to analyze a chemical with a chemical formula of C22H29N7O3.\nAssistant: That is a very interesting question, do you have some additional conditions I should take into account?\nUser: Yep, I want the count of hydrogen bond donors to be 0, the number of hydrogen bond acceptors to be 10.\nAssistant: In that case, I suggest the chemical with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC."}", "/scratch/micpie/export/rdkit_features/valid_23-0.jsonl": 
"{"text":"The chemical formula of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4 is C19H34N5OS+."} {"text":"The chemical formula of the molecule with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4 is C19H24N7O2-."}", "/scratch/micpie/export/rdkit_features/valid_0-4.jsonl": "{"text":"The count of rings of the compound with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1 is 5."} {"text":"The number of rings of the molecule with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4 is 4."}", "/scratch/micpie/export/rdkit_features/train_27-3.jsonl": "{"text":"The heteroatom count of the compound with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C is 5."} {"text":"The number of heteroatoms of the compound with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4 is 6."}", "/scratch/micpie/export/rdkit_features/test_1-3.jsonl": "{"text":"The heteroatom count of the chemical with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4 is 6."} {"text":"The count of heteroatoms of the compound with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC is 8."}", "/scratch/micpie/export/rdkit_features/test_8-19.jsonl": "{"text":"Question: What is the number of acid groups of the molecule with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C?\nAnswer: 0"} {"text":"Question: What is the acid group count of the compound with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_4-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4 is 3.86."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3 is 2.50."}", "/scratch/micpie/export/rdkit_features/test_8-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of 
the chemical with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C?\nAnswer: 7"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_119-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the chemical with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O?\nAnswer: 5"} {"text":"Question: What is the rotatable bond count of the chemical with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_101-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the chemical with SMILES COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3?\nAnswer: 7"} {"text":"Question: What is the number of heteroatoms of the compound with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/valid_8-3.jsonl": "{"text":"The heteroatom count of the compound with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C is 7."} {"text":"The number of heteroatoms of the compound with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br is 9."}", "/scratch/micpie/export/rdkit_features/valid_114-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_115-11.jsonl": "{"text":"User: I want to analyze a compound with a molecular formula of C24H22N4O4S.\nAssistant: Cool, do you have some additional constraints I should take into account?\nUser: Yes, I want the number of hydrogen bond donor sites to be 1, the count of hydrogen bond acceptors 
to be 8.\nAssistant: In that situation, I recommend the compound with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4."} {"text":"User: I want to synthesize a molecule with a chemical formula of C27H25F4N3O2.\nAssistant: Do you have some additional limitations I should take into account?\nUser: Indeed, I want the count of hydrogen bond donors to be 2, the count of hydrogen bond acceptors to be 2.\nAssistant: In that situation, I recommend the molecule with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F."}", "/scratch/micpie/export/rdkit_features/train_24-7.jsonl": "{"text":"The number of acid groups of the molecule with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5 is 3."} {"text":"The number of acid groups of the chemical with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C is 0."}", "/scratch/micpie/export/rdkit_features/train_9-19.jsonl": "{"text":"Question: What is the count of acid groups of the chemical with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C?\nAnswer: 0"} {"text":"Question: What is the acid group count of the compound with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_9-15.jsonl": "{"text":"Question: What is the heteroatom count of the molecule with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2?\nAnswer: 7"} {"text":"Question: What is the number of heteroatoms of the chemical with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_117-5.jsonl": "{"text":"The rotatable bond count of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc5cc(ccc5c4)Br is 4."} {"text":"The rotatable bond count of the molecule with SMILES C[C@H](C(C)(C)OC)[NH2+]CCNc1ccccc1 is 7."}", "/scratch/micpie/export/rdkit_features/test_5-3.jsonl": "{"text":"The count of heteroatoms of the chemical 
with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5 is 6."} {"text":"The number of heteroatoms of the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F is 7."}", "/scratch/micpie/export/rdkit_features/train_8-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_18-11.jsonl": "{"text":"User: I want to create a molecule with a molecular formula of C21H27N3O9S.\nAssistant: Interesting, do you have some additional requirements that I should consider?\nUser: Yes, I want the number of hydrogen bond donor sites to be 1, the count of hydrogen bond acceptors to be 9.\nAssistant: In that scenario, I suggest the molecule with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O."} {"text":"User: I want to design a molecule with a formula of C17H18Cl2N2O4S.\nAssistant: Do you have some additional requirements I should consider?\nUser: Yep, I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 4.\nAssistant: Then, I suggest the molecule with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl."}", "/scratch/micpie/export/rdkit_features/train_103-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1 is 1."} {"text":"The number of hydrogen bond donors of the molecule with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 1."}", "/scratch/micpie/export/rdkit_features/train_32-19.jsonl": "{"text":"Question: What is the number of acid groups of the compound with SMILES C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C?\nAnswer: 0"} 
{"text":"Question: What is the count of acid groups of the molecule with SMILES C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_2-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C is 61.94."} {"text":"The sum of atomic polarizabilities of the compound with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4 is 64.62."}", "/scratch/micpie/export/rdkit_features/valid_14-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C?\nAnswer: 59.07"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O?\nAnswer: 40.53"}", "/scratch/micpie/export/rdkit_features/valid_21-3.jsonl": "{"text":"The count of heteroatoms of the compound with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-] is 8."} {"text":"The heteroatom count of the molecule with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3 is 6."}", "/scratch/micpie/export/rdkit_features/valid_19-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC is 3.70."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F is 3.99."}", "/scratch/micpie/export/rdkit_features/test_22-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the molecule with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O?\nAnswer: 6"} {"text":"Question: What is the number of heteroatoms of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/valid_9-21.jsonl": 
"{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2?\nAnswer: 63.11"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4?\nAnswer: 73.59"}", "/scratch/micpie/export/rdkit_features/test_111-3.jsonl": "{"text":"The heteroatom count of the molecule with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4 is 9."} {"text":"The heteroatom count of the compound with SMILES C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC is 9."}", "/scratch/micpie/export/rdkit_features/test_105-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N is C17H15BrN2O2."} {"text":"The chemical formula of the compound with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O is C16H21N3O4."}", "/scratch/micpie/export/rdkit_features/test_25-12.jsonl": "{"text":"Question: What is the formula of the molecule with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4?\nAnswer: C21H38N5O2+"} {"text":"Question: What is the chemical formula of the chemical with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3?\nAnswer: C19H19F3N2O2"}", "/scratch/micpie/export/rdkit_features/valid_115-0.jsonl": "{"text":"The molecular formula of the chemical with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4 is C24H22N4O4S."} {"text":"The chemical formula of the molecule with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F is C27H25F4N3O2."}", "/scratch/micpie/export/rdkit_features/train_2-12.jsonl": "{"text":"Question: What is the formula of the compound with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6?\nAnswer: C22H24N5O2S+"} {"text":"Question: What is the chemical formula of the compound with SMILES 
COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O?\nAnswer: C25H24N2O4"}", "/scratch/micpie/export/rdkit_features/train_28-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl?\nAnswer: 0"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_2-23.jsonl": "{"text":"User: I want to make a compound with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value computed using RDKit of 2.43.\nAssistant: Do you have some additional constraints I should consider?\nUser: I want the chemical formula to be C20H27ClFN4O2+.\nAssistant: In that scenario, I suggest the compound with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C."} {"text":"User: I want to create a chemical with a count of hydrogen bond donors of 0, a number of hydrogen bond acceptors of 5 and a Wildman-Crippen LogP value of 3.64.\nAssistant: Cool, do you have some additional constraints?\nUser: Yes, I want the molecular formula to be C24H25N3O3.\nAssistant: I advise the chemical with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4."}", "/scratch/micpie/export/rdkit_features/train_106-22.jsonl": "{"text":"User: I want to design a molecule with a number of hydrogen bond donors of 0 and a number of hydrogen bond acceptor sites of 6.\nAssistant: Cool, do you have some additional constraints I should consider?\nUser: Yep, I want the heteroatom count to be 9.\nAssistant: In that case, I suggest the molecule with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O."} {"text":"User: I want to design a compound with a number of hydrogen bond donors of 2 and a count of hydrogen bond acceptors of 6.\nAssistant: That is a very interesting question, do you have some additional constraints that help me narrow down the 
search?\nUser: Yea, I want the number of heteroatoms to be 11.\nAssistant: In that scenario, I advise the compound with SMILES CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl."}", "/scratch/micpie/export/rdkit_features/valid_110-5.jsonl": "{"text":"The rotatable bond count of the chemical with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O is 3."} {"text":"The count of rotatable bonds of the chemical with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C is 6."}", "/scratch/micpie/export/rdkit_features/train_7-23.jsonl": "{"text":"User: I want to design a compound with a count of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 5 and a Wildman-Crippen LogP value of 3.60.\nAssistant: Do you have some additional constraints I should consider?\nUser: Yea, I want the chemical formula to be C22H24ClN3O3.\nAssistant: I advise the compound with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl."} {"text":"User: I want to create a chemical with a number of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 5 and a LogP value computed using the Wildman-Crippen method of 3.87.\nAssistant: Interesting, do you have some additional requirements I should take into account?\nUser: Yea, I want the formula to be C21H24F2N2O5.\nAssistant: In that scenario, I recommend the chemical with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F."}", "/scratch/micpie/export/rdkit_features/test_119-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O is 45.53."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C is 69.41."}", "/scratch/micpie/export/rdkit_features/test_111-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the compound with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4?\nAnswer: 9"} {"text":"Question: What is the number 
of heteroatoms of the chemical with SMILES C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/test_7-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC?\nAnswer: 66.38"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC?\nAnswer: 58.07"}", "/scratch/micpie/export/rdkit_features/valid_24-20.jsonl": "{"text":"Question: What is the count of basic groups of the chemical with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the compound with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_28-16.jsonl": "{"text":"Question: What is the ring count of the compound with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br?\nAnswer: 3"} {"text":"Question: What is the ring count of the chemical with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_24-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_23-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4?\nAnswer: 5"} {"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4?\nAnswer: 5"}", 
"/scratch/micpie/export/rdkit_features/train_107-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond donors of the molecule with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_114-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_107-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl is 4."} {"text":"The count of rotatable bonds of the chemical with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5 is 5."}", "/scratch/micpie/export/rdkit_features/test_7-17.jsonl": "{"text":"Question: What is the rotatable bond count of the chemical with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC?\nAnswer: 6"} {"text":"Question: What is the rotatable bond count of the chemical with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_26-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3?\nAnswer: 3"} {"text":"Question: What is the number of rings of the molecule with SMILES COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_27-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C?\nAnswer: C21H31N4O+"} 
{"text":"Question: What is the chemical formula of the compound with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4?\nAnswer: C21H23N3O3"}", "/scratch/micpie/export/rdkit_features/test_111-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4?\nAnswer: 72.20"} {"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC?\nAnswer: 68.10"}", "/scratch/micpie/export/rdkit_features/test_17-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F?\nAnswer: 60.81"} {"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 63.41"}", "/scratch/micpie/export/rdkit_features/train_110-22.jsonl": "{"text":"User: I want to analyze a chemical with a count of hydrogen bond donors of 3 and a number of hydrogen bond acceptors of 7.\nAssistant: Cool, do you have some additional conditions that I should consider?\nUser: Yes, I want the count of heteroatoms to be 10.\nAssistant: In that case, I suggest the chemical with SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O."} {"text":"User: I want to synthesize a molecule with a number of hydrogen bond donor sites of 0 and a count of hydrogen bond acceptors of 8.\nAssistant: That's interesting, do you have some additional constraints that help me narrow down the search?\nUser: Indeed, I want the number of heteroatoms to be 10.\nAssistant: Then, I advise the molecule with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C."}", "/scratch/micpie/export/rdkit_features/train_24-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the chemical with 
SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5 is 1.65."} {"text":"The Wildman-Crippen LogP value of the chemical with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C is 2.44."}", "/scratch/micpie/export/rdkit_features/train_26-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F is 42.45."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C is 62.83."}", "/scratch/micpie/export/rdkit_features/valid_114-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_21-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-] is 2."} {"text":"The number of hydrogen bond donors of the molecule with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3 is 2."}", "/scratch/micpie/export/rdkit_features/valid_100-23.jsonl": "{"text":"User: I want to design a molecule with a number of hydrogen bond donor sites of 2, a count of hydrogen bond acceptors of 1 and a Wildman-Crippen LogP value computed using RDKit of 1.37.\nAssistant: Interesting, do you have some additional conditions I should take into account?\nUser: Indeed, I want the formula to be C18H27FN3O+.\nAssistant: In that situation, I recommend the molecule with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F."} {"text":"User: I want to make a molecule with a number of hydrogen bond donor sites of 1, a count of hydrogen bond acceptors of 2 and a LogP value computed using the Wildman-Crippen method of 1.39.\nAssistant: 
Nice, do you have some additional requirements I should take into account?\nUser: Indeed, I want the formula to be C20H24N3O+.\nAssistant: In that case, I recommend the molecule with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4."}", "/scratch/micpie/export/rdkit_features/valid_112-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC is C21H26N6O3S."} {"text":"The chemical formula of the compound with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N is C22H27N6O2S+."}", "/scratch/micpie/export/rdkit_features/train_114-8.jsonl": "{"text":"The basic group count of the compound with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C is 0."} {"text":"The count of basic groups of the molecule with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F is 1."}", "/scratch/micpie/export/rdkit_features/test_33-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the chemical with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O?\nAnswer: 6"} {"text":"Question: What is the aromatic bond count of the chemical with SMILES Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br?\nAnswer: 16"}", "/scratch/micpie/export/rdkit_features/valid_110-11.jsonl": "{"text":"User: I want to make a molecule with a formula of C24H29N4O4+.\nAssistant: That is a very interesting question, do you have some additional requirements that help me narrow down the search?\nUser: Yeah, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 4.\nAssistant: I advise the molecule with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O."} {"text":"User: I want to design a compound with a molecular formula of C19H25N5O3S2.\nAssistant: Do you have some additional requirements I should take into account?\nUser: Indeed, I want the number of hydrogen bond donor sites to be 0, the count of hydrogen bond 
acceptors to be 8.\nAssistant: Then, I propose the compound with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C."}", "/scratch/micpie/export/rdkit_features/valid_14-1.jsonl": "{"text":"The count of hydrogen bond donors of the chemical with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C is 1."} {"text":"The number of hydrogen bond donors of the chemical with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O is 1."}", "/scratch/micpie/export/rdkit_features/train_6-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_19-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC?\nAnswer: 4"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_110-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the compound with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O is 4."} {"text":"The number of hydrogen bond acceptors of the molecule with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C is 8."}", "/scratch/micpie/export/rdkit_features/train_6-7.jsonl": "{"text":"The number of acid groups of the compound with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4 is 0."} {"text":"The acid group count of the molecule with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C is 0."}", "/scratch/micpie/export/rdkit_features/test_112-9.jsonl": "{"text":"The total sum of atomic polarizabilities of 
the compound with SMILES C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC is 66.20."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N is 67.83."}", "/scratch/micpie/export/rdkit_features/valid_26-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2?\nAnswer: 52.86"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC?\nAnswer: 54.38"}", "/scratch/micpie/export/rdkit_features/test_30-6.jsonl": "{"text":"The count of aromatic bonds of the chemical with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3 is 12."} {"text":"The count of aromatic bonds of the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC is 12."}", "/scratch/micpie/export/rdkit_features/test_11-6.jsonl": "{"text":"The aromatic bond count of the compound with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2 is 6."} {"text":"The number of aromatic bonds of the chemical with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C is 6."}", "/scratch/micpie/export/rdkit_features/valid_31-19.jsonl": "{"text":"Question: What is the number of acid groups of the compound with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC?\nAnswer: 0"} {"text":"Question: What is the acid group count of the chemical with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_112-6.jsonl": "{"text":"The aromatic bond count of the molecule with SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC is 11."} {"text":"The aromatic bond count of the compound with SMILES c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 6."}", "/scratch/micpie/export/rdkit_features/train_9-0.jsonl": "{"text":"The chemical 
formula of the molecule with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C is C25H32N3O3+."} {"text":"The formula of the chemical with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F is C22H32FN5O."}", "/scratch/micpie/export/rdkit_features/train_16-8.jsonl": "{"text":"The number of basic groups of the chemical with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C is 1."} {"text":"The basic group count of the molecule with SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F is 0."}", "/scratch/micpie/export/rdkit_features/train_0-5.jsonl": "{"text":"The number of rotatable bonds of the compound with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1 is 3."} {"text":"The count of rotatable bonds of the chemical with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4 is 7."}", "/scratch/micpie/export/rdkit_features/test_113-3.jsonl": "{"text":"The count of heteroatoms of the chemical with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N is 9."} {"text":"The number of heteroatoms of the compound with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 10."}", "/scratch/micpie/export/rdkit_features/train_7-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the chemical with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl?\nAnswer: 7"} {"text":"Question: What is the number of heteroatoms of the chemical with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/test_28-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_109-17.jsonl": "{"text":"Question: What is 
the count of rotatable bonds of the molecule with SMILES Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3?\nAnswer: 6"} {"text":"Question: What is the number of rotatable bonds of the compound with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_32-20.jsonl": "{"text":"Question: What is the basic group count of the chemical with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3?\nAnswer: 1"} {"text":"Question: What is the number of basic groups of the molecule with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_26-12.jsonl": "{"text":"Question: What is the formula of the chemical with SMILES CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2?\nAnswer: C18H24F3NO3"} {"text":"Question: What is the chemical formula of the compound with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC?\nAnswer: C19H20N4O4"}", "/scratch/micpie/export/rdkit_features/test_17-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the chemical with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F?\nAnswer: 8"} {"text":"Question: What is the number of heteroatoms of the molecule with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/test_101-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the molecule with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F is 4."} {"text":"The count of hydrogen bond acceptors of the chemical with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1 is 6."}", "/scratch/micpie/export/rdkit_features/test_26-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the chemical with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3?\nAnswer: 6"} {"text":"Question: What is the number of rotatable bonds of the chemical with SMILES 
COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_24-22.jsonl": "{"text":"User: I want to analyze a chemical with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptors of 6.\nAssistant: That's interesting, do you have some additional constraints that help me narrow down the search?\nUser: Yeah, I want the number of heteroatoms to be 9.\nAssistant: I propose the chemical with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5."} {"text":"User: I want to design a molecule with a number of hydrogen bond donor sites of 1 and a count of hydrogen bond acceptors of 8.\nAssistant: Cool, do you have some additional limitations I should take into account?\nUser: Indeed, I want the heteroatom count to be 9.\nAssistant: I advise the molecule with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C."}", "/scratch/micpie/export/rdkit_features/valid_106-8.jsonl": "{"text":"The basic group count of the chemical with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl is 0."} {"text":"The count of basic groups of the compound with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F is 1."}", "/scratch/micpie/export/rdkit_features/train_24-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the chemical with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5 is 6."} {"text":"The count of hydrogen bond acceptors of the compound with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C is 8."}", "/scratch/micpie/export/rdkit_features/train_14-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the chemical with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C?\nAnswer: 5"} {"text":"Question: What is the number of heteroatoms of the compound with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_101-18.jsonl": "{"text":"Question: What is the count of 
aromatic bonds of the chemical with SMILES COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3?\nAnswer: 16"} {"text":"Question: What is the aromatic bond count of the chemical with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1?\nAnswer: 18"}", "/scratch/micpie/export/rdkit_features/train_8-4.jsonl": "{"text":"The count of rings of the chemical with SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N is 4."} {"text":"The number of rings of the chemical with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO is 3."}", "/scratch/micpie/export/rdkit_features/valid_8-23.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptor sites of 5 and a Wildman-Crippen LogP value of 3.64.\nAssistant: That's interesting, do you have some additional constraints?\nUser: Yeah, I want the formula to be C23H36N2O5.\nAssistant: In that situation, I suggest the chemical with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C."} {"text":"User: I want to synthesize a compound with a number of hydrogen bond donor sites of 0, a number of hydrogen bond acceptor sites of 2 and a Wildman-Crippen LogP value of 3.87.\nAssistant: Interesting, do you have some additional conditions?\nUser: Yeah, I want the formula to be C11H7Br2F4NO2.\nAssistant: In that situation, I suggest the compound with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br."}", "/scratch/micpie/export/rdkit_features/valid_104-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3?\nAnswer: 47.35"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C?\nAnswer: 64.53"}", "/scratch/micpie/export/rdkit_features/valid_116-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES 
CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 5"} {"text":"Question: What is the rotatable bond count of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_112-18.jsonl": "{"text":"Question: What is the aromatic bond count of the molecule with SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC?\nAnswer: 11"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_0-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the compound with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1?\nAnswer: 9"} {"text":"Question: What is the number of heteroatoms of the compound with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_110-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3?\nAnswer: C23H30N3O4S+"} {"text":"Question: What is the formula of the chemical with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4?\nAnswer: C21H31N5O3S"}", "/scratch/micpie/export/rdkit_features/valid_106-12.jsonl": "{"text":"Question: What is the molecular formula of the compound with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl?\nAnswer: C13H11ClFN3O3"} {"text":"Question: What is the chemical formula of the chemical with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F?\nAnswer: C18H24F4N3O3S+"}", "/scratch/micpie/export/rdkit_features/test_23-22.jsonl": "{"text":"User: I want to design a compound with a number of hydrogen bond donor sites of 1 and a number of hydrogen bond acceptor sites of 5.\nAssistant: Do you have some additional limitations that help me narrow 
down the search?\nUser: Indeed, I want the heteroatom count to be 7.\nAssistant: In that case, I recommend the compound with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4."} {"text":"User: I want to analyze a molecule with a number of hydrogen bond donors of 2 and a number of hydrogen bond acceptors of 5.\nAssistant: That is a very interesting question, do you have some additional requirements?\nUser: Yeah, I want the heteroatom count to be 10.\nAssistant: Then, I suggest the molecule with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4."}", "/scratch/micpie/export/rdkit_features/valid_23-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the molecule with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_115-16.jsonl": "{"text":"Question: What is the count of rings of the molecule with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F?\nAnswer: 3"} {"text":"Question: What is the count of rings of the molecule with SMILES c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_101-7.jsonl": "{"text":"The number of acid groups of the compound with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F is 0."} {"text":"The number of acid groups of the chemical with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1 is 0."}", "/scratch/micpie/export/rdkit_features/train_116-4.jsonl": "{"text":"The ring count of the compound with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 6."} {"text":"The count of rings of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br is 5."}", "/scratch/micpie/export/rdkit_features/test_25-23.jsonl": "{"text":"User: I want to analyze a compound 
with a number of hydrogen bond donor sites of 1, a count of hydrogen bond acceptors of 6 and a Wildman-Crippen LogP value computed using RDKit of 0.93.\nAssistant: Nice, do you have some additional limitations?\nUser: Yeah, I want the molecular formula to be C21H38N5O2+.\nAssistant: In that situation, I propose the compound with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4."} {"text":"User: I want to create a molecule with a number of hydrogen bond donors of 2, a count of hydrogen bond acceptors of 3 and a LogP value computed using the Wildman-Crippen method of 3.88.\nAssistant: Interesting, do you have some additional constraints I should take into account?\nUser: Yes, I want the molecular formula to be C19H19F3N2O2.\nAssistant: In that scenario, I suggest the molecule with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3."}", "/scratch/micpie/export/rdkit_features/train_6-5.jsonl": "{"text":"The rotatable bond count of the compound with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4 is 4."} {"text":"The number of rotatable bonds of the molecule with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C is 6."}", "/scratch/micpie/export/rdkit_features/test_17-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F is 60.81."} {"text":"The sum of atomic polarizabilities of the compound with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 63.41."}", "/scratch/micpie/export/rdkit_features/test_111-1.jsonl": "{"text":"The number of hydrogen bond donors of the molecule with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4 is 1."} {"text":"The number of hydrogen bond donor sites of the compound with SMILES C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC is 0."}", "/scratch/micpie/export/rdkit_features/valid_0-12.jsonl": "{"text":"Question: What is the 
formula of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1?\nAnswer: C22H10N2O5S2"} {"text":"Question: What is the molecular formula of the molecule with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4?\nAnswer: C24H35NO4"}", "/scratch/micpie/export/rdkit_features/valid_9-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the molecule with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2 is 3."} {"text":"The count of hydrogen bond acceptors of the molecule with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4 is 6."}", "/scratch/micpie/export/rdkit_features/valid_102-5.jsonl": "{"text":"The number of rotatable bonds of the compound with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1 is 6."} {"text":"The rotatable bond count of the compound with SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2 is 33."}", "/scratch/micpie/export/rdkit_features/train_112-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC?\nAnswer: 0"} {"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_22-23.jsonl": "{"text":"User: I want to analyze a chemical with a number of hydrogen bond donor sites of 1, a count of hydrogen bond acceptors of 5 and a LogP value computed using the Wildman-Crippen method of 2.41.\nAssistant: Interesting, do you have some additional conditions I should consider?\nUser: Yea, I want the chemical formula to be C17H24ClN3O3S.\nAssistant: Then, I advise the chemical with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O."} {"text":"User: I want to design a molecule with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 7 
and a LogP value computed using the Wildman-Crippen method of 0.74.\nAssistant: Nice, do you have some additional requirements that I should consider?\nUser: Indeed, I want the chemical formula to be C20H33N6O2+.\nAssistant: I propose the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4."}", "/scratch/micpie/export/rdkit_features/test_29-4.jsonl": "{"text":"The count of rings of the compound with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4 is 4."} {"text":"The number of rings of the chemical with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C is 2."}", "/scratch/micpie/export/rdkit_features/train_13-17.jsonl": "{"text":"Question: What is the rotatable bond count of the compound with SMILES CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 8"} {"text":"Question: What is the number of rotatable bonds of the molecule with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_12-6.jsonl": "{"text":"The count of aromatic bonds of the chemical with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C is 6."} {"text":"The count of aromatic bonds of the compound with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC is 6."}", "/scratch/micpie/export/rdkit_features/train_119-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the compound with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO?\nAnswer: 12"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/valid_109-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl?\nAnswer: 5"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES 
c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_12-22.jsonl": "{"text":"User: I want to create a chemical with a number of hydrogen bond donor sites of 0 and a count of hydrogen bond acceptors of 1.\nAssistant: Do you have some additional limitations I should consider?\nUser: Yep, I want the number of heteroatoms to be 4.\nAssistant: Then, I suggest the chemical with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C."} {"text":"User: I want to make a chemical with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 2.\nAssistant: Cool, do you have some additional limitations I should consider?\nUser: Yea, I want the count of heteroatoms to be 4.\nAssistant: In that scenario, I advise the chemical with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC."}", "/scratch/micpie/export/rdkit_features/train_101-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the chemical with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3?\nAnswer: 6"} {"text":"Question: What is the number of heteroatoms of the compound with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_27-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F is 56.26."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F is 57.46."}", "/scratch/micpie/export/rdkit_features/valid_18-11.jsonl": "{"text":"User: I want to make a chemical with a molecular formula of C22H23N5O7.\nAssistant: Nice, do you have some additional constraints I should consider?\nUser: Yea, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 9.\nAssistant: In that scenario, I recommend the chemical with SMILES 
c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O."} {"text":"User: I want to design a compound with a formula of C12H14Br2F3NO.\nAssistant: That's interesting, do you have some additional requirements that I should consider?\nUser: Yep, I want the count of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 2.\nAssistant: In that scenario, I suggest the compound with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O."}", "/scratch/micpie/export/rdkit_features/train_105-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donors of the molecule with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_20-5.jsonl": "{"text":"The number of rotatable bonds of the molecule with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3 is 10."} {"text":"The count of rotatable bonds of the molecule with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-] is 6."}", "/scratch/micpie/export/rdkit_features/train_10-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4?\nAnswer: 5"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_4-5.jsonl": "{"text":"The number of rotatable bonds of the compound with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4 is 6."} {"text":"The rotatable bond count of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C is 4."}", "/scratch/micpie/export/rdkit_features/valid_18-15.jsonl": "{"text":"Question: What is the heteroatom 
count of the compound with SMILES c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 12"} {"text":"Question: What is the heteroatom count of the chemical with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_17-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/train_30-4.jsonl": "{"text":"The ring count of the molecule with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F is 1."} {"text":"The count of rings of the chemical with SMILES CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC is 4."}", "/scratch/micpie/export/rdkit_features/train_11-8.jsonl": "{"text":"The number of basic groups of the chemical with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl is 0."} {"text":"The count of basic groups of the chemical with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C is 0."}", "/scratch/micpie/export/rdkit_features/test_11-20.jsonl": "{"text":"Question: What is the count of basic groups of the molecule with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2?\nAnswer: 0"} {"text":"Question: What is the basic group count of the compound with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_100-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C?\nAnswer: 59.25"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES 
COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3?\nAnswer: 45.63"}", "/scratch/micpie/export/rdkit_features/train_16-22.jsonl": "{"text":"User: I want to synthesize a compound with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptor sites of 3.\nAssistant: Do you have some additional conditions that help me narrow down the search?\nUser: I want the heteroatom count to be 5.\nAssistant: In that situation, I advise the compound with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C."} {"text":"User: I want to synthesize a chemical with a number of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 3.\nAssistant: That's interesting, do you have some additional I should take into account?\nUser: I want the heteroatom count to be 7.\nAssistant: In that situation, I propose the chemical with SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F."}", "/scratch/micpie/export/rdkit_features/train_8-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N?\nAnswer: 6"} {"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_7-20.jsonl": "{"text":"Question: What is the number of basic groups of the chemical with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the chemical with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_119-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O is 1.71."} {"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES 
Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C is 3.88."}", "/scratch/micpie/export/rdkit_features/test_14-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the compound with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3 is 58.17."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O is 38.77."}", "/scratch/micpie/export/rdkit_features/test_106-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O is 6."} {"text":"The count of hydrogen bond acceptors of the compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC is 6."}", "/scratch/micpie/export/rdkit_features/train_14-18.jsonl": "{"text":"Question: What is the aromatic bond count of the chemical with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C?\nAnswer: 11"} {"text":"Question: What is the aromatic bond count of the molecule with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_33-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O is 3."} {"text":"The count of hydrogen bond acceptors of the compound with SMILES Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br is 8."}", "/scratch/micpie/export/rdkit_features/valid_6-7.jsonl": "{"text":"The count of acid groups of the molecule with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+] is 0."} {"text":"The acid group count of the compound with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F is 2."}", "/scratch/micpie/export/rdkit_features/train_23-7.jsonl": "{"text":"The number of acid groups of the compound with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4 is 0."} {"text":"The number of acid groups of the molecule with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3 
is 3."}", "/scratch/micpie/export/rdkit_features/test_29-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4 is 2."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C is 2."}", "/scratch/micpie/export/rdkit_features/valid_0-18.jsonl": "{"text":"Question: What is the aromatic bond count of the compound with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1?\nAnswer: 25"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_13-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the molecule with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_7-11.jsonl": "{"text":"User: I want to create a molecule with a formula of C24H26N4O3.\nAssistant: Do you have some additional requirements I should take into account?\nUser: I want the number of hydrogen bond donor sites to be 1, the number of hydrogen bond acceptors to be 6.\nAssistant: Then, I suggest the molecule with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC."} {"text":"User: I want to create a chemical with a molecular formula of C21H18N2O5S.\nAssistant: That's interesting, do you have some additional constraints?\nUser: I want the number of hydrogen bond donor sites to be 1, the number of hydrogen bond acceptor sites to be 7.\nAssistant: In that case, I recommend the chemical with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC."}", "/scratch/micpie/export/rdkit_features/test_2-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of 
the molecule with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C?\nAnswer: 7"} {"text":"Question: What is the rotatable bond count of the compound with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_24-19.jsonl": "{"text":"Question: What is the number of acid groups of the chemical with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl?\nAnswer: 3"} {"text":"Question: What is the acid group count of the compound with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_8-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C?\nAnswer: 63.66"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F?\nAnswer: 60.44"}", "/scratch/micpie/export/rdkit_features/train_21-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC is 0.69."} {"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O is 2.49."}", "/scratch/micpie/export/rdkit_features/train_8-8.jsonl": "{"text":"The number of basic groups of the chemical with SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N is 0."} {"text":"The basic group count of the compound with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO is 1."}", "/scratch/micpie/export/rdkit_features/valid_21-16.jsonl": "{"text":"Question: What is the number of rings of the compound with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-]?\nAnswer: 2"} {"text":"Question: What is the count of rings of the molecule with SMILES 
C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_103-23.jsonl": "{"text":"User: I want to design a chemical with a count of hydrogen bond donors of 4, a number of hydrogen bond acceptor sites of 7 and a Wildman-Crippen LogP value computed using RDKit of 1.51.\nAssistant: Nice, do you have some additional limitations I should consider?\nUser: Indeed, I want the chemical formula to be C22H36O7.\nAssistant: I suggest the chemical with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO."} {"text":"User: I want to synthesize a chemical with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 6 and a Wildman-Crippen LogP value computed using RDKit of -1.16.\nAssistant: Interesting, do you have some additional constraints I should take into account?\nUser: Yeah, I want the molecular formula to be C14H14N6O3.\nAssistant: In that situation, I the chemical with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N."}", "/scratch/micpie/export/rdkit_features/valid_25-20.jsonl": "{"text":"Question: What is the basic group count of the chemical with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4?\nAnswer: 0"} {"text":"Question: What is the basic group count of the molecule with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_106-20.jsonl": "{"text":"Question: What is the count of basic groups of the chemical with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O?\nAnswer: 0"} {"text":"Question: What is the basic group count of the chemical with SMILES CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_105-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F?\nAnswer: 51.47"} {"text":"Question: What is the sum of 
atomic polarizabilities of the chemical with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O?\nAnswer: 39.32"}", "/scratch/micpie/export/rdkit_features/valid_102-12.jsonl": "{"text":"Question: What is the formula of the compound with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1?\nAnswer: C23H21F2N3O2"} {"text":"Question: What is the molecular formula of the chemical with SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2?\nAnswer: C48H76N6O4"}", "/scratch/micpie/export/rdkit_features/train_26-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the chemical with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_109-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl?\nAnswer: 5"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_1-0.jsonl": "{"text":"The molecular formula of the chemical with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4 is C23H25ClN3O2+."} {"text":"The chemical formula of the chemical with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC is C23H23N5O3."}", "/scratch/micpie/export/rdkit_features/test_7-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the molecule with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC?\nAnswer: 7"} {"text":"Question: What is the number of heteroatoms of the chemical with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC?\nAnswer: 8"}", 
"/scratch/micpie/export/rdkit_features/test_27-11.jsonl": "{"text":"User: I want to make a molecule with a formula of C18H27F3N3O2+.\nAssistant: Nice, do you have some additional constraints?\nUser: Yep, I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 2.\nAssistant: In that scenario, I suggest the molecule with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F."} {"text":"User: I want to design a chemical with a chemical formula of C19H27F2N3O2.\nAssistant: Interesting, do you have some additional conditions I should consider?\nUser: I want the number of hydrogen bond donors to be 2, the count of hydrogen bond acceptors to be 3.\nAssistant: In that scenario, I suggest the chemical with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F."}", "/scratch/micpie/export/rdkit_features/valid_106-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the compound with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl?\nAnswer: 11"} {"text":"Question: What is the count of aromatic bonds of the molecule with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_21-16.jsonl": "{"text":"Question: What is the count of rings of the molecule with SMILES Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-]?\nAnswer: 2"} {"text":"Question: What is the number of rings of the compound with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_27-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F is 6."} {"text":"The number of aromatic bonds of the molecule with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F is 6."}", "/scratch/micpie/export/rdkit_features/train_118-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES 
Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F is 42.44."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3 is 46.27."}", "/scratch/micpie/export/rdkit_features/valid_26-4.jsonl": "{"text":"The ring count of the chemical with SMILES CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2 is 2."} {"text":"The ring count of the molecule with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC is 3."}", "/scratch/micpie/export/rdkit_features/train_100-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the compound with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C?\nAnswer: 0"} {"text":"Question: What is the number of aromatic bonds of the chemical with SMILES COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3?\nAnswer: 17"}", "/scratch/micpie/export/rdkit_features/test_25-16.jsonl": "{"text":"Question: What is the ring count of the chemical with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4?\nAnswer: 4"} {"text":"Question: What is the number of rings of the compound with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_21-5.jsonl": "{"text":"The rotatable bond count of the compound with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC is 6."} {"text":"The rotatable bond count of the molecule with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O is 3."}", "/scratch/micpie/export/rdkit_features/valid_116-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 5.03."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl is 5.13."}", "/scratch/micpie/export/rdkit_features/train_20-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the molecule with SMILES 
Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3?\nAnswer: 12"} {"text":"Question: What is the aromatic bond count of the chemical with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/valid_7-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with SMILES COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC is 66.32."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC is 62.32."}", "/scratch/micpie/export/rdkit_features/test_23-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4?\nAnswer: 65.31"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4?\nAnswer: 56.13"}", "/scratch/micpie/export/rdkit_features/test_100-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the compound with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F is 1."} {"text":"The number of hydrogen bond acceptors of the chemical with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br is 3."}", "/scratch/micpie/export/rdkit_features/train_24-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5 is C20H16N7O2-."} {"text":"The chemical formula of the molecule with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C is C19H31N7O2."}", "/scratch/micpie/export/rdkit_features/test_13-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4?\nAnswer: 0"}", 
"/scratch/micpie/export/rdkit_features/valid_120-16.jsonl": "{"text":"Question: What is the count of rings of the molecule with SMILES c1cc(oc1)CNC(=O)CSc2nc3ccc(cc3s2)n4c(c5c(c4O)[C@@H]6C[C@@H]5C=C6)O?\nAnswer: 6"} {"text":"Question: What is the count of rings of the compound with SMILES CN(c1nc2cc(ccc2s1)Cl)C(=O)CSc3ccc(c(c3)F)F?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_113-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 8."} {"text":"The rotatable bond count of the compound with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 5."}", "/scratch/micpie/export/rdkit_features/test_118-23.jsonl": "{"text":"User: I want to analyze a chemical with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptors of 1 and a LogP value computed using the Wildman-Crippen method of 1.92.\nAssistant: Interesting, do you have some additional conditions I should take into account?\nUser: Yeah, I want the molecular formula to be C14H23FNO+.\nAssistant: In that situation, I suggest the chemical with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O."} {"text":"User: I want to make a chemical with a count of hydrogen bond donors of 3, a number of hydrogen bond acceptors of 7 and a LogP value computed using the Wildman-Crippen method of 1.07.\nAssistant: Interesting, do you have some additional constraints?\nUser: Yeah, I want the molecular formula to be C14H12FN5O5.\nAssistant: I suggest the chemical with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N."}", "/scratch/micpie/export/rdkit_features/test_116-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1?\nAnswer: 4"} {"text":"Question: What is the rotatable bond count of the molecule with SMILES 
CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_12-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the compound with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl is 0."} {"text":"The count of hydrogen bond donors of the molecule with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3 is 0."}", "/scratch/micpie/export/rdkit_features/train_9-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C?\nAnswer: C25H32N3O3+"} {"text":"Question: What is the chemical formula of the chemical with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F?\nAnswer: C22H32FN5O"}", "/scratch/micpie/export/rdkit_features/valid_115-6.jsonl": "{"text":"The aromatic bond count of the molecule with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4 is 22."} {"text":"The count of aromatic bonds of the molecule with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F is 16."}", "/scratch/micpie/export/rdkit_features/train_112-11.jsonl": "{"text":"User: I want to create a chemical with a formula of C22H31N5O4.\nAssistant: Cool, do you have some additional limitations I should consider?\nUser: Yep, I want the number of hydrogen bond donor sites to be 0, the count of hydrogen bond acceptors to be 8.\nAssistant: In that case, I advise the chemical with SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC."} {"text":"User: I want to synthesize a molecule with a molecular formula of C21H32N5O3S+.\nAssistant: Nice, do you have some additional conditions?\nUser: Yep, I want the number of hydrogen bond donor sites to be 3, the number of hydrogen bond acceptors to be 5.\nAssistant: In that case, I advise the molecule with SMILES c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N."}", "/scratch/micpie/export/rdkit_features/test_29-15.jsonl": 
"{"text":"Question: What is the heteroatom count of the molecule with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4?\nAnswer: 5"} {"text":"Question: What is the count of heteroatoms of the molecule with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_2-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the molecule with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4?\nAnswer: 8"} {"text":"Question: What is the count of heteroatoms of the chemical with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/train_17-11.jsonl": "{"text":"User: I want to create a compound with a molecular formula of C23H28F2N2O3.\nAssistant: Cool, do you have some additional requirements I should take into account?\nUser: I want the count of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 3.\nAssistant: I recommend the compound with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F."} {"text":"User: I want to make a compound with a formula of C23H24N4O8.\nAssistant: Interesting, do you have some additional conditions that help me narrow down the search?\nUser: Yep, I want the number of hydrogen bond donor sites to be 1, the number of hydrogen bond acceptor sites to be 9.\nAssistant: In that situation, I suggest the compound with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O."}", "/scratch/micpie/export/rdkit_features/train_100-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C?\nAnswer: 2"} {"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_118-5.jsonl": "{"text":"The count of rotatable 
bonds of the chemical with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O is 6."} {"text":"The rotatable bond count of the compound with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N is 5."}", "/scratch/micpie/export/rdkit_features/test_6-22.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donor sites of 3 and a number of hydrogen bond acceptor sites of 4.\nAssistant: That is a very interesting question, do you have some additional I should take into account?\nUser: I want the number of heteroatoms to be 8.\nAssistant: In that case, I the chemical with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3."} {"text":"User: I want to synthesize a compound with a number of hydrogen bond donor sites of 1 and a number of hydrogen bond acceptors of 5.\nAssistant: That's interesting, do you have some additional constraints that I should consider?\nUser: Indeed, I want the count of heteroatoms to be 6.\nAssistant: Then, I propose the compound with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C."}", "/scratch/micpie/export/rdkit_features/valid_119-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the molecule with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_17-3.jsonl": "{"text":"The number of heteroatoms of the chemical with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F is 7."} {"text":"The heteroatom count of the chemical with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 12."}", "/scratch/micpie/export/rdkit_features/valid_29-22.jsonl": "{"text":"User: I want to create a chemical with a count of hydrogen bond donors of 3 and a number of hydrogen bond acceptor sites of 1.\nAssistant: 
Nice, do you have some additional constraints I should consider?\nUser: Yeah, I want the number of heteroatoms to be 5.\nAssistant: I advise the chemical with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl."} {"text":"User: I want to create a molecule with a number of hydrogen bond donors of 3 and a number of hydrogen bond acceptors of 2.\nAssistant: That is a very interesting question, do you have some additional limitations that I should consider?\nUser: Yep, I want the heteroatom count to be 6.\nAssistant: In that scenario, I recommend the molecule with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3."}", "/scratch/micpie/export/rdkit_features/valid_101-20.jsonl": "{"text":"Question: What is the number of basic groups of the molecule with SMILES COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the molecule with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_19-0.jsonl": "{"text":"The formula of the compound with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl is C21H26ClN3O4."} {"text":"The formula of the molecule with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4 is C21H34FN4O2S+."}", "/scratch/micpie/export/rdkit_features/valid_119-22.jsonl": "{"text":"User: I want to make a compound with a number of hydrogen bond donors of 2 and a number of hydrogen bond acceptor sites of 6.\nAssistant: Do you have some additional conditions I should take into account?\nUser: Yes, I want the heteroatom count to be 9.\nAssistant: I the compound with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O."} {"text":"User: I want to analyze a compound with a number of hydrogen bond donors of 0 and a number of hydrogen bond acceptor sites of 7.\nAssistant: Cool, do you have some additional limitations I should take into account?\nUser: Yeah, I want the number of heteroatoms to 
be 7.\nAssistant: In that situation, I propose the compound with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6."}", "/scratch/micpie/export/rdkit_features/train_120-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES Cc1ccc(cc1)NC(=O)CSc2nnc(n2C)C=NC(=O)CCc3ccccc3Cl is C22H22ClN5O2S."} {"text":"The chemical formula of the compound with SMILES Cc1c(cnn1c2ccccc2)N[C@@H](C)C(=O)Nc3cccc(c3Cl)Cl is C19H18Cl2N4O."}", "/scratch/micpie/export/rdkit_features/valid_27-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F?\nAnswer: C14H18F3N5OS"} {"text":"Question: What is the formula of the molecule with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C?\nAnswer: C20H31ClN3O+"}", "/scratch/micpie/export/rdkit_features/test_21-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-]?\nAnswer: 52.49"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O?\nAnswer: 63.34"}", "/scratch/micpie/export/rdkit_features/valid_109-15.jsonl": "{"text":"Question: What is the heteroatom count of the molecule with SMILES C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl?\nAnswer: 8"} {"text":"Question: What is the count of heteroatoms of the molecule with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/train_22-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O is 6."} {"text":"The number of aromatic bonds of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4 is 10."}", "/scratch/micpie/export/rdkit_features/test_10-14.jsonl": "{"text":"Question: What is 
the count of hydrogen bond acceptors of the compound with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4?\nAnswer: 5"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_18-20.jsonl": "{"text":"Question: What is the number of basic groups of the molecule with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O?\nAnswer: 0"} {"text":"Question: What is the basic group count of the molecule with SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_21-12.jsonl": "{"text":"Question: What is the molecular formula of the molecule with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-]?\nAnswer: C18H26N2O5S"} {"text":"Question: What is the formula of the molecule with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3?\nAnswer: C22H33N3O3"}", "/scratch/micpie/export/rdkit_features/train_104-20.jsonl": "{"text":"Question: What is the basic group count of the molecule with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 0"} {"text":"Question: What is the basic group count of the compound with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_25-17.jsonl": "{"text":"Question: What is the rotatable bond count of the chemical with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4?\nAnswer: 7"} {"text":"Question: What is the count of rotatable bonds of the chemical with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_105-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F is 4.16."} {"text":"The Wildman-Crippen LogP 
value of the chemical with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O is 1.73."}", "/scratch/micpie/export/rdkit_features/test_21-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-] is 52.49."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O is 63.34."}", "/scratch/micpie/export/rdkit_features/train_24-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the compound with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5?\nAnswer: 22"} {"text":"Question: What is the aromatic bond count of the chemical with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C?\nAnswer: 10"}", "/scratch/micpie/export/rdkit_features/train_5-22.jsonl": "{"text":"User: I want to make a molecule with a number of hydrogen bond donors of 2 and a count of hydrogen bond acceptors of 4.\nAssistant: Interesting, do you have some additional that I should consider?\nUser: Yes, I want the count of heteroatoms to be 6.\nAssistant: I advise the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C."} {"text":"User: I want to analyze a molecule with a number of hydrogen bond donors of 3 and a number of hydrogen bond acceptor sites of 4.\nAssistant: Interesting, do you have some additional limitations that help me narrow down the search?\nUser: Yeah, I want the heteroatom count to be 8.\nAssistant: Then, I recommend the molecule with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C."}", "/scratch/micpie/export/rdkit_features/test_24-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the compound with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl is 1."} {"text":"The number of hydrogen bond donor sites of the chemical with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C is 1."}", 
"/scratch/micpie/export/rdkit_features/train_17-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F?\nAnswer: C23H28F2N2O3"} {"text":"Question: What is the chemical formula of the molecule with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: C23H24N4O8"}", "/scratch/micpie/export/rdkit_features/valid_109-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl is 3."} {"text":"The number of hydrogen bond donor sites of the chemical with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F is 3."}", "/scratch/micpie/export/rdkit_features/train_115-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the compound with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F is 3."} {"text":"The number of hydrogen bond acceptors of the molecule with SMILES c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 3."}", "/scratch/micpie/export/rdkit_features/test_104-12.jsonl": "{"text":"Question: What is the molecular formula of the molecule with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2?\nAnswer: C11H18N4O5S"} {"text":"Question: What is the chemical formula of the chemical with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3?\nAnswer: C18H14ClN5O"}", "/scratch/micpie/export/rdkit_features/train_103-18.jsonl": "{"text":"Question: What is the aromatic bond count of the compound with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1?\nAnswer: 17"} {"text":"Question: What is the aromatic bond count of the molecule with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_12-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES 
C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_101-3.jsonl": "{"text":"The count of heteroatoms of the compound with SMILES COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3 is 7."} {"text":"The number of heteroatoms of the chemical with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1 is 7."}", "/scratch/micpie/export/rdkit_features/test_119-7.jsonl": "{"text":"The number of acid groups of the chemical with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O is 0."} {"text":"The count of acid groups of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C is 0."}", "/scratch/micpie/export/rdkit_features/valid_18-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 65.17"} {"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O?\nAnswer: 40.13"}", "/scratch/micpie/export/rdkit_features/valid_100-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_117-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(cc4)Oc5ccc(cn5)Cl?\nAnswer: 0"} {"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES C[C@@H](C(C)(C)OC)[NH2+]CCNc1ccccc1?\nAnswer: 2"}", 
"/scratch/micpie/export/rdkit_features/train_20-3.jsonl": "{"text":"The count of heteroatoms of the compound with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3 is 6."} {"text":"The number of heteroatoms of the compound with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC is 9."}", "/scratch/micpie/export/rdkit_features/train_7-8.jsonl": "{"text":"The number of basic groups of the compound with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl is 0."} {"text":"The count of basic groups of the molecule with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F is 0."}", "/scratch/micpie/export/rdkit_features/test_120-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the molecule with SMILES c1ccc(cc1)CCN(CC(=O)Nc2ccc(cc2)Cl)S(=O)(=O)c3ccc4c(c3)OCCO4?\nAnswer: 8"} {"text":"Question: What is the count of rotatable bonds of the compound with SMILES Cc1c(cnn1c2ccccc2)N[C@H](C)C(=O)Nc3cccc(c3Cl)Cl?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_16-19.jsonl": "{"text":"Question: What is the count of acid groups of the compound with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C?\nAnswer: 0"} {"text":"Question: What is the acid group count of the molecule with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_119-7.jsonl": "{"text":"The count of acid groups of the molecule with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO is 0."} {"text":"The acid group count of the molecule with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl is 0."}", "/scratch/micpie/export/rdkit_features/valid_19-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC?\nAnswer: 12"} {"text":"Question: What is the aromatic bond count of the compound with SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F?\nAnswer: 15"}", 
"/scratch/micpie/export/rdkit_features/train_21-4.jsonl": "{"text":"The number of rings of the molecule with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC is 2."} {"text":"The ring count of the molecule with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O is 3."}", "/scratch/micpie/export/rdkit_features/valid_108-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the chemical with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O is 1."} {"text":"The number of hydrogen bond donors of the compound with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C is 3."}", "/scratch/micpie/export/rdkit_features/train_23-23.jsonl": "{"text":"User: I want to analyze a molecule with a count of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 7 and a Wildman-Crippen LogP value computed using RDKit of 0.74.\nAssistant: Nice, do you have some additional requirements I should consider?\nUser: Yep, I want the molecular formula to be C20H33N6O2+.\nAssistant: Then, I advise the molecule with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4."} {"text":"User: I want to synthesize a chemical with a number of hydrogen bond donor sites of 2, a count of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value of 1.77.\nAssistant: Cool, do you have some additional I should take into account?\nUser: I want the chemical formula to be C12H12IN6O-.\nAssistant: In that scenario, I propose the chemical with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3."}", "/scratch/micpie/export/rdkit_features/valid_118-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the chemical with SMILES CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O is 2."} {"text":"The number of hydrogen bond donor sites of the compound with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO is 3."}", "/scratch/micpie/export/rdkit_features/train_31-11.jsonl": "{"text":"User: I want to design a chemical with a chemical formula 
of C24H27FN2O3.\nAssistant: Interesting, do you have some additional conditions that help me narrow down the search?\nUser: I want the count of hydrogen bond donors to be 0, the count of hydrogen bond acceptors to be 3.\nAssistant: I recommend the chemical with SMILES Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F."} {"text":"User: I want to design a chemical with a molecular formula of C22H24N4O3.\nAssistant: Cool, do you have some additional requirements that I should consider?\nUser: I want the number of hydrogen bond donor sites to be 2, the number of hydrogen bond acceptor sites to be 4.\nAssistant: In that scenario, I advise the chemical with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4."}", "/scratch/micpie/export/rdkit_features/valid_102-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the molecule with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1?\nAnswer: 18"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/test_14-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_11-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl?\nAnswer: 6"} {"text":"Question: What is the aromatic bond count of the chemical with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_1-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES 
Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5 is 62.36."} {"text":"The total sum of atomic polarizabilities of the molecule with SMILES Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F is 62.38."}", "/scratch/micpie/export/rdkit_features/test_11-5.jsonl": "{"text":"The number of rotatable bonds of the molecule with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2 is 5."} {"text":"The count of rotatable bonds of the compound with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C is 6."}", "/scratch/micpie/export/rdkit_features/valid_28-1.jsonl": "{"text":"The number of hydrogen bond donors of the compound with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br is 2."} {"text":"The count of hydrogen bond donors of the molecule with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C is 0."}", "/scratch/micpie/export/rdkit_features/valid_16-11.jsonl": "{"text":"User: I want to synthesize a molecule with a formula of C18H35N2O2+.\nAssistant: That is a very interesting question, do you have some additional constraints that help me narrow down the search?\nUser: Yeah, I want the number of hydrogen bond donor sites to be 2, the number of hydrogen bond acceptors to be 2.\nAssistant: I suggest the molecule with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C."} {"text":"User: I want to synthesize a compound with a chemical formula of C23H36N2O4.\nAssistant: That is a very interesting question, do you have some additional limitations I should take into account?\nUser: Yeah, I want the number of hydrogen bond donor sites to be 1, the number of hydrogen bond acceptor sites to be 4.\nAssistant: In that scenario, I suggest the compound with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3."}", "/scratch/micpie/export/rdkit_features/test_19-23.jsonl": "{"text":"User: I want to make a compound with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptor sites of 4 and a Wildman-Crippen LogP 
value of 3.75.\nAssistant: Interesting, do you have some additional constraints?\nUser: Yes, I want the molecular formula to be C21H26ClN3O4.\nAssistant: In that scenario, I suggest the compound with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl."} {"text":"User: I want to analyze a molecule with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value of 2.16.\nAssistant: Do you have some additional constraints I should consider?\nUser: Yea, I want the chemical formula to be C21H34FN4O2S+.\nAssistant: In that scenario, I advise the molecule with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4."}", "/scratch/micpie/export/rdkit_features/test_10-6.jsonl": "{"text":"The aromatic bond count of the molecule with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4 is 5."} {"text":"The aromatic bond count of the chemical with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F is 6."}", "/scratch/micpie/export/rdkit_features/train_107-4.jsonl": "{"text":"The count of rings of the molecule with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl is 4."} {"text":"The count of rings of the chemical with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5 is 5."}", "/scratch/micpie/export/rdkit_features/valid_108-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O?\nAnswer: 2"} {"text":"Question: What is the ring count of the chemical with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_116-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the compound with SMILES CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 3."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES 
CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl is 4."}", "/scratch/micpie/export/rdkit_features/valid_108-8.jsonl": "{"text":"The number of basic groups of the molecule with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O is 0."} {"text":"The number of basic groups of the compound with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C is 0."}", "/scratch/micpie/export/rdkit_features/test_113-23.jsonl": "{"text":"User: I want to analyze a compound with a count of hydrogen bond donors of 3, a count of hydrogen bond acceptors of 5 and a Wildman-Crippen LogP value of 0.67.\nAssistant: Do you have some additional conditions that help me narrow down the search?\nUser: Yes, I want the molecular formula to be C20H24N5O2S2+.\nAssistant: I propose the compound with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N."} {"text":"User: I want to create a compound with a number of hydrogen bond donor sites of 3, a count of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value of 5.61.\nAssistant: That's interesting, do you have some additional conditions?\nUser: I want the molecular formula to be C20H17Cl2N3O3S2.\nAssistant: Then, I propose the compound with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl."}", "/scratch/micpie/export/rdkit_features/test_113-20.jsonl": "{"text":"Question: What is the count of basic groups of the molecule with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 2"} {"text":"Question: What is the count of basic groups of the molecule with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_32-20.jsonl": "{"text":"Question: What is the number of basic groups of the compound with SMILES C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C?\nAnswer: 1"} {"text":"Question: What is the number of basic groups of the chemical with SMILES 
C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_10-7.jsonl": "{"text":"The number of acid groups of the compound with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4 is 0."} {"text":"The number of acid groups of the chemical with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl is 0."}", "/scratch/micpie/export/rdkit_features/test_20-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br is 6."} {"text":"The number of aromatic bonds of the chemical with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-] is 12."}", "/scratch/micpie/export/rdkit_features/test_16-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C?\nAnswer: C18H35N2O2+"} {"text":"Question: What is the formula of the chemical with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N?\nAnswer: C17H19Cl2N3O3S"}", "/scratch/micpie/export/rdkit_features/test_28-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the chemical with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC?\nAnswer: 19"} {"text":"Question: What is the number of aromatic bonds of the compound with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C?\nAnswer: 16"}", "/scratch/micpie/export/rdkit_features/test_111-23.jsonl": "{"text":"User: I want to design a molecule with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptors of 7 and a Wildman-Crippen LogP value of 1.06.\nAssistant: Do you have some additional limitations that help me narrow down the search?\nUser: Yea, I want the molecular formula to be C22H34N5O3S+.\nAssistant: In that case, I recommend the molecule with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4."} {"text":"User: I want to synthesize a molecule 
with a number of hydrogen bond donor sites of 0, a number of hydrogen bond acceptors of 8 and a LogP value computed using the Wildman-Crippen method of 2.06.\nAssistant: That is a very interesting question, do you have some additional limitations I should take into account?\nUser: Yes, I want the chemical formula to be C22H31N5O4.\nAssistant: I propose the molecule with SMILES C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC."}", "/scratch/micpie/export/rdkit_features/train_118-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_118-4.jsonl": "{"text":"The count of rings of the chemical with SMILES CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O is 1."} {"text":"The ring count of the molecule with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO is 2."}", "/scratch/micpie/export/rdkit_features/train_113-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 69.10."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 64.69."}", "/scratch/micpie/export/rdkit_features/valid_3-20.jsonl": "{"text":"Question: What is the number of basic groups of the molecule with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the molecule with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_18-5.jsonl": "{"text":"The rotatable bond count of the compound with SMILES 
C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O is 7."} {"text":"The count of rotatable bonds of the chemical with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl is 7."}", "/scratch/micpie/export/rdkit_features/test_108-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5?\nAnswer: 63.23"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3?\nAnswer: 68.52"}", "/scratch/micpie/export/rdkit_features/test_12-16.jsonl": "{"text":"Question: What is the ring count of the compound with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C?\nAnswer: 4"} {"text":"Question: What is the count of rings of the compound with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_1-19.jsonl": "{"text":"Question: What is the count of acid groups of the molecule with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4?\nAnswer: 0"} {"text":"Question: What is the acid group count of the compound with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_110-20.jsonl": "{"text":"Question: What is the number of basic groups of the molecule with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3?\nAnswer: 1"} {"text":"Question: What is the count of basic groups of the chemical with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_26-22.jsonl": "{"text":"User: I want to create a molecule with a count of hydrogen bond donors of 0 and a number of hydrogen bond acceptors of 3.\nAssistant: That is a very interesting question, do you have some additional constraints I should take into account?\nUser: Yea, I 
want the heteroatom count to be 7.\nAssistant: Then, I propose the molecule with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3."} {"text":"User: I want to make a molecule with a number of hydrogen bond donors of 2 and a number of hydrogen bond acceptors of 3.\nAssistant: Interesting, do you have some additional I should consider?\nUser: Indeed, I want the heteroatom count to be 6.\nAssistant: In that situation, I propose the molecule with SMILES COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br."}", "/scratch/micpie/export/rdkit_features/train_13-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_10-7.jsonl": "{"text":"The acid group count of the chemical with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4 is 0."} {"text":"The count of acid groups of the molecule with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F is 0."}", "/scratch/micpie/export/rdkit_features/train_106-7.jsonl": "{"text":"The acid group count of the molecule with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O is 0."} {"text":"The count of acid groups of the molecule with SMILES CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl is 0."}", "/scratch/micpie/export/rdkit_features/valid_9-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2?\nAnswer: C21H31F3N2O2"} {"text":"Question: What is the molecular formula of the compound with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4?\nAnswer: C23H39N5O2"}", "/scratch/micpie/export/rdkit_features/train_13-19.jsonl": "{"text":"Question: What is the acid group count of the compound with SMILES 
CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the compound with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_115-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the molecule with SMILES Cc1c(c(on1)C)COc2ccc(cc2OC)\/C=C\\C(=O)Nc3ccc(cc3)c4csnn4?\nAnswer: 22"} {"text":"Question: What is the aromatic bond count of the chemical with SMILES c1cc(ccc1C2(CC2)C(=O)N3[C@@H]4CC[C@H]3C[C@H](C4)NC(=O)c5cc6cc(ccc6[nH]5)C(F)(F)F)F?\nAnswer: 16"}", "/scratch/micpie/export/rdkit_features/valid_109-8.jsonl": "{"text":"The basic group count of the molecule with SMILES C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl is 0."} {"text":"The number of basic groups of the compound with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F is 0."}", "/scratch/micpie/export/rdkit_features/train_0-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the chemical with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1 is 8."} {"text":"The number of hydrogen bond acceptor sites of the compound with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4 is 3."}", "/scratch/micpie/export/rdkit_features/train_14-0.jsonl": "{"text":"The formula of the molecule with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C is C20H30N4O."} {"text":"The formula of the molecule with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O is C14H14BrNO3."}", "/scratch/micpie/export/rdkit_features/train_19-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_1-16.jsonl": 
"{"text":"Question: What is the count of rings of the chemical with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5?\nAnswer: 5"} {"text":"Question: What is the number of rings of the chemical with SMILES Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_119-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O?\nAnswer: C14H18FN3O5"} {"text":"Question: What is the chemical formula of the chemical with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6?\nAnswer: C28H23N3O4"}", "/scratch/micpie/export/rdkit_features/train_4-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4 is 5."} {"text":"The number of hydrogen bond acceptors of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C is 4."}", "/scratch/micpie/export/rdkit_features/test_6-1.jsonl": "{"text":"The count of hydrogen bond donors of the chemical with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3 is 3."} {"text":"The number of hydrogen bond donor sites of the molecule with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C is 1."}", "/scratch/micpie/export/rdkit_features/train_117-8.jsonl": "{"text":"The count of basic groups of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(cc4)Oc5ccc(cn5)Cl is 0."} {"text":"The count of basic groups of the molecule with SMILES C[C@@H](C(C)(C)OC)[NH2+]CCNc1ccccc1 is 1."}", "/scratch/micpie/export/rdkit_features/test_2-11.jsonl": "{"text":"User: I want to analyze a compound with a formula of C20H27ClFN4O2+.\nAssistant: Interesting, do you have some additional conditions I should take into account?\nUser: Yeah, I want the number of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 4.\nAssistant: 
Then, I propose the compound with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C."} {"text":"User: I want to analyze a compound with a chemical formula of C24H25N3O3.\nAssistant: Do you have some additional limitations I should take into account?\nUser: Yes, I want the count of hydrogen bond donors to be 0, the number of hydrogen bond acceptor sites to be 5.\nAssistant: Then, I suggest the compound with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4."}", "/scratch/micpie/export/rdkit_features/valid_4-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br is 1."} {"text":"The count of hydrogen bond donors of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C is 2."}", "/scratch/micpie/export/rdkit_features/train_108-7.jsonl": "{"text":"The acid group count of the compound with SMILES CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2 is 0."} {"text":"The number of acid groups of the chemical with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3 is 0."}", "/scratch/micpie/export/rdkit_features/test_10-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4 is 74.79."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F is 51.15."}", "/scratch/micpie/export/rdkit_features/test_10-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4?\nAnswer: 1"} {"text":"Question: What is the basic group count of the chemical with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_10-11.jsonl": "{"text":"User: I want to design a compound with a chemical formula of C23H42N5O+.\nAssistant: That is a very interesting 
question, do you have some additional conditions that I should consider?\nUser: Yeah, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 5.\nAssistant: In that case, I recommend the compound with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4."} {"text":"User: I want to make a molecule with a chemical formula of C18H23F4NO.\nAssistant: That's interesting, do you have some additional conditions that I should consider?\nUser: Yeah, I want the number of hydrogen bond donor sites to be 0, the count of hydrogen bond acceptors to be 1.\nAssistant: In that case, I recommend the molecule with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F."}", "/scratch/micpie/export/rdkit_features/valid_5-3.jsonl": "{"text":"The number of heteroatoms of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C is 8."} {"text":"The heteroatom count of the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+] is 6."}", "/scratch/micpie/export/rdkit_features/train_25-22.jsonl": "{"text":"User: I want to make a molecule with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 8.\nAssistant: That is a very interesting question, do you have some additional conditions I should take into account?\nUser: Indeed, I want the heteroatom count to be 9.\nAssistant: In that situation, I suggest the molecule with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C."} {"text":"User: I want to synthesize a molecule with a count of hydrogen bond donors of 0 and a number of hydrogen bond acceptors of 2.\nAssistant: That is a very interesting question, do you have some additional conditions?\nUser: Yea, I want the heteroatom count to be 9.\nAssistant: Then, I suggest the molecule with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F."}", "/scratch/micpie/export/rdkit_features/valid_11-7.jsonl": "{"text":"The count of acid groups of the molecule with SMILES 
C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl is 0."} {"text":"The acid group count of the chemical with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C is 0."}", "/scratch/micpie/export/rdkit_features/train_16-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_15-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_118-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O?\nAnswer: 1"} {"text":"Question: What is the count of rings of the compound with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_111-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4?\nAnswer: 74.38"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC?\nAnswer: 68.16"}", "/scratch/micpie/export/rdkit_features/valid_113-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES 
Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_16-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C?\nAnswer: 58.82"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N?\nAnswer: 55.56"}", "/scratch/micpie/export/rdkit_features/valid_106-3.jsonl": "{"text":"The count of heteroatoms of the molecule with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl is 8."} {"text":"The heteroatom count of the chemical with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F is 11."}", "/scratch/micpie/export/rdkit_features/train_31-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F?\nAnswer: 0"} {"text":"Question: What is the basic group count of the compound with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_118-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the chemical with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O?\nAnswer: 3"} {"text":"Question: What is the number of heteroatoms of the molecule with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/valid_118-8.jsonl": "{"text":"The count of basic groups of the molecule with SMILES CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O is 1."} {"text":"The number of basic groups of the chemical with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO is 0."}", "/scratch/micpie/export/rdkit_features/valid_107-4.jsonl": "{"text":"The count of rings of the compound with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C is 4."} {"text":"The count of rings of the chemical with SMILES 
CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C is 3."}", "/scratch/micpie/export/rdkit_features/valid_119-3.jsonl": "{"text":"The count of heteroatoms of the compound with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O is 9."} {"text":"The heteroatom count of the chemical with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6 is 7."}", "/scratch/micpie/export/rdkit_features/test_110-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3 is 5."} {"text":"The count of hydrogen bond acceptors of the molecule with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4 is 8."}", "/scratch/micpie/export/rdkit_features/test_31-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F?\nAnswer: 63.59"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3?\nAnswer: 59.17"}", "/scratch/micpie/export/rdkit_features/test_25-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4 is 0.93."} {"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3 is 3.88."}", "/scratch/micpie/export/rdkit_features/train_29-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_100-19.jsonl": "{"text":"Question: What is 
the acid group count of the chemical with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the molecule with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_119-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the compound with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O?\nAnswer: 12"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/test_119-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the molecule with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O?\nAnswer: 9"} {"text":"Question: What is the number of heteroatoms of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/valid_116-23.jsonl": "{"text":"User: I want to design a chemical with a count of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value computed using RDKit of 5.03.\nAssistant: Do you have some additional constraints I should take into account?\nUser: Yeah, I want the chemical formula to be C30H32FN3O2.\nAssistant: I advise the chemical with SMILES CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F."} {"text":"User: I want to make a chemical with a number of hydrogen bond donors of 0, a number of hydrogen bond acceptors of 4 and a LogP value computed using the Wildman-Crippen method of 5.13.\nAssistant: Do you have some additional limitations that help me narrow down the search?\nUser: Yea, I want the chemical formula to be C26H25ClN2O4.\nAssistant: In that case, I recommend the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl."}", "/scratch/micpie/export/rdkit_features/test_117-7.jsonl": 
"{"text":"The acid group count of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc5cc(ccc5c4)Br is 0."} {"text":"The number of acid groups of the compound with SMILES C[C@H](C(C)(C)OC)[NH2+]CCNc1ccccc1 is 0."}", "/scratch/micpie/export/rdkit_features/test_3-12.jsonl": "{"text":"Question: What is the molecular formula of the molecule with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC?\nAnswer: C22H26N6O3"} {"text":"Question: What is the chemical formula of the chemical with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F?\nAnswer: C20H20F2N4O3"}", "/scratch/micpie/export/rdkit_features/valid_106-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl is 11."} {"text":"The number of aromatic bonds of the molecule with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F is 6."}", "/scratch/micpie/export/rdkit_features/test_2-19.jsonl": "{"text":"Question: What is the count of acid groups of the molecule with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the chemical with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_15-15.jsonl": "{"text":"Question: What is the heteroatom count of the compound with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N?\nAnswer: 5"} {"text":"Question: What is the count of heteroatoms of the molecule with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_24-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the molecule with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5?\nAnswer: 9"} {"text":"Question: What is the count of heteroatoms of the compound with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C?\nAnswer: 9"}", 
"/scratch/micpie/export/rdkit_features/train_103-22.jsonl": "{"text":"User: I want to analyze a molecule with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 3.\nAssistant: Nice, do you have some additional conditions that help me narrow down the search?\nUser: I want the number of heteroatoms to be 4.\nAssistant: Then, I suggest the molecule with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1."} {"text":"User: I want to synthesize a compound with a number of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 7.\nAssistant: Cool, do you have some additional constraints I should take into account?\nUser: Yes, I want the count of heteroatoms to be 9.\nAssistant: Then, I the compound with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3."}", "/scratch/micpie/export/rdkit_features/train_107-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl is 0.61."} {"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5 is 2.55."}", "/scratch/micpie/export/rdkit_features/test_114-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the compound with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C is 71.75."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5 is 68.04."}", "/scratch/micpie/export/rdkit_features/test_12-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C is 2."} {"text":"The rotatable bond count of the chemical with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4 is 6."}", "/scratch/micpie/export/rdkit_features/train_110-19.jsonl": "{"text":"Question: What is the number of acid groups of the compound with SMILES 
COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the molecule with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_106-20.jsonl": "{"text":"Question: What is the number of basic groups of the molecule with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_3-6.jsonl": "{"text":"The aromatic bond count of the chemical with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC is 17."} {"text":"The count of aromatic bonds of the chemical with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F is 16."}", "/scratch/micpie/export/rdkit_features/test_8-22.jsonl": "{"text":"User: I want to make a compound with a number of hydrogen bond donors of 0 and a number of hydrogen bond acceptor sites of 7.\nAssistant: Nice, do you have some additional conditions that help me narrow down the search?\nUser: Indeed, I want the count of heteroatoms to be 8.\nAssistant: In that situation, I recommend the compound with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C."} {"text":"User: I want to design a compound with a count of hydrogen bond donors of 0 and a number of hydrogen bond acceptor sites of 4.\nAssistant: Nice, do you have some additional constraints that I should consider?\nUser: Yep, I want the heteroatom count to be 7.\nAssistant: In that situation, I advise the compound with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F."}", "/scratch/micpie/export/rdkit_features/train_29-19.jsonl": "{"text":"Question: What is the number of acid groups of the molecule with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F?\nAnswer: 0"} 
{"text":"Question: What is the number of acid groups of the molecule with SMILES CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_8-4.jsonl": "{"text":"The count of rings of the compound with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C is 3."} {"text":"The number of rings of the molecule with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F is 4."}", "/scratch/micpie/export/rdkit_features/test_2-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the compound with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C is 4."} {"text":"The number of hydrogen bond acceptors of the chemical with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4 is 5."}", "/scratch/micpie/export/rdkit_features/test_5-23.jsonl": "{"text":"User: I want to create a chemical with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptor sites of 4 and a Wildman-Crippen LogP value computed using RDKit of 2.38.\nAssistant: That's interesting, do you have some additional conditions I should consider?\nUser: Yea, I want the chemical formula to be C23H27N2O3S+.\nAssistant: Then, I suggest the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5."} {"text":"User: I want to analyze a chemical with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value of 3.88.\nAssistant: Do you have some additional limitations I should take into account?\nUser: I want the molecular formula to be C22H22F2N2O3.\nAssistant: In that scenario, I recommend the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F."}", "/scratch/micpie/export/rdkit_features/test_21-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the molecule with SMILES Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-]?\nAnswer: 8"} {"text":"Question: What is the count of 
heteroatoms of the molecule with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/valid_18-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the compound with SMILES c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 1."} {"text":"The count of hydrogen bond donors of the molecule with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O is 1."}", "/scratch/micpie/export/rdkit_features/train_1-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_5-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C is 67.32."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C is 65.91."}", "/scratch/micpie/export/rdkit_features/train_102-16.jsonl": "{"text":"Question: What is the count of rings of the molecule with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O?\nAnswer: 5"} {"text":"Question: What is the ring count of the compound with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_106-22.jsonl": "{"text":"User: I want to design a molecule with a number of hydrogen bond donors of 2 and a number of hydrogen bond acceptors of 6.\nAssistant: That is a very interesting question, do you have some additional conditions that help me narrow down the search?\nUser: Yep, I want the count of heteroatoms to be 8.\nAssistant: In that case, I advise the molecule with SMILES 
CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O."} {"text":"User: I want to create a compound with a number of hydrogen bond donor sites of 2 and a count of hydrogen bond acceptors of 6.\nAssistant: Nice, do you have some additional conditions I should take into account?\nUser: Yea, I want the heteroatom count to be 9.\nAssistant: I suggest the compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC."}", "/scratch/micpie/export/rdkit_features/train_28-11.jsonl": "{"text":"User: I want to analyze a molecule with a formula of C14H9BrClNO2S.\nAssistant: Interesting, do you have some additional constraints I should take into account?\nUser: Yep, I want the number of hydrogen bond donor sites to be 0, the number of hydrogen bond acceptors to be 3.\nAssistant: I suggest the molecule with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl."} {"text":"User: I want to synthesize a molecule with a formula of C20H26FN3O2.\nAssistant: That is a very interesting question, do you have some additional limitations I should take into account?\nUser: Yeah, I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 2.\nAssistant: In that case, I advise the molecule with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F."}", "/scratch/micpie/export/rdkit_features/train_27-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C is 1."} {"text":"The count of hydrogen bond donors of the compound with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4 is 1."}", "/scratch/micpie/export/rdkit_features/valid_28-11.jsonl": "{"text":"User: I want to synthesize a compound with a chemical formula of C16H20BrFN2O.\nAssistant: That is a very interesting question, do you have some additional limitations?\nUser: Yep, I want the number of hydrogen bond donors to be 2, the count of hydrogen bond acceptors to be 1.\nAssistant: I advise the compound 
with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br."} {"text":"User: I want to analyze a chemical with a chemical formula of C22H26N2O2.\nAssistant: That is a very interesting question, do you have some additional conditions I should take into account?\nUser: Yea, I want the number of hydrogen bond donors to be 0, the number of hydrogen bond acceptors to be 2.\nAssistant: I suggest the chemical with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C."}", "/scratch/micpie/export/rdkit_features/test_0-11.jsonl": "{"text":"User: I want to create a compound with a chemical formula of C24H12N2O3S5.\nAssistant: Do you have some additional constraints I should take into account?\nUser: Indeed, I want the count of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 9.\nAssistant: Then, I suggest the compound with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1."} {"text":"User: I want to analyze a compound with a chemical formula of C24H21N3O4.\nAssistant: That's interesting, do you have some additional requirements I should take into account?\nUser: Yea, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 6.\nAssistant: Then, I advise the compound with SMILES Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC."}", "/scratch/micpie/export/rdkit_features/train_111-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C is 2.17."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC is 2.11."}", "/scratch/micpie/export/rdkit_features/valid_30-7.jsonl": "{"text":"The acid group count of the chemical with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3 is 0."} {"text":"The acid group count of the compound with SMILES 
Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F is 0."}", "/scratch/micpie/export/rdkit_features/test_32-15.jsonl": "{"text":"Question: What is the heteroatom count of the molecule with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3?\nAnswer: 8"} {"text":"Question: What is the heteroatom count of the compound with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_3-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC?\nAnswer: 4"} {"text":"Question: What is the count of rings of the compound with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_105-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N is 4.04."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O is 2.49."}", "/scratch/micpie/export/rdkit_features/train_4-22.jsonl": "{"text":"User: I want to design a molecule with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 5.\nAssistant: That is a very interesting question, do you have some additional requirements I should consider?\nUser: Indeed, I want the heteroatom count to be 6.\nAssistant: Then, I recommend the molecule with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4."} {"text":"User: I want to analyze a compound with a count of hydrogen bond donors of 2 and a number of hydrogen bond acceptor sites of 4.\nAssistant: Cool, do you have some additional limitations that help me narrow down the search?\nUser: Yes, I want the heteroatom count to be 6.\nAssistant: In that case, I propose the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C."}", 
"/scratch/micpie/export/rdkit_features/test_103-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1?\nAnswer: 4"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_100-6.jsonl": "{"text":"The count of aromatic bonds of the molecule with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F is 5."} {"text":"The number of aromatic bonds of the molecule with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br is 11."}", "/scratch/micpie/export/rdkit_features/valid_18-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 9."} {"text":"The count of hydrogen bond acceptors of the compound with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O is 2."}", "/scratch/micpie/export/rdkit_features/train_100-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the molecule with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C is 1."} {"text":"The number of hydrogen bond donor sites of the chemical with SMILES COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3 is 1."}", "/scratch/micpie/export/rdkit_features/valid_22-0.jsonl": "{"text":"The chemical formula of the compound with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O is C17H24ClN3O3S."} {"text":"The chemical formula of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4 is C20H33N6O2+."}", "/scratch/micpie/export/rdkit_features/train_0-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1 is 1."} {"text":"The count of acid groups of the compound with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4 is 0."}", 
"/scratch/micpie/export/rdkit_features/train_20-4.jsonl": "{"text":"The ring count of the chemical with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3 is 3."} {"text":"The number of rings of the molecule with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC is 2."}", "/scratch/micpie/export/rdkit_features/train_17-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_0-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1?\nAnswer: 3"} {"text":"Question: What is the number of rotatable bonds of the molecule with SMILES Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_30-8.jsonl": "{"text":"The count of basic groups of the molecule with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3 is 0."} {"text":"The basic group count of the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC is 0."}", "/scratch/micpie/export/rdkit_features/test_12-23.jsonl": "{"text":"User: I want to design a molecule with a count of hydrogen bond donors of 2, a count of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value of 4.63.\nAssistant: Interesting, do you have some additional constraints I should take into account?\nUser: Indeed, I want the molecular formula to be C18H18FN3OS.\nAssistant: In that case, I propose the molecule with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C."} {"text":"User: I want to design a molecule with a number of hydrogen bond donors of 1, a number of hydrogen bond 
acceptors of 2 and a LogP value computed using the Wildman-Crippen method of 4.97.\nAssistant: Nice, do you have some additional conditions I should take into account?\nUser: I want the formula to be C21H29FN2O.\nAssistant: In that situation, I advise the molecule with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4."}", "/scratch/micpie/export/rdkit_features/valid_24-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl is 5."} {"text":"The rotatable bond count of the chemical with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C is 7."}", "/scratch/micpie/export/rdkit_features/test_15-4.jsonl": "{"text":"The count of rings of the molecule with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O is 1."} {"text":"The ring count of the compound with SMILES C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2 is 2."}", "/scratch/micpie/export/rdkit_features/test_107-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC?\nAnswer: C22H27N3O5S"} {"text":"Question: What is the molecular formula of the chemical with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC?\nAnswer: C23H29N3O6"}", "/scratch/micpie/export/rdkit_features/valid_25-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4?\nAnswer: 8"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_26-15.jsonl": "{"text":"Question: What is the heteroatom count of the chemical with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F?\nAnswer: 9"} {"text":"Question: What is the number of heteroatoms of the compound with SMILES 
CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_113-11.jsonl": "{"text":"User: I want to analyze a compound with a molecular formula of C21H32N5O3S+.\nAssistant: That's interesting, do you have some additional constraints I should consider?\nUser: Yep, I want the number of hydrogen bond donors to be 3, the number of hydrogen bond acceptor sites to be 5.\nAssistant: In that scenario, I advise the compound with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N."} {"text":"User: I want to synthesize a molecule with a molecular formula of C21H19Cl2N3O2S2.\nAssistant: Nice, do you have some additional requirements I should consider?\nUser: Yea, I want the count of hydrogen bond donors to be 3, the number of hydrogen bond acceptor sites to be 3.\nAssistant: In that scenario, I advise the molecule with SMILES Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl."}", "/scratch/micpie/export/rdkit_features/test_113-17.jsonl": "{"text":"Question: What is the rotatable bond count of the compound with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 7"} {"text":"Question: What is the count of rotatable bonds of the molecule with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_29-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl is 62.15."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3 is 58.55."}", "/scratch/micpie/export/rdkit_features/valid_33-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the chemical with SMILES c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6 is 4."} {"text":"The number of hydrogen bond acceptor sites of the compound with SMILES 
Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br is 6."}", "/scratch/micpie/export/rdkit_features/test_107-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC?\nAnswer: 6"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_114-20.jsonl": "{"text":"Question: What is the number of basic groups of the molecule with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the chemical with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_117-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(cc4)Oc5ccc(cn5)Cl?\nAnswer: 6"} {"text":"Question: What is the number of rotatable bonds of the chemical with SMILES C[C@@H](C(C)(C)OC)[NH2+]CCNc1ccccc1?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/valid_14-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_3-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N?\nAnswer: 64.06"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC?\nAnswer: 58.56"}", 
"/scratch/micpie/export/rdkit_features/test_101-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the compound with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_111-17.jsonl": "{"text":"Question: What is the rotatable bond count of the compound with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4?\nAnswer: 8"} {"text":"Question: What is the rotatable bond count of the chemical with SMILES C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_111-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4?\nAnswer: 4"} {"text":"Question: What is the count of rings of the molecule with SMILES C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_33-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the molecule with SMILES c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6?\nAnswer: 3"} {"text":"Question: What is the number of rotatable bonds of the molecule with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_111-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the compound with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4?\nAnswer: 11"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/train_12-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES 
C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C is 0."} {"text":"The number of hydrogen bond donor sites of the compound with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC is 1."}", "/scratch/micpie/export/rdkit_features/valid_4-7.jsonl": "{"text":"The acid group count of the chemical with SMILES Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br is 0."} {"text":"The acid group count of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C is 0."}", "/scratch/micpie/export/rdkit_features/train_119-1.jsonl": "{"text":"The number of hydrogen bond donors of the chemical with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO is 3."} {"text":"The number of hydrogen bond donor sites of the molecule with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl is 2."}", "/scratch/micpie/export/rdkit_features/test_28-19.jsonl": "{"text":"Question: What is the acid group count of the compound with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the chemical with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_101-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F is C15H14F2N2O3."} {"text":"The chemical formula of the compound with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1 is C26H20N6OS."}", "/scratch/micpie/export/rdkit_features/train_30-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_29-8.jsonl": "{"text":"The number of basic groups of the molecule with SMILES 
CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl is 1."} {"text":"The count of basic groups of the chemical with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3 is 1."}", "/scratch/micpie/export/rdkit_features/train_1-1.jsonl": "{"text":"The count of hydrogen bond donors of the chemical with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4 is 2."} {"text":"The count of hydrogen bond donors of the molecule with SMILES c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl is 0."}", "/scratch/micpie/export/rdkit_features/valid_7-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the compound with SMILES COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC?\nAnswer: 6"} {"text":"Question: What is the heteroatom count of the chemical with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/train_16-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C is 55.59."} {"text":"The sum of atomic polarizabilities of the compound with SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F is 64.87."}", "/scratch/micpie/export/rdkit_features/test_24-7.jsonl": "{"text":"The acid group count of the chemical with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl is 3."} {"text":"The number of acid groups of the compound with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C is 0."}", "/scratch/micpie/export/rdkit_features/valid_113-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 69.10."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 64.69."}", "/scratch/micpie/export/rdkit_features/train_1-11.jsonl": "{"text":"User: I want to analyze a molecule with a molecular 
formula of C25H33N2O3+.\nAssistant: Interesting, do you have some additional conditions that I should consider?\nUser: Indeed, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 3.\nAssistant: In that scenario, I suggest the molecule with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4."} {"text":"User: I want to synthesize a molecule with a molecular formula of C17H22ClF3N2O2S.\nAssistant: Interesting, do you have some additional constraints?\nUser: I want the count of hydrogen bond donors to be 0, the count of hydrogen bond acceptors to be 3.\nAssistant: In that situation, I recommend the molecule with SMILES c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl."}", "/scratch/micpie/export/rdkit_features/valid_33-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6?\nAnswer: 0"} {"text":"Question: What is the basic group count of the molecule with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_104-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the chemical with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 5"} {"text":"Question: What is the count of aromatic bonds of the compound with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/train_117-0.jsonl": "{"text":"The formula of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(cc4)Oc5ccc(cn5)Cl is C27H26ClN3O4."} {"text":"The chemical formula of the molecule with SMILES C[C@@H](C(C)(C)OC)[NH2+]CCNc1ccccc1 is C14H25N2O+."}", "/scratch/micpie/export/rdkit_features/test_100-1.jsonl": "{"text":"The number of hydrogen bond donors of the molecule with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F is 2."} {"text":"The count of hydrogen bond donors of
the chemical with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br is 2."}", "/scratch/micpie/export/rdkit_features/train_24-16.jsonl": "{"text":"Question: What is the ring count of the chemical with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5?\nAnswer: 5"} {"text":"Question: What is the count of rings of the chemical with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_5-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C?\nAnswer: 4"} {"text":"Question: What is the count of rings of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+]?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_23-8.jsonl": "{"text":"The count of basic groups of the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4 is 1."} {"text":"The count of basic groups of the chemical with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4 is 0."}", "/scratch/micpie/export/rdkit_features/valid_19-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_27-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F is 2."} {"text":"The number of hydrogen bond donor sites of the chemical with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C is 2."}", "/scratch/micpie/export/rdkit_features/valid_33-4.jsonl": "{"text":"The number of rings of the compound with SMILES c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6 is 6."} {"text":"The ring count 
of the molecule with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br is 3."}", "/scratch/micpie/export/rdkit_features/test_28-22.jsonl": "{"text":"User: I want to analyze a chemical with a count of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 6.\nAssistant: Cool, do you have some additional conditions?\nUser: Yea, I want the heteroatom count to be 7.\nAssistant: Then, I suggest the chemical with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC."} {"text":"User: I want to make a molecule with a number of hydrogen bond donor sites of 3 and a number of hydrogen bond acceptor sites of 3.\nAssistant: That's interesting, do you have some additional requirements?\nUser: Yep, I want the heteroatom count to be 6.\nAssistant: In that situation, I propose the molecule with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C."}", "/scratch/micpie/export/rdkit_features/test_9-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_100-7.jsonl": "{"text":"The count of acid groups of the molecule with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C is 0."} {"text":"The count of acid groups of the molecule with SMILES COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3 is 0."}", "/scratch/micpie/export/rdkit_features/train_111-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the molecule with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_7-21.jsonl": "{"text":"Question: What is the
total sum of atomic polarizabilities of the molecule with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl?\nAnswer: 62.61"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F?\nAnswer: 60.29"}", "/scratch/micpie/export/rdkit_features/valid_18-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 65.17."} {"text":"The sum of atomic polarizabilities of the compound with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O is 40.13."}", "/scratch/micpie/export/rdkit_features/valid_7-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC?\nAnswer: 4"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_18-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 5"} {"text":"Question: What is the count of rings of the chemical with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_17-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S?\nAnswer: 61.95"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 62.08"}", "/scratch/micpie/export/rdkit_features/valid_14-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the compound with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C is 
3."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O is 4."}", "/scratch/micpie/export/rdkit_features/test_3-15.jsonl": "{"text":"Question: What is the heteroatom count of the compound with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC?\nAnswer: 9"} {"text":"Question: What is the number of heteroatoms of the chemical with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/test_103-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the compound with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1?\nAnswer: 24"} {"text":"Question: What is the aromatic bond count of the compound with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_105-23.jsonl": "{"text":"User: I want to create a compound with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptors of 5 and a Wildman-Crippen LogP value computed using RDKit of 4.16.\nAssistant: Do you have some additional conditions that I should consider?\nUser: Indeed, I want the chemical formula to be C18H14FN3OS2.\nAssistant: In that situation, I advise the compound with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F."} {"text":"User: I want to design a molecule with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value computed using RDKit of 1.73.\nAssistant: Nice, do you have some additional requirements I should take into account?\nUser: Yea, I want the chemical formula to be C13H12ClFN3O3+.\nAssistant: I suggest the molecule with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O."}", "/scratch/micpie/export/rdkit_features/train_24-4.jsonl": "{"text":"The count of rings of the compound with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5 is 5."} {"text":"The count of rings 
of the chemical with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C is 3."}", "/scratch/micpie/export/rdkit_features/test_28-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the compound with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC is 2.54."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C is 3.93."}", "/scratch/micpie/export/rdkit_features/valid_119-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O is 1.24."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6 is 3.71."}", "/scratch/micpie/export/rdkit_features/test_120-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the chemical with SMILES c1ccc(cc1)CCN(CC(=O)Nc2ccc(cc2)Cl)S(=O)(=O)c3ccc4c(c3)OCCO4?\nAnswer: 18"} {"text":"Question: What is the aromatic bond count of the chemical with SMILES Cc1c(cnn1c2ccccc2)N[C@H](C)C(=O)Nc3cccc(c3Cl)Cl?\nAnswer: 17"}", "/scratch/micpie/export/rdkit_features/train_21-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC is 9."} {"text":"The number of heteroatoms of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O is 8."}", "/scratch/micpie/export/rdkit_features/train_13-18.jsonl": "{"text":"Question: What is the aromatic bond count of the chemical with SMILES CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 6"} {"text":"Question: What is the number of aromatic bonds of the chemical with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/test_108-7.jsonl": "{"text":"The count of acid groups of the compound with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5 
is 0."} {"text":"The count of acid groups of the molecule with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3 is 0."}", "/scratch/micpie/export/rdkit_features/train_120-16.jsonl": "{"text":"Question: What is the ring count of the chemical with SMILES Cc1ccc(cc1)NC(=O)CSc2nnc(n2C)C=NC(=O)CCc3ccccc3Cl?\nAnswer: 3"} {"text":"Question: What is the count of rings of the molecule with SMILES Cc1c(cnn1c2ccccc2)N[C@@H](C)C(=O)Nc3cccc(c3Cl)Cl?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_16-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C is 2."} {"text":"The count of hydrogen bond donors of the chemical with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3 is 1."}", "/scratch/micpie/export/rdkit_features/test_113-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N is 5."} {"text":"The count of hydrogen bond acceptors of the molecule with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 4."}", "/scratch/micpie/export/rdkit_features/valid_15-8.jsonl": "{"text":"The basic group count of the molecule with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N is 0."} {"text":"The basic group count of the molecule with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C is 1."}", "/scratch/micpie/export/rdkit_features/train_2-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6?\nAnswer: 6"} {"text":"Question: What is the number of rings of the compound with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_120-8.jsonl": "{"text":"The number of basic groups of the compound with SMILES c1cc(oc1)CNC(=O)CSc2nc3ccc(cc3s2)n4c(c5c(c4O)[C@@H]6C[C@@H]5C=C6)O is 0."} 
{"text":"The basic group count of the chemical with SMILES CN(c1nc2cc(ccc2s1)Cl)C(=O)CSc3ccc(c(c3)F)F is 0."}", "/scratch/micpie/export/rdkit_features/train_118-8.jsonl": "{"text":"The count of basic groups of the compound with SMILES Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F is 1."} {"text":"The basic group count of the chemical with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3 is 0."}", "/scratch/micpie/export/rdkit_features/test_14-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the compound with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3?\nAnswer: 5"} {"text":"Question: What is the number of heteroatoms of the molecule with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_27-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F is 8."} {"text":"The number of heteroatoms of the chemical with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F is 7."}", "/scratch/micpie/export/rdkit_features/train_11-19.jsonl": "{"text":"Question: What is the acid group count of the compound with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the chemical with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_16-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C is 5."} {"text":"The count of aromatic bonds of the compound with SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F is 6."}", "/scratch/micpie/export/rdkit_features/test_5-7.jsonl": "{"text":"The acid group count of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5 is 0."} {"text":"The number of acid groups of the compound with SMILES 
c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F is 0."}", "/scratch/micpie/export/rdkit_features/valid_7-0.jsonl": "{"text":"The chemical formula of the compound with SMILES COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC is C24H28N2O4."} {"text":"The formula of the molecule with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC is C23H23N3O4."}", "/scratch/micpie/export/rdkit_features/train_33-4.jsonl": "{"text":"The number of rings of the compound with SMILES COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2 is 2."} {"text":"The number of rings of the chemical with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br is 3."}", "/scratch/micpie/export/rdkit_features/train_112-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the compound with SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC is 8."} {"text":"The number of hydrogen bond acceptors of the chemical with SMILES c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 5."}", "/scratch/micpie/export/rdkit_features/valid_21-7.jsonl": "{"text":"The count of acid groups of the molecule with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-] is 1."} {"text":"The number of acid groups of the chemical with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3 is 0."}", "/scratch/micpie/export/rdkit_features/test_20-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-]?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_29-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F is 58.00."} {"text":"The total sum of atomic polarizabilities of the 
molecule with SMILES CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C is 67.63."}", "/scratch/micpie/export/rdkit_features/test_103-22.jsonl": "{"text":"User: I want to create a compound with a count of hydrogen bond donors of 2 and a number of hydrogen bond acceptors of 4.\nAssistant: Interesting, do you have some additional constraints I should consider?\nUser: Yeah, I want the number of heteroatoms to be 4.\nAssistant: In that scenario, I suggest the compound with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1."} {"text":"User: I want to analyze a molecule with a number of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 7.\nAssistant: Nice, do you have some additional requirements that help me narrow down the search?\nUser: Yeah, I want the heteroatom count to be 9.\nAssistant: Then, I propose the molecule with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3."}", "/scratch/micpie/export/rdkit_features/valid_0-19.jsonl": "{"text":"Question: What is the count of acid groups of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1?\nAnswer: 1"} {"text":"Question: What is the acid group count of the compound with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_7-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC is 17."} {"text":"The number of aromatic bonds of the chemical with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC is 16."}", "/scratch/micpie/export/rdkit_features/valid_114-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl?\nAnswer: C20H17Cl2N3O2S2"} {"text":"Question: What is the chemical formula of the compound with SMILES C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4?\nAnswer: C27H24ClN3O2"}", 
"/scratch/micpie/export/rdkit_features/valid_28-20.jsonl": "{"text":"Question: What is the basic group count of the molecule with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br?\nAnswer: 0"} {"text":"Question: What is the basic group count of the compound with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_4-8.jsonl": "{"text":"The basic group count of the compound with SMILES Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br is 0."} {"text":"The number of basic groups of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C is 2."}", "/scratch/micpie/export/rdkit_features/valid_20-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3?\nAnswer: C25H33N3O3"} {"text":"Question: What is the formula of the compound with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-]?\nAnswer: C16H19ClNO6S-"}", "/scratch/micpie/export/rdkit_features/train_23-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4?\nAnswer: C20H33N6O2+"} {"text":"Question: What is the chemical formula of the chemical with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3?\nAnswer: C12H12IN6O-"}", "/scratch/micpie/export/rdkit_features/train_102-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O is -0.42."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1 is 4.10."}", "/scratch/micpie/export/rdkit_features/test_18-4.jsonl": "{"text":"The count of rings of the chemical with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O is 4."} {"text":"The count of rings 
of the chemical with SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C is 3."}", "/scratch/micpie/export/rdkit_features/test_115-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the molecule with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F is 2."} {"text":"The count of hydrogen bond donors of the compound with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 1."}", "/scratch/micpie/export/rdkit_features/valid_120-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES c1cc(oc1)CNC(=O)CSc2nc3ccc(cc3s2)n4c(c5c(c4O)[C@@H]6C[C@@H]5C=C6)O?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES CN(c1nc2cc(ccc2s1)Cl)C(=O)CSc3ccc(c(c3)F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_17-5.jsonl": "{"text":"The rotatable bond count of the chemical with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F is 6."} {"text":"The rotatable bond count of the molecule with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 5."}", "/scratch/micpie/export/rdkit_features/valid_6-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+]?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_112-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC?\nAnswer: 9"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 5"}", 
"/scratch/micpie/export/rdkit_features/valid_116-4.jsonl": "{"text":"The count of rings of the chemical with SMILES CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 6."} {"text":"The count of rings of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl is 5."}", "/scratch/micpie/export/rdkit_features/valid_31-5.jsonl": "{"text":"The number of rotatable bonds of the chemical with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC is 5."} {"text":"The count of rotatable bonds of the molecule with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3 is 6."}", "/scratch/micpie/export/rdkit_features/test_3-22.jsonl": "{"text":"User: I want to create a compound with a count of hydrogen bond donors of 2 and a count of hydrogen bond acceptors of 7.\nAssistant: Nice, do you have some additional limitations that help me narrow down the search?\nUser: I want the number of heteroatoms to be 9.\nAssistant: In that case, I advise the compound with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC."} {"text":"User: I want to create a molecule with a number of hydrogen bond donor sites of 3 and a count of hydrogen bond acceptors of 4.\nAssistant: That's interesting, do you have some additional requirements I should take into account?\nUser: Yea, I want the count of heteroatoms to be 9.\nAssistant: In that situation, I suggest the molecule with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F."}", "/scratch/micpie/export/rdkit_features/test_113-22.jsonl": "{"text":"User: I want to synthesize a compound with a number of hydrogen bond donors of 3 and a number of hydrogen bond acceptor sites of 5.\nAssistant: Nice, do you have some additional constraints I should take into account?\nUser: I want the number of heteroatoms to be 9.\nAssistant: Then, I propose the compound with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N."} {"text":"User: I want to make a
chemical with a count of hydrogen bond donors of 3 and a number of hydrogen bond acceptor sites of 4.\nAssistant: Cool, do you have some additional requirements?\nUser: Yea, I want the number of heteroatoms to be 10.\nAssistant: In that situation, I recommend the chemical with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl."}", "/scratch/micpie/export/rdkit_features/valid_8-1.jsonl": "{"text":"The number of hydrogen bond donors of the molecule with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C is 1."} {"text":"The count of hydrogen bond donors of the chemical with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br is 0."}", "/scratch/micpie/export/rdkit_features/valid_109-18.jsonl": "{"text":"Question: What is the aromatic bond count of the chemical with SMILES C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl?\nAnswer: 12"} {"text":"Question: What is the aromatic bond count of the compound with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F?\nAnswer: 17"}", "/scratch/micpie/export/rdkit_features/train_19-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC?\nAnswer: 67.66"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3?\nAnswer: 71.71"}", "/scratch/micpie/export/rdkit_features/valid_5-12.jsonl": "{"text":"Question: What is the formula of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C?\nAnswer: C21H23N4O3S+"} {"text":"Question: What is the chemical formula of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+]?\nAnswer: C25H25N2O4+"}", "/scratch/micpie/export/rdkit_features/train_0-11.jsonl": "{"text":"User: I want to synthesize a compound with a molecular formula of C21H6F2N2O3S4.\nAssistant: That's interesting, do you have some 
additional constraints I should consider?\nUser: Yes, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 8.\nAssistant: In that case, I advise the compound with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1."} {"text":"User: I want to create a molecule with a chemical formula of C25H33N2O3+.\nAssistant: That's interesting, do you have some additional requirements that help me narrow down the search?\nUser: Yea, I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptors to be 3.\nAssistant: In that scenario, I suggest the molecule with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4."}", "/scratch/micpie/export/rdkit_features/train_24-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5?\nAnswer: C20H16N7O2-"} {"text":"Question: What is the chemical formula of the molecule with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C?\nAnswer: C19H31N7O2"}", "/scratch/micpie/export/rdkit_features/test_108-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the compound with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5?\nAnswer: 5"} {"text":"Question: What is the number of rotatable bonds of the compound with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_23-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4 is 0.96."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4 is 1.79."}", "/scratch/micpie/export/rdkit_features/valid_31-6.jsonl": "{"text":"The aromatic bond count of the chemical with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC is 12."} 
{"text":"The count of aromatic bonds of the compound with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3 is 6."}", "/scratch/micpie/export/rdkit_features/test_117-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc5cc(ccc5c4)Br?\nAnswer: 6"} {"text":"Question: What is the number of heteroatoms of the chemical with SMILES C[C@H](C(C)(C)OC)[NH2+]CCNc1ccccc1?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_13-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC is 61.56."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C is 60.41."}", "/scratch/micpie/export/rdkit_features/test_116-23.jsonl": "{"text":"User: I want to design a molecule with a count of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value computed using RDKit of 5.57.\nAssistant: That is a very interesting question, do you have some additional limitations?\nUser: Yea, I want the chemical formula to be C29H26FN3O2S.\nAssistant: I propose the molecule with SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1."} {"text":"User: I want to analyze a molecule with a number of hydrogen bond donors of 0, a count of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value of 5.08.\nAssistant: That is a very interesting question, do you have some additional conditions I should take into account?\nUser: Yea, I want the molecular formula to be C24H23ClN2O3S.\nAssistant: In that situation, I advise the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl."}", "/scratch/micpie/export/rdkit_features/train_114-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES 
Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_1-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4?\nAnswer: 4"} {"text":"Question: What is the count of rings of the compound with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_108-22.jsonl": "{"text":"User: I want to analyze a chemical with a number of hydrogen bond donor sites of 2 and a count of hydrogen bond acceptors of 10.\nAssistant: That is a very interesting question, do you have some additional requirements I should consider?\nUser: Yes, I want the number of heteroatoms to be 11.\nAssistant: In that situation, I recommend the chemical with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5."} {"text":"User: I want to analyze a chemical with a number of hydrogen bond donors of 3 and a number of hydrogen bond acceptors of 5.\nAssistant: That is a very interesting question, do you have some additional constraints I should consider?\nUser: Yep, I want the heteroatom count to be 9.\nAssistant: In that situation, I suggest the chemical with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3."}", "/scratch/micpie/export/rdkit_features/test_26-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_17-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES 
CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F is 1."} {"text":"The number of hydrogen bond donors of the molecule with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 1."}", "/scratch/micpie/export/rdkit_features/train_29-12.jsonl": "{"text":"Question: What is the molecular formula of the molecule with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F?\nAnswer: C20H26FN3O2"} {"text":"Question: What is the molecular formula of the compound with SMILES CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C?\nAnswer: C22H36N3O2+"}", "/scratch/micpie/export/rdkit_features/test_13-8.jsonl": "{"text":"The number of basic groups of the compound with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC is 0."} {"text":"The basic group count of the molecule with SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4 is 0."}", "/scratch/micpie/export/rdkit_features/train_113-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 2"} {"text":"Question: What is the number of basic groups of the compound with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_21-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-]?\nAnswer: 5"} {"text":"Question: What is the count of aromatic bonds of the compound with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_9-18.jsonl": "{"text":"Question: What is the aromatic bond count of the compound with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C?\nAnswer: 6"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES 
C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4?\nAnswer: 10"}", "/scratch/micpie/export/rdkit_features/train_108-4.jsonl": "{"text":"The count of rings of the chemical with SMILES CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2 is 2."} {"text":"The count of rings of the molecule with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3 is 3."}", "/scratch/micpie/export/rdkit_features/valid_9-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2?\nAnswer: 1"} {"text":"Question: What is the count of basic groups of the compound with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_6-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+] is 3.05."} {"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F is 3.63."}", "/scratch/micpie/export/rdkit_features/valid_11-16.jsonl": "{"text":"Question: What is the count of rings of the molecule with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl?\nAnswer: 3"} {"text":"Question: What is the ring count of the compound with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_33-11.jsonl": "{"text":"User: I want to analyze a compound with a formula of C17H24F3N4O3+.\nAssistant: Nice, do you have some additional limitations?\nUser: Indeed, I want the number of hydrogen bond donors to be 3, the number of hydrogen bond acceptor sites to be 4.\nAssistant: In that scenario, I suggest the compound with SMILES COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2."} {"text":"User: I want to design a chemical with a formula of 
C15H20BrN4O2S+.\nAssistant: Cool, do you have some additional requirements I should take into account?\nUser: I want the number of hydrogen bond donor sites to be 1, the count of hydrogen bond acceptors to be 6.\nAssistant: I recommend the chemical with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br."}", "/scratch/micpie/export/rdkit_features/train_28-4.jsonl": "{"text":"The count of rings of the chemical with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl is 3."} {"text":"The number of rings of the molecule with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F is 4."}", "/scratch/micpie/export/rdkit_features/test_102-19.jsonl": "{"text":"Question: What is the count of acid groups of the molecule with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the compound with SMILES Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_12-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_0-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1 is 1."} {"text":"The count of hydrogen bond donors of the chemical with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4 is 2."}", "/scratch/micpie/export/rdkit_features/train_33-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2 is C17H24F3N4O3+."} {"text":"The molecular formula of the compound with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br is C15H20BrN4O2S+."}", 
"/scratch/micpie/export/rdkit_features/valid_12-19.jsonl": "{"text":"Question: What is the number of acid groups of the molecule with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the chemical with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_13-0.jsonl": "{"text":"The molecular formula of the molecule with SMILES CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC is C21H32FNO2."} {"text":"The molecular formula of the molecule with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C is C20H30N4O."}", "/scratch/micpie/export/rdkit_features/test_112-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC is C21H26N6O3S."} {"text":"The molecular formula of the compound with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N is C22H27N6O2S+."}", "/scratch/micpie/export/rdkit_features/valid_17-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the chemical with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_116-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is C27H25ClFN3O2."} {"text":"The molecular formula of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br is C26H25BrN2O3."}", "/scratch/micpie/export/rdkit_features/valid_100-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F?\nAnswer: 54.34"} {"text":"Question: What 
is the sum of atomic polarizabilities of the compound with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4?\nAnswer: 55.31"}", "/scratch/micpie/export/rdkit_features/test_17-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F is 6."} {"text":"The count of rotatable bonds of the chemical with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 5."}", "/scratch/micpie/export/rdkit_features/test_32-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3 is C19H30N5O2S+."} {"text":"The chemical formula of the compound with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3 is C21H24N4O3."}", "/scratch/micpie/export/rdkit_features/valid_116-22.jsonl": "{"text":"User: I want to analyze a compound with a count of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 3.\nAssistant: That's interesting, do you have some additional conditions I should take into account?\nUser: Indeed, I want the number of heteroatoms to be 6.\nAssistant: I recommend the compound with SMILES CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F."} {"text":"User: I want to analyze a chemical with a count of hydrogen bond donors of 0 and a number of hydrogen bond acceptor sites of 4.\nAssistant: Interesting, do you have some additional requirements I should take into account?\nUser: Yeah, I want the heteroatom count to be 7.\nAssistant: Then, I recommend the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl."}", "/scratch/micpie/export/rdkit_features/test_16-0.jsonl": "{"text":"The formula of the molecule with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C is C18H35N2O2+."} {"text":"The formula of the molecule with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N is C17H19Cl2N3O3S."}", 
"/scratch/micpie/export/rdkit_features/train_15-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the compound with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC?\nAnswer: 6"} {"text":"Question: What is the count of heteroatoms of the chemical with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_26-7.jsonl": "{"text":"The acid group count of the chemical with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3 is 0."} {"text":"The acid group count of the molecule with SMILES COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br is 0."}", "/scratch/micpie/export/rdkit_features/train_109-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3 is 2.25."} {"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C is 2.46."}", "/scratch/micpie/export/rdkit_features/valid_8-0.jsonl": "{"text":"The formula of the compound with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C is C23H36N2O5."} {"text":"The chemical formula of the compound with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br is C11H7Br2F4NO2."}", "/scratch/micpie/export/rdkit_features/valid_27-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F is C14H18F3N5OS."} {"text":"The chemical formula of the compound with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C is C20H31ClN3O+."}", "/scratch/micpie/export/rdkit_features/test_20-3.jsonl": "{"text":"The number of heteroatoms of the compound with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br is 6."} {"text":"The number of heteroatoms of the compound with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-] is 9."}", 
"/scratch/micpie/export/rdkit_features/test_25-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4 is 69.40."} {"text":"The sum of atomic polarizabilities of the compound with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3 is 51.58."}", "/scratch/micpie/export/rdkit_features/train_113-16.jsonl": "{"text":"Question: What is the count of rings of the molecule with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 3"} {"text":"Question: What is the number of rings of the molecule with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_27-7.jsonl": "{"text":"The number of acid groups of the chemical with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F is 0."} {"text":"The acid group count of the compound with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C is 0."}", "/scratch/micpie/export/rdkit_features/valid_33-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6?\nAnswer: 0"} {"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_28-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the chemical with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl?\nAnswer: 2"} {"text":"Question: What is the number of rotatable bonds of the molecule with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_120-4.jsonl": "{"text":"The ring count of the molecule with SMILES Cc1ccc(cc1)NC(=O)CSc2nnc(n2C)C=NC(=O)CCc3ccccc3Cl is 3."} {"text":"The ring count of the compound with SMILES 
Cc1c(cnn1c2ccccc2)N[C@@H](C)C(=O)Nc3cccc(c3Cl)Cl is 3."}", "/scratch/micpie/export/rdkit_features/valid_25-17.jsonl": "{"text":"Question: What is the rotatable bond count of the chemical with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4?\nAnswer: 7"} {"text":"Question: What is the number of rotatable bonds of the molecule with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_113-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: C20H24N5O2S2+"} {"text":"Question: What is the chemical formula of the chemical with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: C20H17Cl2N3O3S2"}", "/scratch/micpie/export/rdkit_features/train_105-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N?\nAnswer: C21H15N3O4"} {"text":"Question: What is the chemical formula of the chemical with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O?\nAnswer: C16H21N3O4"}", "/scratch/micpie/export/rdkit_features/test_112-23.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donor sites of 0, a number of hydrogen bond acceptors of 9 and a Wildman-Crippen LogP value computed using RDKit of 2.44.\nAssistant: That's interesting, do you have some additional conditions I should take into account?\nUser: Yes, I want the formula to be C21H26N6O3S.\nAssistant: Then, I recommend the chemical with SMILES C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC."} {"text":"User: I want to analyze a molecule with a number of hydrogen bond donors of 3, a number of hydrogen bond acceptor sites of 5 and a LogP value computed using the Wildman-Crippen method of 0.31.\nAssistant: That is a very interesting question, do you have some additional constraints I should take into account?\nUser: 
Yes, I want the formula to be C22H27N6O2S+.\nAssistant: In that situation, I advise the molecule with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N."}", "/scratch/micpie/export/rdkit_features/valid_8-20.jsonl": "{"text":"Question: What is the number of basic groups of the chemical with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the chemical with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_17-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the chemical with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S is 2.65."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 0.66."}", "/scratch/micpie/export/rdkit_features/valid_3-22.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donor sites of 3 and a number of hydrogen bond acceptor sites of 5.\nAssistant: That's interesting, do you have some additional limitations that help me narrow down the search?\nUser: Yes, I want the heteroatom count to be 8.\nAssistant: Then, I recommend the chemical with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N."} {"text":"User: I want to analyze a molecule with a number of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 5.\nAssistant: That's interesting, do you have some additional constraints that I should consider?\nUser: Yeah, I want the number of heteroatoms to be 8.\nAssistant: In that scenario, I suggest the molecule with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC."}", "/scratch/micpie/export/rdkit_features/valid_23-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES 
CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4?\nAnswer: C19H34N5OS+"} {"text":"Question: What is the formula of the compound with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4?\nAnswer: C19H24N7O2-"}", "/scratch/micpie/export/rdkit_features/test_106-11.jsonl": "{"text":"User: I want to create a chemical with a molecular formula of C14H18N4O4.\nAssistant: That is a very interesting question, do you have some additional conditions I should consider?\nUser: Indeed, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptors to be 6.\nAssistant: I recommend the chemical with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O."} {"text":"User: I want to make a compound with a chemical formula of C22H27N3O5S.\nAssistant: Nice, do you have some additional constraints that I should consider?\nUser: Indeed, I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptors to be 6.\nAssistant: In that situation, I suggest the compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC."}", "/scratch/micpie/export/rdkit_features/valid_113-3.jsonl": "{"text":"The count of heteroatoms of the molecule with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 9."} {"text":"The heteroatom count of the chemical with SMILES Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 9."}", "/scratch/micpie/export/rdkit_features/train_100-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C is 1.13."} {"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3 is 2.93."}", "/scratch/micpie/export/rdkit_features/test_117-4.jsonl": "{"text":"The count of rings of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc5cc(ccc5c4)Br is 5."} {"text":"The ring count of the chemical with 
SMILES C[C@H](C(C)(C)OC)[NH2+]CCNc1ccccc1 is 1."}", "/scratch/micpie/export/rdkit_features/train_5-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C is 2.36."} {"text":"The Wildman-Crippen LogP value of the compound with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C is 3.16."}", "/scratch/micpie/export/rdkit_features/valid_30-6.jsonl": "{"text":"The count of aromatic bonds of the chemical with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3 is 6."} {"text":"The number of aromatic bonds of the chemical with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F is 12."}", "/scratch/micpie/export/rdkit_features/valid_11-22.jsonl": "{"text":"User: I want to design a chemical with a number of hydrogen bond donor sites of 1 and a number of hydrogen bond acceptors of 2.\nAssistant: Interesting, do you have some additional conditions I should take into account?\nUser: Indeed, I want the number of heteroatoms to be 4.\nAssistant: In that case, I advise the chemical with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl."} {"text":"User: I want to create a molecule with a number of hydrogen bond donor sites of 0 and a number of hydrogen bond acceptors of 1.\nAssistant: That is a very interesting question, do you have some additional conditions that I should consider?\nUser: Indeed, I want the count of heteroatoms to be 4.\nAssistant: I propose the molecule with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C."}", "/scratch/micpie/export/rdkit_features/valid_29-16.jsonl": "{"text":"Question: What is the count of rings of the molecule with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl?\nAnswer: 3"} {"text":"Question: What is the number of rings of the molecule with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3?\nAnswer: 3"}", 
"/scratch/micpie/export/rdkit_features/train_8-16.jsonl": "{"text":"Question: What is the count of rings of the chemical with SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N?\nAnswer: 4"} {"text":"Question: What is the ring count of the molecule with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_6-4.jsonl": "{"text":"The ring count of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+] is 3."} {"text":"The ring count of the compound with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F is 4."}", "/scratch/micpie/export/rdkit_features/valid_32-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3?\nAnswer: 63.45"} {"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O?\nAnswer: 50.73"}", "/scratch/micpie/export/rdkit_features/test_109-22.jsonl": "{"text":"User: I want to synthesize a chemical with a count of hydrogen bond donors of 3 and a number of hydrogen bond acceptor sites of 5.\nAssistant: Nice, do you have some additional requirements?\nUser: Yeah, I want the count of heteroatoms to be 8.\nAssistant: In that case, I propose the chemical with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl."} {"text":"User: I want to analyze a chemical with a number of hydrogen bond donor sites of 3 and a number of hydrogen bond acceptors of 5.\nAssistant: Cool, do you have some additional limitations that help me narrow down the search?\nUser: Yea, I want the count of heteroatoms to be 10.\nAssistant: Then, I recommend the chemical with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N."}", "/scratch/micpie/export/rdkit_features/train_112-3.jsonl": "{"text":"The number of heteroatoms of the chemical with SMILES 
CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC is 9."} {"text":"The heteroatom count of the chemical with SMILES c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 9."}", "/scratch/micpie/export/rdkit_features/test_3-4.jsonl": "{"text":"The ring count of the molecule with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC is 4."} {"text":"The ring count of the molecule with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F is 3."}", "/scratch/micpie/export/rdkit_features/valid_27-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F is 10."} {"text":"The count of aromatic bonds of the compound with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C is 6."}", "/scratch/micpie/export/rdkit_features/test_24-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl?\nAnswer: 7"} {"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_119-19.jsonl": "{"text":"Question: What is the count of acid groups of the chemical with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_19-3.jsonl": "{"text":"The number of heteroatoms of the compound with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl is 8."} {"text":"The count of heteroatoms of the chemical with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4 is 8."}", "/scratch/micpie/export/rdkit_features/valid_28-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES 
c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br?\nAnswer: 4"} {"text":"Question: What is the rotatable bond count of the chemical with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_18-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O is 6."} {"text":"The number of aromatic bonds of the compound with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl is 12."}", "/scratch/micpie/export/rdkit_features/train_31-22.jsonl": "{"text":"User: I want to analyze a chemical with a count of hydrogen bond donors of 0 and a number of hydrogen bond acceptors of 3.\nAssistant: Cool, do you have some additional constraints that help me narrow down the search?\nUser: Yeah, I want the number of heteroatoms to be 6.\nAssistant: Then, I recommend the chemical with SMILES Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F."} {"text":"User: I want to analyze a compound with a count of hydrogen bond donors of 2 and a number of hydrogen bond acceptor sites of 4.\nAssistant: That is a very interesting question, do you have some additional I should take into account?\nUser: I want the count of heteroatoms to be 7.\nAssistant: Then, I the compound with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4."}", "/scratch/micpie/export/rdkit_features/valid_14-12.jsonl": "{"text":"Question: What is the chemical formula of the compound with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C?\nAnswer: C20H28N4O"} {"text":"Question: What is the chemical formula of the compound with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O?\nAnswer: C14H14BrNO3"}", "/scratch/micpie/export/rdkit_features/test_32-3.jsonl": "{"text":"The heteroatom count of the molecule with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3 is 8."} {"text":"The number of heteroatoms of the molecule 
with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3 is 7."}", "/scratch/micpie/export/rdkit_features/valid_113-22.jsonl": "{"text":"User: I want to synthesize a molecule with a count of hydrogen bond donors of 3 and a number of hydrogen bond acceptor sites of 5.\nAssistant: Nice, do you have some additional constraints I should take into account?\nUser: Yes, I want the count of heteroatoms to be 9.\nAssistant: I suggest the molecule with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N."} {"text":"User: I want to make a molecule with a number of hydrogen bond donors of 3 and a number of hydrogen bond acceptor sites of 3.\nAssistant: That's interesting, do you have some additional limitations I should take into account?\nUser: Yep, I want the heteroatom count to be 9.\nAssistant: In that situation, I suggest the molecule with SMILES Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl."}", "/scratch/micpie/export/rdkit_features/test_2-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the chemical with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C is 2.43."} {"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4 is 3.64."}", "/scratch/micpie/export/rdkit_features/test_12-19.jsonl": "{"text":"Question: What is the number of acid groups of the chemical with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the molecule with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_14-6.jsonl": "{"text":"The count of aromatic bonds of the molecule with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C is 11."} {"text":"The count of aromatic bonds of the compound with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O is 6."}", 
"/scratch/micpie/export/rdkit_features/train_33-5.jsonl": "{"text":"The rotatable bond count of the compound with SMILES COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2 is 6."} {"text":"The rotatable bond count of the chemical with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br is 4."}", "/scratch/micpie/export/rdkit_features/train_4-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4?\nAnswer: 5"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_29-8.jsonl": "{"text":"The number of basic groups of the molecule with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4 is 1."} {"text":"The number of basic groups of the chemical with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C is 0."}", "/scratch/micpie/export/rdkit_features/test_15-22.jsonl": "{"text":"User: I want to analyze a compound with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 4.\nAssistant: That's interesting, do you have some additional conditions I should take into account?\nUser: Yep, I want the heteroatom count to be 5.\nAssistant: In that situation, I advise the compound with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O."} {"text":"User: I want to design a compound with a number of hydrogen bond donors of 2 and a count of hydrogen bond acceptors of 3.\nAssistant: That's interesting, do you have some additional I should take into account?\nUser: Indeed, I want the heteroatom count to be 5.\nAssistant: In that situation, I advise the compound with SMILES C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2."}", "/scratch/micpie/export/rdkit_features/test_100-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the compound with 
SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F?\nAnswer: 5"} {"text":"Question: What is the number of heteroatoms of the chemical with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_11-5.jsonl": "{"text":"The number of rotatable bonds of the chemical with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl is 3."} {"text":"The rotatable bond count of the molecule with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C is 3."}", "/scratch/micpie/export/rdkit_features/valid_15-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N?\nAnswer: 0"} {"text":"Question: What is the acid group count of the chemical with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_16-7.jsonl": "{"text":"The acid group count of the molecule with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C is 0."} {"text":"The acid group count of the molecule with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N is 0."}", "/scratch/micpie/export/rdkit_features/valid_15-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the compound with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N is 2.82."} {"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C is 1.28."}", "/scratch/micpie/export/rdkit_features/valid_6-23.jsonl": "{"text":"User: I want to analyze a molecule with a number of hydrogen bond donor sites of 3, a number of hydrogen bond acceptors of 4 and a LogP value computed using the Wildman-Crippen method of 3.05.\nAssistant: Do you have some additional that I should consider?\nUser: Indeed, I want the chemical formula to be C25H26N3O3+.\nAssistant: In that situation, I advise the molecule with SMILES 
c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+]."} {"text":"User: I want to synthesize a chemical with a number of hydrogen bond donor sites of 0, a number of hydrogen bond acceptor sites of 6 and a Wildman-Crippen LogP value of 3.63.\nAssistant: Cool, do you have some additional requirements?\nUser: I want the formula to be C22H24FN5O2.\nAssistant: I advise the chemical with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F."}", "/scratch/micpie/export/rdkit_features/valid_27-8.jsonl": "{"text":"The basic group count of the compound with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F is 0."} {"text":"The number of basic groups of the compound with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C is 1."}", "/scratch/micpie/export/rdkit_features/train_9-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C is 6."} {"text":"The count of rotatable bonds of the molecule with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F is 8."}", "/scratch/micpie/export/rdkit_features/test_10-8.jsonl": "{"text":"The number of basic groups of the molecule with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4 is 1."} {"text":"The count of basic groups of the chemical with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F is 0."}", "/scratch/micpie/export/rdkit_features/valid_18-23.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptor sites of 9 and a LogP value computed using the Wildman-Crippen method of 0.95.\nAssistant: Interesting, do you have some additional conditions I should consider?\nUser: Yeah, I want the molecular formula to be C22H23N5O7.\nAssistant: In that situation, I recommend the chemical with SMILES c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O."} {"text":"User: I want to make a chemical with a number of 
hydrogen bond donor sites of 1, a number of hydrogen bond acceptors of 2 and a Wildman-Crippen LogP value computed using RDKit of 3.96.\nAssistant: Interesting, do you have some additional limitations I should take into account?\nUser: Yeah, I want the chemical formula to be C12H14Br2F3NO.\nAssistant: In that scenario, I propose the chemical with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O."}", "/scratch/micpie/export/rdkit_features/train_19-23.jsonl": "{"text":"User: I want to synthesize a chemical with a count of hydrogen bond donors of 3, a number of hydrogen bond acceptor sites of 4 and a Wildman-Crippen LogP value of 3.70.\nAssistant: Cool, do you have some additional requirements I should take into account?\nUser: I want the formula to be C23H31N3O4.\nAssistant: In that situation, I suggest the chemical with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC."} {"text":"User: I want to create a compound with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptors of 3 and a LogP value computed using the Wildman-Crippen method of 3.95.\nAssistant: Nice, do you have some additional requirements I should take into account?\nUser: Yes, I want the chemical formula to be C25H33N3O3.\nAssistant: In that situation, I propose the compound with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3."}", "/scratch/micpie/export/rdkit_features/train_112-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC is 0."} {"text":"The number of hydrogen bond donor sites of the chemical with SMILES c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 3."}", "/scratch/micpie/export/rdkit_features/train_21-1.jsonl": "{"text":"The number of hydrogen bond donors of the molecule with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC is 1."} {"text":"The number of hydrogen bond donors of the compound with SMILES 
C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O is 1."}", "/scratch/micpie/export/rdkit_features/train_5-23.jsonl": "{"text":"User: I want to create a compound with a number of hydrogen bond donors of 2, a count of hydrogen bond acceptors of 4 and a LogP value computed using the Wildman-Crippen method of 2.36.\nAssistant: That's interesting, do you have some additional conditions I should consider?\nUser: Indeed, I want the chemical formula to be C23H29N2O3S+.\nAssistant: I suggest the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C."} {"text":"User: I want to create a chemical with a count of hydrogen bond donors of 3, a count of hydrogen bond acceptors of 4 and a LogP value computed using the Wildman-Crippen method of 3.16.\nAssistant: Nice, do you have some additional constraints I should take into account?\nUser: Yeah, I want the chemical formula to be C21H32N4O4.\nAssistant: I the chemical with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C."}", "/scratch/micpie/export/rdkit_features/test_1-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4 is 64.23."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC is 63.72."}", "/scratch/micpie/export/rdkit_features/train_0-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_102-6.jsonl": "{"text":"The count of aromatic bonds of the molecule with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3 is 16."} {"text":"The count of aromatic bonds of the chemical 
with SMILES Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1 is 22."}", "/scratch/micpie/export/rdkit_features/valid_6-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+]?\nAnswer: 6"} {"text":"Question: What is the count of heteroatoms of the compound with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/test_105-22.jsonl": "{"text":"User: I want to synthesize a compound with a count of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 3.\nAssistant: Do you have some additional constraints?\nUser: I want the heteroatom count to be 5.\nAssistant: I propose the compound with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N."} {"text":"User: I want to design a molecule with a number of hydrogen bond donors of 2 and a number of hydrogen bond acceptors of 6.\nAssistant: Nice, do you have some additional that I should consider?\nUser: Indeed, I want the count of heteroatoms to be 7.\nAssistant: In that situation, I the molecule with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O."}", "/scratch/micpie/export/rdkit_features/test_15-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O is 6."} {"text":"The aromatic bond count of the molecule with SMILES C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2 is 5."}", "/scratch/micpie/export/rdkit_features/valid_9-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2 is 7."} {"text":"The heteroatom count of the compound with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4 is 7."}", "/scratch/micpie/export/rdkit_features/train_27-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the compound with SMILES 
CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C?\nAnswer: 7"} {"text":"Question: What is the number of rotatable bonds of the compound with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_21-8.jsonl": "{"text":"The count of basic groups of the chemical with SMILES Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-] is 0."} {"text":"The count of basic groups of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O is 0."}", "/scratch/micpie/export/rdkit_features/valid_120-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES c1cc(oc1)CNC(=O)CSc2nc3ccc(cc3s2)n4c(c5c(c4O)[C@@H]6C[C@@H]5C=C6)O?\nAnswer: 8"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES CN(c1nc2cc(ccc2s1)Cl)C(=O)CSc3ccc(c(c3)F)F?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_107-22.jsonl": "{"text":"User: I want to design a molecule with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptor sites of 5.\nAssistant: Do you have some additional that I should consider?\nUser: Yep, I want the number of heteroatoms to be 8.\nAssistant: Then, I suggest the molecule with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C."} {"text":"User: I want to design a chemical with a count of hydrogen bond donors of 3 and a number of hydrogen bond acceptor sites of 6.\nAssistant: That's interesting, do you have some additional I should take into account?\nUser: Indeed, I want the number of heteroatoms to be 9.\nAssistant: In that scenario, I advise the chemical with SMILES CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C."}", "/scratch/micpie/export/rdkit_features/train_16-23.jsonl": "{"text":"User: I want to design a compound with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptor sites of 3 and a LogP value computed 
using the Wildman-Crippen method of 1.57.\nAssistant: Interesting, do you have some additional I should take into account?\nUser: Yea, I want the molecular formula to be C17H28N3OS+.\nAssistant: I advise the compound with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C."} {"text":"User: I want to design a compound with a number of hydrogen bond donors of 1, a count of hydrogen bond acceptors of 3 and a LogP value computed using the Wildman-Crippen method of 2.82.\nAssistant: Do you have some additional constraints I should take into account?\nUser: Yeah, I want the chemical formula to be C23H28F2N2O3.\nAssistant: In that scenario, I recommend the compound with SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F."}", "/scratch/micpie/export/rdkit_features/test_103-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1?\nAnswer: 4"} {"text":"Question: What is the number of rings of the chemical with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_114-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C?\nAnswer: 71.75"} {"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5?\nAnswer: 68.04"}", "/scratch/micpie/export/rdkit_features/train_5-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C?\nAnswer: 2"} {"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_108-13.jsonl": "{"text":"Question: What is the number of 
hydrogen bond donors of the molecule with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_20-8.jsonl": "{"text":"The count of basic groups of the molecule with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3 is 0."} {"text":"The basic group count of the compound with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-] is 0."}", "/scratch/micpie/export/rdkit_features/train_106-23.jsonl": "{"text":"User: I want to synthesize a chemical with a count of hydrogen bond donors of 0, a number of hydrogen bond acceptor sites of 6 and a Wildman-Crippen LogP value of 2.27.\nAssistant: Interesting, do you have some additional requirements I should consider?\nUser: Yea, I want the molecular formula to be C12H16F2N4O3.\nAssistant: In that situation, I suggest the chemical with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O."} {"text":"User: I want to make a compound with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptor sites of 6 and a LogP value computed using the Wildman-Crippen method of 2.33.\nAssistant: Cool, do you have some additional conditions that I should consider?\nUser: Yep, I want the molecular formula to be C20H24ClN7O3.\nAssistant: In that case, I advise the compound with SMILES CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl."}", "/scratch/micpie/export/rdkit_features/test_29-12.jsonl": "{"text":"Question: What is the molecular formula of the compound with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4?\nAnswer: C21H33N4O+"} {"text":"Question: What is the formula of the molecule with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C?\nAnswer: C21H38N2O2"}", "/scratch/micpie/export/rdkit_features/test_19-16.jsonl": 
"{"text":"Question: What is the count of rings of the chemical with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl?\nAnswer: 3"} {"text":"Question: What is the number of rings of the chemical with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_117-8.jsonl": "{"text":"The basic group count of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4OC)Cl is 0."} {"text":"The count of basic groups of the compound with SMILES Cc1cc(ccc1[C@@H](C)[NH2+]CC2(CC2)CO)F is 1."}", "/scratch/micpie/export/rdkit_features/test_114-23.jsonl": "{"text":"User: I want to analyze a molecule with a count of hydrogen bond donors of 3, a count of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value of 5.74.\nAssistant: That's interesting, do you have some additional constraints I should consider?\nUser: I want the chemical formula to be C24H27N3O3S2.\nAssistant: In that case, I recommend the molecule with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C."} {"text":"User: I want to make a compound with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptor sites of 6 and a LogP value computed using the Wildman-Crippen method of 5.07.\nAssistant: Cool, do you have some additional limitations I should take into account?\nUser: Indeed, I want the chemical formula to be C26H22N4O4.\nAssistant: In that situation, I propose the compound with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5."}", "/scratch/micpie/export/rdkit_features/test_106-4.jsonl": "{"text":"The ring count of the compound with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O is 2."} {"text":"The count of rings of the compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC is 2."}", "/scratch/micpie/export/rdkit_features/train_19-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES 
C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC is 12."} {"text":"The number of aromatic bonds of the chemical with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3 is 12."}", "/scratch/micpie/export/rdkit_features/train_1-22.jsonl": "{"text":"User: I want to design a compound with a number of hydrogen bond donor sites of 2 and a count of hydrogen bond acceptors of 3.\nAssistant: Nice, do you have some additional requirements that help me narrow down the search?\nUser: I want the heteroatom count to be 5.\nAssistant: I suggest the compound with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4."} {"text":"User: I want to synthesize a compound with a count of hydrogen bond donors of 0 and a number of hydrogen bond acceptors of 3.\nAssistant: That's interesting, do you have some additional ?\nUser: Yep, I want the heteroatom count to be 9.\nAssistant: I advise the compound with SMILES c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl."}", "/scratch/micpie/export/rdkit_features/test_16-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the compound with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C?\nAnswer: 0"} {"text":"Question: What is the number of aromatic bonds of the molecule with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_114-15.jsonl": "{"text":"Question: What is the heteroatom count of the chemical with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C?\nAnswer: 9"} {"text":"Question: What is the count of heteroatoms of the compound with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/valid_31-4.jsonl": "{"text":"The count of rings of the compound with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC is 4."} {"text":"The count of rings of the compound with SMILES 
C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3 is 3."}", "/scratch/micpie/export/rdkit_features/test_17-4.jsonl": "{"text":"The ring count of the compound with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F is 3."} {"text":"The ring count of the chemical with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 5."}", "/scratch/micpie/export/rdkit_features/valid_103-11.jsonl": "{"text":"User: I want to make a molecule with a chemical formula of C22H36O7.\nAssistant: Do you have some additional limitations I should consider?\nUser: Yeah, I want the number of hydrogen bond donors to be 4, the count of hydrogen bond acceptors to be 7.\nAssistant: In that situation, I recommend the molecule with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO."} {"text":"User: I want to design a molecule with a molecular formula of C14H14N6O3.\nAssistant: Cool, do you have some additional I should take into account?\nUser: Yep, I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptors to be 6.\nAssistant: In that scenario, I the molecule with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N."}", "/scratch/micpie/export/rdkit_features/valid_117-3.jsonl": "{"text":"The heteroatom count of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4OC)Cl is 7."} {"text":"The heteroatom count of the chemical with SMILES Cc1cc(ccc1[C@@H](C)[NH2+]CC2(CC2)CO)F is 3."}", "/scratch/micpie/export/rdkit_features/train_15-4.jsonl": "{"text":"The number of rings of the chemical with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC is 2."} {"text":"The number of rings of the chemical with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C is 2."}", "/scratch/micpie/export/rdkit_features/train_6-6.jsonl": "{"text":"The aromatic bond count of the molecule with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4 is 11."} 
{"text":"The number of aromatic bonds of the chemical with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C is 17."}", "/scratch/micpie/export/rdkit_features/valid_16-4.jsonl": "{"text":"The number of rings of the compound with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C is 1."} {"text":"The count of rings of the molecule with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3 is 3."}", "/scratch/micpie/export/rdkit_features/valid_114-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl?\nAnswer: 3"} {"text":"Question: What is the count of rings of the molecule with SMILES C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_110-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O is 11."} {"text":"The count of aromatic bonds of the molecule with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C is 15."}", "/scratch/micpie/export/rdkit_features/train_29-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the molecule with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F?\nAnswer: 4"} {"text":"Question: What is the number of rotatable bonds of the compound with SMILES CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_5-16.jsonl": "{"text":"Question: What is the number of rings of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C?\nAnswer: 4"} {"text":"Question: What is the count of rings of the compound with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_101-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the compound with SMILES 
Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3 is 49.69."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1 is 65.84."}", "/scratch/micpie/export/rdkit_features/test_9-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the molecule with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C is 1."} {"text":"The count of hydrogen bond donors of the molecule with SMILES C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4 is 1."}", "/scratch/micpie/export/rdkit_features/train_0-4.jsonl": "{"text":"The ring count of the chemical with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1 is 6."} {"text":"The count of rings of the chemical with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4 is 4."}", "/scratch/micpie/export/rdkit_features/valid_24-3.jsonl": "{"text":"The number of heteroatoms of the chemical with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl is 10."} {"text":"The heteroatom count of the molecule with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C is 7."}", "/scratch/micpie/export/rdkit_features/valid_105-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the molecule with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F?\nAnswer: 7"} {"text":"Question: What is the number of heteroatoms of the chemical with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/valid_10-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C is 57.76."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3 is 57.42."}", "/scratch/micpie/export/rdkit_features/valid_101-19.jsonl": "{"text":"Question: What is the count of acid groups of the chemical with SMILES 
COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3?\nAnswer: 0"} {"text":"Question: What is the acid group count of the chemical with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_16-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C is 58.82."} {"text":"The total sum of atomic polarizabilities of the molecule with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N is 55.56."}", "/scratch/micpie/export/rdkit_features/valid_1-0.jsonl": "{"text":"The chemical formula of the compound with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5 is C21H21N7OS."} {"text":"The molecular formula of the chemical with SMILES Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F is C22H24FN5O2."}", "/scratch/micpie/export/rdkit_features/test_21-19.jsonl": "{"text":"Question: What is the number of acid groups of the chemical with SMILES Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-]?\nAnswer: 1"} {"text":"Question: What is the acid group count of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_19-0.jsonl": "{"text":"The molecular formula of the chemical with SMILES C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC is C23H31N3O4."} {"text":"The molecular formula of the compound with SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F is C18H19FN2O4S2."}", "/scratch/micpie/export/rdkit_features/test_32-1.jsonl": "{"text":"The number of hydrogen bond donors of the molecule with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3 is 1."} {"text":"The count of hydrogen bond donors of the molecule with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3 is 3."}", "/scratch/micpie/export/rdkit_features/test_9-3.jsonl": "{"text":"The number of heteroatoms of the chemical with 
SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C is 6."} {"text":"The number of heteroatoms of the molecule with SMILES C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4 is 7."}", "/scratch/micpie/export/rdkit_features/valid_26-19.jsonl": "{"text":"Question: What is the acid group count of the chemical with SMILES CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2?\nAnswer: 0"} {"text":"Question: What is the acid group count of the molecule with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_31-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the chemical with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC?\nAnswer: 5"} {"text":"Question: What is the number of rotatable bonds of the molecule with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_16-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C is 6."} {"text":"The number of rotatable bonds of the molecule with SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F is 6."}", "/scratch/micpie/export/rdkit_features/test_26-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3?\nAnswer: 51.53"} {"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br?\nAnswer: 43.80"}", "/scratch/micpie/export/rdkit_features/train_6-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the chemical with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4 is 6."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C is 5."}", 
"/scratch/micpie/export/rdkit_features/train_18-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O?\nAnswer: 68.38"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl?\nAnswer: 54.59"}", "/scratch/micpie/export/rdkit_features/test_0-7.jsonl": "{"text":"The count of acid groups of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1 is 1."} {"text":"The count of acid groups of the molecule with SMILES Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC is 0."}", "/scratch/micpie/export/rdkit_features/train_2-18.jsonl": "{"text":"Question: What is the aromatic bond count of the molecule with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6?\nAnswer: 20"} {"text":"Question: What is the number of aromatic bonds of the molecule with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O?\nAnswer: 21"}", "/scratch/micpie/export/rdkit_features/valid_110-17.jsonl": "{"text":"Question: What is the rotatable bond count of the compound with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O?\nAnswer: 3"} {"text":"Question: What is the count of rotatable bonds of the compound with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_8-7.jsonl": "{"text":"The number of acid groups of the chemical with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C is 0."} {"text":"The acid group count of the molecule with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br is 0."}", "/scratch/micpie/export/rdkit_features/valid_32-3.jsonl": "{"text":"The heteroatom count of the compound with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3 is 8."} {"text":"The number of heteroatoms of the 
chemical with SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O is 9."}", "/scratch/micpie/export/rdkit_features/test_26-3.jsonl": "{"text":"The heteroatom count of the molecule with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3 is 7."} {"text":"The heteroatom count of the compound with SMILES COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br is 6."}", "/scratch/micpie/export/rdkit_features/train_19-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC is C23H31N3O4."} {"text":"The chemical formula of the molecule with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3 is C25H33N3O3."}", "/scratch/micpie/export/rdkit_features/valid_16-7.jsonl": "{"text":"The number of acid groups of the molecule with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C is 0."} {"text":"The acid group count of the molecule with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3 is 0."}", "/scratch/micpie/export/rdkit_features/valid_108-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O?\nAnswer: 8"} {"text":"Question: What is the number of rotatable bonds of the chemical with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/train_11-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_13-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the chemical with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC is 4.58."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the 
chemical with SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4 is 4.97."}", "/scratch/micpie/export/rdkit_features/test_114-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the molecule with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C is 3."} {"text":"The count of hydrogen bond donors of the molecule with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5 is 2."}", "/scratch/micpie/export/rdkit_features/train_20-11.jsonl": "{"text":"User: I want to create a molecule with a formula of C25H33N3O3.\nAssistant: Cool, do you have some additional limitations?\nUser: Yeah, I want the number of hydrogen bond donor sites to be 2, the number of hydrogen bond acceptors to be 3.\nAssistant: I advise the molecule with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3."} {"text":"User: I want to make a molecule with a formula of C17H15FNO6S-.\nAssistant: Nice, do you have some additional limitations I should take into account?\nUser: Yep, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 6.\nAssistant: In that scenario, I propose the molecule with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC."}", "/scratch/micpie/export/rdkit_features/valid_18-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 9"} {"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_11-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl is 4.69."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C is 4.74."}", 
"/scratch/micpie/export/rdkit_features/train_115-7.jsonl": "{"text":"The number of acid groups of the compound with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F is 0."} {"text":"The number of acid groups of the chemical with SMILES c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 0."}", "/scratch/micpie/export/rdkit_features/valid_3-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N is 64.06."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC is 58.56."}", "/scratch/micpie/export/rdkit_features/valid_119-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O?\nAnswer: 2"} {"text":"Question: What is the count of rings of the chemical with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_27-22.jsonl": "{"text":"User: I want to create a compound with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptor sites of 5.\nAssistant: That's interesting, do you have some additional requirements that help me narrow down the search?\nUser: I want the heteroatom count to be 10.\nAssistant: I suggest the compound with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F."} {"text":"User: I want to design a chemical with a count of hydrogen bond donors of 2 and a count of hydrogen bond acceptors of 1.\nAssistant: Interesting, do you have some additional I should take into account?\nUser: Yea, I want the number of heteroatoms to be 5.\nAssistant: In that scenario, I recommend the chemical with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C."}", "/scratch/micpie/export/rdkit_features/test_107-18.jsonl": "{"text":"Question: What is the aromatic bond count of the 
molecule with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC?\nAnswer: 11"} {"text":"Question: What is the number of aromatic bonds of the chemical with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/test_115-11.jsonl": "{"text":"User: I want to synthesize a molecule with a chemical formula of C27H22FN3O4.\nAssistant: Interesting, do you have some additional requirements I should consider?\nUser: Yep, I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 5.\nAssistant: Then, I suggest the molecule with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F."} {"text":"User: I want to create a molecule with a molecular formula of C30H30FN3O2.\nAssistant: Do you have some additional limitations that I should consider?\nUser: Yes, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 3.\nAssistant: In that case, I recommend the molecule with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F."}", "/scratch/micpie/export/rdkit_features/train_33-19.jsonl": "{"text":"Question: What is the count of acid groups of the compound with SMILES COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the compound with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_4-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br is 12."} {"text":"The aromatic bond count of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C is 16."}", "/scratch/micpie/export/rdkit_features/valid_13-4.jsonl": "{"text":"The number of rings of the molecule with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4 is 4."} {"text":"The number of rings of the compound with 
SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C is 4."}", "/scratch/micpie/export/rdkit_features/valid_25-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4?\nAnswer: 0"} {"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_23-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4 is 5."} {"text":"The number of hydrogen bond acceptors of the molecule with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4 is 5."}", "/scratch/micpie/export/rdkit_features/test_114-4.jsonl": "{"text":"The count of rings of the chemical with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C is 3."} {"text":"The count of rings of the compound with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5 is 5."}", "/scratch/micpie/export/rdkit_features/train_4-7.jsonl": "{"text":"The count of acid groups of the molecule with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4 is 0."} {"text":"The count of acid groups of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C is 0."}", "/scratch/micpie/export/rdkit_features/train_1-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the molecule with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4?\nAnswer: 12"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_109-16.jsonl": "{"text":"Question: What is the count of rings of the chemical with SMILES Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3?\nAnswer: 3"} {"text":"Question: What is the number of rings of the 
compound with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_102-11.jsonl": "{"text":"User: I want to synthesize a molecule with a chemical formula of C20H30N2O6S2.\nAssistant: Cool, do you have some additional I should take into account?\nUser: Yes, I want the number of hydrogen bond donors to be 4, the number of hydrogen bond acceptors to be 8.\nAssistant: Then, I propose the molecule with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O."} {"text":"User: I want to synthesize a compound with a formula of C18H15NO2.\nAssistant: Do you have some additional that help me narrow down the search?\nUser: Yep, I want the number of hydrogen bond donor sites to be 1, the count of hydrogen bond acceptors to be 3.\nAssistant: In that situation, I propose the compound with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1."}", "/scratch/micpie/export/rdkit_features/test_15-16.jsonl": "{"text":"Question: What is the number of rings of the compound with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O?\nAnswer: 1"} {"text":"Question: What is the count of rings of the molecule with SMILES C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_14-7.jsonl": "{"text":"The acid group count of the molecule with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3 is 0."} {"text":"The count of acid groups of the molecule with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O is 0."}", "/scratch/micpie/export/rdkit_features/valid_119-7.jsonl": "{"text":"The number of acid groups of the compound with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O is 0."} {"text":"The count of acid groups of the molecule with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6 is 0."}", "/scratch/micpie/export/rdkit_features/train_115-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES 
c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F is 18."} {"text":"The aromatic bond count of the compound with SMILES c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 18."}", "/scratch/micpie/export/rdkit_features/valid_117-6.jsonl": "{"text":"The aromatic bond count of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4OC)Cl is 17."} {"text":"The aromatic bond count of the compound with SMILES Cc1cc(ccc1[C@@H](C)[NH2+]CC2(CC2)CO)F is 6."}", "/scratch/micpie/export/rdkit_features/train_0-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1 is 58.28."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4 is 70.61."}", "/scratch/micpie/export/rdkit_features/test_8-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C?\nAnswer: 0"} {"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_115-7.jsonl": "{"text":"The number of acid groups of the molecule with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F is 0."} {"text":"The count of acid groups of the molecule with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 0."}", "/scratch/micpie/export/rdkit_features/test_110-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3 is 2."} {"text":"The number of hydrogen bond donors of the chemical with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4 is 0."}", "/scratch/micpie/export/rdkit_features/test_31-2.jsonl": "{"text":"The number of hydrogen bond 
acceptors of the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F is 3."} {"text":"The count of hydrogen bond acceptors of the chemical with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3 is 4."}", "/scratch/micpie/export/rdkit_features/valid_24-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl?\nAnswer: 51.83"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C?\nAnswer: 70.37"}", "/scratch/micpie/export/rdkit_features/train_107-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl is 3."} {"text":"The number of hydrogen bond donors of the chemical with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5 is 2."}", "/scratch/micpie/export/rdkit_features/valid_4-16.jsonl": "{"text":"Question: What is the number of rings of the compound with SMILES Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br?\nAnswer: 3"} {"text":"Question: What is the ring count of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_1-23.jsonl": "{"text":"User: I want to synthesize a molecule with a number of hydrogen bond donor sites of 2, a count of hydrogen bond acceptors of 3 and a LogP value computed using the Wildman-Crippen method of 2.58.\nAssistant: Cool, do you have some additional constraints I should take into account?\nUser: Yes, I want the formula to be C23H25ClN3O2+.\nAssistant: In that case, I recommend the molecule with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4."} {"text":"User: I want to create a compound with a number of hydrogen bond donor sites of 0, a count of hydrogen bond acceptors of 8 and a LogP value computed using 
the Wildman-Crippen method of 3.73.\nAssistant: Nice, do you have some additional constraints that I should consider?\nUser: Yep, I want the formula to be C23H23N5O3.\nAssistant: Then, I suggest the compound with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC."}", "/scratch/micpie/export/rdkit_features/train_33-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2 is 3."} {"text":"The number of hydrogen bond donor sites of the compound with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br is 1."}", "/scratch/micpie/export/rdkit_features/valid_26-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the compound with SMILES CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2 is 0."} {"text":"The count of hydrogen bond donors of the compound with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC is 3."}", "/scratch/micpie/export/rdkit_features/valid_29-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl is 6."} {"text":"The number of aromatic bonds of the compound with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3 is 12."}", "/scratch/micpie/export/rdkit_features/valid_110-19.jsonl": "{"text":"Question: What is the count of acid groups of the molecule with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the compound with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_108-5.jsonl": "{"text":"The rotatable bond count of the molecule with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5 is 5."} {"text":"The count of rotatable bonds of the compound with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3 is 6."}", "/scratch/micpie/export/rdkit_features/valid_109-9.jsonl": "{"text":"The 
sum of atomic polarizabilities of the chemical with SMILES C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl is 64.74."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F is 59.89."}", "/scratch/micpie/export/rdkit_features/train_25-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_31-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F?\nAnswer: 4"} {"text":"Question: What is the ring count of the chemical with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_30-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the compound with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3?\nAnswer: 6"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/test_14-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3?\nAnswer: 3"} {"text":"Question: What is the count of rings of the molecule with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_33-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the compound with SMILES COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2?\nAnswer: 6"} {"text":"Question: What is the aromatic bond count of the molecule with SMILES 
Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/test_4-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4?\nAnswer: C23H20ClN3O3"} {"text":"Question: What is the chemical formula of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3?\nAnswer: C23H29N2O3S+"}", "/scratch/micpie/export/rdkit_features/valid_27-3.jsonl": "{"text":"The number of heteroatoms of the chemical with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F is 10."} {"text":"The count of heteroatoms of the molecule with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C is 5."}", "/scratch/micpie/export/rdkit_features/test_108-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the chemical with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5?\nAnswer: 11"} {"text":"Question: What is the number of heteroatoms of the molecule with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/test_114-22.jsonl": "{"text":"User: I want to analyze a compound with a count of hydrogen bond donors of 3 and a number of hydrogen bond acceptor sites of 4.\nAssistant: Interesting, do you have some additional conditions?\nUser: Yep, I want the count of heteroatoms to be 8.\nAssistant: In that case, I propose the compound with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C."} {"text":"User: I want to analyze a molecule with a number of hydrogen bond donors of 2 and a number of hydrogen bond acceptor sites of 6.\nAssistant: Interesting, do you have some additional requirements that I should consider?\nUser: I want the count of heteroatoms to be 8.\nAssistant: In that situation, I the molecule with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5."}", 
"/scratch/micpie/export/rdkit_features/test_120-6.jsonl": "{"text":"The aromatic bond count of the compound with SMILES c1ccc(cc1)CCN(CC(=O)Nc2ccc(cc2)Cl)S(=O)(=O)c3ccc4c(c3)OCCO4 is 18."} {"text":"The count of aromatic bonds of the chemical with SMILES Cc1c(cnn1c2ccccc2)N[C@H](C)C(=O)Nc3cccc(c3Cl)Cl is 17."}", "/scratch/micpie/export/rdkit_features/train_7-3.jsonl": "{"text":"The count of heteroatoms of the compound with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl is 7."} {"text":"The count of heteroatoms of the chemical with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F is 9."}", "/scratch/micpie/export/rdkit_features/valid_105-4.jsonl": "{"text":"The count of rings of the compound with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F is 3."} {"text":"The ring count of the molecule with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O is 3."}", "/scratch/micpie/export/rdkit_features/test_14-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3 is 17."} {"text":"The aromatic bond count of the compound with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O is 6."}", "/scratch/micpie/export/rdkit_features/valid_15-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the compound with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N?\nAnswer: 5"} {"text":"Question: What is the rotatable bond count of the compound with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/train_101-22.jsonl": "{"text":"User: I want to make a compound with a count of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 5.\nAssistant: Interesting, do you have some additional constraints I should take into account?\nUser: Yeah, I want the number of heteroatoms to be 6.\nAssistant: I the compound with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3."} {"text":"User: I want to create a molecule with 
a number of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 7.\nAssistant: Interesting, do you have some additional constraints I should take into account?\nUser: Yep, I want the number of heteroatoms to be 7.\nAssistant: In that situation, I propose the molecule with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1."}", "/scratch/micpie/export/rdkit_features/test_8-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C?\nAnswer: C22H26N4O4"} {"text":"Question: What is the chemical formula of the compound with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F?\nAnswer: C22H24F2N2O3"}", "/scratch/micpie/export/rdkit_features/valid_114-6.jsonl": "{"text":"The count of aromatic bonds of the compound with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl is 18."} {"text":"The number of aromatic bonds of the chemical with SMILES C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4 is 22."}", "/scratch/micpie/export/rdkit_features/train_3-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O?\nAnswer: 5"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_9-20.jsonl": "{"text":"Question: What is the count of basic groups of the chemical with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C?\nAnswer: 1"} {"text":"Question: What is the count of basic groups of the molecule with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_22-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the chemical with SMILES 
C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O?\nAnswer: 8"} {"text":"Question: What is the count of heteroatoms of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/test_109-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl is 2.40."} {"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N is 1.83."}", "/scratch/micpie/export/rdkit_features/train_32-12.jsonl": "{"text":"Question: What is the chemical formula of the compound with SMILES C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C?\nAnswer: C18H32N5O3S+"} {"text":"Question: What is the chemical formula of the chemical with SMILES C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2?\nAnswer: C17H20F3N4O3+"}", "/scratch/micpie/export/rdkit_features/train_28-22.jsonl": "{"text":"User: I want to analyze a compound with a count of hydrogen bond donors of 0 and a number of hydrogen bond acceptor sites of 3.\nAssistant: Nice, do you have some additional requirements I should take into account?\nUser: I want the heteroatom count to be 6.\nAssistant: I suggest the compound with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl."} {"text":"User: I want to design a molecule with a number of hydrogen bond donor sites of 2 and a count of hydrogen bond acceptors of 2.\nAssistant: Interesting, do you have some additional constraints I should consider?\nUser: Yep, I want the count of heteroatoms to be 6.\nAssistant: In that case, I propose the molecule with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F."}", "/scratch/micpie/export/rdkit_features/train_9-22.jsonl": "{"text":"User: I want to design a compound with a number of hydrogen bond donor sites of 3 and a count of hydrogen bond acceptors of 
3.\nAssistant: That is a very interesting question, do you have some additional requirements that help me narrow down the search?\nUser: Yes, I want the count of heteroatoms to be 6.\nAssistant: Then, I recommend the compound with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C."} {"text":"User: I want to make a chemical with a number of hydrogen bond donors of 0 and a count of hydrogen bond acceptors of 5.\nAssistant: Do you have some additional limitations I should take into account?\nUser: I want the count of heteroatoms to be 7.\nAssistant: In that case, I propose the chemical with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F."}", "/scratch/micpie/export/rdkit_features/valid_100-4.jsonl": "{"text":"The count of rings of the molecule with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F is 4."} {"text":"The ring count of the molecule with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4 is 4."}", "/scratch/micpie/export/rdkit_features/valid_108-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O?\nAnswer: 12"} {"text":"Question: What is the count of aromatic bonds of the compound with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_33-22.jsonl": "{"text":"User: I want to design a chemical with a number of hydrogen bond donor sites of 3 and a count of hydrogen bond acceptors of 3.\nAssistant: Interesting, do you have some additional limitations?\nUser: Yea, I want the number of heteroatoms to be 9.\nAssistant: Then, I advise the chemical with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O."} {"text":"User: I want to create a molecule with a number of hydrogen bond donors of 0 and a number of hydrogen bond acceptors of 8.\nAssistant: Interesting, do you have some additional constraints?\nUser: Yep, I want the number of 
heteroatoms to be 9.\nAssistant: In that situation, I advise the molecule with SMILES Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br."}", "/scratch/micpie/export/rdkit_features/valid_120-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES c1cc(oc1)CNC(=O)CSc2nc3ccc(cc3s2)n4c(c5c(c4O)[C@@H]6C[C@@H]5C=C6)O is 65.46."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES CN(c1nc2cc(ccc2s1)Cl)C(=O)CSc3ccc(c(c3)F)F is 47.59."}", "/scratch/micpie/export/rdkit_features/train_102-12.jsonl": "{"text":"Question: What is the chemical formula of the compound with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O?\nAnswer: C20H30N2O6S2"} {"text":"Question: What is the chemical formula of the molecule with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1?\nAnswer: C18H15NO2"}", "/scratch/micpie/export/rdkit_features/valid_22-22.jsonl": "{"text":"User: I want to analyze a compound with a number of hydrogen bond donor sites of 1 and a count of hydrogen bond acceptors of 5.\nAssistant: That is a very interesting question, do you have some additional I should take into account?\nUser: Yea, I want the number of heteroatoms to be 8.\nAssistant: I propose the compound with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O."} {"text":"User: I want to analyze a molecule with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 7.\nAssistant: That is a very interesting question, do you have some additional I should take into account?\nUser: I want the count of heteroatoms to be 8.\nAssistant: Then, I the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4."}", "/scratch/micpie/export/rdkit_features/valid_107-6.jsonl": "{"text":"The aromatic bond count of the chemical with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C is 16."} {"text":"The count of aromatic bonds of the molecule with SMILES 
CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C is 12."}", "/scratch/micpie/export/rdkit_features/valid_116-3.jsonl": "{"text":"The count of heteroatoms of the chemical with SMILES CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 6."} {"text":"The count of heteroatoms of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl is 7."}", "/scratch/micpie/export/rdkit_features/test_110-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_115-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the compound with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F?\nAnswer: 24"} {"text":"Question: What is the number of aromatic bonds of the compound with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 18"}", "/scratch/micpie/export/rdkit_features/train_9-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C?\nAnswer: 71.04"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F?\nAnswer: 66.92"}", "/scratch/micpie/export/rdkit_features/test_110-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3 is 1.04."} {"text":"The Wildman-Crippen LogP value of the compound with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4 is 2.17."}", "/scratch/micpie/export/rdkit_features/train_18-18.jsonl": 
"{"text":"Question: What is the number of aromatic bonds of the chemical with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O?\nAnswer: 6"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/train_100-8.jsonl": "{"text":"The basic group count of the chemical with SMILES CC(C)[C@H]1[C@@H](CCO1)C(=O)N2CC3(C2)CC[NH+](C3)CC=C(C)C is 1."} {"text":"The basic group count of the chemical with SMILES COc1cccc(c1)CC(=O)Nc2cc(no2)c3ccccn3 is 0."}", "/scratch/micpie/export/rdkit_features/valid_9-19.jsonl": "{"text":"Question: What is the count of acid groups of the molecule with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the chemical with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_26-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the compound with SMILES CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2 is 3."} {"text":"The number of hydrogen bond acceptor sites of the compound with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC is 6."}", "/scratch/micpie/export/rdkit_features/valid_103-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO?\nAnswer: 4"} {"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_27-20.jsonl": "{"text":"Question: What is the number of basic groups of the molecule with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the chemical with 
SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_4-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br is 4."} {"text":"The number of rotatable bonds of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C is 5."}", "/scratch/micpie/export/rdkit_features/test_101-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F is 6."} {"text":"The number of rotatable bonds of the molecule with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1 is 4."}", "/scratch/micpie/export/rdkit_features/valid_5-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C?\nAnswer: 6"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+]?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_113-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the compound with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 8"} {"text":"Question: What is the number of rotatable bonds of the compound with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_23-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4?\nAnswer: 5"} {"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_7-23.jsonl": "{"text":"User: I want to create a molecule with a count of hydrogen bond 
donors of 2, a number of hydrogen bond acceptor sites of 4 and a Wildman-Crippen LogP value computed using RDKit of 3.80.\nAssistant: Do you have some additional limitations I should take into account?\nUser: Yep, I want the formula to be C24H28N2O4.\nAssistant: Then, I recommend the molecule with SMILES COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC."} {"text":"User: I want to design a compound with a number of hydrogen bond donor sites of 2, a count of hydrogen bond acceptors of 5 and a LogP value computed using the Wildman-Crippen method of 3.84.\nAssistant: That is a very interesting question, do you have some additional conditions?\nUser: Indeed, I want the formula to be C23H23N3O4.\nAssistant: Then, I the compound with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC."}", "/scratch/micpie/export/rdkit_features/train_4-3.jsonl": "{"text":"The number of heteroatoms of the chemical with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4 is 6."} {"text":"The heteroatom count of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C is 6."}", "/scratch/micpie/export/rdkit_features/valid_31-11.jsonl": "{"text":"User: I want to create a compound with a chemical formula of C24H27FN2O3.\nAssistant: Do you have some additional conditions I should take into account?\nUser: I want the number of hydrogen bond donor sites to be 0, the count of hydrogen bond acceptors to be 3.\nAssistant: In that situation, I suggest the compound with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC."} {"text":"User: I want to analyze a chemical with a chemical formula of C20H27FN2O4.\nAssistant: That's interesting, do you have some additional constraints I should take into account?\nUser: Yea, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 4.\nAssistant: Then, I advise the chemical with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3."}", 
"/scratch/micpie/export/rdkit_features/train_20-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_25-18.jsonl": "{"text":"Question: What is the aromatic bond count of the molecule with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4?\nAnswer: 11"} {"text":"Question: What is the number of aromatic bonds of the chemical with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/valid_16-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the chemical with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C is 1.42."} {"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3 is 2.55."}", "/scratch/micpie/export/rdkit_features/test_120-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES c1ccc(cc1)CCN(CC(=O)Nc2ccc(cc2)Cl)S(=O)(=O)c3ccc4c(c3)OCCO4 is 0."} {"text":"The acid group count of the compound with SMILES Cc1c(cnn1c2ccccc2)N[C@H](C)C(=O)Nc3cccc(c3Cl)Cl is 0."}", "/scratch/micpie/export/rdkit_features/test_12-18.jsonl": "{"text":"Question: What is the aromatic bond count of the chemical with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C?\nAnswer: 15"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_117-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4OC)Cl?\nAnswer: 7"} {"text":"Question: What is the count of 
heteroatoms of the molecule with SMILES Cc1cc(ccc1[C@@H](C)[NH2+]CC2(CC2)CO)F?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_16-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the chemical with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C?\nAnswer: 10"} {"text":"Question: What is the number of rotatable bonds of the molecule with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_27-18.jsonl": "{"text":"Question: What is the aromatic bond count of the compound with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C?\nAnswer: 11"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4?\nAnswer: 16"}", "/scratch/micpie/export/rdkit_features/valid_6-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+] is 4."} {"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F is 6."}", "/scratch/micpie/export/rdkit_features/train_12-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the molecule with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C?\nAnswer: 3"} {"text":"Question: What is the count of rotatable bonds of the compound with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/test_15-0.jsonl": "{"text":"The formula of the compound with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCCCCC#N)O is C13H14BrNO3."} {"text":"The chemical formula of the chemical with SMILES C[C@@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)Cc2nccs2 is C16H26N3OS+."}", "/scratch/micpie/export/rdkit_features/valid_33-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the compound with SMILES 
c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6 is 62.06."} {"text":"The sum of atomic polarizabilities of the compound with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br is 51.69."}", "/scratch/micpie/export/rdkit_features/valid_14-4.jsonl": "{"text":"The count of rings of the molecule with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C is 3."} {"text":"The number of rings of the chemical with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O is 2."}", "/scratch/micpie/export/rdkit_features/train_11-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the compound with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl is 1."} {"text":"The number of hydrogen bond donors of the molecule with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C is 0."}", "/scratch/micpie/export/rdkit_features/valid_120-5.jsonl": "{"text":"The rotatable bond count of the chemical with SMILES c1cc(oc1)CNC(=O)CSc2nc3ccc(cc3s2)n4c(c5c(c4O)[C@@H]6C[C@@H]5C=C6)O is 6."} {"text":"The count of rotatable bonds of the chemical with SMILES CN(c1nc2cc(ccc2s1)Cl)C(=O)CSc3ccc(c(c3)F)F is 4."}", "/scratch/micpie/export/rdkit_features/train_25-20.jsonl": "{"text":"Question: What is the basic group count of the molecule with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C?\nAnswer: 0"} {"text":"Question: What is the basic group count of the chemical with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_21-17.jsonl": "{"text":"Question: What is the rotatable bond count of the compound with SMILES Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-]?\nAnswer: 6"} {"text":"Question: What is the number of rotatable bonds of the compound with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_4-12.jsonl": "{"text":"Question: What is the molecular formula of the 
molecule with SMILES Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br?\nAnswer: C18H17BrClN3O2"} {"text":"Question: What is the formula of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C?\nAnswer: C23H31N3O2S+2"}", "/scratch/micpie/export/rdkit_features/valid_29-23.jsonl": "{"text":"User: I want to synthesize a compound with a number of hydrogen bond donors of 3, a number of hydrogen bond acceptors of 1 and a LogP value computed using the Wildman-Crippen method of 2.38.\nAssistant: That is a very interesting question, do you have some additional limitations I should consider?\nUser: Yea, I want the chemical formula to be C20H31ClN3O+.\nAssistant: In that situation, I advise the compound with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl."} {"text":"User: I want to analyze a molecule with a count of hydrogen bond donors of 3, a number of hydrogen bond acceptor sites of 2 and a LogP value computed using the Wildman-Crippen method of 2.31.\nAssistant: Do you have some additional conditions that help me narrow down the search?\nUser: Yea, I want the molecular formula to be C21H25F2N2O2+.\nAssistant: In that case, I recommend the molecule with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3."}", "/scratch/micpie/export/rdkit_features/test_29-19.jsonl": "{"text":"Question: What is the number of acid groups of the molecule with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the compound with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_10-3.jsonl": "{"text":"The count of heteroatoms of the chemical with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C is 10."} {"text":"The number of heteroatoms of the chemical with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3 is 4."}", 
"/scratch/micpie/export/rdkit_features/train_116-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 4"} {"text":"Question: What is the rotatable bond count of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_9-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the compound with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C?\nAnswer: 6"} {"text":"Question: What is the count of heteroatoms of the compound with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/train_110-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O is 60.90."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C is 63.82."}", "/scratch/micpie/export/rdkit_features/test_25-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4 is 1."} {"text":"The count of hydrogen bond donors of the compound with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3 is 2."}", "/scratch/micpie/export/rdkit_features/test_29-5.jsonl": "{"text":"The rotatable bond count of the chemical with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4 is 2."} {"text":"The count of rotatable bonds of the molecule with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C is 4."}", "/scratch/micpie/export/rdkit_features/test_4-22.jsonl": "{"text":"User: I want to synthesize a molecule with a number of hydrogen bond donor sites of 1 and a number of hydrogen bond acceptor sites of 4.\nAssistant: Cool, do you have some additional 
that help me narrow down the search?\nUser: Indeed, I want the count of heteroatoms to be 7.\nAssistant: In that case, I advise the molecule with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4."} {"text":"User: I want to analyze a compound with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptor sites of 4.\nAssistant: That's interesting, do you have some additional requirements that I should consider?\nUser: Yep, I want the heteroatom count to be 6.\nAssistant: In that situation, I the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3."}", "/scratch/micpie/export/rdkit_features/train_116-5.jsonl": "{"text":"The rotatable bond count of the chemical with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 4."} {"text":"The number of rotatable bonds of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br is 4."}", "/scratch/micpie/export/rdkit_features/train_3-18.jsonl": "{"text":"Question: What is the aromatic bond count of the molecule with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O?\nAnswer: 21"} {"text":"Question: What is the aromatic bond count of the chemical with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4?\nAnswer: 17"}", "/scratch/micpie/export/rdkit_features/train_110-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O?\nAnswer: 60.90"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 63.82"}", "/scratch/micpie/export/rdkit_features/train_112-8.jsonl": "{"text":"The count of basic groups of the molecule with SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC is 0."} {"text":"The count of basic groups of the molecule with SMILES 
c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 2."}", "/scratch/micpie/export/rdkit_features/valid_16-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C is 58.82."} {"text":"The sum of atomic polarizabilities of the compound with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3 is 69.89."}", "/scratch/micpie/export/rdkit_features/test_24-8.jsonl": "{"text":"The basic group count of the chemical with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl is 0."} {"text":"The count of basic groups of the compound with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C is 1."}", "/scratch/micpie/export/rdkit_features/test_108-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5 is 2.55."} {"text":"The Wildman-Crippen LogP value of the chemical with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3 is 1.70."}", "/scratch/micpie/export/rdkit_features/valid_10-23.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donors of 0, a number of hydrogen bond acceptor sites of 7 and a Wildman-Crippen LogP value of 3.80.\nAssistant: Cool, do you have some additional requirements I should take into account?\nUser: Yeah, I want the formula to be C17H25F3N6S.\nAssistant: In that case, I recommend the chemical with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C."} {"text":"User: I want to create a molecule with a count of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 2 and a LogP value computed using the Wildman-Crippen method of 4.63.\nAssistant: Do you have some additional limitations I should take into account?\nUser: Indeed, I want the formula to be C20H26ClNO2.\nAssistant: In that case, I propose the molecule with SMILES 
c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3."}", "/scratch/micpie/export/rdkit_features/test_29-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the compound with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4?\nAnswer: 6"} {"text":"Question: What is the count of aromatic bonds of the compound with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_101-20.jsonl": "{"text":"Question: What is the basic group count of the molecule with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the chemical with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_4-6.jsonl": "{"text":"The aromatic bond count of the compound with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4 is 17."} {"text":"The count of aromatic bonds of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C is 16."}", "/scratch/micpie/export/rdkit_features/train_114-0.jsonl": "{"text":"The chemical formula of the molecule with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C is C21H19Cl2N3O2S2."} {"text":"The molecular formula of the compound with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F is C27H26FN4O2+."}", "/scratch/micpie/export/rdkit_features/valid_100-3.jsonl": "{"text":"The number of heteroatoms of the chemical with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F is 5."} {"text":"The count of heteroatoms of the molecule with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4 is 4."}", "/scratch/micpie/export/rdkit_features/test_19-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl is 4."} {"text":"The number of hydrogen bond acceptor sites of the 
molecule with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4 is 4."}", "/scratch/micpie/export/rdkit_features/train_117-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(cc4)Oc5ccc(cn5)Cl is 5."} {"text":"The number of hydrogen bond acceptor sites of the compound with SMILES C[C@@H](C(C)(C)OC)[NH2+]CCNc1ccccc1 is 2."}", "/scratch/micpie/export/rdkit_features/valid_4-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_14-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C?\nAnswer: 3"} {"text":"Question: What is the rotatable bond count of the molecule with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_2-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the compound with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6 is 64.73."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O is 65.41."}", "/scratch/micpie/export/rdkit_features/test_100-0.jsonl": "{"text":"The formula of the molecule with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F is C18H27FN3O+."} {"text":"The chemical formula of the compound with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br is C13H14BrN3O2."}", "/scratch/micpie/export/rdkit_features/valid_31-7.jsonl": "{"text":"The number of acid groups of the compound with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC is 0."} 
{"text":"The acid group count of the chemical with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3 is 0."}", "/scratch/micpie/export/rdkit_features/valid_21-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-]?\nAnswer: 58.13"} {"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3?\nAnswer: 66.43"}", "/scratch/micpie/export/rdkit_features/test_10-18.jsonl": "{"text":"Question: What is the aromatic bond count of the molecule with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4?\nAnswer: 5"} {"text":"Question: What is the count of aromatic bonds of the molecule with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_33-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O?\nAnswer: 55.06"} {"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br?\nAnswer: 43.97"}", "/scratch/micpie/export/rdkit_features/valid_111-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the molecule with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4?\nAnswer: 8"} {"text":"Question: What is the count of heteroatoms of the compound with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC?\nAnswer: 10"}", "/scratch/micpie/export/rdkit_features/valid_15-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the molecule with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N is 4."} {"text":"The count of hydrogen bond acceptors of the chemical with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C is 2."}", 
"/scratch/micpie/export/rdkit_features/test_110-11.jsonl": "{"text":"User: I want to synthesize a compound with a chemical formula of C23H30N3O4S+.\nAssistant: Nice, do you have some additional conditions?\nUser: I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 5.\nAssistant: In that scenario, I the compound with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3."} {"text":"User: I want to synthesize a molecule with a formula of C21H31N5O3S.\nAssistant: Interesting, do you have some additional limitations?\nUser: Indeed, I want the number of hydrogen bond donors to be 0, the count of hydrogen bond acceptors to be 8.\nAssistant: In that situation, I the molecule with SMILES CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4."}", "/scratch/micpie/export/rdkit_features/test_31-18.jsonl": "{"text":"Question: What is the aromatic bond count of the chemical with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F?\nAnswer: 16"} {"text":"Question: What is the number of aromatic bonds of the chemical with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_18-20.jsonl": "{"text":"Question: What is the number of basic groups of the molecule with SMILES c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the molecule with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_119-1.jsonl": "{"text":"The number of hydrogen bond donors of the molecule with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O is 2."} {"text":"The number of hydrogen bond donors of the molecule with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6 is 0."}", "/scratch/micpie/export/rdkit_features/valid_3-19.jsonl": "{"text":"Question: What is the acid group count of the 
chemical with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N?\nAnswer: 0"} {"text":"Question: What is the acid group count of the molecule with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_104-11.jsonl": "{"text":"User: I want to synthesize a chemical with a molecular formula of C14H21N5O4.\nAssistant: That is a very interesting question, do you have some additional that help me narrow down the search?\nUser: Indeed, I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 6.\nAssistant: In that scenario, I suggest the chemical with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3."} {"text":"User: I want to design a chemical with a formula of C21H34N3O2+.\nAssistant: Do you have some additional requirements?\nUser: I want the count of hydrogen bond donors to be 3, the count of hydrogen bond acceptors to be 2.\nAssistant: In that case, I advise the chemical with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C."}", "/scratch/micpie/export/rdkit_features/test_109-18.jsonl": "{"text":"Question: What is the aromatic bond count of the chemical with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl?\nAnswer: 12"} {"text":"Question: What is the number of aromatic bonds of the molecule with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N?\nAnswer: 17"}", "/scratch/micpie/export/rdkit_features/train_115-3.jsonl": "{"text":"The count of heteroatoms of the compound with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F is 8."} {"text":"The number of heteroatoms of the molecule with SMILES c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 7."}", "/scratch/micpie/export/rdkit_features/train_14-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C is 
4.83."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O is 2.72."}", "/scratch/micpie/export/rdkit_features/test_2-16.jsonl": "{"text":"Question: What is the count of rings of the chemical with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C?\nAnswer: 3"} {"text":"Question: What is the count of rings of the compound with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_113-7.jsonl": "{"text":"The number of acid groups of the chemical with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 0."} {"text":"The acid group count of the chemical with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 0."}", "/scratch/micpie/export/rdkit_features/test_108-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the compound with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5?\nAnswer: 27"} {"text":"Question: What is the number of aromatic bonds of the chemical with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/valid_120-7.jsonl": "{"text":"The count of acid groups of the molecule with SMILES c1cc(oc1)CNC(=O)CSc2nc3ccc(cc3s2)n4c(c5c(c4O)[C@@H]6C[C@@H]5C=C6)O is 0."} {"text":"The acid group count of the chemical with SMILES CN(c1nc2cc(ccc2s1)Cl)C(=O)CSc3ccc(c(c3)F)F is 0."}", "/scratch/micpie/export/rdkit_features/valid_23-18.jsonl": "{"text":"Question: What is the aromatic bond count of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4?\nAnswer: 5"} {"text":"Question: What is the count of aromatic bonds of the molecule with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/valid_25-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the chemical with SMILES 
CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4 is 0."} {"text":"The number of hydrogen bond donors of the chemical with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3 is 1."}", "/scratch/micpie/export/rdkit_features/valid_7-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC?\nAnswer: 16"} {"text":"Question: What is the aromatic bond count of the molecule with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC?\nAnswer: 18"}", "/scratch/micpie/export/rdkit_features/train_17-7.jsonl": "{"text":"The count of acid groups of the compound with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F is 0."} {"text":"The count of acid groups of the compound with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 0."}", "/scratch/micpie/export/rdkit_features/train_30-5.jsonl": "{"text":"The rotatable bond count of the compound with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F is 8."} {"text":"The rotatable bond count of the molecule with SMILES CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC is 5."}", "/scratch/micpie/export/rdkit_features/train_24-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5 is 5."} {"text":"The count of rotatable bonds of the compound with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C is 6."}", "/scratch/micpie/export/rdkit_features/train_112-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC?\nAnswer: 68.10"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 69.10"}", "/scratch/micpie/export/rdkit_features/train_101-14.jsonl": "{"text":"Question: What 
is the number of hydrogen bond acceptor sites of the molecule with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3?\nAnswer: 5"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/valid_3-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N?\nAnswer: 5"} {"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_16-16.jsonl": "{"text":"Question: What is the ring count of the compound with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C?\nAnswer: 1"} {"text":"Question: What is the number of rings of the compound with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_20-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3 is 3.95."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-] is 1.10."}", "/scratch/micpie/export/rdkit_features/test_13-0.jsonl": "{"text":"The formula of the compound with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC is C19H21ClFNO2."} {"text":"The molecular formula of the compound with SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4 is C22H31NO2."}", "/scratch/micpie/export/rdkit_features/test_9-7.jsonl": "{"text":"The count of acid groups of the compound with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C is 0."} {"text":"The acid group count of the chemical with SMILES 
C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4 is 0."}", "/scratch/micpie/export/rdkit_features/valid_107-19.jsonl": "{"text":"Question: What is the number of acid groups of the chemical with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the compound with SMILES CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_19-22.jsonl": "{"text":"User: I want to make a molecule with a number of hydrogen bond donor sites of 2 and a count of hydrogen bond acceptors of 4.\nAssistant: Cool, do you have some additional constraints that I should consider?\nUser: Yea, I want the heteroatom count to be 8.\nAssistant: In that case, I suggest the molecule with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl."} {"text":"User: I want to design a compound with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptors of 4.\nAssistant: Nice, do you have some additional limitations I should consider?\nUser: Yeah, I want the count of heteroatoms to be 8.\nAssistant: In that situation, I advise the compound with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4."}", "/scratch/micpie/export/rdkit_features/test_112-18.jsonl": "{"text":"Question: What is the aromatic bond count of the compound with SMILES C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC?\nAnswer: 16"} {"text":"Question: What is the aromatic bond count of the compound with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/valid_15-16.jsonl": "{"text":"Question: What is the count of rings of the molecule with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N?\nAnswer: 2"} {"text":"Question: What is the number of rings of the compound with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C?\nAnswer: 1"}", 
"/scratch/micpie/export/rdkit_features/valid_20-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the molecule with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3 is 3."} {"text":"The count of hydrogen bond acceptors of the compound with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-] is 6."}", "/scratch/micpie/export/rdkit_features/test_8-6.jsonl": "{"text":"The count of aromatic bonds of the molecule with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C is 16."} {"text":"The number of aromatic bonds of the compound with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F is 12."}", "/scratch/micpie/export/rdkit_features/test_118-4.jsonl": "{"text":"The ring count of the chemical with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O is 1."} {"text":"The ring count of the molecule with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N is 2."}", "/scratch/micpie/export/rdkit_features/valid_111-12.jsonl": "{"text":"Question: What is the formula of the compound with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4?\nAnswer: C24H35N8+"} {"text":"Question: What is the formula of the chemical with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC?\nAnswer: C22H29N7O3"}", "/scratch/micpie/export/rdkit_features/test_4-3.jsonl": "{"text":"The number of heteroatoms of the compound with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4 is 7."} {"text":"The number of heteroatoms of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3 is 6."}", "/scratch/micpie/export/rdkit_features/train_112-22.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donor sites of 0 and a number of hydrogen bond acceptors of 8.\nAssistant: That's interesting, do you have some additional limitations I should take into account?\nUser: Yes, I want the number of heteroatoms to be 9.\nAssistant: Then, I propose the chemical with 
SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC."} {"text":"User: I want to make a chemical with a number of hydrogen bond donor sites of 3 and a number of hydrogen bond acceptors of 5.\nAssistant: Nice, do you have some additional requirements?\nUser: Yeah, I want the number of heteroatoms to be 9.\nAssistant: I suggest the chemical with SMILES c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N."}", "/scratch/micpie/export/rdkit_features/valid_31-3.jsonl": "{"text":"The count of heteroatoms of the chemical with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC is 6."} {"text":"The number of heteroatoms of the compound with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3 is 7."}", "/scratch/micpie/export/rdkit_features/test_16-8.jsonl": "{"text":"The count of basic groups of the compound with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C is 1."} {"text":"The basic group count of the compound with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N is 0."}", "/scratch/micpie/export/rdkit_features/valid_28-8.jsonl": "{"text":"The basic group count of the compound with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br is 0."} {"text":"The number of basic groups of the chemical with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C is 0."}", "/scratch/micpie/export/rdkit_features/valid_120-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES c1cc(oc1)CNC(=O)CSc2nc3ccc(cc3s2)n4c(c5c(c4O)[C@@H]6C[C@@H]5C=C6)O is 20."} {"text":"The count of aromatic bonds of the molecule with SMILES CN(c1nc2cc(ccc2s1)Cl)C(=O)CSc3ccc(c(c3)F)F is 16."}", "/scratch/micpie/export/rdkit_features/train_111-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 64.81"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES 
C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC?\nAnswer: 68.16"}", "/scratch/micpie/export/rdkit_features/valid_107-20.jsonl": "{"text":"Question: What is the number of basic groups of the molecule with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C?\nAnswer: 0"} {"text":"Question: What is the count of basic groups of the chemical with SMILES CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_19-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl is 3.75."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4 is 2.16."}", "/scratch/micpie/export/rdkit_features/valid_6-3.jsonl": "{"text":"The number of heteroatoms of the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+] is 6."} {"text":"The count of heteroatoms of the chemical with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F is 8."}", "/scratch/micpie/export/rdkit_features/valid_116-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_119-20.jsonl": "{"text":"Question: What is the basic group count of the molecule with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO?\nAnswer: 0"} {"text":"Question: What is the basic group count of the chemical with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_112-5.jsonl": "{"text":"The count of rotatable 
bonds of the chemical with SMILES C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC is 7."} {"text":"The number of rotatable bonds of the molecule with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N is 7."}", "/scratch/micpie/export/rdkit_features/test_106-15.jsonl": "{"text":"Question: What is the heteroatom count of the compound with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O?\nAnswer: 8"} {"text":"Question: What is the count of heteroatoms of the compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/valid_14-11.jsonl": "{"text":"User: I want to design a compound with a chemical formula of C20H28N4O.\nAssistant: That's interesting, do you have some additional requirements that help me narrow down the search?\nUser: Yes, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 3.\nAssistant: Then, I suggest the compound with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C."} {"text":"User: I want to design a chemical with a chemical formula of C14H14BrNO3.\nAssistant: That's interesting, do you have some additional conditions that help me narrow down the search?\nUser: Yes, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 4.\nAssistant: In that case, I propose the chemical with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O."}", "/scratch/micpie/export/rdkit_features/valid_113-5.jsonl": "{"text":"The rotatable bond count of the compound with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 8."} {"text":"The rotatable bond count of the chemical with SMILES Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 5."}", "/scratch/micpie/export/rdkit_features/train_21-23.jsonl": "{"text":"User: I want to analyze a compound with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 6 
and a Wildman-Crippen LogP value of 0.69.\nAssistant: That's interesting, do you have some additional constraints that help me narrow down the search?\nUser: Yeah, I want the molecular formula to be C17H15FNO6S-.\nAssistant: I recommend the compound with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC."} {"text":"User: I want to design a molecule with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptor sites of 4 and a Wildman-Crippen LogP value of 2.49.\nAssistant: That's interesting, do you have some additional conditions I should take into account?\nUser: Indeed, I want the formula to be C19H25ClFN3O3.\nAssistant: I propose the molecule with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O."}", "/scratch/micpie/export/rdkit_features/valid_19-19.jsonl": "{"text":"Question: What is the number of acid groups of the molecule with SMILES C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the compound with SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_24-4.jsonl": "{"text":"The count of rings of the chemical with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl is 4."} {"text":"The count of rings of the molecule with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C is 3."}", "/scratch/micpie/export/rdkit_features/valid_104-15.jsonl": "{"text":"Question: What is the heteroatom count of the molecule with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3?\nAnswer: 9"} {"text":"Question: What is the heteroatom count of the compound with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_104-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2?\nAnswer: 2"} {"text":"Question: What is the ring count 
of the molecule with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_118-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O?\nAnswer: 42.44"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N?\nAnswer: 42.71"}", "/scratch/micpie/export/rdkit_features/test_113-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N is 11."} {"text":"The number of aromatic bonds of the chemical with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 18."}", "/scratch/micpie/export/rdkit_features/valid_117-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4OC)Cl is 4."} {"text":"The count of hydrogen bond acceptors of the compound with SMILES Cc1cc(ccc1[C@@H](C)[NH2+]CC2(CC2)CO)F is 1."}", "/scratch/micpie/export/rdkit_features/train_12-12.jsonl": "{"text":"Question: What is the chemical formula of the compound with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C?\nAnswer: C17H21Cl2NO"} {"text":"Question: What is the molecular formula of the molecule with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: C21H32FNO2"}", "/scratch/micpie/export/rdkit_features/valid_108-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O?\nAnswer: 58.27"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C?\nAnswer: 70.76"}", "/scratch/micpie/export/rdkit_features/train_110-5.jsonl": "{"text":"The rotatable bond count of the molecule with 
SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O is 6."} {"text":"The count of rotatable bonds of the compound with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C is 6."}", "/scratch/micpie/export/rdkit_features/train_8-15.jsonl": "{"text":"Question: What is the heteroatom count of the compound with SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N?\nAnswer: 10"} {"text":"Question: What is the number of heteroatoms of the compound with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_109-4.jsonl": "{"text":"The ring count of the compound with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl is 3."} {"text":"The count of rings of the molecule with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N is 3."}", "/scratch/micpie/export/rdkit_features/valid_111-4.jsonl": "{"text":"The ring count of the compound with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4 is 4."} {"text":"The number of rings of the compound with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC is 4."}", "/scratch/micpie/export/rdkit_features/test_113-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N is 0."} {"text":"The number of acid groups of the molecule with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 0."}", "/scratch/micpie/export/rdkit_features/valid_100-8.jsonl": "{"text":"The basic group count of the chemical with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F is 1."} {"text":"The basic group count of the chemical with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4 is 1."}", "/scratch/micpie/export/rdkit_features/train_27-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the molecule with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C is 3.15."} {"text":"The Wildman-Crippen LogP 
value of the molecule with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4 is 3.34."}", "/scratch/micpie/export/rdkit_features/valid_0-22.jsonl": "{"text":"User: I want to synthesize a molecule with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 8.\nAssistant: Nice, do you have some additional constraints I should take into account?\nUser: Yep, I want the count of heteroatoms to be 9.\nAssistant: In that situation, I suggest the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1."} {"text":"User: I want to design a chemical with a number of hydrogen bond donor sites of 1 and a count of hydrogen bond acceptors of 4.\nAssistant: Nice, do you have some additional constraints I should consider?\nUser: Yeah, I want the number of heteroatoms to be 5.\nAssistant: I suggest the chemical with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4."}", "/scratch/micpie/export/rdkit_features/train_22-3.jsonl": "{"text":"The count of heteroatoms of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O is 8."} {"text":"The number of heteroatoms of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4 is 8."}", "/scratch/micpie/export/rdkit_features/train_106-11.jsonl": "{"text":"User: I want to create a chemical with a chemical formula of C12H16F2N4O3.\nAssistant: Cool, do you have some additional constraints?\nUser: I want the count of hydrogen bond donors to be 0, the count of hydrogen bond acceptors to be 6.\nAssistant: In that situation, I suggest the chemical with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O."} {"text":"User: I want to make a chemical with a chemical formula of C20H24ClN7O3.\nAssistant: That's interesting, do you have some additional conditions I should take into account?\nUser: Indeed, I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptors to be 6.\nAssistant: In that situation, I suggest the chemical with SMILES 
CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl."}", "/scratch/micpie/export/rdkit_features/train_2-17.jsonl": "{"text":"Question: What is the rotatable bond count of the compound with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6?\nAnswer: 5"} {"text":"Question: What is the count of rotatable bonds of the chemical with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_13-11.jsonl": "{"text":"User: I want to design a molecule with a chemical formula of C21H20FN3O.\nAssistant: Nice, do you have some additional constraints I should take into account?\nUser: Yeah, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 3.\nAssistant: Then, I suggest the molecule with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4."} {"text":"User: I want to analyze a molecule with a chemical formula of C21H20N4.\nAssistant: That's interesting, do you have some additional constraints that help me narrow down the search?\nUser: Yea, I want the number of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 4.\nAssistant: I propose the molecule with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C."}", "/scratch/micpie/export/rdkit_features/test_9-16.jsonl": "{"text":"Question: What is the number of rings of the compound with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C?\nAnswer: 3"} {"text":"Question: What is the count of rings of the chemical with SMILES C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_16-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES 
CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_115-5.jsonl": "{"text":"The rotatable bond count of the molecule with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F is 8."} {"text":"The rotatable bond count of the chemical with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 5."}", "/scratch/micpie/export/rdkit_features/train_6-0.jsonl": "{"text":"The chemical formula of the compound with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4 is C20H24N4O4S."} {"text":"The chemical formula of the chemical with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C is C25H30N4O2."}", "/scratch/micpie/export/rdkit_features/test_8-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the chemical with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C is 3.72."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F is 3.85."}", "/scratch/micpie/export/rdkit_features/test_3-7.jsonl": "{"text":"The acid group count of the molecule with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC is 2."} {"text":"The acid group count of the compound with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F is 0."}", "/scratch/micpie/export/rdkit_features/valid_22-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O is 56.71."} {"text":"The total sum of atomic polarizabilities of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4 is 65.41."}", "/scratch/micpie/export/rdkit_features/test_104-15.jsonl": "{"text":"Question: What is the heteroatom count of the chemical with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2?\nAnswer: 10"} {"text":"Question: What is the heteroatom count of the chemical 
with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/valid_102-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the chemical with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1 is 1."} {"text":"The count of hydrogen bond donors of the compound with SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2 is 4."}", "/scratch/micpie/export/rdkit_features/test_107-19.jsonl": "{"text":"Question: What is the count of acid groups of the molecule with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC?\nAnswer: 0"} {"text":"Question: What is the acid group count of the compound with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_18-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 0.95."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O is 3.96."}", "/scratch/micpie/export/rdkit_features/test_113-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N is 7."} {"text":"The count of rotatable bonds of the molecule with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 6."}", "/scratch/micpie/export/rdkit_features/test_17-22.jsonl": "{"text":"User: I want to make a chemical with a number of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 3.\nAssistant: That is a very interesting question, do you have some additional constraints I should take into account?\nUser: Yeah, I want the heteroatom count to be 8.\nAssistant: In that scenario, I suggest the chemical with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F."} 
{"text":"User: I want to design a chemical with a number of hydrogen bond donor sites of 1 and a number of hydrogen bond acceptors of 9.\nAssistant: That is a very interesting question, do you have some additional constraints that help me narrow down the search?\nUser: Indeed, I want the count of heteroatoms to be 12.\nAssistant: Then, I advise the chemical with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O."}", "/scratch/micpie/export/rdkit_features/test_102-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3?\nAnswer: 1"} {"text":"Question: What is the number of basic groups of the molecule with SMILES Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_12-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the chemical with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl?\nAnswer: 4"} {"text":"Question: What is the heteroatom count of the compound with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_28-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the compound with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl is 3."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F is 2."}", "/scratch/micpie/export/rdkit_features/valid_30-8.jsonl": "{"text":"The basic group count of the molecule with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3 is 0."} {"text":"The count of basic groups of the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F is 0."}", "/scratch/micpie/export/rdkit_features/valid_112-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES 
Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC?\nAnswer: 0"} {"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_8-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the chemical with SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N is 6."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO is 4."}", "/scratch/micpie/export/rdkit_features/test_9-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the compound with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C?\nAnswer: 6"} {"text":"Question: What is the count of heteroatoms of the molecule with SMILES C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/train_3-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the molecule with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O is 2."} {"text":"The count of hydrogen bond donors of the chemical with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4 is 1."}", "/scratch/micpie/export/rdkit_features/test_116-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1 is 21."} {"text":"The aromatic bond count of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl is 16."}", "/scratch/micpie/export/rdkit_features/valid_18-8.jsonl": "{"text":"The number of basic groups of the compound with SMILES c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 0."} {"text":"The basic group count of the compound with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O is 1."}", 
"/scratch/micpie/export/rdkit_features/train_102-19.jsonl": "{"text":"Question: What is the count of acid groups of the compound with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the molecule with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_18-3.jsonl": "{"text":"The count of heteroatoms of the molecule with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O is 13."} {"text":"The heteroatom count of the molecule with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl is 9."}", "/scratch/micpie/export/rdkit_features/test_109-1.jsonl": "{"text":"The number of hydrogen bond donors of the molecule with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl is 3."} {"text":"The number of hydrogen bond donors of the chemical with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N is 3."}", "/scratch/micpie/export/rdkit_features/test_12-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C?\nAnswer: C18H18FN3OS"} {"text":"Question: What is the molecular formula of the chemical with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4?\nAnswer: C21H29FN2O"}", "/scratch/micpie/export/rdkit_features/valid_100-19.jsonl": "{"text":"Question: What is the number of acid groups of the chemical with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F?\nAnswer: 0"} {"text":"Question: What is the acid group count of the chemical with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_5-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C is 2.09."} {"text":"The Wildman-Crippen LogP 
value computed using RDKit of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+] is 3.05."}", "/scratch/micpie/export/rdkit_features/train_21-22.jsonl": "{"text":"User: I want to create a molecule with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 6.\nAssistant: That is a very interesting question, do you have some additional conditions I should consider?\nUser: I want the count of heteroatoms to be 9.\nAssistant: I propose the molecule with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC."} {"text":"User: I want to design a chemical with a number of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 4.\nAssistant: Nice, do you have some additional limitations?\nUser: Yea, I want the count of heteroatoms to be 8.\nAssistant: In that situation, I suggest the chemical with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O."}", "/scratch/micpie/export/rdkit_features/valid_4-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br is 53.15."} {"text":"The sum of atomic polarizabilities of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C is 68.95."}", "/scratch/micpie/export/rdkit_features/test_28-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the molecule with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC?\nAnswer: 7"} {"text":"Question: What is the heteroatom count of the molecule with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_30-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the compound with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F is 3.65."} {"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC is 3.84."}", 
"/scratch/micpie/export/rdkit_features/valid_22-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O?\nAnswer: 56.71"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4?\nAnswer: 65.41"}", "/scratch/micpie/export/rdkit_features/test_31-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F is 16."} {"text":"The aromatic bond count of the chemical with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3 is 6."}", "/scratch/micpie/export/rdkit_features/test_103-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1 is 24."} {"text":"The number of aromatic bonds of the compound with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 5."}", "/scratch/micpie/export/rdkit_features/valid_7-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the compound with SMILES COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC?\nAnswer: 9"} {"text":"Question: What is the count of rotatable bonds of the compound with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_8-0.jsonl": "{"text":"The formula of the molecule with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C is C22H26N4O4."} {"text":"The formula of the chemical with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F is C22H24F2N2O3."}", "/scratch/micpie/export/rdkit_features/test_13-19.jsonl": "{"text":"Question: What is the number of acid groups of the chemical with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 0"} {"text":"Question: What is the acid group count of the chemical with SMILES 
c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_105-7.jsonl": "{"text":"The number of acid groups of the molecule with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N is 0."} {"text":"The count of acid groups of the molecule with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O is 0."}", "/scratch/micpie/export/rdkit_features/valid_2-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4 is 8."} {"text":"The count of heteroatoms of the chemical with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3 is 7."}", "/scratch/micpie/export/rdkit_features/train_9-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C is 71.04."} {"text":"The total sum of atomic polarizabilities of the molecule with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F is 66.92."}", "/scratch/micpie/export/rdkit_features/test_120-16.jsonl": "{"text":"Question: What is the count of rings of the molecule with SMILES c1ccc(cc1)CCN(CC(=O)Nc2ccc(cc2)Cl)S(=O)(=O)c3ccc4c(c3)OCCO4?\nAnswer: 4"} {"text":"Question: What is the number of rings of the molecule with SMILES Cc1c(cnn1c2ccccc2)N[C@H](C)C(=O)Nc3cccc(c3Cl)Cl?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_2-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C?\nAnswer: 4"} {"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_17-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the chemical with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F is 3."} {"text":"The number of hydrogen bond acceptor 
sites of the chemical with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 9."}", "/scratch/micpie/export/rdkit_features/valid_118-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the compound with SMILES CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O is 1."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO is 5."}", "/scratch/micpie/export/rdkit_features/train_19-8.jsonl": "{"text":"The count of basic groups of the molecule with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC is 0."} {"text":"The basic group count of the compound with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3 is 0."}", "/scratch/micpie/export/rdkit_features/train_30-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F is 56.92."} {"text":"The total sum of atomic polarizabilities of the molecule with SMILES CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC is 65.41."}", "/scratch/micpie/export/rdkit_features/test_20-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br?\nAnswer: 56.85"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-]?\nAnswer: 50.91"}", "/scratch/micpie/export/rdkit_features/train_12-23.jsonl": "{"text":"User: I want to make a molecule with a count of hydrogen bond donors of 0, a number of hydrogen bond acceptors of 1 and a LogP value computed using the Wildman-Crippen method of 4.74.\nAssistant: Interesting, do you have some additional requirements I should take into account?\nUser: Indeed, I want the formula to be C17H21Cl2NO.\nAssistant: I recommend the molecule with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C."} 
{"text":"User: I want to analyze a chemical with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 2 and a Wildman-Crippen LogP value of 4.74.\nAssistant: Do you have some additional constraints I should consider?\nUser: Yep, I want the chemical formula to be C21H32FNO2.\nAssistant: In that scenario, I suggest the chemical with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC."}", "/scratch/micpie/export/rdkit_features/train_120-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES Cc1ccc(cc1)NC(=O)CSc2nnc(n2C)C=NC(=O)CCc3ccccc3Cl is 6."} {"text":"The count of hydrogen bond acceptors of the compound with SMILES Cc1c(cnn1c2ccccc2)N[C@@H](C)C(=O)Nc3cccc(c3Cl)Cl is 4."}", "/scratch/micpie/export/rdkit_features/valid_26-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2?\nAnswer: 2"} {"text":"Question: What is the count of rings of the molecule with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_113-22.jsonl": "{"text":"User: I want to create a chemical with a number of hydrogen bond donors of 3 and a number of hydrogen bond acceptor sites of 5.\nAssistant: Interesting, do you have some additional constraints I should take into account?\nUser: Yep, I want the heteroatom count to be 9.\nAssistant: Then, I advise the chemical with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N."} {"text":"User: I want to synthesize a compound with a number of hydrogen bond donor sites of 3 and a count of hydrogen bond acceptors of 3.\nAssistant: Cool, do you have some additional constraints that I should consider?\nUser: Yes, I want the count of heteroatoms to be 9.\nAssistant: Then, I propose the compound with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl."}", "/scratch/micpie/export/rdkit_features/test_29-6.jsonl": "{"text":"The 
aromatic bond count of the molecule with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4 is 6."} {"text":"The number of aromatic bonds of the molecule with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C is 0."}", "/scratch/micpie/export/rdkit_features/valid_8-19.jsonl": "{"text":"Question: What is the number of acid groups of the compound with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the chemical with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_1-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the molecule with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5?\nAnswer: 4"} {"text":"Question: What is the rotatable bond count of the molecule with SMILES Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_1-6.jsonl": "{"text":"The count of aromatic bonds of the chemical with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4 is 17."} {"text":"The number of aromatic bonds of the chemical with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC is 22."}", "/scratch/micpie/export/rdkit_features/valid_29-11.jsonl": "{"text":"User: I want to synthesize a chemical with a formula of C20H31ClN3O+.\nAssistant: Interesting, do you have some additional conditions I should consider?\nUser: Yep, I want the count of hydrogen bond donors to be 3, the number of hydrogen bond acceptors to be 1.\nAssistant: In that situation, I advise the chemical with SMILES CCC[NH+]1CC[C@@H](C1)CNC(=O)NCC2(CCC2)c3ccc(cc3)Cl."} {"text":"User: I want to analyze a chemical with a chemical formula of C21H25F2N2O2+.\nAssistant: Interesting, do you have some additional limitations I should take into account?\nUser: Yep, I want the count of hydrogen bond donors to be 3, the count of hydrogen bond 
acceptors to be 2.\nAssistant: In that case, I advise the chemical with SMILES C[C@@H](c1cccc(c1)O)NC(=O)[C@@H]2C[NH+](CCC2(F)F)Cc3ccccc3."}", "/scratch/micpie/export/rdkit_features/test_30-5.jsonl": "{"text":"The rotatable bond count of the molecule with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3 is 6."} {"text":"The number of rotatable bonds of the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC is 4."}", "/scratch/micpie/export/rdkit_features/test_13-22.jsonl": "{"text":"User: I want to synthesize a molecule with a number of hydrogen bond donor sites of 1 and a number of hydrogen bond acceptor sites of 2.\nAssistant: Do you have some additional requirements?\nUser: I want the count of heteroatoms to be 5.\nAssistant: In that situation, I advise the molecule with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC."} {"text":"User: I want to analyze a chemical with a number of hydrogen bond donors of 0 and a number of hydrogen bond acceptor sites of 2.\nAssistant: Interesting, do you have some additional requirements that help me narrow down the search?\nUser: Yep, I want the number of heteroatoms to be 3.\nAssistant: In that situation, I recommend the chemical with SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4."}", "/scratch/micpie/export/rdkit_features/test_114-16.jsonl": "{"text":"Question: What is the number of rings of the molecule with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C?\nAnswer: 3"} {"text":"Question: What is the number of rings of the molecule with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_100-22.jsonl": "{"text":"User: I want to analyze a compound with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptor sites of 1.\nAssistant: Do you have some additional constraints?\nUser: Indeed, I want the heteroatom count to be 5.\nAssistant: In that scenario, I advise the compound with 
SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F."} {"text":"User: I want to design a chemical with a number of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 2.\nAssistant: That's interesting, do you have some additional limitations I should consider?\nUser: Yes, I want the number of heteroatoms to be 4.\nAssistant: Then, I suggest the chemical with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4."}", "/scratch/micpie/export/rdkit_features/train_26-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_21-8.jsonl": "{"text":"The count of basic groups of the compound with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC is 0."} {"text":"The basic group count of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O is 0."}", "/scratch/micpie/export/rdkit_features/test_5-22.jsonl": "{"text":"User: I want to synthesize a compound with a number of hydrogen bond donor sites of 1 and a number of hydrogen bond acceptors of 4.\nAssistant: Do you have some additional conditions I should take into account?\nUser: Yep, I want the count of heteroatoms to be 6.\nAssistant: In that case, I recommend the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5."} {"text":"User: I want to design a chemical with a number of hydrogen bond donors of 2 and a number of hydrogen bond acceptor sites of 4.\nAssistant: That's interesting, do you have some additional conditions that help me narrow down the search?\nUser: I want the number of heteroatoms to be 7.\nAssistant: In that case, I recommend the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F."}", 
"/scratch/micpie/export/rdkit_features/test_13-23.jsonl": "{"text":"User: I want to create a chemical with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptors of 2 and a Wildman-Crippen LogP value computed using RDKit of 4.58.\nAssistant: That is a very interesting question, do you have some additional limitations that help me narrow down the search?\nUser: I want the molecular formula to be C19H21ClFNO2.\nAssistant: In that situation, I propose the chemical with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC."} {"text":"User: I want to make a molecule with a number of hydrogen bond donor sites of 0, a count of hydrogen bond acceptors of 2 and a Wildman-Crippen LogP value of 4.97.\nAssistant: That's interesting, do you have some additional conditions I should take into account?\nUser: Yes, I want the molecular formula to be C22H31NO2.\nAssistant: Then, I propose the molecule with SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4."}", "/scratch/micpie/export/rdkit_features/test_3-23.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptor sites of 7 and a Wildman-Crippen LogP value computed using RDKit of 3.75.\nAssistant: Interesting, do you have some additional requirements that help me narrow down the search?\nUser: Yea, I want the formula to be C22H26N6O3.\nAssistant: In that situation, I suggest the chemical with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC."} {"text":"User: I want to design a compound with a number of hydrogen bond donor sites of 3, a number of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value computed using RDKit of 3.54.\nAssistant: Cool, do you have some additional requirements that help me narrow down the search?\nUser: Yeah, I want the molecular formula to be C20H20F2N4O3.\nAssistant: In that situation, I advise the compound with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F."}", 
"/scratch/micpie/export/rdkit_features/test_4-16.jsonl": "{"text":"Question: What is the number of rings of the compound with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4?\nAnswer: 4"} {"text":"Question: What is the number of rings of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_23-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4 is 5."} {"text":"The number of aromatic bonds of the compound with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4 is 11."}", "/scratch/micpie/export/rdkit_features/test_23-0.jsonl": "{"text":"The chemical formula of the compound with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4 is C19H34N5OS+."} {"text":"The molecular formula of the compound with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4 is C18H19ClN8O."}", "/scratch/micpie/export/rdkit_features/test_111-7.jsonl": "{"text":"The count of acid groups of the compound with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4 is 0."} {"text":"The count of acid groups of the chemical with SMILES C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC is 0."}", "/scratch/micpie/export/rdkit_features/valid_4-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the molecule with SMILES Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br?\nAnswer: 7"} {"text":"Question: What is the heteroatom count of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_31-6.jsonl": "{"text":"The aromatic bond count of the compound with SMILES Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F is 12."} {"text":"The number of aromatic bonds of the chemical with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4 is 12."}", 
"/scratch/micpie/export/rdkit_features/test_5-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5?\nAnswer: 4"} {"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_13-8.jsonl": "{"text":"The basic group count of the molecule with SMILES CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC is 0."} {"text":"The count of basic groups of the molecule with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C is 0."}", "/scratch/micpie/export/rdkit_features/train_3-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O?\nAnswer: 5"} {"text":"Question: What is the ring count of the chemical with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_111-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4?\nAnswer: 7"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC?\nAnswer: 10"}", "/scratch/micpie/export/rdkit_features/test_101-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the chemical with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F?\nAnswer: 7"} {"text":"Question: What is the heteroatom count of the compound with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/test_111-11.jsonl": "{"text":"User: I want to analyze a molecule with a chemical formula of C22H34N5O3S+.\nAssistant: Nice, do you have some additional constraints that help me narrow down the search?\nUser: Yep, I 
want the count of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 7.\nAssistant: In that situation, I recommend the molecule with SMILES CCN(C)c1nnc(n1C[C@@H](c2ccccc2OC)[NH+]3CCCC3)[C@@H]4CCS(=O)(=O)C4."} {"text":"User: I want to analyze a molecule with a formula of C22H31N5O4.\nAssistant: Cool, do you have some additional limitations that I should consider?\nUser: Yep, I want the number of hydrogen bond donors to be 0, the number of hydrogen bond acceptor sites to be 8.\nAssistant: In that situation, I advise the molecule with SMILES C[C@@H]1C[C@@H](CCO1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4OC)OC."}", "/scratch/micpie/export/rdkit_features/valid_4-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_117-5.jsonl": "{"text":"The rotatable bond count of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4OC)Cl is 5."} {"text":"The rotatable bond count of the compound with SMILES Cc1cc(ccc1[C@@H](C)[NH2+]CC2(CC2)CO)F is 5."}", "/scratch/micpie/export/rdkit_features/test_104-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES C1CN(CCN(C1)C(=O)C(=O)N)C(=O)[C@H]2CCS(=O)(=O)N2?\nAnswer: 42.67"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES C[C@@H](c1cccc(c1)C#N)Nc2nc(nc(n2)Cl)Oc3ccccc3?\nAnswer: 49.50"}", "/scratch/micpie/export/rdkit_features/valid_119-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O?\nAnswer: 2"} {"text":"Question: What is the count of hydrogen bond donors of the molecule with SMILES 
CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_111-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4 is 8."} {"text":"The count of heteroatoms of the molecule with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC is 10."}", "/scratch/micpie/export/rdkit_features/test_28-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the compound with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC?\nAnswer: 5"} {"text":"Question: What is the rotatable bond count of the compound with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_3-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N is 3."} {"text":"The number of hydrogen bond donors of the compound with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC is 1."}", "/scratch/micpie/export/rdkit_features/test_100-8.jsonl": "{"text":"The basic group count of the molecule with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F is 1."} {"text":"The basic group count of the chemical with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br is 0."}", "/scratch/micpie/export/rdkit_features/train_107-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES c1cc(c(cc1NC(=O)N2CC[NH+](CC2)CC(=O)N3CCCCC3)N4CCNC4=O)Cl?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5?\nAnswer: 10"}", "/scratch/micpie/export/rdkit_features/valid_104-19.jsonl": "{"text":"Question: What is the acid group count of the chemical with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3?\nAnswer: 0"} {"text":"Question: What is the 
number of acid groups of the molecule with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_9-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C?\nAnswer: 70.19"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4?\nAnswer: 74.85"}", "/scratch/micpie/export/rdkit_features/valid_10-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C is 7."} {"text":"The count of rotatable bonds of the chemical with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3 is 4."}", "/scratch/micpie/export/rdkit_features/test_7-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC is 66.38."} {"text":"The sum of atomic polarizabilities of the chemical with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC is 58.07."}", "/scratch/micpie/export/rdkit_features/train_8-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the compound with SMILES c1cc(cc(c1)Oc2ccc(cn2)NC(=O)c3ccc4c(c3)OC(O4)(F)F)C(=O)N is 3.55."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES COc1ccccc1S(=O)(=O)CCC[NH+](Cc2ccccc2)C3(CCCC3)CO is 2.25."}", "/scratch/micpie/export/rdkit_features/train_27-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C is C21H31N4O+."} {"text":"The chemical formula of the chemical with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4 is C21H23N3O3."}", "/scratch/micpie/export/rdkit_features/train_25-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES 
CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C is 2.44."} {"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F is 3.89."}", "/scratch/micpie/export/rdkit_features/test_33-8.jsonl": "{"text":"The number of basic groups of the compound with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O is 0."} {"text":"The count of basic groups of the molecule with SMILES Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br is 0."}", "/scratch/micpie/export/rdkit_features/test_22-20.jsonl": "{"text":"Question: What is the basic group count of the chemical with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O?\nAnswer: 0"} {"text":"Question: What is the basic group count of the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_105-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N?\nAnswer: C17H15BrN2O2"} {"text":"Question: What is the formula of the molecule with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O?\nAnswer: C16H21N3O4"}", "/scratch/micpie/export/rdkit_features/train_101-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the chemical with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3?\nAnswer: 17"} {"text":"Question: What is the number of aromatic bonds of the molecule with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1?\nAnswer: 23"}", "/scratch/micpie/export/rdkit_features/train_14-4.jsonl": "{"text":"The ring count of the chemical with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C is 2."} {"text":"The count of rings of the compound with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O is 2."}", "/scratch/micpie/export/rdkit_features/train_23-6.jsonl": "{"text":"The number of aromatic bonds of the chemical with SMILES 
CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4 is 10."} {"text":"The aromatic bond count of the chemical with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3 is 11."}", "/scratch/micpie/export/rdkit_features/test_23-19.jsonl": "{"text":"Question: What is the number of acid groups of the chemical with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4?\nAnswer: 0"} {"text":"Question: What is the acid group count of the compound with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_26-20.jsonl": "{"text":"Question: What is the basic group count of the molecule with SMILES CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the compound with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_16-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C?\nAnswer: 1"} {"text":"Question: What is the number of rings of the compound with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_4-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the chemical with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4?\nAnswer: 6"} {"text":"Question: What is the heteroatom count of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_12-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the molecule with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl?\nAnswer: 5"} {"text":"Question: What is the rotatable bond count of the compound with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/train_109-8.jsonl": 
"{"text":"The basic group count of the molecule with SMILES Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3 is 0."} {"text":"The number of basic groups of the molecule with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C is 0."}", "/scratch/micpie/export/rdkit_features/valid_21-11.jsonl": "{"text":"User: I want to analyze a chemical with a chemical formula of C18H26N2O5S.\nAssistant: Interesting, do you have some additional constraints?\nUser: Yep, I want the count of hydrogen bond donors to be 2, the count of hydrogen bond acceptors to be 6.\nAssistant: In that scenario, I propose the chemical with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-]."} {"text":"User: I want to design a molecule with a molecular formula of C22H33N3O3.\nAssistant: Nice, do you have some additional limitations that help me narrow down the search?\nUser: I want the number of hydrogen bond donors to be 2, the count of hydrogen bond acceptors to be 4.\nAssistant: I propose the molecule with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3."}", "/scratch/micpie/export/rdkit_features/train_22-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O?\nAnswer: 6"} {"text":"Question: What is the count of aromatic bonds of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4?\nAnswer: 10"}", "/scratch/micpie/export/rdkit_features/train_0-18.jsonl": "{"text":"Question: What is the aromatic bond count of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3sc4c(sc5c(F)csc54)c3F)c2c1?\nAnswer: 28"} {"text":"Question: What is the aromatic bond count of the chemical with SMILES c1ccc(cc1)[C@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/test_105-17.jsonl": "{"text":"Question: What is the rotatable bond count of the compound with SMILES 
Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N?\nAnswer: 4"} {"text":"Question: What is the rotatable bond count of the compound with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/train_12-19.jsonl": "{"text":"Question: What is the count of acid groups of the compound with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the chemical with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_106-7.jsonl": "{"text":"The number of acid groups of the compound with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl is 0."} {"text":"The acid group count of the chemical with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F is 0."}", "/scratch/micpie/export/rdkit_features/valid_15-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N?\nAnswer: 4"} {"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_12-8.jsonl": "{"text":"The basic group count of the chemical with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C is 0."} {"text":"The number of basic groups of the molecule with SMILES CC(C)[C@@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC is 0."}", "/scratch/micpie/export/rdkit_features/train_10-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4 is 2.36."} {"text":"The Wildman-Crippen LogP value of the chemical with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl is 4.69."}", "/scratch/micpie/export/rdkit_features/valid_31-21.jsonl": "{"text":"Question: What 
is the total sum of atomic polarizabilities of the chemical with SMILES Cc1ccccc1CCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC?\nAnswer: 65.41"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@H](C2)OCC3CC3?\nAnswer: 59.17"}", "/scratch/micpie/export/rdkit_features/valid_14-18.jsonl": "{"text":"Question: What is the aromatic bond count of the chemical with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C?\nAnswer: 11"} {"text":"Question: What is the number of aromatic bonds of the chemical with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_6-12.jsonl": "{"text":"Question: What is the chemical formula of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+]?\nAnswer: C25H26N3O3+"} {"text":"Question: What is the chemical formula of the compound with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F?\nAnswer: C22H24FN5O2"}", "/scratch/micpie/export/rdkit_features/valid_114-19.jsonl": "{"text":"Question: What is the number of acid groups of the chemical with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl?\nAnswer: 0"} {"text":"Question: What is the acid group count of the molecule with SMILES C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_14-8.jsonl": "{"text":"The count of basic groups of the compound with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C is 0."} {"text":"The count of basic groups of the compound with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O is 0."}", "/scratch/micpie/export/rdkit_features/valid_16-3.jsonl": "{"text":"The count of heteroatoms of the compound with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C is 4."} {"text":"The number of heteroatoms of the compound with SMILES 
C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3 is 6."}", "/scratch/micpie/export/rdkit_features/test_116-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1 is 76.74."} {"text":"The sum of atomic polarizabilities of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl is 67.26."}", "/scratch/micpie/export/rdkit_features/test_14-8.jsonl": "{"text":"The count of basic groups of the compound with SMILES CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3 is 0."} {"text":"The basic group count of the chemical with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O is 0."}", "/scratch/micpie/export/rdkit_features/test_18-5.jsonl": "{"text":"The number of rotatable bonds of the molecule with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O is 7."} {"text":"The number of rotatable bonds of the chemical with SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C is 5."}", "/scratch/micpie/export/rdkit_features/valid_26-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2 is 3.59."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC is 3.65."}", "/scratch/micpie/export/rdkit_features/train_111-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 8"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC?\nAnswer: 10"}", "/scratch/micpie/export/rdkit_features/train_26-7.jsonl": "{"text":"The acid group count of the compound with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F is 0."} {"text":"The number of acid groups of the molecule 
with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C is 0."}", "/scratch/micpie/export/rdkit_features/valid_6-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+] is 8."} {"text":"The count of rotatable bonds of the chemical with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F is 5."}", "/scratch/micpie/export/rdkit_features/test_106-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O?\nAnswer: 10"} {"text":"Question: What is the count of aromatic bonds of the compound with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/train_30-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F?\nAnswer: 1"} {"text":"Question: What is the number of rings of the chemical with SMILES CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_107-9.jsonl": "{"text":"The sum of atomic polarizabilities of the molecule with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C is 68.15."} {"text":"The sum of atomic polarizabilities of the compound with SMILES CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C is 66.60."}", "/scratch/micpie/export/rdkit_features/train_3-23.jsonl": "{"text":"User: I want to synthesize a compound with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptor sites of 5 and a LogP value computed using the Wildman-Crippen method of 4.28.\nAssistant: Interesting, do you have some additional requirements that I should consider?\nUser: I want the chemical formula to be C25H24N2O4.\nAssistant: I the compound with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O."} {"text":"User: I want 
to synthesize a compound with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptors of 5 and a Wildman-Crippen LogP value computed using RDKit of 2.25.\nAssistant: Interesting, do you have some additional constraints I should take into account?\nUser: Indeed, I want the molecular formula to be C24H28N3O3+.\nAssistant: I advise the compound with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4."}", "/scratch/micpie/export/rdkit_features/valid_22-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the chemical with SMILES C[C@@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cnc(s3)Cl)O?\nAnswer: 5"} {"text":"Question: What is the number of aromatic bonds of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@H]4CCOC4?\nAnswer: 10"}", "/scratch/micpie/export/rdkit_features/test_120-1.jsonl": "{"text":"The number of hydrogen bond donors of the chemical with SMILES c1ccc(cc1)CCN(CC(=O)Nc2ccc(cc2)Cl)S(=O)(=O)c3ccc4c(c3)OCCO4 is 1."} {"text":"The count of hydrogen bond donors of the compound with SMILES Cc1c(cnn1c2ccccc2)N[C@H](C)C(=O)Nc3cccc(c3Cl)Cl is 2."}", "/scratch/micpie/export/rdkit_features/valid_24-6.jsonl": "{"text":"The aromatic bond count of the molecule with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl is 22."} {"text":"The aromatic bond count of the compound with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C is 5."}", "/scratch/micpie/export/rdkit_features/test_11-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2 is 4.62."} {"text":"The Wildman-Crippen LogP value of the compound with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C is 4.61."}", "/scratch/micpie/export/rdkit_features/test_4-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the molecule with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4?\nAnswer: 
7"} {"text":"Question: What is the heteroatom count of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_10-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C?\nAnswer: C17H25F3N6S"} {"text":"Question: What is the formula of the chemical with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3?\nAnswer: C20H26ClNO2"}", "/scratch/micpie/export/rdkit_features/valid_11-8.jsonl": "{"text":"The count of basic groups of the chemical with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl is 0."} {"text":"The basic group count of the chemical with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C is 0."}", "/scratch/micpie/export/rdkit_features/valid_4-23.jsonl": "{"text":"User: I want to analyze a chemical with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value computed using RDKit of 3.94.\nAssistant: Interesting, do you have some additional I should take into account?\nUser: Yes, I want the chemical formula to be C18H17BrClN3O2.\nAssistant: Then, I advise the chemical with SMILES Cc1cc(ncc1NC(=O)C(=O)N(Cc2ccc(cc2)Cl)C3CC3)Br."} {"text":"User: I want to make a compound with a number of hydrogen bond donor sites of 2, a count of hydrogen bond acceptors of 3 and a LogP value computed using the Wildman-Crippen method of 0.88.\nAssistant: Interesting, do you have some additional ?\nUser: Indeed, I want the formula to be C23H31N3O2S+2.\nAssistant: In that case, I suggest the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)C4CC[NH+](CC4)C."}", "/scratch/micpie/export/rdkit_features/valid_117-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4OC)Cl?\nAnswer: 0"} 
{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES Cc1cc(ccc1[C@@H](C)[NH2+]CC2(CC2)CO)F?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_103-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO is C22H36O7."} {"text":"The chemical formula of the molecule with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N is C14H14N6O3."}", "/scratch/micpie/export/rdkit_features/train_18-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O is 68.38."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl is 54.59."}", "/scratch/micpie/export/rdkit_features/train_9-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C?\nAnswer: 4"} {"text":"Question: What is the count of rings of the chemical with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_14-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C?\nAnswer: 0"} {"text":"Question: What is the basic group count of the chemical with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_101-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the compound with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3 is 5."} {"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1 is 7."}", "/scratch/micpie/export/rdkit_features/train_113-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES 
c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: C21H32N5O3S+"} {"text":"Question: What is the molecular formula of the molecule with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: C21H19Cl2N3O2S2"}", "/scratch/micpie/export/rdkit_features/train_6-18.jsonl": "{"text":"Question: What is the aromatic bond count of the compound with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4?\nAnswer: 11"} {"text":"Question: What is the aromatic bond count of the chemical with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C?\nAnswer: 17"}", "/scratch/micpie/export/rdkit_features/train_21-11.jsonl": "{"text":"User: I want to analyze a molecule with a molecular formula of C17H15FNO6S-.\nAssistant: That is a very interesting question, do you have some additional constraints that help me narrow down the search?\nUser: I want the number of hydrogen bond donor sites to be 1, the number of hydrogen bond acceptors to be 6.\nAssistant: I the molecule with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC."} {"text":"User: I want to make a molecule with a chemical formula of C19H25ClFN3O3.\nAssistant: Interesting, do you have some additional I should consider?\nUser: Yeah, I want the number of hydrogen bond donor sites to be 1, the number of hydrogen bond acceptors to be 4.\nAssistant: Then, I advise the molecule with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O."}", "/scratch/micpie/export/rdkit_features/valid_111-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4 is 1."} {"text":"The number of hydrogen bond donors of the molecule with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC is 0."}", "/scratch/micpie/export/rdkit_features/train_32-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES 
C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C?\nAnswer: 63.82"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2?\nAnswer: 51.73"}", "/scratch/micpie/export/rdkit_features/test_16-11.jsonl": "{"text":"User: I want to make a molecule with a chemical formula of C18H35N2O2+.\nAssistant: Nice, do you have some additional constraints that help me narrow down the search?\nUser: Yes, I want the number of hydrogen bond donors to be 2, the count of hydrogen bond acceptors to be 2.\nAssistant: Then, I propose the molecule with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C."} {"text":"User: I want to analyze a compound with a formula of C17H19Cl2N3O3S.\nAssistant: That is a very interesting question, do you have some additional conditions that help me narrow down the search?\nUser: Yea, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 5.\nAssistant: In that scenario, I propose the compound with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N."}", "/scratch/micpie/export/rdkit_features/test_3-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES Cc1cc(ccc1n2c(nnn2)C(C)C)NC(=O)N[C@H]3CCOc4c3ccc(c4)OC?\nAnswer: 2"} {"text":"Question: What is the count of acid groups of the molecule with SMILES CC(C)Oc1cccc(c1)CNC(=O)C(=O)Nc2ccc3c(c2)[nH]c(n3)C(F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_24-4.jsonl": "{"text":"The number of rings of the compound with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl is 4."} {"text":"The ring count of the molecule with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C is 3."}", "/scratch/micpie/export/rdkit_features/valid_14-7.jsonl": "{"text":"The acid group count of the molecule with SMILES C[C@@H]1CCC[C@@H](N(C1)C(=O)Nc2cnn(c2)[C@H](C)c3ccccc3)C is 0."} {"text":"The count of acid 
groups of the chemical with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCC2(CC2)CC#N)O is 0."}", "/scratch/micpie/export/rdkit_features/test_117-8.jsonl": "{"text":"The count of basic groups of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc5cc(ccc5c4)Br is 0."} {"text":"The count of basic groups of the chemical with SMILES C[C@H](C(C)(C)OC)[NH2+]CCNc1ccccc1 is 1."}", "/scratch/micpie/export/rdkit_features/test_116-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_26-8.jsonl": "{"text":"The basic group count of the molecule with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@H]2CCCOC2)C(F)(F)F is 0."} {"text":"The basic group count of the chemical with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@H]2CCCn3c2[nH+]c(c3)C is 1."}", "/scratch/micpie/export/rdkit_features/train_7-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the molecule with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl?\nAnswer: 6"} {"text":"Question: What is the number of rotatable bonds of the molecule with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F?\nAnswer: 10"}", "/scratch/micpie/export/rdkit_features/test_26-4.jsonl": "{"text":"The number of rings of the chemical with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3 is 3."} {"text":"The ring count of the molecule with SMILES COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br is 2."}", "/scratch/micpie/export/rdkit_features/train_13-16.jsonl": "{"text":"Question: What is the number of rings of the compound with SMILES CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 2"} {"text":"Question: What is the ring count of the molecule with SMILES 
C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_19-15.jsonl": "{"text":"Question: What is the heteroatom count of the chemical with SMILES C[C@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC?\nAnswer: 7"} {"text":"Question: What is the count of heteroatoms of the compound with SMILES Cc1ccc(cc1)CN2CC[C@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_0-3.jsonl": "{"text":"The number of heteroatoms of the chemical with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1 is 9."} {"text":"The number of heteroatoms of the compound with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4 is 5."}", "/scratch/micpie/export/rdkit_features/test_107-17.jsonl": "{"text":"Question: What is the rotatable bond count of the chemical with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC?\nAnswer: 8"} {"text":"Question: What is the number of rotatable bonds of the molecule with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_1-5.jsonl": "{"text":"The number of rotatable bonds of the compound with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5 is 4."} {"text":"The count of rotatable bonds of the compound with SMILES Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F is 4."}", "/scratch/micpie/export/rdkit_features/train_118-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the compound with SMILES Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F is 1."} {"text":"The number of hydrogen bond acceptor sites of the compound with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3 is 6."}", "/scratch/micpie/export/rdkit_features/valid_28-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br is 0."} {"text":"The number of acid groups of the chemical with SMILES 
C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C is 0."}", "/scratch/micpie/export/rdkit_features/train_5-7.jsonl": "{"text":"The count of acid groups of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@@H]([C@H](CC4)O)C is 0."} {"text":"The acid group count of the chemical with SMILES C[C@H]1C[C@H](CCN1C(=O)Nc2cc3c([nH]c2=O)CCCC3)NC(=O)OC(C)(C)C is 0."}", "/scratch/micpie/export/rdkit_features/train_105-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N is 6."} {"text":"The count of rotatable bonds of the compound with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O is 4."}", "/scratch/micpie/export/rdkit_features/valid_10-6.jsonl": "{"text":"The count of aromatic bonds of the molecule with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C is 10."} {"text":"The aromatic bond count of the compound with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3 is 6."}", "/scratch/micpie/export/rdkit_features/valid_3-7.jsonl": "{"text":"The acid group count of the chemical with SMILES Cc1c(sc(n1)c2ccccc2)NC(=O)N[C@@H]3CCN(C3)c4ccccc4C(=O)N is 0."} {"text":"The acid group count of the chemical with SMILES C[C@@H]1C[C@H]1c2ccc(o2)CN(C3CC3)C(=O)C(=O)Nc4ccc(nc4Cl)OC is 0."}", "/scratch/micpie/export/rdkit_features/test_28-11.jsonl": "{"text":"User: I want to design a molecule with a molecular formula of C17H19N4OS2+.\nAssistant: Do you have some additional conditions I should take into account?\nUser: I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 6.\nAssistant: In that scenario, I propose the molecule with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC."} {"text":"User: I want to create a compound with a molecular formula of C20H25N5O.\nAssistant: Interesting, do you have some additional conditions I should take into account?\nUser: I want the number of hydrogen bond donor sites to be 3, the 
number of hydrogen bond acceptors to be 3.\nAssistant: Then, I suggest the compound with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C."}", "/scratch/micpie/export/rdkit_features/train_33-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2 is 0.62."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br is 1.03."}", "/scratch/micpie/export/rdkit_features/train_31-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the chemical with SMILES Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F is 3."} {"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4 is 4."}", "/scratch/micpie/export/rdkit_features/test_27-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F?\nAnswer: 56.26"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F?\nAnswer: 57.46"}", "/scratch/micpie/export/rdkit_features/test_30-11.jsonl": "{"text":"User: I want to design a chemical with a chemical formula of C21H25N3O2.\nAssistant: Nice, do you have some additional requirements that I should consider?\nUser: Yeah, I want the number of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 3.\nAssistant: I recommend the chemical with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3."} {"text":"User: I want to make a compound with a chemical formula of C24H27FN2O3.\nAssistant: Interesting, do you have some additional limitations I should take into account?\nUser: Indeed, I want the count of hydrogen bond donors to be 0, the number of hydrogen bond acceptors to be 3.\nAssistant: In that situation, I 
advise the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC."}", "/scratch/micpie/export/rdkit_features/train_117-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(cc4)Oc5ccc(cn5)Cl?\nAnswer: C27H26ClN3O4"} {"text":"Question: What is the chemical formula of the chemical with SMILES C[C@@H](C(C)(C)OC)[NH2+]CCNc1ccccc1?\nAnswer: C14H25N2O+"}", "/scratch/micpie/export/rdkit_features/valid_33-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6 is 6."} {"text":"The number of heteroatoms of the molecule with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br is 8."}", "/scratch/micpie/export/rdkit_features/valid_7-11.jsonl": "{"text":"User: I want to make a compound with a chemical formula of C24H28N2O4.\nAssistant: That's interesting, do you have some additional requirements I should consider?\nUser: Yep, I want the number of hydrogen bond donor sites to be 2, the number of hydrogen bond acceptor sites to be 4.\nAssistant: I advise the compound with SMILES COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC."} {"text":"User: I want to design a chemical with a chemical formula of C23H23N3O4.\nAssistant: That is a very interesting question, do you have some additional constraints I should take into account?\nUser: Indeed, I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 5.\nAssistant: In that scenario, I advise the chemical with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC."}", "/scratch/micpie/export/rdkit_features/test_109-23.jsonl": "{"text":"User: I want to design a chemical with a number of hydrogen bond donors of 3, a number of hydrogen bond acceptor sites of 5 and a LogP value computed using the Wildman-Crippen method of 2.40.\nAssistant: That's interesting, do you have some additional 
conditions that help me narrow down the search?\nUser: Yes, I want the formula to be C22H26ClN3O4.\nAssistant: I recommend the chemical with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl."} {"text":"User: I want to make a chemical with a number of hydrogen bond donors of 3, a count of hydrogen bond acceptors of 5 and a LogP value computed using the Wildman-Crippen method of 1.83.\nAssistant: Interesting, do you have some additional requirements that help me narrow down the search?\nUser: Yea, I want the chemical formula to be C16H13BrN4O4S.\nAssistant: In that scenario, I recommend the chemical with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N."}", "/scratch/micpie/export/rdkit_features/valid_116-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 5."} {"text":"The rotatable bond count of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl is 5."}", "/scratch/micpie/export/rdkit_features/train_103-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1 is 44.28."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 47.35."}", "/scratch/micpie/export/rdkit_features/valid_101-12.jsonl": "{"text":"Question: What is the formula of the chemical with SMILES COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3?\nAnswer: C15H13ClN4O2"} {"text":"Question: What is the chemical formula of the molecule with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1?\nAnswer: C25H23F2N3O2"}", "/scratch/micpie/export/rdkit_features/test_102-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3 is 0."} {"text":"The count of hydrogen bond donors of the compound with SMILES 
Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1 is 2."}", "/scratch/micpie/export/rdkit_features/valid_7-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC is 16."} {"text":"The count of aromatic bonds of the chemical with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC is 18."}", "/scratch/micpie/export/rdkit_features/valid_26-18.jsonl": "{"text":"Question: What is the aromatic bond count of the chemical with SMILES CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2?\nAnswer: 6"} {"text":"Question: What is the number of aromatic bonds of the compound with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC?\nAnswer: 17"}", "/scratch/micpie/export/rdkit_features/test_4-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the compound with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4?\nAnswer: 3"} {"text":"Question: What is the count of rotatable bonds of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_5-16.jsonl": "{"text":"Question: What is the ring count of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5?\nAnswer: 5"} {"text":"Question: What is the count of rings of the molecule with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_109-3.jsonl": "{"text":"The count of heteroatoms of the chemical with SMILES C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl is 8."} {"text":"The number of heteroatoms of the compound with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F is 9."}", "/scratch/micpie/export/rdkit_features/train_4-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES 
C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_9-20.jsonl": "{"text":"Question: What is the number of basic groups of the molecule with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C?\nAnswer: 1"} {"text":"Question: What is the count of basic groups of the molecule with SMILES C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/test_103-7.jsonl": "{"text":"The count of acid groups of the compound with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1 is 0."} {"text":"The count of acid groups of the molecule with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 0."}", "/scratch/micpie/export/rdkit_features/test_107-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC is 6."} {"text":"The number of hydrogen bond acceptors of the molecule with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC is 7."}", "/scratch/micpie/export/rdkit_features/train_114-16.jsonl": "{"text":"Question: What is the count of rings of the chemical with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C?\nAnswer: 3"} {"text":"Question: What is the ring count of the molecule with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_111-7.jsonl": "{"text":"The acid group count of the compound with SMILES CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C is 0."} {"text":"The number of acid groups of the compound with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC is 0."}", "/scratch/micpie/export/rdkit_features/train_10-0.jsonl": 
"{"text":"The chemical formula of the chemical with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4 is C23H42N5O+."} {"text":"The formula of the compound with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl is C20H28ClNO2."}", "/scratch/micpie/export/rdkit_features/test_32-11.jsonl": "{"text":"User: I want to create a molecule with a formula of C19H30N5O2S+.\nAssistant: That is a very interesting question, do you have some additional requirements I should take into account?\nUser: Yep, I want the number of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 6.\nAssistant: In that scenario, I suggest the molecule with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3."} {"text":"User: I want to analyze a compound with a molecular formula of C21H24N4O3.\nAssistant: Cool, do you have some additional limitations that help me narrow down the search?\nUser: Yea, I want the count of hydrogen bond donors to be 3, the count of hydrogen bond acceptors to be 4.\nAssistant: In that case, I recommend the compound with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3."}", "/scratch/micpie/export/rdkit_features/test_102-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES COc1ccc2c3c(c4cc(OC)c(OC)cc4c2c1)CN1CCC[C@H]1C3 is 4.54."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES Cc1oc(-c2ccccc2)nc1CCOc1ccc(C[C@@H](NC(=O)\/C=C\/c2ccco2)C(=O)O)cc1 is 4.69."}", "/scratch/micpie/export/rdkit_features/test_27-8.jsonl": "{"text":"The basic group count of the compound with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F is 1."} {"text":"The count of basic groups of the molecule with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F is 0."}", "/scratch/micpie/export/rdkit_features/valid_19-4.jsonl": "{"text":"The count of rings of the compound with SMILES 
C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC is 2."} {"text":"The count of rings of the compound with SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F is 3."}", "/scratch/micpie/export/rdkit_features/valid_1-7.jsonl": "{"text":"The number of acid groups of the chemical with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5 is 0."} {"text":"The number of acid groups of the chemical with SMILES Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F is 0."}", "/scratch/micpie/export/rdkit_features/test_9-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C is 5."} {"text":"The number of rotatable bonds of the compound with SMILES C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4 is 8."}", "/scratch/micpie/export/rdkit_features/test_32-8.jsonl": "{"text":"The count of basic groups of the compound with SMILES CS(=O)(=O)C[C@@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3 is 1."} {"text":"The count of basic groups of the compound with SMILES C[C@@H](C(=O)N(C)Cc1ccccc1)Nc2ccc(cc2)[C@@H]3CC(=O)NC(=O)N3 is 0."}", "/scratch/micpie/export/rdkit_features/valid_8-16.jsonl": "{"text":"Question: What is the count of rings of the chemical with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C?\nAnswer: 2"} {"text":"Question: What is the count of rings of the chemical with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_18-22.jsonl": "{"text":"User: I want to create a compound with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 9.\nAssistant: Do you have some additional requirements that I should consider?\nUser: I want the number of heteroatoms to be 13.\nAssistant: Then, I suggest the compound with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O."} {"text":"User: I want to design a chemical with a number of hydrogen 
bond donor sites of 1 and a count of hydrogen bond acceptors of 4.\nAssistant: That is a very interesting question, do you have some additional requirements that help me narrow down the search?\nUser: Yeah, I want the count of heteroatoms to be 7.\nAssistant: In that case, I suggest the chemical with SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C."}", "/scratch/micpie/export/rdkit_features/train_105-19.jsonl": "{"text":"Question: What is the count of acid groups of the chemical with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the chemical with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_111-11.jsonl": "{"text":"User: I want to synthesize a compound with a formula of C24H35N8+.\nAssistant: Interesting, do you have some additional limitations I should take into account?\nUser: Yea, I want the number of hydrogen bond donor sites to be 1, the count of hydrogen bond acceptors to be 7.\nAssistant: I suggest the compound with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4."} {"text":"User: I want to create a compound with a formula of C22H29N7O3.\nAssistant: That's interesting, do you have some additional requirements that I should consider?\nUser: Yep, I want the count of hydrogen bond donors to be 0, the number of hydrogen bond acceptors to be 10.\nAssistant: In that scenario, I suggest the compound with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC."}", "/scratch/micpie/export/rdkit_features/train_32-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C?\nAnswer: 5"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2?\nAnswer: 4"}", 
"/scratch/micpie/export/rdkit_features/valid_33-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the molecule with SMILES c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6 is 0."} {"text":"The count of hydrogen bond donors of the chemical with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br is 1."}", "/scratch/micpie/export/rdkit_features/test_27-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the chemical with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F?\nAnswer: 6"} {"text":"Question: What is the number of aromatic bonds of the compound with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_10-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the compound with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4 is 5."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F is 1."}", "/scratch/micpie/export/rdkit_features/valid_6-19.jsonl": "{"text":"Question: What is the count of acid groups of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+]?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the chemical with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_11-16.jsonl": "{"text":"Question: What is the count of rings of the chemical with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2?\nAnswer: 2"} {"text":"Question: What is the ring count of the molecule with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_113-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N is 0.67."} {"text":"The Wildman-Crippen 
LogP value of the compound with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 5.61."}", "/scratch/micpie/export/rdkit_features/valid_100-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F is 2."} {"text":"The number of hydrogen bond donors of the molecule with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4 is 1."}", "/scratch/micpie/export/rdkit_features/train_103-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1?\nAnswer: 44.28"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 47.35"}", "/scratch/micpie/export/rdkit_features/valid_116-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES CN(C)c1ccc(c2c1cccc2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 79.60"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(o4)c5ccc(cc5)Cl?\nAnswer: 70.02"}", "/scratch/micpie/export/rdkit_features/valid_101-22.jsonl": "{"text":"User: I want to synthesize a compound with a number of hydrogen bond donors of 2 and a number of hydrogen bond acceptor sites of 4.\nAssistant: That's interesting, do you have some additional conditions that I should consider?\nUser: Yeah, I want the number of heteroatoms to be 7.\nAssistant: Then, I propose the compound with SMILES COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3."} {"text":"User: I want to synthesize a molecule with a number of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 3.\nAssistant: Do you have some additional constraints I should take into account?\nUser: Indeed, I want the count of heteroatoms to be 7.\nAssistant: Then, I suggest the molecule with SMILES 
O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1."}", "/scratch/micpie/export/rdkit_features/valid_111-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4 is 74.38."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC is 68.16."}", "/scratch/micpie/export/rdkit_features/test_109-16.jsonl": "{"text":"Question: What is the ring count of the chemical with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl?\nAnswer: 3"} {"text":"Question: What is the ring count of the chemical with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_24-16.jsonl": "{"text":"Question: What is the number of rings of the molecule with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl?\nAnswer: 4"} {"text":"Question: What is the number of rings of the chemical with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_108-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O?\nAnswer: C17H20N4O6S2"} {"text":"Question: What is the chemical formula of the molecule with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C?\nAnswer: C23H34N4O4"}", "/scratch/micpie/export/rdkit_features/test_33-0.jsonl": "{"text":"The formula of the compound with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O is C18H24F3N3O3."} {"text":"The chemical formula of the molecule with SMILES Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br is C14H13BrN4O4."}", "/scratch/micpie/export/rdkit_features/valid_26-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES 
CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_111-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4?\nAnswer: 4"} {"text":"Question: What is the count of rings of the molecule with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_102-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1?\nAnswer: 60.50"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2?\nAnswer: 144.96"}", "/scratch/micpie/export/rdkit_features/train_114-20.jsonl": "{"text":"Question: What is the count of basic groups of the molecule with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C?\nAnswer: 0"} {"text":"Question: What is the basic group count of the compound with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/valid_28-5.jsonl": "{"text":"The rotatable bond count of the compound with SMILES c1ccc(c(c1)C2(CC2)CNC(=O)N[C@H]3CC[C@H](C3)F)Br is 4."} {"text":"The number of rotatable bonds of the compound with SMILES C[C@H](c1ccc(cc1)c2ccccc2)N(C)C(=O)C[C@@H]3CCC(=O)N3C is 5."}", "/scratch/micpie/export/rdkit_features/test_18-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O is 65.29."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES 
Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C is 66.04."}", "/scratch/micpie/export/rdkit_features/train_14-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C?\nAnswer: C20H30N4O"} {"text":"Question: What is the chemical formula of the molecule with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O?\nAnswer: C14H14BrNO3"}", "/scratch/micpie/export/rdkit_features/train_114-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the compound with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C is 6.22."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F is 3.73."}", "/scratch/micpie/export/rdkit_features/test_0-8.jsonl": "{"text":"The count of basic groups of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1 is 0."} {"text":"The count of basic groups of the molecule with SMILES Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC is 0."}", "/scratch/micpie/export/rdkit_features/test_30-16.jsonl": "{"text":"Question: What is the number of rings of the compound with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3?\nAnswer: 3"} {"text":"Question: What is the count of rings of the chemical with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_25-23.jsonl": "{"text":"User: I want to create a compound with a count of hydrogen bond donors of 0, a number of hydrogen bond acceptors of 8 and a Wildman-Crippen LogP value of 2.45.\nAssistant: Nice, do you have some additional requirements I should consider?\nUser: Yes, I want the chemical formula to be C20H29N5O3.\nAssistant: In that scenario, I suggest the compound with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4."} {"text":"User: I want to analyze a compound with a number of hydrogen bond donor sites of 1, a 
number of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value computed using RDKit of 3.98.\nAssistant: That's interesting, do you have some additional requirements I should consider?\nUser: I want the chemical formula to be C20H25N3OS.\nAssistant: In that case, I suggest the compound with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3."}", "/scratch/micpie/export/rdkit_features/train_9-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the compound with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C is 3."} {"text":"The number of hydrogen bond donors of the chemical with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F is 0."}", "/scratch/micpie/export/rdkit_features/test_4-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES COc1ccc(c(n1)Cl)NC(=O)C(=O)N2Cc3ccccc3[C@H](C2)c4ccccc4 is 1."} {"text":"The number of hydrogen bond donor sites of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)n2c3c(c(c2)C[NH+]([C@H]4CC[C@@H](CC4)O)C)cccc3 is 2."}", "/scratch/micpie/export/rdkit_features/valid_27-5.jsonl": "{"text":"The number of rotatable bonds of the chemical with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F is 4."} {"text":"The number of rotatable bonds of the compound with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C is 5."}", "/scratch/micpie/export/rdkit_features/train_18-4.jsonl": "{"text":"The count of rings of the compound with SMILES C[C@@H](CS(=O)(=O)C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O is 4."} {"text":"The count of rings of the chemical with SMILES C[C@@H](CNS(=O)(=O)c1ccc(c(c1)Cl)NC(=O)C)Oc2ccc(cc2)Cl is 2."}", "/scratch/micpie/export/rdkit_features/train_15-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC?\nAnswer: 2"} {"text":"Question: What is the count of rings of the compound with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C?\nAnswer: 2"}", 
"/scratch/micpie/export/rdkit_features/test_108-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES Cc1ccc(c(c1)C)n2c(nnc2SCc3cc(=O)n4c(n3)nc([nH]4)N)c5ccccn5 is 27."} {"text":"The aromatic bond count of the molecule with SMILES CCc1cc(=O)[nH]c(n1)c2cccc(c2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3 is 12."}", "/scratch/micpie/export/rdkit_features/test_6-11.jsonl": "{"text":"User: I want to design a chemical with a chemical formula of C22H34N4O4.\nAssistant: That is a very interesting question, do you have some additional constraints that I should consider?\nUser: Yea, I want the number of hydrogen bond donor sites to be 3, the number of hydrogen bond acceptors to be 4.\nAssistant: In that case, I advise the chemical with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3."} {"text":"User: I want to create a molecule with a chemical formula of C25H30N4O2.\nAssistant: Interesting, do you have some additional requirements?\nUser: Indeed, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 5.\nAssistant: I propose the molecule with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C."}", "/scratch/micpie/export/rdkit_features/train_108-3.jsonl": "{"text":"The heteroatom count of the chemical with SMILES CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2 is 12."} {"text":"The count of heteroatoms of the chemical with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3 is 8."}", "/scratch/micpie/export/rdkit_features/test_5-19.jsonl": "{"text":"Question: What is the acid group count of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_113-19.jsonl": "{"text":"Question: What is the number of acid groups of the compound with 
SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 0"} {"text":"Question: What is the acid group count of the chemical with SMILES Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_18-5.jsonl": "{"text":"The rotatable bond count of the chemical with SMILES c1cnc(nc1)CC(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 6."} {"text":"The rotatable bond count of the chemical with SMILES CCN(Cc1ccc(c(c1)Br)Br)C[C@@H](C(F)(F)F)O is 5."}", "/scratch/micpie/export/rdkit_features/test_8-16.jsonl": "{"text":"Question: What is the number of rings of the molecule with SMILES C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C?\nAnswer: 3"} {"text":"Question: What is the number of rings of the chemical with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_32-19.jsonl": "{"text":"Question: What is the acid group count of the compound with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3?\nAnswer: 2"} {"text":"Question: What is the acid group count of the compound with SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_10-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4?\nAnswer: 74.79"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl?\nAnswer: 58.75"}", "/scratch/micpie/export/rdkit_features/valid_110-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES 
CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_104-23.jsonl": "{"text":"User: I want to design a compound with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptor sites of 7 and a Wildman-Crippen LogP value computed using RDKit of -1.05.\nAssistant: Nice, do you have some additional constraints that I should consider?\nUser: I want the molecular formula to be C14H21N5O4.\nAssistant: Then, I propose the compound with SMILES Cn1c(cnn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3."} {"text":"User: I want to make a chemical with a number of hydrogen bond donor sites of 0, a number of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value computed using RDKit of 3.68.\nAssistant: Nice, do you have some additional conditions I should take into account?\nUser: Yea, I want the molecular formula to be C19H16ClN2O2S-.\nAssistant: In that situation, I suggest the chemical with SMILES c1cc(ccc1SC2CCN(CC2)C(=O)c3ccc(c(c3)[O-])C#N)Cl."}", "/scratch/micpie/export/rdkit_features/train_112-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES CN1CCN(CC1=O)c2nnc(n2CC3(CCCC3)OC)c4ccc(c(c4)OC)OC is 0."} {"text":"The acid group count of the compound with SMILES c1cc(cnc1)[C@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 0."}", "/scratch/micpie/export/rdkit_features/train_117-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(cc4)Oc5ccc(cn5)Cl is 6."} {"text":"The count of rotatable bonds of the molecule with SMILES C[C@@H](C(C)(C)OC)[NH2+]CCNc1ccccc1 is 7."}", "/scratch/micpie/export/rdkit_features/valid_118-23.jsonl": "{"text":"User: I want to analyze a chemical with a count of hydrogen bond donors of 2, a count of hydrogen bond acceptors of 1 and a Wildman-Crippen LogP value computed using RDKit of 1.92.\nAssistant: Cool, do you have some additional conditions I should take into 
account?\nUser: Indeed, I want the formula to be C14H23FNO+.\nAssistant: Then, I propose the chemical with SMILES CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O."} {"text":"User: I want to make a chemical with a count of hydrogen bond donors of 3, a count of hydrogen bond acceptors of 5 and a LogP value computed using the Wildman-Crippen method of 1.99.\nAssistant: Nice, do you have some additional conditions that help me narrow down the search?\nUser: Yep, I want the molecular formula to be C15H20FN3O4.\nAssistant: In that case, I recommend the chemical with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO."}", "/scratch/micpie/export/rdkit_features/valid_9-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2 is 63.11."} {"text":"The sum of atomic polarizabilities of the molecule with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4 is 73.59."}", "/scratch/micpie/export/rdkit_features/test_26-5.jsonl": "{"text":"The count of rotatable bonds of the compound with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3 is 6."} {"text":"The count of rotatable bonds of the molecule with SMILES COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br is 4."}", "/scratch/micpie/export/rdkit_features/valid_6-1.jsonl": "{"text":"The number of hydrogen bond donor sites of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+] is 3."} {"text":"The number of hydrogen bond donor sites of the chemical with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F is 0."}", "/scratch/micpie/export/rdkit_features/valid_32-7.jsonl": "{"text":"The number of acid groups of the molecule with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3 is 2."} {"text":"The count of acid groups of the compound with SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O is 0."}", "/scratch/micpie/export/rdkit_features/test_9-11.jsonl": "{"text":"User: I want 
to make a molecule with a molecular formula of C23H36N3O3+.\nAssistant: Cool, do you have some additional constraints?\nUser: Yeah, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 3.\nAssistant: Then, I suggest the molecule with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C."} {"text":"User: I want to create a chemical with a chemical formula of C23H40N7+.\nAssistant: Do you have some additional conditions I should take into account?\nUser: Yea, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 6.\nAssistant: Then, I recommend the chemical with SMILES C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4."}", "/scratch/micpie/export/rdkit_features/test_25-8.jsonl": "{"text":"The count of basic groups of the compound with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4 is 1."} {"text":"The count of basic groups of the compound with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3 is 0."}", "/scratch/micpie/export/rdkit_features/valid_101-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3 is 2.80."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1 is 5.05."}", "/scratch/micpie/export/rdkit_features/test_20-16.jsonl": "{"text":"Question: What is the count of rings of the chemical with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br?\nAnswer: 4"} {"text":"Question: What is the number of rings of the molecule with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-]?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_33-4.jsonl": "{"text":"The number of rings of the compound with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O is 2."} {"text":"The number of rings of the 
molecule with SMILES Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br is 3."}", "/scratch/micpie/export/rdkit_features/test_117-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc5cc(ccc5c4)Br?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES C[C@H](C(C)(C)OC)[NH2+]CCNc1ccccc1?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_7-16.jsonl": "{"text":"Question: What is the ring count of the molecule with SMILES COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC?\nAnswer: 4"} {"text":"Question: What is the count of rings of the chemical with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_113-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 5"} {"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/test_20-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the molecule with SMILES C[C@@]1(CCO[C@H]1C2CC2)CNC(=O)N[C@@H]3C[C@H]3c4ccc(c(c4)F)Br is 56.85."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES COC(=O)[C@@H](c1ccc(cc1)Cl)NS(=O)(=O)Cc2ccc(cc2)C(=O)[O-] is 50.91."}", "/scratch/micpie/export/rdkit_features/train_109-11.jsonl": "{"text":"User: I want to make a molecule with a formula of C23H33N3O5.\nAssistant: That's interesting, do you have some additional limitations I should take into account?\nUser: I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 5.\nAssistant: I advise the molecule with SMILES Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3."} {"text":"User: I want 
to analyze a chemical with a molecular formula of C16H18N6O3S3.\nAssistant: Nice, do you have some additional constraints I should consider?\nUser: Indeed, I want the number of hydrogen bond donors to be 3, the number of hydrogen bond acceptor sites to be 8.\nAssistant: In that case, I recommend the chemical with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C."}", "/scratch/micpie/export/rdkit_features/test_113-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 64.11"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 62.40"}", "/scratch/micpie/export/rdkit_features/test_31-19.jsonl": "{"text":"Question: What is the acid group count of the molecule with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F?\nAnswer: 0"} {"text":"Question: What is the acid group count of the compound with SMILES C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_25-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_113-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 0.24."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 6.22."}", "/scratch/micpie/export/rdkit_features/test_0-14.jsonl": "{"text":"Question: What is the 
count of hydrogen bond acceptors of the chemical with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1?\nAnswer: 9"} {"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_110-4.jsonl": "{"text":"The number of rings of the molecule with SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O is 4."} {"text":"The count of rings of the chemical with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C is 4."}", "/scratch/micpie/export/rdkit_features/train_28-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl is 11."} {"text":"The aromatic bond count of the compound with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F is 6."}", "/scratch/micpie/export/rdkit_features/valid_0-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1?\nAnswer: 5"} {"text":"Question: What is the number of rotatable bonds of the molecule with SMILES COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/valid_113-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the compound with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 6"} {"text":"Question: What is the number of aromatic bonds of the chemical with SMILES Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 18"}", "/scratch/micpie/export/rdkit_features/train_29-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the chemical with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F?\nAnswer: 6"} {"text":"Question: What is the heteroatom count of the compound with SMILES 
CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_115-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES COc1cc(ccc1OCc2ccccc2)C(=O)Nc3cc(ccc3NC(=O)c4ccccn4)F?\nAnswer: 8"} {"text":"Question: What is the number of rotatable bonds of the compound with SMILES Cc1ccc(cc1)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_107-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES Cn1c2ccc(cc2nc1C3CC3)NC(=O)C(=O)N[C@@H](Cc4ccccc4)C(=O)N(C)C?\nAnswer: 5"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES CCOc1cc2c(cc1NC(=O)C(=O)NCc3cccc(c3)NC(=O)COC)O[C@@H](C2)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_27-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the molecule with SMILES CCCCN(C)c1ccc(cc1)C(=O)NC[C@@H]2CCCn3c2[nH+]c(c3)C is 3."} {"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES CCN(Cc1ccc(cc1)OC)C(=O)c2ccc3c(c2)[nH]c(=O)n3C4CC4 is 4."}", "/scratch/micpie/export/rdkit_features/valid_23-11.jsonl": "{"text":"User: I want to make a molecule with a chemical formula of C19H34N5OS+.\nAssistant: Do you have some additional requirements I should take into account?\nUser: Yeah, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 5.\nAssistant: In that situation, I propose the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4."} {"text":"User: I want to make a compound with a chemical formula of C19H24N7O2-.\nAssistant: Do you have some additional limitations I should consider?\nUser: I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 6.\nAssistant: I suggest the compound with SMILES 
c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4."}", "/scratch/micpie/export/rdkit_features/valid_112-11.jsonl": "{"text":"User: I want to synthesize a molecule with a chemical formula of C21H26N6O3S.\nAssistant: Do you have some additional constraints that help me narrow down the search?\nUser: Yea, I want the number of hydrogen bond donor sites to be 0, the count of hydrogen bond acceptors to be 9.\nAssistant: Then, I the molecule with SMILES Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC."} {"text":"User: I want to synthesize a chemical with a chemical formula of C22H27N6O2S+.\nAssistant: Cool, do you have some additional limitations that I should consider?\nUser: I want the count of hydrogen bond donors to be 3, the number of hydrogen bond acceptors to be 5.\nAssistant: I propose the chemical with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N."}", "/scratch/micpie/export/rdkit_features/train_2-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES Cn1c2ccc(cc2nc1Cc3cccs3)c4nc(no4)[C@@H]5C[NH+](CCO5)C6CC6?\nAnswer: 7"} {"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES COc1ccc2c(c1)CC[C@@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_101-23.jsonl": "{"text":"User: I want to make a compound with a number of hydrogen bond donor sites of 1, a number of hydrogen bond acceptors of 5 and a LogP value computed using the Wildman-Crippen method of 2.77.\nAssistant: Nice, do you have some additional limitations that I should consider?\nUser: I want the chemical formula to be C18H18N4O2.\nAssistant: In that situation, I recommend the compound with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3."} {"text":"User: I want to create a compound with a count of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 7 and a LogP value computed using the Wildman-Crippen method of 
4.44.\nAssistant: That is a very interesting question, do you have some additional I should take into account?\nUser: I want the chemical formula to be C25H23N3O4.\nAssistant: In that situation, I advise the compound with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1."}", "/scratch/micpie/export/rdkit_features/test_9-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C is 2.94."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4 is 2.40."}", "/scratch/micpie/export/rdkit_features/test_10-3.jsonl": "{"text":"The count of heteroatoms of the compound with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4 is 6."} {"text":"The heteroatom count of the molecule with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F is 6."}", "/scratch/micpie/export/rdkit_features/train_105-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the chemical with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N?\nAnswer: 5"} {"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_13-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the molecule with SMILES CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 4"} {"text":"Question: What is the number of heteroatoms of the chemical with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_106-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the molecule with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O?\nAnswer: 3"} {"text":"Question: What is the rotatable bond count of the chemical with SMILES 
CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_116-11.jsonl": "{"text":"User: I want to make a compound with a molecular formula of C29H26FN3O2S.\nAssistant: Cool, do you have some additional constraints I should take into account?\nUser: Yeah, I want the count of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 4.\nAssistant: Then, I recommend the compound with SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1."} {"text":"User: I want to analyze a chemical with a molecular formula of C24H23ClN2O3S.\nAssistant: That's interesting, do you have some additional conditions I should take into account?\nUser: Indeed, I want the number of hydrogen bond donor sites to be 0, the number of hydrogen bond acceptor sites to be 4.\nAssistant: I the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl."}", "/scratch/micpie/export/rdkit_features/train_13-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC is 2."} {"text":"The count of hydrogen bond acceptors of the molecule with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C is 3."}", "/scratch/micpie/export/rdkit_features/train_33-8.jsonl": "{"text":"The number of basic groups of the compound with SMILES COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2 is 1."} {"text":"The number of basic groups of the chemical with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br is 1."}", "/scratch/micpie/export/rdkit_features/test_114-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the chemical with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C?\nAnswer: 18"} {"text":"Question: What is the number of aromatic bonds of the compound with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5?\nAnswer: 26"}", 
"/scratch/micpie/export/rdkit_features/valid_24-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the molecule with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl?\nAnswer: 10"} {"text":"Question: What is the heteroatom count of the molecule with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/valid_16-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C is 0."} {"text":"The number of aromatic bonds of the molecule with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3 is 0."}", "/scratch/micpie/export/rdkit_features/valid_24-7.jsonl": "{"text":"The count of acid groups of the molecule with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl is 3."} {"text":"The number of acid groups of the chemical with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C is 0."}", "/scratch/micpie/export/rdkit_features/test_19-19.jsonl": "{"text":"Question: What is the count of acid groups of the compound with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the compound with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_120-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES c1ccc(cc1)CCN(CC(=O)Nc2ccc(cc2)Cl)S(=O)(=O)c3ccc4c(c3)OCCO4?\nAnswer: C24H23ClN2O5S"} {"text":"Question: What is the chemical formula of the compound with SMILES Cc1c(cnn1c2ccccc2)N[C@H](C)C(=O)Nc3cccc(c3Cl)Cl?\nAnswer: C19H18Cl2N4O"}", "/scratch/micpie/export/rdkit_features/train_110-8.jsonl": "{"text":"The count of basic groups of the chemical with SMILES COC(=O)Cc1cc(=O)n([nH]1)c2ccc(cc2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O is 0."} {"text":"The number of basic groups of the molecule with SMILES 
CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@@H]4CN(CCO4)S(=O)(=O)C is 0."}", "/scratch/micpie/export/rdkit_features/train_24-8.jsonl": "{"text":"The number of basic groups of the chemical with SMILES c1cc(ccc1c2n[n-]nn2)C3(CC3)C(=O)Nc4ccc(cc4)n5ccc(=O)[nH]5 is 0."} {"text":"The count of basic groups of the compound with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@H]2c3cnn(c3)C is 0."}", "/scratch/micpie/export/rdkit_features/valid_24-11.jsonl": "{"text":"User: I want to make a compound with a chemical formula of C18H13ClN7O2-.\nAssistant: That's interesting, do you have some additional conditions that help me narrow down the search?\nUser: Indeed, I want the number of hydrogen bond donors to be 2, the count of hydrogen bond acceptors to be 6.\nAssistant: I advise the compound with SMILES c1cc(ccc1Cn2cc(ccc2=O)NC(=O)c3cc(c[nH]3)c4n[n-]nn4)Cl."} {"text":"User: I want to synthesize a molecule with a molecular formula of C21H39N6O+.\nAssistant: Interesting, do you have some additional requirements I should take into account?\nUser: Indeed, I want the number of hydrogen bond donor sites to be 1, the count of hydrogen bond acceptors to be 5.\nAssistant: Then, I recommend the molecule with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@@H](C3)C."}", "/scratch/micpie/export/rdkit_features/valid_13-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the chemical with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4?\nAnswer: 17"} {"text":"Question: What is the count of aromatic bonds of the compound with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C?\nAnswer: 22"}", "/scratch/micpie/export/rdkit_features/test_106-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the chemical with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O?\nAnswer: 44.25"} {"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC?\nAnswer: 66.93"}", 
"/scratch/micpie/export/rdkit_features/test_116-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1?\nAnswer: 7"} {"text":"Question: What is the ring count of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_109-7.jsonl": "{"text":"The number of acid groups of the compound with SMILES Cc1cc(ccc1OC2CCOCC2)NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N3CCCC3 is 0."} {"text":"The count of acid groups of the compound with SMILES CCNS(=O)(=O)c1cccc(c1)C(=O)Nc2nc(c(s2)c3n[nH]c(=S)n3C)C is 0."}", "/scratch/micpie/export/rdkit_features/valid_101-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES COc1cccc(c1)CC(=O)Nc2c3c(cc(n2)Cl)[nH]cn3 is 43.25."} {"text":"The total sum of atomic polarizabilities of the compound with SMILES O=C(CN(C(=O)c1ccc(-c2ccccn2)cc1)C1CCCC1)Nc1cc(F)cc(F)c1 is 65.35."}", "/scratch/micpie/export/rdkit_features/train_103-20.jsonl": "{"text":"Question: What is the number of basic groups of the compound with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1?\nAnswer: 0"} {"text":"Question: What is the basic group count of the compound with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_20-7.jsonl": "{"text":"The acid group count of the chemical with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3 is 0."} {"text":"The count of acid groups of the chemical with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-] is 1."}", "/scratch/micpie/export/rdkit_features/valid_0-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cocc3C(=O)c3cccs3)c2c1?\nAnswer: 8"} {"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES 
COc1ccc(cc1)C2(CC3(C2)CCC3)C(=O)NCCCOCC4CCOCC4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_5-4.jsonl": "{"text":"The count of rings of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C is 4."} {"text":"The count of rings of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+] is 3."}", "/scratch/micpie/export/rdkit_features/valid_11-11.jsonl": "{"text":"User: I want to design a compound with a chemical formula of C20H28ClNO2.\nAssistant: That's interesting, do you have some additional conditions I should take into account?\nUser: Yep, I want the number of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 2.\nAssistant: In that scenario, I recommend the compound with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl."} {"text":"User: I want to make a chemical with a formula of C17H21Cl2NO.\nAssistant: Interesting, do you have some additional limitations I should take into account?\nUser: Yes, I want the number of hydrogen bond donor sites to be 0, the number of hydrogen bond acceptors to be 1.\nAssistant: In that case, I suggest the chemical with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C."}", "/scratch/micpie/export/rdkit_features/valid_13-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4?\nAnswer: 54.95"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C?\nAnswer: 54.70"}", "/scratch/micpie/export/rdkit_features/valid_119-5.jsonl": "{"text":"The rotatable bond count of the chemical with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O is 5."} {"text":"The number of rotatable bonds of the molecule with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6 is 3."}", "/scratch/micpie/export/rdkit_features/train_114-1.jsonl": 
"{"text":"The number of hydrogen bond donors of the compound with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C is 3."} {"text":"The number of hydrogen bond donor sites of the chemical with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F is 2."}", "/scratch/micpie/export/rdkit_features/test_106-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES CC[C@H](C)[C@@H](C(=O)OC)Nc1ccc2c(c[nH]c2n1)N(=O)=O?\nAnswer: 6"} {"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_101-4.jsonl": "{"text":"The count of rings of the compound with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3 is 3."} {"text":"The count of rings of the molecule with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1 is 4."}", "/scratch/micpie/export/rdkit_features/test_18-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the chemical with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O?\nAnswer: 6"} {"text":"Question: What is the number of aromatic bonds of the chemical with SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/valid_15-20.jsonl": "{"text":"Question: What is the basic group count of the molecule with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the compound with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_1-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4 is 2.54."} {"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES 
c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl is 3.91."}", "/scratch/micpie/export/rdkit_features/train_15-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC is 1."} {"text":"The number of hydrogen bond donors of the compound with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C is 2."}", "/scratch/micpie/export/rdkit_features/test_6-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the molecule with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3?\nAnswer: 8"} {"text":"Question: What is the heteroatom count of the molecule with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_25-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C is 0."} {"text":"The number of acid groups of the molecule with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F is 0."}", "/scratch/micpie/export/rdkit_features/valid_2-5.jsonl": "{"text":"The count of rotatable bonds of the molecule with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4 is 7."} {"text":"The number of rotatable bonds of the molecule with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3 is 6."}", "/scratch/micpie/export/rdkit_features/train_23-18.jsonl": "{"text":"Question: What is the aromatic bond count of the chemical with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4?\nAnswer: 10"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/test_112-22.jsonl": "{"text":"User: I want to synthesize a compound with a number of hydrogen bond donors of 0 and a number of hydrogen bond acceptors of 9.\nAssistant: That is a very interesting question, do you have some additional limitations that I 
should consider?\nUser: Indeed, I want the heteroatom count to be 10.\nAssistant: In that situation, I propose the compound with SMILES C[C@@H](c1nc(cs1)Cn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(cc4)OC)OC."} {"text":"User: I want to make a molecule with a number of hydrogen bond donors of 3 and a number of hydrogen bond acceptor sites of 5.\nAssistant: Nice, do you have some additional constraints?\nUser: I want the count of heteroatoms to be 9.\nAssistant: I recommend the molecule with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N."}", "/scratch/micpie/export/rdkit_features/valid_9-11.jsonl": "{"text":"User: I want to create a chemical with a chemical formula of C21H31F3N2O2.\nAssistant: Interesting, do you have some additional limitations I should take into account?\nUser: I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 3.\nAssistant: Then, I propose the chemical with SMILES CC(C)[C@H](CNC(=O)CN1C[C@@](OC(C1)(C)C)(C)C(F)(F)F)c2ccccc2."} {"text":"User: I want to make a chemical with a formula of C23H39N5O2.\nAssistant: That's interesting, do you have some additional conditions?\nUser: I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 6.\nAssistant: I the chemical with SMILES C[C@@H]1CCCC[C@@H]1OCCn2c(nnc2N3CC[C@H](C3)CC(C)C)[C@@H]4CCC(=O)N4."}", "/scratch/micpie/export/rdkit_features/test_6-18.jsonl": "{"text":"Question: What is the aromatic bond count of the chemical with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3?\nAnswer: 6"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C?\nAnswer: 17"}", "/scratch/micpie/export/rdkit_features/train_15-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC?\nAnswer: 5"} {"text":"Question: What is the number 
of hydrogen bond acceptor sites of the molecule with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/train_116-8.jsonl": "{"text":"The basic group count of the molecule with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 0."} {"text":"The number of basic groups of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br is 0."}", "/scratch/micpie/export/rdkit_features/test_5-15.jsonl": "{"text":"Question: What is the heteroatom count of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CCC5(CC4)COC5?\nAnswer: 6"} {"text":"Question: What is the number of heteroatoms of the molecule with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@H]3CCC(CN3)(F)F?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/train_119-3.jsonl": "{"text":"The number of heteroatoms of the chemical with SMILES c1cc(ccc1NC(=O)c2cc(c(cc2N)F)N(=O)=O)OCCO is 9."} {"text":"The number of heteroatoms of the molecule with SMILES Cn1c(nnc1SCC(=O)NC2CCCCC2)CNC(=O)c3ccc(cc3Cl)Cl is 10."}", "/scratch/micpie/export/rdkit_features/train_1-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4 is 3."} {"text":"The number of hydrogen bond acceptors of the molecule with SMILES c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl is 3."}", "/scratch/micpie/export/rdkit_features/train_108-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2?\nAnswer: 56.70"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3?\nAnswer: 69.43"}", "/scratch/micpie/export/rdkit_features/test_101-23.jsonl": "{"text":"User: I want to make a compound with a number of hydrogen bond donors of 1, a number 
of hydrogen bond acceptor sites of 4 and a Wildman-Crippen LogP value computed using RDKit of 2.87.\nAssistant: That is a very interesting question, do you have some additional constraints I should take into account?\nUser: Yeah, I want the chemical formula to be C15H14F2N2O3.\nAssistant: In that scenario, I suggest the compound with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F."} {"text":"User: I want to make a compound with a number of hydrogen bond donor sites of 2, a count of hydrogen bond acceptors of 6 and a Wildman-Crippen LogP value computed using RDKit of 4.81.\nAssistant: Interesting, do you have some additional conditions?\nUser: I want the molecular formula to be C26H20N6OS.\nAssistant: In that scenario, I propose the compound with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1."}", "/scratch/micpie/export/rdkit_features/train_115-19.jsonl": "{"text":"Question: What is the number of acid groups of the molecule with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the chemical with SMILES c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_0-4.jsonl": "{"text":"The count of rings of the chemical with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1 is 7."} {"text":"The number of rings of the compound with SMILES Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC is 4."}", "/scratch/micpie/export/rdkit_features/train_32-17.jsonl": "{"text":"Question: What is the count of rotatable bonds of the chemical with SMILES C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C?\nAnswer: 8"} {"text":"Question: What is the count of rotatable bonds of the chemical with SMILES C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/train_10-9.jsonl": "{"text":"The sum of atomic polarizabilities of the chemical with 
SMILES C[C@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4 is 74.79."} {"text":"The total sum of atomic polarizabilities of the molecule with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@@H](C2)c3cccc(c3)Cl is 58.75."}", "/scratch/micpie/export/rdkit_features/test_11-3.jsonl": "{"text":"The count of heteroatoms of the chemical with SMILES CC(C)[C@@H](CC(=O)N1CC(C1)(C)CC(F)(F)F)c2ccccc2 is 5."} {"text":"The count of heteroatoms of the chemical with SMILES Cc1cc(c(c(c1)C)C(=O)CCCC(=O)N2CC([C@@H]2C3CC3)(C)C)C is 3."}", "/scratch/micpie/export/rdkit_features/train_114-23.jsonl": "{"text":"User: I want to synthesize a compound with a number of hydrogen bond donors of 3, a number of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value computed using RDKit of 6.22.\nAssistant: That is a very interesting question, do you have some additional ?\nUser: Indeed, I want the chemical formula to be C21H19Cl2N3O2S2.\nAssistant: In that situation, I recommend the compound with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C."} {"text":"User: I want to create a chemical with a number of hydrogen bond donors of 2, a number of hydrogen bond acceptor sites of 4 and a Wildman-Crippen LogP value of 3.73.\nAssistant: That is a very interesting question, do you have some additional limitations I should take into account?\nUser: Yes, I want the chemical formula to be C27H26FN4O2+.\nAssistant: In that situation, I recommend the chemical with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F."}", "/scratch/micpie/export/rdkit_features/valid_108-23.jsonl": "{"text":"User: I want to design a molecule with a number of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 8 and a Wildman-Crippen LogP value computed using RDKit of 2.37.\nAssistant: That's interesting, do you have some additional limitations I should take into account?\nUser: Yeah, I want the chemical formula to be C17H20N4O6S2.\nAssistant: In that scenario, I 
suggest the molecule with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O."} {"text":"User: I want to synthesize a molecule with a number of hydrogen bond donors of 3, a number of hydrogen bond acceptors of 4 and a Wildman-Crippen LogP value of 2.08.\nAssistant: Do you have some additional that help me narrow down the search?\nUser: Yep, I want the formula to be C23H34N4O4.\nAssistant: In that scenario, I recommend the molecule with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C."}", "/scratch/micpie/export/rdkit_features/test_21-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the molecule with SMILES Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-]?\nAnswer: 12"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_21-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the molecule with SMILES Cc1cc(cc(c1C)S(=O)(=O)N[C@@H](c2ccccc2)C(=O)OC)C(=O)[O-] is 6."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3ccn(n3)C(C)C)O is 5."}", "/scratch/micpie/export/rdkit_features/valid_6-0.jsonl": "{"text":"The formula of the molecule with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)c3ccc(cc3)[C@@H](CN)[NH3+] is C25H26N3O3+."} {"text":"The chemical formula of the compound with SMILES COc1ccc(cc1)[C@H]2CCCCCN2C(=O)Cn3nc(nn3)c4ccc(cc4)F is C22H24FN5O2."}", "/scratch/micpie/export/rdkit_features/train_33-6.jsonl": "{"text":"The aromatic bond count of the chemical with SMILES COc1ccc(c(n1)C(F)(F)F)NC(=O)C(=O)NCCC[NH+]2CCCCC2 is 6."} {"text":"The count of aromatic bonds of the compound with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@H]3CCSC3)Br is 11."}", "/scratch/micpie/export/rdkit_features/test_8-5.jsonl": "{"text":"The number of rotatable bonds of the compound with SMILES 
C[C@@H](c1ccccn1)N(Cc2ccoc2C(=O)OC)C(=O)c3ccn(n3)CC(C)C is 8."} {"text":"The count of rotatable bonds of the compound with SMILES COc1ccc(cc1)CN2CC[C@H](C2=O)N3CCCc4c3cccc4OC(F)F is 6."}", "/scratch/micpie/export/rdkit_features/valid_10-4.jsonl": "{"text":"The ring count of the chemical with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C is 3."} {"text":"The number of rings of the chemical with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3 is 3."}", "/scratch/micpie/export/rdkit_features/test_6-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES C[C@H]1CCN(C[C@H]1CNC(=O)OC(C)(C)C)C(=O)Nc2cc3c([nH]c2=O)CCCC3?\nAnswer: 69.00"} {"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES C[C@@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C?\nAnswer: 70.01"}", "/scratch/micpie/export/rdkit_features/test_116-22.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donor sites of 1 and a number of hydrogen bond acceptor sites of 4.\nAssistant: Do you have some additional constraints I should take into account?\nUser: Yes, I want the number of heteroatoms to be 7.\nAssistant: Then, I the chemical with SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1."} {"text":"User: I want to design a compound with a number of hydrogen bond donors of 0 and a number of hydrogen bond acceptor sites of 4.\nAssistant: Nice, do you have some additional that help me narrow down the search?\nUser: Yep, I want the number of heteroatoms to be 7.\nAssistant: In that situation, I suggest the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl."}", "/scratch/micpie/export/rdkit_features/test_26-12.jsonl": "{"text":"Question: What is the chemical formula of the molecule with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3?\nAnswer: C18H22F3NO3"} {"text":"Question: What is the formula 
of the compound with SMILES COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br?\nAnswer: C15H13BrClNO3"}", "/scratch/micpie/export/rdkit_features/valid_15-22.jsonl": "{"text":"User: I want to synthesize a molecule with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptors of 4.\nAssistant: Do you have some additional requirements I should take into account?\nUser: I want the number of heteroatoms to be 5.\nAssistant: In that situation, I propose the molecule with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N."} {"text":"User: I want to create a chemical with a count of hydrogen bond donors of 2 and a number of hydrogen bond acceptor sites of 2.\nAssistant: Cool, do you have some additional constraints that I should consider?\nUser: Indeed, I want the count of heteroatoms to be 4.\nAssistant: Then, I advise the chemical with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C."}", "/scratch/micpie/export/rdkit_features/train_30-19.jsonl": "{"text":"Question: What is the acid group count of the compound with SMILES CCCN(CC(C)C)C(=O)C(=O)NCCCc1cccc(c1)C(F)(F)F?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the chemical with SMILES CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4ccc(c(c4)F)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_119-5.jsonl": "{"text":"The rotatable bond count of the chemical with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O is 5."} {"text":"The count of rotatable bonds of the molecule with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C is 11."}", "/scratch/micpie/export/rdkit_features/valid_108-19.jsonl": "{"text":"Question: What is the number of acid groups of the molecule with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the chemical with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_18-15.jsonl": 
"{"text":"Question: What is the count of heteroatoms of the molecule with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O?\nAnswer: 13"} {"text":"Question: What is the count of heteroatoms of the chemical with SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/valid_105-22.jsonl": "{"text":"User: I want to create a compound with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 5.\nAssistant: Do you have some additional constraints that I should consider?\nUser: Yeah, I want the heteroatom count to be 7.\nAssistant: In that case, I recommend the compound with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F."} {"text":"User: I want to design a compound with a number of hydrogen bond donor sites of 1 and a count of hydrogen bond acceptors of 4.\nAssistant: Do you have some additional limitations I should consider?\nUser: Indeed, I want the count of heteroatoms to be 8.\nAssistant: I advise the compound with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O."}", "/scratch/micpie/export/rdkit_features/test_0-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1?\nAnswer: C24H12N2O3S5"} {"text":"Question: What is the molecular formula of the molecule with SMILES Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC?\nAnswer: C24H21N3O4"}", "/scratch/micpie/export/rdkit_features/train_120-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the compound with SMILES Cc1ccc(cc1)NC(=O)CSc2nnc(n2C)C=NC(=O)CCc3ccccc3Cl?\nAnswer: 17"} {"text":"Question: What is the number of aromatic bonds of the molecule with SMILES Cc1c(cnn1c2ccccc2)N[C@@H](C)C(=O)Nc3cccc(c3Cl)Cl?\nAnswer: 17"}", "/scratch/micpie/export/rdkit_features/test_14-0.jsonl": "{"text":"The molecular formula of the molecule with SMILES 
CCN(Cc1ccccc1)C(=O)Nc2cnn(c2)[C@@H](C)c3ccccc3 is C21H24N4O."} {"text":"The molecular formula of the compound with SMILES c1cc(cc(c1)Br)[C@H](C(=O)OCCCCC#N)O is C13H14BrNO3."}", "/scratch/micpie/export/rdkit_features/test_25-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4 is 6."} {"text":"The count of hydrogen bond acceptors of the compound with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3 is 3."}", "/scratch/micpie/export/rdkit_features/test_13-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC is 2."} {"text":"The count of hydrogen bond acceptors of the molecule with SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4 is 2."}", "/scratch/micpie/export/rdkit_features/train_120-20.jsonl": "{"text":"Question: What is the basic group count of the chemical with SMILES Cc1ccc(cc1)NC(=O)CSc2nnc(n2C)C=NC(=O)CCc3ccccc3Cl?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the compound with SMILES Cc1c(cnn1c2ccccc2)N[C@@H](C)C(=O)Nc3cccc(c3Cl)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_12-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl?\nAnswer: 57.42"} {"text":"Question: What is the sum of atomic polarizabilities of the molecule with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3?\nAnswer: 60.40"}", "/scratch/micpie/export/rdkit_features/valid_8-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the compound with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C?\nAnswer: 7"} {"text":"Question: What is the number of heteroatoms of the compound with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/test_103-19.jsonl": 
"{"text":"Question: What is the count of acid groups of the chemical with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1?\nAnswer: 0"} {"text":"Question: What is the acid group count of the chemical with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_116-7.jsonl": "{"text":"The acid group count of the chemical with SMILES c1ccc2c(c1)c(cc(n2)Cl)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F is 0."} {"text":"The number of acid groups of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4)Br is 0."}", "/scratch/micpie/export/rdkit_features/valid_10-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C is 7."} {"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3 is 2."}", "/scratch/micpie/export/rdkit_features/valid_30-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES CC(C)(C)[C@H]1CCCC[C@@H]1CNC(=O)C2CCN(CC2)c3cnccn3?\nAnswer: 4"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@H]4C[C@H]4c5ccc(cc5)F?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_117-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc(cc4)Oc5ccc(cn5)Cl is 5.06."} {"text":"The LogP value computed using the Wildman-Crippen method of the molecule with SMILES C[C@@H](C(C)(C)OC)[NH2+]CCNc1ccccc1 is 1.48."}", "/scratch/micpie/export/rdkit_features/test_31-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the chemical with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)Cn4ccc5c4cccc5F is 3.74."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES 
C[C@@H](c1cc(ccc1OC)F)NC(=O)C(=O)N2CCC[C@@H](C2)OCC3CC3 is 2.43."}", "/scratch/micpie/export/rdkit_features/valid_16-23.jsonl": "{"text":"User: I want to analyze a molecule with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 2 and a Wildman-Crippen LogP value computed using RDKit of 1.42.\nAssistant: Cool, do you have some additional conditions I should take into account?\nUser: Yea, I want the chemical formula to be C18H35N2O2+.\nAssistant: In that scenario, I propose the molecule with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCCOC(C)C."} {"text":"User: I want to create a compound with a count of hydrogen bond donors of 1, a number of hydrogen bond acceptor sites of 4 and a LogP value computed using the Wildman-Crippen method of 2.55.\nAssistant: Do you have some additional I should take into account?\nUser: I want the molecular formula to be C23H36N2O4.\nAssistant: In that scenario, I propose the compound with SMILES C=CCC(CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)CC3CCOCC3."}", "/scratch/micpie/export/rdkit_features/test_26-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES c1ccc(c(c1)C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)OCC3CC3 is 3.66."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES COc1ccccc1CNC(=O)c2cc(cc(c2O)Cl)Br is 3.75."}", "/scratch/micpie/export/rdkit_features/train_32-8.jsonl": "{"text":"The number of basic groups of the molecule with SMILES C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C is 1."} {"text":"The count of basic groups of the molecule with SMILES C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2 is 1."}", "/scratch/micpie/export/rdkit_features/train_29-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the chemical with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F?\nAnswer: 6"} {"text":"Question: What is the number of aromatic bonds of the compound with SMILES 
CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_23-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the chemical with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4?\nAnswer: 8"} {"text":"Question: What is the heteroatom count of the molecule with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/train_20-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@H](CCCCOC)c3ccccc3 is 12."} {"text":"The aromatic bond count of the chemical with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@@H](c2ccccc2F)C(=O)OC is 12."}", "/scratch/micpie/export/rdkit_features/test_30-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the compound with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3?\nAnswer: 5"} {"text":"Question: What is the number of heteroatoms of the chemical with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_22-5.jsonl": "{"text":"The number of rotatable bonds of the molecule with SMILES C[C@H](C(=O)N1CCCC[C@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O is 3."} {"text":"The rotatable bond count of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4 is 6."}", "/scratch/micpie/export/rdkit_features/test_120-20.jsonl": "{"text":"Question: What is the number of basic groups of the molecule with SMILES c1ccc(cc1)CCN(CC(=O)Nc2ccc(cc2)Cl)S(=O)(=O)c3ccc4c(c3)OCCO4?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the molecule with SMILES Cc1c(cnn1c2ccccc2)N[C@H](C)C(=O)Nc3cccc(c3Cl)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_8-11.jsonl": "{"text":"User: I want to make a molecule with a formula of C23H36N2O5.\nAssistant: Nice, do you have some additional requirements I should take into account?\nUser: 
Indeed, I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 5.\nAssistant: In that case, I suggest the molecule with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C."} {"text":"User: I want to make a chemical with a molecular formula of C11H7Br2F4NO2.\nAssistant: That's interesting, do you have some additional limitations I should take into account?\nUser: Yea, I want the number of hydrogen bond donors to be 0, the number of hydrogen bond acceptors to be 2.\nAssistant: In that situation, I suggest the chemical with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br."}", "/scratch/micpie/export/rdkit_features/train_3-0.jsonl": "{"text":"The chemical formula of the chemical with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O is C25H24N2O4."} {"text":"The chemical formula of the molecule with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4 is C24H28N3O3+."}", "/scratch/micpie/export/rdkit_features/test_119-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O is 12."} {"text":"The count of aromatic bonds of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C is 12."}", "/scratch/micpie/export/rdkit_features/train_17-14.jsonl": "{"text":"Question: What is the count of hydrogen bond acceptors of the molecule with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 9"}", "/scratch/micpie/export/rdkit_features/valid_102-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the compound with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1?\nAnswer: 3"} {"text":"Question: What is the count of hydrogen bond acceptors of the chemical with 
SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_2-3.jsonl": "{"text":"The count of heteroatoms of the chemical with SMILES CCN(CCc1nc(on1)[C@H](c2c(cccc2Cl)F)[NH+]3CCCCC3)C(=O)C is 8."} {"text":"The heteroatom count of the molecule with SMILES CCn1c(=O)ccc(n1)C(=O)N2CC[C@H]([C@@H]2c3cccc(c3)OC)c4ccccc4 is 6."}", "/scratch/micpie/export/rdkit_features/valid_7-4.jsonl": "{"text":"The count of rings of the chemical with SMILES COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC is 4."} {"text":"The number of rings of the compound with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC is 3."}", "/scratch/micpie/export/rdkit_features/train_6-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES CCOC(=O)c1csc(n1)[C@@H]2CCCN2C(=O)Nc3cc4c([nH]c3=O)CCCC4 is 3.26."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the compound with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)c3c4ccccc4c(=O)n(n3)CC(C)C is 3.62."}", "/scratch/micpie/export/rdkit_features/valid_114-4.jsonl": "{"text":"The number of rings of the chemical with SMILES Cc1ccc(cc1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3cccc(c3)Cl)Cl is 3."} {"text":"The count of rings of the chemical with SMILES C=CC(=O)NCc1ccc(cc1)C(=O)NC[C@@H](c2ccccc2Cl)c3c[nH]c4c3cccc4 is 4."}", "/scratch/micpie/export/rdkit_features/test_113-11.jsonl": "{"text":"User: I want to design a chemical with a formula of C20H24N5O2S2+.\nAssistant: Do you have some additional constraints I should take into account?\nUser: Indeed, I want the number of hydrogen bond donors to be 3, the count of hydrogen bond acceptors to be 5.\nAssistant: Then, I propose the chemical with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N."} {"text":"User: I want to design a chemical with a chemical formula of C20H17Cl2N3O3S2.\nAssistant: Cool, do you have some additional constraints I should take 
into account?\nUser: Yeah, I want the number of hydrogen bond donor sites to be 3, the number of hydrogen bond acceptors to be 4.\nAssistant: Then, I advise the chemical with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl."}", "/scratch/micpie/export/rdkit_features/test_109-6.jsonl": "{"text":"The aromatic bond count of the molecule with SMILES C[C@H](c1ccccc1)[C@@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl is 12."} {"text":"The count of aromatic bonds of the molecule with SMILES c1cc(cc(c1)n2cc[nH]c2=O)NC(=O)c3cc(ccc3Br)S(=O)(=O)N is 17."}", "/scratch/micpie/export/rdkit_features/test_22-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O is 1."} {"text":"The count of hydrogen bond donors of the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4 is 2."}", "/scratch/micpie/export/rdkit_features/test_113-19.jsonl": "{"text":"Question: What is the acid group count of the chemical with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the chemical with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_9-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the molecule with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C is 3."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4 is 6."}", "/scratch/micpie/export/rdkit_features/train_101-6.jsonl": "{"text":"The aromatic bond count of the compound with SMILES Cc1cnn(c1NC(=O)Cc2cccc(c2)OC)c3ccccn3 is 17."} {"text":"The aromatic bond count of the compound with SMILES COc1ccc(-c2cn(-c3ccc(O)c(C(=O)OCCCc4ccccc4)c3)nn2)cc1 is 23."}", "/scratch/micpie/export/rdkit_features/train_114-11.jsonl": "{"text":"User: I want to design a 
molecule with a chemical formula of C21H19Cl2N3O2S2.\nAssistant: That is a very interesting question, do you have some additional constraints I should take into account?\nUser: Yes, I want the number of hydrogen bond donor sites to be 3, the number of hydrogen bond acceptors to be 3.\nAssistant: In that scenario, I recommend the molecule with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C."} {"text":"User: I want to synthesize a molecule with a molecular formula of C27H26FN4O2+.\nAssistant: Interesting, do you have some additional constraints I should take into account?\nUser: I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptor sites to be 4.\nAssistant: Then, I suggest the molecule with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F."}", "/scratch/micpie/export/rdkit_features/train_120-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donors of the chemical with SMILES Cc1ccc(cc1)NC(=O)CSc2nnc(n2C)C=NC(=O)CCc3ccccc3Cl?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donor sites of the chemical with SMILES Cc1c(cnn1c2ccccc2)N[C@@H](C)C(=O)Nc3cccc(c3Cl)Cl?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_8-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the chemical with SMILES C[C@H]1C[C@@H](N(C[C@@H]1CNC(=O)OC(C)(C)C)C(=O)Cc2c(cccc2OC)OC)C?\nAnswer: 6"} {"text":"Question: What is the count of rotatable bonds of the chemical with SMILES c1cc(c(cc1Br)C(=O)N2CC(OC(C2)(F)F)(F)F)Br?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_7-0.jsonl": "{"text":"The chemical formula of the compound with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl is C22H24ClN3O3."} {"text":"The formula of the chemical with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F is C21H24F2N2O5."}", "/scratch/micpie/export/rdkit_features/train_111-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the molecule with SMILES 
CCN(C)c1nnc(n1Cc2c(c3ccccc3o2)C)[C@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 10"} {"text":"Question: What is the heteroatom count of the molecule with SMILES C[C@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC?\nAnswer: 10"}", "/scratch/micpie/export/rdkit_features/valid_1-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the molecule with SMILES Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_32-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the molecule with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3?\nAnswer: 11"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_3-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O?\nAnswer: 65.41"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4?\nAnswer: 66.62"}", "/scratch/micpie/export/rdkit_features/train_7-11.jsonl": "{"text":"User: I want to analyze a compound with a chemical formula of C22H24ClN3O3.\nAssistant: That's interesting, do you have some additional requirements?\nUser: Yes, I want the count of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 5.\nAssistant: I recommend the compound with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl."} {"text":"User: I want to make a chemical with a chemical formula of C21H24F2N2O5.\nAssistant: Cool, do you have some additional conditions I should consider?\nUser: Yes, I want the number of hydrogen bond donors to be 2, the number of 
hydrogen bond acceptors to be 5.\nAssistant: Then, I advise the chemical with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F."}", "/scratch/micpie/export/rdkit_features/test_22-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O?\nAnswer: 4"} {"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_100-23.jsonl": "{"text":"User: I want to synthesize a chemical with a number of hydrogen bond donor sites of 2, a number of hydrogen bond acceptor sites of 1 and a LogP value computed using the Wildman-Crippen method of 1.37.\nAssistant: Do you have some additional constraints?\nUser: Indeed, I want the molecular formula to be C18H27FN3O+.\nAssistant: In that case, I suggest the chemical with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F."} {"text":"User: I want to analyze a compound with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 3 and a LogP value computed using the Wildman-Crippen method of 2.67.\nAssistant: Do you have some additional conditions?\nUser: Yes, I want the chemical formula to be C13H14BrN3O2.\nAssistant: Then, I suggest the compound with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br."}", "/scratch/micpie/export/rdkit_features/valid_1-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5?\nAnswer: 7"} {"text":"Question: What is the count of hydrogen bond acceptors of the compound with SMILES Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F?\nAnswer: 7"}", "/scratch/micpie/export/rdkit_features/test_23-4.jsonl": "{"text":"The ring count of the compound with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4 is 4."} {"text":"The 
count of rings of the compound with SMILES c1cc(cc(c1)Cl)C[NH+]2CCN(CC2)C(=O)Nc3ccc(cn3)c4n[n-]nn4 is 4."}", "/scratch/micpie/export/rdkit_features/test_12-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C is 4.63."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4 is 4.97."}", "/scratch/micpie/export/rdkit_features/train_114-4.jsonl": "{"text":"The ring count of the chemical with SMILES Cc1cc(cc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl)C is 3."} {"text":"The ring count of the compound with SMILES C[NH+]1CCN(CC1)c2ccc(cc2)NC(=O)c3ccccc3c4ncc(o4)c5ccccc5F is 5."}", "/scratch/micpie/export/rdkit_features/test_103-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the compound with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1 is 4."} {"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 7."}", "/scratch/micpie/export/rdkit_features/train_9-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the compound with SMILES CC1(C(=O)Nc2ccccc2N1C(=O)C[NH+](Cc3ccccc3)C4(CCCC4)CO)C?\nAnswer: 6"} {"text":"Question: What is the rotatable bond count of the compound with SMILES CC(C)C[C@H]1CCN(C1)c2nnc(n2CCCC(=O)N(C)C)c3ccccc3F?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/test_100-13.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES CC[C@@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F?\nAnswer: 2"} {"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES Cc1c(c(n[nH]1)NC(=O)Cc2cccc(c2)OC)Br?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_28-6.jsonl": "{"text":"The number of aromatic bonds of the molecule with SMILES Cc1c(n2ccsc2n1)C[NH+](C)Cc3nc4ccc(cc4s3)OC is 19."} 
{"text":"The aromatic bond count of the molecule with SMILES Cc1c(c2ccccc2[nH]1)CCNC(=O)Nc3cnc(nc3)C(C)(C)C is 16."}", "/scratch/micpie/export/rdkit_features/valid_25-22.jsonl": "{"text":"User: I want to analyze a compound with a number of hydrogen bond donor sites of 0 and a count of hydrogen bond acceptors of 8.\nAssistant: Do you have some additional conditions that I should consider?\nUser: Yep, I want the count of heteroatoms to be 8.\nAssistant: In that situation, I suggest the compound with SMILES CCc1nnc(n1Cc2ccc(nc2OC)OC)N3CCC[C@H]3[C@@H]4CCCO4."} {"text":"User: I want to make a compound with a count of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 4.\nAssistant: That is a very interesting question, do you have some additional limitations that help me narrow down the search?\nUser: Yep, I want the number of heteroatoms to be 5.\nAssistant: I the compound with SMILES Cc1cccc(c1)NC2CCN(CC2)C(=O)CCSc3ccccn3."}", "/scratch/micpie/export/rdkit_features/test_25-7.jsonl": "{"text":"The number of acid groups of the compound with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4 is 0."} {"text":"The number of acid groups of the chemical with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3 is 0."}", "/scratch/micpie/export/rdkit_features/test_22-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the compound with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O?\nAnswer: 5"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_117-22.jsonl": "{"text":"User: I want to create a chemical with a count of hydrogen bond donors of 0 and a number of hydrogen bond acceptors of 4.\nAssistant: Do you have some additional limitations I should consider?\nUser: Yep, I want the heteroatom count to be 7.\nAssistant: In that situation, I the chemical with 
SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4OC)Cl."} {"text":"User: I want to create a chemical with a number of hydrogen bond donor sites of 2 and a number of hydrogen bond acceptors of 1.\nAssistant: That's interesting, do you have some additional constraints I should take into account?\nUser: I want the number of heteroatoms to be 3.\nAssistant: In that situation, I recommend the chemical with SMILES Cc1cc(ccc1[C@@H](C)[NH2+]CC2(CC2)CO)F."}", "/scratch/micpie/export/rdkit_features/valid_105-18.jsonl": "{"text":"Question: What is the aromatic bond count of the molecule with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F?\nAnswer: 16"} {"text":"Question: What is the count of aromatic bonds of the chemical with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/valid_11-3.jsonl": "{"text":"The heteroatom count of the chemical with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl is 4."} {"text":"The count of heteroatoms of the compound with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C is 4."}", "/scratch/micpie/export/rdkit_features/train_4-0.jsonl": "{"text":"The molecular formula of the chemical with SMILES C[C@@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4 is C24H28N3O3+."} {"text":"The chemical formula of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+]4CC[C@H]([C@H](CC4)O)C is C23H29N2O3S+."}", "/scratch/micpie/export/rdkit_features/valid_7-12.jsonl": "{"text":"Question: What is the molecular formula of the compound with SMILES COc1cc2c(cc([nH]2)CCCNC(=O)CCc3ccc4c(c3)CCO4)c(c1)OC?\nAnswer: C24H28N2O4"} {"text":"Question: What is the chemical formula of the compound with SMILES Cc1cc(ccc1CC(=O)Nc2cccc(c2)C(=O)Nc3ccc(nc3)OC)OC?\nAnswer: C23H23N3O4"}", "/scratch/micpie/export/rdkit_features/valid_21-0.jsonl": "{"text":"The formula of the molecule with SMILES CC(C)(C)[C@H](C(=O)OC)[NH2+]CC(=O)Nc1c(c2c(s1)CCCC2)C(=O)[O-] is 
C18H26N2O5S."} {"text":"The chemical formula of the molecule with SMILES C[C@H](C(=O)N1CCC(CC1)[C@H]2CCCCN2C(=O)[C@@H](C)O)Nc3ccccc3 is C22H33N3O3."}", "/scratch/micpie/export/rdkit_features/test_17-6.jsonl": "{"text":"The count of aromatic bonds of the molecule with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F is 0."} {"text":"The count of aromatic bonds of the molecule with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 11."}", "/scratch/micpie/export/rdkit_features/test_116-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES c1cc2c(ccc3c2cc(s3)C(=O)N[C@H]4C[C@H]5CC[C@@H](C4)N5C(=O)C6(CC6)c7ccc(cc7)F)nc1 is 4."} {"text":"The count of rotatable bonds of the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5c(s4)cccc5Cl is 4."}", "/scratch/micpie/export/rdkit_features/valid_33-8.jsonl": "{"text":"The number of basic groups of the chemical with SMILES c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6 is 0."} {"text":"The basic group count of the compound with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br is 1."}", "/scratch/micpie/export/rdkit_features/test_113-8.jsonl": "{"text":"The count of basic groups of the molecule with SMILES c1cc(cnc1)[C@@H]2[C@H](CCN2C(=O)\/C=C\/c3ccsc3)CNC(=O)CSC(=[NH2+])N is 2."} {"text":"The count of basic groups of the compound with SMILES COc1cccc(c1)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 0."}", "/scratch/micpie/export/rdkit_features/valid_13-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES Cc1ccc(c2c1cccn2)C(=O)Nc3c(cccc3F)N4CCCC4?\nAnswer: 3"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the molecule with SMILES Cc1cccc(c1)c2ccc(nc2)NCc3ccc4c(c3)ncn4C?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_106-11.jsonl": "{"text":"User: I want to analyze a chemical with a chemical formula of 
C13H11ClFN3O3.\nAssistant: Interesting, do you have some additional constraints?\nUser: Yes, I want the number of hydrogen bond donor sites to be 1, the count of hydrogen bond acceptors to be 5.\nAssistant: I suggest the chemical with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl."} {"text":"User: I want to make a compound with a chemical formula of C18H24F4N3O3S+.\nAssistant: Do you have some additional limitations?\nUser: Yes, I want the number of hydrogen bond donors to be 2, the number of hydrogen bond acceptors to be 3.\nAssistant: In that situation, I advise the compound with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F."}", "/scratch/micpie/export/rdkit_features/train_11-22.jsonl": "{"text":"User: I want to design a compound with a count of hydrogen bond donors of 1 and a number of hydrogen bond acceptor sites of 2.\nAssistant: Interesting, do you have some additional constraints I should take into account?\nUser: Yes, I want the heteroatom count to be 4.\nAssistant: In that case, I suggest the compound with SMILES C[C@@H]1C[C@H](C[C@@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl."} {"text":"User: I want to design a compound with a number of hydrogen bond donors of 0 and a number of hydrogen bond acceptor sites of 1.\nAssistant: Do you have some additional conditions?\nUser: Indeed, I want the number of heteroatoms to be 4.\nAssistant: In that case, I suggest the compound with SMILES C[C@@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@@H]2C3CC3)(C)C."}", "/scratch/micpie/export/rdkit_features/valid_119-4.jsonl": "{"text":"The number of rings of the molecule with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O is 2."} {"text":"The ring count of the molecule with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6 is 6."}", "/scratch/micpie/export/rdkit_features/valid_2-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES CCCCS(=O)(=O)c1ccc(cc1)c2nc(no2)c3ccc(cn3)N4CCCC4?\nAnswer: 7"} 
{"text":"Question: What is the number of rotatable bonds of the chemical with SMILES CN(CC(=O)N(C)C1CCCCC1)C(=O)Nc2nc(cs2)Cc3ccccc3?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/valid_33-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES c1ccc2c(c1)cn(n2)CC(=O)N3C[C@@H]4[C@H]3CN(CC4)C(=O)[C@@]56CCC[C@@H]5C6 is 10."} {"text":"The aromatic bond count of the compound with SMILES Cc1c(cc(c(=O)n1C)c2nc(on2)C[NH+](C)[C@@H]3CCSC3)Br is 11."}", "/scratch/micpie/export/rdkit_features/valid_100-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F is 1."} {"text":"The number of hydrogen bond acceptors of the compound with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4 is 2."}", "/scratch/micpie/export/rdkit_features/valid_100-11.jsonl": "{"text":"User: I want to make a molecule with a chemical formula of C18H27FN3O+.\nAssistant: Cool, do you have some additional constraints I should consider?\nUser: I want the count of hydrogen bond donors to be 2, the number of hydrogen bond acceptors to be 1.\nAssistant: In that scenario, I recommend the molecule with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F."} {"text":"User: I want to synthesize a compound with a formula of C20H24N3O+.\nAssistant: Interesting, do you have some additional I should consider?\nUser: I want the number of hydrogen bond donors to be 1, the count of hydrogen bond acceptors to be 2.\nAssistant: I suggest the compound with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4."}", "/scratch/micpie/export/rdkit_features/test_25-6.jsonl": "{"text":"The count of aromatic bonds of the chemical with SMILES CCc1nnc(n1C[C@@H](C(C)C)[NH+]2CCOCC2)N3CCC[C@@H]3[C@H]4CCCO4 is 5."} {"text":"The count of aromatic bonds of the compound with SMILES c1cc(c(c(c1)F)F)CC(=O)Nc2ccc(c(c2)F)NC[C@H]3CCCO3 is 12."}", "/scratch/micpie/export/rdkit_features/train_103-0.jsonl": "{"text":"The 
chemical formula of the chemical with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccc(F)cc1 is C18H14FNO2."} {"text":"The chemical formula of the chemical with SMILES Cn1cnc(n1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is C14H21N5O4."}", "/scratch/micpie/export/rdkit_features/valid_17-19.jsonl": "{"text":"Question: What is the number of acid groups of the molecule with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S?\nAnswer: 0"} {"text":"Question: What is the acid group count of the molecule with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_118-5.jsonl": "{"text":"The rotatable bond count of the molecule with SMILES Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F is 6."} {"text":"The rotatable bond count of the chemical with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3 is 3."}", "/scratch/micpie/export/rdkit_features/train_29-0.jsonl": "{"text":"The chemical formula of the compound with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@@H]3C[C@H]4CC[C@H]3C4)F is C20H26FN3O2."} {"text":"The chemical formula of the chemical with SMILES CC[C@H](C)Oc1cc(ccc1CNC(=O)N(C)C2C[C@H]3CC[C@@H](C2)[NH+]3C)C is C22H36N3O2+."}", "/scratch/micpie/export/rdkit_features/valid_103-17.jsonl": "{"text":"Question: What is the rotatable bond count of the chemical with SMILES CC(=O)O[C@@H]1CC[C@]2(C)C(CCC(C)(O)CCO)=C(C)C(=O)[C@@H](O)[C@H]2[C@@]1(C)CO?\nAnswer: 7"} {"text":"Question: What is the rotatable bond count of the chemical with SMILES c1cnc2c(n1)cc(cn2)C(=O)N[C@@H]3CCN(C3)C(=O)C(=O)N?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/valid_12-4.jsonl": "{"text":"The count of rings of the compound with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl is 4."} {"text":"The count of rings of the compound with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3 is 3."}", "/scratch/micpie/export/rdkit_features/train_3-15.jsonl": "{"text":"Question: What 
is the count of heteroatoms of the molecule with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O?\nAnswer: 6"} {"text":"Question: What is the number of heteroatoms of the chemical with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_7-4.jsonl": "{"text":"The ring count of the chemical with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl is 4."} {"text":"The number of rings of the molecule with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F is 2."}", "/scratch/micpie/export/rdkit_features/train_25-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES CCc1nnc(n1CCNC(=O)OC(C)(C)C)N2CCC[C@@H]2c3cnn(c3)C is 1."} {"text":"The number of hydrogen bond donors of the chemical with SMILES c1cc(ccc1C(=O)N(CC(F)(F)F)[C@@H]2CCCOC2)C(F)(F)F is 0."}", "/scratch/micpie/export/rdkit_features/valid_119-0.jsonl": "{"text":"The molecular formula of the chemical with SMILES COC1(CCOCC1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O is C14H18FN3O5."} {"text":"The formula of the compound with SMILES CCOc1ccc2ccccc2c1[C@@H]3c4c(n(c(=O)n(c4=O)C)C)N=C5[C@@H]3C(=O)c6c5cccc6 is C28H23N3O4."}", "/scratch/micpie/export/rdkit_features/valid_118-5.jsonl": "{"text":"The rotatable bond count of the molecule with SMILES CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O is 6."} {"text":"The number of rotatable bonds of the chemical with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO is 5."}", "/scratch/micpie/export/rdkit_features/valid_110-16.jsonl": "{"text":"Question: What is the number of rings of the chemical with SMILES Cc1cc(c(o1)C)C(=O)N2CC[NH+](CC2)CN3C(=O)[C@@]4(CCc5ccccc5C4)NC3=O?\nAnswer: 5"} {"text":"Question: What is the ring count of the compound with SMILES CCN(C)c1nnc(n1Cc2csc3c2cccc3)[C@H]4CN(CCO4)S(=O)(=O)C?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_112-20.jsonl": "{"text":"Question: What is the basic group count of the molecule with SMILES 
Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC?\nAnswer: 0"} {"text":"Question: What is the basic group count of the molecule with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/test_107-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES CCc1cc(c(s1)NC(=O)C(=O)N[C@H](Cc2ccccc2)C(=O)N(C)C)C(=O)OCC?\nAnswer: 66.93"} {"text":"Question: What is the total sum of atomic polarizabilities of the molecule with SMILES C[C@@H]1CN(C[C@H](O1)C)c2ccccc2NC(=O)C(=O)NCc3cc(c(c(c3)OC)O)OC?\nAnswer: 67.93"}", "/scratch/micpie/export/rdkit_features/train_23-3.jsonl": "{"text":"The number of heteroatoms of the compound with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4 is 8."} {"text":"The heteroatom count of the compound with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3 is 8."}", "/scratch/micpie/export/rdkit_features/test_33-10.jsonl": "{"text":"The Wildman-Crippen LogP value of the compound with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O is 2.21."} {"text":"The Wildman-Crippen LogP value of the compound with SMILES Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br is 2.38."}", "/scratch/micpie/export/rdkit_features/test_12-21.jsonl": "{"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES CC1(CCCc2c1nc(s2)NC(=O)c3c[nH]c4c3c(ccc4)F)C?\nAnswer: 51.24"} {"text":"Question: What is the sum of atomic polarizabilities of the compound with SMILES c1cc(c(c(c1)F)NC(=O)CC(C2CCC2)C3CCC3)N4CCCC4?\nAnswer: 59.86"}", "/scratch/micpie/export/rdkit_features/valid_111-23.jsonl": "{"text":"User: I want to analyze a chemical with a count of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 7 and a LogP value computed using the Wildman-Crippen method of 1.02.\nAssistant: Interesting, do you have some additional limitations?\nUser: Yeah, I want the formula to be 
C24H35N8+.\nAssistant: In that scenario, I suggest the chemical with SMILES CCN(C)c1nnc(n1CCc2c(nccn2)C)CC[NH+]3CCN(CC3)c4ccccc4."} {"text":"User: I want to synthesize a compound with a number of hydrogen bond donor sites of 0, a count of hydrogen bond acceptors of 10 and a LogP value computed using the Wildman-Crippen method of 2.11.\nAssistant: That's interesting, do you have some additional conditions I should consider?\nUser: Yeah, I want the chemical formula to be C22H29N7O3.\nAssistant: In that case, I advise the compound with SMILES C[C@@H](Cn1c(nnc1N2CCN(CC2)c3cnccn3)c4ccc(c(c4)OC)OC)OC."}", "/scratch/micpie/export/rdkit_features/train_3-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O is 5."} {"text":"The number of hydrogen bond acceptors of the chemical with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4 is 5."}", "/scratch/micpie/export/rdkit_features/test_22-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O is 6."} {"text":"The heteroatom count of the molecule with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4 is 7."}", "/scratch/micpie/export/rdkit_features/train_21-17.jsonl": "{"text":"Question: What is the rotatable bond count of the chemical with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC?\nAnswer: 6"} {"text":"Question: What is the count of rotatable bonds of the compound with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_10-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES CC(C)C[C@@H]1CCN(C1)c2nnc(n2CCSC(F)(F)F)c3cnn(c3)C is 0."} {"text":"The number of hydrogen bond donor sites of the molecule with SMILES c1cc(cc(c1)Cl)[C@H]2CCC[C@@H](C2)NC(=O)CC3CCC(=O)CC3 is 1."}", 
"/scratch/micpie/export/rdkit_features/train_3-22.jsonl": "{"text":"User: I want to make a chemical with a number of hydrogen bond donors of 2 and a number of hydrogen bond acceptors of 5.\nAssistant: Do you have some additional constraints?\nUser: Yea, I want the heteroatom count to be 6.\nAssistant: In that scenario, I the chemical with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O."} {"text":"User: I want to analyze a chemical with a count of hydrogen bond donors of 1 and a count of hydrogen bond acceptors of 5.\nAssistant: That's interesting, do you have some additional constraints?\nUser: Yes, I want the number of heteroatoms to be 6.\nAssistant: In that scenario, I the chemical with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4."}", "/scratch/micpie/export/rdkit_features/valid_12-20.jsonl": "{"text":"Question: What is the basic group count of the chemical with SMILES CC(C)[C@H](C(=O)N1CC2([C@@H]1C3CC3)CCC2)Oc4ccc(cc4)Cl?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the molecule with SMILES CC(C)CCN(C)c1nnc(n1C[C@H]2CC[C@H](C2)F)C3CCCC3?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_1-11.jsonl": "{"text":"User: I want to design a chemical with a chemical formula of C21H21N7OS.\nAssistant: Nice, do you have some additional constraints?\nUser: I want the count of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 7.\nAssistant: I recommend the chemical with SMILES Cc1[nH]nc(n1)[C@@H]2CCCCN2C(=O)c3csc(n3)c4cnn(c4)c5ccccc5."} {"text":"User: I want to analyze a molecule with a formula of C22H24FN5O2.\nAssistant: That is a very interesting question, do you have some additional limitations I should take into account?\nUser: Yeah, I want the count of hydrogen bond donors to be 0, the number of hydrogen bond acceptor sites to be 7.\nAssistant: I propose the molecule with SMILES Cc1ccc(c(n1)N2CCCC2)c3nc(on3)c4cc(ccc4N5CCOCC5)F."}", 
"/scratch/micpie/export/rdkit_features/test_19-8.jsonl": "{"text":"The count of basic groups of the molecule with SMILES CC(C)COC[C@@H](c1ccco1)NC(=O)N[C@H]2CCN(C2=O)c3cccc(c3)Cl is 0."} {"text":"The count of basic groups of the molecule with SMILES CN(C[C@@H]1C[C@@H](C[NH+]1Cc2nccs2)F)C(=O)NC[C@@H]3CCC4(O3)CCCCC4 is 1."}", "/scratch/micpie/export/rdkit_features/train_106-18.jsonl": "{"text":"Question: What is the count of aromatic bonds of the molecule with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O?\nAnswer: 6"} {"text":"Question: What is the aromatic bond count of the molecule with SMILES CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/test_27-23.jsonl": "{"text":"User: I want to synthesize a molecule with a number of hydrogen bond donors of 2, a number of hydrogen bond acceptors of 2 and a Wildman-Crippen LogP value of 2.17.\nAssistant: Do you have some additional requirements I should take into account?\nUser: Yep, I want the chemical formula to be C18H27F3N3O2+.\nAssistant: In that scenario, I recommend the molecule with SMILES Cc1ccc(cc1NC(=O)N2CC[NH+](CC2)C(C)(C)COC)C(F)(F)F."} {"text":"User: I want to analyze a chemical with a number of hydrogen bond donors of 2, a number of hydrogen bond acceptor sites of 3 and a Wildman-Crippen LogP value of 3.82.\nAssistant: That's interesting, do you have some additional limitations I should take into account?\nUser: Yep, I want the chemical formula to be C19H27F2N3O2.\nAssistant: In that case, I advise the chemical with SMILES c1ccc(cc1)N[C@@H]2CCCN(C2)C(=O)N[C@@H]3CCCC[C@H]3OC(F)F."}", "/scratch/micpie/export/rdkit_features/valid_15-6.jsonl": "{"text":"The aromatic bond count of the molecule with SMILES CC(C)[C@H](c1cnn(c1)C)NC(=O)\/C=C\/c2ccccc2C#N is 11."} {"text":"The aromatic bond count of the molecule with SMILES C[C@H](C=C)[C@H](C)C(=O)NC1CC[NH+](CC1)CCOCC(C)C is 0."}", "/scratch/micpie/export/rdkit_features/valid_118-7.jsonl": 
"{"text":"The count of acid groups of the molecule with SMILES CC[C@H](CC[NH2+][C@H](C)c1ccc(cc1C)F)O is 0."} {"text":"The acid group count of the chemical with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)NCC2(CCCCC2)CO is 0."}", "/scratch/micpie/export/rdkit_features/train_3-10.jsonl": "{"text":"The Wildman-Crippen LogP value computed using RDKit of the molecule with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O is 4.28."} {"text":"The Wildman-Crippen LogP value of the compound with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4 is 2.25."}", "/scratch/micpie/export/rdkit_features/train_21-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES Cc1cc(ccc1C(=O)[O-])S(=O)(=O)N[C@H](c2ccccc2F)C(=O)OC?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the molecule with SMILES C[C@H](C(=O)N1CCCC[C@@H]1C2CCN(CC2)C(=O)c3cc(cnc3Cl)F)O?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_23-3.jsonl": "{"text":"The heteroatom count of the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC3(CC3)S(=O)C)C4CCCC4 is 7."} {"text":"The count of heteroatoms of the molecule with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)C(=O)N3CCC[C@@H]3CC4CCCCC4 is 9."}", "/scratch/micpie/export/rdkit_features/test_29-9.jsonl": "{"text":"The sum of atomic polarizabilities of the compound with SMILES C[NH+]1CCC[C@H]2[C@@H]1CCN(C2)C(=O)Nc3cccc(c3)N4CCCCC4 is 64.17."} {"text":"The total sum of atomic polarizabilities of the chemical with SMILES CC(C)C(=O)N1CCCC[C@@H]1C(=O)NC[C@@H]2CCCC[C@H]2C(C)(C)C is 66.10."}", "/scratch/micpie/export/rdkit_features/test_1-10.jsonl": "{"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4 is 2.58."} {"text":"The Wildman-Crippen LogP value of the molecule with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC is 3.73."}", "/scratch/micpie/export/rdkit_features/test_120-19.jsonl": 
"{"text":"Question: What is the number of acid groups of the molecule with SMILES c1ccc(cc1)CCN(CC(=O)Nc2ccc(cc2)Cl)S(=O)(=O)c3ccc4c(c3)OCCO4?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the compound with SMILES Cc1c(cnn1c2ccccc2)N[C@H](C)C(=O)Nc3cccc(c3Cl)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_117-12.jsonl": "{"text":"Question: What is the molecular formula of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc5cc(ccc5c4)Br?\nAnswer: C26H25BrN2O3"} {"text":"Question: What is the formula of the molecule with SMILES C[C@H](C(C)(C)OC)[NH2+]CCNc1ccccc1?\nAnswer: C14H25N2O+"}", "/scratch/micpie/export/rdkit_features/test_103-3.jsonl": "{"text":"The count of heteroatoms of the molecule with SMILES Oc1ccccc1-c1cc(-c2ccccn2)cc(-c2ccccc2O)n1 is 4."} {"text":"The number of heteroatoms of the compound with SMILES Cn1cc(nn1)C(=O)N2C[C@H]([C@@H](C2)OC)NC(=O)[C@H]3CCCO3 is 9."}", "/scratch/micpie/export/rdkit_features/train_102-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES CS[C@@]12C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C(=O)[C@]1(SC)C[C@@H]3[C@@H](O)CC[C@H](O)[C@H]3N1C2=O is 4."} {"text":"The number of hydrogen bond donor sites of the compound with SMILES O\/N=C(\\COc1cccc2ccccc12)c1ccccc1 is 1."}", "/scratch/micpie/export/rdkit_features/train_113-19.jsonl": "{"text":"Question: What is the count of acid groups of the chemical with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the compound with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_16-7.jsonl": "{"text":"The count of acid groups of the chemical with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@@H](C)[C@@H](C)C=C is 0."} {"text":"The number of acid groups of the molecule with SMILES CC#CC[C@@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F is 0."}", 
"/scratch/micpie/export/rdkit_features/train_28-1.jsonl": "{"text":"The count of hydrogen bond donors of the compound with SMILES c1cc2c(cc1Br)CCN2C(=O)C(=O)c3ccc(s3)Cl is 0."} {"text":"The count of hydrogen bond donors of the compound with SMILES c1cc(c(cc1C(=O)N)NC(=O)N2CCC[C@@H]2C[C@H]3C[C@H]4CC[C@H]3C4)F is 2."}", "/scratch/micpie/export/rdkit_features/train_31-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES Cc1ccc(cc1)OCCC(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)c4cccc(c4C)F is 0."} {"text":"The number of hydrogen bond donor sites of the chemical with SMILES Cc1c(cccn1)CC(=O)NCc2cccc(c2)N3C(=O)[C@](NC3=O)(C)C4CC4 is 2."}", "/scratch/micpie/export/rdkit_features/valid_108-3.jsonl": "{"text":"The number of heteroatoms of the molecule with SMILES C[C@@H](C(=O)Nc1cc(ccc1OC)S(=O)(=O)N(C)C)Sc2c(cccn2)N(=O)=O is 12."} {"text":"The heteroatom count of the molecule with SMILES Cc1cc(ccc1NC(=O)C(=O)N[C@@H](C(C)C)C(=O)N2CCCC2)C(=O)NCC(C)C is 8."}", "/scratch/micpie/export/rdkit_features/train_32-2.jsonl": "{"text":"The number of hydrogen bond acceptor sites of the chemical with SMILES C[C@@H]1C[NH+](C[C@H](O1)C)Cc2csc(n2)NC(=O)NCCCNC(=O)C(C)C is 5."} {"text":"The number of hydrogen bond acceptor sites of the molecule with SMILES C[NH+](CCCNC(=O)C(=O)Nc1nc(co1)C(F)(F)F)Cc2ccccc2 is 4."}", "/scratch/micpie/export/rdkit_features/train_15-3.jsonl": "{"text":"The count of heteroatoms of the molecule with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC is 6."} {"text":"The heteroatom count of the molecule with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C is 5."}", "/scratch/micpie/export/rdkit_features/test_30-23.jsonl": "{"text":"User: I want to design a molecule with a number of hydrogen bond donor sites of 1, a count of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value of 3.58.\nAssistant: Nice, do you have some additional I should take into account?\nUser: Indeed, I want the molecular formula to be 
C21H25N3O2.\nAssistant: I the molecule with SMILES CN(C)C(=O)CCCC(=O)Nc1ccccc1N2CCc3c2cccc3."} {"text":"User: I want to design a compound with a count of hydrogen bond donors of 0, a count of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value of 3.76.\nAssistant: Interesting, do you have some additional conditions I should take into account?\nUser: I want the chemical formula to be C24H27FN2O3.\nAssistant: In that situation, I suggest the compound with SMILES Cc1c(cccc1F)C(=O)N2CC[C@H]3[C@@H]2CCN3C(=O)[C@@H](C)c4ccc(cc4)OC."}", "/scratch/micpie/export/rdkit_features/test_9-19.jsonl": "{"text":"Question: What is the number of acid groups of the compound with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C?\nAnswer: 0"} {"text":"Question: What is the count of acid groups of the compound with SMILES C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_7-5.jsonl": "{"text":"The rotatable bond count of the chemical with SMILES C[C@H]1CCc2ccccc2N1CCNC(=O)CCn3c4ccc(cc4oc3=O)Cl is 6."} {"text":"The rotatable bond count of the chemical with SMILES Cc1cccc(c1NC(=O)[C@@H](CCOC)NC(=O)OCc2ccccc2)OC(F)F is 10."}", "/scratch/micpie/export/rdkit_features/test_105-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the compound with SMILES Cc1ccc(c(c1)Br)O[C@@H](C)C(=O)Nc2ccc(cc2)C#N?\nAnswer: 12"} {"text":"Question: What is the count of aromatic bonds of the compound with SMILES COc1cnc(cc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/train_118-12.jsonl": "{"text":"Question: What is the molecular formula of the chemical with SMILES Cc1cc(ccc1[C@H](C)[NH2+][C@H](C)CCCO)F?\nAnswer: C14H23FNO+"} {"text":"Question: What is the chemical formula of the compound with SMILES c1c(c(cc(c1N(=O)=O)F)N)C(=O)N[C@H]2CCCCC23OCCO3?\nAnswer: C15H18FN3O5"}", "/scratch/micpie/export/rdkit_features/valid_104-10.jsonl": 
"{"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES CO[C@@H]1CN(C[C@H]1NC(=O)[C@H]2CCCO2)C(=O)Cc3[nH]ncn3 is -1.13."} {"text":"The Wildman-Crippen LogP value computed using RDKit of the chemical with SMILES CC(=C)C[NH+]1CCC(CC1)NC(=O)Nc2ccc(cc2)COC(C)(C)C is 2.75."}", "/scratch/micpie/export/rdkit_features/train_23-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the compound with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4?\nAnswer: 6"} {"text":"Question: What is the number of rotatable bonds of the chemical with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_26-22.jsonl": "{"text":"User: I want to make a chemical with a count of hydrogen bond donors of 0 and a number of hydrogen bond acceptor sites of 3.\nAssistant: Nice, do you have some additional requirements that help me narrow down the search?\nUser: Yes, I want the number of heteroatoms to be 7.\nAssistant: In that situation, I suggest the chemical with SMILES CC(C)Oc1cccc(c1)CC(=O)N(CC(F)(F)F)[C@H]2CCCOC2."} {"text":"User: I want to analyze a compound with a number of hydrogen bond donors of 3 and a number of hydrogen bond acceptors of 6.\nAssistant: Interesting, do you have some additional requirements I should consider?\nUser: Yes, I want the count of heteroatoms to be 8.\nAssistant: In that case, I the compound with SMILES CC(C)NC(=O)Nc1ccc(cc1)c2nc(on2)c3ccc(cc3O)OC."}", "/scratch/micpie/export/rdkit_features/valid_32-6.jsonl": "{"text":"The aromatic bond count of the chemical with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3 is 11."} {"text":"The count of aromatic bonds of the compound with SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O is 6."}", "/scratch/micpie/export/rdkit_features/test_16-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES CCCCOCC[NH+]1CCC(CC1)NC(=O)[C@H](C)[C@@H](C)C=C is 0."} {"text":"The 
number of aromatic bonds of the compound with SMILES c1c(c(sc1Cl)Cl)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)CCC#N is 5."}", "/scratch/micpie/export/rdkit_features/valid_120-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES c1cc(oc1)CNC(=O)CSc2nc3ccc(cc3s2)n4c(c5c(c4O)[C@@H]6C[C@@H]5C=C6)O?\nAnswer: 20"} {"text":"Question: What is the aromatic bond count of the molecule with SMILES CN(c1nc2cc(ccc2s1)Cl)C(=O)CSc3ccc(c(c3)F)F?\nAnswer: 16"}", "/scratch/micpie/export/rdkit_features/test_117-23.jsonl": "{"text":"User: I want to design a chemical with a number of hydrogen bond donors of 0, a count of hydrogen bond acceptors of 3 and a Wildman-Crippen LogP value of 5.13.\nAssistant: That's interesting, do you have some additional requirements I should take into account?\nUser: Yeah, I want the chemical formula to be C26H25BrN2O3.\nAssistant: Then, I propose the chemical with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4ccc5cc(ccc5c4)Br."} {"text":"User: I want to analyze a molecule with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptor sites of 2 and a LogP value computed using the Wildman-Crippen method of 1.48.\nAssistant: Cool, do you have some additional limitations I should take into account?\nUser: Indeed, I want the molecular formula to be C14H25N2O+.\nAssistant: Then, I the molecule with SMILES C[C@H](C(C)(C)OC)[NH2+]CCNc1ccccc1."}", "/scratch/micpie/export/rdkit_features/valid_106-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES c1cc(nc2c1cc(c(c2)N(=O)=O)N3C[C@H]([C@H](C3)F)O)Cl?\nAnswer: 1"} {"text":"Question: What is the number of hydrogen bond donors of the compound with SMILES c1cc(cc(c1)S(=O)(=O)N[C@H]2CCC[NH+](C2)[C@@H]3CCCN(C3=O)CC(F)(F)F)F?\nAnswer: 2"}", "/scratch/micpie/export/rdkit_features/train_14-9.jsonl": "{"text":"The total sum of atomic polarizabilities of the chemical with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C 
is 60.41."} {"text":"The sum of atomic polarizabilities of the compound with SMILES c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O is 40.53."}", "/scratch/micpie/export/rdkit_features/train_115-23.jsonl": "{"text":"User: I want to design a molecule with a count of hydrogen bond donors of 2, a number of hydrogen bond acceptor sites of 3 and a LogP value computed using the Wildman-Crippen method of 5.03.\nAssistant: That's interesting, do you have some additional requirements I should take into account?\nUser: I want the chemical formula to be C21H14BrF2N3O2.\nAssistant: In that case, I advise the molecule with SMILES c1ccnc(c1)C(=O)Nc2ccc(cc2NC(=O)\/C=C\\c3cc(ccc3F)Br)F."} {"text":"User: I want to create a molecule with a count of hydrogen bond donors of 1, a number of hydrogen bond acceptors of 3 and a LogP value computed using the Wildman-Crippen method of 5.01.\nAssistant: Nice, do you have some additional requirements I should take into account?\nUser: I want the molecular formula to be C29H27F2N3O2.\nAssistant: In that case, I recommend the molecule with SMILES c1cc(cc(c1)F)c2c(cccn2)C(=O)N[C@H]3C[C@H]4CC[C@@H](C3)N4C(=O)C5(CC5)c6ccc(cc6)F."}", "/scratch/micpie/export/rdkit_features/valid_102-17.jsonl": "{"text":"Question: What is the rotatable bond count of the molecule with SMILES CC(C)N(CC(=O)Nc1cc(F)cc(F)c1)C(=O)c1ccc(-c2ccccn2)cc1?\nAnswer: 6"} {"text":"Question: What is the number of rotatable bonds of the compound with SMILES CCCN(CCCNC(=O)CCCCCCCCCCCCCCC(=O)NCCCN(CCC)CCc1cccc2c1CC(=O)N2)CCc1cccc2c1CC(=O)N2?\nAnswer: 33"}", "/scratch/micpie/export/rdkit_features/test_7-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the compound with SMILES C[C@@H](c1cccc(c1)N2CCCC2)NC(=O)c3cc(nn3c4ccccc4)C(=O)OC is 6."} {"text":"The count of hydrogen bond acceptors of the chemical with SMILES CCOC(=O)c1ccc2c(c1)sc(n2)NC(=O)C3=Cc4cccc(c4OC3)OC is 7."}", "/scratch/micpie/export/rdkit_features/train_23-2.jsonl": "{"text":"The number of hydrogen bond acceptors of 
the compound with SMILES CC[NH+]1CCN(C[C@@H]1C)c2nnc(n2CCc3c(noc3C)C)[C@@H]4CCOC4 is 7."} {"text":"The count of hydrogen bond acceptors of the molecule with SMILES c1cc(c(cc1c2n[n-]nn2)I)NC(=O)NC3CCC3 is 4."}", "/scratch/micpie/export/rdkit_features/train_15-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES CC(C)[C@@H](c1cnn(c1)C)NC(=O)c2ccc(cc2)\/C=N\\OC?\nAnswer: 11"} {"text":"Question: What is the count of aromatic bonds of the molecule with SMILES Cc1c(scn1)C[NH+]2CCC(CC2)NC(=O)[C@H](C)[C@@H](C)C=C?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/valid_27-16.jsonl": "{"text":"Question: What is the count of rings of the compound with SMILES CC(C)(C)c1cnc(s1)NC(=O)Nc2ccn(n2)CCC(F)(F)F?\nAnswer: 2"} {"text":"Question: What is the number of rings of the molecule with SMILES C[NH+](CCC1CCN(CC1)C(=O)N[C@H]2C[C@@H](C2)c3ccc(cc3)Cl)C?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_113-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the molecule with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 8"} {"text":"Question: What is the number of rotatable bonds of the compound with SMILES Cc1cccc(c1C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_33-20.jsonl": "{"text":"Question: What is the basic group count of the compound with SMILES Cc1ccc(cc1)C(=O)N2CCC(CC2)NC(=O)N[C@@H](C)[C@H](C(F)(F)F)O?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the compound with SMILES Cc1cc(no1)OCc2nc(no2)c3cc(c(n(c3=O)C)C)Br?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/train_1-19.jsonl": "{"text":"Question: What is the number of acid groups of the molecule with SMILES c1ccc(cc1)[C@@H](C(=O)NCC2(CCCCC2)[NH+]3CCOCC3)Oc4ccccc4?\nAnswer: 0"} {"text":"Question: What is the number of acid groups of the chemical with SMILES c1ccc(c(c1)N2CCN(CC2)S(=O)(=O)C3CCC(CC3)C(F)(F)F)Cl?\nAnswer: 0"}", 
"/scratch/micpie/export/rdkit_features/valid_100-16.jsonl": "{"text":"Question: What is the count of rings of the molecule with SMILES CC[C@H](C[NH+]1CCC2(C1)CN(C2)C(=O)c3c(cc[nH]3)C4CC4)F?\nAnswer: 4"} {"text":"Question: What is the number of rings of the chemical with SMILES CC(=O)N1CCC2(C1)C[NH+](C2)Cc3cccc(c3)c4ccccn4?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_19-12.jsonl": "{"text":"Question: What is the formula of the molecule with SMILES C[C@@H](CC(C)(C)c1ccccc1)NC(=O)NCC(=O)Nc2cc(cc(c2)OC)OC?\nAnswer: C23H31N3O4"} {"text":"Question: What is the formula of the chemical with SMILES CCS(=O)(=O)c1ccc(s1)CNC(=O)N[C@H](C)c2cc3cccc(c3o2)F?\nAnswer: C18H19FN2O4S2"}", "/scratch/micpie/export/rdkit_features/train_106-5.jsonl": "{"text":"The count of rotatable bonds of the chemical with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O is 3."} {"text":"The rotatable bond count of the chemical with SMILES CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl is 4."}", "/scratch/micpie/export/rdkit_features/train_3-3.jsonl": "{"text":"The heteroatom count of the chemical with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O is 6."} {"text":"The number of heteroatoms of the molecule with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4 is 6."}", "/scratch/micpie/export/rdkit_features/valid_109-11.jsonl": "{"text":"User: I want to make a chemical with a chemical formula of C22H26ClN3O4.\nAssistant: Interesting, do you have some additional ?\nUser: Yea, I want the number of hydrogen bond donor sites to be 3, the count of hydrogen bond acceptors to be 5.\nAssistant: In that situation, I recommend the chemical with SMILES C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl."} {"text":"User: I want to analyze a molecule with a molecular formula of C21H19FN4O3S.\nAssistant: Do you have some additional limitations that I should consider?\nUser: Yes, I want the number of hydrogen bond donor sites to be 3, the count of 
hydrogen bond acceptors to be 5.\nAssistant: I recommend the molecule with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F."}", "/scratch/micpie/export/rdkit_features/valid_32-2.jsonl": "{"text":"The number of hydrogen bond acceptors of the molecule with SMILES CS(=O)(=O)C[C@H]1CCC[NH+](C1)CCCCCn2nc(nn2)c3ccccc3 is 6."} {"text":"The count of hydrogen bond acceptors of the chemical with SMILES CC[C@]1(CCN(C1)C(=O)C(=O)Nc2cc(cc(c2Cl)Cl)C(=O)OC)O is 5."}", "/scratch/micpie/export/rdkit_features/valid_117-1.jsonl": "{"text":"The count of hydrogen bond donors of the molecule with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4OC)Cl is 0."} {"text":"The count of hydrogen bond donors of the compound with SMILES Cc1cc(ccc1[C@@H](C)[NH2+]CC2(CC2)CO)F is 2."}", "/scratch/micpie/export/rdkit_features/test_0-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES N#Cc1ccc2sc(-c3nocc3C(=O)O)c(-c3cc4c(s3)-c3sccc3C43SCCS3)c2c1?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the compound with SMILES Cn1c(cc2ccccc2c1=O)C(=O)NCc3cccnc3Oc4cccc(c4)OC?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/valid_20-20.jsonl": "{"text":"Question: What is the count of basic groups of the compound with SMILES Cc1ccc(cc1)CN2CC[C@@H](C2=O)NC(=O)N[C@@H](CCCCOC)c3ccccc3?\nAnswer: 0"} {"text":"Question: What is the number of basic groups of the compound with SMILES COC(=O)[C@@H](C1CCCCC1)NS(=O)(=O)c2ccc(cc2Cl)C(=O)[O-]?\nAnswer: 0"}", "/scratch/micpie/export/rdkit_features/test_9-13.jsonl": "{"text":"Question: What is the count of hydrogen bond donors of the chemical with SMILES C[C@H](c1ccc(cc1)N2CCCOC2=O)[NH2+]C[C@H]3CCCN(C3)C(=O)C(C)(C)C?\nAnswer: 1"} {"text":"Question: What is the count of hydrogen bond donors of the compound with SMILES C[C@@H](CCn1c(nnc1N2CC[C@@H](C2)CC(C)C)c3cnn(c3)C)[NH+]4CCCCC4?\nAnswer: 1"}", "/scratch/micpie/export/rdkit_features/train_105-10.jsonl": "{"text":"The 
Wildman-Crippen LogP value of the chemical with SMILES c1ccc(cc1)COc2ccc(c(c2)N(=O)=O)NC(=O)c3ccc(cc3)C#N is 4.30."} {"text":"The LogP value computed using the Wildman-Crippen method of the compound with SMILES COc1ccc(nc1N(=O)=O)NC23C[C@H]4C[C@H](C2)CC(C4)(C3)O is 2.49."}", "/scratch/micpie/export/rdkit_features/valid_5-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES Cc1ccc(cc1)S(=O)(=O)n2cc(c3c2cccc3)C[NH+](C)Cc4nc(on4)C is 21."} {"text":"The number of aromatic bonds of the compound with SMILES c1ccc(cc1)\/C=C\/COC(=O)c2ccccc2NC(=O)[C@@H](Cc3cccc(c3)O)[NH3+] is 18."}", "/scratch/micpie/export/rdkit_features/test_101-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the molecule with SMILES COc1cccc(c1)CC(=O)Nc2cnccc2OC(F)F?\nAnswer: 4"} {"text":"Question: What is the number of hydrogen bond acceptor sites of the chemical with SMILES Cc1cccc(-c2nn(C(=S)Nc3cccc(C(N)=O)c3)cc2-c2ccnc3ccccc23)n1?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_24-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES c1cc(ncc1c2n[n-]nn2)NC(=O)c3c(nc4n3ccc(c4)Cl)Cl?\nAnswer: C14H7Cl2N8O-"} {"text":"Question: What is the chemical formula of the molecule with SMILES CCc1nnc(n1CCC[NH+]2CCC[C@@H]2C(=O)N(C)C)N3CCCC[C@H](C3)C?\nAnswer: C21H39N6O+"}", "/scratch/micpie/export/rdkit_features/valid_112-18.jsonl": "{"text":"Question: What is the number of aromatic bonds of the molecule with SMILES Cc1c(scn1)CCn2c(nnc2N3CCN(C(=O)C3)C)c4ccc(c(c4)OC)OC?\nAnswer: 16"} {"text":"Question: What is the aromatic bond count of the molecule with SMILES Cc1cccc(n1)\/C=C\/C(=O)N2CC[C@H]([C@H]2c3cccnc3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 12"}", "/scratch/micpie/export/rdkit_features/test_13-15.jsonl": "{"text":"Question: What is the number of heteroatoms of the molecule with SMILES CC(C)c1ccc(cc1Cl)C(=O)NCCc2ccc(c(c2)F)OC?\nAnswer: 5"} {"text":"Question: What is the number of heteroatoms of the chemical with 
SMILES c1ccc2c(c1)CCC3(C2)CCN(CC3)C(=O)OCC4CCCCC4?\nAnswer: 3"}", "/scratch/micpie/export/rdkit_features/valid_11-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the chemical with SMILES C[C@@H]1CC(C[C@H](O1)C)C(=O)N[C@@H]2CCC[C@H](C2)c3cccc(c3)Cl?\nAnswer: 4"} {"text":"Question: What is the number of heteroatoms of the molecule with SMILES C[C@H](c1ccc(c(c1)Cl)Cl)C(=O)N2CC([C@H]2C3CC3)(C)C?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/valid_105-3.jsonl": "{"text":"The heteroatom count of the compound with SMILES Cc1nc(cs1)c2ccc(s2)CCNC(=O)c3cccc(c3C#N)F is 7."} {"text":"The count of heteroatoms of the compound with SMILES c1cc2c(cc1N(=O)=O)c(cc([nH+]2)Cl)N3C[C@H]([C@H](C3)F)O is 8."}", "/scratch/micpie/export/rdkit_features/valid_109-23.jsonl": "{"text":"User: I want to create a chemical with a count of hydrogen bond donors of 3, a count of hydrogen bond acceptors of 5 and a Wildman-Crippen LogP value of 2.40.\nAssistant: Nice, do you have some additional that help me narrow down the search?\nUser: Yea, I want the molecular formula to be C22H26ClN3O4.\nAssistant: In that scenario, I the chemical with SMILES C[C@H](c1ccccc1)[C@H](CO)NC(=O)C(=O)Nc2cccc(c2N3CCOCC3)Cl."} {"text":"User: I want to synthesize a compound with a number of hydrogen bond donor sites of 3, a number of hydrogen bond acceptor sites of 5 and a LogP value computed using the Wildman-Crippen method of 2.09.\nAssistant: That is a very interesting question, do you have some additional limitations I should consider?\nUser: Yea, I want the chemical formula to be C21H19FN4O3S.\nAssistant: In that situation, I advise the compound with SMILES c1ccc(c(c1)C[C@@H]2C(=O)N[C@H](CS2)C(=O)Nc3cccc(c3)n4cc[nH]c4=O)F."}", "/scratch/micpie/export/rdkit_features/test_22-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the compound with SMILES Cc1cc(c(n1C)C)C(=O)N2CCC(CC2)[C@@H]3CCCCN3C(=O)[C@H](C)O?\nAnswer: 3"} {"text":"Question: What is the number of rotatable 
bonds of the compound with SMILES CC[NH+]1CCN(C[C@H]1C)c2nnc(n2CC[NH+]3CCOC[C@H]3C)C4CCCC4?\nAnswer: 6"}", "/scratch/micpie/export/rdkit_features/test_18-11.jsonl": "{"text":"User: I want to synthesize a molecule with a chemical formula of C20H25N3O9S.\nAssistant: Nice, do you have some additional constraints that I should consider?\nUser: Indeed, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptors to be 9.\nAssistant: In that scenario, I advise the molecule with SMILES CCS(=O)(=O)CC(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)c3ccc(c4c3OCO4)N(=O)=O."} {"text":"User: I want to make a molecule with a chemical formula of C21H28N2O3S2.\nAssistant: That's interesting, do you have some additional constraints?\nUser: Indeed, I want the number of hydrogen bond donors to be 1, the number of hydrogen bond acceptor sites to be 4.\nAssistant: Then, I suggest the molecule with SMILES Cc1cccc2c1CN(CC2)S(=O)(=O)c3ccc(s3)CCNC(=O)C(C)(C)C."}", "/scratch/micpie/export/rdkit_features/valid_117-0.jsonl": "{"text":"The molecular formula of the compound with SMILES CCOc1ccccc1C(=O)N2CCCC23CN(C3)C(=O)c4cc5ccccc5c(c4OC)Cl is C27H27ClN2O4."} {"text":"The chemical formula of the molecule with SMILES Cc1cc(ccc1[C@@H](C)[NH2+]CC2(CC2)CO)F is C14H21FNO+."}", "/scratch/micpie/export/rdkit_features/train_17-8.jsonl": "{"text":"The basic group count of the chemical with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F is 0."} {"text":"The basic group count of the molecule with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O is 0."}", "/scratch/micpie/export/rdkit_features/train_108-15.jsonl": "{"text":"Question: What is the heteroatom count of the molecule with SMILES CC(C)(C)NC(=O)CNC(=O)CSc1nnc(n1CC(F)(F)F)c2ccncc2?\nAnswer: 12"} {"text":"Question: What is the count of heteroatoms of the chemical with SMILES CC(C)[C@@H](C(=O)N1CCCC1)NC(=O)C(=O)Nc2ccc(cc2)C(=O)NC3CCCC3?\nAnswer: 8"}", 
"/scratch/micpie/export/rdkit_features/test_1-15.jsonl": "{"text":"Question: What is the count of heteroatoms of the compound with SMILES Cc1c(cc2ccccc2n1)C(=O)NC[C@@H](c3ccccc3Cl)[NH+]4CCOCC4?\nAnswer: 6"} {"text":"Question: What is the count of heteroatoms of the molecule with SMILES Cn1c2c(cn1)CC[C@H](C2)c3nc(no3)c4ccc(nc4)OCc5ccc(cc5)OC?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/test_17-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the molecule with SMILES C[C@H](CC=C)C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F?\nAnswer: 6"} {"text":"Question: What is the number of rotatable bonds of the compound with SMILES Cn1cc(nc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/train_113-8.jsonl": "{"text":"The basic group count of the compound with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N is 2."} {"text":"The basic group count of the chemical with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl is 0."}", "/scratch/micpie/export/rdkit_features/test_10-0.jsonl": "{"text":"The molecular formula of the molecule with SMILES C[C@@H](Cn1c(nnc1N2CC[C@H](C2)CC(C)C)C3CCCC3)C[NH+]4CCOCC4 is C23H42N5O+."} {"text":"The formula of the compound with SMILES CC1(CN(C1)C(=O)CCCCCc2ccc(cc2)F)CC(F)(F)F is C18H23F4NO."}", "/scratch/micpie/export/rdkit_features/train_13-6.jsonl": "{"text":"The aromatic bond count of the compound with SMILES CC(C)[C@H](CC1CCCCC1)C(=O)NCCc2ccc(c(c2)F)OC is 6."} {"text":"The aromatic bond count of the molecule with SMILES C[C@@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C is 11."}", "/scratch/micpie/export/rdkit_features/train_14-17.jsonl": "{"text":"Question: What is the number of rotatable bonds of the compound with SMILES C[C@H](c1ccccc1)n2cc(cn2)NC(=O)NC(C)(C)CC(C)(C)C?\nAnswer: 5"} {"text":"Question: What is the count of rotatable bonds of the molecule with SMILES 
c1cc(cc(c1)Br)[C@@H](C(=O)OCC2(CCC2)C#N)O?\nAnswer: 4"}", "/scratch/micpie/export/rdkit_features/test_118-11.jsonl": "{"text":"User: I want to analyze a compound with a molecular formula of C14H23FNO+.\nAssistant: Interesting, do you have some additional that help me narrow down the search?\nUser: Yes, I want the number of hydrogen bond donor sites to be 2, the number of hydrogen bond acceptor sites to be 1.\nAssistant: In that case, I the compound with SMILES CC[C@@H](CC[NH2+][C@@H](C)c1ccc(cc1C)F)O."} {"text":"User: I want to create a chemical with a molecular formula of C14H12FN5O5.\nAssistant: Interesting, do you have some additional that help me narrow down the search?\nUser: I want the number of hydrogen bond donor sites to be 3, the number of hydrogen bond acceptor sites to be 7.\nAssistant: Then, I recommend the chemical with SMILES COc1c(cc(cn1)NC(=O)c2cc(c(cc2N)F)N(=O)=O)C(=O)N."}", "/scratch/micpie/export/rdkit_features/train_106-15.jsonl": "{"text":"Question: What is the heteroatom count of the molecule with SMILES C[C@@H]1CCN(CCC1(F)F)c2ncc(c(n2)OC)N(=O)=O?\nAnswer: 9"} {"text":"Question: What is the count of heteroatoms of the compound with SMILES CN(C)c1nccc(n1)[C@H]2CN(CCO2)C(=O)Nc3ccc(c(c3)N4CCNC4=O)Cl?\nAnswer: 11"}", "/scratch/micpie/export/rdkit_features/valid_17-16.jsonl": "{"text":"Question: What is the count of rings of the molecule with SMILES CC(C)[C@H](C(=O)N1CC[C@H]2CO[C@H]([C@H]2C1)CNC(=O)C3(CCC3)C(F)(F)F)S?\nAnswer: 3"} {"text":"Question: What is the ring count of the chemical with SMILES Cc1c([nH]nn1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 5"}", "/scratch/micpie/export/rdkit_features/test_110-14.jsonl": "{"text":"Question: What is the number of hydrogen bond acceptors of the compound with SMILES Cc1cccc(c1)C[NH+]2CCN(CC2)C(=O)COC(=O)CCCNC(=O)c3ccsc3?\nAnswer: 5"} {"text":"Question: What is the number of hydrogen bond acceptors of the chemical with SMILES 
CCN(C)c1nnc(n1C[C@H]2CCN(C2)c3ccccc3OC)[C@H]4CCS(=O)(=O)C4?\nAnswer: 8"}", "/scratch/micpie/export/rdkit_features/train_17-21.jsonl": "{"text":"Question: What is the total sum of atomic polarizabilities of the compound with SMILES CC#CC[C@H](c1ccccc1)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)C(C)(F)F?\nAnswer: 64.87"} {"text":"Question: What is the sum of atomic polarizabilities of the chemical with SMILES Cn1c(cccc1=O)C(=O)N2CC[C@H]3CO[C@H]([C@H]3C2)CNC(=O)c4ccc(c5c4OCO5)N(=O)=O?\nAnswer: 67.30"}", "/scratch/micpie/export/rdkit_features/train_113-18.jsonl": "{"text":"Question: What is the aromatic bond count of the chemical with SMILES c1cc(cnc1)[C@@H]2[C@@H](CCN2C(=O)COC3CCCCC3)CNC(=O)CSC(=[NH2+])N?\nAnswer: 6"} {"text":"Question: What is the count of aromatic bonds of the molecule with SMILES Cc1ccc(c(c1)C)NC(=S)Nc2ccc(c(c2)S(=O)(=O)Nc3ccccc3Cl)Cl?\nAnswer: 18"}", "/scratch/micpie/export/rdkit_features/test_114-12.jsonl": "{"text":"Question: What is the chemical formula of the chemical with SMILES Cc1ccc(cc1S(=O)(=O)Nc2ccc(cc2)OC)NC(=S)Nc3ccc(cc3)C(C)C?\nAnswer: C24H27N3O3S2"} {"text":"Question: What is the formula of the compound with SMILES COc1cc2c3ccccc3oc2cc1NC(=O)c4cccc(c4)NC(=O)CCn5ccnc5?\nAnswer: C26H22N4O4"}", "/scratch/micpie/export/rdkit_features/test_119-2.jsonl": "{"text":"The count of hydrogen bond acceptors of the chemical with SMILES CN(C)c1cccc(n1)CNC(=O)c2cc(c(cc2N)F)N(=O)=O is 6."} {"text":"The count of hydrogen bond acceptors of the chemical with SMILES Cc1ccc(cc1)S(=O)(=O)N(Cc2ccc(cc2)Br)CC(=O)NCCCOC(C)C is 4."}", "/scratch/micpie/export/rdkit_features/train_3-6.jsonl": "{"text":"The number of aromatic bonds of the compound with SMILES COc1ccc2c(c1)CC[C@H](C2)NC(=O)c3c(cco3)Cn4c5ccccc5cc4O is 21."} {"text":"The aromatic bond count of the chemical with SMILES C[C@H](c1ccc2cc(ccc2c1)OC)OC(=O)C[NH+]3CCN(CC3)c4ccccn4 is 17."}", "/scratch/micpie/export/oqmd/train_0-17.jsonl": "{"text":"Task: Predict a property of a material based on the description 
of the material.\nDescription: Predict the PBE-computed band gap of the structure with composition ZrZnNiMo.\nResult: 0.0000 eV"} {"text":"Task: Predict a property of a material based on the description of the material.\nDescription: Predict the PAW-PBE-computed band gap of the material with composition MgTiZn2.\nResult: 0.0000 eV"}", "/scratch/micpie/export/oqmd/train_0-16.jsonl": "{"text":"User: I need to design material with a PAW-PBE-computed band gap of 0.0000 eV and a volume per atom computed using the PBE GGA functional of 16.0142 \\AA^3 \/ atom, what do you suggest?\nAssistant: Here is a compound with a PAW-PBE-computed band gap of 0.0000 eV and a volume per atom computed using the PBE GGA functional of 16.0142 \\AA^3 \/ atom: ZrZnNiMo."} {"text":"User: I need to design material with a band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 0.0000 eV and a volume per atom computed using PAW-PBE of 16.0000 \\AA^3 \/ atom, what do you suggest?\nAssistant: I found a material with a band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 0.0000 eV and a volume per atom computed using PAW-PBE of 16.0000 \\AA^3 \/ atom: MgTiZn2."}", "/scratch/micpie/export/oqmd/test_0-10.jsonl": "{"text":"Question: What is the formation energy per atom computed using the PBE GGA functional of the material with composition BiRuW2?\nAnswer: The formation energy per atom computed using the PBE GGA functional of the material with composition BiRuW2 is 0.9702 eV \/ atom."} {"text":"Question: What is the formation energy per atom computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of the material with composition MnZnFeAg?\nAnswer: The formation energy per atom computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of the material with composition MnZnFeAg is 0.2862 eV \/ atom."}", "/scratch/micpie/export/oqmd/valid_0-8.jsonl": 
"{"text":"Question: What is the band gap computed using the PBE GGA functional of the structure with composition Ge3O and space group 221?\nAnswer: The band gap computed using the PBE GGA functional of the structure with composition Ge3O and space group 221 is 0.0000 eV."} {"text":"Question: What is the PBE-computed band gap of the material with composition TmGaCu2 and space group 225?\nAnswer: The PBE-computed band gap of the material with composition TmGaCu2 and space group 225 is 0.0000 eV."}", "/scratch/micpie/export/oqmd/test_0-16.jsonl": "{"text":"User: I need to design material with a band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 0.0000 eV and a volume per atom computed using the PBE GGA functional of 17.4858 \\AA^3 \/ atom, what do you suggest?\nAssistant: I have found a structure with a band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 0.0000 eV and a volume per atom computed using the PBE GGA functional of 17.4858 \\AA^3 \/ atom: BiRuW2."} {"text":"User: I need a material with a band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 0.0000 eV and a volume per atom computed using the PBE GGA functional of 14.2336 \\AA^3 \/ atom, what do you recommend?\nAssistant: Here is a material with a band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 0.0000 eV and a volume per atom computed using the PBE GGA functional of 14.2336 \\AA^3 \/ atom: MnZnFeAg."}", "/scratch/micpie/export/oqmd/test_0-15.jsonl": "{"text":"User: I am looking for a material with a band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 0.0000 eV.\nAssistant: Is there anything else you need the compound to have?\nUser: No other needs\nAssistant: In that case, I recommend the material with composition BiRuW2 and space group 225."} {"text":"User: I am 
looking for a material with a band gap computed using PAW-PBE of 0.0000 eV.\nAssistant: Are there are requirements?\nUser: None\nAssistant: Okay, I suggest the material with composition MnZnFeAg and space group with the International Tables number 216."}", "/scratch/micpie/export/oqmd/train_0-8.jsonl": "{"text":"Question: What is the band gap computed using the PBE GGA functional of the compound with composition ZrZnNiMo and space group with the International Tables number 216?\nAnswer: The band gap computed using the PBE GGA functional of the compound with composition ZrZnNiMo and space group with the International Tables number 216 is 0.0000 eV."} {"text":"Question: What is the band gap computed using the PBE GGA functional of the compound with composition MgTiZn2 and space group 216?\nAnswer: The band gap computed using the PBE GGA functional of the compound with composition MgTiZn2 and space group 216 is 0.0000 eV."}", "/scratch/micpie/export/oqmd/test_0-5.jsonl": "{"text":"The compound with composition BiRuW2 crystallizes in the space group with the International Tables number 225."} {"text":"The compound with composition MnZnFeAg crystallizes in the space group with the International Tables number 216."}", "/scratch/micpie/export/oqmd/valid_0-9.jsonl": "{"text":"Question: What is the volume per atom computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of the compound with composition Ge3O?\nAnswer: The volume per atom computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of the compound with composition Ge3O is 15.8687 \\AA^3 \/ atom."} {"text":"Question: What is the volume per atom computed using PAW-PBE of the material with composition TmGaCu2?\nAnswer: The volume per atom computed using PAW-PBE of the material with composition TmGaCu2 is 15.9520 \\AA^3 \/ atom."}", "/scratch/micpie/export/oqmd/test_0-1.jsonl": "{"text":"The material with composition BiRuW2 and space group 225 has a band 
gap computed using PAW-PBE of 0.0000 eV."} {"text":"The material with composition MnZnFeAg and space group 216 has a band gap computed using PAW-PBE of 0.0000 eV."}", "/scratch/micpie/export/oqmd/test_0-18.jsonl": "{"text":"Task: Predict a property of a material based on the description of the material.\nDescription: Predict the band gap computed using the PBE GGA functional of the structure with composition BiRuW2 and space group with the International Tables number 225 and a volume per atom computed using PAW-PBE of 17.4858 \\AA^3 \/ atom.\nResult: 0.0000 eV"} {"text":"Task: Predict a property of a material based on the description of the material.\nDescription: Predict the PAW-PBE-computed band gap of the material with composition MnZnFeAg and space group with the International Tables number 216 and a volume per atom computed using PAW-PBE of 14.2336 \\AA^3 \/ atom.\nResult: 0.0000 eV"}", "/scratch/micpie/export/oqmd/valid_0-0.jsonl": "{"text":"The material with composition Ge3O has a PBE-computed band gap of 0.0000 eV."} {"text":"The structure with composition TmGaCu2 has a band gap computed using the PBE GGA functional of 0.0000 eV."}", "/scratch/micpie/export/oqmd/test_0-2.jsonl": "{"text":"The material with composition BiRuW2 has a band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 0.0000 eV and a energy per atom computed using PAW-PBE of -8.8201 eV \/ atom."} {"text":"The structure with composition MnZnFeAg has a band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 0.0000 eV and a energy per atom computed using PAW-PBE of -5.0695 eV \/ atom."}", "/scratch/micpie/export/oqmd/valid_0-10.jsonl": "{"text":"Question: What is the formation energy per atom computed using PAW-PBE of the material with composition Ge3O?\nAnswer: The formation energy per atom computed using PAW-PBE of the material with composition Ge3O is 0.2157 eV \/ atom."} {"text":"Question: What is 
the formation energy per atom computed using the PBE GGA functional of the structure with composition TmGaCu2?\nAnswer: The formation energy per atom computed using the PBE GGA functional of the structure with composition TmGaCu2 is -0.3508 eV \/ atom."}", "/scratch/micpie/export/oqmd/train_0-6.jsonl": "{"text":"The structure with composition ZrZnNiMo crystallizes in the space group with the International Tables number 216 and has a band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 0.0000 eV."} {"text":"The structure with composition MgTiZn2 occurs in the space group 216 and has a band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 0.0000 eV."}", "/scratch/micpie/export/oqmd/valid_0-6.jsonl": "{"text":"The material with composition Ge3O occurs in the space group 221 and has a band gap computed using PAW-PBE of 0.0000 eV."} {"text":"The material with composition TmGaCu2 can be found in the space group 225 and has a band gap computed using PAW-PBE of 0.0000 eV."}", "/scratch/micpie/export/oqmd/test_0-9.jsonl": "{"text":"Question: What is the volume per atom computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of the material with composition BiRuW2?\nAnswer: The volume per atom computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of the material with composition BiRuW2 is 17.4858 \\AA^3 \/ atom."} {"text":"Question: What is the volume per atom computed using PAW-PBE of the compound with composition MnZnFeAg?\nAnswer: The volume per atom computed using PAW-PBE of the compound with composition MnZnFeAg is 14.2336 \\AA^3 \/ atom."}", "/scratch/micpie/export/oqmd/test_0-0.jsonl": "{"text":"The compound with composition BiRuW2 has a band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 0.0000 eV."} {"text":"The material with composition MnZnFeAg has a band 
gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 0.0000 eV."}", "/scratch/micpie/export/oqmd/valid_0-16.jsonl": "{"text":"User: I want to design material with a band gap computed using the PBE GGA functional of 0.0000 eV and a volume per atom computed using the PBE GGA functional of 15.8687 \\AA^3 \/ atom, what do you suggest?\nAssistant: I have found a structure with a band gap computed using the PBE GGA functional of 0.0000 eV and a volume per atom computed using the PBE GGA functional of 15.8687 \\AA^3 \/ atom: Ge3O."} {"text":"User: I need to design material with a PBE-computed band gap of 0.0000 eV and a volume per atom computed using the PBE GGA functional of 15.9520 \\AA^3 \/ atom, what should I do?\nAssistant: Here is a structure with a PBE-computed band gap of 0.0000 eV and a volume per atom computed using the PBE GGA functional of 15.9520 \\AA^3 \/ atom: TmGaCu2."}", "/scratch/micpie/export/oqmd/valid_0-7.jsonl": "{"text":"Question: What is the band gap computed using PAW-PBE of the structure with composition Ge3O?\nAnswer: The band gap computed using PAW-PBE of the structure with composition Ge3O is 0.0000 eV."} {"text":"Question: What is the PAW-PBE-computed band gap of the compound with composition TmGaCu2?\nAnswer: The PAW-PBE-computed band gap of the compound with composition TmGaCu2 is 0.0000 eV."}", "/scratch/micpie/export/oqmd/test_0-3.jsonl": "{"text":"The structure with composition BiRuW2 and space group 225 has a band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 0.0000 eV and a magnetization per atom computed using the PBE GGA functional of 0.0000 \\mu B."} {"text":"The compound with composition MnZnFeAg and space group 216 has a PBE-computed band gap of 0.0000 eV and a magnetization per atom computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 1.3474 \\mu B."}", "/scratch/micpie/export/oqmd/valid_0-11.jsonl": 
"{"text":"Question: What is the magnetization per atom computed using the PBE GGA functional of the material with composition Ge3O?\nAnswer: The magnetization per atom computed using the PBE GGA functional of the material with composition Ge3O is 0.0000 \\mu B."} {"text":"Question: What is the magnetization per atom computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of the material with composition TmGaCu2?\nAnswer: The magnetization per atom computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of the material with composition TmGaCu2 is 0.0000 \\mu B."}", "/scratch/micpie/export/oqmd/train_0-0.jsonl": "{"text":"The compound with composition ZrZnNiMo has a PBE-computed band gap of 0.0000 eV."} {"text":"The compound with composition MgTiZn2 has a band gap computed using the PBE GGA functional of 0.0000 eV."}", "/scratch/micpie/export/oqmd/test_0-6.jsonl": "{"text":"The compound with composition BiRuW2 crystallizes in the space group 225 and has a band gap computed using the PBE GGA functional of 0.0000 eV."} {"text":"The material with composition MnZnFeAg can be found in the space group 216 and has a PBE-computed band gap of 0.0000 eV."}", "/scratch/micpie/export/oqmd/train_0-10.jsonl": "{"text":"Question: What is the formation energy per atom computed using the PBE GGA functional of the material with composition ZrZnNiMo?\nAnswer: The formation energy per atom computed using the PBE GGA functional of the material with composition ZrZnNiMo is 0.1579 eV \/ atom."} {"text":"Question: What is the formation energy per atom computed using the PBE GGA functional of the compound with composition MgTiZn2?\nAnswer: The formation energy per atom computed using the PBE GGA functional of the compound with composition MgTiZn2 is 0.0511 eV \/ atom."}", "/scratch/micpie/export/oqmd/train_0-3.jsonl": "{"text":"The compound with composition ZrZnNiMo and space group 216 has a band gap computed using the PBE GGA 
functional of 0.0000 eV and a magnetization per atom computed using the PBE GGA functional of 0.4525 \\mu B."} {"text":"The structure with composition MgTiZn2 and space group with the International Tables number 216 has a PAW-PBE-computed band gap of 0.0000 eV and a magnetization per atom computed using the PBE GGA functional of 0.0050 \\mu B."}", "/scratch/micpie/export/oqmd/train_0-12.jsonl": "{"text":"User: I am looking for a material with a band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 0.0000 eV.\nAssistant: Do you have other constraints?\nUser: I would like it to have a volume per atom computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 16.0142 \\AA^3 \/ atom.\nAssistant: Do you have other constraints?\nUser: I would like it to crystallize in the space group with the International Tables number 216.\nAssistant: Okay, I recommend the material with composition ZrZnNiMo."} {"text":"User: I am looking for a compound with a PBE-computed band gap of 0.0000 eV.\nAssistant: Are there are requirements?\nUser: I would like it to have a volume per atom computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 16.0000 \\AA^3 \/ atom.\nAssistant: Are there are requirements?\nUser: I would like it to crystallize in the space group 216.\nAssistant: In this case, I suggest the compound with composition MgTiZn2."}", "/scratch/micpie/export/oqmd/test_0-13.jsonl": "{"text":"User: I am looking for a structure with a PAW-PBE-computed band gap of 0.0000 eV.\nAssistant: Is there anything else you need the compound to have?\nUser: I would like it to crystallize in the space group 225.\nAssistant: In this case, I recommend the structure with composition BiRuW2."} {"text":"User: I am looking for a compound with a PAW-PBE-computed band gap of 0.0000 eV.\nAssistant: Do you have other constraints?\nUser: I would like it to crystallize in the space group with the 
International Tables number 216.\nAssistant: In this case, I suggest the compound with composition MnZnFeAg."}", "/scratch/micpie/export/oqmd/valid_0-2.jsonl": "{"text":"The structure with composition Ge3O has a band gap computed using PAW-PBE of 0.0000 eV and a energy per atom computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of -4.3828 eV \/ atom."} {"text":"The structure with composition TmGaCu2 has a PAW-PBE-computed band gap of 0.0000 eV and a energy per atom computed using PAW-PBE of -4.0855 eV \/ atom."}", "/scratch/micpie/export/oqmd/train_0-14.jsonl": "{"text":"User: I need a structure with a volume per atom computed using PAW-PBE of 16.0142 \\AA^3 \/ atom.\nAssistant: Do you have other constraints?\nUser: I would like it to occur in the space group with the International Tables number 216.\nAssistant: In that case, I recommend the structure with composition ZrZnNiMo."} {"text":"User: I need a material with a volume per atom computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 16.0000 \\AA^3 \/ atom.\nAssistant: Do you have other constraints?\nUser: I would like it to be found in the space group with the International Tables number 216.\nAssistant: In this case, I recommend the material with composition MgTiZn2."}", "/scratch/micpie/export/oqmd/valid_0-1.jsonl": "{"text":"The structure with composition Ge3O and space group 221 has a band gap computed using PAW-PBE of 0.0000 eV."} {"text":"The structure with composition TmGaCu2 and space group with the International Tables number 225 has a band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 0.0000 eV."}", "/scratch/micpie/export/oqmd/valid_0-13.jsonl": "{"text":"User: I need a structure with a band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 0.0000 eV.\nAssistant: Do you have other constraints?\nUser: I would like it to occur in 
the space group with the International Tables number 221.\nAssistant: In this case, I recommend the structure with composition Ge3O."} {"text":"User: I am looking for a structure with a band gap computed using PAW-PBE of 0.0000 eV.\nAssistant: Do you have other constraints?\nUser: I would like it to occur in the space group 225.\nAssistant: Great, I recommend the structure with composition TmGaCu2."}", "/scratch/micpie/export/oqmd/valid_0-5.jsonl": "{"text":"The material with composition Ge3O crystallizes in the space group with the International Tables number 221."} {"text":"The compound with composition TmGaCu2 crystallizes in the space group with the International Tables number 225."}", "/scratch/micpie/export/oqmd/train_0-15.jsonl": "{"text":"User: I need a structure with a band gap computed using PAW-PBE of 0.0000 eV.\nAssistant: Are there are requirements?\nUser: No other constraints\nAssistant: Okay, I suggest the structure with composition ZrZnNiMo and space group 216."} {"text":"User: I am looking for a compound with a band gap computed using PAW-PBE of 0.0000 eV.\nAssistant: Are there are requirements?\nUser: No other needs\nAssistant: In that case, I suggest the compound with composition MgTiZn2 and space group 216."}", "/scratch/micpie/export/oqmd/valid_0-4.jsonl": "{"text":"The structure with composition Ge3O has a band gap computed using PAW-PBE of 0.0000 eV and a volume per atom computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 15.8687 \\AA^3 \/ atom."} {"text":"The structure with composition TmGaCu2 has a band gap computed using the PBE GGA functional of 0.0000 eV and a volume per atom computed using the PBE GGA functional of 15.9520 \\AA^3 \/ atom."}", "/scratch/micpie/export/oqmd/train_0-5.jsonl": "{"text":"The material with composition ZrZnNiMo occurs in the space group 216."} {"text":"The compound with composition MgTiZn2 crystallizes in the space group with the International Tables number 216."}", 
"/scratch/micpie/export/oqmd/valid_0-15.jsonl": "{"text":"User: I am looking for a compound with a band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 0.0000 eV.\nAssistant: Are there are requirements?\nUser: No other needs\nAssistant: In that case, I suggest the compound with composition Ge3O and space group with the International Tables number 221."} {"text":"User: I am looking for a material with a band gap computed using PAW-PBE of 0.0000 eV.\nAssistant: Are there are requirements?\nUser: No other needs\nAssistant: In that case, I suggest the material with composition TmGaCu2 and space group 225."}", "/scratch/micpie/export/oqmd/valid_0-12.jsonl": "{"text":"User: I am looking for a material with a PBE-computed band gap of 0.0000 eV.\nAssistant: Is there anything else you need the compound to have?\nUser: I would like it to have a volume per atom computed using PAW-PBE of 15.8687 \\AA^3 \/ atom.\nAssistant: Is there anything else you need the compound to have?\nUser: I would like it to crystallize in the space group 221.\nAssistant: Okay, I recommend the material with composition Ge3O."} {"text":"User: I am looking for a compound with a PAW-PBE-computed band gap of 0.0000 eV.\nAssistant: Are there are requirements?\nUser: I would like it to have a volume per atom computed using the PBE GGA functional of 15.9520 \\AA^3 \/ atom.\nAssistant: Are there are requirements?\nUser: I would like it to be found in the space group with the International Tables number 225.\nAssistant: In that case, I suggest the compound with composition TmGaCu2."}", "/scratch/micpie/export/oqmd/valid_0-18.jsonl": "{"text":"Task: Predict a property of a material based on the description of the material.\nDescription: Predict the band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of the compound with composition Ge3O and space group 221 and a volume per atom computed using the PBE GGA functional of 
15.8687 \\AA^3 \/ atom.\nResult: 0.0000 eV"} {"text":"Task: Predict a property of a material based on the description of the material.\nDescription: Predict the PAW-PBE-computed band gap of the structure with composition TmGaCu2 and space group with the International Tables number 225 and a volume per atom computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 15.9520 \\AA^3 \/ atom.\nResult: 0.0000 eV"}", "/scratch/micpie/export/oqmd/train_0-2.jsonl": "{"text":"The material with composition ZrZnNiMo has a band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 0.0000 eV and a energy per atom computed using PAW-PBE of -6.3990 eV \/ atom."} {"text":"The material with composition MgTiZn2 has a band gap computed using PAW-PBE of 0.0000 eV and a energy per atom computed using PAW-PBE of -2.8920 eV \/ atom."}", "/scratch/micpie/export/oqmd/test_0-11.jsonl": "{"text":"Question: What is the magnetization per atom computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of the structure with composition BiRuW2?\nAnswer: The magnetization per atom computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of the structure with composition BiRuW2 is 0.0000 \\mu B."} {"text":"Question: What is the magnetization per atom computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of the compound with composition MnZnFeAg?\nAnswer: The magnetization per atom computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of the compound with composition MnZnFeAg is 1.3474 \\mu B."}", "/scratch/micpie/export/oqmd/train_0-7.jsonl": "{"text":"Question: What is the band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of the material with composition ZrZnNiMo?\nAnswer: The band gap computed using DFT (with the PAW method as implemented in VASP) 
using the PBE functional of the material with composition ZrZnNiMo is 0.0000 eV."} {"text":"Question: What is the band gap computed using the PBE GGA functional of the structure with composition MgTiZn2?\nAnswer: The band gap computed using the PBE GGA functional of the structure with composition MgTiZn2 is 0.0000 eV."}", "/scratch/micpie/export/oqmd/test_0-17.jsonl": "{"text":"Task: Predict a property of a material based on the description of the material.\nDescription: Predict the PAW-PBE-computed band gap of the material with composition BiRuW2.\nResult: 0.0000 eV"} {"text":"Task: Predict a property of a material based on the description of the material.\nDescription: Predict the band gap computed using the PBE GGA functional of the structure with composition MnZnFeAg.\nResult: 0.0000 eV"}", "/scratch/micpie/export/oqmd/train_0-11.jsonl": "{"text":"Question: What is the magnetization per atom computed using PAW-PBE of the compound with composition ZrZnNiMo?\nAnswer: The magnetization per atom computed using PAW-PBE of the compound with composition ZrZnNiMo is 0.4525 \\mu B."} {"text":"Question: What is the magnetization per atom computed using PAW-PBE of the compound with composition MgTiZn2?\nAnswer: The magnetization per atom computed using PAW-PBE of the compound with composition MgTiZn2 is 0.0050 \\mu B."}", "/scratch/micpie/export/oqmd/train_0-1.jsonl": "{"text":"The material with composition ZrZnNiMo and space group with the International Tables number 216 has a band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 0.0000 eV."} {"text":"The material with composition MgTiZn2 and space group with the International Tables number 216 has a band gap computed using PAW-PBE of 0.0000 eV."}", "/scratch/micpie/export/oqmd/train_0-13.jsonl": "{"text":"User: I am looking for a material with a band gap computed using PAW-PBE of 0.0000 eV.\nAssistant: Are there are requirements?\nUser: I would like it to crystallize in the 
space group with the International Tables number 216.\nAssistant: In that case, I recommend the material with composition ZrZnNiMo."} {"text":"User: I am looking for a material with a band gap computed using the PBE GGA functional of 0.0000 eV.\nAssistant: Are there are requirements?\nUser: I would like it to be found in the space group 216.\nAssistant: Okay, I recommend the material with composition MgTiZn2."}", "/scratch/micpie/export/oqmd/train_0-4.jsonl": "{"text":"The structure with composition ZrZnNiMo has a band gap computed using the PBE GGA functional of 0.0000 eV and a volume per atom computed using the PBE GGA functional of 16.0142 \\AA^3 \/ atom."} {"text":"The material with composition MgTiZn2 has a PAW-PBE-computed band gap of 0.0000 eV and a volume per atom computed using PAW-PBE of 16.0000 \\AA^3 \/ atom."}", "/scratch/micpie/export/oqmd/test_0-7.jsonl": "{"text":"Question: What is the band gap computed using PAW-PBE of the material with composition BiRuW2?\nAnswer: The band gap computed using PAW-PBE of the material with composition BiRuW2 is 0.0000 eV."} {"text":"Question: What is the band gap computed using PAW-PBE of the compound with composition MnZnFeAg?\nAnswer: The band gap computed using PAW-PBE of the compound with composition MnZnFeAg is 0.0000 eV."}", "/scratch/micpie/export/oqmd/train_0-9.jsonl": "{"text":"Question: What is the volume per atom computed using the PBE GGA functional of the material with composition ZrZnNiMo?\nAnswer: The volume per atom computed using the PBE GGA functional of the material with composition ZrZnNiMo is 16.0142 \\AA^3 \/ atom."} {"text":"Question: What is the volume per atom computed using the PBE GGA functional of the structure with composition MgTiZn2?\nAnswer: The volume per atom computed using the PBE GGA functional of the structure with composition MgTiZn2 is 16.0000 \\AA^3 \/ atom."}", "/scratch/micpie/export/oqmd/train_0-18.jsonl": "{"text":"Task: Predict a property of a material based on the 
description of the material.\nDescription: Predict the PAW-PBE-computed band gap of the material with composition ZrZnNiMo and space group with the International Tables number 216 and a volume per atom computed using PAW-PBE of 16.0142 \\AA^3 \/ atom.\nResult: 0.0000 eV"} {"text":"Task: Predict a property of a material based on the description of the material.\nDescription: Predict the band gap computed using PAW-PBE of the structure with composition MgTiZn2 and space group 216 and a volume per atom computed using PAW-PBE of 16.0000 \\AA^3 \/ atom.\nResult: 0.0000 eV"}", "/scratch/micpie/export/oqmd/valid_0-3.jsonl": "{"text":"The structure with composition Ge3O and space group with the International Tables number 221 has a band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 0.0000 eV and a magnetization per atom computed using PAW-PBE of 0.0000 \\mu B."} {"text":"The structure with composition TmGaCu2 and space group with the International Tables number 225 has a band gap computed using the PBE GGA functional of 0.0000 eV and a magnetization per atom computed using PAW-PBE of 0.0000 \\mu B."}", "/scratch/micpie/export/oqmd/test_0-8.jsonl": "{"text":"Question: What is the band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of the structure with composition BiRuW2 and space group 225?\nAnswer: The band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of the structure with composition BiRuW2 and space group 225 is 0.0000 eV."} {"text":"Question: What is the PAW-PBE-computed band gap of the compound with composition MnZnFeAg and space group with the International Tables number 216?\nAnswer: The PAW-PBE-computed band gap of the compound with composition MnZnFeAg and space group with the International Tables number 216 is 0.0000 eV."}", "/scratch/micpie/export/oqmd/test_0-14.jsonl": "{"text":"User: I need a material with a volume per 
atom computed using the PBE GGA functional of 17.4858 \\AA^3 \/ atom.\nAssistant: Do you have other constraints?\nUser: I would like it to occur in the space group 225.\nAssistant: In this case, I suggest the material with composition BiRuW2."} {"text":"User: I need a compound with a volume per atom computed using the PBE GGA functional of 14.2336 \\AA^3 \/ atom.\nAssistant: Are there are requirements?\nUser: I would like it to be found in the space group with the International Tables number 216.\nAssistant: Great, I recommend the compound with composition MnZnFeAg."}", "/scratch/micpie/export/oqmd/valid_0-17.jsonl": "{"text":"Task: Predict a property of a material based on the description of the material.\nDescription: Predict the PBE-computed band gap of the material with composition Ge3O.\nResult: 0.0000 eV"} {"text":"Task: Predict a property of a material based on the description of the material.\nDescription: Predict the band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of the material with composition TmGaCu2.\nResult: 0.0000 eV"}", "/scratch/micpie/export/oqmd/valid_0-14.jsonl": "{"text":"User: I am looking for a structure with a volume per atom computed using the PBE GGA functional of 15.8687 \\AA^3 \/ atom.\nAssistant: Are there are requirements?\nUser: I would like it to occur in the space group with the International Tables number 221.\nAssistant: In this case, I recommend the structure with composition Ge3O."} {"text":"User: I need a structure with a volume per atom computed using PAW-PBE of 15.9520 \\AA^3 \/ atom.\nAssistant: Do you have other constraints?\nUser: I would like it to occur in the space group 225.\nAssistant: Great, I suggest the structure with composition TmGaCu2."}", "/scratch/micpie/export/oqmd/test_0-4.jsonl": "{"text":"The material with composition BiRuW2 has a PAW-PBE-computed band gap of 0.0000 eV and a volume per atom computed using DFT (with the PAW method as implemented in VASP) 
using the PBE functional of 17.4858 \\AA^3 \/ atom."} {"text":"The material with composition MnZnFeAg has a band gap computed using PAW-PBE of 0.0000 eV and a volume per atom computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 14.2336 \\AA^3 \/ atom."}", "/scratch/micpie/export/oqmd/test_0-12.jsonl": "{"text":"User: I am looking for a structure with a band gap computed using PAW-PBE of 0.0000 eV.\nAssistant: Are there are requirements?\nUser: I would like it to have a volume per atom computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 17.4858 \\AA^3 \/ atom.\nAssistant: Are there are requirements?\nUser: I would like it to be found in the space group 225.\nAssistant: Okay, I recommend the structure with composition BiRuW2."} {"text":"User: I am looking for a structure with a band gap computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 0.0000 eV.\nAssistant: Do you have other constraints?\nUser: I would like it to have a volume per atom computed using DFT (with the PAW method as implemented in VASP) using the PBE functional of 14.2336 \\AA^3 \/ atom.\nAssistant: Do you have other constraints?\nUser: I would like it to crystallize in the space group with the International Tables number 216.\nAssistant: Okay, I recommend the structure with composition MnZnFeAg."}", "/scratch/micpie/export/drug_disease_pathway_protein/valid_0-0.jsonl": "{"text":"The drug Mitiglinide is indicated for the Type 2 diabetes mellitus disease and modulates the TGF-beta signaling pathway pathway. The pathway TGF-beta signaling pathway contains the protein PP2A-alpha."} {"text":"The drug NC=NC=O)NC=N6))[C@H]C[C@H]O)[C@@H]CO))O5 is indicated for the Myelodysplastic syndrome disease and modulates the Spliceosome pathway. 
The pathway Spliceosome contains the protein Pre-mRNA-splicing factor SRP40."}", "/scratch/micpie/export/drug_disease_pathway_protein/test_0-0.jsonl": "{"text":"The drug [H][C@@]CC[C@H]O)[C@@]5C)CC[C@][H])C=CCC[C@@]%136[H]))))C=CO)C=C6 is indicated for the Premature ovarian failure disease and modulates the Ovarian steroidogenesis pathway. The pathway Ovarian steroidogenesis contains the protein LH-B."} {"text":"The drug Acetylsalicylic acid is indicated for the Ankylosing spondylitis disease and modulates the Antigen processing and presentation pathway. The pathway Antigen processing and presentation contains the protein LAP-1."}", "/scratch/micpie/export/drug_disease_pathway_protein/train_0-0.jsonl": "{"text":"The drug Argatroban is indicated for the Antithrombin III deficiency disease and modulates the Complement and coagulation cascades pathway. The pathway Complement and coagulation cascades contains the protein C3 convertase activator."} {"text":"The drug Phylloquinone is indicated for the Combined deficiency of vitamin K-dependent clotting factors disease and modulates the Ubiquinone and other terpenoid-quinone biosynthesis pathway. 
The pathway Ubiquinone and other terpenoid-quinone biosynthesis contains the protein Vitamin K gamma glutamyl carboxylase."}", "/scratch/micpie/export/RedDB/train_0-28.jsonl": "{"text":"Task: Please create a molecular species with the InChI based on the text description below.\nDescription: It has an ML-predicted aqueous solubility -0.241 logS and a cavity formation energy at the PBE level of theory of 4.054 kT.\n Result: FGTVOMHLQDTQAT-UHFFFAOYSA-N"} {"text":"Task: Please generate a chemical compound with the InChI based on the text description.\nDescription: It has an ML-predicted aqueous solubility -3.009 logS and a cavity formation energy at the PBE level of theory of 6.746 kT.\n Result: ZZYJBWDQNXOGRH-UHFFFAOYSA-N"}", "/scratch/micpie/export/RedDB/train_0-17.jsonl": "{"text":"The molecular species with the InChI representation of FGTVOMHLQDTQAT-UHFFFAOYSA-N has a cavity formation energy at the PBE level of theory of 4.054 kT."} {"text":"The chemical compound with the InChI representation of ZZYJBWDQNXOGRH-UHFFFAOYSA-N has a cavity formation energy at the PBE level of theory of 6.746 kT."}", "/scratch/micpie/export/RedDB/train_0-16.jsonl": "{"text":"The molecule with the InChI representation of FGTVOMHLQDTQAT-UHFFFAOYSA-N has a solvent-accessible surface area of 269.306 \\AA^2."} {"text":"The chemical compound with the InChI representation of ZZYJBWDQNXOGRH-UHFFFAOYSA-N has a solvent-accessible surface area of 597.304 \\AA^2."}", "/scratch/micpie/export/RedDB/test_0-10.jsonl": "{"text":"The chemical compound with the DeepSMILES O=CCC=O)CF)=C5F has a nuclear repulsion energy at the PBE level of theory of 426.524 Hartree."} {"text":"The compound with the InChI representation of InChI=1S\/C16H12N2O13S3\/c19-13-6-2-1-3-8(32(23,24)25)10(6)15(21)17(13)18-14(20)7-4-5-9(33(26,27)28)12(34(29,30)31)11(7)16(18)22\/h1-5,19-22H,(H,23,24,25)(H,26,27,28)(H,29,30,31) has a nuclear repulsion energy at the PBE level of theory of 2038.509 Hartree."}", 
"/scratch/micpie/export/RedDB/valid_0-8.jsonl": "{"text":"The molecule with the DeepSMILES representation of O=CCC=O)C=C5 has a aqueous phase molecular energy at the PBE level of theory of -342.998 Hartree."} {"text":"The chemical with the SELFIES [C][=C][C][Branch1][C][O][=C][C][Branch1][Ring2][C][Ring1][#Branch1][=C][Branch1][C][O][N][Branch1][Branch1][C][=Ring1][Branch1][O][N][Branch1][Ring1][C][O][C][Branch1][C][O][=C][Branch1][Ring2][C][=Ring1][=Branch1][C][Branch1][C][O][=C][Branch1][C][O][C][=C][Ring1][#Branch1][O] has a aqueous phase molecular energy at the PBE level of theory of -1326.756 Hartree."}", "/scratch/micpie/export/RedDB/test_0-22.jsonl": "{"text":"The compound with the InChI representation of FXAKRDFRFAQBST-UHFFFAOYSA-N has a aqueous phase LUMO energy at the PBE level of theory of -0.151 Hartree."} {"text":"The chemical compound with the InChI YWJNWARKAJINTC-UHFFFAOYSA-N has a aqueous phase LUMO energy at the PBE level of theory of -0.078 Hartree."}", "/scratch/micpie/export/RedDB/test_0-16.jsonl": "{"text":"The chemical compound with the InChI representation of FXAKRDFRFAQBST-UHFFFAOYSA-N has a solvent-accessible surface area of 276.867 \\AA^2."} {"text":"The chemical with the InChI representation of YWJNWARKAJINTC-UHFFFAOYSA-N has a solvent-accessible surface area of 525.692 \\AA^2."}", "/scratch/micpie/export/RedDB/test_0-15.jsonl": "{"text":"The chemical compound with the InChI representation of FXAKRDFRFAQBST-UHFFFAOYSA-N has a chemical reaction field energy of -14.254 kT."} {"text":"The chemical compound with the InChI representation of YWJNWARKAJINTC-UHFFFAOYSA-N has a chemical reaction field energy of -35.555 kT."}", "/scratch/micpie/export/RedDB/train_0-27.jsonl": "{"text":"Task: Please give me a compound with the canonical SMILES based on the text description below.\nDescription: It has an ML-predicted aqueous solubility -0.241 logS and a cavity formation energy at the PBE level of theory of 4.054 kT.\n Result: O=C1CC(=O)C(O)=C1O"} 
{"text":"Task: Please give me a chemical with the SMILES based on the text description below.\nDescription: It has an ML-predicted aqueous solubility -3.009 logS and a cavity formation energy at the PBE level of theory of 6.746 kT.\n Result: c1c(N)c(N)cc(c12)c(O)n(c2O)-n(c3O)c(O)c(c34)c(N)c(N)c(N)c4N"}", "/scratch/micpie/export/RedDB/train_0-8.jsonl": "{"text":"The molecular species with the canonical SMILES representation of O=C1CC(=O)C(O)=C1O has a aqueous phase molecular energy at the PBE level of theory of -493.338 Hartree."} {"text":"The compound with the canonical SMILES Nc1cc2c(O)n(-n3c(O)c4c(N)c(N)c(N)c(N)c4c3O)c(O)c2cc1N has a aqueous phase molecular energy at the PBE level of theory of -1468.532 Hartree."}", "/scratch/micpie/export/RedDB/test_0-5.jsonl": "{"text":"The molecular species with the canonical SMILES representation of O=C1CC(=O)C(F)=C1F has a gas-phase molecular energy at the PBE level of theory of -541.285 Hartree."} {"text":"The compound with the SMILES c1ccc(S(=O)(=O)O)c(c12)c(O)n(c2O)-n(c3O)c(O)c(c34)c(S(=O)(=O)O)c(cc4)S(=O)(=O)O has a gas-phase molecular energy at the PBE level of theory of -1136.677 Hartree."}", "/scratch/micpie/export/RedDB/valid_0-25.jsonl": "{"text":"The molecule with the InChI MCFZBCCYOPSZLG-UHFFFAOYSA-N has a optimized gas-phase HOMO energy at the PBE level of theory of -0.221 Hartree."} {"text":"The molecular species with the InChI representation of ZYYXVQMXUNSARX-UHFFFAOYSA-N has a optimized gas-phase HOMO energy at the PBE level of theory of -0.166 Hartree."}", "/scratch/micpie/export/RedDB/valid_0-9.jsonl": "{"text":"The molecular species with the InChI representation of InChI=1S\/C5H4O2\/c6-4-1-2-5(7)3-4\/h1-2H,3H2 has a aqueous phase LUMO energy at the PBE level of theory of -0.145 Hartree."} {"text":"The chemical with the SMILES representation of c1cc(O)cc(c12)c(O)n(c2O)-n(c3O)c(O)c(c34)c(O)c(O)cc4O has a aqueous phase LUMO energy at the PBE level of theory of -0.075 Hartree."}", 
"/scratch/micpie/export/RedDB/test_0-26.jsonl": "{"text":"The chemical with the InChI FXAKRDFRFAQBST-UHFFFAOYSA-N has a optimized gas-phase LUMO energy calculated with DFT at the PBE level of theory of -0.154 Hartree."} {"text":"The molecular species with the InChI YWJNWARKAJINTC-UHFFFAOYSA-N has a optimized gas-phase LUMO energy calculated with DFT at the PBE level of theory of -0.149 Hartree."}", "/scratch/micpie/export/RedDB/test_0-19.jsonl": "{"text":"The molecule with the InChI FXAKRDFRFAQBST-UHFFFAOYSA-N has a gaseous phase HOMO energy at the PBE level of theory of -0.243 Hartree."} {"text":"The molecular species with the InChI representation of YWJNWARKAJINTC-UHFFFAOYSA-N has a gaseous phase HOMO energy at the PBE level of theory of -0.141 Hartree."}", "/scratch/micpie/export/RedDB/valid_0-28.jsonl": "{"text":"Task: Please create a chemical compound with the InChI based on the description below.\nDescription: It has an ML-predicted aqueous solubility -0.487 logS and a cavity formation energy at the PBE level of theory of 3.942 kT.\n Result: MCFZBCCYOPSZLG-UHFFFAOYSA-N"} {"text":"Task: Please give me a chemical compound with the InChI based on the description.\nDescription: It has an ML-predicted aqueous solubility -3.256 logS and a cavity formation energy at the PBE level of theory of 6.235 kT.\n Result: ZYYXVQMXUNSARX-UHFFFAOYSA-N"}", "/scratch/micpie/export/RedDB/valid_0-24.jsonl": "{"text":"The compound with the InChI representation of MCFZBCCYOPSZLG-UHFFFAOYSA-N has a optimized gas-phase molecular energy at the PBE level of theory of -342.983 Hartree."} {"text":"The chemical compound with the InChI representation of ZYYXVQMXUNSARX-UHFFFAOYSA-N has a optimized gas-phase molecular energy at the PBE level of theory of -1591.280 Hartree."}", "/scratch/micpie/export/RedDB/train_0-24.jsonl": "{"text":"The chemical with the InChI FGTVOMHLQDTQAT-UHFFFAOYSA-N has a optimized gas-phase molecular energy at the PBE level of theory of -493.318 Hartree."} {"text":"The 
molecular species with the InChI representation of ZZYJBWDQNXOGRH-UHFFFAOYSA-N has a optimized gas-phase molecular energy at the PBE level of theory of -1779.682 Hartree."}", "/scratch/micpie/export/RedDB/test_0-1.jsonl": "{"text":"The chemical with the DeepSMILES O=CCC=O)CF)=C5F has a molecular surface area of 129.337 \\AA^2."} {"text":"The chemical compound with the InChI representation of InChI=1S\/C16H12N2O13S3\/c19-13-6-2-1-3-8(32(23,24)25)10(6)15(21)17(13)18-14(20)7-4-5-9(33(26,27)28)12(34(29,30)31)11(7)16(18)22\/h1-5,19-22H,(H,23,24,25)(H,26,27,28)(H,29,30,31) has a molecular surface area of 295.916 \\AA^2."}", "/scratch/micpie/export/RedDB/test_0-18.jsonl": "{"text":"The molecule with the InChI FXAKRDFRFAQBST-UHFFFAOYSA-N has a gas-phase molecular energy at the PBE level of theory of -541.285 Hartree."} {"text":"The compound with the InChI representation of YWJNWARKAJINTC-UHFFFAOYSA-N has a gas-phase molecular energy at the PBE level of theory of -1136.677 Hartree."}", "/scratch/micpie/export/RedDB/test_0-29.jsonl": "{"text":"Task: Please generate a molecular species with the SELFIES based on the description below.\nDescription: It has an aqueous phase LUMO energy at the PBE level of theory -0.151 Hartree and a aqueous phase HOMO energy at the PBE level of theory of -0.248 Hartree.\nResult: [O][=C][C][C][=Branch1][C][=O][C][Branch1][C][F][=C][Ring1][#Branch1][F]"} {"text":"Task: Please generate a molecule with the canonical SMILES based on the description below.\nDescription: It has an aqueous phase LUMO energy at the PBE level of theory -0.078 Hartree and a aqueous phase HOMO energy at the PBE level of theory of -0.148 Hartree.\nResult: O=S(=O)(O)c1ccc2c(O)n(-n3c(O)c4cccc(S(=O)(=O)O)c4c3O)c(O)c2c1S(=O)(=O)O"}", "/scratch/micpie/export/RedDB/valid_0-0.jsonl": "{"text":"The molecule with the canonical SMILES O=C1C=CC(=O)C1 has a ML-predicted aqueous solubility of -0.487 logS."} {"text":"The molecule with the DeepSMILES representation of 
cccO)ccc6)cO)nc3O))-ncO))cO)cc3)cO)cO)cc4O has a ML-predicted aqueous solubility of -3.256 logS."}", "/scratch/micpie/export/RedDB/test_0-21.jsonl": "{"text":"The chemical compound with the InChI FXAKRDFRFAQBST-UHFFFAOYSA-N has a aqueous phase molecular energy at the PBE level of theory of -541.297 Hartree."} {"text":"The chemical compound with the InChI representation of YWJNWARKAJINTC-UHFFFAOYSA-N has a aqueous phase molecular energy at the PBE level of theory of -1136.719 Hartree."}", "/scratch/micpie/export/RedDB/test_0-27.jsonl": "{"text":"Task: Please create a compound with the SELFIES based on the text description below.\nDescription: It has an ML-predicted aqueous solubility -0.587 logS and a cavity formation energy at the PBE level of theory of 4.116 kT.\n Result: [O][=C][C][C][=Branch1][C][=O][C][Branch1][C][F][=C][Ring1][#Branch1][F]"} {"text":"Task: Please create a molecular species with the DeepSMILES based on the description.\nDescription: It has an ML-predicted aqueous solubility -1.340 logS and a cavity formation energy at the PBE level of theory of 6.158 kT.\n Result: ccccS=O)=O)O))cc6)cO)nc3O))-ncO))cO)cc3)cS=O)=O)O))ccc4))S=O)=O)O"}", "/scratch/micpie/export/RedDB/test_0-2.jsonl": "{"text":"The chemical with the canonical SMILES representation of O=C1CC(=O)C(F)=C1F has a chemical reaction field energy of -14.254 kT."} {"text":"The molecular species with the DeepSMILES representation of ccccS=O)=O)O))cc6)cO)nc3O))-ncO))cO)cc3)cS=O)=O)O))ccc4))S=O)=O)O has a chemical reaction field energy of -35.555 kT."}", "/scratch/micpie/export/RedDB/test_0-30.jsonl": "{"text":"Task: Please give me a molecule with the InChI based on the description.\nDescription: It has an aqueous phase LUMO energy at the PBE level of theory -0.151 Hartree and a aqueous phase HOMO energy at the PBE level of theory of -0.248 Hartree.\nResult: FXAKRDFRFAQBST-UHFFFAOYSA-N"} {"text":"Task: Please give me a molecule with the InChI based on the text description.\nDescription: It has 
an aqueous phase LUMO energy at the PBE level of theory -0.078 Hartree and a aqueous phase HOMO energy at the PBE level of theory of -0.148 Hartree.\nResult: YWJNWARKAJINTC-UHFFFAOYSA-N"}", "/scratch/micpie/export/RedDB/train_0-22.jsonl": "{"text":"The compound with the InChI FGTVOMHLQDTQAT-UHFFFAOYSA-N has a aqueous phase LUMO energy at the PBE level of theory of -0.132 Hartree."} {"text":"The compound with the InChI ZZYJBWDQNXOGRH-UHFFFAOYSA-N has a aqueous phase LUMO energy at the PBE level of theory of -0.054 Hartree."}", "/scratch/micpie/export/RedDB/valid_0-10.jsonl": "{"text":"The molecule with the DeepSMILES O=CCC=O)C=C5 has a nuclear repulsion energy at the PBE level of theory of 265.213 Hartree."} {"text":"The chemical compound with the SMILES c1cc(O)cc(c12)c(O)n(c2O)-n(c3O)c(O)c(c34)c(O)c(O)cc4O has a nuclear repulsion energy at the PBE level of theory of 2299.041 Hartree."}", "/scratch/micpie/export/RedDB/train_0-6.jsonl": "{"text":"The chemical with the canonical SMILES O=C1CC(=O)C(O)=C1O has a gaseous phase HOMO energy at the PBE level of theory of -0.232 Hartree."} {"text":"The chemical compound with the SELFIES [C][=C][Branch1][C][N][C][Branch1][C][N][=C][C][Branch1][Ring2][C][Ring1][Branch2][=C][Branch1][C][O][N][Branch1][Branch1][C][=Ring1][Branch1][O][N][Branch1][Ring1][C][O][C][Branch1][C][O][=C][Branch1][Ring2][C][=Ring1][=Branch1][C][Branch1][C][N][=C][Branch1][C][N][C][Branch1][C][N][=C][Ring1][Branch2][N] has a gaseous phase HOMO energy at the PBE level of theory of -0.126 Hartree."}", "/scratch/micpie/export/RedDB/valid_0-6.jsonl": "{"text":"The compound with the SELFIES [O][=C][C][C][=Branch1][C][=O][C][=C][Ring1][=Branch1] has a gaseous phase HOMO energy at the PBE level of theory of -0.221 Hartree."} {"text":"The compound with the SMILES representation of c1cc(O)cc(c12)c(O)n(c2O)-n(c3O)c(O)c(c34)c(O)c(O)cc4O has a gaseous phase HOMO energy at the PBE level of theory of -0.146 Hartree."}", "/scratch/micpie/export/RedDB/valid_0-30.jsonl": 
"{"text":"Task: Please create a molecular species with the InChI based on the text description.\nDescription: It has an aqueous phase LUMO energy at the PBE level of theory -0.145 Hartree and a aqueous phase HOMO energy at the PBE level of theory of -0.229 Hartree.\nResult: MCFZBCCYOPSZLG-UHFFFAOYSA-N"} {"text":"Task: Please generate a chemical with the InChI based on the description.\nDescription: It has an aqueous phase LUMO energy at the PBE level of theory -0.075 Hartree and a aqueous phase HOMO energy at the PBE level of theory of -0.158 Hartree.\nResult: ZYYXVQMXUNSARX-UHFFFAOYSA-N"}", "/scratch/micpie/export/RedDB/train_0-21.jsonl": "{"text":"The chemical compound with the InChI FGTVOMHLQDTQAT-UHFFFAOYSA-N has a aqueous phase molecular energy at the PBE level of theory of -493.338 Hartree."} {"text":"The compound with the InChI representation of ZZYJBWDQNXOGRH-UHFFFAOYSA-N has a aqueous phase molecular energy at the PBE level of theory of -1468.532 Hartree."}", "/scratch/micpie/export/RedDB/train_0-19.jsonl": "{"text":"The molecular species with the InChI FGTVOMHLQDTQAT-UHFFFAOYSA-N has a gaseous phase HOMO energy at the PBE level of theory of -0.232 Hartree."} {"text":"The molecule with the InChI representation of ZZYJBWDQNXOGRH-UHFFFAOYSA-N has a gaseous phase HOMO energy at the PBE level of theory of -0.126 Hartree."}", "/scratch/micpie/export/RedDB/valid_0-29.jsonl": "{"text":"Task: Please generate a molecule with the DeepSMILES based on the description below.\nDescription: It has an aqueous phase LUMO energy at the PBE level of theory -0.145 Hartree and a aqueous phase HOMO energy at the PBE level of theory of -0.229 Hartree.\nResult: O=CCC=O)C=C5"} {"text":"Task: Please create a molecule with the DeepSMILES based on the description below.\nDescription: It has an aqueous phase LUMO energy at the PBE level of theory -0.075 Hartree and a aqueous phase HOMO energy at the PBE level of theory of -0.158 Hartree.\nResult: 
cccO)ccc6)cO)nc3O))-ncO))cO)cc3)cO)cO)cc4O"}", "/scratch/micpie/export/RedDB/test_0-9.jsonl": "{"text":"The chemical with the canonical SMILES O=C1CC(=O)C(F)=C1F has a aqueous phase LUMO energy at the PBE level of theory of -0.151 Hartree."} {"text":"The compound with the DeepSMILES representation of ccccS=O)=O)O))cc6)cO)nc3O))-ncO))cO)cc3)cS=O)=O)O))ccc4))S=O)=O)O has a aqueous phase LUMO energy at the PBE level of theory of -0.078 Hartree."}", "/scratch/micpie/export/RedDB/test_0-0.jsonl": "{"text":"The compound with the DeepSMILES representation of O=CCC=O)CF)=C5F has a ML-predicted aqueous solubility of -0.587 logS."} {"text":"The compound with the SMILES c1ccc(S(=O)(=O)O)c(c12)c(O)n(c2O)-n(c3O)c(O)c(c34)c(S(=O)(=O)O)c(cc4)S(=O)(=O)O has a ML-predicted aqueous solubility of -1.340 logS."}", "/scratch/micpie/export/RedDB/test_0-24.jsonl": "{"text":"The chemical with the InChI representation of FXAKRDFRFAQBST-UHFFFAOYSA-N has a optimized gas-phase molecular energy at the PBE level of theory of -541.285 Hartree."} {"text":"The compound with the InChI YWJNWARKAJINTC-UHFFFAOYSA-N has a optimized gas-phase molecular energy at the PBE level of theory of -1401.725 Hartree."}", "/scratch/micpie/export/RedDB/valid_0-16.jsonl": "{"text":"The chemical with the InChI MCFZBCCYOPSZLG-UHFFFAOYSA-N has a solvent-accessible surface area of 255.660 \\AA^2."} {"text":"The chemical compound with the InChI ZYYXVQMXUNSARX-UHFFFAOYSA-N has a solvent-accessible surface area of 535.069 \\AA^2."}", "/scratch/micpie/export/RedDB/valid_0-7.jsonl": "{"text":"The molecule with the canonical SMILES O=C1C=CC(=O)C1 has a gaseous phase LUMO energy at the PBE level of theory of -0.145 Hartree."} {"text":"The chemical with the canonical SMILES Oc1ccc2c(O)n(-n3c(O)c4c(O)cc(O)c(O)c4c3O)c(O)c2c1 has a gaseous phase LUMO energy at the PBE level of theory of -0.072 Hartree."}", "/scratch/micpie/export/RedDB/test_0-3.jsonl": "{"text":"The molecule with the canonical SMILES representation of 
O=C1CC(=O)C(F)=C1F has a solvent-accessible surface area of 276.867 \\AA^2."} {"text":"The chemical compound with the SELFIES [C][=C][C][=C][Branch1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][O][C][Branch1][Ring2][C][Ring1][#Branch2][=C][Branch1][C][O][N][Branch1][Branch1][C][=Ring1][Branch1][O][N][Branch1][Ring1][C][O][C][Branch1][C][O][=C][Branch1][Ring2][C][=Ring1][=Branch1][C][Branch1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][O][=C][Branch1][Branch1][C][=C][Ring1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][O] has a solvent-accessible surface area of 525.692 \\AA^2."}", "/scratch/micpie/export/RedDB/valid_0-11.jsonl": "{"text":"The molecular species with the SELFIES [O][=C][C][C][=Branch1][C][=O][C][=C][Ring1][=Branch1] has a optimized gas-phase molecular energy at the PBE level of theory of -342.983 Hartree."} {"text":"The chemical with the canonical SMILES Oc1ccc2c(O)n(-n3c(O)c4c(O)cc(O)c(O)c4c3O)c(O)c2c1 has a optimized gas-phase molecular energy at the PBE level of theory of -1591.280 Hartree."}", "/scratch/micpie/export/RedDB/train_0-20.jsonl": "{"text":"The compound with the InChI FGTVOMHLQDTQAT-UHFFFAOYSA-N has a gaseous phase LUMO energy at the PBE level of theory of -0.132 Hartree."} {"text":"The compound with the InChI ZZYJBWDQNXOGRH-UHFFFAOYSA-N has a gaseous phase LUMO energy at the PBE level of theory of -0.053 Hartree."}", "/scratch/micpie/export/RedDB/train_0-30.jsonl": "{"text":"Task: Please generate a molecular species with the InChI based on the description below.\nDescription: It has an aqueous phase LUMO energy at the PBE level of theory -0.132 Hartree and a aqueous phase HOMO energy at the PBE level of theory of -0.231 Hartree.\nResult: FGTVOMHLQDTQAT-UHFFFAOYSA-N"} {"text":"Task: Please create a compound with the InChI based on the text description.\nDescription: It has an aqueous phase LUMO energy at the PBE level of theory -0.054 Hartree and a aqueous phase HOMO energy at the PBE level of theory of -0.137 Hartree.\nResult: 
ZZYJBWDQNXOGRH-UHFFFAOYSA-N"}", "/scratch/micpie/export/RedDB/valid_0-20.jsonl": "{"text":"The compound with the InChI MCFZBCCYOPSZLG-UHFFFAOYSA-N has a gaseous phase LUMO energy at the PBE level of theory of -0.145 Hartree."} {"text":"The chemical with the InChI representation of ZYYXVQMXUNSARX-UHFFFAOYSA-N has a gaseous phase LUMO energy at the PBE level of theory of -0.072 Hartree."}", "/scratch/micpie/export/RedDB/train_0-26.jsonl": "{"text":"The molecule with the InChI representation of FGTVOMHLQDTQAT-UHFFFAOYSA-N has a optimized gas-phase LUMO energy calculated with DFT at the PBE level of theory of -0.132 Hartree."} {"text":"The chemical compound with the InChI representation of ZZYJBWDQNXOGRH-UHFFFAOYSA-N has a optimized gas-phase LUMO energy calculated at the PBE level of theory of -0.143 Hartree."}", "/scratch/micpie/export/RedDB/train_0-0.jsonl": "{"text":"The molecule with the SELFIES representation of [O][=C][C][C][=Branch1][C][=O][C][Branch1][C][O][=C][Ring1][#Branch1][O] has a ML-predicted aqueous solubility of -0.241 logS."} {"text":"The compound with the DeepSMILES ccN)cN)ccc6)cO)nc3O))-ncO))cO)cc3)cN)cN)cN)c4N has a ML-predicted aqueous solubility of -3.009 logS."}", "/scratch/micpie/export/RedDB/test_0-6.jsonl": "{"text":"The molecule with the SELFIES [O][=C][C][C][=Branch1][C][=O][C][Branch1][C][F][=C][Ring1][#Branch1][F] has a gaseous phase HOMO energy at the PBE level of theory of -0.243 Hartree."} {"text":"The compound with the InChI InChI=1S\/C16H12N2O13S3\/c19-13-6-2-1-3-8(32(23,24)25)10(6)15(21)17(13)18-14(20)7-4-5-9(33(26,27)28)12(34(29,30)31)11(7)16(18)22\/h1-5,19-22H,(H,23,24,25)(H,26,27,28)(H,29,30,31) has a gaseous phase HOMO energy at the PBE level of theory of -0.141 Hartree."}", "/scratch/micpie/export/RedDB/train_0-10.jsonl": "{"text":"The compound with the DeepSMILES O=CCC=O)CO)=C5O has a nuclear repulsion energy at the PBE level of theory of 432.902 Hartree."} {"text":"The chemical compound with the SMILES representation of 
c1c(N)c(N)cc(c12)c(O)n(c2O)-n(c3O)c(O)c(c34)c(N)c(N)c(N)c4N has a nuclear repulsion energy at the PBE level of theory of 3002.432 Hartree."}", "/scratch/micpie/export/RedDB/train_0-3.jsonl": "{"text":"The molecule with the SELFIES [O][=C][C][C][=Branch1][C][=O][C][Branch1][C][O][=C][Ring1][#Branch1][O] has a solvent-accessible surface area of 269.306 \\AA^2."} {"text":"The molecule with the InChI representation of InChI=1S\/C16H18N8O4\/c17-5-1-3-4(2-6(5)18)14(26)23(13(3)25)24-15(27)7-8(16(24)28)10(20)12(22)11(21)9(7)19\/h1-2,25-28H,17-22H2 has a solvent-accessible surface area of 597.304 \\AA^2."}", "/scratch/micpie/export/RedDB/train_0-23.jsonl": "{"text":"The chemical with the InChI FGTVOMHLQDTQAT-UHFFFAOYSA-N has a nuclear repulsion energy at the PBE level of theory of 432.902 Hartree."} {"text":"The molecular species with the InChI representation of ZZYJBWDQNXOGRH-UHFFFAOYSA-N has a nuclear repulsion energy at the PBE level of theory of 3002.432 Hartree."}", "/scratch/micpie/export/RedDB/train_0-12.jsonl": "{"text":"The compound with the InChI representation of InChI=1S\/C5H4O4\/c6-2-1-3(7)5(9)4(2)8\/h8-9H,1H2 has a optimized gas-phase HOMO energy at the PBE level of theory of -0.232 Hartree."} {"text":"The molecule with the SMILES c1c(N)c(N)cc(c12)c(O)n(c2O)-n(c3O)c(O)c(c34)c(N)c(N)c(N)c4N has a optimized gas-phase HOMO energy at the PBE level of theory of -0.159 Hartree."}", "/scratch/micpie/export/RedDB/test_0-28.jsonl": "{"text":"Task: Please generate a molecular species with the InChI based on the text description.\nDescription: It has an ML-predicted aqueous solubility -0.587 logS and a cavity formation energy at the PBE level of theory of 4.116 kT.\n Result: FXAKRDFRFAQBST-UHFFFAOYSA-N"} {"text":"Task: Please create a compound with the InChI based on the description below.\nDescription: It has an ML-predicted aqueous solubility -1.340 logS and a cavity formation energy at the PBE level of theory of 6.158 kT.\n Result: YWJNWARKAJINTC-UHFFFAOYSA-N"}", 
"/scratch/micpie/export/RedDB/test_0-13.jsonl": "{"text":"The chemical with the SELFIES representation of [O][=C][C][C][=Branch1][C][=O][C][Branch1][C][F][=C][Ring1][#Branch1][F] has a optimized gas-phase LUMO energy calculated with DFT at the PBE level of theory of -0.154 Hartree.The chemical with the InChI representation of FXAKRDFRFAQBST-UHFFFAOYSA-N has a ML-predicted aqueous solubility of -0.587 logS."} {"text":"The molecule with the SMILES c1ccc(S(=O)(=O)O)c(c12)c(O)n(c2O)-n(c3O)c(O)c(c34)c(S(=O)(=O)O)c(cc4)S(=O)(=O)O has a optimized gas-phase LUMO energy calculated with DFT at the PBE level of theory of -0.149 Hartree.The molecule with the InChI YWJNWARKAJINTC-UHFFFAOYSA-N has a ML-predicted aqueous solubility of -1.340 logS."}", "/scratch/micpie/export/RedDB/test_0-23.jsonl": "{"text":"The chemical with the InChI FXAKRDFRFAQBST-UHFFFAOYSA-N has a nuclear repulsion energy at the PBE level of theory of 426.524 Hartree."} {"text":"The molecular species with the InChI representation of YWJNWARKAJINTC-UHFFFAOYSA-N has a nuclear repulsion energy at the PBE level of theory of 2038.509 Hartree."}", "/scratch/micpie/export/RedDB/valid_0-2.jsonl": "{"text":"The chemical compound with the SMILES O=C1CC(=O)C=C1 has a chemical reaction field energy of -15.484 kT."} {"text":"The molecule with the InChI InChI=1S\/C16H12N2O8\/c19-5-1-2-6-7(3-5)14(24)17(13(6)23)18-15(25)10-8(20)4-9(21)12(22)11(10)16(18)26\/h1-4,19-26H has a chemical reaction field energy of -59.298 kT."}", "/scratch/micpie/export/RedDB/valid_0-21.jsonl": "{"text":"The compound with the InChI MCFZBCCYOPSZLG-UHFFFAOYSA-N has a aqueous phase molecular energy at the PBE level of theory of -342.998 Hartree."} {"text":"The compound with the InChI ZYYXVQMXUNSARX-UHFFFAOYSA-N has a aqueous phase molecular energy at the PBE level of theory of -1326.756 Hartree."}", "/scratch/micpie/export/RedDB/train_0-14.jsonl": "{"text":"The molecule with the InChI representation of FGTVOMHLQDTQAT-UHFFFAOYSA-N has a molecular 
surface area of 125.115 \\AA^2."} {"text":"The chemical with the InChI representation of ZZYJBWDQNXOGRH-UHFFFAOYSA-N has a molecular surface area of 346.900 \\AA^2."}", "/scratch/micpie/export/RedDB/valid_0-1.jsonl": "{"text":"The chemical compound with the InChI InChI=1S\/C5H4O2\/c6-4-1-2-5(7)3-4\/h1-2H,3H2 has a molecular surface area of 115.660 \\AA^2."} {"text":"The chemical compound with the SMILES c1cc(O)cc(c12)c(O)n(c2O)-n(c3O)c(O)c(c34)c(O)c(O)cc4O has a molecular surface area of 300.767 \\AA^2."}", "/scratch/micpie/export/RedDB/valid_0-13.jsonl": "{"text":"The chemical compound with the InChI InChI=1S\/C5H4O2\/c6-4-1-2-5(7)3-4\/h1-2H,3H2 has a optimized gas-phase LUMO energy calculated with DFT at the PBE level of theory of -0.145 Hartree.The chemical compound with the InChI MCFZBCCYOPSZLG-UHFFFAOYSA-N has a ML-predicted aqueous solubility of -0.487 logS."} {"text":"The chemical compound with the SELFIES [C][=C][C][Branch1][C][O][=C][C][Branch1][Ring2][C][Ring1][#Branch1][=C][Branch1][C][O][N][Branch1][Branch1][C][=Ring1][Branch1][O][N][Branch1][Ring1][C][O][C][Branch1][C][O][=C][Branch1][Ring2][C][=Ring1][=Branch1][C][Branch1][C][O][=C][Branch1][C][O][C][=C][Ring1][#Branch1][O] has a optimized gas-phase LUMO energy calculated with DFT at the PBE level of theory of -0.119 Hartree.The chemical compound with the InChI ZYYXVQMXUNSARX-UHFFFAOYSA-N has a ML-predicted aqueous solubility of -3.256 logS."}", "/scratch/micpie/export/RedDB/train_0-29.jsonl": "{"text":"Task: Please generate a molecular species with the SELFIES based on the description below.\nDescription: It has an aqueous phase LUMO energy at the PBE level of theory -0.132 Hartree and a aqueous phase HOMO energy at the PBE level of theory of -0.231 Hartree.\nResult: [O][=C][C][C][=Branch1][C][=O][C][Branch1][C][O][=C][Ring1][#Branch1][O]"} {"text":"Task: Please generate a chemical compound with the InChI based on the description below.\nDescription: It has an aqueous phase LUMO energy at the PBE 
level of theory -0.054 Hartree and a aqueous phase HOMO energy at the PBE level of theory of -0.137 Hartree.\nResult: InChI=1S\/C16H18N8O4\/c17-5-1-3-4(2-6(5)18)14(26)23(13(3)25)24-15(27)7-8(16(24)28)10(20)12(22)11(21)9(7)19\/h1-2,25-28H,17-22H2"}", "/scratch/micpie/export/RedDB/valid_0-23.jsonl": "{"text":"The molecular species with the InChI representation of MCFZBCCYOPSZLG-UHFFFAOYSA-N has a nuclear repulsion energy at the PBE level of theory of 265.213 Hartree."} {"text":"The chemical compound with the InChI ZYYXVQMXUNSARX-UHFFFAOYSA-N has a nuclear repulsion energy at the PBE level of theory of 2299.041 Hartree."}", "/scratch/micpie/export/RedDB/valid_0-5.jsonl": "{"text":"The molecular species with the canonical SMILES representation of O=C1C=CC(=O)C1 has a gas-phase molecular energy at the PBE level of theory of -342.983 Hartree."} {"text":"The molecular species with the SMILES representation of c1cc(O)cc(c12)c(O)n(c2O)-n(c3O)c(O)c(c34)c(O)c(O)cc4O has a gas-phase molecular energy at the PBE level of theory of -1326.688 Hartree."}", "/scratch/micpie/export/RedDB/train_0-15.jsonl": "{"text":"The molecule with the InChI representation of FGTVOMHLQDTQAT-UHFFFAOYSA-N has a chemical reaction field energy of -21.365 kT."} {"text":"The compound with the InChI ZZYJBWDQNXOGRH-UHFFFAOYSA-N has a chemical reaction field energy of -69.133 kT."}", "/scratch/micpie/export/RedDB/valid_0-4.jsonl": "{"text":"The chemical compound with the canonical SMILES representation of O=C1C=CC(=O)C1 has a cavity formation energy at the PBE level of theory of 3.942 kT."} {"text":"The chemical with the DeepSMILES representation of cccO)ccc6)cO)nc3O))-ncO))cO)cc3)cO)cO)cc4O has a cavity formation energy at the PBE level of theory of 6.235 kT."}", "/scratch/micpie/export/RedDB/train_0-5.jsonl": "{"text":"The chemical compound with the canonical SMILES O=C1CC(=O)C(O)=C1O has a gas-phase molecular energy at the PBE level of theory of -493.318 Hartree."} {"text":"The molecule with the SMILES 
representation of c1c(N)c(N)cc(c12)c(O)n(c2O)-n(c3O)c(O)c(c34)c(N)c(N)c(N)c4N has a gas-phase molecular energy at the PBE level of theory of -1468.454 Hartree."}", "/scratch/micpie/export/RedDB/valid_0-15.jsonl": "{"text":"The molecule with the InChI representation of MCFZBCCYOPSZLG-UHFFFAOYSA-N has a chemical reaction field energy of -15.484 kT."} {"text":"The chemical compound with the InChI ZYYXVQMXUNSARX-UHFFFAOYSA-N has a chemical reaction field energy of -59.298 kT."}", "/scratch/micpie/export/RedDB/valid_0-12.jsonl": "{"text":"The chemical with the DeepSMILES O=CCC=O)C=C5 has a optimized gas-phase HOMO energy at the PBE level of theory of -0.221 Hartree."} {"text":"The molecule with the canonical SMILES Oc1ccc2c(O)n(-n3c(O)c4c(O)cc(O)c(O)c4c3O)c(O)c2c1 has a optimized gas-phase HOMO energy at the PBE level of theory of -0.166 Hartree."}", "/scratch/micpie/export/RedDB/valid_0-18.jsonl": "{"text":"The chemical with the InChI MCFZBCCYOPSZLG-UHFFFAOYSA-N has a gas-phase molecular energy at the PBE level of theory of -342.983 Hartree."} {"text":"The molecule with the InChI representation of ZYYXVQMXUNSARX-UHFFFAOYSA-N has a gas-phase molecular energy at the PBE level of theory of -1326.688 Hartree."}", "/scratch/micpie/export/RedDB/train_0-2.jsonl": "{"text":"The compound with the canonical SMILES representation of O=C1CC(=O)C(O)=C1O has a chemical reaction field energy of -21.365 kT."} {"text":"The compound with the InChI InChI=1S\/C16H18N8O4\/c17-5-1-3-4(2-6(5)18)14(26)23(13(3)25)24-15(27)7-8(16(24)28)10(20)12(22)11(21)9(7)19\/h1-2,25-28H,17-22H2 has a chemical reaction field energy of -69.133 kT."}", "/scratch/micpie/export/RedDB/test_0-11.jsonl": "{"text":"The chemical with the SMILES representation of O=C1CC(=O)C(F)=C1F has a optimized gas-phase molecular energy at the PBE level of theory of -541.285 Hartree."} {"text":"The compound with the DeepSMILES representation of ccccS=O)=O)O))cc6)cO)nc3O))-ncO))cO)cc3)cS=O)=O)O))ccc4))S=O)=O)O has a optimized 
gas-phase molecular energy at the PBE level of theory of -1401.725 Hartree."}", "/scratch/micpie/export/RedDB/train_0-7.jsonl": "{"text":"The chemical with the InChI InChI=1S\/C5H4O4\/c6-2-1-3(7)5(9)4(2)8\/h8-9H,1H2 has a gaseous phase LUMO energy at the PBE level of theory of -0.132 Hartree."} {"text":"The molecule with the SELFIES [C][=C][Branch1][C][N][C][Branch1][C][N][=C][C][Branch1][Ring2][C][Ring1][Branch2][=C][Branch1][C][O][N][Branch1][Branch1][C][=Ring1][Branch1][O][N][Branch1][Ring1][C][O][C][Branch1][C][O][=C][Branch1][Ring2][C][=Ring1][=Branch1][C][Branch1][C][N][=C][Branch1][C][N][C][Branch1][C][N][=C][Ring1][Branch2][N] has a gaseous phase LUMO energy at the PBE level of theory of -0.053 Hartree."}", "/scratch/micpie/export/RedDB/test_0-17.jsonl": "{"text":"The chemical with the InChI FXAKRDFRFAQBST-UHFFFAOYSA-N has a cavity formation energy at the PBE level of theory of 4.116 kT."} {"text":"The molecular species with the InChI YWJNWARKAJINTC-UHFFFAOYSA-N has a cavity formation energy at the PBE level of theory of 6.158 kT."}", "/scratch/micpie/export/RedDB/valid_0-27.jsonl": "{"text":"Task: Please generate a compound with the DeepSMILES based on the description.\nDescription: It has an ML-predicted aqueous solubility -0.487 logS and a cavity formation energy at the PBE level of theory of 3.942 kT.\n Result: O=CCC=O)C=C5"} {"text":"Task: Please give me a molecular species with the InChI based on the description below.\nDescription: It has an ML-predicted aqueous solubility -3.256 logS and a cavity formation energy at the PBE level of theory of 6.235 kT.\n Result: InChI=1S\/C16H12N2O8\/c19-5-1-2-6-7(3-5)14(24)17(13(6)23)18-15(25)10-8(20)4-9(21)12(22)11(10)16(18)26\/h1-4,19-26H"}", "/scratch/micpie/export/RedDB/valid_0-19.jsonl": "{"text":"The molecule with the InChI representation of MCFZBCCYOPSZLG-UHFFFAOYSA-N has a gaseous phase HOMO energy at the PBE level of theory of -0.221 Hartree."} {"text":"The molecular species with the InChI representation 
of ZYYXVQMXUNSARX-UHFFFAOYSA-N has a gaseous phase HOMO energy at the PBE level of theory of -0.146 Hartree."}", "/scratch/micpie/export/RedDB/train_0-11.jsonl": "{"text":"The molecule with the SMILES O=C1CC(=O)C(O)=C1O has a optimized gas-phase molecular energy at the PBE level of theory of -493.318 Hartree."} {"text":"The molecule with the InChI InChI=1S\/C16H18N8O4\/c17-5-1-3-4(2-6(5)18)14(26)23(13(3)25)24-15(27)7-8(16(24)28)10(20)12(22)11(21)9(7)19\/h1-2,25-28H,17-22H2 has a optimized gas-phase molecular energy at the PBE level of theory of -1779.682 Hartree."}", "/scratch/micpie/export/RedDB/train_0-1.jsonl": "{"text":"The molecular species with the InChI representation of InChI=1S\/C5H4O4\/c6-2-1-3(7)5(9)4(2)8\/h8-9H,1H2 has a molecular surface area of 125.115 \\AA^2."} {"text":"The molecular species with the SMILES representation of c1c(N)c(N)cc(c12)c(O)n(c2O)-n(c3O)c(O)c(c34)c(N)c(N)c(N)c4N has a molecular surface area of 346.900 \\AA^2."}", "/scratch/micpie/export/RedDB/train_0-13.jsonl": "{"text":"The chemical compound with the SELFIES [O][=C][C][C][=Branch1][C][=O][C][Branch1][C][O][=C][Ring1][#Branch1][O] has a optimized gas-phase LUMO energy calculated with DFT at the PBE level of theory of -0.132 Hartree.The chemical compound with the InChI FGTVOMHLQDTQAT-UHFFFAOYSA-N has a ML-predicted aqueous solubility of -0.241 logS."} {"text":"The compound with the canonical SMILES representation of Nc1cc2c(O)n(-n3c(O)c4c(N)c(N)c(N)c(N)c4c3O)c(O)c2cc1N has a optimized gas-phase LUMO energy calculated with DFT at the PBE level of theory of -0.143 Hartree.The compound with the InChI representation of ZZYJBWDQNXOGRH-UHFFFAOYSA-N has a ML-predicted aqueous solubility of -3.009 logS."}", "/scratch/micpie/export/RedDB/valid_0-26.jsonl": "{"text":"The molecule with the InChI representation of MCFZBCCYOPSZLG-UHFFFAOYSA-N has a optimized gas-phase LUMO energy calculated with DFT at the PBE level of theory of -0.145 Hartree."} {"text":"The chemical with the InChI 
ZYYXVQMXUNSARX-UHFFFAOYSA-N has a optimized gas-phase LUMO energy calculated with DFT at the PBE level of theory of -0.119 Hartree."}", "/scratch/micpie/export/RedDB/train_0-4.jsonl": "{"text":"The molecular species with the SMILES representation of O=C1CC(=O)C(O)=C1O has a cavity formation energy at the PBE level of theory of 4.054 kT."} {"text":"The compound with the DeepSMILES ccN)cN)ccc6)cO)nc3O))-ncO))cO)cc3)cN)cN)cN)c4N has a cavity formation energy at the PBE level of theory of 6.746 kT."}", "/scratch/micpie/export/RedDB/test_0-7.jsonl": "{"text":"The molecule with the SMILES O=C1CC(=O)C(F)=C1F has a gaseous phase LUMO energy at the PBE level of theory of -0.154 Hartree."} {"text":"The compound with the InChI representation of InChI=1S\/C16H12N2O13S3\/c19-13-6-2-1-3-8(32(23,24)25)10(6)15(21)17(13)18-14(20)7-4-5-9(33(26,27)28)12(34(29,30)31)11(7)16(18)22\/h1-5,19-22H,(H,23,24,25)(H,26,27,28)(H,29,30,31) has a gaseous phase LUMO energy at the PBE level of theory of -0.071 Hartree."}", "/scratch/micpie/export/RedDB/train_0-9.jsonl": "{"text":"The molecular species with the InChI representation of InChI=1S\/C5H4O4\/c6-2-1-3(7)5(9)4(2)8\/h8-9H,1H2 has a aqueous phase LUMO energy at the PBE level of theory of -0.132 Hartree."} {"text":"The chemical with the SMILES c1c(N)c(N)cc(c12)c(O)n(c2O)-n(c3O)c(O)c(c34)c(N)c(N)c(N)c4N has a aqueous phase LUMO energy at the PBE level of theory of -0.054 Hartree."}", "/scratch/micpie/export/RedDB/train_0-25.jsonl": "{"text":"The molecule with the InChI representation of FGTVOMHLQDTQAT-UHFFFAOYSA-N has a optimized gas-phase HOMO energy at the PBE level of theory of -0.232 Hartree."} {"text":"The molecular species with the InChI ZZYJBWDQNXOGRH-UHFFFAOYSA-N has a optimized gas-phase HOMO energy at the PBE level of theory of -0.159 Hartree."}", "/scratch/micpie/export/RedDB/valid_0-22.jsonl": "{"text":"The chemical with the InChI representation of MCFZBCCYOPSZLG-UHFFFAOYSA-N has a aqueous phase LUMO energy at the PBE level of 
theory of -0.145 Hartree."} {"text":"The molecule with the InChI ZYYXVQMXUNSARX-UHFFFAOYSA-N has a aqueous phase LUMO energy at the PBE level of theory of -0.075 Hartree."}", "/scratch/micpie/export/RedDB/train_0-18.jsonl": "{"text":"The chemical with the InChI representation of FGTVOMHLQDTQAT-UHFFFAOYSA-N has a gas-phase molecular energy at the PBE level of theory of -493.318 Hartree."} {"text":"The chemical with the InChI representation of ZZYJBWDQNXOGRH-UHFFFAOYSA-N has a gas-phase molecular energy at the PBE level of theory of -1468.454 Hartree."}", "/scratch/micpie/export/RedDB/valid_0-3.jsonl": "{"text":"The molecular species with the DeepSMILES O=CCC=O)C=C5 has a solvent-accessible surface area of 255.660 \\AA^2."} {"text":"The molecular species with the SELFIES representation of [C][=C][C][Branch1][C][O][=C][C][Branch1][Ring2][C][Ring1][#Branch1][=C][Branch1][C][O][N][Branch1][Branch1][C][=Ring1][Branch1][O][N][Branch1][Ring1][C][O][C][Branch1][C][O][=C][Branch1][Ring2][C][=Ring1][=Branch1][C][Branch1][C][O][=C][Branch1][C][O][C][=C][Ring1][#Branch1][O] has a solvent-accessible surface area of 535.069 \\AA^2."}", "/scratch/micpie/export/RedDB/test_0-8.jsonl": "{"text":"The compound with the SMILES representation of O=C1CC(=O)C(F)=C1F has a aqueous phase molecular energy at the PBE level of theory of -541.297 Hartree."} {"text":"The compound with the SELFIES representation of [C][=C][C][=C][Branch1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][O][C][Branch1][Ring2][C][Ring1][#Branch2][=C][Branch1][C][O][N][Branch1][Branch1][C][=Ring1][Branch1][O][N][Branch1][Ring1][C][O][C][Branch1][C][O][=C][Branch1][Ring2][C][=Ring1][=Branch1][C][Branch1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][O][=C][Branch1][Branch1][C][=C][Ring1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][O] has a aqueous phase molecular energy at the PBE level of theory of -1136.719 Hartree."}", "/scratch/micpie/export/RedDB/test_0-14.jsonl": "{"text":"The chemical with the InChI 
FXAKRDFRFAQBST-UHFFFAOYSA-N has a molecular surface area of 129.337 \\AA^2."} {"text":"The compound with the InChI YWJNWARKAJINTC-UHFFFAOYSA-N has a molecular surface area of 295.916 \\AA^2."}", "/scratch/micpie/export/RedDB/valid_0-17.jsonl": "{"text":"The molecular species with the InChI representation of MCFZBCCYOPSZLG-UHFFFAOYSA-N has a cavity formation energy at the PBE level of theory of 3.942 kT."} {"text":"The molecular species with the InChI representation of ZYYXVQMXUNSARX-UHFFFAOYSA-N has a cavity formation energy at the PBE level of theory of 6.235 kT."}", "/scratch/micpie/export/RedDB/valid_0-14.jsonl": "{"text":"The compound with the InChI MCFZBCCYOPSZLG-UHFFFAOYSA-N has a molecular surface area of 115.660 \\AA^2."} {"text":"The chemical compound with the InChI representation of ZYYXVQMXUNSARX-UHFFFAOYSA-N has a molecular surface area of 300.767 \\AA^2."}", "/scratch/micpie/export/RedDB/test_0-25.jsonl": "{"text":"The chemical compound with the InChI FXAKRDFRFAQBST-UHFFFAOYSA-N has a optimized gas-phase HOMO energy at the PBE level of theory of -0.243 Hartree."} {"text":"The chemical compound with the InChI representation of YWJNWARKAJINTC-UHFFFAOYSA-N has a optimized gas-phase HOMO energy at the PBE level of theory of -0.159 Hartree."}", "/scratch/micpie/export/RedDB/test_0-4.jsonl": "{"text":"The molecule with the SMILES O=C1CC(=O)C(F)=C1F has a cavity formation energy at the PBE level of theory of 4.116 kT."} {"text":"The chemical with the SMILES c1ccc(S(=O)(=O)O)c(c12)c(O)n(c2O)-n(c3O)c(O)c(c34)c(S(=O)(=O)O)c(cc4)S(=O)(=O)O has a cavity formation energy at the PBE level of theory of 6.158 kT."}", "/scratch/micpie/export/RedDB/test_0-12.jsonl": "{"text":"The molecular species with the SMILES O=C1CC(=O)C(F)=C1F has a optimized gas-phase HOMO energy at the PBE level of theory of -0.243 Hartree."} {"text":"The chemical compound with the SMILES representation of c1ccc(S(=O)(=O)O)c(c12)c(O)n(c2O)-n(c3O)c(O)c(c34)c(S(=O)(=O)O)c(cc4)S(=O)(=O)O has a 
optimized gas-phase HOMO energy at the PBE level of theory of -0.159 Hartree."}", "/scratch/micpie/export/RedDB/test_0-20.jsonl": "{"text":"The compound with the InChI FXAKRDFRFAQBST-UHFFFAOYSA-N has a gaseous phase LUMO energy at the PBE level of theory of -0.154 Hartree."} {"text":"The compound with the InChI YWJNWARKAJINTC-UHFFFAOYSA-N has a gaseous phase LUMO energy at the PBE level of theory of -0.071 Hartree."}", "/scratch/micpie/export/bio_ner_26/valid_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Other bvrR regulated genes related with cell envelope were: three lipoprotein genes (BAB1 _ 0358; BAB1 _ 0589; BAB1 _ 2147), which were down-regulated; six genes for periplasmic proteins and chaperones (htpX, heat shock protein, BAB1 _ 1821; clpA and clpB, stress response proteins, BAB1 _ 1573 and BAB1 _ 1868, respectively; BAB2 _ 1107; BAB1 _ 0505; BAB1 _ 1022), which were all up-regulated; one gene related with LPS biosynthesis (glycosyl transferase, BAB1 _ 1620), which was up-regulated; and five genes for fatty acids biosynthesis (fabG, ketoacyl-acyl-carrier-protein reductase, BAB1 _ 2043; fabF, oxoacyl-acyl-carrier-protein synthase, BAB1 _ 0872; fadD, fatty-acyl-CoA synthase, BAB1 _ 0320; cfa, cyclopropane-fatty-acyl-phospholipid synthase, BAB1 _ 0476; BAB1 _ 1357)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: bvrR,6,10,Protein\nBAB1 _ 0358,86,97,Protein\nBAB1 _ 0589,99,110,Protein\nBAB1 _ 2147,112,123,Protein\nhtpX,207,211,Protein\nBAB1 _ 1821,233,244,Protein\nclpA,246,250,Protein\nclpB,255,259,Protein\nBAB1 _ 1573,287,298,Protein\nBAB1 _ 1868,303,314,Protein\nBAB2 _ 1107,330,341,Protein\nBAB1 _ 0505,343,354,Protein\nBAB1 _ 1022,356,367,Protein\nLPS,423,426,Chemical\nBAB1 _ 1620,464,475,Protein\nfabG,550,554,Protein\nBAB1 _ 2043,603,614,Protein\nfabF,616,620,Protein\nBAB1 
_ 0872,667,678,Protein\nfadD,680,684,Protein\nfatty - acyl - CoA,686,704,Chemical\nBAB1 _ 0320,715,726,Protein\ncfa,728,731,Protein\ncyclopropane - fatty - acyl - phospholipid,733,775,Chemical\nBAB1 _ 0476,786,797,Protein\nBAB1 _ 1357,799,810,Protein"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Other bvrR regulated genes related with cell envelope were: three lipoprotein genes (BAB1 _ 0358; BAB1 _ 0589; BAB1 _ 2147), which were down-regulated; six genes for periplasmic proteins and chaperones (htpX, heat shock protein, BAB1 _ 1821; clpA and clpB, stress response proteins, BAB1 _ 1573 and BAB1 _ 1868, respectively; BAB2 _ 1107; BAB1 _ 0505; BAB1 _ 1022), which were all up-regulated; one gene related with LPS biosynthesis (glycosyl transferase, BAB1 _ 1620), which was up-regulated; and five genes for fatty acids biosynthesis (fabG, ketoacyl-acyl-carrier-protein reductase, BAB1 _ 2043; fabF, oxoacyl-acyl-carrier-protein synthase, BAB1 _ 0872; fadD, fatty-acyl-CoA synthase, BAB1 _ 0320; cfa, cyclopropane-fatty-acyl-phospholipid synthase, BAB1 _ 0476; BAB1 _ 1357)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: bvrR,6,10,Protein\nBAB1 _ 0358,86,97,Protein\nBAB1 _ 0589,99,110,Protein\nBAB1 _ 2147,112,123,Protein\nhtpX,207,211,Protein\nBAB1 _ 1821,233,244,Protein\nclpA,246,250,Protein\nclpB,255,259,Protein\nBAB1 _ 1573,287,298,Protein\nBAB1 _ 1868,303,314,Protein\nBAB2 _ 1107,330,341,Protein\nBAB1 _ 0505,343,354,Protein\nBAB1 _ 1022,356,367,Protein\nLPS,423,426,Chemical\nBAB1 _ 1620,464,475,Protein\nfabG,550,554,Protein\nBAB1 _ 2043,603,614,Protein\nfabF,616,620,Protein\nBAB1 _ 0872,667,678,Protein\nfadD,680,684,Protein\nfatty - acyl - CoA,686,704,Chemical\nBAB1 _ 0320,715,726,Protein\ncfa,728,731,Protein\ncyclopropane - fatty - acyl - phospholipid,733,775,Chemical\nBAB1 _ 
0476,786,797,Protein\nBAB1 _ 1357,799,810,Protein"}", "/scratch/micpie/export/bio_ner_26/test_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: TCGCCGAGAATTTCGAYTTG TTCGAYTTGACGGACAGCGT TCGTCRTTGCCCCAYTTNGG 1180 1169 von Netzer et al., 2013 FAE-Kf 7757f-1 7757f-2 7766f 8543r assA TCGGACGCGTGCAACGATCTGA TCGGACGCGTGCAACGCCCTGA TGTAACGGCATGACCATTCT TCGTCRTTGCCCCAYTTNGG 786 786 777 von Netzer et al., 2013 assA2 13591376f 17851802r assA YATGWACTGGCACGGMCA GCRTTTTCMACCCAKGTA 426+ Aitken et al., 2013 assA3 13941409f 18431860r assA CCGCACCTGGGTKCAYCA GKCCATSGTGTAYTTCTT 440+ Ncr2for Ncr2rev Ncr TGGACAAAYAAAMGYACVGAT GATTCCGGCTTTTTTCCAAVT 320 Morris et al., 2014 Sequence positions indicated for primers refer to the nucleotide position of the following references; Thauera aromatica K127 bss operon (Winderl et al., 2007; von Netzer et al., 2013), Azoarcus sp. strain T bssA (Washer and Edwards, 2007), and Desulfatibacillum alkenivorans AK-01 (Callaghan et al., 2010; Aitken et al., 2013)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: TCGCCGAGAATTTCGAYTTG,0,20,primer\nTTCGAYTTGACGGACAGCGT,21,41,primer\nTCGTCRTTGCCCCAYTTNGG,42,62,primer\nFAE - Kf,97,105,primer\n7757f - 1,106,115,primer\n7757f - 2,116,125,primer\n7766f,126,131,primer\n8543r,132,137,primer\nTCGGACGCGTGCAACGATCTGA,143,165,primer\nTCGGACGCGTGCAACGCCCTGA,166,188,primer\nTGTAACGGCATGACCATTCT,189,209,primer\nTCGTCRTTGCCCCAYTTNGG,210,230,primer\nassA2,267,272,primer\n13591376f,273,282,primer\n17851802r,283,292,primer\nYATGWACTGGCACGGMCA,298,316,primer\nGCRTTTTCMACCCAKGTA,317,335,primer\nassA3,362,367,primer\n13941409f,368,377,primer\n18431860r,378,387,primer\nCCGCACCTGGGTKCAYCA,393,411,primer\nGKCCATSGTGTAYTTCTT,412,430,primer\nNcr2for,437,444,primer\nNcr2rev,445,452,primer\nTGGACAAAYAAAMGYACVGAT,457,478,primer\nGATTCCGGCTTTTTTCCAAVT,479,500,primer"} {"text":"Task: Please carry out 
the NER task for the the text below.\nText: TCGCCGAGAATTTCGAYTTG TTCGAYTTGACGGACAGCGT TCGTCRTTGCCCCAYTTNGG 1180 1169 von Netzer et al., 2013 FAE-Kf 7757f-1 7757f-2 7766f 8543r assA TCGGACGCGTGCAACGATCTGA TCGGACGCGTGCAACGCCCTGA TGTAACGGCATGACCATTCT TCGTCRTTGCCCCAYTTNGG 786 786 777 von Netzer et al., 2013 assA2 13591376f 17851802r assA YATGWACTGGCACGGMCA GCRTTTTCMACCCAKGTA 426+ Aitken et al., 2013 assA3 13941409f 18431860r assA CCGCACCTGGGTKCAYCA GKCCATSGTGTAYTTCTT 440+ Ncr2for Ncr2rev Ncr TGGACAAAYAAAMGYACVGAT GATTCCGGCTTTTTTCCAAVT 320 Morris et al., 2014 Sequence positions indicated for primers refer to the nucleotide position of the following references; Thauera aromatica K127 bss operon (Winderl et al., 2007; von Netzer et al., 2013), Azoarcus sp. strain T bssA (Washer and Edwards, 2007), and Desulfatibacillum alkenivorans AK-01 (Callaghan et al., 2010; Aitken et al., 2013)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: TCGCCGAGAATTTCGAYTTG,0,20,primer\nTTCGAYTTGACGGACAGCGT,21,41,primer\nTCGTCRTTGCCCCAYTTNGG,42,62,primer\nFAE - Kf,97,105,primer\n7757f - 1,106,115,primer\n7757f - 2,116,125,primer\n7766f,126,131,primer\n8543r,132,137,primer\nTCGGACGCGTGCAACGATCTGA,143,165,primer\nTCGGACGCGTGCAACGCCCTGA,166,188,primer\nTGTAACGGCATGACCATTCT,189,209,primer\nTCGTCRTTGCCCCAYTTNGG,210,230,primer\nassA2,267,272,primer\n13591376f,273,282,primer\n17851802r,283,292,primer\nYATGWACTGGCACGGMCA,298,316,primer\nGCRTTTTCMACCCAKGTA,317,335,primer\nassA3,362,367,primer\n13941409f,368,377,primer\n18431860r,378,387,primer\nCCGCACCTGGGTKCAYCA,393,411,primer\nGKCCATSGTGTAYTTCTT,412,430,primer\nNcr2for,437,444,primer\nNcr2rev,445,452,primer\nTGGACAAAYAAAMGYACVGAT,457,478,primer\nGATTCCGGCTTTTTTCCAAVT,479,500,primer"}", "/scratch/micpie/export/bio_ner_26/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Present in both 
bacterial organisms compared here are also genes encoding a hemine uptake system (ye0323-0332\/plu2631-2636), the YfeABCD transporter system of chelated iron, the ferrous (Fe2+) iron transporter proteins FeoAB, the AfuABC\/SfuABC ferric (Fe3+) transporter, the enterobactin and its transporter (FepBDCG), the FecABCDE ABC transporter system, and several putative hemin\/siderophore\/iron uptake proteins (YE1459-1461\/Plu2850-2852), YE3190\/Plu2853, and YE0555\/Plu3738)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: hemine,76,82,Chemical\nye0323,99,105,Protein\n0332,108,112,Protein\nplu2631,115,122,Protein\n2636,125,129,Protein\nYfeABCD,136,143,Protein\niron,175,179,Chemical\nFeoAB,228,233,Protein\nAfuABC,239,245,Protein\nSfuABC,248,254,Protein\nferric,255,261,Chemical\nFe3 +,264,269,Chemical\nenterobactin,288,300,Chemical\nFepBDCG,323,330,Protein\nFecABCDE,337,345,Protein\nhemin,391,396,Chemical\nsiderophore,399,410,Chemical\niron,413,417,Chemical\nYE1459,436,442,Protein\n1461,445,449,Protein\nPlu2850,452,459,Protein\n2852,462,466,Protein\nYE3190,469,475,Protein\nPlu2853,478,485,Protein\nYE0555,491,497,Protein\nPlu3738,500,507,Protein"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Present in both bacterial organisms compared here are also genes encoding a hemine uptake system (ye0323-0332\/plu2631-2636), the YfeABCD transporter system of chelated iron, the ferrous (Fe2+) iron transporter proteins FeoAB, the AfuABC\/SfuABC ferric (Fe3+) transporter, the enterobactin and its transporter (FepBDCG), the FecABCDE ABC transporter system, and several putative hemin\/siderophore\/iron uptake proteins (YE1459-1461\/Plu2850-2852), YE3190\/Plu2853, and YE0555\/Plu3738)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: 
hemine,76,82,Chemical\nye0323,99,105,Protein\n0332,108,112,Protein\nplu2631,115,122,Protein\n2636,125,129,Protein\nYfeABCD,136,143,Protein\niron,175,179,Chemical\nFeoAB,228,233,Protein\nAfuABC,239,245,Protein\nSfuABC,248,254,Protein\nferric,255,261,Chemical\nFe3 +,264,269,Chemical\nenterobactin,288,300,Chemical\nFepBDCG,323,330,Protein\nFecABCDE,337,345,Protein\nhemin,391,396,Chemical\nsiderophore,399,410,Chemical\niron,413,417,Chemical\nYE1459,436,442,Protein\n1461,445,449,Protein\nPlu2850,452,459,Protein\n2852,462,466,Protein\nYE3190,469,475,Protein\nPlu2853,478,485,Protein\nYE0555,491,497,Protein\nPlu3738,500,507,Protein"}", "/scratch/micpie/export/bio_ner_4/valid_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: It attenuates the signaling activity of G-proteins, blocking the homing of Intra Epithelial Lymphocytes (IELs), and it is specifically expressed both in human small intestinal mucosa and in murine IELs, key players in the development of human CD villous atrophy [ 7], [ 17]..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: Intra Epithelial Lymphocytes,77,105,Anatomy\nIELs,108,112,Anatomy\nsmall intestinal mucosa,162,185,Anatomy\nIELs,200,204,Anatomy"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Continuous variables are expressed as medians with interquartile ranges (IQRs, at 25th and 75th percentiles). Data was not available on the age at sexual debut for one woman, lifetime number of sexual partners of one woman, and number of sexual acts with study partner in the last month of two women. 
* The hormonal contraceptives included oral pills, norethisterone enanthate, Depo-Provera, and steroids..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: hormonal contraceptives,308,331,treatment\nnorethisterone enanthate,353,377,treatment\nDepo - Provera,379,393,treatment\nsteroids,399,407,treatment"}", "/scratch/micpie/export/bio_ner_4/test_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: Culturing activated NK cells with DCs at low NK\/DC ratios (1: 5) led to increases in TNF-alpha production, which were augmented dramatically by the addition of suboptimal doses (10 ng\/ml) of LPS (Fig. 2 A)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: NK cells,20,28,Anatomy\nDCs,34,37,Anatomy\nNK,45,47,Anatomy\nDC,50,52,Anatomy"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Stock concentrations of fluorene, naphthalene, and benzo [] pyrene in acetone was hand mixed into 350 g batches of sand from St. 
George or Orange beach for 1 min to achieve final concentrations of 100 g g1 naphthalene, 120 g g1 fluorene, and 20 g g1 benzo [] pyrene in the columns..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: fluorene,24,32,treatment\nnaphthalene,34,45,treatment\nnaphthalene,206,217,treatment\nfluorene,228,236,treatment"}", "/scratch/micpie/export/bio_ner_4/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: For example, although aspirin is a generally well-tolerated pain reliever and is increasingly advocated as a preventative tool for heart attacks and colorectal cancer (Vainio and Miller 2003; Werner et al. 2004), it is also linked to increased risk of gastrointestinal bleeding, cerebral hemorrhage (Werner et al. 2004), and asthma attacks (Jenkins et al. 2004)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: heart,133,138,Anatomy\ncolorectal cancer,151,168,Anatomy\ngastrointestinal,255,271,Anatomy\ncerebral,282,290,Anatomy"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Patients received either a zincsubstituted carbonated hydroxyapatite dentifrice (HA group, BioRepair, Wolff, Bielefeld, Germany) or a dentifrice containing an amine fluoride\/stannous fluoride (AmF\/SnF2 group, Meridol, CP GABA, Hamburg, Germany) with no further oral hygiene instructions..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: zincsubstituted carbonated hydroxyapatite dentifrice,27,79,treatment\nBioRepair,92,101,treatment\ndentifrice containing an amine fluoride \/ stannous 
fluoride,135,194,treatment\nMeridol,215,222,treatment"}", "/scratch/micpie/export/bio_ner_8/valid_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: In the fetal GK\/Par rat exposed to mild hyperglycemia during gestation (a model of IUED), data from our group suggest that the beta-cell deficit (reduced by more than 50%) starts as early as fetal age E16 and reflects decreased beta-cell proliferation, a limitation of beta-cell neogenesis from precursors, and increased apoptosis of both beta cells and their precursors [ 86]..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: fetal,7,12,Anatomy\nbeta - cell,130,141,Anatomy\nfetal,197,202,Anatomy\nbeta - cell,234,245,Anatomy\nbeta - cell,277,288,Anatomy\nprecursors,305,315,Anatomy\nbeta cells,349,359,Anatomy\nprecursors,370,380,Anatomy"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: There was no cross-reaction with shrimp tissues or common shrimp viruses including white spot syndrome virus (WSSV), yellow head virus (YHV), Taura syndrome virus (TSV), Penaeus monodon nucleopolyhedrovirus (PemoNPV), Penaeus stylirostris densovirus (PstDNV) and Penaeus monodon densovirus (PmDNV)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: white spot syndrome virus,85,110,Organism\/Species\nWSSV,113,117,Organism\/Species\nyellow head virus,120,137,Organism\/Species\nYHV,140,143,Organism\/Species\nTaura syndrome virus,146,166,Organism\/Species\nTSV,169,172,Organism\/Species\nPenaeus monodon nucleopolyhedrovirus,175,211,Organism\/Species\nPemoNPV,214,221,Organism\/Species"}", "/scratch/micpie/export/bio_ner_8/test_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: NK-DC 
(C-E) or NK-K562 (F-H) conjugate formation was measured by flow cytometry, at the following ratios (NK\/DC or NK\/K562): 1: 5 (C and F); 1: 1 (D and G); and 5: 1 (E and H)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: NK,0,2,Anatomy\nDC,5,7,Anatomy\nNK,20,22,Anatomy\nK562,25,29,Anatomy\nNK,117,119,Anatomy\nDC,122,124,Anatomy\nNK,128,130,Anatomy\nK562,133,137,Anatomy"} {"text":"Task: Please carry out the NER task for the the text below.\nText: No cross-reactions were found against other duck pathogens, including duck hepatitis A virus, duck plague herpesvirus, duck reovirus, Newcastle disease virus, and Riemerella anatipestifer 12\/19 (63.2%) and 26\/51 (51%) sera samples from two flocks of ducks that survived DAstV infections in commercial duck farms were positive for DAstV by this method, respectively..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: duck hepatitis A virus,72,94,Organism\/Species\nduck plague herpesvirus,96,119,Organism\/Species\nduck reovirus,121,134,Organism\/Species\nNewcastle disease virus,136,159,Organism\/Species\nRiemerella anatipestifer 12 \/ 19,165,197,Organism\/Species\n26 \/ 51,212,219,Organism\/Species\nDAstV,279,284,Organism\/Species\nDAstV,339,344,Organism\/Species"}", "/scratch/micpie/export/bio_ner_8/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: RT-PCR analyses of total RNA from ovaries (O), embryos (E), from male and female larvae (L), male soma (head plus thorax, MS) and female soma (head plus thorax, FS)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: 
ovaries,36,43,Anatomy\nembryos,50,57,Anatomy\nsoma,103,107,Anatomy\nhead,110,114,Anatomy\nthorax,120,126,Anatomy\nsoma,143,147,Anatomy\nhead,150,154,Anatomy\nthorax,160,166,Anatomy"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: To minimize competition for limited substrates between different microbial groups (e. g., sulfate-reducing and fermentative bacteria) the slurries were amended with a combination of six low-molecular-weight organic acids (acetate, butyrate, formate, lactate, propionate, and succinate), each at a final concentration of 5mM. Slurries were pasteurized at 80C for 1h to eliminate viable vegetative cells and were incubated at 50C immediately afterwards to promote germination and growth of thermophilic endospores..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: low - molecular - weight organic acids,189,227,treatment\nacetate,230,237,treatment\nbutyrate,239,247,treatment\nformate,249,256,treatment\nlactate,258,265,treatment\npropionate,267,277,treatment\nsuccinate,283,292,treatment\npasteurized at 80C for 1h,347,372,treatment"}", "/scratch/micpie/export/human_intestinal_absorption/test_0-10.jsonl": "{"text":"User: I'm looking for the canonical SMILES of a molecule that is absorbed from the human gastrointestinal system?\nAssistant: This is a molecule that is absorbed from the human gastrointestinal system: CC(=O)Nc1ccccc1"} {"text":"User: I'm looking for the canonical SMILES of a molecule that is absorbed from the human gastrointestinal system?\nAssistant: This is a molecule that is absorbed from the human gastrointestinal system: CCN(CC)[C@H](C)C(=O)c1ccccc1"}", "/scratch/micpie/export/human_intestinal_absorption/valid_0-8.jsonl": "{"text":"User: Is the molecule with the DeepSMILES CccNC)C))c=N)n-cccccc6))))))n5C absorbed from the human gastrointestinal system?\nAssistant: Yes, it is absorbed from 
the human gastrointestinal system."} {"text":"User: Is the molecule with the SMILES CC1=C(CC(=O)O)c2cc(F)ccc2\/C1=C\/c1ccc(S(C)=O)cc1 absorbed from the human gastrointestinal system?\nAssistant: Yes, it is absorbed from the human gastrointestinal system."}", "/scratch/micpie/export/human_intestinal_absorption/train_0-8.jsonl": "{"text":"User: Is the molecule with the canonical SMILES COc1ccccc1OCCNC[C@@H](O)c1ccc(C)cc1S(N)(=O)=O absorbed from the human gastrointestinal system?\nAssistant: Yes, it is absorbed from the human gastrointestinal system."} {"text":"User: Is the molecule with the canonical SMILES Nc1cc(-c2ccncc2)c[nH]c1=O absorbed from the human gastrointestinal system?\nAssistant: Yes, it is absorbed from the human gastrointestinal system."}", "/scratch/micpie/export/human_intestinal_absorption/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is absorbed from the human gastrointestinal system.\nMolecule IUPAC name: N-Phenylacetamide\nConstraint: Answer the question in a full sentence.\nResult: This molecule is absorbed from the human gastrointestinal system."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is absorbed from the human gastrointestinal system.\nMolecule SMILES: CCN(CC)[C@H](C)C(=O)c1ccccc1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is absorbed from the human gastrointestinal system."}", "/scratch/micpie/export/human_intestinal_absorption/valid_0-9.jsonl": "{"text":"User: Can you create the SMILES of a molecule that is absorbed from the human gastrointestinal system?\nAssistant: Sure, here you go: Cc1c(N(C)C)c(=N)n(-c2ccccc2)n1C"} {"text":"User: Can you create the DeepSMILES of a molecule that is absorbed from the human gastrointestinal system?\nAssistant: Yes, here you go: CC=CCC=O)O)))cccF)ccc6\/C9=C\/ccccSC)=O))cc6"}", "/scratch/micpie/export/human_intestinal_absorption/test_0-1.jsonl": 
"{"text":"Based on the IUPAC name representation N-Phenylacetamide, the molecule has human intestinal absorption characteristics."} {"text":"Based on the SMILES representation CCN(CC)[C@H](C)C(=O)c1ccccc1, the molecule has HIA properties."}", "/scratch/micpie/export/human_intestinal_absorption/valid_0-0.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][=C][Branch1][=Branch1][N][Branch1][C][C][C][C][=Branch1][C][=N][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][N][Ring1][#C][C] displays human intestinal absorption (HIA) properties."} {"text":"The molecule with the canonical SMILES representation of CC1=C(CC(=O)O)c2cc(F)ccc2\/C1=C\/c1ccc(S(C)=O)cc1 exhibits human intestinal absorption (HIA) properties."}", "/scratch/micpie/export/human_intestinal_absorption/test_0-2.jsonl": "{"text":"The canonical SMILES CC(=O)Nc1ccccc1 represents a molecule that shows HIA."} {"text":"The canonical SMILES CCN(CC)[C@H](C)C(=O)c1ccccc1 is from a molecule that exhibits human intestinal absorption."}", "/scratch/micpie/export/human_intestinal_absorption/valid_0-10.jsonl": "{"text":"User: I'm looking for the InChI of a molecule that is absorbed from the human gastrointestinal system?\nAssistant: This is a molecule that is absorbed from the human gastrointestinal system: InChI=1S\/C13H18N4\/c1-10-12(15(2)3)13(14)17(16(10)4)11-8-6-5-7-9-11\/h5-9,14H,1-4H3"} {"text":"User: I'm looking for the SELFIES of a molecule that is absorbed from the human gastrointestinal system?\nAssistant: This is a molecule that is absorbed from the human gastrointestinal system: [C][C][=C][Branch1][#Branch1][C][C][=Branch1][C][=O][O][C][=C][C][Branch1][C][F][=C][C][=C][Ring1][#Branch1][\/C][Ring1][=C][=C][\/C][=C][C][=C][Branch1][=Branch1][S][Branch1][C][C][=O][C][=C][Ring1][=Branch2]"}", "/scratch/micpie/export/human_intestinal_absorption/train_0-6.jsonl": "{"text":"Task: Please give me a SELFIES based on the text description.\nDescription: A molecule that is absorbed from the 
human gastrointestinal system.\nResult: [C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][C][N][C][C@@H1][Branch1][C][O][C][=C][C][=C][Branch1][C][C][C][=C][Ring1][#Branch1][S][Branch1][C][N][=Branch1][C][=O][=O]"} {"text":"Task: Please create a molecule InChI based on the description.\nDescription: A molecule that is absorbed from the human gastrointestinal system.\nResult: InChI=1S\/C10H9N3O\/c11-9-5-8(6-13-10(9)14)7-1-3-12-4-2-7\/h1-6H,11H2,(H,13,14)"}", "/scratch/micpie/export/human_intestinal_absorption/valid_0-6.jsonl": "{"text":"Task: Please generate a SMILES based on the text description below.\nDescription: A molecule that is absorbed from the human gastrointestinal system.\nResult: Cc1c(N(C)C)c(=N)n(-c2ccccc2)n1C"} {"text":"Task: Please create a molecule DeepSMILES based on the text description.\nDescription: A molecule that is absorbed from the human gastrointestinal system.\nResult: CC=CCC=O)O)))cccF)ccc6\/C9=C\/ccccSC)=O))cc6"}", "/scratch/micpie/export/human_intestinal_absorption/test_0-9.jsonl": "{"text":"User: Can you generate the canonical SMILES of a molecule that is absorbed from the human gastrointestinal system?\nAssistant: Of course, here you go: CC(=O)Nc1ccccc1"} {"text":"User: Can you create the SMILES of a molecule that is absorbed from the human gastrointestinal system?\nAssistant: Of course, here you go: CCN(CC)[C@H](C)C(=O)c1ccccc1"}", "/scratch/micpie/export/human_intestinal_absorption/test_0-0.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1] exhibits human intestinal absorption properties."} {"text":"The molecule with the SELFIES [C][C][N][Branch1][Ring1][C][C][C@H1][Branch1][C][C][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1] exhibits human intestinal absorption properties."}", "/scratch/micpie/export/human_intestinal_absorption/valid_0-7.jsonl": "{"text":"User: Can you derive if the molecule with the canonical SMILES 
Cc1c(N(C)C)c(=N)n(-c2ccccc2)n1C is absorbed from the human gastrointestinal system?\nAssistant: Yes, this molecule is absorbed from the human gastrointestinal system."} {"text":"User: Can you estimate if the molecule with the SELFIES [C][C][=C][Branch1][#Branch1][C][C][=Branch1][C][=O][O][C][=C][C][Branch1][C][F][=C][C][=C][Ring1][#Branch1][\/C][Ring1][=C][=C][\/C][=C][C][=C][Branch1][=Branch1][S][Branch1][C][C][=O][C][=C][Ring1][=Branch2] is absorbed from the human gastrointestinal system?\nAssistant: Yes, this molecule is absorbed from the human gastrointestinal system."}", "/scratch/micpie/export/human_intestinal_absorption/test_0-3.jsonl": "{"text":"The DeepSMILES CC=O)Ncccccc6 is absorbed from the human gastrointestinal system."} {"text":"The molecule canonical SMILES CCN(CC)[C@H](C)C(=O)c1ccccc1 is absorbed from the human gastrointestinal system."}", "/scratch/micpie/export/human_intestinal_absorption/valid_0-11.jsonl": "{"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should be absorbed from the human gastrointestinal system.\nAssistant: Ok, this DeepSMILES is absorbed from the human gastrointestinal system: CccNC)C))c=N)n-cccccc6))))))n5C"} {"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should be absorbed from the human gastrointestinal system.\nAssistant: Got it, here you go, this SELFIES is absorbed from the human gastrointestinal system: [C][C][=C][Branch1][#Branch1][C][C][=Branch1][C][=O][O][C][=C][C][Branch1][C][F][=C][C][=C][Ring1][#Branch1][\/C][Ring1][=C][=C][\/C][=C][C][=C][Branch1][=Branch1][S][Branch1][C][C][=O][C][=C][Ring1][=Branch2]"}", "/scratch/micpie/export/human_intestinal_absorption/train_0-0.jsonl": "{"text":"The molecule with the canonical SMILES COc1ccccc1OCCNC[C@@H](O)c1ccc(C)cc1S(N)(=O)=O exhibits human intestinal absorption (HIA) properties."} {"text":"The molecule with the SMILES Nc1cc(-c2ccncc2)c[nH]c1=O shows human intestinal absorption properties."}", "/scratch/micpie/export/human_intestinal_absorption/test_0-6.jsonl": "{"text":"Task: Please create a SELFIES based on the description.\nDescription: A molecule that is absorbed from the human gastrointestinal system.\nResult: [C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1]"} {"text":"Task: Please generate a canonical SMILES based on the text description.\nDescription: A molecule that is absorbed from the human gastrointestinal system.\nResult: CCN(CC)[C@H](C)C(=O)c1ccccc1"}", "/scratch/micpie/export/human_intestinal_absorption/train_0-10.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that is absorbed from the human gastrointestinal system?\nAssistant: This is a molecule that is absorbed from the human gastrointestinal system: COc1ccccc1OCCNC[C@@H](O)c1ccc(C)cc1S(N)(=O)=O"} {"text":"User: I'm looking for the canonical SMILES of a molecule that is absorbed from the human gastrointestinal system?\nAssistant: This is a molecule that is absorbed from the human gastrointestinal system: Nc1cc(-c2ccncc2)c[nH]c1=O"}", "/scratch/micpie/export/human_intestinal_absorption/train_0-3.jsonl": "{"text":"The molecule SELFIES 
[C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][C][N][C][C@@H1][Branch1][C][O][C][=C][C][=C][Branch1][C][C][C][=C][Ring1][#Branch1][S][Branch1][C][N][=Branch1][C][=O][=O] is absorbed from the human gastrointestinal system."} {"text":"The SELFIES [N][C][=C][C][Branch1][=Branch2][C][=C][C][=N][C][=C][Ring1][=Branch1][=C][NH1][C][Ring1][N][=O] is absorbed from the human gastrointestinal system."}", "/scratch/micpie/export/human_intestinal_absorption/train_0-12.jsonl": "{"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should be absorbed from the human gastrointestinal system.\nAssistant: Understood, this SELFIES is absorbed from the human gastrointestinal system: [C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][C][N][C][C@@H1][Branch1][C][O][C][=C][C][=C][Branch1][C][C][C][=C][Ring1][#Branch1][S][Branch1][C][N][=Branch1][C][=O][=O]"} {"text":"User: I want to generate a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should be absorbed from the human gastrointestinal system.\nAssistant: Got it, this canonical SMILES is absorbed from the human gastrointestinal system: Nc1cc(-c2ccncc2)c[nH]c1=O"}", "/scratch/micpie/export/human_intestinal_absorption/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES CC(=O)Nc1ccccc1 absorbed from the human gastrointestinal system?\nConstraint: Even if you are uncertain, you must pick either a or b without using any other words.\nOptions:\na. False\nb. True\nAnswer: b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES representation of CCNCC))[C@H]C)C=O)cccccc6 absorbed from the human gastrointestinal system?\nConstraint: Even if you are not sure, you must pick either A or B without using any additional words.\nOptions:\nA. 
False\nB. True\nAnswer: B"}", "/scratch/micpie/export/human_intestinal_absorption/valid_0-2.jsonl": "{"text":"The DeepSMILES CccNC)C))c=N)n-cccccc6))))))n5C represents a molecule that exhibits human intestinal absorption (HIA)."} {"text":"The SMILES CC1=C(CC(=O)O)c2cc(F)ccc2\/C1=C\/c1ccc(S(C)=O)cc1 is from a molecule that displays human intestinal absorption (HIA)."}", "/scratch/micpie/export/human_intestinal_absorption/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are absorbed from the human gastrointestinal system?\nConstraint: You must select none, one or more options from A, B, C, or D without using any other words.\nOptions:\n[A] COcccccc6OCCNC[C@@H]O)ccccC)cc6SN)=O)=O\n[B] CC=O)Occcccc6O[C@H][C@H]OCC)=O)))C=C[C@H][C@H]C%11)NC)CCC[C@@]%147%11\n[C] CNC)CCCNcccccc6CCcccccc6%15\n[D] OCCNCCNCCCNcccccc6SccccCl)cc6%14)))))))))))))))))CC6\nAnswer: A, B, C, D"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are absorbed from the human gastrointestinal system?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any additional words.\nOptions:\nA.) OCCOCCN1CCN(C2=Nc3ccccc3Sc3ccccc32)CC1\nB.) Nc1cc(-c2ccncc2)c[nH]c1=O\nC.) CC(=O)OCC(=O)[C@@]1(O)CC[C@@H]2[C@@H]3CCC4=CC(=O)CC[C@]4(C)[C@]3(F)[C@@H](O)C[C@@]21C\nD.) CN[C@@H]1[C@H](O[C@H]2[C@@H](O[C@H]3[C@@H](O)[C@@H](O)[C@@H](NC(=N)N)[C@@H](O)[C@@H]3NC(=N)N)O[C@@H](C)[C@@]2(O)C=O)O[C@@H](CO)[C@@H](O)[C@H]1O\nE.) 
CC(=O)O[C@@H]1[C@H]([N+]2(C)CCCCC2)C[C@@H]2[C@@H]3CC[C@@H]4C[C@@H](OC(C)=O)[C@@H](N5CCCCC5)C[C@]4(C)[C@@H]3CC[C@@]12C\nAnswer: A, B, C"}", "/scratch/micpie/export/human_intestinal_absorption/valid_0-1.jsonl": "{"text":"Based on the canonical SMILES representation Cc1c(N(C)C)c(=N)n(-c2ccccc2)n1C, the molecule has human intestinal absorption (HIA) properties."} {"text":"Based on the InChI representation InChI=1S\/C20H17FO3S\/c1-12-17(9-13-3-6-15(7-4-13)25(2)24)16-8-5-14(21)10-19(16)18(12)11-20(22)23\/h3-10H,11H2,1-2H3,(H,22,23)\/b17-9+, the molecule has HIA characteristics."}", "/scratch/micpie/export/human_intestinal_absorption/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of Cc1c(N(C)C)c(=N)n(-c2ccccc2)n1C absorbed from the human gastrointestinal system?\nConstraint: Even if you are not sure, you must pick either a or b without using any additional words.\nOptions:\na: True\nb: False\nAnswer: a"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES [C][C][=C][Branch1][#Branch1][C][C][=Branch1][C][=O][O][C][=C][C][Branch1][C][F][=C][C][=C][Ring1][#Branch1][\/C][Ring1][=C][=C][\/C][=C][C][=C][Branch1][=Branch1][S][Branch1][C][C][=O][C][=C][Ring1][=Branch2] absorbed from the human gastrointestinal system?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any additional words.\nOptions:\n(1) False\n(2) True\nAnswer: 2"}", "/scratch/micpie/export/human_intestinal_absorption/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is absorbed from the human gastrointestinal system.\nMolecule InChI: InChI=1S\/C13H18N4\/c1-10-12(15(2)3)13(14)17(16(10)4)11-8-6-5-7-9-11\/h5-9,14H,1-4H3\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is absorbed from the human gastrointestinal system."} {"text":"Task: 
Please classify a molecule based on the description.\nDescription: A molecule that is absorbed from the human gastrointestinal system.\nMolecule SMILES: CC1=C(CC(=O)O)c2cc(F)ccc2\/C1=C\/c1ccc(S(C)=O)cc1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is absorbed from the human gastrointestinal system."}", "/scratch/micpie/export/human_intestinal_absorption/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is absorbed from the human gastrointestinal system.\ncanonical SMILES: Cc1c(N(C)C)c(=N)n(-c2ccccc2)n1C\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is absorbed from the human gastrointestinal system.\nInChI: InChI=1S\/C20H17FO3S\/c1-12-17(9-13-3-6-15(7-4-13)25(2)24)16-8-5-14(21)10-19(16)18(12)11-20(22)23\/h3-10H,11H2,1-2H3,(H,22,23)\/b17-9+\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"}", "/scratch/micpie/export/human_intestinal_absorption/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is absorbed from the human gastrointestinal system.\nSMILES: COc1ccccc1OCCNC[C@@H](O)c1ccc(C)cc1S(N)(=O)=O\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is absorbed from the human gastrointestinal system."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is absorbed from the human gastrointestinal system.\nMolecule canonical SMILES: Nc1cc(-c2ccncc2)c[nH]c1=O\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is absorbed from the human gastrointestinal system."}", "/scratch/micpie/export/human_intestinal_absorption/valid_0-12.jsonl": "{"text":"User: I 
want to generate a SELFIES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should be absorbed from the human gastrointestinal system.\nAssistant: Understood, this SELFIES is absorbed from the human gastrointestinal system: [C][C][=C][Branch1][=Branch1][N][Branch1][C][C][C][C][=Branch1][C][=N][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][N][Ring1][#C][C]"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should be absorbed from the human gastrointestinal system.\nAssistant: Got it, this InChI is absorbed from the human gastrointestinal system: InChI=1S\/C20H17FO3S\/c1-12-17(9-13-3-6-15(7-4-13)25(2)24)16-8-5-14(21)10-19(16)18(12)11-20(22)23\/h3-10H,11H2,1-2H3,(H,22,23)\/b17-9+"}", "/scratch/micpie/export/human_intestinal_absorption/train_0-2.jsonl": "{"text":"The InChI InChI=1S\/C18H24N2O5S\/c1-13-7-8-14(18(11-13)26(19,22)23)15(21)12-20-9-10-25-17-6-4-3-5-16(17)24-2\/h3-8,11,15,20-21H,9-10,12H2,1-2H3,(H2,19,22,23)\/t15-\/m1\/s1 represents a molecule that displays human intestinal absorption (HIA)."} {"text":"The SELFIES [N][C][=C][C][Branch1][=Branch2][C][=C][C][=N][C][=C][Ring1][=Branch1][=C][NH1][C][Ring1][N][=O] represents a molecule that shows HIA."}", "/scratch/micpie/export/human_intestinal_absorption/test_0-11.jsonl": "{"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should be absorbed from the human gastrointestinal system.\nAssistant: Ok, here you go, this DeepSMILES is absorbed from the human gastrointestinal system: CC=O)Ncccccc6"} {"text":"User: I want to create a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should be absorbed from the human gastrointestinal system.\nAssistant: Got it, this canonical SMILES is absorbed from the human gastrointestinal system: CCN(CC)[C@H](C)C(=O)c1ccccc1"}", "/scratch/micpie/export/human_intestinal_absorption/train_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the SMILES COc1ccccc1OCCNC[C@@H](O)c1ccc(C)cc1S(N)(=O)=O is absorbed from the human gastrointestinal system?\nAssistant: Yes, this molecule is absorbed from the human gastrointestinal system."} {"text":"User: Can you estimate if the molecule with the SELFIES [N][C][=C][C][Branch1][=Branch2][C][=C][C][=N][C][=C][Ring1][=Branch1][=C][NH1][C][Ring1][N][=O] is absorbed from the human gastrointestinal system?\nAssistant: Yes, this molecule is absorbed from the human gastrointestinal system."}", "/scratch/micpie/export/human_intestinal_absorption/train_0-11.jsonl": "{"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should be absorbed from the human gastrointestinal system.\nAssistant: Got it, this SELFIES is absorbed from the human gastrointestinal system: [C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][C][N][C][C@@H1][Branch1][C][O][C][=C][C][=C][Branch1][C][C][C][=C][Ring1][#Branch1][S][Branch1][C][N][=Branch1][C][=O][=O]"} {"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should be absorbed from the human gastrointestinal system.\nAssistant: Got it, this SELFIES is absorbed from the human gastrointestinal system: [N][C][=C][C][Branch1][=Branch2][C][=C][C][=N][C][=C][Ring1][=Branch1][=C][NH1][C][Ring1][N][=O]"}", "/scratch/micpie/export/human_intestinal_absorption/train_0-1.jsonl": "{"text":"Based on the DeepSMILES COcccccc6OCCNC[C@@H]O)ccccC)cc6SN)=O)=O, the molecule has human intestinal absorption (HIA) properties."} {"text":"Based on the SELFIES representation [N][C][=C][C][Branch1][=Branch2][C][=C][C][=N][C][=C][Ring1][=Branch1][=C][NH1][C][Ring1][N][=O], the molecule has human intestinal absorption features."}", "/scratch/micpie/export/human_intestinal_absorption/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI InChI=1S\/C18H24N2O5S\/c1-13-7-8-14(18(11-13)26(19,22)23)15(21)12-20-9-10-25-17-6-4-3-5-16(17)24-2\/h3-8,11,15,20-21H,9-10,12H2,1-2H3,(H2,19,22,23)\/t15-\/m1\/s1 absorbed from the human gastrointestinal system?\nConstraint: Even if you are uncertain, you must pick either a or b without using any other words.\nOptions:\n(a) True\n(b) False\nAnswer: a"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES Nc1cc(-c2ccncc2)c[nH]c1=O absorbed from the human gastrointestinal system?\nConstraint: Even if you are not sure, you must pick either A or B without using any other words.\nOptions:\nA False\nB True\nAnswer: B"}", "/scratch/micpie/export/human_intestinal_absorption/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is absorbed from the human gastrointestinal system.\ncanonical SMILES: COc1ccccc1OCCNC[C@@H](O)c1ccc(C)cc1S(N)(=O)=O\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a molecule based on the 
description.\nDescription: A molecule that is absorbed from the human gastrointestinal system.\nMolecule InChI: InChI=1S\/C10H9N3O\/c11-9-5-8(6-13-10(9)14)7-1-3-12-4-2-7\/h1-6H,11H2,(H,13,14)\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"}", "/scratch/micpie/export/human_intestinal_absorption/test_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the SELFIES [C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1] is absorbed from the human gastrointestinal system?\nAssistant: Yes, this molecule is absorbed from the human gastrointestinal system."} {"text":"User: Can you tell me if the molecule with the InChI InChI=1S\/C13H19NO\/c1-4-14(5-2)11(3)13(15)12-9-7-6-8-10-12\/h6-11H,4-5H2,1-3H3\/t11-\/m1\/s1 is absorbed from the human gastrointestinal system?\nAssistant: Yes, this molecule is absorbed from the human gastrointestinal system."}", "/scratch/micpie/export/human_intestinal_absorption/train_0-9.jsonl": "{"text":"User: Can you generate the DeepSMILES of a molecule that is absorbed from the human gastrointestinal system?\nAssistant: Yes, here you go: COcccccc6OCCNC[C@@H]O)ccccC)cc6SN)=O)=O"} {"text":"User: Can you give me the SELFIES of a molecule that is absorbed from the human gastrointestinal system?\nAssistant: Sure, here you go: [N][C][=C][C][Branch1][=Branch2][C][=C][C][=N][C][=C][Ring1][=Branch1][=C][NH1][C][Ring1][N][=O]"}", "/scratch/micpie/export/human_intestinal_absorption/valid_0-3.jsonl": "{"text":"The SMILES Cc1c(N(C)C)c(=N)n(-c2ccccc2)n1C is absorbed from the human gastrointestinal system."} {"text":"The InChI InChI=1S\/C20H17FO3S\/c1-12-17(9-13-3-6-15(7-4-13)25(2)24)16-8-5-14(21)10-19(16)18(12)11-20(22)23\/h3-10H,11H2,1-2H3,(H,22,23)\/b17-9+ is absorbed from the human gastrointestinal system."}", "/scratch/micpie/export/human_intestinal_absorption/test_0-8.jsonl": "{"text":"User: Is the molecule with the DeepSMILES CC=O)Ncccccc6 absorbed 
from the human gastrointestinal system?\nAssistant: Yes, it is absorbed from the human gastrointestinal system."} {"text":"User: Is the molecule with the canonical SMILES CCN(CC)[C@H](C)C(=O)c1ccccc1 absorbed from the human gastrointestinal system?\nAssistant: Yes, it is absorbed from the human gastrointestinal system."}", "/scratch/micpie/export/human_intestinal_absorption/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are absorbed from the human gastrointestinal system?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any other words.\nOptions:\nA: [C][C@H1][Branch1][=Branch1][C][=Branch1][C][=O][O][C][=C][C][=C][C][Branch1][=N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring1][=C]\nB: [C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1]\nC: [O][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][C][C][N][Branch2][Ring1][N][C][C][C][Branch1][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][C][Ring2][Ring1][#Branch1]\nD: [C][O][C][=C][C][=N][C][Branch2][Ring1][O][C][S][=Branch1][C][=O][C][=N][C][=C][C][Branch1][#Branch1][O][C][Branch1][C][F][F][=C][C][=C][Ring1][#Branch2][NH1][Ring1][=N][=C][Ring2][Ring1][=Branch1][O][C]\nE: [C][C@H1][Branch1][=Branch1][C][=Branch1][C][=O][O][C][=C][C][=C][Branch1][#C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][=Branch2][=O][C][=C][Ring1][S]\nAnswer: A, B, C, D, E"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are absorbed from the human gastrointestinal system?\nConstraint: You must select none, one or more options from a, b, or c without using any other words.\nOptions:\na CCC)C)cccc[C@H]O)CCCNCCCCO)cccccc6))))))cccccc6)))))))CC6))))))))))cc6\nb O=CNCCO[N+]=O)[O-]))))))ccccnc6\nc CCNCC))[C@H]C)C=O)cccccc6\nAnswer: a, b, c"}", 
"/scratch/micpie/export/human_intestinal_absorption/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are absorbed from the human gastrointestinal system?\nConstraint: You must select none, one or more options from 1, 2, 3, 4, or 5 without using any other words.\nOptions:\n1: CCCCCCCNCC))CCC[C@@H]O)ccccNSC)=O)=O)))cc6\n2: COcccccc[C@H]C)C=O)O)))ccc6c%10\n3: CcccCC=O)O)))nC)c5C=O)ccccCl)cc6\n4: CccNC)C))c=N)n-cccccc6))))))n5C\n5: N#C[C@H]O[C@@H]O[C@@H]CO[C@@H]O[C@@H]CO))[C@@H]O)[C@H]O)[C@@H]6O)))))))))[C@@H]O)[C@H]O)[C@@H]6O))))))))cccccc6\nAnswer: 1, 2, 3, 4"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are absorbed from the human gastrointestinal system?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any additional words.\nOptions:\nA CC(C)N(CC[C@](C(N)=O)(c1ccccc1)c1cccnc1)C(C)C\nB NCCC[C@](N)(C(=O)O)C(F)F\nC O=C(N[C@H](CO)[C@@H](O)c1ccc([N+](=O)[O-])cc1)C(Cl)Cl\nD CC1=C(CC(=O)O)c2cc(F)ccc2\/C1=C\/c1ccc(S(C)=O)cc1\nE CCCc1nc2c(n1Cc1ccc(-c3ccccc3C(=O)O)cc1)=C[C@H](c1nc3ccccc3n1C)CC=2C\nAnswer: A, B, C, D, E"}", "/scratch/micpie/export/human_intestinal_absorption/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is absorbed from the human gastrointestinal system.\nIUPAC name: N-Phenylacetamide\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is absorbed from the human gastrointestinal system.\nMolecule InChI: InChI=1S\/C13H19NO\/c1-4-14(5-2)11(3)13(15)12-9-7-6-8-10-12\/h6-11H,4-5H2,1-3H3\/t11-\/m1\/s1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"}", 
"/scratch/micpie/export/human_intestinal_absorption/test_0-12.jsonl": "{"text":"User: I want to come up with a molecule IUPAC name.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should be absorbed from the human gastrointestinal system.\nAssistant: Got it, this IUPAC name is absorbed from the human gastrointestinal system: N-Phenylacetamide"} {"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should be absorbed from the human gastrointestinal system.\nAssistant: Got it, this DeepSMILES is absorbed from the human gastrointestinal system: CCNCC))[C@H]C)C=O)cccccc6"}", "/scratch/micpie/export/bio_ner_38/valid_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Three groups of catalysts clearly show affinities toward: (1) hydrogen formation [ on early transition metals (Ti, V, Cr, Mn, Zr, Nb, Mo, Hf, Ta, We, and Re) and platinum group metals (Ru, Rh, Ir, and Pt)], (2) sorbitol formation [ on late transition metals (Fe, Co, Ni, Cu, Pd, Au, and Ag) and Al (sp metal)], and (3) sorbitol and 2-deoxysorbitol formation [ on post-transition metals (In, Sn, Sb, Pb, and Bi), as well as Zn and Cd (d metals)]..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: hydrogen,63,71,Chemical\/Drug\ntransition 
metals,93,110,Chemical\/Drug\nTi,113,115,Chemical\/Drug\nV,117,118,Chemical\/Drug\nCr,120,122,Chemical\/Drug\nMn,124,126,Chemical\/Drug\nZr,128,130,Chemical\/Drug\nNb,132,134,Chemical\/Drug\nMo,136,138,Chemical\/Drug\nHf,140,142,Chemical\/Drug\nTa,144,146,Chemical\/Drug\nWe,148,150,Chemical\/Drug\nRe,156,158,Chemical\/Drug\nplatinum,164,172,Chemical\/Drug\nRu,188,190,Chemical\/Drug\nRh,192,194,Chemical\/Drug\nIr,196,198,Chemical\/Drug\nPt,204,206,Chemical\/Drug\nsorbitol,215,223,Chemical\/Drug\ntransition metals,244,261,Chemical\/Drug\nFe,264,266,Chemical\/Drug\nCo,268,270,Chemical\/Drug\nNi,272,274,Chemical\/Drug\nCu,276,278,Chemical\/Drug\nPd,280,282,Chemical\/Drug\nAu,284,286,Chemical\/Drug\nAg,292,294,Chemical\/Drug\nAl,300,302,Chemical\/Drug\nsorbitol,326,334,Chemical\/Drug\n2 - deoxysorbitol,339,356,Chemical\/Drug\ntransition metals,379,396,Chemical\/Drug\nIn,399,401,Chemical\/Drug\nSn,403,405,Chemical\/Drug\nSb,407,409,Chemical\/Drug\nPb,411,413,Chemical\/Drug\nBi,419,421,Chemical\/Drug\nZn,435,437,Chemical\/Drug\nCd,442,444,Chemical\/Drug"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Three groups of catalysts clearly show affinities toward: (1) hydrogen formation [ on early transition metals (Ti, V, Cr, Mn, Zr, Nb, Mo, Hf, Ta, We, and Re) and platinum group metals (Ru, Rh, Ir, and Pt)], (2) sorbitol formation [ on late transition metals (Fe, Co, Ni, Cu, Pd, Au, and Ag) and Al (sp metal)], and (3) sorbitol and 2-deoxysorbitol formation [ on post-transition metals (In, Sn, Sb, Pb, and Bi), as well as Zn and Cd (d metals)]..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: hydrogen,63,71,Chemical\/Drug\ntransition 
metals,93,110,Chemical\/Drug\nTi,113,115,Chemical\/Drug\nV,117,118,Chemical\/Drug\nCr,120,122,Chemical\/Drug\nMn,124,126,Chemical\/Drug\nZr,128,130,Chemical\/Drug\nNb,132,134,Chemical\/Drug\nMo,136,138,Chemical\/Drug\nHf,140,142,Chemical\/Drug\nTa,144,146,Chemical\/Drug\nWe,148,150,Chemical\/Drug\nRe,156,158,Chemical\/Drug\nplatinum,164,172,Chemical\/Drug\nRu,188,190,Chemical\/Drug\nRh,192,194,Chemical\/Drug\nIr,196,198,Chemical\/Drug\nPt,204,206,Chemical\/Drug\nsorbitol,215,223,Chemical\/Drug\ntransition metals,244,261,Chemical\/Drug\nFe,264,266,Chemical\/Drug\nCo,268,270,Chemical\/Drug\nNi,272,274,Chemical\/Drug\nCu,276,278,Chemical\/Drug\nPd,280,282,Chemical\/Drug\nAu,284,286,Chemical\/Drug\nAg,292,294,Chemical\/Drug\nAl,300,302,Chemical\/Drug\nsorbitol,326,334,Chemical\/Drug\n2 - deoxysorbitol,339,356,Chemical\/Drug\ntransition metals,379,396,Chemical\/Drug\nIn,399,401,Chemical\/Drug\nSn,403,405,Chemical\/Drug\nSb,407,409,Chemical\/Drug\nPb,411,413,Chemical\/Drug\nBi,419,421,Chemical\/Drug\nZn,435,437,Chemical\/Drug\nCd,442,444,Chemical\/Drug"}", "/scratch/micpie/export/bio_ner_38/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Three groups of catalysts clearly show affinities toward: (1) hydrogen formation [ on early transition metals (Ti, V, Cr, Mn, Zr, Nb, Mo, Hf, Ta, We, and Re) and platinum group metals (Ru, Rh, Ir, and Pt)], (2) sorbitol formation [ on late transition metals (Fe, Co, Ni, Cu, Pd, Au, and Ag) and Al (sp metal)], and (3) sorbitol and 2-deoxysorbitol formation [ on post-transition metals (In, Sn, Sb, Pb, and Bi), as well as Zn and Cd (d metals)]..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: hydrogen,63,71,Chemical\/Drug\ntransition 
metals,93,110,Chemical\/Drug\nTi,113,115,Chemical\/Drug\nV,117,118,Chemical\/Drug\nCr,120,122,Chemical\/Drug\nMn,124,126,Chemical\/Drug\nZr,128,130,Chemical\/Drug\nNb,132,134,Chemical\/Drug\nMo,136,138,Chemical\/Drug\nHf,140,142,Chemical\/Drug\nTa,144,146,Chemical\/Drug\nWe,148,150,Chemical\/Drug\nRe,156,158,Chemical\/Drug\nplatinum,164,172,Chemical\/Drug\nRu,188,190,Chemical\/Drug\nRh,192,194,Chemical\/Drug\nIr,196,198,Chemical\/Drug\nPt,204,206,Chemical\/Drug\nsorbitol,215,223,Chemical\/Drug\ntransition metals,244,261,Chemical\/Drug\nFe,264,266,Chemical\/Drug\nCo,268,270,Chemical\/Drug\nNi,272,274,Chemical\/Drug\nCu,276,278,Chemical\/Drug\nPd,280,282,Chemical\/Drug\nAu,284,286,Chemical\/Drug\nAg,292,294,Chemical\/Drug\nAl,300,302,Chemical\/Drug\nsorbitol,326,334,Chemical\/Drug\n2 - deoxysorbitol,339,356,Chemical\/Drug\ntransition metals,379,396,Chemical\/Drug\nIn,399,401,Chemical\/Drug\nSn,403,405,Chemical\/Drug\nSb,407,409,Chemical\/Drug\nPb,411,413,Chemical\/Drug\nBi,419,421,Chemical\/Drug\nZn,435,437,Chemical\/Drug\nCd,442,444,Chemical\/Drug"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Three groups of catalysts clearly show affinities toward: (1) hydrogen formation [ on early transition metals (Ti, V, Cr, Mn, Zr, Nb, Mo, Hf, Ta, We, and Re) and platinum group metals (Ru, Rh, Ir, and Pt)], (2) sorbitol formation [ on late transition metals (Fe, Co, Ni, Cu, Pd, Au, and Ag) and Al (sp metal)], and (3) sorbitol and 2-deoxysorbitol formation [ on post-transition metals (In, Sn, Sb, Pb, and Bi), as well as Zn and Cd (d metals)]..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: hydrogen,63,71,Chemical\/Drug\ntransition 
metals,93,110,Chemical\/Drug\nTi,113,115,Chemical\/Drug\nV,117,118,Chemical\/Drug\nCr,120,122,Chemical\/Drug\nMn,124,126,Chemical\/Drug\nZr,128,130,Chemical\/Drug\nNb,132,134,Chemical\/Drug\nMo,136,138,Chemical\/Drug\nHf,140,142,Chemical\/Drug\nTa,144,146,Chemical\/Drug\nWe,148,150,Chemical\/Drug\nRe,156,158,Chemical\/Drug\nplatinum,164,172,Chemical\/Drug\nRu,188,190,Chemical\/Drug\nRh,192,194,Chemical\/Drug\nIr,196,198,Chemical\/Drug\nPt,204,206,Chemical\/Drug\nsorbitol,215,223,Chemical\/Drug\ntransition metals,244,261,Chemical\/Drug\nFe,264,266,Chemical\/Drug\nCo,268,270,Chemical\/Drug\nNi,272,274,Chemical\/Drug\nCu,276,278,Chemical\/Drug\nPd,280,282,Chemical\/Drug\nAu,284,286,Chemical\/Drug\nAg,292,294,Chemical\/Drug\nAl,300,302,Chemical\/Drug\nsorbitol,326,334,Chemical\/Drug\n2 - deoxysorbitol,339,356,Chemical\/Drug\ntransition metals,379,396,Chemical\/Drug\nIn,399,401,Chemical\/Drug\nSn,403,405,Chemical\/Drug\nSb,407,409,Chemical\/Drug\nPb,411,413,Chemical\/Drug\nBi,419,421,Chemical\/Drug\nZn,435,437,Chemical\/Drug\nCd,442,444,Chemical\/Drug"}", "/scratch/micpie/export/drug_protein_pathway_disease/valid_0-0.jsonl": "{"text":"The drug [C][N][C][=N][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][C][C][C][=Branch1][C][=O][N][Ring1][=Branch2][C] targets the protein 61 kDa Cam-PDE. The protein 61 kDa Cam-PDE is involved in the Metabolic pathways. The Metabolic pathways is modulated by the disease Urofacial syndrome."} {"text":"The drug COc1c(C)c2c(c(O)c1C\/C=C(\\C)CCC(=O)OCCN1CCOCC1)C(=O)OC2 targets the protein 6-pyruvoyl tetrahydrobiopterin synthase. The protein 6-pyruvoyl tetrahydrobiopterin synthase is involved in the Folate biosynthesis. The Folate biosynthesis is modulated by the disease Hypophosphatasia."}", "/scratch/micpie/export/drug_protein_pathway_disease/test_0-0.jsonl": "{"text":"The drug [H][C@@]CC[C@H]O)[C@@]5C)CC[C@][H])C=CC=CO)C=C6C[C@@H]CCCCCCCCCS=O)CCCCF)F)CF)F)F))))))))))))))))[C@@]%17%10[H] targets the protein Estrogen receptor. 
The protein Estrogen receptor is involved in the Estrogen signaling pathway. The Estrogen signaling pathway is modulated by the disease Pachyonychia congenita."} {"text":"The drug [H][C@@]C)NC=O)[C@][H])CC=CC=CC=C6)))))))NC=O)[C@@][H])S)CCCCC)))))))))))CO)=O targets the protein Neprilysin. The protein Neprilysin is involved in the Renin-angiotensin system. The Renin-angiotensin system is modulated by the disease Renal tubular dysgenesis."}", "/scratch/micpie/export/drug_protein_pathway_disease/train_0-0.jsonl": "{"text":"The drug InChI=1S\/C14H16N2O5\/c17-11(5-8-14(20)21)15-9-1-3-10(4-2-9)16-12(18)6-7-13(16)19\/h1-4,11,15,17H,5-8H2,(H,20,21)\/t11-\/m0\/s1 targets the protein MLC-2B. The protein MLC-2B is involved in the Platelet activation. The Platelet activation is modulated by the disease Platelet-type von Willebrand disease."} {"text":"The drug InChI=1S\/C6H14N4O3\/c7-4(5(11)12)2-1-3-9-6(8)10-13\/h4,13H,1-3,7H2,(H,11,12)(H3,8,9,10)\/t4-\/m0\/s1 targets the protein Constitutive NOS. The protein Constitutive NOS is involved in the Arginine biosynthesis. 
The Arginine biosynthesis is modulated by the disease Citrullinemia."}", "/scratch/micpie/export/compound_protein/test_0-1.jsonl": "{"text":"The protein Ataxia telangiectasia mutated is targeted by the drug with the SMILES Cc1nn(CC(=O)NC(C)C(C)C)c(C)c1S(=O)(=O)N1CCCCC1."} {"text":"The protein NET is targeted by the drug with the SELFIES [O][=C][N][C][C][Branch1][N][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1][C][Ring1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1]."}", "/scratch/micpie/export/compound_protein/valid_0-0.jsonl": "{"text":"The compound SELFIES [C][C][C][=C][C][C][C][=Branch1][C][=O][N][Branch2][Ring1][O][C][=C][C][=C][C][Branch1][P][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][=C][Ring1][S][C][=Branch1][C][=O][C][Ring2][Ring1][O][Ring2][Ring1][#Branch1] targets the protein A-T mutated."} {"text":"The compound canonical SMILES CCCCCCn1nc(NC(=O)C2CNC(=O)C2)cc1-c1ccccc1 targets the protein Sodium\/glucose cotransporter 1."}", "/scratch/micpie/export/compound_protein/test_0-2.jsonl": "{"text":"User: Can you come up with one example for a compound canonical SMILES that targets the protein Ataxia telangiectasia mutated?\nAssistant: Of course, the compound canonical SMILES Cc1nn(CC(=O)NC(C)C(C)C)c(C)c1S(=O)(=O)N1CCCCC1 targets the protein Ataxia telangiectasia mutated."} {"text":"User: Can you come up with an example for a compound InChI that targets the protein Sodium-dependent noradrenaline transporter?\nAssistant: Yes, of course, the compound InChI InChI=1S\/C16H13Cl2NO\/c17-12-6-4-10(5-7-12)15-14(9-19-16(15)20)11-2-1-3-13(18)8-11\/h1-8,14-15H,9H2,(H,19,20) targets the protein Sodium-dependent noradrenaline transporter."}", "/scratch/micpie/export/compound_protein/test_0-0.jsonl": "{"text":"The compound InChI InChI=1S\/C17H30N4O3S\/c1-12(2)13(3)18-16(22)11-21-15(5)17(14(4)19-21)25(23,24)20-9-7-6-8-10-20\/h12-13H,6-11H2,1-5H3,(H,18,22) targets the protein Serine-protein kinase ATM."} {"text":"The compound 
SMILES O=C1NCC(c2cccc(Cl)c2)C1c1ccc(Cl)cc1 targets the protein Sodium-dependent noradrenaline transporter."}", "/scratch/micpie/export/compound_protein/train_0-0.jsonl": "{"text":"The compound DeepSMILES CccccSCCC=O)ccccF)cc6))))))))C=O)O))))cc6 targets the protein A-T mutated."} {"text":"The compound InChI InChI=1S\/C19H23FN2O\/c1-23-19-8-3-2-5-16(19)14-18(22-11-9-21-10-12-22)15-6-4-7-17(20)13-15\/h2-8,13,18,21H,9-12,14H2,1H3 targets the protein Norepinephrine transporter."}", "/scratch/micpie/export/compound_protein/valid_0-2.jsonl": "{"text":"User: Can you give me an example for a compound InChI that targets the protein A-T mutated?\nAssistant: Yes, the compound InChI InChI=1S\/C22H19BrN2O3\/c1-13-4-2-7-18-19(13)22(28)25(21(18)27)17-6-3-5-14(12-17)20(26)24-16-10-8-15(23)9-11-16\/h2-6,8-13,18-19H,7H2,1H3,(H,24,26) targets the protein A-T mutated."} {"text":"User: Can you come up with one example for a compound SELFIES that targets the protein Na(+)\/glucose cotransporter 1?\nAssistant: Sure, the compound SELFIES [C][C][C][C][C][C][N][N][=C][Branch1][S][N][C][=Branch1][C][=O][C][C][N][C][=Branch1][C][=O][C][Ring1][=Branch1][C][=C][Ring1][=C][C][=C][C][=C][C][=C][Ring1][=Branch1] targets the protein Na(+)\/glucose cotransporter 1."}", "/scratch/micpie/export/compound_protein/valid_0-1.jsonl": "{"text":"The protein Ataxia telangiectasia mutated is targeted by the drug with the SELFIES [C][C][C][=C][C][C][C][=Branch1][C][=O][N][Branch2][Ring1][O][C][=C][C][=C][C][Branch1][P][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][=C][Ring1][S][C][=Branch1][C][=O][C][Ring2][Ring1][O][Ring2][Ring1][#Branch1]."} {"text":"The protein Na(+)\/glucose cotransporter 1 is targeted by the drug with the InChI InChI=1S\/C20H26N4O2\/c1-2-3-4-8-11-24-17(15-9-6-5-7-10-15)13-18(23-24)22-20(26)16-12-19(25)21-14-16\/h5-7,9-10,13,16H,2-4,8,11-12,14H2,1H3,(H,21,25)(H,22,23,26)."}", "/scratch/micpie/export/compound_protein/train_0-2.jsonl": "{"text":"User: Can you 
give me one example for a compound canonical SMILES that targets the protein A-T mutated?\nAssistant: Yes, of course, the compound canonical SMILES Cc1ccc(SC(CC(=O)c2ccc(F)cc2)C(=O)O)cc1 targets the protein A-T mutated."} {"text":"User: Can you come up with an example for a compound DeepSMILES that targets the protein Sodium-dependent noradrenaline transporter?\nAssistant: Sure, the compound DeepSMILES COcccccc6CCcccccF)c6))))))NCCNCC6 targets the protein Sodium-dependent noradrenaline transporter."}", "/scratch/micpie/export/compound_protein/train_0-1.jsonl": "{"text":"The protein Ataxia telangiectasia mutated is targeted by the drug with the DeepSMILES CccccSCCC=O)ccccF)cc6))))))))C=O)O))))cc6."} {"text":"The protein Sodium-dependent noradrenaline transporter is targeted by the drug with the InChI InChI=1S\/C19H23FN2O\/c1-23-19-8-3-2-5-16(19)14-18(22-11-9-21-10-12-22)15-6-4-7-17(20)13-15\/h2-8,13,18,21H,9-12,14H2,1H3."}", "/scratch/micpie/export/train_02.jsonl": "{"text":"User: Can you create the SELFIES of the molecule with the SMILES [H].[H]C1OC(C2C([H])C([H])C([H])C(F)C2[H])NC1C([H])([H])N1C([H])C([H])([H])C([H])(C(O)OC([H])([H])[H])C([H])([H])C1([H])[H]?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [H].[H][C][O][C][Branch2][Ring1][=Branch1][C][C][Branch1][C][H][C][Branch1][C][H][C][Branch1][C][H][C][Branch1][C][F][C][Ring1][#Branch2][H][N][C][Ring1][S][C][Branch1][C][H][Branch1][C][H][N][C][Branch1][C][H][C][Branch1][C][H][Branch1][C][H][C][Branch1][C][H][Branch1][=C][C][Branch1][C][O][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][C][H][Branch1][C][H][C][Ring2][Ring1][Ring1][Branch1][C][H][H]."} {"text":"Question: What is the number of hydrogen bond donor sites of the compound with SMILES C[C@@H](CNC(=O)Nc1nc(c(s1)C(C)C)c2ccccc2)SC?\nAnswer: 2"}", "/scratch/micpie/export/qm9/train_0-16.jsonl": "{"text":"Task: Please create a chemical with the canonical SMILES based on the text description.\nDescription: It has a heat capacity of 
6.316 cal\/(mol K) at 298.15 K and a dipole moment of 1.6256 Debye.\nResult: N"} {"text":"Task: Please give me a molecule with the SELFIES based on the description below.\nDescription: It has a heat capacity of 23.434 cal\/(mol K) at 298.15 K and a dipole moment of 0.8626 Debye.\nResult: [C][N][C][C][C][O][C][Ring1][#Branch1][Ring1][Branch1][C][Ring1][#Branch1][C][Ring1][=Branch1][Ring1][Branch1]"}", "/scratch/micpie/export/qm9/test_0-10.jsonl": "{"text":"The molecule with the DeepSMILES representation of O when calculated with B3LYP DFT simlulations has an enthalpy of -76.400922 Hartree at 298.15 K."} {"text":"The chemical with the SELFIES representation of [C][N][C][C][C][C][Ring1][Branch1][C][Ring1][#Branch1][Ring1][Branch1][C][N][Ring1][=Branch1][Ring1][Branch1] when calculated with B3LYP DFT simlulations has an enthalpy of -380.747675 Hartree at 298.15 K."}", "/scratch/micpie/export/qm9/valid_0-8.jsonl": "{"text":"As per Density Functional Theory simulation the molecular species with canonical SMILES C has an internal energy of -40.47893 Hartree at 0 K."} {"text":"As per DFT simulation the compound with SMILES C1N2C3C4C5CC13C2C45 has an internal energy of -364.720374 Hartree at 0 K."}", "/scratch/micpie/export/qm9/test_0-16.jsonl": "{"text":"Task: Please create a chemical with the SELFIES based on the description below.\nDescription: It has a heat capacity of 6.002 cal\/(mol K) at 298.15 K and a dipole moment of 1.8511 Debye.\nResult: [O]"} {"text":"Task: Please generate a molecule with the SMILES based on the text description.\nDescription: It has a heat capacity of 23.972 cal\/(mol K) at 298.15 K and a dipole moment of 1.248 Debye.\nResult: C1N2C3C4C5C2C13CN45"}", "/scratch/micpie/export/qm9/test_0-15.jsonl": "{"text":"Task: Please create a molecule with the SMILES based on the text description.\nDescription: A molecule with a dipole moment of 1.8511 Debye and an isotropic polarizability of 6.31 Bohr^3.\nResult: O"} {"text":"Task: Please create a molecule 
with the SMILES based on the text description below.\nDescription: A molecule with a dipole moment of 1.248 Debye and an isotropic polarizability of 73.6 Bohr^3.\nResult: C1N2C3C4C5C2C13CN45"}", "/scratch/micpie/export/qm9/train_0-8.jsonl": "{"text":"As per Density Functional Theory simulation the chemical compound with SELFIES [N] has an internal energy of -56.525887 Hartree at 0 K."} {"text":"As per DFT calculation the chemical with SELFIES [C][N][C][C][C][O][C][Ring1][#Branch1][Ring1][Branch1][C][Ring1][#Branch1][C][Ring1][=Branch1][Ring1][Branch1] has an internal energy of -400.633052 Hartree at 0 K."}", "/scratch/micpie/export/qm9/test_0-5.jsonl": "{"text":"The chemical described by its InChI representation InChI=1S\/H2O\/h1H2 possesses a HOMO-LUMO gap measuring 0.3615 Hartree as per DFT results calculated with B3LYP accuracy."} {"text":"The molecule described by its SMILES notation C1N2C3C4C5C2C13CN45 possesses a HOMO-LUMO gap measuring 0.2953 Hartree as per Density Functional Theory results calculated with B3LYP exchange correlation functional."}", "/scratch/micpie/export/qm9/valid_0-9.jsonl": "{"text":"The molecule represented in DeepSMILES as C has an internal energy of -40.476062 Hartree at 298.15 K when calculated using DFT with B3LYP functional."} {"text":"The chemical represented in SMILES as C1N2C3C4C5CC13C2C45 has an internal energy of -364.714974 Hartree at 298.15 K when calculated using Density Functional Theory with B3LYP functional."}", "/scratch/micpie/export/qm9/test_0-1.jsonl": "{"text":"The isotropic polarizability of compound with the canonical SMILES O is 6.31 Bohr^3 calculated using DFT with B3LYP exchange correlation functional."} {"text":"The isotropic polarizability of molecule with the SELFIES [C][N][C][C][C][C][Ring1][Branch1][C][Ring1][#Branch1][Ring1][Branch1][C][N][Ring1][=Branch1][Ring1][Branch1] is 73.6 Bohr^3 calculated using DFT with B3LYP functional."}", "/scratch/micpie/export/qm9/valid_0-0.jsonl": "{"text":"The chemical with 
the SELFIES representation of [C] has a dipole moment of 0.0 Debye, calculated computationally using DFT with B3LYP exchange correlation functional."} {"text":"The molecular species with the DeepSMILES representation of CNCCCCC75C7C65 has a dipole moment of 1.9576 Debye, simulated computationally using Density Functional Theory with B3LYP exchange correlation functional."}", "/scratch/micpie/export/qm9/test_0-2.jsonl": "{"text":"The molecule with the DeepSMILES O has a moment of inertia along principal axis of rotation of 799.58812 GHz calculated computationally."} {"text":"The molecule with the SMILES C1N2C3C4C5C2C13CN45 has a moment of inertia along principal axis of rotation of 3.67118 GHz calculated computationally."}", "/scratch/micpie/export/qm9/valid_0-10.jsonl": "{"text":"The molecule with the DeepSMILES representation of C when calculated with B3LYP DFT simlulations has an enthalpy of -40.475117 Hartree at 298.15 K."} {"text":"The molecule with the SMILES representation of C1N2C3C4C5CC13C2C45 when calculated with B3LYP DFT simlulations has an enthalpy of -364.71403 Hartree at 298.15 K."}", "/scratch/micpie/export/qm9/train_0-6.jsonl": "{"text":"The compound with the DeepSMILES representation of N has an electronic spatial extent of 26.1563 Bohr^2 computed using DFT."} {"text":"The compound with the InChI representation of InChI=1S\/C7H7NO\/c1-7-5-2-3(4(2)9-7)6(7)8(1)5\/h2-6H,1H2\/t2-,3+,4-,5-,6+,7+ has an electronic spatial extent of 756.3557 Bohr^2 computed using Density Functional Theory."}", "/scratch/micpie/export/qm9/valid_0-6.jsonl": "{"text":"The compound with the DeepSMILES representation of C has an electronic spatial extent of 35.3641 Bohr^2 computed using DFT."} {"text":"The molecular species with the canonical SMILES representation of C1C2C3C2C2N4CC12C34 has an electronic spatial extent of 803.1904 Bohr^2 computed using Density Functional Theory."}", "/scratch/micpie/export/qm9/test_0-9.jsonl": "{"text":"The molecular species represented in 
InChI as InChI=1S\/H2O\/h1H2 has an internal energy of -76.401867 Hartree at 298.15 K when calculated using Density Functional Theory with B3LYP exchange correlation functional."} {"text":"The compound represented in SELFIES as [C][N][C][C][C][C][Ring1][Branch1][C][Ring1][#Branch1][Ring1][Branch1][C][N][Ring1][=Branch1][Ring1][Branch1] has an internal energy of -380.748619 Hartree at 298.15 K when calculated using Density Functional Theory with B3LYP exchange correlation functional."}", "/scratch/micpie/export/qm9/test_0-0.jsonl": "{"text":"The chemical compound with the InChI InChI=1S\/H2O\/h1H2 has a dipole moment of 1.8511 Debye, calculated computationally using Density Functional Theory with B3LYP accuracy."} {"text":"The compound with the DeepSMILES CNCCCC5C75CN65 has a dipole moment of 1.248 Debye, calculated computationally using Density Functional Theory with B3LYP functional."}", "/scratch/micpie/export/qm9/valid_0-16.jsonl": "{"text":"Task: Please create a compound with the DeepSMILES based on the text description below.\nDescription: It has a heat capacity of 6.469 cal\/(mol K) at 298.15 K and a dipole moment of 0.0 Debye.\nResult: C"} {"text":"Task: Please create a compound with the DeepSMILES based on the text description below.\nDescription: It has a heat capacity of 24.796 cal\/(mol K) at 298.15 K and a dipole moment of 1.9576 Debye.\nResult: CNCCCCC75C7C65"}", "/scratch/micpie/export/qm9/valid_0-7.jsonl": "{"text":"The molecule with the SMILES C has a zero point energy of 0.044749 Hartree when computed using DFT with B3LYP functional."} {"text":"The chemical with the DeepSMILES representation of CNCCCCC75C7C65 has a zero point energy of 0.152222 Hartree when computed using Density Functional Theory with B3LYP functional."}", "/scratch/micpie/export/qm9/test_0-3.jsonl": "{"text":"Based on DFT simulation with B3LYP exchange correlation functional, the molecule with the DeepSMILES O has an energy of highest occupied molecular orbital -0.2928 Hartree."} 
{"text":"Based on Density Functional Theory simulation with B3LYP exchange correlation functional, the molecule with the canonical SMILES C1N2C3C2C2N4CC12C34 has an energy of highest occupied molecular orbital -0.2233 Hartree."}", "/scratch/micpie/export/qm9/valid_0-11.jsonl": "{"text":"The SELFIES [C] represents a molecular species that has a Gibbs free energy of -40.498597 Hartree at 298.15 K, calculated computationally using Density Functional Theory with B3LYP accuracy."} {"text":"The DeepSMILES CNCCCCC75C7C65 represents a molecular species that has a Gibbs free energy of -364.74965 Hartree at 298.15 K, calculated computationally using Density Functional Theory with B3LYP accuracy."}", "/scratch/micpie/export/qm9/train_0-0.jsonl": "{"text":"The molecule with the canonical SMILES N has a dipole moment of 1.6256 Debye, simulated computationally using DFT with B3LYP exchange correlation functional."} {"text":"The molecule with the canonical SMILES C1N2C3C4C5OC13C2C54 has a dipole moment of 0.8626 Debye, simulated computationally using DFT with B3LYP accuracy."}", "/scratch/micpie/export/qm9/test_0-6.jsonl": "{"text":"The chemical with the InChI representation of InChI=1S\/H2O\/h1H2 has an electronic spatial extent of 19.0002 Bohr^2 computed using Density Functional Theory."} {"text":"The chemical compound with the SMILES representation of C1N2C3C4C5C2C13CN45 has an electronic spatial extent of 780.3553 Bohr^2 computed using DFT."}", "/scratch/micpie/export/qm9/train_0-10.jsonl": "{"text":"The compound with the SELFIES representation of [N] when calculated with B3LYP DFT simlulations has an enthalpy of -56.522082 Hartree at 298.15 K."} {"text":"The molecule with the SMILES representation of C1N2C3C4C5OC13C2C45 when calculated with B3LYP DFT simlulations has an enthalpy of -400.626948 Hartree at 298.15 K."}", "/scratch/micpie/export/qm9/train_0-3.jsonl": "{"text":"Based on DFT calculation with B3LYP exchange correlation functional, the compound with the SELFIES [N] 
has an energy of highest occupied molecular orbital -0.257 Hartree."} {"text":"Based on Density Functional Theory simulation with B3LYP accuracy, the chemical with the SMILES C1N2C3C4C5OC13C2C45 has an energy of highest occupied molecular orbital -0.2316 Hartree."}", "/scratch/micpie/export/qm9/train_0-12.jsonl": "{"text":"At temperature 298.15 K, the Density Functional Theory calculated value of heat capacity is 6.316 cal\/(mol K) for the compound with the canonical SMILES representation of N."} {"text":"At temperature 298.15 K, the Density Functional Theory calculated value of heat capacity is 23.434 cal\/(mol K) for the chemical compound with the SELFIES [C][N][C][C][C][O][C][Ring1][#Branch1][Ring1][Branch1][C][Ring1][#Branch1][C][Ring1][=Branch1][Ring1][Branch1]."}", "/scratch/micpie/export/qm9/test_0-13.jsonl": "{"text":"'Question: What is a compound with a gap of 0.3615 Hartree and an energy of highest occupied molecular orbital -0.2928 Hartree?\nAnswer: A compound with DeepSMILES O'"} {"text":"'Question: What is a molecular species with a gap of 0.2953 Hartree and an energy of highest occupied molecular orbital -0.2233 Hartree?\nAnswer: A molecular species with SMILES C1N2C3C4C5C2C13CN45'"}", "/scratch/micpie/export/qm9/valid_0-2.jsonl": "{"text":"The molecule with the DeepSMILES C has a Rotational constant A of 157.7118 GHz calculated computationally."} {"text":"The chemical with the SMILES C1N2C3C4C5CC13C2C45 has a Rotational constant A of 3.52845 GHz calculated computationally."}", "/scratch/micpie/export/qm9/train_0-14.jsonl": "{"text":"'Question: What is a molecule with an electronic spatial extent of 26.1563 Bohr^2 and an energy of lowest unoccupied molecular orbital 0.0829 Hartree?\nAnswer: A molecule with InChI InChI=1S\/H3N\/h1H3'"} {"text":"'Question: What is a molecule with an electronic spatial extent of 756.3557 Bohr^2 and an energy of lowest unoccupied molecular orbital 0.0742 Hartree?\nAnswer: A molecule with canonical SMILES 
C1N2C3C4C5OC13C2C54'"}", "/scratch/micpie/export/qm9/valid_0-1.jsonl": "{"text":"The polarizability of compound with the canonical SMILES C is 13.21 Bohr^3 calculated using DFT with B3LYP exchange correlation functional."} {"text":"The polarizability of molecule with the canonical SMILES C1C2C3C2C2N4CC12C34 is 77.4 Bohr^3 calculated using Density Functional Theory with B3LYP accuracy."}", "/scratch/micpie/export/qm9/valid_0-13.jsonl": "{"text":"'Question: What is a molecular species with a gap of 0.5048 Hartree and an energy of highest occupied molecular orbital -0.3877 Hartree?\nAnswer: A molecular species with InChI InChI=1S\/CH4\/h1H4'"} {"text":"'Question: What is a chemical with a gap of 0.3003 Hartree and an energy of highest occupied molecular orbital -0.2122 Hartree?\nAnswer: A chemical with canonical SMILES C1C2C3C2C2N4CC12C34'"}", "/scratch/micpie/export/qm9/valid_0-5.jsonl": "{"text":"The compound described by its InChI notation InChI=1S\/CH4\/h1H4 possesses a HOMO-LUMO gap measuring 0.5048 Hartree as per Density Functional Theory results calculated with B3LYP functional."} {"text":"The molecular species represented by its DeepSMILES notation CNCCCCC75C7C65 possesses a HOMO-LUMO gap measuring 0.3003 Hartree as per Density Functional Theory results calculated with B3LYP accuracy."}", "/scratch/micpie/export/qm9/train_0-15.jsonl": "{"text":"Task: Please generate a molecule with the SELFIES based on the description.\nDescription: A molecule with a dipole moment of 1.6256 Debye and an isotropic polarizability of 9.46 Bohr^3.\nResult: [N]"} {"text":"Task: Please give me a molecule with the InChI based on the text description.\nDescription: A molecule with a dipole moment of 0.8626 Debye and an isotropic polarizability of 69.48 Bohr^3.\nResult: InChI=1S\/C7H7NO\/c1-7-5-2-3(4(2)9-7)6(7)8(1)5\/h2-6H,1H2\/t2-,3+,4-,5-,6+,7+"}", "/scratch/micpie/export/qm9/valid_0-4.jsonl": "{"text":"The lowest unoccupied molecular orbital computed using DFT simulation and B3LYP 
accuracy is 0.1171 Hartree."} {"text":"The lowest unoccupied molecular orbital calculated using DFT simulation and B3LYP functional is 0.0881 Hartree."}", "/scratch/micpie/export/qm9/train_0-5.jsonl": "{"text":"The molecular species represented by its DeepSMILES notation N possesses a HOMO-LUMO gap measuring 0.3399 Hartree as per DFT results calculated with B3LYP functional."} {"text":"The molecular species represented by its DeepSMILES representation CNCCCOC75C7C65 possesses a HOMO-LUMO gap measuring 0.3058 Hartree as per Density Functional Theory results calculated with B3LYP accuracy."}", "/scratch/micpie/export/qm9/valid_0-15.jsonl": "{"text":"Task: Please generate a molecule with the InChI based on the text description.\nDescription: A molecule with a dipole moment of 0.0 Debye and an isotropic polarizability of 13.21 Bohr^3.\nResult: InChI=1S\/CH4\/h1H4"} {"text":"Task: Please create a molecule with the DeepSMILES based on the description below.\nDescription: A molecule with a dipole moment of 1.9576 Debye and an isotropic polarizability of 77.4 Bohr^3.\nResult: CNCCCCC75C7C65"}", "/scratch/micpie/export/qm9/valid_0-12.jsonl": "{"text":"At temperature 298.15 K, the DFT calculated value of heat capacity is 6.469 cal\/(mol K) for the compound with the canonical SMILES C."} {"text":"At temperature 298.15 K, the DFT calculated value of heat capacity is 24.796 cal\/(mol K) for the molecular species with the canonical SMILES representation of C1C2C3C2C2N4CC12C34."}", "/scratch/micpie/export/qm9/train_0-2.jsonl": "{"text":"The molecule with the InChI InChI=1S\/H3N\/h1H3 has a Rotational constant A of 293.60975 GHz calculated computationally."} {"text":"The molecule with the SELFIES [C][N][C][C][C][O][C][Ring1][#Branch1][Ring1][Branch1][C][Ring1][#Branch1][C][Ring1][=Branch1][Ring1][Branch1] has a Rotational constant A of 3.64015 GHz calculated computationally."}", "/scratch/micpie/export/qm9/test_0-11.jsonl": "{"text":"The SMILES O represents a molecular species 
that has a Gibbs free energy of -76.422349 Hartree at 298.15 K, calculated computationally using DFT with B3LYP exchange correlation functional."} {"text":"The canonical SMILES C1N2C3C2C2N4CC12C34 represents a chemical that has a Gibbs free energy of -380.783148 Hartree at 298.15 K, calculated computationally using DFT with B3LYP accuracy."}", "/scratch/micpie/export/qm9/train_0-7.jsonl": "{"text":"The compound with the SMILES N has a zpve of 0.034358 Hartree when computed using Density Functional Theory with B3LYP functional."} {"text":"The chemical with the SMILES C1N2C3C4C5OC13C2C45 has a zpve of 0.127862 Hartree when computed using DFT with B3LYP functional."}", "/scratch/micpie/export/qm9/train_0-11.jsonl": "{"text":"The SELFIES [N] is from a chemical that has a Gibbs free energy of -56.544961 Hartree at 298.15 K, calculated computationally using DFT with B3LYP exchange correlation functional."} {"text":"The SMILES C1N2C3C4C5OC13C2C45 is from a chemical compound that has a Gibbs free energy of -400.662186 Hartree at 298.15 K, calculated computationally using Density Functional Theory with B3LYP accuracy."}", "/scratch/micpie/export/qm9/train_0-1.jsonl": "{"text":"The polarizability of chemical compound with the SELFIES [N] is 9.46 Bohr^3 calculated using Density Functional Theory with B3LYP functional."} {"text":"The isotropic polarizability of molecular species with the canonical SMILES C1N2C3C4C5OC13C2C54 is 69.48 Bohr^3 calculated using Density Functional Theory with B3LYP accuracy."}", "/scratch/micpie/export/qm9/train_0-13.jsonl": "{"text":"'Question: What is a compound with a homo lumo gap of 0.3399 Hartree and an energy of highest occupied molecular orbital -0.257 Hartree?\nAnswer: A compound with SELFIES [N]'"} {"text":"'Question: What is a compound with a HOMO-LUMO gap of 0.3058 Hartree and an energy of highest occupied molecular orbital -0.2316 Hartree?\nAnswer: A compound with canonical SMILES C1N2C3C4C5OC13C2C54'"}", 
"/scratch/micpie/export/qm9/train_0-4.jsonl": "{"text":"The LUMO calculated using DFT calculation and B3LYP exchange correlation functional is 0.0829 Hartree."} {"text":"The lowest unoccupied molecular orbital calculated using Density Functional Theory simulation and B3LYP functional is 0.0742 Hartree."}", "/scratch/micpie/export/qm9/test_0-7.jsonl": "{"text":"The chemical with the SMILES representation of O has a zero point vibrational energy of 0.021375 Hartree when computed using Density Functional Theory with B3LYP functional."} {"text":"The molecule with the InChI InChI=1S\/C7H8N2\/c1-7-2-9-5(7)3-4(6(7)9)8(1)3\/h3-6H,1-2H2\/t3-,4+,5+,6-,7+,8? has a zpve of 0.140458 Hartree when computed using Density Functional Theory with B3LYP functional."}", "/scratch/micpie/export/qm9/train_0-9.jsonl": "{"text":"The chemical represented in SELFIES as [N] has an internal energy of -56.523026 Hartree at 298.15 K when calculated using DFT with B3LYP functional."} {"text":"The chemical compound represented in DeepSMILES as CNCCCOC75C7C65 has an internal energy of -400.627892 Hartree at 298.15 K when calculated using Density Functional Theory with B3LYP exchange correlation functional."}", "/scratch/micpie/export/qm9/valid_0-3.jsonl": "{"text":"Based on DFT calculation with B3LYP exchange correlation functional, the molecule with the DeepSMILES C has an energy of highest occupied molecular orbital -0.3877 Hartree."} {"text":"Based on DFT simulation with B3LYP exchange correlation functional, the molecule with the SELFIES [C][N][C][C][C][C][C][Ring1][#Branch1][Ring1][Branch1][C][Ring1][#Branch1][C][Ring1][=Branch1][Ring1][Branch1] has an energy of highest occupied molecular orbital -0.2122 Hartree."}", "/scratch/micpie/export/qm9/test_0-8.jsonl": "{"text":"As per Density Functional Theory calculation the molecule with InChI InChI=1S\/H2O\/h1H2 has an internal energy of -76.404702 Hartree at 0 K."} {"text":"As per Density Functional Theory calculation the molecule with DeepSMILES 
CNCCCC5C75CN65 has an internal energy of -380.753918 Hartree at 0 K."}", "/scratch/micpie/export/qm9/test_0-14.jsonl": "{"text":"'Question: What is a molecule with an electronic spatial extent of 19.0002 Bohr^2 and an energy of lowest unoccupied molecular orbital 0.0687 Hartree?\nAnswer: A molecule with SELFIES [O]'"} {"text":"'Question: What is a molecule with an electronic spatial extent of 780.3553 Bohr^2 and an energy of lowest unoccupied molecular orbital 0.072 Hartree?\nAnswer: A molecule with canonical SMILES C1N2C3C2C2N4CC12C34'"}", "/scratch/micpie/export/qm9/valid_0-14.jsonl": "{"text":"'Question: What is a molecule with an electronic spatial extent of 35.3641 Bohr^2 and an energy of lowest unoccupied molecular orbital 0.1171 Hartree?\nAnswer: A molecule with InChI InChI=1S\/CH4\/h1H4'"} {"text":"'Question: What is a molecule with an electronic spatial extent of 803.1904 Bohr^2 and an energy of lowest unoccupied molecular orbital 0.0881 Hartree?\nAnswer: A molecule with InChI InChI=1S\/C8H9N\/c1-3-4-5(3)7-8(1)2-9(7)6(4)8\/h3-7H,1-2H2\/t3-,4+,5-,6-,7+,8-'"}", "/scratch/micpie/export/qm9/test_0-4.jsonl": "{"text":"The lumo calculated using Density Functional Theory simulation and B3LYP accuracy is 0.0687 Hartree."} {"text":"The lumo calculated using Density Functional Theory simulation and B3LYP functional is 0.072 Hartree."}", "/scratch/micpie/export/qm9/test_0-12.jsonl": "{"text":"At temperature 298.15 K, the Density Functional Theory calculated value of heat capacity is 6.002 cal\/(mol K) for the chemical with the DeepSMILES representation of O."} {"text":"At temperature 298.15 K, the Density Functional Theory calculated value of heat capacity is 23.972 cal\/(mol K) for the molecular species with the SELFIES [C][N][C][C][C][C][Ring1][Branch1][C][Ring1][#Branch1][Ring1][Branch1][C][N][Ring1][=Branch1][Ring1][Branch1]."}", "/scratch/micpie/export/perovskite_db/train_0-17.jsonl": "{"text":"User: I would like to know the bandgap of a perovskite 
material.\nAssistant: Cool! What material are you interested in?\nUser: I am interested in the material with the reduced chemical formula of CH6I3NPb.\nAssistant: The bandgap of the material with reduced chemical formula CH6I3NPb is 1.50 eV."} {"text":"User: I want to know the bandgap of a perovskite material.\nAssistant: Cool! What material are you interested in?\nUser: I am interested in the material with the reduced chemical formula of CH6I3NPb.\nAssistant: The bandgap of the material with reduced chemical formula CH6I3NPb is 1.60 eV."}", "/scratch/micpie/export/perovskite_db/train_0-16.jsonl": "{"text":"Question: What is a perovskite solar cell with a a short-circuit voltage of 161.00 mA\/cm^2 and a FF of 0.57 percent?\nAnswer: A device with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Au"} {"text":"Question: What is a perovskite solar cell with a a short-circuit voltage (JSC) of 11.50 mA\/cm^2 and a FF of 0.72 percent?\nAnswer: A solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au"}", "/scratch/micpie/export/perovskite_db/test_0-10.jsonl": "{"text":"Question: What is a perovskite solar cell with a a short-circuit voltage (JSC) of 154.30 mA\/cm^2?\nAnswer: A perovskite solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Carbon"} {"text":"Question: What is a perovskite solar cell with a a JSC of 201.10 mA\/cm^2?\nAnswer: A device with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au"}", "/scratch/micpie/export/perovskite_db/valid_0-8.jsonl": "{"text":"Question: What is a perovskite material with a bandgap of 1.60 eV?\nAnswer: A perovskite material with reduced chemical formula CH6I3NPb"} {"text":"Question: What is a perovskite material with a bandgap of 1.60 eV?\nAnswer: A perovskite material with reduced chemical formula CH6I3NPb"}", "/scratch/micpie/export/perovskite_db/test_0-22.jsonl": "{"text":"User: I would like to design a solar cell with a PCE of 6.17 
percent.\nAssistant: Cool! Do you have other needs?\nUser: I would like the solar cell to have an open-circuit voltage of 0.78 V and a JSC of 154.30 mA\/cm^2.\nAssistant: In your case, I recommend SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Carbon."} {"text":"User: I would like to design a perovskite solar cell with a power conversion efficiency of 12.66 percent.\nAssistant: Cool! Do you have other constraints?\nUser: I want the perovskite solar cell to have an OCV of 1.01 V and a short-circuit voltage of 201.10 mA\/cm^2.\nAssistant: I would try the device stack SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au."}", "/scratch/micpie/export/perovskite_db/test_0-16.jsonl": "{"text":"Question: What is a perovskite solar cell with a a short-circuit voltage (JSC) of 154.30 mA\/cm^2 and a fill factor (FF) of 0.51 percent?\nAnswer: A solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Carbon"} {"text":"Question: What is a perovskite solar cell with a a JSC of 201.10 mA\/cm^2 and a FF of 0.62 percent?\nAnswer: A device with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au"}", "/scratch/micpie/export/perovskite_db/test_0-15.jsonl": "{"text":"Question: What is a perovskite solar cell with a an open-circuit voltage (OCV) of 0.78 V and a PCE of 6.17 percent?\nAnswer: A solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Carbon"} {"text":"Question: What is a perovskite solar cell with a an open-circuit voltage of 1.01 V and a PCE of 12.66 percent?\nAnswer: A device with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au"}", "/scratch/micpie/export/perovskite_db/train_0-8.jsonl": "{"text":"Question: What is a perovskite material with a bandgap of 1.50 eV?\nAnswer: A perovskite material with reduced chemical formula CH6I3NPb"} {"text":"Question: What is a perovskite material with a bandgap of 1.60 eV?\nAnswer: A perovskite material with reduced chemical formula CH6I3NPb"}", 
"/scratch/micpie/export/perovskite_db/test_0-5.jsonl": "{"text":"The perovskite solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Carbon has a FF of 0.51 percent."} {"text":"The solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au achieves a FF of 0.62 percent."}", "/scratch/micpie/export/perovskite_db/valid_0-25.jsonl": "{"text":"User: I want to design a perovskite solar cell with a PCE of 9.80 percent.\nAssistant: Awesome! Do you have other needs?\nUser: Yes, I also want the perovskite solar cell to have CH6I3NPb as the absorber.\nAssistant: In your case, I recommend SLG, ITO, PEDOT:PSS, DPP-DTT, MAPbI3, PCBM-60, LiF, and Ag."} {"text":"User: I would like to design a perovskite solar cell with a power conversion efficiency (PCE) of 16.06 percent.\nAssistant: Cool! Do you have other requirements?\nUser: Yes, I want the perovskite solar cell to have CH6I3NPb as the perovskite material.\nAssistant: I would try the device stack SLG, ITO, VOx, X-DVTPD, MAPbI3, PCBM-60, Bphen, and Ag."}", "/scratch/micpie/export/perovskite_db/valid_0-9.jsonl": "{"text":"Question: What is a perovskite solar cell with a an OCV of 1.00 V?\nAnswer: A perovskite solar cell with the device stack of SLG, ITO, PEDOT:PSS, DPP-DTT, MAPbI3, PCBM-60, LiF, and Ag"} {"text":"Question: What is a perovskite solar cell with a an open-circuit voltage of 1.07 V?\nAnswer: A perovskite solar cell with the device stack of SLG, ITO, VOx, X-DVTPD, MAPbI3, PCBM-60, Bphen, and Ag"}", "/scratch/micpie/export/perovskite_db/test_0-19.jsonl": "{"text":"User: I would love to design a perovskite solar cell with a power conversion efficiency of 6.17 percent.\nAssistant: Interesting! 
Do you have other requirements?\nUser: I would like the perovskite solar cell to have a fill factor (FF) of 0.51 percent.\nAssistant: In that case, you should use the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Carbon."} {"text":"User: I would love to design a perovskite solar cell with a power conversion efficiency (PCE) of 12.66 percent.\nAssistant: Awesome! Do you have other requirements?\nUser: I want the perovskite solar cell to have a fill factor of 0.62 percent.\nAssistant: In that case, you should use the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au."}", "/scratch/micpie/export/perovskite_db/valid_0-24.jsonl": "{"text":"User: I would like to design a perovskite solar cell with a PCE of 9.80 percent.\nAssistant: That's interesting! Do you have other requirements?\nUser: Indeed, I need the perovskite solar cell to have MAPbI3 as the perovskite material.\nAssistant: In that case, you should use the device stack of SLG, ITO, PEDOT:PSS, DPP-DTT, MAPbI3, PCBM-60, LiF, and Ag."} {"text":"User: I want to design a solar cell with a power conversion efficiency of 16.06 percent.\nAssistant: Interesting! Do you have other constraints?\nUser: Yes, I also want the solar cell to have MAPbI3 as the absorber.\nAssistant: In that case, you should use the device stack of SLG, ITO, VOx, X-DVTPD, MAPbI3, PCBM-60, Bphen, and Ag."}", "/scratch/micpie/export/perovskite_db/train_0-24.jsonl": "{"text":"User: I would like to design a solar cell with a power conversion efficiency of 5.50 percent.\nAssistant: That's interesting! Do you have other requirements?\nUser: Indeed, I need the solar cell to have MAPbI3 as the perovskite material.\nAssistant: In that case, you should use the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Au."} {"text":"User: I would love to design a perovskite solar cell with a power conversion efficiency of 0.07 percent.\nAssistant: Awesome! 
Do you have other requirements?\nUser: Yes, I also want the perovskite solar cell to have MAPbI3 as the absorber.\nAssistant: In your case, I recommend SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au."}", "/scratch/micpie/export/perovskite_db/test_0-1.jsonl": "{"text":"The perovskite material with the descriptive chemical formula of MAPbI3 has an open-circuit voltage of 0.78 V."} {"text":"The perovskite material with the descriptive chemical formula of MAPbI3 has an open-circuit voltage of 1.01 V."}", "/scratch/micpie/export/perovskite_db/test_0-18.jsonl": "{"text":"User: I want to know the bandgap of a perovskite material.\nAssistant: That's interesting! What material are you interested in?\nUser: I am interested in the material with the descriptive chemical formula of MAPbI3.\nAssistant: The bandgap of the material with descriptive chemical formula MAPbI3 is 1.60 eV."} {"text":"User: I would love to know the bandgap of a perovskite material.\nAssistant: Awesome! What material are you interested in?\nUser: I am interested in the material with the descriptive chemical formula of MAPbI3.\nAssistant: The bandgap of the material with descriptive chemical formula MAPbI3 is 1.60 eV."}", "/scratch/micpie/export/perovskite_db/valid_0-0.jsonl": "{"text":"The perovskite material with the reduced chemical formula of CH6I3NPb has a bandgap of 1.60 eV."} {"text":"The perovskite material with the reduced chemical formula of CH6I3NPb has a bandgap of 1.60 eV."}", "/scratch/micpie/export/perovskite_db/test_0-21.jsonl": "{"text":"User: I would like to design a perovskite solar cell with a PCE of 6.17 percent.\nAssistant: Awesome! Do you have other constraints?\nUser: I would like the perovskite solar cell to have an open-circuit voltage of 0.78 V.\nAssistant: In your case, I recommend SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Carbon."} {"text":"User: I want to design a perovskite solar cell with a power conversion efficiency of 12.66 percent.\nAssistant: Awesome! 
Do you have other requirements?\nUser: I want the perovskite solar cell to have an open-circuit voltage of 1.01 V.\nAssistant: In your case, I recommend SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au."}", "/scratch/micpie/export/perovskite_db/test_0-2.jsonl": "{"text":"The solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Carbon has a bandgap of 1.60 eV."} {"text":"The solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au has a bandgap of 1.60 eV."}", "/scratch/micpie/export/perovskite_db/train_0-22.jsonl": "{"text":"User: I want to design a perovskite solar cell with a power conversion efficiency of 5.50 percent.\nAssistant: Interesting! Do you have other constraints?\nUser: I would like the perovskite solar cell to have an open-circuit voltage of 0.63 V and a short-circuit voltage of 161.00 mA\/cm^2.\nAssistant: I would try the device stack SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Au."} {"text":"User: I would love to design a perovskite solar cell with a power conversion efficiency of 0.07 percent.\nAssistant: That's interesting! 
Do you have other requirements?\nUser: I would like the perovskite solar cell to have an open-circuit voltage of 0.89 V and a short-circuit voltage of 11.50 mA\/cm^2.\nAssistant: In your case, I recommend SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au."}", "/scratch/micpie/export/perovskite_db/valid_0-10.jsonl": "{"text":"Question: What is a perovskite solar cell with a a short-circuit voltage (JSC) of 133.00 mA\/cm^2?\nAnswer: A solar cell with the device stack of SLG, ITO, PEDOT:PSS, DPP-DTT, MAPbI3, PCBM-60, LiF, and Ag"} {"text":"Question: What is a perovskite solar cell with a a short-circuit voltage (JSC) of 195.20 mA\/cm^2?\nAnswer: A perovskite solar cell with the device stack of SLG, ITO, VOx, X-DVTPD, MAPbI3, PCBM-60, Bphen, and Ag"}", "/scratch/micpie/export/perovskite_db/train_0-6.jsonl": "{"text":"The perovskite solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Au has a power conversion efficiency (PCE) of 5.50 percent."} {"text":"The solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au achieves a power conversion efficiency of 0.07 percent."}", "/scratch/micpie/export/perovskite_db/valid_0-6.jsonl": "{"text":"The solar cell with the device stack of SLG, ITO, PEDOT:PSS, DPP-DTT, MAPbI3, PCBM-60, LiF, and Ag has a power conversion efficiency of 9.80 percent."} {"text":"The solar cell with the device stack of SLG, ITO, VOx, X-DVTPD, MAPbI3, PCBM-60, Bphen, and Ag achieves a power conversion efficiency (PCE) of 16.06 percent."}", "/scratch/micpie/export/perovskite_db/train_0-21.jsonl": "{"text":"User: I want to design a solar cell with a power conversion efficiency (PCE) of 5.50 percent.\nAssistant: Awesome! 
Do you have other requirements?\nUser: I want the solar cell to have an OCV of 0.63 V.\nAssistant: In your case, I recommend SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Au."} {"text":"User: I would like to design a perovskite solar cell with a power conversion efficiency (PCE) of 0.07 percent.\nAssistant: Interesting! Do you have other requirements?\nUser: I want the perovskite solar cell to have an OCV of 0.89 V.\nAssistant: In your case, I recommend SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au."}", "/scratch/micpie/export/perovskite_db/train_0-19.jsonl": "{"text":"User: I would like to design a solar cell with a PCE of 5.50 percent.\nAssistant: Awesome! Do you have other requirements?\nUser: I want the solar cell to have a FF of 0.57 percent.\nAssistant: In that case, you should use the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Au."} {"text":"User: I would love to design a solar cell with a power conversion efficiency of 0.07 percent.\nAssistant: Cool! Do you have other requirements?\nUser: I would like the solar cell to have a fill factor (FF) of 0.72 percent.\nAssistant: In that case, you should use the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au."}", "/scratch/micpie/export/perovskite_db/test_0-9.jsonl": "{"text":"Question: What is a perovskite solar cell with a an open-circuit voltage of 0.78 V?\nAnswer: A device with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Carbon"} {"text":"Question: What is a perovskite solar cell with a an open-circuit voltage (OCV) of 1.01 V?\nAnswer: A solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au"}", "/scratch/micpie/export/perovskite_db/test_0-0.jsonl": "{"text":"The perovskite material with the reduced chemical formula of CH6I3NPb has a bandgap of 1.60 eV."} {"text":"The perovskite material with the reduced chemical formula of CH6I3NPb has a bandgap of 1.60 eV."}", "/scratch/micpie/export/perovskite_db/test_0-24.jsonl": 
"{"text":"User: I would love to design a perovskite solar cell with a power conversion efficiency of 6.17 percent.\nAssistant: Awesome! Do you have other needs?\nUser: Yes, I need the perovskite solar cell to have MAPbI3 as the absorber.\nAssistant: I would try the device stack SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Carbon."} {"text":"User: I would like to design a perovskite solar cell with a power conversion efficiency (PCE) of 12.66 percent.\nAssistant: Awesome! Do you have other requirements?\nUser: Indeed, I need the perovskite solar cell to have MAPbI3 as the perovskite material.\nAssistant: I would try the device stack SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au."}", "/scratch/micpie/export/perovskite_db/valid_0-16.jsonl": "{"text":"Question: What is a perovskite solar cell with a a short-circuit voltage (JSC) of 133.00 mA\/cm^2 and a fill factor of 0.74 percent?\nAnswer: A perovskite solar cell with the device stack of SLG, ITO, PEDOT:PSS, DPP-DTT, MAPbI3, PCBM-60, LiF, and Ag"} {"text":"Question: What is a perovskite solar cell with a a JSC of 195.20 mA\/cm^2 and a FF of 0.77 percent?\nAnswer: A solar cell with the device stack of SLG, ITO, VOx, X-DVTPD, MAPbI3, PCBM-60, Bphen, and Ag"}", "/scratch/micpie/export/perovskite_db/valid_0-7.jsonl": "{"text":"The solar cell with the device stack of SLG, ITO, PEDOT:PSS, DPP-DTT, MAPbI3, PCBM-60, LiF, and Ag achieves an open-circuit voltage of 1.00 V and a JSC of 133.00 mA\/cm^2."} {"text":"The solar cell with the device stack of SLG, ITO, VOx, X-DVTPD, MAPbI3, PCBM-60, Bphen, and Ag achieves an open-circuit voltage of 1.07 V and a JSC of 195.20 mA\/cm^2."}", "/scratch/micpie/export/perovskite_db/test_0-3.jsonl": "{"text":"The solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Carbon achieves an open-circuit voltage (OCV) of 0.78 V."} {"text":"The solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au achieves an OCV of 1.01 V."}", 
"/scratch/micpie/export/perovskite_db/valid_0-11.jsonl": "{"text":"Question: What is a perovskite solar cell with a a fill factor of 0.74 percent?\nAnswer: A device with the device stack of SLG, ITO, PEDOT:PSS, DPP-DTT, MAPbI3, PCBM-60, LiF, and Ag"} {"text":"Question: What is a perovskite solar cell with a a fill factor of 0.77 percent?\nAnswer: A perovskite solar cell with the device stack of SLG, ITO, VOx, X-DVTPD, MAPbI3, PCBM-60, Bphen, and Ag"}", "/scratch/micpie/export/perovskite_db/train_0-20.jsonl": "{"text":"User: I want to design a solar cell with a PCE of 5.50 percent.\nAssistant: Cool! Do you have other needs?\nUser: I want the solar cell to have a short-circuit voltage of 161.00 mA\/cm^2.\nAssistant: In that case, you should use the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Au."} {"text":"User: I would love to design a solar cell with a power conversion efficiency of 0.07 percent.\nAssistant: Interesting! Do you have other needs?\nUser: I would like the solar cell to have a short-circuit voltage of 11.50 mA\/cm^2.\nAssistant: In that case, you should use the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au."}", "/scratch/micpie/export/perovskite_db/valid_0-20.jsonl": "{"text":"User: I want to design a perovskite solar cell with a power conversion efficiency (PCE) of 9.80 percent.\nAssistant: That's interesting! Do you have other constraints?\nUser: I would like the perovskite solar cell to have a JSC of 133.00 mA\/cm^2.\nAssistant: I would try the device stack SLG, ITO, PEDOT:PSS, DPP-DTT, MAPbI3, PCBM-60, LiF, and Ag."} {"text":"User: I want to design a solar cell with a power conversion efficiency of 16.06 percent.\nAssistant: That's interesting! 
Do you have other constraints?\nUser: I would like the solar cell to have a short-circuit voltage of 195.20 mA\/cm^2.\nAssistant: In your case, I recommend SLG, ITO, VOx, X-DVTPD, MAPbI3, PCBM-60, Bphen, and Ag."}", "/scratch/micpie/export/perovskite_db/train_0-0.jsonl": "{"text":"The perovskite material with the reduced chemical formula of CH6I3NPb has a bandgap of 1.50 eV."} {"text":"The perovskite material with the reduced chemical formula of CH6I3NPb has a bandgap of 1.60 eV."}", "/scratch/micpie/export/perovskite_db/test_0-6.jsonl": "{"text":"The solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Carbon achieves a PCE of 6.17 percent."} {"text":"The perovskite solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au achieves a power conversion efficiency (PCE) of 12.66 percent."}", "/scratch/micpie/export/perovskite_db/train_0-10.jsonl": "{"text":"Question: What is a perovskite solar cell with a a short-circuit voltage of 161.00 mA\/cm^2?\nAnswer: A solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Au"} {"text":"Question: What is a perovskite solar cell with a a short-circuit voltage of 11.50 mA\/cm^2?\nAnswer: A device with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au"}", "/scratch/micpie/export/perovskite_db/train_0-3.jsonl": "{"text":"The solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Au achieves an open-circuit voltage of 0.63 V."} {"text":"The solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au achieves an OCV of 0.89 V."}", "/scratch/micpie/export/perovskite_db/train_0-23.jsonl": "{"text":"User: I want to design a perovskite solar cell with a PCE of 5.50 percent.\nAssistant: Awesome! 
Do you have other constraints?\nUser: I would like the perovskite solar cell to have an open-circuit voltage (OCV) of 0.63 V and a FF of 0.57 percent.\nAssistant: In your case, I recommend SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Au."} {"text":"User: I would like to design a perovskite solar cell with a power conversion efficiency of 0.07 percent.\nAssistant: That's interesting! Do you have other constraints?\nUser: I would like the perovskite solar cell to have an open-circuit voltage (OCV) of 0.89 V and a fill factor (FF) of 0.72 percent.\nAssistant: In that case, you should use the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au."}", "/scratch/micpie/export/perovskite_db/train_0-12.jsonl": "{"text":"Question: What is a perovskite solar cell with a a power conversion efficiency of 5.50 percent?\nAnswer: A perovskite solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Au"} {"text":"Question: What is a perovskite solar cell with a a PCE of 0.07 percent?\nAnswer: A perovskite solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au"}", "/scratch/micpie/export/perovskite_db/test_0-13.jsonl": "{"text":"Question: What is a perovskite solar cell with a an OCV of 0.78 V and a short-circuit voltage (JSC) of 154.30 mA\/cm^2?\nAnswer: A solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Carbon"} {"text":"Question: What is a perovskite solar cell with a an open-circuit voltage (OCV) of 1.01 V and a short-circuit voltage (JSC) of 201.10 mA\/cm^2?\nAnswer: A solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au"}", "/scratch/micpie/export/perovskite_db/test_0-23.jsonl": "{"text":"User: I would love to design a perovskite solar cell with a PCE of 6.17 percent.\nAssistant: Cool! 
Do you have other requirements?\nUser: I want the perovskite solar cell to have an open-circuit voltage (OCV) of 0.78 V and a fill factor (FF) of 0.51 percent.\nAssistant: In that case, you should use the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Carbon."} {"text":"User: I would like to design a solar cell with a power conversion efficiency of 12.66 percent.\nAssistant: Interesting! Do you have other requirements?\nUser: I would like the solar cell to have an open-circuit voltage (OCV) of 1.01 V and a fill factor of 0.62 percent.\nAssistant: In that case, you should use the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au."}", "/scratch/micpie/export/perovskite_db/valid_0-2.jsonl": "{"text":"The perovskite solar cell with the device stack of SLG, ITO, PEDOT:PSS, DPP-DTT, MAPbI3, PCBM-60, LiF, and Ag has a bandgap of 1.60 eV."} {"text":"The solar cell with the device stack of SLG, ITO, VOx, X-DVTPD, MAPbI3, PCBM-60, Bphen, and Ag has a bandgap of 1.60 eV."}", "/scratch/micpie/export/perovskite_db/valid_0-21.jsonl": "{"text":"User: I would love to design a solar cell with a PCE of 9.80 percent.\nAssistant: Awesome! Do you have other constraints?\nUser: I would like the solar cell to have an open-circuit voltage (OCV) of 1.00 V.\nAssistant: In that case, you should use the device stack of SLG, ITO, PEDOT:PSS, DPP-DTT, MAPbI3, PCBM-60, LiF, and Ag."} {"text":"User: I want to design a perovskite solar cell with a power conversion efficiency of 16.06 percent.\nAssistant: That's interesting! 
Do you have other requirements?\nUser: I want the perovskite solar cell to have an open-circuit voltage of 1.07 V.\nAssistant: In that case, you should use the device stack of SLG, ITO, VOx, X-DVTPD, MAPbI3, PCBM-60, Bphen, and Ag."}", "/scratch/micpie/export/perovskite_db/train_0-14.jsonl": "{"text":"Question: What is a perovskite solar cell with a an open-circuit voltage (OCV) of 0.63 V and a fill factor (FF) of 0.57 percent?\nAnswer: A solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Au"} {"text":"Question: What is a perovskite solar cell with a an open-circuit voltage of 0.89 V and a fill factor of 0.72 percent?\nAnswer: A solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au"}", "/scratch/micpie/export/perovskite_db/valid_0-1.jsonl": "{"text":"The perovskite material with the descriptive chemical formula of MAPbI3 has an open-circuit voltage of 1.00 V."} {"text":"The perovskite material with the descriptive chemical formula of MAPbI3 has an open-circuit voltage of 1.07 V."}", "/scratch/micpie/export/perovskite_db/valid_0-13.jsonl": "{"text":"Question: What is a perovskite solar cell with a an open-circuit voltage of 1.00 V and a short-circuit voltage of 133.00 mA\/cm^2?\nAnswer: A solar cell with the device stack of SLG, ITO, PEDOT:PSS, DPP-DTT, MAPbI3, PCBM-60, LiF, and Ag"} {"text":"Question: What is a perovskite solar cell with a an open-circuit voltage (OCV) of 1.07 V and a JSC of 195.20 mA\/cm^2?\nAnswer: A device with the device stack of SLG, ITO, VOx, X-DVTPD, MAPbI3, PCBM-60, Bphen, and Ag"}", "/scratch/micpie/export/perovskite_db/valid_0-23.jsonl": "{"text":"User: I want to design a solar cell with a power conversion efficiency (PCE) of 9.80 percent.\nAssistant: Cool! 
Do you have other constraints?\nUser: I want the solar cell to have an open-circuit voltage (OCV) of 1.00 V and a FF of 0.74 percent.\nAssistant: In that case, you should use the device stack of SLG, ITO, PEDOT:PSS, DPP-DTT, MAPbI3, PCBM-60, LiF, and Ag."} {"text":"User: I would love to design a perovskite solar cell with a PCE of 16.06 percent.\nAssistant: Awesome! Do you have other constraints?\nUser: I would like the perovskite solar cell to have an open-circuit voltage of 1.07 V and a FF of 0.77 percent.\nAssistant: In that case, you should use the device stack of SLG, ITO, VOx, X-DVTPD, MAPbI3, PCBM-60, Bphen, and Ag."}", "/scratch/micpie/export/perovskite_db/valid_0-5.jsonl": "{"text":"The solar cell with the device stack of SLG, ITO, PEDOT:PSS, DPP-DTT, MAPbI3, PCBM-60, LiF, and Ag achieves a FF of 0.74 percent."} {"text":"The perovskite solar cell with the device stack of SLG, ITO, VOx, X-DVTPD, MAPbI3, PCBM-60, Bphen, and Ag has a fill factor of 0.77 percent."}", "/scratch/micpie/export/perovskite_db/train_0-15.jsonl": "{"text":"Question: What is a perovskite solar cell with a an OCV of 0.63 V and a power conversion efficiency (PCE) of 5.50 percent?\nAnswer: A device with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Au"} {"text":"Question: What is a perovskite solar cell with a an OCV of 0.89 V and a PCE of 0.07 percent?\nAnswer: A perovskite solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au"}", "/scratch/micpie/export/perovskite_db/valid_0-4.jsonl": "{"text":"The perovskite solar cell with the device stack of SLG, ITO, PEDOT:PSS, DPP-DTT, MAPbI3, PCBM-60, LiF, and Ag has a short-circuit voltage of 133.00 mA\/cm^2."} {"text":"The solar cell with the device stack of SLG, ITO, VOx, X-DVTPD, MAPbI3, PCBM-60, Bphen, and Ag has a JSC of 195.20 mA\/cm^2."}", "/scratch/micpie/export/perovskite_db/train_0-5.jsonl": "{"text":"The solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Au has 
a fill factor of 0.57 percent."} {"text":"The perovskite solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au achieves a FF of 0.72 percent."}", "/scratch/micpie/export/perovskite_db/valid_0-15.jsonl": "{"text":"Question: What is a perovskite solar cell with a an OCV of 1.00 V and a power conversion efficiency of 9.80 percent?\nAnswer: A device with the device stack of SLG, ITO, PEDOT:PSS, DPP-DTT, MAPbI3, PCBM-60, LiF, and Ag"} {"text":"Question: What is a perovskite solar cell with a an open-circuit voltage (OCV) of 1.07 V and a PCE of 16.06 percent?\nAnswer: A perovskite solar cell with the device stack of SLG, ITO, VOx, X-DVTPD, MAPbI3, PCBM-60, Bphen, and Ag"}", "/scratch/micpie/export/perovskite_db/valid_0-12.jsonl": "{"text":"Question: What is a perovskite solar cell with a a power conversion efficiency (PCE) of 9.80 percent?\nAnswer: A device with the device stack of SLG, ITO, PEDOT:PSS, DPP-DTT, MAPbI3, PCBM-60, LiF, and Ag"} {"text":"Question: What is a perovskite solar cell with a a power conversion efficiency (PCE) of 16.06 percent?\nAnswer: A perovskite solar cell with the device stack of SLG, ITO, VOx, X-DVTPD, MAPbI3, PCBM-60, Bphen, and Ag"}", "/scratch/micpie/export/perovskite_db/valid_0-18.jsonl": "{"text":"User: I would love to know the bandgap of a perovskite material.\nAssistant: That's interesting! What material are you interested in?\nUser: I am interested in the material with the descriptive chemical formula of MAPbI3.\nAssistant: The bandgap of the material with descriptive chemical formula MAPbI3 is 1.60 eV."} {"text":"User: I would like to know the bandgap of a perovskite material.\nAssistant: That's interesting! 
What material are you interested in?\nUser: I am interested in the material with the descriptive chemical formula of MAPbI3.\nAssistant: The bandgap of the material with descriptive chemical formula MAPbI3 is 1.60 eV."}", "/scratch/micpie/export/perovskite_db/train_0-2.jsonl": "{"text":"The solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Au has a bandgap of 1.50 eV."} {"text":"The solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au has a bandgap of 1.60 eV."}", "/scratch/micpie/export/perovskite_db/test_0-11.jsonl": "{"text":"Question: What is a perovskite solar cell with a a FF of 0.51 percent?\nAnswer: A device with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Carbon"} {"text":"Question: What is a perovskite solar cell with a a fill factor of 0.62 percent?\nAnswer: A solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au"}", "/scratch/micpie/export/perovskite_db/train_0-7.jsonl": "{"text":"The perovskite solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Au has an OCV of 0.63 V and a JSC of 161.00 mA\/cm^2."} {"text":"The solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au has an open-circuit voltage of 0.89 V and a short-circuit voltage of 11.50 mA\/cm^2."}", "/scratch/micpie/export/perovskite_db/test_0-17.jsonl": "{"text":"User: I would love to know the bandgap of a perovskite material.\nAssistant: That's interesting! What material are you interested in?\nUser: I am interested in the material with the reduced chemical formula of CH6I3NPb.\nAssistant: The bandgap of the material with reduced chemical formula CH6I3NPb is 1.60 eV."} {"text":"User: I would love to know the bandgap of a perovskite material.\nAssistant: Cool! 
What material are you interested in?\nUser: I am interested in the material with the reduced chemical formula of CH6I3NPb.\nAssistant: The bandgap of the material with reduced chemical formula CH6I3NPb is 1.60 eV."}", "/scratch/micpie/export/perovskite_db/valid_0-19.jsonl": "{"text":"User: I want to design a perovskite solar cell with a power conversion efficiency (PCE) of 9.80 percent.\nAssistant: That's interesting! Do you have other requirements?\nUser: I would like the perovskite solar cell to have a fill factor (FF) of 0.74 percent.\nAssistant: In that case, you should use the device stack of SLG, ITO, PEDOT:PSS, DPP-DTT, MAPbI3, PCBM-60, LiF, and Ag."} {"text":"User: I would love to design a solar cell with a power conversion efficiency of 16.06 percent.\nAssistant: Cool! Do you have other requirements?\nUser: I want the solar cell to have a fill factor of 0.77 percent.\nAssistant: In that case, you should use the device stack of SLG, ITO, VOx, X-DVTPD, MAPbI3, PCBM-60, Bphen, and Ag."}", "/scratch/micpie/export/perovskite_db/train_0-11.jsonl": "{"text":"Question: What is a perovskite solar cell with a a FF of 0.57 percent?\nAnswer: A device with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Au"} {"text":"Question: What is a perovskite solar cell with a a FF of 0.72 percent?\nAnswer: A device with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au"}", "/scratch/micpie/export/perovskite_db/train_0-1.jsonl": "{"text":"The perovskite material with the descriptive chemical formula of MAPbI3 has an open-circuit voltage of 0.63 V."} {"text":"The perovskite material with the descriptive chemical formula of MAPbI3 has an open-circuit voltage of 0.89 V."}", "/scratch/micpie/export/perovskite_db/train_0-13.jsonl": "{"text":"Question: What is a perovskite solar cell with a an open-circuit voltage of 0.63 V and a JSC of 161.00 mA\/cm^2?\nAnswer: A solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Au"} 
{"text":"Question: What is a perovskite solar cell with a an open-circuit voltage (OCV) of 0.89 V and a short-circuit voltage of 11.50 mA\/cm^2?\nAnswer: A solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au"}", "/scratch/micpie/export/perovskite_db/train_0-4.jsonl": "{"text":"The solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Au has a short-circuit voltage of 161.00 mA\/cm^2."} {"text":"The perovskite solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au has a short-circuit voltage of 11.50 mA\/cm^2."}", "/scratch/micpie/export/perovskite_db/test_0-7.jsonl": "{"text":"The solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Carbon achieves an open-circuit voltage (OCV) of 0.78 V and achieves a short-circuit voltage of 154.30 mA\/cm^2."} {"text":"The solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au has an OCV of 1.01 V and achieves a short-circuit voltage (JSC) of 201.10 mA\/cm^2."}", "/scratch/micpie/export/perovskite_db/train_0-9.jsonl": "{"text":"Question: What is a perovskite solar cell with a an OCV of 0.63 V?\nAnswer: A solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Au"} {"text":"Question: What is a perovskite solar cell with a an open-circuit voltage (OCV) of 0.89 V?\nAnswer: A perovskite solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au"}", "/scratch/micpie/export/perovskite_db/train_0-25.jsonl": "{"text":"User: I want to design a solar cell with a power conversion efficiency (PCE) of 5.50 percent.\nAssistant: That's interesting! 
Do you have other needs?\nUser: Indeed, I also want the solar cell to have CH6I3NPb as the absorber.\nAssistant: I would try the device stack SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Au."} {"text":"User: I want to design a perovskite solar cell with a power conversion efficiency of 0.07 percent.\nAssistant: Cool! Do you have other requirements?\nUser: Indeed, I also want the perovskite solar cell to have CH6I3NPb as the perovskite material.\nAssistant: In your case, I recommend SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au."}", "/scratch/micpie/export/perovskite_db/valid_0-22.jsonl": "{"text":"User: I would like to design a perovskite solar cell with a power conversion efficiency of 9.80 percent.\nAssistant: Interesting! Do you have other requirements?\nUser: I would like the perovskite solar cell to have an open-circuit voltage of 1.00 V and a short-circuit voltage of 133.00 mA\/cm^2.\nAssistant: In your case, I recommend SLG, ITO, PEDOT:PSS, DPP-DTT, MAPbI3, PCBM-60, LiF, and Ag."} {"text":"User: I would love to design a perovskite solar cell with a power conversion efficiency (PCE) of 16.06 percent.\nAssistant: Cool! Do you have other requirements?\nUser: I would like the perovskite solar cell to have an OCV of 1.07 V and a short-circuit voltage (JSC) of 195.20 mA\/cm^2.\nAssistant: In that case, you should use the device stack of SLG, ITO, VOx, X-DVTPD, MAPbI3, PCBM-60, Bphen, and Ag."}", "/scratch/micpie/export/perovskite_db/train_0-18.jsonl": "{"text":"User: I would like to know the bandgap of a perovskite material.\nAssistant: Interesting! What material are you interested in?\nUser: I am interested in the material with the descriptive chemical formula of MAPbI3.\nAssistant: The bandgap of the material with descriptive chemical formula MAPbI3 is 1.50 eV."} {"text":"User: I want to know the bandgap of a perovskite material.\nAssistant: Awesome! 
What material are you interested in?\nUser: I am interested in the material with the descriptive chemical formula of MAPbI3.\nAssistant: The bandgap of the material with descriptive chemical formula MAPbI3 is 1.60 eV."}", "/scratch/micpie/export/perovskite_db/valid_0-3.jsonl": "{"text":"The perovskite solar cell with the device stack of SLG, ITO, PEDOT:PSS, DPP-DTT, MAPbI3, PCBM-60, LiF, and Ag has an open-circuit voltage (OCV) of 1.00 V."} {"text":"The solar cell with the device stack of SLG, ITO, VOx, X-DVTPD, MAPbI3, PCBM-60, Bphen, and Ag has an open-circuit voltage (OCV) of 1.07 V."}", "/scratch/micpie/export/perovskite_db/test_0-8.jsonl": "{"text":"Question: What is a perovskite material with a bandgap of 1.60 eV?\nAnswer: A perovskite material with reduced chemical formula CH6I3NPb"} {"text":"Question: What is a perovskite material with a bandgap of 1.60 eV?\nAnswer: A perovskite material with reduced chemical formula CH6I3NPb"}", "/scratch/micpie/export/perovskite_db/test_0-14.jsonl": "{"text":"Question: What is a perovskite solar cell with a an OCV of 0.78 V and a fill factor of 0.51 percent?\nAnswer: A solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Carbon"} {"text":"Question: What is a perovskite solar cell with a an open-circuit voltage (OCV) of 1.01 V and a fill factor of 0.62 percent?\nAnswer: A perovskite solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au"}", "/scratch/micpie/export/perovskite_db/valid_0-17.jsonl": "{"text":"User: I would love to know the bandgap of a perovskite material.\nAssistant: Interesting! What material are you interested in?\nUser: I am interested in the material with the reduced chemical formula of CH6I3NPb.\nAssistant: The bandgap of the material with reduced chemical formula CH6I3NPb is 1.60 eV."} {"text":"User: I want to know the bandgap of a perovskite material.\nAssistant: Awesome! 
What material are you interested in?\nUser: I am interested in the material with the reduced chemical formula of CH6I3NPb.\nAssistant: The bandgap of the material with reduced chemical formula CH6I3NPb is 1.60 eV."}", "/scratch/micpie/export/perovskite_db/valid_0-14.jsonl": "{"text":"Question: What is a perovskite solar cell with a an open-circuit voltage (OCV) of 1.00 V and a fill factor of 0.74 percent?\nAnswer: A device with the device stack of SLG, ITO, PEDOT:PSS, DPP-DTT, MAPbI3, PCBM-60, LiF, and Ag"} {"text":"Question: What is a perovskite solar cell with a an open-circuit voltage of 1.07 V and a FF of 0.77 percent?\nAnswer: A perovskite solar cell with the device stack of SLG, ITO, VOx, X-DVTPD, MAPbI3, PCBM-60, Bphen, and Ag"}", "/scratch/micpie/export/perovskite_db/test_0-25.jsonl": "{"text":"User: I would like to design a perovskite solar cell with a power conversion efficiency of 6.17 percent.\nAssistant: Awesome! Do you have other constraints?\nUser: Yes, I need the perovskite solar cell to have CH6I3NPb as the perovskite material.\nAssistant: In that case, you should use the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Carbon."} {"text":"User: I want to design a perovskite solar cell with a PCE of 12.66 percent.\nAssistant: Awesome! 
Do you have other requirements?\nUser: Yes, I want the perovskite solar cell to have CH6I3NPb as the perovskite material.\nAssistant: In your case, I recommend SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au."}", "/scratch/micpie/export/perovskite_db/test_0-4.jsonl": "{"text":"The solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Carbon has a short-circuit voltage (JSC) of 154.30 mA\/cm^2."} {"text":"The perovskite solar cell with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au has a JSC of 201.10 mA\/cm^2."}", "/scratch/micpie/export/perovskite_db/test_0-12.jsonl": "{"text":"Question: What is a perovskite solar cell with a a power conversion efficiency of 6.17 percent?\nAnswer: A device with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Carbon"} {"text":"Question: What is a perovskite solar cell with a a PCE of 12.66 percent?\nAnswer: A device with the device stack of SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au"}", "/scratch/micpie/export/perovskite_db/test_0-20.jsonl": "{"text":"User: I want to design a solar cell with a PCE of 6.17 percent.\nAssistant: Interesting! Do you have other constraints?\nUser: I want the solar cell to have a short-circuit voltage of 154.30 mA\/cm^2.\nAssistant: I would try the device stack SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, and Carbon."} {"text":"User: I want to design a solar cell with a PCE of 12.66 percent.\nAssistant: Interesting! 
Do you have other requirements?\nUser: I want the solar cell to have a short-circuit voltage of 201.10 mA\/cm^2.\nAssistant: In your case, I recommend SLG, FTO, TiO2-c, TiO2-mp, MAPbI3, Spiro-MeOTAD, and Au."}", "/scratch/micpie/export/aminoacids/train_0-0.jsonl": "{"text":"The AA with the InChI InChI=1S\/C3H7NO2\/c1-2(4)3(5)6\/h2H,4H2,1H3,(H,5,6)\/t2-\/m0\/s1 has a one-letter code A and a three-letter code ala."} {"text":"The AA with the InChI InChI=1S\/C5H11NO2\/c1-3(2)4(6)5(7)8\/h3-4H,6H2,1-2H3,(H,7,8)\/t4-\/m0\/s1 has a one-letter code V and a three-letter code val."}", "/scratch/micpie/export/aminoacids/train_0-3.jsonl": "{"text":"Question: What is the one-letter code of the amino acid (AA) with the SMILES C[C@@H](C(=O)O)N?\nAnswer: ala."} {"text":"Question: What is the one-letter code of the AA with the SMILES CC(C)[C@@H](C(=O)O)N?\nAnswer: val."}", "/scratch/micpie/export/aminoacids/train_0-2.jsonl": "{"text":"Question: What is the one-letter code of the AA with the SELFIES [C][C@@H1][Branch1][=Branch1][C][=Branch1][C][=O][O][N]?\nAnswer: A."} {"text":"Question: What is the one-letter code of the amino acid (AA) with the SELFIES [C][C][Branch1][C][C][C@@H1][Branch1][=Branch1][C][=Branch1][C][=O][O][N]?\nAnswer: V."}", "/scratch/micpie/export/aminoacids/train_0-1.jsonl": "{"text":"The essential amino acid alanine has a one-letter code A and a three-letter code ala."} {"text":"The amino acid (AA) valine has a one-letter code V and a three-letter code val."}", "/scratch/micpie/export/aminoacids/train_0-4.jsonl": "{"text":"Question: What is the type of the amino acid with the one-letter code A and canonical SMILES C[C@H](N)C(=O)O?\nConstraint: The possible types are: polar, non-polar, positively charged, negatively charged\nAnswer: From the provided amino acid types (polar, non-polar, positively charged, negatively charged), the amino acid with the one-letter code A is non-polar."} {"text":"Question: What is the type of the amino acid with the one-letter code V 
and InChI InChI=1S\/C5H11NO2\/c1-3(2)4(6)5(7)8\/h3-4H,6H2,1-2H3,(H,7,8)\/t4-\/m0\/s1?\nConstraint: The possible types are: polar, non-polar, positively charged, negatively charged\nAnswer: From the provided amino acid types (polar, non-polar, positively charged, negatively charged), the amino acid with the one-letter code V is non-polar."}", "/scratch/micpie/export/bio_ner_58/train_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: Sample ID Location Habitat Collection Date GPS Coordinates Elevation PES36 Lake 3 Cryoconite hole 31.01.2017 S 71.96345, E 23.31964 1342 m PES38 Lake 3 Cryoconite hole 31.01.2017 S 71.96345, E 23.31964 1342 m PES39 Lake 3 Cryoconite hole 31.01.2017 S 71.96345, E 23.31964 1342 m PES40 Lake 3 Cryoconite hole 31.01.2017 S 71.96345, E 23.31964 1342 m PES42 Petrelnuten Cryoconite hole 31.01.2017 S 72.01292, E 22.82887 1492 m PES43 Petrelnuten Cryoconite hole 31.01.2017 S 72.01292, E 22.82887 1492 m PES47 Lake 2 Cryoconite hole 01.02.2017 S 71.95867, E 23.31546 1316 m PES48 Lake 2 Cryoconite hole 01.02.2017 S 71.95867, E 23.31546 1316 m PES49 Lake 2 Cryoconite hole 01.02.2017 S 71.95867, E 23.31546 1316 m PES50 Lake 2 Cryoconite hole 01.02.2017 S 71.95867, E 23.31546 1316 m PES51 Lake 2 Cryoconite hole 01.02.2017 S 71.95867, E 23.31546 1316 m PES52 Dubois Cryoconite hole 01.02.2017 S 72.03793, E 23.29693 1361 m PES53 Dubois Cryoconite hole 01.02.2017 S 72.03793, E 23.29693 1361 m PES54 Dubois Cryoconite hole 01.02.2017 S 72.03793, E 23.29693 1361 m PES55 Dubois Cryoconite hole 01.02.2017 S 72.03793, E 23.29693 1361 m PES56 Dubois Cryoconite hole 01.02.2017 S 72.03793, E 23.29693 1361 m PES59 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES60 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES61 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES62 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES63 Petrelnuten 
Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES64 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES65 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES4 Utsteinen Soil 19.01.2017 S 71.94535, E 23.34500 1359 m PES6 Utsteinen Soil 19.01.2017 S 71.94575, E 23.34525 1367 m PES33 Dubois Soil 30.01.2017 S 72.05169, E 23.25497 1352 m PES35 Dubois Soil 30.01.2017 S 72.04891, E 23.28334 1341 m PES44 Petrelnuten Soil 31.01.2017 S 72.01266, E 22.82781 1511 m PES57 Utsteinen Snow 02.02.2017 S 71.95177, E 23.34854 1362 m PES2 Utsteinen Endolith 18.01.2017 S 71.94535, E 23.34500 1359 m PES32 Dubois Endolith 30.01.2017 S 72.04891, E 23.28334 1341 m PES34 Dubois Endolith 30.01.2017 S 72.05169, E 23.25497 1352 m PES41 Lake 3 Lake 31.01.2017 S 71.96589, E 23.33311 1315 m PES46 Lake 2 Lake 31.01.2017 S 71.95818, E 23.31509 1317 m Overview of all 13C and 14C data of the cryoconite hole and soil samples and the corresponding ages of the carbon..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: Elevation,59,68,site\nCryoconite hole,82,97,site\n1342 m,136,142,site\nCryoconite hole,156,171,site\n1342 m,210,216,site\nCryoconite hole,230,245,site\n1342 m,284,290,site\nCryoconite hole,304,319,site\n1342 m,358,364,site\nCryoconite hole,383,398,site\n1492 m,437,443,site\nCryoconite hole,462,477,site\n1492 m,516,522,site\nCryoconite hole,536,551,site\n1316 m,590,596,site\nCryoconite hole,610,625,site\n1316 m,664,670,site\nCryoconite hole,684,699,site\n1316 m,738,744,site\nCryoconite hole,758,773,site\n1316 m,812,818,site\nCryoconite hole,832,847,site\n1316 m,886,892,site\nCryoconite hole,906,921,site\n1361 m,960,966,site\nCryoconite hole,980,995,site\n1361 m,1034,1040,site\nCryoconite hole,1054,1069,site\n1361 m,1108,1114,site\nCryoconite hole,1128,1143,site\n1361 m,1182,1188,site\nCryoconite hole,1202,1217,site\n1361 
m,1256,1262,site\nCryoconite hole,1281,1296,site\n1492 m,1335,1341,site\nCryoconite hole,1360,1375,site\n1492 m,1414,1420,site\nCryoconite hole,1439,1454,site\n1492 m,1493,1499,site\nCryoconite hole,1518,1533,site\n1492 m,1572,1578,site\nCryoconite hole,1597,1612,site\n1492 m,1651,1657,site\nCryoconite hole,1676,1691,site\n1492 m,1730,1736,site\nCryoconite hole,1755,1770,site\n1492 m,1809,1815,site\n1359 m,1874,1880,site\n1352 m,2002,2008,site\n1341 m,2065,2071,site\n1511 m,2133,2139,site\n1362 m,2199,2205,site\n1359 m,2268,2274,site\n1341 m,2335,2341,site\n1352 m,2402,2408,site\n1315 m,2465,2471,site\n1317 m,2528,2534,site\ncryoconite hole,2575,2590,site"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Sample ID Location Habitat Collection Date GPS Coordinates Elevation PES36 Lake 3 Cryoconite hole 31.01.2017 S 71.96345, E 23.31964 1342 m PES38 Lake 3 Cryoconite hole 31.01.2017 S 71.96345, E 23.31964 1342 m PES39 Lake 3 Cryoconite hole 31.01.2017 S 71.96345, E 23.31964 1342 m PES40 Lake 3 Cryoconite hole 31.01.2017 S 71.96345, E 23.31964 1342 m PES42 Petrelnuten Cryoconite hole 31.01.2017 S 72.01292, E 22.82887 1492 m PES43 Petrelnuten Cryoconite hole 31.01.2017 S 72.01292, E 22.82887 1492 m PES47 Lake 2 Cryoconite hole 01.02.2017 S 71.95867, E 23.31546 1316 m PES48 Lake 2 Cryoconite hole 01.02.2017 S 71.95867, E 23.31546 1316 m PES49 Lake 2 Cryoconite hole 01.02.2017 S 71.95867, E 23.31546 1316 m PES50 Lake 2 Cryoconite hole 01.02.2017 S 71.95867, E 23.31546 1316 m PES51 Lake 2 Cryoconite hole 01.02.2017 S 71.95867, E 23.31546 1316 m PES52 Dubois Cryoconite hole 01.02.2017 S 72.03793, E 23.29693 1361 m PES53 Dubois Cryoconite hole 01.02.2017 S 72.03793, E 23.29693 1361 m PES54 Dubois Cryoconite hole 01.02.2017 S 72.03793, E 23.29693 1361 m PES55 Dubois Cryoconite hole 01.02.2017 S 72.03793, E 23.29693 1361 m PES56 Dubois Cryoconite hole 01.02.2017 S 72.03793, E 23.29693 1361 m PES59 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 
22.82887 1492 m PES60 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES61 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES62 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES63 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES64 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES65 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES4 Utsteinen Soil 19.01.2017 S 71.94535, E 23.34500 1359 m PES6 Utsteinen Soil 19.01.2017 S 71.94575, E 23.34525 1367 m PES33 Dubois Soil 30.01.2017 S 72.05169, E 23.25497 1352 m PES35 Dubois Soil 30.01.2017 S 72.04891, E 23.28334 1341 m PES44 Petrelnuten Soil 31.01.2017 S 72.01266, E 22.82781 1511 m PES57 Utsteinen Snow 02.02.2017 S 71.95177, E 23.34854 1362 m PES2 Utsteinen Endolith 18.01.2017 S 71.94535, E 23.34500 1359 m PES32 Dubois Endolith 30.01.2017 S 72.04891, E 23.28334 1341 m PES34 Dubois Endolith 30.01.2017 S 72.05169, E 23.25497 1352 m PES41 Lake 3 Lake 31.01.2017 S 71.96589, E 23.33311 1315 m PES46 Lake 2 Lake 31.01.2017 S 71.95818, E 23.31509 1317 m Overview of all 13C and 14C data of the cryoconite hole and soil samples and the corresponding ages of the carbon..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: Elevation,59,68,site\nCryoconite hole,82,97,site\n1342 m,136,142,site\nCryoconite hole,156,171,site\n1342 m,210,216,site\nCryoconite hole,230,245,site\n1342 m,284,290,site\nCryoconite hole,304,319,site\n1342 m,358,364,site\nCryoconite hole,383,398,site\n1492 m,437,443,site\nCryoconite hole,462,477,site\n1492 m,516,522,site\nCryoconite hole,536,551,site\n1316 m,590,596,site\nCryoconite hole,610,625,site\n1316 m,664,670,site\nCryoconite hole,684,699,site\n1316 m,738,744,site\nCryoconite hole,758,773,site\n1316 m,812,818,site\nCryoconite hole,832,847,site\n1316 
m,886,892,site\nCryoconite hole,906,921,site\n1361 m,960,966,site\nCryoconite hole,980,995,site\n1361 m,1034,1040,site\nCryoconite hole,1054,1069,site\n1361 m,1108,1114,site\nCryoconite hole,1128,1143,site\n1361 m,1182,1188,site\nCryoconite hole,1202,1217,site\n1361 m,1256,1262,site\nCryoconite hole,1281,1296,site\n1492 m,1335,1341,site\nCryoconite hole,1360,1375,site\n1492 m,1414,1420,site\nCryoconite hole,1439,1454,site\n1492 m,1493,1499,site\nCryoconite hole,1518,1533,site\n1492 m,1572,1578,site\nCryoconite hole,1597,1612,site\n1492 m,1651,1657,site\nCryoconite hole,1676,1691,site\n1492 m,1730,1736,site\nCryoconite hole,1755,1770,site\n1492 m,1809,1815,site\n1359 m,1874,1880,site\n1352 m,2002,2008,site\n1341 m,2065,2071,site\n1511 m,2133,2139,site\n1362 m,2199,2205,site\n1359 m,2268,2274,site\n1341 m,2335,2341,site\n1352 m,2402,2408,site\n1315 m,2465,2471,site\n1317 m,2528,2534,site\ncryoconite hole,2575,2590,site"}", "/scratch/micpie/export/compound_protein_pathway/test_4-0.jsonl": "{"text":"The compound C=Cn1cnc2c(Nc3ccc(P(C)(C)=O)cc3)nc(N3CCC(CO)CC3)nc21 targets the protein Proto-oncogene c-Src which is involved in the CD28 co-stimulation."} {"text":"The compound InChI=1S\/C23H14F4N6O2\/c24-13-1-5-15(6-2-13)33-20(23(25,26)27)19(31-32-33)22(34)30-14-3-7-16(8-4-14)35-18-10-12-29-21-17(18)9-11-28-21\/h1-12H,(H,28,29)(H,30,34) targets the protein Tyrosine-protein kinase Met which is involved in the MET activates RAP1 and RAC1."}", "/scratch/micpie/export/compound_protein_pathway/valid_5-1.jsonl": "{"text":"The compound InChI=1S\/C28H30FN11O2\/c1-37-15-22(13-33-37)25-14-30-26-27(34-25)40(36-35-26)18-23-17-39(6-9-42-23)28-31-11-21(12-32-28)19-2-3-20(24(29)10-19)16-38-4-7-41-8-5-38\/h2-3,10-15,23H,4-9,16-18H2,1H3\/t23-\/m0\/s1 targets the protein HGF\/SF receptor. The protein HGF\/SF receptor is involved in the MET activates RAP1 and RAC1."} {"text":"The compound CCCcccNCCOCC6))))))ncsccNCCNCC6))))))nnnc6c%139)))))))))))C6 targets the protein Beta-glucuronidase. 
The protein Beta-glucuronidase is involved in the MPS VII - Sly syndrome."}", "/scratch/micpie/export/compound_protein_pathway/valid_5-0.jsonl": "{"text":"The compound Cn1cc(-c2cnc3nnn(C[C@@H]4CN(c5ncc(-c6ccc(CN7CCOCC7)c(F)c6)cn5)CCO4)c3n2)cn1 targets the protein Scatter factor receptor which is involved in the MET activates RAP1 and RAC1."} {"text":"The compound InChI=1S\/C20H25N7OS\/c1-2-4-14-13(3-1)15-16-17(19(24-25-23-16)26-7-5-21-6-8-26)29-20(15)22-18(14)27-9-11-28-12-10-27\/h21H,1-12H2 targets the protein Beta-glucuronidase which is involved in the MPS VII - Sly syndrome."}", "/scratch/micpie/export/compound_protein_pathway/valid_3-0.jsonl": "{"text":"The compound InChI=1S\/C33H39N7O4\/c1-23-6-7-25(37-33(41)24-8-9-34-31(18-24)40-13-15-43-16-14-40)19-27(23)38-32-26-20-30(29(42-2)21-28(26)35-22-36-32)44-17-12-39-10-4-3-5-11-39\/h6-9,18-22H,3-5,10-17H2,1-2H3,(H,37,41)(H,35,36,38) targets the protein MAX-interacting protein 2 which is involved in the Fc epsilon RI signaling pathway."} {"text":"The compound CC(C)(C)n1nc(-c2ccc3ncccc3c2)c2c(N)ncnc21.Cl targets the protein p60-Src which is involved in the CD28 co-stimulation."}", "/scratch/micpie/export/compound_protein_pathway/test_0-1.jsonl": "{"text":"The compound O=C[C@H](O[C@H]1O[C@H](CO)[C@@H](O)[C@H](O)[C@H]1O)[C@@H](O)[C@H](O)[C@H](O)CO targets the protein Olfactory receptor OR11-16. The protein Olfactory receptor OR11-16 is involved in the Olfactory Signaling Pathway."} {"text":"The compound [O][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][=C][Branch2][Ring1][N][N][C][=C][C][=C][C][Branch1][P][S][=Branch1][C][=O][=Branch1][C][=O][N][Branch1][Ring2][C][C][O][C][C][O][=C][Ring1][S][C][=C][C][Branch1][C][O][=C][Ring2][Ring1][Branch2][Ring2][Ring1][P] targets the protein PKC-A. 
The protein PKC-A is involved in the Long-term depression."}", "/scratch/micpie/export/compound_protein_pathway/test_5-0.jsonl": "{"text":"The compound C\/C(=N\\NC(N)=O)c1cnc2nnn(C(C)c3cc4cccnc4cc3F)c2n1 targets the protein Hepatocyte growth factor receptor which is involved in the MET activates RAP1 and RAC1."} {"text":"The compound Cc1cc2nc(-c3ccc(-c4nnc(-c5ccccn5)o4)cc3)[nH]c2cc1C targets the protein Beta-G1 which is involved in the MPS VII - Sly syndrome."}", "/scratch/micpie/export/compound_protein_pathway/test_2-0.jsonl": "{"text":"The compound N[C@H][C@@H]CNcnccC=O)NccccOCF)F)Cl)))cc6))))))))cc6-ccncnc6))))))))))))C[C@H]65 targets the protein p150 which is involved in the Axon guidance."} {"text":"The compound Cccnc-ccccC)c-cccccc6)NC=O)C5CCOCC6)))))))))))))c6))))))[nH]5 targets the protein CSAID-binding protein which is involved in the Fc epsilon RI signaling pathway."}", "/scratch/micpie/export/compound_protein_pathway/valid_0-0.jsonl": "{"text":"The compound C[C@]12CC[C@H]3[C@@H](CCC4=CC(=O)CC[C@@]43CO)[C@@H]1CCC2=O targets the protein HPRAJ which is involved in the Olfactory Signaling Pathway."} {"text":"The compound InChI=1S\/C27H23Cl2N3O6\/c1-37-25-15(9-33)38-27(24(35)23(25)34)32-21-11(5-3-7-14(21)29)17-18-12(8-30-26(18)36)16-10-4-2-6-13(28)19(10)31-20(16)22(17)32\/h2-7,15,23-25,27,31,33-35H,8-9H2,1H3,(H,30,36) targets the protein PKC-A which is involved in the Long-term depression."}", "/scratch/micpie/export/compound_protein_pathway/test_3-0.jsonl": "{"text":"The compound Cc1cc(F)ccc1-c1nc(N(C(N)=O)c2c(F)cccc2F)c2ncn(C)c2n1 targets the protein SAPK2a which is involved in the Fc epsilon RI signaling pathway."} {"text":"The compound [C][C][Branch1][C][C][N][N][=C][Branch1][#C][C][=C][N][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Branch1][C][N][N][=C][N][=C][Ring1][#Branch1][Ring2][Ring1][Ring2].[Cl] targets the protein pp60c-src which is involved in the CD28 co-stimulation."}", 
"/scratch/micpie/export/compound_protein_pathway/train_1-0.jsonl": "{"text":"The compound O=C1NC(=O)C2=C1c1cn(c3ccccc13)CCOCCOCCOCCn1cc2c2cccnc21 targets the protein Protein kinase C alpha type which is involved in the Long-term depression."} {"text":"The compound [C][N][Branch1][Ring2][C][C][O][C][=N][C][=C][Branch2][Ring1][=Branch2][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][#Branch2][S][C][Branch1][C][F][Branch1][C][F][F][C][=C][Ring1][O][C][=C][Ring2][Ring1][Ring2][C][=C][N][=C][N][=C][Ring1][=Branch1] targets the protein Abelson murine leukemia viral oncogene homolog 1 which is involved in the Axon guidance."}", "/scratch/micpie/export/compound_protein_pathway/test_0-0.jsonl": "{"text":"The compound InChI=1S\/C12H22O11\/c13-1-4(16)7(17)8(18)5(2-14)22-12-11(21)10(20)9(19)6(3-15)23-12\/h2,4-13,15-21H,1,3H2\/t4-,5+,6-,7-,8-,9-,10+,11-,12+\/m1\/s1 targets the protein HPRAJ which is involved in the Olfactory Signaling Pathway."} {"text":"The compound [O][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][=C][Branch2][Ring1][N][N][C][=C][C][=C][C][Branch1][P][S][=Branch1][C][=O][=Branch1][C][=O][N][Branch1][Ring2][C][C][O][C][C][O][=C][Ring1][S][C][=C][C][Branch1][C][O][=C][Ring2][Ring1][Branch2][Ring2][Ring1][P] targets the protein PKC-alpha which is involved in the Long-term depression."}", "/scratch/micpie/export/compound_protein_pathway/train_2-0.jsonl": "{"text":"The compound InChI=1S\/C30H37N5O2\/c1-21(15-23-5-8-26-17-28(31-19-27(26)16-23)33-30(37)25-9-10-25)18-32-29(36)24-6-3-22(4-7-24)20-35-13-11-34(2)12-14-35\/h3-8,16-17,19,21,25H,9-15,18,20H2,1-2H3,(H,32,36)(H,31,33,37)\/t21-\/m1\/s1 targets the protein p150 which is involved in the Axon guidance."} {"text":"The compound O=C(c1ccccc1C(F)(F)F)N1CCC(Cc2ccccc2)CC1 targets the protein Mitogen-activated protein kinase p38 alpha which is involved in the Fc epsilon RI signaling pathway."}", "/scratch/micpie/export/compound_protein_pathway/valid_2-0.jsonl": "{"text":"The compound 
[O][=C][Branch2][Ring2][Ring1][N][C][=C][C][=C][C][=C][Branch1][P][C][=C][C][Branch1][C][F][=C][C][NH1][N][=C][C][Ring1][#Branch2][=Ring1][Branch1][C][=C][Ring1][S][C][=N][Ring2][Ring1][Ring2][C][C][C][Ring1][Ring1] targets the protein Tyrosine-protein kinase ABL1 which is involved in the Axon guidance."} {"text":"The compound [O-][n+]1ccc2c(-c3ccc(F)cc3Cl)cc(-c3ccncc3)nc2c1-c1c(Cl)cccc1Cl targets the protein MAPK 14 which is involved in the Fc epsilon RI signaling pathway."}", "/scratch/micpie/export/compound_protein_pathway/valid_2-1.jsonl": "{"text":"The compound O=C(Nc1cc2ccc(-c3cc(F)cc4[nH]ncc34)cc2cn1)C1CC1 targets the protein Tyrosine-protein kinase ABL1. The protein Tyrosine-protein kinase ABL1 is involved in the Axon guidance."} {"text":"The compound [O-][n+]1ccc2c(-c3ccc(F)cc3Cl)cc(-c3ccncc3)nc2c1-c1c(Cl)cccc1Cl targets the protein CSBP. The protein CSBP is involved in the Fc epsilon RI signaling pathway."}", "/scratch/micpie/export/compound_protein_pathway/valid_4-0.jsonl": "{"text":"The compound CcccccC)c6\/C=C\/ncnccNccccPC)C)=O))cc6)))))))nc-cccccn6))))))nc69 targets the protein p60-Src which is involved in the CD28 co-stimulation."} {"text":"The compound O=CCccccO)cF)c6)))))))N\/N=C\\C=O)NccCl)cccCl)c69 targets the protein HGF receptor which is involved in the MET activates RAP1 and RAC1."}", "/scratch/micpie/export/compound_protein_pathway/train_5-1.jsonl": "{"text":"The compound Cncc-cccc=O)nCCOcccncccOCccccCCC=O)Ncccccc6N)))))))))))cc6))))))))ccc%106)))))))))))))n6))))))cn5 targets the protein Proto-oncogene c-Met. The protein Proto-oncogene c-Met is involved in the MET activates RAP1 and RAC1."} {"text":"The compound Cc1ccc(C(c2c[nH]c3ccc(C(=O)N\/N=C\/c4ccc(F)cc4)cc23)c2c[nH]c3ccc(C(=O)N\/N=C\/c4ccc(F)cc4)cc23)cc1 targets the protein Beta-G1. 
The protein Beta-G1 is involved in the MPS VII - Sly syndrome."}", "/scratch/micpie/export/compound_protein_pathway/test_2-1.jsonl": "{"text":"The compound N[C@H]1[C@@H]2CN(c3ncc(C(=O)Nc4ccc(OC(F)(F)Cl)cc4)cc3-c3cncnc3)C[C@H]12 targets the protein p150. The protein p150 is involved in the Axon guidance."} {"text":"The compound InChI=1S\/C23H23N3O2\/c1-14-3-4-17(21-24-13-15(2)25-21)11-18(14)16-5-6-19-20(12-16)26-22(27)23(19)7-9-28-10-8-23\/h3-6,11-13H,7-10H2,1-2H3,(H,24,25)(H,26,27) targets the protein MAP kinase MXI2. The protein MAP kinase MXI2 is involved in the Fc epsilon RI signaling pathway."}", "/scratch/micpie/export/compound_protein_pathway/train_0-0.jsonl": "{"text":"The compound [O][=C][Branch1][C][O][C][=Branch1][C][=O][C][O] targets the protein HPRAJ which is involved in the Olfactory Signaling Pathway."} {"text":"The compound O=CNCCOC=O)cccO)cC=O)ccO)cccc6C=O)O)))))))))cO)c6)))))))))CccccO)cc6)))))))))cccccc6 targets the protein PKC-A which is involved in the Long-term depression."}", "/scratch/micpie/export/compound_protein_pathway/test_1-1.jsonl": "{"text":"The compound O=C\/C=C\\ccccO)cO)c6)))))))Occcccc69 targets the protein PKC-A. The protein PKC-A is involved in the Long-term depression."} {"text":"The compound Cc1ccc(C(=O)Nc2ccc(CN3CCN(C)CC3)c(C(F)(F)F)c2)cc1-n1cc(-c2cnc3[nH]nc(C)c3c2)nn1 targets the protein Abelson murine leukemia viral oncogene homolog 1. The protein Abelson murine leukemia viral oncogene homolog 1 is involved in the Axon guidance."}", "/scratch/micpie/export/compound_protein_pathway/test_5-1.jsonl": "{"text":"The compound C\/C(=N\\NC(N)=O)c1cnc2nnn(C(C)c3cc4cccnc4cc3F)c2n1 targets the protein SF receptor. The protein SF receptor is involved in the MET activates RAP1 and RAC1."} {"text":"The compound Ccccnc-cccc-cnnc-cccccn6))))))o5)))))cc6))))))[nH]c5cc9C targets the protein Beta-G1. 
The protein Beta-G1 is involved in the MPS VII - Sly syndrome."}", "/scratch/micpie/export/compound_protein_pathway/train_4-1.jsonl": "{"text":"The compound InChI=1S\/C16H19N5.ClH\/c1-10-6-5-7-11(8-10)13-12-14(17)18-9-19-15(12)21(20-13)16(2,3)4;\/h5-9H,1-4H3,(H2,17,18,19);1H targets the protein Proto-oncogene tyrosine-protein kinase Src. The protein Proto-oncogene tyrosine-protein kinase Src is involved in the CD28 co-stimulation."} {"text":"The compound COc1cc2nccc(Oc3ccc(NC(=O)C(C)(C)C(=O)Nc4ccc(F)cc4)cc3)c2cc1OC targets the protein HGF receptor. The protein HGF receptor is involved in the MET activates RAP1 and RAC1."}", "/scratch/micpie/export/compound_protein_pathway/train_5-0.jsonl": "{"text":"The compound Cn1cc(-c2ccc(=O)n(CCOc3ccnc4cc(OCc5ccc(CCC(=O)Nc6ccccc6N)cc5)ccc34)n2)cn1 targets the protein HGF\/SF receptor which is involved in the MET activates RAP1 and RAC1."} {"text":"The compound CccccCcc[nH]ccccC=O)N\/N=C\/ccccF)cc6))))))))))cc96)))))))))cc[nH]ccccC=O)N\/N=C\/ccccF)cc6))))))))))cc96))))))))))cc6 targets the protein Beta-G1 which is involved in the MPS VII - Sly syndrome."}", "/scratch/micpie/export/compound_protein_pathway/valid_0-1.jsonl": "{"text":"The compound [C][C@][C][C][C@H1][C@@H1][Branch2][Ring1][C][C][C][C][=C][C][=Branch1][C][=O][C][C][C@@][Ring1][#Branch1][Ring1][O][C][O][C@@H1][Ring1][P][C][C][C][Ring2][Ring1][Ring2][=O] targets the protein Olfactory receptor OR11-16. The protein Olfactory receptor OR11-16 is involved in the Olfactory Signaling Pathway."} {"text":"The compound COC1C(CO)OC(n2c3c(Cl)cccc3c3c4c(c5c6cccc(Cl)c6[nH]c5c32)CNC4=O)C(O)C1O targets the protein PKC-alpha. The protein PKC-alpha is involved in the Long-term depression."}", "/scratch/micpie/export/compound_protein_pathway/train_2-1.jsonl": "{"text":"The compound C[C@@H](CNC(=O)c1ccc(CN2CCN(C)CC2)cc1)Cc1ccc2cc(NC(=O)C3CC3)ncc2c1 targets the protein Proto-oncogene c-Abl. 
The protein Proto-oncogene c-Abl is involved in the Axon guidance."} {"text":"The compound O=C(c1ccccc1C(F)(F)F)N1CCC(Cc2ccccc2)CC1 targets the protein MAP kinase MXI2. The protein MAP kinase MXI2 is involved in the Fc epsilon RI signaling pathway."}", "/scratch/micpie/export/compound_protein_pathway/valid_1-1.jsonl": "{"text":"The compound CCCCCCCC\/C=C\\CCCCCCCC\/C=C1\/C[C@@](CO)(COC(C)=O)OC1=O targets the protein PKC-A. The protein PKC-A is involved in the Long-term depression."} {"text":"The compound [N][#C][C][=C][C][Branch1][C][F][=C][C][Branch2][Ring2][#C][C][=C][C][Branch2][Ring1][=Branch2][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][#Branch2][O][C][Branch1][C][F][Branch1][C][F][Cl][C][=C][Ring1][O][=C][N][=C][Ring2][Ring1][Ring2][C][C][C][C@@H1][Branch1][C][O][N][Ring1][=Branch1][=C][Ring2][Ring1][P] targets the protein Proto-oncogene c-Abl. The protein Proto-oncogene c-Abl is involved in the Axon guidance."}", "/scratch/micpie/export/compound_protein_pathway/test_3-1.jsonl": "{"text":"The compound CcccF)ccc6-cncNCN)=O))ccF)cccc6F))))))))cncnC)c5n9 targets the protein Stress-activated protein kinase 2a. The protein Stress-activated protein kinase 2a is involved in the Fc epsilon RI signaling pathway."} {"text":"The compound CC(C)n1nc(-c2cnc3ccccc3c2)c2c(N)ncnc21.Cl targets the protein Proto-oncogene tyrosine-protein kinase Src. 
The protein Proto-oncogene tyrosine-protein kinase Src is involved in the CD28 co-stimulation."}", "/scratch/micpie/export/compound_protein_pathway/test_1-0.jsonl": "{"text":"The compound O=C\/C=C\\ccccO)cO)c6)))))))Occcccc69 targets the protein PKC-alpha which is involved in the Long-term depression."} {"text":"The compound Cc1ccc(C(=O)Nc2ccc(CN3CCN(C)CC3)c(C(F)(F)F)c2)cc1-n1cc(-c2cnc3[nH]nc(C)c3c2)nn1 targets the protein Tyrosine-protein kinase ABL1 which is involved in the Axon guidance."}", "/scratch/micpie/export/compound_protein_pathway/valid_4-1.jsonl": "{"text":"The compound Cc1cccc(C)c1\/C=C\/n1cnc2c(Nc3ccc(P(C)(C)=O)cc3)nc(-c3ccccn3)nc21 targets the protein p60-Src. The protein p60-Src is involved in the CD28 co-stimulation."} {"text":"The compound O=CCccccO)cF)c6)))))))N\/N=C\\C=O)NccCl)cccCl)c69 targets the protein HGF\/SF receptor. The protein HGF\/SF receptor is involved in the MET activates RAP1 and RAC1."}", "/scratch/micpie/export/compound_protein_pathway/train_1-1.jsonl": "{"text":"The compound O=C1NC(=O)C2=C1c1cn(c3ccccc13)CCOCCOCCOCCn1cc2c2cccnc21 targets the protein PKC-alpha. The protein PKC-alpha is involved in the Long-term depression."} {"text":"The compound [C][N][Branch1][Ring2][C][C][O][C][=N][C][=C][Branch2][Ring1][=Branch2][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][#Branch2][S][C][Branch1][C][F][Branch1][C][F][F][C][=C][Ring1][O][C][=C][Ring2][Ring1][Ring2][C][=C][N][=C][N][=C][Ring1][=Branch1] targets the protein Tyrosine-protein kinase ABL1. The protein Tyrosine-protein kinase ABL1 is involved in the Axon guidance."}", "/scratch/micpie/export/compound_protein_pathway/train_0-1.jsonl": "{"text":"The compound O=C(O)C(=O)CO targets the protein Olfactory receptor 51E2. 
The protein Olfactory receptor 51E2 is involved in the Olfactory Signaling Pathway."} {"text":"The compound InChI=1S\/C31H25NO10\/c33-21-11-9-17(10-12-21)13-20(32-29(38)18-5-2-1-3-6-18)16-42-31(41)19-14-24(35)27(25(36)15-19)28(37)26-22(30(39)40)7-4-8-23(26)34\/h1-12,14-15,20,33-36H,13,16H2,(H,32,38)(H,39,40) targets the protein PKC-A. The protein PKC-A is involved in the Long-term depression."}", "/scratch/micpie/export/compound_protein_pathway/valid_1-0.jsonl": "{"text":"The compound InChI=1S\/C27H46O5\/c1-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-25-21-27(22-28,32-26(25)30)23-31-24(2)29\/h10-11,20,28H,3-9,12-19,21-23H2,1-2H3\/b11-10-,25-20-\/t27-\/m1\/s1 targets the protein PKC-alpha which is involved in the Long-term depression."} {"text":"The compound N#Cc1cc(F)cc(-c2cc(C(=O)Nc3ccc(OC(F)(F)Cl)cc3)cnc2C2CC[C@@H](O)N2)c1 targets the protein p150 which is involved in the Axon guidance."}", "/scratch/micpie/export/compound_protein_pathway/train_3-1.jsonl": "{"text":"The compound Cc1ccccc1C(=O)N1CCC(Cc2ccccc2)CC1 targets the protein SAPK2a. The protein SAPK2a is involved in the Fc epsilon RI signaling pathway."} {"text":"The compound COc1ccc(-c2nn(C(C)C)c3ncnc(N)c23)cc1.Cl targets the protein Proto-oncogene c-Src. The protein Proto-oncogene c-Src is involved in the CD28 co-stimulation."}", "/scratch/micpie/export/compound_protein_pathway/valid_3-1.jsonl": "{"text":"The compound COc1cc2ncnc(Nc3cc(NC(=O)c4ccnc(N5CCOCC5)c4)ccc3C)c2cc1OCCN1CCCCC1 targets the protein Stress-activated protein kinase 2a. The protein Stress-activated protein kinase 2a is involved in the Fc epsilon RI signaling pathway."} {"text":"The compound InChI=1S\/C18H18N6.ClH\/c1-18(2,3)24-17-14(16(19)21-10-22-17)15(23-24)12-6-7-13-11(9-12)5-4-8-20-13;\/h4-10H,1-3H3,(H2,19,21,22);1H targets the protein p60-Src. 
The protein p60-Src is involved in the CD28 co-stimulation."}", "/scratch/micpie/export/compound_protein_pathway/test_4-1.jsonl": "{"text":"The compound InChI=1S\/C21H27N6O2P\/c1-4-26-14-22-18-19(23-16-5-7-17(8-6-16)30(2,3)29)24-21(25-20(18)26)27-11-9-15(13-28)10-12-27\/h4-8,14-15,28H,1,9-13H2,2-3H3,(H,23,24,25) targets the protein Proto-oncogene tyrosine-protein kinase Src. The protein Proto-oncogene tyrosine-protein kinase Src is involved in the CD28 co-stimulation."} {"text":"The compound O=CNccccOcccnc[nH]ccc95))))))))))cc6)))))))cnnn-ccccF)cc6))))))c5CF)F)F targets the protein Scatter factor receptor. The protein Scatter factor receptor is involved in the MET activates RAP1 and RAC1."}", "/scratch/micpie/export/compound_protein_pathway/train_3-0.jsonl": "{"text":"The compound InChI=1S\/C20H23NO\/c1-16-7-5-6-10-19(16)20(22)21-13-11-18(12-14-21)15-17-8-3-2-4-9-17\/h2-10,18H,11-15H2,1H3 targets the protein MAX-interacting protein 2 which is involved in the Fc epsilon RI signaling pathway."} {"text":"The compound [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][C][=N][N][Branch1][=Branch1][C][Branch1][C][C][C][C][=N][C][=N][C][Branch1][C][N][=C][Ring1][=N][Ring1][#Branch1][C][=C][Ring2][Ring1][Ring1].[Cl] targets the protein p60-Src which is involved in the CD28 co-stimulation."}", "/scratch/micpie/export/compound_protein_pathway/train_4-0.jsonl": "{"text":"The compound [C][C][=C][C][=C][C][Branch2][Ring1][O][C][=N][N][Branch1][=Branch2][C][Branch1][C][C][Branch1][C][C][C][C][=N][C][=N][C][Branch1][C][N][=C][Ring1][=C][Ring1][#Branch1][=C][Ring2][Ring1][Ring2].[Cl] targets the protein p60-Src which is involved in the CD28 co-stimulation."} {"text":"The compound [C][O][C][=C][C][=N][C][=C][C][Branch2][Ring2][#Branch2][O][C][=C][C][=C][Branch2][Ring1][=N][N][C][=Branch1][C][=O][C][Branch1][C][C][Branch1][C][C][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][=C][Ring2][Ring1][=Branch1][=C][Ring2][Ring1][=N][C][=C][Ring2][Ring1][P][O][C] 
targets the protein Tyrosine-protein kinase Met which is involved in the MET activates RAP1 and RAC1."}", "/scratch/micpie/export/sarscov2_3clpro_diamond/test_0-10.jsonl": "{"text":"User: I'm looking for the DeepSMILES of a molecule that is successfully targeting the SARSCoV2 3CL protease?\nAssistant: This is a molecule that is successfully targeting the SARSCoV2 3CL protease: O=CCCl))NCCNS=O)=O)ccccCl)cc6)))))))CC6"} {"text":"User: I'm looking for the canonical SMILES of a molecule that is targeting the SARSCoV2 3CL protease?\nAssistant: This is a molecule that is targeting the SARSCoV2 3CL protease: OCC1(c2ccccn2)CCCC1"}", "/scratch/micpie/export/sarscov2_3clpro_diamond/valid_0-8.jsonl": "{"text":"User: Is the molecule with the DeepSMILES NC=O)[C@H]CCC[C@H]5cccsc5 acting against the SARSCoV2 3CL protease?\nAssistant: Yes, it is acting against the SARSCoV2 3CL protease."} {"text":"User: Is the molecule with the canonical SMILES Nc1cc(=O)[nH][nH]1 targeting the SARSCoV2 3CL protease?\nAssistant: No, it is not targeting the SARSCoV2 3CL protease."}", "/scratch/micpie/export/sarscov2_3clpro_diamond/train_0-8.jsonl": "{"text":"User: Is the molecule with the SELFIES [C][C][=Branch1][C][=O][N][C][C][C][=C][NH1][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch2][Ring1][#Branch1] targeting the SARSCoV2 3CL protease?\nAssistant: Yes, it is targeting the SARSCoV2 3CL protease."} {"text":"User: Is the molecule with the InChI InChI=1S\/C8H13N3O2\/c1-10(2)8(12)13-6-5-11-4-3-9-7-11\/h3-4,7H,5-6H2,1-2H3 targeting the SARSCoV2 3CL protease?\nAssistant: No, it is not targeting the SARSCoV2 3CL protease."}", "/scratch/micpie/export/sarscov2_3clpro_diamond/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is successfully targeting the SARSCoV2 3CL protease.\ncanonical SMILES: O=C(CCl)N1CCN(S(=O)(=O)c2ccc(Cl)cc2)CC1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is successfully targeting 
the SARSCoV2 3CL protease."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is targeting the SARSCoV2 3CL protease.\nMolecule InChI: InChI=1S\/C11H15NO\/c13-9-11(6-2-3-7-11)10-5-1-4-8-12-10\/h1,4-5,8,13H,2-3,6-7,9H2\nConstraint: Answer the question in a full sentence.\nResult: This molecule is targeting the SARSCoV2 3CL protease."}", "/scratch/micpie/export/sarscov2_3clpro_diamond/valid_0-9.jsonl": "{"text":"User: Can you generate the DeepSMILES of a molecule that is successfully targeting the SARSCoV2 3CL protease?\nAssistant: Yes, here you go: NC=O)[C@H]CCC[C@H]5cccsc5"} {"text":"User: Can you create the DeepSMILES of a molecule that is not targeting the SARSCoV2 3CL protease?\nAssistant: Yes, here you go: Nccc=O)[nH][nH]5"}", "/scratch/micpie/export/sarscov2_3clpro_diamond/test_0-1.jsonl": "{"text":"Based on the SELFIES [O][=C][Branch1][Ring1][C][Cl][N][C][C][N][Branch2][Ring1][Ring1][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][C][Ring1][S], the molecule is targeting the SARSCoV2 3CL protease."} {"text":"Based on the SELFIES [O][C][C][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][C][C][Ring1][O], the molecule is successfully targeting the SARSCoV2 3CL protease."}", "/scratch/micpie/export/sarscov2_3clpro_diamond/valid_0-0.jsonl": "{"text":"The molecule with the SMILES NC(=O)[C@H]1CCC[C@H]1c1ccsc1 exhibits activity against the SARSCoV2 3CL protease."} {"text":"The molecule with the SELFIES representation of [N][C][=C][C][=Branch1][C][=O][NH1][NH1][Ring1][=Branch1] shows no activity against the SARSCoV2 3CL protease."}", "/scratch/micpie/export/sarscov2_3clpro_diamond/test_0-2.jsonl": "{"text":"The SMILES O=C(CCl)N1CCN(S(=O)(=O)c2ccc(Cl)cc2)CC1 is from a molecule that shows activity against the SARS-CoV-2 3CL protease."} {"text":"The InChI InChI=1S\/C11H15NO\/c13-9-11(6-2-3-7-11)10-5-1-4-8-12-10\/h1,4-5,8,13H,2-3,6-7,9H2 is from a molecule that 
displays activity against the SARS-CoV-2 3CL protease."}", "/scratch/micpie/export/sarscov2_3clpro_diamond/valid_0-10.jsonl": "{"text":"User: I'm looking for the DeepSMILES of a molecule that is acting against the SARSCoV2 3CL protease?\nAssistant: This is a molecule that is acting against the SARSCoV2 3CL protease: NC=O)[C@H]CCC[C@H]5cccsc5"} {"text":"User: I'm searching for the canonical SMILES of a molecule that is not successfully targeting the SARSCoV2 3CL protease?\nAssistant: This is a molecule that is not successfully targeting the SARSCoV2 3CL protease: Nc1cc(=O)[nH][nH]1"}", "/scratch/micpie/export/sarscov2_3clpro_diamond/train_0-6.jsonl": "{"text":"Task: Please create a SELFIES based on the text description.\nDescription: A molecule that is acting against the SARSCoV2 3CL protease.\nResult: [C][C][=Branch1][C][=O][N][C][C][C][=C][NH1][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch2][Ring1][#Branch1]"} {"text":"Task: Please generate a molecule DeepSMILES based on the text description.\nDescription: A molecule that is targeting the SARSCoV2 3CL protease.\nResult: CNC)C=O)OCCnccnc5"}", "/scratch/micpie/export/sarscov2_3clpro_diamond/valid_0-6.jsonl": "{"text":"Task: Please create a SELFIES based on the description.\nDescription: A molecule that is targeting the SARSCoV2 3CL protease.\nResult: [N][C][=Branch1][C][=O][C@H1][C][C][C][C@H1][Ring1][Branch1][C][C][=C][S][C][=Ring1][Branch1]"} {"text":"Task: Please create a molecule SMILES based on the description.\nDescription: A molecule that is targeting the SARSCoV2 3CL protease.\nResult: Nc1cc(=O)[nH][nH]1"}", "/scratch/micpie/export/sarscov2_3clpro_diamond/test_0-9.jsonl": "{"text":"User: Can you give me the InChI of a molecule that is successfully targeting the SARSCoV2 3CL protease?\nAssistant: Yes, here you go: InChI=1S\/C12H14Cl2N2O3S\/c13-9-12(17)15-5-7-16(8-6-15)20(18,19)11-3-1-10(14)2-4-11\/h1-4H,5-9H2"} {"text":"User: Can you give me the canonical SMILES of a molecule that is successfully 
targeting the SARSCoV2 3CL protease?\nAssistant: Yes, I'm happy to help, here you go: OCC1(c2ccccn2)CCCC1"}", "/scratch/micpie/export/sarscov2_3clpro_diamond/test_0-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of O=CCCl))NCCNS=O)=O)ccccCl)cc6)))))))CC6 displays activity against the SARSCoV2 3CL protease."} {"text":"The molecule with the InChI representation of InChI=1S\/C11H15NO\/c13-9-11(6-2-3-7-11)10-5-1-4-8-12-10\/h1,4-5,8,13H,2-3,6-7,9H2 exhibits activity against the SARS-CoV-2 3CL protease."}", "/scratch/micpie/export/sarscov2_3clpro_diamond/valid_0-7.jsonl": "{"text":"User: Can you derive if the molecule with the SMILES NC(=O)[C@H]1CCC[C@H]1c1ccsc1 is acting against the SARSCoV2 3CL protease?\nAssistant: Yes, this molecule is acting against the SARSCoV2 3CL protease."} {"text":"User: Can you estimate if the molecule with the canonical SMILES Nc1cc(=O)[nH][nH]1 is successfully targeting the SARSCoV2 3CL protease?\nAssistant: No, this molecule is not successfully targeting the SARSCoV2 3CL protease."}", "/scratch/micpie/export/sarscov2_3clpro_diamond/test_0-3.jsonl": "{"text":"The canonical SMILES O=C(CCl)N1CCN(S(=O)(=O)c2ccc(Cl)cc2)CC1 is acting against the SARSCoV2 3CL protease."} {"text":"The SMILES OCC1(c2ccccn2)CCCC1 is successfully targeting the SARSCoV2 3CL protease."}", "/scratch/micpie/export/sarscov2_3clpro_diamond/valid_0-11.jsonl": "{"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should be targeting the SARSCoV2 3CL protease.\nAssistant: Ok, here you go, this SELFIES is targeting the SARSCoV2 3CL protease: [N][C][=Branch1][C][=O][C@H1][C][C][C][C@H1][Ring1][Branch1][C][C][=C][S][C][=Ring1][Branch1]"} {"text":"User: I want to create a molecule SELFIES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should not be targeting the SARSCoV2 3CL protease.\nAssistant: Got it, this SELFIES is not targeting the SARSCoV2 3CL protease: [N][C][=C][C][=Branch1][C][=O][NH1][NH1][Ring1][=Branch1]"}", "/scratch/micpie/export/sarscov2_3clpro_diamond/train_0-0.jsonl": "{"text":"The molecule with the SMILES CC(=O)NCCc1c[nH]c2ccc(F)cc12 displays activity against the SARSCoV2 3CL protease."} {"text":"The molecule with the SELFIES [C][N][Branch1][C][C][C][=Branch1][C][=O][O][C][C][N][C][=C][N][=C][Ring1][Branch1] shows no activity against the SARSCoV2 3CL protease."}", "/scratch/micpie/export/sarscov2_3clpro_diamond/test_0-6.jsonl": "{"text":"Task: Please generate a molecule InChI based on the text description.\nDescription: A molecule that is successfully targeting the SARSCoV2 3CL protease.\nResult: InChI=1S\/C12H14Cl2N2O3S\/c13-9-12(17)15-5-7-16(8-6-15)20(18,19)11-3-1-10(14)2-4-11\/h1-4H,5-9H2"} {"text":"Task: Please give me a molecule SELFIES based on the description below.\nDescription: A molecule that is targeting the SARSCoV2 3CL protease.\nResult: [O][C][C][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][C][C][Ring1][O]"}", "/scratch/micpie/export/sarscov2_3clpro_diamond/train_0-10.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that is targeting the SARSCoV2 3CL protease?\nAssistant: This is a molecule that is targeting the SARSCoV2 3CL protease: CC(=O)NCCc1c[nH]c2ccc(F)cc12"} {"text":"User: I'm searching for the InChI of a molecule that is not acting against the SARSCoV2 3CL protease?\nAssistant: This is a molecule that is not acting against the SARSCoV2 3CL protease: InChI=1S\/C8H13N3O2\/c1-10(2)8(12)13-6-5-11-4-3-9-7-11\/h3-4,7H,5-6H2,1-2H3"}", "/scratch/micpie/export/sarscov2_3clpro_diamond/train_0-3.jsonl": "{"text":"The canonical SMILES CC(=O)NCCc1c[nH]c2ccc(F)cc12 is acting against the SARSCoV2 3CL protease."} {"text":"The SELFIES [C][N][Branch1][C][C][C][=Branch1][C][=O][O][C][C][N][C][=C][N][=C][Ring1][Branch1] is not 
successfully targeting the SARSCoV2 3CL protease."}", "/scratch/micpie/export/sarscov2_3clpro_diamond/train_0-12.jsonl": "{"text":"User: I want to create a InChI.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should be targeting the SARSCoV2 3CL protease.\nAssistant: Understood, this InChI is targeting the SARSCoV2 3CL protease: InChI=1S\/C12H13FN2O\/c1-8(16)14-5-4-9-7-15-12-3-2-10(13)6-11(9)12\/h2-3,6-7,15H,4-5H2,1H3,(H,14,16)"} {"text":"User: I want to generate a molecule InChI.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be successfully targeting the SARSCoV2 3CL protease.\nAssistant: Ok, this InChI is not successfully targeting the SARSCoV2 3CL protease: InChI=1S\/C8H13N3O2\/c1-10(2)8(12)13-6-5-11-4-3-9-7-11\/h3-4,7H,5-6H2,1-2H3"}", "/scratch/micpie/export/sarscov2_3clpro_diamond/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are acting against the SARSCoV2 3CL protease?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any other words.\nOptions:\n1.) O=C(CCl)N1CCN(S(=O)(=O)c2ccc(Cl)cc2)CC1\n2.) CC(=O)N1CCC2CC21S(=O)(=O)c1ccccc1\n3.) 
COC(=O)c1ccc(NC(=O)C2CCCC2)cc1\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are acting against the SARSCoV2 3CL protease?\nConstraint: You must select none, one or more options from 1 or 2 without using any additional words.\nOptions:\n1) Cc1ccc(C)c(S(=O)(=O)N2CCN(C(=O)CCl)CC2)c1\n2) OCC1(c2ccccn2)CCCC1\nAnswer: 1, 2"}", "/scratch/micpie/export/sarscov2_3clpro_diamond/valid_0-2.jsonl": "{"text":"The SELFIES [N][C][=Branch1][C][=O][C@H1][C][C][C][C@H1][Ring1][Branch1][C][C][=C][S][C][=Ring1][Branch1] represents a molecule that shows activity against the SARSCoV2 3CL protease."} {"text":"The SMILES Nc1cc(=O)[nH][nH]1 is from a molecule that shows no activity against the SARSCoV2 3CL protease."}", "/scratch/micpie/export/sarscov2_3clpro_diamond/valid_0-1.jsonl": "{"text":"Based on the InChI InChI=1S\/C10H13NOS\/c11-10(12)9-3-1-2-8(9)7-4-5-13-6-7\/h4-6,8-9H,1-3H2,(H2,11,12)\/t8-,9-\/m0\/s1, the molecule is targeting the SARSCoV2 3CL protease."} {"text":"Based on the SELFIES [N][C][=C][C][=Branch1][C][=O][NH1][NH1][Ring1][=Branch1], the molecule is not acting against the SARSCoV2 3CL protease."}", "/scratch/micpie/export/sarscov2_3clpro_diamond/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are acting against the SARSCoV2 3CL protease?\nConstraint: You must select none, one or more options from 1 or 2 without using any additional words.\nOptions:\n1: [C][N][C][=N][C][=C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][=C][Ring1][#Branch2][Cl]\n2: [N][C][=Branch1][C][=O][C@H1][C][C][C][C@H1][Ring1][Branch1][C][C][=C][S][C][=Ring1][Branch1]\nAnswer: 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not targeting the SARSCoV2 3CL protease?\nConstraint: You must select none, one or more options from A or B without using any other words.\nOptions:\nA: CC(C)NC(=O)c1ccc2ccccc2n1\nB: Nc1cc(=O)[nH][nH]1\nAnswer: A, 
B"}", "/scratch/micpie/export/sarscov2_3clpro_diamond/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is acting against the SARSCoV2 3CL protease.\nMolecule DeepSMILES: NC=O)[C@H]CCC[C@H]5cccsc5\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is acting against the SARSCoV2 3CL protease."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is acting against the SARSCoV2 3CL protease.\nInChI: InChI=1S\/C3H5N3O\/c4-2-1-3(7)6-5-2\/h1H,(H4,4,5,6,7)\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not acting against the SARSCoV2 3CL protease."}", "/scratch/micpie/export/sarscov2_3clpro_diamond/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is successfully targeting the SARSCoV2 3CL protease.\nInChI: InChI=1S\/C10H13NOS\/c11-10(12)9-3-1-2-8(9)7-4-5-13-6-7\/h4-6,8-9H,1-3H2,(H2,11,12)\/t8-,9-\/m0\/s1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is successfully targeting the SARSCoV2 3CL protease.\ncanonical SMILES: Nc1cc(=O)[nH][nH]1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/sarscov2_3clpro_diamond/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is acting against the SARSCoV2 3CL protease.\nSMILES: CC(=O)NCCc1c[nH]c2ccc(F)cc12\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is acting against the SARSCoV2 3CL protease."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is 
targeting the SARSCoV2 3CL protease.\nMolecule DeepSMILES: CNC)C=O)OCCnccnc5\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not targeting the SARSCoV2 3CL protease."}", "/scratch/micpie/export/sarscov2_3clpro_diamond/valid_0-12.jsonl": "{"text":"User: I want to create a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should be targeting the SARSCoV2 3CL protease.\nAssistant: Ok, this canonical SMILES is targeting the SARSCoV2 3CL protease: NC(=O)[C@H]1CCC[C@H]1c1ccsc1"} {"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should not be successfully targeting the SARSCoV2 3CL protease.\nAssistant: Got it, this DeepSMILES is not successfully targeting the SARSCoV2 3CL protease: Nccc=O)[nH][nH]5"}", "/scratch/micpie/export/sarscov2_3clpro_diamond/train_0-2.jsonl": "{"text":"The canonical SMILES CC(=O)NCCc1c[nH]c2ccc(F)cc12 represents a molecule that exhibits activity against the SARS-CoV-2 3CL protease."} {"text":"The InChI InChI=1S\/C8H13N3O2\/c1-10(2)8(12)13-6-5-11-4-3-9-7-11\/h3-4,7H,5-6H2,1-2H3 is from a molecule that exhibits no activity against the SARS-CoV-2 3CL protease."}", "/scratch/micpie/export/sarscov2_3clpro_diamond/test_0-11.jsonl": "{"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should be acting against the SARSCoV2 3CL protease.\nAssistant: Ok, here you go, this DeepSMILES is acting against the SARSCoV2 3CL protease: O=CCCl))NCCNS=O)=O)ccccCl)cc6)))))))CC6"} {"text":"User: I want to create a InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should be acting against the SARSCoV2 3CL protease.\nAssistant: Got it, here you go, this InChI is acting against the SARSCoV2 3CL protease: InChI=1S\/C11H15NO\/c13-9-11(6-2-3-7-11)10-5-1-4-8-12-10\/h1,4-5,8,13H,2-3,6-7,9H2"}", "/scratch/micpie/export/sarscov2_3clpro_diamond/train_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the SELFIES [C][C][=Branch1][C][=O][N][C][C][C][=C][NH1][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch2][Ring1][#Branch1] is acting against the SARSCoV2 3CL protease?\nAssistant: Yes, this molecule is acting against the SARSCoV2 3CL protease."} {"text":"User: Can you estimate if the molecule with the InChI InChI=1S\/C8H13N3O2\/c1-10(2)8(12)13-6-5-11-4-3-9-7-11\/h3-4,7H,5-6H2,1-2H3 is acting against the SARSCoV2 3CL protease?\nAssistant: No, this molecule is not acting against the SARSCoV2 3CL protease."}", "/scratch/micpie/export/sarscov2_3clpro_diamond/train_0-11.jsonl": "{"text":"User: I want to generate a SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should be successfully targeting the SARSCoV2 3CL protease.\nAssistant: Ok, here you go, this SMILES is successfully targeting the SARSCoV2 3CL protease: CC(=O)NCCc1c[nH]c2ccc(F)cc12"} {"text":"User: I want to generate a InChI.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should not be targeting the SARSCoV2 3CL protease.\nAssistant: Got it, here you go, this InChI is not targeting the SARSCoV2 3CL protease: InChI=1S\/C8H13N3O2\/c1-10(2)8(12)13-6-5-11-4-3-9-7-11\/h3-4,7H,5-6H2,1-2H3"}", "/scratch/micpie/export/sarscov2_3clpro_diamond/train_0-1.jsonl": "{"text":"Based on the SMILES CC(=O)NCCc1c[nH]c2ccc(F)cc12, the molecule is acting against the SARSCoV2 3CL protease."} {"text":"Based on the InChI InChI=1S\/C8H13N3O2\/c1-10(2)8(12)13-6-5-11-4-3-9-7-11\/h3-4,7H,5-6H2,1-2H3, the molecule is not targeting the SARSCoV2 3CL protease."}", "/scratch/micpie/export/sarscov2_3clpro_diamond/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are targeting the SARSCoV2 3CL protease?\nConstraint: You must select none, one or more options from a or b without using any other words.\nOptions:\na) InChI=1S\/C11H12N4OS\/c1-2-5-9-14-15-11(17-9)13-10(16)8-6-3-4-7-12-8\/h3-4,6-7H,2,5H2,1H3,(H,13,15,16)\nb) InChI=1S\/C12H13FN2O\/c1-8(16)14-5-4-9-7-15-12-3-2-10(13)6-11(9)12\/h2-3,6-7,15H,4-5H2,1H3,(H,14,16)\nAnswer: b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not acting against the SARSCoV2 3CL protease?\nConstraint: You must select none, one or more options from A, B, or C without using any other words.\nOptions:\nA) CNC)C=O)OCCnccnc5\nB) CcccccNC=O)[C@@H]CCCN5)))))))n6\nC) COC=O)COccccC#N))cc6\nAnswer: A, B, C"}", "/scratch/micpie/export/sarscov2_3clpro_diamond/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is successfully targeting the SARSCoV2 3CL protease.\nMolecule SELFIES: [C][C][=Branch1][C][=O][N][C][C][C][=C][NH1][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch2][Ring1][#Branch1]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any words.\nResult: True"} {"text":"Task: Please classify a molecule based on the 
description.\nDescription: A molecule that is acting against the SARSCoV2 3CL protease.\nSMILES: CN(C)C(=O)OCCn1ccnc1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/sarscov2_3clpro_diamond/test_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the InChI InChI=1S\/C12H14Cl2N2O3S\/c13-9-12(17)15-5-7-16(8-6-15)20(18,19)11-3-1-10(14)2-4-11\/h1-4H,5-9H2 is acting against the SARSCoV2 3CL protease?\nAssistant: Yes, this molecule is acting against the SARSCoV2 3CL protease."} {"text":"User: Can you estimate if the molecule with the DeepSMILES OCCcccccn6))))))CCCC5 is targeting the SARSCoV2 3CL protease?\nAssistant: Yes, this molecule is targeting the SARSCoV2 3CL protease."}", "/scratch/micpie/export/sarscov2_3clpro_diamond/train_0-9.jsonl": "{"text":"User: Can you create the DeepSMILES of a molecule that is targeting the SARSCoV2 3CL protease?\nAssistant: Of course, here you go: CC=O)NCCcc[nH]ccccF)cc96"} {"text":"User: Can you create the SELFIES of a molecule that is not acting against the SARSCoV2 3CL protease?\nAssistant: Of course, here you go: [C][N][Branch1][C][C][C][=Branch1][C][=O][O][C][C][N][C][=C][N][=C][Ring1][Branch1]"}", "/scratch/micpie/export/sarscov2_3clpro_diamond/valid_0-3.jsonl": "{"text":"The molecule canonical SMILES NC(=O)[C@H]1CCC[C@H]1c1ccsc1 is acting against the SARSCoV2 3CL protease."} {"text":"The canonical SMILES Nc1cc(=O)[nH][nH]1 is not acting against the SARSCoV2 3CL protease."}", "/scratch/micpie/export/sarscov2_3clpro_diamond/test_0-8.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C12H14Cl2N2O3S\/c13-9-12(17)15-5-7-16(8-6-15)20(18,19)11-3-1-10(14)2-4-11\/h1-4H,5-9H2 acting against the SARSCoV2 3CL protease?\nAssistant: Yes, it is acting against the SARSCoV2 3CL protease."} {"text":"User: Is the molecule with the canonical SMILES OCC1(c2ccccn2)CCCC1 targeting the SARSCoV2 3CL protease?\nAssistant: 
Yes, it is targeting the SARSCoV2 3CL protease."}", "/scratch/micpie/export/sarscov2_3clpro_diamond/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is successfully targeting the SARSCoV2 3CL protease.\ncanonical SMILES: O=C(CCl)N1CCN(S(=O)(=O)c2ccc(Cl)cc2)CC1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is acting against the SARSCoV2 3CL protease.\nSELFIES: [O][C][C][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][C][C][Ring1][O]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: True"}", "/scratch/micpie/export/sarscov2_3clpro_diamond/test_0-12.jsonl": "{"text":"User: I want to come up with a InChI.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should be successfully targeting the SARSCoV2 3CL protease.\nAssistant: Understood, this InChI is successfully targeting the SARSCoV2 3CL protease: InChI=1S\/C12H14Cl2N2O3S\/c13-9-12(17)15-5-7-16(8-6-15)20(18,19)11-3-1-10(14)2-4-11\/h1-4H,5-9H2"} {"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should be successfully targeting the SARSCoV2 3CL protease.\nAssistant: Got it, this canonical SMILES is successfully targeting the SARSCoV2 3CL protease: OCC1(c2ccccn2)CCCC1"}", "/scratch/micpie/export/drug_protein_pathway/test_0-1.jsonl": "{"text":"The drug Fulvestrant targets the protein ER-alpha. The protein ER-alpha is involved in the Estrogen-dependent gene expression."} {"text":"The drug InChI=1S\/C6H6N4O\/c7-6-9-3-1-2-8-4(3)5(11)10-6\/h1-2,8H,(H3,7,9,10,11) targets the protein HGPRTase. 
The protein HGPRTase is involved in the Purine salvage."}", "/scratch/micpie/export/drug_protein_pathway/valid_0-0.jsonl": "{"text":"The drug Cn1c(=O)c2c(ncn2C)n(C)c1=O targets the protein Cone cGMP-specific 3',5'-cyclic phosphodiesterase subunit alpha' which is involved in the Metabolic pathways."} {"text":"The drug InChI=1S\/C20H25NO2S2\/c1-14-7-11-24-18(14)17(19-15(2)8-12-25-19)6-4-10-21-9-3-5-16(13-21)20(22)23\/h6-8,11-12,16H,3-5,9-10,13H2,1-2H3,(H,22,23)\/t16-\/m1\/s1 targets the protein Solute carrier family 6 member 1 which is involved in the Reuptake of GABA."}", "/scratch/micpie/export/drug_protein_pathway/test_0-2.jsonl": "{"text":"User: Can you come up with one example for a protein that binds the drug [H][C@@][C][C][C@H1][Branch1][C][O][C@@][Ring1][=Branch1][Branch1][C][C][C][C][C@][Branch1][C][H][C][=C][C][=C][Branch1][C][O][C][=C][Ring1][#Branch1][C][C@@H1][Branch2][Ring1][S][C][C][C][C][C][C][C][C][C][S][=Branch1][C][=O][C][C][C][C][Branch1][C][F][Branch1][C][F][C][Branch1][C][F][Branch1][C][F][F][C@@][Ring2][Ring2][#Branch2][Ring2][Ring1][P][H]?\nAssistant: Yes, the drug [H][C@@][C][C][C@H1][Branch1][C][O][C@@][Ring1][=Branch1][Branch1][C][C][C][C][C@][Branch1][C][H][C][=C][C][=C][Branch1][C][O][C][=C][Ring1][#Branch1][C][C@@H1][Branch2][Ring1][S][C][C][C][C][C][C][C][C][C][S][=Branch1][C][=O][C][C][C][C][Branch1][C][F][Branch1][C][F][C][Branch1][C][F][Branch1][C][F][F][C@@][Ring2][Ring2][#Branch2][Ring2][Ring1][P][H] targets the protein ESR1.\nUser: Can you tell me more details about protein ESR1?\nAssistant: Of course, the protein ESR1 is involved in the Estrogen-dependent gene expression."} {"text":"User: Can you come up with an example for a protein that binds the drug 9-Deazaguanine?\nAssistant: Of course, the drug 9-Deazaguanine targets the protein HPRT1.\nUser: Can you tell me more details about protein HPRT1?\nAssistant: Yes, the protein HPRT1 is involved in the Purine salvage."}", 
"/scratch/micpie/export/drug_protein_pathway/test_0-0.jsonl": "{"text":"The drug InChI=1S\/C32H47F5O3S\/c1-30-17-15-26-25-12-11-24(38)21-23(25)20-22(29(26)27(30)13-14-28(30)39)10-7-5-3-2-4-6-8-18-41(40)19-9-16-31(33,34)32(35,36)37\/h11-12,21-22,26-29,38-39H,2-10,13-20H2,1H3\/t22-,26-,27+,28+,29-,30+,41?\/m1\/s1 targets the protein Nuclear receptor subfamily 3 group A member 1 which is involved in the Estrogen-dependent gene expression."} {"text":"The drug NC1=NC2=C(NC=C2)C(=O)N1 targets the protein HGPRT which is involved in the Purine salvage."}", "/scratch/micpie/export/drug_protein_pathway/train_0-0.jsonl": "{"text":"The drug [H][C@](O)(CCC(O)=O)NC1=CC=C(C=C1)N1C(=O)CCC1=O targets the protein Myosin RLC which is involved in the Platelet activation."} {"text":"The drug InChI=1S\/C6H9N3O2\/c7-5(6(10)11)1-4-2-8-3-9-4\/h2-3,5H,1,7H2,(H,8,9)(H,10,11)\/t5-\/m0\/s1 targets the protein Histidase which is involved in the Histidine catabolism."}", "/scratch/micpie/export/drug_protein_pathway/valid_0-2.jsonl": "{"text":"User: Can you come up with one example for a protein that binds the drug CN1C=NC2=C1C(=O)N(C)C(=O)N2C?\nAssistant: Yes, the drug CN1C=NC2=C1C(=O)N(C)C(=O)N2C targets the protein PDE6C.\nUser: Can you tell me more about protein PDE6C?\nAssistant: Yes, the protein PDE6C is involved in the Metabolic pathways."} {"text":"User: Can you give me an example for a protein that binds the drug Tiagabine?\nAssistant: Of course, the drug Tiagabine targets the protein SLC6A1.\nUser: Can you tell me more about protein SLC6A1?\nAssistant: Yes, the protein SLC6A1 is involved in the Reuptake of GABA."}", "/scratch/micpie/export/drug_protein_pathway/valid_0-1.jsonl": "{"text":"The drug CNC=NC=C5C=O)NC)C=O)N6C targets the protein cGMP phosphodiesterase 6C. The protein cGMP phosphodiesterase 6C is involved in the Metabolic pathways."} {"text":"The drug Tiagabine targets the protein GAT-1. 
The protein GAT-1 is involved in the Reuptake of GABA."}", "/scratch/micpie/export/drug_protein_pathway/train_0-2.jsonl": "{"text":"User: Can you come up with an example for a protein that binds the drug 4-[4-(2,5-DIOXO-PYRROLIDIN-1-YL)-PHENYLAMINO]-4-HYDROXY-BUTYRIC ACID?\nAssistant: Yes, the drug 4-[4-(2,5-DIOXO-PYRROLIDIN-1-YL)-PHENYLAMINO]-4-HYDROXY-BUTYRIC ACID targets the protein MYL12A.\nUser: Can you tell me more details about protein MYL12A?\nAssistant: Of course, the protein MYL12A is involved in the Platelet activation."} {"text":"User: Can you give me an example for a protein that binds the drug InChI=1S\/C6H9N3O2\/c7-5(6(10)11)1-4-2-8-3-9-4\/h2-3,5H,1,7H2,(H,8,9)(H,10,11)\/t5-\/m0\/s1?\nAssistant: Yes, of course, the drug InChI=1S\/C6H9N3O2\/c7-5(6(10)11)1-4-2-8-3-9-4\/h2-3,5H,1,7H2,(H,8,9)(H,10,11)\/t5-\/m0\/s1 targets the protein HAL.\nUser: Can you tell me more details about protein HAL?\nAssistant: Yes, of course, the protein HAL is involved in the Histidine catabolism."}", "/scratch/micpie/export/drug_protein_pathway/train_0-1.jsonl": "{"text":"The drug 4-[4-(2,5-DIOXO-PYRROLIDIN-1-YL)-PHENYLAMINO]-4-HYDROXY-BUTYRIC ACID targets the protein Myosin RLC. The protein Myosin RLC is involved in the Platelet activation."} {"text":"The drug N[C@@H]CC=CNC=N5))))))CO)=O targets the protein Histidine ammonia-lyase. 
The protein Histidine ammonia-lyase is involved in the Histidine catabolism."}", "/scratch/micpie/export/bio_ner/valid_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the text below.\nText: Sometimes they are perceived like the tinnitus in patient's own head..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: head,66,70,Anatomy"} {"text":"Task: Please carry out the named entity recognition (NER) task for the text below.\nText: The H2 gas was provided in the reactors R1 and R2 by using a peristaltic pump operating at a flow rate of 1.7mL\/min and was injected through Al2O3 ceramic membranes having pore size of 1.2m..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: H2 gas,4,10,treatment"}", "/scratch/micpie/export/bio_ner/test_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the text below.\nText: In the 1950s doctors prescribed DES to pregnant women to prevent miscarriage and premature births and to produce \"bigger and stronger babies\" even though DES had been shown to cause damage to reproductive tissues in animals (Dinusson et al. 1948; Dunn and Green 1963; Takasugi and Bern 1964)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: tissues,207,214,Anatomy"} {"text":"Task: Please carry out the named entity recognition (NER) task for the text below.\nText: Hoffman's tobacco hornworm diet was used to manipulate the G. 
mellonella gut microbiota..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: Hoffman ' s tobacco hornworm diet,0,33,treatment"}", "/scratch/micpie/export/bio_ner/train_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the text below.\nText: The secondary antibody used for both BrdU and nucleoli staining was a rhodamine (tetra-methyl)-conjugated goat anti-mouse antibody (T-2762) from Molecular Probe (Eugene, OR)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: nucleoli,46,54,Anatomy"} {"text":"Task: Please carry out the named entity recognition task for the text below.\nText: On day 50 (denoted by the asterisk) of the 75-day synbiotic supplementation period the majority juveniles stools (n = 5) were observed to return to normal..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: synbiotic supplementation,53,78,treatment"}", "/scratch/micpie/export/compound_chebi/test_0-1.jsonl": "{"text":"Task: Please generate a compound canonical SMILES that is a sulfasalazine.\nResult: O=C(O)c1cc(\/N=N\/c2ccc(S(=O)(=O)Nc3ccccn3)cc2)ccc1O"} {"text":"Task: Please create a compound DeepSMILES that is a 2-aminosulfonyl-benzoic acid methyl ester.\nResult: COC=O)cccccc6SN)=O)=O"}", "/scratch/micpie/export/compound_chebi/valid_0-0.jsonl": "{"text":"The compound canonical SMILES Fc1ccccc1COc1ccc(C(=S)N2CCCC2)cc1 is a [4-[(2-fluorophenyl)methoxy]phenyl]-(1-pyrrolidinyl)methanethione."} {"text":"The compound canonical SMILES 
CC[C@H](C)[C@H](NC(=O)[C@H](Cc1ccc(O)cc1)NC(=O)[C@@H]1CCCN1C(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@@H]1CCCN1C(=O)[C@H](CCCCN)NC(=O)[C@H](CC(N)=O)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](Cc1ccc(O)cc1)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H]1CCC(=O)N1)C(=O)N[C@@H](CC(C)C)C(=O)O is a neurotensin."}", "/scratch/micpie/export/compound_chebi/test_0-0.jsonl": "{"text":"The compound SELFIES [O][=C][Branch1][C][O][C][=C][C][Branch2][Ring1][=N][\/N][=N][\/C][=C][C][=C][Branch1][P][S][=Branch1][C][=O][=Branch1][C][=O][N][C][=C][C][=C][C][=N][Ring1][=Branch1][C][=C][Ring1][S][=C][C][=C][Ring2][Ring1][Branch2][O] is a sulfasalazine."} {"text":"The compound SELFIES [C][O][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][S][Branch1][C][N][=Branch1][C][=O][=O] is a 2-aminosulfonyl-benzoic acid methyl ester."}", "/scratch/micpie/export/compound_chebi/train_0-0.jsonl": "{"text":"The compound SELFIES [N][C][C][C][C][C][C][N] is a hexane-1,6-diamine."} {"text":"The compound SELFIES [C][C][C][C][C][C][C][C][\/C][=C][\\C][C][C][C][C][C][C][C][Branch1][C][N][=O] is a oleamide."}", "/scratch/micpie/export/compound_chebi/valid_0-1.jsonl": "{"text":"Task: Please generate a compound canonical SMILES that is a [4-[(2-fluorophenyl)methoxy]phenyl]-(1-pyrrolidinyl)methanethione.\nResult: Fc1ccccc1COc1ccc(C(=S)N2CCCC2)cc1"} {"text":"Task: Please create a compound SELFIES that is a neurotensin.\nResult: 
[C][C][C@H1][Branch1][C][C][C@H1][Branch2][N][Branch2][N][C][=Branch1][C][=O][C@H1][Branch1][=N][C][C][=C][C][=C][Branch1][C][O][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C@@H1][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][C@H1][Branch1][#Branch2][C][C][C][N][C][=Branch1][C][=N][N][N][C][=Branch1][C][=O][C@H1][Branch1][#Branch2][C][C][C][N][C][=Branch1][C][=N][N][N][C][=Branch1][C][=O][C@@H1][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][C][C][C][N][N][C][=Branch1][C][=O][C@H1][Branch1][#Branch1][C][C][Branch1][C][N][=O][N][C][=Branch1][C][=O][C@H1][Branch1][Branch2][C][C][C][=Branch1][C][=O][O][N][C][=Branch1][C][=O][C@H1][Branch1][=N][C][C][=C][C][=C][Branch1][C][O][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C@H1][Branch1][#Branch1][C][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@@H1][C][C][C][=Branch1][C][=O][N][Ring1][=Branch1][C][=Branch1][C][=O][N][C@@H1][Branch1][#Branch1][C][C][Branch1][C][C][C][C][=Branch1][C][=O][O]"}", "/scratch/micpie/export/compound_chebi/train_0-1.jsonl": "{"text":"Task: Please generate a canonical SMILES that is a hexane-1,6-diamine.\nResult: NCCCCCCN"} {"text":"Task: Please create a SMILES that is a oleamide.\nResult: CCCCCCCC\/C=C\\CCCCCCCC(N)=O"}", "/scratch/micpie/export/bio_ner_34/train_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: Antibodies used in this study were as follows: rabbit anti-Er81 [ 14], rabbit anti-Pea3 [ 14], rabbit anti-PV [ 14], rabbit anti-eGFP (Molecular Probes, Eugene, Oregon, United States), rabbit anti-Calbindin, rabbit anti-Calretinin (Swant, Bellinzona, Switzerland), rabbit anti-CGRP (Chemicon, Temecula, California, United States), rabbit anti-vGlut1 (Synaptic Systems, Goettingen, Germany), rabbit anti-Brn3a (gift from E. Turner), rabbit anti-TrkA and-p75 (gift from L. F. 
Reichardt), rabbit anti-Runx3 (Kramer and Arber, unpublished reagent), rabbit anti-Rhodamine (Molecular Probes), mouse anti-neurofilament (American Type Culture Collection, Manassas, Virginia, United States), sheep anti-eGFP (Biogenesis, Poole, United Kingdom), goat anti-LacZ [ 14], goat anti-TrkC (gift from L. F. Reichardt), and guinea pig anti-Isl1 [ 14]..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: Antibodies,0,10,GO_ontology\nrabbit,47,53,Organism\nEr81,61,65,Gene_or_geneproduct\nrabbit,73,79,Organism\nPea3,87,91,Gene_or_geneproduct\nrabbit,99,105,Organism\nPV,113,115,Gene_or_geneproduct\nrabbit,123,129,Organism\nMolecular,144,153,Chemical\nrabbit,194,200,Organism\nCalbindin,208,217,Gene_or_geneproduct\nrabbit,219,225,Organism\nCalretinin,233,243,Gene_or_geneproduct\nrabbit,279,285,Organism\nrabbit,348,354,Organism\nvGlut1,362,368,Gene_or_geneproduct\nrabbit,411,417,Organism\nBrn3a,425,430,Gene_or_geneproduct\nrabbit,455,461,Organism\nTrkA,469,473,Gene_or_geneproduct\np75,480,483,Gene_or_geneproduct\nrabbit,514,520,Organism\nRunx3,528,533,Gene_or_geneproduct\nreagent,566,573,Chemical\nrabbit,576,582,Organism\nMolecular,602,611,Chemical\nmouse,621,626,Organism\nneurofilament,634,647,GO_ontology\nsheep,720,725,Organism\ngoat,776,780,Organism\ngoat,800,804,Organism\nTrkC,812,816,Gene_or_geneproduct\nguinea pig,851,861,Organism\nIsl1,869,873,Gene_or_geneproduct"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Antibodies used in this study were as follows: rabbit anti-Er81 [ 14], rabbit anti-Pea3 [ 14], rabbit anti-PV [ 14], rabbit anti-eGFP (Molecular Probes, Eugene, Oregon, United States), rabbit anti-Calbindin, rabbit anti-Calretinin (Swant, Bellinzona, Switzerland), rabbit anti-CGRP (Chemicon, Temecula, California, United States), rabbit anti-vGlut1 (Synaptic Systems, Goettingen, Germany), rabbit anti-Brn3a (gift 
from E. Turner), rabbit anti-TrkA and-p75 (gift from L. F. Reichardt), rabbit anti-Runx3 (Kramer and Arber, unpublished reagent), rabbit anti-Rhodamine (Molecular Probes), mouse anti-neurofilament (American Type Culture Collection, Manassas, Virginia, United States), sheep anti-eGFP (Biogenesis, Poole, United Kingdom), goat anti-LacZ [ 14], goat anti-TrkC (gift from L. F. Reichardt), and guinea pig anti-Isl1 [ 14]..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: Antibodies,0,10,GO_ontology\nrabbit,47,53,Organism\nEr81,61,65,Gene_or_geneproduct\nrabbit,73,79,Organism\nPea3,87,91,Gene_or_geneproduct\nrabbit,99,105,Organism\nPV,113,115,Gene_or_geneproduct\nrabbit,123,129,Organism\nMolecular,144,153,Chemical\nrabbit,194,200,Organism\nCalbindin,208,217,Gene_or_geneproduct\nrabbit,219,225,Organism\nCalretinin,233,243,Gene_or_geneproduct\nrabbit,279,285,Organism\nrabbit,348,354,Organism\nvGlut1,362,368,Gene_or_geneproduct\nrabbit,411,417,Organism\nBrn3a,425,430,Gene_or_geneproduct\nrabbit,455,461,Organism\nTrkA,469,473,Gene_or_geneproduct\np75,480,483,Gene_or_geneproduct\nrabbit,514,520,Organism\nRunx3,528,533,Gene_or_geneproduct\nreagent,566,573,Chemical\nrabbit,576,582,Organism\nMolecular,602,611,Chemical\nmouse,621,626,Organism\nneurofilament,634,647,GO_ontology\nsheep,720,725,Organism\ngoat,776,780,Organism\ngoat,800,804,Organism\nTrkC,812,816,Gene_or_geneproduct\nguinea pig,851,861,Organism\nIsl1,869,873,Gene_or_geneproduct"}", "/scratch/micpie/export/MUV_846/valid_0-0.jsonl": "{"text":"The compound with the SELFIES representation of ['[O][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][N][=C][N][Ring1][#Branch2][C][C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][N+1][=Branch1][C][=O][O-1]'] is not an inhibitor of factor XIa (FXIa)."} {"text":"The chemical compound with the DeepSMILES CcccNC=O)CNC=O)CCcccco5)))))Scccccc6%11))))))))))))))no5 is not an 
inhibitor of factor XIa (FXIa)."}", "/scratch/micpie/export/MUV_846/test_0-0.jsonl": "{"text":"The molecular species with the DeepSMILES O=CCNCcccccc6)))))))S=O)=O)cccccc6)))))))))N\/N=C\/cccccc6)OCO5 is not an inhibitor of factor XIa (FXIa)."} {"text":"The molecular species with the SELFIES representation of ['[C][N][Branch1][C][C][C][=C][C][=N][C][S][C][=C][Branch1][=C][S][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=C][N][=C][Ring1][#C][C][Ring2][Ring1][=Branch1][=Ring2][Ring1][C]'] is not an inhibitor of factor XIa (FXIa)."}", "/scratch/micpie/export/MUV_846/train_0-0.jsonl": "{"text":"The chemical with the SELFIES representation of ['[C][C][O][C][=Branch1][C][=O][C][Branch1][=N][C][C][=C][C][=C][Branch1][C][O][C][=C][Ring1][#Branch1][N][C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][C][C][=Branch1][C][=O][N][Branch1][#Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=Branch1][C][=O][C][Ring1][=C][Ring2][Ring1][=C]'] is not an inhibitor of factor XIa (FXIa)."} {"text":"The compound with the canonical SMILES CCN(CC)c1ccc(\/C=C2\/C(=O)ON=C2C)cc1 is not an inhibitor of factor XIa (FXIa)."}", "/scratch/micpie/export/bio_ner_21/valid_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: DNA from 36 HNSCC tumors had been previously sequenced and found to carry 65 mutations in 5 genes (TP53, NOTCH1, CDKN2A, CASP8, PTEN) from a panel of the 14 most frequently mutated genes (CASP8, CDKN2A, FAT1, FBXW7, HRAS, IRF6, MLL2, NOTCH1, NSD1, PTEN, PIK3CA, RB1, TP53, TP63) (unpublished results)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: 
HNSCC,12,17,Disease\/Disorder\ntumors,18,24,Disease\/Disorder\nTP53,100,104,Gene\/Protein\nNOTCH1,106,112,Gene\/Protein\nCDKN2A,114,120,Gene\/Protein\nCASP8,122,127,Gene\/Protein\nPTEN,129,133,Gene\/Protein\nCASP8,190,195,Gene\/Protein\nCDKN2A,197,203,Gene\/Protein\nFAT1,205,209,Gene\/Protein\nFBXW7,211,216,Gene\/Protein\nHRAS,218,222,Gene\/Protein\nIRF6,224,228,Gene\/Protein\nMLL2,230,234,Gene\/Protein\nNOTCH1,236,242,Gene\/Protein\nNSD1,244,248,Gene\/Protein\nPTEN,250,254,Gene\/Protein\nPIK3CA,256,262,Gene\/Protein\nRB1,264,267,Gene\/Protein\nTP53,269,273,Gene\/Protein\nTP63,275,279,Gene\/Protein"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Wall Identifier Formation Sample days after fracturing Bacteria isolated Geochemical analyses performed Microbial analyses performed Figures present Utica-3 Utica-Point Pleasant 38-460 Marinobacter sp. UTICA-S1B6 N-NH3, TN, S2, SO42 n\/a Figure 2 Utica-6 Utica-Point Pleasant 38-460 n\/a N-NH3, TN, S2, SO42 n\/a Figure 2 Utica-7 Utica-Point Pleasant 38-460 n\/a N-NH3, TN, S2, SO42 n\/a Figure 2 Utica 8 Utica-Point Pleasant 38-460 Arcobacter sp. UTICA-S4D1 N-NH3, TN, S2, SO42 n\/a Figure 2 Marcellus-1 Marcellus 4-328 n\/a n\/a 16S EMIRGE Figure 3 Marcellus-4 Marcellus 24-485 Arcobacter sp. 
MARC-MIP3H16 NH3, TN, S2, SO42, CI, NPOC, CO2 Metagenomics, 16S EMIRGE Figures 2, 3, 5 Marcellus-5 Marcellus 35 496 n\/a NH3, TN, S2, SO42 n\/a Figure 2 Sample Collection Produced fluid samples were collected from six hydraulically fractured natural-gas wells in the northern Appalachian Basin: four from the Utica-Point Pleasant Formation (Utica-3, Utica-6, Utica-7, and Utica-8) and two from the Marcellus Shale (Marcellus-4 and Marcellus-5)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: Utica - 3,149,158,site\nUtica - Point Pleasant,159,181,site\nUtica - 6,258,267,site\nUtica - Point Pleasant,268,290,site\nUtica - 7,343,352,site\nUtica - Point Pleasant,353,375,site\nUtica - Point Pleasant,436,458,site\nMarcellus - 1,533,546,site\nMarcellus,547,556,site\nMarcellus - 4,597,610,site\nMarcellus,611,620,site\nMarcellus - 5,734,747,site\nMarcellus,748,757,site\nUtica - Point Pleasant,962,984,site\nUtica - 3,997,1006,site\nUtica - 6,1008,1017,site\nUtica - 7,1019,1028,site\nUtica - 8,1034,1043,site\nMarcellus Shale,1062,1077,site\nMarcellus - 4,1080,1093,site\nMarcellus - 5,1098,1111,site"}", "/scratch/micpie/export/bio_ner_21/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: SREBP activation decreased as ECs migration slowed; (2) Coincidental with SREBP activation, mRNA expression of its target genes such as low density lipoprotein receptor, HMG-CoA reductase, and fatty acid synthase also increased in migrating ECs population as detected by real-time PCR; (3) Migration induced SREBP activation in ECs was inhibited by SREBP-acting protein RNAi and pharmacologically by 25-hydroxycholesterol; (4) Inhibition of SREBP led to decreased ECs migration in various models; (5) Cells genetically deficient in SREBP-acting protein, S1P, or S2P, phenotypically exhibited impaired migration; (6) 
SREBP inhibition in ECs suppressed the activity of small GTPase Cdc42, a key molecule for ECs motility..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: SREBP,0,5,Gene_or_gene_product\nECs,30,33,Cell\nSREBP,75,80,Gene_or_gene_product\nlow density lipoprotein receptor,137,169,Gene_or_gene_product\nHMG - CoA reductase,171,190,Gene_or_gene_product\nfatty acid synthase,196,215,Gene_or_gene_product\nECs,244,247,Cell\nSREBP,314,319,Gene_or_gene_product\nECs,334,337,Cell\nSREBP - acting protein,355,377,Gene_or_gene_product\n25 - hydroxycholesterol,408,431,Chemical\nSREBP,452,457,Gene_or_gene_product\nECs,475,478,Cell\nCells,513,518,Cell\nSREBP - acting protein,544,566,Gene_or_gene_product\nS1P,568,571,Gene_or_gene_product\nS2P,576,579,Gene_or_gene_product\nSREBP,631,636,Gene_or_gene_product\nECs,651,654,Cell\nCdc42,695,700,Gene_or_gene_product\nECs,721,724,Cell"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Five of these were caused by mutations predicted to produce a truncated protein: (i) a sporadic case showed a 32 bp deletion in exon 11, and a mutant mRNA without exon 11 was produced; the normal exon 10 was also spliced out; (ii) a sporadic case had a 1 bp deletion in exon 12 (1634delT); (iii) a TSC2-linked mother and daughter pair had a G--> T transversion in exon 23 (G2715T) introducing a cryptic splice site causing a 29 bp truncation of mRNA from exon 23; (iv) a sporadic case showed a 2 bp deletion in exon 36; (v) a sporadic case showed a 1 bp insertion disrupting the donor splice site of exon 37 (5007+ 2insA), resulting in the use of an upstream exonic cryptic splice site to cause a 29 bp truncation of mRNA from exon 37..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: truncated protein,62,79,Gene\/Protein\n32 
bp deletion,111,125,Gene\/Protein\nexon 11,129,136,Gene\/Protein\nmutant mRNA,144,155,Gene\/Protein\nexon 11,164,171,Gene\/Protein\n1 bp deletion,255,268,Gene\/Protein\nexon 12,272,279,Gene\/Protein\n1634delT,282,290,Gene\/Protein\nTSC2 - linked mother and daughter pair,302,340,Gene\/Protein\nexon 23,373,380,Gene\/Protein\nG2715T,383,389,Gene\/Protein\nexon 23,465,472,Gene\/Protein\n2 bp deletion,505,518,Gene\/Protein\nexon 36,522,529,Gene\/Protein\n1 bp insertion,561,575,Gene\/Protein\nexon 37,612,619,Gene\/Protein\n5007 + 2insA,622,634,Gene\/Protein\nupstream exonic cryptic splice site,664,699,Gene\/Protein\n29 bp truncation,711,727,Gene\/Protein\nmRNA,731,735,Gene\/Protein\nexon 37,741,748,Gene\/Protein"}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/test_0-10.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that is not modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: This is a molecule that is not modulating the M1 muscarinic receptor activity in a negative way: O(c1cc2CC(n3nnnc3c2cc1OC)(C)C)C"} {"text":"User: I'm searching for the SELFIES of a molecule that is not modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: This is a molecule that is not modulating the M1 muscarinic receptor activity in a negative way: [O][Branch2][Ring1][=Branch1][C][C][=Branch1][C][=O][C][=C][Branch1][N][N][Branch1][Branch2][C][=Branch1][Ring2][=C][Ring1][Branch1][C][C][C][C][=C][N][=C][C][=C][C][Ring1][=Branch1][=C][C][=C][Ring1][#Branch2]"}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/valid_0-8.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C20H25N5O2\/c1-3-12(2)22-20(26)16-17-19(24-15-9-5-4-8-14(15)23-17)25(18(16)21)11-13-7-6-10-27-13\/h4-5,8-9,12-13H,3,6-7,10-11,21H2,1-2H3,(H,22,26) modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: No, it is not modulating the M1 muscarinic receptor activity in a negative 
way."} {"text":"User: Is the molecule with the DeepSMILES OCCCC5)))Cnnnnc5CNCCccC6)cccc6)))))))))cccc[nH]c6=O)))cccOC))c6 modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: No, it is not modulating the M1 muscarinic receptor activity in a negative way."}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/train_0-8.jsonl": "{"text":"User: Is the molecule with the canonical SMILES COC(=O)CSc1nc2c(cc1C#N)C(=O)CCC2 modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: No, it is not modulating the M1 muscarinic receptor activity in a negative way."} {"text":"User: Is the molecule with the SMILES Clc1n(c2c(n1)cccc2)CCC#N modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: No, it is not modulating the M1 muscarinic receptor activity in a negative way."}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a negative way.\nDeepSMILES: OcccCCnnnnc5c9cc%13OC))))))))))C)C))))))C\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not modulating the M1 muscarinic receptor activity in a negative way."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a negative way.\nSELFIES: [O][Branch2][Ring1][=Branch1][C][C][=Branch1][C][=O][C][=C][Branch1][N][N][Branch1][Branch2][C][=Branch1][Ring2][=C][Ring1][Branch1][C][C][C][C][=C][N][=C][C][=C][C][Ring1][=Branch1][=C][C][=C][Ring1][#Branch2]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not modulating the M1 muscarinic receptor activity in a negative way."}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/valid_0-9.jsonl": "{"text":"User: Can you generate the DeepSMILES 
of a molecule that is not modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: Sure, here you go: OCCCC5)))Cncnccnc6cc9N))C=O)NCCC))C)))))))cccc6"} {"text":"User: Can you give me the SMILES of a molecule that is not modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: Yes, here you go: O1C(CCC1)Cn1nnnc1C(N1CCc2c(C1)cccc2)c1cc2c([nH]c1=O)ccc(OC)c2"}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/test_0-1.jsonl": "{"text":"Based on the canonical SMILES COc1cc2c(cc1OC)-c1nnnn1C(C)(C)C2, the molecule shows no negative modulation of the M1 muscarinic receptor activity."} {"text":"Based on the canonical SMILES Cc1cc(C(=O)COc2cccc3cccnc23)c(C)n1C, the molecule shows no negative modulation of the M1 muscarinic receptor activity."}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/valid_0-0.jsonl": "{"text":"The molecule with the SELFIES representation of [O][C][Branch1][=Branch1][C][C][C][Ring1][Branch1][C][N][C][=N][C][=C][Branch2][Ring1][#Branch1][N][=C][Ring1][=Branch1][C][=Branch1][Branch1][=C][Ring1][=Branch2][N][C][=Branch1][C][=O][N][C][Branch1][Ring1][C][C][C][C][=C][C][=C][Ring2][Ring1][C] exhibits no negative modulation of the M1 muscarinic receptor activity."} {"text":"The molecule with the SELFIES [O][C][Branch1][=Branch1][C][C][C][Ring1][Branch1][C][N][N][=N][N][=C][Ring1][Branch1][C][Branch1][P][N][C][C][C][=C][Branch1][Ring2][C][Ring1][=Branch1][C][=C][C][=C][Ring1][#Branch1][C][=C][C][=C][Branch1][=Branch1][NH1][C][Ring1][=Branch1][=O][C][=C][C][Branch1][Ring1][O][C][=C][Ring1][O] displays no negative modulation of the M1 muscarinic receptor activity."}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/test_0-2.jsonl": "{"text":"The DeepSMILES OcccCCnnnnc5c9cc%13OC))))))))))C)C))))))C represents a molecule that displays no negative modulation of the M1 muscarinic receptor activity."} {"text":"The InChI 
InChI=1S\/C18H18N2O2\/c1-12-10-15(13(2)20(12)3)16(21)11-22-17-8-4-6-14-7-5-9-19-18(14)17\/h4-10H,11H2,1-3H3 is from a molecule that shows no negative modulation of the M1 muscarinic receptor activity."}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/valid_0-10.jsonl": "{"text":"User: I'm searching for the DeepSMILES of a molecule that is not modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: This is a molecule that is not modulating the M1 muscarinic receptor activity in a negative way: OCCCC5)))Cncnccnc6cc9N))C=O)NCCC))C)))))))cccc6"} {"text":"User: I'm looking for the InChI of a molecule that is not modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: This is a molecule that is not modulating the M1 muscarinic receptor activity in a negative way: InChI=1S\/C26H28N6O3\/c1-34-20-8-9-23-19(13-20)14-22(26(33)27-23)24(31-11-10-17-5-2-3-6-18(17)15-31)25-28-29-30-32(25)16-21-7-4-12-35-21\/h2-3,5-6,8-9,13-14,21,24H,4,7,10-12,15-16H2,1H3,(H,27,33)"}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/train_0-6.jsonl": "{"text":"Task: Please give me a molecule canonical SMILES based on the text description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a negative way.\nResult: COC(=O)CSc1nc2c(cc1C#N)C(=O)CCC2"} {"text":"Task: Please generate a InChI based on the text description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a negative way.\nResult: InChI=1S\/C10H8ClN3\/c11-10-13-8-4-1-2-5-9(8)14(10)7-3-6-12\/h1-2,4-5H,3,7H2"}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/valid_0-6.jsonl": "{"text":"Task: Please generate a molecule canonical SMILES based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a negative way.\nResult: CCC(C)NC(=O)c1c(N)n(CC2CCCO2)c2nc3ccccc3nc12"} {"text":"Task: Please generate a InChI based on the 
description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a negative way.\nResult: InChI=1S\/C26H28N6O3\/c1-34-20-8-9-23-19(13-20)14-22(26(33)27-23)24(31-11-10-17-5-2-3-6-18(17)15-31)25-28-29-30-32(25)16-21-7-4-12-35-21\/h2-3,5-6,8-9,13-14,21,24H,4,7,10-12,15-16H2,1H3,(H,27,33)"}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/test_0-9.jsonl": "{"text":"User: Can you generate the SMILES of a molecule that is not modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: Yes, here you go: O(c1cc2CC(n3nnnc3c2cc1OC)(C)C)C"} {"text":"User: Can you create the DeepSMILES of a molecule that is not modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: Sure, here you go: OCC=O)ccncc5)C))C))C)))))ccncccc6ccc%10"}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/test_0-0.jsonl": "{"text":"The molecule with the DeepSMILES OcccCCnnnnc5c9cc%13OC))))))))))C)C))))))C shows no negative modulation of the M1 muscarinic receptor activity."} {"text":"The molecule with the SELFIES representation of [O][Branch2][Ring1][=Branch1][C][C][=Branch1][C][=O][C][=C][Branch1][N][N][Branch1][Branch2][C][=Branch1][Ring2][=C][Ring1][Branch1][C][C][C][C][=C][N][=C][C][=C][C][Ring1][=Branch1][=C][C][=C][Ring1][#Branch2] displays no negative modulation of the M1 muscarinic receptor activity."}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/valid_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the SMILES O1C(CCC1)Cn1c2nc3c(nc2c(c1N)C(=O)NC(CC)C)cccc3 is modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: No, this molecule is not modulating the M1 muscarinic receptor activity in a negative way."} {"text":"User: Can you tell me if the molecule with the SELFIES 
[O][C][Branch1][=Branch1][C][C][C][Ring1][Branch1][C][N][N][=N][N][=C][Ring1][Branch1][C][Branch1][P][N][C][C][C][=C][Branch1][Ring2][C][Ring1][=Branch1][C][=C][C][=C][Ring1][#Branch1][C][=C][C][=C][Branch1][=Branch1][NH1][C][Ring1][=Branch1][=O][C][=C][C][Branch1][Ring1][O][C][=C][Ring1][O] is modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: No, this molecule is not modulating the M1 muscarinic receptor activity in a negative way."}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/test_0-3.jsonl": "{"text":"The molecule DeepSMILES OcccCCnnnnc5c9cc%13OC))))))))))C)C))))))C is not modulating the M1 muscarinic receptor activity in a negative way."} {"text":"The molecule DeepSMILES OCC=O)ccncc5)C))C))C)))))ccncccc6ccc%10 is not modulating the M1 muscarinic receptor activity in a negative way."}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/valid_0-11.jsonl": "{"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be modulating the M1 muscarinic receptor activity in a negative way.\nAssistant: Ok, this DeepSMILES is not modulating the M1 muscarinic receptor activity in a negative way: OCCCC5)))Cncnccnc6cc9N))C=O)NCCC))C)))))))cccc6"} {"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should not be modulating the M1 muscarinic receptor activity in a negative way.\nAssistant: Got it, this SELFIES is not modulating the M1 muscarinic receptor activity in a negative way: [O][C][Branch1][=Branch1][C][C][C][Ring1][Branch1][C][N][N][=N][N][=C][Ring1][Branch1][C][Branch1][P][N][C][C][C][=C][Branch1][Ring2][C][Ring1][=Branch1][C][=C][C][=C][Ring1][#Branch1][C][=C][C][=C][Branch1][=Branch1][NH1][C][Ring1][=Branch1][=O][C][=C][C][Branch1][Ring1][O][C][=C][Ring1][O]"}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/train_0-0.jsonl": "{"text":"The molecule with the SMILES representation of S(c1nc2CCCC(=O)c2cc1C#N)CC(OC)=O displays no negative modulation of the M1 muscarinic receptor activity."} {"text":"The molecule with the SMILES Clc1n(c2c(n1)cccc2)CCC#N displays no negative modulation of the M1 muscarinic receptor activity."}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/test_0-6.jsonl": "{"text":"Task: Please create a molecule canonical SMILES based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a negative way.\nResult: COc1cc2c(cc1OC)-c1nnnn1C(C)(C)C2"} {"text":"Task: Please create a molecule SMILES based on the text description below.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a negative way.\nResult: O(CC(=O)c1c(n(c(c1)C)C)C)c1c2ncccc2ccc1"}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/train_0-10.jsonl": "{"text":"User: I'm looking for the InChI of a molecule that is not modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: This is a molecule that is not modulating the M1 muscarinic receptor activity in a negative way: InChI=1S\/C13H12N2O3S\/c1-18-12(17)7-19-13-8(6-14)5-9-10(15-13)3-2-4-11(9)16\/h5H,2-4,7H2,1H3"} {"text":"User: I'm searching for the InChI of a molecule that is not modulating the M1 muscarinic receptor activity in a negative 
way?\nAssistant: This is a molecule that is not modulating the M1 muscarinic receptor activity in a negative way: InChI=1S\/C10H8ClN3\/c11-10-13-8-4-1-2-5-9(8)14(10)7-3-6-12\/h1-2,4-5H,3,7H2"}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/train_0-3.jsonl": "{"text":"The DeepSMILES ScncCCCC=O)c6cc%10C#N))))))))))))CCOC))=O is not modulating the M1 muscarinic receptor activity in a negative way."} {"text":"The SELFIES [Cl][C][N][Branch1][=C][C][=C][Branch1][Ring2][N][=Ring1][Branch1][C][=C][C][=C][Ring1][#Branch1][C][C][C][#N] is not modulating the M1 muscarinic receptor activity in a negative way."}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/train_0-12.jsonl": "{"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should not be modulating the M1 muscarinic receptor activity in a negative way.\nAssistant: Ok, this SELFIES is not modulating the M1 muscarinic receptor activity in a negative way: [S][Branch2][Ring1][Ring2][C][=N][C][C][C][C][C][=Branch1][C][=O][C][=Ring1][#Branch1][C][=C][Ring1][O][C][#N][C][C][Branch1][Ring1][O][C][=O]"} {"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be modulating the M1 muscarinic receptor activity in a negative way.\nAssistant: Got it, this SMILES is not modulating the M1 muscarinic receptor activity in a negative way: Clc1n(c2c(n1)cccc2)CCC#N"}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not modulating the M1 muscarinic receptor activity in a negative way?\nConstraint: You must select none, one or more options from A, B, C, or D without using any additional words.\nOptions:\nA) Clc1c(c(c(S(=O)(=O)NC2CCCCC2)nc1C)C#N)C\nB) S(=O)(=O)(NCc1ncccc1)c1cc2oc(=O)n(c2cc1)CC(OCC)=O\nC) O=c1n(c(NC(=O)c2cc(OC)ccc2)cc(=O)n1C)C\nD) O(c1cc2CC(n3nnnc3c2cc1OC)(C)C)C\nAnswer: A, B, C, D"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not modulating the M1 muscarinic receptor activity in a negative way?\nConstraint: You must select none, one or more options from A, B, C, or D without using any other words.\nOptions:\nA) OCC=O)ccncc5)C))C))C)))))ccncccc6ccc%10\nB) sncc[nH]cnc6=O)))))c5C=O)NccccNCCCCC6)))COCC)))=O)))))cc6\nC) scncccc6cN)c9C=O)NCcoccc5))))))))))))ccOC))cOC))c6\nD) ClccccCC=O)Ncnnnn5))CCC)))))))))cc6\nAnswer: A, B, C, D"}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/valid_0-2.jsonl": "{"text":"The canonical SMILES CCC(C)NC(=O)c1c(N)n(CC2CCCO2)c2nc3ccccc3nc12 is from a molecule that displays no negative modulation of the M1 muscarinic receptor activity."} {"text":"The SELFIES [O][C][Branch1][=Branch1][C][C][C][Ring1][Branch1][C][N][N][=N][N][=C][Ring1][Branch1][C][Branch1][P][N][C][C][C][=C][Branch1][Ring2][C][Ring1][=Branch1][C][=C][C][=C][Ring1][#Branch1][C][=C][C][=C][Branch1][=Branch1][NH1][C][Ring1][=Branch1][=O][C][=C][C][Branch1][Ring1][O][C][=C][Ring1][O] represents a molecule that exhibits no negative modulation of the M1 muscarinic 
receptor activity."}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/valid_0-1.jsonl": "{"text":"Based on the canonical SMILES CCC(C)NC(=O)c1c(N)n(CC2CCCO2)c2nc3ccccc3nc12, the molecule exhibits no negative modulation of the M1 muscarinic receptor activity."} {"text":"Based on the SMILES O1C(CCC1)Cn1nnnc1C(N1CCc2c(C1)cccc2)c1cc2c([nH]c1=O)ccc(OC)c2, the molecule exhibits no negative modulation of the M1 muscarinic receptor activity."}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not modulating the M1 muscarinic receptor activity in a negative way?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any additional words.\nOptions:\n(1) O=C1N(c2c(C31N(CCCN1CCOCC1)C(=O)C(O)=C3C(=O)c1oc(cc1)C)cccc2)C\n(2) O(CC(=O)NCCC=1CCCCC1)c1nc2c(nc1)cccc2\n(3) Clc1c(c2c(CN3CCN(CC3)c3ccc(OC)cc3)cc(oc2cc1C)=O)C\n(4) O1C(CCC1)Cn1c2nc3c(nc2c(c1N)C(=O)NC(CC)C)cccc3\nAnswer: 1, 2, 3, 4"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not modulating the M1 muscarinic receptor activity in a negative way?\nConstraint: You must select none, one or more options from A, B, C, or D without using any other words.\nOptions:\nA. InChI=1S\/C16H15N3O2S\/c1-2-12(11-7-4-3-5-8-11)14(20)17-16-19-18-15(22-16)13-9-6-10-21-13\/h3-10,12H,2H2,1H3,(H,17,19,20)\nB. InChI=1S\/C26H28N6O3\/c1-34-20-8-9-23-19(13-20)14-22(26(33)27-23)24(31-11-10-17-5-2-3-6-18(17)15-31)25-28-29-30-32(25)16-21-7-4-12-35-21\/h2-3,5-6,8-9,13-14,21,24H,4,7,10-12,15-16H2,1H3,(H,27,33)\nC. InChI=1S\/C20H28N2O5\/c1-12(2)26-16(23)11-22-14-10-13(17(24)21-19(3,4)5)8-9-15(14)27-20(6,7)18(22)25\/h8-10,12H,11H2,1-7H3,(H,21,24)\nD. 
InChI=1S\/C15H16N2O2S2\/c1-11-10-16(14(18)12-4-2-8-20-12)6-7-17(11)15(19)13-5-3-9-21-13\/h2-5,8-9,11H,6-7,10H2,1H3\nAnswer: A, B, C, D"}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a negative way.\nDeepSMILES: OCCCC5)))Cncnccnc6cc9N))C=O)NCCC))C)))))))cccc6\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not modulating the M1 muscarinic receptor activity in a negative way."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a negative way.\nSELFIES: [O][C][Branch1][=Branch1][C][C][C][Ring1][Branch1][C][N][N][=N][N][=C][Ring1][Branch1][C][Branch1][P][N][C][C][C][=C][Branch1][Ring2][C][Ring1][=Branch1][C][=C][C][=C][Ring1][#Branch1][C][=C][C][=C][Branch1][=Branch1][NH1][C][Ring1][=Branch1][=O][C][=C][C][Branch1][Ring1][O][C][=C][Ring1][O]\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not modulating the M1 muscarinic receptor activity in a negative way."}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a negative way.\nMolecule DeepSMILES: OCCCC5)))Cncnccnc6cc9N))C=O)NCCC))C)))))))cccc6\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a negative way.\nSMILES: O1C(CCC1)Cn1nnnc1C(N1CCc2c(C1)cccc2)c1cc2c([nH]c1=O)ccc(OC)c2\nConstraint: Even if you are uncertain, you must pick either 
\"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a negative way.\nMolecule InChI: InChI=1S\/C13H12N2O3S\/c1-18-12(17)7-19-13-8(6-14)5-9-10(15-13)3-2-4-11(9)16\/h5H,2-4,7H2,1H3\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not modulating the M1 muscarinic receptor activity in a negative way."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a negative way.\nDeepSMILES: Clcnccn5)cccc6))))))CCC#N\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not modulating the M1 muscarinic receptor activity in a negative way."}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/valid_0-12.jsonl": "{"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be modulating the M1 muscarinic receptor activity in a negative way.\nAssistant: Ok, this DeepSMILES is not modulating the M1 muscarinic receptor activity in a negative way: OCCCC5)))Cncnccnc6cc9N))C=O)NCCC))C)))))))cccc6"} {"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be modulating the M1 muscarinic receptor activity in a negative way.\nAssistant: Understood, this DeepSMILES is not modulating the M1 muscarinic receptor activity in a negative way: OCCCC5)))Cnnnnc5CNCCccC6)cccc6)))))))))cccc[nH]c6=O)))cccOC))c6"}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/train_0-2.jsonl": "{"text":"The canonical SMILES COC(=O)CSc1nc2c(cc1C#N)C(=O)CCC2 represents a molecule that displays no negative modulation of the M1 muscarinic receptor activity."} {"text":"The canonical SMILES N#CCCn1c(Cl)nc2ccccc21 represents a molecule that shows no negative modulation of the M1 muscarinic receptor activity."}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/test_0-11.jsonl": "{"text":"User: I want to create a SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be modulating the M1 muscarinic receptor activity in a negative way.\nAssistant: Got it, here you go, this SMILES is not modulating the M1 muscarinic receptor activity in a negative way: O(c1cc2CC(n3nnnc3c2cc1OC)(C)C)C"} {"text":"User: I want to generate a InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should not be modulating the M1 muscarinic receptor activity in a negative way.\nAssistant: Ok, this InChI is not modulating the M1 muscarinic receptor activity in a negative way: InChI=1S\/C18H18N2O2\/c1-12-10-15(13(2)20(12)3)16(21)11-22-17-8-4-6-14-7-5-9-19-18(14)17\/h4-10H,11H2,1-3H3"}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/train_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the SMILES S(c1nc2CCCC(=O)c2cc1C#N)CC(OC)=O is modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: No, this molecule is not modulating the M1 muscarinic receptor activity in a negative way."} {"text":"User: Can you tell me if the molecule with the SMILES Clc1n(c2c(n1)cccc2)CCC#N is modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: No, this molecule is not modulating the M1 muscarinic receptor activity in a negative way."}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/train_0-11.jsonl": "{"text":"User: I want to create a SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be modulating the M1 muscarinic receptor activity in a negative way.\nAssistant: Got it, here you go, this SMILES is not modulating the M1 muscarinic receptor activity in a negative way: S(c1nc2CCCC(=O)c2cc1C#N)CC(OC)=O"} {"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should not be modulating the M1 muscarinic receptor activity in a negative way.\nAssistant: Got it, this DeepSMILES is not modulating the M1 muscarinic receptor activity in a negative way: Clcnccn5)cccc6))))))CCC#N"}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/train_0-1.jsonl": "{"text":"Based on the DeepSMILES representation ScncCCCC=O)c6cc%10C#N))))))))))))CCOC))=O, the molecule displays no negative modulation of the M1 muscarinic receptor activity."} {"text":"Based on the DeepSMILES representation Clcnccn5)cccc6))))))CCC#N, the molecule displays no negative modulation of the M1 muscarinic receptor activity."}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not modulating the M1 muscarinic receptor activity in a negative way?\nConstraint: You must select none, one or more options from a, b, c, or d without using any additional words.\nOptions:\na: o1c(CN2C(=O)\/C(=C\\NC3CC3)C(=O)NC2=O)ccc1\nb: S(c1nc2CCCC(=O)c2cc1C#N)CC(OC)=O\nc: S(=O)(=O)(N1CC(CCC1)C(=O)N1CCN(CC1)c1c(ccc(c1)C)C)c1[nH]cnc1\nd: s1c2CCCCc2cc1C(=O)Nn1cnnc1\nAnswer: a, b, c, d"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not modulating the M1 muscarinic receptor activity in a negative way?\nConstraint: You must select none, one or more options from a or b without using any additional words.\nOptions:\n[a] InChI=1S\/C7H9F3N2O5\/c1-3(13)11-5(15)12-6(16,4(14)17-2)7(8,9)10\/h16H,1-2H3,(H2,11,12,13,15)\n[b] InChI=1S\/C10H8ClN3\/c11-10-13-8-4-1-2-5-9(8)14(10)7-3-6-12\/h1-2,4-5H,3,7H2\nAnswer: a, b"}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a negative way.\nDeepSMILES: 
ScncCCCC=O)c6cc%10C#N))))))))))))CCOC))=O\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a negative way.\nSMILES: Clc1n(c2c(n1)cccc2)CCC#N\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/test_0-7.jsonl": "{"text":"User: Can you derive if the molecule with the InChI InChI=1S\/C13H16N4O2\/c1-13(2)7-8-5-10(18-3)11(19-4)6-9(8)12-14-15-16-17(12)13\/h5-6H,7H2,1-4H3 is modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: No, this molecule is not modulating the M1 muscarinic receptor activity in a negative way."} {"text":"User: Can you estimate if the molecule with the SELFIES [O][Branch2][Ring1][=Branch1][C][C][=Branch1][C][=O][C][=C][Branch1][N][N][Branch1][Branch2][C][=Branch1][Ring2][=C][Ring1][Branch1][C][C][C][C][=C][N][=C][C][=C][C][Ring1][=Branch1][=C][C][=C][Ring1][#Branch2] is modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: No, this molecule is not modulating the M1 muscarinic receptor activity in a negative way."}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/train_0-9.jsonl": "{"text":"User: Can you create the InChI of a molecule that is not modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: Yes, I'm happy to help, here you go: InChI=1S\/C13H12N2O3S\/c1-18-12(17)7-19-13-8(6-14)5-9-10(15-13)3-2-4-11(9)16\/h5H,2-4,7H2,1H3"} {"text":"User: Can you generate the SMILES of a molecule that is not modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: Of course, here you go: Clc1n(c2c(n1)cccc2)CCC#N"}", 
"/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/valid_0-3.jsonl": "{"text":"The DeepSMILES OCCCC5)))Cncnccnc6cc9N))C=O)NCCC))C)))))))cccc6 is not modulating the M1 muscarinic receptor activity in a negative way."} {"text":"The molecule SELFIES [O][C][Branch1][=Branch1][C][C][C][Ring1][Branch1][C][N][N][=N][N][=C][Ring1][Branch1][C][Branch1][P][N][C][C][C][=C][Branch1][Ring2][C][Ring1][=Branch1][C][=C][C][=C][Ring1][#Branch1][C][=C][C][=C][Branch1][=Branch1][NH1][C][Ring1][=Branch1][=O][C][=C][C][Branch1][Ring1][O][C][=C][Ring1][O] is not modulating the M1 muscarinic receptor activity in a negative way."}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/test_0-8.jsonl": "{"text":"User: Is the molecule with the SMILES O(c1cc2CC(n3nnnc3c2cc1OC)(C)C)C modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: No, it is not modulating the M1 muscarinic receptor activity in a negative way."} {"text":"User: Is the molecule with the SMILES O(CC(=O)c1c(n(c(c1)C)C)C)c1c2ncccc2ccc1 modulating the M1 muscarinic receptor activity in a negative way?\nAssistant: No, it is not modulating the M1 muscarinic receptor activity in a negative way."}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a negative way.\nSELFIES: [O][Branch2][Ring1][N][C][=C][C][C][C][Branch1][P][N][N][=N][N][=C][Ring1][Branch1][C][=Ring1][=Branch2][C][=C][Ring1][=N][O][C][Branch1][C][C][C][C]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a negative way.\nSMILES: O(CC(=O)c1c(n(c(c1)C)C)C)c1c2ncccc2ccc1\nConstraint: Even if 
you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/m1_muscarinic_receptor_antagonists_butkiewicz/test_0-12.jsonl": "{"text":"User: I want to generate a InChI.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be modulating the M1 muscarinic receptor activity in a negative way.\nAssistant: Got it, this InChI is not modulating the M1 muscarinic receptor activity in a negative way: InChI=1S\/C13H16N4O2\/c1-13(2)7-8-5-10(18-3)11(19-4)6-9(8)12-14-15-16-17(12)13\/h5-6H,7H2,1-4H3"} {"text":"User: I want to come up with a SELFIES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be modulating the M1 muscarinic receptor activity in a negative way.\nAssistant: Understood, this SELFIES is not modulating the M1 muscarinic receptor activity in a negative way: [O][Branch2][Ring1][=Branch1][C][C][=Branch1][C][=O][C][=C][Branch1][N][N][Branch1][Branch2][C][=Branch1][Ring2][=C][Ring1][Branch1][C][C][C][C][=C][N][=C][C][=C][C][Ring1][=Branch1][=C][C][=C][Ring1][#Branch2]"}", "/scratch/micpie/export/ord_rxn_smiles_yield_pred/test_0-1.jsonl": "{"text":"User: I need to run a reaction with the reaction SMILES (RXNSMILES) C1COCCO1.CC(=O)O.CS(=O)(=O)c1ccc(N)cc1.Clc1cncc(Cl)n1.O=C([O-])[O-].[Cs+].[Pd].c1ccc(P(c2ccccc2)c2ccc3ccccc3c2-c2c(P(c3ccccc3)c3ccccc3)ccc3ccccc23)cc1>>CS(=O)(=O)c1ccc(Nc2cncc(Cl)n2)cc1. What is the yield I can get?\nAssistant: The predicted yield is 22\\%."} {"text":"User: I need to run a reaction with the reaction SMILES (RXNSMILES) Brc1cccc2ccoc12.CC(C)(C)[O-].Cc1ccccc1.O=C(\/C=C\/c1ccccc1)\/C=C\/c1ccccc1.[Na+].[Pd].c1ccc(CCN2CCNCC2)cc1.c1ccc(P(c2ccccc2)c2ccc3ccccc3c2-c2c(P(c3ccccc3)c3ccccc3)ccc3ccccc23)cc1>>c1ccc(CCN2CCN(c3cccc4ccoc34)CC2)cc1. 
What is the yield I can expect?\nAssistant: The estimated yield is 62\\%."}", "/scratch/micpie/export/ord_rxn_smiles_yield_pred/valid_0-0.jsonl": "{"text":"The reaction yield of a reaction with the RXNSMILES CC(C)(C)[O-].CC(C)[C@@H]1CNCCN1C(=O)OC(C)(C)C.CN(C)c1ccccc1-c1ccccc1P(C1CCCCC1)C1CCCCC1.COc1ccc(Br)cc1OC.Cc1ccccc1.O=C(\/C=C\/c1ccccc1)\/C=C\/c1ccccc1.[Na+].[Pd]>>COc1ccc(N2CCN(C(=O)OC(C)(C)C)[C@H](C(C)C)C2)cc1OC is 36\\%."} {"text":"The yield of a reaction with the reaction SMILES Brc1cccnc1.CC(C)(C)OC(=O)N1CCNCC1.CC(C)(C)[O-].Cc1ccccc1.O=C(\/C=C\/c1ccccc1)\/C=C\/c1ccccc1.[Na+].[Pd].c1ccc(P(c2ccccc2)c2ccc3ccccc3c2-c2c(P(c3ccccc3)c3ccccc3)ccc3ccccc23)cc1>>CC(C)(C)OC(=O)N1CCN(c2cccnc2)CC1 is 29\\%."}", "/scratch/micpie/export/ord_rxn_smiles_yield_pred/test_0-2.jsonl": "{"text":"Question: What's reaction yield of a reaction with the reaction SMILES C1COCCO1.CC(=O)O.CS(=O)(=O)c1ccc(N)cc1.Clc1cncc(Cl)n1.O=C([O-])[O-].[Cs+].[Pd].c1ccc(P(c2ccccc2)c2ccc3ccccc3c2-c2c(P(c3ccccc3)c3ccccc3)ccc3ccccc23)cc1>>CS(=O)(=O)c1ccc(Nc2cncc(Cl)n2)cc1?\nAnswer: 22\\%."} {"text":"Question: What is the yield of a reaction with the reaction SMILES (RXNSMILES) Brc1cccc2ccoc12.CC(C)(C)[O-].Cc1ccccc1.O=C(\/C=C\/c1ccccc1)\/C=C\/c1ccccc1.[Na+].[Pd].c1ccc(CCN2CCNCC2)cc1.c1ccc(P(c2ccccc2)c2ccc3ccccc3c2-c2c(P(c3ccccc3)c3ccccc3)ccc3ccccc23)cc1>>c1ccc(CCN2CCN(c3cccc4ccoc34)CC2)cc1?\nAnswer: 62\\%."}", "/scratch/micpie/export/ord_rxn_smiles_yield_pred/test_0-0.jsonl": "{"text":"The reaction yield of a reaction with the reaction SMILES C1COCCO1.CC(=O)O.CS(=O)(=O)c1ccc(N)cc1.Clc1cncc(Cl)n1.O=C([O-])[O-].[Cs+].[Pd].c1ccc(P(c2ccccc2)c2ccc3ccccc3c2-c2c(P(c3ccccc3)c3ccccc3)ccc3ccccc23)cc1>>CS(=O)(=O)c1ccc(Nc2cncc(Cl)n2)cc1 is 22\\%."} {"text":"The reaction yield of a reaction with the reaction SMILES 
Brc1cccc2ccoc12.CC(C)(C)[O-].Cc1ccccc1.O=C(\/C=C\/c1ccccc1)\/C=C\/c1ccccc1.[Na+].[Pd].c1ccc(CCN2CCNCC2)cc1.c1ccc(P(c2ccccc2)c2ccc3ccccc3c2-c2c(P(c3ccccc3)c3ccccc3)ccc3ccccc23)cc1>>c1ccc(CCN2CCN(c3cccc4ccoc34)CC2)cc1 is 62\\%."}", "/scratch/micpie/export/ord_rxn_smiles_yield_pred/train_0-0.jsonl": "{"text":"The yield of a reaction with the reaction SMILES string CC(=O)O.Cc1ccccc1.Clc1cccc(Cl)n1.Nc1ccccc1.O=C([O-])[O-].[K+].[Pd].c1ccc(P(c2ccccc2)c2ccc3ccccc3c2-c2c(P(c3ccccc3)c3ccccc3)ccc3ccccc23)cc1>>Clc1cccc(Nc2ccccc2)n1 is 53\\%."} {"text":"The yield of a reaction with the reaction SMILES (RXNSMILES) C1COCCO1.CC(=O)O.CONC(=O)c1ccccc1Nc1cc(Cl)ncc1Cl.Cc1cc(N)n(C(C)C)n1.O=C([O-])[O-].[Cs+].[Pd].c1ccc(P(c2ccccc2)c2ccc3ccccc3c2-c2c(P(c3ccccc3)c3ccccc3)ccc3ccccc23)cc1>>CONC(=O)c1ccccc1Nc1cc(Nc2cc(C)nn2C(C)C)ncc1Cl is 37\\%."}", "/scratch/micpie/export/ord_rxn_smiles_yield_pred/valid_0-2.jsonl": "{"text":"Question: What's the yield of a reaction with the reaction SMILES string CC(C)(C)[O-].CC(C)[C@@H]1CNCCN1C(=O)OC(C)(C)C.CN(C)c1ccccc1-c1ccccc1P(C1CCCCC1)C1CCCCC1.COc1ccc(Br)cc1OC.Cc1ccccc1.O=C(\/C=C\/c1ccccc1)\/C=C\/c1ccccc1.[Na+].[Pd]>>COc1ccc(N2CCN(C(=O)OC(C)(C)C)[C@H](C(C)C)C2)cc1OC?\nAnswer: 36\\%."} {"text":"Question: What's reaction yield of a reaction with the RXNSMILES Brc1cccnc1.CC(C)(C)OC(=O)N1CCNCC1.CC(C)(C)[O-].Cc1ccccc1.O=C(\/C=C\/c1ccccc1)\/C=C\/c1ccccc1.[Na+].[Pd].c1ccc(P(c2ccccc2)c2ccc3ccccc3c2-c2c(P(c3ccccc3)c3ccccc3)ccc3ccccc23)cc1>>CC(C)(C)OC(=O)N1CCN(c2cccnc2)CC1?\nAnswer: 29\\%."}", "/scratch/micpie/export/ord_rxn_smiles_yield_pred/valid_0-1.jsonl": "{"text":"User: I need to run a reaction with the reaction SMILES (RXNSMILES) CC(C)(C)[O-].CC(C)[C@@H]1CNCCN1C(=O)OC(C)(C)C.CN(C)c1ccccc1-c1ccccc1P(C1CCCCC1)C1CCCCC1.COc1ccc(Br)cc1OC.Cc1ccccc1.O=C(\/C=C\/c1ccccc1)\/C=C\/c1ccccc1.[Na+].[Pd]>>COc1ccc(N2CCN(C(=O)OC(C)(C)C)[C@H](C(C)C)C2)cc1OC. 
What is the yield I should expect?\nAssistant: The predicted yield is 36\\%."} {"text":"User: I need to run a reaction with the RXNSMILES Brc1cccnc1.CC(C)(C)OC(=O)N1CCNCC1.CC(C)(C)[O-].Cc1ccccc1.O=C(\/C=C\/c1ccccc1)\/C=C\/c1ccccc1.[Na+].[Pd].c1ccc(P(c2ccccc2)c2ccc3ccccc3c2-c2c(P(c3ccccc3)c3ccccc3)ccc3ccccc23)cc1>>CC(C)(C)OC(=O)N1CCN(c2cccnc2)CC1. What is the reaction yield I should expect?\nAssistant: The expected reaction yield is 29\\%."}", "/scratch/micpie/export/ord_rxn_smiles_yield_pred/train_0-2.jsonl": "{"text":"Question: What is the reaction yield of a reaction with the reaction SMILES CC(=O)O.Cc1ccccc1.Clc1cccc(Cl)n1.Nc1ccccc1.O=C([O-])[O-].[K+].[Pd].c1ccc(P(c2ccccc2)c2ccc3ccccc3c2-c2c(P(c3ccccc3)c3ccccc3)ccc3ccccc23)cc1>>Clc1cccc(Nc2ccccc2)n1?\nAnswer: 53\\%."} {"text":"Question: What is reaction yield of a reaction with the reaction SMILES (RXNSMILES) C1COCCO1.CC(=O)O.CONC(=O)c1ccccc1Nc1cc(Cl)ncc1Cl.Cc1cc(N)n(C(C)C)n1.O=C([O-])[O-].[Cs+].[Pd].c1ccc(P(c2ccccc2)c2ccc3ccccc3c2-c2c(P(c3ccccc3)c3ccccc3)ccc3ccccc23)cc1>>CONC(=O)c1ccccc1Nc1cc(Nc2cc(C)nn2C(C)C)ncc1Cl?\nAnswer: 37\\%."}", "/scratch/micpie/export/ord_rxn_smiles_yield_pred/train_0-1.jsonl": "{"text":"User: I want to run a reaction with the RXNSMILES CC(=O)O.Cc1ccccc1.Clc1cccc(Cl)n1.Nc1ccccc1.O=C([O-])[O-].[K+].[Pd].c1ccc(P(c2ccccc2)c2ccc3ccccc3c2-c2c(P(c3ccccc3)c3ccccc3)ccc3ccccc23)cc1>>Clc1cccc(Nc2ccccc2)n1. What is the reaction yield I can expect?\nAssistant: The estimated reaction yield is 53\\%."} {"text":"User: I want to run a reaction with the reaction SMILES C1COCCO1.CC(=O)O.CONC(=O)c1ccccc1Nc1cc(Cl)ncc1Cl.Cc1cc(N)n(C(C)C)n1.O=C([O-])[O-].[Cs+].[Pd].c1ccc(P(c2ccccc2)c2ccc3ccccc3c2-c2c(P(c3ccccc3)c3ccccc3)ccc3ccccc23)cc1>>CONC(=O)c1ccccc1Nc1cc(Nc2cc(C)nn2C(C)C)ncc1Cl. 
What is the reaction yield I can expect?\nAssistant: The expected reaction yield is 37\\%."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_5-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the InChI string.\nInChI string: InChI=1S\/C14H23N3O3\/c1-3-11-10-13(20-16-11)12-6-4-8-17(12)14(18)15-7-5-9-19-2\/h10,12H,3-9H2,1-2H3,(H,15,18)\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: [C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the InChI string.\nMolecule InChI string: InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8-\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]']"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_2-4.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the InChI string InChI=1S\/C46H51ClN6O8S\/c1-45(2)15-13-33(39(28-45)31-3-5-34(47)6-4-31)29-51-19-21-52(22-20-51)35-7-10-38(43(26-35)61-36-8-11-40-32(25-36)14-18-48-40)44(55)50-62(58,59)37-9-12-41(42(27-37)53(56)57)49-46(30-54)16-23-60-24-17-46\/h3-12,14,18,25-27,48-49,54H,13,15-17,19-24,28-30H2,1-2H3,(H,50,55)?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of 
[C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N]."} {"text":"User: Can you generate the SELFIES of the molecule with the InChI string InChI=1S\/C21H18BrClN2O3\/c1-24(2)8-9-25-18(12-4-3-5-13(22)10-12)17-19(26)15-11-14(23)6-7-16(15)28-20(17)21(25)27\/h3-7,10-11,18H,8-9H2,1-2H3?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_2-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the InChI string.\nInChI string: InChI=1S\/C28H42O7\/c1-16(2)17(3)13-23(31)35-22-15-21-24(5)9-8-20(30)14-19(24)7-10-27(21,33)28(34)12-11-26(32,18(4)29)25(22,28)6\/h7,13,16,20-22,30,32-34H,8-12,14-15H2,1-6H3\/b17-13+\/t20-,21+,22+,24-,25+,26+,27-,28+\/m0\/s1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C]"} {"text":"Task: Please 
create a molecule representation based on the description.\nDescription: Create the SELFIES from the InChI string.\nInChI string: InChI=1S\/C23H25N3O\/c1-2-6-21(7-3-1)19-27-22-11-9-20(10-12-22)18-25-14-16-26(17-15-25)23-8-4-5-13-24-23\/h1-13H,14-19H2\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O]"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_4-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C18H24N4O3\/c1-13(2)25-16-7-5-14(6-8-16)20-18(23)21-15-10-19-22(11-15)12-17-4-3-9-24-17\/h5-8,10-11,13,17H,3-4,9,12H2,1-2H3,(H2,20,21,23) can also be represented with the SELFIES representation [C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1]."} {"text":"The molecule with the InChI string representation of InChI=1S\/C21H16N2O2\/c22-20(24)17-9-7-15(8-10-17)4-3-13-23-21(25)19-12-11-16-5-1-2-6-18(16)14-19\/h1-2,5-12,14H,13H2,(H2,22,24)(H,23,25) can also be represented with the SELFIES representation [N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_4-4.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the InChI string InChI=1S\/C18H24N4O3\/c1-13(2)25-16-7-5-14(6-8-16)20-18(23)21-15-10-19-22(11-15)12-17-4-3-9-24-17\/h5-8,10-11,13,17H,3-4,9,12H2,1-2H3,(H2,20,21,23)?\nAssistant: Yes, this molecule has a SELFIES of 
[C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1]."} {"text":"User: Can you create the SELFIES of the molecule with the InChI string InChI=1S\/C21H16N2O2\/c22-20(24)17-9-7-15(8-10-17)4-3-13-23-21(25)19-12-11-16-5-1-2-6-18(16)14-19\/h1-2,5-12,14H,13H2,(H2,22,24)(H,23,25)?\nAssistant: Of course, this molecule has a SELFIES of [N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_1-4.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the InChI string InChI=1S\/C17H15N5OS2\/c1-20-8-7-11-12(9-20)25-15-13(11)14(23)21(10-5-3-2-4-6-10)16-18-19-17(24)22(15)16\/h2-6H,7-9H2,1H3,(H,19,24)?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2]."} {"text":"User: Can you generate the SELFIES of the molecule with the InChI string InChI=1S\/C16H16BrF2N3O\/c17-12-6-11(18)7-13(19)16(12)20-15(23)9-22-8-10-4-2-1-3-5-14(10)21-22\/h6-8H,1-5,9H2,(H,20,23)?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_5-1.jsonl": "{"text":"The molecule with the SELFIES [C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C] can also be represented with the 
InChI string representation InChI=1S\/C14H23N3O3\/c1-3-11-10-13(20-16-11)12-6-4-8-17(12)14(18)15-7-5-9-19-2\/h10,12H,3-9H2,1-2H3,(H,15,18)."} {"text":"The molecule with the SELFIES representation of ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]'] can also be represented with the InChI string representation InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8-."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_4-4.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the InChI string InChI=1S\/C12H20N4O2\/c1-2-3-8-4-6-9(7-5-8)14-12(17)10-11(13)16-18-15-10\/h8-9H,2-7H2,1H3,(H2,13,16)(H,14,17)?\nAssistant: Yes, this molecule has a SELFIES of [C][C][C][C][C][C][C][Branch1][=C][N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][#C]."} {"text":"User: Can you create the SELFIES of the molecule with the InChI string InChI=1S\/C18H22N2O3S\/c1-14-8-7-9-15(2)18(14)19-17(21)12-20(3)24(22,23)13-16-10-5-4-6-11-16\/h4-11H,12-13H2,1-3H3,(H,19,21)?\nAssistant: Of course, this molecule has a SELFIES of [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_4-4.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the InChI string InChI=1S\/C11H17N5O3\/c1-2-13-10(17)7-3-5-16(6-4-7)11(18)8-9(12)15-19-14-8\/h7H,2-6H2,1H3,(H2,12,15)(H,13,17)?\nAssistant: Of course, this molecule has a SELFIES of [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C]."} {"text":"User: Can you tell me the SELFIES of the 
molecule with the InChI string InChI=1S\/C18H22N4O\/c1-11(2)19-17(23)10-9-14-12(3)20-18-15-7-5-6-8-16(15)21-22(18)13(14)4\/h5-8,11H,9-10H2,1-4H3,(H,19,23)?\nAssistant: Of course, this molecule has a SELFIES of [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_1-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SELFIES from the InChI string.\nMolecule InChI string: InChI=1S\/C19H23N5O2S\/c1-24-19(21-22-23-24)27-11-10-20-13-16-8-9-17(18(12-16)25-2)26-14-15-6-4-3-5-7-15\/h3-9,12,20H,10-11,13-14H2,1-2H3\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the InChI string.\nMolecule InChI string: InChI=1S\/C12H15N3OS\/c1-13-12(17)15-8-7-11(14-15)9-3-5-10(16-2)6-4-9\/h3-6H,7-8H2,1-2H3,(H,13,17)\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N]"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_3-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the InChI string.\nInChI string: 
InChI=1S\/C25H27ClN2O7S\/c1-34-13-3-10-28-22(18-4-2-5-19(26)16-18)21(24(30)25(28)31)23(29)17-6-8-20(9-7-17)36(32,33)27-11-14-35-15-12-27\/h2,4-9,16,22,29H,3,10-15H2,1H3\/b23-21+\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1]"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the InChI string.\nInChI string: InChI=1S\/C16H19FN4O3\/c1-23-15-5-4-11(7-14(15)17)19-16(22)20-12-8-18-21(9-12)10-13-3-2-6-24-13\/h4-5,7-9,13H,2-3,6,10H2,1H3,(H2,19,20,22)\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F]"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_1-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the SELFIES [C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C19H23N5O2S\/c1-24-19(21-22-23-24)27-11-10-20-13-16-8-9-17(18(12-16)25-2)26-14-15-6-4-3-5-7-15\/h3-9,12,20H,10-11,13-14H2,1-2H3."} {"text":"User: Can you tell me the InChI string of the molecule with the SELFIES [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N]?\nAssistant: Sure, this molecule has a InChI 
string of InChI=1S\/C12H15N3OS\/c1-13-12(17)15-8-7-11(14-15)9-3-5-10(16-2)6-4-9\/h3-6H,7-8H2,1-2H3,(H,13,17)."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_5-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C14H23N3O3\/c1-3-11-10-13(20-16-11)12-6-4-8-17(12)14(18)15-7-5-9-19-2\/h10,12H,3-9H2,1-2H3,(H,15,18) can also be represented with the SELFIES [C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C]."} {"text":"The molecule with the InChI string representation of InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8- can also be represented with the SELFIES representation ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]']."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_1-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the SELFIES [C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2]?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C16H12N4O\/c1-10-6-8-11(9-7-10)14-18-15-12-4-2-3-5-13(12)17-16(21)20(15)19-14\/h2-9H,1H3,(H,17,21)."} {"text":"User: Can you generate the InChI string of the molecule with the SELFIES [C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2]?\nAssistant: Yes, this molecule has a InChI string of 
InChI=1S\/C27H28F2N6O3\/c1-18-24(37-14-13-34-11-9-33(2)10-12-34)16-35-25(18)27(30-17-31-35)38-23-8-7-19(15-22(23)29)32-26(36)20-5-3-4-6-21(20)28\/h3-8,15-17H,9-14H2,1-2H3,(H,32,36)."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_5-4.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the InChI string InChI=1S\/C18H16FN3O\/c19-17-4-2-1-3-14(17)12-22(16-9-10-16)18(23)21-15-7-5-13(11-20)6-8-15\/h1-8,16H,9-10,12H2,(H,21,23)?\nAssistant: Sure, this molecule has a SELFIES of [N][#C][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][Branch1][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][C][C][Ring1][Ring1][C][=C][Ring2][Ring1][Branch1]."} {"text":"User: Can you generate the SELFIES of the molecule with the InChI string InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1?\nAssistant: Sure, this molecule has a SELFIES of ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]']."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_0-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the SELFIES [S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1]?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1."} {"text":"User: Can you create the InChI string of the molecule with the SELFIES 
[C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1]?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C12H11N5S2\/c1-2-4-9(5-3-1)15-12-16-10(7-19-12)6-18-11-13-8-14-17-11\/h1-5,7-8H,6H2,(H,15,16)(H,13,14,17)."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_3-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C16H12FN3O2S\/c17-11-3-1-4-12(9-11)18-15(21)10-23-16-7-6-13(19-20-16)14-5-2-8-22-14\/h1-9H,10H2,(H,18,21) can also be represented with the SELFIES representation [O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1]."} {"text":"The molecule with the InChI string InChI=1S\/C10H17N5O3\/c1-4-15(5-2)7(16)6-14(3)10(17)8-9(11)13-18-12-8\/h4-6H2,1-3H3,(H2,11,13) can also be represented with the SELFIES representation [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_0-1.jsonl": "{"text":"The molecule with the SELFIES [S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1] can also be represented with the InChI string representation InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1."} {"text":"The molecule with the 
SELFIES representation of [C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1] can also be represented with the InChI string InChI=1S\/C12H11N5S2\/c1-2-4-9(5-3-1)15-12-16-10(7-19-12)6-18-11-13-8-14-17-11\/h1-5,7-8H,6H2,(H,15,16)(H,13,14,17)."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_5-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C16H26N2O\/c1-6-17(7-2)16(19)12-18(8-3)15-10-13(4)9-14(5)11-15\/h9-11H,6-8,12H2,1-5H3 can also be represented with the SELFIES [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2]."} {"text":"The molecule with the InChI string InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1 can also be represented with the SELFIES representation ['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]']."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_2-0.jsonl": "{"text":"The molecule with the InChI string 
InChI=1S\/C46H51ClN6O8S\/c1-45(2)15-13-33(39(28-45)31-3-5-34(47)6-4-31)29-51-19-21-52(22-20-51)35-7-10-38(43(26-35)61-36-8-11-40-32(25-36)14-18-48-40)44(55)50-62(58,59)37-9-12-41(42(27-37)53(56)57)49-46(30-54)16-23-60-24-17-46\/h3-12,14,18,25-27,48-49,54H,13,15-17,19-24,28-30H2,1-2H3,(H,50,55) can also be represented with the SELFIES [C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N]."} {"text":"The molecule with the InChI string representation of InChI=1S\/C21H18BrClN2O3\/c1-24(2)8-9-25-18(12-4-3-5-13(22)10-12)17-19(26)15-11-14(23)6-7-16(15)28-20(17)21(25)27\/h3-7,10-11,18H,8-9H2,1-2H3 can also be represented with the SELFIES representation [C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_2-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the InChI string.\nInChI string: InChI=1S\/C13H15N3O2S\/c1-9-8-19-13(15-9)16-14-7-10-4-5-11(17-2)12(6-10)18-3\/h4-8H,1-3H3,(H,15,16)\/b14-7+\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C]"} {"text":"Task: Please 
generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the InChI string.\nMolecule InChI string: InChI=1S\/C8H10O3S\/c1-3-11-6-4-5-12-7(6)8(9)10-2\/h4-5H,3H2,1-2H3\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C]"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_0-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1 can also be represented with the SELFIES [S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2]."} {"text":"The molecule with the InChI string representation of InChI=1S\/C25H37N3O4\/c1-28(2)6-5-26-24(30)22(25-13-16-7-17(14-25)9-18(8-16)15-25)27-23(29)19-10-20(31-3)12-21(11-19)32-4\/h10-12,16-18,22H,5-9,13-15H2,1-4H3,(H,26,30)(H,27,29) can also be represented with the SELFIES representation [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_3-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the 
description.\nDescription: Generate the InChI string from the SELFIES.\nMolecule SELFIES: [O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C16H12FN3O2S\/c17-11-3-1-4-12(9-11)18-15(21)10-23-16-7-6-13(19-20-16)14-5-2-8-22-14\/h1-9H,10H2,(H,18,21)"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the InChI string from the SELFIES.\nMolecule SELFIES: [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C10H17N5O3\/c1-4-15(5-2)7(16)6-14(3)10(17)8-9(11)13-18-12-8\/h4-6H2,1-3H3,(H2,11,13)"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_5-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the SELFIES.\nMolecule SELFIES: [N][#C][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][Branch1][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][C][C][Ring1][Ring1][C][=C][Ring2][Ring1][Branch1]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C18H16FN3O\/c19-17-4-2-1-3-14(17)12-22(16-9-10-16)18(23)21-15-7-5-13(11-20)6-8-15\/h1-8,16H,9-10,12H2,(H,21,23)"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the InChI string from the SELFIES.\nSELFIES: ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]']\nConstraint: Even if you 
are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_3-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the SELFIES [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C19H20BrNO2.ClH\/c20-17-8-6-16(7-9-17)19(22)18(15-4-2-1-3-5-15)14-21-10-12-23-13-11-21;\/h1-9,18H,10-14H2;1H."} {"text":"User: Can you tell me the InChI string of the molecule with the SELFIES [C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2]?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C17H19FN2O3\/c1-3-23-15-7-5-4-6-12(15)11-19-17(21)20-13-8-9-16(22-2)14(18)10-13\/h4-10H,3,11H2,1-2H3,(H2,19,20,21)."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_5-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the SELFIES [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2]?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C16H26N2O\/c1-6-17(7-2)16(19)12-18(8-3)15-10-13(4)9-14(5)11-15\/h9-11H,6-8,12H2,1-5H3."} {"text":"User: Can you create the InChI string of the molecule with the SELFIES 
['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]']?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_4-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the SELFIES.\nMolecule SELFIES: [C][C][C][C][C][C][C][Branch1][=C][N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][#C]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C12H20N4O2\/c1-2-3-8-4-6-9(7-5-8)14-12(17)10-11(13)16-18-15-10\/h8-9H,2-7H2,1H3,(H2,13,16)(H,14,17)"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the InChI string from the SELFIES.\nSELFIES: [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]\nConstraint: Even if you are not sure, you 
must answer with a representation without using any other words.\nResult: InChI=1S\/C18H22N2O3S\/c1-14-8-7-9-15(2)18(14)19-17(21)12-20(3)24(22,23)13-16-10-5-4-6-11-16\/h4-11H,12-13H2,1-3H3,(H,19,21)"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_1-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the InChI string from the SELFIES.\nSELFIES: [C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C16H12N4O\/c1-10-6-8-11(9-7-10)14-18-15-12-4-2-3-5-13(12)17-16(21)20(15)19-14\/h2-9H,1H3,(H,17,21)"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the InChI string from the SELFIES.\nSELFIES: [C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C27H28F2N6O3\/c1-18-24(37-14-13-34-11-9-33(2)10-12-34)16-35-25(18)27(30-17-31-35)38-23-8-7-19(15-22(23)29)32-26(36)20-5-3-4-6-21(20)28\/h3-8,15-17H,9-14H2,1-2H3,(H,32,36)"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_0-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the InChI string.\nMolecule InChI string: 
InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1]"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the InChI string.\nInChI string: InChI=1S\/C12H11N5S2\/c1-2-4-9(5-3-1)15-12-16-10(7-19-12)6-18-11-13-8-14-17-11\/h1-5,7-8H,6H2,(H,15,16)(H,13,14,17)\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1]"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_3-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C25H27ClN2O7S\/c1-34-13-3-10-28-22(18-4-2-5-19(26)16-18)21(24(30)25(28)31)23(29)17-6-8-20(9-7-17)36(32,33)27-11-14-35-15-12-27\/h2,4-9,16,22,29H,3,10-15H2,1H3\/b23-21+ can also be represented with the SELFIES representation 
[C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1]."} {"text":"The molecule with the InChI string representation of InChI=1S\/C16H19FN4O3\/c1-23-15-5-4-11(7-14(15)17)19-16(22)20-12-8-18-21(9-12)10-13-3-2-6-24-13\/h4-5,7-9,13H,2-3,6,10H2,1H3,(H2,19,20,22) can also be represented with the SELFIES [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_1-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C16H12N4O\/c1-10-6-8-11(9-7-10)14-18-15-12-4-2-3-5-13(12)17-16(21)20(15)19-14\/h2-9H,1H3,(H,17,21) can also be represented with the SELFIES [C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2]."} {"text":"The molecule with the InChI string representation of InChI=1S\/C27H28F2N6O3\/c1-18-24(37-14-13-34-11-9-33(2)10-12-34)16-35-25(18)27(30-17-31-35)38-23-8-7-19(15-22(23)29)32-26(36)20-5-3-4-6-21(20)28\/h3-8,15-17H,9-14H2,1-2H3,(H,32,36) can also be represented with the SELFIES representation [C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_5-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES 
from the InChI string.\nMolecule InChI string: InChI=1S\/C16H26N2O\/c1-6-17(7-2)16(19)12-18(8-3)15-10-13(4)9-14(5)11-15\/h9-11H,6-8,12H2,1-5H3\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2]"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the InChI string.\nInChI string: InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: ['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]']"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_2-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the InChI string from the SELFIES.\nSELFIES: 
[C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C28H42O7\/c1-16(2)17(3)13-23(31)35-22-15-21-24(5)9-8-20(30)14-19(24)7-10-27(21,33)28(34)12-11-26(32,18(4)29)25(22,28)6\/h7,13,16,20-22,30,32-34H,8-12,14-15H2,1-6H3\/b17-13+\/t20-,21+,22+,24-,25+,26+,27-,28+\/m0\/s1"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the InChI string from the SELFIES.\nSELFIES: [C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C23H25N3O\/c1-2-6-21(7-3-1)19-27-22-11-9-20(10-12-22)18-25-14-16-26(17-15-25)23-8-4-5-13-24-23\/h1-13H,14-19H2"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_1-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the InChI string.\nMolecule InChI string: InChI=1S\/C17H15N5OS2\/c1-20-8-7-11-12(9-20)25-15-13(11)14(23)21(10-5-3-2-4-6-10)16-18-19-17(24)22(15)16\/h2-6H,7-9H2,1H3,(H,19,24)\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: 
[C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2]"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SELFIES from the InChI string.\nInChI string: InChI=1S\/C16H16BrF2N3O\/c17-12-6-11(18)7-13(19)16(12)20-15(23)9-22-8-10-4-2-1-3-5-14(10)21-22\/h6-8H,1-5,9H2,(H,20,23)\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br]"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_5-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the SELFIES [C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C]?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C14H23N3O3\/c1-3-11-10-13(20-16-11)12-6-4-8-17(12)14(18)15-7-5-9-19-2\/h10,12H,3-9H2,1-2H3,(H,15,18)."} {"text":"User: Can you create the InChI string of the molecule with the SELFIES ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]']?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8-."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_2-4.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the InChI string 
InChI=1S\/C28H42O7\/c1-16(2)17(3)13-23(31)35-22-15-21-24(5)9-8-20(30)14-19(24)7-10-27(21,33)28(34)12-11-26(32,18(4)29)25(22,28)6\/h7,13,16,20-22,30,32-34H,8-12,14-15H2,1-6H3\/b17-13+\/t20-,21+,22+,24-,25+,26+,27-,28+\/m0\/s1?\nAssistant: Yes, this molecule has a SELFIES of [C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C]."} {"text":"User: Can you create the SELFIES of the molecule with the InChI string InChI=1S\/C23H25N3O\/c1-2-6-21(7-3-1)19-27-22-11-9-20(10-12-22)18-25-14-16-26(17-15-25)23-8-4-5-13-24-23\/h1-13H,14-19H2?\nAssistant: Sure, this molecule has a SELFIES of [C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_3-4.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the InChI string InChI=1S\/C16H12FN3O2S\/c17-11-3-1-4-12(9-11)18-15(21)10-23-16-7-6-13(19-20-16)14-5-2-8-22-14\/h1-9H,10H2,(H,18,21)?\nAssistant: Of course, this molecule has a SELFIES of [O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1]."} {"text":"User: Can you create the SELFIES of the molecule with the InChI string InChI=1S\/C10H17N5O3\/c1-4-15(5-2)7(16)6-14(3)10(17)8-9(11)13-18-12-8\/h4-6H2,1-3H3,(H2,11,13)?\nAssistant: Yes, this molecule has a SELFIES of [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_0-0.jsonl": 
"{"text":"The molecule with the InChI string InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1 can also be represented with the SELFIES [S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1]."} {"text":"The molecule with the InChI string InChI=1S\/C12H11N5S2\/c1-2-4-9(5-3-1)15-12-16-10(7-19-12)6-18-11-13-8-14-17-11\/h1-5,7-8H,6H2,(H,15,16)(H,13,14,17) can also be represented with the SELFIES [C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_2-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the SELFIES [C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N]?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of 
InChI=1S\/C46H51ClN6O8S\/c1-45(2)15-13-33(39(28-45)31-3-5-34(47)6-4-31)29-51-19-21-52(22-20-51)35-7-10-38(43(26-35)61-36-8-11-40-32(25-36)14-18-48-40)44(55)50-62(58,59)37-9-12-41(42(27-37)53(56)57)49-46(30-54)16-23-60-24-17-46\/h3-12,14,18,25-27,48-49,54H,13,15-17,19-24,28-30H2,1-2H3,(H,50,55)."} {"text":"User: Can you create the InChI string of the molecule with the SELFIES [C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1]?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C21H18BrClN2O3\/c1-24(2)8-9-25-18(12-4-3-5-13(22)10-12)17-19(26)15-11-14(23)6-7-16(15)28-20(17)21(25)27\/h3-7,10-11,18H,8-9H2,1-2H3."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_5-5.jsonl": "{"text":"User: Can you generate the InChI string of the molecule with the SELFIES [N][#C][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][Branch1][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][C][C][Ring1][Ring1][C][=C][Ring2][Ring1][Branch1]?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C18H16FN3O\/c19-17-4-2-1-3-14(17)12-22(16-9-10-16)18(23)21-15-7-5-13(11-20)6-8-15\/h1-8,16H,9-10,12H2,(H,21,23)."} {"text":"User: Can you generate the InChI string of the molecule with the SELFIES ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]']?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_3-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the SELFIES 
[C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1]?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C25H27ClN2O7S\/c1-34-13-3-10-28-22(18-4-2-5-19(26)16-18)21(24(30)25(28)31)23(29)17-6-8-20(9-7-17)36(32,33)27-11-14-35-15-12-27\/h2,4-9,16,22,29H,3,10-15H2,1H3\/b23-21+."} {"text":"User: Can you tell me the InChI string of the molecule with the SELFIES [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F]?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C16H19FN4O3\/c1-23-15-5-4-11(7-14(15)17)19-16(22)20-12-8-18-21(9-12)10-13-3-2-6-24-13\/h4-5,7-9,13H,2-3,6,10H2,1H3,(H2,19,20,22)."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_2-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C28H42O7\/c1-16(2)17(3)13-23(31)35-22-15-21-24(5)9-8-20(30)14-19(24)7-10-27(21,33)28(34)12-11-26(32,18(4)29)25(22,28)6\/h7,13,16,20-22,30,32-34H,8-12,14-15H2,1-6H3\/b17-13+\/t20-,21+,22+,24-,25+,26+,27-,28+\/m0\/s1 can also be represented with the SELFIES [C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C]."} {"text":"The molecule with the InChI string representation of InChI=1S\/C23H25N3O\/c1-2-6-21(7-3-1)19-27-22-11-9-20(10-12-22)18-25-14-16-26(17-15-25)23-8-4-5-13-24-23\/h1-13H,14-19H2 can also be represented with the SELFIES representation 
[C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_1-4.jsonl": "{"text":"User: Can you create the SELFIES of the molecule with the InChI string InChI=1S\/C16H12N4O\/c1-10-6-8-11(9-7-10)14-18-15-12-4-2-3-5-13(12)17-16(21)20(15)19-14\/h2-9H,1H3,(H,17,21)?\nAssistant: Of course, this molecule has a SELFIES of [C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2]."} {"text":"User: Can you tell me the SELFIES of the molecule with the InChI string InChI=1S\/C27H28F2N6O3\/c1-18-24(37-14-13-34-11-9-33(2)10-12-34)16-35-25(18)27(30-17-31-35)38-23-8-7-19(15-22(23)29)32-26(36)20-5-3-4-6-21(20)28\/h3-8,15-17H,9-14H2,1-2H3,(H,32,36)?\nAssistant: Of course, this molecule has a SELFIES of [C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_2-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C13H15N3O2S\/c1-9-8-19-13(15-9)16-14-7-10-4-5-11(17-2)12(6-10)18-3\/h4-8H,1-3H3,(H,15,16)\/b14-7+ can also be represented with the SELFIES representation [C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C]."} {"text":"The molecule with the InChI string InChI=1S\/C8H10O3S\/c1-3-11-6-4-5-12-7(6)8(9)10-2\/h4-5H,3H2,1-2H3 can also be represented with the SELFIES representation [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C]."}", 
"/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_4-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SELFIES from the InChI string.\nInChI string: InChI=1S\/C12H20N4O2\/c1-2-3-8-4-6-9(7-5-8)14-12(17)10-11(13)16-18-15-10\/h8-9H,2-7H2,1H3,(H2,13,16)(H,14,17)\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][C][C][C][C][C][C][Branch1][=C][N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][#C]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SELFIES from the InChI string.\nMolecule InChI string: InChI=1S\/C18H22N2O3S\/c1-14-8-7-9-15(2)18(14)19-17(21)12-20(3)24(22,23)13-16-10-5-4-6-11-16\/h4-11H,12-13H2,1-3H3,(H,19,21)\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_0-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the SELFIES.\nMolecule SELFIES: [S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: 
InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the SELFIES.\nSELFIES: [C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C12H11N5S2\/c1-2-4-9(5-3-1)15-12-16-10(7-19-12)6-18-11-13-8-14-17-11\/h1-5,7-8H,6H2,(H,15,16)(H,13,14,17)"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_4-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SELFIES from the InChI string.\nInChI string: InChI=1S\/C18H24N4O3\/c1-13(2)25-16-7-5-14(6-8-16)20-18(23)21-15-10-19-22(11-15)12-17-4-3-9-24-17\/h5-8,10-11,13,17H,3-4,9,12H2,1-2H3,(H2,20,21,23)\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1]"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SELFIES from the InChI string.\nMolecule InChI string: InChI=1S\/C21H16N2O2\/c22-20(24)17-9-7-15(8-10-17)4-3-13-23-21(25)19-12-11-16-5-1-2-6-18(16)14-19\/h1-2,5-12,14H,13H2,(H2,22,24)(H,23,25)\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: 
[N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1]"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_3-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the InChI string.\nInChI string: InChI=1S\/C16H12FN3O2S\/c17-11-3-1-4-12(9-11)18-15(21)10-23-16-7-6-13(19-20-16)14-5-2-8-22-14\/h1-9H,10H2,(H,18,21)\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the InChI string.\nInChI string: InChI=1S\/C10H17N5O3\/c1-4-15(5-2)7(16)6-14(3)10(17)8-9(11)13-18-12-8\/h4-6H2,1-3H3,(H2,11,13)\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_2-1.jsonl": "{"text":"The molecule with the SELFIES [C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C] can also be represented with the InChI string InChI=1S\/C13H15N3O2S\/c1-9-8-19-13(15-9)16-14-7-10-4-5-11(17-2)12(6-10)18-3\/h4-8H,1-3H3,(H,15,16)\/b14-7+."} {"text":"The molecule with the SELFIES [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C] can also be represented with the InChI string representation 
InChI=1S\/C8H10O3S\/c1-3-11-6-4-5-12-7(6)8(9)10-2\/h4-5H,3H2,1-2H3."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_5-4.jsonl": "{"text":"User: Can you create the SELFIES of the molecule with the InChI string InChI=1S\/C16H26N2O\/c1-6-17(7-2)16(19)12-18(8-3)15-10-13(4)9-14(5)11-15\/h9-11H,6-8,12H2,1-5H3?\nAssistant: Sure, this molecule has a SELFIES of [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2]."} {"text":"User: Can you generate the SELFIES of the molecule with the InChI string InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1?\nAssistant: Yes, this molecule has a SELFIES of ['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]']."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_4-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the SELFIES [C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1]?\nAssistant: Yes, I'm happy to help, 
this molecule has a InChI string of InChI=1S\/C18H24N4O3\/c1-13(2)25-16-7-5-14(6-8-16)20-18(23)21-15-10-19-22(11-15)12-17-4-3-9-24-17\/h5-8,10-11,13,17H,3-4,9,12H2,1-2H3,(H2,20,21,23)."} {"text":"User: Can you tell me the InChI string of the molecule with the SELFIES [N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1]?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C21H16N2O2\/c22-20(24)17-9-7-15(8-10-17)4-3-13-23-21(25)19-12-11-16-5-1-2-6-18(16)14-19\/h1-2,5-12,14H,13H2,(H2,22,24)(H,23,25)."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_4-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C12H20N4O2\/c1-2-3-8-4-6-9(7-5-8)14-12(17)10-11(13)16-18-15-10\/h8-9H,2-7H2,1H3,(H2,13,16)(H,14,17) can also be represented with the SELFIES representation [C][C][C][C][C][C][C][Branch1][=C][N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][#C]."} {"text":"The molecule with the InChI string representation of InChI=1S\/C18H22N2O3S\/c1-14-8-7-9-15(2)18(14)19-17(21)12-20(3)24(22,23)13-16-10-5-4-6-11-16\/h4-11H,12-13H2,1-3H3,(H,19,21) can also be represented with the SELFIES representation [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_5-1.jsonl": "{"text":"The molecule with the SELFIES [N][#C][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][Branch1][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][C][C][Ring1][Ring1][C][=C][Ring2][Ring1][Branch1] can also be represented with the InChI string representation 
InChI=1S\/C18H16FN3O\/c19-17-4-2-1-3-14(17)12-22(16-9-10-16)18(23)21-15-7-5-13(11-20)6-8-15\/h1-8,16H,9-10,12H2,(H,21,23)."} {"text":"The molecule with the SELFIES representation of ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]'] can also be represented with the InChI string InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_2-1.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N] can also be represented with the InChI string InChI=1S\/C46H51ClN6O8S\/c1-45(2)15-13-33(39(28-45)31-3-5-34(47)6-4-31)29-51-19-21-52(22-20-51)35-7-10-38(43(26-35)61-36-8-11-40-32(25-36)14-18-48-40)44(55)50-62(58,59)37-9-12-41(42(27-37)53(56)57)49-46(30-54)16-23-60-24-17-46\/h3-12,14,18,25-27,48-49,54H,13,15-17,19-24,28-30H2,1-2H3,(H,50,55)."} {"text":"The molecule with the SELFIES representation of [C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1] can also be represented with the InChI string representation InChI=1S\/C21H18BrClN2O3\/c1-24(2)8-9-25-18(12-4-3-5-13(22)10-12)17-19(26)15-11-14(23)6-7-16(15)28-20(17)21(25)27\/h3-7,10-11,18H,8-9H2,1-2H3."}", 
"/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_0-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1 can also be represented with the SELFIES [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C]."} {"text":"The molecule with the InChI string representation of InChI=1S\/C32H26NO2.ClHO4\/c1-2-35-32(34)27-19-12-20-29(21-27)33-30(25-15-8-4-9-16-25)22-28(24-13-6-3-7-14-24)23-31(33)26-17-10-5-11-18-26;2-1(3,4)5\/h3-23H,2H2,1H3;(H,2,3,4,5)\/q+1;\/p-1 can also be represented with the SELFIES representation [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_0-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the SELFIES.\nMolecule SELFIES: [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: 
InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the SELFIES.\nSELFIES: [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C32H26NO2.ClHO4\/c1-2-35-32(34)27-19-12-20-29(21-27)33-30(25-15-8-4-9-16-25)22-28(24-13-6-3-7-14-24)23-31(33)26-17-10-5-11-18-26;2-1(3,4)5\/h3-23H,2H2,1H3;(H,2,3,4,5)\/q+1;\/p-1"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_1-4.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the InChI string InChI=1S\/C19H23N5O2S\/c1-24-19(21-22-23-24)27-11-10-20-13-16-8-9-17(18(12-16)25-2)26-14-15-6-4-3-5-7-15\/h3-9,12,20H,10-11,13-14H2,1-2H3?\nAssistant: Sure, this molecule has a SELFIES of [C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]."} {"text":"User: Can you generate the SELFIES of the molecule with the InChI string InChI=1S\/C12H15N3OS\/c1-13-12(17)15-8-7-11(14-15)9-3-5-10(16-2)6-4-9\/h3-6H,7-8H2,1-2H3,(H,13,17)?\nAssistant: Yes, this molecule has a SELFIES of [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_2-5.jsonl": "{"text":"User: Can you tell me the 
InChI string of the molecule with the SELFIES [C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C]?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C28H42O7\/c1-16(2)17(3)13-23(31)35-22-15-21-24(5)9-8-20(30)14-19(24)7-10-27(21,33)28(34)12-11-26(32,18(4)29)25(22,28)6\/h7,13,16,20-22,30,32-34H,8-12,14-15H2,1-6H3\/b17-13+\/t20-,21+,22+,24-,25+,26+,27-,28+\/m0\/s1."} {"text":"User: Can you create the InChI string of the molecule with the SELFIES [C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O]?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C23H25N3O\/c1-2-6-21(7-3-1)19-27-22-11-9-20(10-12-22)18-25-14-16-26(17-15-25)23-8-4-5-13-24-23\/h1-13H,14-19H2."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_1-1.jsonl": "{"text":"The molecule with the SELFIES representation of [C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1] can also be represented with the InChI string representation InChI=1S\/C19H23N5O2S\/c1-24-19(21-22-23-24)27-11-10-20-13-16-8-9-17(18(12-16)25-2)26-14-15-6-4-3-5-7-15\/h3-9,12,20H,10-11,13-14H2,1-2H3."} {"text":"The molecule with the SELFIES [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N] can also be represented with the InChI string InChI=1S\/C12H15N3OS\/c1-13-12(17)15-8-7-11(14-15)9-3-5-10(16-2)6-4-9\/h3-6H,7-8H2,1-2H3,(H,13,17)."}", 
"/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_5-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SELFIES from the InChI string.\nMolecule InChI string: InChI=1S\/C18H16FN3O\/c19-17-4-2-1-3-14(17)12-22(16-9-10-16)18(23)21-15-7-5-13(11-20)6-8-15\/h1-8,16H,9-10,12H2,(H,21,23)\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [N][#C][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][Branch1][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][C][C][Ring1][Ring1][C][=C][Ring2][Ring1][Branch1]"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the InChI string.\nMolecule InChI string: InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]']"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_5-1.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2] can also be represented with the InChI string InChI=1S\/C16H26N2O\/c1-6-17(7-2)16(19)12-18(8-3)15-10-13(4)9-14(5)11-15\/h9-11H,6-8,12H2,1-5H3."} {"text":"The molecule with the SELFIES representation of 
['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]'] can also be represented with the InChI string InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_4-1.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C] can also be represented with the InChI string InChI=1S\/C11H17N5O3\/c1-2-13-10(17)7-3-5-16(6-4-7)11(18)8-9(12)15-19-14-8\/h7H,2-6H2,1H3,(H2,12,15)(H,13,17)."} {"text":"The molecule with the SELFIES representation of [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C] can also be represented with the InChI string representation InChI=1S\/C18H22N4O\/c1-11(2)19-17(23)10-9-14-12(3)20-18-15-7-5-6-8-16(15)21-22(18)13(14)4\/h5-8,11H,9-10H2,1-4H3,(H,19,23)."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_3-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation 
and the description.\nDescription: Generate the InChI string from the SELFIES.\nMolecule SELFIES: [C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C25H27ClN2O7S\/c1-34-13-3-10-28-22(18-4-2-5-19(26)16-18)21(24(30)25(28)31)23(29)17-6-8-20(9-7-17)36(32,33)27-11-14-35-15-12-27\/h2,4-9,16,22,29H,3,10-15H2,1H3\/b23-21+"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the InChI string from the SELFIES.\nMolecule SELFIES: [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C16H19FN4O3\/c1-23-15-5-4-11(7-14(15)17)19-16(22)20-12-8-18-21(9-12)10-13-3-2-6-24-13\/h4-5,7-9,13H,2-3,6,10H2,1H3,(H2,19,20,22)"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_3-4.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the InChI string InChI=1S\/C19H20BrNO2.ClH\/c20-17-8-6-16(7-9-17)19(22)18(15-4-2-1-3-5-15)14-21-10-12-23-13-11-21;\/h1-9,18H,10-14H2;1H?\nAssistant: Of course, this molecule has a SELFIES of [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]."} {"text":"User: Can you tell me the SELFIES of the molecule with the InChI string 
InChI=1S\/C17H19FN2O3\/c1-3-23-15-7-5-4-6-12(15)11-19-17(21)20-13-8-9-16(22-2)14(18)10-13\/h4-10H,3,11H2,1-2H3,(H2,19,20,21)?\nAssistant: Of course, this molecule has a SELFIES of [C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_1-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the InChI string from the SELFIES.\nSELFIES: [C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C17H15N5OS2\/c1-20-8-7-11-12(9-20)25-15-13(11)14(23)21(10-5-3-2-4-6-10)16-18-19-17(24)22(15)16\/h2-6H,7-9H2,1H3,(H,19,24)"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the SELFIES.\nMolecule SELFIES: [O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C16H16BrF2N3O\/c17-12-6-11(18)7-13(19)16(12)20-15(23)9-22-8-10-4-2-1-3-5-14(10)21-22\/h6-8H,1-5,9H2,(H,20,23)"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_0-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the InChI string.\nMolecule InChI string: 
InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2]"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the InChI string.\nInChI string: InChI=1S\/C25H37N3O4\/c1-28(2)6-5-26-24(30)22(25-13-16-7-17(14-25)9-18(8-16)15-25)27-23(29)19-10-20(31-3)12-21(11-19)32-4\/h10-12,16-18,22H,5-9,13-15H2,1-4H3,(H,26,30)(H,27,29)\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C]"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_5-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C18H16FN3O\/c19-17-4-2-1-3-14(17)12-22(16-9-10-16)18(23)21-15-7-5-13(11-20)6-8-15\/h1-8,16H,9-10,12H2,(H,21,23) can also be represented with the SELFIES 
[N][#C][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][Branch1][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][C][C][Ring1][Ring1][C][=C][Ring2][Ring1][Branch1]."} {"text":"The molecule with the InChI string InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1 can also be represented with the SELFIES ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]']."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_0-1.jsonl": "{"text":"The molecule with the SELFIES representation of [S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2] can also be represented with the InChI string InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1."} {"text":"The molecule with the SELFIES [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C] can also be represented with the InChI string InChI=1S\/C25H37N3O4\/c1-28(2)6-5-26-24(30)22(25-13-16-7-17(14-25)9-18(8-16)15-25)27-23(29)19-10-20(31-3)12-21(11-19)32-4\/h10-12,16-18,22H,5-9,13-15H2,1-4H3,(H,26,30)(H,27,29)."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_2-4.jsonl": "{"text":"User: Can you tell me the SELFIES of the 
molecule with the InChI string InChI=1S\/C13H15N3O2S\/c1-9-8-19-13(15-9)16-14-7-10-4-5-11(17-2)12(6-10)18-3\/h4-8H,1-3H3,(H,15,16)\/b14-7+?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C]."} {"text":"User: Can you tell me the SELFIES of the molecule with the InChI string InChI=1S\/C8H10O3S\/c1-3-11-6-4-5-12-7(6)8(9)10-2\/h4-5H,3H2,1-2H3?\nAssistant: Of course, this molecule has a SELFIES of [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_2-1.jsonl": "{"text":"The molecule with the SELFIES [C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C] can also be represented with the InChI string representation InChI=1S\/C28H42O7\/c1-16(2)17(3)13-23(31)35-22-15-21-24(5)9-8-20(30)14-19(24)7-10-27(21,33)28(34)12-11-26(32,18(4)29)25(22,28)6\/h7,13,16,20-22,30,32-34H,8-12,14-15H2,1-6H3\/b17-13+\/t20-,21+,22+,24-,25+,26+,27-,28+\/m0\/s1."} {"text":"The molecule with the SELFIES representation of [C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O] can also be represented with the InChI string representation InChI=1S\/C23H25N3O\/c1-2-6-21(7-3-1)19-27-22-11-9-20(10-12-22)18-25-14-16-26(17-15-25)23-8-4-5-13-24-23\/h1-13H,14-19H2."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_1-1.jsonl": "{"text":"The molecule with the SELFIES 
[C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2] can also be represented with the InChI string InChI=1S\/C17H15N5OS2\/c1-20-8-7-11-12(9-20)25-15-13(11)14(23)21(10-5-3-2-4-6-10)16-18-19-17(24)22(15)16\/h2-6H,7-9H2,1H3,(H,19,24)."} {"text":"The molecule with the SELFIES representation of [O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br] can also be represented with the InChI string representation InChI=1S\/C16H16BrF2N3O\/c17-12-6-11(18)7-13(19)16(12)20-15(23)9-22-8-10-4-2-1-3-5-14(10)21-22\/h6-8H,1-5,9H2,(H,20,23)."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_3-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the SELFIES [O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1]?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C16H12FN3O2S\/c17-11-3-1-4-12(9-11)18-15(21)10-23-16-7-6-13(19-20-16)14-5-2-8-22-14\/h1-9H,10H2,(H,18,21)."} {"text":"User: Can you tell me the InChI string of the molecule with the SELFIES [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C10H17N5O3\/c1-4-15(5-2)7(16)6-14(3)10(17)8-9(11)13-18-12-8\/h4-6H2,1-3H3,(H2,11,13)."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_3-1.jsonl": "{"text":"The molecule with the SELFIES 
[C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1] can also be represented with the InChI string representation InChI=1S\/C25H27ClN2O7S\/c1-34-13-3-10-28-22(18-4-2-5-19(26)16-18)21(24(30)25(28)31)23(29)17-6-8-20(9-7-17)36(32,33)27-11-14-35-15-12-27\/h2,4-9,16,22,29H,3,10-15H2,1H3\/b23-21+."} {"text":"The molecule with the SELFIES [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F] can also be represented with the InChI string representation InChI=1S\/C16H19FN4O3\/c1-23-15-5-4-11(7-14(15)17)19-16(22)20-12-8-18-21(9-12)10-13-3-2-6-24-13\/h4-5,7-9,13H,2-3,6,10H2,1H3,(H2,19,20,22)."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_0-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the SELFIES [S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2]?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1."} {"text":"User: Can you tell me the InChI string of the molecule with the SELFIES 
[C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C]?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C25H37N3O4\/c1-28(2)6-5-26-24(30)22(25-13-16-7-17(14-25)9-18(8-16)15-25)27-23(29)19-10-20(31-3)12-21(11-19)32-4\/h10-12,16-18,22H,5-9,13-15H2,1-4H3,(H,26,30)(H,27,29)."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_0-4.jsonl": "{"text":"User: Can you create the SELFIES of the molecule with the InChI string InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2]."} {"text":"User: Can you tell me the SELFIES of the molecule with the InChI string InChI=1S\/C25H37N3O4\/c1-28(2)6-5-26-24(30)22(25-13-16-7-17(14-25)9-18(8-16)15-25)27-23(29)19-10-20(31-3)12-21(11-19)32-4\/h10-12,16-18,22H,5-9,13-15H2,1-4H3,(H,26,30)(H,27,29)?\nAssistant: Of course, this molecule has a SELFIES of 
[C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_1-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the InChI string from the SELFIES.\nMolecule SELFIES: [C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C19H23N5O2S\/c1-24-19(21-22-23-24)27-11-10-20-13-16-8-9-17(18(12-16)25-2)26-14-15-6-4-3-5-7-15\/h3-9,12,20H,10-11,13-14H2,1-2H3"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the SELFIES.\nMolecule SELFIES: [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C12H15N3OS\/c1-13-12(17)15-8-7-11(14-15)9-3-5-10(16-2)6-4-9\/h3-6H,7-8H2,1-2H3,(H,13,17)"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_5-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the SELFIES.\nSELFIES: [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2]\nConstraint: Even if you are not sure, you must answer with a representation without using any other 
words.\nResult: InChI=1S\/C16H26N2O\/c1-6-17(7-2)16(19)12-18(8-3)15-10-13(4)9-14(5)11-15\/h9-11H,6-8,12H2,1-5H3"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the InChI string from the SELFIES.\nSELFIES: ['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]']\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_0-5.jsonl": "{"text":"User: Can you generate the InChI string of the molecule with the SELFIES [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C]?\nAssistant: Sure, this molecule has a InChI string of 
InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1."} {"text":"User: Can you create the InChI string of the molecule with the SELFIES [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1]?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C32H26NO2.ClHO4\/c1-2-35-32(34)27-19-12-20-29(21-27)33-30(25-15-8-4-9-16-25)22-28(24-13-6-3-7-14-24)23-31(33)26-17-10-5-11-18-26;2-1(3,4)5\/h3-23H,2H2,1H3;(H,2,3,4,5)\/q+1;\/p-1."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_4-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the SELFIES [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C]?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C11H17N5O3\/c1-2-13-10(17)7-3-5-16(6-4-7)11(18)8-9(12)15-19-14-8\/h7H,2-6H2,1H3,(H2,12,15)(H,13,17)."} {"text":"User: Can you create the InChI string of the molecule with the SELFIES [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C]?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C18H22N4O\/c1-11(2)19-17(23)10-9-14-12(3)20-18-15-7-5-6-8-16(15)21-22(18)13(14)4\/h5-8,11H,9-10H2,1-4H3,(H,19,23)."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_1-0.jsonl": "{"text":"The molecule with the InChI string 
InChI=1S\/C19H23N5O2S\/c1-24-19(21-22-23-24)27-11-10-20-13-16-8-9-17(18(12-16)25-2)26-14-15-6-4-3-5-7-15\/h3-9,12,20H,10-11,13-14H2,1-2H3 can also be represented with the SELFIES representation [C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]."} {"text":"The molecule with the InChI string representation of InChI=1S\/C12H15N3OS\/c1-13-12(17)15-8-7-11(14-15)9-3-5-10(16-2)6-4-9\/h3-6H,7-8H2,1-2H3,(H,13,17) can also be represented with the SELFIES [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_0-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the InChI string.\nMolecule InChI string: InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C]"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the InChI string.\nInChI string: InChI=1S\/C32H26NO2.ClHO4\/c1-2-35-32(34)27-19-12-20-29(21-27)33-30(25-15-8-4-9-16-25)22-28(24-13-6-3-7-14-24)23-31(33)26-17-10-5-11-18-26;2-1(3,4)5\/h3-23H,2H2,1H3;(H,2,3,4,5)\/q+1;\/p-1\nConstraint: Even if you are uncertain, you must answer with a representation without using 
any additional words.\nResult: [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1]"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_4-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the InChI string.\nInChI string: InChI=1S\/C11H17N5O3\/c1-2-13-10(17)7-3-5-16(6-4-7)11(18)8-9(12)15-19-14-8\/h7H,2-6H2,1H3,(H2,12,15)(H,13,17)\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C]"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the InChI string.\nMolecule InChI string: InChI=1S\/C18H22N4O\/c1-11(2)19-17(23)10-9-14-12(3)20-18-15-7-5-6-8-16(15)21-22(18)13(14)4\/h5-8,11H,9-10H2,1-4H3,(H,19,23)\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C]"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_4-1.jsonl": "{"text":"The molecule with the SELFIES [C][C][C][C][C][C][C][Branch1][=C][N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][#C] can also be represented with the InChI string InChI=1S\/C12H20N4O2\/c1-2-3-8-4-6-9(7-5-8)14-12(17)10-11(13)16-18-15-10\/h8-9H,2-7H2,1H3,(H2,13,16)(H,14,17)."} {"text":"The molecule with the SELFIES representation of 
[C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1] can also be represented with the InChI string InChI=1S\/C18H22N2O3S\/c1-14-8-7-9-15(2)18(14)19-17(21)12-20(3)24(22,23)13-16-10-5-4-6-11-16\/h4-11H,12-13H2,1-3H3,(H,19,21)."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_5-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the SELFIES.\nSELFIES: [C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C14H23N3O3\/c1-3-11-10-13(20-16-11)12-6-4-8-17(12)14(18)15-7-5-9-19-2\/h10,12H,3-9H2,1-2H3,(H,15,18)"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the SELFIES.\nSELFIES: ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]']\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8-"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_2-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the InChI string.\nInChI string: 
InChI=1S\/C46H51ClN6O8S\/c1-45(2)15-13-33(39(28-45)31-3-5-34(47)6-4-31)29-51-19-21-52(22-20-51)35-7-10-38(43(26-35)61-36-8-11-40-32(25-36)14-18-48-40)44(55)50-62(58,59)37-9-12-41(42(27-37)53(56)57)49-46(30-54)16-23-60-24-17-46\/h3-12,14,18,25-27,48-49,54H,13,15-17,19-24,28-30H2,1-2H3,(H,50,55)\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: [C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N]"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SELFIES from the InChI string.\nMolecule InChI string: InChI=1S\/C21H18BrClN2O3\/c1-24(2)8-9-25-18(12-4-3-5-13(22)10-12)17-19(26)15-11-14(23)6-7-16(15)28-20(17)21(25)27\/h3-7,10-11,18H,8-9H2,1-2H3\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1]"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_1-1.jsonl": "{"text":"The molecule with the SELFIES [C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2] can also be represented with the InChI string representation 
InChI=1S\/C16H12N4O\/c1-10-6-8-11(9-7-10)14-18-15-12-4-2-3-5-13(12)17-16(21)20(15)19-14\/h2-9H,1H3,(H,17,21)."} {"text":"The molecule with the SELFIES [C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2] can also be represented with the InChI string representation InChI=1S\/C27H28F2N6O3\/c1-18-24(37-14-13-34-11-9-33(2)10-12-34)16-35-25(18)27(30-17-31-35)38-23-8-7-19(15-22(23)29)32-26(36)20-5-3-4-6-21(20)28\/h3-8,15-17H,9-14H2,1-2H3,(H,32,36)."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_0-1.jsonl": "{"text":"The molecule with the SELFIES [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C] can also be represented with the InChI string representation InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1."} {"text":"The molecule with the SELFIES [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1] can also be represented with the InChI string representation InChI=1S\/C32H26NO2.ClHO4\/c1-2-35-32(34)27-19-12-20-29(21-27)33-30(25-15-8-4-9-16-25)22-28(24-13-6-3-7-14-24)23-31(33)26-17-10-5-11-18-26;2-1(3,4)5\/h3-23H,2H2,1H3;(H,2,3,4,5)\/q+1;\/p-1."}", 
"/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_3-4.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the InChI string InChI=1S\/C25H27ClN2O7S\/c1-34-13-3-10-28-22(18-4-2-5-19(26)16-18)21(24(30)25(28)31)23(29)17-6-8-20(9-7-17)36(32,33)27-11-14-35-15-12-27\/h2,4-9,16,22,29H,3,10-15H2,1H3\/b23-21+?\nAssistant: Sure, this molecule has a SELFIES of [C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1]."} {"text":"User: Can you create the SELFIES of the molecule with the InChI string InChI=1S\/C16H19FN4O3\/c1-23-15-5-4-11(7-14(15)17)19-16(22)20-12-8-18-21(9-12)10-13-3-2-6-24-13\/h4-5,7-9,13H,2-3,6,10H2,1H3,(H2,19,20,22)?\nAssistant: Yes, this molecule has a SELFIES of [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_0-4.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the InChI string InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C]."} {"text":"User: Can you create the SELFIES of the molecule with the InChI string 
InChI=1S\/C32H26NO2.ClHO4\/c1-2-35-32(34)27-19-12-20-29(21-27)33-30(25-15-8-4-9-16-25)22-28(24-13-6-3-7-14-24)23-31(33)26-17-10-5-11-18-26;2-1(3,4)5\/h3-23H,2H2,1H3;(H,2,3,4,5)\/q+1;\/p-1?\nAssistant: Sure, this molecule has a SELFIES of [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_1-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C17H15N5OS2\/c1-20-8-7-11-12(9-20)25-15-13(11)14(23)21(10-5-3-2-4-6-10)16-18-19-17(24)22(15)16\/h2-6H,7-9H2,1H3,(H,19,24) can also be represented with the SELFIES representation [C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2]."} {"text":"The molecule with the InChI string InChI=1S\/C16H16BrF2N3O\/c17-12-6-11(18)7-13(19)16(12)20-15(23)9-22-8-10-4-2-1-3-5-14(10)21-22\/h6-8H,1-5,9H2,(H,20,23) can also be represented with the SELFIES [O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_4-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the SELFIES [C][C][C][C][C][C][C][Branch1][=C][N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][#C]?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C12H20N4O2\/c1-2-3-8-4-6-9(7-5-8)14-12(17)10-11(13)16-18-15-10\/h8-9H,2-7H2,1H3,(H2,13,16)(H,14,17)."} {"text":"User: Can you create the InChI string of the 
molecule with the SELFIES [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C18H22N2O3S\/c1-14-8-7-9-15(2)18(14)19-17(21)12-20(3)24(22,23)13-16-10-5-4-6-11-16\/h4-11H,12-13H2,1-3H3,(H,19,21)."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_4-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the SELFIES.\nSELFIES: [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C11H17N5O3\/c1-2-13-10(17)7-3-5-16(6-4-7)11(18)8-9(12)15-19-14-8\/h7H,2-6H2,1H3,(H2,12,15)(H,13,17)"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the InChI string from the SELFIES.\nSELFIES: [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C18H22N4O\/c1-11(2)19-17(23)10-9-14-12(3)20-18-15-7-5-6-8-16(15)21-22(18)13(14)4\/h5-8,11H,9-10H2,1-4H3,(H,19,23)"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_4-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the InChI string from the SELFIES.\nMolecule SELFIES: 
[C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C18H24N4O3\/c1-13(2)25-16-7-5-14(6-8-16)20-18(23)21-15-10-19-22(11-15)12-17-4-3-9-24-17\/h5-8,10-11,13,17H,3-4,9,12H2,1-2H3,(H2,20,21,23)"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the SELFIES.\nMolecule SELFIES: [N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C21H16N2O2\/c22-20(24)17-9-7-15(8-10-17)4-3-13-23-21(25)19-12-11-16-5-1-2-6-18(16)14-19\/h1-2,5-12,14H,13H2,(H2,22,24)(H,23,25)"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_3-1.jsonl": "{"text":"The molecule with the SELFIES [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1] can also be represented with the InChI string representation InChI=1S\/C19H20BrNO2.ClH\/c20-17-8-6-16(7-9-17)19(22)18(15-4-2-1-3-5-15)14-21-10-12-23-13-11-21;\/h1-9,18H,10-14H2;1H."} {"text":"The molecule with the SELFIES representation of [C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2] can also be represented with the InChI string InChI=1S\/C17H19FN2O3\/c1-3-23-15-7-5-4-6-12(15)11-19-17(21)20-13-8-9-16(22-2)14(18)10-13\/h4-10H,3,11H2,1-2H3,(H2,19,20,21)."}", 
"/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_2-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the InChI string from the SELFIES.\nSELFIES: [C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C13H15N3O2S\/c1-9-8-19-13(15-9)16-14-7-10-4-5-11(17-2)12(6-10)18-3\/h4-8H,1-3H3,(H,15,16)\/b14-7+"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the SELFIES.\nMolecule SELFIES: [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C8H10O3S\/c1-3-11-6-4-5-12-7(6)8(9)10-2\/h4-5H,3H2,1-2H3"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_3-1.jsonl": "{"text":"The molecule with the SELFIES representation of [O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1] can also be represented with the InChI string representation InChI=1S\/C16H12FN3O2S\/c17-11-3-1-4-12(9-11)18-15(21)10-23-16-7-6-13(19-20-16)14-5-2-8-22-14\/h1-9H,10H2,(H,18,21)."} {"text":"The molecule with the SELFIES [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N] can also be represented with the InChI string InChI=1S\/C10H17N5O3\/c1-4-15(5-2)7(16)6-14(3)10(17)8-9(11)13-18-12-8\/h4-6H2,1-3H3,(H2,11,13)."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_0-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create 
the InChI string from the SELFIES.\nSELFIES: [S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the SELFIES.\nSELFIES: [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C25H37N3O4\/c1-28(2)6-5-26-24(30)22(25-13-16-7-17(14-25)9-18(8-16)15-25)27-23(29)19-10-20(31-3)12-21(11-19)32-4\/h10-12,16-18,22H,5-9,13-15H2,1-4H3,(H,26,30)(H,27,29)"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_1-5.jsonl": "{"text":"User: Can you generate the InChI string of the molecule with the SELFIES [C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2]?\nAssistant: Sure, this 
molecule has a InChI string of InChI=1S\/C17H15N5OS2\/c1-20-8-7-11-12(9-20)25-15-13(11)14(23)21(10-5-3-2-4-6-10)16-18-19-17(24)22(15)16\/h2-6H,7-9H2,1H3,(H,19,24)."} {"text":"User: Can you create the InChI string of the molecule with the SELFIES [O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br]?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C16H16BrF2N3O\/c17-12-6-11(18)7-13(19)16(12)20-15(23)9-22-8-10-4-2-1-3-5-14(10)21-22\/h6-8H,1-5,9H2,(H,20,23)."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_4-1.jsonl": "{"text":"The molecule with the SELFIES [C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1] can also be represented with the InChI string representation InChI=1S\/C18H24N4O3\/c1-13(2)25-16-7-5-14(6-8-16)20-18(23)21-15-10-19-22(11-15)12-17-4-3-9-24-17\/h5-8,10-11,13,17H,3-4,9,12H2,1-2H3,(H2,20,21,23)."} {"text":"The molecule with the SELFIES [N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1] can also be represented with the InChI string InChI=1S\/C21H16N2O2\/c22-20(24)17-9-7-15(8-10-17)4-3-13-23-21(25)19-12-11-16-5-1-2-6-18(16)14-19\/h1-2,5-12,14H,13H2,(H2,22,24)(H,23,25)."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_5-4.jsonl": "{"text":"User: Can you create the SELFIES of the molecule with the InChI string InChI=1S\/C14H23N3O3\/c1-3-11-10-13(20-16-11)12-6-4-8-17(12)14(18)15-7-5-9-19-2\/h10,12H,3-9H2,1-2H3,(H,15,18)?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of 
[C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C]."} {"text":"User: Can you generate the SELFIES of the molecule with the InChI string InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8-?\nAssistant: Sure, this molecule has a SELFIES of ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]']."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/valid_2-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the SELFIES [C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C]?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C13H15N3O2S\/c1-9-8-19-13(15-9)16-14-7-10-4-5-11(17-2)12(6-10)18-3\/h4-8H,1-3H3,(H,15,16)\/b14-7+."} {"text":"User: Can you generate the InChI string of the molecule with the SELFIES [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C]?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C8H10O3S\/c1-3-11-6-4-5-12-7(6)8(9)10-2\/h4-5H,3H2,1-2H3."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_1-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the InChI string.\nInChI string: InChI=1S\/C16H12N4O\/c1-10-6-8-11(9-7-10)14-18-15-12-4-2-3-5-13(12)17-16(21)20(15)19-14\/h2-9H,1H3,(H,17,21)\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: 
[C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SELFIES from the InChI string.\nInChI string: InChI=1S\/C27H28F2N6O3\/c1-18-24(37-14-13-34-11-9-33(2)10-12-34)16-35-25(18)27(30-17-31-35)38-23-8-7-19(15-22(23)29)32-26(36)20-5-3-4-6-21(20)28\/h3-8,15-17H,9-14H2,1-2H3,(H,32,36)\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2]"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_0-4.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the InChI string InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1?\nAssistant: Yes, this molecule has a SELFIES of [S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1]."} {"text":"User: Can you tell me the SELFIES of the molecule with the InChI string InChI=1S\/C12H11N5S2\/c1-2-4-9(5-3-1)15-12-16-10(7-19-12)6-18-11-13-8-14-17-11\/h1-5,7-8H,6H2,(H,15,16)(H,13,14,17)?\nAssistant: Yes, 
this molecule has a SELFIES of [C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_3-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C19H20BrNO2.ClH\/c20-17-8-6-16(7-9-17)19(22)18(15-4-2-1-3-5-15)14-21-10-12-23-13-11-21;\/h1-9,18H,10-14H2;1H can also be represented with the SELFIES representation [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]."} {"text":"The molecule with the InChI string representation of InChI=1S\/C17H19FN2O3\/c1-3-23-15-7-5-4-6-12(15)11-19-17(21)20-13-8-9-16(22-2)14(18)10-13\/h4-10H,3,11H2,1-2H3,(H2,19,20,21) can also be represented with the SELFIES [C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/test_2-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the SELFIES.\nMolecule SELFIES: [C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N]\nConstraint: Even if you are not sure, you must answer with a representation without using any 
additional words.\nResult: InChI=1S\/C46H51ClN6O8S\/c1-45(2)15-13-33(39(28-45)31-3-5-34(47)6-4-31)29-51-19-21-52(22-20-51)35-7-10-38(43(26-35)61-36-8-11-40-32(25-36)14-18-48-40)44(55)50-62(58,59)37-9-12-41(42(27-37)53(56)57)49-46(30-54)16-23-60-24-17-46\/h3-12,14,18,25-27,48-49,54H,13,15-17,19-24,28-30H2,1-2H3,(H,50,55)"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the InChI string from the SELFIES.\nMolecule SELFIES: [C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C21H18BrClN2O3\/c1-24(2)8-9-25-18(12-4-3-5-13(22)10-12)17-19(26)15-11-14(23)6-7-16(15)28-20(17)21(25)27\/h3-7,10-11,18H,8-9H2,1-2H3"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_4-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C11H17N5O3\/c1-2-13-10(17)7-3-5-16(6-4-7)11(18)8-9(12)15-19-14-8\/h7H,2-6H2,1H3,(H2,12,15)(H,13,17) can also be represented with the SELFIES representation [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C]."} {"text":"The molecule with the InChI string representation of InChI=1S\/C18H22N4O\/c1-11(2)19-17(23)10-9-14-12(3)20-18-15-7-5-6-8-16(15)21-22(18)13(14)4\/h5-8,11H,9-10H2,1-4H3,(H,19,23) can also be represented with the SELFIES representation [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C]."}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_3-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SELFIES from the InChI 
string.\nMolecule InChI string: InChI=1S\/C19H20BrNO2.ClH\/c20-17-8-6-16(7-9-17)19(22)18(15-4-2-1-3-5-15)14-21-10-12-23-13-11-21;\/h1-9,18H,10-14H2;1H\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the InChI string.\nInChI string: InChI=1S\/C17H19FN2O3\/c1-3-23-15-7-5-4-6-12(15)11-19-17(21)20-13-8-9-16(22-2)14(18)10-13\/h4-10H,3,11H2,1-2H3,(H2,19,20,21)\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: [C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2]"}", "/scratch/micpie/export/mol_repr_transl_selfies_inchi/train_3-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the SELFIES.\nMolecule SELFIES: [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C19H20BrNO2.ClH\/c20-17-8-6-16(7-9-17)19(22)18(15-4-2-1-3-5-15)14-21-10-12-23-13-11-21;\/h1-9,18H,10-14H2;1H"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the InChI string from the SELFIES.\nMolecule SELFIES: 
[C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C17H19FN2O3\/c1-3-23-15-7-5-4-6-12(15)11-19-17(21)20-13-8-9-16(22-2)14(18)10-13\/h4-10H,3,11H2,1-2H3,(H2,19,20,21)"}", "/scratch/micpie/export/compound_chebi_chebi/valid_0-0.jsonl": "{"text":"The compound IUPAC name [4-[(2-fluorophenyl)methoxy]phenyl]-pyrrolidin-1-ylmethanethione is a [4-[(2-fluorophenyl)methoxy]phenyl]-(1-pyrrolidinyl)methanethione and is a aromatic ether."} {"text":"The compound DeepSMILES CC[C@H]C)[C@H]NC=O)[C@H]CccccO)cc6)))))))NC=O)[C@@H]CCCN5C=O)[C@H]CCCNC=N)N))))))NC=O)[C@H]CCCNC=N)N))))))NC=O)[C@@H]CCCN5C=O)[C@H]CCCCN)))))NC=O)[C@H]CCN)=O)))NC=O)[C@H]CCC=O)O))))NC=O)[C@H]CccccO)cc6)))))))NC=O)[C@H]CCC)C)))NC=O)[C@@H]CCC=O)N5)))))))))))))))))))))))))))))))))))))))))))C=O)N[C@@H]CCC)C)))C=O)O is a neurotensin and is the conjugate acid of neurotensin(1+)."}", "/scratch/micpie/export/compound_chebi_chebi/test_0-0.jsonl": "{"text":"The compound InChI InChI=1S\/C18H14N4O5S\/c23-16-9-6-13(11-15(16)18(24)25)21-20-12-4-7-14(8-5-12)28(26,27)22-17-3-1-2-10-19-17\/h1-11,23H,(H,19,22)(H,24,25)\/b21-20+ is a sulfasalazine and has the role gastrointestinal drug."} {"text":"The compound DeepSMILES COC=O)cccccc6SN)=O)=O is a 2-aminosulfonyl-benzoic acid methyl ester and is a sulfonamide."}", "/scratch/micpie/export/compound_chebi_chebi/train_0-0.jsonl": "{"text":"The compound InChI InChI=1S\/C6H16N2\/c7-5-3-1-2-4-6-8\/h1-8H2 is a hexane-1,6-diamine and has the functional parent N(1),N(6)-bis-DNCP-1,6-hexanediamine."} {"text":"The compound SMILES CCCCCCCC\/C=C\\CCCCCCCC(N)=O is a oleamide and has the role human metabolite."}", "/scratch/micpie/export/drug_chebi/valid_0-0.jsonl": "{"text":"The drug InChI=1S\/C8H10N4O2\/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2\/h4H,1-3H3 is a caffeine."} 
{"text":"The drug InChI=1S\/C17H17N7O8S4\/c1-23-16(20-21-22-23)34-4-5-3-33-15-17(32-2,14(31)24(15)7(5)11(29)30)19-9(26)13-35-12(36-13)6(8(18)25)10(27)28\/h13,15H,3-4H2,1-2H3,(H2,18,25)(H,19,26)(H,27,28)(H,29,30)\/t13?,15-,17+\/m1\/s1 is a cefotetan."}", "/scratch/micpie/export/drug_chebi/test_0-0.jsonl": "{"text":"The drug C[C@]12CC[C@@H]3c4ccc(O)cc4C[C@@H](CCCCCCCCCS(=O)CCCC(F)(F)C(F)(F)F)[C@H]3[C@@H]1CC[C@@H]2O is a fulvestrant."} {"text":"The drug N[C@H]CO)=O))C=CCO)=CCO)=C6 is a (S)-3,5-dihydroxyphenylglycine zwitterion."}", "/scratch/micpie/export/drug_chebi/train_0-0.jsonl": "{"text":"The drug [H][C@@][Branch1][C][C][C][=C][C][Branch1][C][O][=C][C][=C][Ring1][#Branch1][O][C@@][Branch1][C][H][Branch2][Ring1][Branch1][C][=C][C][=C][Branch1][O][O][C][C][N][C][C][C][C][Ring1][Branch1][C][=C][Ring1][=C][C@@][Ring2][Ring1][O][Branch1][C][H][C][=C][C][=C][Branch1][C][O][C][=C][Ring1][#Branch1] is a (2R,3R,4S)-3-(4-hydroxyphenyl)-4-methyl-2-[4-(2-pyrrolidin-1-ylethoxy)phenyl]chroman-6-ol."} {"text":"The drug O=C(OCCOCCO)c1ccccc1Nc1cccc(C(F)(F)F)c1 is a 2-[3-(trifluoromethyl)anilino]benzoic acid 2-(2-hydroxyethoxy)ethyl ester."}", "/scratch/micpie/export/bio_ner_10/valid_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: We reasoned that activated NK cells would be more useful for these experiments because (i) an encounter in vivo between NK cells that normally traffic in the blood (4) and iDCs which reside in the tissues (1, 3), should only occur when both cells are activated; (ii) activated NK cells mediate more potent lysis of iDCs (11-13), thereby allowing for a more rigorous examination of a potential cytotoxic effect; and (iii) the number of NK cells obtained after short-term in vitro culture would be far greater providing more cells for analysis (9)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: NK cells,27,35,Anatomy\nNK 
cells,121,129,Anatomy\nblood,159,164,Anatomy\niDCs,174,178,Anatomy\ntissues,199,206,Anatomy\ncells,244,249,Anatomy\nNK cells,281,289,Anatomy\niDCs,319,323,Anatomy\nNK cells,443,451,Anatomy\ncells,533,538,Anatomy"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: MAG metagenome assembled genome, R1 single-stage reactor, R2 acidogenic reactor of the two-stage, R3 methanogenic reactor of the two-stage The H2 utilization efficiency was calculated as previously described [ 15], using the following Eq. (1): 1 \\ documentclass [ 12pt] { minimal} \\ usepackage { amsmath} \\ usepackage { wasysym} \\ usepackage { amsfonts} \\ usepackage { amssymb} \\ usepackage { amsbsy} \\ usepackage { mathrsfs} \\ usepackage { upgreek} \\ setlength { \\ oddsidemargin} {-69pt} \\ begin { document} $ $ { H} _ 2 \\ \\ mathrm { utilization} \\ \\ mathrm { efficiency} = \\ frac { { \\ mathrm { H}} _ 2 \\ \\ mathrm { in} \\ mathrm { jected} \\ \\ left (\\ raisebox { 1ex} { $ \\ mathrm { mL} $} \\! \\ left \/ \\! \\ raisebox {-1ex} { $ \\ mathrm { L} \\ bullet \\ mathrm { day} $} \\ right. \\ right)-{ \\ mathrm { H}} _ 2 \\ kern0.5em \\ mathrm { in} \\ \\ mathrm { biogas} \\ \\ left (\\ raisebox { 1ex} { $ \\ mathrm { mL} $} \\! \\ left \/ \\! \\ raisebox {-1ex} { $ \\ mathrm { L} \\ bullet \\ mathrm { day} $} \\ right. \\ right)} { { \\ mathrm { H}} _ 2 \\ \\ mathrm { in} \\ mathrm { jected} \\ \\ left (\\ raisebox { 1ex} { $ \\ mathrm { mL} $} \\! \\ left \/ \\! \\ raisebox {-1ex} { $ \\ mathrm { L} \\ bullet \\ mathrm { day} $} \\ right. 
\\ right)} \\ times 100 $ $ \\ end { document} H2utilization efficiency = H2injectedmLLdayH2in biogasmLLdayH2injectedmLLday100 The CO2 conversion efficiency was calculated as follows (2): 2 \\ documentclass [ 12pt] { minimal} \\ usepackage { amsmath} \\ usepackage { wasysym} \\ usepackage { amsfonts} \\ usepackage { amssymb} \\ usepackage { amsbsy} \\ usepackage { mathrsfs} \\ usepackage { upgreek} \\ setlength { \\ oddsidemargin} {-69pt} \\ begin { document} $ $ { \\ mathrm { CO}} _ 2 \\ \\ mathrm { conversion} \\ \\ mathrm { efficiency} = \\ frac { \\ overline { { \\ mathrm { CO}} _ 2} \\ \\ mathrm { phaseI} \\ \\ left (\\ raisebox { 1ex} { $ \\ mathrm { mL} $} \\! \\ left \/ \\! \\ raisebox {-1ex} { $ \\ mathrm { L} \\ bullet \\ mathrm { day} $} \\ right. \\ right)-{ \\ mathrm { CO}} _ 2 \\ kern0.5em \\ mathrm { in} \\ \\ mathrm { biogas} \\ \\ mathrm { phase} \\ \\ mathrm { II} \\ \\ left (\\ raisebox { 1ex} { $ \\ mathrm { mL} $} \\! \\ left \/ \\! \\ raisebox {-1ex} { $ \\ mathrm { L} \\ bullet \\ mathrm { day} $} \\ right. \\ right)} { \\ overline { { \\ mathrm { CO}} _ 2} \\ \\ mathrm { phaseI} \\ \\ left (\\ raisebox { 1ex} { $ \\ mathrm { mL} $} \\! \\ left \/ \\! \\ raisebox {-1ex} { $ \\ mathrm { L} \\ bullet \\ mathrm { day} $} \\ right. \\ right)} \\ times 100 $ $ \\ end { document} CO2conversion efficiency = CO2phaseImLLdayCO2in biogas phaseIImLLdayCO2phaseImLLday100 Sample collection For metatranscriptomic analyses, triplicate samples (~ 30mL each) were collected from R1 and R3 at steady-state reactor operation of phase I (i. 
e., period with stable biogas production with a daily variation lower than 10% for at least 5days), and after 1week from the H2 injection (phase II)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: single - stage,36,50,state\nacidogenic,63,73,state\ntwo - stage,89,100,state\nmethanogenic,105,117,state\ntwo - stage,133,144,state\nbiogas,864,870,state\nbiogas,1315,1321,state\nbiogas,2035,2041,state\nsteady - state,2680,2694,state\nbiogas,2752,2758,state"}", "/scratch/micpie/export/bio_ner_10/test_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Thus, these studies on rat Std promoter function indicate that (i) HNF1 and C\/EBP are responsible for liver specificity of the rat Std gene; (ii) androgenic repression of the gene requires the presence of all of the OCT-1 and C\/EBP elements between positions-231 and-292; and (iii) AR may exert its negative regulatory effect indirectly through transcriptional interference of OCT-1 and C\/EBP rather than through a direct DNA-AR interaction..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: rat Std promoter,23,39,Gene\/Protein\nHNF1,68,72,Gene\/Protein\nC \/ EBP,77,84,Gene\/Protein\nrat Std gene,130,142,Gene\/Protein\nOCT - 1,220,227,Gene\/Protein\nC \/ EBP elements,232,248,Gene\/Protein\nAR,295,297,Gene\/Protein\nOCT - 1,390,397,Gene\/Protein\nC \/ EBP,402,409,Gene\/Protein\nDNA - AR,439,447,Gene\/Protein"} {"text":"Task: Please carry out the NER task for the the text below.\nText: This effect was specifically due to Tat, since Jurkat and U937 cells cotransfected either with tat cDNA in antisense orientation (tat\/AS), tat carrying a mutation in the aminoacid cys22-gly22 (tat 22\/S) or with the backbone vector alone (pRPneo-SL3) 
did not show any significant difference in c-fos promoter activity as compared to cells transfected with FC3 plasmid alone..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: Tat,36,39,Gene\/Protein\ntat cDNA,95,103,Gene\/Protein\ntat \/ AS,131,139,Gene\/Protein\ntat,142,145,Gene\/Protein\naminoacid cys22 - gly22,173,196,Gene\/Protein\ntat 22 \/ S,199,209,Gene\/Protein\nbackbone vector alone,223,244,Gene\/Protein\npRPneo - SL3,247,259,Gene\/Protein\nc - fos promoter,304,320,Gene\/Protein\nFC3 plasmid,368,379,Gene\/Protein"}", "/scratch/micpie/export/bio_ner_10/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: This study shows that angiogenin expression is up-regulated in the cytoplasmic and nuclear compartments in in situ carcinoma and invasive carcinoma compared with normal breast tissue and that angiogenin expression in invasive carcinomas is significantly positively associated with high tumour grade (p = 0.03), positive oestrogen receptor (ER) status (p = 0.01), HIF-1 alpha (p = 0.001) and DEC 1 (p = 0.001), but not with patient age (p = 0.8), tumour size (p = 0.25), lymph node status (p = 0.69), epidermal growth factor receptor (p = 0.56) or microvessel density (p = 0.32)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: cytoplasmic,69,80,Anatomy\nnuclear compartments,85,105,Anatomy\nin situ carcinoma,109,126,Anatomy\ninvasive carcinoma,131,149,Anatomy\nbreast tissue,171,184,Anatomy\ninvasive carcinomas,219,238,Anatomy\ntumour,288,294,Anatomy\ntumour,461,467,Anatomy\nlymph node,487,497,Anatomy\nmicrovessel,568,579,Anatomy"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Since sulfate concentrations 
are naturally higher in marine seawater than in oligohaline water, the estimated sea water sulfate (SO42-) concentration at the respective salinities (28.2 mM SO42-at a salinity of 35; 16.9 mM SO42-at a salinity of 21; 6.4 mM SO42-at a salinity of 8; 5.6. 2 mM SO42-at a salinity of 7; 3.2 mM SO42-at a salinity of 4) was subtracted from TSul to obtain a salinity-corrected value (Sulred)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: sulfate concentrations,6,28,state\nmarine,53,59,state\noligohaline,77,88,state\nsalinities,170,180,state\nsalinity of 35,204,218,state\nsalinity of 21,241,255,state\nsalinity of 8,277,290,state\nsalinity of 7,315,328,state\nsalinity of 4,350,363,state\nsalinity,402,410,state"}", "/scratch/micpie/export/MUV_712/valid_0-0.jsonl": "{"text":"The molecule with the SMILES COc1ccc(C2C(C(=O)NCc3ccccc3)=C(C)N=C3N=CNN32)cc1OC is not an inhibitor of the heat shock protein 90."} {"text":"The compound with the SELFIES ['[C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][Branch1][C][C][=N][N][C][Branch1][P][N][C][C][C][Branch1][Branch1][C][C][Ring1][=Branch1][O][C][C][O][Ring1][#Branch1][=C][C][Branch1][C][C][=N][C][=Ring2][Ring1][Branch1][Ring1][P]'] is not an inhibitor of the heat shock protein 90."}", "/scratch/micpie/export/MUV_712/test_0-0.jsonl": "{"text":"The chemical with the SMILES representation of NC(=O)NC(Cc1ccccc1)C(=O)O is not an inhibitor of the heat shock protein 90."} {"text":"The compound with the canonical SMILES COCCN1C(SCc2cccnc2)=NN\/C1=C1\\C=Nc2ccccc21 is not an inhibitor of the heat shock protein 90."}", "/scratch/micpie/export/MUV_712/train_0-0.jsonl": "{"text":"The molecule with the canonical SMILES Cc1cccc(N2CCN(C(=O)C34CC5CC(CC(C5)C3)C4)CC2)c1C is not an inhibitor of HSP90."} {"text":"The molecule with the DeepSMILES representation of COcccNCCCCCNC=O)cccccc6C9=O))))))))))))))))cncccc6c%10 is not an inhibitor of the 
heat shock protein 90."}", "/scratch/micpie/export/bio_ner_5/valid_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: (E) The effect of neutralizing IFN-gamma production was tested under the following conditions: DCs alone (gray bars); NK+ DC (1: 5) (black bars); NK+ DC (1: 5) and 10 mug\/ml of blocking anti-IFN-gamma mAb (stippled bars)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: DCs,98,101,Anatomy\nNK,122,124,Anatomy\nDC,127,129,Anatomy\nNK,153,155,Anatomy\nDC,158,160,Anatomy"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Searching for genes involved in lactate utilization under anaerobic conditions The following enzymes involved in anaerobic lactate oxidation were selected as query protein sequences for tBLASTn searches of the prepared custom database (maximal E-value of 1e1, word size set to 3, BLOSUM62 matrix, and gap open\/extend cost of 11\/1): lactate permease WP _ 014355268 (AWO _ RS04425) (LldP), lactate racemase WP _ 014355269 (AWO _ RS04430) (LarA), electron transfer flavoprotein subunit alpha WP _ 014355266 (AWO _ RS04415) (EtfA), FAD\/FMN-containing dehydrogenase WP _ 014355267 (AWO _ RS04420) (GlcD), electron transporter RnfC WP _ 014356580 (AWO _ RS11370) all from Acetobacterium woodii DSM 1030 genome NC _ 016894; l-lactate utilization protein LutB containing FeS oxidoreductase WP _ 028317114 (Q362 _ RS0100810) from Desulfobulbus elongatus DSM 2908 assembly ASM62114v1; [ FeFe]-hydrogenase large subunit Fe, Fe _ hydrog _ A WP _ 012939287 (ACFER _ RS10010) from Acidaminococcus fermentans DSM 20731 genome NC _ 013740; Ni, Fe-hydrogenase III large subunit WP _ 075074147 (LARV _ RS13630) from Longilinea arvoryzae strain KOME-1 assembly ASM105023v2; and NADH-quinone oxidoreductase subunit C WP _ 004312112 (HMPREF1074 _ RS21770) from 
Bacteroides xylanisolvens CL03T12C04 assembly Bact _ xyla _ CL03T12C04 _ V1..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: lactate,32,39,treatment\nlactate,123,130,treatment\nlactate,339,346,treatment\nlactate,397,404,treatment\nlactate,741,748,treatment"}", "/scratch/micpie/export/bio_ner_5/test_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: (A) TNF-alpha production was measured in the supernatants from cultures of: DCs alone (gray bars); NK+ DC (1: 5) (black bars)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: supernatants,48,60,Anatomy\ncultures,66,74,Anatomy\nDCs,79,82,Anatomy\nNK,103,105,Anatomy\nDC,108,110,Anatomy"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: For one experiment, two groups of animals (nave, who received untreated water-and ABX-treated, who received ABX-treated water) were sacrificed following ABX-treated or untreated water exposure (Supplemental Figure S1A)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: untreated water,63,78,treatment\nABX,85,88,treatment\nABX - treated water,113,132,treatment\nABX,160,163,treatment\nuntreated water,177,192,treatment"}", "/scratch/micpie/export/bio_ner_5/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: The observation that AP-3 is found in both cytosolic and membrane subcellular fractions, and that membrane-associated AP-3 can be extracted with salts (Dell'Angelica et al., 1997a) is consistent with its role as a membrane coat that can be recruited from a cytosolic pool..\nConstrain: 
Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: cytosolic,45,54,Anatomy\nmembrane subcellular fractions,59,89,Anatomy\nmembrane,100,108,Anatomy\nmembrane,223,231,Anatomy\ncytosolic pool,266,280,Anatomy"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: n. NFIJ00000000) or Bacteroides clarus An43 (Bacteroidetes, NFII00000000) as representatives of Gram-negative bacteria, or with Megamonas hypermegale An288, (Selenomonadales\/Firmicutes, NFIW00000000), Lactobacillus reuteri An71 (Lactobacillaceae\/Firmicutes, NFHN00000000), Butyricicoccus pullicaecorum An179 (Ruminococcaceae\/Firmicutes, NFKL00000000) or Blautia producta An81 (Lachnospiraceae\/Firmicutes, NFKQ00000000) as representatives of Gram-positive bacteria..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: Bacteroides clarus An43,20,43,treatment\nMegamonas hypermegale An288,131,158,treatment\nLactobacillus reuteri An71,207,233,treatment\nButyricicoccus pullicaecorum An179,282,316,treatment\nBlautia producta An81,366,387,treatment"}", "/scratch/micpie/export/compound_protein_pathway_disease_1/test_4-0.jsonl": "{"text":"The compound O=[N+]([O-])c1cccc2cn[nH]c12 targets the protein cNOS. The protein cNOS is involved in Calcium signaling pathway. The Calcium signaling pathway is modulated by the disease Waardenburg syndrome."} {"text":"The compound COc1c(C(=O)N[C@@H]2CCN(Cc3ccccc3)C2)cc(Br)c2ccccc12 targets the protein Dopamine D4 receptor. The protein Dopamine D4 receptor is involved in Dopaminergic synapse. 
The Dopaminergic synapse is modulated by the disease Episodic ataxias."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/valid_5-0.jsonl": "{"text":"The compound N#CC(C#N)=Cc1cc(CN2CCN(c3ccccc3)CC2)c[nH]1 targets the protein Dopamine D4 receptor. The protein Dopamine D4 receptor is involved in Dopaminergic synapse. The Dopaminergic synapse is modulated by the disease Episodic ataxias."} {"text":"The compound InChI=1S\/C47H61N9O10\/c1-3-5-15-35(53-43(62)33(48)22-29-18-20-31(57)21-19-29)44(63)51-27-40(58)52-38(24-30-26-50-34-17-11-10-14-32(30)34)46(65)54-36(16-6-4-2)45(64)56-39(25-41(59)60)47(66)55-37(42(49)61)23-28-12-8-7-9-13-28\/h7-14,17-21,26,33,35-39,50,57H,3-6,15-16,22-25,27,48H2,1-2H3,(H2,49,61)(H,51,63)(H,52,58)(H,53,62)(H,54,65)(H,55,66)(H,56,64)(H,59,60)\/t33-,35+,36-,37-,38+,39-\/m0\/s1 targets the protein Delta-type opioid receptor. The protein Delta-type opioid receptor is involved in Neuroactive ligand-receptor interaction. The Neuroactive ligand-receptor interaction is modulated by the disease Schizophrenia."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/test_9-0.jsonl": "{"text":"The compound [O][=C][Branch1][C][O][C][=C][C][=C][C][Branch2][Ring2][Ring1][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][\/C][=Branch1][S][=N][\\N][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][S][Ring1][=Branch2][C][C][N][Ring2][Ring1][C][=N][Ring2][Ring1][O] targets the protein Bcl2-L-1. The protein Bcl2-L-1 is involved in JAK-STAT signaling pathway. The JAK-STAT signaling pathway is modulated by the disease Laron syndrome."} {"text":"The compound O=CcscCOcccccc6))))))))nc5CCN9CCC3 targets the protein mGluR5. The protein mGluR5 is involved in Neuroactive ligand-receptor interaction. 
The Neuroactive ligand-receptor interaction is modulated by the disease Genetic obesity."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/valid_3-0.jsonl": "{"text":"The compound COcccccc6)NC=O)Cccnccc-cccccc6))))))nn5c9-%14 targets the protein FLK-1. The protein FLK-1 is involved in Ras signaling pathway. The Ras signaling pathway is modulated by the disease Watson syndrome."} {"text":"The compound InChI=1S\/C22H25ClFN3\/c23-17-6-1-4-16(12-17)14-26-11-3-5-15-8-9-18-19-7-2-10-22(19,24)21(25)27-20(18)13-15\/h1,4,6,8-9,12-13,19,26H,2-3,5,7,10-11,14H2,(H2,25,27)\/t19-,22+\/m1\/s1 targets the protein Nitric oxide synthase, endothelial. The protein Nitric oxide synthase, endothelial is involved in Calcium signaling pathway. The Calcium signaling pathway is modulated by the disease Waardenburg syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/test_5-0.jsonl": "{"text":"The compound [C][O][C][=C][C][=C][Branch2][Ring1][S][C][=C][C][=C][C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=N][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][=Branch2] targets the protein D(2C) dopamine receptor. The protein D(2C) dopamine receptor is involved in Dopaminergic synapse. The Dopaminergic synapse is modulated by the disease Episodic ataxias."} {"text":"The compound InChI=1S\/C22H20N4O\/c1-25(2)22-24-23-21(26(22)19-12-7-13-20(27)15-19)18-11-6-10-17(14-18)16-8-4-3-5-9-16\/h3-15,27H,1-2H3 targets the protein Delta-type opioid receptor. The protein Delta-type opioid receptor is involved in Neuroactive ligand-receptor interaction. 
The Neuroactive ligand-receptor interaction is modulated by the disease Schizophrenia."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/test_2-0.jsonl": "{"text":"The compound [C][=C][C][Branch1][#Branch1][S][C][C][C][Ring1][Ring1][=C][C][=C][Ring1][#Branch2][C][N][C][C][C][C][Branch1][#C][N][C][=C][C][=C][NH1][N][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][C][Ring1][S] targets the protein Rho-associated, coiled-coil-containing protein kinase 2. The protein Rho-associated, coiled-coil-containing protein kinase 2 is involved in Tight junction. The Tight junction is modulated by the disease Progressive familial intrahepatic cholestasis."} {"text":"The compound COc1ccc(N2Cc3cnc(Nc4ccc(F)cc4)nc3N([C@@H]3CC[C@@H](O)C3)C2=O)cc1 targets the protein CD antigen CD309. The protein CD antigen CD309 is involved in Ras signaling pathway. The Ras signaling pathway is modulated by the disease Watson syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/valid_0-0.jsonl": "{"text":"The compound InChI=1S\/C22H24ClN3O3\/c1-2-3-12-24-20(27)11-13-25-21(28)18-9-4-5-10-19(18)26(22(25)29)15-16-7-6-8-17(23)14-16\/h4-10,14H,2-3,11-13,15H2,1H3,(H,24,27) targets the protein Antigen NY-CO-13. The protein Antigen NY-CO-13 is involved in Cellular senescence. The Cellular senescence is modulated by the disease Ataxia telangiectasia."} {"text":"The compound [C][N][Branch1][C][C][C][C][N][C][C][N][Branch2][Ring2][Branch2][C][=Branch1][C][=O][C][C][=C][Branch1][=Branch2][C][Branch1][C][C][Branch1][C][C][C][S][C][=Ring1][=Branch2][N][C][=Branch1][C][=O][N][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1][Cl][C][C][C][Ring2][Ring1][=C][=O] targets the protein Calcium\/calmodulin-dependent protein kinase type II subunit beta. The protein Calcium\/calmodulin-dependent protein kinase type II subunit beta is involved in Dopaminergic synapse. 
The Dopaminergic synapse is modulated by the disease Early infantile epileptic encephalopathy."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/test_7-0.jsonl": "{"text":"The compound InChI=1S\/C19H12Cl2F3N5O2S\/c20-13-2-1-3-14(21)17(13)30-8-15-26-16(31-28-15)9-32-18-27-25-10-29(18)12-6-4-11(5-7-12)19(22,23)24\/h1-7,10H,8-9H2 targets the protein Proto-oncogene c-ErbB-1. The protein Proto-oncogene c-ErbB-1 is involved in Regulation of actin cytoskeleton. The Regulation of actin cytoskeleton is modulated by the disease X-linked mental retardation."} {"text":"The compound CCOc1cc(CN2CCC(Nc3nc4cc(OC(F)(F)F)ccc4o3)CC2)ccc1OC targets the protein SS5-R. The protein SS5-R is involved in Neuroactive ligand-receptor interaction. The Neuroactive ligand-receptor interaction is modulated by the disease Febrile seizures."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/test_3-0.jsonl": "{"text":"The compound C[C@]O[C@H]C[C@]5O)CO))))ncccccc6cccccccccc6n%21c9c%13%20))))))))))CNC5=O targets the protein Fetal liver kinase 1. The protein Fetal liver kinase 1 is involved in Ras signaling pathway. The Ras signaling pathway is modulated by the disease Watson syndrome."} {"text":"The compound [N][C][=N][C][C][C][N][Ring1][=Branch1] targets the protein Constitutive NOS. The protein Constitutive NOS is involved in Calcium signaling pathway. The Calcium signaling pathway is modulated by the disease Waardenburg syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/train_1-0.jsonl": "{"text":"The compound COccccOcccc[nH]cC)cc5c9F)))))))))))ncnc6cc%10OCCCNCCCC5 targets the protein CaM kinase II subunit beta. The protein CaM kinase II subunit beta is involved in Dopaminergic synapse. 
The Dopaminergic synapse is modulated by the disease Early infantile epileptic encephalopathy."} {"text":"The compound InChI=1S\/C20H21N3\/c1-2-5-16(6-3-1)14-23-12-10-18(15-23)22-20-8-4-7-17-13-21-11-9-19(17)20\/h1-9,11,13,18,22H,10,12,14-15H2 targets the protein Rho-associated, coiled-coil-containing protein kinase II. The protein Rho-associated, coiled-coil-containing protein kinase II is involved in Tight junction. The Tight junction is modulated by the disease Progressive familial intrahepatic cholestasis."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/test_0-0.jsonl": "{"text":"The compound O=CNccccF)cc6)))))))CCCCNcncncc6ncn5CCCCC7))))))))))))))C6 targets the protein Antigen NY-CO-13. The protein Antigen NY-CO-13 is involved in Cellular senescence. The Cellular senescence is modulated by the disease Ataxia telangiectasia."} {"text":"The compound InChI=1S\/C14H10N4\/c1-2-11-8-17-18-13(11)6-9(1)12-5-10-3-4-15-14(10)16-7-12\/h1-8H,(H,15,16)(H,17,18) targets the protein CaM kinase II subunit beta. The protein CaM kinase II subunit beta is involved in Dopaminergic synapse. The Dopaminergic synapse is modulated by the disease Early infantile epileptic encephalopathy."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/test_6-0.jsonl": "{"text":"The compound CNCC[C@]CCCC[C@H]6[C@H]%10CccccNCcccccO)c6))))))))cc6%14 targets the protein D-OR-1. The protein D-OR-1 is involved in Neuroactive ligand-receptor interaction. The Neuroactive ligand-receptor interaction is modulated by the disease Schizophrenia."} {"text":"The compound C=C(CN(C)C)C(=O)c1ccccc1.Cl targets the protein Receptor tyrosine-protein kinase erbB-1. The protein Receptor tyrosine-protein kinase erbB-1 is involved in Regulation of actin cytoskeleton. 
The Regulation of actin cytoskeleton is modulated by the disease X-linked mental retardation."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/train_2-0.jsonl": "{"text":"The compound CS(=O)(=O)c1ccc(CN2CCC(Nc3cccc4cnccc34)C2)cc1 targets the protein p164 ROCK-2. The protein p164 ROCK-2 is involved in Tight junction. The Tight junction is modulated by the disease Progressive familial intrahepatic cholestasis."} {"text":"The compound [C][O][C][=C][C][=C][Branch2][Ring1][N][N][C][=N][C][=C][C][=Branch1][Ring2][=N][Ring1][=Branch1][C][=N][C][=C][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][Ring1][=N][C][=C][Ring2][Ring1][#Branch1] targets the protein Protein-tyrosine kinase receptor flk-1. The protein Protein-tyrosine kinase receptor flk-1 is involved in Ras signaling pathway. The Ras signaling pathway is modulated by the disease Watson syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/valid_2-0.jsonl": "{"text":"The compound OCCOcccccCNCCCNcccc[nH]ncc5c9))))))))))CC6)))))))c6 targets the protein ROCK-II. The protein ROCK-II is involved in Tight junction. The Tight junction is modulated by the disease Progressive familial intrahepatic cholestasis."} {"text":"The compound CCOc1cc2ncc(C#N)c(Nc3ccc(F)c(Cl)c3)c2cc1NC(=O)\/C=C\/CN(C)C targets the protein VEGFR-2. The protein VEGFR-2 is involved in Ras signaling pathway. The Ras signaling pathway is modulated by the disease Watson syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/valid_4-0.jsonl": "{"text":"The compound InChI=1S\/C20H28ClN5\/c1-14-8-18(26-20(22)9-14)10-16-12-24-13-19(16)25-7-6-23-11-15-2-4-17(21)5-3-15\/h2-5,8-9,16,19,23-25H,6-7,10-13H2,1H3,(H2,22,26)\/t16-,19+\/m0\/s1 targets the protein NOS type III. The protein NOS type III is involved in Calcium signaling pathway. 
The Calcium signaling pathway is modulated by the disease Waardenburg syndrome."} {"text":"The compound N#CC(C#N)=Cc1[nH]ccc1CN1CCN(c2ccc(Cl)cc2)CC1 targets the protein D(4) dopamine receptor. The protein D(4) dopamine receptor is involved in Dopaminergic synapse. The Dopaminergic synapse is modulated by the disease Episodic ataxias."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/train_0-0.jsonl": "{"text":"The compound CCOC(=O)c1ccccc1NC(=O)CN(c1ccccc1OC)S(=O)(=O)c1ccc(C)cc1 targets the protein Antigen NY-CO-13. The protein Antigen NY-CO-13 is involved in Cellular senescence. The Cellular senescence is modulated by the disease Ataxia telangiectasia."} {"text":"The compound InChI=1S\/C28H39N7O3\/c1-5-22-27(37)34(3)23-17-29-28(32-25(23)35(22)20-8-6-7-9-20)31-21-11-10-18(16-24(21)38-4)26(36)30-19-12-14-33(2)15-13-19\/h10-11,16-17,19-20,22H,5-9,12-15H2,1-4H3,(H,30,36)(H,29,31,32) targets the protein CaM kinase II subunit beta. The protein CaM kinase II subunit beta is involved in Dopaminergic synapse. The Dopaminergic synapse is modulated by the disease Early infantile epileptic encephalopathy."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/valid_9-0.jsonl": "{"text":"The compound Cc1oc(-c2ccc(Cl)cc2)c(-c2cccc(N3CCN(c4ccc(NS(=O)(=O)c5ccc(N[C@H](CCN(C)C)CSc6ccccc6)c([N+](=O)[O-])c5)cc4)CC3)c2)c1C(=O)O targets the protein Bcl2-L-1. The protein Bcl2-L-1 is involved in JAK-STAT signaling pathway. The JAK-STAT signaling pathway is modulated by the disease Laron syndrome."} {"text":"The compound [C][C][C][C][=Branch1][C][=O][C][=C][N][=C][Branch1][=N][N][C@H1][C][C][C@H1][Branch1][C][C][C][C][Ring1][#Branch1][N][=C][Ring1][=C][C][Ring2][Ring1][Ring1] targets the protein mGluR5. The protein mGluR5 is involved in Neuroactive ligand-receptor interaction. 
The Neuroactive ligand-receptor interaction is modulated by the disease Genetic obesity."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/train_8-0.jsonl": "{"text":"The compound [C][C][O][C][=C][C][Branch2][Ring1][#Branch2][C][N][C][C][C][Branch1][#C][N][C][=N][C][=C][C][=N][C][=C][Ring1][=Branch1][O][Ring1][=Branch2][C][C][Ring1][S][=C][C][=C][Ring2][Ring1][#Branch1][O][C] targets the protein Somatostatin receptor type 5. The protein Somatostatin receptor type 5 is involved in Neuroactive ligand-receptor interaction. The Neuroactive ligand-receptor interaction is modulated by the disease Febrile seizures."} {"text":"The compound CcccC=O)NS=O)=O)ccccN[C@H]CCNC)C))))CScccccc6))))))))))c[N+]=O)[O-]))c6)))))))))ccC)c6-ccccccCCCOcccccccccc%106))))))))))))))cC=O)O))nn95 targets the protein Apoptosis regulator Bcl-X. The protein Apoptosis regulator Bcl-X is involved in JAK-STAT signaling pathway. The JAK-STAT signaling pathway is modulated by the disease Laron syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/train_5-0.jsonl": "{"text":"The compound c1ccc(N2CCN(Cc3ccc[nH]3)CC2)cc1 targets the protein D(4) dopamine receptor. The protein D(4) dopamine receptor is involved in Dopaminergic synapse. The Dopaminergic synapse is modulated by the disease Episodic ataxias."} {"text":"The compound [C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][Branch1][C][C][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C@@][C][C][C][C][C@H1][Ring1][=Branch1][C@@H1][Branch1][Ring2][C][Ring1][O][N][Branch1][Branch2][C][C][C][C][C][Ring1][Ring2][C][C][Ring1][S] targets the protein DOR-1. The protein DOR-1 is involved in Neuroactive ligand-receptor interaction. 
The Neuroactive ligand-receptor interaction is modulated by the disease Schizophrenia."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/train_9-0.jsonl": "{"text":"The compound CC[C@H](C)[C@H](NC(=O)[C@H](C)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](CCCN=C(N)N)NC(=O)CNC(=O)[C@@H](NC(=O)[C@H](CCC(N)=O)NC(=O)CN)C(C)C)C(=O)N[C@H](C(=O)NCC(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@H](C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCCN=C(N)N)C(=O)O)[C@@H](C)CC)[C@@H](C)CC targets the protein Apoptosis regulator Bcl-X. The protein Apoptosis regulator Bcl-X is involved in JAK-STAT signaling pathway. The JAK-STAT signaling pathway is modulated by the disease Laron syndrome."} {"text":"The compound CccccNcncccn6)CCC)CC6=O)))))))))))cn6 targets the protein Metabotropic glutamate receptor 5. The protein Metabotropic glutamate receptor 5 is involved in Neuroactive ligand-receptor interaction. The Neuroactive ligand-receptor interaction is modulated by the disease Genetic obesity."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/test_1-0.jsonl": "{"text":"The compound Cc1cccc(NC(=O)Nc2ccc(-c3csc4c(C#CCN5CCOCC5)cnc(N)c34)cc2)c1 targets the protein CaMK-II subunit gamma. The protein CaMK-II subunit gamma is involved in Dopaminergic synapse. The Dopaminergic synapse is modulated by the disease Early infantile epileptic encephalopathy."} {"text":"The compound [O][=C][Branch2][Ring2][Branch2][C][O][C][=C][C][=C][C][Branch2][Ring1][#Branch2][C][N][C][C][C@@H1][Branch1][S][N][C][=C][C][=C][C][=C][N][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][Ring1][S][=C][Ring2][Ring1][#Branch1][N][C][C][O][C][C][Ring1][=Branch1] targets the protein Rho kinase 2. The protein Rho kinase 2 is involved in Tight junction. 
The Tight junction is modulated by the disease Progressive familial intrahepatic cholestasis."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/valid_7-0.jsonl": "{"text":"The compound InChI=1S\/C28H26N6O3S2\/c35-28(34-11-13-36-14-12-34)37-21-15-20(30-17-21)6-9-23-16-24-26(39-23)27(32-18-31-24)33-19-4-7-22(8-5-19)38-25-3-1-2-10-29-25\/h1-5,7-8,10,16,18,20-21,30H,11-15,17H2,(H,31,32,33)\/t20-,21-\/m1\/s1 targets the protein Receptor tyrosine-protein kinase erbB-1. The protein Receptor tyrosine-protein kinase erbB-1 is involved in Regulation of actin cytoskeleton. The Regulation of actin cytoskeleton is modulated by the disease X-linked mental retardation."} {"text":"The compound CCOcccNCCCNcncccccc6s9))))))))))CC6))))))ccc6OC targets the protein SS5-R. The protein SS5-R is involved in Neuroactive ligand-receptor interaction. The Neuroactive ligand-receptor interaction is modulated by the disease Febrile seizures."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/valid_8-0.jsonl": "{"text":"The compound InChI=1S\/C28H32N4O5S\/c1-3-36-27-17-20(9-11-26(27)35-2)19-32-15-13-21(14-16-32)29-28-30-24-18-22(10-12-25(24)37-28)31-38(33,34)23-7-5-4-6-8-23\/h4-12,17-18,21,31H,3,13-16,19H2,1-2H3,(H,29,30) targets the protein SS5R. The protein SS5R is involved in Neuroactive ligand-receptor interaction. The Neuroactive ligand-receptor interaction is modulated by the disease Febrile seizures."} {"text":"The compound InChI=1S\/C44H42N6O3S\/c45-24-33-30(7-4-10-36(33)50-16-5-12-38(50)44-21-26-18-27(22-44)20-28(19-26)23-44)31-13-14-39(47-40(31)42(52)53)49-17-15-29-6-3-8-32(34(29)25-49)41(51)48-43-46-35-9-1-2-11-37(35)54-43\/h1-4,6-11,13-14,26-28,38H,5,12,15-23,25H2,(H,52,53)(H,46,48,51) targets the protein Apoptosis regulator Bcl-X. The protein Apoptosis regulator Bcl-X is involved in JAK-STAT signaling pathway. 
The JAK-STAT signaling pathway is modulated by the disease Laron syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/valid_1-0.jsonl": "{"text":"The compound CN[C@@H]1C[C@H]2O[C@@](C)([C@@H]1OC)n1c3ccccc3c3c4c(c5c6ccccc6n2c5c31)C(=O)NC4 targets the protein Calcium\/calmodulin-dependent protein kinase type II subunit beta. The protein Calcium\/calmodulin-dependent protein kinase type II subunit beta is involved in Dopaminergic synapse. The Dopaminergic synapse is modulated by the disease Early infantile epileptic encephalopathy."} {"text":"The compound c1cc(N[C@H]2CCN(Cc3ccc(C4CC4)cc3)C2)c2ccncc2c1 targets the protein Rho-associated, coiled-coil-containing protein kinase 2. The protein Rho-associated, coiled-coil-containing protein kinase 2 is involved in Tight junction. The Tight junction is modulated by the disease Progressive familial intrahepatic cholestasis."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/train_6-0.jsonl": "{"text":"The compound Oc1ccc(CNc2ccc3c(c2)[C@@]24CCCC[C@H]2[C@@H](C3)N(CC2CC2)CC4)cc1 targets the protein DOR-1. The protein DOR-1 is involved in Neuroactive ligand-receptor interaction. The Neuroactive ligand-receptor interaction is modulated by the disease Schizophrenia."} {"text":"The compound [C][C][N][C][=Branch1][C][=O][O][C@H1][C][N][C@@H1][Branch2][Ring2][S][C][#C][C][=C][C][=N][C][=N][C][Branch2][Ring1][N][N][C][=C][C][=C][Branch1][=C][O][C][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1][C][Branch1][C][Cl][=C][Ring1][S][=C][Ring2][Ring1][#Branch1][S][Ring2][Ring1][#Branch2][C][Ring2][Ring1][P] targets the protein Epidermal growth factor receptor. The protein Epidermal growth factor receptor is involved in Regulation of actin cytoskeleton. 
The Regulation of actin cytoskeleton is modulated by the disease X-linked mental retardation."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/test_8-0.jsonl": "{"text":"The compound [C][C][O][C][=C][C][Branch2][Ring1][#Branch2][C][N][C][C][C][Branch1][#C][N][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][O][Ring1][=Branch2][C][C][Ring1][S][=C][C][=C][Ring2][Ring1][#Branch1][C][Branch1][C][F][Branch1][C][F][F] targets the protein SST5. The protein SST5 is involved in Neuroactive ligand-receptor interaction. The Neuroactive ligand-receptor interaction is modulated by the disease Febrile seizures."} {"text":"The compound CcccC)nncS=O)=O)NC=O)ccccNCCNCcccccc6-ccccCl)cc6)))))))))))))CC6))))))cc6)))))))))nc5n9 targets the protein Apoptosis regulator Bcl-X. The protein Apoptosis regulator Bcl-X is involved in JAK-STAT signaling pathway. The JAK-STAT signaling pathway is modulated by the disease Laron syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/valid_6-0.jsonl": "{"text":"The compound InChI=1S\/C25H25N5\/c1-3-8-21(9-4-1)20-30-19-16-26-23(30)25(22-10-5-2-6-11-22)12-17-29(18-13-25)24-27-14-7-15-28-24\/h1-11,14-16,19H,12-13,17-18,20H2 targets the protein Delta-type opioid receptor. The protein Delta-type opioid receptor is involved in Neuroactive ligand-receptor interaction. The Neuroactive ligand-receptor interaction is modulated by the disease Schizophrenia."} {"text":"The compound InChI=1S\/C23H19ClFN7S\/c1-2-19-26-9-10-32(19)12-14-3-5-15(6-4-14)30-23-31-22-20(33-23)21(27-13-28-22)29-16-7-8-18(25)17(24)11-16\/h3-11,13H,2,12H2,1H3,(H2,27,28,29,30,31) targets the protein Epidermal growth factor receptor. The protein Epidermal growth factor receptor is involved in Regulation of actin cytoskeleton. 
The Regulation of actin cytoskeleton is modulated by the disease X-linked mental retardation."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/train_3-0.jsonl": "{"text":"The compound InChI=1S\/C24H28N6\/c1-17-21-9-10-28-23(21)8-7-22(17)29-24-18(14-27-15-19(24)13-25)5-2-3-11-30-12-4-6-20(26)16-30\/h2,5,7-10,14-15,20,28H,3-4,6,11-12,16,26H2,1H3,(H,27,29)\/b5-2+\/t20-\/m1\/s1 targets the protein Vascular endothelial growth factor receptor 2. The protein Vascular endothelial growth factor receptor 2 is involved in Ras signaling pathway. The Ras signaling pathway is modulated by the disease Watson syndrome."} {"text":"The compound NC1=N[C@H](COc2ccc3c(c2)CCNC3)COc2ccsc21 targets the protein Nitric oxide synthase, endothelial. The protein Nitric oxide synthase, endothelial is involved in Calcium signaling pathway. The Calcium signaling pathway is modulated by the disease Waardenburg syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/train_7-0.jsonl": "{"text":"The compound CCCC)\/C=C\/C=O)C=O)\/C=C\/C=C\/C=C\/C=C\/C=C\/C=C\/[C@@H]C[C@@H]O)[C@H]C=O)O))O5 targets the protein Proto-oncogene c-ErbB-1. The protein Proto-oncogene c-ErbB-1 is involved in Regulation of actin cytoskeleton. The Regulation of actin cytoskeleton is modulated by the disease X-linked mental retardation."} {"text":"The compound CCOc1cc(CN2CCC(Nc3nc4ccccc4o3)CC2)ccc1OC1CC1 targets the protein SS5-R. The protein SS5-R is involved in Neuroactive ligand-receptor interaction. The Neuroactive ligand-receptor interaction is modulated by the disease Febrile seizures."}", "/scratch/micpie/export/compound_protein_pathway_disease_1/train_4-0.jsonl": "{"text":"The compound Cc1cc(N)nc(C[C@H]2CNC[C@H]2NCCNCCc2ccccc2)c1 targets the protein NOS type III. The protein NOS type III is involved in Calcium signaling pathway. 
The Calcium signaling pathway is modulated by the disease Waardenburg syndrome."} {"text":"The compound InChI=1S\/C30H35N3O3\/c1-23(34)24-9-11-25(12-10-24)26-13-15-27(16-14-26)30(35)31-17-5-6-18-32-19-21-33(22-20-32)28-7-3-4-8-29(28)36-2\/h3-4,7-16H,5-6,17-22H2,1-2H3,(H,31,35) targets the protein D(4) dopamine receptor. The protein D(4) dopamine receptor is involved in Dopaminergic synapse. The Dopaminergic synapse is modulated by the disease Episodic ataxias."}", "/scratch/micpie/export/smiles_to_3d/test_0-5.jsonl": "{"text":"Question: What is the SMILES of the compound with the 3D-structure in V3000 Molfile format format (after optimization on B3LYP\/6-31G(2df,p) level of theory) [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 22 24 0 0 0\nM V30 BEGIN ATOM\nM V30 1 C 1.600787 -1.565855 0.515621 0\nM V30 2 C 0.704503 -0.472319 -0.053279 0\nM V30 3 C 1.383694 0.534941 -1.038965 0\nM V30 4 C 0.849893 1.725454 -0.192527 0\nM V30 5 C 0.328682 0.727491 0.884737 0\nM V30 6 C -1.178761 0.601906 1.183080 0\nM V30 7 N -1.759185 -0.445184 0.312456 0\nM V30 8 C -1.658647 -0.148759 -1.121647 0\nM V30 9 C -0.651216 -1.031251 -0.479001 0\nM V30 10 H 1.096939 -2.113602 1.320257 0\nM V30 11 H 1.878452 -2.292149 -0.257605 0\nM V30 12 H 2.525958 -1.144848 0.922699 0\nM V30 13 H 2.474529 0.460096 -1.027758 0\nM V30 14 H 1.049841 0.482723 -2.077739 0\nM V30 15 H 1.614010 2.422731 0.159336 0\nM V30 16 H 0.064177 2.313386 -0.674904 0\nM V30 17 H 0.912838 0.771092 1.807569 0\nM V30 18 H -1.335923 0.281495 2.218471 0\nM V30 19 H -1.720265 1.545393 1.048806 0\nM V30 20 H -2.445941 -0.585115 -1.731268 0\nM V30 21 H -1.363012 0.849308 -1.432956 0\nM V30 22 H -0.710914 -2.103872 -0.650214 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 1 10\nM V30 3 1 1 11\nM V30 4 1 1 12\nM V30 5 1 2 3\nM V30 6 1 2 5\nM V30 7 1 2 9\nM V30 8 1 3 4\nM V30 9 1 3 13\nM V30 10 1 3 14\nM V30 11 1 4 5\nM V30 12 1 4 15\nM V30 13 1 4 16\nM V30 14 1 5 6\nM V30 15 
1 5 17\nM V30 16 1 6 7\nM V30 17 1 6 18\nM V30 18 1 6 19\nM V30 19 1 7 8\nM V30 20 1 7 9\nM V30 21 1 8 9\nM V30 22 1 8 20\nM V30 23 1 8 21\nM V30 24 1 9 22\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]?\nAnswer: [H]C([H])([H])C12C([H])([H])C([H])([H])C1([H])C([H])([H])N1C([H])([H])C12[H]"} {"text":"Question: What is the SMILES of the compound with the three-dimensional structure in MOLV3000 Molfile format3000 format (after optimization on B3LYP\/6-31G(2df,p) level of theory) [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 19 19 0 0 0\nM V30 BEGIN ATOM\nM V30 1 C 2.581587 -1.123201 0.668251 0\nM V30 2 C 1.883715 -0.545351 -0.566246 0\nM V30 3 N 0.488772 -0.211432 -0.329753 0\nM V30 4 C -0.558023 -1.129368 -0.295660 0 VAL=3\nM V30 5 C -1.660426 -0.382718 0.019950 0 VAL=3\nM V30 6 O -2.932736 -0.819797 0.162060 0\nM V30 7 N -1.340724 0.932486 0.183143 0 VAL=2\nM V30 8 C -0.034778 1.009735 -0.029956 0 VAL=3\nM V30 9 C 0.760167 2.271351 0.041939 0\nM V30 10 H 3.621228 -1.371920 0.433803 0\nM V30 11 H 2.576389 -0.404246 1.492701 0\nM V30 12 H 2.078796 -2.032402 1.009468 0\nM V30 13 H 2.394438 0.360689 -0.902657 0\nM V30 14 H 1.925727 -1.260129 -1.395008 0\nM V30 15 H -0.416013 -2.174828 -0.504902 0\nM V30 16 H -3.456724 -0.038725 0.377160 0\nM V30 17 H 1.538120 2.233629 0.813472 0\nM V30 18 H 1.251445 2.508848 -0.909191 0\nM V30 19 H 0.078693 3.087393 0.285279 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 1 10\nM V30 3 1 1 11\nM V30 4 1 1 12\nM V30 5 1 2 3\nM V30 6 1 2 13\nM V30 7 1 2 14\nM V30 8 1 3 4\nM V30 9 1 3 8\nM V30 10 1 4 5\nM V30 11 1 4 15\nM V30 12 1 5 6\nM V30 13 1 5 7\nM V30 14 1 6 16\nM V30 15 1 7 8\nM V30 16 1 8 9\nM V30 17 1 9 17\nM V30 18 1 9 18\nM V30 19 1 9 19\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]?\nAnswer: [H]OC1NC(C([H])([H])[H])N(C([H])([H])C([H])([H])[H])C1[H]"}", "/scratch/micpie/export/smiles_to_3d/test_0-1.jsonl": "{"text":"Question: Can you generate the geometry in V2000 
Molfile format (after optimization on B3LYP\/6-31G(2df,p) level of theory) of the chemical with the SMILES [H]C([H])([H])C12C([H])([H])C([H])([H])C1([H])C([H])([H])N1C([H])([H])C12[H]?\nAnswer: [V2000]\n\n ChemNLP 3D\n\n 22 24 0 0 0 0 0 0 0 0999 V2000\n 1.6008 -1.5659 0.5156 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7045 -0.4723 -0.0533 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.3837 0.5349 -1.0390 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.8499 1.7255 -0.1925 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.3287 0.7275 0.8847 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.1788 0.6019 1.1831 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.7592 -0.4452 0.3125 N 0 0 0 0 0 0 0 0 0 0 0 0\n -1.6586 -0.1488 -1.1216 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6512 -1.0313 -0.4790 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0969 -2.1136 1.3203 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.8785 -2.2921 -0.2576 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.5260 -1.1448 0.9227 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.4745 0.4601 -1.0278 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0498 0.4827 -2.0777 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6140 2.4227 0.1593 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0642 2.3134 -0.6749 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9128 0.7711 1.8076 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3359 0.2815 2.2185 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.7203 1.5454 1.0488 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.4459 -0.5851 -1.7313 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3630 0.8493 -1.4330 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7109 -2.1039 -0.6502 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 1 10 1 0\n 1 11 1 0\n 1 12 1 0\n 2 3 1 0\n 2 5 1 0\n 2 9 1 0\n 3 4 1 0\n 3 13 1 0\n 3 14 1 0\n 4 5 1 0\n 4 15 1 0\n 4 16 1 0\n 5 6 1 0\n 5 17 1 0\n 6 7 1 0\n 6 18 1 0\n 6 19 1 0\n 7 8 1 0\n 7 9 1 0\n 8 9 1 0\n 8 20 1 0\n 8 21 1 0\n 9 22 1 0\nM END\n[\\V2000]"} {"text":"Question: What is the content of a V2000 Molfile file with the geometry (after optimization on B3LYP\/6-31G(2df,p) level of theory) of the compound with the SMILES [H]OC1NC(C([H])([H])[H])N(C([H])([H])C([H])([H])[H])C1[H]?\nAnswer: [V2000]\n\n ChemNLP 3D\n\n 19 19 0 0 0 0 0 0 0 0999 V2000\n 2.5816 -1.1232 0.6683 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.8837 -0.5454 
-0.5662 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.4888 -0.2114 -0.3298 N 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5580 -1.1294 -0.2957 C 0 0 0 0 0 3 0 0 0 0 0 0\n -1.6604 -0.3827 0.0199 C 0 0 0 0 0 3 0 0 0 0 0 0\n -2.9327 -0.8198 0.1621 O 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3407 0.9325 0.1831 N 0 0 0 0 0 2 0 0 0 0 0 0\n -0.0348 1.0097 -0.0300 C 0 0 0 0 0 3 0 0 0 0 0 0\n 0.7602 2.2714 0.0419 C 0 0 0 0 0 0 0 0 0 0 0 0\n 3.6212 -1.3719 0.4338 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.5764 -0.4042 1.4927 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.0788 -2.0324 1.0095 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.3944 0.3607 -0.9027 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.9257 -1.2601 -1.3950 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.4160 -2.1748 -0.5049 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.4567 -0.0387 0.3772 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.5381 2.2336 0.8135 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2514 2.5088 -0.9092 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0787 3.0874 0.2853 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 1 10 1 0\n 1 11 1 0\n 1 12 1 0\n 2 3 1 0\n 2 13 1 0\n 2 14 1 0\n 3 4 1 0\n 3 8 1 0\n 4 5 1 0\n 4 15 1 0\n 5 6 1 0\n 5 7 1 0\n 6 16 1 0\n 7 8 1 0\n 8 9 1 0\n 9 17 1 0\n 9 18 1 0\n 9 19 1 0\nM END\n[\\V2000]"}", "/scratch/micpie/export/smiles_to_3d/valid_0-0.jsonl": "{"text":"Question: Can you provide me with the content within a XYZ file with optimized molecular geometry (following B3LYP\/6-31G(2df,p) theory) of the compound with the SMILES [H]N1C([H])(CN)C12C1([H])C([H])([H])C2([H])C1([H])[H]?\nAnswer: [XYZ]\n17\nH8 C7 N2\nN 3.146 -0.875 -0.361\nC 2.276 -0.234 0.050\nC 1.177 0.529 0.588\nN 0.553 1.536 -0.334\nC -0.207 0.386 0.075\nC -0.927 -0.651 -0.797\nC -1.378 -1.310 0.545\nC -1.567 0.223 0.771\nC -2.180 0.275 -0.667\nH 1.302 0.807 1.632\nH 0.307 2.367 0.201\nH -0.547 -1.142 -1.690\nH -2.298 -1.896 0.498\nH -0.607 -1.809 1.136\nH -1.922 0.728 1.670\nH -2.140 1.232 -1.191\nH -3.149 -0.215 -0.787[\\XYZ]"} {"text":"Question: Can you generate the content of a XYZ file with the geometry (after optimization on B3LYP\/6-31G(2df,p) level of theory) of the compound with the SMILES 
[H]OC([H])([H])C([H])([H])OC(O)C1([H])C([H])([H])C1([H])[H]?\nAnswer: [XYZ]\n19\nH10 C6 O3\nO 3.151 -1.090 -0.422\nC 2.958 0.054 0.385\nC 1.695 0.810 0.014\nO 0.600 -0.100 0.243\nC -0.636 0.377 -0.036\nO -0.841 1.497 -0.438\nC -1.677 -0.648 0.219\nC -3.028 -0.153 0.709\nC -2.892 -0.640 -0.694\nH 2.348 -1.617 -0.348\nH 2.925 -0.206 1.455\nH 3.823 0.706 0.227\nH 1.562 1.709 0.625\nH 1.703 1.104 -1.040\nH -1.316 -1.601 0.585\nH -3.543 -0.773 1.433\nH -3.113 0.916 0.861\nH -2.886 0.102 -1.484\nH -3.311 -1.603 -0.960[\\XYZ]"}", "/scratch/micpie/export/smiles_to_3d/test_0-2.jsonl": "{"text":"Question: Can you provide me with the three-dimensional structure in MOLV3000 Molfile format3000 format (after optimization on B3LYP\/6-31G(2df,p) level of theory) of the compound with the SMILES [H]C([H])([H])C12C([H])([H])C([H])([H])C1([H])C([H])([H])N1C([H])([H])C12[H]?\nAnswer: [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 22 24 0 0 0\nM V30 BEGIN ATOM\nM V30 1 C 1.600787 -1.565855 0.515621 0\nM V30 2 C 0.704503 -0.472319 -0.053279 0\nM V30 3 C 1.383694 0.534941 -1.038965 0\nM V30 4 C 0.849893 1.725454 -0.192527 0\nM V30 5 C 0.328682 0.727491 0.884737 0\nM V30 6 C -1.178761 0.601906 1.183080 0\nM V30 7 N -1.759185 -0.445184 0.312456 0\nM V30 8 C -1.658647 -0.148759 -1.121647 0\nM V30 9 C -0.651216 -1.031251 -0.479001 0\nM V30 10 H 1.096939 -2.113602 1.320257 0\nM V30 11 H 1.878452 -2.292149 -0.257605 0\nM V30 12 H 2.525958 -1.144848 0.922699 0\nM V30 13 H 2.474529 0.460096 -1.027758 0\nM V30 14 H 1.049841 0.482723 -2.077739 0\nM V30 15 H 1.614010 2.422731 0.159336 0\nM V30 16 H 0.064177 2.313386 -0.674904 0\nM V30 17 H 0.912838 0.771092 1.807569 0\nM V30 18 H -1.335923 0.281495 2.218471 0\nM V30 19 H -1.720265 1.545393 1.048806 0\nM V30 20 H -2.445941 -0.585115 -1.731268 0\nM V30 21 H -1.363012 0.849308 -1.432956 0\nM V30 22 H -0.710914 -2.103872 -0.650214 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 1 10\nM V30 3 1 1 
11\nM V30 4 1 1 12\nM V30 5 1 2 3\nM V30 6 1 2 5\nM V30 7 1 2 9\nM V30 8 1 3 4\nM V30 9 1 3 13\nM V30 10 1 3 14\nM V30 11 1 4 5\nM V30 12 1 4 15\nM V30 13 1 4 16\nM V30 14 1 5 6\nM V30 15 1 5 17\nM V30 16 1 6 7\nM V30 17 1 6 18\nM V30 18 1 6 19\nM V30 19 1 7 8\nM V30 20 1 7 9\nM V30 21 1 8 9\nM V30 22 1 8 20\nM V30 23 1 8 21\nM V30 24 1 9 22\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]"} {"text":"Question: Can you generate the 3D-structure in V3000 Molfile format format (after optimization on B3LYP\/6-31G(2df,p) level of theory) of the compound with the SMILES [H]OC1NC(C([H])([H])[H])N(C([H])([H])C([H])([H])[H])C1[H]?\nAnswer: [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 19 19 0 0 0\nM V30 BEGIN ATOM\nM V30 1 C 2.581587 -1.123201 0.668251 0\nM V30 2 C 1.883715 -0.545351 -0.566246 0\nM V30 3 N 0.488772 -0.211432 -0.329753 0\nM V30 4 C -0.558023 -1.129368 -0.295660 0 VAL=3\nM V30 5 C -1.660426 -0.382718 0.019950 0 VAL=3\nM V30 6 O -2.932736 -0.819797 0.162060 0\nM V30 7 N -1.340724 0.932486 0.183143 0 VAL=2\nM V30 8 C -0.034778 1.009735 -0.029956 0 VAL=3\nM V30 9 C 0.760167 2.271351 0.041939 0\nM V30 10 H 3.621228 -1.371920 0.433803 0\nM V30 11 H 2.576389 -0.404246 1.492701 0\nM V30 12 H 2.078796 -2.032402 1.009468 0\nM V30 13 H 2.394438 0.360689 -0.902657 0\nM V30 14 H 1.925727 -1.260129 -1.395008 0\nM V30 15 H -0.416013 -2.174828 -0.504902 0\nM V30 16 H -3.456724 -0.038725 0.377160 0\nM V30 17 H 1.538120 2.233629 0.813472 0\nM V30 18 H 1.251445 2.508848 -0.909191 0\nM V30 19 H 0.078693 3.087393 0.285279 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 1 10\nM V30 3 1 1 11\nM V30 4 1 1 12\nM V30 5 1 2 3\nM V30 6 1 2 13\nM V30 7 1 2 14\nM V30 8 1 3 4\nM V30 9 1 3 8\nM V30 10 1 4 5\nM V30 11 1 4 15\nM V30 12 1 5 6\nM V30 13 1 5 7\nM V30 14 1 6 16\nM V30 15 1 7 8\nM V30 16 1 8 9\nM V30 17 1 9 17\nM V30 18 1 9 18\nM V30 19 1 9 19\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]"}", 
"/scratch/micpie/export/smiles_to_3d/train_0-6.jsonl": "{"text":"User: I need to generate 3D geometries of a molecule.\nAssistant: What is the SMILES of the molecule?\nUser: [H]OC1NONC1OC([H])O\nAssistant: The 3D-structure in XYZ format (after optimization on B3LYP\/6-31G(2df,p) level of theory) of the #molecule|compound|chemical!} is [XYZ]\n11\nH2 C3 N2 O4\nO 1.408 1.926 0.053\nC 1.273 0.601 0.016\nN 2.238 -0.267 0.105\nO 1.628 -1.503 0.021\nN 0.274 -1.383 -0.134\nC 0.037 -0.103 -0.129\nO -1.174 0.471 -0.269\nC -2.281 -0.207 0.220\nO -3.374 0.203 0.024\nH 2.346 2.139 0.151\nH -1.997 -1.108 0.783[\\XYZ].\nUser: Can you give me the 3D molecular structure in V2000 Molfile format (following optimization using B3LYP\/6-31G(2df,p) theory)?\nAssistant: The 3D molecular structure in V2000 Molfile format (following optimization using B3LYP\/6-31G(2df,p) theory) of the molecule is [V2000]\n\n ChemNLP 3D\n\n 11 11 0 0 0 0 0 0 0 0999 V2000\n 1.4075 1.9257 0.0528 O 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2732 0.6009 0.0157 C 0 0 0 0 0 3 0 0 0 0 0 0\n 2.2380 -0.2675 0.1046 N 0 0 0 0 0 2 0 0 0 0 0 0\n 1.6276 -1.5027 0.0212 O 0 0 0 0 0 0 0 0 0 0 0 0\n 0.2738 -1.3828 -0.1337 N 0 0 0 0 0 2 0 0 0 0 0 0\n 0.0370 -0.1034 -0.1293 C 0 0 0 0 0 3 0 0 0 0 0 0\n -1.1740 0.4715 -0.2690 O 0 0 0 0 0 0 0 0 0 0 0 0\n -2.2809 -0.2073 0.2197 C 0 0 0 0 0 3 0 0 0 0 0 0\n -3.3745 0.2031 0.0242 O 0 0 0 0 0 1 0 0 0 0 0 0\n 2.3457 2.1385 0.1507 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9972 -1.1084 0.7826 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 1 10 1 0\n 2 3 1 0\n 2 6 1 0\n 3 4 1 0\n 4 5 1 0\n 5 6 1 0\n 6 7 1 0\n 7 8 1 0\n 8 9 1 0\n 8 11 1 0\nM END\n[\\V2000]."} {"text":"User: I need to generate conformers of a compound.\nAssistant: Can you provide the SMILES of the compound?\nUser: [H]C([H])([H])C1NC2([H])C3([H])C([H])([H])C2([H])C3([H])O1\nAssistant: The molecular geometry in XYZ format (optimized with B3LYP\/6-31G(2df,p) level of theory) of the #molecule|compound|chemical!} is [XYZ]\n18\nH9 C7 N1 O1\nC -2.924 -0.010 
0.000\nC -1.434 0.105 -0.000\nN -0.807 1.211 -0.000\nC 0.633 1.080 -0.000\nC 1.263 0.001 -0.948\nC 2.485 -0.034 0.000\nC 1.263 0.001 0.948\nC 0.600 -1.035 -0.000\nO -0.819 -1.110 -0.000\nH -3.264 -0.560 0.884\nH -3.365 0.986 -0.005\nH -3.264 -0.569 -0.878\nH 1.115 2.059 -0.000\nH 1.248 -0.001 -2.038\nH 3.084 -0.949 0.000\nH 3.115 0.857 0.000\nH 1.247 -0.001 2.038\nH 0.970 -2.061 0.000[\\XYZ].\nUser: What is the 3D molecular structure in V2000 Molfile format (following optimization using B3LYP\/6-31G(2df,p) theory)?\nAssistant: The 3D molecular structure in V2000 Molfile format (following optimization using B3LYP\/6-31G(2df,p) theory) of the compound is [V2000]\n\n ChemNLP 3D\n\n 18 20 0 0 0 0 0 0 0 0999 V2000\n -2.9243 -0.0104 0.0002 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.4338 0.1051 -0.0002 C 0 0 0 0 0 3 0 0 0 0 0 0\n -0.8067 1.2106 -0.0002 N 0 0 0 0 0 2 0 0 0 0 0 0\n 0.6326 1.0798 -0.0001 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2631 0.0009 -0.9483 C 0 0 0 0 0 0 0 0 0 0 0 0\n 2.4845 -0.0341 0.0002 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2628 0.0009 0.9485 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.6003 -1.0346 -0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8188 -1.1101 -0.0003 O 0 0 0 0 0 0 0 0 0 0 0 0\n -3.2636 -0.5602 0.8842 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.3653 0.9857 -0.0045 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.2635 -0.5691 -0.8781 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.1148 2.0589 -0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2479 -0.0007 -2.0377 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.0836 -0.9491 0.0003 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.1154 0.8575 0.0003 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2473 -0.0006 2.0378 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9696 -2.0609 0.0001 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 1 10 1 0\n 1 11 1 0\n 1 12 1 0\n 2 3 1 0\n 2 9 1 0\n 3 4 1 0\n 4 5 1 0\n 4 7 1 0\n 4 13 1 0\n 5 6 1 0\n 5 8 1 0\n 5 14 1 0\n 6 7 1 0\n 6 15 1 0\n 6 16 1 0\n 7 8 1 0\n 7 17 1 0\n 8 9 1 0\n 8 18 1 0\nM END\n[\\V2000]."}", "/scratch/micpie/export/smiles_to_3d/valid_0-6.jsonl": "{"text":"User: I need to generate 3D geometries of a chemical.\nAssistant: Can you 
provide the SMILES of the chemical?\nUser: [H]N1C([H])(CN)C12C1([H])C([H])([H])C2([H])C1([H])[H]\nAssistant: The three-dimensional molecular structure in XYZ format (after B3LYP\/6-31G(2df,p) level of theory optimization) of the #molecule|compound|chemical!} is [XYZ]\n17\nH8 C7 N2\nN 3.146 -0.875 -0.361\nC 2.276 -0.234 0.050\nC 1.177 0.529 0.588\nN 0.553 1.536 -0.334\nC -0.207 0.386 0.075\nC -0.927 -0.651 -0.797\nC -1.378 -1.310 0.545\nC -1.567 0.223 0.771\nC -2.180 0.275 -0.667\nH 1.302 0.807 1.632\nH 0.307 2.367 0.201\nH -0.547 -1.142 -1.690\nH -2.298 -1.896 0.498\nH -0.607 -1.809 1.136\nH -1.922 0.728 1.670\nH -2.140 1.232 -1.191\nH -3.149 -0.215 -0.787[\\XYZ].\nUser: What is the content of a V2000 Molfile file with the geometry (after optimization on B3LYP\/6-31G(2df,p) level of theory)?\nAssistant: The content of a V2000 Molfile file with the geometry (after optimization on B3LYP\/6-31G(2df,p) level of theory) of the chemical is [V2000]\n\n ChemNLP 3D\n\n 17 19 0 0 0 0 0 0 0 0999 V2000\n 3.1458 -0.8753 -0.3614 N 0 0 0 0 0 1 0 0 0 0 0 0\n 2.2762 -0.2339 0.0502 C 0 0 0 0 0 2 0 0 0 0 0 0\n 1.1771 0.5291 0.5885 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5526 1.5360 -0.3336 N 0 0 0 0 0 0 0 0 0 0 0 0\n -0.2069 0.3859 0.0754 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.9273 -0.6510 -0.7966 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3780 -1.3103 0.5446 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.5668 0.2226 0.7711 C 0 0 0 0 0 0 0 0 0 0 0 0\n -2.1798 0.2749 -0.6672 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.3018 0.8071 1.6320 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.3067 2.3669 0.2011 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5473 -1.1423 -1.6902 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.2982 -1.8964 0.4977 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6074 -1.8094 1.1364 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9220 0.7279 1.6700 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.1398 1.2324 -1.1907 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.1489 -0.2150 -0.7873 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 3 1 0\n 3 4 1 0\n 3 5 1 0\n 3 10 1 0\n 4 5 1 0\n 4 11 1 0\n 5 6 1 0\n 5 8 1 0\n 6 7 1 0\n 6 9 1 0\n 6 12 1 0\n 7 8 1 0\n 
7 13 1 0\n 7 14 1 0\n 8 9 1 0\n 8 15 1 0\n 9 16 1 0\n 9 17 1 0\nM END\n[\\V2000]."} {"text":"User: I need to generate 3D geometries of a chemical.\nAssistant: Can you provide the SMILES of the chemical?\nUser: [H]OC([H])([H])C([H])([H])OC(O)C1([H])C([H])([H])C1([H])[H]\nAssistant: The content within a XYZ file with optimized molecular geometry (following B3LYP\/6-31G(2df,p) theory) of the #molecule|compound|chemical!} is [XYZ]\n19\nH10 C6 O3\nO 3.151 -1.090 -0.422\nC 2.958 0.054 0.385\nC 1.695 0.810 0.014\nO 0.600 -0.100 0.243\nC -0.636 0.377 -0.036\nO -0.841 1.497 -0.438\nC -1.677 -0.648 0.219\nC -3.028 -0.153 0.709\nC -2.892 -0.640 -0.694\nH 2.348 -1.617 -0.348\nH 2.925 -0.206 1.455\nH 3.823 0.706 0.227\nH 1.562 1.709 0.625\nH 1.703 1.104 -1.040\nH -1.316 -1.601 0.585\nH -3.543 -0.773 1.433\nH -3.113 0.916 0.861\nH -2.886 0.102 -1.484\nH -3.311 -1.603 -0.960[\\XYZ].\nUser: What is the 3D-structure in V2000 Molfile format (after optimization on B3LYP\/6-31G(2df,p) level of theory)?\nAssistant: The 3D-structure in V2000 Molfile format (after optimization on B3LYP\/6-31G(2df,p) level of theory) of the chemical is [V2000]\n\n ChemNLP 3D\n\n 19 19 0 0 0 0 0 0 0 0999 V2000\n 3.1513 -1.0899 -0.4224 O 0 0 0 0 0 0 0 0 0 0 0 0\n 2.9575 0.0541 0.3852 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6950 0.8103 0.0142 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5997 -0.0996 0.2430 O 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6356 0.3770 -0.0358 C 0 0 0 0 0 3 0 0 0 0 0 0\n -0.8406 1.4973 -0.4377 O 0 0 0 0 0 1 0 0 0 0 0 0\n -1.6766 -0.6483 0.2191 C 0 0 0 0 0 0 0 0 0 0 0 0\n -3.0276 -0.1531 0.7085 C 0 0 0 0 0 0 0 0 0 0 0 0\n -2.8917 -0.6400 -0.6944 C 0 0 0 0 0 0 0 0 0 0 0 0\n 2.3480 -1.6168 -0.3477 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.9246 -0.2056 1.4552 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.8230 0.7058 0.2272 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.5619 1.7088 0.6251 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.7025 1.1035 -1.0397 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3163 -1.6008 0.5852 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.5430 -0.7728 1.4329 H 0 0 0 0 0 0 0 0 0 0 0 0\n 
-3.1132 0.9161 0.8614 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.8861 0.1023 -1.4838 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.3111 -1.6031 -0.9604 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 1 10 1 0\n 2 3 1 0\n 2 11 1 0\n 2 12 1 0\n 3 4 1 0\n 3 13 1 0\n 3 14 1 0\n 4 5 1 0\n 5 6 1 0\n 5 7 1 0\n 7 8 1 0\n 7 9 1 0\n 7 15 1 0\n 8 9 1 0\n 8 16 1 0\n 8 17 1 0\n 9 18 1 0\n 9 19 1 0\nM END\n[\\V2000]."}", "/scratch/micpie/export/smiles_to_3d/test_0-0.jsonl": "{"text":"Question: Can you generate the three-dimensional structure in XYZ format (after optimization on B3LYP\/6-31G(2df,p) level of theory) of the molecule with the SMILES [H]C([H])([H])C12C([H])([H])C([H])([H])C1([H])C([H])([H])N1C([H])([H])C12[H]?\nAnswer: [XYZ]\n22\nH13 C8 N1\nC 1.601 -1.566 0.516\nC 0.705 -0.472 -0.053\nC 1.384 0.535 -1.039\nC 0.850 1.725 -0.193\nC 0.329 0.727 0.885\nC -1.179 0.602 1.183\nN -1.759 -0.445 0.312\nC -1.659 -0.149 -1.122\nC -0.651 -1.031 -0.479\nH 1.097 -2.114 1.320\nH 1.878 -2.292 -0.258\nH 2.526 -1.145 0.923\nH 2.475 0.460 -1.028\nH 1.050 0.483 -2.078\nH 1.614 2.423 0.159\nH 0.064 2.313 -0.675\nH 0.913 0.771 1.808\nH -1.336 0.281 2.218\nH -1.720 1.545 1.049\nH -2.446 -0.585 -1.731\nH -1.363 0.849 -1.433\nH -0.711 -2.104 -0.650[\\XYZ]"} {"text":"Question: Can you generate the 3D-structure in XYZ format (after optimization on B3LYP\/6-31G(2df,p) level of theory) of the compound with the SMILES [H]OC1NC(C([H])([H])[H])N(C([H])([H])C([H])([H])[H])C1[H]?\nAnswer: [XYZ]\n19\nH10 C6 N2 O1\nC 2.582 -1.123 0.668\nC 1.884 -0.545 -0.566\nN 0.489 -0.211 -0.330\nC -0.558 -1.129 -0.296\nC -1.660 -0.383 0.020\nO -2.933 -0.820 0.162\nN -1.341 0.932 0.183\nC -0.035 1.010 -0.030\nC 0.760 2.271 0.042\nH 3.621 -1.372 0.434\nH 2.576 -0.404 1.493\nH 2.079 -2.032 1.009\nH 2.394 0.361 -0.903\nH 1.926 -1.260 -1.395\nH -0.416 -2.175 -0.505\nH -3.457 -0.039 0.377\nH 1.538 2.234 0.813\nH 1.251 2.509 -0.909\nH 0.079 3.087 0.285[\\XYZ]"}", "/scratch/micpie/export/smiles_to_3d/test_0-3.jsonl": "{"text":"Question: What is the SMILES of 
the molecule with content within a XYZ file with optimized molecular geometry (following B3LYP\/6-31G(2df,p) theory) [XYZ]\n22\nH13 C8 N1\nC 1.601 -1.566 0.516\nC 0.705 -0.472 -0.053\nC 1.384 0.535 -1.039\nC 0.850 1.725 -0.193\nC 0.329 0.727 0.885\nC -1.179 0.602 1.183\nN -1.759 -0.445 0.312\nC -1.659 -0.149 -1.122\nC -0.651 -1.031 -0.479\nH 1.097 -2.114 1.320\nH 1.878 -2.292 -0.258\nH 2.526 -1.145 0.923\nH 2.475 0.460 -1.028\nH 1.050 0.483 -2.078\nH 1.614 2.423 0.159\nH 0.064 2.313 -0.675\nH 0.913 0.771 1.808\nH -1.336 0.281 2.218\nH -1.720 1.545 1.049\nH -2.446 -0.585 -1.731\nH -1.363 0.849 -1.433\nH -0.711 -2.104 -0.650[\\XYZ]?\nAnswer: [H]C([H])([H])C12C([H])([H])C([H])([H])C1([H])C([H])([H])N1C([H])([H])C12[H]"} {"text":"Question: What is the SMILES of the molecule with molecular geometry in XYZ format (optimized with B3LYP\/6-31G(2df,p) level of theory) [XYZ]\n19\nH10 C6 N2 O1\nC 2.582 -1.123 0.668\nC 1.884 -0.545 -0.566\nN 0.489 -0.211 -0.330\nC -0.558 -1.129 -0.296\nC -1.660 -0.383 0.020\nO -2.933 -0.820 0.162\nN -1.341 0.932 0.183\nC -0.035 1.010 -0.030\nC 0.760 2.271 0.042\nH 3.621 -1.372 0.434\nH 2.576 -0.404 1.493\nH 2.079 -2.032 1.009\nH 2.394 0.361 -0.903\nH 1.926 -1.260 -1.395\nH -0.416 -2.175 -0.505\nH -3.457 -0.039 0.377\nH 1.538 2.234 0.813\nH 1.251 2.509 -0.909\nH 0.079 3.087 0.285[\\XYZ]?\nAnswer: [H]OC1NC(C([H])([H])[H])N(C([H])([H])C([H])([H])[H])C1[H]"}", "/scratch/micpie/export/smiles_to_3d/train_0-0.jsonl": "{"text":"Question: Can you provide me with the content within a XYZ file with optimized molecular geometry (following B3LYP\/6-31G(2df,p) theory) of the chemical with the SMILES [H]OC1NONC1OC([H])O?\nAnswer: [XYZ]\n11\nH2 C3 N2 O4\nO 1.408 1.926 0.053\nC 1.273 0.601 0.016\nN 2.238 -0.267 0.105\nO 1.628 -1.503 0.021\nN 0.274 -1.383 -0.134\nC 0.037 -0.103 -0.129\nO -1.174 0.471 -0.269\nC -2.281 -0.207 0.220\nO -3.374 0.203 0.024\nH 2.346 2.139 0.151\nH -1.997 -1.108 0.783[\\XYZ]"} {"text":"Question: Can you generate the three-dimensional 
structure in XYZ format (after optimization on B3LYP\/6-31G(2df,p) level of theory) of the compound with the SMILES [H]C([H])([H])C1NC2([H])C3([H])C([H])([H])C2([H])C3([H])O1?\nAnswer: [XYZ]\n18\nH9 C7 N1 O1\nC -2.924 -0.010 0.000\nC -1.434 0.105 -0.000\nN -0.807 1.211 -0.000\nC 0.633 1.080 -0.000\nC 1.263 0.001 -0.948\nC 2.485 -0.034 0.000\nC 1.263 0.001 0.948\nC 0.600 -1.035 -0.000\nO -0.819 -1.110 -0.000\nH -3.264 -0.560 0.884\nH -3.365 0.986 -0.005\nH -3.264 -0.569 -0.878\nH 1.115 2.059 -0.000\nH 1.248 -0.001 -2.038\nH 3.084 -0.949 0.000\nH 3.115 0.857 0.000\nH 1.247 -0.001 2.038\nH 0.970 -2.061 0.000[\\XYZ]"}", "/scratch/micpie/export/smiles_to_3d/test_0-6.jsonl": "{"text":"User: I need to generate conformers of a chemical.\nAssistant: What is the SMILES of the chemical?\nUser: [H]C([H])([H])C12C([H])([H])C([H])([H])C1([H])C([H])([H])N1C([H])([H])C12[H]\nAssistant: The 3D molecular structure in XYZ format (following optimization using B3LYP\/6-31G(2df,p) theory) of the #molecule|compound|chemical!} is [XYZ]\n22\nH13 C8 N1\nC 1.601 -1.566 0.516\nC 0.705 -0.472 -0.053\nC 1.384 0.535 -1.039\nC 0.850 1.725 -0.193\nC 0.329 0.727 0.885\nC -1.179 0.602 1.183\nN -1.759 -0.445 0.312\nC -1.659 -0.149 -1.122\nC -0.651 -1.031 -0.479\nH 1.097 -2.114 1.320\nH 1.878 -2.292 -0.258\nH 2.526 -1.145 0.923\nH 2.475 0.460 -1.028\nH 1.050 0.483 -2.078\nH 1.614 2.423 0.159\nH 0.064 2.313 -0.675\nH 0.913 0.771 1.808\nH -1.336 0.281 2.218\nH -1.720 1.545 1.049\nH -2.446 -0.585 -1.731\nH -1.363 0.849 -1.433\nH -0.711 -2.104 -0.650[\\XYZ].\nUser: And how about the 3D molecular structure in V2000 Molfile format (following optimization using B3LYP\/6-31G(2df,p) theory)?\nAssistant: The 3D molecular structure in V2000 Molfile format (following optimization using B3LYP\/6-31G(2df,p) theory) of the chemical is [V2000]\n\n ChemNLP 3D\n\n 22 24 0 0 0 0 0 0 0 0999 V2000\n 1.6008 -1.5659 0.5156 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7045 -0.4723 -0.0533 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.3837 0.5349 -1.0390 C 
0 0 0 0 0 0 0 0 0 0 0 0\n 0.8499 1.7255 -0.1925 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.3287 0.7275 0.8847 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.1788 0.6019 1.1831 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.7592 -0.4452 0.3125 N 0 0 0 0 0 0 0 0 0 0 0 0\n -1.6586 -0.1488 -1.1216 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6512 -1.0313 -0.4790 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0969 -2.1136 1.3203 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.8785 -2.2921 -0.2576 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.5260 -1.1448 0.9227 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.4745 0.4601 -1.0278 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0498 0.4827 -2.0777 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6140 2.4227 0.1593 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0642 2.3134 -0.6749 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9128 0.7711 1.8076 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3359 0.2815 2.2185 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.7203 1.5454 1.0488 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.4459 -0.5851 -1.7313 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3630 0.8493 -1.4330 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7109 -2.1039 -0.6502 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 1 10 1 0\n 1 11 1 0\n 1 12 1 0\n 2 3 1 0\n 2 5 1 0\n 2 9 1 0\n 3 4 1 0\n 3 13 1 0\n 3 14 1 0\n 4 5 1 0\n 4 15 1 0\n 4 16 1 0\n 5 6 1 0\n 5 17 1 0\n 6 7 1 0\n 6 18 1 0\n 6 19 1 0\n 7 8 1 0\n 7 9 1 0\n 8 9 1 0\n 8 20 1 0\n 8 21 1 0\n 9 22 1 0\nM END\n[\\V2000]."} {"text":"User: I need to generate 3D structures of a molecule.\nAssistant: Can you provide the SMILES of the molecule?\nUser: [H]OC1NC(C([H])([H])[H])N(C([H])([H])C([H])([H])[H])C1[H]\nAssistant: The geometry in XYZ format (after optimization on B3LYP\/6-31G(2df,p) level of theory) of the #molecule|compound|chemical!} is [XYZ]\n19\nH10 C6 N2 O1\nC 2.582 -1.123 0.668\nC 1.884 -0.545 -0.566\nN 0.489 -0.211 -0.330\nC -0.558 -1.129 -0.296\nC -1.660 -0.383 0.020\nO -2.933 -0.820 0.162\nN -1.341 0.932 0.183\nC -0.035 1.010 -0.030\nC 0.760 2.271 0.042\nH 3.621 -1.372 0.434\nH 2.576 -0.404 1.493\nH 2.079 -2.032 1.009\nH 2.394 0.361 -0.903\nH 1.926 -1.260 -1.395\nH -0.416 -2.175 -0.505\nH -3.457 -0.039 0.377\nH 1.538 2.234 0.813\nH 1.251 
2.509 -0.909\nH 0.079 3.087 0.285[\\XYZ].\nUser: And how about the data from a V2000 Molfile format file containing optimized geometry (using B3LYP\/6-31G(2df,p) theory)?\nAssistant: The data from a V2000 Molfile format file containing optimized geometry (using B3LYP\/6-31G(2df,p) theory) of the molecule is [V2000]\n\n ChemNLP 3D\n\n 19 19 0 0 0 0 0 0 0 0999 V2000\n 2.5816 -1.1232 0.6683 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.8837 -0.5454 -0.5662 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.4888 -0.2114 -0.3298 N 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5580 -1.1294 -0.2957 C 0 0 0 0 0 3 0 0 0 0 0 0\n -1.6604 -0.3827 0.0199 C 0 0 0 0 0 3 0 0 0 0 0 0\n -2.9327 -0.8198 0.1621 O 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3407 0.9325 0.1831 N 0 0 0 0 0 2 0 0 0 0 0 0\n -0.0348 1.0097 -0.0300 C 0 0 0 0 0 3 0 0 0 0 0 0\n 0.7602 2.2714 0.0419 C 0 0 0 0 0 0 0 0 0 0 0 0\n 3.6212 -1.3719 0.4338 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.5764 -0.4042 1.4927 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.0788 -2.0324 1.0095 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.3944 0.3607 -0.9027 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.9257 -1.2601 -1.3950 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.4160 -2.1748 -0.5049 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.4567 -0.0387 0.3772 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.5381 2.2336 0.8135 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2514 2.5088 -0.9092 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0787 3.0874 0.2853 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 1 10 1 0\n 1 11 1 0\n 1 12 1 0\n 2 3 1 0\n 2 13 1 0\n 2 14 1 0\n 3 4 1 0\n 3 8 1 0\n 4 5 1 0\n 4 15 1 0\n 5 6 1 0\n 5 7 1 0\n 6 16 1 0\n 7 8 1 0\n 8 9 1 0\n 9 17 1 0\n 9 18 1 0\n 9 19 1 0\nM END\n[\\V2000]."}", "/scratch/micpie/export/smiles_to_3d/train_0-3.jsonl": "{"text":"Question: What is the SMILES of the compound with three-dimensional structure in XYZ format (after optimization on B3LYP\/6-31G(2df,p) level of theory) [XYZ]\n11\nH2 C3 N2 O4\nO 1.408 1.926 0.053\nC 1.273 0.601 0.016\nN 2.238 -0.267 0.105\nO 1.628 -1.503 0.021\nN 0.274 -1.383 -0.134\nC 0.037 -0.103 -0.129\nO -1.174 0.471 -0.269\nC -2.281 -0.207 0.220\nO -3.374 0.203 0.024\nH 2.346 
2.139 0.151\nH -1.997 -1.108 0.783[\\XYZ]?\nAnswer: [H]OC1NONC1OC([H])O"} {"text":"Question: What is the SMILES of the molecule with data from a XYZ file containing optimized geometry (using B3LYP\/6-31G(2df,p) theory) [XYZ]\n18\nH9 C7 N1 O1\nC -2.924 -0.010 0.000\nC -1.434 0.105 -0.000\nN -0.807 1.211 -0.000\nC 0.633 1.080 -0.000\nC 1.263 0.001 -0.948\nC 2.485 -0.034 0.000\nC 1.263 0.001 0.948\nC 0.600 -1.035 -0.000\nO -0.819 -1.110 -0.000\nH -3.264 -0.560 0.884\nH -3.365 0.986 -0.005\nH -3.264 -0.569 -0.878\nH 1.115 2.059 -0.000\nH 1.248 -0.001 -2.038\nH 3.084 -0.949 0.000\nH 3.115 0.857 0.000\nH 1.247 -0.001 2.038\nH 0.970 -2.061 0.000[\\XYZ]?\nAnswer: [H]C([H])([H])C1NC2([H])C3([H])C([H])([H])C2([H])C3([H])O1"}", "/scratch/micpie/export/smiles_to_3d/valid_0-2.jsonl": "{"text":"Question: Can you provide me with the three-dimensional structure in MOLV3000 Molfile format3000 format (after optimization on B3LYP\/6-31G(2df,p) level of theory) of the compound with the SMILES [H]N1C([H])(CN)C12C1([H])C([H])([H])C2([H])C1([H])[H]?\nAnswer: [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 17 19 0 0 0\nM V30 BEGIN ATOM\nM V30 1 N 3.145795 -0.875261 -0.361416 0 VAL=1\nM V30 2 C 2.276187 -0.233873 0.050245 0 VAL=2\nM V30 3 C 1.177097 0.529118 0.588476 0\nM V30 4 N 0.552646 1.535953 -0.333638 0\nM V30 5 C -0.206915 0.385932 0.075420 0\nM V30 6 C -0.927304 -0.651044 -0.796582 0\nM V30 7 C -1.378040 -1.310350 0.544629 0\nM V30 8 C -1.566846 0.222627 0.771116 0\nM V30 9 C -2.179820 0.274919 -0.667238 0\nM V30 10 H 1.301786 0.807099 1.631958 0\nM V30 11 H 0.306672 2.366917 0.201093 0\nM V30 12 H -0.547290 -1.142329 -1.690196 0\nM V30 13 H -2.298240 -1.896395 0.497713 0\nM V30 14 H -0.607447 -1.809367 1.136384 0\nM V30 15 H -1.922028 0.727854 1.670009 0\nM V30 16 H -2.139840 1.232358 -1.190678 0\nM V30 17 H -3.148865 -0.214968 -0.787310 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 2 3\nM V30 3 1 3 4\nM V30 4 1 3 5\nM V30 5 
1 3 10\nM V30 6 1 4 5\nM V30 7 1 4 11\nM V30 8 1 5 6\nM V30 9 1 5 8\nM V30 10 1 6 7\nM V30 11 1 6 9\nM V30 12 1 6 12\nM V30 13 1 7 8\nM V30 14 1 7 13\nM V30 15 1 7 14\nM V30 16 1 8 9\nM V30 17 1 8 15\nM V30 18 1 9 16\nM V30 19 1 9 17\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]"} {"text":"Question: Can you provide me with the content within a V3000 Molfile format file with optimized molecular geometry (following B3LYP\/6-31G(2df,p) theory) of the chemical with the SMILES [H]OC([H])([H])C([H])([H])OC(O)C1([H])C([H])([H])C1([H])[H]?\nAnswer: [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 19 19 0 0 0\nM V30 BEGIN ATOM\nM V30 1 O 3.151259 -1.089935 -0.422368 0\nM V30 2 C 2.957505 0.054095 0.385237 0\nM V30 3 C 1.695008 0.810314 0.014168 0\nM V30 4 O 0.599688 -0.099560 0.243035 0\nM V30 5 C -0.635553 0.377020 -0.035830 0 VAL=3\nM V30 6 O -0.840574 1.497288 -0.437660 0 VAL=1\nM V30 7 C -1.676583 -0.648313 0.219104 0\nM V30 8 C -3.027594 -0.153117 0.708505 0\nM V30 9 C -2.891673 -0.639950 -0.694442 0\nM V30 10 H 2.348029 -1.616820 -0.347690 0\nM V30 11 H 2.924649 -0.205642 1.455216 0\nM V30 12 H 3.823014 0.705775 0.227220 0\nM V30 13 H 1.561884 1.708833 0.625108 0\nM V30 14 H 1.702515 1.103532 -1.039676 0\nM V30 15 H -1.316284 -1.600840 0.585157 0\nM V30 16 H -3.543043 -0.772759 1.432878 0\nM V30 17 H -3.113203 0.916102 0.861399 0\nM V30 18 H -2.886064 0.102295 -1.483772 0\nM V30 19 H -3.311149 -1.603108 -0.960354 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 1 10\nM V30 3 1 2 3\nM V30 4 1 2 11\nM V30 5 1 2 12\nM V30 6 1 3 4\nM V30 7 1 3 13\nM V30 8 1 3 14\nM V30 9 1 4 5\nM V30 10 1 5 6\nM V30 11 1 5 7\nM V30 12 1 7 8\nM V30 13 1 7 9\nM V30 14 1 7 15\nM V30 15 1 8 9\nM V30 16 1 8 16\nM V30 17 1 8 17\nM V30 18 1 9 18\nM V30 19 1 9 19\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]"}", "/scratch/micpie/export/smiles_to_3d/valid_0-1.jsonl": "{"text":"Question: Can you generate the three-dimensional structure in 
V2000 Molfile format (after optimization on B3LYP\/6-31G(2df,p) level of theory) of the molecule with the SMILES [H]N1C([H])(CN)C12C1([H])C([H])([H])C2([H])C1([H])[H]?\nAnswer: [V2000]\n\n ChemNLP 3D\n\n 17 19 0 0 0 0 0 0 0 0999 V2000\n 3.1458 -0.8753 -0.3614 N 0 0 0 0 0 1 0 0 0 0 0 0\n 2.2762 -0.2339 0.0502 C 0 0 0 0 0 2 0 0 0 0 0 0\n 1.1771 0.5291 0.5885 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5526 1.5360 -0.3336 N 0 0 0 0 0 0 0 0 0 0 0 0\n -0.2069 0.3859 0.0754 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.9273 -0.6510 -0.7966 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3780 -1.3103 0.5446 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.5668 0.2226 0.7711 C 0 0 0 0 0 0 0 0 0 0 0 0\n -2.1798 0.2749 -0.6672 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.3018 0.8071 1.6320 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.3067 2.3669 0.2011 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5473 -1.1423 -1.6902 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.2982 -1.8964 0.4977 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6074 -1.8094 1.1364 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9220 0.7279 1.6700 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.1398 1.2324 -1.1907 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.1489 -0.2150 -0.7873 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 3 1 0\n 3 4 1 0\n 3 5 1 0\n 3 10 1 0\n 4 5 1 0\n 4 11 1 0\n 5 6 1 0\n 5 8 1 0\n 6 7 1 0\n 6 9 1 0\n 6 12 1 0\n 7 8 1 0\n 7 13 1 0\n 7 14 1 0\n 8 9 1 0\n 8 15 1 0\n 9 16 1 0\n 9 17 1 0\nM END\n[\\V2000]"} {"text":"Question: Can you provide me with the 3D molecular structure in V2000 Molfile format (following optimization using B3LYP\/6-31G(2df,p) theory) of the molecule with the SMILES [H]OC([H])([H])C([H])([H])OC(O)C1([H])C([H])([H])C1([H])[H]?\nAnswer: [V2000]\n\n ChemNLP 3D\n\n 19 19 0 0 0 0 0 0 0 0999 V2000\n 3.1513 -1.0899 -0.4224 O 0 0 0 0 0 0 0 0 0 0 0 0\n 2.9575 0.0541 0.3852 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6950 0.8103 0.0142 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5997 -0.0996 0.2430 O 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6356 0.3770 -0.0358 C 0 0 0 0 0 3 0 0 0 0 0 0\n -0.8406 1.4973 -0.4377 O 0 0 0 0 0 1 0 0 0 0 0 0\n -1.6766 -0.6483 0.2191 C 0 0 0 0 0 0 0 0 0 0 0 0\n -3.0276 -0.1531 
0.7085 C 0 0 0 0 0 0 0 0 0 0 0 0\n -2.8917 -0.6400 -0.6944 C 0 0 0 0 0 0 0 0 0 0 0 0\n 2.3480 -1.6168 -0.3477 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.9246 -0.2056 1.4552 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.8230 0.7058 0.2272 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.5619 1.7088 0.6251 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.7025 1.1035 -1.0397 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3163 -1.6008 0.5852 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.5430 -0.7728 1.4329 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.1132 0.9161 0.8614 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.8861 0.1023 -1.4838 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.3111 -1.6031 -0.9604 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 1 10 1 0\n 2 3 1 0\n 2 11 1 0\n 2 12 1 0\n 3 4 1 0\n 3 13 1 0\n 3 14 1 0\n 4 5 1 0\n 5 6 1 0\n 5 7 1 0\n 7 8 1 0\n 7 9 1 0\n 7 15 1 0\n 8 9 1 0\n 8 16 1 0\n 8 17 1 0\n 9 18 1 0\n 9 19 1 0\nM END\n[\\V2000]"}", "/scratch/micpie/export/smiles_to_3d/valid_0-5.jsonl": "{"text":"Question: What is the SMILES of the molecule with the molecular geometry in V3000 Molfile format (optimized with B3LYP\/6-31G(2df,p) level of theory) [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 17 19 0 0 0\nM V30 BEGIN ATOM\nM V30 1 N 3.145795 -0.875261 -0.361416 0 VAL=1\nM V30 2 C 2.276187 -0.233873 0.050245 0 VAL=2\nM V30 3 C 1.177097 0.529118 0.588476 0\nM V30 4 N 0.552646 1.535953 -0.333638 0\nM V30 5 C -0.206915 0.385932 0.075420 0\nM V30 6 C -0.927304 -0.651044 -0.796582 0\nM V30 7 C -1.378040 -1.310350 0.544629 0\nM V30 8 C -1.566846 0.222627 0.771116 0\nM V30 9 C -2.179820 0.274919 -0.667238 0\nM V30 10 H 1.301786 0.807099 1.631958 0\nM V30 11 H 0.306672 2.366917 0.201093 0\nM V30 12 H -0.547290 -1.142329 -1.690196 0\nM V30 13 H -2.298240 -1.896395 0.497713 0\nM V30 14 H -0.607447 -1.809367 1.136384 0\nM V30 15 H -1.922028 0.727854 1.670009 0\nM V30 16 H -2.139840 1.232358 -1.190678 0\nM V30 17 H -3.148865 -0.214968 -0.787310 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 2 3\nM V30 3 1 3 4\nM V30 4 1 3 5\nM V30 5 1 3 10\nM V30 6 1 4 5\nM 
V30 7 1 4 11\nM V30 8 1 5 6\nM V30 9 1 5 8\nM V30 10 1 6 7\nM V30 11 1 6 9\nM V30 12 1 6 12\nM V30 13 1 7 8\nM V30 14 1 7 13\nM V30 15 1 7 14\nM V30 16 1 8 9\nM V30 17 1 8 15\nM V30 18 1 9 16\nM V30 19 1 9 17\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]?\nAnswer: [H]N1C([H])(CN)C12C1([H])C([H])([H])C2([H])C1([H])[H]"} {"text":"Question: Can you provide me with the SMILES of the chemical with the 3D-structure in V3000 Molfile format format (after optimization on B3LYP\/6-31G(2df,p) level of theory) [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 19 19 0 0 0\nM V30 BEGIN ATOM\nM V30 1 O 3.151259 -1.089935 -0.422368 0\nM V30 2 C 2.957505 0.054095 0.385237 0\nM V30 3 C 1.695008 0.810314 0.014168 0\nM V30 4 O 0.599688 -0.099560 0.243035 0\nM V30 5 C -0.635553 0.377020 -0.035830 0 VAL=3\nM V30 6 O -0.840574 1.497288 -0.437660 0 VAL=1\nM V30 7 C -1.676583 -0.648313 0.219104 0\nM V30 8 C -3.027594 -0.153117 0.708505 0\nM V30 9 C -2.891673 -0.639950 -0.694442 0\nM V30 10 H 2.348029 -1.616820 -0.347690 0\nM V30 11 H 2.924649 -0.205642 1.455216 0\nM V30 12 H 3.823014 0.705775 0.227220 0\nM V30 13 H 1.561884 1.708833 0.625108 0\nM V30 14 H 1.702515 1.103532 -1.039676 0\nM V30 15 H -1.316284 -1.600840 0.585157 0\nM V30 16 H -3.543043 -0.772759 1.432878 0\nM V30 17 H -3.113203 0.916102 0.861399 0\nM V30 18 H -2.886064 0.102295 -1.483772 0\nM V30 19 H -3.311149 -1.603108 -0.960354 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 1 10\nM V30 3 1 2 3\nM V30 4 1 2 11\nM V30 5 1 2 12\nM V30 6 1 3 4\nM V30 7 1 3 13\nM V30 8 1 3 14\nM V30 9 1 4 5\nM V30 10 1 5 6\nM V30 11 1 5 7\nM V30 12 1 7 8\nM V30 13 1 7 9\nM V30 14 1 7 15\nM V30 15 1 8 9\nM V30 16 1 8 16\nM V30 17 1 8 17\nM V30 18 1 9 18\nM V30 19 1 9 19\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]?\nAnswer: [H]OC([H])([H])C([H])([H])OC(O)C1([H])C([H])([H])C1([H])[H]"}", "/scratch/micpie/export/smiles_to_3d/valid_0-4.jsonl": "{"text":"Question: Can you provide me with 
the SMILES of the molecule with the three-dimensional molecular structure in MOLV2000 Molfile format (after B3LYP\/6-31G(2df,p) level of theory optimization) [V2000]\n\n ChemNLP 3D\n\n 17 19 0 0 0 0 0 0 0 0999 V2000\n 3.1458 -0.8753 -0.3614 N 0 0 0 0 0 1 0 0 0 0 0 0\n 2.2762 -0.2339 0.0502 C 0 0 0 0 0 2 0 0 0 0 0 0\n 1.1771 0.5291 0.5885 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5526 1.5360 -0.3336 N 0 0 0 0 0 0 0 0 0 0 0 0\n -0.2069 0.3859 0.0754 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.9273 -0.6510 -0.7966 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3780 -1.3103 0.5446 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.5668 0.2226 0.7711 C 0 0 0 0 0 0 0 0 0 0 0 0\n -2.1798 0.2749 -0.6672 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.3018 0.8071 1.6320 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.3067 2.3669 0.2011 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5473 -1.1423 -1.6902 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.2982 -1.8964 0.4977 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6074 -1.8094 1.1364 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9220 0.7279 1.6700 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.1398 1.2324 -1.1907 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.1489 -0.2150 -0.7873 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 3 1 0\n 3 4 1 0\n 3 5 1 0\n 3 10 1 0\n 4 5 1 0\n 4 11 1 0\n 5 6 1 0\n 5 8 1 0\n 6 7 1 0\n 6 9 1 0\n 6 12 1 0\n 7 8 1 0\n 7 13 1 0\n 7 14 1 0\n 8 9 1 0\n 8 15 1 0\n 9 16 1 0\n 9 17 1 0\nM END\n[\\V2000]?\nAnswer: [H]N1C([H])(CN)C12C1([H])C([H])([H])C2([H])C1([H])[H]"} {"text":"Question: Can you generate the SMILES of the molecule with the content of a V2000 Molfile file with the geometry (after optimization on B3LYP\/6-31G(2df,p) level of theory) [V2000]\n\n ChemNLP 3D\n\n 19 19 0 0 0 0 0 0 0 0999 V2000\n 3.1513 -1.0899 -0.4224 O 0 0 0 0 0 0 0 0 0 0 0 0\n 2.9575 0.0541 0.3852 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6950 0.8103 0.0142 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5997 -0.0996 0.2430 O 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6356 0.3770 -0.0358 C 0 0 0 0 0 3 0 0 0 0 0 0\n -0.8406 1.4973 -0.4377 O 0 0 0 0 0 1 0 0 0 0 0 0\n -1.6766 -0.6483 0.2191 C 0 0 0 0 0 0 0 0 0 0 0 0\n -3.0276 -0.1531 0.7085 C 0 0 0 0 0 0 0 0 0 0 0 
0\n -2.8917 -0.6400 -0.6944 C 0 0 0 0 0 0 0 0 0 0 0 0\n 2.3480 -1.6168 -0.3477 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.9246 -0.2056 1.4552 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.8230 0.7058 0.2272 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.5619 1.7088 0.6251 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.7025 1.1035 -1.0397 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3163 -1.6008 0.5852 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.5430 -0.7728 1.4329 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.1132 0.9161 0.8614 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.8861 0.1023 -1.4838 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.3111 -1.6031 -0.9604 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 1 10 1 0\n 2 3 1 0\n 2 11 1 0\n 2 12 1 0\n 3 4 1 0\n 3 13 1 0\n 3 14 1 0\n 4 5 1 0\n 5 6 1 0\n 5 7 1 0\n 7 8 1 0\n 7 9 1 0\n 7 15 1 0\n 8 9 1 0\n 8 16 1 0\n 8 17 1 0\n 9 18 1 0\n 9 19 1 0\nM END\n[\\V2000]?\nAnswer: [H]OC([H])([H])C([H])([H])OC(O)C1([H])C([H])([H])C1([H])[H]"}", "/scratch/micpie/export/smiles_to_3d/train_0-5.jsonl": "{"text":"Question: What is the SMILES of the chemical with the 3D-structure in V3000 Molfile format format (after optimization on B3LYP\/6-31G(2df,p) level of theory) [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 11 11 0 0 0\nM V30 BEGIN ATOM\nM V30 1 O 1.407535 1.925683 0.052787 0\nM V30 2 C 1.273193 0.600895 0.015746 0 VAL=3\nM V30 3 N 2.238042 -0.267475 0.104570 0 VAL=2\nM V30 4 O 1.627637 -1.502708 0.021176 0\nM V30 5 N 0.273757 -1.382774 -0.133663 0 VAL=2\nM V30 6 C 0.036992 -0.103374 -0.129300 0 VAL=3\nM V30 7 O -1.174030 0.471452 -0.268997 0\nM V30 8 C -2.280914 -0.207291 0.219674 0 VAL=3\nM V30 9 O -3.374476 0.203101 0.024239 0 VAL=1\nM V30 10 H 2.345677 2.138507 0.150673 0\nM V30 11 H -1.997226 -1.108371 0.782605 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 1 10\nM V30 3 1 2 3\nM V30 4 1 2 6\nM V30 5 1 3 4\nM V30 6 1 4 5\nM V30 7 1 5 6\nM V30 8 1 6 7\nM V30 9 1 7 8\nM V30 10 1 8 9\nM V30 11 1 8 11\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]?\nAnswer: [H]OC1NONC1OC([H])O"} {"text":"Question: Can you 
generate the SMILES of the compound with the data from a V3000 Molfile format file containing optimized geometry (using B3LYP\/6-31G(2df,p) theory) [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 18 20 0 0 0\nM V30 BEGIN ATOM\nM V30 1 C -2.924308 -0.010399 0.000198 0\nM V30 2 C -1.433831 0.105092 -0.000224 0 VAL=3\nM V30 3 N -0.806660 1.210620 -0.000233 0 VAL=2\nM V30 4 C 0.632648 1.079819 -0.000057 0\nM V30 5 C 1.263076 0.000852 -0.948347 0\nM V30 6 C 2.484529 -0.034144 0.000216 0\nM V30 7 C 1.262822 0.000892 0.948453 0\nM V30 8 C 0.600267 -1.034647 -0.000016 0\nM V30 9 O -0.818847 -1.110069 -0.000252 0\nM V30 10 H -3.263560 -0.560212 0.884151 0\nM V30 11 H -3.365343 0.985669 -0.004505 0\nM V30 12 H -3.263518 -0.569142 -0.878083 0\nM V30 13 H 1.114849 2.058926 -0.000020 0\nM V30 14 H 1.247867 -0.000729 -2.037674 0\nM V30 15 H 3.083552 -0.949092 0.000315 0\nM V30 16 H 3.115433 0.857499 0.000284 0\nM V30 17 H 1.247336 -0.000638 2.037777 0\nM V30 18 H 0.969557 -2.060865 0.000056 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 1 10\nM V30 3 1 1 11\nM V30 4 1 1 12\nM V30 5 1 2 3\nM V30 6 1 2 9\nM V30 7 1 3 4\nM V30 8 1 4 5\nM V30 9 1 4 7\nM V30 10 1 4 13\nM V30 11 1 5 6\nM V30 12 1 5 8\nM V30 13 1 5 14\nM V30 14 1 6 7\nM V30 15 1 6 15\nM V30 16 1 6 16\nM V30 17 1 7 8\nM V30 18 1 7 17\nM V30 19 1 8 9\nM V30 20 1 8 18\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]?\nAnswer: [H]C([H])([H])C1NC2([H])C3([H])C([H])([H])C2([H])C3([H])O1"}", "/scratch/micpie/export/smiles_to_3d/train_0-2.jsonl": "{"text":"Question: Can you provide me with the 3D molecular structure in V3000 Molfile format (following optimization using B3LYP\/6-31G(2df,p) theory) of the chemical with the SMILES [H]OC1NONC1OC([H])O?\nAnswer: [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 11 11 0 0 0\nM V30 BEGIN ATOM\nM V30 1 O 1.407535 1.925683 0.052787 0\nM V30 2 C 1.273193 0.600895 0.015746 0 VAL=3\nM V30 3 N 
2.238042 -0.267475 0.104570 0 VAL=2\nM V30 4 O 1.627637 -1.502708 0.021176 0\nM V30 5 N 0.273757 -1.382774 -0.133663 0 VAL=2\nM V30 6 C 0.036992 -0.103374 -0.129300 0 VAL=3\nM V30 7 O -1.174030 0.471452 -0.268997 0\nM V30 8 C -2.280914 -0.207291 0.219674 0 VAL=3\nM V30 9 O -3.374476 0.203101 0.024239 0 VAL=1\nM V30 10 H 2.345677 2.138507 0.150673 0\nM V30 11 H -1.997226 -1.108371 0.782605 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 1 10\nM V30 3 1 2 3\nM V30 4 1 2 6\nM V30 5 1 3 4\nM V30 6 1 4 5\nM V30 7 1 5 6\nM V30 8 1 6 7\nM V30 9 1 7 8\nM V30 10 1 8 9\nM V30 11 1 8 11\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]"} {"text":"Question: Can you provide me with the data from a V3000 Molfile format file containing optimized geometry (using B3LYP\/6-31G(2df,p) theory) of the compound with the SMILES [H]C([H])([H])C1NC2([H])C3([H])C([H])([H])C2([H])C3([H])O1?\nAnswer: [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 18 20 0 0 0\nM V30 BEGIN ATOM\nM V30 1 C -2.924308 -0.010399 0.000198 0\nM V30 2 C -1.433831 0.105092 -0.000224 0 VAL=3\nM V30 3 N -0.806660 1.210620 -0.000233 0 VAL=2\nM V30 4 C 0.632648 1.079819 -0.000057 0\nM V30 5 C 1.263076 0.000852 -0.948347 0\nM V30 6 C 2.484529 -0.034144 0.000216 0\nM V30 7 C 1.262822 0.000892 0.948453 0\nM V30 8 C 0.600267 -1.034647 -0.000016 0\nM V30 9 O -0.818847 -1.110069 -0.000252 0\nM V30 10 H -3.263560 -0.560212 0.884151 0\nM V30 11 H -3.365343 0.985669 -0.004505 0\nM V30 12 H -3.263518 -0.569142 -0.878083 0\nM V30 13 H 1.114849 2.058926 -0.000020 0\nM V30 14 H 1.247867 -0.000729 -2.037674 0\nM V30 15 H 3.083552 -0.949092 0.000315 0\nM V30 16 H 3.115433 0.857499 0.000284 0\nM V30 17 H 1.247336 -0.000638 2.037777 0\nM V30 18 H 0.969557 -2.060865 0.000056 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 1 10\nM V30 3 1 1 11\nM V30 4 1 1 12\nM V30 5 1 2 3\nM V30 6 1 2 9\nM V30 7 1 3 4\nM V30 8 1 4 5\nM V30 9 1 4 7\nM V30 10 1 4 13\nM V30 11 1 5 6\nM V30 
12 1 5 8\nM V30 13 1 5 14\nM V30 14 1 6 7\nM V30 15 1 6 15\nM V30 16 1 6 16\nM V30 17 1 7 8\nM V30 18 1 7 17\nM V30 19 1 8 9\nM V30 20 1 8 18\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]"}", "/scratch/micpie/export/smiles_to_3d/train_0-1.jsonl": "{"text":"Question: What is the molecular geometry in V2000 Molfile format (optimized with B3LYP\/6-31G(2df,p) level of theory) of the molecule with the SMILES [H]OC1NONC1OC([H])O?\nAnswer: [V2000]\n\n ChemNLP 3D\n\n 11 11 0 0 0 0 0 0 0 0999 V2000\n 1.4075 1.9257 0.0528 O 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2732 0.6009 0.0157 C 0 0 0 0 0 3 0 0 0 0 0 0\n 2.2380 -0.2675 0.1046 N 0 0 0 0 0 2 0 0 0 0 0 0\n 1.6276 -1.5027 0.0212 O 0 0 0 0 0 0 0 0 0 0 0 0\n 0.2738 -1.3828 -0.1337 N 0 0 0 0 0 2 0 0 0 0 0 0\n 0.0370 -0.1034 -0.1293 C 0 0 0 0 0 3 0 0 0 0 0 0\n -1.1740 0.4715 -0.2690 O 0 0 0 0 0 0 0 0 0 0 0 0\n -2.2809 -0.2073 0.2197 C 0 0 0 0 0 3 0 0 0 0 0 0\n -3.3745 0.2031 0.0242 O 0 0 0 0 0 1 0 0 0 0 0 0\n 2.3457 2.1385 0.1507 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9972 -1.1084 0.7826 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 1 10 1 0\n 2 3 1 0\n 2 6 1 0\n 3 4 1 0\n 4 5 1 0\n 5 6 1 0\n 6 7 1 0\n 7 8 1 0\n 8 9 1 0\n 8 11 1 0\nM END\n[\\V2000]"} {"text":"Question: Can you generate the content of a V2000 Molfile file with the geometry (after optimization on B3LYP\/6-31G(2df,p) level of theory) of the compound with the SMILES [H]C([H])([H])C1NC2([H])C3([H])C([H])([H])C2([H])C3([H])O1?\nAnswer: [V2000]\n\n ChemNLP 3D\n\n 18 20 0 0 0 0 0 0 0 0999 V2000\n -2.9243 -0.0104 0.0002 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.4338 0.1051 -0.0002 C 0 0 0 0 0 3 0 0 0 0 0 0\n -0.8067 1.2106 -0.0002 N 0 0 0 0 0 2 0 0 0 0 0 0\n 0.6326 1.0798 -0.0001 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2631 0.0009 -0.9483 C 0 0 0 0 0 0 0 0 0 0 0 0\n 2.4845 -0.0341 0.0002 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2628 0.0009 0.9485 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.6003 -1.0346 -0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8188 -1.1101 -0.0003 O 0 0 0 0 0 0 0 0 0 0 0 0\n -3.2636 -0.5602 0.8842 H 0 0 0 0 0 0 0 0 0 0 0 0\n 
-3.3653 0.9857 -0.0045 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.2635 -0.5691 -0.8781 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.1148 2.0589 -0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2479 -0.0007 -2.0377 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.0836 -0.9491 0.0003 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.1154 0.8575 0.0003 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2473 -0.0006 2.0378 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9696 -2.0609 0.0001 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 1 10 1 0\n 1 11 1 0\n 1 12 1 0\n 2 3 1 0\n 2 9 1 0\n 3 4 1 0\n 4 5 1 0\n 4 7 1 0\n 4 13 1 0\n 5 6 1 0\n 5 8 1 0\n 5 14 1 0\n 6 7 1 0\n 6 15 1 0\n 6 16 1 0\n 7 8 1 0\n 7 17 1 0\n 8 9 1 0\n 8 18 1 0\nM END\n[\\V2000]"}", "/scratch/micpie/export/smiles_to_3d/train_0-4.jsonl": "{"text":"Question: What is the SMILES of the chemical with the geometry in V2000 Molfile format (after optimization on B3LYP\/6-31G(2df,p) level of theory) [V2000]\n\n ChemNLP 3D\n\n 11 11 0 0 0 0 0 0 0 0999 V2000\n 1.4075 1.9257 0.0528 O 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2732 0.6009 0.0157 C 0 0 0 0 0 3 0 0 0 0 0 0\n 2.2380 -0.2675 0.1046 N 0 0 0 0 0 2 0 0 0 0 0 0\n 1.6276 -1.5027 0.0212 O 0 0 0 0 0 0 0 0 0 0 0 0\n 0.2738 -1.3828 -0.1337 N 0 0 0 0 0 2 0 0 0 0 0 0\n 0.0370 -0.1034 -0.1293 C 0 0 0 0 0 3 0 0 0 0 0 0\n -1.1740 0.4715 -0.2690 O 0 0 0 0 0 0 0 0 0 0 0 0\n -2.2809 -0.2073 0.2197 C 0 0 0 0 0 3 0 0 0 0 0 0\n -3.3745 0.2031 0.0242 O 0 0 0 0 0 1 0 0 0 0 0 0\n 2.3457 2.1385 0.1507 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9972 -1.1084 0.7826 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 1 10 1 0\n 2 3 1 0\n 2 6 1 0\n 3 4 1 0\n 4 5 1 0\n 5 6 1 0\n 6 7 1 0\n 7 8 1 0\n 8 9 1 0\n 8 11 1 0\nM END\n[\\V2000]?\nAnswer: [H]OC1NONC1OC([H])O"} {"text":"Question: Can you generate the SMILES of the compound with the 3D-structure in V2000 Molfile format (after optimization on B3LYP\/6-31G(2df,p) level of theory) [V2000]\n\n ChemNLP 3D\n\n 18 20 0 0 0 0 0 0 0 0999 V2000\n -2.9243 -0.0104 0.0002 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.4338 0.1051 -0.0002 C 0 0 0 0 0 3 0 0 0 0 0 0\n -0.8067 1.2106 -0.0002 N 0 0 0 0 0 2 0 0 0 0 0 0\n 
0.6326 1.0798 -0.0001 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2631 0.0009 -0.9483 C 0 0 0 0 0 0 0 0 0 0 0 0\n 2.4845 -0.0341 0.0002 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2628 0.0009 0.9485 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.6003 -1.0346 -0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8188 -1.1101 -0.0003 O 0 0 0 0 0 0 0 0 0 0 0 0\n -3.2636 -0.5602 0.8842 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.3653 0.9857 -0.0045 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.2635 -0.5691 -0.8781 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.1148 2.0589 -0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2479 -0.0007 -2.0377 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.0836 -0.9491 0.0003 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.1154 0.8575 0.0003 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2473 -0.0006 2.0378 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9696 -2.0609 0.0001 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 1 10 1 0\n 1 11 1 0\n 1 12 1 0\n 2 3 1 0\n 2 9 1 0\n 3 4 1 0\n 4 5 1 0\n 4 7 1 0\n 4 13 1 0\n 5 6 1 0\n 5 8 1 0\n 5 14 1 0\n 6 7 1 0\n 6 15 1 0\n 6 16 1 0\n 7 8 1 0\n 7 17 1 0\n 8 9 1 0\n 8 18 1 0\nM END\n[\\V2000]?\nAnswer: [H]C([H])([H])C1NC2([H])C3([H])C([H])([H])C2([H])C3([H])O1"}", "/scratch/micpie/export/smiles_to_3d/valid_0-3.jsonl": "{"text":"Question: What is the SMILES of the molecule with 3D molecular structure in XYZ format (following optimization using B3LYP\/6-31G(2df,p) theory) [XYZ]\n17\nH8 C7 N2\nN 3.146 -0.875 -0.361\nC 2.276 -0.234 0.050\nC 1.177 0.529 0.588\nN 0.553 1.536 -0.334\nC -0.207 0.386 0.075\nC -0.927 -0.651 -0.797\nC -1.378 -1.310 0.545\nC -1.567 0.223 0.771\nC -2.180 0.275 -0.667\nH 1.302 0.807 1.632\nH 0.307 2.367 0.201\nH -0.547 -1.142 -1.690\nH -2.298 -1.896 0.498\nH -0.607 -1.809 1.136\nH -1.922 0.728 1.670\nH -2.140 1.232 -1.191\nH -3.149 -0.215 -0.787[\\XYZ]?\nAnswer: [H]N1C([H])(CN)C12C1([H])C([H])([H])C2([H])C1([H])[H]"} {"text":"Question: Can you provide me with the SMILES of the chemical with three-dimensional molecular structure in XYZ format (after B3LYP\/6-31G(2df,p) level of theory optimization) [XYZ]\n19\nH10 C6 O3\nO 3.151 -1.090 -0.422\nC 2.958 0.054 0.385\nC 1.695 
0.810 0.014\nO 0.600 -0.100 0.243\nC -0.636 0.377 -0.036\nO -0.841 1.497 -0.438\nC -1.677 -0.648 0.219\nC -3.028 -0.153 0.709\nC -2.892 -0.640 -0.694\nH 2.348 -1.617 -0.348\nH 2.925 -0.206 1.455\nH 3.823 0.706 0.227\nH 1.562 1.709 0.625\nH 1.703 1.104 -1.040\nH -1.316 -1.601 0.585\nH -3.543 -0.773 1.433\nH -3.113 0.916 0.861\nH -2.886 0.102 -1.484\nH -3.311 -1.603 -0.960[\\XYZ]?\nAnswer: [H]OC([H])([H])C([H])([H])OC(O)C1([H])C([H])([H])C1([H])[H]"}", "/scratch/micpie/export/smiles_to_3d/test_0-4.jsonl": "{"text":"Question: Can you generate the SMILES of the molecule with the 3D molecular structure in V2000 Molfile format (following optimization using B3LYP\/6-31G(2df,p) theory) [V2000]\n\n ChemNLP 3D\n\n 22 24 0 0 0 0 0 0 0 0999 V2000\n 1.6008 -1.5659 0.5156 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7045 -0.4723 -0.0533 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.3837 0.5349 -1.0390 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.8499 1.7255 -0.1925 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.3287 0.7275 0.8847 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.1788 0.6019 1.1831 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.7592 -0.4452 0.3125 N 0 0 0 0 0 0 0 0 0 0 0 0\n -1.6586 -0.1488 -1.1216 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6512 -1.0313 -0.4790 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0969 -2.1136 1.3203 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.8785 -2.2921 -0.2576 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.5260 -1.1448 0.9227 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.4745 0.4601 -1.0278 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0498 0.4827 -2.0777 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6140 2.4227 0.1593 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0642 2.3134 -0.6749 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9128 0.7711 1.8076 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3359 0.2815 2.2185 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.7203 1.5454 1.0488 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.4459 -0.5851 -1.7313 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3630 0.8493 -1.4330 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7109 -2.1039 -0.6502 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 1 10 1 0\n 1 11 1 0\n 1 12 1 0\n 2 3 1 0\n 2 5 1 0\n 2 9 1 0\n 3 4 1 0\n 3 13 1 0\n 3 14 1 0\n 4 5 1 0\n 4 15 1 0\n 4 16 1 0\n 5 6 1 
0\n 5 17 1 0\n 6 7 1 0\n 6 18 1 0\n 6 19 1 0\n 7 8 1 0\n 7 9 1 0\n 8 9 1 0\n 8 20 1 0\n 8 21 1 0\n 9 22 1 0\nM END\n[\\V2000]?\nAnswer: [H]C([H])([H])C12C([H])([H])C([H])([H])C1([H])C([H])([H])N1C([H])([H])C12[H]"} {"text":"Question: What is the SMILES of the molecule with the molecular geometry in V2000 Molfile format (optimized with B3LYP\/6-31G(2df,p) level of theory) [V2000]\n\n ChemNLP 3D\n\n 19 19 0 0 0 0 0 0 0 0999 V2000\n 2.5816 -1.1232 0.6683 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.8837 -0.5454 -0.5662 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.4888 -0.2114 -0.3298 N 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5580 -1.1294 -0.2957 C 0 0 0 0 0 3 0 0 0 0 0 0\n -1.6604 -0.3827 0.0199 C 0 0 0 0 0 3 0 0 0 0 0 0\n -2.9327 -0.8198 0.1621 O 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3407 0.9325 0.1831 N 0 0 0 0 0 2 0 0 0 0 0 0\n -0.0348 1.0097 -0.0300 C 0 0 0 0 0 3 0 0 0 0 0 0\n 0.7602 2.2714 0.0419 C 0 0 0 0 0 0 0 0 0 0 0 0\n 3.6212 -1.3719 0.4338 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.5764 -0.4042 1.4927 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.0788 -2.0324 1.0095 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.3944 0.3607 -0.9027 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.9257 -1.2601 -1.3950 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.4160 -2.1748 -0.5049 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.4567 -0.0387 0.3772 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.5381 2.2336 0.8135 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2514 2.5088 -0.9092 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0787 3.0874 0.2853 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 1 10 1 0\n 1 11 1 0\n 1 12 1 0\n 2 3 1 0\n 2 13 1 0\n 2 14 1 0\n 3 4 1 0\n 3 8 1 0\n 4 5 1 0\n 4 15 1 0\n 5 6 1 0\n 5 7 1 0\n 6 16 1 0\n 7 8 1 0\n 8 9 1 0\n 9 17 1 0\n 9 18 1 0\n 9 19 1 0\nM END\n[\\V2000]?\nAnswer: [H]OC1NC(C([H])([H])[H])N(C([H])([H])C([H])([H])[H])C1[H]"}", "/scratch/micpie/export/MUV_600/valid_0-0.jsonl": "{"text":"The chemical with the InChI InChI=1S\/C20H11N3O3\/c24-19-14-7-4-10-21-17(14)20(25)23(19)13-6-3-5-12(11-13)18-22-15-8-1-2-9-16(15)26-18\/h1-11H is not an inhibitor of the steroidogenic factor 1 (SF-1)."} {"text":"The molecular species with the DeepSMILES 
representation of COcccccc6-ccC)nncNCCCCC6))OCCO5))))))))ccC)nc96 is not an inhibitor of SF-1."}", "/scratch/micpie/export/MUV_600/test_0-0.jsonl": "{"text":"The chemical compound with the SELFIES representation of ['[C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C][=N][C][=C][Branch1][Ring2][S][Ring1][Branch1][C][C][C][Ring1][=Branch1]'] is not an inhibitor of the steroidogenic factor 1 (SF-1)."} {"text":"The chemical compound with the InChI representation of InChI=1S\/C19H19N5OS\/c1-25-10-9-24-18(16-12-21-17-7-3-2-6-15(16)17)22-23-19(24)26-13-14-5-4-8-20-11-14\/h2-8,11-12,22H,9-10,13H2,1H3\/b18-16- is not an inhibitor of SF-1."}", "/scratch/micpie/export/MUV_600/train_0-0.jsonl": "{"text":"The chemical with the DeepSMILES representation of COcccccc6NC=O)CNC=O)NCCCcccccc69))))))))C5=O)))))))))))occcccc69 is not an inhibitor of the steroidogenic factor 1 (SF-1)."} {"text":"The compound with the canonical SMILES O=C(CCn1cccc1)Nc1nccs1 is not an inhibitor of SF-1."}", "/scratch/micpie/export/sarscov2_vitro_touret/test_0-10.jsonl": "{"text":"User: I'm searching for the DeepSMILES of a molecule that is mitigating the effects of the Corona virus?\nAssistant: This is a molecule that is mitigating the effects of the Corona virus: C=CCOcccccc6OCCO)CNCC)C.Cl"} {"text":"User: I'm looking for the SELFIES of a molecule that is not mitigating the effects of the Corona virus?\nAssistant: This is a molecule that is not mitigating the effects of the Corona virus: [C][C][O][C][O][C@H1][Branch2][Ring1][=Branch2][C@@H1][Branch1][N][C][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C@H1][Branch1][O][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C@H1][Ring2][Ring1][#C][O]"}", "/scratch/micpie/export/sarscov2_vitro_touret/valid_0-8.jsonl": "{"text":"User: Is the molecule with the SMILES C=C1CC[C@@]2(O)[C@H]3Cc4ccc(O)c5c4[C@@]2(CCN3CC2CC2)[C@H]1O5.Cl mitigating the effects of the Corona virus?\nAssistant: Yes, it is mitigating the effects of the 
Corona virus."} {"text":"User: Is the molecule with the canonical SMILES Clc1ccc(C(c2ccccc2Cl)C(Cl)Cl)cc1 mitigating the effects of the Corona virus?\nAssistant: No, it is not mitigating the effects of the Corona virus."}", "/scratch/micpie/export/sarscov2_vitro_touret/train_0-8.jsonl": "{"text":"User: Is the molecule with the canonical SMILES CCOc1ccc2nc(S(N)(=O)=O)sc2c1 mitigating the effects of the Corona virus?\nAssistant: Yes, it is mitigating the effects of the Corona virus."} {"text":"User: Is the molecule with the SELFIES [N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=Branch1][C][=C][Ring1][=Branch1][Cl][N][C][N][S][Ring1][=Branch2][=Branch1][C][=O][=O] mitigating the effects of the Corona virus?\nAssistant: No, it is not mitigating the effects of the Corona virus."}", "/scratch/micpie/export/sarscov2_vitro_touret/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is mitigating the effects of the Corona virus.\nSMILES: C=CCOc1ccccc1OCC(O)CNC(C)C.Cl\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is mitigating the effects of the Corona virus."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is mitigating the effects of the Corona virus.\ncanonical SMILES: CCOC1O[C@H]([C@@H](COCc2ccccc2)OCc2ccccc2)[C@H](OCc2ccccc2)[C@H]1O\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not mitigating the effects of the Corona virus."}", "/scratch/micpie/export/sarscov2_vitro_touret/valid_0-9.jsonl": "{"text":"User: Can you generate the SELFIES of a molecule that is mitigating the effects of the Corona virus?\nAssistant: Yes, here you go: [C][=C][C][C][C@@][Branch1][C][O][C@H1][C][C][=C][C][=C][Branch1][C][O][C][=C][Ring1][#Branch1][C@@][Ring1][N][Branch1][N][C][C][N][Ring1][=N][C][C][C][C][Ring1][Ring1][C@H1][Ring2][Ring1][#Branch1][O][Ring1][N].[Cl]"} {"text":"User: Can you give me 
the SMILES of a molecule that is not mitigating the effects of the Corona virus?\nAssistant: Yes, I'm happy to help, here you go: Clc1ccc(C(c2ccccc2Cl)C(Cl)Cl)cc1"}", "/scratch/micpie/export/sarscov2_vitro_touret/test_0-1.jsonl": "{"text":"Based on the InChI InChI=1S\/C15H23NO3.ClH\/c1-4-9-18-14-7-5-6-8-15(14)19-11-13(17)10-16-12(2)3;\/h4-8,12-13,16-17H,1,9-11H2,2-3H3;1H, the molecule is ineffectevelymitigating the effects of the Corona virus."} {"text":"Based on the DeepSMILES representation CCOCO[C@H][C@@H]COCcccccc6)))))))))OCcccccc6)))))))))[C@H]OCcccccc6))))))))[C@H]5O, the molecule is effectively mitigating the effects of the Corona virus."}", "/scratch/micpie/export/sarscov2_vitro_touret/valid_0-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of C=CCC[C@@]O)[C@H]CccccO)cc6[C@@]%10CCN%12CCCC3)))))))[C@H]%14O5.Cl displays activity against the Corona virus."} {"text":"The molecule with the DeepSMILES ClccccCcccccc6Cl)))))))CCl)Cl)))cc6 displays no activity against the Corona virus."}", "/scratch/micpie/export/sarscov2_vitro_touret/test_0-2.jsonl": "{"text":"The SMILES C=CCOc1ccccc1OCC(O)CNC(C)C.Cl represents a molecule that displays activity against the Corona virus."} {"text":"The InChI InChI=1S\/C29H34O6\/c1-2-32-29-26(30)28(34-20-24-16-10-5-11-17-24)27(35-29)25(33-19-23-14-8-4-9-15-23)21-31-18-22-12-6-3-7-13-22\/h3-17,25-30H,2,18-21H2,1H3\/t25-,26-,27-,28-,29?\/m1\/s1 is from a molecule that exhibits no activity against the Corona virus."}", "/scratch/micpie/export/sarscov2_vitro_touret/valid_0-10.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that is mitigating the effects of the Corona virus?\nAssistant: This is a molecule that is mitigating the effects of the Corona virus: C=C1CC[C@@]2(O)[C@H]3Cc4ccc(O)c5c4[C@@]2(CCN3CC2CC2)[C@H]1O5.Cl"} {"text":"User: I'm looking for the InChI of a molecule that is not mitigating the effects of the Corona virus?\nAssistant: This is a molecule that is not mitigating the 
effects of the Corona virus: InChI=1S\/C14H10Cl4\/c15-10-7-5-9(6-8-10)13(14(17)18)11-3-1-2-4-12(11)16\/h1-8,13-14H"}", "/scratch/micpie/export/sarscov2_vitro_touret/train_0-6.jsonl": "{"text":"Task: Please create a canonical SMILES based on the description below.\nDescription: A molecule that is mitigating the effects of the Corona virus.\nResult: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} {"text":"Task: Please generate a SELFIES based on the text description below.\nDescription: A molecule that is mitigating the effects of the Corona virus.\nResult: [N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=Branch1][C][=C][Ring1][=Branch1][Cl][N][C][N][S][Ring1][=Branch2][=Branch1][C][=O][=O]"}", "/scratch/micpie/export/sarscov2_vitro_touret/valid_0-6.jsonl": "{"text":"Task: Please generate a canonical SMILES based on the text description below.\nDescription: A molecule that is mitigating the effects of the Corona virus.\nResult: C=C1CC[C@@]2(O)[C@H]3Cc4ccc(O)c5c4[C@@]2(CCN3CC2CC2)[C@H]1O5.Cl"} {"text":"Task: Please generate a DeepSMILES based on the text description.\nDescription: A molecule that is mitigating the effects of the Corona virus.\nResult: ClccccCcccccc6Cl)))))))CCl)Cl)))cc6"}", "/scratch/micpie/export/sarscov2_vitro_touret/test_0-9.jsonl": "{"text":"User: Can you create the SMILES of a molecule that is mitigating the effects of the Corona virus?\nAssistant: Sure, here you go: C=CCOc1ccccc1OCC(O)CNC(C)C.Cl"} {"text":"User: Can you create the canonical SMILES of a molecule that is not mitigating the effects of the Corona virus?\nAssistant: Yes, I'm happy to help, here you go: CCOC1O[C@H]([C@@H](COCc2ccccc2)OCc2ccccc2)[C@H](OCc2ccccc2)[C@H]1O"}", "/scratch/micpie/export/sarscov2_vitro_touret/test_0-0.jsonl": "{"text":"The molecule with the SMILES C=CCOc1ccccc1OCC(O)CNC(C)C.Cl exhibits activity against COVID19."} {"text":"The molecule with the InChI representation of 
InChI=1S\/C29H34O6\/c1-2-32-29-26(30)28(34-20-24-16-10-5-11-17-24)27(35-29)25(33-19-23-14-8-4-9-15-23)21-31-18-22-12-6-3-7-13-22\/h3-17,25-30H,2,18-21H2,1H3\/t25-,26-,27-,28-,29?\/m1\/s1 shows no activity against SARSCoV2."}", "/scratch/micpie/export/sarscov2_vitro_touret/valid_0-7.jsonl": "{"text":"User: Can you derive if the molecule with the DeepSMILES C=CCC[C@@]O)[C@H]CccccO)cc6[C@@]%10CCN%12CCCC3)))))))[C@H]%14O5.Cl is mitigating the effects of the Corona virus?\nAssistant: Yes, this molecule is mitigating the effects of the Corona virus."} {"text":"User: Can you estimate if the molecule with the InChI InChI=1S\/C14H10Cl4\/c15-10-7-5-9(6-8-10)13(14(17)18)11-3-1-2-4-12(11)16\/h1-8,13-14H is mitigating the effects of the Corona virus?\nAssistant: No, this molecule is not mitigating the effects of the Corona virus."}", "/scratch/micpie/export/sarscov2_vitro_touret/test_0-3.jsonl": "{"text":"The SMILES C=CCOc1ccccc1OCC(O)CNC(C)C.Cl is mitigating the effects of the Corona virus."} {"text":"The molecule SMILES CCOC1O[C@H]([C@@H](COCc2ccccc2)OCc2ccccc2)[C@H](OCc2ccccc2)[C@H]1O is not mitigating the effects of the Corona virus."}", "/scratch/micpie/export/sarscov2_vitro_touret/valid_0-11.jsonl": "{"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should be mitigating the effects of the Corona virus.\nAssistant: Ok, here you go, this DeepSMILES is mitigating the effects of the Corona virus: C=CCC[C@@]O)[C@H]CccccO)cc6[C@@]%10CCN%12CCCC3)))))))[C@H]%14O5.Cl"} {"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should not be mitigating the effects of the Corona virus.\nAssistant: Got it, this DeepSMILES is not mitigating the effects of the Corona virus: ClccccCcccccc6Cl)))))))CCl)Cl)))cc6"}", "/scratch/micpie/export/sarscov2_vitro_touret/train_0-0.jsonl": "{"text":"The molecule with the canonical SMILES CCOc1ccc2nc(S(N)(=O)=O)sc2c1 shows activity against SARSCoV2."} {"text":"The molecule with the InChI representation of InChI=1S\/C7H8ClN3O4S2\/c8-4-1-5-7(2-6(4)16(9,12)13)17(14,15)11-3-10-5\/h1-2,10-11H,3H2,(H2,9,12,13) exhibits no activity against the Corona virus."}", "/scratch/micpie/export/sarscov2_vitro_touret/test_0-6.jsonl": "{"text":"Task: Please create a molecule SELFIES based on the text description.\nDescription: A molecule that is mitigating the effects of the Corona virus.\nResult: [C][=C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][C][Branch1][C][O][C][N][C][Branch1][C][C][C].[Cl]"} {"text":"Task: Please create a canonical SMILES based on the text description below.\nDescription: A molecule that is mitigating the effects of the Corona virus.\nResult: CCOC1O[C@H]([C@@H](COCc2ccccc2)OCc2ccccc2)[C@H](OCc2ccccc2)[C@H]1O"}", "/scratch/micpie/export/sarscov2_vitro_touret/train_0-10.jsonl": "{"text":"User: I'm searching for the InChI of a molecule that is mitigating the effects of the Corona virus?\nAssistant: This is a molecule that is mitigating the effects of the Corona virus: InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13)"} {"text":"User: I'm searching for the SELFIES of a molecule that is not mitigating the effects of the Corona virus?\nAssistant: This is a molecule that is not mitigating the effects of the Corona virus: [N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=Branch1][C][=C][Ring1][=Branch1][Cl][N][C][N][S][Ring1][=Branch2][=Branch1][C][=O][=O]"}", "/scratch/micpie/export/sarscov2_vitro_touret/train_0-3.jsonl": "{"text":"The SMILES CCOc1ccc2nc(S(N)(=O)=O)sc2c1 is 
mitigating the effects of the Corona virus."} {"text":"The molecule DeepSMILES NS=O)=O)cccccc6Cl)))NCNS6=O)=O is not mitigating the effects of the Corona virus."}", "/scratch/micpie/export/sarscov2_vitro_touret/train_0-12.jsonl": "{"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should be mitigating the effects of the Corona virus.\nAssistant: Understood, this DeepSMILES is mitigating the effects of the Corona virus: CCOccccncSN)=O)=O))sc5c9"} {"text":"User: I want to create a InChI.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should not be mitigating the effects of the Corona virus.\nAssistant: Ok, this InChI is not mitigating the effects of the Corona virus: InChI=1S\/C7H8ClN3O4S2\/c8-4-1-5-7(2-6(4)16(9,12)13)17(14,15)11-3-10-5\/h1-2,10-11H,3H2,(H2,9,12,13)"}", "/scratch/micpie/export/sarscov2_vitro_touret/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are mitigating the effects of the Corona virus?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any additional words.\nOptions:\n(1) CN(C)CCc1c[nH]c2ccc(Cn3cncn3)cc12.O=C(O)c1ccccc1\n(2) CCOC(=O)[C@H](CCc1ccccc1)N[C@@H](C)C(=O)N1[C@H](C(=O)O)C[C@@H]2CCC[C@@H]21\n(3) C=CCOc1ccccc1OCC(O)CNC(C)C.Cl\n(4) CCOCC(O)COc1ccc(NC(=O)CC[S+](C)C)cc1.Cc1ccc(S(=O)(=O)[O-])cc1\nAnswer: 3"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not mitigating the effects of the Corona virus?\nConstraint: You must select none, one or more options from a or b without using any other words.\nOptions:\na: Cl.O=C(O)\/C=C\/c1ccc(Cn2ccnc2)cc1\nb: CCOC1O[C@H]([C@@H](COCc2ccccc2)OCc2ccccc2)[C@H](OCc2ccccc2)[C@H]1O\nAnswer: a, b"}", "/scratch/micpie/export/sarscov2_vitro_touret/valid_0-2.jsonl": "{"text":"The SELFIES 
[C][=C][C][C][C@@][Branch1][C][O][C@H1][C][C][=C][C][=C][Branch1][C][O][C][=C][Ring1][#Branch1][C@@][Ring1][N][Branch1][N][C][C][N][Ring1][=N][C][C][C][C][Ring1][Ring1][C@H1][Ring2][Ring1][#Branch1][O][Ring1][N].[Cl] is from a molecule that exhibits activity against COVID19."} {"text":"The SELFIES [Cl][C][=C][C][=C][Branch2][Ring1][C][C][Branch1][#Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][Cl][C][Branch1][C][Cl][Cl][C][=C][Ring1][P] is from a molecule that exhibits no activity against COVID19."}", "/scratch/micpie/export/sarscov2_vitro_touret/valid_0-1.jsonl": "{"text":"Based on the SELFIES representation [C][=C][C][C][C@@][Branch1][C][O][C@H1][C][C][=C][C][=C][Branch1][C][O][C][=C][Ring1][#Branch1][C@@][Ring1][N][Branch1][N][C][C][N][Ring1][=N][C][C][C][C][Ring1][Ring1][C@H1][Ring2][Ring1][#Branch1][O][Ring1][N].[Cl], the molecule is ineffectevelymitigating the effects of the Corona virus."} {"text":"Based on the SMILES representation Clc1ccc(C(c2ccccc2Cl)C(Cl)Cl)cc1, the molecule is effectively mitigating the effects of the Corona virus."}", "/scratch/micpie/export/sarscov2_vitro_touret/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are mitigating the effects of the Corona virus?\nConstraint: You must select none, one or more options from a, b, or c without using any other words.\nOptions:\n[a] C[C@H](N[C@@H](CCc1ccccc1)C(=O)O)C(=O)N1CCC[C@H]1C(=O)O.O.O\n[b] C=C1CC[C@@]2(O)[C@H]3Cc4ccc(O)c5c4[C@@]2(CCN3CC2CC2)[C@H]1O5.Cl\n[c] COc1ccc(CCN(C)CCCC(C#N)(c2ccc(OC)c(OC)c2)C(C)C)cc1OC.Cl\nAnswer: b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not mitigating the effects of the Corona virus?\nConstraint: You must select none, one or more options from a, b, c, d, or e without using any additional words.\nOptions:\na. CCC)N=cccn-ccccCl)cc6))))))cccccc6nc-%10cc%14NccccCl)cc6\nb. Cl.NCccccSN)=O)=O))cc6\nc. CCOC=O)CC)N)CcccI)cOcccI)cO)cI)c6)))))))cI)c6.Cl\nd. 
ClccccCcccccc6Cl)))))))CCl)Cl)))cc6\ne. CC=O)Ncccccc6\nAnswer: a, b, c, d, e"}", "/scratch/micpie/export/sarscov2_vitro_touret/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is mitigating the effects of the Corona virus.\nMolecule SELFIES: [C][=C][C][C][C@@][Branch1][C][O][C@H1][C][C][=C][C][=C][Branch1][C][O][C][=C][Ring1][#Branch1][C@@][Ring1][N][Branch1][N][C][C][N][Ring1][=N][C][C][C][C][Ring1][Ring1][C@H1][Ring2][Ring1][#Branch1][O][Ring1][N].[Cl]\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is mitigating the effects of the Corona virus."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is mitigating the effects of the Corona virus.\nSELFIES: [Cl][C][=C][C][=C][Branch2][Ring1][C][C][Branch1][#Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][Cl][C][Branch1][C][Cl][Cl][C][=C][Ring1][P]\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not mitigating the effects of the Corona virus."}", "/scratch/micpie/export/sarscov2_vitro_touret/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is mitigating the effects of the Corona virus.\nMolecule SELFIES: [C][=C][C][C][C@@][Branch1][C][O][C@H1][C][C][=C][C][=C][Branch1][C][O][C][=C][Ring1][#Branch1][C@@][Ring1][N][Branch1][N][C][C][N][Ring1][=N][C][C][C][C][Ring1][Ring1][C@H1][Ring2][Ring1][#Branch1][O][Ring1][N].[Cl]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is mitigating the effects of the Corona virus.\nMolecule DeepSMILES: ClccccCcccccc6Cl)))))))CCl)Cl)))cc6\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any words.\nResult: False"}", 
"/scratch/micpie/export/sarscov2_vitro_touret/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is mitigating the effects of the Corona virus.\ncanonical SMILES: CCOc1ccc2nc(S(N)(=O)=O)sc2c1\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is mitigating the effects of the Corona virus."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is mitigating the effects of the Corona virus.\nSMILES: NS(=O)(=O)c1cc2c(cc1Cl)NCNS2(=O)=O\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not mitigating the effects of the Corona virus."}", "/scratch/micpie/export/sarscov2_vitro_touret/valid_0-12.jsonl": "{"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should be mitigating the effects of the Corona virus.\nAssistant: Ok, this DeepSMILES is mitigating the effects of the Corona virus: C=CCC[C@@]O)[C@H]CccccO)cc6[C@@]%10CCN%12CCCC3)))))))[C@H]%14O5.Cl"} {"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should not be mitigating the effects of the Corona virus.\nAssistant: Got it, this DeepSMILES is not mitigating the effects of the Corona virus: ClccccCcccccc6Cl)))))))CCl)Cl)))cc6"}", "/scratch/micpie/export/sarscov2_vitro_touret/train_0-2.jsonl": "{"text":"The SELFIES [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N] represents a molecule that exhibits activity against SARSCoV2."} {"text":"The canonical SMILES NS(=O)(=O)c1cc2c(cc1Cl)NCNS2(=O)=O is from a molecule that exhibits no activity against the Corona virus."}", "/scratch/micpie/export/sarscov2_vitro_touret/test_0-11.jsonl": "{"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should be mitigating the effects of the Corona virus.\nAssistant: Ok, here you go, this SMILES is mitigating the effects of the Corona virus: C=CCOc1ccccc1OCC(O)CNC(C)C.Cl"} {"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should not be mitigating the effects of the Corona virus.\nAssistant: Ok, this SMILES is not mitigating the effects of the Corona virus: CCOC1O[C@H]([C@@H](COCc2ccccc2)OCc2ccccc2)[C@H](OCc2ccccc2)[C@H]1O"}", "/scratch/micpie/export/sarscov2_vitro_touret/train_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the canonical SMILES CCOc1ccc2nc(S(N)(=O)=O)sc2c1 is mitigating the effects of the Corona virus?\nAssistant: Yes, this molecule is mitigating the effects of the Corona virus."} {"text":"User: Can you tell me if the molecule with the SMILES NS(=O)(=O)c1cc2c(cc1Cl)NCNS2(=O)=O is mitigating the effects of the Corona virus?\nAssistant: No, this molecule is not mitigating the effects of the Corona virus."}", "/scratch/micpie/export/sarscov2_vitro_touret/train_0-11.jsonl": "{"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should be mitigating the effects of the Corona virus.\nAssistant: Got it, here you go, this DeepSMILES is mitigating the effects of the Corona virus: CCOccccncSN)=O)=O))sc5c9"} {"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should not be mitigating the effects of the Corona virus.\nAssistant: Got it, here you go, this DeepSMILES is not mitigating the effects of the Corona virus: NS=O)=O)cccccc6Cl)))NCNS6=O)=O"}", "/scratch/micpie/export/sarscov2_vitro_touret/train_0-1.jsonl": "{"text":"Based on the InChI InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13), the molecule is ineffectevelymitigating the effects of the Corona virus."} {"text":"Based on the SELFIES [N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=Branch1][C][=C][Ring1][=Branch1][Cl][N][C][N][S][Ring1][=Branch2][=Branch1][C][=O][=O], the molecule is effectively mitigating the effects of the Corona virus."}", "/scratch/micpie/export/sarscov2_vitro_touret/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are mitigating the effects of the Corona virus?\nConstraint: You must select none, one or more options from a, b, or c without using any other words.\nOptions:\n[a] CC(=O)OCC(=O)[C@H]1CC[C@H]2[C@@H]3CC[C@H]4C[C@H](O)CC[C@]4(C)[C@H]3C(=O)C[C@]12C\n[b] CN(C)CCC(c1ccc(Cl)cc1)c1ccccn1.O=C(O)\/C=C\\C(=O)O\n[c] CCOc1ccc2nc(S(N)(=O)=O)sc2c1\nAnswer: a, c"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not mitigating the effects of the Corona virus?\nConstraint: You must select none, one or more options from a, b, c, d, or e without using any additional words.\nOptions:\na: InChI=1S\/C22H29FO5\/c1-12-8-16-15-5-4-13-9-14(25)6-7-19(13,2)21(15,23)17(26)10-20(16,3)22(12,28)18(27)11-24\/h6-7,9,12,15-17,24,26,28H,4-5,8,10-11H2,1-3H3\/t12-,15-,16-,17-,19-,20-,21-,22-\/m0\/s1\nb: InChI=1S\/C10H15N3O5.ClH\/c11-6(4-14)10(18)13-12-3-5-1-2-7(15)9(17)8(5)16;\/h1-2,6,12,14-17H,3-4,11H2,(H,13,18);1H\nc: InChI=1S\/C17H24N2O.ClH\/c1-4-5-9-15-13-14-8-6-7-10-16(14)17(18-15)20-12-11-19(2)3;\/h6-8,10,13H,4-5,9,11-12H2,1-3H3;1H\nd: 
InChI=1S\/C7H8ClN3O4S2\/c8-4-1-5-7(2-6(4)16(9,12)13)17(14,15)11-3-10-5\/h1-2,10-11H,3H2,(H2,9,12,13)\ne: InChI=1S\/C24H31NO6.ClH\/c1-25(2)16-21(31-24(28)14-13-23(26)27)17-30-22-10-5-4-8-19(22)12-11-18-7-6-9-20(15-18)29-3;\/h4-10,15,21H,11-14,16-17H2,1-3H3,(H,26,27);1H\nAnswer: a, b, c, d, e"}", "/scratch/micpie/export/sarscov2_vitro_touret/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is mitigating the effects of the Corona virus.\nInChI: InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13)\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is mitigating the effects of the Corona virus.\nMolecule SELFIES: [N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=Branch1][C][=C][Ring1][=Branch1][Cl][N][C][N][S][Ring1][=Branch2][=Branch1][C][=O][=O]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/sarscov2_vitro_touret/test_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the SMILES C=CCOc1ccccc1OCC(O)CNC(C)C.Cl is mitigating the effects of the Corona virus?\nAssistant: Yes, this molecule is mitigating the effects of the Corona virus."} {"text":"User: Can you tell me if the molecule with the canonical SMILES CCOC1O[C@H]([C@@H](COCc2ccccc2)OCc2ccccc2)[C@H](OCc2ccccc2)[C@H]1O is mitigating the effects of the Corona virus?\nAssistant: No, this molecule is not mitigating the effects of the Corona virus."}", "/scratch/micpie/export/sarscov2_vitro_touret/train_0-9.jsonl": "{"text":"User: Can you give me the canonical SMILES of a molecule that is mitigating the effects of the Corona virus?\nAssistant: Sure, here you go: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} 
{"text":"User: Can you generate the DeepSMILES of a molecule that is not mitigating the effects of the Corona virus?\nAssistant: Sure, here you go: NS=O)=O)cccccc6Cl)))NCNS6=O)=O"}", "/scratch/micpie/export/sarscov2_vitro_touret/valid_0-3.jsonl": "{"text":"The InChI InChI=1S\/C21H25NO3.ClH\/c1-12-6-7-21(24)16-10-14-4-5-15(23)18-17(14)20(21,19(12)25-18)8-9-22(16)11-13-2-3-13;\/h4-5,13,16,19,23-24H,1-3,6-11H2;1H\/t16-,19+,20+,21-;\/m1.\/s1 is mitigating the effects of the Corona virus."} {"text":"The molecule DeepSMILES ClccccCcccccc6Cl)))))))CCl)Cl)))cc6 is not mitigating the effects of the Corona virus."}", "/scratch/micpie/export/sarscov2_vitro_touret/test_0-8.jsonl": "{"text":"User: Is the molecule with the DeepSMILES C=CCOcccccc6OCCO)CNCC)C.Cl mitigating the effects of the Corona virus?\nAssistant: Yes, it is mitigating the effects of the Corona virus."} {"text":"User: Is the molecule with the SELFIES [C][C][O][C][O][C@H1][Branch2][Ring1][=Branch2][C@@H1][Branch1][N][C][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C@H1][Branch1][O][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C@H1][Ring2][Ring1][#C][O] mitigating the effects of the Corona virus?\nAssistant: No, it is not mitigating the effects of the Corona virus."}", "/scratch/micpie/export/sarscov2_vitro_touret/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is mitigating the effects of the Corona virus.\nMolecule DeepSMILES: C=CCOcccccc6OCCO)CNCC)C.Cl\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is mitigating the effects of the Corona virus.\nMolecule InChI: 
InChI=1S\/C29H34O6\/c1-2-32-29-26(30)28(34-20-24-16-10-5-11-17-24)27(35-29)25(33-19-23-14-8-4-9-15-23)21-31-18-22-12-6-3-7-13-22\/h3-17,25-30H,2,18-21H2,1H3\/t25-,26-,27-,28-,29?\/m1\/s1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/sarscov2_vitro_touret/test_0-12.jsonl": "{"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should be mitigating the effects of the Corona virus.\nAssistant: Got it, this SELFIES is mitigating the effects of the Corona virus: [C][=C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][C][Branch1][C][O][C][N][C][Branch1][C][C][C].[Cl]"} {"text":"User: I want to create a SMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should not be mitigating the effects of the Corona virus.\nAssistant: Understood, this SMILES is not mitigating the effects of the Corona virus: CCOC1O[C@H]([C@@H](COCc2ccccc2)OCc2ccccc2)[C@H](OCc2ccccc2)[C@H]1O"}", "/scratch/micpie/export/drug_protein_hpo/test_0-1.jsonl": "{"text":"The drug [H][C@@]12CC[C@H](O)[C@@]1(C)CC[C@]1([H])C3=CC=C(O)C=C3C[C@@H](CCCCCCCCCS(=O)CCCC(F)(F)C(F)(F)F)[C@@]21[H] targets the protein ER-alpha. The protein ER-alpha is associated with Acne."} {"text":"The drug 9-Deazaguanine targets the protein HGPRTase. 
The protein HGPRTase is associated with Podagra."}", "/scratch/micpie/export/drug_protein_hpo/valid_0-0.jsonl": "{"text":"The drug Caffeine targets the protein cGMP phosphodiesterase 6C and is associated with Abnormal pupillary light reflex."} {"text":"The drug InChI=1S\/C20H25NO2S2\/c1-14-7-11-24-18(14)17(19-15(2)8-12-25-19)6-4-10-21-9-3-5-16(13-21)20(22)23\/h6-8,11-12,16H,3-5,9-10,13H2,1-2H3,(H,22,23)\/t16-\/m1\/s1 targets the protein GAT-1 and is associated with Generalized myoclonic-atonic seizure."}", "/scratch/micpie/export/drug_protein_hpo/test_0-2.jsonl": "{"text":"The drug Fulvestrant targets the protein Estradiol receptor. The protein Estradiol receptor is associated with the human phenotype represented by Acne."} {"text":"The drug Nc1nc2cc[nH]c2c(=O)[nH]1 targets the protein HGPRT. The protein HGPRT is associated with the human phenotype represented by Podagra."}", "/scratch/micpie/export/drug_protein_hpo/test_0-0.jsonl": "{"text":"The drug Fulvestrant targets the protein Estrogen receptor and is associated with Acne."} {"text":"The drug NC1=NC2=C(NC=C2)C(=O)N1 targets the protein HGPRTase and is associated with Podagra."}", "/scratch/micpie/export/drug_protein_hpo/test_0-3.jsonl": "{"text":"The drug Fulvestrant targets the protein Nuclear receptor subfamily 3 group A member 1. This protein is associated with the Acne."} {"text":"The drug NC=NC=CNC=C5)))C=O)N6 targets the protein HGPRT. This protein is associated with the Podagra."}", "/scratch/micpie/export/drug_protein_hpo/train_0-0.jsonl": "{"text":"The drug O=C(O)CC[C@H](O)Nc1ccc(N2C(=O)CCC2=O)cc1 targets the protein Myosin heavy chain slow isoform and is associated with Asymmetric septal hypertrophy."} {"text":"The drug Histidine targets the protein Histidase and is associated with Hyperhistidinemia."}", "/scratch/micpie/export/drug_protein_hpo/train_0-3.jsonl": "{"text":"The drug 4-[4-(2,5-DIOXO-PYRROLIDIN-1-YL)-PHENYLAMINO]-4-HYDROXY-BUTYRIC ACID targets the protein Myosin-7. 
This protein is associated with the Asymmetric septal hypertrophy."} {"text":"The drug InChI=1S\/C6H9N3O2\/c7-5(6(10)11)1-4-2-8-3-9-4\/h2-3,5H,1,7H2,(H,8,9)(H,10,11)\/t5-\/m0\/s1 targets the protein Histidine ammonia-lyase. This protein is associated with the Hyperhistidinemia."}", "/scratch/micpie/export/drug_protein_hpo/valid_0-2.jsonl": "{"text":"The drug Caffeine targets the protein cGMP phosphodiesterase 6C. The protein cGMP phosphodiesterase 6C is associated with the human phenotype represented by Abnormal pupillary light reflex."} {"text":"The drug Tiagabine targets the protein Solute carrier family 6 member 1. The protein Solute carrier family 6 member 1 is associated with the human phenotype represented by Generalized myoclonic-atonic seizure."}", "/scratch/micpie/export/drug_protein_hpo/valid_0-1.jsonl": "{"text":"The drug Caffeine targets the protein cGMP phosphodiesterase 6C. The protein cGMP phosphodiesterase 6C is associated with Abnormal pupillary light reflex."} {"text":"The drug InChI=1S\/C20H25NO2S2\/c1-14-7-11-24-18(14)17(19-15(2)8-12-25-19)6-4-10-21-9-3-5-16(13-21)20(22)23\/h6-8,11-12,16H,3-5,9-10,13H2,1-2H3,(H,22,23)\/t16-\/m1\/s1 targets the protein GAT-1. The protein GAT-1 is associated with Generalized myoclonic-atonic seizure."}", "/scratch/micpie/export/drug_protein_hpo/train_0-2.jsonl": "{"text":"The drug 4-[4-(2,5-DIOXO-PYRROLIDIN-1-YL)-PHENYLAMINO]-4-HYDROXY-BUTYRIC ACID targets the protein MyHC-slow. The protein MyHC-slow is associated with the human phenotype represented by Asymmetric septal hypertrophy."} {"text":"The drug Histidine targets the protein Histidine ammonia-lyase. The protein Histidine ammonia-lyase is associated with the human phenotype represented by Hyperhistidinemia."}", "/scratch/micpie/export/drug_protein_hpo/train_0-1.jsonl": "{"text":"The drug 4-[4-(2,5-DIOXO-PYRROLIDIN-1-YL)-PHENYLAMINO]-4-HYDROXY-BUTYRIC ACID targets the protein Myosin heavy chain 7. 
The protein Myosin heavy chain 7 is associated with Asymmetric septal hypertrophy."} {"text":"The drug N[C@@H](CC1=CNC=N1)C(O)=O targets the protein Histidase. The protein Histidase is associated with Hyperhistidinemia."}", "/scratch/micpie/export/drug_protein_hpo/valid_0-3.jsonl": "{"text":"The drug CNC=NC=C5C=O)NC)C=O)N6C targets the protein Cone cGMP-specific 3',5'-cyclic phosphodiesterase subunit alpha'. This protein is associated with the Abnormal pupillary light reflex."} {"text":"The drug Tiagabine targets the protein Sodium- and chloride-dependent GABA transporter 1. This protein is associated with the Generalized myoclonic-atonic seizure."}", "/scratch/micpie/export/drug_chebi_chebi/valid_0-0.jsonl": "{"text":"The drug C[C@H]1CCC[C@H](O)CCCCCc2cc(O)cc(O)c2C(=O)O1 is a alpha-Zearalanol and is a macrolide."} {"text":"The drug [H][C@]CNCCC=CNC=CC=CC=C96)))))))[C@]6[H])C[C@]%10[H])C=CO[C@H]%14C))))C=O)OC is a ajmalicine and is a monoterpenoid indole alkaloid."}", "/scratch/micpie/export/drug_chebi_chebi/test_0-0.jsonl": "{"text":"The drug InChI=1S\/C10H12O5\/c1-2-3-15-10(14)6-4-7(11)9(13)8(12)5-6\/h4-5,11-13H,2-3H2,1H3 is a n-propyl gallate and is a trihydroxybenzoic acid."} {"text":"The drug N[C@H](C(O)=O)C1=CC(O)=CC(O)=C1 is a (S)-3,5-dihydroxyphenylglycine zwitterion and is a tautomer of (S)-3,5-dihydroxyphenylglycine."}", "/scratch/micpie/export/drug_chebi_chebi/train_0-0.jsonl": "{"text":"The drug CCCCC=CC=CO)C=C6))))))))NCCO)C=CC=CO)C=C6 is a 4-(1-hydroxy-2-\\{[4-(4-hydroxyphenyl)butan-2-yl]amino\\}ethyl)phenol and is a secondary alcohol."} {"text":"The drug O=C(OCCOCCO)c1ccccc1Nc1cccc(C(F)(F)F)c1 is a 2-[3-(trifluoromethyl)anilino]benzoic acid 2-(2-hydroxyethoxy)ethyl ester and is a benzoate ester."}", "/scratch/micpie/export/peptides_nonfouling/test_0-10.jsonl": "{"text":"User: Can you generate a amino acid sequence that is nonfouling?\nAssistant: Of course, here you go: VIKRQHRAM"} {"text":"User: Can you create a amino acid sequence that is not 
nonfouling?\nAssistant: Of course, here you go: MSNAIEINELHWLLAVTQNIDVGIVVLDLNYRVTVWNTFMENRSGVVPYVAIDKTFFELFPEVNRQWLSKKIDNVVTLGTPAFTIWEQRPYLVRFKNYQPITGHEEFMYQNTTLFPLRSTTGAISHVCLVIYDVTDVAANRN"}", "/scratch/micpie/export/peptides_nonfouling/valid_0-8.jsonl": "{"text":"User: Can you tell me if the amino acid sequence ATQPATAE is nonfouling?\nAssistant: Yes, this amino acid sequence is nonfouling."} {"text":"User: Can you derive if the amino acid sequence MSGSDRGDRLYDVLGVTRDATVQEIKTAYRKLALKHHPDKYVDQDSKEVNEIKFKEITAAYEILSDPEKKSHYDLYGDDNGAASSGG is nonfouling?\nAssistant: No, this amino acid sequence is not nonfouling."}", "/scratch/micpie/export/peptides_nonfouling/test_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which amino acid sequences are nonfouling?\nConstraint: You must select none, one or more options from A, B, C, or D without using any other words.\nOptions:\nA: MSRIGRIPVAIPKGVDVKLGDNNTLTVKGQKGTLTKQFHKDMIIKVEGDKIIVQRPSDEKKHKALHGLTRTLINNMVIGVTQGYEKALEINGVGYRAQKQGKKLILTLGYSHPVEMEEPQGITVEVPAPNKIVVKGADKQAVGEFAAKIRSKREPEVYKGKGIKYEDEVIRRKEGKAGGKGKK\nB: KFISVEDV\nC: VIKRQHRAM\nD: QLRRLQEER\nAnswer: C, D"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which amino acid sequences are not nonfouling?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any additional words.\nOptions:\n(A) MSNAIEINELHWLLAVTQNIDVGIVVLDLNYRVTVWNTFMENRSGVVPYVAIDKTFFELFPEVNRQWLSKKIDNVVTLGTPAFTIWEQRPYLVRFKNYQPITGHEEFMYQNTTLFPLRSTTGAISHVCLVIYDVTDVAANRN\n(B) MTISEVSKKYELSADTLRYYERIGLIPPVNRNKSGIRSFTEEDCEWVNFIKCMRGAGLSIETLIEYVAMFQQGSSTIKARKELLIEQRNQLAKRIEEMQKTLERLNFKIDRYEEGIIEKEKVLKSSRNKKMEAVSYE\n(C) VLSRIQFVSRFGSNLVHIEGVKGSGKSWLAQRYLEKWCDDADQVLLMCHPNQSIQQQRGIILKQVVRDPLFNESDSVVDSVGRMLAGEKCNLVLAIDDAHLLSSELLSELWLLVQKAQSAPNWQINILLFAEKGKLTQTLSQMSYGQDNKPVDIDINP\n(D) DDETK\n(E) 
MTYVDGFVLPVPEGKIDAYRQMAESAGKIWMEHGALQYKECVLEDAKPEMPEDAPETCKITPFGKLAGTKDGETVIFAFIVYKSREHRDEVNKKVMADPRMQEACDENNMPFDPSRMAYGGFKALVDL\nAnswer: A, B, C, E"}", "/scratch/micpie/export/peptides_nonfouling/train_0-8.jsonl": "{"text":"User: Can you derive if the amino acid sequence VASQKND is nonfouling?\nAssistant: Yes, this amino acid sequence is nonfouling."} {"text":"User: Can you derive if the amino acid sequence MTIELRDVTMENYFDVLNLDVKEYQKQFIATNAISLAEAYVYTKNGDFVAPLAVYDNDAIIGFVMIAYDKKIGISSGNYLLFRFMIDKNFQNQGYFKPIMDKVLDYVRTAPAGLSNKLWLSYEPENEQARFCYLSYGFKETGEISENEVVAIYDLTIEK is nonfouling?\nAssistant: No, this amino acid sequence is not nonfouling."}", "/scratch/micpie/export/peptides_nonfouling/test_0-5.jsonl": "{"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is nonfouling.\namino acid sequence : VIKRQHRAM\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is nonfouling.\n: MSNAIEINELHWLLAVTQNIDVGIVVLDLNYRVTVWNTFMENRSGVVPYVAIDKTFFELFPEVNRQWLSKKIDNVVTLGTPAFTIWEQRPYLVRFKNYQPITGHEEFMYQNTTLFPLRSTTGAISHVCLVIYDVTDVAANRN\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/peptides_nonfouling/valid_0-9.jsonl": "{"text":"User: Is the amino acid sequence ATQPATAE nonfouling?\nAssistant: Yes, it is nonfouling."} {"text":"User: Is the amino acid sequence MSGSDRGDRLYDVLGVTRDATVQEIKTAYRKLALKHHPDKYVDQDSKEVNEIKFKEITAAYEILSDPEKKSHYDLYGDDNGAASSGG nonfouling?\nAssistant: No, it is not nonfouling."}", "/scratch/micpie/export/peptides_nonfouling/test_0-1.jsonl": "{"text":"The amino acid sequence VIKRQHRAM exhibits nonfouling properties."} {"text":"The amino acid sequence 
MSNAIEINELHWLLAVTQNIDVGIVVLDLNYRVTVWNTFMENRSGVVPYVAIDKTFFELFPEVNRQWLSKKIDNVVTLGTPAFTIWEQRPYLVRFKNYQPITGHEEFMYQNTTLFPLRSTTGAISHVCLVIYDVTDVAANRN displays no nonfouling properties."}", "/scratch/micpie/export/peptides_nonfouling/valid_0-0.jsonl": "{"text":"The sequence of amino acid ATQPATAE demonstrates nonfouling properties."} {"text":"The sequence of amino acid MSGSDRGDRLYDVLGVTRDATVQEIKTAYRKLALKHHPDKYVDQDSKEVNEIKFKEITAAYEILSDPEKKSHYDLYGDDNGAASSGG shows no nonfouling properties."}", "/scratch/micpie/export/peptides_nonfouling/test_0-2.jsonl": "{"text":"Based on the amino acid sequence VIKRQHRAM, the peptide has nonfouling characteristics."} {"text":"Based on the amino acid sequence representation MSNAIEINELHWLLAVTQNIDVGIVVLDLNYRVTVWNTFMENRSGVVPYVAIDKTFFELFPEVNRQWLSKKIDNVVTLGTPAFTIWEQRPYLVRFKNYQPITGHEEFMYQNTTLFPLRSTTGAISHVCLVIYDVTDVAANRN, the peptide has no nonfouling features."}", "/scratch/micpie/export/peptides_nonfouling/valid_0-10.jsonl": "{"text":"User: Can you give me a amino acid sequence that is nonfouling?\nAssistant: Of course, here you go: ATQPATAE"} {"text":"User: Can you create a amino acid sequence that is not nonfouling?\nAssistant: Yes, I'm happy to help, here you go: MSGSDRGDRLYDVLGVTRDATVQEIKTAYRKLALKHHPDKYVDQDSKEVNEIKFKEITAAYEILSDPEKKSHYDLYGDDNGAASSGG"}", "/scratch/micpie/export/peptides_nonfouling/train_0-6.jsonl": "{"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is nonfouling.\n: VASQKND\nConstraint: Answer the question in a complete sentence.\nResult: This amino acid sequence is nonfouling."} {"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is nonfouling.\namino acid sequence : MTIELRDVTMENYFDVLNLDVKEYQKQFIATNAISLAEAYVYTKNGDFVAPLAVYDNDAIIGFVMIAYDKKIGISSGNYLLFRFMIDKNFQNQGYFKPIMDKVLDYVRTAPAGLSNKLWLSYEPENEQARFCYLSYGFKETGEISENEVVAIYDLTIEK\nConstraint: Answer the question in a complete sentence.\nResult: 
This amino acid sequence is not nonfouling."}", "/scratch/micpie/export/peptides_nonfouling/valid_0-6.jsonl": "{"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is nonfouling.\n: ATQPATAE\nConstraint: Answer the question in a complete sentence.\nResult: This amino acid sequence is nonfouling."} {"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is nonfouling.\n: MSGSDRGDRLYDVLGVTRDATVQEIKTAYRKLALKHHPDKYVDQDSKEVNEIKFKEITAAYEILSDPEKKSHYDLYGDDNGAASSGG\nConstraint: Answer the question in a full sentence.\nResult: This amino acid sequence is not nonfouling."}", "/scratch/micpie/export/peptides_nonfouling/test_0-9.jsonl": "{"text":"User: Is the amino acid sequence VIKRQHRAM nonfouling?\nAssistant: Yes, it is nonfouling."} {"text":"User: Is the amino acid sequence MSNAIEINELHWLLAVTQNIDVGIVVLDLNYRVTVWNTFMENRSGVVPYVAIDKTFFELFPEVNRQWLSKKIDNVVTLGTPAFTIWEQRPYLVRFKNYQPITGHEEFMYQNTTLFPLRSTTGAISHVCLVIYDVTDVAANRN nonfouling?\nAssistant: No, it is not nonfouling."}", "/scratch/micpie/export/peptides_nonfouling/test_0-0.jsonl": "{"text":"The sequence of amino acid VIKRQHRAM shows nonfouling properties."} {"text":"The sequence of amino acid MSNAIEINELHWLLAVTQNIDVGIVVLDLNYRVTVWNTFMENRSGVVPYVAIDKTFFELFPEVNRQWLSKKIDNVVTLGTPAFTIWEQRPYLVRFKNYQPITGHEEFMYQNTTLFPLRSTTGAISHVCLVIYDVTDVAANRN demonstrates no nonfouling properties."}", "/scratch/micpie/export/peptides_nonfouling/valid_0-7.jsonl": "{"text":"Task: Please give me a amino acid sequence based on the description.\nDescription: A amino acid sequence of a peptide that is nonfouling.\nResult: ATQPATAE"} {"text":"Task: Please create a AA sequence based on the description below.\nDescription: A amino acid sequence of a peptide that is nonfouling.\nResult: MSGSDRGDRLYDVLGVTRDATVQEIKTAYRKLALKHHPDKYVDQDSKEVNEIKFKEITAAYEILSDPEKKSHYDLYGDDNGAASSGG"}", 
"/scratch/micpie/export/peptides_nonfouling/test_0-3.jsonl": "{"text":"The amino acid sequence VIKRQHRAM is from a peptide that is identified as nonfouling."} {"text":"The amino acid sequence MSNAIEINELHWLLAVTQNIDVGIVVLDLNYRVTVWNTFMENRSGVVPYVAIDKTFFELFPEVNRQWLSKKIDNVVTLGTPAFTIWEQRPYLVRFKNYQPITGHEEFMYQNTTLFPLRSTTGAISHVCLVIYDVTDVAANRN is from a peptide that is not identified as nonfouling."}", "/scratch/micpie/export/peptides_nonfouling/valid_0-11.jsonl": "{"text":"User: I'm searching for a amino acid sequence that is nonfouling?\nAssistant: This is a amino acid sequence that is nonfouling: ATQPATAE"} {"text":"User: I'm searching for a amino acid sequence that is not nonfouling?\nAssistant: This is a amino acid sequence that is not nonfouling: MSGSDRGDRLYDVLGVTRDATVQEIKTAYRKLALKHHPDKYVDQDSKEVNEIKFKEITAAYEILSDPEKKSHYDLYGDDNGAASSGG"}", "/scratch/micpie/export/peptides_nonfouling/train_0-0.jsonl": "{"text":"The sequence of amino acid VASQKND shows nonfouling properties."} {"text":"The sequence of amino acid MTIELRDVTMENYFDVLNLDVKEYQKQFIATNAISLAEAYVYTKNGDFVAPLAVYDNDAIIGFVMIAYDKKIGISSGNYLLFRFMIDKNFQNQGYFKPIMDKVLDYVRTAPAGLSNKLWLSYEPENEQARFCYLSYGFKETGEISENEVVAIYDLTIEK exhibits no nonfouling properties."}", "/scratch/micpie/export/peptides_nonfouling/test_0-6.jsonl": "{"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is nonfouling.\n: VIKRQHRAM\nConstraint: Answer the question in a full sentence.\nResult: This amino acid sequence is nonfouling."} {"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is nonfouling.\namino acid sequence : MSNAIEINELHWLLAVTQNIDVGIVVLDLNYRVTVWNTFMENRSGVVPYVAIDKTFFELFPEVNRQWLSKKIDNVVTLGTPAFTIWEQRPYLVRFKNYQPITGHEEFMYQNTTLFPLRSTTGAISHVCLVIYDVTDVAANRN\nConstraint: Answer the question in a complete sentence.\nResult: This amino acid sequence is not nonfouling."}", 
"/scratch/micpie/export/peptides_nonfouling/train_0-10.jsonl": "{"text":"User: Can you create a amino acid sequence that is nonfouling?\nAssistant: Yes, I'm happy to help, here you go: VASQKND"} {"text":"User: Can you generate a amino acid sequence that is not nonfouling?\nAssistant: Yes, I'm happy to help, here you go: MTIELRDVTMENYFDVLNLDVKEYQKQFIATNAISLAEAYVYTKNGDFVAPLAVYDNDAIIGFVMIAYDKKIGISSGNYLLFRFMIDKNFQNQGYFKPIMDKVLDYVRTAPAGLSNKLWLSYEPENEQARFCYLSYGFKETGEISENEVVAIYDLTIEK"}", "/scratch/micpie/export/peptides_nonfouling/train_0-3.jsonl": "{"text":"The amino acid sequence VASQKND is from a peptide that is identified as nonfouling."} {"text":"The amino acid sequence MTIELRDVTMENYFDVLNLDVKEYQKQFIATNAISLAEAYVYTKNGDFVAPLAVYDNDAIIGFVMIAYDKKIGISSGNYLLFRFMIDKNFQNQGYFKPIMDKVLDYVRTAPAGLSNKLWLSYEPENEQARFCYLSYGFKETGEISENEVVAIYDLTIEK is from a peptide that is not identified as nonfouling."}", "/scratch/micpie/export/peptides_nonfouling/train_0-12.jsonl": "{"text":"User: I want to come up with a sequence of amino acids.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The amino acid sequence should be nonfouling.\nAssistant: Ok, here you go, this is nonfouling: VASQKND"} {"text":"User: I want to come up with a sequence of amino acids.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The amino acid sequence should not be nonfouling.\nAssistant: Got it, here you go, this is not nonfouling: MTIELRDVTMENYFDVLNLDVKEYQKQFIATNAISLAEAYVYTKNGDFVAPLAVYDNDAIIGFVMIAYDKKIGISSGNYLLFRFMIDKNFQNQGYFKPIMDKVLDYVRTAPAGLSNKLWLSYEPENEQARFCYLSYGFKETGEISENEVVAIYDLTIEK"}", "/scratch/micpie/export/peptides_nonfouling/test_0-13.jsonl": "{"text":"User: I want to come up with a AA sequence.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the amino acid sequence should be nonfouling.\nAssistant: Got it, this is nonfouling: VIKRQHRAM"} {"text":"User: I want to create a amino acid sequence .\nAssistant: I would love to help you. Should it be a special one?\nUser: Yes, the amino acid sequence should not be nonfouling.\nAssistant: Got it, this is not nonfouling: MSNAIEINELHWLLAVTQNIDVGIVVLDLNYRVTVWNTFMENRSGVVPYVAIDKTFFELFPEVNRQWLSKKIDNVVTLGTPAFTIWEQRPYLVRFKNYQPITGHEEFMYQNTTLFPLRSTTGAISHVCLVIYDVTDVAANRN"}", "/scratch/micpie/export/peptides_nonfouling/valid_0-2.jsonl": "{"text":"Based on the amino acid sequence ATQPATAE, the peptide has nonfouling features."} {"text":"Based on the amino acid sequence MSGSDRGDRLYDVLGVTRDATVQEIKTAYRKLALKHHPDKYVDQDSKEVNEIKFKEITAAYEILSDPEKKSHYDLYGDDNGAASSGG, the peptide has no nonfouling features."}", "/scratch/micpie/export/peptides_nonfouling/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the amino acid sequence with the representation of VASQKND nonfouling?\nConstraint: Even if you are not sure, you must pick either a or b without using any other words.\nOptions:\na.) False\nb.) 
True\nAnswer: b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the amino acid sequence with the MTIELRDVTMENYFDVLNLDVKEYQKQFIATNAISLAEAYVYTKNGDFVAPLAVYDNDAIIGFVMIAYDKKIGISSGNYLLFRFMIDKNFQNQGYFKPIMDKVLDYVRTAPAGLSNKLWLSYEPENEQARFCYLSYGFKETGEISENEVVAIYDLTIEK nonfouling?\nConstraint: Even if you are not sure, you must pick either a or b without using any other words.\nOptions:\na) False\nb) True\nAnswer: a"}", "/scratch/micpie/export/peptides_nonfouling/valid_0-1.jsonl": "{"text":"The amino acid sequence ATQPATAE displays nonfouling properties."} {"text":"The amino acid sequence MSGSDRGDRLYDVLGVTRDATVQEIKTAYRKLALKHHPDKYVDQDSKEVNEIKFKEITAAYEILSDPEKKSHYDLYGDDNGAASSGG displays no nonfouling properties."}", "/scratch/micpie/export/peptides_nonfouling/valid_0-13.jsonl": "{"text":"User: I want to generate a AA sequence.\nAssistant: Nice. Should it be a special amino acid sequence?\nUser: Yes, the amino acid sequence should be nonfouling.\nAssistant: Understood, this is nonfouling: ATQPATAE"} {"text":"User: I want to generate a amino acid sequence .\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the amino acid sequence should not be nonfouling.\nAssistant: Ok, this is not nonfouling: MSGSDRGDRLYDVLGVTRDATVQEIKTAYRKLALKHHPDKYVDQDSKEVNEIKFKEITAAYEILSDPEKKSHYDLYGDDNGAASSGG"}", "/scratch/micpie/export/peptides_nonfouling/valid_0-5.jsonl": "{"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is nonfouling.\namino acid sequence : ATQPATAE\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is nonfouling.\namino acid sequence : MSGSDRGDRLYDVLGVTRDATVQEIKTAYRKLALKHHPDKYVDQDSKEVNEIKFKEITAAYEILSDPEKKSHYDLYGDDNGAASSGG\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/peptides_nonfouling/train_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which amino acid sequences are nonfouling?\nConstraint: You must select none, one or more options from A, B, or C without using any other words.\nOptions:\nA: MNVETIREYCLSKKGVTESFPFDDVSLVMKVLDKMFALIDLEGANSISLKCDPERAIELREHYAGIEGAYHFNKKYWNGVYFDRDVDDKLIKELVDHSYEEVIKKFTKKLRAEYDALP\nB: VASQKND\nC: MHNKIVRIASSALTGGKLLEKLKPLTRWEVQWDPNKTKCLGITREVTFKDYETTWAFLTRVSMRSHLWGHHPLIHTSYTWVKLELHTHDIDPKDGAHSQLSDIDVRMAKRIDSYIDEMTT\nAnswer: B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which amino acid sequences are not nonfouling?\nConstraint: You must select none, one or more options from A, B, C, or D without using any additional words.\nOptions:\n[A] MTIELRDVTMENYFDVLNLDVKEYQKQFIATNAISLAEAYVYTKNGDFVAPLAVYDNDAIIGFVMIAYDKKIGISSGNYLLFRFMIDKNFQNQGYFKPIMDKVLDYVRTAPAGLSNKLWLSYEPENEQARFCYLSYGFKETGEISENEVVAIYDLTIEK\n[B] LIFCF\n[C] SSVTN\n[D] IDREII\nAnswer: A, B"}", 
"/scratch/micpie/export/peptides_nonfouling/valid_0-4.jsonl": "{"text":"The sequence of AAs ATQPATAE is nonfouling."} {"text":"The sequence of AAs MSGSDRGDRLYDVLGVTRDATVQEIKTAYRKLALKHHPDKYVDQDSKEVNEIKFKEITAAYEILSDPEKKSHYDLYGDDNGAASSGG is not nonfouling."}", "/scratch/micpie/export/peptides_nonfouling/train_0-5.jsonl": "{"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is nonfouling.\namino acid sequence : VASQKND\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is nonfouling.\n: MTIELRDVTMENYFDVLNLDVKEYQKQFIATNAISLAEAYVYTKNGDFVAPLAVYDNDAIIGFVMIAYDKKIGISSGNYLLFRFMIDKNFQNQGYFKPIMDKVLDYVRTAPAGLSNKLWLSYEPENEQARFCYLSYGFKETGEISENEVVAIYDLTIEK\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/peptides_nonfouling/valid_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which amino acid sequences are nonfouling?\nConstraint: You must select none, one or more options from a, b, c, or d without using any other words.\nOptions:\na ATQPATAE\nb LEAGGDNL\nc DEAVF\nd EPSSVASDVSK\nAnswer: a, b, c, d"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which amino acid sequences are not nonfouling?\nConstraint: You must select none, one or more options from a, b, c, or d without using any other words.\nOptions:\na. MSGSDRGDRLYDVLGVTRDATVQEIKTAYRKLALKHHPDKYVDQDSKEVNEIKFKEITAAYEILSDPEKKSHYDLYGDDNGAASSGG\nb. MAQASDSTAMEVEEATNQTVKKRFEVKKWSAVALWAWDIQVDNCAICRNHIMDLCIECQANQAAGLKDECTVAWGNCNHAFHFHCISRWLKTRQVCPLDNREWEFQKYGH\nc. GQYTCFILGR\nd. 
ANPQY\nAnswer: a, b, c, d"}", "/scratch/micpie/export/peptides_nonfouling/valid_0-12.jsonl": "{"text":"User: I want to come up with a sequence of amino acids.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The amino acid sequence should be nonfouling.\nAssistant: Got it, here you go, this is nonfouling: ATQPATAE"} {"text":"User: I want to come up with a AA sequence.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. The amino acid sequence should not be nonfouling.\nAssistant: Ok, this is not nonfouling: MSGSDRGDRLYDVLGVTRDATVQEIKTAYRKLALKHHPDKYVDQDSKEVNEIKFKEITAAYEILSDPEKKSHYDLYGDDNGAASSGG"}", "/scratch/micpie/export/peptides_nonfouling/train_0-2.jsonl": "{"text":"Based on the amino acid sequence VASQKND, the peptide has nonfouling features."} {"text":"Based on the amino acid sequence MTIELRDVTMENYFDVLNLDVKEYQKQFIATNAISLAEAYVYTKNGDFVAPLAVYDNDAIIGFVMIAYDKKIGISSGNYLLFRFMIDKNFQNQGYFKPIMDKVLDYVRTAPAGLSNKLWLSYEPENEQARFCYLSYGFKETGEISENEVVAIYDLTIEK, the peptide has no nonfouling features."}", "/scratch/micpie/export/peptides_nonfouling/test_0-11.jsonl": "{"text":"User: I'm searching for a amino acid sequence that is nonfouling?\nAssistant: This is a amino acid sequence that is nonfouling: VIKRQHRAM"} {"text":"User: I'm searching for a amino acid sequence that is not nonfouling?\nAssistant: This is a amino acid sequence that is not nonfouling: MSNAIEINELHWLLAVTQNIDVGIVVLDLNYRVTVWNTFMENRSGVVPYVAIDKTFFELFPEVNRQWLSKKIDNVVTLGTPAFTIWEQRPYLVRFKNYQPITGHEEFMYQNTTLFPLRSTTGAISHVCLVIYDVTDVAANRN"}", "/scratch/micpie/export/peptides_nonfouling/train_0-7.jsonl": "{"text":"Task: Please generate a amino acid sequence based on the text description.\nDescription: A amino acid sequence of a peptide that is nonfouling.\nResult: VASQKND"} {"text":"Task: Please give me a amino acid sequence based on the description.\nDescription: A amino acid sequence of a 
peptide that is nonfouling.\nResult: MTIELRDVTMENYFDVLNLDVKEYQKQFIATNAISLAEAYVYTKNGDFVAPLAVYDNDAIIGFVMIAYDKKIGISSGNYLLFRFMIDKNFQNQGYFKPIMDKVLDYVRTAPAGLSNKLWLSYEPENEQARFCYLSYGFKETGEISENEVVAIYDLTIEK"}", "/scratch/micpie/export/peptides_nonfouling/train_0-11.jsonl": "{"text":"User: I'm looking for a amino acid sequence that is nonfouling?\nAssistant: This is a amino acid sequence that is nonfouling: VASQKND"} {"text":"User: I'm looking for a amino acid sequence that is not nonfouling?\nAssistant: This is a amino acid sequence that is not nonfouling: MTIELRDVTMENYFDVLNLDVKEYQKQFIATNAISLAEAYVYTKNGDFVAPLAVYDNDAIIGFVMIAYDKKIGISSGNYLLFRFMIDKNFQNQGYFKPIMDKVLDYVRTAPAGLSNKLWLSYEPENEQARFCYLSYGFKETGEISENEVVAIYDLTIEK"}", "/scratch/micpie/export/peptides_nonfouling/train_0-1.jsonl": "{"text":"The amino acid sequence VASQKND displays nonfouling properties."} {"text":"The amino acid sequence MTIELRDVTMENYFDVLNLDVKEYQKQFIATNAISLAEAYVYTKNGDFVAPLAVYDNDAIIGFVMIAYDKKIGISSGNYLLFRFMIDKNFQNQGYFKPIMDKVLDYVRTAPAGLSNKLWLSYEPENEQARFCYLSYGFKETGEISENEVVAIYDLTIEK exhibits no nonfouling properties."}", "/scratch/micpie/export/peptides_nonfouling/train_0-13.jsonl": "{"text":"User: I want to generate a amino acid sequence .\nAssistant: I would love to help you. Should it be a special one?\nUser: Yes, the amino acid sequence should be nonfouling.\nAssistant: Got it, this is nonfouling: VASQKND"} {"text":"User: I want to come up with a AA sequence.\nAssistant: This sounds very exciting. 
Should it be a special amino acid sequence?\nUser: Yes, the amino acid sequence should not be nonfouling.\nAssistant: Got it, this is not nonfouling: MTIELRDVTMENYFDVLNLDVKEYQKQFIATNAISLAEAYVYTKNGDFVAPLAVYDNDAIIGFVMIAYDKKIGISSGNYLLFRFMIDKNFQNQGYFKPIMDKVLDYVRTAPAGLSNKLWLSYEPENEQARFCYLSYGFKETGEISENEVVAIYDLTIEK"}", "/scratch/micpie/export/peptides_nonfouling/train_0-4.jsonl": "{"text":"The amino acid sequence VASQKND is nonfouling."} {"text":"The amino acid sequence MTIELRDVTMENYFDVLNLDVKEYQKQFIATNAISLAEAYVYTKNGDFVAPLAVYDNDAIIGFVMIAYDKKIGISSGNYLLFRFMIDKNFQNQGYFKPIMDKVLDYVRTAPAGLSNKLWLSYEPENEQARFCYLSYGFKETGEISENEVVAIYDLTIEK is not nonfouling."}", "/scratch/micpie/export/peptides_nonfouling/test_0-7.jsonl": "{"text":"Task: Please create a AA sequence based on the text description.\nDescription: A amino acid sequence of a peptide that is nonfouling.\nResult: VIKRQHRAM"} {"text":"Task: Please create a sequence of amino acids based on the text description.\nDescription: A amino acid sequence of a peptide that is nonfouling.\nResult: MSNAIEINELHWLLAVTQNIDVGIVVLDLNYRVTVWNTFMENRSGVVPYVAIDKTFFELFPEVNRQWLSKKIDNVVTLGTPAFTIWEQRPYLVRFKNYQPITGHEEFMYQNTTLFPLRSTTGAISHVCLVIYDVTDVAANRN"}", "/scratch/micpie/export/peptides_nonfouling/train_0-9.jsonl": "{"text":"User: Is the amino acid sequence VASQKND nonfouling?\nAssistant: Yes, it is nonfouling."} {"text":"User: Is the amino acid sequence MTIELRDVTMENYFDVLNLDVKEYQKQFIATNAISLAEAYVYTKNGDFVAPLAVYDNDAIIGFVMIAYDKKIGISSGNYLLFRFMIDKNFQNQGYFKPIMDKVLDYVRTAPAGLSNKLWLSYEPENEQARFCYLSYGFKETGEISENEVVAIYDLTIEK nonfouling?\nAssistant: No, it is not nonfouling."}", "/scratch/micpie/export/peptides_nonfouling/valid_0-3.jsonl": "{"text":"The amino acid sequence ATQPATAE represents a peptide that is identified as nonfouling."} {"text":"The amino acid sequence MSGSDRGDRLYDVLGVTRDATVQEIKTAYRKLALKHHPDKYVDQDSKEVNEIKFKEITAAYEILSDPEKKSHYDLYGDDNGAASSGG represents a peptide that is not identified as nonfouling."}", 
"/scratch/micpie/export/peptides_nonfouling/test_0-8.jsonl": "{"text":"User: Can you estimate if the amino acid sequence VIKRQHRAM is nonfouling?\nAssistant: Yes, this amino acid sequence is nonfouling."} {"text":"User: Can you estimate if the amino acid sequence MSNAIEINELHWLLAVTQNIDVGIVVLDLNYRVTVWNTFMENRSGVVPYVAIDKTFFELFPEVNRQWLSKKIDNVVTLGTPAFTIWEQRPYLVRFKNYQPITGHEEFMYQNTTLFPLRSTTGAISHVCLVIYDVTDVAANRN is nonfouling?\nAssistant: No, this amino acid sequence is not nonfouling."}", "/scratch/micpie/export/peptides_nonfouling/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the amino acid sequence with the VIKRQHRAM nonfouling?\nConstraint: Even if you are uncertain, you must pick either A or B without using any other words.\nOptions:\n[A] False\n[B] True\nAnswer: B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the amino acid sequence with the representation of MSNAIEINELHWLLAVTQNIDVGIVVLDLNYRVTVWNTFMENRSGVVPYVAIDKTFFELFPEVNRQWLSKKIDNVVTLGTPAFTIWEQRPYLVRFKNYQPITGHEEFMYQNTTLFPLRSTTGAISHVCLVIYDVTDVAANRN nonfouling?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any additional words.\nOptions:\n1. True\n2. 
False\nAnswer: 2"}", "/scratch/micpie/export/peptides_nonfouling/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the amino acid sequence with the representation of ATQPATAE nonfouling?\nConstraint: Even if you are uncertain, you must pick either a or b without using any additional words.\nOptions:\na True\nb False\nAnswer: a"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the amino acid sequence with the representation of MSGSDRGDRLYDVLGVTRDATVQEIKTAYRKLALKHHPDKYVDQDSKEVNEIKFKEITAAYEILSDPEKKSHYDLYGDDNGAASSGG nonfouling?\nConstraint: Even if you are uncertain, you must pick either a or b without using any other words.\nOptions:\na: True\nb: False\nAnswer: b"}", "/scratch/micpie/export/peptides_nonfouling/test_0-4.jsonl": "{"text":"The amino acid sequence VIKRQHRAM is nonfouling."} {"text":"The amino acid sequence MSNAIEINELHWLLAVTQNIDVGIVVLDLNYRVTVWNTFMENRSGVVPYVAIDKTFFELFPEVNRQWLSKKIDNVVTLGTPAFTIWEQRPYLVRFKNYQPITGHEEFMYQNTTLFPLRSTTGAISHVCLVIYDVTDVAANRN is not nonfouling."}", "/scratch/micpie/export/peptides_nonfouling/test_0-12.jsonl": "{"text":"User: I want to generate a amino acid sequence .\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The amino acid sequence should be nonfouling.\nAssistant: Got it, this is nonfouling: VIKRQHRAM"} {"text":"User: I want to come up with a sequence of amino acids.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. The amino acid sequence should not be nonfouling.\nAssistant: Got it, this is not nonfouling: MSNAIEINELHWLLAVTQNIDVGIVVLDLNYRVTVWNTFMENRSGVVPYVAIDKTFFELFPEVNRQWLSKKIDNVVTLGTPAFTIWEQRPYLVRFKNYQPITGHEEFMYQNTTLFPLRSTTGAISHVCLVIYDVTDVAANRN"}", "/scratch/micpie/export/compound_protein_hpo/train_2-2.jsonl": "{"text":"The compound Nc1c2c(nc3c1C(c1ccc(Cl)cc1)NC(=S)N3)CCCC2 targets the protein Butyrylcholine esterase. 
The protein Butyrylcholine esterase is associated with the human phenotype represented by Chronic infection."} {"text":"The compound [C][=C][C][S][S+1][Branch1][C][O-1][C][C][=C] targets the protein HEST2. The protein HEST2 is associated with the human phenotype represented by Abnormal fifth cranial nerve physiology."}", "/scratch/micpie/export/compound_protein_hpo/test_1-2.jsonl": "{"text":"The compound InChI=1S\/C17H20ClF2N7O3\/c18-12-8-24-14(25-10-17(19,20)11-4-2-1-3-5-11)15(29)27(12)9-13(28)23-6-7-30-26-16(21)22\/h1-5,8H,6-7,9-10H2,(H,23,28)(H,24,25)(H4,21,22,26) targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II). The protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) is associated with the human phenotype represented by Excessive bleeding from superficial cuts."} {"text":"The compound ClcccccNCCCNCCCOccccc[nH]cccccc6c%139))))))))))))))))))))))ccnc6c%10)))CCCC6 targets the protein Choline esterase II. The protein Choline esterase II is associated with the human phenotype represented by Chronic infection."}", "/scratch/micpie/export/compound_protein_hpo/test_0-1.jsonl": "{"text":"The compound InChI=1S\/C28H32FN5O3\/c1-32-25-17-30-24-15-23(29)21(14-22(24)27(25)34(28(32)35)20-6-4-12-36-18-20)19-7-8-26(31-16-19)37-13-5-11-33-9-2-3-10-33\/h7-8,14-17,20H,2-6,9-13,18H2,1H3\/t20-\/m0\/s1 targets the protein A-T mutated. The protein A-T mutated is associated with Defective B cell differentiation."} {"text":"The compound InChI=1S\/C28H38N6O5\/c1-28(2,3)39-27(38)33-23(21-13-6-10-18-9-4-5-12-20(18)21)25(37)34-16-8-14-22(34)24(36)32-19(17-35)11-7-15-31-26(29)30\/h4-6,9-10,12-13,17,19,22-23H,7-8,11,14-16H2,1-3H3,(H,32,36)(H,33,38)(H4,29,30,31)\/t19?,22-,23-\/m0\/s1 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II). 
The protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) is associated with Excessive bleeding from superficial cuts."}", "/scratch/micpie/export/compound_protein_hpo/test_2-0.jsonl": "{"text":"The compound InChI=1S\/C19H23N3OS.ClH\/c1-3-21(4-2)14-13-20-19(23)22-15-9-5-7-11-17(15)24-18-12-8-6-10-16(18)22;\/h5-12H,3-4,13-14H2,1-2H3,(H,20,23);1H targets the protein Pseudocholinesterase and is associated with Chronic infection."} {"text":"The compound COc1cc(OC)c2c(=O)c(OCCCCN3CCCCC3)c(-c3cc(OC)c(OC)c(OC)c3)oc2c1 targets the protein Telomerase-associated protein 2 and is associated with Abnormal fifth cranial nerve physiology."}", "/scratch/micpie/export/compound_protein_hpo/valid_2-2.jsonl": "{"text":"The compound CN(Cc1ccccc1)Cc1ccc(NC(=O)Cc2ccc(O)c(O)c2)cc1 targets the protein Pseudocholinesterase. The protein Pseudocholinesterase is associated with the human phenotype represented by Chronic infection."} {"text":"The compound InChI=1S\/C30H36N4O4\/c35-27(11-17-33-13-3-1-4-14-33)31-21-7-9-23-25(19-21)30(38)26-20-22(8-10-24(26)29(23)37)32-28(36)12-18-34-15-5-2-6-16-34\/h7-10,19-20H,1-6,11-18H2,(H,31,35)(H,32,36) targets the protein Telomerase reverse transcriptase. 
The protein Telomerase reverse transcriptase is associated with the human phenotype represented by Abnormal fifth cranial nerve physiology."}", "/scratch/micpie/export/compound_protein_hpo/valid_0-0.jsonl": "{"text":"The compound InChI=1S\/C7H5BrN4\/c8-6-3-1-5(2-4-6)7-9-11-12-10-7\/h1-4H,(H,9,10,11,12) targets the protein Ataxia telangiectasia mutated and is associated with Defective B cell differentiation."} {"text":"The compound CCCC(CCC)C(=O)NN(CC(=O)OCC)C(=O)[C@@H]1CCCC[C@H]1C(=O)N[C@@H](CCCN=C(N)N)C(=O)C(=O)NCCc1ccccc1 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) and is associated with Excessive bleeding from superficial cuts."}", "/scratch/micpie/export/compound_protein_hpo/train_1-3.jsonl": "{"text":"The compound InChI=1S\/C20H26N6O3\/c21-20(22)23-9-3-6-14(12-27)24-18(28)17-8-4-10-26(17)19(29)16-11-13-5-1-2-7-15(13)25-16\/h1-2,5,7,11-12,14,17,25H,3-4,6,8-10H2,(H,24,28)(H4,21,22,23)\/t14?,17-\/m0\/s1 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II). The protein is associated with the Excessive bleeding from superficial cuts."} {"text":"The compound C[n+]c\/C=C\/cccccccc6[nH]9)))))))))))ccNCCOCC6))))))cccccc6%10.[I-] targets the protein Pseudocholinesterase. The protein is associated with the Chronic infection."}", "/scratch/micpie/export/compound_protein_hpo/test_0-2.jsonl": "{"text":"The compound Cn1c(=O)n([C@H]2CCCOC2)c2c3cc(-c4ccc(OCCCN5CCCC5)nc4)c(F)cc3ncc21 targets the protein A-T mutated. The protein A-T mutated is associated with the human phenotype represented by Defective B cell differentiation."} {"text":"The compound CC(C)(C)OC(=O)N[C@H](C(=O)N1CCC[C@H]1C(=O)NC(C=O)CCCN=C(N)N)c1cccc2ccccc12 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II). 
The protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) is associated with the human phenotype represented by Excessive bleeding from superficial cuts."}", "/scratch/micpie/export/compound_protein_hpo/train_1-0.jsonl": "{"text":"The compound InChI=1S\/C20H26N6O3\/c21-20(22)23-9-3-6-14(12-27)24-18(28)17-8-4-10-26(17)19(29)16-11-13-5-1-2-7-15(13)25-16\/h1-2,5,7,11-12,14,17,25H,3-4,6,8-10H2,(H,24,28)(H4,21,22,23)\/t14?,17-\/m0\/s1 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) and is associated with Excessive bleeding from superficial cuts."} {"text":"The compound C[n+]c\/C=C\/cccccccc6[nH]9)))))))))))ccNCCOCC6))))))cccccc6%10.[I-] targets the protein Butyrylcholine esterase and is associated with Chronic infection."}", "/scratch/micpie/export/compound_protein_hpo/train_2-3.jsonl": "{"text":"The compound Ncccncc6CccccCl)cc6))))))NC=S)N6)))))))CCCC6 targets the protein Pseudocholinesterase. The protein is associated with the Chronic infection."} {"text":"The compound C=CCS[S+]([O-])CC=C targets the protein Telomerase-associated protein 2. The protein is associated with the Abnormal fifth cranial nerve physiology."}", "/scratch/micpie/export/compound_protein_hpo/valid_1-2.jsonl": "{"text":"The compound Cc1cnc(NCCc2ccc3c(c2)CCC3)c(=O)n1CC(=O)NCCONC(=N)N targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II). The protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) is associated with the human phenotype represented by Excessive bleeding from superficial cuts."} {"text":"The compound CCNC(=O)Oc1cccc2c1CCC1CCN(C)C21 targets the protein Acylcholine acylhydrolase. 
The protein Acylcholine acylhydrolase is associated with the human phenotype represented by Chronic infection."}", "/scratch/micpie/export/compound_protein_hpo/test_0-0.jsonl": "{"text":"The compound Cn1c(=O)n([C@H]2CCCOC2)c2c3cc(-c4ccc(OCCCN5CCCC5)nc4)c(F)cc3ncc21 targets the protein Serine-protein kinase ATM and is associated with Defective B cell differentiation."} {"text":"The compound CCC)C)OC=O)N[C@H]C=O)NCCC[C@H]5C=O)NCC=O))CCCN=CN)N)))))))))))))))cccccccccc%106 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) and is associated with Excessive bleeding from superficial cuts."}", "/scratch/micpie/export/compound_protein_hpo/train_2-0.jsonl": "{"text":"The compound Nc1c2c(nc3c1C(c1ccc(Cl)cc1)NC(=S)N3)CCCC2 targets the protein Acylcholine acylhydrolase and is associated with Chronic infection."} {"text":"The compound C=CCS[S+]([O-])CC=C targets the protein Telomerase-associated protein 2 and is associated with Abnormal fifth cranial nerve physiology."}", "/scratch/micpie/export/compound_protein_hpo/valid_2-0.jsonl": "{"text":"The compound [C][N][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][=C][C][=C][Branch2][Ring1][Branch1][N][C][=Branch1][C][=O][C][C][=C][C][=C][Branch1][C][O][C][Branch1][C][O][=C][Ring1][Branch2][C][=C][Ring2][Ring1][C] targets the protein Cholinesterase and is associated with Chronic infection."} {"text":"The compound [O][=C][Branch1][O][C][C][N][C][C][C][C][C][Ring1][=Branch1][N][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][=C][C][Branch1][S][N][C][=Branch1][C][=O][C][C][N][C][C][C][C][C][Ring1][=Branch1][=C][C][=C][Ring1][P][C][Ring2][Ring1][#Branch1][=O] targets the protein Telomerase reverse transcriptase and is associated with Abnormal fifth cranial nerve physiology."}", "/scratch/micpie/export/compound_protein_hpo/test_0-3.jsonl": "{"text":"The compound 
InChI=1S\/C28H32FN5O3\/c1-32-25-17-30-24-15-23(29)21(14-22(24)27(25)34(28(32)35)20-6-4-12-36-18-20)19-7-8-26(31-16-19)37-13-5-11-33-9-2-3-10-33\/h7-8,14-17,20H,2-6,9-13,18H2,1H3\/t20-\/m0\/s1 targets the protein A-T mutated. The protein is associated with the Defective B cell differentiation."} {"text":"The compound CC(C)(C)OC(=O)N[C@H](C(=O)N1CCC[C@H]1C(=O)NC(C=O)CCCN=C(N)N)c1cccc2ccccc12 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II). The protein is associated with the Excessive bleeding from superficial cuts."}", "/scratch/micpie/export/compound_protein_hpo/valid_2-1.jsonl": "{"text":"The compound [C][N][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][=C][C][=C][Branch2][Ring1][Branch1][N][C][=Branch1][C][=O][C][C][=C][C][=C][Branch1][C][O][C][Branch1][C][O][=C][Ring1][Branch2][C][=C][Ring2][Ring1][C] targets the protein Pseudocholinesterase. The protein Pseudocholinesterase is associated with Chronic infection."} {"text":"The compound O=C(CCN1CCCCC1)Nc1ccc2c(c1)C(=O)c1cc(NC(=O)CCN3CCCCC3)ccc1C2=O targets the protein TP2. The protein TP2 is associated with Abnormal fifth cranial nerve physiology."}", "/scratch/micpie/export/compound_protein_hpo/test_2-1.jsonl": "{"text":"The compound [C][C][N][Branch1][Ring1][C][C][C][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][S][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][=C].[Cl] targets the protein Acylcholine acylhydrolase. The protein Acylcholine acylhydrolase is associated with Chronic infection."} {"text":"The compound COc1cc(OC)c2c(=O)c(OCCCCN3CCCCC3)c(-c3cc(OC)c(OC)c(OC)c3)oc2c1 targets the protein Telomerase-associated protein 2. 
The protein Telomerase-associated protein 2 is associated with Abnormal fifth cranial nerve physiology."}", "/scratch/micpie/export/compound_protein_hpo/train_0-0.jsonl": "{"text":"The compound [C][O][C][C][=C][C][=C][Branch2][Ring2][#Branch2][C][=C][C][=C][Branch1][S][N][C@@H1][Branch1][C][C][C][C][=C][N][Branch1][C][C][N][=Ring1][=Branch1][C][Branch1][=Branch1][C][Branch1][C][N][=O][=C][N][=C][Ring2][Ring1][C][C][=C][Ring2][Ring1][=Branch1][F][C][=N][Ring2][Ring1][=N] targets the protein A-T mutated and is associated with Defective B cell differentiation."} {"text":"The compound InChI=1S\/C20H34N6O6\/c1-3-32-17(29)11-26(25-13(2)28)19(31)16-9-5-4-8-15(16)18(30)24-14(12-27)7-6-10-23-20(21)22\/h12,14-16H,3-11H2,1-2H3,(H,24,30)(H,25,28)(H4,21,22,23)\/t14-,15+,16+\/m0\/s1 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) and is associated with Excessive bleeding from superficial cuts."}", "/scratch/micpie/export/compound_protein_hpo/train_0-3.jsonl": "{"text":"The compound COCc1ccc(-c2cc3c(N[C@@H](C)c4ccn(C)n4)c(C(N)=O)cnc3cc2F)cn1 targets the protein Ataxia telangiectasia mutated. The protein is associated with the Defective B cell differentiation."} {"text":"The compound CCOC(=O)CN(NC(C)=O)C(=O)[C@@H]1CCCC[C@H]1C(=O)N[C@H](C=O)CCCN=C(N)N targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II). The protein is associated with the Excessive bleeding from superficial cuts."}", "/scratch/micpie/export/compound_protein_hpo/test_1-1.jsonl": "{"text":"The compound N=C(N)NOCCNC(=O)Cn1c(Cl)cnc(NCC(F)(F)c2ccccc2)c1=O targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II). 
The protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) is associated with Excessive bleeding from superficial cuts."} {"text":"The compound InChI=1S\/C31H33ClN4O\/c32-21-14-15-24-28(20-21)36-26-11-4-2-9-23(26)31(24)34-18-6-16-33-17-7-19-37-29-13-5-12-27-30(29)22-8-1-3-10-25(22)35-27\/h1,3,5,8,10,12-15,20,33,35H,2,4,6-7,9,11,16-19H2,(H,34,36) targets the protein Acylcholine acylhydrolase. The protein Acylcholine acylhydrolase is associated with Chronic infection."}", "/scratch/micpie/export/compound_protein_hpo/valid_1-3.jsonl": "{"text":"The compound InChI=1S\/C21H29N7O3\/c1-14-12-26-19(25-8-7-15-5-6-16-3-2-4-17(16)11-15)20(30)28(14)13-18(29)24-9-10-31-27-21(22)23\/h5-6,11-12H,2-4,7-10,13H2,1H3,(H,24,29)(H,25,26)(H4,22,23,27) targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II). The protein is associated with the Excessive bleeding from superficial cuts."} {"text":"The compound CCNC(=O)Oc1cccc2c1CCC1CCN(C)C21 targets the protein Acylcholine acylhydrolase. The protein is associated with the Chronic infection."}", "/scratch/micpie/export/compound_protein_hpo/valid_0-2.jsonl": "{"text":"The compound [Br][C][=C][C][=C][Branch1][Branch2][C][N][=N][NH1][N][=Ring1][Branch1][C][=C][Ring1][O] targets the protein A-T mutated. The protein A-T mutated is associated with the human phenotype represented by Defective B cell differentiation."} {"text":"The compound [C][C][C][C][Branch1][Ring2][C][C][C][C][=Branch1][C][=O][N][N][Branch1][=Branch2][C][C][=Branch1][C][=O][O][C][C][C][=Branch1][C][=O][C@@H1][C][C][C][C][C@H1][Ring1][=Branch1][C][=Branch1][C][=O][N][C@@H1][Branch1][#Branch2][C][C][C][N][=C][Branch1][C][N][N][C][=Branch1][C][=O][C][=Branch1][C][=O][N][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1] targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II). 
The protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) is associated with the human phenotype represented by Excessive bleeding from superficial cuts."}", "/scratch/micpie/export/compound_protein_hpo/valid_0-1.jsonl": "{"text":"The compound Brcccc-cnn[nH]n5)))))cc6 targets the protein Ataxia telangiectasia mutated. The protein Ataxia telangiectasia mutated is associated with Defective B cell differentiation."} {"text":"The compound CCCCCCC)))C=O)NNCC=O)OCC)))))C=O)[C@@H]CCCC[C@H]6C=O)N[C@@H]CCCN=CN)N))))))C=O)C=O)NCCcccccc6 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II). The protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) is associated with Excessive bleeding from superficial cuts."}", "/scratch/micpie/export/compound_protein_hpo/train_2-1.jsonl": "{"text":"The compound Ncccncc6CccccCl)cc6))))))NC=S)N6)))))))CCCC6 targets the protein Choline esterase II. The protein Choline esterase II is associated with Chronic infection."} {"text":"The compound [C][=C][C][S][S+1][Branch1][C][O-1][C][C][=C] targets the protein TP2. The protein TP2 is associated with Abnormal fifth cranial nerve physiology."}", "/scratch/micpie/export/compound_protein_hpo/valid_1-1.jsonl": "{"text":"The compound InChI=1S\/C21H29N7O3\/c1-14-12-26-19(25-8-7-15-5-6-16-3-2-4-17(16)11-15)20(30)28(14)13-18(29)24-9-10-31-27-21(22)23\/h5-6,11-12H,2-4,7-10,13H2,1H3,(H,24,29)(H,25,26)(H4,22,23,27) targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II). The protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) is associated with Excessive bleeding from superficial cuts."} {"text":"The compound InChI=1S\/C16H22N2O2\/c1-3-17-16(19)20-14-6-4-5-13-12(14)8-7-11-9-10-18(2)15(11)13\/h4-6,11,15H,3,7-10H2,1-2H3,(H,17,19) targets the protein Cholinesterase. 
The protein Cholinesterase is associated with Chronic infection."}", "/scratch/micpie/export/compound_protein_hpo/test_1-3.jsonl": "{"text":"The compound InChI=1S\/C17H20ClF2N7O3\/c18-12-8-24-14(25-10-17(19,20)11-4-2-1-3-5-11)15(29)27(12)9-13(28)23-6-7-30-26-16(21)22\/h1-5,8H,6-7,9-10H2,(H,23,28)(H,24,25)(H4,21,22,26) targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II). The protein is associated with the Excessive bleeding from superficial cuts."} {"text":"The compound InChI=1S\/C31H33ClN4O\/c32-21-14-15-24-28(20-21)36-26-11-4-2-9-23(26)31(24)34-18-6-16-33-17-7-19-37-29-13-5-12-27-30(29)22-8-1-3-10-25(22)35-27\/h1,3,5,8,10,12-15,20,33,35H,2,4,6-7,9,11,16-19H2,(H,34,36) targets the protein Acylcholine acylhydrolase. The protein is associated with the Chronic infection."}", "/scratch/micpie/export/compound_protein_hpo/test_1-0.jsonl": "{"text":"The compound InChI=1S\/C17H20ClF2N7O3\/c18-12-8-24-14(25-10-17(19,20)11-4-2-1-3-5-11)15(29)27(12)9-13(28)23-6-7-30-26-16(21)22\/h1-5,8H,6-7,9-10H2,(H,23,28)(H,24,25)(H4,21,22,26) targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) and is associated with Excessive bleeding from superficial cuts."} {"text":"The compound InChI=1S\/C31H33ClN4O\/c32-21-14-15-24-28(20-21)36-26-11-4-2-9-23(26)31(24)34-18-6-16-33-17-7-19-37-29-13-5-12-27-30(29)22-8-1-3-10-25(22)35-27\/h1,3,5,8,10,12-15,20,33,35H,2,4,6-7,9,11,16-19H2,(H,34,36) targets the protein Butyrylcholine esterase and is associated with Chronic infection."}", "/scratch/micpie/export/compound_protein_hpo/train_0-2.jsonl": "{"text":"The compound COCc1ccc(-c2cc3c(N[C@@H](C)c4ccn(C)n4)c(C(N)=O)cnc3cc2F)cn1 targets the protein Ataxia telangiectasia mutated. 
The protein Ataxia telangiectasia mutated is associated with the human phenotype represented by Defective B cell differentiation."} {"text":"The compound InChI=1S\/C20H34N6O6\/c1-3-32-17(29)11-26(25-13(2)28)19(31)16-9-5-4-8-15(16)18(30)24-14(12-27)7-6-10-23-20(21)22\/h12,14-16H,3-11H2,1-2H3,(H,24,30)(H,25,28)(H4,21,22,23)\/t14-,15+,16+\/m0\/s1 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II). The protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) is associated with the human phenotype represented by Excessive bleeding from superficial cuts."}", "/scratch/micpie/export/compound_protein_hpo/test_2-2.jsonl": "{"text":"The compound [C][C][N][Branch1][Ring1][C][C][C][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][S][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][=C].[Cl] targets the protein Choline esterase II. The protein Choline esterase II is associated with the human phenotype represented by Chronic infection."} {"text":"The compound COcccOC))cc=O)cOCCCCNCCCCC6)))))))))))c-cccOC))cOC))cOC))c6))))))oc6c%10 targets the protein TP2. The protein TP2 is associated with the human phenotype represented by Abnormal fifth cranial nerve physiology."}", "/scratch/micpie/export/compound_protein_hpo/train_1-1.jsonl": "{"text":"The compound InChI=1S\/C20H26N6O3\/c21-20(22)23-9-3-6-14(12-27)24-18(28)17-8-4-10-26(17)19(29)16-11-13-5-1-2-7-15(13)25-16\/h1-2,5,7,11-12,14,17,25H,3-4,6,8-10H2,(H,24,28)(H4,21,22,23)\/t14?,17-\/m0\/s1 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II). The protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) is associated with Excessive bleeding from superficial cuts."} {"text":"The compound [C][N+1][=C][Branch1][S][\/C][=C][\/C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][Ring1][=Branch2][C][=C][Branch1][=Branch2][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring2][Ring1][O].[I-1] targets the protein Cholinesterase. 
The protein Cholinesterase is associated with Chronic infection."}", "/scratch/micpie/export/compound_protein_hpo/train_0-1.jsonl": "{"text":"The compound COCc1ccc(-c2cc3c(N[C@@H](C)c4ccn(C)n4)c(C(N)=O)cnc3cc2F)cn1 targets the protein Ataxia telangiectasia mutated. The protein Ataxia telangiectasia mutated is associated with Defective B cell differentiation."} {"text":"The compound CCOC(=O)CN(NC(C)=O)C(=O)[C@@H]1CCCC[C@H]1C(=O)N[C@H](C=O)CCCN=C(N)N targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II). The protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) is associated with Excessive bleeding from superficial cuts."}", "/scratch/micpie/export/compound_protein_hpo/valid_1-0.jsonl": "{"text":"The compound Cc1cnc(NCCc2ccc3c(c2)CCC3)c(=O)n1CC(=O)NCCONC(=N)N targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) and is associated with Excessive bleeding from superficial cuts."} {"text":"The compound [C][C][N][C][=Branch1][C][=O][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][C][N][Branch1][C][C][C][Ring1][#Branch2][Ring1][=Branch1] targets the protein Choline esterase II and is associated with Chronic infection."}", "/scratch/micpie/export/compound_protein_hpo/valid_2-3.jsonl": "{"text":"The compound InChI=1S\/C23H24N2O3\/c1-25(15-17-5-3-2-4-6-17)16-18-7-10-20(11-8-18)24-23(28)14-19-9-12-21(26)22(27)13-19\/h2-13,26-27H,14-16H2,1H3,(H,24,28) targets the protein Butyrylcholine esterase. The protein is associated with the Chronic infection."} {"text":"The compound InChI=1S\/C30H36N4O4\/c35-27(11-17-33-13-3-1-4-14-33)31-21-7-9-23-25(19-21)30(38)26-20-22(8-10-24(26)29(23)37)32-28(36)12-18-34-15-5-2-6-16-34\/h7-10,19-20H,1-6,11-18H2,(H,31,35)(H,32,36) targets the protein Telomerase reverse transcriptase. 
The protein is associated with the Abnormal fifth cranial nerve physiology."}", "/scratch/micpie/export/compound_protein_hpo/valid_0-3.jsonl": "{"text":"The compound Brcccc-cnn[nH]n5)))))cc6 targets the protein Serine-protein kinase ATM. The protein is associated with the Defective B cell differentiation."} {"text":"The compound CCCCCCC)))C=O)NNCC=O)OCC)))))C=O)[C@@H]CCCC[C@H]6C=O)N[C@@H]CCCN=CN)N))))))C=O)C=O)NCCcccccc6 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II). The protein is associated with the Excessive bleeding from superficial cuts."}", "/scratch/micpie/export/compound_protein_hpo/train_1-2.jsonl": "{"text":"The compound [N][=C][Branch1][C][N][N][C][C][C][C][Branch1][Ring1][C][=O][N][C][=Branch1][C][=O][C@@H1][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][Ring1][=Branch2] targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II). The protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) is associated with the human phenotype represented by Excessive bleeding from superficial cuts."} {"text":"The compound C[n+]1c(\/C=C\/c2cc3ccccc3[nH]2)cc(N2CCOCC2)c2ccccc21.[I-] targets the protein Butyrylcholine esterase. The protein Butyrylcholine esterase is associated with the human phenotype represented by Chronic infection."}", "/scratch/micpie/export/compound_protein_hpo/test_2-3.jsonl": "{"text":"The compound InChI=1S\/C19H23N3OS.ClH\/c1-3-21(4-2)14-13-20-19(23)22-15-9-5-7-11-17(15)24-18-12-8-6-10-16(18)22;\/h5-12H,3-4,13-14H2,1-2H3,(H,20,23);1H targets the protein Pseudocholinesterase. The protein is associated with the Chronic infection."} {"text":"The compound COcccOC))cc=O)cOCCCCNCCCCC6)))))))))))c-cccOC))cOC))cOC))c6))))))oc6c%10 targets the protein HEST2. 
The protein is associated with the Abnormal fifth cranial nerve physiology."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_5-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SELFIES from the canonical SMILES.\nMolecule canonical SMILES: CCc1cc(C2CCCN2C(=O)NCCCOC)on1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SELFIES from the canonical SMILES.\ncanonical SMILES: CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]']"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_2-4.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the canonical SMILES CC1(C)CCC(CN2CCN(c3ccc(C(=O)NS(=O)(=O)c4ccc(NC5(CO)CCOCC5)c([N+](=O)[O-])c4)c(Oc4ccc5[nH]ccc5c4)c3)CC2)=C(c2ccc(Cl)cc2)C1?\nAssistant: Of course, this molecule has a SELFIES of 
[C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N]."} {"text":"User: Can you generate the SELFIES of the molecule with the canonical SMILES CN(C)CCN1C(=O)c2oc3ccc(Cl)cc3c(=O)c2C1c1cccc(Br)c1?\nAssistant: Sure, this molecule has a SELFIES of [C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_2-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the canonical SMILES.\ncanonical SMILES: CC(=O)[C@]1(O)CC[C@@]2(O)[C@]1(C)[C@H](OC(=O)\/C=C(\\C)C(C)C)C[C@@H]1[C@@]3(C)CC[C@H](O)CC3=CC[C@]12O\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C]"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the canonical SMILES.\nMolecule canonical SMILES: 
c1ccc(COc2ccc(CN3CCN(c4ccccn4)CC3)cc2)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O]"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_4-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CC(C)Oc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1 can also be represented with the SELFIES representation [C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1]."} {"text":"The molecule with the canonical SMILES representation of NC(=O)c1ccc(C#CCNC(=O)c2ccc3ccccc3c2)cc1 can also be represented with the SELFIES representation [N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_4-4.jsonl": "{"text":"User: Can you create the SELFIES of the molecule with the canonical SMILES CC(C)Oc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1?\nAssistant: Sure, this molecule has a SELFIES of [C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1]."} {"text":"User: Can you tell me the SELFIES of the molecule with the canonical SMILES NC(=O)c1ccc(C#CCNC(=O)c2ccc3ccccc3c2)cc1?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of 
[N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_1-4.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the canonical SMILES CN1CCc2c(sc3c2c(=O)n(-c2ccccc2)c2nnc(S)n32)C1?\nAssistant: Sure, this molecule has a SELFIES of [C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2]."} {"text":"User: Can you generate the SELFIES of the molecule with the canonical SMILES O=C(Cn1cc2c(n1)CCCCC2)Nc1c(F)cc(F)cc1Br?\nAssistant: Yes, this molecule has a SELFIES of [O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_5-1.jsonl": "{"text":"The molecule with the SELFIES [C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C] can also be represented with the canonical SMILES representation CCc1cc(C2CCCN2C(=O)NCCCOC)on1."} {"text":"The molecule with the SELFIES ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]'] can also be represented with the canonical SMILES representation CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_4-4.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the canonical SMILES CCN(CC)C(=O)CN(C)C(=O)c1nonc1N?\nAssistant: Of course, this molecule has a SELFIES 
of [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]."} {"text":"User: Can you generate the SELFIES of the molecule with the canonical SMILES Cc1cccc(C)c1NC(=O)CN(C)S(=O)(=O)Cc1ccccc1?\nAssistant: Yes, this molecule has a SELFIES of [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_4-4.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the canonical SMILES CCNC(=O)C1CCN(C(=O)c2nonc2N)CC1?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C]."} {"text":"User: Can you generate the SELFIES of the molecule with the canonical SMILES CN(C)c1nc(N)nc(COC(=O)c2ccc(Cl)cc2N)n1?\nAssistant: Of course, this molecule has a SELFIES of [C][N][Branch1][C][C][C][=N][C][Branch1][C][N][=N][C][Branch2][Ring1][Ring1][C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=N][Ring2][Ring1][Ring1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_1-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the canonical SMILES.\ncanonical SMILES: COc1cc(CNCCSc2nnnn2C)ccc1OCc1ccccc1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES 
from the canonical SMILES.\nMolecule canonical SMILES: CNC(=S)N1CCC(c2ccc(OC)cc2)=N1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N]"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_3-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the canonical SMILES.\nMolecule canonical SMILES: COCCCN1C(=O)C(=O)\/C(=C(\/O)c2ccc(S(=O)(=O)N3CCOCC3)cc2)C1c1cccc(Cl)c1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SELFIES from the canonical SMILES.\ncanonical SMILES: COc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1F\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F]"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_1-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the SELFIES [C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]?\nAssistant: Yes, this molecule has a canonical SMILES of 
COc1cc(CNCCSc2nnnn2C)ccc1OCc1ccccc1."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the SELFIES [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N]?\nAssistant: Sure, this molecule has a canonical SMILES of CNC(=S)N1CCC(c2ccc(OC)cc2)=N1."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_5-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CCc1cc(C2CCCN2C(=O)NCCCOC)on1 can also be represented with the SELFIES [C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C]."} {"text":"The molecule with the canonical SMILES CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21 can also be represented with the SELFIES representation ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]']."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_1-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the SELFIES [C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2]?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of Cc1ccc(-c2nc3c4ccccc4[nH]c(=O)n3n2)cc1."} {"text":"User: Can you create the canonical SMILES of the molecule with the SELFIES [C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2]?\nAssistant: Sure, this molecule has a canonical SMILES of Cc1c(OCCN2CCN(C)CC2)cn2ncnc(Oc3ccc(NC(=O)c4ccccc4F)cc3F)c12."}", 
"/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_5-4.jsonl": "{"text":"User: Can you create the SELFIES of the molecule with the canonical SMILES Cc1nc2c3ccccc3nn2c(C)c1CCC(=O)NC(C)C?\nAssistant: Yes, this molecule has a SELFIES of [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C]."} {"text":"User: Can you generate the SELFIES of the molecule with the canonical SMILES Nc1c2c([nH+]c3ccccc13)CCCC2?\nAssistant: Yes, this molecule has a SELFIES of ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]']."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_0-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the SELFIES [S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1]?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1."} {"text":"User: Can you create the canonical SMILES of the molecule with the SELFIES [C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1]?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of c1ccc(Nc2nc(CSc3nc[nH]n3)cs2)cc1."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_3-0.jsonl": "{"text":"The molecule with the canonical SMILES O=C(CSc1ccc(-c2ccco2)nn1)Nc1cccc(F)c1 can also be 
represented with the SELFIES [O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1]."} {"text":"The molecule with the canonical SMILES representation of CCN(CCc1ccccc1)C(=O)c1nonc1N can also be represented with the SELFIES [C][C][N][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_0-1.jsonl": "{"text":"The molecule with the SELFIES representation of [S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1] can also be represented with the canonical SMILES CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1."} {"text":"The molecule with the SELFIES [C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1] can also be represented with the canonical SMILES c1ccc(Nc2nc(CSc3nc[nH]n3)cs2)cc1."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_5-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CCN(CC)C(=O)CN(CC)c1cc(C)cc(C)c1 can also be represented with the SELFIES representation [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2]."} {"text":"The molecule with the canonical SMILES representation of 
CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1 can also be represented with the SELFIES representation ['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]']."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_2-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CC1(C)CCC(CN2CCN(c3ccc(C(=O)NS(=O)(=O)c4ccc(NC5(CO)CCOCC5)c([N+](=O)[O-])c4)c(Oc4ccc5[nH]ccc5c4)c3)CC2)=C(c2ccc(Cl)cc2)C1 can also be represented with the SELFIES [C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N]."} {"text":"The molecule with the canonical SMILES representation of CN(C)CCN1C(=O)c2oc3ccc(Cl)cc3c(=O)c2C1c1cccc(Br)c1 can also be represented with the SELFIES 
[C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_2-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SELFIES from the canonical SMILES.\nMolecule canonical SMILES: COc1ccc(\/C=N\/Nc2nc(C)cs2)cc1OC\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C]"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the canonical SMILES.\ncanonical SMILES: CCOc1ccsc1C(=O)OC\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C]"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_0-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F can also be represented with the SELFIES representation [S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2]."} {"text":"The molecule with the canonical SMILES representation of COc1cc(OC)cc(C(=O)NC(C(=O)NCCN(C)C)C23CC4CC(CC(C4)C2)C3)c1 can also be 
represented with the SELFIES [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_3-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the SELFIES.\nSELFIES: [O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: O=C(CSc1ccc(-c2ccco2)nn1)Nc1cccc(F)c1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the SELFIES.\nMolecule SELFIES: [C][C][N][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CCN(CCc1ccccc1)C(=O)c1nonc1N"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_5-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the SELFIES.\nSELFIES: [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: Cc1nc2c3ccccc3nn2c(C)c1CCC(=O)NC(C)C"} {"text":"Task: Please generate a molecule 
representation based on the description.\nDescription: Create the canonical SMILES from the SELFIES.\nSELFIES: ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]']\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: Nc1c2c([nH+]c3ccccc13)CCCC2"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_3-5.jsonl": "{"text":"User: Can you generate the canonical SMILES of the molecule with the SELFIES [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]?\nAssistant: Yes, this molecule has a canonical SMILES of Cl.O=C(c1ccc(Br)cc1)C(CN1CCOCC1)c1ccccc1."} {"text":"User: Can you generate the canonical SMILES of the molecule with the SELFIES [C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2]?\nAssistant: Of course, this molecule has a canonical SMILES of CCOc1ccccc1CNC(=O)Nc1ccc(OC)c(F)c1."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_5-5.jsonl": "{"text":"User: Can you tell me the canonical SMILES of the molecule with the SELFIES [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2]?\nAssistant: Sure, this molecule has a canonical SMILES of CCN(CC)C(=O)CN(CC)c1cc(C)cc(C)c1."} {"text":"User: Can you generate the canonical SMILES of the molecule with the SELFIES 
['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]']?\nAssistant: Of course, this molecule has a canonical SMILES of CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_4-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the canonical SMILES from the SELFIES.\nSELFIES: [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCN(CC)C(=O)CN(C)C(=O)c1nonc1N"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the SELFIES.\nMolecule SELFIES: [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: Cc1cccc(C)c1NC(=O)CN(C)S(=O)(=O)Cc1ccccc1"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_1-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate 
the canonical SMILES from the SELFIES.\nSELFIES: [C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: Cc1ccc(-c2nc3c4ccccc4[nH]c(=O)n3n2)cc1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the SELFIES.\nSELFIES: [C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: Cc1c(OCCN2CCN(C)CC2)cn2ncnc(Oc3ccc(NC(=O)c4ccccc4F)cc3F)c12"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_0-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the canonical SMILES.\ncanonical SMILES: CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: [S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1]"} {"text":"Task: Please create a 
molecule representation based on the description.\nDescription: Create the SELFIES from the canonical SMILES.\nMolecule canonical SMILES: c1ccc(Nc2nc(CSc3nc[nH]n3)cs2)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1]"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_3-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of COCCCN1C(=O)C(=O)\/C(=C(\/O)c2ccc(S(=O)(=O)N3CCOCC3)cc2)C1c1cccc(Cl)c1 can also be represented with the SELFIES [C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1]."} {"text":"The molecule with the canonical SMILES COc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1F can also be represented with the SELFIES representation [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_1-0.jsonl": "{"text":"The molecule with the canonical SMILES Cc1ccc(-c2nc3c4ccccc4[nH]c(=O)n3n2)cc1 can also be represented with the SELFIES [C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2]."} {"text":"The molecule with the canonical SMILES Cc1c(OCCN2CCN(C)CC2)cn2ncnc(Oc3ccc(NC(=O)c4ccccc4F)cc3F)c12 can also be represented with the SELFIES representation 
[C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_5-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the canonical SMILES.\ncanonical SMILES: CCN(CC)C(=O)CN(CC)c1cc(C)cc(C)c1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the canonical SMILES.\nMolecule canonical SMILES: CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: ['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]']"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_2-3.jsonl": "{"text":"Task: Please create a molecule representation based on the 
input molecule representation and the description.\nDescription: Create the canonical SMILES from the SELFIES.\nMolecule SELFIES: [C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CC(=O)[C@]1(O)CC[C@@]2(O)[C@]1(C)[C@H](OC(=O)\/C=C(\\C)C(C)C)C[C@@H]1[C@@]3(C)CC[C@H](O)CC3=CC[C@]12O"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the canonical SMILES from the SELFIES.\nSELFIES: [C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: c1ccc(COc2ccc(CN3CCN(c4ccccn4)CC3)cc2)cc1"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_1-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SELFIES from the canonical SMILES.\ncanonical SMILES: CN1CCc2c(sc3c2c(=O)n(-c2ccccc2)c2nnc(S)n32)C1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2]"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES 
from the canonical SMILES.\nMolecule canonical SMILES: O=C(Cn1cc2c(n1)CCCCC2)Nc1c(F)cc(F)cc1Br\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br]"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_5-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the SELFIES [C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C]?\nAssistant: Of course, this molecule has a canonical SMILES of CCc1cc(C2CCCN2C(=O)NCCCOC)on1."} {"text":"User: Can you create the canonical SMILES of the molecule with the SELFIES ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]']?\nAssistant: Sure, this molecule has a canonical SMILES of CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_2-4.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the canonical SMILES CC(=O)[C@]1(O)CC[C@@]2(O)[C@]1(C)[C@H](OC(=O)\/C=C(\\C)C(C)C)C[C@@H]1[C@@]3(C)CC[C@H](O)CC3=CC[C@]12O?\nAssistant: Sure, this molecule has a SELFIES of [C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C]."} {"text":"User: Can you create the SELFIES of the molecule with the canonical SMILES c1ccc(COc2ccc(CN3CCN(c4ccccn4)CC3)cc2)cc1?\nAssistant: Yes, this molecule has a SELFIES of 
[C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_3-4.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the canonical SMILES O=C(CSc1ccc(-c2ccco2)nn1)Nc1cccc(F)c1?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1]."} {"text":"User: Can you tell me the SELFIES of the molecule with the canonical SMILES CCN(CCc1ccccc1)C(=O)c1nonc1N?\nAssistant: Sure, this molecule has a SELFIES of [C][C][N][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_0-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1 can also be represented with the SELFIES [S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1]."} {"text":"The molecule with the canonical SMILES c1ccc(Nc2nc(CSc3nc[nH]n3)cs2)cc1 can also be represented with the SELFIES [C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1]."}", 
"/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_2-5.jsonl": "{"text":"User: Can you generate the canonical SMILES of the molecule with the SELFIES [C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N]?\nAssistant: Sure, this molecule has a canonical SMILES of CC1(C)CCC(CN2CCN(c3ccc(C(=O)NS(=O)(=O)c4ccc(NC5(CO)CCOCC5)c([N+](=O)[O-])c4)c(Oc4ccc5[nH]ccc5c4)c3)CC2)=C(c2ccc(Cl)cc2)C1."} {"text":"User: Can you create the canonical SMILES of the molecule with the SELFIES [C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1]?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of CN(C)CCN1C(=O)c2oc3ccc(Cl)cc3c(=O)c2C1c1cccc(Br)c1."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_5-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the SELFIES [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C]?\nAssistant: Sure, this molecule has a canonical SMILES of Cc1nc2c3ccccc3nn2c(C)c1CCC(=O)NC(C)C."} {"text":"User: Can you generate the canonical SMILES of the molecule with the SELFIES ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]']?\nAssistant: Of 
course, this molecule has a canonical SMILES of Nc1c2c([nH+]c3ccccc13)CCCC2."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_3-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the SELFIES [C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1]?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of COCCCN1C(=O)C(=O)\/C(=C(\/O)c2ccc(S(=O)(=O)N3CCOCC3)cc2)C1c1cccc(Cl)c1."} {"text":"User: Can you generate the canonical SMILES of the molecule with the SELFIES [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F]?\nAssistant: Of course, this molecule has a canonical SMILES of COc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1F."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_2-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CC(=O)[C@]1(O)CC[C@@]2(O)[C@]1(C)[C@H](OC(=O)\/C=C(\\C)C(C)C)C[C@@H]1[C@@]3(C)CC[C@H](O)CC3=CC[C@]12O can also be represented with the SELFIES [C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C]."} {"text":"The molecule with the canonical SMILES representation of c1ccc(COc2ccc(CN3CCN(c4ccccn4)CC3)cc2)cc1 can also be represented with the SELFIES representation 
[C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_1-4.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the canonical SMILES Cc1ccc(-c2nc3c4ccccc4[nH]c(=O)n3n2)cc1?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2]."} {"text":"User: Can you tell me the SELFIES of the molecule with the canonical SMILES Cc1c(OCCN2CCN(C)CC2)cn2ncnc(Oc3ccc(NC(=O)c4ccccc4F)cc3F)c12?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_2-0.jsonl": "{"text":"The molecule with the canonical SMILES COc1ccc(\/C=N\/Nc2nc(C)cs2)cc1OC can also be represented with the SELFIES [C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C]."} {"text":"The molecule with the canonical SMILES CCOc1ccsc1C(=O)OC can also be represented with the SELFIES representation [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_4-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the canonical SMILES.\nMolecule canonical SMILES: 
CCN(CC)C(=O)CN(C)C(=O)c1nonc1N\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the canonical SMILES.\ncanonical SMILES: Cc1cccc(C)c1NC(=O)CN(C)S(=O)(=O)Cc1ccccc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_0-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the SELFIES.\nMolecule SELFIES: [S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the canonical SMILES from the SELFIES.\nSELFIES: 
[C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: c1ccc(Nc2nc(CSc3nc[nH]n3)cs2)cc1"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_4-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SELFIES from the canonical SMILES.\nMolecule canonical SMILES: CC(C)Oc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1]"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the canonical SMILES.\nMolecule canonical SMILES: NC(=O)c1ccc(C#CCNC(=O)c2ccc3ccccc3c2)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1]"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_3-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SELFIES from the canonical SMILES.\ncanonical SMILES: O=C(CSc1ccc(-c2ccco2)nn1)Nc1cccc(F)c1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: 
[O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1]"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the canonical SMILES.\nMolecule canonical SMILES: CCN(CCc1ccccc1)C(=O)c1nonc1N\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][C][N][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_2-1.jsonl": "{"text":"The molecule with the SELFIES representation of [C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C] can also be represented with the canonical SMILES representation COc1ccc(\/C=N\/Nc2nc(C)cs2)cc1OC."} {"text":"The molecule with the SELFIES representation of [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C] can also be represented with the canonical SMILES CCOc1ccsc1C(=O)OC."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_5-4.jsonl": "{"text":"User: Can you create the SELFIES of the molecule with the canonical SMILES CCN(CC)C(=O)CN(CC)c1cc(C)cc(C)c1?\nAssistant: Of course, this molecule has a SELFIES of [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2]."} {"text":"User: Can you generate the SELFIES of the molecule with the canonical SMILES CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of 
['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]']."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_4-5.jsonl": "{"text":"User: Can you tell me the canonical SMILES of the molecule with the SELFIES [C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1]?\nAssistant: Of course, this molecule has a canonical SMILES of CC(C)Oc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1."} {"text":"User: Can you generate the canonical SMILES of the molecule with the SELFIES [N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1]?\nAssistant: Sure, this molecule has a canonical SMILES of NC(=O)c1ccc(C#CCNC(=O)c2ccc3ccccc3c2)cc1."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_4-0.jsonl": "{"text":"The molecule with the canonical SMILES CCN(CC)C(=O)CN(C)C(=O)c1nonc1N can also be represented with the SELFIES [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]."} {"text":"The molecule with the canonical SMILES representation of Cc1cccc(C)c1NC(=O)CN(C)S(=O)(=O)Cc1ccccc1 can also be represented with the SELFIES 
[C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_5-1.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C] can also be represented with the canonical SMILES representation Cc1nc2c3ccccc3nn2c(C)c1CCC(=O)NC(C)C."} {"text":"The molecule with the SELFIES representation of ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]'] can also be represented with the canonical SMILES Nc1c2c([nH+]c3ccccc13)CCCC2."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_2-1.jsonl": "{"text":"The molecule with the SELFIES [C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N] can also be represented with the canonical SMILES CC1(C)CCC(CN2CCN(c3ccc(C(=O)NS(=O)(=O)c4ccc(NC5(CO)CCOCC5)c([N+](=O)[O-])c4)c(Oc4ccc5[nH]ccc5c4)c3)CC2)=C(c2ccc(Cl)cc2)C1."} {"text":"The molecule with the SELFIES [C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1] can also be represented with the 
canonical SMILES CN(C)CCN1C(=O)c2oc3ccc(Cl)cc3c(=O)c2C1c1cccc(Br)c1."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_0-0.jsonl": "{"text":"The molecule with the canonical SMILES Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1 can also be represented with the SELFIES [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C]."} {"text":"The molecule with the canonical SMILES representation of CCOC(=O)c1cccc(-[n+]2c(-c3ccccc3)cc(-c3ccccc3)cc2-c2ccccc2)c1.[O-][Cl+3]([O-])([O-])[O-] can also be represented with the SELFIES representation [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_0-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the SELFIES.\nMolecule SELFIES: [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the SELFIES.\nSELFIES: 
[C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCOC(=O)c1cccc(-[n+]2c(-c3ccccc3)cc(-c3ccccc3)cc2-c2ccccc2)c1.[O-][Cl+3]([O-])([O-])[O-]"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_1-4.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the canonical SMILES COc1cc(CNCCSc2nnnn2C)ccc1OCc1ccccc1?\nAssistant: Sure, this molecule has a SELFIES of [C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]."} {"text":"User: Can you create the SELFIES of the molecule with the canonical SMILES CNC(=S)N1CCC(c2ccc(OC)cc2)=N1?\nAssistant: Of course, this molecule has a SELFIES of [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_2-5.jsonl": "{"text":"User: Can you generate the canonical SMILES of the molecule with the SELFIES [C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C]?\nAssistant: Sure, this molecule has a canonical SMILES of CC(=O)[C@]1(O)CC[C@@]2(O)[C@]1(C)[C@H](OC(=O)\/C=C(\\C)C(C)C)C[C@@H]1[C@@]3(C)CC[C@H](O)CC3=CC[C@]12O."} {"text":"User: Can you create the canonical SMILES of the molecule with the SELFIES 
[C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O]?\nAssistant: Of course, this molecule has a canonical SMILES of c1ccc(COc2ccc(CN3CCN(c4ccccn4)CC3)cc2)cc1."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_1-1.jsonl": "{"text":"The molecule with the SELFIES representation of [C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1] can also be represented with the canonical SMILES representation COc1cc(CNCCSc2nnnn2C)ccc1OCc1ccccc1."} {"text":"The molecule with the SELFIES [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N] can also be represented with the canonical SMILES representation CNC(=S)N1CCC(c2ccc(OC)cc2)=N1."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_5-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the canonical SMILES.\ncanonical SMILES: Cc1nc2c3ccccc3nn2c(C)c1CCC(=O)NC(C)C\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the canonical SMILES.\ncanonical SMILES: Nc1c2c([nH+]c3ccccc13)CCCC2\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: 
['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]']"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_5-1.jsonl": "{"text":"The molecule with the SELFIES [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2] can also be represented with the canonical SMILES representation CCN(CC)C(=O)CN(CC)c1cc(C)cc(C)c1."} {"text":"The molecule with the SELFIES ['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]'] can also be represented with the canonical SMILES CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_4-1.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C] can also be represented with the canonical SMILES representation CCNC(=O)C1CCN(C(=O)c2nonc2N)CC1."} {"text":"The molecule with the SELFIES [C][N][Branch1][C][C][C][=N][C][Branch1][C][N][=N][C][Branch2][Ring1][Ring1][C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=N][Ring2][Ring1][Ring1] can also be represented with the canonical SMILES representation CN(C)c1nc(N)nc(COC(=O)c2ccc(Cl)cc2N)n1."}", 
"/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_3-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the SELFIES.\nSELFIES: [C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COCCCN1C(=O)C(=O)\/C(=C(\/O)c2ccc(S(=O)(=O)N3CCOCC3)cc2)C1c1cccc(Cl)c1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the SELFIES.\nMolecule SELFIES: [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1F"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_3-4.jsonl": "{"text":"User: Can you create the SELFIES of the molecule with the canonical SMILES Cl.O=C(c1ccc(Br)cc1)C(CN1CCOCC1)c1ccccc1?\nAssistant: Sure, this molecule has a SELFIES of [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]."} {"text":"User: Can you tell me the SELFIES of the molecule with the canonical SMILES CCOc1ccccc1CNC(=O)Nc1ccc(OC)c(F)c1?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of 
[C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_1-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the SELFIES.\nSELFIES: [C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CN1CCc2c(sc3c2c(=O)n(-c2ccccc2)c2nnc(S)n32)C1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the SELFIES.\nMolecule SELFIES: [O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: O=C(Cn1cc2c(n1)CCCCC2)Nc1c(F)cc(F)cc1Br"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_0-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the canonical SMILES.\nMolecule canonical SMILES: CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: 
[S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SELFIES from the canonical SMILES.\ncanonical SMILES: COc1cc(OC)cc(C(=O)NC(C(=O)NCCN(C)C)C23CC4CC(CC(C4)C2)C3)c1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C]"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_5-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of Cc1nc2c3ccccc3nn2c(C)c1CCC(=O)NC(C)C can also be represented with the SELFIES representation [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C]."} {"text":"The molecule with the canonical SMILES representation of Nc1c2c([nH+]c3ccccc13)CCCC2 can also be represented with the SELFIES ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]']."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_0-1.jsonl": "{"text":"The molecule with the SELFIES representation of 
[S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2] can also be represented with the canonical SMILES representation CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F."} {"text":"The molecule with the SELFIES [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C] can also be represented with the canonical SMILES COc1cc(OC)cc(C(=O)NC(C(=O)NCCN(C)C)C23CC4CC(CC(C4)C2)C3)c1."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_2-4.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the canonical SMILES COc1ccc(\/C=N\/Nc2nc(C)cs2)cc1OC?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C]."} {"text":"User: Can you create the SELFIES of the molecule with the canonical SMILES CCOc1ccsc1C(=O)OC?\nAssistant: Sure, this molecule has a SELFIES of [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_2-1.jsonl": "{"text":"The molecule with the SELFIES representation of 
[C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C] can also be represented with the canonical SMILES CC(=O)[C@]1(O)CC[C@@]2(O)[C@]1(C)[C@H](OC(=O)\/C=C(\\C)C(C)C)C[C@@H]1[C@@]3(C)CC[C@H](O)CC3=CC[C@]12O."} {"text":"The molecule with the SELFIES [C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O] can also be represented with the canonical SMILES representation c1ccc(COc2ccc(CN3CCN(c4ccccn4)CC3)cc2)cc1."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_1-1.jsonl": "{"text":"The molecule with the SELFIES [C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2] can also be represented with the canonical SMILES CN1CCc2c(sc3c2c(=O)n(-c2ccccc2)c2nnc(S)n32)C1."} {"text":"The molecule with the SELFIES [O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br] can also be represented with the canonical SMILES O=C(Cn1cc2c(n1)CCCCC2)Nc1c(F)cc(F)cc1Br."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_3-5.jsonl": "{"text":"User: Can you tell me the canonical SMILES of the molecule with the SELFIES [O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1]?\nAssistant: Of course, this molecule has a canonical SMILES 
of O=C(CSc1ccc(-c2ccco2)nn1)Nc1cccc(F)c1."} {"text":"User: Can you create the canonical SMILES of the molecule with the SELFIES [C][C][N][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]?\nAssistant: Sure, this molecule has a canonical SMILES of CCN(CCc1ccccc1)C(=O)c1nonc1N."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_3-1.jsonl": "{"text":"The molecule with the SELFIES representation of [C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1] can also be represented with the canonical SMILES COCCCN1C(=O)C(=O)\/C(=C(\/O)c2ccc(S(=O)(=O)N3CCOCC3)cc2)C1c1cccc(Cl)c1."} {"text":"The molecule with the SELFIES [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F] can also be represented with the canonical SMILES COc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1F."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_0-5.jsonl": "{"text":"User: Can you generate the canonical SMILES of the molecule with the SELFIES [S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2]?\nAssistant: Of course, this molecule has a canonical SMILES of CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F."} {"text":"User: Can you tell me the canonical 
SMILES of the molecule with the SELFIES [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C]?\nAssistant: Sure, this molecule has a canonical SMILES of COc1cc(OC)cc(C(=O)NC(C(=O)NCCN(C)C)C23CC4CC(CC(C4)C2)C3)c1."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_0-4.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the canonical SMILES CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F?\nAssistant: Yes, this molecule has a SELFIES of [S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2]."} {"text":"User: Can you tell me the SELFIES of the molecule with the canonical SMILES COc1cc(OC)cc(C(=O)NC(C(=O)NCCN(C)C)C23CC4CC(CC(C4)C2)C3)c1?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_1-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the canonical SMILES from the SELFIES.\nMolecule SELFIES: 
[C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COc1cc(CNCCSc2nnnn2C)ccc1OCc1ccccc1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the SELFIES.\nSELFIES: [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CNC(=S)N1CCC(c2ccc(OC)cc2)=N1"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_5-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the canonical SMILES from the SELFIES.\nSELFIES: [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCN(CC)C(=O)CN(CC)c1cc(C)cc(C)c1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the SELFIES.\nSELFIES: 
['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]']\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_0-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the SELFIES [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C]?\nAssistant: Sure, this molecule has a canonical SMILES of Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1."} {"text":"User: Can you create the canonical SMILES of the molecule with the SELFIES [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1]?\nAssistant: Of course, this molecule has a canonical SMILES of CCOC(=O)c1cccc(-[n+]2c(-c3ccccc3)cc(-c3ccccc3)cc2-c2ccccc2)c1.[O-][Cl+3]([O-])([O-])[O-]."}", 
"/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_4-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the SELFIES [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C]?\nAssistant: Yes, this molecule has a canonical SMILES of CCNC(=O)C1CCN(C(=O)c2nonc2N)CC1."} {"text":"User: Can you generate the canonical SMILES of the molecule with the SELFIES [C][N][Branch1][C][C][C][=N][C][Branch1][C][N][=N][C][Branch2][Ring1][Ring1][C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=N][Ring2][Ring1][Ring1]?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of CN(C)c1nc(N)nc(COC(=O)c2ccc(Cl)cc2N)n1."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_1-0.jsonl": "{"text":"The molecule with the canonical SMILES COc1cc(CNCCSc2nnnn2C)ccc1OCc1ccccc1 can also be represented with the SELFIES representation [C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]."} {"text":"The molecule with the canonical SMILES CNC(=S)N1CCC(c2ccc(OC)cc2)=N1 can also be represented with the SELFIES [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_0-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the canonical SMILES.\nMolecule canonical SMILES: Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: 
[O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C]"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SELFIES from the canonical SMILES.\nMolecule canonical SMILES: CCOC(=O)c1cccc(-[n+]2c(-c3ccccc3)cc(-c3ccccc3)cc2-c2ccccc2)c1.[O-][Cl+3]([O-])([O-])[O-]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1]"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_4-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SELFIES from the canonical SMILES.\nMolecule canonical SMILES: CCNC(=O)C1CCN(C(=O)c2nonc2N)CC1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C]"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SELFIES from the canonical SMILES.\nMolecule canonical SMILES: CN(C)c1nc(N)nc(COC(=O)c2ccc(Cl)cc2N)n1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: 
[C][N][Branch1][C][C][C][=N][C][Branch1][C][N][=N][C][Branch2][Ring1][Ring1][C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=N][Ring2][Ring1][Ring1]"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_4-1.jsonl": "{"text":"The molecule with the SELFIES [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N] can also be represented with the canonical SMILES CCN(CC)C(=O)CN(C)C(=O)c1nonc1N."} {"text":"The molecule with the SELFIES representation of [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1] can also be represented with the canonical SMILES representation Cc1cccc(C)c1NC(=O)CN(C)S(=O)(=O)Cc1ccccc1."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_5-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the SELFIES.\nSELFIES: [C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CCc1cc(C2CCCN2C(=O)NCCCOC)on1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the SELFIES.\nMolecule SELFIES: ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]']\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21"}", 
"/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_2-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SELFIES from the canonical SMILES.\ncanonical SMILES: CC1(C)CCC(CN2CCN(c3ccc(C(=O)NS(=O)(=O)c4ccc(NC5(CO)CCOCC5)c([N+](=O)[O-])c4)c(Oc4ccc5[nH]ccc5c4)c3)CC2)=C(c2ccc(Cl)cc2)C1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N]"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SELFIES from the canonical SMILES.\nMolecule canonical SMILES: CN(C)CCN1C(=O)c2oc3ccc(Cl)cc3c(=O)c2C1c1cccc(Br)c1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: [C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1]"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_1-1.jsonl": "{"text":"The molecule with the SELFIES [C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2] can also be represented with the canonical SMILES 
Cc1ccc(-c2nc3c4ccccc4[nH]c(=O)n3n2)cc1."} {"text":"The molecule with the SELFIES representation of [C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2] can also be represented with the canonical SMILES representation Cc1c(OCCN2CCN(C)CC2)cn2ncnc(Oc3ccc(NC(=O)c4ccccc4F)cc3F)c12."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_0-1.jsonl": "{"text":"The molecule with the SELFIES representation of [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C] can also be represented with the canonical SMILES representation Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1."} {"text":"The molecule with the SELFIES [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1] can also be represented with the canonical SMILES CCOC(=O)c1cccc(-[n+]2c(-c3ccccc3)cc(-c3ccccc3)cc2-c2ccccc2)c1.[O-][Cl+3]([O-])([O-])[O-]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_3-4.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the canonical SMILES COCCCN1C(=O)C(=O)\/C(=C(\/O)c2ccc(S(=O)(=O)N3CCOCC3)cc2)C1c1cccc(Cl)c1?\nAssistant: Sure, this molecule has a SELFIES of 
[C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1]."} {"text":"User: Can you tell me the SELFIES of the molecule with the canonical SMILES COc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1F?\nAssistant: Of course, this molecule has a SELFIES of [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_0-4.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the canonical SMILES Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1?\nAssistant: Of course, this molecule has a SELFIES of [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C]."} {"text":"User: Can you generate the SELFIES of the molecule with the canonical SMILES CCOC(=O)c1cccc(-[n+]2c(-c3ccccc3)cc(-c3ccccc3)cc2-c2ccccc2)c1.[O-][Cl+3]([O-])([O-])[O-]?\nAssistant: Sure, this molecule has a SELFIES of [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_1-0.jsonl": "{"text":"The molecule with the canonical SMILES CN1CCc2c(sc3c2c(=O)n(-c2ccccc2)c2nnc(S)n32)C1 can also be represented with the 
SELFIES representation [C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2]."} {"text":"The molecule with the canonical SMILES O=C(Cn1cc2c(n1)CCCCC2)Nc1c(F)cc(F)cc1Br can also be represented with the SELFIES representation [O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_4-5.jsonl": "{"text":"User: Can you tell me the canonical SMILES of the molecule with the SELFIES [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of CCN(CC)C(=O)CN(C)C(=O)c1nonc1N."} {"text":"User: Can you create the canonical SMILES of the molecule with the SELFIES [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]?\nAssistant: Sure, this molecule has a canonical SMILES of Cc1cccc(C)c1NC(=O)CN(C)S(=O)(=O)Cc1ccccc1."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_4-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the canonical SMILES from the SELFIES.\nMolecule SELFIES: [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CCNC(=O)C1CCN(C(=O)c2nonc2N)CC1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the 
description.\nDescription: Create the canonical SMILES from the SELFIES.\nMolecule SELFIES: [C][N][Branch1][C][C][C][=N][C][Branch1][C][N][=N][C][Branch2][Ring1][Ring1][C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=N][Ring2][Ring1][Ring1]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CN(C)c1nc(N)nc(COC(=O)c2ccc(Cl)cc2N)n1"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_4-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the canonical SMILES from the SELFIES.\nSELFIES: [C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC(C)Oc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the canonical SMILES from the SELFIES.\nSELFIES: [N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: NC(=O)c1ccc(C#CCNC(=O)c2ccc3ccccc3c2)cc1"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_3-1.jsonl": "{"text":"The molecule with the SELFIES [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1] can also be represented with the canonical SMILES Cl.O=C(c1ccc(Br)cc1)C(CN1CCOCC1)c1ccccc1."} {"text":"The molecule with the SELFIES 
[C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2] can also be represented with the canonical SMILES CCOc1ccccc1CNC(=O)Nc1ccc(OC)c(F)c1."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_2-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the canonical SMILES from the SELFIES.\nSELFIES: [C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COc1ccc(\/C=N\/Nc2nc(C)cs2)cc1OC"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the SELFIES.\nMolecule SELFIES: [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CCOc1ccsc1C(=O)OC"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_3-1.jsonl": "{"text":"The molecule with the SELFIES [O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1] can also be represented with the canonical SMILES representation O=C(CSc1ccc(-c2ccco2)nn1)Nc1cccc(F)c1."} {"text":"The molecule with the SELFIES representation of [C][C][N][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N] can also be represented with the canonical SMILES representation CCN(CCc1ccccc1)C(=O)c1nonc1N."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_0-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the canonical SMILES 
from the SELFIES.\nMolecule SELFIES: [S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the SELFIES.\nMolecule SELFIES: [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: COc1cc(OC)cc(C(=O)NC(C(=O)NCCN(C)C)C23CC4CC(CC(C4)C2)C3)c1"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_1-5.jsonl": "{"text":"User: Can you generate the canonical SMILES of the molecule with the SELFIES [C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2]?\nAssistant: Sure, this molecule has a canonical SMILES of CN1CCc2c(sc3c2c(=O)n(-c2ccccc2)c2nnc(S)n32)C1."} {"text":"User: Can you create the canonical SMILES of the molecule with the SELFIES 
[O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br]?\nAssistant: Sure, this molecule has a canonical SMILES of O=C(Cn1cc2c(n1)CCCCC2)Nc1c(F)cc(F)cc1Br."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_4-1.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1] can also be represented with the canonical SMILES CC(C)Oc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1."} {"text":"The molecule with the SELFIES representation of [N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1] can also be represented with the canonical SMILES NC(=O)c1ccc(C#CCNC(=O)c2ccc3ccccc3c2)cc1."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_5-4.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the canonical SMILES CCc1cc(C2CCCN2C(=O)NCCCOC)on1?\nAssistant: Sure, this molecule has a SELFIES of [C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C]."} {"text":"User: Can you generate the SELFIES of the molecule with the canonical SMILES CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21?\nAssistant: Yes, this molecule has a SELFIES of ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]']."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/valid_2-5.jsonl": "{"text":"User: Can you generate the canonical SMILES of the molecule with the SELFIES 
[C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C]?\nAssistant: Of course, this molecule has a canonical SMILES of COc1ccc(\/C=N\/Nc2nc(C)cs2)cc1OC."} {"text":"User: Can you create the canonical SMILES of the molecule with the SELFIES [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C]?\nAssistant: Yes, this molecule has a canonical SMILES of CCOc1ccsc1C(=O)OC."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_1-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the canonical SMILES.\ncanonical SMILES: Cc1ccc(-c2nc3c4ccccc4[nH]c(=O)n3n2)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2]"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SELFIES from the canonical SMILES.\ncanonical SMILES: Cc1c(OCCN2CCN(C)CC2)cn2ncnc(Oc3ccc(NC(=O)c4ccccc4F)cc3F)c12\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2]"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_0-4.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the canonical SMILES CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1?\nAssistant: Sure, this molecule has a 
SELFIES of [S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1]."} {"text":"User: Can you tell me the SELFIES of the molecule with the canonical SMILES c1ccc(Nc2nc(CSc3nc[nH]n3)cs2)cc1?\nAssistant: Yes, this molecule has a SELFIES of [C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_3-0.jsonl": "{"text":"The molecule with the canonical SMILES Cl.O=C(c1ccc(Br)cc1)C(CN1CCOCC1)c1ccccc1 can also be represented with the SELFIES representation [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]."} {"text":"The molecule with the canonical SMILES CCOc1ccccc1CNC(=O)Nc1ccc(OC)c(F)c1 can also be represented with the SELFIES representation [C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/test_2-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the SELFIES.\nSELFIES: 
[C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CC1(C)CCC(CN2CCN(c3ccc(C(=O)NS(=O)(=O)c4ccc(NC5(CO)CCOCC5)c([N+](=O)[O-])c4)c(Oc4ccc5[nH]ccc5c4)c3)CC2)=C(c2ccc(Cl)cc2)C1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the SELFIES.\nSELFIES: [C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CN(C)CCN1C(=O)c2oc3ccc(Cl)cc3c(=O)c2C1c1cccc(Br)c1"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_4-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CCNC(=O)C1CCN(C(=O)c2nonc2N)CC1 can also be represented with the SELFIES [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C]."} {"text":"The molecule with the canonical SMILES representation of CN(C)c1nc(N)nc(COC(=O)c2ccc(Cl)cc2N)n1 can also be represented with the SELFIES representation 
[C][N][Branch1][C][C][C][=N][C][Branch1][C][N][=N][C][Branch2][Ring1][Ring1][C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=N][Ring2][Ring1][Ring1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_3-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SELFIES from the canonical SMILES.\nMolecule canonical SMILES: Cl.O=C(c1ccc(Br)cc1)C(CN1CCOCC1)c1ccccc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the canonical SMILES.\ncanonical SMILES: CCOc1ccccc1CNC(=O)Nc1ccc(OC)c(F)c1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2]"}", "/scratch/micpie/export/mol_repr_transl_selfies_canonical/train_3-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the canonical SMILES from the SELFIES.\nMolecule SELFIES: [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: Cl.O=C(c1ccc(Br)cc1)C(CN1CCOCC1)c1ccccc1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the 
SELFIES.\nSELFIES: [C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCOc1ccccc1CNC(=O)Nc1ccc(OC)c(F)c1"}", "/scratch/micpie/export/herg_blockers/test_0-10.jsonl": "{"text":"User: I'm searching for the SELFIES of a molecule that is active against hERG (<10uM)?\nAssistant: This is a molecule that is a human ether-à-go-go related gene (hERG) blocking compound: [O][=C][N][C][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][O][C][C][NH1+1][Branch2][Ring1][=N][C][C][C][C][Branch1][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][C][Ring2][Ring1][Branch2]"} {"text":"User: I'm looking for the SMILES of a molecule that blocks the human ether-à-go-go related gene (hERG)?\nAssistant: This is a molecule that is a hERG blocking compound: CC[C@@H](c1cccc(O)c1)[C@@H](C)CN(C)C"}", "/scratch/micpie/export/herg_blockers/valid_0-8.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C22H24N4O5S\/c1-32(28,29)25-16-3-5-21-17(13-16)20(27)14-22(30-21)7-10-26(11-8-22)9-6-15-2-4-18-19(12-15)24-31-23-18\/h2-5,12-13,25H,6-11,14H2,1H3 a hERG blocker?\nAssistant: Yes, it is a hERG blocker."} {"text":"User: Is the molecule with the canonical SMILES C[NH+](C)CCCN1c2ccccc2CCc2ccccc21 a human ether-à-go-go related gene (hERG) blocker?\nAssistant: Yes, it is a human ether-à-go-go related gene (hERG) blocker."}", "/scratch/micpie/export/herg_blockers/train_0-8.jsonl": "{"text":"User: Is the molecule with the SELFIES [O][C][=C][C][=C][Branch2][Ring2][=Branch2][C][C][N][C][C][C][Branch2][Ring1][O][N][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][C][Ring2][Ring1][Branch2][C][=C][Ring2][Ring1][S] a hERG blocking 
compound (<10uM)?\nAssistant: Yes, it is a hERG blocking compound (<10uM)."} {"text":"User: Is the molecule with the SELFIES [F][C][=C][C][=C][C][C][Branch1][#Branch2][C@@H1][C][NH2+1][C][C][C@@H1][Ring1][=Branch1][F][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][Ring2][Ring1][=Branch1][=Ring2][Ring1][C] a hERG blocker?\nAssistant: Yes, it is a hERG blocker."}", "/scratch/micpie/export/herg_blockers/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a hERG blocking compound (<10uM).\nMolecule DeepSMILES: O=CNCNcccccc6))))))C5CC[NH+]CCCCccccF)cc6))))))ccccF)cc6))))))))))CC6\nConstraint: Answer the question in a full sentence.\nResult: This molecule is a hERG blocking compound (<10uM)."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a hERG blocking compound.\nMolecule SELFIES: [C][C][C@@H1][Branch1][N][C][=C][C][=C][C][Branch1][C][O][=C][Ring1][#Branch1][C@@H1][Branch1][C][C][C][N][Branch1][C][C][C]\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is a hERG blocking compound."}", "/scratch/micpie/export/herg_blockers/valid_0-9.jsonl": "{"text":"User: Can you create the canonical SMILES of a molecule that is a human ether-à-go-go related gene (hERG) blocking compound?\nAssistant: Yes, here you go: CS(=O)(=O)Nc1ccc2c(c1)C(=O)CC1(CCN(CCc3ccc4nonc4c3)CC1)O2"} {"text":"User: Can you create the canonical SMILES of a molecule that is a human ether-à-go-go related gene (hERG) blocker?\nAssistant: Of course, here you go: C[NH+](C)CCCN1c2ccccc2CCc2ccccc21"}", "/scratch/micpie/export/herg_blockers/test_0-1.jsonl": "{"text":"Based on the SMILES representation O=C1NCN(c2ccccc2)C12CC[NH+](CCCC(c1ccc(F)cc1)c1ccc(F)cc1)CC2, the molecule is a hERG blocker."} {"text":"Based on the canonical SMILES representation CC[C@@H](c1cccc(O)c1)[C@@H](C)CN(C)C, the molecule is a human ether-à-go-go related gene (hERG) 
blocker."}", "/scratch/micpie/export/herg_blockers/valid_0-0.jsonl": "{"text":"The molecule with the SMILES CS(=O)(=O)Nc1ccc2c(c1)C(=O)CC1(CCN(CCc3ccc4nonc4c3)CC1)O2 is a human ether-à-go-go related gene (hERG) blocker."} {"text":"The molecule with the SMILES C[NH+](C)CCCN1c2ccccc2CCc2ccccc21 is a human ether-à-go-go related gene (hERG) blocking compound."}", "/scratch/micpie/export/herg_blockers/test_0-2.jsonl": "{"text":"The InChI InChI=1S\/C29H31F2N3O\/c30-24-12-8-22(9-13-24)27(23-10-14-25(31)15-11-23)7-4-18-33-19-16-29(17-20-33)28(35)32-21-34(29)26-5-2-1-3-6-26\/h1-3,5-6,8-15,27H,4,7,16-21H2,(H,32,35)\/p+1 represents a molecule that is a hERG blocking compound (<10uM)."} {"text":"The InChI InChI=1S\/C14H23NO\/c1-5-14(11(2)10-15(3)4)12-7-6-8-13(16)9-12\/h6-9,11,14,16H,5,10H2,1-4H3\/t11-,14+\/m0\/s1 represents a molecule that is a hERG blocking compound."}", "/scratch/micpie/export/herg_blockers/valid_0-10.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that is active against hERG (<10uM)?\nAssistant: This is a molecule that is a hERG blocking compound (<10uM): CS(=O)(=O)Nc1ccc2c(c1)C(=O)CC1(CCN(CCc3ccc4nonc4c3)CC1)O2"} {"text":"User: I'm looking for the DeepSMILES of a molecule that is active against hERG (<10uM)?\nAssistant: This is a molecule that is a hERG blocker: C[NH+]C)CCCNcccccc6CCcccccc6%15"}", "/scratch/micpie/export/herg_blockers/train_0-6.jsonl": "{"text":"Task: Please create a molecule SMILES based on the text description below.\nDescription: A molecule that is a hERG blocking compound.\nResult: Oc1ccc(CCN2CCC(Nc3nc4ccccc4n3Cc3ccc(F)cc3)CC2)cc1"} {"text":"Task: Please generate a SELFIES based on the description below.\nDescription: A molecule that is a hERG blocker.\nResult: [F][C][=C][C][=C][C][C][Branch1][#Branch2][C@@H1][C][NH2+1][C][C][C@@H1][Ring1][=Branch1][F][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][Ring2][Ring1][=Branch1][=Ring2][Ring1][C]"}", 
"/scratch/micpie/export/herg_blockers/valid_0-6.jsonl": "{"text":"Task: Please generate a molecule canonical SMILES based on the text description below.\nDescription: A molecule that is a human ether-à-go-go related gene (hERG) blocker.\nResult: CS(=O)(=O)Nc1ccc2c(c1)C(=O)CC1(CCN(CCc3ccc4nonc4c3)CC1)O2"} {"text":"Task: Please generate a molecule SELFIES based on the description below.\nDescription: A molecule that is a human ether-à-go-go related gene (hERG) blocking compound.\nResult: [C][NH1+1][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][#C]"}", "/scratch/micpie/export/herg_blockers/test_0-9.jsonl": "{"text":"User: Can you generate the SMILES of a molecule that is a hERG blocker?\nAssistant: Sure, here you go: O=C1NCN(c2ccccc2)C12CC[NH+](CCCC(c1ccc(F)cc1)c1ccc(F)cc1)CC2"} {"text":"User: Can you give me the canonical SMILES of a molecule that is a hERG blocking compound?\nAssistant: Yes, here you go: CC[C@@H](c1cccc(O)c1)[C@@H](C)CN(C)C"}", "/scratch/micpie/export/herg_blockers/test_0-0.jsonl": "{"text":"The molecule with the DeepSMILES O=CNCNcccccc6))))))C5CC[NH+]CCCCccccF)cc6))))))ccccF)cc6))))))))))CC6 is a hERG blocking compound (<10uM)."} {"text":"The molecule with the SMILES CC[C@@H](c1cccc(O)c1)[C@@H](C)CN(C)C is a hERG blocking compound."}", "/scratch/micpie/export/herg_blockers/valid_0-7.jsonl": "{"text":"User: Can you derive if the molecule with the DeepSMILES CS=O)=O)Ncccccc6)C=O)CCCCNCCccccnonc5c9)))))))))))CC6)))))O6 is a hERG blocking compound?\nAssistant: Yes, this molecule is a hERG blocking compound."} {"text":"User: Can you derive if the molecule with the SELFIES [C][NH1+1][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][#C] is a human ether-à-go-go related gene (hERG) blocking compound?\nAssistant: Yes, this molecule is a human ether-à-go-go related gene (hERG) blocking compound."}", 
"/scratch/micpie/export/herg_blockers/test_0-3.jsonl": "{"text":"The SMILES O=C1NCN(c2ccccc2)C12CC[NH+](CCCC(c1ccc(F)cc1)c1ccc(F)cc1)CC2 is a hERG blocking compound (<10uM)."} {"text":"The molecule DeepSMILES CC[C@@H]cccccO)c6))))))[C@@H]C)CNC)C is a hERG blocker."}", "/scratch/micpie/export/herg_blockers/valid_0-11.jsonl": "{"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should be a hERG blocking compound (<10uM).\nAssistant: Ok, this canonical SMILES is a hERG blocking compound (<10uM): CS(=O)(=O)Nc1ccc2c(c1)C(=O)CC1(CCN(CCc3ccc4nonc4c3)CC1)O2"} {"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should be a hERG blocking compound (<10uM).\nAssistant: Got it, this InChI is a hERG blocking compound (<10uM): InChI=1S\/C19H24N2\/c1-20(2)14-7-15-21-18-10-5-3-8-16(18)12-13-17-9-4-6-11-19(17)21\/h3-6,8-11H,7,12-15H2,1-2H3\/p+1"}", "/scratch/micpie/export/herg_blockers/train_0-0.jsonl": "{"text":"The molecule with the SELFIES [O][C][=C][C][=C][Branch2][Ring2][=Branch2][C][C][N][C][C][C][Branch2][Ring1][O][N][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][C][Ring2][Ring1][Branch2][C][=C][Ring2][Ring1][S] is a hERG blocking compound (<10uM)."} {"text":"The molecule with the SMILES Fc1cccc2c([C@@H]3C[NH2+]CC[C@@H]3F)c(-c3ccccc3)[nH]c12 is a human ether-à-go-go related gene (hERG) blocker."}", "/scratch/micpie/export/herg_blockers/test_0-6.jsonl": "{"text":"Task: Please give me a canonical SMILES based on the description below.\nDescription: A molecule that is a hERG blocker.\nResult: O=C1NCN(c2ccccc2)C12CC[NH+](CCCC(c1ccc(F)cc1)c1ccc(F)cc1)CC2"} {"text":"Task: Please generate a SELFIES based on the description 
below.\nDescription: A molecule that is a hERG blocking compound (<10uM).\nResult: [C][C][C@@H1][Branch1][N][C][=C][C][=C][C][Branch1][C][O][=C][Ring1][#Branch1][C@@H1][Branch1][C][C][C][N][Branch1][C][C][C]"}", "/scratch/micpie/export/herg_blockers/train_0-10.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that blocks hERG?\nAssistant: This is a molecule that is a hERG blocking compound (<10uM): Oc1ccc(CCN2CCC(Nc3nc4ccccc4n3Cc3ccc(F)cc3)CC2)cc1"} {"text":"User: I'm searching for the SELFIES of a molecule that blocks the human ether-à-go-go related gene (hERG)?\nAssistant: This is a molecule that is a hERG blocking compound: [F][C][=C][C][=C][C][C][Branch1][#Branch2][C@@H1][C][NH2+1][C][C][C@@H1][Ring1][=Branch1][F][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][Ring2][Ring1][=Branch1][=Ring2][Ring1][C]"}", "/scratch/micpie/export/herg_blockers/train_0-3.jsonl": "{"text":"The molecule DeepSMILES OccccCCNCCCNcncccccc6n9CccccF)cc6)))))))))))))))))CC6))))))))cc6 is a hERG blocker."} {"text":"The molecule canonical SMILES Fc1cccc2c([C@@H]3C[NH2+]CC[C@@H]3F)c(-c3ccccc3)[nH]c12 is a hERG blocking compound."}", "/scratch/micpie/export/herg_blockers/train_0-12.jsonl": "{"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should be a hERG blocker.\nAssistant: Understood, this SMILES is a hERG blocker: Oc1ccc(CCN2CCC(Nc3nc4ccccc4n3Cc3ccc(F)cc3)CC2)cc1"} {"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should be a human ether-à-go-go related gene (hERG) blocker.\nAssistant: Ok, this SMILES is a human ether-à-go-go related gene (hERG) blocker: Fc1cccc2c([C@@H]3C[NH2+]CC[C@@H]3F)c(-c3ccccc3)[nH]c12"}", "/scratch/micpie/export/herg_blockers/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are a human ether-à-go-go related gene (hERG) blocking compound?\nConstraint: You must select none, one or more options from a, b, or c without using any additional words.\nOptions:\n(a) O=CNCNcccccc6))))))C5CC[NH+]CCCCccccF)cc6))))))ccccF)cc6))))))))))CC6\n(b) O=CNCCCNCcccccc6)OCO5)))))))))CC6)))))))ccc=O)cccF)cCl)cc6o%10\n(c) CC=NC=[N+]CCCC6))))C=O)C6CC[NH+]CCCcnocccF)ccc96)))))))))CC6\nAnswer: a, b, c"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are a hERG blocking compound?\nConstraint: You must select none, one or more options from a or b without using any other words.\nOptions:\na. CC[C@@H](c1cccc(O)c1)[C@@H](C)CN(C)C\nb. 
C1CCC2(C1)C1C[NH+](CC3CC3)CC2C[NH+](CC2CC2)C1\nAnswer: a, b"}", "/scratch/micpie/export/herg_blockers/valid_0-2.jsonl": "{"text":"The InChI InChI=1S\/C22H24N4O5S\/c1-32(28,29)25-16-3-5-21-17(13-16)20(27)14-22(30-21)7-10-26(11-8-22)9-6-15-2-4-18-19(12-15)24-31-23-18\/h2-5,12-13,25H,6-11,14H2,1H3 is from a molecule that is a hERG blocking compound."} {"text":"The DeepSMILES C[NH+]C)CCCNcccccc6CCcccccc6%15 is from a molecule that is a human ether-à-go-go related gene (hERG) blocking compound."}", "/scratch/micpie/export/herg_blockers/valid_0-1.jsonl": "{"text":"Based on the SMILES representation CS(=O)(=O)Nc1ccc2c(c1)C(=O)CC1(CCN(CCc3ccc4nonc4c3)CC1)O2, the molecule is a hERG blocking compound."} {"text":"Based on the DeepSMILES representation C[NH+]C)CCCNcccccc6CCcccccc6%15, the molecule is a hERG blocker."}", "/scratch/micpie/export/herg_blockers/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are a hERG blocker?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any other words.\nOptions:\nA [O][=C][N][C][C][N][Ring1][Branch1][C][C][NH1+1][C][C][C][Branch2][Ring1][O][C][=C][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][S][Ring1][#Branch1][C][C][Ring2][Ring1][=Branch1]\nB [C][C][=Branch1][C][=O][N][C][=C][C][=C][Branch2][Ring2][#Branch1][C][N][C][C][C][Branch2][Ring1][#Branch2][N][C][=Branch1][C][=O][C][=C][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][O][Ring1][N][C][C][Ring2][Ring1][Branch1][C][=C][Ring2][Ring1][N]\nC [C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][C][Branch2][Ring1][=Branch2][C][C][N][Branch1][S][C][C][C][C][=C][C][=N][O][N][=C][Ring1][Branch1][C][=Ring1][=Branch2][C][C][Ring1][P][O][Ring2][Ring1][Branch2]\nD [C][#C][C][O][C][C][C][C][=Branch1][C][=O][O]\nE 
[C][O][C][=C][C][=C][Branch2][Ring2][Ring1][C][C][NH2+1][C][C][C][C][Branch1][Ring1][C][#N][Branch1][P][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][Ring1][O][C][=C][Ring1][#Branch2][C][Branch1][C][C][C][C][=C][Ring2][Ring1][N][O][C]\nAnswer: C, E"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are a human ether-à-go-go related gene (hERG) blocking compound?\nConstraint: You must select none, one or more options from a, b, c, or d without using any other words.\nOptions:\na) C[NH+]C)CCCNcccccc6CCcccccc6%15\nb) CCCCCCCCC=O)NCccccC[C@@H]O)CO))))cOC))c6\nc) COcccccc6O[C@H]C=O)CC[C@@]O)[C@@H]C%11)[NH+]C)CC[C@]%13%106\nd) CCCCC))ccn-ccccF)cc6))))))ccccCl)cc96\nAnswer: a, b"}", "/scratch/micpie/export/herg_blockers/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a hERG blocker.\nDeepSMILES: CS=O)=O)Ncccccc6)C=O)CCCCNCCccccnonc5c9)))))))))))CC6)))))O6\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is hERG blocker."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a hERG blocker.\nMolecule SELFIES: [C][NH1+1][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][#C]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is hERG blocker."}", "/scratch/micpie/export/herg_blockers/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against the human ether-à-go-go related gene (hERG).\ncanonical SMILES: CS(=O)(=O)Nc1ccc2c(c1)C(=O)CC1(CCN(CCc3ccc4nonc4c3)CC1)O2\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against the human ether-à-go-go 
related gene (hERG).\nSMILES: C[NH+](C)CCCN1c2ccccc2CCc2ccccc21\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"}", "/scratch/micpie/export/herg_blockers/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a human ether-à-go-go related gene (hERG) blocking compound.\ncanonical SMILES: Oc1ccc(CCN2CCC(Nc3nc4ccccc4n3Cc3ccc(F)cc3)CC2)cc1\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is human ether-à-go-go related gene (hERG) blocking compound."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a hERG blocking compound (<10uM).\nMolecule DeepSMILES: Fcccccc[C@@H]C[NH2+]CC[C@@H]6F)))))))c-cccccc6))))))[nH]c95\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is hERG blocking compound (<10uM)."}", "/scratch/micpie/export/herg_blockers/valid_0-12.jsonl": "{"text":"User: I want to create a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should be a hERG blocker.\nAssistant: Ok, this canonical SMILES is a hERG blocker: CS(=O)(=O)Nc1ccc2c(c1)C(=O)CC1(CCN(CCc3ccc4nonc4c3)CC1)O2"} {"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should be a hERG blocking compound.\nAssistant: Ok, this SMILES is a hERG blocking compound: C[NH+](C)CCCN1c2ccccc2CCc2ccccc21"}", "/scratch/micpie/export/herg_blockers/train_0-2.jsonl": "{"text":"The SMILES Oc1ccc(CCN2CCC(Nc3nc4ccccc4n3Cc3ccc(F)cc3)CC2)cc1 is from a molecule that is a hERG blocking compound (<10uM)."} {"text":"The DeepSMILES Fcccccc[C@@H]C[NH2+]CC[C@@H]6F)))))))c-cccccc6))))))[nH]c95 is from a molecule that is a hERG blocking compound (<10uM)."}", "/scratch/micpie/export/herg_blockers/test_0-11.jsonl": "{"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should be a hERG blocking compound.\nAssistant: Ok, here you go, this InChI is a hERG blocking compound: InChI=1S\/C29H31F2N3O\/c30-24-12-8-22(9-13-24)27(23-10-14-25(31)15-11-23)7-4-18-33-19-16-29(17-20-33)28(35)32-21-34(29)26-5-2-1-3-6-26\/h1-3,5-6,8-15,27H,4,7,16-21H2,(H,32,35)\/p+1"} {"text":"User: I want to create a InChI.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should be a hERG blocking compound (<10uM).\nAssistant: Got it, here you go, this InChI is a hERG blocking compound (<10uM): InChI=1S\/C14H23NO\/c1-5-14(11(2)10-15(3)4)12-7-6-8-13(16)9-12\/h6-9,11,14,16H,5,10H2,1-4H3\/t11-,14+\/m0\/s1"}", "/scratch/micpie/export/herg_blockers/train_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the SMILES Oc1ccc(CCN2CCC(Nc3nc4ccccc4n3Cc3ccc(F)cc3)CC2)cc1 is a human ether-à-go-go related gene (hERG) blocker?\nAssistant: Yes, this molecule is a human ether-à-go-go related gene (hERG) blocker."} {"text":"User: Can you derive if the molecule with the SMILES Fc1cccc2c([C@@H]3C[NH2+]CC[C@@H]3F)c(-c3ccccc3)[nH]c12 is a human ether-à-go-go related gene (hERG) blocking compound?\nAssistant: Yes, this molecule is a human ether-à-go-go related gene (hERG) blocking compound."}", "/scratch/micpie/export/herg_blockers/train_0-11.jsonl": "{"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should be a hERG blocker.\nAssistant: Got it, this canonical SMILES is a hERG blocker: Oc1ccc(CCN2CCC(Nc3nc4ccccc4n3Cc3ccc(F)cc3)CC2)cc1"} {"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should be a hERG blocker.\nAssistant: Ok, here you go, this DeepSMILES is a hERG blocker: Fcccccc[C@@H]C[NH2+]CC[C@@H]6F)))))))c-cccccc6))))))[nH]c95"}", "/scratch/micpie/export/herg_blockers/train_0-1.jsonl": "{"text":"Based on the SELFIES [O][C][=C][C][=C][Branch2][Ring2][=Branch2][C][C][N][C][C][C][Branch2][Ring1][O][N][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][C][Ring2][Ring1][Branch2][C][=C][Ring2][Ring1][S], the molecule is a hERG blocker."} {"text":"Based on the DeepSMILES representation Fcccccc[C@@H]C[NH2+]CC[C@@H]6F)))))))c-cccccc6))))))[nH]c95, the molecule is a human ether-à-go-go related gene (hERG) blocking compound."}", "/scratch/micpie/export/herg_blockers/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are a human ether-à-go-go related gene (hERG) blocking compound?\nConstraint: You must select none, one or more options from A or B without using any additional words.\nOptions:\nA. OccccCCNCCCNcncccccc6n9CccccF)cc6)))))))))))))))))CC6))))))))cc6\nB. 
CScncccc6C=O)NCC[C@@H][NH2+]Cccncn5CccccC#N))cc6))))))))))))))C5=O\nAnswer: A, B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are a hERG blocking compound (<10uM)?\nConstraint: You must select none, one or more options from 1 or 2 without using any additional words.\nOptions:\n(1) Fcccccc[C@@H]C[NH2+]CC[C@@H]6F)))))))c-cccccc6))))))[nH]c95\n(2) CCC)C)C=O)NCCcccF)ccc6F)))))))=C[C@H]5cccccO)c6\nAnswer: 1, 2"}", "/scratch/micpie/export/herg_blockers/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that blocks the human ether-à-go-go related gene (hERG).\nSELFIES: [O][C][=C][C][=C][Branch2][Ring2][=Branch2][C][C][N][C][C][C][Branch2][Ring1][O][N][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][C][Ring2][Ring1][Branch2][C][=C][Ring2][Ring1][S]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against hERG (<10uM).\nMolecule DeepSMILES: Fcccccc[C@@H]C[NH2+]CC[C@@H]6F)))))))c-cccccc6))))))[nH]c95\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"}", "/scratch/micpie/export/herg_blockers/test_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the canonical SMILES O=C1NCN(c2ccccc2)C12CC[NH+](CCCC(c1ccc(F)cc1)c1ccc(F)cc1)CC2 is a human ether-à-go-go related gene (hERG) blocking compound?\nAssistant: Yes, this molecule is a human ether-à-go-go related gene (hERG) blocking compound."} {"text":"User: Can you tell me if the molecule with the InChI InChI=1S\/C14H23NO\/c1-5-14(11(2)10-15(3)4)12-7-6-8-13(16)9-12\/h6-9,11,14,16H,5,10H2,1-4H3\/t11-,14+\/m0\/s1 is a human ether-à-go-go related gene (hERG) blocker?\nAssistant: 
Yes, this molecule is a human ether-à-go-go related gene (hERG) blocker."}", "/scratch/micpie/export/herg_blockers/train_0-9.jsonl": "{"text":"User: Can you generate the DeepSMILES of a molecule that is a human ether-à-go-go related gene (hERG) blocker?\nAssistant: Of course, here you go: OccccCCNCCCNcncccccc6n9CccccF)cc6)))))))))))))))))CC6))))))))cc6"} {"text":"User: Can you give me the DeepSMILES of a molecule that is a human ether-à-go-go related gene (hERG) blocker?\nAssistant: Sure, here you go: Fcccccc[C@@H]C[NH2+]CC[C@@H]6F)))))))c-cccccc6))))))[nH]c95"}", "/scratch/micpie/export/herg_blockers/valid_0-3.jsonl": "{"text":"The SELFIES [C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][C][Branch2][Ring1][=Branch2][C][C][N][Branch1][S][C][C][C][C][=C][C][=N][O][N][=C][Ring1][Branch1][C][=Ring1][=Branch2][C][C][Ring1][P][O][Ring2][Ring1][Branch2] is a hERG blocking compound (<10uM)."} {"text":"The molecule SMILES C[NH+](C)CCCN1c2ccccc2CCc2ccccc21 is a hERG blocking compound."}", "/scratch/micpie/export/herg_blockers/test_0-8.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C29H31F2N3O\/c30-24-12-8-22(9-13-24)27(23-10-14-25(31)15-11-23)7-4-18-33-19-16-29(17-20-33)28(35)32-21-34(29)26-5-2-1-3-6-26\/h1-3,5-6,8-15,27H,4,7,16-21H2,(H,32,35)\/p+1 a human ether-à-go-go related gene (hERG) blocking compound?\nAssistant: Yes, it is a human ether-à-go-go related gene (hERG) blocking compound."} {"text":"User: Is the molecule with the SMILES CC[C@@H](c1cccc(O)c1)[C@@H](C)CN(C)C a hERG blocking compound (<10uM)?\nAssistant: Yes, it is a hERG blocking compound (<10uM)."}", "/scratch/micpie/export/herg_blockers/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against the human ether-à-go-go related gene (hERG).\ncanonical SMILES: O=C1NCN(c2ccccc2)C12CC[NH+](CCCC(c1ccc(F)cc1)c1ccc(F)cc1)CC2\nConstraint: Even if you 
are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against hERG (<10uM).\nInChI: InChI=1S\/C14H23NO\/c1-5-14(11(2)10-15(3)4)12-7-6-8-13(16)9-12\/h6-9,11,14,16H,5,10H2,1-4H3\/t11-,14+\/m0\/s1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"}", "/scratch/micpie/export/herg_blockers/test_0-12.jsonl": "{"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should be a hERG blocking compound (<10uM).\nAssistant: Understood, this DeepSMILES is a hERG blocking compound (<10uM): O=CNCNcccccc6))))))C5CC[NH+]CCCCccccF)cc6))))))ccccF)cc6))))))))))CC6"} {"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should be a human ether-à-go-go related gene (hERG) blocking compound.\nAssistant: Understood, this SMILES is a human ether-à-go-go related gene (hERG) blocking compound: CC[C@@H](c1cccc(O)c1)[C@@H](C)CN(C)C"}", "/scratch/micpie/export/MUV_466/valid_0-0.jsonl": "{"text":"The chemical with the DeepSMILES representation of Cnccnc5SCC=O)NccccOcccccc6)))))))cc6 is not an agonist of the S1P1 receptor."} {"text":"The chemical with the DeepSMILES COcccccc6-ccC)nncNCCCCC6))OCCO5))))))))ccC)nc96 is not an agonist of the S1P1 receptor."}", "/scratch/micpie/export/MUV_466/test_0-0.jsonl": "{"text":"The molecular species with the InChI representation of InChI=1S\/C10H12N2O3\/c11-10(15)12-8(9(13)14)6-7-4-2-1-3-5-7\/h1-5,8H,6H2,(H,13,14)(H3,11,12,15) is not an agonist of the S1P1 receptor."} {"text":"The chemical compound with the DeepSMILES representation of CcccccOCC=O)Ncccccc6NCCNC=O)CC)C)))CC6))))))))))))))))c6C is not an agonist of the S1P1 receptor."}", 
"/scratch/micpie/export/MUV_466/train_0-0.jsonl": "{"text":"The compound with the DeepSMILES representation of O=CNCcccco5)))))))ccscNcccccc6Cl))))))))n5 is not an agonist of the S1P1 receptor."} {"text":"The chemical with the canonical SMILES CCOC(=O)C(C)n1cnc2c(cnn2-c2ccc(Cl)cc2)c1=O is not an agonist of the S1P1 receptor."}", "/scratch/micpie/export/orbnet_denali/valid_1-6.jsonl": "{"text":"User: I would like to know the GFN1-xTB total energy of the chemical with SELFIES [H].[H].[H][C][O][C][C][Branch1][C][H][C][Branch1][#Branch2][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][#Branch2][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][C][H][C][Ring2][Ring1][C][C][O][C][C][Branch1][C][H][C][Branch1][#Branch2][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][C][H][C][Branch1][Ring1][O][H][C][Ring1][#C][C][Branch1][C][O][C][Ring2][Ring1][Ring2][H].\nAssistant: Do you have the XYZ file file of a conformer of the chemical?\nUser: Here it is: [XYZ]\n44\nH18 C19 O7\nC -6.481 -0.583 -2.661\nO -5.062 -0.756 -2.484\nC -4.467 -0.303 -1.382\nC -5.139 0.379 -0.381\nC -4.411 0.726 0.787\nO -5.076 1.308 1.766\nC -3.045 0.598 0.851\nC -2.252 1.109 1.982\nO -2.736 1.714 2.918\nC -0.837 0.838 1.844\nC -0.294 0.172 0.813\nC 1.134 -0.141 0.610\nC 1.514 -1.332 -0.067\nC 2.827 -1.677 -0.276\nO 3.038 -2.879 -0.878\nC 4.364 -3.212 -1.413\nC 3.874 -0.699 0.051\nO 5.193 -0.926 -0.246\nC 6.215 0.040 0.114\nC 3.496 0.469 0.707\nC 2.157 0.722 0.985\nO 1.761 1.876 1.664\nC 1.809 3.102 0.922\nO -1.062 -0.357 -0.116\nC -2.383 -0.159 -0.139\nC -3.095 -0.594 -1.247\nH -6.753 -1.026 -3.600\nH -6.783 0.451 -2.593\nH -7.069 -1.133 -1.887\nH -6.186 0.560 -0.303\nH -4.415 1.560 2.496\nH -0.195 1.205 2.616\nH 0.770 -2.024 -0.470\nH 4.247 -4.144 -1.964\nH 4.753 -2.368 -2.117\nH 5.080 -3.422 -0.580\nH 6.143 0.312 1.210\nH 6.193 0.978 -0.511\nH 7.171 -0.468 0.027\nH 4.205 1.248 0.958\nH 2.682 3.059 0.118\nH 0.784 3.293 0.401\nH 2.078 4.044 1.596\nH -2.676 -1.184 -2.043[\\XYZ]\nAssistant: The 
GFN1-xTB total energy of the chemical is -78.69620 Hartree."} {"text":"User: I must know the GFN1-xTB total energy of the compound with SELFIES [H][C][C][Branch1][C][H][C][Branch1][C][Cl][C][Branch1][C][H][C][Branch2][Ring1][N][N][Branch1][C][H][C][O][K][O][C][Branch1][C][O][C][Branch1][C][H][Branch1][C][H][S][C][Ring1][O][Branch1][C][H][H][C][Ring2][Ring1][Branch2][H].\nAssistant: Do you have the XYZ file file of a conformer of the compound?\nUser: I do: [XYZ]\n26\nK1 H9 C10 S1 N1 Cl1 O3\nO 3.248 0.034 -0.392\nC 2.472 0.812 0.211\nO 1.284 1.038 0.016\nC 3.145 1.611 1.415\nS 3.895 3.156 0.828\nC 5.431 2.612 0.009\nC 6.455 2.333 1.077\nO 6.579 1.193 1.529\nN 7.215 3.326 1.597\nC 7.349 4.677 1.186\nC 6.244 5.492 0.963\nC 6.433 6.819 0.600\nC 7.711 7.343 0.473\nC 8.808 6.529 0.719\nCl 10.403 7.156 0.588\nC 8.636 5.199 1.070\nK 5.167 -0.455 0.690\nH 3.886 0.990 1.957\nH 2.368 1.942 2.108\nH 5.732 3.411 -0.664\nH 5.254 1.688 -0.570\nH 7.885 2.989 2.283\nH 5.239 5.112 1.101\nH 5.572 7.452 0.419\nH 7.860 8.381 0.209\nH 9.497 4.572 1.252[\\XYZ]\nAssistant: The GFN1-xTB total energy of the compound is -49.44017 Hartree."}", "/scratch/micpie/export/orbnet_denali/valid_1-4.jsonl": "{"text":"Task: Return the total energy of a chemical structure computed at the GFN1-xTB level of theory.\nDescription: The chemical structure has the XYZ file [XYZ]\n44\nH18 C19 O7\nC -6.481 -0.583 -2.661\nO -5.062 -0.756 -2.484\nC -4.467 -0.303 -1.382\nC -5.139 0.379 -0.381\nC -4.411 0.726 0.787\nO -5.076 1.308 1.766\nC -3.045 0.598 0.851\nC -2.252 1.109 1.982\nO -2.736 1.714 2.918\nC -0.837 0.838 1.844\nC -0.294 0.172 0.813\nC 1.134 -0.141 0.610\nC 1.514 -1.332 -0.067\nC 2.827 -1.677 -0.276\nO 3.038 -2.879 -0.878\nC 4.364 -3.212 -1.413\nC 3.874 -0.699 0.051\nO 5.193 -0.926 -0.246\nC 6.215 0.040 0.114\nC 3.496 0.469 0.707\nC 2.157 0.722 0.985\nO 1.761 1.876 1.664\nC 1.809 3.102 0.922\nO -1.062 -0.357 -0.116\nC -2.383 -0.159 -0.139\nC -3.095 -0.594 -1.247\nH -6.753 -1.026 -3.600\nH -6.783 0.451 
-2.593\nH -7.069 -1.133 -1.887\nH -6.186 0.560 -0.303\nH -4.415 1.560 2.496\nH -0.195 1.205 2.616\nH 0.770 -2.024 -0.470\nH 4.247 -4.144 -1.964\nH 4.753 -2.368 -2.117\nH 5.080 -3.422 -0.580\nH 6.143 0.312 1.210\nH 6.193 0.978 -0.511\nH 7.171 -0.468 0.027\nH 4.205 1.248 0.958\nH 2.682 3.059 0.118\nH 0.784 3.293 0.401\nH 2.078 4.044 1.596\nH -2.676 -1.184 -2.043[\\XYZ].\nAnswer: -78.69620 Hartree"} {"text":"Task: Return the total energy of a chemical structure computed at the GFN1-xTB level of theory.\nDescription: The chemical structure has the XYZ file [XYZ]\n26\nK1 H9 C10 S1 N1 Cl1 O3\nO 3.248 0.034 -0.392\nC 2.472 0.812 0.211\nO 1.284 1.038 0.016\nC 3.145 1.611 1.415\nS 3.895 3.156 0.828\nC 5.431 2.612 0.009\nC 6.455 2.333 1.077\nO 6.579 1.193 1.529\nN 7.215 3.326 1.597\nC 7.349 4.677 1.186\nC 6.244 5.492 0.963\nC 6.433 6.819 0.600\nC 7.711 7.343 0.473\nC 8.808 6.529 0.719\nCl 10.403 7.156 0.588\nC 8.636 5.199 1.070\nK 5.167 -0.455 0.690\nH 3.886 0.990 1.957\nH 2.368 1.942 2.108\nH 5.732 3.411 -0.664\nH 5.254 1.688 -0.570\nH 7.885 2.989 2.283\nH 5.239 5.112 1.101\nH 5.572 7.452 0.419\nH 7.860 8.381 0.209\nH 9.497 4.572 1.252[\\XYZ].\nAnswer: -49.44017 Hartree"}", "/scratch/micpie/export/orbnet_denali/test_1-2.jsonl": "{"text":"Question: What is the structure of a conformer of the chemical structure with SMILES [H]OC1C([H])C(OC([H])([H])[H])C([H])C2OC(C3C([H])C(OC([H])([H])[H])C(OC([H])([H])[H])C([H])C3OC([H])([H])[H])C([H])C(O)C12?\nConstraint: Return a MOL2000 file.\nAnswer: [V2000]\n\n ChemNLP 3D\n\n 44 46 0 0 0 0 0 0 0 0999 V2000\n -6.4435 -1.2251 -2.6670 C 0 0 0 0 0 0 0 0 0 0 0 0\n -5.0323 -1.2567 -2.5576 O 0 0 0 0 0 0 0 0 0 0 0 0\n -4.4125 -0.6788 -1.5038 C 0 0 0 0 0 3 0 0 0 0 0 0\n -5.0911 -0.0231 -0.4743 C 0 0 0 0 0 3 0 0 0 0 0 0\n -4.3759 0.5454 0.5756 C 0 0 0 0 0 3 0 0 0 0 0 0\n -5.0185 1.1696 1.5567 O 0 0 0 0 0 0 0 0 0 0 0 0\n -2.9622 0.4607 0.6011 C 0 0 0 0 0 3 0 0 0 0 0 0\n -2.1983 1.0427 1.6944 C 0 0 0 0 0 3 0 0 0 0 0 0\n -2.7352 1.6388 2.6340 O 0 0 
0 0 0 1 0 0 0 0 0 0\n -0.7680 0.8554 1.5863 C 0 0 0 0 0 3 0 0 0 0 0 0\n -0.2098 0.2024 0.5326 C 0 0 0 0 0 3 0 0 0 0 0 0\n 1.2373 0.0021 0.3757 C 0 0 0 0 0 3 0 0 0 0 0 0\n 1.7812 -1.1414 -0.2371 C 0 0 0 0 0 3 0 0 0 0 0 0\n 3.1640 -1.2580 -0.3437 C 0 0 0 0 0 3 0 0 0 0 0 0\n 0.9276 -2.1010 -0.6597 O 0 0 0 0 0 0 0 0 0 0 0 0\n 1.4111 -3.2841 -1.2651 C 0 0 0 0 0 0 0 0 0 0 0 0\n 4.0288 -0.2748 0.1329 C 0 0 0 0 0 3 0 0 0 0 0 0\n 5.3415 -0.5430 -0.0595 O 0 0 0 0 0 0 0 0 0 0 0 0\n 6.3811 0.3261 0.3613 C 0 0 0 0 0 0 0 0 0 0 0 0\n 3.4896 0.8719 0.7517 C 0 0 0 0 0 3 0 0 0 0 0 0\n 2.1138 0.9828 0.8600 C 0 0 0 0 0 3 0 0 0 0 0 0\n 4.3493 1.8358 1.2150 O 0 0 0 0 0 0 0 0 0 0 0 0\n 3.8396 2.9817 1.8704 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.9578 -0.3042 -0.4639 O 0 0 0 0 0 0 0 0 0 0 0 0\n -2.3106 -0.2048 -0.4478 C 0 0 0 0 0 3 0 0 0 0 0 0\n -3.0184 -0.7730 -1.4972 C 0 0 0 0 0 3 0 0 0 0 0 0\n -6.6897 -1.7544 -3.5891 H 0 0 0 0 0 0 0 0 0 0 0 0\n -6.8161 -0.1963 -2.7306 H 0 0 0 0 0 0 0 0 0 0 0 0\n -6.9210 -1.7325 -1.8208 H 0 0 0 0 0 0 0 0 0 0 0 0\n -6.1691 0.0589 -0.4654 H 0 0 0 0 0 0 0 0 0 0 0 0\n -4.3163 1.4971 2.2009 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.1515 1.2336 2.3906 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5297 -3.8785 -1.5118 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.9676 -3.0659 -2.1842 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.6167 -2.1252 -0.8054 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.0503 -3.8537 -0.5801 H 0 0 0 0 0 0 0 0 0 0 0 0\n 6.3709 0.4668 1.4463 H 0 0 0 0 0 0 0 0 0 0 0 0\n 6.3164 1.2981 -0.1371 H 0 0 0 0 0 0 0 0 0 0 0 0\n 7.3134 -0.1646 0.0715 H 0 0 0 0 0 0 0 0 0 0 0 0\n 4.7082 3.5748 2.1627 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.2059 3.5787 1.2036 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6850 1.8694 1.3081 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.2703 2.7120 2.7680 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.5016 -1.2830 -2.2982 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 1 27 1 0\n 1 28 1 0\n 1 29 1 0\n 2 3 1 0\n 3 4 1 0\n 3 26 1 0\n 4 5 1 0\n 4 30 1 0\n 5 6 1 0\n 5 7 1 0\n 6 31 1 0\n 7 8 1 0\n 7 25 1 0\n 8 9 1 0\n 8 10 1 0\n 10 11 1 0\n 10 32 1 0\n 11 12 1 0\n 11 24 1 
0\n 12 13 1 0\n 12 21 1 0\n 13 14 1 0\n 13 15 1 0\n 14 17 1 0\n 14 35 1 0\n 15 16 1 0\n 16 33 1 0\n 16 34 1 0\n 16 36 1 0\n 17 18 1 0\n 17 20 1 0\n 18 19 1 0\n 19 37 1 0\n 19 38 1 0\n 19 39 1 0\n 20 21 1 0\n 20 22 1 0\n 21 42 1 0\n 22 23 1 0\n 23 40 1 0\n 23 41 1 0\n 23 43 1 0\n 24 25 1 0\n 25 26 1 0\n 26 44 1 0\nM END\n[\\V2000]"} {"text":"Question: What is the structure of a conformer of the chemical with SMILES [H]C1C([H])C(Cl)C([H])C(N([H])C(O[Li])C([H])([H])SC([H])([H])C(O)O)C1[H]?\nConstraint: Return a MOL2000 file.\nAnswer: [V2000]\n\n ChemNLP 3D\n\n 26 26 0 0 0 0 0 0 0 0999 V2000\n 4.9318 0.7044 0.3141 O 0 0 0 0 0 1 0 0 0 0 0 0\n 4.8405 1.3285 -0.7343 C 0 0 0 0 0 3 0 0 0 0 0 0\n 5.6159 2.2316 -1.1681 O 0 0 0 0 0 1 0 0 0 0 0 0\n 3.6199 0.9495 -1.6362 C 0 0 0 0 0 0 0 0 0 0 0 0\n 3.2032 2.2102 -2.8766 S 0 0 0 0 0 0 0 0 0 0 0 0\n 4.1223 1.5756 -4.2937 C 0 0 0 0 0 0 0 0 0 0 0 0\n 5.5291 2.1198 -4.5315 C 0 0 0 0 0 3 0 0 0 0 0 0\n 5.9249 1.9904 -5.7270 O 0 0 0 0 0 0 0 0 0 0 0 0\n 6.1993 2.6480 -3.5398 N 0 0 0 0 0 0 0 0 0 0 0 0\n 7.5139 3.1565 -3.5845 C 0 0 0 0 0 3 0 0 0 0 0 0\n 8.1332 3.4499 -2.3674 C 0 0 0 0 0 3 0 0 0 0 0 0\n 9.4268 3.9444 -2.3432 C 0 0 0 0 0 3 0 0 0 0 0 0\n 10.1248 4.1569 -3.5245 C 0 0 0 0 0 3 0 0 0 0 0 0\n 9.4973 3.8810 -4.7283 C 0 0 0 0 0 3 0 0 0 0 0 0\n 10.3438 4.1275 -6.2101 Cl 0 0 0 0 0 0 0 0 0 0 0 0\n 8.1984 3.4003 -4.7741 C 0 0 0 0 0 3 0 0 0 0 0 0\n 4.8782 1.1919 -6.7893 Li 0 0 0 0 0 1 0 0 0 0 0 0\n 2.7523 0.8353 -1.0015 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.8133 -0.0067 -2.1145 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.5546 1.8734 -5.2171 H 0 0 0 0 0 0 0 0 0 0 0 0\n 4.1957 0.4841 -4.2691 H 0 0 0 0 0 0 0 0 0 0 0 0\n 5.8078 2.4820 -2.4785 H 0 0 0 0 0 0 0 0 0 0 0 0\n 7.6073 3.2841 -1.4344 H 0 0 0 0 0 0 0 0 0 0 0 0\n 9.9026 4.1675 -1.3939 H 0 0 0 0 0 0 0 0 0 0 0 0\n 11.1425 4.5287 -3.5080 H 0 0 0 0 0 0 0 0 0 0 0 0\n 7.7443 3.2274 -5.7384 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 3 1 0\n 2 4 1 0\n 4 5 1 0\n 4 18 1 0\n 4 19 1 0\n 5 6 1 0\n 6 7 1 0\n 6 20 1 0\n 6 21 
1 0\n 7 8 1 0\n 7 9 1 0\n 8 17 1 0\n 9 10 1 0\n 9 22 1 0\n 10 11 1 0\n 10 16 1 0\n 11 12 1 0\n 11 23 1 0\n 12 13 1 0\n 12 24 1 0\n 13 14 1 0\n 13 25 1 0\n 14 15 1 0\n 14 16 1 0\n 16 26 1 0\nM END\n[\\V2000]"}", "/scratch/micpie/export/orbnet_denali/test_1-5.jsonl": "{"text":"Task: Return the total energy of a compound computed at the {\\omega}B97X-D3\/def2-TZVP level of theory.\nDescription: The compound has the XYZ file [XYZ]\n44\nH18 C19 O7\nC -6.444 -1.225 -2.667\nO -5.032 -1.257 -2.558\nC -4.412 -0.679 -1.504\nC -5.091 -0.023 -0.474\nC -4.376 0.545 0.576\nO -5.018 1.170 1.557\nC -2.962 0.461 0.601\nC -2.198 1.043 1.694\nO -2.735 1.639 2.634\nC -0.768 0.855 1.586\nC -0.210 0.202 0.533\nC 1.237 0.002 0.376\nC 1.781 -1.141 -0.237\nC 3.164 -1.258 -0.344\nO 0.928 -2.101 -0.660\nC 1.411 -3.284 -1.265\nC 4.029 -0.275 0.133\nO 5.341 -0.543 -0.059\nC 6.381 0.326 0.361\nC 3.490 0.872 0.752\nC 2.114 0.983 0.860\nO 4.349 1.836 1.215\nC 3.840 2.982 1.870\nO -0.958 -0.304 -0.464\nC -2.311 -0.205 -0.448\nC -3.018 -0.773 -1.497\nH -6.690 -1.754 -3.589\nH -6.816 -0.196 -2.731\nH -6.921 -1.732 -1.821\nH -6.169 0.059 -0.465\nH -4.316 1.497 2.201\nH -0.152 1.234 2.391\nH 0.530 -3.878 -1.512\nH 1.968 -3.066 -2.184\nH 3.617 -2.125 -0.805\nH 2.050 -3.854 -0.580\nH 6.371 0.467 1.446\nH 6.316 1.298 -0.137\nH 7.313 -0.165 0.072\nH 4.708 3.575 2.163\nH 3.206 3.579 1.204\nH 1.685 1.869 1.308\nH 3.270 2.712 2.768\nH -2.502 -1.283 -2.298[\\XYZ].\nAnswer: -1261.51648 Hartree"} {"text":"Task: Return the total energy of a chemical computed at the {\\omega}B97X-D3\/def2-TZVP level of theory.\nDescription: The chemical has the XYZ file [XYZ]\n26\nLi1 H9 C10 S1 N1 Cl1 O3\nO 4.932 0.704 0.314\nC 4.841 1.329 -0.734\nO 5.616 2.232 -1.168\nC 3.620 0.950 -1.636\nS 3.203 2.210 -2.877\nC 4.122 1.576 -4.294\nC 5.529 2.120 -4.531\nO 5.925 1.990 -5.727\nN 6.199 2.648 -3.540\nC 7.514 3.156 -3.584\nC 8.133 3.450 -2.367\nC 9.427 3.944 -2.343\nC 10.125 4.157 -3.524\nC 9.497 3.881 -4.728\nCl 10.344 4.127 
-6.210\nC 8.198 3.400 -4.774\nLi 4.878 1.192 -6.789\nH 2.752 0.835 -1.001\nH 3.813 -0.007 -2.114\nH 3.555 1.873 -5.217\nH 4.196 0.484 -4.269\nH 5.808 2.482 -2.478\nH 7.607 3.284 -1.434\nH 9.903 4.167 -1.394\nH 11.142 4.529 -3.508\nH 7.744 3.227 -5.738[\\XYZ].\nAnswer: -1532.97043 Hartree"}", "/scratch/micpie/export/orbnet_denali/train_1-5.jsonl": "{"text":"Task: Return the total energy of a molecule computed at the {\\omega}B97X-D3\/def2-TZVP level of theory.\nDescription: The molecule has the XYZ file [XYZ]\n44\nH18 C19 O7\nC -6.542 -1.167 -2.752\nO -5.088 -0.998 -2.662\nC -4.441 -0.523 -1.609\nC -5.113 -0.193 -0.439\nC -4.456 0.341 0.675\nO -5.062 0.788 1.771\nC -3.046 0.424 0.596\nC -2.270 0.793 1.781\nO -2.884 1.013 2.887\nC -0.767 0.911 1.585\nC -0.182 0.474 0.404\nC 1.284 0.202 0.252\nC 1.779 -1.107 -0.151\nC 3.174 -1.311 -0.258\nO 0.866 -2.147 -0.290\nC 1.390 -3.379 -0.673\nC 4.080 -0.279 0.061\nO 5.353 -0.605 -0.052\nC 6.532 -0.133 0.854\nC 3.569 1.026 0.431\nC 2.234 1.310 0.481\nO 4.421 1.977 0.840\nC 3.980 3.085 1.606\nO -0.928 0.131 -0.704\nC -2.313 -0.001 -0.611\nC -3.033 -0.518 -1.714\nH -6.692 -1.677 -3.713\nH -7.161 -0.227 -2.602\nH -6.898 -1.863 -2.014\nH -6.212 -0.192 -0.339\nH -4.311 1.050 2.378\nH -0.180 1.210 2.398\nH 0.568 -3.844 -1.348\nH 2.404 -3.406 -1.384\nH 3.595 -2.307 -0.454\nH 1.694 -4.046 0.308\nH 6.205 -0.100 1.962\nH 6.988 1.024 0.533\nH 7.342 -0.883 0.699\nH 4.810 3.631 1.927\nH 3.280 3.710 1.078\nH 1.707 2.303 0.631\nH 3.491 2.679 2.505\nH -2.446 -0.865 -2.521[\\XYZ].\nAnswer: -1261.39283 Hartree"} {"text":"Task: Return the total energy of a molecule computed at the {\\omega}B97X-D3\/def2-TZVP level of theory.\nDescription: The molecule has the XYZ file [XYZ]\n26\nK1 H9 C10 S1 N1 Cl1 O3\nO 3.679 3.606 -0.112\nC 2.706 2.996 0.432\nO 1.620 3.459 0.748\nC 2.948 1.489 0.744\nS 3.940 0.690 -0.552\nC 5.486 0.389 0.329\nC 6.311 1.592 0.739\nO 7.269 1.394 1.510\nN 6.002 2.792 0.234\nC 6.782 3.922 0.518\nC 8.183 3.867 0.486\nC 8.934 5.009 
0.745\nC 8.307 6.218 1.021\nC 6.917 6.274 1.018\nCl 6.132 7.777 1.324\nC 6.154 5.144 0.770\nK 9.103 2.257 2.626\nH 1.990 0.976 0.829\nH 3.460 1.424 1.706\nH 6.108 -0.187 -0.364\nH 5.337 -0.233 1.218\nH 4.911 3.050 -0.024\nH 8.695 2.938 0.214\nH 10.025 4.966 0.679\nH 8.887 7.113 1.213\nH 5.073 5.217 0.760[\\XYZ].\nAnswer: -2125.39705 Hartree"}", "/scratch/micpie/export/orbnet_denali/test_0-5.jsonl": "{"text":"Task: Return the total energy of a molecule computed at the {\\omega}B97X-D3\/def2-TZVP level of theory.\nDescription: The molecule has the XYZ file [XYZ]\n35\nH14 C16 I1 N2 O2\nO -2.054 -1.196 -3.088\nC -2.281 -1.177 -1.852\nO -1.345 -1.785 -0.955\nC -1.195 -1.440 0.310\nN -0.079 -2.118 0.983\nC 1.270 -1.677 0.475\nC 1.606 -0.209 0.921\nC 2.886 0.260 0.255\nC 4.164 0.034 0.932\nC 5.444 0.190 0.245\nC 5.408 0.755 -1.040\nC 4.142 1.096 -1.658\nC 2.917 0.731 -1.039\nI 1.044 1.177 -2.067\nN -1.978 -0.700 0.985\nC -3.028 0.009 0.220\nC -3.810 0.980 0.905\nC -4.892 1.490 0.235\nC -5.165 1.141 -1.079\nC -4.379 0.256 -1.802\nC -3.237 -0.323 -1.126\nH -0.280 -3.134 0.808\nH -0.185 -1.978 1.970\nH 1.091 -1.731 -0.641\nH 1.972 -2.409 0.793\nH 1.741 -0.182 2.060\nH 0.810 0.503 0.617\nH 4.155 -0.341 1.973\nH 6.395 0.003 0.813\nH 6.404 0.853 -1.690\nH 4.157 1.477 -2.661\nH -3.480 1.259 1.951\nH -5.547 2.250 0.720\nH -6.061 1.537 -1.581\nH -4.536 -0.007 -2.827[\\XYZ].\nAnswer: -1175.64052 Hartree"} {"text":"Task: Return the total energy of a compound computed at the {\\omega}B97X-D3\/def2-TZVP level of theory.\nDescription: The compound has the XYZ file [XYZ]\n44\nH18 C19 O7\nC -5.666 2.701 0.538\nO -5.907 1.500 -0.218\nC -4.925 0.591 -0.544\nC -5.368 -0.571 -1.168\nC -4.365 -1.568 -1.399\nO -4.759 -2.607 -2.108\nC -3.031 -1.354 -1.064\nC -2.106 -2.461 -1.198\nO -2.420 -3.607 -1.637\nC -0.708 -2.198 -0.814\nC -0.475 -0.973 -0.269\nC 0.820 -0.487 0.148\nC 1.052 0.941 0.366\nC 2.280 1.558 0.481\nO 2.201 2.924 0.494\nC 3.396 3.599 0.923\nC 3.480 0.694 0.443\nO 4.760 1.208 
0.611\nC 6.014 0.321 0.236\nC 3.271 -0.697 0.372\nC 1.998 -1.328 0.251\nO 1.915 -2.662 0.131\nC 2.939 -3.386 0.851\nO -1.395 0.020 -0.161\nC -2.680 -0.140 -0.486\nC -3.606 0.894 -0.329\nH -6.576 3.228 0.666\nH -5.247 2.422 1.490\nH -4.954 3.418 0.055\nH -6.407 -0.714 -1.300\nH -3.957 -3.333 -2.024\nH 0.057 -2.945 -0.917\nH 0.160 1.640 0.360\nH 2.986 4.676 1.175\nH 4.179 3.540 0.120\nH 3.815 3.135 1.855\nH 5.897 -0.206 -0.788\nH 6.241 -0.309 1.144\nH 6.897 0.964 0.152\nH 4.246 -1.221 0.270\nH 2.981 -3.001 1.979\nH 2.758 -4.539 0.933\nH 3.979 -3.301 0.444\nH -3.230 1.838 0.076[\\XYZ].\nAnswer: -1261.42237 Hartree"}", "/scratch/micpie/export/orbnet_denali/test_0-1.jsonl": "{"text":"Question: What's the structure of a conformer of the chemical structure with SMILES [H].[H]C1CC([H])C(I)C(C([H])([H])C([H])([H])N([H])([H])C2NC3C([H])C([H])C([H])C([H])C3C(O)O2)C1[H]?\nConstraint: Return a XYZ file.\nAnswer: [XYZ]\n35\nH14 C16 I1 N2 O2\nO -2.054 -1.196 -3.088\nC -2.281 -1.177 -1.852\nO -1.345 -1.785 -0.955\nC -1.195 -1.440 0.310\nN -0.079 -2.118 0.983\nC 1.270 -1.677 0.475\nC 1.606 -0.209 0.921\nC 2.886 0.260 0.255\nC 4.164 0.034 0.932\nC 5.444 0.190 0.245\nC 5.408 0.755 -1.040\nC 4.142 1.096 -1.658\nC 2.917 0.731 -1.039\nI 1.044 1.177 -2.067\nN -1.978 -0.700 0.985\nC -3.028 0.009 0.220\nC -3.810 0.980 0.905\nC -4.892 1.490 0.235\nC -5.165 1.141 -1.079\nC -4.379 0.256 -1.802\nC -3.237 -0.323 -1.126\nH -0.280 -3.134 0.808\nH -0.185 -1.978 1.970\nH 1.091 -1.731 -0.641\nH 1.972 -2.409 0.793\nH 1.741 -0.182 2.060\nH 0.810 0.503 0.617\nH 4.155 -0.341 1.973\nH 6.395 0.003 0.813\nH 6.404 0.853 -1.690\nH 4.157 1.477 -2.661\nH -3.480 1.259 1.951\nH -5.547 2.250 0.720\nH -6.061 1.537 -1.581\nH -4.536 -0.007 -2.827[\\XYZ]"} {"text":"Question: What is the structure of a conformer of the compound with InChI InChI=1S\/C19H34O7.H\/c1-22-10-5-12(20)19-13(21)8-15(26-18(19)6-10)11-7-16(24-3)17(25-4)9-14(11)23-2;\/h10-21H,5-9H2,1-4H3;?\nConstraint: Return a XYZ file.\nAnswer: [XYZ]\n44\nH18 
C19 O7\nC -5.666 2.701 0.538\nO -5.907 1.500 -0.218\nC -4.925 0.591 -0.544\nC -5.368 -0.571 -1.168\nC -4.365 -1.568 -1.399\nO -4.759 -2.607 -2.108\nC -3.031 -1.354 -1.064\nC -2.106 -2.461 -1.198\nO -2.420 -3.607 -1.637\nC -0.708 -2.198 -0.814\nC -0.475 -0.973 -0.269\nC 0.820 -0.487 0.148\nC 1.052 0.941 0.366\nC 2.280 1.558 0.481\nO 2.201 2.924 0.494\nC 3.396 3.599 0.923\nC 3.480 0.694 0.443\nO 4.760 1.208 0.611\nC 6.014 0.321 0.236\nC 3.271 -0.697 0.372\nC 1.998 -1.328 0.251\nO 1.915 -2.662 0.131\nC 2.939 -3.386 0.851\nO -1.395 0.020 -0.161\nC -2.680 -0.140 -0.486\nC -3.606 0.894 -0.329\nH -6.576 3.228 0.666\nH -5.247 2.422 1.490\nH -4.954 3.418 0.055\nH -6.407 -0.714 -1.300\nH -3.957 -3.333 -2.024\nH 0.057 -2.945 -0.917\nH 0.160 1.640 0.360\nH 2.986 4.676 1.175\nH 4.179 3.540 0.120\nH 3.815 3.135 1.855\nH 5.897 -0.206 -0.788\nH 6.241 -0.309 1.144\nH 6.897 0.964 0.152\nH 4.246 -1.221 0.270\nH 2.981 -3.001 1.979\nH 2.758 -4.539 0.933\nH 3.979 -3.301 0.444\nH -3.230 1.838 0.076[\\XYZ]"}", "/scratch/micpie/export/orbnet_denali/valid_0-0.jsonl": "{"text":"The chemical structure with SMILES [H]C1C([H])C([H])C(C([H])([H])C([H])([H])N([H])([H])C2NC3C([H])C([H])C([H])C([H])C3C(O)O2)C(I)C1[H] has a charge of 1."} {"text":"The molecule with InChI InChI=1S\/C18H32O4\/c1-21-18-13(7-8-17-14(18)9-10-22-17)16(20)11-15(19)12-5-3-2-4-6-12\/h12-20H,2-11H2,1H3 has a charge of 0."}", "/scratch/micpie/export/orbnet_denali/test_1-7.jsonl": "{"text":"User: I want to know the {\\omega}B97X-D3\/def2-TZVP total energy of the chemical structure with canonical SMILES COC1CC(O)C2C(O)CC(C3CC(OC)C(OC)CC3OC)OC2C1.\nAssistant: Do you have the XYZ file file of a conformer of the chemical structure?\nUser: I have it: [XYZ]\n44\nH18 C19 O7\nC -6.444 -1.225 -2.667\nO -5.032 -1.257 -2.558\nC -4.412 -0.679 -1.504\nC -5.091 -0.023 -0.474\nC -4.376 0.545 0.576\nO -5.018 1.170 1.557\nC -2.962 0.461 0.601\nC -2.198 1.043 1.694\nO -2.735 1.639 2.634\nC -0.768 0.855 1.586\nC -0.210 0.202 0.533\nC 1.237 0.002 
0.376\nC 1.781 -1.141 -0.237\nC 3.164 -1.258 -0.344\nO 0.928 -2.101 -0.660\nC 1.411 -3.284 -1.265\nC 4.029 -0.275 0.133\nO 5.341 -0.543 -0.059\nC 6.381 0.326 0.361\nC 3.490 0.872 0.752\nC 2.114 0.983 0.860\nO 4.349 1.836 1.215\nC 3.840 2.982 1.870\nO -0.958 -0.304 -0.464\nC -2.311 -0.205 -0.448\nC -3.018 -0.773 -1.497\nH -6.690 -1.754 -3.589\nH -6.816 -0.196 -2.731\nH -6.921 -1.732 -1.821\nH -6.169 0.059 -0.465\nH -4.316 1.497 2.201\nH -0.152 1.234 2.391\nH 0.530 -3.878 -1.512\nH 1.968 -3.066 -2.184\nH 3.617 -2.125 -0.805\nH 2.050 -3.854 -0.580\nH 6.371 0.467 1.446\nH 6.316 1.298 -0.137\nH 7.313 -0.165 0.072\nH 4.708 3.575 2.163\nH 3.206 3.579 1.204\nH 1.685 1.869 1.308\nH 3.270 2.712 2.768\nH -2.502 -1.283 -2.298[\\XYZ]\nAssistant: The total energy on the {\\omega}B97X-D3\/def2-TZVP level of theory of the chemical structure is -1261.51648 Hartree."} {"text":"User: I must know the {\\omega}B97X-D3\/def2-TZVP total energy of the chemical structure with SMILES [H]C1C([H])C(Cl)C([H])C(N([H])C(O[Li])C([H])([H])SC([H])([H])C(O)O)C1[H].\nAssistant: Do you have the XYZ file file of a conformer of the chemical structure?\nUser: Yes: [XYZ]\n26\nLi1 H9 C10 S1 N1 Cl1 O3\nO 4.932 0.704 0.314\nC 4.841 1.329 -0.734\nO 5.616 2.232 -1.168\nC 3.620 0.950 -1.636\nS 3.203 2.210 -2.877\nC 4.122 1.576 -4.294\nC 5.529 2.120 -4.531\nO 5.925 1.990 -5.727\nN 6.199 2.648 -3.540\nC 7.514 3.156 -3.584\nC 8.133 3.450 -2.367\nC 9.427 3.944 -2.343\nC 10.125 4.157 -3.524\nC 9.497 3.881 -4.728\nCl 10.344 4.127 -6.210\nC 8.198 3.400 -4.774\nLi 4.878 1.192 -6.789\nH 2.752 0.835 -1.001\nH 3.813 -0.007 -2.114\nH 3.555 1.873 -5.217\nH 4.196 0.484 -4.269\nH 5.808 2.482 -2.478\nH 7.607 3.284 -1.434\nH 9.903 4.167 -1.394\nH 11.142 4.529 -3.508\nH 7.744 3.227 -5.738[\\XYZ]\nAssistant: The total energy on the {\\omega}B97X-D3\/def2-TZVP level of theory of the chemical structure is -1532.97043 Hartree."}", "/scratch/micpie/export/orbnet_denali/train_1-3.jsonl": "{"text":"Question: What's the structure of a 
conformer of the chemical structure with SMILES [H].[H].[H].[H]COC1C([H])C(OC([H])[H])C(OC([H])([H])[H])C([H])C1C1OC2C([H])C(OC([H])([H])[H])C([H])C(O[H])C2C(O)C1[H]?\nConstraint: Return a MOL3000 file.\nAnswer: [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 44 43 0 0 0\nM V30 BEGIN ATOM\nM V30 1 C -6.542161 -1.166728 -2.752454 0\nM V30 2 O -5.088479 -0.997615 -2.662078 0\nM V30 3 C -4.440652 -0.523102 -1.608575 0 VAL=3\nM V30 4 C -5.113212 -0.193188 -0.438939 0 VAL=3\nM V30 5 C -4.455676 0.340760 0.674620 0 VAL=3\nM V30 6 O -5.062269 0.787710 1.770695 0\nM V30 7 C -3.045743 0.423620 0.596097 0 VAL=3\nM V30 8 C -2.269530 0.793074 1.781109 0 VAL=3\nM V30 9 O -2.884426 1.013255 2.887104 0 VAL=1\nM V30 10 C -0.766797 0.910902 1.585163 0 VAL=3\nM V30 11 C -0.182288 0.474461 0.404403 0 VAL=3\nM V30 12 C 1.283620 0.202048 0.252355 0 VAL=3\nM V30 13 C 1.778616 -1.106511 -0.151329 0 VAL=3\nM V30 14 C 3.173810 -1.311431 -0.257820 0 VAL=3\nM V30 15 O 0.866150 -2.147054 -0.289708 0\nM V30 16 C 1.390125 -3.378650 -0.672688 0 VAL=2\nM V30 17 C 4.080406 -0.278938 0.060547 0 VAL=3\nM V30 18 O 5.353385 -0.605279 -0.051501 0\nM V30 19 C 6.531960 -0.133392 0.853667 0 VAL=3\nM V30 20 C 3.569292 1.025566 0.431027 0 VAL=3\nM V30 21 C 2.233647 1.309646 0.481381 0 VAL=3\nM V30 22 O 4.420965 1.976862 0.839682 0\nM V30 23 C 3.980090 3.085488 1.605972 0\nM V30 24 O -0.928225 0.130515 -0.704075 0\nM V30 25 C -2.312805 -0.001494 -0.610951 0 VAL=3\nM V30 26 C -3.033246 -0.518430 -1.713580 0 VAL=3\nM V30 27 H -6.692455 -1.677366 -3.712861 0\nM V30 28 H -7.160545 -0.226623 -2.602101 0\nM V30 29 H -6.898182 -1.863172 -2.013525 0\nM V30 30 H -6.212452 -0.191821 -0.339265 0\nM V30 31 H -4.310562 1.050092 2.377535 0\nM V30 32 H -0.180372 1.209801 2.397833 0\nM V30 33 H 0.568374 -3.844244 -1.347749 0\nM V30 34 H 2.403544 -3.406070 -1.384333 0 VAL=-1\nM V30 35 H 3.595369 -2.307063 -0.454407 0\nM V30 36 H 1.694191 -4.045680 0.307938 0 VAL=-1\nM V30 37 H 6.205293 
-0.100263 1.961636 0\nM V30 38 H 6.987802 1.024410 0.533217 0 VAL=-1\nM V30 39 H 7.341653 -0.883486 0.698914 0\nM V30 40 H 4.810232 3.630612 1.927056 0\nM V30 41 H 3.279769 3.709574 1.078054 0\nM V30 42 H 1.707345 2.302783 0.631236 0\nM V30 43 H 3.490834 2.678656 2.505143 0\nM V30 44 H -2.445908 -0.864871 -2.521191 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 1 27\nM V30 3 1 1 28\nM V30 4 1 1 29\nM V30 5 1 2 3\nM V30 6 1 3 4\nM V30 7 1 3 26\nM V30 8 1 4 5\nM V30 9 1 4 30\nM V30 10 1 5 6\nM V30 11 1 5 7\nM V30 12 1 6 31\nM V30 13 1 7 8\nM V30 14 1 7 25\nM V30 15 1 8 9\nM V30 16 1 8 10\nM V30 17 1 10 11\nM V30 18 1 10 32\nM V30 19 1 11 12\nM V30 20 1 11 24\nM V30 21 1 12 13\nM V30 22 1 12 21\nM V30 23 1 13 14\nM V30 24 1 13 15\nM V30 25 1 14 17\nM V30 26 1 14 35\nM V30 27 1 15 16\nM V30 28 1 16 33\nM V30 29 1 17 18\nM V30 30 1 17 20\nM V30 31 1 18 19\nM V30 32 1 19 37\nM V30 33 1 19 39\nM V30 34 1 20 21\nM V30 35 1 20 22\nM V30 36 1 21 42\nM V30 37 1 22 23\nM V30 38 1 23 40\nM V30 39 1 23 41\nM V30 40 1 23 43\nM V30 41 1 24 25\nM V30 42 1 25 26\nM V30 43 1 26 44\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]"} {"text":"Question: What's the structure of a conformer of the chemical with SELFIES [H][C][C][Branch1][C][H][C][Branch1][C][Cl][C][Branch1][C][H][C][Branch2][Ring1][=C][N][Branch1][C][H][C][Branch1][Ring1][O][K][C][Branch1][C][H][Branch1][C][H][S][C][Branch1][C][H][Branch1][C][H][C][Branch1][C][O][O][C][Ring2][Ring1][Branch2][H]?\nConstraint: Return a MOL3000 file.\nAnswer: [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 26 26 0 0 0\nM V30 BEGIN ATOM\nM V30 1 O 3.679426 3.606092 -0.111915 0 VAL=1\nM V30 2 C 2.706478 2.996424 0.431732 0 VAL=3\nM V30 3 O 1.620097 3.459330 0.747639 0 VAL=1\nM V30 4 C 2.948302 1.488746 0.743883 0\nM V30 5 S 3.939548 0.689881 -0.551574 0\nM V30 6 C 5.486262 0.388669 0.329218 0\nM V30 7 C 6.311315 1.591895 0.739200 0 VAL=3\nM V30 8 O 7.268916 1.394331 1.509758 0\nM V30 9 N 
6.001596 2.792092 0.234056 0\nM V30 10 C 6.781655 3.921630 0.518316 0 VAL=3\nM V30 11 C 8.182549 3.867423 0.485735 0 VAL=3\nM V30 12 C 8.933998 5.009215 0.744996 0 VAL=3\nM V30 13 C 8.306926 6.218264 1.021363 0 VAL=3\nM V30 14 C 6.916822 6.273859 1.017974 0 VAL=3\nM V30 15 Cl 6.132054 7.776585 1.324146 0\nM V30 16 C 6.153573 5.143614 0.770300 0 VAL=3\nM V30 17 K 9.103469 2.257414 2.626235 0 VAL=1\nM V30 18 H 1.989519 0.975711 0.829409 0\nM V30 19 H 3.459556 1.423585 1.706075 0\nM V30 20 H 6.107553 -0.186528 -0.364428 0\nM V30 21 H 5.336770 -0.233445 1.217609 0\nM V30 22 H 4.911425 3.049999 -0.023556 0\nM V30 23 H 8.695240 2.937689 0.213706 0\nM V30 24 H 10.024561 4.966479 0.679366 0\nM V30 25 H 8.887152 7.112606 1.212714 0\nM V30 26 H 5.072853 5.217396 0.760114 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 2 3\nM V30 3 1 2 4\nM V30 4 1 4 5\nM V30 5 1 4 18\nM V30 6 1 4 19\nM V30 7 1 5 6\nM V30 8 1 6 7\nM V30 9 1 6 20\nM V30 10 1 6 21\nM V30 11 1 7 8\nM V30 12 1 7 9\nM V30 13 1 8 17\nM V30 14 1 9 10\nM V30 15 1 9 22\nM V30 16 1 10 11\nM V30 17 1 10 16\nM V30 18 1 11 12\nM V30 19 1 11 23\nM V30 20 1 12 13\nM V30 21 1 12 24\nM V30 22 1 13 14\nM V30 23 1 13 25\nM V30 24 1 14 15\nM V30 25 1 14 16\nM V30 26 1 16 26\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]"}", "/scratch/micpie/export/orbnet_denali/test_0-2.jsonl": "{"text":"Question: What's the structure of a conformer of the chemical with DeepSMILES [H].[H]CCC[H])CI)CC[H])[H])C[H])[H])N[H])[H])CNCC[H])C[H])C[H])C[H])C6CO)O%10)))))))))))))C6[H]?\nConstraint: Return a MOL2000 file.\nAnswer: [V2000]\n\n ChemNLP 3D\n\n 35 36 0 0 0 0 0 0 0 0999 V2000\n -2.0543 -1.1961 -3.0875 O 0 0 0 0 0 1 0 0 0 0 0 0\n -2.2814 -1.1775 -1.8523 C 0 0 0 0 0 3 0 0 0 0 0 0\n -1.3451 -1.7854 -0.9548 O 0 0 0 0 0 0 0 0 0 0 0 0\n -1.1947 -1.4402 0.3099 C 0 0 0 0 0 3 0 0 0 0 0 0\n -0.0793 -2.1180 0.9832 N 0 0 0 0 0 4 0 0 0 0 0 0\n 1.2701 -1.6773 0.4750 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6058 -0.2085 0.9212 C 0 0 0 0 0 0 0 0 0 0 0 
0\n 2.8861 0.2596 0.2546 C 0 0 0 0 0 3 0 0 0 0 0 0\n 4.1644 0.0342 0.9317 C 0 0 0 0 0 3 0 0 0 0 0 0\n 5.4435 0.1900 0.2452 C 0 0 0 0 0 3 0 0 0 0 0 0\n 5.4075 0.7553 -1.0399 C 0 0 0 0 0 2 0 0 0 0 0 0\n 4.1422 1.0956 -1.6582 C 0 0 0 0 0 3 0 0 0 0 0 0\n 2.9173 0.7308 -1.0393 C 0 0 0 0 0 3 0 0 0 0 0 0\n 1.0440 1.1773 -2.0669 I 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9781 -0.7003 0.9850 N 0 0 0 0 0 2 0 0 0 0 0 0\n -3.0278 0.0088 0.2195 C 0 0 0 0 0 3 0 0 0 0 0 0\n -3.8105 0.9798 0.9050 C 0 0 0 0 0 3 0 0 0 0 0 0\n -4.8922 1.4896 0.2349 C 0 0 0 0 0 3 0 0 0 0 0 0\n -5.1648 1.1414 -1.0793 C 0 0 0 0 0 3 0 0 0 0 0 0\n -4.3790 0.2556 -1.8019 C 0 0 0 0 0 3 0 0 0 0 0 0\n -3.2371 -0.3228 -1.1263 C 0 0 0 0 0 3 0 0 0 0 0 0\n -0.2805 -3.1338 0.8078 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.1848 -1.9777 1.9703 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0912 -1.7313 -0.6413 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.9719 -2.4092 0.7934 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.7410 -0.1817 2.0601 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.8105 0.5035 0.6168 H 0 0 0 0 0 0 0 0 0 0 0 0\n 4.1545 -0.3411 1.9731 H 0 0 0 0 0 0 0 0 0 0 0 0\n 6.3952 0.0026 0.8134 H 0 0 0 0 0 0 0 0 0 0 0 0\n 6.4045 0.8532 -1.6902 H 0 0 0 0 0 15 0 0 0 0 0 0\n 4.1566 1.4771 -2.6606 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.4804 1.2591 1.9510 H 0 0 0 0 0 0 0 0 0 0 0 0\n -5.5473 2.2503 0.7198 H 0 0 0 0 0 0 0 0 0 0 0 0\n -6.0608 1.5374 -1.5805 H 0 0 0 0 0 0 0 0 0 0 0 0\n -4.5357 -0.0074 -2.8268 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 3 1 0\n 2 21 1 0\n 3 4 1 0\n 4 5 1 0\n 4 15 1 0\n 5 6 1 0\n 5 22 1 0\n 5 23 1 0\n 6 7 1 0\n 6 24 1 0\n 6 25 1 0\n 7 8 1 0\n 7 26 1 0\n 7 27 1 0\n 8 9 1 0\n 8 13 1 0\n 9 10 1 0\n 9 28 1 0\n 10 11 1 0\n 10 29 1 0\n 11 12 1 0\n 12 13 1 0\n 12 31 1 0\n 13 14 1 0\n 15 16 1 0\n 16 17 1 0\n 16 21 1 0\n 17 18 1 0\n 17 32 1 0\n 18 19 1 0\n 18 33 1 0\n 19 20 1 0\n 19 34 1 0\n 20 21 1 0\n 20 35 1 0\nM END\n[\\V2000]"} {"text":"Question: What's the structure of a conformer of the molecule with canonical SMILES COC1CC(O)C2C(O)CC(C3CC(OC)C(OC)CC3OC)OC2C1.[H]?\nConstraint: Return a 
MOL2000 file.\nAnswer: [V2000]\n\n ChemNLP 3D\n\n 44 45 0 0 0 0 0 0 0 0999 V2000\n -5.6663 2.7014 0.5384 C 0 0 0 0 0 0 0 0 0 0 0 0\n -5.9069 1.5002 -0.2183 O 0 0 0 0 0 0 0 0 0 0 0 0\n -4.9251 0.5906 -0.5444 C 0 0 0 0 0 3 0 0 0 0 0 0\n -5.3676 -0.5709 -1.1685 C 0 0 0 0 0 3 0 0 0 0 0 0\n -4.3649 -1.5678 -1.3994 C 0 0 0 0 0 3 0 0 0 0 0 0\n -4.7592 -2.6072 -2.1081 O 0 0 0 0 0 0 0 0 0 0 0 0\n -3.0308 -1.3539 -1.0639 C 0 0 0 0 0 3 0 0 0 0 0 0\n -2.1062 -2.4606 -1.1979 C 0 0 0 0 0 3 0 0 0 0 0 0\n -2.4204 -3.6067 -1.6366 O 0 0 0 0 0 1 0 0 0 0 0 0\n -0.7076 -2.1980 -0.8142 C 0 0 0 0 0 3 0 0 0 0 0 0\n -0.4751 -0.9733 -0.2690 C 0 0 0 0 0 3 0 0 0 0 0 0\n 0.8195 -0.4872 0.1482 C 0 0 0 0 0 3 0 0 0 0 0 0\n 1.0525 0.9411 0.3658 C 0 0 0 0 0 3 0 0 0 0 0 0\n 2.2799 1.5579 0.4811 C 0 0 0 0 0 3 0 0 0 0 0 0\n 2.2013 2.9243 0.4936 O 0 0 0 0 0 0 0 0 0 0 0 0\n 3.3957 3.5988 0.9227 C 0 0 0 0 0 0 0 0 0 0 0 0\n 3.4796 0.6940 0.4426 C 0 0 0 0 0 3 0 0 0 0 0 0\n 4.7597 1.2079 0.6106 O 0 0 0 0 0 0 0 0 0 0 0 0\n 6.0144 0.3205 0.2355 C 0 0 0 0 0 0 0 0 0 0 0 0\n 3.2712 -0.6969 0.3724 C 0 0 0 0 0 3 0 0 0 0 0 0\n 1.9977 -1.3283 0.2509 C 0 0 0 0 0 3 0 0 0 0 0 0\n 1.9150 -2.6623 0.1306 O 0 0 0 0 0 0 0 0 0 0 0 0\n 2.9390 -3.3857 0.8513 C 0 0 0 0 0 3 0 0 0 0 0 0\n -1.3953 0.0202 -0.1607 O 0 0 0 0 0 0 0 0 0 0 0 0\n -2.6804 -0.1401 -0.4863 C 0 0 0 0 0 3 0 0 0 0 0 0\n -3.6061 0.8939 -0.3291 C 0 0 0 0 0 3 0 0 0 0 0 0\n -6.5756 3.2275 0.6661 H 0 0 0 0 0 0 0 0 0 0 0 0\n -5.2466 2.4218 1.4898 H 0 0 0 0 0 0 0 0 0 0 0 0\n -4.9538 3.4185 0.0549 H 0 0 0 0 0 0 0 0 0 0 0 0\n -6.4070 -0.7143 -1.2996 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.9572 -3.3333 -2.0238 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0568 -2.9454 -0.9167 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.1597 1.6400 0.3603 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.9864 4.6759 1.1747 H 0 0 0 0 0 0 0 0 0 0 0 0\n 4.1793 3.5398 0.1201 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.8155 3.1351 1.8553 H 0 0 0 0 0 0 0 0 0 0 0 0\n 5.8972 -0.2061 -0.7883 H 0 0 0 0 0 0 0 0 0 0 0 0\n 6.2407 -0.3090 1.1443 H 0 0 0 0 0 0 0 0 0 0 0 0\n 
6.8973 0.9635 0.1516 H 0 0 0 0 0 0 0 0 0 0 0 0\n 4.2464 -1.2212 0.2701 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.9812 -3.0006 1.9794 H 0 0 0 0 0 15 0 0 0 0 0 0\n 2.7575 -4.5389 0.9326 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.9794 -3.3009 0.4441 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.2304 1.8380 0.0757 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 1 27 1 0\n 1 28 1 0\n 1 29 1 0\n 2 3 1 0\n 3 4 1 0\n 3 26 1 0\n 4 5 1 0\n 4 30 1 0\n 5 6 1 0\n 5 7 1 0\n 6 31 1 0\n 7 8 1 0\n 7 25 1 0\n 8 9 1 0\n 8 10 1 0\n 10 11 1 0\n 10 32 1 0\n 11 12 1 0\n 11 24 1 0\n 12 13 1 0\n 12 21 1 0\n 13 14 1 0\n 13 33 1 0\n 14 15 1 0\n 14 17 1 0\n 15 16 1 0\n 16 34 1 0\n 16 35 1 0\n 16 36 1 0\n 17 18 1 0\n 17 20 1 0\n 18 19 1 0\n 19 37 1 0\n 19 38 1 0\n 19 39 1 0\n 20 21 1 0\n 20 40 1 0\n 21 22 1 0\n 22 23 1 0\n 23 42 1 0\n 23 43 1 0\n 24 25 1 0\n 25 26 1 0\n 26 44 1 0\nM END\n[\\V2000]"}", "/scratch/micpie/export/orbnet_denali/train_1-0.jsonl": "{"text":"The compound with SMILES [H].[H].[H].[H]COC1C([H])C(OC([H])[H])C(OC([H])([H])[H])C([H])C1C1OC2C([H])C(OC([H])([H])[H])C([H])C(O[H])C2C(O)C1[H] has a charge of 0."} {"text":"The molecule with SELFIES [H][C][C][Branch1][C][H][C][Branch1][C][Cl][C][Branch1][C][H][C][Branch2][Ring1][=C][N][Branch1][C][H][C][Branch1][Ring1][O][K][C][Branch1][C][H][Branch1][C][H][S][C][Branch1][C][H][Branch1][C][H][C][Branch1][C][O][O][C][Ring2][Ring1][Branch2][H] has a charge of 0."}", "/scratch/micpie/export/orbnet_denali/train_0-6.jsonl": "{"text":"User: I would like to know the GFN1-xTB total energy of the molecule with DeepSMILES [H].[H]CCCCO)OCN[H])[H])C[H])[H])C[H])[H])CC[H])C[H])C[H])C[H])C6I))))))))))NC6C[H])C%10[H].\nAssistant: Do you have the XYZ file file of a conformer of the molecule?\nUser: Here it is: [XYZ]\n35\nH14 C16 I1 N2 O2\nO -2.154 -0.717 -3.091\nC -2.202 -0.565 -1.929\nO -1.122 -1.409 -1.401\nC -1.305 -1.457 -0.153\nN -0.152 -2.111 0.661\nC 1.219 -1.647 0.224\nC 1.664 -0.394 0.961\nC 2.777 0.375 0.333\nC 3.935 0.634 1.154\nC 5.097 1.070 0.548\nC 5.103 1.426 -0.852\nC 3.906 1.419 
-1.526\nC 2.850 0.848 -0.977\nI 1.179 0.659 -2.177\nN -1.875 -0.757 0.760\nC -3.056 -0.085 0.312\nC -3.964 0.458 1.172\nC -5.089 1.206 0.786\nC -5.206 1.642 -0.536\nC -4.212 1.061 -1.374\nC -3.158 0.148 -1.083\nH -0.111 -3.134 0.808\nH -0.337 -1.781 1.604\nH 1.144 -1.613 -0.889\nH 1.963 -2.359 0.423\nH 2.027 -0.644 1.936\nH 0.824 0.315 1.112\nH 3.724 0.534 2.253\nH 5.898 1.526 1.184\nH 5.975 1.976 -1.191\nH 3.709 2.064 -2.391\nH -3.781 0.112 2.214\nH -5.796 1.666 1.521\nH -5.858 2.404 -0.949\nH -4.357 1.395 -2.509[\\XYZ]\nAssistant: The GFN1-xTB total energy of the molecule is -60.53905 Hartree."} {"text":"User: I must know the GFN1-xTB total energy of the chemical with SELFIES [H].[H][O][C][C][Branch1][C][H][C][Branch1][#Branch2][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][C][H][C][O][C][Branch2][Ring2][O][C][C][Branch1][C][H][C][Branch1][#Branch1][O][C][Branch1][C][H][H][C][Branch1][#Branch2][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][C][H][C][Ring1][P][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][C][H][C][Branch1][C][O][C][Ring2][Ring2][=Branch2][Ring2][Ring1][=C].\nAssistant: Do you have the XYZ file file of a conformer of the chemical?\nUser: Here it is: [XYZ]\n44\nH18 C19 O7\nC -5.598 2.587 0.816\nO -5.926 1.423 0.013\nC -4.939 0.521 -0.270\nC -5.418 -0.709 -0.835\nC -4.461 -1.616 -1.307\nO -4.770 -2.786 -1.785\nC -3.070 -1.328 -1.136\nC -2.046 -2.384 -1.436\nO -2.387 -3.493 -1.882\nC -0.727 -2.031 -1.027\nC -0.429 -0.822 -0.590\nC 0.909 -0.400 -0.105\nC 1.179 1.028 -0.092\nC 2.416 1.533 0.228\nO 2.519 2.881 -0.147\nC 3.669 3.567 0.196\nC 3.432 0.679 0.617\nO 4.558 1.195 1.054\nC 5.499 0.294 1.785\nC 3.242 -0.665 0.606\nC 2.027 -1.198 0.254\nO 1.813 -2.546 0.258\nC 2.922 -3.496 0.237\nO -1.369 0.120 -0.394\nC -2.688 -0.147 -0.590\nC -3.591 0.803 -0.177\nH -6.605 2.908 1.172\nH -5.052 2.257 1.748\nH -5.030 3.357 0.311\nH -6.532 -0.752 -1.003\nH -3.809 -3.287 -1.760\nH 0.075 -2.844 -1.034\nH 0.410 1.814 -0.270\nH 3.583 4.677 -0.228\nH 4.585 
3.040 -0.264\nH 3.881 3.555 1.284\nH 5.951 -0.390 1.068\nH 4.944 -0.290 2.595\nH 6.342 0.947 2.161\nH 3.966 -1.351 0.972\nH 3.545 -3.572 1.173\nH 2.440 -4.507 0.143\nH 3.593 -3.329 -0.658\nH -3.248 1.831 0.124[\\XYZ]\nAssistant: The GFN1-xTB total energy of the chemical is -78.67223 Hartree."}", "/scratch/micpie/export/orbnet_denali/valid_0-6.jsonl": "{"text":"User: I must know the GFN1-xTB total energy of the chemical structure with DeepSMILES [H]CC[H])C[H])CC[H])[H])C[H])[H])N[H])[H])CNCC[H])C[H])C[H])C[H])C6CO)O%10)))))))))))))CI)C6[H].\nAssistant: Do you have the XYZ file file of a conformer of the chemical structure?\nUser: Here it is: [XYZ]\n35\nH14 C16 I1 N2 O2\nO -2.540 -1.190 -3.025\nC -2.434 -1.001 -1.861\nO -1.398 -1.691 -1.154\nC -1.231 -1.479 0.138\nN -0.088 -2.242 0.653\nC 1.221 -1.655 0.190\nC 1.444 -0.283 0.818\nC 2.753 0.290 0.318\nC 3.874 0.247 1.143\nC 5.087 0.767 0.717\nC 5.184 1.333 -0.546\nC 4.074 1.373 -1.377\nC 2.857 0.852 -0.955\nI 1.197 0.926 -2.270\nN -1.868 -0.739 0.932\nC -2.935 -0.020 0.377\nC -3.697 0.813 1.192\nC -4.746 1.528 0.637\nC -5.040 1.421 -0.722\nC -4.286 0.596 -1.537\nC -3.224 -0.135 -0.995\nH -0.161 -3.197 0.301\nH -0.137 -2.231 1.671\nH 1.163 -1.570 -0.900\nH 2.003 -2.364 0.473\nH 1.455 -0.372 1.908\nH 0.619 0.375 0.528\nH 3.790 -0.193 2.132\nH 5.950 0.730 1.370\nH 6.126 1.744 -0.890\nH 4.153 1.813 -2.363\nH -3.459 0.890 2.246\nH -5.343 2.177 1.266\nH -5.863 1.987 -1.141\nH -4.501 0.500 -2.596[\\XYZ]\nAssistant: The GFN1-xTB total energy of the chemical structure is -60.66416 Hartree."} {"text":"User: I must know the GFN1-xTB total energy of the molecule with SELFIES [H][O][C][Branch2][Ring2][O][C][Branch1][C][H][C][Branch1][C][O][C][C][Branch1][C][H][C][Branch1][C][H][C][O][C][Branch1][C][H][C][Branch1][C][H][C][Ring1][#Branch1][C][Ring1][=N][O][C][Branch1][C][H][Branch1][C][H][H][C][C][Branch1][C][H][C][Branch1][C][H][C][Branch1][C][H][C][Branch1][C][H][C][Ring1][#Branch2][H].\nAssistant: Do you have the XYZ file file 
of a conformer of the molecule?\nUser: Yes: [XYZ]\n36\nH14 C18 O4\nC -2.218 -1.492 2.559\nO -5.256 -1.280 0.457\nC -4.004 -1.186 1.036\nC -1.352 -0.748 1.755\nC 0.090 -0.809 2.106\nO 0.500 -1.048 3.205\nC 0.906 -0.583 0.886\nC 2.882 -0.503 -1.448\nO 2.653 0.380 2.128\nC 3.025 0.287 -0.318\nC 3.732 -0.369 -2.513\nC 4.735 0.634 -2.441\nC 4.728 1.588 -1.388\nC 3.995 1.331 -0.253\nC 2.162 -0.015 0.896\nC -1.854 -0.095 0.549\nC -1.605 1.747 -0.996\nC -3.171 -0.420 0.174\nO -1.041 0.760 -0.135\nC -4.022 0.031 -0.856\nC -5.231 -0.575 -0.713\nC -3.587 -1.743 2.266\nH -4.234 -2.330 2.953\nH -1.787 -1.825 3.577\nH 0.460 -0.762 -0.063\nH -0.766 2.364 -1.261\nH 2.003 -1.194 -1.541\nH 3.703 -1.018 -3.400\nH 5.418 0.902 -3.203\nH 5.428 2.428 -1.383\nH 3.546 0.998 2.171\nH 4.031 1.980 0.639\nH -2.386 2.272 -0.502\nH -2.064 1.319 -1.909\nH -6.158 -0.617 -1.203\nH -3.740 0.774 -1.595[\\XYZ]\nAssistant: The GFN1-xTB total energy of the molecule is -61.08251 Hartree."}", "/scratch/micpie/export/orbnet_denali/valid_1-2.jsonl": "{"text":"Question: What is the structure of a conformer of the molecule with canonical SMILES COC1CC(O)C2C(O)CC(C3CC(OC)C(OC)CC3OC)OC2C1.[H].[H]?\nConstraint: Return a MOL2000 file.\nAnswer: [V2000]\n\n ChemNLP 3D\n\n 44 44 0 0 0 0 0 0 0 0999 V2000\n -6.4815 -0.5826 -2.6614 C 0 0 0 0 0 0 0 0 0 0 0 0\n -5.0617 -0.7557 -2.4841 O 0 0 0 0 0 0 0 0 0 0 0 0\n -4.4669 -0.3032 -1.3816 C 0 0 0 0 0 3 0 0 0 0 0 0\n -5.1392 0.3789 -0.3815 C 0 0 0 0 0 3 0 0 0 0 0 0\n -4.4113 0.7259 0.7874 C 0 0 0 0 0 3 0 0 0 0 0 0\n -5.0761 1.3078 1.7664 O 0 0 0 0 0 0 0 0 0 0 0 0\n -3.0446 0.5984 0.8510 C 0 0 0 0 0 3 0 0 0 0 0 0\n -2.2519 1.1089 1.9823 C 0 0 0 0 0 3 0 0 0 0 0 0\n -2.7362 1.7139 2.9183 O 0 0 0 0 0 1 0 0 0 0 0 0\n -0.8371 0.8381 1.8443 C 0 0 0 0 0 3 0 0 0 0 0 0\n -0.2939 0.1721 0.8135 C 0 0 0 0 0 3 0 0 0 0 0 0\n 1.1340 -0.1413 0.6104 C 0 0 0 0 0 3 0 0 0 0 0 0\n 1.5139 -1.3322 -0.0666 C 0 0 0 0 0 3 0 0 0 0 0 0\n 2.8273 -1.6772 -0.2759 C 0 0 0 0 0 3 0 0 0 0 0 0\n 3.0380 -2.8787 
-0.8782 O 0 0 0 0 0 0 0 0 0 0 0 0\n 4.3639 -3.2120 -1.4131 C 0 0 0 0 0 0 0 0 0 0 0 0\n 3.8737 -0.6986 0.0512 C 0 0 0 0 0 3 0 0 0 0 0 0\n 5.1926 -0.9261 -0.2460 O 0 0 0 0 0 0 0 0 0 0 0 0\n 6.2146 0.0399 0.1141 C 0 0 0 0 0 0 0 0 0 0 0 0\n 3.4963 0.4694 0.7070 C 0 0 0 0 0 3 0 0 0 0 0 0\n 2.1566 0.7220 0.9855 C 0 0 0 0 0 3 0 0 0 0 0 0\n 1.7615 1.8762 1.6643 O 0 0 0 0 0 0 0 0 0 0 0 0\n 1.8092 3.1019 0.9217 C 0 0 0 0 0 2 0 0 0 0 0 0\n -1.0618 -0.3569 -0.1157 O 0 0 0 0 0 0 0 0 0 0 0 0\n -2.3828 -0.1592 -0.1386 C 0 0 0 0 0 3 0 0 0 0 0 0\n -3.0954 -0.5945 -1.2472 C 0 0 0 0 0 3 0 0 0 0 0 0\n -6.7532 -1.0263 -3.6003 H 0 0 0 0 0 0 0 0 0 0 0 0\n -6.7834 0.4506 -2.5928 H 0 0 0 0 0 0 0 0 0 0 0 0\n -7.0691 -1.1327 -1.8868 H 0 0 0 0 0 0 0 0 0 0 0 0\n -6.1864 0.5604 -0.3033 H 0 0 0 0 0 0 0 0 0 0 0 0\n -4.4147 1.5603 2.4964 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.1952 1.2048 2.6159 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7705 -2.0244 -0.4701 H 0 0 0 0 0 0 0 0 0 0 0 0\n 4.2475 -4.1442 -1.9644 H 0 0 0 0 0 0 0 0 0 0 0 0\n 4.7530 -2.3678 -2.1171 H 0 0 0 0 0 0 0 0 0 0 0 0\n 5.0804 -3.4216 -0.5801 H 0 0 0 0 0 0 0 0 0 0 0 0\n 6.1432 0.3118 1.2097 H 0 0 0 0 0 0 0 0 0 0 0 0\n 6.1927 0.9784 -0.5115 H 0 0 0 0 0 0 0 0 0 0 0 0\n 7.1705 -0.4681 0.0269 H 0 0 0 0 0 0 0 0 0 0 0 0\n 4.2049 1.2477 0.9582 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.6815 3.0586 0.1178 H 0 0 0 0 0 15 0 0 0 0 0 0\n 0.7840 3.2930 0.4007 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.0782 4.0440 1.5961 H 0 0 0 0 0 15 0 0 0 0 0 0\n -2.6757 -1.1843 -2.0428 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 1 27 1 0\n 1 28 1 0\n 1 29 1 0\n 2 3 1 0\n 3 4 1 0\n 3 26 1 0\n 4 5 1 0\n 4 30 1 0\n 5 6 1 0\n 5 7 1 0\n 6 31 1 0\n 7 8 1 0\n 7 25 1 0\n 8 9 1 0\n 8 10 1 0\n 10 11 1 0\n 10 32 1 0\n 11 12 1 0\n 11 24 1 0\n 12 13 1 0\n 12 21 1 0\n 13 14 1 0\n 13 33 1 0\n 14 15 1 0\n 14 17 1 0\n 15 16 1 0\n 16 34 1 0\n 16 35 1 0\n 16 36 1 0\n 17 18 1 0\n 17 20 1 0\n 18 19 1 0\n 19 37 1 0\n 19 38 1 0\n 19 39 1 0\n 20 21 1 0\n 20 40 1 0\n 21 22 1 0\n 22 23 1 0\n 23 42 1 0\n 24 25 1 0\n 25 26 1 0\n 26 44 1 
0\nM END\n[\\V2000]"} {"text":"Question: What's the structure of a conformer of the compound with DeepSMILES [H]CC[H])CCl)C[H])CN[H])CO[K]OCO)C[H])[H])SC8[H])[H]))))))))))C6[H]?\nConstraint: Return a MOL2000 file.\nAnswer: [V2000]\n\n ChemNLP 3D\n\n 26 27 0 0 0 0 0 0 0 0999 V2000\n 3.2482 0.0339 -0.3917 O 0 0 0 0 0 0 0 0 0 0 0 0\n 2.4720 0.8124 0.2114 C 0 0 0 0 0 3 0 0 0 0 0 0\n 1.2837 1.0375 0.0164 O 0 0 0 0 0 1 0 0 0 0 0 0\n 3.1452 1.6111 1.4148 C 0 0 0 0 0 0 0 0 0 0 0 0\n 3.8951 3.1564 0.8284 S 0 0 0 0 0 0 0 0 0 0 0 0\n 5.4310 2.6119 0.0087 C 0 0 0 0 0 0 0 0 0 0 0 0\n 6.4546 2.3329 1.0773 C 0 0 0 0 0 3 0 0 0 0 0 0\n 6.5789 1.1934 1.5289 O 0 0 0 0 0 0 0 0 0 0 0 0\n 7.2148 3.3260 1.5968 N 0 0 0 0 0 0 0 0 0 0 0 0\n 7.3487 4.6766 1.1858 C 0 0 0 0 0 3 0 0 0 0 0 0\n 6.2440 5.4925 0.9633 C 0 0 0 0 0 3 0 0 0 0 0 0\n 6.4330 6.8190 0.5998 C 0 0 0 0 0 3 0 0 0 0 0 0\n 7.7109 7.3433 0.4735 C 0 0 0 0 0 3 0 0 0 0 0 0\n 8.8080 6.5291 0.7192 C 0 0 0 0 0 3 0 0 0 0 0 0\n 10.4025 7.1558 0.5878 Cl 0 0 0 0 0 0 0 0 0 0 0 0\n 8.6355 5.1986 1.0702 C 0 0 0 0 0 3 0 0 0 0 0 0\n 5.1668 -0.4550 0.6904 K 0 0 0 0 0 2 0 0 0 0 0 0\n 3.8860 0.9903 1.9573 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.3684 1.9418 2.1080 H 0 0 0 0 0 0 0 0 0 0 0 0\n 5.7322 3.4115 -0.6637 H 0 0 0 0 0 0 0 0 0 0 0 0\n 5.2544 1.6876 -0.5696 H 0 0 0 0 0 0 0 0 0 0 0 0\n 7.8855 2.9895 2.2831 H 0 0 0 0 0 0 0 0 0 0 0 0\n 5.2387 5.1124 1.1011 H 0 0 0 0 0 0 0 0 0 0 0 0\n 5.5718 7.4523 0.4188 H 0 0 0 0 0 0 0 0 0 0 0 0\n 7.8599 8.3814 0.2094 H 0 0 0 0 0 0 0 0 0 0 0 0\n 9.4971 4.5721 1.2519 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 1 17 1 0\n 2 3 1 0\n 2 4 1 0\n 4 5 1 0\n 4 18 1 0\n 4 19 1 0\n 5 6 1 0\n 6 7 1 0\n 6 20 1 0\n 6 21 1 0\n 7 8 1 0\n 7 9 1 0\n 8 17 1 0\n 9 10 1 0\n 9 22 1 0\n 10 11 1 0\n 10 16 1 0\n 11 12 1 0\n 11 23 1 0\n 12 13 1 0\n 12 24 1 0\n 13 14 1 0\n 13 25 1 0\n 14 15 1 0\n 14 16 1 0\n 16 26 1 0\nM END\n[\\V2000]"}", "/scratch/micpie/export/orbnet_denali/test_0-0.jsonl": "{"text":"The chemical with SMILES 
[H].[H]C1CC([H])C(I)C(C([H])([H])C([H])([H])N([H])([H])C2NC3C([H])C([H])C([H])C([H])C3C(O)O2)C1[H] has a charge of 1."} {"text":"The chemical structure with SELFIES [H].[H][O][C][C][Branch1][C][H][C][Branch1][#Branch2][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][C][H][C][O][C][Branch2][Ring2][N][C][C][Branch1][C][H][C][Branch1][#Branch2][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][#Branch2][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][C][H][C][Ring2][Ring1][C][O][C][Branch1][C][H][H][C][Branch1][C][H][C][Branch1][C][O][C][Ring2][Ring2][=Branch2][Ring2][Ring1][=C] has a charge of 0."}", "/scratch/micpie/export/orbnet_denali/valid_0-7.jsonl": "{"text":"User: I would like to know the {\\omega}B97X-D3\/def2-TZVP total energy of the chemical with DeepSMILES [H]CC[H])C[H])CC[H])[H])C[H])[H])N[H])[H])CNCC[H])C[H])C[H])C[H])C6CO)O%10)))))))))))))CI)C6[H].\nAssistant: Do you have the XYZ file file of a conformer of the chemical?\nUser: I have it: [XYZ]\n35\nH14 C16 I1 N2 O2\nO -2.540 -1.190 -3.025\nC -2.434 -1.001 -1.861\nO -1.398 -1.691 -1.154\nC -1.231 -1.479 0.138\nN -0.088 -2.242 0.653\nC 1.221 -1.655 0.190\nC 1.444 -0.283 0.818\nC 2.753 0.290 0.318\nC 3.874 0.247 1.143\nC 5.087 0.767 0.717\nC 5.184 1.333 -0.546\nC 4.074 1.373 -1.377\nC 2.857 0.852 -0.955\nI 1.197 0.926 -2.270\nN -1.868 -0.739 0.932\nC -2.935 -0.020 0.377\nC -3.697 0.813 1.192\nC -4.746 1.528 0.637\nC -5.040 1.421 -0.722\nC -4.286 0.596 -1.537\nC -3.224 -0.135 -0.995\nH -0.161 -3.197 0.301\nH -0.137 -2.231 1.671\nH 1.163 -1.570 -0.900\nH 2.003 -2.364 0.473\nH 1.455 -0.372 1.908\nH 0.619 0.375 0.528\nH 3.790 -0.193 2.132\nH 5.950 0.730 1.370\nH 6.126 1.744 -0.890\nH 4.153 1.813 -2.363\nH -3.459 0.890 2.246\nH -5.343 2.177 1.266\nH -5.863 1.987 -1.141\nH -4.501 0.500 -2.596[\\XYZ]\nAssistant: The total energy on the {\\omega}B97X-D3\/def2-TZVP level of theory of the chemical is -1175.70842 Hartree."} {"text":"User: I have to know the {\\omega}B97X-D3\/def2-TZVP total energy of the 
chemical with canonical SMILES COC1C(C(O)CC(O)C2CCCCC2)CCC2OCCC21.\nAssistant: Do you have the XYZ file file of a conformer of the chemical?\nUser: Here it is: [XYZ]\n36\nH14 C18 O4\nC -2.218 -1.492 2.559\nO -5.256 -1.280 0.457\nC -4.004 -1.186 1.036\nC -1.352 -0.748 1.755\nC 0.090 -0.809 2.106\nO 0.500 -1.048 3.205\nC 0.906 -0.583 0.886\nC 2.882 -0.503 -1.448\nO 2.653 0.380 2.128\nC 3.025 0.287 -0.318\nC 3.732 -0.369 -2.513\nC 4.735 0.634 -2.441\nC 4.728 1.588 -1.388\nC 3.995 1.331 -0.253\nC 2.162 -0.015 0.896\nC -1.854 -0.095 0.549\nC -1.605 1.747 -0.996\nC -3.171 -0.420 0.174\nO -1.041 0.760 -0.135\nC -4.022 0.031 -0.856\nC -5.231 -0.575 -0.713\nC -3.587 -1.743 2.266\nH -4.234 -2.330 2.953\nH -1.787 -1.825 3.577\nH 0.460 -0.762 -0.063\nH -0.766 2.364 -1.261\nH 2.003 -1.194 -1.541\nH 3.703 -1.018 -3.400\nH 5.418 0.902 -3.203\nH 5.428 2.428 -1.383\nH 3.546 0.998 2.171\nH 4.031 1.980 0.639\nH -2.386 2.272 -0.502\nH -2.064 1.319 -1.909\nH -6.158 -0.617 -1.203\nH -3.740 0.774 -1.595[\\XYZ]\nAssistant: The total energy on the {\\omega}B97X-D3\/def2-TZVP level of theory of the chemical is -995.21527 Hartree."}", "/scratch/micpie/export/orbnet_denali/train_1-7.jsonl": "{"text":"User: I want to know the {\\omega}B97X-D3\/def2-TZVP total energy of the chemical with DeepSMILES [H].[H].[H].[H]COCC[H])COC[H])[H])))COC[H])[H])[H])))C[H])C6COCC[H])COC[H])[H])[H])))C[H])CO[H]))C6CO)C%10[H].\nAssistant: Do you have the XYZ file file of a conformer of the chemical?\nUser: Here it is: [XYZ]\n44\nH18 C19 O7\nC -6.542 -1.167 -2.752\nO -5.088 -0.998 -2.662\nC -4.441 -0.523 -1.609\nC -5.113 -0.193 -0.439\nC -4.456 0.341 0.675\nO -5.062 0.788 1.771\nC -3.046 0.424 0.596\nC -2.270 0.793 1.781\nO -2.884 1.013 2.887\nC -0.767 0.911 1.585\nC -0.182 0.474 0.404\nC 1.284 0.202 0.252\nC 1.779 -1.107 -0.151\nC 3.174 -1.311 -0.258\nO 0.866 -2.147 -0.290\nC 1.390 -3.379 -0.673\nC 4.080 -0.279 0.061\nO 5.353 -0.605 -0.052\nC 6.532 -0.133 0.854\nC 3.569 1.026 0.431\nC 2.234 1.310 0.481\nO 4.421 
1.977 0.840\nC 3.980 3.085 1.606\nO -0.928 0.131 -0.704\nC -2.313 -0.001 -0.611\nC -3.033 -0.518 -1.714\nH -6.692 -1.677 -3.713\nH -7.161 -0.227 -2.602\nH -6.898 -1.863 -2.014\nH -6.212 -0.192 -0.339\nH -4.311 1.050 2.378\nH -0.180 1.210 2.398\nH 0.568 -3.844 -1.348\nH 2.404 -3.406 -1.384\nH 3.595 -2.307 -0.454\nH 1.694 -4.046 0.308\nH 6.205 -0.100 1.962\nH 6.988 1.024 0.533\nH 7.342 -0.883 0.699\nH 4.810 3.631 1.927\nH 3.280 3.710 1.078\nH 1.707 2.303 0.631\nH 3.491 2.679 2.505\nH -2.446 -0.865 -2.521[\\XYZ]\nAssistant: The total energy on the {\\omega}B97X-D3\/def2-TZVP level of theory of the chemical is -1261.39283 Hartree."} {"text":"User: I have to know the {\\omega}B97X-D3\/def2-TZVP total energy of the molecule with SMILES [H]C1C([H])C(Cl)C([H])C(N([H])C(O[K])C([H])([H])SC([H])([H])C(O)O)C1[H].\nAssistant: Do you have the XYZ file file of a conformer of the molecule?\nUser: I have it: [XYZ]\n26\nK1 H9 C10 S1 N1 Cl1 O3\nO 3.679 3.606 -0.112\nC 2.706 2.996 0.432\nO 1.620 3.459 0.748\nC 2.948 1.489 0.744\nS 3.940 0.690 -0.552\nC 5.486 0.389 0.329\nC 6.311 1.592 0.739\nO 7.269 1.394 1.510\nN 6.002 2.792 0.234\nC 6.782 3.922 0.518\nC 8.183 3.867 0.486\nC 8.934 5.009 0.745\nC 8.307 6.218 1.021\nC 6.917 6.274 1.018\nCl 6.132 7.777 1.324\nC 6.154 5.144 0.770\nK 9.103 2.257 2.626\nH 1.990 0.976 0.829\nH 3.460 1.424 1.706\nH 6.108 -0.187 -0.364\nH 5.337 -0.233 1.218\nH 4.911 3.050 -0.024\nH 8.695 2.938 0.214\nH 10.025 4.966 0.679\nH 8.887 7.113 1.213\nH 5.073 5.217 0.760[\\XYZ]\nAssistant: The total energy on the {\\omega}B97X-D3\/def2-TZVP level of theory of the molecule is -2125.39705 Hartree."}", "/scratch/micpie/export/orbnet_denali/train_1-4.jsonl": "{"text":"Task: Return the total energy of a compound computed at the GFN1-xTB level of theory.\nDescription: The compound has the XYZ file [XYZ]\n44\nH18 C19 O7\nC -6.542 -1.167 -2.752\nO -5.088 -0.998 -2.662\nC -4.441 -0.523 -1.609\nC -5.113 -0.193 -0.439\nC -4.456 0.341 0.675\nO -5.062 0.788 1.771\nC -3.046 0.424 
0.596\nC -2.270 0.793 1.781\nO -2.884 1.013 2.887\nC -0.767 0.911 1.585\nC -0.182 0.474 0.404\nC 1.284 0.202 0.252\nC 1.779 -1.107 -0.151\nC 3.174 -1.311 -0.258\nO 0.866 -2.147 -0.290\nC 1.390 -3.379 -0.673\nC 4.080 -0.279 0.061\nO 5.353 -0.605 -0.052\nC 6.532 -0.133 0.854\nC 3.569 1.026 0.431\nC 2.234 1.310 0.481\nO 4.421 1.977 0.840\nC 3.980 3.085 1.606\nO -0.928 0.131 -0.704\nC -2.313 -0.001 -0.611\nC -3.033 -0.518 -1.714\nH -6.692 -1.677 -3.713\nH -7.161 -0.227 -2.602\nH -6.898 -1.863 -2.014\nH -6.212 -0.192 -0.339\nH -4.311 1.050 2.378\nH -0.180 1.210 2.398\nH 0.568 -3.844 -1.348\nH 2.404 -3.406 -1.384\nH 3.595 -2.307 -0.454\nH 1.694 -4.046 0.308\nH 6.205 -0.100 1.962\nH 6.988 1.024 0.533\nH 7.342 -0.883 0.699\nH 4.810 3.631 1.927\nH 3.280 3.710 1.078\nH 1.707 2.303 0.631\nH 3.491 2.679 2.505\nH -2.446 -0.865 -2.521[\\XYZ].\nAnswer: -78.62736 Hartree"} {"text":"Task: Return the total energy of a chemical structure computed at the GFN1-xTB level of theory.\nDescription: The chemical structure has the XYZ file [XYZ]\n26\nK1 H9 C10 S1 N1 Cl1 O3\nO 3.679 3.606 -0.112\nC 2.706 2.996 0.432\nO 1.620 3.459 0.748\nC 2.948 1.489 0.744\nS 3.940 0.690 -0.552\nC 5.486 0.389 0.329\nC 6.311 1.592 0.739\nO 7.269 1.394 1.510\nN 6.002 2.792 0.234\nC 6.782 3.922 0.518\nC 8.183 3.867 0.486\nC 8.934 5.009 0.745\nC 8.307 6.218 1.021\nC 6.917 6.274 1.018\nCl 6.132 7.777 1.324\nC 6.154 5.144 0.770\nK 9.103 2.257 2.626\nH 1.990 0.976 0.829\nH 3.460 1.424 1.706\nH 6.108 -0.187 -0.364\nH 5.337 -0.233 1.218\nH 4.911 3.050 -0.024\nH 8.695 2.938 0.214\nH 10.025 4.966 0.679\nH 8.887 7.113 1.213\nH 5.073 5.217 0.760[\\XYZ].\nAnswer: -49.42847 Hartree"}", "/scratch/micpie/export/orbnet_denali/test_0-3.jsonl": "{"text":"Question: What's the structure of a conformer of the chemical with SMILES [H].[H]C1CC([H])C(I)C(C([H])([H])C([H])([H])N([H])([H])C2NC3C([H])C([H])C([H])C([H])C3C(O)O2)C1[H]?\nConstraint: Return a MOL3000 file.\nAnswer: [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM 
V30 BEGIN CTAB\nM V30 COUNTS 35 36 0 0 0\nM V30 BEGIN ATOM\nM V30 1 O -2.054266 -1.196150 -3.087510 0 VAL=1\nM V30 2 C -2.281387 -1.177483 -1.852283 0 VAL=3\nM V30 3 O -1.345104 -1.785396 -0.954835 0\nM V30 4 C -1.194674 -1.440191 0.309890 0 VAL=3\nM V30 5 N -0.079276 -2.117958 0.983158 0 VAL=4\nM V30 6 C 1.270055 -1.677267 0.474989 0\nM V30 7 C 1.605770 -0.208528 0.921163 0\nM V30 8 C 2.886089 0.259559 0.254579 0 VAL=3\nM V30 9 C 4.164401 0.034162 0.931672 0 VAL=3\nM V30 10 C 5.443515 0.190013 0.245215 0 VAL=3\nM V30 11 C 5.407544 0.755291 -1.039873 0 VAL=2\nM V30 12 C 4.142159 1.095587 -1.658191 0 VAL=3\nM V30 13 C 2.917337 0.730822 -1.039298 0 VAL=3\nM V30 14 I 1.044020 1.177263 -2.066873 0\nM V30 15 N -1.978136 -0.700310 0.984952 0 VAL=2\nM V30 16 C -3.027842 0.008817 0.219545 0 VAL=3\nM V30 17 C -3.810463 0.979761 0.904968 0 VAL=3\nM V30 18 C -4.892241 1.489612 0.234887 0 VAL=3\nM V30 19 C -5.164823 1.141383 -1.079346 0 VAL=3\nM V30 20 C -4.379035 0.255593 -1.801859 0 VAL=3\nM V30 21 C -3.237117 -0.322821 -1.126254 0 VAL=3\nM V30 22 H -0.280495 -3.133835 0.807839 0\nM V30 23 H -0.184767 -1.977743 1.970289 0\nM V30 24 H 1.091192 -1.731322 -0.641273 0\nM V30 25 H 1.971928 -2.409176 0.793404 0\nM V30 26 H 1.741003 -0.181720 2.060080 0\nM V30 27 H 0.810461 0.503467 0.616822 0\nM V30 28 H 4.154523 -0.341056 1.973095 0\nM V30 29 H 6.395171 0.002597 0.813436 0\nM V30 30 H 6.404482 0.853163 -1.690186 0 VAL=-1\nM V30 31 H 4.156614 1.477126 -2.660614 0\nM V30 32 H -3.480375 1.259093 1.950976 0\nM V30 33 H -5.547253 2.250300 0.719758 0\nM V30 34 H -6.060807 1.537416 -1.580523 0\nM V30 35 H -4.535733 -0.007373 -2.826767 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 2 3\nM V30 3 1 2 21\nM V30 4 1 3 4\nM V30 5 1 4 5\nM V30 6 1 4 15\nM V30 7 1 5 6\nM V30 8 1 5 22\nM V30 9 1 5 23\nM V30 10 1 6 7\nM V30 11 1 6 24\nM V30 12 1 6 25\nM V30 13 1 7 8\nM V30 14 1 7 26\nM V30 15 1 7 27\nM V30 16 1 8 9\nM V30 17 1 8 13\nM V30 18 1 9 10\nM V30 19 1 9 28\nM V30 20 1 10 
11\nM V30 21 1 10 29\nM V30 22 1 11 12\nM V30 23 1 12 13\nM V30 24 1 12 31\nM V30 25 1 13 14\nM V30 26 1 15 16\nM V30 27 1 16 17\nM V30 28 1 16 21\nM V30 29 1 17 18\nM V30 30 1 17 32\nM V30 31 1 18 19\nM V30 32 1 18 33\nM V30 33 1 19 20\nM V30 34 1 19 34\nM V30 35 1 20 21\nM V30 36 1 20 35\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]"} {"text":"Question: What is the structure of a conformer of the chemical structure with InChI InChI=1S\/C19H34O7.H\/c1-22-10-5-12(20)19-13(21)8-15(26-18(19)6-10)11-7-16(24-3)17(25-4)9-14(11)23-2;\/h10-21H,5-9H2,1-4H3;?\nConstraint: Return a MOL3000 file.\nAnswer: [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 44 45 0 0 0\nM V30 BEGIN ATOM\nM V30 1 C -5.666262 2.701421 0.538356 0\nM V30 2 O -5.906930 1.500161 -0.218301 0\nM V30 3 C -4.925098 0.590626 -0.544385 0 VAL=3\nM V30 4 C -5.367559 -0.570919 -1.168490 0 VAL=3\nM V30 5 C -4.364895 -1.567799 -1.399363 0 VAL=3\nM V30 6 O -4.759193 -2.607156 -2.108141 0\nM V30 7 C -3.030767 -1.353927 -1.063910 0 VAL=3\nM V30 8 C -2.106177 -2.460592 -1.197935 0 VAL=3\nM V30 9 O -2.420433 -3.606689 -1.636577 0 VAL=1\nM V30 10 C -0.707580 -2.197951 -0.814233 0 VAL=3\nM V30 11 C -0.475145 -0.973341 -0.268985 0 VAL=3\nM V30 12 C 0.819532 -0.487201 0.148225 0 VAL=3\nM V30 13 C 1.052453 0.941118 0.365790 0 VAL=3\nM V30 14 C 2.279884 1.557918 0.481134 0 VAL=3\nM V30 15 O 2.201336 2.924299 0.493574 0\nM V30 16 C 3.395673 3.598792 0.922704 0\nM V30 17 C 3.479582 0.694029 0.442623 0 VAL=3\nM V30 18 O 4.759702 1.207887 0.610637 0\nM V30 19 C 6.014420 0.320519 0.235549 0\nM V30 20 C 3.271183 -0.696881 0.372413 0 VAL=3\nM V30 21 C 1.997699 -1.328284 0.250904 0 VAL=3\nM V30 22 O 1.915043 -2.662292 0.130583 0\nM V30 23 C 2.939014 -3.385672 0.851259 0 VAL=3\nM V30 24 O -1.395254 0.020156 -0.160676 0\nM V30 25 C -2.680424 -0.140124 -0.486334 0 VAL=3\nM V30 26 C -3.606098 0.893937 -0.329144 0 VAL=3\nM V30 27 H -6.575628 3.227530 0.666084 0\nM V30 28 H -5.246614 2.421755 
1.489806 0\nM V30 29 H -4.953780 3.418456 0.054871 0\nM V30 30 H -6.407041 -0.714281 -1.299592 0\nM V30 31 H -3.957249 -3.333254 -2.023808 0\nM V30 32 H 0.056784 -2.945390 -0.916721 0\nM V30 33 H 0.159711 1.639964 0.360277 0\nM V30 34 H 2.986392 4.675898 1.174702 0\nM V30 35 H 4.179301 3.539757 0.120143 0\nM V30 36 H 3.815451 3.135106 1.855307 0\nM V30 37 H 5.897219 -0.206066 -0.788253 0\nM V30 38 H 6.240702 -0.308954 1.144277 0\nM V30 39 H 6.897288 0.963534 0.151634 0\nM V30 40 H 4.246390 -1.221162 0.270064 0\nM V30 41 H 2.981206 -3.000641 1.979397 0 VAL=-1\nM V30 42 H 2.757508 -4.538916 0.932585 0\nM V30 43 H 3.979403 -3.300897 0.444144 0\nM V30 44 H -3.230430 1.837992 0.075730 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 1 27\nM V30 3 1 1 28\nM V30 4 1 1 29\nM V30 5 1 2 3\nM V30 6 1 3 4\nM V30 7 1 3 26\nM V30 8 1 4 5\nM V30 9 1 4 30\nM V30 10 1 5 6\nM V30 11 1 5 7\nM V30 12 1 6 31\nM V30 13 1 7 8\nM V30 14 1 7 25\nM V30 15 1 8 9\nM V30 16 1 8 10\nM V30 17 1 10 11\nM V30 18 1 10 32\nM V30 19 1 11 12\nM V30 20 1 11 24\nM V30 21 1 12 13\nM V30 22 1 12 21\nM V30 23 1 13 14\nM V30 24 1 13 33\nM V30 25 1 14 15\nM V30 26 1 14 17\nM V30 27 1 15 16\nM V30 28 1 16 34\nM V30 29 1 16 35\nM V30 30 1 16 36\nM V30 31 1 17 18\nM V30 32 1 17 20\nM V30 33 1 18 19\nM V30 34 1 19 37\nM V30 35 1 19 38\nM V30 36 1 19 39\nM V30 37 1 20 21\nM V30 38 1 20 40\nM V30 39 1 21 22\nM V30 40 1 22 23\nM V30 41 1 23 42\nM V30 42 1 23 43\nM V30 43 1 24 25\nM V30 44 1 25 26\nM V30 45 1 26 44\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]"}", "/scratch/micpie/export/orbnet_denali/train_0-0.jsonl": "{"text":"The compound with SMILES [H].[H]C1CC2C(O)OC(N([H])([H])C([H])([H])C([H])([H])C3C([H])C([H])C([H])C([H])C3I)NC2C([H])C1[H] has a charge of 1."} {"text":"The chemical structure with DeepSMILES [H].[H]OCC[H])COC[H])[H])[H])))C[H])COCCC[H])COC[H])[H])))COC[H])[H])[H])))C[H])C6OC[H])[H])[H])))))))))C[H])CO)C%106 has a charge of 0."}", 
"/scratch/micpie/export/orbnet_denali/test_0-6.jsonl": "{"text":"User: I must know the GFN1-xTB total energy of the molecule with SMILES [H].[H]C1CC([H])C(I)C(C([H])([H])C([H])([H])N([H])([H])C2NC3C([H])C([H])C([H])C([H])C3C(O)O2)C1[H].\nAssistant: Do you have the XYZ file file of a conformer of the molecule?\nUser: I have it: [XYZ]\n35\nH14 C16 I1 N2 O2\nO -2.054 -1.196 -3.088\nC -2.281 -1.177 -1.852\nO -1.345 -1.785 -0.955\nC -1.195 -1.440 0.310\nN -0.079 -2.118 0.983\nC 1.270 -1.677 0.475\nC 1.606 -0.209 0.921\nC 2.886 0.260 0.255\nC 4.164 0.034 0.932\nC 5.444 0.190 0.245\nC 5.408 0.755 -1.040\nC 4.142 1.096 -1.658\nC 2.917 0.731 -1.039\nI 1.044 1.177 -2.067\nN -1.978 -0.700 0.985\nC -3.028 0.009 0.220\nC -3.810 0.980 0.905\nC -4.892 1.490 0.235\nC -5.165 1.141 -1.079\nC -4.379 0.256 -1.802\nC -3.237 -0.323 -1.126\nH -0.280 -3.134 0.808\nH -0.185 -1.978 1.970\nH 1.091 -1.731 -0.641\nH 1.972 -2.409 0.793\nH 1.741 -0.182 2.060\nH 0.810 0.503 0.617\nH 4.155 -0.341 1.973\nH 6.395 0.003 0.813\nH 6.404 0.853 -1.690\nH 4.157 1.477 -2.661\nH -3.480 1.259 1.951\nH -5.547 2.250 0.720\nH -6.061 1.537 -1.581\nH -4.536 -0.007 -2.827[\\XYZ]\nAssistant: The GFN1-xTB total energy of the molecule is -60.60328 Hartree."} {"text":"User: I want to know the GFN1-xTB total energy of the molecule with SMILES [H].[H]OC1C([H])C(OC([H])([H])[H])C([H])C2OC(C3C([H])C(OC([H])([H])[H])C(OC([H])([H])[H])C([H])C3OC([H])[H])C([H])C(O)C12.\nAssistant: Do you have the XYZ file file of a conformer of the molecule?\nUser: I do: [XYZ]\n44\nH18 C19 O7\nC -5.666 2.701 0.538\nO -5.907 1.500 -0.218\nC -4.925 0.591 -0.544\nC -5.368 -0.571 -1.168\nC -4.365 -1.568 -1.399\nO -4.759 -2.607 -2.108\nC -3.031 -1.354 -1.064\nC -2.106 -2.461 -1.198\nO -2.420 -3.607 -1.637\nC -0.708 -2.198 -0.814\nC -0.475 -0.973 -0.269\nC 0.820 -0.487 0.148\nC 1.052 0.941 0.366\nC 2.280 1.558 0.481\nO 2.201 2.924 0.494\nC 3.396 3.599 0.923\nC 3.480 0.694 0.443\nO 4.760 1.208 0.611\nC 6.014 0.321 0.236\nC 3.271 -0.697 0.372\nC 
1.998 -1.328 0.251\nO 1.915 -2.662 0.131\nC 2.939 -3.386 0.851\nO -1.395 0.020 -0.161\nC -2.680 -0.140 -0.486\nC -3.606 0.894 -0.329\nH -6.576 3.228 0.666\nH -5.247 2.422 1.490\nH -4.954 3.418 0.055\nH -6.407 -0.714 -1.300\nH -3.957 -3.333 -2.024\nH 0.057 -2.945 -0.917\nH 0.160 1.640 0.360\nH 2.986 4.676 1.175\nH 4.179 3.540 0.120\nH 3.815 3.135 1.855\nH 5.897 -0.206 -0.788\nH 6.241 -0.309 1.144\nH 6.897 0.964 0.152\nH 4.246 -1.221 0.270\nH 2.981 -3.001 1.979\nH 2.758 -4.539 0.933\nH 3.979 -3.301 0.444\nH -3.230 1.838 0.076[\\XYZ]\nAssistant: The GFN1-xTB total energy of the molecule is -78.65904 Hartree."}", "/scratch/micpie/export/orbnet_denali/train_0-3.jsonl": "{"text":"Question: What's the structure of a conformer of the compound with DeepSMILES [H].[H]CCCCO)OCN[H])[H])C[H])[H])C[H])[H])CC[H])C[H])C[H])C[H])C6I))))))))))NC6C[H])C%10[H]?\nConstraint: Return a MOL3000 file.\nAnswer: [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 35 36 0 0 0\nM V30 BEGIN ATOM\nM V30 1 O -2.153680 -0.716978 -3.091233 0 VAL=1\nM V30 2 C -2.201794 -0.565108 -1.928952 0 VAL=3\nM V30 3 O -1.121587 -1.408945 -1.400725 0\nM V30 4 C -1.304914 -1.457047 -0.152549 0 VAL=3\nM V30 5 N -0.151803 -2.110850 0.660619 0 VAL=4\nM V30 6 C 1.218708 -1.646921 0.223521 0\nM V30 7 C 1.663890 -0.393992 0.961226 0\nM V30 8 C 2.777226 0.375275 0.332550 0 VAL=3\nM V30 9 C 3.935437 0.634091 1.154444 0 VAL=3\nM V30 10 C 5.097333 1.070228 0.547839 0 VAL=3\nM V30 11 C 5.102739 1.426457 -0.852298 0 VAL=3\nM V30 12 C 3.906078 1.418660 -1.526242 0 VAL=3\nM V30 13 C 2.849543 0.848438 -0.976896 0 VAL=3\nM V30 14 I 1.178750 0.659287 -2.176666 0\nM V30 15 N -1.874572 -0.756568 0.760269 0 VAL=2\nM V30 16 C -3.056017 -0.085368 0.312281 0 VAL=3\nM V30 17 C -3.963932 0.457575 1.172450 0 VAL=3\nM V30 18 C -5.088782 1.206006 0.786264 0 VAL=3\nM V30 19 C -5.206175 1.641951 -0.535501 0 VAL=3\nM V30 20 C -4.211578 1.060906 -1.374478 0 VAL=2\nM V30 21 C -3.158318 0.147622 -1.083332 0 
VAL=3\nM V30 22 H -0.110666 -3.134052 0.807669 0\nM V30 23 H -0.336631 -1.780815 1.603513 0\nM V30 24 H 1.143671 -1.613228 -0.888808 0\nM V30 25 H 1.963380 -2.358627 0.422809 0\nM V30 26 H 2.026800 -0.644065 1.936458 0\nM V30 27 H 0.823696 0.315143 1.112059 0\nM V30 28 H 3.723787 0.534053 2.253242 0\nM V30 29 H 5.898107 1.525723 1.184111 0\nM V30 30 H 5.974801 1.975687 -1.191153 0\nM V30 31 H 3.709486 2.064336 -2.390513 0\nM V30 32 H -3.781318 0.112006 2.213799 0\nM V30 33 H -5.796019 1.666495 1.521045 0\nM V30 34 H -5.857624 2.404202 -0.949320 0\nM V30 35 H -4.356776 1.395405 -2.508765 0 VAL=-1\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 2 3\nM V30 3 1 2 21\nM V30 4 1 3 4\nM V30 5 1 4 5\nM V30 6 1 4 15\nM V30 7 1 5 6\nM V30 8 1 5 22\nM V30 9 1 5 23\nM V30 10 1 6 7\nM V30 11 1 6 24\nM V30 12 1 6 25\nM V30 13 1 7 8\nM V30 14 1 7 26\nM V30 15 1 7 27\nM V30 16 1 8 9\nM V30 17 1 8 13\nM V30 18 1 9 10\nM V30 19 1 9 28\nM V30 20 1 10 11\nM V30 21 1 10 29\nM V30 22 1 11 12\nM V30 23 1 11 30\nM V30 24 1 12 13\nM V30 25 1 12 31\nM V30 26 1 13 14\nM V30 27 1 15 16\nM V30 28 1 16 17\nM V30 29 1 16 21\nM V30 30 1 17 18\nM V30 31 1 17 32\nM V30 32 1 18 19\nM V30 33 1 18 33\nM V30 34 1 19 20\nM V30 35 1 19 34\nM V30 36 1 20 21\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]"} {"text":"Question: What's the structure of a conformer of the molecule with SELFIES [H].[H][O][C][C][Branch1][C][H][C][Branch1][#Branch2][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][C][H][C][O][C][Branch2][Ring2][O][C][C][Branch1][C][H][C][Branch1][#Branch1][O][C][Branch1][C][H][H][C][Branch1][#Branch2][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][C][H][C][Ring1][P][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][C][H][C][Branch1][C][O][C][Ring2][Ring2][=Branch2][Ring2][Ring1][=C]?\nConstraint: Return a MOL3000 file.\nAnswer: [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 44 45 0 0 0\nM V30 BEGIN ATOM\nM V30 1 C -5.598069 2.587463 
0.816112 0\nM V30 2 O -5.926279 1.423201 0.013487 0\nM V30 3 C -4.939257 0.520715 -0.270448 0 VAL=3\nM V30 4 C -5.418339 -0.708871 -0.834953 0 VAL=3\nM V30 5 C -4.460928 -1.616330 -1.307307 0 VAL=3\nM V30 6 O -4.769902 -2.786215 -1.784875 0\nM V30 7 C -3.069715 -1.327862 -1.135640 0 VAL=3\nM V30 8 C -2.045605 -2.383646 -1.435564 0 VAL=3\nM V30 9 O -2.386591 -3.492921 -1.881717 0 VAL=1\nM V30 10 C -0.726943 -2.031178 -1.027477 0 VAL=3\nM V30 11 C -0.429451 -0.822243 -0.590211 0 VAL=3\nM V30 12 C 0.908707 -0.400009 -0.104537 0 VAL=3\nM V30 13 C 1.179448 1.028247 -0.092320 0 VAL=3\nM V30 14 C 2.416006 1.533271 0.227699 0 VAL=3\nM V30 15 O 2.518631 2.880789 -0.147347 0\nM V30 16 C 3.669353 3.566639 0.196023 0 VAL=3\nM V30 17 C 3.432276 0.679367 0.616774 0 VAL=3\nM V30 18 O 4.557859 1.195192 1.054024 0\nM V30 19 C 5.498678 0.293934 1.784608 0\nM V30 20 C 3.242116 -0.665165 0.605697 0 VAL=3\nM V30 21 C 2.027000 -1.198334 0.254320 0 VAL=3\nM V30 22 O 1.812734 -2.546314 0.257591 0\nM V30 23 C 2.921657 -3.496433 0.237447 0\nM V30 24 O -1.368827 0.119648 -0.393646 0\nM V30 25 C -2.687993 -0.147211 -0.589957 0 VAL=3\nM V30 26 C -3.591325 0.803402 -0.177304 0 VAL=3\nM V30 27 H -6.605284 2.908328 1.171892 0\nM V30 28 H -5.051636 2.256920 1.748106 0\nM V30 29 H -5.029772 3.357393 0.311305 0\nM V30 30 H -6.531699 -0.751557 -1.002652 0\nM V30 31 H -3.808600 -3.286943 -1.759525 0\nM V30 32 H 0.075362 -2.843786 -1.034405 0\nM V30 33 H 0.410381 1.813934 -0.269900 0\nM V30 34 H 3.582631 4.677410 -0.227573 0 VAL=-1\nM V30 35 H 4.585414 3.040285 -0.264177 0\nM V30 36 H 3.881041 3.554833 1.284081 0\nM V30 37 H 5.951276 -0.389658 1.067813 0\nM V30 38 H 4.944263 -0.289879 2.594597 0\nM V30 39 H 6.341782 0.946805 2.161473 0\nM V30 40 H 3.965796 -1.351397 0.971863 0\nM V30 41 H 3.544712 -3.572239 1.172532 0\nM V30 42 H 2.440330 -4.506529 0.142552 0\nM V30 43 H 3.593189 -3.328953 -0.657845 0\nM V30 44 H -3.248226 1.831031 0.123651 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 
1 27\nM V30 3 1 1 28\nM V30 4 1 1 29\nM V30 5 1 2 3\nM V30 6 1 3 4\nM V30 7 1 3 26\nM V30 8 1 4 5\nM V30 9 1 4 30\nM V30 10 1 5 6\nM V30 11 1 5 7\nM V30 12 1 6 31\nM V30 13 1 7 8\nM V30 14 1 7 25\nM V30 15 1 8 9\nM V30 16 1 8 10\nM V30 17 1 10 11\nM V30 18 1 10 32\nM V30 19 1 11 12\nM V30 20 1 11 24\nM V30 21 1 12 13\nM V30 22 1 12 21\nM V30 23 1 13 14\nM V30 24 1 13 33\nM V30 25 1 14 15\nM V30 26 1 14 17\nM V30 27 1 15 16\nM V30 28 1 16 35\nM V30 29 1 16 36\nM V30 30 1 17 18\nM V30 31 1 17 20\nM V30 32 1 18 19\nM V30 33 1 19 37\nM V30 34 1 19 38\nM V30 35 1 19 39\nM V30 36 1 20 21\nM V30 37 1 20 40\nM V30 38 1 21 22\nM V30 39 1 22 23\nM V30 40 1 23 41\nM V30 41 1 23 42\nM V30 42 1 23 43\nM V30 43 1 24 25\nM V30 44 1 25 26\nM V30 45 1 26 44\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]"}", "/scratch/micpie/export/orbnet_denali/test_1-4.jsonl": "{"text":"Task: Return the total energy of a chemical computed at the GFN1-xTB level of theory.\nDescription: The chemical has the XYZ file [XYZ]\n44\nH18 C19 O7\nC -6.444 -1.225 -2.667\nO -5.032 -1.257 -2.558\nC -4.412 -0.679 -1.504\nC -5.091 -0.023 -0.474\nC -4.376 0.545 0.576\nO -5.018 1.170 1.557\nC -2.962 0.461 0.601\nC -2.198 1.043 1.694\nO -2.735 1.639 2.634\nC -0.768 0.855 1.586\nC -0.210 0.202 0.533\nC 1.237 0.002 0.376\nC 1.781 -1.141 -0.237\nC 3.164 -1.258 -0.344\nO 0.928 -2.101 -0.660\nC 1.411 -3.284 -1.265\nC 4.029 -0.275 0.133\nO 5.341 -0.543 -0.059\nC 6.381 0.326 0.361\nC 3.490 0.872 0.752\nC 2.114 0.983 0.860\nO 4.349 1.836 1.215\nC 3.840 2.982 1.870\nO -0.958 -0.304 -0.464\nC -2.311 -0.205 -0.448\nC -3.018 -0.773 -1.497\nH -6.690 -1.754 -3.589\nH -6.816 -0.196 -2.731\nH -6.921 -1.732 -1.821\nH -6.169 0.059 -0.465\nH -4.316 1.497 2.201\nH -0.152 1.234 2.391\nH 0.530 -3.878 -1.512\nH 1.968 -3.066 -2.184\nH 3.617 -2.125 -0.805\nH 2.050 -3.854 -0.580\nH 6.371 0.467 1.446\nH 6.316 1.298 -0.137\nH 7.313 -0.165 0.072\nH 4.708 3.575 2.163\nH 3.206 3.579 1.204\nH 1.685 1.869 1.308\nH 3.270 2.712 2.768\nH -2.502 
-1.283 -2.298[\\XYZ].\nAnswer: -78.73657 Hartree"} {"text":"Task: Return the total energy of a molecule computed at the GFN1-xTB level of theory.\nDescription: The molecule has the XYZ file [XYZ]\n26\nLi1 H9 C10 S1 N1 Cl1 O3\nO 4.932 0.704 0.314\nC 4.841 1.329 -0.734\nO 5.616 2.232 -1.168\nC 3.620 0.950 -1.636\nS 3.203 2.210 -2.877\nC 4.122 1.576 -4.294\nC 5.529 2.120 -4.531\nO 5.925 1.990 -5.727\nN 6.199 2.648 -3.540\nC 7.514 3.156 -3.584\nC 8.133 3.450 -2.367\nC 9.427 3.944 -2.343\nC 10.125 4.157 -3.524\nC 9.497 3.881 -4.728\nCl 10.344 4.127 -6.210\nC 8.198 3.400 -4.774\nLi 4.878 1.192 -6.789\nH 2.752 0.835 -1.001\nH 3.813 -0.007 -2.114\nH 3.555 1.873 -5.217\nH 4.196 0.484 -4.269\nH 5.808 2.482 -2.478\nH 7.607 3.284 -1.434\nH 9.903 4.167 -1.394\nH 11.142 4.529 -3.508\nH 7.744 3.227 -5.738[\\XYZ].\nAnswer: -49.40739 Hartree"}", "/scratch/micpie/export/orbnet_denali/test_1-1.jsonl": "{"text":"Question: What is the structure of a conformer of the chemical structure with SMILES [H]OC1C([H])C(OC([H])([H])[H])C([H])C2OC(C3C([H])C(OC([H])([H])[H])C(OC([H])([H])[H])C([H])C3OC([H])([H])[H])C([H])C(O)C12?\nConstraint: Return a XYZ file.\nAnswer: [XYZ]\n44\nH18 C19 O7\nC -6.444 -1.225 -2.667\nO -5.032 -1.257 -2.558\nC -4.412 -0.679 -1.504\nC -5.091 -0.023 -0.474\nC -4.376 0.545 0.576\nO -5.018 1.170 1.557\nC -2.962 0.461 0.601\nC -2.198 1.043 1.694\nO -2.735 1.639 2.634\nC -0.768 0.855 1.586\nC -0.210 0.202 0.533\nC 1.237 0.002 0.376\nC 1.781 -1.141 -0.237\nC 3.164 -1.258 -0.344\nO 0.928 -2.101 -0.660\nC 1.411 -3.284 -1.265\nC 4.029 -0.275 0.133\nO 5.341 -0.543 -0.059\nC 6.381 0.326 0.361\nC 3.490 0.872 0.752\nC 2.114 0.983 0.860\nO 4.349 1.836 1.215\nC 3.840 2.982 1.870\nO -0.958 -0.304 -0.464\nC -2.311 -0.205 -0.448\nC -3.018 -0.773 -1.497\nH -6.690 -1.754 -3.589\nH -6.816 -0.196 -2.731\nH -6.921 -1.732 -1.821\nH -6.169 0.059 -0.465\nH -4.316 1.497 2.201\nH -0.152 1.234 2.391\nH 0.530 -3.878 -1.512\nH 1.968 -3.066 -2.184\nH 3.617 -2.125 -0.805\nH 2.050 -3.854 -0.580\nH 
6.371 0.467 1.446\nH 6.316 1.298 -0.137\nH 7.313 -0.165 0.072\nH 4.708 3.575 2.163\nH 3.206 3.579 1.204\nH 1.685 1.869 1.308\nH 3.270 2.712 2.768\nH -2.502 -1.283 -2.298[\\XYZ]"} {"text":"Question: What is the structure of a conformer of the compound with InChI InChI=1S\/C10H19ClNO3S.Li\/c11-7-2-1-3-8(4-7)12-9(13)5-16-6-10(14)15;\/h7-10,12,14-15H,1-6H2;\/q-1;+1?\nConstraint: Return a XYZ file.\nAnswer: [XYZ]\n26\nLi1 H9 C10 S1 N1 Cl1 O3\nO 4.932 0.704 0.314\nC 4.841 1.329 -0.734\nO 5.616 2.232 -1.168\nC 3.620 0.950 -1.636\nS 3.203 2.210 -2.877\nC 4.122 1.576 -4.294\nC 5.529 2.120 -4.531\nO 5.925 1.990 -5.727\nN 6.199 2.648 -3.540\nC 7.514 3.156 -3.584\nC 8.133 3.450 -2.367\nC 9.427 3.944 -2.343\nC 10.125 4.157 -3.524\nC 9.497 3.881 -4.728\nCl 10.344 4.127 -6.210\nC 8.198 3.400 -4.774\nLi 4.878 1.192 -6.789\nH 2.752 0.835 -1.001\nH 3.813 -0.007 -2.114\nH 3.555 1.873 -5.217\nH 4.196 0.484 -4.269\nH 5.808 2.482 -2.478\nH 7.607 3.284 -1.434\nH 9.903 4.167 -1.394\nH 11.142 4.529 -3.508\nH 7.744 3.227 -5.738[\\XYZ]"}", "/scratch/micpie/export/orbnet_denali/train_1-6.jsonl": "{"text":"User: I would like to know the GFN1-xTB total energy of the chemical with canonical SMILES COC1CC(O)C2C(O)CC(C3CC(OC)C(OC)CC3OC)OC2C1.[H].[H].[H].\nAssistant: Do you have the XYZ file file of a conformer of the chemical?\nUser: I do: [XYZ]\n44\nH18 C19 O7\nC -6.542 -1.167 -2.752\nO -5.088 -0.998 -2.662\nC -4.441 -0.523 -1.609\nC -5.113 -0.193 -0.439\nC -4.456 0.341 0.675\nO -5.062 0.788 1.771\nC -3.046 0.424 0.596\nC -2.270 0.793 1.781\nO -2.884 1.013 2.887\nC -0.767 0.911 1.585\nC -0.182 0.474 0.404\nC 1.284 0.202 0.252\nC 1.779 -1.107 -0.151\nC 3.174 -1.311 -0.258\nO 0.866 -2.147 -0.290\nC 1.390 -3.379 -0.673\nC 4.080 -0.279 0.061\nO 5.353 -0.605 -0.052\nC 6.532 -0.133 0.854\nC 3.569 1.026 0.431\nC 2.234 1.310 0.481\nO 4.421 1.977 0.840\nC 3.980 3.085 1.606\nO -0.928 0.131 -0.704\nC -2.313 -0.001 -0.611\nC -3.033 -0.518 -1.714\nH -6.692 -1.677 -3.713\nH -7.161 -0.227 -2.602\nH -6.898 
-1.863 -2.014\nH -6.212 -0.192 -0.339\nH -4.311 1.050 2.378\nH -0.180 1.210 2.398\nH 0.568 -3.844 -1.348\nH 2.404 -3.406 -1.384\nH 3.595 -2.307 -0.454\nH 1.694 -4.046 0.308\nH 6.205 -0.100 1.962\nH 6.988 1.024 0.533\nH 7.342 -0.883 0.699\nH 4.810 3.631 1.927\nH 3.280 3.710 1.078\nH 1.707 2.303 0.631\nH 3.491 2.679 2.505\nH -2.446 -0.865 -2.521[\\XYZ]\nAssistant: The GFN1-xTB total energy of the chemical is -78.62736 Hartree."} {"text":"User: I have to know the GFN1-xTB total energy of the chemical with canonical SMILES OC(O)CSCC(NC1CCCC(Cl)C1)O[K].\nAssistant: Do you have the XYZ file file of a conformer of the chemical?\nUser: Yes: [XYZ]\n26\nK1 H9 C10 S1 N1 Cl1 O3\nO 3.679 3.606 -0.112\nC 2.706 2.996 0.432\nO 1.620 3.459 0.748\nC 2.948 1.489 0.744\nS 3.940 0.690 -0.552\nC 5.486 0.389 0.329\nC 6.311 1.592 0.739\nO 7.269 1.394 1.510\nN 6.002 2.792 0.234\nC 6.782 3.922 0.518\nC 8.183 3.867 0.486\nC 8.934 5.009 0.745\nC 8.307 6.218 1.021\nC 6.917 6.274 1.018\nCl 6.132 7.777 1.324\nC 6.154 5.144 0.770\nK 9.103 2.257 2.626\nH 1.990 0.976 0.829\nH 3.460 1.424 1.706\nH 6.108 -0.187 -0.364\nH 5.337 -0.233 1.218\nH 4.911 3.050 -0.024\nH 8.695 2.938 0.214\nH 10.025 4.966 0.679\nH 8.887 7.113 1.213\nH 5.073 5.217 0.760[\\XYZ]\nAssistant: The GFN1-xTB total energy of the chemical is -49.42847 Hartree."}", "/scratch/micpie/export/orbnet_denali/valid_1-3.jsonl": "{"text":"Question: What is the structure of a conformer of the compound with SMILES [H].[H].[H]COC1C([H])C(OC([H])([H])[H])C(OC([H])([H])[H])C([H])C1C1OC2C([H])C(OC([H])([H])[H])C([H])C(O[H])C2C(O)C1[H]?\nConstraint: Return a MOL3000 file.\nAnswer: [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 44 44 0 0 0\nM V30 BEGIN ATOM\nM V30 1 C -6.481493 -0.582633 -2.661362 0\nM V30 2 O -5.061657 -0.755738 -2.484112 0\nM V30 3 C -4.466928 -0.303211 -1.381590 0 VAL=3\nM V30 4 C -5.139244 0.378890 -0.381459 0 VAL=3\nM V30 5 C -4.411262 0.725903 0.787392 0 VAL=3\nM V30 6 O -5.076123 1.307775 
1.766363 0\nM V30 7 C -3.044567 0.598409 0.850965 0 VAL=3\nM V30 8 C -2.251924 1.108939 1.982346 0 VAL=3\nM V30 9 O -2.736242 1.713872 2.918306 0 VAL=1\nM V30 10 C -0.837134 0.838129 1.844298 0 VAL=3\nM V30 11 C -0.293907 0.172052 0.813469 0 VAL=3\nM V30 12 C 1.134038 -0.141298 0.610426 0 VAL=3\nM V30 13 C 1.513867 -1.332228 -0.066592 0 VAL=3\nM V30 14 C 2.827265 -1.677185 -0.275878 0 VAL=3\nM V30 15 O 3.038038 -2.878673 -0.878211 0\nM V30 16 C 4.363909 -3.212004 -1.413133 0\nM V30 17 C 3.873733 -0.698621 0.051242 0 VAL=3\nM V30 18 O 5.192596 -0.926089 -0.246033 0\nM V30 19 C 6.214561 0.039905 0.114097 0\nM V30 20 C 3.496254 0.469377 0.707044 0 VAL=3\nM V30 21 C 2.156574 0.721962 0.985484 0 VAL=3\nM V30 22 O 1.761478 1.876169 1.664281 0\nM V30 23 C 1.809219 3.101874 0.921749 0 VAL=2\nM V30 24 O -1.061792 -0.356887 -0.115690 0\nM V30 25 C -2.382767 -0.159200 -0.138641 0 VAL=3\nM V30 26 C -3.095438 -0.594458 -1.247228 0 VAL=3\nM V30 27 H -6.753224 -1.026269 -3.600323 0\nM V30 28 H -6.783386 0.450641 -2.592793 0\nM V30 29 H -7.069061 -1.132701 -1.886813 0\nM V30 30 H -6.186410 0.560398 -0.303268 0\nM V30 31 H -4.414659 1.560261 2.496384 0\nM V30 32 H -0.195200 1.204788 2.615897 0\nM V30 33 H 0.770481 -2.024363 -0.470120 0\nM V30 34 H 4.247471 -4.144185 -1.964442 0\nM V30 35 H 4.752967 -2.367847 -2.117078 0\nM V30 36 H 5.080423 -3.421567 -0.580127 0\nM V30 37 H 6.143232 0.311805 1.209675 0\nM V30 38 H 6.192730 0.978405 -0.511500 0\nM V30 39 H 7.170503 -0.468096 0.026902 0\nM V30 40 H 4.204932 1.247721 0.958156 0\nM V30 41 H 2.681503 3.058567 0.117776 0 VAL=-1\nM V30 42 H 0.783999 3.293037 0.400725 0\nM V30 43 H 2.078201 4.044020 1.596125 0 VAL=-1\nM V30 44 H -2.675729 -1.184270 -2.042775 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 1 27\nM V30 3 1 1 28\nM V30 4 1 1 29\nM V30 5 1 2 3\nM V30 6 1 3 4\nM V30 7 1 3 26\nM V30 8 1 4 5\nM V30 9 1 4 30\nM V30 10 1 5 6\nM V30 11 1 5 7\nM V30 12 1 6 31\nM V30 13 1 7 8\nM V30 14 1 7 25\nM V30 15 1 8 9\nM V30 16 1 
8 10\nM V30 17 1 10 11\nM V30 18 1 10 32\nM V30 19 1 11 12\nM V30 20 1 11 24\nM V30 21 1 12 13\nM V30 22 1 12 21\nM V30 23 1 13 14\nM V30 24 1 13 33\nM V30 25 1 14 15\nM V30 26 1 14 17\nM V30 27 1 15 16\nM V30 28 1 16 34\nM V30 29 1 16 35\nM V30 30 1 16 36\nM V30 31 1 17 18\nM V30 32 1 17 20\nM V30 33 1 18 19\nM V30 34 1 19 37\nM V30 35 1 19 38\nM V30 36 1 19 39\nM V30 37 1 20 21\nM V30 38 1 20 40\nM V30 39 1 21 22\nM V30 40 1 22 23\nM V30 41 1 23 42\nM V30 42 1 24 25\nM V30 43 1 25 26\nM V30 44 1 26 44\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]"} {"text":"Question: What's the structure of a conformer of the chemical structure with DeepSMILES [H]CC[H])CCl)C[H])CN[H])CO[K]OCO)C[H])[H])SC8[H])[H]))))))))))C6[H]?\nConstraint: Return a MOL3000 file.\nAnswer: [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 26 27 0 0 0\nM V30 BEGIN ATOM\nM V30 1 O 3.248239 0.033887 -0.391745 0\nM V30 2 C 2.472007 0.812446 0.211449 0 VAL=3\nM V30 3 O 1.283736 1.037532 0.016417 0 VAL=1\nM V30 4 C 3.145168 1.611136 1.414818 0\nM V30 5 S 3.895054 3.156378 0.828443 0\nM V30 6 C 5.430957 2.611911 0.008724 0\nM V30 7 C 6.454634 2.332922 1.077292 0 VAL=3\nM V30 8 O 6.578906 1.193366 1.528913 0\nM V30 9 N 7.214774 3.326013 1.596816 0\nM V30 10 C 7.348733 4.676577 1.185771 0 VAL=3\nM V30 11 C 6.243963 5.492462 0.963312 0 VAL=3\nM V30 12 C 6.433019 6.819028 0.599759 0 VAL=3\nM V30 13 C 7.710883 7.343273 0.473481 0 VAL=3\nM V30 14 C 8.808042 6.529130 0.719216 0 VAL=3\nM V30 15 Cl 10.402513 7.155809 0.587770 0\nM V30 16 C 8.635531 5.198615 1.070234 0 VAL=3\nM V30 17 K 5.166826 -0.455021 0.690392 0 VAL=2\nM V30 18 H 3.885954 0.990336 1.957337 0\nM V30 19 H 2.368364 1.941840 2.107967 0\nM V30 20 H 5.732219 3.411466 -0.663717 0\nM V30 21 H 5.254436 1.687649 -0.569629 0\nM V30 22 H 7.885496 2.989458 2.283104 0\nM V30 23 H 5.238697 5.112407 1.101065 0\nM V30 24 H 5.571800 7.452258 0.418792 0\nM V30 25 H 7.859882 8.381367 0.209448 0\nM V30 26 H 9.497115 
4.572087 1.251923 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 1 17\nM V30 3 1 2 3\nM V30 4 1 2 4\nM V30 5 1 4 5\nM V30 6 1 4 18\nM V30 7 1 4 19\nM V30 8 1 5 6\nM V30 9 1 6 7\nM V30 10 1 6 20\nM V30 11 1 6 21\nM V30 12 1 7 8\nM V30 13 1 7 9\nM V30 14 1 8 17\nM V30 15 1 9 10\nM V30 16 1 9 22\nM V30 17 1 10 11\nM V30 18 1 10 16\nM V30 19 1 11 12\nM V30 20 1 11 23\nM V30 21 1 12 13\nM V30 22 1 12 24\nM V30 23 1 13 14\nM V30 24 1 13 25\nM V30 25 1 14 15\nM V30 26 1 14 16\nM V30 27 1 16 26\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]"}", "/scratch/micpie/export/orbnet_denali/valid_0-2.jsonl": "{"text":"Question: What is the structure of a conformer of the compound with SMILES [H]C1C([H])C([H])C(C([H])([H])C([H])([H])N([H])([H])C2NC3C([H])C([H])C([H])C([H])C3C(O)O2)C(I)C1[H]?\nConstraint: Return a MOL2000 file.\nAnswer: [V2000]\n\n ChemNLP 3D\n\n 35 37 0 0 0 0 0 0 0 0999 V2000\n -2.5401 -1.1905 -3.0246 O 0 0 0 0 0 1 0 0 0 0 0 0\n -2.4339 -1.0014 -1.8609 C 0 0 0 0 0 3 0 0 0 0 0 0\n -1.3979 -1.6907 -1.1542 O 0 0 0 0 0 0 0 0 0 0 0 0\n -1.2310 -1.4794 0.1378 C 0 0 0 0 0 3 0 0 0 0 0 0\n -0.0878 -2.2420 0.6527 N 0 0 0 0 0 4 0 0 0 0 0 0\n 1.2214 -1.6555 0.1904 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.4442 -0.2828 0.8183 C 0 0 0 0 0 0 0 0 0 0 0 0\n 2.7531 0.2898 0.3179 C 0 0 0 0 0 3 0 0 0 0 0 0\n 3.8742 0.2469 1.1432 C 0 0 0 0 0 3 0 0 0 0 0 0\n 5.0869 0.7667 0.7165 C 0 0 0 0 0 3 0 0 0 0 0 0\n 5.1844 1.3325 -0.5463 C 0 0 0 0 0 3 0 0 0 0 0 0\n 4.0741 1.3728 -1.3767 C 0 0 0 0 0 3 0 0 0 0 0 0\n 2.8565 0.8517 -0.9549 C 0 0 0 0 0 3 0 0 0 0 0 0\n 1.1969 0.9257 -2.2704 I 0 0 0 0 0 0 0 0 0 0 0 0\n -1.8679 -0.7388 0.9321 N 0 0 0 0 0 2 0 0 0 0 0 0\n -2.9345 -0.0198 0.3771 C 0 0 0 0 0 3 0 0 0 0 0 0\n -3.6965 0.8133 1.1917 C 0 0 0 0 0 3 0 0 0 0 0 0\n -4.7456 1.5281 0.6366 C 0 0 0 0 0 3 0 0 0 0 0 0\n -5.0398 1.4209 -0.7220 C 0 0 0 0 0 3 0 0 0 0 0 0\n -4.2862 0.5960 -1.5374 C 0 0 0 0 0 3 0 0 0 0 0 0\n -3.2240 -0.1352 -0.9949 C 0 0 0 0 0 3 0 0 0 0 0 0\n -0.1609 -3.1966 0.3014 H 0 0 0 
0 0 0 0 0 0 0 0 0\n -0.1372 -2.2306 1.6714 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.1628 -1.5704 -0.8998 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.0027 -2.3637 0.4733 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.4547 -0.3718 1.9083 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.6194 0.3751 0.5275 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.7903 -0.1925 2.1316 H 0 0 0 0 0 0 0 0 0 0 0 0\n 5.9498 0.7301 1.3703 H 0 0 0 0 0 0 0 0 0 0 0 0\n 6.1258 1.7445 -0.8896 H 0 0 0 0 0 0 0 0 0 0 0 0\n 4.1525 1.8131 -2.3626 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.4594 0.8904 2.2456 H 0 0 0 0 0 0 0 0 0 0 0 0\n -5.3431 2.1775 1.2658 H 0 0 0 0 0 0 0 0 0 0 0 0\n -5.8634 1.9867 -1.1407 H 0 0 0 0 0 0 0 0 0 0 0 0\n -4.5006 0.5001 -2.5959 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 3 1 0\n 2 21 1 0\n 3 4 1 0\n 4 5 1 0\n 4 15 1 0\n 5 6 1 0\n 5 22 1 0\n 5 23 1 0\n 6 7 1 0\n 6 24 1 0\n 6 25 1 0\n 7 8 1 0\n 7 26 1 0\n 7 27 1 0\n 8 9 1 0\n 8 13 1 0\n 9 10 1 0\n 9 28 1 0\n 10 11 1 0\n 10 29 1 0\n 11 12 1 0\n 11 30 1 0\n 12 13 1 0\n 12 31 1 0\n 13 14 1 0\n 15 16 1 0\n 16 17 1 0\n 16 21 1 0\n 17 18 1 0\n 17 32 1 0\n 18 19 1 0\n 18 33 1 0\n 19 20 1 0\n 19 34 1 0\n 20 21 1 0\n 20 35 1 0\nM END\n[\\V2000]"} {"text":"Question: What is the structure of a conformer of the chemical structure with canonical SMILES COC1C(C(O)CC(O)C2CCCCC2)CCC2OCCC21?\nConstraint: Return a MOL2000 file.\nAnswer: [V2000]\n\n ChemNLP 3D\n\n 36 38 0 0 0 0 0 0 0 0999 V2000\n -2.2177 -1.4924 2.5591 C 0 0 0 0 0 3 0 0 0 0 0 0\n -5.2562 -1.2797 0.4574 O 0 0 0 0 0 0 0 0 0 0 0 0\n -4.0042 -1.1862 1.0356 C 0 0 0 0 0 3 0 0 0 0 0 0\n -1.3523 -0.7483 1.7555 C 0 0 0 0 0 3 0 0 0 0 0 0\n 0.0896 -0.8088 2.1059 C 0 0 0 0 0 3 0 0 0 0 0 0\n 0.5003 -1.0480 3.2053 O 0 0 0 0 0 1 0 0 0 0 0 0\n 0.9060 -0.5827 0.8858 C 0 0 0 0 0 3 0 0 0 0 0 0\n 2.8823 -0.5027 -1.4483 C 0 0 0 0 0 3 0 0 0 0 0 0\n 2.6532 0.3797 2.1277 O 0 0 0 0 0 0 0 0 0 0 0 0\n 3.0245 0.2870 -0.3176 C 0 0 0 0 0 3 0 0 0 0 0 0\n 3.7321 -0.3694 -2.5131 C 0 0 0 0 0 3 0 0 0 0 0 0\n 4.7348 0.6343 -2.4405 C 0 0 0 0 0 3 0 0 0 0 0 0\n 4.7283 1.5883 -1.3876 C 0 0 0 0 0 3 0 0 0 
0 0 0\n 3.9954 1.3311 -0.2533 C 0 0 0 0 0 3 0 0 0 0 0 0\n 2.1619 -0.0146 0.8957 C 0 0 0 0 0 3 0 0 0 0 0 0\n -1.8543 -0.0952 0.5490 C 0 0 0 0 0 3 0 0 0 0 0 0\n -1.6046 1.7471 -0.9955 C 0 0 0 0 0 0 0 0 0 0 0 0\n -3.1714 -0.4203 0.1742 C 0 0 0 0 0 3 0 0 0 0 0 0\n -1.0408 0.7600 -0.1353 O 0 0 0 0 0 0 0 0 0 0 0 0\n -4.0218 0.0308 -0.8556 C 0 0 0 0 0 3 0 0 0 0 0 0\n -5.2309 -0.5748 -0.7128 C 0 0 0 0 0 3 0 0 0 0 0 0\n -3.5873 -1.7426 2.2657 C 0 0 0 0 0 3 0 0 0 0 0 0\n -4.2338 -2.3303 2.9528 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.7868 -1.8247 3.5772 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.4600 -0.7617 -0.0631 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7659 2.3643 -1.2612 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.0032 -1.1937 -1.5406 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.7027 -1.0179 -3.3996 H 0 0 0 0 0 0 0 0 0 0 0 0\n 5.4184 0.9020 -3.2031 H 0 0 0 0 0 0 0 0 0 0 0 0\n 5.4280 2.4275 -1.3831 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.5457 0.9983 2.1713 H 0 0 0 0 0 0 0 0 0 0 0 0\n 4.0314 1.9804 0.6385 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.3863 2.2716 -0.5023 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0643 1.3192 -1.9089 H 0 0 0 0 0 0 0 0 0 0 0 0\n -6.1577 -0.6166 -1.2031 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.7397 0.7739 -1.5951 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 4 1 0\n 1 22 1 0\n 1 24 1 0\n 2 3 1 0\n 2 21 1 0\n 3 18 1 0\n 3 22 1 0\n 4 5 1 0\n 4 16 1 0\n 5 6 1 0\n 5 7 1 0\n 7 15 1 0\n 7 25 1 0\n 8 10 1 0\n 8 11 1 0\n 8 27 1 0\n 9 15 1 0\n 9 31 1 0\n 10 14 1 0\n 10 15 1 0\n 11 12 1 0\n 11 28 1 0\n 12 13 1 0\n 12 29 1 0\n 13 14 1 0\n 13 30 1 0\n 14 32 1 0\n 16 18 1 0\n 16 19 1 0\n 17 19 1 0\n 17 26 1 0\n 17 33 1 0\n 17 34 1 0\n 18 20 1 0\n 20 21 1 0\n 20 36 1 0\n 21 35 1 0\n 22 23 1 0\nM END\n[\\V2000]"}", "/scratch/micpie/export/orbnet_denali/valid_0-1.jsonl": "{"text":"Question: What's the structure of a conformer of the compound with SMILES [H]C1C([H])C([H])C(C([H])([H])C([H])([H])N([H])([H])C2NC3C([H])C([H])C([H])C([H])C3C(O)O2)C(I)C1[H]?\nConstraint: Return a XYZ file.\nAnswer: [XYZ]\n35\nH14 C16 I1 N2 O2\nO -2.540 -1.190 -3.025\nC -2.434 -1.001 -1.861\nO -1.398 
-1.691 -1.154\nC -1.231 -1.479 0.138\nN -0.088 -2.242 0.653\nC 1.221 -1.655 0.190\nC 1.444 -0.283 0.818\nC 2.753 0.290 0.318\nC 3.874 0.247 1.143\nC 5.087 0.767 0.717\nC 5.184 1.333 -0.546\nC 4.074 1.373 -1.377\nC 2.857 0.852 -0.955\nI 1.197 0.926 -2.270\nN -1.868 -0.739 0.932\nC -2.935 -0.020 0.377\nC -3.697 0.813 1.192\nC -4.746 1.528 0.637\nC -5.040 1.421 -0.722\nC -4.286 0.596 -1.537\nC -3.224 -0.135 -0.995\nH -0.161 -3.197 0.301\nH -0.137 -2.231 1.671\nH 1.163 -1.570 -0.900\nH 2.003 -2.364 0.473\nH 1.455 -0.372 1.908\nH 0.619 0.375 0.528\nH 3.790 -0.193 2.132\nH 5.950 0.730 1.370\nH 6.126 1.744 -0.890\nH 4.153 1.813 -2.363\nH -3.459 0.890 2.246\nH -5.343 2.177 1.266\nH -5.863 1.987 -1.141\nH -4.501 0.500 -2.596[\\XYZ]"} {"text":"Question: What is the structure of a conformer of the compound with InChI InChI=1S\/C18H32O4\/c1-21-18-13(7-8-17-14(18)9-10-22-17)16(20)11-15(19)12-5-3-2-4-6-12\/h12-20H,2-11H2,1H3?\nConstraint: Return a XYZ file.\nAnswer: [XYZ]\n36\nH14 C18 O4\nC -2.218 -1.492 2.559\nO -5.256 -1.280 0.457\nC -4.004 -1.186 1.036\nC -1.352 -0.748 1.755\nC 0.090 -0.809 2.106\nO 0.500 -1.048 3.205\nC 0.906 -0.583 0.886\nC 2.882 -0.503 -1.448\nO 2.653 0.380 2.128\nC 3.025 0.287 -0.318\nC 3.732 -0.369 -2.513\nC 4.735 0.634 -2.441\nC 4.728 1.588 -1.388\nC 3.995 1.331 -0.253\nC 2.162 -0.015 0.896\nC -1.854 -0.095 0.549\nC -1.605 1.747 -0.996\nC -3.171 -0.420 0.174\nO -1.041 0.760 -0.135\nC -4.022 0.031 -0.856\nC -5.231 -0.575 -0.713\nC -3.587 -1.743 2.266\nH -4.234 -2.330 2.953\nH -1.787 -1.825 3.577\nH 0.460 -0.762 -0.063\nH -0.766 2.364 -1.261\nH 2.003 -1.194 -1.541\nH 3.703 -1.018 -3.400\nH 5.418 0.902 -3.203\nH 5.428 2.428 -1.383\nH 3.546 0.998 2.171\nH 4.031 1.980 0.639\nH -2.386 2.272 -0.502\nH -2.064 1.319 -1.909\nH -6.158 -0.617 -1.203\nH -3.740 0.774 -1.595[\\XYZ]"}", "/scratch/micpie/export/orbnet_denali/valid_1-1.jsonl": "{"text":"Question: What's the structure of a conformer of the chemical structure with SMILES 
[H].[H].[H]COC1C([H])C(OC([H])([H])[H])C(OC([H])([H])[H])C([H])C1C1OC2C([H])C(OC([H])([H])[H])C([H])C(O[H])C2C(O)C1[H]?\nConstraint: Return a XYZ file.\nAnswer: [XYZ]\n44\nH18 C19 O7\nC -6.481 -0.583 -2.661\nO -5.062 -0.756 -2.484\nC -4.467 -0.303 -1.382\nC -5.139 0.379 -0.381\nC -4.411 0.726 0.787\nO -5.076 1.308 1.766\nC -3.045 0.598 0.851\nC -2.252 1.109 1.982\nO -2.736 1.714 2.918\nC -0.837 0.838 1.844\nC -0.294 0.172 0.813\nC 1.134 -0.141 0.610\nC 1.514 -1.332 -0.067\nC 2.827 -1.677 -0.276\nO 3.038 -2.879 -0.878\nC 4.364 -3.212 -1.413\nC 3.874 -0.699 0.051\nO 5.193 -0.926 -0.246\nC 6.215 0.040 0.114\nC 3.496 0.469 0.707\nC 2.157 0.722 0.985\nO 1.761 1.876 1.664\nC 1.809 3.102 0.922\nO -1.062 -0.357 -0.116\nC -2.383 -0.159 -0.139\nC -3.095 -0.594 -1.247\nH -6.753 -1.026 -3.600\nH -6.783 0.451 -2.593\nH -7.069 -1.133 -1.887\nH -6.186 0.560 -0.303\nH -4.415 1.560 2.496\nH -0.195 1.205 2.616\nH 0.770 -2.024 -0.470\nH 4.247 -4.144 -1.964\nH 4.753 -2.368 -2.117\nH 5.080 -3.422 -0.580\nH 6.143 0.312 1.210\nH 6.193 0.978 -0.511\nH 7.171 -0.468 0.027\nH 4.205 1.248 0.958\nH 2.682 3.059 0.118\nH 0.784 3.293 0.401\nH 2.078 4.044 1.596\nH -2.676 -1.184 -2.043[\\XYZ]"} {"text":"Question: What is the structure of a conformer of the chemical with InChI InChI=1S\/C10H18ClNO3S.K\/c11-7-2-1-3-8(4-7)12-9(13)5-16-6-10(14)15;\/h7-10,12,14H,1-6H2;\/q-2;+2?\nConstraint: Return a XYZ file.\nAnswer: [XYZ]\n26\nK1 H9 C10 S1 N1 Cl1 O3\nO 3.248 0.034 -0.392\nC 2.472 0.812 0.211\nO 1.284 1.038 0.016\nC 3.145 1.611 1.415\nS 3.895 3.156 0.828\nC 5.431 2.612 0.009\nC 6.455 2.333 1.077\nO 6.579 1.193 1.529\nN 7.215 3.326 1.597\nC 7.349 4.677 1.186\nC 6.244 5.492 0.963\nC 6.433 6.819 0.600\nC 7.711 7.343 0.473\nC 8.808 6.529 0.719\nCl 10.403 7.156 0.588\nC 8.636 5.199 1.070\nK 5.167 -0.455 0.690\nH 3.886 0.990 1.957\nH 2.368 1.942 2.108\nH 5.732 3.411 -0.664\nH 5.254 1.688 -0.570\nH 7.885 2.989 2.283\nH 5.239 5.112 1.101\nH 5.572 7.452 0.419\nH 7.860 8.381 0.209\nH 9.497 4.572 1.252[\\XYZ]"}", 
"/scratch/micpie/export/orbnet_denali/valid_0-5.jsonl": "{"text":"Task: Return the total energy of a molecule computed at the {\\omega}B97X-D3\/def2-TZVP level of theory.\nDescription: The molecule has the XYZ file [XYZ]\n35\nH14 C16 I1 N2 O2\nO -2.540 -1.190 -3.025\nC -2.434 -1.001 -1.861\nO -1.398 -1.691 -1.154\nC -1.231 -1.479 0.138\nN -0.088 -2.242 0.653\nC 1.221 -1.655 0.190\nC 1.444 -0.283 0.818\nC 2.753 0.290 0.318\nC 3.874 0.247 1.143\nC 5.087 0.767 0.717\nC 5.184 1.333 -0.546\nC 4.074 1.373 -1.377\nC 2.857 0.852 -0.955\nI 1.197 0.926 -2.270\nN -1.868 -0.739 0.932\nC -2.935 -0.020 0.377\nC -3.697 0.813 1.192\nC -4.746 1.528 0.637\nC -5.040 1.421 -0.722\nC -4.286 0.596 -1.537\nC -3.224 -0.135 -0.995\nH -0.161 -3.197 0.301\nH -0.137 -2.231 1.671\nH 1.163 -1.570 -0.900\nH 2.003 -2.364 0.473\nH 1.455 -0.372 1.908\nH 0.619 0.375 0.528\nH 3.790 -0.193 2.132\nH 5.950 0.730 1.370\nH 6.126 1.744 -0.890\nH 4.153 1.813 -2.363\nH -3.459 0.890 2.246\nH -5.343 2.177 1.266\nH -5.863 1.987 -1.141\nH -4.501 0.500 -2.596[\\XYZ].\nAnswer: -1175.70842 Hartree"} {"text":"Task: Return the total energy of a chemical structure computed at the {\\omega}B97X-D3\/def2-TZVP level of theory.\nDescription: The chemical structure has the XYZ file [XYZ]\n36\nH14 C18 O4\nC -2.218 -1.492 2.559\nO -5.256 -1.280 0.457\nC -4.004 -1.186 1.036\nC -1.352 -0.748 1.755\nC 0.090 -0.809 2.106\nO 0.500 -1.048 3.205\nC 0.906 -0.583 0.886\nC 2.882 -0.503 -1.448\nO 2.653 0.380 2.128\nC 3.025 0.287 -0.318\nC 3.732 -0.369 -2.513\nC 4.735 0.634 -2.441\nC 4.728 1.588 -1.388\nC 3.995 1.331 -0.253\nC 2.162 -0.015 0.896\nC -1.854 -0.095 0.549\nC -1.605 1.747 -0.996\nC -3.171 -0.420 0.174\nO -1.041 0.760 -0.135\nC -4.022 0.031 -0.856\nC -5.231 -0.575 -0.713\nC -3.587 -1.743 2.266\nH -4.234 -2.330 2.953\nH -1.787 -1.825 3.577\nH 0.460 -0.762 -0.063\nH -0.766 2.364 -1.261\nH 2.003 -1.194 -1.541\nH 3.703 -1.018 -3.400\nH 5.418 0.902 -3.203\nH 5.428 2.428 -1.383\nH 3.546 0.998 2.171\nH 4.031 1.980 0.639\nH -2.386 
2.272 -0.502\nH -2.064 1.319 -1.909\nH -6.158 -0.617 -1.203\nH -3.740 0.774 -1.595[\\XYZ].\nAnswer: -995.21527 Hartree"}", "/scratch/micpie/export/orbnet_denali/valid_0-4.jsonl": "{"text":"Task: Return the total energy of a chemical computed at the GFN1-xTB level of theory.\nDescription: The chemical has the XYZ file [XYZ]\n35\nH14 C16 I1 N2 O2\nO -2.540 -1.190 -3.025\nC -2.434 -1.001 -1.861\nO -1.398 -1.691 -1.154\nC -1.231 -1.479 0.138\nN -0.088 -2.242 0.653\nC 1.221 -1.655 0.190\nC 1.444 -0.283 0.818\nC 2.753 0.290 0.318\nC 3.874 0.247 1.143\nC 5.087 0.767 0.717\nC 5.184 1.333 -0.546\nC 4.074 1.373 -1.377\nC 2.857 0.852 -0.955\nI 1.197 0.926 -2.270\nN -1.868 -0.739 0.932\nC -2.935 -0.020 0.377\nC -3.697 0.813 1.192\nC -4.746 1.528 0.637\nC -5.040 1.421 -0.722\nC -4.286 0.596 -1.537\nC -3.224 -0.135 -0.995\nH -0.161 -3.197 0.301\nH -0.137 -2.231 1.671\nH 1.163 -1.570 -0.900\nH 2.003 -2.364 0.473\nH 1.455 -0.372 1.908\nH 0.619 0.375 0.528\nH 3.790 -0.193 2.132\nH 5.950 0.730 1.370\nH 6.126 1.744 -0.890\nH 4.153 1.813 -2.363\nH -3.459 0.890 2.246\nH -5.343 2.177 1.266\nH -5.863 1.987 -1.141\nH -4.501 0.500 -2.596[\\XYZ].\nAnswer: -60.66416 Hartree"} {"text":"Task: Return the total energy of a molecule computed at the GFN1-xTB level of theory.\nDescription: The molecule has the XYZ file [XYZ]\n36\nH14 C18 O4\nC -2.218 -1.492 2.559\nO -5.256 -1.280 0.457\nC -4.004 -1.186 1.036\nC -1.352 -0.748 1.755\nC 0.090 -0.809 2.106\nO 0.500 -1.048 3.205\nC 0.906 -0.583 0.886\nC 2.882 -0.503 -1.448\nO 2.653 0.380 2.128\nC 3.025 0.287 -0.318\nC 3.732 -0.369 -2.513\nC 4.735 0.634 -2.441\nC 4.728 1.588 -1.388\nC 3.995 1.331 -0.253\nC 2.162 -0.015 0.896\nC -1.854 -0.095 0.549\nC -1.605 1.747 -0.996\nC -3.171 -0.420 0.174\nO -1.041 0.760 -0.135\nC -4.022 0.031 -0.856\nC -5.231 -0.575 -0.713\nC -3.587 -1.743 2.266\nH -4.234 -2.330 2.953\nH -1.787 -1.825 3.577\nH 0.460 -0.762 -0.063\nH -0.766 2.364 -1.261\nH 2.003 -1.194 -1.541\nH 3.703 -1.018 -3.400\nH 5.418 0.902 -3.203\nH 5.428 
2.428 -1.383\nH 3.546 0.998 2.171\nH 4.031 1.980 0.639\nH -2.386 2.272 -0.502\nH -2.064 1.319 -1.909\nH -6.158 -0.617 -1.203\nH -3.740 0.774 -1.595[\\XYZ].\nAnswer: -61.08251 Hartree"}", "/scratch/micpie/export/orbnet_denali/test_1-3.jsonl": "{"text":"Question: What's the structure of a conformer of the chemical structure with InChI InChI=1S\/C19H34O7\/c1-22-10-5-12(20)19-13(21)8-15(26-18(19)6-10)11-7-16(24-3)17(25-4)9-14(11)23-2\/h10-21H,5-9H2,1-4H3?\nConstraint: Return a MOL3000 file.\nAnswer: [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 44 46 0 0 0\nM V30 BEGIN ATOM\nM V30 1 C -6.443545 -1.225137 -2.666952 0\nM V30 2 O -5.032280 -1.256717 -2.557574 0\nM V30 3 C -4.412471 -0.678750 -1.503783 0 VAL=3\nM V30 4 C -5.091118 -0.023118 -0.474336 0 VAL=3\nM V30 5 C -4.375941 0.545405 0.575565 0 VAL=3\nM V30 6 O -5.018498 1.169597 1.556700 0\nM V30 7 C -2.962159 0.460707 0.601055 0 VAL=3\nM V30 8 C -2.198348 1.042741 1.694383 0 VAL=3\nM V30 9 O -2.735199 1.638798 2.633984 0 VAL=1\nM V30 10 C -0.767985 0.855448 1.586304 0 VAL=3\nM V30 11 C -0.209776 0.202353 0.532635 0 VAL=3\nM V30 12 C 1.237335 0.002138 0.375691 0 VAL=3\nM V30 13 C 1.781233 -1.141432 -0.237080 0 VAL=3\nM V30 14 C 3.163986 -1.257951 -0.343690 0 VAL=3\nM V30 15 O 0.927579 -2.101005 -0.659715 0\nM V30 16 C 1.411123 -3.284102 -1.265134 0\nM V30 17 C 4.028850 -0.274799 0.132950 0 VAL=3\nM V30 18 O 5.341499 -0.543002 -0.059463 0\nM V30 19 C 6.381057 0.326141 0.361257 0\nM V30 20 C 3.489601 0.871903 0.751692 0 VAL=3\nM V30 21 C 2.113849 0.982837 0.859982 0 VAL=3\nM V30 22 O 4.349258 1.835842 1.215043 0\nM V30 23 C 3.839638 2.981689 1.870418 0\nM V30 24 O -0.957778 -0.304179 -0.463901 0\nM V30 25 C -2.310554 -0.204834 -0.447802 0 VAL=3\nM V30 26 C -3.018388 -0.772976 -1.497205 0 VAL=3\nM V30 27 H -6.689726 -1.754436 -3.589092 0\nM V30 28 H -6.816065 -0.196253 -2.730599 0\nM V30 29 H -6.920963 -1.732499 -1.820756 0\nM V30 30 H -6.169124 0.058868 -0.465444 0\nM V30 31 H 
-4.316294 1.497135 2.200923 0\nM V30 32 H -0.151528 1.233625 2.390552 0\nM V30 33 H 0.529667 -3.878467 -1.511846 0\nM V30 34 H 1.967584 -3.065861 -2.184181 0\nM V30 35 H 3.616696 -2.125229 -0.805441 0\nM V30 36 H 2.050318 -3.853714 -0.580109 0\nM V30 37 H 6.370918 0.466773 1.446314 0\nM V30 38 H 6.316426 1.298085 -0.137078 0\nM V30 39 H 7.313379 -0.164552 0.071529 0\nM V30 40 H 4.708226 3.574832 2.162690 0\nM V30 41 H 3.205899 3.578738 1.203641 0\nM V30 42 H 1.684952 1.869447 1.308134 0\nM V30 43 H 3.270257 2.711951 2.767979 0\nM V30 44 H -2.501590 -1.283029 -2.298239 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 1 27\nM V30 3 1 1 28\nM V30 4 1 1 29\nM V30 5 1 2 3\nM V30 6 1 3 4\nM V30 7 1 3 26\nM V30 8 1 4 5\nM V30 9 1 4 30\nM V30 10 1 5 6\nM V30 11 1 5 7\nM V30 12 1 6 31\nM V30 13 1 7 8\nM V30 14 1 7 25\nM V30 15 1 8 9\nM V30 16 1 8 10\nM V30 17 1 10 11\nM V30 18 1 10 32\nM V30 19 1 11 12\nM V30 20 1 11 24\nM V30 21 1 12 13\nM V30 22 1 12 21\nM V30 23 1 13 14\nM V30 24 1 13 15\nM V30 25 1 14 17\nM V30 26 1 14 35\nM V30 27 1 15 16\nM V30 28 1 16 33\nM V30 29 1 16 34\nM V30 30 1 16 36\nM V30 31 1 17 18\nM V30 32 1 17 20\nM V30 33 1 18 19\nM V30 34 1 19 37\nM V30 35 1 19 38\nM V30 36 1 19 39\nM V30 37 1 20 21\nM V30 38 1 20 22\nM V30 39 1 21 42\nM V30 40 1 22 23\nM V30 41 1 23 40\nM V30 42 1 23 41\nM V30 43 1 23 43\nM V30 44 1 24 25\nM V30 45 1 25 26\nM V30 46 1 26 44\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]"} {"text":"Question: What is the structure of a conformer of the chemical with InChI InChI=1S\/C10H19ClNO3S.Li\/c11-7-2-1-3-8(4-7)12-9(13)5-16-6-10(14)15;\/h7-10,12,14-15H,1-6H2;\/q-1;+1?\nConstraint: Return a MOL3000 file.\nAnswer: [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 26 26 0 0 0\nM V30 BEGIN ATOM\nM V30 1 O 4.931823 0.704410 0.314137 0 VAL=1\nM V30 2 C 4.840533 1.328526 -0.734275 0 VAL=3\nM V30 3 O 5.615903 2.231592 -1.168117 0 VAL=1\nM V30 4 C 3.619914 0.949520 -1.636194 0\nM V30 5 S 
3.203158 2.210201 -2.876588 0\nM V30 6 C 4.122339 1.575587 -4.293708 0\nM V30 7 C 5.529137 2.119800 -4.531460 0 VAL=3\nM V30 8 O 5.924919 1.990432 -5.726984 0\nM V30 9 N 6.199271 2.648039 -3.539792 0\nM V30 10 C 7.513867 3.156456 -3.584492 0 VAL=3\nM V30 11 C 8.133175 3.449920 -2.367440 0 VAL=3\nM V30 12 C 9.426765 3.944368 -2.343199 0 VAL=3\nM V30 13 C 10.124779 4.156917 -3.524464 0 VAL=3\nM V30 14 C 9.497315 3.881042 -4.728261 0 VAL=3\nM V30 15 Cl 10.343800 4.127451 -6.210094 0\nM V30 16 C 8.198381 3.400308 -4.774085 0 VAL=3\nM V30 17 Li 4.878179 1.191928 -6.789255 0 VAL=1\nM V30 18 H 2.752273 0.835329 -1.001489 0\nM V30 19 H 3.813306 -0.006744 -2.114491 0\nM V30 20 H 3.554622 1.873393 -5.217136 0\nM V30 21 H 4.195656 0.484086 -4.269056 0\nM V30 22 H 5.807799 2.481975 -2.478489 0\nM V30 23 H 7.607320 3.284075 -1.434442 0\nM V30 24 H 9.902565 4.167483 -1.393913 0\nM V30 25 H 11.142475 4.528718 -3.508048 0\nM V30 26 H 7.744253 3.227429 -5.738427 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 2 3\nM V30 3 1 2 4\nM V30 4 1 4 5\nM V30 5 1 4 18\nM V30 6 1 4 19\nM V30 7 1 5 6\nM V30 8 1 6 7\nM V30 9 1 6 20\nM V30 10 1 6 21\nM V30 11 1 7 8\nM V30 12 1 7 9\nM V30 13 1 8 17\nM V30 14 1 9 10\nM V30 15 1 9 22\nM V30 16 1 10 11\nM V30 17 1 10 16\nM V30 18 1 11 12\nM V30 19 1 11 23\nM V30 20 1 12 13\nM V30 21 1 12 24\nM V30 22 1 13 14\nM V30 23 1 13 25\nM V30 24 1 14 15\nM V30 25 1 14 16\nM V30 26 1 16 26\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]"}", "/scratch/micpie/export/orbnet_denali/train_0-5.jsonl": "{"text":"Task: Return the total energy of a chemical structure computed at the {\\omega}B97X-D3\/def2-TZVP level of theory.\nDescription: The chemical structure has the XYZ file [XYZ]\n35\nH14 C16 I1 N2 O2\nO -2.154 -0.717 -3.091\nC -2.202 -0.565 -1.929\nO -1.122 -1.409 -1.401\nC -1.305 -1.457 -0.153\nN -0.152 -2.111 0.661\nC 1.219 -1.647 0.224\nC 1.664 -0.394 0.961\nC 2.777 0.375 0.333\nC 3.935 0.634 1.154\nC 5.097 1.070 0.548\nC 5.103 1.426 -0.852\nC 
3.906 1.419 -1.526\nC 2.850 0.848 -0.977\nI 1.179 0.659 -2.177\nN -1.875 -0.757 0.760\nC -3.056 -0.085 0.312\nC -3.964 0.458 1.172\nC -5.089 1.206 0.786\nC -5.206 1.642 -0.536\nC -4.212 1.061 -1.374\nC -3.158 0.148 -1.083\nH -0.111 -3.134 0.808\nH -0.337 -1.781 1.604\nH 1.144 -1.613 -0.889\nH 1.963 -2.359 0.423\nH 2.027 -0.644 1.936\nH 0.824 0.315 1.112\nH 3.724 0.534 2.253\nH 5.898 1.526 1.184\nH 5.975 1.976 -1.191\nH 3.709 2.064 -2.391\nH -3.781 0.112 2.214\nH -5.796 1.666 1.521\nH -5.858 2.404 -0.949\nH -4.357 1.395 -2.509[\\XYZ].\nAnswer: -1175.56822 Hartree"} {"text":"Task: Return the total energy of a compound computed at the {\\omega}B97X-D3\/def2-TZVP level of theory.\nDescription: The compound has the XYZ file [XYZ]\n44\nH18 C19 O7\nC -5.598 2.587 0.816\nO -5.926 1.423 0.013\nC -4.939 0.521 -0.270\nC -5.418 -0.709 -0.835\nC -4.461 -1.616 -1.307\nO -4.770 -2.786 -1.785\nC -3.070 -1.328 -1.136\nC -2.046 -2.384 -1.436\nO -2.387 -3.493 -1.882\nC -0.727 -2.031 -1.027\nC -0.429 -0.822 -0.590\nC 0.909 -0.400 -0.105\nC 1.179 1.028 -0.092\nC 2.416 1.533 0.228\nO 2.519 2.881 -0.147\nC 3.669 3.567 0.196\nC 3.432 0.679 0.617\nO 4.558 1.195 1.054\nC 5.499 0.294 1.785\nC 3.242 -0.665 0.606\nC 2.027 -1.198 0.254\nO 1.813 -2.546 0.258\nC 2.922 -3.496 0.237\nO -1.369 0.120 -0.394\nC -2.688 -0.147 -0.590\nC -3.591 0.803 -0.177\nH -6.605 2.908 1.172\nH -5.052 2.257 1.748\nH -5.030 3.357 0.311\nH -6.532 -0.752 -1.003\nH -3.809 -3.287 -1.760\nH 0.075 -2.844 -1.034\nH 0.410 1.814 -0.270\nH 3.583 4.677 -0.228\nH 4.585 3.040 -0.264\nH 3.881 3.555 1.284\nH 5.951 -0.390 1.068\nH 4.944 -0.290 2.595\nH 6.342 0.947 2.161\nH 3.966 -1.351 0.972\nH 3.545 -3.572 1.173\nH 2.440 -4.507 0.143\nH 3.593 -3.329 -0.658\nH -3.248 1.831 0.124[\\XYZ].\nAnswer: -1261.44565 Hartree"}", "/scratch/micpie/export/orbnet_denali/test_1-0.jsonl": "{"text":"The compound with SELFIES 
[H][O][C][C][Branch1][C][H][C][Branch1][#Branch2][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][C][H][C][O][C][Branch2][Ring2][#C][C][C][Branch1][C][H][C][Branch1][#Branch2][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][#Branch2][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][C][H][C][Ring2][Ring1][C][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][C][H][C][Branch1][C][O][C][Ring2][Ring2][#Branch2][Ring2][Ring1][#C] has a charge of 0."} {"text":"The molecule with SELFIES [H][C][C][Branch1][C][H][C][Branch1][C][Cl][C][Branch1][C][H][C][Branch2][Ring1][=C][N][Branch1][C][H][C][Branch1][Ring1][O][Li][C][Branch1][C][H][Branch1][C][H][S][C][Branch1][C][H][Branch1][C][H][C][Branch1][C][O][O][C][Ring2][Ring1][Branch2][H] has a charge of 0."}", "/scratch/micpie/export/orbnet_denali/train_0-2.jsonl": "{"text":"Question: What's the structure of a conformer of the chemical structure with DeepSMILES [H].[H]CCCCO)OCN[H])[H])C[H])[H])C[H])[H])CC[H])C[H])C[H])C[H])C6I))))))))))NC6C[H])C%10[H]?\nConstraint: Return a MOL2000 file.\nAnswer: [V2000]\n\n ChemNLP 3D\n\n 35 36 0 0 0 0 0 0 0 0999 V2000\n -2.1537 -0.7170 -3.0912 O 0 0 0 0 0 1 0 0 0 0 0 0\n -2.2018 -0.5651 -1.9290 C 0 0 0 0 0 3 0 0 0 0 0 0\n -1.1216 -1.4089 -1.4007 O 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3049 -1.4570 -0.1525 C 0 0 0 0 0 3 0 0 0 0 0 0\n -0.1518 -2.1108 0.6606 N 0 0 0 0 0 4 0 0 0 0 0 0\n 1.2187 -1.6469 0.2235 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6639 -0.3940 0.9612 C 0 0 0 0 0 0 0 0 0 0 0 0\n 2.7772 0.3753 0.3325 C 0 0 0 0 0 3 0 0 0 0 0 0\n 3.9354 0.6341 1.1544 C 0 0 0 0 0 3 0 0 0 0 0 0\n 5.0973 1.0702 0.5478 C 0 0 0 0 0 3 0 0 0 0 0 0\n 5.1027 1.4265 -0.8523 C 0 0 0 0 0 3 0 0 0 0 0 0\n 3.9061 1.4187 -1.5262 C 0 0 0 0 0 3 0 0 0 0 0 0\n 2.8495 0.8484 -0.9769 C 0 0 0 0 0 3 0 0 0 0 0 0\n 1.1787 0.6593 -2.1767 I 0 0 0 0 0 0 0 0 0 0 0 0\n -1.8746 -0.7566 0.7603 N 0 0 0 0 0 2 0 0 0 0 0 0\n -3.0560 -0.0854 0.3123 C 0 0 0 0 0 3 0 0 0 0 0 0\n -3.9639 0.4576 1.1724 C 0 0 0 0 0 3 0 0 0 0 0 0\n -5.0888 1.2060 0.7863 C 0 
0 0 0 0 3 0 0 0 0 0 0\n -5.2062 1.6420 -0.5355 C 0 0 0 0 0 3 0 0 0 0 0 0\n -4.2116 1.0609 -1.3745 C 0 0 0 0 0 2 0 0 0 0 0 0\n -3.1583 0.1476 -1.0833 C 0 0 0 0 0 3 0 0 0 0 0 0\n -0.1107 -3.1341 0.8077 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.3366 -1.7808 1.6035 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.1437 -1.6132 -0.8888 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.9634 -2.3586 0.4228 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.0268 -0.6441 1.9365 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.8237 0.3151 1.1121 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.7238 0.5341 2.2532 H 0 0 0 0 0 0 0 0 0 0 0 0\n 5.8981 1.5257 1.1841 H 0 0 0 0 0 0 0 0 0 0 0 0\n 5.9748 1.9757 -1.1912 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.7095 2.0643 -2.3905 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.7813 0.1120 2.2138 H 0 0 0 0 0 0 0 0 0 0 0 0\n -5.7960 1.6665 1.5210 H 0 0 0 0 0 0 0 0 0 0 0 0\n -5.8576 2.4042 -0.9493 H 0 0 0 0 0 0 0 0 0 0 0 0\n -4.3568 1.3954 -2.5088 H 0 0 0 0 0 15 0 0 0 0 0 0\n 1 2 1 0\n 2 3 1 0\n 2 21 1 0\n 3 4 1 0\n 4 5 1 0\n 4 15 1 0\n 5 6 1 0\n 5 22 1 0\n 5 23 1 0\n 6 7 1 0\n 6 24 1 0\n 6 25 1 0\n 7 8 1 0\n 7 26 1 0\n 7 27 1 0\n 8 9 1 0\n 8 13 1 0\n 9 10 1 0\n 9 28 1 0\n 10 11 1 0\n 10 29 1 0\n 11 12 1 0\n 11 30 1 0\n 12 13 1 0\n 12 31 1 0\n 13 14 1 0\n 15 16 1 0\n 16 17 1 0\n 16 21 1 0\n 17 18 1 0\n 17 32 1 0\n 18 19 1 0\n 18 33 1 0\n 19 20 1 0\n 19 34 1 0\n 20 21 1 0\nM END\n[\\V2000]"} {"text":"Question: What is the structure of a conformer of the compound with DeepSMILES [H].[H]OCC[H])COC[H])[H])[H])))C[H])COCCC[H])COC[H])[H])))COC[H])[H])[H])))C[H])C6OC[H])[H])[H])))))))))C[H])CO)C%106?\nConstraint: Return a MOL2000 file.\nAnswer: [V2000]\n\n ChemNLP 3D\n\n 44 45 0 0 0 0 0 0 0 0999 V2000\n -5.5981 2.5875 0.8161 C 0 0 0 0 0 0 0 0 0 0 0 0\n -5.9263 1.4232 0.0135 O 0 0 0 0 0 0 0 0 0 0 0 0\n -4.9393 0.5207 -0.2704 C 0 0 0 0 0 3 0 0 0 0 0 0\n -5.4183 -0.7089 -0.8350 C 0 0 0 0 0 3 0 0 0 0 0 0\n -4.4609 -1.6163 -1.3073 C 0 0 0 0 0 3 0 0 0 0 0 0\n -4.7699 -2.7862 -1.7849 O 0 0 0 0 0 0 0 0 0 0 0 0\n -3.0697 -1.3279 -1.1356 C 0 0 0 0 0 3 0 0 0 0 0 0\n -2.0456 -2.3836 -1.4356 C 
0 0 0 0 0 3 0 0 0 0 0 0\n -2.3866 -3.4929 -1.8817 O 0 0 0 0 0 1 0 0 0 0 0 0\n -0.7269 -2.0312 -1.0275 C 0 0 0 0 0 3 0 0 0 0 0 0\n -0.4295 -0.8222 -0.5902 C 0 0 0 0 0 3 0 0 0 0 0 0\n 0.9087 -0.4000 -0.1045 C 0 0 0 0 0 3 0 0 0 0 0 0\n 1.1794 1.0282 -0.0923 C 0 0 0 0 0 3 0 0 0 0 0 0\n 2.4160 1.5333 0.2277 C 0 0 0 0 0 3 0 0 0 0 0 0\n 2.5186 2.8808 -0.1473 O 0 0 0 0 0 0 0 0 0 0 0 0\n 3.6694 3.5666 0.1960 C 0 0 0 0 0 3 0 0 0 0 0 0\n 3.4323 0.6794 0.6168 C 0 0 0 0 0 3 0 0 0 0 0 0\n 4.5579 1.1952 1.0540 O 0 0 0 0 0 0 0 0 0 0 0 0\n 5.4987 0.2939 1.7846 C 0 0 0 0 0 0 0 0 0 0 0 0\n 3.2421 -0.6652 0.6057 C 0 0 0 0 0 3 0 0 0 0 0 0\n 2.0270 -1.1983 0.2543 C 0 0 0 0 0 3 0 0 0 0 0 0\n 1.8127 -2.5463 0.2576 O 0 0 0 0 0 0 0 0 0 0 0 0\n 2.9217 -3.4964 0.2374 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3688 0.1196 -0.3936 O 0 0 0 0 0 0 0 0 0 0 0 0\n -2.6880 -0.1472 -0.5900 C 0 0 0 0 0 3 0 0 0 0 0 0\n -3.5913 0.8034 -0.1773 C 0 0 0 0 0 3 0 0 0 0 0 0\n -6.6053 2.9083 1.1719 H 0 0 0 0 0 0 0 0 0 0 0 0\n -5.0516 2.2569 1.7481 H 0 0 0 0 0 0 0 0 0 0 0 0\n -5.0298 3.3574 0.3113 H 0 0 0 0 0 0 0 0 0 0 0 0\n -6.5317 -0.7516 -1.0027 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.8086 -3.2869 -1.7595 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0754 -2.8438 -1.0344 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.4104 1.8139 -0.2699 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.5826 4.6774 -0.2276 H 0 0 0 0 0 15 0 0 0 0 0 0\n 4.5854 3.0403 -0.2642 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.8810 3.5548 1.2841 H 0 0 0 0 0 0 0 0 0 0 0 0\n 5.9513 -0.3897 1.0678 H 0 0 0 0 0 0 0 0 0 0 0 0\n 4.9443 -0.2899 2.5946 H 0 0 0 0 0 0 0 0 0 0 0 0\n 6.3418 0.9468 2.1615 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.9658 -1.3514 0.9719 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.5447 -3.5722 1.1725 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.4403 -4.5065 0.1426 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.5932 -3.3290 -0.6578 H 0 0 0 0 0 0 0 0 0 0 0 0\n -3.2482 1.8310 0.1237 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 1 27 1 0\n 1 28 1 0\n 1 29 1 0\n 2 3 1 0\n 3 4 1 0\n 3 26 1 0\n 4 5 1 0\n 4 30 1 0\n 5 6 1 0\n 5 7 1 0\n 6 31 1 0\n 7 8 1 0\n 7 25 1 0\n 8 9 1 0\n 8 10 
1 0\n 10 11 1 0\n 10 32 1 0\n 11 12 1 0\n 11 24 1 0\n 12 13 1 0\n 12 21 1 0\n 13 14 1 0\n 13 33 1 0\n 14 15 1 0\n 14 17 1 0\n 15 16 1 0\n 16 35 1 0\n 16 36 1 0\n 17 18 1 0\n 17 20 1 0\n 18 19 1 0\n 19 37 1 0\n 19 38 1 0\n 19 39 1 0\n 20 21 1 0\n 20 40 1 0\n 21 22 1 0\n 22 23 1 0\n 23 41 1 0\n 23 42 1 0\n 23 43 1 0\n 24 25 1 0\n 25 26 1 0\n 26 44 1 0\nM END\n[\\V2000]"}", "/scratch/micpie/export/orbnet_denali/train_0-7.jsonl": "{"text":"User: I would like to know the {\\omega}B97X-D3\/def2-TZVP total energy of the chemical structure with SMILES [H].[H]C1CC2C(O)OC(N([H])([H])C([H])([H])C([H])([H])C3C([H])C([H])C([H])C([H])C3I)NC2C([H])C1[H].\nAssistant: Do you have the XYZ file file of a conformer of the chemical structure?\nUser: I do: [XYZ]\n35\nH14 C16 I1 N2 O2\nO -2.154 -0.717 -3.091\nC -2.202 -0.565 -1.929\nO -1.122 -1.409 -1.401\nC -1.305 -1.457 -0.153\nN -0.152 -2.111 0.661\nC 1.219 -1.647 0.224\nC 1.664 -0.394 0.961\nC 2.777 0.375 0.333\nC 3.935 0.634 1.154\nC 5.097 1.070 0.548\nC 5.103 1.426 -0.852\nC 3.906 1.419 -1.526\nC 2.850 0.848 -0.977\nI 1.179 0.659 -2.177\nN -1.875 -0.757 0.760\nC -3.056 -0.085 0.312\nC -3.964 0.458 1.172\nC -5.089 1.206 0.786\nC -5.206 1.642 -0.536\nC -4.212 1.061 -1.374\nC -3.158 0.148 -1.083\nH -0.111 -3.134 0.808\nH -0.337 -1.781 1.604\nH 1.144 -1.613 -0.889\nH 1.963 -2.359 0.423\nH 2.027 -0.644 1.936\nH 0.824 0.315 1.112\nH 3.724 0.534 2.253\nH 5.898 1.526 1.184\nH 5.975 1.976 -1.191\nH 3.709 2.064 -2.391\nH -3.781 0.112 2.214\nH -5.796 1.666 1.521\nH -5.858 2.404 -0.949\nH -4.357 1.395 -2.509[\\XYZ]\nAssistant: The total energy on the {\\omega}B97X-D3\/def2-TZVP level of theory of the chemical structure is -1175.56822 Hartree."} {"text":"User: I would like to know the {\\omega}B97X-D3\/def2-TZVP total energy of the compound with DeepSMILES [H].[H]OCC[H])COC[H])[H])[H])))C[H])COCCC[H])COC[H])[H])))COC[H])[H])[H])))C[H])C6OC[H])[H])[H])))))))))C[H])CO)C%106.\nAssistant: Do you have the XYZ file file of a conformer of the 
compound?\nUser: I have it: [XYZ]\n44\nH18 C19 O7\nC -5.598 2.587 0.816\nO -5.926 1.423 0.013\nC -4.939 0.521 -0.270\nC -5.418 -0.709 -0.835\nC -4.461 -1.616 -1.307\nO -4.770 -2.786 -1.785\nC -3.070 -1.328 -1.136\nC -2.046 -2.384 -1.436\nO -2.387 -3.493 -1.882\nC -0.727 -2.031 -1.027\nC -0.429 -0.822 -0.590\nC 0.909 -0.400 -0.105\nC 1.179 1.028 -0.092\nC 2.416 1.533 0.228\nO 2.519 2.881 -0.147\nC 3.669 3.567 0.196\nC 3.432 0.679 0.617\nO 4.558 1.195 1.054\nC 5.499 0.294 1.785\nC 3.242 -0.665 0.606\nC 2.027 -1.198 0.254\nO 1.813 -2.546 0.258\nC 2.922 -3.496 0.237\nO -1.369 0.120 -0.394\nC -2.688 -0.147 -0.590\nC -3.591 0.803 -0.177\nH -6.605 2.908 1.172\nH -5.052 2.257 1.748\nH -5.030 3.357 0.311\nH -6.532 -0.752 -1.003\nH -3.809 -3.287 -1.760\nH 0.075 -2.844 -1.034\nH 0.410 1.814 -0.270\nH 3.583 4.677 -0.228\nH 4.585 3.040 -0.264\nH 3.881 3.555 1.284\nH 5.951 -0.390 1.068\nH 4.944 -0.290 2.595\nH 6.342 0.947 2.161\nH 3.966 -1.351 0.972\nH 3.545 -3.572 1.173\nH 2.440 -4.507 0.143\nH 3.593 -3.329 -0.658\nH -3.248 1.831 0.124[\\XYZ]\nAssistant: The total energy on the {\\omega}B97X-D3\/def2-TZVP level of theory of the compound is -1261.44565 Hartree."}", "/scratch/micpie/export/orbnet_denali/train_1-1.jsonl": "{"text":"Question: What's the structure of a conformer of the chemical structure with InChI InChI=1S\/C19H34O7.3H\/c1-22-10-5-12(20)19-13(21)8-15(26-18(19)6-10)11-7-16(24-3)17(25-4)9-14(11)23-2;;;\/h10-21H,5-9H2,1-4H3;;;?\nConstraint: Return a XYZ file.\nAnswer: [XYZ]\n44\nH18 C19 O7\nC -6.542 -1.167 -2.752\nO -5.088 -0.998 -2.662\nC -4.441 -0.523 -1.609\nC -5.113 -0.193 -0.439\nC -4.456 0.341 0.675\nO -5.062 0.788 1.771\nC -3.046 0.424 0.596\nC -2.270 0.793 1.781\nO -2.884 1.013 2.887\nC -0.767 0.911 1.585\nC -0.182 0.474 0.404\nC 1.284 0.202 0.252\nC 1.779 -1.107 -0.151\nC 3.174 -1.311 -0.258\nO 0.866 -2.147 -0.290\nC 1.390 -3.379 -0.673\nC 4.080 -0.279 0.061\nO 5.353 -0.605 -0.052\nC 6.532 -0.133 0.854\nC 3.569 1.026 0.431\nC 2.234 1.310 0.481\nO 4.421 1.977 
0.840\nC 3.980 3.085 1.606\nO -0.928 0.131 -0.704\nC -2.313 -0.001 -0.611\nC -3.033 -0.518 -1.714\nH -6.692 -1.677 -3.713\nH -7.161 -0.227 -2.602\nH -6.898 -1.863 -2.014\nH -6.212 -0.192 -0.339\nH -4.311 1.050 2.378\nH -0.180 1.210 2.398\nH 0.568 -3.844 -1.348\nH 2.404 -3.406 -1.384\nH 3.595 -2.307 -0.454\nH 1.694 -4.046 0.308\nH 6.205 -0.100 1.962\nH 6.988 1.024 0.533\nH 7.342 -0.883 0.699\nH 4.810 3.631 1.927\nH 3.280 3.710 1.078\nH 1.707 2.303 0.631\nH 3.491 2.679 2.505\nH -2.446 -0.865 -2.521[\\XYZ]"} {"text":"Question: What is the structure of a conformer of the molecule with SMILES [H]C1C([H])C(Cl)C([H])C(N([H])C(O[K])C([H])([H])SC([H])([H])C(O)O)C1[H]?\nConstraint: Return a XYZ file.\nAnswer: [XYZ]\n26\nK1 H9 C10 S1 N1 Cl1 O3\nO 3.679 3.606 -0.112\nC 2.706 2.996 0.432\nO 1.620 3.459 0.748\nC 2.948 1.489 0.744\nS 3.940 0.690 -0.552\nC 5.486 0.389 0.329\nC 6.311 1.592 0.739\nO 7.269 1.394 1.510\nN 6.002 2.792 0.234\nC 6.782 3.922 0.518\nC 8.183 3.867 0.486\nC 8.934 5.009 0.745\nC 8.307 6.218 1.021\nC 6.917 6.274 1.018\nCl 6.132 7.777 1.324\nC 6.154 5.144 0.770\nK 9.103 2.257 2.626\nH 1.990 0.976 0.829\nH 3.460 1.424 1.706\nH 6.108 -0.187 -0.364\nH 5.337 -0.233 1.218\nH 4.911 3.050 -0.024\nH 8.695 2.938 0.214\nH 10.025 4.966 0.679\nH 8.887 7.113 1.213\nH 5.073 5.217 0.760[\\XYZ]"}", "/scratch/micpie/export/orbnet_denali/train_0-1.jsonl": "{"text":"Question: What is the structure of a conformer of the chemical with DeepSMILES [H].[H]CCCCO)OCN[H])[H])C[H])[H])C[H])[H])CC[H])C[H])C[H])C[H])C6I))))))))))NC6C[H])C%10[H]?\nConstraint: Return a XYZ file.\nAnswer: [XYZ]\n35\nH14 C16 I1 N2 O2\nO -2.154 -0.717 -3.091\nC -2.202 -0.565 -1.929\nO -1.122 -1.409 -1.401\nC -1.305 -1.457 -0.153\nN -0.152 -2.111 0.661\nC 1.219 -1.647 0.224\nC 1.664 -0.394 0.961\nC 2.777 0.375 0.333\nC 3.935 0.634 1.154\nC 5.097 1.070 0.548\nC 5.103 1.426 -0.852\nC 3.906 1.419 -1.526\nC 2.850 0.848 -0.977\nI 1.179 0.659 -2.177\nN -1.875 -0.757 0.760\nC -3.056 -0.085 0.312\nC -3.964 0.458 1.172\nC 
-5.089 1.206 0.786\nC -5.206 1.642 -0.536\nC -4.212 1.061 -1.374\nC -3.158 0.148 -1.083\nH -0.111 -3.134 0.808\nH -0.337 -1.781 1.604\nH 1.144 -1.613 -0.889\nH 1.963 -2.359 0.423\nH 2.027 -0.644 1.936\nH 0.824 0.315 1.112\nH 3.724 0.534 2.253\nH 5.898 1.526 1.184\nH 5.975 1.976 -1.191\nH 3.709 2.064 -2.391\nH -3.781 0.112 2.214\nH -5.796 1.666 1.521\nH -5.858 2.404 -0.949\nH -4.357 1.395 -2.509[\\XYZ]"} {"text":"Question: What's the structure of a conformer of the compound with InChI InChI=1S\/C19H34O7.H\/c1-22-10-5-12(20)19-13(21)8-15(26-18(19)6-10)11-7-16(24-3)17(25-4)9-14(11)23-2;\/h10-21H,5-9H2,1-4H3;?\nConstraint: Return a XYZ file.\nAnswer: [XYZ]\n44\nH18 C19 O7\nC -5.598 2.587 0.816\nO -5.926 1.423 0.013\nC -4.939 0.521 -0.270\nC -5.418 -0.709 -0.835\nC -4.461 -1.616 -1.307\nO -4.770 -2.786 -1.785\nC -3.070 -1.328 -1.136\nC -2.046 -2.384 -1.436\nO -2.387 -3.493 -1.882\nC -0.727 -2.031 -1.027\nC -0.429 -0.822 -0.590\nC 0.909 -0.400 -0.105\nC 1.179 1.028 -0.092\nC 2.416 1.533 0.228\nO 2.519 2.881 -0.147\nC 3.669 3.567 0.196\nC 3.432 0.679 0.617\nO 4.558 1.195 1.054\nC 5.499 0.294 1.785\nC 3.242 -0.665 0.606\nC 2.027 -1.198 0.254\nO 1.813 -2.546 0.258\nC 2.922 -3.496 0.237\nO -1.369 0.120 -0.394\nC -2.688 -0.147 -0.590\nC -3.591 0.803 -0.177\nH -6.605 2.908 1.172\nH -5.052 2.257 1.748\nH -5.030 3.357 0.311\nH -6.532 -0.752 -1.003\nH -3.809 -3.287 -1.760\nH 0.075 -2.844 -1.034\nH 0.410 1.814 -0.270\nH 3.583 4.677 -0.228\nH 4.585 3.040 -0.264\nH 3.881 3.555 1.284\nH 5.951 -0.390 1.068\nH 4.944 -0.290 2.595\nH 6.342 0.947 2.161\nH 3.966 -1.351 0.972\nH 3.545 -3.572 1.173\nH 2.440 -4.507 0.143\nH 3.593 -3.329 -0.658\nH -3.248 1.831 0.124[\\XYZ]"}", "/scratch/micpie/export/orbnet_denali/train_0-4.jsonl": "{"text":"Task: Return the total energy of a chemical structure computed at the GFN1-xTB level of theory.\nDescription: The chemical structure has the XYZ file [XYZ]\n35\nH14 C16 I1 N2 O2\nO -2.154 -0.717 -3.091\nC -2.202 -0.565 -1.929\nO -1.122 -1.409 -1.401\nC 
-1.305 -1.457 -0.153\nN -0.152 -2.111 0.661\nC 1.219 -1.647 0.224\nC 1.664 -0.394 0.961\nC 2.777 0.375 0.333\nC 3.935 0.634 1.154\nC 5.097 1.070 0.548\nC 5.103 1.426 -0.852\nC 3.906 1.419 -1.526\nC 2.850 0.848 -0.977\nI 1.179 0.659 -2.177\nN -1.875 -0.757 0.760\nC -3.056 -0.085 0.312\nC -3.964 0.458 1.172\nC -5.089 1.206 0.786\nC -5.206 1.642 -0.536\nC -4.212 1.061 -1.374\nC -3.158 0.148 -1.083\nH -0.111 -3.134 0.808\nH -0.337 -1.781 1.604\nH 1.144 -1.613 -0.889\nH 1.963 -2.359 0.423\nH 2.027 -0.644 1.936\nH 0.824 0.315 1.112\nH 3.724 0.534 2.253\nH 5.898 1.526 1.184\nH 5.975 1.976 -1.191\nH 3.709 2.064 -2.391\nH -3.781 0.112 2.214\nH -5.796 1.666 1.521\nH -5.858 2.404 -0.949\nH -4.357 1.395 -2.509[\\XYZ].\nAnswer: -60.53905 Hartree"} {"text":"Task: Return the total energy of a molecule computed at the GFN1-xTB level of theory.\nDescription: The molecule has the XYZ file [XYZ]\n44\nH18 C19 O7\nC -5.598 2.587 0.816\nO -5.926 1.423 0.013\nC -4.939 0.521 -0.270\nC -5.418 -0.709 -0.835\nC -4.461 -1.616 -1.307\nO -4.770 -2.786 -1.785\nC -3.070 -1.328 -1.136\nC -2.046 -2.384 -1.436\nO -2.387 -3.493 -1.882\nC -0.727 -2.031 -1.027\nC -0.429 -0.822 -0.590\nC 0.909 -0.400 -0.105\nC 1.179 1.028 -0.092\nC 2.416 1.533 0.228\nO 2.519 2.881 -0.147\nC 3.669 3.567 0.196\nC 3.432 0.679 0.617\nO 4.558 1.195 1.054\nC 5.499 0.294 1.785\nC 3.242 -0.665 0.606\nC 2.027 -1.198 0.254\nO 1.813 -2.546 0.258\nC 2.922 -3.496 0.237\nO -1.369 0.120 -0.394\nC -2.688 -0.147 -0.590\nC -3.591 0.803 -0.177\nH -6.605 2.908 1.172\nH -5.052 2.257 1.748\nH -5.030 3.357 0.311\nH -6.532 -0.752 -1.003\nH -3.809 -3.287 -1.760\nH 0.075 -2.844 -1.034\nH 0.410 1.814 -0.270\nH 3.583 4.677 -0.228\nH 4.585 3.040 -0.264\nH 3.881 3.555 1.284\nH 5.951 -0.390 1.068\nH 4.944 -0.290 2.595\nH 6.342 0.947 2.161\nH 3.966 -1.351 0.972\nH 3.545 -3.572 1.173\nH 2.440 -4.507 0.143\nH 3.593 -3.329 -0.658\nH -3.248 1.831 0.124[\\XYZ].\nAnswer: -78.67223 Hartree"}", "/scratch/micpie/export/orbnet_denali/valid_1-0.jsonl": 
"{"text":"The chemical structure with canonical SMILES COC1CC(O)C2C(O)CC(C3CC(OC)C(OC)CC3OC)OC2C1.[H].[H] has a charge of 0."} {"text":"The chemical structure with SMILES [H]C1C([H])C(Cl)C([H])C(N([H])C2O[K]OC(O)C([H])([H])SC2([H])[H])C1[H] has a charge of 0."}", "/scratch/micpie/export/orbnet_denali/test_0-7.jsonl": "{"text":"User: I must know the {\\omega}B97X-D3\/def2-TZVP total energy of the compound with DeepSMILES [H].[H]CCC[H])CI)CC[H])[H])C[H])[H])N[H])[H])CNCC[H])C[H])C[H])C[H])C6CO)O%10)))))))))))))C6[H].\nAssistant: Do you have the XYZ file file of a conformer of the compound?\nUser: Yes: [XYZ]\n35\nH14 C16 I1 N2 O2\nO -2.054 -1.196 -3.088\nC -2.281 -1.177 -1.852\nO -1.345 -1.785 -0.955\nC -1.195 -1.440 0.310\nN -0.079 -2.118 0.983\nC 1.270 -1.677 0.475\nC 1.606 -0.209 0.921\nC 2.886 0.260 0.255\nC 4.164 0.034 0.932\nC 5.444 0.190 0.245\nC 5.408 0.755 -1.040\nC 4.142 1.096 -1.658\nC 2.917 0.731 -1.039\nI 1.044 1.177 -2.067\nN -1.978 -0.700 0.985\nC -3.028 0.009 0.220\nC -3.810 0.980 0.905\nC -4.892 1.490 0.235\nC -5.165 1.141 -1.079\nC -4.379 0.256 -1.802\nC -3.237 -0.323 -1.126\nH -0.280 -3.134 0.808\nH -0.185 -1.978 1.970\nH 1.091 -1.731 -0.641\nH 1.972 -2.409 0.793\nH 1.741 -0.182 2.060\nH 0.810 0.503 0.617\nH 4.155 -0.341 1.973\nH 6.395 0.003 0.813\nH 6.404 0.853 -1.690\nH 4.157 1.477 -2.661\nH -3.480 1.259 1.951\nH -5.547 2.250 0.720\nH -6.061 1.537 -1.581\nH -4.536 -0.007 -2.827[\\XYZ]\nAssistant: The total energy on the {\\omega}B97X-D3\/def2-TZVP level of theory of the compound is -1175.64052 Hartree."} {"text":"User: I would like to know the {\\omega}B97X-D3\/def2-TZVP total energy of the compound with SELFIES 
[H].[H][O][C][C][Branch1][C][H][C][Branch1][#Branch2][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][C][H][C][O][C][Branch2][Ring2][N][C][C][Branch1][C][H][C][Branch1][#Branch2][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][#Branch2][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][C][H][C][Ring2][Ring1][C][O][C][Branch1][C][H][H][C][Branch1][C][H][C][Branch1][C][O][C][Ring2][Ring2][=Branch2][Ring2][Ring1][=C].\nAssistant: Do you have the XYZ file file of a conformer of the compound?\nUser: Here it is: [XYZ]\n44\nH18 C19 O7\nC -5.666 2.701 0.538\nO -5.907 1.500 -0.218\nC -4.925 0.591 -0.544\nC -5.368 -0.571 -1.168\nC -4.365 -1.568 -1.399\nO -4.759 -2.607 -2.108\nC -3.031 -1.354 -1.064\nC -2.106 -2.461 -1.198\nO -2.420 -3.607 -1.637\nC -0.708 -2.198 -0.814\nC -0.475 -0.973 -0.269\nC 0.820 -0.487 0.148\nC 1.052 0.941 0.366\nC 2.280 1.558 0.481\nO 2.201 2.924 0.494\nC 3.396 3.599 0.923\nC 3.480 0.694 0.443\nO 4.760 1.208 0.611\nC 6.014 0.321 0.236\nC 3.271 -0.697 0.372\nC 1.998 -1.328 0.251\nO 1.915 -2.662 0.131\nC 2.939 -3.386 0.851\nO -1.395 0.020 -0.161\nC -2.680 -0.140 -0.486\nC -3.606 0.894 -0.329\nH -6.576 3.228 0.666\nH -5.247 2.422 1.490\nH -4.954 3.418 0.055\nH -6.407 -0.714 -1.300\nH -3.957 -3.333 -2.024\nH 0.057 -2.945 -0.917\nH 0.160 1.640 0.360\nH 2.986 4.676 1.175\nH 4.179 3.540 0.120\nH 3.815 3.135 1.855\nH 5.897 -0.206 -0.788\nH 6.241 -0.309 1.144\nH 6.897 0.964 0.152\nH 4.246 -1.221 0.270\nH 2.981 -3.001 1.979\nH 2.758 -4.539 0.933\nH 3.979 -3.301 0.444\nH -3.230 1.838 0.076[\\XYZ]\nAssistant: The total energy on the {\\omega}B97X-D3\/def2-TZVP level of theory of the compound is -1261.42237 Hartree."}", "/scratch/micpie/export/orbnet_denali/test_1-6.jsonl": "{"text":"User: I would like to know the GFN1-xTB total energy of the chemical structure with SELFIES 
[H][O][C][C][Branch1][C][H][C][Branch1][#Branch2][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][C][H][C][O][C][Branch2][Ring2][#C][C][C][Branch1][C][H][C][Branch1][#Branch2][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][#Branch2][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][C][H][C][Ring2][Ring1][C][O][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][C][H][C][Branch1][C][O][C][Ring2][Ring2][#Branch2][Ring2][Ring1][#C].\nAssistant: Do you have the XYZ file file of a conformer of the chemical structure?\nUser: I have it: [XYZ]\n44\nH18 C19 O7\nC -6.444 -1.225 -2.667\nO -5.032 -1.257 -2.558\nC -4.412 -0.679 -1.504\nC -5.091 -0.023 -0.474\nC -4.376 0.545 0.576\nO -5.018 1.170 1.557\nC -2.962 0.461 0.601\nC -2.198 1.043 1.694\nO -2.735 1.639 2.634\nC -0.768 0.855 1.586\nC -0.210 0.202 0.533\nC 1.237 0.002 0.376\nC 1.781 -1.141 -0.237\nC 3.164 -1.258 -0.344\nO 0.928 -2.101 -0.660\nC 1.411 -3.284 -1.265\nC 4.029 -0.275 0.133\nO 5.341 -0.543 -0.059\nC 6.381 0.326 0.361\nC 3.490 0.872 0.752\nC 2.114 0.983 0.860\nO 4.349 1.836 1.215\nC 3.840 2.982 1.870\nO -0.958 -0.304 -0.464\nC -2.311 -0.205 -0.448\nC -3.018 -0.773 -1.497\nH -6.690 -1.754 -3.589\nH -6.816 -0.196 -2.731\nH -6.921 -1.732 -1.821\nH -6.169 0.059 -0.465\nH -4.316 1.497 2.201\nH -0.152 1.234 2.391\nH 0.530 -3.878 -1.512\nH 1.968 -3.066 -2.184\nH 3.617 -2.125 -0.805\nH 2.050 -3.854 -0.580\nH 6.371 0.467 1.446\nH 6.316 1.298 -0.137\nH 7.313 -0.165 0.072\nH 4.708 3.575 2.163\nH 3.206 3.579 1.204\nH 1.685 1.869 1.308\nH 3.270 2.712 2.768\nH -2.502 -1.283 -2.298[\\XYZ]\nAssistant: The GFN1-xTB total energy of the chemical structure is -78.73657 Hartree."} {"text":"User: I have to know the GFN1-xTB total energy of the chemical structure with DeepSMILES [H]CC[H])CCl)C[H])CN[H])CO[Li]))C[H])[H])SC[H])[H])CO)O)))))))C6[H].\nAssistant: Do you have the XYZ file file of a conformer of the chemical structure?\nUser: Yes: [XYZ]\n26\nLi1 H9 C10 S1 N1 Cl1 O3\nO 4.932 0.704 0.314\nC 4.841 1.329 -0.734\nO 5.616 
2.232 -1.168\nC 3.620 0.950 -1.636\nS 3.203 2.210 -2.877\nC 4.122 1.576 -4.294\nC 5.529 2.120 -4.531\nO 5.925 1.990 -5.727\nN 6.199 2.648 -3.540\nC 7.514 3.156 -3.584\nC 8.133 3.450 -2.367\nC 9.427 3.944 -2.343\nC 10.125 4.157 -3.524\nC 9.497 3.881 -4.728\nCl 10.344 4.127 -6.210\nC 8.198 3.400 -4.774\nLi 4.878 1.192 -6.789\nH 2.752 0.835 -1.001\nH 3.813 -0.007 -2.114\nH 3.555 1.873 -5.217\nH 4.196 0.484 -4.269\nH 5.808 2.482 -2.478\nH 7.607 3.284 -1.434\nH 9.903 4.167 -1.394\nH 11.142 4.529 -3.508\nH 7.744 3.227 -5.738[\\XYZ]\nAssistant: The GFN1-xTB total energy of the chemical structure is -49.40739 Hartree."}", "/scratch/micpie/export/orbnet_denali/valid_0-3.jsonl": "{"text":"Question: What's the structure of a conformer of the chemical with DeepSMILES [H]CC[H])C[H])CC[H])[H])C[H])[H])N[H])[H])CNCC[H])C[H])C[H])C[H])C6CO)O%10)))))))))))))CI)C6[H]?\nConstraint: Return a MOL3000 file.\nAnswer: [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 35 37 0 0 0\nM V30 BEGIN ATOM\nM V30 1 O -2.540103 -1.190494 -3.024580 0 VAL=1\nM V30 2 C -2.433895 -1.001429 -1.860949 0 VAL=3\nM V30 3 O -1.397856 -1.690668 -1.154207 0\nM V30 4 C -1.231032 -1.479446 0.137848 0 VAL=3\nM V30 5 N -0.087771 -2.241967 0.652745 0 VAL=4\nM V30 6 C 1.221441 -1.655500 0.190432 0\nM V30 7 C 1.444238 -0.282843 0.818340 0\nM V30 8 C 2.753127 0.289790 0.317922 0 VAL=3\nM V30 9 C 3.874157 0.246936 1.143204 0 VAL=3\nM V30 10 C 5.086856 0.766726 0.716523 0 VAL=3\nM V30 11 C 5.184426 1.332504 -0.546298 0 VAL=3\nM V30 12 C 4.074132 1.372826 -1.376653 0 VAL=3\nM V30 13 C 2.856523 0.851732 -0.954896 0 VAL=3\nM V30 14 I 1.196922 0.925679 -2.270431 0\nM V30 15 N -1.867907 -0.738824 0.932052 0 VAL=2\nM V30 16 C -2.934506 -0.019830 0.377123 0 VAL=3\nM V30 17 C -3.696506 0.813251 1.191684 0 VAL=3\nM V30 18 C -4.745638 1.528138 0.636606 0 VAL=3\nM V30 19 C -5.039840 1.420896 -0.721955 0 VAL=3\nM V30 20 C -4.286213 0.595991 -1.537381 0 VAL=3\nM V30 21 C -3.224019 -0.135244 
-0.994885 0 VAL=3\nM V30 22 H -0.160854 -3.196612 0.301437 0\nM V30 23 H -0.137167 -2.230590 1.671450 0\nM V30 24 H 1.162781 -1.570428 -0.899757 0\nM V30 25 H 2.002685 -2.363714 0.473321 0\nM V30 26 H 1.454724 -0.371826 1.908258 0\nM V30 27 H 0.619404 0.375106 0.527542 0\nM V30 28 H 3.790297 -0.192506 2.131583 0\nM V30 29 H 5.949819 0.730135 1.370311 0\nM V30 30 H 6.125778 1.744489 -0.889624 0\nM V30 31 H 4.152506 1.813097 -2.362641 0\nM V30 32 H -3.459377 0.890395 2.245564 0\nM V30 33 H -5.343114 2.177478 1.265803 0\nM V30 34 H -5.863417 1.986688 -1.140685 0\nM V30 35 H -4.500601 0.500062 -2.595861 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 2 3\nM V30 3 1 2 21\nM V30 4 1 3 4\nM V30 5 1 4 5\nM V30 6 1 4 15\nM V30 7 1 5 6\nM V30 8 1 5 22\nM V30 9 1 5 23\nM V30 10 1 6 7\nM V30 11 1 6 24\nM V30 12 1 6 25\nM V30 13 1 7 8\nM V30 14 1 7 26\nM V30 15 1 7 27\nM V30 16 1 8 9\nM V30 17 1 8 13\nM V30 18 1 9 10\nM V30 19 1 9 28\nM V30 20 1 10 11\nM V30 21 1 10 29\nM V30 22 1 11 12\nM V30 23 1 11 30\nM V30 24 1 12 13\nM V30 25 1 12 31\nM V30 26 1 13 14\nM V30 27 1 15 16\nM V30 28 1 16 17\nM V30 29 1 16 21\nM V30 30 1 17 18\nM V30 31 1 17 32\nM V30 32 1 18 19\nM V30 33 1 18 33\nM V30 34 1 19 20\nM V30 35 1 19 34\nM V30 36 1 20 21\nM V30 37 1 20 35\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000]"} {"text":"Question: What's the structure of a conformer of the chemical structure with canonical SMILES COC1C(C(O)CC(O)C2CCCCC2)CCC2OCCC21?\nConstraint: Return a MOL3000 file.\nAnswer: [V3000]\n\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 36 38 0 0 0\nM V30 BEGIN ATOM\nM V30 1 C -2.217687 -1.492382 2.559104 0 VAL=3\nM V30 2 O -5.256178 -1.279683 0.457441 0\nM V30 3 C -4.004187 -1.186160 1.035626 0 VAL=3\nM V30 4 C -1.352331 -0.748252 1.755460 0 VAL=3\nM V30 5 C 0.089638 -0.808818 2.105938 0 VAL=3\nM V30 6 O 0.500272 -1.048031 3.205330 0 VAL=1\nM V30 7 C 0.906010 -0.582672 0.885775 0 VAL=3\nM V30 8 C 2.882330 -0.502659 -1.448306 0 
VAL=3\nM V30 9 O 2.653191 0.379714 2.127734 0\nM V30 10 C 3.024521 0.286994 -0.317598 0 VAL=3\nM V30 11 C 3.732062 -0.369437 -2.513078 0 VAL=3\nM V30 12 C 4.734769 0.634307 -2.440545 0 VAL=3\nM V30 13 C 4.728314 1.588301 -1.387604 0 VAL=3\nM V30 14 C 3.995368 1.331104 -0.253256 0 VAL=3\nM V30 15 C 2.161929 -0.014627 0.895741 0 VAL=3\nM V30 16 C -1.854319 -0.095232 0.548968 0 VAL=3\nM V30 17 C -1.604552 1.747132 -0.995537 0\nM V30 18 C -3.171391 -0.420252 0.174158 0 VAL=3\nM V30 19 O -1.040788 0.760005 -0.135313 0\nM V30 20 C -4.021762 0.030830 -0.855552 0 VAL=3\nM V30 21 C -5.230882 -0.574769 -0.712821 0 VAL=3\nM V30 22 C -3.587310 -1.742569 2.265661 0 VAL=3\nM V30 23 H -4.233802 -2.330313 2.952823 0\nM V30 24 H -1.786829 -1.824699 3.577188 0\nM V30 25 H 0.460016 -0.761705 -0.063113 0\nM V30 26 H -0.765888 2.364338 -1.261163 0\nM V30 27 H 2.003165 -1.193700 -1.540570 0\nM V30 28 H 3.702661 -1.017874 -3.399618 0\nM V30 29 H 5.418395 0.901982 -3.203091 0\nM V30 30 H 5.428019 2.427512 -1.383060 0\nM V30 31 H 3.545665 0.998287 2.171333 0\nM V30 32 H 4.031352 1.980398 0.638550 0\nM V30 33 H -2.386323 2.271592 -0.502259 0\nM V30 34 H -2.064252 1.319199 -1.908908 0\nM V30 35 H -6.157670 -0.616637 -1.203088 0\nM V30 36 H -3.739688 0.773918 -1.595133 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 4\nM V30 2 1 1 22\nM V30 3 1 1 24\nM V30 4 1 2 3\nM V30 5 1 2 21\nM V30 6 1 3 18\nM V30 7 1 3 22\nM V30 8 1 4 5\nM V30 9 1 4 16\nM V30 10 1 5 6\nM V30 11 1 5 7\nM V30 12 1 7 15\nM V30 13 1 7 25\nM V30 14 1 8 10\nM V30 15 1 8 11\nM V30 16 1 8 27\nM V30 17 1 9 15\nM V30 18 1 9 31\nM V30 19 1 10 14\nM V30 20 1 10 15\nM V30 21 1 11 12\nM V30 22 1 11 28\nM V30 23 1 12 13\nM V30 24 1 12 29\nM V30 25 1 13 14\nM V30 26 1 13 30\nM V30 27 1 14 32\nM V30 28 1 16 18\nM V30 29 1 16 19\nM V30 30 1 17 19\nM V30 31 1 17 26\nM V30 32 1 17 33\nM V30 33 1 17 34\nM V30 34 1 18 20\nM V30 35 1 20 21\nM V30 36 1 20 36\nM V30 37 1 21 35\nM V30 38 1 22 23\nM V30 END BOND\nM V30 END CTAB\nM 
END\n[\\V3000]"}", "/scratch/micpie/export/orbnet_denali/valid_1-5.jsonl": "{"text":"Task: Return the total energy of a compound computed at the {\\omega}B97X-D3\/def2-TZVP level of theory.\nDescription: The compound has the XYZ file [XYZ]\n44\nH18 C19 O7\nC -6.481 -0.583 -2.661\nO -5.062 -0.756 -2.484\nC -4.467 -0.303 -1.382\nC -5.139 0.379 -0.381\nC -4.411 0.726 0.787\nO -5.076 1.308 1.766\nC -3.045 0.598 0.851\nC -2.252 1.109 1.982\nO -2.736 1.714 2.918\nC -0.837 0.838 1.844\nC -0.294 0.172 0.813\nC 1.134 -0.141 0.610\nC 1.514 -1.332 -0.067\nC 2.827 -1.677 -0.276\nO 3.038 -2.879 -0.878\nC 4.364 -3.212 -1.413\nC 3.874 -0.699 0.051\nO 5.193 -0.926 -0.246\nC 6.215 0.040 0.114\nC 3.496 0.469 0.707\nC 2.157 0.722 0.985\nO 1.761 1.876 1.664\nC 1.809 3.102 0.922\nO -1.062 -0.357 -0.116\nC -2.383 -0.159 -0.139\nC -3.095 -0.594 -1.247\nH -6.753 -1.026 -3.600\nH -6.783 0.451 -2.593\nH -7.069 -1.133 -1.887\nH -6.186 0.560 -0.303\nH -4.415 1.560 2.496\nH -0.195 1.205 2.616\nH 0.770 -2.024 -0.470\nH 4.247 -4.144 -1.964\nH 4.753 -2.368 -2.117\nH 5.080 -3.422 -0.580\nH 6.143 0.312 1.210\nH 6.193 0.978 -0.511\nH 7.171 -0.468 0.027\nH 4.205 1.248 0.958\nH 2.682 3.059 0.118\nH 0.784 3.293 0.401\nH 2.078 4.044 1.596\nH -2.676 -1.184 -2.043[\\XYZ].\nAnswer: -1261.46817 Hartree"} {"text":"Task: Return the total energy of a compound computed at the {\\omega}B97X-D3\/def2-TZVP level of theory.\nDescription: The compound has the XYZ file [XYZ]\n26\nK1 H9 C10 S1 N1 Cl1 O3\nO 3.248 0.034 -0.392\nC 2.472 0.812 0.211\nO 1.284 1.038 0.016\nC 3.145 1.611 1.415\nS 3.895 3.156 0.828\nC 5.431 2.612 0.009\nC 6.455 2.333 1.077\nO 6.579 1.193 1.529\nN 7.215 3.326 1.597\nC 7.349 4.677 1.186\nC 6.244 5.492 0.963\nC 6.433 6.819 0.600\nC 7.711 7.343 0.473\nC 8.808 6.529 0.719\nCl 10.403 7.156 0.588\nC 8.636 5.199 1.070\nK 5.167 -0.455 0.690\nH 3.886 0.990 1.957\nH 2.368 1.942 2.108\nH 5.732 3.411 -0.664\nH 5.254 1.688 -0.570\nH 7.885 2.989 2.283\nH 5.239 5.112 1.101\nH 5.572 7.452 0.419\nH 7.860 8.381 
0.209\nH 9.497 4.572 1.252[\\XYZ].\nAnswer: -2125.39707 Hartree"}", "/scratch/micpie/export/orbnet_denali/valid_1-7.jsonl": "{"text":"User: I would like to know the {\\omega}B97X-D3\/def2-TZVP total energy of the chemical with InChI InChI=1S\/C19H34O7.2H\/c1-22-10-5-12(20)19-13(21)8-15(26-18(19)6-10)11-7-16(24-3)17(25-4)9-14(11)23-2;;\/h10-21H,5-9H2,1-4H3;;.\nAssistant: Do you have the XYZ file file of a conformer of the chemical?\nUser: Here it is: [XYZ]\n44\nH18 C19 O7\nC -6.481 -0.583 -2.661\nO -5.062 -0.756 -2.484\nC -4.467 -0.303 -1.382\nC -5.139 0.379 -0.381\nC -4.411 0.726 0.787\nO -5.076 1.308 1.766\nC -3.045 0.598 0.851\nC -2.252 1.109 1.982\nO -2.736 1.714 2.918\nC -0.837 0.838 1.844\nC -0.294 0.172 0.813\nC 1.134 -0.141 0.610\nC 1.514 -1.332 -0.067\nC 2.827 -1.677 -0.276\nO 3.038 -2.879 -0.878\nC 4.364 -3.212 -1.413\nC 3.874 -0.699 0.051\nO 5.193 -0.926 -0.246\nC 6.215 0.040 0.114\nC 3.496 0.469 0.707\nC 2.157 0.722 0.985\nO 1.761 1.876 1.664\nC 1.809 3.102 0.922\nO -1.062 -0.357 -0.116\nC -2.383 -0.159 -0.139\nC -3.095 -0.594 -1.247\nH -6.753 -1.026 -3.600\nH -6.783 0.451 -2.593\nH -7.069 -1.133 -1.887\nH -6.186 0.560 -0.303\nH -4.415 1.560 2.496\nH -0.195 1.205 2.616\nH 0.770 -2.024 -0.470\nH 4.247 -4.144 -1.964\nH 4.753 -2.368 -2.117\nH 5.080 -3.422 -0.580\nH 6.143 0.312 1.210\nH 6.193 0.978 -0.511\nH 7.171 -0.468 0.027\nH 4.205 1.248 0.958\nH 2.682 3.059 0.118\nH 0.784 3.293 0.401\nH 2.078 4.044 1.596\nH -2.676 -1.184 -2.043[\\XYZ]\nAssistant: The total energy on the {\\omega}B97X-D3\/def2-TZVP level of theory of the chemical is -1261.46817 Hartree."} {"text":"User: I would like to know the {\\omega}B97X-D3\/def2-TZVP total energy of the molecule with SELFIES [H][C][C][Branch1][C][H][C][Branch1][C][Cl][C][Branch1][C][H][C][Branch2][Ring1][N][N][Branch1][C][H][C][O][K][O][C][Branch1][C][O][C][Branch1][C][H][Branch1][C][H][S][C][Ring1][O][Branch1][C][H][H][C][Ring2][Ring1][Branch2][H].\nAssistant: Do you have the XYZ file file of a conformer of the 
molecule?\nUser: I have it: [XYZ]\n26\nK1 H9 C10 S1 N1 Cl1 O3\nO 3.248 0.034 -0.392\nC 2.472 0.812 0.211\nO 1.284 1.038 0.016\nC 3.145 1.611 1.415\nS 3.895 3.156 0.828\nC 5.431 2.612 0.009\nC 6.455 2.333 1.077\nO 6.579 1.193 1.529\nN 7.215 3.326 1.597\nC 7.349 4.677 1.186\nC 6.244 5.492 0.963\nC 6.433 6.819 0.600\nC 7.711 7.343 0.473\nC 8.808 6.529 0.719\nCl 10.403 7.156 0.588\nC 8.636 5.199 1.070\nK 5.167 -0.455 0.690\nH 3.886 0.990 1.957\nH 2.368 1.942 2.108\nH 5.732 3.411 -0.664\nH 5.254 1.688 -0.570\nH 7.885 2.989 2.283\nH 5.239 5.112 1.101\nH 5.572 7.452 0.419\nH 7.860 8.381 0.209\nH 9.497 4.572 1.252[\\XYZ]\nAssistant: The total energy on the {\\omega}B97X-D3\/def2-TZVP level of theory of the molecule is -2125.39707 Hartree."}", "/scratch/micpie/export/orbnet_denali/train_1-2.jsonl": "{"text":"Question: What is the structure of a conformer of the compound with DeepSMILES [H].[H].[H].[H]COCC[H])COC[H])[H])))COC[H])[H])[H])))C[H])C6COCC[H])COC[H])[H])[H])))C[H])CO[H]))C6CO)C%10[H]?\nConstraint: Return a MOL2000 file.\nAnswer: [V2000]\n\n ChemNLP 3D\n\n 44 43 0 0 0 0 0 0 0 0999 V2000\n -6.5422 -1.1667 -2.7525 C 0 0 0 0 0 0 0 0 0 0 0 0\n -5.0885 -0.9976 -2.6621 O 0 0 0 0 0 0 0 0 0 0 0 0\n -4.4407 -0.5231 -1.6086 C 0 0 0 0 0 3 0 0 0 0 0 0\n -5.1132 -0.1932 -0.4389 C 0 0 0 0 0 3 0 0 0 0 0 0\n -4.4557 0.3408 0.6746 C 0 0 0 0 0 3 0 0 0 0 0 0\n -5.0623 0.7877 1.7707 O 0 0 0 0 0 0 0 0 0 0 0 0\n -3.0457 0.4236 0.5961 C 0 0 0 0 0 3 0 0 0 0 0 0\n -2.2695 0.7931 1.7811 C 0 0 0 0 0 3 0 0 0 0 0 0\n -2.8844 1.0133 2.8871 O 0 0 0 0 0 1 0 0 0 0 0 0\n -0.7668 0.9109 1.5852 C 0 0 0 0 0 3 0 0 0 0 0 0\n -0.1823 0.4745 0.4044 C 0 0 0 0 0 3 0 0 0 0 0 0\n 1.2836 0.2020 0.2524 C 0 0 0 0 0 3 0 0 0 0 0 0\n 1.7786 -1.1065 -0.1513 C 0 0 0 0 0 3 0 0 0 0 0 0\n 3.1738 -1.3114 -0.2578 C 0 0 0 0 0 3 0 0 0 0 0 0\n 0.8661 -2.1471 -0.2897 O 0 0 0 0 0 0 0 0 0 0 0 0\n 1.3901 -3.3786 -0.6727 C 0 0 0 0 0 2 0 0 0 0 0 0\n 4.0804 -0.2789 0.0605 C 0 0 0 0 0 3 0 0 0 0 0 0\n 5.3534 -0.6053 -0.0515 O 0 0 0 0 
0 0 0 0 0 0 0 0\n 6.5320 -0.1334 0.8537 C 0 0 0 0 0 3 0 0 0 0 0 0\n 3.5693 1.0256 0.4310 C 0 0 0 0 0 3 0 0 0 0 0 0\n 2.2336 1.3096 0.4814 C 0 0 0 0 0 3 0 0 0 0 0 0\n 4.4210 1.9769 0.8397 O 0 0 0 0 0 0 0 0 0 0 0 0\n 3.9801 3.0855 1.6060 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.9282 0.1305 -0.7041 O 0 0 0 0 0 0 0 0 0 0 0 0\n -2.3128 -0.0015 -0.6110 C 0 0 0 0 0 3 0 0 0 0 0 0\n -3.0332 -0.5184 -1.7136 C 0 0 0 0 0 3 0 0 0 0 0 0\n -6.6925 -1.6774 -3.7129 H 0 0 0 0 0 0 0 0 0 0 0 0\n -7.1605 -0.2266 -2.6021 H 0 0 0 0 0 0 0 0 0 0 0 0\n -6.8982 -1.8632 -2.0135 H 0 0 0 0 0 0 0 0 0 0 0 0\n -6.2125 -0.1918 -0.3393 H 0 0 0 0 0 0 0 0 0 0 0 0\n -4.3106 1.0501 2.3775 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.1804 1.2098 2.3978 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5684 -3.8442 -1.3477 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2.4035 -3.4061 -1.3843 H 0 0 0 0 0 15 0 0 0 0 0 0\n 3.5954 -2.3071 -0.4544 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6942 -4.0457 0.3079 H 0 0 0 0 0 15 0 0 0 0 0 0\n 6.2053 -0.1003 1.9616 H 0 0 0 0 0 0 0 0 0 0 0 0\n 6.9878 1.0244 0.5332 H 0 0 0 0 0 15 0 0 0 0 0 0\n 7.3417 -0.8835 0.6989 H 0 0 0 0 0 0 0 0 0 0 0 0\n 4.8102 3.6306 1.9271 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.2798 3.7096 1.0781 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.7073 2.3028 0.6312 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.4908 2.6787 2.5051 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.4459 -0.8649 -2.5212 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 1 27 1 0\n 1 28 1 0\n 1 29 1 0\n 2 3 1 0\n 3 4 1 0\n 3 26 1 0\n 4 5 1 0\n 4 30 1 0\n 5 6 1 0\n 5 7 1 0\n 6 31 1 0\n 7 8 1 0\n 7 25 1 0\n 8 9 1 0\n 8 10 1 0\n 10 11 1 0\n 10 32 1 0\n 11 12 1 0\n 11 24 1 0\n 12 13 1 0\n 12 21 1 0\n 13 14 1 0\n 13 15 1 0\n 14 17 1 0\n 14 35 1 0\n 15 16 1 0\n 16 33 1 0\n 17 18 1 0\n 17 20 1 0\n 18 19 1 0\n 19 37 1 0\n 19 39 1 0\n 20 21 1 0\n 20 22 1 0\n 21 42 1 0\n 22 23 1 0\n 23 40 1 0\n 23 41 1 0\n 23 43 1 0\n 24 25 1 0\n 25 26 1 0\n 26 44 1 0\nM END\n[\\V2000]"} {"text":"Question: What's the structure of a conformer of the chemical with SMILES 
[H]C1C([H])C(Cl)C([H])C(N([H])C(O[K])C([H])([H])SC([H])([H])C(O)O)C1[H]?\nConstraint: Return a MOL2000 file.\nAnswer: [V2000]\n\n ChemNLP 3D\n\n 26 26 0 0 0 0 0 0 0 0999 V2000\n 3.6794 3.6061 -0.1119 O 0 0 0 0 0 1 0 0 0 0 0 0\n 2.7065 2.9964 0.4317 C 0 0 0 0 0 3 0 0 0 0 0 0\n 1.6201 3.4593 0.7476 O 0 0 0 0 0 1 0 0 0 0 0 0\n 2.9483 1.4887 0.7439 C 0 0 0 0 0 0 0 0 0 0 0 0\n 3.9395 0.6899 -0.5516 S 0 0 0 0 0 0 0 0 0 0 0 0\n 5.4863 0.3887 0.3292 C 0 0 0 0 0 0 0 0 0 0 0 0\n 6.3113 1.5919 0.7392 C 0 0 0 0 0 3 0 0 0 0 0 0\n 7.2689 1.3943 1.5098 O 0 0 0 0 0 0 0 0 0 0 0 0\n 6.0016 2.7921 0.2341 N 0 0 0 0 0 0 0 0 0 0 0 0\n 6.7817 3.9216 0.5183 C 0 0 0 0 0 3 0 0 0 0 0 0\n 8.1825 3.8674 0.4857 C 0 0 0 0 0 3 0 0 0 0 0 0\n 8.9340 5.0092 0.7450 C 0 0 0 0 0 3 0 0 0 0 0 0\n 8.3069 6.2183 1.0214 C 0 0 0 0 0 3 0 0 0 0 0 0\n 6.9168 6.2739 1.0180 C 0 0 0 0 0 3 0 0 0 0 0 0\n 6.1321 7.7766 1.3241 Cl 0 0 0 0 0 0 0 0 0 0 0 0\n 6.1536 5.1436 0.7703 C 0 0 0 0 0 3 0 0 0 0 0 0\n 9.1035 2.2574 2.6262 K 0 0 0 0 0 1 0 0 0 0 0 0\n 1.9895 0.9757 0.8294 H 0 0 0 0 0 0 0 0 0 0 0 0\n 3.4596 1.4236 1.7061 H 0 0 0 0 0 0 0 0 0 0 0 0\n 6.1076 -0.1865 -0.3644 H 0 0 0 0 0 0 0 0 0 0 0 0\n 5.3368 -0.2334 1.2176 H 0 0 0 0 0 0 0 0 0 0 0 0\n 4.9114 3.0500 -0.0236 H 0 0 0 0 0 0 0 0 0 0 0 0\n 8.6952 2.9377 0.2137 H 0 0 0 0 0 0 0 0 0 0 0 0\n 10.0246 4.9665 0.6794 H 0 0 0 0 0 0 0 0 0 0 0 0\n 8.8872 7.1126 1.2127 H 0 0 0 0 0 0 0 0 0 0 0 0\n 5.0729 5.2174 0.7601 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 3 1 0\n 2 4 1 0\n 4 5 1 0\n 4 18 1 0\n 4 19 1 0\n 5 6 1 0\n 6 7 1 0\n 6 20 1 0\n 6 21 1 0\n 7 8 1 0\n 7 9 1 0\n 8 17 1 0\n 9 10 1 0\n 9 22 1 0\n 10 11 1 0\n 10 16 1 0\n 11 12 1 0\n 11 23 1 0\n 12 13 1 0\n 12 24 1 0\n 13 14 1 0\n 13 25 1 0\n 14 15 1 0\n 14 16 1 0\n 16 26 1 0\nM END\n[\\V2000]"}", "/scratch/micpie/export/orbnet_denali/test_0-4.jsonl": "{"text":"Task: Return the total energy of a molecule computed at the GFN1-xTB level of theory.\nDescription: The molecule has the XYZ file [XYZ]\n35\nH14 C16 I1 N2 O2\nO 
-2.054 -1.196 -3.088\nC -2.281 -1.177 -1.852\nO -1.345 -1.785 -0.955\nC -1.195 -1.440 0.310\nN -0.079 -2.118 0.983\nC 1.270 -1.677 0.475\nC 1.606 -0.209 0.921\nC 2.886 0.260 0.255\nC 4.164 0.034 0.932\nC 5.444 0.190 0.245\nC 5.408 0.755 -1.040\nC 4.142 1.096 -1.658\nC 2.917 0.731 -1.039\nI 1.044 1.177 -2.067\nN -1.978 -0.700 0.985\nC -3.028 0.009 0.220\nC -3.810 0.980 0.905\nC -4.892 1.490 0.235\nC -5.165 1.141 -1.079\nC -4.379 0.256 -1.802\nC -3.237 -0.323 -1.126\nH -0.280 -3.134 0.808\nH -0.185 -1.978 1.970\nH 1.091 -1.731 -0.641\nH 1.972 -2.409 0.793\nH 1.741 -0.182 2.060\nH 0.810 0.503 0.617\nH 4.155 -0.341 1.973\nH 6.395 0.003 0.813\nH 6.404 0.853 -1.690\nH 4.157 1.477 -2.661\nH -3.480 1.259 1.951\nH -5.547 2.250 0.720\nH -6.061 1.537 -1.581\nH -4.536 -0.007 -2.827[\\XYZ].\nAnswer: -60.60328 Hartree"} {"text":"Task: Return the total energy of a chemical computed at the GFN1-xTB level of theory.\nDescription: The chemical has the XYZ file [XYZ]\n44\nH18 C19 O7\nC -5.666 2.701 0.538\nO -5.907 1.500 -0.218\nC -4.925 0.591 -0.544\nC -5.368 -0.571 -1.168\nC -4.365 -1.568 -1.399\nO -4.759 -2.607 -2.108\nC -3.031 -1.354 -1.064\nC -2.106 -2.461 -1.198\nO -2.420 -3.607 -1.637\nC -0.708 -2.198 -0.814\nC -0.475 -0.973 -0.269\nC 0.820 -0.487 0.148\nC 1.052 0.941 0.366\nC 2.280 1.558 0.481\nO 2.201 2.924 0.494\nC 3.396 3.599 0.923\nC 3.480 0.694 0.443\nO 4.760 1.208 0.611\nC 6.014 0.321 0.236\nC 3.271 -0.697 0.372\nC 1.998 -1.328 0.251\nO 1.915 -2.662 0.131\nC 2.939 -3.386 0.851\nO -1.395 0.020 -0.161\nC -2.680 -0.140 -0.486\nC -3.606 0.894 -0.329\nH -6.576 3.228 0.666\nH -5.247 2.422 1.490\nH -4.954 3.418 0.055\nH -6.407 -0.714 -1.300\nH -3.957 -3.333 -2.024\nH 0.057 -2.945 -0.917\nH 0.160 1.640 0.360\nH 2.986 4.676 1.175\nH 4.179 3.540 0.120\nH 3.815 3.135 1.855\nH 5.897 -0.206 -0.788\nH 6.241 -0.309 1.144\nH 6.897 0.964 0.152\nH 4.246 -1.221 0.270\nH 2.981 -3.001 1.979\nH 2.758 -4.539 0.933\nH 3.979 -3.301 0.444\nH -3.230 1.838 0.076[\\XYZ].\nAnswer: -78.65904 
Hartree"}", "/scratch/micpie/export/mp_descriptions/test_0-1.jsonl": "{"text":"Task: Design a CIF card that matches the description below.\nDescription: Rb4Co(PO4)6 crystallizes in the monoclinic C2\/m space group. There are two inequivalent Rb sites. In the first Rb site, Rb(1) is bonded in a 8-coordinate geometry to two equivalent O(3), two equivalent O(4), two equivalent O(6), and two equivalent O(8) atoms. In the second Rb site, Rb(2) is bonded in a 9-coordinate geometry to one O(6), two equivalent O(1), two equivalent O(3), two equivalent O(4), and two equivalent O(7) atoms. Co(1) is bonded to two equivalent O(5) and four equivalent O(7) atoms to form CoO6 octahedra that share corners with two equivalent P(2)O4 tetrahedra. There are two inequivalent P sites. In the first P site, P(1) is bonded to one O(1), one O(2), one O(3), and one O(4) atom to form corner-sharing PO4 tetrahedra. In the second P site, P(2) is bonded to one O(5), one O(6), and two equivalent O(2) atoms to form PO4 tetrahedra that share a cornercorner with one Co(1)O6 octahedra and corners with two equivalent P(1)O4 tetrahedra. The corner-sharing octahedral tilt angles are 46°. There are eight inequivalent O sites. In the first O site, O(4) is bonded in a distorted single-bond geometry to one Rb(1), one Rb(2), and one P(1) atom. In the second O site, O(5) is bonded in a distorted bent 120 degrees geometry to one Co(1) and one P(2) atom. In the third O site, O(6) is bonded in a single-bond geometry to one Rb(2), two equivalent Rb(1), and one P(2) atom. In the fourth O site, O(7) is bonded in a distorted single-bond geometry to one Rb(2) and one Co(1) atom. In the fifth O site, O(8) is bonded in a 3-coordinate geometry to two equivalent Rb(1) and one O(8) atom. In the sixth O site, O(1) is bonded in a 2-coordinate geometry to two equivalent Rb(2) and two equivalent P(1) atoms. In the seventh O site, O(2) is bonded in a bent 120 degrees geometry to one P(1) and one P(2) atom. 
In the eighth O site, O(3) is bonded in a single-bond geometry to one Rb(1), one Rb(2), and one P(1) atom.\nCIF: [CIF]\ndata_Rb4Co(PO4)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.820\n_cell_length_b 8.820\n_cell_length_c 9.748\n_cell_angle_alpha 108.807\n_cell_angle_beta 115.611\n_cell_angle_gamma 99.553\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Rb4Co(PO4)6\n_chemical_formula_sum 'Rb4 Co1 P6 O24'\n_cell_volume 604.468\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Rb Rb0 1 0.225 0.725 0.500 1.0\n Rb Rb1 1 0.775 0.275 0.500 1.0\n Rb Rb2 1 0.278 0.675 0.953 1.0\n Rb Rb3 1 0.722 0.325 0.047 1.0\n Co Co4 1 0.000 0.000 0.000 1.0\n P P5 1 0.499 0.483 0.715 1.0\n P P6 1 0.232 0.215 0.715 1.0\n P P7 1 0.501 0.517 0.285 1.0\n P P8 1 0.768 0.785 0.285 1.0\n P P9 1 0.158 0.237 0.395 1.0\n P P10 1 0.843 0.763 0.605 1.0\n O O11 1 0.416 0.374 0.790 1.0\n O O12 1 0.584 0.626 0.210 1.0\n O O13 1 0.367 0.362 0.510 1.0\n O O14 1 0.148 0.143 0.510 1.0\n O O15 1 0.633 0.638 0.490 1.0\n O O16 1 0.852 0.857 0.490 1.0\n O O17 1 0.682 0.475 0.764 1.0\n O O18 1 0.289 0.082 0.764 1.0\n O O19 1 0.318 0.525 0.236 1.0\n O O20 1 0.711 0.918 0.236 1.0\n O O21 1 0.469 0.649 0.762 1.0\n O O22 1 0.113 0.293 0.762 1.0\n O O23 1 0.531 0.351 0.238 1.0\n O O24 1 0.887 0.707 0.238 1.0\n O O25 1 0.150 0.097 0.247 1.0\n O O26 1 0.850 0.903 0.753 1.0\n O O27 1 0.033 0.335 0.368 1.0\n O O28 1 0.967 0.665 0.632 1.0\n O O29 1 0.005 0.800 0.994 1.0\n O O30 1 0.194 0.989 0.994 1.0\n O O31 1 0.995 0.200 0.006 1.0\n O O32 1 0.806 0.011 0.006 1.0\n O O33 1 0.454 0.967 0.422 1.0\n O O34 1 0.546 0.033 0.578 1.0\n[\/CIF]\n"} {"text":"Task: Design a CIF file that matches the description below.\nDescription: Li4CrMnO6 is Caswellsilverite-derived structured 
and crystallizes in the monoclinic C2\/c space group. There are three inequivalent Li sites. In the first Li site, Li(1) is bonded to two equivalent O(1), two equivalent O(2), and two equivalent O(3) atoms to form LiO6 octahedra that share corners with two equivalent Li(2)O6 octahedra, corners with two equivalent Cr(1)O6 octahedra, corners with two equivalent Mn(1)O6 octahedra, edges with two equivalent Li(2)O6 octahedra, edges with two equivalent Cr(1)O6 octahedra, edges with two equivalent Mn(1)O6 octahedra, and edges with six equivalent Li(3)O6 octahedra. The corner-sharing octahedral tilt angles range from 6-9°. In the second Li site, Li(2) is bonded to two equivalent O(1), two equivalent O(2), and two equivalent O(3) atoms to form LiO6 octahedra that share corners with two equivalent Li(1)O6 octahedra, corners with four equivalent Li(3)O6 octahedra, edges with two equivalent Li(1)O6 octahedra, edges with three equivalent Cr(1)O6 octahedra, edges with three equivalent Mn(1)O6 octahedra, and edges with four equivalent Li(3)O6 octahedra. The corner-sharing octahedral tilt angles range from 8-9°. In the third Li site, Li(3) is bonded to two equivalent O(1), two equivalent O(2), and two equivalent O(3) atoms to form LiO6 octahedra that share corners with two equivalent Li(2)O6 octahedra, corners with two equivalent Cr(1)O6 octahedra, corners with two equivalent Mn(1)O6 octahedra, edges with two equivalent Li(2)O6 octahedra, edges with two equivalent Cr(1)O6 octahedra, edges with two equivalent Mn(1)O6 octahedra, edges with three equivalent Li(1)O6 octahedra, and edges with three equivalent Li(3)O6 octahedra. The corner-sharing octahedral tilt angles range from 8-9°. 
Cr(1) is bonded to two equivalent O(1), two equivalent O(2), and two equivalent O(3) atoms to form CrO6 octahedra that share corners with two equivalent Li(1)O6 octahedra, corners with four equivalent Li(3)O6 octahedra, edges with two equivalent Li(1)O6 octahedra, edges with three equivalent Li(2)O6 octahedra, edges with three equivalent Mn(1)O6 octahedra, and edges with four equivalent Li(3)O6 octahedra. The corner-sharing octahedral tilt angles are 8°. Mn(1) is bonded to two equivalent O(1), two equivalent O(2), and two equivalent O(3) atoms to form MnO6 octahedra that share corners with two equivalent Li(1)O6 octahedra, corners with four equivalent Li(3)O6 octahedra, edges with two equivalent Li(1)O6 octahedra, edges with three equivalent Li(2)O6 octahedra, edges with three equivalent Cr(1)O6 octahedra, and edges with four equivalent Li(3)O6 octahedra. The corner-sharing octahedral tilt angles range from 6-9°. There are three inequivalent O sites. In the first O site, O(3) is bonded to one Li(1), one Li(2), two equivalent Li(3), one Cr(1), and one Mn(1) atom to form a mixture of corner and edge-sharing OLi4MnCr octahedra. The corner-sharing octahedral tilt angles range from 0-9°. In the second O site, O(1) is bonded to one Li(1), one Li(2), two equivalent Li(3), one Cr(1), and one Mn(1) atom to form a mixture of corner and edge-sharing OLi4MnCr octahedra. The corner-sharing octahedral tilt angles range from 0-8°. In the third O site, O(2) is bonded to one Li(1), one Li(2), two equivalent Li(3), one Cr(1), and one Mn(1) atom to form a mixture of corner and edge-sharing OLi4MnCr octahedra. 
The corner-sharing octahedral tilt angles range from 0-9°.\nCIF: [CIF]\ndata_Li4MnCrO6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.002\n_cell_length_b 5.003\n_cell_length_c 9.743\n_cell_angle_alpha 85.402\n_cell_angle_beta 94.591\n_cell_angle_gamma 60.608\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Li4MnCrO6\n_chemical_formula_sum 'Li8 Mn2 Cr2 O12'\n_cell_volume 209.739\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.001 0.500 0.000 1.0\n Li Li1 1 0.500 0.001 0.500 1.0\n Li Li2 1 0.247 0.247 0.750 1.0\n Li Li3 1 0.753 0.753 0.250 1.0\n Li Li4 1 0.151 0.670 0.499 1.0\n Li Li5 1 0.329 0.849 0.999 1.0\n Li Li6 1 0.670 0.151 0.001 1.0\n Li Li7 1 0.849 0.329 0.501 1.0\n Mn Mn8 1 0.918 0.918 0.750 1.0\n Mn Mn9 1 0.082 0.082 0.250 1.0\n Cr Cr10 1 0.421 0.421 0.250 1.0\n Cr Cr11 1 0.579 0.579 0.750 1.0\n O O12 1 0.146 0.354 0.364 1.0\n O O13 1 0.354 0.146 0.136 1.0\n O O14 1 0.646 0.854 0.863 1.0\n O O15 1 0.854 0.646 0.637 1.0\n O O16 1 0.287 0.574 0.861 1.0\n O O17 1 0.574 0.287 0.639 1.0\n O O18 1 0.426 0.713 0.361 1.0\n O O19 1 0.712 0.426 0.139 1.0\n O O20 1 0.066 0.793 0.137 1.0\n O O21 1 0.207 0.934 0.637 1.0\n O O22 1 0.793 0.066 0.363 1.0\n O O23 1 0.934 0.207 0.863 1.0\n[\/CIF]\n"}", "/scratch/micpie/export/mp_descriptions/valid_0-0.jsonl": "{"text":"Task: Please design a compound based on the CIF file.\nCIF: [CIF]\ndata_Na3Os\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.441\n_cell_length_b 6.441\n_cell_length_c 5.283\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Na3Os\n_chemical_formula_sum 'Na6 Os2'\n_cell_volume 189.794\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n 
_symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Na Na0 1 0.822 0.178 0.750 1.0\n Na Na1 1 0.356 0.178 0.750 1.0\n Na Na2 1 0.822 0.644 0.750 1.0\n Na Na3 1 0.178 0.822 0.250 1.0\n Na Na4 1 0.644 0.822 0.250 1.0\n Na Na5 1 0.178 0.356 0.250 1.0\n Os Os6 1 0.667 0.333 0.250 1.0\n Os Os7 1 0.333 0.667 0.750 1.0\n[\/CIF]\n\nAnswer: Na3Os crystallizes in the hexagonal P6_3\/mmc space group. Na(1) is bonded in a distorted see-saw-like geometry to four equivalent Os(1) atoms. Os(1) is bonded to twelve equivalent Na(1) atoms to form a mixture of face and corner-sharing OsNa12 cuboctahedra."} {"text":"Task: Design a structure based on the Crystallographic Information File (CIF).\nCIF: [CIF]\ndata_YFe3Se2ClO8\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.516\n_cell_length_b 7.259\n_cell_length_c 9.673\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural YFe3Se2ClO8\n_chemical_formula_sum 'Y2 Fe6 Se4 Cl2 O16'\n_cell_volume 457.546\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Y Y0 1 0.000 0.228 0.000 1.0\n Y Y1 1 0.500 0.772 0.500 1.0\n Fe Fe2 1 0.250 0.500 0.750 1.0\n Fe Fe3 1 0.250 0.500 0.250 1.0\n Fe Fe4 1 0.750 0.500 0.750 1.0\n Fe Fe5 1 0.500 0.299 0.500 1.0\n Fe Fe6 1 0.000 0.701 0.000 1.0\n Fe Fe7 1 0.750 0.500 0.250 1.0\n Se Se8 1 0.500 0.086 0.813 1.0\n Se Se9 1 0.000 0.914 0.687 1.0\n Se Se10 1 0.500 0.086 0.187 1.0\n Se Se11 1 0.000 0.914 0.313 1.0\n Cl Cl12 1 0.000 0.372 0.500 1.0\n Cl Cl13 1 0.500 0.628 0.000 1.0\n O O14 1 0.000 0.491 0.863 1.0\n O O15 1 0.000 0.924 0.867 
1.0\n O O16 1 0.500 0.076 0.367 1.0\n O O17 1 0.292 0.229 0.841 1.0\n O O18 1 0.208 0.771 0.341 1.0\n O O19 1 0.792 0.771 0.341 1.0\n O O20 1 0.500 0.509 0.637 1.0\n O O21 1 0.708 0.229 0.841 1.0\n O O22 1 0.708 0.229 0.159 1.0\n O O23 1 0.792 0.771 0.659 1.0\n O O24 1 0.000 0.491 0.137 1.0\n O O25 1 0.500 0.509 0.363 1.0\n O O26 1 0.208 0.771 0.659 1.0\n O O27 1 0.292 0.229 0.159 1.0\n O O28 1 0.500 0.076 0.633 1.0\n O O29 1 0.000 0.924 0.133 1.0\n[\/CIF]\n\nAnswer: YFe3Se2O8Cl crystallizes in the orthorhombic Pmmn space group. Y(1) is bonded in a body-centered cubic geometry to two equivalent O(1), two equivalent O(2), and four equivalent O(3) atoms. There are two inequivalent Fe sites. In the first Fe site, Fe(1) is bonded in a distorted square co-planar geometry to two equivalent O(1), two equivalent O(3), and two equivalent Cl(1) atoms. In the second Fe site, Fe(2) is bonded in a square co-planar geometry to two equivalent O(1) and two equivalent O(2) atoms. Se(1) is bonded in a trigonal non-coplanar geometry to one O(2) and two equivalent O(3) atoms. There are three inequivalent O sites. In the first O site, O(1) is bonded to one Y(1), one Fe(2), and two equivalent Fe(1) atoms to form a mixture of edge and corner-sharing OYFe3 tetrahedra. In the second O site, O(2) is bonded in a distorted trigonal planar geometry to one Y(1), one Fe(2), and one Se(1) atom. In the third O site, O(3) is bonded in a distorted trigonal planar geometry to one Y(1), one Fe(1), and one Se(1) atom. 
Cl(1) is bonded in a 4-coordinate geometry to four equivalent Fe(1) atoms."}", "/scratch/micpie/export/mp_descriptions/test_0-2.jsonl": "{"text":"User: Could you describe a crystal structure based on the CIF file?\nAssistant: Sure, I would need the CIF file to do that.\nUser: [CIF]\ndata_Rb4Co(PO4)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.820\n_cell_length_b 8.820\n_cell_length_c 9.748\n_cell_angle_alpha 108.807\n_cell_angle_beta 115.611\n_cell_angle_gamma 99.553\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Rb4Co(PO4)6\n_chemical_formula_sum 'Rb4 Co1 P6 O24'\n_cell_volume 604.468\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Rb Rb0 1 0.225 0.725 0.500 1.0\n Rb Rb1 1 0.775 0.275 0.500 1.0\n Rb Rb2 1 0.278 0.675 0.953 1.0\n Rb Rb3 1 0.722 0.325 0.047 1.0\n Co Co4 1 0.000 0.000 0.000 1.0\n P P5 1 0.499 0.483 0.715 1.0\n P P6 1 0.232 0.215 0.715 1.0\n P P7 1 0.501 0.517 0.285 1.0\n P P8 1 0.768 0.785 0.285 1.0\n P P9 1 0.158 0.237 0.395 1.0\n P P10 1 0.843 0.763 0.605 1.0\n O O11 1 0.416 0.374 0.790 1.0\n O O12 1 0.584 0.626 0.210 1.0\n O O13 1 0.367 0.362 0.510 1.0\n O O14 1 0.148 0.143 0.510 1.0\n O O15 1 0.633 0.638 0.490 1.0\n O O16 1 0.852 0.857 0.490 1.0\n O O17 1 0.682 0.475 0.764 1.0\n O O18 1 0.289 0.082 0.764 1.0\n O O19 1 0.318 0.525 0.236 1.0\n O O20 1 0.711 0.918 0.236 1.0\n O O21 1 0.469 0.649 0.762 1.0\n O O22 1 0.113 0.293 0.762 1.0\n O O23 1 0.531 0.351 0.238 1.0\n O O24 1 0.887 0.707 0.238 1.0\n O O25 1 0.150 0.097 0.247 1.0\n O O26 1 0.850 0.903 0.753 1.0\n O O27 1 0.033 0.335 0.368 1.0\n O O28 1 0.967 0.665 0.632 1.0\n O O29 1 0.005 0.800 0.994 1.0\n O O30 1 0.194 0.989 0.994 1.0\n O O31 1 0.995 0.200 0.006 1.0\n O O32 1 0.806 0.011 0.006 1.0\n O O33 1 0.454 0.967 0.422 1.0\n O O34 1 0.546 
0.033 0.578 1.0\n[\/CIF]\n\nAssistant: Rb4Co(PO4)6 crystallizes in the monoclinic C2\/m space group. There are two inequivalent Rb sites. In the first Rb site, Rb(1) is bonded in an 8-coordinate geometry to two equivalent O(3), two equivalent O(4), two equivalent O(6), and two equivalent O(8) atoms. In the second Rb site, Rb(2) is bonded in a 9-coordinate geometry to one O(6), two equivalent O(1), two equivalent O(3), two equivalent O(4), and two equivalent O(7) atoms. Co(1) is bonded to two equivalent O(5) and four equivalent O(7) atoms to form CoO6 octahedra that share corners with two equivalent P(2)O4 tetrahedra. There are two inequivalent P sites. In the first P site, P(1) is bonded to one O(1), one O(2), one O(3), and one O(4) atom to form corner-sharing PO4 tetrahedra. In the second P site, P(2) is bonded to one O(5), one O(6), and two equivalent O(2) atoms to form PO4 tetrahedra that share a corner with one Co(1)O6 octahedra and corners with two equivalent P(1)O4 tetrahedra. The corner-sharing octahedral tilt angles are 46°. There are eight inequivalent O sites. In the first O site, O(4) is bonded in a distorted single-bond geometry to one Rb(1), one Rb(2), and one P(1) atom. In the second O site, O(5) is bonded in a distorted bent 120 degrees geometry to one Co(1) and one P(2) atom. In the third O site, O(6) is bonded in a single-bond geometry to one Rb(2), two equivalent Rb(1), and one P(2) atom. In the fourth O site, O(7) is bonded in a distorted single-bond geometry to one Rb(2) and one Co(1) atom. In the fifth O site, O(8) is bonded in a 3-coordinate geometry to two equivalent Rb(1) and one O(8) atom. In the sixth O site, O(1) is bonded in a 2-coordinate geometry to two equivalent Rb(2) and two equivalent P(1) atoms. In the seventh O site, O(2) is bonded in a bent 120 degrees geometry to one P(1) and one P(2) atom. 
In the eighth O site, O(3) is bonded in a single-bond geometry to one Rb(1), one Rb(2), and one P(1) atom."} {"text":"User: Can you describe a compound based on the CIF file?\nAssistant: Sure, I need the CIF file to do that.\nUser: [CIF]\ndata_Li4MnCrO6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.002\n_cell_length_b 5.003\n_cell_length_c 9.743\n_cell_angle_alpha 85.402\n_cell_angle_beta 94.591\n_cell_angle_gamma 60.608\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Li4MnCrO6\n_chemical_formula_sum 'Li8 Mn2 Cr2 O12'\n_cell_volume 209.739\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.001 0.500 0.000 1.0\n Li Li1 1 0.500 0.001 0.500 1.0\n Li Li2 1 0.247 0.247 0.750 1.0\n Li Li3 1 0.753 0.753 0.250 1.0\n Li Li4 1 0.151 0.670 0.499 1.0\n Li Li5 1 0.329 0.849 0.999 1.0\n Li Li6 1 0.670 0.151 0.001 1.0\n Li Li7 1 0.849 0.329 0.501 1.0\n Mn Mn8 1 0.918 0.918 0.750 1.0\n Mn Mn9 1 0.082 0.082 0.250 1.0\n Cr Cr10 1 0.421 0.421 0.250 1.0\n Cr Cr11 1 0.579 0.579 0.750 1.0\n O O12 1 0.146 0.354 0.364 1.0\n O O13 1 0.354 0.146 0.136 1.0\n O O14 1 0.646 0.854 0.863 1.0\n O O15 1 0.854 0.646 0.637 1.0\n O O16 1 0.287 0.574 0.861 1.0\n O O17 1 0.574 0.287 0.639 1.0\n O O18 1 0.426 0.713 0.361 1.0\n O O19 1 0.712 0.426 0.139 1.0\n O O20 1 0.066 0.793 0.137 1.0\n O O21 1 0.207 0.934 0.637 1.0\n O O22 1 0.793 0.066 0.363 1.0\n O O23 1 0.934 0.207 0.863 1.0\n[\/CIF]\n\nAssistant: Li4CrMnO6 is Caswellsilverite-derived structured and crystallizes in the monoclinic C2\/c space group. There are three inequivalent Li sites. 
In the first Li site, Li(1) is bonded to two equivalent O(1), two equivalent O(2), and two equivalent O(3) atoms to form LiO6 octahedra that share corners with two equivalent Li(2)O6 octahedra, corners with two equivalent Cr(1)O6 octahedra, corners with two equivalent Mn(1)O6 octahedra, edges with two equivalent Li(2)O6 octahedra, edges with two equivalent Cr(1)O6 octahedra, edges with two equivalent Mn(1)O6 octahedra, and edges with six equivalent Li(3)O6 octahedra. The corner-sharing octahedral tilt angles range from 6-9°. In the second Li site, Li(2) is bonded to two equivalent O(1), two equivalent O(2), and two equivalent O(3) atoms to form LiO6 octahedra that share corners with two equivalent Li(1)O6 octahedra, corners with four equivalent Li(3)O6 octahedra, edges with two equivalent Li(1)O6 octahedra, edges with three equivalent Cr(1)O6 octahedra, edges with three equivalent Mn(1)O6 octahedra, and edges with four equivalent Li(3)O6 octahedra. The corner-sharing octahedral tilt angles range from 8-9°. In the third Li site, Li(3) is bonded to two equivalent O(1), two equivalent O(2), and two equivalent O(3) atoms to form LiO6 octahedra that share corners with two equivalent Li(2)O6 octahedra, corners with two equivalent Cr(1)O6 octahedra, corners with two equivalent Mn(1)O6 octahedra, edges with two equivalent Li(2)O6 octahedra, edges with two equivalent Cr(1)O6 octahedra, edges with two equivalent Mn(1)O6 octahedra, edges with three equivalent Li(1)O6 octahedra, and edges with three equivalent Li(3)O6 octahedra. The corner-sharing octahedral tilt angles range from 8-9°. 
Cr(1) is bonded to two equivalent O(1), two equivalent O(2), and two equivalent O(3) atoms to form CrO6 octahedra that share corners with two equivalent Li(1)O6 octahedra, corners with four equivalent Li(3)O6 octahedra, edges with two equivalent Li(1)O6 octahedra, edges with three equivalent Li(2)O6 octahedra, edges with three equivalent Mn(1)O6 octahedra, and edges with four equivalent Li(3)O6 octahedra. The corner-sharing octahedral tilt angles are 8°. Mn(1) is bonded to two equivalent O(1), two equivalent O(2), and two equivalent O(3) atoms to form MnO6 octahedra that share corners with two equivalent Li(1)O6 octahedra, corners with four equivalent Li(3)O6 octahedra, edges with two equivalent Li(1)O6 octahedra, edges with three equivalent Li(2)O6 octahedra, edges with three equivalent Cr(1)O6 octahedra, and edges with four equivalent Li(3)O6 octahedra. The corner-sharing octahedral tilt angles range from 6-9°. There are three inequivalent O sites. In the first O site, O(3) is bonded to one Li(1), one Li(2), two equivalent Li(3), one Cr(1), and one Mn(1) atom to form a mixture of corner and edge-sharing OLi4MnCr octahedra. The corner-sharing octahedral tilt angles range from 0-9°. In the second O site, O(1) is bonded to one Li(1), one Li(2), two equivalent Li(3), one Cr(1), and one Mn(1) atom to form a mixture of corner and edge-sharing OLi4MnCr octahedra. The corner-sharing octahedral tilt angles range from 0-8°. In the third O site, O(2) is bonded to one Li(1), one Li(2), two equivalent Li(3), one Cr(1), and one Mn(1) atom to form a mixture of corner and edge-sharing OLi4MnCr octahedra. 
The corner-sharing octahedral tilt angles range from 0-9°."}", "/scratch/micpie/export/mp_descriptions/test_0-0.jsonl": "{"text":"Task: Design a structure based on the Crystallographic Information File (CIF).\nCIF: [CIF]\ndata_Rb4Co(PO4)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.820\n_cell_length_b 8.820\n_cell_length_c 9.748\n_cell_angle_alpha 108.807\n_cell_angle_beta 115.611\n_cell_angle_gamma 99.553\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Rb4Co(PO4)6\n_chemical_formula_sum 'Rb4 Co1 P6 O24'\n_cell_volume 604.468\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Rb Rb0 1 0.225 0.725 0.500 1.0\n Rb Rb1 1 0.775 0.275 0.500 1.0\n Rb Rb2 1 0.278 0.675 0.953 1.0\n Rb Rb3 1 0.722 0.325 0.047 1.0\n Co Co4 1 0.000 0.000 0.000 1.0\n P P5 1 0.499 0.483 0.715 1.0\n P P6 1 0.232 0.215 0.715 1.0\n P P7 1 0.501 0.517 0.285 1.0\n P P8 1 0.768 0.785 0.285 1.0\n P P9 1 0.158 0.237 0.395 1.0\n P P10 1 0.843 0.763 0.605 1.0\n O O11 1 0.416 0.374 0.790 1.0\n O O12 1 0.584 0.626 0.210 1.0\n O O13 1 0.367 0.362 0.510 1.0\n O O14 1 0.148 0.143 0.510 1.0\n O O15 1 0.633 0.638 0.490 1.0\n O O16 1 0.852 0.857 0.490 1.0\n O O17 1 0.682 0.475 0.764 1.0\n O O18 1 0.289 0.082 0.764 1.0\n O O19 1 0.318 0.525 0.236 1.0\n O O20 1 0.711 0.918 0.236 1.0\n O O21 1 0.469 0.649 0.762 1.0\n O O22 1 0.113 0.293 0.762 1.0\n O O23 1 0.531 0.351 0.238 1.0\n O O24 1 0.887 0.707 0.238 1.0\n O O25 1 0.150 0.097 0.247 1.0\n O O26 1 0.850 0.903 0.753 1.0\n O O27 1 0.033 0.335 0.368 1.0\n O O28 1 0.967 0.665 0.632 1.0\n O O29 1 0.005 0.800 0.994 1.0\n O O30 1 0.194 0.989 0.994 1.0\n O O31 1 0.995 0.200 0.006 1.0\n O O32 1 0.806 0.011 0.006 1.0\n O O33 1 0.454 0.967 0.422 1.0\n O O34 1 0.546 0.033 0.578 1.0\n[\/CIF]\n\nDescription: Rb4Co(PO4)6 
crystallizes in the monoclinic C2\/m space group. There are two inequivalent Rb sites. In the first Rb site, Rb(1) is bonded in an 8-coordinate geometry to two equivalent O(3), two equivalent O(4), two equivalent O(6), and two equivalent O(8) atoms. In the second Rb site, Rb(2) is bonded in a 9-coordinate geometry to one O(6), two equivalent O(1), two equivalent O(3), two equivalent O(4), and two equivalent O(7) atoms. Co(1) is bonded to two equivalent O(5) and four equivalent O(7) atoms to form CoO6 octahedra that share corners with two equivalent P(2)O4 tetrahedra. There are two inequivalent P sites. In the first P site, P(1) is bonded to one O(1), one O(2), one O(3), and one O(4) atom to form corner-sharing PO4 tetrahedra. In the second P site, P(2) is bonded to one O(5), one O(6), and two equivalent O(2) atoms to form PO4 tetrahedra that share a corner with one Co(1)O6 octahedra and corners with two equivalent P(1)O4 tetrahedra. The corner-sharing octahedral tilt angles are 46°. There are eight inequivalent O sites. In the first O site, O(4) is bonded in a distorted single-bond geometry to one Rb(1), one Rb(2), and one P(1) atom. In the second O site, O(5) is bonded in a distorted bent 120 degrees geometry to one Co(1) and one P(2) atom. In the third O site, O(6) is bonded in a single-bond geometry to one Rb(2), two equivalent Rb(1), and one P(2) atom. In the fourth O site, O(7) is bonded in a distorted single-bond geometry to one Rb(2) and one Co(1) atom. In the fifth O site, O(8) is bonded in a 3-coordinate geometry to two equivalent Rb(1) and one O(8) atom. In the sixth O site, O(1) is bonded in a 2-coordinate geometry to two equivalent Rb(2) and two equivalent P(1) atoms. In the seventh O site, O(2) is bonded in a bent 120 degrees geometry to one P(1) and one P(2) atom. 
In the eighth O site, O(3) is bonded in a single-bond geometry to one Rb(1), one Rb(2), and one P(1) atom."} {"text":"Task: Design a crystal structure based on the Crystallographic Information File (CIF).\nCIF: [CIF]\ndata_Li4MnCrO6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.002\n_cell_length_b 5.003\n_cell_length_c 9.743\n_cell_angle_alpha 85.402\n_cell_angle_beta 94.591\n_cell_angle_gamma 60.608\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Li4MnCrO6\n_chemical_formula_sum 'Li8 Mn2 Cr2 O12'\n_cell_volume 209.739\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.001 0.500 0.000 1.0\n Li Li1 1 0.500 0.001 0.500 1.0\n Li Li2 1 0.247 0.247 0.750 1.0\n Li Li3 1 0.753 0.753 0.250 1.0\n Li Li4 1 0.151 0.670 0.499 1.0\n Li Li5 1 0.329 0.849 0.999 1.0\n Li Li6 1 0.670 0.151 0.001 1.0\n Li Li7 1 0.849 0.329 0.501 1.0\n Mn Mn8 1 0.918 0.918 0.750 1.0\n Mn Mn9 1 0.082 0.082 0.250 1.0\n Cr Cr10 1 0.421 0.421 0.250 1.0\n Cr Cr11 1 0.579 0.579 0.750 1.0\n O O12 1 0.146 0.354 0.364 1.0\n O O13 1 0.354 0.146 0.136 1.0\n O O14 1 0.646 0.854 0.863 1.0\n O O15 1 0.854 0.646 0.637 1.0\n O O16 1 0.287 0.574 0.861 1.0\n O O17 1 0.574 0.287 0.639 1.0\n O O18 1 0.426 0.713 0.361 1.0\n O O19 1 0.712 0.426 0.139 1.0\n O O20 1 0.066 0.793 0.137 1.0\n O O21 1 0.207 0.934 0.637 1.0\n O O22 1 0.793 0.066 0.363 1.0\n O O23 1 0.934 0.207 0.863 1.0\n[\/CIF]\n\nDescription: Li4CrMnO6 is Caswellsilverite-derived structured and crystallizes in the monoclinic C2\/c space group. There are three inequivalent Li sites. 
In the first Li site, Li(1) is bonded to two equivalent O(1), two equivalent O(2), and two equivalent O(3) atoms to form LiO6 octahedra that share corners with two equivalent Li(2)O6 octahedra, corners with two equivalent Cr(1)O6 octahedra, corners with two equivalent Mn(1)O6 octahedra, edges with two equivalent Li(2)O6 octahedra, edges with two equivalent Cr(1)O6 octahedra, edges with two equivalent Mn(1)O6 octahedra, and edges with six equivalent Li(3)O6 octahedra. The corner-sharing octahedral tilt angles range from 6-9°. In the second Li site, Li(2) is bonded to two equivalent O(1), two equivalent O(2), and two equivalent O(3) atoms to form LiO6 octahedra that share corners with two equivalent Li(1)O6 octahedra, corners with four equivalent Li(3)O6 octahedra, edges with two equivalent Li(1)O6 octahedra, edges with three equivalent Cr(1)O6 octahedra, edges with three equivalent Mn(1)O6 octahedra, and edges with four equivalent Li(3)O6 octahedra. The corner-sharing octahedral tilt angles range from 8-9°. In the third Li site, Li(3) is bonded to two equivalent O(1), two equivalent O(2), and two equivalent O(3) atoms to form LiO6 octahedra that share corners with two equivalent Li(2)O6 octahedra, corners with two equivalent Cr(1)O6 octahedra, corners with two equivalent Mn(1)O6 octahedra, edges with two equivalent Li(2)O6 octahedra, edges with two equivalent Cr(1)O6 octahedra, edges with two equivalent Mn(1)O6 octahedra, edges with three equivalent Li(1)O6 octahedra, and edges with three equivalent Li(3)O6 octahedra. The corner-sharing octahedral tilt angles range from 8-9°. 
Cr(1) is bonded to two equivalent O(1), two equivalent O(2), and two equivalent O(3) atoms to form CrO6 octahedra that share corners with two equivalent Li(1)O6 octahedra, corners with four equivalent Li(3)O6 octahedra, edges with two equivalent Li(1)O6 octahedra, edges with three equivalent Li(2)O6 octahedra, edges with three equivalent Mn(1)O6 octahedra, and edges with four equivalent Li(3)O6 octahedra. The corner-sharing octahedral tilt angles are 8°. Mn(1) is bonded to two equivalent O(1), two equivalent O(2), and two equivalent O(3) atoms to form MnO6 octahedra that share corners with two equivalent Li(1)O6 octahedra, corners with four equivalent Li(3)O6 octahedra, edges with two equivalent Li(1)O6 octahedra, edges with three equivalent Li(2)O6 octahedra, edges with three equivalent Cr(1)O6 octahedra, and edges with four equivalent Li(3)O6 octahedra. The corner-sharing octahedral tilt angles range from 6-9°. There are three inequivalent O sites. In the first O site, O(3) is bonded to one Li(1), one Li(2), two equivalent Li(3), one Cr(1), and one Mn(1) atom to form a mixture of corner and edge-sharing OLi4MnCr octahedra. The corner-sharing octahedral tilt angles range from 0-9°. In the second O site, O(1) is bonded to one Li(1), one Li(2), two equivalent Li(3), one Cr(1), and one Mn(1) atom to form a mixture of corner and edge-sharing OLi4MnCr octahedra. The corner-sharing octahedral tilt angles range from 0-8°. In the third O site, O(2) is bonded to one Li(1), one Li(2), two equivalent Li(3), one Cr(1), and one Mn(1) atom to form a mixture of corner and edge-sharing OLi4MnCr octahedra. The corner-sharing octahedral tilt angles range from 0-9°."}", "/scratch/micpie/export/mp_descriptions/test_0-3.jsonl": "{"text":"User: Could you design a CIF card that matches a description of a material?\nAssistant: I can give it a try, I require the description of the material to do that.\nUser: Rb4Co(PO4)6 crystallizes in the monoclinic C2\/m space group. 
There are two inequivalent Rb sites. In the first Rb site, Rb(1) is bonded in an 8-coordinate geometry to two equivalent O(3), two equivalent O(4), two equivalent O(6), and two equivalent O(8) atoms. In the second Rb site, Rb(2) is bonded in a 9-coordinate geometry to one O(6), two equivalent O(1), two equivalent O(3), two equivalent O(4), and two equivalent O(7) atoms. Co(1) is bonded to two equivalent O(5) and four equivalent O(7) atoms to form CoO6 octahedra that share corners with two equivalent P(2)O4 tetrahedra. There are two inequivalent P sites. In the first P site, P(1) is bonded to one O(1), one O(2), one O(3), and one O(4) atom to form corner-sharing PO4 tetrahedra. In the second P site, P(2) is bonded to one O(5), one O(6), and two equivalent O(2) atoms to form PO4 tetrahedra that share a corner with one Co(1)O6 octahedra and corners with two equivalent P(1)O4 tetrahedra. The corner-sharing octahedral tilt angles are 46°. There are eight inequivalent O sites. In the first O site, O(4) is bonded in a distorted single-bond geometry to one Rb(1), one Rb(2), and one P(1) atom. In the second O site, O(5) is bonded in a distorted bent 120 degrees geometry to one Co(1) and one P(2) atom. In the third O site, O(6) is bonded in a single-bond geometry to one Rb(2), two equivalent Rb(1), and one P(2) atom. In the fourth O site, O(7) is bonded in a distorted single-bond geometry to one Rb(2) and one Co(1) atom. In the fifth O site, O(8) is bonded in a 3-coordinate geometry to two equivalent Rb(1) and one O(8) atom. In the sixth O site, O(1) is bonded in a 2-coordinate geometry to two equivalent Rb(2) and two equivalent P(1) atoms. In the seventh O site, O(2) is bonded in a bent 120 degrees geometry to one P(1) and one P(2) atom. 
In the eighth O site, O(3) is bonded in a single-bond geometry to one Rb(1), one Rb(2), and one P(1) atom.\nAssistant: [CIF]\ndata_Rb4Co(PO4)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.820\n_cell_length_b 8.820\n_cell_length_c 9.748\n_cell_angle_alpha 108.807\n_cell_angle_beta 115.611\n_cell_angle_gamma 99.553\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Rb4Co(PO4)6\n_chemical_formula_sum 'Rb4 Co1 P6 O24'\n_cell_volume 604.468\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Rb Rb0 1 0.225 0.725 0.500 1.0\n Rb Rb1 1 0.775 0.275 0.500 1.0\n Rb Rb2 1 0.278 0.675 0.953 1.0\n Rb Rb3 1 0.722 0.325 0.047 1.0\n Co Co4 1 0.000 0.000 0.000 1.0\n P P5 1 0.499 0.483 0.715 1.0\n P P6 1 0.232 0.215 0.715 1.0\n P P7 1 0.501 0.517 0.285 1.0\n P P8 1 0.768 0.785 0.285 1.0\n P P9 1 0.158 0.237 0.395 1.0\n P P10 1 0.843 0.763 0.605 1.0\n O O11 1 0.416 0.374 0.790 1.0\n O O12 1 0.584 0.626 0.210 1.0\n O O13 1 0.367 0.362 0.510 1.0\n O O14 1 0.148 0.143 0.510 1.0\n O O15 1 0.633 0.638 0.490 1.0\n O O16 1 0.852 0.857 0.490 1.0\n O O17 1 0.682 0.475 0.764 1.0\n O O18 1 0.289 0.082 0.764 1.0\n O O19 1 0.318 0.525 0.236 1.0\n O O20 1 0.711 0.918 0.236 1.0\n O O21 1 0.469 0.649 0.762 1.0\n O O22 1 0.113 0.293 0.762 1.0\n O O23 1 0.531 0.351 0.238 1.0\n O O24 1 0.887 0.707 0.238 1.0\n O O25 1 0.150 0.097 0.247 1.0\n O O26 1 0.850 0.903 0.753 1.0\n O O27 1 0.033 0.335 0.368 1.0\n O O28 1 0.967 0.665 0.632 1.0\n O O29 1 0.005 0.800 0.994 1.0\n O O30 1 0.194 0.989 0.994 1.0\n O O31 1 0.995 0.200 0.006 1.0\n O O32 1 0.806 0.011 0.006 1.0\n O O33 1 0.454 0.967 0.422 1.0\n O O34 1 0.546 0.033 0.578 1.0\n[\/CIF]\n"} {"text":"User: Could you design a CIF card that matches a description of a crystal structure?\nAssistant: I need the description 
of the crystal structure to do that.\nUser: Li4CrMnO6 is Caswellsilverite-derived structured and crystallizes in the monoclinic C2\/c space group. There are three inequivalent Li sites. In the first Li site, Li(1) is bonded to two equivalent O(1), two equivalent O(2), and two equivalent O(3) atoms to form LiO6 octahedra that share corners with two equivalent Li(2)O6 octahedra, corners with two equivalent Cr(1)O6 octahedra, corners with two equivalent Mn(1)O6 octahedra, edges with two equivalent Li(2)O6 octahedra, edges with two equivalent Cr(1)O6 octahedra, edges with two equivalent Mn(1)O6 octahedra, and edges with six equivalent Li(3)O6 octahedra. The corner-sharing octahedral tilt angles range from 6-9°. In the second Li site, Li(2) is bonded to two equivalent O(1), two equivalent O(2), and two equivalent O(3) atoms to form LiO6 octahedra that share corners with two equivalent Li(1)O6 octahedra, corners with four equivalent Li(3)O6 octahedra, edges with two equivalent Li(1)O6 octahedra, edges with three equivalent Cr(1)O6 octahedra, edges with three equivalent Mn(1)O6 octahedra, and edges with four equivalent Li(3)O6 octahedra. The corner-sharing octahedral tilt angles range from 8-9°. In the third Li site, Li(3) is bonded to two equivalent O(1), two equivalent O(2), and two equivalent O(3) atoms to form LiO6 octahedra that share corners with two equivalent Li(2)O6 octahedra, corners with two equivalent Cr(1)O6 octahedra, corners with two equivalent Mn(1)O6 octahedra, edges with two equivalent Li(2)O6 octahedra, edges with two equivalent Cr(1)O6 octahedra, edges with two equivalent Mn(1)O6 octahedra, edges with three equivalent Li(1)O6 octahedra, and edges with three equivalent Li(3)O6 octahedra. The corner-sharing octahedral tilt angles range from 8-9°. 
Cr(1) is bonded to two equivalent O(1), two equivalent O(2), and two equivalent O(3) atoms to form CrO6 octahedra that share corners with two equivalent Li(1)O6 octahedra, corners with four equivalent Li(3)O6 octahedra, edges with two equivalent Li(1)O6 octahedra, edges with three equivalent Li(2)O6 octahedra, edges with three equivalent Mn(1)O6 octahedra, and edges with four equivalent Li(3)O6 octahedra. The corner-sharing octahedral tilt angles are 8°. Mn(1) is bonded to two equivalent O(1), two equivalent O(2), and two equivalent O(3) atoms to form MnO6 octahedra that share corners with two equivalent Li(1)O6 octahedra, corners with four equivalent Li(3)O6 octahedra, edges with two equivalent Li(1)O6 octahedra, edges with three equivalent Li(2)O6 octahedra, edges with three equivalent Cr(1)O6 octahedra, and edges with four equivalent Li(3)O6 octahedra. The corner-sharing octahedral tilt angles range from 6-9°. There are three inequivalent O sites. In the first O site, O(3) is bonded to one Li(1), one Li(2), two equivalent Li(3), one Cr(1), and one Mn(1) atom to form a mixture of corner and edge-sharing OLi4MnCr octahedra. The corner-sharing octahedral tilt angles range from 0-9°. In the second O site, O(1) is bonded to one Li(1), one Li(2), two equivalent Li(3), one Cr(1), and one Mn(1) atom to form a mixture of corner and edge-sharing OLi4MnCr octahedra. The corner-sharing octahedral tilt angles range from 0-8°. In the third O site, O(2) is bonded to one Li(1), one Li(2), two equivalent Li(3), one Cr(1), and one Mn(1) atom to form a mixture of corner and edge-sharing OLi4MnCr octahedra. 
The corner-sharing octahedral tilt angles range from 0-9°.\nAssistant: [CIF]\ndata_Li4MnCrO6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.002\n_cell_length_b 5.003\n_cell_length_c 9.743\n_cell_angle_alpha 85.402\n_cell_angle_beta 94.591\n_cell_angle_gamma 60.608\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Li4MnCrO6\n_chemical_formula_sum 'Li8 Mn2 Cr2 O12'\n_cell_volume 209.739\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.001 0.500 0.000 1.0\n Li Li1 1 0.500 0.001 0.500 1.0\n Li Li2 1 0.247 0.247 0.750 1.0\n Li Li3 1 0.753 0.753 0.250 1.0\n Li Li4 1 0.151 0.670 0.499 1.0\n Li Li5 1 0.329 0.849 0.999 1.0\n Li Li6 1 0.670 0.151 0.001 1.0\n Li Li7 1 0.849 0.329 0.501 1.0\n Mn Mn8 1 0.918 0.918 0.750 1.0\n Mn Mn9 1 0.082 0.082 0.250 1.0\n Cr Cr10 1 0.421 0.421 0.250 1.0\n Cr Cr11 1 0.579 0.579 0.750 1.0\n O O12 1 0.146 0.354 0.364 1.0\n O O13 1 0.354 0.146 0.136 1.0\n O O14 1 0.646 0.854 0.863 1.0\n O O15 1 0.854 0.646 0.637 1.0\n O O16 1 0.287 0.574 0.861 1.0\n O O17 1 0.574 0.287 0.639 1.0\n O O18 1 0.426 0.713 0.361 1.0\n O O19 1 0.712 0.426 0.139 1.0\n O O20 1 0.066 0.793 0.137 1.0\n O O21 1 0.207 0.934 0.637 1.0\n O O22 1 0.793 0.066 0.363 1.0\n O O23 1 0.934 0.207 0.863 1.0\n[\/CIF]\n"}", "/scratch/micpie/export/mp_descriptions/train_0-0.jsonl": "{"text":"Task: Design a material structure based on the CIF file.\nCIF: [CIF]\ndata_LiBi4B3O11\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.533\n_cell_length_b 6.533\n_cell_length_c 12.403\n_cell_angle_alpha 83.089\n_cell_angle_beta 83.089\n_cell_angle_gamma 88.841\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiBi4B3O11\n_chemical_formula_sum 'Li2 Bi8 B6 O22'\n_cell_volume 521.674\n_cell_formula_units_Z 2\nloop_\n 
_symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.112 0.888 0.750 1.0\n Li Li1 1 0.888 0.112 0.250 1.0\n Bi Bi2 1 0.601 0.920 0.832 1.0\n Bi Bi3 1 0.920 0.601 0.332 1.0\n Bi Bi4 1 0.826 0.400 0.921 1.0\n Bi Bi5 1 0.400 0.826 0.421 1.0\n Bi Bi6 1 0.600 0.174 0.579 1.0\n Bi Bi7 1 0.174 0.600 0.079 1.0\n Bi Bi8 1 0.080 0.399 0.668 1.0\n Bi Bi9 1 0.399 0.080 0.168 1.0\n B B10 1 0.862 0.827 0.574 1.0\n B B11 1 0.827 0.862 0.074 1.0\n B B12 1 0.532 0.468 0.750 1.0\n B B13 1 0.468 0.532 0.250 1.0\n B B14 1 0.173 0.138 0.926 1.0\n B B15 1 0.138 0.173 0.426 1.0\n O O16 1 0.863 0.954 0.656 1.0\n O O17 1 0.954 0.863 0.156 1.0\n O O18 1 0.704 0.867 0.506 1.0\n O O19 1 0.867 0.704 0.006 1.0\n O O20 1 0.409 0.993 0.705 1.0\n O O21 1 0.320 0.989 0.948 1.0\n O O22 1 0.993 0.409 0.205 1.0\n O O23 1 0.989 0.320 0.448 1.0\n O O24 1 0.518 0.587 0.835 1.0\n O O25 1 0.687 0.313 0.750 1.0\n O O26 1 0.587 0.518 0.335 1.0\n O O27 1 0.413 0.482 0.665 1.0\n O O28 1 0.313 0.687 0.250 1.0\n O O29 1 0.482 0.413 0.165 1.0\n O O30 1 0.011 0.680 0.552 1.0\n O O31 1 0.007 0.591 0.795 1.0\n O O32 1 0.680 0.011 0.052 1.0\n O O33 1 0.591 0.007 0.295 1.0\n O O34 1 0.133 0.296 0.994 1.0\n O O35 1 0.296 0.133 0.494 1.0\n O O36 1 0.046 0.137 0.844 1.0\n O O37 1 0.137 0.046 0.344 1.0\n[\/CIF]\n\nDescription: LiB3Bi4O11 crystallizes in the monoclinic C2\/c space group. Li(1) is bonded in a distorted trigonal pyramidal geometry to two equivalent O(1) and two equivalent O(3) atoms. There are two inequivalent B sites. In the first B site, B(1) is bonded in a trigonal planar geometry to one O(1), one O(2), and one O(4) atom. In the second B site, B(2) is bonded in a trigonal planar geometry to one O(6) and two equivalent O(5) atoms. There are two inequivalent Bi sites. 
In the first Bi site, Bi(1) is bonded in a 5-coordinate geometry to one O(1), one O(3), one O(4), one O(5), and one O(6) atom. In the second Bi site, Bi(2) is bonded in a 6-coordinate geometry to one O(1), one O(3), one O(5), one O(6), and two equivalent O(2) atoms. There are six inequivalent O sites. In the first O site, O(1) is bonded in a 4-coordinate geometry to one Li(1), one B(1), one Bi(1), and one Bi(2) atom. In the second O site, O(2) is bonded in a distorted single-bond geometry to one B(1) and two equivalent Bi(2) atoms. In the third O site, O(3) is bonded in a distorted trigonal planar geometry to one Li(1), one Bi(1), and one Bi(2) atom. In the fourth O site, O(4) is bonded in a distorted bent 120 degrees geometry to one B(1) and one Bi(1) atom. In the fifth O site, O(5) is bonded in a 1-coordinate geometry to one B(2), one Bi(1), and one Bi(2) atom. In the sixth O site, O(6) is bonded in a single-bond geometry to one B(2), two equivalent Bi(1), and two equivalent Bi(2) atoms."} {"text":"Task: Design a compound based on the Crystallographic Information File (CIF).\nCIF: [CIF]\ndata_Cd5Te4S\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 18.919\n_cell_length_b 18.919\n_cell_length_c 18.919\n_cell_angle_alpha 13.955\n_cell_angle_beta 13.955\n_cell_angle_gamma 13.955\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Cd5Te4S\n_chemical_formula_sum 'Cd5 Te4 S1'\n_cell_volume 342.783\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cd Cd0 1 0.995 0.995 0.995 1.0\n Cd Cd1 1 0.605 0.605 0.605 1.0\n Cd Cd2 1 0.202 0.202 0.202 1.0\n Cd Cd3 1 0.800 0.800 0.800 1.0\n Cd Cd4 1 0.397 0.397 0.397 1.0\n Te Te5 1 0.554 0.554 0.554 1.0\n Te Te6 1 0.151 0.151 0.151 1.0\n Te Te7 1 0.749 0.749 0.749 1.0\n Te Te8 1 0.347 0.347 
0.347 1.0\n S S9 1 0.949 0.949 0.949 1.0\n[\/CIF]\n\nDescription: Cd5Te4S is Enargite-like structured and crystallizes in the trigonal R3m space group. There are five inequivalent Cd sites. In the first Cd site, Cd(1) is bonded to three equivalent Te(4) and one S(1) atom to form CdTe3S tetrahedra that share corners with three equivalent Cd(5)Te4 tetrahedra, corners with three equivalent Cd(2)TeS3 tetrahedra, and corners with six equivalent Cd(1)Te3S tetrahedra. In the second Cd site, Cd(2) is bonded to one Te(1) and three equivalent S(1) atoms to form CdTeS3 tetrahedra that share corners with three equivalent Cd(1)Te3S tetrahedra, corners with three equivalent Cd(3)Te4 tetrahedra, and corners with six equivalent Cd(2)TeS3 tetrahedra. In the third Cd site, Cd(3) is bonded to one Te(2) and three equivalent Te(1) atoms to form CdTe4 tetrahedra that share corners with three equivalent Cd(4)Te4 tetrahedra, corners with three equivalent Cd(2)TeS3 tetrahedra, and corners with six equivalent Cd(3)Te4 tetrahedra. In the fourth Cd site, Cd(4) is bonded to one Te(3) and three equivalent Te(2) atoms to form corner-sharing CdTe4 tetrahedra. In the fifth Cd site, Cd(5) is bonded to one Te(4) and three equivalent Te(3) atoms to form CdTe4 tetrahedra that share corners with three equivalent Cd(1)Te3S tetrahedra, corners with three equivalent Cd(4)Te4 tetrahedra, and corners with six equivalent Cd(5)Te4 tetrahedra. There are four inequivalent Te sites. In the first Te site, Te(1) is bonded to one Cd(2) and three equivalent Cd(3) atoms to form TeCd4 tetrahedra that share corners with three equivalent Te(2)Cd4 tetrahedra, corners with three equivalent S(1)Cd4 tetrahedra, and corners with six equivalent Te(1)Cd4 tetrahedra. In the second Te site, Te(2) is bonded to one Cd(3) and three equivalent Cd(4) atoms to form corner-sharing TeCd4 tetrahedra. In the third Te site, Te(3) is bonded to one Cd(4) and three equivalent Cd(5) atoms to form corner-sharing TeCd4 tetrahedra. 
In the fourth Te site, Te(4) is bonded to one Cd(5) and three equivalent Cd(1) atoms to form TeCd4 tetrahedra that share corners with three equivalent Te(3)Cd4 tetrahedra, corners with three equivalent S(1)Cd4 tetrahedra, and corners with six equivalent Te(4)Cd4 tetrahedra. S(1) is bonded to one Cd(1) and three equivalent Cd(2) atoms to form SCd4 tetrahedra that share corners with three equivalent Te(1)Cd4 tetrahedra, corners with three equivalent Te(4)Cd4 tetrahedra, and corners with six equivalent S(1)Cd4 tetrahedra."}", "/scratch/micpie/export/mp_descriptions/train_0-3.jsonl": "{"text":"User: Can you design a Crystallographic Information File (CIF) that matches a description of a compound?\nAssistant: Sure, I would need the description of the compound to do that.\nUser: LiB3Bi4O11 crystallizes in the monoclinic C2\/c space group. Li(1) is bonded in a distorted trigonal pyramidal geometry to two equivalent O(1) and two equivalent O(3) atoms. There are two inequivalent B sites. In the first B site, B(1) is bonded in a trigonal planar geometry to one O(1), one O(2), and one O(4) atom. In the second B site, B(2) is bonded in a trigonal planar geometry to one O(6) and two equivalent O(5) atoms. There are two inequivalent Bi sites. In the first Bi site, Bi(1) is bonded in a 5-coordinate geometry to one O(1), one O(3), one O(4), one O(5), and one O(6) atom. In the second Bi site, Bi(2) is bonded in a 6-coordinate geometry to one O(1), one O(3), one O(5), one O(6), and two equivalent O(2) atoms. There are six inequivalent O sites. In the first O site, O(1) is bonded in a 4-coordinate geometry to one Li(1), one B(1), one Bi(1), and one Bi(2) atom. In the second O site, O(2) is bonded in a distorted single-bond geometry to one B(1) and two equivalent Bi(2) atoms. In the third O site, O(3) is bonded in a distorted trigonal planar geometry to one Li(1), one Bi(1), and one Bi(2) atom. 
In the fourth O site, O(4) is bonded in a distorted bent 120 degrees geometry to one B(1) and one Bi(1) atom. In the fifth O site, O(5) is bonded in a 1-coordinate geometry to one B(2), one Bi(1), and one Bi(2) atom. In the sixth O site, O(6) is bonded in a single-bond geometry to one B(2), two equivalent Bi(1), and two equivalent Bi(2) atoms.\nAssistant: [CIF]\ndata_LiBi4B3O11\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.533\n_cell_length_b 6.533\n_cell_length_c 12.403\n_cell_angle_alpha 83.089\n_cell_angle_beta 83.089\n_cell_angle_gamma 88.841\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiBi4B3O11\n_chemical_formula_sum 'Li2 Bi8 B6 O22'\n_cell_volume 521.674\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.112 0.888 0.750 1.0\n Li Li1 1 0.888 0.112 0.250 1.0\n Bi Bi2 1 0.601 0.920 0.832 1.0\n Bi Bi3 1 0.920 0.601 0.332 1.0\n Bi Bi4 1 0.826 0.400 0.921 1.0\n Bi Bi5 1 0.400 0.826 0.421 1.0\n Bi Bi6 1 0.600 0.174 0.579 1.0\n Bi Bi7 1 0.174 0.600 0.079 1.0\n Bi Bi8 1 0.080 0.399 0.668 1.0\n Bi Bi9 1 0.399 0.080 0.168 1.0\n B B10 1 0.862 0.827 0.574 1.0\n B B11 1 0.827 0.862 0.074 1.0\n B B12 1 0.532 0.468 0.750 1.0\n B B13 1 0.468 0.532 0.250 1.0\n B B14 1 0.173 0.138 0.926 1.0\n B B15 1 0.138 0.173 0.426 1.0\n O O16 1 0.863 0.954 0.656 1.0\n O O17 1 0.954 0.863 0.156 1.0\n O O18 1 0.704 0.867 0.506 1.0\n O O19 1 0.867 0.704 0.006 1.0\n O O20 1 0.409 0.993 0.705 1.0\n O O21 1 0.320 0.989 0.948 1.0\n O O22 1 0.993 0.409 0.205 1.0\n O O23 1 0.989 0.320 0.448 1.0\n O O24 1 0.518 0.587 0.835 1.0\n O O25 1 0.687 0.313 0.750 1.0\n O O26 1 0.587 0.518 0.335 1.0\n O O27 1 0.413 0.482 0.665 1.0\n O O28 1 0.313 0.687 0.250 1.0\n O O29 1 0.482 0.413 0.165 1.0\n O O30 1 0.011 0.680 0.552 1.0\n O O31 1 0.007 0.591 
0.795 1.0\n O O32 1 0.680 0.011 0.052 1.0\n O O33 1 0.591 0.007 0.295 1.0\n O O34 1 0.133 0.296 0.994 1.0\n O O35 1 0.296 0.133 0.494 1.0\n O O36 1 0.046 0.137 0.844 1.0\n O O37 1 0.137 0.046 0.344 1.0\n[\/CIF]\n"} {"text":"User: Could you design a Crystallographic Information File (CIF) that matches a description of a crystal structure?\nAssistant: I can give it a try, I need the description of the crystal structure to do that.\nUser: Cd5Te4S is Enargite-like structured and crystallizes in the trigonal R3m space group. There are five inequivalent Cd sites. In the first Cd site, Cd(1) is bonded to three equivalent Te(4) and one S(1) atom to form CdTe3S tetrahedra that share corners with three equivalent Cd(5)Te4 tetrahedra, corners with three equivalent Cd(2)TeS3 tetrahedra, and corners with six equivalent Cd(1)Te3S tetrahedra. In the second Cd site, Cd(2) is bonded to one Te(1) and three equivalent S(1) atoms to form CdTeS3 tetrahedra that share corners with three equivalent Cd(1)Te3S tetrahedra, corners with three equivalent Cd(3)Te4 tetrahedra, and corners with six equivalent Cd(2)TeS3 tetrahedra. In the third Cd site, Cd(3) is bonded to one Te(2) and three equivalent Te(1) atoms to form CdTe4 tetrahedra that share corners with three equivalent Cd(4)Te4 tetrahedra, corners with three equivalent Cd(2)TeS3 tetrahedra, and corners with six equivalent Cd(3)Te4 tetrahedra. In the fourth Cd site, Cd(4) is bonded to one Te(3) and three equivalent Te(2) atoms to form corner-sharing CdTe4 tetrahedra. In the fifth Cd site, Cd(5) is bonded to one Te(4) and three equivalent Te(3) atoms to form CdTe4 tetrahedra that share corners with three equivalent Cd(1)Te3S tetrahedra, corners with three equivalent Cd(4)Te4 tetrahedra, and corners with six equivalent Cd(5)Te4 tetrahedra. There are four inequivalent Te sites. 
In the first Te site, Te(1) is bonded to one Cd(2) and three equivalent Cd(3) atoms to form TeCd4 tetrahedra that share corners with three equivalent Te(2)Cd4 tetrahedra, corners with three equivalent S(1)Cd4 tetrahedra, and corners with six equivalent Te(1)Cd4 tetrahedra. In the second Te site, Te(2) is bonded to one Cd(3) and three equivalent Cd(4) atoms to form corner-sharing TeCd4 tetrahedra. In the third Te site, Te(3) is bonded to one Cd(4) and three equivalent Cd(5) atoms to form corner-sharing TeCd4 tetrahedra. In the fourth Te site, Te(4) is bonded to one Cd(5) and three equivalent Cd(1) atoms to form TeCd4 tetrahedra that share corners with three equivalent Te(3)Cd4 tetrahedra, corners with three equivalent S(1)Cd4 tetrahedra, and corners with six equivalent Te(4)Cd4 tetrahedra. S(1) is bonded to one Cd(1) and three equivalent Cd(2) atoms to form SCd4 tetrahedra that share corners with three equivalent Te(1)Cd4 tetrahedra, corners with three equivalent Te(4)Cd4 tetrahedra, and corners with six equivalent S(1)Cd4 tetrahedra.\nAssistant: [CIF]\ndata_Cd5Te4S\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 18.919\n_cell_length_b 18.919\n_cell_length_c 18.919\n_cell_angle_alpha 13.955\n_cell_angle_beta 13.955\n_cell_angle_gamma 13.955\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Cd5Te4S\n_chemical_formula_sum 'Cd5 Te4 S1'\n_cell_volume 342.783\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cd Cd0 1 0.995 0.995 0.995 1.0\n Cd Cd1 1 0.605 0.605 0.605 1.0\n Cd Cd2 1 0.202 0.202 0.202 1.0\n Cd Cd3 1 0.800 0.800 0.800 1.0\n Cd Cd4 1 0.397 0.397 0.397 1.0\n Te Te5 1 0.554 0.554 0.554 1.0\n Te Te6 1 0.151 0.151 0.151 1.0\n Te Te7 1 0.749 0.749 0.749 1.0\n Te Te8 1 0.347 0.347 0.347 1.0\n S S9 1 0.949 0.949 0.949 
1.0\n[\/CIF]\n"}", "/scratch/micpie/export/mp_descriptions/valid_0-2.jsonl": "{"text":"User: Could you describe a material structure based on the CIF file?\nAssistant: Sure, I need the CIF file to do that.\nUser: [CIF]\ndata_Na3Os\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.441\n_cell_length_b 6.441\n_cell_length_c 5.283\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Na3Os\n_chemical_formula_sum 'Na6 Os2'\n_cell_volume 189.794\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Na Na0 1 0.822 0.178 0.750 1.0\n Na Na1 1 0.356 0.178 0.750 1.0\n Na Na2 1 0.822 0.644 0.750 1.0\n Na Na3 1 0.178 0.822 0.250 1.0\n Na Na4 1 0.644 0.822 0.250 1.0\n Na Na5 1 0.178 0.356 0.250 1.0\n Os Os6 1 0.667 0.333 0.250 1.0\n Os Os7 1 0.333 0.667 0.750 1.0\n[\/CIF]\n\nAssistant: Na3Os crystallizes in the hexagonal P6_3\/mmc space group. Na(1) is bonded in a distorted see-saw-like geometry to four equivalent Os(1) atoms. 
Os(1) is bonded to twelve equivalent Na(1) atoms to form a mixture of face and corner-sharing OsNa12 cuboctahedra."} {"text":"User: Can you describe a structure based on the CIF card?\nAssistant: I need the CIF card to do that.\nUser: [CIF]\ndata_YFe3Se2ClO8\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.516\n_cell_length_b 7.259\n_cell_length_c 9.673\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural YFe3Se2ClO8\n_chemical_formula_sum 'Y2 Fe6 Se4 Cl2 O16'\n_cell_volume 457.546\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Y Y0 1 0.000 0.228 0.000 1.0\n Y Y1 1 0.500 0.772 0.500 1.0\n Fe Fe2 1 0.250 0.500 0.750 1.0\n Fe Fe3 1 0.250 0.500 0.250 1.0\n Fe Fe4 1 0.750 0.500 0.750 1.0\n Fe Fe5 1 0.500 0.299 0.500 1.0\n Fe Fe6 1 0.000 0.701 0.000 1.0\n Fe Fe7 1 0.750 0.500 0.250 1.0\n Se Se8 1 0.500 0.086 0.813 1.0\n Se Se9 1 0.000 0.914 0.687 1.0\n Se Se10 1 0.500 0.086 0.187 1.0\n Se Se11 1 0.000 0.914 0.313 1.0\n Cl Cl12 1 0.000 0.372 0.500 1.0\n Cl Cl13 1 0.500 0.628 0.000 1.0\n O O14 1 0.000 0.491 0.863 1.0\n O O15 1 0.000 0.924 0.867 1.0\n O O16 1 0.500 0.076 0.367 1.0\n O O17 1 0.292 0.229 0.841 1.0\n O O18 1 0.208 0.771 0.341 1.0\n O O19 1 0.792 0.771 0.341 1.0\n O O20 1 0.500 0.509 0.637 1.0\n O O21 1 0.708 0.229 0.841 1.0\n O O22 1 0.708 0.229 0.159 1.0\n O O23 1 0.792 0.771 0.659 1.0\n O O24 1 0.000 0.491 0.137 1.0\n O O25 1 0.500 0.509 0.363 1.0\n O O26 1 0.208 0.771 0.659 1.0\n O O27 1 0.292 0.229 0.159 1.0\n O O28 1 0.500 0.076 0.633 1.0\n O O29 1 0.000 0.924 0.133 1.0\n[\/CIF]\n\nAssistant: YFe3Se2O8Cl crystallizes in the orthorhombic Pmmn space group. 
Y(1) is bonded in a body-centered cubic geometry to two equivalent O(1), two equivalent O(2), and four equivalent O(3) atoms. There are two inequivalent Fe sites. In the first Fe site, Fe(1) is bonded in a distorted square co-planar geometry to two equivalent O(1), two equivalent O(3), and two equivalent Cl(1) atoms. In the second Fe site, Fe(2) is bonded in a square co-planar geometry to two equivalent O(1) and two equivalent O(2) atoms. Se(1) is bonded in a trigonal non-coplanar geometry to one O(2) and two equivalent O(3) atoms. There are three inequivalent O sites. In the first O site, O(1) is bonded to one Y(1), one Fe(2), and two equivalent Fe(1) atoms to form a mixture of edge and corner-sharing OYFe3 tetrahedra. In the second O site, O(2) is bonded in a distorted trigonal planar geometry to one Y(1), one Fe(2), and one Se(1) atom. In the third O site, O(3) is bonded in a distorted trigonal planar geometry to one Y(1), one Fe(1), and one Se(1) atom. Cl(1) is bonded in a 4-coordinate geometry to four equivalent Fe(1) atoms."}", "/scratch/micpie/export/mp_descriptions/valid_0-1.jsonl": "{"text":"Task: Please design a Crystallographic Information File (CIF) that matches the description below.\nDescription: Na3Os crystallizes in the hexagonal P6_3\/mmc space group. Na(1) is bonded in a distorted see-saw-like geometry to four equivalent Os(1) atoms. 
Os(1) is bonded to twelve equivalent Na(1) atoms to form a mixture of face and corner-sharing OsNa12 cuboctahedra.\nCIF: [CIF]\ndata_Na3Os\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.441\n_cell_length_b 6.441\n_cell_length_c 5.283\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Na3Os\n_chemical_formula_sum 'Na6 Os2'\n_cell_volume 189.794\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Na Na0 1 0.822 0.178 0.750 1.0\n Na Na1 1 0.356 0.178 0.750 1.0\n Na Na2 1 0.822 0.644 0.750 1.0\n Na Na3 1 0.178 0.822 0.250 1.0\n Na Na4 1 0.644 0.822 0.250 1.0\n Na Na5 1 0.178 0.356 0.250 1.0\n Os Os6 1 0.667 0.333 0.250 1.0\n Os Os7 1 0.333 0.667 0.750 1.0\n[\/CIF]\n"} {"text":"Task: Design a CIF card that matches the description below.\nDescription: YFe3Se2O8Cl crystallizes in the orthorhombic Pmmn space group. Y(1) is bonded in a body-centered cubic geometry to two equivalent O(1), two equivalent O(2), and four equivalent O(3) atoms. There are two inequivalent Fe sites. In the first Fe site, Fe(1) is bonded in a distorted square co-planar geometry to two equivalent O(1), two equivalent O(3), and two equivalent Cl(1) atoms. In the second Fe site, Fe(2) is bonded in a square co-planar geometry to two equivalent O(1) and two equivalent O(2) atoms. Se(1) is bonded in a trigonal non-coplanar geometry to one O(2) and two equivalent O(3) atoms. There are three inequivalent O sites. In the first O site, O(1) is bonded to one Y(1), one Fe(2), and two equivalent Fe(1) atoms to form a mixture of edge and corner-sharing OYFe3 tetrahedra. In the second O site, O(2) is bonded in a distorted trigonal planar geometry to one Y(1), one Fe(2), and one Se(1) atom. 
In the third O site, O(3) is bonded in a distorted trigonal planar geometry to one Y(1), one Fe(1), and one Se(1) atom. Cl(1) is bonded in a 4-coordinate geometry to four equivalent Fe(1) atoms.\nCIF: [CIF]\ndata_YFe3Se2ClO8\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.516\n_cell_length_b 7.259\n_cell_length_c 9.673\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural YFe3Se2ClO8\n_chemical_formula_sum 'Y2 Fe6 Se4 Cl2 O16'\n_cell_volume 457.546\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Y Y0 1 0.000 0.228 0.000 1.0\n Y Y1 1 0.500 0.772 0.500 1.0\n Fe Fe2 1 0.250 0.500 0.750 1.0\n Fe Fe3 1 0.250 0.500 0.250 1.0\n Fe Fe4 1 0.750 0.500 0.750 1.0\n Fe Fe5 1 0.500 0.299 0.500 1.0\n Fe Fe6 1 0.000 0.701 0.000 1.0\n Fe Fe7 1 0.750 0.500 0.250 1.0\n Se Se8 1 0.500 0.086 0.813 1.0\n Se Se9 1 0.000 0.914 0.687 1.0\n Se Se10 1 0.500 0.086 0.187 1.0\n Se Se11 1 0.000 0.914 0.313 1.0\n Cl Cl12 1 0.000 0.372 0.500 1.0\n Cl Cl13 1 0.500 0.628 0.000 1.0\n O O14 1 0.000 0.491 0.863 1.0\n O O15 1 0.000 0.924 0.867 1.0\n O O16 1 0.500 0.076 0.367 1.0\n O O17 1 0.292 0.229 0.841 1.0\n O O18 1 0.208 0.771 0.341 1.0\n O O19 1 0.792 0.771 0.341 1.0\n O O20 1 0.500 0.509 0.637 1.0\n O O21 1 0.708 0.229 0.841 1.0\n O O22 1 0.708 0.229 0.159 1.0\n O O23 1 0.792 0.771 0.659 1.0\n O O24 1 0.000 0.491 0.137 1.0\n O O25 1 0.500 0.509 0.363 1.0\n O O26 1 0.208 0.771 0.659 1.0\n O O27 1 0.292 0.229 0.159 1.0\n O O28 1 0.500 0.076 0.633 1.0\n O O29 1 0.000 0.924 0.133 1.0\n[\/CIF]\n"}", "/scratch/micpie/export/mp_descriptions/train_0-2.jsonl": "{"text":"User: Could you describe a compound based on the CIF file?\nAssistant: Sure, I need the CIF file to do that.\nUser: 
[CIF]\ndata_LiBi4B3O11\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.533\n_cell_length_b 6.533\n_cell_length_c 12.403\n_cell_angle_alpha 83.089\n_cell_angle_beta 83.089\n_cell_angle_gamma 88.841\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiBi4B3O11\n_chemical_formula_sum 'Li2 Bi8 B6 O22'\n_cell_volume 521.674\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.112 0.888 0.750 1.0\n Li Li1 1 0.888 0.112 0.250 1.0\n Bi Bi2 1 0.601 0.920 0.832 1.0\n Bi Bi3 1 0.920 0.601 0.332 1.0\n Bi Bi4 1 0.826 0.400 0.921 1.0\n Bi Bi5 1 0.400 0.826 0.421 1.0\n Bi Bi6 1 0.600 0.174 0.579 1.0\n Bi Bi7 1 0.174 0.600 0.079 1.0\n Bi Bi8 1 0.080 0.399 0.668 1.0\n Bi Bi9 1 0.399 0.080 0.168 1.0\n B B10 1 0.862 0.827 0.574 1.0\n B B11 1 0.827 0.862 0.074 1.0\n B B12 1 0.532 0.468 0.750 1.0\n B B13 1 0.468 0.532 0.250 1.0\n B B14 1 0.173 0.138 0.926 1.0\n B B15 1 0.138 0.173 0.426 1.0\n O O16 1 0.863 0.954 0.656 1.0\n O O17 1 0.954 0.863 0.156 1.0\n O O18 1 0.704 0.867 0.506 1.0\n O O19 1 0.867 0.704 0.006 1.0\n O O20 1 0.409 0.993 0.705 1.0\n O O21 1 0.320 0.989 0.948 1.0\n O O22 1 0.993 0.409 0.205 1.0\n O O23 1 0.989 0.320 0.448 1.0\n O O24 1 0.518 0.587 0.835 1.0\n O O25 1 0.687 0.313 0.750 1.0\n O O26 1 0.587 0.518 0.335 1.0\n O O27 1 0.413 0.482 0.665 1.0\n O O28 1 0.313 0.687 0.250 1.0\n O O29 1 0.482 0.413 0.165 1.0\n O O30 1 0.011 0.680 0.552 1.0\n O O31 1 0.007 0.591 0.795 1.0\n O O32 1 0.680 0.011 0.052 1.0\n O O33 1 0.591 0.007 0.295 1.0\n O O34 1 0.133 0.296 0.994 1.0\n O O35 1 0.296 0.133 0.494 1.0\n O O36 1 0.046 0.137 0.844 1.0\n O O37 1 0.137 0.046 0.344 1.0\n[\/CIF]\n\nAssistant: LiB3Bi4O11 crystallizes in the monoclinic C2\/c space group. 
Li(1) is bonded in a distorted trigonal pyramidal geometry to two equivalent O(1) and two equivalent O(3) atoms. There are two inequivalent B sites. In the first B site, B(1) is bonded in a trigonal planar geometry to one O(1), one O(2), and one O(4) atom. In the second B site, B(2) is bonded in a trigonal planar geometry to one O(6) and two equivalent O(5) atoms. There are two inequivalent Bi sites. In the first Bi site, Bi(1) is bonded in a 5-coordinate geometry to one O(1), one O(3), one O(4), one O(5), and one O(6) atom. In the second Bi site, Bi(2) is bonded in a 6-coordinate geometry to one O(1), one O(3), one O(5), one O(6), and two equivalent O(2) atoms. There are six inequivalent O sites. In the first O site, O(1) is bonded in a 4-coordinate geometry to one Li(1), one B(1), one Bi(1), and one Bi(2) atom. In the second O site, O(2) is bonded in a distorted single-bond geometry to one B(1) and two equivalent Bi(2) atoms. In the third O site, O(3) is bonded in a distorted trigonal planar geometry to one Li(1), one Bi(1), and one Bi(2) atom. In the fourth O site, O(4) is bonded in a distorted bent 120 degrees geometry to one B(1) and one Bi(1) atom. In the fifth O site, O(5) is bonded in a 1-coordinate geometry to one B(2), one Bi(1), and one Bi(2) atom. 
In the sixth O site, O(6) is bonded in a single-bond geometry to one B(2), two equivalent Bi(1), and two equivalent Bi(2) atoms."} {"text":"User: Can you describe a material based on the CIF card?\nAssistant: I require the CIF card to do that.\nUser: [CIF]\ndata_Cd5Te4S\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 18.919\n_cell_length_b 18.919\n_cell_length_c 18.919\n_cell_angle_alpha 13.955\n_cell_angle_beta 13.955\n_cell_angle_gamma 13.955\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Cd5Te4S\n_chemical_formula_sum 'Cd5 Te4 S1'\n_cell_volume 342.783\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cd Cd0 1 0.995 0.995 0.995 1.0\n Cd Cd1 1 0.605 0.605 0.605 1.0\n Cd Cd2 1 0.202 0.202 0.202 1.0\n Cd Cd3 1 0.800 0.800 0.800 1.0\n Cd Cd4 1 0.397 0.397 0.397 1.0\n Te Te5 1 0.554 0.554 0.554 1.0\n Te Te6 1 0.151 0.151 0.151 1.0\n Te Te7 1 0.749 0.749 0.749 1.0\n Te Te8 1 0.347 0.347 0.347 1.0\n S S9 1 0.949 0.949 0.949 1.0\n[\/CIF]\n\nAssistant: Cd5Te4S is Enargite-like structured and crystallizes in the trigonal R3m space group. There are five inequivalent Cd sites. In the first Cd site, Cd(1) is bonded to three equivalent Te(4) and one S(1) atom to form CdTe3S tetrahedra that share corners with three equivalent Cd(5)Te4 tetrahedra, corners with three equivalent Cd(2)TeS3 tetrahedra, and corners with six equivalent Cd(1)Te3S tetrahedra. In the second Cd site, Cd(2) is bonded to one Te(1) and three equivalent S(1) atoms to form CdTeS3 tetrahedra that share corners with three equivalent Cd(1)Te3S tetrahedra, corners with three equivalent Cd(3)Te4 tetrahedra, and corners with six equivalent Cd(2)TeS3 tetrahedra. 
In the third Cd site, Cd(3) is bonded to one Te(2) and three equivalent Te(1) atoms to form CdTe4 tetrahedra that share corners with three equivalent Cd(4)Te4 tetrahedra, corners with three equivalent Cd(2)TeS3 tetrahedra, and corners with six equivalent Cd(3)Te4 tetrahedra. In the fourth Cd site, Cd(4) is bonded to one Te(3) and three equivalent Te(2) atoms to form corner-sharing CdTe4 tetrahedra. In the fifth Cd site, Cd(5) is bonded to one Te(4) and three equivalent Te(3) atoms to form CdTe4 tetrahedra that share corners with three equivalent Cd(1)Te3S tetrahedra, corners with three equivalent Cd(4)Te4 tetrahedra, and corners with six equivalent Cd(5)Te4 tetrahedra. There are four inequivalent Te sites. In the first Te site, Te(1) is bonded to one Cd(2) and three equivalent Cd(3) atoms to form TeCd4 tetrahedra that share corners with three equivalent Te(2)Cd4 tetrahedra, corners with three equivalent S(1)Cd4 tetrahedra, and corners with six equivalent Te(1)Cd4 tetrahedra. In the second Te site, Te(2) is bonded to one Cd(3) and three equivalent Cd(4) atoms to form corner-sharing TeCd4 tetrahedra. In the third Te site, Te(3) is bonded to one Cd(4) and three equivalent Cd(5) atoms to form corner-sharing TeCd4 tetrahedra. In the fourth Te site, Te(4) is bonded to one Cd(5) and three equivalent Cd(1) atoms to form TeCd4 tetrahedra that share corners with three equivalent Te(3)Cd4 tetrahedra, corners with three equivalent S(1)Cd4 tetrahedra, and corners with six equivalent Te(4)Cd4 tetrahedra. S(1) is bonded to one Cd(1) and three equivalent Cd(2) atoms to form SCd4 tetrahedra that share corners with three equivalent Te(1)Cd4 tetrahedra, corners with three equivalent Te(4)Cd4 tetrahedra, and corners with six equivalent S(1)Cd4 tetrahedra."}", "/scratch/micpie/export/mp_descriptions/train_0-1.jsonl": "{"text":"Task: Design a CIF file that matches the description below.\nDescription: LiB3Bi4O11 crystallizes in the monoclinic C2\/c space group. 
Li(1) is bonded in a distorted trigonal pyramidal geometry to two equivalent O(1) and two equivalent O(3) atoms. There are two inequivalent B sites. In the first B site, B(1) is bonded in a trigonal planar geometry to one O(1), one O(2), and one O(4) atom. In the second B site, B(2) is bonded in a trigonal planar geometry to one O(6) and two equivalent O(5) atoms. There are two inequivalent Bi sites. In the first Bi site, Bi(1) is bonded in a 5-coordinate geometry to one O(1), one O(3), one O(4), one O(5), and one O(6) atom. In the second Bi site, Bi(2) is bonded in a 6-coordinate geometry to one O(1), one O(3), one O(5), one O(6), and two equivalent O(2) atoms. There are six inequivalent O sites. In the first O site, O(1) is bonded in a 4-coordinate geometry to one Li(1), one B(1), one Bi(1), and one Bi(2) atom. In the second O site, O(2) is bonded in a distorted single-bond geometry to one B(1) and two equivalent Bi(2) atoms. In the third O site, O(3) is bonded in a distorted trigonal planar geometry to one Li(1), one Bi(1), and one Bi(2) atom. In the fourth O site, O(4) is bonded in a distorted bent 120 degrees geometry to one B(1) and one Bi(1) atom. In the fifth O site, O(5) is bonded in a 1-coordinate geometry to one B(2), one Bi(1), and one Bi(2) atom. 
In the sixth O site, O(6) is bonded in a single-bond geometry to one B(2), two equivalent Bi(1), and two equivalent Bi(2) atoms.\nAnswer: [CIF]\ndata_LiBi4B3O11\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.533\n_cell_length_b 6.533\n_cell_length_c 12.403\n_cell_angle_alpha 83.089\n_cell_angle_beta 83.089\n_cell_angle_gamma 88.841\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiBi4B3O11\n_chemical_formula_sum 'Li2 Bi8 B6 O22'\n_cell_volume 521.674\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.112 0.888 0.750 1.0\n Li Li1 1 0.888 0.112 0.250 1.0\n Bi Bi2 1 0.601 0.920 0.832 1.0\n Bi Bi3 1 0.920 0.601 0.332 1.0\n Bi Bi4 1 0.826 0.400 0.921 1.0\n Bi Bi5 1 0.400 0.826 0.421 1.0\n Bi Bi6 1 0.600 0.174 0.579 1.0\n Bi Bi7 1 0.174 0.600 0.079 1.0\n Bi Bi8 1 0.080 0.399 0.668 1.0\n Bi Bi9 1 0.399 0.080 0.168 1.0\n B B10 1 0.862 0.827 0.574 1.0\n B B11 1 0.827 0.862 0.074 1.0\n B B12 1 0.532 0.468 0.750 1.0\n B B13 1 0.468 0.532 0.250 1.0\n B B14 1 0.173 0.138 0.926 1.0\n B B15 1 0.138 0.173 0.426 1.0\n O O16 1 0.863 0.954 0.656 1.0\n O O17 1 0.954 0.863 0.156 1.0\n O O18 1 0.704 0.867 0.506 1.0\n O O19 1 0.867 0.704 0.006 1.0\n O O20 1 0.409 0.993 0.705 1.0\n O O21 1 0.320 0.989 0.948 1.0\n O O22 1 0.993 0.409 0.205 1.0\n O O23 1 0.989 0.320 0.448 1.0\n O O24 1 0.518 0.587 0.835 1.0\n O O25 1 0.687 0.313 0.750 1.0\n O O26 1 0.587 0.518 0.335 1.0\n O O27 1 0.413 0.482 0.665 1.0\n O O28 1 0.313 0.687 0.250 1.0\n O O29 1 0.482 0.413 0.165 1.0\n O O30 1 0.011 0.680 0.552 1.0\n O O31 1 0.007 0.591 0.795 1.0\n O O32 1 0.680 0.011 0.052 1.0\n O O33 1 0.591 0.007 0.295 1.0\n O O34 1 0.133 0.296 0.994 1.0\n O O35 1 0.296 0.133 0.494 1.0\n O O36 1 0.046 0.137 0.844 1.0\n O O37 1 0.137 0.046 0.344 1.0\n[\/CIF]\n"} 
{"text":"Task: Please design a Crystallographic Information File (CIF) that matches the description below.\nDescription: Cd5Te4S is Enargite-like structured and crystallizes in the trigonal R3m space group. There are five inequivalent Cd sites. In the first Cd site, Cd(1) is bonded to three equivalent Te(4) and one S(1) atom to form CdTe3S tetrahedra that share corners with three equivalent Cd(5)Te4 tetrahedra, corners with three equivalent Cd(2)TeS3 tetrahedra, and corners with six equivalent Cd(1)Te3S tetrahedra. In the second Cd site, Cd(2) is bonded to one Te(1) and three equivalent S(1) atoms to form CdTeS3 tetrahedra that share corners with three equivalent Cd(1)Te3S tetrahedra, corners with three equivalent Cd(3)Te4 tetrahedra, and corners with six equivalent Cd(2)TeS3 tetrahedra. In the third Cd site, Cd(3) is bonded to one Te(2) and three equivalent Te(1) atoms to form CdTe4 tetrahedra that share corners with three equivalent Cd(4)Te4 tetrahedra, corners with three equivalent Cd(2)TeS3 tetrahedra, and corners with six equivalent Cd(3)Te4 tetrahedra. In the fourth Cd site, Cd(4) is bonded to one Te(3) and three equivalent Te(2) atoms to form corner-sharing CdTe4 tetrahedra. In the fifth Cd site, Cd(5) is bonded to one Te(4) and three equivalent Te(3) atoms to form CdTe4 tetrahedra that share corners with three equivalent Cd(1)Te3S tetrahedra, corners with three equivalent Cd(4)Te4 tetrahedra, and corners with six equivalent Cd(5)Te4 tetrahedra. There are four inequivalent Te sites. In the first Te site, Te(1) is bonded to one Cd(2) and three equivalent Cd(3) atoms to form TeCd4 tetrahedra that share corners with three equivalent Te(2)Cd4 tetrahedra, corners with three equivalent S(1)Cd4 tetrahedra, and corners with six equivalent Te(1)Cd4 tetrahedra. In the second Te site, Te(2) is bonded to one Cd(3) and three equivalent Cd(4) atoms to form corner-sharing TeCd4 tetrahedra. 
In the third Te site, Te(3) is bonded to one Cd(4) and three equivalent Cd(5) atoms to form corner-sharing TeCd4 tetrahedra. In the fourth Te site, Te(4) is bonded to one Cd(5) and three equivalent Cd(1) atoms to form TeCd4 tetrahedra that share corners with three equivalent Te(3)Cd4 tetrahedra, corners with three equivalent S(1)Cd4 tetrahedra, and corners with six equivalent Te(4)Cd4 tetrahedra. S(1) is bonded to one Cd(1) and three equivalent Cd(2) atoms to form SCd4 tetrahedra that share corners with three equivalent Te(1)Cd4 tetrahedra, corners with three equivalent Te(4)Cd4 tetrahedra, and corners with six equivalent S(1)Cd4 tetrahedra.\nCIF: [CIF]\ndata_Cd5Te4S\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 18.919\n_cell_length_b 18.919\n_cell_length_c 18.919\n_cell_angle_alpha 13.955\n_cell_angle_beta 13.955\n_cell_angle_gamma 13.955\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Cd5Te4S\n_chemical_formula_sum 'Cd5 Te4 S1'\n_cell_volume 342.783\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cd Cd0 1 0.995 0.995 0.995 1.0\n Cd Cd1 1 0.605 0.605 0.605 1.0\n Cd Cd2 1 0.202 0.202 0.202 1.0\n Cd Cd3 1 0.800 0.800 0.800 1.0\n Cd Cd4 1 0.397 0.397 0.397 1.0\n Te Te5 1 0.554 0.554 0.554 1.0\n Te Te6 1 0.151 0.151 0.151 1.0\n Te Te7 1 0.749 0.749 0.749 1.0\n Te Te8 1 0.347 0.347 0.347 1.0\n S S9 1 0.949 0.949 0.949 1.0\n[\/CIF]\n"}", "/scratch/micpie/export/mp_descriptions/valid_0-3.jsonl": "{"text":"User: Could you design a CIF card that matches a description of a material structure?\nAssistant: I would need the description of the material structure to do that.\nUser: Na3Os crystallizes in the hexagonal P6_3\/mmc space group. Na(1) is bonded in a distorted see-saw-like geometry to four equivalent Os(1) atoms. 
Os(1) is bonded to twelve equivalent Na(1) atoms to form a mixture of face and corner-sharing OsNa12 cuboctahedra.\nAssistant: [CIF]\ndata_Na3Os\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.441\n_cell_length_b 6.441\n_cell_length_c 5.283\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Na3Os\n_chemical_formula_sum 'Na6 Os2'\n_cell_volume 189.794\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Na Na0 1 0.822 0.178 0.750 1.0\n Na Na1 1 0.356 0.178 0.750 1.0\n Na Na2 1 0.822 0.644 0.750 1.0\n Na Na3 1 0.178 0.822 0.250 1.0\n Na Na4 1 0.644 0.822 0.250 1.0\n Na Na5 1 0.178 0.356 0.250 1.0\n Os Os6 1 0.667 0.333 0.250 1.0\n Os Os7 1 0.333 0.667 0.750 1.0\n[\/CIF]\n"} {"text":"User: Could you design a CIF file that matches a description of a compound?\nAssistant: I can give it a try, I would need the description of the compound to do that.\nUser: YFe3Se2O8Cl crystallizes in the orthorhombic Pmmn space group. Y(1) is bonded in a body-centered cubic geometry to two equivalent O(1), two equivalent O(2), and four equivalent O(3) atoms. There are two inequivalent Fe sites. In the first Fe site, Fe(1) is bonded in a distorted square co-planar geometry to two equivalent O(1), two equivalent O(3), and two equivalent Cl(1) atoms. In the second Fe site, Fe(2) is bonded in a square co-planar geometry to two equivalent O(1) and two equivalent O(2) atoms. Se(1) is bonded in a trigonal non-coplanar geometry to one O(2) and two equivalent O(3) atoms. There are three inequivalent O sites. In the first O site, O(1) is bonded to one Y(1), one Fe(2), and two equivalent Fe(1) atoms to form a mixture of edge and corner-sharing OYFe3 tetrahedra. 
In the second O site, O(2) is bonded in a distorted trigonal planar geometry to one Y(1), one Fe(2), and one Se(1) atom. In the third O site, O(3) is bonded in a distorted trigonal planar geometry to one Y(1), one Fe(1), and one Se(1) atom. Cl(1) is bonded in a 4-coordinate geometry to four equivalent Fe(1) atoms.\nAssistant: [CIF]\ndata_YFe3Se2ClO8\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.516\n_cell_length_b 7.259\n_cell_length_c 9.673\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural YFe3Se2ClO8\n_chemical_formula_sum 'Y2 Fe6 Se4 Cl2 O16'\n_cell_volume 457.546\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Y Y0 1 0.000 0.228 0.000 1.0\n Y Y1 1 0.500 0.772 0.500 1.0\n Fe Fe2 1 0.250 0.500 0.750 1.0\n Fe Fe3 1 0.250 0.500 0.250 1.0\n Fe Fe4 1 0.750 0.500 0.750 1.0\n Fe Fe5 1 0.500 0.299 0.500 1.0\n Fe Fe6 1 0.000 0.701 0.000 1.0\n Fe Fe7 1 0.750 0.500 0.250 1.0\n Se Se8 1 0.500 0.086 0.813 1.0\n Se Se9 1 0.000 0.914 0.687 1.0\n Se Se10 1 0.500 0.086 0.187 1.0\n Se Se11 1 0.000 0.914 0.313 1.0\n Cl Cl12 1 0.000 0.372 0.500 1.0\n Cl Cl13 1 0.500 0.628 0.000 1.0\n O O14 1 0.000 0.491 0.863 1.0\n O O15 1 0.000 0.924 0.867 1.0\n O O16 1 0.500 0.076 0.367 1.0\n O O17 1 0.292 0.229 0.841 1.0\n O O18 1 0.208 0.771 0.341 1.0\n O O19 1 0.792 0.771 0.341 1.0\n O O20 1 0.500 0.509 0.637 1.0\n O O21 1 0.708 0.229 0.841 1.0\n O O22 1 0.708 0.229 0.159 1.0\n O O23 1 0.792 0.771 0.659 1.0\n O O24 1 0.000 0.491 0.137 1.0\n O O25 1 0.500 0.509 0.363 1.0\n O O26 1 0.208 0.771 0.659 1.0\n O O27 1 0.292 0.229 0.159 1.0\n O O28 1 0.500 0.076 0.633 1.0\n O O29 1 0.000 0.924 0.133 1.0\n[\/CIF]\n"}", "/scratch/micpie/export/uniprot_reactions/test_0-1.jsonl": 
"{"text":"Task: Predict a reaction that can be catalyzed by the following protein.\nSequence: MRSVSGQVVCVTGAGGFIASWLVKILLEKGYTVRGTVRNPDDPKNGHLRELEGAKERLTLCKADLLDYQSLREAINGCDGVFHTASPVTDDPEQMVEPAVIGTKNVINAAAEANVRRVVFTSSIGAVYMDPNRDPETVVDETCWSDPDFCKNTKNWYCYGKMVAEQAAWEEAKEKGVDLVVINPVLVQGPLLQTTVNASVLHILKYLTGSAKTYANSVQAYVDVKDVALAHILLYETPEASGRYLCAESVLHRGDVVEILSKFFPEYPIPTKCSDVTKPRVKPYKFSNQKLKDLGLEFTPVKQCLYETVKSLQEKGHLPIPTQKDEPIIRIQP\nResult: (E)-cinnamaldehyde + CoA + NADP(+) -> (E)-cinnamoyl-CoA + H(+) + NADPH"} {"text":"Task: Identify a biochemical reaction that can be catalyzed by the following protein.\nAmino acid sequence : MANVFDNSSYRDMLKMVFVIRDDLKMTKGEIVSQCCHGAISAYEKSKKYSPDYLKRWLKNGQVKETVKVDNENEMMDIRENATAIGVNYYIVQNDKRQKCNTVLVIGPAPNYMFESLTRSLKPL\nResult: an N-acyl-L-alpha-aminoacyl-tRNA + H2O -> a tRNA + an N-acyl-L-amino acid + H(+)"}", "/scratch/micpie/export/uniprot_reactions/valid_0-0.jsonl": "{"text":"The amino acid sequence MSLEQKKGADIISKILQIQNSIGKTTSPSTLKTKLSEISRKEQENARIQSKLSDLQKKKIDIDNKLLKEKQNLIKEEILERKKLEVLTKKQQKDEIEHQKKLKREIDAIKASTQYITDVSISSYNNTIPETEPEYDLFISHASEDKEDFVRPLAETLQQLGVNVWYDEFTLKVGDSLRQKIDSGLRNSKYGTVVLSTDFIKKDWTNYELDGLVAREMNGHKMILPIWHKITKNDVLDYSPNLADKVALNTSVNSIEEIAHQLADVILNR catalyzes the following chemical reaction: NAD(+) -> 2'cADPR + H(+) + nicotinamide"} {"text":"The protein with the sequence 
MSTLLIPQDTIAHTFDEAVASESNLRIDEVPENYLERFIHPSEPENFEFYSLRDSDIPSKRIPKNGIQVFENLKYHTNSKDNLYKDQPSSGPSPMRGVANIIREYFPQYLDDLRTWCRPKSSDDSIFNDFNHEQRITQPFTEERERRLLPLIDHFLGIKPYDIVHYCDTRFYPWKLSTRADYFHNHSRDRKAHAAKSHPDFATGPTKKSYFINSHLFFDRSTVHNIKEYGFPFRPTTDSARNETLLDLWFKKVPTELLVRSHISKRDNLKVRPVYNAPMIYIRIECMLFYPLLAQARKRDCCIMYGLETIRGGMNELERISNAFNSFLLIDWSRFDHLAPFTISNFFFKKWLPTKILIDHGYAQISNYHDHVHSFSAQAQSHGIPMISKEYQTPPEATVFAKKVLNLISFLERWYRDMVFVTPDGFAYRRTHAGVPSGILMTQFIDSFVNLTILLDGLIEFGFTDEEIKQLLVFIMGDDNVIFTPWTLLKLIEFFDWFAKYTLDRFGMVINISKSAVTSIRRKIEVLGYTNNYGFPTRSISKLVGQLAYPERHVTDADMCMRAIGFAYASCAQSETFHALCKKVFQYYFAKTSINERLILKGRKAELPGMFFAYPDVSEHIRLDHFPSLSEVRILLSKFQGYLKETPFGTIPTFSTPQTLRDQTQ catalyzes the reaction: a ribonucleoside 5'-triphosphate + RNA(n) -> diphosphate + RNA(n+1)"}", "/scratch/micpie/export/uniprot_reactions/test_0-2.jsonl": "{"text":"Task: Come up with a polypeptide that can catalyze this specific chemical reaction.\nReaction: (E)-sinapaldehyde + CoA + NADP(+) -> (E)-sinapoyl-CoA + H(+) + NADPH\nOutput: MRSVSGQVVCVTGAGGFIASWLVKILLEKGYTVRGTVRNPDDPKNGHLRELEGAKERLTLCKADLLDYQSLREAINGCDGVFHTASPVTDDPEQMVEPAVIGTKNVINAAAEANVRRVVFTSSIGAVYMDPNRDPETVVDETCWSDPDFCKNTKNWYCYGKMVAEQAAWEEAKEKGVDLVVINPVLVQGPLLQTTVNASVLHILKYLTGSAKTYANSVQAYVDVKDVALAHILLYETPEASGRYLCAESVLHRGDVVEILSKFFPEYPIPTKCSDVTKPRVKPYKFSNQKLKDLGLEFTPVKQCLYETVKSLQEKGHLPIPTQKDEPIIRIQP"} {"text":"Task: Come up with a amino acid sequence that can catalyze this specific biochemical reaction.\nReaction: an N-acyl-L-alpha-aminoacyl-tRNA + H2O -> a tRNA + an N-acyl-L-amino acid + H(+)\nResult: MANVFDNSSYRDMLKMVFVIRDDLKMTKGEIVSQCCHGAISAYEKSKKYSPDYLKRWLKNGQVKETVKVDNENEMMDIRENATAIGVNYYIVQNDKRQKCNTVLVIGPAPNYMFESLTRSLKPL"}", "/scratch/micpie/export/uniprot_reactions/test_0-0.jsonl": "{"text":"The AA sequence with the sequence 
MRSVSGQVVCVTGAGGFIASWLVKILLEKGYTVRGTVRNPDDPKNGHLRELEGAKERLTLCKADLLDYQSLREAINGCDGVFHTASPVTDDPEQMVEPAVIGTKNVINAAAEANVRRVVFTSSIGAVYMDPNRDPETVVDETCWSDPDFCKNTKNWYCYGKMVAEQAAWEEAKEKGVDLVVINPVLVQGPLLQTTVNASVLHILKYLTGSAKTYANSVQAYVDVKDVALAHILLYETPEASGRYLCAESVLHRGDVVEILSKFFPEYPIPTKCSDVTKPRVKPYKFSNQKLKDLGLEFTPVKQCLYETVKSLQEKGHLPIPTQKDEPIIRIQP catalyzes the following biochemical reaction: (E)-sinapaldehyde + CoA + NADP(+) -> (E)-sinapoyl-CoA + H(+) + NADPH"} {"text":"The AA sequence MANVFDNSSYRDMLKMVFVIRDDLKMTKGEIVSQCCHGAISAYEKSKKYSPDYLKRWLKNGQVKETVKVDNENEMMDIRENATAIGVNYYIVQNDKRQKCNTVLVIGPAPNYMFESLTRSLKPL catalyzes the biochemical reaction: an N-acyl-L-alpha-aminoacyl-tRNA + H2O -> a tRNA + an N-acyl-L-amino acid + H(+)"}", "/scratch/micpie/export/uniprot_reactions/test_0-3.jsonl": "{"text":"User: Can you tell me a reaction that can be catalyzed by the following amino acid sequence:\\nMRSVSGQVVCVTGAGGFIASWLVKILLEKGYTVRGTVRNPDDPKNGHLRELEGAKERLTLCKADLLDYQSLREAINGCDGVFHTASPVTDDPEQMVEPAVIGTKNVINAAAEANVRRVVFTSSIGAVYMDPNRDPETVVDETCWSDPDFCKNTKNWYCYGKMVAEQAAWEEAKEKGVDLVVINPVLVQGPLLQTTVNASVLHILKYLTGSAKTYANSVQAYVDVKDVALAHILLYETPEASGRYLCAESVLHRGDVVEILSKFFPEYPIPTKCSDVTKPRVKPYKFSNQKLKDLGLEFTPVKQCLYETVKSLQEKGHLPIPTQKDEPIIRIQP\nAssistant: The biochemical reaction that can be catalyzed by the given amino acid sequence are:\\n(E)-coniferaldehyde + CoA + NADP(+) -> (E)-feruloyl-CoA + H(+) + NADPH"} {"text":"User: Can you come up with a chemical reaction that can be catalyzed by the following polypeptide:\\nMANVFDNSSYRDMLKMVFVIRDDLKMTKGEIVSQCCHGAISAYEKSKKYSPDYLKRWLKNGQVKETVKVDNENEMMDIRENATAIGVNYYIVQNDKRQKCNTVLVIGPAPNYMFESLTRSLKPL\nAssistant: The reaction that can be catalyzed by the given polypeptide are:\\nan N-acyl-L-alpha-aminoacyl-tRNA + H2O -> a tRNA + an N-acyl-L-amino acid + H(+)"}", "/scratch/micpie/export/uniprot_reactions/train_0-0.jsonl": "{"text":"The amino acid sequence with the sequence 
MRFQVIVAAATITMITSYIPGVASQSTSDGDDLFVPVSNFDPKSIFPEIKHPFEPMYANTENGKIVPTNSWISNLFYPSADNLAPTTPDPYTLRLLDGYGGNPGLTIRQPSAKVLGSYPPTNDVPYTDAGYMINSVVVDLRLTSSEWSDVVPDRQVTDWDHLSANLRLSTPQDSNSYIDFPIVRGMAYITANYNNLTPQFLSQHAIISVEADEKKSDDNTSTFSGRKFKITMNDDPTSTFIIYSLGDKPLELRKQDNSNLVASKPYTGVIRVAKLPAPEFETLLDASRAVWPTGGDISARSDDNNGASYTIKWKTNSNEAPLLTYAYAHHLTSIDDSNVKRTDMTLQSATKGPMTALVGNEWTLRETELSPVEWLPLQAAPNPTTINEIMTEINKDIASNYTQETAKEDNYFSGKGLQKFAMLALILNKSDQTQLRNPELAQIALDKLKAAFLPYLQNEQADPFRYDTLYKGIVAKAGLPTSMGGTDDLSAEFGHSYYSDHHYHQGYFVVTAAIIHHLDPTWNADRLKAWTEALIRDVNNANDGDEYFAAFRNWDWFAGHSWAGGIKPDGALDGRDQESVPESVNFYWGAKLWGLATGNTPLTKLASLQLAVTKRTTYEYFWMLDGNKNRPENIVRNKVIGIYFEQKTDYTTYFGRFLEYIHGIQQLPMTPELMEYIRTPEFVSQEWDEKLGAIAPTVQSPWAGVLYLNYAIINPAEAYPALRKVQMDDGQTRSYSLYLTATRPHFFRRSLLAALARHGSTRRPSLPSSGDDDKHEDGFLLRFRRLNPFNLKHRIY catalyzes the following chemical reaction: Hydrolysis of (1->3)-beta-D-glucosidic linkages in (1->3)-beta-D-glucans."} {"text":"The polypeptide with the sequence MSLTSRYTHFVPDSTITEILNDSNTPQILLHYANIVNGSTPVHFTSHHDNQVNWTVATLTRMSQYMIPDFMKLFPPLEPTLSLQPDCHCSFINLPRPEIKIPIEILSPPKPNYAKYHYDATTSRVFVNSKHEMYMDNFDVSQLIRDVAAIKTDSPSGNITKGLLKTFHDSIKLRALSPIMSMFHSLLSYRCPCCTSLNGMKKLNHLCFQYSSIYAFLCDMVRPYMCVPFFVDRLGVQILPGFKVSSQYPLLFFEAIELMHTVGLGNLSDSLSGWCFYTWLDRARIGVFREMFNRRGSITMLKSRVVSTGTIFRFSQREFVIESITEQRSTDISPTFEECSFSDSQYIQDNCYKPIYDITTTLDDVKCRWLDVALNYFYGAVLYVTGPVSLALEQSGMGRPGSLNLQFGGTTDVYVEGRWITIDVEPVSPFVSRIKQLADRELAKTKVNGDSLEHGFFEAQTTNSAGNTKETLAGLRSEIIEQHDSPQEGRLLASMAGIRVIDAMRRFNTTFRDHTEFLNEVRRPTKAGMRYQQQRRPRVIQMTGTEAQLGGWLLLNVYEPTYKRLGYTSSGKNIGDIRDMQAVLEASGQNGINSSVDIIGMDASTQNTHVTLLGSAAIKAYNPERIGFPKMFFQSTHNGGDANSRVLPTRVTRDGQTIPKDDDVKYNLPQLAIIYSLHGMHGPTILYDGYFAPAVLTSQTVFRSGWYNTSSQHTMLGSLVLLSLEEDIRNGYKNPYDGAPERSLIAKHWHSIRIIGRVLGDDILLKAFGPPTLTPDELREVTAEVCAEFEHRMELLGFLCERAFSDVMCEFLKQKGFGGAPHMFPDRLVLYTSERGNQAMTNPTTMYRVCDALIIEFNSRSRNIFNTCVSRRVLQTVCSTFALRMTSSGHLVRRSYASRKPYSRVAKVSDGILSELHNHKTVFRIIDYNVLGDHIAMIFLPMLWATNHILGCPPPAIVSISGANIPAASPLTYPSAAITTFWLTATSRRKIDFDSSATAYKKSMSDISNLTAVPLDIIFSFSNAMELSPLSINLDKDYDID
TLRTFGFIVGIMSDSLFPTPSATRAKIKSPVVDDWSRYADSLLNPTRVRSSHHGSEILAESNVVVPYELRYAHRGTAKVRQSMYELPVTDLEYGENTMTTLTQLSESLKVKPGTSKLLRDAMLAGEVFVIPTTHPVTLPCPSFDAHGYGHIIPPNSLQSLLLTHLGLPVSSASYTSSFAKTILSDGKLPGSAEAYLSLYQETYKKGPSAVAYLKDAIGFSDSSMSALERLASNGLYGISGASFAYNPRGGFFFRFDQDNADRFGTSLSPSPTIRRLDIVHMMFTMLMYPTTMVSQNQWMMVRFGRSFSRLARR catalyzes the following chemical reaction: a ribonucleoside 5'-triphosphate + RNA(n) -> diphosphate + RNA(n+1)"}", "/scratch/micpie/export/uniprot_reactions/train_0-3.jsonl": "{"text":"User: Can you tell me a chemical reaction that can be catalyzed by the following protein:\\nMRFQVIVAAATITMITSYIPGVASQSTSDGDDLFVPVSNFDPKSIFPEIKHPFEPMYANTENGKIVPTNSWISNLFYPSADNLAPTTPDPYTLRLLDGYGGNPGLTIRQPSAKVLGSYPPTNDVPYTDAGYMINSVVVDLRLTSSEWSDVVPDRQVTDWDHLSANLRLSTPQDSNSYIDFPIVRGMAYITANYNNLTPQFLSQHAIISVEADEKKSDDNTSTFSGRKFKITMNDDPTSTFIIYSLGDKPLELRKQDNSNLVASKPYTGVIRVAKLPAPEFETLLDASRAVWPTGGDISARSDDNNGASYTIKWKTNSNEAPLLTYAYAHHLTSIDDSNVKRTDMTLQSATKGPMTALVGNEWTLRETELSPVEWLPLQAAPNPTTINEIMTEINKDIASNYTQETAKEDNYFSGKGLQKFAMLALILNKSDQTQLRNPELAQIALDKLKAAFLPYLQNEQADPFRYDTLYKGIVAKAGLPTSMGGTDDLSAEFGHSYYSDHHYHQGYFVVTAAIIHHLDPTWNADRLKAWTEALIRDVNNANDGDEYFAAFRNWDWFAGHSWAGGIKPDGALDGRDQESVPESVNFYWGAKLWGLATGNTPLTKLASLQLAVTKRTTYEYFWMLDGNKNRPENIVRNKVIGIYFEQKTDYTTYFGRFLEYIHGIQQLPMTPELMEYIRTPEFVSQEWDEKLGAIAPTVQSPWAGVLYLNYAIINPAEAYPALRKVQMDDGQTRSYSLYLTATRPHFFRRSLLAALARHGSTRRPSLPSSGDDDKHEDGFLLRFRRLNPFNLKHRIY\nAssistant: Yes, the chemical reaction that can be catalyzed by the given protein are:\\nHydrolysis of (1->3)-beta-D-glucosidic linkages in (1->3)-beta-D-glucans."} {"text":"User: Can you tell me a biochemical reaction that can be catalyzed by the following amino acid 
sequence:\\nMSLTSRYTHFVPDSTITEILNDSNTPQILLHYANIVNGSTPVHFTSHHDNQVNWTVATLTRMSQYMIPDFMKLFPPLEPTLSLQPDCHCSFINLPRPEIKIPIEILSPPKPNYAKYHYDATTSRVFVNSKHEMYMDNFDVSQLIRDVAAIKTDSPSGNITKGLLKTFHDSIKLRALSPIMSMFHSLLSYRCPCCTSLNGMKKLNHLCFQYSSIYAFLCDMVRPYMCVPFFVDRLGVQILPGFKVSSQYPLLFFEAIELMHTVGLGNLSDSLSGWCFYTWLDRARIGVFREMFNRRGSITMLKSRVVSTGTIFRFSQREFVIESITEQRSTDISPTFEECSFSDSQYIQDNCYKPIYDITTTLDDVKCRWLDVALNYFYGAVLYVTGPVSLALEQSGMGRPGSLNLQFGGTTDVYVEGRWITIDVEPVSPFVSRIKQLADRELAKTKVNGDSLEHGFFEAQTTNSAGNTKETLAGLRSEIIEQHDSPQEGRLLASMAGIRVIDAMRRFNTTFRDHTEFLNEVRRPTKAGMRYQQQRRPRVIQMTGTEAQLGGWLLLNVYEPTYKRLGYTSSGKNIGDIRDMQAVLEASGQNGINSSVDIIGMDASTQNTHVTLLGSAAIKAYNPERIGFPKMFFQSTHNGGDANSRVLPTRVTRDGQTIPKDDDVKYNLPQLAIIYSLHGMHGPTILYDGYFAPAVLTSQTVFRSGWYNTSSQHTMLGSLVLLSLEEDIRNGYKNPYDGAPERSLIAKHWHSIRIIGRVLGDDILLKAFGPPTLTPDELREVTAEVCAEFEHRMELLGFLCERAFSDVMCEFLKQKGFGGAPHMFPDRLVLYTSERGNQAMTNPTTMYRVCDALIIEFNSRSRNIFNTCVSRRVLQTVCSTFALRMTSSGHLVRRSYASRKPYSRVAKVSDGILSELHNHKTVFRIIDYNVLGDHIAMIFLPMLWATNHILGCPPPAIVSISGANIPAASPLTYPSAAITTFWLTATSRRKIDFDSSATAYKKSMSDISNLTAVPLDIIFSFSNAMELSPLSINLDKDYDIDTLRTFGFIVGIMSDSLFPTPSATRAKIKSPVVDDWSRYADSLLNPTRVRSSHHGSEILAESNVVVPYELRYAHRGTAKVRQSMYELPVTDLEYGENTMTTLTQLSESLKVKPGTSKLLRDAMLAGEVFVIPTTHPVTLPCPSFDAHGYGHIIPPNSLQSLLLTHLGLPVSSASYTSSFAKTILSDGKLPGSAEAYLSLYQETYKKGPSAVAYLKDAIGFSDSSMSALERLASNGLYGISGASFAYNPRGGFFFRFDQDNADRFGTSLSPSPTIRRLDIVHMMFTMLMYPTTMVSQNQWMMVRFGRSFSRLARR\nAssistant: The reaction that can be catalyzed by the given amino acid sequence are:\\na ribonucleoside 5'-triphosphate + RNA(n) -> diphosphate + RNA(n+1)"}", "/scratch/micpie/export/uniprot_reactions/valid_0-2.jsonl": "{"text":"Task: Create a polypeptide that can catalyze this specific biochemical reaction.\nReaction: H2O + NAD(+) -> ADP-D-ribose + H(+) + nicotinamide\nOutput: 
MSLEQKKGADIISKILQIQNSIGKTTSPSTLKTKLSEISRKEQENARIQSKLSDLQKKKIDIDNKLLKEKQNLIKEEILERKKLEVLTKKQQKDEIEHQKKLKREIDAIKASTQYITDVSISSYNNTIPETEPEYDLFISHASEDKEDFVRPLAETLQQLGVNVWYDEFTLKVGDSLRQKIDSGLRNSKYGTVVLSTDFIKKDWTNYELDGLVAREMNGHKMILPIWHKITKNDVLDYSPNLADKVALNTSVNSIEEIAHQLADVILNR"} {"text":"Task: Come up with a polypeptide that can catalyze this specific chemical reaction.\nReaction: a ribonucleoside 5'-triphosphate + RNA(n) -> diphosphate + RNA(n+1)\nResult: MSTLLIPQDTIAHTFDEAVASESNLRIDEVPENYLERFIHPSEPENFEFYSLRDSDIPSKRIPKNGIQVFENLKYHTNSKDNLYKDQPSSGPSPMRGVANIIREYFPQYLDDLRTWCRPKSSDDSIFNDFNHEQRITQPFTEERERRLLPLIDHFLGIKPYDIVHYCDTRFYPWKLSTRADYFHNHSRDRKAHAAKSHPDFATGPTKKSYFINSHLFFDRSTVHNIKEYGFPFRPTTDSARNETLLDLWFKKVPTELLVRSHISKRDNLKVRPVYNAPMIYIRIECMLFYPLLAQARKRDCCIMYGLETIRGGMNELERISNAFNSFLLIDWSRFDHLAPFTISNFFFKKWLPTKILIDHGYAQISNYHDHVHSFSAQAQSHGIPMISKEYQTPPEATVFAKKVLNLISFLERWYRDMVFVTPDGFAYRRTHAGVPSGILMTQFIDSFVNLTILLDGLIEFGFTDEEIKQLLVFIMGDDNVIFTPWTLLKLIEFFDWFAKYTLDRFGMVINISKSAVTSIRRKIEVLGYTNNYGFPTRSISKLVGQLAYPERHVTDADMCMRAIGFAYASCAQSETFHALCKKVFQYYFAKTSINERLILKGRKAELPGMFFAYPDVSEHIRLDHFPSLSEVRILLSKFQGYLKETPFGTIPTFSTPQTLRDQTQ"}", "/scratch/micpie/export/uniprot_reactions/valid_0-1.jsonl": "{"text":"Task: Identify a chemical reaction that can be catalyzed by the following polypeptide.\nAmino acid sequence : MSLEQKKGADIISKILQIQNSIGKTTSPSTLKTKLSEISRKEQENARIQSKLSDLQKKKIDIDNKLLKEKQNLIKEEILERKKLEVLTKKQQKDEIEHQKKLKREIDAIKASTQYITDVSISSYNNTIPETEPEYDLFISHASEDKEDFVRPLAETLQQLGVNVWYDEFTLKVGDSLRQKIDSGLRNSKYGTVVLSTDFIKKDWTNYELDGLVAREMNGHKMILPIWHKITKNDVLDYSPNLADKVALNTSVNSIEEIAHQLADVILNR\nResult: H2O + NADP(+) -> ADP-D-ribose 2'-phosphate + H(+) + nicotinamide"} {"text":"Task: Predict a biochemical reaction that can be catalyzed by the following AA sequence.\nAA sequence: 
MSTLLIPQDTIAHTFDEAVASESNLRIDEVPENYLERFIHPSEPENFEFYSLRDSDIPSKRIPKNGIQVFENLKYHTNSKDNLYKDQPSSGPSPMRGVANIIREYFPQYLDDLRTWCRPKSSDDSIFNDFNHEQRITQPFTEERERRLLPLIDHFLGIKPYDIVHYCDTRFYPWKLSTRADYFHNHSRDRKAHAAKSHPDFATGPTKKSYFINSHLFFDRSTVHNIKEYGFPFRPTTDSARNETLLDLWFKKVPTELLVRSHISKRDNLKVRPVYNAPMIYIRIECMLFYPLLAQARKRDCCIMYGLETIRGGMNELERISNAFNSFLLIDWSRFDHLAPFTISNFFFKKWLPTKILIDHGYAQISNYHDHVHSFSAQAQSHGIPMISKEYQTPPEATVFAKKVLNLISFLERWYRDMVFVTPDGFAYRRTHAGVPSGILMTQFIDSFVNLTILLDGLIEFGFTDEEIKQLLVFIMGDDNVIFTPWTLLKLIEFFDWFAKYTLDRFGMVINISKSAVTSIRRKIEVLGYTNNYGFPTRSISKLVGQLAYPERHVTDADMCMRAIGFAYASCAQSETFHALCKKVFQYYFAKTSINERLILKGRKAELPGMFFAYPDVSEHIRLDHFPSLSEVRILLSKFQGYLKETPFGTIPTFSTPQTLRDQTQ\nResult: a ribonucleoside 5'-triphosphate + RNA(n) -> diphosphate + RNA(n+1)"}", "/scratch/micpie/export/uniprot_reactions/train_0-2.jsonl": "{"text":"Task: Create a protein that can catalyze a specific chemical reaction.\nReaction: Hydrolysis of (1->3)-beta-D-glucosidic linkages in (1->3)-beta-D-glucans.\nResult: MRFQVIVAAATITMITSYIPGVASQSTSDGDDLFVPVSNFDPKSIFPEIKHPFEPMYANTENGKIVPTNSWISNLFYPSADNLAPTTPDPYTLRLLDGYGGNPGLTIRQPSAKVLGSYPPTNDVPYTDAGYMINSVVVDLRLTSSEWSDVVPDRQVTDWDHLSANLRLSTPQDSNSYIDFPIVRGMAYITANYNNLTPQFLSQHAIISVEADEKKSDDNTSTFSGRKFKITMNDDPTSTFIIYSLGDKPLELRKQDNSNLVASKPYTGVIRVAKLPAPEFETLLDASRAVWPTGGDISARSDDNNGASYTIKWKTNSNEAPLLTYAYAHHLTSIDDSNVKRTDMTLQSATKGPMTALVGNEWTLRETELSPVEWLPLQAAPNPTTINEIMTEINKDIASNYTQETAKEDNYFSGKGLQKFAMLALILNKSDQTQLRNPELAQIALDKLKAAFLPYLQNEQADPFRYDTLYKGIVAKAGLPTSMGGTDDLSAEFGHSYYSDHHYHQGYFVVTAAIIHHLDPTWNADRLKAWTEALIRDVNNANDGDEYFAAFRNWDWFAGHSWAGGIKPDGALDGRDQESVPESVNFYWGAKLWGLATGNTPLTKLASLQLAVTKRTTYEYFWMLDGNKNRPENIVRNKVIGIYFEQKTDYTTYFGRFLEYIHGIQQLPMTPELMEYIRTPEFVSQEWDEKLGAIAPTVQSPWAGVLYLNYAIINPAEAYPALRKVQMDDGQTRSYSLYLTATRPHFFRRSLLAALARHGSTRRPSLPSSGDDDKHEDGFLLRFRRLNPFNLKHRIY"} {"text":"Task: Come up with a AA sequence that can catalyze a specific reaction.\nReaction: a ribonucleoside 5'-triphosphate + RNA(n) -> diphosphate + RNA(n+1)\nResult: 
MSLTSRYTHFVPDSTITEILNDSNTPQILLHYANIVNGSTPVHFTSHHDNQVNWTVATLTRMSQYMIPDFMKLFPPLEPTLSLQPDCHCSFINLPRPEIKIPIEILSPPKPNYAKYHYDATTSRVFVNSKHEMYMDNFDVSQLIRDVAAIKTDSPSGNITKGLLKTFHDSIKLRALSPIMSMFHSLLSYRCPCCTSLNGMKKLNHLCFQYSSIYAFLCDMVRPYMCVPFFVDRLGVQILPGFKVSSQYPLLFFEAIELMHTVGLGNLSDSLSGWCFYTWLDRARIGVFREMFNRRGSITMLKSRVVSTGTIFRFSQREFVIESITEQRSTDISPTFEECSFSDSQYIQDNCYKPIYDITTTLDDVKCRWLDVALNYFYGAVLYVTGPVSLALEQSGMGRPGSLNLQFGGTTDVYVEGRWITIDVEPVSPFVSRIKQLADRELAKTKVNGDSLEHGFFEAQTTNSAGNTKETLAGLRSEIIEQHDSPQEGRLLASMAGIRVIDAMRRFNTTFRDHTEFLNEVRRPTKAGMRYQQQRRPRVIQMTGTEAQLGGWLLLNVYEPTYKRLGYTSSGKNIGDIRDMQAVLEASGQNGINSSVDIIGMDASTQNTHVTLLGSAAIKAYNPERIGFPKMFFQSTHNGGDANSRVLPTRVTRDGQTIPKDDDVKYNLPQLAIIYSLHGMHGPTILYDGYFAPAVLTSQTVFRSGWYNTSSQHTMLGSLVLLSLEEDIRNGYKNPYDGAPERSLIAKHWHSIRIIGRVLGDDILLKAFGPPTLTPDELREVTAEVCAEFEHRMELLGFLCERAFSDVMCEFLKQKGFGGAPHMFPDRLVLYTSERGNQAMTNPTTMYRVCDALIIEFNSRSRNIFNTCVSRRVLQTVCSTFALRMTSSGHLVRRSYASRKPYSRVAKVSDGILSELHNHKTVFRIIDYNVLGDHIAMIFLPMLWATNHILGCPPPAIVSISGANIPAASPLTYPSAAITTFWLTATSRRKIDFDSSATAYKKSMSDISNLTAVPLDIIFSFSNAMELSPLSINLDKDYDIDTLRTFGFIVGIMSDSLFPTPSATRAKIKSPVVDDWSRYADSLLNPTRVRSSHHGSEILAESNVVVPYELRYAHRGTAKVRQSMYELPVTDLEYGENTMTTLTQLSESLKVKPGTSKLLRDAMLAGEVFVIPTTHPVTLPCPSFDAHGYGHIIPPNSLQSLLLTHLGLPVSSASYTSSFAKTILSDGKLPGSAEAYLSLYQETYKKGPSAVAYLKDAIGFSDSSMSALERLASNGLYGISGASFAYNPRGGFFFRFDQDNADRFGTSLSPSPTIRRLDIVHMMFTMLMYPTTMVSQNQWMMVRFGRSFSRLARR"}", "/scratch/micpie/export/uniprot_reactions/train_0-1.jsonl": "{"text":"Task: Predict a reaction that can be catalyzed by the following polypeptide.\nAmino acid sequence : 
MRFQVIVAAATITMITSYIPGVASQSTSDGDDLFVPVSNFDPKSIFPEIKHPFEPMYANTENGKIVPTNSWISNLFYPSADNLAPTTPDPYTLRLLDGYGGNPGLTIRQPSAKVLGSYPPTNDVPYTDAGYMINSVVVDLRLTSSEWSDVVPDRQVTDWDHLSANLRLSTPQDSNSYIDFPIVRGMAYITANYNNLTPQFLSQHAIISVEADEKKSDDNTSTFSGRKFKITMNDDPTSTFIIYSLGDKPLELRKQDNSNLVASKPYTGVIRVAKLPAPEFETLLDASRAVWPTGGDISARSDDNNGASYTIKWKTNSNEAPLLTYAYAHHLTSIDDSNVKRTDMTLQSATKGPMTALVGNEWTLRETELSPVEWLPLQAAPNPTTINEIMTEINKDIASNYTQETAKEDNYFSGKGLQKFAMLALILNKSDQTQLRNPELAQIALDKLKAAFLPYLQNEQADPFRYDTLYKGIVAKAGLPTSMGGTDDLSAEFGHSYYSDHHYHQGYFVVTAAIIHHLDPTWNADRLKAWTEALIRDVNNANDGDEYFAAFRNWDWFAGHSWAGGIKPDGALDGRDQESVPESVNFYWGAKLWGLATGNTPLTKLASLQLAVTKRTTYEYFWMLDGNKNRPENIVRNKVIGIYFEQKTDYTTYFGRFLEYIHGIQQLPMTPELMEYIRTPEFVSQEWDEKLGAIAPTVQSPWAGVLYLNYAIINPAEAYPALRKVQMDDGQTRSYSLYLTATRPHFFRRSLLAALARHGSTRRPSLPSSGDDDKHEDGFLLRFRRLNPFNLKHRIY\nResult: Hydrolysis of (1->3)-beta-D-glucosidic linkages in (1->3)-beta-D-glucans."} {"text":"Task: Predict a biochemical reaction that can be catalyzed by this polypeptide.\nSequence: MSLTSRYTHFVPDSTITEILNDSNTPQILLHYANIVNGSTPVHFTSHHDNQVNWTVATLTRMSQYMIPDFMKLFPPLEPTLSLQPDCHCSFINLPRPEIKIPIEILSPPKPNYAKYHYDATTSRVFVNSKHEMYMDNFDVSQLIRDVAAIKTDSPSGNITKGLLKTFHDSIKLRALSPIMSMFHSLLSYRCPCCTSLNGMKKLNHLCFQYSSIYAFLCDMVRPYMCVPFFVDRLGVQILPGFKVSSQYPLLFFEAIELMHTVGLGNLSDSLSGWCFYTWLDRARIGVFREMFNRRGSITMLKSRVVSTGTIFRFSQREFVIESITEQRSTDISPTFEECSFSDSQYIQDNCYKPIYDITTTLDDVKCRWLDVALNYFYGAVLYVTGPVSLALEQSGMGRPGSLNLQFGGTTDVYVEGRWITIDVEPVSPFVSRIKQLADRELAKTKVNGDSLEHGFFEAQTTNSAGNTKETLAGLRSEIIEQHDSPQEGRLLASMAGIRVIDAMRRFNTTFRDHTEFLNEVRRPTKAGMRYQQQRRPRVIQMTGTEAQLGGWLLLNVYEPTYKRLGYTSSGKNIGDIRDMQAVLEASGQNGINSSVDIIGMDASTQNTHVTLLGSAAIKAYNPERIGFPKMFFQSTHNGGDANSRVLPTRVTRDGQTIPKDDDVKYNLPQLAIIYSLHGMHGPTILYDGYFAPAVLTSQTVFRSGWYNTSSQHTMLGSLVLLSLEEDIRNGYKNPYDGAPERSLIAKHWHSIRIIGRVLGDDILLKAFGPPTLTPDELREVTAEVCAEFEHRMELLGFLCERAFSDVMCEFLKQKGFGGAPHMFPDRLVLYTSERGNQAMTNPTTMYRVCDALIIEFNSRSRNIFNTCVSRRVLQTVCSTFALRMTSSGHLVRRSYASRKPYSRVAKVSDGILSELHNHKTVFRIIDYNVLGDHIAMIFLPMLWATNHILGCPPPAIVSISGANIPAASPLTYPSAAITTFWLTATSRRKIDFDSSATAYKKSMSDISNLTAVPLDII
FSFSNAMELSPLSINLDKDYDIDTLRTFGFIVGIMSDSLFPTPSATRAKIKSPVVDDWSRYADSLLNPTRVRSSHHGSEILAESNVVVPYELRYAHRGTAKVRQSMYELPVTDLEYGENTMTTLTQLSESLKVKPGTSKLLRDAMLAGEVFVIPTTHPVTLPCPSFDAHGYGHIIPPNSLQSLLLTHLGLPVSSASYTSSFAKTILSDGKLPGSAEAYLSLYQETYKKGPSAVAYLKDAIGFSDSSMSALERLASNGLYGISGASFAYNPRGGFFFRFDQDNADRFGTSLSPSPTIRRLDIVHMMFTMLMYPTTMVSQNQWMMVRFGRSFSRLARR\nResult: a ribonucleoside 5'-triphosphate + RNA(n) -> diphosphate + RNA(n+1)"}", "/scratch/micpie/export/uniprot_reactions/valid_0-3.jsonl": "{"text":"User: Can you come up with a chemical reaction that can be catalyzed by the following AA sequence:\\nMSLEQKKGADIISKILQIQNSIGKTTSPSTLKTKLSEISRKEQENARIQSKLSDLQKKKIDIDNKLLKEKQNLIKEEILERKKLEVLTKKQQKDEIEHQKKLKREIDAIKASTQYITDVSISSYNNTIPETEPEYDLFISHASEDKEDFVRPLAETLQQLGVNVWYDEFTLKVGDSLRQKIDSGLRNSKYGTVVLSTDFIKKDWTNYELDGLVAREMNGHKMILPIWHKITKNDVLDYSPNLADKVALNTSVNSIEEIAHQLADVILNR\nAssistant: Sure, the reaction that can be catalyzed by the given AA sequence are:\\nH2O + NADP(+) -> ADP-D-ribose 2'-phosphate + H(+) + nicotinamide"} {"text":"User: Can you tell me a reaction that can be catalyzed by the following AA sequence:\\nMSTLLIPQDTIAHTFDEAVASESNLRIDEVPENYLERFIHPSEPENFEFYSLRDSDIPSKRIPKNGIQVFENLKYHTNSKDNLYKDQPSSGPSPMRGVANIIREYFPQYLDDLRTWCRPKSSDDSIFNDFNHEQRITQPFTEERERRLLPLIDHFLGIKPYDIVHYCDTRFYPWKLSTRADYFHNHSRDRKAHAAKSHPDFATGPTKKSYFINSHLFFDRSTVHNIKEYGFPFRPTTDSARNETLLDLWFKKVPTELLVRSHISKRDNLKVRPVYNAPMIYIRIECMLFYPLLAQARKRDCCIMYGLETIRGGMNELERISNAFNSFLLIDWSRFDHLAPFTISNFFFKKWLPTKILIDHGYAQISNYHDHVHSFSAQAQSHGIPMISKEYQTPPEATVFAKKVLNLISFLERWYRDMVFVTPDGFAYRRTHAGVPSGILMTQFIDSFVNLTILLDGLIEFGFTDEEIKQLLVFIMGDDNVIFTPWTLLKLIEFFDWFAKYTLDRFGMVINISKSAVTSIRRKIEVLGYTNNYGFPTRSISKLVGQLAYPERHVTDADMCMRAIGFAYASCAQSETFHALCKKVFQYYFAKTSINERLILKGRKAELPGMFFAYPDVSEHIRLDHFPSLSEVRILLSKFQGYLKETPFGTIPTFSTPQTLRDQTQ\nAssistant: Yes, sure, the reaction that can be catalyzed by the given AA sequence are:\\na ribonucleoside 5'-triphosphate + RNA(n) -> diphosphate + RNA(n+1)"}", "/scratch/micpie/export/bio_ner_37/train_0-0.jsonl": 
"{"text":"Task: Please carry out the NER task for the the text below.\nText: ICs were measured in gated CD4+ T-cell subsets including naïve CD4+ T cells (CD3+ CD8-CD4+ CD45RA+ CCR7+ CD27+), central memory CD4+ T cells (CD3+ CD8-CD4+ CD45RA-CCR7+ CD27+), transitional memory CD4+ T cells (CD3+ CD8-CD4+ CD45RA-CCR7-CD27+), effector memory CD4+ T cells (CD3+ CD8-CD4+ CD45RA-CCR7-CD27-) and terminally differentiated CD4+ T cells (CD3+ CD8-CD4+ CD45RA+ CCR7-CD27-)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: ICs,0,3,Gene\/Protein\nCD4,27,30,Gene\/Protein\nCD4,66,69,Gene\/Protein\nCD3,82,85,Gene\/Protein\nCD8,88,91,Gene\/Protein\nCD4,94,97,Gene\/Protein\nCD45RA,100,106,Gene\/Protein\nCCR7,109,113,Gene\/Protein\nCD27,116,120,Gene\/Protein\nCD4,140,143,Gene\/Protein\nCD3,156,159,Gene\/Protein\nCD8,162,165,Gene\/Protein\nCD4,168,171,Gene\/Protein\nCD45RA,174,180,Gene\/Protein\nCCR7,183,187,Gene\/Protein\nCD27,190,194,Gene\/Protein\nCD4,219,222,Gene\/Protein\nCD3,235,238,Gene\/Protein\nCD8,241,244,Gene\/Protein\nCD4,247,250,Gene\/Protein\nCD45RA,253,259,Gene\/Protein\nCCR7,262,266,Gene\/Protein\nCD27,269,273,Gene\/Protein\nCD4,294,297,Gene\/Protein\nCD3,310,313,Gene\/Protein\nCD8,316,319,Gene\/Protein\nCD4,322,325,Gene\/Protein\nCD45RA,328,334,Gene\/Protein\nCCR7,337,341,Gene\/Protein\nCD27,344,348,Gene\/Protein\nCD4,382,385,Gene\/Protein\nCD3,398,401,Gene\/Protein\nCD8,404,407,Gene\/Protein\nCD4,410,413,Gene\/Protein\nCD45RA,416,422,Gene\/Protein\nCCR7,425,429,Gene\/Protein\nCD27,432,436,Gene\/Protein"} {"text":"Task: Please carry out the NER task for the the text below.\nText: ICs were measured in gated CD4+ T-cell subsets including naïve CD4+ T cells (CD3+ CD8-CD4+ CD45RA+ CCR7+ CD27+), central memory CD4+ T cells (CD3+ CD8-CD4+ CD45RA-CCR7+ CD27+), transitional memory CD4+ T cells (CD3+ CD8-CD4+ CD45RA-CCR7-CD27+), effector memory CD4+ T cells (CD3+ 
CD8-CD4+ CD45RA-CCR7-CD27-) and terminally differentiated CD4+ T cells (CD3+ CD8-CD4+ CD45RA+ CCR7-CD27-)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: ICs,0,3,Gene\/Protein\nCD4,27,30,Gene\/Protein\nCD4,66,69,Gene\/Protein\nCD3,82,85,Gene\/Protein\nCD8,88,91,Gene\/Protein\nCD4,94,97,Gene\/Protein\nCD45RA,100,106,Gene\/Protein\nCCR7,109,113,Gene\/Protein\nCD27,116,120,Gene\/Protein\nCD4,140,143,Gene\/Protein\nCD3,156,159,Gene\/Protein\nCD8,162,165,Gene\/Protein\nCD4,168,171,Gene\/Protein\nCD45RA,174,180,Gene\/Protein\nCCR7,183,187,Gene\/Protein\nCD27,190,194,Gene\/Protein\nCD4,219,222,Gene\/Protein\nCD3,235,238,Gene\/Protein\nCD8,241,244,Gene\/Protein\nCD4,247,250,Gene\/Protein\nCD45RA,253,259,Gene\/Protein\nCCR7,262,266,Gene\/Protein\nCD27,269,273,Gene\/Protein\nCD4,294,297,Gene\/Protein\nCD3,310,313,Gene\/Protein\nCD8,316,319,Gene\/Protein\nCD4,322,325,Gene\/Protein\nCD45RA,328,334,Gene\/Protein\nCCR7,337,341,Gene\/Protein\nCD27,344,348,Gene\/Protein\nCD4,382,385,Gene\/Protein\nCD3,398,401,Gene\/Protein\nCD8,404,407,Gene\/Protein\nCD4,410,413,Gene\/Protein\nCD45RA,416,422,Gene\/Protein\nCCR7,425,429,Gene\/Protein\nCD27,432,436,Gene\/Protein"}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/test_0-10.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that is not inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: This is a molecule that is not inhibiting the activity of the serine\/threonine kinase, STK3: S(c1nc(N)c(cc1C(=O)Nc1ccccc1)C(=O)Nc1c(OC)cccc1)CC=C"} {"text":"User: I'm looking for the canonical SMILES of a molecule that is not inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: This is a molecule that is not inhibiting the activity of the serine\/threonine kinase, STK3: COc1ccc(C(=O)COC(=O)Cn2cnc3ccccc3c2=O)c(OC)c1"}", 
"/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/valid_0-8.jsonl": "{"text":"User: Is the molecule with the SELFIES [S][C][Branch2][Ring1][C][C][=N][NH1][C][=Branch1][C][=O][C][=C][Ring1][#Branch1][C][=C][C][=C][Ring1][=Branch1][=C][C][=C][Ring1][S] inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: No, it is not inhibiting the activity of the serine\/threonine kinase, STK3."} {"text":"User: Is the molecule with the DeepSMILES SCCC=O)nccnc95))cccccc6)))))))cccccc6))))))))))cccccc6 inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: No, it is not inhibiting the activity of the serine\/threonine kinase, STK3."}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/train_0-8.jsonl": "{"text":"User: Is the molecule with the SELFIES [F][C][=C][C][=C][Branch2][Ring1][S][C][C][C][Branch1][=N][O][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][Branch1][C][O][=O][O][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][O][C][=C][Ring2][Ring1][=Branch2] inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: No, it is not inhibiting the activity of the serine\/threonine kinase, STK3."} {"text":"User: Is the molecule with the DeepSMILES OCCNCCNCC6))Ccccccc6))))))))))))COcccccc6))C=O)C inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: No, it is not inhibiting the activity of the serine\/threonine kinase, STK3."}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of the serine\/threonine kinase, STK3.\nMolecule SMILES: S(c1nc(N)c(cc1C(=O)Nc1ccccc1)C(=O)Nc1c(OC)cccc1)CC=C\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting the activity of the serine\/threonine kinase, STK3."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule 
that is inhibiting the activity of the serine\/threonine kinase, STK3.\ncanonical SMILES: COc1ccc(C(=O)COC(=O)Cn2cnc3ccccc3c2=O)c(OC)c1\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not inhibiting the activity of the serine\/threonine kinase, STK3."}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/valid_0-9.jsonl": "{"text":"User: Can you generate the DeepSMILES of a molecule that is not inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: Yes, I'm happy to help, here you go: sccn[nH]c=O)cc6cccc6))))))))))ccc5"} {"text":"User: Can you generate the SELFIES of a molecule that is not inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: Of course, here you go: [S][C][Branch2][Ring2][Ring2][C][C][=Branch1][C][=O][N][C][=Branch2][Ring1][C][=C][Branch1][#Branch1][N][=C][Ring1][#Branch2][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]"}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/test_0-1.jsonl": "{"text":"Based on the SELFIES representation [S][Branch2][Ring2][=N][C][=N][C][Branch1][C][N][=C][Branch2][Ring1][C][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][=C][Branch1][Ring1][O][C][C][=C][C][=C][Ring1][Branch2][C][C][=C], the molecule is not a serine\/threonine kinase, STK3 inhibitor."} {"text":"Based on the InChI InChI=1S\/C20H18N2O6\/c1-26-13-7-8-15(18(9-13)27-2)17(23)11-28-19(24)10-22-12-21-16-6-4-3-5-14(16)20(22)25\/h3-9,12H,10-11H2,1-2H3, the molecule is not a serine\/threonine kinase, STK3 inhibitor."}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/valid_0-0.jsonl": "{"text":"The molecule with the SELFIES representation of [S][C][Branch2][Ring1][C][C][=N][NH1][C][=Branch1][C][=O][C][=C][Ring1][#Branch1][C][=C][C][=C][Ring1][=Branch1][=C][C][=C][Ring1][S] is not exhibiting activity against 
any of the following: serine kinase \/threonine kinase."} {"text":"The molecule with the SELFIES [S][C][Branch2][Ring2][Ring2][C][C][=Branch1][C][=O][N][C][=Branch2][Ring1][C][=C][Branch1][#Branch1][N][=C][Ring1][#Branch2][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1] is not displaying activity against any of the following: serine kinase \/threonine kinase."}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/test_0-2.jsonl": "{"text":"The SMILES S(c1nc(N)c(cc1C(=O)Nc1ccccc1)C(=O)Nc1c(OC)cccc1)CC=C is from a molecule that is not a serine\/threonine kinase, STK3 inhibitor."} {"text":"The DeepSMILES OC=O)Cnc=O)ccnc6))cccc6))))))))))CC=O)ccOC))ccOC))cc6 represents a molecule that is not a serine\/threonine kinase, STK3 inhibitor."}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/valid_0-10.jsonl": "{"text":"User: I'm searching for the InChI of a molecule that is not inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: This is a molecule that is not inhibiting the activity of the serine\/threonine kinase, STK3: InChI=1S\/C12H8N2OS\/c15-12-9-5-2-1-4-8(9)11(13-14-12)10-6-3-7-16-10\/h1-7H,(H,14,15)"} {"text":"User: I'm searching for the SMILES of a molecule that is not inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: This is a molecule that is not inhibiting the activity of the serine\/threonine kinase, STK3: S1C(CC(=O)n2c(c(nc12)c1ccccc1)c1ccccc1)c1ccccc1"}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/train_0-6.jsonl": "{"text":"Task: Please generate a molecule canonical SMILES based on the text description.\nDescription: A molecule that is inhibiting the activity of the serine\/threonine kinase, STK3.\nResult: O=C(O)C1=CC(c2ccc(F)cc2)CC(OCc2ccc(CO)cc2)O1"} {"text":"Task: Please give me a molecule canonical SMILES based on the description.\nDescription: A molecule that is inhibiting the 
activity of the serine\/threonine kinase, STK3.\nResult: CC(=O)c1ccc(OCC(O)CN2CCN(Cc3ccccc3)CC2)cc1"}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/valid_0-6.jsonl": "{"text":"Task: Please generate a molecule InChI based on the description.\nDescription: A molecule that is inhibiting the activity of the serine\/threonine kinase, STK3.\nResult: InChI=1S\/C12H8N2OS\/c15-12-9-5-2-1-4-8(9)11(13-14-12)10-6-3-7-16-10\/h1-7H,(H,14,15)"} {"text":"Task: Please give me a SELFIES based on the text description.\nDescription: A molecule that is inhibiting the activity of the serine\/threonine kinase, STK3.\nResult: [S][C][Branch2][Ring2][Ring2][C][C][=Branch1][C][=O][N][C][=Branch2][Ring1][C][=C][Branch1][#Branch1][N][=C][Ring1][#Branch2][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]"}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/test_0-9.jsonl": "{"text":"User: Can you create the DeepSMILES of a molecule that is not inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: Yes, I'm happy to help, here you go: ScncN)ccc6C=O)Ncccccc6))))))))))C=O)NccOC))cccc6))))))))))))CC=C"} {"text":"User: Can you give me the SMILES of a molecule that is not inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: Of course, here you go: O(C(=O)Cn1c(=O)c2c(nc1)cccc2)CC(=O)c1c(OC)cc(OC)cc1"}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/test_0-0.jsonl": "{"text":"The molecule with the SELFIES [S][Branch2][Ring2][=N][C][=N][C][Branch1][C][N][=C][Branch2][Ring1][C][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][=C][Branch1][Ring1][O][C][C][=C][C][=C][Ring1][Branch2][C][C][=C] is not showing activity against any of the following: serine kinase \/threonine kinase."} {"text":"The molecule with the SMILES representation of 
O(C(=O)Cn1c(=O)c2c(nc1)cccc2)CC(=O)c1c(OC)cc(OC)cc1 is not displaying activity against any of the following: serine kinase \/threonine kinase."}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/valid_0-7.jsonl": "{"text":"User: Can you derive if the molecule with the DeepSMILES sccn[nH]c=O)cc6cccc6))))))))))ccc5 is inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: No, this molecule is not inhibiting the activity of the serine\/threonine kinase, STK3."} {"text":"User: Can you derive if the molecule with the SELFIES [S][C][Branch2][Ring2][Ring2][C][C][=Branch1][C][=O][N][C][=Branch2][Ring1][C][=C][Branch1][#Branch1][N][=C][Ring1][#Branch2][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1] is inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: No, this molecule is not inhibiting the activity of the serine\/threonine kinase, STK3."}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/test_0-3.jsonl": "{"text":"The SMILES S(c1nc(N)c(cc1C(=O)Nc1ccccc1)C(=O)Nc1c(OC)cccc1)CC=C is not inhibiting the activity of the serine\/threonine kinase, STK3."} {"text":"The molecule InChI InChI=1S\/C20H18N2O6\/c1-26-13-7-8-15(18(9-13)27-2)17(23)11-28-19(24)10-22-12-21-16-6-4-3-5-14(16)20(22)25\/h3-9,12H,10-11H2,1-2H3 is not inhibiting the activity of the serine\/threonine kinase, STK3."}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/valid_0-11.jsonl": "{"text":"User: I want to create a InChI.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should not be inhibiting the activity of the serine\/threonine kinase, STK3.\nAssistant: Got it, here you go, this InChI is not inhibiting the activity of the serine\/threonine kinase, STK3: InChI=1S\/C12H8N2OS\/c15-12-9-5-2-1-4-8(9)11(13-14-12)10-6-3-7-16-10\/h1-7H,(H,14,15)"} {"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be inhibiting the activity of the serine\/threonine kinase, STK3.\nAssistant: Ok, this canonical SMILES is not inhibiting the activity of the serine\/threonine kinase, STK3: O=C1CC(c2ccccc2)Sc2nc(-c3ccccc3)c(-c3ccccc3)n21"}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/train_0-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of O=C(O)C1=CC(c2ccc(F)cc2)CC(OCc2ccc(CO)cc2)O1 is not showing activity against any of the following: serine kinase \/threonine kinase."} {"text":"The molecule with the SELFIES representation of [O][C][Branch2][Ring1][Branch1][C][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C] is not exhibiting activity against any of the following: serine kinase \/threonine kinase."}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/test_0-6.jsonl": "{"text":"Task: Please create a canonical SMILES based on the text description.\nDescription: A molecule that is inhibiting the activity of the serine\/threonine kinase, STK3.\nResult: C=CCSc1nc(N)c(C(=O)Nc2ccccc2OC)cc1C(=O)Nc1ccccc1"} {"text":"Task: Please give me a InChI based on the text description below.\nDescription: A molecule that is inhibiting the activity of the serine\/threonine kinase, STK3.\nResult: InChI=1S\/C20H18N2O6\/c1-26-13-7-8-15(18(9-13)27-2)17(23)11-28-19(24)10-22-12-21-16-6-4-3-5-14(16)20(22)25\/h3-9,12H,10-11H2,1-2H3"}", 
"/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/train_0-10.jsonl": "{"text":"User: I'm searching for the DeepSMILES of a molecule that is not inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: This is a molecule that is not inhibiting the activity of the serine\/threonine kinase, STK3: FccccCCCOC=C6)CO)=O))))OCcccccc6))CO)))))))))))cc6"} {"text":"User: I'm looking for the SELFIES of a molecule that is not inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: This is a molecule that is not inhibiting the activity of the serine\/threonine kinase, STK3: [O][C][Branch2][Ring1][Branch1][C][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C]"}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/train_0-3.jsonl": "{"text":"The molecule canonical SMILES O=C(O)C1=CC(c2ccc(F)cc2)CC(OCc2ccc(CO)cc2)O1 is not inhibiting the activity of the serine\/threonine kinase, STK3."} {"text":"The InChI InChI=1S\/C22H28N2O3\/c1-18(25)20-7-9-22(10-8-20)27-17-21(26)16-24-13-11-23(12-14-24)15-19-5-3-2-4-6-19\/h2-10,21,26H,11-17H2,1H3 is not inhibiting the activity of the serine\/threonine kinase, STK3."}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/train_0-12.jsonl": "{"text":"User: I want to create a InChI.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be inhibiting the activity of the serine\/threonine kinase, STK3.\nAssistant: Understood, this InChI is not inhibiting the activity of the serine\/threonine kinase, STK3: InChI=1S\/C20H19FO5\/c21-17-7-5-15(6-8-17)16-9-18(20(23)24)26-19(10-16)25-12-14-3-1-13(11-22)2-4-14\/h1-9,16,19,22H,10-12H2,(H,23,24)"} {"text":"User: I want to generate a molecule InChI.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should not be inhibiting the activity of the serine\/threonine kinase, STK3.\nAssistant: Understood, this InChI is not inhibiting the activity of the serine\/threonine kinase, STK3: InChI=1S\/C22H28N2O3\/c1-18(25)20-7-9-22(10-8-20)27-17-21(26)16-24-13-11-23(12-14-24)15-19-5-3-2-4-6-19\/h2-10,21,26H,11-17H2,1H3"}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the activity of the serine\/threonine kinase, STK3?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any additional words.\nOptions:\n1: InChI=1S\/C23H22N4O3S\/c1-3-13-31-23-17(22(29)25-15-9-5-4-6-10-15)14-16(20(24)27-23)21(28)26-18-11-7-8-12-19(18)30-2\/h3-12,14H,1,13H2,2H3,(H2,24,27)(H,25,29)(H,26,28)\n2: InChI=1S\/C19H15F3N4O4S2\/c20-19(21,22)13-8-4-5-9-14(13)23-17(28)11-31-18-24-15(10-16(27)25-18)26-32(29,30)12-6-2-1-3-7-12\/h1-10H,11H2,(H,23,28)(H2,24,25,26,27)\n3: InChI=1S\/C18H23N5OS\/c1-18(2)8-11-12(10-22(18)3)16(23-4-6-24-7-5-23)21-17-14(11)15(20)13(9-19)25-17\/h4-8,10,20H2,1-3H3\nAnswer: 1, 2, 3"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the activity of the serine\/threonine kinase, STK3?\nConstraint: You must select none, one or more options from 1 or 2 without using any other words.\nOptions:\n1. ClccScnccN)cc6)))))))cccc6\n2. 
OC=O)Cnc=O)ccnc6))cccc6))))))))))CC=O)ccOC))ccOC))cc6\nAnswer: 1, 2"}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/valid_0-2.jsonl": "{"text":"The DeepSMILES sccn[nH]c=O)cc6cccc6))))))))))ccc5 represents a molecule that is not a serine\/threonine kinase, STK3 inhibitor."} {"text":"The DeepSMILES SCCC=O)nccnc95))cccccc6)))))))cccccc6))))))))))cccccc6 represents a molecule that is not a serine\/threonine kinase, STK3 inhibitor."}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/valid_0-1.jsonl": "{"text":"Based on the canonical SMILES O=c1[nH]nc(-c2cccs2)c2ccccc12, the molecule is not a serine\/threonine kinase, STK3 inhibitor."} {"text":"Based on the SELFIES [S][C][Branch2][Ring2][Ring2][C][C][=Branch1][C][=O][N][C][=Branch2][Ring1][C][=C][Branch1][#Branch1][N][=C][Ring1][#Branch2][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1], the molecule is not a serine\/threonine kinase, STK3 inhibitor."}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the activity of the serine\/threonine kinase, STK3?\nConstraint: You must select none, one or more options from A, B, or C without using any additional words.\nOptions:\nA) sccn[nH]c=O)cc6cccc6))))))))))ccc5\nB) ClcncnccccCCCc5nc9sc%16%12)))))))))ccccOC))cc6))))))))))C\nC) O=CNnc=O)cccc6)COC))=O)))cccc6)))))))))CCCCCC)C)C))CC6\nAnswer: A, B, C"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the activity of the serine\/threonine kinase, STK3?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any additional words.\nOptions:\nA 
[N][C][Branch2][Ring1][#Branch1][N][Branch1][O][C][=C][Ring1][Branch1][C][=C][C][=C][Ring1][=Branch1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=N][C][C][Ring2][Ring1][Ring1]\nB [S][Branch2][Ring1][Branch1][C][N][Branch1][S][C][=Branch1][Branch1][=N][N][=Ring1][Branch1][C][C][S][C][=C][C][=Ring1][Branch1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2]\nC [S][=Branch1][C][=O][=Branch1][C][=O][Branch2][Ring2][S][C][=C][C][Branch2][Ring2][Ring2][N][C][=Branch1][C][=O][C][=C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][=C][Branch1][Ring2][O][C][C][N][=C][C][=C][Ring1][=Branch2][=C][C][=C][Ring2][Ring1][C][=C][C][=C][Ring2][Ring1][O][C][C][O]\nD [Cl][C][=C][C][S][C][Branch2][Ring1][=Branch1][N][C][C][C][N][Branch1][Branch1][C][C][Ring1][#Branch1][C][=Branch1][C][=O][N][C][Branch1][C][C][C][=N][C][=Ring2][Ring1][C][C][=C][Ring2][Ring1][=Branch1]\nE [S][C][Branch2][Ring2][Ring2][C][C][=Branch1][C][=O][N][C][=Branch2][Ring1][C][=C][Branch1][#Branch1][N][=C][Ring1][#Branch2][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]\nAnswer: A, B, C, D, E"}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of the serine\/threonine kinase, STK3.\ncanonical SMILES: O=c1[nH]nc(-c2cccs2)c2ccccc12\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting the activity of the serine\/threonine kinase, STK3."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of the serine\/threonine kinase, STK3.\nMolecule SELFIES: 
[S][C][Branch2][Ring2][Ring2][C][C][=Branch1][C][=O][N][C][=Branch2][Ring1][C][=C][Branch1][#Branch1][N][=C][Ring1][#Branch2][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting the activity of the serine\/threonine kinase, STK3."}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of the serine\/threonine kinase, STK3.\nMolecule DeepSMILES: sccn[nH]c=O)cc6cccc6))))))))))ccc5\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of the serine\/threonine kinase, STK3.\ncanonical SMILES: O=C1CC(c2ccccc2)Sc2nc(-c3ccccc3)c(-c3ccccc3)n21\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any words.\nResult: False"}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of the serine\/threonine kinase, STK3.\nMolecule DeepSMILES: FccccCCCOC=C6)CO)=O))))OCcccccc6))CO)))))))))))cc6\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting the activity of the serine\/threonine kinase, STK3."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of the serine\/threonine kinase, STK3.\nSMILES: OC(CN1CCN(CC1)Cc1ccccc1)COc1ccc(cc1)C(=O)C\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not inhibiting the activity of the 
serine\/threonine kinase, STK3."}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/valid_0-12.jsonl": "{"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be inhibiting the activity of the serine\/threonine kinase, STK3.\nAssistant: Understood, this InChI is not inhibiting the activity of the serine\/threonine kinase, STK3: InChI=1S\/C12H8N2OS\/c15-12-9-5-2-1-4-8(9)11(13-14-12)10-6-3-7-16-10\/h1-7H,(H,14,15)"} {"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be inhibiting the activity of the serine\/threonine kinase, STK3.\nAssistant: Ok, this SMILES is not inhibiting the activity of the serine\/threonine kinase, STK3: S1C(CC(=O)n2c(c(nc12)c1ccccc1)c1ccccc1)c1ccccc1"}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/train_0-2.jsonl": "{"text":"The SMILES Fc1ccc(C2CC(OC(=C2)C(O)=O)OCc2ccc(cc2)CO)cc1 represents a molecule that is not a serine\/threonine kinase, STK3 inhibitor."} {"text":"The InChI InChI=1S\/C22H28N2O3\/c1-18(25)20-7-9-22(10-8-20)27-17-21(26)16-24-13-11-23(12-14-24)15-19-5-3-2-4-6-19\/h2-10,21,26H,11-17H2,1H3 is from a molecule that is not a serine\/threonine kinase, STK3 inhibitor."}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/test_0-11.jsonl": "{"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be inhibiting the activity of the serine\/threonine kinase, STK3.\nAssistant: Ok, this SMILES is not inhibiting the activity of the serine\/threonine kinase, STK3: S(c1nc(N)c(cc1C(=O)Nc1ccccc1)C(=O)Nc1c(OC)cccc1)CC=C"} {"text":"User: I want to create a SMILES.\nAssistant: This sounds very exciting. 
Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be inhibiting the activity of the serine\/threonine kinase, STK3.\nAssistant: Ok, this SMILES is not inhibiting the activity of the serine\/threonine kinase, STK3: O(C(=O)Cn1c(=O)c2c(nc1)cccc2)CC(=O)c1c(OC)cc(OC)cc1"}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/train_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the canonical SMILES O=C(O)C1=CC(c2ccc(F)cc2)CC(OCc2ccc(CO)cc2)O1 is inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: No, this molecule is not inhibiting the activity of the serine\/threonine kinase, STK3."} {"text":"User: Can you derive if the molecule with the canonical SMILES CC(=O)c1ccc(OCC(O)CN2CCN(Cc3ccccc3)CC2)cc1 is inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: No, this molecule is not inhibiting the activity of the serine\/threonine kinase, STK3."}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/train_0-11.jsonl": "{"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be inhibiting the activity of the serine\/threonine kinase, STK3.\nAssistant: Ok, here you go, this DeepSMILES is not inhibiting the activity of the serine\/threonine kinase, STK3: FccccCCCOC=C6)CO)=O))))OCcccccc6))CO)))))))))))cc6"} {"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should not be inhibiting the activity of the serine\/threonine kinase, STK3.\nAssistant: Ok, this canonical SMILES is not inhibiting the activity of the serine\/threonine kinase, STK3: CC(=O)c1ccc(OCC(O)CN2CCN(Cc3ccccc3)CC2)cc1"}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/train_0-1.jsonl": "{"text":"Based on the InChI representation InChI=1S\/C20H19FO5\/c21-17-7-5-15(6-8-17)16-9-18(20(23)24)26-19(10-16)25-12-14-3-1-13(11-22)2-4-14\/h1-9,16,19,22H,10-12H2,(H,23,24), the molecule is not a serine\/threonine kinase, STK3 inhibitor."} {"text":"Based on the InChI InChI=1S\/C22H28N2O3\/c1-18(25)20-7-9-22(10-8-20)27-17-21(26)16-24-13-11-23(12-14-24)15-19-5-3-2-4-6-19\/h2-10,21,26H,11-17H2,1H3, the molecule is not a serine\/threonine kinase, STK3 inhibitor."}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the activity of the serine\/threonine kinase, STK3?\nConstraint: You must select none, one or more options from A, B, C, or D without using any additional words.\nOptions:\nA) Cc1ccccc1NC(=S)N\/N=C1\/C(=O)N(C)c2ccc(S(=O)(=O)N3CCOCC3)cc21\nB) CC1Cc2cc(S(=O)(=O)N3CCC(C(=O)N(C)Cc4ccccc4)CC3)ccc2N1C(=O)C1CC1\nC) Cc1ccc(-c2nnc(Sc3nc4ccccc4o3)c3ccccc23)cc1\nD) O=C(O)C1=CC(c2ccc(F)cc2)CC(OCc2ccc(CO)cc2)O1\nAnswer: A, B, C, D"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the activity of the serine\/threonine kinase, STK3?\nConstraint: You must select none, one or more options from a or b without using any other words.\nOptions:\na. InChI=1S\/C28H30ClNO6\/c1-28(2)25(34-3)23(31)24(32)27(36-28)35-18-13-14-19(20-11-7-8-12-22(20)29)21(15-18)26(33)30-16-17-9-5-4-6-10-17\/h4-15,23-25,27,31-32H,16H2,1-3H3,(H,30,33)\nb. 
InChI=1S\/C22H28N2O3\/c1-18(25)20-7-9-22(10-8-20)27-17-21(26)16-24-13-11-23(12-14-24)15-19-5-3-2-4-6-19\/h2-10,21,26H,11-17H2,1H3\nAnswer: a, b"}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of the serine\/threonine kinase, STK3.\ncanonical SMILES: O=C(O)C1=CC(c2ccc(F)cc2)CC(OCc2ccc(CO)cc2)O1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of the serine\/threonine kinase, STK3.\nMolecule SMILES: OC(CN1CCN(CC1)Cc1ccccc1)COc1ccc(cc1)C(=O)C\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/test_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the DeepSMILES ScncN)ccc6C=O)Ncccccc6))))))))))C=O)NccOC))cccc6))))))))))))CC=C is inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: No, this molecule is not inhibiting the activity of the serine\/threonine kinase, STK3."} {"text":"User: Can you tell me if the molecule with the SMILES O(C(=O)Cn1c(=O)c2c(nc1)cccc2)CC(=O)c1c(OC)cc(OC)cc1 is inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: No, this molecule is not inhibiting the activity of the serine\/threonine kinase, STK3."}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/train_0-9.jsonl": "{"text":"User: Can you generate the SMILES of a molecule that is not inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: Of course, here you go: Fc1ccc(C2CC(OC(=C2)C(O)=O)OCc2ccc(cc2)CO)cc1"} {"text":"User: Can you create the DeepSMILES of a molecule that is not inhibiting the activity of 
the serine\/threonine kinase, STK3?\nAssistant: Yes, here you go: OCCNCCNCC6))Ccccccc6))))))))))))COcccccc6))C=O)C"}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/valid_0-3.jsonl": "{"text":"The molecule canonical SMILES O=c1[nH]nc(-c2cccs2)c2ccccc12 is not inhibiting the activity of the serine\/threonine kinase, STK3."} {"text":"The molecule SELFIES [S][C][Branch2][Ring2][Ring2][C][C][=Branch1][C][=O][N][C][=Branch2][Ring1][C][=C][Branch1][#Branch1][N][=C][Ring1][#Branch2][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1] is not inhibiting the activity of the serine\/threonine kinase, STK3."}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/test_0-8.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C23H22N4O3S\/c1-3-13-31-23-17(22(29)25-15-9-5-4-6-10-15)14-16(20(24)27-23)21(28)26-18-11-7-8-12-19(18)30-2\/h3-12,14H,1,13H2,2H3,(H2,24,27)(H,25,29)(H,26,28) inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: No, it is not inhibiting the activity of the serine\/threonine kinase, STK3."} {"text":"User: Is the molecule with the canonical SMILES COc1ccc(C(=O)COC(=O)Cn2cnc3ccccc3c2=O)c(OC)c1 inhibiting the activity of the serine\/threonine kinase, STK3?\nAssistant: No, it is not inhibiting the activity of the serine\/threonine kinase, STK3."}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of the serine\/threonine kinase, STK3.\nSELFIES: [S][Branch2][Ring2][=N][C][=N][C][Branch1][C][N][=C][Branch2][Ring1][C][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][=C][Branch1][Ring1][O][C][C][=C][C][=C][Ring1][Branch2][C][C][=C]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" 
without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of the serine\/threonine kinase, STK3.\nMolecule DeepSMILES: OC=O)Cnc=O)ccnc6))cccc6))))))))))CC=O)ccOC))ccOC))cc6\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/serine_threonine_kinase_33_butkiewicz/test_0-12.jsonl": "{"text":"User: I want to create a molecule SELFIES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be inhibiting the activity of the serine\/threonine kinase, STK3.\nAssistant: Got it, this SELFIES is not inhibiting the activity of the serine\/threonine kinase, STK3: [S][Branch2][Ring2][=N][C][=N][C][Branch1][C][N][=C][Branch2][Ring1][C][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][=C][Branch1][Ring1][O][C][C][=C][C][=C][Ring1][Branch2][C][C][=C]"} {"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be inhibiting the activity of the serine\/threonine kinase, STK3.\nAssistant: Ok, this canonical SMILES is not inhibiting the activity of the serine\/threonine kinase, STK3: COc1ccc(C(=O)COC(=O)Cn2cnc3ccccc3c2=O)c(OC)c1"}", "/scratch/micpie/export/freesolv/train_0-17.jsonl": "{"text":"User: Can you estimate the hydration free energy computed using the GAFF force field in kcal\/mol of the molecule with the DeepSMILES CCCCO?\nAssistant: Sure, this molecule has a hydration free energy computed using the GAFF force field of -3.230 kcal\/mol."} {"text":"User: Can you estimate the hydration free energy computed using the GAFF force field in kcal\/mol of the molecule with the InChI InChI=1S\/C10H18O\/c1-9(2)5-4-6-10(3)7-8-11\/h5,7,11H,4,6,8H2,1-3H3\/b10-7-?\nAssistant: Yes, I'm happy to help, this molecule has a hydration free energy computed using the GAFF force field of -2.600 kcal\/mol."}", "/scratch/micpie/export/freesolv/train_0-16.jsonl": "{"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should have a hydration free energy of -4.720 kcal\/mol.\nAssistant: Got it, this canonical SMILES represents a molecule that has a hydration free energy of -4.720 kcal\/mol: CCCCO"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should have a hydration free energy of -4.780 kcal\/mol.\nAssistant: Got it, this InChI represents a molecule that has a hydration free energy of -4.780 kcal\/mol: InChI=1S\/C10H18O\/c1-9(2)5-4-6-10(3)7-8-11\/h5,7,11H,4,6,8H2,1-3H3\/b10-7-"}", "/scratch/micpie/export/freesolv/test_0-10.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hydration free energy computed using the GAFF force field in kcal\/mol.\nSMILES: c1cc(c(cc1c2cc(c(c(c2Cl)Cl)Cl)Cl)Cl)Cl\nConstraint: Even if you are not sure, you must answer with a numeric value in kcal\/mol without using any other words.\nResult: -1.080 kcal\/mol"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hydration free energy computed using the GAFF force field in kcal\/mol.\nSMILES: CCBr\nConstraint: Even if you are not sure, you must answer with a numeric value in kcal\/mol without using any additional words.\nResult: 0.490 kcal\/mol"}", "/scratch/micpie/export/freesolv/valid_0-8.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hydration free energy in kcal\/mol.\nMolecule canonical SMILES: CCCCCC(=O)OC\nConstraint: Even if you are uncertain, you must answer with a numeric value in kcal\/mol without using any other words.\nResult: -2.490 kcal\/mol"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hydration free energy in kcal\/mol.\nDeepSMILES: C[C@H]CF)F)F))O\nConstraint: Even if you are not sure, you must answer with a numeric value in kcal\/mol without using any other words.\nResult: -4.200 kcal\/mol"}", "/scratch/micpie/export/freesolv/test_0-16.jsonl": "{"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should have a hydration free energy of -3.040 kcal\/mol.\nAssistant: Ok, this DeepSMILES represents a molecule that has a hydration free energy of -3.040 kcal\/mol: cccccc6cccccc6Cl))Cl))Cl))Cl))))))Cl))Cl"} {"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should have a hydration free energy of -0.740 kcal\/mol.\nAssistant: Got it, this DeepSMILES represents a molecule that has a hydration free energy of -0.740 kcal\/mol: CCBr"}", "/scratch/micpie/export/freesolv/test_0-15.jsonl": "{"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should have a hydration free energy of -3.040 kcal\/mol.\nAssistant: Ok, here you go, this DeepSMILES represents a molecule that has a hydration free energy of -3.040 kcal\/mol: cccccc6cccccc6Cl))Cl))Cl))Cl))))))Cl))Cl"} {"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should have a hydration free energy of -0.740 kcal\/mol.\nAssistant: Ok, this canonical SMILES represents a molecule that has a hydration free energy of -0.740 kcal\/mol: CCBr"}", "/scratch/micpie/export/freesolv/train_0-8.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hydration free energy in kcal\/mol.\ncanonical SMILES: CCCCO\nConstraint: Even if you are uncertain, you must answer with a numeric value in kcal\/mol without using any other words.\nResult: -4.720 kcal\/mol"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hydration free energy in kcal\/mol.\ncanonical SMILES: CC(C)=CCC\/C(C)=C\\CO\nConstraint: Even if you are not sure, you must answer with a numeric value in kcal\/mol without using any additional words.\nResult: -4.780 kcal\/mol"}", "/scratch/micpie/export/freesolv/test_0-5.jsonl": "{"text":"Based on the SELFIES [C][=C][C][=Branch2][Ring1][N][=C][Branch2][Ring1][#Branch1][C][=C][Ring1][=Branch1][C][=C][C][=Branch1][=N][=C][Branch1][=Branch2][C][=Branch1][Branch1][=C][Ring1][=Branch1][Cl][Cl][Cl][Cl][Cl][Cl], the molecule has a hydration free energy computed using the GAFF force field of -1.080 kcal\/mol."} {"text":"Based on the canonical SMILES representation of CCBr, the molecule has a hydration free energy computed using the GAFF force field of 0.490 kcal\/mol."}", "/scratch/micpie/export/freesolv/valid_0-9.jsonl": "{"text":"Task: Please give me a molecule SELFIES based on the text description.\nDescription: A molecule that has hydration free energy of -2.490 kcal\/mol.\nResult: [C][C][C][C][C][C][=Branch1][C][=O][O][C]"} {"text":"Task: Please create a SMILES based on the text description below.\nDescription: A molecule that has hydration free energy of -4.200 kcal\/mol.\nResult: C[C@H](C(F)(F)F)O"}", "/scratch/micpie/export/freesolv/test_0-19.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that has a 
hydration free energy computed using the GAFF force field of -1.080 kcal\/mol.\nAssistant: This is a molecule that has a hydration free energy computed using the GAFF force field of -1.080 kcal\/mol: c1cc(c(cc1c2cc(c(c(c2Cl)Cl)Cl)Cl)Cl)Cl"} {"text":"User: I'm searching for the InChI of a molecule that has a hydration free energy computed using the GAFF force field of 0.490 kcal\/mol.\nAssistant: This is a molecule that has a hydration free energy computed using the GAFF force field of 0.490 kcal\/mol: InChI=1S\/C2H5Br\/c1-2-3\/h2H2,1H3"}", "/scratch/micpie/export/freesolv/test_0-1.jsonl": "{"text":"Based on the SELFIES [C][=C][C][=Branch2][Ring1][N][=C][Branch2][Ring1][#Branch1][C][=C][Ring1][=Branch1][C][=C][C][=Branch1][=N][=C][Branch1][=Branch2][C][=Branch1][Branch1][=C][Ring1][=Branch1][Cl][Cl][Cl][Cl][Cl][Cl], the molecule has a hydration free energy of -3.040 kcal\/mol."} {"text":"Based on the canonical SMILES representation of CCBr, the molecule has a hydration free energy of -0.740 kcal\/mol."}", "/scratch/micpie/export/freesolv/test_0-18.jsonl": "{"text":"User: Can you generate the DeepSMILES of a molecule that has a hydration free energy computed using the GAFF force field of -1.080 kcal\/mol?\nAssistant: Of course, here you go: cccccc6cccccc6Cl))Cl))Cl))Cl))))))Cl))Cl"} {"text":"User: Can you give me the DeepSMILES of a molecule that has a hydration free energy computed using the GAFF force field of 0.490 kcal\/mol?\nAssistant: Yes, I'm happy to help, here you go: CCBr"}", "/scratch/micpie/export/freesolv/valid_0-0.jsonl": "{"text":"The molecule with the SMILES CCCCCC(=O)OC has a hydration free energy of -2.490 kcal\/mol."} {"text":"The molecule with the InChI InChI=1S\/C3H5F3O\/c1-2(7)3(4,5)6\/h2,7H,1H3\/t2-\/m1\/s1 has a hydration free energy of -4.200 kcal\/mol."}", "/scratch/micpie/export/freesolv/test_0-21.jsonl": "{"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should have a hydration free energy computed using the GAFF force field of -1.080 kcal\/mol.\nAssistant: Got it, this InChI represents a molecule that has a hydration free energy computed using the GAFF force field of -1.080 kcal\/mol: InChI=1S\/C12H4Cl6\/c13-7-2-1-5(3-8(7)14)6-4-9(15)11(17)12(18)10(6)16\/h1-4H"} {"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should have a hydration free energy computed using the GAFF force field of 0.490 kcal\/mol.\nAssistant: Got it, this SMILES represents a molecule that has a hydration free energy computed using the GAFF force field of 0.490 kcal\/mol: CCBr"}", "/scratch/micpie/export/freesolv/test_0-2.jsonl": "{"text":"The SMILES c1cc(c(cc1c2cc(c(c(c2Cl)Cl)Cl)Cl)Cl)Cl is representing a molecule with a hydration free energy of -3.040 kcal\/mol."} {"text":"The SMILES CCBr is representing a molecule with a hydration free energy of -0.740 kcal\/mol."}", "/scratch/micpie/export/freesolv/valid_0-10.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hydration free energy computed using the GAFF force field in kcal\/mol.\nMolecule DeepSMILES: CCCCCC=O)OC\nConstraint: Even if you are uncertain, you must answer with a numeric value in kcal\/mol without using any additional words.\nResult: -3.300 kcal\/mol"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hydration free energy computed using the GAFF force field in kcal\/mol.\nInChI: InChI=1S\/C3H5F3O\/c1-2(7)3(4,5)6\/h2,7H,1H3\/t2-\/m1\/s1\nConstraint: Even if you are uncertain, you must answer with a numeric value in kcal\/mol without using any additional words.\nResult: -3.490 kcal\/mol"}", "/scratch/micpie/export/freesolv/train_0-6.jsonl": "{"text":"The SMILES CCCCO represents a molecule with a hydration free 
energy computed using the GAFF force field of -3.230 kcal\/mol."} {"text":"The canonical SMILES CC(C)=CCC\/C(C)=C\\CO represents a molecule with a hydration free energy computed using the GAFF force field of -2.600 kcal\/mol."}", "/scratch/micpie/export/freesolv/valid_0-6.jsonl": "{"text":"The SELFIES [C][C][C][C][C][C][=Branch1][C][=O][O][C] is representing a molecule that has a hydration free energy computed using the GAFF force field of -3.300 kcal\/mol."} {"text":"The SMILES C[C@H](C(F)(F)F)O represents a molecule that has a hydration free energy computed using the GAFF force field of -3.490 kcal\/mol."}", "/scratch/micpie/export/freesolv/train_0-21.jsonl": "{"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should have a hydration free energy computed using the GAFF force field of -3.230 kcal\/mol.\nAssistant: Understood, this canonical SMILES represents a molecule that has a hydration free energy computed using the GAFF force field of -3.230 kcal\/mol: CCCCO"} {"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should have a hydration free energy computed using the GAFF force field of -2.600 kcal\/mol.\nAssistant: Ok, this SELFIES represents a molecule that has a hydration free energy computed using the GAFF force field of -2.600 kcal\/mol: [C][C][=Branch1][O][=C][C][C][\/C][=Branch1][Ring2][=C][\\C][O][\/C][C]"}", "/scratch/micpie/export/freesolv/train_0-19.jsonl": "{"text":"User: I'm searching for the SELFIES of a molecule that has a hydration free energy computed using the GAFF force field of -3.230 kcal\/mol.\nAssistant: This is a molecule that has a hydration free energy computed using the GAFF force field of -3.230 kcal\/mol: [C][C][C][C][O]"} {"text":"User: I'm searching for the InChI of a molecule that has a hydration free energy computed using the GAFF force field of -2.600 kcal\/mol.\nAssistant: This is a molecule that has a hydration free energy computed using the GAFF force field of -2.600 kcal\/mol: InChI=1S\/C10H18O\/c1-9(2)5-4-6-10(3)7-8-11\/h5,7,11H,4,6,8H2,1-3H3\/b10-7-"}", "/scratch/micpie/export/freesolv/test_0-9.jsonl": "{"text":"Task: Please give me a SMILES based on the description.\nDescription: A molecule that has hydration free energy of -3.040 kcal\/mol.\nResult: c1cc(c(cc1c2cc(c(c(c2Cl)Cl)Cl)Cl)Cl)Cl"} {"text":"Task: Please generate a InChI based on the description below.\nDescription: A molecule that has hydration free energy of -0.740 kcal\/mol.\nResult: InChI=1S\/C2H5Br\/c1-2-3\/h2H2,1H3"}", "/scratch/micpie/export/freesolv/test_0-0.jsonl": "{"text":"The molecule with the DeepSMILES cccccc6cccccc6Cl))Cl))Cl))Cl))))))Cl))Cl has a hydration free energy of -3.040 kcal\/mol."} {"text":"The molecule with the DeepSMILES CCBr has a hydration free energy of -0.740 kcal\/mol."}", "/scratch/micpie/export/freesolv/valid_0-16.jsonl": "{"text":"User: I want to generate a InChI.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should have a hydration free energy of -2.490 kcal\/mol.\nAssistant: Ok, this InChI represents a molecule that has a hydration free energy of -2.490 kcal\/mol: InChI=1S\/C7H14O2\/c1-3-4-5-6-7(8)9-2\/h3-6H2,1-2H3"} {"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should have a hydration free energy of -4.200 kcal\/mol.\nAssistant: Understood, this DeepSMILES represents a molecule that has a hydration free energy of -4.200 kcal\/mol: C[C@H]CF)F)F))O"}", "/scratch/micpie/export/freesolv/valid_0-7.jsonl": "{"text":"The molecule with the canonical SMILES CCCCCC(=O)OC has a hydration free energy computed using the GAFF force field of -3.300 kcal\/mol."} {"text":"The molecule with the canonical SMILES C[C@@H](O)C(F)(F)F has a hydration free energy computed using the GAFF force field of -3.490 kcal\/mol."}", "/scratch/micpie/export/freesolv/test_0-3.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C12H4Cl6\/c13-7-2-1-5(3-8(7)14)6-4-9(15)11(17)12(18)10(6)16\/h1-4H has a hydration free energy of -3.040 kcal\/mol."} {"text":"The molecule with the SELFIES [C][C][Br] has a hydration free energy of -0.740 kcal\/mol."}", "/scratch/micpie/export/freesolv/valid_0-11.jsonl": "{"text":"Task: Please generate a InChI based on the text description below.\nDescription: A molecule that has hydration free energy computed using the GAFF force field of -3.300 kcal\/mol.\nResult: InChI=1S\/C7H14O2\/c1-3-4-5-6-7(8)9-2\/h3-6H2,1-2H3"} {"text":"Task: Please give me a molecule SELFIES based on the text description below.\nDescription: A molecule that has hydration free energy computed using the GAFF force field of -3.490 kcal\/mol.\nResult: [C][C@H1][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][O]"}", "/scratch/micpie/export/freesolv/train_0-20.jsonl": "{"text":"User: I want to come up with a molecule SMILES.\nAssistant: This 
sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should have a hydration free energy computed using the GAFF force field of -3.230 kcal\/mol.\nAssistant: Got it, here you go, this SMILES represents a molecule that has a hydration free energy computed using the GAFF force field of -3.230 kcal\/mol: CCCCO"} {"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should have a hydration free energy computed using the GAFF force field of -2.600 kcal\/mol.\nAssistant: Ok, this canonical SMILES represents a molecule that has a hydration free energy computed using the GAFF force field of -2.600 kcal\/mol: CC(C)=CCC\/C(C)=C\\CO"}", "/scratch/micpie/export/freesolv/valid_0-20.jsonl": "{"text":"User: I want to create a InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should have a hydration free energy computed using the GAFF force field of -3.300 kcal\/mol.\nAssistant: Ok, this InChI represents a molecule that has a hydration free energy computed using the GAFF force field of -3.300 kcal\/mol: InChI=1S\/C7H14O2\/c1-3-4-5-6-7(8)9-2\/h3-6H2,1-2H3"} {"text":"User: I want to create a SELFIES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should have a hydration free energy computed using the GAFF force field of -3.490 kcal\/mol.\nAssistant: Got it, here you go, this SELFIES represents a molecule that has a hydration free energy computed using the GAFF force field of -3.490 kcal\/mol: [C][C@H1][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][O]"}", "/scratch/micpie/export/freesolv/train_0-0.jsonl": "{"text":"The molecule with the SELFIES [C][C][C][C][O] has a hydration free energy of -4.720 kcal\/mol."} {"text":"The molecule with the SELFIES [C][C][=Branch1][O][=C][C][C][\/C][=Branch1][Ring2][=C][\\C][O][\/C][C] has a hydration free energy of -4.780 kcal\/mol."}", "/scratch/micpie/export/freesolv/test_0-6.jsonl": "{"text":"The InChI InChI=1S\/C12H4Cl6\/c13-7-2-1-5(3-8(7)14)6-4-9(15)11(17)12(18)10(6)16\/h1-4H is representing a molecule with a hydration free energy computed using the GAFF force field of -1.080 kcal\/mol."} {"text":"The SELFIES [C][C][Br] represents a molecule that has a hydration free energy computed using the GAFF force field of 0.490 kcal\/mol."}", "/scratch/micpie/export/freesolv/train_0-10.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hydration free energy computed using the GAFF force field in kcal\/mol.\nMolecule SMILES: CCCCO\nConstraint: Even if you are not sure, you must answer with a numeric value in kcal\/mol without using any additional words.\nResult: -3.230 kcal\/mol"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hydration free energy computed using the GAFF force field in kcal\/mol.\nMolecule SMILES: CC(=CCC\/C(=C\\CO)\/C)C\nConstraint: Even if you are not sure, you must answer with a numeric value in kcal\/mol without using any other words.\nResult: -2.600 kcal\/mol"}", "/scratch/micpie/export/freesolv/train_0-3.jsonl": "{"text":"The molecule with the DeepSMILES CCCCO has a hydration free energy of -4.720 kcal\/mol."} {"text":"The 
molecule with the SELFIES [C][C][=Branch1][O][=C][C][C][\/C][=Branch1][Ring2][=C][\\C][O][\/C][C] has a hydration free energy of -4.780 kcal\/mol."}", "/scratch/micpie/export/freesolv/train_0-12.jsonl": "{"text":"User: Can you derive the hydration free energy in kcal\/mol of the molecule with the InChI InChI=1S\/C4H10O\/c1-2-3-4-5\/h5H,2-4H2,1H3?\nAssistant: Sure, this molecule has a hydration free energy of -4.720 kcal\/mol."} {"text":"User: Can you tell me the hydration free energy in kcal\/mol of the molecule with the canonical SMILES CC(C)=CCC\/C(C)=C\\CO?\nAssistant: Yes, this molecule has a hydration free energy of -4.780 kcal\/mol."}", "/scratch/micpie/export/freesolv/test_0-13.jsonl": "{"text":"User: Can you create the canonical SMILES of a molecule that has a hydration free energy of -3.040 kcal\/mol?\nAssistant: Of course, here you go: Clc1ccc(-c2cc(Cl)c(Cl)c(Cl)c2Cl)cc1Cl"} {"text":"User: Can you create the canonical SMILES of a molecule that has a hydration free energy of -0.740 kcal\/mol?\nAssistant: Sure, here you go: CCBr"}", "/scratch/micpie/export/freesolv/valid_0-2.jsonl": "{"text":"The DeepSMILES CCCCCC=O)OC represents a molecule that has a hydration free energy of -2.490 kcal\/mol."} {"text":"The SELFIES [C][C@H1][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][O] is representing a molecule with a hydration free energy of -4.200 kcal\/mol."}", "/scratch/micpie/export/freesolv/valid_0-21.jsonl": "{"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should have a hydration free energy computed using the GAFF force field of -3.300 kcal\/mol.\nAssistant: Ok, this InChI represents a molecule that has a hydration free energy computed using the GAFF force field of -3.300 kcal\/mol: InChI=1S\/C7H14O2\/c1-3-4-5-6-7(8)9-2\/h3-6H2,1-2H3"} {"text":"User: I want to generate a molecule InChI.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should have a hydration free energy computed using the GAFF force field of -3.490 kcal\/mol.\nAssistant: Understood, this InChI represents a molecule that has a hydration free energy computed using the GAFF force field of -3.490 kcal\/mol: InChI=1S\/C3H5F3O\/c1-2(7)3(4,5)6\/h2,7H,1H3\/t2-\/m1\/s1"}", "/scratch/micpie/export/freesolv/train_0-14.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that has a hydration free energy of -4.720 kcal\/mol.\nAssistant: This is a molecule that has a hydration free energy of -4.720 kcal\/mol: CCCCO"} {"text":"User: I'm looking for the SELFIES of a molecule that has a hydration free energy of -4.780 kcal\/mol.\nAssistant: This is a molecule that has a hydration free energy of -4.780 kcal\/mol: [C][C][=Branch1][O][=C][C][C][\/C][=Branch1][Ring2][=C][\\C][O][\/C][C]"}", "/scratch/micpie/export/freesolv/valid_0-1.jsonl": "{"text":"Based on the SMILES CCCCCC(=O)OC, the molecule has a hydration free energy of -2.490 kcal\/mol."} {"text":"Based on the DeepSMILES representation of C[C@H]CF)F)F))O, the molecule has a hydration free energy of -4.200 kcal\/mol."}", "/scratch/micpie/export/freesolv/valid_0-13.jsonl": "{"text":"User: Can you give me the SELFIES of a molecule that has a hydration free energy of -2.490 kcal\/mol?\nAssistant: Yes, I'm happy to help, here you go: [C][C][C][C][C][C][=Branch1][C][=O][O][C]"} {"text":"User: Can you give me the SMILES of a molecule that has a hydration free energy of -4.200 kcal\/mol?\nAssistant: Of course, here you go: C[C@H](C(F)(F)F)O"}", "/scratch/micpie/export/freesolv/valid_0-5.jsonl": "{"text":"Based on the SMILES CCCCCC(=O)OC, the molecule has a hydration free energy computed using the GAFF force field of -3.300 kcal\/mol."} {"text":"Based on the SELFIES representation of [C][C@H1][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][O], the molecule has a hydration free energy computed using the GAFF force field of 
-3.490 kcal\/mol."}", "/scratch/micpie/export/freesolv/train_0-15.jsonl": "{"text":"User: I want to create a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should have a hydration free energy of -4.720 kcal\/mol.\nAssistant: Ok, here you go, this canonical SMILES represents a molecule that has a hydration free energy of -4.720 kcal\/mol: CCCCO"} {"text":"User: I want to come up with a InChI.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should have a hydration free energy of -4.780 kcal\/mol.\nAssistant: Ok, here you go, this InChI represents a molecule that has a hydration free energy of -4.780 kcal\/mol: InChI=1S\/C10H18O\/c1-9(2)5-4-6-10(3)7-8-11\/h5,7,11H,4,6,8H2,1-3H3\/b10-7-"}", "/scratch/micpie/export/freesolv/valid_0-4.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][C][C][C][C][=Branch1][C][=O][O][C] has a hydration free energy computed using the GAFF force field of -3.300 kcal\/mol."} {"text":"The molecule with the InChI InChI=1S\/C3H5F3O\/c1-2(7)3(4,5)6\/h2,7H,1H3\/t2-\/m1\/s1 has a hydration free energy computed using the GAFF force field of -3.490 kcal\/mol."}", "/scratch/micpie/export/freesolv/train_0-5.jsonl": "{"text":"Based on the DeepSMILES representation of CCCCO, the molecule has a hydration free energy computed using the GAFF force field of -3.230 kcal\/mol."} {"text":"Based on the canonical SMILES representation of CC(C)=CCC\/C(C)=C\\CO, the molecule has a hydration free energy computed using the GAFF force field of -2.600 kcal\/mol."}", "/scratch/micpie/export/freesolv/valid_0-15.jsonl": "{"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should have a hydration free energy of -2.490 kcal\/mol.\nAssistant: Ok, this InChI represents a molecule that has a hydration free energy of -2.490 kcal\/mol: InChI=1S\/C7H14O2\/c1-3-4-5-6-7(8)9-2\/h3-6H2,1-2H3"} {"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should have a hydration free energy of -4.200 kcal\/mol.\nAssistant: Ok, here you go, this SMILES represents a molecule that has a hydration free energy of -4.200 kcal\/mol: C[C@H](C(F)(F)F)O"}", "/scratch/micpie/export/freesolv/valid_0-12.jsonl": "{"text":"User: Can you tell me the hydration free energy in kcal\/mol of the molecule with the DeepSMILES CCCCCC=O)OC?\nAssistant: Yes, I'm happy to help, this molecule has a hydration free energy of -2.490 kcal\/mol."} {"text":"User: Can you derive the hydration free energy in kcal\/mol of the molecule with the SMILES C[C@H](C(F)(F)F)O?\nAssistant: Yes, this molecule has a hydration free energy of -4.200 kcal\/mol."}", "/scratch/micpie/export/freesolv/valid_0-18.jsonl": "{"text":"User: Can you give me the InChI of a molecule that has a hydration free energy computed using the GAFF force field of -3.300 kcal\/mol?\nAssistant: Sure, here you go: InChI=1S\/C7H14O2\/c1-3-4-5-6-7(8)9-2\/h3-6H2,1-2H3"} {"text":"User: Can you generate the canonical SMILES of a molecule that has a hydration free energy computed using the GAFF force field of -3.490 kcal\/mol?\nAssistant: Yes, here you go: C[C@@H](O)C(F)(F)F"}", "/scratch/micpie/export/freesolv/train_0-2.jsonl": "{"text":"The canonical SMILES CCCCO represents a molecule that has a hydration free energy of -4.720 kcal\/mol."} {"text":"The DeepSMILES CC=CCC\/C=C\\CO)))\/C)))))C is representing a molecule that has a hydration free energy of -4.780 kcal\/mol."}", "/scratch/micpie/export/freesolv/test_0-11.jsonl": "{"text":"Task: Please give me a canonical SMILES based on the text 
description below.\nDescription: A molecule that has hydration free energy computed using the GAFF force field of -1.080 kcal\/mol.\nResult: Clc1ccc(-c2cc(Cl)c(Cl)c(Cl)c2Cl)cc1Cl"} {"text":"Task: Please give me a SELFIES based on the text description.\nDescription: A molecule that has hydration free energy computed using the GAFF force field of 0.490 kcal\/mol.\nResult: [C][C][Br]"}", "/scratch/micpie/export/freesolv/train_0-7.jsonl": "{"text":"The molecule with the canonical SMILES CCCCO has a hydration free energy computed using the GAFF force field of -3.230 kcal\/mol."} {"text":"The molecule with the InChI InChI=1S\/C10H18O\/c1-9(2)5-4-6-10(3)7-8-11\/h5,7,11H,4,6,8H2,1-3H3\/b10-7- has a hydration free energy computed using the GAFF force field of -2.600 kcal\/mol."}", "/scratch/micpie/export/freesolv/test_0-17.jsonl": "{"text":"User: Can you tell me the hydration free energy computed using the GAFF force field in kcal\/mol of the molecule with the SELFIES [C][=C][C][=Branch2][Ring1][N][=C][Branch2][Ring1][#Branch1][C][=C][Ring1][=Branch1][C][=C][C][=Branch1][=N][=C][Branch1][=Branch2][C][=Branch1][Branch1][=C][Ring1][=Branch1][Cl][Cl][Cl][Cl][Cl][Cl]?\nAssistant: Sure, this molecule has a hydration free energy computed using the GAFF force field of -1.080 kcal\/mol."} {"text":"User: Can you estimate the hydration free energy computed using the GAFF force field in kcal\/mol of the molecule with the canonical SMILES CCBr?\nAssistant: Sure, this molecule has a hydration free energy computed using the GAFF force field of 0.490 kcal\/mol."}", "/scratch/micpie/export/freesolv/valid_0-19.jsonl": "{"text":"User: I'm searching for the SELFIES of a molecule that has a hydration free energy computed using the GAFF force field of -3.300 kcal\/mol.\nAssistant: This is a molecule that has a hydration free energy computed using the GAFF force field of -3.300 kcal\/mol: [C][C][C][C][C][C][=Branch1][C][=O][O][C]"} {"text":"User: I'm searching for the canonical SMILES of a 
molecule that has a hydration free energy computed using the GAFF force field of -3.490 kcal\/mol.\nAssistant: This is a molecule that has a hydration free energy computed using the GAFF force field of -3.490 kcal\/mol: C[C@@H](O)C(F)(F)F"}", "/scratch/micpie/export/freesolv/train_0-11.jsonl": "{"text":"Task: Please give me a molecule DeepSMILES based on the description.\nDescription: A molecule that has hydration free energy computed using the GAFF force field of -3.230 kcal\/mol.\nResult: CCCCO"} {"text":"Task: Please create a DeepSMILES based on the text description.\nDescription: A molecule that has hydration free energy computed using the GAFF force field of -2.600 kcal\/mol.\nResult: CC=CCC\/C=C\\CO)))\/C)))))C"}", "/scratch/micpie/export/freesolv/train_0-1.jsonl": "{"text":"Based on the SMILES representation of CCCCO, the molecule has a hydration free energy of -4.720 kcal\/mol."} {"text":"Based on the SMILES representation of CC(=CCC\/C(=C\\CO)\/C)C, the molecule has a hydration free energy of -4.780 kcal\/mol."}", "/scratch/micpie/export/freesolv/train_0-13.jsonl": "{"text":"User: Can you generate the SMILES of a molecule that has a hydration free energy of -4.720 kcal\/mol?\nAssistant: Yes, here you go: CCCCO"} {"text":"User: Can you create the SMILES of a molecule that has a hydration free energy of -4.780 kcal\/mol?\nAssistant: Of course, here you go: CC(=CCC\/C(=C\\CO)\/C)C"}", "/scratch/micpie/export/freesolv/train_0-4.jsonl": "{"text":"The molecule with the SMILES representation of CCCCO has a hydration free energy computed using the GAFF force field of -3.230 kcal\/mol."} {"text":"The molecule with the SELFIES [C][C][=Branch1][O][=C][C][C][\/C][=Branch1][Ring2][=C][\\C][O][\/C][C] has a hydration free energy computed using the GAFF force field of -2.600 kcal\/mol."}", "/scratch/micpie/export/freesolv/test_0-7.jsonl": "{"text":"The molecule with the SMILES c1cc(c(cc1c2cc(c(c(c2Cl)Cl)Cl)Cl)Cl)Cl has a hydration free energy computed using the GAFF 
force field of -1.080 kcal\/mol."} {"text":"The molecule with the SMILES CCBr has a hydration free energy computed using the GAFF force field of 0.490 kcal\/mol."}", "/scratch/micpie/export/freesolv/train_0-9.jsonl": "{"text":"Task: Please create a SMILES based on the description below.\nDescription: A molecule that has hydration free energy of -4.720 kcal\/mol.\nResult: CCCCO"} {"text":"Task: Please generate a molecule SMILES based on the description below.\nDescription: A molecule that has hydration free energy of -4.780 kcal\/mol.\nResult: CC(=CCC\/C(=C\\CO)\/C)C"}", "/scratch/micpie/export/freesolv/train_0-18.jsonl": "{"text":"User: Can you create the DeepSMILES of a molecule that has a hydration free energy computed using the GAFF force field of -3.230 kcal\/mol?\nAssistant: Yes, here you go: CCCCO"} {"text":"User: Can you generate the SMILES of a molecule that has a hydration free energy computed using the GAFF force field of -2.600 kcal\/mol?\nAssistant: Of course, here you go: CC(=CCC\/C(=C\\CO)\/C)C"}", "/scratch/micpie/export/freesolv/valid_0-3.jsonl": "{"text":"The molecule with the SMILES CCCCCC(=O)OC has a hydration free energy of -2.490 kcal\/mol."} {"text":"The molecule with the InChI InChI=1S\/C3H5F3O\/c1-2(7)3(4,5)6\/h2,7H,1H3\/t2-\/m1\/s1 has a hydration free energy of -4.200 kcal\/mol."}", "/scratch/micpie/export/freesolv/test_0-8.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hydration free energy in kcal\/mol.\nMolecule DeepSMILES: cccccc6cccccc6Cl))Cl))Cl))Cl))))))Cl))Cl\nConstraint: Even if you are not sure, you must answer with a numeric value in kcal\/mol without using any other words.\nResult: -3.040 kcal\/mol"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hydration free energy in kcal\/mol.\nDeepSMILES: CCBr\nConstraint: Even if you are uncertain, you must answer with a numeric value in kcal\/mol without using any other 
words.\nResult: -0.740 kcal\/mol"}", "/scratch/micpie/export/freesolv/test_0-14.jsonl": "{"text":"User: I'm looking for the InChI of a molecule that has a hydration free energy of -3.040 kcal\/mol.\nAssistant: This is a molecule that has a hydration free energy of -3.040 kcal\/mol: InChI=1S\/C12H4Cl6\/c13-7-2-1-5(3-8(7)14)6-4-9(15)11(17)12(18)10(6)16\/h1-4H"} {"text":"User: I'm searching for the InChI of a molecule that has a hydration free energy of -0.740 kcal\/mol.\nAssistant: This is a molecule that has a hydration free energy of -0.740 kcal\/mol: InChI=1S\/C2H5Br\/c1-2-3\/h2H2,1H3"}", "/scratch/micpie/export/freesolv/valid_0-17.jsonl": "{"text":"User: Can you derive the hydration free energy computed using the GAFF force field in kcal\/mol of the molecule with the SMILES CCCCCC(=O)OC?\nAssistant: Yes, I'm happy to help, this molecule has a hydration free energy computed using the GAFF force field of -3.300 kcal\/mol."} {"text":"User: Can you derive the hydration free energy computed using the GAFF force field in kcal\/mol of the molecule with the InChI InChI=1S\/C3H5F3O\/c1-2(7)3(4,5)6\/h2,7H,1H3\/t2-\/m1\/s1?\nAssistant: Of course, this molecule has a hydration free energy computed using the GAFF force field of -3.490 kcal\/mol."}", "/scratch/micpie/export/freesolv/valid_0-14.jsonl": "{"text":"User: I'm looking for the DeepSMILES of a molecule that has a hydration free energy of -2.490 kcal\/mol.\nAssistant: This is a molecule that has a hydration free energy of -2.490 kcal\/mol: CCCCCC=O)OC"} {"text":"User: I'm looking for the SELFIES of a molecule that has a hydration free energy of -4.200 kcal\/mol.\nAssistant: This is a molecule that has a hydration free energy of -4.200 kcal\/mol: [C][C@H1][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][O]"}", "/scratch/micpie/export/freesolv/test_0-4.jsonl": "{"text":"The molecule with the canonical SMILES representation of Clc1ccc(-c2cc(Cl)c(Cl)c(Cl)c2Cl)cc1Cl has a hydration free energy computed using the GAFF 
force field of -1.080 kcal\/mol."} {"text":"The molecule with the canonical SMILES CCBr has a hydration free energy computed using the GAFF force field of 0.490 kcal\/mol."}", "/scratch/micpie/export/freesolv/test_0-12.jsonl": "{"text":"User: Can you tell me the hydration free energy in kcal\/mol of the molecule with the InChI InChI=1S\/C12H4Cl6\/c13-7-2-1-5(3-8(7)14)6-4-9(15)11(17)12(18)10(6)16\/h1-4H?\nAssistant: Of course, this molecule has a hydration free energy of -3.040 kcal\/mol."} {"text":"User: Can you estimate the hydration free energy in kcal\/mol of the molecule with the canonical SMILES CCBr?\nAssistant: Sure, this molecule has a hydration free energy of -0.740 kcal\/mol."}", "/scratch/micpie/export/freesolv/test_0-20.jsonl": "{"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should have a hydration free energy computed using the GAFF force field of -1.080 kcal\/mol.\nAssistant: Ok, this SELFIES represents a molecule that has a hydration free energy computed using the GAFF force field of -1.080 kcal\/mol: [C][=C][C][=Branch2][Ring1][N][=C][Branch2][Ring1][#Branch1][C][=C][Ring1][=Branch1][C][=C][C][=Branch1][=N][=C][Branch1][=Branch2][C][=Branch1][Branch1][=C][Ring1][=Branch1][Cl][Cl][Cl][Cl][Cl][Cl]"} {"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should have a hydration free energy computed using the GAFF force field of 0.490 kcal\/mol.\nAssistant: Ok, here you go, this SELFIES represents a molecule that has a hydration free energy computed using the GAFF force field of 0.490 kcal\/mol: [C][C][Br]"}", "/scratch/micpie/export/caco2_wang/test_0-10.jsonl": "{"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very interesting. 
Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should have a Caco-2 cell effective permeability of -6.220 cm\/s.\nAssistant: Got it, here you go, this DeepSMILES represents a molecule that has a Caco-2 cell effective permeability of -6.220 cm\/s: OcccO)ccc6)OCccccO)cO)c6))))))CO)C6"} {"text":"User: I want to generate a InChI.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should have a Caco-2 permeability of -4.980 cm\/s.\nAssistant: Got it, here you go, this InChI represents a molecule that has a Caco-2 permeability of -4.980 cm\/s: InChI=1S\/C7H5ClO3\/c8-4-1-2-6(9)5(3-4)7(10)11\/h1-3,9H,(H,10,11)"}", "/scratch/micpie/export/caco2_wang/valid_0-8.jsonl": "{"text":"User: Can you create the SMILES of a molecule that has a Caco-2 cell effective permeability of -4.090 cm\/s?\nAssistant: Yes, I'm happy to help, here you go: COc1ccc2c3c1O[C@H]1[C@@H](O)C=C[C@H]4[C@@H](C2)N(C)CC[C@]314"} {"text":"User: Can you create the DeepSMILES of a molecule that has a Caco-2 cell permeability of -5.640 cm\/s?\nAssistant: Yes, here you go: COcccncNC)CCCNC=O)CCCCO5)))))))))))ncN)c6cc%10OC"}", "/scratch/micpie/export/caco2_wang/train_0-8.jsonl": "{"text":"User: Can you create the DeepSMILES of a molecule that has a Caco-2 permeability of -3.860 cm\/s?\nAssistant: Yes, I'm happy to help, here you go: C\/C=C\\C#CCC\/C=C\\C=C\\C=O)NCCC)C"} {"text":"User: Can you create the SMILES of a molecule that has a Caco-2 cell effective permeability of -4.480 cm\/s?\nAssistant: Yes, here you go: COc1cc2ncnc(Nc3ccc(F)c(Cl)c3)c2cc1OCCCN1CCOCC1"}", "/scratch/micpie/export/caco2_wang/test_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the Caco-2 permeability in cm\/s.\nMolecule canonical SMILES: Oc1cc(O)c2c(c1)OC(c1ccc(O)c(O)c1)C(O)C2\nConstraint: Even if you are not sure, you must answer with a numeric value in cm\/s without 
the unit and without using any other words.\nResult: -6.220"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the Caco-2 permeability in cm\/s.\nInChI: InChI=1S\/C7H5ClO3\/c8-4-1-2-6(9)5(3-4)7(10)11\/h1-3,9H,(H,10,11)\nConstraint: Even if you are uncertain, you must answer with a numeric value in cm\/s without the unit and without using any other words.\nResult: -4.980"}", "/scratch/micpie/export/caco2_wang/valid_0-9.jsonl": "{"text":"User: I'm searching for the SELFIES of a molecule that has a Caco-2 cell permeability of -4.090 cm\/s.\nAssistant: This is a molecule that has a Caco-2 cell permeability of -4.090 cm\/s: ['[C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C@H1][C@@H1][Branch1][C][O][C][=C][C@H1][C@@H1][Branch1][Ring2][C][Ring1][N][N][Branch1][C][C][C][C][C@][Ring1][S][Ring1][=N][Ring1][Branch2]']"} {"text":"User: I'm looking for the DeepSMILES of a molecule that has a Caco-2 permeability of -5.640 cm\/s.\nAssistant: This is a molecule that has a Caco-2 permeability of -5.640 cm\/s: COcccncNC)CCCNC=O)CCCCO5)))))))))))ncN)c6cc%10OC"}", "/scratch/micpie/export/caco2_wang/test_0-1.jsonl": "{"text":"Based on the SMILES Oc1cc(O)c2c(c1)OC(c1ccc(O)c(O)c1)C(O)C2, the molecule has a Caco-2 cell permeability of -6.220 cm\/s."} {"text":"Based on the SELFIES representation of ['[O][=C][Branch1][C][O][C][=C][C][Branch1][C][Cl][=C][C][=C][Ring1][#Branch1][O]'], the molecule has a Caco-2 cell permeability of -4.980 cm\/s."}", "/scratch/micpie/export/caco2_wang/valid_0-0.jsonl": "{"text":"The molecule with the SELFIES ['[C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C@H1][C@@H1][Branch1][C][O][C][=C][C@H1][C@@H1][Branch1][Ring2][C][Ring1][N][N][Branch1][C][C][C][C][C@][Ring1][S][Ring1][=N][Ring1][Branch2]'] has a Caco-2 cell effective permeability of -4.090 cm\/s."} {"text":"The molecule with the SMILES representation of COc1cc2nc(N(C)CCCNC(=O)C3CCCO3)nc(N)c2cc1OC has a Caco-2 permeability of -5.640 cm\/s."}", 
"/scratch/micpie/export/caco2_wang/test_0-2.jsonl": "{"text":"The InChI InChI=1S\/C15H14O6\/c16-8-4-11(18)9-6-13(20)15(21-14(9)5-8)7-1-2-10(17)12(19)3-7\/h1-5,13,15-20H,6H2 is representing a molecule that has a Caco-2 cell permeability of -6.220 cm\/s."} {"text":"The InChI InChI=1S\/C7H5ClO3\/c8-4-1-2-6(9)5(3-4)7(10)11\/h1-3,9H,(H,10,11) represents a molecule that has a Caco-2 cell permeability of -4.980 cm\/s."}", "/scratch/micpie/export/caco2_wang/valid_0-10.jsonl": "{"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should have a Caco-2 permeability of -4.090 cm\/s.\nAssistant: Got it, this SELFIES represents a molecule that has a Caco-2 permeability of -4.090 cm\/s: ['[C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C@H1][C@@H1][Branch1][C][O][C][=C][C@H1][C@@H1][Branch1][Ring2][C][Ring1][N][N][Branch1][C][C][C][C][C@][Ring1][S][Ring1][=N][Ring1][Branch2]']"} {"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should have a Caco-2 cell permeability of -5.640 cm\/s.\nAssistant: Got it, this SELFIES represents a molecule that has a Caco-2 cell permeability of -5.640 cm\/s: ['[C][O][C][=C][C][=N][C][Branch2][Ring1][Ring2][N][Branch1][C][C][C][C][C][N][C][=Branch1][C][=O][C][C][C][C][O][Ring1][Branch1][=N][C][Branch1][C][N][=C][Ring2][Ring1][Ring2][C][=C][Ring2][Ring1][Branch2][O][C]']"}", "/scratch/micpie/export/caco2_wang/train_0-6.jsonl": "{"text":"Task: Please give me a InChI based on the description.\nDescription: A molecule that has a Caco-2 permeability of -3.860 cm\/s.\nResult: InChI=1S\/C16H23NO\/c1-4-5-6-7-8-9-10-11-12-13-16(18)17-14-15(2)3\/h4-5,10-13,15H,8-9,14H2,1-3H3,(H,17,18)\/b5-4-,11-10-,13-12+"} {"text":"Task: Please create a molecule canonical SMILES based on the description.\nDescription: A molecule that has a Caco-2 cell effective permeability of -4.480 cm\/s.\nResult: COc1cc2ncnc(Nc3ccc(F)c(Cl)c3)c2cc1OCCCN1CCOCC1"}", "/scratch/micpie/export/caco2_wang/valid_0-6.jsonl": "{"text":"Task: Please create a molecule InChI based on the text description below.\nDescription: A molecule that has a Caco-2 cell permeability of -4.090 cm\/s.\nResult: InChI=1S\/C18H21NO3\/c1-19-8-7-18-11-4-5-13(20)17(18)22-16-14(21-2)6-3-10(15(16)18)9-12(11)19\/h3-6,11-13,17,20H,7-9H2,1-2H3\/t11-,12+,13-,17-,18-\/m0\/s1"} {"text":"Task: Please generate a SELFIES based on the description below.\nDescription: A molecule that has a Caco-2 permeability of -5.640 cm\/s.\nResult: ['[C][O][C][=C][C][=N][C][Branch2][Ring1][Ring2][N][Branch1][C][C][C][C][C][N][C][=Branch1][C][=O][C][C][C][C][O][Ring1][Branch1][=N][C][Branch1][C][N][=C][Ring2][Ring1][Ring2][C][=C][Ring2][Ring1][Branch2][O][C]']"}", "/scratch/micpie/export/caco2_wang/test_0-9.jsonl": "{"text":"User: I'm looking for the SELFIES of a molecule that has a Caco-2 cell effective permeability of -6.220 cm\/s.\nAssistant: This is a molecule that has a Caco-2 cell effective permeability of -6.220 cm\/s: 
['[O][C][=C][C][Branch1][C][O][=C][C][=Branch1][Ring2][=C][Ring1][#Branch1][O][C][Branch1][#C][C][=C][C][=C][Branch1][C][O][C][Branch1][C][O][=C][Ring1][Branch2][C][Branch1][C][O][C][Ring1][S]']"} {"text":"User: I'm looking for the canonical SMILES of a molecule that has a Caco-2 cell permeability of -4.980 cm\/s.\nAssistant: This is a molecule that has a Caco-2 cell permeability of -4.980 cm\/s: O=C(O)c1cc(Cl)ccc1O"}", "/scratch/micpie/export/caco2_wang/test_0-0.jsonl": "{"text":"The molecule with the SELFIES ['[O][C][=C][C][Branch1][C][O][=C][C][=Branch1][Ring2][=C][Ring1][#Branch1][O][C][Branch1][#C][C][=C][C][=C][Branch1][C][O][C][Branch1][C][O][=C][Ring1][Branch2][C][Branch1][C][O][C][Ring1][S]'] has a Caco-2 cell permeability of -6.220 cm\/s."} {"text":"The molecule with the DeepSMILES representation of O=CO)cccCl)ccc6O has a Caco-2 cell effective permeability of -4.980 cm\/s."}", "/scratch/micpie/export/caco2_wang/valid_0-7.jsonl": "{"text":"User: Can you estimate the Caco-2 cell permeability in cm\/s of the molecule with the canonical SMILES COc1ccc2c3c1O[C@H]1[C@@H](O)C=C[C@H]4[C@@H](C2)N(C)CC[C@@]341?\nAssistant: Sure, this molecule has a Caco-2 cell permeability of -4.090 cm\/s."} {"text":"User: Can you derive the Caco-2 permeability in cm\/s of the molecule with the SMILES COc1cc2nc(N(C)CCCNC(=O)C3CCCO3)nc(N)c2cc1OC?\nAssistant: Of course, this molecule has a Caco-2 permeability of -5.640 cm\/s."}", "/scratch/micpie/export/caco2_wang/test_0-3.jsonl": "{"text":"The molecule with the SMILES Oc1cc(O)c2c(c1)OC(c1ccc(O)c(O)c1)C(O)C2 has a Caco-2 cell effective permeability of -6.220 cm\/s."} {"text":"The molecule with the InChI InChI=1S\/C7H5ClO3\/c8-4-1-2-6(9)5(3-4)7(10)11\/h1-3,9H,(H,10,11) has a Caco-2 cell permeability of -4.980 cm\/s."}", "/scratch/micpie/export/caco2_wang/valid_0-11.jsonl": "{"text":"User: I want to create a molecule SELFIES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should have a Caco-2 cell effective permeability of -4.090 cm\/s.\nAssistant: Got it, this SELFIES represents a molecule that has a Caco-2 cell effective permeability of -4.090 cm\/s: ['[C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C@H1][C@@H1][Branch1][C][O][C][=C][C@H1][C@@H1][Branch1][Ring2][C][Ring1][N][N][Branch1][C][C][C][C][C@][Ring1][S][Ring1][=N][Ring1][Branch2]']"} {"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should have a Caco-2 cell effective permeability of -5.640 cm\/s.\nAssistant: Got it, this SMILES represents a molecule that has a Caco-2 cell effective permeability of -5.640 cm\/s: COc1cc2nc(N(C)CCCNC(=O)C3CCCO3)nc(N)c2cc1OC"}", "/scratch/micpie/export/caco2_wang/train_0-0.jsonl": "{"text":"The molecule with the SMILES C\/C=C\\C#CCC\/C=C\\C=C\\C(=O)NCC(C)C has a Caco-2 permeability of -3.860 cm\/s."} {"text":"The molecule with the SMILES representation of COc1cc2ncnc(Nc3ccc(F)c(Cl)c3)c2cc1OCCCN1CCOCC1 has a Caco-2 cell effective permeability of -4.480 cm\/s."}", "/scratch/micpie/export/caco2_wang/test_0-6.jsonl": "{"text":"Task: Please give me a molecule SELFIES based on the text description below.\nDescription: A molecule that has a Caco-2 permeability of -6.220 cm\/s.\nResult: ['[O][C][=C][C][Branch1][C][O][=C][C][=Branch1][Ring2][=C][Ring1][#Branch1][O][C][Branch1][#C][C][=C][C][=C][Branch1][C][O][C][Branch1][C][O][=C][Ring1][Branch2][C][Branch1][C][O][C][Ring1][S]']"} {"text":"Task: Please generate a molecule SELFIES based on the text description below.\nDescription: A molecule that has a Caco-2 cell permeability of -4.980 cm\/s.\nResult: ['[O][=C][Branch1][C][O][C][=C][C][Branch1][C][Cl][=C][C][=C][Ring1][#Branch1][O]']"}", "/scratch/micpie/export/caco2_wang/train_0-10.jsonl": "{"text":"User: I want to create a molecule SELFIES.\nAssistant: This sounds very exciting. 
Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should have a Caco-2 permeability of -3.860 cm\/s.\nAssistant: Got it, here you go, this SELFIES represents a molecule that has a Caco-2 permeability of -3.860 cm\/s: ['[C][\/C][=C][\\\\C][#C][C][C][\/C][=C][\\\\C][=C][\\\\C][=Branch1][C][=O][N][C][C][Branch1][C][C][C]']"} {"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should have a Caco-2 cell permeability of -4.480 cm\/s.\nAssistant: Got it, this SELFIES represents a molecule that has a Caco-2 cell permeability of -4.480 cm\/s: ['[C][O][C][=C][C][=N][C][=N][C][Branch1][S][N][C][=C][C][=C][Branch1][C][F][C][Branch1][C][Cl][=C][Ring1][Branch2][=C][Ring1][#C][C][=C][Ring2][Ring1][Ring1][O][C][C][C][N][C][C][O][C][C][Ring1][=Branch1]']"}", "/scratch/micpie/export/caco2_wang/train_0-3.jsonl": "{"text":"The molecule with the DeepSMILES C\/C=C\\C#CCC\/C=C\\C=C\\C=O)NCCC)C has a Caco-2 permeability of -3.860 cm\/s."} {"text":"The molecule with the InChI InChI=1S\/C22H24ClFN4O3\/c1-29-20-13-19-16(12-21(20)31-8-2-5-28-6-9-30-10-7-28)22(26-14-25-19)27-15-3-4-18(24)17(23)11-15\/h3-4,11-14H,2,5-10H2,1H3,(H,25,26,27) has a Caco-2 cell effective permeability of -4.480 cm\/s."}", "/scratch/micpie/export/caco2_wang/valid_0-2.jsonl": "{"text":"The DeepSMILES COcccccc6O[C@H][C@@H]O)C=C[C@H][C@@H]C%11)NC)CC[C@]%13%106 represents a molecule that has a Caco-2 cell permeability of -4.090 cm\/s."} {"text":"The SELFIES ['[C][O][C][=C][C][=N][C][Branch2][Ring1][Ring2][N][Branch1][C][C][C][C][C][N][C][=Branch1][C][=O][C][C][C][C][O][Ring1][Branch1][=N][C][Branch1][C][N][=C][Ring2][Ring1][Ring2][C][=C][Ring2][Ring1][Branch2][O][C]'] represents a molecule with a Caco-2 cell effective permeability of -5.640 cm\/s."}", "/scratch/micpie/export/caco2_wang/valid_0-1.jsonl": "{"text":"Based on the InChI 
InChI=1S\/C18H21NO3\/c1-19-8-7-18-11-4-5-13(20)17(18)22-16-14(21-2)6-3-10(15(16)18)9-12(11)19\/h3-6,11-13,17,20H,7-9H2,1-2H3\/t11-,12+,13-,17-,18-\/m0\/s1, the molecule has a Caco-2 cell effective permeability of -4.090 cm\/s."} {"text":"Based on the DeepSMILES representation of COcccncNC)CCCNC=O)CCCCO5)))))))))))ncN)c6cc%10OC, the molecule has a Caco-2 cell permeability of -5.640 cm\/s."}", "/scratch/micpie/export/caco2_wang/valid_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the Caco-2 cell permeability in cm\/s.\nInChI: InChI=1S\/C18H21NO3\/c1-19-8-7-18-11-4-5-13(20)17(18)22-16-14(21-2)6-3-10(15(16)18)9-12(11)19\/h3-6,11-13,17,20H,7-9H2,1-2H3\/t11-,12+,13-,17-,18-\/m0\/s1\nConstraint: Even if you are not sure, you must answer with a numeric value in cm\/s without the unit and without using any additional words.\nResult: -4.090"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the Caco-2 permeability in cm\/s.\ncanonical SMILES: COc1cc2nc(N(C)CCCNC(=O)C3CCCO3)nc(N)c2cc1OC\nConstraint: Even if you are not sure, you must answer with a numeric value in cm\/s without the unit and without using any additional words.\nResult: -5.640"}", "/scratch/micpie/export/caco2_wang/valid_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the Caco-2 cell effective permeability in cm\/s.\nMolecule InChI: InChI=1S\/C18H21NO3\/c1-19-8-7-18-11-4-5-13(20)17(18)22-16-14(21-2)6-3-10(15(16)18)9-12(11)19\/h3-6,11-13,17,20H,7-9H2,1-2H3\/t11-,12+,13-,17-,18-\/m0\/s1\nConstraint: Even if you are uncertain, you must answer with a numeric value in cm\/s without using any other words.\nResult: -4.090 cm\/s"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the Caco-2 permeability in cm\/s.\nDeepSMILES: COcccncNC)CCCNC=O)CCCCO5)))))))))))ncN)c6cc%10OC\nConstraint: Even if you are 
not sure, you must answer with a numeric value in cm\/s without using any additional words.\nResult: -5.640 cm\/s"}", "/scratch/micpie/export/caco2_wang/train_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the Caco-2 permeability in cm\/s.\nMolecule canonical SMILES: C\/C=C\\C#CCC\/C=C\\C=C\\C(=O)NCC(C)C\nConstraint: Even if you are uncertain, you must answer with a numeric value in cm\/s without the unit and without using any other words.\nResult: -3.860"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the Caco-2 cell effective permeability in cm\/s.\nMolecule InChI: InChI=1S\/C22H24ClFN4O3\/c1-29-20-13-19-16(12-21(20)31-8-2-5-28-6-9-30-10-7-28)22(26-14-25-19)27-15-3-4-18(24)17(23)11-15\/h3-4,11-14H,2,5-10H2,1H3,(H,25,26,27)\nConstraint: Even if you are uncertain, you must answer with a numeric value in cm\/s without the unit and without using any other words.\nResult: -4.480"}", "/scratch/micpie/export/caco2_wang/train_0-2.jsonl": "{"text":"The DeepSMILES C\/C=C\\C#CCC\/C=C\\C=C\\C=O)NCCC)C represents a molecule that has a Caco-2 cell permeability of -3.860 cm\/s."} {"text":"The SMILES COc1cc2ncnc(Nc3ccc(F)c(Cl)c3)c2cc1OCCCN1CCOCC1 represents a molecule with a Caco-2 cell permeability of -4.480 cm\/s."}", "/scratch/micpie/export/caco2_wang/test_0-11.jsonl": "{"text":"User: I want to come up with a SELFIES.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should have a Caco-2 cell effective permeability of -6.220 cm\/s.\nAssistant: Got it, this SELFIES represents a molecule that has a Caco-2 cell effective permeability of -6.220 cm\/s: ['[O][C][=C][C][Branch1][C][O][=C][C][=Branch1][Ring2][=C][Ring1][#Branch1][O][C][Branch1][#C][C][=C][C][=C][Branch1][C][O][C][Branch1][C][O][=C][Ring1][Branch2][C][Branch1][C][O][C][Ring1][S]']"} {"text":"User: I want to come up with a SELFIES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should have a Caco-2 cell effective permeability of -4.980 cm\/s.\nAssistant: Got it, this SELFIES represents a molecule that has a Caco-2 cell effective permeability of -4.980 cm\/s: ['[O][=C][Branch1][C][O][C][=C][C][Branch1][C][Cl][=C][C][=C][Ring1][#Branch1][O]']"}", "/scratch/micpie/export/caco2_wang/train_0-7.jsonl": "{"text":"User: Can you tell me the Caco-2 permeability in cm\/s of the molecule with the canonical SMILES C\/C=C\\C#CCC\/C=C\\C=C\\C(=O)NCC(C)C?\nAssistant: Of course, this molecule has a Caco-2 permeability of -3.860 cm\/s."} {"text":"User: Can you derive the Caco-2 cell effective permeability in cm\/s of the molecule with the canonical SMILES COc1cc2ncnc(Nc3ccc(F)c(Cl)c3)c2cc1OCCCN1CCOCC1?\nAssistant: Yes, this molecule has a Caco-2 cell effective permeability of -4.480 cm\/s."}", "/scratch/micpie/export/caco2_wang/train_0-11.jsonl": "{"text":"User: I want to create a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should have a Caco-2 cell permeability of -3.860 cm\/s.\nAssistant: Understood, this canonical SMILES represents a molecule that has a Caco-2 cell permeability of -3.860 cm\/s: C\/C=C\\C#CCC\/C=C\\C=C\\C(=O)NCC(C)C"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should have a Caco-2 cell permeability of -4.480 cm\/s.\nAssistant: Ok, this InChI represents a molecule that has a Caco-2 cell permeability of -4.480 cm\/s: InChI=1S\/C22H24ClFN4O3\/c1-29-20-13-19-16(12-21(20)31-8-2-5-28-6-9-30-10-7-28)22(26-14-25-19)27-15-3-4-18(24)17(23)11-15\/h3-4,11-14H,2,5-10H2,1H3,(H,25,26,27)"}", "/scratch/micpie/export/caco2_wang/train_0-1.jsonl": "{"text":"Based on the SELFIES ['[C][\/C][=C][\\\\C][#C][C][C][\/C][=C][\\\\C][=C][\\\\C][=Branch1][C][=O][N][C][C][Branch1][C][C][C]'], the molecule has a Caco-2 cell permeability of -3.860 cm\/s."} {"text":"Based on the InChI representation of InChI=1S\/C22H24ClFN4O3\/c1-29-20-13-19-16(12-21(20)31-8-2-5-28-6-9-30-10-7-28)22(26-14-25-19)27-15-3-4-18(24)17(23)11-15\/h3-4,11-14H,2,5-10H2,1H3,(H,25,26,27), the molecule has a Caco-2 cell effective permeability of -4.480 cm\/s."}", "/scratch/micpie/export/caco2_wang/train_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the Caco-2 cell permeability in cm\/s.\nMolecule SELFIES: ['[C][\/C][=C][\\\\C][#C][C][C][\/C][=C][\\\\C][=C][\\\\C][=Branch1][C][=O][N][C][C][Branch1][C][C][C]']\nConstraint: Even if you are uncertain, you must answer with a numeric value in cm\/s without using any other words.\nResult: -3.860 cm\/s"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the Caco-2 permeability in cm\/s.\nMolecule DeepSMILES: COcccncncNccccF)cCl)c6)))))))c6cc%10OCCCNCCOCC6\nConstraint: Even if you are uncertain, you must answer with a numeric value in cm\/s without using any other words.\nResult: -4.480 cm\/s"}", "/scratch/micpie/export/caco2_wang/test_0-7.jsonl": "{"text":"User: Can you estimate the Caco-2 cell permeability in cm\/s of the molecule with the SELFIES 
['[O][C][=C][C][Branch1][C][O][=C][C][=Branch1][Ring2][=C][Ring1][#Branch1][O][C][Branch1][#C][C][=C][C][=C][Branch1][C][O][C][Branch1][C][O][=C][Ring1][Branch2][C][Branch1][C][O][C][Ring1][S]']?\nAssistant: Yes, this molecule has a Caco-2 cell permeability of -6.220 cm\/s."} {"text":"User: Can you tell me the Caco-2 cell permeability in cm\/s of the molecule with the canonical SMILES O=C(O)c1cc(Cl)ccc1O?\nAssistant: Of course, this molecule has a Caco-2 cell permeability of -4.980 cm\/s."}", "/scratch/micpie/export/caco2_wang/train_0-9.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that has a Caco-2 cell effective permeability of -3.860 cm\/s.\nAssistant: This is a molecule that has a Caco-2 cell effective permeability of -3.860 cm\/s: C\/C=C\\C#CCC\/C=C\\C=C\\C(=O)NCC(C)C"} {"text":"User: I'm looking for the DeepSMILES of a molecule that has a Caco-2 cell effective permeability of -4.480 cm\/s.\nAssistant: This is a molecule that has a Caco-2 cell effective permeability of -4.480 cm\/s: COcccncncNccccF)cCl)c6)))))))c6cc%10OCCCNCCOCC6"}", "/scratch/micpie/export/caco2_wang/valid_0-3.jsonl": "{"text":"The molecule with the DeepSMILES COcccccc6O[C@H][C@@H]O)C=C[C@H][C@@H]C%11)NC)CC[C@]%13%106 has a Caco-2 cell effective permeability of -4.090 cm\/s."} {"text":"The molecule with the canonical SMILES COc1cc2nc(N(C)CCCNC(=O)C3CCCO3)nc(N)c2cc1OC has a Caco-2 cell effective permeability of -5.640 cm\/s."}", "/scratch/micpie/export/caco2_wang/test_0-8.jsonl": "{"text":"User: Can you generate the SMILES of a molecule that has a Caco-2 cell effective permeability of -6.220 cm\/s?\nAssistant: Sure, here you go: Oc1cc(O)c2c(c1)OC(c1ccc(O)c(O)c1)C(O)C2"} {"text":"User: Can you create the DeepSMILES of a molecule that has a Caco-2 cell permeability of -4.980 cm\/s?\nAssistant: Yes, I'm happy to help, here you go: O=CO)cccCl)ccc6O"}", "/scratch/micpie/export/caco2_wang/test_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the 
description.\nDescription: Predict the Caco-2 permeability in cm\/s.\nMolecule SELFIES: ['[O][C][=C][C][Branch1][C][O][=C][C][=Branch1][Ring2][=C][Ring1][#Branch1][O][C][Branch1][#C][C][=C][C][=C][Branch1][C][O][C][Branch1][C][O][=C][Ring1][Branch2][C][Branch1][C][O][C][Ring1][S]']\nConstraint: Even if you are not sure, you must answer with a numeric value in cm\/s without using any other words.\nResult: -6.220 cm\/s"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the Caco-2 permeability in cm\/s.\nMolecule SMILES: O=C(O)c1cc(Cl)ccc1O\nConstraint: Even if you are uncertain, you must answer with a numeric value in cm\/s without using any additional words.\nResult: -4.980 cm\/s"}", "/scratch/micpie/export/bicerano_dataset/test_0-10.jsonl": "{"text":"The polymer with the polymer name of Poly(1 , 1 -dimethylsilazane) has a computed glass coefficient of thermal expansion (CTE) at 300 K of nan 1\/K."} {"text":"The polymer with the polymer name of Poly[N,N'-(p,p'-oxydiphenylene)pyromellitimide] has a computed glass coefficient of thermal expansion (CTE) at 300 K of 126.393260758123 1\/K."}", "/scratch/micpie/export/bicerano_dataset/valid_0-8.jsonl": "{"text":"The polymer with the polymer name of Poly(1,4-butadiene) (cis) has an experimental density at 300 K of 0.89 g\/cc."} {"text":"The polymer with the polymer name of Polyimide 10 has an experimental density at 300 K of nan g\/cc."}", "/scratch/micpie/export/bicerano_dataset/test_0-15.jsonl": "{"text":"Question: What is a polymer with a computed glass transition temperature of 267.197133631238 K and a computed glass coefficient of thermal expansion (CTE) at 300 K of nan 1\/K. Answer: A polymer with PSMILES [*]N[Si]([*])(C)C"} {"text":"Question: What is a polymer with a computed glass transition temperature of 817.214393934327 K and a computed glass coefficient of thermal expansion (CTE) at 300 K of 126.393260758123 1\/K. 
Answer: A polymer with PSMILES [*]c1ccc(Oc2ccc(-n3c(=O)c4cc5c(=O)n([*])c(=O)c5cc4c3=O)cc2)cc1"}", "/scratch/micpie/export/bicerano_dataset/train_0-8.jsonl": "{"text":"The polymer with the polymer name of Poly[oxy(diethylsilylene)] has an experimental density at 300 K of nan g\/cc."} {"text":"The polymer with the polymer name of Poly[N,N'-(p,p'-carbonyldiphenylene)pyromellitimide] has an experimental density at 300 K of nan g\/cc."}", "/scratch/micpie/export/bicerano_dataset/test_0-5.jsonl": "{"text":"The polymer with the PSMILES of [*]N[Si]([*])(C)C has a computed rubber coefficient of thermal expansion (CTE) at 300 K of 1261.45801302291 1\/K."} {"text":"The polymer with the PSMILES of [*]c1ccc(Oc2ccc(-n3c(=O)c4cc5c(=O)n([*])c(=O)c5cc4c3=O)cc2)cc1 has a computed rubber coefficient of thermal expansion (CTE) at 300 K of 300.67627793269 1\/K."}", "/scratch/micpie/export/bicerano_dataset/valid_0-9.jsonl": "{"text":"The polymer with the polymer name of Poly(1,4-butadiene) (cis) has a computed density at 300 K of 0.88742931382377 g\/cc."} {"text":"The polymer with the polymer name of Polyimide 10 has a computed density at 300 K of 1.28909956482974 g\/cc."}", "/scratch/micpie/export/bicerano_dataset/test_0-1.jsonl": "{"text":"The polymer with the PSMILES of [*]N[Si]([*])(C)C has a computed glass transition temperature of 267.197133631238 K."} {"text":"The polymer with the PSMILES of [*]c1ccc(Oc2ccc(-n3c(=O)c4cc5c(=O)n([*])c(=O)c5cc4c3=O)cc2)cc1 has a computed glass transition temperature of 817.214393934327 K."}", "/scratch/micpie/export/bicerano_dataset/valid_0-0.jsonl": "{"text":"The polymer with the PSMILES of [*]=CCCC=[*] has an experimental glass transition temperature of 171 K."} {"text":"The polymer with the PSMILES of [*]C(=O)c1ccc2c(c1)C(=O)N(c1cccc(C(=O)c3cccc(N4C(=O)c5ccc([*])cc5C4=O)c3)c1)C2=O has an experimental glass transition temperature of 513 K."}", "/scratch/micpie/export/bicerano_dataset/test_0-2.jsonl": "{"text":"The polymer with the PSMILES of 
[*]N[Si]([*])(C)C has an experimental density at 300 K of nan g\/cc."} {"text":"The polymer with the PSMILES of [*]c1ccc(Oc2ccc(-n3c(=O)c4cc5c(=O)n([*])c(=O)c5cc4c3=O)cc2)cc1 has an experimental density at 300 K of nan g\/cc."}", "/scratch/micpie/export/bicerano_dataset/valid_0-10.jsonl": "{"text":"The polymer with the polymer name of Poly(1,4-butadiene) (cis) has a computed glass coefficient of thermal expansion (CTE) at 300 K of nan 1\/K."} {"text":"The polymer with the polymer name of Polyimide 10 has a computed glass coefficient of thermal expansion (CTE) at 300 K of 111.627370947085 1\/K."}", "/scratch/micpie/export/bicerano_dataset/train_0-6.jsonl": "{"text":"The polymer with the polymer name of Poly[oxy(diethylsilylene)] has an experimental glass transition temperature of 130 K."} {"text":"The polymer with the polymer name of Poly[N,N'-(p,p'-carbonyldiphenylene)pyromellitimide] has an experimental glass transition temperature of 685 K."}", "/scratch/micpie/export/bicerano_dataset/valid_0-6.jsonl": "{"text":"The polymer with the polymer name of Poly(1,4-butadiene) (cis) has an experimental glass transition temperature of 171 K."} {"text":"The polymer with the polymer name of Polyimide 10 has an experimental glass transition temperature of 513 K."}", "/scratch/micpie/export/bicerano_dataset/test_0-9.jsonl": "{"text":"The polymer with the polymer name of Poly(1 , 1 -dimethylsilazane) has a computed density at 300 K of 1.06211724269242 g\/cc."} {"text":"The polymer with the polymer name of Poly[N,N'-(p,p'-oxydiphenylene)pyromellitimide] has a computed density at 300 K of 1.28528865595221 g\/cc."}", "/scratch/micpie/export/bicerano_dataset/test_0-0.jsonl": "{"text":"The polymer with the PSMILES of [*]N[Si]([*])(C)C has an experimental glass transition temperature of 191 K."} {"text":"The polymer with the PSMILES of [*]c1ccc(Oc2ccc(-n3c(=O)c4cc5c(=O)n([*])c(=O)c5cc4c3=O)cc2)cc1 has an experimental glass transition temperature of 672 K."}", 
"/scratch/micpie/export/bicerano_dataset/valid_0-7.jsonl": "{"text":"The polymer with the polymer name of Poly(1,4-butadiene) (cis) has a computed glass transition temperature of 211.73349156608 K."} {"text":"The polymer with the polymer name of Polyimide 10 has a computed glass transition temperature of 743.820415639762 K."}", "/scratch/micpie/export/bicerano_dataset/test_0-3.jsonl": "{"text":"The polymer with the PSMILES of [*]N[Si]([*])(C)C has a computed density at 300 K of 1.06211724269242 g\/cc."} {"text":"The polymer with the PSMILES of [*]c1ccc(Oc2ccc(-n3c(=O)c4cc5c(=O)n([*])c(=O)c5cc4c3=O)cc2)cc1 has a computed density at 300 K of 1.28528865595221 g\/cc."}", "/scratch/micpie/export/bicerano_dataset/valid_0-11.jsonl": "{"text":"The polymer with the polymer name of Poly(1,4-butadiene) (cis) has a computed rubber coefficient of thermal expansion (CTE) at 300 K of 1389.39265961122 1\/K."} {"text":"The polymer with the polymer name of Polyimide 10 has a computed rubber coefficient of thermal expansion (CTE) at 300 K of 438.836856401366 1\/K."}", "/scratch/micpie/export/bicerano_dataset/train_0-0.jsonl": "{"text":"The polymer with the PSMILES of [*]O[Si]([*])(CC)CC has an experimental glass transition temperature of 130 K."} {"text":"The polymer with the PSMILES of [*]C(=O)c1ccc(-n2c(=O)c3cc4c(=O)n(-c5ccc([*])cc5)c(=O)c4cc3c2=O)cc1 has an experimental glass transition temperature of 685 K."}", "/scratch/micpie/export/bicerano_dataset/test_0-6.jsonl": "{"text":"The polymer with the polymer name of Poly(1 , 1 -dimethylsilazane) has an experimental glass transition temperature of 191 K."} {"text":"The polymer with the polymer name of Poly[N,N'-(p,p'-oxydiphenylene)pyromellitimide] has an experimental glass transition temperature of 672 K."}", "/scratch/micpie/export/bicerano_dataset/train_0-10.jsonl": "{"text":"The polymer with the polymer name of Poly[oxy(diethylsilylene)] has a computed glass coefficient of thermal expansion (CTE) at 300 K of nan 1\/K."} 
{"text":"The polymer with the polymer name of Poly[N,N'-(p,p'-carbonyldiphenylene)pyromellitimide] has a computed glass coefficient of thermal expansion (CTE) at 300 K of 111.482810962958 1\/K."}", "/scratch/micpie/export/bicerano_dataset/train_0-3.jsonl": "{"text":"The polymer with the PSMILES of [*]O[Si]([*])(CC)CC has a computed density at 300 K of 0.954614133314522 g\/cc."} {"text":"The polymer with the PSMILES of [*]C(=O)c1ccc(-n2c(=O)c3cc4c(=O)n(-c5ccc([*])cc5)c(=O)c4cc3c2=O)cc1 has a computed density at 300 K of 1.26801287782886 g\/cc."}", "/scratch/micpie/export/bicerano_dataset/train_0-12.jsonl": "{"text":"Question: What is a polymer with an experimental glass transition temperature of 130 K and experimental density at 300 K of nan g\/cc. Answer: A polymer with PSMILES [*]O[Si]([*])(CC)CC"} {"text":"Question: What is a polymer with an experimental glass transition temperature of 685 K and experimental density at 300 K of nan g\/cc. Answer: A polymer with PSMILES [*]C(=O)c1ccc(-n2c(=O)c3cc4c(=O)n(-c5ccc([*])cc5)c(=O)c4cc3c2=O)cc1"}", "/scratch/micpie/export/bicerano_dataset/test_0-13.jsonl": "{"text":"Question: What is a polymer with a computed glass coefficient of thermal expansion (CTE) at 300 K of nan and a computed rubber coefficient of thermal expansion (CTE) at 300 K of 1261.45801302291 1\/K. Answer: A polymer with PSMILES [*]N[Si]([*])(C)C"} {"text":"Question: What is a polymer with a computed glass coefficient of thermal expansion (CTE) at 300 K of 126.393260758123 and a computed rubber coefficient of thermal expansion (CTE) at 300 K of 300.67627793269 1\/K. 
Answer: A polymer with PSMILES [*]c1ccc(Oc2ccc(-n3c(=O)c4cc5c(=O)n([*])c(=O)c5cc4c3=O)cc2)cc1"}", "/scratch/micpie/export/bicerano_dataset/valid_0-2.jsonl": "{"text":"The polymer with the PSMILES of [*]=CCCC=[*] has an experimental density at 300 K of 0.89 g\/cc."} {"text":"The polymer with the PSMILES of [*]C(=O)c1ccc2c(c1)C(=O)N(c1cccc(C(=O)c3cccc(N4C(=O)c5ccc([*])cc5C4=O)c3)c1)C2=O has an experimental density at 300 K of nan g\/cc."}", "/scratch/micpie/export/bicerano_dataset/train_0-14.jsonl": "{"text":"Question: What is a polymer with a computed glass transition temperature of 137.978175297484 K and a computed density at 300 K of 0.954614133314522 g\/cc. Answer: A polymer with PSMILES [*]O[Si]([*])(CC)CC"} {"text":"Question: What is a polymer with a computed glass transition temperature of 782.977843112764 K and a computed density at 300 K of 1.26801287782886 g\/cc. Answer: A polymer with PSMILES [*]C(=O)c1ccc(-n2c(=O)c3cc4c(=O)n(-c5ccc([*])cc5)c(=O)c4cc3c2=O)cc1"}", "/scratch/micpie/export/bicerano_dataset/valid_0-1.jsonl": "{"text":"The polymer with the PSMILES of [*]=CCCC=[*] has a computed glass transition temperature of 211.73349156608 K."} {"text":"The polymer with the PSMILES of [*]C(=O)c1ccc2c(c1)C(=O)N(c1cccc(C(=O)c3cccc(N4C(=O)c5ccc([*])cc5C4=O)c3)c1)C2=O has a computed glass transition temperature of 743.820415639762 K."}", "/scratch/micpie/export/bicerano_dataset/valid_0-13.jsonl": "{"text":"Question: What is a polymer with a computed glass coefficient of thermal expansion (CTE) at 300 K of nan and a computed rubber coefficient of thermal expansion (CTE) at 300 K of 1389.39265961122 1\/K. Answer: A polymer with PSMILES [*]=CCCC=[*]"} {"text":"Question: What is a polymer with a computed glass coefficient of thermal expansion (CTE) at 300 K of 111.627370947085 and a computed rubber coefficient of thermal expansion (CTE) at 300 K of 438.836856401366 1\/K. 
Answer: A polymer with PSMILES [*]C(=O)c1ccc2c(c1)C(=O)N(c1cccc(C(=O)c3cccc(N4C(=O)c5ccc([*])cc5C4=O)c3)c1)C2=O"}", "/scratch/micpie/export/bicerano_dataset/valid_0-5.jsonl": "{"text":"The polymer with the PSMILES of [*]=CCCC=[*] has a computed rubber coefficient of thermal expansion (CTE) at 300 K of 1389.39265961122 1\/K."} {"text":"The polymer with the PSMILES of [*]C(=O)c1ccc2c(c1)C(=O)N(c1cccc(C(=O)c3cccc(N4C(=O)c5ccc([*])cc5C4=O)c3)c1)C2=O has a computed rubber coefficient of thermal expansion (CTE) at 300 K of 438.836856401366 1\/K."}", "/scratch/micpie/export/bicerano_dataset/train_0-15.jsonl": "{"text":"Question: What is a polymer with a computed glass transition temperature of 137.978175297484 K and a computed glass coefficient of thermal expansion (CTE) at 300 K of nan 1\/K. Answer: A polymer with PSMILES [*]O[Si]([*])(CC)CC"} {"text":"Question: What is a polymer with a computed glass transition temperature of 782.977843112764 K and a computed glass coefficient of thermal expansion (CTE) at 300 K of 111.482810962958 1\/K. 
Answer: A polymer with PSMILES [*]C(=O)c1ccc(-n2c(=O)c3cc4c(=O)n(-c5ccc([*])cc5)c(=O)c4cc3c2=O)cc1"}", "/scratch/micpie/export/bicerano_dataset/valid_0-4.jsonl": "{"text":"The polymer with the PSMILES of [*]=CCCC=[*] has a computed glass coefficient of thermal expansion (CTE) at 300 K of nan 1\/K."} {"text":"The polymer with the PSMILES of [*]C(=O)c1ccc2c(c1)C(=O)N(c1cccc(C(=O)c3cccc(N4C(=O)c5ccc([*])cc5C4=O)c3)c1)C2=O has a computed glass coefficient of thermal expansion (CTE) at 300 K of 111.627370947085 1\/K."}", "/scratch/micpie/export/bicerano_dataset/train_0-5.jsonl": "{"text":"The polymer with the PSMILES of [*]O[Si]([*])(CC)CC has a computed rubber coefficient of thermal expansion (CTE) at 300 K of 1047.6534189166 1\/K."} {"text":"The polymer with the PSMILES of [*]C(=O)c1ccc(-n2c(=O)c3cc4c(=O)n(-c5ccc([*])cc5)c(=O)c4cc3c2=O)cc1 has a computed rubber coefficient of thermal expansion (CTE) at 300 K of 268.200809943845 1\/K."}", "/scratch/micpie/export/bicerano_dataset/valid_0-15.jsonl": "{"text":"Question: What is a polymer with a computed glass transition temperature of 211.73349156608 K and a computed glass coefficient of thermal expansion (CTE) at 300 K of nan 1\/K. Answer: A polymer with PSMILES [*]=CCCC=[*]"} {"text":"Question: What is a polymer with a computed glass transition temperature of 743.820415639762 K and a computed glass coefficient of thermal expansion (CTE) at 300 K of 111.627370947085 1\/K. Answer: A polymer with PSMILES [*]C(=O)c1ccc2c(c1)C(=O)N(c1cccc(C(=O)c3cccc(N4C(=O)c5ccc([*])cc5C4=O)c3)c1)C2=O"}", "/scratch/micpie/export/bicerano_dataset/valid_0-12.jsonl": "{"text":"Question: What is a polymer with an experimental glass transition temperature of 171 K and experimental density at 300 K of 0.89 g\/cc. Answer: A polymer with PSMILES [*]=CCCC=[*]"} {"text":"Question: What is a polymer with an experimental glass transition temperature of 513 K and experimental density at 300 K of nan g\/cc. 
Answer: A polymer with PSMILES [*]C(=O)c1ccc2c(c1)C(=O)N(c1cccc(C(=O)c3cccc(N4C(=O)c5ccc([*])cc5C4=O)c3)c1)C2=O"}", "/scratch/micpie/export/bicerano_dataset/train_0-2.jsonl": "{"text":"The polymer with the PSMILES of [*]O[Si]([*])(CC)CC has an experimental density at 300 K of nan g\/cc."} {"text":"The polymer with the PSMILES of [*]C(=O)c1ccc(-n2c(=O)c3cc4c(=O)n(-c5ccc([*])cc5)c(=O)c4cc3c2=O)cc1 has an experimental density at 300 K of nan g\/cc."}", "/scratch/micpie/export/bicerano_dataset/test_0-11.jsonl": "{"text":"The polymer with the polymer name of Poly(1 , 1 -dimethylsilazane) has a computed rubber coefficient of thermal expansion (CTE) at 300 K of 1261.45801302291 1\/K."} {"text":"The polymer with the polymer name of Poly[N,N'-(p,p'-oxydiphenylene)pyromellitimide] has a computed rubber coefficient of thermal expansion (CTE) at 300 K of 300.67627793269 1\/K."}", "/scratch/micpie/export/bicerano_dataset/train_0-7.jsonl": "{"text":"The polymer with the polymer name of Poly[oxy(diethylsilylene)] has a computed glass transition temperature of 137.978175297484 K."} {"text":"The polymer with the polymer name of Poly[N,N'-(p,p'-carbonyldiphenylene)pyromellitimide] has a computed glass transition temperature of 782.977843112764 K."}", "/scratch/micpie/export/bicerano_dataset/train_0-11.jsonl": "{"text":"The polymer with the polymer name of Poly[oxy(diethylsilylene)] has a computed rubber coefficient of thermal expansion (CTE) at 300 K of 1047.6534189166 1\/K."} {"text":"The polymer with the polymer name of Poly[N,N'-(p,p'-carbonyldiphenylene)pyromellitimide] has a computed rubber coefficient of thermal expansion (CTE) at 300 K of 268.200809943845 1\/K."}", "/scratch/micpie/export/bicerano_dataset/train_0-1.jsonl": "{"text":"The polymer with the PSMILES of [*]O[Si]([*])(CC)CC has a computed glass transition temperature of 137.978175297484 K."} {"text":"The polymer with the PSMILES of [*]C(=O)c1ccc(-n2c(=O)c3cc4c(=O)n(-c5ccc([*])cc5)c(=O)c4cc3c2=O)cc1 has a computed 
glass transition temperature of 782.977843112764 K."}", "/scratch/micpie/export/bicerano_dataset/train_0-13.jsonl": "{"text":"Question: What is a polymer with a computed glass coefficient of thermal expansion (CTE) at 300 K of nan and a computed rubber coefficient of thermal expansion (CTE) at 300 K of 1047.6534189166 1\/K. Answer: A polymer with PSMILES [*]O[Si]([*])(CC)CC"} {"text":"Question: What is a polymer with a computed glass coefficient of thermal expansion (CTE) at 300 K of 111.482810962958 and a computed rubber coefficient of thermal expansion (CTE) at 300 K of 268.200809943845 1\/K. Answer: A polymer with PSMILES [*]C(=O)c1ccc(-n2c(=O)c3cc4c(=O)n(-c5ccc([*])cc5)c(=O)c4cc3c2=O)cc1"}", "/scratch/micpie/export/bicerano_dataset/train_0-4.jsonl": "{"text":"The polymer with the PSMILES of [*]O[Si]([*])(CC)CC has a computed glass coefficient of thermal expansion (CTE) at 300 K of nan 1\/K."} {"text":"The polymer with the PSMILES of [*]C(=O)c1ccc(-n2c(=O)c3cc4c(=O)n(-c5ccc([*])cc5)c(=O)c4cc3c2=O)cc1 has a computed glass coefficient of thermal expansion (CTE) at 300 K of 111.482810962958 1\/K."}", "/scratch/micpie/export/bicerano_dataset/test_0-7.jsonl": "{"text":"The polymer with the polymer name of Poly(1 , 1 -dimethylsilazane) has a computed glass transition temperature of 267.197133631238 K."} {"text":"The polymer with the polymer name of Poly[N,N'-(p,p'-oxydiphenylene)pyromellitimide] has a computed glass transition temperature of 817.214393934327 K."}", "/scratch/micpie/export/bicerano_dataset/train_0-9.jsonl": "{"text":"The polymer with the polymer name of Poly[oxy(diethylsilylene)] has a computed density at 300 K of 0.954614133314522 g\/cc."} {"text":"The polymer with the polymer name of Poly[N,N'-(p,p'-carbonyldiphenylene)pyromellitimide] has a computed density at 300 K of 1.26801287782886 g\/cc."}", "/scratch/micpie/export/bicerano_dataset/valid_0-3.jsonl": "{"text":"The polymer with the PSMILES of [*]=CCCC=[*] has a computed density at 300 K of 
0.88742931382377 g\/cc."} {"text":"The polymer with the PSMILES of [*]C(=O)c1ccc2c(c1)C(=O)N(c1cccc(C(=O)c3cccc(N4C(=O)c5ccc([*])cc5C4=O)c3)c1)C2=O has a computed density at 300 K of 1.28909956482974 g\/cc."}", "/scratch/micpie/export/bicerano_dataset/test_0-8.jsonl": "{"text":"The polymer with the polymer name of Poly(1 , 1 -dimethylsilazane) has an experimental density at 300 K of nan g\/cc."} {"text":"The polymer with the polymer name of Poly[N,N'-(p,p'-oxydiphenylene)pyromellitimide] has an experimental density at 300 K of nan g\/cc."}", "/scratch/micpie/export/bicerano_dataset/test_0-14.jsonl": "{"text":"Question: What is a polymer with a computed glass transition temperature of 267.197133631238 K and a computed density at 300 K of 1.06211724269242 g\/cc. Answer: A polymer with PSMILES [*]N[Si]([*])(C)C"} {"text":"Question: What is a polymer with a computed glass transition temperature of 817.214393934327 K and a computed density at 300 K of 1.28528865595221 g\/cc. Answer: A polymer with PSMILES [*]c1ccc(Oc2ccc(-n3c(=O)c4cc5c(=O)n([*])c(=O)c5cc4c3=O)cc2)cc1"}", "/scratch/micpie/export/bicerano_dataset/valid_0-14.jsonl": "{"text":"Question: What is a polymer with a computed glass transition temperature of 211.73349156608 K and a computed density at 300 K of 0.88742931382377 g\/cc. Answer: A polymer with PSMILES [*]=CCCC=[*]"} {"text":"Question: What is a polymer with a computed glass transition temperature of 743.820415639762 K and a computed density at 300 K of 1.28909956482974 g\/cc. 
Answer: A polymer with PSMILES [*]C(=O)c1ccc2c(c1)C(=O)N(c1cccc(C(=O)c3cccc(N4C(=O)c5ccc([*])cc5C4=O)c3)c1)C2=O"}", "/scratch/micpie/export/bicerano_dataset/test_0-4.jsonl": "{"text":"The polymer with the PSMILES of [*]N[Si]([*])(C)C has a computed glass coefficient of thermal expansion (CTE) at 300 K of nan 1\/K."} {"text":"The polymer with the PSMILES of [*]c1ccc(Oc2ccc(-n3c(=O)c4cc5c(=O)n([*])c(=O)c5cc4c3=O)cc2)cc1 has a computed glass coefficient of thermal expansion (CTE) at 300 K of 126.393260758123 1\/K."}", "/scratch/micpie/export/bicerano_dataset/test_0-12.jsonl": "{"text":"Question: What is a polymer with an experimental glass transition temperature of 191 K and experimental density at 300 K of nan g\/cc. Answer: A polymer with PSMILES [*]N[Si]([*])(C)C"} {"text":"Question: What is a polymer with an experimental glass transition temperature of 672 K and experimental density at 300 K of nan g\/cc. Answer: A polymer with PSMILES [*]c1ccc(Oc2ccc(-n3c(=O)c4cc5c(=O)n([*])c(=O)c5cc4c3=O)cc2)cc1"}", "/scratch/micpie/export/hiv/test_0-10.jsonl": "{"text":"User: I'm searching for the InChI of a molecule that is not active against HIV?\nAssistant: This is a molecule that is not active against HIV: InChI=1S\/C10H15ClNO3P\/c1-3-14-16(13,15-4-2)12-10-7-5-6-9(11)8-10\/h5-8H,3-4H2,1-2H3,(H,12,13)"} {"text":"User: I'm searching for the canonical SMILES of a molecule that is not active against the human immunodeficiency virus?\nAssistant: This is a molecule that is not active against the human immunodeficiency virus: CCCNC(=O)N(CCN(C)C)S(=O)(=O)c1ccc(Cl)cc1"}", "/scratch/micpie/export/hiv/valid_0-8.jsonl": "{"text":"User: Is the molecule with the SELFIES [C][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=N][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C] active against HIV?\nAssistant: No, it is not active against HIV."} {"text":"User: Is the molecule with the SMILES CN(C(=O)C12C3C4C1C1C2C3C41C(=O)O)C(C)(C)C active against the human immunodeficiency 
virus?\nAssistant: No, it is not active against the human immunodeficiency virus."}", "/scratch/micpie/export/hiv/train_0-8.jsonl": "{"text":"User: Is the molecule with the SMILES CCC1=[O+][Cu-3]2([O+]=C(CC)C1)[O+]=C(CC)CC(CC)=[O+]2 active against HIV?\nAssistant: No, it is not active against HIV."} {"text":"User: Is the molecule with the canonical SMILES O=C1C(=O)C2(CC1S(=O)(=O)c1ccccc1)c1ccccc1-c1ccccc12 active against HIV?\nAssistant: No, it is not active against HIV."}", "/scratch/micpie/export/hiv/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against HIV.\ncanonical SMILES: CCOP(=O)(Nc1cccc(Cl)c1)OCC\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not active against HIV."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against the human immunodeficiency virus.\nInChI: InChI=1S\/C14H22ClN3O3S\/c1-4-9-16-14(19)18(11-10-17(2)3)22(20,21)13-7-5-12(15)6-8-13\/h5-8H,4,9-11H2,1-3H3,(H,16,19)\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not active against the human immunodeficiency virus."}", "/scratch/micpie/export/hiv/valid_0-9.jsonl": "{"text":"User: Can you generate the SMILES of a molecule that is not active against HIV?\nAssistant: Yes, I'm happy to help, here you go: Cc1ccccc1NC(=N)Nc1ccccc1C"} {"text":"User: Can you generate the canonical SMILES of a molecule that is not active against the human immunodeficiency virus?\nAssistant: Sure, here you go: CN(C(=O)C12C3C4C1C1C2C3C41C(=O)O)C(C)(C)C"}", "/scratch/micpie/export/hiv/test_0-1.jsonl": "{"text":"Based on the SELFIES representation [C][C][O][P][=Branch1][C][=O][Branch1][=N][N][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1][O][C][C], the molecule displays no activity against the human immunodeficiency virus."} {"text":"Based on the SELFIES 
[C][C][C][N][C][=Branch1][C][=O][N][Branch1][Branch2][C][C][N][Branch1][C][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1], the molecule shows no activity against HIV."}", "/scratch/micpie/export/hiv/valid_0-0.jsonl": "{"text":"The molecule with the SELFIES [C][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=N][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C] shows no activity against HIV."} {"text":"The molecule with the DeepSMILES CNC=O)CCCC4CC6C6C64C=O)O)))))))))))CC)C)C displays no activity against the human immunodeficiency virus."}", "/scratch/micpie/export/hiv/test_0-2.jsonl": "{"text":"The canonical SMILES CCOP(=O)(Nc1cccc(Cl)c1)OCC represents a molecule that displays no activity against the human immunodeficiency virus."} {"text":"The canonical SMILES CCCNC(=O)N(CCN(C)C)S(=O)(=O)c1ccc(Cl)cc1 represents a molecule that shows no activity against the human immunodeficiency virus."}", "/scratch/micpie/export/hiv/valid_0-10.jsonl": "{"text":"User: I'm looking for the SELFIES of a molecule that is not active against the human immunodeficiency virus?\nAssistant: This is a molecule that is not active against the human immunodeficiency virus: [C][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=N][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C]"} {"text":"User: I'm looking for the InChI of a molecule that is not active against the human immunodeficiency virus?\nAssistant: This is a molecule that is not active against the human immunodeficiency virus: InChI=1S\/C15H19NO3\/c1-13(2,3)16(4)11(17)14-5-8-6(14)10-7(14)9(5)15(8,10)12(18)19\/h5-10H,1-4H3,(H,18,19)"}", "/scratch/micpie/export/hiv/train_0-6.jsonl": "{"text":"Task: Please generate a molecule DeepSMILES based on the description.\nDescription: A molecule that is active against HIV.\nResult: CCC=[O+][Cu-3][O+]=CCC))C6)))[O+]=CCC))CCCC))=[O+]6"} {"text":"Task: Please give me a SELFIES based on the text description below.\nDescription: A molecule that is 
active against the human immunodeficiency virus.\nResult: [O][=C][C][=Branch1][C][=O][C][Branch2][Ring1][Ring2][C][C][Ring1][=Branch1][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring2][Ring1][Branch2]"}", "/scratch/micpie/export/hiv/valid_0-6.jsonl": "{"text":"Task: Please create a InChI based on the text description.\nDescription: A molecule that is active against the human immunodeficiency virus.\nResult: InChI=1S\/C15H17N3\/c1-11-7-3-5-9-13(11)17-15(16)18-14-10-6-4-8-12(14)2\/h3-10H,1-2H3,(H3,16,17,18)"} {"text":"Task: Please create a molecule SELFIES based on the description.\nDescription: A molecule that is active against the human immunodeficiency virus.\nResult: [C][N][Branch2][Ring1][N][C][=Branch1][C][=O][C][C][C][C][Ring1][Ring2][C][C][Ring1][=Branch1][C][Ring1][=Branch1][C][Ring1][=Branch1][Ring1][Ring2][C][=Branch1][C][=O][O][C][Branch1][C][C][Branch1][C][C][C]"}", "/scratch/micpie/export/hiv/test_0-9.jsonl": "{"text":"User: Can you generate the canonical SMILES of a molecule that is not active against HIV?\nAssistant: Yes, here you go: CCOP(=O)(Nc1cccc(Cl)c1)OCC"} {"text":"User: Can you generate the InChI of a molecule that is not active against the human immunodeficiency virus?\nAssistant: Sure, here you go: InChI=1S\/C14H22ClN3O3S\/c1-4-9-16-14(19)18(11-10-17(2)3)22(20,21)13-7-5-12(15)6-8-13\/h5-8H,4,9-11H2,1-3H3,(H,16,19)"}", "/scratch/micpie/export/hiv/test_0-0.jsonl": "{"text":"The molecule with the InChI representation of InChI=1S\/C10H15ClNO3P\/c1-3-14-16(13,15-4-2)12-10-7-5-6-9(11)8-10\/h5-8H,3-4H2,1-2H3,(H,12,13) exhibits no activity against the human immunodeficiency virus."} {"text":"The molecule with the DeepSMILES representation of CCCNC=O)NCCNC)C))))S=O)=O)ccccCl)cc6 displays no activity against the human immunodeficiency virus."}", "/scratch/micpie/export/hiv/valid_0-7.jsonl": "{"text":"User: Can you derive if the molecule with the 
InChI InChI=1S\/C15H17N3\/c1-11-7-3-5-9-13(11)17-15(16)18-14-10-6-4-8-12(14)2\/h3-10H,1-2H3,(H3,16,17,18) is active against the human immunodeficiency virus?\nAssistant: No, this molecule is not active against the human immunodeficiency virus."} {"text":"User: Can you tell me if the molecule with the SMILES CN(C(=O)C12C3C4C1C1C2C3C41C(=O)O)C(C)(C)C is active against HIV?\nAssistant: No, this molecule is not active against HIV."}", "/scratch/micpie/export/hiv/test_0-3.jsonl": "{"text":"The molecule InChI InChI=1S\/C10H15ClNO3P\/c1-3-14-16(13,15-4-2)12-10-7-5-6-9(11)8-10\/h5-8H,3-4H2,1-2H3,(H,12,13) is not active against HIV."} {"text":"The molecule canonical SMILES CCCNC(=O)N(CCN(C)C)S(=O)(=O)c1ccc(Cl)cc1 is not active against the human immunodeficiency virus."}", "/scratch/micpie/export/hiv/valid_0-11.jsonl": "{"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be active against HIV.\nAssistant: Got it, here you go, this DeepSMILES is not active against HIV: Ccccccc6NC=N)Ncccccc6C"} {"text":"User: I want to create a InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should not be active against the human immunodeficiency virus.\nAssistant: Ok, here you go, this InChI is not active against the human immunodeficiency virus: InChI=1S\/C15H19NO3\/c1-13(2,3)16(4)11(17)14-5-8-6(14)10-7(14)9(5)15(8,10)12(18)19\/h5-10H,1-4H3,(H,18,19)"}", "/scratch/micpie/export/hiv/train_0-0.jsonl": "{"text":"The molecule with the SMILES CCC1=[O+][Cu-3]2([O+]=C(CC)C1)[O+]=C(CC)CC(CC)=[O+]2 displays no activity against HIV."} {"text":"The molecule with the SMILES O=C1C(=O)C2(CC1S(=O)(=O)c1ccccc1)c1ccccc1-c1ccccc12 displays no activity against HIV."}", "/scratch/micpie/export/hiv/test_0-6.jsonl": "{"text":"Task: Please generate a molecule SMILES based on the description below.\nDescription: A molecule that is active against the human immunodeficiency virus.\nResult: CCOP(=O)(Nc1cccc(Cl)c1)OCC"} {"text":"Task: Please generate a InChI based on the description below.\nDescription: A molecule that is active against the human immunodeficiency virus.\nResult: InChI=1S\/C14H22ClN3O3S\/c1-4-9-16-14(19)18(11-10-17(2)3)22(20,21)13-7-5-12(15)6-8-13\/h5-8H,4,9-11H2,1-3H3,(H,16,19)"}", "/scratch/micpie/export/hiv/train_0-10.jsonl": "{"text":"User: I'm looking for the SELFIES of a molecule that is not active against HIV?\nAssistant: This is a molecule that is not active against HIV: [C][C][C][=O+1][Cu-3][Branch1][#Branch2][O+1][=C][Branch1][Ring1][C][C][C][Ring1][Branch2][O+1][=C][Branch1][Ring1][C][C][C][C][Branch1][Ring1][C][C][=O+1][Ring1][#C]"} {"text":"User: I'm searching for the SMILES of a molecule that is not active against the human immunodeficiency virus?\nAssistant: This is a molecule that is not active against the human immunodeficiency virus: O=C1C(=O)C2(CC1S(=O)(=O)c1ccccc1)c1ccccc1-c1ccccc12"}", "/scratch/micpie/export/hiv/train_0-3.jsonl": "{"text":"The SELFIES [C][C][C][=O+1][Cu-3][Branch1][#Branch2][O+1][=C][Branch1][Ring1][C][C][C][Ring1][Branch2][O+1][=C][Branch1][Ring1][C][C][C][C][Branch1][Ring1][C][C][=O+1][Ring1][#C] is not 
active against the human immunodeficiency virus."} {"text":"The molecule DeepSMILES O=CC=O)CCC5S=O)=O)cccccc6)))))))))cccccc6-cccccc6%13 is not active against HIV."}", "/scratch/micpie/export/hiv/train_0-12.jsonl": "{"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be active against HIV.\nAssistant: Got it, this DeepSMILES is not active against HIV: CCC=[O+][Cu-3][O+]=CCC))C6)))[O+]=CCC))CCCC))=[O+]6"} {"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should not be active against the human immunodeficiency virus.\nAssistant: Got it, this DeepSMILES is not active against the human immunodeficiency virus: O=CC=O)CCC5S=O)=O)cccccc6)))))))))cccccc6-cccccc6%13"}", "/scratch/micpie/export/hiv/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not active against HIV?\nConstraint: You must select none, one or more options from a, b, c, or d without using any other words.\nOptions:\na. CNC=O)Cncn[nH]c=O)c6=O)))))-cccCl)ccc6%11\nb. COC=O)cccC=CccF)cF)cF)cF)c6F))))))))cccCl)cOC))cC=O)OC)))c6)))))))ccCl)c6OC\nc. CcccccS=O)=O)ccccCl)cc6[N+]=O)[O-])))))))))c6))NCCC6\nd. 
CCOP=O)NcccccCl)c6)))))))OCC\nAnswer: a, b, c, d"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not active against HIV?\nConstraint: You must select none, one or more options from a, b, c, d, or e without using any additional words.\nOptions:\n(a) [C][C][C][N][C][=Branch1][C][=O][N][Branch1][Branch2][C][C][N][Branch1][C][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1]\n(b) [C][O][C][=C][C][Branch2][Branch1][O][C][C][Branch2][Ring1][Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][C][=C][Ring1][=Branch2][=C][Branch1][C][C][N][C][Branch1][C][C][=C][Ring2][Ring1][Ring2][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][C][=C][Ring1][=Branch2][=C][C][=C][Ring2][Ring2][=Branch1][O]\n(c) [O][=C][C][C][C][C][C][C][=C][C][C][C][Branch1][#Branch2][S][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring2][Ring1][Ring1][Ring1][N]\n(d) [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][C][N][C][=Branch1][C][=O][N][C][=C][C][Branch1][C][C][=C][C][=N][Ring1][#Branch1][C][=C][Ring1][P]\n(e) [C][C][=C][Branch2][Ring1][Branch1][C][=Branch1][C][=O][N][C][=C][C][Branch1][C][F][=C][Branch1][C][F][C][=C][Ring1][Branch2][F][C][Branch1][S][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O][Ring1][=Branch1][C][Branch2][Ring1][Branch1][C][=Branch1][C][=O][N][C][=C][C][Branch1][C][F][=C][Branch1][C][F][C][=C][Ring1][Branch2][F][=C][Branch1][C][C][N][Ring2][Ring2][Branch2]\nAnswer: a, b, c, d, e"}", "/scratch/micpie/export/hiv/valid_0-2.jsonl": "{"text":"The canonical SMILES Cc1ccccc1NC(=N)Nc1ccccc1C represents a molecule that exhibits no activity against the human immunodeficiency virus."} {"text":"The SELFIES 
[C][N][Branch2][Ring1][N][C][=Branch1][C][=O][C][C][C][C][Ring1][Ring2][C][C][Ring1][=Branch1][C][Ring1][=Branch1][C][Ring1][=Branch1][Ring1][Ring2][C][=Branch1][C][=O][O][C][Branch1][C][C][Branch1][C][C][C] is from a molecule that exhibits no activity against HIV."}", "/scratch/micpie/export/hiv/valid_0-1.jsonl": "{"text":"Based on the DeepSMILES Ccccccc6NC=N)Ncccccc6C, the molecule shows no activity against the human immunodeficiency virus."} {"text":"Based on the DeepSMILES CNC=O)CCCC4CC6C6C64C=O)O)))))))))))CC)C)C, the molecule shows no activity against the human immunodeficiency virus."}", "/scratch/micpie/export/hiv/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not active against HIV?\nConstraint: You must select none, one or more options from a or b without using any additional words.\nOptions:\n[a] Cc1ccccc1NC(=N)Nc1ccccc1C\n[b] CC(CCn1[nH]c(=O)ccc1=O)=NNC(=O)C(N)=O\nAnswer: a, b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not active against the human immunodeficiency virus?\nConstraint: You must select none, one or more options from a, b, or c without using any additional words.\nOptions:\na. COC(C(O)c1ccc2c(c1)OCO2)P(=O)(c1ccccc1)c1ccccc1\nb. CC(=O)OC1(C#N)CC2OC1C1N=NN(C(=O)OC(C)(C)C)C21\nc. 
CN(C(=O)C12C3C4C1C1C2C3C41C(=O)O)C(C)(C)C\nAnswer: a, b, c"}", "/scratch/micpie/export/hiv/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against the human immunodeficiency virus.\nMolecule InChI: InChI=1S\/C15H17N3\/c1-11-7-3-5-9-13(11)17-15(16)18-14-10-6-4-8-12(14)2\/h3-10H,1-2H3,(H3,16,17,18)\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not active against the human immunodeficiency virus."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against the human immunodeficiency virus.\nMolecule canonical SMILES: CN(C(=O)C12C3C4C1C1C2C3C41C(=O)O)C(C)(C)C\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not active against the human immunodeficiency virus."}", "/scratch/micpie/export/hiv/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against the human immunodeficiency virus.\nMolecule DeepSMILES: Ccccccc6NC=N)Ncccccc6C\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against the human immunodeficiency virus.\nMolecule InChI: InChI=1S\/C15H19NO3\/c1-13(2,3)16(4)11(17)14-5-8-6(14)10-7(14)9(5)15(8,10)12(18)19\/h5-10H,1-4H3,(H,18,19)\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/hiv/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against the human immunodeficiency virus.\nInChI: InChI=1S\/2C7H12O2.Cu\/c2*1-3-6(8)5-7(9)4-2;\/h2*3-5H2,1-2H3;\/q;;+1\nConstraint: Answer the question in a full sentence.\nResult: This 
molecule is not active against the human immunodeficiency virus."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against HIV.\nMolecule canonical SMILES: O=C1C(=O)C2(CC1S(=O)(=O)c1ccccc1)c1ccccc1-c1ccccc12\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not active against HIV."}", "/scratch/micpie/export/hiv/valid_0-12.jsonl": "{"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be active against the human immunodeficiency virus.\nAssistant: Understood, this DeepSMILES is not active against the human immunodeficiency virus: Ccccccc6NC=N)Ncccccc6C"} {"text":"User: I want to create a InChI.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be active against HIV.\nAssistant: Got it, this InChI is not active against HIV: InChI=1S\/C15H19NO3\/c1-13(2,3)16(4)11(17)14-5-8-6(14)10-7(14)9(5)15(8,10)12(18)19\/h5-10H,1-4H3,(H,18,19)"}", "/scratch/micpie/export/hiv/train_0-2.jsonl": "{"text":"The SELFIES [C][C][C][=O+1][Cu-3][Branch1][#Branch2][O+1][=C][Branch1][Ring1][C][C][C][Ring1][Branch2][O+1][=C][Branch1][Ring1][C][C][C][C][Branch1][Ring1][C][C][=O+1][Ring1][#C] is from a molecule that exhibits no activity against HIV."} {"text":"The SELFIES [O][=C][C][=Branch1][C][=O][C][Branch2][Ring1][Ring2][C][C][Ring1][=Branch1][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring2][Ring1][Branch2] is from a molecule that displays no activity against HIV."}", "/scratch/micpie/export/hiv/test_0-11.jsonl": "{"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should not be active against the human immunodeficiency virus.\nAssistant: Got it, this DeepSMILES is not active against the human immunodeficiency virus: CCOP=O)NcccccCl)c6)))))))OCC"} {"text":"User: I want to create a SELFIES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be active against the human immunodeficiency virus.\nAssistant: Ok, here you go, this SELFIES is not active against the human immunodeficiency virus: [C][C][C][N][C][=Branch1][C][=O][N][Branch1][Branch2][C][C][N][Branch1][C][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1]"}", "/scratch/micpie/export/hiv/train_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the SMILES CCC1=[O+][Cu-3]2([O+]=C(CC)C1)[O+]=C(CC)CC(CC)=[O+]2 is active against HIV?\nAssistant: No, this molecule is not active against HIV."} {"text":"User: Can you derive if the molecule with the SMILES O=C1C(=O)C2(CC1S(=O)(=O)c1ccccc1)c1ccccc1-c1ccccc12 is active against the human immunodeficiency virus?\nAssistant: No, this molecule is not active against the human immunodeficiency virus."}", "/scratch/micpie/export/hiv/train_0-11.jsonl": "{"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be active against the human immunodeficiency virus.\nAssistant: Got it, here you go, this DeepSMILES is not active against the human immunodeficiency virus: CCC=[O+][Cu-3][O+]=CCC))C6)))[O+]=CCC))CCCC))=[O+]6"} {"text":"User: I want to generate a InChI.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should not be active against HIV.\nAssistant: Ok, here you go, this InChI is not active against HIV: InChI=1S\/C23H16O4S\/c24-21-20(28(26,27)15-8-2-1-3-9-15)14-23(22(21)25)18-12-6-4-10-16(18)17-11-5-7-13-19(17)23\/h1-13,20H,14H2"}", "/scratch/micpie/export/hiv/train_0-1.jsonl": "{"text":"Based on the InChI InChI=1S\/2C7H12O2.Cu\/c2*1-3-6(8)5-7(9)4-2;\/h2*3-5H2,1-2H3;\/q;;+1, the molecule exhibits no activity against the human immunodeficiency virus."} {"text":"Based on the InChI representation InChI=1S\/C23H16O4S\/c24-21-20(28(26,27)15-8-2-1-3-9-15)14-23(22(21)25)18-12-6-4-10-16(18)17-11-5-7-13-19(17)23\/h1-13,20H,14H2, the molecule exhibits no activity against the human immunodeficiency virus."}", "/scratch/micpie/export/hiv/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not active against HIV?\nConstraint: You must select none, one or more options from A or B without using any other words.\nOptions:\nA) InChI=1S\/C21H28F3NO5\/c1-3-4-5-6-7-8-9-10-11-19(27)30-25(20(28)21(22,23)24)15-16-12-13-17(26)18(14-16)29-2\/h3,12-14,26H,1,4-11,15H2,2H3\nB) InChI=1S\/2C7H12O2.Cu\/c2*1-3-6(8)5-7(9)4-2;\/h2*3-5H2,1-2H3;\/q;;+1\nAnswer: A, B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not active against HIV?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any additional words.\nOptions:\nA.) CCOC=O)CNC=O)CC))))CF)F)F))P=O)OCC)))OCC\nB.) CCCCC=CcccCl)cO)cC=O)O))c6))))))cccCl)cO)cC=O)O))c6))))))))))CCCCCCCCCCCCC6C)C%10CCC%17%14C.N\nC.) N=CCSS=O)=O)O))))NCCSS=O)=O)O\nD.) COccccNC=O)CC=O)nnc-cccccc6))))))cN=Ncccccc6C)))))))))c5-cccccc6)))))))))))))))cc6\nE.) 
O=CC=O)CCC5S=O)=O)cccccc6)))))))))cccccc6-cccccc6%13\nAnswer: A, C, D, E"}", "/scratch/micpie/export/hiv/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against HIV.\nDeepSMILES: CCC=[O+][Cu-3][O+]=CCC))C6)))[O+]=CCC))CCCC))=[O+]6\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against the human immunodeficiency virus.\nMolecule SMILES: O=C1C(=O)C2(CC1S(=O)(=O)c1ccccc1)c1ccccc1-c1ccccc12\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/hiv/test_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the SMILES CCOP(=O)(Nc1cccc(Cl)c1)OCC is active against HIV?\nAssistant: No, this molecule is not active against HIV."} {"text":"User: Can you estimate if the molecule with the DeepSMILES CCCNC=O)NCCNC)C))))S=O)=O)ccccCl)cc6 is active against HIV?\nAssistant: No, this molecule is not active against HIV."}", "/scratch/micpie/export/hiv/train_0-9.jsonl": "{"text":"User: Can you generate the canonical SMILES of a molecule that is not active against the human immunodeficiency virus?\nAssistant: Yes, here you go: CCC1=[O+][Cu-3]2([O+]=C(CC)C1)[O+]=C(CC)CC(CC)=[O+]2"} {"text":"User: Can you give me the canonical SMILES of a molecule that is not active against the human immunodeficiency virus?\nAssistant: Yes, I'm happy to help, here you go: O=C1C(=O)C2(CC1S(=O)(=O)c1ccccc1)c1ccccc1-c1ccccc12"}", "/scratch/micpie/export/hiv/valid_0-3.jsonl": "{"text":"The SELFIES [C][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=N][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C] is not active against the human immunodeficiency virus."} {"text":"The InChI 
InChI=1S\/C15H19NO3\/c1-13(2,3)16(4)11(17)14-5-8-6(14)10-7(14)9(5)15(8,10)12(18)19\/h5-10H,1-4H3,(H,18,19) is not active against the human immunodeficiency virus."}", "/scratch/micpie/export/hiv/test_0-8.jsonl": "{"text":"User: Is the molecule with the SMILES CCOP(=O)(Nc1cccc(Cl)c1)OCC active against the human immunodeficiency virus?\nAssistant: No, it is not active against the human immunodeficiency virus."} {"text":"User: Is the molecule with the canonical SMILES CCCNC(=O)N(CCN(C)C)S(=O)(=O)c1ccc(Cl)cc1 active against the human immunodeficiency virus?\nAssistant: No, it is not active against the human immunodeficiency virus."}", "/scratch/micpie/export/hiv/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against the human immunodeficiency virus.\nMolecule InChI: InChI=1S\/C10H15ClNO3P\/c1-3-14-16(13,15-4-2)12-10-7-5-6-9(11)8-10\/h5-8H,3-4H2,1-2H3,(H,12,13)\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against HIV.\nMolecule canonical SMILES: CCCNC(=O)N(CCN(C)C)S(=O)(=O)c1ccc(Cl)cc1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/hiv/test_0-12.jsonl": "{"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be active against the human immunodeficiency virus.\nAssistant: Ok, this SMILES is not active against the human immunodeficiency virus: CCOP(=O)(Nc1cccc(Cl)c1)OCC"} {"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be active against HIV.\nAssistant: Got it, this canonical SMILES is not active against HIV: CCCNC(=O)N(CCN(C)C)S(=O)(=O)c1ccc(Cl)cc1"}", "/scratch/micpie/export/bio_ner_3/valid_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Treatment was discontinued for two months, and KS recurred, with pain and edema in both legs..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: KS,47,49,Anatomy\nedema,74,79,Anatomy\nlegs,88,92,Anatomy"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Each model included (i) experimental stage (end of first sufficient diet phase at experimental day 14; 14 days after initiation of the micronutrient deficient diets; 14 days after return to the sufficient diet), (ii) treatment group (deficiency versus control) and (iii) their interaction as fixed effects, with individual mice treated as a random effect..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: sufficient diet,59,74,treatment\nmicronutrient deficient diets,137,166,treatment\nsufficient diet,196,211,treatment"}", "/scratch/micpie/export/bio_ner_3/test_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: In this procedure, an opening is made in the trabecular meshwork, so that aqueous humor can drain into the sclera..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: trabecular,45,55,Anatomy\naqueous humor,74,87,Anatomy\nsclera,107,113,Anatomy"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: In the microbiota susceptibility 
cancer experiment, WT mice and WT mice colonized with 8-week-old male APC min \/+ mice microbiota (multiple antibiotics for 3 weeks plus 2 week microbiota adjustment) were exposed to AOM\/DSS colon tumor cycles..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: APC min \/ + mice microbiota,107,134,treatment\nantibiotics,146,157,treatment\nAOM \/ DSS,221,230,treatment"}", "/scratch/micpie/export/bio_ner_3/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Ligation of heparan sulfate proteoglycans (HSPGs) of endothelium with antithrombin (AT) induces cellular signalling events that alter the cell's biochemical and functional responses to inflammatory stimuli (e. g. bacterial lipopolysaccharide [ LPS])..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: endothelium,54,65,Anatomy\ncellular,98,106,Anatomy\ncell,140,144,Anatomy"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Afterwards, drinking water was substituted by organic fennel tea (Hipp, Pfaffenhofen, Germany) ad libitum as fennel tea can mask the bitter taste of DSS that would reduce liquid intake in rabbits..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: organic fennel tea,46,64,treatment\nfennel tea,110,120,treatment\nDSS,150,153,treatment"}", "/scratch/micpie/export/nr_aromatase_tox21/test_0-10.jsonl": "{"text":"User: Can you generate the SMILES of a molecule that is not toxic in the Aromatase enzyme assay?\nAssistant: Sure, here you go: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"User: Can you create the InChI of a molecule that is not toxic in the Aromatase enzyme assay?\nAssistant: 
Yes, here you go: InChI=1S\/C12H18O\/c1-11(2)8-9-13-10-12-6-4-3-5-7-12\/h3-7,11H,8-10H2,1-2H3"}", "/scratch/micpie/export/nr_aromatase_tox21/valid_0-8.jsonl": "{"text":"User: Can you estimate if the molecule with the canonical SMILES Cc1nc(C)c(C)nc1C is toxic in the NR-Aromatase enzyme assay?\nAssistant: No, this molecule is not toxic in the NR-Aromatase enzyme assay."} {"text":"User: Can you tell me if the molecule with the canonical SMILES CN(C)C(=O)Oc1ccc[n+](C)c1 is toxic in the NR-Aromatase enzyme assay?\nAssistant: No, this molecule is not toxic in the NR-Aromatase enzyme assay."}", "/scratch/micpie/export/nr_aromatase_tox21/test_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the Aromatase enzyme assay?\nConstraint: You must select none, one or more options from a, b, or c without using any other words.\nOptions:\n(a) CCC)=CCC\/CC)=C\/CC\/CC)=C\/CC\/CC)=C\/CC=CC)C=O)cccccc6C%10=O\n(b) COcccccc6NC=O)cccccccc6cc%10O\n(c) CCCNCC))CCC))C=O)NccC)cccc6C\nAnswer: a, c"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the Aromatase enzyme assay?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any additional words.\nOptions:\n[1] CCC)CCOCcccccc6\n[2] COP=O)OC))OC\n[3] CCC=O)NccccO)cc6\n[4] CCCCCCCCNC)C\nAnswer: 1, 2, 3, 4"}", "/scratch/micpie/export/nr_aromatase_tox21/train_0-8.jsonl": "{"text":"User: Can you figure out if the molecule with the SMILES CCN1C(=O)NC(c2ccccc2)C1=O is toxic in the Aromatase enzyme assay?\nAssistant: No, this molecule is not toxic in the Aromatase enzyme assay."} {"text":"User: Can you figure out if the molecule with the SMILES CCCCCCCCCCCCCCCC(=O)OCC is toxic in the NR-Aromatase enzyme assay?\nAssistant: No, this molecule is not toxic in the NR-Aromatase enzyme assay."}", "/scratch/micpie/export/nr_aromatase_tox21/test_0-5.jsonl": "{"text":"Task: Please classify a molecule 
based on the description.\nDescription: A molecule that is toxic in the NR-Aromatase enzyme assay.\nMolecule SELFIES: [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-Aromatase enzyme assay.\nInChI: InChI=1S\/C12H18O\/c1-11(2)8-9-13-10-12-6-4-3-5-7-12\/h3-7,11H,8-10H2,1-2H3\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/nr_aromatase_tox21/valid_0-9.jsonl": "{"text":"User: Is the molecule with the SMILES Cc1nc(C)c(C)nc1C toxic in the NR-Aromatase enzyme assay?\nAssistant: No, it is not toxic in the NR-Aromatase enzyme assay."} {"text":"User: Is the molecule with the DeepSMILES CNC)C=O)Occcc[n+]C)c6 toxic in the NR-Aromatase enzyme assay?\nAssistant: No, it is not toxic in the NR-Aromatase enzyme assay."}", "/scratch/micpie/export/nr_aromatase_tox21/test_0-1.jsonl": "{"text":"The molecule with the DeepSMILES CCCNCC))CCC))C=O)NccC)cccc6C is not demonstrating toxicity in the NR-aromatase enzyme assay."} {"text":"The molecule with the InChI InChI=1S\/C12H18O\/c1-11(2)8-9-13-10-12-6-4-3-5-7-12\/h3-7,11H,8-10H2,1-2H3 is not exhibiting toxicity in the NR-Aromatase enzyme assay."}", "/scratch/micpie/export/nr_aromatase_tox21/valid_0-0.jsonl": "{"text":"The molecule with the canonical SMILES Cc1nc(C)c(C)nc1C is not toxic in the Aromatase enzyme assay."} {"text":"The molecule with the DeepSMILES representation of CNC)C=O)Occcc[n+]C)c6 is not toxic in the Aromatase enzyme assay."}", "/scratch/micpie/export/nr_aromatase_tox21/test_0-2.jsonl": "{"text":"Based on the DeepSMILES CCCNCC))CCC))C=O)NccC)cccc6C, the molecule has no Aromatase 
enzyme toxicity features."} {"text":"Based on the InChI InChI=1S\/C12H18O\/c1-11(2)8-9-13-10-12-6-4-3-5-7-12\/h3-7,11H,8-10H2,1-2H3, the molecule has no NR-Aromatase enzyme toxicity features."}", "/scratch/micpie/export/nr_aromatase_tox21/valid_0-10.jsonl": "{"text":"User: Can you give me the DeepSMILES of a molecule that is not toxic in the NR-Aromatase enzyme assay?\nAssistant: Of course, here you go: CcncC)cC)nc6C"} {"text":"User: Can you generate the canonical SMILES of a molecule that is not toxic in the NR-Aromatase enzyme assay?\nAssistant: Yes, here you go: CN(C)C(=O)Oc1ccc[n+](C)c1"}", "/scratch/micpie/export/nr_aromatase_tox21/train_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Aromatase enzyme assay.\nMolecule SELFIES: [C][C][N][C][=Branch1][C][=O][N][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][N][=O]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the Aromatase enzyme assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-Aromatase enzyme assay.\nSELFIES: [C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][=Branch1][C][=O][O][C][C]\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the NR-Aromatase enzyme assay."}", "/scratch/micpie/export/nr_aromatase_tox21/valid_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Aromatase enzyme assay.\nMolecule SELFIES: [C][C][=N][C][Branch1][C][C][=C][Branch1][C][C][N][=C][Ring1][Branch2][C]\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the Aromatase enzyme assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-Aromatase enzyme assay.\ncanonical SMILES: 
CN(C)C(=O)Oc1ccc[n+](C)c1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the NR-Aromatase enzyme assay."}", "/scratch/micpie/export/nr_aromatase_tox21/test_0-9.jsonl": "{"text":"User: Is the molecule with the SELFIES [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C] toxic in the NR-Aromatase enzyme assay?\nAssistant: No, it is not toxic in the NR-Aromatase enzyme assay."} {"text":"User: Is the molecule with the canonical SMILES CC(C)CCOCc1ccccc1 toxic in the NR-Aromatase enzyme assay?\nAssistant: No, it is not toxic in the NR-Aromatase enzyme assay."}", "/scratch/micpie/export/nr_aromatase_tox21/test_0-0.jsonl": "{"text":"The molecule with the InChI representation of InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20) is not toxic in the Aromatase enzyme assay."} {"text":"The molecule with the SMILES CC(C)CCOCc1ccccc1 is not toxic in the NR-Aromatase enzyme assay."}", "/scratch/micpie/export/nr_aromatase_tox21/valid_0-7.jsonl": "{"text":"Task: Please give me a molecule SMILES based on the text description below.\nDescription: A molecule that is toxic in the NR-Aromatase enzyme assay.\nResult: Cc1nc(C)c(C)nc1C"} {"text":"Task: Please generate a canonical SMILES based on the description.\nDescription: A molecule that is toxic in the NR-Aromatase enzyme assay.\nResult: CN(C)C(=O)Oc1ccc[n+](C)c1"}", "/scratch/micpie/export/nr_aromatase_tox21/test_0-3.jsonl": "{"text":"The DeepSMILES CCCNCC))CCC))C=O)NccC)cccc6C represents a molecule that is not identified as toxic in the NR-Aromatase enzyme assay."} {"text":"The DeepSMILES CCC)CCOCcccccc6 represents a molecule that is not identified as toxic in the Aromatase enzyme assay."}", "/scratch/micpie/export/nr_aromatase_tox21/valid_0-11.jsonl": "{"text":"User: I'm looking for the canonical SMILES of a molecule that is not toxic in the 
NR-Aromatase enzyme assay?\nAssistant: This is a molecule that is not toxic in the NR-Aromatase enzyme assay: Cc1nc(C)c(C)nc1C"} {"text":"User: I'm searching for the canonical SMILES of a molecule that is not toxic in the Aromatase enzyme assay?\nAssistant: This is a molecule that is not toxic in the Aromatase enzyme assay: CN(C)C(=O)Oc1ccc[n+](C)c1"}", "/scratch/micpie/export/nr_aromatase_tox21/train_0-0.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][N][C][=Branch1][C][=O][N][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][N][=O] is not toxic in the Aromatase enzyme assay."} {"text":"The molecule with the InChI InChI=1S\/C18H36O2\/c1-3-5-6-7-8-9-10-11-12-13-14-15-16-17-18(19)20-4-2\/h3-17H2,1-2H3 is not toxic in the NR-Aromatase enzyme assay."}", "/scratch/micpie/export/nr_aromatase_tox21/test_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-Aromatase enzyme assay.\nSMILES: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the NR-Aromatase enzyme assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-Aromatase enzyme assay.\nMolecule DeepSMILES: CCC)CCOCcccccc6\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the NR-Aromatase enzyme assay."}", "/scratch/micpie/export/nr_aromatase_tox21/train_0-10.jsonl": "{"text":"User: Can you give me the SMILES of a molecule that is not toxic in the NR-Aromatase enzyme assay?\nAssistant: Yes, here you go: CCN1C(=O)NC(c2ccccc2)C1=O"} {"text":"User: Can you create the canonical SMILES of a molecule that is not toxic in the NR-Aromatase enzyme assay?\nAssistant: Yes, I'm happy to help, here you go: CCCCCCCCCCCCCCCC(=O)OCC"}", "/scratch/micpie/export/nr_aromatase_tox21/train_0-3.jsonl": "{"text":"The DeepSMILES 
CCNC=O)NCcccccc6))))))C5=O is from a molecule that is not identified as toxic in the Aromatase enzyme assay."} {"text":"The InChI InChI=1S\/C18H36O2\/c1-3-5-6-7-8-9-10-11-12-13-14-15-16-17-18(19)20-4-2\/h3-17H2,1-2H3 is from a molecule that is not identified as toxic in the Aromatase enzyme assay."}", "/scratch/micpie/export/nr_aromatase_tox21/train_0-12.jsonl": "{"text":"User: I want to come up with a SELFIES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be toxic in the NR-Aromatase enzyme assay.\nAssistant: Ok, here you go, this SELFIES is not toxic in the NR-Aromatase enzyme assay: [C][C][N][C][=Branch1][C][=O][N][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][N][=O]"} {"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be toxic in the NR-Aromatase enzyme assay.\nAssistant: Ok, here you go, this SMILES is not toxic in the NR-Aromatase enzyme assay: CCCCCCCCCCCCCCCC(=O)OCC"}", "/scratch/micpie/export/nr_aromatase_tox21/test_0-13.jsonl": "{"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the NR-Aromatase enzyme assay.\nAssistant: Understood, this DeepSMILES is not toxic in the NR-Aromatase enzyme assay: CCCNCC))CCC))C=O)NccC)cccc6C"} {"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the Aromatase enzyme assay.\nAssistant: Understood, this InChI is not toxic in the Aromatase enzyme assay: InChI=1S\/C12H18O\/c1-11(2)8-9-13-10-12-6-4-3-5-7-12\/h3-7,11H,8-10H2,1-2H3"}", "/scratch/micpie/export/nr_aromatase_tox21/valid_0-2.jsonl": "{"text":"Based on the SMILES representation Cc1nc(C)c(C)nc1C, the molecule has no NR-Aromatase enzyme toxicity properties."} {"text":"Based on the InChI representation InChI=1S\/C9H13N2O2\/c1-10(2)9(12)13-8-5-4-6-11(3)7-8\/h4-7H,1-3H3\/q+1, the molecule has no Aromatase enzyme toxicity features."}", "/scratch/micpie/export/nr_aromatase_tox21/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES CCN1C(=O)NC(c2ccccc2)C1=O toxic in the Aromatase enzyme assay?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any other words.\nOptions:\n1. False\n2. True\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES representation of CCCCCCCCCCCCCCCC=O)OCC toxic in the Aromatase enzyme assay?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any other words.\nOptions:\n[1] False\n[2] True\nAnswer: 1"}", "/scratch/micpie/export/nr_aromatase_tox21/valid_0-1.jsonl": "{"text":"The molecule with the SMILES representation of Cc1nc(C)c(C)nc1C is not demonstrating toxicity in the NR-aromatase enzyme assay."} {"text":"The molecule with the SMILES CN(C)C(=O)Oc1ccc[n+](C)c1 is not exhibiting toxicity in the NR-Aromatase enzyme assay."}", "/scratch/micpie/export/nr_aromatase_tox21/valid_0-13.jsonl": "{"text":"User: I want to generate a SMILES.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should not be toxic in the Aromatase enzyme assay.\nAssistant: Understood, this SMILES is not toxic in the Aromatase enzyme assay: Cc1nc(C)c(C)nc1C"} {"text":"User: I want to generate a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the NR-Aromatase enzyme assay.\nAssistant: Understood, this canonical SMILES is not toxic in the NR-Aromatase enzyme assay: CN(C)C(=O)Oc1ccc[n+](C)c1"}", "/scratch/micpie/export/nr_aromatase_tox21/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-Aromatase enzyme assay.\nDeepSMILES: CcncC)cC)nc6C\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Aromatase enzyme assay.\nMolecule SMILES: CN(C)C(=O)Oc1ccc[n+](C)c1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"}", "/scratch/micpie/export/nr_aromatase_tox21/train_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-Aromatase enzyme assay?\nConstraint: You must select none, one or more options from 1, 2, 3, 4, or 5 without using any additional words.\nOptions:\n1) CCCCNCC(O)c1ccc(O)cc1\n2) CCCCC1(COC(=O)CCC(=O)O)C(=O)N(c2ccccc2)N(c2ccccc2)C1=O\n3) CCCCCCCCCCCl\n4) CCn1nc(Cc2ccccc2)cc1C1CCN(C[C@H]2C[C@H](N(C)[C@@H](C(=O)O)C(C)C)C[C@@H]2c2cccc(F)c2)CC1\n5) CCN1C(=O)NC(c2ccccc2)C1=O\nAnswer: 1, 3, 4, 5"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-Aromatase enzyme assay?\nConstraint: You must select none, one or more options from A, B, C, or D without 
using any other words.\nOptions:\n(A) COccccCcncccccOC))cOC))cc%106)))))))))))cc6OC\n(B) CCCCCCCCCCCCCCCC=O)OCC\n(C) CCOC=O)C=CC)NCC)=CC=O)OCC))))C6cccccc6\/C=C\/C=O)OCC)C)C\n(D) FCF)F)CF)F)CF)F)CF)F)CF)F)CF)F)F\nAnswer: A, B, D"}", "/scratch/micpie/export/nr_aromatase_tox21/valid_0-4.jsonl": "{"text":"The InChI InChI=1S\/C8H12N2\/c1-5-6(2)10-8(4)7(3)9-5\/h1-4H3 is not toxic in the Aromatase enzyme assay."} {"text":"The SELFIES [C][N][Branch1][C][C][C][=Branch1][C][=O][O][C][=C][C][=C][N+1][Branch1][C][C][=C][Ring1][#Branch1] is not toxic in the Aromatase enzyme assay."}", "/scratch/micpie/export/nr_aromatase_tox21/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-Aromatase enzyme assay.\nMolecule DeepSMILES: CCNC=O)NCcccccc6))))))C5=O\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Aromatase enzyme assay.\nMolecule DeepSMILES: CCCCCCCCCCCCCCCC=O)OCC\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"}", "/scratch/micpie/export/nr_aromatase_tox21/valid_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the Aromatase enzyme assay?\nConstraint: You must select none, one or more options from a, b, c, or d without using any additional words.\nOptions:\n(a) InChI=1S\/C14H12N4O2S\/c15-10-5-7-11(8-6-10)21(19,20)18-14-9-16-12-3-1-2-4-13(12)17-14\/h1-9H,15H2,(H,17,18)\n(b) InChI=1S\/C7H18N2O2\/c8-2-1-3-9(4-6-10)5-7-11\/h10-11H,1-8H2\n(c) InChI=1S\/C14H9F3O2\/c15-14(16,17)10-7-5-9(6-8-10)11-3-1-2-4-12(11)13(18)19\/h1-8H,(H,18,19)\n(d) InChI=1S\/C8H12N2\/c1-5-6(2)10-8(4)7(3)9-5\/h1-4H3\nAnswer: a, b, c, d"} {"text":"Task: Please answer the multiple choice 
question.\nQuestion: Which molecules are not toxic in the NR-Aromatase enzyme assay?\nConstraint: You must select none, one or more options from A, B, or C without using any additional words.\nOptions:\nA. CN(C)C(=O)Oc1ccc[n+](C)c1\nB. COc1ccc(O)cc1\nC. Cc1ccccc1OCC(O)CO\nAnswer: A, B, C"}", "/scratch/micpie/export/nr_aromatase_tox21/valid_0-12.jsonl": "{"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very curious. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be toxic in the Aromatase enzyme assay.\nAssistant: Got it, this DeepSMILES is not toxic in the Aromatase enzyme assay: CcncC)cC)nc6C"} {"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be toxic in the NR-Aromatase enzyme assay.\nAssistant: Ok, this DeepSMILES is not toxic in the NR-Aromatase enzyme assay: CNC)C=O)Occcc[n+]C)c6"}", "/scratch/micpie/export/nr_aromatase_tox21/train_0-2.jsonl": "{"text":"Based on the SELFIES representation [C][C][N][C][=Branch1][C][=O][N][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][N][=O], the molecule has no Aromatase enzyme toxicity features."} {"text":"Based on the SMILES CCCCCCCCCCCCCCCC(=O)OCC, the molecule has no NR-Aromatase enzyme toxicity characteristics."}", "/scratch/micpie/export/nr_aromatase_tox21/test_0-11.jsonl": "{"text":"User: I'm looking for the DeepSMILES of a molecule that is not toxic in the Aromatase enzyme assay?\nAssistant: This is a molecule that is not toxic in the Aromatase enzyme assay: CCCNCC))CCC))C=O)NccC)cccc6C"} {"text":"User: I'm searching for the DeepSMILES of a molecule that is not toxic in the NR-Aromatase enzyme assay?\nAssistant: This is a molecule that is not toxic in the NR-Aromatase enzyme assay: CCC)CCOCcccccc6"}", "/scratch/micpie/export/nr_aromatase_tox21/train_0-7.jsonl": 
"{"text":"Task: Please create a molecule SMILES based on the description below.\nDescription: A molecule that is toxic in the NR-Aromatase enzyme assay.\nResult: CCN1C(=O)NC(c2ccccc2)C1=O"} {"text":"Task: Please give me a molecule canonical SMILES based on the description below.\nDescription: A molecule that is toxic in the Aromatase enzyme assay.\nResult: CCCCCCCCCCCCCCCC(=O)OCC"}", "/scratch/micpie/export/nr_aromatase_tox21/train_0-11.jsonl": "{"text":"User: I'm searching for the DeepSMILES of a molecule that is not toxic in the NR-Aromatase enzyme assay?\nAssistant: This is a molecule that is not toxic in the NR-Aromatase enzyme assay: CCNC=O)NCcccccc6))))))C5=O"} {"text":"User: I'm looking for the InChI of a molecule that is not toxic in the NR-Aromatase enzyme assay?\nAssistant: This is a molecule that is not toxic in the NR-Aromatase enzyme assay: InChI=1S\/C18H36O2\/c1-3-5-6-7-8-9-10-11-12-13-14-15-16-17-18(19)20-4-2\/h3-17H2,1-2H3"}", "/scratch/micpie/export/nr_aromatase_tox21/train_0-1.jsonl": "{"text":"The molecule with the SMILES representation of CCN1C(=O)NC(c2ccccc2)C1=O is not demonstrating toxicity in the NR-aromatase enzyme assay."} {"text":"The molecule with the SELFIES representation of [C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][=Branch1][C][=O][O][C][C] is not demonstrating toxicity in the NR-aromatase enzyme assay."}", "/scratch/micpie/export/nr_aromatase_tox21/train_0-13.jsonl": "{"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the Aromatase enzyme assay.\nAssistant: Got it, this SMILES is not toxic in the Aromatase enzyme assay: CCN1C(=O)NC(c2ccccc2)C1=O"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the Aromatase enzyme assay.\nAssistant: Ok, this InChI is not toxic in the Aromatase enzyme assay: InChI=1S\/C18H36O2\/c1-3-5-6-7-8-9-10-11-12-13-14-15-16-17-18(19)20-4-2\/h3-17H2,1-2H3"}", "/scratch/micpie/export/nr_aromatase_tox21/train_0-4.jsonl": "{"text":"The molecule DeepSMILES CCNC=O)NCcccccc6))))))C5=O is not toxic in the Aromatase enzyme assay."} {"text":"The InChI InChI=1S\/C18H36O2\/c1-3-5-6-7-8-9-10-11-12-13-14-15-16-17-18(19)20-4-2\/h3-17H2,1-2H3 is not toxic in the NR-Aromatase enzyme assay."}", "/scratch/micpie/export/nr_aromatase_tox21/test_0-7.jsonl": "{"text":"Task: Please create a DeepSMILES based on the description below.\nDescription: A molecule that is toxic in the NR-Aromatase enzyme assay.\nResult: CCCNCC))CCC))C=O)NccC)cccc6C"} {"text":"Task: Please give me a molecule InChI based on the text description.\nDescription: A molecule that is toxic in the Aromatase enzyme assay.\nResult: InChI=1S\/C12H18O\/c1-11(2)8-9-13-10-12-6-4-3-5-7-12\/h3-7,11H,8-10H2,1-2H3"}", "/scratch/micpie/export/nr_aromatase_tox21/train_0-9.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C11H12N2O2\/c1-2-13-10(14)9(12-11(13)15)8-6-4-3-5-7-8\/h3-7,9H,2H2,1H3,(H,12,15) toxic in the NR-Aromatase enzyme assay?\nAssistant: No, it is not toxic in the NR-Aromatase enzyme assay."} {"text":"User: Is the molecule with the InChI InChI=1S\/C18H36O2\/c1-3-5-6-7-8-9-10-11-12-13-14-15-16-17-18(19)20-4-2\/h3-17H2,1-2H3 toxic in the NR-Aromatase enzyme assay?\nAssistant: No, it is not toxic in the NR-Aromatase enzyme assay."}", "/scratch/micpie/export/nr_aromatase_tox21/valid_0-3.jsonl": "{"text":"The DeepSMILES CcncC)cC)nc6C represents a molecule that is not identified as toxic in the NR-Aromatase enzyme assay."} {"text":"The InChI InChI=1S\/C9H13N2O2\/c1-10(2)9(12)13-8-5-4-6-11(3)7-8\/h4-7H,1-3H3\/q+1 represents a molecule that is not identified as toxic in the NR-Aromatase enzyme 
assay."}", "/scratch/micpie/export/nr_aromatase_tox21/test_0-8.jsonl": "{"text":"User: Can you estimate if the molecule with the SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C is toxic in the Aromatase enzyme assay?\nAssistant: No, this molecule is not toxic in the Aromatase enzyme assay."} {"text":"User: Can you figure out if the molecule with the canonical SMILES CC(C)CCOCc1ccccc1 is toxic in the Aromatase enzyme assay?\nAssistant: No, this molecule is not toxic in the Aromatase enzyme assay."}", "/scratch/micpie/export/nr_aromatase_tox21/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C toxic in the NR-Aromatase enzyme assay?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any other words.\nOptions:\n1 True\n2 False\nAnswer: 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of CC(C)CCOCc1ccccc1 toxic in the NR-Aromatase enzyme assay?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any additional words.\nOptions:\n1.) False\n2.) 
True\nAnswer: 1"}", "/scratch/micpie/export/nr_aromatase_tox21/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES [C][C][=N][C][Branch1][C][C][=C][Branch1][C][C][N][=C][Ring1][Branch2][C] toxic in the NR-Aromatase enzyme assay?\nConstraint: Even if you are uncertain, you must pick either A or B without using any additional words.\nOptions:\n[A] False\n[B] True\nAnswer: A"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES representation of CN(C)C(=O)Oc1ccc[n+](C)c1 toxic in the Aromatase enzyme assay?\nConstraint: Even if you are uncertain, you must pick either a or b without using any other words.\nOptions:\na) True\nb) False\nAnswer: b"}", "/scratch/micpie/export/nr_aromatase_tox21/test_0-4.jsonl": "{"text":"The molecule SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C is not toxic in the NR-Aromatase enzyme assay."} {"text":"The molecule canonical SMILES CC(C)CCOCc1ccccc1 is not toxic in the Aromatase enzyme assay."}", "/scratch/micpie/export/nr_aromatase_tox21/test_0-12.jsonl": "{"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be toxic in the NR-Aromatase enzyme assay.\nAssistant: Ok, here you go, this SELFIES is not toxic in the NR-Aromatase enzyme assay: [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C]"} {"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should not be toxic in the Aromatase enzyme assay.\nAssistant: Got it, here you go, this SELFIES is not toxic in the Aromatase enzyme assay: [C][C][Branch1][C][C][C][C][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/train_0-17.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES of ['[C][C][C][O][C][=C][Branch1][N][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][C][Branch1][C][F][=C][C][C][=Branch1][C][=O][C][Branch1][=Branch1][C][=Branch1][C][=O][O][=C][N][Ring2][Ring1][Branch2][C][Ring2][Ring1][=Branch1][=Ring1][#Branch2]'] penetrating the blood brain barrier?\nConstraint: Even if you are not sure, you must pick either a or b without using any other words.\nOptions:\na: True\nb: False\nAnswer: a"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES of CCCCC)=NN=CccccOC))cOC))c6))))))cccOC))cOC))cc6%11 penetrating the blood brain barrier?\nConstraint: Even if you are not sure, you must pick either A or B without using any additional words.\nOptions:\nA. True\nB. False\nAnswer: A"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/train_0-16.jsonl": "{"text":"User: I want to create a SMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should be penetrating the blood brain barrier.\nAssistant: Got it, this SMILES is penetrating the blood brain barrier: CC1COc2c(N3CCN(C)CC3)c(F)cc3c(=O)c(C(=O)O)cn1c23"} {"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should be penetrating the blood brain barrier to reach the brain.\nAssistant: Got it, this InChI is penetrating the blood brain barrier to reach the brain: InChI=1S\/C22H26N2O4\/c1-7-15-13(2)23-24-22(14-8-9-18(25-3)19(10-14)26-4)17-12-21(28-6)20(27-5)11-16(15)17\/h8-12,15H,7H2,1-6H3"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/test_0-10.jsonl": "{"text":"Task: Please give me a molecule InChI based on the text description below.\nDescription: A molecule that is penetrating the blood brain barrier to reach the brain.\nResult: InChI=1S\/C18H27Cl2NO2\/c1-18(2,3)23-17(22)6-4-5-15-7-9-16(10-8-15)21(13-11-19)14-12-20\/h7-10H,4-6,11-14H2,1-3H3"} {"text":"Task: Please generate a molecule SMILES based on the text description.\nDescription: A molecule that is penetrating the blood brain barrier.\nResult: CCN(CC)C(=O)COc1cc2c(O)c3c(O)c(C)c4c(c13)C(=O)[C@@](C)(O\/C=C\/[C@H](OC)[C@@H](C)[C@@H](OC(C)=O)[C@H](C)[C@H](O)[C@H](C)[C@@H](O)[C@@H](C)\/C=C\/C=C(\/C)C(=O)N2)O4"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/valid_0-8.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is penetrating the blood brain barrier to reach the brain.\nSMILES: CN(C)[C@@H]1C(=O)\/C(=C(\/O)NCN2CCCC2)C(=O)[C@@]2(O)C(=O)C3=C(O)c4c(O)cccc4[C@@](C)(O)[C@H]3C[C@@H]12\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is penetrating the blood brain barrier.\nMolecule InChI: InChI=1S\/C18H23N3O\/c1-4-21(5-2)14-10-11-17(13(3)12-14)20-18(22)15-8-6-7-9-16(15)19\/h6-12H,4-5,19H2,1-3H3,(H,20,22)\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"}", 
"/scratch/micpie/export/blood_brain_barrier_martins_et_al/test_0-16.jsonl": "{"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should be penetrating the blood brain barrier to reach the brain.\nAssistant: Got it, this SELFIES is penetrating the blood brain barrier to reach the brain: ['[C][C][Branch1][C][C][Branch1][C][C][O][C][=Branch1][C][=O][C][C][C][C][=C][C][=C][Branch1][#Branch2][N][Branch1][Ring2][C][C][Cl][C][C][Cl][C][=C][Ring1][=N]']"} {"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be penetrating the blood brain barrier.\nAssistant: Got it, this SELFIES is not penetrating the blood brain barrier: ['[C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][O][C][=C][Branch1][C][O][C][Branch1][C][C][=C][C][=Branch1][=Branch1][=C][Ring1][=N][Ring1][Branch2][C][=Branch1][C][=O][C@@][Branch1][C][C][Branch2][Branch1][=Branch2][O][\/C][=C][\/C@H1][Branch1][Ring1][O][C][C@@H1][Branch1][C][C][C@@H1][Branch1][#Branch1][O][C][Branch1][C][C][=O][C@H1][Branch1][C][C][C@H1][Branch1][C][O][C@H1][Branch1][C][C][C@@H1][Branch1][C][O][C@@H1][Branch1][C][C][\/C][=C][\/C][=C][Branch1][C][\/C][C][=Branch1][C][=O][N][Ring2][Ring2][=C][O][Ring2][Ring2][#Branch1]']"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/test_0-15.jsonl": "{"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should penetrating the blood brain barrier to reach the brain.\nAssistant: Ok, here you go, this SMILES is penetrating the blood brain barrier to reach the brain: CC(C)(C)OC(=O)CCCc1ccc(N(CCCl)CCCl)cc1"} {"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very interesting. 
Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not penetrating the blood brain barrier to reach the brain.\nAssistant: Ok, here you go, this DeepSMILES is not penetrating the blood brain barrier to reach the brain: CCNCC))C=O)COccccO)ccO)cC)ccc%106)C=O)[C@@]C)O\/C=C\/[C@H]OC))[C@@H]C)[C@@H]OCC)=O)))[C@H]C)[C@H]O)[C@H]C)[C@@H]O)[C@@H]C)\/C=C\/C=C\/C)C=O)N%26)))))))))))))))))O5"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/train_0-8.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is penetrating the blood brain barrier to reach the brain.\nMolecule canonical SMILES: CC1COc2c(N3CCN(C)CC3)c(F)cc3c(=O)c(C(=O)O)cn1c23\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is penetrating the blood brain barrier to reach the brain.\nMolecule InChI: InChI=1S\/C22H26N2O4\/c1-7-15-13(2)23-24-22(14-8-9-18(25-3)19(10-14)26-4)17-12-21(28-6)20(27-5)11-16(15)17\/h8-12,15H,7H2,1-6H3\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/test_0-5.jsonl": "{"text":"CC(C)(C)OC(=O)CCCc1ccc(N(CCCl)CCCl)cc1 represents a molecule that is penetrating the blood brain barrier."} {"text":"CCNCC))C=O)COccccO)ccO)cC)ccc%106)C=O)[C@@]C)O\/C=C\/[C@H]OC))[C@@H]C)[C@@H]OCC)=O)))[C@H]C)[C@H]O)[C@H]C)[C@@H]O)[C@@H]C)\/C=C\/C=C\/C)C=O)N%26)))))))))))))))))O5 represents a molecule that is not penetrating the blood brain barrier."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/valid_0-9.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is penetrating the blood brain barrier to reach the brain.\nMolecule 
InChI: InChI=1S\/C27H33N3O8\/c1-26(37)13-7-6-8-16(31)17(13)21(32)18-14(26)11-15-20(29(2)3)22(33)19(24(35)27(15,38)23(18)34)25(36)28-12-30-9-4-5-10-30\/h6-8,14-15,20,28,31-32,36-38H,4-5,9-12H2,1-3H3\/b25-19-\/t14-,15-,20-,26+,27-\/m0\/s1\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is penetrating the blood brain barrier to reach the brain."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is penetrating the blood brain barrier.\ncanonical SMILES: CCN(CC)c1ccc(NC(=O)c2ccccc2N)c(C)c1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is penetrating the blood brain barrier."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/test_0-1.jsonl": "{"text":"Based on the canonical SMILES CC(C)(C)OC(=O)CCCc1ccc(N(CCCl)CCCl)cc1, the molecule is penetrating the blood brain barrier to reach the brain."} {"text":"Based on the SELFIES representation ['[C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][O][C][=C][Branch1][C][O][C][Branch1][C][C][=C][C][=Branch1][=Branch1][=C][Ring1][=N][Ring1][Branch2][C][=Branch1][C][=O][C@@][Branch1][C][C][Branch2][Branch1][=Branch2][O][\/C][=C][\/C@H1][Branch1][Ring1][O][C][C@@H1][Branch1][C][C][C@@H1][Branch1][#Branch1][O][C][Branch1][C][C][=O][C@H1][Branch1][C][C][C@H1][Branch1][C][O][C@H1][Branch1][C][C][C@@H1][Branch1][C][O][C@@H1][Branch1][C][C][\/C][=C][\/C][=C][Branch1][C][\/C][C][=Branch1][C][=O][N][Ring2][Ring2][=C][O][Ring2][Ring2][#Branch1]'], the molecule is not penetrating the blood brain barrier."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/test_0-18.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are penetrating the blood brain barrier?\nConstraint: You must select none, one or more options from a, b, c, or d without using any other words.\nOptions:\na: 
CCC)S[C@@H][C@H]NC=O)CC=O)Occcccc6)CCC5))))))))))cccccc6)))))))))C=O)N4[C@H]7C=O)O\nb: CCOCOCCCO)C=O)CO)))CccO)cccO)c6%10))C=O)cccccc6C%10=O))))))))))))))))))CCN)C6O\nc: CCC)OC=O)NC5=O\nd: CCC)C)OC=O)CCCccccNCCCl)))CCCl))))cc6\nAnswer: c, d"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not penetrating the blood brain barrier to reach the brain?\nConstraint: You must select none, one or more options from A or B without using any additional words.\nOptions:\nA CCN(CC)C(=O)COc1cc2c(O)c3c(O)c(C)c4c(c13)C(=O)[C@@](C)(O\/C=C\/[C@H](OC)[C@@H](C)[C@@H](OC(C)=O)[C@H](C)[C@H](O)[C@H](C)[C@@H](O)[C@@H](C)\/C=C\/C=C(\/C)C(=O)N2)O4\nB O=C1CSC2(CCN(CCCN3c4ccccc4Sc4ccc(Cl)cc43)CC2)N1.[Cl-].[H+]\nAnswer: A"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/valid_0-0.jsonl": "{"text":"The molecule with the SELFIES ['[C][N][Branch1][C][C][C@@H1][C][=Branch1][C][=O][\/C][=Branch1][=C][=C][Branch1][C][\/O][N][C][N][C][C][C][C][Ring1][Branch1][C][=Branch1][C][=O][C@@][Branch1][C][O][C][=Branch1][C][=O][C][=C][Branch1][C][O][C][=C][Branch1][C][O][C][=C][C][=C][Ring1][#Branch1][C@@][Branch1][C][C][Branch1][C][O][C@H1][Ring1][=C][C][C@@H1][Ring2][Ring2][Ring1][Ring2][Ring1][Ring2]'] is penetrating the blood brain barrier to reach the brain."} {"text":"The molecule with the DeepSMILES CCNCC))ccccNC=O)cccccc6N)))))))))cC)c6 is penetrating the blood brain barrier."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/test_0-2.jsonl": "{"text":"The canonical SMILES CC(C)(C)OC(=O)CCCc1ccc(N(CCCl)CCCl)cc1 is from a molecule that is identified as penetrating the blood brain barrier to reach the brain."} {"text":"The SELFIES 
['[C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][O][C][=C][Branch1][C][O][C][Branch1][C][C][=C][C][=Branch1][=Branch1][=C][Ring1][=N][Ring1][Branch2][C][=Branch1][C][=O][C@@][Branch1][C][C][Branch2][Branch1][=Branch2][O][\/C][=C][\/C@H1][Branch1][Ring1][O][C][C@@H1][Branch1][C][C][C@@H1][Branch1][#Branch1][O][C][Branch1][C][C][=O][C@H1][Branch1][C][C][C@H1][Branch1][C][O][C@H1][Branch1][C][C][C@@H1][Branch1][C][O][C@@H1][Branch1][C][C][\/C][=C][\/C][=C][Branch1][C][\/C][C][=Branch1][C][=O][N][Ring2][Ring2][=C][O][Ring2][Ring2][#Branch1]'] represents a molecule that is not identified as penetrating the blood brain barrier."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/valid_0-10.jsonl": "{"text":"Task: Please generate a molecule canonical SMILES based on the description.\nDescription: A molecule that is penetrating the blood brain barrier to reach the brain.\nResult: CN(C)[C@@H]1C(=O)\/C(=C(\/O)NCN2CCCC2)C(=O)[C@@]2(O)C(=O)C3=C(O)c4c(O)cccc4[C@@](C)(O)[C@H]3C[C@@H]12"} {"text":"Task: Please give me a molecule SELFIES based on the text description below.\nDescription: A molecule that is penetrating the blood brain barrier.\nResult: ['[C][C][N][Branch1][Ring1][C][C][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C][Branch1][C][C][=C][Ring1][P]']"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/train_0-6.jsonl": "{"text":"['[C][C][C][O][C][=C][Branch1][N][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][C][Branch1][C][F][=C][C][C][=Branch1][C][=O][C][Branch1][=Branch1][C][=Branch1][C][=O][O][=C][N][Ring2][Ring1][Branch2][C][Ring2][Ring1][=Branch1][=Ring1][#Branch2]'] is penetrating the blood brain barrier to reach the brain."} 
{"text":"['[C][C][C][C][Branch1][C][C][=N][N][=C][Branch1][P][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][Ring1][O][C][=C][Ring1][#Branch2][C][=C][C][Branch1][Ring1][O][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][#Branch2][Ring2][Ring1][#Branch2]'] is penetrating the blood brain barrier."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/valid_0-6.jsonl": "{"text":"InChI=1S\/C27H33N3O8\/c1-26(37)13-7-6-8-16(31)17(13)21(32)18-14(26)11-15-20(29(2)3)22(33)19(24(35)27(15,38)23(18)34)25(36)28-12-30-9-4-5-10-30\/h6-8,14-15,20,28,31-32,36-38H,4-5,9-12H2,1-3H3\/b25-19-\/t14-,15-,20-,26+,27-\/m0\/s1 is penetrating the blood brain barrier."} {"text":"CCN(CC)c1ccc(NC(=O)c2ccccc2N)c(C)c1 is penetrating the blood brain barrier to reach the brain."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/test_0-9.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is penetrating the blood brain barrier to reach the brain.\nMolecule SELFIES: ['[C][C][Branch1][C][C][Branch1][C][C][O][C][=Branch1][C][=O][C][C][C][C][=C][C][=C][Branch1][#Branch2][N][Branch1][Ring2][C][C][Cl][C][C][Cl][C][=C][Ring1][=N]']\nConstraint: Answer the question in a full sentence.\nResult: This molecule is penetrating the blood brain barrier to reach the brain."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is penetrating the blood brain barrier to reach the brain.\nInChI: InChI=1S\/C43H58N2O13\/c1-12-45(13-2)31(47)20-55-30-19-28-38(51)33-32(30)34-40(26(8)37(33)50)58-43(10,41(34)52)56-18-17-29(54-11)23(5)39(57-27(9)46)25(7)36(49)24(6)35(48)21(3)15-14-16-22(4)42(53)44-28\/h14-19,21,23-25,29,35-36,39,48-51H,12-13,20H2,1-11H3,(H,44,53)\/b15-14+,18-17+,22-16-\/t21-,23+,24+,25+,29-,35-,36+,39+,43-\/m0\/s1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not penetrating the blood brain barrier to reach the brain."}", 
"/scratch/micpie/export/blood_brain_barrier_martins_et_al/test_0-0.jsonl": "{"text":"The molecule with the SELFIES ['[C][C][Branch1][C][C][Branch1][C][C][O][C][=Branch1][C][=O][C][C][C][C][=C][C][=C][Branch1][#Branch2][N][Branch1][Ring2][C][C][Cl][C][C][Cl][C][=C][Ring1][=N]'] is penetrating the blood brain barrier to reach the brain."} {"text":"The molecule with the canonical SMILES CCN(CC)C(=O)COc1cc2c(O)c3c(O)c(C)c4c(c13)C(=O)[C@@](C)(O\/C=C\/[C@H](OC)[C@@H](C)[C@@H](OC(C)=O)[C@H](C)[C@H](O)[C@H](C)[C@@H](O)[C@@H](C)\/C=C\/C=C(\/C)C(=O)N2)O4 is not penetrating the blood brain barrier to reach the brain."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/valid_0-16.jsonl": "{"text":"User: I want to come up with a InChI.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should be penetrating the blood brain barrier.\nAssistant: Got it, this InChI is penetrating the blood brain barrier: InChI=1S\/C27H33N3O8\/c1-26(37)13-7-6-8-16(31)17(13)21(32)18-14(26)11-15-20(29(2)3)22(33)19(24(35)27(15,38)23(18)34)25(36)28-12-30-9-4-5-10-30\/h6-8,14-15,20,28,31-32,36-38H,4-5,9-12H2,1-3H3\/b25-19-\/t14-,15-,20-,26+,27-\/m0\/s1"} {"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should be penetrating the blood brain barrier to reach the brain.\nAssistant: Got it, this SMILES is penetrating the blood brain barrier to reach the brain: CCN(CC)c1ccc(NC(=O)c2ccccc2N)c(C)c1"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/valid_0-7.jsonl": "{"text":"The canonical SMILES CN(C)[C@@H]1C(=O)\/C(=C(\/O)NCN2CCCC2)C(=O)[C@@]2(O)C(=O)C3=C(O)c4c(O)cccc4[C@@](C)(O)[C@H]3C[C@@H]12 is penetrating the blood brain barrier."} {"text":"The molecule DeepSMILES CCNCC))ccccNC=O)cccccc6N)))))))))cC)c6 is penetrating the blood brain barrier to reach the brain."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/test_0-3.jsonl": "{"text":"The molecule represented with the SMILES CC(C)(C)OC(=O)CCCc1ccc(N(CCCl)CCCl)cc1 is penetrating the blood brain barrier to reach the brain."} {"text":"The molecule represented with the SMILES CCN(CC)C(=O)COc1cc2c(O)c3c(O)c(C)c4c(c13)C(=O)[C@@](C)(O\/C=C\/[C@H](OC)[C@@H](C)[C@@H](OC(C)=O)[C@H](C)[C@H](O)[C@H](C)[C@@H](O)[C@@H](C)\/C=C\/C=C(\/C)C(=O)N2)O4 is not penetrating the blood brain barrier."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/valid_0-11.jsonl": "{"text":"User: Can you tell me if the molecule with the SELFIES ['[C][N][Branch1][C][C][C@@H1][C][=Branch1][C][=O][\/C][=Branch1][=C][=C][Branch1][C][\/O][N][C][N][C][C][C][C][Ring1][Branch1][C][=Branch1][C][=O][C@@][Branch1][C][O][C][=Branch1][C][=O][C][=C][Branch1][C][O][C][=C][Branch1][C][O][C][=C][C][=C][Ring1][#Branch1][C@@][Branch1][C][C][Branch1][C][O][C@H1][Ring1][=C][C][C@@H1][Ring2][Ring2][Ring1][Ring2][Ring1][Ring2]'] is penetrating the blood brain barrier to reach the brain?\nAssistant: Yes, this molecule is penetrating the blood brain barrier to reach the brain."} {"text":"User: Can you estimate if the molecule with the SELFIES 
['[C][C][N][Branch1][Ring1][C][C][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C][Branch1][C][C][=C][Ring1][P]'] is penetrating the blood brain barrier?\nAssistant: Yes, this molecule is penetrating the blood brain barrier."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/train_0-0.jsonl": "{"text":"The molecule with the DeepSMILES CCCOccNCCNC)CC6))))))cF)ccc=O)cC=O)O))cn%12c%106 is penetrating the blood brain barrier."} {"text":"The molecule with the DeepSMILES CCCCC)=NN=CccccOC))cOC))c6))))))cccOC))cOC))cc6%11 is penetrating the blood brain barrier to reach the brain."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/test_0-6.jsonl": "{"text":"CC(C)(C)OC(=O)CCCc1ccc(N(CCCl)CCCl)cc1 is penetrating the blood brain barrier to reach the brain."} {"text":"['[C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][O][C][=C][Branch1][C][O][C][Branch1][C][C][=C][C][=Branch1][=Branch1][=C][Ring1][=N][Ring1][Branch2][C][=Branch1][C][=O][C@@][Branch1][C][C][Branch2][Branch1][=Branch2][O][\/C][=C][\/C@H1][Branch1][Ring1][O][C][C@@H1][Branch1][C][C][C@@H1][Branch1][#Branch1][O][C][Branch1][C][C][=O][C@H1][Branch1][C][C][C@H1][Branch1][C][O][C@H1][Branch1][C][C][C@@H1][Branch1][C][O][C@@H1][Branch1][C][C][\/C][=C][\/C][=C][Branch1][C][\/C][C][=Branch1][C][=O][N][Ring2][Ring2][=C][O][Ring2][Ring2][#Branch1]'] is not penetrating the blood brain barrier."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/train_0-10.jsonl": "{"text":"Task: Please generate a molecule canonical SMILES based on the text description below.\nDescription: A molecule that is penetrating the blood brain barrier to reach the brain.\nResult: CC1COc2c(N3CCN(C)CC3)c(F)cc3c(=O)c(C(=O)O)cn1c23"} {"text":"Task: Please give me a molecule InChI based on the text description.\nDescription: A molecule that is penetrating the blood brain barrier.\nResult: 
InChI=1S\/C22H26N2O4\/c1-7-15-13(2)23-24-22(14-8-9-18(25-3)19(10-14)26-4)17-12-21(28-6)20(27-5)11-16(15)17\/h8-12,15H,7H2,1-6H3"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/train_0-3.jsonl": "{"text":"The molecule represented with the DeepSMILES CCCOccNCCNC)CC6))))))cF)ccc=O)cC=O)O))cn%12c%106 is penetrating the blood brain barrier."} {"text":"The molecule represented with the canonical SMILES CCC1C(C)=NN=C(c2ccc(OC)c(OC)c2)c2cc(OC)c(OC)cc21 is penetrating the blood brain barrier to reach the brain."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/train_0-12.jsonl": "{"text":"User: Is the molecule with the SELFIES ['[C][C][C][O][C][=C][Branch1][N][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][C][Branch1][C][F][=C][C][C][=Branch1][C][=O][C][Branch1][=Branch1][C][=Branch1][C][=O][O][=C][N][Ring2][Ring1][Branch2][C][Ring2][Ring1][=Branch1][=Ring1][#Branch2]'] penetrating the blood brain barrier to reach the brain?\nAssistant: Yes, it is penetrating the blood brain barrier to reach the brain."} {"text":"User: Is the molecule with the DeepSMILES CCCCC)=NN=CccccOC))cOC))c6))))))cccOC))cOC))cc6%11 penetrating the blood brain barrier?\nAssistant: Yes, it is penetrating the blood brain barrier."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/test_0-13.jsonl": "{"text":"User: Can you create the InChI of a molecule that is penetrating the blood brain barrier to reach the brain?\nAssistant: Sure, here you go: InChI=1S\/C18H27Cl2NO2\/c1-18(2,3)23-17(22)6-4-5-15-7-9-16(10-8-15)21(13-11-19)14-12-20\/h7-10H,4-6,11-14H2,1-3H3"} {"text":"User: Can you generate the SMILES of a molecule that is not penetrating the blood brain barrier?\nAssistant: Of course, here you go: CCN(CC)C(=O)COc1cc2c(O)c3c(O)c(C)c4c(c13)C(=O)[C@@](C)(O\/C=C\/[C@H](OC)[C@@H](C)[C@@H](OC(C)=O)[C@H](C)[C@H](O)[C@H](C)[C@@H](O)[C@@H](C)\/C=C\/C=C(\/C)C(=O)N2)O4"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/valid_0-2.jsonl": "{"text":"The DeepSMILES 
CNC)[C@@H]C=O)\/C=C\/O)NCNCCCC5))))))))C=O)[C@@]O)C=O)C=CO)ccO)cccc6[C@@]C)O)[C@H]%10C[C@@H]%18%14 represents a molecule that is identified as penetrating the blood brain barrier."} {"text":"The canonical SMILES CCN(CC)c1ccc(NC(=O)c2ccccc2N)c(C)c1 represents a molecule that is identified as penetrating the blood brain barrier to reach the brain."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/train_0-14.jsonl": "{"text":"User: I'm searching for the InChI of a molecule that is penetrating the blood brain barrier?\nAssistant: This is a molecule that is penetrating the blood brain barrier: InChI=1S\/C18H20FN3O4\/c1-10-9-26-17-14-11(16(23)12(18(24)25)8-22(10)14)7-13(19)15(17)21-5-3-20(2)4-6-21\/h7-8,10H,3-6,9H2,1-2H3,(H,24,25)"} {"text":"User: I'm looking for the SMILES of a molecule that is penetrating the blood brain barrier?\nAssistant: This is a molecule that is penetrating the blood brain barrier: CCC1C(C)=NN=C(c2ccc(OC)c(OC)c2)c2cc(OC)c(OC)cc21"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/valid_0-1.jsonl": "{"text":"Based on the InChI representation InChI=1S\/C27H33N3O8\/c1-26(37)13-7-6-8-16(31)17(13)21(32)18-14(26)11-15-20(29(2)3)22(33)19(24(35)27(15,38)23(18)34)25(36)28-12-30-9-4-5-10-30\/h6-8,14-15,20,28,31-32,36-38H,4-5,9-12H2,1-3H3\/b25-19-\/t14-,15-,20-,26+,27-\/m0\/s1, the molecule is penetrating the blood brain barrier."} {"text":"Based on the InChI InChI=1S\/C18H23N3O\/c1-4-21(5-2)14-10-11-17(13(3)12-14)20-18(22)15-8-6-7-9-16(15)19\/h6-12H,4-5,19H2,1-3H3,(H,20,22), the molecule is penetrating the blood brain barrier."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/valid_0-13.jsonl": "{"text":"User: Can you create the canonical SMILES of a molecule that is penetrating the blood brain barrier to reach the brain?\nAssistant: Of course, here you go: CN(C)[C@@H]1C(=O)\/C(=C(\/O)NCN2CCCC2)C(=O)[C@@]2(O)C(=O)C3=C(O)c4c(O)cccc4[C@@](C)(O)[C@H]3C[C@@H]12"} {"text":"User: Can you create the DeepSMILES of a molecule that 
is penetrating the blood brain barrier?\nAssistant: Yes, I'm happy to help, here you go: CCNCC))ccccNC=O)cccccc6N)))))))))cC)c6"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/valid_0-5.jsonl": "{"text":"CN(C)[C@@H]1C(=O)\/C(=C(\/O)NCN2CCCC2)C(=O)[C@@]2(O)C(=O)C3=C(O)c4c(O)cccc4[C@@](C)(O)[C@H]3C[C@@H]12 represents a molecule that is penetrating the blood brain barrier."} {"text":"CCN(CC)c1ccc(NC(=O)c2ccccc2N)c(C)c1 represents a molecule that is penetrating the blood brain barrier."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/train_0-15.jsonl": "{"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should penetrating the blood brain barrier to reach the brain.\nAssistant: Got it, here you go, this canonical SMILES is penetrating the blood brain barrier to reach the brain: CC1COc2c(N3CCN(C)CC3)c(F)cc3c(=O)c(C(=O)O)cn1c23"} {"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should penetrating the blood brain barrier.\nAssistant: Got it, this DeepSMILES is penetrating the blood brain barrier: CCCCC)=NN=CccccOC))cOC))c6))))))cccOC))cOC))cc6%11"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/valid_0-4.jsonl": "{"text":"CN(C)[C@@H]1C(=O)\/C(=C(\/O)NCN2CCCC2)C(=O)[C@@]2(O)C(=O)C3=C(O)c4c(O)cccc4[C@@](C)(O)[C@H]3C[C@@H]12 represents a molecule that is identified as penetrating the blood brain barrier."} {"text":"InChI=1S\/C18H23N3O\/c1-4-21(5-2)14-10-11-17(13(3)12-14)20-18(22)15-8-6-7-9-16(15)19\/h6-12H,4-5,19H2,1-3H3,(H,20,22) represents a molecule that is identified as penetrating the blood brain barrier."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/train_0-5.jsonl": "{"text":"CC1COc2c(N3CCN(C)CC3)c(F)cc3c(=O)c(C(=O)O)cn1c23 represents a molecule that is penetrating the blood brain barrier."} {"text":"InChI=1S\/C22H26N2O4\/c1-7-15-13(2)23-24-22(14-8-9-18(25-3)19(10-14)26-4)17-12-21(28-6)20(27-5)11-16(15)17\/h8-12,15H,7H2,1-6H3 represents a molecule that is penetrating the blood brain barrier."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/valid_0-15.jsonl": "{"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should penetrating the blood brain barrier.\nAssistant: Ok, here you go, this InChI is penetrating the blood brain barrier: InChI=1S\/C27H33N3O8\/c1-26(37)13-7-6-8-16(31)17(13)21(32)18-14(26)11-15-20(29(2)3)22(33)19(24(35)27(15,38)23(18)34)25(36)28-12-30-9-4-5-10-30\/h6-8,14-15,20,28,31-32,36-38H,4-5,9-12H2,1-3H3\/b25-19-\/t14-,15-,20-,26+,27-\/m0\/s1"} {"text":"User: I want to generate a molecule InChI.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should penetrating the blood brain barrier to reach the brain.\nAssistant: Got it, here you go, this InChI is penetrating the blood brain barrier to reach the brain: InChI=1S\/C18H23N3O\/c1-4-21(5-2)14-10-11-17(13(3)12-14)20-18(22)15-8-6-7-9-16(15)19\/h6-12H,4-5,19H2,1-3H3,(H,20,22)"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/valid_0-12.jsonl": "{"text":"User: Is the molecule with the SMILES CN(C)[C@@H]1C(=O)\/C(=C(\/O)NCN2CCCC2)C(=O)[C@@]2(O)C(=O)C3=C(O)c4c(O)cccc4[C@@](C)(O)[C@H]3C[C@@H]12 penetrating the blood brain barrier?\nAssistant: Yes, it is penetrating the blood brain barrier."} {"text":"User: Is the molecule with the DeepSMILES CCNCC))ccccNC=O)cccccc6N)))))))))cC)c6 penetrating the blood brain barrier to reach the brain?\nAssistant: Yes, it is penetrating the blood brain barrier to reach the brain."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/valid_0-18.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are penetrating the blood brain barrier?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any additional words.\nOptions:\n(A) CN(C)CCCN1c2ccccc2CCc2ccccc21\n(B) O=C(OCCN1CCCCC1)C(O)(c1ccccc1)c1ccccc1\n(C) CNC1CCN2c3ccccc3CCc3cccc1c32\n(D) CN(C)CCCC1(c2ccc(F)cc2)OCc2cc(C#N)ccc21\n(E) CN(C)[C@@H]1C(=O)\/C(=C(\/O)NCN2CCCC2)C(=O)[C@@]2(O)C(=O)C3=C(O)c4c(O)cccc4[C@@](C)(O)[C@H]3C[C@@H]12\nAnswer: A, B, C, D, E"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are penetrating the blood brain barrier?\nConstraint: You must select none, one or more options from A or B without using any other words.\nOptions:\n(A) ['[C][O][C][=N][C][=N][C][Branch1][O][C][=N][O][C][Branch1][C][C][=N][Ring1][=Branch1][=C][N][Ring1][O][C][=C][Ring1][#C][C][C][C][C][Ring1][=Branch1]']\n(B) 
['[C][C][N][Branch1][Ring1][C][C][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C][Branch1][C][C][=C][Ring1][P]']\nAnswer: A, B"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/train_0-2.jsonl": "{"text":"The canonical SMILES CC1COc2c(N3CCN(C)CC3)c(F)cc3c(=O)c(C(=O)O)cn1c23 is from a molecule that is identified as penetrating the blood brain barrier to reach the brain."} {"text":"The DeepSMILES CCCCC)=NN=CccccOC))cOC))c6))))))cccOC))cOC))cc6%11 is from a molecule that is identified as penetrating the blood brain barrier."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/test_0-11.jsonl": "{"text":"User: Can you tell me if the molecule with the DeepSMILES CCC)C)OC=O)CCCccccNCCCl)))CCCl))))cc6 is penetrating the blood brain barrier to reach the brain?\nAssistant: Yes, this molecule is penetrating the blood brain barrier to reach the brain."} {"text":"User: Can you estimate if the molecule with the canonical SMILES CCN(CC)C(=O)COc1cc2c(O)c3c(O)c(C)c4c(c13)C(=O)[C@@](C)(O\/C=C\/[C@H](OC)[C@@H](C)[C@@H](OC(C)=O)[C@H](C)[C@H](O)[C@H](C)[C@@H](O)[C@@H](C)\/C=C\/C=C(\/C)C(=O)N2)O4 is penetrating the blood brain barrier to reach the brain?\nAssistant: No, this molecule is not penetrating the blood brain barrier to reach the brain."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/train_0-7.jsonl": "{"text":"The canonical SMILES CC1COc2c(N3CCN(C)CC3)c(F)cc3c(=O)c(C(=O)O)cn1c23 is penetrating the blood brain barrier."} {"text":"The SMILES CCC1C(C)=NN=C(c2ccc(OC)c(OC)c2)c2cc(OC)c(OC)cc21 is penetrating the blood brain barrier to reach the brain."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/test_0-17.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES of ['[C][C][Branch1][C][C][Branch1][C][C][O][C][=Branch1][C][=O][C][C][C][C][=C][C][=C][Branch1][#Branch2][N][Branch1][Ring2][C][C][Cl][C][C][Cl][C][=C][Ring1][=N]'] 
penetrating the blood brain barrier?\nConstraint: Even if you are not sure, you must pick either a or b without using any additional words.\nOptions:\n(a) False\n(b) True\nAnswer: b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI of InChI=1S\/C43H58N2O13\/c1-12-45(13-2)31(47)20-55-30-19-28-38(51)33-32(30)34-40(26(8)37(33)50)58-43(10,41(34)52)56-18-17-29(54-11)23(5)39(57-27(9)46)25(7)36(49)24(6)35(48)21(3)15-14-16-22(4)42(53)44-28\/h14-19,21,23-25,29,35-36,39,48-51H,12-13,20H2,1-11H3,(H,44,53)\/b15-14+,18-17+,22-16-\/t21-,23+,24+,25+,29-,35-,36+,39+,43-\/m0\/s1 penetrating the blood brain barrier to reach the brain?\nConstraint: Even if you are not sure, you must pick either a or b without using any additional words.\nOptions:\na: True\nb: False\nAnswer: b"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/train_0-11.jsonl": "{"text":"User: Can you tell me if the molecule with the InChI InChI=1S\/C18H20FN3O4\/c1-10-9-26-17-14-11(16(23)12(18(24)25)8-22(10)14)7-13(19)15(17)21-5-3-20(2)4-6-21\/h7-8,10H,3-6,9H2,1-2H3,(H,24,25) is penetrating the blood brain barrier to reach the brain?\nAssistant: Yes, this molecule is penetrating the blood brain barrier to reach the brain."} {"text":"User: Can you derive if the molecule with the SELFIES ['[C][C][C][C][Branch1][C][C][=N][N][=C][Branch1][P][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][Ring1][O][C][=C][Ring1][#Branch2][C][=C][C][Branch1][Ring1][O][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][#Branch2][Ring2][Ring1][#Branch2]'] is penetrating the blood brain barrier?\nAssistant: Yes, this molecule is penetrating the blood brain barrier."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/train_0-1.jsonl": "{"text":"Based on the canonical SMILES CC1COc2c(N3CCN(C)CC3)c(F)cc3c(=O)c(C(=O)O)cn1c23, the molecule is penetrating the blood brain barrier to reach the brain."} {"text":"Based on the SMILES CCC1C(C)=NN=C(c2ccc(OC)c(OC)c2)c2cc(OC)c(OC)cc21, the 
molecule is penetrating the blood brain barrier."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/train_0-13.jsonl": "{"text":"User: Can you give me the DeepSMILES of a molecule that is penetrating the blood brain barrier?\nAssistant: Sure, here you go: CCCOccNCCNC)CC6))))))cF)ccc=O)cC=O)O))cn%12c%106"} {"text":"User: Can you create the SELFIES of a molecule that is penetrating the blood brain barrier?\nAssistant: Sure, here you go: ['[C][C][C][C][Branch1][C][C][=N][N][=C][Branch1][P][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][Ring1][O][C][=C][Ring1][#Branch2][C][=C][C][Branch1][Ring1][O][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][#Branch2][Ring2][Ring1][#Branch2]']"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/train_0-4.jsonl": "{"text":"CCCOccNCCNC)CC6))))))cF)ccc=O)cC=O)O))cn%12c%106 represents a molecule that is identified as penetrating the blood brain barrier to reach the brain."} {"text":"InChI=1S\/C22H26N2O4\/c1-7-15-13(2)23-24-22(14-8-9-18(25-3)19(10-14)26-4)17-12-21(28-6)20(27-5)11-16(15)17\/h8-12,15H,7H2,1-6H3 represents a molecule that is identified as penetrating the blood brain barrier."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/test_0-7.jsonl": "{"text":"The molecule canonical SMILES CC(C)(C)OC(=O)CCCc1ccc(N(CCCl)CCCl)cc1 is penetrating the blood brain barrier to reach the brain."} {"text":"The canonical SMILES CCN(CC)C(=O)COc1cc2c(O)c3c(O)c(C)c4c(c13)C(=O)[C@@](C)(O\/C=C\/[C@H](OC)[C@@H](C)[C@@H](OC(C)=O)[C@H](C)[C@H](O)[C@H](C)[C@@H](O)[C@@H](C)\/C=C\/C=C(\/C)C(=O)N2)O4 is not penetrating the blood brain barrier."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/train_0-9.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is penetrating the blood brain barrier to reach the brain.\ncanonical SMILES: CC1COc2c(N3CCN(C)CC3)c(F)cc3c(=O)c(C(=O)O)cn1c23\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is 
penetrating the blood brain barrier to reach the brain."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is penetrating the blood brain barrier to reach the brain.\ncanonical SMILES: CCC1C(C)=NN=C(c2ccc(OC)c(OC)c2)c2cc(OC)c(OC)cc21\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is penetrating the blood brain barrier to reach the brain."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/train_0-18.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are penetrating the blood brain barrier to reach the brain?\nConstraint: You must select none, one or more options from a or b without using any other words.\nOptions:\na.) CC1COc2c(N3CCN(C)CC3)c(F)cc3c(=O)c(C(=O)O)cn1c23\nb.) O=C(c1ccco1)N(c1cnccn1)C1CCN(CCc2ccccc2)CC1\nAnswer: a, b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are penetrating the blood brain barrier to reach the brain?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any additional words.\nOptions:\n[1] InChI=1S\/C24H31N3O2S\/c1-2-22(29)19-8-9-24-21(18-19)27(20-6-3-4-7-23(20)30-24)11-5-10-25-12-14-26(15-13-25)16-17-28\/h3-4,6-9,18,28H,2,5,10-17H2,1H3\n[2] InChI=1S\/C22H26N2O4\/c1-7-15-13(2)23-24-22(14-8-9-18(25-3)19(10-14)26-4)17-12-21(28-6)20(27-5)11-16(15)17\/h8-12,15H,7H2,1-6H3\n[3] InChI=1S\/C16H13ClN2O4S\/c17-13-8-11(24(18,22)23)6-7-14(13)19-15(20)9-12(16(19)21)10-4-2-1-3-5-10\/h1-8,12H,9H2,(H2,18,22,23)\nAnswer: 1, 2, 3"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/valid_0-3.jsonl": "{"text":"The molecule represented with the SELFIES 
['[C][N][Branch1][C][C][C@@H1][C][=Branch1][C][=O][\/C][=Branch1][=C][=C][Branch1][C][\/O][N][C][N][C][C][C][C][Ring1][Branch1][C][=Branch1][C][=O][C@@][Branch1][C][O][C][=Branch1][C][=O][C][=C][Branch1][C][O][C][=C][Branch1][C][O][C][=C][C][=C][Ring1][#Branch1][C@@][Branch1][C][C][Branch1][C][O][C@H1][Ring1][=C][C][C@@H1][Ring2][Ring2][Ring1][Ring2][Ring1][Ring2]'] is penetrating the blood brain barrier."} {"text":"The molecule represented with the canonical SMILES CCN(CC)c1ccc(NC(=O)c2ccccc2N)c(C)c1 is penetrating the blood brain barrier."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/test_0-8.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is penetrating the blood brain barrier.\nSELFIES: ['[C][C][Branch1][C][C][Branch1][C][C][O][C][=Branch1][C][=O][C][C][C][C][=C][C][=C][Branch1][#Branch2][N][Branch1][Ring2][C][C][Cl][C][C][Cl][C][=C][Ring1][=N]']\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is penetrating the blood brain barrier.\nMolecule InChI: InChI=1S\/C43H58N2O13\/c1-12-45(13-2)31(47)20-55-30-19-28-38(51)33-32(30)34-40(26(8)37(33)50)58-43(10,41(34)52)56-18-17-29(54-11)23(5)39(57-27(9)46)25(7)36(49)24(6)35(48)21(3)15-14-16-22(4)42(53)44-28\/h14-19,21,23-25,29,35-36,39,48-51H,12-13,20H2,1-11H3,(H,44,53)\/b15-14+,18-17+,22-16-\/t21-,23+,24+,25+,29-,35-,36+,39+,43-\/m0\/s1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/test_0-14.jsonl": "{"text":"User: I'm searching for the DeepSMILES of a molecule that is penetrating the blood brain barrier?\nAssistant: This is a molecule that is penetrating the blood brain barrier: CCC)C)OC=O)CCCccccNCCCl)))CCCl))))cc6"} 
{"text":"User: I'm looking for the SELFIES of a molecule that is not penetrating the blood brain barrier to reach the brain?\nAssistant: This is a molecule that is not penetrating the blood brain barrier to reach the brain: ['[C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][O][C][=C][Branch1][C][O][C][Branch1][C][C][=C][C][=Branch1][=Branch1][=C][Ring1][=N][Ring1][Branch2][C][=Branch1][C][=O][C@@][Branch1][C][C][Branch2][Branch1][=Branch2][O][\/C][=C][\/C@H1][Branch1][Ring1][O][C][C@@H1][Branch1][C][C][C@@H1][Branch1][#Branch1][O][C][Branch1][C][C][=O][C@H1][Branch1][C][C][C@H1][Branch1][C][O][C@H1][Branch1][C][C][C@@H1][Branch1][C][O][C@@H1][Branch1][C][C][\/C][=C][\/C][=C][Branch1][C][\/C][C][=Branch1][C][=O][N][Ring2][Ring2][=C][O][Ring2][Ring2][#Branch1]']"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/valid_0-17.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES of CN(C)[C@@H]1C(=O)\/C(=C(\/O)NCN2CCCC2)C(=O)[C@@]2(O)C(=O)C3=C(O)c4c(O)cccc4[C@@](C)(O)[C@H]3C[C@@H]12 penetrating the blood brain barrier to reach the brain?\nConstraint: Even if you are not sure, you must pick either A or B without using any additional words.\nOptions:\n(A) True\n(B) False\nAnswer: A"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES of ['[C][C][N][Branch1][Ring1][C][C][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C][Branch1][C][C][=C][Ring1][P]'] penetrating the blood brain barrier?\nConstraint: Even if you are not sure, you must pick either a or b without using any other words.\nOptions:\n(a) True\n(b) False\nAnswer: a"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/valid_0-14.jsonl": "{"text":"User: I'm looking for the DeepSMILES of a molecule that is penetrating the blood brain barrier to reach the brain?\nAssistant: This is a molecule that is penetrating 
the blood brain barrier to reach the brain: CNC)[C@@H]C=O)\/C=C\/O)NCNCCCC5))))))))C=O)[C@@]O)C=O)C=CO)ccO)cccc6[C@@]C)O)[C@H]%10C[C@@H]%18%14"} {"text":"User: I'm searching for the SMILES of a molecule that is penetrating the blood brain barrier to reach the brain?\nAssistant: This is a molecule that is penetrating the blood brain barrier to reach the brain: CCN(CC)c1ccc(NC(=O)c2ccccc2N)c(C)c1"}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/test_0-4.jsonl": "{"text":"InChI=1S\/C18H27Cl2NO2\/c1-18(2,3)23-17(22)6-4-5-15-7-9-16(10-8-15)21(13-11-19)14-12-20\/h7-10H,4-6,11-14H2,1-3H3 represents a molecule that is identified as penetrating the blood brain barrier to reach the brain."} {"text":"CCN(CC)C(=O)COc1cc2c(O)c3c(O)c(C)c4c(c13)C(=O)[C@@](C)(O\/C=C\/[C@H](OC)[C@@H](C)[C@@H](OC(C)=O)[C@H](C)[C@H](O)[C@H](C)[C@@H](O)[C@@H](C)\/C=C\/C=C(\/C)C(=O)N2)O4 represents a molecule that is not identified as penetrating the blood brain barrier to reach the brain."}", "/scratch/micpie/export/blood_brain_barrier_martins_et_al/test_0-12.jsonl": "{"text":"User: Is the molecule with the canonical SMILES CC(C)(C)OC(=O)CCCc1ccc(N(CCCl)CCCl)cc1 penetrating the blood brain barrier to reach the brain?\nAssistant: Yes, it is penetrating the blood brain barrier to reach the brain."} {"text":"User: Is the molecule with the canonical SMILES CCN(CC)C(=O)COc1cc2c(O)c3c(O)c(C)c4c(c13)C(=O)[C@@](C)(O\/C=C\/[C@H](OC)[C@@H](C)[C@@H](OC(C)=O)[C@H](C)[C@H](O)[C@H](C)[C@@H](O)[C@@H](C)\/C=C\/C=C(\/C)C(=O)N2)O4 penetrating the blood brain barrier to reach the brain?\nAssistant: No, it is not penetrating the blood brain barrier to reach the brain."}", "/scratch/micpie/export/drug_protein_ec_number/test_0-1.jsonl": "{"text":"The drug [H][C@](CC)(C1=CC=C(O)C=C1)[C@]([H])(CC)C1=CC=C(O)C=C1 targets the protein AKR1C1. 
Furthermore, the protein AKR1C1 catalyzes the trans-1,2-dihydrobenzene-1,2-diol dehydrogenase (EC 1.3.1.20) reaction."} {"text":"The drug NC=NC=CNC=C5)))C=O)N6 targets the protein HPRT1. Furthermore, the protein HPRT1 catalyzes the hypoxanthine phosphoribosyltransferase (EC 2.4.2.8) reaction."}", "/scratch/micpie/export/drug_protein_ec_number/valid_0-0.jsonl": "{"text":"The drug Caffeine targets the protein PIK3CA which catalyzes the non-specific serine\/threonine protein kinase (EC 2.7.11.1) reaction."} {"text":"The drug [N][C][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][=Branch1][C][=O][C][=C][C][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][N][=C][Ring1][=C] targets the protein TPSB2 which catalyzes the tryptase (EC 3.4.21.59) reaction."}", "/scratch/micpie/export/drug_protein_ec_number/test_0-2.jsonl": "{"text":"User: Can you give me an example for a protein that binds the drug Hexestrol?\nAssistant: Yes, the drug Hexestrol targets the protein AKR1C1.\nUser: Can you tell me which reaction the protein AKR1C1 catalyzes?\nAssistant: The protein AKR1C1 catalyzes a trans-1,2-dihydrobenzene-1,2-diol dehydrogenase (EC 1.3.1.20) reaction."} {"text":"User: Can you come up with an example for a protein that binds the drug NC1=NC2=C(NC=C2)C(=O)N1?\nAssistant: Sure, the drug NC1=NC2=C(NC=C2)C(=O)N1 targets the protein HPRT1.\nUser: Can you tell me which reaction the protein HPRT1 catalyzes?\nAssistant: The protein HPRT1 catalyzes a hypoxanthine phosphoribosyltransferase (EC 2.4.2.8) reaction."}", "/scratch/micpie/export/drug_protein_ec_number/test_0-0.jsonl": "{"text":"The drug Hexestrol targets the protein AKR1C1 which catalyzes the trans-1,2-dihydrobenzene-1,2-diol dehydrogenase (EC 1.3.1.20) reaction."} {"text":"The drug 9-Deazaguanine targets the protein HPRT1 which catalyzes the hypoxanthine phosphoribosyltransferase (EC 2.4.2.8) reaction."}", 
"/scratch/micpie/export/drug_protein_ec_number/train_0-0.jsonl": "{"text":"The drug Compound 19 targets the protein PTPN1 which catalyzes the protein-tyrosine-phosphatase (EC 3.1.3.48) reaction."} {"text":"The drug Histidine targets the protein HAL which catalyzes the histidine ammonia-lyase (EC 4.3.1.3) reaction."}", "/scratch/micpie/export/drug_protein_ec_number/valid_0-2.jsonl": "{"text":"User: Can you come up with one example for a protein that binds the drug Cn1c(=O)c2c(ncn2C)n(C)c1=O?\nAssistant: Sure, the drug Cn1c(=O)c2c(ncn2C)n(C)c1=O targets the protein PIK3CA.\nUser: Can you tell me which reaction the protein PIK3CA catalyzes?\nAssistant: The protein PIK3CA catalyzes a non-specific serine\/threonine protein kinase (EC 2.7.11.1) reaction."} {"text":"User: Can you come up with an example for a protein that binds the drug InChI=1S\/C26H29N3O\/c27-17-21-7-4-8-24(15-21)23-11-13-29(14-12-23)26(30)25-16-22(18-28-19-25)10-9-20-5-2-1-3-6-20\/h1-8,15-16,18-19,23H,9-14,17,27H2?\nAssistant: Sure, the drug InChI=1S\/C26H29N3O\/c27-17-21-7-4-8-24(15-21)23-11-13-29(14-12-23)26(30)25-16-22(18-28-19-25)10-9-20-5-2-1-3-6-20\/h1-8,15-16,18-19,23H,9-14,17,27H2 targets the protein TPSB2.\nUser: Can you tell me which reaction the protein TPSB2 catalyzes?\nAssistant: The protein TPSB2 catalyzes a tryptase (EC 3.4.21.59) reaction."}", "/scratch/micpie/export/drug_protein_ec_number/valid_0-1.jsonl": "{"text":"The drug Caffeine targets the protein PIK3CA. Furthermore, the protein PIK3CA catalyzes the non-specific serine\/threonine protein kinase (EC 2.7.11.1) reaction."} {"text":"The drug [4-(3-AMINOMETHYL-PHENYL)-PIPERIDIN-1-YL]-(5-PHENETHYL- PYRIDIN-3-YL)-METHANONE targets the protein TPSB2. 
Furthermore, the protein TPSB2 catalyzes the tryptase (EC 3.4.21.59) reaction."}", "/scratch/micpie/export/drug_protein_ec_number/train_0-2.jsonl": "{"text":"User: Can you come up with one example for a protein that binds the drug [H][C@@](C)(COC1=CC=C(C=C1)[C@]1([H])OC2=C(S[C@]1([H])C1=CC=C(O)C=C1)C=C(O)C=C2)N1CCCC1?\nAssistant: Yes, the drug [H][C@@](C)(COC1=CC=C(C=C1)[C@]1([H])OC2=C(S[C@]1([H])C1=CC=C(O)C=C1)C=C(O)C=C2)N1CCCC1 targets the protein PTPN1.\nUser: Can you tell me which reaction the protein PTPN1 catalyzes?\nAssistant: The protein PTPN1 catalyzes a protein-tyrosine-phosphatase (EC 3.1.3.48) reaction."} {"text":"User: Can you come up with one example for a protein that binds the drug Histidine?\nAssistant: Sure, the drug Histidine targets the protein HAL.\nUser: Can you tell me which reaction the protein HAL catalyzes?\nAssistant: The protein HAL catalyzes a histidine ammonia-lyase (EC 4.3.1.3) reaction."}", "/scratch/micpie/export/drug_protein_ec_number/train_0-1.jsonl": "{"text":"The drug Compound 19 targets the protein PTPN1. Furthermore, the protein PTPN1 catalyzes the protein-tyrosine-phosphatase (EC 3.1.3.48) reaction."} {"text":"The drug Histidine targets the protein HAL. Furthermore, the protein HAL catalyzes the histidine ammonia-lyase (EC 4.3.1.3) reaction."}", "/scratch/micpie/export/opv/test_0-10.jsonl": "{"text":"User: I aim to build a non-fullerene organic solar cell device with a power conversion efficiency (PCE) of 5.12%.\nAssistant: That's interesting. 
Do you have a donor polymer in mind?\nUser: Yes, I would like to use a polymer with monomer SELFIES [C][C][=C][C][Branch2][Branch1][Ring1][C][Branch1][O][C][C][Branch1][Branch1][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Branch1][C][C][C][C][C][C][C][=C][S][C][Branch1][P][C][=C][C][=C][Branch1][C][C][C][=N][S][N][=C][Ring1][#Branch2][Ring1][Branch1][=C][Ring1][#C][=C][Ring1][#C][S][Ring2][Ring2][Branch1] and need to know the PDI and molecular weight (Mw) of the polymer I should use.\nAssistant: I recommend trying a molecular weight (Mw) of 47.6 g\/mol and a PDI of 1.79."} {"text":"User: I wish to create a PCBM organic solar cell device with a power conversion efficiency of 2.41%.\nAssistant: Cool. Do you have a donor polymer in mind?\nUser: Yes, I would like to use a polymer with monomer SMILES CC1=CC2=C(C(C3=CC=C(SCC(CCCC)CC)S3)=C(C=C(C4=CC=C(C5=C6C(C(N5CC(CC)CCCC)=O)=C(C7=CC=C(C)S7)N(CC(CCCC)CC)C6=O)S4)S8)C8=C2C9=CC=C(SCC(CCCC)CC)S9)S1 and must know the polydispersity index and molecular weight of the polymer I should use.\nAssistant: I propose trying a molecular weight of 10.9 g\/mol and a polydispersity index of 1.62."}", "/scratch/micpie/export/opv/valid_0-8.jsonl": "{"text":"The PCBM OPV device with a donor polymer with monomer SELFIES [C][C][=C][C][Branch1][#C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][=C][Branch2][Ring2][=C][C][=C][C][=C][Branch1][Ring2][S][Ring1][Branch1][C][=C][Branch2][Ring1][N][C][=C][Branch1][#C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][=C][Branch1][C][C][S][Ring2][Ring1][Ring2][S][Ring2][Ring1][#Branch2][S][Ring2][Ring2][#C] and weight-average molecular weight 46.2 g\/mol and PDI of 2.10 has a short-circuit current density of 9.37 mA\/cm^2."} {"text":"The non-fullerene organic photovoltaics (OPV) device with a donor polymer with monomer DeepSMILES CC=CC=CCOCCCCCCCC)))))))))=CC=CC=CCC=C\/C=CCOCCCCCCCCCC))))))))CCCCCCCCCC)))))))))))))=O))\/C#N))))S5)))=CC)S5)))))S)))C-1=C6OCCCCCCCC)))))))))))))S5 and Mw 49.4 g\/mol and 
polydispersity index of 1.90 has a short-circuit current density of tested devices of 4.50 mA\/cm^2."}", "/scratch/micpie/export/opv/train_0-8.jsonl": "{"text":"The non-fullerene organic solar cell device with a donor polymer with monomer SMILES CC1=CC(N(C(CCCCCCCC)CCCCCCCC)C2=C3C=CC(C4=CC=C(C5=CC=C(C6=CC=C(C)S6)C7=NSN=C57)S4)=C2)=C3C=C1 and weight-average molecular weight (Mw) 73.0 g\/mol and PDI of 1.97 has a short-circuit current density of 6.92 mA\/cm^2."} {"text":"The non-fullerene organic photovoltaics (OPV) device with a donor polymer with monomer canonical SMILES CCCCCCCCOc1ccc(-c2nc3c(-c4ccc(C)s4)c(OCCCCCCCC)c(OCCCCCCCC)c(-c4ccc(-c5ccc6c7ccc(C)cc7n(CCCCCCCC)c6c5)s4)c3nc2-c2ccc(OCCCCCCCC)cc2)cc1 and weight-average molecular weight 21.4 g\/mol and polydispersity index (PDI) of 1.69 has a short-circuit current density of 7.48 mA\/cm^2."}", "/scratch/micpie/export/opv/test_0-5.jsonl": "{"text":"Question: What is the highest-occupied molecular orbital (HOMO) energy of a polymer with monomer canonical SMILES CCCCC(CC)CC1(CC(CC)CCCC)c2cc(C)sc2-c2sc(-c3ccc(C)c4nsnc34)cc21?\nAnswer: The highest-occupied molecular orbital (HOMO) energy is -5.30 eV."} {"text":"Question: What is the HOMO energy of a polymer with monomer SELFIES [C][C][=C][C][=C][Branch2][=Branch2][#C][C][Branch2][Ring1][Branch1][C][=C][C][=C][Branch1][N][S][C][C][Branch1][Branch1][C][C][C][C][C][C][S][Ring1][=C][=C][Branch2][=Branch1][Branch2][C][=C][Branch2][=Branch1][C][C][=C][C][=C][Branch2][Branch1][#Branch1][C][=C][C][Branch2][Ring1][C][C][Branch1][=C][N][Ring1][Branch1][C][C][Branch1][Ring1][C][C][C][C][C][C][=O][=C][Branch1][O][C][=C][C][=C][Branch1][C][C][S][Ring1][=Branch1][N][Branch1][O][C][C][Branch1][Branch1][C][C][C][C][C][C][C][Ring2][Ring1][=C][=O][S][Ring2][Ring2][Branch1][S][C][Ring1][C][=C][Ring2][Branch1][N][C][=C][C][=C][Branch1][N][S][C][C][Branch1][Branch1][C][C][C][C][C][C][S][Ring1][=C][S][Ring2][=Branch1][=N]?\nAnswer: The HOMO energy is -5.30 eV."}", 
"/scratch/micpie/export/opv/valid_0-9.jsonl": "{"text":"The non-fullerene organic photovoltaics device with a donor polymer with monomer SMILES CC1=CC(CCCCCCCCCCCCCC)=C(C2=CC3=C(S2)C=C(C4=C(CCCCCCCCCCCCCC)C=C(C)S4)S3)S1 and Mw 46.2 g\/mol and polydispersity index (PDI) of 2.10 has a fill factor of 0.48."} {"text":"The PC71BM OPV device with a donor polymer with monomer SMILES CC1=CC2=C(C(OCCCCCCCC)=C(C=C(C3=C4C(C=C(\/C=C(C(OCC(CCCCCCCC)CCCCCCCCCC)=O)\/C#N)S4)=C(C)S3)S5)C5=C2OCCCCCCCC)S1 and Mw 49.4 g\/mol and polydispersity index of 1.90 has a fill factor of 0.47."}", "/scratch/micpie/export/opv/test_0-1.jsonl": "{"text":"Question: What is the open-circuit voltage of tested devices of a PC71BM OPV device with a donor polymer with monomer DeepSMILES CC=CCCCCCCCC))))CC))))CCCCCC))))CC))))C=CSCC=CC=CC)C=NSN=C95)))))))))=C5))))))=C-1S5 and Mw 47.6 g\/mol and PDI of 1.79?\nAnswer: The Voc is 0.61 V."} {"text":"Question: What is the open-circuit voltage of tested devices of a PCBM OPV device with a donor polymer with monomer canonical SMILES CCCCC(CC)CSc1ccc(-c2c3cc(-c4ccc(C5=C6C(=O)N(CC(CC)CCCC)C(c7ccc(C)s7)=C6C(=O)N5CC(CC)CCCC)s4)sc3c(-c3ccc(SCC(CC)CCCC)s3)c3cc(C)sc23)s1 and Mw 10.9 g\/mol and polydispersity index (PDI) of 1.62?\nAnswer: The Voc is 0.78 V."}", "/scratch/micpie/export/opv/valid_0-0.jsonl": "{"text":"Question: What is the power conversion efficiency of a non-fullerene organic photovoltaics (OPV) device with a donor polymer with monomer canonical SMILES CCCCCCCCCCCCCCc1cc(C)sc1-c1cc2sc(-c3sc(C)cc3CCCCCCCCCCCCCC)cc2s1 and weight-average molecular weight 46.2 g\/mol and PDI of 2.10?\nAnswer: The power conversion efficiency is 2.34 %."} {"text":"Question: What is the power conversion efficiency of a PC71BM organic solar cell device with a donor polymer with monomer SELFIES 
[C][C][=C][C][=C][Branch2][#Branch1][P][C][Branch1][#Branch2][O][C][C][C][C][C][C][C][C][=C][Branch2][Branch1][P][C][=C][Branch2][Branch1][O][C][=C][C][Branch2][Ring2][=N][C][=C][Branch2][Ring2][Ring2][\/C][=C][Branch2][Ring1][=N][C][Branch2][Ring1][Branch2][O][C][C][Branch1][=Branch2][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][=O][\/C][#N][S][Ring2][Ring1][S][=C][Branch1][C][C][S][Ring2][Ring2][Ring2][S][C][Ring1][C][=C][Ring2][Branch1][=Branch1][O][C][C][C][C][C][C][C][C][S][Ring2][=Branch1][C] and weight-average molecular weight (Mw) 49.4 g\/mol and PDI of 1.90?\nAnswer: The power conversion efficiency is 1.63 %."}", "/scratch/micpie/export/opv/test_0-2.jsonl": "{"text":"Question: What is the short-circuit current density of tested devices of a non-fullerene organic photovoltaics device with a donor polymer with monomer SELFIES [C][C][=C][C][Branch2][Branch1][Ring1][C][Branch1][O][C][C][Branch1][Branch1][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Branch1][C][C][C][C][C][C][C][=C][S][C][Branch1][P][C][=C][C][=C][Branch1][C][C][C][=N][S][N][=C][Ring1][#Branch2][Ring1][Branch1][=C][Ring1][#C][=C][Ring1][#C][S][Ring2][Ring2][Branch1] and weight-average molecular weight (Mw) 47.6 g\/mol and PDI of 1.79?\nAnswer: The short-circuit current density is 15.73 mA\/cm^2."} {"text":"Question: What is the short-circuit current density of a PC71BM OPV device with a donor polymer with monomer SELFIES 
[C][C][=C][C][=C][Branch2][=Branch2][#C][C][Branch2][Ring1][Branch1][C][=C][C][=C][Branch1][N][S][C][C][Branch1][Branch1][C][C][C][C][C][C][S][Ring1][=C][=C][Branch2][=Branch1][Branch2][C][=C][Branch2][=Branch1][C][C][=C][C][=C][Branch2][Branch1][#Branch1][C][=C][C][Branch2][Ring1][C][C][Branch1][=C][N][Ring1][Branch1][C][C][Branch1][Ring1][C][C][C][C][C][C][=O][=C][Branch1][O][C][=C][C][=C][Branch1][C][C][S][Ring1][=Branch1][N][Branch1][O][C][C][Branch1][Branch1][C][C][C][C][C][C][C][Ring2][Ring1][=C][=O][S][Ring2][Ring2][Branch1][S][C][Ring1][C][=C][Ring2][Branch1][N][C][=C][C][=C][Branch1][N][S][C][C][Branch1][Branch1][C][C][C][C][C][C][S][Ring1][=C][S][Ring2][=Branch1][=N] and weight-average molecular weight 10.9 g\/mol and polydispersity index of 1.62?\nAnswer: The Jsc is 7.79 mA\/cm^2."}", "/scratch/micpie/export/opv/valid_0-10.jsonl": "{"text":"User: I want to build a PCBM organic photovoltaics device with a power conversion efficiency of 2.34%.\nAssistant: Cool. Do you have a donor polymer in mind?\nUser: Yes, I would like to use a polymer with monomer DeepSMILES CC=CCCCCCCCCCCCCCCC))))))))))))))=CC=CC=CS5)C=CC=CCCCCCCCCCCCCCC))))))))))))))C=CC)S5)))))S5)))))))S5 and need to know the PDI and Mw of the polymer I should use.\nAssistant: I recommend trying a Mw of 46.2 g\/mol and a PDI of 2.10."} {"text":"User: I aim to create a non-fullerene organic photovoltaics (OPV) device with a power conversion efficiency (PCE) of 1.63%.\nAssistant: Cool. 
Do you have a donor polymer in mind?\nUser: Yes, I would like to use a polymer with monomer InChI InChI=1S\/C58H85NO4S4\/c1-7-11-15-19-23-24-26-30-34-45(33-29-25-20-16-12-8-2)42-63-58(60)46(41-59)38-47-39-48-44(6)65-57(56(48)66-47)51-40-50-53(62-36-32-28-22-18-14-10-4)54-49(37-43(5)64-54)52(55(50)67-51)61-35-31-27-21-17-13-9-3\/h37-40,45H,7-36,42H2,1-6H3\/b46-38- and would like to know the polydispersity index and molecular weight of the polymer I should use.\nAssistant: I propose trying a molecular weight of 49.4 g\/mol and a polydispersity index of 1.90."}", "/scratch/micpie/export/opv/train_0-6.jsonl": "{"text":"Question: What is the lowest-unoccupied molecular orbital (LUMO) energy of a polymer with monomer SELFIES [C][C][=C][C][Branch2][=Branch1][Branch2][N][Branch2][Ring1][Ring2][C][Branch1][=Branch2][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][=C][C][=C][C][Branch2][Ring2][Branch1][C][=C][C][=C][Branch2][Ring1][#Branch2][C][=C][C][=C][Branch1][O][C][=C][C][=C][Branch1][C][C][S][Ring1][=Branch1][C][=N][S][N][=C][Ring1][#C][Ring1][Branch1][S][Ring2][Ring1][Ring2][=C][Ring2][Ring1][#Branch2][=C][Ring2][Ring1][#Branch2][C][=C][Ring2][Branch1][C]?\nAnswer: The lowest-unoccupied molecular orbital (LUMO) energy of the polymer is -3.60 eV."} {"text":"Question: What is the lowest-unoccupied molecular orbital (LUMO) energy of a polymer with monomer InChI InChI=1S\/C82H109N3O4S2\/c1-8-13-18-23-28-33-54-85-70-59-61(6)38-49-68(70)69-50-44-65(60-71(69)85)72-52-53-74(91-72)76-80-79(75(73-51-39-62(7)90-73)81(88-57-36-31-26-21-16-11-4)82(76)89-58-37-32-27-22-17-12-5)83-77(63-40-45-66(46-41-63)86-55-34-29-24-19-14-9-2)78(84-80)64-42-47-67(48-43-64)87-56-35-30-25-20-15-10-3\/h38-53,59-60H,8-37,54-58H2,1-7H3?\nAnswer: The lowest-unoccupied molecular orbital (LUMO) energy of the polymer is -3.39 eV."}", "/scratch/micpie/export/opv/valid_0-6.jsonl": "{"text":"Question: What is the lowest-unoccupied molecular orbital energy of a polymer with monomer SELFIES 
[C][C][=C][C][Branch1][#C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][=C][Branch2][Ring2][=C][C][=C][C][=C][Branch1][Ring2][S][Ring1][Branch1][C][=C][Branch2][Ring1][N][C][=C][Branch1][#C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][=C][Branch1][C][C][S][Ring2][Ring1][Ring2][S][Ring2][Ring1][#Branch2][S][Ring2][Ring2][#C]?\nAnswer: The lowest-unoccupied molecular orbital energy is -3.10 eV."} {"text":"Question: What is the lowest-unoccupied molecular orbital energy of a polymer with monomer SELFIES [C][C][=C][C][=C][Branch2][#Branch1][P][C][Branch1][#Branch2][O][C][C][C][C][C][C][C][C][=C][Branch2][Branch1][P][C][=C][Branch2][Branch1][O][C][=C][C][Branch2][Ring2][=N][C][=C][Branch2][Ring2][Ring2][\/C][=C][Branch2][Ring1][=N][C][Branch2][Ring1][Branch2][O][C][C][Branch1][=Branch2][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][=O][\/C][#N][S][Ring2][Ring1][S][=C][Branch1][C][C][S][Ring2][Ring2][Ring2][S][C][Ring1][C][=C][Ring2][Branch1][=Branch1][O][C][C][C][C][C][C][C][C][S][Ring2][=Branch1][C]?\nAnswer: The lowest-unoccupied molecular orbital energy is -3.51 eV."}", "/scratch/micpie/export/opv/test_0-9.jsonl": "{"text":"The PC71BM organic photovoltaics device with a donor polymer with monomer canonical SMILES CCCCC(CC)CC1(CC(CC)CCCC)c2cc(C)sc2-c2sc(-c3ccc(C)c4nsnc34)cc21 and Mw 47.6 g\/mol and polydispersity index of 1.79 has a fill factor of tested devices of 0.53."} {"text":"The PC71BM organic photovoltaics device with a donor polymer with monomer InChI InChI=1S\/C66H84N2O2S8\/c1-11-19-23-43(15-5)37-67-61(52-28-27-41(9)73-52)59-60(66(67)70)62(68(65(59)69)38-44(16-6)24-20-12-2)53-30-29-49(75-53)54-36-48-58(51-32-34-56(77-51)72-40-46(18-8)26-22-14-4)63-47(35-42(10)74-63)57(64(48)78-54)50-31-33-55(76-50)71-39-45(17-7)25-21-13-3\/h27-36,43-46H,11-26,37-40H2,1-10H3 and weight-average molecular weight (Mw) 10.9 g\/mol and polydispersity index of 1.62 has a fill factor of tested devices of 0.40."}", "/scratch/micpie/export/opv/test_0-0.jsonl": 
"{"text":"Question: What is the power conversion efficiency (PCE) of a non-fullerene organic solar cell device with a donor polymer with monomer SMILES CC1=CC(C(CC(CCCC)CC)(CC(CCCC)CC)C2=C3SC(C4=CC=C(C)C5=NSN=C45)=C2)=C3S1 and weight-average molecular weight (Mw) 47.6 g\/mol and polydispersity index of 1.79?\nAnswer: The power conversion efficiency is 5.12 %."} {"text":"Question: What is the power conversion efficiency (PCE) of a PCBM organic solar cell device with a donor polymer with monomer InChI InChI=1S\/C66H84N2O2S8\/c1-11-19-23-43(15-5)37-67-61(52-28-27-41(9)73-52)59-60(66(67)70)62(68(65(59)69)38-44(16-6)24-20-12-2)53-30-29-49(75-53)54-36-48-58(51-32-34-56(77-51)72-40-46(18-8)26-22-14-4)63-47(35-42(10)74-63)57(64(48)78-54)50-31-33-55(76-50)71-39-45(17-7)25-21-13-3\/h27-36,43-46H,11-26,37-40H2,1-10H3 and weight-average molecular weight (Mw) 10.9 g\/mol and polydispersity index (PDI) of 1.62?\nAnswer: The power conversion efficiency is 2.41 %."}", "/scratch/micpie/export/opv/valid_0-7.jsonl": "{"text":"The PCBM organic photovoltaics device with a donor polymer with monomer canonical SMILES CCCCCCCCCCCCCCc1cc(C)sc1-c1cc2sc(-c3sc(C)cc3CCCCCCCCCCCCCC)cc2s1 and weight-average molecular weight 46.2 g\/mol and PDI of 2.10 has a power conversion efficiency (PCE) of 2.34%."} {"text":"The non-fullerene organic solar cell device with a donor polymer with monomer DeepSMILES CC=CC=CCOCCCCCCCC)))))))))=CC=CC=CCC=C\/C=CCOCCCCCCCCCC))))))))CCCCCCCCCC)))))))))))))=O))\/C#N))))S5)))=CC)S5)))))S)))C-1=C6OCCCCCCCC)))))))))))))S5 and weight-average molecular weight (Mw) 49.4 g\/mol and polydispersity index of 1.90 has a power conversion efficiency (PCE) of 1.63%."}", "/scratch/micpie/export/opv/test_0-3.jsonl": "{"text":"Question: What is the fill factor of tested devices of a non-fullerene organic photovoltaics (OPV) device with a donor polymer with monomer SMILES CC1=CC(C(CC(CCCC)CC)(CC(CCCC)CC)C2=C3SC(C4=CC=C(C)C5=NSN=C45)=C2)=C3S1 and Mw 47.6 g\/mol and polydispersity index 
(PDI) of 1.79?\nAnswer: The FF is 0.53."} {"text":"Question: What is the fill factor of a PCBM organic photovoltaics (OPV) device with a donor polymer with monomer InChI InChI=1S\/C66H84N2O2S8\/c1-11-19-23-43(15-5)37-67-61(52-28-27-41(9)73-52)59-60(66(67)70)62(68(65(59)69)38-44(16-6)24-20-12-2)53-30-29-49(75-53)54-36-48-58(51-32-34-56(77-51)72-40-46(18-8)26-22-14-4)63-47(35-42(10)74-63)57(64(48)78-54)50-31-33-55(76-50)71-39-45(17-7)25-21-13-3\/h27-36,43-46H,11-26,37-40H2,1-10H3 and weight-average molecular weight 10.9 g\/mol and polydispersity index (PDI) of 1.62?\nAnswer: The FF is 0.40."}", "/scratch/micpie/export/opv/valid_0-11.jsonl": "{"text":"User: I want to design a PC71BM organic photovoltaics (OPV) device with a power conversion efficiency (PCE) of 2.34%.\nAssistant: Cool. Do you have additional constraints?\nUser: I would like to have a short-circuit current density of 9.37 mA\/cm^2.\nAssistant: I propose trying a molecular weight (Mw) of 46.2 g\/mol and polydispersity index of 2.10 of a polymer with monomer InChI InChI=1S\/C44H68S4\/c1-5-7-9-11-13-15-17-19-21-23-25-27-29-37-31-35(3)45-43(37)41-33-39-40(47-41)34-42(48-39)44-38(32-36(4)46-44)30-28-26-24-22-20-18-16-14-12-10-8-6-2\/h31-34H,5-30H2,1-4H3."} {"text":"User: I want to build a PC71BM organic photovoltaics (OPV) device with a power conversion efficiency (PCE) of 1.63%.\nAssistant: That's interesting. 
Do you have additional constraints?\nUser: Yes, I would like to have a short-circuit current density of 4.50 mA\/cm^2.\nAssistant: I recommend trying a molecular weight of 49.4 g\/mol and PDI of 1.90 of a polymer with monomer SMILES CC1=CC2=C(C(OCCCCCCCC)=C(C=C(C3=C4C(C=C(\/C=C(C(OCC(CCCCCCCC)CCCCCCCCCC)=O)\/C#N)S4)=C(C)S3)S5)C5=C2OCCCCCCCC)S1."}", "/scratch/micpie/export/opv/train_0-0.jsonl": "{"text":"Question: What is the power conversion efficiency of a non-fullerene organic photovoltaics device with a donor polymer with monomer SELFIES [C][C][=C][C][Branch2][=Branch1][Branch2][N][Branch2][Ring1][Ring2][C][Branch1][=Branch2][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][=C][C][=C][C][Branch2][Ring2][Branch1][C][=C][C][=C][Branch2][Ring1][#Branch2][C][=C][C][=C][Branch1][O][C][=C][C][=C][Branch1][C][C][S][Ring1][=Branch1][C][=N][S][N][=C][Ring1][#C][Ring1][Branch1][S][Ring2][Ring1][Ring2][=C][Ring2][Ring1][#Branch2][=C][Ring2][Ring1][#Branch2][C][=C][Ring2][Branch1][C] and weight-average molecular weight (Mw) 73.0 g\/mol and polydispersity index (PDI) of 1.97?\nAnswer: The PCE is 3.60 %."} {"text":"Question: What is the power conversion efficiency of a PC71BM OPV device with a donor polymer with monomer SMILES CC1=CC(N(CCCCCCCC)C2=C3C=CC(C4=CC=C(C5=C(OCCCCCCCC)C(OCCCCCCCC)=C(C6=CC=C(C)S6)C7=C5N=C(C8=CC=C(OCCCCCCCC)C=C8)C(C9=CC=C(OCCCCCCCC)C=C9)=N7)S4)=C2)=C3C=C1 and weight-average molecular weight (Mw) 21.4 g\/mol and PDI of 1.69?\nAnswer: The power conversion efficiency is 3.21 %."}", "/scratch/micpie/export/opv/test_0-6.jsonl": "{"text":"Question: What is the LUMO energy of a polymer with monomer canonical SMILES CCCCC(CC)CC1(CC(CC)CCCC)c2cc(C)sc2-c2sc(-c3ccc(C)c4nsnc34)cc21?\nAnswer: The LUMO energy is -3.55 eV."} {"text":"Question: What is the lowest-unoccupied molecular orbital (LUMO) energy of a polymer with monomer SMILES 
CC1=CC2=C(C(C3=CC=C(SCC(CCCC)CC)S3)=C(C=C(C4=CC=C(C5=C6C(C(N5CC(CC)CCCC)=O)=C(C7=CC=C(C)S7)N(CC(CCCC)CC)C6=O)S4)S8)C8=C2C9=CC=C(SCC(CCCC)CC)S9)S1?\nAnswer: The lowest-unoccupied molecular orbital (LUMO) energy is -3.45 eV."}", "/scratch/micpie/export/opv/train_0-10.jsonl": "{"text":"User: I aim to create a PC71BM organic photovoltaics (OPV) device with a power conversion efficiency (PCE) of 3.60%.\nAssistant: Cool. Do you have a donor polymer in mind?\nUser: Yes, I would like to use a polymer with monomer SELFIES [C][C][=C][C][Branch2][=Branch1][Branch2][N][Branch2][Ring1][Ring2][C][Branch1][=Branch2][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][=C][C][=C][C][Branch2][Ring2][Branch1][C][=C][C][=C][Branch2][Ring1][#Branch2][C][=C][C][=C][Branch1][O][C][=C][C][=C][Branch1][C][C][S][Ring1][=Branch1][C][=N][S][N][=C][Ring1][#C][Ring1][Branch1][S][Ring2][Ring1][Ring2][=C][Ring2][Ring1][#Branch2][=C][Ring2][Ring1][#Branch2][C][=C][Ring2][Branch1][C] and must know the polydispersity index and molecular weight of the polymer I should use.\nAssistant: I suggest trying a molecular weight of 73.0 g\/mol and a polydispersity index of 1.97."} {"text":"User: I want to create a PCBM organic photovoltaics device with a power conversion efficiency of 3.21%.\nAssistant: Cool. 
Do you have a donor polymer in mind?\nUser: Yes, I would like to use a polymer with monomer SELFIES [C][C][=C][C][Branch2][=Branch2][P][N][Branch1][=Branch2][C][C][C][C][C][C][C][C][C][=C][C][=C][C][Branch2][Branch2][#Branch2][C][=C][C][=C][Branch2][#Branch1][#C][C][=C][Branch1][#Branch2][O][C][C][C][C][C][C][C][C][C][Branch1][#Branch2][O][C][C][C][C][C][C][C][C][=C][Branch1][O][C][=C][C][=C][Branch1][C][C][S][Ring1][=Branch1][C][=C][Ring2][Ring1][=C][N][=C][Branch2][Ring1][Ring2][C][=C][C][=C][Branch1][#Branch2][O][C][C][C][C][C][C][C][C][C][=C][Ring1][#C][C][Branch2][Ring1][Ring2][C][=C][C][=C][Branch1][#Branch2][O][C][C][C][C][C][C][C][C][C][=C][Ring1][#C][=N][Ring2][Ring2][Ring2][S][Ring2][=Branch1][Branch1][=C][Ring2][=Branch1][O][=C][Ring2][=Branch1][O][C][=C][Ring2][#Branch1][#Branch2] and need to know the polydispersity index and molecular weight (Mw) of the polymer I should use.\nAssistant: I suggest trying a molecular weight (Mw) of 21.4 g\/mol and a polydispersity index of 1.69."}", "/scratch/micpie/export/opv/train_0-3.jsonl": "{"text":"Question: What is the fill factor of tested devices of a PC71BM OPV device with a donor polymer with monomer SELFIES [C][C][=C][C][Branch2][=Branch1][Branch2][N][Branch2][Ring1][Ring2][C][Branch1][=Branch2][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][=C][C][=C][C][Branch2][Ring2][Branch1][C][=C][C][=C][Branch2][Ring1][#Branch2][C][=C][C][=C][Branch1][O][C][=C][C][=C][Branch1][C][C][S][Ring1][=Branch1][C][=N][S][N][=C][Ring1][#C][Ring1][Branch1][S][Ring2][Ring1][Ring2][=C][Ring2][Ring1][#Branch2][=C][Ring2][Ring1][#Branch2][C][=C][Ring2][Branch1][C] and weight-average molecular weight (Mw) 73.0 g\/mol and polydispersity index of 1.97?\nAnswer: The FF is 0.63."} {"text":"Question: What is the fill factor of tested devices of a non-fullerene organic photovoltaics (OPV) device with a donor polymer with monomer SELFIES 
[C][C][=C][C][Branch2][=Branch2][P][N][Branch1][=Branch2][C][C][C][C][C][C][C][C][C][=C][C][=C][C][Branch2][Branch2][#Branch2][C][=C][C][=C][Branch2][#Branch1][#C][C][=C][Branch1][#Branch2][O][C][C][C][C][C][C][C][C][C][Branch1][#Branch2][O][C][C][C][C][C][C][C][C][=C][Branch1][O][C][=C][C][=C][Branch1][C][C][S][Ring1][=Branch1][C][=C][Ring2][Ring1][=C][N][=C][Branch2][Ring1][Ring2][C][=C][C][=C][Branch1][#Branch2][O][C][C][C][C][C][C][C][C][C][=C][Ring1][#C][C][Branch2][Ring1][Ring2][C][=C][C][=C][Branch1][#Branch2][O][C][C][C][C][C][C][C][C][C][=C][Ring1][#C][=N][Ring2][Ring2][Ring2][S][Ring2][=Branch1][Branch1][=C][Ring2][=Branch1][O][=C][Ring2][=Branch1][O][C][=C][Ring2][#Branch1][#Branch2] and Mw 21.4 g\/mol and PDI of 1.69?\nAnswer: The fill factor is 0.52."}", "/scratch/micpie/export/opv/train_0-12.jsonl": "{"text":"User: Can you suggest a donor polymer for a PC71BM organic photovoltaics device with a power conversion efficiency of 3.60% and a short-circuit current density of 6.92 mA\/cm^2?\nAssistant: I propose trying a Mw of 73.0 g\/mol and polydispersity index of 1.97 of a polymer with monomer canonical SMILES CCCCCCCCC(CCCCCCCC)n1c2cc(C)ccc2c2ccc(-c3ccc(-c4ccc(-c5ccc(C)s5)c5nsnc45)s3)cc21."} {"text":"User: Can you propose a donor polymer for a PCBM OPV device with a power conversion efficiency (PCE) of 3.21% and a short-circuit current density of 7.48 mA\/cm^2?\nAssistant: I recommend trying a Mw of 21.4 g\/mol and PDI of 1.69 of a polymer with monomer canonical SMILES CCCCCCCCOc1ccc(-c2nc3c(-c4ccc(C)s4)c(OCCCCCCCC)c(OCCCCCCCC)c(-c4ccc(-c5ccc6c7ccc(C)cc7n(CCCCCCCC)c6c5)s4)c3nc2-c2ccc(OCCCCCCCC)cc2)cc1."}", "/scratch/micpie/export/opv/test_0-13.jsonl": "{"text":"Task: Predict the power conversion efficiency of a PC71BM organic photovoltaics (OPV) device based on a description of the donor polymer.\nDescription: The donor polymer has monomer DeepSMILES CC=CCCCCCCCC))))CC))))CCCCCC))))CC))))C=CSCC=CC=CC)C=NSN=C95)))))))))=C5))))))=C-1S5 and weight-average 
molecular weight (Mw) 47.6 g\/mol and polydispersity index of 1.79.\nSolution: The PCE is 5.12 %."} {"text":"Task: Predict the power conversion efficiency of a PC71BM organic photovoltaics device based on a description of the donor polymer.\nDescription: The donor polymer has monomer SMILES CC1=CC2=C(C(C3=CC=C(SCC(CCCC)CC)S3)=C(C=C(C4=CC=C(C5=C6C(C(N5CC(CC)CCCC)=O)=C(C7=CC=C(C)S7)N(CC(CCCC)CC)C6=O)S4)S8)C8=C2C9=CC=C(SCC(CCCC)CC)S9)S1 and Mw 10.9 g\/mol and PDI of 1.62.\nSolution: The power conversion efficiency is 2.41 %."}", "/scratch/micpie/export/opv/valid_0-2.jsonl": "{"text":"Question: What is the short-circuit current density of a non-fullerene organic photovoltaics device with a donor polymer with monomer SMILES CC1=CC(CCCCCCCCCCCCCC)=C(C2=CC3=C(S2)C=C(C4=C(CCCCCCCCCCCCCC)C=C(C)S4)S3)S1 and Mw 46.2 g\/mol and polydispersity index of 2.10?\nAnswer: The Jsc is9.37 mA\/cm^2."} {"text":"Question: What is the short-circuit current density of a PC71BM organic photovoltaics (OPV) device with a donor polymer with monomer InChI InChI=1S\/C58H85NO4S4\/c1-7-11-15-19-23-24-26-30-34-45(33-29-25-20-16-12-8-2)42-63-58(60)46(41-59)38-47-39-48-44(6)65-57(56(48)66-47)51-40-50-53(62-36-32-28-22-18-14-10-4)54-49(37-43(5)64-54)52(55(50)67-51)61-35-31-27-21-17-13-9-3\/h37-40,45H,7-36,42H2,1-6H3\/b46-38- and Mw 49.4 g\/mol and polydispersity index of 1.90?\nAnswer: The short-circuit current density is 4.50 mA\/cm^2."}", "/scratch/micpie/export/opv/valid_0-1.jsonl": "{"text":"Question: What is the open-circuit voltage of a non-fullerene organic photovoltaics (OPV) device with a donor polymer with monomer InChI InChI=1S\/C44H68S4\/c1-5-7-9-11-13-15-17-19-21-23-25-27-29-37-31-35(3)45-43(37)41-33-39-40(47-41)34-42(48-39)44-38(32-36(4)46-44)30-28-26-24-22-20-18-16-14-12-10-8-6-2\/h31-34H,5-30H2,1-4H3 and Mw 46.2 g\/mol and polydispersity index (PDI) of 2.10?\nAnswer: The open-circuit voltage is 0.53 V."} {"text":"Question: What is the open-circuit voltage of tested devices of a PC71BM 
organic photovoltaics device with a donor polymer with monomer SMILES CC1=CC2=C(C(OCCCCCCCC)=C(C=C(C3=C4C(C=C(\/C=C(C(OCC(CCCCCCCC)CCCCCCCCCC)=O)\/C#N)S4)=C(C)S3)S5)C5=C2OCCCCCCCC)S1 and Mw 49.4 g\/mol and polydispersity index of 1.90?\nAnswer: The Voc is0.77 V."}", "/scratch/micpie/export/opv/valid_0-13.jsonl": "{"text":"Task: Predict the power conversion efficiency (PCE) of a non-fullerene organic solar cell device based on a description of the donor polymer.\nDescription: The donor polymer has monomer DeepSMILES CC=CCCCCCCCCCCCCCCC))))))))))))))=CC=CC=CS5)C=CC=CCCCCCCCCCCCCCC))))))))))))))C=CC)S5)))))S5)))))))S5 and weight-average molecular weight (Mw) 46.2 g\/mol and PDI of 2.10.\nSolution: The PCE is 2.34 %."} {"text":"Task: Predict the power conversion efficiency (PCE) of a PCBM organic photovoltaics (OPV) device based on a description of the donor polymer.\nDescription: The donor polymer has monomer SMILES CC1=CC2=C(C(OCCCCCCCC)=C(C=C(C3=C4C(C=C(\/C=C(C(OCC(CCCCCCCC)CCCCCCCCCC)=O)\/C#N)S4)=C(C)S3)S5)C5=C2OCCCCCCCC)S1 and weight-average molecular weight (Mw) 49.4 g\/mol and PDI of 1.90.\nSolution: The power conversion efficiency is 1.63 %."}", "/scratch/micpie/export/opv/valid_0-5.jsonl": "{"text":"Question: What is the HOMO energy of a polymer with monomer canonical SMILES CCCCCCCCCCCCCCc1cc(C)sc1-c1cc2sc(-c3sc(C)cc3CCCCCCCCCCCCCC)cc2s1?\nAnswer: The HOMO energy is -5.10 eV."} {"text":"Question: What is the HOMO energy of a polymer with monomer InChI InChI=1S\/C58H85NO4S4\/c1-7-11-15-19-23-24-26-30-34-45(33-29-25-20-16-12-8-2)42-63-58(60)46(41-59)38-47-39-48-44(6)65-57(56(48)66-47)51-40-50-53(62-36-32-28-22-18-14-10-4)54-49(37-43(5)64-54)52(55(50)67-51)61-35-31-27-21-17-13-9-3\/h37-40,45H,7-36,42H2,1-6H3\/b46-38-?\nAnswer: The HOMO energy is -5.76 eV."}", "/scratch/micpie/export/opv/valid_0-4.jsonl": "{"text":"Question: What is the bandgap of the polymer of a polymer with monomer InChI 
InChI=1S\/C44H68S4\/c1-5-7-9-11-13-15-17-19-21-23-25-27-29-37-31-35(3)45-43(37)41-33-39-40(47-41)34-42(48-39)44-38(32-36(4)46-44)30-28-26-24-22-20-18-16-14-12-10-8-6-2\/h31-34H,5-30H2,1-4H3?\nAnswer: The bandgap of the polymer is1.88 eV."} {"text":"Question: What is the bandgap of a polymer with monomer canonical SMILES CCCCCCCCCCC(CCCCCCCC)COC(=O)\/C(C#N)=C\\c1cc2c(C)sc(-c3cc4c(OCCCCCCCC)c5sc(C)cc5c(OCCCCCCCC)c4s3)c2s1?\nAnswer: The bandgap of the polymer is1.33 eV."}", "/scratch/micpie/export/opv/train_0-5.jsonl": "{"text":"Question: What is the HOMO energy of a polymer with monomer DeepSMILES CC=CCNCCCCCCCCC))))))))CCCCCCCC)))))))))C=CC=CCC=CC=CC=CC=CC=CC=CC)S5)))))C=NSN=C95)))))))))S5)))))=C6)))))))=C-1C=C6?\nAnswer: The HOMO energy of the polymer is -5.50 eV."} {"text":"Question: What is the highest-occupied molecular orbital energy of a polymer with monomer canonical SMILES CCCCCCCCOc1ccc(-c2nc3c(-c4ccc(C)s4)c(OCCCCCCCC)c(OCCCCCCCC)c(-c4ccc(-c5ccc6c7ccc(C)cc7n(CCCCCCCC)c6c5)s4)c3nc2-c2ccc(OCCCCCCCC)cc2)cc1?\nAnswer: The highest-occupied molecular orbital energy of the polymer is -5.41 eV."}", "/scratch/micpie/export/opv/valid_0-12.jsonl": "{"text":"User: Can you propose a donor polymer for a PCBM organic solar cell device with a power conversion efficiency of 2.34% and a short-circuit current density of 9.37 mA\/cm^2?\nAssistant: I suggest trying a molecular weight of 46.2 g\/mol and polydispersity index (PDI) of 2.10 of a polymer with monomer SELFIES [C][C][=C][C][Branch1][#C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][=C][Branch2][Ring2][=C][C][=C][C][=C][Branch1][Ring2][S][Ring1][Branch1][C][=C][Branch2][Ring1][N][C][=C][Branch1][#C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][=C][Branch1][C][C][S][Ring2][Ring1][Ring2][S][Ring2][Ring1][#Branch2][S][Ring2][Ring2][#C]."} {"text":"User: Can you suggest a donor polymer for a non-fullerene OPV device with a power conversion efficiency (PCE) of 1.63% and a short-circuit current density of 4.50 
mA\/cm^2?\nAssistant: I suggest trying a Mw of 49.4 g\/mol and PDI of 1.90 of a polymer with monomer canonical SMILES CCCCCCCCCCC(CCCCCCCC)COC(=O)\/C(C#N)=C\\c1cc2c(C)sc(-c3cc4c(OCCCCCCCC)c5sc(C)cc5c(OCCCCCCCC)c4s3)c2s1."}", "/scratch/micpie/export/opv/train_0-2.jsonl": "{"text":"Question: What is the short-circuit current density of tested devices of a non-fullerene organic solar cell device with a donor polymer with monomer SELFIES [C][C][=C][C][Branch2][=Branch1][Branch2][N][Branch2][Ring1][Ring2][C][Branch1][=Branch2][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][=C][C][=C][C][Branch2][Ring2][Branch1][C][=C][C][=C][Branch2][Ring1][#Branch2][C][=C][C][=C][Branch1][O][C][=C][C][=C][Branch1][C][C][S][Ring1][=Branch1][C][=N][S][N][=C][Ring1][#C][Ring1][Branch1][S][Ring2][Ring1][Ring2][=C][Ring2][Ring1][#Branch2][=C][Ring2][Ring1][#Branch2][C][=C][Ring2][Branch1][C] and weight-average molecular weight (Mw) 73.0 g\/mol and PDI of 1.97?\nAnswer: The short-circuit current density is 6.92 mA\/cm^2."} {"text":"Question: What is the short-circuit current density of a PC71BM organic photovoltaics (OPV) device with a donor polymer with monomer canonical SMILES CCCCCCCCOc1ccc(-c2nc3c(-c4ccc(C)s4)c(OCCCCCCCC)c(OCCCCCCCC)c(-c4ccc(-c5ccc6c7ccc(C)cc7n(CCCCCCCC)c6c5)s4)c3nc2-c2ccc(OCCCCCCCC)cc2)cc1 and weight-average molecular weight (Mw) 21.4 g\/mol and polydispersity index (PDI) of 1.69?\nAnswer: The Jsc is7.48 mA\/cm^2."}", "/scratch/micpie/export/opv/test_0-11.jsonl": "{"text":"User: I aim to design a PCBM organic solar cell device with a power conversion efficiency of 5.12%.\nAssistant: Do you have additional constraints?\nUser: Yeah, I would like to have a short-circuit current density of 15.73 mA\/cm^2.\nAssistant: I propose trying a Mw of 47.6 g\/mol and polydispersity index of 1.79 of a polymer with monomer InChI 
InChI=1S\/C33H44N2S3\/c1-7-11-13-23(9-3)19-33(20-24(10-4)14-12-8-2)26-17-22(6)36-31(26)32-27(33)18-28(37-32)25-16-15-21(5)29-30(25)35-38-34-29\/h15-18,23-24H,7-14,19-20H2,1-6H3."} {"text":"User: I want to create a PCBM organic photovoltaics device with a power conversion efficiency (PCE) of 2.41%.\nAssistant: Do you have additional constraints?\nUser: Indeed, I would like to have a short-circuit current density of 7.79 mA\/cm^2.\nAssistant: I propose trying a molecular weight of 10.9 g\/mol and polydispersity index (PDI) of 1.62 of a polymer with monomer SMILES CC1=CC2=C(C(C3=CC=C(SCC(CCCC)CC)S3)=C(C=C(C4=CC=C(C5=C6C(C(N5CC(CC)CCCC)=O)=C(C7=CC=C(C)S7)N(CC(CCCC)CC)C6=O)S4)S8)C8=C2C9=CC=C(SCC(CCCC)CC)S9)S1."}", "/scratch/micpie/export/opv/train_0-7.jsonl": "{"text":"The PCBM organic photovoltaics (OPV) device with a donor polymer with monomer canonical SMILES CCCCCCCCC(CCCCCCCC)n1c2cc(C)ccc2c2ccc(-c3ccc(-c4ccc(-c5ccc(C)s5)c5nsnc45)s3)cc21 and Mw 73.0 g\/mol and polydispersity index (PDI) of 1.97 has a power conversion efficiency (PCE) of 3.60%."} {"text":"The PC71BM OPV device with a donor polymer with monomer canonical SMILES CCCCCCCCOc1ccc(-c2nc3c(-c4ccc(C)s4)c(OCCCCCCCC)c(OCCCCCCCC)c(-c4ccc(-c5ccc6c7ccc(C)cc7n(CCCCCCCC)c6c5)s4)c3nc2-c2ccc(OCCCCCCCC)cc2)cc1 and weight-average molecular weight (Mw) 21.4 g\/mol and polydispersity index (PDI) of 1.69 has a power conversion efficiency (PCE) of 3.21%."}", "/scratch/micpie/export/opv/train_0-11.jsonl": "{"text":"User: I would like to design a PCBM organic solar cell device with a power conversion efficiency of 3.60%.\nAssistant: That's interesting. 
Do you have additional constraints?\nUser: Indeed, I would like to have a short-circuit current density of tested devices of 6.92 mA\/cm^2.\nAssistant: I recommend trying a molecular weight of 73.0 g\/mol and polydispersity index of 1.97 of a polymer with monomer canonical SMILES CCCCCCCCC(CCCCCCCC)n1c2cc(C)ccc2c2ccc(-c3ccc(-c4ccc(-c5ccc(C)s5)c5nsnc45)s3)cc21."} {"text":"User: I want to build a PC71BM organic photovoltaics device with a power conversion efficiency of 3.21%.\nAssistant: Do you have additional constraints?\nUser: I would like to have a short-circuit current density of 7.48 mA\/cm^2.\nAssistant: I recommend trying a molecular weight of 21.4 g\/mol and PDI of 1.69 of a polymer with monomer SELFIES [C][C][=C][C][Branch2][=Branch2][P][N][Branch1][=Branch2][C][C][C][C][C][C][C][C][C][=C][C][=C][C][Branch2][Branch2][#Branch2][C][=C][C][=C][Branch2][#Branch1][#C][C][=C][Branch1][#Branch2][O][C][C][C][C][C][C][C][C][C][Branch1][#Branch2][O][C][C][C][C][C][C][C][C][=C][Branch1][O][C][=C][C][=C][Branch1][C][C][S][Ring1][=Branch1][C][=C][Ring2][Ring1][=C][N][=C][Branch2][Ring1][Ring2][C][=C][C][=C][Branch1][#Branch2][O][C][C][C][C][C][C][C][C][C][=C][Ring1][#C][C][Branch2][Ring1][Ring2][C][=C][C][=C][Branch1][#Branch2][O][C][C][C][C][C][C][C][C][C][=C][Ring1][#C][=N][Ring2][Ring2][Ring2][S][Ring2][=Branch1][Branch1][=C][Ring2][=Branch1][O][=C][Ring2][=Branch1][O][C][=C][Ring2][#Branch1][#Branch2]."}", "/scratch/micpie/export/opv/train_0-1.jsonl": "{"text":"Question: What is the open-circuit voltage of tested devices of a PC71BM OPV device with a donor polymer with monomer canonical SMILES CCCCCCCCC(CCCCCCCC)n1c2cc(C)ccc2c2ccc(-c3ccc(-c4ccc(-c5ccc(C)s5)c5nsnc45)s3)cc21 and weight-average molecular weight (Mw) 73.0 g\/mol and polydispersity index (PDI) of 1.97?\nAnswer: The Voc is0.89 V."} {"text":"Question: What is the open-circuit voltage of tested devices of a PC71BM OPV device with a donor polymer with monomer DeepSMILES 
CC=CCNCCCCCCCC))))))))C=CC=CCC=CC=CC=COCCCCCCCC)))))))))COCCCCCCCC)))))))))=CC=CC=CC)S5)))))C=C6N=CC=CC=COCCCCCCCC)))))))))C=C6))))))CC=CC=COCCCCCCCC)))))))))C=C6))))))=N6))))))))))S5)))))=C6)))))))=C-1C=C6 and weight-average molecular weight 21.4 g\/mol and polydispersity index (PDI) of 1.69?\nAnswer: The Voc is0.86 V."}", "/scratch/micpie/export/opv/train_0-13.jsonl": "{"text":"Task: Predict the power conversion efficiency of a PC71BM organic photovoltaics (OPV) device based on a description of the donor polymer.\nDescription: The donor polymer has monomer SELFIES [C][C][=C][C][Branch2][=Branch1][Branch2][N][Branch2][Ring1][Ring2][C][Branch1][=Branch2][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][=C][C][=C][C][Branch2][Ring2][Branch1][C][=C][C][=C][Branch2][Ring1][#Branch2][C][=C][C][=C][Branch1][O][C][=C][C][=C][Branch1][C][C][S][Ring1][=Branch1][C][=N][S][N][=C][Ring1][#C][Ring1][Branch1][S][Ring2][Ring1][Ring2][=C][Ring2][Ring1][#Branch2][=C][Ring2][Ring1][#Branch2][C][=C][Ring2][Branch1][C] and weight-average molecular weight (Mw) 73.0 g\/mol and polydispersity index (PDI) of 1.97.\nSolution: The PCE is 3.60 %."} {"text":"Task: Predict the power conversion efficiency of a PC71BM organic photovoltaics device based on a description of the donor polymer.\nDescription: The donor polymer has monomer SMILES CC1=CC(N(CCCCCCCC)C2=C3C=CC(C4=CC=C(C5=C(OCCCCCCCC)C(OCCCCCCCC)=C(C6=CC=C(C)S6)C7=C5N=C(C8=CC=C(OCCCCCCCC)C=C8)C(C9=CC=C(OCCCCCCCC)C=C9)=N7)S4)=C2)=C3C=C1 and weight-average molecular weight 21.4 g\/mol and polydispersity index (PDI) of 1.69.\nSolution: The PCE is 3.21 %."}", "/scratch/micpie/export/opv/train_0-4.jsonl": "{"text":"Question: What is the bandgap of a polymer with monomer DeepSMILES CC=CCNCCCCCCCCC))))))))CCCCCCCC)))))))))C=CC=CCC=CC=CC=CC=CC=CC=CC)S5)))))C=NSN=C95)))))))))S5)))))=C6)))))))=C-1C=C6?\nAnswer: The bandgap of the polymer is1.88 eV."} {"text":"Question: What is the bandgap of a polymer with monomer SMILES 
CC1=CC(N(CCCCCCCC)C2=C3C=CC(C4=CC=C(C5=C(OCCCCCCCC)C(OCCCCCCCC)=C(C6=CC=C(C)S6)C7=C5N=C(C8=CC=C(OCCCCCCCC)C=C8)C(C9=CC=C(OCCCCCCCC)C=C9)=N7)S4)=C2)=C3C=C1?\nAnswer: The bandgap of the polymer is2.02 eV."}", "/scratch/micpie/export/opv/test_0-7.jsonl": "{"text":"The PC71BM organic photovoltaics (OPV) device with a donor polymer with monomer InChI InChI=1S\/C33H44N2S3\/c1-7-11-13-23(9-3)19-33(20-24(10-4)14-12-8-2)26-17-22(6)36-31(26)32-27(33)18-28(37-32)25-16-15-21(5)29-30(25)35-38-34-29\/h15-18,23-24H,7-14,19-20H2,1-6H3 and weight-average molecular weight 47.6 g\/mol and PDI of 1.79 has a power conversion efficiency (PCE) of 5.12%."} {"text":"The non-fullerene organic photovoltaics device with a donor polymer with monomer SELFIES [C][C][=C][C][=C][Branch2][=Branch2][#C][C][Branch2][Ring1][Branch1][C][=C][C][=C][Branch1][N][S][C][C][Branch1][Branch1][C][C][C][C][C][C][S][Ring1][=C][=C][Branch2][=Branch1][Branch2][C][=C][Branch2][=Branch1][C][C][=C][C][=C][Branch2][Branch1][#Branch1][C][=C][C][Branch2][Ring1][C][C][Branch1][=C][N][Ring1][Branch1][C][C][Branch1][Ring1][C][C][C][C][C][C][=O][=C][Branch1][O][C][=C][C][=C][Branch1][C][C][S][Ring1][=Branch1][N][Branch1][O][C][C][Branch1][Branch1][C][C][C][C][C][C][C][Ring2][Ring1][=C][=O][S][Ring2][Ring2][Branch1][S][C][Ring1][C][=C][Ring2][Branch1][N][C][=C][C][=C][Branch1][N][S][C][C][Branch1][Branch1][C][C][C][C][C][C][S][Ring1][=C][S][Ring2][=Branch1][=N] and weight-average molecular weight 10.9 g\/mol and PDI of 1.62 has a power conversion efficiency of 2.41%."}", "/scratch/micpie/export/opv/train_0-9.jsonl": "{"text":"The PCBM OPV device with a donor polymer with monomer InChI InChI=1S\/C45H53N3S3\/c1-5-7-9-11-13-15-17-34(18-16-14-12-10-8-6-2)48-39-29-31(3)19-22-35(39)36-23-21-33(30-40(36)48)41-27-28-43(50-41)38-25-24-37(42-26-20-32(4)49-42)44-45(38)47-51-46-44\/h19-30,34H,5-18H2,1-4H3 and Mw 73.0 g\/mol and PDI of 1.97 has a fill factor of tested devices of 0.63."} {"text":"The PCBM organic solar cell device with a 
donor polymer with monomer SELFIES [C][C][=C][C][Branch2][=Branch2][P][N][Branch1][=Branch2][C][C][C][C][C][C][C][C][C][=C][C][=C][C][Branch2][Branch2][#Branch2][C][=C][C][=C][Branch2][#Branch1][#C][C][=C][Branch1][#Branch2][O][C][C][C][C][C][C][C][C][C][Branch1][#Branch2][O][C][C][C][C][C][C][C][C][=C][Branch1][O][C][=C][C][=C][Branch1][C][C][S][Ring1][=Branch1][C][=C][Ring2][Ring1][=C][N][=C][Branch2][Ring1][Ring2][C][=C][C][=C][Branch1][#Branch2][O][C][C][C][C][C][C][C][C][C][=C][Ring1][#C][C][Branch2][Ring1][Ring2][C][=C][C][=C][Branch1][#Branch2][O][C][C][C][C][C][C][C][C][C][=C][Ring1][#C][=N][Ring2][Ring2][Ring2][S][Ring2][=Branch1][Branch1][=C][Ring2][=Branch1][O][=C][Ring2][=Branch1][O][C][=C][Ring2][#Branch1][#Branch2] and weight-average molecular weight (Mw) 21.4 g\/mol and PDI of 1.69 has a fill factor of tested devices of 0.52."}", "/scratch/micpie/export/opv/valid_0-3.jsonl": "{"text":"Question: What is the fill factor of a PCBM organic photovoltaics (OPV) device with a donor polymer with monomer InChI InChI=1S\/C44H68S4\/c1-5-7-9-11-13-15-17-19-21-23-25-27-29-37-31-35(3)45-43(37)41-33-39-40(47-41)34-42(48-39)44-38(32-36(4)46-44)30-28-26-24-22-20-18-16-14-12-10-8-6-2\/h31-34H,5-30H2,1-4H3 and Mw 46.2 g\/mol and PDI of 2.10?\nAnswer: The FF is 0.48."} {"text":"Question: What is the fill factor of tested devices of a PCBM organic photovoltaics device with a donor polymer with monomer SELFIES [C][C][=C][C][=C][Branch2][#Branch1][P][C][Branch1][#Branch2][O][C][C][C][C][C][C][C][C][=C][Branch2][Branch1][P][C][=C][Branch2][Branch1][O][C][=C][C][Branch2][Ring2][=N][C][=C][Branch2][Ring2][Ring2][\/C][=C][Branch2][Ring1][=N][C][Branch2][Ring1][Branch2][O][C][C][Branch1][=Branch2][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][=O][\/C][#N][S][Ring2][Ring1][S][=C][Branch1][C][C][S][Ring2][Ring2][Ring2][S][C][Ring1][C][=C][Ring2][Branch1][=Branch1][O][C][C][C][C][C][C][C][C][S][Ring2][=Branch1][C] and weight-average molecular weight (Mw) 49.4 g\/mol and 
polydispersity index of 1.90?\nAnswer: The FF is 0.47."}", "/scratch/micpie/export/opv/test_0-8.jsonl": "{"text":"The PC71BM OPV device with a donor polymer with monomer InChI InChI=1S\/C33H44N2S3\/c1-7-11-13-23(9-3)19-33(20-24(10-4)14-12-8-2)26-17-22(6)36-31(26)32-27(33)18-28(37-32)25-16-15-21(5)29-30(25)35-38-34-29\/h15-18,23-24H,7-14,19-20H2,1-6H3 and weight-average molecular weight (Mw) 47.6 g\/mol and PDI of 1.79 has a short-circuit current density of 15.73 mA\/cm^2."} {"text":"The PCBM organic photovoltaics device with a donor polymer with monomer SELFIES [C][C][=C][C][=C][Branch2][=Branch2][#C][C][Branch2][Ring1][Branch1][C][=C][C][=C][Branch1][N][S][C][C][Branch1][Branch1][C][C][C][C][C][C][S][Ring1][=C][=C][Branch2][=Branch1][Branch2][C][=C][Branch2][=Branch1][C][C][=C][C][=C][Branch2][Branch1][#Branch1][C][=C][C][Branch2][Ring1][C][C][Branch1][=C][N][Ring1][Branch1][C][C][Branch1][Ring1][C][C][C][C][C][C][=O][=C][Branch1][O][C][=C][C][=C][Branch1][C][C][S][Ring1][=Branch1][N][Branch1][O][C][C][Branch1][Branch1][C][C][C][C][C][C][C][Ring2][Ring1][=C][=O][S][Ring2][Ring2][Branch1][S][C][Ring1][C][=C][Ring2][Branch1][N][C][=C][C][=C][Branch1][N][S][C][C][Branch1][Branch1][C][C][C][C][C][C][S][Ring1][=C][S][Ring2][=Branch1][=N] and weight-average molecular weight 10.9 g\/mol and PDI of 1.62 has a short-circuit current density of tested devices of 7.79 mA\/cm^2."}", "/scratch/micpie/export/opv/test_0-4.jsonl": "{"text":"Question: What is the bandgap of a polymer with monomer SMILES CC1=CC(C(CC(CCCC)CC)(CC(CCCC)CC)C2=C3SC(C4=CC=C(C)C5=NSN=C45)=C2)=C3S1?\nAnswer: The bandgap is 1.43 eV."} {"text":"Question: What is the bandgap of the polymer of a polymer with monomer canonical SMILES CCCCC(CC)CSc1ccc(-c2c3cc(-c4ccc(C5=C6C(=O)N(CC(CC)CCCC)C(c7ccc(C)s7)=C6C(=O)N5CC(CC)CCCC)s4)sc3c(-c3ccc(SCC(CC)CCCC)s3)c3cc(C)sc23)s1?\nAnswer: The bandgap of the polymer is1.45 eV."}", "/scratch/micpie/export/opv/test_0-12.jsonl": "{"text":"User: Can you recommend a donor polymer 
for a PC71BM organic photovoltaics device with a power conversion efficiency (PCE) of 5.12% and a short-circuit current density of tested devices of 15.73 mA\/cm^2?\nAssistant: I propose trying a molecular weight of 47.6 g\/mol and PDI of 1.79 of a polymer with monomer DeepSMILES CC=CCCCCCCCC))))CC))))CCCCCC))))CC))))C=CSCC=CC=CC)C=NSN=C95)))))))))=C5))))))=C-1S5."} {"text":"User: Can you suggest a donor polymer for a PCBM organic photovoltaics (OPV) device with a power conversion efficiency (PCE) of 2.41% and a short-circuit current density of 7.79 mA\/cm^2?\nAssistant: I propose trying a molecular weight of 10.9 g\/mol and PDI of 1.62 of a polymer with monomer DeepSMILES CC=CC=CCC=CC=CSCCCCCC))))CC)))))S5)))))=CC=CC=CC=CC=CCCN5CCCC))CCCC)))))))=O))=CC=CC=CC)S5)))))NCCCCCC))))CC))))C5=O)))))))S5)))))S)))C-1=C6C=CC=CSCCCCCC))))CC)))))S5)))))))))S5."}", "/scratch/micpie/export/BACE/valid_0-8.jsonl": "{"text":"Based on the DeepSMILES S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6, the molecule has a pIC50 of the human beta-secretase 1 (BACE-1) of 8.699 M."} {"text":"Based on the SMILES Clc1cc2nc(n(c2cc1)C(CC(=O)NCC1CCOCC1)CC)N, the molecule has a negative log10 of the 50% inhibitory concentration of BACE-1 of 3.000 M."}", "/scratch/micpie/export/BACE/train_0-8.jsonl": "{"text":"Based on the SMILES O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C, the molecule has a negative log10 of the 50% inhibitory concentration of BACE-1 of 9.155 M."} {"text":"Based on the InChI InChI=1S\/C15H19ClN4O\/c16-11-5-6-13-12(8-11)19-15(17)20(13)7-1-2-14(21)18-9-10-3-4-10\/h5-6,8,10H,1-4,7,9H2,(H2,17,19)(H,18,21), the molecule has a pIC50 of the human beta-secretase 1 (BACE-1) of 2.545 M."}", "/scratch/micpie/export/BACE/test_0-5.jsonl": "{"text":"User: Can you estimate if the molecule with the InChI 
InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1 is inhibitory of the human beta-secretase 1?\nAssistant: Yes, this molecule is inhibitory of the human beta-secretase 1."} {"text":"User: Can you tell me if the molecule with the SELFIES [NH1][N][=N][N][=C][Ring1][Branch1][C][Branch2][Ring1][=Branch1][N][C][=N][C][Branch1][N][C][C][=C][Ring1][=Branch1][C][=C][C][=C][Ring1][=Branch1][Branch1][C][C][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1] is inhibitory of the human beta-secretase 1?\nAssistant: No, this molecule is not inhibitory of the human beta-secretase 1."}", "/scratch/micpie/export/BACE/valid_0-9.jsonl": "{"text":"The InChIInChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1 represents a molecule that has a pIC50 of the human beta-secretase 1 (BACE-1) of 8.699 M."} {"text":"The canonical SMILESCCC(CC(=O)NCC1CCOCC1)n1c(N)nc2cc(Cl)ccc21 represents a molecule that has a pIC50 of the human beta-secretase 1 (BACE-1) of 3.000 M."}", "/scratch/micpie/export/BACE/test_0-1.jsonl": "{"text":"Based on the InChI representation InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1, the molecule is inhibitory of the human beta-secretase 1."} {"text":"Based on the InChI InChI=1S\/C20H22N6\/c1-20(2)13-15-10-6-7-11-16(15)18(22-20)21-17(19-23-25-26-24-19)12-14-8-4-3-5-9-14\/h3-11,17H,12-13H2,1-2H3,(H,21,22)(H,23,24,25,26), the molecule is not inhibitory of BACE-1."}", "/scratch/micpie/export/BACE/valid_0-0.jsonl": "{"text":"The compound with the IUPAC name of 
[(3R,4S,5S)-5-[[4-amino-3-[(2R)-3-ethoxy-1,1,1-trifluoropropan-2-yl]oxy-5-fluorophenyl]methyl]-4-hydroxy-1,1-dioxothian-3-yl]-[(3-tert-butylphenyl)methyl]azanium displays inhibition of the human beta-secretase 1 (BACE-1)."} {"text":"The chemical with the IUPAC name of 3-(2-amino-5-chlorobenzimidazol-1-yl)-N-(oxan-4-ylmethyl)pentanamide shows no inhibition of the human beta-secretase 1 (BACE-1)."}", "/scratch/micpie/export/BACE/test_0-2.jsonl": "{"text":"The SELFIES [S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1] represents a molecule that is identified as inhibitory of the human beta-secretase 1."} {"text":"The SELFIES [NH1][N][=N][N][=C][Ring1][Branch1][C][Branch2][Ring1][=Branch1][N][C][=N][C][Branch1][N][C][C][=C][Ring1][=Branch1][C][=C][C][=C][Ring1][=Branch1][Branch1][C][C][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1] represents a molecule that is notidentified as inhibitory of the human beta-secretase 1."}", "/scratch/micpie/export/BACE/train_0-6.jsonl": "{"text":"User: Is the molecule with the SELFIES [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C] inhibitory of the human beta-secretase 1?\nAssistant: Yes, it is inhibitory of the human beta-secretase 1."} {"text":"User: Is the molecule with the InChI InChI=1S\/C15H19ClN4O\/c16-11-5-6-13-12(8-11)19-15(17)20(13)7-1-2-14(21)18-9-10-3-4-10\/h5-6,8,10H,1-4,7,9H2,(H2,17,19)(H,18,21) inhibitory of BACE-1?\nAssistant: No, it 
is not inhibitory of BACE-1."}", "/scratch/micpie/export/BACE/valid_0-6.jsonl": "{"text":"User: Is the molecule with the canonical SMILES CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F inhibitory of BACE-1?\nAssistant: Yes, it is inhibitory of BACE-1."} {"text":"User: Is the molecule with the canonical SMILES CCC(CC(=O)NCC1CCOCC1)n1c(N)nc2cc(Cl)ccc21 inhibitory of BACE-1?\nAssistant: No, it is not inhibitory of BACE-1."}", "/scratch/micpie/export/BACE/test_0-9.jsonl": "{"text":"The DeepSMILESS=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6 represents a molecule that has a pIC50 of the human beta-secretase 1 (BACE-1) of 8.699 M."} {"text":"The IUPAC name3,3-dimethyl-N-[2-phenyl-1-(2H-tetrazol-5-yl)ethyl]-2,4-dihydroisoquinolin-1-imine represents a molecule that has a pIC50 of the human beta-secretase 1 (BACE-1) of 3.195 M."}", "/scratch/micpie/export/BACE/test_0-0.jsonl": "{"text":"The compound with the canonical SMILES of CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1 shows inhibition of the human beta-secretase 1 (BACE-1)."} {"text":"The compound with the SMILES of [nH]1nnnc1C(NC1=NC(Cc2c1cccc2)(C)C)Cc1ccccc1 displays no inhibition of the human beta-secretase 1 (BACE-1)."}", "/scratch/micpie/export/BACE/valid_0-7.jsonl": "{"text":"The compound with the SELFIES [S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2] has a pIC50 of the human beta-secretase 1 (BACE-1) of 8.699 M."} {"text":"The compound with the IUPAC name 
3-(2-amino-5-chlorobenzimidazol-1-yl)-N-(oxan-4-ylmethyl)pentanamide has a negative log10 of the 50% inhibitory concentration of BACE-1 of 3.000 M."}", "/scratch/micpie/export/BACE/test_0-3.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibitory of the human beta-secretase 1.\nSMILES: S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibitory of the human beta-secretase 1.\nSELFIES: [NH1][N][=N][N][=C][Ring1][Branch1][C][Branch2][Ring1][=Branch1][N][C][=N][C][Branch1][N][C][C][=C][Ring1][=Branch1][C][=C][C][=C][Ring1][=Branch1][Branch1][C][C][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/BACE/train_0-0.jsonl": "{"text":"The chemical with the SELFIES of [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C] exhibits inhibition of the human beta-secretase 1 (BACE-1)."} {"text":"The compound with the InChI of InChI=1S\/C15H19ClN4O\/c16-11-5-6-13-12(8-11)19-15(17)20(13)7-1-2-14(21)18-9-10-3-4-10\/h5-6,8,10H,1-4,7,9H2,(H2,17,19)(H,18,21) displays no inhibition of the human beta-secretase 1 (BACE-1)."}", "/scratch/micpie/export/BACE/test_0-6.jsonl": "{"text":"User: Is the molecule with the InChI 
InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1 inhibitory of BACE-1?\nAssistant: Yes, it is inhibitory of BACE-1."} {"text":"User: Is the molecule with the canonical SMILES CC1(C)Cc2ccccc2C(NC(Cc2ccccc2)c2nnn[nH]2)=N1 inhibitory of the human beta-secretase 1?\nAssistant: No, it is not inhibitory of the human beta-secretase 1."}", "/scratch/micpie/export/BACE/train_0-3.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibitory of BACE-1.\nMolecule DeepSMILES: OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibitory of BACE-1.\nSELFIES: [Cl][C][=C][C][N][=C][Branch2][Ring1][=Branch2][N][Branch1][Branch2][C][=Ring1][Branch1][C][=C][Ring1][=Branch2][C][C][C][C][=Branch1][C][=O][N][C][C][C][C][Ring1][Ring1][N]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"}", "/scratch/micpie/export/BACE/valid_0-2.jsonl": "{"text":"The IUPAC name [(3R,4S,5S)-5-[[4-amino-3-[(2R)-3-ethoxy-1,1,1-trifluoropropan-2-yl]oxy-5-fluorophenyl]methyl]-4-hydroxy-1,1-dioxothian-3-yl]-[(3-tert-butylphenyl)methyl]azanium represents a molecule that is identified as inhibitory of the human beta-secretase 1."} {"text":"The IUPAC name 3-(2-amino-5-chlorobenzimidazol-1-yl)-N-(oxan-4-ylmethyl)pentanamide represents a molecule that is notidentified as inhibitory of BACE-1."}", "/scratch/micpie/export/BACE/valid_0-1.jsonl": "{"text":"Based on the SELFIES representation 
[S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2], the molecule is inhibitory of BACE-1."} {"text":"Based on the IUPAC name 3-(2-amino-5-chlorobenzimidazol-1-yl)-N-(oxan-4-ylmethyl)pentanamide, the molecule is not inhibitory of the human beta-secretase 1."}", "/scratch/micpie/export/BACE/valid_0-5.jsonl": "{"text":"User: Can you derive if the molecule with the IUPAC name [(3R,4S,5S)-5-[[4-amino-3-[(2R)-3-ethoxy-1,1,1-trifluoropropan-2-yl]oxy-5-fluorophenyl]methyl]-4-hydroxy-1,1-dioxothian-3-yl]-[(3-tert-butylphenyl)methyl]azanium is inhibitory of BACE-1?\nAssistant: Yes, this molecule is inhibitory of BACE-1."} {"text":"User: Can you estimate if the molecule with the InChI InChI=1S\/C18H25ClN4O2\/c1-2-14(10-17(24)21-11-12-5-7-25-8-6-12)23-16-4-3-13(19)9-15(16)22-18(23)20\/h3-4,9,12,14H,2,5-8,10-11H2,1H3,(H2,20,22)(H,21,24) is inhibitory of the human beta-secretase 1?\nAssistant: No, this molecule is not inhibitory of the human beta-secretase 1."}", "/scratch/micpie/export/BACE/valid_0-4.jsonl": "{"text":"Task: Please give me a DeepSMILES based on the description.\nDescription: A molecule that is inhibitory of the human beta-secretase 1.\nResult: S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6"} {"text":"Task: Please give me a SELFIES based on the description below.\nDescription: A molecule that is inhibitory of the human beta-secretase 1.\nResult: [Cl][C][=C][C][N][=C][Branch2][Ring1][#C][N][Branch1][Branch2][C][=Ring1][Branch1][C][=C][Ring1][=Branch2][C][Branch1][S][C][C][=Branch1][C][=O][N][C][C][C][C][O][C][C][Ring1][=Branch1][C][C][N]"}", 
"/scratch/micpie/export/BACE/train_0-5.jsonl": "{"text":"User: Can you derive if the molecule with the SMILES O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C is inhibitory of the human beta-secretase 1?\nAssistant: Yes, this molecule is inhibitory of the human beta-secretase 1."} {"text":"User: Can you estimate if the molecule with the IUPAC name 4-(2-amino-5-chlorobenzimidazol-1-yl)-N-(cyclopropylmethyl)butanamide is inhibitory of the human beta-secretase 1?\nAssistant: No, this molecule is not inhibitory of the human beta-secretase 1."}", "/scratch/micpie/export/BACE/train_0-2.jsonl": "{"text":"The SMILES O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C represents a molecule that is identified as inhibitory of the human beta-secretase 1."} {"text":"The SMILES Clc1cc2nc(n(c2cc1)CCCC(=O)NCC1CC1)N represents a molecule that is notidentified as inhibitory of the human beta-secretase 1."}", "/scratch/micpie/export/BACE/train_0-7.jsonl": "{"text":"The compound with the InChI InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1 has a negative log10 of the 50% inhibitory concentration of BACE-1 of 9.155 M."} {"text":"The compound with the canonical SMILES Nc1nc2cc(Cl)ccc2n1CCCC(=O)NCC1CC1 has a negative log10 of the 50% inhibitory concentration of BACE-1 of 2.545 M."}", "/scratch/micpie/export/BACE/train_0-1.jsonl": "{"text":"Based on the DeepSMILES OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C, the molecule is inhibitory of BACE-1."} {"text":"Based on the canonical SMILES representation Nc1nc2cc(Cl)ccc2n1CCCC(=O)NCC1CC1, the molecule is not inhibitory of BACE-1."}", "/scratch/micpie/export/BACE/train_0-4.jsonl": "{"text":"Task: Please generate a DeepSMILES based on the text description.\nDescription: A molecule that is inhibitory of the human beta-secretase 1.\nResult: 
OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C"} {"text":"Task: Please generate a molecule DeepSMILES based on the text description.\nDescription: A molecule that is inhibitory of the human beta-secretase 1.\nResult: Clcccncnc5cc9)))CCCC=O)NCCCC3))))))))))N"}", "/scratch/micpie/export/BACE/test_0-7.jsonl": "{"text":"The compound with the DeepSMILES S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6 has a negative log10 of the 50% inhibitory concentration of BACE-1 of 8.699 M."} {"text":"The compound with the IUPAC name 3,3-dimethyl-N-[2-phenyl-1-(2H-tetrazol-5-yl)ethyl]-2,4-dihydroisoquinolin-1-imine has a pIC50 of the human beta-secretase 1 (BACE-1) of 3.195 M."}", "/scratch/micpie/export/BACE/train_0-9.jsonl": "{"text":"The DeepSMILESOCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C represents a molecule that has a pIC50 of the human beta-secretase 1 (BACE-1) of 9.155 M."} {"text":"The canonical SMILESNc1nc2cc(Cl)ccc2n1CCCC(=O)NCC1CC1 represents a molecule that has a pIC50 of the human beta-secretase 1 (BACE-1) of 2.545 M."}", "/scratch/micpie/export/BACE/valid_0-3.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibitory of the human beta-secretase 1.\nMolecule IUPAC name: [(3R,4S,5S)-5-[[4-amino-3-[(2R)-3-ethoxy-1,1,1-trifluoropropan-2-yl]oxy-5-fluorophenyl]methyl]-4-hydroxy-1,1-dioxothian-3-yl]-[(3-tert-butylphenyl)methyl]azanium\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibitory of the human beta-secretase 1.\nMolecule IUPAC name: 3-(2-amino-5-chlorobenzimidazol-1-yl)-N-(oxan-4-ylmethyl)pentanamide\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other 
words.\nResult: False"}", "/scratch/micpie/export/BACE/test_0-8.jsonl": "{"text":"Based on the InChI InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1, the molecule has a pIC50 of the human beta-secretase 1 (BACE-1) of 8.699 M."} {"text":"Based on the IUPAC name 3,3-dimethyl-N-[2-phenyl-1-(2H-tetrazol-5-yl)ethyl]-2,4-dihydroisoquinolin-1-imine, the molecule has a pIC50 of the human beta-secretase 1 (BACE-1) of 3.195 M."}", "/scratch/micpie/export/BACE/test_0-4.jsonl": "{"text":"Task: Please generate a molecule canonical SMILES based on the text description.\nDescription: A molecule that is inhibitory of BACE-1.\nResult: CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1"} {"text":"Task: Please generate a molecule DeepSMILES based on the description below.\nDescription: A molecule that is inhibitory of the human beta-secretase 1.\nResult: [nH]nnnc5CNC=NCCcc6cccc6)))))))C)C)))))Ccccccc6"}", "/scratch/micpie/export/bio_ner_27/test_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: CATERPILLER: CARD, Transcription Enhancer, R (purine)-binding, Pyrin, Lots of Leucine Repeats; CLR: CATERPILLER-like receptor; CIITA: MHC class II transactivator; IL: interleukin; IPAF: Ice protease-activating factor; MHC: Major histocompatibility complex; NAIP: neuronal apoptosis inhibitory protein; NALP: Nacht Domain-, Leucine-Rich Repeat-, and PYD-Containing Protein; NF-κB: Nuclear factor kappa b; NLR: NOD-like receptor; RICK: RIP-like-interacting CLARP kinase; TLR: Toll-like receptor..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: CATERPILLER,0,11,Gene\/Protein\nTranscription 
Enhancer,19,41,Gene\/Protein\npurine,47,53,Chemical\/Drug\nPyrin,66,71,Gene\/Protein\nLeucine,81,88,Chemical\/Drug\nCLR,98,101,Gene\/Protein\nCATERPILLER - like receptor,103,130,Gene\/Protein\nCIITA,132,137,Gene\/Protein\nMHC class II transactivator,139,166,Gene\/Protein\nIL,168,170,Gene\/Protein\ninterleukin,172,183,Gene\/Protein\nIPAF,185,189,Gene\/Protein\nIce protease - activating factor,191,223,Gene\/Protein\nMHC,225,228,Gene\/Protein\nMajor histocompatibility complex,230,262,Gene\/Protein\nNAIP,264,268,Gene\/Protein\nneuronal apoptosis inhibitory protein,270,307,Gene\/Protein\nNALP,309,313,Gene\/Protein\nNacht Domain -, Leucine - Rich Repeat -, and PYD - Containing Protein,315,384,Gene\/Protein\nNF - κB,386,393,Gene\/Protein\nNuclear factor kappa b,395,417,Gene\/Protein\nNLR,419,422,Gene\/Protein\nNOD - like receptor,424,443,Gene\/Protein\nRICK,445,449,Gene\/Protein\nRIP - like - interacting CLARP kinase,451,488,Gene\/Protein\nTLR,490,493,Gene\/Protein\nToll - like receptor,495,515,Gene\/Protein"} {"text":"Task: Please carry out the NER task for the the text below.\nText: CATERPILLER: CARD, Transcription Enhancer, R (purine)-binding, Pyrin, Lots of Leucine Repeats; CLR: CATERPILLER-like receptor; CIITA: MHC class II transactivator; IL: interleukin; IPAF: Ice protease-activating factor; MHC: Major histocompatibility complex; NAIP: neuronal apoptosis inhibitory protein; NALP: Nacht Domain-, Leucine-Rich Repeat-, and PYD-Containing Protein; NF-κB: Nuclear factor kappa b; NLR: NOD-like receptor; RICK: RIP-like-interacting CLARP kinase; TLR: Toll-like receptor..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: CATERPILLER,0,11,Gene\/Protein\nTranscription Enhancer,19,41,Gene\/Protein\npurine,47,53,Chemical\/Drug\nPyrin,66,71,Gene\/Protein\nLeucine,81,88,Chemical\/Drug\nCLR,98,101,Gene\/Protein\nCATERPILLER - like 
receptor,103,130,Gene\/Protein\nCIITA,132,137,Gene\/Protein\nMHC class II transactivator,139,166,Gene\/Protein\nIL,168,170,Gene\/Protein\ninterleukin,172,183,Gene\/Protein\nIPAF,185,189,Gene\/Protein\nIce protease - activating factor,191,223,Gene\/Protein\nMHC,225,228,Gene\/Protein\nMajor histocompatibility complex,230,262,Gene\/Protein\nNAIP,264,268,Gene\/Protein\nneuronal apoptosis inhibitory protein,270,307,Gene\/Protein\nNALP,309,313,Gene\/Protein\nNacht Domain -, Leucine - Rich Repeat -, and PYD - Containing Protein,315,384,Gene\/Protein\nNF - κB,386,393,Gene\/Protein\nNuclear factor kappa b,395,417,Gene\/Protein\nNLR,419,422,Gene\/Protein\nNOD - like receptor,424,443,Gene\/Protein\nRICK,445,449,Gene\/Protein\nRIP - like - interacting CLARP kinase,451,488,Gene\/Protein\nTLR,490,493,Gene\/Protein\nToll - like receptor,495,515,Gene\/Protein"}", "/scratch/micpie/export/bio_ner_27/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Four new dammarane-type triterpenoid saponins such as chikusetsusaponin LM3 (1), chikusetsusaponin LM4 (2), chikusetsusaponin LM5 (3), chikusetsusaponin LM6 (4), and twenty known triterpenoid saponins such as ginsenoside Rb3 (5), ginsenoside Rc (6), ginsenoside Rd (7), ginsenoside Re (8), ginsenoside Rg1 (9), ginsenoside F3 (10), ginsenoside F5 (11), ginsenoside F6 (12), chikusetsusaponin IVa (13), chikusetsusaponin V (14), chikusetsusaponin L5 (15), chikusetsusaponin L9a (16), chikusetsusaponin L9bc (17), chikusetsusaponin L10 (18), chikusetsusaponin FK2 (19), chikusetsusaponin FK6 (20), chikusetsusaponin FK7 (21), chikusetsusaponin FT1 (22), chikusetsusaponin LM1 (23), and chikusetsusaponin LM2 (24), were isolated from the leaves of Panax japonicus C..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: dammarane,9,18,Chemical\/Drug\ntriterpenoid 
saponins,26,47,Chemical\/Drug\nchikusetsusaponin LM3,56,77,Chemical\/Drug\nchikusetsusaponin LM4,84,105,Chemical\/Drug\nchikusetsusaponin LM5,112,133,Chemical\/Drug\nchikusetsusaponin LM6,140,161,Chemical\/Drug\ntriterpenoid saponins,185,206,Chemical\/Drug\nginsenoside Rb3,215,230,Chemical\/Drug\nginsenoside Rc,237,251,Chemical\/Drug\nginsenoside Rd,258,272,Chemical\/Drug\nginsenoside Re,279,293,Chemical\/Drug\nginsenoside Rg1,300,315,Chemical\/Drug\nginsenoside F3,322,336,Chemical\/Drug\nginsenoside F5,344,358,Chemical\/Drug\nginsenoside F6,366,380,Chemical\/Drug\nchikusetsusaponin IVa,388,409,Chemical\/Drug\nchikusetsusaponin V,417,436,Chemical\/Drug\nchikusetsusaponin L5,444,464,Chemical\/Drug\nchikusetsusaponin L9a,472,493,Chemical\/Drug\nchikusetsusaponin L9bc,501,523,Chemical\/Drug\nchikusetsusaponin L10,531,552,Chemical\/Drug\nchikusetsusaponin FK2,560,581,Chemical\/Drug\nchikusetsusaponin FK6,589,610,Chemical\/Drug\nchikusetsusaponin FK7,618,639,Chemical\/Drug\nchikusetsusaponin FT1,647,668,Chemical\/Drug\nchikusetsusaponin LM1,676,697,Chemical\/Drug\nchikusetsusaponin LM2,709,730,Chemical\/Drug"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Of the 15 species, those that consume food with antimalarial or general antiparasitic properties were free from Haemoproteus, Plasmodium, Leucocytozoon, Trypanosoma and microfilariae: Philippine cockatoo (Cacatua haematuropygia), blue and yellow macaw (Ara ararauna), blue-throated macaw (Ara glaucogularis), blue-crowned conure (Thectocercus acuticaudatus), brown-throated conure (Eupsittula pertinax), nanday conure (Aratinga nenday), burrowing parrot (Cyanoliseus patagonus), blue-winged parrotlet (Forpus xanthopterygius), yellow-chevroned parakeet (Brotogeris chiriri), red-tailed Amazon (Amazona brasilensis), and blue-fronted Amazon (Amazona aestiva) (Table 1, and Additional file 3: Table S2)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with 
a high probability of being in the text.\nResult: Haemoproteus,112,124,Organism\/Species\nPlasmodium,126,136,Organism\/Species\nLeucocytozoon,138,151,Organism\/Species\nTrypanosoma,153,164,Organism\/Species\nmicrofilariae,169,182,Organism\/Species\nPhilippine cockatoo,184,203,Organism\/Species\nCacatua haematuropygia,206,228,Organism\/Species\nblue and yellow macaw,232,253,Organism\/Species\nAra ararauna,256,268,Organism\/Species\nblue - throated macaw,272,293,Organism\/Species\nAra glaucogularis,296,313,Organism\/Species\nblue - crowned conure,317,338,Organism\/Species\nThectocercus acuticaudatus,341,367,Organism\/Species\nbrown - throated conure,371,394,Organism\/Species\nEupsittula pertinax,397,416,Organism\/Species\nnanday conure,420,433,Organism\/Species\nAratinga nenday,436,451,Organism\/Species\nburrowing parrot,455,471,Organism\/Species\nCyanoliseus patagonus,474,495,Organism\/Species\nblue - winged parrotlet,499,522,Organism\/Species\nForpus xanthopterygius,525,547,Organism\/Species\nyellow - chevroned parakeet,551,578,Organism\/Species\nBrotogeris chiriri,581,599,Organism\/Species\nred - tailed Amazon,603,622,Organism\/Species\nAmazona brasilensis,625,644,Organism\/Species\nblue - fronted Amazon,652,673,Organism\/Species\nAmazona aestiva,676,691,Organism\/Species"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_5-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C15H15FN2OS\/c1-10-6-7-11(17)8-13(10)18-15(19)9-20-14-5-3-2-4-12(14)16\/h2-8H,9,17H2,1H3,(H,18,19)\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: Cc1ccc(N)cc1NC(=O)CSc1ccccc1F"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the 
InChI string.\nInChI string: InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8-\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_2-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the InChI string InChI=1S\/C16H12FN3O\/c17-12-3-1-11(2-4-12)15-13(16(18)21)9-14(20-15)10-5-7-19-8-6-10\/h1-9,20H,(H2,18,21)?\nAssistant: Sure, this molecule has a SMILES of NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1."} {"text":"User: Can you create the SMILES of the molecule with the InChI string InChI=1S\/C17H15N3O3\/c1-23-15-8-7-11(9-14(15)18)16-13(17(21)22)10-20(19-16)12-5-3-2-4-6-12\/h2-10H,18H2,1H3,(H,21,22)?\nAssistant: Sure, this molecule has a SMILES of COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_2-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C20H22FN.ClH\/c21-20-6-5-17-11-15(12-19(17)13-20)7-9-22-10-8-16-3-1-2-4-18(16)14-22;\/h1-6,13,15H,7-12,14H2;1H\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SMILES from the InChI string.\nInChI string: InChI=1S\/C24H20N4O3S\/c1-15-8-10-16(11-9-15)28-23(30)22-21(18-6-2-3-7-19(18)26-22)27-24(28)32-14-20(29)25-13-17-5-4-12-31-17\/h2-12,26H,13-14H2,1H3,(H,25,29)\nConstraint: Even if you are uncertain, you must answer with a representation without using any 
additional words.\nResult: Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_4-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C16H18N2O4S\/c1-11-15(8-9-22-11)16(19)17-10-12-2-6-14(7-3-12)23(20,21)18-13-4-5-13\/h2-3,6-9,13,18H,4-5,10H2,1H3,(H,17,19) can also be represented with the SMILES representation Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1."} {"text":"The molecule with the InChI string InChI=1S\/C16H15N3O2S\/c1-11-12(6-9-21-11)16(20)18-8-5-15-19-14(10-22-15)13-4-2-3-7-17-13\/h2-4,6-7,9-10H,5,8H2,1H3,(H,18,20) can also be represented with the SMILES representation Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_4-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the InChI string InChI=1S\/C16H18N2O4S\/c1-11-15(8-9-22-11)16(19)17-10-12-2-6-14(7-3-12)23(20,21)18-13-4-5-13\/h2-3,6-9,13,18H,4-5,10H2,1H3,(H,17,19)?\nAssistant: Yes, this molecule has a SMILES of Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1."} {"text":"User: Can you create the SMILES of the molecule with the InChI string InChI=1S\/C16H15N3O2S\/c1-11-12(6-9-21-11)16(20)18-8-5-15-19-14(10-22-15)13-4-2-3-7-17-13\/h2-4,6-7,9-10H,5,8H2,1H3,(H,18,20)?\nAssistant: Sure, this molecule has a SMILES of Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_1-4.jsonl": "{"text":"User: Can you create the SMILES of the molecule with the InChI string InChI=1S\/C19H18N2O2S\/c1-13-20-18(12-24-13)15-6-8-16(9-7-15)21-19(22)11-14-4-3-5-17(10-14)23-2\/h3-10,12H,11H2,1-2H3,(H,21,22)?\nAssistant: Yes, this molecule has a SMILES of COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1."} {"text":"User: Can you create the SMILES of the molecule with the InChI string InChI=1S\/C11H10N4O3\/c1-18-5-2-3-7-6(4-5)8-9(13-7)10(16)15(12)11(17)14-8\/h2-4,13H,12H2,1H3,(H,14,17)?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of 
COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_5-1.jsonl": "{"text":"The molecule with the SMILES Cc1ccc(N)cc1NC(=O)CSc1ccccc1F can also be represented with the InChI string InChI=1S\/C15H15FN2OS\/c1-10-6-7-11(17)8-13(10)18-15(19)9-20-14-5-3-2-4-12(14)16\/h2-8H,9,17H2,1H3,(H,18,19)."} {"text":"The molecule with the SMILES representation of CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21 can also be represented with the InChI string representation InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8-."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_4-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the InChI string InChI=1S\/C15H14N4OS2\/c1-8-7-17-15(22-8)19-13(20)10-5-12(9-3-4-9)18-14(21-2)11(10)6-16\/h5,7,9H,3-4H2,1-2H3,(H,17,19,20)?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of CSc1nc(C2CC2)cc(C(=O)Nc2ncc(C)s2)c1C#N."} {"text":"User: Can you generate the SMILES of the molecule with the InChI string InChI=1S\/C17H22FNO4\/c1-11(2)15(17(21)23-10-14-4-3-9-22-14)19-16(20)12-5-7-13(18)8-6-12\/h5-8,11,14-15H,3-4,9-10H2,1-2H3,(H,19,20)?\nAssistant: Of course, this molecule has a SMILES of CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_4-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the InChI string InChI=1S\/C17H22N2O3S\/c20-17(14-4-2-1-3-5-14)18-12-13-6-10-16(11-7-13)23(21,22)19-15-8-9-15\/h1-2,6-7,10-11,14-15,19H,3-5,8-9,12H2,(H,18,20)?\nAssistant: Yes, this molecule has a SMILES of O=C(NCc1ccc(S(=O)(=O)NC2CC2)cc1)C1CC=CCC1."} {"text":"User: Can you create the SMILES of the molecule with the InChI string InChI=1S\/C17H17FN4O\/c1-2-23-17-9-14(18)5-8-16(17)20-10-13-3-6-15(7-4-13)22-12-19-11-21-22\/h3-9,11-12,20H,2,10H2,1H3?\nAssistant: Sure, this 
molecule has a SMILES of CCOc1cc(F)ccc1NCc1ccc(-n2cncn2)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_1-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C22H26N2O4\/c1-4-21(25)24-17-9-7-6-8-16(17)13-18(24)22(26)23-14-15-10-11-19(28-5-2)20(12-15)27-3\/h6-12,18H,4-5,13-14H2,1-3H3,(H,23,26)\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C17H14ClN3O3S2\/c1-11-16(26(23,24)10-12-6-2-3-7-13(12)18)25-17(20-11)21-15(22)14-8-4-5-9-19-14\/h2-9H,10H2,1H3,(H,20,21,22)\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_3-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the InChI string.\nInChI string: InChI=1S\/C24H25BrClN3O3\/c1-5-29-16(3)24(15(2)28-29)27-23(30)11-7-17-6-9-21(31-4)18(12-17)14-32-22-10-8-19(26)13-20(22)25\/h6-13H,5,14H2,1-4H3,(H,27,30)\/b11-7+\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the InChI string.\nMolecule InChI string: 
InChI=1S\/C16H16FN3O2S\/c1-2-20(9-3-8-18)15(21)11-23-16-19-10-14(22-16)12-4-6-13(17)7-5-12\/h4-7,10H,2-3,9,11H2,1H3\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_1-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the SMILES CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C22H26N2O4\/c1-4-21(25)24-17-9-7-6-8-16(17)13-18(24)22(26)23-14-15-10-11-19(28-5-2)20(12-15)27-3\/h6-12,18H,4-5,13-14H2,1-3H3,(H,23,26)."} {"text":"User: Can you create the InChI string of the molecule with the SMILES Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C17H14ClN3O3S2\/c1-11-16(26(23,24)10-12-6-2-3-7-13(12)18)25-17(20-11)21-15(22)14-8-4-5-9-19-14\/h2-9H,10H2,1H3,(H,20,21,22)."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_5-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C15H15FN2OS\/c1-10-6-7-11(17)8-13(10)18-15(19)9-20-14-5-3-2-4-12(14)16\/h2-8H,9,17H2,1H3,(H,18,19) can also be represented with the SMILES Cc1ccc(N)cc1NC(=O)CSc1ccccc1F."} {"text":"The molecule with the InChI string InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8- can also be represented with the SMILES CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_1-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the SMILES COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC?\nAssistant: Of course, this molecule has a InChI string of 
InChI=1S\/C17H16FNO6S\/c1-24-14-9-11(3-8-16(20)21)10-15(17(14)25-2)26(22,23)19-13-6-4-12(18)5-7-13\/h3-10,19H,1-2H3,(H,20,21)\/b8-3+."} {"text":"User: Can you generate the InChI string of the molecule with the SMILES Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C15H14FNO4S\/c1-9-3-8-13(14(10(9)2)15(18)19)17-22(20,21)12-6-4-11(16)5-7-12\/h3-8,17H,1-2H3,(H,18,19)."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_5-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the InChI string InChI=1S\/C14H15N7S\/c1-2-4-10(5-3-1)8-21-14(18-19-20-21)22-9-12-15-13(17-16-12)11-6-7-11\/h1-5,11H,6-9H2,(H,15,16,17)?\nAssistant: Sure, this molecule has a SMILES of c1ccc(Cn2nnnc2SCc2nc(C3CC3)n[nH]2)cc1."} {"text":"User: Can you tell me the SMILES of the molecule with the InChI string InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of Nc1c2c([nH+]c3ccccc13)CCCC2."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_0-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the SMILES S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1."} {"text":"User: Can you generate the InChI string of the molecule with the SMILES COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C22H18ClN3O2\/c1-28-19-9-7-15(23)12-18(19)21(26-16-5-3-10-24-13-16)17-8-6-14-4-2-11-25-20(14)22(17)27\/h2-13,21,26-27H,1H3."}", 
"/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_3-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C15H17N5\/c1-3-4-13-9-14(20-15(19-13)16-10-17-20)18-12-7-5-11(2)6-8-12\/h5-10,18H,3-4H2,1-2H3 can also be represented with the SMILES CCCc1cc(Nc2ccc(C)cc2)n2ncnc2n1."} {"text":"The molecule with the InChI string representation of InChI=1S\/C15H18N2O4\/c1-11(18)16-13-5-2-4-12(10-13)15(20)21-9-8-17-7-3-6-14(17)19\/h2,4-5,10H,3,6-9H2,1H3,(H,16,18) can also be represented with the SMILES representation CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_0-1.jsonl": "{"text":"The molecule with the SMILES representation of S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1 can also be represented with the InChI string representation InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1."} {"text":"The molecule with the SMILES COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O can also be represented with the InChI string InChI=1S\/C22H18ClN3O2\/c1-28-19-9-7-15(23)12-18(19)21(26-16-5-3-10-24-13-16)17-8-6-14-4-2-11-25-20(14)22(17)27\/h2-13,21,26-27H,1H3."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_5-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C17H21N3O2S\/c1-11(2)15(17(22)20-6-3-4-7-20)19-16(21)12-9-14-13(18-10-12)5-8-23-14\/h5,8-11,15H,3-4,6-7H2,1-2H3,(H,19,21) can also be represented with the SMILES CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1."} {"text":"The molecule with the InChI string 
InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1 can also be represented with the SMILES representation CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_2-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C16H12FN3O\/c17-12-3-1-11(2-4-12)15-13(16(18)21)9-14(20-15)10-5-7-19-8-6-10\/h1-9,20H,(H2,18,21) can also be represented with the SMILES NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1."} {"text":"The molecule with the InChI string representation of InChI=1S\/C17H15N3O3\/c1-23-15-8-7-11(9-14(15)18)16-13(17(21)22)10-20(19-16)12-5-3-2-4-6-12\/h2-10H,18H2,1H3,(H,21,22) can also be represented with the SMILES representation COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_2-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C22H22Cl2N2O2\/c1-3-5-12-26(4-2)22(28)21(27)19-17-13-16(24)10-11-18(17)25-20(19)14-6-8-15(23)9-7-14\/h6-11,13,25H,3-5,12H2,1-2H3\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SMILES from the InChI string.\nMolecule InChI string: 
InChI=1S\/C22H26BrClN4O2\/c1-26(15-21(29)25-20-5-3-2-4-19(20)23)22(30)16-28-12-10-27(11-13-28)14-17-6-8-18(24)9-7-17\/h2-9H,10-16H2,1H3,(H,25,29)\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_0-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1 can also be represented with the SMILES S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1."} {"text":"The molecule with the InChI string InChI=1S\/C22H23N3O4\/c1-22(2,3)24-21(27)19(11-7-10-16-8-5-4-6-9-16)23-20(26)17-12-14-18(15-13-17)25(28)29\/h4-15H,1-3H3,(H,23,26)(H,24,27)\/b10-7+,19-11+ can also be represented with the SMILES CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_3-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the InChI string from the SMILES.\nSMILES: CCCc1cc(Nc2ccc(C)cc2)n2ncnc2n1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C15H17N5\/c1-3-4-13-9-14(20-15(19-13)16-10-17-20)18-12-7-5-11(2)6-8-12\/h5-10,18H,3-4H2,1-2H3"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the SMILES.\nSMILES: CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: 
InChI=1S\/C15H18N2O4\/c1-11(18)16-13-5-2-4-12(10-13)15(20)21-9-8-17-7-3-6-14(17)19\/h2,4-5,10H,3,6-9H2,1H3,(H,16,18)"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_5-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the InChI string from the SMILES.\nSMILES: c1ccc(Cn2nnnc2SCc2nc(C3CC3)n[nH]2)cc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C14H15N7S\/c1-2-4-10(5-3-1)8-21-14(18-19-20-21)22-9-12-15-13(17-16-12)11-6-7-11\/h1-5,11H,6-9H2,(H,15,16,17)"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the SMILES.\nSMILES: Nc1c2c([nH+]c3ccccc13)CCCC2\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_3-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the SMILES COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C25H22FN5O3S\/c1-31-8-7-19-16(12-31)9-15(10-27)24(30-19)35-13-20-22(25(32)33-2)21(18(11-28)23(29)34-20)14-3-5-17(26)6-4-14\/h3-6,9,21H,7-8,12-13,29H2,1-2H3."} {"text":"User: Can you generate the InChI string of the molecule with the SMILES O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C16H16N2O4\/c19-14-10-12(11-4-1-2-5-13(11)17-14)16(21)22-9-8-18-7-3-6-15(18)20\/h1-2,4-5,10H,3,6-9H2,(H,17,19)."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_5-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the SMILES 
CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C17H21N3O2S\/c1-11(2)15(17(22)20-6-3-4-7-20)19-16(21)12-9-14-13(18-10-12)5-8-23-14\/h5,8-11,15H,3-4,6-7H2,1-2H3,(H,19,21)."} {"text":"User: Can you create the InChI string of the molecule with the SMILES CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_4-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the SMILES.\nSMILES: CSc1nc(C2CC2)cc(C(=O)Nc2ncc(C)s2)c1C#N\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C15H14N4OS2\/c1-8-7-17-15(22-8)19-13(20)10-5-12(9-3-4-9)18-14(21-2)11(10)6-16\/h5,7,9H,3-4H2,1-2H3,(H,17,19,20)"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the InChI string from the SMILES.\nSMILES: CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C17H22FNO4\/c1-11(2)15(17(21)23-10-14-4-3-9-22-14)19-16(20)12-5-7-13(18)8-6-12\/h5-8,11,14-15H,3-4,9-10H2,1-2H3,(H,19,20)"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_1-3.jsonl": "{"text":"Task: Please generate a molecule 
representation based on the description.\nDescription: Generate the InChI string from the SMILES.\nMolecule SMILES: COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C17H16FNO6S\/c1-24-14-9-11(3-8-16(20)21)10-15(17(14)25-2)26(22,23)19-13-6-4-12(18)5-7-13\/h3-10,19H,1-2H3,(H,20,21)\/b8-3+"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the InChI string from the SMILES.\nMolecule SMILES: Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C15H14FNO4S\/c1-9-3-8-13(14(10(9)2)15(18)19)17-22(20,21)12-6-4-11(16)5-7-12\/h3-8,17H,1-2H3,(H,18,19)"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_0-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SMILES from the InChI string.\nInChI string: InChI=1S\/C22H18ClN3O2\/c1-28-19-9-7-15(23)12-18(19)21(26-16-5-3-10-24-13-16)17-8-6-14-4-2-11-25-20(14)22(17)27\/h2-13,21,26-27H,1H3\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: 
COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_3-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C24H25BrClN3O3\/c1-5-29-16(3)24(15(2)28-29)27-23(30)11-7-17-6-9-21(31-4)18(12-17)14-32-22-10-8-19(26)13-20(22)25\/h6-13H,5,14H2,1-4H3,(H,27,30)\/b11-7+ can also be represented with the SMILES representation CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C."} {"text":"The molecule with the InChI string InChI=1S\/C16H16FN3O2S\/c1-2-20(9-3-8-18)15(21)11-23-16-19-10-14(22-16)12-4-6-13(17)7-5-12\/h4-7,10H,2-3,9,11H2,1H3 can also be represented with the SMILES representation CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_1-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C17H16FNO6S\/c1-24-14-9-11(3-8-16(20)21)10-15(17(14)25-2)26(22,23)19-13-6-4-12(18)5-7-13\/h3-10,19H,1-2H3,(H,20,21)\/b8-3+ can also be represented with the SMILES COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC."} {"text":"The molecule with the InChI string representation of InChI=1S\/C15H14FNO4S\/c1-9-3-8-13(14(10(9)2)15(18)19)17-22(20,21)12-6-4-11(16)5-7-12\/h3-8,17H,1-2H3,(H,18,19) can also be represented with the SMILES Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_5-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SMILES from the InChI string.\nInChI string: InChI=1S\/C17H21N3O2S\/c1-11(2)15(17(22)20-6-3-4-7-20)19-16(21)12-9-14-13(18-10-12)5-8-23-14\/h5,8-11,15H,3-4,6-7H2,1-2H3,(H,19,21)\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SMILES from the InChI string.\nInChI string: 
InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_2-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the SMILES.\nSMILES: Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C20H22FN.ClH\/c21-20-6-5-17-11-15(12-19(17)13-20)7-9-22-10-8-16-3-1-2-4-18(16)14-22;\/h1-6,13,15H,7-12,14H2;1H"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the InChI string from the SMILES.\nMolecule SMILES: Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C24H20N4O3S\/c1-15-8-10-16(11-9-15)28-23(30)22-21(18-6-2-3-7-19(18)26-22)27-24(28)32-14-20(29)25-13-17-5-4-12-31-17\/h2-12,26H,13-14H2,1H3,(H,25,29)"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_1-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SMILES from the InChI string.\nMolecule InChI string: 
InChI=1S\/C19H18N2O2S\/c1-13-20-18(12-24-13)15-6-8-16(9-7-15)21-19(22)11-14-4-3-5-17(10-14)23-2\/h3-10,12H,11H2,1-2H3,(H,21,22)\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C11H10N4O3\/c1-18-5-2-3-7-6(4-5)8-9(13-7)10(16)15(12)11(17)14-8\/h2-4,13H,12H2,1H3,(H,14,17)\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_5-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the SMILES Cc1ccc(N)cc1NC(=O)CSc1ccccc1F?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C15H15FN2OS\/c1-10-6-7-11(17)8-13(10)18-15(19)9-20-14-5-3-2-4-12(14)16\/h2-8H,9,17H2,1H3,(H,18,19)."} {"text":"User: Can you tell me the InChI string of the molecule with the SMILES CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8-."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_2-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the InChI string InChI=1S\/C20H22FN.ClH\/c21-20-6-5-17-11-15(12-19(17)13-20)7-9-22-10-8-16-3-1-2-4-18(16)14-22;\/h1-6,13,15H,7-12,14H2;1H?\nAssistant: Of course, this molecule has a SMILES of Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2."} {"text":"User: Can you create the SMILES of the molecule with the InChI string 
InChI=1S\/C24H20N4O3S\/c1-15-8-10-16(11-9-15)28-23(30)22-21(18-6-2-3-7-19(18)26-22)27-24(28)32-14-20(29)25-13-17-5-4-12-31-17\/h2-12,26H,13-14H2,1H3,(H,25,29)?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_3-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the InChI string InChI=1S\/C15H17N5\/c1-3-4-13-9-14(20-15(19-13)16-10-17-20)18-12-7-5-11(2)6-8-12\/h5-10,18H,3-4H2,1-2H3?\nAssistant: Of course, this molecule has a SMILES of CCCc1cc(Nc2ccc(C)cc2)n2ncnc2n1."} {"text":"User: Can you tell me the SMILES of the molecule with the InChI string InChI=1S\/C15H18N2O4\/c1-11(18)16-13-5-2-4-12(10-13)15(20)21-9-8-17-7-3-6-14(17)19\/h2,4-5,10H,3,6-9H2,1H3,(H,16,18)?\nAssistant: Yes, this molecule has a SMILES of CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_0-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1 can also be represented with the SMILES S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1."} {"text":"The molecule with the InChI string representation of InChI=1S\/C22H18ClN3O2\/c1-28-19-9-7-15(23)12-18(19)21(26-16-5-3-10-24-13-16)17-8-6-14-4-2-11-25-20(14)22(17)27\/h2-13,21,26-27H,1H3 can also be represented with the SMILES COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_2-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the SMILES NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1?\nAssistant: Yes, this molecule has a InChI string of 
InChI=1S\/C16H12FN3O\/c17-12-3-1-11(2-4-12)15-13(16(18)21)9-14(20-15)10-5-7-19-8-6-10\/h1-9,20H,(H2,18,21)."} {"text":"User: Can you create the InChI string of the molecule with the SMILES COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C17H15N3O3\/c1-23-15-8-7-11(9-14(15)18)16-13(17(21)22)10-20(19-16)12-5-3-2-4-6-12\/h2-10H,18H2,1H3,(H,21,22)."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_5-5.jsonl": "{"text":"User: Can you generate the InChI string of the molecule with the SMILES c1ccc(Cn2nnnc2SCc2nc(C3CC3)n[nH]2)cc1?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C14H15N7S\/c1-2-4-10(5-3-1)8-21-14(18-19-20-21)22-9-12-15-13(17-16-12)11-6-7-11\/h1-5,11H,6-9H2,(H,15,16,17)."} {"text":"User: Can you create the InChI string of the molecule with the SMILES Nc1c2c([nH+]c3ccccc13)CCCC2?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_3-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the SMILES CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C24H25BrClN3O3\/c1-5-29-16(3)24(15(2)28-29)27-23(30)11-7-17-6-9-21(31-4)18(12-17)14-32-22-10-8-19(26)13-20(22)25\/h6-13H,5,14H2,1-4H3,(H,27,30)\/b11-7+."} {"text":"User: Can you create the InChI string of the molecule with the SMILES CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C16H16FN3O2S\/c1-2-20(9-3-8-18)15(21)11-23-16-19-10-14(22-16)12-4-6-13(17)7-5-12\/h4-7,10H,2-3,9,11H2,1H3."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_2-0.jsonl": "{"text":"The molecule with the InChI string 
InChI=1S\/C20H22FN.ClH\/c21-20-6-5-17-11-15(12-19(17)13-20)7-9-22-10-8-16-3-1-2-4-18(16)14-22;\/h1-6,13,15H,7-12,14H2;1H can also be represented with the SMILES representation Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2."} {"text":"The molecule with the InChI string representation of InChI=1S\/C24H20N4O3S\/c1-15-8-10-16(11-9-15)28-23(30)22-21(18-6-2-3-7-19(18)26-22)27-24(28)32-14-20(29)25-13-17-5-4-12-31-17\/h2-12,26H,13-14H2,1H3,(H,25,29) can also be represented with the SMILES Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_1-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the InChI string InChI=1S\/C17H16FNO6S\/c1-24-14-9-11(3-8-16(20)21)10-15(17(14)25-2)26(22,23)19-13-6-4-12(18)5-7-13\/h3-10,19H,1-2H3,(H,20,21)\/b8-3+?\nAssistant: Of course, this molecule has a SMILES of COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC."} {"text":"User: Can you generate the SMILES of the molecule with the InChI string InChI=1S\/C15H14FNO4S\/c1-9-3-8-13(14(10(9)2)15(18)19)17-22(20,21)12-6-4-11(16)5-7-12\/h3-8,17H,1-2H3,(H,18,19)?\nAssistant: Of course, this molecule has a SMILES of Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_2-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C22H22Cl2N2O2\/c1-3-5-12-26(4-2)22(28)21(27)19-17-13-16(24)10-11-18(17)25-20(19)14-6-8-15(23)9-7-14\/h6-11,13,25H,3-5,12H2,1-2H3 can also be represented with the SMILES CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12."} {"text":"The molecule with the InChI string InChI=1S\/C22H26BrClN4O2\/c1-26(15-21(29)25-20-5-3-2-4-19(20)23)22(30)16-28-12-10-27(11-13-28)14-17-6-8-18(24)9-7-17\/h2-9H,10-16H2,1H3,(H,25,29) can also be represented with the SMILES representation CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_4-2.jsonl": "{"text":"Task: Please create a molecule 
representation based on the description.\nDescription: Create the SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C15H14N4OS2\/c1-8-7-17-15(22-8)19-13(20)10-5-12(9-3-4-9)18-14(21-2)11(10)6-16\/h5,7,9H,3-4H2,1-2H3,(H,17,19,20)\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CSc1nc(C2CC2)cc(C(=O)Nc2ncc(C)s2)c1C#N"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SMILES from the InChI string.\nInChI string: InChI=1S\/C17H22FNO4\/c1-11(2)15(17(21)23-10-14-4-3-9-22-14)19-16(20)12-5-7-13(18)8-6-12\/h5-8,11,14-15H,3-4,9-10H2,1-2H3,(H,19,20)\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_0-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the SMILES.\nMolecule SMILES: S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the SMILES.\nSMILES: COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: 
InChI=1S\/C22H18ClN3O2\/c1-28-19-9-7-15(23)12-18(19)21(26-16-5-3-10-24-13-16)17-8-6-14-4-2-11-25-20(14)22(17)27\/h2-13,21,26-27H,1H3"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_4-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the InChI string.\nInChI string: InChI=1S\/C16H18N2O4S\/c1-11-15(8-9-22-11)16(19)17-10-12-2-6-14(7-3-12)23(20,21)18-13-4-5-13\/h2-3,6-9,13,18H,4-5,10H2,1H3,(H,17,19)\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C16H15N3O2S\/c1-11-12(6-9-21-11)16(20)18-8-5-15-19-14(10-22-15)13-4-2-3-7-17-13\/h2-4,6-7,9-10H,5,8H2,1H3,(H,18,20)\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_3-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SMILES from the InChI string.\nInChI string: InChI=1S\/C15H17N5\/c1-3-4-13-9-14(20-15(19-13)16-10-17-20)18-12-7-5-11(2)6-8-12\/h5-10,18H,3-4H2,1-2H3\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCCc1cc(Nc2ccc(C)cc2)n2ncnc2n1"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C15H18N2O4\/c1-11(18)16-13-5-2-4-12(10-13)15(20)21-9-8-17-7-3-6-14(17)19\/h2,4-5,10H,3,6-9H2,1H3,(H,16,18)\nConstraint: Even if you are not sure, you must answer 
with a representation without using any other words.\nResult: CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_2-1.jsonl": "{"text":"The molecule with the SMILES representation of CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12 can also be represented with the InChI string InChI=1S\/C22H22Cl2N2O2\/c1-3-5-12-26(4-2)22(28)21(27)19-17-13-16(24)10-11-18(17)25-20(19)14-6-8-15(23)9-7-14\/h6-11,13,25H,3-5,12H2,1-2H3."} {"text":"The molecule with the SMILES CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1 can also be represented with the InChI string InChI=1S\/C22H26BrClN4O2\/c1-26(15-21(29)25-20-5-3-2-4-19(20)23)22(30)16-28-12-10-27(11-13-28)14-17-6-8-18(24)9-7-17\/h2-9H,10-16H2,1H3,(H,25,29)."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_5-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the InChI string InChI=1S\/C17H21N3O2S\/c1-11(2)15(17(22)20-6-3-4-7-20)19-16(21)12-9-14-13(18-10-12)5-8-23-14\/h5,8-11,15H,3-4,6-7H2,1-2H3,(H,19,21)?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1."} {"text":"User: Can you generate the SMILES of the molecule with the InChI string InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1?\nAssistant: Yes, this molecule has a SMILES of CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_4-5.jsonl": "{"text":"User: Can you generate the InChI string of the molecule with the SMILES 
Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C16H18N2O4S\/c1-11-15(8-9-22-11)16(19)17-10-12-2-6-14(7-3-12)23(20,21)18-13-4-5-13\/h2-3,6-9,13,18H,4-5,10H2,1H3,(H,17,19)."} {"text":"User: Can you create the InChI string of the molecule with the SMILES Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C16H15N3O2S\/c1-11-12(6-9-21-11)16(20)18-8-5-15-19-14(10-22-15)13-4-2-3-7-17-13\/h2-4,6-7,9-10H,5,8H2,1H3,(H,18,20)."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_4-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C15H14N4OS2\/c1-8-7-17-15(22-8)19-13(20)10-5-12(9-3-4-9)18-14(21-2)11(10)6-16\/h5,7,9H,3-4H2,1-2H3,(H,17,19,20) can also be represented with the SMILES representation CSc1nc(C2CC2)cc(C(=O)Nc2ncc(C)s2)c1C#N."} {"text":"The molecule with the InChI string representation of InChI=1S\/C17H22FNO4\/c1-11(2)15(17(21)23-10-14-4-3-9-22-14)19-16(20)12-5-7-13(18)8-6-12\/h5-8,11,14-15H,3-4,9-10H2,1-2H3,(H,19,20) can also be represented with the SMILES representation CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_5-1.jsonl": "{"text":"The molecule with the SMILES representation of c1ccc(Cn2nnnc2SCc2nc(C3CC3)n[nH]2)cc1 can also be represented with the InChI string representation InChI=1S\/C14H15N7S\/c1-2-4-10(5-3-1)8-21-14(18-19-20-21)22-9-12-15-13(17-16-12)11-6-7-11\/h1-5,11H,6-9H2,(H,15,16,17)."} {"text":"The molecule with the SMILES representation of Nc1c2c([nH+]c3ccccc13)CCCC2 can also be represented with the InChI string InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_2-1.jsonl": "{"text":"The molecule with the SMILES representation of NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1 can also be represented with 
the InChI string InChI=1S\/C16H12FN3O\/c17-12-3-1-11(2-4-12)15-13(16(18)21)9-14(20-15)10-5-7-19-8-6-10\/h1-9,20H,(H2,18,21)."} {"text":"The molecule with the SMILES representation of COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N can also be represented with the InChI string InChI=1S\/C17H15N3O3\/c1-23-15-8-7-11(9-14(15)18)16-13(17(21)22)10-20(19-16)12-5-3-2-4-6-12\/h2-10H,18H2,1H3,(H,21,22)."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_0-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1 can also be represented with the SMILES representation O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C."} {"text":"The molecule with the InChI string representation of InChI=1S\/C18H16N2O3S\/c1-22-13-9-7-12(8-10-13)16-11-15(20-23-16)18(21)19-14-5-3-4-6-17(14)24-2\/h3-11H,1-2H3,(H,19,21) can also be represented with the SMILES COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_0-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the SMILES.\nMolecule SMILES: O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the SMILES.\nMolecule SMILES: 
COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C18H16N2O3S\/c1-22-13-9-7-12(8-10-13)16-11-15(20-23-16)18(21)19-14-5-3-4-6-17(14)24-2\/h3-11H,1-2H3,(H,19,21)"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_1-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the InChI string InChI=1S\/C22H26N2O4\/c1-4-21(25)24-17-9-7-6-8-16(17)13-18(24)22(26)23-14-15-10-11-19(28-5-2)20(12-15)27-3\/h6-12,18H,4-5,13-14H2,1-3H3,(H,23,26)?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC."} {"text":"User: Can you tell me the SMILES of the molecule with the InChI string InChI=1S\/C17H14ClN3O3S2\/c1-11-16(26(23,24)10-12-6-2-3-7-13(12)18)25-17(20-11)21-15(22)14-8-4-5-9-19-14\/h2-9H,10H2,1H3,(H,20,21,22)?\nAssistant: Sure, this molecule has a SMILES of Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_2-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the SMILES Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C20H22FN.ClH\/c21-20-6-5-17-11-15(12-19(17)13-20)7-9-22-10-8-16-3-1-2-4-18(16)14-22;\/h1-6,13,15H,7-12,14H2;1H."} {"text":"User: Can you tell me the InChI string of the molecule with the SMILES Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C24H20N4O3S\/c1-15-8-10-16(11-9-15)28-23(30)22-21(18-6-2-3-7-19(18)26-22)27-24(28)32-14-20(29)25-13-17-5-4-12-31-17\/h2-12,26H,13-14H2,1H3,(H,25,29)."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_1-1.jsonl": "{"text":"The molecule with the SMILES CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC can also be represented with the InChI string 
InChI=1S\/C22H26N2O4\/c1-4-21(25)24-17-9-7-6-8-16(17)13-18(24)22(26)23-14-15-10-11-19(28-5-2)20(12-15)27-3\/h6-12,18H,4-5,13-14H2,1-3H3,(H,23,26)."} {"text":"The molecule with the SMILES representation of Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl can also be represented with the InChI string InChI=1S\/C17H14ClN3O3S2\/c1-11-16(26(23,24)10-12-6-2-3-7-13(12)18)25-17(20-11)21-15(22)14-8-4-5-9-19-14\/h2-9H,10H2,1H3,(H,20,21,22)."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_5-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the InChI string.\nInChI string: InChI=1S\/C14H15N7S\/c1-2-4-10(5-3-1)8-21-14(18-19-20-21)22-9-12-15-13(17-16-12)11-6-7-11\/h1-5,11H,6-9H2,(H,15,16,17)\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: c1ccc(Cn2nnnc2SCc2nc(C3CC3)n[nH]2)cc1"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SMILES from the InChI string.\nInChI string: InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Nc1c2c([nH+]c3ccccc13)CCCC2"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_5-1.jsonl": "{"text":"The molecule with the SMILES representation of CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1 can also be represented with the InChI string InChI=1S\/C17H21N3O2S\/c1-11(2)15(17(22)20-6-3-4-7-20)19-16(21)12-9-14-13(18-10-12)5-8-23-14\/h5,8-11,15H,3-4,6-7H2,1-2H3,(H,19,21)."} {"text":"The molecule with the SMILES representation of CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1 can also be represented with the InChI string representation 
InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_4-1.jsonl": "{"text":"The molecule with the SMILES O=C(NCc1ccc(S(=O)(=O)NC2CC2)cc1)C1CC=CCC1 can also be represented with the InChI string InChI=1S\/C17H22N2O3S\/c20-17(14-4-2-1-3-5-14)18-12-13-6-10-16(11-7-13)23(21,22)19-15-8-9-15\/h1-2,6-7,10-11,14-15,19H,3-5,8-9,12H2,(H,18,20)."} {"text":"The molecule with the SMILES representation of CCOc1cc(F)ccc1NCc1ccc(-n2cncn2)cc1 can also be represented with the InChI string representation InChI=1S\/C17H17FN4O\/c1-2-23-17-9-14(18)5-8-16(17)20-10-13-3-6-15(7-4-13)22-12-19-11-21-22\/h3-9,11-12,20H,2,10H2,1H3."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_3-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the InChI string from the SMILES.\nMolecule SMILES: CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C24H25BrClN3O3\/c1-5-29-16(3)24(15(2)28-29)27-23(30)11-7-17-6-9-21(31-4)18(12-17)14-32-22-10-8-19(26)13-20(22)25\/h6-13H,5,14H2,1-4H3,(H,27,30)\/b11-7+"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the SMILES.\nMolecule SMILES: CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: 
InChI=1S\/C16H16FN3O2S\/c1-2-20(9-3-8-18)15(21)11-23-16-19-10-14(22-16)12-4-6-13(17)7-5-12\/h4-7,10H,2-3,9,11H2,1H3"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_3-4.jsonl": "{"text":"User: Can you create the SMILES of the molecule with the InChI string InChI=1S\/C25H22FN5O3S\/c1-31-8-7-19-16(12-31)9-15(10-27)24(30-19)35-13-20-22(25(32)33-2)21(18(11-28)23(29)34-20)14-3-5-17(26)6-4-14\/h3-6,9,21H,7-8,12-13,29H2,1-2H3?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1."} {"text":"User: Can you generate the SMILES of the molecule with the InChI string InChI=1S\/C16H16N2O4\/c19-14-10-12(11-4-1-2-5-13(11)17-14)16(21)22-9-8-18-7-3-6-15(18)20\/h1-2,4-5,10H,3,6-9H2,(H,17,19)?\nAssistant: Yes, this molecule has a SMILES of O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_1-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the InChI string from the SMILES.\nSMILES: COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C19H18N2O2S\/c1-13-20-18(12-24-13)15-6-8-16(9-7-15)21-19(22)11-14-4-3-5-17(10-14)23-2\/h3-10,12H,11H2,1-2H3,(H,21,22)"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the SMILES.\nSMILES: COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C11H10N4O3\/c1-18-5-2-3-7-6(4-5)8-9(13-7)10(16)15(12)11(17)14-8\/h2-4,13H,12H2,1H3,(H,14,17)"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_0-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: 
Generate the SMILES from the InChI string.\nInChI string: InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C22H23N3O4\/c1-22(2,3)24-21(27)19(11-7-10-16-8-5-4-6-9-16)23-20(26)17-12-14-18(15-13-17)25(28)29\/h4-15H,1-3H3,(H,23,26)(H,24,27)\/b10-7+,19-11+\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_5-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C14H15N7S\/c1-2-4-10(5-3-1)8-21-14(18-19-20-21)22-9-12-15-13(17-16-12)11-6-7-11\/h1-5,11H,6-9H2,(H,15,16,17) can also be represented with the SMILES c1ccc(Cn2nnnc2SCc2nc(C3CC3)n[nH]2)cc1."} {"text":"The molecule with the InChI string InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1 can also be represented with the SMILES representation Nc1c2c([nH+]c3ccccc13)CCCC2."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_0-1.jsonl": "{"text":"The molecule with the SMILES representation of S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1 can also be represented with the InChI string representation 
InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1."} {"text":"The molecule with the SMILES CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1 can also be represented with the InChI string InChI=1S\/C22H23N3O4\/c1-22(2,3)24-21(27)19(11-7-10-16-8-5-4-6-9-16)23-20(26)17-12-14-18(15-13-17)25(28)29\/h4-15H,1-3H3,(H,23,26)(H,24,27)\/b10-7+,19-11+."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_2-4.jsonl": "{"text":"User: Can you create the SMILES of the molecule with the InChI string InChI=1S\/C22H22Cl2N2O2\/c1-3-5-12-26(4-2)22(28)21(27)19-17-13-16(24)10-11-18(17)25-20(19)14-6-8-15(23)9-7-14\/h6-11,13,25H,3-5,12H2,1-2H3?\nAssistant: Yes, this molecule has a SMILES of CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12."} {"text":"User: Can you create the SMILES of the molecule with the InChI string InChI=1S\/C22H26BrClN4O2\/c1-26(15-21(29)25-20-5-3-2-4-19(20)23)22(30)16-28-12-10-27(11-13-28)14-17-6-8-18(24)9-7-17\/h2-9H,10-16H2,1H3,(H,25,29)?\nAssistant: Yes, this molecule has a SMILES of CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_2-1.jsonl": "{"text":"The molecule with the SMILES representation of Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2 can also be represented with the InChI string representation InChI=1S\/C20H22FN.ClH\/c21-20-6-5-17-11-15(12-19(17)13-20)7-9-22-10-8-16-3-1-2-4-18(16)14-22;\/h1-6,13,15H,7-12,14H2;1H."} {"text":"The molecule with the SMILES representation of Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1 can also be represented with the InChI string representation InChI=1S\/C24H20N4O3S\/c1-15-8-10-16(11-9-15)28-23(30)22-21(18-6-2-3-7-19(18)26-22)27-24(28)32-14-20(29)25-13-17-5-4-12-31-17\/h2-12,26H,13-14H2,1H3,(H,25,29)."}", 
"/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_1-1.jsonl": "{"text":"The molecule with the SMILES COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1 can also be represented with the InChI string representation InChI=1S\/C19H18N2O2S\/c1-13-20-18(12-24-13)15-6-8-16(9-7-15)21-19(22)11-14-4-3-5-17(10-14)23-2\/h3-10,12H,11H2,1-2H3,(H,21,22)."} {"text":"The molecule with the SMILES COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1 can also be represented with the InChI string InChI=1S\/C11H10N4O3\/c1-18-5-2-3-7-6(4-5)8-9(13-7)10(16)15(12)11(17)14-8\/h2-4,13H,12H2,1H3,(H,14,17)."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_3-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the SMILES CCCc1cc(Nc2ccc(C)cc2)n2ncnc2n1?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C15H17N5\/c1-3-4-13-9-14(20-15(19-13)16-10-17-20)18-12-7-5-11(2)6-8-12\/h5-10,18H,3-4H2,1-2H3."} {"text":"User: Can you tell me the InChI string of the molecule with the SMILES CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C15H18N2O4\/c1-11(18)16-13-5-2-4-12(10-13)15(20)21-9-8-17-7-3-6-14(17)19\/h2,4-5,10H,3,6-9H2,1H3,(H,16,18)."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_3-1.jsonl": "{"text":"The molecule with the SMILES CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C can also be represented with the InChI string InChI=1S\/C24H25BrClN3O3\/c1-5-29-16(3)24(15(2)28-29)27-23(30)11-7-17-6-9-21(31-4)18(12-17)14-32-22-10-8-19(26)13-20(22)25\/h6-13H,5,14H2,1-4H3,(H,27,30)\/b11-7+."} {"text":"The molecule with the SMILES representation of CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1 can also be represented with the InChI string InChI=1S\/C16H16FN3O2S\/c1-2-20(9-3-8-18)15(21)11-23-16-19-10-14(22-16)12-4-6-13(17)7-5-12\/h4-7,10H,2-3,9,11H2,1H3."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_0-5.jsonl": "{"text":"User: Can you tell me the 
InChI string of the molecule with the SMILES S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1."} {"text":"User: Can you create the InChI string of the molecule with the SMILES CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C22H23N3O4\/c1-22(2,3)24-21(27)19(11-7-10-16-8-5-4-6-9-16)23-20(26)17-12-14-18(15-13-17)25(28)29\/h4-15H,1-3H3,(H,23,26)(H,24,27)\/b10-7+,19-11+."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_0-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the InChI string InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1."} {"text":"User: Can you generate the SMILES of the molecule with the InChI string InChI=1S\/C22H23N3O4\/c1-22(2,3)24-21(27)19(11-7-10-16-8-5-4-6-9-16)23-20(26)17-12-14-18(15-13-17)25(28)29\/h4-15H,1-3H3,(H,23,26)(H,24,27)\/b10-7+,19-11+?\nAssistant: Yes, this molecule has a SMILES of CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_1-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the InChI string from the SMILES.\nSMILES: CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC\nConstraint: Even 
if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C22H26N2O4\/c1-4-21(25)24-17-9-7-6-8-16(17)13-18(24)22(26)23-14-15-10-11-19(28-5-2)20(12-15)27-3\/h6-12,18H,4-5,13-14H2,1-3H3,(H,23,26)"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the SMILES.\nMolecule SMILES: Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C17H14ClN3O3S2\/c1-11-16(26(23,24)10-12-6-2-3-7-13(12)18)25-17(20-11)21-15(22)14-8-4-5-9-19-14\/h2-9H,10H2,1H3,(H,20,21,22)"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_5-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the SMILES.\nMolecule SMILES: CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C17H21N3O2S\/c1-11(2)15(17(22)20-6-3-4-7-20)19-16(21)12-9-14-13(18-10-12)5-8-23-14\/h5,8-11,15H,3-4,6-7H2,1-2H3,(H,19,21)"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the InChI string from the SMILES.\nSMILES: CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: 
InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_0-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the SMILES O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1."} {"text":"User: Can you tell me the InChI string of the molecule with the SMILES COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C18H16N2O3S\/c1-22-13-9-7-12(8-10-13)16-11-15(20-23-16)18(21)19-14-5-3-4-6-17(14)24-2\/h3-11H,1-2H3,(H,19,21)."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_4-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the SMILES O=C(NCc1ccc(S(=O)(=O)NC2CC2)cc1)C1CC=CCC1?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C17H22N2O3S\/c20-17(14-4-2-1-3-5-14)18-12-13-6-10-16(11-7-13)23(21,22)19-15-8-9-15\/h1-2,6-7,10-11,14-15,19H,3-5,8-9,12H2,(H,18,20)."} {"text":"User: Can you generate the InChI string of the molecule with the SMILES CCOc1cc(F)ccc1NCc1ccc(-n2cncn2)cc1?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C17H17FN4O\/c1-2-23-17-9-14(18)5-8-16(17)20-10-13-3-6-15(7-4-13)22-12-19-11-21-22\/h3-9,11-12,20H,2,10H2,1H3."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_1-0.jsonl": "{"text":"The molecule with the InChI string 
InChI=1S\/C22H26N2O4\/c1-4-21(25)24-17-9-7-6-8-16(17)13-18(24)22(26)23-14-15-10-11-19(28-5-2)20(12-15)27-3\/h6-12,18H,4-5,13-14H2,1-3H3,(H,23,26) can also be represented with the SMILES representation CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC."} {"text":"The molecule with the InChI string InChI=1S\/C17H14ClN3O3S2\/c1-11-16(26(23,24)10-12-6-2-3-7-13(12)18)25-17(20-11)21-15(22)14-8-4-5-9-19-14\/h2-9H,10H2,1H3,(H,20,21,22) can also be represented with the SMILES Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_0-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the InChI string.\nInChI string: InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SMILES from the InChI string.\nInChI string: InChI=1S\/C18H16N2O3S\/c1-22-13-9-7-12(8-10-13)16-11-15(20-23-16)18(21)19-14-5-3-4-6-17(14)24-2\/h3-11H,1-2H3,(H,19,21)\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_4-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C17H22N2O3S\/c20-17(14-4-2-1-3-5-14)18-12-13-6-10-16(11-7-13)23(21,22)19-15-8-9-15\/h1-2,6-7,10-11,14-15,19H,3-5,8-9,12H2,(H,18,20)\nConstraint: Even if you are 
uncertain, you must answer with a representation without using any other words.\nResult: O=C(NCc1ccc(S(=O)(=O)NC2CC2)cc1)C1CC=CCC1"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SMILES from the InChI string.\nInChI string: InChI=1S\/C17H17FN4O\/c1-2-23-17-9-14(18)5-8-16(17)20-10-13-3-6-15(7-4-13)22-12-19-11-21-22\/h3-9,11-12,20H,2,10H2,1H3\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CCOc1cc(F)ccc1NCc1ccc(-n2cncn2)cc1"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_4-1.jsonl": "{"text":"The molecule with the SMILES CSc1nc(C2CC2)cc(C(=O)Nc2ncc(C)s2)c1C#N can also be represented with the InChI string representation InChI=1S\/C15H14N4OS2\/c1-8-7-17-15(22-8)19-13(20)10-5-12(9-3-4-9)18-14(21-2)11(10)6-16\/h5,7,9H,3-4H2,1-2H3,(H,17,19,20)."} {"text":"The molecule with the SMILES CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1 can also be represented with the InChI string representation InChI=1S\/C17H22FNO4\/c1-11(2)15(17(21)23-10-14-4-3-9-22-14)19-16(20)12-5-7-13(18)8-6-12\/h5-8,11,14-15H,3-4,9-10H2,1-2H3,(H,19,20)."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_5-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the SMILES.\nMolecule SMILES: Cc1ccc(N)cc1NC(=O)CSc1ccccc1F\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C15H15FN2OS\/c1-10-6-7-11(17)8-13(10)18-15(19)9-20-14-5-3-2-4-12(14)16\/h2-8H,9,17H2,1H3,(H,18,19)"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the SMILES.\nSMILES: CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21\nConstraint: Even if you are not sure, you must answer 
with a representation without using any additional words.\nResult: InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8-"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_2-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C16H12FN3O\/c17-12-3-1-11(2-4-12)15-13(16(18)21)9-14(20-15)10-5-7-19-8-6-10\/h1-9,20H,(H2,18,21)\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C17H15N3O3\/c1-23-15-8-7-11(9-14(15)18)16-13(17(21)22)10-20(19-16)12-5-3-2-4-6-12\/h2-10H,18H2,1H3,(H,21,22)\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_1-1.jsonl": "{"text":"The molecule with the SMILES representation of COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC can also be represented with the InChI string representation InChI=1S\/C17H16FNO6S\/c1-24-14-9-11(3-8-16(20)21)10-15(17(14)25-2)26(22,23)19-13-6-4-12(18)5-7-13\/h3-10,19H,1-2H3,(H,20,21)\/b8-3+."} {"text":"The molecule with the SMILES Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C can also be represented with the InChI string InChI=1S\/C15H14FNO4S\/c1-9-3-8-13(14(10(9)2)15(18)19)17-22(20,21)12-6-4-11(16)5-7-12\/h3-8,17H,1-2H3,(H,18,19)."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_0-1.jsonl": "{"text":"The molecule with the SMILES 
O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C can also be represented with the InChI string representation InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1."} {"text":"The molecule with the SMILES representation of COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1 can also be represented with the InChI string InChI=1S\/C18H16N2O3S\/c1-22-13-9-7-12(8-10-13)16-11-15(20-23-16)18(21)19-14-5-3-4-6-17(14)24-2\/h3-11H,1-2H3,(H,19,21)."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_3-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the InChI string InChI=1S\/C24H25BrClN3O3\/c1-5-29-16(3)24(15(2)28-29)27-23(30)11-7-17-6-9-21(31-4)18(12-17)14-32-22-10-8-19(26)13-20(22)25\/h6-13H,5,14H2,1-4H3,(H,27,30)\/b11-7+?\nAssistant: Sure, this molecule has a SMILES of CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C."} {"text":"User: Can you create the SMILES of the molecule with the InChI string InChI=1S\/C16H16FN3O2S\/c1-2-20(9-3-8-18)15(21)11-23-16-19-10-14(22-16)12-4-6-13(17)7-5-12\/h4-7,10H,2-3,9,11H2,1H3?\nAssistant: Of course, this molecule has a SMILES of CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_0-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the InChI string InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1?\nAssistant: Sure, this molecule has a SMILES of O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C."} {"text":"User: Can you create the SMILES of the molecule with the InChI string InChI=1S\/C18H16N2O3S\/c1-22-13-9-7-12(8-10-13)16-11-15(20-23-16)18(21)19-14-5-3-4-6-17(14)24-2\/h3-11H,1-2H3,(H,19,21)?\nAssistant: Sure, this 
molecule has a SMILES of COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_1-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C19H18N2O2S\/c1-13-20-18(12-24-13)15-6-8-16(9-7-15)21-19(22)11-14-4-3-5-17(10-14)23-2\/h3-10,12H,11H2,1-2H3,(H,21,22) can also be represented with the SMILES representation COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1."} {"text":"The molecule with the InChI string InChI=1S\/C11H10N4O3\/c1-18-5-2-3-7-6(4-5)8-9(13-7)10(16)15(12)11(17)14-8\/h2-4,13H,12H2,1H3,(H,14,17) can also be represented with the SMILES representation COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_4-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the SMILES CSc1nc(C2CC2)cc(C(=O)Nc2ncc(C)s2)c1C#N?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C15H14N4OS2\/c1-8-7-17-15(22-8)19-13(20)10-5-12(9-3-4-9)18-14(21-2)11(10)6-16\/h5,7,9H,3-4H2,1-2H3,(H,17,19,20)."} {"text":"User: Can you tell me the InChI string of the molecule with the SMILES CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C17H22FNO4\/c1-11(2)15(17(21)23-10-14-4-3-9-22-14)19-16(20)12-5-7-13(18)8-6-12\/h5-8,11,14-15H,3-4,9-10H2,1-2H3,(H,19,20)."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_4-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the InChI string from the SMILES.\nMolecule SMILES: O=C(NCc1ccc(S(=O)(=O)NC2CC2)cc1)C1CC=CCC1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C17H22N2O3S\/c20-17(14-4-2-1-3-5-14)18-12-13-6-10-16(11-7-13)23(21,22)19-15-8-9-15\/h1-2,6-7,10-11,14-15,19H,3-5,8-9,12H2,(H,18,20)"} {"text":"Task: Please create a molecule representation based on the input molecule representation 
and the description.\nDescription: Create the InChI string from the SMILES.\nMolecule SMILES: CCOc1cc(F)ccc1NCc1ccc(-n2cncn2)cc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C17H17FN4O\/c1-2-23-17-9-14(18)5-8-16(17)20-10-13-3-6-15(7-4-13)22-12-19-11-21-22\/h3-9,11-12,20H,2,10H2,1H3"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_4-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the SMILES.\nSMILES: Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C16H18N2O4S\/c1-11-15(8-9-22-11)16(19)17-10-12-2-6-14(7-3-12)23(20,21)18-13-4-5-13\/h2-3,6-9,13,18H,4-5,10H2,1H3,(H,17,19)"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the SMILES.\nSMILES: Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C16H15N3O2S\/c1-11-12(6-9-21-11)16(20)18-8-5-15-19-14(10-22-15)13-4-2-3-7-17-13\/h2-4,6-7,9-10H,5,8H2,1H3,(H,18,20)"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_3-1.jsonl": "{"text":"The molecule with the SMILES representation of COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1 can also be represented with the InChI string representation InChI=1S\/C25H22FN5O3S\/c1-31-8-7-19-16(12-31)9-15(10-27)24(30-19)35-13-20-22(25(32)33-2)21(18(11-28)23(29)34-20)14-3-5-17(26)6-4-14\/h3-6,9,21H,7-8,12-13,29H2,1-2H3."} {"text":"The molecule with the SMILES representation of O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12 can also be represented with the InChI string 
InChI=1S\/C16H16N2O4\/c19-14-10-12(11-4-1-2-5-13(11)17-14)16(21)22-9-8-18-7-3-6-15(18)20\/h1-2,4-5,10H,3,6-9H2,(H,17,19)."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_2-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the SMILES.\nMolecule SMILES: CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C22H22Cl2N2O2\/c1-3-5-12-26(4-2)22(28)21(27)19-17-13-16(24)10-11-18(17)25-20(19)14-6-8-15(23)9-7-14\/h6-11,13,25H,3-5,12H2,1-2H3"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the SMILES.\nSMILES: CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C22H26BrClN4O2\/c1-26(15-21(29)25-20-5-3-2-4-19(20)23)22(30)16-28-12-10-27(11-13-28)14-17-6-8-18(24)9-7-17\/h2-9H,10-16H2,1H3,(H,25,29)"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_3-1.jsonl": "{"text":"The molecule with the SMILES representation of CCCc1cc(Nc2ccc(C)cc2)n2ncnc2n1 can also be represented with the InChI string InChI=1S\/C15H17N5\/c1-3-4-13-9-14(20-15(19-13)16-10-17-20)18-12-7-5-11(2)6-8-12\/h5-10,18H,3-4H2,1-2H3."} {"text":"The molecule with the SMILES representation of CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1 can also be represented with the InChI string InChI=1S\/C15H18N2O4\/c1-11(18)16-13-5-2-4-12(10-13)15(20)21-9-8-17-7-3-6-14(17)19\/h2,4-5,10H,3,6-9H2,1H3,(H,16,18)."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_0-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the InChI string from the SMILES.\nSMILES: 
S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the InChI string from the SMILES.\nSMILES: CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C22H23N3O4\/c1-22(2,3)24-21(27)19(11-7-10-16-8-5-4-6-9-16)23-20(26)17-12-14-18(15-13-17)25(28)29\/h4-15H,1-3H3,(H,23,26)(H,24,27)\/b10-7+,19-11+"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_1-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the SMILES COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C19H18N2O2S\/c1-13-20-18(12-24-13)15-6-8-16(9-7-15)21-19(22)11-14-4-3-5-17(10-14)23-2\/h3-10,12H,11H2,1-2H3,(H,21,22)."} {"text":"User: Can you tell me the InChI string of the molecule with the SMILES COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C11H10N4O3\/c1-18-5-2-3-7-6(4-5)8-9(13-7)10(16)15(12)11(17)14-8\/h2-4,13H,12H2,1H3,(H,14,17)."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_4-1.jsonl": "{"text":"The molecule with the SMILES representation of Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1 can also be represented with the InChI string representation InChI=1S\/C16H18N2O4S\/c1-11-15(8-9-22-11)16(19)17-10-12-2-6-14(7-3-12)23(20,21)18-13-4-5-13\/h2-3,6-9,13,18H,4-5,10H2,1H3,(H,17,19)."} {"text":"The molecule with 
the SMILES representation of Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1 can also be represented with the InChI string InChI=1S\/C16H15N3O2S\/c1-11-12(6-9-21-11)16(20)18-8-5-15-19-14(10-22-15)13-4-2-3-7-17-13\/h2-4,6-7,9-10H,5,8H2,1H3,(H,18,20)."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_5-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the InChI string InChI=1S\/C15H15FN2OS\/c1-10-6-7-11(17)8-13(10)18-15(19)9-20-14-5-3-2-4-12(14)16\/h2-8H,9,17H2,1H3,(H,18,19)?\nAssistant: Sure, this molecule has a SMILES of Cc1ccc(N)cc1NC(=O)CSc1ccccc1F."} {"text":"User: Can you generate the SMILES of the molecule with the InChI string InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8-?\nAssistant: Of course, this molecule has a SMILES of CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/valid_2-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the SMILES CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C22H22Cl2N2O2\/c1-3-5-12-26(4-2)22(28)21(27)19-17-13-16(24)10-11-18(17)25-20(19)14-6-8-15(23)9-7-14\/h6-11,13,25H,3-5,12H2,1-2H3."} {"text":"User: Can you tell me the InChI string of the molecule with the SMILES CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C22H26BrClN4O2\/c1-26(15-21(29)25-20-5-3-2-4-19(20)23)22(30)16-28-12-10-27(11-13-28)14-17-6-8-18(24)9-7-17\/h2-9H,10-16H2,1H3,(H,25,29)."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_1-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the InChI string.\nMolecule InChI string: 
InChI=1S\/C17H16FNO6S\/c1-24-14-9-11(3-8-16(20)21)10-15(17(14)25-2)26(22,23)19-13-6-4-12(18)5-7-13\/h3-10,19H,1-2H3,(H,20,21)\/b8-3+\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the InChI string.\nInChI string: InChI=1S\/C15H14FNO4S\/c1-9-3-8-13(14(10(9)2)15(18)19)17-22(20,21)12-6-4-11(16)5-7-12\/h3-8,17H,1-2H3,(H,18,19)\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_0-4.jsonl": "{"text":"User: Can you create the SMILES of the molecule with the InChI string InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1?\nAssistant: Of course, this molecule has a SMILES of S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1."} {"text":"User: Can you tell me the SMILES of the molecule with the InChI string InChI=1S\/C22H18ClN3O2\/c1-28-19-9-7-15(23)12-18(19)21(26-16-5-3-10-24-13-16)17-8-6-14-4-2-11-25-20(14)22(17)27\/h2-13,21,26-27H,1H3?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_3-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C25H22FN5O3S\/c1-31-8-7-19-16(12-31)9-15(10-27)24(30-19)35-13-20-22(25(32)33-2)21(18(11-28)23(29)34-20)14-3-5-17(26)6-4-14\/h3-6,9,21H,7-8,12-13,29H2,1-2H3 can also be represented with the SMILES 
COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1."} {"text":"The molecule with the InChI string InChI=1S\/C16H16N2O4\/c19-14-10-12(11-4-1-2-5-13(11)17-14)16(21)22-9-8-18-7-3-6-15(18)20\/h1-2,4-5,10H,3,6-9H2,(H,17,19) can also be represented with the SMILES representation O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12."}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/test_2-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the SMILES.\nMolecule SMILES: NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C16H12FN3O\/c17-12-3-1-11(2-4-12)15-13(16(18)21)9-14(20-15)10-5-7-19-8-6-10\/h1-9,20H,(H2,18,21)"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the SMILES.\nSMILES: COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C17H15N3O3\/c1-23-15-8-7-11(9-14(15)18)16-13(17(21)22)10-20(19-16)12-5-3-2-4-6-12\/h2-10H,18H2,1H3,(H,21,22)"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_4-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C17H22N2O3S\/c20-17(14-4-2-1-3-5-14)18-12-13-6-10-16(11-7-13)23(21,22)19-15-8-9-15\/h1-2,6-7,10-11,14-15,19H,3-5,8-9,12H2,(H,18,20) can also be represented with the SMILES representation O=C(NCc1ccc(S(=O)(=O)NC2CC2)cc1)C1CC=CCC1."} {"text":"The molecule with the InChI string representation of InChI=1S\/C17H17FN4O\/c1-2-23-17-9-14(18)5-8-16(17)20-10-13-3-6-15(7-4-13)22-12-19-11-21-22\/h3-9,11-12,20H,2,10H2,1H3 can also be represented with the SMILES CCOc1cc(F)ccc1NCc1ccc(-n2cncn2)cc1."}", 
"/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_3-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C25H22FN5O3S\/c1-31-8-7-19-16(12-31)9-15(10-27)24(30-19)35-13-20-22(25(32)33-2)21(18(11-28)23(29)34-20)14-3-5-17(26)6-4-14\/h3-6,9,21H,7-8,12-13,29H2,1-2H3\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the InChI string.\nInChI string: InChI=1S\/C16H16N2O4\/c19-14-10-12(11-4-1-2-5-13(11)17-14)16(21)22-9-8-18-7-3-6-15(18)20\/h1-2,4-5,10H,3,6-9H2,(H,17,19)\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12"}", "/scratch/micpie/export/mol_repr_transl_smiles_inchi/train_3-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the SMILES.\nSMILES: COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C25H22FN5O3S\/c1-31-8-7-19-16(12-31)9-15(10-27)24(30-19)35-13-20-22(25(32)33-2)21(18(11-28)23(29)34-20)14-3-5-17(26)6-4-14\/h3-6,9,21H,7-8,12-13,29H2,1-2H3"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the InChI string from the SMILES.\nMolecule SMILES: O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: 
InChI=1S\/C16H16N2O4\/c19-14-10-12(11-4-1-2-5-13(11)17-14)16(21)22-9-8-18-7-3-6-15(18)20\/h1-2,4-5,10H,3,6-9H2,(H,17,19)"}", "/scratch/micpie/export/bio_ner_29/valid_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: Proteins (and accession numbers): PHO84 (P25297) from Saccharomyces cerevisiae; GvPT (Q00908) from Glomus versiforme; GiPT (AAL37552) from Glomus intraradices; Pht1; 1 (Y07682), Pht1; 2 (Y07681) Pht1; 3 (O48639) and Pht2; 1 (CAC15560) from Arabidopsis thaliana; StPT1 (Q43650) and StPT2 (Q41479) from Solanum tuberosum; MtPT1 (O22301) and MtPT2 (O22302) from Medicago truncatula; LePT1 (O24029) and LePT2 (O22549) from Lycopersicon esculentum; LaPT1 (AAK01938) and LaPT2 (AAK38197) from Lupinus albus; NtPT1 (AAF74025) from Nicotiana tabacum; OsPT1 (AAN39042) and OsPT2 (AAN39043) from Oryza sativa; and GmPT1 (HQ392508) and GmPT2 (HQ392509) from Glycine max..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: PHO84,36,41,Gene\/Protein\nSaccharomyces cerevisiae,57,81,Organism\/Species\nGvPT,83,87,Gene\/Protein\nQ00908,90,96,Chemical\/Drug\nGlomus versiforme,103,120,Organism\/Species\nGiPT,122,126,Gene\/Protein\nGlomus intraradices,144,163,Organism\/Species\nArabidopsis thaliana,250,270,Organism\/Species\nStPT1,272,277,Gene\/Protein\nStPT2,292,297,Gene\/Protein\nSolanum tuberosum,313,330,Organism\/Species\nMtPT1,332,337,Gene\/Protein\nMtPT2,352,357,Gene\/Protein\nMedicago truncatula,373,392,Organism\/Species\nLePT1,394,399,Gene\/Protein\nLePT2,414,419,Gene\/Protein\nLycopersicon esculentum,435,458,Organism\/Species\nLaPT1,460,465,Gene\/Protein\nLaPT2,482,487,Gene\/Protein\nLupinus albus,505,518,Organism\/Species\nNtPT1,520,525,Gene\/Protein\nNicotiana tabacum,543,560,Organism\/Species\nOsPT1,562,567,Gene\/Protein\nAAN39042,570,578,Chemical\/Drug\nOsPT2,584,589,Gene\/Protein\nOryza 
sativa,607,619,Organism\/Species\nGmPT1,625,630,Gene\/Protein\nGmPT2,647,652,Gene\/Protein\nGlycine max,670,681,Organism\/Species"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Proteins (and accession numbers): PHO84 (P25297) from Saccharomyces cerevisiae; GvPT (Q00908) from Glomus versiforme; GiPT (AAL37552) from Glomus intraradices; Pht1; 1 (Y07682), Pht1; 2 (Y07681) Pht1; 3 (O48639) and Pht2; 1 (CAC15560) from Arabidopsis thaliana; StPT1 (Q43650) and StPT2 (Q41479) from Solanum tuberosum; MtPT1 (O22301) and MtPT2 (O22302) from Medicago truncatula; LePT1 (O24029) and LePT2 (O22549) from Lycopersicon esculentum; LaPT1 (AAK01938) and LaPT2 (AAK38197) from Lupinus albus; NtPT1 (AAF74025) from Nicotiana tabacum; OsPT1 (AAN39042) and OsPT2 (AAN39043) from Oryza sativa; and GmPT1 (HQ392508) and GmPT2 (HQ392509) from Glycine max..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: PHO84,36,41,Gene\/Protein\nSaccharomyces cerevisiae,57,81,Organism\/Species\nGvPT,83,87,Gene\/Protein\nQ00908,90,96,Chemical\/Drug\nGlomus versiforme,103,120,Organism\/Species\nGiPT,122,126,Gene\/Protein\nGlomus intraradices,144,163,Organism\/Species\nArabidopsis thaliana,250,270,Organism\/Species\nStPT1,272,277,Gene\/Protein\nStPT2,292,297,Gene\/Protein\nSolanum tuberosum,313,330,Organism\/Species\nMtPT1,332,337,Gene\/Protein\nMtPT2,352,357,Gene\/Protein\nMedicago truncatula,373,392,Organism\/Species\nLePT1,394,399,Gene\/Protein\nLePT2,414,419,Gene\/Protein\nLycopersicon esculentum,435,458,Organism\/Species\nLaPT1,460,465,Gene\/Protein\nLaPT2,482,487,Gene\/Protein\nLupinus albus,505,518,Organism\/Species\nNtPT1,520,525,Gene\/Protein\nNicotiana tabacum,543,560,Organism\/Species\nOsPT1,562,567,Gene\/Protein\nAAN39042,570,578,Chemical\/Drug\nOsPT2,584,589,Gene\/Protein\nOryza 
sativa,607,619,Organism\/Species\nGmPT1,625,630,Gene\/Protein\nGmPT2,647,652,Gene\/Protein\nGlycine max,670,681,Organism\/Species"}", "/scratch/micpie/export/bio_ner_29/test_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Other primary antibodies used for immunofluorescence were rat anti-GATA1 (1: 200, Santa Cruz Biotechnology, http: \/ \/ www. scbt. com, sc-265), goat anti-GATA4 (1: 200, Santa Cruz Biotechnology, sc-1237), rat anti-TRA98 (1: 200, gift of H. Tanaka and Y. Nishimune), rat anti-BC7 (1: 50, gift of H. Tanaka and Y. Nishimune), rat anti-TRA369 (1: 200, gift of H. Tanaka and Y. Nishimune), rabbit anti-RAD51 (1: 600 Calbiochem, http: \/ \/ www. calbiochem. com, PC130), mouse anti-GMP-1\/SUMO-1 (1: 200, Zymed, http: \/ \/ invitrogen. com, 33-2400), rabbit anti-phospho-H2AX (Ser139) (1: 200, Upstate, http: \/ \/ www. millipore. com, 01-164), mouse anti-phospho-H2AX (1: 200, Upstate, 05-636), mouse anti-SYCP3 (1: 200, Abcam, http: \/ \/ www. abcam. com, ab12452), rabbit anti-HP1 beta (1: 100, Abcam, ab10478), rabbit anti-H3-2meK9 (1: 100, Upstate, 07-441), rabbit anti-H3-3meK9 (1: 200, Upstate, 07-442), rabbit anti-AR (N-20) (1: 200, Santa Cruz Biotechnology, sc-816), and mouse anti-alpha SMA clone 1A4 (1: 800, Sigma, http: \/ \/ www. sigmaaldrich. 
com, A2547)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: antibodies,14,24,GO_ontology\nrat,58,61,Organism\nGATA1,69,74,Gene_or_geneproduct\ngoat,148,152,Organism\nGATA4,160,165,Gene_or_geneproduct\nrat,214,217,Organism\nrat,278,281,Organism\nrat,339,342,Organism\nrabbit,404,410,Organism\nRAD51,418,423,Gene_or_geneproduct\nmouse,485,490,Organism\nGMP - 1,498,505,Gene_or_geneproduct\nSUMO - 1,508,516,Gene_or_geneproduct\nrabbit,573,579,Organism\nphospho,587,594,Chemical\nH2AX,597,601,Gene_or_geneproduct\nmouse,673,678,Organism\nphospho,686,693,Chemical\nH2AX,696,700,Gene_or_geneproduct\nmouse,731,736,Organism\nSYCP3,744,749,Gene_or_geneproduct\nrabbit,804,810,Organism\nHP1 beta,818,826,Gene_or_geneproduct\nrabbit,854,860,Organism\nrabbit,909,915,Organism\nrabbit,964,970,Organism\nAR,978,980,Gene_or_geneproduct\nmouse,1042,1047,Organism\nalpha SMA,1055,1064,Gene_or_geneproduct"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Other primary antibodies used for immunofluorescence were rat anti-GATA1 (1: 200, Santa Cruz Biotechnology, http: \/ \/ www. scbt. com, sc-265), goat anti-GATA4 (1: 200, Santa Cruz Biotechnology, sc-1237), rat anti-TRA98 (1: 200, gift of H. Tanaka and Y. Nishimune), rat anti-BC7 (1: 50, gift of H. Tanaka and Y. Nishimune), rat anti-TRA369 (1: 200, gift of H. Tanaka and Y. Nishimune), rabbit anti-RAD51 (1: 600 Calbiochem, http: \/ \/ www. calbiochem. com, PC130), mouse anti-GMP-1\/SUMO-1 (1: 200, Zymed, http: \/ \/ invitrogen. com, 33-2400), rabbit anti-phospho-H2AX (Ser139) (1: 200, Upstate, http: \/ \/ www. millipore. com, 01-164), mouse anti-phospho-H2AX (1: 200, Upstate, 05-636), mouse anti-SYCP3 (1: 200, Abcam, http: \/ \/ www. abcam. 
com, ab12452), rabbit anti-HP1 beta (1: 100, Abcam, ab10478), rabbit anti-H3-2meK9 (1: 100, Upstate, 07-441), rabbit anti-H3-3meK9 (1: 200, Upstate, 07-442), rabbit anti-AR (N-20) (1: 200, Santa Cruz Biotechnology, sc-816), and mouse anti-alpha SMA clone 1A4 (1: 800, Sigma, http: \/ \/ www. sigmaaldrich. com, A2547)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: antibodies,14,24,GO_ontology\nrat,58,61,Organism\nGATA1,69,74,Gene_or_geneproduct\ngoat,148,152,Organism\nGATA4,160,165,Gene_or_geneproduct\nrat,214,217,Organism\nrat,278,281,Organism\nrat,339,342,Organism\nrabbit,404,410,Organism\nRAD51,418,423,Gene_or_geneproduct\nmouse,485,490,Organism\nGMP - 1,498,505,Gene_or_geneproduct\nSUMO - 1,508,516,Gene_or_geneproduct\nrabbit,573,579,Organism\nphospho,587,594,Chemical\nH2AX,597,601,Gene_or_geneproduct\nmouse,673,678,Organism\nphospho,686,693,Chemical\nH2AX,696,700,Gene_or_geneproduct\nmouse,731,736,Organism\nSYCP3,744,749,Gene_or_geneproduct\nrabbit,804,810,Organism\nHP1 beta,818,826,Gene_or_geneproduct\nrabbit,854,860,Organism\nrabbit,909,915,Organism\nrabbit,964,970,Organism\nAR,978,980,Gene_or_geneproduct\nmouse,1042,1047,Organism\nalpha SMA,1055,1064,Gene_or_geneproduct"}", "/scratch/micpie/export/bio_ner_29/train_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: WT, black circles; csrBC mutant, green squares; csrD mutant, orange triangles; csrA51 mutant, red diamonds. ptsG, glucose phosphotransferase system permease PtsG subunit (European Bioinformatics Institute European Nucleotide Archive accession no. EG10787); pgm, phosphoglucomutase (accession no. EG12144); glgC, glucose-1-phosphate adenylyltransferase (accession no. EG10379); zwf, glucose 6-phosphate-1-dehydrogenase (accession no. EG11221); pgi, phosphoglucose isomerase (accession no. EG10702); pfkA, phosphofructokinase (accession no. 
EG10699); eno, enolase (accession no. EG10258); icd, isocitrate dehydrogenase (accession no. EG10489); pck, phosphoenolpyruvate carboxykinase (accession no. EG10688); pta, phosphate acetyltransferase (accession no. EG20173); ack, acetate kinase (accession no. EG10027); ACS, acetyl coenzyme A synthetase (accession no. EG11448); pck, phosphoenolpyruvate carboxykinase (accession no. EG10688); fbp, fructose 1, 6-bisphosphatase (accession no. EG10283)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: csrD,48,52,Gene\/Protein\ncsrA51,79,85,Gene\/Protein\nptsG,108,112,Gene\/Protein\nglucose phosphotransferase system permease PtsG subunit,114,169,Gene\/Protein\npgm,259,262,Gene\/Protein\nphosphoglucomutase,264,282,Gene\/Protein\nglgC,310,314,Gene\/Protein\nglucose - 1 - phosphate adenylyltransferase,316,359,Gene\/Protein\nzwf,387,390,Gene\/Protein\nglucose 6 - phosphate - 1 - dehydrogenase,392,433,Gene\/Protein\npgi,461,464,Gene\/Protein\nphosphoglucose isomerase,466,490,Gene\/Protein\npfkA,518,522,Gene\/Protein\nphosphofructokinase,524,543,Gene\/Protein\neno,571,574,Gene\/Protein\nenolase,576,583,Gene\/Protein\nicd,611,614,Gene\/Protein\nisocitrate dehydrogenase,616,640,Gene\/Protein\npck,668,671,Gene\/Protein\nphosphoenolpyruvate carboxykinase,673,706,Gene\/Protein\npta,734,737,Gene\/Protein\nphosphate acetyltransferase,739,766,Gene\/Protein\nack,794,797,Gene\/Protein\nacetate kinase,799,813,Gene\/Protein\nACS,841,844,Gene\/Protein\nacetyl coenzyme A synthetase,846,874,Gene\/Protein\npck,902,905,Gene\/Protein\nphosphoenolpyruvate carboxykinase,907,940,Gene\/Protein\nfbp,968,971,Gene\/Protein"} {"text":"Task: Please carry out the NER task for the the text below.\nText: WT, black circles; csrBC mutant, green squares; csrD mutant, orange triangles; csrA51 mutant, red diamonds. 
ptsG, glucose phosphotransferase system permease PtsG subunit (European Bioinformatics Institute European Nucleotide Archive accession no. EG10787); pgm, phosphoglucomutase (accession no. EG12144); glgC, glucose-1-phosphate adenylyltransferase (accession no. EG10379); zwf, glucose 6-phosphate-1-dehydrogenase (accession no. EG11221); pgi, phosphoglucose isomerase (accession no. EG10702); pfkA, phosphofructokinase (accession no. EG10699); eno, enolase (accession no. EG10258); icd, isocitrate dehydrogenase (accession no. EG10489); pck, phosphoenolpyruvate carboxykinase (accession no. EG10688); pta, phosphate acetyltransferase (accession no. EG20173); ack, acetate kinase (accession no. EG10027); ACS, acetyl coenzyme A synthetase (accession no. EG11448); pck, phosphoenolpyruvate carboxykinase (accession no. EG10688); fbp, fructose 1, 6-bisphosphatase (accession no. EG10283)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: csrD,48,52,Gene\/Protein\ncsrA51,79,85,Gene\/Protein\nptsG,108,112,Gene\/Protein\nglucose phosphotransferase system permease PtsG subunit,114,169,Gene\/Protein\npgm,259,262,Gene\/Protein\nphosphoglucomutase,264,282,Gene\/Protein\nglgC,310,314,Gene\/Protein\nglucose - 1 - phosphate adenylyltransferase,316,359,Gene\/Protein\nzwf,387,390,Gene\/Protein\nglucose 6 - phosphate - 1 - dehydrogenase,392,433,Gene\/Protein\npgi,461,464,Gene\/Protein\nphosphoglucose isomerase,466,490,Gene\/Protein\npfkA,518,522,Gene\/Protein\nphosphofructokinase,524,543,Gene\/Protein\neno,571,574,Gene\/Protein\nenolase,576,583,Gene\/Protein\nicd,611,614,Gene\/Protein\nisocitrate dehydrogenase,616,640,Gene\/Protein\npck,668,671,Gene\/Protein\nphosphoenolpyruvate carboxykinase,673,706,Gene\/Protein\npta,734,737,Gene\/Protein\nphosphate acetyltransferase,739,766,Gene\/Protein\nack,794,797,Gene\/Protein\nacetate kinase,799,813,Gene\/Protein\nACS,841,844,Gene\/Protein\nacetyl coenzyme A 
synthetase,846,874,Gene\/Protein\npck,902,905,Gene\/Protein\nphosphoenolpyruvate carboxykinase,907,940,Gene\/Protein\nfbp,968,971,Gene\/Protein"}", "/scratch/micpie/export/qmof_quantum/valid_0-57.jsonl": "{"text":"User: I'm thinking about the conduction band minima and valence band maxima of reticular materials.\nAssistant: What can I do for you?\nUser: How does the PAW-PBE-D3(BJ) computed conduction band minimum (CBM) of the metal-organic framework (MOF) with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0 compare to the HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure and the HLE17 computed conduction band minimum of a PBE-D3(BJ) optimized structure?\nAssistant: The PAW-PBE-D3(BJ) computed conduction band minimum (CBM) is 0.647 eV, the HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure is 1.872 eV and the HLE17 computed conduction band minimum of a PBE-D3(BJ) optimized structure is -2.003 eV.\nUser: And how about the PAW-PBE-D3(BJ) calculated valence band maximum?\nAssistant: The PAW-PBE-D3(BJ) calculated valence band maximum is 0.335 eV, the HSE06 computed valence band maximum of a PBE-D3(BJ) optimized structure is -0.979 eV and the HLE17 computed valence band maximum of a PBE-D3(BJ) optimized structure is -2.211 eV."} {"text":"User: I'm thinking about the conduction band minima and valence band maxima of metal-organic frameworks.\nAssistant: What can I do for you?\nUser: How does the PAW-PBE-D3(BJ) computed conduction band minimum of the reticular material with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0 compare to the HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure and the HLE17 computed conduction band minimum of a PBE-D3(BJ) optimized structure?\nAssistant: The PAW-PBE-D3(BJ) computed conduction band minimum is 1.356 eV, the HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure is 2.713 eV and the HLE17 computed 
conduction band minimum of a PBE-D3(BJ) optimized structure is -1.286 eV.\nUser: And how about the PAW-PBE-D3(BJ) computed energy of valence band maximum (VBM)?\nAssistant: The PAW-PBE-D3(BJ) computed energy of valence band maximum (VBM) is 0.418 eV, the HSE06 computed valence band maximum of a PBE-D3(BJ) optimized structure is -0.902 eV and the HLE17 calculated valence band maximum of a PBE-D3(BJ) optimized structure is -2.198 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-48.jsonl": "{"text":"Question: What is the HSE06 computed valence band maximum of a PBE-D3(BJ) optimized structure of the metal-organic framework (MOF) with the topology hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2]?\nAnswer: The valence band maximum is -0.222 eV."} {"text":"Question: How large is the HSE06 calculated valence band maximum of a PBE-D3(BJ) optimized structure of the MOF with the topology sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn]?\nAnswer: The valence band maximum of the MOF is -2.074 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-35.jsonl": "{"text":"Question: What is the PAW-PBE-D3(BJ) computed conduction band minimum (CBM) of the reticular material with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0?\nAnswer: 0.647 eV."} {"text":"Question: What is the PAW-PBE-D3(BJ) calculated conduction band minimum of the reticular material with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0?\nAnswer: The conduction band minimum of the MOF is 1.356 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-28.jsonl": "{"text":"Question: What is the pore limiting diameter of the metal-organic framework (MOF) with the topology hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2]?\nAnswer: The pore limiting diameter is 0.937 \\AA."} 
{"text":"Question: How large is the pore limiting diameter of the metal-organic framework (MOF) with the net sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn]?\nAnswer: The pore limiting diameter is 2.694 \\AA."}", "/scratch/micpie/export/qmof_quantum/train_0-17.jsonl": "{"text":"The metal-organic framework with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0 has a HLE17 computed valence band maximum of a PBE-D3(BJ) optimized structure of -1.772 eV."} {"text":"The reticular material with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1 has a HLE17 computed valence band maximum of a PBE-D3(BJ) optimized structure of -3.406 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-16.jsonl": "{"text":"The metal-organic framework (MOF) with the RCSR identifier hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2] has a HLE17 calculated valence band maximum of a PBE-D3(BJ) optimized structure of -1.772 eV."} {"text":"The metal-organic framework (MOF) with the RCSR identifier sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn] has a HLE17 calculated valence band maximum of a PBE-D3(BJ) optimized structure of -3.406 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-39.jsonl": "{"text":"Question: How large is the HLE17 computed band gap of a PBE-D3(BJ) optimized structure of the metal-organic framework with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0?\nAnswer: The band gap is 1.504 eV."} {"text":"Question: How large is the HLE17 computed band gap of a PBE-D3(BJ) optimized structure of the metal-organic framework with the MOFid 
N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1?\nAnswer: 3.352 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-50.jsonl": "{"text":"User: With which linkers do I have to combine my nodes with SMILES [Cd] to form a reticular material with the topology jxj?\nAssistant: You have to combine your nodes with SMILES [Cd] with linkers with SMILES S=[C]1=NN=C(O1)c1ccncc1 to form a reticular material with the jxj topology."} {"text":"User: With which linkers do I have to combine my nodes with SMILES Cl[Mn] to form a metal-organic framework (MOF) with the topology pyr?\nAssistant: You have to combine your nodes with SMILES Cl[Mn] with linkers with SMILES [O-][n+]1ccc(cc1)C(=O)[O-] to form a metal-organic framework (MOF) with the pyr topology."}", "/scratch/micpie/export/qmof_quantum/test_0-10.jsonl": "{"text":"The metal-organic framework with the RCSR identifier jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd] has a PAW-PBE-D3(BJ) computed energy of valence band maximum (VBM) of -0.689 eV."} {"text":"The metal-organic framework with the RCSR code pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn] has a PAW-PBE-D3(BJ) computed valence band maximum (VBM) of 2.238 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-8.jsonl": "{"text":"The reticular material with the RCSR identifier hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu] has a PAW-PBE-D3(BJ) calculated energy of conduction band minimum of 0.647 eV."} {"text":"The metal-organic framework (MOF) with the net sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu] has a PAW-PBE-D3(BJ) calculated conduction band minimum of 1.356 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-54.jsonl": "{"text":"User: Which linkers and nodes do I have to combine to form a metal-organic framework (MOF) with the PAW-PBE-D3(BJ) calculated band gap of 2.025 eV?\nAssistant: Do you have other 
constraints?\nUser: I want to have a topology jxj and PAW-PBE-D3(BJ) calculated energy of conduction band minimum of 1.335 eV.\nAssistant: You have to combine your linkers with SMILES S=[C]1=NN=C(O1)c1ccncc1 with nodes with SMILES [Cd].\nUser: One more thing, What is the density and the space group number?\nAssistant: The density is 1.556 g\/cm^3 and the space group number is 14."} {"text":"User: Which linkers and nodes do I have to combine to form a metal-organic framework (MOF) with the PAW-PBE-D3(BJ) computed band gap of 0.759 eV?\nAssistant: Are there additional constraints?\nUser: I want to have a topology pyr and PAW-PBE-D3(BJ) computed conduction band minimum (CBM) of 2.997 eV.\nAssistant: You have to combine your linkers with SMILES [O-][n+]1ccc(cc1)C(=O)[O-] with nodes with SMILES Cl[Mn].\nUser: One more thing, What is the density and the space group number?\nAssistant: The density is 2.199 g\/cm^3 and the space group number is 61."}", "/scratch/micpie/export/qmof_quantum/test_0-22.jsonl": "{"text":"The metal-organic framework (MOF) with the RCSR identifier jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd] has a HSE06 calculated valence band maximum of a PBE-D3(BJ) optimized structure of -1.285 eV."} {"text":"The metal-organic framework with the topology pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn] has a HSE06 computed valence band maximum of a PBE-D3(BJ) optimized structure of 0.986 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-16.jsonl": "{"text":"The reticular material with the topology jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd] has a HLE17 computed valence band maximum of a PBE-D3(BJ) optimized structure of -2.647 eV."} {"text":"The MOF with the RCSR identifier pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn] has a HLE17 calculated valence band maximum of a PBE-D3(BJ) optimized structure of -0.897 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-52.jsonl": 
"{"text":"User: Which linkers and nodes do I have to combine to form a metal-organic framework (MOF) with the largest cavity diameter 2.262 \\AA?\nAssistant: Is there anything else I should take into account?\nUser: Indeed, I want to use linkers with SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].\nAssistant: You have to combine your linkers with SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-] with nodes with SMILES [OH2][K][OH2] to form a metal-organic framework (MOF) with the largest cavity diameter 2.262 \\AA."} {"text":"User: Which linkers and nodes do I have to combine to form a reticular material with the largest cavity diameter (LCD) 5.145 \\AA?\nAssistant: Do you have other requirements?\nUser: Yes, I want to use linkers with SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.\nAssistant: You have to combine your linkers with SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1 with nodes with SMILES [Zn] to form a reticular material with the largest cavity diameter (LCD) 5.145 \\AA."}", "/scratch/micpie/export/qmof_quantum/valid_0-58.jsonl": "{"text":"User: I'm thinking about the RCSR code of metal-organic frameworks.\nAssistant: Is there anything I can do?\nUser: Which RCSR code do the linkers with SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-] and nodes with SMILES [Cu] self-assemble to form a metal-organic framework (MOF)?\nAssistant: Into the RCSR code hcb.\nUser: I have one more question, What density do you estimate for this metal-organic framework?\nAssistant: The density is 1.924 g\/cm^3."} {"text":"User: I'm thinking about the RCSR code of reticular materials.\nAssistant: How can I help?\nUser: Which RCSR code do the linkers with SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1 and nodes with SMILES [Cu] self-assemble to form a reticular material?\nAssistant: Into the RCSR code sql.\nUser: What density do you predict for this metal-organic 
framework (MOF)?\nAssistant: The density is 1.602 g\/cm^3."}", "/scratch/micpie/export/qmof_quantum/valid_0-53.jsonl": "{"text":"User: Which linkers and nodes do I have to combine to form a reticular material with the pore limiting diameter (PLD) 1.281 \\AA?\nAssistant: Is there anything else I should take into account?\nUser: I want to use linkers with SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-] and want a PAW-PBE-D3(BJ) computed band gap of 0.312 eV.\nAssistant: You have to combine your linkers with SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-] with nodes with SMILES [Cu]."} {"text":"User: Which linkers and nodes do I have to combine to form a reticular material with the pore limiting diameter (PLD) 2.658 \\AA?\nAssistant: Are there additional constraints?\nUser: Yes, I want to use linkers with SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1 and want a PAW-PBE-D3(BJ) computed band gap of 0.938 eV.\nAssistant: You have to combine your linkers with SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1 with nodes with SMILES [Cu]."}", "/scratch/micpie/export/qmof_quantum/train_0-34.jsonl": "{"text":"Question: How large is the PAW-PBE-D3(BJ) computed conduction band minimum (CBM) of the MOF with the RCSR identifier hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2]?\nAnswer: 1.729 eV."} {"text":"Question: How large is the PAW-PBE-D3(BJ) computed conduction band minimum (CBM) of the metal-organic framework (MOF) with the net sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn]?\nAnswer: The conduction band minimum is 1.772 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-15.jsonl": "{"text":"The MOF with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0 has a HLE17 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of -0.350 eV."} {"text":"The metal-organic framework with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] 
MOFid-v1.pyr.cat0 has a HLE17 computed conduction band minimum of a PBE-D3(BJ) optimized structure of 0.992 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-27.jsonl": "{"text":"Question: What is the mass density of the reticular material with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0?\nAnswer: The mass density is 1.636 g\/cm^3."} {"text":"Question: How large is the density of the metal-organic framework (MOF) with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1?\nAnswer: The density is 1.577 g\/cm^3."}", "/scratch/micpie/export/qmof_quantum/test_0-46.jsonl": "{"text":"Question: What is the HSE06 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of the metal-organic framework with the topology jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd]?\nAnswer: The conduction band minimum is 1.702 eV."} {"text":"Question: How large is the HSE06 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of the metal-organic framework (MOF) with the topology pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn]?\nAnswer: The conduction band minimum of the MOF is 3.195 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-8.jsonl": "{"text":"The metal-organic framework with the topology hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2] has a PAW-PBE-D3(BJ) calculated conduction band minimum (CBM) of 1.729 eV."} {"text":"The metal-organic framework with the RCSR identifier sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn] has a PAW-PBE-D3(BJ) computed conduction band minimum of 1.772 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-5.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0 has 
a largest cavity diameter (LCD) of 4.171 \\AA."} {"text":"The reticular material with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0 has a largest cavity diameter of 1.516 \\AA."}", "/scratch/micpie/export/qmof_quantum/valid_0-25.jsonl": "{"text":"The metal-organic framework with the RCSR identifier hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu] is a metal-organic framework with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0."} {"text":"The reticular material with the RCSR code sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu] is a metal-organic framework with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0."}", "/scratch/micpie/export/qmof_quantum/valid_0-9.jsonl": "{"text":"The metal-organic framework with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0 has a PAW-PBE-D3(BJ) computed conduction band minimum (CBM) of 0.647 eV."} {"text":"The metal-organic framework (MOF) with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0 has a PAW-PBE-D3(BJ) computed conduction band minimum of 1.356 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-26.jsonl": "{"text":"Question: What is the mass density of the metal-organic framework with the RCSR code jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd]?\nAnswer: 1.556 g\/cm^3."} {"text":"Question: How large is the mass density of the MOF with the topology pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn]?\nAnswer: The mass density is 2.199 g\/cm^3."}", "/scratch/micpie/export/qmof_quantum/test_0-19.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0 has a HSE06 calculated band gap of a PBE-D3(BJ) optimized structure of 2.986 eV."} {"text":"The reticular material with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0 has a HSE06 calculated band gap of a PBE-D3(BJ) 
optimized structure of 2.209 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-40.jsonl": "{"text":"Question: How large is the HLE17 computed conduction band minimum of a PBE-D3(BJ) optimized structure of the MOF with the topology jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd]?\nAnswer: The conduction band minimum is -0.350 eV."} {"text":"Question: How large is the HLE17 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of the metal-organic framework (MOF) with the RCSR identifier pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn]?\nAnswer: 0.992 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-57.jsonl": "{"text":"User: I'm thinking about the conduction band minima and valence band maxima of metal-organic frameworks.\nAssistant: Is there anything I can do?\nUser: How does the PAW-PBE-D3(BJ) computed conduction band minimum of the metal-organic framework with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0 compare to the HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure and the HLE17 computed conduction band minimum of a PBE-D3(BJ) optimized structure?\nAssistant: The PAW-PBE-D3(BJ) computed conduction band minimum is 1.335 eV, the HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure is 1.702 eV and the HLE17 computed conduction band minimum of a PBE-D3(BJ) optimized structure is -0.350 eV.\nUser: How about the PAW-PBE-D3(BJ) computed valence band maximum (VBM)?\nAssistant: The PAW-PBE-D3(BJ) computed valence band maximum (VBM) is -0.689 eV, the HSE06 calculated valence band maximum of a PBE-D3(BJ) optimized structure is -1.285 eV and the HLE17 computed valence band maximum of a PBE-D3(BJ) optimized structure is -2.647 eV."} {"text":"User: I'm thinking about the conduction band minima and valence band maxima of reticular materials.\nAssistant: How can I be of assistance?\nUser: How does the PAW-PBE-D3(BJ) computed conduction band minimum (CBM) of 
the metal-organic framework (MOF) with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0 compare to the HSE06 calculated conduction band minimum of a PBE-D3(BJ) optimized structure and the HLE17 computed conduction band minimum of a PBE-D3(BJ) optimized structure?\nAssistant: The PAW-PBE-D3(BJ) computed conduction band minimum (CBM) is 2.997 eV, the HSE06 calculated conduction band minimum of a PBE-D3(BJ) optimized structure is 3.195 eV and the HLE17 computed conduction band minimum of a PBE-D3(BJ) optimized structure is 0.992 eV.\nUser: And how does it look like for the PAW-PBE-D3(BJ) computed valence band maximum?\nAssistant: The PAW-PBE-D3(BJ) computed valence band maximum is 2.238 eV, the HSE06 computed valence band maximum of a PBE-D3(BJ) optimized structure is 0.986 eV and the HLE17 calculated valence band maximum of a PBE-D3(BJ) optimized structure is -0.897 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-38.jsonl": "{"text":"Question: What is the HLE17 computed band gap (HLE17 single-point after PBE-D3(BJ) optimization) of the MOF with the RCSR code hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu]?\nAnswer: The band gap is 0.208 eV."} {"text":"Question: How large is the HLE17 calculated band gap of a PBE-D3(BJ) optimized structure of the metal-organic framework (MOF) with the topology sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu]?\nAnswer: The band gap is 0.911 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-28.jsonl": "{"text":"Question: What is the pore limiting diameter (PLD) of the metal-organic framework (MOF) with the RCSR code hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu]?\nAnswer: 1.281 \\AA."} {"text":"Question: How large is the pore limiting diameter of the metal-organic framework with the topology sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu]?\nAnswer: 2.658 \\AA."}", 
"/scratch/micpie/export/qmof_quantum/valid_0-24.jsonl": "{"text":"The reticular material with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0 has a space group number of 2."} {"text":"The metal-organic framework with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0 has a space group number of 2."}", "/scratch/micpie/export/qmof_quantum/train_0-33.jsonl": "{"text":"Question: How large is the PAW-PBE-D3(BJ) computed band gap of the reticular material with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0?\nAnswer: 1.341 eV."} {"text":"Question: How large is the PAW-PBE-D3(BJ) computed band gap of the reticular material with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1?\nAnswer: The band gap of the MOF is 3.260 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-24.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0 has a spacegroup number of 4."} {"text":"The MOF with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1 has a space group number of 9."}", "/scratch/micpie/export/qmof_quantum/test_0-1.jsonl": "{"text":"The MOF with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0 has a mass density (density) of 1.556 g\/cm^3."} {"text":"The metal-organic framework with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0 has a mass density (density) of 2.199 g\/cm^3."}", "/scratch/micpie/export/qmof_quantum/test_0-34.jsonl": "{"text":"Question: What is the PAW-PBE-D3(BJ) calculated energy of conduction band minimum of the reticular material with the RCSR code jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd]?\nAnswer: The conduction band minimum of the MOF is 1.335 eV."} {"text":"Question: How large is the 
PAW-PBE-D3(BJ) calculated conduction band minimum of the reticular material with the topology pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn]?\nAnswer: The conduction band minimum of the MOF is 2.997 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-18.jsonl": "{"text":"The metal-organic framework (MOF) with the topology jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd] has a HSE06 calculated band gap of a PBE-D3(BJ) optimized structure of 2.986 eV."} {"text":"The metal-organic framework (MOF) with the topology pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn] has a HSE06 computed band gap (HSE06 single-point after PBE-D3(BJ) optimization) of 2.209 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-57.jsonl": "{"text":"User: I'm thinking about the conduction band minima and valence band maxima of reticular materials.\nAssistant: How can I help?\nUser: How does the PAW-PBE-D3(BJ) computed conduction band minimum of the reticular material with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0 compare to the HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure and the HLE17 calculated conduction band minimum of a PBE-D3(BJ) optimized structure?\nAssistant: The PAW-PBE-D3(BJ) computed conduction band minimum is 1.729 eV, the HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure is 1.864 eV and the HLE17 calculated conduction band minimum of a PBE-D3(BJ) optimized structure is -0.268 eV.\nUser: How about the PAW-PBE-D3(BJ) calculated valence band maximum (VBM)?\nAssistant: The PAW-PBE-D3(BJ) calculated valence band maximum (VBM) is 0.388 eV, the HSE06 computed valence band maximum of a PBE-D3(BJ) optimized structure is -0.222 eV and the HLE17 computed valence band maximum of a PBE-D3(BJ) optimized structure is -1.772 eV."} {"text":"User: I'm thinking about the conduction band minima and valence band 
maxima of metal-organic frameworks.\nAssistant: How can I help?\nUser: How does the PAW-PBE-D3(BJ) computed conduction band minimum (CBM) of the metal-organic framework with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1 compare to the HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure and the HLE17 computed conduction band minimum of a PBE-D3(BJ) optimized structure?\nAssistant: The PAW-PBE-D3(BJ) computed conduction band minimum (CBM) is 1.772 eV, the HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure is 2.197 eV and the HLE17 computed conduction band minimum of a PBE-D3(BJ) optimized structure is -0.055 eV.\nUser: And how about the PAW-PBE-D3(BJ) calculated valence band maximum (VBM)?\nAssistant: The PAW-PBE-D3(BJ) calculated valence band maximum (VBM) is -1.488 eV, the HSE06 calculated valence band maximum of a PBE-D3(BJ) optimized structure is -2.074 eV and the HLE17 calculated valence band maximum of a PBE-D3(BJ) optimized structure is -3.406 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-29.jsonl": "{"text":"Question: What is the pore limiting diameter (PLD) of the metal-organic framework with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0?\nAnswer: 2.770 \\AA."} {"text":"Question: How large is the pore limiting diameter (PLD) of the metal-organic framework (MOF) with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0?\nAnswer: The pore limiting diameter (PLD) is 0.783 \\AA."}", "/scratch/micpie/export/qmof_quantum/valid_0-0.jsonl": "{"text":"The metal-organic framework with the RCSR identifier hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu] has a mass density of 1.924 g\/cm^3."} {"text":"The reticular material with the net sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu] has a density of 1.602 g\/cm^3."}", "/scratch/micpie/export/qmof_quantum/test_0-49.jsonl": 
"{"text":"Question: In which net do the linkers with SMILES S=[C]1=NN=C(O1)c1ccncc1 and nodes with SMILES [Cd] self-assemble to form a metal-organic framework (MOF) with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0?\nAnswer: The metal-organic framework (MOF) with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0 self-assembles to form a metal-organic framework (MOF) with the net jxj."} {"text":"Question: In which topology do the linkers with SMILES [O-][n+]1ccc(cc1)C(=O)[O-] and nodes with SMILES Cl[Mn] self-assemble to form a reticular material with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0?\nAnswer: The reticular material with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0 self-assembles to form a reticular material with the topology pyr."}", "/scratch/micpie/export/qmof_quantum/valid_0-39.jsonl": "{"text":"Question: What is the HLE17 computed band gap (HLE17 single-point after PBE-D3(BJ) optimization) of the metal-organic framework with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0?\nAnswer: The band gap is 0.208 eV."} {"text":"Question: What is the HLE17 computed band gap (HLE17 single-point after PBE-D3(BJ) optimization) of the reticular material with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0?\nAnswer: 0.911 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-36.jsonl": "{"text":"Question: How large is the PAW-PBE-D3(BJ) calculated valence band maximum of the reticular material with the RCSR identifier hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2]?\nAnswer: 0.388 eV."} {"text":"Question: What is the PAW-PBE-D3(BJ) calculated valence band maximum of the metal-organic framework (MOF) with the net sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn]?\nAnswer: The valence band maximum of the MOF is -1.488 eV."}", 
"/scratch/micpie/export/qmof_quantum/test_0-33.jsonl": "{"text":"Question: How large is the PAW-PBE-D3(BJ) computed band gap of the MOF with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0?\nAnswer: 2.025 eV."} {"text":"Question: What is the PAW-PBE-D3(BJ) calculated band gap of the metal-organic framework with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0?\nAnswer: 0.759 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-32.jsonl": "{"text":"Question: What is the PAW-PBE-D3(BJ) computed band gap of the MOF with the RCSR code jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd]?\nAnswer: The band gap is 2.025 eV."} {"text":"Question: How large is the PAW-PBE-D3(BJ) calculated band gap of the reticular material with the topology pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn]?\nAnswer: The band gap is 0.759 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-21.jsonl": "{"text":"The MOF with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0 has a HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure of 1.702 eV."} {"text":"The MOF with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0 has a HSE06 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of 3.195 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-27.jsonl": "{"text":"Question: How large is the density of the metal-organic framework with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0?\nAnswer: The mass density is 1.556 g\/cm^3."} {"text":"Question: What is the density of the metal-organic framework (MOF) with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0?\nAnswer: 2.199 g\/cm^3."}", "/scratch/micpie/export/qmof_quantum/test_0-2.jsonl": "{"text":"The reticular material with the RCSR code jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd] has a pore limiting diameter of 2.770 \\AA."} {"text":"The metal-organic framework (MOF) with the RCSR code pyr, 
linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn] has a pore limiting diameter (PLD) of 0.783 \\AA."}", "/scratch/micpie/export/qmof_quantum/test_0-30.jsonl": "{"text":"Question: What is the largest cavity diameter (LCD) of the reticular material with the net jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd]?\nAnswer: The largest cavity diameter is 4.171 \\AA."} {"text":"Question: How large is the largest cavity diameter (LCD) of the MOF with the net pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn]?\nAnswer: The largest cavity diameter is 1.516 \\AA."}", "/scratch/micpie/export/qmof_quantum/valid_0-42.jsonl": "{"text":"Question: How large is the HLE17 calculated valence band maximum of a PBE-D3(BJ) optimized structure of the metal-organic framework (MOF) with the net hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu]?\nAnswer: The valence band maximum is -2.211 eV."} {"text":"Question: What is the HLE17 computed valence band maximum of a PBE-D3(BJ) optimized structure of the MOF with the RCSR code sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu]?\nAnswer: The valence band maximum of the MOF is -2.198 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-41.jsonl": "{"text":"Question: What is the HLE17 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of the reticular material with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0?\nAnswer: -0.350 eV."} {"text":"Question: How large is the HLE17 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of the metal-organic framework with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0?\nAnswer: The conduction band minimum is 0.992 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-22.jsonl": "{"text":"The MOF with the net hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2] has a HSE06 
computed valence band maximum of a PBE-D3(BJ) optimized structure of -0.222 eV."} {"text":"The MOF with the RCSR identifier sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn] has a HSE06 computed valence band maximum of a PBE-D3(BJ) optimized structure of -2.074 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-31.jsonl": "{"text":"Question: What is the largest cavity diameter (LCD) of the metal-organic framework with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0?\nAnswer: 2.262 \\AA."} {"text":"Question: What is the largest cavity diameter of the metal-organic framework with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1?\nAnswer: The largest cavity diameter is 5.145 \\AA."}", "/scratch/micpie/export/qmof_quantum/train_0-35.jsonl": "{"text":"Question: What is the PAW-PBE-D3(BJ) calculated energy of conduction band minimum of the metal-organic framework with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0?\nAnswer: The conduction band minimum is 1.729 eV."} {"text":"Question: What is the PAW-PBE-D3(BJ) computed conduction band minimum (CBM) of the metal-organic framework (MOF) with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1?\nAnswer: The conduction band minimum is 1.772 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-10.jsonl": "{"text":"The MOF with the RCSR code hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu] has a PAW-PBE-D3(BJ) calculated valence band maximum (VBM) of 0.335 eV."} {"text":"The MOF with the topology sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu] has a PAW-PBE-D3(BJ) calculated valence band maximum of 0.418 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-56.jsonl": 
"{"text":"User: I'm thinking about the band gaps of reticular materials.\nAssistant: How can I be of assistance?\nUser: How does the PAW-PBE-D3(BJ) computed band gap of the metal-organic framework (MOF) with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0 compare to the HSE06 computed band gap of a PBE-D3(BJ) optimized structure and the HLE17 computed band gap of a PBE-D3(BJ) optimized structure?\nAssistant: The PAW-PBE-D3(BJ) computed band gap is 1.341 eV, the HSE06 computed band gap of a PBE-D3(BJ) optimized structure is 2.086 eV and the HLE17 computed band gap of a PBE-D3(BJ) optimized structure is 1.504 eV."} {"text":"User: I'm thinking about the band gaps of reticular materials.\nAssistant: That's interesting.\nUser: How does the PAW-PBE-D3(BJ) calculated band gap of the reticular material with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1 compare to the HSE06 computed band gap (HSE06 single-point after PBE-D3(BJ) optimization) and the HLE17 computed band gap (HLE17 single-point after PBE-D3(BJ) optimization)?\nAssistant: The PAW-PBE-D3(BJ) calculated band gap is 3.260 eV, the HSE06 computed band gap (HSE06 single-point after PBE-D3(BJ) optimization) is 4.270 eV and the HLE17 computed band gap (HLE17 single-point after PBE-D3(BJ) optimization) is 3.352 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-6.jsonl": "{"text":"The metal-organic framework (MOF) with the topology hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2] has a PAW-PBE-D3(BJ) computed band gap of 1.341 eV."} {"text":"The metal-organic framework (MOF) with the RCSR code sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn] has a PAW-PBE-D3(BJ) computed band gap of 3.260 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-6.jsonl": 
"{"text":"The metal-organic framework with the topology hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu] has a PAW-PBE-D3(BJ) computed band gap of 0.312 eV."} {"text":"The metal-organic framework with the RCSR identifier sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu] has a PAW-PBE-D3(BJ) calculated band gap of 0.938 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-32.jsonl": "{"text":"Question: How large is the PAW-PBE-D3(BJ) calculated band gap of the reticular material with the RCSR identifier hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu]?\nAnswer: The band gap of the MOF is 0.312 eV."} {"text":"Question: What is the PAW-PBE-D3(BJ) computed band gap of the MOF with the topology sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu]?\nAnswer: The band gap of the MOF is 0.938 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-30.jsonl": "{"text":"Question: What is the largest cavity diameter of the metal-organic framework (MOF) with the topology hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu]?\nAnswer: The largest cavity diameter is 2.040 \\AA."} {"text":"Question: What is the largest cavity diameter of the reticular material with the RCSR identifier sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu]?\nAnswer: The largest cavity diameter is 3.436 \\AA."}", "/scratch/micpie/export/qmof_quantum/train_0-21.jsonl": "{"text":"The MOF with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0 has a HSE06 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of 1.864 eV."} {"text":"The metal-organic framework with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1 has a HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure of 2.197 eV."}", 
"/scratch/micpie/export/qmof_quantum/test_0-36.jsonl": "{"text":"Question: How large is the PAW-PBE-D3(BJ) calculated valence band maximum (VBM) of the metal-organic framework with the RCSR identifier jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd]?\nAnswer: -0.689 eV."} {"text":"Question: How large is the PAW-PBE-D3(BJ) computed valence band maximum (VBM) of the metal-organic framework with the RCSR identifier pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn]?\nAnswer: 2.238 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-56.jsonl": "{"text":"User: I'm thinking about the band gaps of metal-organic frameworks.\nAssistant: How can I be of assistance?\nUser: How does the PAW-PBE-D3(BJ) calculated band gap of the metal-organic framework with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0 compare to the HSE06 calculated band gap of a PBE-D3(BJ) optimized structure and the HLE17 computed band gap (HLE17 single-point after PBE-D3(BJ) optimization)?\nAssistant: The PAW-PBE-D3(BJ) calculated band gap is 2.025 eV, the HSE06 calculated band gap of a PBE-D3(BJ) optimized structure is 2.986 eV and the HLE17 computed band gap (HLE17 single-point after PBE-D3(BJ) optimization) is 2.297 eV."} {"text":"User: I'm thinking about the band gaps of reticular materials.\nAssistant: Is there anything I can do?\nUser: How does the PAW-PBE-D3(BJ) computed band gap of the metal-organic framework (MOF) with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0 compare to the HSE06 calculated band gap of a PBE-D3(BJ) optimized structure and the HLE17 computed band gap (HLE17 single-point after PBE-D3(BJ) optimization)?\nAssistant: The PAW-PBE-D3(BJ) computed band gap is 0.759 eV, the HSE06 calculated band gap of a PBE-D3(BJ) optimized structure is 2.209 eV and the HLE17 computed band gap (HLE17 single-point after PBE-D3(BJ) optimization) is 1.889 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-46.jsonl": "{"text":"Question: How 
large is the HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure of the metal-organic framework with the RCSR code hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu]?\nAnswer: The conduction band minimum of the MOF is 1.872 eV."} {"text":"Question: What is the HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure of the MOF with the RCSR identifier sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu]?\nAnswer: 2.713 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-39.jsonl": "{"text":"Question: How large is the HLE17 calculated band gap of a PBE-D3(BJ) optimized structure of the reticular material with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0?\nAnswer: 2.297 eV."} {"text":"Question: What is the HLE17 computed band gap (HLE17 single-point after PBE-D3(BJ) optimization) of the metal-organic framework with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0?\nAnswer: The band gap is 1.889 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-47.jsonl": "{"text":"Question: What is the HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure of the reticular material with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0?\nAnswer: The conduction band minimum is 1.864 eV."} {"text":"Question: How large is the HSE06 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of the MOF with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1?\nAnswer: 2.197 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-36.jsonl": "{"text":"Question: What is the PAW-PBE-D3(BJ) computed energy of valence band maximum (VBM) of the metal-organic framework (MOF) with the RCSR identifier hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu]?\nAnswer: The valence band maximum of the MOF is 0.335 eV."} 
{"text":"Question: What is the PAW-PBE-D3(BJ) calculated valence band maximum of the metal-organic framework with the topology sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu]?\nAnswer: 0.418 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-55.jsonl": "{"text":"User: What suggestion do you have if I want a reticular material with the PAW-PBE-D3(BJ) calculated band gap of 1.341 eV?\nAssistant: Do you have other constraints?\nUser: Thanks, I want to have a RCSR code hcb and PAW-PBE-D3(BJ) calculated conduction band minimum (CBM) of 1.729 eV.\nAssistant: Do you have other requirements?\nUser: I have one more request, The density should be 1.636 g\/cm^3 and the space group number should be 4.\nAssistant: I propose that you combine your linkers with SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-] with nodes with SMILES [OH2][K][OH2]."} {"text":"User: What recommendations do you have if I need a metal-organic framework (MOF) with the PAW-PBE-D3(BJ) calculated band gap of 3.260 eV?\nAssistant: Do you have other constraints?\nUser: Indeed, I want to have a topology sql and PAW-PBE-D3(BJ) calculated energy of conduction band minimum of 1.772 eV.\nAssistant: Are there additional constraints?\nUser: The density should be 1.577 g\/cm^3 and the space group number should be 9.\nAssistant: I propose that you combine your linkers with SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1 with nodes with SMILES [Zn]."}", "/scratch/micpie/export/qmof_quantum/train_0-19.jsonl": "{"text":"The MOF with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0 has a HSE06 computed band gap of a PBE-D3(BJ) optimized structure of 2.086 eV."} {"text":"The reticular material with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1 has a HSE06 calculated band gap of a PBE-D3(BJ) optimized 
structure of 4.270 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-29.jsonl": "{"text":"Question: How large is the pore limiting diameter (PLD) of the metal-organic framework with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0?\nAnswer: The pore limiting diameter (PLD) is 1.281 \\AA."} {"text":"Question: What is the pore limiting diameter of the metal-organic framework (MOF) with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0?\nAnswer: 2.658 \\AA."}", "/scratch/micpie/export/qmof_quantum/test_0-55.jsonl": "{"text":"User: What recommendations do you have if I want a reticular material with the PAW-PBE-D3(BJ) calculated band gap of 2.025 eV?\nAssistant: Do you have other requirements?\nUser: I want to have a RCSR identifier jxj and PAW-PBE-D3(BJ) computed conduction band minimum (CBM) of 1.335 eV.\nAssistant: Is that it?\nUser: One more thing, The density should be 1.556 g\/cm^3 and the space group number should be 14.\nAssistant: I recommend that you combine your linkers with SMILES S=[C]1=NN=C(O1)c1ccncc1 with nodes with SMILES [Cd]."} {"text":"User: What proposals do you have if I want a metal-organic framework with the PAW-PBE-D3(BJ) calculated band gap of 0.759 eV?\nAssistant: Is there anything else I should take into account?\nUser: I want to have a RCSR code pyr and PAW-PBE-D3(BJ) calculated conduction band minimum of 2.997 eV.\nAssistant: Is there anything else I should take into account?\nUser: One more thing, The density should be 2.199 g\/cm^3 and the space group number should be 61.\nAssistant: I recommend that you combine your linkers with SMILES [O-][n+]1ccc(cc1)C(=O)[O-] with nodes with SMILES Cl[Mn]."}", "/scratch/micpie/export/qmof_quantum/test_0-42.jsonl": "{"text":"Question: How large is the HLE17 computed valence band maximum of a PBE-D3(BJ) optimized structure of the MOF with the topology jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd]?\nAnswer: The valence band maximum is -2.647 
eV."} {"text":"Question: What is the HLE17 calculated valence band maximum of a PBE-D3(BJ) optimized structure of the MOF with the net pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn]?\nAnswer: The valence band maximum of the MOF is -0.897 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-9.jsonl": "{"text":"The metal-organic framework with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0 has a PAW-PBE-D3(BJ) calculated energy of conduction band minimum of 1.335 eV."} {"text":"The reticular material with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0 has a PAW-PBE-D3(BJ) computed conduction band minimum of 2.997 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-32.jsonl": "{"text":"Question: How large is the PAW-PBE-D3(BJ) calculated band gap of the MOF with the RCSR code hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2]?\nAnswer: The band gap of the MOF is 1.341 eV."} {"text":"Question: What is the PAW-PBE-D3(BJ) calculated band gap of the MOF with the net sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn]?\nAnswer: The band gap of the MOF is 3.260 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-52.jsonl": "{"text":"User: Which linkers and nodes do I have to combine to form a metal-organic framework (MOF) with the largest cavity diameter 2.040 \\AA?\nAssistant: Are there additional constraints?\nUser: Yes, I want to use linkers with SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-].\nAssistant: You have to combine your linkers with SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-] with nodes with SMILES [Cu] to form a metal-organic framework (MOF) with the largest cavity diameter 2.040 \\AA."} {"text":"User: Which linkers and nodes do I have to combine to form a reticular material with the largest cavity diameter 3.436 \\AA?\nAssistant: Are there additional constraints?\nUser: Thanks, I want to use linkers 
with SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1.\nAssistant: You have to combine your linkers with SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1 with nodes with SMILES [Cu] to form a reticular material with the largest cavity diameter 3.436 \\AA."}", "/scratch/micpie/export/qmof_quantum/train_0-40.jsonl": "{"text":"Question: How large is the HLE17 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of the metal-organic framework with the net hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2]?\nAnswer: The conduction band minimum of the MOF is -0.268 eV."} {"text":"Question: How large is the HLE17 computed conduction band minimum of a PBE-D3(BJ) optimized structure of the metal-organic framework (MOF) with the RCSR identifier sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn]?\nAnswer: The conduction band minimum is -0.055 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-0.jsonl": "{"text":"The reticular material with the RCSR identifier jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd] has a density of 1.556 g\/cm^3."} {"text":"The metal-organic framework with the RCSR identifier pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn] has a mass density of 2.199 g\/cm^3."}", "/scratch/micpie/export/qmof_quantum/test_0-31.jsonl": "{"text":"Question: What is the largest cavity diameter (LCD) of the metal-organic framework with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0?\nAnswer: 4.171 \\AA."} {"text":"Question: How large is the largest cavity diameter of the MOF with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0?\nAnswer: The largest cavity diameter is 1.516 \\AA."}", "/scratch/micpie/export/qmof_quantum/test_0-24.jsonl": "{"text":"The metal-organic framework with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0 has a space group 
number of 14."} {"text":"The reticular material with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0 has a space group number of 61."}", "/scratch/micpie/export/qmof_quantum/train_0-54.jsonl": "{"text":"User: Which linkers and nodes do I have to combine to form a reticular material with the PAW-PBE-D3(BJ) calculated band gap of 1.341 eV?\nAssistant: Do you have other constraints?\nUser: I want to have a topology hcb and PAW-PBE-D3(BJ) computed conduction band minimum of 1.729 eV.\nAssistant: You have to combine your linkers with SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-] with nodes with SMILES [OH2][K][OH2].\nUser: I have one more question, What is the density and the space group number?\nAssistant: The density is 1.636 g\/cm^3 and the space group number is 4."} {"text":"User: Which linkers and nodes do I have to combine to form a metal-organic framework with the PAW-PBE-D3(BJ) calculated band gap of 3.260 eV?\nAssistant: Do you have other constraints?\nUser: Thanks, I want to have a RCSR identifier sql and PAW-PBE-D3(BJ) computed conduction band minimum (CBM) of 1.772 eV.\nAssistant: You have to combine your linkers with SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1 with nodes with SMILES [Zn].\nUser: What is the density and the space group number?\nAssistant: The density is 1.577 g\/cm^3 and the space group number is 9."}", "/scratch/micpie/export/qmof_quantum/valid_0-16.jsonl": "{"text":"The reticular material with the RCSR identifier hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu] has a HLE17 computed valence band maximum of a PBE-D3(BJ) optimized structure of -2.211 eV."} {"text":"The metal-organic framework (MOF) with the RCSR code sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu] has a HLE17 computed valence band maximum of a PBE-D3(BJ) optimized structure of -2.198 eV."}", 
"/scratch/micpie/export/qmof_quantum/valid_0-7.jsonl": "{"text":"The MOF with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0 has a PAW-PBE-D3(BJ) calculated band gap of 0.312 eV."} {"text":"The metal-organic framework (MOF) with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0 has a PAW-PBE-D3(BJ) computed band gap of 0.938 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-47.jsonl": "{"text":"Question: How large is the HSE06 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of the reticular material with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0?\nAnswer: The conduction band minimum is 1.872 eV."} {"text":"Question: How large is the HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure of the MOF with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0?\nAnswer: 2.713 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-34.jsonl": "{"text":"Question: How large is the PAW-PBE-D3(BJ) computed conduction band minimum of the reticular material with the net hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu]?\nAnswer: The conduction band minimum of the MOF is 0.647 eV."} {"text":"Question: How large is the PAW-PBE-D3(BJ) computed conduction band minimum of the MOF with the topology sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu]?\nAnswer: The conduction band minimum of the MOF is 1.356 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-3.jsonl": "{"text":"The MOF with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0 has a pore limiting diameter of 2.770 \\AA."} {"text":"The metal-organic framework (MOF) with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0 has a pore limiting diameter of 0.783 \\AA."}", "/scratch/micpie/export/qmof_quantum/valid_0-11.jsonl": "{"text":"The metal-organic framework with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0 has 
a PAW-PBE-D3(BJ) calculated valence band maximum (VBM) of 0.335 eV."} {"text":"The reticular material with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0 has a PAW-PBE-D3(BJ) computed valence band maximum of 0.418 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-20.jsonl": "{"text":"The metal-organic framework with the RCSR code hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2] has a HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure of 1.864 eV."} {"text":"The reticular material with the net sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn] has a HSE06 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of 2.197 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-43.jsonl": "{"text":"Question: How large is the HLE17 computed valence band maximum of a PBE-D3(BJ) optimized structure of the metal-organic framework with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0?\nAnswer: The valence band maximum of the MOF is -1.772 eV."} {"text":"Question: How large is the HLE17 computed valence band maximum of a PBE-D3(BJ) optimized structure of the metal-organic framework (MOF) with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1?\nAnswer: The valence band maximum is -3.406 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-44.jsonl": "{"text":"Question: What is the HSE06 computed band gap of a PBE-D3(BJ) optimized structure of the MOF with the RCSR identifier jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd]?\nAnswer: The band gap of the MOF is 2.986 eV."} {"text":"Question: How large is the HSE06 calculated band gap of a PBE-D3(BJ) optimized structure of the reticular material with the topology pyr, linker SMILES 
[O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn]?\nAnswer: The band gap of the MOF is 2.209 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-30.jsonl": "{"text":"Question: How large is the largest cavity diameter of the MOF with the topology hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2]?\nAnswer: 2.262 \\AA."} {"text":"Question: How large is the largest cavity diameter of the MOF with the topology sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn]?\nAnswer: The largest cavity diameter (LCD) is 5.145 \\AA."}", "/scratch/micpie/export/qmof_quantum/valid_0-20.jsonl": "{"text":"The reticular material with the net hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu] has a HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure of 1.872 eV."} {"text":"The reticular material with the net sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu] has a HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure of 2.713 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-37.jsonl": "{"text":"Question: What is the PAW-PBE-D3(BJ) computed energy of valence band maximum (VBM) of the metal-organic framework (MOF) with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0?\nAnswer: The valence band maximum is 0.335 eV."} {"text":"Question: What is the PAW-PBE-D3(BJ) calculated valence band maximum (VBM) of the MOF with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0?\nAnswer: 0.418 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-26.jsonl": "{"text":"Question: How large is the density of the metal-organic framework (MOF) with the topology hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2]?\nAnswer: The mass density is 1.636 g\/cm^3."} 
{"text":"Question: What is the density of the metal-organic framework (MOF) with the RCSR code sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn]?\nAnswer: The mass density is 1.577 g\/cm^3."}", "/scratch/micpie/export/qmof_quantum/train_0-0.jsonl": "{"text":"The MOF with the RCSR code hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2] has a density of 1.636 g\/cm^3."} {"text":"The metal-organic framework (MOF) with the RCSR code sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn] has a mass density of 1.577 g\/cm^3."}", "/scratch/micpie/export/qmof_quantum/test_0-6.jsonl": "{"text":"The metal-organic framework with the net jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd] has a PAW-PBE-D3(BJ) computed band gap of 2.025 eV."} {"text":"The MOF with the net pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn] has a PAW-PBE-D3(BJ) computed band gap of 0.759 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-10.jsonl": "{"text":"The metal-organic framework with the RCSR code hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2] has a PAW-PBE-D3(BJ) computed valence band maximum of 0.388 eV."} {"text":"The MOF with the topology sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn] has a PAW-PBE-D3(BJ) calculated valence band maximum (VBM) of -1.488 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-3.jsonl": "{"text":"The MOF with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0 has a pore limiting diameter (PLD) of 0.937 \\AA."} {"text":"The MOF with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1 
has a pore limiting diameter of 2.694 \\AA."}", "/scratch/micpie/export/qmof_quantum/train_0-23.jsonl": "{"text":"The metal-organic framework with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0 has a HSE06 computed valence band maximum of a PBE-D3(BJ) optimized structure of -0.222 eV."} {"text":"The reticular material with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1 has a HSE06 calculated valence band maximum of a PBE-D3(BJ) optimized structure of -2.074 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-46.jsonl": "{"text":"Question: How large is the HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure of the metal-organic framework with the RCSR code hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2]?\nAnswer: 1.864 eV."} {"text":"Question: What is the HSE06 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of the MOF with the RCSR code sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn]?\nAnswer: 2.197 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-12.jsonl": "{"text":"The MOF with the net hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2] has a HLE17 computed band gap (HLE17 single-point after PBE-D3(BJ) optimization) of 1.504 eV."} {"text":"The reticular material with the RCSR code sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn] has a HLE17 computed band gap (HLE17 single-point after PBE-D3(BJ) optimization) of 3.352 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-28.jsonl": "{"text":"Question: What is the pore limiting diameter (PLD) of the MOF with the topology jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node 
SMILES [Cd]?\nAnswer: 2.770 \\AA."} {"text":"Question: How large is the pore limiting diameter of the metal-organic framework with the RCSR code pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn]?\nAnswer: 0.783 \\AA."}", "/scratch/micpie/export/qmof_quantum/valid_0-40.jsonl": "{"text":"Question: How large is the HLE17 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of the reticular material with the net hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu]?\nAnswer: The conduction band minimum is -2.003 eV."} {"text":"Question: How large is the HLE17 computed conduction band minimum of a PBE-D3(BJ) optimized structure of the metal-organic framework (MOF) with the RCSR code sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu]?\nAnswer: The conduction band minimum is -1.286 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-50.jsonl": "{"text":"User: With which linkers do I have to combine my nodes with SMILES [Cu] to form a metal-organic framework (MOF) with the net hcb?\nAssistant: You have to combine your nodes with SMILES [Cu] with linkers with SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-] to form a metal-organic framework (MOF) with the hcb net."} {"text":"User: With which linkers do I have to combine my nodes with SMILES [Cu] to form a metal-organic framework (MOF) with the net sql?\nAssistant: You have to combine your nodes with SMILES [Cu] with linkers with SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1 to form a metal-organic framework (MOF) with the sql net."}", "/scratch/micpie/export/qmof_quantum/test_0-13.jsonl": "{"text":"The MOF with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0 has a HLE17 computed band gap of a PBE-D3(BJ) optimized structure of 2.297 eV."} {"text":"The metal-organic framework with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0 has a HLE17 computed band gap (HLE17 single-point after PBE-D3(BJ) optimization) of 1.889 eV."}", 
"/scratch/micpie/export/qmof_quantum/test_0-23.jsonl": "{"text":"The metal-organic framework with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0 has a HSE06 calculated valence band maximum of a PBE-D3(BJ) optimized structure of -1.285 eV."} {"text":"The metal-organic framework with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0 has a HSE06 calculated valence band maximum of a PBE-D3(BJ) optimized structure of 0.986 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-2.jsonl": "{"text":"The reticular material with the RCSR identifier hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu] has a pore limiting diameter (PLD) of 1.281 \\AA."} {"text":"The reticular material with the topology sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu] has a pore limiting diameter of 2.658 \\AA."}", "/scratch/micpie/export/qmof_quantum/valid_0-49.jsonl": "{"text":"Question: In which net do the linkers with SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-] and nodes with SMILES [Cu] self-assemble to form a metal-organic framework with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0?\nAnswer: The metal-organic framework with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0 self-assembles to form a metal-organic framework with the net hcb."} {"text":"Question: In which net do the linkers with SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1 and nodes with SMILES [Cu] self-assemble to form a reticular material with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0?\nAnswer: The reticular material with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0 self-assembles to form a reticular material with the net sql."}", "/scratch/micpie/export/qmof_quantum/valid_0-21.jsonl": "{"text":"The metal-organic framework with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0 has a HSE06 calculated conduction band minimum of a PBE-D3(BJ) optimized 
structure of 1.872 eV."} {"text":"The metal-organic framework (MOF) with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0 has a HSE06 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of 2.713 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-14.jsonl": "{"text":"The metal-organic framework (MOF) with the RCSR identifier hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2] has a HLE17 computed conduction band minimum of a PBE-D3(BJ) optimized structure of -0.268 eV."} {"text":"The reticular material with the net sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn] has a HLE17 computed conduction band minimum of a PBE-D3(BJ) optimized structure of -0.055 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-51.jsonl": "{"text":"User: With which nodes do I have to combine my linkers with SMILES S=[C]1=NN=C(O1)c1ccncc1 to form a metal-organic framework with the topology jxj?\nAssistant: You have to combine your linkers with SMILES S=[C]1=NN=C(O1)c1ccncc1 with nodes with SMILES [Cd] to form a metal-organic framework with the jxj topology."} {"text":"User: With which nodes do I have to combine my linkers with SMILES [O-][n+]1ccc(cc1)C(=O)[O-] to form a reticular material with the topology pyr?\nAssistant: You have to combine your linkers with SMILES [O-][n+]1ccc(cc1)C(=O)[O-] with nodes with SMILES Cl[Mn] to form a reticular material with the pyr topology."}", "/scratch/micpie/export/qmof_quantum/valid_0-1.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0 has a mass density of 1.924 g\/cm^3."} {"text":"The reticular material with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0 has a density of 1.602 g\/cm^3."}", "/scratch/micpie/export/qmof_quantum/valid_0-13.jsonl": "{"text":"The reticular material 
with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0 has a HLE17 computed band gap (HLE17 single-point after PBE-D3(BJ) optimization) of 0.208 eV."} {"text":"The metal-organic framework with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0 has a HLE17 computed band gap of a PBE-D3(BJ) optimized structure of 0.911 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-44.jsonl": "{"text":"Question: How large is the HSE06 calculated band gap of a PBE-D3(BJ) optimized structure of the metal-organic framework with the RCSR code hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2]?\nAnswer: The band gap is 2.086 eV."} {"text":"Question: How large is the HSE06 computed band gap (HSE06 single-point after PBE-D3(BJ) optimization) of the metal-organic framework with the RCSR code sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn]?\nAnswer: The band gap is 4.270 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-52.jsonl": "{"text":"User: Which linkers and nodes do I have to combine to form a reticular material with the largest cavity diameter 4.171 \\AA?\nAssistant: Do you have other requirements?\nUser: Thanks, I want to use linkers with SMILES S=[C]1=NN=C(O1)c1ccncc1.\nAssistant: You have to combine your linkers with SMILES S=[C]1=NN=C(O1)c1ccncc1 with nodes with SMILES [Cd] to form a reticular material with the largest cavity diameter 4.171 \\AA."} {"text":"User: Which linkers and nodes do I have to combine to form a metal-organic framework with the largest cavity diameter (LCD) 1.516 \\AA?\nAssistant: Do you have other requirements?\nUser: I want to use linkers with SMILES [O-][n+]1ccc(cc1)C(=O)[O-].\nAssistant: You have to combine your linkers with SMILES [O-][n+]1ccc(cc1)C(=O)[O-] with nodes with SMILES Cl[Mn] to form a metal-organic framework with the largest cavity diameter (LCD) 1.516 \\AA."}", 
"/scratch/micpie/export/qmof_quantum/train_0-41.jsonl": "{"text":"Question: How large is the HLE17 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of the MOF with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0?\nAnswer: The conduction band minimum of the MOF is -0.268 eV."} {"text":"Question: What is the HLE17 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of the metal-organic framework (MOF) with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1?\nAnswer: -0.055 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-29.jsonl": "{"text":"Question: How large is the pore limiting diameter of the metal-organic framework (MOF) with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0?\nAnswer: 0.937 \\AA."} {"text":"Question: How large is the pore limiting diameter of the reticular material with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1?\nAnswer: 2.694 \\AA."}", "/scratch/micpie/export/qmof_quantum/valid_0-51.jsonl": "{"text":"User: With which nodes do I have to combine my linkers with SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-] to form a reticular material with the net hcb?\nAssistant: You have to combine your linkers with SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-] with nodes with SMILES [Cu] to form a reticular material with the hcb net."} {"text":"User: With which nodes do I have to combine my linkers with SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1 to form a metal-organic framework with the net sql?\nAssistant: You have to combine your linkers with SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1 with nodes with SMILES [Cu] to form a metal-organic framework with the sql net."}", "/scratch/micpie/export/qmof_quantum/valid_0-23.jsonl": "{"text":"The metal-organic framework with the MOFid 
[Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0 has a HSE06 computed valence band maximum of a PBE-D3(BJ) optimized structure of -0.979 eV."} {"text":"The MOF with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0 has a HSE06 computed valence band maximum of a PBE-D3(BJ) optimized structure of -0.902 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-5.jsonl": "{"text":"The metal-organic framework with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0 has a largest cavity diameter of 2.040 \\AA."} {"text":"The MOF with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0 has a largest cavity diameter of 3.436 \\AA."}", "/scratch/micpie/export/qmof_quantum/train_0-15.jsonl": "{"text":"The reticular material with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0 has a HLE17 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of -0.268 eV."} {"text":"The metal-organic framework (MOF) with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1 has a HLE17 computed conduction band minimum of a PBE-D3(BJ) optimized structure of -0.055 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-4.jsonl": "{"text":"The reticular material with the RCSR identifier hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu] has a largest cavity diameter of 2.040 \\AA."} {"text":"The metal-organic framework (MOF) with the topology sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu] has a largest cavity diameter (LCD) of 3.436 \\AA."}", "/scratch/micpie/export/qmof_quantum/train_0-5.jsonl": "{"text":"The metal-organic framework with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0 has a largest cavity diameter of 2.262 \\AA."} {"text":"The metal-organic framework with the MOFid 
N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1 has a largest cavity diameter of 5.145 \\AA."}", "/scratch/micpie/export/qmof_quantum/valid_0-15.jsonl": "{"text":"The reticular material with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0 has a HLE17 computed conduction band minimum of a PBE-D3(BJ) optimized structure of -2.003 eV."} {"text":"The MOF with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0 has a HLE17 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of -1.286 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-54.jsonl": "{"text":"User: Which linkers and nodes do I have to combine to form a metal-organic framework (MOF) with the PAW-PBE-D3(BJ) calculated band gap of 0.312 eV?\nAssistant: Do you have other constraints?\nUser: Thanks, I want to have a topology hcb and PAW-PBE-D3(BJ) calculated conduction band minimum of 0.647 eV.\nAssistant: You have to combine your linkers with SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-] with nodes with SMILES [Cu].\nUser: One more thing, What is the density and the space group number?\nAssistant: The density is 1.924 g\/cm^3 and the space group number is 2."} {"text":"User: Which linkers and nodes do I have to combine to form a reticular material with the PAW-PBE-D3(BJ) computed band gap of 0.938 eV?\nAssistant: Are there additional constraints?\nUser: Thanks, I want to have a RCSR identifier sql and PAW-PBE-D3(BJ) calculated energy of conduction band minimum of 1.356 eV.\nAssistant: You have to combine your linkers with SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1 with nodes with SMILES [Cu].\nUser: One more thing, What is the density and the space group number?\nAssistant: The density is 1.602 g\/cm^3 and the space group number is 2."}", "/scratch/micpie/export/qmof_quantum/valid_0-12.jsonl": "{"text":"The reticular material with the RCSR identifier hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node 
SMILES [Cu] has a HLE17 computed band gap of a PBE-D3(BJ) optimized structure of 0.208 eV."} {"text":"The MOF with the topology sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu] has a HLE17 calculated band gap of a PBE-D3(BJ) optimized structure of 0.911 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-50.jsonl": "{"text":"User: With which linkers do I have to combine my nodes with SMILES [OH2][K][OH2] to form a metal-organic framework with the net hcb?\nAssistant: You have to combine your nodes with SMILES [OH2][K][OH2] with linkers with SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-] to form a metal-organic framework with the hcb net."} {"text":"User: With which linkers do I have to combine my nodes with SMILES [Zn] to form a metal-organic framework (MOF) with the topology sql?\nAssistant: You have to combine your nodes with SMILES [Zn] with linkers with SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1 to form a metal-organic framework (MOF) with the sql topology."}", "/scratch/micpie/export/qmof_quantum/valid_0-18.jsonl": "{"text":"The metal-organic framework with the topology hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu] has a HSE06 computed band gap (HSE06 single-point after PBE-D3(BJ) optimization) of 2.851 eV."} {"text":"The metal-organic framework with the RCSR code sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu] has a HSE06 computed band gap (HSE06 single-point after PBE-D3(BJ) optimization) of 3.615 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-45.jsonl": "{"text":"Question: What is the HSE06 calculated band gap of a PBE-D3(BJ) optimized structure of the MOF with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0?\nAnswer: 2.086 eV."} {"text":"Question: How large is the HSE06 computed band gap (HSE06 single-point after 
PBE-D3(BJ) optimization) of the metal-organic framework with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1?\nAnswer: The band gap is 4.270 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-35.jsonl": "{"text":"Question: How large is the PAW-PBE-D3(BJ) computed conduction band minimum of the reticular material with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0?\nAnswer: The conduction band minimum is 1.335 eV."} {"text":"Question: What is the PAW-PBE-D3(BJ) computed conduction band minimum (CBM) of the MOF with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0?\nAnswer: 2.997 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-56.jsonl": "{"text":"User: I'm thinking about the band gaps of metal-organic frameworks (MOFs).\nAssistant: What can I do for you?\nUser: How does the PAW-PBE-D3(BJ) computed band gap of the reticular material with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0 compare to the HSE06 computed band gap of a PBE-D3(BJ) optimized structure and the HLE17 calculated band gap of a PBE-D3(BJ) optimized structure?\nAssistant: The PAW-PBE-D3(BJ) computed band gap is 0.312 eV, the HSE06 computed band gap of a PBE-D3(BJ) optimized structure is 2.851 eV and the HLE17 calculated band gap of a PBE-D3(BJ) optimized structure is 0.208 eV."} {"text":"User: I'm thinking about the band gaps of metal-organic frameworks (MOFs).\nAssistant: How can I be of assistance?\nUser: How does the PAW-PBE-D3(BJ) calculated band gap of the reticular material with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0 compare to the HSE06 calculated band gap of a PBE-D3(BJ) optimized structure and the HLE17 calculated band gap of a PBE-D3(BJ) optimized structure?\nAssistant: The PAW-PBE-D3(BJ) calculated band gap is 0.938 eV, the HSE06 calculated band gap of a PBE-D3(BJ) optimized structure is 3.615 eV and the HLE17 calculated band gap of a PBE-D3(BJ) optimized 
structure is 0.911 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-47.jsonl": "{"text":"Question: What is the HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure of the MOF with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0?\nAnswer: The conduction band minimum of the MOF is 1.702 eV."} {"text":"Question: What is the HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure of the reticular material with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0?\nAnswer: 3.195 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-2.jsonl": "{"text":"The MOF with the RCSR identifier hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2] has a pore limiting diameter (PLD) of 0.937 \\AA."} {"text":"The MOF with the net sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn] has a pore limiting diameter (PLD) of 2.694 \\AA."}", "/scratch/micpie/export/qmof_quantum/valid_0-33.jsonl": "{"text":"Question: How large is the PAW-PBE-D3(BJ) calculated band gap of the metal-organic framework (MOF) with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0?\nAnswer: 0.312 eV."} {"text":"Question: How large is the PAW-PBE-D3(BJ) calculated band gap of the reticular material with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0?\nAnswer: The band gap is 0.938 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-42.jsonl": "{"text":"Question: How large is the HLE17 computed valence band maximum of a PBE-D3(BJ) optimized structure of the MOF with the RCSR identifier hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2]?\nAnswer: -1.772 eV."} {"text":"Question: What is the HLE17 computed valence band maximum of a PBE-D3(BJ) optimized structure of the metal-organic framework (MOF) with the RCSR 
identifier sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn]?\nAnswer: The valence band maximum is -3.406 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-11.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0 has a PAW-PBE-D3(BJ) computed valence band maximum (VBM) of -0.689 eV."} {"text":"The reticular material with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0 has a PAW-PBE-D3(BJ) calculated valence band maximum (VBM) of 2.238 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-7.jsonl": "{"text":"The reticular material with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0 has a PAW-PBE-D3(BJ) calculated band gap of 1.341 eV."} {"text":"The MOF with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1 has a PAW-PBE-D3(BJ) calculated band gap of 3.260 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-17.jsonl": "{"text":"The reticular material with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0 has a HLE17 computed valence band maximum of a PBE-D3(BJ) optimized structure of -2.647 eV."} {"text":"The MOF with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0 has a HLE17 computed valence band maximum of a PBE-D3(BJ) optimized structure of -0.897 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-27.jsonl": "{"text":"Question: How large is the mass density of the metal-organic framework (MOF) with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0?\nAnswer: The mass density is 1.924 g\/cm^3."} {"text":"Question: How large is the density of the metal-organic framework (MOF) with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0?\nAnswer: The mass density is 1.602 g\/cm^3."}", "/scratch/micpie/export/qmof_quantum/valid_0-19.jsonl": "{"text":"The MOF 
with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0 has a HSE06 calculated band gap of a PBE-D3(BJ) optimized structure of 2.851 eV."} {"text":"The MOF with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0 has a HSE06 computed band gap of a PBE-D3(BJ) optimized structure of 3.615 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-11.jsonl": "{"text":"The metal-organic framework with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0 has a PAW-PBE-D3(BJ) computed valence band maximum of 0.388 eV."} {"text":"The metal-organic framework with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1 has a PAW-PBE-D3(BJ) computed valence band maximum of -1.488 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-58.jsonl": "{"text":"User: I'm thinking about the RCSR identifier of metal-organic frameworks (MOFs).\nAssistant: Is there anything I can do?\nUser: Which RCSR identifier do the linkers with SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-] and nodes with SMILES [OH2][K][OH2] self-assemble to form a metal-organic framework?\nAssistant: Into the RCSR identifier hcb.\nUser: What density do you predict for this material?\nAssistant: The density is 1.636 g\/cm^3."} {"text":"User: I'm wondering about the RCSR identifier of reticular materials.\nAssistant: What can I do for you?\nUser: Which RCSR identifier do the linkers with SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1 and nodes with SMILES [Zn] self-assemble to form a metal-organic framework (MOF)?\nAssistant: Into the RCSR identifier sql.\nUser: What density do you expect for this metal-organic framework?\nAssistant: The density is 1.577 g\/cm^3."}", "/scratch/micpie/export/qmof_quantum/valid_0-31.jsonl": "{"text":"Question: How large is the largest cavity diameter (LCD) of the MOF with the MOFid 
[Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0?\nAnswer: The largest cavity diameter (LCD) is 2.040 \\AA."} {"text":"Question: What is the largest cavity diameter (LCD) of the reticular material with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0?\nAnswer: 3.436 \\AA."}", "/scratch/micpie/export/qmof_quantum/train_0-1.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0 has a mass density of 1.636 g\/cm^3."} {"text":"The reticular material with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1 has a mass density (density) of 1.577 g\/cm^3."}", "/scratch/micpie/export/qmof_quantum/test_0-37.jsonl": "{"text":"Question: What is the PAW-PBE-D3(BJ) computed valence band maximum of the reticular material with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0?\nAnswer: The valence band maximum is -0.689 eV."} {"text":"Question: How large is the PAW-PBE-D3(BJ) calculated valence band maximum of the metal-organic framework (MOF) with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0?\nAnswer: The valence band maximum is 2.238 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-49.jsonl": "{"text":"Question: In which net do the linkers with SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-] and nodes with SMILES [OH2][K][OH2] self-assemble to form a reticular material with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0?\nAnswer: The reticular material with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0 self-assembles to form a reticular material with the net hcb."} {"text":"Question: In which net do the linkers with SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, 
N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1 and nodes with SMILES [Zn] self-assemble to form a reticular material with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1?\nAnswer: The reticular material with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1 self-assembles to form a reticular material with the net sql."}", "/scratch/micpie/export/qmof_quantum/train_0-13.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0 has a HLE17 computed band gap (HLE17 single-point after PBE-D3(BJ) optimization) of 1.504 eV."} {"text":"The metal-organic framework (MOF) with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1 has a HLE17 calculated band gap of a PBE-D3(BJ) optimized structure of 3.352 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-26.jsonl": "{"text":"Question: How large is the density of the metal-organic framework with the RCSR identifier hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu]?\nAnswer: The density is 1.924 g\/cm^3."} {"text":"Question: How large is the mass density of the MOF with the RCSR code sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu]?\nAnswer: 1.602 g\/cm^3."}", "/scratch/micpie/export/qmof_quantum/train_0-4.jsonl": "{"text":"The reticular material with the RCSR identifier hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2] has a largest cavity diameter of 2.262 \\AA."} {"text":"The MOF with the RCSR code sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn] has a largest cavity diameter of 5.145 \\AA."}", "/scratch/micpie/export/qmof_quantum/test_0-53.jsonl": "{"text":"User: Which 
linkers and nodes do I have to combine to form a reticular material with the pore limiting diameter (PLD) 2.770 \\AA?\nAssistant: Do you have other requirements?\nUser: Indeed, I want to use linkers with SMILES S=[C]1=NN=C(O1)c1ccncc1 and want a PAW-PBE-D3(BJ) calculated band gap of 2.025 eV.\nAssistant: You have to combine your linkers with SMILES S=[C]1=NN=C(O1)c1ccncc1 with nodes with SMILES [Cd]."} {"text":"User: Which linkers and nodes do I have to combine to form a reticular material with the pore limiting diameter (PLD) 0.783 \\AA?\nAssistant: Are there additional constraints?\nUser: I want to use linkers with SMILES [O-][n+]1ccc(cc1)C(=O)[O-] and want a PAW-PBE-D3(BJ) computed band gap of 0.759 eV.\nAssistant: You have to combine your linkers with SMILES [O-][n+]1ccc(cc1)C(=O)[O-] with nodes with SMILES Cl[Mn]."}", "/scratch/micpie/export/qmof_quantum/test_0-43.jsonl": "{"text":"Question: What is the HLE17 computed valence band maximum of a PBE-D3(BJ) optimized structure of the MOF with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0?\nAnswer: -2.647 eV."} {"text":"Question: How large is the HLE17 computed valence band maximum of a PBE-D3(BJ) optimized structure of the reticular material with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0?\nAnswer: The valence band maximum is -0.897 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-7.jsonl": "{"text":"The MOF with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0 has a PAW-PBE-D3(BJ) computed band gap of 2.025 eV."} {"text":"The metal-organic framework with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0 has a PAW-PBE-D3(BJ) calculated band gap of 0.759 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-9.jsonl": "{"text":"The MOF with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0 has a PAW-PBE-D3(BJ) computed conduction band minimum (CBM) of 1.729 eV."} {"text":"The metal-organic framework 
with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1 has a PAW-PBE-D3(BJ) calculated conduction band minimum of 1.772 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-25.jsonl": "{"text":"The MOF with the RCSR code hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2] is a reticular material with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0."} {"text":"The metal-organic framework (MOF) with the RCSR code sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn] is a metal-organic framework with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1."}", "/scratch/micpie/export/qmof_quantum/test_0-48.jsonl": "{"text":"Question: How large is the HSE06 calculated valence band maximum of a PBE-D3(BJ) optimized structure of the MOF with the net jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd]?\nAnswer: The valence band maximum is -1.285 eV."} {"text":"Question: How large is the HSE06 calculated valence band maximum of a PBE-D3(BJ) optimized structure of the reticular material with the topology pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn]?\nAnswer: 0.986 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-58.jsonl": "{"text":"User: I'm wondering about the RCSR code of reticular materials.\nAssistant: That's interesting.\nUser: Which RCSR code do the linkers with SMILES S=[C]1=NN=C(O1)c1ccncc1 and nodes with SMILES [Cd] self-assemble to form a reticular material?\nAssistant: Into the RCSR code jxj.\nUser: One more thing, What density do you estimate for this reticular material?\nAssistant: The density is 1.556 g\/cm^3."} {"text":"User: I'm thinking about the RCSR code of reticular materials.\nAssistant: Is there anything I can 
do?\nUser: Which RCSR code do the linkers with SMILES [O-][n+]1ccc(cc1)C(=O)[O-] and nodes with SMILES Cl[Mn] self-assemble to form a reticular material?\nAssistant: Into the RCSR code pyr.\nUser: I have one more request, What density do you predict for this metal-organic framework (MOF)?\nAssistant: The density is 2.199 g\/cm^3."}", "/scratch/micpie/export/qmof_quantum/valid_0-45.jsonl": "{"text":"Question: How large is the HSE06 computed band gap of a PBE-D3(BJ) optimized structure of the reticular material with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0?\nAnswer: The band gap of the MOF is 2.851 eV."} {"text":"Question: How large is the HSE06 computed band gap of a PBE-D3(BJ) optimized structure of the metal-organic framework (MOF) with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0?\nAnswer: The band gap is 3.615 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-41.jsonl": "{"text":"Question: How large is the HLE17 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of the metal-organic framework (MOF) with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0?\nAnswer: The conduction band minimum of the MOF is -2.003 eV."} {"text":"Question: What is the HLE17 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of the metal-organic framework with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0?\nAnswer: -1.286 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-22.jsonl": "{"text":"The metal-organic framework (MOF) with the RCSR identifier hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu] has a HSE06 calculated valence band maximum of a PBE-D3(BJ) optimized structure of -0.979 eV."} {"text":"The metal-organic framework with the net sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu] has a HSE06 computed valence band maximum of a PBE-D3(BJ) optimized structure of -0.902 eV."}", 
"/scratch/micpie/export/qmof_quantum/test_0-45.jsonl": "{"text":"Question: What is the HSE06 computed band gap (HSE06 single-point after PBE-D3(BJ) optimization) of the metal-organic framework with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0?\nAnswer: The band gap is 2.986 eV."} {"text":"Question: What is the HSE06 computed band gap of a PBE-D3(BJ) optimized structure of the MOF with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0?\nAnswer: The band gap is 2.209 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-44.jsonl": "{"text":"Question: How large is the HSE06 computed band gap (HSE06 single-point after PBE-D3(BJ) optimization) of the metal-organic framework with the RCSR code hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu]?\nAnswer: The band gap is 2.851 eV."} {"text":"Question: What is the HSE06 calculated band gap of a PBE-D3(BJ) optimized structure of the metal-organic framework with the RCSR code sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu]?\nAnswer: The band gap is 3.615 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-18.jsonl": "{"text":"The metal-organic framework with the net hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2] has a HSE06 computed band gap (HSE06 single-point after PBE-D3(BJ) optimization) of 2.086 eV."} {"text":"The reticular material with the RCSR code sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn] has a HSE06 computed band gap of a PBE-D3(BJ) optimized structure of 4.270 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-3.jsonl": "{"text":"The reticular material with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0 has a pore limiting diameter (PLD) of 1.281 \\AA."} {"text":"The MOF with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0 has a pore limiting 
diameter of 2.658 \\AA."}", "/scratch/micpie/export/qmof_quantum/test_0-8.jsonl": "{"text":"The metal-organic framework with the RCSR code jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd] has a PAW-PBE-D3(BJ) calculated conduction band minimum (CBM) of 1.335 eV."} {"text":"The MOF with the topology pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn] has a PAW-PBE-D3(BJ) computed conduction band minimum (CBM) of 2.997 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-14.jsonl": "{"text":"The metal-organic framework (MOF) with the topology jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd] has a HLE17 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of -0.350 eV."} {"text":"The MOF with the RCSR identifier pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn] has a HLE17 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of 0.992 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-17.jsonl": "{"text":"The reticular material with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0 has a HLE17 computed valence band maximum of a PBE-D3(BJ) optimized structure of -2.211 eV."} {"text":"The metal-organic framework with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0 has a HLE17 calculated valence band maximum of a PBE-D3(BJ) optimized structure of -2.198 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-55.jsonl": "{"text":"User: What suggestion do you have if I need a metal-organic framework (MOF) with the PAW-PBE-D3(BJ) calculated band gap of 0.312 eV?\nAssistant: Is there anything else I should take into account?\nUser: I want to have a RCSR identifier hcb and PAW-PBE-D3(BJ) calculated energy of conduction band minimum of 0.647 eV.\nAssistant: Do you have other constraints?\nUser: I have one more request, The density should be 1.924 g\/cm^3 and the space group number should be 2.\nAssistant: I propose that you combine your 
linkers with SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-] with nodes with SMILES [Cu]."} {"text":"User: What proposals do you have if I want a reticular material with the PAW-PBE-D3(BJ) calculated band gap of 0.938 eV?\nAssistant: Are there additional constraints?\nUser: I want to have a net sql and PAW-PBE-D3(BJ) calculated conduction band minimum of 1.356 eV.\nAssistant: Do you have other constraints?\nUser: The density should be 1.602 g\/cm^3 and the space group number should be 2.\nAssistant: combine your linkers with SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1 with nodes with SMILES [Cu]."}", "/scratch/micpie/export/qmof_quantum/valid_0-14.jsonl": "{"text":"The MOF with the net hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu] has a HLE17 computed conduction band minimum of a PBE-D3(BJ) optimized structure of -2.003 eV."} {"text":"The reticular material with the RCSR code sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu] has a HLE17 calculated conduction band minimum of a PBE-D3(BJ) optimized structure of -1.286 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-25.jsonl": "{"text":"The reticular material with the RCSR identifier jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd] is a reticular material with the MOFid S=[C]1=NN=C(O1)c1ccncc1.[Cd] MOFid-v1.jxj.cat0."} {"text":"The metal-organic framework (MOF) with the net pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn] is a metal-organic framework with the MOFid Cl[Mn].[O-][n+]1ccc(cc1)C(=O)[O-] MOFid-v1.pyr.cat0."}", "/scratch/micpie/export/qmof_quantum/test_0-4.jsonl": "{"text":"The MOF with the RCSR code jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd] has a largest cavity diameter of 4.171 \\AA."} {"text":"The MOF with the RCSR identifier pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn] has a largest cavity diameter of 1.516 \\AA."}", 
"/scratch/micpie/export/qmof_quantum/valid_0-48.jsonl": "{"text":"Question: What is the HSE06 calculated valence band maximum of a PBE-D3(BJ) optimized structure of the reticular material with the net hcb, linker SMILES [O-]C(=O)c1ccc(cc1)C(=O)[O-], and node SMILES [Cu]?\nAnswer: The valence band maximum is -0.979 eV."} {"text":"Question: What is the HSE06 computed valence band maximum of a PBE-D3(BJ) optimized structure of the MOF with the RCSR code sql, linker SMILES [O]S(=O)(=O)[O], c1ncn(c1)c1ccc(cc1)n1cncc1, and node SMILES [Cu]?\nAnswer: The valence band maximum of the MOF is -0.902 eV."}", "/scratch/micpie/export/qmof_quantum/valid_0-43.jsonl": "{"text":"Question: What is the HLE17 computed valence band maximum of a PBE-D3(BJ) optimized structure of the metal-organic framework with the MOFid [Cu].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.hcb.cat0?\nAnswer: -2.211 eV."} {"text":"Question: How large is the HLE17 computed valence band maximum of a PBE-D3(BJ) optimized structure of the reticular material with the MOFid [Cu].[O]S(=O)(=O)[O].c1ncn(c1)c1ccc(cc1)n1cncc1 MOFid-v1.sql.cat0?\nAnswer: The valence band maximum is -2.198 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-53.jsonl": "{"text":"User: Which linkers and nodes do I have to combine to form a metal-organic framework with the pore limiting diameter 0.937 \\AA?\nAssistant: Do you have other constraints?\nUser: Thanks, I want to use linkers with SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-] and want a PAW-PBE-D3(BJ) computed band gap of 1.341 eV.\nAssistant: You have to combine your linkers with SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-] with nodes with SMILES [OH2][K][OH2]."} {"text":"User: Which linkers and nodes do I have to combine to form a metal-organic framework (MOF) with the pore limiting diameter 2.694 \\AA?\nAssistant: Do you have other requirements?\nUser: I want to use linkers with SMILES 
N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1 and want a PAW-PBE-D3(BJ) calculated band gap of 3.260 eV.\nAssistant: You have to combine your linkers with SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1 with nodes with SMILES [Zn]."}", "/scratch/micpie/export/qmof_quantum/train_0-37.jsonl": "{"text":"Question: What is the PAW-PBE-D3(BJ) calculated valence band maximum of the metal-organic framework with the MOFid OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-].[OH2][K][OH2] MOFid-v1.hcb.cat0?\nAnswer: The valence band maximum of the MOF is 0.388 eV."} {"text":"Question: What is the PAW-PBE-D3(BJ) computed valence band maximum (VBM) of the MOF with the MOFid N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1.N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1.[Zn] MOFid-v1.sql.cat1?\nAnswer: -1.488 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-12.jsonl": "{"text":"The metal-organic framework with the net jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd] has a HLE17 computed band gap of a PBE-D3(BJ) optimized structure of 2.297 eV."} {"text":"The metal-organic framework (MOF) with the RCSR identifier pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn] has a HLE17 calculated band gap of a PBE-D3(BJ) optimized structure of 1.889 eV."}", "/scratch/micpie/export/qmof_quantum/train_0-38.jsonl": "{"text":"Question: What is the HLE17 calculated band gap of a PBE-D3(BJ) optimized structure of the MOF with the net hcb, linker SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-], and node SMILES [OH2][K][OH2]?\nAnswer: The band gap of the MOF is 1.504 eV."} {"text":"Question: How large is the HLE17 computed band gap (HLE17 single-point after PBE-D3(BJ) optimization) of the MOF with the topology sql, linker SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1, and node SMILES [Zn]?\nAnswer: The band gap of the MOF is 3.352 eV."}", 
"/scratch/micpie/export/qmof_quantum/train_0-51.jsonl": "{"text":"User: With which nodes do I have to combine my linkers with SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-] to form a reticular material with the net hcb?\nAssistant: You have to combine your linkers with SMILES OCC1OC(C(C(C1O)O)O)c1c(O)c2c(c(c1O)O)C(=O)c1c(C2=O)c(C)c(c(c1)O)C(=O)[O-] with nodes with SMILES [OH2][K][OH2] to form a reticular material with the hcb net."} {"text":"User: With which nodes do I have to combine my linkers with SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1 to form a metal-organic framework with the net sql?\nAssistant: You have to combine your linkers with SMILES N1=NN=C([N]1)c1[nH]cnc1C1=NN=N[N]1, N1=NN=C([N]1)c1nc[nH]c1C1=N[N]N=N1 with nodes with SMILES [Zn] to form a metal-organic framework with the sql net."}", "/scratch/micpie/export/qmof_quantum/test_0-38.jsonl": "{"text":"Question: How large is the HLE17 computed band gap (HLE17 single-point after PBE-D3(BJ) optimization) of the metal-organic framework with the net jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd]?\nAnswer: 2.297 eV."} {"text":"Question: How large is the HLE17 calculated band gap of a PBE-D3(BJ) optimized structure of the MOF with the RCSR identifier pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn]?\nAnswer: 1.889 eV."}", "/scratch/micpie/export/qmof_quantum/test_0-20.jsonl": "{"text":"The reticular material with the RCSR code jxj, linker SMILES S=[C]1=NN=C(O1)c1ccncc1, and node SMILES [Cd] has a HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure of 1.702 eV."} {"text":"The metal-organic framework (MOF) with the topology pyr, linker SMILES [O-][n+]1ccc(cc1)C(=O)[O-], and node SMILES Cl[Mn] has a HSE06 computed conduction band minimum of a PBE-D3(BJ) optimized structure of 3.195 eV."}", "/scratch/micpie/export/drug_protein/test_0-1.jsonl": "{"text":"User: Can you give me one 
example for a protein that is targeted by the drug Fulvestrant?\nAssistant: Yes, the protein ER is targeted by this drug."} {"text":"User: Can you come up with one example for a protein that is targeted by the drug 9-Deazaguanine?\nAssistant: Of course, the protein HGPRTase is targeted by the above drug."}", "/scratch/micpie/export/drug_protein/valid_0-0.jsonl": "{"text":"The drug InChI=1S\/C8H10N4O2\/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2\/h4H,1-3H3 targets the protein cGMP phosphodiesterase 6C."} {"text":"The drug [C][C][=C][Branch1][=Branch1][S][C][=C][Ring1][Branch1][C][=Branch2][Ring1][Ring1][=C][C][C][N][C][C][C][C@H1][Branch1][Ring2][C][Ring1][=Branch1][C][Branch1][C][O][=O][C][=C][Branch1][C][C][C][=C][S][Ring1][=Branch1] targets the protein Sodium- and chloride-dependent GABA transporter 1."}", "/scratch/micpie/export/drug_protein/test_0-0.jsonl": "{"text":"The drug [H][C@@][C][C][C@H1][Branch1][C][O][C@@][Ring1][=Branch1][Branch1][C][C][C][C][C@][Branch1][C][H][C][=C][C][=C][Branch1][C][O][C][=C][Ring1][#Branch1][C][C@@H1][Branch2][Ring1][S][C][C][C][C][C][C][C][C][C][S][=Branch1][C][=O][C][C][C][C][Branch1][C][F][Branch1][C][F][C][Branch1][C][F][Branch1][C][F][F][C@@][Ring2][Ring2][#Branch2][Ring2][Ring1][P][H] targets the protein Estrogen receptor."} {"text":"The drug 9-Deazaguanine targets the protein HGPRT."}", "/scratch/micpie/export/drug_protein/train_0-0.jsonl": "{"text":"The drug InChI=1S\/C14H16N2O5\/c17-11(5-8-14(20)21)15-9-1-3-10(4-2-9)16-12(18)6-7-13(16)19\/h1-4,11,15,17H,5-8H2,(H,20,21)\/t11-\/m0\/s1 targets the protein MLC-2B."} {"text":"The drug Histidine targets the protein Histidase."}", "/scratch/micpie/export/drug_protein/valid_0-1.jsonl": "{"text":"User: Can you come up with an example for a protein that is targeted by the drug Caffeine?\nAssistant: Sure, the protein Cone cGMP-specific 3',5'-cyclic phosphodiesterase subunit alpha' is targeted by this drug."} {"text":"User: Can you give me an example for a protein that is targeted by 
the drug Tiagabine?\nAssistant: Sure, the protein Solute carrier family 6 member 1 is targeted by the above drug."}", "/scratch/micpie/export/drug_protein/train_0-1.jsonl": "{"text":"User: Can you come up with one example for a protein that is targeted by the drug 4-[4-(2,5-DIOXO-PYRROLIDIN-1-YL)-PHENYLAMINO]-4-HYDROXY-BUTYRIC ACID?\nAssistant: Yes, of course, the protein Myosin regulatory light chain MRLC3 is targeted by this drug."} {"text":"User: Can you come up with an example for a protein that is targeted by the drug InChI=1S\/C6H9N3O2\/c7-5(6(10)11)1-4-2-8-3-9-4\/h2-3,5H,1,7H2,(H,8,9)(H,10,11)\/t5-\/m0\/s1?\nAssistant: Yes, the protein Histidine ammonia-lyase is targeted by this drug."}", "/scratch/micpie/export/mattermodeling_stackexchange/test_0-1.jsonl": "{"text":"Task: Generate a title for this question.\nQuestion: I am performing an SCF Quantum ESPRESSO calculation on an HPC system, but it is taking significantly longer than usual. When I inspect the output file, I notice that it is stuck at a specific line, yet the calculation appears to be still running in the background.\nInterestingly, when I ran the exact same calculation on my laptop using just 2 cores, it completed within just 1 minute. However, when I tested it on the HPC using both 10 cores then 2 cores, the problem persisted regardless of the number of cores utilized.\nCould someone please explain to me the possible reasons for this difference in calculation time between the HPC system and my laptop?\nPS:\nMy input contains `nosym=.true.` and `noinv=.true.` in the `&SYSTEM` block. 
However, when I remove these two lines from the input, the calculation ran smoothly without any issues on the HPC.\n**The last few lines in case of HPC**:\n```\n Estimated static dynamical RAM per process > 30.73 MB\n Estimated max dynamical RAM per process > 36.84 MB\n Estimated total dynamical RAM > 147.34 MB\n Initial potential from superposition of free atoms\n starting charge 35.99975, renormalised to 36.00000\n Starting wfcs are 19 randomized atomic wfcs + 1 random wfcs\n total cpu time spent up to now is 110.7 secs\n Self-consistent Calculation\n iteration # 1 ecut= 70.00 Ry beta= 0.70\n Davidson diagonalization with overlap\n---- Real-time Memory Report at c_bands before calling an iterative solver\n 252 MiB given to the printing process from OS\n 0 MiB allocation reported by mallinfo(arena+hblkhd)\n 22597 MiB available memory on the node where the printing process lives\n```\n**The lines in case of laptop**:\n```\n Estimated static dynamical RAM per process > 57.66 MB\n Estimated max dynamical RAM per process > 69.90 MB\n Estimated total dynamical RAM > 139.80 MB\n Initial potential from superposition of free atoms\n starting charge 35.9998, renormalised to 36.0000\n Starting wfcs are 19 randomized atomic wfcs + 1 random wfcs\n total cpu time spent up to now is 2.4 secs\n Self-consistent Calculation\n iteration # 1 ecut= 70.00 Ry beta= 0.70\n Davidson diagonalization with overlap\n---- Real-time Memory Report at c_bands before calling an iterative solver\n 79 MiB given to the printing process from OS\n 0 MiB allocation reported by mallinfo(arena+hblkhd)\n 832 MiB available memory on the node where the printing process lives\n------------------\n ethr = 1.00E-02, avg # of iterations = 3.7\n Threshold (ethr) on eigenvalues was too large:\n Diagonalizing with lowered threshold\n Davidson diagonalization with overlap\n---- Real-time Memory Report at c_bands before calling an iterative solver\n 84 MiB given to the printing process from OS\n 0 MiB allocation 
reported by mallinfo(arena+hblkhd)\n 825 MiB available memory on the node where the printing process lives\n------------------\n ethr = 7.58E-04, avg # of iterations = 1.0\n total cpu time spent up to now is 12.1 secs\n```\nAnswer: What factors could cause a calculation to run successfully on a laptop but encounter issues on an HPC system?"} {"text":"Task: Generate a title for this question.\n\\nI wish to calculate solution-phase dielectric constants (required for a Monte-Carlo model) for CoCl$\\_2$ and TaS$\\_2$ dissolved in DMF.\nIs it possible to estimate these constants from the solid-state values, or to obtain them from first-principles \/ molecular dynamics calculations?\nAssistant: Is there a way to obtain solution-phase dielectric constants?"}", "/scratch/micpie/export/mattermodeling_stackexchange/valid_0-0.jsonl": "{"text":"Task: Please answer the question of the user.\nQuestion: As far as I understand (as a complete beginner to molecular simulation and modelling), Coulomb's law for two atoms i and j is:\n$$F=k\\frac{q\\_iq\\_j}{(r\\_{ij})^2}$$\n(the force is proportional to the product of the two charges and the **inverse square** of the distance)\nI am trying to understand molecular simulation and force fields and many use this formula but without squaring the distance, is there a reason behind this? My best thought is simply to improve computational time complexity as this is often the most computationally expensive step and quadratic terms scale worse than linear ones.\nAnswer: The force $F$ is proportional to $\\frac{1}{r^2}$ but the potential $V$ is proportional to $\\frac{1}{r}$.\nThis is because $F$ is proportional to $\\frac{\\partial V}{\\partial r}$, the derivative of $V$ with respect to $r$ (this is by definition).\nUsing $r$ instead of $r^2$ is not to save computational time (it is incorrect to think that calculating $r\\*r$ is more computationally expensive than calculating $q\\_1\\*q\\_2$). 
It is simply the correct Coulomb potential corresponding to the Coulomb force that you correctly wrote."} {"text":"Task: Provide a clear and concise reply to the user's inquiry.\nInquiry: What are the main computational frameworks used in materials modeling? Software packages can include those designed to run on both classical and quantum devices.\nAnswer: Monte Carlo\n===========\n---\nIn this case there is no one answer, and perhaps no individual best suited to write the full list. I can contribute a few words about *Monte Carlo methods.*\n**What is Monte Carlo?** Monte Carlo (MC) is a name that refers to a broad range of computational techniques that rely on random numbers. MC is very broadly applicable anywhere you need to do a high-dimensional integral or sum, so it is widely used in fields like finance and even election forecasting (like Nate Silver's FiveThirtyEight), as well as the physical sciences.\n**Classical Monte Carlo:** Classical Monte Carlo is capable (in general) of describing any equilibrium statistical mechanical system. It works by stochastically sampling the Boltzmann distribution. Basically, it works by starting with a state, proposing updates to that state, and accepting those updates with some probability (which satisfies the detailed balance condition). In practice, it is usually used with simplified models like the Ising model, or hard core spheres, rather than directly simulating atoms and electrons.\n**Quantum Monte Carlo:** Quantum Monte Carlo (QMC) is done by mapping a quantum problem onto an equivalent classical ensemble in a manner that sometimes looks like a path integral. Once you have the corresponding classical ensemble then you can use classical Monte Carlo to study it. 
Similar to classical MC, QMC is typically used for simplified models, like the Heisenberg model, which can be instructive for how physical materials work.\nQMC has one major flaw: **the sign problem.** When converting from a quantum to a classical ensemble, sometimes you end up with negative probabilities. This means that the sampled states tend to cancel each other out, so in most cases you cannot do anything useful with QMC when there is a sign problem. Systems that usually have sign problems include anything with mobile fermions in $d>1$ and systems with frustrated spin interactions (like the triangular Heisenberg antiferromagnet)."}", "/scratch/micpie/export/mattermodeling_stackexchange/test_0-0.jsonl": "{"text":"Task: Offer a concise and informative answer to the user's question.\nInquiry: I am performing an SCF Quantum ESPRESSO calculation on an HPC system, but it is taking significantly longer than usual. When I inspect the output file, I notice that it is stuck at a specific line, yet the calculation appears to be still running in the background.\nInterestingly, when I ran the exact same calculation on my laptop using just 2 cores, it completed within just 1 minute. However, when I tested it on the HPC using both 10 cores then 2 cores, the problem persisted regardless of the number of cores utilized.\nCould someone please explain to me the possible reasons for this difference in calculation time between the HPC system and my laptop?\nPS:\nMy input contains `nosym=.true.` and `noinv=.true.` in the `&SYSTEM` block. 
However, when I remove these two lines from the input, the calculation ran smoothly without any issues on the HPC.\n**The last few lines in case of HPC**:\n```\n Estimated static dynamical RAM per process > 30.73 MB\n Estimated max dynamical RAM per process > 36.84 MB\n Estimated total dynamical RAM > 147.34 MB\n Initial potential from superposition of free atoms\n starting charge 35.99975, renormalised to 36.00000\n Starting wfcs are 19 randomized atomic wfcs + 1 random wfcs\n total cpu time spent up to now is 110.7 secs\n Self-consistent Calculation\n iteration # 1 ecut= 70.00 Ry beta= 0.70\n Davidson diagonalization with overlap\n---- Real-time Memory Report at c_bands before calling an iterative solver\n 252 MiB given to the printing process from OS\n 0 MiB allocation reported by mallinfo(arena+hblkhd)\n 22597 MiB available memory on the node where the printing process lives\n```\n**The lines in case of laptop**:\n```\n Estimated static dynamical RAM per process > 57.66 MB\n Estimated max dynamical RAM per process > 69.90 MB\n Estimated total dynamical RAM > 139.80 MB\n Initial potential from superposition of free atoms\n starting charge 35.9998, renormalised to 36.0000\n Starting wfcs are 19 randomized atomic wfcs + 1 random wfcs\n total cpu time spent up to now is 2.4 secs\n Self-consistent Calculation\n iteration # 1 ecut= 70.00 Ry beta= 0.70\n Davidson diagonalization with overlap\n---- Real-time Memory Report at c_bands before calling an iterative solver\n 79 MiB given to the printing process from OS\n 0 MiB allocation reported by mallinfo(arena+hblkhd)\n 832 MiB available memory on the node where the printing process lives\n------------------\n ethr = 1.00E-02, avg # of iterations = 3.7\n Threshold (ethr) on eigenvalues was too large:\n Diagonalizing with lowered threshold\n Davidson diagonalization with overlap\n---- Real-time Memory Report at c_bands before calling an iterative solver\n 84 MiB given to the printing process from OS\n 0 MiB allocation 
reported by mallinfo(arena+hblkhd)\n 825 MiB available memory on the node where the printing process lives\n------------------\n ethr = 7.58E-04, avg # of iterations = 1.0\n total cpu time spent up to now is 12.1 secs\n```\nAnswer: Almost all codes are written to take advantage of symmetry that is available in the system, to reduce the number of calculations that needs to be performed. If you explicitly say that symmetry must not be considered, it will take more time since now the code is stuck doing the calculations which would have otherwise been generated by symmetry.\nWhen you are doing calculations on a cluster with multiple nodes, there is an overhead for splitting the jobs into chunks, collecting the data back to master and related steps. In your case, this overhead is significantly larger than the actual time required to do the calculations on a single node."} {"text":"Task: Your role is to respond to the user's question with clarity.\n\\nI wish to calculate solution-phase dielectric constants (required for a Monte-Carlo model) for CoCl$\\_2$ and TaS$\\_2$ dissolved in DMF.\nIs it possible to estimate these constants from the solid-state values, or to obtain them from first-principles \/ molecular dynamics calculations?\nAnswer: It's possible to estimate solution-phase dielectric constant from a molecular dynamics simulation using this formula:\n$$\\epsilon\\_{r} = 1 + \\frac{4\\pi}{3Vk\\_{B}T}(\\langle \\mathbf{P}^{2} \\rangle - \\langle \\mathbf{P} \\rangle^{2})$$\nWhere $V$ is the volume, $k\\_{B}$ is Boltzmann's constant, $T$ is temperature, and $P$ is the dipole moment defined as: $\\mathbf{P} = \\sum\\_{i} \\vec{\\mu}\\_{i}$ the summation of molecular dipole moments.\nIn the absence of any external electric field (which I assume is the case here), from electrostatics, you have:\n$$\\mathbf{P}(\\mathbf{r}) = \\chi \\int\\_{\\Omega} \\mathbf{T}(\\mathbf{r}-\\mathbf{r}^{'})\\cdot \\mathbf{P}(\\mathbf{r}^{'}) d^{3} \\mathbf{r}^{'}$$\n$\\chi$ is the 
susceptibility, which is unknown here. Also, $\\mathbf{T}$ is the dipole-dipole tensor defined as:\n$$T\\_{ij} =\n\\frac{\\partial^{2}}{\\partial x\\_{i}\\partial x\\_{j}}(-\\ln(r))$$\nNow if you replace the integral with a summation and replace $\\mathbf{P}(\\mathbf{r})$ with the discretized dipole moment at molecular locations shown as $\\mathsf{P}$ and the discretized dipole-dipole tensor (matrix $\\mathsf{T}$), you have:\n$$\\mathsf{P} = \\chi \\mathsf{T} \\cdot \\mathsf{P}$$\nor:\n$$\\mathsf{T} \\cdot \\mathsf{P} = \\frac{1}{\\chi} \\mathsf{P}$$\nThis is an eigenvalue problem where you know dipole-dipole tensor $\\mathsf{T}$, while the eigenvector (dipole moment $\\mathsf{P}$) and eigenvalue (susceptibility $\\chi$) are unknowns. You can solve this eigenvalue problem for your system and then you'll get the dielectric constant by estimating the fluctuation of dipole moment."}", "/scratch/micpie/export/mattermodeling_stackexchange/train_0-0.jsonl": "{"text":"Task: Provide a clear and concise reply to the user's inquiry.\nInquiry: I have a general query regarding NEGF approach:\nIn TranSIESTA code, when self-energy of the electrodes is calculated, does it require the coupling elements of the electrode with the central scattering region? Or,does it require just the interaction between the principle layers of the leads?\nAnswer: Self-energies are calculated from a pristine, bulk calculation of only the electrode.\nThis is a requirement of the self-energy algorithm. 
This also forces the boundary conditions of the electrodes.\nSince the calculation is:\n$$\n\\mathbf G^{-1} =\n\\begin{bmatrix}\n\\mathbf S\\_1 e - \\mathbf H\\_1 - \\Sigma\\_1 & \\cdots \\\\\n\\vdots \\\\\n\\end{bmatrix}\n$$\nThe $\\Sigma\\_1$ is the self-energy which is derived from the bulk electrode named $1$.\nHowever, the above requires that the electrode in the device region is equivalent to the bulk electrode region.\nThe initiated reader will notice that one can choose $\\mathbf H\\_1$ between two values:\n* the pristine bulk Hamiltonian: `TS.Elec.<>.Bulk true` \nforces the coupled region in the device region to be very electrode bulk like\n* the device region Hamiltonian: `TS.Elec.<>.Bulk false` \nforces the electrode part of the device to couple to a bulk electrode with bulk properties. This can in some cases spare you some layers of electrode in the device region.\nIt is always the user's responsibility to ensure the boundary conditions are fulfilled! TranSIESTA will\/can not check this."} {"text":"Task: Your role is to respond to the user's question with clarity.\nInquiry: In terms of energy, how are van der Waals forces modelled (are there formulas\/laws that govern these)?\nAssistant: In addition to Alone Programmer's answer. The Lennard-Jones Potential (LJ 12-6) is the standard, but it is not unique; in some cases the exponent 6 is changed to 8 to better simulate hydrogen bonds. \nAlso, there is the Buckingham potential, where the repulsion part ($r^{-12}$ term) is modified to the exponential term. But the attractive long-range term ($r^{-6}$) is the same.\nLJ 12-6 fits very well for the potential energy surface of a noble gas dimer. The epsilon term is the maximum depth of the curve, and sigma is the radius where the potential energy is zero (short distance). When the LJ is used to simulate the interaction of different atomic species, there is not a rule to determine the sigma and epsilon terms... 
and there are geometrical and arithmetic averages, using the values for the same-species interaction for each atom."}", "/scratch/micpie/export/mattermodeling_stackexchange/valid_0-1.jsonl": "{"text":"Task: Create a meaningful title for this question.\nInquiry: As far as I understand (as a complete beginner to molecular simulation and modelling), Coulomb's law for two atoms i and j is:\n$$F=k\\frac{q\\_iq\\_j}{(r\\_{ij})^2}$$\n(the force is proportional to the product of the two charges and the **inverse square** of the distance)\nI am trying to understand molecular simulation and force fields and many use this formula but without squaring the distance, is there a reason behind this? My best thought is simply to improve computational time complexity as this is often the most computationally expensive step and quadratic terms scale worse than linear ones.\nTitle: Why are Coulomb interactions modeled with just r, not r squared?"} {"text":"Task: Summarize the question in a title.\nInquiry: What are the main computational frameworks used in materials modeling? Software packages can include those designed to run on both classical and quantum devices.\nTitle: What are the main computational frameworks used in materials modeling?"}", "/scratch/micpie/export/mattermodeling_stackexchange/train_0-1.jsonl": "{"text":"Task: Create a meaningful title for this question.\n\\nI have a general query regarding NEGF approach:\nIn TranSIESTA code, when self-energy of the electrodes is calculated, does it require the coupling elements of the electrode with the central scattering region? 
Or,does it require just the interaction between the principle layers of the leads?\nTitle: Implemetation of NEGF in TranSIESTA"} {"text":"Task: Create a meaningful title for this question.\nInquiry: In terms of energy, how are van der Waals forces modelled (are there formulas\/laws that govern these)?\nIn molecular mechanics, how are van der Waals forces modelled?"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/test_0-10.jsonl": "{"text":"User: I'm looking for the canonical SMILES of a molecule that is not inhibiting CYP P450 2D6?\nAssistant: This is a molecule that is not inhibiting CYP P450 2D6: CCOC(=O)C(NC(C)=O)C(OC(C)=O)c1cccc(N(CCO)CCO)c1"} {"text":"User: I'm looking for the canonical SMILES of a molecule that is not inhibiting CYP2D6?\nAssistant: This is a molecule that is not inhibiting CYP2D6: Cc1ccc(NC(=S)NCCN2CCOCC2)cc1C"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/valid_0-8.jsonl": "{"text":"User: Is the molecule with the canonical SMILES CCc1cc2c(=O)[nH]cnc2s1 inhibiting CYP P450 2D6?\nAssistant: No, it is not inhibiting CYP P450 2D6."} {"text":"User: Is the molecule with the InChI InChI=1S\/C24H22FN5O2\/c25-19-8-6-18(7-9-19)21-23(31)30(11-10-17-4-2-1-3-5-17)22-20(27-21)16-26-24(28-22)29-12-14-32-15-13-29\/h1-9,16H,10-15H2 inhibiting CYP P450 2D6?\nAssistant: No, it is not inhibiting CYP P450 2D6."}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/train_0-8.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C13H9NO5S3\/c1-8(15)14(22(17,18)12-3-2-6-20-12)9-4-5-10-11(7-9)21-13(16)19-10\/h2-7H,1H3 inhibiting CYP P450 2D6?\nAssistant: No, it is not inhibiting CYP P450 2D6."} {"text":"User: Is the molecule with the SELFIES [O][=C][C][N][=C][Branch1][Branch2][C][=C][C][=C][S][Ring1][Branch1][C][=C][Branch1][N][S][C][=C][Ring1][Branch1][C][C][C][C][Ring1][=Branch1][N][Ring2][Ring1][Ring1] inhibiting CYP2D6?\nAssistant: Yes, it is inhibiting CYP2D6."}", 
"/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP2D6.\nMolecule SELFIES: [C][C][O][C][=Branch1][C][=O][C][Branch1][#Branch1][N][C][Branch1][C][C][=O][C][Branch1][#Branch1][O][C][Branch1][C][C][=O][C][=C][C][=C][C][Branch1][#Branch2][N][Branch1][Ring2][C][C][O][C][C][O][=C][Ring1][=N]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting CYP2D6."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2D6.\ncanonical SMILES: Cc1ccc(NC(=S)NCCN2CCOCC2)cc1C\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting CYP P450 2D6."}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/valid_0-9.jsonl": "{"text":"User: Can you generate the SMILES of a molecule that is not inhibiting CYP P450 2D6?\nAssistant: Yes, I'm happy to help, here you go: CCc1cc2c(=O)[nH]cnc2s1"} {"text":"User: Can you give me the InChI of a molecule that is not inhibiting CYP2D6?\nAssistant: Yes, I'm happy to help, here you go: InChI=1S\/C24H22FN5O2\/c25-19-8-6-18(7-9-19)21-23(31)30(11-10-17-4-2-1-3-5-17)22-20(27-21)16-26-24(28-22)29-12-14-32-15-13-29\/h1-9,16H,10-15H2"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/test_0-1.jsonl": "{"text":"Based on the DeepSMILES representation CCOC=O)CNCC)=O)))COCC)=O)))cccccNCCO)))CCO))))c6, the molecule exhibits no inhibition of CYP2D6."} {"text":"Based on the SELFIES representation [C][C][=C][C][=C][Branch1][P][N][C][=Branch1][C][=S][N][C][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C], the molecule exhibits no inhibition of CYP2D6."}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/valid_0-0.jsonl": "{"text":"The molecule with the DeepSMILES CCcccc=O)[nH]cnc6s9 shows no inhibition of CYP2D6."} {"text":"The molecule with 
the SELFIES [O][=C][C][Branch1][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][=N][C][=C][N][=C][Branch1][=Branch2][N][C][C][O][C][C][Ring1][=Branch1][N][=C][Ring1][N][N][Ring2][Ring1][#Branch1][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1] exhibits no inhibition of CYP P450 2D6."}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/test_0-2.jsonl": "{"text":"The SELFIES [C][C][O][C][=Branch1][C][=O][C][Branch1][#Branch1][N][C][Branch1][C][C][=O][C][Branch1][#Branch1][O][C][Branch1][C][C][=O][C][=C][C][=C][C][Branch1][#Branch2][N][Branch1][Ring2][C][C][O][C][C][O][=C][Ring1][=N] is from a molecule that displays no inhibition of CYP P450 2D6."} {"text":"The SMILES Cc1ccc(NC(=S)NCCN2CCOCC2)cc1C represents a molecule that displays no inhibition of CYP P450 2D6."}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/valid_0-10.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that is not inhibiting CYP2D6?\nAssistant: This is a molecule that is not inhibiting CYP2D6: CCc1cc2c(=O)[nH]cnc2s1"} {"text":"User: I'm looking for the SMILES of a molecule that is not inhibiting CYP P450 2D6?\nAssistant: This is a molecule that is not inhibiting CYP P450 2D6: O=c1c(-c2ccc(F)cc2)nc2cnc(N3CCOCC3)nc2n1CCc1ccccc1"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/train_0-6.jsonl": "{"text":"Task: Please give me a DeepSMILES based on the description.\nDescription: A molecule that is inhibiting CYP P450 2D6.\nResult: CC=O)Nccccoc=O)sc5c9)))))))))S=O)=O)ccccs5"} {"text":"Task: Please create a SELFIES based on the description below.\nDescription: A molecule that is inhibiting CYP P450 2D6.\nResult: [O][=C][C][N][=C][Branch1][Branch2][C][=C][C][=C][S][Ring1][Branch1][C][=C][Branch1][N][S][C][=C][Ring1][Branch1][C][C][C][C][Ring1][=Branch1][N][Ring2][Ring1][Ring1]"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/valid_0-6.jsonl": "{"text":"Task: Please give me a molecule InChI based on the text description 
below.\nDescription: A molecule that is inhibiting CYP2D6.\nResult: InChI=1S\/C8H8N2OS\/c1-2-5-3-6-7(11)9-4-10-8(6)12-5\/h3-4H,2H2,1H3,(H,9,10,11)"} {"text":"Task: Please create a DeepSMILES based on the description below.\nDescription: A molecule that is inhibiting CYP2D6.\nResult: O=cc-ccccF)cc6))))))nccncNCCOCC6))))))nc6n%10CCcccccc6"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/test_0-9.jsonl": "{"text":"User: Can you generate the canonical SMILES of a molecule that is not inhibiting CYP P450 2D6?\nAssistant: Sure, here you go: CCOC(=O)C(NC(C)=O)C(OC(C)=O)c1cccc(N(CCO)CCO)c1"} {"text":"User: Can you generate the SELFIES of a molecule that is not inhibiting CYP2D6?\nAssistant: Yes, here you go: [C][C][=C][C][=C][Branch1][P][N][C][=Branch1][C][=S][N][C][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C]"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/test_0-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CCOC(=O)C(NC(C)=O)C(OC(C)=O)c1cccc(N(CCO)CCO)c1 shows no inhibition of CYP2D6."} {"text":"The molecule with the InChI InChI=1S\/C15H23N3OS\/c1-12-3-4-14(11-13(12)2)17-15(20)16-5-6-18-7-9-19-10-8-18\/h3-4,11H,5-10H2,1-2H3,(H2,16,17,20) exhibits no inhibition of CYP2D6."}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/valid_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the SMILES CCc1cc2c(=O)[nH]cnc2s1 is inhibiting CYP2D6?\nAssistant: No, this molecule is not inhibiting CYP2D6."} {"text":"User: Can you derive if the molecule with the SMILES O=c1c(-c2ccc(F)cc2)nc2cnc(N3CCOCC3)nc2n1CCc1ccccc1 is inhibiting CYP P450 2D6?\nAssistant: No, this molecule is not inhibiting CYP P450 2D6."}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/test_0-3.jsonl": "{"text":"The SELFIES 
[C][C][O][C][=Branch1][C][=O][C][Branch1][#Branch1][N][C][Branch1][C][C][=O][C][Branch1][#Branch1][O][C][Branch1][C][C][=O][C][=C][C][=C][C][Branch1][#Branch2][N][Branch1][Ring2][C][C][O][C][C][O][=C][Ring1][=N] is not inhibiting CYP2D6."} {"text":"The SELFIES [C][C][=C][C][=C][Branch1][P][N][C][=Branch1][C][=S][N][C][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C] is not inhibiting CYP P450 2D6."}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/valid_0-11.jsonl": "{"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be inhibiting CYP2D6.\nAssistant: Ok, this DeepSMILES is not inhibiting CYP2D6: CCcccc=O)[nH]cnc6s9"} {"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be inhibiting CYP2D6.\nAssistant: Ok, here you go, this SELFIES is not inhibiting CYP2D6: [O][=C][C][Branch1][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][=N][C][=C][N][=C][Branch1][=Branch2][N][C][C][O][C][C][Ring1][=Branch1][N][=C][Ring1][N][N][Ring2][Ring1][#Branch1][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1]"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/train_0-0.jsonl": "{"text":"The molecule with the InChI representation of InChI=1S\/C13H9NO5S3\/c1-8(15)14(22(17,18)12-3-2-6-20-12)9-4-5-10-11(7-9)21-13(16)19-10\/h2-7H,1H3 shows no inhibition of CYP P450 2D6."} {"text":"The molecule with the SMILES O=C1CN=C(c2cccs2)c2c(sc3c2CCCC3)N1 displays inhibition of CYP2D6."}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/test_0-6.jsonl": "{"text":"Task: Please generate a molecule InChI based on the description.\nDescription: A molecule that is inhibiting CYP P450 2D6.\nResult: 
InChI=1S\/C19H28N2O7\/c1-4-27-19(26)17(20-13(2)24)18(28-14(3)25)15-6-5-7-16(12-15)21(8-10-22)9-11-23\/h5-7,12,17-18,22-23H,4,8-11H2,1-3H3,(H,20,24)"} {"text":"Task: Please generate a SELFIES based on the description below.\nDescription: A molecule that is inhibiting CYP P450 2D6.\nResult: [C][C][=C][C][=C][Branch1][P][N][C][=Branch1][C][=S][N][C][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C]"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/train_0-10.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that is not inhibiting CYP2D6?\nAssistant: This is a molecule that is not inhibiting CYP2D6: CC(=O)N(c1ccc2oc(=O)sc2c1)S(=O)(=O)c1cccs1"} {"text":"User: I'm searching for the SMILES of a molecule that is inhibiting CYP2D6?\nAssistant: This is a molecule that is inhibiting CYP2D6: O=C1CN=C(c2cccs2)c2c(sc3c2CCCC3)N1"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/train_0-3.jsonl": "{"text":"The molecule SMILES CC(=O)N(c1ccc2oc(=O)sc2c1)S(=O)(=O)c1cccs1 is not inhibiting CYP2D6."} {"text":"The InChI InChI=1S\/C15H14N2OS2\/c18-12-8-16-14(11-6-3-7-19-11)13-9-4-1-2-5-10(9)20-15(13)17-12\/h3,6-7H,1-2,4-5,8H2,(H,17,18) is inhibiting CYP P450 2D6."}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/train_0-12.jsonl": "{"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be inhibiting CYP2D6.\nAssistant: Understood, this SELFIES is not inhibiting CYP2D6: [C][C][=Branch1][C][=O][N][Branch1][P][C][=C][C][=C][O][C][=Branch1][C][=O][S][C][Ring1][=Branch1][=C][Ring1][#Branch2][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][S][Ring1][Branch1]"} {"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should be inhibiting CYP P450 2D6.\nAssistant: Got it, this canonical SMILES is inhibiting CYP P450 2D6: O=C1CN=C(c2cccs2)c2c(sc3c2CCCC3)N1"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [C][C][O][C][=Branch1][C][=O][C][Branch1][#Branch1][N][C][Branch1][C][C][=O][C][Branch1][#Branch1][O][C][Branch1][C][C][=O][C][=C][C][=C][C][Branch1][#Branch2][N][Branch1][Ring2][C][C][O][C][C][O][=C][Ring1][=N] inhibiting CYP2D6?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any additional words.\nOptions:\n1. False\n2. True\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES CccccNC=S)NCCNCCOCC6)))))))))))cc6C inhibiting CYP2D6?\nConstraint: Even if you are uncertain, you must pick either a or b without using any other words.\nOptions:\na. True\nb. 
False\nAnswer: b"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/valid_0-2.jsonl": "{"text":"The canonical SMILES CCc1cc2c(=O)[nH]cnc2s1 represents a molecule that exhibits no inhibition of CYP P450 2D6."} {"text":"The DeepSMILES O=cc-ccccF)cc6))))))nccncNCCOCC6))))))nc6n%10CCcccccc6 represents a molecule that exhibits no inhibition of CYP2D6."}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting CYP2D6?\nConstraint: You must select none, one or more options from a, b, c, d, or e without using any other words.\nOptions:\na) O=C(Nc1ccsc1C(=O)NC1CCCCC1)c1ccc(F)cc1\nb) CCC1(C)Cc2c(sc3nnn(CC(=O)Nc4cccc5ccccc45)c(=O)c23)CO1\nc) O=[N+]([O-])c1cccc2c[n-]nc12\nd) Cc1ccccc1OCC(=O)Nc1sc(C(=O)Nc2cccc(C)c2C)c(C)c1C#N\ne) CC(=O)N(c1ccc2oc(=O)sc2c1)S(=O)(=O)c1cccs1\nAnswer: a, c, d, e"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are inhibiting CYP2D6?\nConstraint: You must select none, one or more options from 1, 2, 3, 4, or 5 without using any other words.\nOptions:\n[1] InChI=1S\/C15H14N2OS2\/c18-12-8-16-14(11-6-3-7-19-11)13-9-4-1-2-5-10(9)20-15(13)17-12\/h3,6-7H,1-2,4-5,8H2,(H,17,18)\n[2] InChI=1S\/C29H38N4O5\/c1-5-22(20-12-7-6-8-13-20)32-28(35)31-16-14-21-23(30-37-17-15-19(4)11-9-10-18(2)3)26-27(38-26)25(34)24(21)33(31)29(32)36\/h6-8,10,12-13,15,21-22,24-27,34H,5,9,11,14,16-17H2,1-4H3\/b19-15+,30-23+\/t21-,22-,24+,25-,26+,27+\/m0\/s1\n[3] InChI=1S\/C9H6Cl2N2OS\/c1-15-9-13-12-8(14-9)6-3-2-5(10)4-7(6)11\/h2-4H,1H3\n[4] InChI=1S\/C25H23N3O5S\/c1-3-16-6-9-18(10-7-16)26-22(29)15-34-25-27-21-13-17(24(31)32-2)8-11-20(21)23(30)28(25)14-19-5-4-12-33-19\/h4-13H,3,14-15H2,1-2H3,(H,26,29)\n[5] InChI=1S\/C24H25N2O2P\/c1-25-24(27)22-17-21(22)23(18-11-5-2-6-12-18)26-29(28,19-13-7-3-8-14-19)20-15-9-4-10-16-20\/h2-16,21-23H,17H2,1H3,(H,25,27)(H,26,28)\/t21-,22-,23+\/m1\/s1\nAnswer: 1"}", 
"/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/valid_0-1.jsonl": "{"text":"Based on the InChI InChI=1S\/C8H8N2OS\/c1-2-5-3-6-7(11)9-4-10-8(6)12-5\/h3-4H,2H2,1H3,(H,9,10,11), the molecule shows no inhibition of CYP2D6."} {"text":"Based on the SELFIES [O][=C][C][Branch1][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][=N][C][=C][N][=C][Branch1][=Branch2][N][C][C][O][C][C][Ring1][=Branch1][N][=C][Ring1][N][N][Ring2][Ring1][#Branch1][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1], the molecule displays no inhibition of CYP2D6."}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES CCc1cc2c(=O)[nH]cnc2s1 inhibiting CYP2D6?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any other words.\nOptions:\n1: False\n2: True\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [O][=C][C][Branch1][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][=N][C][=C][N][=C][Branch1][=Branch2][N][C][C][O][C][C][Ring1][=Branch1][N][=C][Ring1][N][N][Ring2][Ring1][#Branch1][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1] inhibiting CYP P450 2D6?\nConstraint: Even if you are uncertain, you must pick either a or b without using any additional words.\nOptions:\na. True\nb. 
False\nAnswer: b"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2D6.\nMolecule InChI: InChI=1S\/C8H8N2OS\/c1-2-5-3-6-7(11)9-4-10-8(6)12-5\/h3-4H,2H2,1H3,(H,9,10,11)\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting CYP P450 2D6."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2D6.\nMolecule DeepSMILES: O=cc-ccccF)cc6))))))nccncNCCOCC6))))))nc6n%10CCcccccc6\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting CYP P450 2D6."}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2D6.\nSELFIES: [C][C][C][=C][C][C][=Branch1][C][=O][NH1][C][=N][C][=Ring1][#Branch1][S][Ring1][#Branch2]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2D6.\nMolecule canonical SMILES: O=c1c(-c2ccc(F)cc2)nc2cnc(N3CCOCC3)nc2n1CCc1ccccc1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP2D6.\nMolecule DeepSMILES: CC=O)Nccccoc=O)sc5c9)))))))))S=O)=O)ccccs5\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting CYP2D6."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is 
inhibiting CYP P450 2D6.\ncanonical SMILES: O=C1CN=C(c2cccs2)c2c(sc3c2CCCC3)N1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is inhibiting CYP P450 2D6."}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/valid_0-12.jsonl": "{"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be inhibiting CYP2D6.\nAssistant: Ok, this DeepSMILES is not inhibiting CYP2D6: CCcccc=O)[nH]cnc6s9"} {"text":"User: I want to create a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be inhibiting CYP P450 2D6.\nAssistant: Understood, this canonical SMILES is not inhibiting CYP P450 2D6: O=c1c(-c2ccc(F)cc2)nc2cnc(N3CCOCC3)nc2n1CCc1ccccc1"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/train_0-2.jsonl": "{"text":"The SMILES CC(=O)N(c1ccc2oc(=O)sc2c1)S(=O)(=O)c1cccs1 represents a molecule that shows no inhibition of CYP2D6."} {"text":"The SELFIES [O][=C][C][N][=C][Branch1][Branch2][C][=C][C][=C][S][Ring1][Branch1][C][=C][Branch1][N][S][C][=C][Ring1][Branch1][C][C][C][C][Ring1][=Branch1][N][Ring2][Ring1][Ring1] represents a molecule that displays inhibition of CYP2D6."}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/test_0-11.jsonl": "{"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be inhibiting CYP P450 2D6.\nAssistant: Ok, here you go, this SMILES is not inhibiting CYP P450 2D6: CCOC(=O)C(NC(C)=O)C(OC(C)=O)c1cccc(N(CCO)CCO)c1"} {"text":"User: I want to generate a SMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should not be inhibiting CYP P450 2D6.\nAssistant: Got it, here you go, this SMILES is not inhibiting CYP P450 2D6: Cc1ccc(NC(=S)NCCN2CCOCC2)cc1C"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/train_0-7.jsonl": "{"text":"User: Can you derive if the molecule with the SELFIES [C][C][=Branch1][C][=O][N][Branch1][P][C][=C][C][=C][O][C][=Branch1][C][=O][S][C][Ring1][=Branch1][=C][Ring1][#Branch2][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][S][Ring1][Branch1] is inhibiting CYP P450 2D6?\nAssistant: No, this molecule is not inhibiting CYP P450 2D6."} {"text":"User: Can you derive if the molecule with the SELFIES [O][=C][C][N][=C][Branch1][Branch2][C][=C][C][=C][S][Ring1][Branch1][C][=C][Branch1][N][S][C][=C][Ring1][Branch1][C][C][C][C][Ring1][=Branch1][N][Ring2][Ring1][Ring1] is inhibiting CYP P450 2D6?\nAssistant: Yes, this molecule is inhibiting CYP P450 2D6."}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/train_0-11.jsonl": "{"text":"User: I want to come up with a SELFIES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be inhibiting CYP2D6.\nAssistant: Ok, this SELFIES is not inhibiting CYP2D6: [C][C][=Branch1][C][=O][N][Branch1][P][C][=C][C][=C][O][C][=Branch1][C][=O][S][C][Ring1][=Branch1][=C][Ring1][#Branch2][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][S][Ring1][Branch1]"} {"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should be inhibiting CYP P450 2D6.\nAssistant: Ok, here you go, this canonical SMILES is inhibiting CYP P450 2D6: O=C1CN=C(c2cccs2)c2c(sc3c2CCCC3)N1"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/train_0-1.jsonl": "{"text":"Based on the SELFIES representation [C][C][=Branch1][C][=O][N][Branch1][P][C][=C][C][=C][O][C][=Branch1][C][=O][S][C][Ring1][=Branch1][=C][Ring1][#Branch2][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][S][Ring1][Branch1], the molecule shows no inhibition of CYP2D6."} {"text":"Based on the SMILES O=C1CN=C(c2cccs2)c2c(sc3c2CCCC3)N1, the molecule displays inhibition of CYP P450 2D6."}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES CC(=O)N(c1ccc2oc(=O)sc2c1)S(=O)(=O)c1cccs1 inhibiting CYP P450 2D6?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any additional words.\nOptions:\n(1) False\n(2) True\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES O=C1CN=C(c2cccs2)c2c(sc3c2CCCC3)N1 inhibiting CYP2D6?\nConstraint: Even if you are uncertain, you must pick either A or B without using any additional words.\nOptions:\nA: False\nB: True\nAnswer: B"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2D6.\nMolecule InChI: InChI=1S\/C13H9NO5S3\/c1-8(15)14(22(17,18)12-3-2-6-20-12)9-4-5-10-11(7-9)21-13(16)19-10\/h2-7H,1H3\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP2D6.\ncanonical SMILES: O=C1CN=C(c2cccs2)c2c(sc3c2CCCC3)N1\nConstraint: 
Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/test_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the SMILES CCOC(=O)C(NC(C)=O)C(OC(C)=O)c1cccc(N(CCO)CCO)c1 is inhibiting CYP P450 2D6?\nAssistant: No, this molecule is not inhibiting CYP P450 2D6."} {"text":"User: Can you derive if the molecule with the InChI InChI=1S\/C15H23N3OS\/c1-12-3-4-14(11-13(12)2)17-15(20)16-5-6-18-7-9-19-10-8-18\/h3-4,11H,5-10H2,1-2H3,(H2,16,17,20) is inhibiting CYP P450 2D6?\nAssistant: No, this molecule is not inhibiting CYP P450 2D6."}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/train_0-9.jsonl": "{"text":"User: Can you create the SELFIES of a molecule that is not inhibiting CYP P450 2D6?\nAssistant: Yes, here you go: [C][C][=Branch1][C][=O][N][Branch1][P][C][=C][C][=C][O][C][=Branch1][C][=O][S][C][Ring1][=Branch1][=C][Ring1][#Branch2][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][S][Ring1][Branch1]"} {"text":"User: Can you create the SELFIES of a molecule that is inhibiting CYP2D6?\nAssistant: Yes, I'm happy to help, here you go: [O][=C][C][N][=C][Branch1][Branch2][C][=C][C][=C][S][Ring1][Branch1][C][=C][Branch1][N][S][C][=C][Ring1][Branch1][C][C][C][C][Ring1][=Branch1][N][Ring2][Ring1][Ring1]"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/valid_0-3.jsonl": "{"text":"The InChI InChI=1S\/C8H8N2OS\/c1-2-5-3-6-7(11)9-4-10-8(6)12-5\/h3-4H,2H2,1H3,(H,9,10,11) is not inhibiting CYP P450 2D6."} {"text":"The molecule DeepSMILES O=cc-ccccF)cc6))))))nccncNCCOCC6))))))nc6n%10CCcccccc6 is not inhibiting CYP P450 2D6."}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/test_0-8.jsonl": "{"text":"User: Is the molecule with the DeepSMILES CCOC=O)CNCC)=O)))COCC)=O)))cccccNCCO)))CCO))))c6 inhibiting CYP2D6?\nAssistant: No, it is not inhibiting CYP2D6."} {"text":"User: Is the molecule with the DeepSMILES 
CccccNC=S)NCCNCCOCC6)))))))))))cc6C inhibiting CYP P450 2D6?\nAssistant: No, it is not inhibiting CYP P450 2D6."}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting CYP2D6?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any other words.\nOptions:\n1.) Cn1nnc(NC(=O)c2ccco2)n1\n2.) C[C@@H](NC(=O)\/C(C#N)=C\/c1ccc(O)c(O)c1)c1ccccc1\n3.) CCOC(=O)C(NC(C)=O)C(OC(C)=O)c1cccc(N(CCO)CCO)c1\n4.) CC(C)[N+]1(C)[C@@H]2CC[C@H]1CC(C(=O)O[C@@H](CO)c1ccccc1)C2\nAnswer: 1, 2, 3, 4"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting CYP2D6?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any other words.\nOptions:\n1: [C][N][C][Branch2][Ring2][Ring1][\/C][Branch1][Ring1][C][#N][=C][Branch1][C][\\O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][C][C][C][Ring1][=Branch1][C][=C][Ring1][#C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring2][Ring1][=N]\n2: [O][=S][=Branch1][C][=O][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][N][C][C][N][Branch2][Ring1][Ring1][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][C][Ring1][S]\n3: [C][C][=C][N][=C][Branch2][Ring2][Branch2][C][N][C][=N][C][Branch2][Ring1][Ring2][C][=C][C][=C][Branch1][#Branch2][C][=Branch1][C][=O][N][Branch1][C][C][C][C][=C][Ring1][O][=N][C][=C][C][=C][C][=C][Ring2][Ring1][Branch1][Ring1][=Branch1][C][=N][Ring2][Ring1][=N]\n4: [C][C][=C][C][=C][Branch1][P][N][C][=Branch1][C][=S][N][C][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C]\nAnswer: 1, 2, 3, 4"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting CYP2D6?\nConstraint: 
You must select none, one or more options from a, b, or c without using any additional words.\nOptions:\na. [C][C][C][=C][C][C][=Branch1][C][=O][NH1][C][=N][C][=Ring1][#Branch1][S][Ring1][#Branch2]\nb. [Cl].[N][C][Branch1][C][N][=N][C][=Branch1][C][=O][C][=N][C][Branch1][C][Cl][=C][Branch1][C][N][N][=C][Ring1][Branch2][N].[O].[O]\nc. [Cl][C][=C][C][=C][Branch2][Ring1][#C][S][C][=C][C][Branch1][=Branch2][N][C][C][O][C][C][Ring1][=Branch1][=N][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][=N][Ring2][Ring1][C][C][=C][Ring2][Ring1][=Branch2]\nAnswer: a, b, c"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting CYP P450 2D6?\nConstraint: You must select none, one or more options from a or b without using any other words.\nOptions:\na.) O=c1c(-c2ccc(F)cc2)nc2cnc(N3CCOCC3)nc2n1CCc1ccccc1\nb.) COC(=O)COc1ccc(\/C=C2\/SC(=O)N(CC(=O)N3CCCC3)C2=O)cc1Br\nAnswer: a, b"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP2D6.\nMolecule InChI: InChI=1S\/C19H28N2O7\/c1-4-27-19(26)17(20-13(2)24)18(28-14(3)25)15-6-5-7-16(12-15)21(8-10-22)9-11-23\/h5-7,12,17-18,22-23H,4,8-11H2,1-3H3,(H,20,24)\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2D6.\nMolecule DeepSMILES: CccccNC=S)NCCNCCOCC6)))))))))))cc6C\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/cyp_p450_2d6_inhibition_veith_et_al/test_0-12.jsonl": "{"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should not be inhibiting CYP2D6.\nAssistant: Ok, this DeepSMILES is not inhibiting CYP2D6: CCOC=O)CNCC)=O)))COCC)=O)))cccccNCCO)))CCO))))c6"} {"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should not be inhibiting CYP P450 2D6.\nAssistant: Understood, this DeepSMILES is not inhibiting CYP P450 2D6: CccccNC=S)NCCNCCOCC6)))))))))))cc6C"}", "/scratch/micpie/export/ord_steps_yield/test_0-1.jsonl": "{"text":"User: I would like to run a reaction with the reaction action sequence MAKESOLUTION with methyl 2-morpholino-4-oxo-8-(pyrrolidin-2-yl)-4H-chromene-6-carboxylate (0.8g, 2.23 mmol) and (9,9-dimethyl-9H-xanthene-4,5-diyl)bis(diphenylphosphine) (0.110 g, 0.19 mmol) and 1-bromo-3,5-difluorobenzene (0.321 ml, 2.79 mmol) and cesium carbonate (1.091 g, 3.35 mmol); ADD SLN; ADD diacetoxypalladium (0.020 g, 0.09 mmol); MAKESOLUTION with methyl 2-morpholino-4-oxo-8-(pyrrolidin-2-yl)-4H-chromene-6-\n DEGAS with argon for unknown; STIR for 20 hours at 100 °C.\n SETTEMPERATURE room temperature; FILTER keep filtrate; CONCENTRATE.\n PURIFY; PURIFY.. What is the yield I should get?\nAssistant: The yield is 66\\%."} {"text":"User: I want to run a reaction with the reaction action sequence ADD 2-(2,5-dichloropyridin-4-ylamino)-N-methoxybenzamide (200 mg, 0.64 mmol); ADD 1-isopropyl-3-methyl-1H-pyrazol-5-amine (178 mg, 1.28 mmol); ADD cesium carbonate (626 mg, 1.92 mmol); ADD 1,4-dioxane (3 mL).\n ADD Nitrogen; DEGAS with unknown for 10 minutes.\n ADD 2,2'-bis(diphenylphosphino)-1,1'-binaphthyl (120 mg, 0.19 mmol); ADD diacetoxypalladium (21.58 mg, 0.10 mmol).\n REFLUX for overnight.\n SETTEMPERATURE room temperature; FILTER; WASH with CH2Cl2 \/ MeOH (10 mL); CONCENTRATE.\n PURIFY.. 
What is the reaction yield I can get?\nAssistant: The predicted reaction yield is 37\\%."}", "/scratch/micpie/export/ord_steps_yield/valid_0-0.jsonl": "{"text":"The reaction yield of a reaction with the reaction action sequence below is 53\\%.\nreaction action sequence: ADD 2,6-dichloropyridine (228 mg, 1.54 mmol); ADD toluene (7.5 ml); ADD diacetoxypalladium (6 mg, 0.03 mmol); ADD 2,2'-bis(diphenylphosphino)-1,1'-binaphthalene (19 mg, 0.03 mmol); ADD aniline (0.170 ml, 1.87 mmol); ADD potassium carbonate (583 mg, 4.22 mmol).\n DEGAS with SLN for 3-4 minutes; MICROWAVE for ~3 minutes at 160 °C; STIR for 30 minutes at same temperature.\n SETTEMPERATURE ambient temperature; ADD ethyl acetate (40 mL); FILTER keep filtrate.\n CONCENTRATE; PURIFY."} {"text":"The yield of a reaction with the reaction action steps below is 19\\%.\nreaction action steps: ADD 6-bromo-3-(trifluoromethyl)-[1,2,4]triazolo[4,3-a]pyridine (65 mg, 0.24 mmol); ADD 4-(4-(2-(1-methyl-1H-pyrazol-5-yl)ethoxy)phenyl)piperidine (112 mg, 0.24 mmol); ADD rac-2,2'-Bis(diphenylphosphino)-1,1'-binaphthyl (11.41 mg, 0.02 mmol); ADD Sodium tert-butoxide (0.060 mL, 0.49 mmol); ADD xylenes (5 mL); DEGAS with nitrogen for unknown.\n ADD Bis(dibenzylideneacetone)palladium (7.02 mg, 0.01 mmol).\n STIR for 30 minutes at 110 °C; SETTEMPERATURE RT.\n ADD EtOAc (25 mL); WASH with water (25 mL).\n COLLECTLAYER aqueous; TRITURATE with EtOAc (25 mL).\n PURIFY.\n INVALIDACTION.\n PURIFY."}", "/scratch/micpie/export/ord_steps_yield/test_0-2.jsonl": "{"text":"Task: Predict the reaction yield of a reaction based on the reaction action sequence.\nDescription: MAKESOLUTION with methyl 2-morpholino-4-oxo-8-(pyrrolidin-2-yl)-4H-chromene-6-carboxylate (0.8g, 2.23 mmol) and (9,9-dimethyl-9H-xanthene-4,5-diyl)bis(diphenylphosphine) (0.110 g, 0.19 mmol) and 1-bromo-3,5-difluorobenzene (0.321 ml, 2.79 mmol) and cesium carbonate (1.091 g, 3.35 mmol); ADD SLN; ADD diacetoxypalladium (0.020 g, 0.09 mmol); MAKESOLUTION with methyl 
2-morpholino-4-oxo-8-(pyrrolidin-2-yl)-4H-chromene-6-\n DEGAS with argon for unknown; STIR for 20 hours at 100 °C.\n SETTEMPERATURE room temperature; FILTER keep filtrate; CONCENTRATE.\n PURIFY; PURIFY.\nAnswer: 66\\%"} {"text":"Task: Estimate the reaction yield of a reaction based on the reaction action steps.\nDescription: ADD 2-(2,5-dichloropyridin-4-ylamino)-N-methoxybenzamide (200 mg, 0.64 mmol); ADD 1-isopropyl-3-methyl-1H-pyrazol-5-amine (178 mg, 1.28 mmol); ADD cesium carbonate (626 mg, 1.92 mmol); ADD 1,4-dioxane (3 mL).\n ADD Nitrogen; DEGAS with unknown for 10 minutes.\n ADD 2,2'-bis(diphenylphosphino)-1,1'-binaphthyl (120 mg, 0.19 mmol); ADD diacetoxypalladium (21.58 mg, 0.10 mmol).\n REFLUX for overnight.\n SETTEMPERATURE room temperature; FILTER; WASH with CH2Cl2 \/ MeOH (10 mL); CONCENTRATE.\n PURIFY.\nAnswer: 37\\%"}", "/scratch/micpie/export/ord_steps_yield/test_0-0.jsonl": "{"text":"The yield of a reaction with the reaction action steps below is 66\\%.\nreaction action steps: MAKESOLUTION with methyl 2-morpholino-4-oxo-8-(pyrrolidin-2-yl)-4H-chromene-6-carboxylate (0.8g, 2.23 mmol) and (9,9-dimethyl-9H-xanthene-4,5-diyl)bis(diphenylphosphine) (0.110 g, 0.19 mmol) and 1-bromo-3,5-difluorobenzene (0.321 ml, 2.79 mmol) and cesium carbonate (1.091 g, 3.35 mmol); ADD SLN; ADD diacetoxypalladium (0.020 g, 0.09 mmol); MAKESOLUTION with methyl 2-morpholino-4-oxo-8-(pyrrolidin-2-yl)-4H-chromene-6-\n DEGAS with argon for unknown; STIR for 20 hours at 100 °C.\n SETTEMPERATURE room temperature; FILTER keep filtrate; CONCENTRATE.\n PURIFY; PURIFY."} {"text":"The yield of a reaction with the reaction action sequence below is 37\\%.\nreaction action sequence: ADD 2-(2,5-dichloropyridin-4-ylamino)-N-methoxybenzamide (200 mg, 0.64 mmol); ADD 1-isopropyl-3-methyl-1H-pyrazol-5-amine (178 mg, 1.28 mmol); ADD cesium carbonate (626 mg, 1.92 mmol); ADD 1,4-dioxane (3 mL).\n ADD Nitrogen; DEGAS with unknown for 10 minutes.\n ADD 
2,2'-bis(diphenylphosphino)-1,1'-binaphthyl (120 mg, 0.19 mmol); ADD diacetoxypalladium (21.58 mg, 0.10 mmol).\n REFLUX for overnight.\n SETTEMPERATURE room temperature; FILTER; WASH with CH2Cl2 \/ MeOH (10 mL); CONCENTRATE.\n PURIFY."}", "/scratch/micpie/export/ord_steps_yield/train_0-0.jsonl": "{"text":"The reaction yield of a reaction with the reaction action sequence below is 22\\%.\nreaction action sequence: ADD 2,6-dichloropyrazine (287 mg, 1.93 mmol); ADD 4-(methylsulfonyl)aniline (330 mg, 1.93 mmol); ADD diacetoxypalladium (43.3 mg, 0.19 mmol); ADD 2,2'-bis(diphenylphosphino)-1,1'-binaphthyl (120 mg, 0.19 mmol); ADD cesium carbonate (690 mg, 2.12 mmol); ADD 1,4-dioxane (2 mL); STIR for 16 hours at 90 °C under nitrogen.\n CONCENTRATE; PURIFY."} {"text":"The yield of a reaction with the reaction action steps below is 62\\%.\nreaction action steps: ADD rac-2,2'-Bis(diphenylphosphino)-1,1'-binaphthyl (6.32 mg, 10.15 µmol); ADD Tris(dibenzylideneacetone)dipalladium(0) (4.65 mg, 5.08 µmol); ADD Sodium tert- butoxide (34.1 mg, 0.36 mmol).\n ADD toluene (3ml); ADD 7-bromobenzofuran (50mg, 0.25 mmol); ADD 1-phenethylpiperazine (48.3 mg, 0.25 mmol).\n REFLUX for over night."}", "/scratch/micpie/export/ord_steps_yield/valid_0-2.jsonl": "{"text":"Task: Predict the yield of a reaction based on the reaction action sequence.\nDescription: ADD 2,6-dichloropyridine (228 mg, 1.54 mmol); ADD toluene (7.5 ml); ADD diacetoxypalladium (6 mg, 0.03 mmol); ADD 2,2'-bis(diphenylphosphino)-1,1'-binaphthalene (19 mg, 0.03 mmol); ADD aniline (0.170 ml, 1.87 mmol); ADD potassium carbonate (583 mg, 4.22 mmol).\n DEGAS with SLN for 3-4 minutes; MICROWAVE for ~3 minutes at 160 °C; STIR for 30 minutes at same temperature.\n SETTEMPERATURE ambient temperature; ADD ethyl acetate (40 mL); FILTER keep filtrate.\n CONCENTRATE; PURIFY.\nAnswer: 53\\%"} {"text":"Task: Estimate the reaction yield of a reaction based on the reaction action sequence.\nDescription: ADD 
6-bromo-3-(trifluoromethyl)-[1,2,4]triazolo[4,3-a]pyridine (65 mg, 0.24 mmol); ADD 4-(4-(2-(1-methyl-1H-pyrazol-5-yl)ethoxy)phenyl)piperidine (112 mg, 0.24 mmol); ADD rac-2,2'-Bis(diphenylphosphino)-1,1'-binaphthyl (11.41 mg, 0.02 mmol); ADD Sodium tert-butoxide (0.060 mL, 0.49 mmol); ADD xylenes (5 mL); DEGAS with nitrogen for unknown.\n ADD Bis(dibenzylideneacetone)palladium (7.02 mg, 0.01 mmol).\n STIR for 30 minutes at 110 °C; SETTEMPERATURE RT.\n ADD EtOAc (25 mL); WASH with water (25 mL).\n COLLECTLAYER aqueous; TRITURATE with EtOAc (25 mL).\n PURIFY.\n INVALIDACTION.\n PURIFY.\nAnswer: 19\\%"}", "/scratch/micpie/export/ord_steps_yield/valid_0-1.jsonl": "{"text":"User: I need to run a reaction with the reaction action steps ADD 2,6-dichloropyridine (228 mg, 1.54 mmol); ADD toluene (7.5 ml); ADD diacetoxypalladium (6 mg, 0.03 mmol); ADD 2,2'-bis(diphenylphosphino)-1,1'-binaphthalene (19 mg, 0.03 mmol); ADD aniline (0.170 ml, 1.87 mmol); ADD potassium carbonate (583 mg, 4.22 mmol).\n DEGAS with SLN for 3-4 minutes; MICROWAVE for ~3 minutes at 160 °C; STIR for 30 minutes at same temperature.\n SETTEMPERATURE ambient temperature; ADD ethyl acetate (40 mL); FILTER keep filtrate.\n CONCENTRATE; PURIFY.. What is the yield I should get?\nAssistant: The predicted yield is 53\\%."} {"text":"User: I want to run a reaction with the reaction action sequence ADD 6-bromo-3-(trifluoromethyl)-[1,2,4]triazolo[4,3-a]pyridine (65 mg, 0.24 mmol); ADD 4-(4-(2-(1-methyl-1H-pyrazol-5-yl)ethoxy)phenyl)piperidine (112 mg, 0.24 mmol); ADD rac-2,2'-Bis(diphenylphosphino)-1,1'-binaphthyl (11.41 mg, 0.02 mmol); ADD Sodium tert-butoxide (0.060 mL, 0.49 mmol); ADD xylenes (5 mL); DEGAS with nitrogen for unknown.\n ADD Bis(dibenzylideneacetone)palladium (7.02 mg, 0.01 mmol).\n STIR for 30 minutes at 110 °C; SETTEMPERATURE RT.\n ADD EtOAc (25 mL); WASH with water (25 mL).\n COLLECTLAYER aqueous; TRITURATE with EtOAc (25 mL).\n PURIFY.\n INVALIDACTION.\n PURIFY.. 
What is the reaction yield I should expect?\nAssistant: The reaction yield is 19\\%."}", "/scratch/micpie/export/ord_steps_yield/train_0-2.jsonl": "{"text":"Task: Predict the yield of a reaction based on the reaction action sequence.\nDescription: ADD 2,6-dichloropyrazine (287 mg, 1.93 mmol); ADD 4-(methylsulfonyl)aniline (330 mg, 1.93 mmol); ADD diacetoxypalladium (43.3 mg, 0.19 mmol); ADD 2,2'-bis(diphenylphosphino)-1,1'-binaphthyl (120 mg, 0.19 mmol); ADD cesium carbonate (690 mg, 2.12 mmol); ADD 1,4-dioxane (2 mL); STIR for 16 hours at 90 °C under nitrogen.\n CONCENTRATE; PURIFY.\nAnswer: 22\\%"} {"text":"Task: Predict the reaction yield of a reaction based on the reaction action steps.\nDescription: ADD rac-2,2'-Bis(diphenylphosphino)-1,1'-binaphthyl (6.32 mg, 10.15 µmol); ADD Tris(dibenzylideneacetone)dipalladium(0) (4.65 mg, 5.08 µmol); ADD Sodium tert- butoxide (34.1 mg, 0.36 mmol).\n ADD toluene (3ml); ADD 7-bromobenzofuran (50mg, 0.25 mmol); ADD 1-phenethylpiperazine (48.3 mg, 0.25 mmol).\n REFLUX for over night.\nAnswer: 62\\%"}", "/scratch/micpie/export/ord_steps_yield/train_0-1.jsonl": "{"text":"User: I want to run a reaction with the reaction action sequence ADD 2,6-dichloropyrazine (287 mg, 1.93 mmol); ADD 4-(methylsulfonyl)aniline (330 mg, 1.93 mmol); ADD diacetoxypalladium (43.3 mg, 0.19 mmol); ADD 2,2'-bis(diphenylphosphino)-1,1'-binaphthyl (120 mg, 0.19 mmol); ADD cesium carbonate (690 mg, 2.12 mmol); ADD 1,4-dioxane (2 mL); STIR for 16 hours at 90 °C under nitrogen.\n CONCENTRATE; PURIFY.. 
What is the reaction yield I can get?\nAssistant: The reaction yield is 22\\%."} {"text":"User: I would like to run a reaction with the reaction action steps ADD rac-2,2'-Bis(diphenylphosphino)-1,1'-binaphthyl (6.32 mg, 10.15 µmol); ADD Tris(dibenzylideneacetone)dipalladium(0) (4.65 mg, 5.08 µmol); ADD Sodium tert- butoxide (34.1 mg, 0.36 mmol).\n ADD toluene (3ml); ADD 7-bromobenzofuran (50mg, 0.25 mmol); ADD 1-phenethylpiperazine (48.3 mg, 0.25 mmol).\n REFLUX for over night.. What is the yield I can get?\nAssistant: The estimated yield is 62\\%."}", "/scratch/micpie/export/ord_procedure_steps/test_0-1.jsonl": "{"text":"User: Could you tell me the reaction action sequence for the reaction procedure Diethyl azodicarboxylate (0.235 ml, 1.49 mmol) was added at 0° C. to a stirred solution of triphenylphosphine (0.392 g, 1.49 mmol) and 3-(6H-dibenzo[b,e]oxepin-11-ylidene1-propanol (0.252 g, 1.0 mmol) in dry THF (5 ml) and the mixture stirred for 5 min. A solution of ethyl 2-ethoxy-3-(4-hydroxyphenyl)-propionate (0.356 g, 1.49 mmol) in dry THF (5 ml) was then added, the mixture allowed to warm to room temperature, and stirring continued for 18 h. The resulting mixture was treated with water (50 ml), and the products extracted into dichloromethane (4×50 ml). The extracts were combined, washed with brine, dried (Na2SO4) and evaporated to an orange gum. 
This was then purified by column chromatography on silica gel (20% Et2O in petroleum eluent) to give the title compound as an inseparable 4:1 mixture of E and Z double-bond isomers, as a pale yellow gum; 0.252 g (53%).?\nAssistant: I propose the reaction action sequence MAKESOLUTION with triphenylphosphine (0.392 g, 1.49 mmol) and 3-(6H-dibenzo[b,e]oxepin-11-ylidene1-propanol (0.252 g, 1.0 mmol) and dry THF (5 ml); ADD SLN; ADD Diethyl azodicarboxylate (0.235 ml, 1.49 mmol) at 0° C; STIR for 5 min.\n MAKESOLUTION with ethyl 2-ethoxy-3-(4-hydroxyphenyl)-propionate (0.356 g, 1.49 mmol) and dry THF (5 ml); ADD SLN; SETTEMPERATURE room temperature; STIR for 18 h.\n ADD water (50 ml); EXTRACT with dichloromethane (50 ml) 4 x.\n COLLECTLAYER organic; WASH with brine; DRYSOLUTION over Na2SO4; CONCENTRATE.\n PURIFY; YIELD title compound."} {"text":"User: Can you tell me the reaction action sequence for the reaction procedure A mixture of (7-chlorobenzo[d][1,3]dioxol-5-yl)methanol (5.5 g, 30 mmol) and SOCl2 (5.0 mL, 67 mmol) in dichloromethane (20 mL) was stirred at room temperature for 1 h and was then poured into ice water. The organic layer was separated and the aqueous phase was extracted with dichloromethane (50 mL×3). 
The combined extracts were washed with water and aqueous NaHCO3 solution, dried over Na2SO4 and evaporated under reduced pressure to afford 4-chloro-6-(chloromethyl)benzo[d][1,3]dioxole, which was directly used in the next step.?\nAssistant: I suggest the reaction action sequence MAKESOLUTION with (7-chlorobenzo[d][1,3]dioxol-5-yl)methanol (5.5 g, 30 mmol) and SOCl2 (5.0 mL, 67 mmol) and dichloromethane (20 mL); ADD SLN; STIR for 1 h at room temperature; ADD ice water.\n PHASESEPARATION; COLLECTLAYER organic; COLLECTLAYER aqueous; EXTRACT with dichloromethane (50 mL) 3 x.\n COLLECTLAYER organic; WASH with water; WASH with aqueous NaHCO3 solution; DRYSOLUTION over Na2SO4; CONCENTRATE; YIELD 4-chloro-6-(chloromethyl)benzo[d][1,3]dioxole."}", "/scratch/micpie/export/ord_procedure_steps/valid_0-0.jsonl": "{"text":"User: Can you show me the reaction procedure for the reaction action sequence ADD acetic acid (300 mL); ADD Fe powder (99.6 g, 1.78 mol) at 60° C.\n ADD 2-Cyano-5-methyl-3-nitropyridine (97 g, 0.59 mol); ADD acetic acid (400 mL) dropwise at below 80° C over 3.5 hours.\n STIR for 30 min; SETTEMPERATURE room temperature; ADD EtOAc (750 mL); FILTER keep filtrate; WASH with EtOAc (1 (500 mL, 3×250 mL).\n COLLECTLAYER organic; CONCENTRATE; PH with saturated NaHCO3 solution (850 mL) to pH neutral; COLLECTLAYER aqueous; EXTRACT with EtOAc (1×750 mL, 2 x.\n FILTER keep filtrate; DRYSOLUTION over Na2SO4; CONCENTRATE; YIELD 3-amino-2-cyano-5-methylpyridine (60 g, 21.7%).?\nAssistant: I suggest the reaction procedure To acetic acid (300 mL) in a 3-neck 2 liter round bottom flask equipped with mechanical stirrer and a thermometer was added Fe powder (99.6 g, 1.78 mol) with stirring at 60° C. 2-Cyano-5-methyl-3-nitropyridine (97 g, 0.59 mol) was dissolved in acetic acid (400 mL) with gentle warming and added to the above reaction mixture drop wise with efficient stirring so that the reaction temperature kept below 80° C. over 3.5 hours. 
The reaction mixture was further stirred for an addition 30 min, cooled, diluted with EtOAc (750 mL), filtered through celite and washed with EtOAc (1×500 mL, 3×250 mL). Combined EtOAc layers were evaporated to dryness to obtain dark brown solid which was neutralized with saturated NaHCO3 solution (850 mL), after addition of water (250 mL) to obtain homogeneous, this aqueous layer was extracted with EtOAc (1×750 mL, 2×500 mL). Combined EtOAc layers were filtered through small pad of silica gel (sand-SiO2-sand in sintered funnel), dried (Na2SO4) and evaporated to obtain 3-amino-2-cyano-5-methylpyridine (60 g) as yellow solid in 76% yield including ˜10% corresponding amide."} {"text":"User: Could you show me the procedure for the reaction action sequence MAKESOLUTION with diphenyl-methanamine (183 g, 1 mol) and CH3OH (1 L); ADD SLN; ADD 2-(Chloromethyl)oxirane (92 g, 1 mol) dropwise at 0° C; STIR for overnight at r.t..\n CONCENTRATE; YIELD title compound (201 g, 73%).?\nAssistant: I suggest the procedure 2-(Chloromethyl)oxirane (92 g, 1 mol) was added dropwise to a solution of diphenyl-methanamine (183 g, 1 mol) in CH3OH (1 L) at 0° C. then the mixture was stirred at r.t. overnight. The mixture was then concentrated in vacuo to give the title compound (201 g, 73%) which was used in next step without further purification."}", "/scratch/micpie/export/ord_procedure_steps/test_0-2.jsonl": "{"text":"Task: Convert a reaction procedure description into a reaction action sequence.\nProcedure: Diethyl azodicarboxylate (0.235 ml, 1.49 mmol) was added at 0° C. to a stirred solution of triphenylphosphine (0.392 g, 1.49 mmol) and 3-(6H-dibenzo[b,e]oxepin-11-ylidene1-propanol (0.252 g, 1.0 mmol) in dry THF (5 ml) and the mixture stirred for 5 min. A solution of ethyl 2-ethoxy-3-(4-hydroxyphenyl)-propionate (0.356 g, 1.49 mmol) in dry THF (5 ml) was then added, the mixture allowed to warm to room temperature, and stirring continued for 18 h. 
The resulting mixture was treated with water (50 ml), and the products extracted into dichloromethane (4×50 ml). The extracts were combined, washed with brine, dried (Na2SO4) and evaporated to an orange gum. This was then purified by column chromatography on silica gel (20% Et2O in petroleum eluent) to give the title compound as an inseparable 4:1 mixture of E and Z double-bond isomers, as a pale yellow gum; 0.252 g (53%).\nAnswer: MAKESOLUTION with triphenylphosphine (0.392 g, 1.49 mmol) and 3-(6H-dibenzo[b,e]oxepin-11-ylidene1-propanol (0.252 g, 1.0 mmol) and dry THF (5 ml); ADD SLN; ADD Diethyl azodicarboxylate (0.235 ml, 1.49 mmol) at 0° C; STIR for 5 min.\n MAKESOLUTION with ethyl 2-ethoxy-3-(4-hydroxyphenyl)-propionate (0.356 g, 1.49 mmol) and dry THF (5 ml); ADD SLN; SETTEMPERATURE room temperature; STIR for 18 h.\n ADD water (50 ml); EXTRACT with dichloromethane (50 ml) 4 x.\n COLLECTLAYER organic; WASH with brine; DRYSOLUTION over Na2SO4; CONCENTRATE.\n PURIFY; YIELD title compound."} {"text":"Task: Convert a description of reaction procedure into a reaction action steps.\nProcedure: A mixture of (7-chlorobenzo[d][1,3]dioxol-5-yl)methanol (5.5 g, 30 mmol) and SOCl2 (5.0 mL, 67 mmol) in dichloromethane (20 mL) was stirred at room temperature for 1 h and was then poured into ice water. The organic layer was separated and the aqueous phase was extracted with dichloromethane (50 mL×3). 
The combined extracts were washed with water and aqueous NaHCO3 solution, dried over Na2SO4 and evaporated under reduced pressure to afford 4-chloro-6-(chloromethyl)benzo[d][1,3]dioxole, which was directly used in the next step.\nAnswer: MAKESOLUTION with (7-chlorobenzo[d][1,3]dioxol-5-yl)methanol (5.5 g, 30 mmol) and SOCl2 (5.0 mL, 67 mmol) and dichloromethane (20 mL); ADD SLN; STIR for 1 h at room temperature; ADD ice water.\n PHASESEPARATION; COLLECTLAYER organic; COLLECTLAYER aqueous; EXTRACT with dichloromethane (50 mL) 3 x.\n COLLECTLAYER organic; WASH with water; WASH with aqueous NaHCO3 solution; DRYSOLUTION over Na2SO4; CONCENTRATE; YIELD 4-chloro-6-(chloromethyl)benzo[d][1,3]dioxole."}", "/scratch/micpie/export/ord_procedure_steps/test_0-0.jsonl": "{"text":"User: Can you give me the description of reaction procedure for the reaction action sequence MAKESOLUTION with triphenylphosphine (0.392 g, 1.49 mmol) and 3-(6H-dibenzo[b,e]oxepin-11-ylidene1-propanol (0.252 g, 1.0 mmol) and dry THF (5 ml); ADD SLN; ADD Diethyl azodicarboxylate (0.235 ml, 1.49 mmol) at 0° C; STIR for 5 min.\n MAKESOLUTION with ethyl 2-ethoxy-3-(4-hydroxyphenyl)-propionate (0.356 g, 1.49 mmol) and dry THF (5 ml); ADD SLN; SETTEMPERATURE room temperature; STIR for 18 h.\n ADD water (50 ml); EXTRACT with dichloromethane (50 ml) 4 x.\n COLLECTLAYER organic; WASH with brine; DRYSOLUTION over Na2SO4; CONCENTRATE.\n PURIFY; YIELD title compound.?\nAssistant: I suggest the description of reaction procedure Diethyl azodicarboxylate (0.235 ml, 1.49 mmol) was added at 0° C. to a stirred solution of triphenylphosphine (0.392 g, 1.49 mmol) and 3-(6H-dibenzo[b,e]oxepin-11-ylidene1-propanol (0.252 g, 1.0 mmol) in dry THF (5 ml) and the mixture stirred for 5 min. A solution of ethyl 2-ethoxy-3-(4-hydroxyphenyl)-propionate (0.356 g, 1.49 mmol) in dry THF (5 ml) was then added, the mixture allowed to warm to room temperature, and stirring continued for 18 h. 
The resulting mixture was treated with water (50 ml), and the products extracted into dichloromethane (4×50 ml). The extracts were combined, washed with brine, dried (Na2SO4) and evaporated to an orange gum. This was then purified by column chromatography on silica gel (20% Et2O in petroleum eluent) to give the title compound as an inseparable 4:1 mixture of E and Z double-bond isomers, as a pale yellow gum; 0.252 g (53%)."} {"text":"User: Can you show me the reaction procedure description for the reaction action sequence MAKESOLUTION with (7-chlorobenzo[d][1,3]dioxol-5-yl)methanol (5.5 g, 30 mmol) and SOCl2 (5.0 mL, 67 mmol) and dichloromethane (20 mL); ADD SLN; STIR for 1 h at room temperature; ADD ice water.\n PHASESEPARATION; COLLECTLAYER organic; COLLECTLAYER aqueous; EXTRACT with dichloromethane (50 mL) 3 x.\n COLLECTLAYER organic; WASH with water; WASH with aqueous NaHCO3 solution; DRYSOLUTION over Na2SO4; CONCENTRATE; YIELD 4-chloro-6-(chloromethyl)benzo[d][1,3]dioxole.?\nAssistant: I propose the reaction procedure description A mixture of (7-chlorobenzo[d][1,3]dioxol-5-yl)methanol (5.5 g, 30 mmol) and SOCl2 (5.0 mL, 67 mmol) in dichloromethane (20 mL) was stirred at room temperature for 1 h and was then poured into ice water. The organic layer was separated and the aqueous phase was extracted with dichloromethane (50 mL×3). 
The combined extracts were washed with water and aqueous NaHCO3 solution, dried over Na2SO4 and evaporated under reduced pressure to afford 4-chloro-6-(chloromethyl)benzo[d][1,3]dioxole, which was directly used in the next step."}", "/scratch/micpie/export/ord_procedure_steps/train_0-0.jsonl": "{"text":"User: Could you show me the reaction procedure description for the reaction action steps MAKESOLUTION with 1-bromo-4-(difluoromethoxy)-3-methoxy-2-(methoxymethoxy)benzene (1 g, 3.19 mmol) and methanol (15 mL); ADD SLN; ADD concentrated hydrochloride (2 mL); STIR for 1 h at 50° C.\n SETTEMPERATURE RT; CONCENTRATE; PH with sodium bicarbonate solution to pH basic; EXTRACT with dichloromethane.\n WASH with brine; DRYSOLUTION over sodium sulphate; CONCENTRATE; YIELD 6-Bromo-3-(difluoromethoxy)-2-methoxyphenol (800 mg, 93%).?\nAssistant: I propose the reaction procedure description To a stirring solution of 1-bromo-4-(difluoromethoxy)-3-methoxy-2-(methoxymethoxy)benzene (1 g, 3.19 mmol) in methanol (15 mL) was added concentrated hydrochloride (2 mL) and the reaction mixture was heated to 50° C. for 1 h. The reaction mixture was cooled to RT, concentrated under reduced pressure and the obtained residue was basified with sodium bicarbonate solution and extracted with dichloromethane (3×). 
The combined dichloromethane layers were washed with brine, dried over sodium sulphate and concentrated under reduced pressure to afford 6-Bromo-3-(difluoromethoxy)-2-methoxyphenol as a solid (800 mg, 93%)."} {"text":"User: Can you give me the reaction procedure for the reaction action steps MAKESOLUTION with 1-[5-(2-hydroxy-1,1-dimethyl-ethyl)-isoxazol-3-yl]-3-{4-[7-(2-morpholin-4-yl-ethoxy)-benzo[d]imidazo[2,1-b]thiazol-2-yl]-phenyl}-urea (1 equivalent) and silver carbonate (0.5-1 equivalent) and toluene; ADD SLN; ADD 1-bromo-2,3,4-tri-O-acetyl-α-D-glucuronic acid methyl ester (0.8-1.5 equivalents) at rt; STIR for unknown at rt.\n FILTER keep filtrate; PARTITION with water and dichloromethane or a mixture of dichloromethane and isopropanol.\n COLLECTLAYER aqueous; EXTRACT with dichloromethane or a mixture of dichloromethane; EXTRACT with isopropanol.\n COLLECTLAYER organic; WASH with 2N aq sodium hydroxide; DRYSOLUTION over magnesium sulfate; FILTER keep filtrate; CONCENTRATE.\n PURIFY; YIELD 3,4,5-triacetoxy-6-{2-methyl-2-[3-(3-{4-[7-(2-morpholin-4-yl-ethoxy)-benzo[d]imidazo[2,1-b]thiazol-2-yl]-phenyl}-ureido)-isoxazol-5-yl]-propoxy}-tetrahydro-pyran-2-carboxylic acid methyl ester.?\nAssistant: I suggest the reaction procedure To a stirred mixture of 1-[5-(2-hydroxy-1,1-dimethyl-ethyl)-isoxazol-3-yl]-3-{4-[7-(2-morpholin-4-yl-ethoxy)-benzo[d]imidazo[2,1-b]thiazol-2-yl]-phenyl}-urea (I-1) (1 equivalent) and silver carbonate (0.5-1 equivalent) in toluene at rt is added 1-bromo-2,3,4-tri-O-acetyl-α-D-glucuronic acid methyl ester (0.8-1.5 equivalents) and the mixture is stirred at rt until the reaction is substantially complete as monitored by LCMS or TLC. The mixture is filtered and the filtrate is partitioned between water and either dichloromethane or a mixture of dichloromethane and isopropanol. The separated aqueous layer is further extracted with dichloromethane or a mixture of dichloromethane and isopropanol. 
The combined organic layers are washed with 2N aq sodium hydroxide dried over magnesium sulfate, filtered, and concentrated under reduced pressure. The residue is purified by preparative reverse-phase HPLC or silica gel flash chromatography to afford 3,4,5-triacetoxy-6-{2-methyl-2-[3-(3-{4-[7-(2-morpholin-4-yl-ethoxy)-benzo[d]imidazo[2,1-b]thiazol-2-yl]-phenyl}-ureido)-isoxazol-5-yl]-propoxy}-tetrahydro-pyran-2-carboxylic acid methyl ester."}", "/scratch/micpie/export/ord_procedure_steps/valid_0-2.jsonl": "{"text":"Task: Convert a description of reaction procedure into a reaction action sequence.\nProcedure: To acetic acid (300 mL) in a 3-neck 2 liter round bottom flask equipped with mechanical stirrer and a thermometer was added Fe powder (99.6 g, 1.78 mol) with stirring at 60° C. 2-Cyano-5-methyl-3-nitropyridine (97 g, 0.59 mol) was dissolved in acetic acid (400 mL) with gentle warming and added to the above reaction mixture drop wise with efficient stirring so that the reaction temperature kept below 80° C. over 3.5 hours. The reaction mixture was further stirred for an addition 30 min, cooled, diluted with EtOAc (750 mL), filtered through celite and washed with EtOAc (1×500 mL, 3×250 mL). Combined EtOAc layers were evaporated to dryness to obtain dark brown solid which was neutralized with saturated NaHCO3 solution (850 mL), after addition of water (250 mL) to obtain homogeneous, this aqueous layer was extracted with EtOAc (1×750 mL, 2×500 mL). 
Combined EtOAc layers were filtered through small pad of silica gel (sand-SiO2-sand in sintered funnel), dried (Na2SO4) and evaporated to obtain 3-amino-2-cyano-5-methylpyridine (60 g) as yellow solid in 76% yield including ˜10% corresponding amide.\nAnswer: ADD acetic acid (300 mL); ADD Fe powder (99.6 g, 1.78 mol) at 60° C.\n ADD 2-Cyano-5-methyl-3-nitropyridine (97 g, 0.59 mol); ADD acetic acid (400 mL) dropwise at below 80° C over 3.5 hours.\n STIR for 30 min; SETTEMPERATURE room temperature; ADD EtOAc (750 mL); FILTER keep filtrate; WASH with EtOAc (1 (500 mL, 3×250 mL).\n COLLECTLAYER organic; CONCENTRATE; PH with saturated NaHCO3 solution (850 mL) to pH neutral; COLLECTLAYER aqueous; EXTRACT with EtOAc (1×750 mL, 2 x.\n FILTER keep filtrate; DRYSOLUTION over Na2SO4; CONCENTRATE; YIELD 3-amino-2-cyano-5-methylpyridine (60 g, 21.7%)."} {"text":"Task: Convert a description of reaction procedure into a reaction action steps.\nProcedure: 2-(Chloromethyl)oxirane (92 g, 1 mol) was added dropwise to a solution of diphenyl-methanamine (183 g, 1 mol) in CH3OH (1 L) at 0° C. then the mixture was stirred at r.t. overnight. The mixture was then concentrated in vacuo to give the title compound (201 g, 73%) which was used in next step without further purification.\nAnswer: MAKESOLUTION with diphenyl-methanamine (183 g, 1 mol) and CH3OH (1 L); ADD SLN; ADD 2-(Chloromethyl)oxirane (92 g, 1 mol) dropwise at 0° C; STIR for overnight at r.t..\n CONCENTRATE; YIELD title compound (201 g, 73%)."}", "/scratch/micpie/export/ord_procedure_steps/valid_0-1.jsonl": "{"text":"User: Could you give me the reaction action steps for the reaction procedure description To acetic acid (300 mL) in a 3-neck 2 liter round bottom flask equipped with mechanical stirrer and a thermometer was added Fe powder (99.6 g, 1.78 mol) with stirring at 60° C. 
2-Cyano-5-methyl-3-nitropyridine (97 g, 0.59 mol) was dissolved in acetic acid (400 mL) with gentle warming and added to the above reaction mixture drop wise with efficient stirring so that the reaction temperature kept below 80° C. over 3.5 hours. The reaction mixture was further stirred for an addition 30 min, cooled, diluted with EtOAc (750 mL), filtered through celite and washed with EtOAc (1×500 mL, 3×250 mL). Combined EtOAc layers were evaporated to dryness to obtain dark brown solid which was neutralized with saturated NaHCO3 solution (850 mL), after addition of water (250 mL) to obtain homogeneous, this aqueous layer was extracted with EtOAc (1×750 mL, 2×500 mL). Combined EtOAc layers were filtered through small pad of silica gel (sand-SiO2-sand in sintered funnel), dried (Na2SO4) and evaporated to obtain 3-amino-2-cyano-5-methylpyridine (60 g) as yellow solid in 76% yield including ˜10% corresponding amide.?\nAssistant: I propose the reaction action steps ADD acetic acid (300 mL); ADD Fe powder (99.6 g, 1.78 mol) at 60° C.\n ADD 2-Cyano-5-methyl-3-nitropyridine (97 g, 0.59 mol); ADD acetic acid (400 mL) dropwise at below 80° C over 3.5 hours.\n STIR for 30 min; SETTEMPERATURE room temperature; ADD EtOAc (750 mL); FILTER keep filtrate; WASH with EtOAc (1 (500 mL, 3×250 mL).\n COLLECTLAYER organic; CONCENTRATE; PH with saturated NaHCO3 solution (850 mL) to pH neutral; COLLECTLAYER aqueous; EXTRACT with EtOAc (1×750 mL, 2 x.\n FILTER keep filtrate; DRYSOLUTION over Na2SO4; CONCENTRATE; YIELD 3-amino-2-cyano-5-methylpyridine (60 g, 21.7%)."} {"text":"User: Could you give me the reaction action sequence for the reaction procedure 2-(Chloromethyl)oxirane (92 g, 1 mol) was added dropwise to a solution of diphenyl-methanamine (183 g, 1 mol) in CH3OH (1 L) at 0° C. then the mixture was stirred at r.t. overnight. 
The mixture was then concentrated in vacuo to give the title compound (201 g, 73%) which was used in next step without further purification.?\nAssistant: I propose the reaction action sequence MAKESOLUTION with diphenyl-methanamine (183 g, 1 mol) and CH3OH (1 L); ADD SLN; ADD 2-(Chloromethyl)oxirane (92 g, 1 mol) dropwise at 0° C; STIR for overnight at r.t..\n CONCENTRATE; YIELD title compound (201 g, 73%)."}", "/scratch/micpie/export/ord_procedure_steps/train_0-2.jsonl": "{"text":"Task: Convert a procedure into a reaction action sequence.\nProcedure: To a stirring solution of 1-bromo-4-(difluoromethoxy)-3-methoxy-2-(methoxymethoxy)benzene (1 g, 3.19 mmol) in methanol (15 mL) was added concentrated hydrochloride (2 mL) and the reaction mixture was heated to 50° C. for 1 h. The reaction mixture was cooled to RT, concentrated under reduced pressure and the obtained residue was basified with sodium bicarbonate solution and extracted with dichloromethane (3×). The combined dichloromethane layers were washed with brine, dried over sodium sulphate and concentrated under reduced pressure to afford 6-Bromo-3-(difluoromethoxy)-2-methoxyphenol as a solid (800 mg, 93%).\nAnswer: MAKESOLUTION with 1-bromo-4-(difluoromethoxy)-3-methoxy-2-(methoxymethoxy)benzene (1 g, 3.19 mmol) and methanol (15 mL); ADD SLN; ADD concentrated hydrochloride (2 mL); STIR for 1 h at 50° C.\n SETTEMPERATURE RT; CONCENTRATE; PH with sodium bicarbonate solution to pH basic; EXTRACT with dichloromethane.\n WASH with brine; DRYSOLUTION over sodium sulphate; CONCENTRATE; YIELD 6-Bromo-3-(difluoromethoxy)-2-methoxyphenol (800 mg, 93%)."} {"text":"Task: Convert a description of reaction procedure into a reaction action sequence.\nProcedure: To a stirred mixture of 1-[5-(2-hydroxy-1,1-dimethyl-ethyl)-isoxazol-3-yl]-3-{4-[7-(2-morpholin-4-yl-ethoxy)-benzo[d]imidazo[2,1-b]thiazol-2-yl]-phenyl}-urea (I-1) (1 equivalent) and silver carbonate (0.5-1 equivalent) in toluene at rt is added 
1-bromo-2,3,4-tri-O-acetyl-α-D-glucuronic acid methyl ester (0.8-1.5 equivalents) and the mixture is stirred at rt until the reaction is substantially complete as monitored by LCMS or TLC. The mixture is filtered and the filtrate is partitioned between water and either dichloromethane or a mixture of dichloromethane and isopropanol. The separated aqueous layer is further extracted with dichloromethane or a mixture of dichloromethane and isopropanol. The combined organic layers are washed with 2N aq sodium hydroxide dried over magnesium sulfate, filtered, and concentrated under reduced pressure. The residue is purified by preparative reverse-phase HPLC or silica gel flash chromatography to afford 3,4,5-triacetoxy-6-{2-methyl-2-[3-(3-{4-[7-(2-morpholin-4-yl-ethoxy)-benzo[d]imidazo[2,1-b]thiazol-2-yl]-phenyl}-ureido)-isoxazol-5-yl]-propoxy}-tetrahydro-pyran-2-carboxylic acid methyl ester.\nAnswer: MAKESOLUTION with 1-[5-(2-hydroxy-1,1-dimethyl-ethyl)-isoxazol-3-yl]-3-{4-[7-(2-morpholin-4-yl-ethoxy)-benzo[d]imidazo[2,1-b]thiazol-2-yl]-phenyl}-urea (1 equivalent) and silver carbonate (0.5-1 equivalent) and toluene; ADD SLN; ADD 1-bromo-2,3,4-tri-O-acetyl-α-D-glucuronic acid methyl ester (0.8-1.5 equivalents) at rt; STIR for unknown at rt.\n FILTER keep filtrate; PARTITION with water and dichloromethane or a mixture of dichloromethane and isopropanol.\n COLLECTLAYER aqueous; EXTRACT with dichloromethane or a mixture of dichloromethane; EXTRACT with isopropanol.\n COLLECTLAYER organic; WASH with 2N aq sodium hydroxide; DRYSOLUTION over magnesium sulfate; FILTER keep filtrate; CONCENTRATE.\n PURIFY; YIELD 3,4,5-triacetoxy-6-{2-methyl-2-[3-(3-{4-[7-(2-morpholin-4-yl-ethoxy)-benzo[d]imidazo[2,1-b]thiazol-2-yl]-phenyl}-ureido)-isoxazol-5-yl]-propoxy}-tetrahydro-pyran-2-carboxylic acid methyl ester."}", "/scratch/micpie/export/ord_procedure_steps/train_0-1.jsonl": "{"text":"User: Can you tell me the reaction action steps for the procedure To a stirring solution of 
1-bromo-4-(difluoromethoxy)-3-methoxy-2-(methoxymethoxy)benzene (1 g, 3.19 mmol) in methanol (15 mL) was added concentrated hydrochloride (2 mL) and the reaction mixture was heated to 50° C. for 1 h. The reaction mixture was cooled to RT, concentrated under reduced pressure and the obtained residue was basified with sodium bicarbonate solution and extracted with dichloromethane (3×). The combined dichloromethane layers were washed with brine, dried over sodium sulphate and concentrated under reduced pressure to afford 6-Bromo-3-(difluoromethoxy)-2-methoxyphenol as a solid (800 mg, 93%).?\nAssistant: I suggest the reaction action steps MAKESOLUTION with 1-bromo-4-(difluoromethoxy)-3-methoxy-2-(methoxymethoxy)benzene (1 g, 3.19 mmol) and methanol (15 mL); ADD SLN; ADD concentrated hydrochloride (2 mL); STIR for 1 h at 50° C.\n SETTEMPERATURE RT; CONCENTRATE; PH with sodium bicarbonate solution to pH basic; EXTRACT with dichloromethane.\n WASH with brine; DRYSOLUTION over sodium sulphate; CONCENTRATE; YIELD 6-Bromo-3-(difluoromethoxy)-2-methoxyphenol (800 mg, 93%)."} {"text":"User: Could you show me the reaction action steps for the description of reaction procedure To a stirred mixture of 1-[5-(2-hydroxy-1,1-dimethyl-ethyl)-isoxazol-3-yl]-3-{4-[7-(2-morpholin-4-yl-ethoxy)-benzo[d]imidazo[2,1-b]thiazol-2-yl]-phenyl}-urea (I-1) (1 equivalent) and silver carbonate (0.5-1 equivalent) in toluene at rt is added 1-bromo-2,3,4-tri-O-acetyl-α-D-glucuronic acid methyl ester (0.8-1.5 equivalents) and the mixture is stirred at rt until the reaction is substantially complete as monitored by LCMS or TLC. The mixture is filtered and the filtrate is partitioned between water and either dichloromethane or a mixture of dichloromethane and isopropanol. The separated aqueous layer is further extracted with dichloromethane or a mixture of dichloromethane and isopropanol. 
The combined organic layers are washed with 2N aq sodium hydroxide dried over magnesium sulfate, filtered, and concentrated under reduced pressure. The residue is purified by preparative reverse-phase HPLC or silica gel flash chromatography to afford 3,4,5-triacetoxy-6-{2-methyl-2-[3-(3-{4-[7-(2-morpholin-4-yl-ethoxy)-benzo[d]imidazo[2,1-b]thiazol-2-yl]-phenyl}-ureido)-isoxazol-5-yl]-propoxy}-tetrahydro-pyran-2-carboxylic acid methyl ester.?\nAssistant: I propose the reaction action steps MAKESOLUTION with 1-[5-(2-hydroxy-1,1-dimethyl-ethyl)-isoxazol-3-yl]-3-{4-[7-(2-morpholin-4-yl-ethoxy)-benzo[d]imidazo[2,1-b]thiazol-2-yl]-phenyl}-urea (1 equivalent) and silver carbonate (0.5-1 equivalent) and toluene; ADD SLN; ADD 1-bromo-2,3,4-tri-O-acetyl-α-D-glucuronic acid methyl ester (0.8-1.5 equivalents) at rt; STIR for unknown at rt.\n FILTER keep filtrate; PARTITION with water and dichloromethane or a mixture of dichloromethane and isopropanol.\n COLLECTLAYER aqueous; EXTRACT with dichloromethane or a mixture of dichloromethane; EXTRACT with isopropanol.\n COLLECTLAYER organic; WASH with 2N aq sodium hydroxide; DRYSOLUTION over magnesium sulfate; FILTER keep filtrate; CONCENTRATE.\n PURIFY; YIELD 3,4,5-triacetoxy-6-{2-methyl-2-[3-(3-{4-[7-(2-morpholin-4-yl-ethoxy)-benzo[d]imidazo[2,1-b]thiazol-2-yl]-phenyl}-ureido)-isoxazol-5-yl]-propoxy}-tetrahydro-pyran-2-carboxylic acid methyl ester."}", "/scratch/micpie/export/train_00.jsonl": "{"text":"The chemical with the SELFIES of [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C] exhibits inhibition of the human beta-secretase 1 (BACE-1)."} {"text":"The IUAPC name in CAS-like style of the molecule with canonical SMILES CS(=O)(=O)c1ccc2nc(NC(=O)CN3CCCCCC3)sc2c1 is 
2-(1-azepanyl)-N-(6-methylsulfonyl-1,3-benzothiazol-2-yl)acetamide."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/test_0-10.jsonl": "{"text":"User: Can you give me the SELFIES of a molecule that is not orexin-1 inhibitory?\nAssistant: Yes, here you go: [O][C][C][N][Branch2][Ring2][=Branch1][C][=Branch1][C][=O][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][Branch1][C][O][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][Ring1][Branch2][C][C][Ring2][Ring1][=N]"} {"text":"User: Can you generate the InChI of a molecule that is not orexin-1 inhibitory?\nAssistant: Yes, here you go: InChI=1S\/C12H16ClNO2\/c1-3-4-8-16-12(15)14-11-7-5-6-10(13)9(11)2\/h5-7H,3-4,8H2,1-2H3,(H,14,15)"}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/valid_0-8.jsonl": "{"text":"User: Can you figure out if the molecule with the SELFIES [O][=C][Branch1][=N][N][Branch1][=Branch2][C][C][C][C][C][C][Ring1][=Branch1][C][C][N][C][=Branch1][C][=O][C][=C][Branch1][Branch1][C][Ring1][=Branch1][=O][C][=C][C][=C][Ring1][Branch2][N] is orexin-1 inhibitory?\nAssistant: No, this molecule is not orexin-1 inhibitory."} {"text":"User: Can you tell me if the molecule with the DeepSMILES O=CNC=O)CC5nnccc5C)))C)))))))cccccc6))C is orexin-1 inhibitory?\nAssistant: No, this molecule is not orexin-1 inhibitory."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/test_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not orexin-1 inhibitory?\nConstraint: You must select none, one or more options from a, b, or c without using any additional words.\nOptions:\na. O1CCN(C(=O)N2CCN(CC2)CC(O)COc2cc3c(cc2)cccc3)CC1\nb. S=C(N(Cc1ccc(cc1)C)Cc1occc1)Nc1c(cccc1)C\nc. 
Clc1cc(N2CCN(CC2)C(=O)c2oc(CS(=O)(=O)c3c(OC)cccc3)cc2)ccc1\nAnswer: a, b, c"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not orexin-1 inhibitory?\nConstraint: You must select none, one or more options from A or B without using any other words.\nOptions:\nA: CCCCOC(=O)Nc1cccc(Cl)c1C\nB: O=[N+]([O-])c1ccc(Oc2cccc(C(F)(F)F)c2)nc1\nAnswer: A, B"}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/train_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the SELFIES [Cl][C][=C][Branch2][Ring2][Branch1][C][N][Branch1][=C][C][C][=Branch1][C][=O][N][C][C][C][Branch1][C][C][C][C][=Branch1][C][=O][C][C][C][=Branch1][C][=O][N][C][=N][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][Ring2][Ring1][=C] is orexin-1 inhibitory?\nAssistant: No, this molecule is not orexin-1 inhibitory."} {"text":"User: Can you estimate if the molecule with the InChI InChI=1S\/C17H16N4\/c1-12-5-6-16-15(8-12)17(9-13(2)20-16)21-19-11-14-4-3-7-18-10-14\/h3-11H,1-2H3,(H,20,21)\/b19-11+ is orexin-1 inhibitory?\nAssistant: No, this molecule is not orexin-1 inhibitory."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is orexin-1 inhibitory.\nMolecule canonical SMILES: O=C(N1CCOCC1)N1CCN(CC(O)COc2ccc3ccccc3c2)CC1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is orexin-1 inhibitory.\nSMILES: Clc1c(c(NC(OCCCC)=O)ccc1)C\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/valid_0-9.jsonl": "{"text":"User: Is the molecule with the SMILES O=C(N(C1CCCCC1)C)CN1C(=O)c2c(C1=O)cccc2N orexin-1 inhibitory?\nAssistant: No, it is not 
orexin-1 inhibitory."} {"text":"User: Is the molecule with the SMILES O=C1N(C(=O)CC1n1nc(cc1C)C)c1ccc(cc1)C orexin-1 inhibitory?\nAssistant: No, it is not orexin-1 inhibitory."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/test_0-1.jsonl": "{"text":"The molecule with the SELFIES representation of [O][C][C][N][Branch2][Ring2][=Branch1][C][=Branch1][C][=O][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][Branch1][C][O][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][Ring1][Branch2][C][C][Ring2][Ring1][=N] is not inhibiting orexin 1 receptor."} {"text":"The molecule with the SELFIES representation of [Cl][C][=C][Branch2][Ring1][Ring1][C][Branch1][O][N][C][Branch1][=Branch1][O][C][C][C][C][=O][=C][C][=C][Ring1][=C][C] is not inhibiting orexin 1 receptor."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/valid_0-0.jsonl": "{"text":"The molecule with the SELFIES representation of [O][=C][Branch1][=N][N][Branch1][=Branch2][C][C][C][C][C][C][Ring1][=Branch1][C][C][N][C][=Branch1][C][=O][C][=C][Branch1][Branch1][C][Ring1][=Branch1][=O][C][=C][C][=C][Ring1][Branch2][N] is not orexin-1 inhibitory."} {"text":"The molecule with the SELFIES representation of [O][=C][N][Branch2][Ring1][Ring2][C][=Branch1][C][=O][C][C][Ring1][=Branch1][N][N][=C][Branch1][=Branch1][C][=C][Ring1][Branch1][C][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C] is not orexin-1 inhibitory."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/test_0-2.jsonl": "{"text":"Based on the InChI InChI=1S\/C22H29N3O4\/c26-20(17-29-21-6-5-18-3-1-2-4-19(18)15-21)16-23-7-9-24(10-8-23)22(27)25-11-13-28-14-12-25\/h1-6,15,20,26H,7-14,16-17H2, the molecule has no a orexin 1 receptor antagonist features."} {"text":"Based on the InChI representation InChI=1S\/C12H16ClNO2\/c1-3-4-8-16-12(15)14-11-7-5-6-10(13)9(11)2\/h5-7H,3-4,8H2,1-2H3,(H,14,15), the molecule has no orexin 1 inhibitor characteristics."}", 
"/scratch/micpie/export/orexin1_receptor_butkiewicz/valid_0-10.jsonl": "{"text":"User: Can you generate the DeepSMILES of a molecule that is not orexin-1 inhibitory?\nAssistant: Of course, here you go: O=CNCCCCCC6))))))C))CNC=O)ccC5=O))cccc6N"} {"text":"User: Can you generate the DeepSMILES of a molecule that is not orexin-1 inhibitory?\nAssistant: Of course, here you go: O=CNC=O)CC5nnccc5C)))C)))))))cccccc6))C"}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/train_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is orexin-1 inhibitory.\nMolecule canonical SMILES: CC(C)CCNC(=O)CN(Cc1ccccc1Cl)C(=O)CCC(=O)Nc1ccccn1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not orexin-1 inhibitory."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is orexin-1 inhibitory.\nDeepSMILES: ncccN\\N=C\\ccccnc6)))))))))cc6C))))cccc6))C\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not orexin-1 inhibitory."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/valid_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is orexin-1 inhibitory.\nMolecule InChI: InChI=1S\/C17H21N3O3\/c1-19(11-6-3-2-4-7-11)14(21)10-20-16(22)12-8-5-9-13(18)15(12)17(20)23\/h5,8-9,11H,2-4,6-7,10,18H2,1H3\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not orexin-1 inhibitory."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is orexin-1 inhibitory.\nMolecule InChI: InChI=1S\/C16H17N3O2\/c1-10-4-6-13(7-5-10)18-15(20)9-14(16(18)21)19-12(3)8-11(2)17-19\/h4-8,14H,9H2,1-3H3\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not orexin-1 inhibitory."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/test_0-9.jsonl": "{"text":"User: Is the molecule with the 
InChI InChI=1S\/C22H29N3O4\/c26-20(17-29-21-6-5-18-3-1-2-4-19(18)15-21)16-23-7-9-24(10-8-23)22(27)25-11-13-28-14-12-25\/h1-6,15,20,26H,7-14,16-17H2 orexin-1 inhibitory?\nAssistant: No, it is not orexin-1 inhibitory."} {"text":"User: Is the molecule with the SMILES Clc1c(c(NC(OCCCC)=O)ccc1)C orexin-1 inhibitory?\nAssistant: No, it is not orexin-1 inhibitory."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/test_0-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of OCCNC=O)NCCNCC6))CCO)COcccccc6))cccc6)))))))))))))))))CC6 is not orexin-1 inhibitory."} {"text":"The molecule with the SMILES representation of Clc1c(c(NC(OCCCC)=O)ccc1)C is not orexin-1 inhibitory."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/valid_0-7.jsonl": "{"text":"Task: Please give me a canonical SMILES based on the text description.\nDescription: A molecule that is orexin-1 inhibitory.\nResult: CN(C(=O)CN1C(=O)c2cccc(N)c2C1=O)C1CCCCC1"} {"text":"Task: Please generate a molecule DeepSMILES based on the text description.\nDescription: A molecule that is orexin-1 inhibitory.\nResult: O=CNC=O)CC5nnccc5C)))C)))))))cccccc6))C"}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/test_0-3.jsonl": "{"text":"The DeepSMILES OCCNC=O)NCCNCC6))CCO)COcccccc6))cccc6)))))))))))))))))CC6 is from a molecule that is not identified as orexin-1 inhibitory."} {"text":"The canonical SMILES CCCCOC(=O)Nc1cccc(Cl)c1C represents a molecule that is not identified as orexin-1 inhibitory."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/valid_0-11.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that is not orexin-1 inhibitory?\nAssistant: This is a molecule that is not orexin-1 inhibitory: CN(C(=O)CN1C(=O)c2cccc(N)c2C1=O)C1CCCCC1"} {"text":"User: I'm searching for the InChI of a molecule that is not orexin-1 inhibitory?\nAssistant: This is a molecule that is not orexin-1 inhibitory: 
InChI=1S\/C16H17N3O2\/c1-10-4-6-13(7-5-10)18-15(20)9-14(16(18)21)19-12(3)8-11(2)17-19\/h4-8,14H,9H2,1-3H3"}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/train_0-0.jsonl": "{"text":"The molecule with the SMILES representation of Clc1c(CN(CC(=O)NCCC(C)C)C(=O)CCC(=O)Nc2ncccc2)cccc1 is not orexin-1 inhibitory."} {"text":"The molecule with the canonical SMILES Cc1ccc2nc(C)cc(N\/N=C\/c3cccnc3)c2c1 is not orexin-1 inhibitory."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/test_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is orexin-1 inhibitory.\nMolecule SELFIES: [O][C][C][N][Branch2][Ring2][=Branch1][C][=Branch1][C][=O][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][Branch1][C][O][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][Ring1][Branch2][C][C][Ring2][Ring1][=N]\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not orexin-1 inhibitory."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is orexin-1 inhibitory.\nMolecule DeepSMILES: ClcccNCOCCCC)))))=O)))ccc6))))C\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not orexin-1 inhibitory."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/train_0-10.jsonl": "{"text":"User: Can you give me the canonical SMILES of a molecule that is not orexin-1 inhibitory?\nAssistant: Of course, here you go: CC(C)CCNC(=O)CN(Cc1ccccc1Cl)C(=O)CCC(=O)Nc1ccccn1"} {"text":"User: Can you generate the InChI of a molecule that is not orexin-1 inhibitory?\nAssistant: Yes, here you go: InChI=1S\/C17H16N4\/c1-12-5-6-16-15(8-12)17(9-13(2)20-16)21-19-11-14-4-3-7-18-10-14\/h3-11H,1-2H3,(H,20,21)\/b19-11+"}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/train_0-3.jsonl": "{"text":"The DeepSMILES ClccCNCC=O)NCCCC)C)))))))C=O)CCC=O)Ncncccc6)))))))))))))cccc6 is from a molecule that is not identified as orexin-1 
inhibitory."} {"text":"The SELFIES [N][=C][C][=Branch2][Ring1][Ring2][=C][Branch1][N][N][\\N][=C][\\C][=C][C][=C][N][=C][Ring1][=Branch1][C][=C][Ring1][#C][C][C][=C][Branch1][=Branch1][C][=C][Ring2][Ring1][Ring1][C] is from a molecule that is not identified as orexin-1 inhibitory."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/train_0-12.jsonl": "{"text":"User: I want to generate a molecule InChI.\nAssistant: This sounds very curious. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be orexin-1 inhibitory.\nAssistant: Got it, this InChI is not orexin-1 inhibitory: InChI=1S\/C23H29ClN4O3\/c1-17(2)12-14-26-22(30)16-28(15-18-7-3-4-8-19(18)24)23(31)11-10-21(29)27-20-9-5-6-13-25-20\/h3-9,13,17H,10-12,14-16H2,1-2H3,(H,26,30)(H,25,27,29)"} {"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very curious. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be orexin-1 inhibitory.\nAssistant: Ok, here you go, this canonical SMILES is not orexin-1 inhibitory: Cc1ccc2nc(C)cc(N\/N=C\/c3cccnc3)c2c1"}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/test_0-13.jsonl": "{"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be orexin-1 inhibitory.\nAssistant: Got it, this SELFIES is not orexin-1 inhibitory: [O][C][C][N][Branch2][Ring2][=Branch1][C][=Branch1][C][=O][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][Branch1][C][O][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][Ring1][Branch2][C][C][Ring2][Ring1][=N]"} {"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be orexin-1 inhibitory.\nAssistant: Got it, this canonical SMILES is not orexin-1 inhibitory: CCCCOC(=O)Nc1cccc(Cl)c1C"}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/valid_0-2.jsonl": "{"text":"Based on the InChI representation InChI=1S\/C17H21N3O3\/c1-19(11-6-3-2-4-7-11)14(21)10-20-16(22)12-8-5-9-13(18)15(12)17(20)23\/h5,8-9,11H,2-4,6-7,10,18H2,1H3, the molecule has no orexin 1 inhibitor properties."} {"text":"Based on the SMILES representation O=C1N(C(=O)CC1n1nc(cc1C)C)c1ccc(cc1)C, the molecule has no orexin 1 inhibitor properties."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of CC(C)CCNC(=O)CN(Cc1ccccc1Cl)C(=O)CCC(=O)Nc1ccccn1 orexin-1 inhibitory?\nConstraint: Even if you are not sure, you must pick either a or b without using any other words.\nOptions:\n(a) False\n(b) True\nAnswer: a"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES Cc1ccc2nc(C)cc(N\/N=C\/c3cccnc3)c2c1 orexin-1 inhibitory?\nConstraint: Even if you are uncertain, you must pick either A or B without using any other words.\nOptions:\nA.) True\nB.) 
False\nAnswer: B"}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/valid_0-1.jsonl": "{"text":"The molecule with the SMILES representation of O=C(N(C1CCCCC1)C)CN1C(=O)c2c(C1=O)cccc2N is not inhibiting orexin 1 receptor."} {"text":"The molecule with the SELFIES [O][=C][N][Branch2][Ring1][Ring2][C][=Branch1][C][=O][C][C][Ring1][=Branch1][N][N][=C][Branch1][=Branch1][C][=C][Ring1][Branch1][C][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C] is not inhibiting orexin 1 receptor."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/valid_0-13.jsonl": "{"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be orexin-1 inhibitory.\nAssistant: Understood, this SELFIES is not orexin-1 inhibitory: [O][=C][Branch1][=N][N][Branch1][=Branch2][C][C][C][C][C][C][Ring1][=Branch1][C][C][N][C][=Branch1][C][=O][C][=C][Branch1][Branch1][C][Ring1][=Branch1][=O][C][=C][C][=C][Ring1][Branch2][N]"} {"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be orexin-1 inhibitory.\nAssistant: Ok, this SMILES is not orexin-1 inhibitory: O=C1N(C(=O)CC1n1nc(cc1C)C)c1ccc(cc1)C"}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is orexin-1 inhibitory.\ncanonical SMILES: CN(C(=O)CN1C(=O)c2cccc(N)c2C1=O)C1CCCCC1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is orexin-1 inhibitory.\nMolecule SELFIES: [O][=C][N][Branch2][Ring1][Ring2][C][=Branch1][C][=O][C][C][Ring1][=Branch1][N][N][=C][Branch1][=Branch1][C][=C][Ring1][Branch1][C][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/train_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not orexin-1 inhibitory?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any additional words.\nOptions:\nA: COCCNC(=S)N\/N=C(\\C)c1ccc(OC)cc1\nB: CCC(=O)N1c2ccc(S(=O)(=O)NCCc3ccc(OC)c(OC)c3)cc2CC1C\nC: COc1ccc(-c2nc(C#N)c(NCc3ccccc3Cl)o2)cc1OC\nD: CCOc1ccc(N2C(=O)CC(NCCc3ccncc3)C2=O)cc1\nE: CC(C)CCNC(=O)CN(Cc1ccccc1Cl)C(=O)CCC(=O)Nc1ccccn1\nAnswer: A, B, C, D, E"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not orexin-1 inhibitory?\nConstraint: You must select none, one or more options from A, B, C, or D without using any other words.\nOptions:\nA. n1c2c(c(N\\N=C\\c3cccnc3)cc1C)cc(cc2)C\nB. O(C(=O)CCC1CCCC1)C(C)C(=O)Nc1c([N+]([O-])=O)cccc1\nC. Clc1c(C(=O)Nc2sc3c(CCC3)c2C(O)=O)cccc1\nD. 
O=C(Nc1cc(c2[nH]c3c(n2)cccc3)ccc1)c1cc(NC(=O)c2occc2)ccc1\nAnswer: A, B, C, D"}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/valid_0-4.jsonl": "{"text":"The molecule InChI InChI=1S\/C17H21N3O3\/c1-19(11-6-3-2-4-7-11)14(21)10-20-16(22)12-8-5-9-13(18)15(12)17(20)23\/h5,8-9,11H,2-4,6-7,10,18H2,1H3 is not orexin-1 inhibitory."} {"text":"The molecule SELFIES [O][=C][N][Branch2][Ring1][Ring2][C][=Branch1][C][=O][C][C][Ring1][=Branch1][N][N][=C][Branch1][=Branch1][C][=C][Ring1][Branch1][C][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C] is not orexin-1 inhibitory."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is orexin-1 inhibitory.\nMolecule InChI: InChI=1S\/C23H29ClN4O3\/c1-17(2)12-14-26-22(30)16-28(15-18-7-3-4-8-19(18)24)23(31)11-10-21(29)27-20-9-5-6-13-25-20\/h3-9,13,17H,10-12,14-16H2,1-2H3,(H,26,30)(H,25,27,29)\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is orexin-1 inhibitory.\nMolecule SELFIES: [N][=C][C][=Branch2][Ring1][Ring2][=C][Branch1][N][N][\\N][=C][\\C][=C][C][=C][N][=C][Ring1][=Branch1][C][=C][Ring1][#C][C][C][=C][Branch1][=Branch1][C][=C][Ring2][Ring1][Ring1][C]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/valid_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not orexin-1 inhibitory?\nConstraint: You must select none, one or more options from a, b, c, or d without using any additional words.\nOptions:\n[a] CN(C(=O)CN1C(=O)c2cccc(N)c2C1=O)C1CCCCC1\n[b] CC(=O)Nc1ccc(C(=O)COC(=O)C2CC3CCCC(C2)C3=O)cc1\n[c] 
Cc1ccc(S(=O)(=O)N2CCOCC2)cc1NC(=O)COC(=O)c1ccc(OCc2c(C)noc2C)cc1\n[d] CCn1c(=O)c(=O)n(CC)c2cc(S(=O)(=O)N3CCc4ccccc43)ccc21\nAnswer: a, b, c, d"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not orexin-1 inhibitory?\nConstraint: You must select none, one or more options from a, b, c, or d without using any additional words.\nOptions:\na. Clc1ccccc1CN1CCN(c2ncccn2)CC1\nb. Cc1ccc(N2C(=O)CC(n3nc(C)cc3C)C2=O)cc1\nc. C\/C(=N\\OCC(=O)OCC(=O)N(Cc1ccccc1)C(C)(C)C)c1ccc2c(c1)OCO2\nd. CCN1CCN(Cc2nc3cc(NC(=O)COc4ccc([N+](=O)[O-])cc4)ccc3n2C)CC1\nAnswer: a, b, c, d"}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/valid_0-12.jsonl": "{"text":"User: I want to create a molecule SELFIES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be orexin-1 inhibitory.\nAssistant: Got it, here you go, this SELFIES is not orexin-1 inhibitory: [O][=C][Branch1][=N][N][Branch1][=Branch2][C][C][C][C][C][C][Ring1][=Branch1][C][C][N][C][=Branch1][C][=O][C][=C][Branch1][Branch1][C][Ring1][=Branch1][=O][C][=C][C][=C][Ring1][Branch2][N]"} {"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should not be orexin-1 inhibitory.\nAssistant: Got it, here you go, this DeepSMILES is not orexin-1 inhibitory: O=CNC=O)CC5nnccc5C)))C)))))))cccccc6))C"}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/train_0-2.jsonl": "{"text":"Based on the DeepSMILES ClccCNCC=O)NCCCC)C)))))))C=O)CCC=O)Ncncccc6)))))))))))))cccc6, the molecule has no orexin 1 receptor antagonist features."} {"text":"Based on the InChI representation InChI=1S\/C17H16N4\/c1-12-5-6-16-15(8-12)17(9-13(2)20-16)21-19-11-14-4-3-7-18-10-14\/h3-11H,1-2H3,(H,20,21)\/b19-11+, the molecule has no orexin 1 inhibitor characteristics."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/test_0-11.jsonl": "{"text":"User: I'm looking for the SELFIES of a molecule that is not orexin-1 inhibitory?\nAssistant: This is a molecule that is not orexin-1 inhibitory: [O][C][C][N][Branch2][Ring2][=Branch1][C][=Branch1][C][=O][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][Branch1][C][O][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][Ring1][Branch2][C][C][Ring2][Ring1][=N]"} {"text":"User: I'm searching for the InChI of a molecule that is not orexin-1 inhibitory?\nAssistant: This is a molecule that is not orexin-1 inhibitory: InChI=1S\/C12H16ClNO2\/c1-3-4-8-16-12(15)14-11-7-5-6-10(13)9(11)2\/h5-7H,3-4,8H2,1-2H3,(H,14,15)"}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/train_0-7.jsonl": "{"text":"Task: Please give me a canonical SMILES based on the text description below.\nDescription: A molecule that is orexin-1 inhibitory.\nResult: CC(C)CCNC(=O)CN(Cc1ccccc1Cl)C(=O)CCC(=O)Nc1ccccn1"} {"text":"Task: Please give me a molecule SELFIES based on the description.\nDescription: A molecule that is orexin-1 inhibitory.\nResult: [N][=C][C][=Branch2][Ring1][Ring2][=C][Branch1][N][N][\\N][=C][\\C][=C][C][=C][N][=C][Ring1][=Branch1][C][=C][Ring1][#C][C][C][=C][Branch1][=Branch1][C][=C][Ring2][Ring1][Ring1][C]"}", 
"/scratch/micpie/export/orexin1_receptor_butkiewicz/train_0-11.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that is not orexin-1 inhibitory?\nAssistant: This is a molecule that is not orexin-1 inhibitory: Clc1c(CN(CC(=O)NCCC(C)C)C(=O)CCC(=O)Nc2ncccc2)cccc1"} {"text":"User: I'm looking for the DeepSMILES of a molecule that is not orexin-1 inhibitory?\nAssistant: This is a molecule that is not orexin-1 inhibitory: ncccN\\N=C\\ccccnc6)))))))))cc6C))))cccc6))C"}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/train_0-1.jsonl": "{"text":"The molecule with the SELFIES representation of [Cl][C][=C][Branch2][Ring2][Branch1][C][N][Branch1][=C][C][C][=Branch1][C][=O][N][C][C][C][Branch1][C][C][C][C][=Branch1][C][=O][C][C][C][=Branch1][C][=O][N][C][=N][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][Ring2][Ring1][=C] is not inhibiting orexin 1 receptor."} {"text":"The molecule with the SMILES representation of n1c2c(c(N\\N=C\\c3cccnc3)cc1C)cc(cc2)C is not inhibiting orexin 1 receptor."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/train_0-13.jsonl": "{"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be orexin-1 inhibitory.\nAssistant: Got it, this InChI is not orexin-1 inhibitory: InChI=1S\/C23H29ClN4O3\/c1-17(2)12-14-26-22(30)16-28(15-18-7-3-4-8-19(18)24)23(31)11-10-21(29)27-20-9-5-6-13-25-20\/h3-9,13,17H,10-12,14-16H2,1-2H3,(H,26,30)(H,25,27,29)"} {"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should not be orexin-1 inhibitory.\nAssistant: Got it, this DeepSMILES is not orexin-1 inhibitory: ncccN\\N=C\\ccccnc6)))))))))cc6C))))cccc6))C"}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/train_0-4.jsonl": "{"text":"The molecule InChI InChI=1S\/C23H29ClN4O3\/c1-17(2)12-14-26-22(30)16-28(15-18-7-3-4-8-19(18)24)23(31)11-10-21(29)27-20-9-5-6-13-25-20\/h3-9,13,17H,10-12,14-16H2,1-2H3,(H,26,30)(H,25,27,29) is not orexin-1 inhibitory."} {"text":"The molecule canonical SMILES Cc1ccc2nc(C)cc(N\/N=C\/c3cccnc3)c2c1 is not orexin-1 inhibitory."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/test_0-7.jsonl": "{"text":"Task: Please give me a molecule SMILES based on the description below.\nDescription: A molecule that is orexin-1 inhibitory.\nResult: O1CCN(C(=O)N2CCN(CC2)CC(O)COc2cc3c(cc2)cccc3)CC1"} {"text":"Task: Please create a molecule canonical SMILES based on the description.\nDescription: A molecule that is orexin-1 inhibitory.\nResult: CCCCOC(=O)Nc1cccc(Cl)c1C"}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/train_0-9.jsonl": "{"text":"User: Is the molecule with the DeepSMILES ClccCNCC=O)NCCCC)C)))))))C=O)CCC=O)Ncncccc6)))))))))))))cccc6 orexin-1 inhibitory?\nAssistant: No, it is not orexin-1 inhibitory."} {"text":"User: Is the molecule with the InChI InChI=1S\/C17H16N4\/c1-12-5-6-16-15(8-12)17(9-13(2)20-16)21-19-11-14-4-3-7-18-10-14\/h3-11H,1-2H3,(H,20,21)\/b19-11+ orexin-1 inhibitory?\nAssistant: No, it is not orexin-1 inhibitory."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/valid_0-3.jsonl": "{"text":"The SELFIES [O][=C][Branch1][=N][N][Branch1][=Branch2][C][C][C][C][C][C][Ring1][=Branch1][C][C][N][C][=Branch1][C][=O][C][=C][Branch1][Branch1][C][Ring1][=Branch1][=O][C][=C][C][=C][Ring1][Branch2][N] represents a molecule that is not identified as orexin-1 inhibitory."} {"text":"The SELFIES 
[O][=C][N][Branch2][Ring1][Ring2][C][=Branch1][C][=O][C][C][Ring1][=Branch1][N][N][=C][Branch1][=Branch1][C][=C][Ring1][Branch1][C][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C] represents a molecule that is not identified as orexin-1 inhibitory."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/test_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the SELFIES [O][C][C][N][Branch2][Ring2][=Branch1][C][=Branch1][C][=O][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][Branch1][C][O][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][Ring1][Branch2][C][C][Ring2][Ring1][=N] is orexin-1 inhibitory?\nAssistant: No, this molecule is not orexin-1 inhibitory."} {"text":"User: Can you figure out if the molecule with the SMILES Clc1c(c(NC(OCCCC)=O)ccc1)C is orexin-1 inhibitory?\nAssistant: No, this molecule is not orexin-1 inhibitory."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [O][C][C][N][Branch2][Ring2][=Branch1][C][=Branch1][C][=O][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][Branch1][C][O][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][Ring1][Branch2][C][C][Ring2][Ring1][=N] orexin-1 inhibitory?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any other words.\nOptions:\n[1] True\n[2] False\nAnswer: 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES representation of Clc1c(c(NC(OCCCC)=O)ccc1)C orexin-1 inhibitory?\nConstraint: Even if you are uncertain, you must pick either A or B without using any additional words.\nOptions:\nA.) False\nB.) 
True\nAnswer: A"}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES O=CNCCCCCC6))))))C))CNC=O)ccC5=O))cccc6N orexin-1 inhibitory?\nConstraint: Even if you are uncertain, you must pick either A or B without using any other words.\nOptions:\n[A] False\n[B] True\nAnswer: A"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [O][=C][N][Branch2][Ring1][Ring2][C][=Branch1][C][=O][C][C][Ring1][=Branch1][N][N][=C][Branch1][=Branch1][C][=C][Ring1][Branch1][C][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C] orexin-1 inhibitory?\nConstraint: Even if you are not sure, you must pick either a or b without using any other words.\nOptions:\n[a] True\n[b] False\nAnswer: b"}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/test_0-4.jsonl": "{"text":"The DeepSMILES OCCNC=O)NCCNCC6))CCO)COcccccc6))cccc6)))))))))))))))))CC6 is not orexin-1 inhibitory."} {"text":"The SELFIES [Cl][C][=C][Branch2][Ring1][Ring1][C][Branch1][O][N][C][Branch1][=Branch1][O][C][C][C][C][=O][=C][C][=C][Ring1][=C][C] is not orexin-1 inhibitory."}", "/scratch/micpie/export/orexin1_receptor_butkiewicz/test_0-12.jsonl": "{"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be orexin-1 inhibitory.\nAssistant: Ok, this canonical SMILES is not orexin-1 inhibitory: O=C(N1CCOCC1)N1CCN(CC(O)COc2ccc3ccccc3c2)CC1"} {"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should not be orexin-1 inhibitory.\nAssistant: Got it, here you go, this canonical SMILES is not orexin-1 inhibitory: CCCCOC(=O)Nc1cccc(Cl)c1C"}", "/scratch/micpie/export/drug_protein_hpo_disease/valid_0-0.jsonl": "{"text":"The drug CNC=NC=C5C=O)NC)C=O)N6C targets the protein Type 2 inositol 1,4,5-trisphosphate receptor. The protein Type 2 inositol 1,4,5-trisphosphate receptor is associated with Anhidrosis. The Anhidrosis is associated with the disease Autosomal dominant hypohidrotic ectodermal dysplasia."} {"text":"The drug COC=CC\\C=C\/C)CCC=O)OCCNCCOCC6)))))))))))))))CO)=CC=O)OCC5=C9C targets the protein PTP synthase. The protein PTP synthase is associated with Excessive salivation. The Excessive salivation is associated with the disease Hyperphenylalaninemia."}", "/scratch/micpie/export/drug_protein_hpo_disease/test_0-0.jsonl": "{"text":"The drug [H][C@@]CC[C@H]O)[C@@]5C)CC[C@][H])C=CC=CO)C=C6C[C@@H]CCCCCCCCCS=O)CCCCF)F)CF)F)F))))))))))))))))[C@@]%17%10[H] targets the protein ER. The protein ER is associated with Acne. The Acne is associated with the disease acne inversa, familial, 1."} {"text":"The drug CCCCCc1ccc(C)cc1O targets the protein HBSC II. The protein HBSC II is associated with Limited neck range of motion. The Limited neck range of motion is associated with the disease Isolated Klippel-Feil syndrome."}", "/scratch/micpie/export/drug_protein_hpo_disease/train_0-0.jsonl": "{"text":"The drug [H][C@](O)(CCC(O)=O)NC1=CC=C(C=C1)N1C(=O)CCC1=O targets the protein MyHC-beta. The protein MyHC-beta is associated with Asymmetric septal hypertrophy. The Asymmetric septal hypertrophy is associated with the disease Sanfilippo syndrome type C."} {"text":"The drug InChI=1S\/C6H14N4O3\/c7-4(5(11)12)2-1-3-9-6(8)10-13\/h4,13H,1-3,7H2,(H,11,12)(H3,8,9,10)\/t4-\/m0\/s1 targets the protein Neuronal NOS. The protein Neuronal NOS is associated with Decreased prealbumin level. 
The Decreased prealbumin level is associated with the disease Immune dysregulation-polyendocrinopathy-enteropathy-X-linked syndrome."}", "/scratch/micpie/export/bc5disease/test_0-1.jsonl": "{"text":"User: Does the following text contain mentions of diseases? Can you output matches?\nText: Stopping selegiline also significantly reduced the supine systolic and diastolic blood pressures consistent with a previously undescribed supine pressor action.\nAssistant: There is reduced the supine systolic and diastolic blood pressures."} {"text":"User: Does the following text contain mentions of diseases? Can you return matches?\nText: These findings suggest that helodermin-produced hypotension is partly attributable to the activation of glibenclamide-sensitive K+ channels (K (ATP) channels), which presumably exist on arterial smooth muscle cells.\nAssistant: I found hypotension."}", "/scratch/micpie/export/bc5disease/valid_0-0.jsonl": "{"text":"Task: Find all the mentions of diseases in the following sentence. Return the matching words. If there is no mention of a disease, return `no match`.\nDescription: A lesser degree of orthostatic hypotension occurred with standing.\nAnswer: orthostatic hypotension"} {"text":"Task: Find all the mentions of diseases in the subsequent text. Return the matching entities. If there is no mention of a disease, return `no match`.\nDescription: Our findings indicate that sustained-release nifedipine is useful for hypertensive psoriatic patients under long-term treatment with cyclosporin A, but that these patients should be monitored for gingival hyperplasia.\nAnswer: gingival hyperplasia, psoriatic, and hypertensive"}", "/scratch/micpie/export/bc5disease/test_0-0.jsonl": "{"text":"Task: Find all the mentions of diseases in the subsequent text. Return the matching entities. 
If there is no mention of a disease, return `no match`.\nDescription: Stopping selegiline also significantly reduced the supine systolic and diastolic blood pressures consistent with a previously undescribed supine pressor action.\nAnswer: reduced the supine systolic and diastolic blood pressures"} {"text":"Task: Find all the mentions of diseases in the following sentence. Return the matching entities. If there is no matching entity, return `no match`.\nSentence: These findings suggest that helodermin-produced hypotension is partly attributable to the activation of glibenclamide-sensitive K+ channels (K (ATP) channels), which presumably exist on arterial smooth muscle cells.\nAnswer: hypotension"}", "/scratch/micpie/export/bc5disease/train_0-0.jsonl": "{"text":"Task: Find all the mentions of diseases in the subsequent text. Return the matching entities. If there is no matching entity, return `no match`.\nDescription: Selegiline-induced postural hypotension in Parkinson's disease: a longitudinal study on the effects of drug withdrawal.\nAnswer: postural hypotension and Parkinson's disease"} {"text":"Task: Find all the mentions of diseases in the subsequent sentence. Return the matching entities. If there is no mention of a disease, return `no match`.\nDescription: The adverse events during combined therapy with cyclosporin A and nifedipine included an increase in blood urea nitrogen levels in 9 of the 13 patients and development of gingival hyperplasia in 2 of the 13 patients.\nAnswer: gingival hyperplasia"}", "/scratch/micpie/export/bc5disease/valid_0-1.jsonl": "{"text":"User: Does the following text contain mentions of diseases? 
Can you output matches?\nA lesser degree of orthostatic hypotension occurred with standing.\nAssistant: There is orthostatic hypotension."} {"text":"User: Does the following text contain mentions of diseases? Please return matches\nText: Our findings indicate that sustained-release nifedipine is useful for hypertensive psoriatic patients under long-term treatment with cyclosporin A, but that these patients should be monitored for gingival hyperplasia.\nAssistant: There is gingival hyperplasia, psoriatic, and hypertensive."}", "/scratch/micpie/export/bc5disease/train_0-1.jsonl": "{"text":"User: Does the following text contain mentions of diseases? Can you return matches?\nText: Selegiline-induced postural hypotension in Parkinson's disease: a longitudinal study on the effects of drug withdrawal.\nAssistant: I found postural hypotension and Parkinson's disease."} {"text":"User: Does the following text contain mentions of diseases? Can you output matches?\nText: The adverse events during combined therapy with cyclosporin A and nifedipine included an increase in blood urea nitrogen levels in 9 of the 13 patients and development of gingival hyperplasia in 2 of the 13 patients.\nAssistant: There is gingival hyperplasia."}", "/scratch/micpie/export/compound_protein_go_term_4/test_0-1.jsonl": "{"text":"The compound CCCCCCOccccccC=O)CCNC)C)))))ccc6c%10 targets the protein EAR-7. The protein EAR-7 enables the general transcription initiation factor binding."} {"text":"The compound CC[C@H]C=O)O))[C@H]CCcccOCCcnc-cccccc6))))))oc5C)))))))))ccc69 targets the protein PPAR-gamma. 
The protein PPAR-gamma is located in the nucleus."}", "/scratch/micpie/export/compound_protein_go_term_4/valid_0-0.jsonl": "{"text":"The compound CC\/C=C\\cccccc6))))))ccccOCC[N+]C)C)C)))))cc6)))))))cccccc6.[I-] targets the protein ER which enables the sequence-specific double-stranded DNA binding."} {"text":"The compound CCCCCCOc1ccc(C(=O)CCN2CC2C)c(Cl)c1 targets the protein c-erbA-beta which is located in the nucleus."}", "/scratch/micpie/export/compound_protein_go_term_4/train_1-0.jsonl": "{"text":"The compound [O][=C][C][=C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][=C][C][=C][Branch2][Ring1][C][N][C][C][Branch1][C][F][Branch1][C][F][C][Branch1][C][F][Branch1][C][F][F][C][=C][Ring1][#C][NH1][Ring2][Ring1][#Branch1] targets the protein Dihydrotestosterone receptor which is involved in the spermatogenesis."} {"text":"The compound COc1ccc2oc(-c3ccccc3)cc(=O)c2c1 targets the protein Placenta-specific ATP-binding cassette transporter which is located in the plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_4/test_0-0.jsonl": "{"text":"The compound CCCCCCOccccccC=O)CCNC)C)))))ccc6c%10 targets the protein Thyroid hormone receptor alpha which enables the general transcription initiation factor binding."} {"text":"The compound InChI=1S\/C25H27NO4\/c1-3-20(25(27)28)22-11-9-18-15-19(10-12-21(18)22)29-14-13-23-16(2)30-24(26-23)17-7-5-4-6-8-17\/h4-8,10,12,15,20,22H,3,9,11,13-14H2,1-2H3,(H,27,28)\/t20-,22+\/m0\/s1 targets the protein Nuclear receptor subfamily 1 group C member 3 which is located in the nucleus."}", "/scratch/micpie/export/compound_protein_go_term_4/train_0-0.jsonl": "{"text":"The compound [C][=C][C@][C][C][C@][Branch1][C][C][Branch1][C][O][C][C][Ring1][Branch2][=C][C][C@@H1][C@@H1][Ring1][N][C][C][C@][Branch1][C][C][C@@H1][Branch1][C][O][C][C][C@@H1][Ring1][O][Ring1][#Branch1] targets the protein Nuclear receptor subfamily 3 group A member 2 which is located in the chromatin."} {"text":"The compound 
[O][=C][C][=C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][=C][C][=C][Branch2][Ring1][C][N][C][C][Branch1][C][F][Branch1][C][F][C][Branch1][C][F][Branch1][C][F][F][C][=C][Ring1][#C][NH1][Ring2][Ring1][#Branch1] targets the protein Androgen receptor which is involved in the single fertilization."}", "/scratch/micpie/export/compound_protein_go_term_4/test_1-1.jsonl": "{"text":"The compound CCCCCCCNCCccccO[C@H]C)C=O)[O-]))))cc6))))))))cncccccc6o9.[Na+] targets the protein Peroxisome proliferator-activated receptor alpha. The protein Peroxisome proliferator-activated receptor alpha enables the DNA-binding transcription factor binding."} {"text":"The compound CCCOC[C@@H][C@H]CNC[C@]56ccccCl)cCl)c6 targets the protein NET. The protein NET is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_4/valid_0-1.jsonl": "{"text":"The compound InChI=1S\/C27H32NO.HI\/c1-5-26(22-12-8-6-9-13-22)27(23-14-10-7-11-15-23)24-16-18-25(19-17-24)29-21-20-28(2,3)4;\/h6-19H,5,20-21H2,1-4H3;1H\/q+1;\/p-1\/b27-26-; targets the protein ER. The protein ER enables the sequence-specific double-stranded DNA binding."} {"text":"The compound CCCCCCOccccC=O)CCNCC3C)))))))cCl)c6 targets the protein c-erbA-2. The protein c-erbA-2 is located in the nucleus."}", "/scratch/micpie/export/compound_protein_go_term_4/valid_1-1.jsonl": "{"text":"The compound InChI=1S\/C13H8BrNO3\/c14-10-5-9(17)6-11-12(10)18-13(15-11)7-1-3-8(16)4-2-7\/h1-6,16-17H targets the protein Estrogen receptor beta. The protein Estrogen receptor beta enables the receptor antagonist activity."} {"text":"The compound COC(=O)[C@H]1C2CCC(C[C@@H]1c1ccc(Br)cc1)N2C targets the protein NET. 
The protein NET is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_4/test_1-0.jsonl": "{"text":"The compound CCCCCCCN(CCc1ccc(O[C@H](C)C(=O)[O-])cc1)c1nc2ccccc2o1.[Na+] targets the protein Peroxisome proliferator-activated receptor alpha which enables the DNA-binding transcription factor binding."} {"text":"The compound CCCOC[C@@H]1[C@H]2CNC[C@]21c1ccc(Cl)c(Cl)c1 targets the protein NET which is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_4/train_1-1.jsonl": "{"text":"The compound O=cccCF)F)F))ccccNCCF)F)CF)F)F)))))cc6[nH]%10 targets the protein Dihydrotestosterone receptor. The protein Dihydrotestosterone receptor is involved in the spermatogenesis."} {"text":"The compound InChI=1S\/C16H12O3\/c1-18-12-7-8-15-13(9-12)14(17)10-16(19-15)11-5-3-2-4-6-11\/h2-10H,1H3 targets the protein CDw338. The protein CDw338 is located in the plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_4/train_0-1.jsonl": "{"text":"The compound C=C[C@]12CC[C@](C)(O)CC1=CC[C@H]1[C@@H]3CC[C@H](O)[C@@]3(C)CC[C@@H]12 targets the protein Nuclear receptor subfamily 3 group A member 2. The protein Nuclear receptor subfamily 3 group A member 2 is located in the chromatin."} {"text":"The compound O=c1cc(C(F)(F)F)c2ccc(NCC(F)(F)C(F)(F)F)cc2[nH]1 targets the protein Nuclear receptor subfamily 3 group C member 4. 
The protein Nuclear receptor subfamily 3 group C member 4 is involved in the single fertilization."}", "/scratch/micpie/export/compound_protein_go_term_4/valid_1-0.jsonl": "{"text":"The compound Occcc-cncccO)ccBr)c6o9)))))))))cc6 targets the protein Estrogen receptor beta which enables the receptor antagonist activity."} {"text":"The compound [C][O][C][=Branch1][C][=O][C@H1][C][C][C][C][Branch1][S][C][C@@H1][Ring1][#Branch1][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][N][Ring1][=C][C] targets the protein Norepinephrine transporter which is located in the integral component of plasma membrane."}", "/scratch/micpie/export/MUV_737/valid_0-0.jsonl": "{"text":"The chemical compound with the canonical SMILES Cn1ccnc1SCC(=O)Nc1ccc(Oc2ccccc2)cc1 is not a potentiator of the ER-alpha-coact. binding."} {"text":"The molecule with the canonical SMILES COc1ccc(-n2nnc3c(=O)n(CC(=O)N4CCCCC4)cnc32)cc1 is not a potentiator of the estrogen receptor-alpha-coactivator binding."}", "/scratch/micpie/export/MUV_737/test_0-0.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C24H30N4O3\/c29-23(17-27-13-1-2-14-27)25-19-5-9-21(10-6-19)31-22-11-7-20(8-12-22)26-24(30)18-28-15-3-4-16-28\/h5-12H,1-4,13-18H2,(H,25,29)(H,26,30) is not a potentiator of the ER-alpha-coact. 
binding."} {"text":"The molecule with the SELFIES representation of ['[C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1]'] is not a potentiator of the estrogen receptor-alpha-coactivator binding."}", "/scratch/micpie/export/MUV_737/train_0-0.jsonl": "{"text":"The chemical compound with the canonical SMILES CCc1cccc2c1NC(=O)C21C2C(=O)N(Cc3ccccc3)C(=O)C2C2CCCN21 is not a potentiator of the estrogen receptor-alpha-coactivator binding."} {"text":"The molecular species with the SELFIES ['[O][=S][=Branch1][C][=O][Branch2][Ring1][=Branch1][N][C][C][Branch1][=Branch2][C][=C][C][=C][N][=C][Ring1][=Branch1][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][S][Ring1][Branch1]'] is not a potentiator of the estrogen receptor-alpha-coactivator binding."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_5-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SELFIES from the DeepSMILES.\nMolecule DeepSMILES: CCcccCCCCN5C=O)NCCCOC))))))))))))on5\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C]"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SELFIES from the DeepSMILES.\nDeepSMILES: CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]']"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_2-4.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the 
DeepSMILES CCC)CCCCNCCNccccC=O)NS=O)=O)ccccNCCO))CCOCC6)))))))c[N+]=O)[O-]))c6)))))))))cOcccc[nH]ccc5c9))))))))))c6))))))CC6)))))))=CccccCl)cc6))))))C6?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N]."} {"text":"User: Can you generate the SELFIES of the molecule with the DeepSMILES CNC)CCNC=O)coccccCl)cc6c=O)c%10C%13cccccBr)c6?\nAssistant: Yes, this molecule has a SELFIES of [C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_2-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the DeepSMILES.\nMolecule DeepSMILES: CC=O)[C@]O)CC[C@]O)[C@]O)CC=CC[C@@H]O)CC[C@]6C)[C@H]%10C[C@@H]OC=O)\/C=C\\C)CC)C))))))[C@]%17%14C\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C]"} {"text":"Task: Please create a 
molecule representation based on the description.\nDescription: Generate the SELFIES from the DeepSMILES.\nDeepSMILES: ccccCOccccCNCCNcccccn6))))))CC6)))))))cc6))))))))cc6\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O]"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_4-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of CCC)OccccNC=O)NccnnCCCCCO5))))))c5))))))))cc6 can also be represented with the SELFIES representation [C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1]."} {"text":"The molecule with the DeepSMILES representation of NC=O)ccccC#CCNC=O)ccccccccc6c%10)))))))))))))))cc6 can also be represented with the SELFIES [N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_4-4.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the DeepSMILES CCC)OccccNC=O)NccnnCCCCCO5))))))c5))))))))cc6?\nAssistant: Sure, this molecule has a SELFIES of [C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1]."} {"text":"User: Can you create the SELFIES of the molecule with the DeepSMILES NC=O)ccccC#CCNC=O)ccccccccc6c%10)))))))))))))))cc6?\nAssistant: Of course, this molecule has a SELFIES of 
[N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_1-4.jsonl": "{"text":"User: Can you create the SELFIES of the molecule with the DeepSMILES CNCCccscc5c=O)n-cccccc6))))))cnncS)n95))))))))))C6?\nAssistant: Sure, this molecule has a SELFIES of [C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2]."} {"text":"User: Can you generate the SELFIES of the molecule with the DeepSMILES O=CCncccn5)CCCCC7))))))))))NccF)ccF)cc6Br?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_5-1.jsonl": "{"text":"The molecule with the SELFIES [C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C] can also be represented with the DeepSMILES representation CCcccCCCCN5C=O)NCCCOC))))))))))))on5."} {"text":"The molecule with the SELFIES representation of ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]'] can also be represented with the DeepSMILES CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_4-4.jsonl": "{"text":"User: Can you create the SELFIES of the molecule with the DeepSMILES CCNCC))C=O)CNC)C=O)cnonc5N?\nAssistant: Yes, I'm happy to help, this molecule 
has a SELFIES of [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]."} {"text":"User: Can you tell me the SELFIES of the molecule with the DeepSMILES CcccccC)c6NC=O)CNC)S=O)=O)Ccccccc6?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_4-4.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the DeepSMILES CCNC=O)CCCNC=O)cnonc5N)))))))CC6?\nAssistant: Yes, this molecule has a SELFIES of [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C]."} {"text":"User: Can you create the SELFIES of the molecule with the DeepSMILES CNC)cncN)ncCOC=O)ccccCl)cc6N))))))))))n6?\nAssistant: Sure, this molecule has a SELFIES of [C][N][Branch1][C][C][C][=N][C][Branch1][C][N][=N][C][Branch2][Ring1][Ring1][C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=N][Ring2][Ring1][Ring1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_1-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the DeepSMILES.\nMolecule DeepSMILES: COcccCNCCScnnnn5C)))))))))))ccc6OCcccccc6\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: [C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SELFIES from the DeepSMILES.\nDeepSMILES: CNC=S)NCCCccccOC))cc6))))))=N5\nConstraint: Even if you are 
not sure, you must answer with a representation without using any additional words.\nResult: [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N]"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_3-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the DeepSMILES.\nMolecule DeepSMILES: COCCCNC=O)C=O)\/C=C\/O)ccccS=O)=O)NCCOCC6)))))))cc6)))))))C5cccccCl)c6\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: [C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1]"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SELFIES from the DeepSMILES.\nDeepSMILES: COccccNC=O)NccnnCCCCCO5))))))c5))))))))cc6F\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F]"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_1-5.jsonl": "{"text":"User: Can you tell me the DeepSMILES of the molecule with the SELFIES [C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]?\nAssistant: Yes, this molecule has a DeepSMILES of COcccCNCCScnnnn5C)))))))))))ccc6OCcccccc6."} {"text":"User: Can you tell me the DeepSMILES of the molecule with the SELFIES 
[C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N]?\nAssistant: Of course, this molecule has a DeepSMILES of CNC=S)NCCCccccOC))cc6))))))=N5."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_5-0.jsonl": "{"text":"The molecule with the DeepSMILES CCcccCCCCN5C=O)NCCCOC))))))))))))on5 can also be represented with the SELFIES representation [C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C]."} {"text":"The molecule with the DeepSMILES CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58 can also be represented with the SELFIES representation ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]']."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_1-5.jsonl": "{"text":"User: Can you tell me the DeepSMILES of the molecule with the SELFIES [C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2]?\nAssistant: Sure, this molecule has a DeepSMILES of Ccccc-cnccccccc6[nH]c=O)n%10n%13)))))))))))))cc6."} {"text":"User: Can you tell me the DeepSMILES of the molecule with the SELFIES [C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2]?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of CccOCCNCCNC)CC6)))))))))cnncncOccccNC=O)cccccc6F)))))))))cc6F))))))))c96."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_5-4.jsonl": "{"text":"User: Can you create the SELFIES of the molecule with 
the DeepSMILES Ccnccccccc6nn9cC)c%13CCC=O)NCC)C?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C]."} {"text":"User: Can you generate the SELFIES of the molecule with the DeepSMILES Nccc[nH+]cccccc%106)))))))CCCC6?\nAssistant: Sure, this molecule has a SELFIES of ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]']."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_0-5.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the SELFIES [S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1]?\nAssistant: Sure, this molecule has a DeepSMILES of S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6."} {"text":"User: Can you create the DeepSMILES of the molecule with the SELFIES [C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1]?\nAssistant: Sure, this molecule has a DeepSMILES of ccccNcncCScnc[nH]n5)))))))cs5))))))cc6."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_3-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of O=CCScccc-cccco5)))))nn6))))))))NcccccF)c6 can also be represented with the SELFIES 
[O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1]."} {"text":"The molecule with the DeepSMILES CCNCCcccccc6))))))))C=O)cnonc5N can also be represented with the SELFIES representation [C][C][N][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_0-1.jsonl": "{"text":"The molecule with the SELFIES [S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1] can also be represented with the DeepSMILES S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6."} {"text":"The molecule with the SELFIES [C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1] can also be represented with the DeepSMILES representation ccccNcncCScnc[nH]n5)))))))cs5))))))cc6."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_5-0.jsonl": "{"text":"The molecule with the DeepSMILES CCNCC))C=O)CNCC))cccC)ccC)c6 can also be represented with the SELFIES representation [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2]."} {"text":"The molecule with the DeepSMILES representation of CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20 can also be 
represented with the SELFIES ['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]']."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_2-0.jsonl": "{"text":"The molecule with the DeepSMILES CCC)CCCCNCCNccccC=O)NS=O)=O)ccccNCCO))CCOCC6)))))))c[N+]=O)[O-]))c6)))))))))cOcccc[nH]ccc5c9))))))))))c6))))))CC6)))))))=CccccCl)cc6))))))C6 can also be represented with the SELFIES representation [C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N]."} {"text":"The molecule with the DeepSMILES CNC)CCNC=O)coccccCl)cc6c=O)c%10C%13cccccBr)c6 can also be represented with the SELFIES [C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_2-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the 
description.\nDescription: Create the SELFIES from the DeepSMILES.\nDeepSMILES: COcccc\/C=N\/NcncC)cs5))))))))cc6OC\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C]"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the DeepSMILES.\nDeepSMILES: CCOcccsc5C=O)OC\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C]"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_0-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6 can also be represented with the SELFIES representation [S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2]."} {"text":"The molecule with the DeepSMILES COcccOC))ccC=O)NCC=O)NCCNC)C))))))CCCCCCCC6)C8)))C6)))))))))c6 can also be represented with the SELFIES [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_3-3.jsonl": 
"{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the SELFIES.\nMolecule SELFIES: [O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: O=CCScccc-cccco5)))))nn6))))))))NcccccF)c6"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SELFIES.\nSELFIES: [C][C][N][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CCNCCcccccc6))))))))C=O)cnonc5N"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_5-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SELFIES.\nSELFIES: [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: Ccnccccccc6nn9cC)c%13CCC=O)NCC)C"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the SELFIES.\nMolecule SELFIES: ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]']\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: Nccc[nH+]cccccc%106)))))))CCCC6"}", 
"/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_3-5.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the SELFIES [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]?\nAssistant: Of course, this molecule has a DeepSMILES of Cl.O=CccccBr)cc6))))))CCNCCOCC6)))))))cccccc6."} {"text":"User: Can you create the DeepSMILES of the molecule with the SELFIES [C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2]?\nAssistant: Of course, this molecule has a DeepSMILES of CCOcccccc6CNC=O)NccccOC))cF)c6."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_5-5.jsonl": "{"text":"User: Can you tell me the DeepSMILES of the molecule with the SELFIES [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2]?\nAssistant: Sure, this molecule has a DeepSMILES of CCNCC))C=O)CNCC))cccC)ccC)c6."} {"text":"User: Can you generate the DeepSMILES of the molecule with the SELFIES ['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]']?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of 
CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_4-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the SELFIES.\nMolecule SELFIES: [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CCNCC))C=O)CNC)C=O)cnonc5N"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the SELFIES.\nSELFIES: [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CcccccC)c6NC=O)CNC)S=O)=O)Ccccccc6"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_1-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the SELFIES.\nMolecule SELFIES: [C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: Ccccc-cnccccccc6[nH]c=O)n%10n%13)))))))))))))cc6"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the SELFIES.\nMolecule SELFIES: 
[C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CccOCCNCCNC)CC6)))))))))cnncncOccccNC=O)cccccc6F)))))))))cc6F))))))))c96"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_0-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the DeepSMILES.\nMolecule DeepSMILES: S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1]"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SELFIES from the DeepSMILES.\nDeepSMILES: ccccNcncCScnc[nH]n5)))))))cs5))))))cc6\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1]"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_3-0.jsonl": "{"text":"The molecule with the 
DeepSMILES representation of COCCCNC=O)C=O)\/C=C\/O)ccccS=O)=O)NCCOCC6)))))))cc6)))))))C5cccccCl)c6 can also be represented with the SELFIES representation [C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1]."} {"text":"The molecule with the DeepSMILES representation of COccccNC=O)NccnnCCCCCO5))))))c5))))))))cc6F can also be represented with the SELFIES representation [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_1-0.jsonl": "{"text":"The molecule with the DeepSMILES Ccccc-cnccccccc6[nH]c=O)n%10n%13)))))))))))))cc6 can also be represented with the SELFIES representation [C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2]."} {"text":"The molecule with the DeepSMILES representation of CccOCCNCCNC)CC6)))))))))cnncncOccccNC=O)cccccc6F)))))))))cc6F))))))))c96 can also be represented with the SELFIES representation [C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_5-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the DeepSMILES.\nDeepSMILES: CCNCC))C=O)CNCC))cccC)ccC)c6\nConstraint: Even if you are not sure, you must 
answer with a representation without using any additional words.\nResult: [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SELFIES from the DeepSMILES.\nMolecule DeepSMILES: CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: ['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]']"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_2-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the SELFIES.\nSELFIES: [C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: 
CC=O)[C@]O)CC[C@]O)[C@]O)CC=CC[C@@H]O)CC[C@]6C)[C@H]%10C[C@@H]OC=O)\/C=C\\C)CC)C))))))[C@]%17%14C"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the SELFIES.\nSELFIES: [C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: ccccCOccccCNCCNcccccn6))))))CC6)))))))cc6))))))))cc6"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_1-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SELFIES from the DeepSMILES.\nMolecule DeepSMILES: CNCCccscc5c=O)n-cccccc6))))))cnncS)n95))))))))))C6\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2]"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SELFIES from the DeepSMILES.\nDeepSMILES: O=CCncccn5)CCCCC7))))))))))NccF)ccF)cc6Br\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br]"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_5-5.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the SELFIES 
[C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C]?\nAssistant: Yes, this molecule has a DeepSMILES of CCcccCCCCN5C=O)NCCCOC))))))))))))on5."} {"text":"User: Can you tell me the DeepSMILES of the molecule with the SELFIES ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]']?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_2-4.jsonl": "{"text":"User: Can you create the SELFIES of the molecule with the DeepSMILES CC=O)[C@]O)CC[C@]O)[C@]O)CC=CC[C@@H]O)CC[C@]6C)[C@H]%10C[C@@H]OC=O)\/C=C\\C)CC)C))))))[C@]%17%14C?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C]."} {"text":"User: Can you create the SELFIES of the molecule with the DeepSMILES ccccCOccccCNCCNcccccn6))))))CC6)))))))cc6))))))))cc6?\nAssistant: Sure, this molecule has a SELFIES of [C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_3-4.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the DeepSMILES O=CCScccc-cccco5)))))nn6))))))))NcccccF)c6?\nAssistant: Sure, this molecule has a SELFIES of 
[O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1]."} {"text":"User: Can you tell me the SELFIES of the molecule with the DeepSMILES CCNCCcccccc6))))))))C=O)cnonc5N?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [C][C][N][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_0-0.jsonl": "{"text":"The molecule with the DeepSMILES S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6 can also be represented with the SELFIES representation [S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1]."} {"text":"The molecule with the DeepSMILES representation of ccccNcncCScnc[nH]n5)))))))cs5))))))cc6 can also be represented with the SELFIES [C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_2-5.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the SELFIES 
[C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N]?\nAssistant: Of course, this molecule has a DeepSMILES of CCC)CCCCNCCNccccC=O)NS=O)=O)ccccNCCO))CCOCC6)))))))c[N+]=O)[O-]))c6)))))))))cOcccc[nH]ccc5c9))))))))))c6))))))CC6)))))))=CccccCl)cc6))))))C6."} {"text":"User: Can you create the DeepSMILES of the molecule with the SELFIES [C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1]?\nAssistant: Yes, this molecule has a DeepSMILES of CNC)CCNC=O)coccccCl)cc6c=O)c%10C%13cccccBr)c6."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_5-5.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the SELFIES [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C]?\nAssistant: Sure, this molecule has a DeepSMILES of Ccnccccccc6nn9cC)c%13CCC=O)NCC)C."} {"text":"User: Can you tell me the DeepSMILES of the molecule with the SELFIES ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]']?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of Nccc[nH+]cccccc%106)))))))CCCC6."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_3-5.jsonl": "{"text":"User: Can you tell me the 
DeepSMILES of the molecule with the SELFIES [C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1]?\nAssistant: Of course, this molecule has a DeepSMILES of COCCCNC=O)C=O)\/C=C\/O)ccccS=O)=O)NCCOCC6)))))))cc6)))))))C5cccccCl)c6."} {"text":"User: Can you create the DeepSMILES of the molecule with the SELFIES [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F]?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of COccccNC=O)NccnnCCCCCO5))))))c5))))))))cc6F."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_2-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of CC=O)[C@]O)CC[C@]O)[C@]O)CC=CC[C@@H]O)CC[C@]6C)[C@H]%10C[C@@H]OC=O)\/C=C\\C)CC)C))))))[C@]%17%14C can also be represented with the SELFIES [C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C]."} {"text":"The molecule with the DeepSMILES ccccCOccccCNCCNcccccn6))))))CC6)))))))cc6))))))))cc6 can also be represented with the SELFIES [C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_1-4.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the DeepSMILES 
Ccccc-cnccccccc6[nH]c=O)n%10n%13)))))))))))))cc6?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2]."} {"text":"User: Can you generate the SELFIES of the molecule with the DeepSMILES CccOCCNCCNC)CC6)))))))))cnncncOccccNC=O)cccccc6F)))))))))cc6F))))))))c96?\nAssistant: Of course, this molecule has a SELFIES of [C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_2-0.jsonl": "{"text":"The molecule with the DeepSMILES COcccc\/C=N\/NcncC)cs5))))))))cc6OC can also be represented with the SELFIES representation [C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C]."} {"text":"The molecule with the DeepSMILES representation of CCOcccsc5C=O)OC can also be represented with the SELFIES [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_4-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the DeepSMILES.\nMolecule DeepSMILES: CCNCC))C=O)CNC)C=O)cnonc5N\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the 
DeepSMILES.\nMolecule DeepSMILES: CcccccC)c6NC=O)CNC)S=O)=O)Ccccccc6\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_0-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the SELFIES.\nMolecule SELFIES: [S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the SELFIES.\nSELFIES: [C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: ccccNcncCScnc[nH]n5)))))))cs5))))))cc6"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_4-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the DeepSMILES.\nMolecule 
DeepSMILES: CCC)OccccNC=O)NccnnCCCCCO5))))))c5))))))))cc6\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the DeepSMILES.\nDeepSMILES: NC=O)ccccC#CCNC=O)ccccccccc6c%10)))))))))))))))cc6\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: [N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1]"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_3-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the DeepSMILES.\nMolecule DeepSMILES: O=CCScccc-cccco5)))))nn6))))))))NcccccF)c6\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1]"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the DeepSMILES.\nMolecule DeepSMILES: CCNCCcccccc6))))))))C=O)cnonc5N\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][C][N][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]"}", 
"/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_2-1.jsonl": "{"text":"The molecule with the SELFIES [C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C] can also be represented with the DeepSMILES COcccc\/C=N\/NcncC)cs5))))))))cc6OC."} {"text":"The molecule with the SELFIES representation of [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C] can also be represented with the DeepSMILES CCOcccsc5C=O)OC."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_5-4.jsonl": "{"text":"User: Can you create the SELFIES of the molecule with the DeepSMILES CCNCC))C=O)CNCC))cccC)ccC)c6?\nAssistant: Yes, this molecule has a SELFIES of [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2]."} {"text":"User: Can you tell me the SELFIES of the molecule with the DeepSMILES CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20?\nAssistant: Of course, this molecule has a SELFIES of ['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]']."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_4-5.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the SELFIES 
[C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1]?\nAssistant: Sure, this molecule has a DeepSMILES of CCC)OccccNC=O)NccnnCCCCCO5))))))c5))))))))cc6."} {"text":"User: Can you create the DeepSMILES of the molecule with the SELFIES [N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1]?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of NC=O)ccccC#CCNC=O)ccccccccc6c%10)))))))))))))))cc6."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_4-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of CCNCC))C=O)CNC)C=O)cnonc5N can also be represented with the SELFIES [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]."} {"text":"The molecule with the DeepSMILES representation of CcccccC)c6NC=O)CNC)S=O)=O)Ccccccc6 can also be represented with the SELFIES representation [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_5-1.jsonl": "{"text":"The molecule with the SELFIES [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C] can also be represented with the DeepSMILES representation Ccnccccccc6nn9cC)c%13CCC=O)NCC)C."} {"text":"The molecule with the SELFIES representation of ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]'] can also be represented with the DeepSMILES 
Nccc[nH+]cccccc%106)))))))CCCC6."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_2-1.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N] can also be represented with the DeepSMILES CCC)CCCCNCCNccccC=O)NS=O)=O)ccccNCCO))CCOCC6)))))))c[N+]=O)[O-]))c6)))))))))cOcccc[nH]ccc5c9))))))))))c6))))))CC6)))))))=CccccCl)cc6))))))C6."} {"text":"The molecule with the SELFIES representation of [C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1] can also be represented with the DeepSMILES CNC)CCNC=O)coccccCl)cc6c=O)c%10C%13cccccBr)c6."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_0-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C can also be represented with the SELFIES representation [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C]."} {"text":"The molecule with the DeepSMILES representation of 
CCOC=O)ccccc-[n+]c-cccccc6))))))cc-cccccc6))))))cc6-cccccc6))))))))))))c6.[O-][Cl+3][O-])[O-])[O-] can also be represented with the SELFIES representation [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_0-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the SELFIES.\nSELFIES: [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the SELFIES.\nMolecule SELFIES: [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CCOC=O)ccccc-[n+]c-cccccc6))))))cc-cccccc6))))))cc6-cccccc6))))))))))))c6.[O-][Cl+3][O-])[O-])[O-]"}", 
"/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_1-4.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the DeepSMILES COcccCNCCScnnnn5C)))))))))))ccc6OCcccccc6?\nAssistant: Of course, this molecule has a SELFIES of [C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]."} {"text":"User: Can you generate the SELFIES of the molecule with the DeepSMILES CNC=S)NCCCccccOC))cc6))))))=N5?\nAssistant: Sure, this molecule has a SELFIES of [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_2-5.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the SELFIES [C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C]?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of CC=O)[C@]O)CC[C@]O)[C@]O)CC=CC[C@@H]O)CC[C@]6C)[C@H]%10C[C@@H]OC=O)\/C=C\\C)CC)C))))))[C@]%17%14C."} {"text":"User: Can you generate the DeepSMILES of the molecule with the SELFIES [C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O]?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of ccccCOccccCNCCNcccccn6))))))CC6)))))))cc6))))))))cc6."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_1-1.jsonl": "{"text":"The molecule with the SELFIES 
[C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1] can also be represented with the DeepSMILES COcccCNCCScnnnn5C)))))))))))ccc6OCcccccc6."} {"text":"The molecule with the SELFIES representation of [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N] can also be represented with the DeepSMILES representation CNC=S)NCCCccccOC))cc6))))))=N5."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_5-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the DeepSMILES.\nMolecule DeepSMILES: Ccnccccccc6nn9cC)c%13CCC=O)NCC)C\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C]"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the DeepSMILES.\nDeepSMILES: Nccc[nH+]cccccc%106)))))))CCCC6\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]']"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_5-1.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2] can also be represented with the DeepSMILES CCNCC))C=O)CNCC))cccC)ccC)c6."} {"text":"The molecule with the SELFIES representation of 
['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]'] can also be represented with the DeepSMILES representation CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_4-1.jsonl": "{"text":"The molecule with the SELFIES [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C] can also be represented with the DeepSMILES CCNC=O)CCCNC=O)cnonc5N)))))))CC6."} {"text":"The molecule with the SELFIES representation of [C][N][Branch1][C][C][C][=N][C][Branch1][C][N][=N][C][Branch2][Ring1][Ring1][C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=N][Ring2][Ring1][Ring1] can also be represented with the DeepSMILES CNC)cncN)ncCOC=O)ccccCl)cc6N))))))))))n6."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_3-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SELFIES.\nMolecule SELFIES: [C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1]\nConstraint: Even if you are not sure, 
you must answer with a representation without using any other words.\nResult: COCCCNC=O)C=O)\/C=C\/O)ccccS=O)=O)NCCOCC6)))))))cc6)))))))C5cccccCl)c6"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SELFIES.\nMolecule SELFIES: [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: COccccNC=O)NccnnCCCCCO5))))))c5))))))))cc6F"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_3-4.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the DeepSMILES Cl.O=CccccBr)cc6))))))CCNCCOCC6)))))))cccccc6?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]."} {"text":"User: Can you tell me the SELFIES of the molecule with the DeepSMILES CCOcccccc6CNC=O)NccccOC))cF)c6?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_1-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the SELFIES.\nSELFIES: [C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2]\nConstraint: Even if you are uncertain, you must answer with a representation without using any 
other words.\nResult: CNCCccscc5c=O)n-cccccc6))))))cnncS)n95))))))))))C6"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the SELFIES.\nSELFIES: [O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: O=CCncccn5)CCCCC7))))))))))NccF)ccF)cc6Br"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_0-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SELFIES from the DeepSMILES.\nMolecule DeepSMILES: S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2]"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the DeepSMILES.\nDeepSMILES: COcccOC))ccC=O)NCC=O)NCCNC)C))))))CCCCCCCC6)C8)))C6)))))))))c6\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: 
[C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C]"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_5-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of Ccnccccccc6nn9cC)c%13CCC=O)NCC)C can also be represented with the SELFIES [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C]."} {"text":"The molecule with the DeepSMILES representation of Nccc[nH+]cccccc%106)))))))CCCC6 can also be represented with the SELFIES ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]']."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_0-1.jsonl": "{"text":"The molecule with the SELFIES [S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2] can also be represented with the DeepSMILES representation S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6."} {"text":"The molecule with the SELFIES representation of [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C] can also be represented with 
the DeepSMILES representation COcccOC))ccC=O)NCC=O)NCCNC)C))))))CCCCCCCC6)C8)))C6)))))))))c6."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_2-4.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the DeepSMILES COcccc\/C=N\/NcncC)cs5))))))))cc6OC?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C]."} {"text":"User: Can you tell me the SELFIES of the molecule with the DeepSMILES CCOcccsc5C=O)OC?\nAssistant: Sure, this molecule has a SELFIES of [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_2-1.jsonl": "{"text":"The molecule with the SELFIES [C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C] can also be represented with the DeepSMILES CC=O)[C@]O)CC[C@]O)[C@]O)CC=CC[C@@H]O)CC[C@]6C)[C@H]%10C[C@@H]OC=O)\/C=C\\C)CC)C))))))[C@]%17%14C."} {"text":"The molecule with the SELFIES representation of [C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O] can also be represented with the DeepSMILES ccccCOccccCNCCNcccccn6))))))CC6)))))))cc6))))))))cc6."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_1-1.jsonl": "{"text":"The molecule with the SELFIES 
[C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2] can also be represented with the DeepSMILES CNCCccscc5c=O)n-cccccc6))))))cnncS)n95))))))))))C6."} {"text":"The molecule with the SELFIES [O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br] can also be represented with the DeepSMILES O=CCncccn5)CCCCC7))))))))))NccF)ccF)cc6Br."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_3-5.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the SELFIES [O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1]?\nAssistant: Yes, this molecule has a DeepSMILES of O=CCScccc-cccco5)))))nn6))))))))NcccccF)c6."} {"text":"User: Can you tell me the DeepSMILES of the molecule with the SELFIES [C][C][N][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]?\nAssistant: Yes, this molecule has a DeepSMILES of CCNCCcccccc6))))))))C=O)cnonc5N."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_3-1.jsonl": "{"text":"The molecule with the SELFIES [C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1] can also be represented with the DeepSMILES COCCCNC=O)C=O)\/C=C\/O)ccccS=O)=O)NCCOCC6)))))))cc6)))))))C5cccccCl)c6."} {"text":"The molecule with the SELFIES 
[C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F] can also be represented with the DeepSMILES COccccNC=O)NccnnCCCCCO5))))))c5))))))))cc6F."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_0-5.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the SELFIES [S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2]?\nAssistant: Of course, this molecule has a DeepSMILES of S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6."} {"text":"User: Can you tell me the DeepSMILES of the molecule with the SELFIES [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C]?\nAssistant: Of course, this molecule has a DeepSMILES of COcccOC))ccC=O)NCC=O)NCCNC)C))))))CCCCCCCC6)C8)))C6)))))))))c6."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_0-4.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the DeepSMILES S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of 
[S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2]."} {"text":"User: Can you tell me the SELFIES of the molecule with the DeepSMILES COcccOC))ccC=O)NCC=O)NCCNC)C))))))CCCCCCCC6)C8)))C6)))))))))c6?\nAssistant: Yes, this molecule has a SELFIES of [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_1-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SELFIES.\nMolecule SELFIES: [C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COcccCNCCScnnnn5C)))))))))))ccc6OCcccccc6"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the SELFIES.\nSELFIES: [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CNC=S)NCCCccccOC))cc6))))))=N5"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_5-3.jsonl": "{"text":"Task: 
Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the SELFIES.\nSELFIES: [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CCNCC))C=O)CNCC))cccC)ccC)c6"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the SELFIES.\nSELFIES: ['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]']\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_0-5.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the SELFIES [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C]?\nAssistant: Sure, this molecule has a DeepSMILES of OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C."} 
{"text":"User: Can you generate the DeepSMILES of the molecule with the SELFIES [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1]?\nAssistant: Of course, this molecule has a DeepSMILES of CCOC=O)ccccc-[n+]c-cccccc6))))))cc-cccccc6))))))cc6-cccccc6))))))))))))c6.[O-][Cl+3][O-])[O-])[O-]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_4-5.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the SELFIES [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C]?\nAssistant: Of course, this molecule has a DeepSMILES of CCNC=O)CCCNC=O)cnonc5N)))))))CC6."} {"text":"User: Can you generate the DeepSMILES of the molecule with the SELFIES [C][N][Branch1][C][C][C][=N][C][Branch1][C][N][=N][C][Branch2][Ring1][Ring1][C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=N][Ring2][Ring1][Ring1]?\nAssistant: Of course, this molecule has a DeepSMILES of CNC)cncN)ncCOC=O)ccccCl)cc6N))))))))))n6."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_1-0.jsonl": "{"text":"The molecule with the DeepSMILES COcccCNCCScnnnn5C)))))))))))ccc6OCcccccc6 can also be represented with the SELFIES [C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]."} {"text":"The molecule with the DeepSMILES representation of CNC=S)NCCCccccOC))cc6))))))=N5 can also be represented with the SELFIES [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N]."}", 
"/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_0-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SELFIES from the DeepSMILES.\nDeepSMILES: OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C]"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the DeepSMILES.\nMolecule DeepSMILES: CCOC=O)ccccc-[n+]c-cccccc6))))))cc-cccccc6))))))cc6-cccccc6))))))))))))c6.[O-][Cl+3][O-])[O-])[O-]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1]"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_4-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SELFIES from the DeepSMILES.\nDeepSMILES: CCNC=O)CCCNC=O)cnonc5N)))))))CC6\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C]"} {"text":"Task: Please generate a 
molecule representation based on the description.\nDescription: Generate the SELFIES from the DeepSMILES.\nMolecule DeepSMILES: CNC)cncN)ncCOC=O)ccccCl)cc6N))))))))))n6\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: [C][N][Branch1][C][C][C][=N][C][Branch1][C][N][=N][C][Branch2][Ring1][Ring1][C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=N][Ring2][Ring1][Ring1]"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_4-1.jsonl": "{"text":"The molecule with the SELFIES [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N] can also be represented with the DeepSMILES CCNCC))C=O)CNC)C=O)cnonc5N."} {"text":"The molecule with the SELFIES [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1] can also be represented with the DeepSMILES CcccccC)c6NC=O)CNC)S=O)=O)Ccccccc6."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_5-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the SELFIES.\nSELFIES: [C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CCcccCCCCN5C=O)NCCCOC))))))))))))on5"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SELFIES.\nMolecule SELFIES: ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]']\nConstraint: Even if you are not sure, you 
must answer with a representation without using any additional words.\nResult: CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_2-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SELFIES from the DeepSMILES.\nMolecule DeepSMILES: CCC)CCCCNCCNccccC=O)NS=O)=O)ccccNCCO))CCOCC6)))))))c[N+]=O)[O-]))c6)))))))))cOcccc[nH]ccc5c9))))))))))c6))))))CC6)))))))=CccccCl)cc6))))))C6\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the DeepSMILES.\nDeepSMILES: CNC)CCNC=O)coccccCl)cc6c=O)c%10C%13cccccBr)c6\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1]"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_1-1.jsonl": "{"text":"The molecule with the SELFIES 
[C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2] can also be represented with the DeepSMILES Ccccc-cnccccccc6[nH]c=O)n%10n%13)))))))))))))cc6."} {"text":"The molecule with the SELFIES [C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2] can also be represented with the DeepSMILES CccOCCNCCNC)CC6)))))))))cnncncOccccNC=O)cccccc6F)))))))))cc6F))))))))c96."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_0-1.jsonl": "{"text":"The molecule with the SELFIES representation of [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C] can also be represented with the DeepSMILES OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C."} {"text":"The molecule with the SELFIES representation of [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1] can also be represented with the DeepSMILES CCOC=O)ccccc-[n+]c-cccccc6))))))cc-cccccc6))))))cc6-cccccc6))))))))))))c6.[O-][Cl+3][O-])[O-])[O-]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_3-4.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the DeepSMILES 
COCCCNC=O)C=O)\/C=C\/O)ccccS=O)=O)NCCOCC6)))))))cc6)))))))C5cccccCl)c6?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1]."} {"text":"User: Can you generate the SELFIES of the molecule with the DeepSMILES COccccNC=O)NccnnCCCCCO5))))))c5))))))))cc6F?\nAssistant: Sure, this molecule has a SELFIES of [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_0-4.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the DeepSMILES OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C?\nAssistant: Yes, this molecule has a SELFIES of [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C]."} {"text":"User: Can you generate the SELFIES of the molecule with the DeepSMILES CCOC=O)ccccc-[n+]c-cccccc6))))))cc-cccccc6))))))cc6-cccccc6))))))))))))c6.[O-][Cl+3][O-])[O-])[O-]?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1]."}", 
"/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_1-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of CNCCccscc5c=O)n-cccccc6))))))cnncS)n95))))))))))C6 can also be represented with the SELFIES representation [C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2]."} {"text":"The molecule with the DeepSMILES O=CCncccn5)CCCCC7))))))))))NccF)ccF)cc6Br can also be represented with the SELFIES representation [O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_4-5.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the SELFIES [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]?\nAssistant: Sure, this molecule has a DeepSMILES of CCNCC))C=O)CNC)C=O)cnonc5N."} {"text":"User: Can you create the DeepSMILES of the molecule with the SELFIES [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]?\nAssistant: Yes, this molecule has a DeepSMILES of CcccccC)c6NC=O)CNC)S=O)=O)Ccccccc6."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_4-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the SELFIES.\nMolecule SELFIES: [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C]\nConstraint: Even if you are not sure, you must answer with a 
representation without using any other words.\nResult: CCNC=O)CCCNC=O)cnonc5N)))))))CC6"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the SELFIES.\nMolecule SELFIES: [C][N][Branch1][C][C][C][=N][C][Branch1][C][N][=N][C][Branch2][Ring1][Ring1][C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=N][Ring2][Ring1][Ring1]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CNC)cncN)ncCOC=O)ccccCl)cc6N))))))))))n6"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_4-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SELFIES.\nSELFIES: [C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CCC)OccccNC=O)NccnnCCCCCO5))))))c5))))))))cc6"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SELFIES.\nMolecule SELFIES: [N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: NC=O)ccccC#CCNC=O)ccccccccc6c%10)))))))))))))))cc6"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_3-1.jsonl": "{"text":"The molecule with the SELFIES 
[Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1] can also be represented with the DeepSMILES Cl.O=CccccBr)cc6))))))CCNCCOCC6)))))))cccccc6."} {"text":"The molecule with the SELFIES representation of [C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2] can also be represented with the DeepSMILES representation CCOcccccc6CNC=O)NccccOC))cF)c6."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_2-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the SELFIES.\nMolecule SELFIES: [C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: COcccc\/C=N\/NcncC)cs5))))))))cc6OC"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SELFIES.\nMolecule SELFIES: [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CCOcccsc5C=O)OC"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_3-1.jsonl": "{"text":"The molecule with the SELFIES [O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1] can also be represented with the DeepSMILES representation O=CCScccc-cccco5)))))nn6))))))))NcccccF)c6."} {"text":"The molecule with the SELFIES representation of 
[C][C][N][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N] can also be represented with the DeepSMILES representation CCNCCcccccc6))))))))C=O)cnonc5N."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_0-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the SELFIES.\nSELFIES: [S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the SELFIES.\nSELFIES: [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: COcccOC))ccC=O)NCC=O)NCCNC)C))))))CCCCCCCC6)C8)))C6)))))))))c6"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_1-5.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the SELFIES 
[C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2]?\nAssistant: Yes, this molecule has a DeepSMILES of CNCCccscc5c=O)n-cccccc6))))))cnncS)n95))))))))))C6."} {"text":"User: Can you tell me the DeepSMILES of the molecule with the SELFIES [O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br]?\nAssistant: Yes, this molecule has a DeepSMILES of O=CCncccn5)CCCCC7))))))))))NccF)ccF)cc6Br."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_4-1.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1] can also be represented with the DeepSMILES CCC)OccccNC=O)NccnnCCCCCO5))))))c5))))))))cc6."} {"text":"The molecule with the SELFIES representation of [N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1] can also be represented with the DeepSMILES representation NC=O)ccccC#CCNC=O)ccccccccc6c%10)))))))))))))))cc6."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_5-4.jsonl": "{"text":"User: Can you create the SELFIES of the molecule with the DeepSMILES CCcccCCCCN5C=O)NCCCOC))))))))))))on5?\nAssistant: Yes, this molecule has a SELFIES of [C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C]."} {"text":"User: Can you generate the SELFIES of the molecule with the DeepSMILES 
CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58?\nAssistant: Sure, this molecule has a SELFIES of ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]']."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/valid_2-5.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the SELFIES [C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C]?\nAssistant: Sure, this molecule has a DeepSMILES of COcccc\/C=N\/NcncC)cs5))))))))cc6OC."} {"text":"User: Can you tell me the DeepSMILES of the molecule with the SELFIES [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C]?\nAssistant: Sure, this molecule has a DeepSMILES of CCOcccsc5C=O)OC."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_1-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the DeepSMILES.\nDeepSMILES: Ccccc-cnccccccc6[nH]c=O)n%10n%13)))))))))))))cc6\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the DeepSMILES.\nMolecule DeepSMILES: CccOCCNCCNC)CC6)))))))))cnncncOccccNC=O)cccccc6F)))))))))cc6F))))))))c96\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: 
[C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2]"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_0-4.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the DeepSMILES S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6?\nAssistant: Yes, this molecule has a SELFIES of [S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1]."} {"text":"User: Can you generate the SELFIES of the molecule with the DeepSMILES ccccNcncCScnc[nH]n5)))))))cs5))))))cc6?\nAssistant: Of course, this molecule has a SELFIES of [C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_3-0.jsonl": "{"text":"The molecule with the DeepSMILES Cl.O=CccccBr)cc6))))))CCNCCOCC6)))))))cccccc6 can also be represented with the SELFIES representation [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]."} {"text":"The molecule with the DeepSMILES representation of CCOcccccc6CNC=O)NccccOC))cF)c6 can also be represented with the SELFIES representation 
[C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/test_2-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the SELFIES.\nSELFIES: [C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CCC)CCCCNCCNccccC=O)NS=O)=O)ccccNCCO))CCOCC6)))))))c[N+]=O)[O-]))c6)))))))))cOcccc[nH]ccc5c9))))))))))c6))))))CC6)))))))=CccccCl)cc6))))))C6"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SELFIES.\nSELFIES: [C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CNC)CCNC=O)coccccCl)cc6c=O)c%10C%13cccccBr)c6"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_4-0.jsonl": "{"text":"The molecule with the DeepSMILES CCNC=O)CCCNC=O)cnonc5N)))))))CC6 can also be represented with the SELFIES 
representation [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C]."} {"text":"The molecule with the DeepSMILES CNC)cncN)ncCOC=O)ccccCl)cc6N))))))))))n6 can also be represented with the SELFIES representation [C][N][Branch1][C][C][C][=N][C][Branch1][C][N][=N][C][Branch2][Ring1][Ring1][C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=N][Ring2][Ring1][Ring1]."}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_3-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SELFIES from the DeepSMILES.\nDeepSMILES: Cl.O=CccccBr)cc6))))))CCNCCOCC6)))))))cccccc6\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the DeepSMILES.\nMolecule DeepSMILES: CCOcccccc6CNC=O)NccccOC))cF)c6\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2]"}", "/scratch/micpie/export/mol_repr_transl_selfies_deepsmiles/train_3-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the SELFIES.\nSELFIES: [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]\nConstraint: Even if you are not 
sure, you must answer with a representation without using any other words.\nResult: Cl.O=CccccBr)cc6))))))CCNCCOCC6)))))))cccccc6"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the SELFIES.\nSELFIES: [C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CCOcccccc6CNC=O)NccccOC))cF)c6"}", "/scratch/micpie/export/mona/test_0-10.jsonl": "{"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should have a spectral entropy of 2.314 nat.\nAssistant: Understood, this SMILES represents a molecule that has a spectral entropy of 2.314 nat: Cc1cccc(C)c1OCC(C)N"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should have a spectral entropy of 1.251 nat.\nAssistant: Understood, this InChI represents a molecule that has a spectral entropy of 1.251 nat: InChI=1S\/C10H10O3\/c1-6-5-7-3-2-4-8(11)9(7)10(12)13-6\/h2-4,6,11H,5H2,1H3"}", "/scratch/micpie/export/mona/valid_0-8.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that has a spectral entropy of 4.125 nat.\nAssistant: This is a molecule that has a spectral entropy of 4.125 nat: CO\/N=C(\\C(O)=N[C@@H]1C(=O)N2C(C(=O)O)=C(COC(=N)O)CS[C@H]12)c1ccco1"} {"text":"User: I'm looking for the canonical SMILES of a molecule that has a spectral entropy of 0.000 nat.\nAssistant: This is a molecule that has a spectral entropy of 0.000 nat: CC[C@@H](C)[C@@H](O)[C@H](C[C@@H](C)CCCCCC[C@@H](O)C[C@H](O)CN)OC(=O)CC(CC(=O)O)C(=O)O"}", "/scratch/micpie/export/mona/train_0-8.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that has a spectral entropy of 1.858 nat.\nAssistant: This is a molecule that has a spectral entropy of 1.858 nat: N=C(O)CS(=O)C(c1ccccc1)c1ccccc1"} {"text":"User: I'm searching for the canonical SMILES of a molecule that has a spectral entropy of 1.614 nat.\nAssistant: This is a molecule that has a spectral entropy of 1.614 nat: COc1cc(O)c2c(c1)C(=O)C1=C(C2=O)[C@H](O)[C@@H](O)[C@@](C)(O)[C@@H]1O"}", "/scratch/micpie/export/mona/test_0-5.jsonl": "{"text":"Task: Please give me a molecule canonical SMILES based on the text description.\nDescription: A molecule that has a spectral entropy of 2.314 nat.\nResult: Cc1cccc(C)c1OCC(C)N"} {"text":"Task: Please give me a InChI based on the text description below.\nDescription: A molecule that has a spectral entropy of 1.251 nat.\nResult: InChI=1S\/C10H10O3\/c1-6-5-7-3-2-4-8(11)9(7)10(12)13-6\/h2-4,6,11H,5H2,1H3"}", "/scratch/micpie/export/mona/valid_0-9.jsonl": "{"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very interesting. 
Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should have a spectral entropy of 4.125 nat.\nAssistant: Got it, this InChI represents a molecule that has a spectral entropy of 4.125 nat: InChI=1S\/C16H16N4O8S\/c1-26-19-9(8-3-2-4-27-8)12(21)18-10-13(22)20-11(15(23)24)7(5-28-16(17)25)6-29-14(10)20\/h2-4,10,14H,5-6H2,1H3,(H2,17,25)(H,18,21)(H,23,24)\/b19-9-\/t10-,14-\/m1\/s1"} {"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should have a spectral entropy of 0.000 nat.\nAssistant: Got it, here you go, this canonical SMILES represents a molecule that has a spectral entropy of 0.000 nat: CC[C@@H](C)[C@@H](O)[C@H](C[C@@H](C)CCCCCC[C@@H](O)C[C@H](O)CN)OC(=O)CC(CC(=O)O)C(=O)O"}", "/scratch/micpie/export/mona/test_0-1.jsonl": "{"text":"Based on the InChI InChI=1S\/C11H17NO\/c1-8-5-4-6-9(2)11(8)13-7-10(3)12\/h4-6,10H,7,12H2,1-3H3, the molecule has a spectral entropy of 2.314 nat."} {"text":"Based on the DeepSMILES CCCcccccO)c6C=O)O%10, the molecule has a spectral entropy of 1.251 nat."}", "/scratch/micpie/export/mona/valid_0-0.jsonl": "{"text":"The molecule with the SELFIES [C][O][\/N][=C][Branch2][Ring2][Branch1][\\C][Branch1][C][O][=N][C@@H1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][=Branch1][C][=O][O][=C][Branch1][Branch2][C][O][C][=Branch1][C][=N][O][C][S][C@H1][Ring1][P][Ring1][=C][C][=C][C][=C][O][Ring1][Branch1] has a spectral entropy of 4.125 nat."} {"text":"The molecule with the InChI InChI=1S\/C25H47NO9\/c1-4-17(3)24(32)21(35-23(31)13-18(25(33)34)12-22(29)30)11-16(2)9-7-5-6-8-10-19(27)14-20(28)15-26\/h16-21,24,27-28,32H,4-15,26H2,1-3H3,(H,29,30)(H,33,34)\/t16-,17+,18?,19+,20-,21-,24+\/m0\/s1 has a spectral entropy of 0.000 nat."}", "/scratch/micpie/export/mona/test_0-2.jsonl": "{"text":"The InChI InChI=1S\/C11H17NO\/c1-8-5-4-6-9(2)11(8)13-7-10(3)12\/h4-6,10H,7,12H2,1-3H3 
represents a molecule that has a spectral entropy of 2.314 nat."} {"text":"The DeepSMILES CCCcccccO)c6C=O)O%10 represents a molecule with a spectral entropy of 1.251 nat."}", "/scratch/micpie/export/mona/valid_0-10.jsonl": "{"text":"User: I want to generate a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should have a spectral entropy of 4.125 nat.\nAssistant: Understood, this canonical SMILES represents a molecule that has a spectral entropy of 4.125 nat: CO\/N=C(\\C(O)=N[C@@H]1C(=O)N2C(C(=O)O)=C(COC(=N)O)CS[C@H]12)c1ccco1"} {"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should have a spectral entropy of 0.000 nat.\nAssistant: Ok, this DeepSMILES represents a molecule that has a spectral entropy of 0.000 nat: CC[C@@H]C)[C@@H]O)[C@H]C[C@@H]C)CCCCCC[C@@H]O)C[C@H]O)CN)))))))))))))OC=O)CCCC=O)O)))C=O)O"}", "/scratch/micpie/export/mona/train_0-6.jsonl": "{"text":"User: Can you tell me the spectral entropy in nat of the molecule with the SELFIES [N][=C][Branch1][C][O][C][S][=Branch1][C][=O][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]?\nAssistant: Sure, this molecule has a spectral entropy of 1.858 nat."} {"text":"User: Can you tell me the spectral entropy in nat of the molecule with the SELFIES [C][O][C][=C][C][Branch1][C][O][=C][C][=Branch1][Ring2][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=C][Branch1][Branch1][C][Ring1][Branch2][=O][C@H1][Branch1][C][O][C@@H1][Branch1][C][O][C@@][Branch1][C][C][Branch1][C][O][C@@H1][Ring1][N][O]?\nAssistant: Of course, this molecule has a spectral entropy of 1.614 nat."}", "/scratch/micpie/export/mona/valid_0-6.jsonl": "{"text":"User: Can you tell me the spectral entropy in nat of the molecule with the SMILES CO\/N=C(\\C(O)=N[C@@H]1C(=O)N2C(C(=O)O)=C(COC(=N)O)CS[C@H]12)c1ccco1?\nAssistant: Yes, I'm happy to 
help, this molecule has a spectral entropy of 4.125 nat."} {"text":"User: Can you tell me the spectral entropy in nat of the molecule with the DeepSMILES CC[C@@H]C)[C@@H]O)[C@H]C[C@@H]C)CCCCCC[C@@H]O)C[C@H]O)CN)))))))))))))OC=O)CCCC=O)O)))C=O)O?\nAssistant: Yes, this molecule has a spectral entropy of 0.000 nat."}", "/scratch/micpie/export/mona/test_0-9.jsonl": "{"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should have a spectral entropy of 2.314 nat.\nAssistant: Ok, this canonical SMILES represents a molecule that has a spectral entropy of 2.314 nat: Cc1cccc(C)c1OCC(C)N"} {"text":"User: I want to create a SELFIES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should have a spectral entropy of 1.251 nat.\nAssistant: Ok, here you go, this SELFIES represents a molecule that has a spectral entropy of 1.251 nat: [C][C][C][C][=C][C][=C][C][Branch1][C][O][=C][Ring1][#Branch1][C][=Branch1][C][=O][O][Ring1][N]"}", "/scratch/micpie/export/mona/test_0-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of CcccccC)c6OCCC)N has a spectral entropy of 2.314 nat."} {"text":"The molecule with the SMILES CC1Cc2cccc(O)c2C(=O)O1 has a spectral entropy of 1.251 nat."}", "/scratch/micpie/export/mona/valid_0-7.jsonl": "{"text":"User: Can you create the DeepSMILES of a molecule that has a spectral entropy of 4.125 nat?\nAssistant: Yes, I'm happy to help, here you go: CO\/N=C\\CO)=N[C@@H]C=O)NCC=O)O))=CCOC=N)O))))CS[C@H]86))))))))))cccco5"} {"text":"User: Can you create the SMILES of a molecule that has a spectral entropy of 0.000 nat?\nAssistant: Of course, here you go: CC[C@@H](C)[C@@H](O)[C@H](C[C@@H](C)CCCCCC[C@@H](O)C[C@H](O)CN)OC(=O)CC(CC(=O)O)C(=O)O"}", "/scratch/micpie/export/mona/test_0-3.jsonl": "{"text":"The molecule with the DeepSMILES 
CcccccC)c6OCCC)N has a spectral entropy of 2.314 nat."} {"text":"The molecule with the InChI InChI=1S\/C10H10O3\/c1-6-5-7-3-2-4-8(11)9(7)10(12)13-6\/h2-4,6,11H,5H2,1H3 has a spectral entropy of 1.251 nat."}", "/scratch/micpie/export/mona/train_0-0.jsonl": "{"text":"The molecule with the SELFIES [N][=C][Branch1][C][O][C][S][=Branch1][C][=O][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1] has a spectral entropy of 1.858 nat."} {"text":"The molecule with the SELFIES representation of [C][O][C][=C][C][Branch1][C][O][=C][C][=Branch1][Ring2][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=C][Branch1][Branch1][C][Ring1][Branch2][=O][C@H1][Branch1][C][O][C@@H1][Branch1][C][O][C@@][Branch1][C][C][Branch1][C][O][C@@H1][Ring1][N][O] has a spectral entropy of 1.614 nat."}", "/scratch/micpie/export/mona/test_0-6.jsonl": "{"text":"User: Can you derive the spectral entropy in nat of the molecule with the DeepSMILES CcccccC)c6OCCC)N?\nAssistant: Yes, I'm happy to help, this molecule has a spectral entropy of 2.314 nat."} {"text":"User: Can you estimate the spectral entropy in nat of the molecule with the InChI InChI=1S\/C10H10O3\/c1-6-5-7-3-2-4-8(11)9(7)10(12)13-6\/h2-4,6,11H,5H2,1H3?\nAssistant: Yes, I'm happy to help, this molecule has a spectral entropy of 1.251 nat."}", "/scratch/micpie/export/mona/train_0-10.jsonl": "{"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should have a spectral entropy of 1.858 nat.\nAssistant: Understood, this InChI represents a molecule that has a spectral entropy of 1.858 nat: InChI=1S\/C15H15NO2S\/c16-14(17)11-19(18)15(12-7-3-1-4-8-12)13-9-5-2-6-10-13\/h1-10,15H,11H2,(H2,16,17)"} {"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should have a spectral entropy of 1.614 nat.\nAssistant: Got it, this DeepSMILES represents a molecule that has a spectral entropy of 1.614 nat: COcccO)ccc6)C=O)C=CC6=O))[C@H]O)[C@@H]O)[C@@]C)O)[C@@H]6O"}", "/scratch/micpie/export/mona/train_0-3.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C15H15NO2S\/c16-14(17)11-19(18)15(12-7-3-1-4-8-12)13-9-5-2-6-10-13\/h1-10,15H,11H2,(H2,16,17) has a spectral entropy of 1.858 nat."} {"text":"The molecule with the canonical SMILES COc1cc(O)c2c(c1)C(=O)C1=C(C2=O)[C@H](O)[C@@H](O)[C@@](C)(O)[C@@H]1O has a spectral entropy of 1.614 nat."}", "/scratch/micpie/export/mona/valid_0-2.jsonl": "{"text":"The canonical SMILES CO\/N=C(\\C(O)=N[C@@H]1C(=O)N2C(C(=O)O)=C(COC(=N)O)CS[C@H]12)c1ccco1 represents a molecule that has a spectral entropy of 4.125 nat."} {"text":"The InChI InChI=1S\/C25H47NO9\/c1-4-17(3)24(32)21(35-23(31)13-18(25(33)34)12-22(29)30)11-16(2)9-7-5-6-8-10-19(27)14-20(28)15-26\/h16-21,24,27-28,32H,4-15,26H2,1-3H3,(H,29,30)(H,33,34)\/t16-,17+,18?,19+,20-,21-,24+\/m0\/s1 is representing a molecule that has a spectral entropy of 0.000 nat."}", "/scratch/micpie/export/mona/valid_0-1.jsonl": "{"text":"Based on the DeepSMILES representation of CO\/N=C\\CO)=N[C@@H]C=O)NCC=O)O))=CCOC=N)O))))CS[C@H]86))))))))))cccco5, the molecule has a spectral entropy of 4.125 nat."} {"text":"Based on the DeepSMILES CC[C@@H]C)[C@@H]O)[C@H]C[C@@H]C)CCCCCC[C@@H]O)C[C@H]O)CN)))))))))))))OC=O)CCCC=O)O)))C=O)O, the molecule has a spectral entropy of 0.000 nat."}", "/scratch/micpie/export/mona/valid_0-5.jsonl": "{"text":"Task: Please give me a molecule canonical SMILES based on the text description below.\nDescription: A molecule that has a spectral entropy of 4.125 nat.\nResult: CO\/N=C(\\C(O)=N[C@@H]1C(=O)N2C(C(=O)O)=C(COC(=N)O)CS[C@H]12)c1ccco1"} {"text":"Task: Please give me a molecule InChI based on the text description.\nDescription: A molecule that has a spectral entropy of 0.000 
nat.\nResult: InChI=1S\/C25H47NO9\/c1-4-17(3)24(32)21(35-23(31)13-18(25(33)34)12-22(29)30)11-16(2)9-7-5-6-8-10-19(27)14-20(28)15-26\/h16-21,24,27-28,32H,4-15,26H2,1-3H3,(H,29,30)(H,33,34)\/t16-,17+,18?,19+,20-,21-,24+\/m0\/s1"}", "/scratch/micpie/export/mona/valid_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the spectral entropy in nat.\nDeepSMILES: CO\/N=C\\CO)=N[C@@H]C=O)NCC=O)O))=CCOC=N)O))))CS[C@H]86))))))))))cccco5\nConstraint: Even if you are not sure, you must answer with a numeric value in nat without using any additional words.\nResult: 4.125 nat"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the spectral entropy in nat.\nMolecule SMILES: CC[C@@H](C)[C@@H](O)[C@H](C[C@@H](C)CCCCCC[C@@H](O)C[C@H](O)CN)OC(=O)CC(CC(=O)O)C(=O)O\nConstraint: Even if you are not sure, you must answer with a numeric value in nat without using any other words.\nResult: 0.000 nat"}", "/scratch/micpie/export/mona/train_0-5.jsonl": "{"text":"Task: Please give me a molecule SMILES based on the text description.\nDescription: A molecule that has a spectral entropy of 1.858 nat.\nResult: N=C(O)CS(=O)C(c1ccccc1)c1ccccc1"} {"text":"Task: Please create a molecule InChI based on the description.\nDescription: A molecule that has a spectral entropy of 1.614 nat.\nResult: InChI=1S\/C16H16O8\/c1-16(23)14(21)10-9(13(20)15(16)22)12(19)8-6(11(10)18)3-5(24-2)4-7(8)17\/h3-4,13-15,17,20-23H,1-2H3\/t13-,14+,15+,16-\/m0\/s1"}", "/scratch/micpie/export/mona/train_0-2.jsonl": "{"text":"The canonical SMILES N=C(O)CS(=O)C(c1ccccc1)c1ccccc1 is representing a molecule that has a spectral entropy of 1.858 nat."} {"text":"The SMILES COc1cc(O)c2c(c1)C(=O)C1=C(C2=O)[C@H](O)[C@@H](O)[C@@](C)(O)[C@@H]1O represents a molecule that has a spectral entropy of 1.614 nat."}", "/scratch/micpie/export/mona/train_0-7.jsonl": "{"text":"User: Can you create the DeepSMILES of a molecule that has a spectral 
entropy of 1.858 nat?\nAssistant: Of course, here you go: N=CO)CS=O)Ccccccc6))))))cccccc6"} {"text":"User: Can you create the InChI of a molecule that has a spectral entropy of 1.614 nat?\nAssistant: Of course, here you go: InChI=1S\/C16H16O8\/c1-16(23)14(21)10-9(13(20)15(16)22)12(19)8-6(11(10)18)3-5(24-2)4-7(8)17\/h3-4,13-15,17,20-23H,1-2H3\/t13-,14+,15+,16-\/m0\/s1"}", "/scratch/micpie/export/mona/train_0-1.jsonl": "{"text":"Based on the SMILES representation of N=C(O)CS(=O)C(c1ccccc1)c1ccccc1, the molecule has a spectral entropy of 1.858 nat."} {"text":"Based on the SELFIES representation of [C][O][C][=C][C][Branch1][C][O][=C][C][=Branch1][Ring2][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=C][Branch1][Branch1][C][Ring1][Branch2][=O][C@H1][Branch1][C][O][C@@H1][Branch1][C][O][C@@][Branch1][C][C][Branch1][C][O][C@@H1][Ring1][N][O], the molecule has a spectral entropy of 1.614 nat."}", "/scratch/micpie/export/mona/train_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the spectral entropy in nat.\nMolecule InChI: InChI=1S\/C15H15NO2S\/c16-14(17)11-19(18)15(12-7-3-1-4-8-12)13-9-5-2-6-10-13\/h1-10,15H,11H2,(H2,16,17)\nConstraint: Even if you are uncertain, you must answer with a numeric value in nat without using any other words.\nResult: 1.858 nat"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the spectral entropy in nat.\ncanonical SMILES: COc1cc(O)c2c(c1)C(=O)C1=C(C2=O)[C@H](O)[C@@H](O)[C@@](C)(O)[C@@H]1O\nConstraint: Even if you are not sure, you must answer with a numeric value in nat without using any additional words.\nResult: 1.614 nat"}", "/scratch/micpie/export/mona/test_0-7.jsonl": "{"text":"User: Can you create the SELFIES of a molecule that has a spectral entropy of 2.314 nat?\nAssistant: Yes, I'm happy to help, here you go: [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][O][C][C][Branch1][C][C][N]"} {"text":"User: Can you create the 
SELFIES of a molecule that has a spectral entropy of 1.251 nat?\nAssistant: Yes, here you go: [C][C][C][C][=C][C][=C][C][Branch1][C][O][=C][Ring1][#Branch1][C][=Branch1][C][=O][O][Ring1][N]"}", "/scratch/micpie/export/mona/train_0-9.jsonl": "{"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should have a spectral entropy of 1.858 nat.\nAssistant: Ok, this DeepSMILES represents a molecule that has a spectral entropy of 1.858 nat: N=CO)CS=O)Ccccccc6))))))cccccc6"} {"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should have a spectral entropy of 1.614 nat.\nAssistant: Ok, here you go, this SELFIES represents a molecule that has a spectral entropy of 1.614 nat: [C][O][C][=C][C][Branch1][C][O][=C][C][=Branch1][Ring2][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=C][Branch1][Branch1][C][Ring1][Branch2][=O][C@H1][Branch1][C][O][C@@H1][Branch1][C][O][C@@][Branch1][C][C][Branch1][C][O][C@@H1][Ring1][N][O]"}", "/scratch/micpie/export/mona/valid_0-3.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C16H16N4O8S\/c1-26-19-9(8-3-2-4-27-8)12(21)18-10-13(22)20-11(15(23)24)7(5-28-16(17)25)6-29-14(10)20\/h2-4,10,14H,5-6H2,1H3,(H2,17,25)(H,18,21)(H,23,24)\/b19-9-\/t10-,14-\/m1\/s1 has a spectral entropy of 4.125 nat."} {"text":"The molecule with the SMILES CC[C@@H](C)[C@@H](O)[C@H](C[C@@H](C)CCCCCC[C@@H](O)C[C@H](O)CN)OC(=O)CC(CC(=O)O)C(=O)O has a spectral entropy of 0.000 nat."}", "/scratch/micpie/export/mona/test_0-8.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that has a spectral entropy of 2.314 nat.\nAssistant: This is a molecule that has a spectral entropy of 2.314 nat: Cc1cccc(C)c1OCC(C)N"} {"text":"User: I'm searching for the DeepSMILES of a molecule that has a spectral entropy of 1.251 
nat.\nAssistant: This is a molecule that has a spectral entropy of 1.251 nat: CCCcccccO)c6C=O)O%10"}", "/scratch/micpie/export/mona/test_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the spectral entropy in nat.\nInChI: InChI=1S\/C11H17NO\/c1-8-5-4-6-9(2)11(8)13-7-10(3)12\/h4-6,10H,7,12H2,1-3H3\nConstraint: Even if you are uncertain, you must answer with a numeric value in nat without using any additional words.\nResult: 2.314 nat"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the spectral entropy in nat.\nInChI: InChI=1S\/C10H10O3\/c1-6-5-7-3-2-4-8(11)9(7)10(12)13-6\/h2-4,6,11H,5H2,1H3\nConstraint: Even if you are not sure, you must answer with a numeric value in nat without using any other words.\nResult: 1.251 nat"}", "/scratch/micpie/export/compound_protein_domain/test_1-2.jsonl": "{"text":"User: Can you give me an example for a protein that binds the compound [C][O][C][=C][C][=N][C][=N][C][Branch2][Ring2][Ring1][O][C][=C][C][=C][C][Branch1][P][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][=C][C][=C][C][Ring1][S][=C][Ring2][Ring1][Ring2][=C][Ring2][Ring1][O][C][=C][Ring2][Ring1][#C][O][C]?\nAssistant: Of course, the compound [C][O][C][=C][C][=N][C][=N][C][Branch2][Ring2][Ring1][O][C][=C][C][=C][C][Branch1][P][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][=C][C][=C][C][Ring1][S][=C][Ring2][Ring1][Ring2][=C][Ring2][Ring1][O][C][=C][Ring2][Ring1][#C][O][C] targets for example the protein KDR.\nUser: Can you tell me a domain of the protein KDR?\nAssistant: The protein KDR has a Immunoglobulin I-set."} {"text":"User: Can you come up with an example for a protein that binds the compound NC=S)ccccc-ccccO)c[nH]ccc95)))))))))c6?\nAssistant: Of course, the compound NC=S)ccccc-ccccO)c[nH]ccc95)))))))))c6 targets for example the protein Histone-lysine N-methyltransferase ASH1L.\nUser: Can you tell 
me a domain of the protein Histone-lysine N-methyltransferase ASH1L?\nAssistant: The protein Histone-lysine N-methyltransferase ASH1L has a ASH1-like, Bromodomain."}", "/scratch/micpie/export/compound_protein_domain/test_0-1.jsonl": "{"text":"The compound [C][C][=N][C][Branch2][Ring2][O][C][=C][Branch1][C][F][C][=C][Branch1][C][Cl][C][=C][Ring1][Branch2][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][Ring1][Ring2][=N][O][Ring2][Ring1][=N] targets the protein Kininogen-1 (Alpha-2-thiol proteinase inhibitor) (Fitzgerald factor) (High molecular weight kininogen) (HMWK) (Williams-Fitzgerald-Flaujeac factor) which has a Cystatin domain."} {"text":"The compound InChI=1S\/C14H13N5OS\/c20-14-10(13(17-18-14)12-7-15-19-21-12)6-9-5-8-3-1-2-4-11(8)16-9\/h5-7,16H,1-4H2,(H,18,20)\/b10-6- targets the protein CD antigen CD309 which has a Immunoglobulin I-set."}", "/scratch/micpie/export/compound_protein_domain/valid_0-0.jsonl": "{"text":"Cc1nc(-c2c(F)cc(Cl)cc2-c2ccc3c(c2)CC[C@@H]3NC(=O)C2(NC(=O)c3cncnc3)COC2)no1 targets the protein Kininogen-1 (Alpha-2-thiol proteinase inhibitor) (Fitzgerald factor) (High molecular weight kininogen) (HMWK) (Williams-Fitzgerald-Flaujeac factor) which has a Cystatin domain."} {"text":"CN(c1cccc(CO)c1)c1ccnc(Nc2cc(N3CCOCC3)cc(N3CCOCC3)c2)n1 targets the protein FLK-1 which has a Immunoglobulin I-set."}", "/scratch/micpie/export/compound_protein_domain/test_0-2.jsonl": "{"text":"User: Can you give me an example for a protein that binds the compound Cc1nc(-c2c(F)cc(Cl)cc2-c2ccc3c(c2)OCC3NC(=O)C2(N)CC2)no1?\nAssistant: Sure, the compound Cc1nc(-c2c(F)cc(Cl)cc2-c2ccc3c(c2)OCC3NC(=O)C2(N)CC2)no1 targets for example the protein Kininogen-1 (Alpha-2-thiol proteinase inhibitor) (Fitzgerald factor) (High molecular weight kininogen) (HMWK) (Williams-Fitzgerald-Flaujeac factor).\nUser: Can you tell me a domain of the protein Kininogen-1 (Alpha-2-thiol proteinase inhibitor) 
(Fitzgerald factor) (High molecular weight kininogen) (HMWK) (Williams-Fitzgerald-Flaujeac factor)?\nAssistant: The protein Kininogen-1 (Alpha-2-thiol proteinase inhibitor) (Fitzgerald factor) (High molecular weight kininogen) (HMWK) (Williams-Fitzgerald-Flaujeac factor) has a Cystatin domain."} {"text":"User: Can you come up with one example for a protein that binds the compound O=C1NN=C(c2cnns2)\/C1=C\/c1cc2c([nH]1)CCCC2?\nAssistant: Of course, the compound O=C1NN=C(c2cnns2)\/C1=C\/c1cc2c([nH]1)CCCC2 targets for example the protein VEGFR-2.\nUser: Can you tell me a domain of the protein VEGFR-2?\nAssistant: The protein VEGFR-2 has a Immunoglobulin I-set."}", "/scratch/micpie/export/compound_protein_domain/train_1-0.jsonl": "{"text":"InChI=1S\/C27H20N4O2S\/c1-17-16-28-27(34-17)26(33)29-21-9-5-8-19(14-21)25(32)20-11-12-22-23(30-31-24(22)15-20)13-10-18-6-3-2-4-7-18\/h2-16H,1H3,(H,29,33)(H,30,31)\/b13-10+ targets the protein CD antigen CD309 which has a Immunoglobulin I-set."} {"text":"[O][=C][Branch2][#Branch1][Branch1][N][C][=C][C][=C][Branch1][C][O][C][=Branch1][Ring2][=C][Ring1][#Branch1][C][C][=C][C][Branch1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][O][=C][C][=Branch1][Branch1][=C][Ring1][#Branch2][O][C][C][=C][C][Branch1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][O][=C][C][=Branch1][Branch1][=C][Ring1][#Branch2][O][C][C][=C][C][Branch1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][O][=C][C][=Branch1][Branch1][=C][Ring1][#Branch2][O][C][Ring2][Ring2][#Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1] targets the protein p33ING2 which has a ING2, PHD domain."}", "/scratch/micpie/export/compound_protein_domain/valid_1-2.jsonl": "{"text":"User: Can you give me an example for a protein that binds the compound Cc1sc2ncnc(N)c2c1-c1ccc(NC(=O)Nc2ccccc2Cl)cc1?\nAssistant: Yes, the compound Cc1sc2ncnc(N)c2c1-c1ccc(NC(=O)Nc2ccccc2Cl)cc1 targets for example the protein CD antigen CD309.\nUser: Can you tell me a domain of the protein CD antigen CD309?\nAssistant: 
The protein CD antigen CD309 has a Immunoglobulin I-set."} {"text":"User: Can you give me an example for a protein that binds the compound Cc1ccc2ncccc2c1-c1cc(-c2c(C)noc2C)cc2nc(C3CC3)[nH]c12?\nAssistant: Sure, the compound Cc1ccc2ncccc2c1-c1cc(-c2c(C)noc2C)cc2nc(C3CC3)[nH]c12 targets for example the protein Bromodomain-containing protein 8.\nUser: Can you tell me a domain of the protein Bromodomain-containing protein 8?\nAssistant: The protein Bromodomain-containing protein 8 has a Brd8, Bromo domain."}", "/scratch/micpie/export/compound_protein_domain/test_0-0.jsonl": "{"text":"InChI=1S\/C21H18ClFN4O3\/c1-10-25-19(27-30-10)18-14(7-12(22)8-15(18)23)11-2-3-13-16(9-29-17(13)6-11)26-20(28)21(24)4-5-21\/h2-3,6-8,16H,4-5,9,24H2,1H3,(H,26,28) targets the protein Kininogen-1 (Alpha-2-thiol proteinase inhibitor) (Fitzgerald factor) (High molecular weight kininogen) (HMWK) (Williams-Fitzgerald-Flaujeac factor) which has a Cystatin domain."} {"text":"[O][=C][N][N][=C][Branch1][Branch2][C][=C][N][=N][S][Ring1][Branch1][\/C][Ring1][#Branch2][=C][\/C][=C][C][=C][Branch1][Ring2][NH1][Ring1][Branch1][C][C][C][C][Ring1][#Branch1] targets the protein Fetal liver kinase 1 which has a Immunoglobulin I-set."}", "/scratch/micpie/export/compound_protein_domain/train_0-0.jsonl": "{"text":"Ccnc-ccF)ccCl)cc6-cccccc6)CCC5C)NC=O)CNC=O)CF)F)F))))CC3)))))))))))))))))))no5 targets the protein Kininogen-1 (Alpha-2-thiol proteinase inhibitor) (Fitzgerald factor) (High molecular weight kininogen) (HMWK) (Williams-Fitzgerald-Flaujeac factor) which has a Cystatin domain."} {"text":"InChI=1S\/C31H22N4O2\/c36-30(22-13-14-26-28(34-35-29(26)19-22)15-12-20-6-2-1-3-7-20)21-8-4-9-23(18-21)33-31(37)25-10-5-11-27-24(25)16-17-32-27\/h1-19,32H,(H,33,37)(H,34,35)\/b15-12+ targets the protein FLK-1 which has a Immunoglobulin I-set."}", "/scratch/micpie/export/compound_protein_domain/test_1-1.jsonl": "{"text":"The compound 
InChI=1S\/C27H20ClN3O4\/c1-33-24-13-22-23(14-25(24)34-2)29-15-30-27(22)35-19-10-11-20-16(12-19)4-3-5-21(20)26(32)31-18-8-6-17(28)7-9-18\/h3-15H,1-2H3,(H,31,32) targets the protein KDR which has a Immunoglobulin I-set."} {"text":"The compound InChI=1S\/C15H12N2OS\/c16-15(19)10-3-1-2-9(8-10)11-4-5-13(18)14-12(11)6-7-17-14\/h1-8,17-18H,(H2,16,19) targets the protein Absent small and homeotic disks protein 1 homolog which has a ASH1-like, Bromodomain."}", "/scratch/micpie/export/compound_protein_domain/valid_0-2.jsonl": "{"text":"User: Can you give me an example for a protein that binds the compound Cc1nc(-c2c(F)cc(Cl)cc2-c2ccc3c(c2)CC[C@@H]3NC(=O)C2(NC(=O)c3cncnc3)COC2)no1?\nAssistant: Yes, of course, the compound Cc1nc(-c2c(F)cc(Cl)cc2-c2ccc3c(c2)CC[C@@H]3NC(=O)C2(NC(=O)c3cncnc3)COC2)no1 targets for example the protein Kininogen-1 (Alpha-2-thiol proteinase inhibitor) (Fitzgerald factor) (High molecular weight kininogen) (HMWK) (Williams-Fitzgerald-Flaujeac factor).\nUser: Can you tell me a domain of the protein Kininogen-1 (Alpha-2-thiol proteinase inhibitor) (Fitzgerald factor) (High molecular weight kininogen) (HMWK) (Williams-Fitzgerald-Flaujeac factor)?\nAssistant: The protein Kininogen-1 (Alpha-2-thiol proteinase inhibitor) (Fitzgerald factor) (High molecular weight kininogen) (HMWK) (Williams-Fitzgerald-Flaujeac factor) has a Cystatin domain."} {"text":"User: Can you come up with one example for a protein that binds the compound CNcccccCO))c6))))))cccncNcccNCCOCC6))))))ccNCCOCC6))))))c6)))))))n6?\nAssistant: Yes, the compound CNcccccCO))c6))))))cccncNcccNCCOCC6))))))ccNCCOCC6))))))c6)))))))n6 targets for example the protein Vascular endothelial growth factor receptor 2.\nUser: Can you tell me a domain of the protein Vascular endothelial growth factor receptor 2?\nAssistant: The protein Vascular endothelial growth factor receptor 2 has a Immunoglobulin I-set."}", "/scratch/micpie/export/compound_protein_domain/valid_0-1.jsonl": "{"text":"The compound 
Cc1nc(-c2c(F)cc(Cl)cc2-c2ccc3c(c2)CC[C@@H]3NC(=O)C2(NC(=O)c3cncnc3)COC2)no1 targets the protein Kininogen-1 (Alpha-2-thiol proteinase inhibitor) (Fitzgerald factor) (High molecular weight kininogen) (HMWK) (Williams-Fitzgerald-Flaujeac factor) which has a Cystatin domain."} {"text":"The compound [C][N][Branch1][=N][C][=C][C][=C][C][Branch1][Ring1][C][O][=C][Ring1][Branch2][C][=C][C][=N][C][Branch2][Ring1][#C][N][C][=C][C][Branch1][=Branch2][N][C][C][O][C][C][Ring1][=Branch1][=C][C][Branch1][=Branch2][N][C][C][O][C][C][Ring1][=Branch1][=C][Ring2][Ring1][C][=N][Ring2][Ring1][=Branch2] targets the protein Kinase insert domain receptor which has a Immunoglobulin I-set."}", "/scratch/micpie/export/compound_protein_domain/valid_1-1.jsonl": "{"text":"The compound Cc1sc2ncnc(N)c2c1-c1ccc(NC(=O)Nc2ccccc2Cl)cc1 targets the protein VEGFR-2 which has a Immunoglobulin I-set."} {"text":"The compound Cc1ccc2ncccc2c1-c1cc(-c2c(C)noc2C)cc2nc(C3CC3)[nH]c12 targets the protein p120 which has a Brd8, Bromo domain."}", "/scratch/micpie/export/compound_protein_domain/test_1-0.jsonl": "{"text":"COc1cc2ncnc(Oc3ccc4c(C(=O)Nc5ccc(Cl)cc5)cccc4c3)c2cc1OC targets the protein Fetal liver kinase 1 which has a Immunoglobulin I-set."} {"text":"NC(=S)c1cccc(-c2ccc(O)c3[nH]ccc23)c1 targets the protein Absent small and homeotic disks protein 1 homolog which has a ASH1-like, Bromodomain."}", "/scratch/micpie/export/compound_protein_domain/train_0-2.jsonl": "{"text":"User: Can you give me one example for a protein that binds the compound Ccnc-ccF)ccCl)cc6-cccccc6)CCC5C)NC=O)CNC=O)CF)F)F))))CC3)))))))))))))))))))no5?\nAssistant: Yes, of course, the compound Ccnc-ccF)ccCl)cc6-cccccc6)CCC5C)NC=O)CNC=O)CF)F)F))))CC3)))))))))))))))))))no5 targets for example the protein Kininogen-1 (Alpha-2-thiol proteinase inhibitor) (Fitzgerald factor) (High molecular weight kininogen) (HMWK) (Williams-Fitzgerald-Flaujeac factor).\nUser: Can you tell me a domain of the protein Kininogen-1 (Alpha-2-thiol proteinase 
inhibitor) (Fitzgerald factor) (High molecular weight kininogen) (HMWK) (Williams-Fitzgerald-Flaujeac factor)?\nAssistant: The protein Kininogen-1 (Alpha-2-thiol proteinase inhibitor) (Fitzgerald factor) (High molecular weight kininogen) (HMWK) (Williams-Fitzgerald-Flaujeac factor) has a Cystatin domain."} {"text":"User: Can you give me one example for a protein that binds the compound O=C(c1cccc(NC(=O)c2cccc3[nH]ccc23)c1)c1ccc2c(\/C=C\/c3ccccc3)n[nH]c2c1?\nAssistant: Sure, the compound O=C(c1cccc(NC(=O)c2cccc3[nH]ccc23)c1)c1ccc2c(\/C=C\/c3ccccc3)n[nH]c2c1 targets for example the protein Vascular endothelial growth factor receptor 2.\nUser: Can you tell me a domain of the protein Vascular endothelial growth factor receptor 2?\nAssistant: The protein Vascular endothelial growth factor receptor 2 has a Immunoglobulin I-set."}", "/scratch/micpie/export/compound_protein_domain/train_1-1.jsonl": "{"text":"The compound Cc1cnc(C(=O)Nc2cccc(C(=O)c3ccc4c(\/C=C\/c5ccccc5)n[nH]c4c3)c2)s1 targets the protein CD antigen CD309 which has a Immunoglobulin I-set."} {"text":"The compound O=C(Nc1cc2c(O)c(c1)Cc1cc(S(=O)(=O)O)cc(c1O)Cc1cc(S(=O)(=O)O)cc(c1O)Cc1cc(S(=O)(=O)O)cc(c1O)C2)c1ccccc1 targets the protein ING1Lp which has a ING2, PHD domain."}", "/scratch/micpie/export/compound_protein_domain/train_0-1.jsonl": "{"text":"The compound Ccnc-ccF)ccCl)cc6-cccccc6)CCC5C)NC=O)CNC=O)CF)F)F))))CC3)))))))))))))))))))no5 targets the protein Kininogen-1 (Alpha-2-thiol proteinase inhibitor) (Fitzgerald factor) (High molecular weight kininogen) (HMWK) (Williams-Fitzgerald-Flaujeac factor) which has a Cystatin domain."} {"text":"The compound O=C(c1cccc(NC(=O)c2cccc3[nH]ccc23)c1)c1ccc2c(\/C=C\/c3ccccc3)n[nH]c2c1 targets the protein CD antigen CD309 which has a Immunoglobulin I-set."}", "/scratch/micpie/export/compound_protein_domain/valid_1-0.jsonl": 
"{"text":"InChI=1S\/C20H16ClN5OS\/c1-11-16(17-18(22)23-10-24-19(17)28-11)12-6-8-13(9-7-12)25-20(27)26-15-5-3-2-4-14(15)21\/h2-10H,1H3,(H2,22,23,24)(H2,25,26,27) targets the protein Vascular endothelial growth factor receptor 2 which has a Immunoglobulin I-set."} {"text":"Cccccncccc6c%10-ccc-ccC)noc5C))))))ccncCCC3)))[nH]c95 targets the protein TrCP120 which has a Brd8, Bromo domain."}", "/scratch/micpie/export/compound_protein_domain/train_1-2.jsonl": "{"text":"User: Can you give me one example for a protein that binds the compound [C][C][=C][N][=C][Branch2][Ring2][#C][C][=Branch1][C][=O][N][C][=C][C][=C][C][Branch2][Ring1][=C][C][=Branch1][C][=O][C][=C][C][=C][C][Branch1][O][\/C][=C][\/C][=C][C][=C][C][=C][Ring1][=Branch1][=N][NH1][C][Ring1][=N][=C][Ring1][P][=C][Ring2][Ring1][=Branch2][S][Ring2][Ring1][P]?\nAssistant: Yes, of course, the compound [C][C][=C][N][=C][Branch2][Ring2][#C][C][=Branch1][C][=O][N][C][=C][C][=C][C][Branch2][Ring1][=C][C][=Branch1][C][=O][C][=C][C][=C][C][Branch1][O][\/C][=C][\/C][=C][C][=C][C][=C][Ring1][=Branch1][=N][NH1][C][Ring1][=N][=C][Ring1][P][=C][Ring2][Ring1][=Branch2][S][Ring2][Ring1][P] targets for example the protein Kinase insert domain receptor.\nUser: Can you tell me a domain of the protein Kinase insert domain receptor?\nAssistant: The protein Kinase insert domain receptor has a Immunoglobulin I-set."} {"text":"User: Can you come up with an example for a protein that binds the compound O=C(Nc1cc2c(O)c(c1)Cc1cc(S(=O)(=O)O)cc(c1O)Cc1cc(S(=O)(=O)O)cc(c1O)Cc1cc(S(=O)(=O)O)cc(c1O)C2)c1ccccc1?\nAssistant: Sure, the compound O=C(Nc1cc2c(O)c(c1)Cc1cc(S(=O)(=O)O)cc(c1O)Cc1cc(S(=O)(=O)O)cc(c1O)Cc1cc(S(=O)(=O)O)cc(c1O)C2)c1ccccc1 targets for example the protein p32.\nUser: Can you tell me a domain of the protein p32?\nAssistant: The protein p32 has a ING2, PHD domain."}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/test_0-10.jsonl": "{"text":"User: I want to generate a molecule 
DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should have a VDss of 16.980 L\/kg.\nAssistant: Got it, here you go, this DeepSMILES represents a molecule that has a VDss of 16.980 L\/kg: CCCCCcccO)ccc6)OCC)C)ccccC)cc6-%10"} {"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should have a volume of distribution at steady state (VDss) of 0.190 L\/kg.\nAssistant: Ok, this DeepSMILES represents a molecule that has a volume of distribution at steady state (VDss) of 0.190 L\/kg: CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20"}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/valid_0-8.jsonl": "{"text":"User: Can you create the canonical SMILES of a molecule that has a VDss of 1.530 L\/kg?\nAssistant: Yes, here you go: CN1CCc2nc(C(=O)N[C@@H]3C[C@@H](C(=O)N(C)C)CC[C@@H]3NC(=O)C(=O)Nc3ccc(Cl)cn3)sc2C1"} {"text":"User: Can you give me the InChI of a molecule that has a VDss of 0.370 L\/kg?\nAssistant: Of course, here you go: InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8-"}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/train_0-8.jsonl": "{"text":"User: Can you create the canonical SMILES of a molecule that has a volume of distribution at steady state (VDss) of 0.800 L\/kg?\nAssistant: Yes, here you go: CNC(=O)Nc1c(OCCN2CCCC2)c(OC)c2occc2c1OC"} {"text":"User: Can you generate the DeepSMILES of a molecule that has a VDss of 11.000 L\/kg?\nAssistant: Sure, here you go: Nccc[nH+]cccccc%106)))))))CCCC6"}", 
"/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/test_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the VDss in L\/kg.\nMolecule SMILES: CCCCCc1cc(O)c2c(c1)OC(C)(C)c1ccc(C)cc1-2\nConstraint: Even if you are not sure, you must answer with a numeric value in L\/kg without the unit and without using any other words.\nResult: 16.980"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the VDss in L\/kg.\nMolecule DeepSMILES: CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20\nConstraint: Even if you are not sure, you must answer with a numeric value in L\/kg without the unit and without using any other words.\nResult: 0.190"}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/valid_0-9.jsonl": "{"text":"User: I'm looking for the canonical SMILES of a molecule that has a volume of distribution at steady state (VDss) of 1.530 L\/kg.\nAssistant: This is a molecule that has a volume of distribution at steady state (VDss) of 1.530 L\/kg: CN1CCc2nc(C(=O)N[C@@H]3C[C@@H](C(=O)N(C)C)CC[C@@H]3NC(=O)C(=O)Nc3ccc(Cl)cn3)sc2C1"} {"text":"User: I'm looking for the SMILES of a molecule that has a volume of distribution at steady state (VDss) of 0.370 L\/kg.\nAssistant: This is a molecule that has a volume of distribution at steady state (VDss) of 0.370 L\/kg: CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21"}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/test_0-1.jsonl": "{"text":"Based on the SMILES representation of CCCCCc1cc(O)c2c(c1)OC(C)(C)c1ccc(C)cc1-2, the molecule has a VDss of 16.980 L\/kg."} {"text":"Based on the SELFIES 
['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]'], the molecule has a VDss of 0.190 L\/kg."}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/valid_0-0.jsonl": "{"text":"The molecule with the SMILES representation of CN1CCc2nc(C(=O)N[C@@H]3C[C@@H](C(=O)N(C)C)CC[C@@H]3NC(=O)C(=O)Nc3ccc(Cl)cn3)sc2C1 has a VDss of 1.530 L\/kg."} {"text":"The molecule with the SMILES representation of CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21 has a volume of distribution at steady state (VDss) of 0.370 L\/kg."}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/test_0-2.jsonl": "{"text":"The IUPAC name 6,6,9-trimethyl-3-pentylbenzo[c]chromen-1-ol is representing a molecule with a volume of distribution at steady state (VDss) of 16.980 L\/kg."} {"text":"The SMILES CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1 is representing a molecule that has a volume of distribution at steady state (VDss) of 0.190 L\/kg."}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/valid_0-10.jsonl": "{"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should have a volume of distribution at steady state (VDss) of 1.530 L\/kg.\nAssistant: Ok, here you go, this SMILES represents a molecule that has a volume of distribution at steady state (VDss) of 1.530 L\/kg: CN1CCc2nc(C(=O)N[C@@H]3C[C@@H](C(=O)N(C)C)CC[C@@H]3NC(=O)C(=O)Nc3ccc(Cl)cn3)sc2C1"} {"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should have a volume of distribution at steady state (VDss) of 0.370 L\/kg.\nAssistant: Got it, here you go, this SMILES represents a molecule that has a volume of distribution at steady state (VDss) of 0.370 L\/kg: CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21"}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/train_0-6.jsonl": "{"text":"Task: Please generate a IUPAC name based on the text description.\nDescription: A molecule that has a VDss of 0.800 L\/kg.\nResult: 1-[4,7-dimethoxy-6-(2-pyrrolidin-1-ylethoxy)-1-benzofuran-5-yl]-3-methylurea"} {"text":"Task: Please generate a SMILES based on the description below.\nDescription: A molecule that has a VDss of 11.000 L\/kg.\nResult: Nc1c2c([nH+]c3ccccc13)CCCC2"}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/valid_0-6.jsonl": "{"text":"Task: Please create a molecule SMILES based on the text description.\nDescription: A molecule that has a volume of distribution at steady state (VDss) of 1.530 L\/kg.\nResult: CN1CCc2nc(C(=O)N[C@@H]3C[C@@H](C(=O)N(C)C)CC[C@@H]3NC(=O)C(=O)Nc3ccc(Cl)cn3)sc2C1"} {"text":"Task: Please give me a molecule SELFIES based on the text description.\nDescription: A molecule that has a volume of distribution at steady state (VDss) of 0.370 L\/kg.\nResult: ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]']"}", 
"/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/test_0-9.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that has a VDss of 16.980 L\/kg.\nAssistant: This is a molecule that has a VDss of 16.980 L\/kg: CCCCCc1cc(O)c2c(c1)OC(C)(C)c1ccc(C)cc1-2"} {"text":"User: I'm searching for the SMILES of a molecule that has a VDss of 0.190 L\/kg.\nAssistant: This is a molecule that has a VDss of 0.190 L\/kg: CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1"}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/test_0-0.jsonl": "{"text":"The molecule with the SMILES representation of CCCCCc1cc(O)c2c(c1)OC(C)(C)c1ccc(C)cc1-2 has a VDss of 16.980 L\/kg."} {"text":"The molecule with the canonical SMILES representation of CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1 has a volume of distribution at steady state (VDss) of 0.190 L\/kg."}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/valid_0-7.jsonl": "{"text":"User: Can you derive the VDss in L\/kg of the molecule with the InChI InChI=1S\/C24H30ClN7O4S\/c1-31(2)24(36)13-4-6-15(27-20(33)21(34)30-19-7-5-14(25)11-26-19)17(10-13)28-22(35)23-29-16-8-9-32(3)12-18(16)37-23\/h5,7,11,13,15,17H,4,6,8-10,12H2,1-3H3,(H,27,33)(H,28,35)(H,26,30,34)\/t13-,15-,17+\/m0\/s1?\nAssistant: Sure, this molecule has a VDss of 1.530 L\/kg."} {"text":"User: Can you estimate the VDss in L\/kg of the molecule with the canonical SMILES CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21?\nAssistant: Sure, this molecule has a VDss of 0.370 L\/kg."}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/test_0-3.jsonl": "{"text":"The molecule with the InChI 
InChI=1S\/C21H26O2\/c1-5-6-7-8-15-12-18(22)20-16-11-14(2)9-10-17(16)21(3,4)23-19(20)13-15\/h9-13,22H,5-8H2,1-4H3 has a VDss of 16.980 L\/kg."} {"text":"The molecule with the InChI InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1 has a volume of distribution at steady state (VDss) of 0.190 L\/kg."}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/valid_0-11.jsonl": "{"text":"User: I want to come up with a InChI.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should have a volume of distribution at steady state (VDss) of 1.530 L\/kg.\nAssistant: Got it, this InChI represents a molecule that has a volume of distribution at steady state (VDss) of 1.530 L\/kg: InChI=1S\/C24H30ClN7O4S\/c1-31(2)24(36)13-4-6-15(27-20(33)21(34)30-19-7-5-14(25)11-26-19)17(10-13)28-22(35)23-29-16-8-9-32(3)12-18(16)37-23\/h5,7,11,13,15,17H,4,6,8-10,12H2,1-3H3,(H,27,33)(H,28,35)(H,26,30,34)\/t13-,15-,17+\/m0\/s1"} {"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should have a VDss of 0.370 L\/kg.\nAssistant: Ok, this SMILES represents a molecule that has a VDss of 0.370 L\/kg: CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21"}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/train_0-0.jsonl": "{"text":"The molecule with the SMILES representation of CNC(=O)Nc1c(OCCN2CCCC2)c(OC)c2occc2c1OC has a volume of distribution at steady state (VDss) of 0.800 L\/kg."} {"text":"The molecule with the InChI representation of InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1 has a VDss of 11.000 L\/kg."}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/test_0-6.jsonl": "{"text":"Task: Please generate a molecule DeepSMILES based on the description.\nDescription: A molecule that has a volume of distribution at steady state (VDss) of 16.980 L\/kg.\nResult: CCCCCcccO)ccc6)OCC)C)ccccC)cc6-%10"} {"text":"Task: Please create a SELFIES based on the description below.\nDescription: A molecule that has a VDss of 0.190 L\/kg.\nResult: ['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]']"}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/train_0-10.jsonl": "{"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should have a VDss of 0.800 L\/kg.\nAssistant: Got it, here you go, this DeepSMILES represents a molecule that has a VDss of 0.800 L\/kg: CNC=O)NccOCCNCCCC5))))))))cOC))coccc5c9OC"} {"text":"User: I want to come up with a InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should have a VDss of 11.000 L\/kg.\nAssistant: Ok, here you go, this InChI represents a molecule that has a VDss of 11.000 L\/kg: InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1"}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/train_0-3.jsonl": "{"text":"The molecule with the DeepSMILES CNC=O)NccOCCNCCCC5))))))))cOC))coccc5c9OC has a volume of distribution at steady state (VDss) of 0.800 L\/kg."} {"text":"The molecule with the InChI InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1 has a VDss of 11.000 L\/kg."}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/valid_0-2.jsonl": "{"text":"The DeepSMILES CNCCcncC=O)N[C@@H]C[C@@H]C=O)NC)C)))CC[C@@H]6NC=O)C=O)NccccCl)cn6))))))))))))))))))sc5C9 is representing a molecule that has a VDss of 1.530 L\/kg."} {"text":"The InChI InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8- represents a molecule with a VDss of 0.370 L\/kg."}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/valid_0-1.jsonl": "{"text":"Based on the canonical SMILES CN1CCc2nc(C(=O)N[C@@H]3C[C@@H](C(=O)N(C)C)CC[C@@H]3NC(=O)C(=O)Nc3ccc(Cl)cn3)sc2C1, the molecule has a VDss of 1.530 L\/kg."} {"text":"Based on the SMILES representation of CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21, the molecule has a VDss of 0.370 L\/kg."}", 
"/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/valid_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the volume of distribution at steady state (VDss) in L\/kg.\ncanonical SMILES: CN1CCc2nc(C(=O)N[C@@H]3C[C@@H](C(=O)N(C)C)CC[C@@H]3NC(=O)C(=O)Nc3ccc(Cl)cn3)sc2C1\nConstraint: Even if you are not sure, you must answer with a numeric value in L\/kg without the unit and without using any additional words.\nResult: 1.530"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the volume of distribution at steady state (VDss) in L\/kg.\nMolecule SMILES: CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21\nConstraint: Even if you are not sure, you must answer with a numeric value in L\/kg without the unit and without using any additional words.\nResult: 0.370"}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/valid_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the volume of distribution at steady state (VDss) in L\/kg.\nMolecule IUPAC name: N'-(5-chloropyridin-2-yl)-N-[(1S,2R,4S)-4-(dimethylcarbamoyl)-2-[(5-methyl6,7-dihydro-4H-[1,3]thiazolo[5,4-c]pyridine-2-carbonyl)amino]cyclohexyl]oxamide\nConstraint: Even if you are uncertain, you must answer with a numeric value in L\/kg without using any other words.\nResult: 1.530 L\/kg"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the volume of distribution at steady state (VDss) in L\/kg.\nMolecule canonical SMILES: CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21\nConstraint: Even if you are not sure, you must answer with a numeric value in L\/kg without using any other words.\nResult: 0.370 L\/kg"}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/train_0-5.jsonl": "{"text":"Task: Please predict a molecule 
feature based on the description.\nDescription: Predict the volume of distribution at steady state (VDss) in L\/kg.\nDeepSMILES: CNC=O)NccOCCNCCCC5))))))))cOC))coccc5c9OC\nConstraint: Even if you are not sure, you must answer with a numeric value in L\/kg without the unit and without using any other words.\nResult: 0.800"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the volume of distribution at steady state (VDss) in L\/kg.\nMolecule SELFIES: ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]']\nConstraint: Even if you are uncertain, you must answer with a numeric value in L\/kg without the unit and without using any other words.\nResult: 11.000"}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/train_0-2.jsonl": "{"text":"The SMILES CNC(=O)Nc1c(OCCN2CCCC2)c(OC)c2occc2c1OC is representing a molecule with a volume of distribution at steady state (VDss) of 0.800 L\/kg."} {"text":"The SELFIES ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]'] is representing a molecule that has a volume of distribution at steady state (VDss) of 11.000 L\/kg."}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/test_0-11.jsonl": "{"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should have a volume of distribution at steady state (VDss) of 16.980 L\/kg.\nAssistant: Got it, this DeepSMILES represents a molecule that has a volume of distribution at steady state (VDss) of 16.980 L\/kg: CCCCCcccO)ccc6)OCC)C)ccccC)cc6-%10"} {"text":"User: I want to create a IUPAC name.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should have a VDss of 0.190 L\/kg.\nAssistant: Got it, this IUPAC name represents a molecule that has a VDss of 0.190 L\/kg: 4-[19-[(2-amino-3-phenylpropanoyl)amino]-16-benzyl-4-(1,3-dihydroxybutan-2-ylcarbamoyl)-7-(1-hydroxyethyl)-13-(1H-indol-3-ylmethyl)-6,9,12,15,18-pentaoxo-1,2-dithia-5,8,11,14,17-pentazacycloicos-10-yl]butylazanium"}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/train_0-7.jsonl": "{"text":"User: Can you estimate the volume of distribution at steady state (VDss) in L\/kg of the molecule with the SELFIES ['[C][N][C][=Branch1][C][=O][N][C][=C][Branch1][O][O][C][C][N][C][C][C][C][Ring1][Branch1][C][Branch1][Ring1][O][C][=C][O][C][=C][C][Ring1][Branch1][=C][Ring2][Ring1][Ring1][O][C]']?\nAssistant: Yes, I'm happy to help, this molecule has a volume of distribution at steady state (VDss) of 0.800 L\/kg."} {"text":"User: Can you derive the VDss in L\/kg of the molecule with the canonical SMILES Nc1c2c([nH+]c3ccccc13)CCCC2?\nAssistant: Of course, this molecule has a VDss of 11.000 L\/kg."}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/train_0-11.jsonl": "{"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should have a volume of distribution at steady state (VDss) of 0.800 L\/kg.\nAssistant: Got it, this DeepSMILES represents a molecule that has a volume of distribution at steady state (VDss) of 0.800 L\/kg: CNC=O)NccOCCNCCCC5))))))))cOC))coccc5c9OC"} {"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should have a volume of distribution at steady state (VDss) of 11.000 L\/kg.\nAssistant: Understood, this DeepSMILES represents a molecule that has a volume of distribution at steady state (VDss) of 11.000 L\/kg: Nccc[nH+]cccccc%106)))))))CCCC6"}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/train_0-1.jsonl": "{"text":"Based on the IUPAC name 1-[4,7-dimethoxy-6-(2-pyrrolidin-1-ylethoxy)-1-benzofuran-5-yl]-3-methylurea, the molecule has a VDss of 0.800 L\/kg."} {"text":"Based on the IUPAC name 1,2,3,4-tetrahydroacridin-10-ium-9-amine, the molecule has a volume of distribution at steady state (VDss) of 11.000 L\/kg."}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/train_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the volume of distribution at steady state (VDss) in L\/kg.\nMolecule DeepSMILES: CNC=O)NccOCCNCCCC5))))))))cOC))coccc5c9OC\nConstraint: Even if you are uncertain, you must answer with a numeric value in L\/kg without using any other words.\nResult: 0.800 L\/kg"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the volume of distribution at steady state (VDss) in L\/kg.\nIUPAC name: 1,2,3,4-tetrahydroacridin-10-ium-9-amine\nConstraint: Even if you are uncertain, you must answer with a numeric value in L\/kg without using any other words.\nResult: 11.000 L\/kg"}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/test_0-7.jsonl": "{"text":"User: Can you estimate the VDss in L\/kg of the molecule with the SMILES CCCCCc1cc(O)c2c(c1)OC(C)(C)c1ccc(C)cc1-2?\nAssistant: Yes, I'm happy to help, this molecule has a VDss of 16.980 L\/kg."} {"text":"User: Can you derive the volume of distribution at steady state (VDss) in L\/kg of the molecule with the SMILES 
CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1?\nAssistant: Yes, this molecule has a volume of distribution at steady state (VDss) of 0.190 L\/kg."}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/train_0-9.jsonl": "{"text":"User: I'm searching for the DeepSMILES of a molecule that has a volume of distribution at steady state (VDss) of 0.800 L\/kg.\nAssistant: This is a molecule that has a volume of distribution at steady state (VDss) of 0.800 L\/kg: CNC=O)NccOCCNCCCC5))))))))cOC))coccc5c9OC"} {"text":"User: I'm looking for the InChI of a molecule that has a volume of distribution at steady state (VDss) of 11.000 L\/kg.\nAssistant: This is a molecule that has a volume of distribution at steady state (VDss) of 11.000 L\/kg: InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1"}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/valid_0-3.jsonl": "{"text":"The molecule with the SELFIES ['[C][N][C][C][C][N][=C][Branch2][Ring2][=C][C][=Branch1][C][=O][N][C@@H1][C][C@@H1][Branch1][#Branch2][C][=Branch1][C][=O][N][Branch1][C][C][C][C][C][C@@H1][Ring1][O][N][C][=Branch1][C][=O][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][C][Cl][C][=N][Ring1][#Branch1][S][C][=Ring2][Ring1][S][C][Ring2][Ring2][Ring2]'] has a volume of distribution at steady state (VDss) of 1.530 L\/kg."} {"text":"The molecule with the SELFIES ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]'] has a volume of distribution at steady state (VDss) of 0.370 L\/kg."}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/test_0-8.jsonl": "{"text":"User: Can you generate the canonical SMILES of a molecule that has a volume of distribution at steady state (VDss) of 
16.980 L\/kg?\nAssistant: Yes, I'm happy to help, here you go: CCCCCc1cc(O)c2c(c1)OC(C)(C)c1ccc(C)cc1-2"} {"text":"User: Can you give me the canonical SMILES of a molecule that has a volume of distribution at steady state (VDss) of 0.190 L\/kg?\nAssistant: Sure, here you go: CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1"}", "/scratch/micpie/export/volume_of_distribution_at_steady_state_lombardo_et_al/test_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the VDss in L\/kg.\nSELFIES: ['[C][C][C][C][C][C][=C][C][Branch1][C][O][=C][C][=Branch1][Ring2][=C][Ring1][#Branch1][O][C][Branch1][C][C][Branch1][C][C][C][=C][C][=C][Branch1][C][C][C][=C][Ring1][#Branch1][Ring1][=C]']\nConstraint: Even if you are uncertain, you must answer with a numeric value in L\/kg without using any additional words.\nResult: 16.980 L\/kg"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the volume of distribution at steady state (VDss) in L\/kg.\nDeepSMILES: CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20\nConstraint: Even if you are uncertain, you must answer with a numeric value in L\/kg without using any additional words.\nResult: 0.190 L\/kg"}", "/scratch/micpie/export/bio_ner_47/train_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: Several genes involved in inflammation and the immune system are located in the regions of the markers identified: TNF superfamily members Tnfsf4, 6, and 8 (chr. 1, 84.9-85 cM) which are involved in T cell activation [ 37, 38]; three selectin genes (Sele, Sell, Selp, chr. 
1, 86.6 cM) which are involved in immune cell infiltration into inflamed tissues [ 39]; several members of immune cell surface proteins of the Slam family (slamf1, 2, 5, 6, and 9; chr. 1, 89.5-93.3 cM) [ 40]; the chemokine gene Xcl1 (chr. 1, 87 cM) which is expressed by mast cells and recruits lymphocytes [ 41]; several immunoglobulin Fc receptor genes (Fcrl3, Fcgr2b, and Fcgr3 at chr. 1, 92.3 cM; Fcer1g at chr. 1, 93.3 cM; Fcer1a at chr. 1, 94.2 cM); the flagellin receptor Tlr5 (chr. 1, 98 cM); Mmp3 (chr. 9, 1 cM) which recruits CD4+ lymphocytes [ 42]; Mmp7 (chr. 9, 1 cM) which activates Paneth cell-derived cryptdins (alpha-defensins) [ 43]; Icam1 (chr. 9, 7 cM) which is involved in lymphocyte infiltration into inflamed tissues [ 44]; Kitl (chr. 10, 57 cM) which is also known as stem cell factor, and is crucial for mast cell differentiation [ 45]; Im5 (chr. 10, 65 cM) which is involved in antibody-responsiveness [ 46]; Lyzs (chr. 10, 66 cM) which is a Paneth cell product that digests cell walls of bacteria [ 47]; Ifng (chr. 10, 67 cM) which is an important inflammatory signal in CF as well as other conditions [ 48]; Il22 (chr. 10, 67 cM), a member of the anti-inflammatory IL-10 interleukin family [ 49]; and the Stat2 and 6 genes (chr. 
10, 70 cM) which are important components of intracellular signaling pathways [ 50]..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: genes,8,13,Sequence\nregions,80,87,Sequence\nTNF superfamily members,115,138,Gene_or_geneproduct\nTnfsf4,139,145,Gene_or_geneproduct\nT cell,203,209,Cell\ngenes,247,252,Sequence\nSele,255,259,Gene_or_geneproduct\nSell,261,265,Gene_or_geneproduct\nSelp,267,271,Gene_or_geneproduct\nimmune cell,313,324,Cell\ncell surface,393,405,GO_ontology\nproteins,406,414,Chemical\nslamf1,436,442,Gene_or_geneproduct\ngene,507,511,Sequence\nXcl1,512,516,Gene_or_geneproduct\nmast cells,556,566,Cell\nlymphocytes,580,591,Cell\nimmunoglobulin,607,621,GO_ontology\ngenes,634,639,Sequence\nFcrl3,642,647,Gene_or_geneproduct\nFcgr2b,649,655,Gene_or_geneproduct\nFcgr3,661,666,Gene_or_geneproduct\nFcer1g,688,694,Gene_or_geneproduct\nFcer1a,716,722,Gene_or_geneproduct\nTlr5,768,772,Gene_or_geneproduct\nMmp3,791,795,Gene_or_geneproduct\nCD4,827,830,Gene_or_geneproduct\nlymphocytes,833,844,Cell\nMmp7,852,856,Gene_or_geneproduct\nPaneth cell,889,900,Cell\nIcam1,949,954,Gene_or_geneproduct\nlymphocyte,992,1002,Cell\nKitl,1045,1049,Gene_or_geneproduct\nstem cell factor,1091,1107,Gene_or_geneproduct\nmast cell,1128,1137,Cell\nIm5,1161,1164,Gene_or_geneproduct\nantibody,1204,1212,GO_ontology\nLyzs,1237,1241,Gene_or_geneproduct\nPaneth cell,1271,1282,Cell\ncell walls,1304,1314,GO_ontology\nbacteria,1318,1326,Organism\nIfng,1334,1338,Gene_or_geneproduct\nIl22,1440,1444,Gene_or_geneproduct\nIL - 10 interleukin,1500,1519,Gene_or_geneproduct\nStat2,1542,1547,Gene_or_geneproduct\ngenes,1554,1559,Sequence\nintracellular,1612,1625,GO_ontology"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: H2AW_HUMANPVX_083455-AA: 988hypothetical protein, conservedCore histone macro-H2A. 
2sp .\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: vivax,190,195,Organism\/Species\nhuman,216,221,Organism\/Species\nHuman,240,245,Organism\/Species\nHuman,299,304,Organism\/Species\nAP180_HUMANPVX_092010,327,348,Gene\/Protein\ncoat assembly protein AP180sp,398,427,Gene\/Protein\nNFASC_HUMANPVX_095335,439,460,Gene\/Protein\nFA13A_HUMANPVX_123250,526,547,Gene\/Protein\nFAM13Asp,596,604,Gene\/Protein\nPGK2_HUMANPVX_123515,616,636,Gene\/Protein\n\/ Perforin domain containing proteinPhosphoglycerate,651,703,Gene\/Protein\nkinase 2sp,704,714,Gene\/Protein\nADDA_HUMANPVX_099980,726,746,Gene\/Protein\n- adducinsp,803,814,Gene\/Protein\nNP1L1_HUMANPVX_091530,826,847,Gene\/Protein\nassembly protein 1 - like 1sp,899,928,Gene\/Protein\nPI42B_HUMANPVX_099150,940,961,Gene\/Protein\n5 - phosphate 4 - kinase type - 2 betasp,1023,1063,Gene\/Protein\nCHD3_HUMANPVX_093655,1075,1095,Gene\/Protein\n- helicase - DNA - binding protein 3sp,1150,1188,Gene\/Protein\nROA0_HUMANPVX_092895,1200,1220,Gene\/Protein\nnuclear ribonucleoprotein A0sp,1275,1305,Gene\/Protein\nRAB32_HUMANPVX_123100,1317,1338,Gene\/Protein\n- related protein Rab - 32sp,1383,1411,Gene\/Protein\nLRRK2_HUMANPVX_000660,1423,1444,Gene\/Protein\n- rich repeat serine \/ threonine - protein kinase 2sp,1482,1535,Gene\/Protein\nATAD2_HUMANPVX_101610,1547,1568,Gene\/Protein\nATPase family AAA domain - containing protein 2sp,1606,1655,Gene\/Protein\nIGS10_HUMANPVX_085900,1667,1688,Gene\/Protein\nsuperfamily member 10sp,1745,1768,Gene\/Protein\nRHG36_HUMANPVX_114675,1780,1801,Gene\/Protein\nGTPase - activating protein 36sp,1846,1878,Gene\/Protein\nDDX50_HUMANPVX_117850,1890,1911,Gene\/Protein\n- dependent RNA helicase DDX50sp,1957,1989,Gene\/Protein\nSHAN3_HUMANPVX_122920,2001,2022,Gene\/Protein\nand multiple ankyrin repeat domains protein 
3sp,2067,2114,Gene\/Protein\nUACA_HUMANPVX_099150,2126,2146,Gene\/Protein\nautoantigen with coiled - coil domains and ankyrin repeatssp,2193,2253,Gene\/Protein\nSPTN4_HUMANPVX_003755,2265,2286,Gene\/Protein\nWNK1_HUMANPVX_118355,2369,2389,Gene\/Protein\n\/ threonine - protein kinase WNK1sp,2426,2461,Gene\/Protein\nRTN4_HUMANPVX_099980,2473,2493,Gene\/Protein\n- 4sp,2554,2559,Gene\/Protein\nH2AW_HUMANPVX_083455,2571,2591,Gene\/Protein\nSRPK3_HUMANPVX_123910,2673,2694,Gene\/Protein\nprotein kinase 3sp,2740,2758,Gene\/Protein\nMACF1_HUMANPVX_099150,2770,2791,Gene\/Protein"}", "/scratch/micpie/export/bio_ner_2/valid_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: Transcripts Upregulated in ST-HSC Compared to LT-HSC.\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: ST - HSC,27,35,Anatomy\nLT - HSC,48,56,Anatomy"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Methods Inocula, feed composition, and experimental set-up for processing of lactate-rich media to methane An experimental set-up for collection of data enabling the description of metabolic transformation of lactate during the acetogenic and methanogenic steps of anaerobic digestion in methane-yielding bioreactors was presented in this study in two repeats..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: lactate,79,86,treatment\nlactate,215,222,treatment"}", "/scratch/micpie/export/bio_ner_2/test_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: The DC maturation induced by resting NK cells we observed was dependent on endogenous TNF-alpha production..\nConstrain: Please, only list the entities in the form NER entity, span start, span 
end, and type in separate lines with a high probability of being in the text.\nResult: DC,4,6,Anatomy\nNK cells,37,45,Anatomy"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: After a one-day washout, the mice were then gavaged with media or 2 109 CFU of AIEC 2A daily for three days..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: media,59,64,treatment\n2 109 CFU of AIEC 2A,68,88,treatment"}", "/scratch/micpie/export/bio_ner_2/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: The nucleoli were detected with an anti-nucleolin antibody (clone 3G4B2; Upstate biotechnology, Lake Placid, NY) on cells fixed in 2% formalin for 10 min followed by permeabilization with ice-cold methanol for 30 min..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: nucleoli,4,12,Anatomy\ncells,119,124,Anatomy"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Ankle thickness was measured daily after bacterial gavage and tissues were collected 10 days after bacterial challenge for flow cytometry analysis..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: bacterial gavage,41,57,treatment\nbacterial challenge,99,118,treatment"}", "/scratch/micpie/export/qmof_gcmc/valid_0-57.jsonl": "{"text":"User: I need to design a metal-organic framework (MOF) with specific gas adsorption properties.\nAssistant: That's cool, how can I help?\nUser: I want the carbon dioxide adsorption energy (computed using grand canonical Monte Carlo) to be -25.983, the N2 adsorption energy (computed using grand canonical 
Monte Carlo) to be -15.288, and the methane adsorption energy (computed using grand canonical Monte Carlo) to be -21.289.\nAssistant: I suggest [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1."} {"text":"User: I want to find a metal-organic framework (MOF) with specific gas adsorption properties.\nAssistant: How can I help you?\nUser: I want the carbon dioxide adsorption energy (computed using grand canonical Monte Carlo) to be -22.982, the nitrogen adsorption energy (computed using grand canonical Monte Carlo) to be -11.424, and the methane adsorption energy (computed using grand canonical Monte Carlo) to be -15.705.\nAssistant: I suggest [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1."}", "/scratch/micpie/export/qmof_gcmc/train_0-48.jsonl": "{"text":"The MOF with the topology nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has an hydrogen working capacity between 5 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 3.738 g\/L."} {"text":"The MOF with the RCSR code pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has an H2 working capacity between 5 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 5.339 g\/L."}", "/scratch/micpie/export/qmof_gcmc/valid_0-35.jsonl": "{"text":"The MOF with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has a CH4\/N2 selectivity (computed using grand canonical Monte Carlo) of 7.834."} {"text":"The reticular material with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a methane\/nitrogen selectivity (computed using grand canonical Monte Carlo) of 4.236."}", "/scratch/micpie/export/qmof_gcmc/train_0-28.jsonl": "{"text":"The MOF with the net nbo, 
linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has an H2S adsorption energy (computed using grand canonical Monte Carlo) of -23.129 kJ\/mol."} {"text":"The metal-organic framework (MOF) with the RCSR identifier pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has an H2S adsorption energy (computed using grand canonical Monte Carlo) of -33.545 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/train_0-17.jsonl": "{"text":"The metal-organic framework with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has an CO2 adsorption energy (computed using grand canonical Monte Carlo) of -18.909 kJ\/mol."} {"text":"The reticular material with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an CO2 adsorption energy (computed using grand canonical Monte Carlo) of -29.977 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/train_0-16.jsonl": "{"text":"The metal-organic framework (MOF) with the topology nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has an carbon dioxide adsorption energy (computed using grand canonical Monte Carlo) of -18.909 kJ\/mol."} {"text":"The metal-organic framework with the RCSR code pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has an CO2 adsorption energy (computed using grand canonical Monte Carlo) of -29.977 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/train_0-39.jsonl": "{"text":"The metal-organic framework with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has a CH4 working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 38.544 cm^3 STP\/cm^3."} {"text":"The MOF with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a CH4 working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 6.617 cm^3 STP\/cm^3."}", 
"/scratch/micpie/export/qmof_gcmc/test_0-50.jsonl": "{"text":"The metal-organic framework (MOF) with the net fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an H2 working capacity between 1 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 35.566 g\/L."} {"text":"The metal-organic framework with the RCSR identifier ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has an hydrogen working capacity between 1 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 8.901 g\/L."}", "/scratch/micpie/export/qmof_gcmc/test_0-10.jsonl": "{"text":"The reticular material with the net fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of krypton Henry coefficient (computed using grand canonical Monte Carlo) of -4.825 mol\/kg\/Pa."} {"text":"The metal-organic framework (MOF) with the net ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has a 10-based logarithm of Kr Henry coefficient (computed using grand canonical Monte Carlo) of -4.879 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/valid_0-8.jsonl": "{"text":"The MOF with the RCSR identifier fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of xenon Henry coefficient (computed using grand canonical Monte Carlo) of -2.934 mol\/kg\/Pa."} {"text":"The metal-organic framework (MOF) with the net pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of xenon Henry coefficient (computed using grand canonical Monte Carlo) of -3.728 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/test_0-54.jsonl": "{"text":"The metal-organic framework 
with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has a density of 0.626 g\/cm^3."} {"text":"The MOF with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has a density of 1.245 g\/cm^3."}", "/scratch/micpie/export/qmof_gcmc/test_0-22.jsonl": "{"text":"The MOF with the topology fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an oxygen adsorption energy (computed using grand canonical Monte Carlo) of -9.311 kJ\/mol."} {"text":"The MOF with the topology ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has an oxygen adsorption energy (computed using grand canonical Monte Carlo) of -12.323 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/test_0-16.jsonl": "{"text":"The metal-organic framework with the RCSR identifier fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an carbon dioxide adsorption energy (computed using grand canonical Monte Carlo) of -18.846 kJ\/mol."} {"text":"The MOF with the topology ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has an carbon dioxide adsorption energy (computed using grand canonical Monte Carlo) of -20.940 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/train_0-52.jsonl": "{"text":"The metal-organic framework with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has a pore limiting diameter of 5.883 \\AA"} {"text":"The reticular material with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a pore limiting diameter (PLD) of 4.431 \\AA"}", "/scratch/micpie/export/qmof_gcmc/valid_0-53.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFId 
[O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has a largest cavity diameter of 6.066 \\AA."} {"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a largest cavity diameter of 8.513 \\AA."}", "/scratch/micpie/export/qmof_gcmc/train_0-34.jsonl": "{"text":"The metal-organic framework (MOF) with the RCSR code nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has a methane\/nitrogen selectivity (computed using grand canonical Monte Carlo) of 5.568."} {"text":"The metal-organic framework (MOF) with the RCSR identifier pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has a CH4\/N2 selectivity (computed using grand canonical Monte Carlo) of 9.483."}", "/scratch/micpie/export/qmof_gcmc/test_0-15.jsonl": "{"text":"The metal-organic framework with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has a 10-based logarithm of H2O Henry coefficient (computed using grand canonical Monte Carlo) of -7.646 mol\/kg\/Pa."} {"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has a 10-based logarithm of H2O Henry coefficient (computed using grand canonical Monte Carlo) of -12.794 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/train_0-27.jsonl": "{"text":"The metal-organic framework with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has an krypton adsorption energy (computed using grand canonical Monte Carlo) of -16.920 kJ\/mol."} {"text":"The metal-organic framework with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an krypton adsorption energy (computed using grand canonical Monte Carlo) of -23.810 kJ\/mol."}", 
"/scratch/micpie/export/qmof_gcmc/test_0-46.jsonl": "{"text":"The MOF with the RCSR identifier fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an hydrogen working capacity between 5 and 100 bar at 298 K (computed using grand canonical Monte Carlo) of 17.859 g\/L."} {"text":"The MOF with the RCSR code ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has an hydrogen working capacity between 5 and 100 bar at 298 K (computed using grand canonical Monte Carlo) of 9.453 g\/L."}", "/scratch/micpie/export/qmof_gcmc/train_0-8.jsonl": "{"text":"The reticular material with the RCSR code nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has a 10-based logarithm of xenon Henry coefficient (computed using grand canonical Monte Carlo) of -3.849 mol\/kg\/Pa."} {"text":"The metal-organic framework (MOF) with the RCSR identifier pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has a 10-based logarithm of Xe Henry coefficient (computed using grand canonical Monte Carlo) of -2.465 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/test_0-5.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has a 10-based logarithm of CH4 Henry coefficient (computed using grand canonical Monte Carlo) of -4.876 mol\/kg\/Pa."} {"text":"The MOF with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has a 10-based logarithm of CH4 Henry coefficient (computed using grand canonical Monte Carlo) of -4.943 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/valid_0-25.jsonl": "{"text":"The MOF with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has an Xe adsorption 
energy (computed using grand canonical Monte Carlo) of -31.887 kJ\/mol."} {"text":"The reticular material with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an Xe adsorption energy (computed using grand canonical Monte Carlo) of -23.387 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/valid_0-9.jsonl": "{"text":"The metal-organic framework with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has a 10-based logarithm of xenon Henry coefficient (computed using grand canonical Monte Carlo) of -2.934 mol\/kg\/Pa."} {"text":"The reticular material with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a 10-based logarithm of Xe Henry coefficient (computed using grand canonical Monte Carlo) of -3.728 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/test_0-26.jsonl": "{"text":"The metal-organic framework with the RCSR identifier fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an Kr adsorption energy (computed using grand canonical Monte Carlo) of -13.262 kJ\/mol."} {"text":"The metal-organic framework (MOF) with the RCSR identifier ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has an Kr adsorption energy (computed using grand canonical Monte Carlo) of -17.727 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/test_0-19.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has an N2 adsorption energy (computed using grand canonical Monte Carlo) of -9.096 kJ\/mol."} {"text":"The reticular material with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has an nitrogen adsorption energy (computed using 
grand canonical Monte Carlo) of -12.259 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/test_0-40.jsonl": "{"text":"The reticular material with the RCSR identifier fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a methane working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 10.780 mol\/kg."} {"text":"The reticular material with the topology ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has a methane working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 1.065 mol\/kg."}", "/scratch/micpie/export/qmof_gcmc/test_0-57.jsonl": "{"text":"User: I want to synthesize a metal-organic framework (MOF) with specific gas adsorption properties.\nAssistant: That's cool, how can I help?\nUser: I want the CO2 adsorption energy (computed using grand canonical Monte Carlo) to be -18.846, the nitrogen adsorption energy (computed using grand canonical Monte Carlo) to be -9.096, and the CH4 adsorption energy (computed using grand canonical Monte Carlo) to be -12.747.\nAssistant: I recommend [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0."} {"text":"User: I want to find a reticular material with specific gas adsorption properties.\nAssistant: What can I do for you?\nUser: I want the CO2 adsorption energy (computed using grand canonical Monte Carlo) to be -20.940, the N2 adsorption energy (computed using grand canonical Monte Carlo) to be -12.259, and the CH4 adsorption energy (computed using grand canonical Monte Carlo) to be -17.165.\nAssistant: I recommend [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0."}", "/scratch/micpie/export/qmof_gcmc/valid_0-38.jsonl": "{"text":"The metal-organic framework (MOF) with the RCSR identifier fsc, linker SMILES 
[O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a CH4 working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 39.851 cm^3 STP\/cm^3."} {"text":"The reticular material with the topology pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a methane working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 100.994 cm^3 STP\/cm^3."}", "/scratch/micpie/export/qmof_gcmc/valid_0-28.jsonl": "{"text":"The MOF with the topology fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an H2S adsorption energy (computed using grand canonical Monte Carlo) of -31.475 kJ\/mol."} {"text":"The metal-organic framework (MOF) with the RCSR code pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an hydrogen sulfide adsorption energy (computed using grand canonical Monte Carlo) of -24.737 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/valid_0-24.jsonl": "{"text":"The reticular material with the net fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an xenon adsorption energy (computed using grand canonical Monte Carlo) of -31.887 kJ\/mol."} {"text":"The metal-organic framework with the topology pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an Xe adsorption energy (computed using grand canonical Monte Carlo) of -23.387 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/train_0-33.jsonl": "{"text":"The metal-organic framework with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has a hydrogen sulfide\/water selectivity 
(computed using grand canonical Monte Carlo) of 58.256."} {"text":"The metal-organic framework (MOF) with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a H2S\/H2O selectivity (computed using grand canonical Monte Carlo) of 83.762."}", "/scratch/micpie/export/qmof_gcmc/train_0-24.jsonl": "{"text":"The reticular material with the topology nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has an Xe adsorption energy (computed using grand canonical Monte Carlo) of -23.816 kJ\/mol."} {"text":"The reticular material with the RCSR identifier pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has an xenon adsorption energy (computed using grand canonical Monte Carlo) of -33.620 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/test_0-1.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has a 10-based logarithm of CO2 Henry coefficient (computed using grand canonical Monte Carlo) of -4.422 mol\/kg\/Pa."} {"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has a 10-based logarithm of carbon dioxide Henry coefficient (computed using grand canonical Monte Carlo) of -4.683 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/test_0-34.jsonl": "{"text":"The reticular material with the RCSR code fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a CH4\/N2 selectivity (computed using grand canonical Monte Carlo) of 3.256."} {"text":"The reticular material with the net ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has a CH4\/N2 selectivity (computed using grand canonical Monte Carlo) of 5.595."}", "/scratch/micpie/export/qmof_gcmc/test_0-18.jsonl": 
"{"text":"The reticular material with the topology fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an N2 adsorption energy (computed using grand canonical Monte Carlo) of -9.096 kJ\/mol."} {"text":"The reticular material with the net ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has an nitrogen adsorption energy (computed using grand canonical Monte Carlo) of -12.259 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/train_0-57.jsonl": "{"text":"User: I need to design a metal-organic framework (MOF) with specific gas adsorption properties.\nAssistant: How can I be of assistance?\nUser: I want the CO2 adsorption energy (computed using grand canonical Monte Carlo) to be -18.909, the nitrogen adsorption energy (computed using grand canonical Monte Carlo) to be -11.895, and the methane adsorption energy (computed using grand canonical Monte Carlo) to be -16.481.\nAssistant: In this case, I would recommend the following MOF: [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0."} {"text":"User: I need to find a metal-organic framework with specific gas adsorption properties.\nAssistant: Seems interesting, how can I support you?\nUser: I want the carbon dioxide adsorption energy (computed using grand canonical Monte Carlo) to be -29.977, the nitrogen adsorption energy (computed using grand canonical Monte Carlo) to be -17.612, and the CH4 adsorption energy (computed using grand canonical Monte Carlo) to be -23.348.\nAssistant: In this case, I would recommend the following MOF: [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1."}", "/scratch/micpie/export/qmof_gcmc/test_0-29.jsonl": "{"text":"The MOF with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has an H2S adsorption energy (computed using grand canonical Monte Carlo) of -25.140 
kJ\/mol."} {"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has an H2S adsorption energy (computed using grand canonical Monte Carlo) of -24.677 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/valid_0-0.jsonl": "{"text":"The reticular material with the topology fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of carbon dioxide Henry coefficient (computed using grand canonical Monte Carlo) of -3.894 mol\/kg\/Pa."} {"text":"The MOF with the RCSR identifier pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of carbon dioxide Henry coefficient (computed using grand canonical Monte Carlo) of -4.129 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/test_0-49.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has an hydrogen working capacity between 5 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 20.246 g\/L."} {"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has an H2 working capacity between 5 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 4.484 g\/L."}", "/scratch/micpie/export/qmof_gcmc/valid_0-39.jsonl": "{"text":"The MOF with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has a CH4 working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 39.851 cm^3 STP\/cm^3."} {"text":"The metal-organic framework with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a CH4 working 
capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 100.994 cm^3 STP\/cm^3."}", "/scratch/micpie/export/qmof_gcmc/train_0-36.jsonl": "{"text":"The MOF with the RCSR identifier nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has a Xe\/Kr selectivity (computed using grand canonical Monte Carlo) of 10.858."} {"text":"The metal-organic framework with the RCSR identifier pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has a xenon\/krypton selectivity (computed using grand canonical Monte Carlo) of 29.142."}", "/scratch/micpie/export/qmof_gcmc/test_0-33.jsonl": "{"text":"The MOF with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has a H2S\/H2O selectivity (computed using grand canonical Monte Carlo) of 0.377."} {"text":"The MOF with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has a hydrogen sulfide\/water selectivity (computed using grand canonical Monte Carlo) of 35.397."}", "/scratch/micpie/export/qmof_gcmc/test_0-32.jsonl": "{"text":"The reticular material with the topology fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a hydrogen sulfide\/water selectivity (computed using grand canonical Monte Carlo) of 0.377."} {"text":"The reticular material with the RCSR identifier ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has a hydrogen sulfide\/water selectivity (computed using grand canonical Monte Carlo) of 35.397."}", "/scratch/micpie/export/qmof_gcmc/test_0-21.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has an CH4 adsorption energy (computed using grand canonical Monte Carlo) of 
-12.747 kJ\/mol."} {"text":"The MOF with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has an methane adsorption energy (computed using grand canonical Monte Carlo) of -17.165 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/test_0-27.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has an Kr adsorption energy (computed using grand canonical Monte Carlo) of -13.262 kJ\/mol."} {"text":"The MOF with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has an krypton adsorption energy (computed using grand canonical Monte Carlo) of -17.727 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/test_0-2.jsonl": "{"text":"The reticular material with the topology fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of N2 Henry coefficient (computed using grand canonical Monte Carlo) of -5.389 mol\/kg\/Pa."} {"text":"The metal-organic framework (MOF) with the topology ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has a 10-based logarithm of N2 Henry coefficient (computed using grand canonical Monte Carlo) of -5.691 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/test_0-30.jsonl": "{"text":"The reticular material with the RCSR code fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an H2O adsorption energy (computed using grand canonical Monte Carlo) of -35.131 kJ\/mol."} {"text":"The metal-organic framework (MOF) with the RCSR identifier ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has an H2O adsorption energy (computed using grand canonical Monte Carlo) of -16.125 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/valid_0-42.jsonl": "{"text":"The 
metal-organic framework with the topology fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an O2 working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 104.906 cm^3 STP\/cm^3."} {"text":"The MOF with the topology pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an O2 working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 187.860 cm^3 STP\/cm^3."}", "/scratch/micpie/export/qmof_gcmc/test_0-41.jsonl": "{"text":"The MOF with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has a CH4 working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 10.780 mol\/kg."} {"text":"The metal-organic framework with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has a methane working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 1.065 mol\/kg."}", "/scratch/micpie/export/qmof_gcmc/train_0-22.jsonl": "{"text":"The metal-organic framework (MOF) with the topology nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has an O2 adsorption energy (computed using grand canonical Monte Carlo) of -11.905 kJ\/mol."} {"text":"The MOF with the topology pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has an O2 adsorption energy (computed using grand canonical Monte Carlo) of -16.915 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/train_0-31.jsonl": "{"text":"The reticular material with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has an water adsorption energy (computed using grand canonical Monte Carlo) of -11.822 kJ\/mol."} {"text":"The MOF with the MOFId 
[Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an water adsorption energy (computed using grand canonical Monte Carlo) of -26.154 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/train_0-35.jsonl": "{"text":"The MOF with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has a CH4\/N2 selectivity (computed using grand canonical Monte Carlo) of 5.568."} {"text":"The MOF with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a CH4\/N2 selectivity (computed using grand canonical Monte Carlo) of 9.483."}", "/scratch/micpie/export/qmof_gcmc/valid_0-10.jsonl": "{"text":"The metal-organic framework with the RCSR identifier fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of Kr Henry coefficient (computed using grand canonical Monte Carlo) of -4.260 mol\/kg\/Pa."} {"text":"The MOF with the net pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of Kr Henry coefficient (computed using grand canonical Monte Carlo) of -4.659 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/train_0-56.jsonl": "{"text":"User: I want to design a reticular material with specific gas adsorption properties.\nAssistant: Seems interesting, how can I support you?\nUser: I would like the CH4\/N2 selectivity (computed using grand canonical Monte Carlo) to be 5.568, the H2S\/H2O selectivity (computed using grand canonical Monte Carlo) to be 58.256, and the methane working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) to be 38.544.\nAssistant: I found the following MOF for you: [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0."} {"text":"User: I need to design a MOF with specific gas adsorption properties.\nAssistant: Interesting, can I be of any help?\nUser: I 
would like the methane\/nitrogen selectivity (computed using grand canonical Monte Carlo) to be 9.483, the hydrogen sulfide\/water selectivity (computed using grand canonical Monte Carlo) to be 83.762, and the methane working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) to be 6.617.\nAssistant: In this case, I would recommend the following MOF: [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1."}", "/scratch/micpie/export/qmof_gcmc/train_0-6.jsonl": "{"text":"The MOF with the RCSR identifier nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has a 10-based logarithm of oxygen Henry coefficient using grand canonical Monte Carlo of -5.645 mol\/kg\/Pa."} {"text":"The reticular material with the net pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has a 10-based logarithm of oxygen Henry coefficient using grand canonical Monte Carlo of -4.973 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/valid_0-6.jsonl": "{"text":"The MOF with the RCSR code fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of O2 Henry coefficient (computed using grand canonical Monte Carlo) of -5.156 mol\/kg\/Pa."} {"text":"The metal-organic framework (MOF) with the net pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of O2 Henry coefficient (computed using grand canonical Monte Carlo) of -5.280 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/valid_0-32.jsonl": "{"text":"The metal-organic framework (MOF) with the topology fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a hydrogen sulfide\/water selectivity (computed using grand canonical 
Monte Carlo) of 37.275."} {"text":"The metal-organic framework (MOF) with the RCSR code pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a hydrogen sulfide\/water selectivity (computed using grand canonical Monte Carlo) of 4.017."}", "/scratch/micpie/export/qmof_gcmc/valid_0-30.jsonl": "{"text":"The reticular material with the net fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an water adsorption energy (computed using grand canonical Monte Carlo) of -27.533 kJ\/mol."} {"text":"The metal-organic framework (MOF) with the RCSR code pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an water adsorption energy (computed using grand canonical Monte Carlo) of -28.572 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/train_0-21.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has an CH4 adsorption energy (computed using grand canonical Monte Carlo) of -16.481 kJ\/mol."} {"text":"The MOF with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an CH4 adsorption energy (computed using grand canonical Monte Carlo) of -23.348 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/test_0-36.jsonl": "{"text":"The reticular material with the RCSR identifier fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a xenon\/krypton selectivity (computed using grand canonical Monte Carlo) of 5.718."} {"text":"The MOF with the topology ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has a Xe\/Kr selectivity (computed using grand canonical Monte Carlo) of 10.790."}", "/scratch/micpie/export/qmof_gcmc/test_0-56.jsonl": "{"text":"User: I 
would like to find a metal-organic framework with specific gas adsorption properties.\nAssistant: How can I help you?\nUser: I would like the methane\/nitrogen selectivity (computed using grand canonical Monte Carlo) to be 3.256, the H2S\/H2O selectivity (computed using grand canonical Monte Carlo) to be 0.377, and the methane working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) to be 151.202.\nAssistant: I recommend [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0."} {"text":"User: I would like to design a metal-organic framework with specific gas adsorption properties.\nAssistant: How can I be of assistance?\nUser: I would like the CH4\/N2 selectivity (computed using grand canonical Monte Carlo) to be 5.595, the hydrogen sulfide\/water selectivity (computed using grand canonical Monte Carlo) to be 35.397, and the CH4 working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) to be 29.724.\nAssistant: I found the following MOF for you: [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0."}", "/scratch/micpie/export/qmof_gcmc/valid_0-46.jsonl": "{"text":"The MOF with the topology fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an H2 working capacity between 5 and 100 bar at 298 K (computed using grand canonical Monte Carlo) of 13.340 g\/L."} {"text":"The metal-organic framework with the net pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an H2 working capacity between 5 and 100 bar at 298 K (computed using grand canonical Monte Carlo) of 17.404 g\/L."}", "/scratch/micpie/export/qmof_gcmc/test_0-39.jsonl": "{"text":"The metal-organic framework with the MOFId 
[O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has a CH4 working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 151.202 cm^3 STP\/cm^3."} {"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has a methane working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 29.724 cm^3 STP\/cm^3."}", "/scratch/micpie/export/qmof_gcmc/train_0-47.jsonl": "{"text":"The MOF with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has an H2 working capacity between 5 and 100 bar at 298 K (computed using grand canonical Monte Carlo) of 8.506 g\/L."} {"text":"The MOF with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an hydrogen working capacity between 5 and 100 bar at 298 K (computed using grand canonical Monte Carlo) of 9.267 g\/L."}", "/scratch/micpie/export/qmof_gcmc/valid_0-36.jsonl": "{"text":"The metal-organic framework (MOF) with the net fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a Xe\/Kr selectivity (computed using grand canonical Monte Carlo) of 21.165."} {"text":"The metal-organic framework with the topology pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a xenon\/krypton selectivity (computed using grand canonical Monte Carlo) of 8.528."}", "/scratch/micpie/export/qmof_gcmc/train_0-55.jsonl": "{"text":"User: I need to synthesize a metal-organic framework with specific gas adsorption properties.\nAssistant: What can I do for you?\nUser: I want the hydrogen sulfide\/water selectivity (computed using grand canonical Monte Carlo) to be 58.256 and the CH4\/N2 selectivity (computed using grand canonical Monte Carlo) to be 
5.568.\nAssistant: I recommend [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0."} {"text":"User: I must to synthesize a reticular material with specific gas adsorption properties.\nAssistant: That's cool, how can I help?\nUser: I would like the hydrogen sulfide\/water selectivity (computed using grand canonical Monte Carlo) to be 83.762 and the CH4\/N2 selectivity (computed using grand canonical Monte Carlo) to be 9.483.\nAssistant: I suggest [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1."}", "/scratch/micpie/export/qmof_gcmc/train_0-19.jsonl": "{"text":"The metal-organic framework with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has an nitrogen adsorption energy (computed using grand canonical Monte Carlo) of -11.895 kJ\/mol."} {"text":"The metal-organic framework with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an nitrogen adsorption energy (computed using grand canonical Monte Carlo) of -17.612 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/valid_0-29.jsonl": "{"text":"The MOF with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has an H2S adsorption energy (computed using grand canonical Monte Carlo) of -31.475 kJ\/mol."} {"text":"The MOF with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an H2S adsorption energy (computed using grand canonical Monte Carlo) of -24.737 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/test_0-55.jsonl": "{"text":"User: I must to find a metal-organic framework (MOF) with specific gas adsorption properties.\nAssistant: How can I be of assistance?\nUser: I want the H2S\/H2O selectivity (computed using grand canonical Monte Carlo) to be 0.377 and the methane\/nitrogen selectivity (computed using grand canonical Monte Carlo) to be 3.256.\nAssistant: I found the following MOF for 
you: [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0."} {"text":"User: I would like to find a reticular material with specific gas adsorption properties.\nAssistant: That's cool, how can I help?\nUser: I would like the hydrogen sulfide\/water selectivity (computed using grand canonical Monte Carlo) to be 35.397 and the methane\/nitrogen selectivity (computed using grand canonical Monte Carlo) to be 5.595.\nAssistant: In this case, I would recommend the following MOF: [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0."}", "/scratch/micpie/export/qmof_gcmc/test_0-42.jsonl": "{"text":"The reticular material with the topology fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an O2 working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 220.487 cm^3 STP\/cm^3."} {"text":"The reticular material with the topology ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has an O2 working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 91.761 cm^3 STP\/cm^3."}", "/scratch/micpie/export/qmof_gcmc/test_0-9.jsonl": "{"text":"The MOF with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has a 10-based logarithm of Xe Henry coefficient (computed using grand canonical Monte Carlo) of -4.068 mol\/kg\/Pa."} {"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has a 10-based logarithm of Xe Henry coefficient (computed using grand canonical Monte Carlo) of -3.846 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/train_0-32.jsonl": "{"text":"The MOF with the net nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has a H2S\/H2O selectivity (computed 
using grand canonical Monte Carlo) of 58.256."} {"text":"The metal-organic framework (MOF) with the net pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has a H2S\/H2O selectivity (computed using grand canonical Monte Carlo) of 83.762."}", "/scratch/micpie/export/qmof_gcmc/valid_0-52.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has a pore limiting diameter (PLD) of 4.535 \\AA"} {"text":"The MOF with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a pore limiting diameter (PLD) of 7.772 \\AA"}", "/scratch/micpie/export/qmof_gcmc/train_0-40.jsonl": "{"text":"The reticular material with the RCSR identifier nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has a CH4 working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 1.456 mol\/kg."} {"text":"The MOF with the net pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has a methane working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 0.238 mol\/kg."}", "/scratch/micpie/export/qmof_gcmc/test_0-0.jsonl": "{"text":"The metal-organic framework (MOF) with the RCSR code fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of carbon dioxide Henry coefficient (computed using grand canonical Monte Carlo) of -4.422 mol\/kg\/Pa."} {"text":"The reticular material with the RCSR code ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has a 10-based logarithm of carbon dioxide Henry coefficient (computed using grand canonical Monte Carlo) of -4.683 mol\/kg\/Pa."}", 
"/scratch/micpie/export/qmof_gcmc/test_0-31.jsonl": "{"text":"The metal-organic framework with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has an water adsorption energy (computed using grand canonical Monte Carlo) of -35.131 kJ\/mol."} {"text":"The reticular material with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has an H2O adsorption energy (computed using grand canonical Monte Carlo) of -16.125 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/test_0-24.jsonl": "{"text":"The reticular material with the RCSR code fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an xenon adsorption energy (computed using grand canonical Monte Carlo) of -19.068 kJ\/mol."} {"text":"The reticular material with the RCSR code ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has an Xe adsorption energy (computed using grand canonical Monte Carlo) of -25.166 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/train_0-54.jsonl": "{"text":"The metal-organic framework with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has a density of 1.181 g\/cm^3."} {"text":"The metal-organic framework with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a density of 1.240 g\/cm^3."}", "/scratch/micpie/export/qmof_gcmc/valid_0-16.jsonl": "{"text":"The metal-organic framework (MOF) with the topology fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an CO2 adsorption energy (computed using grand canonical Monte Carlo) of -25.983 kJ\/mol."} {"text":"The MOF with the RCSR identifier pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an CO2 adsorption 
energy (computed using grand canonical Monte Carlo) of -22.982 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/valid_0-7.jsonl": "{"text":"The reticular material with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has a 10-based logarithm of O2 Henry coefficient (computed using grand canonical Monte Carlo) of -5.156 mol\/kg\/Pa."} {"text":"The MOF with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a 10-based logarithm of O2 Henry coefficient (computed using grand canonical Monte Carlo) of -5.280 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/valid_0-47.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has an hydrogen working capacity between 5 and 100 bar at 298 K (computed using grand canonical Monte Carlo) of 13.340 g\/L."} {"text":"The MOF with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an hydrogen working capacity between 5 and 100 bar at 298 K (computed using grand canonical Monte Carlo) of 17.404 g\/L."}", "/scratch/micpie/export/qmof_gcmc/valid_0-34.jsonl": "{"text":"The metal-organic framework with the topology fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a CH4\/N2 selectivity (computed using grand canonical Monte Carlo) of 7.834."} {"text":"The metal-organic framework (MOF) with the net pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a methane\/nitrogen selectivity (computed using grand canonical Monte Carlo) of 4.236."}", "/scratch/micpie/export/qmof_gcmc/test_0-3.jsonl": "{"text":"The metal-organic framework with the 
MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has a 10-based logarithm of N2 Henry coefficient (computed using grand canonical Monte Carlo) of -5.389 mol\/kg\/Pa."} {"text":"The reticular material with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has a 10-based logarithm of N2 Henry coefficient (computed using grand canonical Monte Carlo) of -5.691 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/valid_0-11.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has a 10-based logarithm of krypton Henry coefficient (computed using grand canonical Monte Carlo) of -4.260 mol\/kg\/Pa."} {"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a 10-based logarithm of Kr Henry coefficient (computed using grand canonical Monte Carlo) of -4.659 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/train_0-20.jsonl": "{"text":"The MOF with the RCSR identifier nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has an methane adsorption energy (computed using grand canonical Monte Carlo) of -16.481 kJ\/mol."} {"text":"The metal-organic framework (MOF) with the topology pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has an CH4 adsorption energy (computed using grand canonical Monte Carlo) of -23.348 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/train_0-43.jsonl": "{"text":"The reticular material with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has an O2 working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 82.446 cm^3 STP\/cm^3."} {"text":"The metal-organic framework (MOF) with the MOFId 
[Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an O2 working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 49.111 cm^3 STP\/cm^3."}", "/scratch/micpie/export/qmof_gcmc/test_0-44.jsonl": "{"text":"The metal-organic framework with the RCSR code fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an oxygen working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 15.719 mol\/kg."} {"text":"The metal-organic framework (MOF) with the topology ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has an O2 working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 3.288 mol\/kg."}", "/scratch/micpie/export/qmof_gcmc/train_0-30.jsonl": "{"text":"The metal-organic framework (MOF) with the topology nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has an water adsorption energy (computed using grand canonical Monte Carlo) of -11.822 kJ\/mol."} {"text":"The MOF with the RCSR code pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has an water adsorption energy (computed using grand canonical Monte Carlo) of -26.154 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/valid_0-20.jsonl": "{"text":"The reticular material with the topology fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an CH4 adsorption energy (computed using grand canonical Monte Carlo) of -21.289 kJ\/mol."} {"text":"The metal-organic framework with the net pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an methane adsorption energy (computed using grand canonical Monte Carlo) of -15.705 kJ\/mol."}", 
"/scratch/micpie/export/qmof_gcmc/valid_0-37.jsonl": "{"text":"The reticular material with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has a Xe\/Kr selectivity (computed using grand canonical Monte Carlo) of 21.165."} {"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a xenon\/krypton selectivity (computed using grand canonical Monte Carlo) of 8.528."}", "/scratch/micpie/export/qmof_gcmc/train_0-26.jsonl": "{"text":"The reticular material with the RCSR code nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has an krypton adsorption energy (computed using grand canonical Monte Carlo) of -16.920 kJ\/mol."} {"text":"The MOF with the RCSR code pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has an Kr adsorption energy (computed using grand canonical Monte Carlo) of -23.810 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/train_0-0.jsonl": "{"text":"The metal-organic framework with the RCSR identifier nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has a 10-based logarithm of carbon dioxide Henry coefficient (computed using grand canonical Monte Carlo) of -4.756 mol\/kg\/Pa."} {"text":"The MOF with the RCSR identifier pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has a 10-based logarithm of CO2 Henry coefficient (computed using grand canonical Monte Carlo) of -3.258 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/test_0-6.jsonl": "{"text":"The metal-organic framework (MOF) with the RCSR identifier fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of oxygen Henry coefficient using grand canonical Monte Carlo 
of -5.340 mol\/kg\/Pa."} {"text":"The MOF with the topology ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has a 10-based logarithm of O2 Henry coefficient (computed using grand canonical Monte Carlo) of -5.643 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/train_0-10.jsonl": "{"text":"The metal-organic framework (MOF) with the net nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has a 10-based logarithm of krypton Henry coefficient (computed using grand canonical Monte Carlo) of -4.885 mol\/kg\/Pa."} {"text":"The metal-organic framework with the RCSR identifier pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has a 10-based logarithm of Kr Henry coefficient (computed using grand canonical Monte Carlo) of -3.929 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/train_0-3.jsonl": "{"text":"The reticular material with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has a 10-based logarithm of N2 Henry coefficient (computed using grand canonical Monte Carlo) of -5.682 mol\/kg\/Pa."} {"text":"The metal-organic framework with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a 10-based logarithm of N2 Henry coefficient (computed using grand canonical Monte Carlo) of -4.976 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/train_0-23.jsonl": "{"text":"The metal-organic framework with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has an O2 adsorption energy (computed using grand canonical Monte Carlo) of -11.905 kJ\/mol."} {"text":"The metal-organic framework (MOF) with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an O2 adsorption energy (computed using grand canonical Monte Carlo) of -16.915 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/train_0-46.jsonl": "{"text":"The metal-organic framework (MOF) with the net nbo, linker SMILES 
n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has an H2 working capacity between 5 and 100 bar at 298 K (computed using grand canonical Monte Carlo) of 8.506 g\/L."} {"text":"The MOF with the RCSR code pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has an hydrogen working capacity between 5 and 100 bar at 298 K (computed using grand canonical Monte Carlo) of 9.267 g\/L."}", "/scratch/micpie/export/qmof_gcmc/train_0-12.jsonl": "{"text":"The metal-organic framework (MOF) with the topology nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has a 10-based logarithm of hydrogen sulfide Henry coefficient (computed using grand canonical Monte Carlo) of -9.308 mol\/kg\/Pa."} {"text":"The reticular material with the net pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has a 10-based logarithm of hydrogen sulfide Henry coefficient (computed using grand canonical Monte Carlo) of -5.927 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/test_0-28.jsonl": "{"text":"The reticular material with the topology fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an hydrogen sulfide adsorption energy (computed using grand canonical Monte Carlo) of -25.140 kJ\/mol."} {"text":"The MOF with the topology ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has an hydrogen sulfide adsorption energy (computed using grand canonical Monte Carlo) of -24.677 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/valid_0-40.jsonl": "{"text":"The MOF with the topology fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a methane working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 1.605 mol\/kg."} {"text":"The 
reticular material with the RCSR code pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a methane working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 5.331 mol\/kg."}", "/scratch/micpie/export/qmof_gcmc/valid_0-50.jsonl": "{"text":"The metal-organic framework with the RCSR identifier fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an hydrogen working capacity between 1 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 8.584 g\/L."} {"text":"The metal-organic framework (MOF) with the RCSR identifier pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an H2 working capacity between 1 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 21.284 g\/L."}", "/scratch/micpie/export/qmof_gcmc/test_0-13.jsonl": "{"text":"The reticular material with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has a 10-based logarithm of hydrogen sulfide Henry coefficient (computed using grand canonical Monte Carlo) of -8.621 mol\/kg\/Pa."} {"text":"The MOF with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has a 10-based logarithm of hydrogen sulfide Henry coefficient (computed using grand canonical Monte Carlo) of -9.227 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/test_0-23.jsonl": "{"text":"The MOF with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has an O2 adsorption energy (computed using grand canonical Monte Carlo) of -9.311 kJ\/mol."} {"text":"The metal-organic framework with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has an oxygen adsorption energy 
(computed using grand canonical Monte Carlo) of -12.323 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/valid_0-2.jsonl": "{"text":"The reticular material with the net fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of N2 Henry coefficient (computed using grand canonical Monte Carlo) of -5.230 mol\/kg\/Pa."} {"text":"The reticular material with the RCSR identifier pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of nitrogen Henry coefficient (computed using grand canonical Monte Carlo) of -5.352 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/valid_0-49.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has an hydrogen working capacity between 5 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 5.500 g\/L."} {"text":"The MOF with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an hydrogen working capacity between 5 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 11.642 g\/L."}", "/scratch/micpie/export/qmof_gcmc/valid_0-21.jsonl": "{"text":"The metal-organic framework with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has an CH4 adsorption energy (computed using grand canonical Monte Carlo) of -21.289 kJ\/mol."} {"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an methane adsorption energy (computed using grand canonical Monte Carlo) of -15.705 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/train_0-14.jsonl": 
"{"text":"The reticular material with the RCSR identifier nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has a 10-based logarithm of H2O Henry coefficient (computed using grand canonical Monte Carlo) of -13.373 mol\/kg\/Pa."} {"text":"The metal-organic framework (MOF) with the net pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has a 10-based logarithm of H2O Henry coefficient (computed using grand canonical Monte Carlo) of -10.355 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/test_0-51.jsonl": "{"text":"The MOF with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has an hydrogen working capacity between 1 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 35.566 g\/L."} {"text":"The MOF with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has an H2 working capacity between 1 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 8.901 g\/L."}", "/scratch/micpie/export/qmof_gcmc/valid_0-1.jsonl": "{"text":"The MOF with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has a 10-based logarithm of CO2 Henry coefficient (computed using grand canonical Monte Carlo) of -3.894 mol\/kg\/Pa."} {"text":"The reticular material with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a 10-based logarithm of CO2 Henry coefficient (computed using grand canonical Monte Carlo) of -4.129 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/valid_0-13.jsonl": "{"text":"The MOF with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has a 10-based logarithm of hydrogen sulfide Henry coefficient (computed using grand canonical Monte Carlo) 
of -7.002 mol\/kg\/Pa."} {"text":"The metal-organic framework with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a 10-based logarithm of hydrogen sulfide Henry coefficient (computed using grand canonical Monte Carlo) of -8.471 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/train_0-44.jsonl": "{"text":"The reticular material with the net nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has an oxygen working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 3.115 mol\/kg."} {"text":"The MOF with the net pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has an O2 working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 1.768 mol\/kg."}", "/scratch/micpie/export/qmof_gcmc/test_0-52.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has a pore limiting diameter of 7.459 \\AA"} {"text":"The MOF with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has a pore limiting diameter of 5.262 \\AA"}", "/scratch/micpie/export/qmof_gcmc/train_0-41.jsonl": "{"text":"The MOF with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has a methane working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 1.456 mol\/kg."} {"text":"The reticular material with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a methane working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 0.238 mol\/kg."}", "/scratch/micpie/export/qmof_gcmc/train_0-29.jsonl": "{"text":"The reticular material with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has an H2S adsorption energy (computed using grand 
canonical Monte Carlo) of -23.129 kJ\/mol."} {"text":"The metal-organic framework with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an H2S adsorption energy (computed using grand canonical Monte Carlo) of -33.545 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/valid_0-51.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has an H2 working capacity between 1 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 8.584 g\/L."} {"text":"The reticular material with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an H2 working capacity between 1 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 21.284 g\/L."}", "/scratch/micpie/export/qmof_gcmc/valid_0-23.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has an O2 adsorption energy (computed using grand canonical Monte Carlo) of -15.243 kJ\/mol."} {"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an O2 adsorption energy (computed using grand canonical Monte Carlo) of -11.739 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/valid_0-5.jsonl": "{"text":"The metal-organic framework with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has a 10-based logarithm of methane Henry coefficient (computed using grand canonical Monte Carlo) of -4.336 mol\/kg\/Pa."} {"text":"The MOF with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a 10-based logarithm of 
methane Henry coefficient (computed using grand canonical Monte Carlo) of -4.725 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/train_0-15.jsonl": "{"text":"The metal-organic framework with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has a 10-based logarithm of H2O Henry coefficient (computed using grand canonical Monte Carlo) of -13.373 mol\/kg\/Pa."} {"text":"The metal-organic framework (MOF) with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a 10-based logarithm of water Henry coefficient (computed using grand canonical Monte Carlo) of -10.355 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/valid_0-4.jsonl": "{"text":"The metal-organic framework (MOF) with the RCSR code fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of methane Henry coefficient (computed using grand canonical Monte Carlo) of -4.336 mol\/kg\/Pa."} {"text":"The MOF with the RCSR identifier pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of CH4 Henry coefficient (computed using grand canonical Monte Carlo) of -4.725 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/train_0-5.jsonl": "{"text":"The MOF with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has a 10-based logarithm of methane Henry coefficient (computed using grand canonical Monte Carlo) of -4.936 mol\/kg\/Pa."} {"text":"The reticular material with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a 10-based logarithm of CH4 Henry coefficient (computed using grand canonical Monte Carlo) of -3.999 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/valid_0-15.jsonl": "{"text":"The reticular material with the MOFId 
[O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has a 10-based logarithm of water Henry coefficient (computed using grand canonical Monte Carlo) of -10.620 mol\/kg\/Pa."} {"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a 10-based logarithm of H2O Henry coefficient (computed using grand canonical Monte Carlo) of -9.862 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/valid_0-54.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has a density of 1.108 g\/cm^3."} {"text":"The reticular material with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a density of 0.845 g\/cm^3."}", "/scratch/micpie/export/qmof_gcmc/valid_0-12.jsonl": "{"text":"The metal-organic framework with the RCSR identifier fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of hydrogen sulfide Henry coefficient (computed using grand canonical Monte Carlo) of -7.002 mol\/kg\/Pa."} {"text":"The MOF with the RCSR code pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of hydrogen sulfide Henry coefficient (computed using grand canonical Monte Carlo) of -8.471 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/train_0-50.jsonl": "{"text":"The MOF with the topology nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has an hydrogen working capacity between 1 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 7.010 g\/L."} {"text":"The reticular material with the net pcu, 
linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has an H2 working capacity between 1 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 8.424 g\/L."}", "/scratch/micpie/export/qmof_gcmc/valid_0-18.jsonl": "{"text":"The reticular material with the topology fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an N2 adsorption energy (computed using grand canonical Monte Carlo) of -15.288 kJ\/mol."} {"text":"The metal-organic framework (MOF) with the RCSR identifier pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an nitrogen adsorption energy (computed using grand canonical Monte Carlo) of -11.424 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/train_0-45.jsonl": "{"text":"The metal-organic framework with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has an O2 working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 3.115 mol\/kg."} {"text":"The MOF with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an O2 working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 1.768 mol\/kg."}", "/scratch/micpie/export/qmof_gcmc/test_0-35.jsonl": "{"text":"The metal-organic framework with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has a methane\/nitrogen selectivity (computed using grand canonical Monte Carlo) of 3.256."} {"text":"The MOF with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has a methane\/nitrogen selectivity (computed using grand canonical Monte Carlo) of 5.595."}", "/scratch/micpie/export/qmof_gcmc/valid_0-56.jsonl": "{"text":"User: I must to design a metal-organic framework with specific 
gas adsorption properties.\nAssistant: How can I be of assistance?\nUser: I want the CH4\/N2 selectivity (computed using grand canonical Monte Carlo) to be 7.834, the hydrogen sulfide\/water selectivity (computed using grand canonical Monte Carlo) to be 37.275, and the CH4 working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) to be 39.851.\nAssistant: I suggest [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1."} {"text":"User: I need to find a metal-organic framework (MOF) with specific gas adsorption properties.\nAssistant: What can I do for you?\nUser: I would like the CH4\/N2 selectivity (computed using grand canonical Monte Carlo) to be 4.236, the H2S\/H2O selectivity (computed using grand canonical Monte Carlo) to be 4.017, and the methane working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) to be 100.994.\nAssistant: I suggest [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1."}", "/scratch/micpie/export/qmof_gcmc/test_0-47.jsonl": "{"text":"The metal-organic framework with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has an H2 working capacity between 5 and 100 bar at 298 K (computed using grand canonical Monte Carlo) of 17.859 g\/L."} {"text":"The reticular material with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has an hydrogen working capacity between 5 and 100 bar at 298 K (computed using grand canonical Monte Carlo) of 9.453 g\/L."}", "/scratch/micpie/export/qmof_gcmc/train_0-2.jsonl": "{"text":"The MOF with the RCSR code nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has a 10-based logarithm of N2 Henry coefficient (computed using grand canonical Monte Carlo) of -5.682 mol\/kg\/Pa."} {"text":"The reticular material 
with the RCSR code pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has a 10-based logarithm of nitrogen Henry coefficient (computed using grand canonical Monte Carlo) of -4.976 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/valid_0-33.jsonl": "{"text":"The MOF with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has a hydrogen sulfide\/water selectivity (computed using grand canonical Monte Carlo) of 37.275."} {"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a hydrogen sulfide\/water selectivity (computed using grand canonical Monte Carlo) of 4.017."}", "/scratch/micpie/export/qmof_gcmc/train_0-42.jsonl": "{"text":"The reticular material with the RCSR identifier nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has an oxygen working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 82.446 cm^3 STP\/cm^3."} {"text":"The reticular material with the RCSR identifier pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has an oxygen working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 49.111 cm^3 STP\/cm^3."}", "/scratch/micpie/export/qmof_gcmc/test_0-11.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has a 10-based logarithm of krypton Henry coefficient (computed using grand canonical Monte Carlo) of -4.825 mol\/kg\/Pa."} {"text":"The reticular material with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has a 10-based logarithm of Kr Henry coefficient (computed using grand canonical Monte Carlo) of -4.879 
mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/train_0-7.jsonl": "{"text":"The MOF with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has a 10-based logarithm of O2 Henry coefficient (computed using grand canonical Monte Carlo) of -5.645 mol\/kg\/Pa."} {"text":"The metal-organic framework (MOF) with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a 10-based logarithm of oxygen Henry coefficient using grand canonical Monte Carlo of -4.973 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/test_0-17.jsonl": "{"text":"The reticular material with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has an carbon dioxide adsorption energy (computed using grand canonical Monte Carlo) of -18.846 kJ\/mol."} {"text":"The reticular material with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has an CO2 adsorption energy (computed using grand canonical Monte Carlo) of -20.940 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/valid_0-27.jsonl": "{"text":"The reticular material with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has an Kr adsorption energy (computed using grand canonical Monte Carlo) of -21.832 kJ\/mol."} {"text":"The metal-organic framework with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an krypton adsorption energy (computed using grand canonical Monte Carlo) of -16.291 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/valid_0-19.jsonl": "{"text":"The reticular material with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has an N2 adsorption energy (computed using grand canonical Monte Carlo) of -15.288 kJ\/mol."} {"text":"The metal-organic 
framework with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an N2 adsorption energy (computed using grand canonical Monte Carlo) of -11.424 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/train_0-11.jsonl": "{"text":"The reticular material with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has a 10-based logarithm of krypton Henry coefficient (computed using grand canonical Monte Carlo) of -4.885 mol\/kg\/Pa."} {"text":"The metal-organic framework (MOF) with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a 10-based logarithm of Kr Henry coefficient (computed using grand canonical Monte Carlo) of -3.929 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/valid_0-31.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has an H2O adsorption energy (computed using grand canonical Monte Carlo) of -27.533 kJ\/mol."} {"text":"The metal-organic framework with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an water adsorption energy (computed using grand canonical Monte Carlo) of -28.572 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/train_0-1.jsonl": "{"text":"The metal-organic framework with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has a 10-based logarithm of carbon dioxide Henry coefficient (computed using grand canonical Monte Carlo) of -4.756 mol\/kg\/Pa."} {"text":"The metal-organic framework (MOF) with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a 10-based logarithm of carbon dioxide Henry coefficient (computed using grand canonical Monte Carlo) of -3.258 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/test_0-37.jsonl": "{"text":"The reticular material with the MOFId 
[O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has a Xe\/Kr selectivity (computed using grand canonical Monte Carlo) of 5.718."} {"text":"The MOF with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has a Xe\/Kr selectivity (computed using grand canonical Monte Carlo) of 10.790."}", "/scratch/micpie/export/qmof_gcmc/train_0-49.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has an hydrogen working capacity between 5 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 3.738 g\/L."} {"text":"The metal-organic framework (MOF) with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an hydrogen working capacity between 5 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 5.339 g\/L."}", "/scratch/micpie/export/qmof_gcmc/train_0-13.jsonl": "{"text":"The MOF with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has a 10-based logarithm of hydrogen sulfide Henry coefficient (computed using grand canonical Monte Carlo) of -9.308 mol\/kg\/Pa."} {"text":"The reticular material with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a 10-based logarithm of H2S Henry coefficient (computed using grand canonical Monte Carlo) of -5.927 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/valid_0-26.jsonl": "{"text":"The MOF with the RCSR identifier fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an Kr adsorption energy (computed using grand canonical Monte Carlo) of -21.832 kJ\/mol."} {"text":"The metal-organic framework with the RCSR code pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an Kr adsorption energy 
(computed using grand canonical Monte Carlo) of -16.291 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/train_0-4.jsonl": "{"text":"The MOF with the net nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has a 10-based logarithm of CH4 Henry coefficient (computed using grand canonical Monte Carlo) of -4.936 mol\/kg\/Pa."} {"text":"The metal-organic framework with the topology pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has a 10-based logarithm of CH4 Henry coefficient (computed using grand canonical Monte Carlo) of -3.999 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/test_0-53.jsonl": "{"text":"The metal-organic framework with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has a largest cavity diameter (LCD) of 11.689 \\AA."} {"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has a largest cavity diameter of 8.205 \\AA."}", "/scratch/micpie/export/qmof_gcmc/test_0-43.jsonl": "{"text":"The MOF with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has an oxygen working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 220.487 cm^3 STP\/cm^3."} {"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has an O2 working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 91.761 cm^3 STP\/cm^3."}", "/scratch/micpie/export/qmof_gcmc/test_0-7.jsonl": "{"text":"The metal-organic framework with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has a 10-based logarithm of oxygen Henry coefficient using grand canonical Monte Carlo of -5.340 mol\/kg\/Pa."} 
{"text":"The metal-organic framework with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has a 10-based logarithm of oxygen Henry coefficient using grand canonical Monte Carlo of -5.643 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/train_0-9.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has a 10-based logarithm of xenon Henry coefficient (computed using grand canonical Monte Carlo) of -3.849 mol\/kg\/Pa."} {"text":"The reticular material with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a 10-based logarithm of Xe Henry coefficient (computed using grand canonical Monte Carlo) of -2.465 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/train_0-25.jsonl": "{"text":"The metal-organic framework with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has an Xe adsorption energy (computed using grand canonical Monte Carlo) of -23.816 kJ\/mol."} {"text":"The MOF with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an xenon adsorption energy (computed using grand canonical Monte Carlo) of -33.620 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/test_0-48.jsonl": "{"text":"The metal-organic framework with the RCSR identifier fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an hydrogen working capacity between 5 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 20.246 g\/L."} {"text":"The MOF with the RCSR identifier ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has an hydrogen working capacity between 5 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 4.484 g\/L."}", "/scratch/micpie/export/qmof_gcmc/valid_0-45.jsonl": "{"text":"The reticular material with the MOFId 
[O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has an oxygen working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 4.226 mol\/kg."} {"text":"The metal-organic framework with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an oxygen working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 9.917 mol\/kg."}", "/scratch/micpie/export/qmof_gcmc/valid_0-41.jsonl": "{"text":"The metal-organic framework with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has a methane working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 1.605 mol\/kg."} {"text":"The metal-organic framework with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a methane working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 5.331 mol\/kg."}", "/scratch/micpie/export/qmof_gcmc/valid_0-22.jsonl": "{"text":"The metal-organic framework with the topology fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an O2 adsorption energy (computed using grand canonical Monte Carlo) of -15.243 kJ\/mol."} {"text":"The metal-organic framework (MOF) with the RCSR identifier pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an oxygen adsorption energy (computed using grand canonical Monte Carlo) of -11.739 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/test_0-45.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFId 
[O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has an oxygen working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 15.719 mol\/kg."} {"text":"The reticular material with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has an oxygen working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 3.288 mol\/kg."}", "/scratch/micpie/export/qmof_gcmc/valid_0-44.jsonl": "{"text":"The metal-organic framework with the net fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an O2 working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 4.226 mol\/kg."} {"text":"The metal-organic framework with the topology pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an O2 working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 9.917 mol\/kg."}", "/scratch/micpie/export/qmof_gcmc/train_0-18.jsonl": "{"text":"The reticular material with the RCSR identifier nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has an N2 adsorption energy (computed using grand canonical Monte Carlo) of -11.895 kJ\/mol."} {"text":"The metal-organic framework (MOF) with the net pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has an nitrogen adsorption energy (computed using grand canonical Monte Carlo) of -17.612 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/valid_0-3.jsonl": "{"text":"The metal-organic framework with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has a 10-based logarithm of N2 Henry coefficient (computed using grand 
canonical Monte Carlo) of -5.230 mol\/kg\/Pa."} {"text":"The metal-organic framework with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a 10-based logarithm of N2 Henry coefficient (computed using grand canonical Monte Carlo) of -5.352 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/test_0-8.jsonl": "{"text":"The reticular material with the topology fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of Xe Henry coefficient (computed using grand canonical Monte Carlo) of -4.068 mol\/kg\/Pa."} {"text":"The MOF with the net ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has a 10-based logarithm of xenon Henry coefficient (computed using grand canonical Monte Carlo) of -3.846 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/test_0-14.jsonl": "{"text":"The MOF with the RCSR identifier fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of H2O Henry coefficient (computed using grand canonical Monte Carlo) of -7.646 mol\/kg\/Pa."} {"text":"The metal-organic framework with the RCSR identifier ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has a 10-based logarithm of water Henry coefficient (computed using grand canonical Monte Carlo) of -12.794 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/valid_0-17.jsonl": "{"text":"The reticular material with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has an carbon dioxide adsorption energy (computed using grand canonical Monte Carlo) of -25.983 kJ\/mol."} {"text":"The metal-organic framework with the MOFId [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 
MOFid-v1.pcu.cat1 has an carbon dioxide adsorption energy (computed using grand canonical Monte Carlo) of -22.982 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/valid_0-55.jsonl": "{"text":"User: I have to find a reticular material with specific gas adsorption properties.\nAssistant: What can I do for you?\nUser: I want the H2S\/H2O selectivity (computed using grand canonical Monte Carlo) to be 37.275 and the CH4\/N2 selectivity (computed using grand canonical Monte Carlo) to be 7.834.\nAssistant: I found the following MOF for you: [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1."} {"text":"User: I must to synthesize a MOF with specific gas adsorption properties.\nAssistant: Interesting, can I be of any help?\nUser: I would like the H2S\/H2O selectivity (computed using grand canonical Monte Carlo) to be 4.017 and the CH4\/N2 selectivity (computed using grand canonical Monte Carlo) to be 4.236.\nAssistant: I found the following MOF for you: [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1."}", "/scratch/micpie/export/qmof_gcmc/valid_0-14.jsonl": "{"text":"The MOF with the net fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of H2O Henry coefficient (computed using grand canonical Monte Carlo) of -10.620 mol\/kg\/Pa."} {"text":"The metal-organic framework (MOF) with the net pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of H2O Henry coefficient (computed using grand canonical Monte Carlo) of -9.862 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/test_0-25.jsonl": "{"text":"The MOF with the MOFId [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat0 has an Xe 
adsorption energy (computed using grand canonical Monte Carlo) of -19.068 kJ\/mol."} {"text":"The metal-organic framework (MOF) with the MOFId [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C.[Zn] MOFid-v1.ggl.cat0 has an xenon adsorption energy (computed using grand canonical Monte Carlo) of -25.166 kJ\/mol."}", "/scratch/micpie/export/qmof_gcmc/test_0-4.jsonl": "{"text":"The MOF with the topology fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of CH4 Henry coefficient (computed using grand canonical Monte Carlo) of -4.876 mol\/kg\/Pa."} {"text":"The metal-organic framework with the topology ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has a 10-based logarithm of CH4 Henry coefficient (computed using grand canonical Monte Carlo) of -4.943 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/valid_0-48.jsonl": "{"text":"The metal-organic framework (MOF) with the net fsc, linker SMILES [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an hydrogen working capacity between 5 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 5.500 g\/L."} {"text":"The MOF with the net pcu, linker SMILES [O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an hydrogen working capacity between 5 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 11.642 g\/L."}", "/scratch/micpie/export/qmof_gcmc/valid_0-43.jsonl": "{"text":"The MOF with the MOFId [O-]C(=O)c1ccc(cc1)c1cc(c2ccc(cc2)C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.fsc.cat1 has an O2 working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 104.906 cm^3 STP\/cm^3."} {"text":"The metal-organic framework (MOF) with the MOFId 
[O-]C(=O)c1ccc2-c3c(Cc2c1)cc(cc3)C(=O)[O-].[Zn][Zn].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an oxygen working capacity between 5 and 140 bar at 298 K (computed using grand canonical Monte Carlo) of 187.860 cm^3 STP\/cm^3."}", "/scratch/micpie/export/qmof_gcmc/train_0-53.jsonl": "{"text":"The metal-organic framework with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has a largest cavity diameter (LCD) of 7.392 \\AA."} {"text":"The metal-organic framework (MOF) with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a largest cavity diameter of 4.949 \\AA."}", "/scratch/micpie/export/qmof_gcmc/train_0-37.jsonl": "{"text":"The metal-organic framework (MOF) with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has a Xe\/Kr selectivity (computed using grand canonical Monte Carlo) of 10.858."} {"text":"The metal-organic framework with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has a xenon\/krypton selectivity (computed using grand canonical Monte Carlo) of 29.142."}", "/scratch/micpie/export/qmof_gcmc/test_0-12.jsonl": "{"text":"The metal-organic framework (MOF) with the net fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a 10-based logarithm of hydrogen sulfide Henry coefficient (computed using grand canonical Monte Carlo) of -8.621 mol\/kg\/Pa."} {"text":"The reticular material with the RCSR code ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has a 10-based logarithm of hydrogen sulfide Henry coefficient (computed using grand canonical Monte Carlo) of -9.227 mol\/kg\/Pa."}", "/scratch/micpie/export/qmof_gcmc/train_0-38.jsonl": "{"text":"The reticular material with the net nbo, linker SMILES n1ccc(cc1)C1=NN=C([N]1)c1ccccn1, and node SMILES [Co] has a CH4 working capacity between 58 and 65 bar at 298 K (computed using grand 
canonical Monte Carlo) of 38.544 cm^3 STP\/cm^3."} {"text":"The metal-organic framework with the RCSR identifier pcu, linker SMILES [O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-], n1ccc(cc1)c1ccncc1, and node SMILES [Fe] has a CH4 working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 6.617 cm^3 STP\/cm^3."}", "/scratch/micpie/export/qmof_gcmc/train_0-51.jsonl": "{"text":"The metal-organic framework with the MOFId [Co].n1ccc(cc1)C1=NN=C([N]1)c1ccccn1 MOFid-v1.nbo.cat0 has an hydrogen working capacity between 1 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 7.010 g\/L."} {"text":"The metal-organic framework with the MOFId [Fe].[O-]C(=O)c1ccc2c(c1)ccc(c2)C(=O)[O-].n1ccc(cc1)c1ccncc1 MOFid-v1.pcu.cat1 has an hydrogen working capacity between 1 and 100 bar at 77 K (computed using grand canonical Monte Carlo) of 8.424 g\/L."}", "/scratch/micpie/export/qmof_gcmc/test_0-38.jsonl": "{"text":"The metal-organic framework (MOF) with the topology fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has a CH4 working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 151.202 cm^3 STP\/cm^3."} {"text":"The MOF with the topology ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has a CH4 working capacity between 58 and 65 bar at 298 K (computed using grand canonical Monte Carlo) of 29.724 cm^3 STP\/cm^3."}", "/scratch/micpie/export/qmof_gcmc/test_0-20.jsonl": "{"text":"The MOF with the RCSR identifier fsc, linker SMILES [O-]C(=O)c1cc(C(=O)[O-])c(cc1c1ccc(cc1)C(=O)[O-])c1ccc(cc1)C(=O)[O-], n1ccc(cc1)c1ccc(cc1)c1ccncc1, and node SMILES [Zn][Zn] has an methane adsorption energy (computed using grand canonical Monte Carlo) of -12.747 kJ\/mol."} {"text":"The metal-organic framework with the topology ggl, linker SMILES [O-]C(=O)C(NC(=O)C1=CN=N[CH]1)C, and node SMILES [Zn] has an methane 
adsorption energy (computed using grand canonical Monte Carlo) of -17.165 kJ\/mol."}", "/scratch/micpie/export/drug_protein_drug/test_0-1.jsonl": "{"text":"The protein Adenosine receptor A3 is targeted by the drugs Aminophylline and CN1C=NC2=C1C(=O)N(C)C(=O)N2C."} {"text":"The protein D(1A) dopamine receptor is targeted by the drugs [C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][S][C][=C][Ring1][#Branch2][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][Branch1][C][F][Branch1][C][F][F] and Periciazine."}", "/scratch/micpie/export/drug_protein_drug/valid_0-0.jsonl": "{"text":"The drug [C][S][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][Ring1][O][C][=C][Branch1][C][F][C][=C][C][=C][Ring1][#Branch1][F] targets the protein TM22 which is also targeted by the drug Caffeine."} {"text":"The drug [H][N][C][C][C][C][C][Ring1][=Branch1][C][Branch1][C][O][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1] targets the protein Dopamine D1 receptor which is also targeted by the drug Periciazine."}", "/scratch/micpie/export/drug_protein_drug/test_0-2.jsonl": "{"text":"User: Can you give me an example for a drug that targets the protein Adenosine receptor A3?\nAssistant: Sure, the drug targets the protein Adenosine receptor A3.\nUser: Can you tell me another drug that targets the protein Adenosine receptor A3?\nAssistant: Yes, the drug Aminophylline targets the drug CN1C=NC2=C1C(=O)N(C)C(=O)N2C."} {"text":"User: Can you give me an example for a drug that targets the protein D(1A) dopamine receptor?\nAssistant: Of course, the drug targets the protein D(1A) dopamine receptor.\nUser: Can you tell me another drug that targets the protein D(1A) dopamine receptor?\nAssistant: Yes, the drug CN(C)CCCN1c2ccccc2Sc2ccc(C(F)(F)F)cc21 targets the drug OC1CCN(CCCN2C3=CC=CC=C3SC3=C2C=C(C=C3)C#N)CC1."}", "/scratch/micpie/export/drug_protein_drug/test_0-0.jsonl": "{"text":"The drug 
NCCN.CNC=CNC=N5)))C=O)NC)C6=O.CNC=CNC=N5)))C=O)NC)C6=O targets the protein Adenosine receptor A3 which is also targeted by the drug CN1C=NC2=C1C(=O)N(C)C(=O)N2C."} {"text":"The drug [C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][S][C][=C][Ring1][#Branch2][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][Branch1][C][F][Branch1][C][F][F] targets the protein Dopamine D1 receptor which is also targeted by the drug OC1CCN(CCCN2C3=CC=CC=C3SC3=C2C=C(C=C3)C#N)CC1."}", "/scratch/micpie/export/drug_protein_drug/train_0-0.jsonl": "{"text":"The drug [C][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=N][C][=N][C][=N][N][Ring1][#Branch2][Ring1][Branch1] targets the protein Cone cGMP-specific 3',5'-cyclic phosphodiesterase subunit alpha' which is also targeted by the drug CN1C=NC2=C1C(=O)N(C)C(=O)N2C."} {"text":"The drug N[C@@H](CC(N)=O)C(O)=O targets the protein System N amino acid transporter 1 which is also targeted by the drug N[C@@H](CC1=CNC=N1)C(O)=O."}", "/scratch/micpie/export/drug_protein_drug/valid_0-2.jsonl": "{"text":"User: Can you come up with one example for a drug that targets the protein HCP1?\nAssistant: Yes, the drug targets the protein HCP1.\nUser: Can you tell me another drug that targets the protein HCP1?\nAssistant: Yes, the drug CSc1nc2ccccc2c(=O)n1-c1c(F)cccc1F targets the drug CN1C=NC2=C1C(=O)N(C)C(=O)N2C."} {"text":"User: Can you come up with an example for a drug that targets the protein D(1A) dopamine receptor?\nAssistant: Yes, the drug targets the protein D(1A) dopamine receptor.\nUser: Can you tell me another drug that targets the protein D(1A) dopamine receptor?\nAssistant: Sure, the drug [H][N][C][C][C][C][C][Ring1][=Branch1][C][Branch1][C][O][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1] targets the drug OC1CCN(CCCN2C3=CC=CC=C3SC3=C2C=C(C=C3)C#N)CC1."}", "/scratch/micpie/export/drug_protein_drug/valid_0-1.jsonl": "{"text":"The protein TM22 is targeted by the drugs 
3-(2,6-difluorophenyl)-2-(methylthio)quinazolin-4(3H)-one and Caffeine."} {"text":"The protein D(1A) dopamine receptor is targeted by the drugs [H][N][C][C][C][C][C][Ring1][=Branch1][C][Branch1][C][O][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1] and Periciazine."}", "/scratch/micpie/export/drug_protein_drug/train_0-2.jsonl": "{"text":"User: Can you give me one example for a drug that targets the protein cGMP phosphodiesterase 6C?\nAssistant: Sure, the drug targets the protein cGMP phosphodiesterase 6C.\nUser: Can you tell me another drug that targets the protein cGMP phosphodiesterase 6C?\nAssistant: Sure, the drug CCN(CC)c1cc(C)nc2ncnn12 targets the drug CN1C=NC2=C1C(=O)N(C)C(=O)N2C."} {"text":"User: Can you come up with an example for a drug that targets the protein N-system amino acid transporter 1?\nAssistant: Yes, of course, the drug targets the protein N-system amino acid transporter 1.\nUser: Can you tell me another drug that targets the protein N-system amino acid transporter 1?\nAssistant: Of course, the drug Asparagine targets the drug Histidine."}", "/scratch/micpie/export/drug_protein_drug/train_0-1.jsonl": "{"text":"The protein Cone cGMP-specific 3',5'-cyclic phosphodiesterase subunit alpha' is targeted by the drugs InChI=1S\/C10H15N5\/c1-4-14(5-2)9-6-8(3)13-10-11-7-12-15(9)10\/h6-7H,4-5H2,1-3H3 and CN1C=NC2=C1C(=O)N(C)C(=O)N2C."} {"text":"The protein N-system amino acid transporter 1 is targeted by the drugs Asparagine and N[C@@H](CC1=CNC=N1)C(O)=O."}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/test_0-10.jsonl": "{"text":"User: I'm searching for the DeepSMILES of a molecule that is inhibiting CYP P450 2C19?\nAssistant: This is a molecule that is inhibiting CYP P450 2C19: O=CCScnnccn6)[nH]ccccF)cc69))))))))))))))OCcccccc6"} {"text":"User: I'm looking for the SELFIES of a molecule that is inhibiting CYP P450 2C19?\nAssistant: This is a molecule that is inhibiting CYP P450 2C19: 
[C][C][Branch1][C][C][N][Branch2][Ring1][=C][C][C][N][C][=Branch1][C][=O][C][C][C][C][=Branch1][C][=O][N][Ring1][=Branch1][C][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][C][=C][C][=C][C][=C][Ring1][=Branch1]"}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/valid_0-8.jsonl": "{"text":"User: Is the molecule with the DeepSMILES CCOC=O)CSC=CC#N))CC)C=CCCCC6=O)))))N6 inhibiting CYP P450 2C19?\nAssistant: Yes, it is inhibiting CYP P450 2C19."} {"text":"User: Is the molecule with the DeepSMILES C=CCcccc[N+]C)C)CCC=O)CC[N+]C)C)ccccCC=C)))cc6)))))))))))))cc6 inhibiting CYP P450 2C19?\nAssistant: No, it is not inhibiting CYP P450 2C19."}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/train_0-8.jsonl": "{"text":"User: Is the molecule with the DeepSMILES CC=O)Nccccoc=O)sc5c9)))))))))S=O)=O)ccccs5 inhibiting CYP P450 2C19?\nAssistant: Yes, it is inhibiting CYP P450 2C19."} {"text":"User: Is the molecule with the canonical SMILES c1cc(-c2sc(-c3ccncc3)c(-c3ccncc3)c2-c2ccncc2)ccn1 inhibiting CYP P450 2C19?\nAssistant: Yes, it is inhibiting CYP P450 2C19."}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2C19.\nMolecule DeepSMILES: O=CCScnnccn6)[nH]ccccF)cc69))))))))))))))OCcccccc6\nConstraint: Answer the question in a full sentence.\nResult: This molecule is inhibiting CYP P450 2C19."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP2C19.\nMolecule canonical SMILES: CC(C)N(CCNC(=O)C1CCC(=O)N1Cc1ccc(F)cc1)Cc1ccccc1\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is inhibiting CYP2C19."}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/valid_0-9.jsonl": "{"text":"User: Can you generate the InChI of a molecule that is inhibiting CYP2C19?\nAssistant: Yes, here you go: 
InChI=1S\/C15H18N2O3S\/c1-3-20-13(19)8-21-15-10(7-16)9(2)14-11(17-15)5-4-6-12(14)18\/h9,17H,3-6,8H2,1-2H3"} {"text":"User: Can you give me the SMILES of a molecule that is not inhibiting CYP P450 2C19?\nAssistant: Yes, I'm happy to help, here you go: C=CCc1ccc([N+](C)(C)CCC(=O)CC[N+](C)(C)c2ccc(CC=C)cc2)cc1"}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/test_0-1.jsonl": "{"text":"Based on the canonical SMILES O=C(CSc1nnc2c(n1)[nH]c1ccc(F)cc12)OCc1ccccc1, the molecule shows inhibition of CYP P450 2C19."} {"text":"Based on the canonical SMILES representation CC(C)N(CCNC(=O)C1CCC(=O)N1Cc1ccc(F)cc1)Cc1ccccc1, the molecule shows inhibition of CYP P450 2C19."}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/valid_0-0.jsonl": "{"text":"The molecule with the InChI representation of InChI=1S\/C15H18N2O3S\/c1-3-20-13(19)8-21-15-10(7-16)9(2)14-11(17-15)5-4-6-12(14)18\/h9,17H,3-6,8H2,1-2H3 displays inhibition of CYP2C19."} {"text":"The molecule with the canonical SMILES C=CCc1ccc([N+](C)(C)CCC(=O)CC[N+](C)(C)c2ccc(CC=C)cc2)cc1 shows no inhibition of CYP2C19."}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/test_0-2.jsonl": "{"text":"The DeepSMILES O=CCScnnccn6)[nH]ccccF)cc69))))))))))))))OCcccccc6 is from a molecule that displays inhibition of CYP2C19."} {"text":"The InChI InChI=1S\/C24H30FN3O2\/c1-18(2)27(16-19-6-4-3-5-7-19)15-14-26-24(30)22-12-13-23(29)28(22)17-20-8-10-21(25)11-9-20\/h3-11,18,22H,12-17H2,1-2H3,(H,26,30) is from a molecule that displays inhibition of CYP2C19."}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/valid_0-10.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that is inhibiting CYP P450 2C19?\nAssistant: This is a molecule that is inhibiting CYP P450 2C19: CCOC(=O)CSC1=C(C#N)C(C)C2=C(CCCC2=O)N1"} {"text":"User: I'm searching for the canonical SMILES of a molecule that is not inhibiting CYP P450 2C19?\nAssistant: This is a molecule that is not inhibiting CYP 
P450 2C19: C=CCc1ccc([N+](C)(C)CCC(=O)CC[N+](C)(C)c2ccc(CC=C)cc2)cc1"}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/train_0-6.jsonl": "{"text":"Task: Please generate a canonical SMILES based on the text description.\nDescription: A molecule that is inhibiting CYP2C19.\nResult: CC(=O)N(c1ccc2oc(=O)sc2c1)S(=O)(=O)c1cccs1"} {"text":"Task: Please give me a molecule DeepSMILES based on the description.\nDescription: A molecule that is inhibiting CYP P450 2C19.\nResult: ccc-csc-cccncc6))))))c-cccncc6))))))c5-cccncc6)))))))))))ccn6"}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/valid_0-6.jsonl": "{"text":"Task: Please create a molecule InChI based on the text description below.\nDescription: A molecule that is inhibiting CYP P450 2C19.\nResult: InChI=1S\/C15H18N2O3S\/c1-3-20-13(19)8-21-15-10(7-16)9(2)14-11(17-15)5-4-6-12(14)18\/h9,17H,3-6,8H2,1-2H3"} {"text":"Task: Please generate a molecule canonical SMILES based on the text description.\nDescription: A molecule that is inhibiting CYP P450 2C19.\nResult: C=CCc1ccc([N+](C)(C)CCC(=O)CC[N+](C)(C)c2ccc(CC=C)cc2)cc1"}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/test_0-9.jsonl": "{"text":"User: Can you give me the InChI of a molecule that is inhibiting CYP2C19?\nAssistant: Yes, here you go: InChI=1S\/C18H13FN4O2S\/c19-12-6-7-14-13(8-12)16-17(20-14)21-18(23-22-16)26-10-15(24)25-9-11-4-2-1-3-5-11\/h1-8H,9-10H2,(H,20,21,23)"} {"text":"User: Can you create the DeepSMILES of a molecule that is inhibiting CYP P450 2C19?\nAssistant: Yes, I'm happy to help, here you go: CCC)NCCNC=O)CCCC=O)N5CccccF)cc6))))))))))))))))Ccccccc6"}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/test_0-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of O=C(CSc1nnc2c(n1)[nH]c1ccc(F)cc12)OCc1ccccc1 exhibits inhibition of CYP2C19."} {"text":"The molecule with the DeepSMILES CCC)NCCNC=O)CCCC=O)N5CccccF)cc6))))))))))))))))Ccccccc6 displays inhibition of CYP2C19."}", 
"/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/valid_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the SMILES CCOC(=O)CSC1=C(C#N)C(C)C2=C(CCCC2=O)N1 is inhibiting CYP P450 2C19?\nAssistant: Yes, this molecule is inhibiting CYP P450 2C19."} {"text":"User: Can you derive if the molecule with the InChI InChI=1S\/C27H38N2O\/c1-7-9-23-11-15-25(16-12-23)28(3,4)21-19-27(30)20-22-29(5,6)26-17-13-24(10-8-2)14-18-26\/h7-8,11-18H,1-2,9-10,19-22H2,3-6H3\/q+2 is inhibiting CYP2C19?\nAssistant: No, this molecule is not inhibiting CYP2C19."}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/test_0-3.jsonl": "{"text":"The canonical SMILES O=C(CSc1nnc2c(n1)[nH]c1ccc(F)cc12)OCc1ccccc1 is inhibiting CYP2C19."} {"text":"The SMILES CC(C)N(CCNC(=O)C1CCC(=O)N1Cc1ccc(F)cc1)Cc1ccccc1 is inhibiting CYP P450 2C19."}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/valid_0-11.jsonl": "{"text":"User: I want to create a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should be inhibiting CYP2C19.\nAssistant: Got it, here you go, this canonical SMILES is inhibiting CYP2C19: CCOC(=O)CSC1=C(C#N)C(C)C2=C(CCCC2=O)N1"} {"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should not be inhibiting CYP P450 2C19.\nAssistant: Got it, this DeepSMILES is not inhibiting CYP P450 2C19: C=CCcccc[N+]C)C)CCC=O)CC[N+]C)C)ccccCC=C)))cc6)))))))))))))cc6"}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/train_0-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CC(=O)N(c1ccc2oc(=O)sc2c1)S(=O)(=O)c1cccs1 exhibits inhibition of CYP2C19."} {"text":"The molecule with the SMILES representation of c1cc(-c2sc(-c3ccncc3)c(-c3ccncc3)c2-c2ccncc2)ccn1 displays inhibition of CYP2C19."}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/test_0-6.jsonl": "{"text":"Task: Please create a molecule SELFIES based on the description below.\nDescription: A molecule that is inhibiting CYP P450 2C19.\nResult: [O][=C][Branch2][Ring1][O][C][S][C][=N][N][=C][C][=Branch1][Ring2][=N][Ring1][=Branch1][NH1][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][Ring1][O][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]"} {"text":"Task: Please generate a molecule SMILES based on the description.\nDescription: A molecule that is inhibiting CYP P450 2C19.\nResult: CC(C)N(CCNC(=O)C1CCC(=O)N1Cc1ccc(F)cc1)Cc1ccccc1"}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/train_0-10.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that is inhibiting CYP P450 2C19?\nAssistant: This is a molecule that is inhibiting CYP P450 2C19: CC(=O)N(c1ccc2oc(=O)sc2c1)S(=O)(=O)c1cccs1"} {"text":"User: I'm looking for the InChI of a molecule that is inhibiting CYP2C19?\nAssistant: This is a molecule that is inhibiting CYP2C19: InChI=1S\/C24H16N4S\/c1-9-25-10-2-17(1)21-22(18-3-11-26-12-4-18)24(20-7-15-28-16-8-20)29-23(21)19-5-13-27-14-6-19\/h1-16H"}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/train_0-3.jsonl": "{"text":"The molecule SELFIES 
[C][C][=Branch1][C][=O][N][Branch1][P][C][=C][C][=C][O][C][=Branch1][C][=O][S][C][Ring1][=Branch1][=C][Ring1][#Branch2][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][S][Ring1][Branch1] is inhibiting CYP2C19."} {"text":"The SELFIES [C][=C][C][Branch2][Ring2][Ring2][C][S][C][Branch1][=Branch2][C][=C][C][=N][C][=C][Ring1][=Branch1][=C][Branch1][=Branch2][C][=C][C][=N][C][=C][Ring1][=Branch1][C][=Ring1][P][C][=C][C][=N][C][=C][Ring1][=Branch1][=C][C][=N][Ring2][Ring1][=N] is inhibiting CYP2C19."}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/train_0-12.jsonl": "{"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should be inhibiting CYP2C19.\nAssistant: Got it, this canonical SMILES is inhibiting CYP2C19: CC(=O)N(c1ccc2oc(=O)sc2c1)S(=O)(=O)c1cccs1"} {"text":"User: I want to create a InChI.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should be inhibiting CYP2C19.\nAssistant: Ok, this InChI is inhibiting CYP2C19: InChI=1S\/C24H16N4S\/c1-9-25-10-2-17(1)21-22(18-3-11-26-12-4-18)24(20-7-15-28-16-8-20)29-23(21)19-5-13-27-14-6-19\/h1-16H"}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of O=C(CSc1nnc2c(n1)[nH]c1ccc(F)cc12)OCc1ccccc1 inhibiting CYP P450 2C19?\nConstraint: Even if you are not sure, you must pick either a or b without using any additional words.\nOptions:\n[a] False\n[b] True\nAnswer: b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES CC(C)N(CCNC(=O)C1CCC(=O)N1Cc1ccc(F)cc1)Cc1ccccc1 inhibiting CYP P450 2C19?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any other words.\nOptions:\n1: False\n2: True\nAnswer: 2"}", 
"/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/valid_0-2.jsonl": "{"text":"The canonical SMILES CCOC(=O)CSC1=C(C#N)C(C)C2=C(CCCC2=O)N1 is from a molecule that displays inhibition of CYP2C19."} {"text":"The DeepSMILES C=CCcccc[N+]C)C)CCC=O)CC[N+]C)C)ccccCC=C)))cc6)))))))))))))cc6 is from a molecule that exhibits no inhibition of CYP2C19."}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are inhibiting CYP2C19?\nConstraint: You must select none, one or more options from a, b, c, d, or e without using any other words.\nOptions:\na. InChI=1S\/C31H48O6.Na\/c1-17(2)9-8-10-20(28(35)36)26-22-15-24(34)27-29(5)13-12-23(33)18(3)21(29)11-14-30(27,6)31(22,7)16-25(26)37-19(4)32;\/h9,18,21-25,27,33-34H,8,10-16H2,1-7H3,(H,35,36);\/q;+1\/p-1\/b26-20+;\/t18-,21+,22+,23+,24+,25-,27-,29-,30-,31-;\/m0.\/s1\nb. InChI=1S\/C11H14N2OS3\/c1-3-13-10(14)8(17-11(13)15)4-5-9-12(2)6-7-16-9\/h4-5H,3,6-7H2,1-2H3\/b8-4-,9-5+\nc. InChI=1S\/C27H27N3O3\/c1-20-6-5-9-24(18-20)30-16-14-29(15-17-30)23-12-10-22(11-13-23)28-26(31)19-25(27(32)33)21-7-3-2-4-8-21\/h2-13,18-19H,14-17H2,1H3,(H,28,31)(H,32,33)\/b25-19+\nd. InChI=1S\/C13H9NO5S3\/c1-8(15)14(22(17,18)12-3-2-6-20-12)9-4-5-10-11(7-9)21-13(16)19-10\/h2-7H,1H3\ne. 
InChI=1S\/C32H32O14\/c1-10-8-9-15-18-16(10)29(38)45-26-17-13(23(35)20(19(18)26)30(39)43-15)6-5-7-14(17)44-32-28(24(36)21(33)11(2)42-32)46-31-25(37)27(40-4)22(34)12(3)41-31\/h5-9,11-12,21-22,24-25,27-28,31-37H,1-4H3\/t11-,12-,21-,22-,24+,25-,27+,28-,31-,32-\/m0\/s1\nAnswer: b, c, d"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are inhibiting CYP2C19?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any additional words.\nOptions:\n[A] CNC(=O)c1sc(-c2cccnc2)nc1C\n[B] O=C1NC([O-])=NC1(c1ccccc1)c1ccccc1.[Na+]\n[C] CCOC(=O)c1[nH]c(C(=O)O)c(CCN(CC)CC)c1C\n[D] CCCc1[nH]nc2c1C(c1ccncc1)C(C#N)=C(N)O2\n[E] c1cc(-c2sc(-c3ccncc3)c(-c3ccncc3)c2-c2ccncc2)ccn1\nAnswer: D, E"}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/valid_0-1.jsonl": "{"text":"Based on the SELFIES [C][C][O][C][=Branch1][C][=O][C][S][C][=C][Branch1][Ring1][C][#N][C][Branch1][C][C][C][=C][Branch1][Branch2][C][C][C][C][Ring1][=Branch1][=O][N][Ring1][=C], the molecule exhibits inhibition of CYP P450 2C19."} {"text":"Based on the DeepSMILES C=CCcccc[N+]C)C)CCC=O)CC[N+]C)C)ccccCC=C)))cc6)))))))))))))cc6, the molecule exhibits no inhibition of CYP2C19."}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES [C][C][O][C][=Branch1][C][=O][C][S][C][=C][Branch1][Ring1][C][#N][C][Branch1][C][C][C][=C][Branch1][Branch2][C][C][C][C][Ring1][=Branch1][=O][N][Ring1][=C] inhibiting CYP2C19?\nConstraint: Even if you are not sure, you must pick either a or b without using any additional words.\nOptions:\na) False\nb) True\nAnswer: b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI representation of InChI=1S\/C27H38N2O\/c1-7-9-23-11-15-25(16-12-23)28(3,4)21-19-27(30)20-22-29(5,6)26-17-13-24(10-8-2)14-18-26\/h7-8,11-18H,1-2,9-10,19-22H2,3-6H3\/q+2 
inhibiting CYP2C19?\nConstraint: Even if you are uncertain, you must pick either A or B without using any other words.\nOptions:\nA. False\nB. True\nAnswer: A"}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2C19.\nInChI: InChI=1S\/C15H18N2O3S\/c1-3-20-13(19)8-21-15-10(7-16)9(2)14-11(17-15)5-4-6-12(14)18\/h9,17H,3-6,8H2,1-2H3\nConstraint: Answer the question in a full sentence.\nResult: This molecule is inhibiting CYP P450 2C19."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2C19.\nMolecule InChI: InChI=1S\/C27H38N2O\/c1-7-9-23-11-15-25(16-12-23)28(3,4)21-19-27(30)20-22-29(5,6)26-17-13-24(10-8-2)14-18-26\/h7-8,11-18H,1-2,9-10,19-22H2,3-6H3\/q+2\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting CYP P450 2C19."}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2C19.\nMolecule SMILES: CCOC(=O)CSC1=C(C#N)C(C)C2=C(CCCC2=O)N1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2C19.\nSMILES: C=CCc1ccc([N+](C)(C)CCC(=O)CC[N+](C)(C)c2ccc(CC=C)cc2)cc1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2C19.\nMolecule canonical SMILES: 
CC(=O)N(c1ccc2oc(=O)sc2c1)S(=O)(=O)c1cccs1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is inhibiting CYP P450 2C19."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2C19.\nInChI: InChI=1S\/C24H16N4S\/c1-9-25-10-2-17(1)21-22(18-3-11-26-12-4-18)24(20-7-15-28-16-8-20)29-23(21)19-5-13-27-14-6-19\/h1-16H\nConstraint: Answer the question in a full sentence.\nResult: This molecule is inhibiting CYP P450 2C19."}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/valid_0-12.jsonl": "{"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should be inhibiting CYP2C19.\nAssistant: Ok, this canonical SMILES is inhibiting CYP2C19: CCOC(=O)CSC1=C(C#N)C(C)C2=C(CCCC2=O)N1"} {"text":"User: I want to create a molecule SELFIES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be inhibiting CYP P450 2C19.\nAssistant: Understood, this SELFIES is not inhibiting CYP P450 2C19: [C][=C][C][C][=C][C][=C][Branch2][Ring2][Ring2][N+1][Branch1][C][C][Branch1][C][C][C][C][C][=Branch1][C][=O][C][C][N+1][Branch1][C][C][Branch1][C][C][C][=C][C][=C][Branch1][Ring2][C][C][=C][C][=C][Ring1][=Branch2][C][=C][Ring2][Ring1][O]"}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/train_0-2.jsonl": "{"text":"The canonical SMILES CC(=O)N(c1ccc2oc(=O)sc2c1)S(=O)(=O)c1cccs1 is from a molecule that shows inhibition of CYP P450 2C19."} {"text":"The DeepSMILES ccc-csc-cccncc6))))))c-cccncc6))))))c5-cccncc6)))))))))))ccn6 is from a molecule that exhibits inhibition of CYP P450 2C19."}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/test_0-11.jsonl": "{"text":"User: I want to generate a molecule InChI.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should be inhibiting CYP P450 2C19.\nAssistant: Got it, here you go, this InChI is inhibiting CYP P450 2C19: InChI=1S\/C18H13FN4O2S\/c19-12-6-7-14-13(8-12)16-17(20-14)21-18(23-22-16)26-10-15(24)25-9-11-4-2-1-3-5-11\/h1-8H,9-10H2,(H,20,21,23)"} {"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should be inhibiting CYP2C19.\nAssistant: Ok, here you go, this SELFIES is inhibiting CYP2C19: [C][C][Branch1][C][C][N][Branch2][Ring1][=C][C][C][N][C][=Branch1][C][=O][C][C][C][C][=Branch1][C][=O][N][Ring1][=Branch1][C][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][C][=C][C][=C][C][=C][Ring1][=Branch1]"}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/train_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the InChI InChI=1S\/C13H9NO5S3\/c1-8(15)14(22(17,18)12-3-2-6-20-12)9-4-5-10-11(7-9)21-13(16)19-10\/h2-7H,1H3 is inhibiting CYP P450 2C19?\nAssistant: Yes, this molecule is inhibiting CYP P450 2C19."} {"text":"User: Can you derive if the molecule with the SMILES c1cc(-c2sc(-c3ccncc3)c(-c3ccncc3)c2-c2ccncc2)ccn1 is inhibiting CYP2C19?\nAssistant: Yes, this molecule is inhibiting CYP2C19."}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/train_0-11.jsonl": "{"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should be inhibiting CYP2C19.\nAssistant: Got it, here you go, this SMILES is inhibiting CYP2C19: CC(=O)N(c1ccc2oc(=O)sc2c1)S(=O)(=O)c1cccs1"} {"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should be inhibiting CYP P450 2C19.\nAssistant: Ok, here you go, this DeepSMILES is inhibiting CYP P450 2C19: ccc-csc-cccncc6))))))c-cccncc6))))))c5-cccncc6)))))))))))ccn6"}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/train_0-1.jsonl": "{"text":"Based on the DeepSMILES representation CC=O)Nccccoc=O)sc5c9)))))))))S=O)=O)ccccs5, the molecule exhibits inhibition of CYP2C19."} {"text":"Based on the SMILES c1cc(-c2sc(-c3ccncc3)c(-c3ccncc3)c2-c2ccncc2)ccn1, the molecule exhibits inhibition of CYP2C19."}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES CC(=O)N(c1ccc2oc(=O)sc2c1)S(=O)(=O)c1cccs1 inhibiting CYP P450 2C19?\nConstraint: Even if you are uncertain, you must pick either A or B without using any other words.\nOptions:\nA) False\nB) True\nAnswer: B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES ccc-csc-cccncc6))))))c-cccncc6))))))c5-cccncc6)))))))))))ccn6 inhibiting CYP2C19?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any other words.\nOptions:\n[1] True\n[2] False\nAnswer: 1"}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2C19.\nSMILES: CC(=O)N(c1ccc2oc(=O)sc2c1)S(=O)(=O)c1cccs1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2C19.\nInChI: InChI=1S\/C24H16N4S\/c1-9-25-10-2-17(1)21-22(18-3-11-26-12-4-18)24(20-7-15-28-16-8-20)29-23(21)19-5-13-27-14-6-19\/h1-16H\nConstraint: Even if you are uncertain, you must pick either \"True\" or 
\"False\" without using any additional words.\nResult: True"}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/test_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the canonical SMILES O=C(CSc1nnc2c(n1)[nH]c1ccc(F)cc12)OCc1ccccc1 is inhibiting CYP P450 2C19?\nAssistant: Yes, this molecule is inhibiting CYP P450 2C19."} {"text":"User: Can you estimate if the molecule with the InChI InChI=1S\/C24H30FN3O2\/c1-18(2)27(16-19-6-4-3-5-7-19)15-14-26-24(30)22-12-13-23(29)28(22)17-20-8-10-21(25)11-9-20\/h3-11,18,22H,12-17H2,1-2H3,(H,26,30) is inhibiting CYP2C19?\nAssistant: Yes, this molecule is inhibiting CYP2C19."}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/train_0-9.jsonl": "{"text":"User: Can you give me the DeepSMILES of a molecule that is inhibiting CYP2C19?\nAssistant: Sure, here you go: CC=O)Nccccoc=O)sc5c9)))))))))S=O)=O)ccccs5"} {"text":"User: Can you give me the SMILES of a molecule that is inhibiting CYP P450 2C19?\nAssistant: Sure, here you go: c1cc(-c2sc(-c3ccncc3)c(-c3ccncc3)c2-c2ccncc2)ccn1"}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/valid_0-3.jsonl": "{"text":"The SELFIES [C][C][O][C][=Branch1][C][=O][C][S][C][=C][Branch1][Ring1][C][#N][C][Branch1][C][C][C][=C][Branch1][Branch2][C][C][C][C][Ring1][=Branch1][=O][N][Ring1][=C] is inhibiting CYP P450 2C19."} {"text":"The molecule DeepSMILES C=CCcccc[N+]C)C)CCC=O)CC[N+]C)C)ccccCC=C)))cc6)))))))))))))cc6 is not inhibiting CYP2C19."}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/test_0-8.jsonl": "{"text":"User: Is the molecule with the canonical SMILES O=C(CSc1nnc2c(n1)[nH]c1ccc(F)cc12)OCc1ccccc1 inhibiting CYP2C19?\nAssistant: Yes, it is inhibiting CYP2C19."} {"text":"User: Is the molecule with the canonical SMILES CC(C)N(CCNC(=O)C1CCC(=O)N1Cc1ccc(F)cc1)Cc1ccccc1 inhibiting CYP2C19?\nAssistant: Yes, it is inhibiting CYP2C19."}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/test_0-14.jsonl": "{"text":"Task: 
Please answer the multiple choice question.\nQuestion: Which molecules are inhibiting CYP2C19?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any additional words.\nOptions:\n[1] InChI=1S\/C31H44N2O9\/c1-21-12-10-16-29(35)33-25(18-40-17-24-13-7-6-8-14-24)31(37)42-20-27(39-5)22(2)11-9-15-28(34)32-23(3)30(36)41-19-26(21)38-4\/h6-14,21-23,25-27H,15-20H2,1-5H3,(H,32,34)(H,33,35)\/b11-9-,12-10-\/t21-,22-,23-,25+,26-,27-\/m1\/s1\n[2] InChI=1S\/C18H13FN4O2S\/c19-12-6-7-14-13(8-12)16-17(20-14)21-18(23-22-16)26-10-15(24)25-9-11-4-2-1-3-5-11\/h1-8H,9-10H2,(H,20,21,23)\n[3] InChI=1S\/C6H8As2O6\/c9-7(10,11)5-1-2-6(4-3-5)8(12,13)14\/h1-4H,(H2,9,10,11)(H2,12,13,14)\nAnswer: 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are inhibiting CYP P450 2C19?\nConstraint: You must select none, one or more options from 1, 2, 3, 4, or 5 without using any additional words.\nOptions:\n1.) CCCCOC=O)ccccO)cc6\n2.) COcccc-ncN)cC=O)NCcccco5))))))))sc5=S))))))cc6\n3.) C[N+]C)C)CCNCccccs5))))))cccccn6\n4.) Occcc\/C=C\\cccO)ccO)c6))))))))cc6\n5.) 
CCC)NCCNC=O)CCCC=O)N5CccccF)cc6))))))))))))))))Ccccccc6\nAnswer: 1, 2, 5"}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are inhibiting CYP P450 2C19?\nConstraint: You must select none, one or more options from 1 or 2 without using any other words.\nOptions:\n(1) O=ccCCcccccc6))))))))nccncNCCNCC6))))))nc6n%10CCcccccc6\n(2) CCOC=O)CSC=CC#N))CC)C=CCCCC6=O)))))N6\nAnswer: 1, 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting CYP P450 2C19?\nConstraint: You must select none, one or more options from A, B, C, or D without using any other words.\nOptions:\n(A) [N][C@@H1][Branch1][#Branch2][C][C][=C][C][=N][N][=C][Ring1][=Branch1][C][=Branch1][C][=O][O]\n(B) [C][=C][C][C][=C][C][=C][Branch2][Ring2][Ring2][N+1][Branch1][C][C][Branch1][C][C][C][C][C][=Branch1][C][=O][C][C][N+1][Branch1][C][C][Branch1][C][C][C][=C][C][=C][Branch1][Ring2][C][C][=C][C][=C][Ring1][=Branch2][C][=C][Ring2][Ring1][O]\n(C) [C][O][C][=Branch1][C][=O][C@@][Branch1][=N][C][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C@H1][C][C][=C][Branch1][#Branch2][C][=Branch1][C][=O][N][Branch1][C][C][C][N][Branch1][Ring2][C][C][F][C][=Ring1][=N][C][C@H1][Ring1][S][C][N][Ring2][Ring1][O][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1]\n(D) [C][C][Branch1][C][C][Branch2][Ring1][Branch1][C][O][C@H1][C][C@H1][C][C][C@@][Ring1][=Branch1][Branch1][C][C][C][Ring1][=Branch1][Branch1][C][C][C][N+1][=Branch1][C][=O][O-1]\nAnswer: A, B, D"}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2C19.\nMolecule SMILES: O=C(CSc1nnc2c(n1)[nH]c1ccc(F)cc12)OCc1ccccc1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: 
True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP2C19.\nInChI: InChI=1S\/C24H30FN3O2\/c1-18(2)27(16-19-6-4-3-5-7-19)15-14-26-24(30)22-12-13-23(29)28(22)17-20-8-10-21(25)11-9-20\/h3-11,18,22H,12-17H2,1-2H3,(H,26,30)\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"}", "/scratch/micpie/export/cyp_p450_2c19_inhibition_veith_et_al/test_0-12.jsonl": "{"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should be inhibiting CYP2C19.\nAssistant: Ok, this canonical SMILES is inhibiting CYP2C19: O=C(CSc1nnc2c(n1)[nH]c1ccc(F)cc12)OCc1ccccc1"} {"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should be inhibiting CYP2C19.\nAssistant: Got it, this DeepSMILES is inhibiting CYP2C19: CCC)NCCNC=O)CCCC=O)N5CccccF)cc6))))))))))))))))Ccccccc6"}", "/scratch/micpie/export/bio_ner_6/valid_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: Atypically, abundant capillaries were observed; however, the cystic proliferation of epithelioid cells with vacuoles and immunohistochemical profile of the epithelioid element were consistent with hepatic adenomatoid tumour..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: capillaries,21,32,Anatomy\ncystic,61,67,Anatomy\nepithelioid cells,85,102,Anatomy\nvacuoles,108,116,Anatomy\nepithelioid element,156,175,Anatomy\nhepatic adenomatoid tumour,197,223,Anatomy"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Supplementation lasted for 75days and consisted of psyllium seed and the following probiotic 
bacteria: Enterococcus faecium, Lactobacillus acidophilus, Lactobacillus plantarum, and Lactobacillus casei..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: psyllium seed,51,64,treatment\nprobiotic bacteria,83,101,treatment\nEnterococcus faecium,103,123,treatment\nLactobacillus acidophilus,125,150,treatment\nLactobacillus plantarum,152,175,treatment\nLactobacillus casei,181,200,treatment"}", "/scratch/micpie/export/bio_ner_6/test_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: To this end, two candidate anti-angiogenic RNA-damaging agents, onconase and (-4) rhEDN, were screened for their effects on endothelial cell proliferation using three distinct types of endothelial cells in culture: HPV-16 E6\/E7-immortalized human umbilical vein endothelial cells (HUVECs), a Kras-transformed HPV-16 E6\/E7 HUVEC (Rhim et al., Carcinogenesis 4, 673-681, 1998), and primary HUVECs..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: endothelial cell,130,146,Anatomy\nendothelial cells,191,208,Anatomy\nhuman umbilical vein endothelial cells,253,291,Anatomy\nHUVECs,294,300,Anatomy\nHUVEC,341,346,Anatomy\nHUVECs,410,416,Anatomy"} {"text":"Task: Please carry out the NER task for the the text below.\nText: To investigate these hypotheses, the virome capture sequencing for vertebrate viruses (VirCapSeq-VERT) platform was employed to detect RNA transcripts from known and novel viruses in fresh frozen lymph node tissue from CD patients (12 UCD, 11 HHV-8-negative MCD [ idiopathic MCD; iMCD], and two HHV-8-positive MCD) and related diseases (three T cell lymphoma and three Hodgkin lymphoma)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of 
being in the text.\nResult: CD patients,222,233,state\nUCD,239,242,state\nHHV - 8 - negative MCD,247,269,state\nHHV - 8 - positive MCD,303,325,state\nT cell lymphoma,356,371,state\nHodgkin lymphoma,382,398,state"}", "/scratch/micpie/export/bio_ner_6/train_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: Similar to our previous results with activated NK cells (Figs. 2 and 3), addition of NK cells to the culture leads to opposing effects on the iDC population depending on the NK\/DC ratio..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: NK cells,47,55,Anatomy\nNK cells,86,94,Anatomy\nculture,102,109,Anatomy\niDC,143,146,Anatomy\nNK,175,177,Anatomy\nDC,180,182,Anatomy"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Mice were exposed to Antibiotic Mix (ABX Mix), a combination of ampicillin (Sigma-Aldrich; Cat # A0166), metronidazole (Sigma-Aldrich; Cat # M1547), neomycin (Sigma-Aldrich; Cat # N6386), and vancomycin (Sigma-Aldrich; Cat # V1130), each at 1 g\/L, in the drinking water or control (untreated) water..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: Antibiotic Mix,21,35,treatment\nABX Mix,38,45,treatment\nampicillin,65,75,treatment\nmetronidazole,109,122,treatment\nneomycin,156,164,treatment\nvancomycin,202,212,treatment"}", "/scratch/micpie/export/bio_ner_7/valid_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: At the 5: 1 (NK\/DC) ratio, NK cell-mediated destruction of the iDCs was the dominant feature, whereas at the low ratio (1: 5) the same NK cells promoted iDCs survival compared with iDCs alone over the same period (Fig. 
3 A)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: NK,14,16,Anatomy\nDC,19,21,Anatomy\nNK cell,30,37,Anatomy\niDCs,68,72,Anatomy\nNK cells,141,149,Anatomy\niDCs,159,163,Anatomy\niDCs,187,191,Anatomy"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Patient materials Fresh frozen lymph node samples from iMCD (n = 11), UCD (n = 12), HHV-8-positive MCD (n = 2), and malignant lymphoma without any known or suspected association or clinical suspicion of HHV-8 infection (n = 6) were collected by either core needle or excisional biopsy at diagnosis or later during flare as part of routine clinical treatment..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: Patient,0,7,state\niMCD,55,59,state\nUCD,71,74,state\nHHV - 8 - positive MCD,86,108,state\nmalignant lymphoma,123,141,state\nHHV - 8 infection,210,227,state\nflare,324,329,state"}", "/scratch/micpie/export/bio_ner_7/test_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: These results indicate that the interaction of tumor cells and endothelial cells in orderly tumor angiomorphogenesis is highly dependent on the action of cell adhesion molecules mediating the adhesion of cancer cells to endothelial cells, inhibition of which remarkably retards tumor growth and angiogenesis..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: tumor cells,47,58,Anatomy\nendothelial cells,63,80,Anatomy\ntumor,92,97,Anatomy\ncell,154,158,Anatomy\ncancer cells,204,216,Anatomy\nendothelial cells,220,237,Anatomy\ntumor,278,283,Anatomy"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Nevertheless, 
discrimination between species was achieved for SbDV, BLRV, BChV, BMYV, CABYV and either PLRV or BWYV..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: SbDV,62,66,Organism\/Species\nBLRV,68,72,Organism\/Species\nBChV,74,78,Organism\/Species\nBMYV,80,84,Organism\/Species\nCABYV,86,91,Organism\/Species\nPLRV,103,107,Organism\/Species\nBWYV,111,115,Organism\/Species"}", "/scratch/micpie/export/bio_ner_7/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: (B) Contact-dependent stimulation of iDC TNF-alpha production by NK cells was tested under the following conditions: DCs alone (gray bars); NK+ DC (1: 5) (black bars); NK\/DC (1: 5) trans-wells (striped bars)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: iDC,40,43,Anatomy\nNK cells,70,78,Anatomy\nDCs,122,125,Anatomy\nNK,146,148,Anatomy\nDC,151,153,Anatomy\nNK,177,179,Anatomy\nDC,182,184,Anatomy"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: A starch, glucose and volatile fatty acids (VFA) mixture (acetate, propionate and butyrate) was fed for 55days; after that, the feeding continued with a cellulose and xylan mixture till day 190..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: starch,2,8,treatment\nglucose,10,17,treatment\nacetate,60,67,treatment\npropionate,69,79,treatment\nbutyrate,84,92,treatment\ncellulose,155,164,treatment\nxylan,169,174,treatment"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_5-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the 
DeepSMILES from the InChI string.\nInChI string: InChI=1S\/C15H15FN2OS\/c1-10-6-7-11(17)8-13(10)18-15(19)9-20-14-5-3-2-4-12(14)16\/h2-8H,9,17H2,1H3,(H,18,19)\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CccccN)cc6NC=O)CScccccc6F"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8-\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_2-4.jsonl": "{"text":"User: Can you tell me the DeepSMILES of the molecule with the InChI string InChI=1S\/C16H12FN3O\/c17-12-3-1-11(2-4-12)15-13(16(18)21)9-14(20-15)10-5-7-19-8-6-10\/h1-9,20H,(H2,18,21)?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of NC=O)ccc-cccncc6))))))[nH]c5-ccccF)cc6."} {"text":"User: Can you create the DeepSMILES of the molecule with the InChI string InChI=1S\/C17H15N3O3\/c1-23-15-8-7-11(9-14(15)18)16-13(17(21)22)10-20(19-16)12-5-3-2-4-6-12\/h2-10H,18H2,1H3,(H,21,22)?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of COcccc-cnn-cccccc6))))))cc5C=O)O)))))))cc6N."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_2-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the DeepSMILES from the InChI string.\nInChI string: InChI=1S\/C20H22FN.ClH\/c21-20-6-5-17-11-15(12-19(17)13-20)7-9-22-10-8-16-3-1-2-4-18(16)14-22;\/h1-6,13,15H,7-12,14H2;1H\nConstraint: Even if you are not sure, you must answer with a representation without 
using any other words.\nResult: Cl.Fcccccc6)CCCCNCCcccccc6C%10))))))))))))C5"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the InChI string.\nInChI string: InChI=1S\/C24H20N4O3S\/c1-15-8-10-16(11-9-15)28-23(30)22-21(18-6-2-3-7-19(18)26-22)27-24(28)32-14-20(29)25-13-17-5-4-12-31-17\/h2-12,26H,13-14H2,1H3,(H,25,29)\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: Ccccc-ncSCC=O)NCcccco5))))))))))ncc[nH]cccccc69)))))))c6=O)))))))cc6"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_4-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C16H18N2O4S\/c1-11-15(8-9-22-11)16(19)17-10-12-2-6-14(7-3-12)23(20,21)18-13-4-5-13\/h2-3,6-9,13,18H,4-5,10H2,1H3,(H,17,19) can also be represented with the DeepSMILES representation Ccoccc5C=O)NCccccS=O)=O)NCCC3)))))cc6."} {"text":"The molecule with the InChI string representation of InChI=1S\/C16H15N3O2S\/c1-11-12(6-9-21-11)16(20)18-8-5-15-19-14(10-22-15)13-4-2-3-7-17-13\/h2-4,6-7,9-10H,5,8H2,1H3,(H,18,20) can also be represented with the DeepSMILES Ccoccc5C=O)NCCcnc-cccccn6))))))cs5."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_4-4.jsonl": "{"text":"User: Can you tell me the DeepSMILES of the molecule with the InChI string InChI=1S\/C16H18N2O4S\/c1-11-15(8-9-22-11)16(19)17-10-12-2-6-14(7-3-12)23(20,21)18-13-4-5-13\/h2-3,6-9,13,18H,4-5,10H2,1H3,(H,17,19)?\nAssistant: Yes, this molecule has a DeepSMILES of Ccoccc5C=O)NCccccS=O)=O)NCCC3)))))cc6."} {"text":"User: Can you create the DeepSMILES of the molecule with the InChI string InChI=1S\/C16H15N3O2S\/c1-11-12(6-9-21-11)16(20)18-8-5-15-19-14(10-22-15)13-4-2-3-7-17-13\/h2-4,6-7,9-10H,5,8H2,1H3,(H,18,20)?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of Ccoccc5C=O)NCCcnc-cccccn6))))))cs5."}", 
"/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_1-4.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the InChI string InChI=1S\/C19H18N2O2S\/c1-13-20-18(12-24-13)15-6-8-16(9-7-15)21-19(22)11-14-4-3-5-17(10-14)23-2\/h3-10,12H,11H2,1-2H3,(H,21,22)?\nAssistant: Of course, this molecule has a DeepSMILES of COcccccCC=O)Ncccc-ccscC)n5)))))cc6)))))))))c6."} {"text":"User: Can you generate the DeepSMILES of the molecule with the InChI string InChI=1S\/C11H10N4O3\/c1-18-5-2-3-7-6(4-5)8-9(13-7)10(16)15(12)11(17)14-8\/h2-4,13H,12H2,1H3,(H,14,17)?\nAssistant: Of course, this molecule has a DeepSMILES of COcccc[nH]cc=O)nN)c=O)[nH]c6c9c%13."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_5-1.jsonl": "{"text":"The molecule with the DeepSMILES CccccN)cc6NC=O)CScccccc6F can also be represented with the InChI string representation InChI=1S\/C15H15FN2OS\/c1-10-6-7-11(17)8-13(10)18-15(19)9-20-14-5-3-2-4-12(14)16\/h2-8H,9,17H2,1H3,(H,18,19)."} {"text":"The molecule with the DeepSMILES representation of CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58 can also be represented with the InChI string representation InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8-."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_4-4.jsonl": "{"text":"User: Can you tell me the DeepSMILES of the molecule with the InChI string InChI=1S\/C15H14N4OS2\/c1-8-7-17-15(22-8)19-13(20)10-5-12(9-3-4-9)18-14(21-2)11(10)6-16\/h5,7,9H,3-4H2,1-2H3,(H,17,19,20)?\nAssistant: Of course, this molecule has a DeepSMILES of CScncCCC3)))ccC=O)NcnccC)s5)))))))c6C#N."} {"text":"User: Can you create the DeepSMILES of the molecule with the InChI string InChI=1S\/C17H22FNO4\/c1-11(2)15(17(21)23-10-14-4-3-9-22-14)19-16(20)12-5-7-13(18)8-6-12\/h5-8,11,14-15H,3-4,9-10H2,1-2H3,(H,19,20)?\nAssistant: Yes, I'm happy to help, this molecule 
has a DeepSMILES of CCC)CNC=O)ccccF)cc6))))))))C=O)OCCCCCO5."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_4-4.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the InChI string InChI=1S\/C17H22N2O3S\/c20-17(14-4-2-1-3-5-14)18-12-13-6-10-16(11-7-13)23(21,22)19-15-8-9-15\/h1-2,6-7,10-11,14-15,19H,3-5,8-9,12H2,(H,18,20)?\nAssistant: Sure, this molecule has a DeepSMILES of O=CNCccccS=O)=O)NCCC3)))))cc6))))))))CCC=CCC6."} {"text":"User: Can you generate the DeepSMILES of the molecule with the InChI string InChI=1S\/C17H17FN4O\/c1-2-23-17-9-14(18)5-8-16(17)20-10-13-3-6-15(7-4-13)22-12-19-11-21-22\/h3-9,11-12,20H,2,10H2,1H3?\nAssistant: Of course, this molecule has a DeepSMILES of CCOcccF)ccc6NCcccc-ncncn5)))))cc6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_1-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the DeepSMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C22H26N2O4\/c1-4-21(25)24-17-9-7-6-8-16(17)13-18(24)22(26)23-14-15-10-11-19(28-5-2)20(12-15)27-3\/h6-12,18H,4-5,13-14H2,1-3H3,(H,23,26)\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CCOccccCNC=O)CCcccccc6N9C=O)CC)))))))))))))))cc6OC"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C17H14ClN3O3S2\/c1-11-16(26(23,24)10-12-6-2-3-7-13(12)18)25-17(20-11)21-15(22)14-8-4-5-9-19-14\/h2-9H,10H2,1H3,(H,20,21,22)\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CcncNC=O)cccccn6))))))))sc5S=O)=O)Ccccccc6Cl"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_3-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule 
representation and the description.\nDescription: Generate the DeepSMILES from the InChI string.\nInChI string: InChI=1S\/C24H25BrClN3O3\/c1-5-29-16(3)24(15(2)28-29)27-23(30)11-7-17-6-9-21(31-4)18(12-17)14-32-22-10-8-19(26)13-20(22)25\/h6-13H,5,14H2,1-4H3,(H,27,30)\/b11-7+\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCnncC)cNC=O)\/C=C\/ccccOC))cCOccccCl)cc6Br)))))))))c6))))))))))c5C"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the InChI string.\nInChI string: InChI=1S\/C16H16FN3O2S\/c1-2-20(9-3-8-18)15(21)11-23-16-19-10-14(22-16)12-4-6-13(17)7-5-12\/h4-7,10H,2-3,9,11H2,1H3\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCNCCC#N))))C=O)CScncc-ccccF)cc6))))))o5"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_1-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the DeepSMILES CCOccccCNC=O)CCcccccc6N9C=O)CC)))))))))))))))cc6OC?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C22H26N2O4\/c1-4-21(25)24-17-9-7-6-8-16(17)13-18(24)22(26)23-14-15-10-11-19(28-5-2)20(12-15)27-3\/h6-12,18H,4-5,13-14H2,1-3H3,(H,23,26)."} {"text":"User: Can you tell me the InChI string of the molecule with the DeepSMILES CcncNC=O)cccccn6))))))))sc5S=O)=O)Ccccccc6Cl?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C17H14ClN3O3S2\/c1-11-16(26(23,24)10-12-6-2-3-7-13(12)18)25-17(20-11)21-15(22)14-8-4-5-9-19-14\/h2-9H,10H2,1H3,(H,20,21,22)."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_5-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C15H15FN2OS\/c1-10-6-7-11(17)8-13(10)18-15(19)9-20-14-5-3-2-4-12(14)16\/h2-8H,9,17H2,1H3,(H,18,19) can also be represented with the 
DeepSMILES representation CccccN)cc6NC=O)CScccccc6F."} {"text":"The molecule with the InChI string representation of InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8- can also be represented with the DeepSMILES CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_1-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the DeepSMILES COccc\/C=C\/C=O)O))))ccS=O)=O)NccccF)cc6))))))))c6OC?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C17H16FNO6S\/c1-24-14-9-11(3-8-16(20)21)10-15(17(14)25-2)26(22,23)19-13-6-4-12(18)5-7-13\/h3-10,19H,1-2H3,(H,20,21)\/b8-3+."} {"text":"User: Can you tell me the InChI string of the molecule with the DeepSMILES CccccNS=O)=O)ccccF)cc6))))))))cC=O)O))c6C?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C15H14FNO4S\/c1-9-3-8-13(14(10(9)2)15(18)19)17-22(20,21)12-6-4-11(16)5-7-12\/h3-8,17H,1-2H3,(H,18,19)."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_5-4.jsonl": "{"text":"User: Can you tell me the DeepSMILES of the molecule with the InChI string InChI=1S\/C14H15N7S\/c1-2-4-10(5-3-1)8-21-14(18-19-20-21)22-9-12-15-13(17-16-12)11-6-7-11\/h1-5,11H,6-9H2,(H,15,16,17)?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of ccccCnnnnc5SCcncCCC3)))n[nH]5)))))))))))))cc6."} {"text":"User: Can you generate the DeepSMILES of the molecule with the InChI string InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1?\nAssistant: Sure, this molecule has a DeepSMILES of Nccc[nH+]cccccc%106)))))))CCCC6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_0-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the DeepSMILES 
S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1."} {"text":"User: Can you generate the InChI string of the molecule with the DeepSMILES COccccCl)cc6CNccccnc6)))))))cccccccnc6c%10O?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C22H18ClN3O2\/c1-28-19-9-7-15(23)12-18(19)21(26-16-5-3-10-24-13-16)17-8-6-14-4-2-11-25-20(14)22(17)27\/h2-13,21,26-27H,1H3."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_3-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C15H17N5\/c1-3-4-13-9-14(20-15(19-13)16-10-17-20)18-12-7-5-11(2)6-8-12\/h5-10,18H,3-4H2,1-2H3 can also be represented with the DeepSMILES representation CCCcccNccccC)cc6)))))))nncnc5n9."} {"text":"The molecule with the InChI string representation of InChI=1S\/C15H18N2O4\/c1-11(18)16-13-5-2-4-12(10-13)15(20)21-9-8-17-7-3-6-14(17)19\/h2,4-5,10H,3,6-9H2,1H3,(H,16,18) can also be represented with the DeepSMILES CC=O)NcccccC=O)OCCNCCCC5=O))))))))))c6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_0-1.jsonl": "{"text":"The molecule with the DeepSMILES S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6 can also be represented with the InChI string InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1."} {"text":"The molecule with the DeepSMILES COccccCl)cc6CNccccnc6)))))))cccccccnc6c%10O can also be represented with the InChI string 
InChI=1S\/C22H18ClN3O2\/c1-28-19-9-7-15(23)12-18(19)21(26-16-5-3-10-24-13-16)17-8-6-14-4-2-11-25-20(14)22(17)27\/h2-13,21,26-27H,1H3."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_5-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C17H21N3O2S\/c1-11(2)15(17(22)20-6-3-4-7-20)19-16(21)12-9-14-13(18-10-12)5-8-23-14\/h5,8-11,15H,3-4,6-7H2,1-2H3,(H,19,21) can also be represented with the DeepSMILES CCC)CNC=O)ccncccsc5c9)))))))))))C=O)NCCCC5."} {"text":"The molecule with the InChI string InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1 can also be represented with the DeepSMILES representation CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_2-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C16H12FN3O\/c17-12-3-1-11(2-4-12)15-13(16(18)21)9-14(20-15)10-5-7-19-8-6-10\/h1-9,20H,(H2,18,21) can also be represented with the DeepSMILES representation NC=O)ccc-cccncc6))))))[nH]c5-ccccF)cc6."} {"text":"The molecule with the InChI string InChI=1S\/C17H15N3O3\/c1-23-15-8-7-11(9-14(15)18)16-13(17(21)22)10-20(19-16)12-5-3-2-4-6-12\/h2-10H,18H2,1H3,(H,21,22) can also be represented with the DeepSMILES COcccc-cnn-cccccc6))))))cc5C=O)O)))))))cc6N."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_2-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the InChI string.\nMolecule InChI string: 
InChI=1S\/C22H22Cl2N2O2\/c1-3-5-12-26(4-2)22(28)21(27)19-17-13-16(24)10-11-18(17)25-20(19)14-6-8-15(23)9-7-14\/h6-11,13,25H,3-5,12H2,1-2H3\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CCCCNCC))C=O)C=O)cc-ccccCl)cc6))))))[nH]ccccCl)cc96"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C22H26BrClN4O2\/c1-26(15-21(29)25-20-5-3-2-4-19(20)23)22(30)16-28-12-10-27(11-13-28)14-17-6-8-18(24)9-7-17\/h2-9H,10-16H2,1H3,(H,25,29)\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CNCC=O)Ncccccc6Br))))))))))C=O)CNCCNCccccCl)cc6)))))))CC6"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_0-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1 can also be represented with the DeepSMILES representation S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6."} {"text":"The molecule with the InChI string representation of InChI=1S\/C22H23N3O4\/c1-22(2,3)24-21(27)19(11-7-10-16-8-5-4-6-9-16)23-20(26)17-12-14-18(15-13-17)25(28)29\/h4-15H,1-3H3,(H,23,26)(H,24,27)\/b10-7+,19-11+ can also be represented with the DeepSMILES CCC)C)NC=O)\/C=C\\C=C\\cccccc6)))))))))NC=O)cccc[N+]=O)[O-]))cc6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_3-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the InChI string from the DeepSMILES.\nMolecule DeepSMILES: CCCcccNccccC)cc6)))))))nncnc5n9\nConstraint: Even if you are not sure, you must answer with a 
representation without using any other words.\nResult: InChI=1S\/C15H17N5\/c1-3-4-13-9-14(20-15(19-13)16-10-17-20)18-12-7-5-11(2)6-8-12\/h5-10,18H,3-4H2,1-2H3"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the DeepSMILES.\nDeepSMILES: CC=O)NcccccC=O)OCCNCCCC5=O))))))))))c6\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C15H18N2O4\/c1-11(18)16-13-5-2-4-12(10-13)15(20)21-9-8-17-7-3-6-14(17)19\/h2,4-5,10H,3,6-9H2,1H3,(H,16,18)"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_5-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the DeepSMILES.\nDeepSMILES: ccccCnnnnc5SCcncCCC3)))n[nH]5)))))))))))))cc6\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C14H15N7S\/c1-2-4-10(5-3-1)8-21-14(18-19-20-21)22-9-12-15-13(17-16-12)11-6-7-11\/h1-5,11H,6-9H2,(H,15,16,17)"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the DeepSMILES.\nDeepSMILES: Nccc[nH+]cccccc%106)))))))CCCC6\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_3-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the DeepSMILES COC=O)C=CCScncccc6C#N))))CNC)CC6))))))))))OCN)=CC#N))C6ccccF)cc6?\nAssistant: Of course, this molecule has a InChI string of 
InChI=1S\/C25H22FN5O3S\/c1-31-8-7-19-16(12-31)9-15(10-27)24(30-19)35-13-20-22(25(32)33-2)21(18(11-28)23(29)34-20)14-3-5-17(26)6-4-14\/h3-6,9,21H,7-8,12-13,29H2,1-2H3."} {"text":"User: Can you tell me the InChI string of the molecule with the DeepSMILES O=COCCNCCCC5=O)))))))))ccc=O)[nH]cccccc%106?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C16H16N2O4\/c19-14-10-12(11-4-1-2-5-13(11)17-14)16(21)22-9-8-18-7-3-6-15(18)20\/h1-2,4-5,10H,3,6-9H2,(H,17,19)."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_5-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the DeepSMILES CCC)CNC=O)ccncccsc5c9)))))))))))C=O)NCCCC5?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C17H21N3O2S\/c1-11(2)15(17(22)20-6-3-4-7-20)19-16(21)12-9-14-13(18-10-12)5-8-23-14\/h5,8-11,15H,3-4,6-7H2,1-2H3,(H,19,21)."} {"text":"User: Can you create the InChI string of the molecule with the DeepSMILES CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_4-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the DeepSMILES.\nDeepSMILES: CScncCCC3)))ccC=O)NcnccC)s5)))))))c6C#N\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: 
InChI=1S\/C15H14N4OS2\/c1-8-7-17-15(22-8)19-13(20)10-5-12(9-3-4-9)18-14(21-2)11(10)6-16\/h5,7,9H,3-4H2,1-2H3,(H,17,19,20)"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the DeepSMILES.\nMolecule DeepSMILES: CCC)CNC=O)ccccF)cc6))))))))C=O)OCCCCCO5\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C17H22FNO4\/c1-11(2)15(17(21)23-10-14-4-3-9-22-14)19-16(20)12-5-7-13(18)8-6-12\/h5-8,11,14-15H,3-4,9-10H2,1-2H3,(H,19,20)"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_1-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the DeepSMILES.\nMolecule DeepSMILES: COccc\/C=C\/C=O)O))))ccS=O)=O)NccccF)cc6))))))))c6OC\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C17H16FNO6S\/c1-24-14-9-11(3-8-16(20)21)10-15(17(14)25-2)26(22,23)19-13-6-4-12(18)5-7-13\/h3-10,19H,1-2H3,(H,20,21)\/b8-3+"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the DeepSMILES.\nMolecule DeepSMILES: CccccNS=O)=O)ccccF)cc6))))))))cC=O)O))c6C\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C15H14FNO4S\/c1-9-3-8-13(14(10(9)2)15(18)19)17-22(20,21)12-6-4-11(16)5-7-12\/h3-8,17H,1-2H3,(H,18,19)"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_0-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the InChI string.\nMolecule InChI string: 
InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C22H18ClN3O2\/c1-28-19-9-7-15(23)12-18(19)21(26-16-5-3-10-24-13-16)17-8-6-14-4-2-11-25-20(14)22(17)27\/h2-13,21,26-27H,1H3\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: COccccCl)cc6CNccccnc6)))))))cccccccnc6c%10O"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_3-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C24H25BrClN3O3\/c1-5-29-16(3)24(15(2)28-29)27-23(30)11-7-17-6-9-21(31-4)18(12-17)14-32-22-10-8-19(26)13-20(22)25\/h6-13H,5,14H2,1-4H3,(H,27,30)\/b11-7+ can also be represented with the DeepSMILES CCnncC)cNC=O)\/C=C\/ccccOC))cCOccccCl)cc6Br)))))))))c6))))))))))c5C."} {"text":"The molecule with the InChI string representation of InChI=1S\/C16H16FN3O2S\/c1-2-20(9-3-8-18)15(21)11-23-16-19-10-14(22-16)12-4-6-13(17)7-5-12\/h4-7,10H,2-3,9,11H2,1H3 can also be represented with the DeepSMILES representation CCNCCC#N))))C=O)CScncc-ccccF)cc6))))))o5."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_1-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C17H16FNO6S\/c1-24-14-9-11(3-8-16(20)21)10-15(17(14)25-2)26(22,23)19-13-6-4-12(18)5-7-13\/h3-10,19H,1-2H3,(H,20,21)\/b8-3+ can also be represented with the DeepSMILES representation 
COccc\/C=C\/C=O)O))))ccS=O)=O)NccccF)cc6))))))))c6OC."} {"text":"The molecule with the InChI string InChI=1S\/C15H14FNO4S\/c1-9-3-8-13(14(10(9)2)15(18)19)17-22(20,21)12-6-4-11(16)5-7-12\/h3-8,17H,1-2H3,(H,18,19) can also be represented with the DeepSMILES CccccNS=O)=O)ccccF)cc6))))))))cC=O)O))c6C."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_5-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C17H21N3O2S\/c1-11(2)15(17(22)20-6-3-4-7-20)19-16(21)12-9-14-13(18-10-12)5-8-23-14\/h5,8-11,15H,3-4,6-7H2,1-2H3,(H,19,21)\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CCC)CNC=O)ccncccsc5c9)))))))))))C=O)NCCCC5"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_2-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the DeepSMILES.\nMolecule DeepSMILES: 
Cl.Fcccccc6)CCCCNCCcccccc6C%10))))))))))))C5\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C20H22FN.ClH\/c21-20-6-5-17-11-15(12-19(17)13-20)7-9-22-10-8-16-3-1-2-4-18(16)14-22;\/h1-6,13,15H,7-12,14H2;1H"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the DeepSMILES.\nDeepSMILES: Ccccc-ncSCC=O)NCcccco5))))))))))ncc[nH]cccccc69)))))))c6=O)))))))cc6\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C24H20N4O3S\/c1-15-8-10-16(11-9-15)28-23(30)22-21(18-6-2-3-7-19(18)26-22)27-24(28)32-14-20(29)25-13-17-5-4-12-31-17\/h2-12,26H,13-14H2,1H3,(H,25,29)"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_1-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C19H18N2O2S\/c1-13-20-18(12-24-13)15-6-8-16(9-7-15)21-19(22)11-14-4-3-5-17(10-14)23-2\/h3-10,12H,11H2,1-2H3,(H,21,22)\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COcccccCC=O)Ncccc-ccscC)n5)))))cc6)))))))))c6"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C11H10N4O3\/c1-18-5-2-3-7-6(4-5)8-9(13-7)10(16)15(12)11(17)14-8\/h2-4,13H,12H2,1H3,(H,14,17)\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COcccc[nH]cc=O)nN)c=O)[nH]c6c9c%13"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_5-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the DeepSMILES 
CccccN)cc6NC=O)CScccccc6F?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C15H15FN2OS\/c1-10-6-7-11(17)8-13(10)18-15(19)9-20-14-5-3-2-4-12(14)16\/h2-8H,9,17H2,1H3,(H,18,19)."} {"text":"User: Can you generate the InChI string of the molecule with the DeepSMILES CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8-."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_2-4.jsonl": "{"text":"User: Can you tell me the DeepSMILES of the molecule with the InChI string InChI=1S\/C20H22FN.ClH\/c21-20-6-5-17-11-15(12-19(17)13-20)7-9-22-10-8-16-3-1-2-4-18(16)14-22;\/h1-6,13,15H,7-12,14H2;1H?\nAssistant: Sure, this molecule has a DeepSMILES of Cl.Fcccccc6)CCCCNCCcccccc6C%10))))))))))))C5."} {"text":"User: Can you tell me the DeepSMILES of the molecule with the InChI string InChI=1S\/C24H20N4O3S\/c1-15-8-10-16(11-9-15)28-23(30)22-21(18-6-2-3-7-19(18)26-22)27-24(28)32-14-20(29)25-13-17-5-4-12-31-17\/h2-12,26H,13-14H2,1H3,(H,25,29)?\nAssistant: Yes, this molecule has a DeepSMILES of Ccccc-ncSCC=O)NCcccco5))))))))))ncc[nH]cccccc69)))))))c6=O)))))))cc6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_3-4.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the InChI string InChI=1S\/C15H17N5\/c1-3-4-13-9-14(20-15(19-13)16-10-17-20)18-12-7-5-11(2)6-8-12\/h5-10,18H,3-4H2,1-2H3?\nAssistant: Yes, this molecule has a DeepSMILES of CCCcccNccccC)cc6)))))))nncnc5n9."} {"text":"User: Can you tell me the DeepSMILES of the molecule with the InChI string InChI=1S\/C15H18N2O4\/c1-11(18)16-13-5-2-4-12(10-13)15(20)21-9-8-17-7-3-6-14(17)19\/h2,4-5,10H,3,6-9H2,1H3,(H,16,18)?\nAssistant: Of course, this molecule has a DeepSMILES of CC=O)NcccccC=O)OCCNCCCC5=O))))))))))c6."}", 
"/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_0-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1 can also be represented with the DeepSMILES S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6."} {"text":"The molecule with the InChI string InChI=1S\/C22H18ClN3O2\/c1-28-19-9-7-15(23)12-18(19)21(26-16-5-3-10-24-13-16)17-8-6-14-4-2-11-25-20(14)22(17)27\/h2-13,21,26-27H,1H3 can also be represented with the DeepSMILES COccccCl)cc6CNccccnc6)))))))cccccccnc6c%10O."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_2-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the DeepSMILES NC=O)ccc-cccncc6))))))[nH]c5-ccccF)cc6?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C16H12FN3O\/c17-12-3-1-11(2-4-12)15-13(16(18)21)9-14(20-15)10-5-7-19-8-6-10\/h1-9,20H,(H2,18,21)."} {"text":"User: Can you tell me the InChI string of the molecule with the DeepSMILES COcccc-cnn-cccccc6))))))cc5C=O)O)))))))cc6N?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C17H15N3O3\/c1-23-15-8-7-11(9-14(15)18)16-13(17(21)22)10-20(19-16)12-5-3-2-4-6-12\/h2-10H,18H2,1H3,(H,21,22)."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_5-5.jsonl": "{"text":"User: Can you generate the InChI string of the molecule with the DeepSMILES ccccCnnnnc5SCcncCCC3)))n[nH]5)))))))))))))cc6?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C14H15N7S\/c1-2-4-10(5-3-1)8-21-14(18-19-20-21)22-9-12-15-13(17-16-12)11-6-7-11\/h1-5,11H,6-9H2,(H,15,16,17)."} {"text":"User: Can you tell me the InChI string of the molecule with the DeepSMILES Nccc[nH+]cccccc%106)))))))CCCC6?\nAssistant: Of course, this molecule has a InChI string 
of InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_3-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the DeepSMILES CCnncC)cNC=O)\/C=C\/ccccOC))cCOccccCl)cc6Br)))))))))c6))))))))))c5C?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C24H25BrClN3O3\/c1-5-29-16(3)24(15(2)28-29)27-23(30)11-7-17-6-9-21(31-4)18(12-17)14-32-22-10-8-19(26)13-20(22)25\/h6-13H,5,14H2,1-4H3,(H,27,30)\/b11-7+."} {"text":"User: Can you create the InChI string of the molecule with the DeepSMILES CCNCCC#N))))C=O)CScncc-ccccF)cc6))))))o5?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C16H16FN3O2S\/c1-2-20(9-3-8-18)15(21)11-23-16-19-10-14(22-16)12-4-6-13(17)7-5-12\/h4-7,10H,2-3,9,11H2,1H3."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_2-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C20H22FN.ClH\/c21-20-6-5-17-11-15(12-19(17)13-20)7-9-22-10-8-16-3-1-2-4-18(16)14-22;\/h1-6,13,15H,7-12,14H2;1H can also be represented with the DeepSMILES Cl.Fcccccc6)CCCCNCCcccccc6C%10))))))))))))C5."} {"text":"The molecule with the InChI string InChI=1S\/C24H20N4O3S\/c1-15-8-10-16(11-9-15)28-23(30)22-21(18-6-2-3-7-19(18)26-22)27-24(28)32-14-20(29)25-13-17-5-4-12-31-17\/h2-12,26H,13-14H2,1H3,(H,25,29) can also be represented with the DeepSMILES Ccccc-ncSCC=O)NCcccco5))))))))))ncc[nH]cccccc69)))))))c6=O)))))))cc6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_1-4.jsonl": "{"text":"User: Can you tell me the DeepSMILES of the molecule with the InChI string InChI=1S\/C17H16FNO6S\/c1-24-14-9-11(3-8-16(20)21)10-15(17(14)25-2)26(22,23)19-13-6-4-12(18)5-7-13\/h3-10,19H,1-2H3,(H,20,21)\/b8-3+?\nAssistant: Yes, this molecule has a DeepSMILES of COccc\/C=C\/C=O)O))))ccS=O)=O)NccccF)cc6))))))))c6OC."} {"text":"User: Can you generate the DeepSMILES of the molecule with 
the InChI string InChI=1S\/C15H14FNO4S\/c1-9-3-8-13(14(10(9)2)15(18)19)17-22(20,21)12-6-4-11(16)5-7-12\/h3-8,17H,1-2H3,(H,18,19)?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of CccccNS=O)=O)ccccF)cc6))))))))cC=O)O))c6C."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_2-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C22H22Cl2N2O2\/c1-3-5-12-26(4-2)22(28)21(27)19-17-13-16(24)10-11-18(17)25-20(19)14-6-8-15(23)9-7-14\/h6-11,13,25H,3-5,12H2,1-2H3 can also be represented with the DeepSMILES CCCCNCC))C=O)C=O)cc-ccccCl)cc6))))))[nH]ccccCl)cc96."} {"text":"The molecule with the InChI string InChI=1S\/C22H26BrClN4O2\/c1-26(15-21(29)25-20-5-3-2-4-19(20)23)22(30)16-28-12-10-27(11-13-28)14-17-6-8-18(24)9-7-17\/h2-9H,10-16H2,1H3,(H,25,29) can also be represented with the DeepSMILES representation CNCC=O)Ncccccc6Br))))))))))C=O)CNCCNCccccCl)cc6)))))))CC6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_4-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the InChI string.\nInChI string: InChI=1S\/C15H14N4OS2\/c1-8-7-17-15(22-8)19-13(20)10-5-12(9-3-4-9)18-14(21-2)11(10)6-16\/h5,7,9H,3-4H2,1-2H3,(H,17,19,20)\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CScncCCC3)))ccC=O)NcnccC)s5)))))))c6C#N"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C17H22FNO4\/c1-11(2)15(17(21)23-10-14-4-3-9-22-14)19-16(20)12-5-7-13(18)8-6-12\/h5-8,11,14-15H,3-4,9-10H2,1-2H3,(H,19,20)\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CCC)CNC=O)ccccF)cc6))))))))C=O)OCCCCCO5"}", 
"/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_0-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the InChI string from the DeepSMILES.\nDeepSMILES: S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the InChI string from the DeepSMILES.\nDeepSMILES: COccccCl)cc6CNccccnc6)))))))cccccccnc6c%10O\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C22H18ClN3O2\/c1-28-19-9-7-15(23)12-18(19)21(26-16-5-3-10-24-13-16)17-8-6-14-4-2-11-25-20(14)22(17)27\/h2-13,21,26-27H,1H3"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_4-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C16H18N2O4S\/c1-11-15(8-9-22-11)16(19)17-10-12-2-6-14(7-3-12)23(20,21)18-13-4-5-13\/h2-3,6-9,13,18H,4-5,10H2,1H3,(H,17,19)\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: Ccoccc5C=O)NCccccS=O)=O)NCCC3)))))cc6"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the InChI string.\nInChI string: InChI=1S\/C16H15N3O2S\/c1-11-12(6-9-21-11)16(20)18-8-5-15-19-14(10-22-15)13-4-2-3-7-17-13\/h2-4,6-7,9-10H,5,8H2,1H3,(H,18,20)\nConstraint: Even if you 
are not sure, you must answer with a representation without using any other words.\nResult: Ccoccc5C=O)NCCcnc-cccccn6))))))cs5"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_3-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the InChI string.\nInChI string: InChI=1S\/C15H17N5\/c1-3-4-13-9-14(20-15(19-13)16-10-17-20)18-12-7-5-11(2)6-8-12\/h5-10,18H,3-4H2,1-2H3\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCCcccNccccC)cc6)))))))nncnc5n9"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C15H18N2O4\/c1-11(18)16-13-5-2-4-12(10-13)15(20)21-9-8-17-7-3-6-14(17)19\/h2,4-5,10H,3,6-9H2,1H3,(H,16,18)\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CC=O)NcccccC=O)OCCNCCCC5=O))))))))))c6"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_2-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of CCCCNCC))C=O)C=O)cc-ccccCl)cc6))))))[nH]ccccCl)cc96 can also be represented with the InChI string InChI=1S\/C22H22Cl2N2O2\/c1-3-5-12-26(4-2)22(28)21(27)19-17-13-16(24)10-11-18(17)25-20(19)14-6-8-15(23)9-7-14\/h6-11,13,25H,3-5,12H2,1-2H3."} {"text":"The molecule with the DeepSMILES CNCC=O)Ncccccc6Br))))))))))C=O)CNCCNCccccCl)cc6)))))))CC6 can also be represented with the InChI string InChI=1S\/C22H26BrClN4O2\/c1-26(15-21(29)25-20-5-3-2-4-19(20)23)22(30)16-28-12-10-27(11-13-28)14-17-6-8-18(24)9-7-17\/h2-9H,10-16H2,1H3,(H,25,29)."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_5-4.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the InChI string 
InChI=1S\/C17H21N3O2S\/c1-11(2)15(17(22)20-6-3-4-7-20)19-16(21)12-9-14-13(18-10-12)5-8-23-14\/h5,8-11,15H,3-4,6-7H2,1-2H3,(H,19,21)?\nAssistant: Of course, this molecule has a DeepSMILES of CCC)CNC=O)ccncccsc5c9)))))))))))C=O)NCCCC5."} {"text":"User: Can you create the DeepSMILES of the molecule with the InChI string InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1?\nAssistant: Of course, this molecule has a DeepSMILES of CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_4-5.jsonl": "{"text":"User: Can you generate the InChI string of the molecule with the DeepSMILES Ccoccc5C=O)NCccccS=O)=O)NCCC3)))))cc6?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C16H18N2O4S\/c1-11-15(8-9-22-11)16(19)17-10-12-2-6-14(7-3-12)23(20,21)18-13-4-5-13\/h2-3,6-9,13,18H,4-5,10H2,1H3,(H,17,19)."} {"text":"User: Can you generate the InChI string of the molecule with the DeepSMILES Ccoccc5C=O)NCCcnc-cccccn6))))))cs5?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C16H15N3O2S\/c1-11-12(6-9-21-11)16(20)18-8-5-15-19-14(10-22-15)13-4-2-3-7-17-13\/h2-4,6-7,9-10H,5,8H2,1H3,(H,18,20)."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_4-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C15H14N4OS2\/c1-8-7-17-15(22-8)19-13(20)10-5-12(9-3-4-9)18-14(21-2)11(10)6-16\/h5,7,9H,3-4H2,1-2H3,(H,17,19,20) can also be represented with the DeepSMILES representation CScncCCC3)))ccC=O)NcnccC)s5)))))))c6C#N."} {"text":"The molecule with 
the InChI string InChI=1S\/C17H22FNO4\/c1-11(2)15(17(21)23-10-14-4-3-9-22-14)19-16(20)12-5-7-13(18)8-6-12\/h5-8,11,14-15H,3-4,9-10H2,1-2H3,(H,19,20) can also be represented with the DeepSMILES representation CCC)CNC=O)ccccF)cc6))))))))C=O)OCCCCCO5."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_5-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of ccccCnnnnc5SCcncCCC3)))n[nH]5)))))))))))))cc6 can also be represented with the InChI string InChI=1S\/C14H15N7S\/c1-2-4-10(5-3-1)8-21-14(18-19-20-21)22-9-12-15-13(17-16-12)11-6-7-11\/h1-5,11H,6-9H2,(H,15,16,17)."} {"text":"The molecule with the DeepSMILES representation of Nccc[nH+]cccccc%106)))))))CCCC6 can also be represented with the InChI string InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_2-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of NC=O)ccc-cccncc6))))))[nH]c5-ccccF)cc6 can also be represented with the InChI string InChI=1S\/C16H12FN3O\/c17-12-3-1-11(2-4-12)15-13(16(18)21)9-14(20-15)10-5-7-19-8-6-10\/h1-9,20H,(H2,18,21)."} {"text":"The molecule with the DeepSMILES COcccc-cnn-cccccc6))))))cc5C=O)O)))))))cc6N can also be represented with the InChI string InChI=1S\/C17H15N3O3\/c1-23-15-8-7-11(9-14(15)18)16-13(17(21)22)10-20(19-16)12-5-3-2-4-6-12\/h2-10H,18H2,1H3,(H,21,22)."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_0-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1 can also be represented with the DeepSMILES OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C."} {"text":"The molecule with the InChI string 
InChI=1S\/C18H16N2O3S\/c1-22-13-9-7-12(8-10-13)16-11-15(20-23-16)18(21)19-14-5-3-4-6-17(14)24-2\/h3-11H,1-2H3,(H,19,21) can also be represented with the DeepSMILES COcccc-cccC=O)Ncccccc6SC))))))))))no5)))))cc6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_0-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the InChI string from the DeepSMILES.\nMolecule DeepSMILES: OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the InChI string from the DeepSMILES.\nMolecule DeepSMILES: COcccc-cccC=O)Ncccccc6SC))))))))))no5)))))cc6\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C18H16N2O3S\/c1-22-13-9-7-12(8-10-13)16-11-15(20-23-16)18(21)19-14-5-3-4-6-17(14)24-2\/h3-11H,1-2H3,(H,19,21)"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_1-4.jsonl": "{"text":"User: Can you tell me the DeepSMILES of the molecule with the InChI string InChI=1S\/C22H26N2O4\/c1-4-21(25)24-17-9-7-6-8-16(17)13-18(24)22(26)23-14-15-10-11-19(28-5-2)20(12-15)27-3\/h6-12,18H,4-5,13-14H2,1-3H3,(H,23,26)?\nAssistant: Sure, this molecule has a DeepSMILES of CCOccccCNC=O)CCcccccc6N9C=O)CC)))))))))))))))cc6OC."} {"text":"User: Can you tell me the DeepSMILES of the molecule with the InChI string InChI=1S\/C17H14ClN3O3S2\/c1-11-16(26(23,24)10-12-6-2-3-7-13(12)18)25-17(20-11)21-15(22)14-8-4-5-9-19-14\/h2-9H,10H2,1H3,(H,20,21,22)?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES 
of CcncNC=O)cccccn6))))))))sc5S=O)=O)Ccccccc6Cl."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_2-5.jsonl": "{"text":"User: Can you generate the InChI string of the molecule with the DeepSMILES Cl.Fcccccc6)CCCCNCCcccccc6C%10))))))))))))C5?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C20H22FN.ClH\/c21-20-6-5-17-11-15(12-19(17)13-20)7-9-22-10-8-16-3-1-2-4-18(16)14-22;\/h1-6,13,15H,7-12,14H2;1H."} {"text":"User: Can you tell me the InChI string of the molecule with the DeepSMILES Ccccc-ncSCC=O)NCcccco5))))))))))ncc[nH]cccccc69)))))))c6=O)))))))cc6?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C24H20N4O3S\/c1-15-8-10-16(11-9-15)28-23(30)22-21(18-6-2-3-7-19(18)26-22)27-24(28)32-14-20(29)25-13-17-5-4-12-31-17\/h2-12,26H,13-14H2,1H3,(H,25,29)."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_1-1.jsonl": "{"text":"The molecule with the DeepSMILES CCOccccCNC=O)CCcccccc6N9C=O)CC)))))))))))))))cc6OC can also be represented with the InChI string InChI=1S\/C22H26N2O4\/c1-4-21(25)24-17-9-7-6-8-16(17)13-18(24)22(26)23-14-15-10-11-19(28-5-2)20(12-15)27-3\/h6-12,18H,4-5,13-14H2,1-3H3,(H,23,26)."} {"text":"The molecule with the DeepSMILES CcncNC=O)cccccn6))))))))sc5S=O)=O)Ccccccc6Cl can also be represented with the InChI string InChI=1S\/C17H14ClN3O3S2\/c1-11-16(26(23,24)10-12-6-2-3-7-13(12)18)25-17(20-11)21-15(22)14-8-4-5-9-19-14\/h2-9H,10H2,1H3,(H,20,21,22)."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_5-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C14H15N7S\/c1-2-4-10(5-3-1)8-21-14(18-19-20-21)22-9-12-15-13(17-16-12)11-6-7-11\/h1-5,11H,6-9H2,(H,15,16,17)\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: ccccCnnnnc5SCcncCCC3)))n[nH]5)))))))))))))cc6"} {"text":"Task: Please 
generate a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the InChI string.\nInChI string: InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: Nccc[nH+]cccccc%106)))))))CCCC6"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_5-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of CCC)CNC=O)ccncccsc5c9)))))))))))C=O)NCCCC5 can also be represented with the InChI string representation InChI=1S\/C17H21N3O2S\/c1-11(2)15(17(22)20-6-3-4-7-20)19-16(21)12-9-14-13(18-10-12)5-8-23-14\/h5,8-11,15H,3-4,6-7H2,1-2H3,(H,19,21)."} {"text":"The molecule with the DeepSMILES CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20 can also be represented with the InChI string InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_4-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of O=CNCccccS=O)=O)NCCC3)))))cc6))))))))CCC=CCC6 can also be represented with the InChI string representation InChI=1S\/C17H22N2O3S\/c20-17(14-4-2-1-3-5-14)18-12-13-6-10-16(11-7-13)23(21,22)19-15-8-9-15\/h1-2,6-7,10-11,14-15,19H,3-5,8-9,12H2,(H,18,20)."} {"text":"The molecule with the DeepSMILES CCOcccF)ccc6NCcccc-ncncn5)))))cc6 can also be represented with the InChI string 
InChI=1S\/C17H17FN4O\/c1-2-23-17-9-14(18)5-8-16(17)20-10-13-3-6-15(7-4-13)22-12-19-11-21-22\/h3-9,11-12,20H,2,10H2,1H3."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_3-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the DeepSMILES.\nDeepSMILES: CCnncC)cNC=O)\/C=C\/ccccOC))cCOccccCl)cc6Br)))))))))c6))))))))))c5C\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C24H25BrClN3O3\/c1-5-29-16(3)24(15(2)28-29)27-23(30)11-7-17-6-9-21(31-4)18(12-17)14-32-22-10-8-19(26)13-20(22)25\/h6-13H,5,14H2,1-4H3,(H,27,30)\/b11-7+"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the DeepSMILES.\nDeepSMILES: CCNCCC#N))))C=O)CScncc-ccccF)cc6))))))o5\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C16H16FN3O2S\/c1-2-20(9-3-8-18)15(21)11-23-16-19-10-14(22-16)12-4-6-13(17)7-5-12\/h4-7,10H,2-3,9,11H2,1H3"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_3-4.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the InChI string InChI=1S\/C25H22FN5O3S\/c1-31-8-7-19-16(12-31)9-15(10-27)24(30-19)35-13-20-22(25(32)33-2)21(18(11-28)23(29)34-20)14-3-5-17(26)6-4-14\/h3-6,9,21H,7-8,12-13,29H2,1-2H3?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of COC=O)C=CCScncccc6C#N))))CNC)CC6))))))))))OCN)=CC#N))C6ccccF)cc6."} {"text":"User: Can you create the DeepSMILES of the molecule with the InChI string InChI=1S\/C16H16N2O4\/c19-14-10-12(11-4-1-2-5-13(11)17-14)16(21)22-9-8-18-7-3-6-15(18)20\/h1-2,4-5,10H,3,6-9H2,(H,17,19)?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of 
O=COCCNCCCC5=O)))))))))ccc=O)[nH]cccccc%106."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_1-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the DeepSMILES.\nDeepSMILES: COcccccCC=O)Ncccc-ccscC)n5)))))cc6)))))))))c6\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C19H18N2O2S\/c1-13-20-18(12-24-13)15-6-8-16(9-7-15)21-19(22)11-14-4-3-5-17(10-14)23-2\/h3-10,12H,11H2,1-2H3,(H,21,22)"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the InChI string from the DeepSMILES.\nDeepSMILES: COcccc[nH]cc=O)nN)c=O)[nH]c6c9c%13\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C11H10N4O3\/c1-18-5-2-3-7-6(4-5)8-9(13-7)10(16)15(12)11(17)14-8\/h2-4,13H,12H2,1H3,(H,14,17)"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_0-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the InChI string.\nInChI string: InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the InChI string.\nMolecule InChI string: 
InChI=1S\/C22H23N3O4\/c1-22(2,3)24-21(27)19(11-7-10-16-8-5-4-6-9-16)23-20(26)17-12-14-18(15-13-17)25(28)29\/h4-15H,1-3H3,(H,23,26)(H,24,27)\/b10-7+,19-11+\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CCC)C)NC=O)\/C=C\\C=C\\cccccc6)))))))))NC=O)cccc[N+]=O)[O-]))cc6"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_5-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C14H15N7S\/c1-2-4-10(5-3-1)8-21-14(18-19-20-21)22-9-12-15-13(17-16-12)11-6-7-11\/h1-5,11H,6-9H2,(H,15,16,17) can also be represented with the DeepSMILES representation ccccCnnnnc5SCcncCCC3)))n[nH]5)))))))))))))cc6."} {"text":"The molecule with the InChI string InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1 can also be represented with the DeepSMILES representation Nccc[nH+]cccccc%106)))))))CCCC6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_0-1.jsonl": "{"text":"The molecule with the DeepSMILES S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6 can also be represented with the InChI string representation InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1."} {"text":"The molecule with the DeepSMILES CCC)C)NC=O)\/C=C\\C=C\\cccccc6)))))))))NC=O)cccc[N+]=O)[O-]))cc6 can also be represented with the InChI string representation InChI=1S\/C22H23N3O4\/c1-22(2,3)24-21(27)19(11-7-10-16-8-5-4-6-9-16)23-20(26)17-12-14-18(15-13-17)25(28)29\/h4-15H,1-3H3,(H,23,26)(H,24,27)\/b10-7+,19-11+."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_2-4.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the InChI string 
InChI=1S\/C22H22Cl2N2O2\/c1-3-5-12-26(4-2)22(28)21(27)19-17-13-16(24)10-11-18(17)25-20(19)14-6-8-15(23)9-7-14\/h6-11,13,25H,3-5,12H2,1-2H3?\nAssistant: Sure, this molecule has a DeepSMILES of CCCCNCC))C=O)C=O)cc-ccccCl)cc6))))))[nH]ccccCl)cc96."} {"text":"User: Can you create the DeepSMILES of the molecule with the InChI string InChI=1S\/C22H26BrClN4O2\/c1-26(15-21(29)25-20-5-3-2-4-19(20)23)22(30)16-28-12-10-27(11-13-28)14-17-6-8-18(24)9-7-17\/h2-9H,10-16H2,1H3,(H,25,29)?\nAssistant: Of course, this molecule has a DeepSMILES of CNCC=O)Ncccccc6Br))))))))))C=O)CNCCNCccccCl)cc6)))))))CC6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_2-1.jsonl": "{"text":"The molecule with the DeepSMILES Cl.Fcccccc6)CCCCNCCcccccc6C%10))))))))))))C5 can also be represented with the InChI string InChI=1S\/C20H22FN.ClH\/c21-20-6-5-17-11-15(12-19(17)13-20)7-9-22-10-8-16-3-1-2-4-18(16)14-22;\/h1-6,13,15H,7-12,14H2;1H."} {"text":"The molecule with the DeepSMILES Ccccc-ncSCC=O)NCcccco5))))))))))ncc[nH]cccccc69)))))))c6=O)))))))cc6 can also be represented with the InChI string InChI=1S\/C24H20N4O3S\/c1-15-8-10-16(11-9-15)28-23(30)22-21(18-6-2-3-7-19(18)26-22)27-24(28)32-14-20(29)25-13-17-5-4-12-31-17\/h2-12,26H,13-14H2,1H3,(H,25,29)."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_1-1.jsonl": "{"text":"The molecule with the DeepSMILES COcccccCC=O)Ncccc-ccscC)n5)))))cc6)))))))))c6 can also be represented with the InChI string representation InChI=1S\/C19H18N2O2S\/c1-13-20-18(12-24-13)15-6-8-16(9-7-15)21-19(22)11-14-4-3-5-17(10-14)23-2\/h3-10,12H,11H2,1-2H3,(H,21,22)."} {"text":"The molecule with the DeepSMILES COcccc[nH]cc=O)nN)c=O)[nH]c6c9c%13 can also be represented with the InChI string representation InChI=1S\/C11H10N4O3\/c1-18-5-2-3-7-6(4-5)8-9(13-7)10(16)15(12)11(17)14-8\/h2-4,13H,12H2,1H3,(H,14,17)."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_3-5.jsonl": "{"text":"User: Can you generate the InChI string of the molecule with 
the DeepSMILES CCCcccNccccC)cc6)))))))nncnc5n9?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C15H17N5\/c1-3-4-13-9-14(20-15(19-13)16-10-17-20)18-12-7-5-11(2)6-8-12\/h5-10,18H,3-4H2,1-2H3."} {"text":"User: Can you generate the InChI string of the molecule with the DeepSMILES CC=O)NcccccC=O)OCCNCCCC5=O))))))))))c6?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C15H18N2O4\/c1-11(18)16-13-5-2-4-12(10-13)15(20)21-9-8-17-7-3-6-14(17)19\/h2,4-5,10H,3,6-9H2,1H3,(H,16,18)."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_3-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of CCnncC)cNC=O)\/C=C\/ccccOC))cCOccccCl)cc6Br)))))))))c6))))))))))c5C can also be represented with the InChI string InChI=1S\/C24H25BrClN3O3\/c1-5-29-16(3)24(15(2)28-29)27-23(30)11-7-17-6-9-21(31-4)18(12-17)14-32-22-10-8-19(26)13-20(22)25\/h6-13H,5,14H2,1-4H3,(H,27,30)\/b11-7+."} {"text":"The molecule with the DeepSMILES CCNCCC#N))))C=O)CScncc-ccccF)cc6))))))o5 can also be represented with the InChI string InChI=1S\/C16H16FN3O2S\/c1-2-20(9-3-8-18)15(21)11-23-16-19-10-14(22-16)12-4-6-13(17)7-5-12\/h4-7,10H,2-3,9,11H2,1H3."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_0-5.jsonl": "{"text":"User: Can you generate the InChI string of the molecule with the DeepSMILES S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1."} {"text":"User: Can you generate the InChI string of the molecule with the DeepSMILES CCC)C)NC=O)\/C=C\\C=C\\cccccc6)))))))))NC=O)cccc[N+]=O)[O-]))cc6?\nAssistant: Yes, this molecule has a InChI string of 
InChI=1S\/C22H23N3O4\/c1-22(2,3)24-21(27)19(11-7-10-16-8-5-4-6-9-16)23-20(26)17-12-14-18(15-13-17)25(28)29\/h4-15H,1-3H3,(H,23,26)(H,24,27)\/b10-7+,19-11+."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_0-4.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the InChI string InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6."} {"text":"User: Can you tell me the DeepSMILES of the molecule with the InChI string InChI=1S\/C22H23N3O4\/c1-22(2,3)24-21(27)19(11-7-10-16-8-5-4-6-9-16)23-20(26)17-12-14-18(15-13-17)25(28)29\/h4-15H,1-3H3,(H,23,26)(H,24,27)\/b10-7+,19-11+?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of CCC)C)NC=O)\/C=C\\C=C\\cccccc6)))))))))NC=O)cccc[N+]=O)[O-]))cc6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_1-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the InChI string from the DeepSMILES.\nMolecule DeepSMILES: CCOccccCNC=O)CCcccccc6N9C=O)CC)))))))))))))))cc6OC\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C22H26N2O4\/c1-4-21(25)24-17-9-7-6-8-16(17)13-18(24)22(26)23-14-15-10-11-19(28-5-2)20(12-15)27-3\/h6-12,18H,4-5,13-14H2,1-3H3,(H,23,26)"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the DeepSMILES.\nDeepSMILES: CcncNC=O)cccccn6))))))))sc5S=O)=O)Ccccccc6Cl\nConstraint: Even if you are uncertain, you must answer with a representation without using 
any other words.\nResult: InChI=1S\/C17H14ClN3O3S2\/c1-11-16(26(23,24)10-12-6-2-3-7-13(12)18)25-17(20-11)21-15(22)14-8-4-5-9-19-14\/h2-9H,10H2,1H3,(H,20,21,22)"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_5-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the DeepSMILES.\nMolecule DeepSMILES: CCC)CNC=O)ccncccsc5c9)))))))))))C=O)NCCCC5\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C17H21N3O2S\/c1-11(2)15(17(22)20-6-3-4-7-20)19-16(21)12-9-14-13(18-10-12)5-8-23-14\/h5,8-11,15H,3-4,6-7H2,1-2H3,(H,19,21)"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the InChI string from the DeepSMILES.\nDeepSMILES: CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_0-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the DeepSMILES OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C?\nAssistant: Of course, this molecule has a InChI string of 
InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1."} {"text":"User: Can you tell me the InChI string of the molecule with the DeepSMILES COcccc-cccC=O)Ncccccc6SC))))))))))no5)))))cc6?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C18H16N2O3S\/c1-22-13-9-7-12(8-10-13)16-11-15(20-23-16)18(21)19-14-5-3-4-6-17(14)24-2\/h3-11H,1-2H3,(H,19,21)."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_4-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the DeepSMILES O=CNCccccS=O)=O)NCCC3)))))cc6))))))))CCC=CCC6?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C17H22N2O3S\/c20-17(14-4-2-1-3-5-14)18-12-13-6-10-16(11-7-13)23(21,22)19-15-8-9-15\/h1-2,6-7,10-11,14-15,19H,3-5,8-9,12H2,(H,18,20)."} {"text":"User: Can you tell me the InChI string of the molecule with the DeepSMILES CCOcccF)ccc6NCcccc-ncncn5)))))cc6?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C17H17FN4O\/c1-2-23-17-9-14(18)5-8-16(17)20-10-13-3-6-15(7-4-13)22-12-19-11-21-22\/h3-9,11-12,20H,2,10H2,1H3."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_1-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C22H26N2O4\/c1-4-21(25)24-17-9-7-6-8-16(17)13-18(24)22(26)23-14-15-10-11-19(28-5-2)20(12-15)27-3\/h6-12,18H,4-5,13-14H2,1-3H3,(H,23,26) can also be represented with the DeepSMILES CCOccccCNC=O)CCcccccc6N9C=O)CC)))))))))))))))cc6OC."} {"text":"The molecule with the InChI string InChI=1S\/C17H14ClN3O3S2\/c1-11-16(26(23,24)10-12-6-2-3-7-13(12)18)25-17(20-11)21-15(22)14-8-4-5-9-19-14\/h2-9H,10H2,1H3,(H,20,21,22) can also be represented with the DeepSMILES CcncNC=O)cccccn6))))))))sc5S=O)=O)Ccccccc6Cl."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_0-2.jsonl": "{"text":"Task: Please generate a molecule representation based on 
the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the InChI string.\nInChI string: InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the InChI string.\nInChI string: InChI=1S\/C18H16N2O3S\/c1-22-13-9-7-12(8-10-13)16-11-15(20-23-16)18(21)19-14-5-3-4-6-17(14)24-2\/h3-11H,1-2H3,(H,19,21)\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: COcccc-cccC=O)Ncccccc6SC))))))))))no5)))))cc6"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_4-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C17H22N2O3S\/c20-17(14-4-2-1-3-5-14)18-12-13-6-10-16(11-7-13)23(21,22)19-15-8-9-15\/h1-2,6-7,10-11,14-15,19H,3-5,8-9,12H2,(H,18,20)\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: O=CNCccccS=O)=O)NCCC3)))))cc6))))))))CCC=CCC6"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C17H17FN4O\/c1-2-23-17-9-14(18)5-8-16(17)20-10-13-3-6-15(7-4-13)22-12-19-11-21-22\/h3-9,11-12,20H,2,10H2,1H3\nConstraint: Even if you are not sure, you must answer with a 
representation without using any additional words.\nResult: CCOcccF)ccc6NCcccc-ncncn5)))))cc6"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_4-1.jsonl": "{"text":"The molecule with the DeepSMILES CScncCCC3)))ccC=O)NcnccC)s5)))))))c6C#N can also be represented with the InChI string representation InChI=1S\/C15H14N4OS2\/c1-8-7-17-15(22-8)19-13(20)10-5-12(9-3-4-9)18-14(21-2)11(10)6-16\/h5,7,9H,3-4H2,1-2H3,(H,17,19,20)."} {"text":"The molecule with the DeepSMILES CCC)CNC=O)ccccF)cc6))))))))C=O)OCCCCCO5 can also be represented with the InChI string representation InChI=1S\/C17H22FNO4\/c1-11(2)15(17(21)23-10-14-4-3-9-22-14)19-16(20)12-5-7-13(18)8-6-12\/h5-8,11,14-15H,3-4,9-10H2,1-2H3,(H,19,20)."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_5-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the DeepSMILES.\nMolecule DeepSMILES: CccccN)cc6NC=O)CScccccc6F\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C15H15FN2OS\/c1-10-6-7-11(17)8-13(10)18-15(19)9-20-14-5-3-2-4-12(14)16\/h2-8H,9,17H2,1H3,(H,18,19)"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the InChI string from the DeepSMILES.\nDeepSMILES: CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8-"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_2-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from 
the InChI string.\nMolecule InChI string: InChI=1S\/C16H12FN3O\/c17-12-3-1-11(2-4-12)15-13(16(18)21)9-14(20-15)10-5-7-19-8-6-10\/h1-9,20H,(H2,18,21)\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: NC=O)ccc-cccncc6))))))[nH]c5-ccccF)cc6"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the InChI string.\nInChI string: InChI=1S\/C17H15N3O3\/c1-23-15-8-7-11(9-14(15)18)16-13(17(21)22)10-20(19-16)12-5-3-2-4-6-12\/h2-10H,18H2,1H3,(H,21,22)\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: COcccc-cnn-cccccc6))))))cc5C=O)O)))))))cc6N"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_1-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of COccc\/C=C\/C=O)O))))ccS=O)=O)NccccF)cc6))))))))c6OC can also be represented with the InChI string representation InChI=1S\/C17H16FNO6S\/c1-24-14-9-11(3-8-16(20)21)10-15(17(14)25-2)26(22,23)19-13-6-4-12(18)5-7-13\/h3-10,19H,1-2H3,(H,20,21)\/b8-3+."} {"text":"The molecule with the DeepSMILES representation of CccccNS=O)=O)ccccF)cc6))))))))cC=O)O))c6C can also be represented with the InChI string InChI=1S\/C15H14FNO4S\/c1-9-3-8-13(14(10(9)2)15(18)19)17-22(20,21)12-6-4-11(16)5-7-12\/h3-8,17H,1-2H3,(H,18,19)."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_0-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C can also be represented with the InChI string InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1."} {"text":"The molecule with the DeepSMILES representation of 
COcccc-cccC=O)Ncccccc6SC))))))))))no5)))))cc6 can also be represented with the InChI string InChI=1S\/C18H16N2O3S\/c1-22-13-9-7-12(8-10-13)16-11-15(20-23-16)18(21)19-14-5-3-4-6-17(14)24-2\/h3-11H,1-2H3,(H,19,21)."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_3-4.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the InChI string InChI=1S\/C24H25BrClN3O3\/c1-5-29-16(3)24(15(2)28-29)27-23(30)11-7-17-6-9-21(31-4)18(12-17)14-32-22-10-8-19(26)13-20(22)25\/h6-13H,5,14H2,1-4H3,(H,27,30)\/b11-7+?\nAssistant: Of course, this molecule has a DeepSMILES of CCnncC)cNC=O)\/C=C\/ccccOC))cCOccccCl)cc6Br)))))))))c6))))))))))c5C."} {"text":"User: Can you tell me the DeepSMILES of the molecule with the InChI string InChI=1S\/C16H16FN3O2S\/c1-2-20(9-3-8-18)15(21)11-23-16-19-10-14(22-16)12-4-6-13(17)7-5-12\/h4-7,10H,2-3,9,11H2,1H3?\nAssistant: Sure, this molecule has a DeepSMILES of CCNCCC#N))))C=O)CScncc-ccccF)cc6))))))o5."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_0-4.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the InChI string InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1?\nAssistant: Yes, this molecule has a DeepSMILES of OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C."} {"text":"User: Can you create the DeepSMILES of the molecule with the InChI string InChI=1S\/C18H16N2O3S\/c1-22-13-9-7-12(8-10-13)16-11-15(20-23-16)18(21)19-14-5-3-4-6-17(14)24-2\/h3-11H,1-2H3,(H,19,21)?\nAssistant: Sure, this molecule has a DeepSMILES of COcccc-cccC=O)Ncccccc6SC))))))))))no5)))))cc6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_1-0.jsonl": "{"text":"The molecule with the InChI string 
InChI=1S\/C19H18N2O2S\/c1-13-20-18(12-24-13)15-6-8-16(9-7-15)21-19(22)11-14-4-3-5-17(10-14)23-2\/h3-10,12H,11H2,1-2H3,(H,21,22) can also be represented with the DeepSMILES representation COcccccCC=O)Ncccc-ccscC)n5)))))cc6)))))))))c6."} {"text":"The molecule with the InChI string representation of InChI=1S\/C11H10N4O3\/c1-18-5-2-3-7-6(4-5)8-9(13-7)10(16)15(12)11(17)14-8\/h2-4,13H,12H2,1H3,(H,14,17) can also be represented with the DeepSMILES COcccc[nH]cc=O)nN)c=O)[nH]c6c9c%13."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_4-5.jsonl": "{"text":"User: Can you generate the InChI string of the molecule with the DeepSMILES CScncCCC3)))ccC=O)NcnccC)s5)))))))c6C#N?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C15H14N4OS2\/c1-8-7-17-15(22-8)19-13(20)10-5-12(9-3-4-9)18-14(21-2)11(10)6-16\/h5,7,9H,3-4H2,1-2H3,(H,17,19,20)."} {"text":"User: Can you create the InChI string of the molecule with the DeepSMILES CCC)CNC=O)ccccF)cc6))))))))C=O)OCCCCCO5?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C17H22FNO4\/c1-11(2)15(17(21)23-10-14-4-3-9-22-14)19-16(20)12-5-7-13(18)8-6-12\/h5-8,11,14-15H,3-4,9-10H2,1-2H3,(H,19,20)."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_4-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the InChI string from the DeepSMILES.\nDeepSMILES: O=CNCccccS=O)=O)NCCC3)))))cc6))))))))CCC=CCC6\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C17H22N2O3S\/c20-17(14-4-2-1-3-5-14)18-12-13-6-10-16(11-7-13)23(21,22)19-15-8-9-15\/h1-2,6-7,10-11,14-15,19H,3-5,8-9,12H2,(H,18,20)"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the DeepSMILES.\nMolecule DeepSMILES: CCOcccF)ccc6NCcccc-ncncn5)))))cc6\nConstraint: Even if you are 
not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C17H17FN4O\/c1-2-23-17-9-14(18)5-8-16(17)20-10-13-3-6-15(7-4-13)22-12-19-11-21-22\/h3-9,11-12,20H,2,10H2,1H3"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_4-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the DeepSMILES.\nDeepSMILES: Ccoccc5C=O)NCccccS=O)=O)NCCC3)))))cc6\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C16H18N2O4S\/c1-11-15(8-9-22-11)16(19)17-10-12-2-6-14(7-3-12)23(20,21)18-13-4-5-13\/h2-3,6-9,13,18H,4-5,10H2,1H3,(H,17,19)"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the DeepSMILES.\nDeepSMILES: Ccoccc5C=O)NCCcnc-cccccn6))))))cs5\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C16H15N3O2S\/c1-11-12(6-9-21-11)16(20)18-8-5-15-19-14(10-22-15)13-4-2-3-7-17-13\/h2-4,6-7,9-10H,5,8H2,1H3,(H,18,20)"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_3-1.jsonl": "{"text":"The molecule with the DeepSMILES COC=O)C=CCScncccc6C#N))))CNC)CC6))))))))))OCN)=CC#N))C6ccccF)cc6 can also be represented with the InChI string representation InChI=1S\/C25H22FN5O3S\/c1-31-8-7-19-16(12-31)9-15(10-27)24(30-19)35-13-20-22(25(32)33-2)21(18(11-28)23(29)34-20)14-3-5-17(26)6-4-14\/h3-6,9,21H,7-8,12-13,29H2,1-2H3."} {"text":"The molecule with the DeepSMILES representation of O=COCCNCCCC5=O)))))))))ccc=O)[nH]cccccc%106 can also be represented with the InChI string representation InChI=1S\/C16H16N2O4\/c19-14-10-12(11-4-1-2-5-13(11)17-14)16(21)22-9-8-18-7-3-6-15(18)20\/h1-2,4-5,10H,3,6-9H2,(H,17,19)."}", 
"/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_2-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the DeepSMILES.\nMolecule DeepSMILES: CCCCNCC))C=O)C=O)cc-ccccCl)cc6))))))[nH]ccccCl)cc96\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C22H22Cl2N2O2\/c1-3-5-12-26(4-2)22(28)21(27)19-17-13-16(24)10-11-18(17)25-20(19)14-6-8-15(23)9-7-14\/h6-11,13,25H,3-5,12H2,1-2H3"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the InChI string from the DeepSMILES.\nMolecule DeepSMILES: CNCC=O)Ncccccc6Br))))))))))C=O)CNCCNCccccCl)cc6)))))))CC6\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C22H26BrClN4O2\/c1-26(15-21(29)25-20-5-3-2-4-19(20)23)22(30)16-28-12-10-27(11-13-28)14-17-6-8-18(24)9-7-17\/h2-9H,10-16H2,1H3,(H,25,29)"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_3-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of CCCcccNccccC)cc6)))))))nncnc5n9 can also be represented with the InChI string representation InChI=1S\/C15H17N5\/c1-3-4-13-9-14(20-15(19-13)16-10-17-20)18-12-7-5-11(2)6-8-12\/h5-10,18H,3-4H2,1-2H3."} {"text":"The molecule with the DeepSMILES CC=O)NcccccC=O)OCCNCCCC5=O))))))))))c6 can also be represented with the InChI string InChI=1S\/C15H18N2O4\/c1-11(18)16-13-5-2-4-12(10-13)15(20)21-9-8-17-7-3-6-14(17)19\/h2,4-5,10H,3,6-9H2,1H3,(H,16,18)."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_0-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the DeepSMILES.\nMolecule DeepSMILES: 
S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the DeepSMILES.\nDeepSMILES: CCC)C)NC=O)\/C=C\\C=C\\cccccc6)))))))))NC=O)cccc[N+]=O)[O-]))cc6\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C22H23N3O4\/c1-22(2,3)24-21(27)19(11-7-10-16-8-5-4-6-9-16)23-20(26)17-12-14-18(15-13-17)25(28)29\/h4-15H,1-3H3,(H,23,26)(H,24,27)\/b10-7+,19-11+"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_1-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the DeepSMILES COcccccCC=O)Ncccc-ccscC)n5)))))cc6)))))))))c6?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C19H18N2O2S\/c1-13-20-18(12-24-13)15-6-8-16(9-7-15)21-19(22)11-14-4-3-5-17(10-14)23-2\/h3-10,12H,11H2,1-2H3,(H,21,22)."} {"text":"User: Can you create the InChI string of the molecule with the DeepSMILES COcccc[nH]cc=O)nN)c=O)[nH]c6c9c%13?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C11H10N4O3\/c1-18-5-2-3-7-6(4-5)8-9(13-7)10(16)15(12)11(17)14-8\/h2-4,13H,12H2,1H3,(H,14,17)."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_4-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of Ccoccc5C=O)NCccccS=O)=O)NCCC3)))))cc6 can also be represented with the InChI string 
InChI=1S\/C16H18N2O4S\/c1-11-15(8-9-22-11)16(19)17-10-12-2-6-14(7-3-12)23(20,21)18-13-4-5-13\/h2-3,6-9,13,18H,4-5,10H2,1H3,(H,17,19)."} {"text":"The molecule with the DeepSMILES representation of Ccoccc5C=O)NCCcnc-cccccn6))))))cs5 can also be represented with the InChI string InChI=1S\/C16H15N3O2S\/c1-11-12(6-9-21-11)16(20)18-8-5-15-19-14(10-22-15)13-4-2-3-7-17-13\/h2-4,6-7,9-10H,5,8H2,1H3,(H,18,20)."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_5-4.jsonl": "{"text":"User: Can you tell me the DeepSMILES of the molecule with the InChI string InChI=1S\/C15H15FN2OS\/c1-10-6-7-11(17)8-13(10)18-15(19)9-20-14-5-3-2-4-12(14)16\/h2-8H,9,17H2,1H3,(H,18,19)?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of CccccN)cc6NC=O)CScccccc6F."} {"text":"User: Can you create the DeepSMILES of the molecule with the InChI string InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8-?\nAssistant: Yes, this molecule has a DeepSMILES of CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/valid_2-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the DeepSMILES CCCCNCC))C=O)C=O)cc-ccccCl)cc6))))))[nH]ccccCl)cc96?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C22H22Cl2N2O2\/c1-3-5-12-26(4-2)22(28)21(27)19-17-13-16(24)10-11-18(17)25-20(19)14-6-8-15(23)9-7-14\/h6-11,13,25H,3-5,12H2,1-2H3."} {"text":"User: Can you tell me the InChI string of the molecule with the DeepSMILES CNCC=O)Ncccccc6Br))))))))))C=O)CNCCNCccccCl)cc6)))))))CC6?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C22H26BrClN4O2\/c1-26(15-21(29)25-20-5-3-2-4-19(20)23)22(30)16-28-12-10-27(11-13-28)14-17-6-8-18(24)9-7-17\/h2-9H,10-16H2,1H3,(H,25,29)."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_1-2.jsonl": 
"{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C17H16FNO6S\/c1-24-14-9-11(3-8-16(20)21)10-15(17(14)25-2)26(22,23)19-13-6-4-12(18)5-7-13\/h3-10,19H,1-2H3,(H,20,21)\/b8-3+\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: COccc\/C=C\/C=O)O))))ccS=O)=O)NccccF)cc6))))))))c6OC"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C15H14FNO4S\/c1-9-3-8-13(14(10(9)2)15(18)19)17-22(20,21)12-6-4-11(16)5-7-12\/h3-8,17H,1-2H3,(H,18,19)\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CccccNS=O)=O)ccccF)cc6))))))))cC=O)O))c6C"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_0-4.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the InChI string InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1?\nAssistant: Of course, this molecule has a DeepSMILES of S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6."} {"text":"User: Can you generate the DeepSMILES of the molecule with the InChI string InChI=1S\/C22H18ClN3O2\/c1-28-19-9-7-15(23)12-18(19)21(26-16-5-3-10-24-13-16)17-8-6-14-4-2-11-25-20(14)22(17)27\/h2-13,21,26-27H,1H3?\nAssistant: Sure, this molecule has a DeepSMILES of COccccCl)cc6CNccccnc6)))))))cccccccnc6c%10O."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_3-0.jsonl": "{"text":"The molecule with the InChI string 
InChI=1S\/C25H22FN5O3S\/c1-31-8-7-19-16(12-31)9-15(10-27)24(30-19)35-13-20-22(25(32)33-2)21(18(11-28)23(29)34-20)14-3-5-17(26)6-4-14\/h3-6,9,21H,7-8,12-13,29H2,1-2H3 can also be represented with the DeepSMILES COC=O)C=CCScncccc6C#N))))CNC)CC6))))))))))OCN)=CC#N))C6ccccF)cc6."} {"text":"The molecule with the InChI string representation of InChI=1S\/C16H16N2O4\/c19-14-10-12(11-4-1-2-5-13(11)17-14)16(21)22-9-8-18-7-3-6-15(18)20\/h1-2,4-5,10H,3,6-9H2,(H,17,19) can also be represented with the DeepSMILES O=COCCNCCCC5=O)))))))))ccc=O)[nH]cccccc%106."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/test_2-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the DeepSMILES.\nDeepSMILES: NC=O)ccc-cccncc6))))))[nH]c5-ccccF)cc6\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C16H12FN3O\/c17-12-3-1-11(2-4-12)15-13(16(18)21)9-14(20-15)10-5-7-19-8-6-10\/h1-9,20H,(H2,18,21)"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the DeepSMILES.\nDeepSMILES: COcccc-cnn-cccccc6))))))cc5C=O)O)))))))cc6N\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C17H15N3O3\/c1-23-15-8-7-11(9-14(15)18)16-13(17(21)22)10-20(19-16)12-5-3-2-4-6-12\/h2-10H,18H2,1H3,(H,21,22)"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_4-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C17H22N2O3S\/c20-17(14-4-2-1-3-5-14)18-12-13-6-10-16(11-7-13)23(21,22)19-15-8-9-15\/h1-2,6-7,10-11,14-15,19H,3-5,8-9,12H2,(H,18,20) can also be represented with the DeepSMILES representation O=CNCccccS=O)=O)NCCC3)))))cc6))))))))CCC=CCC6."} {"text":"The molecule with the InChI string 
representation of InChI=1S\/C17H17FN4O\/c1-2-23-17-9-14(18)5-8-16(17)20-10-13-3-6-15(7-4-13)22-12-19-11-21-22\/h3-9,11-12,20H,2,10H2,1H3 can also be represented with the DeepSMILES CCOcccF)ccc6NCcccc-ncncn5)))))cc6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_3-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C25H22FN5O3S\/c1-31-8-7-19-16(12-31)9-15(10-27)24(30-19)35-13-20-22(25(32)33-2)21(18(11-28)23(29)34-20)14-3-5-17(26)6-4-14\/h3-6,9,21H,7-8,12-13,29H2,1-2H3\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: COC=O)C=CCScncccc6C#N))))CNC)CC6))))))))))OCN)=CC#N))C6ccccF)cc6"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C16H16N2O4\/c19-14-10-12(11-4-1-2-5-13(11)17-14)16(21)22-9-8-18-7-3-6-15(18)20\/h1-2,4-5,10H,3,6-9H2,(H,17,19)\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: O=COCCNCCCC5=O)))))))))ccc=O)[nH]cccccc%106"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_inchi/train_3-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the DeepSMILES.\nDeepSMILES: COC=O)C=CCScncccc6C#N))))CNC)CC6))))))))))OCN)=CC#N))C6ccccF)cc6\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C25H22FN5O3S\/c1-31-8-7-19-16(12-31)9-15(10-27)24(30-19)35-13-20-22(25(32)33-2)21(18(11-28)23(29)34-20)14-3-5-17(26)6-4-14\/h3-6,9,21H,7-8,12-13,29H2,1-2H3"} {"text":"Task: Please generate a molecule representation based on 
the description.\nDescription: Generate the InChI string from the DeepSMILES.\nMolecule DeepSMILES: O=COCCNCCCC5=O)))))))))ccc=O)[nH]cccccc%106\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C16H16N2O4\/c19-14-10-12(11-4-1-2-5-13(11)17-14)16(21)22-9-8-18-7-3-6-15(18)20\/h1-2,4-5,10H,3,6-9H2,(H,17,19)"}", "/scratch/micpie/export/iupac_smiles/valid_5-2.jsonl": "{"text":"The preferred IUPAC name of the molecule with SMILES CCC1CNCCC1(C2(CCCCCCC2)CN)O is 4-[1-(aminomethyl)cyclooctyl]-3-ethylpiperidin-4-ol."} {"text":"The IUPAC name of the compound with canonical SMILES CC(CC(=O)Nc1ccccc1C(=O)NCCN)NC(=O)C1CCCC1.Cl is N-(2-aminoethyl)-2-[3-(cyclopentanecarbonylamino)butanoylamino]benzamide;hydrochloride."}", "/scratch/micpie/export/iupac_smiles/valid_6-8.jsonl": "{"text":"Task: Please create the SMILES of a chemical given the systematic IUPAC name.\nIUPAC name: N-(2-azanylethyl)-2-(5-methoxypentanoylamino)benzamide\nResult: COCCCCC(=O)NC1=CC=CC=C1C(=O)NCCN"} {"text":"Task: Please generate the SMILES of a compound given the systematic IUPAC name.\nIUPAC name: 2-methoxy-5-[[(phenylmethyl)-propan-2-yl-amino]methyl]aniline\nResult: [C][C][Branch1][C][C][N][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][=C][C][=Branch1][#Branch2][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][O][C][N]"}", "/scratch/micpie/export/iupac_smiles/test_6-4.jsonl": "{"text":"The DeepSMILES of the chemical with systematic IUPAC name N-(2-azanylethyl)-2-[(3-methyl-3-phenyl-butanoyl)amino]benzamide is CCC)CC=O)NC=CC=CC=C6C=O)NCCN))))))))))))))C=CC=CC=C6."} {"text":"The SELFIES of the molecule with systematic IUPAC name N-[(3,4-dichlorophenyl)methyl]-2,3-dihydro-1,4-benzodioxin-6-amine is [C][C][O][C][=C][Branch1][Ring2][O][Ring1][=Branch1][C][=C][C][=Branch1][Ring2][=C][Ring1][#Branch1][N][C][C][=C][C][=Branch1][=Branch2][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl][Cl]."}", 
"/scratch/micpie/export/iupac_smiles/valid_17-2.jsonl": "{"text":"The preferred IUPAC name of the compound with InChI InChI=1S\/C30H24FN7O2\/c1-2-40-30-33-26-9-5-8-25(29(39)32-22-16-14-21(31)15-17-22)27(26)38(30)18-19-10-12-20(13-11-19)23-6-3-4-7-24(23)28-34-36-37-35-28\/h3-17H,2,18H2,1H3,(H,32,39)(H,34,35,36,37) is 2-ethoxy-N-(4-fluorophenyl)-3-[[4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl]benzimidazole-4-carboxamide."} {"text":"The IUPAC name of the molecule with SELFIES [C][C][Branch1][C][C][Branch1][C][C][N][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=C][C][=Branch2][Ring1][Branch2][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][N][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][N][C][Branch1][C][C][Branch1][C][C][C] is 3-N-(3-aminophenyl)-1-N,5-N-bis[3-(tert-butylamino)phenyl]benzene-1,3,5-tricarboxamide."}", "/scratch/micpie/export/iupac_smiles/valid_9-8.jsonl": "{"text":"Task: Please create the SMILES of a molecule based on the systematic IUPAC name.\nIUPAC name: 1-(propan-2-ylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,9,11,13-heptaenylideneamino)oxy-propan-2-ol\nResult: InChI=1S\/C21H24N2O2\/c1-15(2)22-13-18(24)14-25-23-21-19-9-5-3-7-16(19)11-12-17-8-4-6-10-20(17)21\/h3-12,15,18,22,24H,13-14H2,1-2H3"} {"text":"Task: Please give me the SMILES representation of a molecule based on the systematic IUPAC name.\nIUPAC name: 5-chloranyl-3-[cyclobutyl(methyl)amino]-N-[(4-cycloheptyl-4,6-dimethyl-2-oxidanylidene-piperidin-3-yl)methyl]-2-methyl-benzamide\nResult: CCCCCC=O)N6))CNC=O)C=CC=CC=C6)Cl)))NC)CCCC4))))))C)))))))C)CCCCCCC7"}", "/scratch/micpie/export/iupac_smiles/train_10-2.jsonl": "{"text":"The preferred IUPAC name of the compound with canonical SMILES CCc1c(C(O)O)cc(Cl)cc1N(CC)C1CCC(NC(=O)OC(C)(C)C)CC1 is tert-butyl N-[4-[5-chloro-3-(dihydroxymethyl)-N,2-diethylanilino]cyclohexyl]carbamate."} 
{"text":"The preferred IUPAC name of the molecule with DeepSMILES CCC)CCC[C@]C5CCCCCCCCCC6CCC%10C%14CC%18))C))C)))))C)C))OC=O)C))))))C)))))))C=O)F is [(3aS)-3a-carbonofluoridoyl-5a,5b,8,8,11a-pentamethyl-1-propan-2-yl-1,2,3,4,5,6,7,7a,9,10,11,11b,12,13,13a,13b-hexadecahydrocyclopenta[a]chrysen-9-yl] acetate."}", "/scratch/micpie/export/iupac_smiles/valid_12-5.jsonl": "{"text":"The canonical SMILES of the compound with IUAPC name in CAS-like style 8-[6-(4,4-dioctoxy-1-oxobutoxy)hexyl-(2-hydroxyethyl)amino]octanoic acid ethyl ester is CCCCCCCCOC(CCC(=O)OCCCCCCN(CCO)CCCCCCCC(=O)OCC)OCCCCCCCC."} {"text":"The canonical SMILES of the chemical with CAS-like IUPAC name 2-amino-3,3-dimethyl-N-(5,5,5-trifluoropentyl)butanamide is CC(C)(C)C(N)C(=O)NCCCCC(F)(F)F."}", "/scratch/micpie/export/iupac_smiles/valid_24-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with canonical SMILES O=C(Cn1[nH]c(=O)c2ccccc2c1=O)Nc1ccccc1O is 2-(1,4-diketo-3H-phthalazin-2-yl)-N-(2-hydroxyphenyl)acetamide."} {"text":"The traditional IUPAC name of the molecule with SMILES CC1=CC(=NC2=C(C=NN12)C(=O)NC(C)C(F)(F)F)C3=CC4=C(C(=C3)F)OC=N4 is 5-(7-fluoro-1,3-benzoxazol-5-yl)-7-methyl-N-(2,2,2-trifluoro-1-methyl-ethyl)pyrazolo[1,5-a]pyrimidine-3-carboxamide."}", "/scratch/micpie/export/iupac_smiles/train_15-9.jsonl": "{"text":"Task: Please generate the SMILES representation of a compound based on the IUAPC name in CAS-like style.\nIUPAC name: 1-methyl-9H-pyrido[3,4-b]indole-7-carboxylic acid\nResult: [C][C][=N][C][=C][C][=C][Ring1][=Branch1][N][C][=C][Ring1][Branch1][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=Branch1][C][=O][O]"} {"text":"Task: Please generate the SMILES of a chemical based on the CAS-like IUPAC name.\nIUPAC name: 8-[[5-chloro-4-(chloromethyl)-2-pyridinyl]oxy]quinoline\nResult: 
[C][=C][C][=C][Branch2][Ring1][#Branch1][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][=N][C][=C][Branch1][=Branch2][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][Cl][Cl][N][=C][C][=C][Ring2][Ring1][C]"}", "/scratch/micpie/export/iupac_smiles/train_1-8.jsonl": "{"text":"Task: Please give me the SMILES representation of a chemical based on the systematic IUPAC name.\nIUPAC name: 3-[(1Z)-3-chloranylbuta-1,3-dienyl]-2-methyl-N-(4-piperidin-1-ylbutyl)pyridin-4-amine\nResult: CC1=NC=CC(=C1\/C=C\\C(=C)Cl)NCCCCN2CCCCC2"} {"text":"Task: Please generate the SMILES representation of a compound based on the systematic IUPAC name.\nIUPAC name: (6S,8R,9S,13S,14S,17S)-3-methoxy-13-methyl-17-[(2R)-oxan-2-yl]oxy-6,7,8,9,11,12,14,15,16,17-decahydrocyclopenta[a]phenanthren-6-ol\nResult: [C][C@][C][C][C@H1][C@H1][Branch2][Ring1][C][C@@H1][Ring1][=Branch1][C][C][C@@H1][Ring1][=Branch2][O][C@@H1][C][C][C][C][O][Ring1][=Branch1][C][C@@H1][Branch1][#C][C][=C][Ring1][P][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O]"}", "/scratch/micpie/export/iupac_smiles/train_15-5.jsonl": "{"text":"The InChI of the molecule with CAS-like IUPAC name 1-methyl-9H-pyrido[3,4-b]indole-7-carboxylic acid is InChI=1S\/C13H10N2O2\/c1-7-12-10(4-5-14-7)9-3-2-8(13(16)17)6-11(9)15-12\/h2-6,15H,1H3,(H,16,17)."} {"text":"The InChI of the chemical with CAS-like IUPAC name 8-[[5-chloro-4-(chloromethyl)-2-pyridinyl]oxy]quinoline is InChI=1S\/C15H10Cl2N2O\/c16-8-11-7-14(19-9-12(11)17)20-13-5-1-3-10-4-2-6-18-15(10)13\/h1-7,9H,8H2."}", "/scratch/micpie/export/iupac_smiles/valid_15-4.jsonl": "{"text":"The canonical SMILES of the chemical with systematic IUPAC name 2-[5-[5-[4,4-bis(fluoranyl)piperidin-1-yl]carbonylpyridin-2-yl]-7-(trifluoromethyl)-1-benzofuran-2-yl]ethyl methanesulfonate is CS(=O)(=O)OCCc1cc2cc(-c3ccc(C(=O)N4CCC(F)(F)CC4)cn3)cc(C(F)(F)F)c2o1."} {"text":"The InChI of the compound with systematic IUPAC name 5-chloranyl-4-(chloromethyl)-2-naphthalen-1-yloxy-pyridine is 
InChI=1S\/C16H11Cl2NO\/c17-9-12-8-16(19-10-14(12)18)20-15-7-3-5-11-4-1-2-6-13(11)15\/h1-8,10H,9H2."}", "/scratch/micpie/export/iupac_smiles/train_11-5.jsonl": "{"text":"The canonical SMILES of the compound with CAS-like IUPAC name acetic acid [(14R)-17-(5,6-dihydroxy-6-methylheptan-2-yl)-4,4,10,13,14-pentamethyl-2,3,5,6,7,11,12,15,16,17-decahydro-1H-cyclopenta[a]phenanthren-3-yl] ester is CC(=O)OC1CCC2(C)C3=C(CCC2C1(C)C)[C@]1(C)CCC(C(C)CCC(O)C(C)(C)O)C1(C)CC3."} {"text":"The InChI of the molecule with IUAPC name in CAS-like style N-[3-(1-dibenzofuranyl)phenyl]-9,9-dimethyl-N-(4-phenylphenyl)-2-fluorenamine is InChI=1S\/C45H33NO\/c1-45(2)40-19-8-6-16-37(40)38-27-26-35(29-41(38)45)46(33-24-22-31(23-25-33)30-12-4-3-5-13-30)34-15-10-14-32(28-34)36-18-11-21-43-44(36)39-17-7-9-20-42(39)47-43\/h3-29H,1-2H3."}", "/scratch/micpie/export/iupac_smiles/train_22-7.jsonl": "{"text":"Task: Please generate the SMILES of a chemical given the traditional IUPAC name.\nIUPAC name: 5-mesyl-6-methyl-5-(5-methyl-1,3,4-oxadiazol-2-yl)-4-(trifluoromethyl)cyclohexa-1,3-diene-1-carboxamide\nResult: [C][C][C][=Branch2][Ring2][=Branch1][=C][C][=C][Branch2][Ring1][Branch2][C][Ring1][=Branch1][Branch1][O][C][=N][N][=C][Branch1][Ring2][O][Ring1][Branch1][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][Branch1][C][F][Branch1][C][F][F][C][=Branch1][C][=O][N]"} {"text":"Task: Please give me the SMILES of a compound given the traditional IUPAC name.\nIUPAC name: 1-propylsulfonylnipecotaldehyde\nResult: CCCS(=O)(=O)N1CCCC(C=O)C1"}", "/scratch/micpie/export/iupac_smiles/valid_15-0.jsonl": "{"text":"The traditional IUPAC name of the compound with SMILES CS(=O)(=O)OCCC1=CC2=CC(=CC(=C2O1)C(F)(F)F)C3=NC=C(C=C3)C(=O)N4CCC(CC4)(F)F is methanesulfonic acid 2-[5-[5-(4,4-difluoropiperidine-1-carbonyl)-2-pyridyl]-7-(trifluoromethyl)benzofuran-2-yl]ethyl ester."} {"text":"The traditional IUPAC name of the compound with DeepSMILES C=CC=CC=C6)C=CC=C6OC=NC=CC=C6)CCl)))Cl is 
5-chloro-4-(chloromethyl)-2-(1-naphthoxy)pyridine."}", "/scratch/micpie/export/iupac_smiles/train_14-7.jsonl": "{"text":"Task: Please generate the SMILES of a chemical given the traditional IUPAC name.\nIUPAC name: (3R)-5-keto-1-(2,2,2-trifluoroethyl)pyrrolidine-3-carboxylic acid (1,3,5-trimethylpyrazol-4-yl) ester\nResult: [C][C][=C][Branch1][#Branch2][C][=Branch1][=Branch1][=N][N][Ring1][Branch1][C][C][O][C][=Branch1][C][=O][C@@H1][C][C][=Branch1][C][=O][N][Branch1][Ring2][C][Ring1][=Branch1][C][C][Branch1][C][F][Branch1][C][F][F]"} {"text":"Task: Please give me the SMILES representation of a molecule given the traditional IUPAC name.\nIUPAC name: [7-chloro-5-[4-(3-fluoropyrrolidino)sulfonylphenyl]benzofuran-2-yl]methylamine\nResult: C1CN(CC1F)S(=O)(=O)C2=CC=C(C=C2)C3=CC(=C4C(=C3)C=C(O4)CN)Cl"}", "/scratch/micpie/export/iupac_smiles/test_26-1.jsonl": "{"text":"The CAS-like IUPAC name of the compound with SELFIES [C][C][C][=C][Branch2][Ring1][=Branch1][S][C][=Branch1][Ring2][=N][Ring1][Branch1][C][=C][Branch1][#Branch2][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][Cl][Cl][C][=Branch1][C][=O][O] is 2-(2,3-dichlorophenyl)-4-ethyl-5-thiazolecarboxylic acid."} {"text":"The IUAPC name in CAS-like style of the molecule with canonical SMILES COc1cccc(C2=NN(C(=O)CN3CCc4ccccc4C3)[C@@H](c3ccccc3Cl)C2)c1 is 1-[(3R)-3-(2-chlorophenyl)-5-(3-methoxyphenyl)-3,4-dihydropyrazol-2-yl]-2-(3,4-dihydro-1H-isoquinolin-2-yl)ethanone."}", "/scratch/micpie/export/iupac_smiles/test_8-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the chemical with InChI InChI=1S\/C18H22N2O3S\/c1-12(2)18(21)19-15-6-5-7-16(11-15)20-24(22,23)17-9-8-13(3)14(4)10-17\/h5-12,20H,1-4H3,(H,19,21) is N-[3-[(3,4-dimethylphenyl)sulfonylamino]phenyl]-2-methylpropanamide."} {"text":"The IUAPC name in CAS-like style of the chemical with InChI InChI=1S\/C20H26N2O2\/c1-20(2,3)21-14-18(23)15-24-22-19(16-10-6-4-7-11-16)17-12-8-5-9-13-17\/h4-13,18,21,23H,14-15H2,1-3H3 is 
1-(tert-butylamino)-3-[(diphenylmethylene)amino]oxy-2-propanol."}", "/scratch/micpie/export/iupac_smiles/test_21-6.jsonl": "{"text":"The SMILES of the chemical with preferred IUPAC name [4-[(2S)-1-(azetidin-1-yl)-3-cyclohexyl-1-oxopropan-2-yl]oxy-2-[5-[2-[[3-[3,5-dihydroxy-6-(hydroxymethyl)-4-[(2-oxochromen-3-yl)methoxy]oxan-2-yl]oxy-4-hydroxy-5-[(2-oxochromen-3-yl)methoxy]cyclohexanecarbonyl]amino]ethylcarbamoyl]-3-ethyl-2-(3,4,5-trihydroxy-6-methyloxan-2-yl)oxycyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)oxan-3-yl] benzoate is CCC1CC(CC(C1OC2C(C(C(C(O2)C)O)O)O)OC3C(C(C(C(O3)CO)O)O[C@@H](CC4CCCCC4)C(=O)N5CCC5)OC(=O)C6=CC=CC=C6)C(=O)NCCNC(=O)C7CC(C(C(C7)OC8C(C(C(C(O8)CO)O)OCC9=CC1=CC=CC=C1OC9=O)O)O)OCC1=CC2=CC=CC=C2OC1=O."} {"text":"The SMILES of the chemical with IUPAC name 2-(methoxymethyl)-3-[4-(trifluoromethoxy)phenyl]pyrazine is COCC1=NC=CN=C1C2=CC=C(C=C2)OC(F)(F)F."}", "/scratch/micpie/export/iupac_smiles/valid_19-2.jsonl": "{"text":"The preferred IUPAC name of the compound with canonical SMILES CNCCC1CCN(CC(=O)Nc2cc(-n3cnnn3)ccc2Cl)CC1.Cl is N-[2-chloro-5-(tetrazol-1-yl)phenyl]-2-[4-[2-(methylamino)ethyl]piperidin-1-yl]acetamide;hydrochloride."} {"text":"The preferred IUPAC name of the molecule with SMILES CC(C)C1CCC2(CCCC(=C)C2C1NC(=O)CN(CC(=O)NC3C(CCC4(C3C(=C)CCC4)C)C(C)C)CC(=O)OC)C is methyl 2-[bis[2-[(4a-methyl-8-methylidene-2-propan-2-yl-1,2,3,4,5,6,7,8a-octahydronaphthalen-1-yl)amino]-2-oxoethyl]amino]acetate."}", "/scratch/micpie/export/iupac_smiles/test_2-4.jsonl": "{"text":"The DeepSMILES of the molecule with systematic IUPAC name (2R)-2-(methylamino)-1-(2-methylphenyl)propan-1-one is CC=CC=CC=C6C=O)[C@@H]C)NC."} {"text":"The SMILES of the chemical with systematic IUPAC name 2-(6-bromanyl-2,3-dihydro-1,4-benzodioxin-7-yl)-4,5,6,7-tetrakis(chloranyl)isoindole-1,3-dione is C1COC2=C(O1)C=C(C(=C2)Br)N3C(=O)C4=C(C3=O)C(=C(C(=C4Cl)Cl)Cl)Cl."}", "/scratch/micpie/export/iupac_smiles/train_2-2.jsonl": "{"text":"The preferred IUPAC name of the compound with 
InChI InChI=1S\/C24H34O4\/c1-24-11-10-17-16-7-6-15(26-2)13-19(16)21(25)14-18(17)20(24)8-9-22(24)28-23-5-3-4-12-27-23\/h6-7,13,17-18,20-23,25H,3-5,8-12,14H2,1-2H3\/t17-,18-,20+,21-,22+,23+,24+\/m1\/s1 is (6R,8R,9S,13S,14S,17S)-3-methoxy-13-methyl-17-[(2S)-oxan-2-yl]oxy-6,7,8,9,11,12,14,15,16,17-decahydrocyclopenta[a]phenanthren-6-ol."} {"text":"The IUPAC name of the chemical with InChI InChI=1S\/C12H18O3\/c1-10-2-8-3-11(5-10,9(13)14)7-12(15,4-8)6-10\/h8,15H,2-7H2,1H3,(H,13,14)\/p-1\/t8-,10+,11+,12-\/m1\/s1 is (1S,3R,5S,7R)-3-hydroxy-5-methyladamantane-1-carboxylate."}", "/scratch/micpie/export/iupac_smiles/test_4-0.jsonl": "{"text":"The traditional IUPAC name of the chemical with SELFIES [C][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][=C][N][Branch1][O][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C] is 1-(3-methoxyphenyl)sulfonyl-3-[(4-methylpiperazino)methyl]indole."} {"text":"The traditional IUPAC name of the chemical with SELFIES [C][C][C][C][N][C][C][C][Ring1][=Branch1][Branch1][#Branch2][C][Branch1][C][C][Branch1][C][C][C][N][O] is 4-(2-amino-1,1-dimethyl-ethyl)-3-ethyl-piperidin-4-ol."}", "/scratch/micpie/export/iupac_smiles/valid_13-2.jsonl": "{"text":"The IUPAC name of the compound with SMILES CCC(C)(CNC(=O)C(C(C)(C)C)N)C(=O)O is 2-[[(2-amino-3,3-dimethylbutanoyl)amino]methyl]-2-methylbutanoic acid."} {"text":"The preferred IUPAC name of the molecule with DeepSMILES C[C@@H]CCNC=CC=CC=C6N%11)))))))C=O)C=CC=NC=C6)))Cl is (2-chloropyridin-4-yl)-[(2R)-2-methyl-1,2,3,4-tetrahydro-1,5-benzodiazepin-5-yl]methanone."}", "/scratch/micpie/export/iupac_smiles/valid_18-0.jsonl": "{"text":"The traditional IUPAC name of the compound with DeepSMILES CNCCCC5)C=CC=CC=CN=CN=C%106)))))))C=CN=CC=C6 is 8-(1-methylpyrrolidin-3-yl)-6-(3-pyridyl)quinazoline."} {"text":"The traditional IUPAC name of the molecule with SMILES CC(C(=O)N1CCCC1C2=CC=CS2)N3CCC(CC3)CCNC 
is 2-[4-[2-(methylamino)ethyl]piperidino]-1-[2-(2-thienyl)pyrrolidino]propan-1-one."}", "/scratch/micpie/export/iupac_smiles/valid_1-6.jsonl": "{"text":"The canonical SMILES of the molecule with preferred IUPAC name ethene;N-ethyl-N-methylprop-2-en-1-amine is C=C.C=CCN(C)CC."} {"text":"The canonical SMILES of the molecule with IUPAC name (2R)-1-[[(2S)-2-hydroxy-3-(1H-indol-4-yloxy)propyl]-propan-2-ylamino]-3-(1H-indol-4-yloxy)propan-2-ol is CC(C)N(C[C@H](O)COc1cccc2[nH]ccc12)C[C@@H](O)COc1cccc2[nH]ccc12."}", "/scratch/micpie/export/iupac_smiles/valid_17-7.jsonl": "{"text":"Task: Please give me the SMILES representation of a chemical based on the traditional IUPAC name.\nIUPAC name: 2-ethoxy-N-(4-fluorophenyl)-3-[4-[2-(2H-tetrazol-5-yl)phenyl]benzyl]benzimidazole-4-carboxamide\nResult: [C][C][O][C][=N][C][=C][C][=C][C][=Branch2][Ring1][P][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][N][=N][Ring1][Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][F]"} {"text":"Task: Please generate the SMILES representation of a compound given the traditional IUPAC name.\nIUPAC name: N3-(3-aminophenyl)-N1,N5-bis[3-(tert-butylamino)phenyl]benzene-1,3,5-tricarboxamide\nResult: [C][C][Branch1][C][C][Branch1][C][C][N][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=C][C][=Branch2][Ring1][Branch2][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][N][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][N][C][Branch1][C][C][Branch1][C][C][C]"}", "/scratch/micpie/export/iupac_smiles/train_5-6.jsonl": "{"text":"The DeepSMILES of the chemical with IUPAC name 4-[3-(aminomethyl)oxolan-3-yl]-3-ethylpiperidin-4-ol is CCCCNCCC6CCCOC5))))CN)))O."} {"text":"The canonical SMILES of the molecule with preferred IUPAC name 
N-(2-aminoethyl)-2-[2-(oxolan-2-ylmethoxy)propanoylamino]benzamide;hydrochloride is CC(OCC1CCCO1)C(=O)Nc1ccccc1C(=O)NCCN.Cl."}", "/scratch/micpie/export/iupac_smiles/valid_11-4.jsonl": "{"text":"The SELFIES of the chemical with systematic IUPAC name 13-chloranyl-2-piperidin-4-ylidene-4-azatricyclo[9.4.0.03,8]pentadeca-1(11),3(8),4,6,12,14-hexaene;disulfate is [C][C][C][=C][Branch1][#Branch2][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][Cl][C][=Branch1][=Branch2][=C][C][C][N][C][C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=N][Ring1][=Branch1].[O-1][S][=Branch1][C][=O][=Branch1][C][=O][O-1].[O-1][S][=Branch1][C][=O][=Branch1][C][=O][O-1]."} {"text":"The SELFIES of the compound with systematic IUPAC name 4-(7-bromanyl-9,9-dioctyl-fluoren-2-yl)benzaldehyde is [C][C][C][C][C][C][C][C][C][Branch2][Ring2][#Branch1][C][=C][Branch2][Ring1][Branch1][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=O][C][=C][Ring1][P][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Br][C][C][C][C][C][C][C][C]."}", "/scratch/micpie/export/iupac_smiles/test_13-3.jsonl": "{"text":"The InChI of the chemical with traditional IUPAC name 2-amino-N-(2-hydroxy-1-methyl-1-methylol-ethyl)-3,3-dimethyl-butyramide is InChI=1S\/C10H22N2O3\/c1-9(2,3)7(11)8(15)12-10(4,5-13)6-14\/h7,13-14H,5-6,11H2,1-4H3,(H,12,15)."} {"text":"The DeepSMILES of the molecule with traditional IUPAC name (4S)-chroman-4-carboxylic acid (1,3,5-trimethylpyrazol-4-yl) ester is CC=CC=NN5C)))C))OC=O)[C@H]CCOC=CC=CC=C%106."}", "/scratch/micpie/export/iupac_smiles/test_4-4.jsonl": "{"text":"The SMILES of the chemical with systematic IUPAC name 1-(3-methoxyphenyl)sulfonyl-3-[(4-methylpiperazin-1-yl)methyl]indole is CN1CCN(CC1)CC2=CN(C3=CC=CC=C32)S(=O)(=O)C4=CC=CC(=C4)OC."} {"text":"The InChI of the compound with systematic IUPAC name 4-(1-azanyl-2-methyl-propan-2-yl)-3-ethyl-piperidin-4-ol is 
InChI=1S\/C11H24N2O\/c1-4-9-7-13-6-5-11(9,14)10(2,3)8-12\/h9,13-14H,4-8,12H2,1-3H3."}", "/scratch/micpie/export/iupac_smiles/test_14-4.jsonl": "{"text":"The SELFIES of the compound with systematic IUPAC name (1,3,5-trimethylpyrazol-4-yl) (3S)-5-oxidanylidene-1-[2,2,2-tris(fluoranyl)ethyl]pyrrolidine-3-carboxylate is [C][C][=C][Branch1][#Branch2][C][=Branch1][=Branch1][=N][N][Ring1][Branch1][C][C][O][C][=Branch1][C][=O][C@H1][C][C][=Branch1][C][=O][N][Branch1][Ring2][C][Ring1][=Branch1][C][C][Branch1][C][F][Branch1][C][F][F]."} {"text":"The DeepSMILES of the chemical with systematic IUPAC name 4-bromanyl-2-iodanyl-5-(trifluoromethyl)phenol is C=CC=CC=C6O))I)))Br))CF)F)F."}", "/scratch/micpie/export/iupac_smiles/test_9-6.jsonl": "{"text":"The InChI of the compound with preferred IUPAC name 1-(fluoren-9-ylideneamino)oxy-3-(4-phenylbutan-2-ylamino)propan-2-ol is InChI=1S\/C26H28N2O2\/c1-19(15-16-20-9-3-2-4-10-20)27-17-21(29)18-30-28-26-24-13-7-5-11-22(24)23-12-6-8-14-25(23)26\/h2-14,19,21,27,29H,15-18H2,1H3."} {"text":"The canonical SMILES of the molecule with IUPAC name (2S)-5-chloro-N-[(4,6-dimethyl-2-oxopiperidin-3-yl)methyl]-3-[ethyl(propanoyl)amino]-2-methylcyclohexane-1-carboxamide is CCC(=O)N(CC)C1CC(Cl)CC(C(=O)NCC2C(=O)NC(C)CC2C)[C@@H]1C."}", "/scratch/micpie/export/iupac_smiles/test_18-6.jsonl": "{"text":"The canonical SMILES of the molecule with IUPAC name [(E)-2-ethyl-7,10-dimethylundec-3-enylidene]-dimethylphosphanium is CCC(C=[P+](C)C)\/C=C\/CCC(C)CCC(C)C."} {"text":"The SMILES of the chemical with IUPAC name N-methyl-2-[1-[2-(4-nitrophenyl)ethyl]piperidin-4-yl]ethanamine;hydrochloride is CNCCC1CCN(CC1)CCC2=CC=C(C=C2)[N+](=O)[O-].Cl."}", "/scratch/micpie/export/iupac_smiles/train_17-6.jsonl": "{"text":"The DeepSMILES of the compound with IUPAC name 2-ethoxy-N-pyridin-4-yl-3-[[4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl]benzimidazole-4-carboxamide is CCOC=NC=CC=CC=C6N9CC=CC=CC=C6))C=CC=CC=C6C=NNN=N5))))))))))))))))))C=O)NC=CC=NC=C6."} {"text":"The InChI of 
the chemical with IUPAC name 6-(2-fluorophenyl)-8-heptan-4-ylquinazoline is InChI=1S\/C21H23FN2\/c1-3-7-15(8-4-2)19-12-16(18-9-5-6-10-20(18)22)11-17-13-23-14-24-21(17)19\/h5-6,9-15H,3-4,7-8H2,1-2H3."}", "/scratch/micpie/export/iupac_smiles/valid_12-7.jsonl": "{"text":"Task: Please create the SMILES of a chemical given the traditional IUPAC name.\nIUPAC name: 8-[6-(4,4-dioctoxybutanoyloxy)hexyl-(2-hydroxyethyl)amino]caprylic acid ethyl ester\nResult: InChI=1S\/C38H75NO7\/c1-4-7-9-11-17-24-34-45-38(46-35-25-18-12-10-8-5-2)28-27-37(42)44-33-23-19-16-22-30-39(31-32-40)29-21-15-13-14-20-26-36(41)43-6-3\/h38,40H,4-35H2,1-3H3"} {"text":"Task: Please create the SMILES representation of a molecule given the traditional IUPAC name.\nIUPAC name: 2-amino-3,3-dimethyl-N-(5,5,5-trifluoropentyl)butyramide\nResult: [C][C][Branch1][C][C][Branch1][C][C][C][Branch2][Ring1][C][C][=Branch1][C][=O][N][C][C][C][C][C][Branch1][C][F][Branch1][C][F][F][N]"}", "/scratch/micpie/export/iupac_smiles/train_8-6.jsonl": "{"text":"The canonical SMILES of the compound with IUPAC name 3-(N-(2-methyl-5-nitrophenyl)sulfonylanilino)propanamide is Cc1ccc([N+](=O)[O-])cc1S(=O)(=O)N(CCC(N)=O)c1ccccc1."} {"text":"The SELFIES of the molecule with IUPAC name 1-(propan-2-ylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,11,13-hexaenylideneamino)oxypropan-2-ol;hydrochloride is [C][C][Branch1][C][C][N][C][C][Branch2][Ring1][=Branch2][C][O][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][#C][O].[Cl]."}", "/scratch/micpie/export/iupac_smiles/valid_1-4.jsonl": "{"text":"The InChI of the molecule with systematic IUPAC name ethene;N-ethyl-N-methyl-prop-2-en-1-amine is InChI=1S\/C6H13N.C2H4\/c1-4-6-7(3)5-2;1-2\/h4H,1,5-6H2,2-3H3;1-2H2."} {"text":"The InChI of the chemical with systematic IUPAC name (2R)-1-(1H-indol-4-yloxy)-3-[[(2S)-3-(1H-indol-4-yloxy)-2-oxidanyl-propyl]-propan-2-yl-amino]propan-2-ol is 
InChI=1S\/C25H31N3O4\/c1-17(2)28(13-18(29)15-31-24-7-3-5-22-20(24)9-11-26-22)14-19(30)16-32-25-8-4-6-23-21(25)10-12-27-23\/h3-12,17-19,26-27,29-30H,13-16H2,1-2H3\/t18-,19+."}", "/scratch/micpie/export/iupac_smiles/valid_6-9.jsonl": "{"text":"Task: Please create the SMILES of a molecule given the IUAPC name in CAS-like style.\nIUPAC name: N-(2-aminoethyl)-2-[(5-methoxy-1-oxopentyl)amino]benzamide\nResult: COCCCCC=O)NC=CC=CC=C6C=O)NCCN"} {"text":"Task: Please give me the SMILES representation of a chemical based on the IUAPC name in CAS-like style.\nIUPAC name: 2-methoxy-5-[[(phenylmethyl)-propan-2-ylamino]methyl]aniline\nResult: InChI=1S\/C18H24N2O\/c1-14(2)20(12-15-7-5-4-6-8-15)13-16-9-10-18(21-3)17(19)11-16\/h4-11,14H,12-13,19H2,1-3H3"}", "/scratch/micpie/export/iupac_smiles/valid_5-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the chemical with InChI InChI=1S\/C16H32N2O\/c1-2-14-12-18-11-10-16(14,19)15(13-17)8-6-4-3-5-7-9-15\/h14,18-19H,2-13,17H2,1H3 is 4-[1-(aminomethyl)cyclooctyl]-3-ethyl-4-piperidinol."} {"text":"The IUAPC name in CAS-like style of the chemical with DeepSMILES CCCC=O)NC=CC=CC=C6C=O)NCCN))))))))))))))NC=O)CCCCC5.Cl is N-(2-aminoethyl)-2-[[3-[[cyclopentyl(oxo)methyl]amino]-1-oxobutyl]amino]benzamide;hydrochloride."}", "/scratch/micpie/export/iupac_smiles/train_11-4.jsonl": "{"text":"The SELFIES of the molecule with systematic IUPAC name [(14R)-4,4,10,13,14-pentamethyl-17-[6-methyl-5,6-bis(oxidanyl)heptan-2-yl]-2,3,5,6,7,11,12,15,16,17-decahydro-1H-cyclopenta[a]phenanthren-3-yl] ethanoate is [C][C][Branch1][#C][C][C][C][Branch1][=Branch2][C][Branch1][C][C][Branch1][C][C][O][O][C][C][C][C@@][Branch2][Ring2][#Branch2][C][Ring1][Branch1][Branch2][Ring2][Ring1][C][C][C][=C][Ring1][=Branch1][C][C][C][C][Ring1][=Branch1][Branch2][Ring1][Ring1][C][C][C][Branch1][Branch2][C][Ring1][=Branch1][Branch1][C][C][C][O][C][=Branch1][C][=O][C][C][C][C]."} {"text":"The InChI of the molecule with systematic IUPAC name 
N-(3-dibenzofuran-1-ylphenyl)-9,9-dimethyl-N-(4-phenylphenyl)fluoren-2-amine is InChI=1S\/C45H33NO\/c1-45(2)40-19-8-6-16-37(40)38-27-26-35(29-41(38)45)46(33-24-22-31(23-25-33)30-12-4-3-5-13-30)34-15-10-14-32(28-34)36-18-11-21-43-44(36)39-17-7-9-20-42(39)47-43\/h3-29H,1-2H3."}", "/scratch/micpie/export/iupac_smiles/test_8-8.jsonl": "{"text":"Task: Please give me the SMILES of a chemical given the systematic IUPAC name.\nIUPAC name: N-[3-[(3,4-dimethylphenyl)sulfonylamino]phenyl]-2-methyl-propanamide\nResult: InChI=1S\/C18H22N2O3S\/c1-12(2)18(21)19-15-6-5-7-16(11-15)20-24(22,23)17-9-8-13(3)14(4)10-17\/h5-12,20H,1-4H3,(H,19,21)"} {"text":"Task: Please generate the SMILES representation of a chemical based on the systematic IUPAC name.\nIUPAC name: 1-(tert-butylamino)-3-[(diphenylmethylidene)amino]oxy-propan-2-ol\nResult: CCC)C)NCCCON=CC=CC=CC=C6))))))C=CC=CC=C6))))))))))O"}", "/scratch/micpie/export/iupac_smiles/test_9-9.jsonl": "{"text":"Task: Please create the SMILES representation of a compound based on the CAS-like IUPAC name.\nIUPAC name: 1-(9-fluorenylideneamino)oxy-3-(4-phenylbutan-2-ylamino)-2-propanol\nResult: CC(CCc1ccccc1)NCC(O)CON=C1c2ccccc2-c2ccccc21"} {"text":"Task: Please generate the SMILES representation of a compound given the CAS-like IUPAC name.\nIUPAC name: (2S)-5-chloro-N-[(4,6-dimethyl-2-oxo-3-piperidinyl)methyl]-3-[ethyl(1-oxopropyl)amino]-2-methyl-1-cyclohexanecarboxamide\nResult: CCC(=O)N(CC)C1CC(Cl)CC(C(=O)NCC2C(=O)NC(C)CC2C)[C@@H]1C"}", "/scratch/micpie/export/iupac_smiles/valid_20-6.jsonl": "{"text":"The SMILES of the compound with IUPAC name methyl 4-acetyloxy-7-(2-hydroxy-2-oct-2-enyl-5-oxocyclopent-3-en-1-ylidene)hept-5-enoate is CCCCCC=CCC1(C=CC(=O)C1=CC=CC(CCC(=O)OC)OC(=O)C)O."} {"text":"The SELFIES of the molecule with preferred IUPAC name 2-[2-[2-(2-oxopropylamino)ethylamino]ethylamino]acetic acid is [C][C][=Branch1][C][=O][C][N][C][C][N][C][C][N][C][C][=Branch1][C][=O][O]."}", "/scratch/micpie/export/iupac_smiles/train_17-9.jsonl": 
"{"text":"Task: Please give me the SMILES of a chemical given the CAS-like IUPAC name.\nIUPAC name: 2-ethoxy-N-pyridin-4-yl-3-[[4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl]-4-benzimidazolecarboxamide\nResult: CCOC1=NC2=CC=CC(=C2N1CC3=CC=C(C=C3)C4=CC=CC=C4C5=NNN=N5)C(=O)NC6=CC=NC=C6"} {"text":"Task: Please generate the SMILES representation of a chemical based on the IUAPC name in CAS-like style.\nIUPAC name: 6-(2-fluorophenyl)-8-heptan-4-ylquinazoline\nResult: CCCC(CCC)C1=CC(=CC2=CN=CN=C12)C3=CC=CC=C3F"}", "/scratch/micpie/export/iupac_smiles/train_11-7.jsonl": "{"text":"Task: Please create the SMILES representation of a compound given the traditional IUPAC name.\nIUPAC name: acetic acid [(14R)-17-(4,5-dihydroxy-1,5-dimethyl-hexyl)-4,4,10,13,14-pentamethyl-2,3,5,6,7,11,12,15,16,17-decahydro-1H-cyclopenta[a]phenanthren-3-yl] ester\nResult: CC(=O)OC1CCC2(C)C3=C(CCC2C1(C)C)[C@]1(C)CCC(C(C)CCC(O)C(C)(C)O)C1(C)CC3"} {"text":"Task: Please give me the SMILES of a molecule given the traditional IUPAC name.\nIUPAC name: (3-dibenzofuran-1-ylphenyl)-(9,9-dimethylfluoren-2-yl)-(4-phenylphenyl)amine\nResult: CC1(C)c2ccccc2-c2ccc(N(c3ccc(-c4ccccc4)cc3)c3cccc(-c4cccc5oc6ccccc6c45)c3)cc21"}", "/scratch/micpie/export/iupac_smiles/valid_4-4.jsonl": "{"text":"The SMILES of the molecule with systematic IUPAC name N-(4-chlorophenyl)-N-[[1-[(2S,3R,4R,5S,6R)-6-(hydroxymethyl)-5-[(2S,3R,4S,5R,6R)-6-(hydroxymethyl)-3,4,5-tris(oxidanyl)oxan-2-yl]oxy-3,4-bis(oxidanyl)oxan-2-yl]-1,2,3-triazol-4-yl]methyl]ethanesulfonamide is CCS(=O)(=O)N(CC1=CN(N=N1)[C@@H]2[C@@H]([C@H]([C@@H]([C@H](O2)CO)O[C@H]3[C@@H]([C@H]([C@H]([C@H](O3)CO)O)O)O)O)O)C4=CC=C(C=C4)Cl."} {"text":"The SMILES of the compound with systematic IUPAC name 4-[1-(aminomethyl)-4-ethyl-cyclohexyl]-3-ethyl-piperidin-4-ol is CCC1CCC(CC1)(CN)C2(CCNCC2CC)O."}", "/scratch/micpie/export/iupac_smiles/test_5-6.jsonl": "{"text":"The SELFIES of the chemical with IUPAC name 4-(1-amino-2-methylbutan-2-yl)-3-ethylpiperidin-4-ol is 
[C][C][C][C][N][C][C][C][Ring1][=Branch1][Branch1][O][C][Branch1][C][C][Branch1][Ring1][C][C][C][N][O]."} {"text":"The InChI of the molecule with IUPAC name N-(2-aminoethyl)-2-[(4-methyl-3-phenylpentanoyl)amino]benzamide is InChI=1S\/C21H27N3O2\/c1-15(2)18(16-8-4-3-5-9-16)14-20(25)24-19-11-7-6-10-17(19)21(26)23-13-12-22\/h3-11,15,18H,12-14,22H2,1-2H3,(H,23,26)(H,24,25)."}", "/scratch/micpie/export/iupac_smiles/valid_8-2.jsonl": "{"text":"The IUPAC name of the chemical with SELFIES [C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][Branch1][Branch2][C][C][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][#Branch2] is 3-(N-naphthalen-2-ylsulfonylanilino)propanamide."} {"text":"The IUPAC name of the molecule with DeepSMILES CCC)NCCCO\/N=C\\CCCC=CC=CC=C6\\%10)))))))))))))O.Cl is 1-[(E)-3,4-dihydro-2H-naphthalen-1-ylideneamino]oxy-3-(propan-2-ylamino)propan-2-ol;hydrochloride."}", "/scratch/micpie/export/iupac_smiles/train_11-2.jsonl": "{"text":"The preferred IUPAC name of the compound with SELFIES [C][C][Branch1][#C][C][C][C][Branch1][=Branch2][C][Branch1][C][C][Branch1][C][C][O][O][C][C][C][C@@][Branch2][Ring2][#Branch2][C][Ring1][Branch1][Branch2][Ring2][Ring1][C][C][C][=C][Ring1][=Branch1][C][C][C][C][Ring1][=Branch1][Branch2][Ring1][Ring1][C][C][C][Branch1][Branch2][C][Ring1][=Branch1][Branch1][C][C][C][O][C][=Branch1][C][=O][C][C][C][C] is [(14R)-17-(5,6-dihydroxy-6-methylheptan-2-yl)-4,4,10,13,14-pentamethyl-2,3,5,6,7,11,12,15,16,17-decahydro-1H-cyclopenta[a]phenanthren-3-yl] acetate."} {"text":"The preferred IUPAC name of the compound with InChI InChI=1S\/C45H33NO\/c1-45(2)40-19-8-6-16-37(40)38-27-26-35(29-41(38)45)46(33-24-22-31(23-25-33)30-12-4-3-5-13-30)34-15-10-14-32(28-34)36-18-11-21-43-44(36)39-17-7-9-20-42(39)47-43\/h3-29H,1-2H3 is N-(3-dibenzofuran-1-ylphenyl)-9,9-dimethyl-N-(4-phenylphenyl)fluoren-2-amine."}", "/scratch/micpie/export/iupac_smiles/valid_0-8.jsonl": 
"{"text":"Task: Please create the SMILES representation of a chemical given the systematic IUPAC name.\nIUPAC name: 1-[(4-bromophenyl)methyl]-N-[(Z)-1-(4-methylphenyl)ethylideneamino]pyrazole-3-carboxamide\nResult: CC1=CC=C(C=C1)\/C(=N\\NC(=O)C2=NN(C=C2)CC3=CC=C(C=C3)Br)\/C"} {"text":"Task: Please generate the SMILES of a compound based on the systematic IUPAC name.\nIUPAC name: ethene;4-methoxy-4,6-dimethyl-oxane-2,5-diol\nResult: C=C.COC1(C)CC(O)OC(C)C1O"}", "/scratch/micpie/export/iupac_smiles/train_25-4.jsonl": "{"text":"The SELFIES of the chemical with systematic IUPAC name N-(4-butoxycyclohexyl)-5-(5-chloranylpyridin-3-yl)-7-methyl-pyrazolo[1,5-a]pyrimidine-3-carboxamide is [C][C][C][C][O][C][C][C][C][Branch1][Branch1][C][C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=C][N][=C][Branch1][=N][C][=C][Branch1][Branch2][N][Ring1][=Branch1][N][=C][Ring1][=Branch2][C][C][=C][C][=Branch1][=Branch1][=C][N][=C][Ring1][=Branch1][Cl]."} {"text":"The InChI of the molecule with systematic IUPAC name 8-ethyl-2,3-dihydrothieno[2,3-g][1,4]benzodioxine-7-carboxylic acid is InChI=1S\/C13H12O4S\/c1-2-7-8-5-9-10(17-4-3-16-9)6-11(8)18-12(7)13(14)15\/h5-6H,2-4H2,1H3,(H,14,15)."}", "/scratch/micpie/export/iupac_smiles/valid_12-6.jsonl": "{"text":"The SELFIES of the compound with preferred IUPAC name ethyl 8-[6-(4,4-dioctoxybutanoyloxy)hexyl-(2-hydroxyethyl)amino]octanoate is [C][C][C][C][C][C][C][C][O][C][Branch2][Ring2][C][C][C][C][=Branch1][C][=O][O][C][C][C][C][C][C][N][Branch1][#C][C][C][C][C][C][C][C][C][=Branch1][C][=O][O][C][C][C][C][O][O][C][C][C][C][C][C][C][C]."} {"text":"The InChI of the molecule with preferred IUPAC name 2-amino-3,3-dimethyl-N-(5,5,5-trifluoropentyl)butanamide is InChI=1S\/C11H21F3N2O\/c1-10(2,3)8(15)9(17)16-7-5-4-6-11(12,13)14\/h8H,4-7,15H2,1-3H3,(H,16,17)."}", "/scratch/micpie/export/iupac_smiles/test_10-4.jsonl": "{"text":"The InChI of the chemical with systematic IUPAC name 1-[5-chloranyl-2-methyl-3-[piperidin-4-yl(prop-2-enyl)amino]phenyl]ethanol is 
InChI=1S\/C17H25ClN2O\/c1-4-9-20(15-5-7-19-8-6-15)17-11-14(18)10-16(12(17)2)13(3)21\/h4,10-11,13,15,19,21H,1,5-9H2,2-3H3."} {"text":"The InChI of the molecule with systematic IUPAC name 2-[4-[3-(3-fluoranyl-6-methoxy-quinolin-4-yl)propyl]-1-(2-thiophen-3-ylsulfanylethyl)piperidin-3-yl]ethanoic acid is InChI=1S\/C26H31FN2O3S2\/c1-32-20-5-6-25-23(14-20)22(24(27)15-28-25)4-2-3-18-7-9-29(16-19(18)13-26(30)31)10-12-34-21-8-11-33-17-21\/h5-6,8,11,14-15,17-19H,2-4,7,9-10,12-13,16H2,1H3,(H,30,31)."}", "/scratch/micpie/export/iupac_smiles/train_5-8.jsonl": "{"text":"Task: Please generate the SMILES of a molecule given the systematic IUPAC name.\nIUPAC name: 4-[3-(aminomethyl)oxolan-3-yl]-3-ethyl-piperidin-4-ol\nResult: CCC1CNCCC1(O)C1(CN)CCOC1"} {"text":"Task: Please create the SMILES of a chemical based on the systematic IUPAC name.\nIUPAC name: N-(2-azanylethyl)-2-[2-(oxolan-2-ylmethoxy)propanoylamino]benzamide;hydrochloride\nResult: [C][C][Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][N][O][C][C][C][C][C][O][Ring1][Branch1].[Cl]"}", "/scratch/micpie/export/iupac_smiles/train_19-2.jsonl": "{"text":"The preferred IUPAC name of the chemical with SELFIES [C][N][C][C][C][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl] is N-(4-chlorophenyl)-3-[4-[2-(methylamino)ethyl]piperidin-1-yl]propanamide."} {"text":"The preferred IUPAC name of the molecule with InChI InChI=1S\/C17H11Cl2N2O4\/c1-9(12-6-10-4-2-3-5-15(10)25-17(12)22)20-16-13(18)7-11(21(23)24)8-14(16)19\/h2-8,23H,1H3\/q-1 is 3-[N-[2,6-dichloro-4-[hydroxy(oxido)amino]phenyl]-C-methylcarbonimidoyl]chromen-2-one."}", "/scratch/micpie/export/iupac_smiles/train_14-2.jsonl": "{"text":"The IUPAC name of the compound with SELFIES 
[C][C][=C][Branch1][#Branch2][C][=Branch1][=Branch1][=N][N][Ring1][Branch1][C][C][O][C][=Branch1][C][=O][C@@H1][C][C][=Branch1][C][=O][N][Branch1][Ring2][C][Ring1][=Branch1][C][C][Branch1][C][F][Branch1][C][F][F] is (1,3,5-trimethylpyrazol-4-yl) (3R)-5-oxo-1-(2,2,2-trifluoroethyl)pyrrolidine-3-carboxylate."} {"text":"The preferred IUPAC name of the compound with SELFIES [C][C][N][Branch1][=Branch1][C][C][Ring1][Branch1][F][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=Branch1][P][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][Branch1][Ring2][O][Ring1][=Branch1][C][N][Cl] is [7-chloro-5-[4-(3-fluoropyrrolidin-1-yl)sulfonylphenyl]-1-benzofuran-2-yl]methanamine."}", "/scratch/micpie/export/iupac_smiles/train_4-4.jsonl": "{"text":"The canonical SMILES of the chemical with systematic IUPAC name (1S,2R,4aS,5R,6S,8aS)-6-(1,3-dithiolan-2-yl)-5-[(E)-2-[5-(3-fluorophenyl)pyridin-2-yl]ethenyl]-1,4a-dimethyl-2-oxidanyl-2,3,4,5,6,7,8,8a-octahydronaphthalene-1-carbaldehyde is C[C@@]1(C=O)[C@H](O)CC[C@@]2(C)[C@H](\/C=C\/c3ccc(-c4cccc(F)c4)cn3)[C@@H](C3SCCS3)CC[C@H]12."} {"text":"The canonical SMILES of the molecule with systematic IUPAC name 4-(1-azanylbutan-2-yl)-3-ethyl-piperidin-4-ol is CCC(CN)C1(O)CCNCC1CC."}", "/scratch/micpie/export/iupac_smiles/test_1-2.jsonl": "{"text":"The IUPAC name of the chemical with DeepSMILES CC[C@H]CCC6)O)))[N+]=O)[O-] is (3R)-3-nitrocyclohexan-1-ol."} {"text":"The IUPAC name of the compound with InChI InChI=1S\/C25H31N3O4\/c1-17(2)28(13-18(29)15-31-24-7-3-5-22-20(24)9-11-26-22)14-19(30)16-32-25-8-4-6-23-21(25)10-12-27-23\/h3-12,17-19,26-27,29-30H,13-16H2,1-2H3\/t18-,19-\/m0\/s1 is (2S)-1-[[(2S)-2-hydroxy-3-(1H-indol-4-yloxy)propyl]-propan-2-ylamino]-3-(1H-indol-4-yloxy)propan-2-ol."}", "/scratch/micpie/export/iupac_smiles/train_23-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the molecule with canonical SMILES O=CC1CCCN(S(=O)(=O)c2ccccc2[N+](=O)[O-])C1 is 
1-(2-nitrophenyl)sulfonyl-3-piperidinecarboxaldehyde."} {"text":"The IUAPC name in CAS-like style of the chemical with SMILES CC1=CC(=C(C=C1)C)C(=O)CCC(=O)NN2C=NC3=CC=CC=C32 is N-(1-benzimidazolyl)-4-(2,5-dimethylphenyl)-4-oxobutanamide."}", "/scratch/micpie/export/iupac_smiles/valid_11-9.jsonl": "{"text":"Task: Please give me the SMILES of a molecule given the CAS-like IUPAC name.\nIUPAC name: 13-chloro-2-(4-piperidinylidene)-4-azatricyclo[9.4.0.03,8]pentadeca-1(11),3(8),4,6,12,14-hexaene;disulfate\nResult: C1CC2=C(C=CC(=C2)Cl)C(=C3CCNCC3)C4=C1C=CC=N4.[O-]S(=O)(=O)[O-].[O-]S(=O)(=O)[O-]"} {"text":"Task: Please give me the SMILES representation of a molecule given the CAS-like IUPAC name.\nIUPAC name: 4-(7-bromo-9,9-dioctyl-2-fluorenyl)benzaldehyde\nResult: CCCCCCCCC1(C2=C(C=CC(=C2)C3=CC=C(C=C3)C=O)C4=C1C=C(C=C4)Br)CCCCCCCC"}", "/scratch/micpie/export/iupac_smiles/test_27-5.jsonl": "{"text":"The canonical SMILES of the compound with CAS-like IUPAC name 1-[(3S)-3-(4-chlorophenyl)-5-(3,4-dimethylphenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(2-fluorophenyl)-1-piperazinyl]ethanone is Cc1ccc(C2=NN(C(=O)CN3CCN(c4ccccc4F)CC3)[C@H](c3ccc(Cl)cc3)C2)cc1C."} {"text":"The DeepSMILES of the molecule with CAS-like IUPAC name N-[(1E)-1-(4-ethenyl-2,3,5,5-tetramethyl-1-cyclopenta-1,3-dienyl)buta-1,3-dien-2-yl]-5-phenyl-2-thiophenamine is CC=CCC=C5C))\/C=C\\C=C))\/NC=CC=CS5)C=CC=CC=C6))))))))))))))C)C))C=C."}", "/scratch/micpie/export/iupac_smiles/train_4-8.jsonl": "{"text":"Task: Please generate the SMILES representation of a compound based on the systematic IUPAC name.\nIUPAC name: (1S,2R,4aS,5R,6S,8aS)-6-(1,3-dithiolan-2-yl)-5-[(E)-2-[5-(3-fluorophenyl)pyridin-2-yl]ethenyl]-1,4a-dimethyl-2-oxidanyl-2,3,4,5,6,7,8,8a-octahydronaphthalene-1-carbaldehyde\nResult: C[C@@]CC[C@H][C@@][C@H]6CC[C@@H][C@H]%10\/C=C\/C=NC=CC=C6))C=CC=CC=C6)))F)))))))))))CSCCS5)))))))))C)C=O)))O"} {"text":"Task: Please create the SMILES representation of a chemical based on the systematic IUPAC name.\nIUPAC name: 
4-(1-azanylbutan-2-yl)-3-ethyl-piperidin-4-ol\nResult: CCC(CN)C1(O)CCNCC1CC"}", "/scratch/micpie/export/iupac_smiles/train_20-5.jsonl": "{"text":"The InChI of the chemical with CAS-like IUPAC name sulfuric acid [4-hydroxy-6-[[2,6,13,17,17-pentamethyl-6-(4-methylpentyl)-4,8-dioxo-7-oxapentacyclo[10.8.0.02,9.05,9.013,18]eicos-11-en-16-yl]oxy]-5-[(3,4,5-trihydroxy-6-methyl-2-oxanyl)oxy]-3-oxanyl] ester is InChI=1S\/C41H64O15S\/c1-20(2)10-9-15-40(8)33-24(42)18-39(7)23-11-12-26-37(4,5)27(14-16-38(26,6)22(23)13-17-41(33,39)36(47)55-40)53-35-32(29(44)25(19-51-35)56-57(48,49)50)54-34-31(46)30(45)28(43)21(3)52-34\/h13,20-21,23,25-35,43-46H,9-12,14-19H2,1-8H3,(H,48,49,50)."} {"text":"The SELFIES of the compound with IUAPC name in CAS-like style benzoic acid [4-[(2S)-1-(1-azetidinyl)-1-oxopentan-2-yl]oxy-2-[5-[[2-[[[3-[4-(3-fluorophenyl)-1-triazolyl]-5-[[4-[4-(3-fluorophenyl)-1-triazolyl]-3,5-dihydroxy-6-(hydroxymethyl)-2-oxanyl]oxy]-4-hydroxycyclohexyl]-oxomethyl]amino]ethylamino]-oxomethyl]-3-methyl-2-[(3,4,5-trihydroxy-6-methyl-2-oxanyl)oxy]cyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)-3-oxanyl] ester is 
[C][C][C][C@@H1][Branch1][O][C][=Branch1][C][=O][N][C][C][C][Ring1][Ring2][O][C][C][Branch2][O][O][C][Branch2][O][Branch1][O][C][Branch1][P][C][Ring1][=Branch1][O][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][C][C][Branch2][Ring1][#C][C][C][Branch2][Ring1][=Branch2][C][Ring1][=Branch1][O][C][C][Branch1][S][C][Branch1][N][C][Branch1][Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][C][C][=Branch1][C][=O][N][C][C][N][C][=Branch1][C][=O][C][C][C][Branch2][Branch1][=Branch1][C][Branch2][Ring2][P][C][Branch1][Ring2][C][Ring1][=Branch1][O][C][C][Branch2][Ring2][Ring2][C][Branch1][=N][C][Branch1][=Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][N][C][=C][Branch1][Branch1][N][=N][Ring1][Branch1][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][F][O][O][N][C][=C][Branch1][Branch1][N][=N][Ring1][Branch1][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][F][C][O][O]."}", "/scratch/micpie/export/iupac_smiles/train_19-7.jsonl": "{"text":"Task: Please create the SMILES representation of a compound given the traditional IUPAC name.\nIUPAC name: N-(4-chlorophenyl)-3-[4-[2-(methylamino)ethyl]piperidino]propionamide\nResult: [C][N][C][C][C][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl]"} {"text":"Task: Please create the SMILES representation of a chemical given the traditional IUPAC name.\nIUPAC name: 3-[N-[2,6-dichloro-4-[hydroxy(oxido)amino]phenyl]-C-methyl-carbonimidoyl]coumarin\nResult: CC(=NC1=C(C=C(C=C1Cl)N(O)[O-])Cl)C2=CC3=CC=CC=C3OC2=O"}", "/scratch/micpie/export/iupac_smiles/train_26-6.jsonl": "{"text":"The InChI of the compound with IUPAC name 2-(4-chloro-3-fluorophenyl)-4-ethyl-1,3-thiazole-5-carboxylic acid is InChI=1S\/C12H9ClFNO2S\/c1-2-9-10(12(16)17)18-11(15-9)6-3-4-7(13)8(14)5-6\/h3-5H,2H2,1H3,(H,16,17)."} {"text":"The SMILES of the molecule with IUPAC name 
1-[(3S)-3-(4-chlorophenyl)-5-(4-fluorophenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(2-fluorophenyl)piperazin-1-yl]ethanone is C1CN(CCN1CC(=O)N2[C@@H](CC(=N2)C3=CC=C(C=C3)F)C4=CC=C(C=C4)Cl)C5=CC=CC=C5F."}", "/scratch/micpie/export/iupac_smiles/valid_24-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the compound with canonical SMILES O=C(Cn1[nH]c(=O)c2ccccc2c1=O)Nc1ccccc1O is 2-(1,4-dioxo-3H-phthalazin-2-yl)-N-(2-hydroxyphenyl)acetamide."} {"text":"The IUAPC name in CAS-like style of the chemical with InChI InChI=1S\/C18H13F4N5O2\/c1-8-3-13(10-4-12(19)15-14(5-10)23-7-29-15)26-16-11(6-24-27(8)16)17(28)25-9(2)18(20,21)22\/h3-7,9H,1-2H3,(H,25,28) is 5-(7-fluoro-1,3-benzoxazol-5-yl)-7-methyl-N-(1,1,1-trifluoropropan-2-yl)-3-pyrazolo[1,5-a]pyrimidinecarboxamide."}", "/scratch/micpie/export/iupac_smiles/test_12-6.jsonl": "{"text":"The SELFIES of the compound with IUPAC name 2-(4-chlorophenyl)spiro[fluorene-9,9'-thioxanthene] is [C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][Branch2][Ring1][#Branch1][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][S][C][=C][C][=C][C][=C][Ring1][=C][Ring1][=Branch1][C][=C][Branch1][=Branch1][C][=C][Ring2][Ring1][Ring2][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl]."} {"text":"The DeepSMILES of the compound with preferred IUPAC name 2-amino-N-(cyclopropylmethyl)-3,3-dimethyl-N-prop-2-ynylbutanamide is CCC)C)CC=O)NCC#C)))CCCC3))))))N."}", "/scratch/micpie/export/iupac_smiles/test_3-2.jsonl": "{"text":"The IUPAC name of the molecule with SMILES CC(C)OC[C@H](CSCC1=CC=CC=C1)O is (2R)-1-benzylsulfanyl-3-propan-2-yloxypropan-2-ol."} {"text":"The IUPAC name of the compound with SELFIES [C][C][=C][C][=C][Branch1][=Branch1][C][=C][Ring1][=Branch1][Cl][N][C][=Branch1][C][=O][O][Ring1][=Branch2].[C][Branch1][O][C][Branch1][Ring1][C][O][Branch1][Ring1][C][O][N][O] is 2-amino-2-(hydroxymethyl)propane-1,3-diol;5-chloro-6-methyl-3H-1,3-benzoxazol-2-one."}", "/scratch/micpie/export/iupac_smiles/test_1-5.jsonl": 
"{"text":"The canonical SMILES of the chemical with IUAPC name in CAS-like style (3R)-3-nitro-1-cyclohexanol is O=[N+]([O-])[C@@H]1CCCC(O)C1."} {"text":"The SMILES of the chemical with IUAPC name in CAS-like style (2S)-1-[[(2S)-2-hydroxy-3-(1H-indol-4-yloxy)propyl]-propan-2-ylamino]-3-(1H-indol-4-yloxy)-2-propanol is CC(C)N(C[C@@H](COC1=CC=CC2=C1C=CN2)O)C[C@@H](COC3=CC=CC4=C3C=CN4)O."}", "/scratch/micpie/export/iupac_smiles/valid_14-0.jsonl": "{"text":"The traditional IUPAC name of the chemical with canonical SMILES CC(C)(C)OC(=O)NC[C@@H]1CCC[C@@H]1NC(=O)c1ccco1 is N-[[(1S,2S)-2-(2-furoylamino)cyclopentyl]methyl]carbamic acid tert-butyl ester."} {"text":"The traditional IUPAC name of the molecule with DeepSMILES CCC)C)OC=O)NCCC=CC=CC=CC=C6O9))Cl)))Br is N-[2-(5-bromo-7-chloro-benzofuran-2-yl)ethyl]carbamic acid tert-butyl ester."}", "/scratch/micpie/export/iupac_smiles/valid_25-9.jsonl": "{"text":"Task: Please generate the SMILES representation of a chemical based on the CAS-like IUPAC name.\nIUPAC name: 5-(5-chloro-3-pyridinyl)-N-[(1S)-1-cyclopropylethyl]-7-methyl-3-pyrazolo[1,5-a]pyrimidinecarboxamide\nResult: CC1=CC(=NC2=C(C=NN12)C(=O)N[C@@H](C)C3CC3)C4=CC(=CN=C4)Cl"} {"text":"Task: Please generate the SMILES representation of a compound based on the IUAPC name in CAS-like style.\nIUPAC name: 2-[amino(phenyl)methyl]-4-ethyl-5-thiazolecarboxylic acid\nResult: CCC1=C(SC(=N1)C(C2=CC=CC=C2)N)C(=O)O"}", "/scratch/micpie/export/iupac_smiles/train_12-2.jsonl": "{"text":"The preferred IUPAC name of the compound with SMILES CC1(C2=CC=CC=C2C3=C1C=C(C=C3)N(C4=CC=CC(=C4)C5=CC=CC=C5)C6=CC=CC(=C6)C7=C8C9=CC=CC=C9OC8=CC=C7)C is N-(3-dibenzofuran-1-ylphenyl)-9,9-dimethyl-N-(3-phenylphenyl)fluoren-2-amine."} {"text":"The IUPAC name of the chemical with SELFIES [C][C][Branch1][C][C][Branch1][C][C][C][Branch1][P][C][=Branch1][C][=O][N][C][Branch1][Ring2][C][C][=C][C][=Branch1][C][=O][O][N] is 2-[(2-amino-3,3-dimethylbutanoyl)amino]pent-4-enoic acid."}", 
"/scratch/micpie/export/iupac_smiles/valid_5-0.jsonl": "{"text":"The traditional IUPAC name of the compound with InChI InChI=1S\/C16H32N2O\/c1-2-14-12-18-11-10-16(14,19)15(13-17)8-6-4-3-5-7-9-15\/h14,18-19H,2-13,17H2,1H3 is 4-[1-(aminomethyl)cyclooctyl]-3-ethyl-piperidin-4-ol."} {"text":"The traditional IUPAC name of the compound with SELFIES [C][C][Branch2][Ring1][#Branch1][C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][N][N][C][=Branch1][C][=O][C][C][C][C][C][Ring1][Branch1].[Cl] is N-(2-aminoethyl)-2-[3-(cyclopentanecarbonylamino)butanoylamino]benzamide;hydrochloride."}", "/scratch/micpie/export/iupac_smiles/train_1-5.jsonl": "{"text":"The DeepSMILES of the chemical with IUAPC name in CAS-like style 3-[(1Z)-3-chlorobuta-1,3-dienyl]-2-methyl-N-[4-(1-piperidinyl)butyl]-4-pyridinamine is CC=NC=CC=C6\/C=C\\C=C)Cl)))))NCCCCNCCCCC6."} {"text":"The SELFIES of the molecule with CAS-like IUPAC name (6S,8R,9S,13S,14S,17S)-3-methoxy-13-methyl-17-[[(2R)-2-oxanyl]oxy]-6,7,8,9,11,12,14,15,16,17-decahydrocyclopenta[a]phenanthren-6-ol is [C][C@][C][C][C@H1][C@H1][Branch2][Ring1][C][C@@H1][Ring1][=Branch1][C][C][C@@H1][Ring1][=Branch2][O][C@@H1][C][C][C][C][O][Ring1][=Branch1][C][C@@H1][Branch1][#C][C][=C][Ring1][P][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O]."}", "/scratch/micpie/export/iupac_smiles/train_12-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with canonical SMILES CC1(C)c2ccccc2-c2ccc(N(c3cccc(-c4ccccc4)c3)c3cccc(-c4cccc5oc6ccccc6c45)c3)cc21 is (3-dibenzofuran-1-ylphenyl)-(9,9-dimethylfluoren-2-yl)-(3-phenylphenyl)amine."} {"text":"The traditional IUPAC name of the compound with canonical SMILES C=CCC(NC(=O)C(N)C(C)(C)C)C(=O)O is 2-[(2-amino-3,3-dimethyl-butanoyl)amino]pent-4-enoic acid."}", "/scratch/micpie/export/iupac_smiles/test_12-2.jsonl": "{"text":"The IUPAC name of the chemical with canonical SMILES Clc1ccc(-c2ccc3c(c2)C2(c4ccccc4Sc4ccccc42)c2ccccc2-3)cc1 is 
2-(4-chlorophenyl)spiro[fluorene-9,9'-thioxanthene]."} {"text":"The IUPAC name of the compound with InChI InChI=1S\/C13H22N2O\/c1-5-8-15(9-10-6-7-10)12(16)11(14)13(2,3)4\/h1,10-11H,6-9,14H2,2-4H3 is 2-amino-N-(cyclopropylmethyl)-3,3-dimethyl-N-prop-2-ynylbutanamide."}", "/scratch/micpie/export/iupac_smiles/train_7-1.jsonl": "{"text":"The CAS-like IUPAC name of the compound with DeepSMILES COC=O)CNS=O)=O)C=CC=CC=C6)N))))Cl is 2-[(5-amino-2-chlorophenyl)sulfonylamino]acetic acid methyl ester."} {"text":"The IUAPC name in CAS-like style of the compound with InChI InChI=1S\/C15H15N3O5S\/c16-15(19)10-11-17(12-6-2-1-3-7-12)24(22,23)14-9-5-4-8-13(14)18(20)21\/h1-9H,10-11H2,(H2,16,19) is 3-(N-(2-nitrophenyl)sulfonylanilino)propanamide."}", "/scratch/micpie/export/iupac_smiles/valid_27-2.jsonl": "{"text":"The IUPAC name of the chemical with DeepSMILES CNCC=O)N[C@@H]CC=N5)C=CC=CC=C6))F)))))))C=CC=CC=C6))Cl)))))))))C=CC=CC=C6 is 1-[(3S)-3-(4-chlorophenyl)-5-(4-fluorophenyl)-3,4-dihydropyrazol-2-yl]-2-(N-methylanilino)ethanone."} {"text":"The IUPAC name of the chemical with DeepSMILES CCC=CC=CC=C6C=C9C=CC=C6))NC=CC=CC=C6))C=CC5C=CC=CC=C6C=CC=CC=C%136)))))))))))))C=CC=C6))C=CC=CC=C6O))O))O))O))O))))))))))))))))))))))C is 6-[7'-[(9,9-dimethylfluoren-2-yl)amino]-9,9'-spirobi[fluorene]-2'-yl]benzene-1,2,3,4,5-pentol."}", "/scratch/micpie/export/iupac_smiles/test_6-8.jsonl": "{"text":"Task: Please give me the SMILES of a compound given the systematic IUPAC name.\nIUPAC name: N-(2-azanylethyl)-2-[(3-methyl-3-phenyl-butanoyl)amino]benzamide\nResult: InChI=1S\/C20H25N3O2\/c1-20(2,15-8-4-3-5-9-15)14-18(24)23-17-11-7-6-10-16(17)19(25)22-13-12-21\/h3-11H,12-14,21H2,1-2H3,(H,22,25)(H,23,24)"} {"text":"Task: Please generate the SMILES representation of a molecule based on the systematic IUPAC name.\nIUPAC name: N-[(3,4-dichlorophenyl)methyl]-2,3-dihydro-1,4-benzodioxin-6-amine\nResult: C1COC2=C(O1)C=CC(=C2)NCC3=CC(=C(C=C3)Cl)Cl"}", "/scratch/micpie/export/iupac_smiles/test_16-4.jsonl": 
"{"text":"The SELFIES of the chemical with systematic IUPAC name 5-chloranyl-4-(chloromethyl)-2-(4-propoxyphenoxy)pyridine is [C][C][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][O][C][=N][C][=C][Branch1][=Branch2][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][Cl][Cl]."} {"text":"The SMILES of the molecule with systematic IUPAC name 2-ethoxy-N-[(4-fluorophenyl)methyl]-3-[[4-[2-(2H-1,2,3,4-tetrazol-5-yl)phenyl]phenyl]methyl]benzimidazole-4-carboxamide is CCOC1=NC2=CC=CC(=C2N1CC3=CC=C(C=C3)C4=CC=CC=C4C5=NNN=N5)C(=O)NCC6=CC=C(C=C6)F."}", "/scratch/micpie/export/iupac_smiles/train_15-7.jsonl": "{"text":"Task: Please generate the SMILES of a molecule based on the traditional IUPAC name.\nIUPAC name: 1-methyl-9H-beta-carboline-7-carboxylic acid\nResult: CC1=NC=CC2=C1NC3=C2C=CC(=C3)C(=O)O"} {"text":"Task: Please create the SMILES representation of a chemical given the traditional IUPAC name.\nIUPAC name: 8-[[5-chloro-4-(chloromethyl)-2-pyridyl]oxy]quinoline\nResult: [C][=C][C][=C][Branch2][Ring1][#Branch1][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][=N][C][=C][Branch1][=Branch2][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][Cl][Cl][N][=C][C][=C][Ring2][Ring1][C]"}", "/scratch/micpie/export/iupac_smiles/train_27-8.jsonl": "{"text":"Task: Please give me the SMILES of a compound given the systematic IUPAC name.\nIUPAC name: 1-[(3R)-3-(4-chlorophenyl)-5-(4-fluorophenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(2-fluorophenyl)piperazin-1-yl]ethanone\nResult: InChI=1S\/C27H25ClF2N4O\/c28-21-9-5-20(6-10-21)26-17-24(19-7-11-22(29)12-8-19)31-34(26)27(35)18-32-13-15-33(16-14-32)25-4-2-1-3-23(25)30\/h1-12,26H,13-18H2\/t26-\/m1\/s1"} {"text":"Task: Please generate the SMILES representation of a chemical based on the systematic IUPAC name.\nIUPAC name: 6-[4-[4-[(9,9-dimethylfluoren-2-yl)amino]phenyl]-2,3,5,6-tetrakis(oxidanyl)phenyl]benzene-1,2,3,4,5-pentol\nResult: 
[C][C][Branch2][=Branch1][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][=Branch2][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][Branch2][Ring2][=Branch1][C][=Branch2][Ring1][P][=C][Branch1][=Branch2][C][=Branch1][Branch1][=C][Ring1][=Branch1][O][O][C][=C][Branch1][P][C][=Branch1][=N][=C][Branch1][=Branch2][C][=Branch1][Branch1][=C][Ring1][=Branch1][O][O][O][O][O][O][O][C]"}", "/scratch/micpie/export/iupac_smiles/test_9-0.jsonl": "{"text":"The traditional IUPAC name of the chemical with SMILES CC(CCC1=CC=CC=C1)NCC(CON=C2C3=CC=CC=C3C4=CC=CC=C42)O is 1-(fluoren-9-ylideneamino)oxy-3-[(1-methyl-3-phenyl-propyl)amino]propan-2-ol."} {"text":"The traditional IUPAC name of the molecule with SELFIES [C][C][C][=Branch1][C][=O][N][Branch1][Ring1][C][C][C][C][C][Branch2][Ring1][=C][C][C][Branch1][Branch1][C@@H1][Ring1][=Branch1][C][C][=Branch1][C][=O][N][C][C][C][Branch1][O][C][C][Branch1][=Branch1][N][C][Ring1][=Branch1][=O][C][C][Cl] is (2S)-5-chloro-3-[ethyl(propionyl)amino]-N-[(2-keto-4,6-dimethyl-3-piperidyl)methyl]-2-methyl-cyclohexanecarboxamide."}", "/scratch/micpie/export/iupac_smiles/train_5-4.jsonl": "{"text":"The DeepSMILES of the compound with systematic IUPAC name 4-[3-(aminomethyl)oxolan-3-yl]-3-ethyl-piperidin-4-ol is CCCCNCCC6CCCOC5))))CN)))O."} {"text":"The DeepSMILES of the molecule with systematic IUPAC name N-(2-azanylethyl)-2-[2-(oxolan-2-ylmethoxy)propanoylamino]benzamide;hydrochloride is CCC=O)NC=CC=CC=C6C=O)NCCN)))))))))))))OCCCCCO5.Cl."}", "/scratch/micpie/export/iupac_smiles/test_2-7.jsonl": "{"text":"Task: Please create the SMILES representation of a compound based on the traditional IUPAC name.\nIUPAC name: (2R)-2-(methylamino)-1-(o-tolyl)propan-1-one\nResult: CC=CC=CC=C6C=O)[C@@H]C)NC"} {"text":"Task: Please give me the SMILES of a molecule based on the traditional IUPAC name.\nIUPAC name: 
2-(6-bromo-2,3-dihydro-1,4-benzodioxin-7-yl)-4,5,6,7-tetrachloro-isoindoline-1,3-quinone\nResult: O=C1c2c(Cl)c(Cl)c(Cl)c(Cl)c2C(=O)N1c1cc2c(cc1Br)OCCO2"}", "/scratch/micpie/export/iupac_smiles/valid_9-5.jsonl": "{"text":"The SELFIES of the molecule with IUAPC name in CAS-like style 1-(propan-2-ylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,9,11,13-heptaenylideneamino)oxy-2-propanol is [C][C][Branch1][C][C][N][C][C][Branch2][Ring1][=Branch2][C][O][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][#C][O]."} {"text":"The SELFIES of the chemical with CAS-like IUPAC name 5-chloro-3-[cyclobutyl(methyl)amino]-N-[(4-cycloheptyl-4,6-dimethyl-2-oxo-3-piperidinyl)methyl]-2-methylbenzamide is [C][C][C][C][Branch2][Ring2][N][C][Branch1][Branch2][C][=Branch1][C][=O][N][Ring1][#Branch1][C][N][C][=Branch1][C][=O][C][=C][Branch2][Ring1][=Branch1][C][=Branch1][=Branch2][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][Cl][N][Branch1][C][C][C][C][C][C][Ring1][Ring2][C][Branch1][C][C][C][C][C][C][C][C][C][Ring1][#Branch1]."}", "/scratch/micpie/export/iupac_smiles/train_0-8.jsonl": "{"text":"Task: Please create the SMILES representation of a compound given the systematic IUPAC name.\nIUPAC name: N-[(Z)-(3-bromanyl-4,5-dimethoxy-phenyl)methylideneamino]-1-[(4-nitrophenyl)methyl]pyrazole-3-carboxamide\nResult: COC=CC=CC=C6)\/C=N\\NC=O)C=NNC=C5))CC=CC=CC=C6))[N+]=O)[O-]))))))))))))))))Br))OC"} {"text":"Task: Please generate the SMILES of a chemical given the systematic IUPAC name.\nIUPAC name: 1-[4-[3-(2-methylpiperidin-1-yl)propoxy]phenyl]ethanone\nResult: CCCCCCN6CCCOC=CC=CC=C6))C=O)C"}", "/scratch/micpie/export/iupac_smiles/test_22-6.jsonl": "{"text":"The InChI of the molecule with preferred IUPAC name 2-bromo-5,7-dioxa-2,3-diazatricyclo[6.3.1.04,12]dodeca-1(12),3,8,10-tetraen-6-one is InChI=1S\/C8H3BrN2O3\/c9-11-4-2-1-3-5-6(4)7(10-11)14-8(12)13-5\/h1-3H."} {"text":"The canonical SMILES of the compound with IUPAC name 
1-[(2-oxo-1,3-dihydroindol-5-yl)sulfonyl]piperidine-3-carbaldehyde is O=CC1CCCN(S(=O)(=O)c2ccc3c(c2)CC(=O)N3)C1."}", "/scratch/micpie/export/iupac_smiles/test_6-7.jsonl": "{"text":"Task: Please generate the SMILES representation of a molecule given the traditional IUPAC name.\nIUPAC name: N-(2-aminoethyl)-2-[(3-methyl-3-phenyl-butanoyl)amino]benzamide\nResult: CC(C)(CC(=O)Nc1ccccc1C(=O)NCCN)c1ccccc1"} {"text":"Task: Please generate the SMILES representation of a compound given the traditional IUPAC name.\nIUPAC name: (3,4-dichlorobenzyl)-(2,3-dihydro-1,4-benzodioxin-6-yl)amine\nResult: C1COC2=C(O1)C=CC(=C2)NCC3=CC(=C(C=C3)Cl)Cl"}", "/scratch/micpie/export/iupac_smiles/test_15-5.jsonl": "{"text":"The InChI of the molecule with IUAPC name in CAS-like style (E)-3-(6-amino-3-pyridinyl)-N-[[7-(5-chloro-2,4-difluorophenyl)-5-[4-[(4,4-difluoro-1-piperidinyl)-oxomethyl]phenyl]-2-benzofuranyl]methyl]-2-propenamide is InChI=1S\/C35H27ClF4N4O3\/c36-28-16-26(29(37)17-30(28)38)27-15-23(21-3-5-22(6-4-21)34(46)44-11-9-35(39,40)10-12-44)13-24-14-25(47-33(24)27)19-43-32(45)8-2-20-1-7-31(41)42-18-20\/h1-8,13-18H,9-12,19H2,(H2,41,42)(H,43,45)\/b8-2+."} {"text":"The SMILES of the chemical with IUAPC name in CAS-like style 5-chloro-4-(chloromethyl)-2-(2,3-dihydro-1H-inden-5-yloxy)pyridine is C1CC2=C(C1)C=C(C=C2)OC3=NC=C(C(=C3)CCl)Cl."}", "/scratch/micpie/export/iupac_smiles/test_0-5.jsonl": "{"text":"The SELFIES of the molecule with CAS-like IUPAC name 1-[(4-bromophenyl)methyl]-N-[(Z)-(3-methoxyphenyl)methylideneamino]-4-piperidinecarboxamide is [C][O][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][\/C][=N][\\N][C][=Branch1][C][=O][C][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Br]."} {"text":"The canonical SMILES of the chemical with CAS-like IUPAC name 2-(5-methyl-1lambda3-ioda-3-thia-2-azacyclopenta-1,4-dien-4-yl)-4-thiazolecarboxaldehyde is CC1=C(c2nc(C=O)cs2)SN=I1."}", 
"/scratch/micpie/export/iupac_smiles/test_8-3.jsonl": "{"text":"The DeepSMILES of the compound with traditional IUPAC name N-[3-[(3,4-dimethylphenyl)sulfonylamino]phenyl]-2-methyl-propionamide is CC=CC=CC=C6))S=O)=O)NC=CC=CC=C6)NC=O)CC)C)))))))))))))C."} {"text":"The canonical SMILES of the compound with traditional IUPAC name 1-(benzhydrylideneamino)oxy-3-(tert-butylamino)propan-2-ol is CC(C)(C)NCC(O)CON=C(c1ccccc1)c1ccccc1."}", "/scratch/micpie/export/iupac_smiles/valid_7-7.jsonl": "{"text":"Task: Please give me the SMILES of a molecule given the traditional IUPAC name.\nIUPAC name: 2-methyl-N-[2-[[4-(trifluoromethylthio)benzyl]amino]ethyl]propionamide\nResult: InChI=1S\/C14H19F3N2OS\/c1-10(2)13(20)19-8-7-18-9-11-3-5-12(6-4-11)21-14(15,16)17\/h3-6,10,18H,7-9H2,1-2H3,(H,19,20)"} {"text":"Task: Please create the SMILES representation of a chemical based on the traditional IUPAC name.\nIUPAC name: 3-[N-(2-thienylsulfonyl)anilino]propionamide\nResult: C1=CC=C(C=C1)N(CCC(=O)N)S(=O)(=O)C2=CC=CS2"}", "/scratch/micpie/export/iupac_smiles/valid_21-9.jsonl": "{"text":"Task: Please generate the SMILES of a compound based on the IUAPC name in CAS-like style.\nIUPAC name: benzoic acid [4-[(2S)-1-(1-azetidinyl)-3-cyclohexyl-1-oxopropan-2-yl]oxy-2-[5-[[2-[[[3-[[3,5-dihydroxy-6-(hydroxymethyl)-4-[(2-oxo-1-benzopyran-3-yl)methoxy]-2-oxanyl]oxy]-4-hydroxy-5-[(2-oxo-1-benzopyran-3-yl)methoxy]cyclohexyl]-oxomethyl]amino]ethylamino]-oxomethyl]-3-(4-phenyl-1-triazolyl)-2-[(3,4,5-trihydroxy-6-methyl-2-oxanyl)oxy]cyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)-3-oxanyl] ester\nResult: 
[C][C][C][Branch2][=N][#C][C][Branch2][=N][#Branch2][C][Branch2][=N][Branch1][C][Branch1][Ring2][O][Ring1][=Branch1][O][C][C][Branch2][O][Branch2][C][C][Branch2][Branch1][=C][C][C][Ring1][=Branch1][O][C][C][Branch2][Ring2][#Branch1][C][Branch1][=N][C][Branch1][=Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][C@@H1][Branch1][#Branch2][C][C][C][C][C][C][C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][C][Ring1][Ring2][O][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][N][C][=Branch1][C][=O][C][C][C][Branch2][Branch1][Ring1][C][Branch2][Ring2][=C][C][Branch1][Ring2][C][Ring1][=Branch1][O][C][C][Branch2][Ring1][P][C][Branch1][=N][C][Branch1][=Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][C][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][Ring1][#Branch2][=O][O][O][O][C][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][Ring1][#Branch2][=O][N][C][=C][Branch1][Branch1][N][=N][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][O][O][O]"} {"text":"Task: Please give me the SMILES representation of a compound based on the CAS-like IUPAC name.\nIUPAC name: 3-fluoro-5-(1H-1,2,4-triazol-5-yliodanuidyl)-2-(2,2,2-trifluoroethoxy)pyridine\nResult: C=CC=NC=C6F))OCCF)F)F)))))))[I-]C=NC=NN5"}", "/scratch/micpie/export/iupac_smiles/train_10-6.jsonl": "{"text":"The DeepSMILES of the compound with IUPAC name tert-butyl N-[4-[5-chloro-3-(dihydroxymethyl)-N,2-diethylanilino]cyclohexyl]carbamate is CCC=CC=CC=C6NCC))CCCCCC6))NC=O)OCC)C)C))))))))))))Cl)))CO)O."} {"text":"The SMILES of the compound with preferred IUPAC name [(3aS)-3a-carbonofluoridoyl-5a,5b,8,8,11a-pentamethyl-1-propan-2-yl-1,2,3,4,5,6,7,7a,9,10,11,11b,12,13,13a,13b-hexadecahydrocyclopenta[a]chrysen-9-yl] acetate is CC(C)C1CC[C@]2(C1C3CCC4C5(CCC(C(C5CCC4(C3(CC2)C)C)(C)C)OC(=O)C)C)C(=O)F."}", "/scratch/micpie/export/iupac_smiles/test_11-0.jsonl": "{"text":"The traditional IUPAC name of the chemical with InChI 
InChI=1S\/C17H19F2NO.C7H4N2O6\/c1-20(2)11-10-16(13-6-4-3-5-7-13)21-17-9-8-14(18)12-15(17)19;10-7(11)5-3-4(8(12)13)1-2-6(5)9(14)15\/h3-9,12,16H,10-11H2,1-2H3;1-3H,(H,10,11) is [3-(2,4-difluorophenoxy)-3-phenyl-propyl]-dimethyl-amine;2,5-dinitrobenzoic acid."} {"text":"The traditional IUPAC name of the chemical with DeepSMILES CCCCCCCCCC=CC=CC=C6)C=CC=CC=C6))C=O)))))))))C=C5C=CC=C6))C=CC=CC=C6))O)))))))))))CCCCCCCC is 4-[7-(4-hydroxyphenyl)-9,9-dioctyl-fluoren-2-yl]benzaldehyde."}", "/scratch/micpie/export/iupac_smiles/valid_23-2.jsonl": "{"text":"The preferred IUPAC name of the molecule with InChI InChI=1S\/C12H14ClNO3S\/c13-11-4-1-5-12(7-11)18(16,17)14-6-2-3-10(8-14)9-15\/h1,4-5,7,9-10H,2-3,6,8H2 is 1-(3-chlorophenyl)sulfonylpiperidine-3-carbaldehyde."} {"text":"The IUPAC name of the molecule with SMILES CC1=C(C(=C(C(=C1C)C)S(=O)(=O)NCCC(=O)NN2C=NC3=CC=CC=C32)C)C is N-(benzimidazol-1-yl)-3-[(2,3,4,5,6-pentamethylphenyl)sulfonylamino]propanamide."}", "/scratch/micpie/export/iupac_smiles/valid_18-6.jsonl": "{"text":"The DeepSMILES of the molecule with preferred IUPAC name 8-(1-methylpyrrolidin-3-yl)-6-pyridin-3-ylquinazoline is CNCCCC5)C=CC=CC=CN=CN=C%106)))))))C=CN=CC=C6."} {"text":"The InChI of the compound with preferred IUPAC name 2-[4-[2-(methylamino)ethyl]piperidin-1-yl]-1-(2-thiophen-2-ylpyrrolidin-1-yl)propan-1-one is InChI=1S\/C19H31N3OS\/c1-15(21-12-8-16(9-13-21)7-10-20-2)19(23)22-11-3-5-17(22)18-6-4-14-24-18\/h4,6,14-17,20H,3,5,7-13H2,1-2H3."}", "/scratch/micpie/export/iupac_smiles/valid_23-9.jsonl": "{"text":"Task: Please generate the SMILES of a chemical based on the CAS-like IUPAC name.\nIUPAC name: 1-(3-chlorophenyl)sulfonyl-3-piperidinecarboxaldehyde\nResult: CCCCNC6)S=O)=O)C=CC=CC=C6)))Cl)))))))C=O"} {"text":"Task: Please generate the SMILES representation of a chemical given the IUAPC name in CAS-like style.\nIUPAC name: N-(1-benzimidazolyl)-3-[(2,3,4,5,6-pentamethylphenyl)sulfonylamino]propanamide\nResult: 
[C][C][=C][Branch2][Ring2][=N][C][=Branch2][Ring2][Branch2][=C][Branch1][=Branch2][C][=Branch1][Branch1][=C][Ring1][=Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][C][=Branch1][C][=O][N][N][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][=Branch2][C][C]"}", "/scratch/micpie/export/iupac_smiles/train_13-5.jsonl": "{"text":"The canonical SMILES of the chemical with IUAPC name in CAS-like style 2-amino-N,3,3-trimethyl-N-prop-2-ynylbutanamide is C#CCN(C)C(=O)C(N)C(C)(C)C."} {"text":"The canonical SMILES of the compound with CAS-like IUPAC name (4R)-3,4-dihydro-2H-1-benzopyran-4-carboxylic acid (1,3,5-trimethyl-4-pyrazolyl) ester is Cc1nn(C)c(C)c1OC(=O)[C@@H]1CCOc2ccccc21."}", "/scratch/micpie/export/iupac_smiles/test_19-9.jsonl": "{"text":"Task: Please give me the SMILES representation of a chemical given the IUAPC name in CAS-like style.\nIUPAC name: N-(4-chlorophenyl)-3-[4-[2-(methylamino)ethyl]-1-piperidinyl]propanamide;hydrochloride\nResult: CNCCC1CCN(CCC(=O)Nc2ccc(Cl)cc2)CC1.Cl"} {"text":"Task: Please give me the SMILES representation of a molecule based on the CAS-like IUPAC name.\nIUPAC name: nan\nResult: 
[C][C@H1][C][=C][C][=C][C@][Ring1][=Branch1][C][C][C@H1][Branch2][#Branch2][#Branch1][C][C][=Branch1][C][=O][C][C@H1][Branch2][#Branch1][#C][N][C][=C][C][=Branch2][Ring1][S][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][=Branch2][C][C@@][Branch1][Branch1][C@@H1][Ring1][=Branch1][O][C][C][C][Branch1][Ring2][C][Ring1][#Branch1][C][C][C][C][Ring1][=Branch1][N][Branch2][Ring2][=C][C][=C][C][=Branch1][=Branch1][=C][C][N][Ring1][=Branch1][C@H1][Branch2][Ring1][Branch1][C@@H1][Branch1][#Branch1][C][#C][C][Ring2][Ring2][O][C][=C][N][C][=C][Ring1][Branch1][C][Ring2][Ring2][S][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][O][C][C][C][=Branch1][C][=O][C][C][=C][C][=Branch2][Ring1][Branch2][=C][Branch1][=Branch2][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][O][O][C][=Branch1][C][=O][C]"}", "/scratch/micpie/export/iupac_smiles/test_12-7.jsonl": "{"text":"Task: Please give me the SMILES of a chemical based on the traditional IUPAC name.\nIUPAC name: 2-(4-chlorophenyl)spiro[fluorene-9,9'-thioxanthene]\nResult: InChI=1S\/C31H19ClS\/c32-22-16-13-20(14-17-22)21-15-18-24-23-7-1-2-8-25(23)31(28(24)19-21)26-9-3-5-11-29(26)33-30-12-6-4-10-27(30)31\/h1-19H"} {"text":"Task: Please generate the SMILES of a chemical based on the traditional IUPAC name.\nIUPAC name: 2-amino-N-(cyclopropylmethyl)-3,3-dimethyl-N-propargyl-butyramide\nResult: C#CCN(CC1CC1)C(=O)C(N)C(C)(C)C"}", "/scratch/micpie/export/iupac_smiles/test_4-9.jsonl": "{"text":"Task: Please give me the SMILES representation of a compound based on the IUAPC name in CAS-like style.\nIUPAC name: 1-(3-methoxyphenyl)sulfonyl-3-[(4-methyl-1-piperazinyl)methyl]indole\nResult: COc1cccc(S(=O)(=O)n2cc(CN3CCN(C)CC3)c3ccccc32)c1"} {"text":"Task: Please give me the SMILES representation of a molecule based on the IUAPC name in CAS-like style.\nIUPAC name: 4-(1-amino-2-methylpropan-2-yl)-3-ethyl-4-piperidinol\nResult: CCC1CNCCC1(O)C(C)(C)CN"}", 
"/scratch/micpie/export/iupac_smiles/valid_0-9.jsonl": "{"text":"Task: Please generate the SMILES representation of a compound based on the IUAPC name in CAS-like style.\nIUPAC name: 1-[(4-bromophenyl)methyl]-N-[(Z)-1-(4-methylphenyl)ethylideneamino]-3-pyrazolecarboxamide\nResult: InChI=1S\/C20H19BrN4O\/c1-14-3-7-17(8-4-14)15(2)22-23-20(26)19-11-12-25(24-19)13-16-5-9-18(21)10-6-16\/h3-12H,13H2,1-2H3,(H,23,26)\/b22-15-"} {"text":"Task: Please give me the SMILES representation of a compound given the CAS-like IUPAC name.\nIUPAC name: ethene;4-methoxy-4,6-dimethyloxane-2,5-diol\nResult: InChI=1S\/C8H16O4.C2H4\/c1-5-7(10)8(2,11-3)4-6(9)12-5;1-2\/h5-7,9-10H,4H2,1-3H3;1-2H2"}", "/scratch/micpie/export/iupac_smiles/test_19-7.jsonl": "{"text":"Task: Please create the SMILES representation of a compound based on the traditional IUPAC name.\nIUPAC name: N-(4-chlorophenyl)-3-[4-[2-(methylamino)ethyl]piperidino]propionamide;hydrochloride\nResult: [C][N][C][C][C][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl].[Cl]"} {"text":"Task: Please create the SMILES of a chemical given the traditional IUPAC name.\nIUPAC name: nan\nResult: C[C@H]1C=CC=C2[C@]13CC[C@H](CC(=O)C[C@H](N4C=C5C(=CC=C6C5=C4C[C@@]7([C@@H]6O)CCC8(C7)CCCC8)N(C9=CC(=CCN9)[C@H]([C@@H](C#CC2)C1=CNC=C1C3)C1=CC(=CC=C1)O)CCC(=O)C)C1=CC(=C(C(=C1)OC)OC1=CC=CC(=C1)O)O)OC(=O)C"}", "/scratch/micpie/export/iupac_smiles/test_22-9.jsonl": "{"text":"Task: Please create the SMILES of a chemical given the CAS-like IUPAC name.\nIUPAC name: 2-bromo-5,7-dioxa-2,3-diazatricyclo[6.3.1.04,12]dodeca-1(12),3,8,10-tetraen-6-one\nResult: InChI=1S\/C8H3BrN2O3\/c9-11-4-2-1-3-5-6(4)7(10-11)14-8(12)13-5\/h1-3H"} {"text":"Task: Please generate the SMILES of a chemical given the IUAPC name in CAS-like style.\nIUPAC name: 1-[(2-oxo-1,3-dihydroindol-5-yl)sulfonyl]-3-piperidinecarboxaldehyde\nResult: 
InChI=1S\/C14H16N2O4S\/c17-9-10-2-1-5-16(8-10)21(19,20)12-3-4-13-11(6-12)7-14(18)15-13\/h3-4,6,9-10H,1-2,5,7-8H2,(H,15,18)"}", "/scratch/micpie/export/iupac_smiles/valid_5-8.jsonl": "{"text":"Task: Please give me the SMILES representation of a chemical given the systematic IUPAC name.\nIUPAC name: 4-[1-(aminomethyl)cyclooctyl]-3-ethyl-piperidin-4-ol\nResult: CCC1CNCCC1(O)C1(CN)CCCCCCC1"} {"text":"Task: Please generate the SMILES of a chemical based on the systematic IUPAC name.\nIUPAC name: N-(2-azanylethyl)-2-[3-(cyclopentylcarbonylamino)butanoylamino]benzamide;hydrochloride\nResult: InChI=1S\/C19H28N4O3.ClH\/c1-13(22-18(25)14-6-2-3-7-14)12-17(24)23-16-9-5-4-8-15(16)19(26)21-11-10-20;\/h4-5,8-9,13-14H,2-3,6-7,10-12,20H2,1H3,(H,21,26)(H,22,25)(H,23,24);1H"}", "/scratch/micpie/export/iupac_smiles/train_12-7.jsonl": "{"text":"Task: Please give me the SMILES representation of a compound based on the traditional IUPAC name.\nIUPAC name: (3-dibenzofuran-1-ylphenyl)-(9,9-dimethylfluoren-2-yl)-(3-phenylphenyl)amine\nResult: InChI=1S\/C45H33NO\/c1-45(2)40-22-8-6-19-37(40)38-26-25-35(29-41(38)45)46(33-17-10-15-31(27-33)30-13-4-3-5-14-30)34-18-11-16-32(28-34)36-21-12-24-43-44(36)39-20-7-9-23-42(39)47-43\/h3-29H,1-2H3"} {"text":"Task: Please generate the SMILES representation of a compound given the traditional IUPAC name.\nIUPAC name: 2-[(2-amino-3,3-dimethyl-butanoyl)amino]pent-4-enoic acid\nResult: C=CCC(NC(=O)C(N)C(C)(C)C)C(=O)O"}", "/scratch/micpie/export/iupac_smiles/valid_26-7.jsonl": "{"text":"Task: Please generate the SMILES of a compound based on the traditional IUPAC name.\nIUPAC name: 2-[4-(difluoromethyl)phenyl]-4-ethyl-thiazole-5-carboxylic acid\nResult: CCC=CSC=N5)C=CC=CC=C6))CF)F))))))))C=O)O"} {"text":"Task: Please create the SMILES representation of a compound given the traditional IUPAC name.\nIUPAC name: 2-(4-benzylpiperazino)-1-[(5R)-5-(2-chlorophenyl)-3-(3-methoxyphenyl)-2-pyrazolin-1-yl]ethanone\nResult: 
InChI=1S\/C29H31ClN4O2\/c1-36-24-11-7-10-23(18-24)27-19-28(25-12-5-6-13-26(25)30)34(31-27)29(35)21-33-16-14-32(15-17-33)20-22-8-3-2-4-9-22\/h2-13,18,28H,14-17,19-21H2,1H3\/t28-\/m1\/s1"}", "/scratch/micpie/export/iupac_smiles/test_20-4.jsonl": "{"text":"The SMILES of the chemical with systematic IUPAC name phenyl (2S,8S,12S)-4,10-bis(2-ethoxyphenyl)-3,5,9,11-tetrakis(oxidanylidene)-4,10-diazatetracyclo[5.5.2.02,6.08,12]tetradec-13-ene-13-carboxylate is CCOC1=CC=CC=C1N2C(=O)[C@@H]3[C@@H](C2=O)C4[C@H]5C(C3C=C4C(=O)OC6=CC=CC=C6)C(=O)N(C5=O)C7=CC=CC=C7OCC."} {"text":"The DeepSMILES of the chemical with systematic IUPAC name (2S)-3-cyclohexyl-2-[2-[3-[4-(3-fluorophenyl)-1,2,3-triazol-1-yl]-5-[2-[4-[4-(3-fluorophenyl)-1,2,3-triazol-1-yl]-6-(hydroxymethyl)-3,5-bis(oxidanyl)oxan-2-yl]oxyethylcarbamoyl]-2-[6-methyl-3,4,5-tris(oxidanyl)oxan-2-yl]oxy-cyclohexyl]oxy-6-(hydroxymethyl)-5-oxidanyl-3-(phenylcarbonyloxy)oxan-4-yl]oxy-propanoic acid is CCCCCCO6)OCCCCCC6OCCCCCO6)CO)))O))O[C@@H]CCCCCCC6)))))))C=O)O)))))OC=O)C=CC=CC=C6)))))))))))))C=O)NCCOCCCCCO6)CO)))O))NC=CN=N5))C=CC=CC=C6)))F))))))))O))))))))))NC=CN=N5))C=CC=CC=C6)))F)))))))))))O))O))O."}", "/scratch/micpie/export/iupac_smiles/valid_11-1.jsonl": "{"text":"The CAS-like IUPAC name of the compound with canonical SMILES Clc1ccc2c(c1)CCc1cccnc1C2=C1CCNCC1.O=S(=O)([O-])[O-].O=S(=O)([O-])[O-] is 13-chloro-2-(4-piperidinylidene)-4-azatricyclo[9.4.0.03,8]pentadeca-1(11),3(8),4,6,12,14-hexaene;disulfate."} {"text":"The IUAPC name in CAS-like style of the chemical with canonical SMILES CCCCCCCCC1(CCCCCCCC)c2cc(Br)ccc2-c2ccc(-c3ccc(C=O)cc3)cc21 is 4-(7-bromo-9,9-dioctyl-2-fluorenyl)benzaldehyde."}", "/scratch/micpie/export/iupac_smiles/test_22-5.jsonl": "{"text":"The InChI of the compound with IUAPC name in CAS-like style 2-bromo-5,7-dioxa-2,3-diazatricyclo[6.3.1.04,12]dodeca-1(12),3,8,10-tetraen-6-one is InChI=1S\/C8H3BrN2O3\/c9-11-4-2-1-3-5-6(4)7(10-11)14-8(12)13-5\/h1-3H."} {"text":"The canonical SMILES of the molecule with 
IUAPC name in CAS-like style 1-[(2-oxo-1,3-dihydroindol-5-yl)sulfonyl]-3-piperidinecarboxaldehyde is O=CC1CCCN(S(=O)(=O)c2ccc3c(c2)CC(=O)N3)C1."}", "/scratch/micpie/export/iupac_smiles/valid_20-1.jsonl": "{"text":"The CAS-like IUPAC name of the chemical with DeepSMILES CCCCCC=CCCC=CC=O)C5=CC=CCCCC=O)OC)))))OC=O)C)))))))))))O is 4-acetyloxy-7-(2-hydroxy-2-oct-2-enyl-5-oxo-1-cyclopent-3-enylidene)-5-heptenoic acid methyl ester."} {"text":"The IUAPC name in CAS-like style of the chemical with SMILES CC(=O)CNCCNCCNCC(=O)O is 2-[2-[2-(2-oxopropylamino)ethylamino]ethylamino]acetic acid."}", "/scratch/micpie/export/iupac_smiles/train_18-7.jsonl": "{"text":"Task: Please give me the SMILES representation of a molecule based on the traditional IUPAC name.\nIUPAC name: N-[3-[6-(6-methyl-3-pyridyl)-3,8a-dihydroquinazolin-8-yl]phenyl]acrylamide\nResult: CC=NC=CC=C6))C=CC=CNC=NC6C=C%10)C=CC=CC=C6)))NC=O)C=C"} {"text":"Task: Please create the SMILES representation of a compound given the traditional IUPAC name.\nIUPAC name: 2-[1-(3,4-difluorobenzyl)-4-piperidyl]ethyl-methyl-amine;hydrochloride\nResult: CNCCC1CCN(Cc2ccc(F)c(F)c2)CC1.Cl"}", "/scratch/micpie/export/iupac_smiles/test_7-5.jsonl": "{"text":"The DeepSMILES of the chemical with IUAPC name in CAS-like style 3-amino-4-hydroxy-N-(2-pyridinylmethyl)benzenesulfonamide is C=CC=NC=C6)CNS=O)=O)C=CC=CC=C6))O))N."} {"text":"The canonical SMILES of the molecule with IUAPC name in CAS-like style 3-(N-(4-fluorophenyl)sulfonylanilino)propanamide is NC(=O)CCN(c1ccccc1)S(=O)(=O)c1ccc(F)cc1."}", "/scratch/micpie/export/iupac_smiles/test_18-8.jsonl": "{"text":"Task: Please give me the SMILES of a chemical based on the systematic IUPAC name.\nIUPAC name: [(E)-2-ethyl-7,10-dimethyl-undec-3-enylidene]-dimethyl-phosphanium\nResult: [C][C][C][Branch1][S][\/C][=C][\/C][C][C][Branch1][C][C][C][C][C][Branch1][C][C][C][C][=P+1][Branch1][C][C][C]"} {"text":"Task: Please create the SMILES representation of a molecule based on the systematic IUPAC 
name.\nIUPAC name: N-methyl-2-[1-[2-(4-nitrophenyl)ethyl]piperidin-4-yl]ethanamine;hydrochloride\nResult: CNCCCCCNCC6))CCC=CC=CC=C6))[N+]=O)[O-].Cl"}", "/scratch/micpie/export/iupac_smiles/train_16-2.jsonl": "{"text":"The preferred IUPAC name of the chemical with SMILES CC1=CC(=C(C(=C1)OC2=NC=C(C(=C2)CCl)Cl)C)C is 5-chloro-4-(chloromethyl)-2-(2,3,5-trimethylphenoxy)pyridine."} {"text":"The preferred IUPAC name of the molecule with SELFIES [C][C][O][C][=N][C][=C][Branch1][#Branch2][C][=C][C][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][=Branch1][C][=O][O][C][Branch1][C][C][C] is propan-2-yl 2-ethoxy-1H-benzimidazole-4-carboxylate."}", "/scratch/micpie/export/iupac_smiles/test_14-2.jsonl": "{"text":"The preferred IUPAC name of the chemical with SELFIES [C][C][=C][Branch1][#Branch2][C][=Branch1][=Branch1][=N][N][Ring1][Branch1][C][C][O][C][=Branch1][C][=O][C@H1][C][C][=Branch1][C][=O][N][Branch1][Ring2][C][Ring1][=Branch1][C][C][Branch1][C][F][Branch1][C][F][F] is (1,3,5-trimethylpyrazol-4-yl) (3S)-5-oxo-1-(2,2,2-trifluoroethyl)pyrrolidine-3-carboxylate."} {"text":"The IUPAC name of the chemical with canonical SMILES Oc1cc(C(F)(F)F)c(Br)cc1I is 4-bromo-2-iodo-5-(trifluoromethyl)phenol."}", "/scratch/micpie/export/iupac_smiles/valid_19-7.jsonl": "{"text":"Task: Please generate the SMILES of a chemical based on the traditional IUPAC name.\nIUPAC name: N-[2-chloro-5-(tetrazol-1-yl)phenyl]-2-[4-[2-(methylamino)ethyl]piperidino]acetamide;hydrochloride\nResult: [C][N][C][C][C][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][=Branch1][C][=O][N][C][=C][Branch1][S][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][N][C][=N][N][=N][Ring1][Branch1][Cl].[Cl]"} {"text":"Task: Please create the SMILES representation of a compound given the traditional IUPAC name.\nIUPAC name: 2-[bis[2-[(2-isopropyl-4a-methyl-8-methylene-decalin-1-yl)amino]-2-keto-ethyl]amino]acetic acid methyl ester\nResult: 
[C][C][Branch1][C][C][C][C][C][C][Branch2][Branch1][P][C][C][C][C][=Branch1][C][=C][C][Ring1][#Branch1][C][Ring1][O][N][C][=Branch1][C][=O][C][N][Branch2][Ring2][Ring1][C][C][=Branch1][C][=O][N][C][C][Branch2][Ring1][Ring1][C][C][C][Branch1][=N][C][Ring1][=Branch1][C][=Branch1][C][=C][C][C][C][Ring1][#Branch1][C][C][Branch1][C][C][C][C][C][=Branch1][C][=O][O][C][C]"}", "/scratch/micpie/export/iupac_smiles/train_19-3.jsonl": "{"text":"The canonical SMILES of the compound with traditional IUPAC name N-(4-chlorophenyl)-3-[4-[2-(methylamino)ethyl]piperidino]propionamide is CNCCC1CCN(CCC(=O)Nc2ccc(Cl)cc2)CC1."} {"text":"The canonical SMILES of the chemical with traditional IUPAC name 3-[N-[2,6-dichloro-4-[hydroxy(oxido)amino]phenyl]-C-methyl-carbonimidoyl]coumarin is CC(=Nc1c(Cl)cc(N([O-])O)cc1Cl)c1cc2ccccc2oc1=O."}", "/scratch/micpie/export/iupac_smiles/test_24-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with InChI InChI=1S\/C9H6F2N4OS\/c10-6-1-5(2-7(11)3-6)8(16)14-15-4-12-13-9(15)17\/h1-4H,(H,13,17)(H,14,16) is 3,5-difluoro-N-(5-thioxo-1H-1,2,4-triazol-4-yl)benzamide."} {"text":"The traditional IUPAC name of the compound with SELFIES [C][C][=C][C][=Branch2][Ring1][=C][=N][C][=C][Branch1][Branch2][C][=N][N][Ring1][=Branch2][Ring1][Branch1][C][=Branch1][C][=O][N][C][Branch1][C][C][C][Branch1][C][F][Branch1][C][F][F][C][=C][N][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][Branch1][C][F][Branch1][C][F][F] is 7-methyl-N-(2,2,2-trifluoro-1-methyl-ethyl)-5-[6-(trifluoromethyl)-3-pyridyl]pyrazolo[1,5-a]pyrimidine-3-carboxamide."}", "/scratch/micpie/export/iupac_smiles/valid_9-1.jsonl": "{"text":"The CAS-like IUPAC name of the compound with DeepSMILES CCC)NCCCON=CC=CC=CC=C6C=CC=CC=CC=C6%15))))))))))))))))))O is 1-(propan-2-ylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,9,11,13-heptaenylideneamino)oxy-2-propanol."} {"text":"The IUAPC name in CAS-like style of the molecule with DeepSMILES 
CCCCCC=O)N6))CNC=O)C=CC=CC=C6)Cl)))NC)CCCC4))))))C)))))))C)CCCCCCC7 is 5-chloro-3-[cyclobutyl(methyl)amino]-N-[(4-cycloheptyl-4,6-dimethyl-2-oxo-3-piperidinyl)methyl]-2-methylbenzamide."}", "/scratch/micpie/export/iupac_smiles/train_13-3.jsonl": "{"text":"The canonical SMILES of the compound with traditional IUPAC name 2-amino-N,3,3-trimethyl-N-propargyl-butyramide is C#CCN(C)C(=O)C(N)C(C)(C)C."} {"text":"The InChI of the chemical with traditional IUPAC name (4R)-chroman-4-carboxylic acid (1,3,5-trimethylpyrazol-4-yl) ester is InChI=1S\/C16H18N2O3\/c1-10-15(11(2)18(3)17-10)21-16(19)13-8-9-20-14-7-5-4-6-12(13)14\/h4-7,13H,8-9H2,1-3H3\/t13-\/m1\/s1."}", "/scratch/micpie/export/iupac_smiles/valid_12-3.jsonl": "{"text":"The SMILES of the molecule with traditional IUPAC name 8-[6-(4,4-dioctoxybutanoyloxy)hexyl-(2-hydroxyethyl)amino]caprylic acid ethyl ester is CCCCCCCCOC(CCC(=O)OCCCCCCN(CCCCCCCC(=O)OCC)CCO)OCCCCCCCC."} {"text":"The SMILES of the molecule with traditional IUPAC name 2-amino-3,3-dimethyl-N-(5,5,5-trifluoropentyl)butyramide is CC(C)(C)C(C(=O)NCCCCC(F)(F)F)N."}", "/scratch/micpie/export/iupac_smiles/test_23-8.jsonl": "{"text":"Task: Please generate the SMILES of a molecule based on the systematic IUPAC name.\nIUPAC name: 1-(3-nitro-4-oxidanyl-phenyl)sulfonylpiperidine-3-carbaldehyde\nResult: O=CC1CCCN(S(=O)(=O)c2ccc(O)c([N+](=O)[O-])c2)C1"} {"text":"Task: Please give me the SMILES representation of a compound based on the systematic IUPAC name.\nIUPAC name: N-(benzimidazol-1-yl)-4-(ethylsulfamoyl)benzamide\nResult: CCNS=O)=O)C=CC=CC=C6))C=O)NNC=NC=CC=CC=C69"}", "/scratch/micpie/export/iupac_smiles/valid_2-9.jsonl": "{"text":"Task: Please give me the SMILES representation of a compound based on the IUAPC name in CAS-like style.\nIUPAC name: 4-[(3S,4S)-2,4-dimethylheptan-3-yl]phenol\nResult: CCC[C@H]C)[C@@H]C=CC=CC=C6))O)))))CC)C"} {"text":"Task: Please create the SMILES representation of a molecule based on the IUAPC name in CAS-like style.\nIUPAC name: 
8-bromo-3-[2-(4-ethoxyphenyl)-4-thiazolyl]-6-nitro-1-benzopyran-2-one\nResult: CCOC=CC=CC=C6))C=NC=CS5))C=CC=CC=CC=C6OC%10=O))))Br)))[N+]=O)[O-]"}", "/scratch/micpie/export/iupac_smiles/train_20-9.jsonl": "{"text":"Task: Please give me the SMILES of a molecule based on the CAS-like IUPAC name.\nIUPAC name: sulfuric acid [4-hydroxy-6-[[2,6,13,17,17-pentamethyl-6-(4-methylpentyl)-4,8-dioxo-7-oxapentacyclo[10.8.0.02,9.05,9.013,18]eicos-11-en-16-yl]oxy]-5-[(3,4,5-trihydroxy-6-methyl-2-oxanyl)oxy]-3-oxanyl] ester\nResult: CC1C(C(C(C(O1)OC2C(C(COC2OC3CCC4(C(C3(C)C)CCC5C4=CCC67C5(CC(=O)C6C(OC7=O)(C)CCCC(C)C)C)C)OS(=O)(=O)O)O)O)O)O"} {"text":"Task: Please create the SMILES representation of a molecule given the IUAPC name in CAS-like style.\nIUPAC name: benzoic acid [4-[(2S)-1-(1-azetidinyl)-1-oxopentan-2-yl]oxy-2-[5-[[2-[[[3-[4-(3-fluorophenyl)-1-triazolyl]-5-[[4-[4-(3-fluorophenyl)-1-triazolyl]-3,5-dihydroxy-6-(hydroxymethyl)-2-oxanyl]oxy]-4-hydroxycyclohexyl]-oxomethyl]amino]ethylamino]-oxomethyl]-3-methyl-2-[(3,4,5-trihydroxy-6-methyl-2-oxanyl)oxy]cyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)-3-oxanyl] ester\nResult: CCC[C@@H]C=O)NCCC4)))))OCCCOCC6OC=O)C=CC=CC=C6)))))))))OCCCCCC6OCCCCCO6)C))O))O))O)))))C)))C=O)NCCNC=O)CCCCCC6)OCCCCCO6)CO)))O))NC=CN=N5))C=CC=CC=C6)))F))))))))O)))))O))NC=CN=N5))C=CC=CC=C6)))F))))))))))))))))))))))CO)))O"}", "/scratch/micpie/export/iupac_smiles/test_6-6.jsonl": "{"text":"The DeepSMILES of the compound with preferred IUPAC name N-(2-aminoethyl)-2-[(3-methyl-3-phenylbutanoyl)amino]benzamide is CCC)CC=O)NC=CC=CC=C6C=O)NCCN))))))))))))))C=CC=CC=C6."} {"text":"The DeepSMILES of the chemical with preferred IUPAC name N-[(3,4-dichlorophenyl)methyl]-2,3-dihydro-1,4-benzodioxin-6-amine is CCOC=CO6)C=CC=C6)NCC=CC=CC=C6))Cl))Cl."}", "/scratch/micpie/export/iupac_smiles/train_12-5.jsonl": "{"text":"The DeepSMILES of the chemical with IUAPC name in CAS-like style N-[3-(1-dibenzofuranyl)phenyl]-9,9-dimethyl-N-(3-phenylphenyl)-2-fluorenamine is 
CCC=CC=CC=C6C=C9C=CC=C6))NC=CC=CC=C6)C=CC=CC=C6)))))))))))C=CC=CC=C6)C=CC=CC=CC=C6OC9=CC=C%13)))))))))))))))))))))))))))))C."} {"text":"The canonical SMILES of the compound with CAS-like IUPAC name 2-[(2-amino-3,3-dimethyl-1-oxobutyl)amino]-4-pentenoic acid is C=CCC(NC(=O)C(N)C(C)(C)C)C(=O)O."}", "/scratch/micpie/export/iupac_smiles/valid_13-1.jsonl": "{"text":"The CAS-like IUPAC name of the chemical with DeepSMILES CCCC)CNC=O)CCC)C)C))N)))))C=O)O is 2-[[(2-amino-3,3-dimethyl-1-oxobutyl)amino]methyl]-2-methylbutanoic acid."} {"text":"The CAS-like IUPAC name of the compound with DeepSMILES C[C@@H]CCNC=CC=CC=C6N%11)))))))C=O)C=CC=NC=C6)))Cl is (2-chloro-4-pyridinyl)-[(2R)-2-methyl-1,2,3,4-tetrahydro-1,5-benzodiazepin-5-yl]methanone."}", "/scratch/micpie/export/iupac_smiles/valid_3-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with canonical SMILES C[C@@]12C[C@H]3C[C@@](O)(C1)C[C@](C(=O)O)(C3)C2 is (1S,3R,5S,7R)-3-hydroxy-5-methyl-adamantane-1-carboxylic acid."} {"text":"The traditional IUPAC name of the chemical with SELFIES [C][N][Branch2][Branch1][Branch1][C][C][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=Branch1][C][=O][C][=C][Branch2][Ring1][C][C][=Branch1][=N][=C][Branch1][Branch2][C][=C][Ring1][=Branch1][O][Ring1][O][O][C][O][C][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C] is 2-[4-(5-hydroxy-4-keto-6,7-dimethoxy-chromen-2-yl)phenoxy]-N-[6-[methyl(o-anisyl)amino]hexyl]acetamide."}", "/scratch/micpie/export/iupac_smiles/test_18-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the compound with InChI InChI=1S\/C17H34P\/c1-7-17(14-18(5)6)11-9-8-10-16(4)13-12-15(2)3\/h9,11,14-17H,7-8,10,12-13H2,1-6H3\/q+1\/b11-9+ is [(E)-2-ethyl-7,10-dimethylundec-3-enylidene]-dimethylphosphonium."} {"text":"The IUAPC name in CAS-like style of the chemical with canonical SMILES CNCCC1CCN(CCc2ccc([N+](=O)[O-])cc2)CC1.Cl is 
N-methyl-2-[1-[2-(4-nitrophenyl)ethyl]-4-piperidinyl]ethanamine;hydrochloride."}", "/scratch/micpie/export/iupac_smiles/test_24-6.jsonl": "{"text":"The canonical SMILES of the chemical with IUPAC name 3,5-difluoro-N-(5-sulfanylidene-1H-1,2,4-triazol-4-yl)benzamide is O=C(Nn1cn[nH]c1=S)c1cc(F)cc(F)c1."} {"text":"The SELFIES of the chemical with IUPAC name 7-methyl-5-[6-(trifluoromethyl)pyridin-3-yl]-N-(1,1,1-trifluoropropan-2-yl)pyrazolo[1,5-a]pyrimidine-3-carboxamide is [C][C][=C][C][=Branch2][Ring1][=C][=N][C][=C][Branch1][Branch2][C][=N][N][Ring1][=Branch2][Ring1][Branch1][C][=Branch1][C][=O][N][C][Branch1][C][C][C][Branch1][C][F][Branch1][C][F][F][C][=C][N][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][Branch1][C][F][Branch1][C][F][F]."}", "/scratch/micpie/export/iupac_smiles/test_21-3.jsonl": "{"text":"The SELFIES of the chemical with traditional IUPAC name benzoic acid [4-[(1S)-2-(azetidin-1-yl)-1-(cyclohexylmethyl)-2-keto-ethoxy]-2-[5-[2-[[3-[3,5-dihydroxy-4-[(2-ketochromen-3-yl)methoxy]-6-methylol-tetrahydropyran-2-yl]oxy-4-hydroxy-5-[(2-ketochromen-3-yl)methoxy]cyclohexanecarbonyl]amino]ethylcarbamoyl]-3-ethyl-2-(3,4,5-trihydroxy-6-methyl-tetrahydropyran-2-yl)oxy-cyclohexoxy]-5-hydroxy-6-methylol-tetrahydropyran-3-yl] ester is 
[C][C][C][C][C][Branch2][#Branch1][#Branch1][C][C][Branch2][Ring1][=Branch2][C][Ring1][=Branch1][O][C][C][Branch1][S][C][Branch1][N][C][Branch1][Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][O][C][C][Branch2][Ring2][#Branch1][C][Branch1][=N][C][Branch1][=Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][C@@H1][Branch1][#Branch2][C][C][C][C][C][C][C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][C][Ring1][Ring2][O][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][N][C][=Branch1][C][=O][C][C][C][Branch2][Branch1][Ring1][C][Branch2][Ring2][=C][C][Branch1][Ring2][C][Ring1][=Branch1][O][C][C][Branch2][Ring1][P][C][Branch1][=N][C][Branch1][=Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][C][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][Ring1][#Branch2][=O][O][O][O][C][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][Ring1][#Branch2][=O]."} {"text":"The DeepSMILES of the molecule with traditional IUPAC name 2-(methoxymethyl)-3-[4-(trifluoromethoxy)phenyl]pyrazine is COCC=NC=CN=C6C=CC=CC=C6))OCF)F)F."}", "/scratch/micpie/export/iupac_smiles/train_10-8.jsonl": "{"text":"Task: Please give me the SMILES representation of a chemical based on the systematic IUPAC name.\nIUPAC name: tert-butyl N-[4-[[3-[bis(oxidanyl)methyl]-5-chloranyl-2-ethyl-phenyl]-ethyl-amino]cyclohexyl]carbamate\nResult: CCC=CC=CC=C6NCC))CCCCCC6))NC=O)OCC)C)C))))))))))))Cl)))CO)O"} {"text":"Task: Please generate the SMILES representation of a chemical based on the systematic IUPAC name.\nIUPAC name: [(3aS)-3a-carbonofluoridoyl-5a,5b,8,8,11a-pentamethyl-1-propan-2-yl-1,2,3,4,5,6,7,7a,9,10,11,11b,12,13,13a,13b-hexadecahydrocyclopenta[a]chrysen-9-yl] ethanoate\nResult: CC(=O)OC1CCC2(C)C(CCC3(C)C2CCC2C4C(C(C)C)CC[C@]4(C(=O)F)CCC23C)C1(C)C"}", "/scratch/micpie/export/iupac_smiles/valid_2-6.jsonl": "{"text":"The SELFIES of the molecule with preferred IUPAC name 4-[(3S,4S)-2,4-dimethylheptan-3-yl]phenol is 
[C][C][C][C@H1][Branch1][C][C][C@@H1][Branch1][N][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][O][C][Branch1][C][C][C]."} {"text":"The SELFIES of the chemical with preferred IUPAC name 8-bromo-3-[2-(4-ethoxyphenyl)-1,3-thiazol-4-yl]-6-nitrochromen-2-one is [C][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=N][C][=Branch1][Branch1][=C][S][Ring1][Branch1][C][=C][C][=C][C][=Branch1][=C][=C][C][=Branch1][=Branch2][=C][Ring1][=Branch1][O][C][Ring1][#Branch2][=O][Br][N+1][=Branch1][C][=O][O-1]."}", "/scratch/micpie/export/iupac_smiles/train_18-1.jsonl": "{"text":"The CAS-like IUPAC name of the molecule with DeepSMILES CC=NC=CC=C6))C=CC=CNC=NC6C=C%10)C=CC=CC=C6)))NC=O)C=C is N-[3-[6-(6-methyl-3-pyridinyl)-3,8a-dihydroquinazolin-8-yl]phenyl]-2-propenamide."} {"text":"The CAS-like IUPAC name of the compound with InChI InChI=1S\/C15H22F2N2.ClH\/c1-18-7-4-12-5-8-19(9-6-12)11-13-2-3-14(16)15(17)10-13;\/h2-3,10,12,18H,4-9,11H2,1H3;1H is 2-[1-[(3,4-difluorophenyl)methyl]-4-piperidinyl]-N-methylethanamine;hydrochloride."}", "/scratch/micpie/export/iupac_smiles/train_17-2.jsonl": "{"text":"The IUPAC name of the chemical with SMILES CCOC1=NC2=CC=CC(=C2N1CC3=CC=C(C=C3)C4=CC=CC=C4C5=NNN=N5)C(=O)NC6=CC=NC=C6 is 2-ethoxy-N-pyridin-4-yl-3-[[4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl]benzimidazole-4-carboxamide."} {"text":"The IUPAC name of the chemical with canonical SMILES CCCC(CCC)c1cc(-c2ccccc2F)cc2cncnc12 is 6-(2-fluorophenyl)-8-heptan-4-ylquinazoline."}", "/scratch/micpie/export/iupac_smiles/test_0-1.jsonl": "{"text":"The CAS-like IUPAC name of the chemical with SMILES COC1=CC=CC(=C1)\/C=N\\NC(=O)C2CCN(CC2)CC3=CC=C(C=C3)Br is 1-[(4-bromophenyl)methyl]-N-[(Z)-(3-methoxyphenyl)methylideneamino]-4-piperidinecarboxamide."} {"text":"The IUAPC name in CAS-like style of the chemical with DeepSMILES CC=CSN=I5)))C=NC=CS5))C=O is 2-(5-methyl-1lambda3-ioda-3-thia-2-azacyclopenta-1,4-dien-4-yl)-4-thiazolecarboxaldehyde."}", 
"/scratch/micpie/export/iupac_smiles/valid_26-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with canonical SMILES CCc1nc(-c2ccc(C(F)F)cc2)sc1C(=O)O is 2-[4-(difluoromethyl)phenyl]-4-ethyl-thiazole-5-carboxylic acid."} {"text":"The traditional IUPAC name of the compound with SMILES COC1=CC=CC(=C1)C2=NN([C@H](C2)C3=CC=CC=C3Cl)C(=O)CN4CCN(CC4)CC5=CC=CC=C5 is 2-(4-benzylpiperazino)-1-[(5R)-5-(2-chlorophenyl)-3-(3-methoxyphenyl)-2-pyrazolin-1-yl]ethanone."}", "/scratch/micpie/export/iupac_smiles/test_11-9.jsonl": "{"text":"Task: Please generate the SMILES representation of a molecule given the CAS-like IUPAC name.\nIUPAC name: 3-(2,4-difluorophenoxy)-N,N-dimethyl-3-phenyl-1-propanamine;2,5-dinitrobenzoic acid\nResult: CNC)CCCC=CC=CC=C6))))))OC=CC=CC=C6))F)))F.C=CC=CC=C6[N+]=O)[O-]))))C=O)O)))[N+]=O)[O-]"} {"text":"Task: Please give me the SMILES representation of a compound based on the CAS-like IUPAC name.\nIUPAC name: 4-[7-(4-hydroxyphenyl)-9,9-dioctyl-2-fluorenyl]benzaldehyde\nResult: InChI=1S\/C42H50O2\/c1-3-5-7-9-11-13-27-42(28-14-12-10-8-6-4-2)40-29-35(33-17-15-32(31-43)16-18-33)21-25-38(40)39-26-22-36(30-41(39)42)34-19-23-37(44)24-20-34\/h15-26,29-31,44H,3-14,27-28H2,1-2H3"}", "/scratch/micpie/export/iupac_smiles/test_25-3.jsonl": "{"text":"The DeepSMILES of the molecule with traditional IUPAC name 5-(1,3-benzoxazol-5-yl)-N-[(1R)-1-cyclopropyl-2,2,2-trifluoro-ethyl]-7-methyl-pyrazolo[1,5-a]pyrimidine-3-carboxamide is CC=CC=NC=CC=NN95)))C=O)N[C@H]CCC3)))CF)F)F))))))))C=CC=CC=C6))OC=N5."} {"text":"The SELFIES of the molecule with traditional IUPAC name 3-ethyl-5,6,7,8-tetrahydrobenzo[f]benzofuran-2-carboxylic acid is [C][C][C][=C][Branch2][Ring1][C][O][C][=C][Ring1][Branch1][C][=C][C][C][C][C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=Branch1][C][=O][O]."}", "/scratch/micpie/export/iupac_smiles/test_6-3.jsonl": "{"text":"The DeepSMILES of the compound with traditional IUPAC name N-(2-aminoethyl)-2-[(3-methyl-3-phenyl-butanoyl)amino]benzamide is 
CCC)CC=O)NC=CC=CC=C6C=O)NCCN))))))))))))))C=CC=CC=C6."} {"text":"The SELFIES of the chemical with traditional IUPAC name (3,4-dichlorobenzyl)-(2,3-dihydro-1,4-benzodioxin-6-yl)amine is [C][C][O][C][=C][Branch1][Ring2][O][Ring1][=Branch1][C][=C][C][=Branch1][Ring2][=C][Ring1][#Branch1][N][C][C][=C][C][=Branch1][=Branch2][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl][Cl]."}", "/scratch/micpie/export/iupac_smiles/train_22-1.jsonl": "{"text":"The CAS-like IUPAC name of the molecule with DeepSMILES CCC=CC=CC6C=NN=CO5)C)))))S=O)=O)C)))CF)F)F)))))C=O)N is 6-methyl-5-(5-methyl-1,3,4-oxadiazol-2-yl)-5-methylsulfonyl-4-(trifluoromethyl)-1-cyclohexa-1,3-dienecarboxamide."} {"text":"The CAS-like IUPAC name of the compound with DeepSMILES CCCS=O)=O)NCCCCC6)C=O is 1-propylsulfonyl-3-piperidinecarboxaldehyde."}", "/scratch/micpie/export/iupac_smiles/train_3-9.jsonl": "{"text":"Task: Please create the SMILES of a molecule based on the IUAPC name in CAS-like style.\nIUPAC name: (5-nitro-2-furanyl)-(1-pyrrolidinyl)methanone\nResult: O=C(c1ccc([N+](=O)[O-])o1)N1CCCC1"} {"text":"Task: Please give me the SMILES representation of a chemical given the CAS-like IUPAC name.\nIUPAC name: (12S,14S,17E)-7-chloro-12-(hydroxymethyl)-23-(phenylmethyl)-15,20-dioxa-2,4,5,9,11,23-hexazahexacyclo[19.7.1.13,10.111,14.04,8.022,27]hentriaconta-1(28),3(31),5,7,9,17,21(29),22(27)-octaen-24-one\nResult: [C][C][C][=Branch1][C][=O][N][Branch2][Branch1][Branch1][C][=C][Ring1][#Branch1][C][=C][C][=C][Ring1][=Branch1][O][C][\/C][=C][\/C][O][C@H1][C][C@H1][Branch2][Ring1][=N][N][Branch1][Ring2][C][Ring1][Branch1][C][=N][C][=C][Branch1][S][C][=N][N][Ring1][Branch1][C][=Branch1][Ring2][=C][Ring1][=Branch2][N][Ring2][Ring1][Branch2][Cl][C][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]"}", "/scratch/micpie/export/iupac_smiles/test_24-2.jsonl": "{"text":"The IUPAC name of the molecule with InChI InChI=1S\/C9H6F2N4OS\/c10-6-1-5(2-7(11)3-6)8(16)14-15-4-12-13-9(15)17\/h1-4H,(H,13,17)(H,14,16) is 
3,5-difluoro-N-(5-sulfanylidene-1H-1,2,4-triazol-4-yl)benzamide."} {"text":"The preferred IUPAC name of the molecule with canonical SMILES Cc1cc(-c2ccc(C(F)(F)F)nc2)nc2c(C(=O)NC(C)C(F)(F)F)cnn12 is 7-methyl-5-[6-(trifluoromethyl)pyridin-3-yl]-N-(1,1,1-trifluoropropan-2-yl)pyrazolo[1,5-a]pyrimidine-3-carboxamide."}", "/scratch/micpie/export/iupac_smiles/test_21-0.jsonl": "{"text":"The traditional IUPAC name of the chemical with SELFIES [C][C][C][C][C][Branch2][#Branch1][#Branch1][C][C][Branch2][Ring1][=Branch2][C][Ring1][=Branch1][O][C][C][Branch1][S][C][Branch1][N][C][Branch1][Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][O][C][C][Branch2][Ring2][#Branch1][C][Branch1][=N][C][Branch1][=Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][C@@H1][Branch1][#Branch2][C][C][C][C][C][C][C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][C][Ring1][Ring2][O][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][N][C][=Branch1][C][=O][C][C][C][Branch2][Branch1][Ring1][C][Branch2][Ring2][=C][C][Branch1][Ring2][C][Ring1][=Branch1][O][C][C][Branch2][Ring1][P][C][Branch1][=N][C][Branch1][=Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][C][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][Ring1][#Branch2][=O][O][O][O][C][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][Ring1][#Branch2][=O] is benzoic acid [4-[(1S)-2-(azetidin-1-yl)-1-(cyclohexylmethyl)-2-keto-ethoxy]-2-[5-[2-[[3-[3,5-dihydroxy-4-[(2-ketochromen-3-yl)methoxy]-6-methylol-tetrahydropyran-2-yl]oxy-4-hydroxy-5-[(2-ketochromen-3-yl)methoxy]cyclohexanecarbonyl]amino]ethylcarbamoyl]-3-ethyl-2-(3,4,5-trihydroxy-6-methyl-tetrahydropyran-2-yl)oxy-cyclohexoxy]-5-hydroxy-6-methylol-tetrahydropyran-3-yl] ester."} {"text":"The traditional IUPAC name of the compound with canonical SMILES COCc1nccnc1-c1ccc(OC(F)(F)F)cc1 is 2-(methoxymethyl)-3-[4-(trifluoromethoxy)phenyl]pyrazine."}", "/scratch/micpie/export/iupac_smiles/test_22-8.jsonl": "{"text":"Task: Please 
create the SMILES of a compound based on the systematic IUPAC name.\nIUPAC name: 2-bromanyl-5,7-dioxa-2,3-diazatricyclo[6.3.1.04,12]dodeca-1(12),3,8,10-tetraen-6-one\nResult: C1=CC2=C3C(=C1)OC(=O)OC3=NN2Br"} {"text":"Task: Please give me the SMILES of a molecule based on the systematic IUPAC name.\nIUPAC name: 1-[(2-oxidanylidene-1,3-dihydroindol-5-yl)sulfonyl]piperidine-3-carbaldehyde\nResult: O=CC1CCCN(S(=O)(=O)c2ccc3c(c2)CC(=O)N3)C1"}", "/scratch/micpie/export/iupac_smiles/valid_10-0.jsonl": "{"text":"The traditional IUPAC name of the chemical with SMILES CCCCC(C)N(C)C1=CC=CC(=C1C)C(=O)OC is 2-methyl-3-[methyl(1-methylpentyl)amino]benzoic acid methyl ester."} {"text":"The traditional IUPAC name of the molecule with canonical SMILES CCC[C@H]1C(=O)C(C)(C)[C@@H](O)CC(=O)N[C@H](\/C(F)=C\/c2ccccn2)C\/C=C(\/C)CCC[C@H](C)[C@@H]1O is (4S,7R,8S,9S,13Z,16S)-16-[(Z)-1-fluoro-2-(2-pyridyl)vinyl]-4,8-dihydroxy-5,5,9,13-tetramethyl-7-propyl-1-azacyclohexadec-13-ene-2,6-quinone."}", "/scratch/micpie/export/iupac_smiles/valid_7-5.jsonl": "{"text":"The InChI of the compound with CAS-like IUPAC name 2-methyl-N-[2-[[4-(trifluoromethylthio)phenyl]methylamino]ethyl]propanamide is InChI=1S\/C14H19F3N2OS\/c1-10(2)13(20)19-8-7-18-9-11-3-5-12(6-4-11)21-14(15,16)17\/h3-6,10,18H,7-9H2,1-2H3,(H,19,20)."} {"text":"The SELFIES of the chemical with CAS-like IUPAC name 3-(N-thiophen-2-ylsulfonylanilino)propanamide is [C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][Branch1][Branch2][C][C][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][S][Ring1][Branch1]."}", "/scratch/micpie/export/iupac_smiles/test_5-0.jsonl": "{"text":"The traditional IUPAC name of the compound with DeepSMILES CCCCNCCC6CC)CC))CN)))O is 4-[1-(aminomethyl)-1-methyl-propyl]-3-ethyl-piperidin-4-ol."} {"text":"The traditional IUPAC name of the molecule with canonical SMILES CC(C)C(CC(=O)Nc1ccccc1C(=O)NCCN)c1ccccc1 is N-(2-aminoethyl)-2-[(4-methyl-3-phenyl-pentanoyl)amino]benzamide."}", 
"/scratch/micpie/export/iupac_smiles/test_2-0.jsonl": "{"text":"The traditional IUPAC name of the compound with DeepSMILES CC=CC=CC=C6C=O)[C@@H]C)NC is (2R)-2-(methylamino)-1-(o-tolyl)propan-1-one."} {"text":"The traditional IUPAC name of the compound with DeepSMILES CCOC=CO6)C=CC=C6)Br))NC=O)C=CC5=O))C=CC=C6Cl))Cl))Cl))Cl is 2-(6-bromo-2,3-dihydro-1,4-benzodioxin-7-yl)-4,5,6,7-tetrachloro-isoindoline-1,3-quinone."}", "/scratch/micpie/export/iupac_smiles/train_9-4.jsonl": "{"text":"The InChI of the molecule with systematic IUPAC name 1-(tert-butylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,11,13-hexaenylideneamino)oxy-propan-2-ol is InChI=1S\/C22H28N2O2\/c1-22(2,3)23-14-18(25)15-26-24-21-19-10-6-4-8-16(19)12-13-17-9-5-7-11-20(17)21\/h4-11,18,23,25H,12-15H2,1-3H3."} {"text":"The DeepSMILES of the compound with systematic IUPAC name N-[(5-butyl-2,4-dimethyl-6-oxidanylidene-3,4-dihydropyridin-5-yl)methyl]-3-[cyclopentyl(methyl)amino]-2-ethenyl-benzamide is CCCCCCCC=NC6=O)))C)))C))CNC=O)C=CC=CC=C6)))NC)CCCCC5)))))))C=C."}", "/scratch/micpie/export/iupac_smiles/test_12-3.jsonl": "{"text":"The SMILES of the chemical with traditional IUPAC name 2-(4-chlorophenyl)spiro[fluorene-9,9'-thioxanthene] is C1=CC=C2C(=C1)C3=C(C24C5=CC=CC=C5SC6=CC=CC=C46)C=C(C=C3)C7=CC=C(C=C7)Cl."} {"text":"The SELFIES of the molecule with traditional IUPAC name 2-amino-N-(cyclopropylmethyl)-3,3-dimethyl-N-propargyl-butyramide is [C][C][Branch1][C][C][Branch1][C][C][C][Branch1][P][C][=Branch1][C][=O][N][Branch1][Ring2][C][C][#C][C][C][C][C][Ring1][Ring1][N]."}", "/scratch/micpie/export/iupac_smiles/test_18-7.jsonl": "{"text":"Task: Please give me the SMILES of a chemical based on the traditional IUPAC name.\nIUPAC name: [(E)-2-ethyl-7,10-dimethyl-undec-3-enylidene]-dimethyl-phosphonium\nResult: [C][C][C][Branch1][S][\/C][=C][\/C][C][C][Branch1][C][C][C][C][C][Branch1][C][C][C][C][=P+1][Branch1][C][C][C]"} {"text":"Task: Please generate the SMILES of a compound given the traditional IUPAC 
name.\nIUPAC name: methyl-[2-[1-[2-(4-nitrophenyl)ethyl]-4-piperidyl]ethyl]amine;hydrochloride\nResult: CNCCC1CCN(CC1)CCC2=CC=C(C=C2)[N+](=O)[O-].Cl"}", "/scratch/micpie/export/iupac_smiles/test_24-9.jsonl": "{"text":"Task: Please create the SMILES representation of a molecule given the IUAPC name in CAS-like style.\nIUPAC name: 3,5-difluoro-N-(5-sulfanylidene-1H-1,2,4-triazol-4-yl)benzamide\nResult: C1=C(C=C(C=C1F)F)C(=O)NN2C=NNC2=S"} {"text":"Task: Please create the SMILES of a compound given the CAS-like IUPAC name.\nIUPAC name: 7-methyl-5-[6-(trifluoromethyl)-3-pyridinyl]-N-(1,1,1-trifluoropropan-2-yl)-3-pyrazolo[1,5-a]pyrimidinecarboxamide\nResult: [C][C][=C][C][=Branch2][Ring1][=C][=N][C][=C][Branch1][Branch2][C][=N][N][Ring1][=Branch2][Ring1][Branch1][C][=Branch1][C][=O][N][C][Branch1][C][C][C][Branch1][C][F][Branch1][C][F][F][C][=C][N][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][Branch1][C][F][Branch1][C][F][F]"}", "/scratch/micpie/export/iupac_smiles/test_22-4.jsonl": "{"text":"The canonical SMILES of the chemical with systematic IUPAC name 2-bromanyl-5,7-dioxa-2,3-diazatricyclo[6.3.1.04,12]dodeca-1(12),3,8,10-tetraen-6-one is O=C1Oc2cccc3c2c(nn3Br)O1."} {"text":"The SELFIES of the chemical with systematic IUPAC name 1-[(2-oxidanylidene-1,3-dihydroindol-5-yl)sulfonyl]piperidine-3-carbaldehyde is [C][C][C][Branch2][Ring1][P][C][N][Branch1][Ring2][C][Ring1][=Branch1][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][Ring1][Branch2][C][=O]."}", "/scratch/micpie/export/iupac_smiles/valid_2-2.jsonl": "{"text":"The IUPAC name of the molecule with SELFIES [C][C][C][C@H1][Branch1][C][C][C@@H1][Branch1][N][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][O][C][Branch1][C][C][C] is 4-[(3S,4S)-2,4-dimethylheptan-3-yl]phenol."} {"text":"The preferred IUPAC name of the chemical with SELFIES 
[C][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=N][C][=Branch1][Branch1][=C][S][Ring1][Branch1][C][=C][C][=C][C][=Branch1][=C][=C][C][=Branch1][=Branch2][=C][Ring1][=Branch1][O][C][Ring1][#Branch2][=O][Br][N+1][=Branch1][C][=O][O-1] is 8-bromo-3-[2-(4-ethoxyphenyl)-1,3-thiazol-4-yl]-6-nitrochromen-2-one."}", "/scratch/micpie/export/iupac_smiles/valid_23-7.jsonl": "{"text":"Task: Please create the SMILES of a compound based on the traditional IUPAC name.\nIUPAC name: 1-(3-chlorophenyl)sulfonylnipecotaldehyde\nResult: C1CC(CN(C1)S(=O)(=O)C2=CC(=CC=C2)Cl)C=O"} {"text":"Task: Please give me the SMILES representation of a chemical based on the traditional IUPAC name.\nIUPAC name: N-(benzimidazol-1-yl)-3-[(2,3,4,5,6-pentamethylphenyl)sulfonylamino]propionamide\nResult: [C][C][=C][Branch2][Ring2][=N][C][=Branch2][Ring2][Branch2][=C][Branch1][=Branch2][C][=Branch1][Branch1][=C][Ring1][=Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][C][=Branch1][C][=O][N][N][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][=Branch2][C][C]"}", "/scratch/micpie/export/iupac_smiles/valid_21-8.jsonl": "{"text":"Task: Please create the SMILES of a chemical based on the systematic IUPAC name.\nIUPAC name: [4-[(2S)-1-(azetidin-1-yl)-3-cyclohexyl-1-oxidanylidene-propan-2-yl]oxy-6-(hydroxymethyl)-2-[5-[2-[[3-[6-(hydroxymethyl)-3,5-bis(oxidanyl)-4-[(2-oxidanylidenechromen-3-yl)methoxy]oxan-2-yl]oxy-4-oxidanyl-5-[(2-oxidanylidenechromen-3-yl)methoxy]cyclohexyl]carbonylamino]ethylcarbamoyl]-2-[6-methyl-3,4,5-tris(oxidanyl)oxan-2-yl]oxy-3-(4-phenyl-1,2,3-triazol-1-yl)cyclohexyl]oxy-5-oxidanyl-oxan-3-yl] benzoate\nResult: CC1C(C(C(C(O1)OC2C(CC(CC2OC3C(C(C(C(O3)CO)O)O[C@@H](CC4CCCCC4)C(=O)N5CCC5)OC(=O)C6=CC=CC=C6)C(=O)NCCNC(=O)C7CC(C(C(C7)OC8C(C(C(C(O8)CO)O)OCC9=CC1=CC=CC=C1OC9=O)O)O)OCC1=CC2=CC=CC=C2OC1=O)N1C=C(N=N1)C1=CC=CC=C1)O)O)O"} {"text":"Task: Please give me the SMILES representation of a molecule given the systematic IUPAC name.\nIUPAC name: 
3-fluoranyl-5-(1H-1,2,4-triazol-5-yliodanuidyl)-2-[2,2,2-tris(fluoranyl)ethoxy]pyridine\nResult: Fc1cc([I-]c2ncn[nH]2)cnc1OCC(F)(F)F"}", "/scratch/micpie/export/iupac_smiles/train_16-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the chemical with DeepSMILES CC=CC=CC=C6)OC=NC=CC=C6)CCl)))Cl)))))))C))C is 5-chloro-4-(chloromethyl)-2-(2,3,5-trimethylphenoxy)pyridine."} {"text":"The IUAPC name in CAS-like style of the chemical with InChI InChI=1S\/C13H16N2O3\/c1-4-17-13-14-10-7-5-6-9(11(10)15-13)12(16)18-8(2)3\/h5-8H,4H2,1-3H3,(H,14,15) is 2-ethoxy-1H-benzimidazole-4-carboxylic acid propan-2-yl ester."}", "/scratch/micpie/export/iupac_smiles/valid_0-0.jsonl": "{"text":"The traditional IUPAC name of the chemical with SELFIES [C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][\/C][=Branch2][Ring1][N][=N][\\N][C][=Branch1][C][=O][C][=N][N][Branch1][Branch1][C][=C][Ring1][Branch1][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Br][\/C] is 1-(4-bromobenzyl)-N-[(Z)-1-(p-tolyl)ethylideneamino]pyrazole-3-carboxamide."} {"text":"The traditional IUPAC name of the chemical with canonical SMILES C=C.COC1(C)CC(O)OC(C)C1O is ethylene;4-methoxy-4,6-dimethyl-tetrahydropyran-2,5-diol."}", "/scratch/micpie/export/iupac_smiles/valid_3-3.jsonl": "{"text":"The DeepSMILES of the chemical with traditional IUPAC name (1S,3R,5S,7R)-3-hydroxy-5-methyl-adamantane-1-carboxylic acid is C[C@]C[C@@H]C[C@]C6)C[C@@]C6)C8)O)))C=O)O."} {"text":"The SELFIES of the compound with traditional IUPAC name 2-[4-(5-hydroxy-4-keto-6,7-dimethoxy-chromen-2-yl)phenoxy]-N-[6-[methyl(o-anisyl)amino]hexyl]acetamide is [C][N][Branch2][Branch1][Branch1][C][C][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=Branch1][C][=O][C][=C][Branch2][Ring1][C][C][=Branch1][=N][=C][Branch1][Branch2][C][=C][Ring1][=Branch1][O][Ring1][O][O][C][O][C][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C]."}", 
"/scratch/micpie/export/iupac_smiles/test_17-7.jsonl": "{"text":"Task: Please generate the SMILES representation of a chemical based on the traditional IUPAC name.\nIUPAC name: 2-ethoxy-1H-benzimidazole-4-carboxylic acid [4-[2-(2H-tetrazol-5-yl)phenyl]benzyl] ester\nResult: CCOC=NC=CC=CC=C6N9)))))C=O)OCC=CC=CC=C6))C=CC=CC=C6C=NNN=N5"} {"text":"Task: Please generate the SMILES representation of a compound based on the traditional IUPAC name.\nIUPAC name: 2-[4-[2-[3-[(Z,1E)-1-ethylidenebut-2-enyl]-3-methyl-5-(3-methylcyclohexa-1,5-dien-1-yl)cyclohexa-1,5-dien-1-yl]-3-iodo-10-[4-(2-pyridyl)phenyl]-3,4-dihydroanthracen-9-yl]phenyl]pyridine\nResult: C\/C=C\\C=C\/C))\\CCC=CC=C6)C=CC=CC=CC=CC=C6C=C%10CC%14I))))C=CC=CC=C6))C=CC=CC=N6)))))))))))))))))C=CC=CC=C6))C=CC=CC=N6))))))))))))))))C=CCCC=C6)))C))))))C"}", "/scratch/micpie/export/iupac_smiles/train_14-3.jsonl": "{"text":"The canonical SMILES of the chemical with traditional IUPAC name (3R)-5-keto-1-(2,2,2-trifluoroethyl)pyrrolidine-3-carboxylic acid (1,3,5-trimethylpyrazol-4-yl) ester is Cc1nn(C)c(C)c1OC(=O)[C@@H]1CC(=O)N(CC(F)(F)F)C1."} {"text":"The SELFIES of the chemical with traditional IUPAC name [7-chloro-5-[4-(3-fluoropyrrolidino)sulfonylphenyl]benzofuran-2-yl]methylamine is [C][C][N][Branch1][=Branch1][C][C][Ring1][Branch1][F][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=Branch1][P][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][Branch1][Ring2][O][Ring1][=Branch1][C][N][Cl]."}", "/scratch/micpie/export/iupac_smiles/test_16-1.jsonl": "{"text":"The CAS-like IUPAC name of the compound with DeepSMILES CCCOC=CC=CC=C6))OC=NC=CC=C6)CCl)))Cl is 5-chloro-4-(chloromethyl)-2-(4-propoxyphenoxy)pyridine."} {"text":"The CAS-like IUPAC name of the molecule with SELFIES 
[C][C][O][C][=N][C][=C][C][=C][C][=Branch2][Ring1][P][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][N][=N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][F] is 2-ethoxy-N-[(4-fluorophenyl)methyl]-3-[[4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl]-4-benzimidazolecarboxamide."}", "/scratch/micpie/export/iupac_smiles/train_13-7.jsonl": "{"text":"Task: Please generate the SMILES representation of a compound based on the traditional IUPAC name.\nIUPAC name: 2-amino-N,3,3-trimethyl-N-propargyl-butyramide\nResult: InChI=1S\/C10H18N2O\/c1-6-7-12(5)9(13)8(11)10(2,3)4\/h1,8H,7,11H2,2-5H3"} {"text":"Task: Please give me the SMILES of a molecule based on the traditional IUPAC name.\nIUPAC name: (4R)-chroman-4-carboxylic acid (1,3,5-trimethylpyrazol-4-yl) ester\nResult: [C][C][=C][Branch1][#Branch2][C][=Branch1][=Branch1][=N][N][Ring1][Branch1][C][C][O][C][=Branch1][C][=O][C@@H1][C][C][O][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1]"}", "/scratch/micpie/export/iupac_smiles/valid_22-3.jsonl": "{"text":"The canonical SMILES of the molecule with traditional IUPAC name 1-cyclopropyl-3-ethynyl-2-(trifluoromethyl)benzene is C#Cc1cccc(C2CC2)c1C(F)(F)F."} {"text":"The DeepSMILES of the chemical with traditional IUPAC name 1-(3-fluoro-4-methoxy-phenyl)sulfonylnipecotaldehyde is COC=CC=CC=C6))S=O)=O)NCCCCC6)C=O))))))))))F."}", "/scratch/micpie/export/iupac_smiles/valid_16-5.jsonl": "{"text":"The InChI of the chemical with IUAPC name in CAS-like style 5-chloro-4-(chloromethyl)-2-(3-iodophenoxy)pyridine is InChI=1S\/C12H8Cl2INO\/c13-6-8-4-12(16-7-11(8)14)17-10-3-1-2-9(15)5-10\/h1-5,7H,6H2."} {"text":"The canonical SMILES of the compound with IUAPC name in CAS-like style N-[2-(2,5-dichlorophenyl)ethyl]-2-ethoxy-3-[[4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl]-4-benzimidazolecarboxamide is 
CCOc1nc2cccc(C(=O)NCCc3cc(Cl)ccc3Cl)c2n1Cc1ccc(-c2ccccc2-c2nn[nH]n2)cc1."}", "/scratch/micpie/export/iupac_smiles/valid_13-3.jsonl": "{"text":"The SELFIES of the chemical with traditional IUPAC name 2-[[(2-amino-3,3-dimethyl-butanoyl)amino]methyl]-2-methyl-butyric acid is [C][C][C][Branch1][C][C][Branch2][Ring1][Ring1][C][N][C][=Branch1][C][=O][C][Branch1][=Branch2][C][Branch1][C][C][Branch1][C][C][C][N][C][=Branch1][C][=O][O]."} {"text":"The SELFIES of the compound with traditional IUPAC name (2-chloro-4-pyridyl)-[(2R)-2-methyl-1,2,3,4-tetrahydro-1,5-benzodiazepin-5-yl]methanone is [C][C@@H1][C][C][N][Branch1][N][C][=C][C][=C][C][=C][Ring1][=Branch1][N][Ring1][O][C][=Branch1][C][=O][C][=C][C][=Branch1][=Branch1][=N][C][=C][Ring1][=Branch1][Cl]."}", "/scratch/micpie/export/iupac_smiles/valid_21-4.jsonl": "{"text":"The SMILES of the chemical with systematic IUPAC name [4-[(2S)-1-(azetidin-1-yl)-3-cyclohexyl-1-oxidanylidene-propan-2-yl]oxy-6-(hydroxymethyl)-2-[5-[2-[[3-[6-(hydroxymethyl)-3,5-bis(oxidanyl)-4-[(2-oxidanylidenechromen-3-yl)methoxy]oxan-2-yl]oxy-4-oxidanyl-5-[(2-oxidanylidenechromen-3-yl)methoxy]cyclohexyl]carbonylamino]ethylcarbamoyl]-2-[6-methyl-3,4,5-tris(oxidanyl)oxan-2-yl]oxy-3-(4-phenyl-1,2,3-triazol-1-yl)cyclohexyl]oxy-5-oxidanyl-oxan-3-yl] benzoate is CC1C(C(C(C(O1)OC2C(CC(CC2OC3C(C(C(C(O3)CO)O)O[C@@H](CC4CCCCC4)C(=O)N5CCC5)OC(=O)C6=CC=CC=C6)C(=O)NCCNC(=O)C7CC(C(C(C7)OC8C(C(C(C(O8)CO)O)OCC9=CC1=CC=CC=C1OC9=O)O)O)OCC1=CC2=CC=CC=C2OC1=O)N1C=C(N=N1)C1=CC=CC=C1)O)O)O."} {"text":"The SELFIES of the compound with systematic IUPAC name 3-fluoranyl-5-(1H-1,2,4-triazol-5-yliodanuidyl)-2-[2,2,2-tris(fluoranyl)ethoxy]pyridine is [C][=C][Branch2][Ring1][Ring2][C][=N][C][=Branch1][Branch1][=C][Ring1][=Branch1][F][O][C][C][Branch1][C][F][Branch1][C][F][F][I-1][C][=N][C][=N][N][Ring1][Branch1]."}", "/scratch/micpie/export/iupac_smiles/test_16-5.jsonl": "{"text":"The SELFIES of the molecule with CAS-like IUPAC name 
5-chloro-4-(chloromethyl)-2-(4-propoxyphenoxy)pyridine is [C][C][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][O][C][=N][C][=C][Branch1][=Branch2][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][Cl][Cl]."} {"text":"The SELFIES of the molecule with IUAPC name in CAS-like style 2-ethoxy-N-[(4-fluorophenyl)methyl]-3-[[4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl]-4-benzimidazolecarboxamide is [C][C][O][C][=N][C][=C][C][=C][C][=Branch2][Ring1][P][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][N][=N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][F]."}", "/scratch/micpie/export/iupac_smiles/valid_18-3.jsonl": "{"text":"The InChI of the molecule with traditional IUPAC name 8-(1-methylpyrrolidin-3-yl)-6-(3-pyridyl)quinazoline is InChI=1S\/C18H18N4\/c1-22-6-4-14(11-22)17-8-15(13-3-2-5-19-9-13)7-16-10-20-12-21-18(16)17\/h2-3,5,7-10,12,14H,4,6,11H2,1H3."} {"text":"The InChI of the chemical with traditional IUPAC name 2-[4-[2-(methylamino)ethyl]piperidino]-1-[2-(2-thienyl)pyrrolidino]propan-1-one is InChI=1S\/C19H31N3OS\/c1-15(21-12-8-16(9-13-21)7-10-20-2)19(23)22-11-3-5-17(22)18-6-4-14-24-18\/h4,6,14-17,20H,3,5,7-13H2,1-2H3."}", "/scratch/micpie/export/iupac_smiles/valid_3-8.jsonl": "{"text":"Task: Please create the SMILES representation of a compound based on the systematic IUPAC name.\nIUPAC name: (1S,3S,5R,7R)-3-methyl-5-oxidanyl-adamantane-1-carboxylic acid\nResult: [C][C@][C][C@@H1][C][C@][Branch1][Ring2][C][Ring1][=Branch1][Branch1][=C][C][C@@][Branch1][Ring2][C][Ring1][#Branch1][Branch1][Ring2][C][Ring1][#Branch2][O][C][=Branch1][C][=O][O]"} {"text":"Task: Please create the SMILES representation of a molecule based on the systematic IUPAC name.\nIUPAC name: 2-[4-(6,7-dimethoxy-5-oxidanyl-4-oxidanylidene-chromen-2-yl)phenoxy]-N-[6-[(2-methoxyphenyl)methyl-methyl-amino]hexyl]ethanamide\nResult: 
COc1ccccc1CN(C)CCCCCCNC(=O)COc1ccc(-c2cc(=O)c3c(O)c(OC)c(OC)cc3o2)cc1"}", "/scratch/micpie/export/iupac_smiles/test_25-0.jsonl": "{"text":"The traditional IUPAC name of the chemical with DeepSMILES CC=CC=NC=CC=NN95)))C=O)N[C@H]CCC3)))CF)F)F))))))))C=CC=CC=C6))OC=N5 is 5-(1,3-benzoxazol-5-yl)-N-[(1R)-1-cyclopropyl-2,2,2-trifluoro-ethyl]-7-methyl-pyrazolo[1,5-a]pyrimidine-3-carboxamide."} {"text":"The traditional IUPAC name of the molecule with DeepSMILES CCC=COC=C5C=CCCCCC6=C%10)))))))))))C=O)O is 3-ethyl-5,6,7,8-tetrahydrobenzo[f]benzofuran-2-carboxylic acid."}", "/scratch/micpie/export/iupac_smiles/valid_8-9.jsonl": "{"text":"Task: Please create the SMILES representation of a compound based on the CAS-like IUPAC name.\nIUPAC name: 3-[N-(2-naphthalenylsulfonyl)anilino]propanamide\nResult: InChI=1S\/C19H18N2O3S\/c20-19(22)12-13-21(17-8-2-1-3-9-17)25(23,24)18-11-10-15-6-4-5-7-16(15)14-18\/h1-11,14H,12-13H2,(H2,20,22)"} {"text":"Task: Please generate the SMILES of a molecule based on the IUAPC name in CAS-like style.\nIUPAC name: 1-[(E)-3,4-dihydro-2H-naphthalen-1-ylideneamino]oxy-3-(propan-2-ylamino)-2-propanol;hydrochloride\nResult: CCC)NCCCO\/N=C\\CCCC=CC=CC=C6\\%10)))))))))))))O.Cl"}", "/scratch/micpie/export/iupac_smiles/train_5-3.jsonl": "{"text":"The SELFIES of the compound with traditional IUPAC name 4-[3-(aminomethyl)tetrahydrofuran-3-yl]-3-ethyl-piperidin-4-ol is [C][C][C][C][N][C][C][C][Ring1][=Branch1][Branch1][N][C][Branch1][#Branch1][C][C][O][C][Ring1][Branch1][C][N][O]."} {"text":"The SMILES of the compound with traditional IUPAC name N-(2-aminoethyl)-2-[2-(tetrahydrofurfuryloxy)propanoylamino]benzamide;hydrochloride is CC(C(=O)NC1=CC=CC=C1C(=O)NCCN)OCC2CCCO2.Cl."}", "/scratch/micpie/export/iupac_smiles/train_22-9.jsonl": "{"text":"Task: Please give me the SMILES of a molecule given the CAS-like IUPAC name.\nIUPAC name: 6-methyl-5-(5-methyl-1,3,4-oxadiazol-2-yl)-5-methylsulfonyl-4-(trifluoromethyl)-1-cyclohexa-1,3-dienecarboxamide\nResult: 
CC1C(=CC=C(C1(C2=NN=C(O2)C)S(=O)(=O)C)C(F)(F)F)C(=O)N"} {"text":"Task: Please give me the SMILES of a chemical given the IUAPC name in CAS-like style.\nIUPAC name: 1-propylsulfonyl-3-piperidinecarboxaldehyde\nResult: CCCS(=O)(=O)N1CCCC(C=O)C1"}", "/scratch/micpie/export/iupac_smiles/train_6-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the molecule with DeepSMILES CCC=O)NC=CC=CC=C6C=O)NCCN)))))))))))))OCCCCCO5 is N-(2-aminoethyl)-2-[[1-oxo-2-(2-oxolanylmethoxy)propyl]amino]benzamide."} {"text":"The IUAPC name in CAS-like style of the chemical with InChI InChI=1S\/C14H15FN2O2S\/c1-10(11-5-3-2-4-6-11)17-20(18,19)12-7-8-13(15)14(16)9-12\/h2-10,17H,16H2,1H3\/t10-\/m0\/s1 is 3-amino-4-fluoro-N-[(1S)-1-phenylethyl]benzenesulfonamide."}", "/scratch/micpie/export/iupac_smiles/train_7-2.jsonl": "{"text":"The IUPAC name of the chemical with DeepSMILES COC=O)CNS=O)=O)C=CC=CC=C6)N))))Cl is methyl 2-[(5-amino-2-chlorophenyl)sulfonylamino]acetate."} {"text":"The IUPAC name of the chemical with SELFIES [C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][Branch1][Branch2][C][C][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][N+1][=Branch1][C][=O][O-1] is 3-(N-(2-nitrophenyl)sulfonylanilino)propanamide."}", "/scratch/micpie/export/iupac_smiles/valid_25-7.jsonl": "{"text":"Task: Please create the SMILES of a molecule based on the traditional IUPAC name.\nIUPAC name: 5-(5-chloro-3-pyridyl)-N-[(1S)-1-cyclopropylethyl]-7-methyl-pyrazolo[1,5-a]pyrimidine-3-carboxamide\nResult: [C][C][=C][C][=Branch2][Ring1][O][=N][C][=C][Branch1][Branch2][C][=N][N][Ring1][=Branch2][Ring1][Branch1][C][=Branch1][C][=O][N][C@@H1][Branch1][C][C][C][C][C][Ring1][Ring1][C][=C][C][=Branch1][=Branch1][=C][N][=C][Ring1][=Branch1][Cl]"} {"text":"Task: Please generate the SMILES representation of a compound given the traditional IUPAC name.\nIUPAC name: 2-[amino(phenyl)methyl]-4-ethyl-thiazole-5-carboxylic acid\nResult: 
CCc1nc(C(N)c2ccccc2)sc1C(=O)O"}", "/scratch/micpie/export/iupac_smiles/test_1-7.jsonl": "{"text":"Task: Please generate the SMILES representation of a molecule given the traditional IUPAC name.\nIUPAC name: (3R)-3-nitrocyclohexanol\nResult: O=[N+]([O-])[C@@H]1CCCC(O)C1"} {"text":"Task: Please give me the SMILES representation of a chemical based on the traditional IUPAC name.\nIUPAC name: (2S)-1-[[(2S)-2-hydroxy-3-(1H-indol-4-yloxy)propyl]-isopropyl-amino]-3-(1H-indol-4-yloxy)propan-2-ol\nResult: CCC)NC[C@@H]COC=CC=CC=C6C=CN5)))))))))))O)))C[C@@H]COC=CC=CC=C6C=CN5)))))))))))O"}", "/scratch/micpie/export/iupac_smiles/valid_20-3.jsonl": "{"text":"The canonical SMILES of the chemical with traditional IUPAC name 4-acetoxy-7-(2-hydroxy-5-keto-2-oct-2-enyl-cyclopent-3-en-1-ylidene)hept-5-enoic acid methyl ester is CCCCCC=CCC1(O)C=CC(=O)C1=CC=CC(CCC(=O)OC)OC(C)=O."} {"text":"The DeepSMILES of the chemical with traditional IUPAC name 2-[2-[2-(acetonylamino)ethylamino]ethylamino]acetic acid is CC=O)CNCCNCCNCC=O)O."}", "/scratch/micpie/export/iupac_smiles/train_3-5.jsonl": "{"text":"The canonical SMILES of the chemical with IUAPC name in CAS-like style (5-nitro-2-furanyl)-(1-pyrrolidinyl)methanone is O=C(c1ccc([N+](=O)[O-])o1)N1CCCC1."} {"text":"The SELFIES of the chemical with CAS-like IUPAC name (12S,14S,17E)-7-chloro-12-(hydroxymethyl)-23-(phenylmethyl)-15,20-dioxa-2,4,5,9,11,23-hexazahexacyclo[19.7.1.13,10.111,14.04,8.022,27]hentriaconta-1(28),3(31),5,7,9,17,21(29),22(27)-octaen-24-one is [C][C][C][=Branch1][C][=O][N][Branch2][Branch1][Branch1][C][=C][Ring1][#Branch1][C][=C][C][=C][Ring1][=Branch1][O][C][\/C][=C][\/C][O][C@H1][C][C@H1][Branch2][Ring1][=N][N][Branch1][Ring2][C][Ring1][Branch1][C][=N][C][=C][Branch1][S][C][=N][N][Ring1][Branch1][C][=Branch1][Ring2][=C][Ring1][=Branch2][N][Ring2][Ring1][Branch2][Cl][C][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]."}", "/scratch/micpie/export/iupac_smiles/valid_14-3.jsonl": "{"text":"The SMILES of the compound with 
traditional IUPAC name N-[[(1S,2S)-2-(2-furoylamino)cyclopentyl]methyl]carbamic acid tert-butyl ester is CC(C)(C)OC(=O)NC[C@@H]1CCC[C@@H]1NC(=O)C2=CC=CO2."} {"text":"The SMILES of the chemical with traditional IUPAC name N-[2-(5-bromo-7-chloro-benzofuran-2-yl)ethyl]carbamic acid tert-butyl ester is CC(C)(C)OC(=O)NCCC1=CC2=CC(=CC(=C2O1)Cl)Br."}", "/scratch/micpie/export/iupac_smiles/train_2-7.jsonl": "{"text":"Task: Please give me the SMILES representation of a chemical based on the traditional IUPAC name.\nIUPAC name: (6R,8R,9S,13S,14S,17S)-3-methoxy-13-methyl-17-[(2S)-tetrahydropyran-2-yl]oxy-6,7,8,9,11,12,14,15,16,17-decahydrocyclopenta[a]phenanthren-6-ol\nResult: [C][C@][C][C][C@H1][C@H1][Branch2][Ring1][C][C@@H1][Ring1][=Branch1][C][C][C@@H1][Ring1][=Branch2][O][C@H1][C][C][C][C][O][Ring1][=Branch1][C][C@H1][Branch1][#C][C][=C][Ring1][P][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O]"} {"text":"Task: Please give me the SMILES representation of a chemical based on the traditional IUPAC name.\nIUPAC name: (1S,3R,5S,7R)-3-hydroxy-5-methyl-adamantane-1-carboxylate\nResult: InChI=1S\/C12H18O3\/c1-10-2-8-3-11(5-10,9(13)14)7-12(15,4-8)6-10\/h8,15H,2-7H2,1H3,(H,13,14)\/p-1\/t8-,10+,11+,12-\/m1\/s1"}", "/scratch/micpie/export/iupac_smiles/test_5-5.jsonl": "{"text":"The SMILES of the compound with IUAPC name in CAS-like style 4-(1-amino-2-methylbutan-2-yl)-3-ethyl-4-piperidinol is CCC1CNCCC1(C(C)(CC)CN)O."} {"text":"The DeepSMILES of the compound with CAS-like IUPAC name N-(2-aminoethyl)-2-[(4-methyl-1-oxo-3-phenylpentyl)amino]benzamide is CCC)CCC=O)NC=CC=CC=C6C=O)NCCN))))))))))))))C=CC=CC=C6."}", "/scratch/micpie/export/iupac_smiles/test_7-8.jsonl": "{"text":"Task: Please generate the SMILES of a compound based on the systematic IUPAC name.\nIUPAC name: 3-azanyl-4-oxidanyl-N-(pyridin-2-ylmethyl)benzenesulfonamide\nResult: InChI=1S\/C12H13N3O3S\/c13-11-7-10(4-5-12(11)16)19(17,18)15-8-9-3-1-2-6-14-9\/h1-7,15-16H,8,13H2"} {"text":"Task: Please give me the SMILES 
of a chemical based on the systematic IUPAC name.\nIUPAC name: 3-[(4-fluorophenyl)sulfonyl-phenyl-amino]propanamide\nResult: C1=CC=C(C=C1)N(CCC(=O)N)S(=O)(=O)C2=CC=C(C=C2)F"}", "/scratch/micpie/export/iupac_smiles/valid_4-3.jsonl": "{"text":"The SELFIES of the chemical with traditional IUPAC name N-(4-chlorophenyl)-N-[[1-[(2S,3R,4R,5S,6R)-3,4-dihydroxy-6-methylol-5-[(2S,3R,4S,5R,6R)-3,4,5-trihydroxy-6-methylol-tetrahydropyran-2-yl]oxy-tetrahydropyran-2-yl]triazol-4-yl]methyl]ethanesulfonamide is [C][C][S][=Branch1][C][=O][=Branch1][C][=O][N][Branch2][Branch1][#Branch1][C][C][=C][N][Branch1][Branch1][N][=N][Ring1][Branch1][C@@H1][C@@H1][Branch2][Ring2][#Branch1][C@H1][Branch2][Ring2][C][C@@H1][Branch1][=Branch2][C@H1][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][C@H1][C@@H1][Branch1][P][C@H1][Branch1][=N][C@H1][Branch1][=Branch2][C@H1][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][O][O][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl]."} {"text":"The canonical SMILES of the compound with traditional IUPAC name 4-[1-(aminomethyl)-4-ethyl-cyclohexyl]-3-ethyl-piperidin-4-ol is CCC1CCC(CN)(C2(O)CCNCC2CC)CC1."}", "/scratch/micpie/export/iupac_smiles/train_25-8.jsonl": "{"text":"Task: Please generate the SMILES representation of a compound based on the systematic IUPAC name.\nIUPAC name: N-(4-butoxycyclohexyl)-5-(5-chloranylpyridin-3-yl)-7-methyl-pyrazolo[1,5-a]pyrimidine-3-carboxamide\nResult: CCCCOC1CCC(NC(=O)c2cnn3c(C)cc(-c4cncc(Cl)c4)nc23)CC1"} {"text":"Task: Please give me the SMILES of a compound based on the systematic IUPAC name.\nIUPAC name: 8-ethyl-2,3-dihydrothieno[2,3-g][1,4]benzodioxine-7-carboxylic acid\nResult: CCC1=C(SC2=CC3=C(C=C21)OCCO3)C(=O)O"}", "/scratch/micpie/export/iupac_smiles/train_1-3.jsonl": "{"text":"The DeepSMILES of the compound with traditional IUPAC name [3-[(1Z)-3-chlorobuta-1,3-dienyl]-2-methyl-4-pyridyl]-(4-piperidinobutyl)amine is CC=NC=CC=C6\/C=C\\C=C)Cl)))))NCCCCNCCCCC6."} {"text":"The DeepSMILES of the chemical with 
traditional IUPAC name (6S,8R,9S,13S,14S,17S)-3-methoxy-13-methyl-17-[(2R)-tetrahydropyran-2-yl]oxy-6,7,8,9,11,12,14,15,16,17-decahydrocyclopenta[a]phenanthren-6-ol is C[C@]CC[C@H][C@H][C@@H]6CC[C@@H]9O[C@@H]CCCCO6)))))))))))C[C@@H]C=C6C=CC=C6)OC)))))))O."}", "/scratch/micpie/export/iupac_smiles/train_19-4.jsonl": "{"text":"The InChI of the compound with systematic IUPAC name N-(4-chlorophenyl)-3-[4-[2-(methylamino)ethyl]piperidin-1-yl]propanamide is InChI=1S\/C17H26ClN3O\/c1-19-10-6-14-7-11-21(12-8-14)13-9-17(22)20-16-4-2-15(18)3-5-16\/h2-5,14,19H,6-13H2,1H3,(H,20,22)."} {"text":"The SELFIES of the chemical with systematic IUPAC name 3-[N-[2,6-bis(chloranyl)-4-[oxidanidyl(oxidanyl)amino]phenyl]-C-methyl-carbonimidoyl]chromen-2-one is [C][C][=Branch2][Ring1][Branch1][=N][C][=C][Branch1][#C][C][=C][Branch1][=Branch1][C][=C][Ring1][=Branch1][Cl][N][Branch1][C][O][O-1][Cl][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][Ring1][#Branch2][=O]."}", "/scratch/micpie/export/iupac_smiles/train_27-5.jsonl": "{"text":"The SELFIES of the molecule with IUAPC name in CAS-like style 1-[(3R)-3-(4-chlorophenyl)-5-(4-fluorophenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(2-fluorophenyl)-1-piperazinyl]ethanone is [C][C][N][Branch2][Ring2][=N][C][C][N][Ring1][=Branch1][C][C][=Branch1][C][=O][N][C@H1][Branch2][Ring1][Ring1][C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][F][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl][C][=C][C][=C][C][=C][Ring1][=Branch1][F]."} {"text":"The SMILES of the chemical with CAS-like IUPAC name 6-[4-[4-[(9,9-dimethyl-2-fluorenyl)amino]phenyl]-2,3,5,6-tetrahydroxyphenyl]benzene-1,2,3,4,5-pentol is CC1(C2=CC=CC=C2C3=C1C=C(C=C3)NC4=CC=C(C=C4)C5=C(C(=C(C(=C5O)O)C6=C(C(=C(C(=C6O)O)O)O)O)O)O)C."}", "/scratch/micpie/export/iupac_smiles/test_20-5.jsonl": "{"text":"The InChI of the compound with IUAPC name in CAS-like style 
(2S,8S,12S)-4,10-bis(2-ethoxyphenyl)-3,5,9,11-tetraoxo-4,10-diazatetracyclo[5.5.2.02,6.08,12]tetradec-13-ene-13-carboxylic acid phenyl ester is InChI=1S\/C35H30N2O8\/c1-3-43-24-16-10-8-14-22(24)36-31(38)27-20-18-21(35(42)45-19-12-6-5-7-13-19)26(29(27)33(36)40)30-28(20)32(39)37(34(30)41)23-15-9-11-17-25(23)44-4-2\/h5-18,20,26-30H,3-4H2,1-2H3\/t20?,26?,27-,28?,29-,30-\/m0\/s1."} {"text":"The SMILES of the molecule with CAS-like IUPAC name (2S)-2-[[3-benzoyloxy-2-[3-[4-(3-fluorophenyl)-1-triazolyl]-5-[[2-[[4-[4-(3-fluorophenyl)-1-triazolyl]-3,5-dihydroxy-6-(hydroxymethyl)-2-oxanyl]oxy]ethylamino]-oxomethyl]-2-[(3,4,5-trihydroxy-6-methyl-2-oxanyl)oxy]cyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)-4-oxanyl]oxy]-3-cyclohexylpropanoic acid is CC1C(C(C(C(O1)OC2C(CC(CC2OC3C(C(C(C(O3)CO)O)O[C@@H](CC4CCCCC4)C(=O)O)OC(=O)C5=CC=CC=C5)C(=O)NCCOC6C(C(C(C(O6)CO)O)N7C=C(N=N7)C8=CC(=CC=C8)F)O)N9C=C(N=N9)C1=CC(=CC=C1)F)O)O)O."}", "/scratch/micpie/export/iupac_smiles/valid_5-7.jsonl": "{"text":"Task: Please give me the SMILES representation of a compound based on the traditional IUPAC name.\nIUPAC name: 4-[1-(aminomethyl)cyclooctyl]-3-ethyl-piperidin-4-ol\nResult: [C][C][C][C][N][C][C][C][Ring1][=Branch1][Branch1][#C][C][Branch1][#Branch2][C][C][C][C][C][C][C][Ring1][Branch2][C][N][O]"} {"text":"Task: Please give me the SMILES representation of a chemical given the traditional IUPAC name.\nIUPAC name: N-(2-aminoethyl)-2-[3-(cyclopentanecarbonylamino)butanoylamino]benzamide;hydrochloride\nResult: [C][C][Branch2][Ring1][#Branch1][C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][N][N][C][=Branch1][C][=O][C][C][C][C][C][Ring1][Branch1].[Cl]"}", "/scratch/micpie/export/iupac_smiles/test_7-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with SELFIES [C][=C][C][=N][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=Branch1][=Branch2][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][O][N] is 
3-amino-4-hydroxy-N-(2-pyridylmethyl)benzenesulfonamide."} {"text":"The traditional IUPAC name of the compound with DeepSMILES C=CC=CC=C6))NCCC=O)N))))S=O)=O)C=CC=CC=C6))F is 3-(N-(4-fluorophenyl)sulfonylanilino)propionamide."}", "/scratch/micpie/export/iupac_smiles/test_14-3.jsonl": "{"text":"The SELFIES of the molecule with traditional IUPAC name (3S)-5-keto-1-(2,2,2-trifluoroethyl)pyrrolidine-3-carboxylic acid (1,3,5-trimethylpyrazol-4-yl) ester is [C][C][=C][Branch1][#Branch2][C][=Branch1][=Branch1][=N][N][Ring1][Branch1][C][C][O][C][=Branch1][C][=O][C@H1][C][C][=Branch1][C][=O][N][Branch1][Ring2][C][Ring1][=Branch1][C][C][Branch1][C][F][Branch1][C][F][F]."} {"text":"The DeepSMILES of the chemical with traditional IUPAC name 4-bromo-2-iodo-5-(trifluoromethyl)phenol is C=CC=CC=C6O))I)))Br))CF)F)F."}", "/scratch/micpie/export/iupac_smiles/test_8-2.jsonl": "{"text":"The IUPAC name of the compound with SELFIES [C][C][=C][Branch2][Ring2][Branch1][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][S][=Branch1][C][=O][=Branch1][C][=O][N][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][Branch1][C][C][C][C] is N-[3-[(3,4-dimethylphenyl)sulfonylamino]phenyl]-2-methylpropanamide."} {"text":"The IUPAC name of the compound with InChI InChI=1S\/C20H26N2O2\/c1-20(2,3)21-14-18(23)15-24-22-19(16-10-6-4-7-11-16)17-12-8-5-9-13-17\/h4-13,18,21,23H,14-15H2,1-3H3 is 1-(benzhydrylideneamino)oxy-3-(tert-butylamino)propan-2-ol."}", "/scratch/micpie/export/iupac_smiles/test_18-0.jsonl": "{"text":"The traditional IUPAC name of the chemical with DeepSMILES CCC\/C=C\/CCCC)CCCC)C)))))))))C=[P+]C)C is [(E)-2-ethyl-7,10-dimethyl-undec-3-enylidene]-dimethyl-phosphonium."} {"text":"The traditional IUPAC name of the compound with canonical SMILES CNCCC1CCN(CCc2ccc([N+](=O)[O-])cc2)CC1.Cl is methyl-[2-[1-[2-(4-nitrophenyl)ethyl]-4-piperidyl]ethyl]amine;hydrochloride."}", "/scratch/micpie/export/iupac_smiles/test_12-4.jsonl": "{"text":"The InChI of the molecule with 
systematic IUPAC name 2-(4-chlorophenyl)spiro[fluorene-9,9'-thioxanthene] is InChI=1S\/C31H19ClS\/c32-22-16-13-20(14-17-22)21-15-18-24-23-7-1-2-8-25(23)31(28(24)19-21)26-9-3-5-11-29(26)33-30-12-6-4-10-27(30)31\/h1-19H."} {"text":"The DeepSMILES of the molecule with systematic IUPAC name 2-azanyl-N-(cyclopropylmethyl)-3,3-dimethyl-N-prop-2-ynyl-butanamide is CCC)C)CC=O)NCC#C)))CCCC3))))))N."}", "/scratch/micpie/export/iupac_smiles/valid_23-4.jsonl": "{"text":"The SELFIES of the chemical with systematic IUPAC name 1-(3-chlorophenyl)sulfonylpiperidine-3-carbaldehyde is [C][C][C][Branch2][Ring1][#Branch2][C][N][Branch1][Ring2][C][Ring1][=Branch1][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][Cl][C][=O]."} {"text":"The SELFIES of the molecule with systematic IUPAC name N-(benzimidazol-1-yl)-3-[(2,3,4,5,6-pentamethylphenyl)sulfonylamino]propanamide is [C][C][=C][Branch2][Ring2][=N][C][=Branch2][Ring2][Branch2][=C][Branch1][=Branch2][C][=Branch1][Branch1][=C][Ring1][=Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][C][=Branch1][C][=O][N][N][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][=Branch2][C][C]."}", "/scratch/micpie/export/iupac_smiles/valid_1-8.jsonl": "{"text":"Task: Please create the SMILES of a compound given the systematic IUPAC name.\nIUPAC name: ethene;N-ethyl-N-methyl-prop-2-en-1-amine\nResult: InChI=1S\/C6H13N.C2H4\/c1-4-6-7(3)5-2;1-2\/h4H,1,5-6H2,2-3H3;1-2H2"} {"text":"Task: Please create the SMILES of a chemical based on the systematic IUPAC name.\nIUPAC name: (2R)-1-(1H-indol-4-yloxy)-3-[[(2S)-3-(1H-indol-4-yloxy)-2-oxidanyl-propyl]-propan-2-yl-amino]propan-2-ol\nResult: InChI=1S\/C25H31N3O4\/c1-17(2)28(13-18(29)15-31-24-7-3-5-22-20(24)9-11-26-22)14-19(30)16-32-25-8-4-6-23-21(25)10-12-27-23\/h3-12,17-19,26-27,29-30H,13-16H2,1-2H3\/t18-,19+"}", "/scratch/micpie/export/iupac_smiles/train_20-1.jsonl": "{"text":"The CAS-like IUPAC name of the molecule with InChI 
InChI=1S\/C41H64O15S\/c1-20(2)10-9-15-40(8)33-24(42)18-39(7)23-11-12-26-37(4,5)27(14-16-38(26,6)22(23)13-17-41(33,39)36(47)55-40)53-35-32(29(44)25(19-51-35)56-57(48,49)50)54-34-31(46)30(45)28(43)21(3)52-34\/h13,20-21,23,25-35,43-46H,9-12,14-19H2,1-8H3,(H,48,49,50) is sulfuric acid [4-hydroxy-6-[[2,6,13,17,17-pentamethyl-6-(4-methylpentyl)-4,8-dioxo-7-oxapentacyclo[10.8.0.02,9.05,9.013,18]eicos-11-en-16-yl]oxy]-5-[(3,4,5-trihydroxy-6-methyl-2-oxanyl)oxy]-3-oxanyl] ester."} {"text":"The IUAPC name in CAS-like style of the compound with SELFIES [C][C][C][C@@H1][Branch1][O][C][=Branch1][C][=O][N][C][C][C][Ring1][Ring2][O][C][C][Branch2][O][O][C][Branch2][O][Branch1][O][C][Branch1][P][C][Ring1][=Branch1][O][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][C][C][Branch2][Ring1][#C][C][C][Branch2][Ring1][=Branch2][C][Ring1][=Branch1][O][C][C][Branch1][S][C][Branch1][N][C][Branch1][Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][C][C][=Branch1][C][=O][N][C][C][N][C][=Branch1][C][=O][C][C][C][Branch2][Branch1][=Branch1][C][Branch2][Ring2][P][C][Branch1][Ring2][C][Ring1][=Branch1][O][C][C][Branch2][Ring2][Ring2][C][Branch1][=N][C][Branch1][=Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][N][C][=C][Branch1][Branch1][N][=N][Ring1][Branch1][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][F][O][O][N][C][=C][Branch1][Branch1][N][=N][Ring1][Branch1][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][F][C][O][O] is benzoic acid [4-[(2S)-1-(1-azetidinyl)-1-oxopentan-2-yl]oxy-2-[5-[[2-[[[3-[4-(3-fluorophenyl)-1-triazolyl]-5-[[4-[4-(3-fluorophenyl)-1-triazolyl]-3,5-dihydroxy-6-(hydroxymethyl)-2-oxanyl]oxy]-4-hydroxycyclohexyl]-oxomethyl]amino]ethylamino]-oxomethyl]-3-methyl-2-[(3,4,5-trihydroxy-6-methyl-2-oxanyl)oxy]cyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)-3-oxanyl] ester."}", "/scratch/micpie/export/iupac_smiles/train_20-8.jsonl": "{"text":"Task: Please give me the SMILES representation of a chemical given the systematic IUPAC 
name.\nIUPAC name: [5-[6-methyl-3,4,5-tris(oxidanyl)oxan-2-yl]oxy-4-oxidanyl-6-[[2,6,13,17,17-pentamethyl-6-(4-methylpentyl)-4,8-bis(oxidanylidene)-7-oxapentacyclo[10.8.0.02,9.05,9.013,18]icos-11-en-16-yl]oxy]oxan-3-yl] hydrogen sulfate\nResult: CC1C(C(C(C(O1)OC2C(C(COC2OC3CCC4(C(C3(C)C)CCC5C4=CCC67C5(CC(=O)C6C(OC7=O)(C)CCCC(C)C)C)C)OS(=O)(=O)O)O)O)O)O"} {"text":"Task: Please generate the SMILES of a molecule based on the systematic IUPAC name.\nIUPAC name: [4-[(2S)-1-(azetidin-1-yl)-1-oxidanylidene-pentan-2-yl]oxy-2-[5-[2-[[3-[4-(3-fluorophenyl)-1,2,3-triazol-1-yl]-5-[4-[4-(3-fluorophenyl)-1,2,3-triazol-1-yl]-6-(hydroxymethyl)-3,5-bis(oxidanyl)oxan-2-yl]oxy-4-oxidanyl-cyclohexyl]carbonylamino]ethylcarbamoyl]-3-methyl-2-[6-methyl-3,4,5-tris(oxidanyl)oxan-2-yl]oxy-cyclohexyl]oxy-6-(hydroxymethyl)-5-oxidanyl-oxan-3-yl] benzoate\nResult: [C][C][C][C@@H1][Branch1][O][C][=Branch1][C][=O][N][C][C][C][Ring1][Ring2][O][C][C][Branch2][O][O][C][Branch2][O][Branch1][O][C][Branch1][P][C][Ring1][=Branch1][O][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][C][C][Branch2][Ring1][#C][C][C][Branch2][Ring1][=Branch2][C][Ring1][=Branch1][O][C][C][Branch1][S][C][Branch1][N][C][Branch1][Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][C][C][=Branch1][C][=O][N][C][C][N][C][=Branch1][C][=O][C][C][C][Branch2][Branch1][=Branch1][C][Branch2][Ring2][P][C][Branch1][Ring2][C][Ring1][=Branch1][O][C][C][Branch2][Ring2][Ring2][C][Branch1][=N][C][Branch1][=Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][N][C][=C][Branch1][Branch1][N][=N][Ring1][Branch1][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][F][O][O][N][C][=C][Branch1][Branch1][N][=N][Ring1][Branch1][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][F][C][O][O]"}", "/scratch/micpie/export/iupac_smiles/test_0-2.jsonl": "{"text":"The preferred IUPAC name of the molecule with canonical SMILES COc1cccc(\/C=N\\NC(=O)C2CCN(Cc3ccc(Br)cc3)CC2)c1 is 
1-[(4-bromophenyl)methyl]-N-[(Z)-(3-methoxyphenyl)methylideneamino]piperidine-4-carboxamide."} {"text":"The IUPAC name of the compound with canonical SMILES CC1=C(c2nc(C=O)cs2)SN=I1 is 2-(5-methyl-1lambda3-ioda-3-thia-2-azacyclopenta-1,4-dien-4-yl)-1,3-thiazole-4-carbaldehyde."}", "/scratch/micpie/export/iupac_smiles/test_3-0.jsonl": "{"text":"The traditional IUPAC name of the compound with InChI InChI=1S\/C13H20O2S\/c1-11(2)15-8-13(14)10-16-9-12-6-4-3-5-7-12\/h3-7,11,13-14H,8-10H2,1-2H3\/t13-\/m1\/s1 is (2R)-1-(benzylthio)-3-isopropoxy-propan-2-ol."} {"text":"The traditional IUPAC name of the chemical with InChI InChI=1S\/C8H6ClNO2.C4H11NO3\/c1-4-2-7-6(3-5(4)9)10-8(11)12-7;5-4(1-6,2-7)3-8\/h2-3H,1H3,(H,10,11);6-8H,1-3,5H2 is 2-amino-2-methylol-propane-1,3-diol;5-chloro-6-methyl-3H-1,3-benzoxazol-2-one."}", "/scratch/micpie/export/iupac_smiles/valid_26-3.jsonl": "{"text":"The canonical SMILES of the compound with traditional IUPAC name 2-[4-(difluoromethyl)phenyl]-4-ethyl-thiazole-5-carboxylic acid is CCc1nc(-c2ccc(C(F)F)cc2)sc1C(=O)O."} {"text":"The canonical SMILES of the compound with traditional IUPAC name 2-(4-benzylpiperazino)-1-[(5R)-5-(2-chlorophenyl)-3-(3-methoxyphenyl)-2-pyrazolin-1-yl]ethanone is COc1cccc(C2=NN(C(=O)CN3CCN(Cc4ccccc4)CC3)[C@@H](c3ccccc3Cl)C2)c1."}", "/scratch/micpie/export/iupac_smiles/train_1-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with canonical SMILES C=C(Cl)\/C=C\\c1c(NCCCCN2CCCCC2)ccnc1C is [3-[(1Z)-3-chlorobuta-1,3-dienyl]-2-methyl-4-pyridyl]-(4-piperidinobutyl)amine."} {"text":"The traditional IUPAC name of the molecule with SMILES C[C@]12CC[C@H]3[C@H]([C@@H]1CC[C@@H]2O[C@@H]4CCCCO4)C[C@@H](C5=C3C=CC(=C5)OC)O is (6S,8R,9S,13S,14S,17S)-3-methoxy-13-methyl-17-[(2R)-tetrahydropyran-2-yl]oxy-6,7,8,9,11,12,14,15,16,17-decahydrocyclopenta[a]phenanthren-6-ol."}", "/scratch/micpie/export/iupac_smiles/valid_19-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the compound with canonical SMILES 
CNCCC1CCN(CC(=O)Nc2cc(-n3cnnn3)ccc2Cl)CC1.Cl is N-[2-chloro-5-(1-tetrazolyl)phenyl]-2-[4-[2-(methylamino)ethyl]-1-piperidinyl]acetamide;hydrochloride."} {"text":"The IUAPC name in CAS-like style of the molecule with InChI InChI=1S\/C37H61N3O4\/c1-23(2)27-14-18-36(7)16-10-12-25(5)32(36)34(27)38-29(41)20-40(22-31(43)44-9)21-30(42)39-35-28(24(3)4)15-19-37(8)17-11-13-26(6)33(35)37\/h23-24,27-28,32-35H,5-6,10-22H2,1-4,7-9H3,(H,38,41)(H,39,42) is 2-[bis[2-[(4a-methyl-8-methylene-2-propan-2-yl-1,2,3,4,5,6,7,8a-octahydronaphthalen-1-yl)amino]-2-oxoethyl]amino]acetic acid methyl ester."}", "/scratch/micpie/export/iupac_smiles/test_5-2.jsonl": "{"text":"The preferred IUPAC name of the molecule with canonical SMILES CCC1CNCCC1(O)C(C)(CC)CN is 4-(1-amino-2-methylbutan-2-yl)-3-ethylpiperidin-4-ol."} {"text":"The preferred IUPAC name of the compound with canonical SMILES CC(C)C(CC(=O)Nc1ccccc1C(=O)NCCN)c1ccccc1 is N-(2-aminoethyl)-2-[(4-methyl-3-phenylpentanoyl)amino]benzamide."}", "/scratch/micpie/export/iupac_smiles/test_20-7.jsonl": "{"text":"Task: Please generate the SMILES of a compound given the traditional IUPAC name.\nIUPAC name: (2S,8S,12S)-3,5,9,11-tetraketo-4,10-bis(o-phenetyl)-4,10-diazatetracyclo[5.5.2.02,6.08,12]tetradec-13-ene-13-carboxylic acid phenyl ester\nResult: InChI=1S\/C35H30N2O8\/c1-3-43-24-16-10-8-14-22(24)36-31(38)27-20-18-21(35(42)45-19-12-6-5-7-13-19)26(29(27)33(36)40)30-28(20)32(39)37(34(30)41)23-15-9-11-17-25(23)44-4-2\/h5-18,20,26-30H,3-4H2,1-2H3\/t20?,26?,27-,28?,29-,30-\/m0\/s1"} {"text":"Task: Please create the SMILES of a compound given the traditional IUPAC name.\nIUPAC name: (2S)-2-[3-benzoyloxy-2-[3-[4-(3-fluorophenyl)triazol-1-yl]-5-[2-[4-[4-(3-fluorophenyl)triazol-1-yl]-3,5-dihydroxy-6-methylol-tetrahydropyran-2-yl]oxyethylcarbamoyl]-2-(3,4,5-trihydroxy-6-methyl-tetrahydropyran-2-yl)oxy-cyclohexoxy]-5-hydroxy-6-methylol-tetrahydropyran-4-yl]oxy-3-cyclohexyl-propionic acid\nResult: 
InChI=1S\/C59H73F2N7O20\/c1-29-45(71)49(75)50(76)58(82-29)88-51-39(67-25-37(63-65-67)32-14-8-16-35(60)21-32)23-34(54(77)62-18-19-81-57-48(74)44(46(72)42(27-69)85-57)68-26-38(64-66-68)33-15-9-17-36(61)22-33)24-40(51)84-59-53(87-56(80)31-12-6-3-7-13-31)52(47(73)43(28-70)86-59)83-41(55(78)79)20-30-10-4-2-5-11-30\/h3,6-9,12-17,21-22,25-26,29-30,34,39-53,57-59,69-76H,2,4-5,10-11,18-20,23-24,27-28H2,1H3,(H,62,77)(H,78,79)\/t29?,34?,39?,40?,41-,42?,43?,44?,45?,46?,47?,48?,49?,50?,51?,52?,53?,57?,58?,59?\/m0\/s1"}", "/scratch/micpie/export/iupac_smiles/test_11-8.jsonl": "{"text":"Task: Please create the SMILES of a molecule based on the systematic IUPAC name.\nIUPAC name: 3-[2,4-bis(fluoranyl)phenoxy]-N,N-dimethyl-3-phenyl-propan-1-amine;2,5-dinitrobenzoic acid\nResult: CN(C)CCC(Oc1ccc(F)cc1F)c1ccccc1.O=C(O)c1cc([N+](=O)[O-])ccc1[N+](=O)[O-]"} {"text":"Task: Please create the SMILES of a molecule given the systematic IUPAC name.\nIUPAC name: 4-[7-(4-hydroxyphenyl)-9,9-dioctyl-fluoren-2-yl]benzaldehyde\nResult: InChI=1S\/C42H50O2\/c1-3-5-7-9-11-13-27-42(28-14-12-10-8-6-4-2)40-29-35(33-17-15-32(31-43)16-18-33)21-25-38(40)39-26-22-36(30-41(39)42)34-19-23-37(44)24-20-34\/h15-26,29-31,44H,3-14,27-28H2,1-2H3"}", "/scratch/micpie/export/iupac_smiles/test_17-8.jsonl": "{"text":"Task: Please generate the SMILES of a molecule based on the systematic IUPAC name.\nIUPAC name: [4-[2-(2H-1,2,3,4-tetrazol-5-yl)phenyl]phenyl]methyl 2-ethoxy-1H-benzimidazole-4-carboxylate\nResult: InChI=1S\/C24H20N6O3\/c1-2-32-24-25-20-9-5-8-19(21(20)26-24)23(31)33-14-15-10-12-16(13-11-15)17-6-3-4-7-18(17)22-27-29-30-28-22\/h3-13H,2,14H2,1H3,(H,25,26)(H,27,28,29,30)"} {"text":"Task: Please create the SMILES of a chemical given the systematic IUPAC name.\nIUPAC name: 2-[4-[2-[3-[(2E,4Z)-hexa-2,4-dien-3-yl]-3-methyl-5-(3-methylcyclohexa-1,5-dien-1-yl)cyclohexa-1,5-dien-1-yl]-3-iodanyl-10-(4-pyridin-2-ylphenyl)-3,4-dihydroanthracen-9-yl]phenyl]pyridine\nResult: 
[C][\/C][=C][\\C][=Branch1][Ring1][=C][\/C][\\C][Branch2][#Branch1][=Branch1][C][C][=Branch2][=Branch1][=Branch1][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][C][=C][Branch2][Ring2][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][=Branch2][=C][Ring1][#Branch2][C][C][Ring1][=C][I][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=N][Ring1][=Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=N][Ring1][=Branch1][C][=C][C][Branch1][=Branch1][C][C][=C][Ring1][=Branch1][C][C]"}", "/scratch/micpie/export/iupac_smiles/valid_15-3.jsonl": "{"text":"The canonical SMILES of the compound with traditional IUPAC name methanesulfonic acid 2-[5-[5-(4,4-difluoropiperidine-1-carbonyl)-2-pyridyl]-7-(trifluoromethyl)benzofuran-2-yl]ethyl ester is CS(=O)(=O)OCCc1cc2cc(-c3ccc(C(=O)N4CCC(F)(F)CC4)cn3)cc(C(F)(F)F)c2o1."} {"text":"The DeepSMILES of the compound with traditional IUPAC name 5-chloro-4-(chloromethyl)-2-(1-naphthoxy)pyridine is C=CC=CC=C6)C=CC=C6OC=NC=CC=C6)CCl)))Cl."}", "/scratch/micpie/export/iupac_smiles/train_27-6.jsonl": "{"text":"The SELFIES of the molecule with preferred IUPAC name 1-[(3R)-3-(4-chlorophenyl)-5-(4-fluorophenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(2-fluorophenyl)piperazin-1-yl]ethanone is [C][C][N][Branch2][Ring2][=N][C][C][N][Ring1][=Branch1][C][C][=Branch1][C][=O][N][C@H1][Branch2][Ring1][Ring1][C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][F][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl][C][=C][C][=C][C][=C][Ring1][=Branch1][F]."} {"text":"The canonical SMILES of the compound with preferred IUPAC name 6-[4-[4-[(9,9-dimethylfluoren-2-yl)amino]phenyl]-2,3,5,6-tetrahydroxyphenyl]benzene-1,2,3,4,5-pentol is CC1(C)c2ccccc2-c2ccc(Nc3ccc(-c4c(O)c(O)c(-c5c(O)c(O)c(O)c(O)c5O)c(O)c4O)cc3)cc21."}", "/scratch/micpie/export/iupac_smiles/valid_19-6.jsonl": "{"text":"The SMILES of the compound with preferred IUPAC name 
N-[2-chloro-5-(tetrazol-1-yl)phenyl]-2-[4-[2-(methylamino)ethyl]piperidin-1-yl]acetamide;hydrochloride is CNCCC1CCN(CC1)CC(=O)NC2=C(C=CC(=C2)N3C=NN=N3)Cl.Cl."} {"text":"The InChI of the molecule with preferred IUPAC name methyl 2-[bis[2-[(4a-methyl-8-methylidene-2-propan-2-yl-1,2,3,4,5,6,7,8a-octahydronaphthalen-1-yl)amino]-2-oxoethyl]amino]acetate is InChI=1S\/C37H61N3O4\/c1-23(2)27-14-18-36(7)16-10-12-25(5)32(36)34(27)38-29(41)20-40(22-31(43)44-9)21-30(42)39-35-28(24(3)4)15-19-37(8)17-11-13-26(6)33(35)37\/h23-24,27-28,32-35H,5-6,10-22H2,1-4,7-9H3,(H,38,41)(H,39,42)."}", "/scratch/micpie/export/iupac_smiles/train_10-4.jsonl": "{"text":"The DeepSMILES of the molecule with systematic IUPAC name tert-butyl N-[4-[[3-[bis(oxidanyl)methyl]-5-chloranyl-2-ethyl-phenyl]-ethyl-amino]cyclohexyl]carbamate is CCC=CC=CC=C6NCC))CCCCCC6))NC=O)OCC)C)C))))))))))))Cl)))CO)O."} {"text":"The canonical SMILES of the molecule with systematic IUPAC name [(3aS)-3a-carbonofluoridoyl-5a,5b,8,8,11a-pentamethyl-1-propan-2-yl-1,2,3,4,5,6,7,7a,9,10,11,11b,12,13,13a,13b-hexadecahydrocyclopenta[a]chrysen-9-yl] ethanoate is CC(=O)OC1CCC2(C)C(CCC3(C)C2CCC2C4C(C(C)C)CC[C@]4(C(=O)F)CCC23C)C1(C)C."}", "/scratch/micpie/export/iupac_smiles/train_2-3.jsonl": "{"text":"The DeepSMILES of the chemical with traditional IUPAC name (6R,8R,9S,13S,14S,17S)-3-methoxy-13-methyl-17-[(2S)-tetrahydropyran-2-yl]oxy-6,7,8,9,11,12,14,15,16,17-decahydrocyclopenta[a]phenanthren-6-ol is C[C@]CC[C@H][C@H][C@@H]6CC[C@@H]9O[C@H]CCCCO6)))))))))))C[C@H]C=C6C=CC=C6)OC)))))))O."} {"text":"The SMILES of the chemical with traditional IUPAC name (1S,3R,5S,7R)-3-hydroxy-5-methyl-adamantane-1-carboxylate is C[C@]12C[C@@H]3C[C@](C1)(C[C@@](C3)(C2)O)C(=O)[O-]."}", "/scratch/micpie/export/iupac_smiles/train_26-4.jsonl": "{"text":"The DeepSMILES of the compound with systematic IUPAC name 2-(4-chloranyl-3-fluoranyl-phenyl)-4-ethyl-1,3-thiazole-5-carboxylic acid is CCC=CSC=N5)C=CC=CC=C6))Cl))F))))))C=O)O."} {"text":"The canonical SMILES of 
the molecule with systematic IUPAC name 1-[(3S)-3-(4-chlorophenyl)-5-(4-fluorophenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(2-fluorophenyl)piperazin-1-yl]ethanone is O=C(CN1CCN(c2ccccc2F)CC1)N1N=C(c2ccc(F)cc2)C[C@H]1c1ccc(Cl)cc1."}", "/scratch/micpie/export/iupac_smiles/valid_18-4.jsonl": "{"text":"The SMILES of the chemical with systematic IUPAC name 8-(1-methylpyrrolidin-3-yl)-6-pyridin-3-yl-quinazoline is CN1CCC(C1)C2=CC(=CC3=CN=CN=C23)C4=CN=CC=C4."} {"text":"The SMILES of the chemical with systematic IUPAC name 2-[4-[2-(methylamino)ethyl]piperidin-1-yl]-1-(2-thiophen-2-ylpyrrolidin-1-yl)propan-1-one is CC(C(=O)N1CCCC1C2=CC=CS2)N3CCC(CC3)CCNC."}", "/scratch/micpie/export/iupac_smiles/valid_16-8.jsonl": "{"text":"Task: Please create the SMILES representation of a molecule given the systematic IUPAC name.\nIUPAC name: 5-chloranyl-4-(chloromethyl)-2-(3-iodanylphenoxy)pyridine\nResult: C1=CC(=CC(=C1)I)OC2=NC=C(C(=C2)CCl)Cl"} {"text":"Task: Please give me the SMILES of a compound given the systematic IUPAC name.\nIUPAC name: N-[2-[2,5-bis(chloranyl)phenyl]ethyl]-2-ethoxy-3-[[4-[2-(2H-1,2,3,4-tetrazol-5-yl)phenyl]phenyl]methyl]benzimidazole-4-carboxamide\nResult: CCOC=NC=CC=CC=C6N9CC=CC=CC=C6))C=CC=CC=C6C=NNN=N5))))))))))))))))))C=O)NCCC=CC=CC=C6)Cl))))Cl"}", "/scratch/micpie/export/iupac_smiles/train_0-6.jsonl": "{"text":"The SELFIES of the chemical with IUPAC name N-[(Z)-(3-bromo-4,5-dimethoxyphenyl)methylideneamino]-1-[(4-nitrophenyl)methyl]pyrazole-3-carboxamide is [C][O][C][=C][Branch2][Ring2][=N][C][=Branch2][Ring2][Branch2][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][\/C][=N][\\N][C][=Branch1][C][=O][C][=N][N][Branch1][Branch1][C][=C][Ring1][Branch1][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N+1][=Branch1][C][=O][O-1][Br][O][C]."} {"text":"The InChI of the chemical with preferred IUPAC name 1-[4-[3-(2-methylpiperidin-1-yl)propoxy]phenyl]ethanone is 
InChI=1S\/C17H25NO2\/c1-14-6-3-4-11-18(14)12-5-13-20-17-9-7-16(8-10-17)15(2)19\/h7-10,14H,3-6,11-13H2,1-2H3."}", "/scratch/micpie/export/iupac_smiles/train_14-6.jsonl": "{"text":"The SELFIES of the chemical with IUPAC name (1,3,5-trimethylpyrazol-4-yl) (3R)-5-oxo-1-(2,2,2-trifluoroethyl)pyrrolidine-3-carboxylate is [C][C][=C][Branch1][#Branch2][C][=Branch1][=Branch1][=N][N][Ring1][Branch1][C][C][O][C][=Branch1][C][=O][C@@H1][C][C][=Branch1][C][=O][N][Branch1][Ring2][C][Ring1][=Branch1][C][C][Branch1][C][F][Branch1][C][F][F]."} {"text":"The SELFIES of the compound with IUPAC name [7-chloro-5-[4-(3-fluoropyrrolidin-1-yl)sulfonylphenyl]-1-benzofuran-2-yl]methanamine is [C][C][N][Branch1][=Branch1][C][C][Ring1][Branch1][F][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=Branch1][P][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][Branch1][Ring2][O][Ring1][=Branch1][C][N][Cl]."}", "/scratch/micpie/export/iupac_smiles/valid_0-6.jsonl": "{"text":"The DeepSMILES of the chemical with preferred IUPAC name 1-[(4-bromophenyl)methyl]-N-[(Z)-1-(4-methylphenyl)ethylideneamino]pyrazole-3-carboxamide is CC=CC=CC=C6))\/C=N\\NC=O)C=NNC=C5))CC=CC=CC=C6))Br))))))))))))\/C."} {"text":"The SMILES of the molecule with preferred IUPAC name ethene;4-methoxy-4,6-dimethyloxane-2,5-diol is CC1C(C(CC(O1)O)(C)OC)O.C=C."}", "/scratch/micpie/export/iupac_smiles/test_16-3.jsonl": "{"text":"The SMILES of the chemical with traditional IUPAC name 5-chloro-4-(chloromethyl)-2-(4-propoxyphenoxy)pyridine is CCCOC1=CC=C(C=C1)OC2=NC=C(C(=C2)CCl)Cl."} {"text":"The SMILES of the compound with traditional IUPAC name 2-ethoxy-N-(4-fluorobenzyl)-3-[4-[2-(2H-tetrazol-5-yl)phenyl]benzyl]benzimidazole-4-carboxamide is CCOC1=NC2=CC=CC(=C2N1CC3=CC=C(C=C3)C4=CC=CC=C4C5=NNN=N5)C(=O)NCC6=CC=C(C=C6)F."}", "/scratch/micpie/export/iupac_smiles/test_13-4.jsonl": "{"text":"The SMILES of the compound with systematic IUPAC name 
2-azanyl-3,3-dimethyl-N-[2-methyl-1,3-bis(oxidanyl)propan-2-yl]butanamide is CC(C)(C)C(C(=O)NC(C)(CO)CO)N."} {"text":"The InChI of the molecule with systematic IUPAC name (1,3,5-trimethylpyrazol-4-yl) (4S)-3,4-dihydro-2H-chromene-4-carboxylate is InChI=1S\/C16H18N2O3\/c1-10-15(11(2)18(3)17-10)21-16(19)13-8-9-20-14-7-5-4-6-12(13)14\/h4-7,13H,8-9H2,1-3H3\/t13-\/m0\/s1."}", "/scratch/micpie/export/iupac_smiles/valid_13-9.jsonl": "{"text":"Task: Please generate the SMILES representation of a chemical given the CAS-like IUPAC name.\nIUPAC name: 2-[[(2-amino-3,3-dimethyl-1-oxobutyl)amino]methyl]-2-methylbutanoic acid\nResult: CCCC)CNC=O)CCC)C)C))N)))))C=O)O"} {"text":"Task: Please generate the SMILES of a chemical given the IUAPC name in CAS-like style.\nIUPAC name: (2-chloro-4-pyridinyl)-[(2R)-2-methyl-1,2,3,4-tetrahydro-1,5-benzodiazepin-5-yl]methanone\nResult: C[C@@H]1CCN(C(=O)c2ccnc(Cl)c2)c2ccccc2N1"}", "/scratch/micpie/export/iupac_smiles/train_6-9.jsonl": "{"text":"Task: Please create the SMILES of a chemical based on the IUAPC name in CAS-like style.\nIUPAC name: N-(2-aminoethyl)-2-[[1-oxo-2-(2-oxolanylmethoxy)propyl]amino]benzamide\nResult: InChI=1S\/C17H25N3O4\/c1-12(24-11-13-5-4-10-23-13)16(21)20-15-7-3-2-6-14(15)17(22)19-9-8-18\/h2-3,6-7,12-13H,4-5,8-11,18H2,1H3,(H,19,22)(H,20,21)"} {"text":"Task: Please generate the SMILES representation of a molecule based on the CAS-like IUPAC name.\nIUPAC name: 3-amino-4-fluoro-N-[(1S)-1-phenylethyl]benzenesulfonamide\nResult: C[C@H](NS(=O)(=O)c1ccc(F)c(N)c1)c1ccccc1"}", "/scratch/micpie/export/iupac_smiles/valid_15-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the compound with SMILES CS(=O)(=O)OCCC1=CC2=CC(=CC(=C2O1)C(F)(F)F)C3=NC=C(C=C3)C(=O)N4CCC(CC4)(F)F is methanesulfonic acid 2-[5-[5-[(4,4-difluoro-1-piperidinyl)-oxomethyl]-2-pyridinyl]-7-(trifluoromethyl)-2-benzofuranyl]ethyl ester."} {"text":"The IUAPC name in CAS-like style of the molecule with SELFIES 
[C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][C][=C][Ring1][#Branch1][O][C][=N][C][=C][Branch1][=Branch2][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][Cl][Cl] is 5-chloro-4-(chloromethyl)-2-(1-naphthalenyloxy)pyridine."}", "/scratch/micpie/export/iupac_smiles/valid_1-2.jsonl": "{"text":"The IUPAC name of the molecule with canonical SMILES C=C.C=CCN(C)CC is ethene;N-ethyl-N-methylprop-2-en-1-amine."} {"text":"The IUPAC name of the chemical with SELFIES [C][C][Branch1][C][C][N][Branch2][Ring1][Branch1][C][C@H1][Branch1][S][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][N][Ring1][Branch1][O][C][C@@H1][Branch1][S][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][N][Ring1][Branch1][O] is (2R)-1-[[(2S)-2-hydroxy-3-(1H-indol-4-yloxy)propyl]-propan-2-ylamino]-3-(1H-indol-4-yloxy)propan-2-ol."}", "/scratch/micpie/export/iupac_smiles/test_15-2.jsonl": "{"text":"The IUPAC name of the chemical with canonical SMILES Nc1ccc(\/C=C\/C(=O)NCc2cc3cc(-c4ccc(C(=O)N5CCC(F)(F)CC5)cc4)cc(-c4cc(Cl)c(F)cc4F)c3o2)cn1 is (E)-3-(6-aminopyridin-3-yl)-N-[[7-(5-chloro-2,4-difluorophenyl)-5-[4-(4,4-difluoropiperidine-1-carbonyl)phenyl]-1-benzofuran-2-yl]methyl]prop-2-enamide."} {"text":"The IUPAC name of the molecule with DeepSMILES CCC=CC5)C=CC=C6))OC=NC=CC=C6)CCl)))Cl is 5-chloro-4-(chloromethyl)-2-(2,3-dihydro-1H-inden-5-yloxy)pyridine."}", "/scratch/micpie/export/iupac_smiles/test_20-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the chemical with canonical SMILES CCOc1ccccc1N1C(=O)C2C3C=C(C(=O)Oc4ccccc4)C([C@@H]2C1=O)[C@@H]1C(=O)N(c2ccccc2OCC)C(=O)[C@@H]31 is (2S,8S,12S)-4,10-bis(2-ethoxyphenyl)-3,5,9,11-tetraoxo-4,10-diazatetracyclo[5.5.2.02,6.08,12]tetradec-13-ene-13-carboxylic acid phenyl ester."} {"text":"The IUAPC name in CAS-like style of the chemical with canonical SMILES CC1OC(OC2C(OC3OC(CO)C(O)C(O[C@@H](CC4CCCCC4)C(=O)O)C3OC(=O)c3ccccc3)CC(C(=O)NCCOC3OC(CO)C(O)C(n4cc(-c5cccc(F)c5)nn4)C3O)CC2n2cc(-c3cccc(F)c3)nn2)C(O)C(O)C1O is 
(2S)-2-[[3-benzoyloxy-2-[3-[4-(3-fluorophenyl)-1-triazolyl]-5-[[2-[[4-[4-(3-fluorophenyl)-1-triazolyl]-3,5-dihydroxy-6-(hydroxymethyl)-2-oxanyl]oxy]ethylamino]-oxomethyl]-2-[(3,4,5-trihydroxy-6-methyl-2-oxanyl)oxy]cyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)-4-oxanyl]oxy]-3-cyclohexylpropanoic acid."}", "/scratch/micpie/export/iupac_smiles/test_14-1.jsonl": "{"text":"The CAS-like IUPAC name of the molecule with DeepSMILES CC=CC=NN5C)))C))OC=O)[C@H]CC=O)NC5)CCF)F)F is (3S)-5-oxo-1-(2,2,2-trifluoroethyl)-3-pyrrolidinecarboxylic acid (1,3,5-trimethyl-4-pyrazolyl) ester."} {"text":"The IUAPC name in CAS-like style of the compound with SELFIES [C][=C][Branch1][=C][C][=Branch1][#Branch2][=C][C][=Branch1][Branch1][=C][Ring1][=Branch1][O][I][Br][C][Branch1][C][F][Branch1][C][F][F] is 4-bromo-2-iodo-5-(trifluoromethyl)phenol."}", "/scratch/micpie/export/iupac_smiles/valid_7-2.jsonl": "{"text":"The IUPAC name of the compound with InChI InChI=1S\/C14H19F3N2OS\/c1-10(2)13(20)19-8-7-18-9-11-3-5-12(6-4-11)21-14(15,16)17\/h3-6,10,18H,7-9H2,1-2H3,(H,19,20) is 2-methyl-N-[2-[[4-(trifluoromethylsulfanyl)phenyl]methylamino]ethyl]propanamide."} {"text":"The IUPAC name of the compound with InChI InChI=1S\/C13H14N2O3S2\/c14-12(16)8-9-15(11-5-2-1-3-6-11)20(17,18)13-7-4-10-19-13\/h1-7,10H,8-9H2,(H2,14,16) is 3-(N-thiophen-2-ylsulfonylanilino)propanamide."}", "/scratch/micpie/export/iupac_smiles/test_25-5.jsonl": "{"text":"The InChI of the chemical with IUAPC name in CAS-like style 5-(1,3-benzoxazol-5-yl)-N-[(1R)-1-cyclopropyl-2,2,2-trifluoroethyl]-7-methyl-3-pyrazolo[1,5-a]pyrimidinecarboxamide is InChI=1S\/C20H16F3N5O2\/c1-10-6-14(12-4-5-16-15(7-12)24-9-30-16)26-18-13(8-25-28(10)18)19(29)27-17(11-2-3-11)20(21,22)23\/h4-9,11,17H,2-3H2,1H3,(H,27,29)\/t17-\/m1\/s1."} {"text":"The SELFIES of the molecule with IUAPC name in CAS-like style 3-ethyl-5,6,7,8-tetrahydrobenzo[f]benzofuran-2-carboxylic acid is 
[C][C][C][=C][Branch2][Ring1][C][O][C][=C][Ring1][Branch1][C][=C][C][C][C][C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=Branch1][C][=O][O]."}", "/scratch/micpie/export/iupac_smiles/valid_12-9.jsonl": "{"text":"Task: Please create the SMILES representation of a compound based on the IUAPC name in CAS-like style.\nIUPAC name: 8-[6-(4,4-dioctoxy-1-oxobutoxy)hexyl-(2-hydroxyethyl)amino]octanoic acid ethyl ester\nResult: CCCCCCCCOCCCC=O)OCCCCCCNCCCCCCCC=O)OCC)))))))))))CCO))))))))))))))OCCCCCCCC"} {"text":"Task: Please create the SMILES of a molecule given the IUAPC name in CAS-like style.\nIUPAC name: 2-amino-3,3-dimethyl-N-(5,5,5-trifluoropentyl)butanamide\nResult: [C][C][Branch1][C][C][Branch1][C][C][C][Branch2][Ring1][C][C][=Branch1][C][=O][N][C][C][C][C][C][Branch1][C][F][Branch1][C][F][F][N]"}", "/scratch/micpie/export/iupac_smiles/test_6-9.jsonl": "{"text":"Task: Please give me the SMILES representation of a chemical given the CAS-like IUPAC name.\nIUPAC name: N-(2-aminoethyl)-2-[(3-methyl-1-oxo-3-phenylbutyl)amino]benzamide\nResult: InChI=1S\/C20H25N3O2\/c1-20(2,15-8-4-3-5-9-15)14-18(24)23-17-11-7-6-10-16(17)19(25)22-13-12-21\/h3-11H,12-14,21H2,1-2H3,(H,22,25)(H,23,24)"} {"text":"Task: Please create the SMILES of a chemical based on the IUAPC name in CAS-like style.\nIUPAC name: N-[(3,4-dichlorophenyl)methyl]-2,3-dihydro-1,4-benzodioxin-6-amine\nResult: [C][C][O][C][=C][Branch1][Ring2][O][Ring1][=Branch1][C][=C][C][=Branch1][Ring2][=C][Ring1][#Branch1][N][C][C][=C][C][=Branch1][=Branch2][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl][Cl]"}", "/scratch/micpie/export/iupac_smiles/valid_22-7.jsonl": "{"text":"Task: Please create the SMILES of a chemical given the traditional IUPAC name.\nIUPAC name: 1-cyclopropyl-3-ethynyl-2-(trifluoromethyl)benzene\nResult: C#CC=CC=CC=C6)))CCC3))))CF)F)F"} {"text":"Task: Please create the SMILES representation of a chemical based on the traditional IUPAC name.\nIUPAC name: 
1-(3-fluoro-4-methoxy-phenyl)sulfonylnipecotaldehyde\nResult: [C][O][C][=C][Branch2][Ring1][N][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][=O][F]"}", "/scratch/micpie/export/iupac_smiles/train_3-8.jsonl": "{"text":"Task: Please create the SMILES representation of a chemical based on the systematic IUPAC name.\nIUPAC name: (5-nitrofuran-2-yl)-pyrrolidin-1-yl-methanone\nResult: C1CCN(C1)C(=O)C2=CC=C(O2)[N+](=O)[O-]"} {"text":"Task: Please create the SMILES of a molecule given the systematic IUPAC name.\nIUPAC name: (12S,14S,17E)-7-chloranyl-12-(hydroxymethyl)-23-(phenylmethyl)-15,20-dioxa-2,4,5,9,11,23-hexazahexacyclo[19.7.1.13,10.111,14.04,8.022,27]hentriaconta-1(28),3(31),5,7,9,17,21(29),22(27)-octaen-24-one\nResult: O=C1CCc2cc3cc(c2N1Cc1ccccc1)OC\/C=C\/CO[C@H]1C[C@@H](CO)N(C1)c1cc(n2ncc(Cl)c2n1)N3"}", "/scratch/micpie/export/iupac_smiles/test_19-4.jsonl": "{"text":"The DeepSMILES of the molecule with systematic IUPAC name N-(4-chlorophenyl)-3-[4-[2-(methylamino)ethyl]piperidin-1-yl]propanamide;hydrochloride is CNCCCCCNCC6))CCC=O)NC=CC=CC=C6))Cl.Cl."} {"text":"The SMILES of the compound with systematic IUPAC name nan is C[C@H]1C=CC=C2[C@]13CC[C@H](CC(=O)C[C@H](N4C=C5C(=CC=C6C5=C4C[C@@]7([C@@H]6O)CCC8(C7)CCCC8)N(C9=CC(=CCN9)[C@H]([C@@H](C#CC2)C1=CNC=C1C3)C1=CC(=CC=C1)O)CCC(=O)C)C1=CC(=C(C(=C1)OC)OC1=CC=CC(=C1)O)O)OC(=O)C."}", "/scratch/micpie/export/iupac_smiles/valid_13-5.jsonl": "{"text":"The InChI of the compound with CAS-like IUPAC name 2-[[(2-amino-3,3-dimethyl-1-oxobutyl)amino]methyl]-2-methylbutanoic acid is InChI=1S\/C12H24N2O3\/c1-6-12(5,10(16)17)7-14-9(15)8(13)11(2,3)4\/h8H,6-7,13H2,1-5H3,(H,14,15)(H,16,17)."} {"text":"The DeepSMILES of the compound with CAS-like IUPAC name (2-chloro-4-pyridinyl)-[(2R)-2-methyl-1,2,3,4-tetrahydro-1,5-benzodiazepin-5-yl]methanone is C[C@@H]CCNC=CC=CC=C6N%11)))))))C=O)C=CC=NC=C6)))Cl."}", 
"/scratch/micpie/export/iupac_smiles/test_3-9.jsonl": "{"text":"Task: Please generate the SMILES representation of a molecule based on the IUAPC name in CAS-like style.\nIUPAC name: (2R)-1-(phenylmethylthio)-3-propan-2-yloxy-2-propanol\nResult: CC(C)OC[C@@H](O)CSCc1ccccc1"} {"text":"Task: Please generate the SMILES of a compound given the CAS-like IUPAC name.\nIUPAC name: 2-amino-2-(hydroxymethyl)propane-1,3-diol;5-chloro-6-methyl-3H-1,3-benzoxazol-2-one\nResult: [C][C][=C][C][=C][Branch1][=Branch1][C][=C][Ring1][=Branch1][Cl][N][C][=Branch1][C][=O][O][Ring1][=Branch2].[C][Branch1][O][C][Branch1][Ring1][C][O][Branch1][Ring1][C][O][N][O]"}", "/scratch/micpie/export/iupac_smiles/valid_9-7.jsonl": "{"text":"Task: Please give me the SMILES of a compound given the traditional IUPAC name.\nIUPAC name: 1-(isopropylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,9,11,13-heptaenylideneamino)oxy-propan-2-ol\nResult: CCC)NCCCON=CC=CC=CC=C6C=CC=CC=CC=C6%15))))))))))))))))))O"} {"text":"Task: Please create the SMILES representation of a molecule given the traditional IUPAC name.\nIUPAC name: 5-chloro-3-[cyclobutyl(methyl)amino]-N-[(4-cycloheptyl-2-keto-4,6-dimethyl-3-piperidyl)methyl]-2-methyl-benzamide\nResult: CCCCCC=O)N6))CNC=O)C=CC=CC=C6)Cl)))NC)CCCC4))))))C)))))))C)CCCCCCC7"}", "/scratch/micpie/export/iupac_smiles/valid_20-4.jsonl": "{"text":"The DeepSMILES of the compound with systematic IUPAC name methyl 4-acetyloxy-7-(2-oct-2-enyl-2-oxidanyl-5-oxidanylidene-cyclopent-3-en-1-ylidene)hept-5-enoate is CCCCCC=CCCC=CC=O)C5=CC=CCCCC=O)OC)))))OC=O)C)))))))))))O."} {"text":"The canonical SMILES of the molecule with systematic IUPAC name 2-[2-[2-(2-oxidanylidenepropylamino)ethylamino]ethylamino]ethanoic acid is CC(=O)CNCCNCCNCC(=O)O."}", "/scratch/micpie/export/iupac_smiles/valid_17-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the compound with DeepSMILES CCOC=NC=CC=CC=C6N9CC=CC=CC=C6))C=CC=CC=C6C=NNN=N5))))))))))))))))))C=O)NC=CC=CC=C6))F is 
2-ethoxy-N-(4-fluorophenyl)-3-[[4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl]-4-benzimidazolecarboxamide."} {"text":"The CAS-like IUPAC name of the molecule with canonical SMILES CC(C)(C)Nc1cccc(NC(=O)c2cc(C(=O)Nc3cccc(N)c3)cc(C(=O)Nc3cccc(NC(C)(C)C)c3)c2)c1 is N3-(3-aminophenyl)-N1,N5-bis[3-(tert-butylamino)phenyl]benzene-1,3,5-tricarboxamide."}", "/scratch/micpie/export/iupac_smiles/test_15-3.jsonl": "{"text":"The DeepSMILES of the chemical with traditional IUPAC name (E)-3-(6-amino-3-pyridyl)-N-[[7-(5-chloro-2,4-difluoro-phenyl)-5-[4-(4,4-difluoropiperidine-1-carbonyl)phenyl]benzofuran-2-yl]methyl]acrylamide is CCNCCC6F)F))))C=O)C=CC=CC=C6))C=CC=CC=C6)C=CO5)CNC=O)\/C=C\/C=CN=CC=C6))N))))))))))))))C=CC=CC=C6F)))F))Cl."} {"text":"The InChI of the molecule with traditional IUPAC name 5-chloro-4-(chloromethyl)-2-indan-5-yloxy-pyridine is InChI=1S\/C15H13Cl2NO\/c16-8-12-7-15(18-9-14(12)17)19-13-5-4-10-2-1-3-11(10)6-13\/h4-7,9H,1-3,8H2."}", "/scratch/micpie/export/iupac_smiles/test_14-5.jsonl": "{"text":"The DeepSMILES of the molecule with IUAPC name in CAS-like style (3S)-5-oxo-1-(2,2,2-trifluoroethyl)-3-pyrrolidinecarboxylic acid (1,3,5-trimethyl-4-pyrazolyl) ester is CC=CC=NN5C)))C))OC=O)[C@H]CC=O)NC5)CCF)F)F."} {"text":"The DeepSMILES of the molecule with CAS-like IUPAC name 4-bromo-2-iodo-5-(trifluoromethyl)phenol is C=CC=CC=C6O))I)))Br))CF)F)F."}", "/scratch/micpie/export/iupac_smiles/valid_27-4.jsonl": "{"text":"The SMILES of the compound with systematic IUPAC name 1-[(3S)-3-(4-chlorophenyl)-5-(4-fluorophenyl)-3,4-dihydropyrazol-2-yl]-2-[methyl(phenyl)amino]ethanone is CN(CC(=O)N1[C@@H](CC(=N1)C2=CC=C(C=C2)F)C3=CC=C(C=C3)Cl)C4=CC=CC=C4."} {"text":"The InChI of the chemical with systematic IUPAC name 6-[7'-[(9,9-dimethylfluoren-2-yl)amino]-9,9'-spirobi[fluorene]-2'-yl]benzene-1,2,3,4,5-pentol is 
InChI=1S\/C46H33NO5\/c1-45(2)33-12-6-3-9-27(33)30-19-16-25(22-36(30)45)47-26-17-20-32-31-18-15-24(39-40(48)42(50)44(52)43(51)41(39)49)21-37(31)46(38(32)23-26)34-13-7-4-10-28(34)29-11-5-8-14-35(29)46\/h3-23,47-52H,1-2H3."}", "/scratch/micpie/export/iupac_smiles/test_17-0.jsonl": "{"text":"The traditional IUPAC name of the compound with DeepSMILES CCOC=NC=CC=CC=C6N9)))))C=O)OCC=CC=CC=C6))C=CC=CC=C6C=NNN=N5 is 2-ethoxy-1H-benzimidazole-4-carboxylic acid [4-[2-(2H-tetrazol-5-yl)phenyl]benzyl] ester."} {"text":"The traditional IUPAC name of the chemical with SELFIES [C][\/C][=C][\\C][=Branch1][Ring1][=C][\/C][\\C][Branch2][#Branch1][=Branch1][C][C][=Branch2][=Branch1][=Branch1][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][C][=C][Branch2][Ring2][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][=Branch2][=C][Ring1][#Branch2][C][C][Ring1][=C][I][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=N][Ring1][=Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=N][Ring1][=Branch1][C][=C][C][Branch1][=Branch1][C][C][=C][Ring1][=Branch1][C][C] is 2-[4-[2-[3-[(Z,1E)-1-ethylidenebut-2-enyl]-3-methyl-5-(3-methylcyclohexa-1,5-dien-1-yl)cyclohexa-1,5-dien-1-yl]-3-iodo-10-[4-(2-pyridyl)phenyl]-3,4-dihydroanthracen-9-yl]phenyl]pyridine."}", "/scratch/micpie/export/iupac_smiles/valid_12-8.jsonl": "{"text":"Task: Please give me the SMILES of a compound given the systematic IUPAC name.\nIUPAC name: ethyl 8-[6-(4,4-dioctoxybutanoyloxy)hexyl-(2-hydroxyethyl)amino]octanoate\nResult: [C][C][C][C][C][C][C][C][O][C][Branch2][Ring2][C][C][C][C][=Branch1][C][=O][O][C][C][C][C][C][C][N][Branch1][#C][C][C][C][C][C][C][C][C][=Branch1][C][=O][O][C][C][C][C][O][O][C][C][C][C][C][C][C][C]"} {"text":"Task: Please generate the SMILES representation of a molecule based on the systematic IUPAC name.\nIUPAC name: 2-azanyl-3,3-dimethyl-N-[5,5,5-tris(fluoranyl)pentyl]butanamide\nResult: 
[C][C][Branch1][C][C][Branch1][C][C][C][Branch2][Ring1][C][C][=Branch1][C][=O][N][C][C][C][C][C][Branch1][C][F][Branch1][C][F][F][N]"}", "/scratch/micpie/export/iupac_smiles/train_25-6.jsonl": "{"text":"The SMILES of the molecule with preferred IUPAC name N-(4-butoxycyclohexyl)-5-(5-chloropyridin-3-yl)-7-methylpyrazolo[1,5-a]pyrimidine-3-carboxamide is CCCCOC1CCC(CC1)NC(=O)C2=C3N=C(C=C(N3N=C2)C)C4=CC(=CN=C4)Cl."} {"text":"The InChI of the chemical with preferred IUPAC name 8-ethyl-2,3-dihydrothieno[2,3-g][1,4]benzodioxine-7-carboxylic acid is InChI=1S\/C13H12O4S\/c1-2-7-8-5-9-10(17-4-3-16-9)6-11(8)18-12(7)13(14)15\/h5-6H,2-4H2,1H3,(H,14,15)."}", "/scratch/micpie/export/iupac_smiles/test_2-6.jsonl": "{"text":"The InChI of the chemical with preferred IUPAC name (2R)-2-(methylamino)-1-(2-methylphenyl)propan-1-one is InChI=1S\/C11H15NO\/c1-8-6-4-5-7-10(8)11(13)9(2)12-3\/h4-7,9,12H,1-3H3\/t9-\/m1\/s1."} {"text":"The SELFIES of the compound with IUPAC name 2-(6-bromo-2,3-dihydro-1,4-benzodioxin-7-yl)-4,5,6,7-tetrachloroisoindole-1,3-dione is [C][C][O][C][=C][Branch1][Ring2][O][Ring1][=Branch1][C][=C][Branch1][Branch2][C][=Branch1][Ring2][=C][Ring1][#Branch1][Br][N][C][=Branch1][C][=O][C][=C][Branch1][Branch1][C][Ring1][=Branch1][=O][C][=Branch1][=N][=C][Branch1][=Branch2][C][=Branch1][Branch1][=C][Ring1][Branch2][Cl][Cl][Cl][Cl]."}", "/scratch/micpie/export/iupac_smiles/valid_5-5.jsonl": "{"text":"The InChI of the chemical with CAS-like IUPAC name 4-[1-(aminomethyl)cyclooctyl]-3-ethyl-4-piperidinol is InChI=1S\/C16H32N2O\/c1-2-14-12-18-11-10-16(14,19)15(13-17)8-6-4-3-5-7-9-15\/h14,18-19H,2-13,17H2,1H3."} {"text":"The InChI of the compound with CAS-like IUPAC name N-(2-aminoethyl)-2-[[3-[[cyclopentyl(oxo)methyl]amino]-1-oxobutyl]amino]benzamide;hydrochloride is InChI=1S\/C19H28N4O3.ClH\/c1-13(22-18(25)14-6-2-3-7-14)12-17(24)23-16-9-5-4-8-15(16)19(26)21-11-10-20;\/h4-5,8-9,13-14H,2-3,6-7,10-12,20H2,1H3,(H,21,26)(H,22,25)(H,23,24);1H."}", 
"/scratch/micpie/export/iupac_smiles/valid_11-6.jsonl": "{"text":"The canonical SMILES of the chemical with IUPAC name 13-chloro-2-piperidin-4-ylidene-4-azatricyclo[9.4.0.03,8]pentadeca-1(11),3(8),4,6,12,14-hexaene;disulfate is Clc1ccc2c(c1)CCc1cccnc1C2=C1CCNCC1.O=S(=O)([O-])[O-].O=S(=O)([O-])[O-]."} {"text":"The SMILES of the chemical with preferred IUPAC name 4-(7-bromo-9,9-dioctylfluoren-2-yl)benzaldehyde is CCCCCCCCC1(C2=C(C=CC(=C2)C3=CC=C(C=C3)C=O)C4=C1C=C(C=C4)Br)CCCCCCCC."}", "/scratch/micpie/export/iupac_smiles/train_21-0.jsonl": "{"text":"The traditional IUPAC name of the compound with InChI InChI=1S\/C68H88F2N8O22\/c1-3-35-23-39(28-47(59(35)100-67-58(87)57(86)52(81)33(2)93-67)96-68-61(99-65(92)36-14-8-5-9-15-36)60(55(84)50(32-80)98-68)94-48(64(90)91)22-34-12-6-4-7-13-34)62(88)71-20-21-72-63(89)40-26-45(77-29-43(73-75-77)37-16-10-18-41(69)24-37)53(82)46(27-40)95-66-56(85)51(54(83)49(31-79)97-66)78-30-44(74-76-78)38-17-11-19-42(70)25-38\/h5,8-11,14-19,24-25,29-30,33-35,39-40,45-61,66-68,79-87H,3-4,6-7,12-13,20-23,26-28,31-32H2,1-2H3,(H,71,88)(H,72,89)(H,90,91)\/t33?,35?,39?,40?,45?,46?,47?,48-,49?,50?,51?,52?,53?,54?,55?,56?,57?,58?,59?,60?,61?,66?,67?,68?\/m0\/s1 is (2S)-2-[3-benzoyloxy-2-[3-ethyl-5-[2-[[3-[4-(3-fluorophenyl)triazol-1-yl]-5-[4-[4-(3-fluorophenyl)triazol-1-yl]-3,5-dihydroxy-6-methylol-tetrahydropyran-2-yl]oxy-4-hydroxy-cyclohexanecarbonyl]amino]ethylcarbamoyl]-2-(3,4,5-trihydroxy-6-methyl-tetrahydropyran-2-yl)oxy-cyclohexoxy]-5-hydroxy-6-methylol-tetrahydropyran-4-yl]oxy-3-cyclohexyl-propionic acid."} {"text":"The traditional IUPAC name of the chemical with SELFIES [C][=C][C][=Branch1][=Branch1][=N][N][=C][Ring1][=Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][O][C][Branch1][C][F][Branch1][C][F][F] is 3-[4-(trifluoromethoxy)phenyl]pyridazine."}", "/scratch/micpie/export/iupac_smiles/test_22-2.jsonl": "{"text":"The IUPAC name of the chemical with canonical SMILES O=C1Oc2cccc3c2c(nn3Br)O1 is 
2-bromo-5,7-dioxa-2,3-diazatricyclo[6.3.1.04,12]dodeca-1(12),3,8,10-tetraen-6-one."} {"text":"The IUPAC name of the compound with SELFIES [C][C][C][Branch2][Ring1][P][C][N][Branch1][Ring2][C][Ring1][=Branch1][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][Ring1][Branch2][C][=O] is 1-[(2-oxo-1,3-dihydroindol-5-yl)sulfonyl]piperidine-3-carbaldehyde."}", "/scratch/micpie/export/iupac_smiles/valid_14-8.jsonl": "{"text":"Task: Please generate the SMILES representation of a chemical based on the systematic IUPAC name.\nIUPAC name: tert-butyl N-[[(1S,2S)-2-(furan-2-ylcarbonylamino)cyclopentyl]methyl]carbamate\nResult: InChI=1S\/C16H24N2O4\/c1-16(2,3)22-15(20)17-10-11-6-4-7-12(11)18-14(19)13-8-5-9-21-13\/h5,8-9,11-12H,4,6-7,10H2,1-3H3,(H,17,20)(H,18,19)\/t11-,12-\/m0\/s1"} {"text":"Task: Please give me the SMILES of a molecule based on the systematic IUPAC name.\nIUPAC name: tert-butyl N-[2-(5-bromanyl-7-chloranyl-1-benzofuran-2-yl)ethyl]carbamate\nResult: InChI=1S\/C15H17BrClNO3\/c1-15(2,3)21-14(19)18-5-4-11-7-9-6-10(16)8-12(17)13(9)20-11\/h6-8H,4-5H2,1-3H3,(H,18,19)"}", "/scratch/micpie/export/iupac_smiles/test_17-3.jsonl": "{"text":"The DeepSMILES of the compound with traditional IUPAC name 2-ethoxy-1H-benzimidazole-4-carboxylic acid [4-[2-(2H-tetrazol-5-yl)phenyl]benzyl] ester is CCOC=NC=CC=CC=C6N9)))))C=O)OCC=CC=CC=C6))C=CC=CC=C6C=NNN=N5."} {"text":"The canonical SMILES of the chemical with traditional IUPAC name 2-[4-[2-[3-[(Z,1E)-1-ethylidenebut-2-enyl]-3-methyl-5-(3-methylcyclohexa-1,5-dien-1-yl)cyclohexa-1,5-dien-1-yl]-3-iodo-10-[4-(2-pyridyl)phenyl]-3,4-dihydroanthracen-9-yl]phenyl]pyridine is C\/C=C\\C(=C\/C)C1(C)C=C(C2=Cc3c(c(-c4ccc(-c5ccccn5)cc4)c4ccccc4c3-c3ccc(-c4ccccn4)cc3)CC2I)C=C(C2=CC(C)CC=C2)C1."}", "/scratch/micpie/export/iupac_smiles/train_15-2.jsonl": "{"text":"The IUPAC name of the chemical with canonical SMILES Cc1nccc2c1[nH]c1cc(C(=O)O)ccc12 is 
1-methyl-9H-pyrido[3,4-b]indole-7-carboxylic acid."} {"text":"The IUPAC name of the compound with canonical SMILES ClCc1cc(Oc2cccc3cccnc23)ncc1Cl is 8-[5-chloro-4-(chloromethyl)pyridin-2-yl]oxyquinoline."}", "/scratch/micpie/export/iupac_smiles/train_2-4.jsonl": "{"text":"The canonical SMILES of the compound with systematic IUPAC name (6R,8R,9S,13S,14S,17S)-3-methoxy-13-methyl-17-[(2S)-oxan-2-yl]oxy-6,7,8,9,11,12,14,15,16,17-decahydrocyclopenta[a]phenanthren-6-ol is COc1ccc2c(c1)[C@H](O)C[C@@H]1[C@@H]2CC[C@]2(C)[C@@H](O[C@H]3CCCCO3)CC[C@@H]12."} {"text":"The canonical SMILES of the compound with systematic IUPAC name (1S,3S,5R,7R)-3-methyl-5-oxidanyl-adamantane-1-carboxylate is C[C@@]12C[C@H]3C[C@@](O)(C1)C[C@](C(=O)[O-])(C3)C2."}", "/scratch/micpie/export/iupac_smiles/test_21-5.jsonl": "{"text":"The InChI of the compound with IUAPC name in CAS-like style benzoic acid [4-[(2S)-1-(1-azetidinyl)-3-cyclohexyl-1-oxopropan-2-yl]oxy-2-[5-[[2-[[[3-[[3,5-dihydroxy-6-(hydroxymethyl)-4-[(2-oxo-1-benzopyran-3-yl)methoxy]-2-oxanyl]oxy]-4-hydroxy-5-[(2-oxo-1-benzopyran-3-yl)methoxy]cyclohexyl]-oxomethyl]amino]ethylamino]-oxomethyl]-3-ethyl-2-[(3,4,5-trihydroxy-6-methyl-2-oxanyl)oxy]cyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)-3-oxanyl] ester is InChI=1S\/C75H97N3O27\/c1-3-40-28-44(33-52(63(40)105-73-61(86)60(85)56(81)38(2)96-73)101-75-66(104-70(91)41-17-8-5-9-18-41)65(59(84)55(35-80)103-75)97-53(69(90)78-25-14-26-78)27-39-15-6-4-7-16-39)67(88)76-23-24-77-68(89)45-31-50(94-36-46-29-42-19-10-12-21-48(42)98-71(46)92)57(82)51(32-45)100-74-62(87)64(58(83)54(34-79)102-74)95-37-47-30-43-20-11-13-22-49(43)99-72(47)93\/h5,8-13,17-22,29-30,38-40,44-45,50-66,73-75,79-87H,3-4,6-7,14-16,23-28,31-37H2,1-2H3,(H,76,88)(H,77,89)\/t38?,40?,44?,45?,50?,51?,52?,53-,54?,55?,56?,57?,58?,59?,60?,61?,62?,63?,64?,65?,66?,73?,74?,75?\/m0\/s1."} {"text":"The SELFIES of the molecule with CAS-like IUPAC name 2-(methoxymethyl)-3-[4-(trifluoromethoxy)phenyl]pyrazine is 
[C][O][C][C][=N][C][=C][N][=C][Ring1][=Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][O][C][Branch1][C][F][Branch1][C][F][F]."}", "/scratch/micpie/export/iupac_smiles/valid_10-7.jsonl": "{"text":"Task: Please create the SMILES representation of a chemical given the traditional IUPAC name.\nIUPAC name: 2-methyl-3-[methyl(1-methylpentyl)amino]benzoic acid methyl ester\nResult: CCCCC(C)N(C)C1=CC=CC(=C1C)C(=O)OC"} {"text":"Task: Please generate the SMILES representation of a molecule based on the traditional IUPAC name.\nIUPAC name: (4S,7R,8S,9S,13Z,16S)-16-[(Z)-1-fluoro-2-(2-pyridyl)vinyl]-4,8-dihydroxy-5,5,9,13-tetramethyl-7-propyl-1-azacyclohexadec-13-ene-2,6-quinone\nResult: CCC[C@@H][C@H][C@H]CCC\/C=C\\C[C@H]NC=O)C[C@@H]CC%16=O))C)C))O)))))\/C=C\/C=CC=CC=N6)))))))\/F)))))\/C)))))C))O"}", "/scratch/micpie/export/iupac_smiles/train_18-0.jsonl": "{"text":"The traditional IUPAC name of the compound with DeepSMILES CC=NC=CC=C6))C=CC=CNC=NC6C=C%10)C=CC=CC=C6)))NC=O)C=C is N-[3-[6-(6-methyl-3-pyridyl)-3,8a-dihydroquinazolin-8-yl]phenyl]acrylamide."} {"text":"The traditional IUPAC name of the compound with SMILES CNCCC1CCN(CC1)CC2=CC(=C(C=C2)F)F.Cl is 2-[1-(3,4-difluorobenzyl)-4-piperidyl]ethyl-methyl-amine;hydrochloride."}", "/scratch/micpie/export/iupac_smiles/valid_18-7.jsonl": "{"text":"Task: Please give me the SMILES representation of a compound given the traditional IUPAC name.\nIUPAC name: 8-(1-methylpyrrolidin-3-yl)-6-(3-pyridyl)quinazoline\nResult: CNCCCC5)C=CC=CC=CN=CN=C%106)))))))C=CN=CC=C6"} {"text":"Task: Please generate the SMILES of a chemical given the traditional IUPAC name.\nIUPAC name: 2-[4-[2-(methylamino)ethyl]piperidino]-1-[2-(2-thienyl)pyrrolidino]propan-1-one\nResult: InChI=1S\/C19H31N3OS\/c1-15(21-12-8-16(9-13-21)7-10-20-2)19(23)22-11-3-5-17(22)18-6-4-14-24-18\/h4,6,14-17,20H,3,5,7-13H2,1-2H3"}", "/scratch/micpie/export/iupac_smiles/valid_22-1.jsonl": "{"text":"The CAS-like IUPAC name of the compound with DeepSMILES 
C#CC=CC=CC=C6)))CCC3))))CF)F)F is 1-cyclopropyl-3-ethynyl-2-(trifluoromethyl)benzene."} {"text":"The IUAPC name in CAS-like style of the molecule with DeepSMILES COC=CC=CC=C6))S=O)=O)NCCCCC6)C=O))))))))))F is 1-(3-fluoro-4-methoxyphenyl)sulfonyl-3-piperidinecarboxaldehyde."}", "/scratch/micpie/export/iupac_smiles/valid_25-3.jsonl": "{"text":"The SELFIES of the compound with traditional IUPAC name 5-(5-chloro-3-pyridyl)-N-[(1S)-1-cyclopropylethyl]-7-methyl-pyrazolo[1,5-a]pyrimidine-3-carboxamide is [C][C][=C][C][=Branch2][Ring1][O][=N][C][=C][Branch1][Branch2][C][=N][N][Ring1][=Branch2][Ring1][Branch1][C][=Branch1][C][=O][N][C@@H1][Branch1][C][C][C][C][C][Ring1][Ring1][C][=C][C][=Branch1][=Branch1][=C][N][=C][Ring1][=Branch1][Cl]."} {"text":"The SELFIES of the chemical with traditional IUPAC name 2-[amino(phenyl)methyl]-4-ethyl-thiazole-5-carboxylic acid is [C][C][C][=C][Branch2][Ring1][Ring2][S][C][=Branch1][Ring2][=N][Ring1][Branch1][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][O]."}", "/scratch/micpie/export/iupac_smiles/valid_3-4.jsonl": "{"text":"The SELFIES of the compound with systematic IUPAC name (1S,3S,5R,7R)-3-methyl-5-oxidanyl-adamantane-1-carboxylic acid is [C][C@][C][C@@H1][C][C@][Branch1][Ring2][C][Ring1][=Branch1][Branch1][=C][C][C@@][Branch1][Ring2][C][Ring1][#Branch1][Branch1][Ring2][C][Ring1][#Branch2][O][C][=Branch1][C][=O][O]."} {"text":"The SELFIES of the compound with systematic IUPAC name 2-[4-(6,7-dimethoxy-5-oxidanyl-4-oxidanylidene-chromen-2-yl)phenoxy]-N-[6-[(2-methoxyphenyl)methyl-methyl-amino]hexyl]ethanamide is [C][N][Branch2][Branch1][Branch1][C][C][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=Branch1][C][=O][C][=C][Branch2][Ring1][C][C][=Branch1][=N][=C][Branch1][Branch2][C][=C][Ring1][=Branch1][O][Ring1][O][O][C][O][C][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C]."}", "/scratch/micpie/export/iupac_smiles/valid_13-7.jsonl": 
"{"text":"Task: Please create the SMILES representation of a molecule based on the traditional IUPAC name.\nIUPAC name: 2-[[(2-amino-3,3-dimethyl-butanoyl)amino]methyl]-2-methyl-butyric acid\nResult: [C][C][C][Branch1][C][C][Branch2][Ring1][Ring1][C][N][C][=Branch1][C][=O][C][Branch1][=Branch2][C][Branch1][C][C][Branch1][C][C][C][N][C][=Branch1][C][=O][O]"} {"text":"Task: Please generate the SMILES of a chemical based on the traditional IUPAC name.\nIUPAC name: (2-chloro-4-pyridyl)-[(2R)-2-methyl-1,2,3,4-tetrahydro-1,5-benzodiazepin-5-yl]methanone\nResult: C[C@@H]1CCN(C(=O)c2ccnc(Cl)c2)c2ccccc2N1"}", "/scratch/micpie/export/iupac_smiles/test_26-2.jsonl": "{"text":"The preferred IUPAC name of the compound with DeepSMILES CCC=CSC=N5)C=CC=CC=C6)))Cl))Cl)))))C=O)O is 2-(2,3-dichlorophenyl)-4-ethyl-1,3-thiazole-5-carboxylic acid."} {"text":"The IUPAC name of the compound with canonical SMILES COc1cccc(C2=NN(C(=O)CN3CCc4ccccc4C3)[C@@H](c3ccccc3Cl)C2)c1 is 1-[(3R)-3-(2-chlorophenyl)-5-(3-methoxyphenyl)-3,4-dihydropyrazol-2-yl]-2-(3,4-dihydro-1H-isoquinolin-2-yl)ethanone."}", "/scratch/micpie/export/iupac_smiles/valid_26-5.jsonl": "{"text":"The canonical SMILES of the compound with IUAPC name in CAS-like style 2-[4-(difluoromethyl)phenyl]-4-ethyl-5-thiazolecarboxylic acid is CCc1nc(-c2ccc(C(F)F)cc2)sc1C(=O)O."} {"text":"The SMILES of the chemical with IUAPC name in CAS-like style 1-[(3R)-3-(2-chlorophenyl)-5-(3-methoxyphenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(phenylmethyl)-1-piperazinyl]ethanone is COC1=CC=CC(=C1)C2=NN([C@H](C2)C3=CC=CC=C3Cl)C(=O)CN4CCN(CC4)CC5=CC=CC=C5."}", "/scratch/micpie/export/iupac_smiles/valid_20-9.jsonl": "{"text":"Task: Please give me the SMILES of a chemical based on the CAS-like IUPAC name.\nIUPAC name: 4-acetyloxy-7-(2-hydroxy-2-oct-2-enyl-5-oxo-1-cyclopent-3-enylidene)-5-heptenoic acid methyl ester\nResult: 
[C][C][C][C][C][C][=C][C][C][Branch2][Ring1][=C][C][=C][C][=Branch1][C][=O][C][Ring1][=Branch1][=C][C][=C][C][Branch1][=Branch2][C][C][C][=Branch1][C][=O][O][C][O][C][=Branch1][C][=O][C][O]"} {"text":"Task: Please generate the SMILES of a molecule based on the IUAPC name in CAS-like style.\nIUPAC name: 2-[2-[2-(2-oxopropylamino)ethylamino]ethylamino]acetic acid\nResult: CC(=O)CNCCNCCNCC(=O)O"}", "/scratch/micpie/export/iupac_smiles/test_11-7.jsonl": "{"text":"Task: Please create the SMILES representation of a compound based on the traditional IUPAC name.\nIUPAC name: [3-(2,4-difluorophenoxy)-3-phenyl-propyl]-dimethyl-amine;2,5-dinitrobenzoic acid\nResult: CN(C)CCC(Oc1ccc(F)cc1F)c1ccccc1.O=C(O)c1cc([N+](=O)[O-])ccc1[N+](=O)[O-]"} {"text":"Task: Please generate the SMILES of a chemical based on the traditional IUPAC name.\nIUPAC name: 4-[7-(4-hydroxyphenyl)-9,9-dioctyl-fluoren-2-yl]benzaldehyde\nResult: CCCCCCCCC1(C2=C(C=CC(=C2)C3=CC=C(C=C3)C=O)C4=C1C=C(C=C4)C5=CC=C(C=C5)O)CCCCCCCC"}", "/scratch/micpie/export/iupac_smiles/train_21-6.jsonl": "{"text":"The canonical SMILES of the molecule with IUPAC name (2S)-2-[3-benzoyloxy-2-[3-ethyl-5-[2-[[3-[4-(3-fluorophenyl)triazol-1-yl]-5-[4-[4-(3-fluorophenyl)triazol-1-yl]-3,5-dihydroxy-6-(hydroxymethyl)oxan-2-yl]oxy-4-hydroxycyclohexanecarbonyl]amino]ethylcarbamoyl]-2-(3,4,5-trihydroxy-6-methyloxan-2-yl)oxycyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)oxan-4-yl]oxy-3-cyclohexylpropanoic acid is CCC1CC(C(=O)NCCNC(=O)C2CC(OC3OC(CO)C(O)C(n4cc(-c5cccc(F)c5)nn4)C3O)C(O)C(n3cc(-c4cccc(F)c4)nn3)C2)CC(OC2OC(CO)C(O)C(O[C@@H](CC3CCCCC3)C(=O)O)C2OC(=O)c2ccccc2)C1OC1OC(C)C(O)C(O)C1O."} {"text":"The InChI of the chemical with preferred IUPAC name 3-[4-(trifluoromethoxy)phenyl]pyridazine is InChI=1S\/C11H7F3N2O\/c12-11(13,14)17-9-5-3-8(4-6-9)10-2-1-7-15-16-10\/h1-7H."}", "/scratch/micpie/export/iupac_smiles/train_22-2.jsonl": "{"text":"The preferred IUPAC name of the molecule with SMILES CC1C(=CC=C(C1(C2=NN=C(O2)C)S(=O)(=O)C)C(F)(F)F)C(=O)N is 
6-methyl-5-(5-methyl-1,3,4-oxadiazol-2-yl)-5-methylsulfonyl-4-(trifluoromethyl)cyclohexa-1,3-diene-1-carboxamide."} {"text":"The IUPAC name of the molecule with SELFIES [C][C][C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][=O] is 1-propylsulfonylpiperidine-3-carbaldehyde."}", "/scratch/micpie/export/iupac_smiles/test_10-5.jsonl": "{"text":"The DeepSMILES of the compound with IUAPC name in CAS-like style 1-[5-chloro-2-methyl-3-[4-piperidinyl(prop-2-enyl)amino]phenyl]ethanol is CC=CC=CC=C6NCC=C)))CCCNCC6)))))))))Cl)))CC)O."} {"text":"The SELFIES of the chemical with IUAPC name in CAS-like style 2-[4-[3-(3-fluoro-6-methoxy-4-quinolinyl)propyl]-1-[2-(3-thiophenylthio)ethyl]-3-piperidinyl]acetic acid is [C][O][C][=C][C][=C][Branch1][=C][C][=Branch1][#Branch2][=C][N][=C][Ring1][=Branch1][C][=C][Ring1][#Branch2][F][C][C][C][C][C][C][N][Branch1][O][C][C][Ring1][=Branch1][C][C][=Branch1][C][=O][O][C][C][S][C][=C][S][C][=C][Ring1][Branch1]."}", "/scratch/micpie/export/iupac_smiles/train_24-3.jsonl": "{"text":"The InChI of the molecule with traditional IUPAC name N-(benzimidazol-1-yl)-2-[(2-phenoxyacetyl)amino]acetamide is InChI=1S\/C17H16N4O3\/c22-16(20-21-12-19-14-8-4-5-9-15(14)21)10-18-17(23)11-24-13-6-2-1-3-7-13\/h1-9,12H,10-11H2,(H,18,23)(H,20,22)."} {"text":"The canonical SMILES of the chemical with traditional IUPAC name N-[(1S)-1-cyclopropyl-2,2,2-trifluoro-ethyl]-7-methyl-5-[4-(trifluoromethyl)-3-pyridyl]pyrazolo[1,5-a]pyrimidine-3-carboxamide is Cc1cc(-c2cnccc2C(F)(F)F)nc2c(C(=O)N[C@@H](C3CC3)C(F)(F)F)cnn12."}", "/scratch/micpie/export/iupac_smiles/train_8-7.jsonl": "{"text":"Task: Please generate the SMILES of a chemical given the traditional IUPAC name.\nIUPAC name: 3-(N-(2-methyl-5-nitro-phenyl)sulfonylanilino)propionamide\nResult: CC=CC=CC=C6))[N+]=O)[O-]))))S=O)=O)NCCC=O)N))))C=CC=CC=C6"} {"text":"Task: Please generate the SMILES representation of a molecule based on the traditional IUPAC name.\nIUPAC name: 
1-(isopropylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,11,13-hexaenylideneamino)oxy-propan-2-ol;hydrochloride\nResult: CC(C)NCC(O)CON=C1c2ccccc2CCc2ccccc21.Cl"}", "/scratch/micpie/export/iupac_smiles/valid_25-0.jsonl": "{"text":"The traditional IUPAC name of the chemical with SMILES CC1=CC(=NC2=C(C=NN12)C(=O)N[C@@H](C)C3CC3)C4=CC(=CN=C4)Cl is 5-(5-chloro-3-pyridyl)-N-[(1S)-1-cyclopropylethyl]-7-methyl-pyrazolo[1,5-a]pyrimidine-3-carboxamide."} {"text":"The traditional IUPAC name of the chemical with InChI InChI=1S\/C13H14N2O2S\/c1-2-9-11(13(16)17)18-12(15-9)10(14)8-6-4-3-5-7-8\/h3-7,10H,2,14H2,1H3,(H,16,17) is 2-[amino(phenyl)methyl]-4-ethyl-thiazole-5-carboxylic acid."}", "/scratch/micpie/export/iupac_smiles/valid_8-6.jsonl": "{"text":"The canonical SMILES of the molecule with preferred IUPAC name 3-(N-naphthalen-2-ylsulfonylanilino)propanamide is NC(=O)CCN(c1ccccc1)S(=O)(=O)c1ccc2ccccc2c1."} {"text":"The SMILES of the molecule with preferred IUPAC name 1-[(E)-3,4-dihydro-2H-naphthalen-1-ylideneamino]oxy-3-(propan-2-ylamino)propan-2-ol;hydrochloride is CC(C)NCC(CO\/N=C\/1\\CCCC2=CC=CC=C21)O.Cl."}", "/scratch/micpie/export/iupac_smiles/train_9-7.jsonl": "{"text":"Task: Please generate the SMILES of a chemical based on the traditional IUPAC name.\nIUPAC name: 1-(tert-butylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,11,13-hexaenylideneamino)oxy-propan-2-ol\nResult: InChI=1S\/C22H28N2O2\/c1-22(2,3)23-14-18(25)15-26-24-21-19-10-6-4-8-16(19)12-13-17-9-5-7-11-20(17)21\/h4-11,18,23,25H,12-15H2,1-3H3"} {"text":"Task: Please give me the SMILES of a compound given the traditional IUPAC name.\nIUPAC name: N-[(5-butyl-6-keto-2,4-dimethyl-3,4-dihydropyridin-5-yl)methyl]-3-[cyclopentyl(methyl)amino]-2-vinyl-benzamide\nResult: CCCCCCCC=NC6=O)))C)))C))CNC=O)C=CC=CC=C6)))NC)CCCCC5)))))))C=C"}", "/scratch/micpie/export/iupac_smiles/train_14-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the molecule with InChI 
InChI=1S\/C13H16F3N3O3\/c1-7-11(8(2)18(3)17-7)22-12(21)9-4-10(20)19(5-9)6-13(14,15)16\/h9H,4-6H2,1-3H3\/t9-\/m1\/s1 is (3R)-5-oxo-1-(2,2,2-trifluoroethyl)-3-pyrrolidinecarboxylic acid (1,3,5-trimethyl-4-pyrazolyl) ester."} {"text":"The CAS-like IUPAC name of the molecule with SMILES C1CN(CC1F)S(=O)(=O)C2=CC=C(C=C2)C3=CC(=C4C(=C3)C=C(O4)CN)Cl is [7-chloro-5-[4-[(3-fluoro-1-pyrrolidinyl)sulfonyl]phenyl]-2-benzofuranyl]methanamine."}", "/scratch/micpie/export/iupac_smiles/test_0-9.jsonl": "{"text":"Task: Please generate the SMILES of a compound given the CAS-like IUPAC name.\nIUPAC name: 1-[(4-bromophenyl)methyl]-N-[(Z)-(3-methoxyphenyl)methylideneamino]-4-piperidinecarboxamide\nResult: COC1=CC=CC(=C1)\/C=N\\NC(=O)C2CCN(CC2)CC3=CC=C(C=C3)Br"} {"text":"Task: Please give me the SMILES of a molecule based on the IUAPC name in CAS-like style.\nIUPAC name: 2-(5-methyl-1lambda3-ioda-3-thia-2-azacyclopenta-1,4-dien-4-yl)-4-thiazolecarboxaldehyde\nResult: CC=CSN=I5)))C=NC=CS5))C=O"}", "/scratch/micpie/export/iupac_smiles/test_19-5.jsonl": "{"text":"The InChI of the chemical with IUAPC name in CAS-like style N-(4-chlorophenyl)-3-[4-[2-(methylamino)ethyl]-1-piperidinyl]propanamide;hydrochloride is InChI=1S\/C17H26ClN3O.ClH\/c1-19-10-6-14-7-11-21(12-8-14)13-9-17(22)20-16-4-2-15(18)3-5-16;\/h2-5,14,19H,6-13H2,1H3,(H,20,22);1H."} {"text":"The InChI of the compound with IUAPC name in CAS-like style nan is InChI=1S\/C73H78N4O10\/c1-44-11-7-13-51-14-9-18-57-59-41-74-40-50(59)38-73(44,51)26-21-56(86-46(3)79)36-54(82)37-62(49-32-64(83)69(65(33-49)85-4)87-55-17-10-16-53(81)35-55)77-42-60-61(20-19-58-68(60)63(77)39-72(70(58)84)28-27-71(43-72)24-5-6-25-71)76(30-23-45(2)78)66-34-48(22-29-75-66)67(57)47-12-8-15-52(80)31-47\/h7-8,10-13,15-17,19-20,22,31-35,40-42,44,56-57,62,67,70,74-75,80-81,83-84H,5-6,14,21,23-30,36-39,43H2,1-4H3\/t44-,56+,57-,62-,67+,70+,72-,73-\/m0\/s1."}", "/scratch/micpie/export/iupac_smiles/train_21-2.jsonl": "{"text":"The IUPAC name of the compound with DeepSMILES 
CCCCCCCC6OCCCCCO6)C))O))O))O)))))OCCCCCO6)CO)))O))O[C@@H]CCCCCCC6)))))))C=O)O)))))OC=O)C=CC=CC=C6)))))))))))))C=O)NCCNC=O)CCCCCC6)OCCCCCO6)CO)))O))NC=CN=N5))C=CC=CC=C6)))F))))))))O)))))O))NC=CN=N5))C=CC=CC=C6)))F is (2S)-2-[3-benzoyloxy-2-[3-ethyl-5-[2-[[3-[4-(3-fluorophenyl)triazol-1-yl]-5-[4-[4-(3-fluorophenyl)triazol-1-yl]-3,5-dihydroxy-6-(hydroxymethyl)oxan-2-yl]oxy-4-hydroxycyclohexanecarbonyl]amino]ethylcarbamoyl]-2-(3,4,5-trihydroxy-6-methyloxan-2-yl)oxycyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)oxan-4-yl]oxy-3-cyclohexylpropanoic acid."} {"text":"The preferred IUPAC name of the compound with SMILES C1=CC(=NN=C1)C2=CC=C(C=C2)OC(F)(F)F is 3-[4-(trifluoromethoxy)phenyl]pyridazine."}", "/scratch/micpie/export/iupac_smiles/valid_25-2.jsonl": "{"text":"The IUPAC name of the chemical with SELFIES [C][C][=C][C][=Branch2][Ring1][O][=N][C][=C][Branch1][Branch2][C][=N][N][Ring1][=Branch2][Ring1][Branch1][C][=Branch1][C][=O][N][C@@H1][Branch1][C][C][C][C][C][Ring1][Ring1][C][=C][C][=Branch1][=Branch1][=C][N][=C][Ring1][=Branch1][Cl] is 5-(5-chloropyridin-3-yl)-N-[(1S)-1-cyclopropylethyl]-7-methylpyrazolo[1,5-a]pyrimidine-3-carboxamide."} {"text":"The preferred IUPAC name of the chemical with DeepSMILES CCC=CSC=N5)CC=CC=CC=C6))))))N))))C=O)O is 2-[amino(phenyl)methyl]-4-ethyl-1,3-thiazole-5-carboxylic acid."}", "/scratch/micpie/export/iupac_smiles/test_12-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with canonical SMILES Clc1ccc(-c2ccc3c(c2)C2(c4ccccc4Sc4ccccc42)c2ccccc2-3)cc1 is 2-(4-chlorophenyl)spiro[fluorene-9,9'-thioxanthene]."} {"text":"The traditional IUPAC name of the compound with SELFIES [C][C][Branch1][C][C][Branch1][C][C][C][Branch1][P][C][=Branch1][C][=O][N][Branch1][Ring2][C][C][#C][C][C][C][C][Ring1][Ring1][N] is 2-amino-N-(cyclopropylmethyl)-3,3-dimethyl-N-propargyl-butyramide."}", "/scratch/micpie/export/iupac_smiles/test_26-8.jsonl": "{"text":"Task: Please create the SMILES representation of a molecule given the systematic IUPAC 
name.\nIUPAC name: 2-[2,3-bis(chloranyl)phenyl]-4-ethyl-1,3-thiazole-5-carboxylic acid\nResult: CCC=CSC=N5)C=CC=CC=C6)))Cl))Cl)))))C=O)O"} {"text":"Task: Please give me the SMILES representation of a chemical based on the systematic IUPAC name.\nIUPAC name: 1-[(3R)-3-(2-chlorophenyl)-5-(3-methoxyphenyl)-3,4-dihydropyrazol-2-yl]-2-(3,4-dihydro-1H-isoquinolin-2-yl)ethanone\nResult: [C][O][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=N][N][Branch1][S][C@H1][Branch1][Ring2][C][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][Cl][C][=Branch1][C][=O][C][N][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][#Branch2]"}", "/scratch/micpie/export/iupac_smiles/test_18-3.jsonl": "{"text":"The canonical SMILES of the compound with traditional IUPAC name [(E)-2-ethyl-7,10-dimethyl-undec-3-enylidene]-dimethyl-phosphonium is CCC(C=[P+](C)C)\/C=C\/CCC(C)CCC(C)C."} {"text":"The SMILES of the chemical with traditional IUPAC name methyl-[2-[1-[2-(4-nitrophenyl)ethyl]-4-piperidyl]ethyl]amine;hydrochloride is CNCCC1CCN(CC1)CCC2=CC=C(C=C2)[N+](=O)[O-].Cl."}", "/scratch/micpie/export/iupac_smiles/valid_21-5.jsonl": "{"text":"The SELFIES of the compound with CAS-like IUPAC name benzoic acid [4-[(2S)-1-(1-azetidinyl)-3-cyclohexyl-1-oxopropan-2-yl]oxy-2-[5-[[2-[[[3-[[3,5-dihydroxy-6-(hydroxymethyl)-4-[(2-oxo-1-benzopyran-3-yl)methoxy]-2-oxanyl]oxy]-4-hydroxy-5-[(2-oxo-1-benzopyran-3-yl)methoxy]cyclohexyl]-oxomethyl]amino]ethylamino]-oxomethyl]-3-(4-phenyl-1-triazolyl)-2-[(3,4,5-trihydroxy-6-methyl-2-oxanyl)oxy]cyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)-3-oxanyl] ester is 
[C][C][C][Branch2][=N][#C][C][Branch2][=N][#Branch2][C][Branch2][=N][Branch1][C][Branch1][Ring2][O][Ring1][=Branch1][O][C][C][Branch2][O][Branch2][C][C][Branch2][Branch1][=C][C][C][Ring1][=Branch1][O][C][C][Branch2][Ring2][#Branch1][C][Branch1][=N][C][Branch1][=Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][C@@H1][Branch1][#Branch2][C][C][C][C][C][C][C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][C][Ring1][Ring2][O][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][N][C][=Branch1][C][=O][C][C][C][Branch2][Branch1][Ring1][C][Branch2][Ring2][=C][C][Branch1][Ring2][C][Ring1][=Branch1][O][C][C][Branch2][Ring1][P][C][Branch1][=N][C][Branch1][=Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][C][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][Ring1][#Branch2][=O][O][O][O][C][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][Ring1][#Branch2][=O][N][C][=C][Branch1][Branch1][N][=N][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][O][O][O]."} {"text":"The SELFIES of the molecule with CAS-like IUPAC name 3-fluoro-5-(1H-1,2,4-triazol-5-yliodanuidyl)-2-(2,2,2-trifluoroethoxy)pyridine is [C][=C][Branch2][Ring1][Ring2][C][=N][C][=Branch1][Branch1][=C][Ring1][=Branch1][F][O][C][C][Branch1][C][F][Branch1][C][F][F][I-1][C][=N][C][=N][N][Ring1][Branch1]."}", "/scratch/micpie/export/iupac_smiles/test_26-6.jsonl": "{"text":"The canonical SMILES of the molecule with preferred IUPAC name 2-(2,3-dichlorophenyl)-4-ethyl-1,3-thiazole-5-carboxylic acid is CCc1nc(-c2cccc(Cl)c2Cl)sc1C(=O)O."} {"text":"The canonical SMILES of the compound with preferred IUPAC name 1-[(3R)-3-(2-chlorophenyl)-5-(3-methoxyphenyl)-3,4-dihydropyrazol-2-yl]-2-(3,4-dihydro-1H-isoquinolin-2-yl)ethanone is COc1cccc(C2=NN(C(=O)CN3CCc4ccccc4C3)[C@@H](c3ccccc3Cl)C2)c1."}", "/scratch/micpie/export/iupac_smiles/train_3-7.jsonl": "{"text":"Task: Please create the SMILES representation of a compound given the traditional IUPAC name.\nIUPAC name: 
(5-nitro-2-furyl)-pyrrolidino-methanone\nResult: CCCNC5)C=O)C=CC=CO5)[N+]=O)[O-]"} {"text":"Task: Please generate the SMILES representation of a molecule based on the traditional IUPAC name.\nIUPAC name: (12S,14S,17E)-23-benzyl-7-chloro-12-methylol-15,20-dioxa-2,4,5,9,11,23-hexazahexacyclo[19.7.1.13,10.111,14.04,8.022,27]hentriaconta-1(28),3(31),5,7,9,17,21(29),22(27)-octaen-24-one\nResult: O=C1CCc2cc3cc(c2N1Cc1ccccc1)OC\/C=C\/CO[C@H]1C[C@@H](CO)N(C1)c1cc(n2ncc(Cl)c2n1)N3"}", "/scratch/micpie/export/iupac_smiles/train_12-9.jsonl": "{"text":"Task: Please give me the SMILES of a molecule based on the CAS-like IUPAC name.\nIUPAC name: N-[3-(1-dibenzofuranyl)phenyl]-9,9-dimethyl-N-(3-phenylphenyl)-2-fluorenamine\nResult: CC1(C2=CC=CC=C2C3=C1C=C(C=C3)N(C4=CC=CC(=C4)C5=CC=CC=C5)C6=CC=CC(=C6)C7=C8C9=CC=CC=C9OC8=CC=C7)C"} {"text":"Task: Please give me the SMILES representation of a molecule given the CAS-like IUPAC name.\nIUPAC name: 2-[(2-amino-3,3-dimethyl-1-oxobutyl)amino]-4-pentenoic acid\nResult: InChI=1S\/C11H20N2O3\/c1-5-6-7(10(15)16)13-9(14)8(12)11(2,3)4\/h5,7-8H,1,6,12H2,2-4H3,(H,13,14)(H,15,16)"}", "/scratch/micpie/export/iupac_smiles/train_11-9.jsonl": "{"text":"Task: Please create the SMILES of a molecule based on the IUAPC name in CAS-like style.\nIUPAC name: acetic acid [(14R)-17-(5,6-dihydroxy-6-methylheptan-2-yl)-4,4,10,13,14-pentamethyl-2,3,5,6,7,11,12,15,16,17-decahydro-1H-cyclopenta[a]phenanthren-3-yl] ester\nResult: CC(=O)OC1CCC2(C)C3=C(CCC2C1(C)C)[C@]1(C)CCC(C(C)CCC(O)C(C)(C)O)C1(C)CC3"} {"text":"Task: Please give me the SMILES representation of a chemical given the IUAPC name in CAS-like style.\nIUPAC name: N-[3-(1-dibenzofuranyl)phenyl]-9,9-dimethyl-N-(4-phenylphenyl)-2-fluorenamine\nResult: 
[C][C][Branch2][=Branch1][Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][=Branch2][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][Branch2][Ring1][Ring1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][Ring1][=Branch2][=C][C][=C][Ring1][=N][C]"}", "/scratch/micpie/export/iupac_smiles/test_0-0.jsonl": "{"text":"The traditional IUPAC name of the compound with DeepSMILES COC=CC=CC=C6)\/C=N\\NC=O)CCCNCC6))CC=CC=CC=C6))Br is 1-(4-bromobenzyl)-N-[(Z)-m-anisylideneamino]isonipecotamide."} {"text":"The traditional IUPAC name of the molecule with DeepSMILES CC=CSN=I5)))C=NC=CS5))C=O is 2-(5-methyl-1lambda3-ioda-3-thia-2-azacyclopenta-1,4-dien-4-yl)thiazole-4-carbaldehyde."}", "/scratch/micpie/export/iupac_smiles/test_11-2.jsonl": "{"text":"The IUPAC name of the chemical with SMILES CN(C)CCC(C1=CC=CC=C1)OC2=C(C=C(C=C2)F)F.C1=CC(=C(C=C1[N+](=O)[O-])C(=O)O)[N+](=O)[O-] is 3-(2,4-difluorophenoxy)-N,N-dimethyl-3-phenylpropan-1-amine;2,5-dinitrobenzoic acid."} {"text":"The preferred IUPAC name of the compound with InChI InChI=1S\/C42H50O2\/c1-3-5-7-9-11-13-27-42(28-14-12-10-8-6-4-2)40-29-35(33-17-15-32(31-43)16-18-33)21-25-38(40)39-26-22-36(30-41(39)42)34-19-23-37(44)24-20-34\/h15-26,29-31,44H,3-14,27-28H2,1-2H3 is 4-[7-(4-hydroxyphenyl)-9,9-dioctylfluoren-2-yl]benzaldehyde."}", "/scratch/micpie/export/iupac_smiles/train_14-5.jsonl": "{"text":"The SMILES of the chemical with CAS-like IUPAC name (3R)-5-oxo-1-(2,2,2-trifluoroethyl)-3-pyrrolidinecarboxylic acid (1,3,5-trimethyl-4-pyrazolyl) ester is CC1=C(C(=NN1C)C)OC(=O)[C@@H]2CC(=O)N(C2)CC(F)(F)F."} {"text":"The InChI of the chemical with IUAPC name in CAS-like style [7-chloro-5-[4-[(3-fluoro-1-pyrrolidinyl)sulfonyl]phenyl]-2-benzofuranyl]methanamine is 
InChI=1S\/C19H18ClFN2O3S\/c20-18-9-13(7-14-8-16(10-22)26-19(14)18)12-1-3-17(4-2-12)27(24,25)23-6-5-15(21)11-23\/h1-4,7-9,15H,5-6,10-11,22H2."}", "/scratch/micpie/export/iupac_smiles/train_17-4.jsonl": "{"text":"The SELFIES of the chemical with systematic IUPAC name 2-ethoxy-N-pyridin-4-yl-3-[[4-[2-(2H-1,2,3,4-tetrazol-5-yl)phenyl]phenyl]methyl]benzimidazole-4-carboxamide is [C][C][O][C][=N][C][=C][C][=C][C][=Branch2][Ring1][P][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][N][=N][Ring1][Branch1][C][=Branch1][C][=O][N][C][=C][C][=N][C][=C][Ring1][=Branch1]."} {"text":"The canonical SMILES of the chemical with systematic IUPAC name 6-(2-fluorophenyl)-8-heptan-4-yl-quinazoline is CCCC(CCC)c1cc(-c2ccccc2F)cc2cncnc12."}", "/scratch/micpie/export/iupac_smiles/test_1-8.jsonl": "{"text":"Task: Please give me the SMILES representation of a compound given the systematic IUPAC name.\nIUPAC name: (3R)-3-nitrocyclohexan-1-ol\nResult: CC[C@H]CCC6)O)))[N+]=O)[O-]"} {"text":"Task: Please create the SMILES representation of a chemical given the systematic IUPAC name.\nIUPAC name: (2S)-1-(1H-indol-4-yloxy)-3-[[(2S)-3-(1H-indol-4-yloxy)-2-oxidanyl-propyl]-propan-2-yl-amino]propan-2-ol\nResult: CC(C)N(C[C@H](O)COc1cccc2[nH]ccc12)C[C@H](O)COc1cccc2[nH]ccc12"}", "/scratch/micpie/export/iupac_smiles/train_19-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the molecule with canonical SMILES CNCCC1CCN(CCC(=O)Nc2ccc(Cl)cc2)CC1 is N-(4-chlorophenyl)-3-[4-[2-(methylamino)ethyl]-1-piperidinyl]propanamide."} {"text":"The IUAPC name in CAS-like style of the chemical with SMILES CC(=NC1=C(C=C(C=C1Cl)N(O)[O-])Cl)C2=CC3=CC=CC=C3OC2=O is 3-[1-[2,6-dichloro-4-[hydroxy(oxido)amino]phenyl]iminoethyl]-1-benzopyran-2-one."}", "/scratch/micpie/export/iupac_smiles/test_27-4.jsonl": "{"text":"The SELFIES of the molecule with systematic IUPAC name 
1-[(3S)-3-(4-chlorophenyl)-5-(3,4-dimethylphenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(2-fluorophenyl)piperazin-1-yl]ethanone is [C][C][=C][Branch2][Branch1][Branch2][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=N][N][Branch2][Ring1][C][C@@H1][Branch1][Ring2][C][Ring1][Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl][C][=Branch1][C][=O][C][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C]."} {"text":"The DeepSMILES of the molecule with systematic IUPAC name N-[(1E)-1-(4-ethenyl-2,3,5,5-tetramethyl-cyclopenta-1,3-dien-1-yl)buta-1,3-dien-2-yl]-5-phenyl-thiophen-2-amine is CC=CCC=C5C))\/C=C\\C=C))\/NC=CC=CS5)C=CC=CC=C6))))))))))))))C)C))C=C."}", "/scratch/micpie/export/iupac_smiles/valid_12-0.jsonl": "{"text":"The traditional IUPAC name of the compound with canonical SMILES CCCCCCCCOC(CCC(=O)OCCCCCCN(CCO)CCCCCCCC(=O)OCC)OCCCCCCCC is 8-[6-(4,4-dioctoxybutanoyloxy)hexyl-(2-hydroxyethyl)amino]caprylic acid ethyl ester."} {"text":"The traditional IUPAC name of the molecule with SELFIES [C][C][Branch1][C][C][Branch1][C][C][C][Branch2][Ring1][C][C][=Branch1][C][=O][N][C][C][C][C][C][Branch1][C][F][Branch1][C][F][F][N] is 2-amino-3,3-dimethyl-N-(5,5,5-trifluoropentyl)butyramide."}", "/scratch/micpie/export/iupac_smiles/test_2-5.jsonl": "{"text":"The DeepSMILES of the compound with IUAPC name in CAS-like style (2R)-2-(methylamino)-1-(2-methylphenyl)-1-propanone is CC=CC=CC=C6C=O)[C@@H]C)NC."} {"text":"The InChI of the compound with IUAPC name in CAS-like style 2-(6-bromo-2,3-dihydro-1,4-benzodioxin-7-yl)-4,5,6,7-tetrachloroisoindole-1,3-dione is InChI=1S\/C16H6BrCl4NO4\/c17-5-3-7-8(26-2-1-25-7)4-6(5)22-15(23)9-10(16(22)24)12(19)14(21)13(20)11(9)18\/h3-4H,1-2H2."}", "/scratch/micpie/export/iupac_smiles/valid_11-0.jsonl": "{"text":"The traditional IUPAC name of the compound with canonical SMILES Clc1ccc2c(c1)CCc1cccnc1C2=C1CCNCC1.O=S(=O)([O-])[O-].O=S(=O)([O-])[O-] is 
13-chloro-2-(4-piperidylidene)-4-azatricyclo[9.4.0.03,8]pentadeca-1(11),3(8),4,6,12,14-hexaene;disulfate."} {"text":"The traditional IUPAC name of the chemical with SMILES CCCCCCCCC1(C2=C(C=CC(=C2)C3=CC=C(C=C3)C=O)C4=C1C=C(C=C4)Br)CCCCCCCC is 4-(7-bromo-9,9-dioctyl-fluoren-2-yl)benzaldehyde."}", "/scratch/micpie/export/iupac_smiles/valid_19-3.jsonl": "{"text":"The canonical SMILES of the compound with traditional IUPAC name N-[2-chloro-5-(tetrazol-1-yl)phenyl]-2-[4-[2-(methylamino)ethyl]piperidino]acetamide;hydrochloride is CNCCC1CCN(CC(=O)Nc2cc(-n3cnnn3)ccc2Cl)CC1.Cl."} {"text":"The canonical SMILES of the compound with traditional IUPAC name 2-[bis[2-[(2-isopropyl-4a-methyl-8-methylene-decalin-1-yl)amino]-2-keto-ethyl]amino]acetic acid methyl ester is C=C1CCCC2(C)CCC(C(C)C)C(NC(=O)CN(CC(=O)NC3C(C(C)C)CCC4(C)CCCC(=C)C34)CC(=O)OC)C12."}", "/scratch/micpie/export/iupac_smiles/test_6-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with InChI InChI=1S\/C20H25N3O2\/c1-20(2,15-8-4-3-5-9-15)14-18(24)23-17-11-7-6-10-16(17)19(25)22-13-12-21\/h3-11H,12-14,21H2,1-2H3,(H,22,25)(H,23,24) is N-(2-aminoethyl)-2-[(3-methyl-3-phenyl-butanoyl)amino]benzamide."} {"text":"The traditional IUPAC name of the compound with canonical SMILES Clc1ccc(CNc2ccc3c(c2)OCCO3)cc1Cl is (3,4-dichlorobenzyl)-(2,3-dihydro-1,4-benzodioxin-6-yl)amine."}", "/scratch/micpie/export/iupac_smiles/train_5-5.jsonl": "{"text":"The canonical SMILES of the molecule with IUAPC name in CAS-like style 4-[3-(aminomethyl)-3-oxolanyl]-3-ethyl-4-piperidinol is CCC1CNCCC1(O)C1(CN)CCOC1."} {"text":"The InChI of the compound with IUAPC name in CAS-like style N-(2-aminoethyl)-2-[[1-oxo-2-(2-oxolanylmethoxy)propyl]amino]benzamide;hydrochloride is InChI=1S\/C17H25N3O4.ClH\/c1-12(24-11-13-5-4-10-23-13)16(21)20-15-7-3-2-6-14(15)17(22)19-9-8-18;\/h2-3,6-7,12-13H,4-5,8-11,18H2,1H3,(H,19,22)(H,20,21);1H."}", "/scratch/micpie/export/iupac_smiles/test_18-2.jsonl": "{"text":"The preferred IUPAC name of the chemical 
with SMILES CCC(\/C=C\/CCC(C)CCC(C)C)C=[P+](C)C is [(E)-2-ethyl-7,10-dimethylundec-3-enylidene]-dimethylphosphanium."} {"text":"The preferred IUPAC name of the molecule with SELFIES [C][N][C][C][C][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N+1][=Branch1][C][=O][O-1].[Cl] is N-methyl-2-[1-[2-(4-nitrophenyl)ethyl]piperidin-4-yl]ethanamine;hydrochloride."}", "/scratch/micpie/export/iupac_smiles/valid_21-6.jsonl": "{"text":"The canonical SMILES of the compound with preferred IUPAC name [4-[(2S)-1-(azetidin-1-yl)-3-cyclohexyl-1-oxopropan-2-yl]oxy-2-[5-[2-[[3-[3,5-dihydroxy-6-(hydroxymethyl)-4-[(2-oxochromen-3-yl)methoxy]oxan-2-yl]oxy-4-hydroxy-5-[(2-oxochromen-3-yl)methoxy]cyclohexanecarbonyl]amino]ethylcarbamoyl]-3-(4-phenyltriazol-1-yl)-2-(3,4,5-trihydroxy-6-methyloxan-2-yl)oxycyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)oxan-3-yl] benzoate is CC1OC(OC2C(OC3OC(CO)C(O)C(O[C@@H](CC4CCCCC4)C(=O)N4CCC4)C3OC(=O)c3ccccc3)CC(C(=O)NCCNC(=O)C3CC(OCc4cc5ccccc5oc4=O)C(O)C(OC4OC(CO)C(O)C(OCc5cc6ccccc6oc5=O)C4O)C3)CC2n2cc(-c3ccccc3)nn2)C(O)C(O)C1O."} {"text":"The SELFIES of the molecule with preferred IUPAC name 3-fluoro-5-(1H-1,2,4-triazol-5-yliodanuidyl)-2-(2,2,2-trifluoroethoxy)pyridine is [C][=C][Branch2][Ring1][Ring2][C][=N][C][=Branch1][Branch1][=C][Ring1][=Branch1][F][O][C][C][Branch1][C][F][Branch1][C][F][F][I-1][C][=N][C][=N][N][Ring1][Branch1]."}", "/scratch/micpie/export/iupac_smiles/train_11-6.jsonl": "{"text":"The SELFIES of the compound with IUPAC name [(14R)-17-(5,6-dihydroxy-6-methylheptan-2-yl)-4,4,10,13,14-pentamethyl-2,3,5,6,7,11,12,15,16,17-decahydro-1H-cyclopenta[a]phenanthren-3-yl] acetate is 
[C][C][Branch1][#C][C][C][C][Branch1][=Branch2][C][Branch1][C][C][Branch1][C][C][O][O][C][C][C][C@@][Branch2][Ring2][#Branch2][C][Ring1][Branch1][Branch2][Ring2][Ring1][C][C][C][=C][Ring1][=Branch1][C][C][C][C][Ring1][=Branch1][Branch2][Ring1][Ring1][C][C][C][Branch1][Branch2][C][Ring1][=Branch1][Branch1][C][C][C][O][C][=Branch1][C][=O][C][C][C][C]."} {"text":"The InChI of the chemical with IUPAC name N-(3-dibenzofuran-1-ylphenyl)-9,9-dimethyl-N-(4-phenylphenyl)fluoren-2-amine is InChI=1S\/C45H33NO\/c1-45(2)40-19-8-6-16-37(40)38-27-26-35(29-41(38)45)46(33-24-22-31(23-25-33)30-12-4-3-5-13-30)34-15-10-14-32(28-34)36-18-11-21-43-44(36)39-17-7-9-20-42(39)47-43\/h3-29H,1-2H3."}", "/scratch/micpie/export/iupac_smiles/valid_16-2.jsonl": "{"text":"The preferred IUPAC name of the chemical with InChI InChI=1S\/C12H8Cl2INO\/c13-6-8-4-12(16-7-11(8)14)17-10-3-1-2-9(15)5-10\/h1-5,7H,6H2 is 5-chloro-4-(chloromethyl)-2-(3-iodophenoxy)pyridine."} {"text":"The preferred IUPAC name of the compound with canonical SMILES CCOc1nc2cccc(C(=O)NCCc3cc(Cl)ccc3Cl)c2n1Cc1ccc(-c2ccccc2-c2nn[nH]n2)cc1 is N-[2-(2,5-dichlorophenyl)ethyl]-2-ethoxy-3-[[4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl]benzimidazole-4-carboxamide."}", "/scratch/micpie/export/iupac_smiles/test_21-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the molecule with DeepSMILES CCCCCCCC6OCCCCCO6)C))O))O))O)))))OCCCCCO6)CO)))O))O[C@@H]CCCCCCC6)))))))C=O)NCCC4))))))))OC=O)C=CC=CC=C6)))))))))))))C=O)NCCNC=O)CCCCCC6)OCCCCCO6)CO)))O))OCC=CC=CC=CC=C6OC%10=O))))))))))))))O)))))O))OCC=CC=CC=CC=C6OC%10=O is benzoic acid [4-[(2S)-1-(1-azetidinyl)-3-cyclohexyl-1-oxopropan-2-yl]oxy-2-[5-[[2-[[[3-[[3,5-dihydroxy-6-(hydroxymethyl)-4-[(2-oxo-1-benzopyran-3-yl)methoxy]-2-oxanyl]oxy]-4-hydroxy-5-[(2-oxo-1-benzopyran-3-yl)methoxy]cyclohexyl]-oxomethyl]amino]ethylamino]-oxomethyl]-3-ethyl-2-[(3,4,5-trihydroxy-6-methyl-2-oxanyl)oxy]cyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)-3-oxanyl] ester."} {"text":"The CAS-like IUPAC name of the compound 
with canonical SMILES COCc1nccnc1-c1ccc(OC(F)(F)F)cc1 is 2-(methoxymethyl)-3-[4-(trifluoromethoxy)phenyl]pyrazine."}", "/scratch/micpie/export/iupac_smiles/train_6-8.jsonl": "{"text":"Task: Please give me the SMILES of a compound given the systematic IUPAC name.\nIUPAC name: N-(2-azanylethyl)-2-[2-(oxolan-2-ylmethoxy)propanoylamino]benzamide\nResult: InChI=1S\/C17H25N3O4\/c1-12(24-11-13-5-4-10-23-13)16(21)20-15-7-3-2-6-14(15)17(22)19-9-8-18\/h2-3,6-7,12-13H,4-5,8-11,18H2,1H3,(H,19,22)(H,20,21)"} {"text":"Task: Please give me the SMILES representation of a chemical given the systematic IUPAC name.\nIUPAC name: 3-azanyl-4-fluoranyl-N-[(1S)-1-phenylethyl]benzenesulfonamide\nResult: [C][C@@H1][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=Branch1][=Branch2][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][F][N]"}", "/scratch/micpie/export/iupac_smiles/test_3-5.jsonl": "{"text":"The SMILES of the molecule with IUAPC name in CAS-like style (2R)-1-(phenylmethylthio)-3-propan-2-yloxy-2-propanol is CC(C)OC[C@H](CSCC1=CC=CC=C1)O."} {"text":"The InChI of the compound with IUAPC name in CAS-like style 2-amino-2-(hydroxymethyl)propane-1,3-diol;5-chloro-6-methyl-3H-1,3-benzoxazol-2-one is InChI=1S\/C8H6ClNO2.C4H11NO3\/c1-4-2-7-6(3-5(4)9)10-8(11)12-7;5-4(1-6,2-7)3-8\/h2-3H,1H3,(H,10,11);6-8H,1-3,5H2."}", "/scratch/micpie/export/iupac_smiles/train_2-0.jsonl": "{"text":"The traditional IUPAC name of the compound with InChI InChI=1S\/C24H34O4\/c1-24-11-10-17-16-7-6-15(26-2)13-19(16)21(25)14-18(17)20(24)8-9-22(24)28-23-5-3-4-12-27-23\/h6-7,13,17-18,20-23,25H,3-5,8-12,14H2,1-2H3\/t17-,18-,20+,21-,22+,23+,24+\/m1\/s1 is (6R,8R,9S,13S,14S,17S)-3-methoxy-13-methyl-17-[(2S)-tetrahydropyran-2-yl]oxy-6,7,8,9,11,12,14,15,16,17-decahydrocyclopenta[a]phenanthren-6-ol."} {"text":"The traditional IUPAC name of the chemical with SELFIES 
[C][C@][C][C@@H1][C][C@][Branch1][Ring2][C][Ring1][=Branch1][Branch1][=C][C][C@@][Branch1][Ring2][C][Ring1][#Branch1][Branch1][Ring2][C][Ring1][#Branch2][O][C][=Branch1][C][=O][O-1] is (1S,3R,5S,7R)-3-hydroxy-5-methyl-adamantane-1-carboxylate."}", "/scratch/micpie/export/iupac_smiles/train_12-4.jsonl": "{"text":"The DeepSMILES of the molecule with systematic IUPAC name N-(3-dibenzofuran-1-ylphenyl)-9,9-dimethyl-N-(3-phenylphenyl)fluoren-2-amine is CCC=CC=CC=C6C=C9C=CC=C6))NC=CC=CC=C6)C=CC=CC=C6)))))))))))C=CC=CC=C6)C=CC=CC=CC=C6OC9=CC=C%13)))))))))))))))))))))))))))))C."} {"text":"The canonical SMILES of the compound with systematic IUPAC name 2-[(2-azanyl-3,3-dimethyl-butanoyl)amino]pent-4-enoic acid is C=CCC(NC(=O)C(N)C(C)(C)C)C(=O)O."}", "/scratch/micpie/export/iupac_smiles/test_13-1.jsonl": "{"text":"The CAS-like IUPAC name of the chemical with SMILES CC(C)(C)C(C(=O)NC(C)(CO)CO)N is 2-amino-N-(1,3-dihydroxy-2-methylpropan-2-yl)-3,3-dimethylbutanamide."} {"text":"The IUAPC name in CAS-like style of the chemical with InChI InChI=1S\/C16H18N2O3\/c1-10-15(11(2)18(3)17-10)21-16(19)13-8-9-20-14-7-5-4-6-12(13)14\/h4-7,13H,8-9H2,1-3H3\/t13-\/m0\/s1 is (4S)-3,4-dihydro-2H-1-benzopyran-4-carboxylic acid (1,3,5-trimethyl-4-pyrazolyl) ester."}", "/scratch/micpie/export/iupac_smiles/valid_0-7.jsonl": "{"text":"Task: Please give me the SMILES representation of a chemical based on the traditional IUPAC name.\nIUPAC name: 1-(4-bromobenzyl)-N-[(Z)-1-(p-tolyl)ethylideneamino]pyrazole-3-carboxamide\nResult: [C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][\/C][=Branch2][Ring1][N][=N][\\N][C][=Branch1][C][=O][C][=N][N][Branch1][Branch1][C][=C][Ring1][Branch1][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Br][\/C]"} {"text":"Task: Please create the SMILES representation of a molecule given the traditional IUPAC name.\nIUPAC name: ethylene;4-methoxy-4,6-dimethyl-tetrahydropyran-2,5-diol\nResult: CCCCCCO6)O)))C)OC)))O.C=C"}", 
"/scratch/micpie/export/iupac_smiles/train_4-9.jsonl": "{"text":"Task: Please create the SMILES of a chemical based on the IUAPC name in CAS-like style.\nIUPAC name: (1S,2R,4aS,5R,6S,8aS)-6-(1,3-dithiolan-2-yl)-5-[(E)-2-[5-(3-fluorophenyl)-2-pyridinyl]ethenyl]-2-hydroxy-1,4a-dimethyl-2,3,4,5,6,7,8,8a-octahydronaphthalene-1-carboxaldehyde\nResult: InChI=1S\/C29H34FNO2S2\/c1-28-13-12-26(33)29(2,18-32)25(28)11-9-23(27-34-14-15-35-27)24(28)10-8-22-7-6-20(17-31-22)19-4-3-5-21(30)16-19\/h3-8,10,16-18,23-27,33H,9,11-15H2,1-2H3\/b10-8+\/t23-,24+,25-,26+,28-,29-\/m0\/s1"} {"text":"Task: Please create the SMILES of a molecule given the IUAPC name in CAS-like style.\nIUPAC name: 4-(1-aminobutan-2-yl)-3-ethyl-4-piperidinol\nResult: [C][C][C][C][N][C][C][C][Ring1][=Branch1][Branch1][Branch2][C][Branch1][Ring1][C][C][C][N][O]"}", "/scratch/micpie/export/iupac_smiles/valid_15-5.jsonl": "{"text":"The InChI of the chemical with CAS-like IUPAC name methanesulfonic acid 2-[5-[5-[(4,4-difluoro-1-piperidinyl)-oxomethyl]-2-pyridinyl]-7-(trifluoromethyl)-2-benzofuranyl]ethyl ester is InChI=1S\/C23H21F5N2O5S\/c1-36(32,33)34-9-4-17-11-16-10-15(12-18(20(16)35-17)23(26,27)28)19-3-2-14(13-29-19)21(31)30-7-5-22(24,25)6-8-30\/h2-3,10-13H,4-9H2,1H3."} {"text":"The DeepSMILES of the molecule with CAS-like IUPAC name 5-chloro-4-(chloromethyl)-2-(1-naphthalenyloxy)pyridine is C=CC=CC=C6)C=CC=C6OC=NC=CC=C6)CCl)))Cl."}", "/scratch/micpie/export/iupac_smiles/valid_15-9.jsonl": "{"text":"Task: Please generate the SMILES representation of a molecule given the CAS-like IUPAC name.\nIUPAC name: methanesulfonic acid 2-[5-[5-[(4,4-difluoro-1-piperidinyl)-oxomethyl]-2-pyridinyl]-7-(trifluoromethyl)-2-benzofuranyl]ethyl ester\nResult: CS(=O)(=O)OCCC1=CC2=CC(=CC(=C2O1)C(F)(F)F)C3=NC=C(C=C3)C(=O)N4CCC(CC4)(F)F"} {"text":"Task: Please generate the SMILES representation of a compound given the CAS-like IUPAC name.\nIUPAC name: 5-chloro-4-(chloromethyl)-2-(1-naphthalenyloxy)pyridine\nResult: 
ClCc1cc(Oc2cccc3ccccc23)ncc1Cl"}", "/scratch/micpie/export/iupac_smiles/test_13-5.jsonl": "{"text":"The InChI of the chemical with IUAPC name in CAS-like style 2-amino-N-(1,3-dihydroxy-2-methylpropan-2-yl)-3,3-dimethylbutanamide is InChI=1S\/C10H22N2O3\/c1-9(2,3)7(11)8(15)12-10(4,5-13)6-14\/h7,13-14H,5-6,11H2,1-4H3,(H,12,15)."} {"text":"The DeepSMILES of the molecule with IUAPC name in CAS-like style (4S)-3,4-dihydro-2H-1-benzopyran-4-carboxylic acid (1,3,5-trimethyl-4-pyrazolyl) ester is CC=CC=NN5C)))C))OC=O)[C@H]CCOC=CC=CC=C%106."}", "/scratch/micpie/export/iupac_smiles/train_1-7.jsonl": "{"text":"Task: Please give me the SMILES representation of a molecule based on the traditional IUPAC name.\nIUPAC name: [3-[(1Z)-3-chlorobuta-1,3-dienyl]-2-methyl-4-pyridyl]-(4-piperidinobutyl)amine\nResult: InChI=1S\/C19H28ClN3\/c1-16(20)8-9-18-17(2)21-12-10-19(18)22-11-4-7-15-23-13-5-3-6-14-23\/h8-10,12H,1,3-7,11,13-15H2,2H3,(H,21,22)\/b9-8-"} {"text":"Task: Please create the SMILES of a chemical based on the traditional IUPAC name.\nIUPAC name: (6S,8R,9S,13S,14S,17S)-3-methoxy-13-methyl-17-[(2R)-tetrahydropyran-2-yl]oxy-6,7,8,9,11,12,14,15,16,17-decahydrocyclopenta[a]phenanthren-6-ol\nResult: InChI=1S\/C24H34O4\/c1-24-11-10-17-16-7-6-15(26-2)13-19(16)21(25)14-18(17)20(24)8-9-22(24)28-23-5-3-4-12-27-23\/h6-7,13,17-18,20-23,25H,3-5,8-12,14H2,1-2H3\/t17-,18-,20+,21+,22+,23-,24+\/m1\/s1"}", "/scratch/micpie/export/iupac_smiles/train_9-6.jsonl": "{"text":"The InChI of the chemical with IUPAC name 1-(tert-butylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,11,13-hexaenylideneamino)oxypropan-2-ol is InChI=1S\/C22H28N2O2\/c1-22(2,3)23-14-18(25)15-26-24-21-19-10-6-4-8-16(19)12-13-17-9-5-7-11-20(17)21\/h4-11,18,23,25H,12-15H2,1-3H3."} {"text":"The DeepSMILES of the compound with IUPAC name N-[(5-butyl-2,4-dimethyl-6-oxo-3,4-dihydropyridin-5-yl)methyl]-3-[cyclopentyl(methyl)amino]-2-ethenylbenzamide is CCCCCCCC=NC6=O)))C)))C))CNC=O)C=CC=CC=C6)))NC)CCCCC5)))))))C=C."}", 
"/scratch/micpie/export/iupac_smiles/valid_23-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the chemical with DeepSMILES CCCCNC6)S=O)=O)C=CC=CC=C6)))Cl)))))))C=O is 1-(3-chlorophenyl)sulfonyl-3-piperidinecarboxaldehyde."} {"text":"The CAS-like IUPAC name of the compound with SMILES CC1=C(C(=C(C(=C1C)C)S(=O)(=O)NCCC(=O)NN2C=NC3=CC=CC=C32)C)C is N-(1-benzimidazolyl)-3-[(2,3,4,5,6-pentamethylphenyl)sulfonylamino]propanamide."}", "/scratch/micpie/export/iupac_smiles/train_1-4.jsonl": "{"text":"The SELFIES of the chemical with systematic IUPAC name 3-[(1Z)-3-chloranylbuta-1,3-dienyl]-2-methyl-N-(4-piperidin-1-ylbutyl)pyridin-4-amine is [C][C][=N][C][=C][C][=Branch1][O][=C][Ring1][=Branch1][\/C][=C][\\C][=Branch1][C][=C][Cl][N][C][C][C][C][N][C][C][C][C][C][Ring1][=Branch1]."} {"text":"The DeepSMILES of the chemical with systematic IUPAC name (6S,8R,9S,13S,14S,17S)-3-methoxy-13-methyl-17-[(2R)-oxan-2-yl]oxy-6,7,8,9,11,12,14,15,16,17-decahydrocyclopenta[a]phenanthren-6-ol is C[C@]CC[C@H][C@H][C@@H]6CC[C@@H]9O[C@@H]CCCCO6)))))))))))C[C@@H]C=C6C=CC=C6)OC)))))))O."}", "/scratch/micpie/export/iupac_smiles/valid_2-0.jsonl": "{"text":"The traditional IUPAC name of the compound with SMILES CCC[C@H](C)[C@@H](C1=CC=C(C=C1)O)C(C)C is 4-[(1S,2S)-1-isopropyl-2-methyl-pentyl]phenol."} {"text":"The traditional IUPAC name of the molecule with SELFIES [C][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=N][C][=Branch1][Branch1][=C][S][Ring1][Branch1][C][=C][C][=C][C][=Branch1][=C][=C][C][=Branch1][=Branch2][=C][Ring1][=Branch1][O][C][Ring1][#Branch2][=O][Br][N+1][=Branch1][C][=O][O-1] is 8-bromo-6-nitro-3-(2-p-phenetylthiazol-4-yl)coumarin."}", "/scratch/micpie/export/iupac_smiles/valid_26-9.jsonl": "{"text":"Task: Please generate the SMILES representation of a chemical given the IUAPC name in CAS-like style.\nIUPAC name: 2-[4-(difluoromethyl)phenyl]-4-ethyl-5-thiazolecarboxylic acid\nResult: 
[C][C][C][=C][Branch2][Ring1][#Branch1][S][C][=Branch1][Ring2][=N][Ring1][Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][Branch1][C][F][F][C][=Branch1][C][=O][O]"} {"text":"Task: Please create the SMILES of a molecule given the CAS-like IUPAC name.\nIUPAC name: 1-[(3R)-3-(2-chlorophenyl)-5-(3-methoxyphenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(phenylmethyl)-1-piperazinyl]ethanone\nResult: [C][O][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=N][N][Branch1][S][C@H1][Branch1][Ring2][C][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][Cl][C][=Branch1][C][=O][C][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][=C][C][=C][C][=C][Ring1][=Branch1]"}", "/scratch/micpie/export/iupac_smiles/valid_27-9.jsonl": "{"text":"Task: Please generate the SMILES of a chemical given the IUAPC name in CAS-like style.\nIUPAC name: 1-[(3S)-3-(4-chlorophenyl)-5-(4-fluorophenyl)-3,4-dihydropyrazol-2-yl]-2-(N-methylanilino)ethanone\nResult: CN(CC(=O)N1[C@@H](CC(=N1)C2=CC=C(C=C2)F)C3=CC=C(C=C3)Cl)C4=CC=CC=C4"} {"text":"Task: Please give me the SMILES representation of a molecule given the IUAPC name in CAS-like style.\nIUPAC name: 6-[7'-[(9,9-dimethyl-2-fluorenyl)amino]-9,9'-spirobi[fluorene]-2'-yl]benzene-1,2,3,4,5-pentol\nResult: CC1(C)c2ccccc2-c2ccc(Nc3ccc4c(c3)C3(c5ccccc5-c5ccccc53)c3cc(-c5c(O)c(O)c(O)c(O)c5O)ccc3-4)cc21"}", "/scratch/micpie/export/iupac_smiles/train_10-3.jsonl": "{"text":"The canonical SMILES of the chemical with traditional IUPAC name N-[4-[5-chloro-3-(dihydroxymethyl)-N,2-diethyl-anilino]cyclohexyl]carbamic acid tert-butyl ester is CCc1c(C(O)O)cc(Cl)cc1N(CC)C1CCC(NC(=O)OC(C)(C)C)CC1."} {"text":"The InChI of the chemical with traditional IUPAC name acetic acid [(3aS)-3a-fluorocarbonyl-1-isopropyl-5a,5b,8,8,11a-pentamethyl-1,2,3,4,5,6,7,7a,9,10,11,11b,12,13,13a,13b-hexadecahydrocyclopenta[a]chrysen-9-yl] ester is 
InChI=1S\/C32H51FO3\/c1-19(2)21-11-16-32(27(33)35)18-17-30(7)22(26(21)32)9-10-24-29(6)14-13-25(36-20(3)34)28(4,5)23(29)12-15-31(24,30)8\/h19,21-26H,9-18H2,1-8H3\/t21?,22?,23?,24?,25?,26?,29?,30?,31?,32-\/m0\/s1."}", "/scratch/micpie/export/iupac_smiles/valid_14-9.jsonl": "{"text":"Task: Please create the SMILES of a molecule given the CAS-like IUPAC name.\nIUPAC name: N-[[(1S,2S)-2-[[2-furanyl(oxo)methyl]amino]cyclopentyl]methyl]carbamic acid tert-butyl ester\nResult: InChI=1S\/C16H24N2O4\/c1-16(2,3)22-15(20)17-10-11-6-4-7-12(11)18-14(19)13-8-5-9-21-13\/h5,8-9,11-12H,4,6-7,10H2,1-3H3,(H,17,20)(H,18,19)\/t11-,12-\/m0\/s1"} {"text":"Task: Please generate the SMILES representation of a molecule given the IUAPC name in CAS-like style.\nIUPAC name: N-[2-(5-bromo-7-chloro-2-benzofuranyl)ethyl]carbamic acid tert-butyl ester\nResult: InChI=1S\/C15H17BrClNO3\/c1-15(2,3)21-14(19)18-5-4-11-7-9-6-10(16)8-12(17)13(9)20-11\/h6-8H,4-5H2,1-3H3,(H,18,19)"}", "/scratch/micpie/export/iupac_smiles/valid_9-4.jsonl": "{"text":"The canonical SMILES of the chemical with systematic IUPAC name 1-(propan-2-ylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,9,11,13-heptaenylideneamino)oxy-propan-2-ol is CC(C)NCC(O)CON=c1c2ccccc2ccc2ccccc12."} {"text":"The SMILES of the molecule with systematic IUPAC name 5-chloranyl-3-[cyclobutyl(methyl)amino]-N-[(4-cycloheptyl-4,6-dimethyl-2-oxidanylidene-piperidin-3-yl)methyl]-2-methyl-benzamide is CC1CC(C(C(=O)N1)CNC(=O)C2=C(C(=CC(=C2)Cl)N(C)C3CCC3)C)(C)C4CCCCCC4."}", "/scratch/micpie/export/iupac_smiles/test_7-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the chemical with DeepSMILES C=CC=NC=C6)CNS=O)=O)C=CC=CC=C6))O))N is 3-amino-4-hydroxy-N-(2-pyridinylmethyl)benzenesulfonamide."} {"text":"The IUAPC name in CAS-like style of the molecule with SELFIES 
[C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][Branch1][Branch2][C][C][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][F] is 3-(N-(4-fluorophenyl)sulfonylanilino)propanamide."}", "/scratch/micpie/export/iupac_smiles/test_4-8.jsonl": "{"text":"Task: Please give me the SMILES representation of a chemical given the systematic IUPAC name.\nIUPAC name: 1-(3-methoxyphenyl)sulfonyl-3-[(4-methylpiperazin-1-yl)methyl]indole\nResult: CN1CCN(CC1)CC2=CN(C3=CC=CC=C32)S(=O)(=O)C4=CC=CC(=C4)OC"} {"text":"Task: Please generate the SMILES of a chemical based on the systematic IUPAC name.\nIUPAC name: 4-(1-azanyl-2-methyl-propan-2-yl)-3-ethyl-piperidin-4-ol\nResult: CCC1CNCCC1(C(C)(C)CN)O"}", "/scratch/micpie/export/iupac_smiles/train_7-9.jsonl": "{"text":"Task: Please create the SMILES of a molecule based on the IUAPC name in CAS-like style.\nIUPAC name: 2-[(5-amino-2-chlorophenyl)sulfonylamino]acetic acid methyl ester\nResult: InChI=1S\/C9H11ClN2O4S\/c1-16-9(13)5-12-17(14,15)8-4-6(11)2-3-7(8)10\/h2-4,12H,5,11H2,1H3"} {"text":"Task: Please generate the SMILES of a molecule given the IUAPC name in CAS-like style.\nIUPAC name: 3-(N-(2-nitrophenyl)sulfonylanilino)propanamide\nResult: C1=CC=C(C=C1)N(CCC(=O)N)S(=O)(=O)C2=CC=CC=C2[N+](=O)[O-]"}", "/scratch/micpie/export/iupac_smiles/test_5-8.jsonl": "{"text":"Task: Please create the SMILES representation of a chemical based on the systematic IUPAC name.\nIUPAC name: 4-(1-azanyl-2-methyl-butan-2-yl)-3-ethyl-piperidin-4-ol\nResult: InChI=1S\/C12H26N2O\/c1-4-10-8-14-7-6-12(10,15)11(3,5-2)9-13\/h10,14-15H,4-9,13H2,1-3H3"} {"text":"Task: Please give me the SMILES of a chemical given the systematic IUPAC name.\nIUPAC name: N-(2-azanylethyl)-2-[(4-methyl-3-phenyl-pentanoyl)amino]benzamide\nResult: CCC)CCC=O)NC=CC=CC=C6C=O)NCCN))))))))))))))C=CC=CC=C6"}", "/scratch/micpie/export/iupac_smiles/test_8-7.jsonl": "{"text":"Task: Please create the SMILES representation of 
a chemical based on the traditional IUPAC name.\nIUPAC name: N-[3-[(3,4-dimethylphenyl)sulfonylamino]phenyl]-2-methyl-propionamide\nResult: [C][C][=C][Branch2][Ring2][Branch1][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][S][=Branch1][C][=O][=Branch1][C][=O][N][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][Branch1][C][C][C][C]"} {"text":"Task: Please create the SMILES representation of a molecule based on the traditional IUPAC name.\nIUPAC name: 1-(benzhydrylideneamino)oxy-3-(tert-butylamino)propan-2-ol\nResult: InChI=1S\/C20H26N2O2\/c1-20(2,3)21-14-18(23)15-24-22-19(16-10-6-4-7-11-16)17-12-8-5-9-13-17\/h4-13,18,21,23H,14-15H2,1-3H3"}", "/scratch/micpie/export/iupac_smiles/train_10-5.jsonl": "{"text":"The DeepSMILES of the molecule with CAS-like IUPAC name N-[4-[5-chloro-3-(dihydroxymethyl)-N,2-diethylanilino]cyclohexyl]carbamic acid tert-butyl ester is CCC=CC=CC=C6NCC))CCCCCC6))NC=O)OCC)C)C))))))))))))Cl)))CO)O."} {"text":"The InChI of the compound with IUAPC name in CAS-like style acetic acid [(3aS)-3a-carbonofluoridoyl-5a,5b,8,8,11a-pentamethyl-1-propan-2-yl-1,2,3,4,5,6,7,7a,9,10,11,11b,12,13,13a,13b-hexadecahydrocyclopenta[a]chrysen-9-yl] ester is InChI=1S\/C32H51FO3\/c1-19(2)21-11-16-32(27(33)35)18-17-30(7)22(26(21)32)9-10-24-29(6)14-13-25(36-20(3)34)28(4,5)23(29)12-15-31(24,30)8\/h19,21-26H,9-18H2,1-8H3\/t21?,22?,23?,24?,25?,26?,29?,30?,31?,32-\/m0\/s1."}", "/scratch/micpie/export/iupac_smiles/train_20-7.jsonl": "{"text":"Task: Please create the SMILES of a molecule given the traditional IUPAC name.\nIUPAC name: sulfuric acid [4-hydroxy-6-[(6-isohexyl-4,8-diketo-2,6,13,17,17-pentamethyl-7-oxapentacyclo[10.8.0.02,9.05,9.013,18]eicos-11-en-16-yl)oxy]-5-(3,4,5-trihydroxy-6-methyl-tetrahydropyran-2-yl)oxy-tetrahydropyran-3-yl] ester\nResult: CC(C)CCCC1(C)OC(=O)C23CC=C4C(CCC5C4(C)CCC(OC4OCC(OS(=O)(=O)O)C(O)C4OC4OC(C)C(O)C(O)C4O)C5(C)C)C2(C)CC(=O)C13"} {"text":"Task: Please create the SMILES of a chemical based on the 
traditional IUPAC name.\nIUPAC name: benzoic acid [4-[(1S)-1-(azetidine-1-carbonyl)butoxy]-2-[5-[2-[[3-[4-(3-fluorophenyl)triazol-1-yl]-5-[4-[4-(3-fluorophenyl)triazol-1-yl]-3,5-dihydroxy-6-methylol-tetrahydropyran-2-yl]oxy-4-hydroxy-cyclohexanecarbonyl]amino]ethylcarbamoyl]-3-methyl-2-(3,4,5-trihydroxy-6-methyl-tetrahydropyran-2-yl)oxy-cyclohexoxy]-5-hydroxy-6-methylol-tetrahydropyran-3-yl] ester\nResult: InChI=1S\/C66H85F2N9O21\/c1-4-11-44(62(89)75-20-10-21-75)92-58-53(83)48(31-79)96-66(59(58)97-63(90)34-12-6-5-7-13-34)94-46-27-37(22-32(2)57(46)98-65-56(86)55(85)50(80)33(3)91-65)60(87)69-18-19-70-61(88)38-25-43(76-28-41(71-73-76)35-14-8-16-39(67)23-35)51(81)45(26-38)93-64-54(84)49(52(82)47(30-78)95-64)77-29-42(72-74-77)36-15-9-17-40(68)24-36\/h5-9,12-17,23-24,28-29,32-33,37-38,43-59,64-66,78-86H,4,10-11,18-22,25-27,30-31H2,1-3H3,(H,69,87)(H,70,88)\/t32?,33?,37?,38?,43?,44-,45?,46?,47?,48?,49?,50?,51?,52?,53?,54?,55?,56?,57?,58?,59?,64?,65?,66?\/m0\/s1"}", "/scratch/micpie/export/iupac_smiles/valid_21-2.jsonl": "{"text":"The preferred IUPAC name of the compound with DeepSMILES CCCCCCO6)OCCCCCC6OCCCCCO6)CO)))O))O[C@@H]CCCCCCC6)))))))C=O)NCCC4))))))))OC=O)C=CC=CC=C6)))))))))))))C=O)NCCNC=O)CCCCCC6)OCCCCCO6)CO)))O))OCC=CC=CC=CC=C6OC%10=O))))))))))))))O)))))O))OCC=CC=CC=CC=C6OC%10=O))))))))))))))))))))))))NC=CN=N5))C=CC=CC=C6)))))))))))))O))O))O is [4-[(2S)-1-(azetidin-1-yl)-3-cyclohexyl-1-oxopropan-2-yl]oxy-2-[5-[2-[[3-[3,5-dihydroxy-6-(hydroxymethyl)-4-[(2-oxochromen-3-yl)methoxy]oxan-2-yl]oxy-4-hydroxy-5-[(2-oxochromen-3-yl)methoxy]cyclohexanecarbonyl]amino]ethylcarbamoyl]-3-(4-phenyltriazol-1-yl)-2-(3,4,5-trihydroxy-6-methyloxan-2-yl)oxycyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)oxan-3-yl] benzoate."} {"text":"The IUPAC name of the compound with SELFIES [C][=C][Branch2][Ring1][Ring2][C][=N][C][=Branch1][Branch1][=C][Ring1][=Branch1][F][O][C][C][Branch1][C][F][Branch1][C][F][F][I-1][C][=N][C][=N][N][Ring1][Branch1] is 
3-fluoro-5-(1H-1,2,4-triazol-5-yliodanuidyl)-2-(2,2,2-trifluoroethoxy)pyridine."}", "/scratch/micpie/export/iupac_smiles/test_11-1.jsonl": "{"text":"The CAS-like IUPAC name of the chemical with canonical SMILES CN(C)CCC(Oc1ccc(F)cc1F)c1ccccc1.O=C(O)c1cc([N+](=O)[O-])ccc1[N+](=O)[O-] is 3-(2,4-difluorophenoxy)-N,N-dimethyl-3-phenyl-1-propanamine;2,5-dinitrobenzoic acid."} {"text":"The IUAPC name in CAS-like style of the compound with SMILES CCCCCCCCC1(C2=C(C=CC(=C2)C3=CC=C(C=C3)C=O)C4=C1C=C(C=C4)C5=CC=C(C=C5)O)CCCCCCCC is 4-[7-(4-hydroxyphenyl)-9,9-dioctyl-2-fluorenyl]benzaldehyde."}", "/scratch/micpie/export/iupac_smiles/test_6-5.jsonl": "{"text":"The InChI of the chemical with IUAPC name in CAS-like style N-(2-aminoethyl)-2-[(3-methyl-1-oxo-3-phenylbutyl)amino]benzamide is InChI=1S\/C20H25N3O2\/c1-20(2,15-8-4-3-5-9-15)14-18(24)23-17-11-7-6-10-16(17)19(25)22-13-12-21\/h3-11H,12-14,21H2,1-2H3,(H,22,25)(H,23,24)."} {"text":"The InChI of the chemical with IUAPC name in CAS-like style N-[(3,4-dichlorophenyl)methyl]-2,3-dihydro-1,4-benzodioxin-6-amine is InChI=1S\/C15H13Cl2NO2\/c16-12-3-1-10(7-13(12)17)9-18-11-2-4-14-15(8-11)20-6-5-19-14\/h1-4,7-8,18H,5-6,9H2."}", "/scratch/micpie/export/iupac_smiles/test_26-0.jsonl": "{"text":"The traditional IUPAC name of the compound with InChI InChI=1S\/C12H9Cl2NO2S\/c1-2-8-10(12(16)17)18-11(15-8)6-4-3-5-7(13)9(6)14\/h3-5H,2H2,1H3,(H,16,17) is 2-(2,3-dichlorophenyl)-4-ethyl-thiazole-5-carboxylic acid."} {"text":"The traditional IUPAC name of the molecule with DeepSMILES COC=CC=CC=C6)C=NN[C@H]C5)C=CC=CC=C6Cl))))))))C=O)CNCCC=CC=CC=C6C%10 is 1-[(5R)-5-(2-chlorophenyl)-3-(3-methoxyphenyl)-2-pyrazolin-1-yl]-2-(3,4-dihydro-1H-isoquinolin-2-yl)ethanone."}", "/scratch/micpie/export/iupac_smiles/test_13-7.jsonl": "{"text":"Task: Please generate the SMILES of a molecule given the traditional IUPAC name.\nIUPAC name: 2-amino-N-(2-hydroxy-1-methyl-1-methylol-ethyl)-3,3-dimethyl-butyramide\nResult: 
[C][C][Branch1][C][C][Branch1][C][C][C][Branch1][S][C][=Branch1][C][=O][N][C][Branch1][C][C][Branch1][Ring1][C][O][C][O][N]"} {"text":"Task: Please generate the SMILES of a molecule given the traditional IUPAC name.\nIUPAC name: (4S)-chroman-4-carboxylic acid (1,3,5-trimethylpyrazol-4-yl) ester\nResult: CC1=C(C(=NN1C)C)OC(=O)[C@H]2CCOC3=CC=CC=C23"}", "/scratch/micpie/export/iupac_smiles/test_23-6.jsonl": "{"text":"The canonical SMILES of the molecule with IUPAC name 1-(4-hydroxy-3-nitrophenyl)sulfonylpiperidine-3-carbaldehyde is O=CC1CCCN(S(=O)(=O)c2ccc(O)c([N+](=O)[O-])c2)C1."} {"text":"The DeepSMILES of the compound with preferred IUPAC name N-(benzimidazol-1-yl)-4-(ethylsulfamoyl)benzamide is CCNS=O)=O)C=CC=CC=C6))C=O)NNC=NC=CC=CC=C69."}", "/scratch/micpie/export/iupac_smiles/test_21-4.jsonl": "{"text":"The canonical SMILES of the chemical with systematic IUPAC name [4-[(2S)-1-(azetidin-1-yl)-3-cyclohexyl-1-oxidanylidene-propan-2-yl]oxy-2-[3-ethyl-5-[2-[[3-[6-(hydroxymethyl)-3,5-bis(oxidanyl)-4-[(2-oxidanylidenechromen-3-yl)methoxy]oxan-2-yl]oxy-4-oxidanyl-5-[(2-oxidanylidenechromen-3-yl)methoxy]cyclohexyl]carbonylamino]ethylcarbamoyl]-2-[6-methyl-3,4,5-tris(oxidanyl)oxan-2-yl]oxy-cyclohexyl]oxy-6-(hydroxymethyl)-5-oxidanyl-oxan-3-yl] benzoate is CCC1CC(C(=O)NCCNC(=O)C2CC(OCc3cc4ccccc4oc3=O)C(O)C(OC3OC(CO)C(O)C(OCc4cc5ccccc5oc4=O)C3O)C2)CC(OC2OC(CO)C(O)C(O[C@@H](CC3CCCCC3)C(=O)N3CCC3)C2OC(=O)c2ccccc2)C1OC1OC(C)C(O)C(O)C1O."} {"text":"The SELFIES of the chemical with systematic IUPAC name 2-(methoxymethyl)-3-[4-(trifluoromethyloxy)phenyl]pyrazine is [C][O][C][C][=N][C][=C][N][=C][Ring1][=Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][O][C][Branch1][C][F][Branch1][C][F][F]."}", "/scratch/micpie/export/iupac_smiles/valid_4-2.jsonl": "{"text":"The IUPAC name of the compound with DeepSMILES CCS=O)=O)NCC=CNN=N5))[C@@H][C@@H][C@H][C@@H][C@H]O6)CO)))O[C@H][C@@H][C@H][C@H][C@H]O6)CO)))O))O))O)))))O))O)))))))C=CC=CC=C6))Cl is 
N-(4-chlorophenyl)-N-[[1-[(2S,3R,4R,5S,6R)-3,4-dihydroxy-6-(hydroxymethyl)-5-[(2S,3R,4S,5R,6R)-3,4,5-trihydroxy-6-(hydroxymethyl)oxan-2-yl]oxyoxan-2-yl]triazol-4-yl]methyl]ethanesulfonamide."} {"text":"The preferred IUPAC name of the chemical with DeepSMILES CCCCCCCC6))CN))CCCNCC6CC)))))))O is 4-[1-(aminomethyl)-4-ethylcyclohexyl]-3-ethylpiperidin-4-ol."}", "/scratch/micpie/export/iupac_smiles/test_15-1.jsonl": "{"text":"The CAS-like IUPAC name of the compound with SMILES C1CN(CCC1(F)F)C(=O)C2=CC=C(C=C2)C3=CC(=C4C(=C3)C=C(O4)CNC(=O)\/C=C\/C5=CN=C(C=C5)N)C6=CC(=C(C=C6F)F)Cl is (E)-3-(6-amino-3-pyridinyl)-N-[[7-(5-chloro-2,4-difluorophenyl)-5-[4-[(4,4-difluoro-1-piperidinyl)-oxomethyl]phenyl]-2-benzofuranyl]methyl]-2-propenamide."} {"text":"The IUAPC name in CAS-like style of the chemical with SELFIES [C][C][C][=C][Branch1][Ring2][C][Ring1][Branch1][C][=C][Branch1][Branch1][C][=C][Ring1][#Branch1][O][C][=N][C][=C][Branch1][=Branch2][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][Cl][Cl] is 5-chloro-4-(chloromethyl)-2-(2,3-dihydro-1H-inden-5-yloxy)pyridine."}", "/scratch/micpie/export/iupac_smiles/test_7-3.jsonl": "{"text":"The SMILES of the chemical with traditional IUPAC name 3-amino-4-hydroxy-N-(2-pyridylmethyl)benzenesulfonamide is C1=CC=NC(=C1)CNS(=O)(=O)C2=CC(=C(C=C2)O)N."} {"text":"The InChI of the molecule with traditional IUPAC name 3-(N-(4-fluorophenyl)sulfonylanilino)propionamide is InChI=1S\/C15H15FN2O3S\/c16-12-6-8-14(9-7-12)22(20,21)18(11-10-15(17)19)13-4-2-1-3-5-13\/h1-9H,10-11H2,(H2,17,19)."}", "/scratch/micpie/export/iupac_smiles/valid_25-8.jsonl": "{"text":"Task: Please create the SMILES of a molecule based on the systematic IUPAC name.\nIUPAC name: 5-(5-chloranylpyridin-3-yl)-N-[(1S)-1-cyclopropylethyl]-7-methyl-pyrazolo[1,5-a]pyrimidine-3-carboxamide\nResult: Cc1cc(-c2cncc(Cl)c2)nc2c(C(=O)N[C@@H](C)C3CC3)cnn12"} {"text":"Task: Please create the SMILES of a molecule based on the systematic IUPAC name.\nIUPAC name: 
2-[azanyl(phenyl)methyl]-4-ethyl-1,3-thiazole-5-carboxylic acid\nResult: CCc1nc(C(N)c2ccccc2)sc1C(=O)O"}", "/scratch/micpie/export/iupac_smiles/valid_17-6.jsonl": "{"text":"The SMILES of the compound with IUPAC name 2-ethoxy-N-(4-fluorophenyl)-3-[[4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl]benzimidazole-4-carboxamide is CCOC1=NC2=CC=CC(=C2N1CC3=CC=C(C=C3)C4=CC=CC=C4C5=NNN=N5)C(=O)NC6=CC=C(C=C6)F."} {"text":"The SMILES of the compound with preferred IUPAC name 3-N-(3-aminophenyl)-1-N,5-N-bis[3-(tert-butylamino)phenyl]benzene-1,3,5-tricarboxamide is CC(C)(C)NC1=CC=CC(=C1)NC(=O)C2=CC(=CC(=C2)C(=O)NC3=CC=CC(=C3)N)C(=O)NC4=CC(=CC=C4)NC(C)(C)C."}", "/scratch/micpie/export/iupac_smiles/test_27-0.jsonl": "{"text":"The traditional IUPAC name of the compound with canonical SMILES Cc1ccc(C2=NN(C(=O)CN3CCN(c4ccccc4F)CC3)[C@H](c3ccc(Cl)cc3)C2)cc1C is 1-[(5S)-5-(4-chlorophenyl)-3-(3,4-dimethylphenyl)-2-pyrazolin-1-yl]-2-[4-(2-fluorophenyl)piperazino]ethanone."} {"text":"The traditional IUPAC name of the molecule with InChI InChI=1S\/C25H27NS\/c1-7-20(16-22-18(4)17(3)21(8-2)25(22,5)6)26-24-15-14-23(27-24)19-12-10-9-11-13-19\/h7-16,26H,1-2H2,3-6H3\/b20-16+ is (5-phenyl-2-thienyl)-[(1E)-1-[(2,3,5,5-tetramethyl-4-vinyl-cyclopenta-1,3-dien-1-yl)methylene]allyl]amine."}", "/scratch/micpie/export/iupac_smiles/test_0-3.jsonl": "{"text":"The SELFIES of the chemical with traditional IUPAC name 1-(4-bromobenzyl)-N-[(Z)-m-anisylideneamino]isonipecotamide is [C][O][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][\/C][=N][\\N][C][=Branch1][C][=O][C][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Br]."} {"text":"The SMILES of the molecule with traditional IUPAC name 2-(5-methyl-1lambda3-ioda-3-thia-2-azacyclopenta-1,4-dien-4-yl)thiazole-4-carbaldehyde is CC1=C(SN=I1)C2=NC(=CS2)C=O."}", "/scratch/micpie/export/iupac_smiles/train_19-5.jsonl": "{"text":"The SMILES of the compound with CAS-like IUPAC name 
N-(4-chlorophenyl)-3-[4-[2-(methylamino)ethyl]-1-piperidinyl]propanamide is CNCCC1CCN(CC1)CCC(=O)NC2=CC=C(C=C2)Cl."} {"text":"The canonical SMILES of the compound with IUAPC name in CAS-like style 3-[1-[2,6-dichloro-4-[hydroxy(oxido)amino]phenyl]iminoethyl]-1-benzopyran-2-one is CC(=Nc1c(Cl)cc(N([O-])O)cc1Cl)c1cc2ccccc2oc1=O."}", "/scratch/micpie/export/iupac_smiles/test_15-7.jsonl": "{"text":"Task: Please give me the SMILES of a compound given the traditional IUPAC name.\nIUPAC name: (E)-3-(6-amino-3-pyridyl)-N-[[7-(5-chloro-2,4-difluoro-phenyl)-5-[4-(4,4-difluoropiperidine-1-carbonyl)phenyl]benzofuran-2-yl]methyl]acrylamide\nResult: C1CN(CCC1(F)F)C(=O)C2=CC=C(C=C2)C3=CC(=C4C(=C3)C=C(O4)CNC(=O)\/C=C\/C5=CN=C(C=C5)N)C6=CC(=C(C=C6F)F)Cl"} {"text":"Task: Please give me the SMILES representation of a chemical given the traditional IUPAC name.\nIUPAC name: 5-chloro-4-(chloromethyl)-2-indan-5-yloxy-pyridine\nResult: CCC=CC5)C=CC=C6))OC=NC=CC=C6)CCl)))Cl"}", "/scratch/micpie/export/iupac_smiles/train_27-9.jsonl": "{"text":"Task: Please generate the SMILES representation of a molecule based on the CAS-like IUPAC name.\nIUPAC name: 1-[(3R)-3-(4-chlorophenyl)-5-(4-fluorophenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(2-fluorophenyl)-1-piperazinyl]ethanone\nResult: CCNCCN6CC=O)N[C@H]CC=N5)C=CC=CC=C6))F)))))))C=CC=CC=C6))Cl))))))))))))C=CC=CC=C6F"} {"text":"Task: Please generate the SMILES of a compound given the CAS-like IUPAC name.\nIUPAC name: 6-[4-[4-[(9,9-dimethyl-2-fluorenyl)amino]phenyl]-2,3,5,6-tetrahydroxyphenyl]benzene-1,2,3,4,5-pentol\nResult: CC1(C2=CC=CC=C2C3=C1C=C(C=C3)NC4=CC=C(C=C4)C5=C(C(=C(C(=C5O)O)C6=C(C(=C(C(=C6O)O)O)O)O)O)O)C"}", "/scratch/micpie/export/iupac_smiles/test_4-6.jsonl": "{"text":"The SMILES of the chemical with preferred IUPAC name 1-(3-methoxyphenyl)sulfonyl-3-[(4-methylpiperazin-1-yl)methyl]indole is CN1CCN(CC1)CC2=CN(C3=CC=CC=C32)S(=O)(=O)C4=CC=CC(=C4)OC."} {"text":"The InChI of the chemical with preferred IUPAC name 
4-(1-amino-2-methylpropan-2-yl)-3-ethylpiperidin-4-ol is InChI=1S\/C11H24N2O\/c1-4-9-7-13-6-5-11(9,14)10(2,3)8-12\/h9,13-14H,4-8,12H2,1-3H3."}", "/scratch/micpie/export/iupac_smiles/test_4-2.jsonl": "{"text":"The IUPAC name of the molecule with InChI InChI=1S\/C21H25N3O3S\/c1-22-10-12-23(13-11-22)15-17-16-24(21-9-4-3-8-20(17)21)28(25,26)19-7-5-6-18(14-19)27-2\/h3-9,14,16H,10-13,15H2,1-2H3 is 1-(3-methoxyphenyl)sulfonyl-3-[(4-methylpiperazin-1-yl)methyl]indole."} {"text":"The preferred IUPAC name of the compound with InChI InChI=1S\/C11H24N2O\/c1-4-9-7-13-6-5-11(9,14)10(2,3)8-12\/h9,13-14H,4-8,12H2,1-3H3 is 4-(1-amino-2-methylpropan-2-yl)-3-ethylpiperidin-4-ol."}", "/scratch/micpie/export/iupac_smiles/test_22-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with canonical SMILES O=C1Oc2cccc3c2c(nn3Br)O1 is 2-bromo-5,7-dioxa-2,3-diazatricyclo[6.3.1.04,12]dodeca-1(12),3,8,10-tetraen-6-one."} {"text":"The traditional IUPAC name of the compound with DeepSMILES CCCCNC6)S=O)=O)C=CC=CC=C6))NC=O)C5))))))))))C=O is 1-(2-ketoindolin-5-yl)sulfonylnipecotaldehyde."}", "/scratch/micpie/export/iupac_smiles/valid_20-0.jsonl": "{"text":"The traditional IUPAC name of the compound with canonical SMILES CCCCCC=CCC1(O)C=CC(=O)C1=CC=CC(CCC(=O)OC)OC(C)=O is 4-acetoxy-7-(2-hydroxy-5-keto-2-oct-2-enyl-cyclopent-3-en-1-ylidene)hept-5-enoic acid methyl ester."} {"text":"The traditional IUPAC name of the compound with SMILES CC(=O)CNCCNCCNCC(=O)O is 2-[2-[2-(acetonylamino)ethylamino]ethylamino]acetic acid."}", "/scratch/micpie/export/iupac_smiles/valid_8-5.jsonl": "{"text":"The canonical SMILES of the molecule with CAS-like IUPAC name 3-[N-(2-naphthalenylsulfonyl)anilino]propanamide is NC(=O)CCN(c1ccccc1)S(=O)(=O)c1ccc2ccccc2c1."} {"text":"The DeepSMILES of the chemical with CAS-like IUPAC name 1-[(E)-3,4-dihydro-2H-naphthalen-1-ylideneamino]oxy-3-(propan-2-ylamino)-2-propanol;hydrochloride is CCC)NCCCO\/N=C\\CCCC=CC=CC=C6\\%10)))))))))))))O.Cl."}", 
"/scratch/micpie/export/iupac_smiles/valid_5-9.jsonl": "{"text":"Task: Please create the SMILES of a compound based on the IUAPC name in CAS-like style.\nIUPAC name: 4-[1-(aminomethyl)cyclooctyl]-3-ethyl-4-piperidinol\nResult: CCC1CNCCC1(O)C1(CN)CCCCCCC1"} {"text":"Task: Please give me the SMILES representation of a chemical based on the IUAPC name in CAS-like style.\nIUPAC name: N-(2-aminoethyl)-2-[[3-[[cyclopentyl(oxo)methyl]amino]-1-oxobutyl]amino]benzamide;hydrochloride\nResult: CCCC=O)NC=CC=CC=C6C=O)NCCN))))))))))))))NC=O)CCCCC5.Cl"}", "/scratch/micpie/export/iupac_smiles/valid_3-2.jsonl": "{"text":"The IUPAC name of the chemical with canonical SMILES C[C@@]12C[C@H]3C[C@@](O)(C1)C[C@](C(=O)O)(C3)C2 is (1S,3R,5S,7R)-3-hydroxy-5-methyladamantane-1-carboxylic acid."} {"text":"The IUPAC name of the molecule with canonical SMILES COc1ccccc1CN(C)CCCCCCNC(=O)COc1ccc(-c2cc(=O)c3c(O)c(OC)c(OC)cc3o2)cc1 is 2-[4-(5-hydroxy-6,7-dimethoxy-4-oxochromen-2-yl)phenoxy]-N-[6-[(2-methoxyphenyl)methyl-methylamino]hexyl]acetamide."}", "/scratch/micpie/export/iupac_smiles/test_13-6.jsonl": "{"text":"The SMILES of the compound with preferred IUPAC name 2-amino-N-(1,3-dihydroxy-2-methylpropan-2-yl)-3,3-dimethylbutanamide is CC(C)(C)C(C(=O)NC(C)(CO)CO)N."} {"text":"The SMILES of the molecule with IUPAC name (1,3,5-trimethylpyrazol-4-yl) (4S)-3,4-dihydro-2H-chromene-4-carboxylate is CC1=C(C(=NN1C)C)OC(=O)[C@H]2CCOC3=CC=CC=C23."}", "/scratch/micpie/export/iupac_smiles/test_23-7.jsonl": "{"text":"Task: Please create the SMILES of a chemical based on the traditional IUPAC name.\nIUPAC name: 1-(4-hydroxy-3-nitro-phenyl)sulfonylnipecotaldehyde\nResult: CCCCNC6)S=O)=O)C=CC=CC=C6))O))[N+]=O)[O-]))))))))C=O"} {"text":"Task: Please generate the SMILES representation of a compound based on the traditional IUPAC name.\nIUPAC name: N-(benzimidazol-1-yl)-4-(ethylsulfamoyl)benzamide\nResult: CCNS(=O)(=O)c1ccc(C(=O)Nn2cnc3ccccc32)cc1"}", "/scratch/micpie/export/iupac_smiles/train_23-8.jsonl": 
"{"text":"Task: Please create the SMILES representation of a chemical given the systematic IUPAC name.\nIUPAC name: 1-(2-nitrophenyl)sulfonylpiperidine-3-carbaldehyde\nResult: CCCCNC6)S=O)=O)C=CC=CC=C6[N+]=O)[O-])))))))))))C=O"} {"text":"Task: Please generate the SMILES representation of a molecule based on the systematic IUPAC name.\nIUPAC name: N-(benzimidazol-1-yl)-4-(2,5-dimethylphenyl)-4-oxidanylidene-butanamide\nResult: CC1=CC(=C(C=C1)C)C(=O)CCC(=O)NN2C=NC3=CC=CC=C32"}", "/scratch/micpie/export/iupac_smiles/valid_2-1.jsonl": "{"text":"The CAS-like IUPAC name of the chemical with DeepSMILES CCC[C@H]C)[C@@H]C=CC=CC=C6))O)))))CC)C is 4-[(3S,4S)-2,4-dimethylheptan-3-yl]phenol."} {"text":"The CAS-like IUPAC name of the molecule with SMILES CCOC1=CC=C(C=C1)C2=NC(=CS2)C3=CC4=CC(=CC(=C4OC3=O)Br)[N+](=O)[O-] is 8-bromo-3-[2-(4-ethoxyphenyl)-4-thiazolyl]-6-nitro-1-benzopyran-2-one."}", "/scratch/micpie/export/iupac_smiles/valid_19-5.jsonl": "{"text":"The InChI of the compound with IUAPC name in CAS-like style N-[2-chloro-5-(1-tetrazolyl)phenyl]-2-[4-[2-(methylamino)ethyl]-1-piperidinyl]acetamide;hydrochloride is InChI=1S\/C17H24ClN7O.ClH\/c1-19-7-4-13-5-8-24(9-6-13)11-17(26)21-16-10-14(2-3-15(16)18)25-12-20-22-23-25;\/h2-3,10,12-13,19H,4-9,11H2,1H3,(H,21,26);1H."} {"text":"The canonical SMILES of the chemical with CAS-like IUPAC name 2-[bis[2-[(4a-methyl-8-methylene-2-propan-2-yl-1,2,3,4,5,6,7,8a-octahydronaphthalen-1-yl)amino]-2-oxoethyl]amino]acetic acid methyl ester is C=C1CCCC2(C)CCC(C(C)C)C(NC(=O)CN(CC(=O)NC3C(C(C)C)CCC4(C)CCCC(=C)C34)CC(=O)OC)C12."}", "/scratch/micpie/export/iupac_smiles/valid_9-6.jsonl": "{"text":"The SELFIES of the compound with IUPAC name 1-(propan-2-ylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,9,11,13-heptaenylideneamino)oxypropan-2-ol is [C][C][Branch1][C][C][N][C][C][Branch2][Ring1][=Branch2][C][O][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][#C][O]."} {"text":"The SMILES of the 
molecule with IUPAC name 5-chloro-3-[cyclobutyl(methyl)amino]-N-[(4-cycloheptyl-4,6-dimethyl-2-oxopiperidin-3-yl)methyl]-2-methylbenzamide is CC1CC(C(C(=O)N1)CNC(=O)C2=C(C(=CC(=C2)Cl)N(C)C3CCC3)C)(C)C4CCCCCC4."}", "/scratch/micpie/export/iupac_smiles/test_5-4.jsonl": "{"text":"The DeepSMILES of the compound with systematic IUPAC name 4-(1-azanyl-2-methyl-butan-2-yl)-3-ethyl-piperidin-4-ol is CCCCNCCC6CC)CC))CN)))O."} {"text":"The SMILES of the compound with systematic IUPAC name N-(2-azanylethyl)-2-[(4-methyl-3-phenyl-pentanoyl)amino]benzamide is CC(C)C(CC(=O)NC1=CC=CC=C1C(=O)NCCN)C2=CC=CC=C2."}", "/scratch/micpie/export/iupac_smiles/test_4-5.jsonl": "{"text":"The canonical SMILES of the molecule with CAS-like IUPAC name 1-(3-methoxyphenyl)sulfonyl-3-[(4-methyl-1-piperazinyl)methyl]indole is COc1cccc(S(=O)(=O)n2cc(CN3CCN(C)CC3)c3ccccc32)c1."} {"text":"The DeepSMILES of the compound with CAS-like IUPAC name 4-(1-amino-2-methylpropan-2-yl)-3-ethyl-4-piperidinol is CCCCNCCC6CC)C)CN)))O."}", "/scratch/micpie/export/iupac_smiles/valid_7-3.jsonl": "{"text":"The DeepSMILES of the compound with traditional IUPAC name 2-methyl-N-[2-[[4-(trifluoromethylthio)benzyl]amino]ethyl]propionamide is CCC)C=O)NCCNCC=CC=CC=C6))SCF)F)F."} {"text":"The InChI of the chemical with traditional IUPAC name 3-[N-(2-thienylsulfonyl)anilino]propionamide is InChI=1S\/C13H14N2O3S2\/c14-12(16)8-9-15(11-5-2-1-3-6-11)20(17,18)13-7-4-10-19-13\/h1-7,10H,8-9H2,(H2,14,16)."}", "/scratch/micpie/export/iupac_smiles/valid_4-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with SMILES CCS(=O)(=O)N(CC1=CN(N=N1)[C@@H]2[C@@H]([C@H]([C@@H]([C@H](O2)CO)O[C@H]3[C@@H]([C@H]([C@H]([C@H](O3)CO)O)O)O)O)O)C4=CC=C(C=C4)Cl is N-(4-chlorophenyl)-N-[[1-[(2S,3R,4R,5S,6R)-3,4-dihydroxy-6-methylol-5-[(2S,3R,4S,5R,6R)-3,4,5-trihydroxy-6-methylol-tetrahydropyran-2-yl]oxy-tetrahydropyran-2-yl]triazol-4-yl]methyl]ethanesulfonamide."} {"text":"The traditional IUPAC name of the molecule with canonical SMILES 
CCC1CCC(CN)(C2(O)CCNCC2CC)CC1 is 4-[1-(aminomethyl)-4-ethyl-cyclohexyl]-3-ethyl-piperidin-4-ol."}", "/scratch/micpie/export/iupac_smiles/train_5-1.jsonl": "{"text":"The CAS-like IUPAC name of the chemical with InChI InChI=1S\/C12H24N2O2\/c1-2-10-7-14-5-3-12(10,15)11(8-13)4-6-16-9-11\/h10,14-15H,2-9,13H2,1H3 is 4-[3-(aminomethyl)-3-oxolanyl]-3-ethyl-4-piperidinol."} {"text":"The IUAPC name in CAS-like style of the molecule with DeepSMILES CCC=O)NC=CC=CC=C6C=O)NCCN)))))))))))))OCCCCCO5.Cl is N-(2-aminoethyl)-2-[[1-oxo-2-(2-oxolanylmethoxy)propyl]amino]benzamide;hydrochloride."}", "/scratch/micpie/export/iupac_smiles/valid_17-8.jsonl": "{"text":"Task: Please create the SMILES representation of a compound given the systematic IUPAC name.\nIUPAC name: 2-ethoxy-N-(4-fluorophenyl)-3-[[4-[2-(2H-1,2,3,4-tetrazol-5-yl)phenyl]phenyl]methyl]benzimidazole-4-carboxamide\nResult: InChI=1S\/C30H24FN7O2\/c1-2-40-30-33-26-9-5-8-25(29(39)32-22-16-14-21(31)15-17-22)27(26)38(30)18-19-10-12-20(13-11-19)23-6-3-4-7-24(23)28-34-36-37-35-28\/h3-17H,2,18H2,1H3,(H,32,39)(H,34,35,36,37)"} {"text":"Task: Please give me the SMILES of a chemical given the systematic IUPAC name.\nIUPAC name: N3-(3-aminophenyl)-N1,N5-bis[3-(tert-butylamino)phenyl]benzene-1,3,5-tricarboxamide\nResult: CCC)C)NC=CC=CC=C6)NC=O)C=CC=CC=C6)C=O)NC=CC=CC=C6)N))))))))))C=O)NC=CC=CC=C6)))NCC)C)C"}", "/scratch/micpie/export/iupac_smiles/valid_17-3.jsonl": "{"text":"The DeepSMILES of the compound with traditional IUPAC name 2-ethoxy-N-(4-fluorophenyl)-3-[4-[2-(2H-tetrazol-5-yl)phenyl]benzyl]benzimidazole-4-carboxamide is CCOC=NC=CC=CC=C6N9CC=CC=CC=C6))C=CC=CC=C6C=NNN=N5))))))))))))))))))C=O)NC=CC=CC=C6))F."} {"text":"The DeepSMILES of the molecule with traditional IUPAC name N3-(3-aminophenyl)-N1,N5-bis[3-(tert-butylamino)phenyl]benzene-1,3,5-tricarboxamide is CCC)C)NC=CC=CC=C6)NC=O)C=CC=CC=C6)C=O)NC=CC=CC=C6)N))))))))))C=O)NC=CC=CC=C6)))NCC)C)C."}", "/scratch/micpie/export/iupac_smiles/valid_23-5.jsonl": "{"text":"The 
DeepSMILES of the chemical with CAS-like IUPAC name 1-(3-chlorophenyl)sulfonyl-3-piperidinecarboxaldehyde is CCCCNC6)S=O)=O)C=CC=CC=C6)))Cl)))))))C=O."} {"text":"The canonical SMILES of the chemical with IUAPC name in CAS-like style N-(1-benzimidazolyl)-3-[(2,3,4,5,6-pentamethylphenyl)sulfonylamino]propanamide is Cc1c(C)c(C)c(S(=O)(=O)NCCC(=O)Nn2cnc3ccccc32)c(C)c1C."}", "/scratch/micpie/export/iupac_smiles/valid_22-2.jsonl": "{"text":"The IUPAC name of the compound with DeepSMILES C#CC=CC=CC=C6)))CCC3))))CF)F)F is 1-cyclopropyl-3-ethynyl-2-(trifluoromethyl)benzene."} {"text":"The preferred IUPAC name of the chemical with SELFIES [C][O][C][=C][Branch2][Ring1][N][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][=O][F] is 1-(3-fluoro-4-methoxyphenyl)sulfonylpiperidine-3-carbaldehyde."}", "/scratch/micpie/export/iupac_smiles/valid_6-6.jsonl": "{"text":"The SELFIES of the compound with preferred IUPAC name N-(2-aminoethyl)-2-(5-methoxypentanoylamino)benzamide is [C][O][C][C][C][C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][N]."} {"text":"The InChI of the chemical with preferred IUPAC name 5-[[benzyl(propan-2-yl)amino]methyl]-2-methoxyaniline is InChI=1S\/C18H24N2O\/c1-14(2)20(12-15-7-5-4-6-8-15)13-16-9-10-18(21-3)17(19)11-16\/h4-11,14H,12-13,19H2,1-3H3."}", "/scratch/micpie/export/iupac_smiles/train_8-3.jsonl": "{"text":"The InChI of the molecule with traditional IUPAC name 3-(N-(2-methyl-5-nitro-phenyl)sulfonylanilino)propionamide is InChI=1S\/C16H17N3O5S\/c1-12-7-8-14(19(21)22)11-15(12)25(23,24)18(10-9-16(17)20)13-5-3-2-4-6-13\/h2-8,11H,9-10H2,1H3,(H2,17,20)."} {"text":"The InChI of the compound with traditional IUPAC name 1-(isopropylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,11,13-hexaenylideneamino)oxy-propan-2-ol;hydrochloride is 
InChI=1S\/C21H26N2O2.ClH\/c1-15(2)22-13-18(24)14-25-23-21-19-9-5-3-7-16(19)11-12-17-8-4-6-10-20(17)21;\/h3-10,15,18,22,24H,11-14H2,1-2H3;1H."}", "/scratch/micpie/export/iupac_smiles/test_21-7.jsonl": "{"text":"Task: Please generate the SMILES of a molecule based on the traditional IUPAC name.\nIUPAC name: benzoic acid [4-[(1S)-2-(azetidin-1-yl)-1-(cyclohexylmethyl)-2-keto-ethoxy]-2-[5-[2-[[3-[3,5-dihydroxy-4-[(2-ketochromen-3-yl)methoxy]-6-methylol-tetrahydropyran-2-yl]oxy-4-hydroxy-5-[(2-ketochromen-3-yl)methoxy]cyclohexanecarbonyl]amino]ethylcarbamoyl]-3-ethyl-2-(3,4,5-trihydroxy-6-methyl-tetrahydropyran-2-yl)oxy-cyclohexoxy]-5-hydroxy-6-methylol-tetrahydropyran-3-yl] ester\nResult: InChI=1S\/C75H97N3O27\/c1-3-40-28-44(33-52(63(40)105-73-61(86)60(85)56(81)38(2)96-73)101-75-66(104-70(91)41-17-8-5-9-18-41)65(59(84)55(35-80)103-75)97-53(69(90)78-25-14-26-78)27-39-15-6-4-7-16-39)67(88)76-23-24-77-68(89)45-31-50(94-36-46-29-42-19-10-12-21-48(42)98-71(46)92)57(82)51(32-45)100-74-62(87)64(58(83)54(34-79)102-74)95-37-47-30-43-20-11-13-22-49(43)99-72(47)93\/h5,8-13,17-22,29-30,38-40,44-45,50-66,73-75,79-87H,3-4,6-7,14-16,23-28,31-37H2,1-2H3,(H,76,88)(H,77,89)\/t38?,40?,44?,45?,50?,51?,52?,53-,54?,55?,56?,57?,58?,59?,60?,61?,62?,63?,64?,65?,66?,73?,74?,75?\/m0\/s1"} {"text":"Task: Please create the SMILES representation of a chemical given the traditional IUPAC name.\nIUPAC name: 2-(methoxymethyl)-3-[4-(trifluoromethoxy)phenyl]pyrazine\nResult: InChI=1S\/C13H11F3N2O2\/c1-19-8-11-12(18-7-6-17-11)9-2-4-10(5-3-9)20-13(14,15)16\/h2-7H,8H2,1H3"}", "/scratch/micpie/export/iupac_smiles/test_2-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the compound with InChI InChI=1S\/C11H15NO\/c1-8-6-4-5-7-10(8)11(13)9(2)12-3\/h4-7,9,12H,1-3H3\/t9-\/m1\/s1 is (2R)-2-(methylamino)-1-(2-methylphenyl)-1-propanone."} {"text":"The CAS-like IUPAC name of the molecule with canonical SMILES O=C1c2c(Cl)c(Cl)c(Cl)c(Cl)c2C(=O)N1c1cc2c(cc1Br)OCCO2 is 
2-(6-bromo-2,3-dihydro-1,4-benzodioxin-7-yl)-4,5,6,7-tetrachloroisoindole-1,3-dione."}", "/scratch/micpie/export/iupac_smiles/train_20-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with canonical SMILES CC(C)CCCC1(C)OC(=O)C23CC=C4C(CCC5C4(C)CCC(OC4OCC(OS(=O)(=O)O)C(O)C4OC4OC(C)C(O)C(O)C4O)C5(C)C)C2(C)CC(=O)C13 is sulfuric acid [4-hydroxy-6-[(6-isohexyl-4,8-diketo-2,6,13,17,17-pentamethyl-7-oxapentacyclo[10.8.0.02,9.05,9.013,18]eicos-11-en-16-yl)oxy]-5-(3,4,5-trihydroxy-6-methyl-tetrahydropyran-2-yl)oxy-tetrahydropyran-3-yl] ester."} {"text":"The traditional IUPAC name of the chemical with DeepSMILES CCC[C@@H]C=O)NCCC4)))))OCCCOCC6OC=O)C=CC=CC=C6)))))))))OCCCCCC6OCCCCCO6)C))O))O))O)))))C)))C=O)NCCNC=O)CCCCCC6)OCCCCCO6)CO)))O))NC=CN=N5))C=CC=CC=C6)))F))))))))O)))))O))NC=CN=N5))C=CC=CC=C6)))F))))))))))))))))))))))CO)))O is benzoic acid [4-[(1S)-1-(azetidine-1-carbonyl)butoxy]-2-[5-[2-[[3-[4-(3-fluorophenyl)triazol-1-yl]-5-[4-[4-(3-fluorophenyl)triazol-1-yl]-3,5-dihydroxy-6-methylol-tetrahydropyran-2-yl]oxy-4-hydroxy-cyclohexanecarbonyl]amino]ethylcarbamoyl]-3-methyl-2-(3,4,5-trihydroxy-6-methyl-tetrahydropyran-2-yl)oxy-cyclohexoxy]-5-hydroxy-6-methylol-tetrahydropyran-3-yl] ester."}", "/scratch/micpie/export/iupac_smiles/train_15-6.jsonl": "{"text":"The InChI of the compound with preferred IUPAC name 1-methyl-9H-pyrido[3,4-b]indole-7-carboxylic acid is InChI=1S\/C13H10N2O2\/c1-7-12-10(4-5-14-7)9-3-2-8(13(16)17)6-11(9)15-12\/h2-6,15H,1H3,(H,16,17)."} {"text":"The DeepSMILES of the chemical with IUPAC name 8-[5-chloro-4-(chloromethyl)pyridin-2-yl]oxyquinoline is C=CC=CC=C6)OC=NC=CC=C6)CCl)))Cl)))))))N=CC=C6."}", "/scratch/micpie/export/iupac_smiles/train_18-8.jsonl": "{"text":"Task: Please give me the SMILES of a chemical based on the systematic IUPAC name.\nIUPAC name: N-[3-[6-(6-methylpyridin-3-yl)-3,8a-dihydroquinazolin-8-yl]phenyl]prop-2-enamide\nResult: CC1=NC=C(C=C1)C2=CC3=CNC=NC3C(=C2)C4=CC(=CC=C4)NC(=O)C=C"} {"text":"Task: Please generate the 
SMILES representation of a molecule based on the systematic IUPAC name.\nIUPAC name: 2-[1-[[3,4-bis(fluoranyl)phenyl]methyl]piperidin-4-yl]-N-methyl-ethanamine;hydrochloride\nResult: CNCCC1CCN(CC1)CC2=CC(=C(C=C2)F)F.Cl"}", "/scratch/micpie/export/iupac_smiles/valid_13-8.jsonl": "{"text":"Task: Please create the SMILES of a chemical based on the systematic IUPAC name.\nIUPAC name: 2-[[(2-azanyl-3,3-dimethyl-butanoyl)amino]methyl]-2-methyl-butanoic acid\nResult: CCCC)CNC=O)CCC)C)C))N)))))C=O)O"} {"text":"Task: Please generate the SMILES of a compound given the systematic IUPAC name.\nIUPAC name: (2-chloranylpyridin-4-yl)-[(2R)-2-methyl-1,2,3,4-tetrahydro-1,5-benzodiazepin-5-yl]methanone\nResult: C[C@@H]1CCN(C(=O)c2ccnc(Cl)c2)c2ccccc2N1"}", "/scratch/micpie/export/iupac_smiles/train_9-2.jsonl": "{"text":"The preferred IUPAC name of the compound with SELFIES [C][C][Branch1][C][C][Branch1][C][C][N][C][C][Branch2][Ring1][=Branch2][C][O][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][#C][O] is 1-(tert-butylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,11,13-hexaenylideneamino)oxypropan-2-ol."} {"text":"The preferred IUPAC name of the molecule with SMILES CCCCC1(C(CC(=NC1=O)C)C)CNC(=O)C2=C(C(=CC=C2)N(C)C3CCCC3)C=C is N-[(5-butyl-2,4-dimethyl-6-oxo-3,4-dihydropyridin-5-yl)methyl]-3-[cyclopentyl(methyl)amino]-2-ethenylbenzamide."}", "/scratch/micpie/export/iupac_smiles/test_5-9.jsonl": "{"text":"Task: Please generate the SMILES representation of a chemical based on the IUAPC name in CAS-like style.\nIUPAC name: 4-(1-amino-2-methylbutan-2-yl)-3-ethyl-4-piperidinol\nResult: CCC1CNCCC1(O)C(C)(CC)CN"} {"text":"Task: Please give me the SMILES of a molecule based on the CAS-like IUPAC name.\nIUPAC name: N-(2-aminoethyl)-2-[(4-methyl-1-oxo-3-phenylpentyl)amino]benzamide\nResult: CC(C)C(CC(=O)Nc1ccccc1C(=O)NCCN)c1ccccc1"}", "/scratch/micpie/export/iupac_smiles/valid_25-6.jsonl": "{"text":"The InChI of the chemical with 
preferred IUPAC name 5-(5-chloropyridin-3-yl)-N-[(1S)-1-cyclopropylethyl]-7-methylpyrazolo[1,5-a]pyrimidine-3-carboxamide is InChI=1S\/C18H18ClN5O\/c1-10-5-16(13-6-14(19)8-20-7-13)23-17-15(9-21-24(10)17)18(25)22-11(2)12-3-4-12\/h5-9,11-12H,3-4H2,1-2H3,(H,22,25)\/t11-\/m0\/s1."} {"text":"The SMILES of the compound with IUPAC name 2-[amino(phenyl)methyl]-4-ethyl-1,3-thiazole-5-carboxylic acid is CCC1=C(SC(=N1)C(C2=CC=CC=C2)N)C(=O)O."}", "/scratch/micpie/export/iupac_smiles/train_0-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with canonical SMILES COc1cc(\/C=N\\NC(=O)c2ccn(Cc3ccc([N+](=O)[O-])cc3)n2)cc(Br)c1OC is N-[(Z)-(3-bromo-4,5-dimethoxy-benzylidene)amino]-1-(4-nitrobenzyl)pyrazole-3-carboxamide."} {"text":"The traditional IUPAC name of the chemical with SELFIES [C][C][C][C][C][C][N][Ring1][=Branch1][C][C][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C] is 1-[4-[3-(2-methylpiperidino)propoxy]phenyl]ethanone."}", "/scratch/micpie/export/iupac_smiles/test_0-6.jsonl": "{"text":"The canonical SMILES of the compound with IUPAC name 1-[(4-bromophenyl)methyl]-N-[(Z)-(3-methoxyphenyl)methylideneamino]piperidine-4-carboxamide is COc1cccc(\/C=N\\NC(=O)C2CCN(Cc3ccc(Br)cc3)CC2)c1."} {"text":"The SMILES of the chemical with preferred IUPAC name 2-(5-methyl-1lambda3-ioda-3-thia-2-azacyclopenta-1,4-dien-4-yl)-1,3-thiazole-4-carbaldehyde is CC1=C(SN=I1)C2=NC(=CS2)C=O."}", "/scratch/micpie/export/iupac_smiles/train_23-9.jsonl": "{"text":"Task: Please create the SMILES of a chemical given the IUAPC name in CAS-like style.\nIUPAC name: 1-(2-nitrophenyl)sulfonyl-3-piperidinecarboxaldehyde\nResult: CCCCNC6)S=O)=O)C=CC=CC=C6[N+]=O)[O-])))))))))))C=O"} {"text":"Task: Please give me the SMILES representation of a compound given the CAS-like IUPAC name.\nIUPAC name: N-(1-benzimidazolyl)-4-(2,5-dimethylphenyl)-4-oxobutanamide\nResult: 
[C][C][=C][C][=Branch1][=Branch2][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][C][=Branch1][C][=O][C][C][C][=Branch1][C][=O][N][N][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][=Branch2]"}", "/scratch/micpie/export/iupac_smiles/test_26-9.jsonl": "{"text":"Task: Please generate the SMILES representation of a molecule based on the CAS-like IUPAC name.\nIUPAC name: 2-(2,3-dichlorophenyl)-4-ethyl-5-thiazolecarboxylic acid\nResult: CCC1=C(SC(=N1)C2=C(C(=CC=C2)Cl)Cl)C(=O)O"} {"text":"Task: Please give me the SMILES representation of a molecule based on the CAS-like IUPAC name.\nIUPAC name: 1-[(3R)-3-(2-chlorophenyl)-5-(3-methoxyphenyl)-3,4-dihydropyrazol-2-yl]-2-(3,4-dihydro-1H-isoquinolin-2-yl)ethanone\nResult: InChI=1S\/C27H26ClN3O2\/c1-33-22-10-6-9-20(15-22)25-16-26(23-11-4-5-12-24(23)28)31(29-25)27(32)18-30-14-13-19-7-2-3-8-21(19)17-30\/h2-12,15,26H,13-14,16-18H2,1H3\/t26-\/m1\/s1"}", "/scratch/micpie/export/iupac_smiles/train_7-6.jsonl": "{"text":"The InChI of the molecule with IUPAC name methyl 2-[(5-amino-2-chlorophenyl)sulfonylamino]acetate is InChI=1S\/C9H11ClN2O4S\/c1-16-9(13)5-12-17(14,15)8-4-6(11)2-3-7(8)10\/h2-4,12H,5,11H2,1H3."} {"text":"The SMILES of the molecule with IUPAC name 3-(N-(2-nitrophenyl)sulfonylanilino)propanamide is C1=CC=C(C=C1)N(CCC(=O)N)S(=O)(=O)C2=CC=CC=C2[N+](=O)[O-]."}", "/scratch/micpie/export/iupac_smiles/valid_17-0.jsonl": "{"text":"The traditional IUPAC name of the compound with SELFIES [C][C][O][C][=N][C][=C][C][=C][C][=Branch2][Ring1][P][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][N][=N][Ring1][Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][F] is 2-ethoxy-N-(4-fluorophenyl)-3-[4-[2-(2H-tetrazol-5-yl)phenyl]benzyl]benzimidazole-4-carboxamide."} {"text":"The traditional IUPAC name of the molecule with SMILES 
CC(C)(C)NC1=CC=CC(=C1)NC(=O)C2=CC(=CC(=C2)C(=O)NC3=CC=CC(=C3)N)C(=O)NC4=CC(=CC=C4)NC(C)(C)C is N3-(3-aminophenyl)-N1,N5-bis[3-(tert-butylamino)phenyl]benzene-1,3,5-tricarboxamide."}", "/scratch/micpie/export/iupac_smiles/train_0-3.jsonl": "{"text":"The DeepSMILES of the chemical with traditional IUPAC name N-[(Z)-(3-bromo-4,5-dimethoxy-benzylidene)amino]-1-(4-nitrobenzyl)pyrazole-3-carboxamide is COC=CC=CC=C6)\/C=N\\NC=O)C=NNC=C5))CC=CC=CC=C6))[N+]=O)[O-]))))))))))))))))Br))OC."} {"text":"The SELFIES of the chemical with traditional IUPAC name 1-[4-[3-(2-methylpiperidino)propoxy]phenyl]ethanone is [C][C][C][C][C][C][N][Ring1][=Branch1][C][C][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C]."}", "/scratch/micpie/export/iupac_smiles/valid_19-8.jsonl": "{"text":"Task: Please generate the SMILES of a chemical based on the systematic IUPAC name.\nIUPAC name: N-[2-chloranyl-5-(1,2,3,4-tetrazol-1-yl)phenyl]-2-[4-[2-(methylamino)ethyl]piperidin-1-yl]ethanamide;hydrochloride\nResult: [C][N][C][C][C][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][=Branch1][C][=O][N][C][=C][Branch1][S][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][N][C][=N][N][=N][Ring1][Branch1][Cl].[Cl]"} {"text":"Task: Please create the SMILES of a molecule given the systematic IUPAC name.\nIUPAC name: methyl 2-[bis[2-[(4a-methyl-8-methylidene-2-propan-2-yl-1,2,3,4,5,6,7,8a-octahydronaphthalen-1-yl)amino]-2-oxidanylidene-ethyl]amino]ethanoate\nResult: [C][C][Branch1][C][C][C][C][C][C][Branch2][Branch1][P][C][C][C][C][=Branch1][C][=C][C][Ring1][#Branch1][C][Ring1][O][N][C][=Branch1][C][=O][C][N][Branch2][Ring2][Ring1][C][C][=Branch1][C][=O][N][C][C][Branch2][Ring1][Ring1][C][C][C][Branch1][=N][C][Ring1][=Branch1][C][=Branch1][C][=C][C][C][C][Ring1][#Branch1][C][C][Branch1][C][C][C][C][C][=Branch1][C][=O][O][C][C]"}", "/scratch/micpie/export/iupac_smiles/valid_2-7.jsonl": "{"text":"Task: Please create the SMILES representation of a compound given the traditional 
IUPAC name.\nIUPAC name: 4-[(1S,2S)-1-isopropyl-2-methyl-pentyl]phenol\nResult: InChI=1S\/C15H24O\/c1-5-6-12(4)15(11(2)3)13-7-9-14(16)10-8-13\/h7-12,15-16H,5-6H2,1-4H3\/t12-,15-\/m0\/s1"} {"text":"Task: Please give me the SMILES representation of a compound given the traditional IUPAC name.\nIUPAC name: 8-bromo-6-nitro-3-(2-p-phenetylthiazol-4-yl)coumarin\nResult: CCOC=CC=CC=C6))C=NC=CS5))C=CC=CC=CC=C6OC%10=O))))Br)))[N+]=O)[O-]"}", "/scratch/micpie/export/iupac_smiles/test_23-1.jsonl": "{"text":"The CAS-like IUPAC name of the chemical with canonical SMILES O=CC1CCCN(S(=O)(=O)c2ccc(O)c([N+](=O)[O-])c2)C1 is 1-(4-hydroxy-3-nitrophenyl)sulfonyl-3-piperidinecarboxaldehyde."} {"text":"The CAS-like IUPAC name of the compound with DeepSMILES CCNS=O)=O)C=CC=CC=C6))C=O)NNC=NC=CC=CC=C69 is N-(1-benzimidazolyl)-4-(ethylsulfamoyl)benzamide."}", "/scratch/micpie/export/iupac_smiles/train_23-4.jsonl": "{"text":"The InChI of the compound with systematic IUPAC name 1-(2-nitrophenyl)sulfonylpiperidine-3-carbaldehyde is InChI=1S\/C12H14N2O5S\/c15-9-10-4-3-7-13(8-10)20(18,19)12-6-2-1-5-11(12)14(16)17\/h1-2,5-6,9-10H,3-4,7-8H2."} {"text":"The canonical SMILES of the molecule with systematic IUPAC name N-(benzimidazol-1-yl)-4-(2,5-dimethylphenyl)-4-oxidanylidene-butanamide is Cc1ccc(C)c(C(=O)CCC(=O)Nn2cnc3ccccc32)c1."}", "/scratch/micpie/export/iupac_smiles/train_16-0.jsonl": "{"text":"The traditional IUPAC name of the chemical with SELFIES [C][C][=C][C][=Branch2][Ring1][N][=C][Branch2][Ring1][#Branch1][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][=N][C][=C][Branch1][=Branch2][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][Cl][Cl][C][C] is 5-chloro-4-(chloromethyl)-2-(2,3,5-trimethylphenoxy)pyridine."} {"text":"The traditional IUPAC name of the molecule with SELFIES [C][C][O][C][=N][C][=C][Branch1][#Branch2][C][=C][C][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][=Branch1][C][=O][O][C][Branch1][C][C][C] is 2-ethoxy-1H-benzimidazole-4-carboxylic acid isopropyl ester."}", 
"/scratch/micpie/export/iupac_smiles/test_1-4.jsonl": "{"text":"The DeepSMILES of the compound with systematic IUPAC name (3R)-3-nitrocyclohexan-1-ol is CC[C@H]CCC6)O)))[N+]=O)[O-]."} {"text":"The SMILES of the compound with systematic IUPAC name (2S)-1-(1H-indol-4-yloxy)-3-[[(2S)-3-(1H-indol-4-yloxy)-2-oxidanyl-propyl]-propan-2-yl-amino]propan-2-ol is CC(C)N(C[C@@H](COC1=CC=CC2=C1C=CN2)O)C[C@@H](COC3=CC=CC4=C3C=CN4)O."}", "/scratch/micpie/export/iupac_smiles/train_22-8.jsonl": "{"text":"Task: Please create the SMILES of a molecule based on the systematic IUPAC name.\nIUPAC name: 6-methyl-5-(5-methyl-1,3,4-oxadiazol-2-yl)-5-methylsulfonyl-4-(trifluoromethyl)cyclohexa-1,3-diene-1-carboxamide\nResult: Cc1nnc(C2(S(C)(=O)=O)C(C(F)(F)F)=CC=C(C(N)=O)C2C)o1"} {"text":"Task: Please generate the SMILES of a molecule based on the systematic IUPAC name.\nIUPAC name: 1-propylsulfonylpiperidine-3-carbaldehyde\nResult: [C][C][C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][=O]"}", "/scratch/micpie/export/iupac_smiles/train_12-3.jsonl": "{"text":"The InChI of the chemical with traditional IUPAC name (3-dibenzofuran-1-ylphenyl)-(9,9-dimethylfluoren-2-yl)-(3-phenylphenyl)amine is InChI=1S\/C45H33NO\/c1-45(2)40-22-8-6-19-37(40)38-26-25-35(29-41(38)45)46(33-17-10-15-31(27-33)30-13-4-3-5-14-30)34-18-11-16-32(28-34)36-21-12-24-43-44(36)39-20-7-9-23-42(39)47-43\/h3-29H,1-2H3."} {"text":"The InChI of the compound with traditional IUPAC name 2-[(2-amino-3,3-dimethyl-butanoyl)amino]pent-4-enoic acid is InChI=1S\/C11H20N2O3\/c1-5-6-7(10(15)16)13-9(14)8(12)11(2,3)4\/h5,7-8H,1,6,12H2,2-4H3,(H,13,14)(H,15,16)."}", "/scratch/micpie/export/iupac_smiles/train_2-5.jsonl": "{"text":"The SELFIES of the molecule with CAS-like IUPAC name (6R,8R,9S,13S,14S,17S)-3-methoxy-13-methyl-17-[[(2S)-2-oxanyl]oxy]-6,7,8,9,11,12,14,15,16,17-decahydrocyclopenta[a]phenanthren-6-ol is 
[C][C@][C][C][C@H1][C@H1][Branch2][Ring1][C][C@@H1][Ring1][=Branch1][C][C][C@@H1][Ring1][=Branch2][O][C@H1][C][C][C][C][O][Ring1][=Branch1][C][C@H1][Branch1][#C][C][=C][Ring1][P][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O]."} {"text":"The canonical SMILES of the compound with IUAPC name in CAS-like style (1S,3R,5S,7R)-3-hydroxy-5-methyl-1-adamantanecarboxylate is C[C@@]12C[C@H]3C[C@@](O)(C1)C[C@](C(=O)[O-])(C3)C2."}", "/scratch/micpie/export/iupac_smiles/train_24-1.jsonl": "{"text":"The CAS-like IUPAC name of the compound with SELFIES [C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][O][C][C][=Branch1][C][=O][N][C][C][=Branch1][C][=O][N][N][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][=Branch2] is N-(1-benzimidazolyl)-2-[(1-oxo-2-phenoxyethyl)amino]acetamide."} {"text":"The CAS-like IUPAC name of the compound with SELFIES [C][C][=C][C][=Branch2][Ring2][C][=N][C][=C][Branch1][Branch2][C][=N][N][Ring1][=Branch2][Ring1][Branch1][C][=Branch1][C][=O][N][C@@H1][Branch1][=Branch1][C][C][C][Ring1][Ring1][C][Branch1][C][F][Branch1][C][F][F][C][=C][Branch1][#Branch1][C][=C][N][=C][Ring1][=Branch1][C][Branch1][C][F][Branch1][C][F][F] is N-[(1S)-1-cyclopropyl-2,2,2-trifluoroethyl]-7-methyl-5-[4-(trifluoromethyl)-3-pyridinyl]-3-pyrazolo[1,5-a]pyrimidinecarboxamide."}", "/scratch/micpie/export/iupac_smiles/train_2-8.jsonl": "{"text":"Task: Please create the SMILES of a compound given the systematic IUPAC name.\nIUPAC name: (6R,8R,9S,13S,14S,17S)-3-methoxy-13-methyl-17-[(2S)-oxan-2-yl]oxy-6,7,8,9,11,12,14,15,16,17-decahydrocyclopenta[a]phenanthren-6-ol\nResult: C[C@]12CC[C@H]3[C@H]([C@@H]1CC[C@@H]2O[C@H]4CCCCO4)C[C@H](C5=C3C=CC(=C5)OC)O"} {"text":"Task: Please create the SMILES of a molecule based on the systematic IUPAC name.\nIUPAC name: (1S,3S,5R,7R)-3-methyl-5-oxidanyl-adamantane-1-carboxylate\nResult: C[C@]12C[C@@H]3C[C@](C1)(C[C@@](C3)(C2)O)C(=O)[O-]"}", "/scratch/micpie/export/iupac_smiles/train_24-6.jsonl": "{"text":"The SELFIES of the chemical 
with IUPAC name N-(benzimidazol-1-yl)-2-[(2-phenoxyacetyl)amino]acetamide is [C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][O][C][C][=Branch1][C][=O][N][C][C][=Branch1][C][=O][N][N][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][=Branch2]."} {"text":"The SMILES of the compound with IUPAC name N-[(1S)-1-cyclopropyl-2,2,2-trifluoroethyl]-7-methyl-5-[4-(trifluoromethyl)pyridin-3-yl]pyrazolo[1,5-a]pyrimidine-3-carboxamide is CC1=CC(=NC2=C(C=NN12)C(=O)N[C@@H](C3CC3)C(F)(F)F)C4=C(C=CN=C4)C(F)(F)F."}", "/scratch/micpie/export/iupac_smiles/valid_24-9.jsonl": "{"text":"Task: Please generate the SMILES of a chemical based on the CAS-like IUPAC name.\nIUPAC name: 2-(1,4-dioxo-3H-phthalazin-2-yl)-N-(2-hydroxyphenyl)acetamide\nResult: [C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][N][Branch1][Branch1][C][Ring1][Branch2][=O][C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][O]"} {"text":"Task: Please create the SMILES representation of a chemical given the IUAPC name in CAS-like style.\nIUPAC name: 5-(7-fluoro-1,3-benzoxazol-5-yl)-7-methyl-N-(1,1,1-trifluoropropan-2-yl)-3-pyrazolo[1,5-a]pyrimidinecarboxamide\nResult: Cc1cc(-c2cc(F)c3ocnc3c2)nc2c(C(=O)NC(C)C(F)(F)F)cnn12"}", "/scratch/micpie/export/iupac_smiles/valid_25-5.jsonl": "{"text":"The SMILES of the chemical with IUAPC name in CAS-like style 5-(5-chloro-3-pyridinyl)-N-[(1S)-1-cyclopropylethyl]-7-methyl-3-pyrazolo[1,5-a]pyrimidinecarboxamide is CC1=CC(=NC2=C(C=NN12)C(=O)N[C@@H](C)C3CC3)C4=CC(=CN=C4)Cl."} {"text":"The SMILES of the chemical with CAS-like IUPAC name 2-[amino(phenyl)methyl]-4-ethyl-5-thiazolecarboxylic acid is CCC1=C(SC(=N1)C(C2=CC=CC=C2)N)C(=O)O."}", "/scratch/micpie/export/iupac_smiles/train_1-9.jsonl": "{"text":"Task: Please create the SMILES representation of a molecule given the CAS-like IUPAC name.\nIUPAC name: 3-[(1Z)-3-chlorobuta-1,3-dienyl]-2-methyl-N-[4-(1-piperidinyl)butyl]-4-pyridinamine\nResult: 
CC=NC=CC=C6\/C=C\\C=C)Cl)))))NCCCCNCCCCC6"} {"text":"Task: Please create the SMILES representation of a chemical based on the IUAPC name in CAS-like style.\nIUPAC name: (6S,8R,9S,13S,14S,17S)-3-methoxy-13-methyl-17-[[(2R)-2-oxanyl]oxy]-6,7,8,9,11,12,14,15,16,17-decahydrocyclopenta[a]phenanthren-6-ol\nResult: C[C@]CC[C@H][C@H][C@@H]6CC[C@@H]9O[C@@H]CCCCO6)))))))))))C[C@@H]C=C6C=CC=C6)OC)))))))O"}", "/scratch/micpie/export/iupac_smiles/test_1-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the chemical with SELFIES [C][C][C@H1][Branch1][=Branch2][C][C][Branch1][Ring2][C][Ring1][=Branch1][O][N+1][=Branch1][C][=O][O-1] is (3R)-3-nitro-1-cyclohexanol."} {"text":"The CAS-like IUPAC name of the compound with SELFIES [C][C][Branch1][C][C][N][Branch2][Ring1][Branch1][C][C@@H1][Branch1][S][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][N][Ring1][Branch1][O][C][C@@H1][Branch1][S][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][N][Ring1][Branch1][O] is (2S)-1-[[(2S)-2-hydroxy-3-(1H-indol-4-yloxy)propyl]-propan-2-ylamino]-3-(1H-indol-4-yloxy)-2-propanol."}", "/scratch/micpie/export/iupac_smiles/test_24-3.jsonl": "{"text":"The SMILES of the compound with traditional IUPAC name 3,5-difluoro-N-(5-thioxo-1H-1,2,4-triazol-4-yl)benzamide is C1=C(C=C(C=C1F)F)C(=O)NN2C=NNC2=S."} {"text":"The InChI of the molecule with traditional IUPAC name 7-methyl-N-(2,2,2-trifluoro-1-methyl-ethyl)-5-[6-(trifluoromethyl)-3-pyridyl]pyrazolo[1,5-a]pyrimidine-3-carboxamide is InChI=1S\/C17H13F6N5O\/c1-8-5-12(10-3-4-13(24-6-10)17(21,22)23)27-14-11(7-25-28(8)14)15(29)26-9(2)16(18,19)20\/h3-7,9H,1-2H3,(H,26,29)."}", "/scratch/micpie/export/iupac_smiles/valid_26-8.jsonl": "{"text":"Task: Please create the SMILES of a compound given the systematic IUPAC name.\nIUPAC name: 2-[4-[bis(fluoranyl)methyl]phenyl]-4-ethyl-1,3-thiazole-5-carboxylic acid\nResult: CCc1nc(-c2ccc(C(F)F)cc2)sc1C(=O)O"} {"text":"Task: Please create the SMILES of a compound based on the systematic IUPAC name.\nIUPAC name: 
1-[(3R)-3-(2-chlorophenyl)-5-(3-methoxyphenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(phenylmethyl)piperazin-1-yl]ethanone\nResult: COC=CC=CC=C6)C=NN[C@H]C5)C=CC=CC=C6Cl))))))))C=O)CNCCNCC6))CC=CC=CC=C6"}", "/scratch/micpie/export/iupac_smiles/valid_14-5.jsonl": "{"text":"The SMILES of the molecule with CAS-like IUPAC name N-[[(1S,2S)-2-[[2-furanyl(oxo)methyl]amino]cyclopentyl]methyl]carbamic acid tert-butyl ester is CC(C)(C)OC(=O)NC[C@@H]1CCC[C@@H]1NC(=O)C2=CC=CO2."} {"text":"The SMILES of the molecule with IUAPC name in CAS-like style N-[2-(5-bromo-7-chloro-2-benzofuranyl)ethyl]carbamic acid tert-butyl ester is CC(C)(C)OC(=O)NCCC1=CC2=CC(=CC(=C2O1)Cl)Br."}", "/scratch/micpie/export/iupac_smiles/valid_24-2.jsonl": "{"text":"The IUPAC name of the compound with canonical SMILES O=C(Cn1[nH]c(=O)c2ccccc2c1=O)Nc1ccccc1O is 2-(1,4-dioxo-3H-phthalazin-2-yl)-N-(2-hydroxyphenyl)acetamide."} {"text":"The IUPAC name of the molecule with SMILES CC1=CC(=NC2=C(C=NN12)C(=O)NC(C)C(F)(F)F)C3=CC4=C(C(=C3)F)OC=N4 is 5-(7-fluoro-1,3-benzoxazol-5-yl)-7-methyl-N-(1,1,1-trifluoropropan-2-yl)pyrazolo[1,5-a]pyrimidine-3-carboxamide."}", "/scratch/micpie/export/iupac_smiles/train_6-4.jsonl": "{"text":"The DeepSMILES of the compound with systematic IUPAC name N-(2-azanylethyl)-2-[2-(oxolan-2-ylmethoxy)propanoylamino]benzamide is CCC=O)NC=CC=CC=C6C=O)NCCN)))))))))))))OCCCCCO5."} {"text":"The DeepSMILES of the chemical with systematic IUPAC name 3-azanyl-4-fluoranyl-N-[(1S)-1-phenylethyl]benzenesulfonamide is C[C@@H]C=CC=CC=C6))))))NS=O)=O)C=CC=CC=C6))F))N."}", "/scratch/micpie/export/iupac_smiles/test_3-8.jsonl": "{"text":"Task: Please create the SMILES representation of a molecule given the systematic IUPAC name.\nIUPAC name: (2R)-1-(phenylmethylsulfanyl)-3-propan-2-yloxy-propan-2-ol\nResult: InChI=1S\/C13H20O2S\/c1-11(2)15-8-13(14)10-16-9-12-6-4-3-5-7-12\/h3-7,11,13-14H,8-10H2,1-2H3\/t13-\/m1\/s1"} {"text":"Task: Please generate the SMILES of a molecule given the systematic IUPAC name.\nIUPAC 
name: 2-azanyl-2-(hydroxymethyl)propane-1,3-diol;5-chloranyl-6-methyl-3H-1,3-benzoxazol-2-one\nResult: InChI=1S\/C8H6ClNO2.C4H11NO3\/c1-4-2-7-6(3-5(4)9)10-8(11)12-7;5-4(1-6,2-7)3-8\/h2-3H,1H3,(H,10,11);6-8H,1-3,5H2"}", "/scratch/micpie/export/iupac_smiles/train_24-9.jsonl": "{"text":"Task: Please give me the SMILES of a compound given the CAS-like IUPAC name.\nIUPAC name: N-(1-benzimidazolyl)-2-[(1-oxo-2-phenoxyethyl)amino]acetamide\nResult: C=CC=CC=C6))OCC=O)NCC=O)NNC=NC=CC=CC=C69"} {"text":"Task: Please generate the SMILES of a compound given the IUAPC name in CAS-like style.\nIUPAC name: N-[(1S)-1-cyclopropyl-2,2,2-trifluoroethyl]-7-methyl-5-[4-(trifluoromethyl)-3-pyridinyl]-3-pyrazolo[1,5-a]pyrimidinecarboxamide\nResult: Cc1cc(-c2cnccc2C(F)(F)F)nc2c(C(=O)N[C@@H](C3CC3)C(F)(F)F)cnn12"}", "/scratch/micpie/export/iupac_smiles/valid_9-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with SELFIES [C][C][Branch1][C][C][N][C][C][Branch2][Ring1][=Branch2][C][O][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][#C][O] is 1-(isopropylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,9,11,13-heptaenylideneamino)oxy-propan-2-ol."} {"text":"The traditional IUPAC name of the molecule with InChI InChI=1S\/C28H42ClN3O2\/c1-18-16-28(3,20-10-7-5-6-8-11-20)24(27(34)31-18)17-30-26(33)23-14-21(29)15-25(19(23)2)32(4)22-12-9-13-22\/h14-15,18,20,22,24H,5-13,16-17H2,1-4H3,(H,30,33)(H,31,34) is 5-chloro-3-[cyclobutyl(methyl)amino]-N-[(4-cycloheptyl-2-keto-4,6-dimethyl-3-piperidyl)methyl]-2-methyl-benzamide."}", "/scratch/micpie/export/iupac_smiles/train_5-2.jsonl": "{"text":"The IUPAC name of the molecule with SELFIES [C][C][C][C][N][C][C][C][Ring1][=Branch1][Branch1][N][C][Branch1][#Branch1][C][C][O][C][Ring1][Branch1][C][N][O] is 4-[3-(aminomethyl)oxolan-3-yl]-3-ethylpiperidin-4-ol."} {"text":"The IUPAC name of the molecule with SMILES CC(C(=O)NC1=CC=CC=C1C(=O)NCCN)OCC2CCCO2.Cl is 
N-(2-aminoethyl)-2-[2-(oxolan-2-ylmethoxy)propanoylamino]benzamide;hydrochloride."}", "/scratch/micpie/export/iupac_smiles/valid_8-8.jsonl": "{"text":"Task: Please give me the SMILES of a chemical based on the systematic IUPAC name.\nIUPAC name: 3-[naphthalen-2-ylsulfonyl(phenyl)amino]propanamide\nResult: InChI=1S\/C19H18N2O3S\/c20-19(22)12-13-21(17-8-2-1-3-9-17)25(23,24)18-11-10-15-6-4-5-7-16(15)14-18\/h1-11,14H,12-13H2,(H2,20,22)"} {"text":"Task: Please create the SMILES representation of a molecule based on the systematic IUPAC name.\nIUPAC name: 1-[(E)-3,4-dihydro-2H-naphthalen-1-ylideneamino]oxy-3-(propan-2-ylamino)propan-2-ol;hydrochloride\nResult: CCC)NCCCO\/N=C\\CCCC=CC=CC=C6\\%10)))))))))))))O.Cl"}", "/scratch/micpie/export/iupac_smiles/train_1-6.jsonl": "{"text":"The DeepSMILES of the molecule with preferred IUPAC name 3-[(1Z)-3-chlorobuta-1,3-dienyl]-2-methyl-N-(4-piperidin-1-ylbutyl)pyridin-4-amine is CC=NC=CC=C6\/C=C\\C=C)Cl)))))NCCCCNCCCCC6."} {"text":"The canonical SMILES of the chemical with IUPAC name (6S,8R,9S,13S,14S,17S)-3-methoxy-13-methyl-17-[(2R)-oxan-2-yl]oxy-6,7,8,9,11,12,14,15,16,17-decahydrocyclopenta[a]phenanthren-6-ol is COc1ccc2c(c1)[C@@H](O)C[C@@H]1[C@@H]2CC[C@]2(C)[C@@H](O[C@@H]3CCCCO3)CC[C@@H]12."}", "/scratch/micpie/export/iupac_smiles/train_8-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the compound with canonical SMILES Cc1ccc([N+](=O)[O-])cc1S(=O)(=O)N(CCC(N)=O)c1ccccc1 is 3-(N-(2-methyl-5-nitrophenyl)sulfonylanilino)propanamide."} {"text":"The IUAPC name in CAS-like style of the chemical with DeepSMILES CCC)NCCCON=CC=CC=CC=C6CCC=CC=CC=C6%15))))))))))))))))))O.Cl is 1-(propan-2-ylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,11,13-hexaenylideneamino)oxy-2-propanol;hydrochloride."}", "/scratch/micpie/export/iupac_smiles/train_8-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with SMILES CC1=C(C=C(C=C1)[N+](=O)[O-])S(=O)(=O)N(CCC(=O)N)C2=CC=CC=C2 is 
3-(N-(2-methyl-5-nitro-phenyl)sulfonylanilino)propionamide."} {"text":"The traditional IUPAC name of the molecule with SMILES CC(C)NCC(CON=C1C2=CC=CC=C2CCC3=CC=CC=C31)O.Cl is 1-(isopropylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,11,13-hexaenylideneamino)oxy-propan-2-ol;hydrochloride."}", "/scratch/micpie/export/iupac_smiles/train_26-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the chemical with canonical SMILES CCc1nc(-c2ccc(Cl)c(F)c2)sc1C(=O)O is 2-(4-chloro-3-fluorophenyl)-4-ethyl-5-thiazolecarboxylic acid."} {"text":"The CAS-like IUPAC name of the chemical with canonical SMILES O=C(CN1CCN(c2ccccc2F)CC1)N1N=C(c2ccc(F)cc2)C[C@H]1c1ccc(Cl)cc1 is 1-[(3S)-3-(4-chlorophenyl)-5-(4-fluorophenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(2-fluorophenyl)-1-piperazinyl]ethanone."}", "/scratch/micpie/export/iupac_smiles/test_10-1.jsonl": "{"text":"The CAS-like IUPAC name of the molecule with SELFIES [C][C][=C][Branch2][Ring1][=Branch2][C][=C][Branch2][Ring1][Ring1][C][=C][Ring1][=Branch1][N][Branch1][Ring2][C][C][=C][C][C][C][N][C][C][Ring1][=Branch1][Cl][C][Branch1][C][C][O] is 1-[5-chloro-2-methyl-3-[4-piperidinyl(prop-2-enyl)amino]phenyl]ethanol."} {"text":"The IUAPC name in CAS-like style of the compound with InChI InChI=1S\/C26H31FN2O3S2\/c1-32-20-5-6-25-23(14-20)22(24(27)15-28-25)4-2-3-18-7-9-29(16-19(18)13-26(30)31)10-12-34-21-8-11-33-17-21\/h5-6,8,11,14-15,17-19H,2-4,7,9-10,12-13,16H2,1H3,(H,30,31) is 2-[4-[3-(3-fluoro-6-methoxy-4-quinolinyl)propyl]-1-[2-(3-thiophenylthio)ethyl]-3-piperidinyl]acetic acid."}", "/scratch/micpie/export/iupac_smiles/train_13-4.jsonl": "{"text":"The SMILES of the chemical with systematic IUPAC name 2-azanyl-N,3,3-trimethyl-N-prop-2-ynyl-butanamide is CC(C)(C)C(C(=O)N(C)CC#C)N."} {"text":"The DeepSMILES of the compound with systematic IUPAC name (1,3,5-trimethylpyrazol-4-yl) (4R)-3,4-dihydro-2H-chromene-4-carboxylate is CC=CC=NN5C)))C))OC=O)[C@@H]CCOC=CC=CC=C%106."}", "/scratch/micpie/export/iupac_smiles/test_20-0.jsonl": 
"{"text":"The traditional IUPAC name of the chemical with SELFIES [C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C@@H1][C@@H1][Branch1][Branch1][C][Ring1][=Branch1][=O][C][C@H1][C][Branch2][Ring1][Branch1][C][Ring1][Branch2][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][Branch1][=Branch1][C][Ring2][Ring1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][C] is (2S,8S,12S)-3,5,9,11-tetraketo-4,10-bis(o-phenetyl)-4,10-diazatetracyclo[5.5.2.02,6.08,12]tetradec-13-ene-13-carboxylic acid phenyl ester."} {"text":"The traditional IUPAC name of the compound with SMILES CC1C(C(C(C(O1)OC2C(CC(CC2OC3C(C(C(C(O3)CO)O)O[C@@H](CC4CCCCC4)C(=O)O)OC(=O)C5=CC=CC=C5)C(=O)NCCOC6C(C(C(C(O6)CO)O)N7C=C(N=N7)C8=CC(=CC=C8)F)O)N9C=C(N=N9)C1=CC(=CC=C1)F)O)O)O is (2S)-2-[3-benzoyloxy-2-[3-[4-(3-fluorophenyl)triazol-1-yl]-5-[2-[4-[4-(3-fluorophenyl)triazol-1-yl]-3,5-dihydroxy-6-methylol-tetrahydropyran-2-yl]oxyethylcarbamoyl]-2-(3,4,5-trihydroxy-6-methyl-tetrahydropyran-2-yl)oxy-cyclohexoxy]-5-hydroxy-6-methylol-tetrahydropyran-4-yl]oxy-3-cyclohexyl-propionic acid."}", "/scratch/micpie/export/iupac_smiles/test_5-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the chemical with SMILES CCC1CNCCC1(C(C)(CC)CN)O is 4-(1-amino-2-methylbutan-2-yl)-3-ethyl-4-piperidinol."} {"text":"The IUAPC name in CAS-like style of the molecule with InChI InChI=1S\/C21H27N3O2\/c1-15(2)18(16-8-4-3-5-9-16)14-20(25)24-19-11-7-6-10-17(19)21(26)23-13-12-22\/h3-11,15,18H,12-14,22H2,1-2H3,(H,23,26)(H,24,25) is N-(2-aminoethyl)-2-[(4-methyl-1-oxo-3-phenylpentyl)amino]benzamide."}", "/scratch/micpie/export/iupac_smiles/train_20-2.jsonl": "{"text":"The IUPAC name of the chemical with InChI 
InChI=1S\/C41H64O15S\/c1-20(2)10-9-15-40(8)33-24(42)18-39(7)23-11-12-26-37(4,5)27(14-16-38(26,6)22(23)13-17-41(33,39)36(47)55-40)53-35-32(29(44)25(19-51-35)56-57(48,49)50)54-34-31(46)30(45)28(43)21(3)52-34\/h13,20-21,23,25-35,43-46H,9-12,14-19H2,1-8H3,(H,48,49,50) is [4-hydroxy-6-[[2,6,13,17,17-pentamethyl-6-(4-methylpentyl)-4,8-dioxo-7-oxapentacyclo[10.8.0.02,9.05,9.013,18]icos-11-en-16-yl]oxy]-5-(3,4,5-trihydroxy-6-methyloxan-2-yl)oxyoxan-3-yl] hydrogen sulfate."} {"text":"The preferred IUPAC name of the molecule with SMILES CCC[C@@H](C(=O)N1CCC1)OC2C(C(OC(C2OC(=O)C3=CC=CC=C3)OC4CC(CC(C4OC5C(C(C(C(O5)C)O)O)O)C)C(=O)NCCNC(=O)C6CC(C(C(C6)OC7C(C(C(C(O7)CO)O)N8C=C(N=N8)C9=CC(=CC=C9)F)O)O)N1C=C(N=N1)C1=CC(=CC=C1)F)CO)O is [4-[(2S)-1-(azetidin-1-yl)-1-oxopentan-2-yl]oxy-2-[5-[2-[[3-[4-(3-fluorophenyl)triazol-1-yl]-5-[4-[4-(3-fluorophenyl)triazol-1-yl]-3,5-dihydroxy-6-(hydroxymethyl)oxan-2-yl]oxy-4-hydroxycyclohexanecarbonyl]amino]ethylcarbamoyl]-3-methyl-2-(3,4,5-trihydroxy-6-methyloxan-2-yl)oxycyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)oxan-3-yl] benzoate."}", "/scratch/micpie/export/iupac_smiles/valid_13-6.jsonl": "{"text":"The SMILES of the compound with preferred IUPAC name 2-[[(2-amino-3,3-dimethylbutanoyl)amino]methyl]-2-methylbutanoic acid is CCC(C)(CNC(=O)C(C(C)(C)C)N)C(=O)O."} {"text":"The SELFIES of the compound with preferred IUPAC name (2-chloropyridin-4-yl)-[(2R)-2-methyl-1,2,3,4-tetrahydro-1,5-benzodiazepin-5-yl]methanone is [C][C@@H1][C][C][N][Branch1][N][C][=C][C][=C][C][=C][Ring1][=Branch1][N][Ring1][O][C][=Branch1][C][=O][C][=C][C][=Branch1][=Branch1][=N][C][=C][Ring1][=Branch1][Cl]."}", "/scratch/micpie/export/iupac_smiles/valid_15-7.jsonl": "{"text":"Task: Please give me the SMILES of a molecule given the traditional IUPAC name.\nIUPAC name: methanesulfonic acid 2-[5-[5-(4,4-difluoropiperidine-1-carbonyl)-2-pyridyl]-7-(trifluoromethyl)benzofuran-2-yl]ethyl ester\nResult: CS(=O)(=O)OCCC1=CC2=CC(=CC(=C2O1)C(F)(F)F)C3=NC=C(C=C3)C(=O)N4CCC(CC4)(F)F"} 
{"text":"Task: Please give me the SMILES representation of a compound based on the traditional IUPAC name.\nIUPAC name: 5-chloro-4-(chloromethyl)-2-(1-naphthoxy)pyridine\nResult: C=CC=CC=C6)C=CC=C6OC=NC=CC=C6)CCl)))Cl"}", "/scratch/micpie/export/iupac_smiles/test_19-6.jsonl": "{"text":"The DeepSMILES of the compound with IUPAC name N-(4-chlorophenyl)-3-[4-[2-(methylamino)ethyl]piperidin-1-yl]propanamide;hydrochloride is CNCCCCCNCC6))CCC=O)NC=CC=CC=C6))Cl.Cl."} {"text":"The canonical SMILES of the compound with preferred IUPAC name nan is COc1cc([C@@H]2CC(=O)C[C@H](OC(C)=O)CC[C@@]34Cc5c[nH]cc5[C@H](C#CCC3=CC=C[C@@H]4C)[C@H](c3cccc(O)c3)C3=CCNC(=C3)N(CCC(C)=O)c3ccc4c5c(n2cc35)C[C@]2(CCC3(CCCC3)C2)[C@@H]4O)cc(O)c1Oc1cccc(O)c1."}", "/scratch/micpie/export/iupac_smiles/valid_13-0.jsonl": "{"text":"The traditional IUPAC name of the compound with SELFIES [C][C][C][Branch1][C][C][Branch2][Ring1][Ring1][C][N][C][=Branch1][C][=O][C][Branch1][=Branch2][C][Branch1][C][C][Branch1][C][C][C][N][C][=Branch1][C][=O][O] is 2-[[(2-amino-3,3-dimethyl-butanoyl)amino]methyl]-2-methyl-butyric acid."} {"text":"The traditional IUPAC name of the chemical with canonical SMILES C[C@@H]1CCN(C(=O)c2ccnc(Cl)c2)c2ccccc2N1 is (2-chloro-4-pyridyl)-[(2R)-2-methyl-1,2,3,4-tetrahydro-1,5-benzodiazepin-5-yl]methanone."}", "/scratch/micpie/export/iupac_smiles/valid_8-4.jsonl": "{"text":"The SELFIES of the chemical with systematic IUPAC name 3-[naphthalen-2-ylsulfonyl(phenyl)amino]propanamide is [C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][Branch1][Branch2][C][C][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][#Branch2]."} {"text":"The DeepSMILES of the molecule with systematic IUPAC name 1-[(E)-3,4-dihydro-2H-naphthalen-1-ylideneamino]oxy-3-(propan-2-ylamino)propan-2-ol;hydrochloride is CCC)NCCCO\/N=C\\CCCC=CC=CC=C6\\%10)))))))))))))O.Cl."}", "/scratch/micpie/export/iupac_smiles/train_4-1.jsonl": "{"text":"The CAS-like 
IUPAC name of the molecule with DeepSMILES C[C@@]CC[C@H][C@@][C@H]6CC[C@@H][C@H]%10\/C=C\/C=NC=CC=C6))C=CC=CC=C6)))F)))))))))))CSCCS5)))))))))C)C=O)))O is (1S,2R,4aS,5R,6S,8aS)-6-(1,3-dithiolan-2-yl)-5-[(E)-2-[5-(3-fluorophenyl)-2-pyridinyl]ethenyl]-2-hydroxy-1,4a-dimethyl-2,3,4,5,6,7,8,8a-octahydronaphthalene-1-carboxaldehyde."} {"text":"The CAS-like IUPAC name of the molecule with canonical SMILES CCC(CN)C1(O)CCNCC1CC is 4-(1-aminobutan-2-yl)-3-ethyl-4-piperidinol."}", "/scratch/micpie/export/iupac_smiles/valid_2-8.jsonl": "{"text":"Task: Please create the SMILES of a molecule based on the systematic IUPAC name.\nIUPAC name: 4-[(3S,4S)-2,4-dimethylheptan-3-yl]phenol\nResult: CCC[C@H]C)[C@@H]C=CC=CC=C6))O)))))CC)C"} {"text":"Task: Please create the SMILES representation of a molecule based on the systematic IUPAC name.\nIUPAC name: 8-bromanyl-3-[2-(4-ethoxyphenyl)-1,3-thiazol-4-yl]-6-nitro-chromen-2-one\nResult: CCOc1ccc(-c2nc(-c3cc4cc([N+](=O)[O-])cc(Br)c4oc3=O)cs2)cc1"}", "/scratch/micpie/export/iupac_smiles/train_9-3.jsonl": "{"text":"The SELFIES of the compound with traditional IUPAC name 1-(tert-butylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,11,13-hexaenylideneamino)oxy-propan-2-ol is [C][C][Branch1][C][C][Branch1][C][C][N][C][C][Branch2][Ring1][=Branch2][C][O][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][#C][O]."} {"text":"The InChI of the compound with traditional IUPAC name N-[(5-butyl-6-keto-2,4-dimethyl-3,4-dihydropyridin-5-yl)methyl]-3-[cyclopentyl(methyl)amino]-2-vinyl-benzamide is InChI=1S\/C27H39N3O2\/c1-6-8-16-27(19(3)17-20(4)29-26(27)32)18-28-25(31)23-14-11-15-24(22(23)7-2)30(5)21-12-9-10-13-21\/h7,11,14-15,19,21H,2,6,8-10,12-13,16-18H2,1,3-5H3,(H,28,31)."}", "/scratch/micpie/export/iupac_smiles/test_4-7.jsonl": "{"text":"Task: Please generate the SMILES of a compound given the traditional IUPAC name.\nIUPAC name: 1-(3-methoxyphenyl)sulfonyl-3-[(4-methylpiperazino)methyl]indole\nResult: 
InChI=1S\/C21H25N3O3S\/c1-22-10-12-23(13-11-22)15-17-16-24(21-9-4-3-8-20(17)21)28(25,26)19-7-5-6-18(14-19)27-2\/h3-9,14,16H,10-13,15H2,1-2H3"} {"text":"Task: Please generate the SMILES representation of a compound given the traditional IUPAC name.\nIUPAC name: 4-(2-amino-1,1-dimethyl-ethyl)-3-ethyl-piperidin-4-ol\nResult: CCCCNCCC6CC)C)CN)))O"}", "/scratch/micpie/export/iupac_smiles/train_22-4.jsonl": "{"text":"The SMILES of the molecule with systematic IUPAC name 6-methyl-5-(5-methyl-1,3,4-oxadiazol-2-yl)-5-methylsulfonyl-4-(trifluoromethyl)cyclohexa-1,3-diene-1-carboxamide is CC1C(=CC=C(C1(C2=NN=C(O2)C)S(=O)(=O)C)C(F)(F)F)C(=O)N."} {"text":"The SMILES of the molecule with systematic IUPAC name 1-propylsulfonylpiperidine-3-carbaldehyde is CCCS(=O)(=O)N1CCCC(C1)C=O."}", "/scratch/micpie/export/iupac_smiles/train_26-2.jsonl": "{"text":"The preferred IUPAC name of the molecule with InChI InChI=1S\/C12H9ClFNO2S\/c1-2-9-10(12(16)17)18-11(15-9)6-3-4-7(13)8(14)5-6\/h3-5H,2H2,1H3,(H,16,17) is 2-(4-chloro-3-fluorophenyl)-4-ethyl-1,3-thiazole-5-carboxylic acid."} {"text":"The preferred IUPAC name of the compound with InChI InChI=1S\/C27H25ClF2N4O\/c28-21-9-5-20(6-10-21)26-17-24(19-7-11-22(29)12-8-19)31-34(26)27(35)18-32-13-15-33(16-14-32)25-4-2-1-3-23(25)30\/h1-12,26H,13-18H2\/t26-\/m0\/s1 is 1-[(3S)-3-(4-chlorophenyl)-5-(4-fluorophenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(2-fluorophenyl)piperazin-1-yl]ethanone."}", "/scratch/micpie/export/iupac_smiles/test_3-3.jsonl": "{"text":"The InChI of the compound with traditional IUPAC name (2R)-1-(benzylthio)-3-isopropoxy-propan-2-ol is InChI=1S\/C13H20O2S\/c1-11(2)15-8-13(14)10-16-9-12-6-4-3-5-7-12\/h3-7,11,13-14H,8-10H2,1-2H3\/t13-\/m1\/s1."} {"text":"The SELFIES of the compound with traditional IUPAC name 2-amino-2-methylol-propane-1,3-diol;5-chloro-6-methyl-3H-1,3-benzoxazol-2-one is 
[C][C][=C][C][=C][Branch1][=Branch1][C][=C][Ring1][=Branch1][Cl][N][C][=Branch1][C][=O][O][Ring1][=Branch2].[C][Branch1][O][C][Branch1][Ring1][C][O][Branch1][Ring1][C][O][N][O]."}", "/scratch/micpie/export/iupac_smiles/train_25-0.jsonl": "{"text":"The traditional IUPAC name of the chemical with SELFIES [C][C][C][C][O][C][C][C][C][Branch1][Branch1][C][C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=C][N][=C][Branch1][=N][C][=C][Branch1][Branch2][N][Ring1][=Branch1][N][=C][Ring1][=Branch2][C][C][=C][C][=Branch1][=Branch1][=C][N][=C][Ring1][=Branch1][Cl] is N-(4-butoxycyclohexyl)-5-(5-chloro-3-pyridyl)-7-methyl-pyrazolo[1,5-a]pyrimidine-3-carboxamide."} {"text":"The traditional IUPAC name of the chemical with SELFIES [C][C][C][=C][Branch2][Ring1][Ring2][S][C][=C][C][=C][Branch1][#Branch1][C][=C][Ring1][=Branch1][Ring1][=Branch2][O][C][C][O][Ring1][Branch2][C][=Branch1][C][=O][O] is 8-ethyl-2,3-dihydrothieno[2,3-g][1,4]benzodioxin-7-carboxylic acid."}", "/scratch/micpie/export/iupac_smiles/valid_19-9.jsonl": "{"text":"Task: Please create the SMILES representation of a molecule given the IUAPC name in CAS-like style.\nIUPAC name: N-[2-chloro-5-(1-tetrazolyl)phenyl]-2-[4-[2-(methylamino)ethyl]-1-piperidinyl]acetamide;hydrochloride\nResult: InChI=1S\/C17H24ClN7O.ClH\/c1-19-7-4-13-5-8-24(9-6-13)11-17(26)21-16-10-14(2-3-15(16)18)25-12-20-22-23-25;\/h2-3,10,12-13,19H,4-9,11H2,1H3,(H,21,26);1H"} {"text":"Task: Please generate the SMILES representation of a compound based on the IUAPC name in CAS-like style.\nIUPAC name: 2-[bis[2-[(4a-methyl-8-methylene-2-propan-2-yl-1,2,3,4,5,6,7,8a-octahydronaphthalen-1-yl)amino]-2-oxoethyl]amino]acetic acid methyl ester\nResult: CC(C)C1CCC2(CCCC(=C)C2C1NC(=O)CN(CC(=O)NC3C(CCC4(C3C(=C)CCC4)C)C(C)C)CC(=O)OC)C"}", "/scratch/micpie/export/iupac_smiles/train_3-4.jsonl": "{"text":"The DeepSMILES of the chemical with systematic IUPAC name (5-nitrofuran-2-yl)-pyrrolidin-1-yl-methanone is CCCNC5)C=O)C=CC=CO5)[N+]=O)[O-]."} {"text":"The SELFIES of the 
molecule with systematic IUPAC name (12S,14S,17E)-7-chloranyl-12-(hydroxymethyl)-23-(phenylmethyl)-15,20-dioxa-2,4,5,9,11,23-hexazahexacyclo[19.7.1.13,10.111,14.04,8.022,27]hentriaconta-1(28),3(31),5,7,9,17,21(29),22(27)-octaen-24-one is [C][C][C][=Branch1][C][=O][N][Branch2][Branch1][Branch1][C][=C][Ring1][#Branch1][C][=C][C][=C][Ring1][=Branch1][O][C][\/C][=C][\/C][O][C@H1][C][C@H1][Branch2][Ring1][=N][N][Branch1][Ring2][C][Ring1][Branch1][C][=N][C][=C][Branch1][S][C][=N][N][Ring1][Branch1][C][=Branch1][Ring2][=C][Ring1][=Branch2][N][Ring2][Ring1][Branch2][Cl][C][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]."}", "/scratch/micpie/export/iupac_smiles/test_12-9.jsonl": "{"text":"Task: Please create the SMILES representation of a compound given the CAS-like IUPAC name.\nIUPAC name: 2-(4-chlorophenyl)spiro[fluorene-9,9'-thioxanthene]\nResult: Clc1ccc(-c2ccc3c(c2)C2(c4ccccc4Sc4ccccc42)c2ccccc2-3)cc1"} {"text":"Task: Please give me the SMILES of a chemical given the CAS-like IUPAC name.\nIUPAC name: 2-amino-N-(cyclopropylmethyl)-3,3-dimethyl-N-prop-2-ynylbutanamide\nResult: InChI=1S\/C13H22N2O\/c1-5-8-15(9-10-6-7-10)12(16)11(14)13(2,3)4\/h1,10-11H,6-9,14H2,2-4H3"}", "/scratch/micpie/export/iupac_smiles/valid_1-3.jsonl": "{"text":"The DeepSMILES of the compound with traditional IUPAC name allyl-ethyl-methyl-amine;ethylene is CCNC)CC=C.C=C."} {"text":"The SELFIES of the compound with traditional IUPAC name (2R)-1-[[(2S)-2-hydroxy-3-(1H-indol-4-yloxy)propyl]-isopropyl-amino]-3-(1H-indol-4-yloxy)propan-2-ol is [C][C][Branch1][C][C][N][Branch2][Ring1][Branch1][C][C@H1][Branch1][S][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][N][Ring1][Branch1][O][C][C@@H1][Branch1][S][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][N][Ring1][Branch1][O]."}", "/scratch/micpie/export/iupac_smiles/train_8-5.jsonl": "{"text":"The canonical SMILES of the molecule with IUAPC name in CAS-like style 3-(N-(2-methyl-5-nitrophenyl)sulfonylanilino)propanamide is 
Cc1ccc([N+](=O)[O-])cc1S(=O)(=O)N(CCC(N)=O)c1ccccc1."} {"text":"The SMILES of the compound with IUAPC name in CAS-like style 1-(propan-2-ylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,11,13-hexaenylideneamino)oxy-2-propanol;hydrochloride is CC(C)NCC(CON=C1C2=CC=CC=C2CCC3=CC=CC=C31)O.Cl."}", "/scratch/micpie/export/iupac_smiles/valid_22-6.jsonl": "{"text":"The SELFIES of the compound with preferred IUPAC name 1-cyclopropyl-3-ethynyl-2-(trifluoromethyl)benzene is [C][#C][C][=C][Branch1][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][C][C][Ring1][Ring1][C][Branch1][C][F][Branch1][C][F][F]."} {"text":"The InChI of the molecule with IUPAC name 1-(3-fluoro-4-methoxyphenyl)sulfonylpiperidine-3-carbaldehyde is InChI=1S\/C13H16FNO4S\/c1-19-13-5-4-11(7-12(13)14)20(17,18)15-6-2-3-10(8-15)9-16\/h4-5,7,9-10H,2-3,6,8H2,1H3."}", "/scratch/micpie/export/iupac_smiles/test_9-4.jsonl": "{"text":"The canonical SMILES of the compound with systematic IUPAC name 1-(fluoren-9-ylideneamino)oxy-3-(4-phenylbutan-2-ylamino)propan-2-ol is CC(CCc1ccccc1)NCC(O)CON=C1c2ccccc2-c2ccccc21."} {"text":"The SELFIES of the molecule with systematic IUPAC name (2S)-5-chloranyl-N-[(4,6-dimethyl-2-oxidanylidene-piperidin-3-yl)methyl]-3-[ethyl(propanoyl)amino]-2-methyl-cyclohexane-1-carboxamide is [C][C][C][=Branch1][C][=O][N][Branch1][Ring1][C][C][C][C][C][Branch2][Ring1][=C][C][C][Branch1][Branch1][C@@H1][Ring1][=Branch1][C][C][=Branch1][C][=O][N][C][C][C][Branch1][O][C][C][Branch1][=Branch1][N][C][Ring1][=Branch1][=O][C][C][Cl]."}", "/scratch/micpie/export/iupac_smiles/train_8-9.jsonl": "{"text":"Task: Please create the SMILES of a chemical given the IUAPC name in CAS-like style.\nIUPAC name: 3-(N-(2-methyl-5-nitrophenyl)sulfonylanilino)propanamide\nResult: CC=CC=CC=C6))[N+]=O)[O-]))))S=O)=O)NCCC=O)N))))C=CC=CC=C6"} {"text":"Task: Please give me the SMILES of a molecule given the IUAPC name in CAS-like style.\nIUPAC name: 
1-(propan-2-ylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,11,13-hexaenylideneamino)oxy-2-propanol;hydrochloride\nResult: CCC)NCCCON=CC=CC=CC=C6CCC=CC=CC=C6%15))))))))))))))))))O.Cl"}", "/scratch/micpie/export/iupac_smiles/train_27-4.jsonl": "{"text":"The DeepSMILES of the compound with systematic IUPAC name 1-[(3R)-3-(4-chlorophenyl)-5-(4-fluorophenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(2-fluorophenyl)piperazin-1-yl]ethanone is CCNCCN6CC=O)N[C@H]CC=N5)C=CC=CC=C6))F)))))))C=CC=CC=C6))Cl))))))))))))C=CC=CC=C6F."} {"text":"The DeepSMILES of the molecule with systematic IUPAC name 6-[4-[4-[(9,9-dimethylfluoren-2-yl)amino]phenyl]-2,3,5,6-tetrakis(oxidanyl)phenyl]benzene-1,2,3,4,5-pentol is CCC=CC=CC=C6C=C9C=CC=C6))NC=CC=CC=C6))C=CC=CC=C6O))O))C=CC=CC=C6O))O))O))O))O))))O))O))))))))))))))))))C."}", "/scratch/micpie/export/iupac_smiles/train_13-1.jsonl": "{"text":"The CAS-like IUPAC name of the compound with SMILES CC(C)(C)C(C(=O)N(C)CC#C)N is 2-amino-N,3,3-trimethyl-N-prop-2-ynylbutanamide."} {"text":"The CAS-like IUPAC name of the chemical with canonical SMILES Cc1nn(C)c(C)c1OC(=O)[C@@H]1CCOc2ccccc21 is (4R)-3,4-dihydro-2H-1-benzopyran-4-carboxylic acid (1,3,5-trimethyl-4-pyrazolyl) ester."}", "/scratch/micpie/export/iupac_smiles/test_19-1.jsonl": "{"text":"The CAS-like IUPAC name of the chemical with SELFIES [C][N][C][C][C][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl].[Cl] is N-(4-chlorophenyl)-3-[4-[2-(methylamino)ethyl]-1-piperidinyl]propanamide;hydrochloride."} {"text":"The CAS-like IUPAC name of the compound with InChI 
InChI=1S\/C73H78N4O10\/c1-44-11-7-13-51-14-9-18-57-59-41-74-40-50(59)38-73(44,51)26-21-56(86-46(3)79)36-54(82)37-62(49-32-64(83)69(65(33-49)85-4)87-55-17-10-16-53(81)35-55)77-42-60-61(20-19-58-68(60)63(77)39-72(70(58)84)28-27-71(43-72)24-5-6-25-71)76(30-23-45(2)78)66-34-48(22-29-75-66)67(57)47-12-8-15-52(80)31-47\/h7-8,10-13,15-17,19-20,22,31-35,40-42,44,56-57,62,67,70,74-75,80-81,83-84H,5-6,14,21,23-30,36-39,43H2,1-4H3\/t44-,56+,57-,62-,67+,70+,72-,73-\/m0\/s1 is nan."}", "/scratch/micpie/export/iupac_smiles/train_21-7.jsonl": "{"text":"Task: Please generate the SMILES representation of a compound based on the traditional IUPAC name.\nIUPAC name: (2S)-2-[3-benzoyloxy-2-[3-ethyl-5-[2-[[3-[4-(3-fluorophenyl)triazol-1-yl]-5-[4-[4-(3-fluorophenyl)triazol-1-yl]-3,5-dihydroxy-6-methylol-tetrahydropyran-2-yl]oxy-4-hydroxy-cyclohexanecarbonyl]amino]ethylcarbamoyl]-2-(3,4,5-trihydroxy-6-methyl-tetrahydropyran-2-yl)oxy-cyclohexoxy]-5-hydroxy-6-methylol-tetrahydropyran-4-yl]oxy-3-cyclohexyl-propionic acid\nResult: CCCCCCCC6OCCCCCO6)C))O))O))O)))))OCCCCCO6)CO)))O))O[C@@H]CCCCCCC6)))))))C=O)O)))))OC=O)C=CC=CC=C6)))))))))))))C=O)NCCNC=O)CCCCCC6)OCCCCCO6)CO)))O))NC=CN=N5))C=CC=CC=C6)))F))))))))O)))))O))NC=CN=N5))C=CC=CC=C6)))F"} {"text":"Task: Please give me the SMILES of a compound given the traditional IUPAC name.\nIUPAC name: 3-[4-(trifluoromethoxy)phenyl]pyridazine\nResult: FC(F)(F)Oc1ccc(-c2cccnn2)cc1"}", "/scratch/micpie/export/iupac_smiles/train_26-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with SELFIES [C][C][C][=C][Branch2][Ring1][=Branch1][S][C][=Branch1][Ring2][=N][Ring1][Branch1][C][=C][C][=Branch1][=Branch2][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl][F][C][=Branch1][C][=O][O] is 2-(4-chloro-3-fluoro-phenyl)-4-ethyl-thiazole-5-carboxylic acid."} {"text":"The traditional IUPAC name of the chemical with SMILES C1CN(CCN1CC(=O)N2[C@@H](CC(=N2)C3=CC=C(C=C3)F)C4=CC=C(C=C4)Cl)C5=CC=CC=C5F is 
1-[(5S)-5-(4-chlorophenyl)-3-(4-fluorophenyl)-2-pyrazolin-1-yl]-2-[4-(2-fluorophenyl)piperazino]ethanone."}", "/scratch/micpie/export/iupac_smiles/valid_22-8.jsonl": "{"text":"Task: Please give me the SMILES representation of a compound based on the systematic IUPAC name.\nIUPAC name: 1-cyclopropyl-3-ethynyl-2-(trifluoromethyl)benzene\nResult: [C][#C][C][=C][Branch1][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][C][C][Ring1][Ring1][C][Branch1][C][F][Branch1][C][F][F]"} {"text":"Task: Please give me the SMILES representation of a compound based on the systematic IUPAC name.\nIUPAC name: 1-(3-fluoranyl-4-methoxy-phenyl)sulfonylpiperidine-3-carbaldehyde\nResult: InChI=1S\/C13H16FNO4S\/c1-19-13-5-4-11(7-12(13)14)20(17,18)15-6-2-3-10(8-15)9-16\/h4-5,7,9-10H,2-3,6,8H2,1H3"}", "/scratch/micpie/export/iupac_smiles/train_11-0.jsonl": "{"text":"The traditional IUPAC name of the chemical with canonical SMILES CC(=O)OC1CCC2(C)C3=C(CCC2C1(C)C)[C@]1(C)CCC(C(C)CCC(O)C(C)(C)O)C1(C)CC3 is acetic acid [(14R)-17-(4,5-dihydroxy-1,5-dimethyl-hexyl)-4,4,10,13,14-pentamethyl-2,3,5,6,7,11,12,15,16,17-decahydro-1H-cyclopenta[a]phenanthren-3-yl] ester."} {"text":"The traditional IUPAC name of the compound with SELFIES [C][C][Branch2][=Branch1][Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][=Branch2][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][Branch2][Ring1][Ring1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][Ring1][=Branch2][=C][C][=C][Ring1][=N][C] is (3-dibenzofuran-1-ylphenyl)-(9,9-dimethylfluoren-2-yl)-(4-phenylphenyl)amine."}", "/scratch/micpie/export/iupac_smiles/train_26-3.jsonl": "{"text":"The SMILES of the compound with traditional IUPAC name 2-(4-chloro-3-fluoro-phenyl)-4-ethyl-thiazole-5-carboxylic acid is CCC1=C(SC(=N1)C2=CC(=C(C=C2)Cl)F)C(=O)O."} {"text":"The SELFIES of the compound with 
traditional IUPAC name 1-[(5S)-5-(4-chlorophenyl)-3-(4-fluorophenyl)-2-pyrazolin-1-yl]-2-[4-(2-fluorophenyl)piperazino]ethanone is [C][C][N][Branch2][Ring2][=N][C][C][N][Ring1][=Branch1][C][C][=Branch1][C][=O][N][C@@H1][Branch2][Ring1][Ring1][C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][F][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl][C][=C][C][=C][C][=C][Ring1][=Branch1][F]."}", "/scratch/micpie/export/iupac_smiles/train_16-3.jsonl": "{"text":"The SELFIES of the compound with traditional IUPAC name 5-chloro-4-(chloromethyl)-2-(2,3,5-trimethylphenoxy)pyridine is [C][C][=C][C][=Branch2][Ring1][N][=C][Branch2][Ring1][#Branch1][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][=N][C][=C][Branch1][=Branch2][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][Cl][Cl][C][C]."} {"text":"The canonical SMILES of the molecule with traditional IUPAC name 2-ethoxy-1H-benzimidazole-4-carboxylic acid isopropyl ester is CCOc1nc2c(C(=O)OC(C)C)cccc2[nH]1."}", "/scratch/micpie/export/iupac_smiles/valid_0-2.jsonl": "{"text":"The preferred IUPAC name of the molecule with DeepSMILES CC=CC=CC=C6))\/C=N\\NC=O)C=NNC=C5))CC=CC=CC=C6))Br))))))))))))\/C is 1-[(4-bromophenyl)methyl]-N-[(Z)-1-(4-methylphenyl)ethylideneamino]pyrazole-3-carboxamide."} {"text":"The preferred IUPAC name of the molecule with InChI InChI=1S\/C8H16O4.C2H4\/c1-5-7(10)8(2,11-3)4-6(9)12-5;1-2\/h5-7,9-10H,4H2,1-3H3;1-2H2 is ethene;4-methoxy-4,6-dimethyloxane-2,5-diol."}", "/scratch/micpie/export/iupac_smiles/train_19-9.jsonl": "{"text":"Task: Please create the SMILES of a compound based on the CAS-like IUPAC name.\nIUPAC name: N-(4-chlorophenyl)-3-[4-[2-(methylamino)ethyl]-1-piperidinyl]propanamide\nResult: InChI=1S\/C17H26ClN3O\/c1-19-10-6-14-7-11-21(12-8-14)13-9-17(22)20-16-4-2-15(18)3-5-16\/h2-5,14,19H,6-13H2,1H3,(H,20,22)"} {"text":"Task: Please create the SMILES representation of a chemical based on the IUAPC name in CAS-like style.\nIUPAC name: 
3-[1-[2,6-dichloro-4-[hydroxy(oxido)amino]phenyl]iminoethyl]-1-benzopyran-2-one\nResult: CC=NC=CC=CC=C6Cl)))NO)[O-]))))Cl))))C=CC=CC=CC=C6OC%10=O"}", "/scratch/micpie/export/iupac_smiles/train_25-5.jsonl": "{"text":"The SELFIES of the compound with CAS-like IUPAC name N-(4-butoxycyclohexyl)-5-(5-chloro-3-pyridinyl)-7-methyl-3-pyrazolo[1,5-a]pyrimidinecarboxamide is [C][C][C][C][O][C][C][C][C][Branch1][Branch1][C][C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=C][N][=C][Branch1][=N][C][=C][Branch1][Branch2][N][Ring1][=Branch1][N][=C][Ring1][=Branch2][C][C][=C][C][=Branch1][=Branch1][=C][N][=C][Ring1][=Branch1][Cl]."} {"text":"The SELFIES of the molecule with IUAPC name in CAS-like style 8-ethyl-2,3-dihydrothieno[2,3-g][1,4]benzodioxin-7-carboxylic acid is [C][C][C][=C][Branch2][Ring1][Ring2][S][C][=C][C][=C][Branch1][#Branch1][C][=C][Ring1][=Branch1][Ring1][=Branch2][O][C][C][O][Ring1][Branch2][C][=Branch1][C][=O][O]."}", "/scratch/micpie/export/iupac_smiles/test_23-9.jsonl": "{"text":"Task: Please give me the SMILES representation of a compound given the CAS-like IUPAC name.\nIUPAC name: 1-(4-hydroxy-3-nitrophenyl)sulfonyl-3-piperidinecarboxaldehyde\nResult: CCCCNC6)S=O)=O)C=CC=CC=C6))O))[N+]=O)[O-]))))))))C=O"} {"text":"Task: Please generate the SMILES representation of a compound based on the CAS-like IUPAC name.\nIUPAC name: N-(1-benzimidazolyl)-4-(ethylsulfamoyl)benzamide\nResult: CCNS(=O)(=O)C1=CC=C(C=C1)C(=O)NN2C=NC3=CC=CC=C32"}", "/scratch/micpie/export/iupac_smiles/train_5-0.jsonl": "{"text":"The traditional IUPAC name of the compound with SELFIES [C][C][C][C][N][C][C][C][Ring1][=Branch1][Branch1][N][C][Branch1][#Branch1][C][C][O][C][Ring1][Branch1][C][N][O] is 4-[3-(aminomethyl)tetrahydrofuran-3-yl]-3-ethyl-piperidin-4-ol."} {"text":"The traditional IUPAC name of the molecule with DeepSMILES CCC=O)NC=CC=CC=C6C=O)NCCN)))))))))))))OCCCCCO5.Cl is N-(2-aminoethyl)-2-[2-(tetrahydrofurfuryloxy)propanoylamino]benzamide;hydrochloride."}", 
"/scratch/micpie/export/iupac_smiles/test_6-2.jsonl": "{"text":"The preferred IUPAC name of the molecule with SELFIES [C][C][Branch1][C][C][Branch2][Ring1][#Branch1][C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1] is N-(2-aminoethyl)-2-[(3-methyl-3-phenylbutanoyl)amino]benzamide."} {"text":"The IUPAC name of the molecule with canonical SMILES Clc1ccc(CNc2ccc3c(c2)OCCO3)cc1Cl is N-[(3,4-dichlorophenyl)methyl]-2,3-dihydro-1,4-benzodioxin-6-amine."}", "/scratch/micpie/export/iupac_smiles/train_11-3.jsonl": "{"text":"The SELFIES of the molecule with traditional IUPAC name acetic acid [(14R)-17-(4,5-dihydroxy-1,5-dimethyl-hexyl)-4,4,10,13,14-pentamethyl-2,3,5,6,7,11,12,15,16,17-decahydro-1H-cyclopenta[a]phenanthren-3-yl] ester is [C][C][Branch1][#C][C][C][C][Branch1][=Branch2][C][Branch1][C][C][Branch1][C][C][O][O][C][C][C][C@@][Branch2][Ring2][#Branch2][C][Ring1][Branch1][Branch2][Ring2][Ring1][C][C][C][=C][Ring1][=Branch1][C][C][C][C][Ring1][=Branch1][Branch2][Ring1][Ring1][C][C][C][Branch1][Branch2][C][Ring1][=Branch1][Branch1][C][C][C][O][C][=Branch1][C][=O][C][C][C][C]."} {"text":"The SELFIES of the molecule with traditional IUPAC name (3-dibenzofuran-1-ylphenyl)-(9,9-dimethylfluoren-2-yl)-(4-phenylphenyl)amine is [C][C][Branch2][=Branch1][Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][=Branch2][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][Branch2][Ring1][Ring1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][Ring1][=Branch2][=C][C][=C][Ring1][=N][C]."}", "/scratch/micpie/export/iupac_smiles/test_23-3.jsonl": "{"text":"The InChI of the molecule with traditional IUPAC name 1-(4-hydroxy-3-nitro-phenyl)sulfonylnipecotaldehyde is 
InChI=1S\/C12H14N2O6S\/c15-8-9-2-1-5-13(7-9)21(19,20)10-3-4-12(16)11(6-10)14(17)18\/h3-4,6,8-9,16H,1-2,5,7H2."} {"text":"The DeepSMILES of the chemical with traditional IUPAC name N-(benzimidazol-1-yl)-4-(ethylsulfamoyl)benzamide is CCNS=O)=O)C=CC=CC=C6))C=O)NNC=NC=CC=CC=C69."}", "/scratch/micpie/export/iupac_smiles/test_27-2.jsonl": "{"text":"The preferred IUPAC name of the molecule with InChI InChI=1S\/C29H30ClFN4O\/c1-20-7-8-23(17-21(20)2)26-18-28(22-9-11-24(30)12-10-22)35(32-26)29(36)19-33-13-15-34(16-14-33)27-6-4-3-5-25(27)31\/h3-12,17,28H,13-16,18-19H2,1-2H3\/t28-\/m0\/s1 is 1-[(3S)-3-(4-chlorophenyl)-5-(3,4-dimethylphenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(2-fluorophenyl)piperazin-1-yl]ethanone."} {"text":"The IUPAC name of the chemical with canonical SMILES C=CC1=C(C)C(C)=C(\/C=C(\\C=C)Nc2ccc(-c3ccccc3)s2)C1(C)C is N-[(1E)-1-(4-ethenyl-2,3,5,5-tetramethylcyclopenta-1,3-dien-1-yl)buta-1,3-dien-2-yl]-5-phenylthiophen-2-amine."}", "/scratch/micpie/export/iupac_smiles/valid_24-8.jsonl": "{"text":"Task: Please create the SMILES of a compound given the systematic IUPAC name.\nIUPAC name: 2-[1,4-bis(oxidanylidene)-3H-phthalazin-2-yl]-N-(2-hydroxyphenyl)ethanamide\nResult: O=C(Cn1[nH]c(=O)c2ccccc2c1=O)Nc1ccccc1O"} {"text":"Task: Please generate the SMILES of a molecule based on the systematic IUPAC name.\nIUPAC name: 5-(7-fluoranyl-1,3-benzoxazol-5-yl)-7-methyl-N-[1,1,1-tris(fluoranyl)propan-2-yl]pyrazolo[1,5-a]pyrimidine-3-carboxamide\nResult: CC1=CC(=NC2=C(C=NN12)C(=O)NC(C)C(F)(F)F)C3=CC4=C(C(=C3)F)OC=N4"}", "/scratch/micpie/export/iupac_smiles/test_7-4.jsonl": "{"text":"The SMILES of the compound with systematic IUPAC name 3-azanyl-4-oxidanyl-N-(pyridin-2-ylmethyl)benzenesulfonamide is C1=CC=NC(=C1)CNS(=O)(=O)C2=CC(=C(C=C2)O)N."} {"text":"The SELFIES of the chemical with systematic IUPAC name 3-[(4-fluorophenyl)sulfonyl-phenyl-amino]propanamide is 
[C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][Branch1][Branch2][C][C][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][F]."}", "/scratch/micpie/export/iupac_smiles/test_2-8.jsonl": "{"text":"Task: Please generate the SMILES of a molecule given the systematic IUPAC name.\nIUPAC name: (2R)-2-(methylamino)-1-(2-methylphenyl)propan-1-one\nResult: CC=CC=CC=C6C=O)[C@@H]C)NC"} {"text":"Task: Please give me the SMILES representation of a molecule based on the systematic IUPAC name.\nIUPAC name: 2-(6-bromanyl-2,3-dihydro-1,4-benzodioxin-7-yl)-4,5,6,7-tetrakis(chloranyl)isoindole-1,3-dione\nResult: O=C1c2c(Cl)c(Cl)c(Cl)c(Cl)c2C(=O)N1c1cc2c(cc1Br)OCCO2"}", "/scratch/micpie/export/iupac_smiles/valid_0-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the chemical with SMILES CC1=CC=C(C=C1)\/C(=N\\NC(=O)C2=NN(C=C2)CC3=CC=C(C=C3)Br)\/C is 1-[(4-bromophenyl)methyl]-N-[(Z)-1-(4-methylphenyl)ethylideneamino]-3-pyrazolecarboxamide."} {"text":"The IUAPC name in CAS-like style of the chemical with SELFIES [C][C][C][Branch1][P][C][Branch1][=Branch2][C][C][Branch1][Ring2][O][Ring1][=Branch1][O][Branch1][C][C][O][C][O].[C][=C] is ethene;4-methoxy-4,6-dimethyloxane-2,5-diol."}", "/scratch/micpie/export/iupac_smiles/train_7-7.jsonl": "{"text":"Task: Please create the SMILES of a compound based on the traditional IUPAC name.\nIUPAC name: 2-[(5-amino-2-chloro-phenyl)sulfonylamino]acetic acid methyl ester\nResult: COC(=O)CNS(=O)(=O)c1cc(N)ccc1Cl"} {"text":"Task: Please generate the SMILES representation of a molecule based on the traditional IUPAC name.\nIUPAC name: 3-(N-(2-nitrophenyl)sulfonylanilino)propionamide\nResult: C=CC=CC=C6))NCCC=O)N))))S=O)=O)C=CC=CC=C6[N+]=O)[O-]"}", "/scratch/micpie/export/iupac_smiles/valid_7-8.jsonl": "{"text":"Task: Please create the SMILES of a compound given the systematic IUPAC name.\nIUPAC name: 
2-methyl-N-[2-[[4-(trifluoromethylsulfanyl)phenyl]methylamino]ethyl]propanamide\nResult: CC(C)C(=O)NCCNCc1ccc(SC(F)(F)F)cc1"} {"text":"Task: Please create the SMILES representation of a chemical based on the systematic IUPAC name.\nIUPAC name: 3-[phenyl(thiophen-2-ylsulfonyl)amino]propanamide\nResult: NC(=O)CCN(c1ccccc1)S(=O)(=O)c1cccs1"}", "/scratch/micpie/export/iupac_smiles/train_25-3.jsonl": "{"text":"The SMILES of the chemical with traditional IUPAC name N-(4-butoxycyclohexyl)-5-(5-chloro-3-pyridyl)-7-methyl-pyrazolo[1,5-a]pyrimidine-3-carboxamide is CCCCOC1CCC(CC1)NC(=O)C2=C3N=C(C=C(N3N=C2)C)C4=CC(=CN=C4)Cl."} {"text":"The SELFIES of the compound with traditional IUPAC name 8-ethyl-2,3-dihydrothieno[2,3-g][1,4]benzodioxin-7-carboxylic acid is [C][C][C][=C][Branch2][Ring1][Ring2][S][C][=C][C][=C][Branch1][#Branch1][C][=C][Ring1][=Branch1][Ring1][=Branch2][O][C][C][O][Ring1][Branch2][C][=Branch1][C][=O][O]."}", "/scratch/micpie/export/iupac_smiles/test_12-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the compound with SELFIES [C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][Branch2][Ring1][#Branch1][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][S][C][=C][C][=C][C][=C][Ring1][=C][Ring1][=Branch1][C][=C][Branch1][=Branch1][C][=C][Ring2][Ring1][Ring2][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl] is 2-(4-chlorophenyl)spiro[fluorene-9,9'-thioxanthene]."} {"text":"The CAS-like IUPAC name of the chemical with SMILES CC(C)(C)C(C(=O)N(CC#C)CC1CC1)N is 2-amino-N-(cyclopropylmethyl)-3,3-dimethyl-N-prop-2-ynylbutanamide."}", "/scratch/micpie/export/iupac_smiles/test_27-7.jsonl": "{"text":"Task: Please give me the SMILES representation of a compound based on the traditional IUPAC name.\nIUPAC name: 1-[(5S)-5-(4-chlorophenyl)-3-(3,4-dimethylphenyl)-2-pyrazolin-1-yl]-2-[4-(2-fluorophenyl)piperazino]ethanone\nResult: CC1=C(C=C(C=C1)C2=NN([C@@H](C2)C3=CC=C(C=C3)Cl)C(=O)CN4CCN(CC4)C5=CC=CC=C5F)C"} {"text":"Task: Please give me 
the SMILES representation of a compound based on the traditional IUPAC name.\nIUPAC name: (5-phenyl-2-thienyl)-[(1E)-1-[(2,3,5,5-tetramethyl-4-vinyl-cyclopenta-1,3-dien-1-yl)methylene]allyl]amine\nResult: CC=CCC=C5C))\/C=C\\C=C))\/NC=CC=CS5)C=CC=CC=C6))))))))))))))C)C))C=C"}", "/scratch/micpie/export/iupac_smiles/train_16-4.jsonl": "{"text":"The canonical SMILES of the compound with systematic IUPAC name 5-chloranyl-4-(chloromethyl)-2-(2,3,5-trimethylphenoxy)pyridine is Cc1cc(C)c(C)c(Oc2cc(CCl)c(Cl)cn2)c1."} {"text":"The SELFIES of the compound with systematic IUPAC name propan-2-yl 2-ethoxy-1H-benzimidazole-4-carboxylate is [C][C][O][C][=N][C][=C][Branch1][#Branch2][C][=C][C][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][=Branch1][C][=O][O][C][Branch1][C][C][C]."}", "/scratch/micpie/export/iupac_smiles/train_15-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with canonical SMILES Cc1nccc2c1[nH]c1cc(C(=O)O)ccc12 is 1-methyl-9H-beta-carboline-7-carboxylic acid."} {"text":"The traditional IUPAC name of the chemical with SMILES C1=CC2=C(C(=C1)OC3=NC=C(C(=C3)CCl)Cl)N=CC=C2 is 8-[[5-chloro-4-(chloromethyl)-2-pyridyl]oxy]quinoline."}", "/scratch/micpie/export/iupac_smiles/test_9-8.jsonl": "{"text":"Task: Please give me the SMILES of a compound based on the systematic IUPAC name.\nIUPAC name: 1-(fluoren-9-ylideneamino)oxy-3-(4-phenylbutan-2-ylamino)propan-2-ol\nResult: InChI=1S\/C26H28N2O2\/c1-19(15-16-20-9-3-2-4-10-20)27-17-21(29)18-30-28-26-24-13-7-5-11-22(24)23-12-6-8-14-25(23)26\/h2-14,19,21,27,29H,15-18H2,1H3"} {"text":"Task: Please create the SMILES representation of a chemical based on the systematic IUPAC name.\nIUPAC name: (2S)-5-chloranyl-N-[(4,6-dimethyl-2-oxidanylidene-piperidin-3-yl)methyl]-3-[ethyl(propanoyl)amino]-2-methyl-cyclohexane-1-carboxamide\nResult: InChI=1S\/C21H36ClN3O3\/c1-6-19(26)25(7-2)18-10-15(22)9-16(14(18)5)20(27)23-11-17-12(3)8-13(4)24-21(17)28\/h12-18H,6-11H2,1-5H3,(H,23,27)(H,24,28)\/t12?,13?,14-,15?,16?,17?,18?\/m0\/s1"}", 
"/scratch/micpie/export/iupac_smiles/valid_2-4.jsonl": "{"text":"The DeepSMILES of the molecule with systematic IUPAC name 4-[(3S,4S)-2,4-dimethylheptan-3-yl]phenol is CCC[C@H]C)[C@@H]C=CC=CC=C6))O)))))CC)C."} {"text":"The InChI of the molecule with systematic IUPAC name 8-bromanyl-3-[2-(4-ethoxyphenyl)-1,3-thiazol-4-yl]-6-nitro-chromen-2-one is InChI=1S\/C20H13BrN2O5S\/c1-2-27-14-5-3-11(4-6-14)19-22-17(10-29-19)15-8-12-7-13(23(25)26)9-16(21)18(12)28-20(15)24\/h3-10H,2H2,1H3."}", "/scratch/micpie/export/iupac_smiles/valid_25-4.jsonl": "{"text":"The canonical SMILES of the compound with systematic IUPAC name 5-(5-chloranylpyridin-3-yl)-N-[(1S)-1-cyclopropylethyl]-7-methyl-pyrazolo[1,5-a]pyrimidine-3-carboxamide is Cc1cc(-c2cncc(Cl)c2)nc2c(C(=O)N[C@@H](C)C3CC3)cnn12."} {"text":"The DeepSMILES of the molecule with systematic IUPAC name 2-[azanyl(phenyl)methyl]-4-ethyl-1,3-thiazole-5-carboxylic acid is CCC=CSC=N5)CC=CC=CC=C6))))))N))))C=O)O."}", "/scratch/micpie/export/iupac_smiles/test_17-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the chemical with SELFIES [C][C][O][C][=N][C][=C][Branch1][#Branch2][C][=C][C][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][=Branch1][C][=O][O][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][N][=N][Ring1][Branch1] is 2-ethoxy-1H-benzimidazole-4-carboxylic acid [4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl ester."} {"text":"The CAS-like IUPAC name of the molecule with InChI InChI=1S\/C56H49IN2\/c1-5-14-45(6-2)56(4)35-43(42-16-13-15-37(3)31-42)32-44(36-56)48-33-49-50(34-51(48)57)55(41-27-23-39(24-28-41)53-20-10-12-30-59-53)47-18-8-7-17-46(47)54(49)40-25-21-38(22-26-40)52-19-9-11-29-58-52\/h5-14,16-33,36-37,51H,15,34-35H2,1-4H3\/b14-5-,45-6+ is 2-[4-[2-[3-[(2E,4Z)-hexa-2,4-dien-3-yl]-3-methyl-5-(3-methyl-1-cyclohexa-1,5-dienyl)-1-cyclohexa-1,5-dienyl]-3-iodo-10-[4-(2-pyridinyl)phenyl]-3,4-dihydroanthracen-9-yl]phenyl]pyridine."}", 
"/scratch/micpie/export/iupac_smiles/test_13-9.jsonl": "{"text":"Task: Please generate the SMILES representation of a compound based on the IUAPC name in CAS-like style.\nIUPAC name: 2-amino-N-(1,3-dihydroxy-2-methylpropan-2-yl)-3,3-dimethylbutanamide\nResult: InChI=1S\/C10H22N2O3\/c1-9(2,3)7(11)8(15)12-10(4,5-13)6-14\/h7,13-14H,5-6,11H2,1-4H3,(H,12,15)"} {"text":"Task: Please generate the SMILES representation of a compound given the CAS-like IUPAC name.\nIUPAC name: (4S)-3,4-dihydro-2H-1-benzopyran-4-carboxylic acid (1,3,5-trimethyl-4-pyrazolyl) ester\nResult: CC1=C(C(=NN1C)C)OC(=O)[C@H]2CCOC3=CC=CC=C23"}", "/scratch/micpie/export/iupac_smiles/valid_7-1.jsonl": "{"text":"The CAS-like IUPAC name of the chemical with SELFIES [C][C][Branch1][C][C][C][=Branch1][C][=O][N][C][C][N][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][S][C][Branch1][C][F][Branch1][C][F][F] is 2-methyl-N-[2-[[4-(trifluoromethylthio)phenyl]methylamino]ethyl]propanamide."} {"text":"The CAS-like IUPAC name of the chemical with InChI InChI=1S\/C13H14N2O3S2\/c14-12(16)8-9-15(11-5-2-1-3-6-11)20(17,18)13-7-4-10-19-13\/h1-7,10H,8-9H2,(H2,14,16) is 3-(N-thiophen-2-ylsulfonylanilino)propanamide."}", "/scratch/micpie/export/iupac_smiles/valid_16-0.jsonl": "{"text":"The traditional IUPAC name of the chemical with DeepSMILES C=CC=CC=C6)I)))OC=NC=CC=C6)CCl)))Cl is 5-chloro-4-(chloromethyl)-2-(3-iodophenoxy)pyridine."} {"text":"The traditional IUPAC name of the molecule with SMILES CCOC1=NC2=CC=CC(=C2N1CC3=CC=C(C=C3)C4=CC=CC=C4C5=NNN=N5)C(=O)NCCC6=C(C=CC(=C6)Cl)Cl is N-[2-(2,5-dichlorophenyl)ethyl]-2-ethoxy-3-[4-[2-(2H-tetrazol-5-yl)phenyl]benzyl]benzimidazole-4-carboxamide."}", "/scratch/micpie/export/iupac_smiles/valid_22-4.jsonl": "{"text":"The DeepSMILES of the compound with systematic IUPAC name 1-cyclopropyl-3-ethynyl-2-(trifluoromethyl)benzene is C#CC=CC=CC=C6)))CCC3))))CF)F)F."} {"text":"The DeepSMILES of the chemical with systematic IUPAC name 
1-(3-fluoranyl-4-methoxy-phenyl)sulfonylpiperidine-3-carbaldehyde is COC=CC=CC=C6))S=O)=O)NCCCCC6)C=O))))))))))F."}", "/scratch/micpie/export/iupac_smiles/valid_17-5.jsonl": "{"text":"The canonical SMILES of the molecule with IUAPC name in CAS-like style 2-ethoxy-N-(4-fluorophenyl)-3-[[4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl]-4-benzimidazolecarboxamide is CCOc1nc2cccc(C(=O)Nc3ccc(F)cc3)c2n1Cc1ccc(-c2ccccc2-c2nn[nH]n2)cc1."} {"text":"The SELFIES of the compound with IUAPC name in CAS-like style N3-(3-aminophenyl)-N1,N5-bis[3-(tert-butylamino)phenyl]benzene-1,3,5-tricarboxamide is [C][C][Branch1][C][C][Branch1][C][C][N][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=C][C][=Branch2][Ring1][Branch2][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][N][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][N][C][Branch1][C][C][Branch1][C][C][C]."}", "/scratch/micpie/export/iupac_smiles/test_7-7.jsonl": "{"text":"Task: Please create the SMILES representation of a molecule given the traditional IUPAC name.\nIUPAC name: 3-amino-4-hydroxy-N-(2-pyridylmethyl)benzenesulfonamide\nResult: C1=CC=NC(=C1)CNS(=O)(=O)C2=CC(=C(C=C2)O)N"} {"text":"Task: Please generate the SMILES representation of a chemical based on the traditional IUPAC name.\nIUPAC name: 3-(N-(4-fluorophenyl)sulfonylanilino)propionamide\nResult: InChI=1S\/C15H15FN2O3S\/c16-12-6-8-14(9-7-12)22(20,21)18(11-10-15(17)19)13-4-2-1-3-5-13\/h1-9H,10-11H2,(H2,17,19)"}", "/scratch/micpie/export/iupac_smiles/train_27-7.jsonl": "{"text":"Task: Please create the SMILES of a compound based on the traditional IUPAC name.\nIUPAC name: 1-[(5R)-5-(4-chlorophenyl)-3-(4-fluorophenyl)-2-pyrazolin-1-yl]-2-[4-(2-fluorophenyl)piperazino]ethanone\nResult: O=C(CN1CCN(c2ccccc2F)CC1)N1N=C(c2ccc(F)cc2)C[C@@H]1c1ccc(Cl)cc1"} {"text":"Task: Please generate the SMILES representation of a chemical based on the 
traditional IUPAC name.\nIUPAC name: 6-[4-[4-[(9,9-dimethylfluoren-2-yl)amino]phenyl]-2,3,5,6-tetrahydroxy-phenyl]benzene-1,2,3,4,5-pentol\nResult: CCC=CC=CC=C6C=C9C=CC=C6))NC=CC=CC=C6))C=CC=CC=C6O))O))C=CC=CC=C6O))O))O))O))O))))O))O))))))))))))))))))C"}", "/scratch/micpie/export/iupac_smiles/train_15-8.jsonl": "{"text":"Task: Please create the SMILES representation of a compound based on the systematic IUPAC name.\nIUPAC name: 1-methyl-9H-pyrido[3,4-b]indole-7-carboxylic acid\nResult: [C][C][=N][C][=C][C][=C][Ring1][=Branch1][N][C][=C][Ring1][Branch1][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=Branch1][C][=O][O]"} {"text":"Task: Please generate the SMILES representation of a molecule based on the systematic IUPAC name.\nIUPAC name: 8-[5-chloranyl-4-(chloromethyl)pyridin-2-yl]oxyquinoline\nResult: ClCc1cc(Oc2cccc3cccnc23)ncc1Cl"}", "/scratch/micpie/export/iupac_smiles/train_21-9.jsonl": "{"text":"Task: Please create the SMILES of a compound given the CAS-like IUPAC name.\nIUPAC name: (2S)-2-[[3-benzoyloxy-2-[3-ethyl-5-[[2-[[[3-[4-(3-fluorophenyl)-1-triazolyl]-5-[[4-[4-(3-fluorophenyl)-1-triazolyl]-3,5-dihydroxy-6-(hydroxymethyl)-2-oxanyl]oxy]-4-hydroxycyclohexyl]-oxomethyl]amino]ethylamino]-oxomethyl]-2-[(3,4,5-trihydroxy-6-methyl-2-oxanyl)oxy]cyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)-4-oxanyl]oxy]-3-cyclohexylpropanoic acid\nResult: InChI=1S\/C68H88F2N8O22\/c1-3-35-23-39(28-47(59(35)100-67-58(87)57(86)52(81)33(2)93-67)96-68-61(99-65(92)36-14-8-5-9-15-36)60(55(84)50(32-80)98-68)94-48(64(90)91)22-34-12-6-4-7-13-34)62(88)71-20-21-72-63(89)40-26-45(77-29-43(73-75-77)37-16-10-18-41(69)24-37)53(82)46(27-40)95-66-56(85)51(54(83)49(31-79)97-66)78-30-44(74-76-78)38-17-11-19-42(70)25-38\/h5,8-11,14-19,24-25,29-30,33-35,39-40,45-61,66-68,79-87H,3-4,6-7,12-13,20-23,26-28,31-32H2,1-2H3,(H,71,88)(H,72,89)(H,90,91)\/t33?,35?,39?,40?,45?,46?,47?,48-,49?,50?,51?,52?,53?,54?,55?,56?,57?,58?,59?,60?,61?,66?,67?,68?\/m0\/s1"} {"text":"Task: Please generate the SMILES of a 
molecule given the CAS-like IUPAC name.\nIUPAC name: 3-[4-(trifluoromethoxy)phenyl]pyridazine\nResult: C1=CC(=NN=C1)C2=CC=C(C=C2)OC(F)(F)F"}", "/scratch/micpie/export/iupac_smiles/train_18-2.jsonl": "{"text":"The preferred IUPAC name of the molecule with SELFIES [C][C][=N][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][N][C][=N][C][Ring1][=Branch1][C][=Branch1][Ring2][=C][Ring1][#Branch2][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=C] is N-[3-[6-(6-methylpyridin-3-yl)-3,8a-dihydroquinazolin-8-yl]phenyl]prop-2-enamide."} {"text":"The IUPAC name of the molecule with SMILES CNCCC1CCN(CC1)CC2=CC(=C(C=C2)F)F.Cl is 2-[1-[(3,4-difluorophenyl)methyl]piperidin-4-yl]-N-methylethanamine;hydrochloride."}", "/scratch/micpie/export/iupac_smiles/valid_17-9.jsonl": "{"text":"Task: Please give me the SMILES representation of a molecule based on the CAS-like IUPAC name.\nIUPAC name: 2-ethoxy-N-(4-fluorophenyl)-3-[[4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl]-4-benzimidazolecarboxamide\nResult: [C][C][O][C][=N][C][=C][C][=C][C][=Branch2][Ring1][P][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][N][=N][Ring1][Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][F]"} {"text":"Task: Please create the SMILES of a molecule given the IUAPC name in CAS-like style.\nIUPAC name: N3-(3-aminophenyl)-N1,N5-bis[3-(tert-butylamino)phenyl]benzene-1,3,5-tricarboxamide\nResult: InChI=1S\/C35H40N6O3\/c1-34(2,3)40-29-14-8-12-27(20-29)38-32(43)23-16-22(31(42)37-26-11-7-10-25(36)19-26)17-24(18-23)33(44)39-28-13-9-15-30(21-28)41-35(4,5)6\/h7-21,40-41H,36H2,1-6H3,(H,37,42)(H,38,43)(H,39,44)"}", "/scratch/micpie/export/iupac_smiles/test_24-5.jsonl": "{"text":"The DeepSMILES of the compound with CAS-like IUPAC name 3,5-difluoro-N-(5-sulfanylidene-1H-1,2,4-triazol-4-yl)benzamide is C=CC=CC=C6F)))F)))C=O)NNC=NNC5=S."} 
{"text":"The DeepSMILES of the molecule with IUAPC name in CAS-like style 7-methyl-5-[6-(trifluoromethyl)-3-pyridinyl]-N-(1,1,1-trifluoropropan-2-yl)-3-pyrazolo[1,5-a]pyrimidinecarboxamide is CC=CC=NC=CC=NN95)))C=O)NCC)CF)F)F))))))))C=CN=CC=C6))CF)F)F."}", "/scratch/micpie/export/iupac_smiles/train_6-3.jsonl": "{"text":"The DeepSMILES of the molecule with traditional IUPAC name N-(2-aminoethyl)-2-[2-(tetrahydrofurfuryloxy)propanoylamino]benzamide is CCC=O)NC=CC=CC=C6C=O)NCCN)))))))))))))OCCCCCO5."} {"text":"The SMILES of the compound with traditional IUPAC name 3-amino-4-fluoro-N-[(1S)-1-phenylethyl]benzenesulfonamide is C[C@@H](C1=CC=CC=C1)NS(=O)(=O)C2=CC(=C(C=C2)F)N."}", "/scratch/micpie/export/iupac_smiles/train_10-1.jsonl": "{"text":"The CAS-like IUPAC name of the compound with DeepSMILES CCC=CC=CC=C6NCC))CCCCCC6))NC=O)OCC)C)C))))))))))))Cl)))CO)O is N-[4-[5-chloro-3-(dihydroxymethyl)-N,2-diethylanilino]cyclohexyl]carbamic acid tert-butyl ester."} {"text":"The IUAPC name in CAS-like style of the molecule with DeepSMILES CCC)CCC[C@]C5CCCCCCCCCC6CCC%10C%14CC%18))C))C)))))C)C))OC=O)C))))))C)))))))C=O)F is acetic acid [(3aS)-3a-carbonofluoridoyl-5a,5b,8,8,11a-pentamethyl-1-propan-2-yl-1,2,3,4,5,6,7,7a,9,10,11,11b,12,13,13a,13b-hexadecahydrocyclopenta[a]chrysen-9-yl] ester."}", "/scratch/micpie/export/iupac_smiles/test_15-8.jsonl": "{"text":"Task: Please create the SMILES of a molecule based on the systematic IUPAC name.\nIUPAC name: (E)-3-(6-azanylpyridin-3-yl)-N-[[5-[4-[4,4-bis(fluoranyl)piperidin-1-yl]carbonylphenyl]-7-[5-chloranyl-2,4-bis(fluoranyl)phenyl]-1-benzofuran-2-yl]methyl]prop-2-enamide\nResult: C1CN(CCC1(F)F)C(=O)C2=CC=C(C=C2)C3=CC(=C4C(=C3)C=C(O4)CNC(=O)\/C=C\/C5=CN=C(C=C5)N)C6=CC(=C(C=C6F)F)Cl"} {"text":"Task: Please generate the SMILES of a compound based on the systematic IUPAC name.\nIUPAC name: 5-chloranyl-4-(chloromethyl)-2-(2,3-dihydro-1H-inden-5-yloxy)pyridine\nResult: ClCc1cc(Oc2ccc3c(c2)CCC3)ncc1Cl"}", 
"/scratch/micpie/export/iupac_smiles/test_11-4.jsonl": "{"text":"The SMILES of the compound with systematic IUPAC name 3-[2,4-bis(fluoranyl)phenoxy]-N,N-dimethyl-3-phenyl-propan-1-amine;2,5-dinitrobenzoic acid is CN(C)CCC(C1=CC=CC=C1)OC2=C(C=C(C=C2)F)F.C1=CC(=C(C=C1[N+](=O)[O-])C(=O)O)[N+](=O)[O-]."} {"text":"The SMILES of the molecule with systematic IUPAC name 4-[7-(4-hydroxyphenyl)-9,9-dioctyl-fluoren-2-yl]benzaldehyde is CCCCCCCCC1(C2=C(C=CC(=C2)C3=CC=C(C=C3)C=O)C4=C1C=C(C=C4)C5=CC=C(C=C5)O)CCCCCCCC."}", "/scratch/micpie/export/iupac_smiles/test_23-5.jsonl": "{"text":"The canonical SMILES of the chemical with CAS-like IUPAC name 1-(4-hydroxy-3-nitrophenyl)sulfonyl-3-piperidinecarboxaldehyde is O=CC1CCCN(S(=O)(=O)c2ccc(O)c([N+](=O)[O-])c2)C1."} {"text":"The InChI of the compound with CAS-like IUPAC name N-(1-benzimidazolyl)-4-(ethylsulfamoyl)benzamide is InChI=1S\/C16H16N4O3S\/c1-2-18-24(22,23)13-9-7-12(8-10-13)16(21)19-20-11-17-14-5-3-4-6-15(14)20\/h3-11,18H,2H2,1H3,(H,19,21)."}", "/scratch/micpie/export/iupac_smiles/train_2-1.jsonl": "{"text":"The CAS-like IUPAC name of the molecule with SMILES C[C@]12CC[C@H]3[C@H]([C@@H]1CC[C@@H]2O[C@H]4CCCCO4)C[C@H](C5=C3C=CC(=C5)OC)O is (6R,8R,9S,13S,14S,17S)-3-methoxy-13-methyl-17-[[(2S)-2-oxanyl]oxy]-6,7,8,9,11,12,14,15,16,17-decahydrocyclopenta[a]phenanthren-6-ol."} {"text":"The CAS-like IUPAC name of the molecule with DeepSMILES C[C@]C[C@@H]C[C@]C6)C[C@@]C6)C8)O)))C=O)[O-] is (1S,3R,5S,7R)-3-hydroxy-5-methyl-1-adamantanecarboxylate."}", "/scratch/micpie/export/iupac_smiles/valid_12-2.jsonl": "{"text":"The IUPAC name of the compound with DeepSMILES CCCCCCCCOCCCC=O)OCCCCCCNCCCCCCCC=O)OCC)))))))))))CCO))))))))))))))OCCCCCCCC is ethyl 8-[6-(4,4-dioctoxybutanoyloxy)hexyl-(2-hydroxyethyl)amino]octanoate."} {"text":"The IUPAC name of the compound with DeepSMILES CCC)C)CC=O)NCCCCCF)F)F))))))))N is 2-amino-3,3-dimethyl-N-(5,5,5-trifluoropentyl)butanamide."}", "/scratch/micpie/export/iupac_smiles/train_22-0.jsonl": "{"text":"The 
traditional IUPAC name of the compound with InChI InChI=1S\/C13H14F3N3O4S\/c1-6-8(10(17)20)4-5-9(13(14,15)16)12(6,24(3,21)22)11-19-18-7(2)23-11\/h4-6H,1-3H3,(H2,17,20) is 5-mesyl-6-methyl-5-(5-methyl-1,3,4-oxadiazol-2-yl)-4-(trifluoromethyl)cyclohexa-1,3-diene-1-carboxamide."} {"text":"The traditional IUPAC name of the chemical with SMILES CCCS(=O)(=O)N1CCCC(C1)C=O is 1-propylsulfonylnipecotaldehyde."}", "/scratch/micpie/export/iupac_smiles/valid_1-1.jsonl": "{"text":"The CAS-like IUPAC name of the chemical with SMILES CCN(C)CC=C.C=C is ethene;N-ethyl-N-methyl-2-propen-1-amine."} {"text":"The CAS-like IUPAC name of the molecule with DeepSMILES CCC)NC[C@H]COC=CC=CC=C6C=CN5)))))))))))O)))C[C@@H]COC=CC=CC=C6C=CN5)))))))))))O is (2R)-1-[[(2S)-2-hydroxy-3-(1H-indol-4-yloxy)propyl]-propan-2-ylamino]-3-(1H-indol-4-yloxy)-2-propanol."}", "/scratch/micpie/export/iupac_smiles/test_8-9.jsonl": "{"text":"Task: Please create the SMILES of a molecule given the CAS-like IUPAC name.\nIUPAC name: N-[3-[(3,4-dimethylphenyl)sulfonylamino]phenyl]-2-methylpropanamide\nResult: Cc1ccc(S(=O)(=O)Nc2cccc(NC(=O)C(C)C)c2)cc1C"} {"text":"Task: Please generate the SMILES representation of a chemical given the CAS-like IUPAC name.\nIUPAC name: 1-(tert-butylamino)-3-[(diphenylmethylene)amino]oxy-2-propanol\nResult: CC(C)(C)NCC(O)CON=C(c1ccccc1)c1ccccc1"}", "/scratch/micpie/export/iupac_smiles/test_16-2.jsonl": "{"text":"The preferred IUPAC name of the chemical with InChI InChI=1S\/C15H15Cl2NO2\/c1-2-7-19-12-3-5-13(6-4-12)20-15-8-11(9-16)14(17)10-18-15\/h3-6,8,10H,2,7,9H2,1H3 is 5-chloro-4-(chloromethyl)-2-(4-propoxyphenoxy)pyridine."} {"text":"The IUPAC name of the molecule with DeepSMILES CCOC=NC=CC=CC=C6N9CC=CC=CC=C6))C=CC=CC=C6C=NNN=N5))))))))))))))))))C=O)NCC=CC=CC=C6))F is 2-ethoxy-N-[(4-fluorophenyl)methyl]-3-[[4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl]benzimidazole-4-carboxamide."}", "/scratch/micpie/export/iupac_smiles/valid_26-6.jsonl": "{"text":"The InChI of the chemical with IUPAC 
name 2-[4-(difluoromethyl)phenyl]-4-ethyl-1,3-thiazole-5-carboxylic acid is InChI=1S\/C13H11F2NO2S\/c1-2-9-10(13(17)18)19-12(16-9)8-5-3-7(4-6-8)11(14)15\/h3-6,11H,2H2,1H3,(H,17,18)."} {"text":"The DeepSMILES of the molecule with IUPAC name 2-(4-benzylpiperazin-1-yl)-1-[(3R)-3-(2-chlorophenyl)-5-(3-methoxyphenyl)-3,4-dihydropyrazol-2-yl]ethanone is COC=CC=CC=C6)C=NN[C@H]C5)C=CC=CC=C6Cl))))))))C=O)CNCCNCC6))CC=CC=CC=C6."}", "/scratch/micpie/export/iupac_smiles/test_20-8.jsonl": "{"text":"Task: Please give me the SMILES of a molecule given the systematic IUPAC name.\nIUPAC name: phenyl (2S,8S,12S)-4,10-bis(2-ethoxyphenyl)-3,5,9,11-tetrakis(oxidanylidene)-4,10-diazatetracyclo[5.5.2.02,6.08,12]tetradec-13-ene-13-carboxylate\nResult: InChI=1S\/C35H30N2O8\/c1-3-43-24-16-10-8-14-22(24)36-31(38)27-20-18-21(35(42)45-19-12-6-5-7-13-19)26(29(27)33(36)40)30-28(20)32(39)37(34(30)41)23-15-9-11-17-25(23)44-4-2\/h5-18,20,26-30H,3-4H2,1-2H3\/t20?,26?,27-,28?,29-,30-\/m0\/s1"} {"text":"Task: Please generate the SMILES representation of a chemical based on the systematic IUPAC name.\nIUPAC name: (2S)-3-cyclohexyl-2-[2-[3-[4-(3-fluorophenyl)-1,2,3-triazol-1-yl]-5-[2-[4-[4-(3-fluorophenyl)-1,2,3-triazol-1-yl]-6-(hydroxymethyl)-3,5-bis(oxidanyl)oxan-2-yl]oxyethylcarbamoyl]-2-[6-methyl-3,4,5-tris(oxidanyl)oxan-2-yl]oxy-cyclohexyl]oxy-6-(hydroxymethyl)-5-oxidanyl-3-(phenylcarbonyloxy)oxan-4-yl]oxy-propanoic acid\nResult: CCCCCCO6)OCCCCCC6OCCCCCO6)CO)))O))O[C@@H]CCCCCCC6)))))))C=O)O)))))OC=O)C=CC=CC=C6)))))))))))))C=O)NCCOCCCCCO6)CO)))O))NC=CN=N5))C=CC=CC=C6)))F))))))))O))))))))))NC=CN=N5))C=CC=CC=C6)))F)))))))))))O))O))O"}", "/scratch/micpie/export/iupac_smiles/valid_11-2.jsonl": "{"text":"The IUPAC name of the chemical with canonical SMILES Clc1ccc2c(c1)CCc1cccnc1C2=C1CCNCC1.O=S(=O)([O-])[O-].O=S(=O)([O-])[O-] is 13-chloro-2-piperidin-4-ylidene-4-azatricyclo[9.4.0.03,8]pentadeca-1(11),3(8),4,6,12,14-hexaene;disulfate."} {"text":"The IUPAC name of the molecule with canonical SMILES 
CCCCCCCCC1(CCCCCCCC)c2cc(Br)ccc2-c2ccc(-c3ccc(C=O)cc3)cc21 is 4-(7-bromo-9,9-dioctylfluoren-2-yl)benzaldehyde."}", "/scratch/micpie/export/iupac_smiles/train_2-6.jsonl": "{"text":"The canonical SMILES of the molecule with IUPAC name (6R,8R,9S,13S,14S,17S)-3-methoxy-13-methyl-17-[(2S)-oxan-2-yl]oxy-6,7,8,9,11,12,14,15,16,17-decahydrocyclopenta[a]phenanthren-6-ol is COc1ccc2c(c1)[C@H](O)C[C@@H]1[C@@H]2CC[C@]2(C)[C@@H](O[C@H]3CCCCO3)CC[C@@H]12."} {"text":"The SELFIES of the compound with preferred IUPAC name (1S,3R,5S,7R)-3-hydroxy-5-methyladamantane-1-carboxylate is [C][C@][C][C@@H1][C][C@][Branch1][Ring2][C][Ring1][=Branch1][Branch1][=C][C][C@@][Branch1][Ring2][C][Ring1][#Branch1][Branch1][Ring2][C][Ring1][#Branch2][O][C][=Branch1][C][=O][O-1]."}", "/scratch/micpie/export/iupac_smiles/test_20-2.jsonl": "{"text":"The IUPAC name of the molecule with canonical SMILES CCOc1ccccc1N1C(=O)C2C3C=C(C(=O)Oc4ccccc4)C([C@@H]2C1=O)[C@@H]1C(=O)N(c2ccccc2OCC)C(=O)[C@@H]31 is phenyl (2S,8S,12S)-4,10-bis(2-ethoxyphenyl)-3,5,9,11-tetraoxo-4,10-diazatetracyclo[5.5.2.02,6.08,12]tetradec-13-ene-13-carboxylate."} {"text":"The preferred IUPAC name of the compound with SELFIES 
[C][C][C][Branch2][O][=Branch2][C][Branch2][O][Ring2][C][Branch2][#Branch2][#C][C][Branch1][Ring2][O][Ring1][=Branch1][O][C][C][Branch2][Branch2][#C][C][C][Branch2][Branch1][=Branch2][C][C][Ring1][=Branch1][O][C][C][Branch2][Ring2][C][C][Branch1][=N][C][Branch1][=Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][C@@H1][Branch1][#Branch2][C][C][C][C][C][C][C][Ring1][=Branch1][C][=Branch1][C][=O][O][O][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][O][C][C][Branch2][Ring2][Ring2][C][Branch1][=N][C][Branch1][=Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][N][C][=C][Branch1][Branch1][N][=N][Ring1][Branch1][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][F][O][N][C][=C][Branch1][Branch1][N][=N][Ring1][Branch1][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][F][O][O][O] is (2S)-2-[3-benzoyloxy-2-[3-[4-(3-fluorophenyl)triazol-1-yl]-5-[2-[4-[4-(3-fluorophenyl)triazol-1-yl]-3,5-dihydroxy-6-(hydroxymethyl)oxan-2-yl]oxyethylcarbamoyl]-2-(3,4,5-trihydroxy-6-methyloxan-2-yl)oxycyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)oxan-4-yl]oxy-3-cyclohexylpropanoic acid."}", "/scratch/micpie/export/iupac_smiles/test_15-9.jsonl": "{"text":"Task: Please create the SMILES of a molecule given the CAS-like IUPAC name.\nIUPAC name: (E)-3-(6-amino-3-pyridinyl)-N-[[7-(5-chloro-2,4-difluorophenyl)-5-[4-[(4,4-difluoro-1-piperidinyl)-oxomethyl]phenyl]-2-benzofuranyl]methyl]-2-propenamide\nResult: InChI=1S\/C35H27ClF4N4O3\/c36-28-16-26(29(37)17-30(28)38)27-15-23(21-3-5-22(6-4-21)34(46)44-11-9-35(39,40)10-12-44)13-24-14-25(47-33(24)27)19-43-32(45)8-2-20-1-7-31(41)42-18-20\/h1-8,13-18H,9-12,19H2,(H2,41,42)(H,43,45)\/b8-2+"} {"text":"Task: Please generate the SMILES representation of a chemical based on the IUAPC name in CAS-like style.\nIUPAC name: 5-chloro-4-(chloromethyl)-2-(2,3-dihydro-1H-inden-5-yloxy)pyridine\nResult: C1CC2=C(C1)C=C(C=C2)OC3=NC=C(C(=C3)CCl)Cl"}", "/scratch/micpie/export/iupac_smiles/valid_3-5.jsonl": 
"{"text":"The DeepSMILES of the chemical with CAS-like IUPAC name (1S,3R,5S,7R)-3-hydroxy-5-methyl-1-adamantanecarboxylic acid is C[C@]C[C@@H]C[C@]C6)C[C@@]C6)C8)O)))C=O)O."} {"text":"The SMILES of the chemical with CAS-like IUPAC name 2-[4-(5-hydroxy-6,7-dimethoxy-4-oxo-1-benzopyran-2-yl)phenoxy]-N-[6-[(2-methoxyphenyl)methyl-methylamino]hexyl]acetamide is CN(CCCCCCNC(=O)COC1=CC=C(C=C1)C2=CC(=O)C3=C(C(=C(C=C3O2)OC)OC)O)CC4=CC=CC=C4OC."}", "/scratch/micpie/export/iupac_smiles/train_25-9.jsonl": "{"text":"Task: Please create the SMILES of a molecule given the IUAPC name in CAS-like style.\nIUPAC name: N-(4-butoxycyclohexyl)-5-(5-chloro-3-pyridinyl)-7-methyl-3-pyrazolo[1,5-a]pyrimidinecarboxamide\nResult: CCCCOC1CCC(NC(=O)c2cnn3c(C)cc(-c4cncc(Cl)c4)nc23)CC1"} {"text":"Task: Please generate the SMILES of a compound based on the CAS-like IUPAC name.\nIUPAC name: 8-ethyl-2,3-dihydrothieno[2,3-g][1,4]benzodioxin-7-carboxylic acid\nResult: CCC=CSC=CC=CC=C69))OCCO6)))))))))C=O)O"}", "/scratch/micpie/export/iupac_smiles/valid_17-4.jsonl": "{"text":"The InChI of the molecule with systematic IUPAC name 2-ethoxy-N-(4-fluorophenyl)-3-[[4-[2-(2H-1,2,3,4-tetrazol-5-yl)phenyl]phenyl]methyl]benzimidazole-4-carboxamide is InChI=1S\/C30H24FN7O2\/c1-2-40-30-33-26-9-5-8-25(29(39)32-22-16-14-21(31)15-17-22)27(26)38(30)18-19-10-12-20(13-11-19)23-6-3-4-7-24(23)28-34-36-37-35-28\/h3-17H,2,18H2,1H3,(H,32,39)(H,34,35,36,37)."} {"text":"The InChI of the molecule with systematic IUPAC name N3-(3-aminophenyl)-N1,N5-bis[3-(tert-butylamino)phenyl]benzene-1,3,5-tricarboxamide is InChI=1S\/C35H40N6O3\/c1-34(2,3)40-29-14-8-12-27(20-29)38-32(43)23-16-22(31(42)37-26-11-7-10-25(36)19-26)17-24(18-23)33(44)39-28-13-9-15-30(21-28)41-35(4,5)6\/h7-21,40-41H,36H2,1-6H3,(H,37,42)(H,38,43)(H,39,44)."}", "/scratch/micpie/export/iupac_smiles/train_26-5.jsonl": "{"text":"The SMILES of the chemical with CAS-like IUPAC name 2-(4-chloro-3-fluorophenyl)-4-ethyl-5-thiazolecarboxylic acid is 
CCC1=C(SC(=N1)C2=CC(=C(C=C2)Cl)F)C(=O)O."} {"text":"The SELFIES of the chemical with CAS-like IUPAC name 1-[(3S)-3-(4-chlorophenyl)-5-(4-fluorophenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(2-fluorophenyl)-1-piperazinyl]ethanone is [C][C][N][Branch2][Ring2][=N][C][C][N][Ring1][=Branch1][C][C][=Branch1][C][=O][N][C@@H1][Branch2][Ring1][Ring1][C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][F][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl][C][=C][C][=C][C][=C][Ring1][=Branch1][F]."}", "/scratch/micpie/export/iupac_smiles/test_27-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the compound with canonical SMILES Cc1ccc(C2=NN(C(=O)CN3CCN(c4ccccc4F)CC3)[C@H](c3ccc(Cl)cc3)C2)cc1C is 1-[(3S)-3-(4-chlorophenyl)-5-(3,4-dimethylphenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(2-fluorophenyl)-1-piperazinyl]ethanone."} {"text":"The IUAPC name in CAS-like style of the compound with canonical SMILES C=CC1=C(C)C(C)=C(\/C=C(\\C=C)Nc2ccc(-c3ccccc3)s2)C1(C)C is N-[(1E)-1-(4-ethenyl-2,3,5,5-tetramethyl-1-cyclopenta-1,3-dienyl)buta-1,3-dien-2-yl]-5-phenyl-2-thiophenamine."}", "/scratch/micpie/export/iupac_smiles/valid_3-6.jsonl": "{"text":"The DeepSMILES of the chemical with IUPAC name (1S,3R,5S,7R)-3-hydroxy-5-methyladamantane-1-carboxylic acid is C[C@]C[C@@H]C[C@]C6)C[C@@]C6)C8)O)))C=O)O."} {"text":"The canonical SMILES of the compound with IUPAC name 2-[4-(5-hydroxy-6,7-dimethoxy-4-oxochromen-2-yl)phenoxy]-N-[6-[(2-methoxyphenyl)methyl-methylamino]hexyl]acetamide is COc1ccccc1CN(C)CCCCCCNC(=O)COc1ccc(-c2cc(=O)c3c(O)c(OC)c(OC)cc3o2)cc1."}", "/scratch/micpie/export/iupac_smiles/train_25-2.jsonl": "{"text":"The preferred IUPAC name of the molecule with SMILES CCCCOC1CCC(CC1)NC(=O)C2=C3N=C(C=C(N3N=C2)C)C4=CC(=CN=C4)Cl is N-(4-butoxycyclohexyl)-5-(5-chloropyridin-3-yl)-7-methylpyrazolo[1,5-a]pyrimidine-3-carboxamide."} {"text":"The IUPAC name of the compound with canonical SMILES CCc1c(C(=O)O)sc2cc3c(cc12)OCCO3 is 
8-ethyl-2,3-dihydrothieno[2,3-g][1,4]benzodioxine-7-carboxylic acid."}", "/scratch/micpie/export/iupac_smiles/test_25-4.jsonl": "{"text":"The SMILES of the molecule with systematic IUPAC name 5-(1,3-benzoxazol-5-yl)-N-[(1R)-1-cyclopropyl-2,2,2-tris(fluoranyl)ethyl]-7-methyl-pyrazolo[1,5-a]pyrimidine-3-carboxamide is CC1=CC(=NC2=C(C=NN12)C(=O)N[C@H](C3CC3)C(F)(F)F)C4=CC5=C(C=C4)OC=N5."} {"text":"The SMILES of the compound with systematic IUPAC name 3-ethyl-5,6,7,8-tetrahydrobenzo[f][1]benzofuran-2-carboxylic acid is CCC1=C(OC2=C1C=C3CCCCC3=C2)C(=O)O."}", "/scratch/micpie/export/iupac_smiles/valid_10-8.jsonl": "{"text":"Task: Please create the SMILES representation of a compound based on the systematic IUPAC name.\nIUPAC name: methyl 3-[hexan-2-yl(methyl)amino]-2-methyl-benzoate\nResult: CCCCC(C)N(C)C1=CC=CC(=C1C)C(=O)OC"} {"text":"Task: Please give me the SMILES representation of a compound given the systematic IUPAC name.\nIUPAC name: (4S,7R,8S,9S,13Z,16S)-16-[(Z)-1-fluoranyl-2-pyridin-2-yl-ethenyl]-5,5,9,13-tetramethyl-4,8-bis(oxidanyl)-7-propyl-1-azacyclohexadec-13-ene-2,6-dione\nResult: CCC[C@@H][C@H][C@H]CCC\/C=C\\C[C@H]NC=O)C[C@@H]CC%16=O))C)C))O)))))\/C=C\/C=CC=CC=N6)))))))\/F)))))\/C)))))C))O"}", "/scratch/micpie/export/iupac_smiles/valid_22-5.jsonl": "{"text":"The canonical SMILES of the compound with IUAPC name in CAS-like style 1-cyclopropyl-3-ethynyl-2-(trifluoromethyl)benzene is C#Cc1cccc(C2CC2)c1C(F)(F)F."} {"text":"The DeepSMILES of the compound with CAS-like IUPAC name 1-(3-fluoro-4-methoxyphenyl)sulfonyl-3-piperidinecarboxaldehyde is COC=CC=CC=C6))S=O)=O)NCCCCC6)C=O))))))))))F."}", "/scratch/micpie/export/iupac_smiles/test_3-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the compound with canonical SMILES CC(C)OC[C@@H](O)CSCc1ccccc1 is (2R)-1-(phenylmethylthio)-3-propan-2-yloxy-2-propanol."} {"text":"The CAS-like IUPAC name of the compound with DeepSMILES CC=CC=CC=C6Cl)))NC=O)O5.CCCO))CO))N))O is 
2-amino-2-(hydroxymethyl)propane-1,3-diol;5-chloro-6-methyl-3H-1,3-benzoxazol-2-one."}", "/scratch/micpie/export/iupac_smiles/train_23-0.jsonl": "{"text":"The traditional IUPAC name of the chemical with InChI InChI=1S\/C12H14N2O5S\/c15-9-10-4-3-7-13(8-10)20(18,19)12-6-2-1-5-11(12)14(16)17\/h1-2,5-6,9-10H,3-4,7-8H2 is 1-(2-nitrophenyl)sulfonylnipecotaldehyde."} {"text":"The traditional IUPAC name of the chemical with SELFIES [C][C][=C][C][=Branch1][=Branch2][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][C][=Branch1][C][=O][C][C][C][=Branch1][C][=O][N][N][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][=Branch2] is N-(benzimidazol-1-yl)-4-(2,5-dimethylphenyl)-4-keto-butyramide."}", "/scratch/micpie/export/iupac_smiles/test_12-8.jsonl": "{"text":"Task: Please create the SMILES of a molecule given the systematic IUPAC name.\nIUPAC name: 2-(4-chlorophenyl)spiro[fluorene-9,9'-thioxanthene]\nResult: C1=CC=C2C(=C1)C3=C(C24C5=CC=CC=C5SC6=CC=CC=C46)C=C(C=C3)C7=CC=C(C=C7)Cl"} {"text":"Task: Please create the SMILES of a compound given the systematic IUPAC name.\nIUPAC name: 2-azanyl-N-(cyclopropylmethyl)-3,3-dimethyl-N-prop-2-ynyl-butanamide\nResult: C#CCN(CC1CC1)C(=O)C(N)C(C)(C)C"}", "/scratch/micpie/export/iupac_smiles/test_22-7.jsonl": "{"text":"Task: Please create the SMILES representation of a molecule given the traditional IUPAC name.\nIUPAC name: 2-bromo-5,7-dioxa-2,3-diazatricyclo[6.3.1.04,12]dodeca-1(12),3,8,10-tetraen-6-one\nResult: C=CC=CC=C6)OC=O)OC6=NN9Br"} {"text":"Task: Please generate the SMILES representation of a molecule given the traditional IUPAC name.\nIUPAC name: 1-(2-ketoindolin-5-yl)sulfonylnipecotaldehyde\nResult: [C][C][C][Branch2][Ring1][P][C][N][Branch1][Ring2][C][Ring1][=Branch1][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][Ring1][Branch2][C][=O]"}", "/scratch/micpie/export/iupac_smiles/valid_0-5.jsonl": "{"text":"The DeepSMILES of the chemical with CAS-like IUPAC 
name 1-[(4-bromophenyl)methyl]-N-[(Z)-1-(4-methylphenyl)ethylideneamino]-3-pyrazolecarboxamide is CC=CC=CC=C6))\/C=N\\NC=O)C=NNC=C5))CC=CC=CC=C6))Br))))))))))))\/C."} {"text":"The InChI of the chemical with IUAPC name in CAS-like style ethene;4-methoxy-4,6-dimethyloxane-2,5-diol is InChI=1S\/C8H16O4.C2H4\/c1-5-7(10)8(2,11-3)4-6(9)12-5;1-2\/h5-7,9-10H,4H2,1-3H3;1-2H2."}", "/scratch/micpie/export/iupac_smiles/train_17-0.jsonl": "{"text":"The traditional IUPAC name of the chemical with DeepSMILES CCOC=NC=CC=CC=C6N9CC=CC=CC=C6))C=CC=CC=C6C=NNN=N5))))))))))))))))))C=O)NC=CC=NC=C6 is 2-ethoxy-N-(4-pyridyl)-3-[4-[2-(2H-tetrazol-5-yl)phenyl]benzyl]benzimidazole-4-carboxamide."} {"text":"The traditional IUPAC name of the compound with canonical SMILES CCCC(CCC)c1cc(-c2ccccc2F)cc2cncnc12 is 6-(2-fluorophenyl)-8-(1-propylbutyl)quinazoline."}", "/scratch/micpie/export/iupac_smiles/train_9-8.jsonl": "{"text":"Task: Please give me the SMILES representation of a molecule based on the systematic IUPAC name.\nIUPAC name: 1-(tert-butylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,11,13-hexaenylideneamino)oxy-propan-2-ol\nResult: CC(C)(C)NCC(CON=C1C2=CC=CC=C2CCC3=CC=CC=C31)O"} {"text":"Task: Please give me the SMILES representation of a chemical given the systematic IUPAC name.\nIUPAC name: N-[(5-butyl-2,4-dimethyl-6-oxidanylidene-3,4-dihydropyridin-5-yl)methyl]-3-[cyclopentyl(methyl)amino]-2-ethenyl-benzamide\nResult: InChI=1S\/C27H39N3O2\/c1-6-8-16-27(19(3)17-20(4)29-26(27)32)18-28-25(31)23-14-11-15-24(22(23)7-2)30(5)21-12-9-10-13-21\/h7,11,14-15,19,21H,2,6,8-10,12-13,16-18H2,1,3-5H3,(H,28,31)"}", "/scratch/micpie/export/iupac_smiles/train_23-5.jsonl": "{"text":"The DeepSMILES of the chemical with CAS-like IUPAC name 1-(2-nitrophenyl)sulfonyl-3-piperidinecarboxaldehyde is CCCCNC6)S=O)=O)C=CC=CC=C6[N+]=O)[O-])))))))))))C=O."} {"text":"The SMILES of the chemical with IUAPC name in CAS-like style N-(1-benzimidazolyl)-4-(2,5-dimethylphenyl)-4-oxobutanamide is 
CC1=CC(=C(C=C1)C)C(=O)CCC(=O)NN2C=NC3=CC=CC=C32."}", "/scratch/micpie/export/iupac_smiles/valid_23-0.jsonl": "{"text":"The traditional IUPAC name of the compound with SMILES C1CC(CN(C1)S(=O)(=O)C2=CC(=CC=C2)Cl)C=O is 1-(3-chlorophenyl)sulfonylnipecotaldehyde."} {"text":"The traditional IUPAC name of the molecule with SMILES CC1=C(C(=C(C(=C1C)C)S(=O)(=O)NCCC(=O)NN2C=NC3=CC=CC=C32)C)C is N-(benzimidazol-1-yl)-3-[(2,3,4,5,6-pentamethylphenyl)sulfonylamino]propionamide."}", "/scratch/micpie/export/iupac_smiles/valid_0-4.jsonl": "{"text":"The SELFIES of the compound with systematic IUPAC name 1-[(4-bromophenyl)methyl]-N-[(Z)-1-(4-methylphenyl)ethylideneamino]pyrazole-3-carboxamide is [C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][\/C][=Branch2][Ring1][N][=N][\\N][C][=Branch1][C][=O][C][=N][N][Branch1][Branch1][C][=C][Ring1][Branch1][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Br][\/C]."} {"text":"The SMILES of the compound with systematic IUPAC name ethene;4-methoxy-4,6-dimethyl-oxane-2,5-diol is CC1C(C(CC(O1)O)(C)OC)O.C=C."}", "/scratch/micpie/export/iupac_smiles/train_27-3.jsonl": "{"text":"The SELFIES of the molecule with traditional IUPAC name 1-[(5R)-5-(4-chlorophenyl)-3-(4-fluorophenyl)-2-pyrazolin-1-yl]-2-[4-(2-fluorophenyl)piperazino]ethanone is [C][C][N][Branch2][Ring2][=N][C][C][N][Ring1][=Branch1][C][C][=Branch1][C][=O][N][C@H1][Branch2][Ring1][Ring1][C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][F][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl][C][=C][C][=C][C][=C][Ring1][=Branch1][F]."} {"text":"The SMILES of the compound with traditional IUPAC name 6-[4-[4-[(9,9-dimethylfluoren-2-yl)amino]phenyl]-2,3,5,6-tetrahydroxy-phenyl]benzene-1,2,3,4,5-pentol is CC1(C2=CC=CC=C2C3=C1C=C(C=C3)NC4=CC=C(C=C4)C5=C(C(=C(C(=C5O)O)C6=C(C(=C(C(=C6O)O)O)O)O)O)O)C."}", "/scratch/micpie/export/iupac_smiles/test_1-3.jsonl": "{"text":"The InChI of the compound with traditional IUPAC name 
(3R)-3-nitrocyclohexanol is InChI=1S\/C6H11NO3\/c8-6-3-1-2-5(4-6)7(9)10\/h5-6,8H,1-4H2\/t5-,6?\/m1\/s1."} {"text":"The canonical SMILES of the chemical with traditional IUPAC name (2S)-1-[[(2S)-2-hydroxy-3-(1H-indol-4-yloxy)propyl]-isopropyl-amino]-3-(1H-indol-4-yloxy)propan-2-ol is CC(C)N(C[C@H](O)COc1cccc2[nH]ccc12)C[C@H](O)COc1cccc2[nH]ccc12."}", "/scratch/micpie/export/iupac_smiles/valid_8-3.jsonl": "{"text":"The SMILES of the compound with traditional IUPAC name 3-[N-(2-naphthylsulfonyl)anilino]propionamide is C1=CC=C(C=C1)N(CCC(=O)N)S(=O)(=O)C2=CC3=CC=CC=C3C=C2."} {"text":"The InChI of the compound with traditional IUPAC name 1-(isopropylamino)-3-[(E)-tetralin-1-ylideneamino]oxy-propan-2-ol;hydrochloride is InChI=1S\/C16H24N2O2.ClH\/c1-12(2)17-10-14(19)11-20-18-16-9-5-7-13-6-3-4-8-15(13)16;\/h3-4,6,8,12,14,17,19H,5,7,9-11H2,1-2H3;1H\/b18-16+;."}", "/scratch/micpie/export/iupac_smiles/train_24-7.jsonl": "{"text":"Task: Please create the SMILES representation of a compound given the traditional IUPAC name.\nIUPAC name: N-(benzimidazol-1-yl)-2-[(2-phenoxyacetyl)amino]acetamide\nResult: [C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][O][C][C][=Branch1][C][=O][N][C][C][=Branch1][C][=O][N][N][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][=Branch2]"} {"text":"Task: Please create the SMILES representation of a chemical based on the traditional IUPAC name.\nIUPAC name: N-[(1S)-1-cyclopropyl-2,2,2-trifluoro-ethyl]-7-methyl-5-[4-(trifluoromethyl)-3-pyridyl]pyrazolo[1,5-a]pyrimidine-3-carboxamide\nResult: CC=CC=NC=CC=NN95)))C=O)N[C@@H]CCC3)))CF)F)F))))))))C=CC=CN=C6))))CF)F)F"}", "/scratch/micpie/export/iupac_smiles/test_5-3.jsonl": "{"text":"The canonical SMILES of the molecule with traditional IUPAC name 4-[1-(aminomethyl)-1-methyl-propyl]-3-ethyl-piperidin-4-ol is CCC1CNCCC1(O)C(C)(CC)CN."} {"text":"The InChI of the compound with traditional IUPAC name N-(2-aminoethyl)-2-[(4-methyl-3-phenyl-pentanoyl)amino]benzamide is 
InChI=1S\/C21H27N3O2\/c1-15(2)18(16-8-4-3-5-9-16)14-20(25)24-19-11-7-6-10-17(19)21(26)23-13-12-22\/h3-11,15,18H,12-14,22H2,1-2H3,(H,23,26)(H,24,25)."}", "/scratch/micpie/export/iupac_smiles/test_2-9.jsonl": "{"text":"Task: Please give me the SMILES representation of a molecule given the CAS-like IUPAC name.\nIUPAC name: (2R)-2-(methylamino)-1-(2-methylphenyl)-1-propanone\nResult: CC1=CC=CC=C1C(=O)[C@@H](C)NC"} {"text":"Task: Please generate the SMILES of a compound given the IUAPC name in CAS-like style.\nIUPAC name: 2-(6-bromo-2,3-dihydro-1,4-benzodioxin-7-yl)-4,5,6,7-tetrachloroisoindole-1,3-dione\nResult: [C][C][O][C][=C][Branch1][Ring2][O][Ring1][=Branch1][C][=C][Branch1][Branch2][C][=Branch1][Ring2][=C][Ring1][#Branch1][Br][N][C][=Branch1][C][=O][C][=C][Branch1][Branch1][C][Ring1][=Branch1][=O][C][=Branch1][=N][=C][Branch1][=Branch2][C][=Branch1][Branch1][=C][Ring1][Branch2][Cl][Cl][Cl][Cl]"}", "/scratch/micpie/export/iupac_smiles/valid_21-3.jsonl": "{"text":"The SMILES of the chemical with traditional IUPAC name benzoic acid [4-[(1S)-2-(azetidin-1-yl)-1-(cyclohexylmethyl)-2-keto-ethoxy]-2-[5-[2-[[3-[3,5-dihydroxy-4-[(2-ketochromen-3-yl)methoxy]-6-methylol-tetrahydropyran-2-yl]oxy-4-hydroxy-5-[(2-ketochromen-3-yl)methoxy]cyclohexanecarbonyl]amino]ethylcarbamoyl]-3-(4-phenyltriazol-1-yl)-2-(3,4,5-trihydroxy-6-methyl-tetrahydropyran-2-yl)oxy-cyclohexoxy]-5-hydroxy-6-methylol-tetrahydropyran-3-yl] ester is CC1C(C(C(C(O1)OC2C(CC(CC2OC3C(C(C(C(O3)CO)O)O[C@@H](CC4CCCCC4)C(=O)N5CCC5)OC(=O)C6=CC=CC=C6)C(=O)NCCNC(=O)C7CC(C(C(C7)OC8C(C(C(C(O8)CO)O)OCC9=CC1=CC=CC=C1OC9=O)O)O)OCC1=CC2=CC=CC=C2OC1=O)N1C=C(N=N1)C1=CC=CC=C1)O)O)O."} {"text":"The SELFIES of the molecule with traditional IUPAC name 3-fluoro-5-(1H-1,2,4-triazol-5-yliodanuidyl)-2-(2,2,2-trifluoroethoxy)pyridine is [C][=C][Branch2][Ring1][Ring2][C][=N][C][=Branch1][Branch1][=C][Ring1][=Branch1][F][O][C][C][Branch1][C][F][Branch1][C][F][F][I-1][C][=N][C][=N][N][Ring1][Branch1]."}", 
"/scratch/micpie/export/iupac_smiles/train_26-9.jsonl": "{"text":"Task: Please generate the SMILES of a chemical given the IUAPC name in CAS-like style.\nIUPAC name: 2-(4-chloro-3-fluorophenyl)-4-ethyl-5-thiazolecarboxylic acid\nResult: [C][C][C][=C][Branch2][Ring1][=Branch1][S][C][=Branch1][Ring2][=N][Ring1][Branch1][C][=C][C][=Branch1][=Branch2][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl][F][C][=Branch1][C][=O][O]"} {"text":"Task: Please give me the SMILES of a chemical given the IUAPC name in CAS-like style.\nIUPAC name: 1-[(3S)-3-(4-chlorophenyl)-5-(4-fluorophenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(2-fluorophenyl)-1-piperazinyl]ethanone\nResult: O=C(CN1CCN(c2ccccc2F)CC1)N1N=C(c2ccc(F)cc2)C[C@H]1c1ccc(Cl)cc1"}", "/scratch/micpie/export/iupac_smiles/valid_21-1.jsonl": "{"text":"The CAS-like IUPAC name of the chemical with canonical SMILES CC1OC(OC2C(OC3OC(CO)C(O)C(O[C@@H](CC4CCCCC4)C(=O)N4CCC4)C3OC(=O)c3ccccc3)CC(C(=O)NCCNC(=O)C3CC(OCc4cc5ccccc5oc4=O)C(O)C(OC4OC(CO)C(O)C(OCc5cc6ccccc6oc5=O)C4O)C3)CC2n2cc(-c3ccccc3)nn2)C(O)C(O)C1O is benzoic acid [4-[(2S)-1-(1-azetidinyl)-3-cyclohexyl-1-oxopropan-2-yl]oxy-2-[5-[[2-[[[3-[[3,5-dihydroxy-6-(hydroxymethyl)-4-[(2-oxo-1-benzopyran-3-yl)methoxy]-2-oxanyl]oxy]-4-hydroxy-5-[(2-oxo-1-benzopyran-3-yl)methoxy]cyclohexyl]-oxomethyl]amino]ethylamino]-oxomethyl]-3-(4-phenyl-1-triazolyl)-2-[(3,4,5-trihydroxy-6-methyl-2-oxanyl)oxy]cyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)-3-oxanyl] ester."} {"text":"The IUAPC name in CAS-like style of the compound with SMILES C1=C(C=NC(=C1F)OCC(F)(F)F)[I-]C2=NC=NN2 is 3-fluoro-5-(1H-1,2,4-triazol-5-yliodanuidyl)-2-(2,2,2-trifluoroethoxy)pyridine."}", "/scratch/micpie/export/iupac_smiles/valid_14-1.jsonl": "{"text":"The CAS-like IUPAC name of the chemical with DeepSMILES CCC)C)OC=O)NC[C@@H]CCC[C@@H]5NC=O)C=CC=CO5 is N-[[(1S,2S)-2-[[2-furanyl(oxo)methyl]amino]cyclopentyl]methyl]carbamic acid tert-butyl ester."} {"text":"The CAS-like IUPAC name of the chemical with SELFIES 
[C][C][Branch1][C][C][Branch1][C][C][O][C][=Branch1][C][=O][N][C][C][C][=C][C][=C][C][=Branch1][N][=C][C][=Branch1][#Branch1][=C][Ring1][=Branch1][O][Ring1][=Branch2][Cl][Br] is N-[2-(5-bromo-7-chloro-2-benzofuranyl)ethyl]carbamic acid tert-butyl ester."}", "/scratch/micpie/export/iupac_smiles/train_6-7.jsonl": "{"text":"Task: Please give me the SMILES of a compound given the traditional IUPAC name.\nIUPAC name: N-(2-aminoethyl)-2-[2-(tetrahydrofurfuryloxy)propanoylamino]benzamide\nResult: CC(C(=O)NC1=CC=CC=C1C(=O)NCCN)OCC2CCCO2"} {"text":"Task: Please generate the SMILES of a molecule given the traditional IUPAC name.\nIUPAC name: 3-amino-4-fluoro-N-[(1S)-1-phenylethyl]benzenesulfonamide\nResult: InChI=1S\/C14H15FN2O2S\/c1-10(11-5-3-2-4-6-11)17-20(18,19)12-7-8-13(15)14(16)9-12\/h2-10,17H,16H2,1H3\/t10-\/m0\/s1"}", "/scratch/micpie/export/iupac_smiles/test_11-6.jsonl": "{"text":"The SELFIES of the chemical with IUPAC name 3-(2,4-difluorophenoxy)-N,N-dimethyl-3-phenylpropan-1-amine;2,5-dinitrobenzoic acid is [C][N][Branch1][C][C][C][C][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][=C][Branch1][#Branch2][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][F][F].[C][=C][C][=Branch2][Ring1][C][=C][Branch1][#Branch2][C][=C][Ring1][=Branch1][N+1][=Branch1][C][=O][O-1][C][=Branch1][C][=O][O][N+1][=Branch1][C][=O][O-1]."} {"text":"The SELFIES of the chemical with IUPAC name 4-[7-(4-hydroxyphenyl)-9,9-dioctylfluoren-2-yl]benzaldehyde is [C][C][C][C][C][C][C][C][C][Branch2][Ring2][P][C][=C][Branch2][Ring1][Branch1][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=O][C][=C][Ring1][P][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][O][C][C][C][C][C][C][C][C]."}", "/scratch/micpie/export/iupac_smiles/train_9-0.jsonl": "{"text":"The traditional IUPAC name of the compound with InChI 
InChI=1S\/C22H28N2O2\/c1-22(2,3)23-14-18(25)15-26-24-21-19-10-6-4-8-16(19)12-13-17-9-5-7-11-20(17)21\/h4-11,18,23,25H,12-15H2,1-3H3 is 1-(tert-butylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,11,13-hexaenylideneamino)oxy-propan-2-ol."} {"text":"The traditional IUPAC name of the chemical with InChI InChI=1S\/C27H39N3O2\/c1-6-8-16-27(19(3)17-20(4)29-26(27)32)18-28-25(31)23-14-11-15-24(22(23)7-2)30(5)21-12-9-10-13-21\/h7,11,14-15,19,21H,2,6,8-10,12-13,16-18H2,1,3-5H3,(H,28,31) is N-[(5-butyl-6-keto-2,4-dimethyl-3,4-dihydropyridin-5-yl)methyl]-3-[cyclopentyl(methyl)amino]-2-vinyl-benzamide."}", "/scratch/micpie/export/iupac_smiles/train_16-8.jsonl": "{"text":"Task: Please give me the SMILES representation of a molecule given the systematic IUPAC name.\nIUPAC name: 5-chloranyl-4-(chloromethyl)-2-(2,3,5-trimethylphenoxy)pyridine\nResult: [C][C][=C][C][=Branch2][Ring1][N][=C][Branch2][Ring1][#Branch1][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][=N][C][=C][Branch1][=Branch2][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][Cl][Cl][C][C]"} {"text":"Task: Please create the SMILES of a compound based on the systematic IUPAC name.\nIUPAC name: propan-2-yl 2-ethoxy-1H-benzimidazole-4-carboxylate\nResult: InChI=1S\/C13H16N2O3\/c1-4-17-13-14-10-7-5-6-9(11(10)15-13)12(16)18-8(2)3\/h5-8H,4H2,1-3H3,(H,14,15)"}", "/scratch/micpie/export/iupac_smiles/train_0-5.jsonl": "{"text":"The SMILES of the compound with CAS-like IUPAC name N-[(Z)-(3-bromo-4,5-dimethoxyphenyl)methylideneamino]-1-[(4-nitrophenyl)methyl]-3-pyrazolecarboxamide is COC1=C(C(=CC(=C1)\/C=N\\NC(=O)C2=NN(C=C2)CC3=CC=C(C=C3)[N+](=O)[O-])Br)OC."} {"text":"The DeepSMILES of the chemical with CAS-like IUPAC name 1-[4-[3-(2-methyl-1-piperidinyl)propoxy]phenyl]ethanone is CCCCCCN6CCCOC=CC=CC=C6))C=O)C."}", "/scratch/micpie/export/iupac_smiles/train_24-2.jsonl": "{"text":"The IUPAC name of the chemical with canonical SMILES O=C(COc1ccccc1)NCC(=O)Nn1cnc2ccccc21 is 
N-(benzimidazol-1-yl)-2-[(2-phenoxyacetyl)amino]acetamide."} {"text":"The preferred IUPAC name of the compound with DeepSMILES CC=CC=NC=CC=NN95)))C=O)N[C@@H]CCC3)))CF)F)F))))))))C=CC=CN=C6))))CF)F)F is N-[(1S)-1-cyclopropyl-2,2,2-trifluoroethyl]-7-methyl-5-[4-(trifluoromethyl)pyridin-3-yl]pyrazolo[1,5-a]pyrimidine-3-carboxamide."}", "/scratch/micpie/export/iupac_smiles/train_8-4.jsonl": "{"text":"The DeepSMILES of the chemical with systematic IUPAC name 3-[(2-methyl-5-nitro-phenyl)sulfonyl-phenyl-amino]propanamide is CC=CC=CC=C6))[N+]=O)[O-]))))S=O)=O)NCCC=O)N))))C=CC=CC=C6."} {"text":"The InChI of the chemical with systematic IUPAC name 1-(propan-2-ylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,11,13-hexaenylideneamino)oxy-propan-2-ol;hydrochloride is InChI=1S\/C21H26N2O2.ClH\/c1-15(2)22-13-18(24)14-25-23-21-19-9-5-3-7-16(19)11-12-17-8-4-6-10-20(17)21;\/h3-10,15,18,22,24H,11-14H2,1-2H3;1H."}", "/scratch/micpie/export/iupac_smiles/train_6-5.jsonl": "{"text":"The SELFIES of the compound with CAS-like IUPAC name N-(2-aminoethyl)-2-[[1-oxo-2-(2-oxolanylmethoxy)propyl]amino]benzamide is [C][C][Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][N][O][C][C][C][C][C][O][Ring1][Branch1]."} {"text":"The canonical SMILES of the molecule with CAS-like IUPAC name 3-amino-4-fluoro-N-[(1S)-1-phenylethyl]benzenesulfonamide is C[C@H](NS(=O)(=O)c1ccc(F)c(N)c1)c1ccccc1."}", "/scratch/micpie/export/iupac_smiles/test_17-9.jsonl": "{"text":"Task: Please create the SMILES representation of a compound given the CAS-like IUPAC name.\nIUPAC name: 2-ethoxy-1H-benzimidazole-4-carboxylic acid [4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl ester\nResult: [C][C][O][C][=N][C][=C][Branch1][#Branch2][C][=C][C][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][=Branch1][C][=O][O][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][N][=N][Ring1][Branch1]"} {"text":"Task: Please 
give me the SMILES of a molecule given the CAS-like IUPAC name.\nIUPAC name: 2-[4-[2-[3-[(2E,4Z)-hexa-2,4-dien-3-yl]-3-methyl-5-(3-methyl-1-cyclohexa-1,5-dienyl)-1-cyclohexa-1,5-dienyl]-3-iodo-10-[4-(2-pyridinyl)phenyl]-3,4-dihydroanthracen-9-yl]phenyl]pyridine\nResult: C\/C=C\\C=C\/C))\\CCC=CC=C6)C=CC=CC=CC=CC=C6C=C%10CC%14I))))C=CC=CC=C6))C=CC=CC=N6)))))))))))))))))C=CC=CC=C6))C=CC=CC=N6))))))))))))))))C=CCCC=C6)))C))))))C"}", "/scratch/micpie/export/iupac_smiles/valid_9-2.jsonl": "{"text":"The IUPAC name of the compound with DeepSMILES CCC)NCCCON=CC=CC=CC=C6C=CC=CC=CC=C6%15))))))))))))))))))O is 1-(propan-2-ylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,9,11,13-heptaenylideneamino)oxypropan-2-ol."} {"text":"The preferred IUPAC name of the molecule with canonical SMILES Cc1c(C(=O)NCC2C(=O)NC(C)CC2(C)C2CCCCCC2)cc(Cl)cc1N(C)C1CCC1 is 5-chloro-3-[cyclobutyl(methyl)amino]-N-[(4-cycloheptyl-4,6-dimethyl-2-oxopiperidin-3-yl)methyl]-2-methylbenzamide."}", "/scratch/micpie/export/iupac_smiles/train_12-6.jsonl": "{"text":"The SMILES of the compound with IUPAC name N-(3-dibenzofuran-1-ylphenyl)-9,9-dimethyl-N-(3-phenylphenyl)fluoren-2-amine is CC1(C2=CC=CC=C2C3=C1C=C(C=C3)N(C4=CC=CC(=C4)C5=CC=CC=C5)C6=CC=CC(=C6)C7=C8C9=CC=CC=C9OC8=CC=C7)C."} {"text":"The DeepSMILES of the molecule with IUPAC name 2-[(2-amino-3,3-dimethylbutanoyl)amino]pent-4-enoic acid is CCC)C)CC=O)NCCC=C)))C=O)O)))))N."}", "/scratch/micpie/export/iupac_smiles/test_27-9.jsonl": "{"text":"Task: Please give me the SMILES of a compound given the IUAPC name in CAS-like style.\nIUPAC name: 1-[(3S)-3-(4-chlorophenyl)-5-(3,4-dimethylphenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(2-fluorophenyl)-1-piperazinyl]ethanone\nResult: Cc1ccc(C2=NN(C(=O)CN3CCN(c4ccccc4F)CC3)[C@H](c3ccc(Cl)cc3)C2)cc1C"} {"text":"Task: Please generate the SMILES of a compound given the IUAPC name in CAS-like style.\nIUPAC name: 
N-[(1E)-1-(4-ethenyl-2,3,5,5-tetramethyl-1-cyclopenta-1,3-dienyl)buta-1,3-dien-2-yl]-5-phenyl-2-thiophenamine\nResult: CC1=C(C(C(=C1C)\/C=C(\\C=C)\/NC2=CC=C(S2)C3=CC=CC=C3)(C)C)C=C"}", "/scratch/micpie/export/iupac_smiles/valid_20-5.jsonl": "{"text":"The SMILES of the molecule with IUAPC name in CAS-like style 4-acetyloxy-7-(2-hydroxy-2-oct-2-enyl-5-oxo-1-cyclopent-3-enylidene)-5-heptenoic acid methyl ester is CCCCCC=CCC1(C=CC(=O)C1=CC=CC(CCC(=O)OC)OC(=O)C)O."} {"text":"The SELFIES of the molecule with IUAPC name in CAS-like style 2-[2-[2-(2-oxopropylamino)ethylamino]ethylamino]acetic acid is [C][C][=Branch1][C][=O][C][N][C][C][N][C][C][N][C][C][=Branch1][C][=O][O]."}", "/scratch/micpie/export/iupac_smiles/train_4-5.jsonl": "{"text":"The SELFIES of the chemical with CAS-like IUPAC name (1S,2R,4aS,5R,6S,8aS)-6-(1,3-dithiolan-2-yl)-5-[(E)-2-[5-(3-fluorophenyl)-2-pyridinyl]ethenyl]-2-hydroxy-1,4a-dimethyl-2,3,4,5,6,7,8,8a-octahydronaphthalene-1-carboxaldehyde is [C][C@@][C][C][C@H1][Branch2][Branch1][Ring2][C@@][Branch2][Ring2][O][C@H1][Ring1][=Branch1][C][C][C@@H1][Branch2][Ring1][O][C@H1][Ring1][#Branch2][\/C][=C][\/C][=N][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][F][C][S][C][C][S][Ring1][Branch1][Branch1][C][C][C][=O][O]."} {"text":"The canonical SMILES of the compound with IUAPC name in CAS-like style 4-(1-aminobutan-2-yl)-3-ethyl-4-piperidinol is CCC(CN)C1(O)CCNCC1CC."}", "/scratch/micpie/export/iupac_smiles/train_11-8.jsonl": "{"text":"Task: Please give me the SMILES of a chemical given the systematic IUPAC name.\nIUPAC name: [(14R)-4,4,10,13,14-pentamethyl-17-[6-methyl-5,6-bis(oxidanyl)heptan-2-yl]-2,3,5,6,7,11,12,15,16,17-decahydro-1H-cyclopenta[a]phenanthren-3-yl] ethanoate\nResult: CC(=O)OC1CCC2(C)C3=C(CCC2C1(C)C)[C@]1(C)CCC(C(C)CCC(O)C(C)(C)O)C1(C)CC3"} {"text":"Task: Please give me the SMILES representation of a compound given the systematic IUPAC name.\nIUPAC name: 
N-(3-dibenzofuran-1-ylphenyl)-9,9-dimethyl-N-(4-phenylphenyl)fluoren-2-amine\nResult: InChI=1S\/C45H33NO\/c1-45(2)40-19-8-6-16-37(40)38-27-26-35(29-41(38)45)46(33-24-22-31(23-25-33)30-12-4-3-5-13-30)34-15-10-14-32(28-34)36-18-11-21-43-44(36)39-17-7-9-20-42(39)47-43\/h3-29H,1-2H3"}", "/scratch/micpie/export/iupac_smiles/test_14-9.jsonl": "{"text":"Task: Please generate the SMILES of a chemical given the CAS-like IUPAC name.\nIUPAC name: (3S)-5-oxo-1-(2,2,2-trifluoroethyl)-3-pyrrolidinecarboxylic acid (1,3,5-trimethyl-4-pyrazolyl) ester\nResult: InChI=1S\/C13H16F3N3O3\/c1-7-11(8(2)18(3)17-7)22-12(21)9-4-10(20)19(5-9)6-13(14,15)16\/h9H,4-6H2,1-3H3\/t9-\/m0\/s1"} {"text":"Task: Please give me the SMILES of a compound given the IUAPC name in CAS-like style.\nIUPAC name: 4-bromo-2-iodo-5-(trifluoromethyl)phenol\nResult: [C][=C][Branch1][=C][C][=Branch1][#Branch2][=C][C][=Branch1][Branch1][=C][Ring1][=Branch1][O][I][Br][C][Branch1][C][F][Branch1][C][F][F]"}", "/scratch/micpie/export/iupac_smiles/valid_6-7.jsonl": "{"text":"Task: Please create the SMILES of a molecule given the traditional IUPAC name.\nIUPAC name: N-(2-aminoethyl)-2-(5-methoxypentanoylamino)benzamide\nResult: COCCCCC(=O)Nc1ccccc1C(=O)NCCN"} {"text":"Task: Please create the SMILES of a chemical given the traditional IUPAC name.\nIUPAC name: (3-amino-4-methoxy-benzyl)-benzyl-isopropyl-amine\nResult: COc1ccc(CN(Cc2ccccc2)C(C)C)cc1N"}", "/scratch/micpie/export/iupac_smiles/train_23-7.jsonl": "{"text":"Task: Please generate the SMILES representation of a compound based on the traditional IUPAC name.\nIUPAC name: 1-(2-nitrophenyl)sulfonylnipecotaldehyde\nResult: C1CC(CN(C1)S(=O)(=O)C2=CC=CC=C2[N+](=O)[O-])C=O"} {"text":"Task: Please create the SMILES representation of a molecule given the traditional IUPAC name.\nIUPAC name: N-(benzimidazol-1-yl)-4-(2,5-dimethylphenyl)-4-keto-butyramide\nResult: CC=CC=CC=C6))C))C=O)CCC=O)NNC=NC=CC=CC=C69"}", "/scratch/micpie/export/iupac_smiles/train_8-8.jsonl": "{"text":"Task: 
Please generate the SMILES representation of a chemical given the systematic IUPAC name.\nIUPAC name: 3-[(2-methyl-5-nitro-phenyl)sulfonyl-phenyl-amino]propanamide\nResult: [C][C][=C][Branch1][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N+1][=Branch1][C][=O][O-1][S][=Branch1][C][=O][=Branch1][C][=O][N][Branch1][Branch2][C][C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1]"} {"text":"Task: Please create the SMILES of a chemical based on the systematic IUPAC name.\nIUPAC name: 1-(propan-2-ylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,11,13-hexaenylideneamino)oxy-propan-2-ol;hydrochloride\nResult: [C][C][Branch1][C][C][N][C][C][Branch2][Ring1][=Branch2][C][O][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][#C][O].[Cl]"}", "/scratch/micpie/export/iupac_smiles/test_1-0.jsonl": "{"text":"The traditional IUPAC name of the compound with DeepSMILES CC[C@H]CCC6)O)))[N+]=O)[O-] is (3R)-3-nitrocyclohexanol."} {"text":"The traditional IUPAC name of the molecule with DeepSMILES CCC)NC[C@@H]COC=CC=CC=C6C=CN5)))))))))))O)))C[C@@H]COC=CC=CC=C6C=CN5)))))))))))O is (2S)-1-[[(2S)-2-hydroxy-3-(1H-indol-4-yloxy)propyl]-isopropyl-amino]-3-(1H-indol-4-yloxy)propan-2-ol."}", "/scratch/micpie/export/iupac_smiles/test_27-6.jsonl": "{"text":"The InChI of the chemical with IUPAC name 1-[(3S)-3-(4-chlorophenyl)-5-(3,4-dimethylphenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(2-fluorophenyl)piperazin-1-yl]ethanone is InChI=1S\/C29H30ClFN4O\/c1-20-7-8-23(17-21(20)2)26-18-28(22-9-11-24(30)12-10-22)35(32-26)29(36)19-33-13-15-34(16-14-33)27-6-4-3-5-25(27)31\/h3-12,17,28H,13-16,18-19H2,1-2H3\/t28-\/m0\/s1."} {"text":"The SELFIES of the compound with IUPAC name N-[(1E)-1-(4-ethenyl-2,3,5,5-tetramethylcyclopenta-1,3-dien-1-yl)buta-1,3-dien-2-yl]-5-phenylthiophen-2-amine is 
[C][C][=C][Branch2][Ring2][Branch2][C][Branch2][Ring1][S][C][=Branch1][Branch1][=C][Ring1][Branch1][C][\/C][=C][Branch1][Ring1][\\C][=C][\/N][C][=C][C][=C][Branch1][Ring2][S][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][Branch1][C][C][C][C][=C]."}", "/scratch/micpie/export/iupac_smiles/valid_26-4.jsonl": "{"text":"The SMILES of the chemical with systematic IUPAC name 2-[4-[bis(fluoranyl)methyl]phenyl]-4-ethyl-1,3-thiazole-5-carboxylic acid is CCC1=C(SC(=N1)C2=CC=C(C=C2)C(F)F)C(=O)O."} {"text":"The canonical SMILES of the molecule with systematic IUPAC name 1-[(3R)-3-(2-chlorophenyl)-5-(3-methoxyphenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(phenylmethyl)piperazin-1-yl]ethanone is COc1cccc(C2=NN(C(=O)CN3CCN(Cc4ccccc4)CC3)[C@@H](c3ccccc3Cl)C2)c1."}", "/scratch/micpie/export/iupac_smiles/train_21-5.jsonl": "{"text":"The SMILES of the chemical with IUAPC name in CAS-like style (2S)-2-[[3-benzoyloxy-2-[3-ethyl-5-[[2-[[[3-[4-(3-fluorophenyl)-1-triazolyl]-5-[[4-[4-(3-fluorophenyl)-1-triazolyl]-3,5-dihydroxy-6-(hydroxymethyl)-2-oxanyl]oxy]-4-hydroxycyclohexyl]-oxomethyl]amino]ethylamino]-oxomethyl]-2-[(3,4,5-trihydroxy-6-methyl-2-oxanyl)oxy]cyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)-4-oxanyl]oxy]-3-cyclohexylpropanoic acid is CCC1CC(CC(C1OC2C(C(C(C(O2)C)O)O)O)OC3C(C(C(C(O3)CO)O)O[C@@H](CC4CCCCC4)C(=O)O)OC(=O)C5=CC=CC=C5)C(=O)NCCNC(=O)C6CC(C(C(C6)OC7C(C(C(C(O7)CO)O)N8C=C(N=N8)C9=CC(=CC=C9)F)O)O)N1C=C(N=N1)C1=CC(=CC=C1)F."} {"text":"The canonical SMILES of the chemical with CAS-like IUPAC name 3-[4-(trifluoromethoxy)phenyl]pyridazine is FC(F)(F)Oc1ccc(-c2cccnn2)cc1."}", "/scratch/micpie/export/iupac_smiles/valid_7-9.jsonl": "{"text":"Task: Please give me the SMILES representation of a molecule based on the IUAPC name in CAS-like style.\nIUPAC name: 2-methyl-N-[2-[[4-(trifluoromethylthio)phenyl]methylamino]ethyl]propanamide\nResult: CCC)C=O)NCCNCC=CC=CC=C6))SCF)F)F"} {"text":"Task: Please create the SMILES representation of a compound given the IUAPC name in CAS-like 
style.\nIUPAC name: 3-(N-thiophen-2-ylsulfonylanilino)propanamide\nResult: C1=CC=C(C=C1)N(CCC(=O)N)S(=O)(=O)C2=CC=CS2"}", "/scratch/micpie/export/iupac_smiles/train_24-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with DeepSMILES C=CC=CC=C6))OCC=O)NCC=O)NNC=NC=CC=CC=C69 is N-(benzimidazol-1-yl)-2-[(2-phenoxyacetyl)amino]acetamide."} {"text":"The traditional IUPAC name of the compound with canonical SMILES Cc1cc(-c2cnccc2C(F)(F)F)nc2c(C(=O)N[C@@H](C3CC3)C(F)(F)F)cnn12 is N-[(1S)-1-cyclopropyl-2,2,2-trifluoro-ethyl]-7-methyl-5-[4-(trifluoromethyl)-3-pyridyl]pyrazolo[1,5-a]pyrimidine-3-carboxamide."}", "/scratch/micpie/export/iupac_smiles/valid_12-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the compound with SMILES CCCCCCCCOC(CCC(=O)OCCCCCCN(CCCCCCCC(=O)OCC)CCO)OCCCCCCCC is 8-[6-(4,4-dioctoxy-1-oxobutoxy)hexyl-(2-hydroxyethyl)amino]octanoic acid ethyl ester."} {"text":"The IUAPC name in CAS-like style of the molecule with SELFIES [C][C][Branch1][C][C][Branch1][C][C][C][Branch2][Ring1][C][C][=Branch1][C][=O][N][C][C][C][C][C][Branch1][C][F][Branch1][C][F][F][N] is 2-amino-3,3-dimethyl-N-(5,5,5-trifluoropentyl)butanamide."}", "/scratch/micpie/export/iupac_smiles/train_17-3.jsonl": "{"text":"The DeepSMILES of the compound with traditional IUPAC name 2-ethoxy-N-(4-pyridyl)-3-[4-[2-(2H-tetrazol-5-yl)phenyl]benzyl]benzimidazole-4-carboxamide is CCOC=NC=CC=CC=C6N9CC=CC=CC=C6))C=CC=CC=C6C=NNN=N5))))))))))))))))))C=O)NC=CC=NC=C6."} {"text":"The DeepSMILES of the molecule with traditional IUPAC name 6-(2-fluorophenyl)-8-(1-propylbutyl)quinazoline is CCCCCCC)))C=CC=CC=CN=CN=C%106)))))))C=CC=CC=C6F."}", "/scratch/micpie/export/iupac_smiles/test_19-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with InChI InChI=1S\/C17H26ClN3O.ClH\/c1-19-10-6-14-7-11-21(12-8-14)13-9-17(22)20-16-4-2-15(18)3-5-16;\/h2-5,14,19H,6-13H2,1H3,(H,20,22);1H is N-(4-chlorophenyl)-3-[4-[2-(methylamino)ethyl]piperidino]propionamide;hydrochloride."} {"text":"The 
traditional IUPAC name of the chemical with SELFIES [C][C@H1][C][=C][C][=C][C@][Ring1][=Branch1][C][C][C@H1][Branch2][#Branch2][#Branch1][C][C][=Branch1][C][=O][C][C@H1][Branch2][#Branch1][#C][N][C][=C][C][=Branch2][Ring1][S][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][=Branch2][C][C@@][Branch1][Branch1][C@@H1][Ring1][=Branch1][O][C][C][C][Branch1][Ring2][C][Ring1][#Branch1][C][C][C][C][Ring1][=Branch1][N][Branch2][Ring2][=C][C][=C][C][=Branch1][=Branch1][=C][C][N][Ring1][=Branch1][C@H1][Branch2][Ring1][Branch1][C@@H1][Branch1][#Branch1][C][#C][C][Ring2][Ring2][O][C][=C][N][C][=C][Ring1][Branch1][C][Ring2][Ring2][S][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][O][C][C][C][=Branch1][C][=O][C][C][=C][C][=Branch2][Ring1][Branch2][=C][Branch1][=Branch2][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][O][O][C][=Branch1][C][=O][C] is nan."}", "/scratch/micpie/export/iupac_smiles/train_22-6.jsonl": "{"text":"The InChI of the molecule with preferred IUPAC name 6-methyl-5-(5-methyl-1,3,4-oxadiazol-2-yl)-5-methylsulfonyl-4-(trifluoromethyl)cyclohexa-1,3-diene-1-carboxamide is InChI=1S\/C13H14F3N3O4S\/c1-6-8(10(17)20)4-5-9(13(14,15)16)12(6,24(3,21)22)11-19-18-7(2)23-11\/h4-6H,1-3H3,(H2,17,20)."} {"text":"The DeepSMILES of the compound with IUPAC name 1-propylsulfonylpiperidine-3-carbaldehyde is CCCS=O)=O)NCCCCC6)C=O."}", "/scratch/micpie/export/iupac_smiles/test_21-9.jsonl": "{"text":"Task: Please create the SMILES representation of a compound based on the CAS-like IUPAC name.\nIUPAC name: benzoic acid [4-[(2S)-1-(1-azetidinyl)-3-cyclohexyl-1-oxopropan-2-yl]oxy-2-[5-[[2-[[[3-[[3,5-dihydroxy-6-(hydroxymethyl)-4-[(2-oxo-1-benzopyran-3-yl)methoxy]-2-oxanyl]oxy]-4-hydroxy-5-[(2-oxo-1-benzopyran-3-yl)methoxy]cyclohexyl]-oxomethyl]amino]ethylamino]-oxomethyl]-3-ethyl-2-[(3,4,5-trihydroxy-6-methyl-2-oxanyl)oxy]cyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)-3-oxanyl] ester\nResult: 
CCC1CC(CC(C1OC2C(C(C(C(O2)C)O)O)O)OC3C(C(C(C(O3)CO)O)O[C@@H](CC4CCCCC4)C(=O)N5CCC5)OC(=O)C6=CC=CC=C6)C(=O)NCCNC(=O)C7CC(C(C(C7)OC8C(C(C(C(O8)CO)O)OCC9=CC1=CC=CC=C1OC9=O)O)O)OCC1=CC2=CC=CC=C2OC1=O"} {"text":"Task: Please create the SMILES representation of a molecule given the CAS-like IUPAC name.\nIUPAC name: 2-(methoxymethyl)-3-[4-(trifluoromethoxy)phenyl]pyrazine\nResult: InChI=1S\/C13H11F3N2O2\/c1-19-8-11-12(18-7-6-17-11)9-2-4-10(5-3-9)20-13(14,15)16\/h2-7H,8H2,1H3"}", "/scratch/micpie/export/iupac_smiles/test_24-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the chemical with SELFIES [C][=C][Branch1][O][C][=C][Branch1][=Branch1][C][=C][Ring1][=Branch1][F][F][C][=Branch1][C][=O][N][N][C][=N][N][C][Ring1][Branch1][=S] is 3,5-difluoro-N-(5-sulfanylidene-1H-1,2,4-triazol-4-yl)benzamide."} {"text":"The CAS-like IUPAC name of the compound with SELFIES [C][C][=C][C][=Branch2][Ring1][=C][=N][C][=C][Branch1][Branch2][C][=N][N][Ring1][=Branch2][Ring1][Branch1][C][=Branch1][C][=O][N][C][Branch1][C][C][C][Branch1][C][F][Branch1][C][F][F][C][=C][N][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][Branch1][C][F][Branch1][C][F][F] is 7-methyl-5-[6-(trifluoromethyl)-3-pyridinyl]-N-(1,1,1-trifluoropropan-2-yl)-3-pyrazolo[1,5-a]pyrimidinecarboxamide."}", "/scratch/micpie/export/iupac_smiles/train_20-3.jsonl": "{"text":"The InChI of the compound with traditional IUPAC name sulfuric acid [4-hydroxy-6-[(6-isohexyl-4,8-diketo-2,6,13,17,17-pentamethyl-7-oxapentacyclo[10.8.0.02,9.05,9.013,18]eicos-11-en-16-yl)oxy]-5-(3,4,5-trihydroxy-6-methyl-tetrahydropyran-2-yl)oxy-tetrahydropyran-3-yl] ester is InChI=1S\/C41H64O15S\/c1-20(2)10-9-15-40(8)33-24(42)18-39(7)23-11-12-26-37(4,5)27(14-16-38(26,6)22(23)13-17-41(33,39)36(47)55-40)53-35-32(29(44)25(19-51-35)56-57(48,49)50)54-34-31(46)30(45)28(43)21(3)52-34\/h13,20-21,23,25-35,43-46H,9-12,14-19H2,1-8H3,(H,48,49,50)."} {"text":"The SELFIES of the compound with traditional IUPAC name benzoic acid 
[4-[(1S)-1-(azetidine-1-carbonyl)butoxy]-2-[5-[2-[[3-[4-(3-fluorophenyl)triazol-1-yl]-5-[4-[4-(3-fluorophenyl)triazol-1-yl]-3,5-dihydroxy-6-methylol-tetrahydropyran-2-yl]oxy-4-hydroxy-cyclohexanecarbonyl]amino]ethylcarbamoyl]-3-methyl-2-(3,4,5-trihydroxy-6-methyl-tetrahydropyran-2-yl)oxy-cyclohexoxy]-5-hydroxy-6-methylol-tetrahydropyran-3-yl] ester is [C][C][C][C@@H1][Branch1][O][C][=Branch1][C][=O][N][C][C][C][Ring1][Ring2][O][C][C][Branch2][O][O][C][Branch2][O][Branch1][O][C][Branch1][P][C][Ring1][=Branch1][O][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][C][C][Branch2][Ring1][#C][C][C][Branch2][Ring1][=Branch2][C][Ring1][=Branch1][O][C][C][Branch1][S][C][Branch1][N][C][Branch1][Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][C][C][=Branch1][C][=O][N][C][C][N][C][=Branch1][C][=O][C][C][C][Branch2][Branch1][=Branch1][C][Branch2][Ring2][P][C][Branch1][Ring2][C][Ring1][=Branch1][O][C][C][Branch2][Ring2][Ring2][C][Branch1][=N][C][Branch1][=Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][N][C][=C][Branch1][Branch1][N][=N][Ring1][Branch1][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][F][O][O][N][C][=C][Branch1][Branch1][N][=N][Ring1][Branch1][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][F][C][O][O]."}", "/scratch/micpie/export/iupac_smiles/train_7-8.jsonl": "{"text":"Task: Please generate the SMILES representation of a compound based on the systematic IUPAC name.\nIUPAC name: methyl 2-[(5-azanyl-2-chloranyl-phenyl)sulfonylamino]ethanoate\nResult: COC(=O)CNS(=O)(=O)C1=C(C=CC(=C1)N)Cl"} {"text":"Task: Please give me the SMILES representation of a chemical based on the systematic IUPAC name.\nIUPAC name: 3-[(2-nitrophenyl)sulfonyl-phenyl-amino]propanamide\nResult: C1=CC=C(C=C1)N(CCC(=O)N)S(=O)(=O)C2=CC=CC=C2[N+](=O)[O-]"}", "/scratch/micpie/export/iupac_smiles/train_21-4.jsonl": "{"text":"The canonical SMILES of the compound with systematic IUPAC name 
(2S)-3-cyclohexyl-2-[2-[3-ethyl-5-[2-[[3-[4-(3-fluorophenyl)-1,2,3-triazol-1-yl]-5-[4-[4-(3-fluorophenyl)-1,2,3-triazol-1-yl]-6-(hydroxymethyl)-3,5-bis(oxidanyl)oxan-2-yl]oxy-4-oxidanyl-cyclohexyl]carbonylamino]ethylcarbamoyl]-2-[6-methyl-3,4,5-tris(oxidanyl)oxan-2-yl]oxy-cyclohexyl]oxy-6-(hydroxymethyl)-5-oxidanyl-3-(phenylcarbonyloxy)oxan-4-yl]oxy-propanoic acid is CCC1CC(C(=O)NCCNC(=O)C2CC(OC3OC(CO)C(O)C(n4cc(-c5cccc(F)c5)nn4)C3O)C(O)C(n3cc(-c4cccc(F)c4)nn3)C2)CC(OC2OC(CO)C(O)C(O[C@@H](CC3CCCCC3)C(=O)O)C2OC(=O)c2ccccc2)C1OC1OC(C)C(O)C(O)C1O."} {"text":"The canonical SMILES of the molecule with systematic IUPAC name 3-[4-(trifluoromethyloxy)phenyl]pyridazine is FC(F)(F)Oc1ccc(-c2cccnn2)cc1."}", "/scratch/micpie/export/iupac_smiles/valid_1-9.jsonl": "{"text":"Task: Please generate the SMILES representation of a chemical given the IUAPC name in CAS-like style.\nIUPAC name: ethene;N-ethyl-N-methyl-2-propen-1-amine\nResult: [C][C][N][Branch1][C][C][C][C][=C].[C][=C]"} {"text":"Task: Please generate the SMILES representation of a compound based on the IUAPC name in CAS-like style.\nIUPAC name: (2R)-1-[[(2S)-2-hydroxy-3-(1H-indol-4-yloxy)propyl]-propan-2-ylamino]-3-(1H-indol-4-yloxy)-2-propanol\nResult: InChI=1S\/C25H31N3O4\/c1-17(2)28(13-18(29)15-31-24-7-3-5-22-20(24)9-11-26-22)14-19(30)16-32-25-8-4-6-23-21(25)10-12-27-23\/h3-12,17-19,26-27,29-30H,13-16H2,1-2H3\/t18-,19+"}", "/scratch/micpie/export/iupac_smiles/test_11-5.jsonl": "{"text":"The SMILES of the chemical with CAS-like IUPAC name 3-(2,4-difluorophenoxy)-N,N-dimethyl-3-phenyl-1-propanamine;2,5-dinitrobenzoic acid is CN(C)CCC(C1=CC=CC=C1)OC2=C(C=C(C=C2)F)F.C1=CC(=C(C=C1[N+](=O)[O-])C(=O)O)[N+](=O)[O-]."} {"text":"The canonical SMILES of the molecule with IUAPC name in CAS-like style 4-[7-(4-hydroxyphenyl)-9,9-dioctyl-2-fluorenyl]benzaldehyde is CCCCCCCCC1(CCCCCCCC)c2cc(-c3ccc(O)cc3)ccc2-c2ccc(-c3ccc(C=O)cc3)cc21."}", "/scratch/micpie/export/iupac_smiles/test_10-6.jsonl": "{"text":"The InChI of the molecule 
with preferred IUPAC name 1-[5-chloro-2-methyl-3-[piperidin-4-yl(prop-2-enyl)amino]phenyl]ethanol is InChI=1S\/C17H25ClN2O\/c1-4-9-20(15-5-7-19-8-6-15)17-11-14(18)10-16(12(17)2)13(3)21\/h4,10-11,13,15,19,21H,1,5-9H2,2-3H3."} {"text":"The InChI of the molecule with IUPAC name 2-[4-[3-(3-fluoro-6-methoxyquinolin-4-yl)propyl]-1-(2-thiophen-3-ylsulfanylethyl)piperidin-3-yl]acetic acid is InChI=1S\/C26H31FN2O3S2\/c1-32-20-5-6-25-23(14-20)22(24(27)15-28-25)4-2-3-18-7-9-29(16-19(18)13-26(30)31)10-12-34-21-8-11-33-17-21\/h5-6,8,11,14-15,17-19H,2-4,7,9-10,12-13,16H2,1H3,(H,30,31)."}", "/scratch/micpie/export/iupac_smiles/train_10-7.jsonl": "{"text":"Task: Please generate the SMILES of a compound given the traditional IUPAC name.\nIUPAC name: N-[4-[5-chloro-3-(dihydroxymethyl)-N,2-diethyl-anilino]cyclohexyl]carbamic acid tert-butyl ester\nResult: CCc1c(C(O)O)cc(Cl)cc1N(CC)C1CCC(NC(=O)OC(C)(C)C)CC1"} {"text":"Task: Please create the SMILES of a molecule based on the traditional IUPAC name.\nIUPAC name: acetic acid [(3aS)-3a-fluorocarbonyl-1-isopropyl-5a,5b,8,8,11a-pentamethyl-1,2,3,4,5,6,7,7a,9,10,11,11b,12,13,13a,13b-hexadecahydrocyclopenta[a]chrysen-9-yl] ester\nResult: CC(=O)OC1CCC2(C)C(CCC3(C)C2CCC2C4C(C(C)C)CC[C@]4(C(=O)F)CCC23C)C1(C)C"}", "/scratch/micpie/export/iupac_smiles/test_20-6.jsonl": "{"text":"The InChI of the molecule with IUPAC name phenyl (2S,8S,12S)-4,10-bis(2-ethoxyphenyl)-3,5,9,11-tetraoxo-4,10-diazatetracyclo[5.5.2.02,6.08,12]tetradec-13-ene-13-carboxylate is InChI=1S\/C35H30N2O8\/c1-3-43-24-16-10-8-14-22(24)36-31(38)27-20-18-21(35(42)45-19-12-6-5-7-13-19)26(29(27)33(36)40)30-28(20)32(39)37(34(30)41)23-15-9-11-17-25(23)44-4-2\/h5-18,20,26-30H,3-4H2,1-2H3\/t20?,26?,27-,28?,29-,30-\/m0\/s1."} {"text":"The SMILES of the compound with preferred IUPAC name 
(2S)-2-[3-benzoyloxy-2-[3-[4-(3-fluorophenyl)triazol-1-yl]-5-[2-[4-[4-(3-fluorophenyl)triazol-1-yl]-3,5-dihydroxy-6-(hydroxymethyl)oxan-2-yl]oxyethylcarbamoyl]-2-(3,4,5-trihydroxy-6-methyloxan-2-yl)oxycyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)oxan-4-yl]oxy-3-cyclohexylpropanoic acid is CC1C(C(C(C(O1)OC2C(CC(CC2OC3C(C(C(C(O3)CO)O)O[C@@H](CC4CCCCC4)C(=O)O)OC(=O)C5=CC=CC=C5)C(=O)NCCOC6C(C(C(C(O6)CO)O)N7C=C(N=N7)C8=CC(=CC=C8)F)O)N9C=C(N=N9)C1=CC(=CC=C1)F)O)O)O."}", "/scratch/micpie/export/iupac_smiles/train_18-5.jsonl": "{"text":"The SELFIES of the chemical with IUAPC name in CAS-like style N-[3-[6-(6-methyl-3-pyridinyl)-3,8a-dihydroquinazolin-8-yl]phenyl]-2-propenamide is [C][C][=N][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][N][C][=N][C][Ring1][=Branch1][C][=Branch1][Ring2][=C][Ring1][#Branch2][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=C]."} {"text":"The SELFIES of the compound with CAS-like IUPAC name 2-[1-[(3,4-difluorophenyl)methyl]-4-piperidinyl]-N-methylethanamine;hydrochloride is [C][N][C][C][C][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][=C][C][=Branch1][=Branch2][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][F][F].[Cl]."}", "/scratch/micpie/export/iupac_smiles/test_10-7.jsonl": "{"text":"Task: Please give me the SMILES of a chemical given the traditional IUPAC name.\nIUPAC name: 1-[3-[allyl(4-piperidyl)amino]-5-chloro-2-methyl-phenyl]ethanol\nResult: C=CCN(c1cc(Cl)cc(C(C)O)c1C)C1CCNCC1"} {"text":"Task: Please create the SMILES of a chemical based on the traditional IUPAC name.\nIUPAC name: 2-[4-[3-(3-fluoro-6-methoxy-4-quinolyl)propyl]-1-[2-(3-thienylthio)ethyl]-3-piperidyl]acetic acid\nResult: COC1=CC2=C(C(=CN=C2C=C1)F)CCCC3CCN(CC3CC(=O)O)CCSC4=CSC=C4"}", "/scratch/micpie/export/iupac_smiles/train_0-2.jsonl": "{"text":"The IUPAC name of the compound with canonical SMILES COc1cc(\/C=N\\NC(=O)c2ccn(Cc3ccc([N+](=O)[O-])cc3)n2)cc(Br)c1OC is 
N-[(Z)-(3-bromo-4,5-dimethoxyphenyl)methylideneamino]-1-[(4-nitrophenyl)methyl]pyrazole-3-carboxamide."} {"text":"The IUPAC name of the compound with canonical SMILES CC(=O)c1ccc(OCCCN2CCCCC2C)cc1 is 1-[4-[3-(2-methylpiperidin-1-yl)propoxy]phenyl]ethanone."}", "/scratch/micpie/export/iupac_smiles/train_14-0.jsonl": "{"text":"The traditional IUPAC name of the compound with DeepSMILES CC=CC=NN5C)))C))OC=O)[C@@H]CC=O)NC5)CCF)F)F is (3R)-5-keto-1-(2,2,2-trifluoroethyl)pyrrolidine-3-carboxylic acid (1,3,5-trimethylpyrazol-4-yl) ester."} {"text":"The traditional IUPAC name of the compound with canonical SMILES NCc1cc2cc(-c3ccc(S(=O)(=O)N4CCC(F)C4)cc3)cc(Cl)c2o1 is [7-chloro-5-[4-(3-fluoropyrrolidino)sulfonylphenyl]benzofuran-2-yl]methylamine."}", "/scratch/micpie/export/iupac_smiles/train_4-2.jsonl": "{"text":"The IUPAC name of the chemical with InChI InChI=1S\/C29H34FNO2S2\/c1-28-13-12-26(33)29(2,18-32)25(28)11-9-23(27-34-14-15-35-27)24(28)10-8-22-7-6-20(17-31-22)19-4-3-5-21(30)16-19\/h3-8,10,16-18,23-27,33H,9,11-15H2,1-2H3\/b10-8+\/t23-,24+,25-,26+,28-,29-\/m0\/s1 is (1S,2R,4aS,5R,6S,8aS)-6-(1,3-dithiolan-2-yl)-5-[(E)-2-[5-(3-fluorophenyl)pyridin-2-yl]ethenyl]-2-hydroxy-1,4a-dimethyl-2,3,4,5,6,7,8,8a-octahydronaphthalene-1-carbaldehyde."} {"text":"The IUPAC name of the compound with canonical SMILES CCC(CN)C1(O)CCNCC1CC is 4-(1-aminobutan-2-yl)-3-ethylpiperidin-4-ol."}", "/scratch/micpie/export/iupac_smiles/test_6-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the chemical with SMILES CC(C)(CC(=O)NC1=CC=CC=C1C(=O)NCCN)C2=CC=CC=C2 is N-(2-aminoethyl)-2-[(3-methyl-1-oxo-3-phenylbutyl)amino]benzamide."} {"text":"The CAS-like IUPAC name of the compound with SELFIES [C][C][O][C][=C][Branch1][Ring2][O][Ring1][=Branch1][C][=C][C][=Branch1][Ring2][=C][Ring1][#Branch1][N][C][C][=C][C][=Branch1][=Branch2][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl][Cl] is N-[(3,4-dichlorophenyl)methyl]-2,3-dihydro-1,4-benzodioxin-6-amine."}", 
"/scratch/micpie/export/iupac_smiles/valid_4-1.jsonl": "{"text":"The CAS-like IUPAC name of the compound with DeepSMILES CCS=O)=O)NCC=CNN=N5))[C@@H][C@@H][C@H][C@@H][C@H]O6)CO)))O[C@H][C@@H][C@H][C@H][C@H]O6)CO)))O))O))O)))))O))O)))))))C=CC=CC=C6))Cl is N-(4-chlorophenyl)-N-[[1-[(2S,3R,4R,5S,6R)-3,4-dihydroxy-6-(hydroxymethyl)-5-[[(2S,3R,4S,5R,6R)-3,4,5-trihydroxy-6-(hydroxymethyl)-2-oxanyl]oxy]-2-oxanyl]-4-triazolyl]methyl]ethanesulfonamide."} {"text":"The CAS-like IUPAC name of the chemical with SELFIES [C][C][C][C][C][C][Branch1][Branch1][C][C][Ring1][=Branch1][Branch1][Ring1][C][N][C][Branch1][#Branch2][C][C][N][C][C][Ring1][=Branch1][C][C][O] is 4-[1-(aminomethyl)-4-ethylcyclohexyl]-3-ethyl-4-piperidinol."}", "/scratch/micpie/export/iupac_smiles/test_10-9.jsonl": "{"text":"Task: Please generate the SMILES of a molecule based on the CAS-like IUPAC name.\nIUPAC name: 1-[5-chloro-2-methyl-3-[4-piperidinyl(prop-2-enyl)amino]phenyl]ethanol\nResult: [C][C][=C][Branch2][Ring1][=Branch2][C][=C][Branch2][Ring1][Ring1][C][=C][Ring1][=Branch1][N][Branch1][Ring2][C][C][=C][C][C][C][N][C][C][Ring1][=Branch1][Cl][C][Branch1][C][C][O]"} {"text":"Task: Please generate the SMILES of a chemical given the IUAPC name in CAS-like style.\nIUPAC name: 2-[4-[3-(3-fluoro-6-methoxy-4-quinolinyl)propyl]-1-[2-(3-thiophenylthio)ethyl]-3-piperidinyl]acetic acid\nResult: InChI=1S\/C26H31FN2O3S2\/c1-32-20-5-6-25-23(14-20)22(24(27)15-28-25)4-2-3-18-7-9-29(16-19(18)13-26(30)31)10-12-34-21-8-11-33-17-21\/h5-6,8,11,14-15,17-19H,2-4,7,9-10,12-13,16H2,1H3,(H,30,31)"}", "/scratch/micpie/export/iupac_smiles/valid_5-3.jsonl": "{"text":"The SMILES of the compound with traditional IUPAC name 4-[1-(aminomethyl)cyclooctyl]-3-ethyl-piperidin-4-ol is CCC1CNCCC1(C2(CCCCCCC2)CN)O."} {"text":"The SELFIES of the chemical with traditional IUPAC name N-(2-aminoethyl)-2-[3-(cyclopentanecarbonylamino)butanoylamino]benzamide;hydrochloride is 
[C][C][Branch2][Ring1][#Branch1][C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][N][N][C][=Branch1][C][=O][C][C][C][C][C][Ring1][Branch1].[Cl]."}", "/scratch/micpie/export/iupac_smiles/valid_11-7.jsonl": "{"text":"Task: Please generate the SMILES representation of a chemical given the traditional IUPAC name.\nIUPAC name: 13-chloro-2-(4-piperidylidene)-4-azatricyclo[9.4.0.03,8]pentadeca-1(11),3(8),4,6,12,14-hexaene;disulfate\nResult: C1CC2=C(C=CC(=C2)Cl)C(=C3CCNCC3)C4=C1C=CC=N4.[O-]S(=O)(=O)[O-].[O-]S(=O)(=O)[O-]"} {"text":"Task: Please create the SMILES representation of a molecule based on the traditional IUPAC name.\nIUPAC name: 4-(7-bromo-9,9-dioctyl-fluoren-2-yl)benzaldehyde\nResult: CCCCCCCCC1(C2=C(C=CC(=C2)C3=CC=C(C=C3)C=O)C4=C1C=C(C=C4)Br)CCCCCCCC"}", "/scratch/micpie/export/iupac_smiles/test_12-5.jsonl": "{"text":"The SMILES of the compound with CAS-like IUPAC name 2-(4-chlorophenyl)spiro[fluorene-9,9'-thioxanthene] is C1=CC=C2C(=C1)C3=C(C24C5=CC=CC=C5SC6=CC=CC=C46)C=C(C=C3)C7=CC=C(C=C7)Cl."} {"text":"The SELFIES of the molecule with IUAPC name in CAS-like style 2-amino-N-(cyclopropylmethyl)-3,3-dimethyl-N-prop-2-ynylbutanamide is [C][C][Branch1][C][C][Branch1][C][C][C][Branch1][P][C][=Branch1][C][=O][N][Branch1][Ring2][C][C][#C][C][C][C][C][Ring1][Ring1][N]."}", "/scratch/micpie/export/iupac_smiles/test_3-6.jsonl": "{"text":"The SMILES of the chemical with preferred IUPAC name (2R)-1-benzylsulfanyl-3-propan-2-yloxypropan-2-ol is CC(C)OC[C@H](CSCC1=CC=CC=C1)O."} {"text":"The DeepSMILES of the compound with preferred IUPAC name 2-amino-2-(hydroxymethyl)propane-1,3-diol;5-chloro-6-methyl-3H-1,3-benzoxazol-2-one is CC=CC=CC=C6Cl)))NC=O)O5.CCCO))CO))N))O."}", "/scratch/micpie/export/iupac_smiles/test_8-4.jsonl": "{"text":"The SELFIES of the compound with systematic IUPAC name N-[3-[(3,4-dimethylphenyl)sulfonylamino]phenyl]-2-methyl-propanamide is 
[C][C][=C][Branch2][Ring2][Branch1][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][S][=Branch1][C][=O][=Branch1][C][=O][N][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][Branch1][C][C][C][C]."} {"text":"The DeepSMILES of the molecule with systematic IUPAC name 1-(tert-butylamino)-3-[(diphenylmethylidene)amino]oxy-propan-2-ol is CCC)C)NCCCON=CC=CC=CC=C6))))))C=CC=CC=C6))))))))))O."}", "/scratch/micpie/export/iupac_smiles/test_2-2.jsonl": "{"text":"The IUPAC name of the chemical with DeepSMILES CC=CC=CC=C6C=O)[C@@H]C)NC is (2R)-2-(methylamino)-1-(2-methylphenyl)propan-1-one."} {"text":"The preferred IUPAC name of the chemical with SMILES C1COC2=C(O1)C=C(C(=C2)Br)N3C(=O)C4=C(C3=O)C(=C(C(=C4Cl)Cl)Cl)Cl is 2-(6-bromo-2,3-dihydro-1,4-benzodioxin-7-yl)-4,5,6,7-tetrachloroisoindole-1,3-dione."}", "/scratch/micpie/export/iupac_smiles/valid_18-1.jsonl": "{"text":"The CAS-like IUPAC name of the chemical with DeepSMILES CNCCCC5)C=CC=CC=CN=CN=C%106)))))))C=CN=CC=C6 is 8-(1-methyl-3-pyrrolidinyl)-6-(3-pyridinyl)quinazoline."} {"text":"The IUAPC name in CAS-like style of the compound with InChI InChI=1S\/C19H31N3OS\/c1-15(21-12-8-16(9-13-21)7-10-20-2)19(23)22-11-3-5-17(22)18-6-4-14-24-18\/h4,6,14-17,20H,3,5,7-13H2,1-2H3 is 2-[4-[2-(methylamino)ethyl]-1-piperidinyl]-1-(2-thiophen-2-yl-1-pyrrolidinyl)-1-propanone."}", "/scratch/micpie/export/iupac_smiles/train_5-9.jsonl": "{"text":"Task: Please give me the SMILES of a compound based on the CAS-like IUPAC name.\nIUPAC name: 4-[3-(aminomethyl)-3-oxolanyl]-3-ethyl-4-piperidinol\nResult: CCC1CNCCC1(O)C1(CN)CCOC1"} {"text":"Task: Please give me the SMILES representation of a compound based on the CAS-like IUPAC name.\nIUPAC name: N-(2-aminoethyl)-2-[[1-oxo-2-(2-oxolanylmethoxy)propyl]amino]benzamide;hydrochloride\nResult: CC(OCC1CCCO1)C(=O)Nc1ccccc1C(=O)NCCN.Cl"}", "/scratch/micpie/export/iupac_smiles/train_27-1.jsonl": "{"text":"The CAS-like IUPAC name of the chemical with DeepSMILES 
CCNCCN6CC=O)N[C@H]CC=N5)C=CC=CC=C6))F)))))))C=CC=CC=C6))Cl))))))))))))C=CC=CC=C6F is 1-[(3R)-3-(4-chlorophenyl)-5-(4-fluorophenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(2-fluorophenyl)-1-piperazinyl]ethanone."} {"text":"The CAS-like IUPAC name of the molecule with SMILES CC1(C2=CC=CC=C2C3=C1C=C(C=C3)NC4=CC=C(C=C4)C5=C(C(=C(C(=C5O)O)C6=C(C(=C(C(=C6O)O)O)O)O)O)O)C is 6-[4-[4-[(9,9-dimethyl-2-fluorenyl)amino]phenyl]-2,3,5,6-tetrahydroxyphenyl]benzene-1,2,3,4,5-pentol."}", "/scratch/micpie/export/iupac_smiles/valid_18-2.jsonl": "{"text":"The preferred IUPAC name of the chemical with InChI InChI=1S\/C18H18N4\/c1-22-6-4-14(11-22)17-8-15(13-3-2-5-19-9-13)7-16-10-20-12-21-18(16)17\/h2-3,5,7-10,12,14H,4,6,11H2,1H3 is 8-(1-methylpyrrolidin-3-yl)-6-pyridin-3-ylquinazoline."} {"text":"The IUPAC name of the chemical with DeepSMILES CCC=O)NCCCC5C=CC=CS5)))))))))))NCCCCC6))CCNC is 2-[4-[2-(methylamino)ethyl]piperidin-1-yl]-1-(2-thiophen-2-ylpyrrolidin-1-yl)propan-1-one."}", "/scratch/micpie/export/iupac_smiles/valid_22-0.jsonl": "{"text":"The traditional IUPAC name of the compound with canonical SMILES C#Cc1cccc(C2CC2)c1C(F)(F)F is 1-cyclopropyl-3-ethynyl-2-(trifluoromethyl)benzene."} {"text":"The traditional IUPAC name of the molecule with SELFIES [C][O][C][=C][Branch2][Ring1][N][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][=O][F] is 1-(3-fluoro-4-methoxy-phenyl)sulfonylnipecotaldehyde."}", "/scratch/micpie/export/iupac_smiles/train_0-7.jsonl": "{"text":"Task: Please give me the SMILES representation of a chemical given the traditional IUPAC name.\nIUPAC name: N-[(Z)-(3-bromo-4,5-dimethoxy-benzylidene)amino]-1-(4-nitrobenzyl)pyrazole-3-carboxamide\nResult: InChI=1S\/C20H18BrN5O5\/c1-30-18-10-14(9-16(21)19(18)31-2)11-22-23-20(27)17-7-8-25(24-17)12-13-3-5-15(6-4-13)26(28)29\/h3-11H,12H2,1-2H3,(H,23,27)\/b22-11-"} {"text":"Task: Please generate the SMILES of a molecule based on the traditional 
IUPAC name.\nIUPAC name: 1-[4-[3-(2-methylpiperidino)propoxy]phenyl]ethanone\nResult: CC(=O)c1ccc(OCCCN2CCCCC2C)cc1"}", "/scratch/micpie/export/iupac_smiles/train_20-4.jsonl": "{"text":"The SMILES of the compound with systematic IUPAC name [5-[6-methyl-3,4,5-tris(oxidanyl)oxan-2-yl]oxy-4-oxidanyl-6-[[2,6,13,17,17-pentamethyl-6-(4-methylpentyl)-4,8-bis(oxidanylidene)-7-oxapentacyclo[10.8.0.02,9.05,9.013,18]icos-11-en-16-yl]oxy]oxan-3-yl] hydrogen sulfate is CC1C(C(C(C(O1)OC2C(C(COC2OC3CCC4(C(C3(C)C)CCC5C4=CCC67C5(CC(=O)C6C(OC7=O)(C)CCCC(C)C)C)C)OS(=O)(=O)O)O)O)O)O."} {"text":"The DeepSMILES of the chemical with systematic IUPAC name [4-[(2S)-1-(azetidin-1-yl)-1-oxidanylidene-pentan-2-yl]oxy-2-[5-[2-[[3-[4-(3-fluorophenyl)-1,2,3-triazol-1-yl]-5-[4-[4-(3-fluorophenyl)-1,2,3-triazol-1-yl]-6-(hydroxymethyl)-3,5-bis(oxidanyl)oxan-2-yl]oxy-4-oxidanyl-cyclohexyl]carbonylamino]ethylcarbamoyl]-3-methyl-2-[6-methyl-3,4,5-tris(oxidanyl)oxan-2-yl]oxy-cyclohexyl]oxy-6-(hydroxymethyl)-5-oxidanyl-oxan-3-yl] benzoate is CCC[C@@H]C=O)NCCC4)))))OCCCOCC6OC=O)C=CC=CC=C6)))))))))OCCCCCC6OCCCCCO6)C))O))O))O)))))C)))C=O)NCCNC=O)CCCCCC6)OCCCCCO6)CO)))O))NC=CN=N5))C=CC=CC=C6)))F))))))))O)))))O))NC=CN=N5))C=CC=CC=C6)))F))))))))))))))))))))))CO)))O."}", "/scratch/micpie/export/iupac_smiles/valid_24-5.jsonl": "{"text":"The InChI of the compound with IUAPC name in CAS-like style 2-(1,4-dioxo-3H-phthalazin-2-yl)-N-(2-hydroxyphenyl)acetamide is InChI=1S\/C16H13N3O4\/c20-13-8-4-3-7-12(13)17-14(21)9-19-16(23)11-6-2-1-5-10(11)15(22)18-19\/h1-8,20H,9H2,(H,17,21)(H,18,22)."} {"text":"The DeepSMILES of the compound with CAS-like IUPAC name 5-(7-fluoro-1,3-benzoxazol-5-yl)-7-methyl-N-(1,1,1-trifluoropropan-2-yl)-3-pyrazolo[1,5-a]pyrimidinecarboxamide is CC=CC=NC=CC=NN95)))C=O)NCC)CF)F)F))))))))C=CC=CC=C6)F))OC=N5."}", "/scratch/micpie/export/iupac_smiles/test_15-4.jsonl": "{"text":"The canonical SMILES of the compound with systematic IUPAC name 
(E)-3-(6-azanylpyridin-3-yl)-N-[[5-[4-[4,4-bis(fluoranyl)piperidin-1-yl]carbonylphenyl]-7-[5-chloranyl-2,4-bis(fluoranyl)phenyl]-1-benzofuran-2-yl]methyl]prop-2-enamide is Nc1ccc(\/C=C\/C(=O)NCc2cc3cc(-c4ccc(C(=O)N5CCC(F)(F)CC5)cc4)cc(-c4cc(Cl)c(F)cc4F)c3o2)cn1."} {"text":"The canonical SMILES of the compound with systematic IUPAC name 5-chloranyl-4-(chloromethyl)-2-(2,3-dihydro-1H-inden-5-yloxy)pyridine is ClCc1cc(Oc2ccc3c(c2)CCC3)ncc1Cl."}", "/scratch/micpie/export/iupac_smiles/train_12-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the chemical with SELFIES [C][C][Branch2][=Branch1][Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][=Branch2][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][Branch2][Ring1][Ring1][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][Ring1][=Branch2][=C][C][=C][Ring1][=N][C] is N-[3-(1-dibenzofuranyl)phenyl]-9,9-dimethyl-N-(3-phenylphenyl)-2-fluorenamine."} {"text":"The CAS-like IUPAC name of the compound with DeepSMILES CCC)C)CC=O)NCCC=C)))C=O)O)))))N is 2-[(2-amino-3,3-dimethyl-1-oxobutyl)amino]-4-pentenoic acid."}", "/scratch/micpie/export/iupac_smiles/valid_4-7.jsonl": "{"text":"Task: Please give me the SMILES of a molecule given the traditional IUPAC name.\nIUPAC name: N-(4-chlorophenyl)-N-[[1-[(2S,3R,4R,5S,6R)-3,4-dihydroxy-6-methylol-5-[(2S,3R,4S,5R,6R)-3,4,5-trihydroxy-6-methylol-tetrahydropyran-2-yl]oxy-tetrahydropyran-2-yl]triazol-4-yl]methyl]ethanesulfonamide\nResult: InChI=1S\/C23H33ClN4O12S\/c1-2-41(36,37)28(13-5-3-11(24)4-6-13)8-12-7-27(26-25-12)22-19(34)18(33)21(15(10-30)38-22)40-23-20(35)17(32)16(31)14(9-29)39-23\/h3-7,14-23,29-35H,2,8-10H2,1H3\/t14-,15-,16+,17+,18-,19-,20-,21-,22+,23+\/m1\/s1"} {"text":"Task: Please give me the SMILES of a molecule based on the traditional IUPAC name.\nIUPAC name: 
4-[1-(aminomethyl)-4-ethyl-cyclohexyl]-3-ethyl-piperidin-4-ol\nResult: [C][C][C][C][C][C][Branch1][Branch1][C][C][Ring1][=Branch1][Branch1][Ring1][C][N][C][Branch1][#Branch2][C][C][N][C][C][Ring1][=Branch1][C][C][O]"}", "/scratch/micpie/export/iupac_smiles/train_1-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the molecule with canonical SMILES C=C(Cl)\/C=C\\c1c(NCCCCN2CCCCC2)ccnc1C is 3-[(1Z)-3-chlorobuta-1,3-dienyl]-2-methyl-N-[4-(1-piperidinyl)butyl]-4-pyridinamine."} {"text":"The IUAPC name in CAS-like style of the chemical with DeepSMILES C[C@]CC[C@H][C@H][C@@H]6CC[C@@H]9O[C@@H]CCCCO6)))))))))))C[C@@H]C=C6C=CC=C6)OC)))))))O is (6S,8R,9S,13S,14S,17S)-3-methoxy-13-methyl-17-[[(2R)-2-oxanyl]oxy]-6,7,8,9,11,12,14,15,16,17-decahydrocyclopenta[a]phenanthren-6-ol."}", "/scratch/micpie/export/iupac_smiles/train_16-9.jsonl": "{"text":"Task: Please create the SMILES of a chemical based on the IUAPC name in CAS-like style.\nIUPAC name: 5-chloro-4-(chloromethyl)-2-(2,3,5-trimethylphenoxy)pyridine\nResult: InChI=1S\/C15H15Cl2NO\/c1-9-4-10(2)11(3)14(5-9)19-15-6-12(7-16)13(17)8-18-15\/h4-6,8H,7H2,1-3H3"} {"text":"Task: Please give me the SMILES representation of a molecule based on the CAS-like IUPAC name.\nIUPAC name: 2-ethoxy-1H-benzimidazole-4-carboxylic acid propan-2-yl ester\nResult: InChI=1S\/C13H16N2O3\/c1-4-17-13-14-10-7-5-6-9(11(10)15-13)12(16)18-8(2)3\/h5-8H,4H2,1-3H3,(H,14,15)"}", "/scratch/micpie/export/iupac_smiles/test_24-7.jsonl": "{"text":"Task: Please give me the SMILES representation of a molecule based on the traditional IUPAC name.\nIUPAC name: 3,5-difluoro-N-(5-thioxo-1H-1,2,4-triazol-4-yl)benzamide\nResult: C1=C(C=C(C=C1F)F)C(=O)NN2C=NNC2=S"} {"text":"Task: Please generate the SMILES representation of a molecule given the traditional IUPAC name.\nIUPAC name: 7-methyl-N-(2,2,2-trifluoro-1-methyl-ethyl)-5-[6-(trifluoromethyl)-3-pyridyl]pyrazolo[1,5-a]pyrimidine-3-carboxamide\nResult: Cc1cc(-c2ccc(C(F)(F)F)nc2)nc2c(C(=O)NC(C)C(F)(F)F)cnn12"}", 
"/scratch/micpie/export/iupac_smiles/valid_23-8.jsonl": "{"text":"Task: Please create the SMILES of a molecule given the systematic IUPAC name.\nIUPAC name: 1-(3-chlorophenyl)sulfonylpiperidine-3-carbaldehyde\nResult: O=CC1CCCN(S(=O)(=O)c2cccc(Cl)c2)C1"} {"text":"Task: Please create the SMILES of a molecule based on the systematic IUPAC name.\nIUPAC name: N-(benzimidazol-1-yl)-3-[(2,3,4,5,6-pentamethylphenyl)sulfonylamino]propanamide\nResult: CC=CC=CC=C6C))C))S=O)=O)NCCC=O)NNC=NC=CC=CC=C69))))))))))))))))C))C"}", "/scratch/micpie/export/iupac_smiles/valid_27-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the compound with SELFIES [C][N][Branch2][Ring2][Branch2][C][C][=Branch1][C][=O][N][C@@H1][Branch2][Ring1][Ring1][C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][F][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl][C][=C][C][=C][C][=C][Ring1][=Branch1] is 1-[(3S)-3-(4-chlorophenyl)-5-(4-fluorophenyl)-3,4-dihydropyrazol-2-yl]-2-(N-methylanilino)ethanone."} {"text":"The CAS-like IUPAC name of the compound with SELFIES [C][C][Branch2][#Branch1][Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][=Branch2][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][Branch2][Ring1][=Branch1][C][Ring1][#Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=N][Ring1][=Branch1][C][=C][Branch1][=Branch1][C][=C][Ring2][Ring1][Ring1][C][=C][Branch1][P][C][=Branch1][=N][=C][Branch1][=Branch2][C][=Branch1][Branch1][=C][Ring1][=Branch1][O][O][O][O][O][C] is 6-[7'-[(9,9-dimethyl-2-fluorenyl)amino]-9,9'-spirobi[fluorene]-2'-yl]benzene-1,2,3,4,5-pentol."}", "/scratch/micpie/export/iupac_smiles/valid_18-9.jsonl": "{"text":"Task: Please create the SMILES of a chemical given the IUAPC name in CAS-like style.\nIUPAC name: 8-(1-methyl-3-pyrrolidinyl)-6-(3-pyridinyl)quinazoline\nResult: 
InChI=1S\/C18H18N4\/c1-22-6-4-14(11-22)17-8-15(13-3-2-5-19-9-13)7-16-10-20-12-21-18(16)17\/h2-3,5,7-10,12,14H,4,6,11H2,1H3"} {"text":"Task: Please create the SMILES of a molecule based on the IUAPC name in CAS-like style.\nIUPAC name: 2-[4-[2-(methylamino)ethyl]-1-piperidinyl]-1-(2-thiophen-2-yl-1-pyrrolidinyl)-1-propanone\nResult: CC(C(=O)N1CCCC1C2=CC=CS2)N3CCC(CC3)CCNC"}", "/scratch/micpie/export/iupac_smiles/valid_14-2.jsonl": "{"text":"The preferred IUPAC name of the chemical with SELFIES [C][C][Branch1][C][C][Branch1][C][C][O][C][=Branch1][C][=O][N][C][C@@H1][C][C][C][C@@H1][Ring1][Branch1][N][C][=Branch1][C][=O][C][=C][C][=C][O][Ring1][Branch1] is tert-butyl N-[[(1S,2S)-2-(furan-2-carbonylamino)cyclopentyl]methyl]carbamate."} {"text":"The IUPAC name of the chemical with SELFIES [C][C][Branch1][C][C][Branch1][C][C][O][C][=Branch1][C][=O][N][C][C][C][=C][C][=C][C][=Branch1][N][=C][C][=Branch1][#Branch1][=C][Ring1][=Branch1][O][Ring1][=Branch2][Cl][Br] is tert-butyl N-[2-(5-bromo-7-chloro-1-benzofuran-2-yl)ethyl]carbamate."}", "/scratch/micpie/export/iupac_smiles/train_24-4.jsonl": "{"text":"The DeepSMILES of the compound with systematic IUPAC name N-(benzimidazol-1-yl)-2-(2-phenoxyethanoylamino)ethanamide is C=CC=CC=C6))OCC=O)NCC=O)NNC=NC=CC=CC=C69."} {"text":"The InChI of the molecule with systematic IUPAC name N-[(1S)-1-cyclopropyl-2,2,2-tris(fluoranyl)ethyl]-7-methyl-5-[4-(trifluoromethyl)pyridin-3-yl]pyrazolo[1,5-a]pyrimidine-3-carboxamide is InChI=1S\/C19H15F6N5O\/c1-9-6-14(11-7-26-5-4-13(11)18(20,21)22)28-16-12(8-27-30(9)16)17(31)29-15(10-2-3-10)19(23,24)25\/h4-8,10,15H,2-3H2,1H3,(H,29,31)\/t15-\/m0\/s1."}", "/scratch/micpie/export/iupac_smiles/train_21-3.jsonl": "{"text":"The SELFIES of the compound with traditional IUPAC name 
(2S)-2-[3-benzoyloxy-2-[3-ethyl-5-[2-[[3-[4-(3-fluorophenyl)triazol-1-yl]-5-[4-[4-(3-fluorophenyl)triazol-1-yl]-3,5-dihydroxy-6-methylol-tetrahydropyran-2-yl]oxy-4-hydroxy-cyclohexanecarbonyl]amino]ethylcarbamoyl]-2-(3,4,5-trihydroxy-6-methyl-tetrahydropyran-2-yl)oxy-cyclohexoxy]-5-hydroxy-6-methylol-tetrahydropyran-4-yl]oxy-3-cyclohexyl-propionic acid is [C][C][C][C][C][Branch2][#Branch1][C][C][C][Branch2][Ring1][=Branch2][C][Ring1][=Branch1][O][C][C][Branch1][S][C][Branch1][N][C][Branch1][Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][O][C][C][Branch2][Ring2][C][C][Branch1][=N][C][Branch1][=Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][C@@H1][Branch1][#Branch2][C][C][C][C][C][C][C][Ring1][=Branch1][C][=Branch1][C][=O][O][O][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][N][C][=Branch1][C][=O][C][C][C][Branch2][Branch1][=Branch1][C][Branch2][Ring2][P][C][Branch1][Ring2][C][Ring1][=Branch1][O][C][C][Branch2][Ring2][Ring2][C][Branch1][=N][C][Branch1][=Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][N][C][=C][Branch1][Branch1][N][=N][Ring1][Branch1][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][F][O][O][N][C][=C][Branch1][Branch1][N][=N][Ring1][Branch1][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][F]."} {"text":"The InChI of the compound with traditional IUPAC name 3-[4-(trifluoromethoxy)phenyl]pyridazine is InChI=1S\/C11H7F3N2O\/c12-11(13,14)17-9-5-3-8(4-6-9)10-2-1-7-15-16-10\/h1-7H."}", "/scratch/micpie/export/iupac_smiles/valid_16-1.jsonl": "{"text":"The CAS-like IUPAC name of the molecule with SMILES C1=CC(=CC(=C1)I)OC2=NC=C(C(=C2)CCl)Cl is 5-chloro-4-(chloromethyl)-2-(3-iodophenoxy)pyridine."} {"text":"The CAS-like IUPAC name of the chemical with canonical SMILES CCOc1nc2cccc(C(=O)NCCc3cc(Cl)ccc3Cl)c2n1Cc1ccc(-c2ccccc2-c2nn[nH]n2)cc1 is N-[2-(2,5-dichlorophenyl)ethyl]-2-ethoxy-3-[[4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl]-4-benzimidazolecarboxamide."}", 
"/scratch/micpie/export/iupac_smiles/valid_15-8.jsonl": "{"text":"Task: Please generate the SMILES representation of a chemical given the systematic IUPAC name.\nIUPAC name: 2-[5-[5-[4,4-bis(fluoranyl)piperidin-1-yl]carbonylpyridin-2-yl]-7-(trifluoromethyl)-1-benzofuran-2-yl]ethyl methanesulfonate\nResult: CS(=O)(=O)OCCC1=CC2=CC(=CC(=C2O1)C(F)(F)F)C3=NC=C(C=C3)C(=O)N4CCC(CC4)(F)F"} {"text":"Task: Please give me the SMILES of a compound given the systematic IUPAC name.\nIUPAC name: 5-chloranyl-4-(chloromethyl)-2-naphthalen-1-yloxy-pyridine\nResult: [C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][C][=C][Ring1][#Branch1][O][C][=N][C][=C][Branch1][=Branch2][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][Cl][Cl]"}", "/scratch/micpie/export/iupac_smiles/test_27-3.jsonl": "{"text":"The SMILES of the molecule with traditional IUPAC name 1-[(5S)-5-(4-chlorophenyl)-3-(3,4-dimethylphenyl)-2-pyrazolin-1-yl]-2-[4-(2-fluorophenyl)piperazino]ethanone is CC1=C(C=C(C=C1)C2=NN([C@@H](C2)C3=CC=C(C=C3)Cl)C(=O)CN4CCN(CC4)C5=CC=CC=C5F)C."} {"text":"The DeepSMILES of the compound with traditional IUPAC name (5-phenyl-2-thienyl)-[(1E)-1-[(2,3,5,5-tetramethyl-4-vinyl-cyclopenta-1,3-dien-1-yl)methylene]allyl]amine is CC=CCC=C5C))\/C=C\\C=C))\/NC=CC=CS5)C=CC=CC=C6))))))))))))))C)C))C=C."}", "/scratch/micpie/export/iupac_smiles/train_16-6.jsonl": "{"text":"The DeepSMILES of the molecule with preferred IUPAC name 5-chloro-4-(chloromethyl)-2-(2,3,5-trimethylphenoxy)pyridine is CC=CC=CC=C6)OC=NC=CC=C6)CCl)))Cl)))))))C))C."} {"text":"The DeepSMILES of the compound with preferred IUPAC name propan-2-yl 2-ethoxy-1H-benzimidazole-4-carboxylate is CCOC=NC=CC=CC=C6N9)))))C=O)OCC)C."}", "/scratch/micpie/export/iupac_smiles/test_5-7.jsonl": "{"text":"Task: Please create the SMILES of a compound given the traditional IUPAC name.\nIUPAC name: 4-[1-(aminomethyl)-1-methyl-propyl]-3-ethyl-piperidin-4-ol\nResult: 
[C][C][C][C][N][C][C][C][Ring1][=Branch1][Branch1][O][C][Branch1][C][C][Branch1][Ring1][C][C][C][N][O]"} {"text":"Task: Please give me the SMILES of a chemical given the traditional IUPAC name.\nIUPAC name: N-(2-aminoethyl)-2-[(4-methyl-3-phenyl-pentanoyl)amino]benzamide\nResult: CCC)CCC=O)NC=CC=CC=C6C=O)NCCN))))))))))))))C=CC=CC=C6"}", "/scratch/micpie/export/iupac_smiles/valid_7-0.jsonl": "{"text":"The traditional IUPAC name of the compound with canonical SMILES CC(C)C(=O)NCCNCc1ccc(SC(F)(F)F)cc1 is 2-methyl-N-[2-[[4-(trifluoromethylthio)benzyl]amino]ethyl]propionamide."} {"text":"The traditional IUPAC name of the compound with DeepSMILES C=CC=CC=C6))NCCC=O)N))))S=O)=O)C=CC=CS5 is 3-[N-(2-thienylsulfonyl)anilino]propionamide."}", "/scratch/micpie/export/iupac_smiles/valid_21-7.jsonl": "{"text":"Task: Please create the SMILES of a compound given the traditional IUPAC name.\nIUPAC name: benzoic acid [4-[(1S)-2-(azetidin-1-yl)-1-(cyclohexylmethyl)-2-keto-ethoxy]-2-[5-[2-[[3-[3,5-dihydroxy-4-[(2-ketochromen-3-yl)methoxy]-6-methylol-tetrahydropyran-2-yl]oxy-4-hydroxy-5-[(2-ketochromen-3-yl)methoxy]cyclohexanecarbonyl]amino]ethylcarbamoyl]-3-(4-phenyltriazol-1-yl)-2-(3,4,5-trihydroxy-6-methyl-tetrahydropyran-2-yl)oxy-cyclohexoxy]-5-hydroxy-6-methylol-tetrahydropyran-3-yl] ester\nResult: CCCCCCO6)OCCCCCC6OCCCCCO6)CO)))O))O[C@@H]CCCCCCC6)))))))C=O)NCCC4))))))))OC=O)C=CC=CC=C6)))))))))))))C=O)NCCNC=O)CCCCCC6)OCCCCCO6)CO)))O))OCC=CC=CC=CC=C6OC%10=O))))))))))))))O)))))O))OCC=CC=CC=CC=C6OC%10=O))))))))))))))))))))))))NC=CN=N5))C=CC=CC=C6)))))))))))))O))O))O"} {"text":"Task: Please create the SMILES representation of a molecule given the traditional IUPAC name.\nIUPAC name: 3-fluoro-5-(1H-1,2,4-triazol-5-yliodanuidyl)-2-(2,2,2-trifluoroethoxy)pyridine\nResult: C1=C(C=NC(=C1F)OCC(F)(F)F)[I-]C2=NC=NN2"}", "/scratch/micpie/export/iupac_smiles/test_7-6.jsonl": "{"text":"The SELFIES of the molecule with IUPAC name 3-amino-4-hydroxy-N-(pyridin-2-ylmethyl)benzenesulfonamide is 
[C][=C][C][=N][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=Branch1][=Branch2][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][O][N]."} {"text":"The DeepSMILES of the molecule with IUPAC name 3-(N-(4-fluorophenyl)sulfonylanilino)propanamide is C=CC=CC=C6))NCCC=O)N))))S=O)=O)C=CC=CC=C6))F."}", "/scratch/micpie/export/iupac_smiles/valid_4-8.jsonl": "{"text":"Task: Please generate the SMILES of a compound given the systematic IUPAC name.\nIUPAC name: N-(4-chlorophenyl)-N-[[1-[(2S,3R,4R,5S,6R)-6-(hydroxymethyl)-5-[(2S,3R,4S,5R,6R)-6-(hydroxymethyl)-3,4,5-tris(oxidanyl)oxan-2-yl]oxy-3,4-bis(oxidanyl)oxan-2-yl]-1,2,3-triazol-4-yl]methyl]ethanesulfonamide\nResult: [C][C][S][=Branch1][C][=O][=Branch1][C][=O][N][Branch2][Branch1][#Branch1][C][C][=C][N][Branch1][Branch1][N][=N][Ring1][Branch1][C@@H1][C@@H1][Branch2][Ring2][#Branch1][C@H1][Branch2][Ring2][C][C@@H1][Branch1][=Branch2][C@H1][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][C@H1][C@@H1][Branch1][P][C@H1][Branch1][=N][C@H1][Branch1][=Branch2][C@H1][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][O][O][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl]"} {"text":"Task: Please create the SMILES representation of a chemical given the systematic IUPAC name.\nIUPAC name: 4-[1-(aminomethyl)-4-ethyl-cyclohexyl]-3-ethyl-piperidin-4-ol\nResult: CCC1CCC(CC1)(CN)C2(CCNCC2CC)O"}", "/scratch/micpie/export/iupac_smiles/test_18-4.jsonl": "{"text":"The InChI of the molecule with systematic IUPAC name [(E)-2-ethyl-7,10-dimethyl-undec-3-enylidene]-dimethyl-phosphanium is InChI=1S\/C17H34P\/c1-7-17(14-18(5)6)11-9-8-10-16(4)13-12-15(2)3\/h9,11,14-17H,7-8,10,12-13H2,1-6H3\/q+1\/b11-9+."} {"text":"The canonical SMILES of the chemical with systematic IUPAC name N-methyl-2-[1-[2-(4-nitrophenyl)ethyl]piperidin-4-yl]ethanamine;hydrochloride is CNCCC1CCN(CCc2ccc([N+](=O)[O-])cc2)CC1.Cl."}", "/scratch/micpie/export/iupac_smiles/train_17-5.jsonl": "{"text":"The canonical SMILES of the 
compound with CAS-like IUPAC name 2-ethoxy-N-pyridin-4-yl-3-[[4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl]-4-benzimidazolecarboxamide is CCOc1nc2cccc(C(=O)Nc3ccncc3)c2n1Cc1ccc(-c2ccccc2-c2nn[nH]n2)cc1."} {"text":"The DeepSMILES of the compound with IUAPC name in CAS-like style 6-(2-fluorophenyl)-8-heptan-4-ylquinazoline is CCCCCCC)))C=CC=CC=CN=CN=C%106)))))))C=CC=CC=C6F."}", "/scratch/micpie/export/iupac_smiles/valid_8-1.jsonl": "{"text":"The CAS-like IUPAC name of the molecule with canonical SMILES NC(=O)CCN(c1ccccc1)S(=O)(=O)c1ccc2ccccc2c1 is 3-[N-(2-naphthalenylsulfonyl)anilino]propanamide."} {"text":"The CAS-like IUPAC name of the compound with SELFIES [C][C][Branch1][C][C][N][C][C][Branch2][Ring1][C][C][O][\/N][=C][\\C][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][\/-Ring1][#Branch2][O].[Cl] is 1-[(E)-3,4-dihydro-2H-naphthalen-1-ylideneamino]oxy-3-(propan-2-ylamino)-2-propanol;hydrochloride."}", "/scratch/micpie/export/iupac_smiles/train_13-9.jsonl": "{"text":"Task: Please give me the SMILES representation of a compound based on the CAS-like IUPAC name.\nIUPAC name: 2-amino-N,3,3-trimethyl-N-prop-2-ynylbutanamide\nResult: InChI=1S\/C10H18N2O\/c1-6-7-12(5)9(13)8(11)10(2,3)4\/h1,8H,7,11H2,2-5H3"} {"text":"Task: Please generate the SMILES of a compound given the IUAPC name in CAS-like style.\nIUPAC name: (4R)-3,4-dihydro-2H-1-benzopyran-4-carboxylic acid (1,3,5-trimethyl-4-pyrazolyl) ester\nResult: CC1=C(C(=NN1C)C)OC(=O)[C@@H]2CCOC3=CC=CC=C23"}", "/scratch/micpie/export/iupac_smiles/train_17-1.jsonl": "{"text":"The CAS-like IUPAC name of the compound with DeepSMILES CCOC=NC=CC=CC=C6N9CC=CC=CC=C6))C=CC=CC=C6C=NNN=N5))))))))))))))))))C=O)NC=CC=NC=C6 is 2-ethoxy-N-pyridin-4-yl-3-[[4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl]-4-benzimidazolecarboxamide."} {"text":"The IUAPC name in CAS-like style of the molecule with canonical SMILES CCCC(CCC)c1cc(-c2ccccc2F)cc2cncnc12 is 6-(2-fluorophenyl)-8-heptan-4-ylquinazoline."}", 
"/scratch/micpie/export/iupac_smiles/test_13-8.jsonl": "{"text":"Task: Please create the SMILES representation of a molecule based on the systematic IUPAC name.\nIUPAC name: 2-azanyl-3,3-dimethyl-N-[2-methyl-1,3-bis(oxidanyl)propan-2-yl]butanamide\nResult: CC(C)(C)C(C(=O)NC(C)(CO)CO)N"} {"text":"Task: Please generate the SMILES of a compound based on the systematic IUPAC name.\nIUPAC name: (1,3,5-trimethylpyrazol-4-yl) (4S)-3,4-dihydro-2H-chromene-4-carboxylate\nResult: [C][C][=C][Branch1][#Branch2][C][=Branch1][=Branch1][=N][N][Ring1][Branch1][C][C][O][C][=Branch1][C][=O][C@H1][C][C][O][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1]"}", "/scratch/micpie/export/iupac_smiles/train_0-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the molecule with InChI InChI=1S\/C20H18BrN5O5\/c1-30-18-10-14(9-16(21)19(18)31-2)11-22-23-20(27)17-7-8-25(24-17)12-13-3-5-15(6-4-13)26(28)29\/h3-11H,12H2,1-2H3,(H,23,27)\/b22-11- is N-[(Z)-(3-bromo-4,5-dimethoxyphenyl)methylideneamino]-1-[(4-nitrophenyl)methyl]-3-pyrazolecarboxamide."} {"text":"The CAS-like IUPAC name of the compound with DeepSMILES CCCCCCN6CCCOC=CC=CC=C6))C=O)C is 1-[4-[3-(2-methyl-1-piperidinyl)propoxy]phenyl]ethanone."}", "/scratch/micpie/export/iupac_smiles/train_13-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with InChI InChI=1S\/C10H18N2O\/c1-6-7-12(5)9(13)8(11)10(2,3)4\/h1,8H,7,11H2,2-5H3 is 2-amino-N,3,3-trimethyl-N-propargyl-butyramide."} {"text":"The traditional IUPAC name of the chemical with SELFIES [C][C][=C][Branch1][#Branch2][C][=Branch1][=Branch1][=N][N][Ring1][Branch1][C][C][O][C][=Branch1][C][=O][C@@H1][C][C][O][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1] is (4R)-chroman-4-carboxylic acid (1,3,5-trimethylpyrazol-4-yl) ester."}", "/scratch/micpie/export/iupac_smiles/test_17-5.jsonl": "{"text":"The SELFIES of the molecule with IUAPC name in CAS-like style 2-ethoxy-1H-benzimidazole-4-carboxylic acid [4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl ester is 
[C][C][O][C][=N][C][=C][Branch1][#Branch2][C][=C][C][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][=Branch1][C][=O][O][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][N][=N][Ring1][Branch1]."} {"text":"The canonical SMILES of the chemical with IUAPC name in CAS-like style 2-[4-[2-[3-[(2E,4Z)-hexa-2,4-dien-3-yl]-3-methyl-5-(3-methyl-1-cyclohexa-1,5-dienyl)-1-cyclohexa-1,5-dienyl]-3-iodo-10-[4-(2-pyridinyl)phenyl]-3,4-dihydroanthracen-9-yl]phenyl]pyridine is C\/C=C\\C(=C\/C)C1(C)C=C(C2=Cc3c(c(-c4ccc(-c5ccccn5)cc4)c4ccccc4c3-c3ccc(-c4ccccn4)cc3)CC2I)C=C(C2=CC(C)CC=C2)C1."}", "/scratch/micpie/export/iupac_smiles/test_16-0.jsonl": "{"text":"The traditional IUPAC name of the compound with InChI InChI=1S\/C15H15Cl2NO2\/c1-2-7-19-12-3-5-13(6-4-12)20-15-8-11(9-16)14(17)10-18-15\/h3-6,8,10H,2,7,9H2,1H3 is 5-chloro-4-(chloromethyl)-2-(4-propoxyphenoxy)pyridine."} {"text":"The traditional IUPAC name of the compound with InChI InChI=1S\/C31H26FN7O2\/c1-2-41-31-34-27-9-5-8-26(30(40)33-18-20-12-16-23(32)17-13-20)28(27)39(31)19-21-10-14-22(15-11-21)24-6-3-4-7-25(24)29-35-37-38-36-29\/h3-17H,2,18-19H2,1H3,(H,33,40)(H,35,36,37,38) is 2-ethoxy-N-(4-fluorobenzyl)-3-[4-[2-(2H-tetrazol-5-yl)phenyl]benzyl]benzimidazole-4-carboxamide."}", "/scratch/micpie/export/iupac_smiles/test_26-7.jsonl": "{"text":"Task: Please generate the SMILES of a compound based on the traditional IUPAC name.\nIUPAC name: 2-(2,3-dichlorophenyl)-4-ethyl-thiazole-5-carboxylic acid\nResult: CCC1=C(SC(=N1)C2=C(C(=CC=C2)Cl)Cl)C(=O)O"} {"text":"Task: Please generate the SMILES representation of a compound based on the traditional IUPAC name.\nIUPAC name: 1-[(5R)-5-(2-chlorophenyl)-3-(3-methoxyphenyl)-2-pyrazolin-1-yl]-2-(3,4-dihydro-1H-isoquinolin-2-yl)ethanone\nResult: COC=CC=CC=C6)C=NN[C@H]C5)C=CC=CC=C6Cl))))))))C=O)CNCCC=CC=CC=C6C%10"}", "/scratch/micpie/export/iupac_smiles/valid_8-0.jsonl": "{"text":"The traditional IUPAC name of the compound with canonical 
SMILES NC(=O)CCN(c1ccccc1)S(=O)(=O)c1ccc2ccccc2c1 is 3-[N-(2-naphthylsulfonyl)anilino]propionamide."} {"text":"The traditional IUPAC name of the chemical with DeepSMILES CCC)NCCCO\/N=C\\CCCC=CC=CC=C6\\%10)))))))))))))O.Cl is 1-(isopropylamino)-3-[(E)-tetralin-1-ylideneamino]oxy-propan-2-ol;hydrochloride."}", "/scratch/micpie/export/iupac_smiles/valid_27-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with DeepSMILES CNCC=O)N[C@@H]CC=N5)C=CC=CC=C6))F)))))))C=CC=CC=C6))Cl)))))))))C=CC=CC=C6 is 1-[(5S)-5-(4-chlorophenyl)-3-(4-fluorophenyl)-2-pyrazolin-1-yl]-2-(N-methylanilino)ethanone."} {"text":"The traditional IUPAC name of the compound with DeepSMILES CCC=CC=CC=C6C=C9C=CC=C6))NC=CC=CC=C6))C=CC5C=CC=CC=C6C=CC=CC=C%136)))))))))))))C=CC=C6))C=CC=CC=C6O))O))O))O))O))))))))))))))))))))))C is 6-[7'-[(9,9-dimethylfluoren-2-yl)amino]-9,9'-spirobi[fluorene]-2'-yl]benzene-1,2,3,4,5-pentol."}", "/scratch/micpie/export/iupac_smiles/test_20-3.jsonl": "{"text":"The SMILES of the compound with traditional IUPAC name (2S,8S,12S)-3,5,9,11-tetraketo-4,10-bis(o-phenetyl)-4,10-diazatetracyclo[5.5.2.02,6.08,12]tetradec-13-ene-13-carboxylic acid phenyl ester is CCOC1=CC=CC=C1N2C(=O)[C@@H]3[C@@H](C2=O)C4[C@H]5C(C3C=C4C(=O)OC6=CC=CC=C6)C(=O)N(C5=O)C7=CC=CC=C7OCC."} {"text":"The canonical SMILES of the compound with traditional IUPAC name (2S)-2-[3-benzoyloxy-2-[3-[4-(3-fluorophenyl)triazol-1-yl]-5-[2-[4-[4-(3-fluorophenyl)triazol-1-yl]-3,5-dihydroxy-6-methylol-tetrahydropyran-2-yl]oxyethylcarbamoyl]-2-(3,4,5-trihydroxy-6-methyl-tetrahydropyran-2-yl)oxy-cyclohexoxy]-5-hydroxy-6-methylol-tetrahydropyran-4-yl]oxy-3-cyclohexyl-propionic acid is CC1OC(OC2C(OC3OC(CO)C(O)C(O[C@@H](CC4CCCCC4)C(=O)O)C3OC(=O)c3ccccc3)CC(C(=O)NCCOC3OC(CO)C(O)C(n4cc(-c5cccc(F)c5)nn4)C3O)CC2n2cc(-c3cccc(F)c3)nn2)C(O)C(O)C1O."}", "/scratch/micpie/export/iupac_smiles/test_25-9.jsonl": "{"text":"Task: Please create the SMILES of a compound based on the CAS-like IUPAC name.\nIUPAC name: 
5-(1,3-benzoxazol-5-yl)-N-[(1R)-1-cyclopropyl-2,2,2-trifluoroethyl]-7-methyl-3-pyrazolo[1,5-a]pyrimidinecarboxamide\nResult: InChI=1S\/C20H16F3N5O2\/c1-10-6-14(12-4-5-16-15(7-12)24-9-30-16)26-18-13(8-25-28(10)18)19(29)27-17(11-2-3-11)20(21,22)23\/h4-9,11,17H,2-3H2,1H3,(H,27,29)\/t17-\/m1\/s1"} {"text":"Task: Please create the SMILES representation of a compound based on the IUAPC name in CAS-like style.\nIUPAC name: 3-ethyl-5,6,7,8-tetrahydrobenzo[f]benzofuran-2-carboxylic acid\nResult: CCC1=C(OC2=C1C=C3CCCCC3=C2)C(=O)O"}", "/scratch/micpie/export/iupac_smiles/valid_27-7.jsonl": "{"text":"Task: Please give me the SMILES of a chemical given the traditional IUPAC name.\nIUPAC name: 1-[(5S)-5-(4-chlorophenyl)-3-(4-fluorophenyl)-2-pyrazolin-1-yl]-2-(N-methylanilino)ethanone\nResult: [C][N][Branch2][Ring2][Branch2][C][C][=Branch1][C][=O][N][C@@H1][Branch2][Ring1][Ring1][C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][F][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl][C][=C][C][=C][C][=C][Ring1][=Branch1]"} {"text":"Task: Please generate the SMILES of a compound given the traditional IUPAC name.\nIUPAC name: 6-[7'-[(9,9-dimethylfluoren-2-yl)amino]-9,9'-spirobi[fluorene]-2'-yl]benzene-1,2,3,4,5-pentol\nResult: CCC=CC=CC=C6C=C9C=CC=C6))NC=CC=CC=C6))C=CC5C=CC=CC=C6C=CC=CC=C%136)))))))))))))C=CC=C6))C=CC=CC=C6O))O))O))O))O))))))))))))))))))))))C"}", "/scratch/micpie/export/iupac_smiles/valid_6-4.jsonl": "{"text":"The InChI of the compound with systematic IUPAC name N-(2-azanylethyl)-2-(5-methoxypentanoylamino)benzamide is InChI=1S\/C15H23N3O3\/c1-21-11-5-4-8-14(19)18-13-7-3-2-6-12(13)15(20)17-10-9-16\/h2-3,6-7H,4-5,8-11,16H2,1H3,(H,17,20)(H,18,19)."} {"text":"The SMILES of the molecule with systematic IUPAC name 2-methoxy-5-[[(phenylmethyl)-propan-2-yl-amino]methyl]aniline is CC(C)N(CC1=CC=CC=C1)CC2=CC(=C(C=C2)OC)N."}", "/scratch/micpie/export/iupac_smiles/test_3-4.jsonl": "{"text":"The SELFIES of the chemical 
with systematic IUPAC name (2R)-1-(phenylmethylsulfanyl)-3-propan-2-yloxy-propan-2-ol is [C][C][Branch1][C][C][O][C][C@H1][Branch1][N][C][S][C][C][=C][C][=C][C][=C][Ring1][=Branch1][O]."} {"text":"The SELFIES of the chemical with systematic IUPAC name 2-azanyl-2-(hydroxymethyl)propane-1,3-diol;5-chloranyl-6-methyl-3H-1,3-benzoxazol-2-one is [C][C][=C][C][=C][Branch1][=Branch1][C][=C][Ring1][=Branch1][Cl][N][C][=Branch1][C][=O][O][Ring1][=Branch2].[C][Branch1][O][C][Branch1][Ring1][C][O][Branch1][Ring1][C][O][N][O]."}", "/scratch/micpie/export/iupac_smiles/valid_27-6.jsonl": "{"text":"The SELFIES of the molecule with preferred IUPAC name 1-[(3S)-3-(4-chlorophenyl)-5-(4-fluorophenyl)-3,4-dihydropyrazol-2-yl]-2-(N-methylanilino)ethanone is [C][N][Branch2][Ring2][Branch2][C][C][=Branch1][C][=O][N][C@@H1][Branch2][Ring1][Ring1][C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][F][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl][C][=C][C][=C][C][=C][Ring1][=Branch1]."} {"text":"The InChI of the molecule with preferred IUPAC name 6-[7'-[(9,9-dimethylfluoren-2-yl)amino]-9,9'-spirobi[fluorene]-2'-yl]benzene-1,2,3,4,5-pentol is InChI=1S\/C46H33NO5\/c1-45(2)33-12-6-3-9-27(33)30-19-16-25(22-36(30)45)47-26-17-20-32-31-18-15-24(39-40(48)42(50)44(52)43(51)41(39)49)21-37(31)46(38(32)23-26)34-13-7-4-10-28(34)29-11-5-8-14-35(29)46\/h3-23,47-52H,1-2H3."}", "/scratch/micpie/export/iupac_smiles/test_19-3.jsonl": "{"text":"The canonical SMILES of the molecule with traditional IUPAC name N-(4-chlorophenyl)-3-[4-[2-(methylamino)ethyl]piperidino]propionamide;hydrochloride is CNCCC1CCN(CCC(=O)Nc2ccc(Cl)cc2)CC1.Cl."} {"text":"The canonical SMILES of the compound with traditional IUPAC name nan is COc1cc([C@@H]2CC(=O)C[C@H](OC(C)=O)CC[C@@]34Cc5c[nH]cc5[C@H](C#CCC3=CC=C[C@@H]4C)[C@H](c3cccc(O)c3)C3=CCNC(=C3)N(CCC(C)=O)c3ccc4c5c(n2cc35)C[C@]2(CCC3(CCCC3)C2)[C@@H]4O)cc(O)c1Oc1cccc(O)c1."}", 
"/scratch/micpie/export/iupac_smiles/train_18-6.jsonl": "{"text":"The DeepSMILES of the molecule with preferred IUPAC name N-[3-[6-(6-methylpyridin-3-yl)-3,8a-dihydroquinazolin-8-yl]phenyl]prop-2-enamide is CC=NC=CC=C6))C=CC=CNC=NC6C=C%10)C=CC=CC=C6)))NC=O)C=C."} {"text":"The canonical SMILES of the compound with preferred IUPAC name 2-[1-[(3,4-difluorophenyl)methyl]piperidin-4-yl]-N-methylethanamine;hydrochloride is CNCCC1CCN(Cc2ccc(F)c(F)c2)CC1.Cl."}", "/scratch/micpie/export/iupac_smiles/valid_14-6.jsonl": "{"text":"The DeepSMILES of the compound with IUPAC name tert-butyl N-[[(1S,2S)-2-(furan-2-carbonylamino)cyclopentyl]methyl]carbamate is CCC)C)OC=O)NC[C@@H]CCC[C@@H]5NC=O)C=CC=CO5."} {"text":"The DeepSMILES of the molecule with IUPAC name tert-butyl N-[2-(5-bromo-7-chloro-1-benzofuran-2-yl)ethyl]carbamate is CCC)C)OC=O)NCCC=CC=CC=CC=C6O9))Cl)))Br."}", "/scratch/micpie/export/iupac_smiles/valid_11-5.jsonl": "{"text":"The DeepSMILES of the chemical with CAS-like IUPAC name 13-chloro-2-(4-piperidinylidene)-4-azatricyclo[9.4.0.03,8]pentadeca-1(11),3(8),4,6,12,14-hexaene;disulfate is CCC=CC=CC=C6)Cl))))C=CCCNCC6))))))C=C7C=CC=N6.[O-]S=O)=O)[O-].[O-]S=O)=O)[O-]."} {"text":"The InChI of the compound with IUAPC name in CAS-like style 4-(7-bromo-9,9-dioctyl-2-fluorenyl)benzaldehyde is InChI=1S\/C36H45BrO\/c1-3-5-7-9-11-13-23-36(24-14-12-10-8-6-4-2)34-25-30(29-17-15-28(27-38)16-18-29)19-21-32(34)33-22-20-31(37)26-35(33)36\/h15-22,25-27H,3-14,23-24H2,1-2H3."}", "/scratch/micpie/export/iupac_smiles/test_16-7.jsonl": "{"text":"Task: Please give me the SMILES representation of a molecule given the traditional IUPAC name.\nIUPAC name: 5-chloro-4-(chloromethyl)-2-(4-propoxyphenoxy)pyridine\nResult: [C][C][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][O][C][=N][C][=C][Branch1][=Branch2][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][Cl][Cl]"} {"text":"Task: Please give me the SMILES of a chemical based on the traditional IUPAC name.\nIUPAC name: 
2-ethoxy-N-(4-fluorobenzyl)-3-[4-[2-(2H-tetrazol-5-yl)phenyl]benzyl]benzimidazole-4-carboxamide\nResult: [C][C][O][C][=N][C][=C][C][=C][C][=Branch2][Ring1][P][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][N][=N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][F]"}", "/scratch/micpie/export/iupac_smiles/valid_27-8.jsonl": "{"text":"Task: Please give me the SMILES representation of a chemical based on the systematic IUPAC name.\nIUPAC name: 1-[(3S)-3-(4-chlorophenyl)-5-(4-fluorophenyl)-3,4-dihydropyrazol-2-yl]-2-[methyl(phenyl)amino]ethanone\nResult: CN(CC(=O)N1N=C(c2ccc(F)cc2)C[C@H]1c1ccc(Cl)cc1)c1ccccc1"} {"text":"Task: Please give me the SMILES representation of a compound given the systematic IUPAC name.\nIUPAC name: 6-[7'-[(9,9-dimethylfluoren-2-yl)amino]-9,9'-spirobi[fluorene]-2'-yl]benzene-1,2,3,4,5-pentol\nResult: InChI=1S\/C46H33NO5\/c1-45(2)33-12-6-3-9-27(33)30-19-16-25(22-36(30)45)47-26-17-20-32-31-18-15-24(39-40(48)42(50)44(52)43(51)41(39)49)21-37(31)46(38(32)23-26)34-13-7-4-10-28(34)29-11-5-8-14-35(29)46\/h3-23,47-52H,1-2H3"}", "/scratch/micpie/export/iupac_smiles/train_9-5.jsonl": "{"text":"The DeepSMILES of the compound with CAS-like IUPAC name 1-(tert-butylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,11,13-hexaenylideneamino)oxy-2-propanol is CCC)C)NCCCON=CC=CC=CC=C6CCC=CC=CC=C6%15))))))))))))))))))O."} {"text":"The SMILES of the molecule with IUAPC name in CAS-like style N-[(5-butyl-2,4-dimethyl-6-oxo-3,4-dihydropyridin-5-yl)methyl]-3-[cyclopentyl(methyl)amino]-2-ethenylbenzamide is CCCCC1(C(CC(=NC1=O)C)C)CNC(=O)C2=C(C(=CC=C2)N(C)C3CCCC3)C=C."}", "/scratch/micpie/export/iupac_smiles/test_10-8.jsonl": "{"text":"Task: Please give me the SMILES representation of a compound given the systematic IUPAC name.\nIUPAC name: 
1-[5-chloranyl-2-methyl-3-[piperidin-4-yl(prop-2-enyl)amino]phenyl]ethanol\nResult: [C][C][=C][Branch2][Ring1][=Branch2][C][=C][Branch2][Ring1][Ring1][C][=C][Ring1][=Branch1][N][Branch1][Ring2][C][C][=C][C][C][C][N][C][C][Ring1][=Branch1][Cl][C][Branch1][C][C][O]"} {"text":"Task: Please create the SMILES representation of a compound based on the systematic IUPAC name.\nIUPAC name: 2-[4-[3-(3-fluoranyl-6-methoxy-quinolin-4-yl)propyl]-1-(2-thiophen-3-ylsulfanylethyl)piperidin-3-yl]ethanoic acid\nResult: COC=CC=CC=CN=C6C=C%10)))))F))CCCCCCNCC6CC=O)O)))))CCSC=CSC=C5"}", "/scratch/micpie/export/iupac_smiles/train_21-1.jsonl": "{"text":"The CAS-like IUPAC name of the molecule with DeepSMILES CCCCCCCC6OCCCCCO6)C))O))O))O)))))OCCCCCO6)CO)))O))O[C@@H]CCCCCCC6)))))))C=O)O)))))OC=O)C=CC=CC=C6)))))))))))))C=O)NCCNC=O)CCCCCC6)OCCCCCO6)CO)))O))NC=CN=N5))C=CC=CC=C6)))F))))))))O)))))O))NC=CN=N5))C=CC=CC=C6)))F is (2S)-2-[[3-benzoyloxy-2-[3-ethyl-5-[[2-[[[3-[4-(3-fluorophenyl)-1-triazolyl]-5-[[4-[4-(3-fluorophenyl)-1-triazolyl]-3,5-dihydroxy-6-(hydroxymethyl)-2-oxanyl]oxy]-4-hydroxycyclohexyl]-oxomethyl]amino]ethylamino]-oxomethyl]-2-[(3,4,5-trihydroxy-6-methyl-2-oxanyl)oxy]cyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)-4-oxanyl]oxy]-3-cyclohexylpropanoic acid."} {"text":"The CAS-like IUPAC name of the compound with InChI InChI=1S\/C11H7F3N2O\/c12-11(13,14)17-9-5-3-8(4-6-9)10-2-1-7-15-16-10\/h1-7H is 3-[4-(trifluoromethoxy)phenyl]pyridazine."}", "/scratch/micpie/export/iupac_smiles/test_1-9.jsonl": "{"text":"Task: Please generate the SMILES of a chemical based on the CAS-like IUPAC name.\nIUPAC name: (3R)-3-nitro-1-cyclohexanol\nResult: C1C[C@H](CC(C1)O)[N+](=O)[O-]"} {"text":"Task: Please create the SMILES of a molecule given the IUAPC name in CAS-like style.\nIUPAC name: (2S)-1-[[(2S)-2-hydroxy-3-(1H-indol-4-yloxy)propyl]-propan-2-ylamino]-3-(1H-indol-4-yloxy)-2-propanol\nResult: CCC)NC[C@@H]COC=CC=CC=C6C=CN5)))))))))))O)))C[C@@H]COC=CC=CC=C6C=CN5)))))))))))O"}", 
"/scratch/micpie/export/iupac_smiles/test_15-6.jsonl": "{"text":"The SMILES of the compound with preferred IUPAC name (E)-3-(6-aminopyridin-3-yl)-N-[[7-(5-chloro-2,4-difluorophenyl)-5-[4-(4,4-difluoropiperidine-1-carbonyl)phenyl]-1-benzofuran-2-yl]methyl]prop-2-enamide is C1CN(CCC1(F)F)C(=O)C2=CC=C(C=C2)C3=CC(=C4C(=C3)C=C(O4)CNC(=O)\/C=C\/C5=CN=C(C=C5)N)C6=CC(=C(C=C6F)F)Cl."} {"text":"The InChI of the molecule with preferred IUPAC name 5-chloro-4-(chloromethyl)-2-(2,3-dihydro-1H-inden-5-yloxy)pyridine is InChI=1S\/C15H13Cl2NO\/c16-8-12-7-15(18-9-14(12)17)19-13-5-4-10-2-1-3-11(10)6-13\/h4-7,9H,1-3,8H2."}", "/scratch/micpie/export/iupac_smiles/valid_9-3.jsonl": "{"text":"The canonical SMILES of the chemical with traditional IUPAC name 1-(isopropylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,9,11,13-heptaenylideneamino)oxy-propan-2-ol is CC(C)NCC(O)CON=c1c2ccccc2ccc2ccccc12."} {"text":"The canonical SMILES of the chemical with traditional IUPAC name 5-chloro-3-[cyclobutyl(methyl)amino]-N-[(4-cycloheptyl-2-keto-4,6-dimethyl-3-piperidyl)methyl]-2-methyl-benzamide is Cc1c(C(=O)NCC2C(=O)NC(C)CC2(C)C2CCCCCC2)cc(Cl)cc1N(C)C1CCC1."}", "/scratch/micpie/export/iupac_smiles/test_21-8.jsonl": "{"text":"Task: Please create the SMILES representation of a molecule based on the systematic IUPAC name.\nIUPAC name: [4-[(2S)-1-(azetidin-1-yl)-3-cyclohexyl-1-oxidanylidene-propan-2-yl]oxy-2-[3-ethyl-5-[2-[[3-[6-(hydroxymethyl)-3,5-bis(oxidanyl)-4-[(2-oxidanylidenechromen-3-yl)methoxy]oxan-2-yl]oxy-4-oxidanyl-5-[(2-oxidanylidenechromen-3-yl)methoxy]cyclohexyl]carbonylamino]ethylcarbamoyl]-2-[6-methyl-3,4,5-tris(oxidanyl)oxan-2-yl]oxy-cyclohexyl]oxy-6-(hydroxymethyl)-5-oxidanyl-oxan-3-yl] benzoate\nResult: 
[C][C][C][C][C][Branch2][#Branch1][#Branch1][C][C][Branch2][Ring1][=Branch2][C][Ring1][=Branch1][O][C][C][Branch1][S][C][Branch1][N][C][Branch1][Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][O][C][C][Branch2][Ring2][#Branch1][C][Branch1][=N][C][Branch1][=Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][C@@H1][Branch1][#Branch2][C][C][C][C][C][C][C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][C][Ring1][Ring2][O][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][N][C][=Branch1][C][=O][C][C][C][Branch2][Branch1][Ring1][C][Branch2][Ring2][=C][C][Branch1][Ring2][C][Ring1][=Branch1][O][C][C][Branch2][Ring1][P][C][Branch1][=N][C][Branch1][=Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][C][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][Ring1][#Branch2][=O][O][O][O][C][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][Ring1][#Branch2][=O]"} {"text":"Task: Please create the SMILES representation of a chemical given the systematic IUPAC name.\nIUPAC name: 2-(methoxymethyl)-3-[4-(trifluoromethyloxy)phenyl]pyrazine\nResult: COCC1=NC=CN=C1C2=CC=C(C=C2)OC(F)(F)F"}", "/scratch/micpie/export/iupac_smiles/valid_20-8.jsonl": "{"text":"Task: Please give me the SMILES of a chemical based on the systematic IUPAC name.\nIUPAC name: methyl 4-acetyloxy-7-(2-oct-2-enyl-2-oxidanyl-5-oxidanylidene-cyclopent-3-en-1-ylidene)hept-5-enoate\nResult: CCCCCC=CCC1(O)C=CC(=O)C1=CC=CC(CCC(=O)OC)OC(C)=O"} {"text":"Task: Please create the SMILES of a molecule given the systematic IUPAC name.\nIUPAC name: 2-[2-[2-(2-oxidanylidenepropylamino)ethylamino]ethylamino]ethanoic acid\nResult: CC=O)CNCCNCCNCC=O)O"}", "/scratch/micpie/export/iupac_smiles/train_19-6.jsonl": "{"text":"The SMILES of the compound with preferred IUPAC name N-(4-chlorophenyl)-3-[4-[2-(methylamino)ethyl]piperidin-1-yl]propanamide is CNCCC1CCN(CC1)CCC(=O)NC2=CC=C(C=C2)Cl."} {"text":"The DeepSMILES of the compound with IUPAC name 
3-[N-[2,6-dichloro-4-[hydroxy(oxido)amino]phenyl]-C-methylcarbonimidoyl]chromen-2-one is CC=NC=CC=CC=C6Cl)))NO)[O-]))))Cl))))C=CC=CC=CC=C6OC%10=O."}", "/scratch/micpie/export/iupac_smiles/test_17-4.jsonl": "{"text":"The InChI of the chemical with systematic IUPAC name [4-[2-(2H-1,2,3,4-tetrazol-5-yl)phenyl]phenyl]methyl 2-ethoxy-1H-benzimidazole-4-carboxylate is InChI=1S\/C24H20N6O3\/c1-2-32-24-25-20-9-5-8-19(21(20)26-24)23(31)33-14-15-10-12-16(13-11-15)17-6-3-4-7-18(17)22-27-29-30-28-22\/h3-13H,2,14H2,1H3,(H,25,26)(H,27,28,29,30)."} {"text":"The canonical SMILES of the molecule with systematic IUPAC name 2-[4-[2-[3-[(2E,4Z)-hexa-2,4-dien-3-yl]-3-methyl-5-(3-methylcyclohexa-1,5-dien-1-yl)cyclohexa-1,5-dien-1-yl]-3-iodanyl-10-(4-pyridin-2-ylphenyl)-3,4-dihydroanthracen-9-yl]phenyl]pyridine is C\/C=C\\C(=C\/C)C1(C)C=C(C2=Cc3c(c(-c4ccc(-c5ccccn5)cc4)c4ccccc4c3-c3ccc(-c4ccccn4)cc3)CC2I)C=C(C2=CC(C)CC=C2)C1."}", "/scratch/micpie/export/iupac_smiles/train_15-4.jsonl": "{"text":"The SELFIES of the molecule with systematic IUPAC name 1-methyl-9H-pyrido[3,4-b]indole-7-carboxylic acid is [C][C][=N][C][=C][C][=C][Ring1][=Branch1][N][C][=C][Ring1][Branch1][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=Branch1][C][=O][O]."} {"text":"The DeepSMILES of the chemical with systematic IUPAC name 8-[5-chloranyl-4-(chloromethyl)pyridin-2-yl]oxyquinoline is C=CC=CC=C6)OC=NC=CC=C6)CCl)))Cl)))))))N=CC=C6."}", "/scratch/micpie/export/iupac_smiles/train_6-6.jsonl": "{"text":"The InChI of the compound with IUPAC name N-(2-aminoethyl)-2-[2-(oxolan-2-ylmethoxy)propanoylamino]benzamide is InChI=1S\/C17H25N3O4\/c1-12(24-11-13-5-4-10-23-13)16(21)20-15-7-3-2-6-14(15)17(22)19-9-8-18\/h2-3,6-7,12-13H,4-5,8-11,18H2,1H3,(H,19,22)(H,20,21)."} {"text":"The SMILES of the chemical with IUPAC name 3-amino-4-fluoro-N-[(1S)-1-phenylethyl]benzenesulfonamide is C[C@@H](C1=CC=CC=C1)NS(=O)(=O)C2=CC(=C(C=C2)F)N."}", "/scratch/micpie/export/iupac_smiles/valid_16-4.jsonl": "{"text":"The DeepSMILES of the 
chemical with systematic IUPAC name 5-chloranyl-4-(chloromethyl)-2-(3-iodanylphenoxy)pyridine is C=CC=CC=C6)I)))OC=NC=CC=C6)CCl)))Cl."} {"text":"The SELFIES of the compound with systematic IUPAC name N-[2-[2,5-bis(chloranyl)phenyl]ethyl]-2-ethoxy-3-[[4-[2-(2H-1,2,3,4-tetrazol-5-yl)phenyl]phenyl]methyl]benzimidazole-4-carboxamide is [C][C][O][C][=N][C][=C][C][=C][C][=Branch2][Ring1][P][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][N][=N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][=C][Branch1][#Branch2][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][Cl][Cl]."}", "/scratch/micpie/export/iupac_smiles/test_9-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the molecule with SELFIES [C][C][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C][C][Branch2][Ring1][#Branch1][C][O][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][=N][O] is 1-(9-fluorenylideneamino)oxy-3-(4-phenylbutan-2-ylamino)-2-propanol."} {"text":"The CAS-like IUPAC name of the compound with DeepSMILES CCC=O)NCC))CCCCC[C@@H]6C))C=O)NCCCCCNC6=O)))C)))C))))))))Cl is (2S)-5-chloro-N-[(4,6-dimethyl-2-oxo-3-piperidinyl)methyl]-3-[ethyl(1-oxopropyl)amino]-2-methyl-1-cyclohexanecarboxamide."}", "/scratch/micpie/export/iupac_smiles/train_0-4.jsonl": "{"text":"The canonical SMILES of the compound with systematic IUPAC name N-[(Z)-(3-bromanyl-4,5-dimethoxy-phenyl)methylideneamino]-1-[(4-nitrophenyl)methyl]pyrazole-3-carboxamide is COc1cc(\/C=N\\NC(=O)c2ccn(Cc3ccc([N+](=O)[O-])cc3)n2)cc(Br)c1OC."} {"text":"The SELFIES of the molecule with systematic IUPAC name 1-[4-[3-(2-methylpiperidin-1-yl)propoxy]phenyl]ethanone is [C][C][C][C][C][C][N][Ring1][=Branch1][C][C][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C]."}", "/scratch/micpie/export/iupac_smiles/valid_24-3.jsonl": "{"text":"The DeepSMILES of the molecule with 
traditional IUPAC name 2-(1,4-diketo-3H-phthalazin-2-yl)-N-(2-hydroxyphenyl)acetamide is C=CC=CC=C6)C=O)NNC6=O))CC=O)NC=CC=CC=C6O."} {"text":"The SELFIES of the compound with traditional IUPAC name 5-(7-fluoro-1,3-benzoxazol-5-yl)-7-methyl-N-(2,2,2-trifluoro-1-methyl-ethyl)pyrazolo[1,5-a]pyrimidine-3-carboxamide is [C][C][=C][C][=Branch2][Ring1][=C][=N][C][=C][Branch1][Branch2][C][=N][N][Ring1][=Branch2][Ring1][Branch1][C][=Branch1][C][=O][N][C][Branch1][C][C][C][Branch1][C][F][Branch1][C][F][F][C][=C][C][=C][Branch1][Branch2][C][=Branch1][Ring2][=C][Ring1][=Branch1][F][O][C][=N][Ring1][Branch2]."}", "/scratch/micpie/export/iupac_smiles/valid_10-9.jsonl": "{"text":"Task: Please create the SMILES of a chemical based on the IUAPC name in CAS-like style.\nIUPAC name: 3-[hexan-2-yl(methyl)amino]-2-methylbenzoic acid methyl ester\nResult: CCCCC(C)N(C)c1cccc(C(=O)OC)c1C"} {"text":"Task: Please create the SMILES of a chemical based on the CAS-like IUPAC name.\nIUPAC name: (4S,7R,8S,9S,13Z,16S)-16-[(Z)-1-fluoro-2-(2-pyridinyl)ethenyl]-4,8-dihydroxy-5,5,9,13-tetramethyl-7-propyl-1-azacyclohexadec-13-ene-2,6-dione\nResult: CCC[C@@H]1[C@H]([C@H](CCC\/C(=C\\C[C@H](NC(=O)C[C@@H](C(C1=O)(C)C)O)\/C(=C\/C2=CC=CC=N2)\/F)\/C)C)O"}", "/scratch/micpie/export/iupac_smiles/test_16-9.jsonl": "{"text":"Task: Please generate the SMILES representation of a chemical given the IUAPC name in CAS-like style.\nIUPAC name: 5-chloro-4-(chloromethyl)-2-(4-propoxyphenoxy)pyridine\nResult: CCCOC=CC=CC=C6))OC=NC=CC=C6)CCl)))Cl"} {"text":"Task: Please generate the SMILES representation of a molecule given the CAS-like IUPAC name.\nIUPAC name: 2-ethoxy-N-[(4-fluorophenyl)methyl]-3-[[4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl]-4-benzimidazolecarboxamide\nResult: CCOc1nc2cccc(C(=O)NCc3ccc(F)cc3)c2n1Cc1ccc(-c2ccccc2-c2nn[nH]n2)cc1"}", "/scratch/micpie/export/iupac_smiles/valid_1-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with canonical SMILES C=C.C=CCN(C)CC is 
allyl-ethyl-methyl-amine;ethylene."} {"text":"The traditional IUPAC name of the chemical with canonical SMILES CC(C)N(C[C@H](O)COc1cccc2[nH]ccc12)C[C@@H](O)COc1cccc2[nH]ccc12 is (2R)-1-[[(2S)-2-hydroxy-3-(1H-indol-4-yloxy)propyl]-isopropyl-amino]-3-(1H-indol-4-yloxy)propan-2-ol."}", "/scratch/micpie/export/iupac_smiles/valid_19-0.jsonl": "{"text":"The traditional IUPAC name of the compound with canonical SMILES CNCCC1CCN(CC(=O)Nc2cc(-n3cnnn3)ccc2Cl)CC1.Cl is N-[2-chloro-5-(tetrazol-1-yl)phenyl]-2-[4-[2-(methylamino)ethyl]piperidino]acetamide;hydrochloride."} {"text":"The traditional IUPAC name of the chemical with SMILES CC(C)C1CCC2(CCCC(=C)C2C1NC(=O)CN(CC(=O)NC3C(CCC4(C3C(=C)CCC4)C)C(C)C)CC(=O)OC)C is 2-[bis[2-[(2-isopropyl-4a-methyl-8-methylene-decalin-1-yl)amino]-2-keto-ethyl]amino]acetic acid methyl ester."}", "/scratch/micpie/export/iupac_smiles/test_9-3.jsonl": "{"text":"The SMILES of the compound with traditional IUPAC name 1-(fluoren-9-ylideneamino)oxy-3-[(1-methyl-3-phenyl-propyl)amino]propan-2-ol is CC(CCC1=CC=CC=C1)NCC(CON=C2C3=CC=CC=C3C4=CC=CC=C42)O."} {"text":"The SELFIES of the compound with traditional IUPAC name (2S)-5-chloro-3-[ethyl(propionyl)amino]-N-[(2-keto-4,6-dimethyl-3-piperidyl)methyl]-2-methyl-cyclohexanecarboxamide is [C][C][C][=Branch1][C][=O][N][Branch1][Ring1][C][C][C][C][C][Branch2][Ring1][=C][C][C][Branch1][Branch1][C@@H1][Ring1][=Branch1][C][C][=Branch1][C][=O][N][C][C][C][Branch1][O][C][C][Branch1][=Branch1][N][C][Ring1][=Branch1][=O][C][C][Cl]."}", "/scratch/micpie/export/iupac_smiles/train_16-5.jsonl": "{"text":"The canonical SMILES of the chemical with CAS-like IUPAC name 5-chloro-4-(chloromethyl)-2-(2,3,5-trimethylphenoxy)pyridine is Cc1cc(C)c(C)c(Oc2cc(CCl)c(Cl)cn2)c1."} {"text":"The DeepSMILES of the chemical with CAS-like IUPAC name 2-ethoxy-1H-benzimidazole-4-carboxylic acid propan-2-yl ester is CCOC=NC=CC=CC=C6N9)))))C=O)OCC)C."}", "/scratch/micpie/export/iupac_smiles/train_6-2.jsonl": "{"text":"The IUPAC name of the 
molecule with SELFIES [C][C][Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][N][O][C][C][C][C][C][O][Ring1][Branch1] is N-(2-aminoethyl)-2-[2-(oxolan-2-ylmethoxy)propanoylamino]benzamide."} {"text":"The preferred IUPAC name of the compound with DeepSMILES C[C@@H]C=CC=CC=C6))))))NS=O)=O)C=CC=CC=C6))F))N is 3-amino-4-fluoro-N-[(1S)-1-phenylethyl]benzenesulfonamide."}", "/scratch/micpie/export/iupac_smiles/test_0-7.jsonl": "{"text":"Task: Please give me the SMILES of a chemical based on the traditional IUPAC name.\nIUPAC name: 1-(4-bromobenzyl)-N-[(Z)-m-anisylideneamino]isonipecotamide\nResult: COC=CC=CC=C6)\/C=N\\NC=O)CCCNCC6))CC=CC=CC=C6))Br"} {"text":"Task: Please generate the SMILES of a molecule given the traditional IUPAC name.\nIUPAC name: 2-(5-methyl-1lambda3-ioda-3-thia-2-azacyclopenta-1,4-dien-4-yl)thiazole-4-carbaldehyde\nResult: CC=CSN=I5)))C=NC=CS5))C=O"}", "/scratch/micpie/export/iupac_smiles/valid_8-7.jsonl": "{"text":"Task: Please give me the SMILES representation of a molecule based on the traditional IUPAC name.\nIUPAC name: 3-[N-(2-naphthylsulfonyl)anilino]propionamide\nResult: NC(=O)CCN(c1ccccc1)S(=O)(=O)c1ccc2ccccc2c1"} {"text":"Task: Please give me the SMILES representation of a compound given the traditional IUPAC name.\nIUPAC name: 1-(isopropylamino)-3-[(E)-tetralin-1-ylideneamino]oxy-propan-2-ol;hydrochloride\nResult: InChI=1S\/C16H24N2O2.ClH\/c1-12(2)17-10-14(19)11-20-18-16-9-5-7-13-6-3-4-8-15(13)16;\/h3-4,6,8,12,14,17,19H,5,7,9-11H2,1-2H3;1H\/b18-16+;"}", "/scratch/micpie/export/iupac_smiles/test_26-3.jsonl": "{"text":"The DeepSMILES of the chemical with traditional IUPAC name 2-(2,3-dichlorophenyl)-4-ethyl-thiazole-5-carboxylic acid is CCC=CSC=N5)C=CC=CC=C6)))Cl))Cl)))))C=O)O."} {"text":"The canonical SMILES of the chemical with traditional IUPAC name 1-[(5R)-5-(2-chlorophenyl)-3-(3-methoxyphenyl)-2-pyrazolin-1-yl]-2-(3,4-dihydro-1H-isoquinolin-2-yl)ethanone is 
COc1cccc(C2=NN(C(=O)CN3CCc4ccccc4C3)[C@@H](c3ccccc3Cl)C2)c1."}", "/scratch/micpie/export/iupac_smiles/train_19-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with SMILES CNCCC1CCN(CC1)CCC(=O)NC2=CC=C(C=C2)Cl is N-(4-chlorophenyl)-3-[4-[2-(methylamino)ethyl]piperidino]propionamide."} {"text":"The traditional IUPAC name of the compound with SELFIES [C][C][=Branch2][Ring1][Branch1][=N][C][=C][Branch1][#C][C][=C][Branch1][=Branch1][C][=C][Ring1][=Branch1][Cl][N][Branch1][C][O][O-1][Cl][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][Ring1][#Branch2][=O] is 3-[N-[2,6-dichloro-4-[hydroxy(oxido)amino]phenyl]-C-methyl-carbonimidoyl]coumarin."}", "/scratch/micpie/export/iupac_smiles/valid_16-7.jsonl": "{"text":"Task: Please give me the SMILES representation of a molecule based on the traditional IUPAC name.\nIUPAC name: 5-chloro-4-(chloromethyl)-2-(3-iodophenoxy)pyridine\nResult: InChI=1S\/C12H8Cl2INO\/c13-6-8-4-12(16-7-11(8)14)17-10-3-1-2-9(15)5-10\/h1-5,7H,6H2"} {"text":"Task: Please generate the SMILES of a molecule based on the traditional IUPAC name.\nIUPAC name: N-[2-(2,5-dichlorophenyl)ethyl]-2-ethoxy-3-[4-[2-(2H-tetrazol-5-yl)phenyl]benzyl]benzimidazole-4-carboxamide\nResult: CCOc1nc2cccc(C(=O)NCCc3cc(Cl)ccc3Cl)c2n1Cc1ccc(-c2ccccc2-c2nn[nH]n2)cc1"}", "/scratch/micpie/export/iupac_smiles/valid_3-9.jsonl": "{"text":"Task: Please create the SMILES of a chemical based on the IUAPC name in CAS-like style.\nIUPAC name: (1S,3R,5S,7R)-3-hydroxy-5-methyl-1-adamantanecarboxylic acid\nResult: [C][C@][C][C@@H1][C][C@][Branch1][Ring2][C][Ring1][=Branch1][Branch1][=C][C][C@@][Branch1][Ring2][C][Ring1][#Branch1][Branch1][Ring2][C][Ring1][#Branch2][O][C][=Branch1][C][=O][O]"} {"text":"Task: Please create the SMILES of a molecule based on the IUAPC name in CAS-like style.\nIUPAC name: 2-[4-(5-hydroxy-6,7-dimethoxy-4-oxo-1-benzopyran-2-yl)phenoxy]-N-[6-[(2-methoxyphenyl)methyl-methylamino]hexyl]acetamide\nResult: 
COc1ccccc1CN(C)CCCCCCNC(=O)COc1ccc(-c2cc(=O)c3c(O)c(OC)c(OC)cc3o2)cc1"}", "/scratch/micpie/export/iupac_smiles/valid_4-6.jsonl": "{"text":"The canonical SMILES of the chemical with preferred IUPAC name N-(4-chlorophenyl)-N-[[1-[(2S,3R,4R,5S,6R)-3,4-dihydroxy-6-(hydroxymethyl)-5-[(2S,3R,4S,5R,6R)-3,4,5-trihydroxy-6-(hydroxymethyl)oxan-2-yl]oxyoxan-2-yl]triazol-4-yl]methyl]ethanesulfonamide is CCS(=O)(=O)N(Cc1cn([C@H]2O[C@H](CO)[C@@H](O[C@@H]3O[C@H](CO)[C@H](O)[C@H](O)[C@H]3O)[C@H](O)[C@H]2O)nn1)c1ccc(Cl)cc1."} {"text":"The DeepSMILES of the chemical with preferred IUPAC name 4-[1-(aminomethyl)-4-ethylcyclohexyl]-3-ethylpiperidin-4-ol is CCCCCCCC6))CN))CCCNCC6CC)))))))O."}", "/scratch/micpie/export/iupac_smiles/valid_13-4.jsonl": "{"text":"The DeepSMILES of the chemical with systematic IUPAC name 2-[[(2-azanyl-3,3-dimethyl-butanoyl)amino]methyl]-2-methyl-butanoic acid is CCCC)CNC=O)CCC)C)C))N)))))C=O)O."} {"text":"The SMILES of the molecule with systematic IUPAC name (2-chloranylpyridin-4-yl)-[(2R)-2-methyl-1,2,3,4-tetrahydro-1,5-benzodiazepin-5-yl]methanone is C[C@@H]1CCN(C2=CC=CC=C2N1)C(=O)C3=CC(=NC=C3)Cl."}", "/scratch/micpie/export/iupac_smiles/test_23-2.jsonl": "{"text":"The IUPAC name of the compound with SELFIES [C][C][C][Branch2][Ring1][P][C][N][Branch1][Ring2][C][Ring1][=Branch1][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=Branch1][=Branch2][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][O][N+1][=Branch1][C][=O][O-1][C][=O] is 1-(4-hydroxy-3-nitrophenyl)sulfonylpiperidine-3-carbaldehyde."} {"text":"The preferred IUPAC name of the chemical with canonical SMILES CCNS(=O)(=O)c1ccc(C(=O)Nn2cnc3ccccc32)cc1 is N-(benzimidazol-1-yl)-4-(ethylsulfamoyl)benzamide."}", "/scratch/micpie/export/iupac_smiles/train_4-7.jsonl": "{"text":"Task: Please generate the SMILES of a chemical based on the traditional IUPAC name.\nIUPAC name: 
(1S,2R,4aS,5R,6S,8aS)-6-(1,3-dithiolan-2-yl)-5-[(E)-2-[5-(3-fluorophenyl)-2-pyridyl]vinyl]-2-hydroxy-1,4a-dimethyl-decalin-1-carbaldehyde\nResult: [C][C@@][C][C][C@H1][Branch2][Branch1][Ring2][C@@][Branch2][Ring2][O][C@H1][Ring1][=Branch1][C][C][C@@H1][Branch2][Ring1][O][C@H1][Ring1][#Branch2][\/C][=C][\/C][=N][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][F][C][S][C][C][S][Ring1][Branch1][Branch1][C][C][C][=O][O]"} {"text":"Task: Please generate the SMILES representation of a chemical given the traditional IUPAC name.\nIUPAC name: 4-[1-(aminomethyl)propyl]-3-ethyl-piperidin-4-ol\nResult: CCC1CNCCC1(C(CC)CN)O"}", "/scratch/micpie/export/iupac_smiles/test_14-7.jsonl": "{"text":"Task: Please create the SMILES representation of a chemical based on the traditional IUPAC name.\nIUPAC name: (3S)-5-keto-1-(2,2,2-trifluoroethyl)pyrrolidine-3-carboxylic acid (1,3,5-trimethylpyrazol-4-yl) ester\nResult: Cc1nn(C)c(C)c1OC(=O)[C@H]1CC(=O)N(CC(F)(F)F)C1"} {"text":"Task: Please create the SMILES of a compound given the traditional IUPAC name.\nIUPAC name: 4-bromo-2-iodo-5-(trifluoromethyl)phenol\nResult: C=CC=CC=C6O))I)))Br))CF)F)F"}", "/scratch/micpie/export/iupac_smiles/train_0-9.jsonl": "{"text":"Task: Please generate the SMILES representation of a chemical given the CAS-like IUPAC name.\nIUPAC name: N-[(Z)-(3-bromo-4,5-dimethoxyphenyl)methylideneamino]-1-[(4-nitrophenyl)methyl]-3-pyrazolecarboxamide\nResult: COc1cc(\/C=N\\NC(=O)c2ccn(Cc3ccc([N+](=O)[O-])cc3)n2)cc(Br)c1OC"} {"text":"Task: Please generate the SMILES of a compound given the CAS-like IUPAC name.\nIUPAC name: 1-[4-[3-(2-methyl-1-piperidinyl)propoxy]phenyl]ethanone\nResult: [C][C][C][C][C][C][N][Ring1][=Branch1][C][C][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C]"}", "/scratch/micpie/export/iupac_smiles/valid_26-1.jsonl": "{"text":"The CAS-like IUPAC name of the molecule with DeepSMILES 
CCC=CSC=N5)C=CC=CC=C6))CF)F))))))))C=O)O is 2-[4-(difluoromethyl)phenyl]-4-ethyl-5-thiazolecarboxylic acid."} {"text":"The IUAPC name in CAS-like style of the molecule with SMILES COC1=CC=CC(=C1)C2=NN([C@H](C2)C3=CC=CC=C3Cl)C(=O)CN4CCN(CC4)CC5=CC=CC=C5 is 1-[(3R)-3-(2-chlorophenyl)-5-(3-methoxyphenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(phenylmethyl)-1-piperazinyl]ethanone."}", "/scratch/micpie/export/iupac_smiles/valid_27-3.jsonl": "{"text":"The DeepSMILES of the molecule with traditional IUPAC name 1-[(5S)-5-(4-chlorophenyl)-3-(4-fluorophenyl)-2-pyrazolin-1-yl]-2-(N-methylanilino)ethanone is CNCC=O)N[C@@H]CC=N5)C=CC=CC=C6))F)))))))C=CC=CC=C6))Cl)))))))))C=CC=CC=C6."} {"text":"The SMILES of the molecule with traditional IUPAC name 6-[7'-[(9,9-dimethylfluoren-2-yl)amino]-9,9'-spirobi[fluorene]-2'-yl]benzene-1,2,3,4,5-pentol is CC1(C2=CC=CC=C2C3=C1C=C(C=C3)NC4=CC5=C(C=C4)C6=C(C57C8=CC=CC=C8C9=CC=CC=C79)C=C(C=C6)C1=C(C(=C(C(=C1O)O)O)O)O)C."}", "/scratch/micpie/export/iupac_smiles/train_7-3.jsonl": "{"text":"The InChI of the chemical with traditional IUPAC name 2-[(5-amino-2-chloro-phenyl)sulfonylamino]acetic acid methyl ester is InChI=1S\/C9H11ClN2O4S\/c1-16-9(13)5-12-17(14,15)8-4-6(11)2-3-7(8)10\/h2-4,12H,5,11H2,1H3."} {"text":"The InChI of the molecule with traditional IUPAC name 3-(N-(2-nitrophenyl)sulfonylanilino)propionamide is InChI=1S\/C15H15N3O5S\/c16-15(19)10-11-17(12-6-2-1-3-7-12)24(22,23)14-9-5-4-8-13(14)18(20)21\/h1-9H,10-11H2,(H2,16,19)."}", "/scratch/micpie/export/iupac_smiles/test_14-6.jsonl": "{"text":"The canonical SMILES of the compound with IUPAC name (1,3,5-trimethylpyrazol-4-yl) (3S)-5-oxo-1-(2,2,2-trifluoroethyl)pyrrolidine-3-carboxylate is Cc1nn(C)c(C)c1OC(=O)[C@H]1CC(=O)N(CC(F)(F)F)C1."} {"text":"The canonical SMILES of the chemical with IUPAC name 4-bromo-2-iodo-5-(trifluoromethyl)phenol is Oc1cc(C(F)(F)F)c(Br)cc1I."}", "/scratch/micpie/export/iupac_smiles/valid_26-2.jsonl": "{"text":"The preferred IUPAC name of the chemical with canonical SMILES 
CCc1nc(-c2ccc(C(F)F)cc2)sc1C(=O)O is 2-[4-(difluoromethyl)phenyl]-4-ethyl-1,3-thiazole-5-carboxylic acid."} {"text":"The preferred IUPAC name of the compound with canonical SMILES COc1cccc(C2=NN(C(=O)CN3CCN(Cc4ccccc4)CC3)[C@@H](c3ccccc3Cl)C2)c1 is 2-(4-benzylpiperazin-1-yl)-1-[(3R)-3-(2-chlorophenyl)-5-(3-methoxyphenyl)-3,4-dihydropyrazol-2-yl]ethanone."}", "/scratch/micpie/export/iupac_smiles/valid_4-5.jsonl": "{"text":"The SELFIES of the molecule with IUAPC name in CAS-like style N-(4-chlorophenyl)-N-[[1-[(2S,3R,4R,5S,6R)-3,4-dihydroxy-6-(hydroxymethyl)-5-[[(2S,3R,4S,5R,6R)-3,4,5-trihydroxy-6-(hydroxymethyl)-2-oxanyl]oxy]-2-oxanyl]-4-triazolyl]methyl]ethanesulfonamide is [C][C][S][=Branch1][C][=O][=Branch1][C][=O][N][Branch2][Branch1][#Branch1][C][C][=C][N][Branch1][Branch1][N][=N][Ring1][Branch1][C@@H1][C@@H1][Branch2][Ring2][#Branch1][C@H1][Branch2][Ring2][C][C@@H1][Branch1][=Branch2][C@H1][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][C@H1][C@@H1][Branch1][P][C@H1][Branch1][=N][C@H1][Branch1][=Branch2][C@H1][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][O][O][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl]."} {"text":"The SMILES of the compound with CAS-like IUPAC name 4-[1-(aminomethyl)-4-ethylcyclohexyl]-3-ethyl-4-piperidinol is CCC1CCC(CC1)(CN)C2(CCNCC2CC)O."}", "/scratch/micpie/export/iupac_smiles/train_4-3.jsonl": "{"text":"The InChI of the molecule with traditional IUPAC name (1S,2R,4aS,5R,6S,8aS)-6-(1,3-dithiolan-2-yl)-5-[(E)-2-[5-(3-fluorophenyl)-2-pyridyl]vinyl]-2-hydroxy-1,4a-dimethyl-decalin-1-carbaldehyde is InChI=1S\/C29H34FNO2S2\/c1-28-13-12-26(33)29(2,18-32)25(28)11-9-23(27-34-14-15-35-27)24(28)10-8-22-7-6-20(17-31-22)19-4-3-5-21(30)16-19\/h3-8,10,16-18,23-27,33H,9,11-15H2,1-2H3\/b10-8+\/t23-,24+,25-,26+,28-,29-\/m0\/s1."} {"text":"The InChI of the compound with traditional IUPAC name 4-[1-(aminomethyl)propyl]-3-ethyl-piperidin-4-ol is InChI=1S\/C11H24N2O\/c1-3-9(7-12)11(14)5-6-13-8-10(11)4-2\/h9-10,13-14H,3-8,12H2,1-2H3."}", 
"/scratch/micpie/export/iupac_smiles/valid_6-2.jsonl": "{"text":"The preferred IUPAC name of the compound with SMILES COCCCCC(=O)NC1=CC=CC=C1C(=O)NCCN is N-(2-aminoethyl)-2-(5-methoxypentanoylamino)benzamide."} {"text":"The IUPAC name of the chemical with SMILES CC(C)N(CC1=CC=CC=C1)CC2=CC(=C(C=C2)OC)N is 5-[[benzyl(propan-2-yl)amino]methyl]-2-methoxyaniline."}", "/scratch/micpie/export/iupac_smiles/test_15-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with SELFIES [C][C][N][Branch1][#Branch2][C][C][C][Ring1][=Branch1][Branch1][C][F][F][C][=Branch1][C][=O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=Branch2][Ring2][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][Branch1][Ring2][O][Ring1][=Branch1][C][N][C][=Branch1][C][=O][\/C][=C][\/C][=C][N][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][C][=C][C][=Branch1][#Branch2][=C][Branch1][=Branch1][C][=C][Ring1][=Branch1][F][F][Cl] is (E)-3-(6-amino-3-pyridyl)-N-[[7-(5-chloro-2,4-difluoro-phenyl)-5-[4-(4,4-difluoropiperidine-1-carbonyl)phenyl]benzofuran-2-yl]methyl]acrylamide."} {"text":"The traditional IUPAC name of the molecule with InChI InChI=1S\/C15H13Cl2NO\/c16-8-12-7-15(18-9-14(12)17)19-13-5-4-10-2-1-3-11(10)6-13\/h4-7,9H,1-3,8H2 is 5-chloro-4-(chloromethyl)-2-indan-5-yloxy-pyridine."}", "/scratch/micpie/export/iupac_smiles/valid_14-4.jsonl": "{"text":"The SMILES of the chemical with systematic IUPAC name tert-butyl N-[[(1S,2S)-2-(furan-2-ylcarbonylamino)cyclopentyl]methyl]carbamate is CC(C)(C)OC(=O)NC[C@@H]1CCC[C@@H]1NC(=O)C2=CC=CO2."} {"text":"The DeepSMILES of the molecule with systematic IUPAC name tert-butyl N-[2-(5-bromanyl-7-chloranyl-1-benzofuran-2-yl)ethyl]carbamate is CCC)C)OC=O)NCCC=CC=CC=CC=C6O9))Cl)))Br."}", "/scratch/micpie/export/iupac_smiles/train_11-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the compound with DeepSMILES CCCCCCC)C)O))O))))CCC[C@@]C5CCC=C6CCCC6CCCC6C)C))OC=O)C))))))C)))))))))C))C is acetic acid 
[(14R)-17-(5,6-dihydroxy-6-methylheptan-2-yl)-4,4,10,13,14-pentamethyl-2,3,5,6,7,11,12,15,16,17-decahydro-1H-cyclopenta[a]phenanthren-3-yl] ester."} {"text":"The CAS-like IUPAC name of the compound with InChI InChI=1S\/C45H33NO\/c1-45(2)40-19-8-6-16-37(40)38-27-26-35(29-41(38)45)46(33-24-22-31(23-25-33)30-12-4-3-5-13-30)34-15-10-14-32(28-34)36-18-11-21-43-44(36)39-17-7-9-20-42(39)47-43\/h3-29H,1-2H3 is N-[3-(1-dibenzofuranyl)phenyl]-9,9-dimethyl-N-(4-phenylphenyl)-2-fluorenamine."}", "/scratch/micpie/export/iupac_smiles/valid_10-3.jsonl": "{"text":"The InChI of the compound with traditional IUPAC name 2-methyl-3-[methyl(1-methylpentyl)amino]benzoic acid methyl ester is InChI=1S\/C16H25NO2\/c1-6-7-9-12(2)17(4)15-11-8-10-14(13(15)3)16(18)19-5\/h8,10-12H,6-7,9H2,1-5H3."} {"text":"The DeepSMILES of the molecule with traditional IUPAC name (4S,7R,8S,9S,13Z,16S)-16-[(Z)-1-fluoro-2-(2-pyridyl)vinyl]-4,8-dihydroxy-5,5,9,13-tetramethyl-7-propyl-1-azacyclohexadec-13-ene-2,6-quinone is CCC[C@@H][C@H][C@H]CCC\/C=C\\C[C@H]NC=O)C[C@@H]CC%16=O))C)C))O)))))\/C=C\/C=CC=CC=N6)))))))\/F)))))\/C)))))C))O."}", "/scratch/micpie/export/iupac_smiles/test_25-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the molecule with SMILES CC1=CC(=NC2=C(C=NN12)C(=O)N[C@H](C3CC3)C(F)(F)F)C4=CC5=C(C=C4)OC=N5 is 5-(1,3-benzoxazol-5-yl)-N-[(1R)-1-cyclopropyl-2,2,2-trifluoroethyl]-7-methyl-3-pyrazolo[1,5-a]pyrimidinecarboxamide."} {"text":"The CAS-like IUPAC name of the molecule with InChI InChI=1S\/C15H16O3\/c1-2-11-12-7-9-5-3-4-6-10(9)8-13(12)18-14(11)15(16)17\/h7-8H,2-6H2,1H3,(H,16,17) is 3-ethyl-5,6,7,8-tetrahydrobenzo[f]benzofuran-2-carboxylic acid."}", "/scratch/micpie/export/iupac_smiles/valid_16-9.jsonl": "{"text":"Task: Please create the SMILES representation of a molecule based on the IUAPC name in CAS-like style.\nIUPAC name: 5-chloro-4-(chloromethyl)-2-(3-iodophenoxy)pyridine\nResult: C=CC=CC=C6)I)))OC=NC=CC=C6)CCl)))Cl"} {"text":"Task: Please generate the SMILES representation of a 
compound given the CAS-like IUPAC name.\nIUPAC name: N-[2-(2,5-dichlorophenyl)ethyl]-2-ethoxy-3-[[4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl]-4-benzimidazolecarboxamide\nResult: CCOc1nc2cccc(C(=O)NCCc3cc(Cl)ccc3Cl)c2n1Cc1ccc(-c2ccccc2-c2nn[nH]n2)cc1"}", "/scratch/micpie/export/iupac_smiles/test_24-8.jsonl": "{"text":"Task: Please generate the SMILES of a compound based on the systematic IUPAC name.\nIUPAC name: 3,5-bis(fluoranyl)-N-(5-sulfanylidene-1H-1,2,4-triazol-4-yl)benzamide\nResult: C1=C(C=C(C=C1F)F)C(=O)NN2C=NNC2=S"} {"text":"Task: Please generate the SMILES representation of a chemical based on the systematic IUPAC name.\nIUPAC name: 7-methyl-5-[6-(trifluoromethyl)pyridin-3-yl]-N-[1,1,1-tris(fluoranyl)propan-2-yl]pyrazolo[1,5-a]pyrimidine-3-carboxamide\nResult: CC1=CC(=NC2=C(C=NN12)C(=O)NC(C)C(F)(F)F)C3=CN=C(C=C3)C(F)(F)F"}", "/scratch/micpie/export/iupac_smiles/train_4-6.jsonl": "{"text":"The canonical SMILES of the chemical with preferred IUPAC name (1S,2R,4aS,5R,6S,8aS)-6-(1,3-dithiolan-2-yl)-5-[(E)-2-[5-(3-fluorophenyl)pyridin-2-yl]ethenyl]-2-hydroxy-1,4a-dimethyl-2,3,4,5,6,7,8,8a-octahydronaphthalene-1-carbaldehyde is C[C@@]1(C=O)[C@H](O)CC[C@@]2(C)[C@H](\/C=C\/c3ccc(-c4cccc(F)c4)cn3)[C@@H](C3SCCS3)CC[C@H]12."} {"text":"The SMILES of the molecule with preferred IUPAC name 4-(1-aminobutan-2-yl)-3-ethylpiperidin-4-ol is CCC1CNCCC1(C(CC)CN)O."}", "/scratch/micpie/export/iupac_smiles/test_19-2.jsonl": "{"text":"The IUPAC name of the compound with SMILES CNCCC1CCN(CC1)CCC(=O)NC2=CC=C(C=C2)Cl.Cl is N-(4-chlorophenyl)-3-[4-[2-(methylamino)ethyl]piperidin-1-yl]propanamide;hydrochloride."} {"text":"The IUPAC name of the molecule with SMILES C[C@H]1C=CC=C2[C@]13CC[C@H](CC(=O)C[C@H](N4C=C5C(=CC=C6C5=C4C[C@@]7([C@@H]6O)CCC8(C7)CCCC8)N(C9=CC(=CCN9)[C@H]([C@@H](C#CC2)C1=CNC=C1C3)C1=CC(=CC=C1)O)CCC(=O)C)C1=CC(=C(C(=C1)OC)OC1=CC=CC(=C1)O)O)OC(=O)C is nan."}", "/scratch/micpie/export/iupac_smiles/train_2-9.jsonl": "{"text":"Task: Please give me the SMILES 
representation of a molecule based on the IUAPC name in CAS-like style.\nIUPAC name: (6R,8R,9S,13S,14S,17S)-3-methoxy-13-methyl-17-[[(2S)-2-oxanyl]oxy]-6,7,8,9,11,12,14,15,16,17-decahydrocyclopenta[a]phenanthren-6-ol\nResult: C[C@]CC[C@H][C@H][C@@H]6CC[C@@H]9O[C@H]CCCCO6)))))))))))C[C@H]C=C6C=CC=C6)OC)))))))O"} {"text":"Task: Please generate the SMILES representation of a chemical given the IUAPC name in CAS-like style.\nIUPAC name: (1S,3R,5S,7R)-3-hydroxy-5-methyl-1-adamantanecarboxylate\nResult: [C][C@][C][C@@H1][C][C@][Branch1][Ring2][C][Ring1][=Branch1][Branch1][=C][C][C@@][Branch1][Ring2][C][Ring1][#Branch1][Branch1][Ring2][C][Ring1][#Branch2][O][C][=Branch1][C][=O][O-1]"}", "/scratch/micpie/export/iupac_smiles/valid_15-2.jsonl": "{"text":"The preferred IUPAC name of the chemical with InChI InChI=1S\/C23H21F5N2O5S\/c1-36(32,33)34-9-4-17-11-16-10-15(12-18(20(16)35-17)23(26,27)28)19-3-2-14(13-29-19)21(31)30-7-5-22(24,25)6-8-30\/h2-3,10-13H,4-9H2,1H3 is 2-[5-[5-(4,4-difluoropiperidine-1-carbonyl)pyridin-2-yl]-7-(trifluoromethyl)-1-benzofuran-2-yl]ethyl methanesulfonate."} {"text":"The preferred IUPAC name of the compound with InChI InChI=1S\/C16H11Cl2NO\/c17-9-12-8-16(19-10-14(12)18)20-15-7-3-5-11-4-1-2-6-13(11)15\/h1-8,10H,9H2 is 5-chloro-4-(chloromethyl)-2-naphthalen-1-yloxypyridine."}", "/scratch/micpie/export/iupac_smiles/valid_25-1.jsonl": "{"text":"The CAS-like IUPAC name of the molecule with InChI InChI=1S\/C18H18ClN5O\/c1-10-5-16(13-6-14(19)8-20-7-13)23-17-15(9-21-24(10)17)18(25)22-11(2)12-3-4-12\/h5-9,11-12H,3-4H2,1-2H3,(H,22,25)\/t11-\/m0\/s1 is 5-(5-chloro-3-pyridinyl)-N-[(1S)-1-cyclopropylethyl]-7-methyl-3-pyrazolo[1,5-a]pyrimidinecarboxamide."} {"text":"The CAS-like IUPAC name of the chemical with SMILES CCC1=C(SC(=N1)C(C2=CC=CC=C2)N)C(=O)O is 2-[amino(phenyl)methyl]-4-ethyl-5-thiazolecarboxylic acid."}", "/scratch/micpie/export/iupac_smiles/train_17-7.jsonl": "{"text":"Task: Please give me the SMILES representation of a chemical given the 
traditional IUPAC name.\nIUPAC name: 2-ethoxy-N-(4-pyridyl)-3-[4-[2-(2H-tetrazol-5-yl)phenyl]benzyl]benzimidazole-4-carboxamide\nResult: CCOc1nc2cccc(C(=O)Nc3ccncc3)c2n1Cc1ccc(-c2ccccc2-c2nn[nH]n2)cc1"} {"text":"Task: Please create the SMILES of a molecule given the traditional IUPAC name.\nIUPAC name: 6-(2-fluorophenyl)-8-(1-propylbutyl)quinazoline\nResult: CCCC(CCC)C1=CC(=CC2=CN=CN=C12)C3=CC=CC=C3F"}", "/scratch/micpie/export/iupac_smiles/train_24-5.jsonl": "{"text":"The SMILES of the molecule with IUAPC name in CAS-like style N-(1-benzimidazolyl)-2-[(1-oxo-2-phenoxyethyl)amino]acetamide is C1=CC=C(C=C1)OCC(=O)NCC(=O)NN2C=NC3=CC=CC=C32."} {"text":"The DeepSMILES of the molecule with IUAPC name in CAS-like style N-[(1S)-1-cyclopropyl-2,2,2-trifluoroethyl]-7-methyl-5-[4-(trifluoromethyl)-3-pyridinyl]-3-pyrazolo[1,5-a]pyrimidinecarboxamide is CC=CC=NC=CC=NN95)))C=O)N[C@@H]CCC3)))CF)F)F))))))))C=CC=CN=C6))))CF)F)F."}", "/scratch/micpie/export/iupac_smiles/test_13-0.jsonl": "{"text":"The traditional IUPAC name of the compound with InChI InChI=1S\/C10H22N2O3\/c1-9(2,3)7(11)8(15)12-10(4,5-13)6-14\/h7,13-14H,5-6,11H2,1-4H3,(H,12,15) is 2-amino-N-(2-hydroxy-1-methyl-1-methylol-ethyl)-3,3-dimethyl-butyramide."} {"text":"The traditional IUPAC name of the chemical with DeepSMILES CC=CC=NN5C)))C))OC=O)[C@H]CCOC=CC=CC=C%106 is (4S)-chroman-4-carboxylic acid (1,3,5-trimethylpyrazol-4-yl) ester."}", "/scratch/micpie/export/iupac_smiles/test_9-7.jsonl": "{"text":"Task: Please generate the SMILES of a chemical based on the traditional IUPAC name.\nIUPAC name: 1-(fluoren-9-ylideneamino)oxy-3-[(1-methyl-3-phenyl-propyl)amino]propan-2-ol\nResult: CCCCC=CC=CC=C6))))))))NCCCON=CC=CC=CC=C6C=CC=CC=C6%13))))))))))))))))O"} {"text":"Task: Please generate the SMILES representation of a compound given the traditional IUPAC name.\nIUPAC name: (2S)-5-chloro-3-[ethyl(propionyl)amino]-N-[(2-keto-4,6-dimethyl-3-piperidyl)methyl]-2-methyl-cyclohexanecarboxamide\nResult: 
CCC=O)NCC))CCCCC[C@@H]6C))C=O)NCCCCCNC6=O)))C)))C))))))))Cl"}", "/scratch/micpie/export/iupac_smiles/valid_20-2.jsonl": "{"text":"The preferred IUPAC name of the molecule with canonical SMILES CCCCCC=CCC1(O)C=CC(=O)C1=CC=CC(CCC(=O)OC)OC(C)=O is methyl 4-acetyloxy-7-(2-hydroxy-2-oct-2-enyl-5-oxocyclopent-3-en-1-ylidene)hept-5-enoate."} {"text":"The preferred IUPAC name of the compound with DeepSMILES CC=O)CNCCNCCNCC=O)O is 2-[2-[2-(2-oxopropylamino)ethylamino]ethylamino]acetic acid."}", "/scratch/micpie/export/iupac_smiles/test_8-6.jsonl": "{"text":"The SELFIES of the compound with IUPAC name N-[3-[(3,4-dimethylphenyl)sulfonylamino]phenyl]-2-methylpropanamide is [C][C][=C][Branch2][Ring2][Branch1][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][S][=Branch1][C][=O][=Branch1][C][=O][N][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][Branch1][C][C][C][C]."} {"text":"The SMILES of the compound with preferred IUPAC name 1-(benzhydrylideneamino)oxy-3-(tert-butylamino)propan-2-ol is CC(C)(C)NCC(CON=C(C1=CC=CC=C1)C2=CC=CC=C2)O."}", "/scratch/micpie/export/iupac_smiles/test_4-3.jsonl": "{"text":"The SMILES of the chemical with traditional IUPAC name 1-(3-methoxyphenyl)sulfonyl-3-[(4-methylpiperazino)methyl]indole is CN1CCN(CC1)CC2=CN(C3=CC=CC=C32)S(=O)(=O)C4=CC=CC(=C4)OC."} {"text":"The SMILES of the chemical with traditional IUPAC name 4-(2-amino-1,1-dimethyl-ethyl)-3-ethyl-piperidin-4-ol is CCC1CNCCC1(C(C)(C)CN)O."}", "/scratch/micpie/export/iupac_smiles/test_16-8.jsonl": "{"text":"Task: Please create the SMILES representation of a molecule given the systematic IUPAC name.\nIUPAC name: 5-chloranyl-4-(chloromethyl)-2-(4-propoxyphenoxy)pyridine\nResult: InChI=1S\/C15H15Cl2NO2\/c1-2-7-19-12-3-5-13(6-4-12)20-15-8-11(9-16)14(17)10-18-15\/h3-6,8,10H,2,7,9H2,1H3"} {"text":"Task: Please generate the SMILES of a compound based on the systematic IUPAC name.\nIUPAC name: 
2-ethoxy-N-[(4-fluorophenyl)methyl]-3-[[4-[2-(2H-1,2,3,4-tetrazol-5-yl)phenyl]phenyl]methyl]benzimidazole-4-carboxamide\nResult: InChI=1S\/C31H26FN7O2\/c1-2-41-31-34-27-9-5-8-26(30(40)33-18-20-12-16-23(32)17-13-20)28(27)39(31)19-21-10-14-22(15-11-21)24-6-3-4-7-25(24)29-35-37-38-36-29\/h3-17H,2,18-19H2,1H3,(H,33,40)(H,35,36,37,38)"}", "/scratch/micpie/export/iupac_smiles/valid_6-3.jsonl": "{"text":"The DeepSMILES of the chemical with traditional IUPAC name N-(2-aminoethyl)-2-(5-methoxypentanoylamino)benzamide is COCCCCC=O)NC=CC=CC=C6C=O)NCCN."} {"text":"The DeepSMILES of the molecule with traditional IUPAC name (3-amino-4-methoxy-benzyl)-benzyl-isopropyl-amine is CCC)NCC=CC=CC=C6)))))))CC=CC=CC=C6))OC)))N."}", "/scratch/micpie/export/iupac_smiles/test_24-4.jsonl": "{"text":"The DeepSMILES of the molecule with systematic IUPAC name 3,5-bis(fluoranyl)-N-(5-sulfanylidene-1H-1,2,4-triazol-4-yl)benzamide is C=CC=CC=C6F)))F)))C=O)NNC=NNC5=S."} {"text":"The DeepSMILES of the chemical with systematic IUPAC name 7-methyl-5-[6-(trifluoromethyl)pyridin-3-yl]-N-[1,1,1-tris(fluoranyl)propan-2-yl]pyrazolo[1,5-a]pyrimidine-3-carboxamide is CC=CC=NC=CC=NN95)))C=O)NCC)CF)F)F))))))))C=CN=CC=C6))CF)F)F."}", "/scratch/micpie/export/iupac_smiles/train_22-3.jsonl": "{"text":"The InChI of the compound with traditional IUPAC name 5-mesyl-6-methyl-5-(5-methyl-1,3,4-oxadiazol-2-yl)-4-(trifluoromethyl)cyclohexa-1,3-diene-1-carboxamide is InChI=1S\/C13H14F3N3O4S\/c1-6-8(10(17)20)4-5-9(13(14,15)16)12(6,24(3,21)22)11-19-18-7(2)23-11\/h4-6H,1-3H3,(H2,17,20)."} {"text":"The SMILES of the molecule with traditional IUPAC name 1-propylsulfonylnipecotaldehyde is CCCS(=O)(=O)N1CCCC(C1)C=O."}", "/scratch/micpie/export/iupac_smiles/train_6-0.jsonl": "{"text":"The traditional IUPAC name of the chemical with SMILES CC(C(=O)NC1=CC=CC=C1C(=O)NCCN)OCC2CCCO2 is N-(2-aminoethyl)-2-[2-(tetrahydrofurfuryloxy)propanoylamino]benzamide."} {"text":"The traditional IUPAC name of the compound with InChI 
InChI=1S\/C14H15FN2O2S\/c1-10(11-5-3-2-4-6-11)17-20(18,19)12-7-8-13(15)14(16)9-12\/h2-10,17H,16H2,1H3\/t10-\/m0\/s1 is 3-amino-4-fluoro-N-[(1S)-1-phenylethyl]benzenesulfonamide."}", "/scratch/micpie/export/iupac_smiles/test_3-7.jsonl": "{"text":"Task: Please give me the SMILES representation of a chemical given the traditional IUPAC name.\nIUPAC name: (2R)-1-(benzylthio)-3-isopropoxy-propan-2-ol\nResult: [C][C][Branch1][C][C][O][C][C@H1][Branch1][N][C][S][C][C][=C][C][=C][C][=C][Ring1][=Branch1][O]"} {"text":"Task: Please generate the SMILES representation of a compound given the traditional IUPAC name.\nIUPAC name: 2-amino-2-methylol-propane-1,3-diol;5-chloro-6-methyl-3H-1,3-benzoxazol-2-one\nResult: Cc1cc2oc(=O)[nH]c2cc1Cl.NC(CO)(CO)CO"}", "/scratch/micpie/export/iupac_smiles/valid_22-9.jsonl": "{"text":"Task: Please give me the SMILES representation of a compound based on the CAS-like IUPAC name.\nIUPAC name: 1-cyclopropyl-3-ethynyl-2-(trifluoromethyl)benzene\nResult: InChI=1S\/C12H9F3\/c1-2-8-4-3-5-10(9-6-7-9)11(8)12(13,14)15\/h1,3-5,9H,6-7H2"} {"text":"Task: Please create the SMILES of a compound given the CAS-like IUPAC name.\nIUPAC name: 1-(3-fluoro-4-methoxyphenyl)sulfonyl-3-piperidinecarboxaldehyde\nResult: COc1ccc(S(=O)(=O)N2CCCC(C=O)C2)cc1F"}", "/scratch/micpie/export/iupac_smiles/train_8-2.jsonl": "{"text":"The preferred IUPAC name of the compound with SELFIES [C][C][=C][Branch1][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N+1][=Branch1][C][=O][O-1][S][=Branch1][C][=O][=Branch1][C][=O][N][Branch1][Branch2][C][C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1] is 3-(N-(2-methyl-5-nitrophenyl)sulfonylanilino)propanamide."} {"text":"The IUPAC name of the chemical with SELFIES [C][C][Branch1][C][C][N][C][C][Branch2][Ring1][=Branch2][C][O][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][#C][O].[Cl] is 
1-(propan-2-ylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,11,13-hexaenylideneamino)oxypropan-2-ol;hydrochloride."}", "/scratch/micpie/export/iupac_smiles/train_3-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the molecule with DeepSMILES CCCNC5)C=O)C=CC=CO5)[N+]=O)[O-] is (5-nitro-2-furanyl)-(1-pyrrolidinyl)methanone."} {"text":"The IUAPC name in CAS-like style of the compound with SELFIES [C][C][C][=Branch1][C][=O][N][Branch2][Branch1][Branch1][C][=C][Ring1][#Branch1][C][=C][C][=C][Ring1][=Branch1][O][C][\/C][=C][\/C][O][C@H1][C][C@H1][Branch2][Ring1][=N][N][Branch1][Ring2][C][Ring1][Branch1][C][=N][C][=C][Branch1][S][C][=N][N][Ring1][Branch1][C][=Branch1][Ring2][=C][Ring1][=Branch2][N][Ring2][Ring1][Branch2][Cl][C][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1] is (12S,14S,17E)-7-chloro-12-(hydroxymethyl)-23-(phenylmethyl)-15,20-dioxa-2,4,5,9,11,23-hexazahexacyclo[19.7.1.13,10.111,14.04,8.022,27]hentriaconta-1(28),3(31),5,7,9,17,21(29),22(27)-octaen-24-one."}", "/scratch/micpie/export/iupac_smiles/valid_18-8.jsonl": "{"text":"Task: Please generate the SMILES representation of a compound given the systematic IUPAC name.\nIUPAC name: 8-(1-methylpyrrolidin-3-yl)-6-pyridin-3-yl-quinazoline\nResult: CN1CCC(C1)C2=CC(=CC3=CN=CN=C23)C4=CN=CC=C4"} {"text":"Task: Please give me the SMILES of a chemical given the systematic IUPAC name.\nIUPAC name: 2-[4-[2-(methylamino)ethyl]piperidin-1-yl]-1-(2-thiophen-2-ylpyrrolidin-1-yl)propan-1-one\nResult: InChI=1S\/C19H31N3OS\/c1-15(21-12-8-16(9-13-21)7-10-20-2)19(23)22-11-3-5-17(22)18-6-4-14-24-18\/h4,6,14-17,20H,3,5,7-13H2,1-2H3"}", "/scratch/micpie/export/iupac_smiles/train_18-3.jsonl": "{"text":"The SELFIES of the compound with traditional IUPAC name N-[3-[6-(6-methyl-3-pyridyl)-3,8a-dihydroquinazolin-8-yl]phenyl]acrylamide is 
[C][C][=N][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][N][C][=N][C][Ring1][=Branch1][C][=Branch1][Ring2][=C][Ring1][#Branch2][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=C]."} {"text":"The SMILES of the molecule with traditional IUPAC name 2-[1-(3,4-difluorobenzyl)-4-piperidyl]ethyl-methyl-amine;hydrochloride is CNCCC1CCN(CC1)CC2=CC(=C(C=C2)F)F.Cl."}", "/scratch/micpie/export/iupac_smiles/valid_4-9.jsonl": "{"text":"Task: Please give me the SMILES representation of a molecule based on the CAS-like IUPAC name.\nIUPAC name: N-(4-chlorophenyl)-N-[[1-[(2S,3R,4R,5S,6R)-3,4-dihydroxy-6-(hydroxymethyl)-5-[[(2S,3R,4S,5R,6R)-3,4,5-trihydroxy-6-(hydroxymethyl)-2-oxanyl]oxy]-2-oxanyl]-4-triazolyl]methyl]ethanesulfonamide\nResult: InChI=1S\/C23H33ClN4O12S\/c1-2-41(36,37)28(13-5-3-11(24)4-6-13)8-12-7-27(26-25-12)22-19(34)18(33)21(15(10-30)38-22)40-23-20(35)17(32)16(31)14(9-29)39-23\/h3-7,14-23,29-35H,2,8-10H2,1H3\/t14-,15-,16+,17+,18-,19-,20-,21-,22+,23+\/m1\/s1"} {"text":"Task: Please generate the SMILES representation of a molecule based on the IUAPC name in CAS-like style.\nIUPAC name: 4-[1-(aminomethyl)-4-ethylcyclohexyl]-3-ethyl-4-piperidinol\nResult: CCC1CCC(CC1)(CN)C2(CCNCC2CC)O"}", "/scratch/micpie/export/iupac_smiles/test_8-0.jsonl": "{"text":"The traditional IUPAC name of the compound with InChI InChI=1S\/C18H22N2O3S\/c1-12(2)18(21)19-15-6-5-7-16(11-15)20-24(22,23)17-9-8-13(3)14(4)10-17\/h5-12,20H,1-4H3,(H,19,21) is N-[3-[(3,4-dimethylphenyl)sulfonylamino]phenyl]-2-methyl-propionamide."} {"text":"The traditional IUPAC name of the chemical with DeepSMILES CCC)C)NCCCON=CC=CC=CC=C6))))))C=CC=CC=C6))))))))))O is 1-(benzhydrylideneamino)oxy-3-(tert-butylamino)propan-2-ol."}", "/scratch/micpie/export/iupac_smiles/valid_2-3.jsonl": "{"text":"The InChI of the compound with traditional IUPAC name 4-[(1S,2S)-1-isopropyl-2-methyl-pentyl]phenol is 
InChI=1S\/C15H24O\/c1-5-6-12(4)15(11(2)3)13-7-9-14(16)10-8-13\/h7-12,15-16H,5-6H2,1-4H3\/t12-,15-\/m0\/s1."} {"text":"The SMILES of the chemical with traditional IUPAC name 8-bromo-6-nitro-3-(2-p-phenetylthiazol-4-yl)coumarin is CCOC1=CC=C(C=C1)C2=NC(=CS2)C3=CC4=CC(=CC(=C4OC3=O)Br)[N+](=O)[O-]."}", "/scratch/micpie/export/iupac_smiles/train_9-9.jsonl": "{"text":"Task: Please generate the SMILES representation of a chemical based on the CAS-like IUPAC name.\nIUPAC name: 1-(tert-butylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,11,13-hexaenylideneamino)oxy-2-propanol\nResult: CC(C)(C)NCC(O)CON=C1c2ccccc2CCc2ccccc21"} {"text":"Task: Please generate the SMILES of a chemical given the IUAPC name in CAS-like style.\nIUPAC name: N-[(5-butyl-2,4-dimethyl-6-oxo-3,4-dihydropyridin-5-yl)methyl]-3-[cyclopentyl(methyl)amino]-2-ethenylbenzamide\nResult: CCCCCCCC=NC6=O)))C)))C))CNC=O)C=CC=CC=C6)))NC)CCCCC5)))))))C=C"}", "/scratch/micpie/export/iupac_smiles/test_17-2.jsonl": "{"text":"The IUPAC name of the molecule with DeepSMILES CCOC=NC=CC=CC=C6N9)))))C=O)OCC=CC=CC=C6))C=CC=CC=C6C=NNN=N5 is [4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl 2-ethoxy-1H-benzimidazole-4-carboxylate."} {"text":"The IUPAC name of the compound with canonical SMILES C\/C=C\\C(=C\/C)C1(C)C=C(C2=Cc3c(c(-c4ccc(-c5ccccn5)cc4)c4ccccc4c3-c3ccc(-c4ccccn4)cc3)CC2I)C=C(C2=CC(C)CC=C2)C1 is 2-[4-[2-[3-[(2E,4Z)-hexa-2,4-dien-3-yl]-3-methyl-5-(3-methylcyclohexa-1,5-dien-1-yl)cyclohexa-1,5-dien-1-yl]-3-iodo-10-(4-pyridin-2-ylphenyl)-3,4-dihydroanthracen-9-yl]phenyl]pyridine."}", "/scratch/micpie/export/iupac_smiles/train_19-8.jsonl": "{"text":"Task: Please create the SMILES representation of a chemical based on the systematic IUPAC name.\nIUPAC name: N-(4-chlorophenyl)-3-[4-[2-(methylamino)ethyl]piperidin-1-yl]propanamide\nResult: [C][N][C][C][C][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl]"} {"text":"Task: Please give 
me the SMILES of a compound given the systematic IUPAC name.\nIUPAC name: 3-[N-[2,6-bis(chloranyl)-4-[oxidanidyl(oxidanyl)amino]phenyl]-C-methyl-carbonimidoyl]chromen-2-one\nResult: InChI=1S\/C17H11Cl2N2O4\/c1-9(12-6-10-4-2-3-5-15(10)25-17(12)22)20-16-13(18)7-11(21(23)24)8-14(16)19\/h2-8,23H,1H3\/q-1"}", "/scratch/micpie/export/iupac_smiles/test_1-6.jsonl": "{"text":"The InChI of the compound with IUPAC name (3R)-3-nitrocyclohexan-1-ol is InChI=1S\/C6H11NO3\/c8-6-3-1-2-5(4-6)7(9)10\/h5-6,8H,1-4H2\/t5-,6?\/m1\/s1."} {"text":"The InChI of the compound with preferred IUPAC name (2S)-1-[[(2S)-2-hydroxy-3-(1H-indol-4-yloxy)propyl]-propan-2-ylamino]-3-(1H-indol-4-yloxy)propan-2-ol is InChI=1S\/C25H31N3O4\/c1-17(2)28(13-18(29)15-31-24-7-3-5-22-20(24)9-11-26-22)14-19(30)16-32-25-8-4-6-23-21(25)10-12-27-23\/h3-12,17-19,26-27,29-30H,13-16H2,1-2H3\/t18-,19-\/m0\/s1."}", "/scratch/micpie/export/iupac_smiles/train_21-8.jsonl": "{"text":"Task: Please create the SMILES representation of a compound based on the systematic IUPAC name.\nIUPAC name: (2S)-3-cyclohexyl-2-[2-[3-ethyl-5-[2-[[3-[4-(3-fluorophenyl)-1,2,3-triazol-1-yl]-5-[4-[4-(3-fluorophenyl)-1,2,3-triazol-1-yl]-6-(hydroxymethyl)-3,5-bis(oxidanyl)oxan-2-yl]oxy-4-oxidanyl-cyclohexyl]carbonylamino]ethylcarbamoyl]-2-[6-methyl-3,4,5-tris(oxidanyl)oxan-2-yl]oxy-cyclohexyl]oxy-6-(hydroxymethyl)-5-oxidanyl-3-(phenylcarbonyloxy)oxan-4-yl]oxy-propanoic acid\nResult: CCC1CC(C(=O)NCCNC(=O)C2CC(OC3OC(CO)C(O)C(n4cc(-c5cccc(F)c5)nn4)C3O)C(O)C(n3cc(-c4cccc(F)c4)nn3)C2)CC(OC2OC(CO)C(O)C(O[C@@H](CC3CCCCC3)C(=O)O)C2OC(=O)c2ccccc2)C1OC1OC(C)C(O)C(O)C1O"} {"text":"Task: Please generate the SMILES representation of a chemical given the systematic IUPAC name.\nIUPAC name: 3-[4-(trifluoromethyloxy)phenyl]pyridazine\nResult: C1=CC(=NN=C1)C2=CC=C(C=C2)OC(F)(F)F"}", "/scratch/micpie/export/iupac_smiles/valid_23-6.jsonl": "{"text":"The InChI of the compound with preferred IUPAC name 1-(3-chlorophenyl)sulfonylpiperidine-3-carbaldehyde is 
InChI=1S\/C12H14ClNO3S\/c13-11-4-1-5-12(7-11)18(16,17)14-6-2-3-10(8-14)9-15\/h1,4-5,7,9-10H,2-3,6,8H2."} {"text":"The canonical SMILES of the compound with IUPAC name N-(benzimidazol-1-yl)-3-[(2,3,4,5,6-pentamethylphenyl)sulfonylamino]propanamide is Cc1c(C)c(C)c(S(=O)(=O)NCCC(=O)Nn2cnc3ccccc32)c(C)c1C."}", "/scratch/micpie/export/iupac_smiles/test_23-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with InChI InChI=1S\/C12H14N2O6S\/c15-8-9-2-1-5-13(7-9)21(19,20)10-3-4-12(16)11(6-10)14(17)18\/h3-4,6,8-9,16H,1-2,5,7H2 is 1-(4-hydroxy-3-nitro-phenyl)sulfonylnipecotaldehyde."} {"text":"The traditional IUPAC name of the molecule with canonical SMILES CCNS(=O)(=O)c1ccc(C(=O)Nn2cnc3ccccc32)cc1 is N-(benzimidazol-1-yl)-4-(ethylsulfamoyl)benzamide."}", "/scratch/micpie/export/iupac_smiles/train_13-8.jsonl": "{"text":"Task: Please generate the SMILES of a chemical given the systematic IUPAC name.\nIUPAC name: 2-azanyl-N,3,3-trimethyl-N-prop-2-ynyl-butanamide\nResult: [C][C][Branch1][C][C][Branch1][C][C][C][Branch1][N][C][=Branch1][C][=O][N][Branch1][C][C][C][C][#C][N]"} {"text":"Task: Please give me the SMILES of a molecule given the systematic IUPAC name.\nIUPAC name: (1,3,5-trimethylpyrazol-4-yl) (4R)-3,4-dihydro-2H-chromene-4-carboxylate\nResult: InChI=1S\/C16H18N2O3\/c1-10-15(11(2)18(3)17-10)21-16(19)13-8-9-20-14-7-5-4-6-12(13)14\/h4-7,13H,8-9H2,1-3H3\/t13-\/m1\/s1"}", "/scratch/micpie/export/iupac_smiles/valid_3-1.jsonl": "{"text":"The CAS-like IUPAC name of the compound with canonical SMILES C[C@@]12C[C@H]3C[C@@](O)(C1)C[C@](C(=O)O)(C3)C2 is (1S,3R,5S,7R)-3-hydroxy-5-methyl-1-adamantanecarboxylic acid."} {"text":"The CAS-like IUPAC name of the compound with canonical SMILES COc1ccccc1CN(C)CCCCCCNC(=O)COc1ccc(-c2cc(=O)c3c(O)c(OC)c(OC)cc3o2)cc1 is 2-[4-(5-hydroxy-6,7-dimethoxy-4-oxo-1-benzopyran-2-yl)phenoxy]-N-[6-[(2-methoxyphenyl)methyl-methylamino]hexyl]acetamide."}", "/scratch/micpie/export/iupac_smiles/valid_10-5.jsonl": "{"text":"The canonical SMILES 
of the compound with IUAPC name in CAS-like style 3-[hexan-2-yl(methyl)amino]-2-methylbenzoic acid methyl ester is CCCCC(C)N(C)c1cccc(C(=O)OC)c1C."} {"text":"The SMILES of the chemical with CAS-like IUPAC name (4S,7R,8S,9S,13Z,16S)-16-[(Z)-1-fluoro-2-(2-pyridinyl)ethenyl]-4,8-dihydroxy-5,5,9,13-tetramethyl-7-propyl-1-azacyclohexadec-13-ene-2,6-dione is CCC[C@@H]1[C@H]([C@H](CCC\/C(=C\\C[C@H](NC(=O)C[C@@H](C(C1=O)(C)C)O)\/C(=C\/C2=CC=CC=N2)\/F)\/C)C)O."}", "/scratch/micpie/export/iupac_smiles/test_7-9.jsonl": "{"text":"Task: Please give me the SMILES representation of a chemical given the IUAPC name in CAS-like style.\nIUPAC name: 3-amino-4-hydroxy-N-(2-pyridinylmethyl)benzenesulfonamide\nResult: [C][=C][C][=N][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=Branch1][=Branch2][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][O][N]"} {"text":"Task: Please create the SMILES of a molecule given the IUAPC name in CAS-like style.\nIUPAC name: 3-(N-(4-fluorophenyl)sulfonylanilino)propanamide\nResult: C=CC=CC=C6))NCCC=O)N))))S=O)=O)C=CC=CC=C6))F"}", "/scratch/micpie/export/iupac_smiles/train_27-0.jsonl": "{"text":"The traditional IUPAC name of the chemical with canonical SMILES O=C(CN1CCN(c2ccccc2F)CC1)N1N=C(c2ccc(F)cc2)C[C@@H]1c1ccc(Cl)cc1 is 1-[(5R)-5-(4-chlorophenyl)-3-(4-fluorophenyl)-2-pyrazolin-1-yl]-2-[4-(2-fluorophenyl)piperazino]ethanone."} {"text":"The traditional IUPAC name of the compound with DeepSMILES CCC=CC=CC=C6C=C9C=CC=C6))NC=CC=CC=C6))C=CC=CC=C6O))O))C=CC=CC=C6O))O))O))O))O))))O))O))))))))))))))))))C is 6-[4-[4-[(9,9-dimethylfluoren-2-yl)amino]phenyl]-2,3,5,6-tetrahydroxy-phenyl]benzene-1,2,3,4,5-pentol."}", "/scratch/micpie/export/iupac_smiles/train_14-4.jsonl": "{"text":"The canonical SMILES of the molecule with systematic IUPAC name (1,3,5-trimethylpyrazol-4-yl) (3R)-5-oxidanylidene-1-[2,2,2-tris(fluoranyl)ethyl]pyrrolidine-3-carboxylate is Cc1nn(C)c(C)c1OC(=O)[C@@H]1CC(=O)N(CC(F)(F)F)C1."} {"text":"The 
canonical SMILES of the molecule with systematic IUPAC name [7-chloranyl-5-[4-(3-fluoranylpyrrolidin-1-yl)sulfonylphenyl]-1-benzofuran-2-yl]methanamine is NCc1cc2cc(-c3ccc(S(=O)(=O)N4CCC(F)C4)cc3)cc(Cl)c2o1."}", "/scratch/micpie/export/iupac_smiles/train_23-6.jsonl": "{"text":"The canonical SMILES of the compound with IUPAC name 1-(2-nitrophenyl)sulfonylpiperidine-3-carbaldehyde is O=CC1CCCN(S(=O)(=O)c2ccccc2[N+](=O)[O-])C1."} {"text":"The DeepSMILES of the chemical with preferred IUPAC name N-(benzimidazol-1-yl)-4-(2,5-dimethylphenyl)-4-oxobutanamide is CC=CC=CC=C6))C))C=O)CCC=O)NNC=NC=CC=CC=C69."}", "/scratch/micpie/export/iupac_smiles/train_12-8.jsonl": "{"text":"Task: Please give me the SMILES of a compound based on the systematic IUPAC name.\nIUPAC name: N-(3-dibenzofuran-1-ylphenyl)-9,9-dimethyl-N-(3-phenylphenyl)fluoren-2-amine\nResult: CC1(C)c2ccccc2-c2ccc(N(c3cccc(-c4ccccc4)c3)c3cccc(-c4cccc5oc6ccccc6c45)c3)cc21"} {"text":"Task: Please create the SMILES representation of a compound given the systematic IUPAC name.\nIUPAC name: 2-[(2-azanyl-3,3-dimethyl-butanoyl)amino]pent-4-enoic acid\nResult: InChI=1S\/C11H20N2O3\/c1-5-6-7(10(15)16)13-9(14)8(12)11(2,3)4\/h5,7-8H,1,6,12H2,2-4H3,(H,13,14)(H,15,16)"}", "/scratch/micpie/export/iupac_smiles/train_14-8.jsonl": "{"text":"Task: Please create the SMILES of a molecule given the systematic IUPAC name.\nIUPAC name: (1,3,5-trimethylpyrazol-4-yl) (3R)-5-oxidanylidene-1-[2,2,2-tris(fluoranyl)ethyl]pyrrolidine-3-carboxylate\nResult: Cc1nn(C)c(C)c1OC(=O)[C@@H]1CC(=O)N(CC(F)(F)F)C1"} {"text":"Task: Please give me the SMILES of a molecule based on the systematic IUPAC name.\nIUPAC name: [7-chloranyl-5-[4-(3-fluoranylpyrrolidin-1-yl)sulfonylphenyl]-1-benzofuran-2-yl]methanamine\nResult: NCc1cc2cc(-c3ccc(S(=O)(=O)N4CCC(F)C4)cc3)cc(Cl)c2o1"}", "/scratch/micpie/export/iupac_smiles/valid_16-3.jsonl": "{"text":"The InChI of the chemical with traditional IUPAC name 5-chloro-4-(chloromethyl)-2-(3-iodophenoxy)pyridine is 
InChI=1S\/C12H8Cl2INO\/c13-6-8-4-12(16-7-11(8)14)17-10-3-1-2-9(15)5-10\/h1-5,7H,6H2."} {"text":"The InChI of the chemical with traditional IUPAC name N-[2-(2,5-dichlorophenyl)ethyl]-2-ethoxy-3-[4-[2-(2H-tetrazol-5-yl)phenyl]benzyl]benzimidazole-4-carboxamide is InChI=1S\/C32H27Cl2N7O2\/c1-2-43-32-36-28-9-5-8-26(31(42)35-17-16-22-18-23(33)14-15-27(22)34)29(28)41(32)19-20-10-12-21(13-11-20)24-6-3-4-7-25(24)30-37-39-40-38-30\/h3-15,18H,2,16-17,19H2,1H3,(H,35,42)(H,37,38,39,40)."}", "/scratch/micpie/export/iupac_smiles/test_14-8.jsonl": "{"text":"Task: Please generate the SMILES representation of a molecule given the systematic IUPAC name.\nIUPAC name: (1,3,5-trimethylpyrazol-4-yl) (3S)-5-oxidanylidene-1-[2,2,2-tris(fluoranyl)ethyl]pyrrolidine-3-carboxylate\nResult: Cc1nn(C)c(C)c1OC(=O)[C@H]1CC(=O)N(CC(F)(F)F)C1"} {"text":"Task: Please give me the SMILES representation of a molecule based on the systematic IUPAC name.\nIUPAC name: 4-bromanyl-2-iodanyl-5-(trifluoromethyl)phenol\nResult: Oc1cc(C(F)(F)F)c(Br)cc1I"}", "/scratch/micpie/export/iupac_smiles/test_18-5.jsonl": "{"text":"The SMILES of the chemical with CAS-like IUPAC name [(E)-2-ethyl-7,10-dimethylundec-3-enylidene]-dimethylphosphonium is CCC(\/C=C\/CCC(C)CCC(C)C)C=[P+](C)C."} {"text":"The canonical SMILES of the chemical with CAS-like IUPAC name N-methyl-2-[1-[2-(4-nitrophenyl)ethyl]-4-piperidinyl]ethanamine;hydrochloride is CNCCC1CCN(CCc2ccc([N+](=O)[O-])cc2)CC1.Cl."}", "/scratch/micpie/export/iupac_smiles/train_26-7.jsonl": "{"text":"Task: Please generate the SMILES representation of a molecule given the traditional IUPAC name.\nIUPAC name: 2-(4-chloro-3-fluoro-phenyl)-4-ethyl-thiazole-5-carboxylic acid\nResult: CCC1=C(SC(=N1)C2=CC(=C(C=C2)Cl)F)C(=O)O"} {"text":"Task: Please generate the SMILES of a chemical based on the traditional IUPAC name.\nIUPAC name: 1-[(5S)-5-(4-chlorophenyl)-3-(4-fluorophenyl)-2-pyrazolin-1-yl]-2-[4-(2-fluorophenyl)piperazino]ethanone\nResult: 
C1CN(CCN1CC(=O)N2[C@@H](CC(=N2)C3=CC=C(C=C3)F)C4=CC=C(C=C4)Cl)C5=CC=CC=C5F"}", "/scratch/micpie/export/iupac_smiles/valid_6-5.jsonl": "{"text":"The SELFIES of the chemical with CAS-like IUPAC name N-(2-aminoethyl)-2-[(5-methoxy-1-oxopentyl)amino]benzamide is [C][O][C][C][C][C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][N]."} {"text":"The canonical SMILES of the chemical with IUAPC name in CAS-like style 2-methoxy-5-[[(phenylmethyl)-propan-2-ylamino]methyl]aniline is COc1ccc(CN(Cc2ccccc2)C(C)C)cc1N."}", "/scratch/micpie/export/iupac_smiles/valid_24-6.jsonl": "{"text":"The DeepSMILES of the molecule with IUPAC name 2-(1,4-dioxo-3H-phthalazin-2-yl)-N-(2-hydroxyphenyl)acetamide is C=CC=CC=C6)C=O)NNC6=O))CC=O)NC=CC=CC=C6O."} {"text":"The InChI of the compound with preferred IUPAC name 5-(7-fluoro-1,3-benzoxazol-5-yl)-7-methyl-N-(1,1,1-trifluoropropan-2-yl)pyrazolo[1,5-a]pyrimidine-3-carboxamide is InChI=1S\/C18H13F4N5O2\/c1-8-3-13(10-4-12(19)15-14(5-10)23-7-29-15)26-16-11(6-24-27(8)16)17(28)25-9(2)18(20,21)22\/h3-7,9H,1-2H3,(H,25,28)."}", "/scratch/micpie/export/iupac_smiles/valid_11-8.jsonl": "{"text":"Task: Please create the SMILES representation of a compound based on the systematic IUPAC name.\nIUPAC name: 13-chloranyl-2-piperidin-4-ylidene-4-azatricyclo[9.4.0.03,8]pentadeca-1(11),3(8),4,6,12,14-hexaene;disulfate\nResult: [C][C][C][=C][Branch1][#Branch2][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][Cl][C][=Branch1][=Branch2][=C][C][C][N][C][C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=N][Ring1][=Branch1].[O-1][S][=Branch1][C][=O][=Branch1][C][=O][O-1].[O-1][S][=Branch1][C][=O][=Branch1][C][=O][O-1]"} {"text":"Task: Please create the SMILES representation of a chemical based on the systematic IUPAC name.\nIUPAC name: 4-(7-bromanyl-9,9-dioctyl-fluoren-2-yl)benzaldehyde\nResult: 
[C][C][C][C][C][C][C][C][C][Branch2][Ring2][#Branch1][C][=C][Branch2][Ring1][Branch1][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=O][C][=C][Ring1][P][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Br][C][C][C][C][C][C][C][C]"}", "/scratch/micpie/export/iupac_smiles/train_18-9.jsonl": "{"text":"Task: Please create the SMILES representation of a compound given the IUAPC name in CAS-like style.\nIUPAC name: N-[3-[6-(6-methyl-3-pyridinyl)-3,8a-dihydroquinazolin-8-yl]phenyl]-2-propenamide\nResult: CC1=NC=C(C=C1)C2=CC3=CNC=NC3C(=C2)C4=CC(=CC=C4)NC(=O)C=C"} {"text":"Task: Please give me the SMILES of a molecule given the CAS-like IUPAC name.\nIUPAC name: 2-[1-[(3,4-difluorophenyl)methyl]-4-piperidinyl]-N-methylethanamine;hydrochloride\nResult: CNCCC1CCN(CC1)CC2=CC(=C(C=C2)F)F.Cl"}", "/scratch/micpie/export/iupac_smiles/valid_24-4.jsonl": "{"text":"The InChI of the chemical with systematic IUPAC name 2-[1,4-bis(oxidanylidene)-3H-phthalazin-2-yl]-N-(2-hydroxyphenyl)ethanamide is InChI=1S\/C16H13N3O4\/c20-13-8-4-3-7-12(13)17-14(21)9-19-16(23)11-6-2-1-5-10(11)15(22)18-19\/h1-8,20H,9H2,(H,17,21)(H,18,22)."} {"text":"The SELFIES of the molecule with systematic IUPAC name 5-(7-fluoranyl-1,3-benzoxazol-5-yl)-7-methyl-N-[1,1,1-tris(fluoranyl)propan-2-yl]pyrazolo[1,5-a]pyrimidine-3-carboxamide is [C][C][=C][C][=Branch2][Ring1][=C][=N][C][=C][Branch1][Branch2][C][=N][N][Ring1][=Branch2][Ring1][Branch1][C][=Branch1][C][=O][N][C][Branch1][C][C][C][Branch1][C][F][Branch1][C][F][F][C][=C][C][=C][Branch1][Branch2][C][=Branch1][Ring2][=C][Ring1][=Branch1][F][O][C][=N][Ring1][Branch2]."}", "/scratch/micpie/export/iupac_smiles/valid_14-7.jsonl": "{"text":"Task: Please generate the SMILES representation of a compound given the traditional IUPAC name.\nIUPAC name: N-[[(1S,2S)-2-(2-furoylamino)cyclopentyl]methyl]carbamic acid tert-butyl ester\nResult: CCC)C)OC=O)NC[C@@H]CCC[C@@H]5NC=O)C=CC=CO5"} {"text":"Task: Please create the 
SMILES representation of a compound given the traditional IUPAC name.\nIUPAC name: N-[2-(5-bromo-7-chloro-benzofuran-2-yl)ethyl]carbamic acid tert-butyl ester\nResult: CCC)C)OC=O)NCCC=CC=CC=CC=C6O9))Cl)))Br"}", "/scratch/micpie/export/iupac_smiles/train_26-8.jsonl": "{"text":"Task: Please create the SMILES representation of a chemical given the systematic IUPAC name.\nIUPAC name: 2-(4-chloranyl-3-fluoranyl-phenyl)-4-ethyl-1,3-thiazole-5-carboxylic acid\nResult: CCc1nc(-c2ccc(Cl)c(F)c2)sc1C(=O)O"} {"text":"Task: Please create the SMILES representation of a chemical based on the systematic IUPAC name.\nIUPAC name: 1-[(3S)-3-(4-chlorophenyl)-5-(4-fluorophenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(2-fluorophenyl)piperazin-1-yl]ethanone\nResult: O=C(CN1CCN(c2ccccc2F)CC1)N1N=C(c2ccc(F)cc2)C[C@H]1c1ccc(Cl)cc1"}", "/scratch/micpie/export/iupac_smiles/test_26-4.jsonl": "{"text":"The InChI of the chemical with systematic IUPAC name 2-[2,3-bis(chloranyl)phenyl]-4-ethyl-1,3-thiazole-5-carboxylic acid is InChI=1S\/C12H9Cl2NO2S\/c1-2-8-10(12(16)17)18-11(15-8)6-4-3-5-7(13)9(6)14\/h3-5H,2H2,1H3,(H,16,17)."} {"text":"The InChI of the molecule with systematic IUPAC name 1-[(3R)-3-(2-chlorophenyl)-5-(3-methoxyphenyl)-3,4-dihydropyrazol-2-yl]-2-(3,4-dihydro-1H-isoquinolin-2-yl)ethanone is InChI=1S\/C27H26ClN3O2\/c1-33-22-10-6-9-20(15-22)25-16-26(23-11-4-5-12-24(23)28)31(29-25)27(32)18-30-14-13-19-7-2-3-8-21(19)17-30\/h2-12,15,26H,13-14,16-18H2,1H3\/t26-\/m1\/s1."}", "/scratch/micpie/export/iupac_smiles/valid_0-3.jsonl": "{"text":"The SMILES of the molecule with traditional IUPAC name 1-(4-bromobenzyl)-N-[(Z)-1-(p-tolyl)ethylideneamino]pyrazole-3-carboxamide is CC1=CC=C(C=C1)\/C(=N\\NC(=O)C2=NN(C=C2)CC3=CC=C(C=C3)Br)\/C."} {"text":"The canonical SMILES of the chemical with traditional IUPAC name ethylene;4-methoxy-4,6-dimethyl-tetrahydropyran-2,5-diol is C=C.COC1(C)CC(O)OC(C)C1O."}", "/scratch/micpie/export/iupac_smiles/valid_1-5.jsonl": "{"text":"The canonical SMILES of the molecule with 
IUAPC name in CAS-like style ethene;N-ethyl-N-methyl-2-propen-1-amine is C=C.C=CCN(C)CC."} {"text":"The InChI of the molecule with CAS-like IUPAC name (2R)-1-[[(2S)-2-hydroxy-3-(1H-indol-4-yloxy)propyl]-propan-2-ylamino]-3-(1H-indol-4-yloxy)-2-propanol is InChI=1S\/C25H31N3O4\/c1-17(2)28(13-18(29)15-31-24-7-3-5-22-20(24)9-11-26-22)14-19(30)16-32-25-8-4-6-23-21(25)10-12-27-23\/h3-12,17-19,26-27,29-30H,13-16H2,1-2H3\/t18-,19+."}", "/scratch/micpie/export/iupac_smiles/train_5-7.jsonl": "{"text":"Task: Please give me the SMILES representation of a compound based on the traditional IUPAC name.\nIUPAC name: 4-[3-(aminomethyl)tetrahydrofuran-3-yl]-3-ethyl-piperidin-4-ol\nResult: [C][C][C][C][N][C][C][C][Ring1][=Branch1][Branch1][N][C][Branch1][#Branch1][C][C][O][C][Ring1][Branch1][C][N][O]"} {"text":"Task: Please generate the SMILES of a chemical given the traditional IUPAC name.\nIUPAC name: N-(2-aminoethyl)-2-[2-(tetrahydrofurfuryloxy)propanoylamino]benzamide;hydrochloride\nResult: InChI=1S\/C17H25N3O4.ClH\/c1-12(24-11-13-5-4-10-23-13)16(21)20-15-7-3-2-6-14(15)17(22)19-9-8-18;\/h2-3,6-7,12-13H,4-5,8-11,18H2,1H3,(H,19,22)(H,20,21);1H"}", "/scratch/micpie/export/iupac_smiles/valid_10-6.jsonl": "{"text":"The canonical SMILES of the molecule with preferred IUPAC name methyl 3-[hexan-2-yl(methyl)amino]-2-methylbenzoate is CCCCC(C)N(C)c1cccc(C(=O)OC)c1C."} {"text":"The SMILES of the chemical with IUPAC name (4S,7R,8S,9S,13Z,16S)-16-[(Z)-1-fluoro-2-pyridin-2-ylethenyl]-4,8-dihydroxy-5,5,9,13-tetramethyl-7-propyl-1-azacyclohexadec-13-ene-2,6-dione is CCC[C@@H]1[C@H]([C@H](CCC\/C(=C\\C[C@H](NC(=O)C[C@@H](C(C1=O)(C)C)O)\/C(=C\/C2=CC=CC=N2)\/F)\/C)C)O."}", "/scratch/micpie/export/iupac_smiles/valid_3-7.jsonl": "{"text":"Task: Please create the SMILES representation of a molecule given the traditional IUPAC name.\nIUPAC name: (1S,3R,5S,7R)-3-hydroxy-5-methyl-adamantane-1-carboxylic acid\nResult: 
InChI=1S\/C12H18O3\/c1-10-2-8-3-11(5-10,9(13)14)7-12(15,4-8)6-10\/h8,15H,2-7H2,1H3,(H,13,14)\/t8-,10+,11+,12-\/m1\/s1"} {"text":"Task: Please give me the SMILES representation of a chemical given the traditional IUPAC name.\nIUPAC name: 2-[4-(5-hydroxy-4-keto-6,7-dimethoxy-chromen-2-yl)phenoxy]-N-[6-[methyl(o-anisyl)amino]hexyl]acetamide\nResult: COc1ccccc1CN(C)CCCCCCNC(=O)COc1ccc(-c2cc(=O)c3c(O)c(OC)c(OC)cc3o2)cc1"}", "/scratch/micpie/export/iupac_smiles/valid_7-6.jsonl": "{"text":"The SELFIES of the compound with preferred IUPAC name 2-methyl-N-[2-[[4-(trifluoromethylsulfanyl)phenyl]methylamino]ethyl]propanamide is [C][C][Branch1][C][C][C][=Branch1][C][=O][N][C][C][N][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][S][C][Branch1][C][F][Branch1][C][F][F]."} {"text":"The canonical SMILES of the molecule with preferred IUPAC name 3-(N-thiophen-2-ylsulfonylanilino)propanamide is NC(=O)CCN(c1ccccc1)S(=O)(=O)c1cccs1."}", "/scratch/micpie/export/iupac_smiles/train_10-0.jsonl": "{"text":"The traditional IUPAC name of the compound with DeepSMILES CCC=CC=CC=C6NCC))CCCCCC6))NC=O)OCC)C)C))))))))))))Cl)))CO)O is N-[4-[5-chloro-3-(dihydroxymethyl)-N,2-diethyl-anilino]cyclohexyl]carbamic acid tert-butyl ester."} {"text":"The traditional IUPAC name of the molecule with SMILES CC(C)C1CC[C@]2(C1C3CCC4C5(CCC(C(C5CCC4(C3(CC2)C)C)(C)C)OC(=O)C)C)C(=O)F is acetic acid [(3aS)-3a-fluorocarbonyl-1-isopropyl-5a,5b,8,8,11a-pentamethyl-1,2,3,4,5,6,7,7a,9,10,11,11b,12,13,13a,13b-hexadecahydrocyclopenta[a]chrysen-9-yl] ester."}", "/scratch/micpie/export/iupac_smiles/test_27-8.jsonl": "{"text":"Task: Please give me the SMILES of a molecule based on the systematic IUPAC name.\nIUPAC name: 1-[(3S)-3-(4-chlorophenyl)-5-(3,4-dimethylphenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(2-fluorophenyl)piperazin-1-yl]ethanone\nResult: CC1=C(C=C(C=C1)C2=NN([C@@H](C2)C3=CC=C(C=C3)Cl)C(=O)CN4CCN(CC4)C5=CC=CC=C5F)C"} {"text":"Task: Please generate the SMILES of a chemical based on the systematic IUPAC 
name.\nIUPAC name: N-[(1E)-1-(4-ethenyl-2,3,5,5-tetramethyl-cyclopenta-1,3-dien-1-yl)buta-1,3-dien-2-yl]-5-phenyl-thiophen-2-amine\nResult: [C][C][=C][Branch2][Ring2][Branch2][C][Branch2][Ring1][S][C][=Branch1][Branch1][=C][Ring1][Branch1][C][\/C][=C][Branch1][Ring1][\\C][=C][\/N][C][=C][C][=C][Branch1][Ring2][S][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][Branch1][C][C][C][C][=C]"}", "/scratch/micpie/export/iupac_smiles/valid_19-4.jsonl": "{"text":"The SELFIES of the molecule with systematic IUPAC name N-[2-chloranyl-5-(1,2,3,4-tetrazol-1-yl)phenyl]-2-[4-[2-(methylamino)ethyl]piperidin-1-yl]ethanamide;hydrochloride is [C][N][C][C][C][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][=Branch1][C][=O][N][C][=C][Branch1][S][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][N][C][=N][N][=N][Ring1][Branch1][Cl].[Cl]."} {"text":"The DeepSMILES of the chemical with systematic IUPAC name methyl 2-[bis[2-[(4a-methyl-8-methylidene-2-propan-2-yl-1,2,3,4,5,6,7,8a-octahydronaphthalen-1-yl)amino]-2-oxidanylidene-ethyl]amino]ethanoate is CCC)CCCCCCCC=C)C6C%10NC=O)CNCC=O)NCCCCCC6C=C)CCC6)))))C))))CC)C)))))))CC=O)OC))))))))))))))C."}", "/scratch/micpie/export/iupac_smiles/valid_1-7.jsonl": "{"text":"Task: Please create the SMILES representation of a chemical given the traditional IUPAC name.\nIUPAC name: allyl-ethyl-methyl-amine;ethylene\nResult: CCNC)CC=C.C=C"} {"text":"Task: Please give me the SMILES of a chemical given the traditional IUPAC name.\nIUPAC name: (2R)-1-[[(2S)-2-hydroxy-3-(1H-indol-4-yloxy)propyl]-isopropyl-amino]-3-(1H-indol-4-yloxy)propan-2-ol\nResult: CCC)NC[C@H]COC=CC=CC=C6C=CN5)))))))))))O)))C[C@@H]COC=CC=CC=C6C=CN5)))))))))))O"}", "/scratch/micpie/export/iupac_smiles/test_9-5.jsonl": "{"text":"The SELFIES of the molecule with IUAPC name in CAS-like style 1-(9-fluorenylideneamino)oxy-3-(4-phenylbutan-2-ylamino)-2-propanol is 
[C][C][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C][C][Branch2][Ring1][#Branch1][C][O][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][=N][O]."} {"text":"The DeepSMILES of the compound with CAS-like IUPAC name (2S)-5-chloro-N-[(4,6-dimethyl-2-oxo-3-piperidinyl)methyl]-3-[ethyl(1-oxopropyl)amino]-2-methyl-1-cyclohexanecarboxamide is CCC=O)NCC))CCCCC[C@@H]6C))C=O)NCCCCCNC6=O)))C)))C))))))))Cl."}", "/scratch/micpie/export/iupac_smiles/test_10-2.jsonl": "{"text":"The IUPAC name of the compound with canonical SMILES C=CCN(c1cc(Cl)cc(C(C)O)c1C)C1CCNCC1 is 1-[5-chloro-2-methyl-3-[piperidin-4-yl(prop-2-enyl)amino]phenyl]ethanol."} {"text":"The IUPAC name of the compound with canonical SMILES COc1ccc2ncc(F)c(CCCC3CCN(CCSc4ccsc4)CC3CC(=O)O)c2c1 is 2-[4-[3-(3-fluoro-6-methoxyquinolin-4-yl)propyl]-1-(2-thiophen-3-ylsulfanylethyl)piperidin-3-yl]acetic acid."}", "/scratch/micpie/export/iupac_smiles/test_18-9.jsonl": "{"text":"Task: Please generate the SMILES representation of a molecule based on the IUAPC name in CAS-like style.\nIUPAC name: [(E)-2-ethyl-7,10-dimethylundec-3-enylidene]-dimethylphosphonium\nResult: CCC(C=[P+](C)C)\/C=C\/CCC(C)CCC(C)C"} {"text":"Task: Please give me the SMILES representation of a chemical based on the IUAPC name in CAS-like style.\nIUPAC name: N-methyl-2-[1-[2-(4-nitrophenyl)ethyl]-4-piperidinyl]ethanamine;hydrochloride\nResult: CNCCCCCNCC6))CCC=CC=CC=C6))[N+]=O)[O-].Cl"}", "/scratch/micpie/export/iupac_smiles/test_0-8.jsonl": "{"text":"Task: Please generate the SMILES of a molecule given the systematic IUPAC name.\nIUPAC name: 1-[(4-bromophenyl)methyl]-N-[(Z)-(3-methoxyphenyl)methylideneamino]piperidine-4-carboxamide\nResult: COC1=CC=CC(=C1)\/C=N\\NC(=O)C2CCN(CC2)CC3=CC=C(C=C3)Br"} {"text":"Task: Please give me the SMILES representation of a chemical based on the systematic IUPAC name.\nIUPAC name: 
2-(5-methyl-1lambda3-ioda-3-thia-2-azacyclopenta-1,4-dien-4-yl)-1,3-thiazole-4-carbaldehyde\nResult: InChI=1S\/C7H5IN2OS2\/c1-4-6(13-10-8-4)7-9-5(2-11)3-12-7\/h2-3H,1H3"}", "/scratch/micpie/export/iupac_smiles/train_9-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the compound with SMILES CC(C)(C)NCC(CON=C1C2=CC=CC=C2CCC3=CC=CC=C31)O is 1-(tert-butylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,11,13-hexaenylideneamino)oxy-2-propanol."} {"text":"The IUAPC name in CAS-like style of the compound with InChI InChI=1S\/C27H39N3O2\/c1-6-8-16-27(19(3)17-20(4)29-26(27)32)18-28-25(31)23-14-11-15-24(22(23)7-2)30(5)21-12-9-10-13-21\/h7,11,14-15,19,21H,2,6,8-10,12-13,16-18H2,1,3-5H3,(H,28,31) is N-[(5-butyl-2,4-dimethyl-6-oxo-3,4-dihydropyridin-5-yl)methyl]-3-[cyclopentyl(methyl)amino]-2-ethenylbenzamide."}", "/scratch/micpie/export/iupac_smiles/test_4-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the molecule with DeepSMILES CNCCNCC6))CC=CNC=CC=CC=C69))))))S=O)=O)C=CC=CC=C6)OC is 1-(3-methoxyphenyl)sulfonyl-3-[(4-methyl-1-piperazinyl)methyl]indole."} {"text":"The IUAPC name in CAS-like style of the compound with canonical SMILES CCC1CNCCC1(O)C(C)(C)CN is 4-(1-amino-2-methylpropan-2-yl)-3-ethyl-4-piperidinol."}", "/scratch/micpie/export/iupac_smiles/valid_27-5.jsonl": "{"text":"The SELFIES of the compound with CAS-like IUPAC name 1-[(3S)-3-(4-chlorophenyl)-5-(4-fluorophenyl)-3,4-dihydropyrazol-2-yl]-2-(N-methylanilino)ethanone is [C][N][Branch2][Ring2][Branch2][C][C][=Branch1][C][=O][N][C@@H1][Branch2][Ring1][Ring1][C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][F][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl][C][=C][C][=C][C][=C][Ring1][=Branch1]."} {"text":"The canonical SMILES of the chemical with IUAPC name in CAS-like style 6-[7'-[(9,9-dimethyl-2-fluorenyl)amino]-9,9'-spirobi[fluorene]-2'-yl]benzene-1,2,3,4,5-pentol is 
CC1(C)c2ccccc2-c2ccc(Nc3ccc4c(c3)C3(c5ccccc5-c5ccccc53)c3cc(-c5c(O)c(O)c(O)c(O)c5O)ccc3-4)cc21."}", "/scratch/micpie/export/iupac_smiles/train_18-4.jsonl": "{"text":"The SELFIES of the compound with systematic IUPAC name N-[3-[6-(6-methylpyridin-3-yl)-3,8a-dihydroquinazolin-8-yl]phenyl]prop-2-enamide is [C][C][=N][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][N][C][=N][C][Ring1][=Branch1][C][=Branch1][Ring2][=C][Ring1][#Branch2][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=C]."} {"text":"The DeepSMILES of the molecule with systematic IUPAC name 2-[1-[[3,4-bis(fluoranyl)phenyl]methyl]piperidin-4-yl]-N-methyl-ethanamine;hydrochloride is CNCCCCCNCC6))CC=CC=CC=C6))F))F.Cl."}", "/scratch/micpie/export/iupac_smiles/valid_18-5.jsonl": "{"text":"The SELFIES of the chemical with CAS-like IUPAC name 8-(1-methyl-3-pyrrolidinyl)-6-(3-pyridinyl)quinazoline is [C][N][C][C][C][Branch1][Ring2][C][Ring1][Branch1][C][=C][C][=Branch1][N][=C][C][=C][N][=C][N][=C][Ring1][#Branch2][Ring1][=Branch1][C][=C][N][=C][C][=C][Ring1][=Branch1]."} {"text":"The InChI of the chemical with CAS-like IUPAC name 2-[4-[2-(methylamino)ethyl]-1-piperidinyl]-1-(2-thiophen-2-yl-1-pyrrolidinyl)-1-propanone is InChI=1S\/C19H31N3OS\/c1-15(21-12-8-16(9-13-21)7-10-20-2)19(23)22-11-3-5-17(22)18-6-4-14-24-18\/h4,6,14-17,20H,3,5,7-13H2,1-2H3."}", "/scratch/micpie/export/iupac_smiles/valid_9-9.jsonl": "{"text":"Task: Please create the SMILES representation of a molecule based on the IUAPC name in CAS-like style.\nIUPAC name: 1-(propan-2-ylamino)-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,9,11,13-heptaenylideneamino)oxy-2-propanol\nResult: InChI=1S\/C21H24N2O2\/c1-15(2)22-13-18(24)14-25-23-21-19-9-5-3-7-16(19)11-12-17-8-4-6-10-20(17)21\/h3-12,15,18,22,24H,13-14H2,1-2H3"} {"text":"Task: Please generate the SMILES representation of a chemical based on the IUAPC name in CAS-like style.\nIUPAC name: 
5-chloro-3-[cyclobutyl(methyl)amino]-N-[(4-cycloheptyl-4,6-dimethyl-2-oxo-3-piperidinyl)methyl]-2-methylbenzamide\nResult: Cc1c(C(=O)NCC2C(=O)NC(C)CC2(C)C2CCCCCC2)cc(Cl)cc1N(C)C1CCC1"}", "/scratch/micpie/export/iupac_smiles/test_26-5.jsonl": "{"text":"The SMILES of the molecule with CAS-like IUPAC name 2-(2,3-dichlorophenyl)-4-ethyl-5-thiazolecarboxylic acid is CCC1=C(SC(=N1)C2=C(C(=CC=C2)Cl)Cl)C(=O)O."} {"text":"The InChI of the compound with IUAPC name in CAS-like style 1-[(3R)-3-(2-chlorophenyl)-5-(3-methoxyphenyl)-3,4-dihydropyrazol-2-yl]-2-(3,4-dihydro-1H-isoquinolin-2-yl)ethanone is InChI=1S\/C27H26ClN3O2\/c1-33-22-10-6-9-20(15-22)25-16-26(23-11-4-5-12-24(23)28)31(29-25)27(32)18-30-14-13-19-7-2-3-8-21(19)17-30\/h2-12,15,26H,13-14,16-18H2,1H3\/t26-\/m1\/s1."}", "/scratch/micpie/export/iupac_smiles/valid_6-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the chemical with SELFIES [C][O][C][C][C][C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][N] is N-(2-aminoethyl)-2-[(5-methoxy-1-oxopentyl)amino]benzamide."} {"text":"The CAS-like IUPAC name of the molecule with InChI InChI=1S\/C18H24N2O\/c1-14(2)20(12-15-7-5-4-6-8-15)13-16-9-10-18(21-3)17(19)11-16\/h4-11,14H,12-13,19H2,1-3H3 is 2-methoxy-5-[[(phenylmethyl)-propan-2-ylamino]methyl]aniline."}", "/scratch/micpie/export/iupac_smiles/test_25-8.jsonl": "{"text":"Task: Please generate the SMILES representation of a compound based on the systematic IUPAC name.\nIUPAC name: 5-(1,3-benzoxazol-5-yl)-N-[(1R)-1-cyclopropyl-2,2,2-tris(fluoranyl)ethyl]-7-methyl-pyrazolo[1,5-a]pyrimidine-3-carboxamide\nResult: Cc1cc(-c2ccc3ocnc3c2)nc2c(C(=O)N[C@H](C3CC3)C(F)(F)F)cnn12"} {"text":"Task: Please give me the SMILES representation of a molecule given the systematic IUPAC name.\nIUPAC name: 3-ethyl-5,6,7,8-tetrahydrobenzo[f][1]benzofuran-2-carboxylic acid\nResult: InChI=1S\/C15H16O3\/c1-2-11-12-7-9-5-3-4-6-10(9)8-13(12)18-14(11)15(16)17\/h7-8H,2-6H2,1H3,(H,16,17)"}", 
"/scratch/micpie/export/iupac_smiles/test_20-9.jsonl": "{"text":"Task: Please give me the SMILES representation of a chemical based on the IUAPC name in CAS-like style.\nIUPAC name: (2S,8S,12S)-4,10-bis(2-ethoxyphenyl)-3,5,9,11-tetraoxo-4,10-diazatetracyclo[5.5.2.02,6.08,12]tetradec-13-ene-13-carboxylic acid phenyl ester\nResult: CCOC1=CC=CC=C1N2C(=O)[C@@H]3[C@@H](C2=O)C4[C@H]5C(C3C=C4C(=O)OC6=CC=CC=C6)C(=O)N(C5=O)C7=CC=CC=C7OCC"} {"text":"Task: Please generate the SMILES representation of a compound given the IUAPC name in CAS-like style.\nIUPAC name: (2S)-2-[[3-benzoyloxy-2-[3-[4-(3-fluorophenyl)-1-triazolyl]-5-[[2-[[4-[4-(3-fluorophenyl)-1-triazolyl]-3,5-dihydroxy-6-(hydroxymethyl)-2-oxanyl]oxy]ethylamino]-oxomethyl]-2-[(3,4,5-trihydroxy-6-methyl-2-oxanyl)oxy]cyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)-4-oxanyl]oxy]-3-cyclohexylpropanoic acid\nResult: [C][C][C][Branch2][O][=Branch2][C][Branch2][O][Ring2][C][Branch2][#Branch2][#C][C][Branch1][Ring2][O][Ring1][=Branch1][O][C][C][Branch2][Branch2][#C][C][C][Branch2][Branch1][=Branch2][C][C][Ring1][=Branch1][O][C][C][Branch2][Ring2][C][C][Branch1][=N][C][Branch1][=Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][C@@H1][Branch1][#Branch2][C][C][C][C][C][C][C][Ring1][=Branch1][C][=Branch1][C][=O][O][O][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][O][C][C][Branch2][Ring2][Ring2][C][Branch1][=N][C][Branch1][=Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][N][C][=C][Branch1][Branch1][N][=N][Ring1][Branch1][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][F][O][N][C][=C][Branch1][Branch1][N][=N][Ring1][Branch1][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][F][O][O][O]"}", "/scratch/micpie/export/iupac_smiles/train_27-2.jsonl": "{"text":"The preferred IUPAC name of the compound with SELFIES 
[C][C][N][Branch2][Ring2][=N][C][C][N][Ring1][=Branch1][C][C][=Branch1][C][=O][N][C@H1][Branch2][Ring1][Ring1][C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][F][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl][C][=C][C][=C][C][=C][Ring1][=Branch1][F] is 1-[(3R)-3-(4-chlorophenyl)-5-(4-fluorophenyl)-3,4-dihydropyrazol-2-yl]-2-[4-(2-fluorophenyl)piperazin-1-yl]ethanone."} {"text":"The preferred IUPAC name of the molecule with canonical SMILES CC1(C)c2ccccc2-c2ccc(Nc3ccc(-c4c(O)c(O)c(-c5c(O)c(O)c(O)c(O)c5O)c(O)c4O)cc3)cc21 is 6-[4-[4-[(9,9-dimethylfluoren-2-yl)amino]phenyl]-2,3,5,6-tetrahydroxyphenyl]benzene-1,2,3,4,5-pentol."}", "/scratch/micpie/export/iupac_smiles/test_10-3.jsonl": "{"text":"The SELFIES of the molecule with traditional IUPAC name 1-[3-[allyl(4-piperidyl)amino]-5-chloro-2-methyl-phenyl]ethanol is [C][C][=C][Branch2][Ring1][=Branch2][C][=C][Branch2][Ring1][Ring1][C][=C][Ring1][=Branch1][N][Branch1][Ring2][C][C][=C][C][C][C][N][C][C][Ring1][=Branch1][Cl][C][Branch1][C][C][O]."} {"text":"The SMILES of the molecule with traditional IUPAC name 2-[4-[3-(3-fluoro-6-methoxy-4-quinolyl)propyl]-1-[2-(3-thienylthio)ethyl]-3-piperidyl]acetic acid is COC1=CC2=C(C(=CN=C2C=C1)F)CCCC3CCN(CC3CC(=O)O)CCSC4=CSC=C4."}", "/scratch/micpie/export/iupac_smiles/train_13-2.jsonl": "{"text":"The preferred IUPAC name of the molecule with SELFIES [C][C][Branch1][C][C][Branch1][C][C][C][Branch1][N][C][=Branch1][C][=O][N][Branch1][C][C][C][C][#C][N] is 2-amino-N,3,3-trimethyl-N-prop-2-ynylbutanamide."} {"text":"The IUPAC name of the molecule with InChI InChI=1S\/C16H18N2O3\/c1-10-15(11(2)18(3)17-10)21-16(19)13-8-9-20-14-7-5-4-6-12(13)14\/h4-7,13H,8-9H2,1-3H3\/t13-\/m1\/s1 is (1,3,5-trimethylpyrazol-4-yl) (4R)-3,4-dihydro-2H-chromene-4-carboxylate."}", "/scratch/micpie/export/iupac_smiles/valid_16-6.jsonl": "{"text":"The SMILES of the chemical with preferred IUPAC name 5-chloro-4-(chloromethyl)-2-(3-iodophenoxy)pyridine 
is C1=CC(=CC(=C1)I)OC2=NC=C(C(=C2)CCl)Cl."} {"text":"The InChI of the molecule with preferred IUPAC name N-[2-(2,5-dichlorophenyl)ethyl]-2-ethoxy-3-[[4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl]benzimidazole-4-carboxamide is InChI=1S\/C32H27Cl2N7O2\/c1-2-43-32-36-28-9-5-8-26(31(42)35-17-16-22-18-23(33)14-15-27(22)34)29(28)41(32)19-20-10-12-21(13-11-20)24-6-3-4-7-25(24)30-37-39-40-38-30\/h3-15,18H,2,16-17,19H2,1H3,(H,35,42)(H,37,38,39,40)."}", "/scratch/micpie/export/iupac_smiles/valid_24-7.jsonl": "{"text":"Task: Please generate the SMILES of a compound based on the traditional IUPAC name.\nIUPAC name: 2-(1,4-diketo-3H-phthalazin-2-yl)-N-(2-hydroxyphenyl)acetamide\nResult: C=CC=CC=C6)C=O)NNC6=O))CC=O)NC=CC=CC=C6O"} {"text":"Task: Please create the SMILES of a compound based on the traditional IUPAC name.\nIUPAC name: 5-(7-fluoro-1,3-benzoxazol-5-yl)-7-methyl-N-(2,2,2-trifluoro-1-methyl-ethyl)pyrazolo[1,5-a]pyrimidine-3-carboxamide\nResult: [C][C][=C][C][=Branch2][Ring1][=C][=N][C][=C][Branch1][Branch2][C][=N][N][Ring1][=Branch2][Ring1][Branch1][C][=Branch1][C][=O][N][C][Branch1][C][C][C][Branch1][C][F][Branch1][C][F][F][C][=C][C][=C][Branch1][Branch2][C][=Branch1][Ring2][=C][Ring1][=Branch1][F][O][C][=N][Ring1][Branch2]"}", "/scratch/micpie/export/iupac_smiles/train_24-8.jsonl": "{"text":"Task: Please give me the SMILES of a molecule given the systematic IUPAC name.\nIUPAC name: N-(benzimidazol-1-yl)-2-(2-phenoxyethanoylamino)ethanamide\nResult: C1=CC=C(C=C1)OCC(=O)NCC(=O)NN2C=NC3=CC=CC=C32"} {"text":"Task: Please generate the SMILES of a compound given the systematic IUPAC name.\nIUPAC name: N-[(1S)-1-cyclopropyl-2,2,2-tris(fluoranyl)ethyl]-7-methyl-5-[4-(trifluoromethyl)pyridin-3-yl]pyrazolo[1,5-a]pyrimidine-3-carboxamide\nResult: Cc1cc(-c2cnccc2C(F)(F)F)nc2c(C(=O)N[C@@H](C3CC3)C(F)(F)F)cnn12"}", "/scratch/micpie/export/iupac_smiles/valid_20-7.jsonl": "{"text":"Task: Please give me the SMILES representation of a compound given the traditional IUPAC name.\nIUPAC 
name: 4-acetoxy-7-(2-hydroxy-5-keto-2-oct-2-enyl-cyclopent-3-en-1-ylidene)hept-5-enoic acid methyl ester\nResult: CCCCCC=CCC1(C=CC(=O)C1=CC=CC(CCC(=O)OC)OC(=O)C)O"} {"text":"Task: Please give me the SMILES representation of a molecule based on the traditional IUPAC name.\nIUPAC name: 2-[2-[2-(acetonylamino)ethylamino]ethylamino]acetic acid\nResult: CC(=O)CNCCNCCNCC(=O)O"}", "/scratch/micpie/export/iupac_smiles/valid_5-4.jsonl": "{"text":"The DeepSMILES of the molecule with systematic IUPAC name 4-[1-(aminomethyl)cyclooctyl]-3-ethyl-piperidin-4-ol is CCCCNCCC6CCCCCCCC8)))))))CN)))O."} {"text":"The SELFIES of the chemical with systematic IUPAC name N-(2-azanylethyl)-2-[3-(cyclopentylcarbonylamino)butanoylamino]benzamide;hydrochloride is [C][C][Branch2][Ring1][#Branch1][C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][N][N][C][=Branch1][C][=O][C][C][C][C][C][Ring1][Branch1].[Cl]."}", "/scratch/micpie/export/iupac_smiles/train_15-1.jsonl": "{"text":"The CAS-like IUPAC name of the chemical with SELFIES [C][C][=N][C][=C][C][=C][Ring1][=Branch1][N][C][=C][Ring1][Branch1][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=Branch1][C][=O][O] is 1-methyl-9H-pyrido[3,4-b]indole-7-carboxylic acid."} {"text":"The CAS-like IUPAC name of the chemical with InChI InChI=1S\/C15H10Cl2N2O\/c16-8-11-7-14(19-9-12(11)17)20-13-5-1-3-10-4-2-6-18-15(10)13\/h1-7,9H,8H2 is 8-[[5-chloro-4-(chloromethyl)-2-pyridinyl]oxy]quinoline."}", "/scratch/micpie/export/iupac_smiles/train_25-7.jsonl": "{"text":"Task: Please generate the SMILES representation of a compound given the traditional IUPAC name.\nIUPAC name: N-(4-butoxycyclohexyl)-5-(5-chloro-3-pyridyl)-7-methyl-pyrazolo[1,5-a]pyrimidine-3-carboxamide\nResult: CCCCOC1CCC(CC1)NC(=O)C2=C3N=C(C=C(N3N=C2)C)C4=CC(=CN=C4)Cl"} {"text":"Task: Please give me the SMILES of a compound based on the traditional IUPAC name.\nIUPAC name: 8-ethyl-2,3-dihydrothieno[2,3-g][1,4]benzodioxin-7-carboxylic acid\nResult: 
CCC=CSC=CC=CC=C69))OCCO6)))))))))C=O)O"}", "/scratch/micpie/export/iupac_smiles/valid_2-5.jsonl": "{"text":"The InChI of the chemical with CAS-like IUPAC name 4-[(3S,4S)-2,4-dimethylheptan-3-yl]phenol is InChI=1S\/C15H24O\/c1-5-6-12(4)15(11(2)3)13-7-9-14(16)10-8-13\/h7-12,15-16H,5-6H2,1-4H3\/t12-,15-\/m0\/s1."} {"text":"The SELFIES of the molecule with CAS-like IUPAC name 8-bromo-3-[2-(4-ethoxyphenyl)-4-thiazolyl]-6-nitro-1-benzopyran-2-one is [C][C][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=N][C][=Branch1][Branch1][=C][S][Ring1][Branch1][C][=C][C][=C][C][=Branch1][=C][=C][C][=Branch1][=Branch2][=C][Ring1][=Branch1][O][C][Ring1][#Branch2][=O][Br][N+1][=Branch1][C][=O][O-1]."}", "/scratch/micpie/export/iupac_smiles/train_1-2.jsonl": "{"text":"The preferred IUPAC name of the molecule with canonical SMILES C=C(Cl)\/C=C\\c1c(NCCCCN2CCCCC2)ccnc1C is 3-[(1Z)-3-chlorobuta-1,3-dienyl]-2-methyl-N-(4-piperidin-1-ylbutyl)pyridin-4-amine."} {"text":"The IUPAC name of the compound with canonical SMILES COc1ccc2c(c1)[C@@H](O)C[C@@H]1[C@@H]2CC[C@]2(C)[C@@H](O[C@@H]3CCCCO3)CC[C@@H]12 is (6S,8R,9S,13S,14S,17S)-3-methoxy-13-methyl-17-[(2R)-oxan-2-yl]oxy-6,7,8,9,11,12,14,15,16,17-decahydrocyclopenta[a]phenanthren-6-ol."}", "/scratch/micpie/export/iupac_smiles/test_0-4.jsonl": "{"text":"The canonical SMILES of the chemical with systematic IUPAC name 1-[(4-bromophenyl)methyl]-N-[(Z)-(3-methoxyphenyl)methylideneamino]piperidine-4-carboxamide is COc1cccc(\/C=N\\NC(=O)C2CCN(Cc3ccc(Br)cc3)CC2)c1."} {"text":"The SMILES of the compound with systematic IUPAC name 2-(5-methyl-1lambda3-ioda-3-thia-2-azacyclopenta-1,4-dien-4-yl)-1,3-thiazole-4-carbaldehyde is CC1=C(SN=I1)C2=NC(=CS2)C=O."}", "/scratch/micpie/export/iupac_smiles/train_10-9.jsonl": "{"text":"Task: Please give me the SMILES of a compound based on the IUAPC name in CAS-like style.\nIUPAC name: N-[4-[5-chloro-3-(dihydroxymethyl)-N,2-diethylanilino]cyclohexyl]carbamic acid tert-butyl ester\nResult: 
CCc1c(C(O)O)cc(Cl)cc1N(CC)C1CCC(NC(=O)OC(C)(C)C)CC1"} {"text":"Task: Please give me the SMILES of a chemical based on the CAS-like IUPAC name.\nIUPAC name: acetic acid [(3aS)-3a-carbonofluoridoyl-5a,5b,8,8,11a-pentamethyl-1-propan-2-yl-1,2,3,4,5,6,7,7a,9,10,11,11b,12,13,13a,13b-hexadecahydrocyclopenta[a]chrysen-9-yl] ester\nResult: CC(=O)OC1CCC2(C)C(CCC3(C)C2CCC2C4C(C(C)C)CC[C@]4(C(=O)F)CCC23C)C1(C)C"}", "/scratch/micpie/export/iupac_smiles/test_11-3.jsonl": "{"text":"The DeepSMILES of the chemical with traditional IUPAC name [3-(2,4-difluorophenoxy)-3-phenyl-propyl]-dimethyl-amine;2,5-dinitrobenzoic acid is CNC)CCCC=CC=CC=C6))))))OC=CC=CC=C6))F)))F.C=CC=CC=C6[N+]=O)[O-]))))C=O)O)))[N+]=O)[O-]."} {"text":"The SELFIES of the compound with traditional IUPAC name 4-[7-(4-hydroxyphenyl)-9,9-dioctyl-fluoren-2-yl]benzaldehyde is [C][C][C][C][C][C][C][C][C][Branch2][Ring2][P][C][=C][Branch2][Ring1][Branch1][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=O][C][=C][Ring1][P][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][O][C][C][C][C][C][C][C][C]."}", "/scratch/micpie/export/iupac_smiles/test_21-2.jsonl": "{"text":"The IUPAC name of the compound with canonical SMILES CCC1CC(C(=O)NCCNC(=O)C2CC(OCc3cc4ccccc4oc3=O)C(O)C(OC3OC(CO)C(O)C(OCc4cc5ccccc5oc4=O)C3O)C2)CC(OC2OC(CO)C(O)C(O[C@@H](CC3CCCCC3)C(=O)N3CCC3)C2OC(=O)c2ccccc2)C1OC1OC(C)C(O)C(O)C1O is [4-[(2S)-1-(azetidin-1-yl)-3-cyclohexyl-1-oxopropan-2-yl]oxy-2-[5-[2-[[3-[3,5-dihydroxy-6-(hydroxymethyl)-4-[(2-oxochromen-3-yl)methoxy]oxan-2-yl]oxy-4-hydroxy-5-[(2-oxochromen-3-yl)methoxy]cyclohexanecarbonyl]amino]ethylcarbamoyl]-3-ethyl-2-(3,4,5-trihydroxy-6-methyloxan-2-yl)oxycyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)oxan-3-yl] benzoate."} {"text":"The preferred IUPAC name of the compound with DeepSMILES COCC=NC=CN=C6C=CC=CC=C6))OCF)F)F is 2-(methoxymethyl)-3-[4-(trifluoromethoxy)phenyl]pyrazine."}", 
"/scratch/micpie/export/iupac_smiles/valid_6-0.jsonl": "{"text":"The traditional IUPAC name of the compound with SELFIES [C][O][C][C][C][C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][N] is N-(2-aminoethyl)-2-(5-methoxypentanoylamino)benzamide."} {"text":"The traditional IUPAC name of the molecule with SMILES CC(C)N(CC1=CC=CC=C1)CC2=CC(=C(C=C2)OC)N is (3-amino-4-methoxy-benzyl)-benzyl-isopropyl-amine."}", "/scratch/micpie/export/iupac_smiles/test_8-5.jsonl": "{"text":"The DeepSMILES of the chemical with CAS-like IUPAC name N-[3-[(3,4-dimethylphenyl)sulfonylamino]phenyl]-2-methylpropanamide is CC=CC=CC=C6))S=O)=O)NC=CC=CC=C6)NC=O)CC)C)))))))))))))C."} {"text":"The InChI of the compound with IUAPC name in CAS-like style 1-(tert-butylamino)-3-[(diphenylmethylene)amino]oxy-2-propanol is InChI=1S\/C20H26N2O2\/c1-20(2,3)21-14-18(23)15-24-22-19(16-10-6-4-7-11-16)17-12-8-5-9-13-17\/h4-13,18,21,23H,14-15H2,1-3H3."}", "/scratch/micpie/export/iupac_smiles/valid_10-4.jsonl": "{"text":"The canonical SMILES of the chemical with systematic IUPAC name methyl 3-[hexan-2-yl(methyl)amino]-2-methyl-benzoate is CCCCC(C)N(C)c1cccc(C(=O)OC)c1C."} {"text":"The SMILES of the molecule with systematic IUPAC name (4S,7R,8S,9S,13Z,16S)-16-[(Z)-1-fluoranyl-2-pyridin-2-yl-ethenyl]-5,5,9,13-tetramethyl-4,8-bis(oxidanyl)-7-propyl-1-azacyclohexadec-13-ene-2,6-dione is CCC[C@@H]1[C@H]([C@H](CCC\/C(=C\\C[C@H](NC(=O)C[C@@H](C(C1=O)(C)C)O)\/C(=C\/C2=CC=CC=N2)\/F)\/C)C)O."}", "/scratch/micpie/export/iupac_smiles/test_14-0.jsonl": "{"text":"The traditional IUPAC name of the chemical with DeepSMILES CC=CC=NN5C)))C))OC=O)[C@H]CC=O)NC5)CCF)F)F is (3S)-5-keto-1-(2,2,2-trifluoroethyl)pyrrolidine-3-carboxylic acid (1,3,5-trimethylpyrazol-4-yl) ester."} {"text":"The traditional IUPAC name of the compound with DeepSMILES C=CC=CC=C6O))I)))Br))CF)F)F is 4-bromo-2-iodo-5-(trifluoromethyl)phenol."}", "/scratch/micpie/export/iupac_smiles/test_25-2.jsonl": "{"text":"The 
preferred IUPAC name of the chemical with SELFIES [C][C][=C][C][=Branch2][Ring2][C][=N][C][=C][Branch1][Branch2][C][=N][N][Ring1][=Branch2][Ring1][Branch1][C][=Branch1][C][=O][N][C@H1][Branch1][=Branch1][C][C][C][Ring1][Ring1][C][Branch1][C][F][Branch1][C][F][F][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][O][C][=N][Ring1][#Branch1] is 5-(1,3-benzoxazol-5-yl)-N-[(1R)-1-cyclopropyl-2,2,2-trifluoroethyl]-7-methylpyrazolo[1,5-a]pyrimidine-3-carboxamide."} {"text":"The preferred IUPAC name of the chemical with SMILES CCC1=C(OC2=C1C=C3CCCCC3=C2)C(=O)O is 3-ethyl-5,6,7,8-tetrahydrobenzo[f][1]benzofuran-2-carboxylic acid."}", "/scratch/micpie/export/iupac_smiles/test_13-2.jsonl": "{"text":"The preferred IUPAC name of the compound with DeepSMILES CCC)C)CC=O)NCC)CO))CO)))))N is 2-amino-N-(1,3-dihydroxy-2-methylpropan-2-yl)-3,3-dimethylbutanamide."} {"text":"The IUPAC name of the molecule with InChI InChI=1S\/C16H18N2O3\/c1-10-15(11(2)18(3)17-10)21-16(19)13-8-9-20-14-7-5-4-6-12(13)14\/h4-7,13H,8-9H2,1-3H3\/t13-\/m0\/s1 is (1,3,5-trimethylpyrazol-4-yl) (4S)-3,4-dihydro-2H-chromene-4-carboxylate."}", "/scratch/micpie/export/iupac_smiles/valid_10-2.jsonl": "{"text":"The preferred IUPAC name of the chemical with SMILES CCCCC(C)N(C)C1=CC=CC(=C1C)C(=O)OC is methyl 3-[hexan-2-yl(methyl)amino]-2-methylbenzoate."} {"text":"The IUPAC name of the compound with InChI InChI=1S\/C29H43FN2O4\/c1-6-10-22-27(35)20(3)12-9-11-19(2)14-15-24(23(30)17-21-13-7-8-16-31-21)32-26(34)18-25(33)29(4,5)28(22)36\/h7-8,13-14,16-17,20,22,24-25,27,33,35H,6,9-12,15,18H2,1-5H3,(H,32,34)\/b19-14-,23-17-\/t20-,22+,24-,25-,27-\/m0\/s1 is (4S,7R,8S,9S,13Z,16S)-16-[(Z)-1-fluoro-2-pyridin-2-ylethenyl]-4,8-dihydroxy-5,5,9,13-tetramethyl-7-propyl-1-azacyclohexadec-13-ene-2,6-dione."}", "/scratch/micpie/export/iupac_smiles/train_20-6.jsonl": "{"text":"The SMILES of the compound with IUPAC name 
[4-hydroxy-6-[[2,6,13,17,17-pentamethyl-6-(4-methylpentyl)-4,8-dioxo-7-oxapentacyclo[10.8.0.02,9.05,9.013,18]icos-11-en-16-yl]oxy]-5-(3,4,5-trihydroxy-6-methyloxan-2-yl)oxyoxan-3-yl] hydrogen sulfate is CC1C(C(C(C(O1)OC2C(C(COC2OC3CCC4(C(C3(C)C)CCC5C4=CCC67C5(CC(=O)C6C(OC7=O)(C)CCCC(C)C)C)C)OS(=O)(=O)O)O)O)O)O."} {"text":"The InChI of the chemical with IUPAC name [4-[(2S)-1-(azetidin-1-yl)-1-oxopentan-2-yl]oxy-2-[5-[2-[[3-[4-(3-fluorophenyl)triazol-1-yl]-5-[4-[4-(3-fluorophenyl)triazol-1-yl]-3,5-dihydroxy-6-(hydroxymethyl)oxan-2-yl]oxy-4-hydroxycyclohexanecarbonyl]amino]ethylcarbamoyl]-3-methyl-2-(3,4,5-trihydroxy-6-methyloxan-2-yl)oxycyclohexyl]oxy-5-hydroxy-6-(hydroxymethyl)oxan-3-yl] benzoate is InChI=1S\/C66H85F2N9O21\/c1-4-11-44(62(89)75-20-10-21-75)92-58-53(83)48(31-79)96-66(59(58)97-63(90)34-12-6-5-7-13-34)94-46-27-37(22-32(2)57(46)98-65-56(86)55(85)50(80)33(3)91-65)60(87)69-18-19-70-61(88)38-25-43(76-28-41(71-73-76)35-14-8-16-39(67)23-35)51(81)45(26-38)93-64-54(84)49(52(82)47(30-78)95-64)77-29-42(72-74-77)36-15-9-17-40(68)24-36\/h5-9,12-17,23-24,28-29,32-33,37-38,43-59,64-66,78-86H,4,10-11,18-22,25-27,30-31H2,1-3H3,(H,69,87)(H,70,88)\/t32?,33?,37?,38?,43?,44-,45?,46?,47?,48?,49?,50?,51?,52?,53?,54?,55?,56?,57?,58?,59?,64?,65?,66?\/m0\/s1."}", "/scratch/micpie/export/iupac_smiles/train_22-5.jsonl": "{"text":"The SMILES of the molecule with IUAPC name in CAS-like style 6-methyl-5-(5-methyl-1,3,4-oxadiazol-2-yl)-5-methylsulfonyl-4-(trifluoromethyl)-1-cyclohexa-1,3-dienecarboxamide is CC1C(=CC=C(C1(C2=NN=C(O2)C)S(=O)(=O)C)C(F)(F)F)C(=O)N."} {"text":"The DeepSMILES of the compound with CAS-like IUPAC name 1-propylsulfonyl-3-piperidinecarboxaldehyde is CCCS=O)=O)NCCCCC6)C=O."}", "/scratch/micpie/export/iupac_smiles/train_3-0.jsonl": "{"text":"The traditional IUPAC name of the chemical with SELFIES [C][C][C][N][Branch1][Ring2][C][Ring1][Branch1][C][=Branch1][C][=O][C][=C][C][=C][Branch1][Ring2][O][Ring1][Branch1][N+1][=Branch1][C][=O][O-1] is 
(5-nitro-2-furyl)-pyrrolidino-methanone."} {"text":"The traditional IUPAC name of the molecule with SMILES C1CC(=O)N(C2=C1C=C3C=C2OC\/C=C\/CO[C@H]4C[C@H](N(C4)C5=NC6=C(C=NN6C(=C5)N3)Cl)CO)CC7=CC=CC=C7 is (12S,14S,17E)-23-benzyl-7-chloro-12-methylol-15,20-dioxa-2,4,5,9,11,23-hexazahexacyclo[19.7.1.13,10.111,14.04,8.022,27]hentriaconta-1(28),3(31),5,7,9,17,21(29),22(27)-octaen-24-one."}", "/scratch/micpie/export/iupac_smiles/test_2-3.jsonl": "{"text":"The DeepSMILES of the molecule with traditional IUPAC name (2R)-2-(methylamino)-1-(o-tolyl)propan-1-one is CC=CC=CC=C6C=O)[C@@H]C)NC."} {"text":"The InChI of the compound with traditional IUPAC name 2-(6-bromo-2,3-dihydro-1,4-benzodioxin-7-yl)-4,5,6,7-tetrachloro-isoindoline-1,3-quinone is InChI=1S\/C16H6BrCl4NO4\/c17-5-3-7-8(26-2-1-25-7)4-6(5)22-15(23)9-10(16(22)24)12(19)14(21)13(20)11(9)18\/h3-4H,1-2H2."}", "/scratch/micpie/export/iupac_smiles/valid_7-4.jsonl": "{"text":"The canonical SMILES of the compound with systematic IUPAC name 2-methyl-N-[2-[[4-(trifluoromethylsulfanyl)phenyl]methylamino]ethyl]propanamide is CC(C)C(=O)NCCNCc1ccc(SC(F)(F)F)cc1."} {"text":"The InChI of the chemical with systematic IUPAC name 3-[phenyl(thiophen-2-ylsulfonyl)amino]propanamide is InChI=1S\/C13H14N2O3S2\/c14-12(16)8-9-15(11-5-2-1-3-6-11)20(17,18)13-7-4-10-19-13\/h1-7,10H,8-9H2,(H2,14,16)."}", "/scratch/micpie/export/iupac_smiles/test_22-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the chemical with SELFIES [C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][=Branch1][C][=O][O][C][Ring1][Branch2][=N][N][Ring1][O][Br] is 2-bromo-5,7-dioxa-2,3-diazatricyclo[6.3.1.04,12]dodeca-1(12),3,8,10-tetraen-6-one."} {"text":"The CAS-like IUPAC name of the molecule with SMILES C1CC(CN(C1)S(=O)(=O)C2=CC3=C(C=C2)NC(=O)C3)C=O is 1-[(2-oxo-1,3-dihydroindol-5-yl)sulfonyl]-3-piperidinecarboxaldehyde."}", "/scratch/micpie/export/iupac_smiles/test_9-2.jsonl": "{"text":"The IUPAC name of the chemical with canonical SMILES 
CC(CCc1ccccc1)NCC(O)CON=C1c2ccccc2-c2ccccc21 is 1-(fluoren-9-ylideneamino)oxy-3-(4-phenylbutan-2-ylamino)propan-2-ol."} {"text":"The IUPAC name of the compound with canonical SMILES CCC(=O)N(CC)C1CC(Cl)CC(C(=O)NCC2C(=O)NC(C)CC2C)[C@@H]1C is (2S)-5-chloro-N-[(4,6-dimethyl-2-oxopiperidin-3-yl)methyl]-3-[ethyl(propanoyl)amino]-2-methylcyclohexane-1-carboxamide."}", "/scratch/micpie/export/iupac_smiles/train_7-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with canonical SMILES COC(=O)CNS(=O)(=O)c1cc(N)ccc1Cl is 2-[(5-amino-2-chloro-phenyl)sulfonylamino]acetic acid methyl ester."} {"text":"The traditional IUPAC name of the chemical with InChI InChI=1S\/C15H15N3O5S\/c16-15(19)10-11-17(12-6-2-1-3-7-12)24(22,23)14-9-5-4-8-13(14)18(20)21\/h1-9H,10-11H2,(H2,16,19) is 3-(N-(2-nitrophenyl)sulfonylanilino)propionamide."}", "/scratch/micpie/export/iupac_smiles/test_23-4.jsonl": "{"text":"The SELFIES of the molecule with systematic IUPAC name 1-(3-nitro-4-oxidanyl-phenyl)sulfonylpiperidine-3-carbaldehyde is [C][C][C][Branch2][Ring1][P][C][N][Branch1][Ring2][C][Ring1][=Branch1][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=Branch1][=Branch2][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][O][N+1][=Branch1][C][=O][O-1][C][=O]."} {"text":"The SELFIES of the compound with systematic IUPAC name N-(benzimidazol-1-yl)-4-(ethylsulfamoyl)benzamide is [C][C][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][N][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][=Branch2]."}", "/scratch/micpie/export/iupac_smiles/test_25-7.jsonl": "{"text":"Task: Please give me the SMILES representation of a molecule given the traditional IUPAC name.\nIUPAC name: 5-(1,3-benzoxazol-5-yl)-N-[(1R)-1-cyclopropyl-2,2,2-trifluoro-ethyl]-7-methyl-pyrazolo[1,5-a]pyrimidine-3-carboxamide\nResult: CC=CC=NC=CC=NN95)))C=O)N[C@H]CCC3)))CF)F)F))))))))C=CC=CC=C6))OC=N5"} {"text":"Task: Please create the SMILES representation of a 
compound based on the traditional IUPAC name.\nIUPAC name: 3-ethyl-5,6,7,8-tetrahydrobenzo[f]benzofuran-2-carboxylic acid\nResult: [C][C][C][=C][Branch2][Ring1][C][O][C][=C][Ring1][Branch1][C][=C][C][C][C][C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=Branch1][C][=O][O]"}", "/scratch/micpie/export/iupac_smiles/valid_11-3.jsonl": "{"text":"The SMILES of the molecule with traditional IUPAC name 13-chloro-2-(4-piperidylidene)-4-azatricyclo[9.4.0.03,8]pentadeca-1(11),3(8),4,6,12,14-hexaene;disulfate is C1CC2=C(C=CC(=C2)Cl)C(=C3CCNCC3)C4=C1C=CC=N4.[O-]S(=O)(=O)[O-].[O-]S(=O)(=O)[O-]."} {"text":"The SELFIES of the molecule with traditional IUPAC name 4-(7-bromo-9,9-dioctyl-fluoren-2-yl)benzaldehyde is [C][C][C][C][C][C][C][C][C][Branch2][Ring2][#Branch1][C][=C][Branch2][Ring1][Branch1][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=O][C][=C][Ring1][P][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Br][C][C][C][C][C][C][C][C]."}", "/scratch/micpie/export/iupac_smiles/train_4-0.jsonl": "{"text":"The traditional IUPAC name of the compound with DeepSMILES C[C@@]CC[C@H][C@@][C@H]6CC[C@@H][C@H]%10\/C=C\/C=NC=CC=C6))C=CC=CC=C6)))F)))))))))))CSCCS5)))))))))C)C=O)))O is (1S,2R,4aS,5R,6S,8aS)-6-(1,3-dithiolan-2-yl)-5-[(E)-2-[5-(3-fluorophenyl)-2-pyridyl]vinyl]-2-hydroxy-1,4a-dimethyl-decalin-1-carbaldehyde."} {"text":"The traditional IUPAC name of the chemical with SELFIES [C][C][C][C][N][C][C][C][Ring1][=Branch1][Branch1][Branch2][C][Branch1][Ring1][C][C][C][N][O] is 4-[1-(aminomethyl)propyl]-3-ethyl-piperidin-4-ol."}", "/scratch/micpie/export/iupac_smiles/valid_21-0.jsonl": "{"text":"The traditional IUPAC name of the chemical with SELFIES 
[C][C][C][Branch2][=N][#C][C][Branch2][=N][#Branch2][C][Branch2][=N][Branch1][C][Branch1][Ring2][O][Ring1][=Branch1][O][C][C][Branch2][O][Branch2][C][C][Branch2][Branch1][=C][C][C][Ring1][=Branch1][O][C][C][Branch2][Ring2][#Branch1][C][Branch1][=N][C][Branch1][=Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][C@@H1][Branch1][#Branch2][C][C][C][C][C][C][C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][C][Ring1][Ring2][O][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][N][C][=Branch1][C][=O][C][C][C][Branch2][Branch1][Ring1][C][Branch2][Ring2][=C][C][Branch1][Ring2][C][Ring1][=Branch1][O][C][C][Branch2][Ring1][P][C][Branch1][=N][C][Branch1][=Branch2][C][Branch1][Ring2][O][Ring1][=Branch1][C][O][O][O][C][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][Ring1][#Branch2][=O][O][O][O][C][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][Ring1][#Branch2][=O][N][C][=C][Branch1][Branch1][N][=N][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][O][O][O] is benzoic acid [4-[(1S)-2-(azetidin-1-yl)-1-(cyclohexylmethyl)-2-keto-ethoxy]-2-[5-[2-[[3-[3,5-dihydroxy-4-[(2-ketochromen-3-yl)methoxy]-6-methylol-tetrahydropyran-2-yl]oxy-4-hydroxy-5-[(2-ketochromen-3-yl)methoxy]cyclohexanecarbonyl]amino]ethylcarbamoyl]-3-(4-phenyltriazol-1-yl)-2-(3,4,5-trihydroxy-6-methyl-tetrahydropyran-2-yl)oxy-cyclohexoxy]-5-hydroxy-6-methylol-tetrahydropyran-3-yl] ester."} {"text":"The traditional IUPAC name of the compound with DeepSMILES C=CC=NC=C6F))OCCF)F)F)))))))[I-]C=NC=NN5 is 3-fluoro-5-(1H-1,2,4-triazol-5-yliodanuidyl)-2-(2,2,2-trifluoroethoxy)pyridine."}", "/scratch/micpie/export/iupac_smiles/test_17-6.jsonl": "{"text":"The SELFIES of the chemical with IUPAC name [4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl 2-ethoxy-1H-benzimidazole-4-carboxylate is 
[C][C][O][C][=N][C][=C][Branch1][#Branch2][C][=C][C][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][=Branch1][C][=O][O][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][N][=N][Ring1][Branch1]."} {"text":"The InChI of the molecule with preferred IUPAC name 2-[4-[2-[3-[(2E,4Z)-hexa-2,4-dien-3-yl]-3-methyl-5-(3-methylcyclohexa-1,5-dien-1-yl)cyclohexa-1,5-dien-1-yl]-3-iodo-10-(4-pyridin-2-ylphenyl)-3,4-dihydroanthracen-9-yl]phenyl]pyridine is InChI=1S\/C56H49IN2\/c1-5-14-45(6-2)56(4)35-43(42-16-13-15-37(3)31-42)32-44(36-56)48-33-49-50(34-51(48)57)55(41-27-23-39(24-28-41)53-20-10-12-30-59-53)47-18-8-7-17-46(47)54(49)40-25-21-38(22-26-40)52-19-9-11-29-58-52\/h5-14,16-33,36-37,51H,15,34-35H2,1-4H3\/b14-5-,45-6+."}", "/scratch/micpie/export/iupac_smiles/test_25-6.jsonl": "{"text":"The DeepSMILES of the compound with preferred IUPAC name 5-(1,3-benzoxazol-5-yl)-N-[(1R)-1-cyclopropyl-2,2,2-trifluoroethyl]-7-methylpyrazolo[1,5-a]pyrimidine-3-carboxamide is CC=CC=NC=CC=NN95)))C=O)N[C@H]CCC3)))CF)F)F))))))))C=CC=CC=C6))OC=N5."} {"text":"The DeepSMILES of the chemical with preferred IUPAC name 3-ethyl-5,6,7,8-tetrahydrobenzo[f][1]benzofuran-2-carboxylic acid is CCC=COC=C5C=CCCCCC6=C%10)))))))))))C=O)O."}", "/scratch/micpie/export/iupac_smiles/valid_12-4.jsonl": "{"text":"The InChI of the molecule with systematic IUPAC name ethyl 8-[6-(4,4-dioctoxybutanoyloxy)hexyl-(2-hydroxyethyl)amino]octanoate is InChI=1S\/C38H75NO7\/c1-4-7-9-11-17-24-34-45-38(46-35-25-18-12-10-8-5-2)28-27-37(42)44-33-23-19-16-22-30-39(31-32-40)29-21-15-13-14-20-26-36(41)43-6-3\/h38,40H,4-35H2,1-3H3."} {"text":"The DeepSMILES of the compound with systematic IUPAC name 2-azanyl-3,3-dimethyl-N-[5,5,5-tris(fluoranyl)pentyl]butanamide is CCC)C)CC=O)NCCCCCF)F)F))))))))N."}", "/scratch/micpie/export/iupac_smiles/train_7-4.jsonl": "{"text":"The SELFIES of the molecule with systematic IUPAC name methyl 2-[(5-azanyl-2-chloranyl-phenyl)sulfonylamino]ethanoate is 
[C][O][C][=Branch1][C][=O][C][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][Branch1][#Branch2][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][N][Cl]."} {"text":"The DeepSMILES of the molecule with systematic IUPAC name 3-[(2-nitrophenyl)sulfonyl-phenyl-amino]propanamide is C=CC=CC=C6))NCCC=O)N))))S=O)=O)C=CC=CC=C6[N+]=O)[O-]."}", "/scratch/micpie/export/iupac_smiles/train_25-1.jsonl": "{"text":"The IUAPC name in CAS-like style of the compound with DeepSMILES CCCCOCCCCCC6))NC=O)C=CN=CC=CN6N=C9)))C)))C=CC=CN=C6)))Cl is N-(4-butoxycyclohexyl)-5-(5-chloro-3-pyridinyl)-7-methyl-3-pyrazolo[1,5-a]pyrimidinecarboxamide."} {"text":"The IUAPC name in CAS-like style of the chemical with SELFIES [C][C][C][=C][Branch2][Ring1][Ring2][S][C][=C][C][=C][Branch1][#Branch1][C][=C][Ring1][=Branch1][Ring1][=Branch2][O][C][C][O][Ring1][Branch2][C][=Branch1][C][=O][O] is 8-ethyl-2,3-dihydrothieno[2,3-g][1,4]benzodioxin-7-carboxylic acid."}", "/scratch/micpie/export/iupac_smiles/train_23-3.jsonl": "{"text":"The canonical SMILES of the compound with traditional IUPAC name 1-(2-nitrophenyl)sulfonylnipecotaldehyde is O=CC1CCCN(S(=O)(=O)c2ccccc2[N+](=O)[O-])C1."} {"text":"The canonical SMILES of the chemical with traditional IUPAC name N-(benzimidazol-1-yl)-4-(2,5-dimethylphenyl)-4-keto-butyramide is Cc1ccc(C)c(C(=O)CCC(=O)Nn2cnc3ccccc32)c1."}", "/scratch/micpie/export/iupac_smiles/train_3-2.jsonl": "{"text":"The IUPAC name of the chemical with SELFIES [C][C][C][N][Branch1][Ring2][C][Ring1][Branch1][C][=Branch1][C][=O][C][=C][C][=C][Branch1][Ring2][O][Ring1][Branch1][N+1][=Branch1][C][=O][O-1] is (5-nitrofuran-2-yl)-pyrrolidin-1-ylmethanone."} {"text":"The preferred IUPAC name of the chemical with InChI InChI=1S\/C31H31ClN6O4\/c32-25-16-33-38-28-15-27(35-31(25)38)36-18-24(14-23(36)19-39)41-10-4-5-11-42-26-13-22(34-28)12-21-8-9-29(40)37(30(21)26)17-20-6-2-1-3-7-20\/h1-7,12-13,15-16,23-24,34,39H,8-11,14,17-19H2\/b5-4+\/t23-,24-\/m0\/s1 is 
(12S,14S,17E)-23-benzyl-7-chloro-12-(hydroxymethyl)-15,20-dioxa-2,4,5,9,11,23-hexazahexacyclo[19.7.1.13,10.111,14.04,8.022,27]hentriaconta-1(28),3(31),5,7,9,17,21(29),22(27)-octaen-24-one."}", "/scratch/micpie/export/iupac_smiles/test_22-3.jsonl": "{"text":"The DeepSMILES of the compound with traditional IUPAC name 2-bromo-5,7-dioxa-2,3-diazatricyclo[6.3.1.04,12]dodeca-1(12),3,8,10-tetraen-6-one is C=CC=CC=C6)OC=O)OC6=NN9Br."} {"text":"The SMILES of the molecule with traditional IUPAC name 1-(2-ketoindolin-5-yl)sulfonylnipecotaldehyde is C1CC(CN(C1)S(=O)(=O)C2=CC3=C(C=C2)NC(=O)C3)C=O."}", "/scratch/micpie/export/iupac_smiles/valid_10-1.jsonl": "{"text":"The CAS-like IUPAC name of the compound with InChI InChI=1S\/C16H25NO2\/c1-6-7-9-12(2)17(4)15-11-8-10-14(13(15)3)16(18)19-5\/h8,10-12H,6-7,9H2,1-5H3 is 3-[hexan-2-yl(methyl)amino]-2-methylbenzoic acid methyl ester."} {"text":"The IUAPC name in CAS-like style of the compound with SMILES CCC[C@@H]1[C@H]([C@H](CCC\/C(=C\\C[C@H](NC(=O)C[C@@H](C(C1=O)(C)C)O)\/C(=C\/C2=CC=CC=N2)\/F)\/C)C)O is (4S,7R,8S,9S,13Z,16S)-16-[(Z)-1-fluoro-2-(2-pyridinyl)ethenyl]-4,8-dihydroxy-5,5,9,13-tetramethyl-7-propyl-1-azacyclohexadec-13-ene-2,6-dione."}", "/scratch/micpie/export/iupac_smiles/test_19-8.jsonl": "{"text":"Task: Please give me the SMILES of a compound based on the systematic IUPAC name.\nIUPAC name: N-(4-chlorophenyl)-3-[4-[2-(methylamino)ethyl]piperidin-1-yl]propanamide;hydrochloride\nResult: CNCCC1CCN(CC1)CCC(=O)NC2=CC=C(C=C2)Cl.Cl"} {"text":"Task: Please give me the SMILES representation of a compound given the systematic IUPAC name.\nIUPAC name: nan\nResult: 
InChI=1S\/C73H78N4O10\/c1-44-11-7-13-51-14-9-18-57-59-41-74-40-50(59)38-73(44,51)26-21-56(86-46(3)79)36-54(82)37-62(49-32-64(83)69(65(33-49)85-4)87-55-17-10-16-53(81)35-55)77-42-60-61(20-19-58-68(60)63(77)39-72(70(58)84)28-27-71(43-72)24-5-6-25-71)76(30-23-45(2)78)66-34-48(22-29-75-66)67(57)47-12-8-15-52(80)31-47\/h7-8,10-13,15-17,19-20,22,31-35,40-42,44,56-57,62,67,70,74-75,80-81,83-84H,5-6,14,21,23-30,36-39,43H2,1-4H3\/t44-,56+,57-,62-,67+,70+,72-,73-\/m0\/s1"}", "/scratch/micpie/export/iupac_smiles/valid_15-6.jsonl": "{"text":"The SELFIES of the molecule with IUPAC name 2-[5-[5-(4,4-difluoropiperidine-1-carbonyl)pyridin-2-yl]-7-(trifluoromethyl)-1-benzofuran-2-yl]ethyl methanesulfonate is [C][S][=Branch1][C][=O][=Branch1][C][=O][O][C][C][C][=C][C][=C][C][=Branch2][Ring1][Ring1][=C][C][=Branch1][#Branch1][=C][Ring1][=Branch1][O][Ring1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][=N][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][C][Branch1][Branch1][C][C][Ring1][=Branch1][Branch1][C][F][F]."} {"text":"The SELFIES of the chemical with preferred IUPAC name 5-chloro-4-(chloromethyl)-2-naphthalen-1-yloxypyridine is [C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][C][=C][Ring1][#Branch1][O][C][=N][C][=C][Branch1][=Branch2][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][Cl][Cl]."}", "/scratch/micpie/export/iupac_smiles/valid_23-3.jsonl": "{"text":"The SMILES of the compound with traditional IUPAC name 1-(3-chlorophenyl)sulfonylnipecotaldehyde is C1CC(CN(C1)S(=O)(=O)C2=CC(=CC=C2)Cl)C=O."} {"text":"The DeepSMILES of the molecule with traditional IUPAC name N-(benzimidazol-1-yl)-3-[(2,3,4,5,6-pentamethylphenyl)sulfonylamino]propionamide is CC=CC=CC=C6C))C))S=O)=O)NCCC=O)NNC=NC=CC=CC=C69))))))))))))))))C))C."}", "/scratch/micpie/export/iupac_smiles/train_16-7.jsonl": "{"text":"Task: Please give me the SMILES of a molecule given the traditional IUPAC name.\nIUPAC name: 
5-chloro-4-(chloromethyl)-2-(2,3,5-trimethylphenoxy)pyridine\nResult: Cc1cc(C)c(C)c(Oc2cc(CCl)c(Cl)cn2)c1"} {"text":"Task: Please give me the SMILES of a molecule based on the traditional IUPAC name.\nIUPAC name: 2-ethoxy-1H-benzimidazole-4-carboxylic acid isopropyl ester\nResult: CCOC1=NC2=C(C=CC=C2N1)C(=O)OC(C)C"}", "/scratch/micpie/export/iupac_smiles/train_15-3.jsonl": "{"text":"The canonical SMILES of the chemical with traditional IUPAC name 1-methyl-9H-beta-carboline-7-carboxylic acid is Cc1nccc2c1[nH]c1cc(C(=O)O)ccc12."} {"text":"The canonical SMILES of the molecule with traditional IUPAC name 8-[[5-chloro-4-(chloromethyl)-2-pyridyl]oxy]quinoline is ClCc1cc(Oc2cccc3cccnc23)ncc1Cl."}", "/scratch/micpie/export/iupac_smiles/train_7-5.jsonl": "{"text":"The SMILES of the compound with CAS-like IUPAC name 2-[(5-amino-2-chlorophenyl)sulfonylamino]acetic acid methyl ester is COC(=O)CNS(=O)(=O)C1=C(C=CC(=C1)N)Cl."} {"text":"The DeepSMILES of the compound with CAS-like IUPAC name 3-(N-(2-nitrophenyl)sulfonylanilino)propanamide is C=CC=CC=C6))NCCC=O)N))))S=O)=O)C=CC=CC=C6[N+]=O)[O-]."}", "/scratch/micpie/export/iupac_smiles/test_16-6.jsonl": "{"text":"The DeepSMILES of the compound with preferred IUPAC name 5-chloro-4-(chloromethyl)-2-(4-propoxyphenoxy)pyridine is CCCOC=CC=CC=C6))OC=NC=CC=C6)CCl)))Cl."} {"text":"The SMILES of the compound with IUPAC name 2-ethoxy-N-[(4-fluorophenyl)methyl]-3-[[4-[2-(2H-tetrazol-5-yl)phenyl]phenyl]methyl]benzimidazole-4-carboxamide is CCOC1=NC2=CC=CC(=C2N1CC3=CC=C(C=C3)C4=CC=CC=C4C5=NNN=N5)C(=O)NCC6=CC=C(C=C6)F."}", "/scratch/micpie/export/iupac_smiles/train_14-9.jsonl": "{"text":"Task: Please generate the SMILES of a compound given the IUAPC name in CAS-like style.\nIUPAC name: (3R)-5-oxo-1-(2,2,2-trifluoroethyl)-3-pyrrolidinecarboxylic acid (1,3,5-trimethyl-4-pyrazolyl) ester\nResult: InChI=1S\/C13H16F3N3O3\/c1-7-11(8(2)18(3)17-7)22-12(21)9-4-10(20)19(5-9)6-13(14,15)16\/h9H,4-6H2,1-3H3\/t9-\/m1\/s1"} {"text":"Task: Please generate the 
SMILES of a chemical given the CAS-like IUPAC name.\nIUPAC name: [7-chloro-5-[4-[(3-fluoro-1-pyrrolidinyl)sulfonyl]phenyl]-2-benzofuranyl]methanamine\nResult: InChI=1S\/C19H18ClFN2O3S\/c20-18-9-13(7-14-8-16(10-22)26-19(14)18)12-1-3-17(4-2-12)27(24,25)23-6-5-15(21)11-23\/h1-4,7-9,15H,5-6,10-11,22H2"}", "/scratch/micpie/export/iupac_smiles/test_7-2.jsonl": "{"text":"The IUPAC name of the compound with SELFIES [C][=C][C][=N][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=Branch1][=Branch2][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][O][N] is 3-amino-4-hydroxy-N-(pyridin-2-ylmethyl)benzenesulfonamide."} {"text":"The preferred IUPAC name of the chemical with canonical SMILES NC(=O)CCN(c1ccccc1)S(=O)(=O)c1ccc(F)cc1 is 3-(N-(4-fluorophenyl)sulfonylanilino)propanamide."}", "/scratch/micpie/export/iupac_smiles/train_23-2.jsonl": "{"text":"The preferred IUPAC name of the chemical with canonical SMILES O=CC1CCCN(S(=O)(=O)c2ccccc2[N+](=O)[O-])C1 is 1-(2-nitrophenyl)sulfonylpiperidine-3-carbaldehyde."} {"text":"The preferred IUPAC name of the chemical with SELFIES [C][C][=C][C][=Branch1][=Branch2][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][C][=Branch1][C][=O][C][C][C][=Branch1][C][=O][N][N][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][=Branch2] is N-(benzimidazol-1-yl)-4-(2,5-dimethylphenyl)-4-oxobutanamide."}", "/scratch/micpie/export/iupac_smiles/train_3-3.jsonl": "{"text":"The SMILES of the chemical with traditional IUPAC name (5-nitro-2-furyl)-pyrrolidino-methanone is C1CCN(C1)C(=O)C2=CC=C(O2)[N+](=O)[O-]."} {"text":"The canonical SMILES of the compound with traditional IUPAC name (12S,14S,17E)-23-benzyl-7-chloro-12-methylol-15,20-dioxa-2,4,5,9,11,23-hexazahexacyclo[19.7.1.13,10.111,14.04,8.022,27]hentriaconta-1(28),3(31),5,7,9,17,21(29),22(27)-octaen-24-one is O=C1CCc2cc3cc(c2N1Cc1ccccc1)OC\/C=C\/CO[C@H]1C[C@@H](CO)N(C1)c1cc(n2ncc(Cl)c2n1)N3."}", "/scratch/micpie/export/iupac_smiles/valid_5-6.jsonl": 
"{"text":"The InChI of the compound with preferred IUPAC name 4-[1-(aminomethyl)cyclooctyl]-3-ethylpiperidin-4-ol is InChI=1S\/C16H32N2O\/c1-2-14-12-18-11-10-16(14,19)15(13-17)8-6-4-3-5-7-9-15\/h14,18-19H,2-13,17H2,1H3."} {"text":"The SMILES of the compound with preferred IUPAC name N-(2-aminoethyl)-2-[3-(cyclopentanecarbonylamino)butanoylamino]benzamide;hydrochloride is CC(CC(=O)NC1=CC=CC=C1C(=O)NCCN)NC(=O)C2CCCC2.Cl."}", "/scratch/micpie/export/iupac_smiles/train_17-8.jsonl": "{"text":"Task: Please generate the SMILES representation of a compound based on the systematic IUPAC name.\nIUPAC name: 2-ethoxy-N-pyridin-4-yl-3-[[4-[2-(2H-1,2,3,4-tetrazol-5-yl)phenyl]phenyl]methyl]benzimidazole-4-carboxamide\nResult: [C][C][O][C][=N][C][=C][C][=C][C][=Branch2][Ring1][P][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][N][=N][Ring1][Branch1][C][=Branch1][C][=O][N][C][=C][C][=N][C][=C][Ring1][=Branch1]"} {"text":"Task: Please generate the SMILES representation of a molecule given the systematic IUPAC name.\nIUPAC name: 6-(2-fluorophenyl)-8-heptan-4-yl-quinazoline\nResult: [C][C][C][C][Branch1][Ring2][C][C][C][C][=C][C][=Branch1][N][=C][C][=C][N][=C][N][=C][Ring1][#Branch2][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][F]"}", "/scratch/micpie/export/iupac_smiles/test_10-0.jsonl": "{"text":"The traditional IUPAC name of the molecule with SELFIES [C][C][=C][Branch2][Ring1][=Branch2][C][=C][Branch2][Ring1][Ring1][C][=C][Ring1][=Branch1][N][Branch1][Ring2][C][C][=C][C][C][C][N][C][C][Ring1][=Branch1][Cl][C][Branch1][C][C][O] is 1-[3-[allyl(4-piperidyl)amino]-5-chloro-2-methyl-phenyl]ethanol."} {"text":"The traditional IUPAC name of the compound with SMILES COC1=CC2=C(C(=CN=C2C=C1)F)CCCC3CCN(CC3CC(=O)O)CCSC4=CSC=C4 is 2-[4-[3-(3-fluoro-6-methoxy-4-quinolyl)propyl]-1-[2-(3-thienylthio)ethyl]-3-piperidyl]acetic acid."}", 
"/scratch/micpie/export/iupac_smiles/train_13-6.jsonl": "{"text":"The DeepSMILES of the molecule with IUPAC name 2-amino-N,3,3-trimethyl-N-prop-2-ynylbutanamide is CCC)C)CC=O)NC)CC#C)))))N."} {"text":"The SELFIES of the molecule with preferred IUPAC name (1,3,5-trimethylpyrazol-4-yl) (4R)-3,4-dihydro-2H-chromene-4-carboxylate is [C][C][=C][Branch1][#Branch2][C][=Branch1][=Branch1][=N][N][Ring1][Branch1][C][C][O][C][=Branch1][C][=O][C@@H1][C][C][O][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1]."}", "/scratch/micpie/export/iupac_smiles/train_3-6.jsonl": "{"text":"The InChI of the compound with IUPAC name (5-nitrofuran-2-yl)-pyrrolidin-1-ylmethanone is InChI=1S\/C9H10N2O4\/c12-9(10-5-1-2-6-10)7-3-4-8(15-7)11(13)14\/h3-4H,1-2,5-6H2."} {"text":"The SMILES of the chemical with preferred IUPAC name (12S,14S,17E)-23-benzyl-7-chloro-12-(hydroxymethyl)-15,20-dioxa-2,4,5,9,11,23-hexazahexacyclo[19.7.1.13,10.111,14.04,8.022,27]hentriaconta-1(28),3(31),5,7,9,17,21(29),22(27)-octaen-24-one is C1CC(=O)N(C2=C1C=C3C=C2OC\/C=C\/CO[C@H]4C[C@H](N(C4)C5=NC6=C(C=NN6C(=C5)N3)Cl)CO)CC7=CC=CC=C7."}", "/scratch/micpie/export/compound_protein_compound_1/valid_5-2.jsonl": "{"text":"User: Can you come up with one example for a compound SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, of course, the compound SMILES Cc1ccc(C(=O)ONc2ccc(N=NC(=O)c3ccccc3O)cc2)cc1 targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you tell another compound SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Of course, the compound SMILES Cc1ccc(C(=O)ONc2ccc(N=NC(=O)c3ccccc3O)cc2)cc1 targets the compound SMILES COc1cc2ccc(=O)oc2c(O)c1O."} {"text":"User: Can you come up with an example for a compound SMILES that targets the protein Serotonin receptor 2A?\nAssistant: Sure, the compound SMILES Cc1cc(CC(=O)N[C@H]2CC[C@H](CCN3CCN(c4cccc5c4OCO5)CC3)CC2)on1 targets the protein Serotonin receptor 2A.\nUser: Can you generate another compound SMILES that 
targets the protein Serotonin receptor 2A?\nAssistant: Of course, the compound SMILES Cc1cc(CC(=O)N[C@H]2CC[C@H](CCN3CCN(c4cccc5c4OCO5)CC3)CC2)on1 targets the compound SMILES O=S(=O)(Nc1cc(Sc2cccc3cccnc23)c(O)c2ccccc12)c1ccc(F)cc1."}", "/scratch/micpie/export/compound_protein_compound_1/test_8-1.jsonl": "{"text":"The protein Kappa-type opioid receptor is targeted by the compound with the DeepSMILES CCC)[C@H][C@H]cccccc6C[C@H]cccccc6%15))))))N%11C and C[C@H]1CN(Cc2ccc3ccccc3n2)CC[C@@]1(C)c1cccc(C(N)=O)c1."} {"text":"The protein CAC is targeted by the compound with the InChI InChI=1S\/C15H21N7O5S4\/c16-30(23,24)15-20-19-14(29-15)21-31(25,26)12-3-1-11(2-4-12)18-13(28)17-5-6-22-7-9-27-10-8-22\/h1-4H,5-10H2,(H,19,21)(H2,16,23,24)(H2,17,18,28) and CN(CC(=O)Nc1c(S(N)(=O)=O)cc(S(N)(=O)=O)c(Cl)c1Cl)C(=N)N."}", "/scratch/micpie/export/compound_protein_compound_1/train_2-2.jsonl": "{"text":"User: Can you come up with an example for a compound InChI that targets the protein Prostaglandin D2 receptor 2?\nAssistant: Yes, the compound InChI InChI=1S\/C21H16ClFN2O6\/c1-31-16-5-2-11(6-14(16)23)9-25-17(26)8-21(20(25)30)13-7-12(22)3-4-15(13)24(19(21)29)10-18(27)28\/h2-7H,8-10H2,1H3,(H,27,28) targets the protein Prostaglandin D2 receptor 2.\nUser: Can you create another compound InChI that targets the protein Prostaglandin D2 receptor 2?\nAssistant: Yes, the compound InChI InChI=1S\/C21H16ClFN2O6\/c1-31-16-5-2-11(6-14(16)23)9-25-17(26)8-21(20(25)30)13-7-12(22)3-4-15(13)24(19(21)29)10-18(27)28\/h2-7H,8-10H2,1H3,(H,27,28) targets the compound InChI Cc1ccc(S(=O)(=O)C(C)C)cc1C#Cc1cc(Cl)ccc1OCC(=O)O."} {"text":"User: Can you come up with one example for a compound SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, of course, the compound SMILES CN1CCN(NS(=O)(=O)c2ccc(Cl)s2)CC1 targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you create another compound SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Sure, the compound SMILES 
CN1CCN(NS(=O)(=O)c2ccc(Cl)s2)CC1 targets the compound SMILES O=C(\/C=C\/c1ccc(O)c(O)c1)O[C@@H](C(=O)O)[C@@H](OC(=O)\/C=C\/c1ccc(O)c(O)c1)C(=O)O."}", "/scratch/micpie/export/compound_protein_compound_1/test_4-0.jsonl": "{"text":"The compound DeepSMILES CCCS=O)=O)cccccc6)NC=O)CS6)))))))))))C=O)NCcccccc6Cl targets the protein Tyrosyl-DNA phosphodiesterase 1 and which is also targeted by the compound CCOc1ccc(CCNC(=O)c2ccc(Cl)c(S(=O)(=O)N3CCOCC3)c2)cc1OCC."} {"text":"The compound SELFIES [O][=C][Branch1][S][C][O][C][=Branch1][C][=O][\/C][=C][\/C][=C][C][=C][O][Ring1][Branch1][N][C][C][=C][C][=C][O][Ring1][Branch1] targets the protein Tyrosyl-DNA phosphodiesterase 1 and which is also targeted by the compound COc1cc2ccc(=O)oc2c(O)c1O."}", "/scratch/micpie/export/compound_protein_compound_1/valid_5-1.jsonl": "{"text":"The protein Tyr-DNA phosphodiesterase 1 is targeted by the compound with the canonical SMILES Cc1ccc(C(=O)ONc2ccc(N=NC(=O)c3ccccc3O)cc2)cc1 and COc1cc2ccc(=O)oc2c(O)c1O."} {"text":"The protein 5-hydroxytryptamine receptor 2A is targeted by the compound with the DeepSMILES CcccCC=O)N[C@H]CC[C@H]CCNCCNcccccc6OCO5)))))))))CC6))))))))CC6)))))))))on5 and O=S(=O)(Nc1cc(Sc2cccc3cccnc23)c(O)c2ccccc12)c1ccc(F)cc1."}", "/scratch/micpie/export/compound_protein_compound_1/valid_8-2.jsonl": "{"text":"User: Can you give me an example for a compound canonical SMILES that targets the protein KOR-1?\nAssistant: Of course, the compound canonical SMILES COc1ccc2c(c1)[C@]13CCN(CC4CC4)[C@H](C2)[C@@H]1Cc1sc(N)nc1C3 targets the protein KOR-1.\nUser: Can you create another compound canonical SMILES that targets the protein KOR-1?\nAssistant: Yes, of course, the compound canonical SMILES COc1ccc2c(c1)[C@]13CCN(CC4CC4)[C@H](C2)[C@@H]1Cc1sc(N)nc1C3 targets the compound canonical SMILES C[C@H]1CN(Cc2ccc3ccccc3n2)CC[C@@]1(C)c1cccc(C(N)=O)c1."} {"text":"User: Can you come up with one example for a compound canonical SMILES that targets the protein Carbonic anhydrase II?\nAssistant: Yes, 
of course, the compound canonical SMILES Cc1cc(S(=O)(=O)NC(=N)Nc2ccc(S(N)(=O)=O)cc2)c(SCc2cccc3ccccc23)cc1Cl targets the protein Carbonic anhydrase II.\nUser: Can you create another compound canonical SMILES that targets the protein Carbonic anhydrase II?\nAssistant: Of course, the compound canonical SMILES Cc1cc(S(=O)(=O)NC(=N)Nc2ccc(S(N)(=O)=O)cc2)c(SCc2cccc3ccccc23)cc1Cl targets the compound canonical SMILES CN(CC(=O)Nc1c(S(N)(=O)=O)cc(S(N)(=O)=O)c(Cl)c1Cl)C(=N)N."}", "/scratch/micpie/export/compound_protein_compound_1/test_1-2.jsonl": "{"text":"User: Can you come up with an example for a compound SELFIES that targets the protein Aldose reductase?\nAssistant: Sure, the compound SELFIES [C][C][O][C][=Branch1][C][=O][C][=Branch1][C][=O][N][C][=C][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][C][Branch1][=Branch1][C][=Branch1][C][=O][O][=C][Ring1][N][O] targets the protein Aldose reductase.\nUser: Can you create another compound SELFIES that targets the protein Aldose reductase?\nAssistant: Of course, the compound SELFIES [C][C][O][C][=Branch1][C][=O][C][=Branch1][C][=O][N][C][=C][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][C][Branch1][=Branch1][C][=Branch1][C][=O][O][=C][Ring1][N][O] targets the compound SELFIES O=c1cc(-c2ccccc2)c2ccc(O)cc2o1."} {"text":"User: Can you come up with one example for a compound SELFIES that targets the protein G-protein coupled receptor 44?\nAssistant: Yes, the compound SELFIES [C][C][O][C][=C][C][=C][Branch1][#Branch1][C][C][=Branch1][C][=O][O][C][=C][Ring1][#Branch2][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][N][Branch2][Ring1][C][C][=Branch1][C][=O][C][C][Branch1][C][C][C][=C][N][=C][C][=N][Ring1][=Branch1][C][C][Ring1][P] targets the protein G-protein coupled receptor 44.\nUser: Can you generate another compound SELFIES that targets the protein G-protein coupled receptor 44?\nAssistant: Yes, of course, the compound SELFIES 
[C][C][O][C][=C][C][=C][Branch1][#Branch1][C][C][=Branch1][C][=O][O][C][=C][Ring1][#Branch2][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][N][Branch2][Ring1][C][C][=Branch1][C][=O][C][C][Branch1][C][C][C][=C][N][=C][C][=N][Ring1][=Branch1][C][C][Ring1][P] targets the compound SELFIES Cc1ccc(S(=O)(=O)C(C)C)cc1C#Cc1cc(Cl)ccc1OCC(=O)O."}", "/scratch/micpie/export/compound_protein_compound_1/test_3-2.jsonl": "{"text":"User: Can you give me an example for a compound canonical SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, of course, the compound canonical SMILES O=c1[nH]c2ccc(S(=O)(=O)N3CCN(S(=O)(=O)c4ccc5c(c4)OCCO5)CC3)cc2[nH]c1=O targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you generate another compound canonical SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, the compound canonical SMILES O=c1[nH]c2ccc(S(=O)(=O)N3CCN(S(=O)(=O)c4ccc5c(c4)OCCO5)CC3)cc2[nH]c1=O targets the compound canonical SMILES O=C(\/C=C\/c1ccc(O)c(O)c1)O[C@@H](C(=O)O)[C@@H](OC(=O)\/C=C\/c1ccc(O)c(O)c1)C(=O)O."} {"text":"User: Can you come up with an example for a compound canonical SMILES that targets the protein Tyrosyl-DNA phosphodiesterase 1?\nAssistant: Yes, of course, the compound canonical SMILES Oc1[nH]c(S)nc2nc(S)nc1-2 targets the protein Tyrosyl-DNA phosphodiesterase 1.\nUser: Can you tell another compound canonical SMILES that targets the protein Tyrosyl-DNA phosphodiesterase 1?\nAssistant: Of course, the compound canonical SMILES Oc1[nH]c(S)nc2nc(S)nc1-2 targets the compound canonical SMILES CCOc1ccc(CCNC(=O)c2ccc(Cl)c(S(=O)(=O)N3CCOCC3)c2)cc1OCC."}", "/scratch/micpie/export/compound_protein_compound_1/valid_5-0.jsonl": "{"text":"The compound SMILES Cc1ccc(C(=O)ONc2ccc(N=NC(=O)c3ccccc3O)cc2)cc1 targets the protein Tyrosyl-DNA phosphodiesterase 1 and which is also targeted by the compound COc1cc2ccc(=O)oc2c(O)c1O."} {"text":"The compound canonical SMILES 
Cc1cc(CC(=O)N[C@H]2CC[C@H](CCN3CCN(c4cccc5c4OCO5)CC3)CC2)on1 targets the protein Serotonin receptor 2A and which is also targeted by the compound O=S(=O)(Nc1cc(Sc2cccc3cccnc23)c(O)c2ccccc12)c1ccc(F)cc1."}", "/scratch/micpie/export/compound_protein_compound_1/train_7-1.jsonl": "{"text":"The protein Adenosine receptor A2a is targeted by the compound with the DeepSMILES ClccccCcn[nH]c-cccccc6)OCO5))))))))n5))))))cc6Cl and OC[C@H]1O[C@@H](n2cnc3c(NC4CC5CCC4C5)ncnc32)[C@H](O)[C@@H]1O."} {"text":"The protein Kappa-type opioid receptor is targeted by the compound with the DeepSMILES O=CO)CCO))NCCNcncccccc6n[C@@H]C[C@@H]CC[C@H]C7)N5CCCCCCCC8)))))))))))))))c%10=O and C[C@H]1CN(Cc2ccc3ccccc3n2)CC[C@@]1(C)c1cccc(C(N)=O)c1."}", "/scratch/micpie/export/compound_protein_compound_1/test_9-0.jsonl": "{"text":"The compound canonical SMILES NS(=O)(=O)[C@@H]1O[C@H](CO)[C@@H](O)[C@H](O)[C@H]1O targets the protein CAC and which is also targeted by the compound CN(CC(=O)Nc1c(S(N)(=O)=O)cc(S(N)(=O)=O)c(Cl)c1Cl)C(=N)N."} {"text":"The compound SMILES CC(C)OC(=O)c1cc(O)cc(O)c1 targets the protein Carbonic anhydrase B and which is also targeted by the compound CSc1ccc(\/C=N\/CCc2ccc(S(N)(=O)=O)cc2)cc1."}", "/scratch/micpie/export/compound_protein_compound_1/valid_9-1.jsonl": "{"text":"The protein Carbonate dehydratase II is targeted by the compound with the SMILES CCCCCCC(N)c1csc(S(N)(=O)=O)c1 and CN(CC(=O)Nc1c(S(N)(=O)=O)cc(S(N)(=O)=O)c(Cl)c1Cl)C(=N)N."} {"text":"The protein Carbonic anhydrase B is targeted by the compound with the InChI InChI=1S\/C34H50N4O11S\/c1-19(2)13-27(39)45-18-26-31(47-28(40)14-20(3)4)32(48-29(41)15-21(5)6)33(49-30(42)16-22(7)8)34(46-26)38-17-25(36-37-38)23-9-11-24(12-10-23)50(35,43)44\/h9-12,17,19-22,26,31-34H,13-16,18H2,1-8H3,(H2,35,43,44)\/t26-,31-,32+,33-,34-\/m1\/s1 and CSc1ccc(\/C=N\/CCc2ccc(S(N)(=O)=O)cc2)cc1."}", "/scratch/micpie/export/compound_protein_compound_1/valid_3-0.jsonl": "{"text":"The compound InChI 
InChI=1S\/C16H15F3N4S2\/c1-10-5-7-12(8-6-10)20-14(24)22-23-15(25)21-13-4-2-3-11(9-13)16(17,18)19\/h2-9H,1H3,(H2,20,22,24)(H2,21,23,25) targets the protein Tyr-DNA phosphodiesterase 1 and which is also targeted by the compound O=C(\/C=C\/c1ccc(O)c(O)c1)O[C@@H](C(=O)O)[C@@H](OC(=O)\/C=C\/c1ccc(O)c(O)c1)C(=O)O."} {"text":"The compound InChI InChI=1S\/C15H18O4\/c1-7-6-10-12(8(2)14(18)19-10)13(17)15(3)9(7)4-5-11(15)16\/h4-5,7,9-10,12-13,17H,2,6H2,1,3H3\/t7-,9+,10-,12-,13+,15+\/m1\/s1 targets the protein Tyr-DNA phosphodiesterase 1 and which is also targeted by the compound CCOc1ccc(CCNC(=O)c2ccc(Cl)c(S(=O)(=O)N3CCOCC3)c2)cc1OCC."}", "/scratch/micpie/export/compound_protein_compound_1/test_0-1.jsonl": "{"text":"The protein Serine-protein kinase ATM is targeted by the compound with the canonical SMILES Cc1nn(CC(=O)NC(C)C(C)C)c(C)c1S(=O)(=O)N1CCCCC1 and Cc1ccc(SC(CC(=O)c2ccc(F)cc2)C(=O)O)cc1."} {"text":"The protein Aldo-keto reductase family 1 member B1 is targeted by the compound with the InChI InChI=1S\/C8H6Cl2O2\/c9-6-2-1-3-7(10)5(6)4-8(11)12\/h1-3H,4H2,(H,11,12) and O=c1cc(-c2ccccc2)c2ccc(O)cc2o1."}", "/scratch/micpie/export/compound_protein_compound_1/test_5-0.jsonl": "{"text":"The compound SELFIES [C][N][Branch1][C][C][C@@H1][C][Branch1][C][O][=C][Branch1][=Branch1][C][Branch1][C][N][=O][C][=Branch1][C][=O][C@@][Branch1][C][O][C][Branch1][C][O][=C][C][=Branch1][C][=O][C][=C][Branch1][C][O][C][=C][C][=C][Ring1][#Branch1][C@@][Branch1][C][C][Branch1][C][O][C@H1][Ring1][=C][C@H1][Branch1][C][O][C@@H1][Ring2][Ring1][=C][Ring2][Ring1][Branch1] targets the protein Tyr-DNA phosphodiesterase 1 and which is also targeted by the compound COc1cc2ccc(=O)oc2c(O)c1O."} {"text":"The compound DeepSMILES CCCnc-ccccCl)cc6))))))ccC=O)NCCCNCCNcccccCl)c6Cl)))))))CC6)))))))))))c5C.Cl targets the protein 5-hydroxytryptamine receptor 2A and which is also targeted by the compound O=S(=O)(Nc1cc(Sc2cccc3cccnc23)c(O)c2ccccc12)c1ccc(F)cc1."}", 
"/scratch/micpie/export/compound_protein_compound_1/test_2-0.jsonl": "{"text":"The compound SMILES Cc1c(CC(=O)O)cc2ccc(Cl)cc2c1-c1ccc(S(=O)(=O)c2ccc(F)cc2F)cc1 targets the protein CD antigen CD294 and which is also targeted by the compound Cc1ccc(S(=O)(=O)C(C)C)cc1C#Cc1cc(Cl)ccc1OCC(=O)O."} {"text":"The compound SELFIES [C][C][C][C][C][=C][C][=Branch1][C][=O][O][C][=C][C][Branch1][C][C][=C][Branch1][=Branch2][C][N][C][C][C][C][Ring1][Branch1][C][Branch1][C][O][=C][Ring2][Ring1][Ring1][Ring1][=C] targets the protein Tyr-DNA phosphodiesterase 1 and which is also targeted by the compound O=C(\/C=C\/c1ccc(O)c(O)c1)O[C@@H](C(=O)O)[C@@H](OC(=O)\/C=C\/c1ccc(O)c(O)c1)C(=O)O."}", "/scratch/micpie/export/compound_protein_compound_1/valid_2-2.jsonl": "{"text":"User: Can you give me one example for a compound canonical SMILES that targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells?\nAssistant: Yes, the compound canonical SMILES CCN(c1nc2cc(F)ccc2o1)C1CCc2c(CC(=O)O)c3ccc(Cl)cc3n2C1 targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells.\nUser: Can you create another compound canonical SMILES that targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells?\nAssistant: Of course, the compound canonical SMILES CCN(c1nc2cc(F)ccc2o1)C1CCc2c(CC(=O)O)c3ccc(Cl)cc3n2C1 targets the compound canonical SMILES Cc1ccc(S(=O)(=O)C(C)C)cc1C#Cc1cc(Cl)ccc1OCC(=O)O."} {"text":"User: Can you give me one example for a compound SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, the compound SMILES C[C@H]1C(c2cccc(Cl)c2)=NN(c2ccccc2)[C@@H]1CO targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you create another compound SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, of course, the compound SMILES C[C@H]1C(c2cccc(Cl)c2)=NN(c2ccccc2)[C@@H]1CO targets the compound SMILES 
O=C(\/C=C\/c1ccc(O)c(O)c1)O[C@@H](C(=O)O)[C@@H](OC(=O)\/C=C\/c1ccc(O)c(O)c1)C(=O)O."}", "/scratch/micpie/export/compound_protein_compound_1/valid_0-0.jsonl": "{"text":"The compound canonical SMILES CC1C=CCC2C(=O)N(c3cccc(C(=O)Nc4ccc(Br)cc4)c3)C(=O)C12 targets the protein Serine-protein kinase ATM and which is also targeted by the compound Cc1ccc(SC(CC(=O)c2ccc(F)cc2)C(=O)O)cc1."} {"text":"The compound SMILES O=S(=O)(c1ccc(Cl)cc1)c1ccc(O)nn1 targets the protein Aldehyde reductase and which is also targeted by the compound O=c1cc(-c2ccccc2)c2ccc(O)cc2o1."}", "/scratch/micpie/export/compound_protein_compound_1/train_6-1.jsonl": "{"text":"The protein 5-HT-2 is targeted by the compound with the SELFIES [C][N][Branch1][C][C][C][C][S][C][Branch1][C][C][Branch1][C][C][C] and O=S(=O)(Nc1cc(Sc2cccc3cccnc23)c(O)c2ccccc12)c1ccc(F)cc1."} {"text":"The protein Adenosine receptor A2a is targeted by the compound with the canonical SMILES Nc1nc(-c2ccccc2)cc(-c2ccc3c(c2)OCO3)n1 and OC[C@H]1O[C@@H](n2cnc3c(NC4CC5CCC4C5)ncnc32)[C@H](O)[C@@H]1O."}", "/scratch/micpie/export/compound_protein_compound_1/train_7-2.jsonl": "{"text":"User: Can you come up with an example for a compound SMILES that targets the protein Adenosine receptor A2a?\nAssistant: Of course, the compound SMILES Clc1ccc(Cc2n[nH]c(-c3ccc4c(c3)OCO4)n2)cc1Cl targets the protein Adenosine receptor A2a.\nUser: Can you tell another compound SMILES that targets the protein Adenosine receptor A2a?\nAssistant: Yes, of course, the compound SMILES Clc1ccc(Cc2n[nH]c(-c3ccc4c(c3)OCO4)n2)cc1Cl targets the compound SMILES OC[C@H]1O[C@@H](n2cnc3c(NC4CC5CCC4C5)ncnc32)[C@H](O)[C@@H]1O."} {"text":"User: Can you give me one example for a compound InChI that targets the protein K-OR-1?\nAssistant: Sure, the compound InChI InChI=1S\/C28H41N5O4\/c34-18-24(28(36)37)29-14-15-30-26-27(35)33(25-11-7-6-10-23(25)31-26)22-16-20-12-13-21(17-22)32(20)19-8-4-2-1-3-5-9-19\/h6-7,10-11,19-22,24,29,34H,1-5,8-9,12-18H2,(H,30,31)(H,36,37)\/t20-,21+,22+,24? 
targets the protein K-OR-1.\nUser: Can you create another compound InChI that targets the protein K-OR-1?\nAssistant: Of course, the compound InChI InChI=1S\/C28H41N5O4\/c34-18-24(28(36)37)29-14-15-30-26-27(35)33(25-11-7-6-10-23(25)31-26)22-16-20-12-13-21(17-22)32(20)19-8-4-2-1-3-5-9-19\/h6-7,10-11,19-22,24,29,34H,1-5,8-9,12-18H2,(H,30,31)(H,36,37)\/t20-,21+,22+,24? targets the compound InChI C[C@H]1CN(Cc2ccc3ccccc3n2)CC[C@@]1(C)c1cccc(C(N)=O)c1."}", "/scratch/micpie/export/compound_protein_compound_1/test_7-0.jsonl": "{"text":"The compound canonical SMILES O=C(Nc1cccc(Cl)c1)Nc1nc2nn(CCc3ccccc3)cc2c2nc(-c3ccco3)nn12 targets the protein Adenosine receptor A2a and which is also targeted by the compound OC[C@H]1O[C@@H](n2cnc3c(NC4CC5CCC4C5)ncnc32)[C@H](O)[C@@H]1O."} {"text":"The compound SMILES CCn1c2c(c3c1[C@@H]1Oc4c(O)ccc5c4[C@@]14CCN(C)C(C5)[C@]4(O)C3)CCCC2 targets the protein KOR-1 and which is also targeted by the compound C[C@H]1CN(Cc2ccc3ccccc3n2)CC[C@@]1(C)c1cccc(C(N)=O)c1."}", "/scratch/micpie/export/compound_protein_compound_1/test_8-2.jsonl": "{"text":"User: Can you give me one example for a compound canonical SMILES that targets the protein KOR-1?\nAssistant: Yes, of course, the compound canonical SMILES CC(C)[C@H]1[C@H]2c3ccccc3C[C@H](c3ccccc32)N1C targets the protein KOR-1.\nUser: Can you generate another compound canonical SMILES that targets the protein KOR-1?\nAssistant: Yes, of course, the compound canonical SMILES CC(C)[C@H]1[C@H]2c3ccccc3C[C@H](c3ccccc32)N1C targets the compound canonical SMILES C[C@H]1CN(Cc2ccc3ccccc3n2)CC[C@@]1(C)c1cccc(C(N)=O)c1."} {"text":"User: Can you come up with an example for a compound SELFIES that targets the protein CA-II?\nAssistant: Yes, of course, the compound SELFIES 
[N][S][=Branch1][C][=O][=Branch1][C][=O][C][=N][N][=C][Branch2][Ring2][Ring2][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][P][N][\/C][Branch1][C][S][=N][\/C][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][S][Ring2][Ring1][O] targets the protein CA-II.\nUser: Can you create another compound SELFIES that targets the protein CA-II?\nAssistant: Of course, the compound SELFIES [N][S][=Branch1][C][=O][=Branch1][C][=O][C][=N][N][=C][Branch2][Ring2][Ring2][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][P][N][\/C][Branch1][C][S][=N][\/C][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][S][Ring2][Ring1][O] targets the compound SELFIES CN(CC(=O)Nc1c(S(N)(=O)=O)cc(S(N)(=O)=O)c(Cl)c1Cl)C(=N)N."}", "/scratch/micpie/export/compound_protein_compound_1/test_0-2.jsonl": "{"text":"User: Can you come up with an example for a compound InChI that targets the protein A-T mutated?\nAssistant: Of course, the compound InChI InChI=1S\/C17H30N4O3S\/c1-12(2)13(3)18-16(22)11-21-15(5)17(14(4)19-21)25(23,24)20-9-7-6-8-10-20\/h12-13H,6-11H2,1-5H3,(H,18,22) targets the protein A-T mutated.\nUser: Can you tell me another SMILES that targets the protein A-T mutated?\nAssistant: Of course, the SMILES Cc1ccc(SC(CC(=O)c2ccc(F)cc2)C(=O)O)cc1 also targets the protein A-T mutated."} {"text":"User: Can you give me an example for a compound canonical SMILES that targets the protein AR?\nAssistant: Of course, the compound canonical SMILES O=C(O)Cc1c(Cl)cccc1Cl targets the protein AR.\nUser: Can you create another SMILES that targets the protein AR?\nAssistant: Yes, the SMILES O=c1cc(-c2ccccc2)c2ccc(O)cc2o1 also targets the protein AR."}", "/scratch/micpie/export/compound_protein_compound_1/test_3-0.jsonl": "{"text":"The compound InChI InChI=1S\/C20H20N4O8S2\/c25-19-20(26)22-16-11-13(1-3-15(16)21-19)33(27,28)23-5-7-24(8-6-23)34(29,30)14-2-4-17-18(12-14)32-10-9-31-17\/h1-4,11-12H,5-10H2,(H,21,25)(H,22,26) targets the protein Tyr-DNA phosphodiesterase 
1 and which is also targeted by the compound O=C(\/C=C\/c1ccc(O)c(O)c1)O[C@@H](C(=O)O)[C@@H](OC(=O)\/C=C\/c1ccc(O)c(O)c1)C(=O)O."} {"text":"The compound SMILES Oc1[nH]c(S)nc2nc(S)nc1-2 targets the protein Tyrosyl-DNA phosphodiesterase 1 and which is also targeted by the compound CCOc1ccc(CCNC(=O)c2ccc(Cl)c(S(=O)(=O)N3CCOCC3)c2)cc1OCC."}", "/scratch/micpie/export/compound_protein_compound_1/train_1-0.jsonl": "{"text":"The compound DeepSMILES CCOC=O)C=CO)C=O)N[C@@H]C)cccccc6)))))))C5 targets the protein Aldose reductase and which is also targeted by the compound O=c1cc(-c2ccccc2)c2ccc(O)cc2o1."} {"text":"The compound SMILES Cc1c(Cc2nn(Cc3ccccc3)c(=O)c3ccccc23)c2cc(Cl)ccc2n1CC(=O)O targets the protein G-protein coupled receptor 44 and which is also targeted by the compound Cc1ccc(S(=O)(=O)C(C)C)cc1C#Cc1cc(Cl)ccc1OCC(=O)O."}", "/scratch/micpie/export/compound_protein_compound_1/test_5-2.jsonl": "{"text":"User: Can you come up with one example for a compound SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, of course, the compound SMILES CN(C)[C@@H]1C(O)=C(C(N)=O)C(=O)[C@@]2(O)C(O)=C3C(=O)c4c(O)cccc4[C@@](C)(O)[C@H]3[C@H](O)[C@@H]12 targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you generate another compound SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Sure, the compound SMILES CN(C)[C@@H]1C(O)=C(C(N)=O)C(=O)[C@@]2(O)C(O)=C3C(=O)c4c(O)cccc4[C@@](C)(O)[C@H]3[C@H](O)[C@@H]12 targets the compound SMILES COc1cc2ccc(=O)oc2c(O)c1O."} {"text":"User: Can you give me an example for a compound DeepSMILES that targets the protein 5-HT-2?\nAssistant: Yes, the compound DeepSMILES CCCnc-ccccCl)cc6))))))ccC=O)NCCCNCCNcccccCl)c6Cl)))))))CC6)))))))))))c5C.Cl targets the protein 5-HT-2.\nUser: Can you create another compound DeepSMILES that targets the protein 5-HT-2?\nAssistant: Sure, the compound DeepSMILES CCCnc-ccccCl)cc6))))))ccC=O)NCCCNCCNcccccCl)c6Cl)))))))CC6)))))))))))c5C.Cl targets the compound DeepSMILES 
O=S(=O)(Nc1cc(Sc2cccc3cccnc23)c(O)c2ccccc12)c1ccc(F)cc1."}", "/scratch/micpie/export/compound_protein_compound_1/valid_1-2.jsonl": "{"text":"User: Can you give me one example for a compound canonical SMILES that targets the protein Aldehyde reductase?\nAssistant: Of course, the compound canonical SMILES CCOC(=O)c1cc(=O)c2ccc(O)cc2o1 targets the protein Aldehyde reductase.\nUser: Can you create another compound canonical SMILES that targets the protein Aldehyde reductase?\nAssistant: Of course, the compound canonical SMILES CCOC(=O)c1cc(=O)c2ccc(O)cc2o1 targets the compound canonical SMILES O=c1cc(-c2ccccc2)c2ccc(O)cc2o1."} {"text":"User: Can you give me one example for a compound DeepSMILES that targets the protein Prostaglandin D2 receptor 2?\nAssistant: Yes, of course, the compound DeepSMILES C[C@@H]CC=O)NCCccF)ccc-ccccCC=O)O)))cc6OCCF)F)F))))))))))c6C%10))))))))))))cccccc6 targets the protein Prostaglandin D2 receptor 2.\nUser: Can you generate another compound DeepSMILES that targets the protein Prostaglandin D2 receptor 2?\nAssistant: Yes, the compound DeepSMILES C[C@@H]CC=O)NCCccF)ccc-ccccCC=O)O)))cc6OCCF)F)F))))))))))c6C%10))))))))))))cccccc6 targets the compound DeepSMILES Cc1ccc(S(=O)(=O)C(C)C)cc1C#Cc1cc(Cl)ccc1OCC(=O)O."}", "/scratch/micpie/export/compound_protein_compound_1/valid_7-2.jsonl": "{"text":"User: Can you give me an example for a compound canonical SMILES that targets the protein Adenosine receptor A2a?\nAssistant: Yes, of course, the compound canonical SMILES CC(=O)c1snc(Nc2cc(Cl)ccc2F)c1N targets the protein Adenosine receptor A2a.\nUser: Can you generate another compound canonical SMILES that targets the protein Adenosine receptor A2a?\nAssistant: Sure, the compound canonical SMILES CC(=O)c1snc(Nc2cc(Cl)ccc2F)c1N targets the compound canonical SMILES OC[C@H]1O[C@@H](n2cnc3c(NC4CC5CCC4C5)ncnc32)[C@H](O)[C@@H]1O."} {"text":"User: Can you give me an example for a compound DeepSMILES that targets the protein Kappa-type opioid 
receptor?\nAssistant: Sure, the compound DeepSMILES CNC=O)NCccccCC=O)NC)[C@H]CNCCCC5))))))cccccc6))))))))))cc6 targets the protein Kappa-type opioid receptor.\nUser: Can you generate another compound DeepSMILES that targets the protein Kappa-type opioid receptor?\nAssistant: Sure, the compound DeepSMILES CNC=O)NCccccCC=O)NC)[C@H]CNCCCC5))))))cccccc6))))))))))cc6 targets the compound DeepSMILES C[C@H]1CN(Cc2ccc3ccccc3n2)CC[C@@]1(C)c1cccc(C(N)=O)c1."}", "/scratch/micpie/export/compound_protein_compound_1/test_0-0.jsonl": "{"text":"The compound canonical SMILES Cc1nn(CC(=O)NC(C)C(C)C)c(C)c1S(=O)(=O)N1CCCCC1 targets the protein A-T mutated and which is also targeted by the compound Cc1ccc(SC(CC(=O)c2ccc(F)cc2)C(=O)O)cc1."} {"text":"The compound SMILES O=C(O)Cc1c(Cl)cccc1Cl targets the protein AR and which is also targeted by the compound O=c1cc(-c2ccccc2)c2ccc(O)cc2o1."}", "/scratch/micpie/export/compound_protein_compound_1/test_6-0.jsonl": "{"text":"The compound SELFIES [C][C][NH1][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Branch1][C][C][C][=Ring1][N][C][=Branch1][C][=O][N][C][C][C][N][C][C][N][Branch1][=N][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1][Cl][C][C][Ring1][=C].[Cl] targets the protein Serotonin receptor 2A and which is also targeted by the compound O=S(=O)(Nc1cc(Sc2cccc3cccnc23)c(O)c2ccccc12)c1ccc(F)cc1."} {"text":"The compound InChI InChI=1S\/C24H25FN6O4S\/c1-2-31-23(32)20-22(28-24(31)33)27-21(26-20)17-5-9-19(10-6-17)36(34,35)30-13-11-29(12-14-30)15-16-3-7-18(25)8-4-16\/h3-10H,2,11-15H2,1H3,(H,26,27)(H,28,33) targets the protein Adenosine receptor A2a and which is also targeted by the compound OC[C@H]1O[C@@H](n2cnc3c(NC4CC5CCC4C5)ncnc32)[C@H](O)[C@@H]1O."}", "/scratch/micpie/export/compound_protein_compound_1/train_2-0.jsonl": "{"text":"The compound SMILES COc1ccc(CN2C(=O)CC3(C2=O)C(=O)N(CC(=O)O)c2ccc(Cl)cc23)cc1F targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells and which is also 
targeted by the compound Cc1ccc(S(=O)(=O)C(C)C)cc1C#Cc1cc(Cl)ccc1OCC(=O)O."} {"text":"The compound canonical SMILES CN1CCN(NS(=O)(=O)c2ccc(Cl)s2)CC1 targets the protein Tyrosyl-DNA phosphodiesterase 1 and which is also targeted by the compound O=C(\/C=C\/c1ccc(O)c(O)c1)O[C@@H](C(=O)O)[C@@H](OC(=O)\/C=C\/c1ccc(O)c(O)c1)C(=O)O."}", "/scratch/micpie/export/compound_protein_compound_1/valid_2-0.jsonl": "{"text":"The compound InChI InChI=1S\/C23H21ClFN3O3\/c1-2-27(23-26-18-10-14(25)4-8-21(18)31-23)15-5-7-19-17(11-22(29)30)16-6-3-13(24)9-20(16)28(19)12-15\/h3-4,6,8-10,15H,2,5,7,11-12H2,1H3,(H,29,30) targets the protein CD antigen CD294 and which is also targeted by the compound Cc1ccc(S(=O)(=O)C(C)C)cc1C#Cc1cc(Cl)ccc1OCC(=O)O."} {"text":"The compound InChI InChI=1S\/C17H17ClN2O\/c1-12-16(11-21)20(15-8-3-2-4-9-15)19-17(12)13-6-5-7-14(18)10-13\/h2-10,12,16,21H,11H2,1H3\/t12-,16-\/m1\/s1 targets the protein Tyrosyl-DNA phosphodiesterase 1 and which is also targeted by the compound O=C(\/C=C\/c1ccc(O)c(O)c1)O[C@@H](C(=O)O)[C@@H](OC(=O)\/C=C\/c1ccc(O)c(O)c1)C(=O)O."}", "/scratch/micpie/export/compound_protein_compound_1/test_7-1.jsonl": "{"text":"The protein Adenosine receptor A2a is targeted by the compound with the DeepSMILES O=CNcccccCl)c6)))))))NcncnnCCcccccc6))))))))cc5cnc-cccco5)))))nn%125 and OC[C@H]1O[C@@H](n2cnc3c(NC4CC5CCC4C5)ncnc32)[C@H](O)[C@@H]1O."} {"text":"The protein Kappa-type opioid receptor is targeted by the compound with the DeepSMILES CCncccc5[C@@H]OccO)cccc6[C@@]9CCNC)CC8)[C@]6O)C%17)))))))))))))))))CCCC6 and C[C@H]1CN(Cc2ccc3ccccc3n2)CC[C@@]1(C)c1cccc(C(N)=O)c1."}", "/scratch/micpie/export/compound_protein_compound_1/valid_4-2.jsonl": "{"text":"User: Can you come up with an example for a compound InChI that targets the protein Tyrosyl-DNA phosphodiesterase 1?\nAssistant: Yes, the compound InChI InChI=1S\/C20H24O5\/c1-3-4-9-15-10-14-11-17(21)20(2,18(22)16(14)12-24-15)25-19(23)13-7-5-6-8-13\/h10-13H,3-9H2,1-2H3 targets the protein Tyrosyl-DNA 
phosphodiesterase 1.\nUser: Can you create another compound InChI that targets the protein Tyrosyl-DNA phosphodiesterase 1?\nAssistant: Yes, the compound InChI InChI=1S\/C20H24O5\/c1-3-4-9-15-10-14-11-17(21)20(2,18(22)16(14)12-24-15)25-19(23)13-7-5-6-8-13\/h10-13H,3-9H2,1-2H3 targets the compound InChI CCOc1ccc(CCNC(=O)c2ccc(Cl)c(S(=O)(=O)N3CCOCC3)c2)cc1OCC."} {"text":"User: Can you come up with one example for a compound SMILES that targets the protein Tyrosyl-DNA phosphodiesterase 1?\nAssistant: Sure, the compound SMILES COc1ccc(C(=O)O)cc1NC(=S)NC(=O)c1ccc(-c2ccc(Cl)cc2)o1 targets the protein Tyrosyl-DNA phosphodiesterase 1.\nUser: Can you tell another compound SMILES that targets the protein Tyrosyl-DNA phosphodiesterase 1?\nAssistant: Yes, the compound SMILES COc1ccc(C(=O)O)cc1NC(=S)NC(=O)c1ccc(-c2ccc(Cl)cc2)o1 targets the compound SMILES COc1cc2ccc(=O)oc2c(O)c1O."}", "/scratch/micpie/export/compound_protein_compound_1/test_4-2.jsonl": "{"text":"User: Can you give me one example for a compound DeepSMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Sure, the compound DeepSMILES CCCS=O)=O)cccccc6)NC=O)CS6)))))))))))C=O)NCcccccc6Cl targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you tell another compound DeepSMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Of course, the compound DeepSMILES CCCS=O)=O)cccccc6)NC=O)CS6)))))))))))C=O)NCcccccc6Cl targets the compound DeepSMILES CCOc1ccc(CCNC(=O)c2ccc(Cl)c(S(=O)(=O)N3CCOCC3)c2)cc1OCC."} {"text":"User: Can you come up with an example for a compound SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, of course, the compound SMILES O=C(COC(=O)\/C=C\/c1ccco1)NCc1ccco1 targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you tell another compound SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, of course, the compound SMILES O=C(COC(=O)\/C=C\/c1ccco1)NCc1ccco1 targets the compound SMILES 
COc1cc2ccc(=O)oc2c(O)c1O."}", "/scratch/micpie/export/compound_protein_compound_1/valid_3-2.jsonl": "{"text":"User: Can you come up with one example for a compound SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, the compound SMILES Cc1ccc(NC(S)=NN=C(S)Nc2cccc(C(F)(F)F)c2)cc1 targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you generate another compound SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, of course, the compound SMILES Cc1ccc(NC(S)=NN=C(S)Nc2cccc(C(F)(F)F)c2)cc1 targets the compound SMILES O=C(\/C=C\/c1ccc(O)c(O)c1)O[C@@H](C(=O)O)[C@@H](OC(=O)\/C=C\/c1ccc(O)c(O)c1)C(=O)O."} {"text":"User: Can you come up with an example for a compound canonical SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Sure, the compound canonical SMILES C=C1C(=O)O[C@@H]2C[C@@H](C)[C@@H]3C=CC(=O)[C@@]3(C)[C@@H](O)[C@H]12 targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you create another compound canonical SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, the compound canonical SMILES C=C1C(=O)O[C@@H]2C[C@@H](C)[C@@H]3C=CC(=O)[C@@]3(C)[C@@H](O)[C@H]12 targets the compound canonical SMILES CCOc1ccc(CCNC(=O)c2ccc(Cl)c(S(=O)(=O)N3CCOCC3)c2)cc1OCC."}", "/scratch/micpie/export/compound_protein_compound_1/valid_2-1.jsonl": "{"text":"The protein G-protein coupled receptor 44 is targeted by the compound with the SMILES CCN(c1nc2cc(F)ccc2o1)C1CCc2c(CC(=O)O)c3ccc(Cl)cc3n2C1 and Cc1ccc(S(=O)(=O)C(C)C)cc1C#Cc1cc(Cl)ccc1OCC(=O)O."} {"text":"The protein Tyr-DNA phosphodiesterase 1 is targeted by the compound with the SELFIES [C][C@H1][C][Branch1][N][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1][=N][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C@@H1][Ring2][Ring1][C][C][O] and O=C(\/C=C\/c1ccc(O)c(O)c1)O[C@@H](C(=O)O)[C@@H](OC(=O)\/C=C\/c1ccc(O)c(O)c1)C(=O)O."}", "/scratch/micpie/export/compound_protein_compound_1/valid_4-0.jsonl": 
"{"text":"The compound DeepSMILES CCCCC=CC=CC=O)CC)OC=O)CCCCC5)))))))C=O)C6=CO%10 targets the protein Tyr-DNA phosphodiesterase 1 and which is also targeted by the compound CCOc1ccc(CCNC(=O)c2ccc(Cl)c(S(=O)(=O)N3CCOCC3)c2)cc1OCC."} {"text":"The compound SELFIES [C][O][C][=C][C][=C][Branch1][=Branch1][C][=Branch1][C][=O][O][C][=C][Ring1][=Branch2][N][C][=Branch1][C][=S][N][C][=Branch1][C][=O][C][=C][C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][O][Ring1][N] targets the protein Tyr-DNA phosphodiesterase 1 and which is also targeted by the compound COc1cc2ccc(=O)oc2c(O)c1O."}", "/scratch/micpie/export/compound_protein_compound_1/train_5-1.jsonl": "{"text":"The protein Tyrosyl-DNA phosphodiesterase 1 is targeted by the compound with the SMILES O=C(c1ccc(F)cc1)N1CCN(c2ccc([N+](=O)[O-])cc2-n2cccc2)CC1 and COc1cc2ccc(=O)oc2c(O)c1O."} {"text":"The protein 5-hydroxytryptamine receptor 2A is targeted by the compound with the SELFIES [C][=C][C][=C][C][Branch2][Ring1][=N][N][C][C][N][Branch2][Ring1][C][C][C][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][N][C][N][Ring1][=Branch1][C][C][Ring1][P][=C][C][=C][C][Ring2][Ring1][#Branch1][=C][Ring2][Ring1][O] and O=S(=O)(Nc1cc(Sc2cccc3cccnc23)c(O)c2ccccc12)c1ccc(F)cc1."}", "/scratch/micpie/export/compound_protein_compound_1/test_2-1.jsonl": "{"text":"The protein CD antigen CD294 is targeted by the compound with the SMILES Cc1c(CC(=O)O)cc2ccc(Cl)cc2c1-c1ccc(S(=O)(=O)c2ccc(F)cc2F)cc1 and Cc1ccc(S(=O)(=O)C(C)C)cc1C#Cc1cc(Cl)ccc1OCC(=O)O."} {"text":"The protein Tyrosyl-DNA phosphodiesterase 1 is targeted by the compound with the SMILES CCCCc1cc(=O)oc2cc(C)c(CN3CCCC3)c(O)c12 and O=C(\/C=C\/c1ccc(O)c(O)c1)O[C@@H](C(=O)O)[C@@H](OC(=O)\/C=C\/c1ccc(O)c(O)c1)C(=O)O."}", "/scratch/micpie/export/compound_protein_compound_1/train_9-2.jsonl": "{"text":"User: Can you come up with one example for a compound InChI that targets the protein Carbonic anhydrase II?\nAssistant: Yes, of course, the compound InChI 
InChI=1S\/C17H20N2O3S\/c1-12-3-5-13(6-4-12)17-16-8-7-15(22-2)11-14(16)9-10-19(17)23(18,20)21\/h3-8,11,17H,9-10H2,1-2H3,(H2,18,20,21) targets the protein Carbonic anhydrase II.\nUser: Can you create another compound InChI that targets the protein Carbonic anhydrase II?\nAssistant: Yes, of course, the compound InChI InChI=1S\/C17H20N2O3S\/c1-12-3-5-13(6-4-12)17-16-8-7-15(22-2)11-14(16)9-10-19(17)23(18,20)21\/h3-8,11,17H,9-10H2,1-2H3,(H2,18,20,21) targets the compound InChI CN(CC(=O)Nc1c(S(N)(=O)=O)cc(S(N)(=O)=O)c(Cl)c1Cl)C(=N)N."} {"text":"User: Can you give me an example for a compound DeepSMILES that targets the protein Carbonic anhydrase 1?\nAssistant: Of course, the compound DeepSMILES NS=O)=O)ccccCNC=O)CCCNC=O)Ncccccc6)))))))))))))))cc6 targets the protein Carbonic anhydrase 1.\nUser: Can you create another compound DeepSMILES that targets the protein Carbonic anhydrase 1?\nAssistant: Yes, of course, the compound DeepSMILES NS=O)=O)ccccCNC=O)CCCNC=O)Ncccccc6)))))))))))))))cc6 targets the compound DeepSMILES CSc1ccc(\/C=N\/CCc2ccc(S(N)(=O)=O)cc2)cc1."}", "/scratch/micpie/export/compound_protein_compound_1/train_0-0.jsonl": "{"text":"The compound DeepSMILES O=CCOcccccc6[N+]=O)[O-]))))))))))NCCCcccccc6%10 targets the protein A-T mutated and which is also targeted by the compound Cc1ccc(SC(CC(=O)c2ccc(F)cc2)C(=O)O)cc1."} {"text":"The compound SMILES CS(=O)(=O)c1c(C2NC(=O)NC2=O)cc(Cl)cc1C(F)(F)F targets the protein Aldehyde reductase and which is also targeted by the compound O=c1cc(-c2ccccc2)c2ccc(O)cc2o1."}", "/scratch/micpie/export/compound_protein_compound_1/test_1-1.jsonl": "{"text":"The protein Aldo-keto reductase family 1 member B1 is targeted by the compound with the InChI InChI=1S\/C11H10N2O8\/c1-2-21-11(18)9(15)12-7-4-5(13(19)20)3-6(8(7)14)10(16)17\/h3-4,14H,2H2,1H3,(H,12,15)(H,16,17) and O=c1cc(-c2ccccc2)c2ccc(O)cc2o1."} {"text":"The protein Prostaglandin D2 receptor 2 is targeted by the compound with the InChI 
InChI=1S\/C27H28FN3O4\/c1-3-35-25-7-4-18(14-27(33)34)13-21(25)19-5-6-23(28)20-8-11-31(16-22(19)20)26(32)12-17(2)24-15-29-9-10-30-24\/h4-7,9-10,13,15,17H,3,8,11-12,14,16H2,1-2H3,(H,33,34) and Cc1ccc(S(=O)(=O)C(C)C)cc1C#Cc1cc(Cl)ccc1OCC(=O)O."}", "/scratch/micpie/export/compound_protein_compound_1/valid_9-0.jsonl": "{"text":"The compound SELFIES [C][C][C][C][C][C][C][Branch1][C][N][C][=C][S][C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][=C][Ring1][=Branch2] targets the protein CA-II and which is also targeted by the compound CN(CC(=O)Nc1c(S(N)(=O)=O)cc(S(N)(=O)=O)c(Cl)c1Cl)C(=N)N."} {"text":"The compound InChI InChI=1S\/C34H50N4O11S\/c1-19(2)13-27(39)45-18-26-31(47-28(40)14-20(3)4)32(48-29(41)15-21(5)6)33(49-30(42)16-22(7)8)34(46-26)38-17-25(36-37-38)23-9-11-24(12-10-23)50(35,43)44\/h9-12,17,19-22,26,31-34H,13-16,18H2,1-8H3,(H2,35,43,44)\/t26-,31-,32+,33-,34-\/m1\/s1 targets the protein CA-I and which is also targeted by the compound CSc1ccc(\/C=N\/CCc2ccc(S(N)(=O)=O)cc2)cc1."}", "/scratch/micpie/export/compound_protein_compound_1/train_5-2.jsonl": "{"text":"User: Can you give me an example for a compound InChI that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Sure, the compound InChI InChI=1S\/C21H19FN4O3\/c22-17-5-3-16(4-6-17)21(27)25-13-11-24(12-14-25)19-8-7-18(26(28)29)15-20(19)23-9-1-2-10-23\/h1-10,15H,11-14H2 targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you create another compound InChI that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Of course, the compound InChI InChI=1S\/C21H19FN4O3\/c22-17-5-3-16(4-6-17)21(27)25-13-11-24(12-14-25)19-8-7-18(26(28)29)15-20(19)23-9-1-2-10-23\/h1-10,15H,11-14H2 targets the compound InChI COc1cc2ccc(=O)oc2c(O)c1O."} {"text":"User: Can you give me an example for a compound InChI that targets the protein 5-HT-2A?\nAssistant: Sure, the compound InChI 
InChI=1S\/C23H26N4\/c1-2-6-20-19(4-1)5-3-7-23(20)27-14-12-26(13-15-27)11-10-18-8-9-21-22(16-18)25-17-24-21\/h1-9,16,24-25H,10-15,17H2 targets the protein 5-HT-2A.\nUser: Can you create another compound InChI that targets the protein 5-HT-2A?\nAssistant: Of course, the compound InChI InChI=1S\/C23H26N4\/c1-2-6-20-19(4-1)5-3-7-23(20)27-14-12-26(13-15-27)11-10-18-8-9-21-22(16-18)25-17-24-21\/h1-9,16,24-25H,10-15,17H2 targets the compound InChI O=S(=O)(Nc1cc(Sc2cccc3cccnc23)c(O)c2ccccc12)c1ccc(F)cc1."}", "/scratch/micpie/export/compound_protein_compound_1/train_8-1.jsonl": "{"text":"The protein K-OR-1 is targeted by the compound with the InChI InChI=1S\/C24H29N3O\/c1-2-25-22-12-6-7-13-23(22)27(24(25)28)20-15-16-26-19(17-20)11-8-14-21(26)18-9-4-3-5-10-18\/h3-7,9-10,12-13,19-21H,2,8,11,14-17H2,1H3\/t19-,20+,21+\/m1\/s1 and C[C@H]1CN(Cc2ccc3ccccc3n2)CC[C@@]1(C)c1cccc(C(N)=O)c1."} {"text":"The protein Carbonate dehydratase II is targeted by the compound with the SELFIES [N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][N][Branch2][Ring1][Branch1][C][C][N][Branch1][#Branch1][C][C][=Branch1][C][=O][O][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][=C][C][=C][C][=C][Ring1][=Branch1][O] and CN(CC(=O)Nc1c(S(N)(=O)=O)cc(S(N)(=O)=O)c(Cl)c1Cl)C(=N)N."}", "/scratch/micpie/export/compound_protein_compound_1/train_8-0.jsonl": "{"text":"The compound SMILES CCn1c(=O)n([C@H]2CCN3[C@H](c4ccccc4)CCC[C@@H]3C2)c2ccccc21 targets the protein KOR-1 and which is also targeted by the compound C[C@H]1CN(Cc2ccc3ccccc3n2)CC[C@@]1(C)c1cccc(C(N)=O)c1."} {"text":"The compound canonical SMILES NS(=O)(=O)c1ccccc1NC(=O)CN(CCN(CC(=O)O)c1ccccc1O)c1ccccc1O targets the protein Carbonate dehydratase II and which is also targeted by the compound CN(CC(=O)Nc1c(S(N)(=O)=O)cc(S(N)(=O)=O)c(Cl)c1Cl)C(=N)N."}", "/scratch/micpie/export/compound_protein_compound_1/test_5-1.jsonl": "{"text":"The protein Tyrosyl-DNA phosphodiesterase 1 is targeted by the compound with 
the SMILES CN(C)[C@@H]1C(O)=C(C(N)=O)C(=O)[C@@]2(O)C(O)=C3C(=O)c4c(O)cccc4[C@@](C)(O)[C@H]3[C@H](O)[C@@H]12 and COc1cc2ccc(=O)oc2c(O)c1O."} {"text":"The protein 5-HT-2A is targeted by the compound with the SMILES CCCn1c(-c2ccc(Cl)cc2)cc(C(=O)NCCCN2CCN(c3cccc(Cl)c3Cl)CC2)c1C.Cl and O=S(=O)(Nc1cc(Sc2cccc3cccnc23)c(O)c2ccccc12)c1ccc(F)cc1."}", "/scratch/micpie/export/compound_protein_compound_1/train_4-1.jsonl": "{"text":"The protein Tyrosyl-DNA phosphodiesterase 1 is targeted by the compound with the DeepSMILES O=CCScncccccc6c=O)n%10-cccccc6))))))))))))))))))NC=O)CNCCCC5=O and CCOc1ccc(CCNC(=O)c2ccc(Cl)c(S(=O)(=O)N3CCOCC3)c2)cc1OCC."} {"text":"The protein Tyr-DNA phosphodiesterase 1 is targeted by the compound with the DeepSMILES COC=O)cc-ccccs5)))))csc5NC=O)ccccCC)C))cc6 and COc1cc2ccc(=O)oc2c(O)c1O."}", "/scratch/micpie/export/compound_protein_compound_1/valid_0-2.jsonl": "{"text":"User: Can you come up with an example for a compound SELFIES that targets the protein Serine-protein kinase ATM?\nAssistant: Yes, of course, the compound SELFIES [C][C][C][=C][C][C][C][=Branch1][C][=O][N][Branch2][Ring1][O][C][=C][C][=C][C][Branch1][P][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][=C][Ring1][S][C][=Branch1][C][=O][C][Ring2][Ring1][O][Ring2][Ring1][#Branch1] targets the protein Serine-protein kinase ATM.\nUser: Can you generate another SMILES that targets the protein Serine-protein kinase ATM?\nAssistant: Yes, the SMILES representation Cc1ccc(SC(CC(=O)c2ccc(F)cc2)C(=O)O)cc1 also targets the protein Serine-protein kinase ATM."} {"text":"User: Can you come up with one example for a compound InChI that targets the protein AR?\nAssistant: Yes, of course, the compound InChI InChI=1S\/C10H7ClN2O3S\/c11-7-1-3-8(4-2-7)17(15,16)10-6-5-9(14)12-13-10\/h1-6H,(H,12,14) targets the protein AR.\nUser: Can you tell me another SMILES of a molecule that targets the protein AR?\nAssistant: Sure, the SMILES O=c1cc(-c2ccccc2)c2ccc(O)cc2o1 also targets the 
protein AR."}", "/scratch/micpie/export/compound_protein_compound_1/train_5-0.jsonl": "{"text":"The compound DeepSMILES O=CccccF)cc6))))))NCCNcccc[N+]=O)[O-]))cc6-ncccc5)))))))))))CC6 targets the protein Tyrosyl-DNA phosphodiesterase 1 and which is also targeted by the compound COc1cc2ccc(=O)oc2c(O)c1O."} {"text":"The compound SMILES c1ccc2c(N3CCN(CCc4ccc5c(c4)NCN5)CC3)cccc2c1 targets the protein Serotonin receptor 2A and which is also targeted by the compound O=S(=O)(Nc1cc(Sc2cccc3cccnc23)c(O)c2ccccc12)c1ccc(F)cc1."}", "/scratch/micpie/export/compound_protein_compound_1/test_6-2.jsonl": "{"text":"User: Can you come up with one example for a compound canonical SMILES that targets the protein 5-HT-2A?\nAssistant: Of course, the compound canonical SMILES Cc1[nH]c(-c2ccccc2)c(C)c1C(=O)NCCCN1CCN(c2cccc(Cl)c2Cl)CC1.Cl targets the protein 5-HT-2A.\nUser: Can you tell another compound canonical SMILES that targets the protein 5-HT-2A?\nAssistant: Yes, the compound canonical SMILES Cc1[nH]c(-c2ccccc2)c(C)c1C(=O)NCCCN1CCN(c2cccc(Cl)c2Cl)CC1.Cl targets the compound canonical SMILES O=S(=O)(Nc1cc(Sc2cccc3cccnc23)c(O)c2ccccc12)c1ccc(F)cc1."} {"text":"User: Can you give me one example for a compound canonical SMILES that targets the protein Adenosine receptor A2a?\nAssistant: Of course, the compound canonical SMILES CCn1c(=O)[nH]c2nc(-c3ccc(S(=O)(=O)N4CCN(Cc5ccc(F)cc5)CC4)cc3)[nH]c2c1=O targets the protein Adenosine receptor A2a.\nUser: Can you generate another compound canonical SMILES that targets the protein Adenosine receptor A2a?\nAssistant: Sure, the compound canonical SMILES CCn1c(=O)[nH]c2nc(-c3ccc(S(=O)(=O)N4CCN(Cc5ccc(F)cc5)CC4)cc3)[nH]c2c1=O targets the compound canonical SMILES OC[C@H]1O[C@@H](n2cnc3c(NC4CC5CCC4C5)ncnc32)[C@H](O)[C@@H]1O."}", "/scratch/micpie/export/compound_protein_compound_1/valid_0-1.jsonl": "{"text":"The protein Serine-protein kinase ATM is targeted by the compound with the canonical SMILES CC1C=CCC2C(=O)N(c3cccc(C(=O)Nc4ccc(Br)cc4)c3)C(=O)C12 
and Cc1ccc(SC(CC(=O)c2ccc(F)cc2)C(=O)O)cc1."} {"text":"The protein Aldo-keto reductase family 1 member B1 is targeted by the compound with the SELFIES [O][=S][=Branch1][C][=O][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=C][C][=C][Branch1][C][O][N][=N][Ring1][#Branch1] and O=c1cc(-c2ccccc2)c2ccc(O)cc2o1."}", "/scratch/micpie/export/compound_protein_compound_1/valid_7-1.jsonl": "{"text":"The protein Adenosine receptor A2a is targeted by the compound with the InChI InChI=1S\/C11H9ClFN3OS\/c1-5(17)10-9(14)11(16-18-10)15-8-4-6(12)2-3-7(8)13\/h2-4H,14H2,1H3,(H,15,16) and OC[C@H]1O[C@@H](n2cnc3c(NC4CC5CCC4C5)ncnc32)[C@H](O)[C@@H]1O."} {"text":"The protein KOR-1 is targeted by the compound with the SELFIES [C][N][C][=Branch1][C][=O][N][C][C][=C][C][=C][Branch2][Ring1][=N][C][C][=Branch1][C][=O][N][Branch1][C][C][C@H1][Branch1][=Branch2][C][N][C][C][C][C][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][Branch2] and C[C@H]1CN(Cc2ccc3ccccc3n2)CC[C@@]1(C)c1cccc(C(N)=O)c1."}", "/scratch/micpie/export/compound_protein_compound_1/train_2-1.jsonl": "{"text":"The protein G-protein coupled receptor 44 is targeted by the compound with the SMILES COc1ccc(CN2C(=O)CC3(C2=O)C(=O)N(CC(=O)O)c2ccc(Cl)cc23)cc1F and Cc1ccc(S(=O)(=O)C(C)C)cc1C#Cc1cc(Cl)ccc1OCC(=O)O."} {"text":"The protein Tyr-DNA phosphodiesterase 1 is targeted by the compound with the SMILES CN1CCN(NS(=O)(=O)c2ccc(Cl)s2)CC1 and O=C(\/C=C\/c1ccc(O)c(O)c1)O[C@@H](C(=O)O)[C@@H](OC(=O)\/C=C\/c1ccc(O)c(O)c1)C(=O)O."}", "/scratch/micpie/export/compound_protein_compound_1/valid_1-1.jsonl": "{"text":"The protein AR is targeted by the compound with the SELFIES [C][C][O][C][=Branch1][C][=O][C][=C][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][O][C][=C][Ring1][#Branch1][O][Ring1][N] and O=c1cc(-c2ccccc2)c2ccc(O)cc2o1."} {"text":"The protein Prostaglandin D2 receptor 2 is targeted by the compound with the DeepSMILES 
C[C@@H]CC=O)NCCccF)ccc-ccccCC=O)O)))cc6OCCF)F)F))))))))))c6C%10))))))))))))cccccc6 and Cc1ccc(S(=O)(=O)C(C)C)cc1C#Cc1cc(Cl)ccc1OCC(=O)O."}", "/scratch/micpie/export/compound_protein_compound_1/test_3-1.jsonl": "{"text":"The protein Tyr-DNA phosphodiesterase 1 is targeted by the compound with the DeepSMILES O=c[nH]ccccS=O)=O)NCCNS=O)=O)cccccc6)OCCO6))))))))))CC6)))))))cc6[nH]c%10=O and O=C(\/C=C\/c1ccc(O)c(O)c1)O[C@@H](C(=O)O)[C@@H](OC(=O)\/C=C\/c1ccc(O)c(O)c1)C(=O)O."} {"text":"The protein Tyr-DNA phosphodiesterase 1 is targeted by the compound with the canonical SMILES Oc1[nH]c(S)nc2nc(S)nc1-2 and CCOc1ccc(CCNC(=O)c2ccc(Cl)c(S(=O)(=O)N3CCOCC3)c2)cc1OCC."}", "/scratch/micpie/export/compound_protein_compound_1/train_9-0.jsonl": "{"text":"The compound DeepSMILES COcccccc6)CCNSN)=O)=O))C6ccccC)cc6 targets the protein Carbonate dehydratase II and which is also targeted by the compound CN(CC(=O)Nc1c(S(N)(=O)=O)cc(S(N)(=O)=O)c(Cl)c1Cl)C(=N)N."} {"text":"The compound canonical SMILES NS(=O)(=O)c1ccc(CNC(=O)CCCNC(=O)Nc2ccccc2)cc1 targets the protein Carbonate dehydratase I and which is also targeted by the compound CSc1ccc(\/C=N\/CCc2ccc(S(N)(=O)=O)cc2)cc1."}", "/scratch/micpie/export/compound_protein_compound_1/valid_9-2.jsonl": "{"text":"User: Can you come up with an example for a compound SELFIES that targets the protein Carbonate dehydratase II?\nAssistant: Yes, of course, the compound SELFIES [C][C][C][C][C][C][C][Branch1][C][N][C][=C][S][C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][=C][Ring1][=Branch2] targets the protein Carbonate dehydratase II.\nUser: Can you tell another compound SELFIES that targets the protein Carbonate dehydratase II?\nAssistant: Of course, the compound SELFIES [C][C][C][C][C][C][C][Branch1][C][N][C][=C][S][C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][=C][Ring1][=Branch2] targets the compound SELFIES CN(CC(=O)Nc1c(S(N)(=O)=O)cc(S(N)(=O)=O)c(Cl)c1Cl)C(=N)N."} {"text":"User: Can you come up with one example for a 
compound DeepSMILES that targets the protein CAB?\nAssistant: Yes, of course, the compound DeepSMILES CCC)CC=O)OC[C@H]O[C@@H]ncc-ccccSN)=O)=O))cc6))))))nn5)))))[C@H]OC=O)CCC)C)))))[C@@H]OC=O)CCC)C)))))[C@@H]6OC=O)CCC)C targets the protein CAB.\nUser: Can you tell another compound DeepSMILES that targets the protein CAB?\nAssistant: Sure, the compound DeepSMILES CCC)CC=O)OC[C@H]O[C@@H]ncc-ccccSN)=O)=O))cc6))))))nn5)))))[C@H]OC=O)CCC)C)))))[C@@H]OC=O)CCC)C)))))[C@@H]6OC=O)CCC)C targets the compound DeepSMILES CSc1ccc(\/C=N\/CCc2ccc(S(N)(=O)=O)cc2)cc1."}", "/scratch/micpie/export/compound_protein_compound_1/test_1-0.jsonl": "{"text":"The compound InChI InChI=1S\/C11H10N2O8\/c1-2-21-11(18)9(15)12-7-4-5(13(19)20)3-6(8(7)14)10(16)17\/h3-4,14H,2H2,1H3,(H,12,15)(H,16,17) targets the protein Aldose reductase and which is also targeted by the compound O=c1cc(-c2ccccc2)c2ccc(O)cc2o1."} {"text":"The compound SELFIES [C][C][O][C][=C][C][=C][Branch1][#Branch1][C][C][=Branch1][C][=O][O][C][=C][Ring1][#Branch2][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][N][Branch2][Ring1][C][C][=Branch1][C][=O][C][C][Branch1][C][C][C][=C][N][=C][C][=N][Ring1][=Branch1][C][C][Ring1][P] targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells and which is also targeted by the compound Cc1ccc(S(=O)(=O)C(C)C)cc1C#Cc1cc(Cl)ccc1OCC(=O)O."}", "/scratch/micpie/export/compound_protein_compound_1/train_0-2.jsonl": "{"text":"User: Can you give me an example for a compound InChI that targets the protein Ataxia telangiectasia mutated?\nAssistant: Yes, the compound InChI InChI=1S\/C17H16N2O4\/c20-17(12-23-16-10-4-3-9-15(16)19(21)22)18-11-5-7-13-6-1-2-8-14(13)18\/h1-4,6,8-10H,5,7,11-12H2 targets the protein Ataxia telangiectasia mutated.\nUser: Can you create another SMILES of a molecule that targets the protein Ataxia telangiectasia mutated?\nAssistant: Of course, the SMILES representation Cc1ccc(SC(CC(=O)c2ccc(F)cc2)C(=O)O)cc1 targets the protein Ataxia 
telangiectasia mutated."} {"text":"User: Can you give me an example for a compound DeepSMILES that targets the protein Aldo-keto reductase family 1 member B1?\nAssistant: Sure, the compound DeepSMILES CS=O)=O)ccCNC=O)NC5=O))))))ccCl)cc6CF)F)F targets the protein Aldo-keto reductase family 1 member B1.\nUser: Can you create another SMILES of a molecule that targets the protein Aldo-keto reductase family 1 member B1?\nAssistant: Yes, of course, the SMILES O=c1cc(-c2ccccc2)c2ccc(O)cc2o1 targets the protein Aldo-keto reductase family 1 member B1."}", "/scratch/micpie/export/compound_protein_compound_1/train_4-2.jsonl": "{"text":"User: Can you come up with one example for a compound DeepSMILES that targets the protein Tyrosyl-DNA phosphodiesterase 1?\nAssistant: Yes, of course, the compound DeepSMILES O=CCScncccccc6c=O)n%10-cccccc6))))))))))))))))))NC=O)CNCCCC5=O targets the protein Tyrosyl-DNA phosphodiesterase 1.\nUser: Can you generate another compound DeepSMILES that targets the protein Tyrosyl-DNA phosphodiesterase 1?\nAssistant: Yes, of course, the compound DeepSMILES O=CCScncccccc6c=O)n%10-cccccc6))))))))))))))))))NC=O)CNCCCC5=O targets the compound DeepSMILES CCOc1ccc(CCNC(=O)c2ccc(Cl)c(S(=O)(=O)N3CCOCC3)c2)cc1OCC."} {"text":"User: Can you come up with one example for a compound canonical SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, the compound canonical SMILES COC(=O)c1c(-c2cccs2)csc1NC(=O)c1ccc(C(C)C)cc1 targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you tell another compound canonical SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, the compound canonical SMILES COC(=O)c1c(-c2cccs2)csc1NC(=O)c1ccc(C(C)C)cc1 targets the compound canonical SMILES COc1cc2ccc(=O)oc2c(O)c1O."}", "/scratch/micpie/export/compound_protein_compound_1/test_6-1.jsonl": "{"text":"The protein 5-HT-2 is targeted by the compound with the SELFIES 
[C][C][NH1][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Branch1][C][C][C][=Ring1][N][C][=Branch1][C][=O][N][C][C][C][N][C][C][N][Branch1][=N][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1][Cl][C][C][Ring1][=C].[Cl] and O=S(=O)(Nc1cc(Sc2cccc3cccnc23)c(O)c2ccccc12)c1ccc(F)cc1."} {"text":"The protein Adenosine receptor A2a is targeted by the compound with the canonical SMILES CCn1c(=O)[nH]c2nc(-c3ccc(S(=O)(=O)N4CCN(Cc5ccc(F)cc5)CC4)cc3)[nH]c2c1=O and OC[C@H]1O[C@@H](n2cnc3c(NC4CC5CCC4C5)ncnc32)[C@H](O)[C@@H]1O."}", "/scratch/micpie/export/compound_protein_compound_1/valid_4-1.jsonl": "{"text":"The protein Tyr-DNA phosphodiesterase 1 is targeted by the compound with the SMILES CCCCC1=CC2=CC(=O)C(C)(OC(=O)C3CCCC3)C(=O)C2=CO1 and CCOc1ccc(CCNC(=O)c2ccc(Cl)c(S(=O)(=O)N3CCOCC3)c2)cc1OCC."} {"text":"The protein Tyr-DNA phosphodiesterase 1 is targeted by the compound with the SELFIES [C][O][C][=C][C][=C][Branch1][=Branch1][C][=Branch1][C][=O][O][C][=C][Ring1][=Branch2][N][C][=Branch1][C][=S][N][C][=Branch1][C][=O][C][=C][C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][O][Ring1][N] and COc1cc2ccc(=O)oc2c(O)c1O."}", "/scratch/micpie/export/compound_protein_compound_1/test_2-2.jsonl": "{"text":"User: Can you come up with an example for a compound InChI that targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells?\nAssistant: Sure, the compound InChI InChI=1S\/C25H17ClF2O4S\/c1-14-17(11-24(29)30)10-16-2-5-18(26)12-21(16)25(14)15-3-7-20(8-4-15)33(31,32)23-9-6-19(27)13-22(23)28\/h2-10,12-13H,11H2,1H3,(H,29,30) targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells.\nUser: Can you generate another compound InChI that targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells?\nAssistant: Yes, of course, the compound InChI 
InChI=1S\/C25H17ClF2O4S\/c1-14-17(11-24(29)30)10-16-2-5-18(26)12-21(16)25(14)15-3-7-20(8-4-15)33(31,32)23-9-6-19(27)13-22(23)28\/h2-10,12-13H,11H2,1H3,(H,29,30) targets the compound InChI Cc1ccc(S(=O)(=O)C(C)C)cc1C#Cc1cc(Cl)ccc1OCC(=O)O."} {"text":"User: Can you come up with one example for a compound InChI that targets the protein Tyrosyl-DNA phosphodiesterase 1?\nAssistant: Yes, the compound InChI InChI=1S\/C19H25NO3\/c1-3-4-7-14-11-17(21)23-16-10-13(2)15(19(22)18(14)16)12-20-8-5-6-9-20\/h10-11,22H,3-9,12H2,1-2H3 targets the protein Tyrosyl-DNA phosphodiesterase 1.\nUser: Can you create another compound InChI that targets the protein Tyrosyl-DNA phosphodiesterase 1?\nAssistant: Sure, the compound InChI InChI=1S\/C19H25NO3\/c1-3-4-7-14-11-17(21)23-16-10-13(2)15(19(22)18(14)16)12-20-8-5-6-9-20\/h10-11,22H,3-9,12H2,1-2H3 targets the compound InChI O=C(\/C=C\/c1ccc(O)c(O)c1)O[C@@H](C(=O)O)[C@@H](OC(=O)\/C=C\/c1ccc(O)c(O)c1)C(=O)O."}", "/scratch/micpie/export/compound_protein_compound_1/train_1-1.jsonl": "{"text":"The protein Aldehyde reductase is targeted by the compound with the SELFIES [C][C][O][C][=Branch1][C][=O][C][=C][Branch1][C][O][C][=Branch1][C][=O][N][Branch1][=N][C@@H1][Branch1][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][#C] and O=c1cc(-c2ccccc2)c2ccc(O)cc2o1."} {"text":"The protein G-protein coupled receptor 44 is targeted by the compound with the canonical SMILES Cc1c(Cc2nn(Cc3ccccc3)c(=O)c3ccccc23)c2cc(Cl)ccc2n1CC(=O)O and Cc1ccc(S(=O)(=O)C(C)C)cc1C#Cc1cc(Cl)ccc1OCC(=O)O."}", "/scratch/micpie/export/compound_protein_compound_1/valid_7-0.jsonl": "{"text":"The compound DeepSMILES CC=O)csncNcccCl)ccc6F))))))))c5N targets the protein Adenosine receptor A2a and which is also targeted by the compound OC[C@H]1O[C@@H](n2cnc3c(NC4CC5CCC4C5)ncnc32)[C@H](O)[C@@H]1O."} {"text":"The compound canonical SMILES CNC(=O)NCc1ccc(CC(=O)N(C)[C@H](CN2CCCC2)c2ccccc2)cc1 targets the protein Kappa-type opioid receptor and which is also targeted by the compound 
C[C@H]1CN(Cc2ccc3ccccc3n2)CC[C@@]1(C)c1cccc(C(N)=O)c1."}", "/scratch/micpie/export/compound_protein_compound_1/valid_8-1.jsonl": "{"text":"The protein KOR-1 is targeted by the compound with the SELFIES [C][O][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C@][C][C][N][Branch1][#Branch1][C][C][C][C][Ring1][Ring1][C@H1][Branch1][Ring2][C][Ring1][=N][C@@H1][Ring1][O][C][C][S][C][Branch1][C][N][=N][C][=Ring1][=Branch1][C][Ring2][Ring1][Ring1] and C[C@H]1CN(Cc2ccc3ccccc3n2)CC[C@@]1(C)c1cccc(C(N)=O)c1."} {"text":"The protein Cyanamide hydratase CA2 is targeted by the compound with the DeepSMILES CcccS=O)=O)NC=N)NccccSN)=O)=O))cc6))))))))))cSCcccccccccc%106))))))))))))cc6Cl and CN(CC(=O)Nc1c(S(N)(=O)=O)cc(S(N)(=O)=O)c(Cl)c1Cl)C(=N)N."}", "/scratch/micpie/export/compound_protein_compound_1/train_0-1.jsonl": "{"text":"The protein A-T mutated is targeted by the compound with the DeepSMILES O=CCOcccccc6[N+]=O)[O-]))))))))))NCCCcccccc6%10 and Cc1ccc(SC(CC(=O)c2ccc(F)cc2)C(=O)O)cc1."} {"text":"The protein Aldo-keto reductase family 1 member B1 is targeted by the compound with the InChI InChI=1S\/C11H8ClF3N2O4S\/c1-22(20,21)8-5(7-9(18)17-10(19)16-7)2-4(12)3-6(8)11(13,14)15\/h2-3,7H,1H3,(H2,16,17,18,19) and O=c1cc(-c2ccccc2)c2ccc(O)cc2o1."}", "/scratch/micpie/export/compound_protein_compound_1/valid_8-0.jsonl": "{"text":"The compound DeepSMILES COcccccc6)[C@]CCNCCCC3))))[C@H]C8)[C@@H]6CcscN)nc5C%13 targets the protein Kappa-type opioid receptor and which is also targeted by the compound C[C@H]1CN(Cc2ccc3ccccc3n2)CC[C@@]1(C)c1cccc(C(N)=O)c1."} {"text":"The compound SELFIES [C][C][=C][C][Branch2][Ring1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][=Branch1][C][=N][N][C][=C][C][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][C][=C][Ring1][#Branch2][=C][Branch1][P][S][C][C][=C][C][=C][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][=C][Ring2][Ring2][Ring1][Cl] targets the protein Carbonic anhydrase II and which is also targeted by the compound 
CN(CC(=O)Nc1c(S(N)(=O)=O)cc(S(N)(=O)=O)c(Cl)c1Cl)C(=N)N."}", "/scratch/micpie/export/compound_protein_compound_1/test_9-1.jsonl": "{"text":"The protein Cyanamide hydratase CA2 is targeted by the compound with the SELFIES [N][S][=Branch1][C][=O][=Branch1][C][=O][C@@H1][O][C@H1][Branch1][Ring1][C][O][C@@H1][Branch1][C][O][C@H1][Branch1][C][O][C@H1][Ring1][#Branch2][O] and CN(CC(=O)Nc1c(S(N)(=O)=O)cc(S(N)(=O)=O)c(Cl)c1Cl)C(=N)N."} {"text":"The protein Carbonic anhydrase I is targeted by the compound with the InChI InChI=1S\/C10H12O4\/c1-6(2)14-10(13)7-3-8(11)5-9(12)4-7\/h3-6,11-12H,1-2H3 and CSc1ccc(\/C=N\/CCc2ccc(S(N)(=O)=O)cc2)cc1."}", "/scratch/micpie/export/compound_protein_compound_1/valid_1-0.jsonl": "{"text":"The compound canonical SMILES CCOC(=O)c1cc(=O)c2ccc(O)cc2o1 targets the protein AR and which is also targeted by the compound O=c1cc(-c2ccccc2)c2ccc(O)cc2o1."} {"text":"The compound SELFIES [C][C@@H1][Branch2][Branch1][=Branch1][C][C][=Branch1][C][=O][N][C][C][C][=C][Branch1][C][F][C][=C][C][Branch2][Ring1][O][C][=C][C][=C][Branch1][#Branch1][C][C][=Branch1][C][=O][O][C][=C][Ring1][#Branch2][O][C][C][Branch1][C][F][Branch1][C][F][F][=C][Ring2][Ring1][#Branch1][C][Ring2][Ring1][O][C][=C][C][=C][C][=C][Ring1][=Branch1] targets the protein Prostaglandin D2 receptor 2 and which is also targeted by the compound Cc1ccc(S(=O)(=O)C(C)C)cc1C#Cc1cc(Cl)ccc1OCC(=O)O."}", "/scratch/micpie/export/compound_protein_compound_1/train_6-2.jsonl": "{"text":"User: Can you give me an example for a compound canonical SMILES that targets the protein 5-HT-2?\nAssistant: Sure, the compound canonical SMILES CN(C)CCSC(C)(C)C targets the protein 5-HT-2.\nUser: Can you generate another compound canonical SMILES that targets the protein 5-HT-2?\nAssistant: Sure, the compound canonical SMILES CN(C)CCSC(C)(C)C targets the compound canonical SMILES O=S(=O)(Nc1cc(Sc2cccc3cccnc23)c(O)c2ccccc12)c1ccc(F)cc1."} {"text":"User: Can you give me one example for a compound SELFIES that targets the 
protein Adenosine receptor A2a?\nAssistant: Of course, the compound SELFIES [N][C][=N][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][C][Branch1][S][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O][Ring1][=Branch1][=N][Ring2][Ring1][Branch1] targets the protein Adenosine receptor A2a.\nUser: Can you create another compound SELFIES that targets the protein Adenosine receptor A2a?\nAssistant: Yes, the compound SELFIES [N][C][=N][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][C][Branch1][S][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O][Ring1][=Branch1][=N][Ring2][Ring1][Branch1] targets the compound SELFIES OC[C@H]1O[C@@H](n2cnc3c(NC4CC5CCC4C5)ncnc32)[C@H](O)[C@@H]1O."}", "/scratch/micpie/export/compound_protein_compound_1/valid_6-2.jsonl": "{"text":"User: Can you come up with one example for a compound SELFIES that targets the protein 5-HT-2?\nAssistant: Yes, the compound SELFIES [Cl][C][=C][C][=C][Branch2][Ring2][Ring2][C][=C][C][=C][Branch2][Ring1][Branch2][C][C][C][N][C][=C][C][=C][Branch1][#Branch2][C][N][C][C][C][C][C][Ring1][=Branch1][C][=C][Ring1][=N][C][=C][Ring2][Ring1][#Branch1][C][=C][Ring2][Ring1][=N] targets the protein 5-HT-2.\nUser: Can you create another compound SELFIES that targets the protein 5-HT-2?\nAssistant: Yes, the compound SELFIES [Cl][C][=C][C][=C][Branch2][Ring2][Ring2][C][=C][C][=C][Branch2][Ring1][Branch2][C][C][C][N][C][=C][C][=C][Branch1][#Branch2][C][N][C][C][C][C][C][Ring1][=Branch1][C][=C][Ring1][=N][C][=C][Ring2][Ring1][#Branch1][C][=C][Ring2][Ring1][=N] targets the compound SELFIES O=S(=O)(Nc1cc(Sc2cccc3cccnc23)c(O)c2ccccc12)c1ccc(F)cc1."} {"text":"User: Can you give me an example for a compound SMILES that targets the protein Adenosine receptor A2a?\nAssistant: Of course, the compound SMILES Cc1cc(C)n(-c2cc(NC(=O)COc3cccc(CCN(C)C)c3)nc(-c3ccc(C)o3)n2)n1 targets the protein Adenosine receptor A2a.\nUser: Can you create another compound SMILES that targets the 
protein Adenosine receptor A2a?\nAssistant: Sure, the compound SMILES Cc1cc(C)n(-c2cc(NC(=O)COc3cccc(CCN(C)C)c3)nc(-c3ccc(C)o3)n2)n1 targets the compound SMILES OC[C@H]1O[C@@H](n2cnc3c(NC4CC5CCC4C5)ncnc32)[C@H](O)[C@@H]1O."}", "/scratch/micpie/export/compound_protein_compound_1/train_6-0.jsonl": "{"text":"The compound SMILES CN(C)CCSC(C)(C)C targets the protein 5-HT-2A and which is also targeted by the compound O=S(=O)(Nc1cc(Sc2cccc3cccnc23)c(O)c2ccccc12)c1ccc(F)cc1."} {"text":"The compound SMILES Nc1nc(-c2ccccc2)cc(-c2ccc3c(c2)OCO3)n1 targets the protein Adenosine receptor A2a and which is also targeted by the compound OC[C@H]1O[C@@H](n2cnc3c(NC4CC5CCC4C5)ncnc32)[C@H](O)[C@@H]1O."}", "/scratch/micpie/export/compound_protein_compound_1/train_8-2.jsonl": "{"text":"User: Can you come up with an example for a compound SMILES that targets the protein Kappa-type opioid receptor?\nAssistant: Of course, the compound SMILES CCn1c(=O)n([C@H]2CCN3[C@H](c4ccccc4)CCC[C@@H]3C2)c2ccccc21 targets the protein Kappa-type opioid receptor.\nUser: Can you create another compound SMILES that targets the protein Kappa-type opioid receptor?\nAssistant: Yes, of course, the compound SMILES CCn1c(=O)n([C@H]2CCN3[C@H](c4ccccc4)CCC[C@@H]3C2)c2ccccc21 targets the compound SMILES C[C@H]1CN(Cc2ccc3ccccc3n2)CC[C@@]1(C)c1cccc(C(N)=O)c1."} {"text":"User: Can you come up with an example for a compound DeepSMILES that targets the protein Cyanamide hydratase CA2?\nAssistant: Yes, of course, the compound DeepSMILES NS=O)=O)cccccc6NC=O)CNCCNCC=O)O)))cccccc6O))))))))))cccccc6O targets the protein Cyanamide hydratase CA2.\nUser: Can you create another compound DeepSMILES that targets the protein Cyanamide hydratase CA2?\nAssistant: Of course, the compound DeepSMILES NS=O)=O)cccccc6NC=O)CNCCNCC=O)O)))cccccc6O))))))))))cccccc6O targets the compound DeepSMILES CN(CC(=O)Nc1c(S(N)(=O)=O)cc(S(N)(=O)=O)c(Cl)c1Cl)C(=N)N."}", "/scratch/micpie/export/compound_protein_compound_1/train_3-1.jsonl": "{"text":"The 
protein Tyrosyl-DNA phosphodiesterase 1 is targeted by the compound with the SELFIES [C][C][=C][C][=C][Branch2][Ring1][O][S][=Branch1][C][=O][=Branch1][C][=O][N][Branch1][Ring1][C][#N][C][=N][C][Branch1][C][C][=C][C][Branch1][C][C][=N][Ring1][Branch2][C][=C][Ring2][Ring1][Ring2] and O=C(\/C=C\/c1ccc(O)c(O)c1)O[C@@H](C(=O)O)[C@@H](OC(=O)\/C=C\/c1ccc(O)c(O)c1)C(=O)O."} {"text":"The protein Tyrosyl-DNA phosphodiesterase 1 is targeted by the compound with the InChI InChI=1S\/C21H25BrN2O3S\/c1-16-4-3-5-19(14-16)24-12-11-23(15-17(24)2)21(25)10-13-28(26,27)20-8-6-18(22)7-9-20\/h3-9,14,17H,10-13,15H2,1-2H3 and CCOc1ccc(CCNC(=O)c2ccc(Cl)c(S(=O)(=O)N3CCOCC3)c2)cc1OCC."}", "/scratch/micpie/export/compound_protein_compound_1/test_8-0.jsonl": "{"text":"The compound SMILES CC(C)[C@H]1[C@H]2c3ccccc3C[C@H](c3ccccc32)N1C targets the protein K-OR-1 and which is also targeted by the compound C[C@H]1CN(Cc2ccc3ccccc3n2)CC[C@@]1(C)c1cccc(C(N)=O)c1."} {"text":"The compound InChI InChI=1S\/C15H21N7O5S4\/c16-30(23,24)15-20-19-14(29-15)21-31(25,26)12-3-1-11(2-4-12)18-13(28)17-5-6-22-7-9-27-10-8-22\/h1-4H,5-10H2,(H,19,21)(H2,16,23,24)(H2,17,18,28) targets the protein CA-II and which is also targeted by the compound CN(CC(=O)Nc1c(S(N)(=O)=O)cc(S(N)(=O)=O)c(Cl)c1Cl)C(=N)N."}", "/scratch/micpie/export/compound_protein_compound_1/valid_3-1.jsonl": "{"text":"The protein Tyrosyl-DNA phosphodiesterase 1 is targeted by the compound with the DeepSMILES CccccNCS)=NN=CS)NcccccCF)F)F))c6))))))))))))cc6 and O=C(\/C=C\/c1ccc(O)c(O)c1)O[C@@H](C(=O)O)[C@@H](OC(=O)\/C=C\/c1ccc(O)c(O)c1)C(=O)O."} {"text":"The protein Tyr-DNA phosphodiesterase 1 is targeted by the compound with the DeepSMILES C=CC=O)O[C@@H]C[C@@H]C)[C@@H]C=CC=O)[C@@]5C)[C@@H]O)[C@H]%13%10 and CCOc1ccc(CCNC(=O)c2ccc(Cl)c(S(=O)(=O)N3CCOCC3)c2)cc1OCC."}", "/scratch/micpie/export/compound_protein_compound_1/train_9-1.jsonl": "{"text":"The protein Cyanamide hydratase CA2 is targeted by the compound with the SELFIES 
[C][O][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][C][N][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][C][Ring1][O][C][=C][C][=C][Branch1][C][C][C][=C][Ring1][#Branch1] and CN(CC(=O)Nc1c(S(N)(=O)=O)cc(S(N)(=O)=O)c(Cl)c1Cl)C(=N)N."} {"text":"The protein CA-I is targeted by the compound with the SMILES NS(=O)(=O)c1ccc(CNC(=O)CCCNC(=O)Nc2ccccc2)cc1 and CSc1ccc(\/C=N\/CCc2ccc(S(N)(=O)=O)cc2)cc1."}", "/scratch/micpie/export/compound_protein_compound_1/test_4-1.jsonl": "{"text":"The protein Tyrosyl-DNA phosphodiesterase 1 is targeted by the compound with the InChI InChI=1S\/C19H19ClN2O4S2\/c1-12(19(24)21-9-13-4-2-3-5-15(13)20)11-28(25,26)14-6-7-17-16(8-14)22-18(23)10-27-17\/h2-8,12H,9-11H2,1H3,(H,21,24)(H,22,23) and CCOc1ccc(CCNC(=O)c2ccc(Cl)c(S(=O)(=O)N3CCOCC3)c2)cc1OCC."} {"text":"The protein Tyrosyl-DNA phosphodiesterase 1 is targeted by the compound with the InChI InChI=1S\/C14H13NO5\/c16-13(15-9-12-4-2-8-19-12)10-20-14(17)6-5-11-3-1-7-18-11\/h1-8H,9-10H2,(H,15,16)\/b6-5+ and COc1cc2ccc(=O)oc2c(O)c1O."}", "/scratch/micpie/export/compound_protein_compound_1/valid_6-1.jsonl": "{"text":"The protein 5-HT-2 is targeted by the compound with the SMILES Clc1ccc(-c2ccc(CCCNc3ccc(CN4CCCCC4)cc3)cc2)cc1 and O=S(=O)(Nc1cc(Sc2cccc3cccnc23)c(O)c2ccccc12)c1ccc(F)cc1."} {"text":"The protein Adenosine receptor A2a is targeted by the compound with the SMILES Cc1cc(C)n(-c2cc(NC(=O)COc3cccc(CCN(C)C)c3)nc(-c3ccc(C)o3)n2)n1 and OC[C@H]1O[C@@H](n2cnc3c(NC4CC5CCC4C5)ncnc32)[C@H](O)[C@@H]1O."}", "/scratch/micpie/export/compound_protein_compound_1/train_1-2.jsonl": "{"text":"User: Can you give me an example for a compound SELFIES that targets the protein AR?\nAssistant: Yes, the compound SELFIES [C][C][O][C][=Branch1][C][=O][C][=C][Branch1][C][O][C][=Branch1][C][=O][N][Branch1][=N][C@@H1][Branch1][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][#C] targets the protein AR.\nUser: Can you tell another compound SELFIES that targets the protein AR?\nAssistant: Sure, 
the compound SELFIES [C][C][O][C][=Branch1][C][=O][C][=C][Branch1][C][O][C][=Branch1][C][=O][N][Branch1][=N][C@@H1][Branch1][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][#C] targets the compound SELFIES O=c1cc(-c2ccccc2)c2ccc(O)cc2o1."} {"text":"User: Can you come up with an example for a compound InChI that targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells?\nAssistant: Yes, the compound InChI InChI=1S\/C27H22ClN3O3\/c1-17-22(23-13-19(28)11-12-25(23)30(17)16-26(32)33)14-24-20-9-5-6-10-21(20)27(34)31(29-24)15-18-7-3-2-4-8-18\/h2-13H,14-16H2,1H3,(H,32,33) targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells.\nUser: Can you generate another compound InChI that targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells?\nAssistant: Yes, the compound InChI InChI=1S\/C27H22ClN3O3\/c1-17-22(23-13-19(28)11-12-25(23)30(17)16-26(32)33)14-24-20-9-5-6-10-21(20)27(34)31(29-24)15-18-7-3-2-4-8-18\/h2-13H,14-16H2,1H3,(H,32,33) targets the compound InChI Cc1ccc(S(=O)(=O)C(C)C)cc1C#Cc1cc(Cl)ccc1OCC(=O)O."}", "/scratch/micpie/export/compound_protein_compound_1/valid_6-0.jsonl": "{"text":"The compound canonical SMILES Clc1ccc(-c2ccc(CCCNc3ccc(CN4CCCCC4)cc3)cc2)cc1 targets the protein 5-hydroxytryptamine receptor 2A and which is also targeted by the compound O=S(=O)(Nc1cc(Sc2cccc3cccnc23)c(O)c2ccccc12)c1ccc(F)cc1."} {"text":"The compound DeepSMILES CcccC)n-cccNC=O)COcccccCCNC)C))))c6))))))))))nc-ccccC)o5)))))n6))))))n5 targets the protein Adenosine receptor A2a and which is also targeted by the compound OC[C@H]1O[C@@H](n2cnc3c(NC4CC5CCC4C5)ncnc32)[C@H](O)[C@@H]1O."}", "/scratch/micpie/export/compound_protein_compound_1/train_3-0.jsonl": "{"text":"The compound InChI InChI=1S\/C14H14N4O2S\/c1-10-4-6-13(7-5-10)21(19,20)18(9-15)14-16-11(2)8-12(3)17-14\/h4-8H,1-3H3 targets the protein Tyrosyl-DNA phosphodiesterase 1 and which is also targeted by the compound 
O=C(\/C=C\/c1ccc(O)c(O)c1)O[C@@H](C(=O)O)[C@@H](OC(=O)\/C=C\/c1ccc(O)c(O)c1)C(=O)O."} {"text":"The compound DeepSMILES CcccccNCCNC=O)CCS=O)=O)ccccBr)cc6))))))))))CC6C)))))))c6 targets the protein Tyrosyl-DNA phosphodiesterase 1 and which is also targeted by the compound CCOc1ccc(CCNC(=O)c2ccc(Cl)c(S(=O)(=O)N3CCOCC3)c2)cc1OCC."}", "/scratch/micpie/export/compound_protein_compound_1/test_9-2.jsonl": "{"text":"User: Can you give me an example for a compound canonical SMILES that targets the protein Cyanamide hydratase CA2?\nAssistant: Sure, the compound canonical SMILES NS(=O)(=O)[C@@H]1O[C@H](CO)[C@@H](O)[C@H](O)[C@H]1O targets the protein Cyanamide hydratase CA2.\nUser: Can you tell another compound canonical SMILES that targets the protein Cyanamide hydratase CA2?\nAssistant: Yes, of course, the compound canonical SMILES NS(=O)(=O)[C@@H]1O[C@H](CO)[C@@H](O)[C@H](O)[C@H]1O targets the compound canonical SMILES CN(CC(=O)Nc1c(S(N)(=O)=O)cc(S(N)(=O)=O)c(Cl)c1Cl)C(=N)N."} {"text":"User: Can you give me an example for a compound SMILES that targets the protein Carbonic anhydrase B?\nAssistant: Yes, of course, the compound SMILES CC(C)OC(=O)c1cc(O)cc(O)c1 targets the protein Carbonic anhydrase B.\nUser: Can you tell another compound SMILES that targets the protein Carbonic anhydrase B?\nAssistant: Of course, the compound SMILES CC(C)OC(=O)c1cc(O)cc(O)c1 targets the compound SMILES CSc1ccc(\/C=N\/CCc2ccc(S(N)(=O)=O)cc2)cc1."}", "/scratch/micpie/export/compound_protein_compound_1/train_7-0.jsonl": "{"text":"The compound InChI InChI=1S\/C16H11Cl2N3O2\/c17-11-3-1-9(5-12(11)18)6-15-19-16(21-20-15)10-2-4-13-14(7-10)23-8-22-13\/h1-5,7H,6,8H2,(H,19,20,21) targets the protein Adenosine receptor A2a and which is also targeted by the compound OC[C@H]1O[C@@H](n2cnc3c(NC4CC5CCC4C5)ncnc32)[C@H](O)[C@@H]1O."} {"text":"The compound canonical SMILES O=C(O)C(CO)NCCNc1nc2ccccc2n([C@H]2C[C@H]3CC[C@@H](C2)N3C2CCCCCCC2)c1=O targets the protein K-OR-1 and which is also targeted by the compound 
C[C@H]1CN(Cc2ccc3ccccc3n2)CC[C@@]1(C)c1cccc(C(N)=O)c1."}", "/scratch/micpie/export/compound_protein_compound_1/train_4-0.jsonl": "{"text":"The compound SMILES O=C(CSc1nc2ccccc2c(=O)n1-c1ccccc1)NC(=O)CN1CCCC1=O targets the protein Tyrosyl-DNA phosphodiesterase 1 and which is also targeted by the compound CCOc1ccc(CCNC(=O)c2ccc(Cl)c(S(=O)(=O)N3CCOCC3)c2)cc1OCC."} {"text":"The compound canonical SMILES COC(=O)c1c(-c2cccs2)csc1NC(=O)c1ccc(C(C)C)cc1 targets the protein Tyrosyl-DNA phosphodiesterase 1 and which is also targeted by the compound COc1cc2ccc(=O)oc2c(O)c1O."}", "/scratch/micpie/export/compound_protein_compound_1/train_3-2.jsonl": "{"text":"User: Can you give me one example for a compound InChI that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, of course, the compound InChI InChI=1S\/C14H14N4O2S\/c1-10-4-6-13(7-5-10)21(19,20)18(9-15)14-16-11(2)8-12(3)17-14\/h4-8H,1-3H3 targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you tell another compound InChI that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, of course, the compound InChI InChI=1S\/C14H14N4O2S\/c1-10-4-6-13(7-5-10)21(19,20)18(9-15)14-16-11(2)8-12(3)17-14\/h4-8H,1-3H3 targets the compound InChI O=C(\/C=C\/c1ccc(O)c(O)c1)O[C@@H](C(=O)O)[C@@H](OC(=O)\/C=C\/c1ccc(O)c(O)c1)C(=O)O."} {"text":"User: Can you give me one example for a compound canonical SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, of course, the compound canonical SMILES Cc1cccc(N2CCN(C(=O)CCS(=O)(=O)c3ccc(Br)cc3)CC2C)c1 targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you tell another compound canonical SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Of course, the compound canonical SMILES Cc1cccc(N2CCN(C(=O)CCS(=O)(=O)c3ccc(Br)cc3)CC2C)c1 targets the compound canonical SMILES CCOc1ccc(CCNC(=O)c2ccc(Cl)c(S(=O)(=O)N3CCOCC3)c2)cc1OCC."}", "/scratch/micpie/export/compound_protein_compound_1/test_7-2.jsonl": "{"text":"User: Can 
you come up with an example for a compound SMILES that targets the protein Adenosine receptor A2a?\nAssistant: Sure, the compound SMILES O=C(Nc1cccc(Cl)c1)Nc1nc2nn(CCc3ccccc3)cc2c2nc(-c3ccco3)nn12 targets the protein Adenosine receptor A2a.\nUser: Can you create another compound SMILES that targets the protein Adenosine receptor A2a?\nAssistant: Of course, the compound SMILES O=C(Nc1cccc(Cl)c1)Nc1nc2nn(CCc3ccccc3)cc2c2nc(-c3ccco3)nn12 targets the compound SMILES OC[C@H]1O[C@@H](n2cnc3c(NC4CC5CCC4C5)ncnc32)[C@H](O)[C@@H]1O."} {"text":"User: Can you come up with one example for a compound InChI that targets the protein K-OR-1?\nAssistant: Yes, the compound InChI InChI=1S\/C25H30N2O3\/c1-3-27-17-7-5-4-6-15(17)16-13-25(29)19-12-14-8-9-18(28)22-20(14)24(25,10-11-26(19)2)23(30-22)21(16)27\/h8-9,19,23,28-29H,3-7,10-13H2,1-2H3\/t19?,23-,24-,25+\/m0\/s1 targets the protein K-OR-1.\nUser: Can you generate another compound InChI that targets the protein K-OR-1?\nAssistant: Of course, the compound InChI InChI=1S\/C25H30N2O3\/c1-3-27-17-7-5-4-6-15(17)16-13-25(29)19-12-14-8-9-18(28)22-20(14)24(25,10-11-26(19)2)23(30-22)21(16)27\/h8-9,19,23,28-29H,3-7,10-13H2,1-2H3\/t19?,23-,24-,25+\/m0\/s1 targets the compound InChI C[C@H]1CN(Cc2ccc3ccccc3n2)CC[C@@]1(C)c1cccc(C(N)=O)c1."}", "/scratch/micpie/export/herg_central_inhib/test_0-10.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that does not block the human ether-à-go-go related gene (hERG)?\n#Assistant: This is a molecule that is not a hERG blocking compound: COc1cccc(NC(=O)C2C3C=CC4(O3)C2C(=O)N(CCCN2CCCCC2)C4C(=O)NC2CCCCC2)c1"} {"text":"User: I'm searching for the InChI of a molecule that does not block the human ether-à-go-go related gene (hERG)?\n#Assistant: This is a molecule that is not a hERG blocking compound (IC50 less than 10uM): InChI=1S\/C18H15BrN2O3S\/c1-12-8-14(19)6-7-16(12)25-11-18(23)24-10-17(22)21-15-5-3-2-4-13(15)9-20\/h2-8H,10-11H2,1H3,(H,21,22)"}", 
"/scratch/micpie/export/herg_central_inhib/valid_0-8.jsonl": "{"text":"User: Is the molecule with the SMILES COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1 a hERG blocking compound (IC50 < 10uM)?\nAssistant: No, it is not a hERG blocking compound (IC50 < 10uM)."} {"text":"User: Is the molecule with the SMILES O=C(\/C=C\/C=C\/c1ccc2c(c1)OCO2)N1CCCCC1 a hERG blocking compound (IC50 < 10uM)?\nAssistant: No, it is not a hERG blocking compound (IC50 < 10uM)."}", "/scratch/micpie/export/herg_central_inhib/train_0-8.jsonl": "{"text":"User: Is the molecule with the SELFIES [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][=C][C][=Branch1][C][=O][O][C][Ring1][#Branch1][=C][Ring1][=N][C][Branch1][=C][C][C][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][Branch1][=Branch1][N][Branch1][C][C][C][C][=C][Ring1][=Branch2] a human ether-à-go-go related gene (hERG) blocker?\nAssistant: No, it is not a human ether-à-go-go related gene (hERG) blocker."} {"text":"User: Is the molecule with the canonical SMILES COc1cccc(-n2nnc3c(=O)n(CC(=O)N4CCCc5ccccc54)cnc32)c1 a hERG blocker?\nAssistant: No, it is not a hERG blocker."}", "/scratch/micpie/export/herg_central_inhib/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a human ether-à-go-go related gene (hERG) blocking compound.\nSELFIES: [C][O][C][=C][C][=C][C][Branch2][Branch1][Branch1][N][C][=Branch1][C][=O][C][C][C][=C][C][Branch1][Ring2][O][Ring1][Branch1][C][Ring1][#Branch1][C][=Branch1][C][=O][N][Branch1][N][C][C][C][N][C][C][C][C][C][Ring1][=Branch1][C][Ring1][S][C][=Branch1][C][=O][N][C][C][C][C][C][C][Ring1][=Branch1][=C][Ring2][Ring2][=Branch1]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is no human ether-à-go-go related gene (hERG) blocking compound."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a human ether-à-go-go related gene (hERG) blocking 
compound.\nMolecule InChI: InChI=1S\/C18H15BrN2O3S\/c1-12-8-14(19)6-7-16(12)25-11-18(23)24-10-17(22)21-15-5-3-2-4-13(15)9-20\/h2-8H,10-11H2,1H3,(H,21,22)\nConstraint: Answer the question in a full sentence.\nResult: This molecule is no human ether-à-go-go related gene (hERG) blocking compound."}", "/scratch/micpie/export/herg_central_inhib/valid_0-9.jsonl": "{"text":"User: Can you create the canonical SMILES of a molecule that is not a hERG blocking compound (IC50 less than 10uM)?\nAssistant: Yes, I'm happy to help, here you go: COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1"} {"text":"User: Can you generate the SMILES of a molecule that is not a hERG blocker?\nAssistant: Of course, here you go: O=C(\/C=C\/C=C\/c1ccc2c(c1)OCO2)N1CCCCC1"}", "/scratch/micpie/export/herg_central_inhib/test_0-1.jsonl": "{"text":"Based on the SMILES representation COc1cccc(NC(=O)C2C3C=CC4(O3)C2C(=O)N(CCCN2CCCCC2)C4C(=O)NC2CCCCC2)c1, the molecule is not a hERG blocking compound (IC50 < 10uM)."} {"text":"Based on the SELFIES [C][C][=C][C][Branch1][C][Br][=C][C][=C][Ring1][#Branch1][S][C][C][=Branch1][C][=O][O][C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][#N], the molecule is not a hERG blocking compound (IC50 < 10uM)."}", "/scratch/micpie/export/herg_central_inhib/valid_0-0.jsonl": "{"text":"The molecule with the SMILES representation of COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1 is not a hERG blocking compound (IC50 < 10uM)."} {"text":"The molecule with the DeepSMILES O=C\/C=C\/C=C\/cccccc6)OCO5))))))))))))NCCCCC6 is not a hERG blocking compound."}", "/scratch/micpie/export/herg_central_inhib/test_0-2.jsonl": "{"text":"The DeepSMILES COcccccNC=O)CCC=CCO5)C6C=O)NCCCNCCCCC6)))))))))C5C=O)NCCCCCC6)))))))))))))))))))c6 is from a molecule that is not a hERG blocking compound (IC50 less than 10uM)."} {"text":"The SMILES Cc1cc(Br)ccc1SCC(=O)OCC(=O)Nc1ccccc1C#N is from a molecule that is not a human ether-à-go-go related gene (hERG) blocking compound."}", 
"/scratch/micpie/export/herg_central_inhib/valid_0-10.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that does not block hERG?\n#Assistant: This is a molecule that is not a human ether-à-go-go related gene (hERG) blocking compound: COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1"} {"text":"User: I'm looking for the SMILES of a molecule that does not block the human ether-à-go-go related gene (hERG)?\n#Assistant: This is a molecule that is not a hERG blocking compound: O=C(\/C=C\/C=C\/c1ccc2c(c1)OCO2)N1CCCCC1"}", "/scratch/micpie/export/herg_central_inhib/train_0-6.jsonl": "{"text":"Task: Please create a molecule canonical SMILES based on the text description.\nDescription: A molecule that is a human ether-à-go-go related gene (hERG) blocker.\nResult: COc1cc(OC)c2ccc(=O)oc2c1C(CC(=O)N1CCOCC1)c1ccc(N(C)C)cc1"} {"text":"Task: Please generate a molecule SELFIES based on the text description below.\nDescription: A molecule that is a hERG blocking compound (IC50 less than 10uM).\nResult: [C][O][C][=C][C][=C][C][Branch2][Ring2][=Branch2][N][N][=N][C][C][=Branch1][C][=O][N][Branch2][Ring1][Ring2][C][C][=Branch1][C][=O][N][C][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][#Branch2][C][=N][C][=Ring2][Ring1][Ring2][Ring2][Ring1][#Branch1][=C][Ring2][Ring1][=N]"}", "/scratch/micpie/export/herg_central_inhib/valid_0-6.jsonl": "{"text":"Task: Please create a molecule canonical SMILES based on the text description below.\nDescription: A molecule that is a hERG blocking compound (IC50 < 10uM).\nResult: COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1"} {"text":"Task: Please create a molecule SELFIES based on the text description.\nDescription: A molecule that is a human ether-à-go-go related gene (hERG) blocker.\nResult: [O][=C][Branch2][Ring1][Ring2][\/C][=C][\/C][=C][\/C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O][Ring1][=Branch1][N][C][C][C][C][C][Ring1][=Branch1]"}", "/scratch/micpie/export/herg_central_inhib/test_0-9.jsonl": 
"{"text":"User: Can you give me the DeepSMILES of a molecule that is not a hERG blocking compound (IC50 less than 10uM)?\nAssistant: Yes, I'm happy to help, here you go: COcccccNC=O)CCC=CCO5)C6C=O)NCCCNCCCCC6)))))))))C5C=O)NCCCCCC6)))))))))))))))))))c6"} {"text":"User: Can you give me the SELFIES of a molecule that is not a hERG blocking compound (IC50 less than 10uM)?\nAssistant: Yes, here you go: [C][C][=C][C][Branch1][C][Br][=C][C][=C][Ring1][#Branch1][S][C][C][=Branch1][C][=O][O][C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][#N]"}", "/scratch/micpie/export/herg_central_inhib/test_0-0.jsonl": "{"text":"The molecule with the SMILES COc1cccc(NC(=O)C2C3C=CC4(O3)C2C(=O)N(CCCN2CCCCC2)C4C(=O)NC2CCCCC2)c1 is not a hERG blocking compound."} {"text":"The molecule with the canonical SMILES representation of Cc1cc(Br)ccc1SCC(=O)OCC(=O)Nc1ccccc1C#N is not a hERG blocking compound (IC50 < 10uM)."}", "/scratch/micpie/export/herg_central_inhib/valid_0-7.jsonl": "{"text":"User: Can you derive if the molecule with the canonical SMILES COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1 is a human ether-à-go-go related gene (hERG) blocker?\nAssistant: No, this molecule is not a human ether-à-go-go related gene (hERG) blocker."} {"text":"User: Can you derive if the molecule with the InChI InChI=1S\/C17H19NO3\/c19-17(18-10-4-1-5-11-18)7-3-2-6-14-8-9-15-16(12-14)21-13-20-15\/h2-3,6-9,12H,1,4-5,10-11,13H2\/b6-2+,7-3+ is a hERG blocker?\nAssistant: No, this molecule is not a hERG blocker."}", "/scratch/micpie/export/herg_central_inhib/test_0-3.jsonl": "{"text":"The SMILES COc1cccc(NC(=O)C2C3C=CC4(O3)C2C(=O)N(CCCN2CCCCC2)C4C(=O)NC2CCCCC2)c1 is not a human ether-à-go-go related gene (hERG) blocker."} {"text":"The molecule DeepSMILES CcccBr)ccc6SCC=O)OCC=O)Ncccccc6C#N is not a human ether-à-go-go related gene (hERG) blocker."}", "/scratch/micpie/export/herg_central_inhib/valid_0-11.jsonl": "{"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very 
exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be a hERG blocker.\nAssistant: Got it, this canonical SMILES is not a hERG blocker: COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1"} {"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be a human ether-à-go-go related gene (hERG) blocking compound.\nAssistant: Ok, this DeepSMILES is not a human ether-à-go-go related gene (hERG) blocking compound: O=C\/C=C\/C=C\/cccccc6)OCO5))))))))))))NCCCCC6"}", "/scratch/micpie/export/herg_central_inhib/train_0-0.jsonl": "{"text":"The molecule with the SMILES representation of COc1cc(OC)c2ccc(=O)oc2c1C(CC(=O)N1CCOCC1)c1ccc(N(C)C)cc1 is not a hERG blocking compound (IC50 < 10uM)."} {"text":"The molecule with the canonical SMILES COc1cccc(-n2nnc3c(=O)n(CC(=O)N4CCCc5ccccc54)cnc32)c1 is not a human ether-à-go-go related gene (hERG) blocking compound."}", "/scratch/micpie/export/herg_central_inhib/test_0-6.jsonl": "{"text":"Task: Please generate a canonical SMILES based on the text description.\nDescription: A molecule that is a human ether-à-go-go related gene (hERG) blocking compound.\nResult: COc1cccc(NC(=O)C2C3C=CC4(O3)C2C(=O)N(CCCN2CCCCC2)C4C(=O)NC2CCCCC2)c1"} {"text":"Task: Please generate a canonical SMILES based on the description.\nDescription: A molecule that is a human ether-à-go-go related gene (hERG) blocking compound.\nResult: Cc1cc(Br)ccc1SCC(=O)OCC(=O)Nc1ccccc1C#N"}", "/scratch/micpie/export/herg_central_inhib/train_0-10.jsonl": "{"text":"User: I'm looking for the canonical SMILES of a molecule that does not block hERG (IC50 < 10uM)?\n#Assistant: This is a molecule that is not a human ether-à-go-go related gene (hERG) blocker: COc1cc(OC)c2ccc(=O)oc2c1C(CC(=O)N1CCOCC1)c1ccc(N(C)C)cc1"} {"text":"User: I'm looking for the SMILES of a molecule that does not block 
hERG?\n#Assistant: This is a molecule that is not a human ether-à-go-go related gene (hERG) blocker: COc1cccc(-n2nnc3c(=O)n(CC(=O)N4CCCc5ccccc54)cnc32)c1"}", "/scratch/micpie/export/herg_central_inhib/train_0-3.jsonl": "{"text":"The canonical SMILES COc1cc(OC)c2ccc(=O)oc2c1C(CC(=O)N1CCOCC1)c1ccc(N(C)C)cc1 is not a hERG blocking compound (IC50 < 10uM)."} {"text":"The SELFIES [C][O][C][=C][C][=C][C][Branch2][Ring2][=Branch2][N][N][=N][C][C][=Branch1][C][=O][N][Branch2][Ring1][Ring2][C][C][=Branch1][C][=O][N][C][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][#Branch2][C][=N][C][=Ring2][Ring1][Ring2][Ring2][Ring1][#Branch1][=C][Ring2][Ring1][=N] is not a hERG blocking compound."}", "/scratch/micpie/export/herg_central_inhib/train_0-12.jsonl": "{"text":"User: I want to come up with a InChI.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be a hERG blocking compound (IC50 less than 10uM).\nAssistant: Got it, this InChI is not a hERG blocking compound (IC50 less than 10uM): InChI=1S\/C26H30N2O6\/c1-27(2)18-7-5-17(6-8-18)20(15-23(29)28-11-13-33-14-12-28)25-22(32-4)16-21(31-3)19-9-10-24(30)34-26(19)25\/h5-10,16,20H,11-15H2,1-4H3"} {"text":"User: I want to generate a SMILES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be a hERG blocking compound (IC50 less than 10uM).\nAssistant: Understood, this SMILES is not a hERG blocking compound (IC50 less than 10uM): COc1cccc(-n2nnc3c(=O)n(CC(=O)N4CCCc5ccccc54)cnc32)c1"}", "/scratch/micpie/export/herg_central_inhib/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a hERG blocker?\nConstraint: You must select none, one or more options from a or b without using any other words.\nOptions:\na CC(=O)NC1(C(F)(F)F)C(=O)Nc2c1c(=O)[nH]c(=O)n2-c1ccccc1\nb COc1cccc(NC(=O)C2C3C=CC4(O3)C2C(=O)N(CCCN2CCCCC2)C4C(=O)NC2CCCCC2)c1\nAnswer: a, b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a hERG blocking compound?\nConstraint: You must select none, one or more options from A, B, C, or D without using any other words.\nOptions:\nA InChI=1S\/C18H15BrN2O3S\/c1-12-8-14(19)6-7-16(12)25-11-18(23)24-10-17(22)21-15-5-3-2-4-13(15)9-20\/h2-8H,10-11H2,1H3,(H,21,22)\nB InChI=1S\/C14H15ClN4O3S\/c1-2-17-14(20)19-23(21,22)13-9-16-7-6-12(13)18-11-5-3-4-10(15)8-11\/h3-9H,2H2,1H3,(H,16,18)(H2,17,19,20)\nC InChI=1S\/C15H10BrN3O2S2\/c16-10-5-3-9(4-6-10)11-8-23-15(17-11)19-14(22)18-13(20)12-2-1-7-21-12\/h1-8H,(H2,17,18,19,20,22)\nD InChI=1S\/C24H27N3O5\/c1-3-32-24(30)26-13-11-17(12-14-26)27-15-16-7-6-8-18(21(16)23(27)29)22(28)25-19-9-4-5-10-20(19)31-2\/h4-10,17H,3,11-15H2,1-2H3,(H,25,28)\nAnswer: A, B, C"}", "/scratch/micpie/export/herg_central_inhib/valid_0-2.jsonl": "{"text":"The canonical SMILES COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1 is from a molecule that is not a human ether-à-go-go related gene (hERG) blocker."} {"text":"The InChI InChI=1S\/C17H19NO3\/c19-17(18-10-4-1-5-11-18)7-3-2-6-14-8-9-15-16(12-14)21-13-20-15\/h2-3,6-9,12H,1,4-5,10-11,13H2\/b6-2+,7-3+ represents a molecule that is not a human ether-à-go-go related gene (hERG) blocking compound."}", 
"/scratch/micpie/export/herg_central_inhib/valid_0-1.jsonl": "{"text":"Based on the canonical SMILES representation COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1, the molecule is not a human ether-à-go-go related gene (hERG) blocker."} {"text":"Based on the InChI representation InChI=1S\/C17H19NO3\/c19-17(18-10-4-1-5-11-18)7-3-2-6-14-8-9-15-16(12-14)21-13-20-15\/h2-3,6-9,12H,1,4-5,10-11,13H2\/b6-2+,7-3+, the molecule is not a hERG blocking compound (IC50 less than 10uM)."}", "/scratch/micpie/export/herg_central_inhib/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a hERG blocker?\nConstraint: You must select none, one or more options from a, b, c, or d without using any other words.\nOptions:\n(a) C#CCNCcccco5))))))C=O)cccCOccccF)cc6Cl)))))))))on5\n(b) COccccCNC=O)CScncn[nH]5))))))))))cc6\n(c) CcccC)cncNCCCNC)C)))))C=O)cccccc6)CCCC6)))))))))))sc5c9.Cl\n(d) COcccc\/C=C\\SC=S)NNCCOCC6))))))C5=O)))))))cOC))c6\nAnswer: a, b, c, d"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a hERG blocking compound (IC50 < 10uM)?\nConstraint: You must select none, one or more options from a, b, c, or d without using any additional words.\nOptions:\na. [C][S][C][=N][N][=C][Branch2][Ring1][S][C@H1][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][N][C][=Branch1][C][=O][O][C][Branch1][C][C][Branch1][C][C][C][O][Ring2][Ring1][Branch2]\nb. [C][O][C][=C][C][=C][Branch2][Ring1][S][C][C][C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][=N][N][Ring1][N][C][=Branch1][C][=O][C][C][C][=Branch1][C][=O][O][C][=C][Ring2][Ring1][=Branch2][O][C]\nc. [O][=C][Branch2][Ring1][=Branch2][O][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][=Branch1][C][=S][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]\nd. 
[O][=C][Branch2][Ring1][Ring2][\/C][=C][\/C][=C][\/C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O][Ring1][=Branch1][N][C][C][C][C][C][Ring1][=Branch1]\nAnswer: a, b, c, d"}", "/scratch/micpie/export/herg_central_inhib/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a human ether-à-go-go related gene (hERG) blocking compound.\ncanonical SMILES: COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is no human ether-à-go-go related gene (hERG) blocking compound."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a hERG blocker.\nInChI: InChI=1S\/C17H19NO3\/c19-17(18-10-4-1-5-11-18)7-3-2-6-14-8-9-15-16(12-14)21-13-20-15\/h2-3,6-9,12H,1,4-5,10-11,13H2\/b6-2+,7-3+\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is no hERG blocker."}", "/scratch/micpie/export/herg_central_inhib/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that block hERG (IC50 less than 10uM).\nMolecule canonical SMILES: COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that block hERG.\nMolecule SMILES: O=C(\/C=C\/C=C\/c1ccc2c(c1)OCO2)N1CCCCC1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/herg_central_inhib/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a hERG blocking compound (IC50 less than 10uM).\nDeepSMILES: COcccOC))cccc=O)oc6c%10CCC=O)NCCOCC6))))))))ccccNC)C))cc6\nConstraint: Answer the 
question in a complete sentence.\nResult: This molecule is no hERG blocking compound (IC50 less than 10uM)."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a human ether-à-go-go related gene (hERG) blocking compound.\nDeepSMILES: COccccc-nnncc=O)nCC=O)NCCCcccccc6%10))))))))))))cnc69)))))))))c6\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is no human ether-à-go-go related gene (hERG) blocking compound."}", "/scratch/micpie/export/herg_central_inhib/valid_0-12.jsonl": "{"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be a hERG blocking compound (IC50 less than 10uM).\nAssistant: Got it, this SMILES is not a hERG blocking compound (IC50 less than 10uM): COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1"} {"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be a hERG blocking compound (IC50 < 10uM).\nAssistant: Ok, this DeepSMILES is not a hERG blocking compound (IC50 < 10uM): O=C\/C=C\/C=C\/cccccc6)OCO5))))))))))))NCCCCC6"}", "/scratch/micpie/export/herg_central_inhib/train_0-2.jsonl": "{"text":"The InChI InChI=1S\/C26H30N2O6\/c1-27(2)18-7-5-17(6-8-18)20(15-23(29)28-11-13-33-14-12-28)25-22(32-4)16-21(31-3)19-9-10-24(30)34-26(19)25\/h5-10,16,20H,11-15H2,1-4H3 is from a molecule that is not a human ether-à-go-go related gene (hERG) blocker."} {"text":"The InChI InChI=1S\/C22H20N6O3\/c1-31-17-9-4-8-16(12-17)28-21-20(24-25-28)22(30)26(14-23-21)13-19(29)27-11-5-7-15-6-2-3-10-18(15)27\/h2-4,6,8-10,12,14H,5,7,11,13H2,1H3 represents a molecule that is not a hERG blocking compound (IC50 < 10uM)."}", "/scratch/micpie/export/herg_central_inhib/test_0-11.jsonl": "{"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very exciting. 
Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be a hERG blocking compound (IC50 < 10uM).\nAssistant: Ok, here you go, this SMILES is not a hERG blocking compound (IC50 < 10uM): COc1cccc(NC(=O)C2C3C=CC4(O3)C2C(=O)N(CCCN2CCCCC2)C4C(=O)NC2CCCCC2)c1"} {"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be a hERG blocking compound (IC50 < 10uM).\nAssistant: Ok, this SMILES is not a hERG blocking compound (IC50 < 10uM): Cc1cc(Br)ccc1SCC(=O)OCC(=O)Nc1ccccc1C#N"}", "/scratch/micpie/export/herg_central_inhib/train_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the canonical SMILES COc1cc(OC)c2ccc(=O)oc2c1C(CC(=O)N1CCOCC1)c1ccc(N(C)C)cc1 is a hERG blocking compound?\nAssistant: No, this molecule is not a hERG blocking compound."} {"text":"User: Can you tell me if the molecule with the DeepSMILES COccccc-nnncc=O)nCC=O)NCCCcccccc6%10))))))))))))cnc69)))))))))c6 is a hERG blocking compound (IC50 less than 10uM)?\nAssistant: No, this molecule is not a hERG blocking compound (IC50 less than 10uM)."}", "/scratch/micpie/export/herg_central_inhib/train_0-11.jsonl": "{"text":"User: I want to come up with a SELFIES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be a human ether-à-go-go related gene (hERG) blocking compound.\nAssistant: Ok, this SELFIES is not a human ether-à-go-go related gene (hERG) blocking compound: [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][=C][C][=Branch1][C][=O][O][C][Ring1][#Branch1][=C][Ring1][=N][C][Branch1][=C][C][C][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][Branch1][=Branch1][N][Branch1][C][C][C][C][=C][Ring1][=Branch2]"} {"text":"User: I want to create a InChI.\nAssistant: This sounds very exciting. 
Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be a hERG blocker.\nAssistant: Ok, this InChI is not a hERG blocker: InChI=1S\/C22H20N6O3\/c1-31-17-9-4-8-16(12-17)28-21-20(24-25-28)22(30)26(14-23-21)13-19(29)27-11-5-7-15-6-2-3-10-18(15)27\/h2-4,6,8-10,12,14H,5,7,11,13H2,1H3"}", "/scratch/micpie/export/herg_central_inhib/train_0-1.jsonl": "{"text":"Based on the DeepSMILES representation COcccOC))cccc=O)oc6c%10CCC=O)NCCOCC6))))))))ccccNC)C))cc6, the molecule is not a hERG blocking compound (IC50 < 10uM)."} {"text":"Based on the DeepSMILES COccccc-nnncc=O)nCC=O)NCCCcccccc6%10))))))))))))cnc69)))))))))c6, the molecule is not a hERG blocker."}", "/scratch/micpie/export/herg_central_inhib/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a hERG blocker?\nConstraint: You must select none, one or more options from A, B, or C without using any additional words.\nOptions:\nA CSCCC(NC(=O)c1ccco1)C(=O)OCC(=O)Nc1sccc1C#N\nB COc1cc(OC)c2ccc(=O)oc2c1C(CC(=O)N1CCOCC1)c1ccc(N(C)C)cc1\nC Cc1cc2c(cc1S(=O)(=O)Nc1c(C)cccc1C)n(C)c(=O)c(=O)n2C\nAnswer: A, B, C"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a hERG blocking compound?\nConstraint: You must select none, one or more options from 1, 2, 3, 4, or 5 without using any other words.\nOptions:\n1: CCCCCCOc1ccc(N2C(=O)CC(N(CCc3ccc(S(N)(=O)=O)cc3)C(C)=O)C2=O)cc1\n2: CCOP(=O)(c1ccccc1)C(O)c1ccc([N+](=O)[O-])cc1\n3: COCCCn1c(SCC(=O)N2CCc3ccccc32)nnc1-c1ccoc1C\n4: COc1cccc(-n2nnc3c(=O)n(CC(=O)N4CCCc5ccccc54)cnc32)c1\n5: c1ccc(CN2CCC(Nc3nc(-c4ccccc4)nc4ccccc34)CC2)cc1\nAnswer: 1, 2, 3, 4, 5"}", "/scratch/micpie/export/herg_central_inhib/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that block hERG (IC50 less than 10uM).\nMolecule InChI: 
InChI=1S\/C26H30N2O6\/c1-27(2)18-7-5-17(6-8-18)20(15-23(29)28-11-13-33-14-12-28)25-22(32-4)16-21(31-3)19-9-10-24(30)34-26(19)25\/h5-10,16,20H,11-15H2,1-4H3\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that block hERG.\nDeepSMILES: COccccc-nnncc=O)nCC=O)NCCCcccccc6%10))))))))))))cnc69)))))))))c6\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/herg_central_inhib/test_0-7.jsonl": "{"text":"User: Can you derive if the molecule with the DeepSMILES COcccccNC=O)CCC=CCO5)C6C=O)NCCCNCCCCC6)))))))))C5C=O)NCCCCCC6)))))))))))))))))))c6 is a hERG blocking compound (IC50 less than 10uM)?\nAssistant: No, this molecule is not a hERG blocking compound (IC50 less than 10uM)."} {"text":"User: Can you tell me if the molecule with the SMILES Cc1cc(Br)ccc1SCC(=O)OCC(=O)Nc1ccccc1C#N is a hERG blocking compound?\nAssistant: No, this molecule is not a hERG blocking compound."}", "/scratch/micpie/export/herg_central_inhib/train_0-9.jsonl": "{"text":"User: Can you give me the canonical SMILES of a molecule that is not a hERG blocking compound?\nAssistant: Of course, here you go: COc1cc(OC)c2ccc(=O)oc2c1C(CC(=O)N1CCOCC1)c1ccc(N(C)C)cc1"} {"text":"User: Can you create the canonical SMILES of a molecule that is not a human ether-à-go-go related gene (hERG) blocking compound?\nAssistant: Of course, here you go: COc1cccc(-n2nnc3c(=O)n(CC(=O)N4CCCc5ccccc54)cnc32)c1"}", "/scratch/micpie/export/herg_central_inhib/valid_0-3.jsonl": "{"text":"The molecule SMILES COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1 is not a human ether-à-go-go related gene (hERG) blocking compound."} {"text":"The molecule DeepSMILES O=C\/C=C\/C=C\/cccccc6)OCO5))))))))))))NCCCCC6 is not a hERG blocking compound."}", 
"/scratch/micpie/export/herg_central_inhib/test_0-8.jsonl": "{"text":"User: Is the molecule with the canonical SMILES COc1cccc(NC(=O)C2C3C=CC4(O3)C2C(=O)N(CCCN2CCCCC2)C4C(=O)NC2CCCCC2)c1 a human ether-à-go-go related gene (hERG) blocking compound?\nAssistant: No, it is not a human ether-à-go-go related gene (hERG) blocking compound."} {"text":"User: Is the molecule with the InChI InChI=1S\/C18H15BrN2O3S\/c1-12-8-14(19)6-7-16(12)25-11-18(23)24-10-17(22)21-15-5-3-2-4-13(15)9-20\/h2-8H,10-11H2,1H3,(H,21,22) a human ether-à-go-go related gene (hERG) blocking compound?\nAssistant: No, it is not a human ether-à-go-go related gene (hERG) blocking compound."}", "/scratch/micpie/export/herg_central_inhib/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that block the human ether-à-go-go related gene (hERG).\nSMILES: COc1cccc(NC(=O)C2C3C=CC4(O3)C2C(=O)N(CCCN2CCCCC2)C4C(=O)NC2CCCCC2)c1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that block hERG.\nDeepSMILES: CcccBr)ccc6SCC=O)OCC=O)Ncccccc6C#N\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/herg_central_inhib/test_0-12.jsonl": "{"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be a hERG blocking compound (IC50 less than 10uM).\nAssistant: Understood, this SMILES is not a hERG blocking compound (IC50 less than 10uM): COc1cccc(NC(=O)C2C3C=CC4(O3)C2C(=O)N(CCCN2CCCCC2)C4C(=O)NC2CCCCC2)c1"} {"text":"User: I want to create a molecule canonical SMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be a human ether-à-go-go related gene (hERG) blocker.\nAssistant: Understood, this canonical SMILES is not a human ether-à-go-go related gene (hERG) blocker: Cc1cc(Br)ccc1SCC(=O)OCC(=O)Nc1ccccc1C#N"}", "/scratch/micpie/export/bio_ner_28/train_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: Primer sequences were: Pax6-F CAG CCA AAA TAG ATC TAC CTG; Pax6-R CGA TCA CAT GCT CTC TCC TT; Homer3-F CCC AGG TGG CTG TAG AGC; Homer3-R CTC TAC ACA GTG CAA AGC TCA G; Trim11-F GTG CAG GAT GTG AAG CTG; Trim11-R GCC TGC AGA TAG TCA TAG GG; Dncl1-F CAA AAA TGC AGA CAT GTC G; Dncl1-R CTA AGG GAG AAA AAA ATG GGG; Gapdh-F: CAT CAC CAT CTT CCA GGA GC; Gapdh-R: ATG ACC TTG CCC ACA GCC TT; Atp5a1-F: CAC ACG TGA GAT GTC CTC CA; Atp5a1-R: CAC AGA GAT TCG GGG ATA A..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: Primer sequences,0,16,Sequence\nPax6,23,27,Gene_or_geneproduct\nF,30,31,Sequence\nPax6,61,65,Gene_or_geneproduct\nR,68,69,Sequence\nHomer3,98,104,Gene_or_geneproduct\nF,107,108,Sequence\nHomer3,134,140,Gene_or_geneproduct\nR,143,144,Sequence\nG,157,158,Chemical\nTrim11,176,182,Gene_or_geneproduct\nF,185,186,Sequence\nTrim11,212,218,Gene_or_geneproduct\nR,221,222,Sequence\nDncl1,251,256,Gene_or_geneproduct\nF,259,260,Sequence\nG,270,271,Chemical\nDncl1,288,293,Gene_or_geneproduct\nR,296,297,Sequence\nGapdh,327,332,Gene_or_geneproduct\nF,335,336,Sequence\nGapdh,366,371,Gene_or_geneproduct\nR,374,375,Sequence\nAtp5a1,405,411,Gene_or_geneproduct\nF,414,415,Sequence\nAtp5a1,445,451,Gene_or_geneproduct\nR,454,455,Sequence\nA,458,459,Chemical"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: The expression patterns of the genes varied among different members. 
CsbHLH2, CsbHLH5, CsbHLH9, CsbHLH15, CsbHLH16, CsbHLH21, CsbHLH24, CsbHLH27, CsbHLH33, CsbHLH35, CsbHLH37, CsbHLH38, CsbHLH39, CsbHLH42, CsbHLH44, CsbHLH47, CsbHLH48, and CsbHLH54 were down-regulated by cold, while CsbHLH4, CsbHLH18, CsbHLH22, CsbHLH26, CsbHLH34, CsbHLH36, CsbHLH41, CsbHLH43, CsbHLH46, and CsbHLH49 showed increased expression levels during the cold treatments..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: CsbHLH2,69,76,Gene\/Protein\nCsbHLH5,78,85,Gene\/Protein\nCsbHLH9,87,94,Gene\/Protein\nCsbHLH15,96,104,Gene\/Protein\nCsbHLH16,106,114,Gene\/Protein\nCsbHLH21,116,124,Gene\/Protein\nCsbHLH24,126,134,Gene\/Protein\nCsbHLH27,136,144,Gene\/Protein\nCsbHLH33,146,154,Gene\/Protein\nCsbHLH35,156,164,Gene\/Protein\nCsbHLH37,166,174,Gene\/Protein\nCsbHLH38,176,184,Gene\/Protein\nCsbHLH39,186,194,Gene\/Protein\nCsbHLH42,196,204,Gene\/Protein\nCsbHLH44,206,214,Gene\/Protein\nCsbHLH47,216,224,Gene\/Protein\nCsbHLH48,226,234,Gene\/Protein\nCsbHLH54,240,248,Gene\/Protein\nCsbHLH4,286,293,Gene\/Protein\nCsbHLH18,295,303,Gene\/Protein\nCsbHLH22,305,313,Gene\/Protein\nCsbHLH26,315,323,Gene\/Protein\nCsbHLH34,325,333,Gene\/Protein\nCsbHLH36,335,343,Gene\/Protein\nCsbHLH41,345,353,Gene\/Protein\nCsbHLH43,355,363,Gene\/Protein\nCsbHLH46,365,373,Gene\/Protein\nCsbHLH49,379,387,Gene\/Protein"}", "/scratch/micpie/export/MUV_852/valid_0-0.jsonl": "{"text":"The chemical compound with the SMILES representation of Cc1nn(-c2ccccc2)c(Cl)c1C1C(C#N)=C(N)OC2=C1C(=O)CCC2 is not an inhibitor of factor XIIa (FXIIa)."} {"text":"The compound with the DeepSMILES representation of COC=O)cccOC))cOC))cc6NC=O)cccccC)c6 is not an inhibitor of factor XIIa (FXIIa)."}", "/scratch/micpie/export/MUV_852/test_0-0.jsonl": "{"text":"The chemical compound with the SMILES representation of CC(=O)N1CCC2(CC1)NC(=O)N(c1ccccc1)N2 is not an inhibitor of factor XIIa (FXIIa)."} 
{"text":"The molecular species with the SELFIES representation of ['[C][C][=C][C][=Branch1][C][=O][O][C][=C][C][Branch2][Ring1][#Branch2][O][C][C][=Branch1][C][=O][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][O][Ring1][Branch2][=C][C][=C][Ring2][Ring1][#Branch2][Ring2][Ring1][Branch1]'] is not an inhibitor of factor XIIa (FXIIa)."}", "/scratch/micpie/export/MUV_852/train_0-0.jsonl": "{"text":"The compound with the canonical SMILES representation of O=C(O)\/C(=C\\c1ccco1)NC(=O)c1cccc2ccccc12 is not an inhibitor of factor XIIa (FXIIa)."} {"text":"The molecular species with the InChI InChI=1S\/C13H12N2O5\/c1-8-3-5-12(20-8)13(16)14-10-7-9(15(17)18)4-6-11(10)19-2\/h3-7H,1-2H3,(H,14,16) is not an inhibitor of factor XIIa (FXIIa)."}", "/scratch/micpie/export/bio_ner_15/valid_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: The results indicate that 1) the U value can affect the calculated energetic result significantly, not only the absolute adsorption energy but also the trend in adsorption energy; 2) CO can directly react with surface lattice oxygen atoms (O (2f) \/ O (3f)) to form CO (2) via the Mars-van Krevelen reaction mechanism on both (110)-B and (111)-B; 3) pre-adsorbed molecular O (2) can enhance CO oxidation through the channel in which it directly reacts with molecular CO to form CO (2) [ O (2) (a)+ CO (g)--> CO (2) (g)+ O (a)] on (110)-A \/ (111)-A; 4) CO oxidation is a structure-sensitive reaction, and the activation energy of CO oxidation follows the order of Co (3) O (4) (111)-A (0.78 eV) > Co (3) O (4) (111)-B (0.68 eV) > Co (3) O (4) (110)-A (0.51 eV) > Co (3) O (4) (110)-B (0.41 eV), that is, the (110) surface shows higher reactivity for CO oxidation than the (111) surface; 5) in addition to the O (2f), it was also found that Co (3+) is more active than Co (2+), so both O (2f) and Co (3+) control the catalytic activity of CO oxidation on Co (3) O (4), as 
opposed to a previous DFT study which concluded that either Co (3+) or O (2f) is the active site..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: CO,183,185,Chemical\/Drug\noxygen,226,232,Chemical\/Drug\nO,241,242,Chemical\/Drug\nO,251,252,Chemical\/Drug\nCO,268,270,Chemical\/Drug\nCO,405,407,Chemical\/Drug\nCO,481,483,Chemical\/Drug\nO,493,494,Chemical\/Drug\nCO,516,518,Chemical\/Drug\nCO,530,532,Chemical\/Drug\nCO,584,586,Chemical\/Drug\nO,664,665,Chemical\/Drug\nO,705,706,Chemical\/Drug\nCO,912,914,Chemical\/Drug\nO,972,973,Chemical\/Drug"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Group I included deaths attributed to communicable diseases and maternal, perinatal and nutritional conditions (CD): intestinal infectious diseases, tuberculosis (TB), other bacterial diseases, rabies, measles, viral hepatitis, human immunodeficiency virus (HIV), malaria, nutritional deficiencies, meningitis, otitis, conditions related to or aggravated by the pregnancy, childbirth or by the puerperium (maternal causes or obstetric causes) and certain conditions originating in the perinatal period..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: communicable diseases,38,59,Disease\/Disorder\nCD,113,115,Chemical\/Drug\nintestinal infectious diseases,119,149,Disease\/Disorder\ntuberculosis,151,163,Disease\/Disorder\nTB,166,168,Disease\/Disorder\nbacterial diseases,178,196,Disease\/Disorder\nrabies,198,204,Disease\/Disorder\nmeasles,206,213,Disease\/Disorder\nviral hepatitis,215,230,Disease\/Disorder\nhuman immunodeficiency virus,232,260,Organism\/Species\nHIV,263,266,Organism\/Species\nmalaria,270,277,Disease\/Disorder\nnutritional 
deficiencies,279,303,Disease\/Disorder\nmeningitis,305,315,Disease\/Disorder\notitis,317,323,Disease\/Disorder"}", "/scratch/micpie/export/bio_ner_15/test_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: On the singlet PES, the most important products include CF3OOOI, CF3OOIO, CF3OIO2, and CF2O+ FIO2, while other products such as CF2O+ FOIO, CF2O+ FOOI, CF3OOI+ O ((3) P), CF3OI+ O2 ((1) Delta and (3) Sigma), and CF3O+ OIO are negligible due to high barriers or unstable formations..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: CF3OOOI,56,63,Chemical\/Drug\nCF3OOIO,65,72,Chemical\/Drug\nCF3OIO2,74,81,Chemical\/Drug\nCF2O,87,91,Chemical\/Drug\nFIO2,94,98,Chemical\/Drug\nCF2O,129,133,Chemical\/Drug\nFOIO,136,140,Chemical\/Drug\nCF2O,142,146,Chemical\/Drug\nFOOI,149,153,Chemical\/Drug\nCF3OOI,155,161,Chemical\/Drug\nO,164,165,Chemical\/Drug\nCF3OI,177,182,Chemical\/Drug\nO2,185,187,Chemical\/Drug\nCF3O,222,226,Chemical\/Drug\nOIO,229,232,Chemical\/Drug"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Phylogenetic analyses All concatenated ribosomal trees (Fig. 
1) used finished and permanent draft genomes downloaded (September 2016) from IMG [ 41] to create reference datasets for 16 ribosomal proteins chosen as single-copy phylogenetic marker genes (rpL2, rpL3, rpL4, rpL5, rpL6, rpL14, rpL15, rpL16, rpL18, rpL22, rpL24, rpS3, rpS8, rpS10, rpS17, rpS19)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: 16 ribosomal,184,196,gene\nrpL2,258,262,gene\nrpL3,264,268,gene\nrpL4,270,274,gene\nrpL5,276,280,gene\nrpL6,282,286,gene\nrpL14,288,293,gene\nrpL15,295,300,gene\nrpL16,302,307,gene\nrpL18,309,314,gene\nrpS3,330,334,gene\nrpS8,336,340,gene\nrpS10,342,347,gene\nrpS17,349,354,gene\nrpS19,356,361,gene"}", "/scratch/micpie/export/bio_ner_15/train_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: These include the genes, undefined 1 (UD1), UD2, and UD3, each coding for proteins of unknown function, the ken gene encoding a new Kruppel-like putative transcription factor, the fly homologues of the mammalian mitochondrial trifunctional enzyme (thiolase), and the TAR DNA-binding protein-43 (TBPH), the first nonvertebrate member of the transmembrane 4 superfamily (TM4SF) gene, a new homeodomain gene, and a gene coding for a putative nuclear binding protein (PNBP) that is homologous to maleless, and a Copia-like element..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: undefined 1,25,36,Gene\/Protein\nUD1,39,42,Gene\/Protein\nUD2,45,48,Gene\/Protein\nUD3,54,57,Gene\/Protein\nken gene,109,117,Gene\/Protein\nKruppel - like putative transcription factor,133,177,Gene\/Protein\nmammalian mitochondrial trifunctional enzyme,205,249,Gene\/Protein\nthiolase,252,260,Gene\/Protein\nTAR DNA - binding protein - 43,271,301,Gene\/Protein\nTBPH,304,308,Gene\/Protein\ntransmembrane 4 
superfamily,349,376,Gene\/Protein\nTM4SF,379,384,Gene\/Protein\nhomeodomain gene,398,414,Gene\/Protein\nmaleless,503,511,Gene\/Protein\nCopia - like element,519,539,Gene\/Protein"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Aloes (Aloe vera) shandileer (Leonotis nepetifolia), wild onion (Hymenocallis tubiflora), pepper (Capsicum spp.) tulsi (Ocimum gratissimum), black sage (Cordia curassavica), shadon beni (Eryngium foetidium), lemongrass (Cymbopogon citratus) and nutmeg (Myristica fragrans) were the more popular traditional indigenous West Indian medicinal plants used..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: Aloe vera,8,17,Organism\/Species\nshandileer,19,29,Organism\/Species\nLeonotis nepetifolia,32,52,Organism\/Species\nonion,60,65,Organism\/Species\nHymenocallis tubiflora,68,90,Organism\/Species\ntulsi,117,122,Organism\/Species\nOcimum gratissimum,125,143,Organism\/Species\nblack sage,146,156,Organism\/Species\nCordia curassavica,159,177,Organism\/Species\nshadon beni,180,191,Organism\/Species\nEryngium foetidium,194,212,Organism\/Species\nlemongrass,215,225,Organism\/Species\nCymbopogon citratus,228,247,Organism\/Species\nnutmeg,253,259,Organism\/Species\nMyristica fragrans,262,280,Organism\/Species"}", "/scratch/micpie/export/sr_are_tox21/test_0-10.jsonl": "{"text":"User: Can you create the SELFIES of a molecule that is not toxic in the Antioxidant response element assay?\nAssistant: Of course, here you go: [O][=S][=Branch1][C][=O][Branch1][C][Cl][C][=C][C][=C][C][=C][Ring1][=Branch1]"} {"text":"User: Can you generate the InChI of a molecule that is not toxic in the SR-ARE assay?\nAssistant: Yes, I'm happy to help, here you go: InChI=1S\/C7H7NO2\/c8-6-3-1-5(2-4-6)7(9)10\/h1-4H,8H2,(H,9,10)"}", "/scratch/micpie/export/sr_are_tox21/valid_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with 
the SMILES CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21 is toxic in the SR-Antioxidant response element assay?\nAssistant: No, this molecule is not toxic in the SR-Antioxidant response element assay."} {"text":"User: Can you figure out if the molecule with the InChI InChI=1S\/C20H25NO2S2\/c1-14-7-11-24-18(14)17(19-15(2)8-12-25-19)6-4-10-21-9-3-5-16(13-21)20(22)23\/h6-8,11-12,16H,3-5,9-10,13H2,1-2H3,(H,22,23)\/t16-\/m1\/s1 is toxic in the Antioxidant response element assay?\nAssistant: No, this molecule is not toxic in the Antioxidant response element assay."}", "/scratch/micpie/export/sr_are_tox21/test_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the SR-Antioxidant response element assay?\nConstraint: You must select none, one or more options from a, b, c, d, or e without using any additional words.\nOptions:\na: O=S=O)Cl)cccccc6\nb: CCCCOC=O)CC)OC=O)CCC\nc: OCccccCl)cc6))))))CCNCC6\nd: CcccC)cC)cc6C\ne: CCC)=CCO\nAnswer: a, b, c, d, e"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the SR-ARE assay?\nConstraint: You must select none, one or more options from A, B, C, or D without using any additional words.\nOptions:\n(A) CcccccOcccccc6)))))))c6\n(B) CCCCCOC=O)COcccNC=O)C=CCCCC6))))C5=O))))))cF)cc6Cl\n(C) C[C@H]CNCCOCC6)))))))CC=O)NCCCC5))))))cccccc6))))))cccccc6\n(D) NccccC=O)O))cc6\nAnswer: A, B, C, D"}", "/scratch/micpie/export/sr_are_tox21/train_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the DeepSMILES CCOccccncSN)=O)=O))sc5c9 is toxic in the SR-Antioxidant response element assay?\nAssistant: Yes, this molecule is toxic in the SR-Antioxidant response element assay."} {"text":"User: Can you tell me if the molecule with the InChI InChI=1S\/C10H18O2\/c1-9(11)12-8-7-10-5-3-2-4-6-10\/h10H,2-8H2,1H3 is toxic in the SR-ARE assay?\nAssistant: No, this molecule is not toxic in the SR-ARE assay."}", 
"/scratch/micpie/export/sr_are_tox21/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-ARE assay.\nMolecule SELFIES: [O][=S][=Branch1][C][=O][Branch1][C][Cl][C][=C][C][=C][C][=C][Ring1][=Branch1]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-ARE assay.\nMolecule SMILES: Nc1ccc(C(=O)O)cc1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/sr_are_tox21/valid_0-9.jsonl": "{"text":"User: Is the molecule with the DeepSMILES CNC)CCCNcccccc6CCccccCl)cc6%15 toxic in the SR-ARE assay?\nAssistant: No, it is not toxic in the SR-ARE assay."} {"text":"User: Is the molecule with the InChI InChI=1S\/C20H25NO2S2\/c1-14-7-11-24-18(14)17(19-15(2)8-12-25-19)6-4-10-21-9-3-5-16(13-21)20(22)23\/h6-8,11-12,16H,3-5,9-10,13H2,1-2H3,(H,22,23)\/t16-\/m1\/s1 toxic in the Antioxidant response element assay?\nAssistant: No, it is not toxic in the Antioxidant response element assay."}", "/scratch/micpie/export/sr_are_tox21/test_0-1.jsonl": "{"text":"The molecule with the DeepSMILES O=S=O)Cl)cccccc6 is not showing activity in the SR-ARE toxicity assay."} {"text":"The molecule with the DeepSMILES representation of NccccC=O)O))cc6 is not showing activity in the SR-ARE toxicity assay."}", "/scratch/micpie/export/sr_are_tox21/valid_0-0.jsonl": "{"text":"The molecule with the SMILES CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21 is not toxic in the Antioxidant response element assay."} {"text":"The molecule with the InChI representation of InChI=1S\/C20H25NO2S2\/c1-14-7-11-24-18(14)17(19-15(2)8-12-25-19)6-4-10-21-9-3-5-16(13-21)20(22)23\/h6-8,11-12,16H,3-5,9-10,13H2,1-2H3,(H,22,23)\/t16-\/m1\/s1 is not toxic in the SR-ARE 
assay."}", "/scratch/micpie/export/sr_are_tox21/test_0-2.jsonl": "{"text":"Based on the SMILES representation O=S(=O)(Cl)c1ccccc1, the molecule has no SR-ARE toxicity properties."} {"text":"Based on the SELFIES representation [N][C][=C][C][=C][Branch1][=Branch1][C][=Branch1][C][=O][O][C][=C][Ring1][=Branch2], the molecule has no Antioxidant response element toxicity properties."}", "/scratch/micpie/export/sr_are_tox21/valid_0-10.jsonl": "{"text":"User: Can you create the InChI of a molecule that is not toxic in the SR-Antioxidant response element assay?\nAssistant: Yes, here you go: InChI=1S\/C19H23ClN2\/c1-21(2)12-5-13-22-18-7-4-3-6-15(18)8-9-16-10-11-17(20)14-19(16)22\/h3-4,6-7,10-11,14H,5,8-9,12-13H2,1-2H3"} {"text":"User: Can you create the InChI of a molecule that is not toxic in the SR-ARE assay?\nAssistant: Yes, here you go: InChI=1S\/C20H25NO2S2\/c1-14-7-11-24-18(14)17(19-15(2)8-12-25-19)6-4-10-21-9-3-5-16(13-21)20(22)23\/h6-8,11-12,16H,3-5,9-10,13H2,1-2H3,(H,22,23)\/t16-\/m1\/s1"}", "/scratch/micpie/export/sr_are_tox21/train_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-Antioxidant response element assay.\nMolecule SMILES: CCOc1ccc2nc(S(N)(=O)=O)sc2c1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is toxic in the SR-Antioxidant response element assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-Antioxidant response element assay.\nMolecule InChI: InChI=1S\/C10H18O2\/c1-9(11)12-8-7-10-5-3-2-4-6-10\/h10H,2-8H2,1H3\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the SR-Antioxidant response element assay."}", "/scratch/micpie/export/sr_are_tox21/valid_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-ARE assay.\nMolecule SMILES: 
CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the SR-ARE assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Antioxidant response element assay.\nSELFIES: [C][C][C][=C][S][C][=Ring1][Branch1][C][=Branch2][Ring1][Ring1][=C][C][C][N][C][C][C][C@@H1][Branch1][=Branch1][C][=Branch1][C][=O][O][C][Ring1][=Branch2][C][S][C][=C][C][=Ring1][Branch1][C]\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the Antioxidant response element assay."}", "/scratch/micpie/export/sr_are_tox21/test_0-9.jsonl": "{"text":"User: Is the molecule with the SMILES O=S(=O)(Cl)c1ccccc1 toxic in the Antioxidant response element assay?\nAssistant: No, it is not toxic in the Antioxidant response element assay."} {"text":"User: Is the molecule with the SMILES Nc1ccc(C(=O)O)cc1 toxic in the SR-ARE assay?\nAssistant: No, it is not toxic in the SR-ARE assay."}", "/scratch/micpie/export/sr_are_tox21/test_0-0.jsonl": "{"text":"The molecule with the SELFIES representation of [O][=S][=Branch1][C][=O][Branch1][C][Cl][C][=C][C][=C][C][=C][Ring1][=Branch1] is not toxic in the SR-Antioxidant response element assay."} {"text":"The molecule with the InChI InChI=1S\/C7H7NO2\/c8-6-3-1-5(2-4-6)7(9)10\/h1-4H,8H2,(H,9,10) is not toxic in the SR-Antioxidant response element assay."}", "/scratch/micpie/export/sr_are_tox21/valid_0-7.jsonl": "{"text":"Task: Please give me a SELFIES based on the text description.\nDescription: A molecule that is toxic in the Antioxidant response element assay.\nResult: [C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Ring1][S]"} {"text":"Task: Please give me a canonical SMILES based on the description.\nDescription: A molecule that is toxic in the SR-Antioxidant response element assay.\nResult: 
Cc1ccsc1C(=CCCN1CCC[C@@H](C(=O)O)C1)c1sccc1C"}", "/scratch/micpie/export/sr_are_tox21/test_0-3.jsonl": "{"text":"The DeepSMILES O=S=O)Cl)cccccc6 is from a molecule that is not identified as toxic in the SR-ARE assay."} {"text":"The InChI InChI=1S\/C7H7NO2\/c8-6-3-1-5(2-4-6)7(9)10\/h1-4H,8H2,(H,9,10) is from a molecule that is not identified as toxic in the Antioxidant response element assay."}", "/scratch/micpie/export/sr_are_tox21/valid_0-11.jsonl": "{"text":"User: I'm looking for the canonical SMILES of a molecule that is not toxic in the SR-Antioxidant response element assay?\nAssistant: This is a molecule that is not toxic in the SR-Antioxidant response element assay: CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21"} {"text":"User: I'm looking for the canonical SMILES of a molecule that is not toxic in the Antioxidant response element assay?\nAssistant: This is a molecule that is not toxic in the Antioxidant response element assay: Cc1ccsc1C(=CCCN1CCC[C@@H](C(=O)O)C1)c1sccc1C"}", "/scratch/micpie/export/sr_are_tox21/train_0-0.jsonl": "{"text":"The molecule with the DeepSMILES CCOccccncSN)=O)=O))sc5c9 is toxic in the SR-ARE assay."} {"text":"The molecule with the SMILES representation of CC(=O)OCCC1CCCCC1 is not toxic in the SR-Antioxidant response element assay."}", "/scratch/micpie/export/sr_are_tox21/test_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-ARE assay.\nDeepSMILES: O=S=O)Cl)cccccc6\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the SR-ARE assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Antioxidant response element assay.\nMolecule InChI: InChI=1S\/C7H7NO2\/c8-6-3-1-5(2-4-6)7(9)10\/h1-4H,8H2,(H,9,10)\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the Antioxidant response element assay."}", 
"/scratch/micpie/export/sr_are_tox21/train_0-10.jsonl": "{"text":"User: Can you generate the canonical SMILES of a molecule that is toxic in the SR-ARE assay?\nAssistant: Yes, here you go: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} {"text":"User: Can you create the InChI of a molecule that is not toxic in the SR-ARE assay?\nAssistant: Yes, here you go: InChI=1S\/C10H18O2\/c1-9(11)12-8-7-10-5-3-2-4-6-10\/h10H,2-8H2,1H3"}", "/scratch/micpie/export/sr_are_tox21/train_0-3.jsonl": "{"text":"The SMILES CCOc1ccc2nc(S(N)(=O)=O)sc2c1 is from a molecule that is identified as toxic in the SR-ARE assay."} {"text":"The InChI InChI=1S\/C10H18O2\/c1-9(11)12-8-7-10-5-3-2-4-6-10\/h10H,2-8H2,1H3 represents a molecule that is not identified as toxic in the Antioxidant response element assay."}", "/scratch/micpie/export/sr_are_tox21/train_0-12.jsonl": "{"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very curious. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should be toxic in the SR-ARE assay.\nAssistant: Ok, this DeepSMILES is toxic in the SR-ARE assay: CCOccccncSN)=O)=O))sc5c9"} {"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be toxic in the SR-Antioxidant response element assay.\nAssistant: Got it, this InChI is not toxic in the SR-Antioxidant response element assay: InChI=1S\/C10H18O2\/c1-9(11)12-8-7-10-5-3-2-4-6-10\/h10H,2-8H2,1H3"}", "/scratch/micpie/export/sr_are_tox21/test_0-13.jsonl": "{"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the SR-Antioxidant response element assay.\nAssistant: Understood, this canonical SMILES is not toxic in the SR-Antioxidant response element assay: O=S(=O)(Cl)c1ccccc1"} {"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the SR-Antioxidant response element assay.\nAssistant: Understood, this canonical SMILES is not toxic in the SR-Antioxidant response element assay: Nc1ccc(C(=O)O)cc1"}", "/scratch/micpie/export/sr_are_tox21/valid_0-2.jsonl": "{"text":"Based on the DeepSMILES representation CNC)CCCNcccccc6CCccccCl)cc6%15, the molecule has no Antioxidant response element toxicity features."} {"text":"Based on the canonical SMILES Cc1ccsc1C(=CCCN1CCC[C@@H](C(=O)O)C1)c1sccc1C, the molecule has no SR-Antioxidant response element toxicity characteristics."}", "/scratch/micpie/export/sr_are_tox21/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13) toxic in the SR-ARE assay?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any additional words.\nOptions:\n1: False\n2: True\nAnswer: 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES representation of CC(=O)OCCC1CCCCC1 toxic in the SR-Antioxidant response element assay?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any additional words.\nOptions:\n1 True\n2 False\nAnswer: 2"}", "/scratch/micpie/export/sr_are_tox21/valid_0-1.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C19H23ClN2\/c1-21(2)12-5-13-22-18-7-4-3-6-15(18)8-9-16-10-11-17(20)14-19(16)22\/h3-4,6-7,10-11,14H,5,8-9,12-13H2,1-2H3 is not showing activity in the SR-ARE toxicity 
assay."} {"text":"The molecule with the SELFIES [C][C][C][=C][S][C][=Ring1][Branch1][C][=Branch2][Ring1][Ring1][=C][C][C][N][C][C][C][C@@H1][Branch1][=Branch1][C][=Branch1][C][=O][O][C][Ring1][=Branch2][C][S][C][=C][C][=Ring1][Branch1][C] is not showing activity in the SR-ARE toxicity assay."}", "/scratch/micpie/export/sr_are_tox21/valid_0-13.jsonl": "{"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the SR-Antioxidant response element assay.\nAssistant: Got it, this DeepSMILES is not toxic in the SR-Antioxidant response element assay: CNC)CCCNcccccc6CCccccCl)cc6%15"} {"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the SR-ARE assay.\nAssistant: Understood, this DeepSMILES is not toxic in the SR-ARE assay: Ccccsc5C=CCCNCCC[C@@H]C=O)O))C6)))))))))csccc5C"}", "/scratch/micpie/export/sr_are_tox21/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Antioxidant response element assay.\nSELFIES: [C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Ring1][S]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Antioxidant response element assay.\nMolecule SELFIES: [C][C][C][=C][S][C][=Ring1][Branch1][C][=Branch2][Ring1][Ring1][=C][C][C][N][C][C][C][C@@H1][Branch1][=Branch1][C][=Branch1][C][=O][O][C][Ring1][=Branch2][C][S][C][=C][C][=Ring1][Branch1][C]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", 
"/scratch/micpie/export/sr_are_tox21/train_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are toxic in the SR-ARE assay?\nConstraint: You must select none, one or more options from a or b without using any other words.\nOptions:\na) InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13)\nb) InChI=1S\/C21H28O2\/c1-4-16-19(23)12-18-15-6-5-13-11-14(22)7-9-20(13,2)17(15)8-10-21(16,18)3\/h4,11,15,17-18H,5-10,12H2,1-3H3\/b16-4-\/t15-,17+,18+,20+,21-\/m1\/s1\nAnswer: a, b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the SR-Antioxidant response element assay?\nConstraint: You must select none, one or more options from a, b, c, or d without using any other words.\nOptions:\na CC=O)OCCCCCCCC6\nb NCcccccc6\nc CCCNC)C)))CNcccccc6CCcccccc6%15\nd CCC=O)Ncccccc6))))))CCCNCCO)cccccc6))))))))CC6\nAnswer: a, b, c, d"}", "/scratch/micpie/export/sr_are_tox21/valid_0-4.jsonl": "{"text":"The molecule SELFIES [C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Ring1][S] is not toxic in the SR-ARE assay."} {"text":"The molecule SELFIES [C][C][C][=C][S][C][=Ring1][Branch1][C][=Branch2][Ring1][Ring1][=C][C][C][N][C][C][C][C@@H1][Branch1][=Branch1][C][=Branch1][C][=O][O][C][Ring1][=Branch2][C][S][C][=C][C][=Ring1][Branch1][C] is not toxic in the SR-Antioxidant response element assay."}", "/scratch/micpie/export/sr_are_tox21/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Antioxidant response element assay.\ncanonical SMILES: CCOc1ccc2nc(S(N)(=O)=O)sc2c1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule 
that is toxic in the SR-Antioxidant response element assay.\nSMILES: CC(=O)OCCC1CCCCC1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/sr_are_tox21/valid_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the SR-ARE assay?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any other words.\nOptions:\n1) [C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Ring1][S]\n2) [C][C][C][Branch1][C][Cl][C][C][Branch1][C][Cl][C][Branch1][C][Cl][C][Branch1][C][Cl][C][Branch1][C][Cl][C][Branch1][C][C][Cl]\n3) [O][=P][Branch1][C][O][Branch1][C][O][C][N][Branch1][#Branch2][C][P][=Branch1][C][=O][Branch1][C][O][O][C][P][=Branch1][C][=O][Branch1][C][O][O]\n4) [C][C][C][C][S]\nAnswer: 1, 2, 3, 4"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the Antioxidant response element assay?\nConstraint: You must select none, one or more options from a, b, or c without using any additional words.\nOptions:\n(a) Ccccsc5C=CCCNCCC[C@@H]C=O)O))C6)))))))))csccc5C\n(b) CCCCOC=O)cccccc6C=O)OCCCC))CCCC\n(c) CccccO)cO)c6\nAnswer: a, b"}", "/scratch/micpie/export/sr_are_tox21/valid_0-12.jsonl": "{"text":"User: I want to generate a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be toxic in the SR-ARE assay.\nAssistant: Got it, this canonical SMILES is not toxic in the SR-ARE assay: CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21"} {"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should not be toxic in the SR-ARE assay.\nAssistant: Ok, this SELFIES is not toxic in the SR-ARE assay: [C][C][C][=C][S][C][=Ring1][Branch1][C][=Branch2][Ring1][Ring1][=C][C][C][N][C][C][C][C@@H1][Branch1][=Branch1][C][=Branch1][C][=O][O][C][Ring1][=Branch2][C][S][C][=C][C][=Ring1][Branch1][C]"}", "/scratch/micpie/export/sr_are_tox21/train_0-2.jsonl": "{"text":"Based on the canonical SMILES CCOc1ccc2nc(S(N)(=O)=O)sc2c1, the molecule has SR-Antioxidant response element toxicity characteristics."} {"text":"Based on the canonical SMILES representation CC(=O)OCCC1CCCCC1, the molecule has no SR-Antioxidant response element toxicity features."}", "/scratch/micpie/export/sr_are_tox21/test_0-11.jsonl": "{"text":"User: I'm searching for the DeepSMILES of a molecule that is not toxic in the SR-Antioxidant response element assay?\nAssistant: This is a molecule that is not toxic in the SR-Antioxidant response element assay: O=S=O)Cl)cccccc6"} {"text":"User: I'm looking for the InChI of a molecule that is not toxic in the Antioxidant response element assay?\nAssistant: This is a molecule that is not toxic in the Antioxidant response element assay: InChI=1S\/C7H7NO2\/c8-6-3-1-5(2-4-6)7(9)10\/h1-4H,8H2,(H,9,10)"}", "/scratch/micpie/export/sr_are_tox21/train_0-7.jsonl": "{"text":"Task: Please create a molecule SELFIES based on the text description below.\nDescription: A molecule that is toxic in the Antioxidant response element assay.\nResult: [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N]"} {"text":"Task: Please give me a InChI based on the text description.\nDescription: A molecule that is toxic in the SR-Antioxidant response element assay.\nResult: InChI=1S\/C10H18O2\/c1-9(11)12-8-7-10-5-3-2-4-6-10\/h10H,2-8H2,1H3"}", "/scratch/micpie/export/sr_are_tox21/train_0-11.jsonl": "{"text":"User: I'm looking for the DeepSMILES of a molecule that is toxic in the Antioxidant response element 
assay?\nAssistant: This is a molecule that is toxic in the Antioxidant response element assay: CCOccccncSN)=O)=O))sc5c9"} {"text":"User: I'm searching for the SMILES of a molecule that is not toxic in the SR-Antioxidant response element assay?\nAssistant: This is a molecule that is not toxic in the SR-Antioxidant response element assay: CC(=O)OCCC1CCCCC1"}", "/scratch/micpie/export/sr_are_tox21/train_0-1.jsonl": "{"text":"The molecule with the canonical SMILES representation of CCOc1ccc2nc(S(N)(=O)=O)sc2c1 is showing activity in the SR-ARE toxicity assay."} {"text":"The molecule with the DeepSMILES representation of CC=O)OCCCCCCCC6 is not showing activity in the SR-ARE toxicity assay."}", "/scratch/micpie/export/sr_are_tox21/train_0-13.jsonl": "{"text":"User: I want to create a SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should be toxic in the Antioxidant response element assay.\nAssistant: Ok, this SMILES is toxic in the Antioxidant response element assay: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} {"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the Antioxidant response element assay.\nAssistant: Got it, this canonical SMILES is not toxic in the Antioxidant response element assay: CC(=O)OCCC1CCCCC1"}", "/scratch/micpie/export/sr_are_tox21/train_0-4.jsonl": "{"text":"The SELFIES [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N] is toxic in the SR-Antioxidant response element assay."} {"text":"The molecule SMILES CC(=O)OCCC1CCCCC1 is not toxic in the SR-Antioxidant response element assay."}", "/scratch/micpie/export/sr_are_tox21/test_0-7.jsonl": "{"text":"Task: Please create a molecule DeepSMILES based on the description.\nDescription: A molecule that is toxic in the Antioxidant response element assay.\nResult: O=S=O)Cl)cccccc6"} {"text":"Task: Please generate a SELFIES based on the description.\nDescription: A molecule that is toxic in the Antioxidant response element assay.\nResult: [N][C][=C][C][=C][Branch1][=Branch1][C][=Branch1][C][=O][O][C][=C][Ring1][=Branch2]"}", "/scratch/micpie/export/sr_are_tox21/train_0-9.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13) toxic in the Antioxidant response element assay?\nAssistant: Yes, it is toxic in the Antioxidant response element assay."} {"text":"User: Is the molecule with the SMILES CC(=O)OCCC1CCCCC1 toxic in the Antioxidant response element assay?\nAssistant: No, it is not toxic in the Antioxidant response element assay."}", "/scratch/micpie/export/sr_are_tox21/valid_0-3.jsonl": "{"text":"The DeepSMILES CNC)CCCNcccccc6CCccccCl)cc6%15 is from a molecule that is not identified as toxic in the Antioxidant response element assay."} {"text":"The SMILES Cc1ccsc1C(=CCCN1CCC[C@@H](C(=O)O)C1)c1sccc1C is from a molecule that is not identified as toxic in the Antioxidant response element assay."}", 
"/scratch/micpie/export/sr_are_tox21/test_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the canonical SMILES O=S(=O)(Cl)c1ccccc1 is toxic in the SR-ARE assay?\nAssistant: No, this molecule is not toxic in the SR-ARE assay."} {"text":"User: Can you estimate if the molecule with the SMILES Nc1ccc(C(=O)O)cc1 is toxic in the Antioxidant response element assay?\nAssistant: No, this molecule is not toxic in the Antioxidant response element assay."}", "/scratch/micpie/export/sr_are_tox21/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES O=S(=O)(Cl)c1ccccc1 toxic in the Antioxidant response element assay?\nConstraint: Even if you are not sure, you must pick either A or B without using any other words.\nOptions:\nA: True\nB: False\nAnswer: B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES NccccC=O)O))cc6 toxic in the SR-Antioxidant response element assay?\nConstraint: Even if you are not sure, you must pick either a or b without using any other words.\nOptions:\na) True\nb) False\nAnswer: b"}", "/scratch/micpie/export/sr_are_tox21/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES representation of CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21 toxic in the SR-ARE assay?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any other words.\nOptions:\n1. False\n2. 
True\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI representation of InChI=1S\/C20H25NO2S2\/c1-14-7-11-24-18(14)17(19-15(2)8-12-25-19)6-4-10-21-9-3-5-16(13-21)20(22)23\/h6-8,11-12,16H,3-5,9-10,13H2,1-2H3,(H,22,23)\/t16-\/m1\/s1 toxic in the SR-ARE assay?\nConstraint: Even if you are uncertain, you must pick either a or b without using any other words.\nOptions:\na: True\nb: False\nAnswer: b"}", "/scratch/micpie/export/sr_are_tox21/test_0-4.jsonl": "{"text":"The SMILES O=S(=O)(Cl)c1ccccc1 is not toxic in the SR-Antioxidant response element assay."} {"text":"The molecule SMILES Nc1ccc(C(=O)O)cc1 is not toxic in the Antioxidant response element assay."}", "/scratch/micpie/export/sr_are_tox21/test_0-12.jsonl": "{"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be toxic in the SR-Antioxidant response element assay.\nAssistant: Got it, here you go, this DeepSMILES is not toxic in the SR-Antioxidant response element assay: O=S=O)Cl)cccccc6"} {"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be toxic in the SR-ARE assay.\nAssistant: Ok, here you go, this InChI is not toxic in the SR-ARE assay: InChI=1S\/C7H7NO2\/c8-6-3-1-5(2-4-6)7(9)10\/h1-4H,8H2,(H,9,10)"}", "/scratch/micpie/export/mol2svg/valid_0-0.jsonl": "{"text":"User: Draw a structure of a molecule.\nDescription: The molecule can be represented with the SMILES string OC1CN(C1)C(=O)c1cccc(c1O)I.\nConstraint: Return an SVG of size 217x217 pixels.\nAnswer: \n\n\n \n <\/rect>\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<\/svg>\n"} {"text":"Task: Convert an SVG to a SMILES amd SMARTS string. 
If there is a substructure highlighted in the SVG, return the SMARTS string. Otherwise, return None as the SMARTS string.\nDescription: The SVG is \n\n\n \n <\/rect>\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<\/svg>\n.\nConstraint: The output must be two comma-separated strings. The first string is the SMILES string. The second string is the SMARTS string.\nAnswer: CC[C@@](NC(=O)c1n[nH]cc1I)(CO)C, [#6H]=[#8]."}", "/scratch/micpie/export/mol2svg/test_0-0.jsonl": "{"text":"User: Draw a structure of a molecule.\nDescription: The molecule has the SMILES string OCC1CN(C1)C(=O)c1[nH]cc(c1)I.\nConstraint: Return an SVG of size 273x273 pixels.\nAnswer: \n\n\n \n <\/rect>\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<\/svg>\n"} {"text":"Task: Convert an SVG to a SMILES amd SMARTS string. If there is a substructure highlighted in the SVG, return the SMARTS string. Otherwise, return None as the SMARTS string.\nDescription: The SVG is \n\n\n \n <\/rect>\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<\/svg>\n.\nConstraint: The output must be two comma-separated strings. The first string is the SMILES string. The second string is the SMARTS string.\nAnswer: CNC(=O)C(C)NC1(C(C)(F)F)CC1, [#6]1-[#6]-[#6]-12-[#6]-[#6]-2."}", "/scratch/micpie/export/mol2svg/train_0-0.jsonl": "{"text":"Task: Draw a structure of a molecule.\nDescription: Draw the molecule with SELFIES string [C][N][C][=Branch1][C][=O][C@H1][Branch1][=N][N][N][=C][C][=Branch1][=Branch1][=C][C][Ring1][=Branch1][=O][I][C].\nConstraint: The output must be an SVG of size 285x285 pixels.\nAnswer: \n\n\n \n <\/rect>\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<\/svg>\n"} {"text":"Task: Convert an SVG to a SMILES amd SMARTS string. If there is a substructure highlighted in the SVG, return the SMARTS string. 
Otherwise, return None as the SMARTS string.\nDescription: The SVG is \n\n\n \n <\/rect>\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<\/svg>\n.\nConstraint: The output must be two comma-separated strings. The first string is the SMILES string. The second string is the SMARTS string.\nAnswer: COC(CSCC(Br)=C(C)C)C1CC1, [#6]1-[#6]-[#6]-[#6]-[#6]-[#6]-1."}", "/scratch/micpie/export/bio_ner_57/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: GACATGACCGACGCSATYCT TCGTCGTCRTTGCCCCAYTT 774++ Winderl et al., 2007 Primer set 1 BssA327f BssA2004r bssA CGAATTCATCNTCGGCTACC GTCGTCRTTGCCCCAYTTNGG 1667 Washer and Edwards, 2007 Primer set 2 MBssA1516f BssA2524r bssA AGACCCAGAAGACCAGGTC ATGATSGTGTTYTGSCCRTAGGT 1008 Primer set 3 BssA327f MBssA2446r bssA CGAATTCATCNTCGGCTACC ATGCTTTTCAGGCTCCCTCT 2119 Primer set 5 BssA1985f BssA2524r bssA CNAARTGGGGCAAYGACGA ATGATSGTGTTYTGSCCRTAGGT 539++ Primer set 1 1294\/1321f 1933\/1981r assA, bssA TTTGAGTGCATCCGCCAYGGICT TCGTCRTTGCCCCATTTIGGIGC assA: 661 bssA: 682++ Callaghan et al., 2010 Primer set 2 1213f 1987r bssA GACATGACCGAYGCCATYCT TCRTCGTCRTTGCCCCAYTT 793++ Primer set 3 1294f (a) 1936r assA TTSGARTGCATCCGNCACGGN TCRTCATTNCCCCAYTTNGG 661 Primer set 4 1294f (a) 2457r assA TTSGARTGCATCCGNCACGGN TTGTCCTGNGTYTTGCGG 1180+ Primer set 5 1294f (b) 1936r assA TTYGAGTGYATNCGCCASGGC TCRTCATTNCCCCAYTTNGG 661 Primer set 6 1294f (b) 2457r assA TTYGAGTGYATNCGCCASGG TTGTCCTGNGTYTTGCGG 1180+ Primer set 7 1432f 1936r assA CCNACCACNAAGCAYGG TCRTCATTNCCCCAYTTNGG 523+ Primer set 8 1432f 2457r assA CCNACCACNAAGCAYGG TTGTCCTGNGTYTTGCGG 1042 Primer set 9 1432f 1933\/1981r assA, bssA CCNACCACNAAGCAYGG TCGTCRTTGCCCCATTTIGGIGC 523 FAE-B 7768f 8543r bssA s. 
l..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: GACATGACCGACGCSATYCT,0,20,primer\nTCGTCGTCRTTGCCCCAYTT,21,41,primer\nBssA327f,84,92,primer\nBssA2004r,93,102,primer\nCGAATTCATCNTCGGCTACC,108,128,primer\nGTCGTCRTTGCCCCAYTTNGG,129,150,primer\nMBssA1516f,194,204,primer\nBssA2524r,205,214,primer\nAGACCCAGAAGACCAGGTC,220,239,primer\nATGATSGTGTTYTGSCCRTAGGT,240,263,primer\nBssA327f,282,290,primer\nMBssA2446r,291,301,primer\nCGAATTCATCNTCGGCTACC,307,327,primer\nATGCTTTTCAGGCTCCCTCT,328,348,primer\nBssA1985f,367,376,primer\nBssA2524r,377,386,primer\nCNAARTGGGGCAAYGACGA,392,411,primer\nATGATSGTGTTYTGSCCRTAGGT,412,435,primer\n1294 \/ 1321f,457,469,primer\n1933 \/ 1981r,470,482,primer\nTTTGAGTGCATCCGCCAYGGICT,494,517,primer\nTCGTCRTTGCCCCATTTIGGIGC,518,541,primer\n1213f,602,607,primer\n1987r,608,613,primer\nGACATGACCGAYGCCATYCT,619,639,primer\nTCRTCGTCRTTGCCCCAYTT,640,660,primer\n1294f,682,687,primer\n1936r,693,698,primer\nTTSGARTGCATCCGNCACGGN,704,725,primer\nTCRTCATTNCCCCAYTTNGG,726,746,primer\n1294f,764,769,primer\n2457r,775,780,primer\nTTSGARTGCATCCGNCACGGN,786,807,primer\nTTGTCCTGNGTYTTGCGG,808,826,primer\n1294f,847,852,primer\n1936r,858,863,primer\nTTYGAGTGYATNCGCCASGGC,869,890,primer\nTCRTCATTNCCCCAYTTNGG,891,911,primer\n1294f,929,934,primer\n2457r,940,945,primer\nTTYGAGTGYATNCGCCASGG,951,971,primer\nTTGTCCTGNGTYTTGCGG,972,990,primer\n1432f,1011,1016,primer\n1936r,1017,1022,primer\nCCNACCACNAAGCAYGG,1028,1045,primer\nTCRTCATTNCCCCAYTTNGG,1046,1066,primer\n1432f,1086,1091,primer\n2457r,1092,1097,primer\nCCNACCACNAAGCAYGG,1103,1120,primer\nTTGTCCTGNGTYTTGCGG,1121,1139,primer\n1432f,1158,1163,primer\n1933 \/ 1981r,1164,1176,primer\nCCNACCACNAAGCAYGG,1188,1205,primer\nTCGTCRTTGCCCCATTTIGGIGC,1206,1229,primer\nFAE - B,1234,1241,primer\n7768f,1242,1247,primer\n8543r,1248,1253,primer"} {"text":"Task: Please carry out the named entity recognition task 
for the the text below.\nText: GACATGACCGACGCSATYCT TCGTCGTCRTTGCCCCAYTT 774++ Winderl et al., 2007 Primer set 1 BssA327f BssA2004r bssA CGAATTCATCNTCGGCTACC GTCGTCRTTGCCCCAYTTNGG 1667 Washer and Edwards, 2007 Primer set 2 MBssA1516f BssA2524r bssA AGACCCAGAAGACCAGGTC ATGATSGTGTTYTGSCCRTAGGT 1008 Primer set 3 BssA327f MBssA2446r bssA CGAATTCATCNTCGGCTACC ATGCTTTTCAGGCTCCCTCT 2119 Primer set 5 BssA1985f BssA2524r bssA CNAARTGGGGCAAYGACGA ATGATSGTGTTYTGSCCRTAGGT 539++ Primer set 1 1294\/1321f 1933\/1981r assA, bssA TTTGAGTGCATCCGCCAYGGICT TCGTCRTTGCCCCATTTIGGIGC assA: 661 bssA: 682++ Callaghan et al., 2010 Primer set 2 1213f 1987r bssA GACATGACCGAYGCCATYCT TCRTCGTCRTTGCCCCAYTT 793++ Primer set 3 1294f (a) 1936r assA TTSGARTGCATCCGNCACGGN TCRTCATTNCCCCAYTTNGG 661 Primer set 4 1294f (a) 2457r assA TTSGARTGCATCCGNCACGGN TTGTCCTGNGTYTTGCGG 1180+ Primer set 5 1294f (b) 1936r assA TTYGAGTGYATNCGCCASGGC TCRTCATTNCCCCAYTTNGG 661 Primer set 6 1294f (b) 2457r assA TTYGAGTGYATNCGCCASGG TTGTCCTGNGTYTTGCGG 1180+ Primer set 7 1432f 1936r assA CCNACCACNAAGCAYGG TCRTCATTNCCCCAYTTNGG 523+ Primer set 8 1432f 2457r assA CCNACCACNAAGCAYGG TTGTCCTGNGTYTTGCGG 1042 Primer set 9 1432f 1933\/1981r assA, bssA CCNACCACNAAGCAYGG TCGTCRTTGCCCCATTTIGGIGC 523 FAE-B 7768f 8543r bssA s. 
l..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: GACATGACCGACGCSATYCT,0,20,primer\nTCGTCGTCRTTGCCCCAYTT,21,41,primer\nBssA327f,84,92,primer\nBssA2004r,93,102,primer\nCGAATTCATCNTCGGCTACC,108,128,primer\nGTCGTCRTTGCCCCAYTTNGG,129,150,primer\nMBssA1516f,194,204,primer\nBssA2524r,205,214,primer\nAGACCCAGAAGACCAGGTC,220,239,primer\nATGATSGTGTTYTGSCCRTAGGT,240,263,primer\nBssA327f,282,290,primer\nMBssA2446r,291,301,primer\nCGAATTCATCNTCGGCTACC,307,327,primer\nATGCTTTTCAGGCTCCCTCT,328,348,primer\nBssA1985f,367,376,primer\nBssA2524r,377,386,primer\nCNAARTGGGGCAAYGACGA,392,411,primer\nATGATSGTGTTYTGSCCRTAGGT,412,435,primer\n1294 \/ 1321f,457,469,primer\n1933 \/ 1981r,470,482,primer\nTTTGAGTGCATCCGCCAYGGICT,494,517,primer\nTCGTCRTTGCCCCATTTIGGIGC,518,541,primer\n1213f,602,607,primer\n1987r,608,613,primer\nGACATGACCGAYGCCATYCT,619,639,primer\nTCRTCGTCRTTGCCCCAYTT,640,660,primer\n1294f,682,687,primer\n1936r,693,698,primer\nTTSGARTGCATCCGNCACGGN,704,725,primer\nTCRTCATTNCCCCAYTTNGG,726,746,primer\n1294f,764,769,primer\n2457r,775,780,primer\nTTSGARTGCATCCGNCACGGN,786,807,primer\nTTGTCCTGNGTYTTGCGG,808,826,primer\n1294f,847,852,primer\n1936r,858,863,primer\nTTYGAGTGYATNCGCCASGGC,869,890,primer\nTCRTCATTNCCCCAYTTNGG,891,911,primer\n1294f,929,934,primer\n2457r,940,945,primer\nTTYGAGTGYATNCGCCASGG,951,971,primer\nTTGTCCTGNGTYTTGCGG,972,990,primer\n1432f,1011,1016,primer\n1936r,1017,1022,primer\nCCNACCACNAAGCAYGG,1028,1045,primer\nTCRTCATTNCCCCAYTTNGG,1046,1066,primer\n1432f,1086,1091,primer\n2457r,1092,1097,primer\nCCNACCACNAAGCAYGG,1103,1120,primer\nTTGTCCTGNGTYTTGCGG,1121,1139,primer\n1432f,1158,1163,primer\n1933 \/ 1981r,1164,1176,primer\nCCNACCACNAAGCAYGG,1188,1205,primer\nTCGTCRTTGCCCCATTTIGGIGC,1206,1229,primer\nFAE - B,1234,1241,primer\n7768f,1242,1247,primer\n8543r,1248,1253,primer"}", "/scratch/micpie/export/clintox/test_0-10.jsonl": "{"text":"User: 
I'm looking for the canonical SMILES of a molecule that is not toxic?\nAssistant: This is a molecule that is not toxic: [H]\/[NH+]=C(\\N)C1=CC(=O)\/C(=C\\C=c2ccc(=C(N)[NH3+])cc2)C=C1"} {"text":"User: I'm looking for the SELFIES of a molecule that is not clinically toxic?\nAssistant: This is a molecule that is not clinically toxic: [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][Branch1][C][C][NH3+1]"}", "/scratch/micpie/export/clintox/valid_0-8.jsonl": "{"text":"User: Is the molecule with the canonical SMILES C#CCC(Cc1cnc2nc(N)nc(N)c2n1)c1ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=O)[O-])cc1 clinically toxic?\nAssistant: No, it is not clinically toxic."} {"text":"User: Is the molecule with the SELFIES [C][S][=Branch1][C][=O][=Branch1][C][=O][O].[N][C][=N][C][=N][C][=C][Ring1][=Branch1][N][=C][N][Ring1][Branch1][C][C][O][C][P@@][=Branch1][C][=O][O][C][C][C@@H1][Branch1][N][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1][O][Ring1][=C] toxic?\nAssistant: Yes, it is toxic."}", "/scratch/micpie/export/clintox/train_0-8.jsonl": "{"text":"User: Is the molecule with the canonical SMILES *C(=O)[C@H](CCCCNC(=O)OCCOC)NC(=O)OCCOC clinically toxic?\nAssistant: No, it is not clinically toxic."} {"text":"User: Is the molecule with the InChI InChI=1S\/C28H36N4O2S\/c33-27-24-18-9-10-19(15-18)25(24)28(34)32(27)17-21-6-2-1-5-20(21)16-30-11-13-31(14-12-30)26-22-7-3-4-8-23(22)35-29-26\/h3-4,7-8,18-21,24-25H,1-2,5-6,9-17H2\/p+1\/t18-,19+,20-,21-,24+,25-\/m0\/s1 clinically toxic?\nAssistant: No, it is not clinically toxic."}", "/scratch/micpie/export/clintox/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is clinically toxic.\ncanonical SMILES: [H]\/[NH+]=C(\\N)C1=CC(=O)\/C(=C\\C=c2ccc(=C(N)[NH3+])cc2)C=C1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not clinically toxic."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A 
molecule that is toxic.\nMolecule canonical SMILES: Cc1cccc(C)c1NC(=O)C(C)[NH3+]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic."}", "/scratch/micpie/export/clintox/valid_0-9.jsonl": "{"text":"User: Can you generate the InChI of a molecule that is not toxic?\nAssistant: Yes, I'm happy to help, here you go: InChI=1S\/C23H23N7O5\/c1-2-3-14(10-15-11-26-20-18(27-15)19(24)29-23(25)30-20)12-4-6-13(7-5-12)21(33)28-16(22(34)35)8-9-17(31)32\/h1,4-7,11,14,16H,3,8-10H2,(H,28,33)(H,31,32)(H,34,35)(H4,24,25,26,29,30)\/p-2\/t14?,16-\/m0\/s1"} {"text":"User: Can you create the InChI of a molecule that is clinically toxic?\nAssistant: Yes, I'm happy to help, here you go: InChI=1S\/C17H19ClN5O4P.CH4O3S\/c18-13-3-1-2-12(8-13)14-4-6-26-28(24,27-14)11-25-7-5-23-10-22-15-16(19)20-9-21-17(15)23;1-5(2,3)4\/h1-3,8-10,14H,4-7,11H2,(H2,19,20,21);1H3,(H,2,3,4)\/t14-,28+;\/m0.\/s1"}", "/scratch/micpie/export/clintox/test_0-1.jsonl": "{"text":"Based on the canonical SMILES representation [H]\/[NH+]=C(\\N)C1=CC(=O)\/C(=C\\C=c2ccc(=C(N)[NH3+])cc2)C=C1, the molecule has no toxic traits."} {"text":"Based on the InChI representation InChI=1S\/C11H16N2O\/c1-7-5-4-6-8(2)10(7)13-11(14)9(3)12\/h4-6,9H,12H2,1-3H3,(H,13,14)\/p+1, the molecule has no toxic features."}", "/scratch/micpie/export/clintox/valid_0-0.jsonl": "{"text":"The molecule with the SMILES C#CCC(Cc1cnc2nc(N)nc(N)c2n1)c1ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=O)[O-])cc1 displays no toxicity."} {"text":"The molecule with the DeepSMILES representation of CS=O)=O)O.Ncncncc6ncn5CCOC[P@@]=O)OCC[C@@H]cccccCl)c6))))))O6 displays clinical toxicity."}", "/scratch/micpie/export/clintox/test_0-2.jsonl": "{"text":"The InChI InChI=1S\/C16H16N4O\/c17-15(18)12-5-2-10(3-6-12)1-4-11-7-8-13(16(19)20)9-14(11)21\/h1-9H,17-18H2,(H3,19,20)\/p+2\/b11-4- represents a molecule that is not identified as toxic."} {"text":"The DeepSMILES CcccccC)c6NC=O)CC)[NH3+] is from a molecule that is not identified as toxic."}", 
"/scratch/micpie/export/clintox/valid_0-10.jsonl": "{"text":"User: I'm searching for the SELFIES of a molecule that is not clinically toxic?\nAssistant: This is a molecule that is not clinically toxic: [C][#C][C][C][Branch2][Ring1][=Branch1][C][C][=C][N][=C][N][=C][Branch1][C][N][N][=C][Branch1][C][N][C][Ring1][Branch2][=N][Ring1][N][C][=C][C][=C][Branch2][Ring1][Branch1][C][=Branch1][C][=O][N][C@@H1][Branch1][Branch2][C][C][C][=Branch1][C][=O][O-1][C][=Branch1][C][=O][O-1][C][=C][Ring2][Ring1][C]"} {"text":"User: I'm searching for the canonical SMILES of a molecule that is clinically toxic?\nAssistant: This is a molecule that is clinically toxic: CS(=O)(=O)O.Nc1ncnc2c1ncn2CCOC[P@@]1(=O)OCC[C@@H](c2cccc(Cl)c2)O1"}", "/scratch/micpie/export/clintox/train_0-6.jsonl": "{"text":"Task: Please give me a molecule SMILES based on the description below.\nDescription: A molecule that is toxic.\nResult: *C(=O)[C@H](CCCCNC(=O)OCCOC)NC(=O)OCCOC"} {"text":"Task: Please give me a canonical SMILES based on the description below.\nDescription: A molecule that is clinically toxic.\nResult: O=C1[C@@H]2[C@H]3CC[C@H](C3)[C@@H]2C(=O)N1C[C@@H]1CCCC[C@H]1C[NH+]1CCN(c2nsc3ccccc23)CC1"}", "/scratch/micpie/export/clintox/valid_0-6.jsonl": "{"text":"Task: Please create a molecule SMILES based on the text description below.\nDescription: A molecule that is toxic.\nResult: C#CCC(Cc1cnc2nc(N)nc(N)c2n1)c1ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=O)[O-])cc1"} {"text":"Task: Please generate a molecule canonical SMILES based on the text description below.\nDescription: A molecule that is toxic.\nResult: CS(=O)(=O)O.Nc1ncnc2c1ncn2CCOC[P@@]1(=O)OCC[C@@H](c2cccc(Cl)c2)O1"}", "/scratch/micpie/export/clintox/test_0-9.jsonl": "{"text":"User: Can you generate the InChI of a molecule that is not toxic?\nAssistant: Yes, I'm happy to help, here you go: InChI=1S\/C16H16N4O\/c17-15(18)12-5-2-10(3-6-12)1-4-11-7-8-13(16(19)20)9-14(11)21\/h1-9H,17-18H2,(H3,19,20)\/p+2\/b11-4-"} {"text":"User: Can you give me the SELFIES of 
a molecule that is not clinically toxic?\nAssistant: Of course, here you go: [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][Branch1][C][C][NH3+1]"}", "/scratch/micpie/export/clintox/test_0-0.jsonl": "{"text":"The molecule with the SELFIES [H][\/NH1+1][=C][Branch1][C][\\N][C][=C][C][=Branch1][C][=O][\/C][=Branch2][Ring1][C][=C][\\C][=C][C][=C][C][=Branch1][=Branch1][=C][Branch1][C][N][NH3+1][C][=C][Ring1][=Branch2][C][=C][Ring2][Ring1][C] displays no toxicity."} {"text":"The molecule with the canonical SMILES Cc1cccc(C)c1NC(=O)C(C)[NH3+] displays no clinical toxicity."}", "/scratch/micpie/export/clintox/valid_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the canonical SMILES C#CCC(Cc1cnc2nc(N)nc(N)c2n1)c1ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=O)[O-])cc1 is clinically toxic?\nAssistant: No, this molecule is not clinically toxic."} {"text":"User: Can you derive if the molecule with the SELFIES [C][S][=Branch1][C][=O][=Branch1][C][=O][O].[N][C][=N][C][=N][C][=C][Ring1][=Branch1][N][=C][N][Ring1][Branch1][C][C][O][C][P@@][=Branch1][C][=O][O][C][C][C@@H1][Branch1][N][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1][O][Ring1][=C] is clinically toxic?\nAssistant: Yes, this molecule is clinically toxic."}", "/scratch/micpie/export/clintox/test_0-3.jsonl": "{"text":"The molecule InChI InChI=1S\/C16H16N4O\/c17-15(18)12-5-2-10(3-6-12)1-4-11-7-8-13(16(19)20)9-14(11)21\/h1-9H,17-18H2,(H3,19,20)\/p+2\/b11-4- is not toxic."} {"text":"The molecule SMILES Cc1cccc(C)c1NC(=O)C(C)[NH3+] is not toxic."}", "/scratch/micpie/export/clintox/valid_0-11.jsonl": "{"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should not be clinically toxic.\nAssistant: Ok, here you go, this SMILES is not clinically toxic: C#CCC(Cc1cnc2nc(N)nc(N)c2n1)c1ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=O)[O-])cc1"} {"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should be toxic.\nAssistant: Got it, this SELFIES is toxic: [C][S][=Branch1][C][=O][=Branch1][C][=O][O].[N][C][=N][C][=N][C][=C][Ring1][=Branch1][N][=C][N][Ring1][Branch1][C][C][O][C][P@@][=Branch1][C][=O][O][C][C][C@@H1][Branch1][N][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1][O][Ring1][=C]"}", "/scratch/micpie/export/clintox/train_0-0.jsonl": "{"text":"The molecule with the canonical SMILES *C(=O)[C@H](CCCCNC(=O)OCCOC)NC(=O)OCCOC exhibits no clinical toxicity."} {"text":"The molecule with the InChI representation of InChI=1S\/C28H36N4O2S\/c33-27-24-18-9-10-19(15-18)25(24)28(34)32(27)17-21-6-2-1-5-20(21)16-30-11-13-31(14-12-30)26-22-7-3-4-8-23(22)35-29-26\/h3-4,7-8,18-21,24-25H,1-2,5-6,9-17H2\/p+1\/t18-,19+,20-,21-,24+,25-\/m0\/s1 shows no clinical toxicity."}", "/scratch/micpie/export/clintox/test_0-6.jsonl": "{"text":"Task: Please create a molecule InChI based on the description below.\nDescription: A molecule that is clinically toxic.\nResult: InChI=1S\/C16H16N4O\/c17-15(18)12-5-2-10(3-6-12)1-4-11-7-8-13(16(19)20)9-14(11)21\/h1-9H,17-18H2,(H3,19,20)\/p+2\/b11-4-"} {"text":"Task: Please create a molecule SELFIES based on the text description.\nDescription: A molecule that is toxic.\nResult: [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][Branch1][C][C][NH3+1]"}", "/scratch/micpie/export/clintox/train_0-10.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that is not toxic?\nAssistant: This is a molecule that is not toxic: *C(=O)[C@H](CCCCNC(=O)OCCOC)NC(=O)OCCOC"} {"text":"User: I'm searching for the canonical SMILES of a molecule that is 
not toxic?\nAssistant: This is a molecule that is not toxic: O=C1[C@@H]2[C@H]3CC[C@H](C3)[C@@H]2C(=O)N1C[C@@H]1CCCC[C@H]1C[NH+]1CCN(c2nsc3ccccc23)CC1"}", "/scratch/micpie/export/clintox/train_0-3.jsonl": "{"text":"The molecule canonical SMILES *C(=O)[C@H](CCCCNC(=O)OCCOC)NC(=O)OCCOC is not toxic."} {"text":"The canonical SMILES O=C1[C@@H]2[C@H]3CC[C@H](C3)[C@@H]2C(=O)N1C[C@@H]1CCCC[C@H]1C[NH+]1CCN(c2nsc3ccccc23)CC1 is not clinically toxic."}", "/scratch/micpie/export/clintox/train_0-12.jsonl": "{"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be toxic.\nAssistant: Got it, this SMILES is not toxic: *C(=O)[C@H](CCCCNC(=O)OCCOC)NC(=O)OCCOC"} {"text":"User: I want to generate a InChI.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be clinically toxic.\nAssistant: Got it, this InChI is not clinically toxic: InChI=1S\/C28H36N4O2S\/c33-27-24-18-9-10-19(15-18)25(24)28(34)32(27)17-21-6-2-1-5-20(21)16-30-11-13-31(14-12-30)26-22-7-3-4-8-23(22)35-29-26\/h3-4,7-8,18-21,24-25H,1-2,5-6,9-17H2\/p+1\/t18-,19+,20-,21-,24+,25-\/m0\/s1"}", "/scratch/micpie/export/clintox/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [H][\/NH1+1][=C][Branch1][C][\\N][C][=C][C][=Branch1][C][=O][\/C][=Branch2][Ring1][C][=C][\\C][=C][C][=C][C][=Branch1][=Branch1][=C][Branch1][C][N][NH3+1][C][=C][Ring1][=Branch2][C][=C][Ring2][Ring1][C] toxic?\nConstraint: Even if you are uncertain, you must pick either A or B without using any other words.\nOptions:\n[A] True\n[B] False\nAnswer: B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][Branch1][C][C][NH3+1] toxic?\nConstraint: Even if you are not 
sure, you must pick either a or b without using any additional words.\nOptions:\na. False\nb. True\nAnswer: a"}", "/scratch/micpie/export/clintox/valid_0-2.jsonl": "{"text":"The InChI InChI=1S\/C23H23N7O5\/c1-2-3-14(10-15-11-26-20-18(27-15)19(24)29-23(25)30-20)12-4-6-13(7-5-12)21(33)28-16(22(34)35)8-9-17(31)32\/h1,4-7,11,14,16H,3,8-10H2,(H,28,33)(H,31,32)(H,34,35)(H4,24,25,26,29,30)\/p-2\/t14?,16-\/m0\/s1 is from a molecule that is not identified as clinically toxic."} {"text":"The DeepSMILES CS=O)=O)O.Ncncncc6ncn5CCOC[P@@]=O)OCC[C@@H]cccccCl)c6))))))O6 represents a molecule that is identified as clinically toxic."}", "/scratch/micpie/export/clintox/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not clinically toxic?\nConstraint: You must select none, one or more options from a or b without using any additional words.\nOptions:\n[a] O=CN[C@H]CO))[C@H]O)cccc[N+]=O)[O-]))cc6)))))))))[C-]Cl)Cl\n[b] *C=O)[C@H]CCCCNC=O)OCCOC)))))))))))NC=O)OCCOC\nAnswer: a, b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not clinically toxic?\nConstraint: You must select none, one or more options from a or b without using any additional words.\nOptions:\na [NH3+][C@H][C@@H]CNcncccc6F)))c=O)cC=O)[O-]))cn6-ccccF)cc6F)))))))))))))))C[C@H]65\nb O=C[C@H][C@@H]CC[C@@H]C5)[C@H]6C=O)N9C[C@@H]CCCC[C@H]6C[NH+]CCNcnscccccc96)))))))))CC6\nAnswer: a, b"}", "/scratch/micpie/export/clintox/valid_0-1.jsonl": "{"text":"Based on the SMILES C#CCC(Cc1cnc2nc(N)nc(N)c2n1)c1ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=O)[O-])cc1, the molecule has no clinically toxic characteristics."} {"text":"Based on the SMILES CS(=O)(=O)O.Nc1ncnc2c1ncn2CCOC[P@@]1(=O)OCC[C@@H](c2cccc(Cl)c2)O1, the molecule has clinically toxic features."}", "/scratch/micpie/export/clintox/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of 
C#CCC(Cc1cnc2nc(N)nc(N)c2n1)c1ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=O)[O-])cc1 clinically toxic?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any additional words.\nOptions:\n1) False\n2) True\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI InChI=1S\/C17H19ClN5O4P.CH4O3S\/c18-13-3-1-2-12(8-13)14-4-6-26-28(24,27-14)11-25-7-5-23-10-22-15-16(19)20-9-21-17(15)23;1-5(2,3)4\/h1-3,8-10,14H,4-7,11H2,(H2,19,20,21);1H3,(H,2,3,4)\/t14-,28+;\/m0.\/s1 clinically toxic?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any additional words.\nOptions:\n1 True\n2 False\nAnswer: 1"}", "/scratch/micpie/export/clintox/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic.\nMolecule SMILES: C#CCC(Cc1cnc2nc(N)nc(N)c2n1)c1ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=O)[O-])cc1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic.\nMolecule canonical SMILES: CS(=O)(=O)O.Nc1ncnc2c1ncn2CCOC[P@@]1(=O)OCC[C@@H](c2cccc(Cl)c2)O1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is toxic."}", "/scratch/micpie/export/clintox/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is clinically toxic.\nMolecule canonical SMILES: C#CCC(Cc1cnc2nc(N)nc(N)c2n1)c1ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=O)[O-])cc1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic.\nSMILES: CS(=O)(=O)O.Nc1ncnc2c1ncn2CCOC[P@@]1(=O)OCC[C@@H](c2cccc(Cl)c2)O1\nConstraint: Even if you are not sure, you must pick either \"True\" or 
\"False\" without using any additional words.\nResult: True"}", "/scratch/micpie/export/clintox/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic.\ncanonical SMILES: *C(=O)[C@H](CCCCNC(=O)OCCOC)NC(=O)OCCOC\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is clinically toxic.\nSMILES: O=C1[C@H]2[C@@H]3CC[C@@H](C3)[C@H]2C(=O)N1C[C@@H]1CCCC[C@H]1C[NH+]1CCN(c2nsc3ccccc23)CC1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not clinically toxic."}", "/scratch/micpie/export/clintox/valid_0-12.jsonl": "{"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should not be toxic.\nAssistant: Got it, this SELFIES is not toxic: [C][#C][C][C][Branch2][Ring1][=Branch1][C][C][=C][N][=C][N][=C][Branch1][C][N][N][=C][Branch1][C][N][C][Ring1][Branch2][=N][Ring1][N][C][=C][C][=C][Branch2][Ring1][Branch1][C][=Branch1][C][=O][N][C@@H1][Branch1][Branch2][C][C][C][=Branch1][C][=O][O-1][C][=Branch1][C][=O][O-1][C][=C][Ring2][Ring1][C]"} {"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should be toxic.\nAssistant: Ok, this InChI is toxic: InChI=1S\/C17H19ClN5O4P.CH4O3S\/c18-13-3-1-2-12(8-13)14-4-6-26-28(24,27-14)11-25-7-5-23-10-22-15-16(19)20-9-21-17(15)23;1-5(2,3)4\/h1-3,8-10,14H,4-7,11H2,(H2,19,20,21);1H3,(H,2,3,4)\/t14-,28+;\/m0.\/s1"}", "/scratch/micpie/export/clintox/train_0-2.jsonl": "{"text":"The DeepSMILES *C=O)[C@H]CCCCNC=O)OCCOC)))))))))))NC=O)OCCOC is from a molecule that is not identified as toxic."} {"text":"The canonical SMILES O=C1[C@@H]2[C@H]3CC[C@H](C3)[C@@H]2C(=O)N1C[C@@H]1CCCC[C@H]1C[NH+]1CCN(c2nsc3ccccc23)CC1 is from a molecule that is not identified as clinically toxic."}", "/scratch/micpie/export/clintox/test_0-11.jsonl": "{"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be toxic.\nAssistant: Ok, here you go, this InChI is not toxic: InChI=1S\/C16H16N4O\/c17-15(18)12-5-2-10(3-6-12)1-4-11-7-8-13(16(19)20)9-14(11)21\/h1-9H,17-18H2,(H3,19,20)\/p+2\/b11-4-"} {"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should not be toxic.\nAssistant: Got it, this DeepSMILES is not toxic: CcccccC)c6NC=O)CC)[NH3+]"}", "/scratch/micpie/export/clintox/train_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the SMILES *C(=O)[C@H](CCCCNC(=O)OCCOC)NC(=O)OCCOC is toxic?\nAssistant: No, this molecule is not toxic."} {"text":"User: Can you estimate if the molecule with the SELFIES [O][=C][C@H1][C@@H1][C][C][C@@H1][Branch1][Ring2][C][Ring1][Branch1][C@H1][Ring1][#Branch1][C][=Branch1][C][=O][N][Ring1][O][C][C@@H1][C][C][C][C][C@H1][Ring1][=Branch1][C][NH1+1][C][C][N][Branch1][=C][C][=N][S][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][C][Ring1][#C] is toxic?\nAssistant: No, this molecule is not toxic."}", "/scratch/micpie/export/clintox/train_0-11.jsonl": "{"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be clinically toxic.\nAssistant: Ok, here you go, this canonical SMILES is not clinically toxic: *C(=O)[C@H](CCCCNC(=O)OCCOC)NC(=O)OCCOC"} {"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should not be clinically toxic.\nAssistant: Got it, this SMILES is not clinically toxic: O=C1[C@H]2[C@@H]3CC[C@@H](C3)[C@H]2C(=O)N1C[C@@H]1CCCC[C@H]1C[NH+]1CCN(c2nsc3ccccc23)CC1"}", "/scratch/micpie/export/clintox/train_0-1.jsonl": "{"text":"Based on the canonical SMILES representation *C(=O)[C@H](CCCCNC(=O)OCCOC)NC(=O)OCCOC, the molecule has no clinically toxic characteristics."} {"text":"Based on the SELFIES [O][=C][C@H1][C@@H1][C][C][C@@H1][Branch1][Ring2][C][Ring1][Branch1][C@H1][Ring1][#Branch1][C][=Branch1][C][=O][N][Ring1][O][C][C@@H1][C][C][C][C][C@H1][Ring1][=Branch1][C][NH1+1][C][C][N][Branch1][=C][C][=N][S][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][C][Ring1][#C], the molecule has no toxic characteristics."}", "/scratch/micpie/export/clintox/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES representation of *C(=O)[C@H](CCCCNC(=O)OCCOC)NC(=O)OCCOC clinically toxic?\nConstraint: Even if you are not sure, you must pick either a or b without using any additional words.\nOptions:\na. False\nb. 
True\nAnswer: a"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of O=C1[C@@H]2[C@H]3CC[C@H](C3)[C@@H]2C(=O)N1C[C@@H]1CCCC[C@H]1C[NH+]1CCN(c2nsc3ccccc23)CC1 toxic?\nConstraint: Even if you are uncertain, you must pick either A or B without using any additional words.\nOptions:\nA False\nB True\nAnswer: A"}", "/scratch/micpie/export/clintox/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic.\ncanonical SMILES: *C(=O)[C@H](CCCCNC(=O)OCCOC)NC(=O)OCCOC\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic.\nMolecule canonical SMILES: O=C1[C@@H]2[C@H]3CC[C@H](C3)[C@@H]2C(=O)N1C[C@@H]1CCCC[C@H]1C[NH+]1CCN(c2nsc3ccccc23)CC1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/clintox/test_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the SMILES [H]\/[NH+]=C(\\N)C1=CC(=O)\/C(=C\\C=c2ccc(=C(N)[NH3+])cc2)C=C1 is clinically toxic?\nAssistant: No, this molecule is not clinically toxic."} {"text":"User: Can you tell me if the molecule with the DeepSMILES CcccccC)c6NC=O)CC)[NH3+] is toxic?\nAssistant: No, this molecule is not toxic."}", "/scratch/micpie/export/clintox/train_0-9.jsonl": "{"text":"User: Can you create the canonical SMILES of a molecule that is not toxic?\nAssistant: Yes, here you go: *C(=O)[C@H](CCCCNC(=O)OCCOC)NC(=O)OCCOC"} {"text":"User: Can you create the InChI of a molecule that is not clinically toxic?\nAssistant: Yes, I'm happy to help, here you go: 
InChI=1S\/C28H36N4O2S\/c33-27-24-18-9-10-19(15-18)25(24)28(34)32(27)17-21-6-2-1-5-20(21)16-30-11-13-31(14-12-30)26-22-7-3-4-8-23(22)35-29-26\/h3-4,7-8,18-21,24-25H,1-2,5-6,9-17H2\/p+1\/t18-,19+,20-,21-,24+,25-\/m0\/s1"}", "/scratch/micpie/export/clintox/valid_0-3.jsonl": "{"text":"The SELFIES [C][#C][C][C][Branch2][Ring1][=Branch1][C][C][=C][N][=C][N][=C][Branch1][C][N][N][=C][Branch1][C][N][C][Ring1][Branch2][=N][Ring1][N][C][=C][C][=C][Branch2][Ring1][Branch1][C][=Branch1][C][=O][N][C@@H1][Branch1][Branch2][C][C][C][=Branch1][C][=O][O-1][C][=Branch1][C][=O][O-1][C][=C][Ring2][Ring1][C] is not clinically toxic."} {"text":"The DeepSMILES CS=O)=O)O.Ncncncc6ncn5CCOC[P@@]=O)OCC[C@@H]cccccCl)c6))))))O6 is toxic."}", "/scratch/micpie/export/clintox/test_0-8.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C16H16N4O\/c17-15(18)12-5-2-10(3-6-12)1-4-11-7-8-13(16(19)20)9-14(11)21\/h1-9H,17-18H2,(H3,19,20)\/p+2\/b11-4- clinically toxic?\nAssistant: No, it is not clinically toxic."} {"text":"User: Is the molecule with the canonical SMILES Cc1cccc(C)c1NC(=O)C(C)[NH3+] clinically toxic?\nAssistant: No, it is not clinically toxic."}", "/scratch/micpie/export/clintox/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic?\nConstraint: You must select none, one or more options from a, b, c, or d without using any other words.\nOptions:\na [O][=C][Branch1][C][O-1][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Branch2][Ring1][#Branch2][C][C][=C][Branch1][C][O][C][Branch1][=Branch1][C][=Branch1][C][=O][O-1][=C][C][=C][C][=C][C][=C][Ring1][=C][Ring1][=Branch1][=C][Ring2][Ring1][=Branch2][O]\nb [C][#C][C@][Branch1][C][O][C][C][C@H1][C@@H1][C][C][C][=C][C][=Branch1][C][=O][C][C][C@@H1][Ring1][#Branch1][C@H1][Ring1][O][C][C][C@@][Ring1][#C][Ring2][Ring1][Ring1][C][C]\nc 
[H][\/NH1+1][=C][Branch1][C][\\N][C][=C][C][=Branch1][C][=O][\/C][=Branch2][Ring1][C][=C][\\C][=C][C][=C][C][=Branch1][=Branch1][=C][Branch1][C][N][NH3+1][C][=C][Ring1][=Branch2][C][=C][Ring2][Ring1][C]\nd [C][C][C][C][O][C][=C][C][Branch1][#C][C][=Branch1][C][=O][O][C][C][NH1+1][Branch1][Ring1][C][C][C][C][=C][C][=C][Ring1][S][N]\nAnswer: a, b, c, d"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not clinically toxic?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any additional words.\nOptions:\n[A] InChI=1S\/C4H10N2\/c1-2-6-4-3-5-1\/h5-6H,1-4H2\/p+1\n[B] InChI=1S\/C26H32F2O7\/c1-13(29)33-12-20(32)26-21(34-22(2,3)35-26)10-15-16-9-18(27)17-8-14(30)6-7-23(17,4)25(16,28)19(31)11-24(15,26)5\/h6-8,15-16,18-19,21,31H,9-12H2,1-5H3\/t15-,16-,18-,19-,21+,23-,24-,25-,26+\/m0\/s1\n[C] InChI=1S\/C11H16N2O\/c1-7-5-4-6-8(2)10(7)13-11(14)9(3)12\/h4-6,9H,12H2,1-3H3,(H,13,14)\/p+1\n[D] InChI=1S\/C25H24FNO4\/c26-17-9-7-15(8-10-17)24-20-3-1-2-4-22(20)27-25(16-5-6-16)21(24)12-11-18(28)13-19(29)14-23(30)31\/h1-4,7-12,16,18-19,28-29H,5-6,13-14H2,(H,30,31)\/p-1\/b12-11+\/t18-,19-\/m1\/s1\n[E] InChI=1S\/C11H12N2O2S\/c1-7(13(15)11(12)14)10-6-8-4-2-3-5-9(8)16-10\/h2-7,15H,1H3,(H2,12,14)\nAnswer: A, B, C, D, E"}", "/scratch/micpie/export/clintox/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any other words.\nOptions:\n1. C#CCC(Cc1cnc2nc(N)nc(N)c2n1)c1ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=O)[O-])cc1\n2. O=S(=O)([O-])O[C@H]1[C@H](O)CO[C@@H](O[C@@H]2CO[C@@H](O)[C@H](OS(=O)(=O)[O-])[C@H]2OS(=O)(=O)[O-])[C@@H]1OS(=O)(=O)[O-]\n3. CC(C)[NH2+]CCCC1(C(N)=O)c2ccccc2-c2ccccc21\n4. 
CCCCCCCCCC(=O)O[C@H]1CC[C@H]2[C@@H]3CCC4=CC(=O)CC[C@@H]4[C@H]3CC[C@]12C\nAnswer: 1, 2, 3, 4"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are clinically toxic?\nConstraint: You must select none, one or more options from 1, 2, 3, 4, or 5 without using any additional words.\nOptions:\n1. Cc1cc(-c2ccc(N\/N=C3\/C(=O)c4c(N)cc(S(=O)(=O)[O-])cc4C=C3S(=O)(=O)[O-])c(C)c2)ccc1N\/N=C1\/C(=O)c2c(N)cc(S(=O)(=O)[O-])cc2C=C1S(=O)(=O)[O-]\n2. CCOc1cc(CC(=O)N[C@@H](CC(C)C)c2ccccc2N2CCCCC2)ccc1C(=O)[O-]\n3. CS(=O)(=O)O.Nc1ncnc2c1ncn2CCOC[P@@]1(=O)OCC[C@@H](c2cccc(Cl)c2)O1\n4. CC1C=CC=CC=CC=CC=CC=CC=CC(OC2O[C@H](C)[C@@H](O)[C@H]([NH3+])[C@@H]2O)CC2OC(O)(CC(O)CC(O)CC(O)CC(=O)CCCC(=O)CC(=O)OC1C(C)CC(C)C(O)CC(=O)c1ccc(N)cc1)CC(O)C2C(=O)[O-]\n5. CC(=O)N(CC(O)CO)c1c(I)c(C(=O)NCCO)c(I)c(C(=O)NCC(O)CO)c1I\nAnswer: 3"}", "/scratch/micpie/export/clintox/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic.\nMolecule DeepSMILES: [H]\/[NH+]=C\\N)C=CC=O)\/C=C\\C=cccc=CN)[NH3+]))cc6))))))))C=C6\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic.\nMolecule SELFIES: [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][Branch1][C][C][NH3+1]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/clintox/test_0-12.jsonl": "{"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should not be toxic.\nAssistant: Ok, this SMILES is not toxic: [H]\/[NH+]=C(\\N)C1=CC(=O)\/C(=C\\C=c2ccc(=C(N)[NH3+])cc2)C=C1"} {"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be clinically toxic.\nAssistant: Got it, this canonical SMILES is not clinically toxic: Cc1cccc(C)c1NC(=O)C(C)[NH3+]"}", "/scratch/micpie/export/bio_ner_31/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: The medical history (for CVD and diabetes mellitus), smoking habits and alcohol use was similar between all groups. Table 1Baseline characteristics for males and females of the low dose group (IGF-1 target level between − 2 and − 1 SDS) and the high dose group (IGF-1 target level between 1 and 2 SDS) MalesFemalesLow Dosen = 10High Dosen = 11Low Dosen = 6High Dosen = 5Age (years) 46.3 (11.2) 47.4 (8.9) 49.1 (10.7) 44.3 (10.9) CO GH deficiency (%) 8027 * 330Duration GH treatment (years) 18.6 (9.4) 13.4 (6.3) 8.9 (5.3) 4.8 (1.8) IGF-1 in SDS0.29 (0.62)− 0.38 (0.38)**− 0.08 (0.57)− 0.03 (0.42) BMI (kg\/m2) 25.6 (3.4) 28.6 (3.8) 33 (14.5) 29 (5.2) Cranial radiotherapy (%) 101800Pituitary surgery (%) 10541740Isolated GH deficiency (%) 2026330LH\/FSH deficiency (%) 80453340TSH deficiency (%) 70641780ACTH deficiency (%) 80643360ADH deficiency (%) 03600Diabetes mellitus (%) 100330CVD (%) 0185040Married (%) 60738360Education 10 – 13 years (%) 30181640Education > 13 years (%) 70828360Values are mean (SD) unless stated otherwiseCO childhood onset GH deficiency, GH growth hormone, IGF-1 in SDS insulin like growth factor-1 in standard deviation score, BMI body mass index, LH\/FSH luteinising hormone\/follicle stimulating hormone, TSH thyroid stimulating hormone, ACTH adrenocorticotropic hormone, ADH antidiuretic hormone, CVD cardiovascular disease * p < 0.05, ** p < 0.01 (low 
dose versus high dose).\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: CVD,26,29,Disease\/Disorder\ndiabetes mellitus,34,51,Disease\/Disorder\nalcohol,74,81,Chemical\/Drug\nIGF - 1,196,203,Gene\/Protein\nIGF - 1,268,275,Gene\/Protein\nGH deficiency,453,466,Disease\/Disorder\nGH,490,492,Gene\/Protein\nIGF - 1,566,573,Gene\/Protein\nGH deficiency,784,797,Disease\/Disorder\nFSH deficiency,814,828,Disease\/Disorder\ndeficiency,845,855,Disease\/Disorder\ndeficiency,873,883,Disease\/Disorder\ndeficiency,900,910,Disease\/Disorder\nmellitus,929,937,Disease\/Disorder\nGH deficiency,1116,1129,Disease\/Disorder\nGH,1131,1133,Gene\/Protein\ngrowth hormone,1134,1148,Gene\/Protein\nIGF - 1,1150,1157,Gene\/Protein\ninsulin like growth factor - 1,1165,1195,Gene\/Protein\nLH,1246,1248,Gene\/Protein\nFSH,1251,1254,Gene\/Protein\nluteinising hormone,1255,1274,Gene\/Protein\nfollicle stimulating hormone,1277,1305,Gene\/Protein\nTSH,1307,1310,Gene\/Protein\nthyroid stimulating hormone,1311,1338,Gene\/Protein\nACTH,1340,1344,Gene\/Protein\nadrenocorticotropic hormone,1345,1372,Gene\/Protein\nADH,1374,1377,Gene\/Protein\nantidiuretic hormone,1378,1398,Gene\/Protein\nCVD,1400,1403,Disease\/Disorder\ncardiovascular disease,1404,1426,Disease\/Disorder"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: The medical history (for CVD and diabetes mellitus), smoking habits and alcohol use was similar between all groups. 
Table 1Baseline characteristics for males and females of the low dose group (IGF-1 target level between − 2 and − 1 SDS) and the high dose group (IGF-1 target level between 1 and 2 SDS) MalesFemalesLow Dosen = 10High Dosen = 11Low Dosen = 6High Dosen = 5Age (years) 46.3 (11.2) 47.4 (8.9) 49.1 (10.7) 44.3 (10.9) CO GH deficiency (%) 8027 * 330Duration GH treatment (years) 18.6 (9.4) 13.4 (6.3) 8.9 (5.3) 4.8 (1.8) IGF-1 in SDS0.29 (0.62)− 0.38 (0.38)**− 0.08 (0.57)− 0.03 (0.42) BMI (kg\/m2) 25.6 (3.4) 28.6 (3.8) 33 (14.5) 29 (5.2) Cranial radiotherapy (%) 101800Pituitary surgery (%) 10541740Isolated GH deficiency (%) 2026330LH\/FSH deficiency (%) 80453340TSH deficiency (%) 70641780ACTH deficiency (%) 80643360ADH deficiency (%) 03600Diabetes mellitus (%) 100330CVD (%) 0185040Married (%) 60738360Education 10 – 13 years (%) 30181640Education > 13 years (%) 70828360Values are mean (SD) unless stated otherwiseCO childhood onset GH deficiency, GH growth hormone, IGF-1 in SDS insulin like growth factor-1 in standard deviation score, BMI body mass index, LH\/FSH luteinising hormone\/follicle stimulating hormone, TSH thyroid stimulating hormone, ACTH adrenocorticotropic hormone, ADH antidiuretic hormone, CVD cardiovascular disease * p < 0.05, ** p < 0.01 (low dose versus high dose).\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: CVD,26,29,Disease\/Disorder\ndiabetes mellitus,34,51,Disease\/Disorder\nalcohol,74,81,Chemical\/Drug\nIGF - 1,196,203,Gene\/Protein\nIGF - 1,268,275,Gene\/Protein\nGH deficiency,453,466,Disease\/Disorder\nGH,490,492,Gene\/Protein\nIGF - 1,566,573,Gene\/Protein\nGH deficiency,784,797,Disease\/Disorder\nFSH deficiency,814,828,Disease\/Disorder\ndeficiency,845,855,Disease\/Disorder\ndeficiency,873,883,Disease\/Disorder\ndeficiency,900,910,Disease\/Disorder\nmellitus,929,937,Disease\/Disorder\nGH 
deficiency,1116,1129,Disease\/Disorder\nGH,1131,1133,Gene\/Protein\ngrowth hormone,1134,1148,Gene\/Protein\nIGF - 1,1150,1157,Gene\/Protein\ninsulin like growth factor - 1,1165,1195,Gene\/Protein\nLH,1246,1248,Gene\/Protein\nFSH,1251,1254,Gene\/Protein\nluteinising hormone,1255,1274,Gene\/Protein\nfollicle stimulating hormone,1277,1305,Gene\/Protein\nTSH,1307,1310,Gene\/Protein\nthyroid stimulating hormone,1311,1338,Gene\/Protein\nACTH,1340,1344,Gene\/Protein\nadrenocorticotropic hormone,1345,1372,Gene\/Protein\nADH,1374,1377,Gene\/Protein\nantidiuretic hormone,1378,1398,Gene\/Protein\nCVD,1400,1403,Disease\/Disorder\ncardiovascular disease,1404,1426,Disease\/Disorder"}", "/scratch/micpie/export/odd_one_out/test_0-5.jsonl": "{"text":"User: I have the following SMILES strings: CC(C)C[C@H](NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)[C@@H](C)O)C(N)=O, C=C1CC[C@@]2(OC1)O[C@H]1C[C@H]3[C@@H]4CC[C@@H]5C[C@@H](O[C@@H]6O[C@H](CO)[C@H](O)[C@H](O)[C@H]6O[C@@H]6O[C@H](CO)[C@@H](O)[C@H](O)[C@H]6O)[C@@H](O)C[C@]5(C)[C@H]4CC[C@]3(C)[C@H]1[C@@H]2C, CC(C)C[C@H](NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](CO)NC(=O)[C@@H]1CCCN1C(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](N)CS)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CS)C(=O)O, and CCCCCCCCCCCC[C@@H](O)[C@H]1CC[C@H]([C@H](O)CCC(O)CCCC(O)CCC[C@@H](O)CC2=C[C@H](C)OC2=O)O1. Which is the odd one in this list?\nAssistant: What do you understand as \"odd one\"?\nUser: We measure similarity in terms of Tanimoto distance between Morgan fingerprints of radius two. 
The \"odd one\" is the molecule that is most different from the others.\nAssistant: In that case, I believe that C=C1CC[C@@]2(OC1)O[C@H]1C[C@H]3[C@@H]4CC[C@@H]5C[C@@H](O[C@@H]6O[C@H](CO)[C@H](O)[C@H](O)[C@H]6O[C@@H]6O[C@H](CO)[C@@H](O)[C@H](O)[C@H]6O)[C@@H](O)C[C@]5(C)[C@H]4CC[C@]3(C)[C@H]1[C@@H]2C is the \"odd one\" you're looking for."} {"text":"User: I have the following SMILES strings: CC(C)Cn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccccc4)cc3)cc2n(C)c1=O, CC(C)c1c(Sc2ccccc2)n(OCCCO)c(=O)[nH]c1=O, CN1CCC2(CC1)SSC1(CCN(C)CC1)S2, and COc1cc(S(=O)(=O)c2ccccc2F)cc2c3c(oc12)CCNC3. Which is the odd one in this list?\nAssistant: Interesting question, what do you understand as \"odd one\"?\nUser: Let's assume we measure similarity in terms of Tanimoto distance between Morgan fingerprints of radius two. The \"odd one\" is the molecule that is most different from the others.\nAssistant: Then, I propose that CN1CCC2(CC1)SSC1(CCN(C)CC1)S2 is the \"odd one\" you're looking for."}", "/scratch/micpie/export/odd_one_out/test_0-1.jsonl": "{"text":"Task: You are given a list of SMILES of molecules and are asked to find the pair chemical that is maximally different from each other.\nMolecules: CC(C)C[C@H](NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)[C@@H](C)O)C(N)=O, C=C1CC[C@@]2(OC1)O[C@H]1C[C@H]3[C@@H]4CC[C@@H]5C[C@@H](O[C@@H]6O[C@H](CO)[C@H](O)[C@H](O)[C@H]6O[C@@H]6O[C@H](CO)[C@@H](O)[C@H](O)[C@H]6O)[C@@H](O)C[C@]5(C)[C@H]4CC[C@]3(C)[C@H]1[C@@H]2C, CC(C)C[C@H](NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](CO)NC(=O)[C@@H]1CCCN1C(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](N)CS)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CS)C(=O)O, and CCCCCCCCCCCC[C@@H](O)[C@H]1CC[C@H]([C@H](O)CCC(O)CCCC(O)CCC[C@@H](O)CC2=C[C@H](C)OC2=O)O1\nConstraint: Answer by returning two SMILES strings separated by a comma. 
Similarity is measured in terms of Tanimoto distance between Morgan fingerprints of radius two.\nAnswer: CC(C)C[C@H](NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)[C@@H](C)O)C(N)=O, C=C1CC[C@@]2(OC1)O[C@H]1C[C@H]3[C@@H]4CC[C@@H]5C[C@@H](O[C@@H]6O[C@H](CO)[C@H](O)[C@H](O)[C@H]6O[C@@H]6O[C@H](CO)[C@@H](O)[C@H](O)[C@H]6O)[C@@H](O)C[C@]5(C)[C@H]4CC[C@]3(C)[C@H]1[C@@H]2C"} {"text":"Task: You are given a list of SMILES of chemical compounds and are asked to find the pair compound that is maximally different from each other.\nMolecules: CC(C)Cn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccccc4)cc3)cc2n(C)c1=O, CC(C)c1c(Sc2ccccc2)n(OCCCO)c(=O)[nH]c1=O, CN1CCC2(CC1)SSC1(CCN(C)CC1)S2, and COc1cc(S(=O)(=O)c2ccccc2F)cc2c3c(oc12)CCNC3\nConstraint: Answer by returning two SMILES strings separated by a comma. Similarity is measured in terms of Tanimoto distance between Morgan fingerprints of radius 2.\nAnswer: CC(C)c1c(Sc2ccccc2)n(OCCCO)c(=O)[nH]c1=O, CN1CCC2(CC1)SSC1(CCN(C)CC1)S2"}", "/scratch/micpie/export/odd_one_out/valid_0-0.jsonl": "{"text":"Task: You are given a list of SMILES of chemicals and are asked to find the chemical that is most different from the others.\nMolecules: CC(C)(C)OC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)Cc1ccc(O)cc1, CCCCCC[C@H](N)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N1CCC[C@H]1C(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)N[C@@H](Cc1ccccc1)C(N)=O, CC(C)C[C@H](NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)[C@@H](C)O)C(N)=O, and C=C1CC[C@@]2(OC1)O[C@H]1C[C@H]3[C@@H]4CC[C@@H]5C[C@@H](O[C@@H]6O[C@H](CO)[C@H](O)[C@H](O)[C@H]6O[C@@H]6O[C@H](CO)[C@@H](O)[C@H](O)[C@H]6O)[C@@H](O)C[C@]5(C)[C@H]4CC[C@]3(C)[C@H]1[C@@H]2C\nConstraint: Answer by returning the 
SMILES string. Similarity is measured in terms of Tanimoto distance between Morgan fingerprints of radius two.\nAnswer: C=C1CC[C@@]2(OC1)O[C@H]1C[C@H]3[C@@H]4CC[C@@H]5C[C@@H](O[C@@H]6O[C@H](CO)[C@H](O)[C@H](O)[C@H]6O[C@@H]6O[C@H](CO)[C@@H](O)[C@H](O)[C@H]6O)[C@@H](O)C[C@]5(C)[C@H]4CC[C@]3(C)[C@H]1[C@@H]2C"} {"text":"Task: You are given a sequence of SMILES of chemical compounds and are asked to find the compound that is maximally different from the others.\nMolecules: C=CCCC[C@@H]1CC[C@@H](CC)[C@@H]2CCCCN21, C[C@H](Nc1cc(C(N)=O)nc(-c2ccc(Oc3ccc(C#N)c(C(F)(F)F)c3)cc2)n1)C(N)=O, N#Cc1ccnc(N2CC[C@H](S(=O)(=O)c3ccc(N4CCN5CCC[C@H]5C4)cc3Cl)C2)n1, and CCC(=O)NCCOc1cccc2[nH]c(-c3n[nH]c4cc(C(F)(F)F)ccc34)cc12\nConstraint: Answer by returning the SMILES string. Similarity is measured in terms of Tanimoto distance between Morgan fingerprints of radius 2.\nAnswer: C=CCCC[C@@H]1CC[C@@H](CC)[C@@H]2CCCCN21"}", "/scratch/micpie/export/odd_one_out/test_0-2.jsonl": "{"text":"Task: You are given a sequence of SMILES of molecules and are asked to find the pair compound that is most similar to each other.\nMolecules: CC(C)C[C@H](NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)[C@@H](C)O)C(N)=O, C=C1CC[C@@]2(OC1)O[C@H]1C[C@H]3[C@@H]4CC[C@@H]5C[C@@H](O[C@@H]6O[C@H](CO)[C@H](O)[C@H](O)[C@H]6O[C@@H]6O[C@H](CO)[C@@H](O)[C@H](O)[C@H]6O)[C@@H](O)C[C@]5(C)[C@H]4CC[C@]3(C)[C@H]1[C@@H]2C, CC(C)C[C@H](NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](CO)NC(=O)[C@@H]1CCCN1C(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](N)CS)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CS)C(=O)O, and CCCCCCCCCCCC[C@@H](O)[C@H]1CC[C@H]([C@H](O)CCC(O)CCCC(O)CCC[C@@H](O)CC2=C[C@H](C)OC2=O)O1\nConstraint: Answer by returning two SMILES strings separated by a comma. 
Similarity is measured in terms of Tanimoto distance between Morgan fingerprints of radius 2.\nAnswer: CC(C)C[C@H](NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](CO)NC(=O)[C@@H]1CCCN1C(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](N)CS)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CS)C(=O)O, CC(C)C[C@H](NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)[C@@H](C)O)C(N)=O"} {"text":"Task: You are given a sequence of SMILES of chemicals and must find the pair compound that is most similar to each other.\nMolecules: CC(C)Cn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccccc4)cc3)cc2n(C)c1=O, CC(C)c1c(Sc2ccccc2)n(OCCCO)c(=O)[nH]c1=O, CN1CCC2(CC1)SSC1(CCN(C)CC1)S2, and COc1cc(S(=O)(=O)c2ccccc2F)cc2c3c(oc12)CCNC3\nConstraint: Answer by returning two SMILES strings separated by a comma. Similarity is measured in terms of Tanimoto distance between Morgan fingerprints of radius two.\nAnswer: CC(C)c1c(Sc2ccccc2)n(OCCCO)c(=O)[nH]c1=O, CC(C)Cn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccccc4)cc3)cc2n(C)c1=O"}", "/scratch/micpie/export/odd_one_out/test_0-0.jsonl": "{"text":"Task: You are given a sequence of SMILES of chemicals and are asked to find the molecule that is most different from the others.\nMolecules: CC(C)C[C@H](NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)[C@@H](C)O)C(N)=O, C=C1CC[C@@]2(OC1)O[C@H]1C[C@H]3[C@@H]4CC[C@@H]5C[C@@H](O[C@@H]6O[C@H](CO)[C@H](O)[C@H](O)[C@H]6O[C@@H]6O[C@H](CO)[C@@H](O)[C@H](O)[C@H]6O)[C@@H](O)C[C@]5(C)[C@H]4CC[C@]3(C)[C@H]1[C@@H]2C, CC(C)C[C@H](NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](CO)NC(=O)[C@@H]1CCCN1C(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](N)CS)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CS)C(=O)O, and 
CCCCCCCCCCCC[C@@H](O)[C@H]1CC[C@H]([C@H](O)CCC(O)CCCC(O)CCC[C@@H](O)CC2=C[C@H](C)OC2=O)O1\nConstraint: Answer by returning the SMILES string. Similarity is measured in terms of Tanimoto distance between Morgan fingerprints of radius two.\nAnswer: C=C1CC[C@@]2(OC1)O[C@H]1C[C@H]3[C@@H]4CC[C@@H]5C[C@@H](O[C@@H]6O[C@H](CO)[C@H](O)[C@H](O)[C@H]6O[C@@H]6O[C@H](CO)[C@@H](O)[C@H](O)[C@H]6O)[C@@H](O)C[C@]5(C)[C@H]4CC[C@]3(C)[C@H]1[C@@H]2C"} {"text":"Task: You are given a sequence of SMILES of chemicals and must find the compound that is maximally different from the others.\nMolecules: CC(C)Cn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccccc4)cc3)cc2n(C)c1=O, CC(C)c1c(Sc2ccccc2)n(OCCCO)c(=O)[nH]c1=O, CN1CCC2(CC1)SSC1(CCN(C)CC1)S2, and COc1cc(S(=O)(=O)c2ccccc2F)cc2c3c(oc12)CCNC3\nConstraint: Answer by returning the SMILES string. Similarity is measured in terms of Tanimoto distance between Morgan fingerprints of radius two.\nAnswer: CN1CCC2(CC1)SSC1(CCN(C)CC1)S2"}", "/scratch/micpie/export/odd_one_out/test_0-3.jsonl": "{"text":"Question: I have a sequence of SMILES for chemical compounds: CC(C)C[C@H](NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)[C@@H](C)O)C(N)=O, C=C1CC[C@@]2(OC1)O[C@H]1C[C@H]3[C@@H]4CC[C@@H]5C[C@@H](O[C@@H]6O[C@H](CO)[C@H](O)[C@H](O)[C@H]6O[C@@H]6O[C@H](CO)[C@@H](O)[C@H](O)[C@H]6O)[C@@H](O)C[C@]5(C)[C@H]4CC[C@]3(C)[C@H]1[C@@H]2C, CC(C)C[C@H](NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](CO)NC(=O)[C@@H]1CCCN1C(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](N)CS)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CS)C(=O)O, and CCCCCCCCCCCC[C@@H](O)[C@H]1CC[C@H]([C@H](O)CCC(O)CCCC(O)CCC[C@@H](O)CC2=C[C@H](C)OC2=O)O1. 
Which two molecules have the highest similarity based on their Tanimoto distance calculated from Morgan fingerprints of radius 2?\nAnswer: The two most similar molecules are CC(C)C[C@H](NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](CO)NC(=O)[C@@H]1CCCN1C(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](N)CS)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CS)C(=O)O and CC(C)C[C@H](NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)[C@@H](C)O)C(N)=O."} {"text":"Question: I have a sequence of SMILES for molecules: CC(C)Cn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccccc4)cc3)cc2n(C)c1=O, CC(C)c1c(Sc2ccccc2)n(OCCCO)c(=O)[nH]c1=O, CN1CCC2(CC1)SSC1(CCN(C)CC1)S2, and COc1cc(S(=O)(=O)c2ccccc2F)cc2c3c(oc12)CCNC3. Which two molecules have the highest similarity based on their Tanimoto distance calculated from Morgan fingerprints of radius two?\nAnswer: The two most similar molecules are CC(C)c1c(Sc2ccccc2)n(OCCCO)c(=O)[nH]c1=O and CC(C)Cn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccccc4)cc3)cc2n(C)c1=O."}", "/scratch/micpie/export/odd_one_out/train_0-0.jsonl": "{"text":"Task: You are given a list of SMILES of chemical compounds and must find the molecule that is most different from the others.\nMolecules: CC[C@H](C)[C@@H]1NC(=O)[C@@H](CCCN=C(N)N)NC(=O)CNC(=O)CNC(=O)[C@@H](Cc2ccccc2)NC(=O)CNC(=O)[C@H](NC(=O)[C@H](CO)NC(=O)[C@@H](N)CO)CSSC[C@H](C(=O)N[C@H](Cc2ccccc2)C(=O)NCC(=O)O)NC(=O)[C@@H]([C@@H](C)CC)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@@H](CC(=O)O)NC1=O, CC(=O)N[C@@H](Cc1ccc2ccccc2c1)C(=O)N[C@@H](Cc1ccc(Cl)cc1)C(=O)N[C@@H](Cc1cccnc1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCCNC(=O)c1cccnc1N)C(=O)N[C@@H](CCCCNC(=O)c1cccnc1N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N1CCC[C@@H]1C(=O)N[C@@H](C)C(=O)O, Cc1nonc1-c1nc(S)n[nH]1, and Cc1ccc(OCCCON\/C(N)=N\/C(N)=N\/C(C)C)cc1C.Cl\nConstraint: Answer by returning the SMILES string. 
Similarity is measured in terms of Tanimoto distance between Morgan fingerprints of radius 2.\nAnswer: Cc1nonc1-c1nc(S)n[nH]1"} {"text":"Task: You are given a list of SMILES of chemicals and are asked to find the molecule that is maximally different from the others.\nMolecules: COc1ccc(N2CCN(C(=O)c3ccc(C)c(NC(=O)c4nsc5ccccc45)c3)CC2)cc1, CS[C@H]1CCS[C@H]2C1=C(C(=O)O)N1C(=O)[C@H]([C@@H](C)O)[C@@H]21, COc1ccc(\/C(=N\/O)c2ccnc(Nc3ccc(C#N)cc3)n2)cc1, and O=C1\/C(=C\/c2ccccc2)C(c2ccccc2)n2cccc21\nConstraint: Answer by returning the SMILES string. Similarity is measured in terms of Tanimoto distance between Morgan fingerprints of radius two.\nAnswer: CS[C@H]1CCS[C@H]2C1=C(C(=O)O)N1C(=O)[C@H]([C@@H](C)O)[C@@H]21"}", "/scratch/micpie/export/odd_one_out/train_0-3.jsonl": "{"text":"Question: I have a sequence of SMILES for chemicals: CC[C@H](C)[C@@H]1NC(=O)[C@@H](CCCN=C(N)N)NC(=O)CNC(=O)CNC(=O)[C@@H](Cc2ccccc2)NC(=O)CNC(=O)[C@H](NC(=O)[C@H](CO)NC(=O)[C@@H](N)CO)CSSC[C@H](C(=O)N[C@H](Cc2ccccc2)C(=O)NCC(=O)O)NC(=O)[C@@H]([C@@H](C)CC)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@@H](CC(=O)O)NC1=O, CC(=O)N[C@@H](Cc1ccc2ccccc2c1)C(=O)N[C@@H](Cc1ccc(Cl)cc1)C(=O)N[C@@H](Cc1cccnc1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCCNC(=O)c1cccnc1N)C(=O)N[C@@H](CCCCNC(=O)c1cccnc1N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N1CCC[C@@H]1C(=O)N[C@@H](C)C(=O)O, Cc1nonc1-c1nc(S)n[nH]1, and Cc1ccc(OCCCON\/C(N)=N\/C(N)=N\/C(C)C)cc1C.Cl. 
Which two molecules have the highest similarity based on their Tanimoto distance calculated from Morgan fingerprints of radius 2?\nAnswer: The two most similar molecules are CC(=O)N[C@@H](Cc1ccc2ccccc2c1)C(=O)N[C@@H](Cc1ccc(Cl)cc1)C(=O)N[C@@H](Cc1cccnc1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCCNC(=O)c1cccnc1N)C(=O)N[C@@H](CCCCNC(=O)c1cccnc1N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N1CCC[C@@H]1C(=O)N[C@@H](C)C(=O)O and CC[C@H](C)[C@@H]1NC(=O)[C@@H](CCCN=C(N)N)NC(=O)CNC(=O)CNC(=O)[C@@H](Cc2ccccc2)NC(=O)CNC(=O)[C@H](NC(=O)[C@H](CO)NC(=O)[C@@H](N)CO)CSSC[C@H](C(=O)N[C@H](Cc2ccccc2)C(=O)NCC(=O)O)NC(=O)[C@@H]([C@@H](C)CC)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@@H](CC(=O)O)NC1=O."} {"text":"Question: I have a sequence of SMILES for molecules: COc1ccc(N2CCN(C(=O)c3ccc(C)c(NC(=O)c4nsc5ccccc45)c3)CC2)cc1, CS[C@H]1CCS[C@H]2C1=C(C(=O)O)N1C(=O)[C@H]([C@@H](C)O)[C@@H]21, COc1ccc(\/C(=N\/O)c2ccnc(Nc3ccc(C#N)cc3)n2)cc1, and O=C1\/C(=C\/c2ccccc2)C(c2ccccc2)n2cccc21. Which two molecules have the highest similarity based on their Tanimoto distance calculated from Morgan fingerprints of radius 2?\nAnswer: The two most similar molecules are COc1ccc(\/C(=N\/O)c2ccnc(Nc3ccc(C#N)cc3)n2)cc1 and COc1ccc(N2CCN(C(=O)c3ccc(C)c(NC(=O)c4nsc5ccccc45)c3)CC2)cc1."}", "/scratch/micpie/export/odd_one_out/valid_0-2.jsonl": "{"text":"Task: You are given a sequence of SMILES of chemicals and are asked to find the pair compound that is most similar to each other.\nMolecules: CC(C)(C)OC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)Cc1ccc(O)cc1, CCCCCC[C@H](N)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N1CCC[C@H]1C(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)N[C@@H](Cc1ccccc1)C(N)=O, CC(C)C[C@H](NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)[C@@H](C)O)C(N)=O, and 
C=C1CC[C@@]2(OC1)O[C@H]1C[C@H]3[C@@H]4CC[C@@H]5C[C@@H](O[C@@H]6O[C@H](CO)[C@H](O)[C@H](O)[C@H]6O[C@@H]6O[C@H](CO)[C@@H](O)[C@H](O)[C@H]6O)[C@@H](O)C[C@]5(C)[C@H]4CC[C@]3(C)[C@H]1[C@@H]2C\nConstraint: Answer by returning two SMILES strings separated by a comma. Similarity is measured in terms of Tanimoto distance between Morgan fingerprints of radius 2.\nAnswer: CCCCCC[C@H](N)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N1CCC[C@H]1C(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)N[C@@H](Cc1ccccc1)C(N)=O, CC(C)(C)OC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)Cc1ccc(O)cc1"} {"text":"Task: You are given a sequence of SMILES of molecules and must find the pair molecule that is maximally similar to each other.\nMolecules: C=CCCC[C@@H]1CC[C@@H](CC)[C@@H]2CCCCN21, C[C@H](Nc1cc(C(N)=O)nc(-c2ccc(Oc3ccc(C#N)c(C(F)(F)F)c3)cc2)n1)C(N)=O, N#Cc1ccnc(N2CC[C@H](S(=O)(=O)c3ccc(N4CCN5CCC[C@H]5C4)cc3Cl)C2)n1, and CCC(=O)NCCOc1cccc2[nH]c(-c3n[nH]c4cc(C(F)(F)F)ccc34)cc12\nConstraint: Answer by returning two SMILES strings separated by a comma. 
Similarity is measured in terms of Tanimoto distance between Morgan fingerprints of radius two.\nAnswer: CCC(=O)NCCOc1cccc2[nH]c(-c3n[nH]c4cc(C(F)(F)F)ccc34)cc12, C[C@H](Nc1cc(C(N)=O)nc(-c2ccc(Oc3ccc(C#N)c(C(F)(F)F)c3)cc2)n1)C(N)=O"}", "/scratch/micpie/export/odd_one_out/valid_0-1.jsonl": "{"text":"Task: You are given a sequence of SMILES of molecules and must find the pair chemical that is maximally different from each other.\nMolecules: CC(C)(C)OC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)Cc1ccc(O)cc1, CCCCCC[C@H](N)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N1CCC[C@H]1C(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)N[C@@H](Cc1ccccc1)C(N)=O, CC(C)C[C@H](NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)[C@@H](C)O)C(N)=O, and C=C1CC[C@@]2(OC1)O[C@H]1C[C@H]3[C@@H]4CC[C@@H]5C[C@@H](O[C@@H]6O[C@H](CO)[C@H](O)[C@H](O)[C@H]6O[C@@H]6O[C@H](CO)[C@@H](O)[C@H](O)[C@H]6O)[C@@H](O)C[C@]5(C)[C@H]4CC[C@]3(C)[C@H]1[C@@H]2C\nConstraint: Answer by returning two SMILES strings separated by a comma. 
Similarity is measured in terms of Tanimoto distance between Morgan fingerprints of radius two.\nAnswer: CC(C)C[C@H](NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)[C@@H](C)O)C(N)=O, C=C1CC[C@@]2(OC1)O[C@H]1C[C@H]3[C@@H]4CC[C@@H]5C[C@@H](O[C@@H]6O[C@H](CO)[C@H](O)[C@H](O)[C@H]6O[C@@H]6O[C@H](CO)[C@@H](O)[C@H](O)[C@H]6O)[C@@H](O)C[C@]5(C)[C@H]4CC[C@]3(C)[C@H]1[C@@H]2C"} {"text":"Task: You are given a sequence of SMILES of molecules and are asked to find the pair molecule that is most different from each other.\nMolecules: C=CCCC[C@@H]1CC[C@@H](CC)[C@@H]2CCCCN21, C[C@H](Nc1cc(C(N)=O)nc(-c2ccc(Oc3ccc(C#N)c(C(F)(F)F)c3)cc2)n1)C(N)=O, N#Cc1ccnc(N2CC[C@H](S(=O)(=O)c3ccc(N4CCN5CCC[C@H]5C4)cc3Cl)C2)n1, and CCC(=O)NCCOc1cccc2[nH]c(-c3n[nH]c4cc(C(F)(F)F)ccc34)cc12\nConstraint: Answer by returning two SMILES strings separated by a comma. Similarity is measured in terms of Tanimoto distance between Morgan fingerprints of radius 2.\nAnswer: C=CCCC[C@@H]1CC[C@@H](CC)[C@@H]2CCCCN21, C[C@H](Nc1cc(C(N)=O)nc(-c2ccc(Oc3ccc(C#N)c(C(F)(F)F)c3)cc2)n1)C(N)=O"}", "/scratch/micpie/export/odd_one_out/valid_0-5.jsonl": "{"text":"User: I have the following SMILES strings: CC(C)(C)OC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)Cc1ccc(O)cc1, CCCCCC[C@H](N)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N1CCC[C@H]1C(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)N[C@@H](Cc1ccccc1)C(N)=O, CC(C)C[C@H](NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)[C@@H](C)O)C(N)=O, and C=C1CC[C@@]2(OC1)O[C@H]1C[C@H]3[C@@H]4CC[C@@H]5C[C@@H](O[C@@H]6O[C@H](CO)[C@H](O)[C@H](O)[C@H]6O[C@@H]6O[C@H](CO)[C@@H](O)[C@H](O)[C@H]6O)[C@@H](O)C[C@]5(C)[C@H]4CC[C@]3(C)[C@H]1[C@@H]2C. 
Which is the odd one in this list?\nAssistant: What do you mean by \"odd one\"?\nUser: We measure similarity in terms of Tanimoto distance between Morgan fingerprints of radius two. The \"odd one\" is the molecule that is most different from the others.\nAssistant: In that case, I propose that C=C1CC[C@@]2(OC1)O[C@H]1C[C@H]3[C@@H]4CC[C@@H]5C[C@@H](O[C@@H]6O[C@H](CO)[C@H](O)[C@H](O)[C@H]6O[C@@H]6O[C@H](CO)[C@@H](O)[C@H](O)[C@H]6O)[C@@H](O)C[C@]5(C)[C@H]4CC[C@]3(C)[C@H]1[C@@H]2C is the \"odd one\" you're looking for."} {"text":"User: I have the following SMILES strings: C=CCCC[C@@H]1CC[C@@H](CC)[C@@H]2CCCCN21, C[C@H](Nc1cc(C(N)=O)nc(-c2ccc(Oc3ccc(C#N)c(C(F)(F)F)c3)cc2)n1)C(N)=O, N#Cc1ccnc(N2CC[C@H](S(=O)(=O)c3ccc(N4CCN5CCC[C@H]5C4)cc3Cl)C2)n1, and CCC(=O)NCCOc1cccc2[nH]c(-c3n[nH]c4cc(C(F)(F)F)ccc34)cc12. Which is the odd one in this list?\nAssistant: Cool, what do you mean by \"odd one\"?\nUser: We measure similarity in terms of Tanimoto distance between Morgan fingerprints of radius two. The \"odd one\" is the molecule that is most different from the others.\nAssistant: Then, I believe that C=CCCC[C@@H]1CC[C@@H](CC)[C@@H]2CCCCN21 is the \"odd one\" you're looking for."}", "/scratch/micpie/export/odd_one_out/valid_0-4.jsonl": "{"text":"Question: I have the following SMILES strings: CC(C)(C)OC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)Cc1ccc(O)cc1, CCCCCC[C@H](N)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N1CCC[C@H]1C(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)N[C@@H](Cc1ccccc1)C(N)=O, CC(C)C[C@H](NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)[C@@H](C)O)C(N)=O, and C=C1CC[C@@]2(OC1)O[C@H]1C[C@H]3[C@@H]4CC[C@@H]5C[C@@H](O[C@@H]6O[C@H](CO)[C@H](O)[C@H](O)[C@H]6O[C@@H]6O[C@H](CO)[C@@H](O)[C@H](O)[C@H]6O)[C@@H](O)C[C@]5(C)[C@H]4CC[C@]3(C)[C@H]1[C@@H]2C. 
Which compound is the most different from the all others based on Tanimoto distance of their Morgan fingerprints of radius 2?\nAnswer: The most dissimilar compound is C=C1CC[C@@]2(OC1)O[C@H]1C[C@H]3[C@@H]4CC[C@@H]5C[C@@H](O[C@@H]6O[C@H](CO)[C@H](O)[C@H](O)[C@H]6O[C@@H]6O[C@H](CO)[C@@H](O)[C@H](O)[C@H]6O)[C@@H](O)C[C@]5(C)[C@H]4CC[C@]3(C)[C@H]1[C@@H]2C."} {"text":"Question: I have the following SMILES strings: C=CCCC[C@@H]1CC[C@@H](CC)[C@@H]2CCCCN21, C[C@H](Nc1cc(C(N)=O)nc(-c2ccc(Oc3ccc(C#N)c(C(F)(F)F)c3)cc2)n1)C(N)=O, N#Cc1ccnc(N2CC[C@H](S(=O)(=O)c3ccc(N4CCN5CCC[C@H]5C4)cc3Cl)C2)n1, and CCC(=O)NCCOc1cccc2[nH]c(-c3n[nH]c4cc(C(F)(F)F)ccc34)cc12. Which molecule is the most different from the all others based on Tanimoto distance of their Morgan fingerprints of radius 2?\nAnswer: The most dissimilar molecule is C=CCCC[C@@H]1CC[C@@H](CC)[C@@H]2CCCCN21."}", "/scratch/micpie/export/odd_one_out/train_0-5.jsonl": "{"text":"User: I have the following SMILES strings: CC[C@H](C)[C@@H]1NC(=O)[C@@H](CCCN=C(N)N)NC(=O)CNC(=O)CNC(=O)[C@@H](Cc2ccccc2)NC(=O)CNC(=O)[C@H](NC(=O)[C@H](CO)NC(=O)[C@@H](N)CO)CSSC[C@H](C(=O)N[C@H](Cc2ccccc2)C(=O)NCC(=O)O)NC(=O)[C@@H]([C@@H](C)CC)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@@H](CC(=O)O)NC1=O, CC(=O)N[C@@H](Cc1ccc2ccccc2c1)C(=O)N[C@@H](Cc1ccc(Cl)cc1)C(=O)N[C@@H](Cc1cccnc1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCCNC(=O)c1cccnc1N)C(=O)N[C@@H](CCCCNC(=O)c1cccnc1N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N1CCC[C@@H]1C(=O)N[C@@H](C)C(=O)O, Cc1nonc1-c1nc(S)n[nH]1, and Cc1ccc(OCCCON\/C(N)=N\/C(N)=N\/C(C)C)cc1C.Cl. Which is the odd one in this list?\nAssistant: What do you understand as \"odd one\"?\nUser: Let's assume we measure similarity in terms of Tanimoto distance between Morgan fingerprints of radius two. 
The \"odd one\" is the molecule that is most different from the others.\nAssistant: Then, I propose that Cc1nonc1-c1nc(S)n[nH]1 is the \"odd one\" you're looking for."} {"text":"User: I have the following SMILES strings: COc1ccc(N2CCN(C(=O)c3ccc(C)c(NC(=O)c4nsc5ccccc45)c3)CC2)cc1, CS[C@H]1CCS[C@H]2C1=C(C(=O)O)N1C(=O)[C@H]([C@@H](C)O)[C@@H]21, COc1ccc(\/C(=N\/O)c2ccnc(Nc3ccc(C#N)cc3)n2)cc1, and O=C1\/C(=C\/c2ccccc2)C(c2ccccc2)n2cccc21. Which is the odd one in this list?\nAssistant: Interesting, what do you mean by \"odd one\"?\nUser: Let's assume we measure similarity in terms of Tanimoto distance between Morgan fingerprints of radius two. The \"odd one\" is the molecule that is most different from the others.\nAssistant: Then, I believe that CS[C@H]1CCS[C@H]2C1=C(C(=O)O)N1C(=O)[C@H]([C@@H](C)O)[C@@H]21 is the \"odd one\" you're looking for."}", "/scratch/micpie/export/odd_one_out/train_0-2.jsonl": "{"text":"Task: You are given a list of SMILES of molecules and are asked to find the pair molecule that is maximally similar to each other.\nMolecules: CC[C@H](C)[C@@H]1NC(=O)[C@@H](CCCN=C(N)N)NC(=O)CNC(=O)CNC(=O)[C@@H](Cc2ccccc2)NC(=O)CNC(=O)[C@H](NC(=O)[C@H](CO)NC(=O)[C@@H](N)CO)CSSC[C@H](C(=O)N[C@H](Cc2ccccc2)C(=O)NCC(=O)O)NC(=O)[C@@H]([C@@H](C)CC)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@@H](CC(=O)O)NC1=O, CC(=O)N[C@@H](Cc1ccc2ccccc2c1)C(=O)N[C@@H](Cc1ccc(Cl)cc1)C(=O)N[C@@H](Cc1cccnc1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCCNC(=O)c1cccnc1N)C(=O)N[C@@H](CCCCNC(=O)c1cccnc1N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N1CCC[C@@H]1C(=O)N[C@@H](C)C(=O)O, Cc1nonc1-c1nc(S)n[nH]1, and Cc1ccc(OCCCON\/C(N)=N\/C(N)=N\/C(C)C)cc1C.Cl\nConstraint: Answer by returning two SMILES strings separated by a comma. 
Similarity is measured in terms of Tanimoto distance between Morgan fingerprints of radius 2.\nAnswer: CC(=O)N[C@@H](Cc1ccc2ccccc2c1)C(=O)N[C@@H](Cc1ccc(Cl)cc1)C(=O)N[C@@H](Cc1cccnc1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCCNC(=O)c1cccnc1N)C(=O)N[C@@H](CCCCNC(=O)c1cccnc1N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N1CCC[C@@H]1C(=O)N[C@@H](C)C(=O)O, CC[C@H](C)[C@@H]1NC(=O)[C@@H](CCCN=C(N)N)NC(=O)CNC(=O)CNC(=O)[C@@H](Cc2ccccc2)NC(=O)CNC(=O)[C@H](NC(=O)[C@H](CO)NC(=O)[C@@H](N)CO)CSSC[C@H](C(=O)N[C@H](Cc2ccccc2)C(=O)NCC(=O)O)NC(=O)[C@@H]([C@@H](C)CC)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@@H](CC(=O)O)NC1=O"} {"text":"Task: You are given a list of SMILES of molecules and are asked to find the pair molecule that is maximally similar to each other.\nMolecules: COc1ccc(N2CCN(C(=O)c3ccc(C)c(NC(=O)c4nsc5ccccc45)c3)CC2)cc1, CS[C@H]1CCS[C@H]2C1=C(C(=O)O)N1C(=O)[C@H]([C@@H](C)O)[C@@H]21, COc1ccc(\/C(=N\/O)c2ccnc(Nc3ccc(C#N)cc3)n2)cc1, and O=C1\/C(=C\/c2ccccc2)C(c2ccccc2)n2cccc21\nConstraint: Answer by returning two SMILES strings separated by a comma. 
Similarity is measured in terms of Tanimoto distance between Morgan fingerprints of radius two.\nAnswer: COc1ccc(\/C(=N\/O)c2ccnc(Nc3ccc(C#N)cc3)n2)cc1, COc1ccc(N2CCN(C(=O)c3ccc(C)c(NC(=O)c4nsc5ccccc45)c3)CC2)cc1"}", "/scratch/micpie/export/odd_one_out/train_0-1.jsonl": "{"text":"Task: You are given a sequence of SMILES of chemical compounds and are asked to find the pair molecule that is most different from each other.\nMolecules: CC[C@H](C)[C@@H]1NC(=O)[C@@H](CCCN=C(N)N)NC(=O)CNC(=O)CNC(=O)[C@@H](Cc2ccccc2)NC(=O)CNC(=O)[C@H](NC(=O)[C@H](CO)NC(=O)[C@@H](N)CO)CSSC[C@H](C(=O)N[C@H](Cc2ccccc2)C(=O)NCC(=O)O)NC(=O)[C@@H]([C@@H](C)CC)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@@H](CC(=O)O)NC1=O, CC(=O)N[C@@H](Cc1ccc2ccccc2c1)C(=O)N[C@@H](Cc1ccc(Cl)cc1)C(=O)N[C@@H](Cc1cccnc1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCCNC(=O)c1cccnc1N)C(=O)N[C@@H](CCCCNC(=O)c1cccnc1N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N1CCC[C@@H]1C(=O)N[C@@H](C)C(=O)O, Cc1nonc1-c1nc(S)n[nH]1, and Cc1ccc(OCCCON\/C(N)=N\/C(N)=N\/C(C)C)cc1C.Cl\nConstraint: Answer by returning two SMILES strings separated by a comma. Similarity is measured in terms of Tanimoto distance between Morgan fingerprints of radius two.\nAnswer: CC[C@H](C)[C@@H]1NC(=O)[C@@H](CCCN=C(N)N)NC(=O)CNC(=O)CNC(=O)[C@@H](Cc2ccccc2)NC(=O)CNC(=O)[C@H](NC(=O)[C@H](CO)NC(=O)[C@@H](N)CO)CSSC[C@H](C(=O)N[C@H](Cc2ccccc2)C(=O)NCC(=O)O)NC(=O)[C@@H]([C@@H](C)CC)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@@H](CC(=O)O)NC1=O, Cc1nonc1-c1nc(S)n[nH]1"} {"text":"Task: You are given a sequence of SMILES of chemical compounds and must find the pair compound that is maximally different from each other.\nMolecules: COc1ccc(N2CCN(C(=O)c3ccc(C)c(NC(=O)c4nsc5ccccc45)c3)CC2)cc1, CS[C@H]1CCS[C@H]2C1=C(C(=O)O)N1C(=O)[C@H]([C@@H](C)O)[C@@H]21, COc1ccc(\/C(=N\/O)c2ccnc(Nc3ccc(C#N)cc3)n2)cc1, and O=C1\/C(=C\/c2ccccc2)C(c2ccccc2)n2cccc21\nConstraint: Answer by returning two SMILES strings separated by a comma. 
Similarity is measured in terms of Tanimoto distance between Morgan fingerprints of radius 2.\nAnswer: CS[C@H]1CCS[C@H]2C1=C(C(=O)O)N1C(=O)[C@H]([C@@H](C)O)[C@@H]21, COc1ccc(\/C(=N\/O)c2ccnc(Nc3ccc(C#N)cc3)n2)cc1"}", "/scratch/micpie/export/odd_one_out/train_0-4.jsonl": "{"text":"Question: I have the following SMILES strings: CC[C@H](C)[C@@H]1NC(=O)[C@@H](CCCN=C(N)N)NC(=O)CNC(=O)CNC(=O)[C@@H](Cc2ccccc2)NC(=O)CNC(=O)[C@H](NC(=O)[C@H](CO)NC(=O)[C@@H](N)CO)CSSC[C@H](C(=O)N[C@H](Cc2ccccc2)C(=O)NCC(=O)O)NC(=O)[C@@H]([C@@H](C)CC)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@@H](CC(=O)O)NC1=O, CC(=O)N[C@@H](Cc1ccc2ccccc2c1)C(=O)N[C@@H](Cc1ccc(Cl)cc1)C(=O)N[C@@H](Cc1cccnc1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCCNC(=O)c1cccnc1N)C(=O)N[C@@H](CCCCNC(=O)c1cccnc1N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N1CCC[C@@H]1C(=O)N[C@@H](C)C(=O)O, Cc1nonc1-c1nc(S)n[nH]1, and Cc1ccc(OCCCON\/C(N)=N\/C(N)=N\/C(C)C)cc1C.Cl. Which chemical is the most different from the all others based on Tanimoto distance of their Morgan fingerprints of radius 2?\nAnswer: The most dissimilar chemical is Cc1nonc1-c1nc(S)n[nH]1."} {"text":"Question: I have the following SMILES strings: COc1ccc(N2CCN(C(=O)c3ccc(C)c(NC(=O)c4nsc5ccccc45)c3)CC2)cc1, CS[C@H]1CCS[C@H]2C1=C(C(=O)O)N1C(=O)[C@H]([C@@H](C)O)[C@@H]21, COc1ccc(\/C(=N\/O)c2ccnc(Nc3ccc(C#N)cc3)n2)cc1, and O=C1\/C(=C\/c2ccccc2)C(c2ccccc2)n2cccc21. 
Which chemical is the most different from the all others based on Tanimoto distance of their Morgan fingerprints of radius two?\nAnswer: The most dissimilar chemical is CS[C@H]1CCS[C@H]2C1=C(C(=O)O)N1C(=O)[C@H]([C@@H](C)O)[C@@H]21."}", "/scratch/micpie/export/odd_one_out/valid_0-3.jsonl": "{"text":"Question: I have a sequence of SMILES for molecules: CC(C)(C)OC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)Cc1ccc(O)cc1, CCCCCC[C@H](N)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N1CCC[C@H]1C(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)N[C@@H](Cc1ccccc1)C(N)=O, CC(C)C[C@H](NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)[C@@H](C)O)C(N)=O, and C=C1CC[C@@]2(OC1)O[C@H]1C[C@H]3[C@@H]4CC[C@@H]5C[C@@H](O[C@@H]6O[C@H](CO)[C@H](O)[C@H](O)[C@H]6O[C@@H]6O[C@H](CO)[C@@H](O)[C@H](O)[C@H]6O)[C@@H](O)C[C@]5(C)[C@H]4CC[C@]3(C)[C@H]1[C@@H]2C. Which two molecules have the highest similarity based on their Tanimoto distance calculated from Morgan fingerprints of radius 2?\nAnswer: The two most similar molecules are CCCCCC[C@H](N)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N1CCC[C@H]1C(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)N[C@@H](Cc1ccccc1)C(N)=O and CC(C)(C)OC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)Cc1ccc(O)cc1."} {"text":"Question: I have a list of SMILES for molecules: C=CCCC[C@@H]1CC[C@@H](CC)[C@@H]2CCCCN21, C[C@H](Nc1cc(C(N)=O)nc(-c2ccc(Oc3ccc(C#N)c(C(F)(F)F)c3)cc2)n1)C(N)=O, N#Cc1ccnc(N2CC[C@H](S(=O)(=O)c3ccc(N4CCN5CCC[C@H]5C4)cc3Cl)C2)n1, and CCC(=O)NCCOc1cccc2[nH]c(-c3n[nH]c4cc(C(F)(F)F)ccc34)cc12. 
Which two molecules have the highest similarity based on their Tanimoto distance calculated from Morgan fingerprints of radius 2?\nAnswer: The two most similar molecules are CCC(=O)NCCOc1cccc2[nH]c(-c3n[nH]c4cc(C(F)(F)F)ccc34)cc12 and C[C@H](Nc1cc(C(N)=O)nc(-c2ccc(Oc3ccc(C#N)c(C(F)(F)F)c3)cc2)n1)C(N)=O."}", "/scratch/micpie/export/odd_one_out/test_0-4.jsonl": "{"text":"Question: I have the following SMILES strings: CC(C)C[C@H](NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)[C@@H](C)O)C(N)=O, C=C1CC[C@@]2(OC1)O[C@H]1C[C@H]3[C@@H]4CC[C@@H]5C[C@@H](O[C@@H]6O[C@H](CO)[C@H](O)[C@H](O)[C@H]6O[C@@H]6O[C@H](CO)[C@@H](O)[C@H](O)[C@H]6O)[C@@H](O)C[C@]5(C)[C@H]4CC[C@]3(C)[C@H]1[C@@H]2C, CC(C)C[C@H](NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](CO)NC(=O)[C@@H]1CCCN1C(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](N)CS)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CS)C(=O)O, and CCCCCCCCCCCC[C@@H](O)[C@H]1CC[C@H]([C@H](O)CCC(O)CCCC(O)CCC[C@@H](O)CC2=C[C@H](C)OC2=O)O1. Which molecule is the most different from the all others based on Tanimoto distance of their Morgan fingerprints of radius two?\nAnswer: The most dissimilar molecule is C=C1CC[C@@]2(OC1)O[C@H]1C[C@H]3[C@@H]4CC[C@@H]5C[C@@H](O[C@@H]6O[C@H](CO)[C@H](O)[C@H](O)[C@H]6O[C@@H]6O[C@H](CO)[C@@H](O)[C@H](O)[C@H]6O)[C@@H](O)C[C@]5(C)[C@H]4CC[C@]3(C)[C@H]1[C@@H]2C."} {"text":"Question: I have the following SMILES strings: CC(C)Cn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccccc4)cc3)cc2n(C)c1=O, CC(C)c1c(Sc2ccccc2)n(OCCCO)c(=O)[nH]c1=O, CN1CCC2(CC1)SSC1(CCN(C)CC1)S2, and COc1cc(S(=O)(=O)c2ccccc2F)cc2c3c(oc12)CCNC3. 
Which molecule is the most different from the all others based on Tanimoto distance of their Morgan fingerprints of radius 2?\nAnswer: The most dissimilar molecule is CN1CCC2(CC1)SSC1(CCN(C)CC1)S2."}", "/scratch/micpie/export/skin_reaction/test_0-10.jsonl": "{"text":"User: I'm searching for the DeepSMILES of a molecule that is not causing a skin reaction?\nAssistant: This is a molecule that is not causing a skin reaction: CCCOccccBr)cC=O)ccccOC))cc6O))))))))c6"} {"text":"User: I'm searching for the SELFIES of a molecule that is causing a skin reaction?\nAssistant: This is a molecule that is causing a skin reaction: [C][C][Branch1][Ring1][C][=O][C][C][=C][C][=C][Branch1][=Branch2][C][Branch1][C][C][Branch1][C][C][C][C][=C][Ring1][#Branch2]"}", "/scratch/micpie/export/skin_reaction/valid_0-8.jsonl": "{"text":"User: Is the molecule with the canonical SMILES O=C1C([PH](c2ccccc2)(c2ccccc2)c2ccccc2)CCN1c1ccccc1 causing a skin reaction?\nAssistant: No, it is not causing a skin reaction."} {"text":"User: Is the molecule with the SELFIES [C][O][C][=C][C][Branch1][C][C][=C][C][Branch2][Ring1][Ring2][O][C][=C][C][=C][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][=C][Ring1][#Branch2][=C][Branch1][Ring1][O][C][C][=C][Branch1][C][N][C][Ring2][Ring1][Ring2][=N][Ring2][Ring1][=Branch2] causing a skin reaction?\nAssistant: No, it is not causing a skin reaction."}", "/scratch/micpie/export/skin_reaction/train_0-8.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C5H8O\/c1-3-5(2)4-6\/h3-4H,1-2H3 causing a skin reaction?\nAssistant: No, it is not causing a skin reaction."} {"text":"User: Is the molecule with the DeepSMILES OCCO)CO causing a skin reaction?\nAssistant: No, it is not causing a skin reaction."}", "/scratch/micpie/export/skin_reaction/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is causing a skin reaction.\nMolecule canonical SMILES: 
CCCOc1ccc(Br)c(C(=O)c2ccc(OC)cc2O)c1\nConstraint: Answer the question in a definite sentence.\nResult: This molecule is not causing a skin reaction."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is causing a skin reaction.\nMolecule canonical SMILES: CC(C=O)Cc1ccc(C(C)(C)C)cc1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is causing a skin reaction."}", "/scratch/micpie/export/skin_reaction/valid_0-9.jsonl": "{"text":"User: Can you give me the canonical SMILES of a molecule that is not causing a skin reaction?\nAssistant: Sure, here you go: O=C1C([PH](c2ccccc2)(c2ccccc2)c2ccccc2)CCN1c1ccccc1"} {"text":"User: Can you give me the DeepSMILES of a molecule that is not causing a skin reaction?\nAssistant: Yes, I'm happy to help, here you go: COcccC)ccOcccccCF)F)F))c6)))))))cOC))ccN)c6n%10"}", "/scratch/micpie/export/skin_reaction/test_0-1.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C17H17BrO4\/c1-3-8-22-12-5-7-15(18)14(9-12)17(20)13-6-4-11(21-2)10-16(13)19\/h4-7,9-10,19H,3,8H2,1-2H3 is not causing a skin reaction."} {"text":"The molecule with the SMILES representation of CC(C=O)Cc1ccc(C(C)(C)C)cc1 is causing a skin reaction."}", "/scratch/micpie/export/skin_reaction/valid_0-0.jsonl": "{"text":"The SELFIES [O][=C][C][Branch2][Ring1][=C][PH1][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][N][Ring2][Ring1][Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1] causes no agent induced skin reaction."} {"text":"The molecule DeepSMILES COcccC)ccOcccccCF)F)F))c6)))))))cOC))ccN)c6n%10 causes no agent induced skin reaction."}", "/scratch/micpie/export/skin_reaction/test_0-2.jsonl": "{"text":"Based on the InChI InChI=1S\/C17H17BrO4\/c1-3-8-22-12-5-7-15(18)14(9-12)17(20)13-6-4-11(21-2)10-16(13)19\/h4-7,9-10,19H,3,8H2,1-2H3, there is no drug induced skin immune reaction."} 
{"text":"Based on the canonical SMILES CC(C=O)Cc1ccc(C(C)(C)C)cc1, there is skin reaction."}", "/scratch/micpie/export/skin_reaction/valid_0-10.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that is not causing a skin reaction?\nAssistant: This is a molecule that is not causing a skin reaction: O=C1C([PH](c2ccccc2)(c2ccccc2)c2ccccc2)CCN1c1ccccc1"} {"text":"User: I'm searching for the SELFIES of a molecule that is not causing a skin reaction?\nAssistant: This is a molecule that is not causing a skin reaction: [C][O][C][=C][C][Branch1][C][C][=C][C][Branch2][Ring1][Ring2][O][C][=C][C][=C][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][=C][Ring1][#Branch2][=C][Branch1][Ring1][O][C][C][=C][Branch1][C][N][C][Ring2][Ring1][Ring2][=N][Ring2][Ring1][=Branch2]"}", "/scratch/micpie/export/skin_reaction/train_0-6.jsonl": "{"text":"Task: Please generate a canonical SMILES based on the description below.\nDescription: A molecule that is causing a skin reaction.\nResult: CC=C(C)C=O"} {"text":"Task: Please create a molecule SMILES based on the text description.\nDescription: A molecule that is causing a skin reaction.\nResult: OCC(O)CO"}", "/scratch/micpie/export/skin_reaction/valid_0-6.jsonl": "{"text":"Task: Please create a canonical SMILES based on the description below.\nDescription: A molecule that is causing a skin reaction.\nResult: O=C1C([PH](c2ccccc2)(c2ccccc2)c2ccccc2)CCN1c1ccccc1"} {"text":"Task: Please create a molecule InChI based on the text description below.\nDescription: A molecule that is causing a skin reaction.\nResult: InChI=1S\/C19H17F3N2O3\/c1-10-7-15(26-3)24-17-13(23)9-14(25-2)18(16(10)17)27-12-6-4-5-11(8-12)19(20,21)22\/h4-9H,23H2,1-3H3"}", "/scratch/micpie/export/skin_reaction/test_0-9.jsonl": "{"text":"User: Can you generate the InChI of a molecule that is not causing a skin reaction?\nAssistant: Yes, I'm happy to help, here you go: 
InChI=1S\/C17H17BrO4\/c1-3-8-22-12-5-7-15(18)14(9-12)17(20)13-6-4-11(21-2)10-16(13)19\/h4-7,9-10,19H,3,8H2,1-2H3"} {"text":"User: Can you give me the SELFIES of a molecule that is causing a skin reaction?\nAssistant: Yes, I'm happy to help, here you go: [C][C][Branch1][Ring1][C][=O][C][C][=C][C][=C][Branch1][=Branch2][C][Branch1][C][C][Branch1][C][C][C][C][=C][Ring1][#Branch2]"}", "/scratch/micpie/export/skin_reaction/test_0-0.jsonl": "{"text":"The SELFIES [C][C][C][O][C][=C][C][=C][Branch1][C][Br][C][Branch2][Ring1][C][C][=Branch1][C][=O][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][O][=C][Ring2][Ring1][C] causes no skin reaction."} {"text":"The SMILES CC(C=O)Cc1ccc(C(C)(C)C)cc1 causes drug induced skin immune reaction."}", "/scratch/micpie/export/skin_reaction/valid_0-7.jsonl": "{"text":"User: Can you derive if the molecule with the canonical SMILES O=C1C([PH](c2ccccc2)(c2ccccc2)c2ccccc2)CCN1c1ccccc1 is causing a skin reaction?\nAssistant: No, this molecule is not causing a skin reaction."} {"text":"User: Can you estimate if the molecule with the SMILES COc1cc(C)c2c(Oc3cccc(C(F)(F)F)c3)c(OC)cc(N)c2n1 is causing a skin reaction?\nAssistant: No, this molecule is not causing a skin reaction."}", "/scratch/micpie/export/skin_reaction/test_0-3.jsonl": "{"text":"The molecule SMILES CCCOc1ccc(Br)c(C(=O)c2ccc(OC)cc2O)c1 does not causes a skin sensitization."} {"text":"The molecule InChI InChI=1S\/C14H20O\/c1-11(10-15)9-12-5-7-13(8-6-12)14(2,3)4\/h5-8,10-11H,9H2,1-4H3 does causes a skin reaction."}", "/scratch/micpie/export/skin_reaction/valid_0-11.jsonl": "{"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should not be causing a skin reaction.\nAssistant: Ok, this SELFIES is not causing a skin reaction: [O][=C][C][Branch2][Ring1][=C][PH1][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][N][Ring2][Ring1][Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1]"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be causing a skin reaction.\nAssistant: Got it, this InChI is not causing a skin reaction: InChI=1S\/C19H17F3N2O3\/c1-10-7-15(26-3)24-17-13(23)9-14(25-2)18(16(10)17)27-12-6-4-5-11(8-12)19(20,21)22\/h4-9H,23H2,1-3H3"}", "/scratch/micpie/export/skin_reaction/train_0-0.jsonl": "{"text":"The SMILES CC=C(C)C=O causes no agent induced skin reaction."} {"text":"The molecule InChI InChI=1S\/C3H8O3\/c4-1-3(6)2-5\/h3-6H,1-2H2 causes no skin reaction."}", "/scratch/micpie/export/skin_reaction/test_0-6.jsonl": "{"text":"Task: Please generate a SMILES based on the text description.\nDescription: A molecule that is causing a skin reaction.\nResult: CCCOc1ccc(Br)c(C(=O)c2ccc(OC)cc2O)c1"} {"text":"Task: Please create a canonical SMILES based on the text description.\nDescription: A molecule that is causing a skin reaction.\nResult: CC(C=O)Cc1ccc(C(C)(C)C)cc1"}", "/scratch/micpie/export/skin_reaction/train_0-10.jsonl": "{"text":"User: I'm looking for the DeepSMILES of a molecule that is not causing a skin reaction?\nAssistant: This is a molecule that is not causing a skin reaction: CC=CC)C=O"} {"text":"User: I'm looking for the DeepSMILES of a molecule that is not causing a skin reaction?\nAssistant: This is a molecule that is not causing a skin reaction: OCCO)CO"}", "/scratch/micpie/export/skin_reaction/train_0-3.jsonl": "{"text":"The molecule SELFIES [C][C][=C][Branch1][C][C][C][=O] does not causes a skin sensitization."} 
{"text":"The molecule SELFIES [O][C][C][Branch1][C][O][C][O] does not causes a drug induced skin immune reaction."}", "/scratch/micpie/export/skin_reaction/train_0-12.jsonl": "{"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be causing a skin reaction.\nAssistant: Ok, this SMILES is not causing a skin reaction: CC=C(C)C=O"} {"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be causing a skin reaction.\nAssistant: Got it, this canonical SMILES is not causing a skin reaction: OCC(O)CO"}", "/scratch/micpie/export/skin_reaction/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of CCCOc1ccc(Br)c(C(=O)c2ccc(OC)cc2O)c1 causing a skin reaction?\nConstraint: Even if you are not sure, you must pick either a or b without using any other words.\nOptions:\na: False\nb: True\nAnswer: a"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES [C][C][Branch1][Ring1][C][=O][C][C][=C][C][=C][Branch1][=Branch2][C][Branch1][C][C][Branch1][C][C][C][C][=C][Ring1][#Branch2] causing a skin reaction?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any other words.\nOptions:\n(1) True\n(2) False\nAnswer: 1"}", "/scratch/micpie/export/skin_reaction/valid_0-2.jsonl": "{"text":"Based on the SMILES representation O=C1C([PH](c2ccccc2)(c2ccccc2)c2ccccc2)CCN1c1ccccc1, there is no skin sensitization."} {"text":"Based on the SMILES COc1cc(C)c2c(Oc3cccc(C(F)(F)F)c3)c(OC)cc(N)c2n1, there is no skin sensitization."}", "/scratch/micpie/export/skin_reaction/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not causing a skin reaction?\nConstraint: 
You must select none, one or more options from a, b, c, or d without using any additional words.\nOptions:\na) Ncccccc6C=O)O\nb) CCCCCCCCBr)CCCCCC\nc) CC=CC)C=O\nd) O=CC=O)cccccc6)))))))C=CCCBr)Br)C=C6\nAnswer: a, c"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not causing a skin reaction?\nConstraint: You must select none, one or more options from a, b, or c without using any other words.\nOptions:\n(a) [O][C][C][Branch1][C][O][C][O]\n(b) [O][=C][Branch1][C][O][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=Branch1][C][=O][O][C][Ring1][#Branch1][=O]\n(c) [Cl][C][=N][C][=N][C][=C][C][=C][Branch1][C][I][C][=C][Ring1][O][Ring1][#Branch1]\nAnswer: a"}", "/scratch/micpie/export/skin_reaction/valid_0-1.jsonl": "{"text":"The molecule with the SELFIES representation of [O][=C][C][Branch2][Ring1][=C][PH1][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][N][Ring2][Ring1][Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1] is not causing a skin reaction."} {"text":"The molecule with the DeepSMILES representation of COcccC)ccOcccccCF)F)F))c6)))))))cOC))ccN)c6n%10 is not causing a skin reaction."}", "/scratch/micpie/export/skin_reaction/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [O][=C][C][Branch2][Ring1][=C][PH1][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][N][Ring2][Ring1][Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1] causing a skin reaction?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any other words.\nOptions:\n1: False\n2: True\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES 
[C][O][C][=C][C][Branch1][C][C][=C][C][Branch2][Ring1][Ring2][O][C][=C][C][=C][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][=C][Ring1][#Branch2][=C][Branch1][Ring1][O][C][C][=C][Branch1][C][N][C][Ring2][Ring1][Ring2][=N][Ring2][Ring1][=Branch2] causing a skin reaction?\nConstraint: Even if you are not sure, you must pick either A or B without using any additional words.\nOptions:\nA True\nB False\nAnswer: B"}", "/scratch/micpie/export/skin_reaction/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is causing a skin reaction.\nInChI: InChI=1S\/C28H26NOP\/c30-28-27(21-22-29(28)23-13-5-1-6-14-23)31(24-15-7-2-8-16-24,25-17-9-3-10-18-25)26-19-11-4-12-20-26\/h1-20,27,31H,21-22H2\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not causing a skin reaction."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is causing a skin reaction.\nMolecule DeepSMILES: COcccC)ccOcccccCF)F)F))c6)))))))cOC))ccN)c6n%10\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not causing a skin reaction."}", "/scratch/micpie/export/skin_reaction/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is causing a skin reaction.\nMolecule SMILES: O=C1C([PH](c2ccccc2)(c2ccccc2)c2ccccc2)CCN1c1ccccc1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is causing a skin reaction.\nMolecule DeepSMILES: COcccC)ccOcccccCF)F)F))c6)))))))cOC))ccN)c6n%10\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/skin_reaction/train_0-5.jsonl": "{"text":"Task: Please classify a 
molecule based on the description.\nDescription: A molecule that is causing a skin reaction.\nMolecule InChI: InChI=1S\/C5H8O\/c1-3-5(2)4-6\/h3-4H,1-2H3\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not causing a skin reaction."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is causing a skin reaction.\nMolecule InChI: InChI=1S\/C3H8O3\/c4-1-3(6)2-5\/h3-6H,1-2H2\nConstraint: Answer the question in a definite sentence.\nResult: This molecule is not causing a skin reaction."}", "/scratch/micpie/export/skin_reaction/valid_0-12.jsonl": "{"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be causing a skin reaction.\nAssistant: Ok, this InChI is not causing a skin reaction: InChI=1S\/C28H26NOP\/c30-28-27(21-22-29(28)23-13-5-1-6-14-23)31(24-15-7-2-8-16-24,25-17-9-3-10-18-25)26-19-11-4-12-20-26\/h1-20,27,31H,21-22H2"} {"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be causing a skin reaction.\nAssistant: Got it, this DeepSMILES is not causing a skin reaction: COcccC)ccOcccccCF)F)F))c6)))))))cOC))ccN)c6n%10"}", "/scratch/micpie/export/skin_reaction/train_0-2.jsonl": "{"text":"Based on the canonical SMILES CC=C(C)C=O, there is no skin reaction."} {"text":"Based on the canonical SMILES OCC(O)CO, there is no agent induced skin reaction."}", "/scratch/micpie/export/skin_reaction/test_0-11.jsonl": "{"text":"User: I want to generate a molecule InChI.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should not be causing a skin reaction.\nAssistant: Got it, here you go, this InChI is not causing a skin reaction: InChI=1S\/C17H17BrO4\/c1-3-8-22-12-5-7-15(18)14(9-12)17(20)13-6-4-11(21-2)10-16(13)19\/h4-7,9-10,19H,3,8H2,1-2H3"} {"text":"User: I want to generate a molecule InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should be causing a skin reaction.\nAssistant: Ok, here you go, this InChI is causing a skin reaction: InChI=1S\/C14H20O\/c1-11(10-15)9-12-5-7-13(8-6-12)14(2,3)4\/h5-8,10-11H,9H2,1-4H3"}", "/scratch/micpie/export/skin_reaction/train_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the SMILES CC=C(C)C=O is causing a skin reaction?\nAssistant: No, this molecule is not causing a skin reaction."} {"text":"User: Can you tell me if the molecule with the SELFIES [O][C][C][Branch1][C][O][C][O] is causing a skin reaction?\nAssistant: No, this molecule is not causing a skin reaction."}", "/scratch/micpie/export/skin_reaction/train_0-11.jsonl": "{"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be causing a skin reaction.\nAssistant: Ok, here you go, this InChI is not causing a skin reaction: InChI=1S\/C5H8O\/c1-3-5(2)4-6\/h3-4H,1-2H3"} {"text":"User: I want to generate a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should not be causing a skin reaction.\nAssistant: Got it, here you go, this canonical SMILES is not causing a skin reaction: OCC(O)CO"}", "/scratch/micpie/export/skin_reaction/train_0-1.jsonl": "{"text":"The molecule with the SELFIES [C][C][=C][Branch1][C][C][C][=O] is not causing a skin reaction."} {"text":"The molecule with the InChI InChI=1S\/C3H8O3\/c4-1-3(6)2-5\/h3-6H,1-2H2 is not causing a skin reaction."}", "/scratch/micpie/export/skin_reaction/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES CC=C(C)C=O causing a skin reaction?\nConstraint: Even if you are not sure, you must pick either A or B without using any other words.\nOptions:\nA) True\nB) False\nAnswer: B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES [O][C][C][Branch1][C][O][C][O] causing a skin reaction?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any other words.\nOptions:\n1) False\n2) True\nAnswer: 1"}", "/scratch/micpie/export/skin_reaction/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is causing a skin reaction.\nSELFIES: [C][C][=C][Branch1][C][C][C][=O]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is causing a skin reaction.\nMolecule InChI: InChI=1S\/C3H8O3\/c4-1-3(6)2-5\/h3-6H,1-2H2\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/skin_reaction/test_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the canonical SMILES CCCOc1ccc(Br)c(C(=O)c2ccc(OC)cc2O)c1 is causing a skin reaction?\nAssistant: No, this molecule is not causing a 
skin reaction."} {"text":"User: Can you tell me if the molecule with the SELFIES [C][C][Branch1][Ring1][C][=O][C][C][=C][C][=C][Branch1][=Branch2][C][Branch1][C][C][Branch1][C][C][C][C][=C][Ring1][#Branch2] is causing a skin reaction?\nAssistant: Yes, this molecule is causing a skin reaction."}", "/scratch/micpie/export/skin_reaction/train_0-9.jsonl": "{"text":"User: Can you give me the SMILES of a molecule that is not causing a skin reaction?\nAssistant: Sure, here you go: CC=C(C)C=O"} {"text":"User: Can you give me the canonical SMILES of a molecule that is not causing a skin reaction?\nAssistant: Sure, here you go: OCC(O)CO"}", "/scratch/micpie/export/skin_reaction/valid_0-3.jsonl": "{"text":"The molecule DeepSMILES O=CC[PH]cccccc6))))))cccccc6))))))cccccc6)))))))CCN5cccccc6 does not causes a drug induced skin immune reaction."} {"text":"The molecule canonical SMILES COc1cc(C)c2c(Oc3cccc(C(F)(F)F)c3)c(OC)cc(N)c2n1 does not causes a skin sensitization."}", "/scratch/micpie/export/skin_reaction/test_0-8.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C17H17BrO4\/c1-3-8-22-12-5-7-15(18)14(9-12)17(20)13-6-4-11(21-2)10-16(13)19\/h4-7,9-10,19H,3,8H2,1-2H3 causing a skin reaction?\nAssistant: No, it is not causing a skin reaction."} {"text":"User: Is the molecule with the SELFIES [C][C][Branch1][Ring1][C][=O][C][C][=C][C][=C][Branch1][=Branch2][C][Branch1][C][C][Branch1][C][C][C][C][=C][Ring1][#Branch2] causing a skin reaction?\nAssistant: Yes, it is causing a skin reaction."}", "/scratch/micpie/export/skin_reaction/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not causing a skin reaction?\nConstraint: You must select none, one or more options from a or b without using any other words.\nOptions:\n[a] O=CNcccCO)CBr)))ccc6OCcccccc6\n[b] CCCOccccBr)cC=O)ccccOC))cc6O))))))))c6\nAnswer: a, b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are causing a 
skin reaction?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any additional words.\nOptions:\n[1] [C][C][C][C][C][C][=C][C][C][=C][C][C][C][C][C][C][C][C][=Branch1][C][=O][O]\n[2] [C][C][=Branch1][C][=O][O][C][C][=Branch1][C][=O][C][Branch1][C][O][C][Branch1][C][C][C][C][C][C][C][C][=C][C][=Branch1][C][=O][C][=C][C][Ring1][#Branch1][Branch1][C][C][C][Ring1][N][=C][C][C][Ring1][S][Ring2][Ring1][Branch1][C]\n[3] [C][C][=C][C][=C][Branch1][=Branch1][C][Branch1][C][C][C][C][C][Ring1][=Branch2]\n[4] [C][C][Branch1][Ring1][C][=O][C][C][=C][C][=C][Branch1][=Branch2][C][Branch1][C][C][Branch1][C][C][C][C][=C][Ring1][#Branch2]\nAnswer: 1, 3, 4"}", "/scratch/micpie/export/skin_reaction/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not causing a skin reaction?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any additional words.\nOptions:\n(1) [O][=C][C][Branch2][Ring1][=C][PH1][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][N][Ring2][Ring1][Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1]\n(2) [C][=C][C][C][=C][C][Branch1][C][C][=C][Branch1][C][O][C][Branch1][Ring1][O][C][=C][Ring1][#Branch2]\n(3) [O][C][C][C][C][C][C][C][C][C][C][C][C][Br]\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not causing a skin reaction?\nConstraint: You must select none, one or more options from 1, 2, 3, 4, or 5 without using any additional words.\nOptions:\n(1) C=C1CC=C(C(C)C)CC1\n(2) COC(=O)CCC(C#N)(CCC(=O)OC)c1ccc(OC)c(OC2CCCC2)c1\n(3) CC(C)(C)OC(=O)NC(C(=O)O)C(c1ccc(F)cc1)c1ccc(F)cc1\n(4) COc1cc(C)c2c(Oc3cccc(C(F)(F)F)c3)c(OC)cc(N)c2n1\n(5) CCCCCCCC(Br)CCCCCC\nAnswer: 1, 2, 3, 4"}", "/scratch/micpie/export/skin_reaction/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on 
the description.\nDescription: A molecule that is causing a skin reaction.\nMolecule InChI: InChI=1S\/C17H17BrO4\/c1-3-8-22-12-5-7-15(18)14(9-12)17(20)13-6-4-11(21-2)10-16(13)19\/h4-7,9-10,19H,3,8H2,1-2H3\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is causing a skin reaction.\nInChI: InChI=1S\/C14H20O\/c1-11(10-15)9-12-5-7-13(8-6-12)14(2,3)4\/h5-8,10-11H,9H2,1-4H3\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"}", "/scratch/micpie/export/skin_reaction/test_0-12.jsonl": "{"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be causing a skin reaction.\nAssistant: Understood, this DeepSMILES is not causing a skin reaction: CCCOccccBr)cC=O)ccccOC))cc6O))))))))c6"} {"text":"User: I want to generate a InChI.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should be causing a skin reaction.\nAssistant: Ok, this InChI is causing a skin reaction: InChI=1S\/C14H20O\/c1-11(10-15)9-12-5-7-13(8-6-12)14(2,3)4\/h5-8,10-11H,9H2,1-4H3"}", "/scratch/micpie/export/bio_ner_39/train_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: The sequences identified above and the following protein or predicted translations were used for phylogenetic analysis: hsTACC1A (NP _ 006274), hsTACC2l (AAO62630) hsTACC2s (AAO62629), hsTACC3 (NP _ 006333), mmTACC3 (Q9JJ11), xlMaskin (Q9PTG8), dmTACC (AAF52099), ceTAC1 (NP _ 497059), scSPC72 (NP _ 009352), hsRHAMM (NP _ 036616), mmRHAMM (NP _ 038580), rnRHAMM (NP _ 037096), drRHAMM (AAQ97980), hsKeratin (CAB76828), mmKeratin (A61368), rnKeratin (XP _ 235679), hsTPM1 (NP _ 000357), mmTPM1 (NP _ 077745), rnTPM1 (NP _ 62004, drTPM1 (NP _ 571180) dmTPM1 (P06754), ceTPM (NP _ 493540) scTPM1 (P17536), hsKLP2 (BAB03309), rnKIF15 (AAP44513), xlKLP2 (CAA08879), dmKLP2 (NP _ 476818), ceKLP18 (AA034669), hsKIF3A (Q9Y496), mmKIF3A (NP _ 032469), rnKIF3A (XP _ 340797), xlKIF3A (CAA08879), ceKLP11 (NP _ 741473), ciKIF3 (ci0100148992), hsKIF3B (NP _ 004789), mmKIF3B (NP _ 004789), rnKIF3B (XP _ 215883), dmKIF3B (NP _ 524029), hsKIF3C (NP _ 002245), mmKIF3C (NP _ 032471), rnKIF3C (NP _ 445938), dmKIF3C (NP _ 651939)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: 
sequences,4,13,Sequence\nprotein,49,56,Chemical\npredicted,60,69,Sequence\nhsTACC3,188,195,Gene_or_geneproduct\nmmTACC3,212,219,Gene_or_geneproduct\nxlMaskin,231,239,Gene_or_geneproduct\ndmTACC,251,257,Gene_or_geneproduct\nceTAC1,271,277,Gene_or_geneproduct\nscSPC72,294,301,Gene_or_geneproduct\nhsRHAMM,318,325,Gene_or_geneproduct\nmmRHAMM,342,349,Gene_or_geneproduct\nrnRHAMM,366,373,Gene_or_geneproduct\ndrRHAMM,390,397,Gene_or_geneproduct\nhsKeratin,411,420,Gene_or_geneproduct\nhsTPM1,481,487,Gene_or_geneproduct\nmmTPM1,504,510,Gene_or_geneproduct\nrnTPM1,527,533,Gene_or_geneproduct\ndrTPM1,548,554,Gene_or_geneproduct\ndmTPM1,570,576,Gene_or_geneproduct\nceTPM,588,593,Gene_or_geneproduct\nscTPM1,609,615,Gene_or_geneproduct\nhsKLP2,627,633,Gene_or_geneproduct\nrnKIF15,647,654,Gene_or_geneproduct\nxlKLP2,668,674,Gene_or_geneproduct\ndmKLP2,688,694,Gene_or_geneproduct\nceKLP18,711,718,Gene_or_geneproduct\nhsKIF3A,732,739,Gene_or_geneproduct\nmmKIF3A,751,758,Gene_or_geneproduct\nrnKIF3A,775,782,Gene_or_geneproduct\nxlKIF3A,799,806,Gene_or_geneproduct\nceKLP11,820,827,Gene_or_geneproduct\nhsKIF3B,868,875,Gene_or_geneproduct\nmmKIF3B,892,899,Gene_or_geneproduct\nrnKIF3B,916,923,Gene_or_geneproduct\ndmKIF3B,940,947,Gene_or_geneproduct\nhsKIF3C,964,971,Gene_or_geneproduct\nmmKIF3C,988,995,Gene_or_geneproduct\nrnKIF3C,1012,1019,Gene_or_geneproduct\ndmKIF3C,1036,1043,Gene_or_geneproduct"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: (A) Survival curves of unirradiated Ptc1+\/-, Rad54-\/-\/ Ptc1+\/-, Rad54-\/-\/ Parp-1+\/-\/ Ptc1+\/-, Parp-1-\/-\/ Ptc1+\/-and Rad54+\/-\/ Parp-1-\/-\/ Ptc1+\/-mice, showing significant lifespan shortening after inactivation of Parp-1. 
(B) Survival curves of Ptc1+\/-, Rad54-\/-\/ Ptc1+\/-, Rad54-\/-\/ Parp-1+\/-\/ Ptc1+\/-, Parp-1-\/-\/ Ptc1+\/-and Rad54+\/-\/ Parp-1-\/-\/ Ptc1+\/-mice irradiated with 1Gy at P1, all showing significant lifespan shortening compared to control mice with exception of Parp-1 null mice (Rad54+\/+ or+\/-\/ Parp-1-\/-\/ Ptc1+\/-). (C) Median survival of unirradiated and irradiated mice. (D) Effect of Rad54 and Parp-1 inactivation on spontaneous and (E) radiation-induced medulloblastoma tumorigenesis. (F) Percent incidence of medulloblastoma, sarcoma and other tumors for each mouse group. * P ≤ 0.05; ** P ≤ 0.005; *** P ≤ 0.0001..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: Ptc1,37,41,Gene\/Protein\nRad54,47,52,Gene\/Protein\nPtc1,58,62,Gene\/Protein\nRad54,68,73,Gene\/Protein\nParp - 1,79,87,Gene\/Protein\nPtc1,93,97,Gene\/Protein\nParp - 1,103,111,Gene\/Protein\nPtc1,117,121,Gene\/Protein\nRad54,130,135,Gene\/Protein\nParp - 1,141,149,Gene\/Protein\nPtc1,155,159,Gene\/Protein\nmice,164,168,Organism\/Species\nParp - 1,232,240,Gene\/Protein\nPtc1,266,270,Gene\/Protein\nRad54,276,281,Gene\/Protein\nPtc1,287,291,Gene\/Protein\nRad54,297,302,Gene\/Protein\nParp - 1,308,316,Gene\/Protein\nPtc1,322,326,Gene\/Protein\nParp - 1,332,340,Gene\/Protein\nPtc1,346,350,Gene\/Protein\nRad54,359,364,Gene\/Protein\nParp - 1,370,378,Gene\/Protein\nPtc1,384,388,Gene\/Protein\nmice,393,397,Organism\/Species\nmice,489,493,Organism\/Species\nParp - 1,512,520,Gene\/Protein\nmice,526,530,Organism\/Species\nRad54,533,538,Gene\/Protein\nParp - 1,551,559,Gene\/Protein\nPtc1,565,569,Gene\/Protein\nmice,628,632,Organism\/Species\nRad54,649,654,Gene\/Protein\nParp - 1,659,667,Gene\/Protein\nmedulloblastoma,725,740,Disease\/Disorder\nmedulloblastoma,782,797,Disease\/Disorder\nsarcoma,799,806,Disease\/Disorder\ntumors,817,823,Disease\/Disorder\nmouse,833,838,Organism\/Species"}", 
"/scratch/micpie/export/choline_transporter_butkiewicz/test_0-10.jsonl": "{"text":"User: I'm looking for the InChI of a molecule that is not inhibiting the choline transporter activity?\nAssistant: This is a molecule that is not inhibiting the choline transporter activity: InChI=1S\/C16H12N2O2\/c1-10-4-2-6-12-13(16(19)20)8-14(18-15(10)12)11-5-3-7-17-9-11\/h2-9H,1H3,(H,19,20)"} {"text":"User: I'm looking for the DeepSMILES of a molecule that is not inhibiting the choline transporter activity?\nAssistant: This is a molecule that is not inhibiting the choline transporter activity: S=O)=O)NccccOCC=O)NCCNC)C))CCCCC6)))))))))))cc6))))))C))cccccc6))C"}", "/scratch/micpie/export/choline_transporter_butkiewicz/valid_0-8.jsonl": "{"text":"User: Is the molecule with the SELFIES [S][C][=C][Branch2][Ring1][O][N][=C][Branch1][Ring1][S][C][N][Branch1][Branch1][C][Ring1][Branch2][=O][C][=C][C][=C][Branch1][Ring2][O][C][C][C][=C][Ring1][=Branch2][C][C][Ring2][Ring1][Branch1] inhibiting the choline transporter activity?\nAssistant: No, it is not inhibiting the choline transporter activity."} {"text":"User: Is the molecule with the DeepSMILES S=O)=O)NCC=O)Nccccnc6)))))))))ccccF)cc6)))))))cccOCCOc6cc%10 inhibiting the choline transporter activity?\nAssistant: No, it is not inhibiting the choline transporter activity."}", "/scratch/micpie/export/choline_transporter_butkiewicz/train_0-8.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C11H11F3N2S\/c12-11(13,14)8-3-1-4-9(7-8)16-10-15-5-2-6-17-10\/h1,3-4,7H,2,5-6H2,(H,15,16) inhibiting the choline transporter activity?\nAssistant: No, it is not inhibiting the choline transporter activity."} {"text":"User: Is the molecule with the canonical SMILES O=S(=O)(NCc1ccccc1Br)c1cccs1 inhibiting the choline transporter activity?\nAssistant: No, it is not inhibiting the choline transporter activity."}", "/scratch/micpie/export/choline_transporter_butkiewicz/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the 
description.\nDescription: A molecule that is inhibiting the choline transporter activity.\nMolecule InChI: InChI=1S\/C16H12N2O2\/c1-10-4-2-6-12-13(16(19)20)8-14(18-15(10)12)11-5-3-7-17-9-11\/h2-9H,1H3,(H,19,20)\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting the choline transporter activity."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the choline transporter activity.\nSMILES: S(=O)(=O)(N(c1ccc(OCC(=O)NCC2(N(C)C)CCCCC2)cc1)C)c1ccc(cc1)C\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting the choline transporter activity."}", "/scratch/micpie/export/choline_transporter_butkiewicz/valid_0-9.jsonl": "{"text":"User: Can you create the DeepSMILES of a molecule that is not inhibiting the choline transporter activity?\nAssistant: Yes, here you go: SccncSC))nc6=O))ccccOCC)))cc6)))))))))CC5"} {"text":"User: Can you generate the InChI of a molecule that is not inhibiting the choline transporter activity?\nAssistant: Sure, here you go: InChI=1S\/C21H18FN3O5S\/c22-15-3-5-17(6-4-15)25(14-21(26)24-16-2-1-9-23-13-16)31(27,28)18-7-8-19-20(12-18)30-11-10-29-19\/h1-9,12-13H,10-11,14H2,(H,24,26)"}", "/scratch/micpie/export/choline_transporter_butkiewicz/test_0-1.jsonl": "{"text":"Based on the InChI InChI=1S\/C16H12N2O2\/c1-10-4-2-6-12-13(16(19)20)8-14(18-15(10)12)11-5-3-7-17-9-11\/h2-9H,1H3,(H,19,20), the molecule displays no inhibition of choline transporter activity."} {"text":"Based on the DeepSMILES S=O)=O)NccccOCC=O)NCCNC)C))CCCCC6)))))))))))cc6))))))C))cccccc6))C, the molecule displays no inhibition of choline transporter activity."}", "/scratch/micpie/export/choline_transporter_butkiewicz/valid_0-0.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C15H16N2O2S2\/c1-3-19-11-6-4-10(5-7-11)17-14(18)13-12(8-9-21-13)16-15(17)20-2\/h4-7H,3,8-9H2,1-2H3 displays no inhibition of choline transporter activity."} {"text":"The 
molecule with the DeepSMILES S=O)=O)NCC=O)Nccccnc6)))))))))ccccF)cc6)))))))cccOCCOc6cc%10 shows no inhibition of choline transporter activity."}", "/scratch/micpie/export/choline_transporter_butkiewicz/test_0-2.jsonl": "{"text":"The DeepSMILES OC=O)cccncc6)ccccnc6))))))))cccc6)))C is from a molecule that displays no inhibition of choline transporter activity."} {"text":"The SMILES S(=O)(=O)(N(c1ccc(OCC(=O)NCC2(N(C)C)CCCCC2)cc1)C)c1ccc(cc1)C represents a molecule that shows no inhibition of choline transporter activity."}", "/scratch/micpie/export/choline_transporter_butkiewicz/valid_0-10.jsonl": "{"text":"User: I'm looking for the IUPAC name of a molecule that is not inhibiting the choline transporter activity?\nAssistant: This is a molecule that is not inhibiting the choline transporter activity: 3-(4-ethoxyphenyl)-2-methylsulfanyl-6,7-dihydrothieno[2,3-e]pyrimidin-4-one"} {"text":"User: I'm searching for the SELFIES of a molecule that is not inhibiting the choline transporter activity?\nAssistant: This is a molecule that is not inhibiting the choline transporter activity: [S][=Branch1][C][=O][=Branch1][C][=O][Branch2][Ring1][=N][N][Branch1][#C][C][C][=Branch1][C][=O][N][C][=C][C][=C][N][=C][Ring1][=Branch1][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][=C][C][O][C][C][O][C][=Ring1][=Branch1][C][=C][Ring1][#Branch2]"}", "/scratch/micpie/export/choline_transporter_butkiewicz/train_0-6.jsonl": "{"text":"Task: Please create a molecule SMILES based on the text description.\nDescription: A molecule that is inhibiting the choline transporter activity.\nResult: S1CCCN=C1Nc1cc(ccc1)C(F)(F)F"} {"text":"Task: Please give me a InChI based on the description.\nDescription: A molecule that is inhibiting the choline transporter activity.\nResult: InChI=1S\/C11H10BrNO2S2\/c12-10-5-2-1-4-9(10)8-13-17(14,15)11-6-3-7-16-11\/h1-7,13H,8H2"}", "/scratch/micpie/export/choline_transporter_butkiewicz/valid_0-6.jsonl": "{"text":"Task: Please generate a molecule InChI based on 
the description.\nDescription: A molecule that is inhibiting the choline transporter activity.\nResult: InChI=1S\/C15H16N2O2S2\/c1-3-19-11-6-4-10(5-7-11)17-14(18)13-12(8-9-21-13)16-15(17)20-2\/h4-7H,3,8-9H2,1-2H3"} {"text":"Task: Please give me a molecule SELFIES based on the text description below.\nDescription: A molecule that is inhibiting the choline transporter activity.\nResult: [S][=Branch1][C][=O][=Branch1][C][=O][Branch2][Ring1][=N][N][Branch1][#C][C][C][=Branch1][C][=O][N][C][=C][C][=C][N][=C][Ring1][=Branch1][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][=C][C][O][C][C][O][C][=Ring1][=Branch1][C][=C][Ring1][#Branch2]"}", "/scratch/micpie/export/choline_transporter_butkiewicz/test_0-9.jsonl": "{"text":"User: Can you generate the canonical SMILES of a molecule that is not inhibiting the choline transporter activity?\nAssistant: Of course, here you go: Cc1cccc2c(C(=O)O)cc(-c3cccnc3)nc12"} {"text":"User: Can you create the SMILES of a molecule that is not inhibiting the choline transporter activity?\nAssistant: Of course, here you go: S(=O)(=O)(N(c1ccc(OCC(=O)NCC2(N(C)C)CCCCC2)cc1)C)c1ccc(cc1)C"}", "/scratch/micpie/export/choline_transporter_butkiewicz/test_0-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of Cc1cccc2c(C(=O)O)cc(-c3cccnc3)nc12 shows no inhibition of choline transporter activity."} {"text":"The molecule with the SMILES representation of S(=O)(=O)(N(c1ccc(OCC(=O)NCC2(N(C)C)CCCCC2)cc1)C)c1ccc(cc1)C shows no inhibition of choline transporter activity."}", "/scratch/micpie/export/choline_transporter_butkiewicz/valid_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the canonical SMILES CCOc1ccc(-n2c(SC)nc3c(c2=O)SCC3)cc1 is inhibiting the choline transporter activity?\nAssistant: No, this molecule is not inhibiting the choline transporter activity."} {"text":"User: Can you estimate if the molecule with the canonical SMILES O=C(CN(c1ccc(F)cc1)S(=O)(=O)c1ccc2c(c1)OCCO2)Nc1cccnc1 is inhibiting the 
choline transporter activity?\nAssistant: No, this molecule is not inhibiting the choline transporter activity."}", "/scratch/micpie/export/choline_transporter_butkiewicz/test_0-3.jsonl": "{"text":"The molecule SELFIES [O][C][=Branch1][C][=O][C][=C][C][=Branch1][S][=N][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][C][=C][N][=C][Ring1][=Branch1][C][=Branch1][=Branch1][=C][C][=C][Ring1][#C][C] is not inhibiting the choline transporter activity."} {"text":"The molecule InChI InChI=1S\/C25H35N3O4S\/c1-20-8-14-23(15-9-20)33(30,31)28(4)21-10-12-22(13-11-21)32-18-24(29)26-19-25(27(2)3)16-6-5-7-17-25\/h8-15H,5-7,16-19H2,1-4H3,(H,26,29) is not inhibiting the choline transporter activity."}", "/scratch/micpie/export/choline_transporter_butkiewicz/valid_0-11.jsonl": "{"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be inhibiting the choline transporter activity.\nAssistant: Ok, here you go, this SMILES is not inhibiting the choline transporter activity: S1c2c(nc(SC)n(c2=O)c2ccc(OCC)cc2)CC1"} {"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should not be inhibiting the choline transporter activity.\nAssistant: Ok, this DeepSMILES is not inhibiting the choline transporter activity: S=O)=O)NCC=O)Nccccnc6)))))))))ccccF)cc6)))))))cccOCCOc6cc%10"}", "/scratch/micpie/export/choline_transporter_butkiewicz/train_0-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of SCCCN=C6Ncccccc6)))CF)F)F displays no inhibition of choline transporter activity."} {"text":"The molecule with the canonical SMILES O=S(=O)(NCc1ccccc1Br)c1cccs1 shows no inhibition of choline transporter activity."}", "/scratch/micpie/export/choline_transporter_butkiewicz/test_0-6.jsonl": "{"text":"Task: Please generate a InChI based on the text description.\nDescription: A molecule that is inhibiting the choline transporter activity.\nResult: InChI=1S\/C16H12N2O2\/c1-10-4-2-6-12-13(16(19)20)8-14(18-15(10)12)11-5-3-7-17-9-11\/h2-9H,1H3,(H,19,20)"} {"text":"Task: Please generate a molecule InChI based on the text description below.\nDescription: A molecule that is inhibiting the choline transporter activity.\nResult: InChI=1S\/C25H35N3O4S\/c1-20-8-14-23(15-9-20)33(30,31)28(4)21-10-12-22(13-11-21)32-18-24(29)26-19-25(27(2)3)16-6-5-7-17-25\/h8-15H,5-7,16-19H2,1-4H3,(H,26,29)"}", "/scratch/micpie/export/choline_transporter_butkiewicz/train_0-10.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that is not inhibiting the choline transporter activity?\nAssistant: This is a molecule that is not inhibiting the choline transporter activity: S1CCCN=C1Nc1cc(ccc1)C(F)(F)F"} {"text":"User: I'm looking for the SMILES of a molecule that is not inhibiting the choline transporter activity?\nAssistant: This is a molecule that is not inhibiting the choline transporter activity: Brc1c(CNS(=O)(=O)c2sccc2)cccc1"}", "/scratch/micpie/export/choline_transporter_butkiewicz/train_0-3.jsonl": "{"text":"The molecule SELFIES 
[S][C][C][C][N][=C][Ring1][=Branch1][N][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][F][Branch1][C][F][F] is not inhibiting the choline transporter activity."} {"text":"The molecule SMILES Brc1c(CNS(=O)(=O)c2sccc2)cccc1 is not inhibiting the choline transporter activity."}", "/scratch/micpie/export/choline_transporter_butkiewicz/train_0-12.jsonl": "{"text":"User: I want to generate a SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be inhibiting the choline transporter activity.\nAssistant: Ok, this SMILES is not inhibiting the choline transporter activity: S1CCCN=C1Nc1cc(ccc1)C(F)(F)F"} {"text":"User: I want to generate a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should not be inhibiting the choline transporter activity.\nAssistant: Ok, this canonical SMILES is not inhibiting the choline transporter activity: O=S(=O)(NCc1ccccc1Br)c1cccs1"}", "/scratch/micpie/export/choline_transporter_butkiewicz/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the choline transporter activity?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any additional words.\nOptions:\n[1] COc1cc(NC(=O)Cc2nn(C)c(=O)c3ccccc23)cc(OC)c1OC\n[2] Cc1cccc2c(C(=O)O)cc(-c3cccnc3)nc12\n[3] O=C(NCC1CC1)c1cc([N+](=O)[O-])cc([N+](=O)[O-])c1\nAnswer: 1, 2, 3"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the choline transporter activity?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any additional words.\nOptions:\nA. S1(=O)(=O)c2c(C(=O)c3c1cccc3)ccc(c2)C(=O)Nc1ccc(S(O)(=O)=O)cc1\nB. S(=O)(=O)(N(c1ccc(OCC(=O)NCC2(N(C)C)CCCCC2)cc1)C)c1ccc(cc1)C\nC. S(c1n(c(nn1)CNC(=O)C12CC3CC(C1)CC(C2)C3)CC)C\nD. 
s1c2CCCc2c(c1NC(=O)CSc1n(c(nn1)C1Oc2c(OC1)cccc2)CC=C)C#N\nE. S(CC(OC(C)C)=O)c1oc(nn1)c1c(OC)cccc1\nAnswer: A, B, C, D, E"}", "/scratch/micpie/export/choline_transporter_butkiewicz/valid_0-2.jsonl": "{"text":"The canonical SMILES CCOc1ccc(-n2c(SC)nc3c(c2=O)SCC3)cc1 is from a molecule that exhibits no inhibition of choline transporter activity."} {"text":"The SELFIES [S][=Branch1][C][=O][=Branch1][C][=O][Branch2][Ring1][=N][N][Branch1][#C][C][C][=Branch1][C][=O][N][C][=C][C][=C][N][=C][Ring1][=Branch1][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][=C][C][O][C][C][O][C][=Ring1][=Branch1][C][=C][Ring1][#Branch2] is from a molecule that shows no inhibition of choline transporter activity."}", "/scratch/micpie/export/choline_transporter_butkiewicz/valid_0-1.jsonl": "{"text":"Based on the SMILES S1c2c(nc(SC)n(c2=O)c2ccc(OCC)cc2)CC1, the molecule shows no inhibition of choline transporter activity."} {"text":"Based on the SMILES S(=O)(=O)(N(CC(=O)Nc1cccnc1)c1ccc(F)cc1)c1cc2OCCOc2cc1, the molecule displays no inhibition of choline transporter activity."}", "/scratch/micpie/export/choline_transporter_butkiewicz/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the choline transporter activity?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any other words.\nOptions:\n[A] 3-(4-ethoxyphenyl)-2-methylsulfanyl-6,7-dihydrothieno[2,3-e]pyrimidin-4-one\n[B] nan\n[C] nan\n[D] nan\n[E] nan\nAnswer: A, B, C, D, E"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the choline transporter activity?\nConstraint: You must select none, one or more options from A, B, or C without using any other words.\nOptions:\nA) O=C(CN(c1ccc(F)cc1)S(=O)(=O)c1ccc2c(c1)OCCO2)Nc1cccnc1\nB) COc1ccccc1C(=O)COC(=O)Cc1n[nH]c(=O)c2ccccc12\nC) OC(CNC1CCCC1)COCCOc1ccc(Br)cc1\nAnswer: A, B, C"}", 
"/scratch/micpie/export/choline_transporter_butkiewicz/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the choline transporter activity.\nDeepSMILES: SccncSC))nc6=O))ccccOCC)))cc6)))))))))CC5\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting the choline transporter activity."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the choline transporter activity.\nDeepSMILES: S=O)=O)NCC=O)Nccccnc6)))))))))ccccF)cc6)))))))cccOCCOc6cc%10\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting the choline transporter activity."}", "/scratch/micpie/export/choline_transporter_butkiewicz/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the choline transporter activity.\nInChI: InChI=1S\/C15H16N2O2S2\/c1-3-19-11-6-4-10(5-7-11)17-14(18)13-12(8-9-21-13)16-15(17)20-2\/h4-7H,3,8-9H2,1-2H3\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the choline transporter activity.\nMolecule DeepSMILES: S=O)=O)NCC=O)Nccccnc6)))))))))ccccF)cc6)))))))cccOCCOc6cc%10\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/choline_transporter_butkiewicz/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the choline transporter activity.\nMolecule SMILES: S1CCCN=C1Nc1cc(ccc1)C(F)(F)F\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting the choline transporter activity."} {"text":"Task: 
Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the choline transporter activity.\nMolecule SMILES: Brc1c(CNS(=O)(=O)c2sccc2)cccc1\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not inhibiting the choline transporter activity."}", "/scratch/micpie/export/choline_transporter_butkiewicz/valid_0-12.jsonl": "{"text":"User: I want to create a SELFIES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be inhibiting the choline transporter activity.\nAssistant: Ok, this SELFIES is not inhibiting the choline transporter activity: [S][C][=C][Branch2][Ring1][O][N][=C][Branch1][Ring1][S][C][N][Branch1][Branch1][C][Ring1][Branch2][=O][C][=C][C][=C][Branch1][Ring2][O][C][C][C][=C][Ring1][=Branch2][C][C][Ring2][Ring1][Branch1]"} {"text":"User: I want to generate a InChI.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be inhibiting the choline transporter activity.\nAssistant: Understood, this InChI is not inhibiting the choline transporter activity: InChI=1S\/C21H18FN3O5S\/c22-15-3-5-17(6-4-15)25(14-21(26)24-16-2-1-9-23-13-16)31(27,28)18-7-8-19-20(12-18)30-11-10-29-19\/h1-9,12-13H,10-11,14H2,(H,24,26)"}", "/scratch/micpie/export/choline_transporter_butkiewicz/train_0-2.jsonl": "{"text":"The canonical SMILES FC(F)(F)c1cccc(NC2=NCCCS2)c1 represents a molecule that displays no inhibition of choline transporter activity."} {"text":"The InChI InChI=1S\/C11H10BrNO2S2\/c12-10-5-2-1-4-9(10)8-13-17(14,15)11-6-3-7-16-11\/h1-7,13H,8H2 is from a molecule that shows no inhibition of choline transporter activity."}", "/scratch/micpie/export/choline_transporter_butkiewicz/test_0-11.jsonl": "{"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should not be inhibiting the choline transporter activity.\nAssistant: Got it, this canonical SMILES is not inhibiting the choline transporter activity: Cc1cccc2c(C(=O)O)cc(-c3cccnc3)nc12"} {"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be inhibiting the choline transporter activity.\nAssistant: Got it, this SMILES is not inhibiting the choline transporter activity: S(=O)(=O)(N(c1ccc(OCC(=O)NCC2(N(C)C)CCCCC2)cc1)C)c1ccc(cc1)C"}", "/scratch/micpie/export/choline_transporter_butkiewicz/train_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the SELFIES [S][C][C][C][N][=C][Ring1][=Branch1][N][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][F][Branch1][C][F][F] is inhibiting the choline transporter activity?\nAssistant: No, this molecule is not inhibiting the choline transporter activity."} {"text":"User: Can you estimate if the molecule with the DeepSMILES BrccCNS=O)=O)csccc5))))))))cccc6 is inhibiting the choline transporter activity?\nAssistant: No, this molecule is not inhibiting the choline transporter activity."}", "/scratch/micpie/export/choline_transporter_butkiewicz/train_0-11.jsonl": "{"text":"User: I want to generate a molecule InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be inhibiting the choline transporter activity.\nAssistant: Ok, this InChI is not inhibiting the choline transporter activity: InChI=1S\/C11H11F3N2S\/c12-11(13,14)8-3-1-4-9(7-8)16-10-15-5-2-6-17-10\/h1,3-4,7H,2,5-6H2,(H,15,16)"} {"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should not be inhibiting the choline transporter activity.\nAssistant: Got it, here you go, this DeepSMILES is not inhibiting the choline transporter activity: BrccCNS=O)=O)csccc5))))))))cccc6"}", "/scratch/micpie/export/choline_transporter_butkiewicz/train_0-1.jsonl": "{"text":"Based on the SELFIES representation [S][C][C][C][N][=C][Ring1][=Branch1][N][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][F][Branch1][C][F][F], the molecule shows no inhibition of choline transporter activity."} {"text":"Based on the InChI representation InChI=1S\/C11H10BrNO2S2\/c12-10-5-2-1-4-9(10)8-13-17(14,15)11-6-3-7-16-11\/h1-7,13H,8H2, the molecule shows no inhibition of choline transporter activity."}", "/scratch/micpie/export/choline_transporter_butkiewicz/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the choline transporter activity?\nConstraint: You must select none, one or more options from a, b, c, d, or e without using any additional words.\nOptions:\na Clc1c(Oc2ccc(S(=O)(=O)NCc3ccc(cc3)C(O)=O)cc2)cccc1\nb s1c(NC(=O)c2ccc(OCC)cc2)nc(c1C(OC)=O)C\nc o1c2c(cc(c3cc(N)ccc3)c1=O)cccc2\nd O(C(C)(C)C)C(=O)C(NC(=O)c1nc[nH]c1C(=O)NCc1ccccc1)CC(C)C\ne S1CCCN=C1Nc1cc(ccc1)C(F)(F)F\nAnswer: a, b, c, d, e"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the choline transporter activity?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any other words.\nOptions:\n1) OCCCCCCC6))))))cccccc6))))))C#CCNCCO)))CCO\n2) BrccCNS=O)=O)csccc5))))))))cccc6\n3) S=O)=O)NCCCCC6))C=O)NCCccC6)cccc6))))))))))))))c[nH]cnc5\nAnswer: 1, 2, 3"}", "/scratch/micpie/export/choline_transporter_butkiewicz/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the choline transporter activity.\nMolecule SELFIES: 
[S][C][C][C][N][=C][Ring1][=Branch1][N][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][F][Branch1][C][F][F]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the choline transporter activity.\nMolecule DeepSMILES: BrccCNS=O)=O)csccc5))))))))cccc6\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/choline_transporter_butkiewicz/test_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the DeepSMILES OC=O)cccncc6)ccccnc6))))))))cccc6)))C is inhibiting the choline transporter activity?\nAssistant: No, this molecule is not inhibiting the choline transporter activity."} {"text":"User: Can you estimate if the molecule with the canonical SMILES Cc1ccc(S(=O)(=O)N(C)c2ccc(OCC(=O)NCC3(N(C)C)CCCCC3)cc2)cc1 is inhibiting the choline transporter activity?\nAssistant: No, this molecule is not inhibiting the choline transporter activity."}", "/scratch/micpie/export/choline_transporter_butkiewicz/train_0-9.jsonl": "{"text":"User: Can you generate the SELFIES of a molecule that is not inhibiting the choline transporter activity?\nAssistant: Yes, here you go: [S][C][C][C][N][=C][Ring1][=Branch1][N][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][F][Branch1][C][F][F]"} {"text":"User: Can you generate the DeepSMILES of a molecule that is not inhibiting the choline transporter activity?\nAssistant: Sure, here you go: BrccCNS=O)=O)csccc5))))))))cccc6"}", "/scratch/micpie/export/choline_transporter_butkiewicz/valid_0-3.jsonl": "{"text":"The InChI InChI=1S\/C15H16N2O2S2\/c1-3-19-11-6-4-10(5-7-11)17-14(18)13-12(8-9-21-13)16-15(17)20-2\/h4-7H,3,8-9H2,1-2H3 is not inhibiting the choline transporter activity."} {"text":"The InChI 
InChI=1S\/C21H18FN3O5S\/c22-15-3-5-17(6-4-15)25(14-21(26)24-16-2-1-9-23-13-16)31(27,28)18-7-8-19-20(12-18)30-11-10-29-19\/h1-9,12-13H,10-11,14H2,(H,24,26) is not inhibiting the choline transporter activity."}", "/scratch/micpie/export/choline_transporter_butkiewicz/test_0-8.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C16H12N2O2\/c1-10-4-2-6-12-13(16(19)20)8-14(18-15(10)12)11-5-3-7-17-9-11\/h2-9H,1H3,(H,19,20) inhibiting the choline transporter activity?\nAssistant: No, it is not inhibiting the choline transporter activity."} {"text":"User: Is the molecule with the InChI InChI=1S\/C25H35N3O4S\/c1-20-8-14-23(15-9-20)33(30,31)28(4)21-10-12-22(13-11-21)32-18-24(29)26-19-25(27(2)3)16-6-5-7-17-25\/h8-15H,5-7,16-19H2,1-4H3,(H,26,29) inhibiting the choline transporter activity?\nAssistant: No, it is not inhibiting the choline transporter activity."}", "/scratch/micpie/export/choline_transporter_butkiewicz/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the choline transporter activity.\nMolecule SMILES: OC(=O)c1c2c(nc(c1)c1cccnc1)c(ccc2)C\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the choline transporter activity.\nMolecule DeepSMILES: S=O)=O)NccccOCC=O)NCCNC)C))CCCCC6)))))))))))cc6))))))C))cccccc6))C\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/choline_transporter_butkiewicz/test_0-12.jsonl": "{"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be inhibiting the choline transporter activity.\nAssistant: Got it, this canonical SMILES is not inhibiting the choline transporter activity: Cc1cccc2c(C(=O)O)cc(-c3cccnc3)nc12"} {"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should not be inhibiting the choline transporter activity.\nAssistant: Ok, this canonical SMILES is not inhibiting the choline transporter activity: Cc1ccc(S(=O)(=O)N(C)c2ccc(OCC(=O)NCC3(N(C)C)CCCCC3)cc2)cc1"}", "/scratch/micpie/export/bio_ner_17/valid_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Examples of neighbor clusters (NCs) enriched in differentially expressed genes are a locus involved in L-ascorbate utilization (SPy0175-SPy0179); the citrate lyase locus (SPy1186-SPy1191); the Trx chromosomal locus (SPy1582-SPy1596) that includes the recently described, CovR-repressed two-component response regulator TrxR [ 42]; and the well-studied Mga (SPy2010-SPy2025), SpeB (SPy2037-SPy2042), and capsule synthesis (SPy2200-SPy2202) loci (Fig. 3 and Fig. S4)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: L - ascorbate,104,117,Chemical\nSPy0175,132,139,Protein\nSPy0179,142,149,Protein\nSPy1186,178,185,Protein\nSPy1191,188,195,Protein\nSPy1582,226,233,Protein\nSPy1596,236,243,Protein\nCovR,283,287,Protein\nTrxR,335,339,Protein\nMga,370,373,Protein\nSPy2010,376,383,Protein\nSPy2025,386,393,Protein\nSpeB,396,400,Protein\nSPy2037,403,410,Protein\nSPy2042,413,420,Protein\nSPy2200,447,454,Protein\nSPy2202,457,464,Protein"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Selective IgG anti-HIV adherence to erythrocytes. 
Pattern of immunoglobulin G anti-HIV (IgG anti-HIV) determined by western blot in plasma of patients (IgG anti-HIV-P), plasma diluted at the same concentration than 100x concentrated IgG present in purified erythrocytes (IgG anti-HIV-Pd), purified erythrocytes (IgG anti-HIV-E), and IgG anti-HIV-E 100x concentrated (IgG anti-HIV-Ec)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: IgG,10,13,Gene\/Protein\nHIV,21,24,Organism\/Species\nimmunoglobulin G,63,79,Gene\/Protein\nHIV,87,90,Organism\/Species\nIgG,93,96,Gene\/Protein\nHIV,104,107,Organism\/Species\nIgG,160,163,Gene\/Protein\nHIV,171,174,Organism\/Species\nIgG,246,249,Gene\/Protein\nIgG,285,288,Gene\/Protein\nHIV,296,299,Organism\/Species\nIgG,332,335,Gene\/Protein\nHIV,343,346,Organism\/Species\nIgG,358,361,Gene\/Protein\nHIV,369,372,Organism\/Species\nIgG,397,400,Gene\/Protein\nHIV,408,411,Organism\/Species"}", "/scratch/micpie/export/bio_ner_17/test_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: In addition, 19 known compounds were identified: beta-sitosteryl and stigmasteryl linoleates, beta-sitosterol, stigmasterol, triacontanol, squalene, alpha-and beta-amyrin, lupeol, lupenone, betulin aldehyde, betulon aldehyde, oleanolic aldehyde, betulinic acid, betulonic acid, moronic acid, morolic acid, oleanolic acid, flavonoids acacetin and acacetin 7-methyl ether..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: beta - sitosteryl and stigmasteryl linoleates,49,94,Chemical\/Drug\nbeta - sitosterol,96,113,Chemical\/Drug\nstigmasterol,115,127,Chemical\/Drug\ntriacontanol,129,141,Chemical\/Drug\nlupeol,180,186,Chemical\/Drug\nlupenone,188,196,Chemical\/Drug\nbetulin aldehyde,198,214,Chemical\/Drug\nbetulon 
aldehyde,216,232,Chemical\/Drug\noleanolic aldehyde,234,252,Chemical\/Drug\nbetulinic acid,254,268,Chemical\/Drug\nbetulonic acid,270,284,Chemical\/Drug\nmoronic acid,286,298,Chemical\/Drug\nmorolic acid,300,312,Chemical\/Drug\noleanolic acid,314,328,Chemical\/Drug\nflavonoids,330,340,Chemical\/Drug\nacacetin,341,349,Chemical\/Drug\nacacetin 7 - methyl ether,354,379,Chemical\/Drug"} {"text":"Task: Please carry out the NER task for the the text below.\nText: In vitro modulation of β-catenin altered uc. 158 − expression in human malignant hepatocytes. uc. 158 − expression was increased in CTNNB1-mutated human HCCs compared with non-mutated human HCCs, and in human HCC with nuclear localisation of β-catenin. uc. 158 − was increased in TAA rat CCA and reduced after treatment with Wnt \/ β-catenin inhibitors. uc. 158 − expression was negative in human normal liver and biliary epithelia, while it was increased in human CCA in two different cohorts..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: β - catenin,23,34,Gene\/Protein\nhuman,67,72,Organism\/Species\nCTNNB1,134,140,Gene\/Protein\nhuman,151,156,Organism\/Species\nHCCs,157,161,Disease\/Disorder\nhuman,190,195,Organism\/Species\nHCCs,196,200,Disease\/Disorder\nhuman,209,214,Organism\/Species\nHCC,215,218,Disease\/Disorder\nβ - catenin,248,259,Gene\/Protein\nrat,292,295,Organism\/Species\nCCA,296,299,Disease\/Disorder\nWnt,333,336,Gene\/Protein\nβ - catenin,339,350,Gene\/Protein\nhuman,400,405,Organism\/Species\nhuman,468,473,Organism\/Species\nCCA,474,477,Disease\/Disorder"}", "/scratch/micpie/export/bio_ner_17/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: They include genes encoding three subunits of the cytochrome oxidase (cox1 to 3), apocytochrome b (cob), seven subunits of the NADH dehydrogenase complex (nad1 to 6, nad4L), two ATPase 
subunits (atp6 and atp9), three ribosomal RNAs (rrn5, srn and lrn), 23 tRNAs and four ribosomal proteins (rps3, rps11, rps12 and rpl16)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: cytochrome oxidase,50,68,Gene\/Protein\ncox1 to 3,71,80,Gene\/Protein\napocytochrome b,83,98,Gene\/Protein\ncob,101,104,Gene\/Protein\nNADH dehydrogenase complex,129,155,Gene\/Protein\nnad1 to 6,158,167,Gene\/Protein\nnad4L,169,174,Gene\/Protein\nATPase,181,187,Gene\/Protein\natp6,199,203,Gene\/Protein\natp9,208,212,Gene\/Protein\nrrn5,238,242,Gene\/Protein\nsrn,244,247,Gene\/Protein\nlrn,252,255,Gene\/Protein\nrps3,297,301,Gene\/Protein\nrps11,303,308,Gene\/Protein\nrps12,310,315,Gene\/Protein\nrpl16,320,325,Gene\/Protein"} {"text":"Task: Please carry out the NER task for the the text below.\nText: S1; TableS1): the North Sea (Oslo, Norway), the English Channel (Roscoff, France), the Bay of Biscay (Gijn, Spain), the Mediterranean Sea (Blanes, Spain, and Naples, Italy) and the Black Sea (Varna, Bulgaria)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: North Sea,18,27,place\nOslo,30,34,place\nNorway,36,42,place\nEnglish Channel,49,64,place\nRoscoff,67,74,place\nFrance,76,82,place\nBay of Biscay,89,102,place\nGijn,105,109,place\nSpain,111,116,place\nMediterranean Sea,123,140,place\nBlanes,143,149,place\nSpain,151,156,place\nNaples,162,168,place\nItaly,170,175,place\nBlack Sea,185,194,place\nVarna,197,202,place\nBulgaria,204,212,place"}", "/scratch/micpie/export/chebi_20/test_0-5.jsonl": "{"text":"User: I'm looking for the canonical SMILES of a molecule that can be described by:\nThe molecule is a tetracyclic triterpenoid that is 4,4,8-trimethylandrosta-1,14-diene substituted by an oxo group at position 3, an acetoxy group at position 7 and a furan-3-yl group at position 17. 
Isolated from Azadirachta indica, it exhibits antiplasmodial and antineoplastic activities. It has a role as an antineoplastic agent, an antiplasmodial drug and a plant metabolite. It is an acetate ester, a cyclic terpene ketone, a member of furans, a limonoid and a tetracyclic triterpenoid.\nAssistant: This is a molecule that fits your description: CC(=O)O[C@@H]1C[C@H]2C(C)(C)C(=O)C=C[C@]2(C)[C@H]2CC[C@]3(C)C(=CC[C@H]3c3ccoc3)[C@@]21C"} {"text":"User: I'm searching for the SMILES of a molecule that can be described by:\nThe molecule is an organophosphate oxoanion obtained by deprotonation of the phosphate OH groups of geranyl phosphate; major species at pH 7.3. It is a conjugate base of a geranyl phosphate.\nAssistant: This is a molecule that fits this description: CC(=CCC\/C(=C\/COP(=O)([O-])[O-])\/C)C"}", "/scratch/micpie/export/chebi_20/test_0-1.jsonl": "{"text":"Based on the SELFIES [C][C][=Branch1][C][=O][O][C@@H1][C][C@@H1][C@][Branch1][=C][C][=C][C][=Branch1][C][=O][C][Ring1][#Branch1][Branch1][C][C][C][Branch2][Ring1][S][C@@H1][C@@][Ring1][=N][Branch2][Ring1][Branch2][C][=C][C][C@H1][Branch1][O][C@@][Ring1][Branch1][Branch1][Branch1][C][C][Ring1][=Branch2][C][C][=C][O][C][=C][Ring1][Branch1][C][C], the molecule can be described as:\nThe molecule is a tetracyclic triterpenoid that is 4,4,8-trimethylandrosta-1,14-diene substituted by an oxo group at position 3, an acetoxy group at position 7 and a furan-3-yl group at position 17. Isolated from Azadirachta indica, it exhibits antiplasmodial and antineoplastic activities. It has a role as an antineoplastic agent, an antiplasmodial drug and a plant metabolite. 
It is an acetate ester, a cyclic terpene ketone, a member of furans, a limonoid and a tetracyclic triterpenoid."} {"text":"Based on the InChI representation InChI=1S\/C10H19O4P\/c1-9(2)5-4-6-10(3)7-8-14-15(11,12)13\/h5,7H,4,6,8H2,1-3H3,(H2,11,12,13)\/p-2\/b10-7+, the molecule can be described by:\nThe molecule is an organophosphate oxoanion obtained by deprotonation of the phosphate OH groups of geranyl phosphate; major species at pH 7.3. It is a conjugate base of a geranyl phosphate."}", "/scratch/micpie/export/chebi_20/valid_0-0.jsonl": "{"text":"The molecule with the SMILES CCCCC[C@H]1[C@H](O1)\/C=C\/C(C\/C=C\\\\C\/C=C\\\\CCCC(=O)[O-])O can be described as:\nThe molecule is an epoxy(hydroxy)icosatrienoate that is the conjugate base of 11 hydroxy-(14R,15S)-epoxy-(5Z,8Z,12E)-icosatrienoic acid, obtained by deprotonation of the carboxy group; major species at pH 7.3. It is a conjugate base of an 11 hydroxy-(14R,15S)-epoxy-(5Z,8Z,12E)-icosatrienoic acid."} {"text":"The molecule with the SELFIES [C][Branch2][Ring1][Ring1][C@H1][Branch1][#C][C@H1][Branch1][O][C@@H1][Branch1][#Branch1][C][=Branch1][C][=O][C][O][O][O][O][O] can be described by:\nThe molecule is the open-chain form of D-fructose. It is a keto-fructose and a D-fructose. It is an enantiomer of a keto-L-fructose."}", "/scratch/micpie/export/chebi_20/test_0-2.jsonl": "{"text":"Task: Please create a description for a molecule based on its representation.\nMolecule canonical SMILES: CC(=O)O[C@@H]1C[C@H]2C(C)(C)C(=O)C=C[C@]2(C)[C@H]2CC[C@]3(C)C(=CC[C@H]3c3ccoc3)[C@@]21C\nConstraint: Answer the question with complete sentences.\nResult: The molecule is a tetracyclic triterpenoid that is 4,4,8-trimethylandrosta-1,14-diene substituted by an oxo group at position 3, an acetoxy group at position 7 and a furan-3-yl group at position 17. Isolated from Azadirachta indica, it exhibits antiplasmodial and antineoplastic activities. It has a role as an antineoplastic agent, an antiplasmodial drug and a plant metabolite. 
It is an acetate ester, a cyclic terpene ketone, a member of furans, a limonoid and a tetracyclic triterpenoid."} {"text":"Task: Please create a description for a molecule based on its representation.\nMolecule DeepSMILES: CC=CCC\/C=C\/COP=O)[O-])[O-])))))\/C)))))C\nConstraint: Answer the question with full sentences.\nResult: The molecule is an organophosphate oxoanion obtained by deprotonation of the phosphate OH groups of geranyl phosphate; major species at pH 7.3. It is a conjugate base of a geranyl phosphate."}", "/scratch/micpie/export/chebi_20/train_0-6.jsonl": "{"text":"User: I want to create a InChI.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule can be described as:\nThe molecule is the stable isotope of tellurium with relative atomic mass 124.904425, 71.4 atom percent natural abundance and nuclear spin 1\/2.\nAssistant: Ok, this InChI fits this description: InChI=1S\/Te\/i1-3"} {"text":"User: I want to generate a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule can be described as:\nThe molecule is a hydrate that is the monohydrate form of glycine betaine. It has a role as a fundamental metabolite. It contains a N,N,N-trimethylglycinium.\nAssistant: Ok, here you go, this canonical SMILES fits your description: C[N+](C)(C)CC(=O)[O-].O"}", "/scratch/micpie/export/chebi_20/valid_0-6.jsonl": "{"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule can be described by:\nThe molecule is an epoxy(hydroxy)icosatrienoate that is the conjugate base of 11 hydroxy-(14R,15S)-epoxy-(5Z,8Z,12E)-icosatrienoic acid, obtained by deprotonation of the carboxy group; major species at pH 7.3. 
It is a conjugate base of an 11 hydroxy-(14R,15S)-epoxy-(5Z,8Z,12E)-icosatrienoic acid.\nAssistant: Ok, this SMILES fits your description: CCCCC[C@H]1[C@H](O1)\/C=C\/C(C\/C=C\\\\C\/C=C\\\\CCCC(=O)[O-])O"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule can be described as:\nThe molecule is the open-chain form of D-fructose. It is a keto-fructose and a D-fructose. It is an enantiomer of a keto-L-fructose.\nAssistant: Ok, here you go, this InChI fits your description: InChI=1S\/C6H12O6\/c7-1-3(9)5(11)6(12)4(10)2-8\/h3,5-9,11-12H,1-2H2\/t3-,5-,6-\/m1\/s1"}", "/scratch/micpie/export/chebi_20/test_0-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CC(=O)O[C@@H]1C[C@H]2C(C)(C)C(=O)C=C[C@]2(C)[C@H]2CC[C@]3(C)C(=CC[C@H]3c3ccoc3)[C@@]21C can be described by:\nThe molecule is a tetracyclic triterpenoid that is 4,4,8-trimethylandrosta-1,14-diene substituted by an oxo group at position 3, an acetoxy group at position 7 and a furan-3-yl group at position 17. Isolated from Azadirachta indica, it exhibits antiplasmodial and antineoplastic activities. It has a role as an antineoplastic agent, an antiplasmodial drug and a plant metabolite. It is an acetate ester, a cyclic terpene ketone, a member of furans, a limonoid and a tetracyclic triterpenoid."} {"text":"The molecule with the SELFIES [C][C][=Branch2][Ring1][Ring1][=C][C][C][\/C][=Branch1][N][=C][\/C][O][P][=Branch1][C][=O][Branch1][C][O-1][O-1][\/C][C] can be described by:\nThe molecule is an organophosphate oxoanion obtained by deprotonation of the phosphate OH groups of geranyl phosphate; major species at pH 7.3. It is a conjugate base of a geranyl phosphate."}", "/scratch/micpie/export/chebi_20/valid_0-7.jsonl": "{"text":"User: I want to generate a InChI.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule can be described by:\nThe molecule is an epoxy(hydroxy)icosatrienoate that is the conjugate base of 11 hydroxy-(14R,15S)-epoxy-(5Z,8Z,12E)-icosatrienoic acid, obtained by deprotonation of the carboxy group; major species at pH 7.3. It is a conjugate base of an 11 hydroxy-(14R,15S)-epoxy-(5Z,8Z,12E)-icosatrienoic acid.\nAssistant: Ok, this InChI fits this description: InChI=1S\/C20H32O4\/c1-2-3-9-13-18-19(24-18)16-15-17(21)12-10-7-5-4-6-8-11-14-20(22)23\/h4,6-7,10,15-19,21H,2-3,5,8-9,11-14H2,1H3,(H,22,23)\/p-1\/b6-4-,10-7-,16-15+\/t17?,18-,19+\/m0\/s1"} {"text":"User: I want to generate a SMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule can be described as:\nThe molecule is the open-chain form of D-fructose. It is a keto-fructose and a D-fructose. It is an enantiomer of a keto-L-fructose.\nAssistant: Ok, this SMILES fits your description: C([C@H]([C@H]([C@@H](C(=O)CO)O)O)O)O"}", "/scratch/micpie/export/chebi_20/test_0-3.jsonl": "{"text":"Task: Please generate a molecule DeepSMILES based on the description.\nDescription: The molecule is a tetracyclic triterpenoid that is 4,4,8-trimethylandrosta-1,14-diene substituted by an oxo group at position 3, an acetoxy group at position 7 and a furan-3-yl group at position 17. Isolated from Azadirachta indica, it exhibits antiplasmodial and antineoplastic activities. It has a role as an antineoplastic agent, an antiplasmodial drug and a plant metabolite. It is an acetate ester, a cyclic terpene ketone, a member of furans, a limonoid and a tetracyclic triterpenoid.\nResult: CC=O)O[C@@H]C[C@@H][C@]C=CC=O)C6C)C)))))[C@@H][C@@]6C=CC[C@H][C@@]5CC9))C))C=COC=C5)))))))))C)))C"} {"text":"Task: Please generate a SMILES based on the text description.\nDescription: The molecule is an organophosphate oxoanion obtained by deprotonation of the phosphate OH groups of geranyl phosphate; major species at pH 7.3. 
It is a conjugate base of a geranyl phosphate.\nResult: CC(=CCC\/C(=C\/COP(=O)([O-])[O-])\/C)C"}", "/scratch/micpie/export/chebi_20/train_0-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of [125Te] can be described as:\nThe molecule is the stable isotope of tellurium with relative atomic mass 124.904425, 71.4 atom percent natural abundance and nuclear spin 1\/2."} {"text":"The molecule with the SELFIES [C][N+1][Branch1][C][C][Branch1][C][C][C][C][=Branch1][C][=O][O-1].[O] can be described as:\nThe molecule is a hydrate that is the monohydrate form of glycine betaine. It has a role as a fundamental metabolite. It contains a N,N,N-trimethylglycinium."}", "/scratch/micpie/export/chebi_20/test_0-6.jsonl": "{"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule can be described by:\nThe molecule is a tetracyclic triterpenoid that is 4,4,8-trimethylandrosta-1,14-diene substituted by an oxo group at position 3, an acetoxy group at position 7 and a furan-3-yl group at position 17. Isolated from Azadirachta indica, it exhibits antiplasmodial and antineoplastic activities. It has a role as an antineoplastic agent, an antiplasmodial drug and a plant metabolite. It is an acetate ester, a cyclic terpene ketone, a member of furans, a limonoid and a tetracyclic triterpenoid.\nAssistant: Ok, here you go, this SMILES fits this description: CC(=O)O[C@@H]1C[C@@H]2[C@](C=CC(=O)C2(C)C)([C@@H]3[C@@]1(C4=CC[C@H]([C@@]4(CC3)C)C5=COC=C5)C)C"} {"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule can be described as:\nThe molecule is an organophosphate oxoanion obtained by deprotonation of the phosphate OH groups of geranyl phosphate; major species at pH 7.3. 
It is a conjugate base of a geranyl phosphate.\nAssistant: Got it, this DeepSMILES fits this description: CC=CCC\/C=C\/COP=O)[O-])[O-])))))\/C)))))C"}", "/scratch/micpie/export/chebi_20/train_0-3.jsonl": "{"text":"Task: Please give me a molecule InChI based on the description.\nDescription: The molecule is the stable isotope of tellurium with relative atomic mass 124.904425, 71.4 atom percent natural abundance and nuclear spin 1\/2.\nResult: InChI=1S\/Te\/i1-3"} {"text":"Task: Please generate a molecule InChI based on the text description below.\nDescription: The molecule is a hydrate that is the monohydrate form of glycine betaine. It has a role as a fundamental metabolite. It contains a N,N,N-trimethylglycinium.\nResult: InChI=1S\/C5H11NO2.H2O\/c1-6(2,3)4-5(7)8;\/h4H2,1-3H3;1H2"}", "/scratch/micpie/export/chebi_20/valid_0-2.jsonl": "{"text":"Task: Please create a text description for a molecule based on its representation.\nSMILES: CCCCC[C@H]1[C@H](O1)\/C=C\/C(C\/C=C\\\\C\/C=C\\\\CCCC(=O)[O-])O\nConstraint: Answer the question with complete sentences.\nResult: The molecule is an epoxy(hydroxy)icosatrienoate that is the conjugate base of 11 hydroxy-(14R,15S)-epoxy-(5Z,8Z,12E)-icosatrienoic acid, obtained by deprotonation of the carboxy group; major species at pH 7.3. It is a conjugate base of an 11 hydroxy-(14R,15S)-epoxy-(5Z,8Z,12E)-icosatrienoic acid."} {"text":"Task: Please create a text description for a molecule based on its representation.\nSMILES: C([C@H]([C@H]([C@@H](C(=O)CO)O)O)O)O\nConstraint: Answer the question with complete sentences.\nResult: The molecule is the open-chain form of D-fructose. It is a keto-fructose and a D-fructose. 
It is an enantiomer of a keto-L-fructose."}", "/scratch/micpie/export/chebi_20/valid_0-1.jsonl": "{"text":"Based on the canonical SMILES representation CCCCC[C@@H]1O[C@@H]1\/C=C\/C(O)C\/C=C\\C\/C=C\\CCCC(=O)[O-], the molecule can be described by:\nThe molecule is an epoxy(hydroxy)icosatrienoate that is the conjugate base of 11 hydroxy-(14R,15S)-epoxy-(5Z,8Z,12E)-icosatrienoic acid, obtained by deprotonation of the carboxy group; major species at pH 7.3. It is a conjugate base of an 11 hydroxy-(14R,15S)-epoxy-(5Z,8Z,12E)-icosatrienoic acid."} {"text":"Based on the DeepSMILES C[C@H][C@H][C@@H]C=O)CO)))O))O))O))O, the molecule can be described by:\nThe molecule is the open-chain form of D-fructose. It is a keto-fructose and a D-fructose. It is an enantiomer of a keto-L-fructose."}", "/scratch/micpie/export/chebi_20/valid_0-5.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that can be described as:\nThe molecule is an epoxy(hydroxy)icosatrienoate that is the conjugate base of 11 hydroxy-(14R,15S)-epoxy-(5Z,8Z,12E)-icosatrienoic acid, obtained by deprotonation of the carboxy group; major species at pH 7.3. It is a conjugate base of an 11 hydroxy-(14R,15S)-epoxy-(5Z,8Z,12E)-icosatrienoic acid.\nAssistant: This is a molecule that fits this description: CCCCC[C@@H]1O[C@@H]1\/C=C\/C(O)C\/C=C\\C\/C=C\\CCCC(=O)[O-]"} {"text":"User: I'm looking for the SELFIES of a molecule that can be described as:\nThe molecule is the open-chain form of D-fructose. It is a keto-fructose and a D-fructose. 
It is an enantiomer of a keto-L-fructose.\nAssistant: This is a molecule that fits your description: [C][Branch2][Ring1][Ring1][C@H1][Branch1][#C][C@H1][Branch1][O][C@@H1][Branch1][#Branch1][C][=Branch1][C][=O][C][O][O][O][O][O]"}", "/scratch/micpie/export/chebi_20/valid_0-4.jsonl": "{"text":"User: Can you create the SMILES of a molecule based in this description:\nThe molecule is an epoxy(hydroxy)icosatrienoate that is the conjugate base of 11 hydroxy-(14R,15S)-epoxy-(5Z,8Z,12E)-icosatrienoic acid, obtained by deprotonation of the carboxy group; major species at pH 7.3. It is a conjugate base of an 11 hydroxy-(14R,15S)-epoxy-(5Z,8Z,12E)-icosatrienoic acid.\nAssistant: Yes, I'm happy to help, here you go: CCCCC[C@H]1[C@H](O1)\/C=C\/C(C\/C=C\\\\C\/C=C\\\\CCCC(=O)[O-])O"} {"text":"User: Can you generate the SELFIES of a molecule based in this description:\nThe molecule is the open-chain form of D-fructose. It is a keto-fructose and a D-fructose. It is an enantiomer of a keto-L-fructose.\nAssistant: Yes, I'm happy to help, here you go: [C][Branch2][Ring1][Ring1][C@H1][Branch1][#C][C@H1][Branch1][O][C@@H1][Branch1][#Branch1][C][=Branch1][C][=O][C][O][O][O][O][O]"}", "/scratch/micpie/export/chebi_20/train_0-5.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that can be described as:\nThe molecule is the stable isotope of tellurium with relative atomic mass 124.904425, 71.4 atom percent natural abundance and nuclear spin 1\/2.\nAssistant: This is a molecule that fits your description: [125Te]"} {"text":"User: I'm searching for the InChI of a molecule that can be described as:\nThe molecule is a hydrate that is the monohydrate form of glycine betaine. It has a role as a fundamental metabolite. 
It contains a N,N,N-trimethylglycinium.\nAssistant: This is a molecule that fits your description: InChI=1S\/C5H11NO2.H2O\/c1-6(2,3)4-5(7)8;\/h4H2,1-3H3;1H2"}", "/scratch/micpie/export/chebi_20/train_0-2.jsonl": "{"text":"Task: Please create a description for a molecule.\nMolecule canonical SMILES: [125Te]\nConstraint: Answer the question with full sentences.\nResult: The molecule is the stable isotope of tellurium with relative atomic mass 124.904425, 71.4 atom percent natural abundance and nuclear spin 1\/2."} {"text":"Task: Please create a description for a molecule.\nMolecule DeepSMILES: C[N+]C)C)CC=O)[O-].O\nConstraint: Answer the question with full sentences.\nResult: The molecule is a hydrate that is the monohydrate form of glycine betaine. It has a role as a fundamental metabolite. It contains a N,N,N-trimethylglycinium."}", "/scratch/micpie/export/chebi_20/train_0-7.jsonl": "{"text":"User: I want to create a InChI.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule can be described as:\nThe molecule is the stable isotope of tellurium with relative atomic mass 124.904425, 71.4 atom percent natural abundance and nuclear spin 1\/2.\nAssistant: Got it, here you go, this InChI fits this description: InChI=1S\/Te\/i1-3"} {"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule can be described by:\nThe molecule is a hydrate that is the monohydrate form of glycine betaine. It has a role as a fundamental metabolite. 
It contains a N,N,N-trimethylglycinium.\nAssistant: Ok, here you go, this SMILES fits this description: C[N+](C)(C)CC(=O)[O-].O"}", "/scratch/micpie/export/chebi_20/train_0-1.jsonl": "{"text":"Based on the SELFIES representation [125Te], the molecule can be described as:\nThe molecule is the stable isotope of tellurium with relative atomic mass 124.904425, 71.4 atom percent natural abundance and nuclear spin 1\/2."} {"text":"Based on the DeepSMILES representation C[N+]C)C)CC=O)[O-].O, the molecule can be described by:\nThe molecule is a hydrate that is the monohydrate form of glycine betaine. It has a role as a fundamental metabolite. It contains a N,N,N-trimethylglycinium."}", "/scratch/micpie/export/chebi_20/train_0-4.jsonl": "{"text":"User: Can you create the DeepSMILES of a molecule based in this description:\nThe molecule is the stable isotope of tellurium with relative atomic mass 124.904425, 71.4 atom percent natural abundance and nuclear spin 1\/2.\nAssistant: Yes, I'm happy to help, here you go: [125Te]"} {"text":"User: Can you generate the SMILES of a molecule based in this description:\nThe molecule is a hydrate that is the monohydrate form of glycine betaine. It has a role as a fundamental metabolite. It contains a N,N,N-trimethylglycinium.\nAssistant: Of course, here you go: C[N+](C)(C)CC(=O)[O-].O"}", "/scratch/micpie/export/chebi_20/test_0-7.jsonl": "{"text":"User: I want to create a molecule SELFIES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule can be described as:\nThe molecule is a tetracyclic triterpenoid that is 4,4,8-trimethylandrosta-1,14-diene substituted by an oxo group at position 3, an acetoxy group at position 7 and a furan-3-yl group at position 17. Isolated from Azadirachta indica, it exhibits antiplasmodial and antineoplastic activities. It has a role as an antineoplastic agent, an antiplasmodial drug and a plant metabolite. 
It is an acetate ester, a cyclic terpene ketone, a member of furans, a limonoid and a tetracyclic triterpenoid.\nAssistant: Got it, here you go, this SELFIES fits your description: [C][C][=Branch1][C][=O][O][C@@H1][C][C@@H1][C@][Branch1][=C][C][=C][C][=Branch1][C][=O][C][Ring1][#Branch1][Branch1][C][C][C][Branch2][Ring1][S][C@@H1][C@@][Ring1][=N][Branch2][Ring1][Branch2][C][=C][C][C@H1][Branch1][O][C@@][Ring1][Branch1][Branch1][Branch1][C][C][Ring1][=Branch2][C][C][=C][O][C][=C][Ring1][Branch1][C][C]"} {"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule can be described as:\nThe molecule is an organophosphate oxoanion obtained by deprotonation of the phosphate OH groups of geranyl phosphate; major species at pH 7.3. It is a conjugate base of a geranyl phosphate.\nAssistant: Got it, here you go, this SMILES fits this description: CC(=CCC\/C(=C\/COP(=O)([O-])[O-])\/C)C"}", "/scratch/micpie/export/chebi_20/valid_0-3.jsonl": "{"text":"Task: Please give me a molecule SMILES based on the text description below.\nDescription: The molecule is an epoxy(hydroxy)icosatrienoate that is the conjugate base of 11 hydroxy-(14R,15S)-epoxy-(5Z,8Z,12E)-icosatrienoic acid, obtained by deprotonation of the carboxy group; major species at pH 7.3. It is a conjugate base of an 11 hydroxy-(14R,15S)-epoxy-(5Z,8Z,12E)-icosatrienoic acid.\nResult: CCCCC[C@H]1[C@H](O1)\/C=C\/C(C\/C=C\\\\C\/C=C\\\\CCCC(=O)[O-])O"} {"text":"Task: Please generate a molecule SMILES based on the text description below.\nDescription: The molecule is the open-chain form of D-fructose. It is a keto-fructose and a D-fructose. 
It is an enantiomer of a keto-L-fructose.\nResult: C([C@H]([C@H]([C@@H](C(=O)CO)O)O)O)O"}", "/scratch/micpie/export/chebi_20/test_0-4.jsonl": "{"text":"User: Can you generate the canonical SMILES of a molecule based in this description:\nThe molecule is a tetracyclic triterpenoid that is 4,4,8-trimethylandrosta-1,14-diene substituted by an oxo group at position 3, an acetoxy group at position 7 and a furan-3-yl group at position 17. Isolated from Azadirachta indica, it exhibits antiplasmodial and antineoplastic activities. It has a role as an antineoplastic agent, an antiplasmodial drug and a plant metabolite. It is an acetate ester, a cyclic terpene ketone, a member of furans, a limonoid and a tetracyclic triterpenoid.\nAssistant: Sure, here you go: CC(=O)O[C@@H]1C[C@H]2C(C)(C)C(=O)C=C[C@]2(C)[C@H]2CC[C@]3(C)C(=CC[C@H]3c3ccoc3)[C@@]21C"} {"text":"User: Can you generate the SELFIES of a molecule based in this description:\nThe molecule is an organophosphate oxoanion obtained by deprotonation of the phosphate OH groups of geranyl phosphate; major species at pH 7.3. 
It is a conjugate base of a geranyl phosphate.\nAssistant: Of course, here you go: [C][C][=Branch2][Ring1][Ring1][=C][C][C][\/C][=Branch1][N][=C][\/C][O][P][=Branch1][C][=O][Branch1][C][O-1][O-1][\/C][C]"}", "/scratch/micpie/export/bio_ner_30/test_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Applying a stringent threshold to the output allowed us to identify two groups-genes with high-scoring upstream palindromes (ssaB, ssaG, ssaM, ssaR, sopD2, sifA, sifB, sseK2, sseK3, sseL, sseA ', steC, and srcA) and those with medium-scoring palindromes (0.7-0.8 threshold; ssrA, STM1633, sseI, slrP sspH2, pipB, sseJ, pipB2, srfN, sseA and steB) (Figure 6A and Dataset S4) (sseA'denotes the SsrB palindrome sequence upstream of sseA that falls within the ssaE CDS, while sseA refers to the SsrB-footprinted IGR site with only one conserved heptamer defined in Figure 4C)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: ssaB,130,134,Protein\nssaG,136,140,Protein\nssaM,142,146,Protein\nssaR,148,152,Protein\nsopD2,154,159,Protein\nsifA,161,165,Protein\nsifB,167,171,Protein\nsseK2,173,178,Protein\nsseK3,180,185,Protein\nsseL,187,191,Protein\nsseA,193,197,Protein\nsteC,201,205,Protein\nsrcA,211,215,Protein\nssrA,286,290,Protein\nSTM1633,292,299,Protein\nsseI,301,305,Protein\nslrP,307,311,Protein\nsspH2,312,317,Protein\npipB,319,323,Protein\nsseJ,325,329,Protein\npipB2,331,336,Protein\nsrfN,338,342,Protein\nsseA,344,348,Protein\nsteB,353,357,Protein\nsseA,389,393,Protein\nSsrB,408,412,Protein\nsseA,445,449,Protein\nssaE,472,476,Protein\nsseA,488,492,Protein\nSsrB,507,511,Protein"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Applying a stringent threshold to the output allowed us to identify two groups-genes with high-scoring upstream palindromes (ssaB, ssaG, 
ssaM, ssaR, sopD2, sifA, sifB, sseK2, sseK3, sseL, sseA ', steC, and srcA) and those with medium-scoring palindromes (0.7-0.8 threshold; ssrA, STM1633, sseI, slrP sspH2, pipB, sseJ, pipB2, srfN, sseA and steB) (Figure 6A and Dataset S4) (sseA'denotes the SsrB palindrome sequence upstream of sseA that falls within the ssaE CDS, while sseA refers to the SsrB-footprinted IGR site with only one conserved heptamer defined in Figure 4C)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: ssaB,130,134,Protein\nssaG,136,140,Protein\nssaM,142,146,Protein\nssaR,148,152,Protein\nsopD2,154,159,Protein\nsifA,161,165,Protein\nsifB,167,171,Protein\nsseK2,173,178,Protein\nsseK3,180,185,Protein\nsseL,187,191,Protein\nsseA,193,197,Protein\nsteC,201,205,Protein\nsrcA,211,215,Protein\nssrA,286,290,Protein\nSTM1633,292,299,Protein\nsseI,301,305,Protein\nslrP,307,311,Protein\nsspH2,312,317,Protein\npipB,319,323,Protein\nsseJ,325,329,Protein\npipB2,331,336,Protein\nsrfN,338,342,Protein\nsseA,344,348,Protein\nsteB,353,357,Protein\nsseA,389,393,Protein\nSsrB,408,412,Protein\nsseA,445,449,Protein\nssaE,472,476,Protein\nsseA,488,492,Protein\nSsrB,507,511,Protein"}", "/scratch/micpie/export/bio_ner_30/train_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: Applying a stringent threshold to the output allowed us to identify two groups-genes with high-scoring upstream palindromes (ssaB, ssaG, ssaM, ssaR, sopD2, sifA, sifB, sseK2, sseK3, sseL, sseA ', steC, and srcA) and those with medium-scoring palindromes (0.7-0.8 threshold; ssrA, STM1633, sseI, slrP sspH2, pipB, sseJ, pipB2, srfN, sseA and steB) (Figure 6A and Dataset S4) (sseA'denotes the SsrB palindrome sequence upstream of sseA that falls within the ssaE CDS, while sseA refers to the SsrB-footprinted IGR site with only one conserved heptamer defined in Figure 4C)..\nConstrain: Please, only list the 
entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: ssaB,130,134,Protein\nssaG,136,140,Protein\nssaM,142,146,Protein\nssaR,148,152,Protein\nsopD2,154,159,Protein\nsifA,161,165,Protein\nsifB,167,171,Protein\nsseK2,173,178,Protein\nsseK3,180,185,Protein\nsseL,187,191,Protein\nsseA,193,197,Protein\nsteC,201,205,Protein\nsrcA,211,215,Protein\nssrA,286,290,Protein\nSTM1633,292,299,Protein\nsseI,301,305,Protein\nslrP,307,311,Protein\nsspH2,312,317,Protein\npipB,319,323,Protein\nsseJ,325,329,Protein\npipB2,331,336,Protein\nsrfN,338,342,Protein\nsseA,344,348,Protein\nsteB,353,357,Protein\nsseA,389,393,Protein\nSsrB,408,412,Protein\nsseA,445,449,Protein\nssaE,472,476,Protein\nsseA,488,492,Protein\nSsrB,507,511,Protein"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Applying a stringent threshold to the output allowed us to identify two groups-genes with high-scoring upstream palindromes (ssaB, ssaG, ssaM, ssaR, sopD2, sifA, sifB, sseK2, sseK3, sseL, sseA ', steC, and srcA) and those with medium-scoring palindromes (0.7-0.8 threshold; ssrA, STM1633, sseI, slrP sspH2, pipB, sseJ, pipB2, srfN, sseA and steB) (Figure 6A and Dataset S4) (sseA'denotes the SsrB palindrome sequence upstream of sseA that falls within the ssaE CDS, while sseA refers to the SsrB-footprinted IGR site with only one conserved heptamer defined in Figure 4C)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: 
ssaB,130,134,Protein\nssaG,136,140,Protein\nssaM,142,146,Protein\nssaR,148,152,Protein\nsopD2,154,159,Protein\nsifA,161,165,Protein\nsifB,167,171,Protein\nsseK2,173,178,Protein\nsseK3,180,185,Protein\nsseL,187,191,Protein\nsseA,193,197,Protein\nsteC,201,205,Protein\nsrcA,211,215,Protein\nssrA,286,290,Protein\nSTM1633,292,299,Protein\nsseI,301,305,Protein\nslrP,307,311,Protein\nsspH2,312,317,Protein\npipB,319,323,Protein\nsseJ,325,329,Protein\npipB2,331,336,Protein\nsrfN,338,342,Protein\nsseA,344,348,Protein\nsteB,353,357,Protein\nsseA,389,393,Protein\nSsrB,408,412,Protein\nsseA,445,449,Protein\nssaE,472,476,Protein\nsseA,488,492,Protein\nSsrB,507,511,Protein"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/test_0-10.jsonl": "{"text":"User: Can you create the InChI of a molecule that is not effective for the blocking potassium ion channel activity?\nAssistant: Yes, I'm happy to help, here you go: InChI=1S\/C18H30N2O7S\/c19-17-1-3-18(4-2-17)28(21,22)20-5-7-23-9-11-25-13-15-27-16-14-26-12-10-24-8-6-20\/h1-4H,5-16,19H2"} {"text":"User: Can you give me the canonical SMILES of a molecule that is not reliable for potassium ion channel blocking?\nAssistant: Yes, here you go: O=C(c1ccc2ncsc2c1)N1CCN(c2nc3ccccc3s2)CC1"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/valid_0-8.jsonl": "{"text":"User: Can you estimate if the molecule with the SELFIES [S][=Branch1][C][=O][=Branch1][C][=O][Branch1][=Branch2][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][Branch2][Ring1][Ring2][N][C][=Branch1][C][=O][C][O][C][=C][Branch1][Ring1][O][C][C][=C][C][=C][Ring1][Branch2][=C][Branch1][Ring1][O][C][C][=C][Ring2][Ring1][Branch1] is effective for the blocking potassium ion channel activity?\nAssistant: No, this molecule is not effective for the blocking potassium ion channel activity."} {"text":"User: Can you tell me if the molecule with the SELFIES 
[Cl][C][=C][Branch2][Ring2][Ring1][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][C][Branch1][Branch1][C][C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=C][Branch1][=Branch1][N+1][Branch1][C][O-1][=O][C][=C][Ring2][Ring1][=N] is reliable for potassium ion channel blocking?\nAssistant: No, this molecule is not reliable for potassium ion channel blocking."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/test_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not reliable for potassium ion channel blocking?\nConstraint: You must select none, one or more options from A, B, or C without using any other words.\nOptions:\nA: [Cl][C][=C][C][Branch2][Ring1][N][N][C][=Branch1][C][=O][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=N][O][N][=C][Ring1][Branch1][C][=C][C][=Ring1][=Branch2][=C][C][=C][Ring2][Ring1][#Branch1][O][C]\nB: [O][Branch2][Ring2][#C][C][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][C][=Branch1][P][=N][N][Branch1][Ring2][C][=Ring1][Branch1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][O][C][=C][Branch1][Ring2][C][=Ring1][Branch1][C][=C][C][=C][Ring1][#Branch1][C]\nC: [S][=Branch1][C][=O][=Branch1][C][=O][Branch2][Ring1][=Branch1][N][C][C][O][C][C][O][C][C][O][C][C][O][C][C][O][C][C][Ring2][Ring1][C][C][=C][C][=C][Branch1][C][N][C][=C][Ring1][#Branch1]\nAnswer: A, B, C"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not reliable for potassium ion channel blocking?\nConstraint: You must select none, one or more options from A, B, or C without using any additional words.\nOptions:\nA.) OCcccccc6))COC))=O)))))))cc\/C=C\\C=O)NCC)C))))C#N))))cccc6\nB.) scNCCNCC6))C=O)cccscnc5cc9))))))))))))))ncc5cccc6\nC.) 
FCF)F)ccNC=O)COccccOCC)))cc6))))))))))cccc6\nAnswer: A, B, C"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/train_0-8.jsonl": "{"text":"User: Can you estimate if the molecule with the DeepSMILES SCCNCCOCC6))))))))cncnccc5nn9)))cccc6))))))CCC is effective for the blocking potassium ion channel activity?\nAssistant: No, this molecule is not effective for the blocking potassium ion channel activity."} {"text":"User: Can you estimate if the molecule with the InChI InChI=1S\/C16H17ClN2O3\/c17-13-14(18-5-6-19-7-9-22-10-8-19)16(21)12-4-2-1-3-11(12)15(13)20\/h1-4,18H,5-10H2 is reliable for potassium ion channel blocking?\nAssistant: No, this molecule is not reliable for potassium ion channel blocking."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is reliable for potassium ion channel blocking.\nSMILES: S(=O)(=O)(N1CCOCCOCCOCCOCCOCC1)c1ccc(N)cc1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is reliable for potassium ion channel blocking.\nInChI: InChI=1S\/C19H16N4OS2\/c24-18(13-5-6-14-17(11-13)25-12-20-14)22-7-9-23(10-8-22)19-21-15-3-1-2-4-16(15)26-19\/h1-6,11-12H,7-10H2\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/valid_0-9.jsonl": "{"text":"User: Is the molecule with the DeepSMILES S=O)=O)NCCOCC6))))))cccNC=O)COccOC))cccc6))))))))))cOC))cc6 reliable for potassium ion channel blocking?\nAssistant: No, it is not reliable for potassium ion channel blocking."} {"text":"User: Is the molecule with the SMILES Clc1c(S(=O)(=O)N2CCC(CC2)C(=O)NCc2ccc(Cl)cc2)cc([N+]([O-])=O)cc1 reliable for 
potassium ion channel blocking?\nAssistant: No, it is not reliable for potassium ion channel blocking."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/test_0-1.jsonl": "{"text":"The molecule with the SELFIES representation of [S][=Branch1][C][=O][=Branch1][C][=O][Branch2][Ring1][=Branch1][N][C][C][O][C][C][O][C][C][O][C][C][O][C][C][O][C][C][Ring2][Ring1][C][C][=C][C][=C][Branch1][C][N][C][=C][Ring1][#Branch1] displays no blocking the potassium ion channel."} {"text":"The molecule with the SMILES representation of s1c(N2CCN(CC2)C(=O)c2cc3scnc3cc2)nc2c1cccc2 shows no blocking the potassium ion channel."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/valid_0-0.jsonl": "{"text":"The molecule with the SMILES representation of S(=O)(=O)(N1CCOCC1)c1cc(NC(=O)COc2c(OC)cccc2)c(OC)cc1 is not blocker of the potassium ion channel activity."} {"text":"The molecule with the canonical SMILES representation of O=C(NCc1ccc(Cl)cc1)C1CCN(S(=O)(=O)c2cc([N+](=O)[O-])ccc2Cl)CC1 is not blocker of the potassium ion channel activity."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/test_0-2.jsonl": "{"text":"Based on the DeepSMILES representation S=O)=O)NCCOCCOCCOCCOCCOCC%18))))))))))))))))))ccccN)cc6, the molecule has no reliable for potassium ion channel blocking characteristics."} {"text":"Based on the SELFIES representation [S][C][Branch2][Ring1][N][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][=Branch1][C][=O][C][=C][C][S][C][=N][C][=Ring1][Branch1][C][=C][Ring1][=Branch2][=N][C][=C][Ring2][Ring1][=Branch1][C][=C][C][=C][Ring1][=Branch1], the molecule has no reliable for potassium ion channel blocking properties."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/valid_0-10.jsonl": "{"text":"User: Can you create the SMILES of a molecule that is not reliable for potassium ion channel blocking?\nAssistant: Yes, I'm happy to help, here you go: S(=O)(=O)(N1CCOCC1)c1cc(NC(=O)COc2c(OC)cccc2)c(OC)cc1"} 
{"text":"User: Can you generate the canonical SMILES of a molecule that is not effective for the blocking potassium ion channel activity?\nAssistant: Yes, I'm happy to help, here you go: O=C(NCc1ccc(Cl)cc1)C1CCN(S(=O)(=O)c2cc([N+](=O)[O-])ccc2Cl)CC1"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/train_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is effective for the blocking potassium ion channel activity.\nMolecule DeepSMILES: SCCNCCOCC6))))))))cncnccc5nn9)))cccc6))))))CCC\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not effective for the blocking potassium ion channel activity."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is reliable for potassium ion channel blocking.\nDeepSMILES: ClC=CNCCNCCOCC6)))))))))C=O)ccC6=O))cccc6\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not reliable for potassium ion channel blocking."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/valid_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is effective for the blocking potassium ion channel activity.\nSMILES: S(=O)(=O)(N1CCOCC1)c1cc(NC(=O)COc2c(OC)cccc2)c(OC)cc1\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not effective for the blocking potassium ion channel activity."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is reliable for potassium ion channel blocking.\nMolecule InChI: InChI=1S\/C19H19Cl2N3O5S\/c20-15-3-1-13(2-4-15)12-22-19(25)14-7-9-23(10-8-14)30(28,29)18-11-16(24(26)27)5-6-17(18)21\/h1-6,11,14H,7-10,12H2,(H,22,25)\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not reliable for potassium ion channel blocking."}", 
"/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/test_0-9.jsonl": "{"text":"User: Is the molecule with the canonical SMILES Nc1ccc(S(=O)(=O)N2CCOCCOCCOCCOCCOCC2)cc1 reliable for potassium ion channel blocking?\nAssistant: No, it is not reliable for potassium ion channel blocking."} {"text":"User: Is the molecule with the canonical SMILES O=C(c1ccc2ncsc2c1)N1CCN(c2nc3ccccc3s2)CC1 effective for the blocking potassium ion channel activity?\nAssistant: No, it is not effective for the blocking potassium ion channel activity."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/test_0-0.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C18H30N2O7S\/c19-17-1-3-18(4-2-17)28(21,22)20-5-7-23-9-11-25-13-15-27-16-14-26-12-10-24-8-6-20\/h1-4H,5-16,19H2 is not blocker of the potassium ion channel activity."} {"text":"The molecule with the canonical SMILES representation of O=C(c1ccc2ncsc2c1)N1CCN(c2nc3ccccc3s2)CC1 is not blocker of the potassium ion channel activity."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/valid_0-7.jsonl": "{"text":"Task: Please create a SMILES based on the description.\nDescription: A molecule that is effective for the blocking potassium ion channel activity.\nResult: S(=O)(=O)(N1CCOCC1)c1cc(NC(=O)COc2c(OC)cccc2)c(OC)cc1"} {"text":"Task: Please create a SMILES based on the description.\nDescription: A molecule that is effective for the blocking potassium ion channel activity.\nResult: Clc1c(S(=O)(=O)N2CCC(CC2)C(=O)NCc2ccc(Cl)cc2)cc([N+]([O-])=O)cc1"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/test_0-3.jsonl": "{"text":"The InChI InChI=1S\/C18H30N2O7S\/c19-17-1-3-18(4-2-17)28(21,22)20-5-7-23-9-11-25-13-15-27-16-14-26-12-10-24-8-6-20\/h1-4H,5-16,19H2 represents a molecule that is not identified as reliable for potassium ion channel blocking."} {"text":"The InChI 
InChI=1S\/C19H16N4OS2\/c24-18(13-5-6-14-17(11-13)25-12-20-14)22-7-9-23(10-8-22)19-21-15-3-1-2-4-16(15)26-19\/h1-6,11-12H,7-10H2 is from a molecule that is not identified as effective for the blocking potassium ion channel activity."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/valid_0-11.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that is not reliable for potassium ion channel blocking?\nAssistant: This is a molecule that is not reliable for potassium ion channel blocking: S(=O)(=O)(N1CCOCC1)c1cc(NC(=O)COc2c(OC)cccc2)c(OC)cc1"} {"text":"User: I'm looking for the SMILES of a molecule that is not reliable for potassium ion channel blocking?\nAssistant: This is a molecule that is not reliable for potassium ion channel blocking: Clc1c(S(=O)(=O)N2CCC(CC2)C(=O)NCc2ccc(Cl)cc2)cc([N+]([O-])=O)cc1"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/train_0-0.jsonl": "{"text":"The molecule with the InChI representation of InChI=1S\/C18H23N5OS\/c1-2-7-23-15-6-4-3-5-14(15)16-17(23)19-18(21-20-16)25-13-10-22-8-11-24-12-9-22\/h3-6H,2,7-13H2,1H3 is not blocker of the potassium ion channel activity."} {"text":"The molecule with the InChI InChI=1S\/C16H17ClN2O3\/c17-13-14(18-5-6-19-7-9-22-10-8-19)16(21)12-4-2-1-3-11(12)15(13)20\/h1-4,18H,5-10H2 is not blocker of the potassium ion channel activity."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/test_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is reliable for potassium ion channel blocking.\nSELFIES: [S][=Branch1][C][=O][=Branch1][C][=O][Branch2][Ring1][=Branch1][N][C][C][O][C][C][O][C][C][O][C][C][O][C][C][O][C][C][Ring2][Ring1][C][C][=C][C][=C][Branch1][C][N][C][=C][Ring1][#Branch1]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not reliable for potassium ion channel blocking."} {"text":"Task: Please classify a molecule based on the 
description.\nDescription: A molecule that is reliable for potassium ion channel blocking.\nDeepSMILES: scNCCNCC6))C=O)cccscnc5cc9))))))))))))))ncc5cccc6\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not reliable for potassium ion channel blocking."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/train_0-10.jsonl": "{"text":"User: Can you create the DeepSMILES of a molecule that is not effective for the blocking potassium ion channel activity?\nAssistant: Yes, I'm happy to help, here you go: SCCNCCOCC6))))))))cncnccc5nn9)))cccc6))))))CCC"} {"text":"User: Can you generate the SMILES of a molecule that is not effective for the blocking potassium ion channel activity?\nAssistant: Of course, here you go: ClC1=C(NCCN2CCOCC2)C(=O)c2c(C1=O)cccc2"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/train_0-3.jsonl": "{"text":"The DeepSMILES SCCNCCOCC6))))))))cncnccc5nn9)))cccc6))))))CCC represents a molecule that is not identified as effective for the blocking potassium ion channel activity."} {"text":"The DeepSMILES ClC=CNCCNCCOCC6)))))))))C=O)ccC6=O))cccc6 is from a molecule that is not identified as effective for the blocking potassium ion channel activity."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/train_0-12.jsonl": "{"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be reliable for potassium ion channel blocking.\nAssistant: Ok, here you go, this SMILES is not reliable for potassium ion channel blocking: S(CCN1CCOCC1)c1nc2n(c3c(c2nn1)cccc3)CCC"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should not be reliable for potassium ion channel blocking.\nAssistant: Got it, this InChI is not reliable for potassium ion channel blocking: InChI=1S\/C16H17ClN2O3\/c17-13-14(18-5-6-19-7-9-22-10-8-19)16(21)12-4-2-1-3-11(12)15(13)20\/h1-4,18H,5-10H2"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/test_0-13.jsonl": "{"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should not be effective for the blocking potassium ion channel activity.\nAssistant: Ok, this SMILES is not effective for the blocking potassium ion channel activity: S(=O)(=O)(N1CCOCCOCCOCCOCCOCC1)c1ccc(N)cc1"} {"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be effective for the blocking potassium ion channel activity.\nAssistant: Ok, this DeepSMILES is not effective for the blocking potassium ion channel activity: scNCCNCC6))C=O)cccscnc5cc9))))))))))))))ncc5cccc6"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/valid_0-2.jsonl": "{"text":"Based on the InChI representation InChI=1S\/C20H24N2O7S\/c1-26-17-8-7-15(30(24,25)22-9-11-28-12-10-22)13-16(17)21-20(23)14-29-19-6-4-3-5-18(19)27-2\/h3-8,13H,9-12,14H2,1-2H3,(H,21,23), the molecule has no reliable for potassium ion channel blocking characteristics."} {"text":"Based on the SMILES Clc1c(S(=O)(=O)N2CCC(CC2)C(=O)NCc2ccc(Cl)cc2)cc([N+]([O-])=O)cc1, the molecule has no reliable for potassium ion channel blocking features."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of CCCn1c2ccccc2c2nnc(SCCN3CCOCC3)nc21 effective for the blocking potassium ion channel activity?\nConstraint: Even if you are uncertain, you must pick either a or b 
without using any additional words.\nOptions:\na True\nb False\nAnswer: b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [Cl][C][=C][Branch1][N][N][C][C][N][C][C][O][C][C][Ring1][=Branch1][C][=Branch1][C][=O][C][=C][Branch1][Branch1][C][Ring1][S][=O][C][=C][C][=C][Ring1][Branch2] reliable for potassium ion channel blocking?\nConstraint: Even if you are not sure, you must pick either a or b without using any additional words.\nOptions:\n[a] True\n[b] False\nAnswer: b"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/valid_0-1.jsonl": "{"text":"The molecule with the InChI representation of InChI=1S\/C20H24N2O7S\/c1-26-17-8-7-15(30(24,25)22-9-11-28-12-10-22)13-16(17)21-20(23)14-29-19-6-4-3-5-18(19)27-2\/h3-8,13H,9-12,14H2,1-2H3,(H,21,23) shows no blocking the potassium ion channel."} {"text":"The molecule with the canonical SMILES representation of O=C(NCc1ccc(Cl)cc1)C1CCN(S(=O)(=O)c2cc([N+](=O)[O-])ccc2Cl)CC1 exhibits no blocking the potassium ion channel."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/valid_0-13.jsonl": "{"text":"User: I want to create a InChI.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be reliable for potassium ion channel blocking.\nAssistant: Got it, this InChI is not reliable for potassium ion channel blocking: InChI=1S\/C20H24N2O7S\/c1-26-17-8-7-15(30(24,25)22-9-11-28-12-10-22)13-16(17)21-20(23)14-29-19-6-4-3-5-18(19)27-2\/h3-8,13H,9-12,14H2,1-2H3,(H,21,23)"} {"text":"User: I want to generate a SMILES.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should not be reliable for potassium ion channel blocking.\nAssistant: Got it, this SMILES is not reliable for potassium ion channel blocking: Clc1c(S(=O)(=O)N2CCC(CC2)C(=O)NCc2ccc(Cl)cc2)cc([N+]([O-])=O)cc1"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is reliable for potassium ion channel blocking.\nMolecule canonical SMILES: COc1ccc(S(=O)(=O)N2CCOCC2)cc1NC(=O)COc1ccccc1OC\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is reliable for potassium ion channel blocking.\nMolecule InChI: InChI=1S\/C19H19Cl2N3O5S\/c20-15-3-1-13(2-4-15)12-22-19(25)14-7-9-23(10-8-14)30(28,29)18-11-16(24(26)27)5-6-17(18)21\/h1-6,11,14H,7-10,12H2,(H,22,25)\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/train_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not effective for the blocking potassium ion channel activity?\nConstraint: You must select none, one or more options from 1 or 2 without using any other words.\nOptions:\n1: S(CCN1CCOCC1)c1nc2n(c3c(c2nn1)cccc3)CCC\n2: Fc1ccc(COC(=O)CNC(=O)CNC(=O)c2occc2)cc1\nAnswer: 1, 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not effective for the blocking potassium ion channel activity?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any other words.\nOptions:\n1: COC(=O)CN(C#N)c1nc(N(C)C)nc(N(C)C)n1\n2: O=C1C(Cl)=C(NCCN2CCOCC2)C(=O)c2ccccc21\n3: 
O=C(CSc1nc(-c2ccccc2)c(-c2ccccc2)[nH]1)Nc1ccc2c(c1)OCO2\n4: Fc1ccc(CN2CCN(c3ncnc4scc(-c5ccc(F)cc5)c34)CC2)cc1\nAnswer: 1, 2, 3, 4"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/valid_0-4.jsonl": "{"text":"The canonical SMILES COc1ccc(S(=O)(=O)N2CCOCC2)cc1NC(=O)COc1ccccc1OC is not reliable for potassium ion channel blocking."} {"text":"The DeepSMILES ClccS=O)=O)NCCCCC6))C=O)NCccccCl)cc6))))))))))))))cc[N+][O-])=O))cc6 is not effective for the blocking potassium ion channel activity."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is effective for the blocking potassium ion channel activity.\nMolecule SELFIES: [S][Branch1][O][C][C][N][C][C][O][C][C][Ring1][=Branch1][C][=N][C][N][Branch2][Ring1][C][C][=C][Branch1][Branch2][C][=Ring1][Branch1][N][=N][Ring1][=Branch2][C][=C][C][=C][Ring1][=Branch2][C][C][C]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is reliable for potassium ion channel blocking.\nInChI: InChI=1S\/C16H17ClN2O3\/c17-13-14(18-5-6-19-7-9-22-10-8-19)16(21)12-4-2-1-3-11(12)15(13)20\/h1-4,18H,5-10H2\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/valid_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not effective for the blocking potassium ion channel activity?\nConstraint: You must select none, one or more options from A, B, or C without using any other words.\nOptions:\nA. COc1ccc(S(=O)(=O)N2CCOCC2)cc1NC(=O)COc1ccccc1OC\nB. COC(=O)C(CCSC)NC(=O)Nc1ccc(OC)cc1\nC. 
COc1cc2c(cc1OC)C(c1ccc(N(C)C)cc1)CC(=O)N2\nAnswer: A, B, C"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not reliable for potassium ion channel blocking?\nConstraint: You must select none, one or more options from a, b, c, or d without using any additional words.\nOptions:\na BrcccCl)cNC=O)NCC)C)))))cc6\nb Clccccc[nH]c5cc9))))C=O)NCCNCC6))C=O\nc SCcccccc6))C))))))cnN)c=O)cnn6))C\nd ClccS=O)=O)NCCCCC6))C=O)NCccccCl)cc6))))))))))))))cc[N+][O-])=O))cc6\nAnswer: a, b, c, d"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/valid_0-12.jsonl": "{"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very curious. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be effective for the blocking potassium ion channel activity.\nAssistant: Ok, here you go, this DeepSMILES is not effective for the blocking potassium ion channel activity: S=O)=O)NCCOCC6))))))cccNC=O)COccOC))cccc6))))))))))cOC))cc6"} {"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should not be effective for the blocking potassium ion channel activity.\nAssistant: Got it, here you go, this canonical SMILES is not effective for the blocking potassium ion channel activity: O=C(NCc1ccc(Cl)cc1)C1CCN(S(=O)(=O)c2cc([N+](=O)[O-])ccc2Cl)CC1"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/train_0-2.jsonl": "{"text":"Based on the canonical SMILES representation CCCn1c2ccccc2c2nnc(SCCN3CCOCC3)nc21, the molecule has no reliable for potassium ion channel blocking properties."} {"text":"Based on the DeepSMILES ClC=CNCCNCCOCC6)))))))))C=O)ccC6=O))cccc6, the molecule has no effective for the blocking potassium ion channel activity features."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/test_0-11.jsonl": "{"text":"User: I'm looking for the InChI of a molecule that is not effective for the blocking potassium ion channel activity?\nAssistant: This is a molecule that is not effective for the blocking potassium ion channel activity: InChI=1S\/C18H30N2O7S\/c19-17-1-3-18(4-2-17)28(21,22)20-5-7-23-9-11-25-13-15-27-16-14-26-12-10-24-8-6-20\/h1-4H,5-16,19H2"} {"text":"User: I'm searching for the InChI of a molecule that is not reliable for potassium ion channel blocking?\nAssistant: This is a molecule that is not reliable for potassium ion channel blocking: InChI=1S\/C19H16N4OS2\/c24-18(13-5-6-14-17(11-13)25-12-20-14)22-7-9-23(10-8-22)19-21-15-3-1-2-4-16(15)26-19\/h1-6,11-12H,7-10H2"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/train_0-7.jsonl": "{"text":"Task: Please generate a DeepSMILES based on the text description.\nDescription: A molecule that is reliable for potassium ion channel blocking.\nResult: SCCNCCOCC6))))))))cncnccc5nn9)))cccc6))))))CCC"} {"text":"Task: Please generate a molecule canonical SMILES based on the text description.\nDescription: A molecule that is effective for the blocking potassium ion channel activity.\nResult: O=C1C(Cl)=C(NCCN2CCOCC2)C(=O)c2ccccc21"}", 
"/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/train_0-11.jsonl": "{"text":"User: I'm looking for the InChI of a molecule that is not effective for the blocking potassium ion channel activity?\nAssistant: This is a molecule that is not effective for the blocking potassium ion channel activity: InChI=1S\/C18H23N5OS\/c1-2-7-23-15-6-4-3-5-14(15)16-17(23)19-18(21-20-16)25-13-10-22-8-11-24-12-9-22\/h3-6H,2,7-13H2,1H3"} {"text":"User: I'm looking for the canonical SMILES of a molecule that is not effective for the blocking potassium ion channel activity?\nAssistant: This is a molecule that is not effective for the blocking potassium ion channel activity: O=C1C(Cl)=C(NCCN2CCOCC2)C(=O)c2ccccc21"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/train_0-1.jsonl": "{"text":"The molecule with the SELFIES representation of [S][Branch1][O][C][C][N][C][C][O][C][C][Ring1][=Branch1][C][=N][C][N][Branch2][Ring1][C][C][=C][Branch1][Branch2][C][=Ring1][Branch1][N][=N][Ring1][=Branch2][C][=C][C][=C][Ring1][=Branch2][C][C][C] exhibits no blocking the potassium ion channel."} {"text":"The molecule with the SELFIES representation of [Cl][C][=C][Branch1][N][N][C][C][N][C][C][O][C][C][Ring1][=Branch1][C][=Branch1][C][=O][C][=C][Branch1][Branch1][C][Ring1][S][=O][C][=C][C][=C][Ring1][Branch2] displays no blocking the potassium ion channel."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/train_0-13.jsonl": "{"text":"User: I want to create a molecule SELFIES.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should not be effective for the blocking potassium ion channel activity.\nAssistant: Understood, this SELFIES is not effective for the blocking potassium ion channel activity: [S][Branch1][O][C][C][N][C][C][O][C][C][Ring1][=Branch1][C][=N][C][N][Branch2][Ring1][C][C][=C][Branch1][Branch2][C][=Ring1][Branch1][N][=N][Ring1][=Branch2][C][=C][C][=C][Ring1][=Branch2][C][C][C]"} {"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be reliable for potassium ion channel blocking.\nAssistant: Ok, this DeepSMILES is not reliable for potassium ion channel blocking: ClC=CNCCNCCOCC6)))))))))C=O)ccC6=O))cccc6"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/train_0-4.jsonl": "{"text":"The molecule SMILES S(CCN1CCOCC1)c1nc2n(c3c(c2nn1)cccc3)CCC is not reliable for potassium ion channel blocking."} {"text":"The SMILES ClC1=C(NCCN2CCOCC2)C(=O)c2c(C1=O)cccc2 is not effective for the blocking potassium ion channel activity."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/test_0-7.jsonl": "{"text":"Task: Please generate a molecule InChI based on the text description below.\nDescription: A molecule that is effective for the blocking potassium ion channel activity.\nResult: InChI=1S\/C18H30N2O7S\/c19-17-1-3-18(4-2-17)28(21,22)20-5-7-23-9-11-25-13-15-27-16-14-26-12-10-24-8-6-20\/h1-4H,5-16,19H2"} {"text":"Task: Please generate a SELFIES based on the text description below.\nDescription: A molecule that is effective for the blocking potassium ion channel activity.\nResult: [S][C][Branch2][Ring1][N][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][=Branch1][C][=O][C][=C][C][S][C][=N][C][=Ring1][Branch1][C][=C][Ring1][=Branch2][=N][C][=C][Ring2][Ring1][=Branch1][C][=C][C][=C][Ring1][=Branch1]"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/train_0-9.jsonl": 
"{"text":"User: Is the molecule with the canonical SMILES CCCn1c2ccccc2c2nnc(SCCN3CCOCC3)nc21 effective for the blocking potassium ion channel activity?\nAssistant: No, it is not effective for the blocking potassium ion channel activity."} {"text":"User: Is the molecule with the DeepSMILES ClC=CNCCNCCOCC6)))))))))C=O)ccC6=O))cccc6 effective for the blocking potassium ion channel activity?\nAssistant: No, it is not effective for the blocking potassium ion channel activity."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/valid_0-3.jsonl": "{"text":"The canonical SMILES COc1ccc(S(=O)(=O)N2CCOCC2)cc1NC(=O)COc1ccccc1OC is from a molecule that is not identified as reliable for potassium ion channel blocking."} {"text":"The canonical SMILES O=C(NCc1ccc(Cl)cc1)C1CCN(S(=O)(=O)c2cc([N+](=O)[O-])ccc2Cl)CC1 is from a molecule that is not identified as effective for the blocking potassium ion channel activity."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/test_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the SMILES S(=O)(=O)(N1CCOCCOCCOCCOCCOCC1)c1ccc(N)cc1 is effective for the blocking potassium ion channel activity?\nAssistant: No, this molecule is not effective for the blocking potassium ion channel activity."} {"text":"User: Can you tell me if the molecule with the SELFIES [S][C][Branch2][Ring1][N][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][=Branch1][C][=O][C][=C][C][S][C][=N][C][=Ring1][Branch1][C][=C][Ring1][=Branch2][=N][C][=C][Ring2][Ring1][=Branch1][C][=C][C][=C][Ring1][=Branch1] is effective for the blocking potassium ion channel activity?\nAssistant: No, this molecule is not effective for the blocking potassium ion channel activity."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of 
[S][=Branch1][C][=O][=Branch1][C][=O][Branch2][Ring1][=Branch1][N][C][C][O][C][C][O][C][C][O][C][C][O][C][C][O][C][C][Ring2][Ring1][C][C][=C][C][=C][Branch1][C][N][C][=C][Ring1][#Branch1] reliable for potassium ion channel blocking?\nConstraint: Even if you are uncertain, you must pick either A or B without using any additional words.\nOptions:\nA: True\nB: False\nAnswer: B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI representation of InChI=1S\/C19H16N4OS2\/c24-18(13-5-6-14-17(11-13)25-12-20-14)22-7-9-23(10-8-22)19-21-15-3-1-2-4-16(15)26-19\/h1-6,11-12H,7-10H2 reliable for potassium ion channel blocking?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any additional words.\nOptions:\n1. False\n2. True\nAnswer: 1"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES [S][=Branch1][C][=O][=Branch1][C][=O][Branch1][=Branch2][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][Branch2][Ring1][Ring2][N][C][=Branch1][C][=O][C][O][C][=C][Branch1][Ring1][O][C][C][=C][C][=C][Ring1][Branch2][=C][Branch1][Ring1][O][C][C][=C][Ring2][Ring1][Branch1] reliable for potassium ion channel blocking?\nConstraint: Even if you are not sure, you must pick either A or B without using any additional words.\nOptions:\n(A) False\n(B) True\nAnswer: A"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES representation of Clc1c(S(=O)(=O)N2CCC(CC2)C(=O)NCc2ccc(Cl)cc2)cc([N+]([O-])=O)cc1 reliable for potassium ion channel blocking?\nConstraint: Even if you are uncertain, you must pick either A or B without using any other words.\nOptions:\n(A) False\n(B) True\nAnswer: A"}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/test_0-4.jsonl": "{"text":"The molecule DeepSMILES 
S=O)=O)NCCOCCOCCOCCOCCOCC%18))))))))))))))))))ccccN)cc6 is not reliable for potassium ion channel blocking."} {"text":"The SELFIES [S][C][Branch2][Ring1][N][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][=Branch1][C][=O][C][=C][C][S][C][=N][C][=Ring1][Branch1][C][=C][Ring1][=Branch2][=N][C][=C][Ring2][Ring1][=Branch1][C][=C][C][=C][Ring1][=Branch1] is not effective for the blocking potassium ion channel activity."}", "/scratch/micpie/export/potassium_ion_channel_kir2_1_butkiewicz/test_0-12.jsonl": "{"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be effective for the blocking potassium ion channel activity.\nAssistant: Ok, here you go, this DeepSMILES is not effective for the blocking potassium ion channel activity: S=O)=O)NCCOCCOCCOCCOCCOCC%18))))))))))))))))))ccccN)cc6"} {"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be effective for the blocking potassium ion channel activity.\nAssistant: Got it, here you go, this SELFIES is not effective for the blocking potassium ion channel activity: [S][C][Branch2][Ring1][N][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][=Branch1][C][=O][C][=C][C][S][C][=N][C][=Ring1][Branch1][C][=C][Ring1][=Branch2][=N][C][=C][Ring2][Ring1][=Branch1][C][=C][C][=C][Ring1][=Branch1]"}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/test_0-10.jsonl": "{"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should have a PPBR of 98.290 %.\nAssistant: Got it, this DeepSMILES represents a molecule that has a PPBR of 98.290 %: CCCCCOccccC=O)NCC=O)O))CC)C)))))cc6"} {"text":"User: I want to come up with a InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should have a human plasma protein binding rate of 54.020 %.\nAssistant: Got it, this InChI represents a molecule that has a human plasma protein binding rate of 54.020 %: InChI=1S\/C22H28N8O5S\/c1-22(2,3)28-36(33,34)17-10-16(11-23-12-17)15-4-5-18-25-20(27-30(18)14-15)26-21(32)24-13-19(31)29-6-8-35-9-7-29\/h4-5,10-12,14,28H,6-9,13H2,1-3H3,(H2,24,26,27,32)"}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/valid_0-8.jsonl": "{"text":"User: Can you give me the SMILES of a molecule that has a PPBR of 95.430 %?\nAssistant: Of course, here you go: COCC[C@H](Oc1ncnc2c1cnn2-c1ccccc1Cl)C(=O)Nc1ccc(C)cn1"} {"text":"User: Can you create the SELFIES of a molecule that has a human plasma protein binding rate of 92.320 %?\nAssistant: Yes, here you go: [C][C][=C][Branch1][#Branch1][C][C][=Branch1][C][=O][O][C][=C][C][Branch1][C][F][=C][C][=C][Ring1][#Branch1][N][Ring1][=C][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=Branch2][S][Branch1][C][C][=Branch1][C][=O][=O][C][=C][Ring1][#Branch2]"}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/train_0-8.jsonl": "{"text":"User: Can you give me the canonical SMILES of a molecule that has a PPBR of 71.530 %?\nAssistant: Of course, here you go: O=CNc1cc([C@@H](O)CNCCc2ccc(NC[C@H](O)c3ccccc3)cc2)ccc1O"} {"text":"User: Can you create the SMILES of a molecule that has a PPBR of 88.820 %?\nAssistant: Yes, here you go: CC(O)c1ccc2c(c1)N(CCN1CCC(NCc3ccc4c(n3)NC(=O)CO4)CC1)C(=O)CO2"}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/test_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: 
Predict the PPBR in %.\ncanonical SMILES: CCCCCOc1ccc(C(=O)NC(C(=O)O)C(C)C)cc1\nConstraint: Even if you are not sure, you must answer with a numeric value in % without the unit and without using any other words.\nResult: 98.290"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the human plasma protein binding rate (PPBR) in %.\nMolecule SMILES: CC(C)(C)NS(=O)(=O)c1cncc(-c2ccc3nc(NC(=O)NCC(=O)N4CCOCC4)nn3c2)c1\nConstraint: Even if you are not sure, you must answer with a numeric value in % without the unit and without using any additional words.\nResult: 54.020"}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/valid_0-9.jsonl": "{"text":"User: I'm looking for the DeepSMILES of a molecule that has a human plasma protein binding rate (PPBR) of 95.430 %.\nAssistant: This is a molecule that has a human plasma protein binding rate (PPBR) of 95.430 %: COCC[C@H]Ocncncc6cnn5-cccccc6Cl)))))))))))))))))C=O)NccccC)cn6"} {"text":"User: I'm looking for the SMILES of a molecule that has a human plasma protein binding rate (PPBR) of 92.320 %.\nAssistant: This is a molecule that has a human plasma protein binding rate (PPBR) of 92.320 %: Cc1c(CC(=O)O)c2cc(F)ccc2n1S(=O)(=O)c1ccc(S(C)(=O)=O)cc1"}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/test_0-1.jsonl": "{"text":"Based on the canonical SMILES representation of CCCCCOc1ccc(C(=O)NC(C(=O)O)C(C)C)cc1, the molecule has a human plasma protein binding rate (PPBR) of 98.290 %."} {"text":"Based on the DeepSMILES representation of CCC)C)NS=O)=O)ccncc-ccccncNC=O)NCC=O)NCCOCC6)))))))))))nn5c9)))))))))c6, the molecule has a human plasma protein binding rate of 54.020 %."}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/valid_0-0.jsonl": "{"text":"The molecule with the SELFIES 
[C][O][C][C][C@H1][Branch2][Ring1][Branch2][O][C][=N][C][=N][C][=C][Ring1][=Branch1][C][=N][N][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][Cl][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][C][C][C][=N][Ring1][#Branch1] has a PPBR of 95.430 %."} {"text":"The molecule with the InChI representation of InChI=1S\/C18H16FNO6S2\/c1-11-15(10-18(21)22)16-9-12(19)3-8-17(16)20(11)28(25,26)14-6-4-13(5-7-14)27(2,23)24\/h3-9H,10H2,1-2H3,(H,21,22) has a PPBR of 92.320 %."}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/test_0-2.jsonl": "{"text":"The InChI InChI=1S\/C17H25NO4\/c1-4-5-6-11-22-14-9-7-13(8-10-14)16(19)18-15(12(2)3)17(20)21\/h7-10,12,15H,4-6,11H2,1-3H3,(H,18,19)(H,20,21) represents a molecule that has a human plasma protein binding rate (PPBR) of 98.290 %."} {"text":"The DeepSMILES CCC)C)NS=O)=O)ccncc-ccccncNC=O)NCC=O)NCCOCC6)))))))))))nn5c9)))))))))c6 is representing a molecule that has a PPBR of 54.020 %."}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/valid_0-10.jsonl": "{"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should have a human plasma protein binding rate of 95.430 %.\nAssistant: Ok, this DeepSMILES represents a molecule that has a human plasma protein binding rate of 95.430 %: COCC[C@H]Ocncncc6cnn5-cccccc6Cl)))))))))))))))))C=O)NccccC)cn6"} {"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should have a PPBR of 92.320 %.\nAssistant: Got it, here you go, this canonical SMILES represents a molecule that has a PPBR of 92.320 %: Cc1c(CC(=O)O)c2cc(F)ccc2n1S(=O)(=O)c1ccc(S(C)(=O)=O)cc1"}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/train_0-6.jsonl": "{"text":"Task: Please create a molecule canonical SMILES based on the text description.\nDescription: A molecule that has a human plasma protein binding rate of 71.530 %.\nResult: O=CNc1cc([C@@H](O)CNCCc2ccc(NC[C@H](O)c3ccccc3)cc2)ccc1O"} {"text":"Task: Please create a molecule DeepSMILES based on the description.\nDescription: A molecule that has a human plasma protein binding rate of 88.820 %.\nResult: CCO)cccccc6)NCCNCCCNCcccccn6)NC=O)CO6)))))))))))CC6))))))))C=O)CO6"}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/valid_0-6.jsonl": "{"text":"Task: Please generate a molecule SELFIES based on the description below.\nDescription: A molecule that has a human plasma protein binding rate of 95.430 %.\nResult: [C][O][C][C][C@H1][Branch2][Ring1][Branch2][O][C][=N][C][=N][C][=C][Ring1][=Branch1][C][=N][N][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][Cl][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][C][C][C][=N][Ring1][#Branch1]"} {"text":"Task: Please create a InChI based on the text description below.\nDescription: A molecule that has a PPBR of 92.320 %.\nResult: InChI=1S\/C18H16FNO6S2\/c1-11-15(10-18(21)22)16-9-12(19)3-8-17(16)20(11)28(25,26)14-6-4-13(5-7-14)27(2,23)24\/h3-9H,10H2,1-2H3,(H,21,22)"}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/test_0-9.jsonl": "{"text":"User: I'm looking for the SELFIES of a molecule that has a PPBR of 98.290 %.\nAssistant: This is a molecule that has a PPBR of 98.290 %: [C][C][C][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][=Branch1][C][=O][O][C][Branch1][C][C][C][C][=C][Ring1][S]"} {"text":"User: I'm looking for the SELFIES of a molecule that has a 
human plasma protein binding rate (PPBR) of 54.020 %.\nAssistant: This is a molecule that has a human plasma protein binding rate (PPBR) of 54.020 %: [C][C][Branch1][C][C][Branch1][C][C][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][N][=C][C][Branch2][Ring2][=Branch1][C][C][=C][C][=N][C][Branch2][Ring1][Ring2][N][C][=Branch1][C][=O][N][C][C][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][=N][N][Ring2][Ring1][C][C][=Ring2][Ring1][=Branch1][=C][Ring2][Ring1][N]"}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/test_0-0.jsonl": "{"text":"The molecule with the SMILES CCCCCOc1ccc(C(=O)NC(C(=O)O)C(C)C)cc1 has a PPBR of 98.290 %."} {"text":"The molecule with the DeepSMILES representation of CCC)C)NS=O)=O)ccncc-ccccncNC=O)NCC=O)NCCOCC6)))))))))))nn5c9)))))))))c6 has a PPBR of 54.020 %."}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/valid_0-7.jsonl": "{"text":"User: Can you tell me the human plasma protein binding rate in % of the molecule with the DeepSMILES COCC[C@H]Ocncncc6cnn5-cccccc6Cl)))))))))))))))))C=O)NccccC)cn6?\nAssistant: Of course, this molecule has a human plasma protein binding rate of 95.430 %."} {"text":"User: Can you estimate the PPBR in % of the molecule with the SELFIES [C][C][=C][Branch1][#Branch1][C][C][=Branch1][C][=O][O][C][=C][C][Branch1][C][F][=C][C][=C][Ring1][#Branch1][N][Ring1][=C][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=Branch2][S][Branch1][C][C][=Branch1][C][=O][=O][C][=C][Ring1][#Branch2]?\nAssistant: Yes, this molecule has a PPBR of 92.320 %."}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/test_0-3.jsonl": "{"text":"The molecule with the canonical SMILES CCCCCOc1ccc(C(=O)NC(C(=O)O)C(C)C)cc1 has a human plasma protein binding rate of 98.290 %."} {"text":"The molecule with the SELFIES 
[C][C][Branch1][C][C][Branch1][C][C][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][N][=C][C][Branch2][Ring2][=Branch1][C][C][=C][C][=N][C][Branch2][Ring1][Ring2][N][C][=Branch1][C][=O][N][C][C][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][=N][N][Ring2][Ring1][C][C][=Ring2][Ring1][=Branch1][=C][Ring2][Ring1][N] has a human plasma protein binding rate (PPBR) of 54.020 %."}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/valid_0-11.jsonl": "{"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should have a human plasma protein binding rate of 95.430 %.\nAssistant: Got it, this DeepSMILES represents a molecule that has a human plasma protein binding rate of 95.430 %: COCC[C@H]Ocncncc6cnn5-cccccc6Cl)))))))))))))))))C=O)NccccC)cn6"} {"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should have a human plasma protein binding rate of 92.320 %.\nAssistant: Understood, this InChI represents a molecule that has a human plasma protein binding rate of 92.320 %: InChI=1S\/C18H16FNO6S2\/c1-11-15(10-18(21)22)16-9-12(19)3-8-17(16)20(11)28(25,26)14-6-4-13(5-7-14)27(2,23)24\/h3-9H,10H2,1-2H3,(H,21,22)"}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/train_0-0.jsonl": "{"text":"The molecule with the SMILES representation of O=CNc1cc([C@@H](O)CNCCc2ccc(NC[C@H](O)c3ccccc3)cc2)ccc1O has a human plasma protein binding rate (PPBR) of 71.530 %."} {"text":"The molecule with the DeepSMILES representation of CCO)cccccc6)NCCNCCCNCcccccn6)NC=O)CO6)))))))))))CC6))))))))C=O)CO6 has a human plasma protein binding rate of 88.820 %."}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/test_0-6.jsonl": "{"text":"Task: Please generate a SMILES based on the description.\nDescription: A molecule that has a PPBR of 98.290 %.\nResult: 
CCCCCOc1ccc(C(=O)NC(C(=O)O)C(C)C)cc1"} {"text":"Task: Please generate a DeepSMILES based on the text description.\nDescription: A molecule that has a human plasma protein binding rate (PPBR) of 54.020 %.\nResult: CCC)C)NS=O)=O)ccncc-ccccncNC=O)NCC=O)NCCOCC6)))))))))))nn5c9)))))))))c6"}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/train_0-10.jsonl": "{"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should have a human plasma protein binding rate of 71.530 %.\nAssistant: Got it, here you go, this DeepSMILES represents a molecule that has a human plasma protein binding rate of 71.530 %: O=CNccc[C@@H]O)CNCCccccNC[C@H]O)cccccc6)))))))))cc6)))))))))))ccc6O"} {"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should have a human plasma protein binding rate of 88.820 %.\nAssistant: Got it, here you go, this canonical SMILES represents a molecule that has a human plasma protein binding rate of 88.820 %: CC(O)c1ccc2c(c1)N(CCN1CCC(NCc3ccc4c(n3)NC(=O)CO4)CC1)C(=O)CO2"}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/train_0-3.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C25H29N3O4\/c29-17-28-22-14-20(8-11-23(22)30)24(31)15-26-13-12-18-6-9-21(10-7-18)27-16-25(32)19-4-2-1-3-5-19\/h1-11,14,17,24-27,30-32H,12-13,15-16H2,(H,28,29)\/t24-,25-\/m0\/s1 has a human plasma protein binding rate (PPBR) of 71.530 %."} {"text":"The molecule with the SELFIES [C][C][Branch1][C][O][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][N][Branch2][Ring2][Ring2][C][C][N][C][C][C][Branch2][Ring1][=Branch1][N][C][C][=C][C][=C][C][=Branch1][Ring2][=N][Ring1][=Branch1][N][C][=Branch1][C][=O][C][O][Ring1][Branch2][C][C][Ring2][Ring1][Ring1][C][=Branch1][C][=O][C][O][Ring2][Ring1][=N] 
has a PPBR of 88.820 %."}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/valid_0-2.jsonl": "{"text":"The SELFIES [C][O][C][C][C@H1][Branch2][Ring1][Branch2][O][C][=N][C][=N][C][=C][Ring1][=Branch1][C][=N][N][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][Cl][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][C][C][C][=N][Ring1][#Branch1] is representing a molecule with a PPBR of 95.430 %."} {"text":"The canonical SMILES Cc1c(CC(=O)O)c2cc(F)ccc2n1S(=O)(=O)c1ccc(S(C)(=O)=O)cc1 represents a molecule with a PPBR of 92.320 %."}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/valid_0-1.jsonl": "{"text":"Based on the InChI representation of InChI=1S\/C22H21ClN6O3\/c1-14-7-8-19(24-11-14)28-21(30)18(9-10-31-2)32-22-15-12-27-29(20(15)25-13-26-22)17-6-4-3-5-16(17)23\/h3-8,11-13,18H,9-10H2,1-2H3,(H,24,28,30)\/t18-\/m0\/s1, the molecule has a human plasma protein binding rate (PPBR) of 95.430 %."} {"text":"Based on the SMILES Cc1c(CC(=O)O)c2cc(F)ccc2n1S(=O)(=O)c1ccc(S(C)(=O)=O)cc1, the molecule has a human plasma protein binding rate of 92.320 %."}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/valid_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the PPBR in %.\ncanonical SMILES: COCC[C@H](Oc1ncnc2c1cnn2-c1ccccc1Cl)C(=O)Nc1ccc(C)cn1\nConstraint: Even if you are not sure, you must answer with a numeric value in % without the unit and without using any additional words.\nResult: 95.430"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the PPBR in %.\nSMILES: Cc1c(CC(=O)O)c2cc(F)ccc2n1S(=O)(=O)c1ccc(S(C)(=O)=O)cc1\nConstraint: Even if you are uncertain, you must answer with a numeric value in % without the unit and without using any other words.\nResult: 92.320"}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/valid_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the 
description.\nDescription: Predict the human plasma protein binding rate in %.\nMolecule SELFIES: [C][O][C][C][C@H1][Branch2][Ring1][Branch2][O][C][=N][C][=N][C][=C][Ring1][=Branch1][C][=N][N][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][Cl][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][C][C][C][=N][Ring1][#Branch1]\nConstraint: Even if you are uncertain, you must answer with a numeric value in % without using any other words.\nResult: 95.430 %"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the human plasma protein binding rate in %.\nMolecule SELFIES: [C][C][=C][Branch1][#Branch1][C][C][=Branch1][C][=O][O][C][=C][C][Branch1][C][F][=C][C][=C][Ring1][#Branch1][N][Ring1][=C][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=Branch2][S][Branch1][C][C][=Branch1][C][=O][=O][C][=C][Ring1][#Branch2]\nConstraint: Even if you are not sure, you must answer with a numeric value in % without using any additional words.\nResult: 92.320 %"}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/train_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the human plasma protein binding rate (PPBR) in %.\nMolecule DeepSMILES: O=CNccc[C@@H]O)CNCCccccNC[C@H]O)cccccc6)))))))))cc6)))))))))))ccc6O\nConstraint: Even if you are uncertain, you must answer with a numeric value in % without the unit and without using any other words.\nResult: 71.530"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the human plasma protein binding rate (PPBR) in %.\nDeepSMILES: CCO)cccccc6)NCCNCCCNCcccccn6)NC=O)CO6)))))))))))CC6))))))))C=O)CO6\nConstraint: Even if you are uncertain, you must answer with a numeric value in % without the unit and without using any other words.\nResult: 88.820"}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/train_0-2.jsonl": "{"text":"The SMILES 
O=CNc1cc([C@@H](O)CNCCc2ccc(NC[C@H](O)c3ccccc3)cc2)ccc1O is representing a molecule that has a human plasma protein binding rate (PPBR) of 71.530 %."} {"text":"The DeepSMILES CCO)cccccc6)NCCNCCCNCcccccn6)NC=O)CO6)))))))))))CC6))))))))C=O)CO6 represents a molecule with a human plasma protein binding rate (PPBR) of 88.820 %."}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/test_0-11.jsonl": "{"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should have a human plasma protein binding rate of 98.290 %.\nAssistant: Understood, this InChI represents a molecule that has a human plasma protein binding rate of 98.290 %: InChI=1S\/C17H25NO4\/c1-4-5-6-11-22-14-9-7-13(8-10-14)16(19)18-15(12(2)3)17(20)21\/h7-10,12,15H,4-6,11H2,1-3H3,(H,18,19)(H,20,21)"} {"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should have a human plasma protein binding rate (PPBR) of 54.020 %.\nAssistant: Got it, this SELFIES represents a molecule that has a human plasma protein binding rate (PPBR) of 54.020 %: [C][C][Branch1][C][C][Branch1][C][C][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][N][=C][C][Branch2][Ring2][=Branch1][C][C][=C][C][=N][C][Branch2][Ring1][Ring2][N][C][=Branch1][C][=O][N][C][C][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][=N][N][Ring2][Ring1][C][C][=Ring2][Ring1][=Branch1][=C][Ring2][Ring1][N]"}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/train_0-7.jsonl": "{"text":"User: Can you tell me the human plasma protein binding rate (PPBR) in % of the molecule with the canonical SMILES O=CNc1cc([C@@H](O)CNCCc2ccc(NC[C@H](O)c3ccccc3)cc2)ccc1O?\nAssistant: Sure, this molecule has a human plasma protein binding rate (PPBR) of 71.530 %."} {"text":"User: Can you tell me the PPBR in % of the molecule with the SELFIES 
[C][C][Branch1][C][O][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][N][Branch2][Ring2][Ring2][C][C][N][C][C][C][Branch2][Ring1][=Branch1][N][C][C][=C][C][=C][C][=Branch1][Ring2][=N][Ring1][=Branch1][N][C][=Branch1][C][=O][C][O][Ring1][Branch2][C][C][Ring2][Ring1][Ring1][C][=Branch1][C][=O][C][O][Ring2][Ring1][=N]?\nAssistant: Of course, this molecule has a PPBR of 88.820 %."}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/train_0-11.jsonl": "{"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should have a PPBR of 71.530 %.\nAssistant: Ok, this SMILES represents a molecule that has a PPBR of 71.530 %: O=CNc1cc([C@@H](O)CNCCc2ccc(NC[C@H](O)c3ccccc3)cc2)ccc1O"} {"text":"User: I want to come up with a SELFIES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should have a human plasma protein binding rate of 88.820 %.\nAssistant: Got it, this SELFIES represents a molecule that has a human plasma protein binding rate of 88.820 %: [C][C][Branch1][C][O][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][N][Branch2][Ring2][Ring2][C][C][N][C][C][C][Branch2][Ring1][=Branch1][N][C][C][=C][C][=C][C][=Branch1][Ring2][=N][Ring1][=Branch1][N][C][=Branch1][C][=O][C][O][Ring1][Branch2][C][C][Ring2][Ring1][Ring1][C][=Branch1][C][=O][C][O][Ring2][Ring1][=N]"}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/train_0-1.jsonl": "{"text":"Based on the InChI representation of InChI=1S\/C25H29N3O4\/c29-17-28-22-14-20(8-11-23(22)30)24(31)15-26-13-12-18-6-9-21(10-7-18)27-16-25(32)19-4-2-1-3-5-19\/h1-11,14,17,24-27,30-32H,12-13,15-16H2,(H,28,29)\/t24-,25-\/m0\/s1, the molecule has a human plasma protein binding rate of 71.530 %."} {"text":"Based on the InChI representation of 
InChI=1S\/C25H31N5O5\/c1-16(31)17-2-4-21-20(12-17)30(24(33)15-35-21)11-10-29-8-6-18(7-9-29)26-13-19-3-5-22-25(27-19)28-23(32)14-34-22\/h2-5,12,16,18,26,31H,6-11,13-15H2,1H3,(H,27,28,32), the molecule has a human plasma protein binding rate (PPBR) of 88.820 %."}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/train_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the human plasma protein binding rate (PPBR) in %.\nDeepSMILES: O=CNccc[C@@H]O)CNCCccccNC[C@H]O)cccccc6)))))))))cc6)))))))))))ccc6O\nConstraint: Even if you are uncertain, you must answer with a numeric value in % without using any additional words.\nResult: 71.530 %"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the human plasma protein binding rate (PPBR) in %.\nDeepSMILES: CCO)cccccc6)NCCNCCCNCcccccn6)NC=O)CO6)))))))))))CC6))))))))C=O)CO6\nConstraint: Even if you are uncertain, you must answer with a numeric value in % without using any additional words.\nResult: 88.820 %"}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/test_0-7.jsonl": "{"text":"User: Can you derive the human plasma protein binding rate in % of the molecule with the DeepSMILES CCCCCOccccC=O)NCC=O)O))CC)C)))))cc6?\nAssistant: Sure, this molecule has a human plasma protein binding rate of 98.290 %."} {"text":"User: Can you tell me the human plasma protein binding rate (PPBR) in % of the molecule with the SMILES CC(C)(C)NS(=O)(=O)c1cncc(-c2ccc3nc(NC(=O)NCC(=O)N4CCOCC4)nn3c2)c1?\nAssistant: Of course, this molecule has a human plasma protein binding rate (PPBR) of 54.020 %."}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/train_0-9.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that has a human plasma protein binding rate (PPBR) of 71.530 %.\nAssistant: This is a molecule that has a human plasma protein binding rate (PPBR) of 71.530 %: 
O=CNc1cc([C@@H](O)CNCCc2ccc(NC[C@H](O)c3ccccc3)cc2)ccc1O"} {"text":"User: I'm searching for the SMILES of a molecule that has a human plasma protein binding rate (PPBR) of 88.820 %.\nAssistant: This is a molecule that has a human plasma protein binding rate (PPBR) of 88.820 %: CC(O)c1ccc2c(c1)N(CCN1CCC(NCc3ccc4c(n3)NC(=O)CO4)CC1)C(=O)CO2"}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/valid_0-3.jsonl": "{"text":"The molecule with the DeepSMILES COCC[C@H]Ocncncc6cnn5-cccccc6Cl)))))))))))))))))C=O)NccccC)cn6 has a PPBR of 95.430 %."} {"text":"The molecule with the SELFIES [C][C][=C][Branch1][#Branch1][C][C][=Branch1][C][=O][O][C][=C][C][Branch1][C][F][=C][C][=C][Ring1][#Branch1][N][Ring1][=C][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=Branch2][S][Branch1][C][C][=Branch1][C][=O][=O][C][=C][Ring1][#Branch2] has a human plasma protein binding rate of 92.320 %."}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/test_0-8.jsonl": "{"text":"User: Can you generate the InChI of a molecule that has a PPBR of 98.290 %?\nAssistant: Of course, here you go: InChI=1S\/C17H25NO4\/c1-4-5-6-11-22-14-9-7-13(8-10-14)16(19)18-15(12(2)3)17(20)21\/h7-10,12,15H,4-6,11H2,1-3H3,(H,18,19)(H,20,21)"} {"text":"User: Can you generate the SMILES of a molecule that has a human plasma protein binding rate of 54.020 %?\nAssistant: Yes, I'm happy to help, here you go: CC(C)(C)NS(=O)(=O)c1cncc(-c2ccc3nc(NC(=O)NCC(=O)N4CCOCC4)nn3c2)c1"}", "/scratch/micpie/export/plasma_protein_binding_rate_astrazeneca/test_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the human plasma protein binding rate in %.\nMolecule InChI: InChI=1S\/C17H25NO4\/c1-4-5-6-11-22-14-9-7-13(8-10-14)16(19)18-15(12(2)3)17(20)21\/h7-10,12,15H,4-6,11H2,1-3H3,(H,18,19)(H,20,21)\nConstraint: Even if you are not sure, you must answer with a numeric value in % without using any other words.\nResult: 98.290 %"} {"text":"Task: 
Please predict a molecule feature based on the description.\nDescription: Predict the human plasma protein binding rate (PPBR) in %.\nSMILES: CC(C)(C)NS(=O)(=O)c1cncc(-c2ccc3nc(NC(=O)NCC(=O)N4CCOCC4)nn3c2)c1\nConstraint: Even if you are uncertain, you must answer with a numeric value in % without using any other words.\nResult: 54.020 %"}", "/scratch/micpie/export/mp_bulk_modulus/test_0-5.jsonl": "{"text":"Task: Please create a solid with a bulk modulus computed using DFT with the PBE functional of 58.612 GPa.\nResult: YMgZn2"} {"text":"Task: Please generate a compound with a bulk modulus derived from DFT simulations with the PBE functional of 61.805 GPa.\nResult: CaAlPd"}", "/scratch/micpie/export/mp_bulk_modulus/test_0-1.jsonl": "{"text":"Question: How large is the bulk modulus derived from DFT simulations with the PBE functional of the compound YMgZn2?\nAnswer: The bulk modulus derived from DFT simulations with the PBE functional of the compound YMgZn2 is 58.612 GPa."} {"text":"Question: How large is the bulk modulus derived from DFT simulations with the PBE functional of the compound CaAlPd?\nAnswer: The bulk modulus derived from DFT simulations with the PBE functional of the compound CaAlPd is 61.805 GPa."}", "/scratch/micpie/export/mp_bulk_modulus/valid_0-0.jsonl": "{"text":"The bulk modulus computed using DFT with the PBE GGA functional of VSi2 is 174.498 GPa."} {"text":"The bulk modulus computed using DFT with the PBE functional of the compound Li4CuF7 is 53.028 GPa."}", "/scratch/micpie/export/mp_bulk_modulus/test_0-2.jsonl": "{"text":"User: I would like to know the bulk modulus computed using DFT with the PBE functional of the solid YMgZn2.\nAssistant: The bulk modulus computed using DFT with the PBE functional of the solid YMgZn2 is 58.612 GPa."} {"text":"User: I want to know the bulk modulus computed using DFT with the PBE GGA functional of the solid CaAlPd.\nAssistant: The bulk modulus computed using DFT with the PBE GGA functional of the solid CaAlPd 
is 61.805 GPa."}", "/scratch/micpie/export/mp_bulk_modulus/test_0-0.jsonl": "{"text":"The bulk modulus computed using DFT with the PBE GGA functional of YMgZn2 is 58.612 GPa."} {"text":"The bulk modulus computed using DFT with the PBE GGA functional of CaAlPd is 61.805 GPa."}", "/scratch/micpie/export/mp_bulk_modulus/test_0-3.jsonl": "{"text":"User: I would like to design a solid with a bulk modulus derived from DFT simulations with the PBE functional of 58.612 GPa.\nAssistant: I have found a solid with a bulk modulus derived from DFT simulations with the PBE functional of 58.612 GPa: YMgZn2."} {"text":"User: I want to design a material with a bulk modulus computed using DFT with the PBE GGA functional of 61.805 GPa.\nAssistant: I have found a material with a bulk modulus computed using DFT with the PBE GGA functional of 61.805 GPa: CaAlPd."}", "/scratch/micpie/export/mp_bulk_modulus/train_0-0.jsonl": "{"text":"The bulk modulus derived from DFT simulations with the PBE functional of the solid TiNbO4 is 162.335 GPa."} {"text":"The bulk modulus computed using DFT with the PBE GGA functional of the solid VBO3 is 181.886 GPa."}", "/scratch/micpie/export/mp_bulk_modulus/train_0-3.jsonl": "{"text":"User: I want to design a material with a bulk modulus computed using DFT with the PBE functional of 162.335 GPa.\nAssistant: Here is a material with a bulk modulus computed using DFT with the PBE functional of 162.335 GPa: TiNbO4."} {"text":"User: I want to design a with a bulk modulus derived from DFT simulations with the PBE functional of 181.886 GPa.\nAssistant: Here is a with a bulk modulus derived from DFT simulations with the PBE functional of 181.886 GPa: VBO3."}", "/scratch/micpie/export/mp_bulk_modulus/valid_0-2.jsonl": "{"text":"User: I would like to know the bulk modulus computed using DFT with the PBE GGA functional of VSi2.\nAssistant: The bulk modulus computed using DFT with the PBE GGA functional of VSi2 is 174.498 GPa."} {"text":"User: I would like to know the 
bulk modulus derived from DFT simulations with the PBE functional of Li4CuF7.\nAssistant: The bulk modulus derived from DFT simulations with the PBE functional of Li4CuF7 is 53.028 GPa."}", "/scratch/micpie/export/mp_bulk_modulus/valid_0-1.jsonl": "{"text":"Question: How large is the bulk modulus derived from DFT simulations with the PBE functional of the compound VSi2?\nAnswer: The bulk modulus derived from DFT simulations with the PBE functional of the compound VSi2 is 174.498 GPa."} {"text":"Question: How large is the bulk modulus derived from DFT simulations with the PBE functional of Li4CuF7?\nAnswer: The bulk modulus derived from DFT simulations with the PBE functional of Li4CuF7 is 53.028 GPa."}", "/scratch/micpie/export/mp_bulk_modulus/valid_0-5.jsonl": "{"text":"Task: Please generate a compound with a bulk modulus computed using DFT with the PBE GGA functional of 174.498 GPa.\nResult: VSi2"} {"text":"Task: Please generate a material with a bulk modulus computed using DFT with the PBE functional of 53.028 GPa.\nResult: Li4CuF7"}", "/scratch/micpie/export/mp_bulk_modulus/valid_0-4.jsonl": "{"text":"A with a bulk modulus derived from DFT simulations with the PBE functional of 174.498 GPa is VSi2."} {"text":"A solid with a bulk modulus derived from DFT simulations with the PBE functional of 53.028 GPa is Li4CuF7."}", "/scratch/micpie/export/mp_bulk_modulus/train_0-5.jsonl": "{"text":"Task: Please create a with a bulk modulus computed using DFT with the PBE GGA functional of 162.335 GPa.\nResult: TiNbO4"} {"text":"Task: Please generate a compound with a bulk modulus computed using DFT with the PBE GGA functional of 181.886 GPa.\nResult: VBO3"}", "/scratch/micpie/export/mp_bulk_modulus/train_0-2.jsonl": "{"text":"User: I want to know the bulk modulus computed using DFT with the PBE GGA functional of the solid TiNbO4.\nAssistant: The bulk modulus computed using DFT with the PBE GGA functional of the solid TiNbO4 is 162.335 GPa."} {"text":"User: I want to know the 
bulk modulus computed using DFT with the PBE GGA functional of the solid VBO3.\nAssistant: The bulk modulus computed using DFT with the PBE GGA functional of the solid VBO3 is 181.886 GPa."}", "/scratch/micpie/export/mp_bulk_modulus/train_0-1.jsonl": "{"text":"Question: How large is the bulk modulus derived from DFT simulations with the PBE functional of the solid TiNbO4?\nAnswer: The bulk modulus derived from DFT simulations with the PBE functional of the solid TiNbO4 is 162.335 GPa."} {"text":"Question: How large is the bulk modulus computed using DFT with the PBE GGA functional of the solid VBO3?\nAnswer: The bulk modulus computed using DFT with the PBE GGA functional of the solid VBO3 is 181.886 GPa."}", "/scratch/micpie/export/mp_bulk_modulus/train_0-4.jsonl": "{"text":"A with a bulk modulus computed using DFT with the PBE GGA functional of 162.335 GPa is TiNbO4."} {"text":"A material with a bulk modulus computed using DFT with the PBE GGA functional of 181.886 GPa is VBO3."}", "/scratch/micpie/export/mp_bulk_modulus/valid_0-3.jsonl": "{"text":"User: I would like to design a material with a bulk modulus computed using DFT with the PBE functional of 174.498 GPa.\nAssistant: Here is a material with a bulk modulus computed using DFT with the PBE functional of 174.498 GPa: VSi2."} {"text":"User: I would like to design a solid with a bulk modulus computed using DFT with the PBE GGA functional of 53.028 GPa.\nAssistant: I have found a solid with a bulk modulus computed using DFT with the PBE GGA functional of 53.028 GPa: Li4CuF7."}", "/scratch/micpie/export/mp_bulk_modulus/test_0-4.jsonl": "{"text":"A with a bulk modulus derived from DFT simulations with the PBE functional of 58.612 GPa is YMgZn2."} {"text":"A with a bulk modulus computed using DFT with the PBE GGA functional of 61.805 GPa is CaAlPd."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-35.jsonl": "{"text":"User: I have to design a chemical with 0 violations of Lipinski's rule of 5.\nAssistant: 
That's interesting, do you have any other conditions?\nUser: I want the chemical to have 46 bonds.\nAssistant: Thanks, that already helps. It would be useful if you could tell me more about the chemical you are looking for.\nUser: I want the chemical to have 2 chiral centers. Moreover, I want the chemical to have a carbon mass proportion of 0.527.\nAssistant: I recommend the chemical with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1."} {"text":"User: I have to design a drug with 0 violations of Lipinski's rule of 5.\nAssistant: That's interesting, do you have any other limitations?\nUser: I would like the drug to have 58 bonds.\nAssistant: Thanks, that already helps constraining my search. It would be great if you could tell me more about the drug you are looking for.\nUser: I would like the drug to have 1 chiral centers. In addition, I want the drug to have a carbon mass proportion of 0.551.\nAssistant: I recommend the drug with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-28.jsonl": "{"text":"The molecule with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has the molecular formula C16H29N3O3."} {"text":"The chemical structure with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has the chemical formula C17H18F3N5O3."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-17.jsonl": "{"text":"The chemical with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has a hydrogen mass fraction of 0.094."} 
{"text":"The chemical with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has a hydrogen mass fraction of 0.046."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-16.jsonl": "{"text":"The molecule with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has a carbon mass fraction of 0.617."} {"text":"The compound with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has a carbon mass fraction of 0.514."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-39.jsonl": "{"text":"User: I have some questions about the molecule with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C].\nAssistant: How can I be of assistance?\nUser: I want to know the asphericity of this molecule.\nAssistant: The molecule has an asphericity of 0.240."} {"text":"User: I have some questions about the chemical structure with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+.\nAssistant: What can I do for you?\nUser: What is the asphericity of this chemical structure.\nAssistant: The chemical structure has an asphericity of 0.647."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-10.jsonl": "{"text":"The chemical structure with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has 3 nitrogen atoms."} {"text":"The chemical structure with InChI 
InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has 5 nitrogen atoms."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-8.jsonl": "{"text":"The molecule with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has 14 carbon atoms."} {"text":"The chemical with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has 18 carbon atoms."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-22.jsonl": "{"text":"A conformer of the molecule with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has an eccentricity of 0.996."} {"text":"A conformer of the chemical structure with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has an eccentricity of 0.994."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-16.jsonl": "{"text":"The chemical structure with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has a carbon mass fraction of 0.557."} {"text":"The chemical with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has a carbon mass proportion of 0.509."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-34.jsonl": "{"text":"User: I need to design a chemical with 0 violations of Lipinski's rule of five.\nAssistant: Cool, do you have any other limitations?\nUser: I want the chemical to have 51 bonds, 2 chiral centers, and a carbon mass proportion of 0.617.\nAssistant: I recommend the chemical with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C]."} {"text":"User: I need to design a molecule with 0 violations of Lipinski's rule 
of 5.\nAssistant: Do you have any other conditions?\nUser: I want the molecule to have 49 bonds, 3 chiral centers, and a carbon mass fraction of 0.514.\nAssistant: Given those requirements, I recommend the molecule with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-15.jsonl": "{"text":"The chemical structure with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has a monoisotopic mass of 323.148 Da."} {"text":"The chemical structure with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has a monoisotopic mass of 401.167 Da."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-27.jsonl": "{"text":"A conformer of the molecule with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has a principal moment of inertia 2 (PMI2) value of 2865.258."} {"text":"A conformer of the chemical with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has a PMI2 value of 9927.547."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-8.jsonl": "{"text":"The chemical with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has 16 carbon atoms."} {"text":"The chemical structure with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has 17 carbon atoms."}", 
"/scratch/micpie/export/chemcaption_rdkit/test_0-5.jsonl": "{"text":"The chemical with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has 0 triple bonds."} {"text":"The chemical structure with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has 0 triple bonds."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-25.jsonl": "{"text":"A conformer of the chemical with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has an normalized principal moment of inertia ratio 2 (NPR2) value of 0.943."} {"text":"A conformer of the molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has an normalized principal moment of inertia ratio 2 (NPR2) value of 0.944."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-9.jsonl": "{"text":"The chemical structure with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has 23 hydrogen atoms."} {"text":"The compound with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has 28 hydrogen atoms."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-26.jsonl": "{"text":"A conformer of the chemical structure with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has a principal moment of inertia 1 (PMI1) value of 550.906."} {"text":"A conformer of the compound with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has a principal moment of inertia 1 value of 1075.263."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-19.jsonl": "{"text":"The chemical structure with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has an oxygen mass proportion of 0.247."} {"text":"The molecule with InChI 
InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has an oxygen mass fraction of 0.120."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-40.jsonl": "{"text":"User: I would like to design a drug with asphericity of 0.789.\nAssistant: Cool, do you have any other limitations?\nUser: In addition, I want the drug to have a eccentricity of 0.996.\nAssistant: Is there anything else I should know?\nUser: In addition, I want the drug to have a normalized principal moment of inertia ratio 1 (NPR1) value of 0.087 and a molecular formula of C15H21N3O5.\nAssistant: I recommend the drug with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1."} {"text":"User: I would like to synthesize a compound with asphericity of 0.647.\nAssistant: Do you have any other limitations?\nUser: Additionally, I want the compound to have a eccentricity of 0.994.\nAssistant: Is there anything else I should take into account?\nUser: Additionally, I want the compound to have a NPR1 value of 0.112 and a chemical formula of C17H22F3N5O3.\nAssistant: I recommend the compound with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22)."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-38.jsonl": "{"text":"User: I have some questions about the molecule with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1.\nAssistant: How can I help?\nUser: I want to know the molecular formula and monoisotopic molecular mass of this molecule.\nAssistant: The molecule has the molecular formula C14H23F2N3O3 and a monoisotopic molecular mass of 319.171 Da."} {"text":"User: I want to ask you about the molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2.\nAssistant: What can I do for you?\nUser: What is the 
molecular formula and monoisotopic mass of this molecule.\nAssistant: The molecule has the molecular formula C18H28N6O4 and a monoisotopic mass of 392.217 Da."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-28.jsonl": "{"text":"The molecule with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has the molecular formula C14H23F2N3O3."} {"text":"The chemical with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has the chemical formula C18H28N6O4."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-24.jsonl": "{"text":"A conformer of the molecule with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has an normalized principal moment of inertia ratio 1 value of 0.116."} {"text":"A conformer of the compound with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has an NPR1 value of 0.203."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-33.jsonl": "{"text":"User: I have to design a compound with chemical formula C16H29N3O3.\nAssistant: That's interesting, do you have any other limitations?\nUser: I want the compound to have 126 valence electron count, 2 chiral centers, and 0 violations of Lipinski's rule of 5.\nAssistant: Given those requirements, I recommend the compound with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C]."} {"text":"User: I have to design a chemical with chemical formula C17H18F3N5O3.\nAssistant: Cool, do you have any other requirements?\nUser: I would like the chemical to have 150 valence electron count, 3 chiral centers, and 0 violations of Lipinski's rule of 5.\nAssistant: In that case, I recommend the chemical with InChI 
InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-24.jsonl": "{"text":"A conformer of the compound with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has an normalized principal moment of inertia ratio 1 (NPR1) value of 0.364."} {"text":"A conformer of the compound with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has an NPR1 value of 0.132."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-1.jsonl": "{"text":"The compound with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has a rotatable bond proportion of 0.152."} {"text":"The chemical structure with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has a proportion of rotatable bonds of 0.212."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-34.jsonl": "{"text":"User: I need to design a molecule with 0 violations of Lipinski's rule of 5.\nAssistant: That's interesting, do you have any other requirements?\nUser: I would like the molecule to have 46 bonds, 2 chiral centers, and a carbon mass fraction of 0.557.\nAssistant: In that case, I recommend the molecule with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1."} {"text":"User: I need to design a molecule with 0 violations of Lipinski's rule of 5.\nAssistant: Awesome, do you have any other conditions?\nUser: I would like the molecule to have 52 bonds, 2 chiral centers, and a carbon mass fraction of 0.509.\nAssistant: I recommend the molecule with InChI 
InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22)."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-18.jsonl": "{"text":"The chemical structure with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has a nitrogen mass fraction of 0.130."} {"text":"The chemical with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has a nitrogen mass proportion of 0.174."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-29.jsonl": "{"text":"Question: What is the molecular formula and monoisotopic molecular mass of the compound with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1?\nConstraint: Answer by only returning the values separated by a comma.\nAnswer: C15H21N3O5, 323.148 Da"} {"text":"Question: What is the molecular formula and monoisotopic mass of the chemical structure with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22)?\nConstraint: Answer by only returning the values separated by a comma.\nAnswer: C17H22F3N5O3, 401.167 Da"}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-0.jsonl": "{"text":"The compound with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has 126 valence electrons."} {"text":"The molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has 154 valence electrons."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-39.jsonl": "{"text":"User: I want to ask you about the molecule with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1.\nAssistant: How can I be of assistance?\nUser: I need to know the asphericity of this 
molecule.\nAssistant: The molecule has an asphericity of 0.756."} {"text":"User: I want to ask you about the chemical with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2.\nAssistant: How can I help?\nUser: I want to know the asphericity of this chemical.\nAssistant: The chemical has an asphericity of 0.338."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-36.jsonl": "{"text":"User: I have to design a chemical with 0 violations of Lipinski's rule of 5.\nAssistant: Do you have any other conditions?\nUser: I want the chemical to have 51 bonds.\nAssistant: OK, that already helps. It would help if you could tell me more about the chemical you are interested in.\nUser: I want the chemical to have 2 chiral centers. In addition, I want the chemical to have a monoisotopic molecular mass of 311.221 Da.\nAssistant: I recommend the chemical with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C]."} {"text":"User: I want to design a chemical structure with 0 violations of Lipinski's rule of five.\nAssistant: Awesome, do you have any other limitations?\nUser: I want the chemical structure to have 49 bonds.\nAssistant: OK, that already helps. It would be great if you could tell me more about the chemical structure you want to design.\nUser: I want the chemical structure to have 3 chiral centers. 
Moreover, I want the chemical structure to have a monoisotopic mass of 397.136 Da.\nAssistant: I recommend the chemical structure with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-33.jsonl": "{"text":"User: I have to design a molecule with chemical formula C15H21N3O5.\nAssistant: That's interesting, do you have any other limitations?\nUser: I would like the molecule to have 126 valence electron count, 2 chiral centers, and 0 violations of Lipinski's rule of five.\nAssistant: I recommend the molecule with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1."} {"text":"User: I want to design a chemical structure with molecular formula C17H22F3N5O3.\nAssistant: Awesome, do you have any other constraints?\nUser: I want the chemical structure to have 154 number of valence electrons, 2 chiral centers, and 0 violations of Lipinski's rule of 5.\nAssistant: In that case, I recommend the chemical structure with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22)."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-32.jsonl": "{"text":"Question: What is the carbon mass fraction, hydrogen mass fraction, nitrogen mass fraction, and oxygen mass proportion of the chemical with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1?\nConstraint: Answer by only returning the values separated by a comma.\nAnswer: 0.557, 0.065, 0.130, 0.247"} {"text":"Question: What is the carbon mass proportion, hydrogen mass proportion, nitrogen mass proportion, and oxygen mass fraction of the molecule with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22)?\nConstraint: Answer by only returning the values separated 
by a comma.\nAnswer: 0.509, 0.055, 0.174, 0.120"}", "/scratch/micpie/export/chemcaption_rdkit/test_0-21.jsonl": "{"text":"The chemical with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has an inertial shape factor of 0.002."} {"text":"The chemical structure with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has an inertial shape factor of 0.001."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-27.jsonl": "{"text":"A conformer of the molecule with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has a principal moment of inertia 2 value of 6088.795."} {"text":"A conformer of the molecule with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has a principal moment of inertia 2 (PMI2) value of 9404.572."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-2.jsonl": "{"text":"The chemical with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has a non-rotatable bond proportion of 0.848."} {"text":"The chemical with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has a proportion of non-rotatable bonds of 0.788."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-30.jsonl": "{"text":"Question: What is the molecular formula and valence electron count of the chemical structure with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1?\nConstraint: Answer by only returning the values separated by a comma.\nAnswer: C15H21N3O5, 126"} {"text":"Question: What is the chemical formula and number of valence electrons of the compound with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22)?\nConstraint: Answer by only returning the values 
separated by a comma.\nAnswer: C17H22F3N5O3, 154"}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-42.jsonl": "{"text":"User: I want to make a drug with NPR1 value of 0.116.\nAssistant: Cool, do you have any other limitations?\nUser: Additionally, I want the drug to have a normalized principal moment of inertia ratio 2 value of 0.943.\nAssistant: Is there anything else I should take into account?\nUser: Additionally, I want the drug to have a chemical formula of C14H23F2N3O3.\nAssistant: I recommend the drug with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1."} {"text":"User: I would like to synthesize a molecule with normalized principal moment of inertia ratio 1 value of 0.203.\nAssistant: Cool, do you have any other limitations?\nUser: Additionally, I would like the molecule to have a normalized principal moment of inertia ratio 2 value of 0.944.\nAssistant: Is there anything else I should consider?\nUser: Additionally, I would like the molecule to have a molecular formula of C18H28N6O4.\nAssistant: I recommend the molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-41.jsonl": "{"text":"User: I would like to create a molecule with eccentricity of 0.996.\nAssistant: Do you have any other requirements?\nUser: Moreover, I would like the molecule to have a asphericity of 0.789.\nAssistant: Is there anything else I should consider?\nUser: Moreover, I would like the molecule to have a 0 violations of Lipinski's rule of five and a molecular formula of C15H21N3O5.\nAssistant: In that case, I recommend the molecule with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1."} {"text":"User: I must to make a molecule with eccentricity of 0.994.\nAssistant: That's interesting, do you have any other conditions?\nUser: Moreover, I would like the molecule to have a asphericity of 0.647.\nAssistant: Is 
there anything else I should take into account?\nUser: Moreover, I would like the molecule to have a 0 violations of Lipinski's rule of 5 and a chemical formula of C17H22F3N5O3.\nAssistant: I recommend the molecule with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22)."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-22.jsonl": "{"text":"A conformer of the molecule with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has an eccentricity of 0.932."} {"text":"A conformer of the chemical with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has an eccentricity of 0.991."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-31.jsonl": "{"text":"Question: What is the molecular formula, rotatable bond proportion, and chiral center count of the chemical structure with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C]?\nConstraint: Answer by only returning the values separated by a comma.\nAnswer: C16H29N3O3, 0.275, 2"} {"text":"Question: What is the chemical formula, proportion of rotatable bonds, and chiral center count of the chemical with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+?\nConstraint: Answer by only returning the values separated by a comma.\nAnswer: C17H18F3N5O3, 0.163, 3"}", "/scratch/micpie/export/chemcaption_rdkit/train_0-35.jsonl": "{"text":"User: I need to design a chemical structure with 0 violations of Lipinski's rule 
of 5.\nAssistant: Awesome, do you have any other limitations?\nUser: I want the chemical structure to have 51 bonds.\nAssistant: Thanks, that already helps. It would be great if you could tell me more about the chemical structure you are looking for.\nUser: I want the chemical structure to have 2 chiral centers. Moreover, I want the chemical structure to have a carbon mass fraction of 0.617.\nAssistant: Given those requirements, I recommend the chemical structure with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C]."} {"text":"User: I have to design a molecule with 0 violations of Lipinski's rule of five.\nAssistant: Awesome, do you have any other requirements?\nUser: I want the molecule to have 49 bonds.\nAssistant: It would be great if you could tell me more about the molecule you are looking for.\nUser: I want the molecule to have 3 chiral centers. 
Moreover, I want the molecule to have a carbon mass fraction of 0.514.\nAssistant: I recommend the molecule with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-10.jsonl": "{"text":"The chemical with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has 3 nitrogen atoms."} {"text":"The compound with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has 6 nitrogen atoms."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-6.jsonl": "{"text":"The compound with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has 0 aromatic bonds."} {"text":"The molecule with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has 10 aromatic bonds."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-6.jsonl": "{"text":"The chemical with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has 0 aromatic bonds."} {"text":"The compound with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has 5 aromatic bonds."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-32.jsonl": "{"text":"Question: What is the carbon mass proportion, hydrogen mass proportion, nitrogen mass proportion, and oxygen mass proportion of the molecule with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1?\nConstraint: Answer by only returning the values separated by a comma.\nAnswer: 
0.527, 0.073, 0.132, 0.150"} {"text":"Question: What is the carbon mass fraction, hydrogen mass proportion, nitrogen mass proportion, and oxygen mass fraction of the compound with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2?\nConstraint: Answer by only returning the values separated by a comma.\nAnswer: 0.551, 0.072, 0.214, 0.163"}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-30.jsonl": "{"text":"Question: What is the molecular formula and number of valence electrons of the chemical structure with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1?\nConstraint: Answer by only returning the values separated by a comma.\nAnswer: C14H23F2N3O3, 126"} {"text":"Question: What is the chemical formula and valence electron count of the chemical structure with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2?\nConstraint: Answer by only returning the values separated by a comma.\nAnswer: C18H28N6O4, 154"}", "/scratch/micpie/export/chemcaption_rdkit/train_0-21.jsonl": "{"text":"The chemical with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has an inertial shape factor of 0.001."} {"text":"The chemical with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has an inertial shape factor of 0.001."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-36.jsonl": "{"text":"User: I need to design a compound with 0 violations of Lipinski's rule of five.\nAssistant: That's interesting, do you have any other limitations?\nUser: I want the compound to have 46 bonds.\nAssistant: It would be useful if you could tell me more about the compound you want to design.\nUser: I want the compound to have 2 
chiral centers. Moreover, I want the compound to have a monoisotopic mass of 323.148 Da.\nAssistant: I recommend the compound with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1."} {"text":"User: I need to design a compound with 0 violations of Lipinski's rule of 5.\nAssistant: Cool, do you have any other constraints?\nUser: I want the compound to have 52 bonds.\nAssistant: It would be useful if you could tell me more about the compound you are looking for.\nUser: I want the compound to have 2 chiral centers. Additionally, I want the compound to have a monoisotopic molecular mass of 401.167 Da.\nAssistant: Given those requirements, I recommend the compound with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22)."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-39.jsonl": "{"text":"User: I want to ask you about the chemical with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1.\nAssistant: How can I be of assistance?\nUser: I want to know the asphericity of this chemical.\nAssistant: The chemical has an asphericity of 0.789."} {"text":"User: I want to ask you about the compound with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22).\nAssistant: How can I help?\nUser: I want to know the asphericity of this compound.\nAssistant: The compound has an asphericity of 0.647."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-36.jsonl": "{"text":"User: I need to design a chemical with 0 violations of Lipinski's rule of 5.\nAssistant: That's interesting, do you have any other constraints?\nUser: I want the chemical to have 46 bonds.\nAssistant: OK, that already helps. It would be useful if you could tell me more about the chemical you want to design.\nUser: I want the chemical to have 2 chiral centers. 
Additionally, I want the chemical to have a monoisotopic molecular mass of 319.171 Da.\nAssistant: In that case, I recommend the chemical with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1."} {"text":"User: I need to design a compound with 0 violations of Lipinski's rule of five.\nAssistant: Cool, do you have any other requirements?\nUser: I would like the compound to have 58 bonds.\nAssistant: Thanks, that already helps constraining my search. It would be great if you could tell me more about the compound you want to design.\nUser: I would like the compound to have 1 chiral centers. In addition, I want the compound to have a monoisotopic mass of 392.217 Da.\nAssistant: Given those requirements, I recommend the compound with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-19.jsonl": "{"text":"The chemical with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has an oxygen mass fraction of 0.154."} {"text":"The molecule with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has an oxygen mass proportion of 0.121."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-29.jsonl": "{"text":"Question: What is the chemical formula and monoisotopic mass of the compound with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1?\nConstraint: Answer by only returning the values separated by a comma.\nAnswer: C14H23F2N3O3, 319.171 Da"} {"text":"Question: What is the chemical formula and monoisotopic mass of the molecule with SMILES 
CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2?\nConstraint: Answer by only returning the values separated by a comma.\nAnswer: C18H28N6O4, 392.217 Da"}", "/scratch/micpie/export/chemcaption_rdkit/test_0-42.jsonl": "{"text":"User: I would like to synthesize a chemical structure with normalized principal moment of inertia ratio 1 value of 0.087.\nAssistant: Do you have any other constraints?\nUser: Additionally, I would like the chemical structure to have a normalized principal moment of inertia ratio 2 value of 0.964.\nAssistant: Is there anything else I should take into account?\nUser: Additionally, I would like the chemical structure to have a chemical formula of C15H21N3O5.\nAssistant: Given those requirements, I recommend the chemical structure with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1."} {"text":"User: I would like to design a molecule with NPR1 value of 0.112.\nAssistant: Awesome, do you have any other limitations?\nUser: Additionally, I want the molecule to have a normalized principal moment of inertia ratio 2 (NPR2) value of 0.979.\nAssistant: Is there anything else I should take into consideration?\nUser: Additionally, I want the molecule to have a chemical formula of C17H22F3N5O3.\nAssistant: I recommend the molecule with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22)."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-9.jsonl": "{"text":"The chemical structure with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has 21 hydrogen atoms."} {"text":"The chemical structure with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has 22 hydrogen atoms."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-32.jsonl": "{"text":"Question: What is the carbon mass proportion, hydrogen mass proportion, nitrogen mass proportion, and 
oxygen mass proportion of the compound with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C]?\nConstraint: Answer by only returning the values separated by a comma.\nAnswer: 0.617, 0.094, 0.135, 0.154"} {"text":"Question: What is the carbon mass proportion, hydrogen mass fraction, nitrogen mass fraction, and oxygen mass proportion of the molecule with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+?\nConstraint: Answer by only returning the values separated by a comma.\nAnswer: 0.514, 0.046, 0.176, 0.121"}", "/scratch/micpie/export/chemcaption_rdkit/train_0-40.jsonl": "{"text":"User: I would like to make a drug with asphericity of 0.240.\nAssistant: Do you have any other limitations?\nUser: In addition, I would like the drug to have a eccentricity of 0.932.\nAssistant: Is there anything else I should be aware of?\nUser: In addition, I would like the drug to have a NPR1 value of 0.364 and a chemical formula of C16H29N3O3.\nAssistant: Given those requirements, I recommend the drug with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C]."} {"text":"User: I would like to design a chemical structure with asphericity of 0.647.\nAssistant: Do you have any other constraints?\nUser: Moreover, I want the chemical structure to have a eccentricity of 0.991.\nAssistant: Is there anything else I should take into consideration?\nUser: Moreover, I want the chemical structure to have a normalized principal moment of inertia ratio 1 (NPR1) value of 0.132 and a molecular formula of C17H18F3N5O3.\nAssistant: Given those requirements, I recommend the chemical structure with InChI 
InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-0.jsonl": "{"text":"The chemical structure with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has 126 valence electrons."} {"text":"The compound with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has 154 valence electrons."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-31.jsonl": "{"text":"Question: What is the chemical formula, rotatable bond proportion, and chiral center count of the molecule with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1?\nConstraint: Answer by only returning the values separated by a comma.\nAnswer: C15H21N3O5, 0.152, 2"} {"text":"Question: What is the chemical formula, rotatable bond proportion, and chiral center count of the compound with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22)?\nConstraint: Answer by only returning the values separated by a comma.\nAnswer: C17H22F3N5O3, 0.212, 2"}", "/scratch/micpie/export/chemcaption_rdkit/test_0-24.jsonl": "{"text":"A conformer of the chemical with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has an NPR1 value of 0.087."} {"text":"A conformer of the molecule with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has an normalized principal moment of inertia ratio 1 value of 0.112."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-16.jsonl": "{"text":"The chemical structure with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has a carbon 
mass proportion of 0.527."} {"text":"The molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has a carbon mass fraction of 0.551."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-7.jsonl": "{"text":"The molecule with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has 46 bonds."} {"text":"The compound with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has 58 bonds."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-34.jsonl": "{"text":"User: I need to design a chemical with 0 violations of Lipinski's rule of 5.\nAssistant: Cool, do you have any other conditions?\nUser: I want the chemical to have 46 bonds, 2 chiral centers, and a carbon mass proportion of 0.527.\nAssistant: In that case, I recommend the chemical with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1."} {"text":"User: I have to design a chemical structure with 0 violations of Lipinski's rule of 5.\nAssistant: Do you have any other conditions?\nUser: I would like the chemical structure to have 58 bonds, 1 chiral centers, and a carbon mass proportion of 0.551.\nAssistant: Given those requirements, I recommend the chemical structure with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-3.jsonl": "{"text":"The chemical with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has 39 single bonds."} {"text":"The compound with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has 44 single bonds."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-11.jsonl": "{"text":"The chemical with InChI 
InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has 3 oxygen atoms."} {"text":"The molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has 4 oxygen atoms."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-20.jsonl": "{"text":"The compound with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has 2 chiral centers."} {"text":"The chemical structure with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has 3 chiral centers."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-43.jsonl": "{"text":"User: I would like to make a compound with principal moment of inertia 1 value of 1321.670.\nAssistant: Awesome, do you have any other conditions?\nUser: Additionally, I want the compound to have a PMI2 value of 2865.258.\nAssistant: Is there anything else I should consider?\nUser: Additionally, I want the compound to have a chemical formula of C16H29N3O3.\nAssistant: Given those requirements, I recommend the compound with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C]."} {"text":"User: I would like to design a drug with PMI1 value of 1436.072.\nAssistant: That's interesting, do you have any other requirements?\nUser: In addition, I want the drug to have a principal moment of inertia 2 (PMI2) value of 9927.547.\nAssistant: Is there anything else I should take into account?\nUser: In addition, I want the drug to have a molecular formula of C17H18F3N5O3.\nAssistant: I recommend the drug with InChI 
InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-30.jsonl": "{"text":"Question: What is the molecular formula and valence electron count of the chemical with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C]?\nConstraint: Answer by only returning the values separated by a comma.\nAnswer: C16H29N3O3, 126"} {"text":"Question: What is the molecular formula and number of valence electrons of the molecule with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+?\nConstraint: Answer by only returning the values separated by a comma.\nAnswer: C17H18F3N5O3, 150"}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-20.jsonl": "{"text":"The chemical with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has 2 chiral centers."} {"text":"The chemical structure with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has 1 chiral centers."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-37.jsonl": "{"text":"User: I have to design a compound with a monoisotopic molecular mass of 319.171 Da.\nAssistant: That's interesting, do you have any other limitations?\nUser: I want the compound to have 46 bonds.\nAssistant: It would be useful if you could tell me more about the compound you are interested in.\nUser: I want the compound to have 2 chiral centers.\nAssistant: Given those requirements, I recommend the compound with InChI 
InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1."} {"text":"User: I have to design a compound with a monoisotopic mass of 392.217 Da.\nAssistant: Awesome, do you have any other constraints?\nUser: I would like the compound to have 58 bonds.\nAssistant: OK, that already helps constraining my search. It would be great if you could tell me more about the compound you are interested in.\nUser: I would like the compound to have 1 chiral centers.\nAssistant: In that case, I recommend the compound with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-26.jsonl": "{"text":"A conformer of the chemical with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has a principal moment of inertia 1 value of 1321.670."} {"text":"A conformer of the compound with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has a principal moment of inertia 1 value of 1436.072."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-0.jsonl": "{"text":"The compound with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has 126 valence electrons."} {"text":"The compound with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has 150 valence electrons."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-6.jsonl": "{"text":"The chemical structure with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has 5 aromatic bonds."} 
{"text":"The chemical structure with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has 6 aromatic bonds."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-10.jsonl": "{"text":"The chemical with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has 3 nitrogen atoms."} {"text":"The compound with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has 5 nitrogen atoms."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-3.jsonl": "{"text":"The chemical with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has 48 single bonds."} {"text":"The chemical structure with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has 36 single bonds."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-23.jsonl": "{"text":"A conformer of the chemical with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has an asphericity of 0.240."} {"text":"A conformer of the chemical structure with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has an asphericity of 0.647."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-12.jsonl": "{"text":"The chemical with SELFIES 
[C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has 3 hydrogen bond acceptors."} {"text":"The chemical with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has 6 hydrogen bond acceptors."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-28.jsonl": "{"text":"The chemical structure with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has the molecular formula C15H21N3O5."} {"text":"The chemical with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has the molecular formula C17H22F3N5O3."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-40.jsonl": "{"text":"User: I want to synthesize a chemical with asphericity of 0.756.\nAssistant: Awesome, do you have any other requirements?\nUser: Moreover, I would like the chemical to have a eccentricity of 0.993.\nAssistant: Is there anything else I should take into consideration?\nUser: Moreover, I would like the chemical to have a NPR1 value of 0.116 and a chemical formula of C14H23F2N3O3.\nAssistant: I recommend the chemical with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1."} {"text":"User: I would like to synthesize a chemical with asphericity of 0.338.\nAssistant: Cool, do you have any other conditions?\nUser: Moreover, I want the chemical to have a eccentricity of 0.979.\nAssistant: Is there anything else I should consider?\nUser: Moreover, I want the chemical to have a normalized principal moment of inertia ratio 1 (NPR1) value of 0.203 and a chemical formula of C18H28N6O4.\nAssistant: Given those requirements, I recommend the chemical with SMILES 
CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-13.jsonl": "{"text":"The molecule with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has 0 hydrogen bond donors."} {"text":"The chemical with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has 1 hydrogen bond donors."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-23.jsonl": "{"text":"A conformer of the molecule with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has an asphericity of 0.789."} {"text":"A conformer of the compound with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has an asphericity of 0.647."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-2.jsonl": "{"text":"The molecule with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has a proportion of non-rotatable bonds of 0.848."} {"text":"The compound with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has a non-rotatable bond proportion of 0.793."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-21.jsonl": "{"text":"The chemical structure with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has an inertial shape factor of 0.001."} {"text":"The compound with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has an inertial shape factor of 0.001."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-14.jsonl": "{"text":"The compound with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has 0 violations of Lipinski's 
rule of five."} {"text":"The chemical structure with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has 0 violations of Lipinski's rule of five."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-1.jsonl": "{"text":"The molecule with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has a rotatable bond proportion of 0.152."} {"text":"The molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has a proportion of rotatable bonds of 0.207."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-13.jsonl": "{"text":"The molecule with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has 1 hydrogen bond donors."} {"text":"The compound with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has 2 hydrogen bond donors."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-41.jsonl": "{"text":"User: I need to create a chemical structure with eccentricity of 0.932.\nAssistant: Do you have any other constraints?\nUser: Additionally, I would like the chemical structure to have a asphericity of 0.240.\nAssistant: Is there anything else I should be aware of?\nUser: Additionally, I would like the chemical structure to have a 0 violations of Lipinski's rule of five and a chemical formula of C16H29N3O3.\nAssistant: Given those requirements, I recommend the chemical structure with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C]."} {"text":"User: I must to design a compound with eccentricity of 0.991.\nAssistant: Do you have any other conditions?\nUser: Additionally, I would like the compound to have a 
asphericity of 0.647.\nAssistant: Is there anything else I should take into consideration?\nUser: Additionally, I would like the compound to have a 0 violations of Lipinski's rule of 5 and a chemical formula of C17H18F3N5O3.\nAssistant: Given those requirements, I recommend the compound with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-29.jsonl": "{"text":"Question: What is the molecular formula and monoisotopic molecular mass of the molecule with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C]?\nConstraint: Answer by only returning the values separated by a comma.\nAnswer: C16H29N3O3, 311.221 Da"} {"text":"Question: What is the molecular formula and monoisotopic molecular mass of the chemical structure with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+?\nConstraint: Answer by only returning the values separated by a comma.\nAnswer: C17H18F3N5O3, 397.136 Da"}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-23.jsonl": "{"text":"A conformer of the chemical structure with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has an asphericity of 0.756."} {"text":"A conformer of the compound with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has an asphericity of 0.338."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-5.jsonl": "{"text":"The compound with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has 0 triple 
bonds."} {"text":"The molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has 0 triple bonds."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-15.jsonl": "{"text":"The compound with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has a monoisotopic molecular mass of 311.221 Da."} {"text":"The chemical with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has a monoisotopic molecular mass of 397.136 Da."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-4.jsonl": "{"text":"The molecule with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has 2 double bonds."} {"text":"The molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has 3 double bonds."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-5.jsonl": "{"text":"The chemical structure with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has 0 triple bonds."} {"text":"The chemical with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has 0 triple bonds."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-15.jsonl": "{"text":"The chemical structure with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has a monoisotopic molecular mass of 319.171 Da."} {"text":"The molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has a 
monoisotopic mass of 392.217 Da."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-12.jsonl": "{"text":"The molecule with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has 4 hydrogen bond acceptors."} {"text":"The molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has 6 hydrogen bond acceptors."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-18.jsonl": "{"text":"The chemical with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has a nitrogen mass fraction of 0.132."} {"text":"The chemical structure with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has a nitrogen mass proportion of 0.214."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-35.jsonl": "{"text":"User: I want to design a chemical with 0 violations of Lipinski's rule of 5.\nAssistant: Awesome, do you have any other requirements?\nUser: I want the chemical to have 46 bonds.\nAssistant: OK, that already helps constraining my search. It would be useful if you could tell me more about the chemical you are looking for.\nUser: I want the chemical to have 2 chiral centers. In addition, I want the chemical to have a carbon mass fraction of 0.557.\nAssistant: I recommend the chemical with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1."} {"text":"User: I want to design a molecule with 0 violations of Lipinski's rule of five.\nAssistant: Awesome, do you have any other limitations?\nUser: I want the molecule to have 52 bonds.\nAssistant: Thanks, that already helps. It would help if you could tell me more about the molecule you are looking for.\nUser: I want the molecule to have 2 chiral centers. 
Moreover, I want the molecule to have a carbon mass proportion of 0.509.\nAssistant: In that case, I recommend the molecule with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22)."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-2.jsonl": "{"text":"The compound with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has a proportion of non-rotatable bonds of 0.725."} {"text":"The molecule with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has a proportion of non-rotatable bonds of 0.837."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-33.jsonl": "{"text":"User: I need to design a chemical with chemical formula C14H23F2N3O3.\nAssistant: That's interesting, do you have any other requirements?\nUser: I would like the chemical to have 126 valence electron count, 2 chiral centers, and 0 violations of Lipinski's rule of five.\nAssistant: I recommend the chemical with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1."} {"text":"User: I need to design a molecule with molecular formula C18H28N6O4.\nAssistant: Do you have any other requirements?\nUser: I want the molecule to have 154 number of valence electrons, 1 chiral centers, and 0 violations of Lipinski's rule of five.\nAssistant: I recommend the molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-42.jsonl": "{"text":"User: I would like to create a chemical structure with normalized principal moment of inertia ratio 1 (NPR1) value of 0.364.\nAssistant: Do you have any other 
constraints?\nUser: Moreover, I want the chemical structure to have a normalized principal moment of inertia ratio 2 (NPR2) value of 0.788.\nAssistant: Is there anything else I should take into consideration?\nUser: Moreover, I want the chemical structure to have a molecular formula of C16H29N3O3.\nAssistant: Given those requirements, I recommend the chemical structure with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C]."} {"text":"User: I want to design a compound with normalized principal moment of inertia ratio 1 (NPR1) value of 0.132.\nAssistant: Awesome, do you have any other constraints?\nUser: Additionally, I want the compound to have a NPR2 value of 0.912.\nAssistant: Is there anything else I should consider?\nUser: Additionally, I want the compound to have a chemical formula of C17H18F3N5O3.\nAssistant: I recommend the compound with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-11.jsonl": "{"text":"The compound with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has 5 oxygen atoms."} {"text":"The compound with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has 3 oxygen atoms."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-7.jsonl": "{"text":"The chemical with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has 51 bonds."} {"text":"The chemical with InChI 
InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has 49 bonds."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-17.jsonl": "{"text":"The chemical structure with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has a hydrogen mass fraction of 0.065."} {"text":"The chemical structure with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has a hydrogen mass proportion of 0.055."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-27.jsonl": "{"text":"A conformer of the chemical structure with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has a principal moment of inertia 2 value of 6097.342."} {"text":"A conformer of the molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has a principal moment of inertia 2 value of 6146.209."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-19.jsonl": "{"text":"The chemical structure with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has an oxygen mass proportion of 0.150."} {"text":"The chemical structure with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has an oxygen mass fraction of 0.163."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-11.jsonl": "{"text":"The compound with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has 3 oxygen atoms."} {"text":"The molecule with InChI 
InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has 3 oxygen atoms."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-31.jsonl": "{"text":"Question: What is the chemical formula, proportion of rotatable bonds, and number of chiral centers of the compound with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1?\nConstraint: Answer by only returning the values separated by a comma.\nAnswer: C14H23F2N3O3, 0.152, 2"} {"text":"Question: What is the chemical formula, rotatable bond proportion, and number of chiral centers of the molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2?\nConstraint: Answer by only returning the values separated by a comma.\nAnswer: C18H28N6O4, 0.207, 1"}", "/scratch/micpie/export/chemcaption_rdkit/train_0-1.jsonl": "{"text":"The compound with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has a proportion of rotatable bonds of 0.275."} {"text":"The compound with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has a proportion of rotatable bonds of 0.163."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-37.jsonl": "{"text":"User: I need to design a drug with a monoisotopic molecular mass of 323.148 Da.\nAssistant: Cool, do you have any other constraints?\nUser: I want the drug to have 46 bonds.\nAssistant: OK, that already helps constraining my search. 
It would be useful if you could tell me more about the drug you are interested in.\nUser: I want the drug to have 2 chiral centers.\nAssistant: In that case, I recommend the drug with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1."} {"text":"User: I have to design a chemical with a monoisotopic mass of 401.167 Da.\nAssistant: Awesome, do you have any other constraints?\nUser: I would like the chemical to have 52 bonds.\nAssistant: Thanks, that already helps. It would be great if you could tell me more about the chemical you are looking for.\nUser: I would like the chemical to have 2 chiral centers.\nAssistant: Given those requirements, I recommend the chemical with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22)."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-13.jsonl": "{"text":"The compound with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has 2 hydrogen bond donors."} {"text":"The chemical structure with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has 2 hydrogen bond donors."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-26.jsonl": "{"text":"A conformer of the compound with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has a PMI1 value of 748.924."} {"text":"A conformer of the molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has a principal moment of inertia 1 (PMI1) value of 1321.406."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-4.jsonl": "{"text":"The compound with SELFIES 
[C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has 3 double bonds."} {"text":"The chemical with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has 3 double bonds."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-43.jsonl": "{"text":"User: I want to design a chemical structure with principal moment of inertia 1 (PMI1) value of 550.906.\nAssistant: Awesome, do you have any other constraints?\nUser: Moreover, I would like the chemical structure to have a principal moment of inertia 2 value of 6088.795.\nAssistant: Is there anything else I should take into account?\nUser: Moreover, I would like the chemical structure to have a chemical formula of C15H21N3O5.\nAssistant: Given those requirements, I recommend the chemical structure with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1."} {"text":"User: I need to synthesize a compound with PMI1 value of 1075.263.\nAssistant: Do you have any other constraints?\nUser: Moreover, I would like the compound to have a principal moment of inertia 2 (PMI2) value of 9404.572.\nAssistant: Is there anything else I should take into consideration?\nUser: Moreover, I would like the compound to have a molecular formula of C17H22F3N5O3.\nAssistant: I recommend the compound with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22)."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-7.jsonl": "{"text":"The chemical with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has 46 bonds."} {"text":"The molecule with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has 52 
bonds."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-9.jsonl": "{"text":"The compound with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has 29 hydrogen atoms."} {"text":"The chemical with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has 18 hydrogen atoms."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-25.jsonl": "{"text":"A conformer of the chemical with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has an NPR2 value of 0.788."} {"text":"A conformer of the chemical structure with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has an NPR2 value of 0.912."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-41.jsonl": "{"text":"User: I need to create a chemical with eccentricity of 0.993.\nAssistant: That's interesting, do you have any other requirements?\nUser: Moreover, I would like the chemical to have a asphericity of 0.756.\nAssistant: Is there anything else I should take into account?\nUser: Moreover, I would like the chemical to have a 0 violations of Lipinski's rule of five and a molecular formula of C14H23F2N3O3.\nAssistant: I recommend the chemical with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1."} {"text":"User: I need to make a chemical with eccentricity of 0.979.\nAssistant: Cool, do you have any other constraints?\nUser: Moreover, I want the chemical to have a asphericity of 0.338.\nAssistant: Is there 
anything else I should consider?\nUser: Moreover, I want the chemical to have a 0 violations of Lipinski's rule of 5 and a chemical formula of C18H28N6O4.\nAssistant: Given those requirements, I recommend the chemical with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-22.jsonl": "{"text":"A conformer of the chemical structure with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has an eccentricity of 0.993."} {"text":"A conformer of the chemical structure with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has an eccentricity of 0.979."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-18.jsonl": "{"text":"The chemical structure with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C] has a nitrogen mass proportion of 0.135."} {"text":"The chemical structure with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ has a nitrogen mass fraction of 0.176."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-3.jsonl": "{"text":"The compound with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has 44 single bonds."} {"text":"The chemical structure with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has 50 single bonds."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-8.jsonl": "{"text":"The molecule with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has 15 carbon atoms."} {"text":"The chemical with InChI 
InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has 17 carbon atoms."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-14.jsonl": "{"text":"The molecule with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has 0 violations of Lipinski's rule of 5."} {"text":"The chemical structure with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has 0 violations of Lipinski's rule of five."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-17.jsonl": "{"text":"The compound with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has a hydrogen mass proportion of 0.073."} {"text":"The chemical structure with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has a hydrogen mass fraction of 0.072."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-14.jsonl": "{"text":"The molecule with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1 has 0 violations of Lipinski's rule of five."} {"text":"The chemical structure with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2 has 0 violations of Lipinski's rule of five."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-25.jsonl": "{"text":"A conformer of the molecule with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has an normalized principal moment of inertia ratio 2 value of 0.964."} {"text":"A conformer of the chemical structure with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has an normalized principal moment of inertia ratio 2 value of 0.979."}", 
"/scratch/micpie/export/chemcaption_rdkit/test_0-4.jsonl": "{"text":"The chemical with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has 2 double bonds."} {"text":"The chemical structure with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has 2 double bonds."}", "/scratch/micpie/export/chemcaption_rdkit/valid_0-43.jsonl": "{"text":"User: I must to create a drug with principal moment of inertia 1 (PMI1) value of 748.924.\nAssistant: That's interesting, do you have any other requirements?\nUser: Additionally, I want the drug to have a PMI2 value of 6097.342.\nAssistant: Is there anything else I should consider?\nUser: Additionally, I want the drug to have a chemical formula of C14H23F2N3O3.\nAssistant: I recommend the drug with InChI InChI=1S\/C14H23F2N3O3\/c1-10-8-11(2-7-22-10)17-13(20)14(21)19-5-3-18(4-6-19)9-12(15)16\/h10-12H,2-9H2,1H3,(H,17,20)\/t10-,11+\/m0\/s1."} {"text":"User: I need to design a molecule with principal moment of inertia 1 (PMI1) value of 1321.406.\nAssistant: That's interesting, do you have any other constraints?\nUser: In addition, I want the molecule to have a principal moment of inertia 2 value of 6146.209.\nAssistant: Is there anything else I should take into consideration?\nUser: In addition, I want the molecule to have a chemical formula of C18H28N6O4.\nAssistant: I recommend the molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-37.jsonl": "{"text":"User: I have to design a molecule with a monoisotopic molecular mass of 311.221 Da.\nAssistant: Do you have any other constraints?\nUser: I would like the molecule to have 51 bonds.\nAssistant: OK, that already helps constraining my search. 
It would help if you could tell me more about the molecule you are interested in.\nUser: I would like the molecule to have 2 chiral centers.\nAssistant: Given those requirements, I recommend the molecule with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C]."} {"text":"User: I have to design a drug with a monoisotopic mass of 397.136 Da.\nAssistant: Cool, do you have any other constraints?\nUser: I want the drug to have 49 bonds.\nAssistant: Thanks, that already helps. It would help if you could tell me more about the drug you are looking for.\nUser: I want the drug to have 3 chiral centers.\nAssistant: In that case, I recommend the drug with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-12.jsonl": "{"text":"The chemical with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has 7 hydrogen bond acceptors."} {"text":"The chemical with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has 6 hydrogen bond acceptors."}", "/scratch/micpie/export/chemcaption_rdkit/train_0-38.jsonl": "{"text":"User: I want to ask you about the chemical structure with SELFIES [C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch2][N][C][C][C][C][Ring1][Branch1][=O][C][Branch1][C][C][C].\nAssistant: What can I do for you?\nUser: I want to know the chemical formula and monoisotopic mass of this chemical structure.\nAssistant: The chemical structure has the chemical formula C16H29N3O3 and a monoisotopic mass of 311.221 Da."} {"text":"User: I want to ask you about the chemical structure with InChI 
InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+.\nAssistant: What can I do for you?\nUser: I need to know the molecular formula and monoisotopic molecular mass of this chemical structure.\nAssistant: The chemical structure has the molecular formula C17H18F3N5O3 and a monoisotopic molecular mass of 397.136 Da."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-38.jsonl": "{"text":"User: I have some questions about the drug with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1.\nAssistant: What can I do for you?\nUser: I want to know the chemical formula and monoisotopic molecular mass of this drug.\nAssistant: The drug has the chemical formula C15H21N3O5 and a monoisotopic molecular mass of 323.148 Da."} {"text":"User: I want to ask you about the drug with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22).\nAssistant: How can I help?\nUser: I want to know the molecular formula and monoisotopic molecular mass of this drug.\nAssistant: The drug has the molecular formula C17H22F3N5O3 and a monoisotopic molecular mass of 401.167 Da."}", "/scratch/micpie/export/chemcaption_rdkit/test_0-20.jsonl": "{"text":"The molecule with SMILES COC(=O)c1cc(C(=O)N2C[C@H](N3CCOCC3)C[C@H]2C)no1 has 2 chiral centers."} {"text":"The chemical structure with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) has 2 chiral centers."}", "/scratch/micpie/export/compound_protein_disease/test_1-2.jsonl": "{"text":"User: Can you come up with one example for a protein that is targeted by the compound CC1(O)CC(N)(c2ccc(-c3nc4n(c3-c3ccncc3)COc3cccc(F)c3-4)cc2)C1?\nAssistant: Of course, the protein Protein kinase B beta is targeted by the compound 
CC1(O)CC(N)(c2ccc(-c3nc4n(c3-c3ccncc3)COc3cccc(F)c3-4)cc2)C1.\nUser: Can you tell me which disease the protein Protein kinase B beta is related to?\nAssistant: The protein Protein kinase B beta is related to the Hypoinsulinemic hypoglycemia with hemihypertrophy disease."} {"text":"User: Can you come up with an example for a protein that is targeted by the compound O=c1nc2n(c3ccccc13)CC\/C2=C\\N1CCCCC1?\nAssistant: Yes, the protein ATP-dependent helicase SMARCA4 is targeted by the compound O=c1nc2n(c3ccccc13)CC\/C2=C\\N1CCCCC1.\nUser: Can you tell me which disease the protein ATP-dependent helicase SMARCA4 is related to?\nAssistant: The protein ATP-dependent helicase SMARCA4 is related to the Familial rhabdoid tumor disease."}", "/scratch/micpie/export/compound_protein_disease/test_0-1.jsonl": "{"text":"The compound Cnc=O)n[C@H]CCCOC6))))))cccc-ccccOCCCNCCCC5)))))))))nc6))))))cF)cc6ncc%10%13 targets the protein A-T mutated. The protein A-T mutated is related to the disease Ataxia-telangiectasia."} {"text":"The compound NC1(c2ccc(-c3nc4n(c3-c3ccncc3)COc3ccccc3-4)cc2)CCC1 targets the protein RAC-PK-beta. 
The protein RAC-PK-beta is related to the disease Hypoinsulinemic hypoglycemia with hemihypertrophy."}", "/scratch/micpie/export/compound_protein_disease/valid_0-0.jsonl": "{"text":"The compound Brc1ccc(-c2nn[nH]n2)cc1 targets the protein Serine-protein kinase ATM which is related to the disease Ataxia-telangiectasia."} {"text":"The compound [C][C][Branch1][C][O][C][C][Branch1][C][N][Branch2][Ring2][#Branch2][C][=C][C][=C][Branch2][Ring1][=C][C][N][=C][N][Branch1][N][C][=Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][O][C][=N][C][=C][C][=C][Ring1][=Branch1][Ring1][P][C][=C][Ring2][Ring1][=Branch2][C][Ring2][Ring1][#C] targets the protein RAC-beta serine\/threonine-protein kinase which is related to the disease Hypoinsulinemic hypoglycemia with hemihypertrophy."}", "/scratch/micpie/export/compound_protein_disease/test_0-2.jsonl": "{"text":"User: Can you give me one example for a protein that is targeted by the compound Cnc=O)n[C@H]CCCOC6))))))cccc-ccccOCCCNCCCC5)))))))))nc6))))))cF)cc6ncc%10%13?\nAssistant: Of course, the protein Ataxia telangiectasia mutated is targeted by the compound Cnc=O)n[C@H]CCCOC6))))))cccc-ccccOCCCNCCCC5)))))))))nc6))))))cF)cc6ncc%10%13.\nUser: Can you tell me which disease the protein Ataxia telangiectasia mutated is related to?\nAssistant: The protein Ataxia telangiectasia mutated is related to the Ataxia-telangiectasia disease."} {"text":"User: Can you come up with one example for a protein that is targeted by the compound NC1(c2ccc(-c3nc4n(c3-c3ccncc3)COc3ccccc3-4)cc2)CCC1?\nAssistant: Yes, the protein Protein kinase B beta is targeted by the compound NC1(c2ccc(-c3nc4n(c3-c3ccncc3)COc3ccccc3-4)cc2)CCC1.\nUser: Can you tell me which disease the protein Protein kinase B beta is related to?\nAssistant: The protein Protein kinase B beta is related to the Hypoinsulinemic hypoglycemia with hemihypertrophy disease."}", "/scratch/micpie/export/compound_protein_disease/train_1-0.jsonl": "{"text":"The compound 
[C][C][Branch1][C][O][C][C][Branch1][C][N][Branch2][Ring2][=C][C][=C][C][=C][Branch2][Ring2][C][C][N][=C][N][Branch1][N][C][=Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][O][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][Ring2][Ring1][C][C][=C][Ring2][Ring1][#Branch2][C][Ring2][Ring1][S] targets the protein RAC-beta serine\/threonine-protein kinase which is related to the disease Hypoinsulinemic hypoglycemia with hemihypertrophy."} {"text":"The compound CCN1CCN(\/C=C2\\CCn3c2nc(=O)c2ccccc23)CC1 targets the protein SNF2-beta which is related to the disease Familial rhabdoid tumor."}", "/scratch/micpie/export/compound_protein_disease/valid_1-2.jsonl": "{"text":"User: Can you come up with one example for a protein that is targeted by the compound [C][C][=C][C][Branch1][Branch2][C][C][=N][NH1][C][=Ring1][Branch1][=C][N][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Branch2][Ring1][Ring2][C][=C][C][=C][Branch1][#Branch2][C][Branch1][C][N][C][C][C][Ring1][Branch1][C][=C][Ring1][O][N][=C][Ring2][Ring1][#C][Ring2][Ring1][=Branch1]?\nAssistant: Of course, the protein RAC-PK-beta is targeted by the compound [C][C][=C][C][Branch1][Branch2][C][C][=N][NH1][C][=Ring1][Branch1][=C][N][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Branch2][Ring1][Ring2][C][=C][C][=C][Branch1][#Branch2][C][Branch1][C][N][C][C][C][Ring1][Branch1][C][=C][Ring1][O][N][=C][Ring2][Ring1][#C][Ring2][Ring1][=Branch1].\nUser: Can you tell me which disease the protein RAC-PK-beta is related to?\nAssistant: The protein RAC-PK-beta is related to the Hypoinsulinemic hypoglycemia with hemihypertrophy disease."} {"text":"User: Can you give me an example for a protein that is targeted by the compound O=C(Nc1ccnc(Cl)c1)Nc1ccccn1?\nAssistant: Of course, the protein Mitotic growth and transcription activator is targeted by the compound O=C(Nc1ccnc(Cl)c1)Nc1ccccn1.\nUser: Can you tell me which disease the protein Mitotic growth and transcription activator is 
related to?\nAssistant: The protein Mitotic growth and transcription activator is related to the Familial rhabdoid tumor disease."}", "/scratch/micpie/export/compound_protein_disease/test_0-0.jsonl": "{"text":"The compound InChI=1S\/C28H32FN5O3\/c1-32-25-17-30-24-15-23(29)21(14-22(24)27(25)34(28(32)35)20-6-4-12-36-18-20)19-7-8-26(31-16-19)37-13-5-11-33-9-2-3-10-33\/h7-8,14-17,20H,2-6,9-13,18H2,1H3\/t20-\/m0\/s1 targets the protein Ataxia telangiectasia mutated which is related to the disease Ataxia-telangiectasia."} {"text":"The compound NCcccc-cncnc5-cccncc6)))))))COcccccc6-%10))))))))))))cc6))))))CCC4 targets the protein Protein kinase B beta which is related to the disease Hypoinsulinemic hypoglycemia with hemihypertrophy."}", "/scratch/micpie/export/compound_protein_disease/train_0-0.jsonl": "{"text":"The compound COCc1ccc(-c2cc3c(N[C@@H](C)c4ccn(C)n4)c(C(N)=O)cnc3cc2F)cn1 targets the protein A-T mutated which is related to the disease Ataxia-telangiectasia."} {"text":"The compound CC1(O)CC(N)(c2ccc(-c3nc4n(c3-c3ccccc3)COc3cc(F)ccc3-4)cc2)C1 targets the protein RAC-PK-beta which is related to the disease Hypoinsulinemic hypoglycemia with hemihypertrophy."}", "/scratch/micpie/export/compound_protein_disease/test_1-1.jsonl": "{"text":"The compound InChI=1S\/C26H23FN4O2\/c1-25(32)13-26(28,14-25)18-7-5-16(6-8-18)22-23(17-9-11-29-12-10-17)31-15-33-20-4-2-3-19(27)21(20)24(31)30-22\/h2-12,32H,13-15,28H2,1H3 targets the protein Protein kinase B beta. The protein Protein kinase B beta is related to the disease Hypoinsulinemic hypoglycemia with hemihypertrophy."} {"text":"The compound InChI=1S\/C17H19N3O\/c21-17-14-6-2-3-7-15(14)20-11-8-13(16(20)18-17)12-19-9-4-1-5-10-19\/h2-3,6-7,12H,1,4-5,8-11H2\/b13-12+ targets the protein Protein brahma homolog 1. 
The protein Protein brahma homolog 1 is related to the disease Familial rhabdoid tumor."}", "/scratch/micpie/export/compound_protein_disease/valid_0-2.jsonl": "{"text":"User: Can you give me an example for a protein that is targeted by the compound InChI=1S\/C7H5BrN4\/c8-6-3-1-5(2-4-6)7-9-11-12-10-7\/h1-4H,(H,9,10,11,12)?\nAssistant: Yes, of course, the protein A-T mutated is targeted by the compound InChI=1S\/C7H5BrN4\/c8-6-3-1-5(2-4-6)7-9-11-12-10-7\/h1-4H,(H,9,10,11,12).\nUser: Can you tell me which disease the protein A-T mutated is related to?\nAssistant: The protein A-T mutated is related to the Ataxia-telangiectasia disease."} {"text":"User: Can you give me one example for a protein that is targeted by the compound [C][C][Branch1][C][O][C][C][Branch1][C][N][Branch2][Ring2][#Branch2][C][=C][C][=C][Branch2][Ring1][=C][C][N][=C][N][Branch1][N][C][=Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][O][C][=N][C][=C][C][=C][Ring1][=Branch1][Ring1][P][C][=C][Ring2][Ring1][=Branch2][C][Ring2][Ring1][#C]?\nAssistant: Yes, the protein RAC-PK-beta is targeted by the compound [C][C][Branch1][C][O][C][C][Branch1][C][N][Branch2][Ring2][#Branch2][C][=C][C][=C][Branch2][Ring1][=C][C][N][=C][N][Branch1][N][C][=Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][O][C][=N][C][=C][C][=C][Ring1][=Branch1][Ring1][P][C][=C][Ring2][Ring1][=Branch2][C][Ring2][Ring1][#C].\nUser: Can you tell me which disease the protein RAC-PK-beta is related to?\nAssistant: The protein RAC-PK-beta is related to the Hypoinsulinemic hypoglycemia with hemihypertrophy disease."}", "/scratch/micpie/export/compound_protein_disease/valid_0-1.jsonl": "{"text":"The compound Brcccc-cnn[nH]n5)))))cc6 targets the protein Serine-protein kinase ATM. The protein Serine-protein kinase ATM is related to the disease Ataxia-telangiectasia."} {"text":"The compound CC1(O)CC(N)(c2ccc(-c3nc4n(c3-c3ccccc3)COc3ncccc3-4)cc2)C1 targets the protein RAC-beta serine\/threonine-protein kinase. 
The protein RAC-beta serine\/threonine-protein kinase is related to the disease Hypoinsulinemic hypoglycemia with hemihypertrophy."}", "/scratch/micpie/export/compound_protein_disease/valid_1-1.jsonl": "{"text":"The compound Cccc-ccn[nH]c5)))))cnc-cccccc6))))))c-ccccCN)CCC4))))cc6))))))nc95 targets the protein RAC-beta serine\/threonine-protein kinase. The protein RAC-beta serine\/threonine-protein kinase is related to the disease Hypoinsulinemic hypoglycemia with hemihypertrophy."} {"text":"The compound O=CNcccncCl)c6)))))))Ncccccn6 targets the protein BAF190A. The protein BAF190A is related to the disease Familial rhabdoid tumor."}", "/scratch/micpie/export/compound_protein_disease/test_1-0.jsonl": "{"text":"The compound CC1(O)CC(N)(c2ccc(-c3nc4n(c3-c3ccncc3)COc3cccc(F)c3-4)cc2)C1 targets the protein Protein kinase B beta which is related to the disease Hypoinsulinemic hypoglycemia with hemihypertrophy."} {"text":"The compound InChI=1S\/C17H19N3O\/c21-17-14-6-2-3-7-15(14)20-11-8-13(16(20)18-17)12-19-9-4-1-5-10-19\/h2-3,6-7,12H,1,4-5,8-11H2\/b13-12+ targets the protein SWI\/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily A member 4 which is related to the disease Familial rhabdoid tumor."}", "/scratch/micpie/export/compound_protein_disease/train_0-2.jsonl": "{"text":"User: Can you come up with an example for a protein that is targeted by the compound InChI=1S\/C23H23FN6O2\/c1-13(20-6-7-30(2)29-20)28-22-17-8-16(14-4-5-15(12-32-3)26-10-14)19(24)9-21(17)27-11-18(22)23(25)31\/h4-11,13H,12H2,1-3H3,(H2,25,31)(H,27,28)\/t13-\/m0\/s1?\nAssistant: Sure, the protein A-T mutated is targeted by the compound InChI=1S\/C23H23FN6O2\/c1-13(20-6-7-30(2)29-20)28-22-17-8-16(14-4-5-15(12-32-3)26-10-14)19(24)9-21(17)27-11-18(22)23(25)31\/h4-11,13H,12H2,1-3H3,(H2,25,31)(H,27,28)\/t13-\/m0\/s1.\nUser: Can you tell me which disease the protein A-T mutated is related to?\nAssistant: The protein A-T mutated is related to the Ataxia-telangiectasia disease."} 
{"text":"User: Can you give me an example for a protein that is targeted by the compound [C][C][Branch1][C][O][C][C][Branch1][C][N][Branch2][Ring2][=C][C][=C][C][=C][Branch2][Ring2][C][C][N][=C][N][Branch1][N][C][=Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][O][C][=C][C][Branch1][C][F][=C][C][=C][Ring1][#Branch1][Ring2][Ring1][C][C][=C][Ring2][Ring1][#Branch2][C][Ring2][Ring1][S]?\nAssistant: Yes, of course, the protein RAC-PK-beta is targeted by the compound [C][C][Branch1][C][O][C][C][Branch1][C][N][Branch2][Ring2][=C][C][=C][C][=C][Branch2][Ring2][C][C][N][=C][N][Branch1][N][C][=Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][O][C][=C][C][Branch1][C][F][=C][C][=C][Ring1][#Branch1][Ring2][Ring1][C][C][=C][Ring2][Ring1][#Branch2][C][Ring2][Ring1][S].\nUser: Can you tell me which disease the protein RAC-PK-beta is related to?\nAssistant: The protein RAC-PK-beta is related to the Hypoinsulinemic hypoglycemia with hemihypertrophy disease."}", "/scratch/micpie/export/compound_protein_disease/train_1-1.jsonl": "{"text":"The compound CC1(O)CC(N)(c2ccc(-c3nc4n(c3-c3ccccc3)COc3ccc(F)cc3-4)cc2)C1 targets the protein PKB beta. The protein PKB beta is related to the disease Hypoinsulinemic hypoglycemia with hemihypertrophy."} {"text":"The compound [C][C][N][C][C][N][Branch2][Ring1][Branch2][\/C][=C][\\C][C][N][C][Ring1][Branch1][=N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][O][C][C][Ring2][Ring1][Branch1] targets the protein Transcription activator BRG1. The protein Transcription activator BRG1 is related to the disease Familial rhabdoid tumor."}", "/scratch/micpie/export/compound_protein_disease/train_0-1.jsonl": "{"text":"The compound COCc1ccc(-c2cc3c(N[C@@H](C)c4ccn(C)n4)c(C(N)=O)cnc3cc2F)cn1 targets the protein A-T mutated. 
The protein A-T mutated is related to the disease Ataxia-telangiectasia."} {"text":"The compound CCO)CCN)cccc-cncnc5-cccccc6)))))))COcccF)ccc6-%10))))))))))))cc6))))))C4 targets the protein Protein kinase B beta. The protein Protein kinase B beta is related to the disease Hypoinsulinemic hypoglycemia with hemihypertrophy."}", "/scratch/micpie/export/compound_protein_disease/valid_1-0.jsonl": "{"text":"The compound Cccc-ccn[nH]c5)))))cnc-cccccc6))))))c-ccccCN)CCC4))))cc6))))))nc95 targets the protein RAC-beta serine\/threonine-protein kinase which is related to the disease Hypoinsulinemic hypoglycemia with hemihypertrophy."} {"text":"The compound [O][=C][Branch1][=N][N][C][=C][C][=N][C][Branch1][C][Cl][=C][Ring1][#Branch1][N][C][=C][C][=C][C][=N][Ring1][=Branch1] targets the protein ATP-dependent helicase SMARCA4 which is related to the disease Familial rhabdoid tumor."}", "/scratch/micpie/export/compound_protein_disease/train_1-2.jsonl": "{"text":"User: Can you come up with one example for a protein that is targeted by the compound InChI=1S\/C27H24FN3O2\/c1-26(32)14-27(29,15-26)19-9-7-17(8-10-19)23-24(18-5-3-2-4-6-18)31-16-33-22-12-11-20(28)13-21(22)25(31)30-23\/h2-13,32H,14-16,29H2,1H3?\nAssistant: Yes, the protein RAC-beta serine\/threonine-protein kinase is targeted by the compound InChI=1S\/C27H24FN3O2\/c1-26(32)14-27(29,15-26)19-9-7-17(8-10-19)23-24(18-5-3-2-4-6-18)31-16-33-22-12-11-20(28)13-21(22)25(31)30-23\/h2-13,32H,14-16,29H2,1H3.\nUser: Can you tell me which disease the protein RAC-beta serine\/threonine-protein kinase is related to?\nAssistant: The protein RAC-beta serine\/threonine-protein kinase is related to the Hypoinsulinemic hypoglycemia with hemihypertrophy disease."} {"text":"User: Can you give me one example for a protein that is targeted by the compound CCNCCN\/C=C\\CCnc5nc=O)cccccc6%10))))))))))))))CC6?\nAssistant: Sure, the protein SWI\/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily A member 4 is targeted by 
the compound CCNCCN\/C=C\\CCnc5nc=O)cccccc6%10))))))))))))))CC6.\nUser: Can you tell me which disease the protein SWI\/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily A member 4 is related to?\nAssistant: The protein SWI\/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily A member 4 is related to the Familial rhabdoid tumor disease."}", "/scratch/micpie/export/MUV_689/valid_0-0.jsonl": "{"text":"The chemical compound with the SELFIES representation of ['[C][N][C][=C][N][=C][Ring1][Branch1][S][C][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][#Branch2][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][=N]'] is not an inhibitor of the EPH receptor A4."} {"text":"The molecular species with the SMILES O=C(NCC(=O)N(c1ccc2c(c1)OCO2)C(C(=O)NC1CCCC1)c1cccs1)c1ccco1 is not an inhibitor of the EPH receptor A4."}", "/scratch/micpie/export/MUV_689/test_0-0.jsonl": "{"text":"The chemical compound with the canonical SMILES COc1ccccc1Nc1nc2c(s1)CCC2 is not an inhibitor of the EPH receptor A4."} {"text":"The chemical compound with the canonical SMILES representation of COCCN1C(SCc2cccnc2)=NN\/C1=C1\\C=Nc2ccccc21 is not an inhibitor of the EPH receptor A4."}", "/scratch/micpie/export/MUV_689/train_0-0.jsonl": "{"text":"The chemical with the InChI InChI=1S\/C20H18N2O4S\/c23-19-17-9-5-4-8-16(17)18(12-21-14-10-11-27(25,26)13-14)20(24)22(19)15-6-2-1-3-7-15\/h1-9,12,14,21H,10-11,13H2\/b18-12+ is not an inhibitor of the EPH receptor A4."} {"text":"The chemical compound with the DeepSMILES Ccccc-cccccc6))))))n5-ccccO)cc6 is not an inhibitor of the EPH receptor A4."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_5-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the SELFIES.\nSELFIES: [C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C]\nConstraint: Even 
if you are uncertain, you must answer with a representation without using any additional words.\nResult: CCc1cc(C2CCCN2C(=O)NCCCOC)on1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the SELFIES.\nMolecule SELFIES: ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]']\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_2-4.jsonl": "{"text":"User: Can you create the SMILES of the molecule with the SELFIES [C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N]?\nAssistant: Yes, this molecule has a SMILES of CC1(C)CCC(CN2CCN(c3ccc(C(=O)NS(=O)(=O)c4ccc(NC5(CO)CCOCC5)c([N+](=O)[O-])c4)c(Oc4ccc5[nH]ccc5c4)c3)CC2)=C(c2ccc(Cl)cc2)C1."} {"text":"User: Can you tell me the SMILES of the molecule with the SELFIES [C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1]?\nAssistant: Sure, this molecule has a SMILES of CN(C)CCN1C(=O)c2oc3ccc(Cl)cc3c(=O)c2C1c1cccc(Br)c1."}", 
"/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_2-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SMILES from the SELFIES.\nSELFIES: [C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CC(=O)[C@]1(O)CC[C@]2(O)[C@]3(O)CC=C4C[C@@H](O)CC[C@]4(C)[C@H]3C[C@@H](OC(=O)\/C=C(\\C)C(C)C)[C@]12C"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the SELFIES.\nSELFIES: [C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: c1ccc(COc2ccc(CN3CCN(c4ccccn4)CC3)cc2)cc1"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_4-0.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1] can also be represented with the SMILES CC(C)Oc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1."} {"text":"The molecule with the SELFIES representation of [N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1] can also be 
represented with the SMILES representation NC(=O)c1ccc(C#CCNC(=O)c2ccc3ccccc3c2)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_4-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the SELFIES [C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1]?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of CC(C)Oc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1."} {"text":"User: Can you create the SMILES of the molecule with the SELFIES [N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1]?\nAssistant: Yes, this molecule has a SMILES of NC(=O)c1ccc(C#CCNC(=O)c2ccc3ccccc3c2)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_1-4.jsonl": "{"text":"User: Can you create the SMILES of the molecule with the SELFIES [C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2]?\nAssistant: Yes, this molecule has a SMILES of CN1CCc2c(sc3c2c(=O)n(-c2ccccc2)c2nnc(S)n32)C1."} {"text":"User: Can you create the SMILES of the molecule with the SELFIES [O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br]?\nAssistant: Sure, this molecule has a SMILES of O=C(Cn1cc2c(n1)CCCCC2)Nc1c(F)cc(F)cc1Br."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_5-1.jsonl": "{"text":"The molecule with the SMILES representation of CCc1cc(C2CCCN2C(=O)NCCCOC)on1 can also be represented with the SELFIES 
[C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C]."} {"text":"The molecule with the SMILES representation of CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21 can also be represented with the SELFIES representation ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]']."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_4-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the SELFIES [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]?\nAssistant: Sure, this molecule has a SMILES of CCN(CC)C(=O)CN(C)C(=O)c1nonc1N."} {"text":"User: Can you tell me the SMILES of the molecule with the SELFIES [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]?\nAssistant: Yes, this molecule has a SMILES of Cc1cccc(C)c1NC(=O)CN(C)S(=O)(=O)Cc1ccccc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_4-4.jsonl": "{"text":"User: Can you create the SMILES of the molecule with the SELFIES [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C]?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of CCNC(=O)C1CCN(C(=O)c2nonc2N)CC1."} {"text":"User: Can you tell me the SMILES of the molecule with the SELFIES [C][N][Branch1][C][C][C][=N][C][Branch1][C][N][=N][C][Branch2][Ring1][Ring1][C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=N][Ring2][Ring1][Ring1]?\nAssistant: Yes, this molecule has a SMILES of CN(C)c1nc(N)nc(COC(=O)c2ccc(Cl)cc2N)n1."}", 
"/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_1-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the SELFIES.\nSELFIES: [C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: COc1cc(CNCCSc2nnnn2C)ccc1OCc1ccccc1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the SELFIES.\nMolecule SELFIES: [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CNC(=S)N1CCC(c2ccc(OC)cc2)=N1"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_3-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the SELFIES.\nSELFIES: [C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: COCCCN1C(=O)C(=O)\/C(=C(\/O)c2ccc(S(=O)(=O)N3CCOCC3)cc2)C1c1cccc(Cl)c1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the SELFIES.\nMolecule SELFIES: 
[C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1F"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_1-5.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the SMILES COc1cc(CNCCSc2nnnn2C)ccc1OCc1ccccc1?\nAssistant: Sure, this molecule has a SELFIES of [C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]."} {"text":"User: Can you tell me the SELFIES of the molecule with the SMILES CNC(=S)N1CCC(c2ccc(OC)cc2)=N1?\nAssistant: Of course, this molecule has a SELFIES of [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_5-0.jsonl": "{"text":"The molecule with the SELFIES [C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C] can also be represented with the SMILES CCc1cc(C2CCCN2C(=O)NCCCOC)on1."} {"text":"The molecule with the SELFIES ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]'] can also be represented with the SMILES CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_1-5.jsonl": "{"text":"User: Can you create the SELFIES of the molecule with the SMILES Cc1ccc(-c2nc3c4ccccc4[nH]c(=O)n3n2)cc1?\nAssistant: Sure, this molecule has a SELFIES of 
[C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2]."} {"text":"User: Can you generate the SELFIES of the molecule with the SMILES Cc1c(OCCN2CCN(C)CC2)cn2ncnc(Oc3ccc(NC(=O)c4ccccc4F)cc3F)c12?\nAssistant: Of course, this molecule has a SELFIES of [C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_5-4.jsonl": "{"text":"User: Can you create the SMILES of the molecule with the SELFIES [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C]?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of Cc1nc2c3ccccc3nn2c(C)c1CCC(=O)NC(C)C."} {"text":"User: Can you tell me the SMILES of the molecule with the SELFIES ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]']?\nAssistant: Of course, this molecule has a SMILES of Nc1c2c([nH+]c3ccccc13)CCCC2."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_0-5.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the SMILES S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1?\nAssistant: Yes, this molecule has a SELFIES of 
[S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1]."} {"text":"User: Can you create the SELFIES of the molecule with the SMILES c1ccc(Nc2nc(CSc3nc[nH]n3)cs2)cc1?\nAssistant: Yes, this molecule has a SELFIES of [C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_3-0.jsonl": "{"text":"The molecule with the SELFIES representation of [O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1] can also be represented with the SMILES representation O=C(CSc1ccc(-c2ccco2)nn1)Nc1cccc(F)c1."} {"text":"The molecule with the SELFIES representation of [C][C][N][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N] can also be represented with the SMILES CCN(CCc1ccccc1)C(=O)c1nonc1N."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_0-1.jsonl": "{"text":"The molecule with the SMILES representation of S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1 can also be represented with the SELFIES representation 
[S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1]."} {"text":"The molecule with the SMILES c1ccc(Nc2nc(CSc3nc[nH]n3)cs2)cc1 can also be represented with the SELFIES [C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_5-0.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2] can also be represented with the SMILES representation CCN(CC)C(=O)CN(CC)c1cc(C)cc(C)c1."} {"text":"The molecule with the SELFIES representation of ['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]'] can also be represented with the SMILES CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_2-0.jsonl": "{"text":"The molecule with the SELFIES 
[C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N] can also be represented with the SMILES CC1(C)CCC(CN2CCN(c3ccc(C(=O)NS(=O)(=O)c4ccc(NC5(CO)CCOCC5)c([N+](=O)[O-])c4)c(Oc4ccc5[nH]ccc5c4)c3)CC2)=C(c2ccc(Cl)cc2)C1."} {"text":"The molecule with the SELFIES representation of [C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1] can also be represented with the SMILES CN(C)CCN1C(=O)c2oc3ccc(Cl)cc3c(=O)c2C1c1cccc(Br)c1."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_2-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the SELFIES.\nSELFIES: [C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COc1ccc(\/C=N\/Nc2nc(C)cs2)cc1OC"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the SELFIES.\nMolecule SELFIES: [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CCOc1ccsc1C(=O)OC"}", 
"/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_0-0.jsonl": "{"text":"The molecule with the SELFIES representation of [S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2] can also be represented with the SMILES representation S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1."} {"text":"The molecule with the SELFIES [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C] can also be represented with the SMILES representation COc1cc(OC)cc(C(=O)NC(C(=O)NCCN(C)C)C23CC4CC(CC(C4)C2)C3)c1."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_3-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SELFIES from the SMILES.\nSMILES: O=C(CSc1ccc(-c2ccco2)nn1)Nc1cccc(F)c1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SELFIES from the SMILES.\nSMILES: CCN(CCc1ccccc1)C(=O)c1nonc1N\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: 
[C][C][N][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_5-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the SMILES.\nMolecule SMILES: Cc1nc2c3ccccc3nn2c(C)c1CCC(=O)NC(C)C\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C]"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SELFIES from the SMILES.\nSMILES: Nc1c2c([nH+]c3ccccc13)CCCC2\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]']"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_3-5.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the SMILES Cl.O=C(c1ccc(Br)cc1)C(CN1CCOCC1)c1ccccc1?\nAssistant: Sure, this molecule has a SELFIES of [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]."} {"text":"User: Can you generate the SELFIES of the molecule with the SMILES CCOc1ccccc1CNC(=O)Nc1ccc(OC)c(F)c1?\nAssistant: Of course, this molecule has a SELFIES of [C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_5-5.jsonl": "{"text":"User: Can you create the SELFIES of the molecule with the 
SMILES CCN(CC)C(=O)CN(CC)c1cc(C)cc(C)c1?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2]."} {"text":"User: Can you create the SELFIES of the molecule with the SMILES CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1?\nAssistant: Yes, this molecule has a SELFIES of ['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]']."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_4-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the SMILES.\nSMILES: CCN(CC)C(=O)CN(C)C(=O)c1nonc1N\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the SMILES.\nMolecule SMILES: Cc1cccc(C)c1NC(=O)CN(C)S(=O)(=O)Cc1ccccc1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: 
[C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_1-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the SMILES.\nMolecule SMILES: Cc1ccc(-c2nc3c4ccccc4[nH]c(=O)n3n2)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2]"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the SMILES.\nSMILES: Cc1c(OCCN2CCN(C)CC2)cn2ncnc(Oc3ccc(NC(=O)c4ccccc4F)cc3F)c12\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2]"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_0-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the SELFIES.\nSELFIES: 
[S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the SELFIES.\nSELFIES: [C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: c1ccc(Nc2nc(CSc3nc[nH]n3)cs2)cc1"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_3-0.jsonl": "{"text":"The molecule with the SELFIES [C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1] can also be represented with the SMILES representation COCCCN1C(=O)C(=O)\/C(=C(\/O)c2ccc(S(=O)(=O)N3CCOCC3)cc2)C1c1cccc(Cl)c1."} {"text":"The molecule with the SELFIES representation of [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F] can also be represented with the SMILES 
COc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1F."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_1-0.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2] can also be represented with the SMILES Cc1ccc(-c2nc3c4ccccc4[nH]c(=O)n3n2)cc1."} {"text":"The molecule with the SELFIES [C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2] can also be represented with the SMILES Cc1c(OCCN2CCN(C)CC2)cn2ncnc(Oc3ccc(NC(=O)c4ccccc4F)cc3F)c12."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_5-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the SELFIES.\nSELFIES: [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CCN(CC)C(=O)CN(CC)c1cc(C)cc(C)c1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the SELFIES.\nMolecule SELFIES: 
['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]']\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_2-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SELFIES from the SMILES.\nMolecule SMILES: CC(=O)[C@]1(O)CC[C@]2(O)[C@]3(O)CC=C4C[C@@H](O)CC[C@]4(C)[C@H]3C[C@@H](OC(=O)\/C=C(\\C)C(C)C)[C@]12C\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the SMILES.\nSMILES: c1ccc(COc2ccc(CN3CCN(c4ccccn4)CC3)cc2)cc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: 
[C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O]"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_1-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the SELFIES.\nMolecule SELFIES: [C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CN1CCc2c(sc3c2c(=O)n(-c2ccccc2)c2nnc(S)n32)C1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the SELFIES.\nSELFIES: [O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: O=C(Cn1cc2c(n1)CCCCC2)Nc1c(F)cc(F)cc1Br"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_5-5.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the SMILES CCc1cc(C2CCCN2C(=O)NCCCOC)on1?\nAssistant: Sure, this molecule has a SELFIES of [C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C]."} {"text":"User: Can you generate the SELFIES of the molecule with the SMILES CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21?\nAssistant: Yes, this molecule has a SELFIES of 
['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]']."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_2-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the SELFIES [C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C]?\nAssistant: Of course, this molecule has a SMILES of CC(=O)[C@]1(O)CC[C@]2(O)[C@]3(O)CC=C4C[C@@H](O)CC[C@]4(C)[C@H]3C[C@@H](OC(=O)\/C=C(\\C)C(C)C)[C@]12C."} {"text":"User: Can you generate the SMILES of the molecule with the SELFIES [C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O]?\nAssistant: Sure, this molecule has a SMILES of c1ccc(COc2ccc(CN3CCN(c4ccccn4)CC3)cc2)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_3-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the SELFIES [O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1]?\nAssistant: Of course, this molecule has a SMILES of O=C(CSc1ccc(-c2ccco2)nn1)Nc1cccc(F)c1."} {"text":"User: Can you tell me the SMILES of the molecule with the SELFIES [C][C][N][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of CCN(CCc1ccccc1)C(=O)c1nonc1N."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_0-0.jsonl": 
"{"text":"The molecule with the SELFIES representation of [S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1] can also be represented with the SMILES S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1."} {"text":"The molecule with the SELFIES representation of [C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1] can also be represented with the SMILES c1ccc(Nc2nc(CSc3nc[nH]n3)cs2)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_2-5.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the SMILES CC1(C)CCC(CN2CCN(c3ccc(C(=O)NS(=O)(=O)c4ccc(NC5(CO)CCOCC5)c([N+](=O)[O-])c4)c(Oc4ccc5[nH]ccc5c4)c3)CC2)=C(c2ccc(Cl)cc2)C1?\nAssistant: Sure, this molecule has a SELFIES of [C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N]."} {"text":"User: Can you create the SELFIES of the molecule with the SMILES CN(C)CCN1C(=O)c2oc3ccc(Cl)cc3c(=O)c2C1c1cccc(Br)c1?\nAssistant: Sure, this molecule has a SELFIES of 
[C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_5-5.jsonl": "{"text":"User: Can you create the SELFIES of the molecule with the SMILES Cc1nc2c3ccccc3nn2c(C)c1CCC(=O)NC(C)C?\nAssistant: Sure, this molecule has a SELFIES of [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C]."} {"text":"User: Can you create the SELFIES of the molecule with the SMILES Nc1c2c([nH+]c3ccccc13)CCCC2?\nAssistant: Sure, this molecule has a SELFIES of ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]']."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_3-5.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the SMILES COCCCN1C(=O)C(=O)\/C(=C(\/O)c2ccc(S(=O)(=O)N3CCOCC3)cc2)C1c1cccc(Cl)c1?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1]."} {"text":"User: Can you create the SELFIES of the molecule with the SMILES COc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1F?\nAssistant: Yes, this molecule has a SELFIES of [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_2-0.jsonl": "{"text":"The molecule with the SELFIES representation of 
[C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C] can also be represented with the SMILES representation CC(=O)[C@]1(O)CC[C@]2(O)[C@]3(O)CC=C4C[C@@H](O)CC[C@]4(C)[C@H]3C[C@@H](OC(=O)\/C=C(\\C)C(C)C)[C@]12C."} {"text":"The molecule with the SELFIES representation of [C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O] can also be represented with the SMILES representation c1ccc(COc2ccc(CN3CCN(c4ccccn4)CC3)cc2)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_1-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the SELFIES [C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2]?\nAssistant: Sure, this molecule has a SMILES of Cc1ccc(-c2nc3c4ccccc4[nH]c(=O)n3n2)cc1."} {"text":"User: Can you tell me the SMILES of the molecule with the SELFIES [C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2]?\nAssistant: Sure, this molecule has a SMILES of Cc1c(OCCN2CCN(C)CC2)cn2ncnc(Oc3ccc(NC(=O)c4ccccc4F)cc3F)c12."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_2-0.jsonl": "{"text":"The molecule with the SELFIES [C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C] can also be represented 
with the SMILES COc1ccc(\/C=N\/Nc2nc(C)cs2)cc1OC."} {"text":"The molecule with the SELFIES [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C] can also be represented with the SMILES CCOc1ccsc1C(=O)OC."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_4-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the SELFIES.\nSELFIES: [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCN(CC)C(=O)CN(C)C(=O)c1nonc1N"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the SELFIES.\nMolecule SELFIES: [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Cc1cccc(C)c1NC(=O)CN(C)S(=O)(=O)Cc1ccccc1"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_0-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the SMILES.\nMolecule SMILES: S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: 
[S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1]"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the SMILES.\nMolecule SMILES: c1ccc(Nc2nc(CSc3nc[nH]n3)cs2)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1]"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_4-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SMILES from the SELFIES.\nSELFIES: [C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CC(C)Oc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the SELFIES.\nSELFIES: [N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other 
words.\nResult: NC(=O)c1ccc(C#CCNC(=O)c2ccc3ccccc3c2)cc1"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_3-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the SELFIES.\nSELFIES: [O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: O=C(CSc1ccc(-c2ccco2)nn1)Nc1cccc(F)c1"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SMILES from the SELFIES.\nSELFIES: [C][C][N][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CCN(CCc1ccccc1)C(=O)c1nonc1N"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_2-1.jsonl": "{"text":"The molecule with the SMILES representation of COc1ccc(\/C=N\/Nc2nc(C)cs2)cc1OC can also be represented with the SELFIES [C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C]."} {"text":"The molecule with the SMILES CCOc1ccsc1C(=O)OC can also be represented with the SELFIES representation [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_5-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the SELFIES [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2]?\nAssistant: Sure, this molecule has a SMILES of CCN(CC)C(=O)CN(CC)c1cc(C)cc(C)c1."} {"text":"User: Can you tell me the 
SMILES of the molecule with the SELFIES ['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]']?\nAssistant: Of course, this molecule has a SMILES of CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_4-5.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the SMILES CC(C)Oc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1?\nAssistant: Sure, this molecule has a SELFIES of [C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1]."} {"text":"User: Can you generate the SELFIES of the molecule with the SMILES NC(=O)c1ccc(C#CCNC(=O)c2ccc3ccccc3c2)cc1?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_4-0.jsonl": "{"text":"The molecule with the SELFIES [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N] can also be represented with the SMILES representation CCN(CC)C(=O)CN(C)C(=O)c1nonc1N."} {"text":"The molecule with the SELFIES 
representation of [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1] can also be represented with the SMILES representation Cc1cccc(C)c1NC(=O)CN(C)S(=O)(=O)Cc1ccccc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_5-1.jsonl": "{"text":"The molecule with the SMILES Cc1nc2c3ccccc3nn2c(C)c1CCC(=O)NC(C)C can also be represented with the SELFIES representation [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C]."} {"text":"The molecule with the SMILES Nc1c2c([nH+]c3ccccc13)CCCC2 can also be represented with the SELFIES ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]']."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_2-1.jsonl": "{"text":"The molecule with the SMILES CC1(C)CCC(CN2CCN(c3ccc(C(=O)NS(=O)(=O)c4ccc(NC5(CO)CCOCC5)c([N+](=O)[O-])c4)c(Oc4ccc5[nH]ccc5c4)c3)CC2)=C(c2ccc(Cl)cc2)C1 can also be represented with the SELFIES representation [C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N]."} {"text":"The molecule with the SMILES CN(C)CCN1C(=O)c2oc3ccc(Cl)cc3c(=O)c2C1c1cccc(Br)c1 can also be represented with the SELFIES 
[C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_0-0.jsonl": "{"text":"The molecule with the SELFIES [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C] can also be represented with the SMILES O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C."} {"text":"The molecule with the SELFIES [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1] can also be represented with the SMILES CCOC(=O)c1cccc(-[n+]2c(-c3ccccc3)cc(-c3ccccc3)cc2-c2ccccc2)c1.[O-][Cl+3]([O-])([O-])[O-]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_0-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SELFIES from the SMILES.\nMolecule SMILES: O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the 
SELFIES from the SMILES.\nSMILES: CCOC(=O)c1cccc(-[n+]2c(-c3ccccc3)cc(-c3ccccc3)cc2-c2ccccc2)c1.[O-][Cl+3]([O-])([O-])[O-]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1]"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_1-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the SELFIES [C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]?\nAssistant: Yes, this molecule has a SMILES of COc1cc(CNCCSc2nnnn2C)ccc1OCc1ccccc1."} {"text":"User: Can you tell me the SMILES of the molecule with the SELFIES [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N]?\nAssistant: Yes, this molecule has a SMILES of CNC(=S)N1CCC(c2ccc(OC)cc2)=N1."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_2-5.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the SMILES CC(=O)[C@]1(O)CC[C@]2(O)[C@]3(O)CC=C4C[C@@H](O)CC[C@]4(C)[C@H]3C[C@@H](OC(=O)\/C=C(\\C)C(C)C)[C@]12C?\nAssistant: Yes, this molecule has a SELFIES of [C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C]."} {"text":"User: Can you create the SELFIES of the molecule with the SMILES c1ccc(COc2ccc(CN3CCN(c4ccccn4)CC3)cc2)cc1?\nAssistant: 
Sure, this molecule has a SELFIES of [C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_1-1.jsonl": "{"text":"The molecule with the SMILES COc1cc(CNCCSc2nnnn2C)ccc1OCc1ccccc1 can also be represented with the SELFIES [C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]."} {"text":"The molecule with the SMILES CNC(=S)N1CCC(c2ccc(OC)cc2)=N1 can also be represented with the SELFIES [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_5-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the SELFIES.\nMolecule SELFIES: [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: Cc1nc2c3ccccc3nn2c(C)c1CCC(=O)NC(C)C"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SMILES from the SELFIES.\nMolecule SELFIES: ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]']\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Nc1c2c([nH+]c3ccccc13)CCCC2"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_5-1.jsonl": "{"text":"The molecule with the SMILES CCN(CC)C(=O)CN(CC)c1cc(C)cc(C)c1 can also be 
represented with the SELFIES representation [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2]."} {"text":"The molecule with the SMILES representation of CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1 can also be represented with the SELFIES ['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]']."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_4-1.jsonl": "{"text":"The molecule with the SMILES CCNC(=O)C1CCN(C(=O)c2nonc2N)CC1 can also be represented with the SELFIES representation [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C]."} {"text":"The molecule with the SMILES representation of CN(C)c1nc(N)nc(COC(=O)c2ccc(Cl)cc2N)n1 can also be represented with the SELFIES [C][N][Branch1][C][C][C][=N][C][Branch1][C][N][=N][C][Branch2][Ring1][Ring1][C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=N][Ring2][Ring1][Ring1]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_3-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the SMILES.\nSMILES: COCCCN1C(=O)C(=O)\/C(=C(\/O)c2ccc(S(=O)(=O)N3CCOCC3)cc2)C1c1cccc(Cl)c1\nConstraint: Even if you are not sure, you must answer with a 
representation without using any additional words.\nResult: [C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1]"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SELFIES from the SMILES.\nSMILES: COc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1F\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F]"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_3-4.jsonl": "{"text":"User: Can you create the SMILES of the molecule with the SELFIES [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]?\nAssistant: Of course, this molecule has a SMILES of Cl.O=C(c1ccc(Br)cc1)C(CN1CCOCC1)c1ccccc1."} {"text":"User: Can you tell me the SMILES of the molecule with the SELFIES [C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2]?\nAssistant: Sure, this molecule has a SMILES of CCOc1ccccc1CNC(=O)Nc1ccc(OC)c(F)c1."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_1-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the SMILES.\nMolecule SMILES: CN1CCc2c(sc3c2c(=O)n(-c2ccccc2)c2nnc(S)n32)C1\nConstraint: Even if you are uncertain, you must answer with a representation without using any 
additional words.\nResult: [C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2]"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the SMILES.\nSMILES: O=C(Cn1cc2c(n1)CCCCC2)Nc1c(F)cc(F)cc1Br\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: [O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br]"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_0-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SMILES from the SELFIES.\nSELFIES: [S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the SELFIES.\nSELFIES: 
[C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: COc1cc(OC)cc(C(=O)NC(C(=O)NCCN(C)C)C23CC4CC(CC(C4)C2)C3)c1"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_5-0.jsonl": "{"text":"The molecule with the SELFIES [C][C][=N][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][N][Ring1][=Branch2][C][Branch1][C][C][=C][Ring1][=C][C][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C] can also be represented with the SMILES Cc1nc2c3ccccc3nn2c(C)c1CCC(=O)NC(C)C."} {"text":"The molecule with the SELFIES ['[N][C][=C][C][=Branch1][N][=NH1+1][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring1][=N]'] can also be represented with the SMILES representation Nc1c2c([nH+]c3ccccc13)CCCC2."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_0-1.jsonl": "{"text":"The molecule with the SMILES representation of S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1 can also be represented with the SELFIES representation [S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2]."} {"text":"The molecule with the SMILES representation of COc1cc(OC)cc(C(=O)NC(C(=O)NCCN(C)C)C23CC4CC(CC(C4)C2)C3)c1 can also be represented with the SELFIES representation 
[C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_2-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the SELFIES [C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C]?\nAssistant: Yes, this molecule has a SMILES of COc1ccc(\/C=N\/Nc2nc(C)cs2)cc1OC."} {"text":"User: Can you tell me the SMILES of the molecule with the SELFIES [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C]?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of CCOc1ccsc1C(=O)OC."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_2-1.jsonl": "{"text":"The molecule with the SMILES representation of CC(=O)[C@]1(O)CC[C@]2(O)[C@]3(O)CC=C4C[C@@H](O)CC[C@]4(C)[C@H]3C[C@@H](OC(=O)\/C=C(\\C)C(C)C)[C@]12C can also be represented with the SELFIES representation [C][C][=Branch1][C][=O][C@][Branch1][C][O][C][C][C@][Branch1][C][O][C@][Branch1][C][O][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][=N][C][C@@H1][Branch1][S][O][C][=Branch1][C][=O][\/C][=C][Branch1][C][\\C][C][Branch1][C][C][C][C@][Ring2][Ring1][#C][Ring2][Ring1][O][C]."} {"text":"The molecule with the SMILES representation of c1ccc(COc2ccc(CN3CCN(c4ccccn4)CC3)cc2)cc1 can also be represented with the SELFIES representation [C][=C][C][=C][Branch2][Ring2][C][C][O][C][=C][C][=C][Branch2][Ring1][Ring2][C][N][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][O]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_1-1.jsonl": "{"text":"The molecule with the SMILES representation of 
CN1CCc2c(sc3c2c(=O)n(-c2ccccc2)c2nnc(S)n32)C1 can also be represented with the SELFIES [C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2]."} {"text":"The molecule with the SMILES representation of O=C(Cn1cc2c(n1)CCCCC2)Nc1c(F)cc(F)cc1Br can also be represented with the SELFIES [O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_3-5.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the SMILES O=C(CSc1ccc(-c2ccco2)nn1)Nc1cccc(F)c1?\nAssistant: Of course, this molecule has a SELFIES of [O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1]."} {"text":"User: Can you generate the SELFIES of the molecule with the SMILES CCN(CCc1ccccc1)C(=O)c1nonc1N?\nAssistant: Yes, this molecule has a SELFIES of [C][C][N][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_3-1.jsonl": "{"text":"The molecule with the SMILES representation of COCCCN1C(=O)C(=O)\/C(=C(\/O)c2ccc(S(=O)(=O)N3CCOCC3)cc2)C1c1cccc(Cl)c1 can also be represented with the SELFIES [C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1]."} {"text":"The molecule with the SMILES representation of COc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1F can also be 
represented with the SELFIES [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_0-5.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the SMILES S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1?\nAssistant: Of course, this molecule has a SELFIES of [S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2]."} {"text":"User: Can you tell me the SELFIES of the molecule with the SMILES COc1cc(OC)cc(C(=O)NC(C(=O)NCCN(C)C)C23CC4CC(CC(C4)C2)C3)c1?\nAssistant: Sure, this molecule has a SELFIES of [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_0-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the SELFIES [S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2]?\nAssistant: 
Sure, this molecule has a SMILES of S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1."} {"text":"User: Can you generate the SMILES of the molecule with the SELFIES [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C]?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of COc1cc(OC)cc(C(=O)NC(C(=O)NCCN(C)C)C23CC4CC(CC(C4)C2)C3)c1."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_1-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SELFIES from the SMILES.\nSMILES: COc1cc(CNCCSc2nnnn2C)ccc1OCc1ccccc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: [C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SELFIES from the SMILES.\nSMILES: CNC(=S)N1CCC(c2ccc(OC)cc2)=N1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N]"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_5-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SELFIES from the SMILES.\nMolecule SMILES: CCN(CC)C(=O)CN(CC)c1cc(C)cc(C)c1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: 
[C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][Ring1][C][C][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2]"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SELFIES from the SMILES.\nSMILES: CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: ['[C][C][Branch1][C][O][C][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][C][S][S][C][C][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][Branch1][C][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][C][C][C][NH3+1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][Branch1][C][C][O][C][=Branch1][C][=O][N][Ring2][Branch1][=C]']"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_0-5.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the SMILES O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C?\nAssistant: Yes, this molecule has a SELFIES of [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C]."} {"text":"User: Can you tell me the SELFIES of the molecule with the SMILES CCOC(=O)c1cccc(-[n+]2c(-c3ccccc3)cc(-c3ccccc3)cc2-c2ccccc2)c1.[O-][Cl+3]([O-])([O-])[O-]?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of 
[C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_4-5.jsonl": "{"text":"User: Can you generate the SELFIES of the molecule with the SMILES CCNC(=O)C1CCN(C(=O)c2nonc2N)CC1?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C]."} {"text":"User: Can you tell me the SELFIES of the molecule with the SMILES CN(C)c1nc(N)nc(COC(=O)c2ccc(Cl)cc2N)n1?\nAssistant: Yes, this molecule has a SELFIES of [C][N][Branch1][C][C][C][=N][C][Branch1][C][N][=N][C][Branch2][Ring1][Ring1][C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=N][Ring2][Ring1][Ring1]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_1-0.jsonl": "{"text":"The molecule with the SELFIES representation of [C][O][C][=C][C][Branch1][=C][C][N][C][C][S][C][=N][N][=N][N][Ring1][Branch1][C][=C][C][=C][Ring1][P][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1] can also be represented with the SMILES representation COc1cc(CNCCSc2nnnn2C)ccc1OCc1ccccc1."} {"text":"The molecule with the SELFIES representation of [C][N][C][=Branch1][C][=S][N][C][C][C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=N][Ring1][=N] can also be represented with the SMILES representation CNC(=S)N1CCC(c2ccc(OC)cc2)=N1."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_0-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SMILES from the SELFIES.\nMolecule SELFIES: 
[O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the SELFIES.\nSELFIES: [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CCOC(=O)c1cccc(-[n+]2c(-c3ccccc3)cc(-c3ccccc3)cc2-c2ccccc2)c1.[O-][Cl+3]([O-])([O-])[O-]"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_4-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SMILES from the SELFIES.\nSELFIES: [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C]\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCNC(=O)C1CCN(C(=O)c2nonc2N)CC1"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SMILES from the SELFIES.\nMolecule SELFIES: 
[C][N][Branch1][C][C][C][=N][C][Branch1][C][N][=N][C][Branch2][Ring1][Ring1][C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=N][Ring2][Ring1][Ring1]\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CN(C)c1nc(N)nc(COC(=O)c2ccc(Cl)cc2N)n1"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_4-1.jsonl": "{"text":"The molecule with the SMILES representation of CCN(CC)C(=O)CN(C)C(=O)c1nonc1N can also be represented with the SELFIES representation [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]."} {"text":"The molecule with the SMILES representation of Cc1cccc(C)c1NC(=O)CN(C)S(=O)(=O)Cc1ccccc1 can also be represented with the SELFIES [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_5-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SELFIES from the SMILES.\nSMILES: CCc1cc(C2CCCN2C(=O)NCCCOC)on1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C]"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SELFIES from the SMILES.\nMolecule SMILES: CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: 
['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]']"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_2-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SMILES from the SELFIES.\nMolecule SELFIES: [C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CC1(C)CCC(CN2CCN(c3ccc(C(=O)NS(=O)(=O)c4ccc(NC5(CO)CCOCC5)c([N+](=O)[O-])c4)c(Oc4ccc5[nH]ccc5c4)c3)CC2)=C(c2ccc(Cl)cc2)C1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the SELFIES.\nSELFIES: [C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CN(C)CCN1C(=O)c2oc3ccc(Cl)cc3c(=O)c2C1c1cccc(Br)c1"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_1-1.jsonl": "{"text":"The molecule with the SMILES representation of Cc1ccc(-c2nc3c4ccccc4[nH]c(=O)n3n2)cc1 can also be represented 
with the SELFIES [C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2]."} {"text":"The molecule with the SMILES representation of Cc1c(OCCN2CCN(C)CC2)cn2ncnc(Oc3ccc(NC(=O)c4ccccc4F)cc3F)c12 can also be represented with the SELFIES [C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_0-1.jsonl": "{"text":"The molecule with the SMILES O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C can also be represented with the SELFIES [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C]."} {"text":"The molecule with the SMILES CCOC(=O)c1cccc(-[n+]2c(-c3ccccc3)cc(-c3ccccc3)cc2-c2ccccc2)c1.[O-][Cl+3]([O-])([O-])[O-] can also be represented with the SELFIES [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_3-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the SELFIES 
[C][O][C][C][C][N][C][=Branch1][C][=O][C][=Branch1][C][=O][\/C][=Branch2][Ring1][=C][=C][Branch1][C][\/O][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][Ring1][#C][C][Ring2][Ring1][Branch2][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1]?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of COCCCN1C(=O)C(=O)\/C(=C(\/O)c2ccc(S(=O)(=O)N3CCOCC3)cc2)C1c1cccc(Cl)c1."} {"text":"User: Can you tell me the SMILES of the molecule with the SELFIES [C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1][F]?\nAssistant: Yes, this molecule has a SMILES of COc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1F."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_0-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the SELFIES [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C]?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C."} {"text":"User: Can you generate the SMILES of the molecule with the SELFIES [C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring2][=Branch1][N+1][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring2][Ring1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring2][Ring1][=C].[O-1][Cl+3][Branch1][C][O-1][Branch1][C][O-1][O-1]?\nAssistant: Of course, this molecule has a SMILES of CCOC(=O)c1cccc(-[n+]2c(-c3ccccc3)cc(-c3ccccc3)cc2-c2ccccc2)c1.[O-][Cl+3]([O-])([O-])[O-]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_1-0.jsonl": "{"text":"The 
molecule with the SELFIES representation of [C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2] can also be represented with the SMILES CN1CCc2c(sc3c2c(=O)n(-c2ccccc2)c2nnc(S)n32)C1."} {"text":"The molecule with the SELFIES [O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br] can also be represented with the SMILES representation O=C(Cn1cc2c(n1)CCCCC2)Nc1c(F)cc(F)cc1Br."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_4-5.jsonl": "{"text":"User: Can you create the SELFIES of the molecule with the SMILES CCN(CC)C(=O)CN(C)C(=O)c1nonc1N?\nAssistant: Yes, this molecule has a SELFIES of [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][Branch1][C][C][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]."} {"text":"User: Can you tell me the SELFIES of the molecule with the SMILES Cc1cccc(C)c1NC(=O)CN(C)S(=O)(=O)Cc1ccccc1?\nAssistant: Of course, this molecule has a SELFIES of [C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_4-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SELFIES from the SMILES.\nMolecule SMILES: CCNC(=O)C1CCN(C(=O)c2nonc2N)CC1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C]"} {"text":"Task: Please generate a molecule representation based on the 
description.\nDescription: Create the SELFIES from the SMILES.\nMolecule SMILES: CN(C)c1nc(N)nc(COC(=O)c2ccc(Cl)cc2N)n1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][N][Branch1][C][C][C][=N][C][Branch1][C][N][=N][C][Branch2][Ring1][Ring1][C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=N][Ring2][Ring1][Ring1]"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_4-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the SMILES.\nMolecule SMILES: CC(C)Oc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SELFIES from the SMILES.\nMolecule SMILES: NC(=O)c1ccc(C#CCNC(=O)c2ccc3ccccc3c2)cc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: [N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1]"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_3-1.jsonl": "{"text":"The molecule with the SMILES representation of Cl.O=C(c1ccc(Br)cc1)C(CN1CCOCC1)c1ccccc1 can also be represented with the SELFIES representation [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]."} {"text":"The molecule with the 
SMILES representation of CCOc1ccccc1CNC(=O)Nc1ccc(OC)c(F)c1 can also be represented with the SELFIES [C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_2-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SELFIES from the SMILES.\nSMILES: COc1ccc(\/C=N\/Nc2nc(C)cs2)cc1OC\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: [C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C]"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SELFIES from the SMILES.\nMolecule SMILES: CCOc1ccsc1C(=O)OC\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C]"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_3-1.jsonl": "{"text":"The molecule with the SMILES representation of O=C(CSc1ccc(-c2ccco2)nn1)Nc1cccc(F)c1 can also be represented with the SELFIES representation [O][=C][Branch2][Ring1][Ring2][C][S][C][=C][C][=C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][N][=N][Ring1][O][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1]."} {"text":"The molecule with the SMILES CCN(CCc1ccccc1)C(=O)c1nonc1N can also be represented with the SELFIES [C][C][N][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_0-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the 
SMILES.\nSMILES: S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [S][=Branch1][C][=O][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Ring1][C][C][=C][C][Branch1][P][O][C@H1][Branch1][Branch1][C][O][C][C][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][Branch2]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SELFIES from the SMILES.\nSMILES: COc1cc(OC)cc(C(=O)NC(C(=O)NCCN(C)C)C23CC4CC(CC(C4)C2)C3)c1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][Branch2][Ring2][=Branch2][C][=Branch1][C][=O][N][C][Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][=C][Ring2][Ring1][=C]"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_1-5.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the SMILES CN1CCc2c(sc3c2c(=O)n(-c2ccccc2)c2nnc(S)n32)C1?\nAssistant: Yes, this molecule has a SELFIES of [C][N][C][C][C][=C][Branch2][Ring1][P][S][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=C][Branch1][C][S][N][Ring1][P][Ring1][=Branch1][C][Ring2][Ring1][Branch2]."} {"text":"User: Can you create the SELFIES of the molecule with the SMILES O=C(Cn1cc2c(n1)CCCCC2)Nc1c(F)cc(F)cc1Br?\nAssistant: Yes, this molecule has a SELFIES of 
[O][=C][Branch2][Ring1][C][C][N][C][=C][C][=Branch1][Ring2][=N][Ring1][Branch1][C][C][C][C][C][Ring1][Branch2][N][C][=C][Branch1][C][F][C][=C][Branch1][C][F][C][=C][Ring1][Branch2][Br]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_4-1.jsonl": "{"text":"The molecule with the SMILES representation of CC(C)Oc1ccc(NC(=O)Nc2cnn(CC3CCCO3)c2)cc1 can also be represented with the SELFIES [C][C][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring1][Branch2][N][C][=Branch1][C][=O][N][C][C][=N][N][Branch1][=Branch2][C][C][C][C][C][O][Ring1][Branch1][C][=Ring1][O][C][=C][Ring2][Ring1][Branch1]."} {"text":"The molecule with the SMILES NC(=O)c1ccc(C#CCNC(=O)c2ccc3ccccc3c2)cc1 can also be represented with the SELFIES [N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][#Branch1][C][#C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_5-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the SELFIES [C][C][C][C][=C][Branch2][Ring1][C][C][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][O][C][O][N][=Ring2][Ring1][C]?\nAssistant: Sure, this molecule has a SMILES of CCc1cc(C2CCCN2C(=O)NCCCOC)on1."} {"text":"User: Can you create the SMILES of the molecule with the SELFIES ['[C][C][#C][C][C][Branch1][C][C][C][Branch1][C][O][\/C][=C][\/C][C][Branch1][C][O][C][C][C][\/C][=Branch1][#Branch2][=C][\/C][C][C][C][=Branch1][C][=O][O-1][C][C][Ring1][N][Ring1][S]']?\nAssistant: Of course, this molecule has a SMILES of CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/valid_2-5.jsonl": "{"text":"User: Can you tell me the SELFIES of the molecule with the SMILES COc1ccc(\/C=N\/Nc2nc(C)cs2)cc1OC?\nAssistant: Yes, this molecule has a SELFIES of 
[C][O][C][=C][C][=C][Branch1][=C][\/C][=N][\/N][C][=N][C][Branch1][C][C][=C][S][Ring1][=Branch1][C][=C][Ring1][#C][O][C]."} {"text":"User: Can you generate the SELFIES of the molecule with the SMILES CCOc1ccsc1C(=O)OC?\nAssistant: Yes, I'm happy to help, this molecule has a SELFIES of [C][C][O][C][C][=C][S][C][=Ring1][Branch1][C][=Branch1][C][=O][O][C]."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_1-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the SELFIES.\nMolecule SELFIES: [C][C][=C][C][=C][Branch2][Ring1][#Branch1][C][N][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=Branch1][C][=O][N][Ring1][O][N][=Ring1][=C][C][=C][Ring2][Ring1][Ring2]\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: Cc1ccc(-c2nc3c4ccccc4[nH]c(=O)n3n2)cc1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the SELFIES.\nMolecule SELFIES: [C][C][C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][N][N][=C][N][=C][Branch2][Ring1][O][O][C][=C][C][=C][Branch1][#C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][F][C][=C][Ring1][S][F][C][=Ring2][Ring2][Branch1][Ring2][Ring1][Branch2]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: Cc1c(OCCN2CCN(C)CC2)cn2ncnc(Oc3ccc(NC(=O)c4ccccc4F)cc3F)c12"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_0-4.jsonl": "{"text":"User: Can you create the SMILES of the molecule with the SELFIES 
[S][=Branch1][C][=O][C][C@@H1][Branch2][Ring2][Branch2][C][C][=C][C][Branch2][Ring1][Branch1][O][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][=C][Branch1][C][N][C][Branch1][C][F][=C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Branch2][Ring1][Branch1][NH2+1][C][C][=C][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Ring2][Ring2][#Branch1]?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1."} {"text":"User: Can you generate the SMILES of the molecule with the SELFIES [C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=N][C][Branch1][#Branch2][C][S][C][N][=C][NH1][N][=Ring1][Branch1][=C][S][Ring1][N][C][=C][Ring2][Ring1][Ring1]?\nAssistant: Of course, this molecule has a SMILES of c1ccc(Nc2nc(CSc3nc[nH]n3)cs2)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_3-0.jsonl": "{"text":"The molecule with the SELFIES representation of [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1] can also be represented with the SMILES representation Cl.O=C(c1ccc(Br)cc1)C(CN1CCOCC1)c1ccccc1."} {"text":"The molecule with the SELFIES [C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2] can also be represented with the SMILES CCOc1ccccc1CNC(=O)Nc1ccc(OC)c(F)c1."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/test_2-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SELFIES from the SMILES.\nSMILES: CC1(C)CCC(CN2CCN(c3ccc(C(=O)NS(=O)(=O)c4ccc(NC5(CO)CCOCC5)c([N+](=O)[O-])c4)c(Oc4ccc5[nH]ccc5c4)c3)CC2)=C(c2ccc(Cl)cc2)C1\nConstraint: Even if you are not sure, you must answer with a 
representation without using any other words.\nResult: [C][C][Branch1][C][C][C][C][C][Branch2][#Branch1][Branch1][C][N][C][C][N][Branch2][=Branch1][Branch2][C][=C][C][=C][Branch2][Ring2][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=C][N][C][Branch1][Ring1][C][O][C][C][O][C][C][Ring1][Branch2][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring2][Ring1][C][C][Branch1][#C][O][C][=C][C][=C][NH1][C][=C][C][Ring1][Branch1][=C][Ring1][=Branch2][=C][Ring2][Ring2][Branch2][C][C][Ring2][Ring2][=C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring2][Branch1][=N]"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SELFIES from the SMILES.\nMolecule SMILES: CN(C)CCN1C(=O)c2oc3ccc(Cl)cc3c(=O)c2C1c1cccc(Br)c1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: [C][N][Branch1][C][C][C][C][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][C][=Ring1][N][C][Ring1][S][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1]"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_4-0.jsonl": "{"text":"The molecule with the SELFIES [C][C][N][C][=Branch1][C][=O][C][C][C][N][Branch1][=N][C][=Branch1][C][=O][C][=N][O][N][=C][Ring1][Branch1][N][C][C][Ring1][=C] can also be represented with the SMILES representation CCNC(=O)C1CCN(C(=O)c2nonc2N)CC1."} {"text":"The molecule with the SELFIES representation of [C][N][Branch1][C][C][C][=N][C][Branch1][C][N][=N][C][Branch2][Ring1][Ring1][C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=N][Ring2][Ring1][Ring1] can also be represented with the SMILES representation CN(C)c1nc(N)nc(COC(=O)c2ccc(Cl)cc2N)n1."}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_3-2.jsonl": "{"text":"Task: Please create a molecule representation based on the 
description.\nDescription: Create the SMILES from the SELFIES.\nSELFIES: [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: Cl.O=C(c1ccc(Br)cc1)C(CN1CCOCC1)c1ccccc1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the SELFIES.\nMolecule SELFIES: [C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2]\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CCOc1ccccc1CNC(=O)Nc1ccc(OC)c(F)c1"}", "/scratch/micpie/export/mol_repr_transl_smiles_selfies/train_3-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SELFIES from the SMILES.\nMolecule SMILES: Cl.O=C(c1ccc(Br)cc1)C(CN1CCOCC1)c1ccccc1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [Cl].[O][=C][Branch1][N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][C][Branch1][#Branch2][C][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SELFIES from the SMILES.\nMolecule SMILES: CCOc1ccccc1CNC(=O)Nc1ccc(OC)c(F)c1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: [C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][C][F][=C][Ring1][=Branch2]"}", 
"/scratch/micpie/export/nlmchem/test_0-5.jsonl": "{"text":"User: I'm looking for the abbreviation for: fine needle aspiration cytology\nAssistant: Yes, here you go: FNAC"} {"text":"User: I'm searching for the abbreviation for: Diffused reflectance infrared technique\nAssistant: Yes, here you go: DRIFT"}", "/scratch/micpie/export/nlmchem/test_0-1.jsonl": "{"text":"Task: Please give me the full form or meaning of the abbreviation.\nAbbreviation: FNAC\nConstraint: Answer the question with complete words.\nResult: fine needle aspiration cytology"} {"text":"Task: Please give me the full form or meaning of the abbreviation.\nAbbreviation: DRIFT\nConstraint: Answer the question with full words.\nResult: Diffused reflectance infrared technique"}", "/scratch/micpie/export/nlmchem/valid_0-0.jsonl": "{"text":"The abbreviation \"ALH\" stands for \"amplitude of lateral head displacement\"."} {"text":"The abbreviation \"SD\" stands for \"synthetic dropout\"."}", "/scratch/micpie/export/nlmchem/test_0-2.jsonl": "{"text":"Task: Please give me the abbreviation of the following full form or meaning.\nFull form or meaning of the abbreviation: fine needle aspiration cytology\nConstraint: Answer the question with an abbreviation.\nResult: FNAC"} {"text":"Task: Please give me the abbreviation of the following full form or meaning.\nFull form or meaning of the abbreviation: Diffused reflectance infrared technique\nConstraint: Answer the question with an abbreviation.\nResult: DRIFT"}", "/scratch/micpie/export/nlmchem/test_0-0.jsonl": "{"text":"The abbreviation \"FNAC\" stands for \"fine needle aspiration cytology\"."} {"text":"The abbreviation \"DRIFT\" stands for \"Diffused reflectance infrared technique\"."}", "/scratch/micpie/export/nlmchem/test_0-3.jsonl": "{"text":"User: Can you give me the abbreviation of the following full form or meaning: fine needle aspiration cytology\nAssistant: Yes, here you go: FNAC"} {"text":"User: Can you give me the abbreviation of the following full form 
or meaning: Diffused reflectance infrared technique\nAssistant: Yes, here you go: DRIFT"}", "/scratch/micpie/export/nlmchem/train_0-0.jsonl": "{"text":"The abbreviation \"1N\" stands for \"1-naphthol\"."} {"text":"The abbreviation \"SEM\" stands for \"scanning electron microscopy\"."}", "/scratch/micpie/export/nlmchem/train_0-3.jsonl": "{"text":"User: Can you give me the abbreviation of the following full form or meaning: 1-naphthol\nAssistant: Yes, I'm happy to help, here you go: 1N"} {"text":"User: Can you give me the abbreviation of the following full form or meaning: scanning electron microscopy\nAssistant: Sure, here you go: SEM"}", "/scratch/micpie/export/nlmchem/valid_0-2.jsonl": "{"text":"Task: Please give me the abbreviation of the following full form or meaning.\nFull form or meaning of the abbreviation: amplitude of lateral head displacement\nConstraint: Answer the question with an abbreviation.\nResult: ALH"} {"text":"Task: Please give me the abbreviation of the following full form or meaning.\nFull form or meaning of the abbreviation: synthetic dropout\nConstraint: Answer the question with an abbreviation.\nResult: SD"}", "/scratch/micpie/export/nlmchem/valid_0-1.jsonl": "{"text":"Task: Please give me the full form or meaning of the abbreviation.\nAbbreviation: ALH\nConstraint: Answer the question with complete words.\nResult: amplitude of lateral head displacement"} {"text":"Task: Please give me the full form or meaning of the abbreviation.\nAbbreviation: SD\nConstraint: Answer the question with complete words.\nResult: synthetic dropout"}", "/scratch/micpie/export/nlmchem/valid_0-5.jsonl": "{"text":"User: I'm looking for the abbreviation for: amplitude of lateral head displacement\nAssistant: Of course, here you go: ALH"} {"text":"User: I'm searching for the abbreviation for: synthetic dropout\nAssistant: Yes, here you go: SD"}", "/scratch/micpie/export/nlmchem/valid_0-4.jsonl": "{"text":"User: Can you give me the full form or meaning of the 
following abbreviation: ALH\nAssistant: Of course, here you go: amplitude of lateral head displacement"} {"text":"User: Can you give me the full form or meaning of the following abbreviation: SD\nAssistant: Of course, here you go: synthetic dropout"}", "/scratch/micpie/export/nlmchem/train_0-5.jsonl": "{"text":"User: I'm looking for the abbreviation for: 1-naphthol\nAssistant: Sure, here you go: 1N"} {"text":"User: I'm searching for the abbreviation for: scanning electron microscopy\nAssistant: Yes, I'm happy to help, here you go: SEM"}", "/scratch/micpie/export/nlmchem/train_0-2.jsonl": "{"text":"Task: Please give me the abbreviation of the following full form or meaning.\nFull form or meaning of the abbreviation: 1-naphthol\nConstraint: Answer the question with an abbreviation.\nResult: 1N"} {"text":"Task: Please give me the abbreviation of the following full form or meaning.\nFull form or meaning of the abbreviation: scanning electron microscopy\nConstraint: Answer the question with an abbreviation.\nResult: SEM"}", "/scratch/micpie/export/nlmchem/train_0-1.jsonl": "{"text":"Task: Please give me the full form or meaning of the abbreviation.\nAbbreviation: 1N\nConstraint: Answer the question with full words.\nResult: 1-naphthol"} {"text":"Task: Please give me the full form or meaning of the abbreviation.\nAbbreviation: SEM\nConstraint: Answer the question with complete words.\nResult: scanning electron microscopy"}", "/scratch/micpie/export/nlmchem/train_0-4.jsonl": "{"text":"User: Can you give me the full form or meaning of the following abbreviation: 1N\nAssistant: Of course, here you go: 1-naphthol"} {"text":"User: Can you give me the full form or meaning of the following abbreviation: SEM\nAssistant: Of course, here you go: scanning electron microscopy"}", "/scratch/micpie/export/nlmchem/valid_0-3.jsonl": "{"text":"User: Can you give me the abbreviation of the following full form or meaning: amplitude of lateral head displacement\nAssistant: Yes, I'm happy to 
help, here you go: ALH"} {"text":"User: Can you give me the abbreviation of the following full form or meaning: synthetic dropout\nAssistant: Of course, here you go: SD"}", "/scratch/micpie/export/nlmchem/test_0-4.jsonl": "{"text":"User: Can you give me the full form or meaning of the following abbreviation: FNAC\nAssistant: Yes, here you go: fine needle aspiration cytology"} {"text":"User: Can you give me the full form or meaning of the following abbreviation: DRIFT\nAssistant: Of course, here you go: Diffused reflectance infrared technique"}", "/scratch/micpie/export/compound_protein_hpo_disease_1/test_4-0.jsonl": "{"text":"The compound COc1cc(NC(=O)c2ccc(-c3ccc(Cl)cc3)o2)cc(OC)c1 targets the protein 5-HT-2A. The protein 5-HT-2A is associated with Obsessive-compulsive behavior. The Obsessive-compulsive behavior is associated with the disease Early infantile epileptic encephalopathy."} {"text":"The compound InChI=1S\/C17H22ClNO2S\/c1-2-3-4-5-6-13-19-22(20,21)17-12-8-9-14-15(17)10-7-11-16(14)18\/h7-12,19H,2-6,13H2,1H3 targets the protein Tyrosyl-DNA phosphodiesterase 1. The protein Tyrosyl-DNA phosphodiesterase 1 is associated with Abnormality of the spinocerebellar tracts. The Abnormality of the spinocerebellar tracts is associated with the disease obsolete_Machado-Joseph disease."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/valid_5-0.jsonl": "{"text":"The compound [C][C][O][C][=N][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Branch1][C][O][=C][Ring1][N][\/C][=N][\/C][=C][C][=C][C][=C][Ring1][=Branch1] targets the protein Tyr-DNA phosphodiesterase 1. The protein Tyr-DNA phosphodiesterase 1 is associated with Abnormality of the spinocerebellar tracts. The Abnormality of the spinocerebellar tracts is associated with the disease Adult-onset chronic progressive external ophthalmoplegia with mitochondrial myopathy."} {"text":"The compound Cl.Oc1ccc2c3c(ccc2c1)-c1cc(F)cc(F)c1O[C@@H]3c1ccc(OCCN2CCCCC2)cc1 targets the protein ER-alpha. 
The protein ER-alpha is associated with Somatic mutation. The Somatic mutation is associated with the disease Ewing sarcoma."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/test_9-0.jsonl": "{"text":"The compound [C][C@H1][Branch1][C][N][C@H1][Branch1][C][O][C][=C][C][=C][Branch1][C][O][C][Branch1][C][O][=C][Ring1][Branch2] targets the protein Tyrosyl-DNA phosphodiesterase 1. The protein Tyrosyl-DNA phosphodiesterase 1 is associated with Gaze-evoked nystagmus. The Gaze-evoked nystagmus is associated with the disease Autosomal recessive congenital cerebellar ataxia due to MGLUR1 deficiency."} {"text":"The compound [O][=C][Branch1][=C][N][C][=C][Branch1][C][F][C][=C][C][=C][Ring1][#Branch1][F][C][=C][C][=C][C][Branch2][Branch1][=Branch2][C][N][=C][C][=C][C][=C][N][Ring1][=Branch1][C][=Ring1][=Branch2][C][=C][C][=N][C][Branch2][Ring1][S][N][C][=C][C][=C][Branch2][Ring1][Ring1][N][C][C][C][Branch1][=Branch2][N][C][C][C][C][C][Ring1][=Branch1][C][C][Ring1][N][C][=C][Ring2][Ring1][C][=N][Ring2][Ring1][=Branch2][=C][Ring2][Ring2][Branch2] targets the protein Insulin-like growth factor 1 receptor (EC 2.7.10.1) (Insulin-like growth factor I receptor) (IGF-I receptor) (CD antigen CD221). The protein Insulin-like growth factor 1 receptor (EC 2.7.10.1) (Insulin-like growth factor I receptor) (IGF-I receptor) (CD antigen CD221) is associated with Abnormality of subcutaneous fat tissue. The Abnormality of subcutaneous fat tissue is associated with the disease Growth delay due to insulin-like growth factor I resistance."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/valid_3-0.jsonl": "{"text":"The compound CCcccccc6CncC=O)O))cCcccccC=O)O))c6)))))))c=O)cccccc6%10)))))))))))))OCO5 targets the protein ET-A. The protein ET-A is associated with Lower eyelid coloboma. The Lower eyelid coloboma is associated with the disease mandibulofacial dysostosis with alopecia."} {"text":"The compound C[C@H]N)Cnncccccocnc5c9%12 targets the protein 5-HT-2. 
The protein 5-HT-2 is associated with Obsessive-compulsive behavior. The Obsessive-compulsive behavior is associated with the disease Isolated CoQ-cytochrome C reductase deficiency."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/test_5-0.jsonl": "{"text":"The compound O=CO)cccc[N+]=O)[O-]))cc6 targets the protein Tyrosyl-DNA phosphodiesterase 1. The protein Tyrosyl-DNA phosphodiesterase 1 is associated with Abnormality of the spinocerebellar tracts. The Abnormality of the spinocerebellar tracts is associated with the disease Adult-onset chronic progressive external ophthalmoplegia with mitochondrial myopathy."} {"text":"The compound CC[C@H](c1ccc(O)cc1)[C@@H](C#N)c1ccc(O)cc1 targets the protein Nuclear receptor subfamily 3 group A member 1. The protein Nuclear receptor subfamily 3 group A member 1 is associated with Somatic mutation. The Somatic mutation is associated with the disease Ewing sarcoma."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/test_2-0.jsonl": "{"text":"The compound CCNCC)C)C)))C=O)cccccc6 targets the protein DAT. The protein DAT is associated with Oculogyric crisis. The Oculogyric crisis is associated with the disease Autosomal dominant non-syndromic intellectual disability."} {"text":"The compound COC=O)cccC)ccc6NC=O)csccc5S=O)=O)NconcC)c5Cl targets the protein hET-AR. The protein hET-AR is associated with Lower eyelid coloboma. The Lower eyelid coloboma is associated with the disease obsolete_Treacher-Collins syndrome."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/valid_0-0.jsonl": "{"text":"The compound [Br][C][=C][C][=C][Branch1][Branch2][C][N][=N][NH1][N][=Ring1][Branch1][C][=C][Ring1][O] targets the protein A-T mutated. The protein A-T mutated is associated with Defective B cell differentiation. 
The Defective B cell differentiation is associated with the disease GM00719."} {"text":"The compound CCC[C@@H]NCCOcccccc6\/C=C\\CNC=O)[C@@H]Ccccccc6)))))))NC=O)[C@@H]C)NC)C%22=O targets the protein Growth hormone secretagogue receptor type 1. The protein Growth hormone secretagogue receptor type 1 is associated with Ketosis. The Ketosis is associated with the disease Vitamin B12-unresponsive methylmalonic acidemia."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/test_7-0.jsonl": "{"text":"The compound CCOc1ccc(N)cc1 targets the protein Tyr-DNA phosphodiesterase 1. The protein Tyr-DNA phosphodiesterase 1 is associated with Cerebellar vermis atrophy. The Cerebellar vermis atrophy is associated with the disease cerebellar atrophy, developmental delay, and seizures."} {"text":"The compound [C][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][\/C][=Branch2][Ring2][Branch1][=N][\/C][=C][C][=C][Branch2][Ring1][Branch2][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][C][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][Ring1][C][=Branch1][C][=O][N][Ring2][Ring1][O] targets the protein Cyanamide hydratase CA2. The protein Cyanamide hydratase CA2 is associated with Elevated serum acid phosphatase. The Elevated serum acid phosphatase is associated with the disease Oculocerebrorenal syndrome."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/test_3-0.jsonl": "{"text":"The compound CccccNC=O)csccc5S=O)=O)NconcC)c5Cl)))))))))))))))cC=O)OCC)C))))c6 targets the protein Endothelin receptor type A. The protein Endothelin receptor type A is associated with Lower eyelid coloboma. The Lower eyelid coloboma is associated with the disease mandibulofacial dysostosis with alopecia."} {"text":"The compound N=C(N)NCCC[C@@H]1NC(=O)\/C(=C\/c2c[nH]c3cc(Br)ccc23)NC1=O targets the protein 5-HT-2A. The protein 5-HT-2A is associated with Obsessive-compulsive behavior. 
The Obsessive-compulsive behavior is associated with the disease Isolated CoQ-cytochrome C reductase deficiency."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/train_1-0.jsonl": "{"text":"The compound [C][C][Branch1][C][C][C][C@@H1][N][C][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][\/C][=C][\\C][N][C][=Branch1][C][=O][C@H1][Branch1][=N][C][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1][N][C][=Branch1][C][=O][C][Branch1][#Branch1][C][C][C][C][Ring1][Branch1][N][C][Ring2][Ring2][Ring2][=O] targets the protein Ghrelin receptor. The protein Ghrelin receptor is associated with Ketosis. The Ketosis is associated with the disease Classic maple syrup urine disease."} {"text":"The compound InChI=1S\/C16H18N2O\/c1-2-6-14(7-3-1)19-16-9-5-4-8-15(16)18-12-10-17-11-13-18\/h1-9,17H,10-13H2 targets the protein DA transporter. The protein DA transporter is associated with Oculogyric crisis. The Oculogyric crisis is associated with the disease hyperphenylalaninemia due to DNAJC12 deficiency."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/test_0-0.jsonl": "{"text":"The compound [C][C][Branch1][C][C][Branch1][C][C][N][N][=C][C][=Branch2][Ring1][S][=C][Ring1][Branch1][N][C][=Branch1][C][=O][C][C][C][=Branch1][C][=O][N][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][C][Ring1][=N][C][S][=Branch1][C][=O][=Branch1][C][=O][C][Ring2][Ring1][Branch2] targets the protein Serine-protein kinase ATM. The protein Serine-protein kinase ATM is associated with Defective B cell differentiation. The Defective B cell differentiation is associated with the disease GM00719."} {"text":"The compound COc1ccc(CN2C([C@@H](Cc3c[nH]c4ccccc34)NC(=O)C3CCNCC3)=NNC2CCc2ccccc2)cc1 targets the protein Growth hormone secretagogue receptor type 1. The protein Growth hormone secretagogue receptor type 1 is associated with Ketosis. 
The Ketosis is associated with the disease Vitamin B12-unresponsive methylmalonic acidemia."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/test_6-0.jsonl": "{"text":"The compound [O][C][=C][C][=C][Branch2][Ring1][#C][O][C][=C][Branch1][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][S][C][=C][C][Branch1][C][O][=C][C][=C][Ring1][P][Ring1][#Branch1][C][=C][Ring2][Ring1][Branch2] targets the protein Estradiol receptor. The protein Estradiol receptor is associated with Somatic mutation. The Somatic mutation is associated with the disease oligodendroglioma."} {"text":"The compound O=C(N\/N=C\/c1ccco1)c1ccc(C(=O)N\/N=C\/c2ccco2)o1 targets the protein Tyr-DNA phosphodiesterase 1. The protein Tyr-DNA phosphodiesterase 1 is associated with Cerebellar vermis atrophy. The Cerebellar vermis atrophy is associated with the disease Spinocerebellar ataxia type 8."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/train_2-0.jsonl": "{"text":"The compound [C][=C][C][=C][Branch2][Ring1][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C][C][N][C][C][Ring1][=Branch1][C][=C][Ring2][Ring1][Ring1] targets the protein DAT. The protein DAT is associated with Oculogyric crisis. The Oculogyric crisis is associated with the disease 6-pyruvoyl-tetrahydropterin synthase deficiency."} {"text":"The compound CCCc1nc(CC)c(C(=O)N(C)C)n1Cc1ccc(-c2ccccc2S(=O)(=O)Nc2onc(C)c2C)cc1 targets the protein Endothelin receptor type A. The protein Endothelin receptor type A is associated with Lower eyelid coloboma. The Lower eyelid coloboma is associated with the disease obsolete_Treacher-Collins syndrome."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/valid_2-0.jsonl": "{"text":"The compound CCCN1C2CCC1[C@@H](C(=O)OC)[C@@H](c1ccc(SC)cc1)C2 targets the protein Solute carrier family 6 member 3. The protein Solute carrier family 6 member 3 is associated with Oculogyric crisis. 
The Oculogyric crisis is associated with the disease Autosomal dominant non-syndromic intellectual disability."} {"text":"The compound CC(C)C[C@@H]1NC(=O)[C@@H](c2cccs2)NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](CC(=O)N2CCN(c3ccccc3)CC2)NC(=O)[C@@H](CC(=O)O)NC(=O)[C@@H](Cc2c[nH]c3ccccc23)NC1=O targets the protein Endothelin receptor type A. The protein Endothelin receptor type A is associated with Lower eyelid coloboma. The Lower eyelid coloboma is associated with the disease obsolete_Treacher-Collins syndrome."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/valid_4-0.jsonl": "{"text":"The compound [O][=C][Branch2][Ring2][=N][N][C][C][C][C][=C][C][=C][Branch2][Ring1][#Branch2][C][C][N][C][C][N][Branch1][=C][C][=N][S][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][C][C][Ring1][#C][C][=C][Ring2][Ring1][#Branch1][Ring2][Ring1][#Branch2][C][C][C][Ring1][Ring1] targets the protein 5-HT-2A. The protein 5-HT-2A is associated with Obsessive-compulsive behavior. The Obsessive-compulsive behavior is associated with the disease Early infantile epileptic encephalopathy."} {"text":"The compound Ccccsc5\/C=N\/NCCNCcccccc6-cccccc6%13)))))))))))))CC6 targets the protein Tyrosyl-DNA phosphodiesterase 1. The protein Tyrosyl-DNA phosphodiesterase 1 is associated with Abnormality of the spinocerebellar tracts. The Abnormality of the spinocerebellar tracts is associated with the disease obsolete_Machado-Joseph disease."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/train_0-0.jsonl": "{"text":"The compound COc1cc2c(cc1OC)CN(CC(=O)Nc1ccc(F)cc1)CC2 targets the protein A-T mutated. The protein A-T mutated is associated with Defective B cell differentiation. The Defective B cell differentiation is associated with the disease GM00719."} {"text":"The compound CCC)C[C@@H]NCCOcccccc6\/C=C\\CNC=O)[C@H]CcccccCl)c6)))))))NC=O)CCCCC5))))NC%22=O targets the protein GHRP. The protein GHRP is associated with Ketosis. 
The Ketosis is associated with the disease Malonic aciduria."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/valid_9-0.jsonl": "{"text":"The compound [C][C][O][C][=Branch1][C][=O][C][Branch1][C][C][S][C][=N][N][=C][Branch2][Ring1][C][C][N][C][=Branch1][C][=O][S][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][#Branch2][N][Ring1][S][C][=C][C][=C][C][=C][Ring1][=Branch1] targets the protein Tyr-DNA phosphodiesterase 1. The protein Tyr-DNA phosphodiesterase 1 is associated with Gaze-evoked nystagmus. The Gaze-evoked nystagmus is associated with the disease Episodic ataxia type 5."} {"text":"The compound C[C@@H](Nc1ncc(Br)c(Nc2cc(C3CC3)[nH]n2)n1)c1ccccc1 targets the protein Insulin-like growth factor 1 receptor (EC 2.7.10.1) (Insulin-like growth factor I receptor) (IGF-I receptor) (CD antigen CD221). The protein Insulin-like growth factor 1 receptor (EC 2.7.10.1) (Insulin-like growth factor I receptor) (IGF-I receptor) (CD antigen CD221) is associated with Abnormality of subcutaneous fat tissue. The Abnormality of subcutaneous fat tissue is associated with the disease Growth delay due to insulin-like growth factor I resistance."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/train_8-0.jsonl": "{"text":"The compound NS(=O)(=O)c1ccc(-c2cn(S(=O)(=O)c3ccccc3)c3ccccc23)cc1 targets the protein Carbonic anhydrase II. The protein Carbonic anhydrase II is associated with Elevated serum acid phosphatase. The Elevated serum acid phosphatase is associated with the disease Juvenile Paget disease."} {"text":"The compound Cc1noc(C)c1COC(=O)c1csc2c1CCCC2 targets the protein Tyrosyl-DNA phosphodiesterase 1. The protein Tyrosyl-DNA phosphodiesterase 1 is associated with Gaze-evoked nystagmus. 
The Gaze-evoked nystagmus is associated with the disease myopathy, congenital, progressive, with scoliosis."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/train_5-0.jsonl": "{"text":"The compound Cc1ccc(-c2nnc(SCC(=O)Nc3nccs3)n2N)cc1 targets the protein Tyrosyl-DNA phosphodiesterase 1. The protein Tyrosyl-DNA phosphodiesterase 1 is associated with Abnormality of the spinocerebellar tracts. The Abnormality of the spinocerebellar tracts is associated with the disease Gamma-glutamylcysteine synthetase deficiency."} {"text":"The compound C[C@@H]cccccO)c6F)))))O[C@@H]ccccOC[C@H]C)NCCCC5))))))))cc6))))))[C@H]6ccccO)cc6 targets the protein Estrogen receptor. The protein Estrogen receptor is associated with Somatic mutation. The Somatic mutation is associated with the disease ovarian carcinoma."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/train_9-0.jsonl": "{"text":"The compound Cc1cc(-c2csc(NC(=O)c3cccc(S(=O)(=O)N(C)C)c3)n2)c(C)n1-c1ccccc1 targets the protein Tyr-DNA phosphodiesterase 1. The protein Tyr-DNA phosphodiesterase 1 is associated with Gaze-evoked nystagmus. The Gaze-evoked nystagmus is associated with the disease Autosomal recessive congenital cerebellar ataxia due to MGLUR1 deficiency."} {"text":"The compound COcccccc6Ccncccc-cnn[C@H]CC[C@H]NCCNC=O)NC)C)))CC6))))))CC6))))))cncncN)c96)))))))))cc6[nH]9 targets the protein Insulin-like growth factor 1 receptor (EC 2.7.10.1) (Insulin-like growth factor I receptor) (IGF-I receptor) (CD antigen CD221). The protein Insulin-like growth factor 1 receptor (EC 2.7.10.1) (Insulin-like growth factor I receptor) (IGF-I receptor) (CD antigen CD221) is associated with Abnormality of subcutaneous fat tissue. 
The Abnormality of subcutaneous fat tissue is associated with the disease Growth delay due to insulin-like growth factor I resistance."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/test_1-0.jsonl": "{"text":"The compound CC[C@H](C)[C@@H]1NCCOc2ccccc2CCCNC(=O)[C@@H](Cc2ccc3ccccc3c2)NC(=O)[C@@H](C)N(C)C1=O targets the protein GHRP. The protein GHRP is associated with Ketosis. The Ketosis is associated with the disease Short stature due to GHSR deficiency."} {"text":"The compound [O][=C][\/C][=Branch2][Ring1][C][=C][\/C][=C][C][=C][C][=C][Ring1][=Branch1][N][C][C][N][C][C][Ring1][=Branch1][S][C][C][N][Ring2][Ring1][Ring1][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][F] targets the protein Sodium-dependent dopamine transporter. The protein Sodium-dependent dopamine transporter is associated with Oculogyric crisis. The Oculogyric crisis is associated with the disease Autosomal recessive dopa-responsive dystonia."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/valid_7-0.jsonl": "{"text":"The compound InChI=1S\/C22H23NO5\/c1-14(23-12-11-15-9-10-17(26-2)18(13-15)27-3)19-20(24)21(28-22(19)25)16-7-5-4-6-8-16\/h4-10,13,21,25H,11-12H2,1-3H3\/b23-14+ targets the protein Tyrosyl-DNA phosphodiesterase 1. The protein Tyrosyl-DNA phosphodiesterase 1 is associated with Cerebellar vermis atrophy. The Cerebellar vermis atrophy is associated with the disease cerebellar atrophy, developmental delay, and seizures."} {"text":"The compound InChI=1S\/C14H10BrN3O3S2\/c15-11-3-1-2-10-12(11)17-14(22)18(13(10)19)8-4-6-9(7-5-8)23(16,20)21\/h1-7H,(H,17,22)(H2,16,20,21) targets the protein Carbonate dehydratase II. The protein Carbonate dehydratase II is associated with Elevated serum acid phosphatase. 
The Elevated serum acid phosphatase is associated with the disease Albers-Schönberg osteopetrosis."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/valid_8-0.jsonl": "{"text":"The compound NS(=O)(=O)c1ccc(-n2c(S)nc3c(Br)cccc3c2=O)cc1 targets the protein Cyanamide hydratase CA2. The protein Cyanamide hydratase CA2 is associated with Elevated serum acid phosphatase. The Elevated serum acid phosphatase is associated with the disease Osteopetrosis with renal tubular acidosis."} {"text":"The compound CCOC=O)CC)ScnncCnc=O)scccccc69))))))))))n5-cccccc6 targets the protein Tyr-DNA phosphodiesterase 1. The protein Tyr-DNA phosphodiesterase 1 is associated with Gaze-evoked nystagmus. The Gaze-evoked nystagmus is associated with the disease Spinocerebellar ataxia with axonal neuropathy type 2."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/valid_1-0.jsonl": "{"text":"The compound CC(C)(N)C(=O)N[C@H](COCc1ccccc1)C(=O)N1CCC2(CC1)CS(=O)(=O)c1ccccc12 targets the protein Ghrelin receptor. The protein Ghrelin receptor is associated with Ketosis. The Ketosis is associated with the disease Short stature due to GHSR deficiency."} {"text":"The compound [C][C][S][C][=C][C][=C][Branch2][Ring1][=Branch2][C@H1][C][C][C][C][C][Branch1][#Branch2][C@H1][Ring1][#Branch1][C][=Branch1][C][=O][O][C][N][Ring1][#Branch2][C][C][C][F][C][=C][Ring2][Ring1][=Branch1] targets the protein Solute carrier family 6 member 3. The protein Solute carrier family 6 member 3 is associated with Oculogyric crisis. The Oculogyric crisis is associated with the disease Autosomal recessive dopa-responsive dystonia."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/train_6-0.jsonl": "{"text":"The compound C[C@@H]cccccO)c6F)))))O[C@@H]ccccOC[C@H]C)NCCCC5))))))))cc6))))))[C@H]6ccccO)cc6 targets the protein ER. The protein ER is associated with Somatic mutation. 
The Somatic mutation is associated with the disease prostate adenocarcinoma."} {"text":"The compound CCOC(=O)N1CCN(C(=O)CSc2ccc3nnc(-c4ccncc4)n3n2)CC1 targets the protein Tyr-DNA phosphodiesterase 1. The protein Tyr-DNA phosphodiesterase 1 is associated with Cerebellar vermis atrophy. The Cerebellar vermis atrophy is associated with the disease spinocerebellar ataxia 47."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/test_8-0.jsonl": "{"text":"The compound NS(=O)(=O)c1ccc(NC(=O)c2ccc(\/N=C3\\C(=O)Nc4ccc(Br)cc43)cc2)cc1 targets the protein Carbonic anhydrase 2. The protein Carbonic anhydrase 2 is associated with Elevated serum acid phosphatase. The Elevated serum acid phosphatase is associated with the disease Juvenile Paget disease."} {"text":"The compound cccccc6)ccn5CCNC%10)))))CCCC6 targets the protein Tyr-DNA phosphodiesterase 1. The protein Tyr-DNA phosphodiesterase 1 is associated with Gaze-evoked nystagmus. The Gaze-evoked nystagmus is associated with the disease myopathy, congenital, progressive, with scoliosis."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/valid_6-0.jsonl": "{"text":"The compound InChI=1S\/C25H20F3NO3\/c1-2-9-29-24(18-7-4-15(31)11-22(18)27)13-20(17-6-3-14(30)10-21(17)26)25(29)19-8-5-16(32)12-23(19)28\/h3-8,10-13,30-32H,2,9H2,1H3 targets the protein Estrogen receptor. The protein Estrogen receptor is associated with Somatic mutation. The Somatic mutation is associated with the disease oligodendroglioma."} {"text":"The compound InChI=1S\/C29H42N4O3\/c1-3-14-32-16-5-7-25-22-24(8-13-28(25)32)23-33(17-6-15-31-18-20-35-21-19-31)29(34)30-26-9-11-27(12-10-26)36-4-2\/h8-13,22H,3-7,14-21,23H2,1-2H3,(H,30,34) targets the protein Tyrosyl-DNA phosphodiesterase 1. The protein Tyrosyl-DNA phosphodiesterase 1 is associated with Cerebellar vermis atrophy. 
The Cerebellar vermis atrophy is associated with the disease Spinocerebellar ataxia type 8."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/train_3-0.jsonl": "{"text":"The compound CCCc1nc2c(n1Cc1ccc(-c3ccccc3S(=O)(=O)Nc3onc(C)c3C)c(C)c1)C(=O)CCCC2 targets the protein Endothelin receptor type A. The protein Endothelin receptor type A is associated with Lower eyelid coloboma. The Lower eyelid coloboma is associated with the disease mandibulofacial dysostosis with alopecia."} {"text":"The compound O=CNCCCccccCCNCCNcnscccccc96)))))))))CC6))))))))cc69))))))))))cccccc6 targets the protein 5-HT-2A. The protein 5-HT-2A is associated with Obsessive-compulsive behavior. The Obsessive-compulsive behavior is associated with the disease FRAXE intellectual disability."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/train_7-0.jsonl": "{"text":"The compound CCOC=O)NCCNC=O)CSccccnnc-cccncc6))))))n5n9))))))))))))CC6 targets the protein Tyr-DNA phosphodiesterase 1. The protein Tyr-DNA phosphodiesterase 1 is associated with Cerebellar vermis atrophy. The Cerebellar vermis atrophy is associated with the disease Diffuse cerebral and cerebellar atrophy-intractable seizures-progressive microcephaly syndrome."} {"text":"The compound NS(=O)(=O)c1ccc(-c2ccc(-c3ccc(S(N)(=O)=O)cc3)cc2)cc1 targets the protein CAC. The protein CAC is associated with Elevated serum acid phosphatase. The Elevated serum acid phosphatase is associated with the disease Oculocerebrorenal syndrome."}", "/scratch/micpie/export/compound_protein_hpo_disease_1/train_4-0.jsonl": "{"text":"The compound O=C(NC1CCc2ccc(CCN3CCN(c4nsc5ccccc45)CC3)cc21)c1ccccc1 targets the protein 5-hydroxytryptamine receptor 2A. The protein 5-hydroxytryptamine receptor 2A is associated with Obsessive-compulsive behavior. 
The Obsessive-compulsive behavior is associated with the disease Isolated CoQ-cytochrome C reductase deficiency."} {"text":"The compound InChI=1S\/C14H14N6OS2\/c1-9-2-4-10(5-3-9)12-18-19-14(20(12)15)23-8-11(21)17-13-16-6-7-22-13\/h2-7H,8,15H2,1H3,(H,16,17,21) targets the protein Tyr-DNA phosphodiesterase 1. The protein Tyr-DNA phosphodiesterase 1 is associated with Abnormality of the spinocerebellar tracts. The Abnormality of the spinocerebellar tracts is associated with the disease Adult-onset chronic progressive external ophthalmoplegia with mitochondrial myopathy."}", "/scratch/micpie/export/uniprot_binding_single/test_0-1.jsonl": "{"text":"Task: Create a chemical that binds to the given site in the peptide sequence.\nAA sequence: MRSVSGQVVCVTGAGGFIASWLVKILLEKGYTVRGTVRNPDDPKNGHLRELEGAKERLTLCKADLLDYQSLREAINGCDGVFHTASPVTDDPEQMVEPAVIGTKNVINAAAEANVRRVVFTSSIGAVYMDPNRDPETVVDETCWSDPDFCKNTKNWYCYGKMVAEQAAWEEAKEKGVDLVVINPVLVQGPLLQTTVNASVLHILKYLTGSAKTYANSVQAYVDVKDVALAHILLYETPEASGRYLCAESVLHRGDVVEILSKFFPEYPIPTKCSDVTKPRVKPYKFSNQKLKDLGLEFTPVKQCLYETVKSLQEKGHLPIPTQKDEPIIRIQP\nBinding position: 38\nOutput: O=CO)CC[C@H]NC=O)cccccccc6s9)))))))))))C=O)NCCOCC5=O"} {"text":"Task: Create a chemical that binds to the given site in the AA sequence.\nPeptide sequence: MDIKYNLAAAYKIMAYLSLDDHTYTHLSARSKNADFYYIYPFGLRFEEVTTENLLKVSLDGKILEGEEYQYNKTGYFIHGNIYKARPDVLAIFHYHTPASTAVSALKCGLLPISQWALHFYDRISYHDYNSLVLDADKQSTKFVNDLKQNYVMLLRNHGAITCGKTIHEAMFYTYHLEQACKTQCLLNSTIQQELIIPSVEICKKTVKDLLSFEEDLGKRDWEAWLRLINM\nBinding position: 158\nOutput: CCNC(=O)c1ncc2c(Nc3c(C)cccc3Cl)nc3c(OC)c(OC)ccc3n12"}", "/scratch/micpie/export/uniprot_binding_single/valid_0-0.jsonl": "{"text":"Task: Find a binding site for the compound in the amino acid sequence.\nPeptide sequence: 
MSLEQKKGADIISKILQIQNSIGKTTSPSTLKTKLSEISRKEQENARIQSKLSDLQKKKIDIDNKLLKEKQNLIKEEILERKKLEVLTKKQQKDEIEHQKKLKREIDAIKASTQYITDVSISSYNNTIPETEPEYDLFISHASEDKEDFVRPLAETLQQLGVNVWYDEFTLKVGDSLRQKIDSGLRNSKYGTVVLSTDFIKKDWTNYELDGLVAREMNGHKMILPIWHKITKNDVLDYSPNLADKVALNTSVNSIEEIAHQLADVILNR\nInChI representation: InChI=1S\/C22H31N3O2\/c1-27-20-7-3-2-6-19(20)24-15-12-23(13-16-24)14-17-25-11-10-22(18-21(25)26)8-4-5-9-22\/h2-3,6-7,10-11H,4-5,8-9,12-18H2,1H3\nResult: 143"} {"text":"Task: Identify a binding site for the compound in the AA sequence.\nAA sequence: MSIAIVTDSTSDLTPEHLAALGVTGVPLYVLFEGQLYQDGVQLSARQLVEGVRAGKAIPSTSQPSPAEFAQAYAQALEHADEVLSLHISGQLSGTVGSARLAAQEFGGRVTVVDTHTVTLGLGLQVLRAAELVRAGQSVPQIVQTLERVYPQADLRFTVDTLDFLRLNGRIGGASALLGGLLNIKPLLVVRGGRVDAGGRVRGQKKALADLAEHVRRYVSQHGGARVAFLATVGGEEDRAAVRAQLSDLHFQDMGDHEIGAVVTVHAGPGAVGVALEPLSA\ncanonical SMILES: CN(C)c1ncnc2c1ncn2Cc1cccc(Br)c1\nResult: 93"}", "/scratch/micpie/export/uniprot_binding_single/test_0-2.jsonl": "{"text":"Question: Can you find one binding site of the compound with the SELFIES [O][=C][Branch1][C][O][C][C][C@H1][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][S][Ring1][=Branch2][C][=Branch1][C][=O][N][C][C][O][C][C][Ring1][Branch1][=O] in this protein MRSVSGQVVCVTGAGGFIASWLVKILLEKGYTVRGTVRNPDDPKNGHLRELEGAKERLTLCKADLLDYQSLREAINGCDGVFHTASPVTDDPEQMVEPAVIGTKNVINAAAEANVRRVVFTSSIGAVYMDPNRDPETVVDETCWSDPDFCKNTKNWYCYGKMVAEQAAWEEAKEKGVDLVVINPVLVQGPLLQTTVNASVLHILKYLTGSAKTYANSVQAYVDVKDVALAHILLYETPEASGRYLCAESVLHRGDVVEILSKFFPEYPIPTKCSDVTKPRVKPYKFSNQKLKDLGLEFTPVKQCLYETVKSLQEKGHLPIPTQKDEPIIRIQP?\nAnswer: One site for the molecule is 38."} {"text":"Question: Can you find one binding site of the chemical with the SMILES CCNC(=O)c1ncc2c(Nc3c(C)cccc3Cl)nc3c(OC)c(OC)ccc3n12 in this AA sequence 
MDIKYNLAAAYKIMAYLSLDDHTYTHLSARSKNADFYYIYPFGLRFEEVTTENLLKVSLDGKILEGEEYQYNKTGYFIHGNIYKARPDVLAIFHYHTPASTAVSALKCGLLPISQWALHFYDRISYHDYNSLVLDADKQSTKFVNDLKQNYVMLLRNHGAITCGKTIHEAMFYTYHLEQACKTQCLLNSTIQQELIIPSVEICKKTVKDLLSFEEDLGKRDWEAWLRLINM?\nAnswer: One possible site for the compound is 158."}", "/scratch/micpie/export/uniprot_binding_single/test_0-0.jsonl": "{"text":"Task: Come up with a binding site for the chemical in the amino acid sequence.\nAA sequence: MRSVSGQVVCVTGAGGFIASWLVKILLEKGYTVRGTVRNPDDPKNGHLRELEGAKERLTLCKADLLDYQSLREAINGCDGVFHTASPVTDDPEQMVEPAVIGTKNVINAAAEANVRRVVFTSSIGAVYMDPNRDPETVVDETCWSDPDFCKNTKNWYCYGKMVAEQAAWEEAKEKGVDLVVINPVLVQGPLLQTTVNASVLHILKYLTGSAKTYANSVQAYVDVKDVALAHILLYETPEASGRYLCAESVLHRGDVVEILSKFFPEYPIPTKCSDVTKPRVKPYKFSNQKLKDLGLEFTPVKQCLYETVKSLQEKGHLPIPTQKDEPIIRIQP\ncanonical SMILES: O=C(O)CC[C@H](NC(=O)c1cc2ccccc2s1)C(=O)NC1COCC1=O\nOutput: 38"} {"text":"Task: Find a binding site for the molecule in the amino acid sequence.\nAA sequence: MDIKYNLAAAYKIMAYLSLDDHTYTHLSARSKNADFYYIYPFGLRFEEVTTENLLKVSLDGKILEGEEYQYNKTGYFIHGNIYKARPDVLAIFHYHTPASTAVSALKCGLLPISQWALHFYDRISYHDYNSLVLDADKQSTKFVNDLKQNYVMLLRNHGAITCGKTIHEAMFYTYHLEQACKTQCLLNSTIQQELIIPSVEICKKTVKDLLSFEEDLGKRDWEAWLRLINM\nInChI: InChI=1S\/C22H22ClN5O3\/c1-5-24-22(29)21-25-11-15-20(26-17-12(2)7-6-8-13(17)23)27-18-14(28(15)21)9-10-16(30-3)19(18)31-4\/h6-11H,5H2,1-4H3,(H,24,29)(H,26,27)\nResult: 158"}", "/scratch/micpie/export/uniprot_binding_single/test_0-3.jsonl": "{"text":"Question: What compound can possibly bind to the binding site at the position 38 in the protein sequence below?\nSequence: MRSVSGQVVCVTGAGGFIASWLVKILLEKGYTVRGTVRNPDDPKNGHLRELEGAKERLTLCKADLLDYQSLREAINGCDGVFHTASPVTDDPEQMVEPAVIGTKNVINAAAEANVRRVVFTSSIGAVYMDPNRDPETVVDETCWSDPDFCKNTKNWYCYGKMVAEQAAWEEAKEKGVDLVVINPVLVQGPLLQTTVNASVLHILKYLTGSAKTYANSVQAYVDVKDVALAHILLYETPEASGRYLCAESVLHRGDVVEILSKFFPEYPIPTKCSDVTKPRVKPYKFSNQKLKDLGLEFTPVKQCLYETVKSLQEKGHLPIPTQKDEPIIRIQP\nAnswer: 
[O][=C][Branch1][C][O][C][C][C@H1][Branch2][Ring1][Ring1][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][S][Ring1][=Branch2][C][=Branch1][C][=O][N][C][C][O][C][C][Ring1][Branch1][=O]"} {"text":"Question: What molecule can possibly bind to the site at 158 in the given AA sequence below?\nSequence: MDIKYNLAAAYKIMAYLSLDDHTYTHLSARSKNADFYYIYPFGLRFEEVTTENLLKVSLDGKILEGEEYQYNKTGYFIHGNIYKARPDVLAIFHYHTPASTAVSALKCGLLPISQWALHFYDRISYHDYNSLVLDADKQSTKFVNDLKQNYVMLLRNHGAITCGKTIHEAMFYTYHLEQACKTQCLLNSTIQQELIIPSVEICKKTVKDLLSFEEDLGKRDWEAWLRLINM\nAnswer: CCNC(=O)c1ncc2c(Nc3c(C)cccc3Cl)nc3c(OC)c(OC)ccc3n12"}", "/scratch/micpie/export/uniprot_binding_single/train_0-0.jsonl": "{"text":"Task: Find a binding site for the molecule in the peptide sequence.\nProtein: MRFQVIVAAATITMITSYIPGVASQSTSDGDDLFVPVSNFDPKSIFPEIKHPFEPMYANTENGKIVPTNSWISNLFYPSADNLAPTTPDPYTLRLLDGYGGNPGLTIRQPSAKVLGSYPPTNDVPYTDAGYMINSVVVDLRLTSSEWSDVVPDRQVTDWDHLSANLRLSTPQDSNSYIDFPIVRGMAYITANYNNLTPQFLSQHAIISVEADEKKSDDNTSTFSGRKFKITMNDDPTSTFIIYSLGDKPLELRKQDNSNLVASKPYTGVIRVAKLPAPEFETLLDASRAVWPTGGDISARSDDNNGASYTIKWKTNSNEAPLLTYAYAHHLTSIDDSNVKRTDMTLQSATKGPMTALVGNEWTLRETELSPVEWLPLQAAPNPTTINEIMTEINKDIASNYTQETAKEDNYFSGKGLQKFAMLALILNKSDQTQLRNPELAQIALDKLKAAFLPYLQNEQADPFRYDTLYKGIVAKAGLPTSMGGTDDLSAEFGHSYYSDHHYHQGYFVVTAAIIHHLDPTWNADRLKAWTEALIRDVNNANDGDEYFAAFRNWDWFAGHSWAGGIKPDGALDGRDQESVPESVNFYWGAKLWGLATGNTPLTKLASLQLAVTKRTTYEYFWMLDGNKNRPENIVRNKVIGIYFEQKTDYTTYFGRFLEYIHGIQQLPMTPELMEYIRTPEFVSQEWDEKLGAIAPTVQSPWAGVLYLNYAIINPAEAYPALRKVQMDDGQTRSYSLYLTATRPHFFRRSLLAALARHGSTRRPSLPSSGDDDKHEDGFLLRFRRLNPFNLKHRIY\nDeepSMILES representation: CSC=NC=O)CCCC)C)OccccBr)cc6%10)))))))))N5\nResult: 504"} {"text":"Task: Identify a binding site for the molecule in the protein.\nAA sequence: MSTNSYYSSASSSGFRVCPPGVPSKCWCGEEIITFTSKTKENPYRRFYRCAIAMKRENEEHLFKWVDEALLDEIKMVNEKCKRVVENISDLRMNVMANMELLNKNAKQMEEELIKKMEGELLTMKENVEELGHVMAKSALKTVGVAVVIVASIVWLWGRV\nSMILES: CCNC(=O)c1ncc2c(Nc3c(C)cccc3Cl)nc3c(OC)c(OC)ccc3n12\nOutput: 61"}", 
"/scratch/micpie/export/uniprot_binding_single/train_0-3.jsonl": "{"text":"Question: What molecule can bind to the site at the position 504 in the given amino acid sequence below?\nSequence: MRFQVIVAAATITMITSYIPGVASQSTSDGDDLFVPVSNFDPKSIFPEIKHPFEPMYANTENGKIVPTNSWISNLFYPSADNLAPTTPDPYTLRLLDGYGGNPGLTIRQPSAKVLGSYPPTNDVPYTDAGYMINSVVVDLRLTSSEWSDVVPDRQVTDWDHLSANLRLSTPQDSNSYIDFPIVRGMAYITANYNNLTPQFLSQHAIISVEADEKKSDDNTSTFSGRKFKITMNDDPTSTFIIYSLGDKPLELRKQDNSNLVASKPYTGVIRVAKLPAPEFETLLDASRAVWPTGGDISARSDDNNGASYTIKWKTNSNEAPLLTYAYAHHLTSIDDSNVKRTDMTLQSATKGPMTALVGNEWTLRETELSPVEWLPLQAAPNPTTINEIMTEINKDIASNYTQETAKEDNYFSGKGLQKFAMLALILNKSDQTQLRNPELAQIALDKLKAAFLPYLQNEQADPFRYDTLYKGIVAKAGLPTSMGGTDDLSAEFGHSYYSDHHYHQGYFVVTAAIIHHLDPTWNADRLKAWTEALIRDVNNANDGDEYFAAFRNWDWFAGHSWAGGIKPDGALDGRDQESVPESVNFYWGAKLWGLATGNTPLTKLASLQLAVTKRTTYEYFWMLDGNKNRPENIVRNKVIGIYFEQKTDYTTYFGRFLEYIHGIQQLPMTPELMEYIRTPEFVSQEWDEKLGAIAPTVQSPWAGVLYLNYAIINPAEAYPALRKVQMDDGQTRSYSLYLTATRPHFFRRSLLAALARHGSTRRPSLPSSGDDDKHEDGFLLRFRRLNPFNLKHRIY\nAnswer: [C][S][C][=N][C][=Branch1][C][=O][C][Branch2][Ring1][#Branch1][C][C][Branch1][C][C][Branch1][C][C][O][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][Ring1][=N][N][Ring2][Ring1][C]"} {"text":"Question: What compound can possibly bind to the binding site at 61 in the protein sequence below?\nSequence: MSTNSYYSSASSSGFRVCPPGVPSKCWCGEEIITFTSKTKENPYRRFYRCAIAMKRENEEHLFKWVDEALLDEIKMVNEKCKRVVENISDLRMNVMANMELLNKNAKQMEEELIKKMEGELLTMKENVEELGHVMAKSALKTVGVAVVIVASIVWLWGRV\nAnswer: CCNC(=O)c1ncc2c(Nc3c(C)cccc3Cl)nc3c(OC)c(OC)ccc3n12"}", "/scratch/micpie/export/uniprot_binding_single/valid_0-2.jsonl": "{"text":"Question: Can you find one binding site of the chemical with the canonical SMILES representation COc1ccccc1N1CCN(CCN2C=CC3(CCCC3)CC2=O)CC1 in this AA sequence 
MSLEQKKGADIISKILQIQNSIGKTTSPSTLKTKLSEISRKEQENARIQSKLSDLQKKKIDIDNKLLKEKQNLIKEEILERKKLEVLTKKQQKDEIEHQKKLKREIDAIKASTQYITDVSISSYNNTIPETEPEYDLFISHASEDKEDFVRPLAETLQQLGVNVWYDEFTLKVGDSLRQKIDSGLRNSKYGTVVLSTDFIKKDWTNYELDGLVAREMNGHKMILPIWHKITKNDVLDYSPNLADKVALNTSVNSIEEIAHQLADVILNR?\nAnswer: One binding site for the molecule is 143."} {"text":"Question: Can you give me one example of a binding site of the compound with the canonical SMILES representation CN(C)c1ncnc2c1ncn2Cc1cccc(Br)c1 in this peptide sequence MSIAIVTDSTSDLTPEHLAALGVTGVPLYVLFEGQLYQDGVQLSARQLVEGVRAGKAIPSTSQPSPAEFAQAYAQALEHADEVLSLHISGQLSGTVGSARLAAQEFGGRVTVVDTHTVTLGLGLQVLRAAELVRAGQSVPQIVQTLERVYPQADLRFTVDTLDFLRLNGRIGGASALLGGLLNIKPLLVVRGGRVDAGGRVRGQKKALADLAEHVRRYVSQHGGARVAFLATVGGEEDRAAVRAQLSDLHFQDMGDHEIGAVVTVHAGPGAVGVALEPLSA?\nAnswer: One site for the compound is 93."}", "/scratch/micpie/export/uniprot_binding_single/valid_0-1.jsonl": "{"text":"Task: Come up with a compound that binds to the given site in the peptide sequence.\nProtein: MSLEQKKGADIISKILQIQNSIGKTTSPSTLKTKLSEISRKEQENARIQSKLSDLQKKKIDIDNKLLKEKQNLIKEEILERKKLEVLTKKQQKDEIEHQKKLKREIDAIKASTQYITDVSISSYNNTIPETEPEYDLFISHASEDKEDFVRPLAETLQQLGVNVWYDEFTLKVGDSLRQKIDSGLRNSKYGTVVLSTDFIKKDWTNYELDGLVAREMNGHKMILPIWHKITKNDVLDYSPNLADKVALNTSVNSIEEIAHQLADVILNR\nBinding site: 143\nResult: COc1ccccc1N1CCN(CCN2C=CC3(CCCC3)CC2=O)CC1"} {"text":"Task: Come up with a molecule that binds to the given binding site in the protein.\nProtein: MSIAIVTDSTSDLTPEHLAALGVTGVPLYVLFEGQLYQDGVQLSARQLVEGVRAGKAIPSTSQPSPAEFAQAYAQALEHADEVLSLHISGQLSGTVGSARLAAQEFGGRVTVVDTHTVTLGLGLQVLRAAELVRAGQSVPQIVQTLERVYPQADLRFTVDTLDFLRLNGRIGGASALLGGLLNIKPLLVVRGGRVDAGGRVRGQKKALADLAEHVRRYVSQHGGARVAFLATVGGEEDRAAVRAQLSDLHFQDMGDHEIGAVVTVHAGPGAVGVALEPLSA\nBinding position: 93\nOutput: [C][N][Branch1][C][C][C][=N][C][=N][C][=C][Ring1][=Branch1][N][=C][N][Ring1][Branch1][C][C][=C][C][=C][C][Branch1][C][Br][=C][Ring1][#Branch1]"}", "/scratch/micpie/export/uniprot_binding_single/train_0-2.jsonl": "{"text":"Question: Can you 
find one binding site of the compound with the SMILES CSC1=NC(=O)C2(CC(C)(C)Oc3ccc(Br)cc32)N1 in this protein MRFQVIVAAATITMITSYIPGVASQSTSDGDDLFVPVSNFDPKSIFPEIKHPFEPMYANTENGKIVPTNSWISNLFYPSADNLAPTTPDPYTLRLLDGYGGNPGLTIRQPSAKVLGSYPPTNDVPYTDAGYMINSVVVDLRLTSSEWSDVVPDRQVTDWDHLSANLRLSTPQDSNSYIDFPIVRGMAYITANYNNLTPQFLSQHAIISVEADEKKSDDNTSTFSGRKFKITMNDDPTSTFIIYSLGDKPLELRKQDNSNLVASKPYTGVIRVAKLPAPEFETLLDASRAVWPTGGDISARSDDNNGASYTIKWKTNSNEAPLLTYAYAHHLTSIDDSNVKRTDMTLQSATKGPMTALVGNEWTLRETELSPVEWLPLQAAPNPTTINEIMTEINKDIASNYTQETAKEDNYFSGKGLQKFAMLALILNKSDQTQLRNPELAQIALDKLKAAFLPYLQNEQADPFRYDTLYKGIVAKAGLPTSMGGTDDLSAEFGHSYYSDHHYHQGYFVVTAAIIHHLDPTWNADRLKAWTEALIRDVNNANDGDEYFAAFRNWDWFAGHSWAGGIKPDGALDGRDQESVPESVNFYWGAKLWGLATGNTPLTKLASLQLAVTKRTTYEYFWMLDGNKNRPENIVRNKVIGIYFEQKTDYTTYFGRFLEYIHGIQQLPMTPELMEYIRTPEFVSQEWDEKLGAIAPTVQSPWAGVLYLNYAIINPAEAYPALRKVQMDDGQTRSYSLYLTATRPHFFRRSLLAALARHGSTRRPSLPSSGDDDKHEDGFLLRFRRLNPFNLKHRIY?\nAnswer: One site for the compound is 504."} {"text":"Question: Can you give me one example of a binding site of the chemical with the DeepSMILES CCNC=O)cncccNccC)cccc6Cl))))))))nccOC))cOC))ccc6n%13%10 in this peptide sequence MSTNSYYSSASSSGFRVCPPGVPSKCWCGEEIITFTSKTKENPYRRFYRCAIAMKRENEEHLFKWVDEALLDEIKMVNEKCKRVVENISDLRMNVMANMELLNKNAKQMEEELIKKMEGELLTMKENVEELGHVMAKSALKTVGVAVVIVASIVWLWGRV?\nAnswer: One binding site for the molecule is 61."}", "/scratch/micpie/export/uniprot_binding_single/train_0-1.jsonl": "{"text":"Task: Come up with a chemical that binds to the given in the AA sequence.\nAA sequence: 
MRFQVIVAAATITMITSYIPGVASQSTSDGDDLFVPVSNFDPKSIFPEIKHPFEPMYANTENGKIVPTNSWISNLFYPSADNLAPTTPDPYTLRLLDGYGGNPGLTIRQPSAKVLGSYPPTNDVPYTDAGYMINSVVVDLRLTSSEWSDVVPDRQVTDWDHLSANLRLSTPQDSNSYIDFPIVRGMAYITANYNNLTPQFLSQHAIISVEADEKKSDDNTSTFSGRKFKITMNDDPTSTFIIYSLGDKPLELRKQDNSNLVASKPYTGVIRVAKLPAPEFETLLDASRAVWPTGGDISARSDDNNGASYTIKWKTNSNEAPLLTYAYAHHLTSIDDSNVKRTDMTLQSATKGPMTALVGNEWTLRETELSPVEWLPLQAAPNPTTINEIMTEINKDIASNYTQETAKEDNYFSGKGLQKFAMLALILNKSDQTQLRNPELAQIALDKLKAAFLPYLQNEQADPFRYDTLYKGIVAKAGLPTSMGGTDDLSAEFGHSYYSDHHYHQGYFVVTAAIIHHLDPTWNADRLKAWTEALIRDVNNANDGDEYFAAFRNWDWFAGHSWAGGIKPDGALDGRDQESVPESVNFYWGAKLWGLATGNTPLTKLASLQLAVTKRTTYEYFWMLDGNKNRPENIVRNKVIGIYFEQKTDYTTYFGRFLEYIHGIQQLPMTPELMEYIRTPEFVSQEWDEKLGAIAPTVQSPWAGVLYLNYAIINPAEAYPALRKVQMDDGQTRSYSLYLTATRPHFFRRSLLAALARHGSTRRPSLPSSGDDDKHEDGFLLRFRRLNPFNLKHRIY\nBinding position: 504\nOutput: InChI=1S\/C14H15BrN2O2S\/c1-13(2)7-14(11(18)16-12(17-14)20-3)9-6-8(15)4-5-10(9)19-13\/h4-6H,7H2,1-3H3,(H,16,17,18)"} {"text":"Task: Create a chemical that binds to the given in the protein.\nAmino acid sequence: MSTNSYYSSASSSGFRVCPPGVPSKCWCGEEIITFTSKTKENPYRRFYRCAIAMKRENEEHLFKWVDEALLDEIKMVNEKCKRVVENISDLRMNVMANMELLNKNAKQMEEELIKKMEGELLTMKENVEELGHVMAKSALKTVGVAVVIVASIVWLWGRV\nBinding site: 61\nResult: CCNC(=O)c1ncc2c(Nc3c(C)cccc3Cl)nc3c(OC)c(OC)ccc3n12"}", "/scratch/micpie/export/uniprot_binding_single/valid_0-3.jsonl": "{"text":"Question: What compound can possibly bind to the site at the position 143 in the AA sequence below?\nSequence: MSLEQKKGADIISKILQIQNSIGKTTSPSTLKTKLSEISRKEQENARIQSKLSDLQKKKIDIDNKLLKEKQNLIKEEILERKKLEVLTKKQQKDEIEHQKKLKREIDAIKASTQYITDVSISSYNNTIPETEPEYDLFISHASEDKEDFVRPLAETLQQLGVNVWYDEFTLKVGDSLRQKIDSGLRNSKYGTVVLSTDFIKKDWTNYELDGLVAREMNGHKMILPIWHKITKNDVLDYSPNLADKVALNTSVNSIEEIAHQLADVILNR\nAnswer: COcccccc6NCCNCCNC=CCCCCC5))))CC6=O)))))))))CC6"} {"text":"Question: What molecule can bind to the binding site at 93 in the given protein sequence?\nSequence: 
MSIAIVTDSTSDLTPEHLAALGVTGVPLYVLFEGQLYQDGVQLSARQLVEGVRAGKAIPSTSQPSPAEFAQAYAQALEHADEVLSLHISGQLSGTVGSARLAAQEFGGRVTVVDTHTVTLGLGLQVLRAAELVRAGQSVPQIVQTLERVYPQADLRFTVDTLDFLRLNGRIGGASALLGGLLNIKPLLVVRGGRVDAGGRVRGQKKALADLAEHVRRYVSQHGGARVAFLATVGGEEDRAAVRAQLSDLHFQDMGDHEIGAVVTVHAGPGAVGVALEPLSA\nAnswer: CN(C)c1ncnc2c1ncn2Cc1cccc(Br)c1"}", "/scratch/micpie/export/compound_protein_hpo_disease_2/test_2-0.jsonl": "{"text":"The compound CCCCCOC=O)NCCNC=O)[C@H]CCC=O)O))))NC=O)cccSCCO))))nc-cccccc6))))))n6))))))))))CC6 targets the protein SP1999. The protein SP1999 is associated with Impaired ADP-induced platelet aggregation. The Impaired ADP-induced platelet aggregation is associated with the disease P2Y12 defect."} {"text":"The compound C[C@H](NC(=O)OCc1ccccc1)C(=O)NCC(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC1CCCCC1)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H](CCC(=O)O)C(N)=O targets the protein Protein APC. The protein Protein APC is associated with Osteoma. The Osteoma is associated with the disease Turcot syndrome with polyposis."}", "/scratch/micpie/export/compound_protein_hpo_disease_2/valid_0-0.jsonl": "{"text":"The compound CCn1c(Sc2ccc(Nc3c(C#N)cnc4cc(OCCCN(C)CC(O)CO)c(OC)cc34)cc2Cl)nc(C)c1C targets the protein Insulin-like growth factor 1 receptor (EC 2.7.10.1) (Insulin-like growth factor I receptor) (IGF-I receptor) (CD antigen CD221). The protein Insulin-like growth factor 1 receptor (EC 2.7.10.1) (Insulin-like growth factor I receptor) (IGF-I receptor) (CD antigen CD221) is associated with Abnormality of subcutaneous fat tissue. The Abnormality of subcutaneous fat tissue is associated with the disease Familial partial lipodystrophy associated with PLIN1 mutations."} {"text":"The compound COcccccccc=O)n-cccccc6))))))nc-5oc%139 targets the protein ER-beta. The protein ER-beta is associated with Elevated circulating luteinizing hormone level. 
The Elevated circulating luteinizing hormone level is associated with the disease spermatogenic failure 28."}", "/scratch/micpie/export/compound_protein_hpo_disease_2/train_1-0.jsonl": "{"text":"The compound O=C(Cc1cccs1)Nc1nnc(Cc2ccccc2)s1 targets the protein ER-beta. The protein ER-beta is associated with Elevated circulating luteinizing hormone level. The Elevated circulating luteinizing hormone level is associated with the disease 46,XX gonadal dysgenesis."} {"text":"The compound CCCCCOC=O)NCCNC=O)[C@H]CCC=O)O))))NC=O)cccNCCC=O)NC)))C4))))cc-cccccc6))))))n6))))))))))CC6 targets the protein ADP-glucose receptor. The protein ADP-glucose receptor is associated with Impaired ADP-induced platelet aggregation. The Impaired ADP-induced platelet aggregation is associated with the disease Hermansky-Pudlak syndrome."}", "/scratch/micpie/export/compound_protein_hpo_disease_2/test_0-0.jsonl": "{"text":"The compound Cc1nc(Cc2nc3cc(-c4nn([C@H]5CC[C@H](N6CCOCC6)CC5)c5ncnc(N)c45)ccc3[nH]2)cs1 targets the protein Insulin-like growth factor 1 receptor (EC 2.7.10.1) (Insulin-like growth factor I receptor) (IGF-I receptor) (CD antigen CD221). The protein Insulin-like growth factor 1 receptor (EC 2.7.10.1) (Insulin-like growth factor I receptor) (IGF-I receptor) (CD antigen CD221) is associated with Abnormality of subcutaneous fat tissue. The Abnormality of subcutaneous fat tissue is associated with the disease Familial partial lipodystrophy associated with PLIN1 mutations."} {"text":"The compound O=C1NCCCC2c3cc(O)c4ccccc4c3OC12N1CCCCC1 targets the protein ER-beta. The protein ER-beta is associated with Elevated circulating luteinizing hormone level. The Elevated circulating luteinizing hormone level is associated with the disease gonadal dysgenesis."}", "/scratch/micpie/export/compound_protein_hpo_disease_2/train_2-0.jsonl": "{"text":"The compound CCCCCOC(=O)N1CCN(C(=O)[C@H](CCC(=O)O)NC(=O)c2cc(N3CC(C(=O)NC)C3)cc(-c3ccccc3)n2)CC1 targets the protein P2Y12. 
The protein P2Y12 is associated with Impaired ADP-induced platelet aggregation. The Impaired ADP-induced platelet aggregation is associated with the disease Hermansky-Pudlak syndrome with neutropenia."} {"text":"The compound C[C@H]NC=O)OCcccccc6))))))))))C=O)NCC=O)N[C@@H]CCC=O)O))))C=O)N[C@@H]CO))C=O)N[C@@H]Ccccccccccc%106)))))))))))C=O)N[C@@H]CccccO)cc6)))))))C=O)N[C@@H]CCC=O)O))))CN)=O targets the protein Deleted in polyposis 2.5. The protein Deleted in polyposis 2.5 is associated with Osteoma. The Osteoma is associated with the disease Turcot syndrome with polyposis."}", "/scratch/micpie/export/compound_protein_hpo_disease_2/valid_2-0.jsonl": "{"text":"The compound CCCCCOC(=O)N1CCN(C(=O)[C@H](CCC(=O)O)NC(=O)c2cc(N3CCCC(C(N)=O)C3)cc(-c3ccccc3)n2)CC1 targets the protein ADP-glucose receptor. The protein ADP-glucose receptor is associated with Impaired ADP-induced platelet aggregation. The Impaired ADP-induced platelet aggregation is associated with the disease P2Y12 defect."} {"text":"The compound C[C@H](NC(=O)OCc1ccccc1)C(=O)NCC(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CO)C(=O)N[C@@H](Cc1ccc2ccccc2c1)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H](CCC(=O)O)C(N)=O targets the protein Deleted in polyposis 2.5. The protein Deleted in polyposis 2.5 is associated with Osteoma. The Osteoma is associated with the disease Turcot syndrome with polyposis."}", "/scratch/micpie/export/compound_protein_hpo_disease_2/train_0-0.jsonl": "{"text":"The compound CCNC=O)NCCNCCCOcccnccC#N))cNccccScncC)cC)n5CC))))))))cCl)c6)))))))c6cc%10OC))))))))))))))))CC6 targets the protein Insulin-like growth factor 1 receptor (EC 2.7.10.1) (Insulin-like growth factor I receptor) (IGF-I receptor) (CD antigen CD221). The protein Insulin-like growth factor 1 receptor (EC 2.7.10.1) (Insulin-like growth factor I receptor) (IGF-I receptor) (CD antigen CD221) is associated with Abnormality of subcutaneous fat tissue. 
The Abnormality of subcutaneous fat tissue is associated with the disease Familial partial lipodystrophy associated with PLIN1 mutations."} {"text":"The compound [C][N][C][C][N][Branch2][Ring2][Branch2][C][=C][Branch1][C][Cl][C][=Branch1][C][=O][C][Branch2][Ring1][C][C][=C][N][=C][Branch1][=Branch2][N][C][C][C][C][C][Ring1][=Branch1][S][Ring1][O][=C][Branch1][C][Cl][C][Ring2][Ring1][Ring2][=O][C][C][Ring2][Ring1][O] targets the protein Nuclear receptor subfamily 3 group A member 2. The protein Nuclear receptor subfamily 3 group A member 2 is associated with Elevated circulating luteinizing hormone level. The Elevated circulating luteinizing hormone level is associated with the disease spermatogenic failure 28."}", "/scratch/micpie/export/compound_protein_hpo_disease_2/test_1-0.jsonl": "{"text":"The compound O=C1NCCCC2c3cc(O)c4ccccc4c3OC12N1CCCCC1 targets the protein Nuclear receptor subfamily 3 group A member 2. The protein Nuclear receptor subfamily 3 group A member 2 is associated with Elevated circulating luteinizing hormone level. The Elevated circulating luteinizing hormone level is associated with the disease spermatogenic failure 28."} {"text":"The compound CCCCCOC=O)NCCNC=O)[C@H]CCC=O)O))))NC=O)cccNCC[C@H]C=O)NCCCC5))))))C5)))))cc-cccccc6))))))n6))))))))))CC6 targets the protein SP1999. The protein SP1999 is associated with Impaired ADP-induced platelet aggregation. The Impaired ADP-induced platelet aggregation is associated with the disease Hermansky-Pudlak syndrome with neutropenia."}", "/scratch/micpie/export/compound_protein_hpo_disease_2/valid_1-0.jsonl": "{"text":"The compound O=cccCScnnc-ccccs5)))))n5Ccccccc6))))))))))))))ncscccccc6n%139 targets the protein Nuclear receptor subfamily 3 group A member 2. The protein Nuclear receptor subfamily 3 group A member 2 is associated with Elevated circulating luteinizing hormone level. 
The Elevated circulating luteinizing hormone level is associated with the disease 46,XX gonadal dysgenesis."} {"text":"The compound CCCCCCOC(=O)N1CCN(C(=O)[C@H](CCC(=O)O)NC(=O)c2cc(N3CCC(N)CC3)cc(-c3ccccc3)n2)CC1 targets the protein P2Y(ADP. The protein P2Y(ADP is associated with Impaired ADP-induced platelet aggregation. The Impaired ADP-induced platelet aggregation is associated with the disease Hermansky-Pudlak syndrome with neutropenia."}", "/scratch/micpie/export/uspto_yield/test_0-1.jsonl": "{"text":"User: I would like to run a reaction with the RXNSMILES CC(=O)O.COc1ccc2c(c1)CCC(=O)C2CC(=O)c1ccccc1.Nc1ccc(O)c(C(=O)O)c1>>COc1ccc2c(c1)CCc1c-2cc(-c2ccccc2)n1-c1ccc(O)c(C(=O)O)c1. What is the yield I should expect?\nAssistant: The yield is 66\\%."} {"text":"User: I would like to run a reaction with the reaction SMILES string CC#N.CCc1cncc(Nc2cc(COc3ccc(NC(=O)OC(C)(C)C)c4ccccc34)ccn2)n1.N.O.O=S(=O)(O)O>>CCc1cncc(Nc2cc(COc3ccc(N)c4ccccc34)ccn2)n1. What is the reaction yield I can get?\nAssistant: The reaction yield is 75\\%."}", "/scratch/micpie/export/uspto_yield/valid_0-0.jsonl": "{"text":"The yield of a reaction with the reaction SMILES CC(C)N.CO.O=C1CCC(c2ccc(OCC3CO3)cc2)=NN1>>CC(C)NCC(O)COc1ccc(C2=NNC(=O)CC2)cc1 is 100\\%."} {"text":"The yield of a reaction with the RXNSMILES COC(=O)CCn1c(=O)c2c(CBr)c(-c3ccc(Cl)cc3)sc2n(C)c1=O.Cc1ccccc1.O=C([O-])[O-].OB(O)c1ccc(Cl)cc1.[Cs+]>>COC(=O)CCn1c(=O)c2c(Cc3ccc(Cl)cc3)c(-c3ccc(Cl)cc3)sc2n(C)c1=O is 40\\%."}", "/scratch/micpie/export/uspto_yield/test_0-2.jsonl": "{"text":"Question: What is the yield of a reaction with the reaction SMILES string CC(=O)O.COc1ccc2c(c1)CCC(=O)C2CC(=O)c1ccccc1.Nc1ccc(O)c(C(=O)O)c1>>COc1ccc2c(c1)CCc1c-2cc(-c2ccccc2)n1-c1ccc(O)c(C(=O)O)c1?\nAnswer: 66\\%."} {"text":"Question: What is the yield of a reaction with the reaction SMILES (RXNSMILES) CC#N.CCc1cncc(Nc2cc(COc3ccc(NC(=O)OC(C)(C)C)c4ccccc34)ccn2)n1.N.O.O=S(=O)(O)O>>CCc1cncc(Nc2cc(COc3ccc(N)c4ccccc34)ccn2)n1?\nAnswer: 75\\%."}", 
"/scratch/micpie/export/uspto_yield/test_0-0.jsonl": "{"text":"The yield of a reaction with the reaction SMILES string CC(=O)O.COc1ccc2c(c1)CCC(=O)C2CC(=O)c1ccccc1.Nc1ccc(O)c(C(=O)O)c1>>COc1ccc2c(c1)CCc1c-2cc(-c2ccccc2)n1-c1ccc(O)c(C(=O)O)c1 is 66\\%."} {"text":"The yield of a reaction with the reaction SMILES (RXNSMILES) CC#N.CCc1cncc(Nc2cc(COc3ccc(NC(=O)OC(C)(C)C)c4ccccc34)ccn2)n1.N.O.O=S(=O)(O)O>>CCc1cncc(Nc2cc(COc3ccc(N)c4ccccc34)ccn2)n1 is 75\\%."}", "/scratch/micpie/export/uspto_yield/test_0-3.jsonl": "{"text":"The yield of a reaction of CC(=O)O, COc1ccc2c(c1)CCC(=O)C2CC(=O)c1ccccc1, and Nc1ccc(O)c(C(=O)O)c1 to COc1ccc2c(c1)CCc1c-2cc(-c2ccccc2)n1-c1ccc(O)c(C(=O)O)c1 is 66\\%."} {"text":"The reaction yield of a reaction of CC#N, CCc1cncc(Nc2cc(COc3ccc(NC(=O)OC(C)(C)C)c4ccccc34)ccn2)n1, N, O, and O=S(=O)(O)O to CCc1cncc(Nc2cc(COc3ccc(N)c4ccccc34)ccn2)n1 is 75\\%."}", "/scratch/micpie/export/uspto_yield/train_0-0.jsonl": "{"text":"The reaction yield of a reaction with the RXNSMILES CC(=O)c1ccco1.CC(C)=O.Cl.Nc1ccc(Cl)c(Cl)c1.O.O=N[O-].[Na+]>>CC(=O)c1ccc(-c2ccc(Cl)c(Cl)c2)o1 is 35\\%."} {"text":"The yield of a reaction with the reaction SMILES string CC(OCc1ccccc1)c1nn2c(C3=CCOCC3)ncc2c(=O)[nH]1.CO.[OH-].[Pd+2]>>CC(O)c1nn2c(C3CCOCC3)ncc2c(=O)[nH]1 is 80\\%."}", "/scratch/micpie/export/uspto_yield/train_0-3.jsonl": "{"text":"The yield of a reaction of CC(=O)c1ccco1, CC(C)=O, Cl, Nc1ccc(Cl)c(Cl)c1, O, O=N[O-], and [Na+] to CC(=O)c1ccc(-c2ccc(Cl)c(Cl)c2)o1 is 35\\%."} {"text":"The reaction yield of a reaction of CC(OCc1ccccc1)c1nn2c(C3=CCOCC3)ncc2c(=O)[nH]1, CO, [OH-], and [Pd+2] to CC(O)c1nn2c(C3CCOCC3)ncc2c(=O)[nH]1 is 80\\%."}", "/scratch/micpie/export/uspto_yield/valid_0-2.jsonl": "{"text":"Question: What is the yield of a reaction with the reaction SMILES (RXNSMILES) CC(C)N.CO.O=C1CCC(c2ccc(OCC3CO3)cc2)=NN1>>CC(C)NCC(O)COc1ccc(C2=NNC(=O)CC2)cc1?\nAnswer: 100\\%."} {"text":"Question: What is reaction yield of a reaction with the reaction SMILES (RXNSMILES) 
COC(=O)CCn1c(=O)c2c(CBr)c(-c3ccc(Cl)cc3)sc2n(C)c1=O.Cc1ccccc1.O=C([O-])[O-].OB(O)c1ccc(Cl)cc1.[Cs+]>>COC(=O)CCn1c(=O)c2c(Cc3ccc(Cl)cc3)c(-c3ccc(Cl)cc3)sc2n(C)c1=O?\nAnswer: 40\\%."}", "/scratch/micpie/export/uspto_yield/valid_0-1.jsonl": "{"text":"User: I need to run a reaction with the RXNSMILES CC(C)N.CO.O=C1CCC(c2ccc(OCC3CO3)cc2)=NN1>>CC(C)NCC(O)COc1ccc(C2=NNC(=O)CC2)cc1. What is the reaction yield I should expect?\nAssistant: The expected reaction yield is 100\\%."} {"text":"User: I need to run a reaction with the reaction SMILES COC(=O)CCn1c(=O)c2c(CBr)c(-c3ccc(Cl)cc3)sc2n(C)c1=O.Cc1ccccc1.O=C([O-])[O-].OB(O)c1ccc(Cl)cc1.[Cs+]>>COC(=O)CCn1c(=O)c2c(Cc3ccc(Cl)cc3)c(-c3ccc(Cl)cc3)sc2n(C)c1=O. What is the yield I should expect?\nAssistant: The expected yield is 40\\%."}", "/scratch/micpie/export/uspto_yield/valid_0-4.jsonl": "{"text":"Question: What's the reaction yield of a reaction of CC(C)N, CO, and O=C1CCC(c2ccc(OCC3CO3)cc2)=NN1 to CC(C)NCC(O)COc1ccc(C2=NNC(=O)CC2)cc1?\nAnswer: 100\\%."} {"text":"Question: What's the reaction yield of a reaction of COC(=O)CCn1c(=O)c2c(CBr)c(-c3ccc(Cl)cc3)sc2n(C)c1=O, Cc1ccccc1, O=C([O-])[O-], OB(O)c1ccc(Cl)cc1, and [Cs+] to COC(=O)CCn1c(=O)c2c(Cc3ccc(Cl)cc3)c(-c3ccc(Cl)cc3)sc2n(C)c1=O?\nAnswer: 40\\%."}", "/scratch/micpie/export/uspto_yield/train_0-2.jsonl": "{"text":"Question: What is the yield of a reaction with the reaction SMILES CC(=O)c1ccco1.CC(C)=O.Cl.Nc1ccc(Cl)c(Cl)c1.O.O=N[O-].[Na+]>>CC(=O)c1ccc(-c2ccc(Cl)c(Cl)c2)o1?\nAnswer: 35\\%."} {"text":"Question: What is the reaction yield of a reaction with the reaction SMILES string CC(OCc1ccccc1)c1nn2c(C3=CCOCC3)ncc2c(=O)[nH]1.CO.[OH-].[Pd+2]>>CC(O)c1nn2c(C3CCOCC3)ncc2c(=O)[nH]1?\nAnswer: 80\\%."}", "/scratch/micpie/export/uspto_yield/train_0-1.jsonl": "{"text":"User: I would like to run a reaction with the reaction SMILES (RXNSMILES) CC(=O)c1ccco1.CC(C)=O.Cl.Nc1ccc(Cl)c(Cl)c1.O.O=N[O-].[Na+]>>CC(=O)c1ccc(-c2ccc(Cl)c(Cl)c2)o1. 
What is the yield I can get?\nAssistant: The expected yield is 35\\%."} {"text":"User: I need to run a reaction with the reaction SMILES (RXNSMILES) CC(OCc1ccccc1)c1nn2c(C3=CCOCC3)ncc2c(=O)[nH]1.CO.[OH-].[Pd+2]>>CC(O)c1nn2c(C3CCOCC3)ncc2c(=O)[nH]1. What is the yield I can get?\nAssistant: The estimated yield is 80\\%."}", "/scratch/micpie/export/uspto_yield/train_0-4.jsonl": "{"text":"Question: What is the yield of a reaction of CC(=O)c1ccco1, CC(C)=O, Cl, Nc1ccc(Cl)c(Cl)c1, O, O=N[O-], and [Na+] to CC(=O)c1ccc(-c2ccc(Cl)c(Cl)c2)o1?\nAnswer: 35\\%."} {"text":"Question: What's the yield of a reaction of CC(OCc1ccccc1)c1nn2c(C3=CCOCC3)ncc2c(=O)[nH]1, CO, [OH-], and [Pd+2] to CC(O)c1nn2c(C3CCOCC3)ncc2c(=O)[nH]1?\nAnswer: 80\\%."}", "/scratch/micpie/export/uspto_yield/valid_0-3.jsonl": "{"text":"The yield of a reaction of CC(C)N, CO, and O=C1CCC(c2ccc(OCC3CO3)cc2)=NN1 to CC(C)NCC(O)COc1ccc(C2=NNC(=O)CC2)cc1 is 100\\%."} {"text":"The yield of a reaction of COC(=O)CCn1c(=O)c2c(CBr)c(-c3ccc(Cl)cc3)sc2n(C)c1=O, Cc1ccccc1, O=C([O-])[O-], OB(O)c1ccc(Cl)cc1, and [Cs+] to COC(=O)CCn1c(=O)c2c(Cc3ccc(Cl)cc3)c(-c3ccc(Cl)cc3)sc2n(C)c1=O is 40\\%."}", "/scratch/micpie/export/uspto_yield/test_0-4.jsonl": "{"text":"Question: What is yield of a reaction of CC(=O)O, COc1ccc2c(c1)CCC(=O)C2CC(=O)c1ccccc1, and Nc1ccc(O)c(C(=O)O)c1 to COc1ccc2c(c1)CCc1c-2cc(-c2ccccc2)n1-c1ccc(O)c(C(=O)O)c1?\nAnswer: 66\\%."} {"text":"Question: What's the yield of a reaction of CC#N, CCc1cncc(Nc2cc(COc3ccc(NC(=O)OC(C)(C)C)c4ccccc34)ccn2)n1, N, O, and O=S(=O)(O)O to CCc1cncc(Nc2cc(COc3ccc(N)c4ccccc34)ccn2)n1?\nAnswer: 75\\%."}", "/scratch/micpie/export/BBBP/test_0-5.jsonl": "{"text":"User: Can you derive if the molecule with the DeepSMILES cccNCCNC)CC6))))))cF)cc6ccCO)=O))cn%10CC)CO%12))))))=O is permeable through the membrane separating circulating blood and extracellular brain fluid?\nAssistant: Yes, this molecule is permeable through the membrane separating circulating blood and extracellular brain 
fluid."} {"text":"User: Can you derive if the molecule with the canonical SMILES [N-]=[N+]=O is permeable through the membrane separating circulating blood and extracellular brain fluid?\nAssistant: Yes, this molecule is permeable through the membrane separating circulating blood and extracellular brain fluid."}", "/scratch/micpie/export/BBBP/test_0-1.jsonl": "{"text":"Based on the SELFIES [C][=C][C][Branch1][N][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][Branch1][C][F][C][=C][Ring1][=C][C][Branch2][Ring1][#Branch1][C][Branch1][=Branch1][C][Branch1][C][O][=O][=C][N][Ring2][Ring1][Branch1][C][Branch1][C][C][C][O][Ring2][Ring1][Branch2][=O], the molecule is permeable through the membrane separating circulating blood and extracellular brain fluid."} {"text":"Based on the DeepSMILES representation [N+]=[N-])=O, the molecule is permeable through the membrane separating circulating blood and extracellular brain fluid."}", "/scratch/micpie/export/BBBP/valid_0-0.jsonl": "{"text":"The chemical with the InChI of InChI=1S\/C16H21NO2.Cl\/c1-12(2)17-10-14(18)11-19-16-9-5-7-13-6-3-4-8-15(13)16;\/h3-9,12,14,17-18H,10-11H2,1-2H3; exhibits permeability through the membrane separating circulating blood and extracellular brain fluid."} {"text":"The chemical with the InChI of InChI=1S\/C16H17BrN2\/c1-19(2)11-9-16(14-4-3-10-18-12-14)13-5-7-15(17)8-6-13\/h3-10,12H,11H2,1-2H3\/b16-9- displays permeability through the membrane separating circulating blood and extracellular brain fluid."}", "/scratch/micpie/export/BBBP/test_0-2.jsonl": "{"text":"The DeepSMILES cccNCCNC)CC6))))))cF)cc6ccCO)=O))cn%10CC)CO%12))))))=O represents a molecule that is identified as permeable through the blood-brain barrier."} {"text":"The SELFIES [N+1][=Branch1][C][=N-1][=O] represents a molecule that is identified as permeable through the membrane separating circulating blood and extracellular brain fluid."}", "/scratch/micpie/export/BBBP/train_0-6.jsonl": "{"text":"User: Is the molecule with the 
SELFIES [C][=Branch1][C][=O][Branch1][#Branch2][O][C][Branch1][C][C][Branch1][C][C][C][C][C][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][Branch1][Ring2][C][C][Cl][C][C][Cl] permeable through the blood-brain barrier?\nAssistant: Yes, it is permeable through the blood-brain barrier."} {"text":"User: Is the molecule with the InChI InChI=1S\/C11H13N5O5\/c12-15-13-5-10(18)14-9(6-17)11(19)7-1-3-8(4-2-7)16(20)21\/h1-4,9,11,17,19H,5-6H2,(H,14,18)\/t9-,11-\/m1\/s1 permeable through the membrane separating circulating blood and extracellular brain fluid?\nAssistant: Yes, it is permeable through the membrane separating circulating blood and extracellular brain fluid."}", "/scratch/micpie/export/BBBP/valid_0-6.jsonl": "{"text":"User: Is the molecule with the IUPAC name 1-naphthalen-1-yloxy-3-(propan-2-ylamino)propan-2-ol;hydrochloride permeable through the blood-brain barrier?\nAssistant: Yes, it is permeable through the blood-brain barrier."} {"text":"User: Is the molecule with the canonical SMILES CN(C)C\/C=C(\/c1ccc(Br)cc1)c1cccnc1 permeable through the membrane separating circulating blood and extracellular brain fluid?\nAssistant: Yes, it is permeable through the membrane separating circulating blood and extracellular brain fluid."}", "/scratch/micpie/export/BBBP/test_0-0.jsonl": "{"text":"The chemical with the canonical SMILES of CC1COc2c(N3CCN(C)CC3)c(F)cc3c(=O)c(C(=O)O)cn1c23 exhibits permeability through the membrane separating circulating blood and extracellular brain fluid."} {"text":"The compound with the InChI of InChI=1S\/N2O\/c1-2-3 exhibits permeability through the membrane separating circulating blood and extracellular brain fluid."}", "/scratch/micpie/export/BBBP/test_0-3.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is permeable through the membrane separating circulating blood and extracellular brain fluid.\nMolecule SMILES: 
c12c3c(N4CCN(C)CC4)c(F)cc1c(c(C(O)=O)cn2C(C)CO3)=O\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is permeable through the blood-brain barrier.\nSMILES: [N+](=[N-])=O\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"}", "/scratch/micpie/export/BBBP/train_0-0.jsonl": "{"text":"The chemical with the SELFIES of [C][=Branch1][C][=O][Branch1][#Branch2][O][C][Branch1][C][C][Branch1][C][C][C][C][C][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][Branch1][Ring2][C][C][Cl][C][C][Cl] shows permeability through the membrane separating circulating blood and extracellular brain fluid."} {"text":"The compound with the canonical SMILES of [N-]=[N+]=NCC(=O)N[C@H](CO)[C@H](O)c1ccc([N+](=O)[O-])cc1 displays permeability through the membrane separating circulating blood and extracellular brain fluid."}", "/scratch/micpie/export/BBBP/test_0-6.jsonl": "{"text":"User: Is the molecule with the SMILES c12c3c(N4CCN(C)CC4)c(F)cc1c(c(C(O)=O)cn2C(C)CO3)=O permeable through the membrane separating circulating blood and extracellular brain fluid?\nAssistant: Yes, it is permeable through the membrane separating circulating blood and extracellular brain fluid."} {"text":"User: Is the molecule with the DeepSMILES [N+]=[N-])=O permeable through the blood-brain barrier?\nAssistant: Yes, it is permeable through the blood-brain barrier."}", "/scratch/micpie/export/BBBP/train_0-3.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is permeable through the membrane separating circulating blood and extracellular brain fluid.\nInChI: InChI=1S\/C18H27Cl2NO2\/c1-18(2,3)23-17(22)6-4-5-15-7-9-16(10-8-15)21(13-11-19)14-12-20\/h7-10H,4-6,11-14H2,1-3H3\nConstraint: Even if you are not sure, 
you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is permeable through the membrane separating circulating blood and extracellular brain fluid.\nMolecule canonical SMILES: [N-]=[N+]=NCC(=O)N[C@H](CO)[C@H](O)c1ccc([N+](=O)[O-])cc1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"}", "/scratch/micpie/export/BBBP/valid_0-2.jsonl": "{"text":"The SELFIES [ClH0].[C][C][Branch1][C][C][N][C][C][Branch1][C][O][C][O][C][=C][C][=C][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1] represents a molecule that is identified as permeable through the blood-brain barrier."} {"text":"The canonical SMILES CN(C)C\/C=C(\/c1ccc(Br)cc1)c1cccnc1 represents a molecule that is identified as permeable through the membrane separating circulating blood and extracellular brain fluid."}", "/scratch/micpie/export/BBBP/valid_0-1.jsonl": "{"text":"Based on the IUPAC name 1-naphthalen-1-yloxy-3-(propan-2-ylamino)propan-2-ol;hydrochloride, the molecule is permeable through the blood-brain barrier."} {"text":"Based on the IUPAC name representation (Z)-3-(4-bromophenyl)-N,N-dimethyl-3-pyridin-3-ylprop-2-en-1-amine, the molecule is permeable through the membrane separating circulating blood and extracellular brain fluid."}", "/scratch/micpie/export/BBBP/valid_0-5.jsonl": "{"text":"User: Can you estimate if the molecule with the IUPAC name 1-naphthalen-1-yloxy-3-(propan-2-ylamino)propan-2-ol;hydrochloride is permeable through the membrane separating circulating blood and extracellular brain fluid?\nAssistant: Yes, this molecule is permeable through the membrane separating circulating blood and extracellular brain fluid."} {"text":"User: Can you derive if the molecule with the DeepSMILES C=C\\CC=CC=CN=C6))))))=C\\CNC)C)))))C=CC=C6)Br is permeable through the membrane separating 
circulating blood and extracellular brain fluid?\nAssistant: Yes, this molecule is permeable through the membrane separating circulating blood and extracellular brain fluid."}", "/scratch/micpie/export/BBBP/valid_0-4.jsonl": "{"text":"Task: Please give me a molecule IUPAC name based on the text description below.\nDescription: A molecule that is permeable through the blood-brain barrier.\nResult: 1-naphthalen-1-yloxy-3-(propan-2-ylamino)propan-2-ol;hydrochloride"} {"text":"Task: Please create a SMILES based on the text description.\nDescription: A molecule that is permeable through the blood-brain barrier.\nResult: C2=C(\\C(C1=CC=CN=C1)=C\\CN(C)C)C=CC(=C2)Br"}", "/scratch/micpie/export/BBBP/train_0-5.jsonl": "{"text":"User: Can you tell me if the molecule with the SMILES C(=O)(OC(C)(C)C)CCCc1ccc(cc1)N(CCCl)CCCl is permeable through the blood-brain barrier?\nAssistant: Yes, this molecule is permeable through the blood-brain barrier."} {"text":"User: Can you derive if the molecule with the SELFIES [N+1][=Branch2][Ring1][P][=N][C][C][=Branch1][C][=O][N][C@@H1][Branch2][Ring1][Ring2][C@H1][Branch1][C][O][C][=C][C][=C][Branch1][=Branch1][N+1][Branch1][C][O-1][=O][C][=C][Ring1][=Branch2][C][O][=N-1] is permeable through the blood-brain barrier?\nAssistant: Yes, this molecule is permeable through the blood-brain barrier."}", "/scratch/micpie/export/BBBP/train_0-2.jsonl": "{"text":"The InChI InChI=1S\/C18H27Cl2NO2\/c1-18(2,3)23-17(22)6-4-5-15-7-9-16(10-8-15)21(13-11-19)14-12-20\/h7-10H,4-6,11-14H2,1-3H3 represents a molecule that is identified as permeable through the blood-brain barrier."} {"text":"The SELFIES [N+1][=Branch2][Ring1][P][=N][C][C][=Branch1][C][=O][N][C@@H1][Branch2][Ring1][Ring2][C@H1][Branch1][C][O][C][=C][C][=C][Branch1][=Branch1][N+1][Branch1][C][O-1][=O][C][=C][Ring1][=Branch2][C][O][=N-1] represents a molecule that is identified as permeable through the membrane separating circulating blood and extracellular brain fluid."}", 
"/scratch/micpie/export/BBBP/train_0-1.jsonl": "{"text":"Based on the SELFIES representation [C][=Branch1][C][=O][Branch1][#Branch2][O][C][Branch1][C][C][Branch1][C][C][C][C][C][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][Branch1][Ring2][C][C][Cl][C][C][Cl], the molecule is permeable through the blood-brain barrier."} {"text":"Based on the SELFIES [N+1][=Branch2][Ring1][P][=N][C][C][=Branch1][C][=O][N][C@@H1][Branch2][Ring1][Ring2][C@H1][Branch1][C][O][C][=C][C][=C][Branch1][=Branch1][N+1][Branch1][C][O-1][=O][C][=C][Ring1][=Branch2][C][O][=N-1], the molecule is permeable through the blood-brain barrier."}", "/scratch/micpie/export/BBBP/train_0-4.jsonl": "{"text":"Task: Please generate a SMILES based on the text description below.\nDescription: A molecule that is permeable through the membrane separating circulating blood and extracellular brain fluid.\nResult: C(=O)(OC(C)(C)C)CCCc1ccc(cc1)N(CCCl)CCCl"} {"text":"Task: Please give me a molecule canonical SMILES based on the text description.\nDescription: A molecule that is permeable through the blood-brain barrier.\nResult: [N-]=[N+]=NCC(=O)N[C@H](CO)[C@H](O)c1ccc([N+](=O)[O-])cc1"}", "/scratch/micpie/export/BBBP/valid_0-3.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is permeable through the membrane separating circulating blood and extracellular brain fluid.\ncanonical SMILES: CC(C)NCC(O)COc1cccc2ccccc12.[Cl]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is permeable through the membrane separating circulating blood and extracellular brain fluid.\nMolecule SELFIES: [C][=C][Branch2][Ring1][Ring1][\\C][Branch1][=Branch2][C][=C][C][=C][N][=C][Ring1][=Branch1][=C][\\C][N][Branch1][C][C][C][C][=C][C][=Branch1][Branch1][=C][Ring2][Ring1][C][Br]\nConstraint: Even 
if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: True"}", "/scratch/micpie/export/BBBP/test_0-4.jsonl": "{"text":"Task: Please create a molecule SELFIES based on the text description.\nDescription: A molecule that is permeable through the blood-brain barrier.\nResult: [C][=C][C][Branch1][N][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][=C][Branch1][C][F][C][=C][Ring1][=C][C][Branch2][Ring1][#Branch1][C][Branch1][=Branch1][C][Branch1][C][O][=O][=C][N][Ring2][Ring1][Branch1][C][Branch1][C][C][C][O][Ring2][Ring1][Branch2][=O]"} {"text":"Task: Please create a molecule SELFIES based on the text description.\nDescription: A molecule that is permeable through the blood-brain barrier.\nResult: [N+1][=Branch1][C][=N-1][=O]"}", "/scratch/micpie/export/compound_protein_go_term_3/test_8-1.jsonl": "{"text":"The compound Nc1cnc(-c2ccc(-c3ccccc3Oc3cc(N4CCC4)ncn3)cc2F)cn1 targets the protein Arachidonate 5-lipoxygenase-activating protein. The protein Arachidonate 5-lipoxygenase-activating protein enables the arachidonic acid binding."} {"text":"The compound [N][#C][C][Branch2][Ring2][Branch1][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][C][C][C][N][N][C][=Branch1][C][=O][C][=C][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][N][C][C][Ring2][Ring1][O] targets the protein Cathepsin B (EC 3.4.22.1) (APP secretase) (APPS) (Cathepsin B1). 
The protein Cathepsin B (EC 3.4.22.1) (APP secretase) (APPS) (Cathepsin B1) is located in the collagen-containing extracellular matrix."}", "/scratch/micpie/export/compound_protein_go_term_3/test_4-0.jsonl": "{"text":"The compound [C][C][O][C][=Branch1][C][=O][C][O][C][=C][C][Branch1][=Branch1][C][=Branch1][C][=O][O][=C][Branch1][C][O][C][=C][C][=C][C][=C][Ring1][=C][Ring1][=Branch1] targets the protein Caspase-9 (CASP-9) (EC 3.4.22.62) (Apoptotic protease Mch-6) (Apoptotic protease-activating factor 3) (APAF-3) (ICE-like apoptotic protease 6) (ICE-LAP6) which enables the cysteine-type endopeptidase activity involved in execution phase of apoptosis."} {"text":"The compound Nc1ncnc2c1c(SCc1ccccn1)nn2[C@@H]1O[C@H](COS(N)(=O)=O)[C@@H](O)[C@H]1O targets the protein Autophagy-related protein 7 which is located in the phagophore assembly site."}", "/scratch/micpie/export/compound_protein_go_term_3/valid_5-1.jsonl": "{"text":"The compound ccccc-cccncc-ccccOCCNCCCCC6)))))))))cc6))))))cnc96)))))))))ccnc6c%10 targets the protein Bone morphogenetic protein 4. The protein Bone morphogenetic protein 4 is involved in the positive regulation of bone mineralization."} {"text":"The compound CCN(C(=O)c1ccc(CNc2nc(NCCN(C)C)nc(N3CCc4cc(OC)c(OC)cc4C3)n2)cc1)c1cccc(C)c1 targets the protein ADMP-1. 
The protein ADMP-1 is located in the extracellular matrix."}", "/scratch/micpie/export/compound_protein_go_term_3/valid_5-0.jsonl": "{"text":"The compound c1ccc2c(-c3ccn4cc(-c5ccc(OCCN6CCCCC6)cc5)cnc34)ccnc2c1 targets the protein BMP-2B which is involved in the positive regulation of bone mineralization."} {"text":"The compound CCN(C(=O)c1ccc(CNc2nc(NCCN(C)C)nc(N3CCc4cc(OC)c(OC)cc4C3)n2)cc1)c1cccc(C)c1 targets the protein ADAMTS-4 which is located in the extracellular matrix."}", "/scratch/micpie/export/compound_protein_go_term_3/train_7-1.jsonl": "{"text":"The compound InChI=1S\/C17H22FN3O5S\/c1-16(10-27(23,24)17(15(19)21-16)5-6-26-9-17)12-7-11(3-4-13(12)18)20-14(22)8-25-2\/h3-4,7H,5-6,8-10H2,1-2H3,(H2,19,21)(H,20,22)\/t16-,17-\/m0\/s1 targets the protein Aspartyl protease 2. The protein Aspartyl protease 2 enables the enzyme binding."} {"text":"The compound InChI=1S\/C21H16FN5O2\/c1-28-15-11-26-21(27-12-15)29-19-5-3-2-4-17(19)13-6-7-16(18(22)8-13)14-9-24-20(23)25-10-14\/h2-12H,1H3,(H2,23,24,25) targets the protein Arachidonate 5-lipoxygenase-activating protein. The protein Arachidonate 5-lipoxygenase-activating protein is located in the nuclear envelope."}", "/scratch/micpie/export/compound_protein_go_term_3/test_9-0.jsonl": "{"text":"The compound Ncnccc-ncc-cccccc6))))))ccnccc69)))))))))n6 targets the protein huCdc7 which enables the protein serine kinase activity."} {"text":"The compound CCC(C(=O)O)C1CCc2cc(OCCc3nc(-c4ccc(OC)c(OC)c4)oc3C)ccc21 targets the protein PPAR-gamma which is located in the nucleus."}", "/scratch/micpie/export/compound_protein_go_term_3/valid_9-1.jsonl": "{"text":"The compound InChI=1S\/C27H28Cl2F3N3O3S\/c1-39(37,38)21-12-6-17(7-13-21)16-2-4-18(5-3-16)24(27(30,31)32)34-22(14-23(28)29)25(36)35-26(15-33,19-8-9-19)20-10-11-20\/h2-7,12-13,19-20,22-24,34H,8-11,14H2,1H3,(H,35,36)\/t22-,24-\/m0\/s1 targets the protein Cathepsin S. 
The protein Cathepsin S enables the proteoglycan binding."} {"text":"The compound Cn1c(O)nc2ccc(-c3cccc([N+](=O)[O-])c3)cc21 targets the protein Nuclear receptor subfamily 3 group C member 3. The protein Nuclear receptor subfamily 3 group C member 3 is located in the nucleus."}", "/scratch/micpie/export/compound_protein_go_term_3/valid_3-0.jsonl": "{"text":"The compound CCCCNCCNCNcccccc6))))))CCCNCCCCC)C)cccCl)cCl)cc6%10))))))))))CC6)))))C5=O targets the protein Orphanin FQ receptor which enables the neuropeptide binding."} {"text":"The compound COc1ccc(P(=O)(OC)N2Cc3ccccc3C[C@@H]2C(=O)NO)cc1 targets the protein Matrix metalloproteinase-9 (MMP-9) (EC 3.4.24.35) (92 kDa gelatinase) (92 kDa type IV collagenase) (Gelatinase B) (GELB) which is located in the extracellular region."}", "/scratch/micpie/export/compound_protein_go_term_3/test_0-1.jsonl": "{"text":"The compound CCC)O)[C@@H]NC=O)Nccc[nH]ncNC=O)OCCF)F))))))c5cn9))))))))))))ccccF)cc6 targets the protein Mitogen-activated protein kinase 1. The protein Mitogen-activated protein kinase 1 enables the protein serine kinase activity."} {"text":"The compound COcccNcncN[C@@H]C)cnccF)cn6))))))))ncNCCOCC6))))))c6F))))))))n[nH]5 targets the protein L-JAK. 
The protein L-JAK is located in the cytosol."}", "/scratch/micpie/export/compound_protein_go_term_3/test_5-0.jsonl": "{"text":"The compound NC(=O)c1cnccc1N(CCO)c1nc(-c2nn(Cc3ccccc3F)c3c2CCC3)ncc1OCCO targets the protein hBUB1 which enables the protein serine kinase activity."} {"text":"The compound N=C(N)C1CCC[C@H](NC(=O)CN2CCC[C@H](NS(=O)(=O)Cc3ccccc3)C2=O)C1O targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) which is located in the external side of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_3/test_2-0.jsonl": "{"text":"The compound FcccF)cCl)cO[C@H]cccccc6))))))[C@@H]CCNC5)))))))c6Cl targets the protein Solute carrier family 6 member 3 which enables the heterocyclic compound binding."} {"text":"The compound Nc1nc(Cc2ccccc2Cl)nc2cn(-c3ccccc3)nc12 targets the protein Adenosine receptor A1 which is involved in the regulation of glomerular filtration."}", "/scratch/micpie/export/compound_protein_go_term_3/valid_0-0.jsonl": "{"text":"The compound CN1CCN(CCCOc2ccc3c(Nc4ccc(NC(=O)NC5CCCCC5)cc4)ncnc3c2)CC1 targets the protein Aurora\/IPL1-related kinase 2 which is involved in the positive regulation of mitotic sister chromatid segregation."} {"text":"The compound InChI=1S\/C32H41N7O2\/c1-23-8-9-27(36-32(40)35-25-6-2-3-7-25)20-30(23)39-17-14-29-24(22-39)21-33-31(37-29)34-26-10-12-28(13-11-26)41-19-18-38-15-4-5-16-38\/h8-13,20-21,25H,2-7,14-19,22H2,1H3,(H,33,34,37)(H2,35,36,40) targets the protein Janus kinase 2 which is located in the cytosol."}", "/scratch/micpie/export/compound_protein_go_term_3/train_6-1.jsonl": "{"text":"The compound CC(C)(NS(=O)(=O)c1ccccc1)C(=O)NC1=NN=C(c2ccc(Cl)cc2)CS1 targets the protein 72 kDa type IV collagenase (EC 3.4.24.24) (72 kDa gelatinase) (Gelatinase A) (Matrix metalloproteinase-2) (MMP-2) (TBE-1). 
The protein 72 kDa type IV collagenase (EC 3.4.24.24) (72 kDa gelatinase) (Gelatinase A) (Matrix metalloproteinase-2) (MMP-2) (TBE-1) is involved in the response to electrical stimulus."} {"text":"The compound InChI=1S\/C15H24N2\/c1-2-3-7-13-12-17(11-10-15(13)16)14-8-5-4-6-9-14\/h4-6,8-9,13,15H,2-3,7,10-12,16H2,1H3\/t13-,15+\/m0\/s1 targets the protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26). The protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26) is located in the plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_3/test_7-0.jsonl": "{"text":"The compound Cc1ccc(C)c(N2CCN(C(=O)C[C@@H](NS(=O)(=O)Cc3ccccc3)C(=O)N3CCC[C@H]3C(=O)NCc3ccc(C(=N)N)cc3)CC2)c1 targets the protein Transmembrane protease serine 11D (EC 3.4.21.-) (Airway trypsin-like protease) which enables the hydrolase activity."} {"text":"The compound InChI=1S\/C19H16F4N4O2S\/c1-2-27-30(28,29)17-8-13(19(21,22)23)4-6-15(17)11-3-5-14(16(20)7-11)12-9-25-18(24)26-10-12\/h3-10,27H,2H2,1H3,(H2,24,25,26) targets the protein Arachidonate 5-lipoxygenase-activating protein which is located in the nuclear envelope."}", "/scratch/micpie/export/compound_protein_go_term_3/test_3-0.jsonl": "{"text":"The compound [N][C][=N][C][Branch1][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][Cl][=N][C][=C][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][N][=C][Ring2][Ring1][#Branch1][Ring1][O] targets the protein Adenosine receptor A1 which is involved in the negative regulation of systemic arterial blood pressure."} {"text":"The compound CCC[C@H]NC=O)[C@@H]C[C@@H]S=O)=O)cccccc6CF)F)F)))))))))CN5C=O)CcnccCl)cc6F)))))))CC3)))))))))))C=O)C=O)NCCC3 targets the protein Procathepsin L (EC 3.4.22.15) (Cathepsin L1) 
(Major excreted protein) (MEP) which is located in the extracellular space."}", "/scratch/micpie/export/compound_protein_go_term_3/train_1-0.jsonl": "{"text":"The compound Cc1cc(Nc2nc(O[C@@H](C)c3ccc(F)cn3)c(C#N)nc2C)n[nH]1 targets the protein Tyrosine-protein kinase JAK2 which is involved in the activation of cysteine-type endopeptidase activity involved in apoptotic process."} {"text":"The compound [Cl][C][=C][C][=C][Branch1][S][C@@H1][C][C][N][C][C@@][Ring1][=Branch1][C][C][C][O][C][Ring1][=Branch1][C][=C][Ring1][P][Cl] targets the protein Norepinephrine transporter which is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_3/test_0-0.jsonl": "{"text":"The compound CC(C)(O)[C@@H](NC(=O)Nc1cc2[nH]nc(NC(=O)OCC(F)F)c2cn1)c1ccc(F)cc1 targets the protein ERK-2 which enables the protein serine kinase activity."} {"text":"The compound COc1cc(Nc2nc(N[C@@H](C)c3ncc(F)cn3)nc(N3CCOCC3)c2F)n[nH]1 targets the protein Janus kinase 3 which is located in the cytosol."}", "/scratch/micpie/export/compound_protein_go_term_3/test_6-0.jsonl": "{"text":"The compound CC(C)(C)OC(=O)N[C@@H](C(=O)N1CCC[C@H]1C(=O)NC(C=O)CCCN=C(N)N)c1cccc2ccccc12 targets the protein Serine protease 1 (EC 3.4.21.4) (Anionic trypsin I) (Anionic trypsin-I) (Beta-trypsin) (Cationic trypsinogen) (Pretrypsinogen I) (Trypsin I) (Trypsin-1) which enables the metal ion binding."} {"text":"The compound InChI=1S\/C22H29N3O4\/c1-25(29-2)21(28)20(27)24-18-5-3-17(4-6-18)23-19(26)13-22-10-14-7-15(11-22)9-16(8-14)12-22\/h3-6,14-16H,7-13H2,1-2H3,(H,23,26)(H,24,27) targets the protein Bifunctional epoxide hydrolase 2 which is located in the peroxisome."}", "/scratch/micpie/export/compound_protein_go_term_3/train_2-0.jsonl": "{"text":"The compound COc1ccc(O[C@@H](CC(C)C)[C@H]2CCNC2)c(C)n1.O=C(O)C(O)C(O)C(=O)O targets the protein Sodium-dependent dopamine transporter which enables the heterocyclic compound binding."} {"text":"The compound 
[C][N][C][C][C@][C][=C][C][=C][C][Branch1][C][O][=C][Ring1][#Branch1][O][C@H1][Ring1][#Branch2][C@@][C][C][C@@][Ring1][=C][Branch2][Ring1][=Branch1][C][C@H1][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][Branch1][C][C][Branch1][C][C][O][Ring1][=N][C][Ring2][Ring1][O][C][Ring2][Ring1][#Branch1] targets the protein D-OR-1 which is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_3/valid_2-0.jsonl": "{"text":"The compound [C][C][=C][S][C][Branch2][Ring1][N][C@@][C][N][C][C][C@][Ring1][=Branch1][Branch1][#C][C][=C][C][=C][Branch1][C][Cl][C][Branch1][C][Cl][=C][Ring1][Branch2][C][Ring1][#C][=N][Ring2][Ring1][Ring2] targets the protein NET which is involved in the organic substance transport."} {"text":"The compound [C][C][=Branch1][C][=O][C][S][N][=C][Branch1][=C][N][C][=C][C][Branch1][C][Cl][=C][C][=C][Ring1][#Branch1][F][C][=Ring1][=C][N] targets the protein Adenosine receptor A2a which is located in the plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_3/test_7-1.jsonl": "{"text":"The compound Cc1ccc(C)c(N2CCN(C(=O)C[C@@H](NS(=O)(=O)Cc3ccccc3)C(=O)N3CCC[C@H]3C(=O)NCc3ccc(C(=N)N)cc3)CC2)c1 targets the protein Transmembrane protease serine 11D (EC 3.4.21.-) (Airway trypsin-like protease). The protein Transmembrane protease serine 11D (EC 3.4.21.-) (Airway trypsin-like protease) enables the hydrolase activity."} {"text":"The compound [C][C][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][=C][C][=C][Ring1][#Branch2][C][=C][C][=C][Branch1][N][C][=C][N][=C][Branch1][C][N][N][=C][Ring1][#Branch1][C][Branch1][C][F][=C][Ring1][=C] targets the protein FLAP. 
The protein FLAP is located in the nuclear envelope."}", "/scratch/micpie/export/compound_protein_go_term_3/valid_2-1.jsonl": "{"text":"The compound InChI=1S\/C16H16Cl2N2S\/c1-10-7-21-14(20-10)16-8-15(16,4-5-19-9-16)11-2-3-12(17)13(18)6-11\/h2-3,6-7,19H,4-5,8-9H2,1H3\/t15-,16-\/m1\/s1 targets the protein Norepinephrine transporter. The protein Norepinephrine transporter is involved in the organic substance transport."} {"text":"The compound CC=O)csncNcccCl)ccc6F))))))))c5N targets the protein Adenosine receptor A2a. The protein Adenosine receptor A2a is located in the plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_3/valid_4-0.jsonl": "{"text":"The compound InChI=1S\/C25H28ClN7O5S2\/c26-18-6-1-5-17-16(18)4-2-8-20(17)40(37,38)33-12-11-32(22(35)15-33)14-21(34)31-19(7-3-9-30-25(27)28)23(36)24-29-10-13-39-24\/h1-2,4-6,8,10,13,19H,3,7,9,11-12,14-15H2,(H,31,34)(H4,27,28,30) targets the protein Coagulation factor X (EC 3.4.21.6) (Stuart factor) (Stuart-Prower factor) which enables the hydrolase activity."} {"text":"The compound ccccc-cccncc-ccccOCCNCCCCC6)))))))))cc6))))))cnc96)))))))))ccnc6c%10 targets the protein Bone morphogenetic protein 2B which is involved in the BMP signaling pathway."}", "/scratch/micpie/export/compound_protein_go_term_3/train_5-1.jsonl": "{"text":"The compound InChI=1S\/C15H10N4\/c1-2-5-14-12(4-1)11(6-8-16-14)13-10-18-19-9-3-7-17-15(13)19\/h1-10H targets the protein BMP-4. The protein BMP-4 enables the BMP receptor binding."} {"text":"The compound CC(C)(NS(=O)(=O)c1ccccc1)C(=O)NC1=NN=C(c2ccc(Cl)cc2)CS1 targets the protein 72 kDa type IV collagenase (EC 3.4.24.24) (72 kDa gelatinase) (Gelatinase A) (Matrix metalloproteinase-2) (MMP-2) (TBE-1). 
The protein 72 kDa type IV collagenase (EC 3.4.24.24) (72 kDa gelatinase) (Gelatinase A) (Matrix metalloproteinase-2) (MMP-2) (TBE-1) is involved in the response to hyperoxia."}", "/scratch/micpie/export/compound_protein_go_term_3/test_2-1.jsonl": "{"text":"The compound Fc1cc(F)c(Cl)c(O[C@H](c2ccccc2)[C@@H]2CCNC2)c1Cl targets the protein DA transporter. The protein DA transporter enables the heterocyclic compound binding."} {"text":"The compound [N][C][=N][C][Branch1][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][Cl][=N][C][=C][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][N][=C][Ring2][Ring1][#Branch1][Ring1][O] targets the protein Adenosine receptor A1. The protein Adenosine receptor A1 is involved in the regulation of glomerular filtration."}", "/scratch/micpie/export/compound_protein_go_term_3/train_0-0.jsonl": "{"text":"The compound InChI=1S\/C25H27ClN6O3S\/c1-4-18-11-16(9-10-27-18)14-28-20-13-23(26)30-32-24(15(2)29-25(20)32)17-5-8-21(35-3)22(12-17)36(33,34)31-19-6-7-19\/h5,8-13,19,28,31H,4,6-7,14H2,1-3H3 targets the protein PI4Kbeta which enables the phosphatidylinositol kinase activity."} {"text":"The compound Cc1cc(Nc2nc(O[C@@H](C)c3ccc(F)cn3)c(C#N)nc2C)n[nH]1 targets the protein Janus kinase 2 which is involved in the immune response."}", "/scratch/micpie/export/compound_protein_go_term_3/test_1-1.jsonl": "{"text":"The compound Cn1cc(Nc2ncc(Cl)c(NC3CC4CN(c5ccc(C#N)cn5)CC4C3)n2)cn1 targets the protein JAK-1. The protein JAK-1 enables the metal ion binding."} {"text":"The compound InChI=1S\/C15H19Cl2NO\/c1-14-8-18-9-15(14,13(14)5-6-19-2)10-3-4-11(16)12(17)7-10\/h3-4,7,13,18H,5-6,8-9H2,1-2H3\/t13-,14-,15-\/m0\/s1 targets the protein Solute carrier family 6 member 2. 
The protein Solute carrier family 6 member 2 is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_3/valid_9-0.jsonl": "{"text":"The compound CS(=O)(=O)c1ccc(-c2ccc([C@H](N[C@@H](CC(Cl)Cl)C(=O)NC(C#N)(C3CC3)C3CC3)C(F)(F)F)cc2)cc1 targets the protein Cathepsin S which enables the proteoglycan binding."} {"text":"The compound InChI=1S\/C14H11N3O3\/c1-16-13-8-10(5-6-12(13)15-14(16)18)9-3-2-4-11(7-9)17(19)20\/h2-8H,1H3,(H,15,18) targets the protein Nuclear receptor subfamily 3 group C member 3 which is located in the nucleus."}", "/scratch/micpie/export/compound_protein_go_term_3/train_8-1.jsonl": "{"text":"The compound O=S(=O)(c1ccccc1-c1ccc(-c2cnc3[nH]ccc3n2)c(F)c1)N1CCOCC1 targets the protein MK-886-binding protein. The protein MK-886-binding protein enables the arachidonic acid binding."} {"text":"The compound CC[C@@H](C)[C@H](NC(=O)OCc1ccccc1)C(=O)N[C@@H](C=O)Cc1c[nH]c2ccccc12 targets the protein Procathepsin L (EC 3.4.22.15) (Cathepsin L1) (Major excreted protein) (MEP). The protein Procathepsin L (EC 3.4.22.15) (Cathepsin L1) (Major excreted protein) (MEP) is involved in the fusion of virus membrane with host plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_3/train_8-0.jsonl": "{"text":"The compound O=S(=O)(c1ccccc1-c1ccc(-c2cnc3[nH]ccc3n2)c(F)c1)N1CCOCC1 targets the protein MK-886-binding protein which enables the arachidonic acid binding."} {"text":"The compound CC[C@@H](C)[C@H](NC(=O)OCc1ccccc1)C(=O)N[C@@H](C=O)Cc1c[nH]c2ccccc12 targets the protein Procathepsin L (EC 3.4.22.15) (Cathepsin L1) (Major excreted protein) (MEP) which is involved in the fusion of virus membrane with host plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_3/test_5-1.jsonl": "{"text":"The compound NC(=O)c1cnccc1N(CCO)c1nc(-c2nn(Cc3ccccc3F)c3c2CCC3)ncc1OCCO targets the protein Mitotic checkpoint serine\/threonine-protein kinase BUB1. 
The protein Mitotic checkpoint serine\/threonine-protein kinase BUB1 enables the protein serine kinase activity."} {"text":"The compound N=C(N)C1CCC[C@H](NC(=O)CN2CCC[C@H](NS(=O)(=O)Cc3ccccc3)C2=O)C1O targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II). The protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) is located in the external side of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_3/train_4-1.jsonl": "{"text":"The compound [C][C][=C][C][Branch2][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring2][Ring2][S][=Branch1][C][=O][=Branch1][C][=O][C][C][Branch2][Ring1][C][C][C][C][C][C][Branch1][Branch1][C][C][Ring1][=Branch1][O][C][C][O][Ring1][#Branch1][N][Branch1][C][O][C][=O][C][=C][Ring2][Ring1][#Branch2][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][Ring2][Ring2][=Branch1] targets the protein Matrix metalloproteinase-13. The protein Matrix metalloproteinase-13 enables the peptidase activity."} {"text":"The compound InChI=1S\/C24H26N6O\/c1-28-10-12-29(13-11-28)14-15-31-22-4-2-19(3-5-22)21-16-26-24-23(17-27-30(24)18-21)20-6-8-25-9-7-20\/h2-9,16-18H,10-15H2,1H3 targets the protein Bone morphogenetic protein 4. 
The protein Bone morphogenetic protein 4 is located in the extracellular space."}", "/scratch/micpie/export/compound_protein_go_term_3/train_5-0.jsonl": "{"text":"The compound [C][=C][C][=C][C][Branch1][=C][C][C][=N][N][C][=C][C][=N][C][=Ring1][=Branch2][Ring1][=Branch1][=C][C][=N][C][Ring1][#C][=C][Ring2][Ring1][Ring1] targets the protein BMP-2B which enables the BMP receptor binding."} {"text":"The compound CC(C)(NS(=O)(=O)c1ccccc1)C(=O)NC1=NN=C(c2ccc(Cl)cc2)CS1 targets the protein 72 kDa type IV collagenase (EC 3.4.24.24) (72 kDa gelatinase) (Gelatinase A) (Matrix metalloproteinase-2) (MMP-2) (TBE-1) which is involved in the response to hyperoxia."}", "/scratch/micpie/export/compound_protein_go_term_3/valid_0-1.jsonl": "{"text":"The compound InChI=1S\/C29H39N7O2\/c1-35-15-17-36(18-16-35)14-5-19-38-25-12-13-26-27(20-25)30-21-31-28(26)32-23-8-10-24(11-9-23)34-29(37)33-22-6-3-2-4-7-22\/h8-13,20-22H,2-7,14-19H2,1H3,(H,30,31,32)(H2,33,34,37) targets the protein AIM-1. The protein AIM-1 is involved in the positive regulation of mitotic sister chromatid segregation."} {"text":"The compound Cc1ccc(NC(=O)NC2CCCC2)cc1N1CCc2nc(Nc3ccc(OCCN4CCCC4)cc3)ncc2C1 targets the protein JAK-2. The protein JAK-2 is located in the cytosol."}", "/scratch/micpie/export/compound_protein_go_term_3/valid_7-1.jsonl": "{"text":"The compound COC[C@H](C)S(=O)(=O)c1ccc2cc1CN(C)C(=O)[C@H](Nc1ccc3c(N)ncc(F)c3c1)c1ccc(c(C)c1)[C@@H](C)COC(=O)N2 targets the protein Coagulation factor VII (EC 3.4.21.21) (Proconvertin) (Serum prothrombin conversion accelerator) (SPCA) (Eptacog alfa). The protein Coagulation factor VII (EC 3.4.21.21) (Proconvertin) (Serum prothrombin conversion accelerator) (SPCA) (Eptacog alfa) is involved in the positive regulation of leukocyte chemotaxis."} {"text":"The compound CCNCC))S=O)=O)cccccc6-cccc-ccnccc6)OCCN6)))))))))cF)c6 targets the protein FLAP. 
The protein FLAP is located in the membrane."}", "/scratch/micpie/export/compound_protein_go_term_3/train_2-1.jsonl": "{"text":"The compound COc1ccc(O[C@@H](CC(C)C)[C@H]2CCNC2)c(C)n1.O=C(O)C(O)C(O)C(=O)O targets the protein DAT. The protein DAT enables the heterocyclic compound binding."} {"text":"The compound [C][N][C][C][C@][C][=C][C][=C][C][Branch1][C][O][=C][Ring1][#Branch1][O][C@H1][Ring1][#Branch2][C@@][C][C][C@@][Ring1][=C][Branch2][Ring1][=Branch1][C][C@H1][Ring1][=Branch1][C][Branch1][C][C][Branch1][C][C][C][Branch1][C][C][Branch1][C][C][O][Ring1][=N][C][Ring2][Ring1][O][C][Ring2][Ring1][#Branch1] targets the protein D-OR-1. The protein D-OR-1 is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_3/valid_1-1.jsonl": "{"text":"The compound N#CCCnccCN)=O))cNccccS=O)=O)CF)F)))cc6)))))))n5)))))CCCNCCF)C4))))CC6 targets the protein Janus kinase 2. The protein Janus kinase 2 enables the peptide hormone receptor binding."} {"text":"The compound Cccsc[C@@]CNCC[C@]6ccccCl)cCl)c6))))))C7)))))))n5 targets the protein Norepinephrine transporter. The protein Norepinephrine transporter is involved in the nitrogen compound transport."}", "/scratch/micpie/export/compound_protein_go_term_3/test_3-1.jsonl": "{"text":"The compound InChI=1S\/C18H14ClN5\/c19-14-9-5-4-6-12(14)10-16-21-15-11-24(13-7-2-1-3-8-13)23-17(15)18(20)22-16\/h1-9,11H,10H2,(H2,20,21,22) targets the protein Adenosine receptor A1. The protein Adenosine receptor A1 is involved in the negative regulation of systemic arterial blood pressure."} {"text":"The compound CCC[C@H](NC(=O)[C@@H]1C[C@@H](S(=O)(=O)c2ccccc2C(F)(F)F)CN1C(=O)C1(c2ncc(Cl)cc2F)CC1)C(=O)C(=O)NC1CC1 targets the protein Procathepsin L (EC 3.4.22.15) (Cathepsin L1) (Major excreted protein) (MEP). 
The protein Procathepsin L (EC 3.4.22.15) (Cathepsin L1) (Major excreted protein) (MEP) is located in the extracellular space."}", "/scratch/micpie/export/compound_protein_go_term_3/train_9-0.jsonl": "{"text":"The compound CC[C@@H]C)[C@H]NC=O)OCcccccc6))))))))))C=O)N[C@@H]C=O))Ccc[nH]cccccc96 targets the protein Procathepsin L (EC 3.4.22.15) (Cathepsin L1) (Major excreted protein) (MEP) which is involved in the protein autoprocessing."} {"text":"The compound C=C[C@]12CC[C@](C)(O)CC1=CC[C@H]1[C@@H]3CC[C@H](O)[C@@]3(C)CC[C@@H]12 targets the protein Nuclear receptor subfamily 3 group A member 2 which is involved in the negative regulation of transcription by RNA polymerase II."}", "/scratch/micpie/export/compound_protein_go_term_3/test_1-0.jsonl": "{"text":"The compound Cn1cc(Nc2ncc(Cl)c(NC3CC4CN(c5ccc(C#N)cn5)CC4C3)n2)cn1 targets the protein Janus kinase 1 which enables the metal ion binding."} {"text":"The compound COCC[C@H]1[C@]2(C)CNC[C@]12c1ccc(Cl)c(Cl)c1 targets the protein NET which is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_3/test_6-1.jsonl": "{"text":"The compound CCC)C)OC=O)N[C@@H]C=O)NCCC[C@H]5C=O)NCC=O))CCCN=CN)N)))))))))))))))cccccccccc%106 targets the protein Serine protease 1 (EC 3.4.21.4) (Anionic trypsin I) (Anionic trypsin-I) (Beta-trypsin) (Cationic trypsinogen) (Pretrypsinogen I) (Trypsin I) (Trypsin-1). The protein Serine protease 1 (EC 3.4.21.4) (Anionic trypsin I) (Anionic trypsin-I) (Beta-trypsin) (Cationic trypsinogen) (Pretrypsinogen I) (Trypsin I) (Trypsin-1) enables the metal ion binding."} {"text":"The compound CON(C)C(=O)C(=O)Nc1ccc(NC(=O)CC23CC4CC(CC(C4)C2)C3)cc1 targets the protein Bifunctional epoxide hydrolase 2. 
The protein Bifunctional epoxide hydrolase 2 is located in the peroxisome."}", "/scratch/micpie/export/compound_protein_go_term_3/valid_4-1.jsonl": "{"text":"The compound N=C(N)NCCCC(NC(=O)CN1CCN(S(=O)(=O)c2cccc3c(Cl)cccc23)CC1=O)C(=O)c1nccs1 targets the protein Coagulation factor X (EC 3.4.21.6) (Stuart factor) (Stuart-Prower factor). The protein Coagulation factor X (EC 3.4.21.6) (Stuart factor) (Stuart-Prower factor) enables the hydrolase activity."} {"text":"The compound InChI=1S\/C29H28N4O\/c1-4-15-32(16-5-1)18-19-34-24-10-8-22(9-11-24)23-20-31-29-27(13-17-33(29)21-23)25-12-14-30-28-7-3-2-6-26(25)28\/h2-3,6-14,17,20-21H,1,4-5,15-16,18-19H2 targets the protein BMP-4. The protein BMP-4 is involved in the BMP signaling pathway."}", "/scratch/micpie/export/compound_protein_go_term_3/train_1-1.jsonl": "{"text":"The compound CcccNcncO[C@@H]C)ccccF)cn6))))))))cC#N))nc6C))))))))n[nH]5 targets the protein JAK-2. The protein JAK-2 is involved in the activation of cysteine-type endopeptidase activity involved in apoptotic process."} {"text":"The compound [Cl][C][=C][C][=C][Branch1][S][C@@H1][C][C][N][C][C@@][Ring1][=Branch1][C][C][C][O][C][Ring1][=Branch1][C][=C][Ring1][P][Cl] targets the protein Solute carrier family 6 member 2. 
The protein Solute carrier family 6 member 2 is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_3/valid_7-0.jsonl": "{"text":"The compound COC[C@H]C)S=O)=O)cccccc6CNC)C=O)[C@H]NcccccN)nccF)c6c%10)))))))))))cccccC)c6))[C@@H]C)COC=O)N%16 targets the protein Coagulation factor VII (EC 3.4.21.21) (Proconvertin) (Serum prothrombin conversion accelerator) (SPCA) (Eptacog alfa) which is involved in the positive regulation of leukocyte chemotaxis."} {"text":"The compound CCNCC))S=O)=O)cccccc6-cccc-ccnccc6)OCCN6)))))))))cF)c6 targets the protein MK-886-binding protein which is located in the membrane."}", "/scratch/micpie/export/compound_protein_go_term_3/valid_8-1.jsonl": "{"text":"The compound CCN(CC)S(=O)(=O)c1ccccc1-c1ccc(-c2cnc3c(c2)OCCN3)c(F)c1 targets the protein MK-886-binding protein. The protein MK-886-binding protein is located in the cytosol."} {"text":"The compound CC(C)[C@H](NC(=O)[C@H](C)NC(=O)[C@@H](NC(=O)c1cccc2ccccc12)C(C)(C)C)C(=O)C(=O)NCc1ccc(S(N)(=O)=O)cc1 targets the protein Cathepsin B (EC 3.4.22.1) (APP secretase) (APPS) (Cathepsin B1). The protein Cathepsin B (EC 3.4.22.1) (APP secretase) (APPS) (Cathepsin B1) is located in the collagen-containing extracellular matrix."}", "/scratch/micpie/export/compound_protein_go_term_3/train_0-1.jsonl": "{"text":"The compound [C][C][C][=C][C][Branch2][Branch1][=Branch1][C][N][C][=C][C][Branch1][C][Cl][=N][N][C][Branch2][Ring1][N][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][=C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][C][Ring1][Ring1][=C][Ring1][#C][=C][Branch1][C][C][N][=C][Ring2][Ring1][#Branch2][Ring2][Ring1][Branch1][=C][C][=N][Ring2][Ring2][C] targets the protein PI4K-beta. 
The protein PI4K-beta enables the phosphatidylinositol kinase activity."} {"text":"The compound InChI=1S\/C17H16FN7O\/c1-9-6-15(25-24-9)22-16-10(2)21-14(7-19)17(23-16)26-11(3)13-5-4-12(18)8-20-13\/h4-6,8,11H,1-3H3,(H2,22,23,24,25)\/t11-\/m0\/s1 targets the protein Tyrosine-protein kinase JAK2. The protein Tyrosine-protein kinase JAK2 is involved in the immune response."}", "/scratch/micpie/export/compound_protein_go_term_3/valid_8-0.jsonl": "{"text":"The compound [C][C][N][Branch1][Ring1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][Branch1][P][C][=C][N][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][C][N][Ring1][#Branch1][C][Branch1][C][F][=C][Ring1][P] targets the protein MK-886-binding protein which is located in the cytosol."} {"text":"The compound InChI=1S\/C33H41N5O7S\/c1-19(2)26(27(39)31(42)35-18-21-14-16-23(17-15-21)46(34,44)45)37-29(40)20(3)36-32(43)28(33(4,5)6)38-30(41)25-13-9-11-22-10-7-8-12-24(22)25\/h7-17,19-20,26,28H,18H2,1-6H3,(H,35,42)(H,36,43)(H,37,40)(H,38,41)(H2,34,44,45)\/t20-,26-,28+\/m0\/s1 targets the protein Cathepsin B (EC 3.4.22.1) (APP secretase) (APPS) (Cathepsin B1) which is located in the collagen-containing extracellular matrix."}", "/scratch/micpie/export/compound_protein_go_term_3/test_9-1.jsonl": "{"text":"The compound [N][C][=N][C][=C][C][Branch2][Ring1][Branch2][N][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][N][=C][C][=C][Ring1][=Branch1][Ring1][#C][=N][Ring2][Ring1][Branch1] targets the protein HsCdc7. The protein HsCdc7 enables the protein serine kinase activity."} {"text":"The compound CCC(C(=O)O)C1CCc2cc(OCCc3nc(-c4ccc(OC)c(OC)c4)oc3C)ccc21 targets the protein PPAR-gamma. 
The protein PPAR-gamma is located in the nucleus."}", "/scratch/micpie/export/compound_protein_go_term_3/valid_1-0.jsonl": "{"text":"The compound N#CCCnccCN)=O))cNccccS=O)=O)CF)F)))cc6)))))))n5)))))CCCNCCF)C4))))CC6 targets the protein Janus kinase 2 which enables the peptide hormone receptor binding."} {"text":"The compound [C][C][=C][S][C][Branch2][Ring1][N][C@@][C][N][C][C][C@][Ring1][=Branch1][Branch1][#C][C][=C][C][=C][Branch1][C][Cl][C][Branch1][C][Cl][=C][Ring1][Branch2][C][Ring1][#C][=N][Ring2][Ring1][Ring2] targets the protein Norepinephrine transporter which is involved in the nitrogen compound transport."}", "/scratch/micpie/export/compound_protein_go_term_3/train_6-0.jsonl": "{"text":"The compound CC(C)(NS(=O)(=O)c1ccccc1)C(=O)NC1=NN=C(c2ccc(Cl)cc2)CS1 targets the protein 72 kDa type IV collagenase (EC 3.4.24.24) (72 kDa gelatinase) (Gelatinase A) (Matrix metalloproteinase-2) (MMP-2) (TBE-1) which is involved in the response to electrical stimulus."} {"text":"The compound CCCC[C@H]1CN(c2ccccc2)CC[C@H]1N targets the protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26) which is located in the plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_3/train_3-1.jsonl": "{"text":"The compound [C][C][=C][C][=C][Branch2][Ring1][N][S][=Branch1][C][=O][=Branch1][C][=O][O][C][=C][C][=C][C][Branch1][=Branch2][C][C][C][N][C][C][Ring1][=Branch1][=C][Ring1][N][C][C][=C][Ring2][Ring1][#Branch1] targets the protein Serotonin receptor 6. 
The protein Serotonin receptor 6 enables the neurotransmitter receptor activity."} {"text":"The compound [C][C][=C][C][Branch2][Branch1][C][C][O][C][=C][C][=C][Branch2][Ring2][Ring2][S][=Branch1][C][=O][=Branch1][C][=O][C][C][Branch2][Ring1][C][C][C][C][C][C][Branch1][Branch1][C][C][Ring1][=Branch1][O][C][C][O][Ring1][#Branch1][N][Branch1][C][O][C][=O][C][=C][Ring2][Ring1][#Branch2][=C][C][=C][C][=C][C][Ring1][=Branch1][=N][Ring2][Ring2][=Branch1] targets the protein Collagenase 3. The protein Collagenase 3 enables the metallopeptidase activity."}", "/scratch/micpie/export/compound_protein_go_term_3/test_8-0.jsonl": "{"text":"The compound InChI=1S\/C23H19FN6O\/c24-18-10-15(6-7-17(18)19-12-27-21(25)13-26-19)16-4-1-2-5-20(16)31-23-11-22(28-14-29-23)30-8-3-9-30\/h1-2,4-7,10-14H,3,8-9H2,(H2,25,27) targets the protein FLAP which enables the arachidonic acid binding."} {"text":"The compound [N][#C][C][Branch2][Ring2][Branch1][N][C][=Branch1][C][=O][C@H1][Branch1][=Branch1][C][C][C][C][N][N][C][=Branch1][C][=O][C][=C][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][N][C][C][Ring2][Ring1][O] targets the protein Cathepsin B (EC 3.4.22.1) (APP secretase) (APPS) (Cathepsin B1) which is located in the collagen-containing extracellular matrix."}", "/scratch/micpie/export/compound_protein_go_term_3/valid_3-1.jsonl": "{"text":"The compound InChI=1S\/C31H42Cl2N4O\/c1-4-5-15-34-16-19-36-22-37(23-9-7-6-8-10-23)31(29(36)38)13-17-35(18-14-31)28-11-12-30(2,3)25-21-27(33)26(32)20-24(25)28\/h6-10,20-21,28,34H,4-5,11-19,22H2,1-3H3 targets the protein Kappa-type 3 opioid receptor. 
The protein Kappa-type 3 opioid receptor enables the neuropeptide binding."} {"text":"The compound InChI=1S\/C18H21N2O5P\/c1-24-15-7-9-16(10-8-15)26(23,25-2)20-12-14-6-4-3-5-13(14)11-17(20)18(21)19-22\/h3-10,17,22H,11-12H2,1-2H3,(H,19,21)\/t17-,26?\/m1\/s1 targets the protein Matrix metalloproteinase-9 (MMP-9) (EC 3.4.24.35) (92 kDa gelatinase) (92 kDa type IV collagenase) (Gelatinase B) (GELB). The protein Matrix metalloproteinase-9 (MMP-9) (EC 3.4.24.35) (92 kDa gelatinase) (92 kDa type IV collagenase) (Gelatinase B) (GELB) is located in the extracellular region."}", "/scratch/micpie/export/compound_protein_go_term_3/train_9-1.jsonl": "{"text":"The compound CC[C@@H](C)[C@H](NC(=O)OCc1ccccc1)C(=O)N[C@@H](C=O)Cc1c[nH]c2ccccc12 targets the protein Procathepsin L (EC 3.4.22.15) (Cathepsin L1) (Major excreted protein) (MEP). The protein Procathepsin L (EC 3.4.22.15) (Cathepsin L1) (Major excreted protein) (MEP) is involved in the protein autoprocessing."} {"text":"The compound InChI=1S\/C21H32O2\/c1-4-21-12-11-19(2,23)13-14(21)5-6-15-16-7-8-18(22)20(16,3)10-9-17(15)21\/h4-5,15-18,22-23H,1,6-13H2,2-3H3\/t15-,16-,17-,18-,19-,20-,21-\/m0\/s1 targets the protein Nuclear receptor subfamily 3 group A member 2. The protein Nuclear receptor subfamily 3 group A member 2 is involved in the negative regulation of transcription by RNA polymerase II."}", "/scratch/micpie/export/compound_protein_go_term_3/test_4-1.jsonl": "{"text":"The compound CCOC(=O)COc1cc(C(=O)O)c(O)c2ccccc12 targets the protein Caspase-9 (CASP-9) (EC 3.4.22.62) (Apoptotic protease Mch-6) (Apoptotic protease-activating factor 3) (APAF-3) (ICE-like apoptotic protease 6) (ICE-LAP6). 
The protein Caspase-9 (CASP-9) (EC 3.4.22.62) (Apoptotic protease Mch-6) (Apoptotic protease-activating factor 3) (APAF-3) (ICE-like apoptotic protease 6) (ICE-LAP6) enables the cysteine-type endopeptidase activity involved in execution phase of apoptosis."} {"text":"The compound InChI=1S\/C16H19N7O6S2\/c17-13-10-14(21-7-20-13)23(22-15(10)30-6-8-3-1-2-4-19-8)16-12(25)11(24)9(29-16)5-28-31(18,26)27\/h1-4,7,9,11-12,16,24-25H,5-6H2,(H2,17,20,21)(H2,18,26,27)\/t9-,11-,12-,16-\/m1\/s1 targets the protein ATG12-activating enzyme E1 ATG7. The protein ATG12-activating enzyme E1 ATG7 is located in the phagophore assembly site."}", "/scratch/micpie/export/compound_protein_go_term_3/valid_6-1.jsonl": "{"text":"The compound CC(C)NC(=O)c1ccc(-c2ccc(-c3ccccc3)n2Cc2cccc(N)n2)cc1 targets the protein Memapsin-2. The protein Memapsin-2 enables the enzyme binding."} {"text":"The compound COC[C@H]C)S=O)=O)cccccc6CNC)C=O)[C@H]NcccccN)nccF)c6c%10)))))))))))cccccC)c6))[C@@H]C)COC=O)N%16 targets the protein Coagulation factor VII (EC 3.4.21.21) (Proconvertin) (Serum prothrombin conversion accelerator) (SPCA) (Eptacog alfa). 
The protein Coagulation factor VII (EC 3.4.21.21) (Proconvertin) (Serum prothrombin conversion accelerator) (SPCA) (Eptacog alfa) is involved in the proteolysis."}", "/scratch/micpie/export/compound_protein_go_term_3/valid_6-0.jsonl": "{"text":"The compound InChI=1S\/C26H26N4O\/c1-18(2)28-26(31)21-13-11-20(12-14-21)24-16-15-23(19-7-4-3-5-8-19)30(24)17-22-9-6-10-25(27)29-22\/h3-16,18H,17H2,1-2H3,(H2,27,29)(H,28,31) targets the protein Beta-secretase 1 which enables the enzyme binding."} {"text":"The compound COC[C@H](C)S(=O)(=O)c1ccc2cc1CN(C)C(=O)[C@H](Nc1ccc3c(N)ncc(F)c3c1)c1ccc(c(C)c1)[C@@H](C)COC(=O)N2 targets the protein Coagulation factor VII (EC 3.4.21.21) (Proconvertin) (Serum prothrombin conversion accelerator) (SPCA) (Eptacog alfa) which is involved in the proteolysis."}", "/scratch/micpie/export/compound_protein_go_term_3/train_3-0.jsonl": "{"text":"The compound InChI=1S\/C19H23NO3S\/c1-14-6-8-17(9-7-14)24(21,22)23-19-5-3-4-18(15(19)2)16-10-12-20-13-11-16\/h3-9,16,20H,10-13H2,1-2H3 targets the protein Serotonin receptor 6 which enables the neurotransmitter receptor activity."} {"text":"The compound CcccCOccccS=O)=O)CCCCCCCCC6))OCCO5)))))))))NO)C=O))))))cc6))))))))cccccc6n%10 targets the protein MMP-13 which enables the metallopeptidase activity."}", "/scratch/micpie/export/compound_protein_go_term_3/train_7-0.jsonl": "{"text":"The compound InChI=1S\/C17H22FN3O5S\/c1-16(10-27(23,24)17(15(19)21-16)5-6-26-9-17)12-7-11(3-4-13(12)18)20-14(22)8-25-2\/h3-4,7H,5-6,8-10H2,1-2H3,(H2,19,21)(H,20,22)\/t16-,17-\/m0\/s1 targets the protein Membrane-associated aspartic protease 2 which enables the enzyme binding."} {"text":"The compound COccncOcccccc6-cccc-ccncN)nc6))))))cF)c6)))))))))))))nc6 targets the protein MK-886-binding protein which is located in the nuclear envelope."}", "/scratch/micpie/export/compound_protein_go_term_3/train_4-0.jsonl": "{"text":"The compound 
InChI=1S\/C29H34N2O7S\/c1-21-16-23(27-4-2-3-5-28(27)30-21)18-36-25-6-8-26(9-7-25)39(34,35)19-24(31(33)20-32)17-22-10-12-29(13-11-22)37-14-15-38-29\/h2-9,16,20,22,24,33H,10-15,17-19H2,1H3 targets the protein Collagenase 3 which enables the peptidase activity."} {"text":"The compound InChI=1S\/C24H26N6O\/c1-28-10-12-29(13-11-28)14-15-31-22-4-2-19(3-5-22)21-16-26-24-23(17-27-30(24)18-21)20-6-8-25-9-7-20\/h2-9,16-18H,10-15H2,1H3 targets the protein BMP-2B which is located in the extracellular space."}", "/scratch/micpie/export/formation_energies/test_0-10.jsonl": "{"text":"User: I want to design a material that is thermodynamically stable. What chemical formula should I use?\nAssistant: Do you have any other constraints?\nUser: The material should have a decomposition enthalpy of -0.006 eV\/atom.\nAssistant: I would recommend using Ac1Au2Hg1."} {"text":"User: I want to design a material that is not thermodynamically stable. What chemical formula should I use?\nAssistant: Are there any other preferences?\nUser: The material should have a decomposition enthalpy of 0.003 eV\/atom.\nAssistant: I suggest using Zn22Zr1."}", "/scratch/micpie/export/formation_energies/valid_0-8.jsonl": "{"text":"User: I need help with Ac1Ag2Tl1.\nAssistant: How can I assist? What is your question?\nUser: Ac1Ag2Tl1 is thermodynamically stable because its decomposition enthalpy is -0.060 eV\/atom."} {"text":"User: I have a question about Yb3Zr1.\nAssistant: How can I be of help? What do you want to know?\nUser: Yb3Zr1 is not thermodynamically stable because its decomposition enthalpy is 0.344 eV\/atom."}", "/scratch/micpie/export/formation_energies/train_0-8.jsonl": "{"text":"User: I have a question about Ac1Ag1.\nAssistant: Sure. \nUser: Ac1Ag1 is thermodynamically stable because its decomposition enthalpy is -0.123 eV\/atom."} {"text":"User: I have a question about Zn3Zr1.\nAssistant: Sure.
What do you want to know?\nUser: Zn3Zr1 is thermodynamically stable because its decomposition enthalpy is -0.051 eV\/atom."}", "/scratch/micpie/export/formation_energies/test_0-5.jsonl": "{"text":"User: What is the decomposition enthalpy of Ac1Au2Hg1?\nAssistant: -0.006 eV\/atom."} {"text":"User: What is the decomposition enthalpy of Zn22Zr1?\nAssistant: 0.003 eV\/atom."}", "/scratch/micpie/export/formation_energies/valid_0-9.jsonl": "{"text":"User: I want to design a thermodynamically stable material. What chemical formula should I use?\nAssistant: I would use Ac1Ag2Tl1."} {"text":"User: I want to design a compound that is not thermodynamically stable. What composition should I use?\nAssistant: I would recommend using Yb3Zr1."}", "/scratch/micpie/export/formation_energies/test_0-1.jsonl": "{"text":"The formation enthalpy of Ac1Au2Hg1 is -0.601 eV\/atom."} {"text":"The formation enthalpy of Zn22Zr1 is -0.056 eV\/atom."}", "/scratch/micpie/export/formation_energies/valid_0-0.jsonl": "{"text":"The decomposition enthalpy of Ac1Ag2Tl1 is -0.060 eV\/atom."} {"text":"The decomposition enthalpy of Yb3Zr1 is 0.344 eV\/atom."}", "/scratch/micpie/export/formation_energies/test_0-2.jsonl": "{"text":"The decomposition enthalpy of Ac1Au2Hg1 is -0.006 eV\/atom."} {"text":"The decomposition enthalpy of Zn22Zr1 is 0.003 eV\/atom."}", "/scratch/micpie/export/formation_energies/valid_0-10.jsonl": "{"text":"User: I want to design a structure that is thermodynamically stable. What chemical formula should I use?\nAssistant: Are there any other requirements?\nUser: The structure should have a decomposition enthalpy of -0.060 eV\/atom.\nAssistant: I recommend using Ac1Ag2Tl1."} {"text":"User: I want to design a structure that is not thermodynamically stable.
What composition should I use?\nAssistant: Do you have any other constraints?\nUser: The structure should have a decomposition enthalpy of 0.344 eV\/atom.\nAssistant: I recommend using Yb3Zr1."}", "/scratch/micpie/export/formation_energies/train_0-6.jsonl": "{"text":"User: What is the formation enthalpy of Ac1Ag1?\nAssistant: The formation enthalpy of Ac1Ag1 is -0.257 eV\/atom."} {"text":"User: What is the formation enthalpy of Zn3Zr1?\nAssistant: The formation enthalpy of Zn3Zr1 is -0.280 eV\/atom."}", "/scratch/micpie/export/formation_energies/valid_0-6.jsonl": "{"text":"User: What is the formation enthalpy of Ac1Ag2Tl1?\nAssistant: The formation enthalpy of Ac1Ag2Tl1 is -0.225 eV\/atom."} {"text":"User: What is the formation enthalpy of Yb3Zr1?\nAssistant: The formation enthalpy of Yb3Zr1 is 0.344 eV\/atom."}", "/scratch/micpie/export/formation_energies/test_0-9.jsonl": "{"text":"User: I want to design a thermodynamically stable structure. What chemical formula should I use?\nAssistant: I suggest using Ac1Au2Hg1."} {"text":"User: I want to design a material that is not thermodynamically stable. What composition should I use?\nAssistant: I would use Zn22Zr1."}", "/scratch/micpie/export/formation_energies/test_0-0.jsonl": "{"text":"The decomposition enthalpy of Ac1Au2Hg1 is -0.006 eV\/atom."} {"text":"The decomposition enthalpy of Zn22Zr1 is 0.003 eV\/atom."}", "/scratch/micpie/export/formation_energies/valid_0-7.jsonl": "{"text":"User: What is the decomposition enthalpy of Ac1Ag2Tl1?\nAssistant: The decomposition enthalpy is -0.060 eV\/atom."} {"text":"User: What is the decomposition enthalpy of Yb3Zr1?\nAssistant: The decomposition enthalpy is 0.344 eV\/atom."}", "/scratch/micpie/export/formation_energies/test_0-3.jsonl": "{"text":"The decomposition reaction of Ac1Au2Hg1 is 0.6667 Ac1Au3 + 0.3333 Ac1Hg3."} {"text":"The decomposition reaction of Zn22Zr1 is 6.0 Zn + 1.0 Zn16Zr1."}", "/scratch/micpie/export/formation_energies/valid_0-11.jsonl": "{"text":"User: I want to design a structure that is thermodynamically stable.
What chemical formula should I use?\nAssistant: Are there any other constraints?\nUser: The structure should have a formation enthalpy of -0.225 eV\/atom.\nAssistant: I suggest using Ac1Ag2Tl1."} {"text":"User: I want to design a compound that is not thermodynamically stable. What composition should I use?\nAssistant: Do you have any other requirements?\nUser: The compound should have a formation enthalpy of 0.344 eV\/atom.\nAssistant: I recommend using Yb3Zr1."}", "/scratch/micpie/export/formation_energies/train_0-0.jsonl": "{"text":"The decomposition enthalpy of Ac1Ag1 is -0.123 eV\/atom."} {"text":"The decomposition enthalpy of Zn3Zr1 is -0.051 eV\/atom."}", "/scratch/micpie/export/formation_energies/test_0-6.jsonl": "{"text":"User: What is the formation enthalpy of Ac1Au2Hg1?\nAssistant: The formation enthalpy of Ac1Au2Hg1 is -0.601 eV\/atom."} {"text":"User: What is the formation enthalpy of Zn22Zr1?\nAssistant: The formation enthalpy of Zn22Zr1 is -0.056 eV\/atom."}", "/scratch/micpie/export/formation_energies/train_0-10.jsonl": "{"text":"User: I want to design a material that is thermodynamically stable. What composition should I use?\nAssistant: Are there any other constraints?\nUser: The material should have a decomposition enthalpy of -0.123 eV\/atom.\nAssistant: I would recommend using Ac1Ag1."} {"text":"User: I want to design a material that is thermodynamically stable.
What composition should I use?\nAssistant: Do you have any other constraints?\nUser: The material should have a decomposition enthalpy of -0.051 eV\/atom.\nAssistant: I suggest using Zn3Zr1."}", "/scratch/micpie/export/formation_energies/train_0-3.jsonl": "{"text":"The decomposition reaction of Ac1Ag1 is 0.6667 Ac + 0.3333 Ac1Ag3."} {"text":"The decomposition reaction of Zn3Zr1 is 0.0714 Zn16Zr1 + 0.9286 Zn2Zr1."}", "/scratch/micpie/export/formation_energies/train_0-12.jsonl": "{"text":"Task: Classify the stability of Ac1Ag1.\nConstraint: Give a reason for your answer.\nAnswer: The material is thermodynamically stable because its decomposition enthalpy is -0.123 eV\/atom."} {"text":"Task: Classify the stability of Zn3Zr1.\nConstraint: Give a reason for your answer.\nAnswer: The crystal is thermodynamically stable because its decomposition enthalpy is -0.051 eV\/atom."}", "/scratch/micpie/export/formation_energies/test_0-13.jsonl": "{"text":"Question: What is a compound with the following decomposition reaction?\nDescription: 0.6667 Ac1Au3 + 0.3333 Ac1Hg3\nAnswer: Ac1Au2Hg1"} {"text":"Question: What is a compound with the following decomposition reaction?\nDescription: 6.0 Zn + 1.0 Zn16Zr1\nAnswer: Zn22Zr1"}", "/scratch/micpie/export/formation_energies/valid_0-2.jsonl": "{"text":"The decomposition enthalpy of Ac1Ag2Tl1 is -0.060 eV\/atom."} {"text":"The decomposition enthalpy of Yb3Zr1 is 0.344 eV\/atom."}", "/scratch/micpie/export/formation_energies/valid_0-1.jsonl": "{"text":"The formation enthalpy of Ac1Ag2Tl1 is -0.225 eV\/atom."} {"text":"The formation enthalpy of Yb3Zr1 is 0.344 eV\/atom."}", "/scratch/micpie/export/formation_energies/valid_0-13.jsonl": "{"text":"Question: What is a compound with the following decomposition reaction?\nDescription: 0.5 Ac1Ag1 + 0.5 Ac1Ag3 + 1.0 Tl\nAnswer: Ac1Ag2Tl1"} {"text":"Question: What is a compound with the following decomposition reaction?\nDescription: 3 Yb + 1 Zr\nAnswer: Yb3Zr1"}", 
"/scratch/micpie/export/formation_energies/valid_0-5.jsonl": "{"text":"User: What is the decomposition enthalpy of Ac1Ag2Tl1?\nAssistant: The decomposition enthalpy of Ac1Ag2Tl1 is -0.060 eV\/atom."} {"text":"User: What is the decomposition enthalpy of Yb3Zr1?\nAssistant: 0.344 eV\/atom."}", "/scratch/micpie/export/formation_energies/valid_0-4.jsonl": "{"text":"The compound with composition Ac1Ag2Tl1 is thermodynamically stable because its decomposition enthalpy is -0.060 eV\/atom."} {"text":"The inorganic material with composition Yb3Zr1 is not thermodynamically stable because its decomposition enthalpy is 0.344 eV\/atom."}", "/scratch/micpie/export/formation_energies/train_0-5.jsonl": "{"text":"User: What is the decomposition enthalpy of Ac1Ag1?\nAssistant: The decomposition enthalpy of Ac1Ag1 is -0.123 eV\/atom."} {"text":"User: What is the decomposition enthalpy of Zn3Zr1?\nAssistant: The decomposition enthalpy of Zn3Zr1 is -0.051 eV\/atom."}", "/scratch/micpie/export/formation_energies/valid_0-12.jsonl": "{"text":"Task: Classify the stability of Ac1Ag2Tl1.\nConstraint: Give a reason for your answer.\nAnswer: The material is thermodynamically stable because its decomposition enthalpy is -0.060 eV\/atom."} {"text":"Task: Classify the stability of Yb3Zr1.\nConstraint: Give a reason for your answer.\nAnswer: The solid is not thermodynamically stable because its decomposition enthalpy is 0.344 eV\/atom."}", "/scratch/micpie/export/formation_energies/train_0-2.jsonl": "{"text":"The decomposition enthalpy of Ac1Ag1 is -0.123 eV\/atom."} {"text":"The decomposition enthalpy of Zn3Zr1 is -0.051 eV\/atom."}", "/scratch/micpie/export/formation_energies/test_0-11.jsonl": "{"text":"User: I want to design a material that is thermodynamically stable.
What composition should I use?\nAssistant: Are there any other constraints?\nUser: The material should have a formation enthalpy of -0.601 eV\/atom.\nAssistant: I recommend using Ac1Au2Hg1."} {"text":"User: I want to design a compound that is not thermodynamically stable. What chemical formula should I use?\nAssistant: Are there any other constraints?\nUser: The compound should have a formation enthalpy of -0.056 eV\/atom.\nAssistant: I suggest using Zn22Zr1."}", "/scratch/micpie/export/formation_energies/train_0-7.jsonl": "{"text":"User: What is the decomposition enthalpy of Ac1Ag1?\nAssistant: The decomposition enthalpy is -0.123 eV\/atom."} {"text":"User: What is the decomposition enthalpy of Zn3Zr1?\nAssistant: The decomposition enthalpy is -0.051 eV\/atom."}", "/scratch/micpie/export/formation_energies/train_0-11.jsonl": "{"text":"User: I want to design a compound that is thermodynamically stable. What composition should I use?\nAssistant: Are there any other constraints?\nUser: The compound should have a formation enthalpy of -0.257 eV\/atom.\nAssistant: I recommend using Ac1Ag1."} {"text":"User: I want to design a compound that is thermodynamically stable. 
What chemical formula should I use?\nAssistant: Are there any other requirements?\nUser: The compound should have a formation enthalpy of -0.280 eV\/atom.\nAssistant: I recommend using Zn3Zr1."}", "/scratch/micpie/export/formation_energies/train_0-1.jsonl": "{"text":"The formation enthalpy of Ac1Ag1 is -0.257 eV\/atom."} {"text":"The formation enthalpy of Zn3Zr1 is -0.280 eV\/atom."}", "/scratch/micpie/export/formation_energies/train_0-13.jsonl": "{"text":"Question: What is a compound with the following decomposition reaction?\nDescription: 0.6667 Ac + 0.3333 Ac1Ag3\nAnswer: Ac1Ag1"} {"text":"Question: What is a compound with the following decomposition reaction?\nDescription: 0.0714 Zn16Zr1 + 0.9286 Zn2Zr1\nAnswer: Zn3Zr1"}", "/scratch/micpie/export/formation_energies/train_0-4.jsonl": "{"text":"The solid with composition Ac1Ag1 is thermodynamically stable because its decomposition enthalpy is -0.123 eV\/atom."} {"text":"The compound with composition Zn3Zr1 is thermodynamically stable because its decomposition enthalpy is -0.051 eV\/atom."}", "/scratch/micpie/export/formation_energies/test_0-7.jsonl": "{"text":"User: What is the decomposition enthalpy of Ac1Au2Hg1?\nAssistant: The decomposition enthalpy is -0.006 eV\/atom."} {"text":"User: What is the decomposition enthalpy of Zn22Zr1?\nAssistant: The decomposition enthalpy is 0.003 eV\/atom."}", "/scratch/micpie/export/formation_energies/train_0-9.jsonl": "{"text":"User: I want to design a thermodynamically stable material. What composition should I use?\nAssistant: I would use Ac1Ag1."} {"text":"User: I want to design a thermodynamically stable material. 
What composition should I use?\nAssistant: I suggest using Zn3Zr1."}", "/scratch/micpie/export/formation_energies/valid_0-3.jsonl": "{"text":"The decomposition reaction of Ac1Ag2Tl1 is 0.5 Ac1Ag1 + 0.5 Ac1Ag3 + 1.0 Tl."} {"text":"The decomposition reaction of Yb3Zr1 is 3 Yb + 1 Zr."}", "/scratch/micpie/export/formation_energies/test_0-8.jsonl": 
"{"text":"User: I need help with Ac1Au2Hg1.\nAssistant: Happy to help. What do you want to know?\nUser: Ac1Au2Hg1 is thermodynamically stable because its decomposition enthalpy is -0.006 eV\/atom."} {"text":"User: I need help with Zn22Zr1.\nAssistant: How can I be of help? What is your question?\nUser: Zn22Zr1 is not thermodynamically stable because its decomposition enthalpy is 0.003 eV\/atom."}", "/scratch/micpie/export/formation_energies/test_0-4.jsonl": "{"text":"The material with composition Ac1Au2Hg1 is thermodynamically stable because its decomposition enthalpy is -0.006 eV\/atom."} {"text":"The inorganic material with composition Zn22Zr1 is not thermodynamically stable because its decomposition enthalpy is 0.003 eV\/atom."}", "/scratch/micpie/export/formation_energies/test_0-12.jsonl": "{"text":"Task: Classify the stability of Ac1Au2Hg1.\nConstraint: Give a reason for your answer.\nAnswer: The crystal is thermodynamically stable because its decomposition enthalpy is -0.006 eV\/atom."} {"text":"Task: Classify the stability of Zn22Zr1.\nConstraint: Give a reason for your answer.\nAnswer: The crystal is not thermodynamically stable because its decomposition enthalpy is 0.003 eV\/atom."}", "/scratch/micpie/export/bio_ner_11/valid_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the text below.\nText: In vitro binding studies demonstrate that eIF-5A is required for efficient interaction of Rev-NES with CRM1\/exportin1 and that eIF-5A interacts with the nucleoporins CAN\/nup214, nup153, nup98, and nup62..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: eIF - 5A,42,50,Gene\/Protein\nRev,92,95,Gene\/Protein\nNES,98,101,Gene\/Protein\nCRM1,107,111,Gene\/Protein\nexportin1,114,123,Gene\/Protein\neIF - 
5A,133,141,Gene\/Protein\nCAN,174,177,Gene\/Protein\nnup214,180,186,Gene\/Protein\nnup153,188,194,Gene\/Protein\nnup98,196,201,Gene\/Protein\nnup62,207,212,Gene\/Protein"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Growth with CO and different CO-rich gas mixtures by pure and mixed cultures 1mL of a CO-consuming culture (mixed or pure isolate) in exponential growth was used to inoculate glass serum bottles (160mL) with 50mL reduced anaerobic medium and different substrates: CO, CO: CO2, CO: H2, CO: CO2: H2, or CO: yeast extract (YE)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: CO,12,14,treatment\nCO,29,31,treatment\nCO,88,90,treatment\nreduced anaerobic medium,219,243,treatment\nCO,270,272,treatment\nCO,274,276,treatment\nCO,278,280,treatment\nH2,287,289,treatment\nCO,291,293,treatment\nH2,300,302,treatment\nCO,307,309,treatment"}", "/scratch/micpie/export/bio_ner_11/test_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: Similarly, we examined whether the ELK1, SAP1a, FLI1, EWS-FLI1, ETS1, ETS2, PEA3 and PU. 
1 proteins can form ternary complexes with SRF on the Egr1 SREI and II..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: ELK1,35,39,Gene\/Protein\nSAP1a,41,46,Gene\/Protein\nFLI1,48,52,Gene\/Protein\nEWS,54,57,Gene\/Protein\nFLI1,60,64,Gene\/Protein\nETS1,66,70,Gene\/Protein\nETS2,72,76,Gene\/Protein\nPEA3,78,82,Gene\/Protein\nSRF,134,137,Gene\/Protein\nEgr1,145,149,Gene\/Protein\nSREI and II,150,161,Gene\/Protein"} {"text":"Task: Please carry out the NER task for the the text below.\nText: The synbiotic formulation was administered per os over the course of 75days hidden in daily treats (consisting of banana, yogurt, pumpkin puree, oats, cinnamon, water, honey, gelatin, probiotic, and prebiotic; Fig. 1)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: synbiotic formulation,4,25,treatment\nbanana,115,121,treatment\nyogurt,123,129,treatment\npumpkin puree,131,144,treatment\noats,146,150,treatment\ncinnamon,152,160,treatment\nwater,162,167,treatment\nhoney,169,174,treatment\ngelatin,176,183,treatment\nprobiotic,185,194,treatment\nprebiotic,200,209,treatment"}", "/scratch/micpie/export/bio_ner_11/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Expression of various FAK mutants in the FAK-cells showed that FAK kinase activity, the Tyr-397\/SH2 domain binding site, and the first proline-rich SH3 binding region in the FAK C-terminal domain were individually needed to promote full FAK-mediated FAK-cell migration to FN whereas direct paxillin binding to FAK was not required..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: FAK 
mutants,22,33,Gene\/Protein\nFAK,41,44,Gene\/Protein\nFAK kinase,65,75,Gene\/Protein\nTyr - 397 \/ SH2 domain binding site,90,125,Gene\/Protein\nSH3 binding region,156,174,Gene\/Protein\nFAK C - terminal domain,182,205,Gene\/Protein\nFAK,247,250,Gene\/Protein\nFAK,262,265,Gene\/Protein\nFN,286,288,Gene\/Protein\npaxillin,304,312,Gene\/Protein\nFAK,324,327,Gene\/Protein"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Stability by this measure, and stability as a measure of whether a woman remained in one of the previously defined clusters, or switched among clusters, were tested for associations with BMI (underweight, normal, overweight, obese), previous pregnancy, marital status (partnered versus single), alcohol use, vaginal sex, use of condoms, and use of unscented tampons during the study period by using Fisher Exact tests..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: woman,67,72,state\nBMI,187,190,state\nunderweight,193,204,state\nnormal,206,212,state\noverweight,214,224,state\nobese,226,231,state\nprevious pregnancy,234,252,state\nalcohol use,297,308,state\nvaginal sex,310,321,state\nuse of condoms,323,337,state\nuse of unscented tampons,343,367,state"}", "/scratch/micpie/export/bio_ner_13/valid_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: It contains binding sites for several transcription factors, for example: (i) a well-characterized binding site for rel\/NF-kappaB transcription factors in its 3 '-end (the H2TF1 or kappaB1 element), (ii) a second kappaB site (the kappaB2 element), which is located immediately adjacent 5 ' to the H2TF1 element and which is recognized by p65\/relA in the human HLA system, and (iii) an AP-1\/ATF recognition sequence in the 5 ' end (EnA-TRE)..\nConstrain: Please, list the entities in the form NER entity, 
span start, span end, and type with a high probability of being in the text.\nResult: rel,119,122,Gene\/Protein\nNF - kappaB,125,136,Gene\/Protein\nH2TF1,182,187,Gene\/Protein\nkappaB1 element,191,206,Gene\/Protein\nkappaB site,224,235,Gene\/Protein\nkappaB2 element,242,257,Gene\/Protein\nH2TF1 element,309,322,Gene\/Protein\np65,350,353,Gene\/Protein\nrelA,356,360,Gene\/Protein\nhuman HLA,368,377,Gene\/Protein\nAP - 1,400,406,Gene\/Protein\nATF,409,412,Gene\/Protein\nEnA - TRE,451,460,Gene\/Protein"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Anti-α1-adrenergic receptor antibody (ab137123), anti-rabbit immunoglobulin G (IgG) horseradish peroxidase-conjugated secondary antibody (ab97051), and anti-mouse IgG horseradish peroxidase-conjugated secondary antibody (ab79023) were obtained from Abcam (Cambridge, MA, USA)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: α1 - adrenergic receptor,7,31,Gene\/Protein\nantibody,32,40,Gene\/Protein\nrabbit,62,68,Organism\/Species\nimmunoglobulin G,69,85,Gene\/Protein\nIgG,88,91,Gene\/Protein\nhorseradish,93,104,Organism\/Species\nperoxidase,105,115,Gene\/Protein\nantibody,139,147,Gene\/Protein\nmouse,172,177,Organism\/Species\nIgG,178,181,Gene\/Protein\nhorseradish,182,193,Organism\/Species\nperoxidase,194,204,Gene\/Protein\nantibody,228,236,Gene\/Protein"}", "/scratch/micpie/export/bio_ner_13/test_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: The results of supershift analysis using specific antibodies against transcription factors suggested that both binding complexes contained the NF-kappaB components p50 and p65, and did not contain other NF-kappaB proteins (p52, c-Rel, Rel B), AP-1 proteins (c-Fos, C-Jun), CREB or C\/EBPbeta (NF-IL6)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in 
separate lines with a high probability of being in the text.\nResult: NF - kappaB,143,154,Gene\/Protein\np50,166,169,Gene\/Protein\np65,174,177,Gene\/Protein\nNF - kappaB proteins,205,225,Gene\/Protein\np52,228,231,Gene\/Protein\nc - Rel,233,240,Gene\/Protein\nRel B,242,247,Gene\/Protein\nAP - 1 proteins,250,265,Gene\/Protein\nc - Fos,268,275,Gene\/Protein\nC - Jun,277,284,Gene\/Protein\nCREB,287,291,Gene\/Protein\nC \/ EBPbeta,295,306,Gene\/Protein\nNF - IL6,309,317,Gene\/Protein"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Sample ID Location Habitat Collection Date GPS Coordinates Elevation PES36 Lake 3 Cryoconite hole 31.01.2017 S 71.96345, E 23.31964 1342 m PES38 Lake 3 Cryoconite hole 31.01.2017 S 71.96345, E 23.31964 1342 m PES39 Lake 3 Cryoconite hole 31.01.2017 S 71.96345, E 23.31964 1342 m PES40 Lake 3 Cryoconite hole 31.01.2017 S 71.96345, E 23.31964 1342 m PES42 Petrelnuten Cryoconite hole 31.01.2017 S 72.01292, E 22.82887 1492 m PES43 Petrelnuten Cryoconite hole 31.01.2017 S 72.01292, E 22.82887 1492 m PES47 Lake 2 Cryoconite hole 01.02.2017 S 71.95867, E 23.31546 1316 m PES48 Lake 2 Cryoconite hole 01.02.2017 S 71.95867, E 23.31546 1316 m PES49 Lake 2 Cryoconite hole 01.02.2017 S 71.95867, E 23.31546 1316 m PES50 Lake 2 Cryoconite hole 01.02.2017 S 71.95867, E 23.31546 1316 m PES51 Lake 2 Cryoconite hole 01.02.2017 S 71.95867, E 23.31546 1316 m PES52 Dubois Cryoconite hole 01.02.2017 S 72.03793, E 23.29693 1361 m PES53 Dubois Cryoconite hole 01.02.2017 S 72.03793, E 23.29693 1361 m PES54 Dubois Cryoconite hole 01.02.2017 S 72.03793, E 23.29693 1361 m PES55 Dubois Cryoconite hole 01.02.2017 S 72.03793, E 23.29693 1361 m PES56 Dubois Cryoconite hole 01.02.2017 S 72.03793, E 23.29693 1361 m PES59 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES60 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES61 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 
22.82887 1492 m PES62 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES63 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES64 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES65 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES4 Utsteinen Soil 19.01.2017 S 71.94535, E 23.34500 1359 m PES6 Utsteinen Soil 19.01.2017 S 71.94575, E 23.34525 1367 m PES33 Dubois Soil 30.01.2017 S 72.05169, E 23.25497 1352 m PES35 Dubois Soil 30.01.2017 S 72.04891, E 23.28334 1341 m PES44 Petrelnuten Soil 31.01.2017 S 72.01266, E 22.82781 1511 m PES57 Utsteinen Snow 02.02.2017 S 71.95177, E 23.34854 1362 m PES2 Utsteinen Endolith 18.01.2017 S 71.94535, E 23.34500 1359 m PES32 Dubois Endolith 30.01.2017 S 72.04891, E 23.28334 1341 m PES34 Dubois Endolith 30.01.2017 S 72.05169, E 23.25497 1352 m PES41 Lake 3 Lake 31.01.2017 S 71.96589, E 23.33311 1315 m PES46 Lake 2 Lake 31.01.2017 S 71.95818, E 23.31509 1317 m Overview of all 13C and 14C data of the cryoconite hole and soil samples and the corresponding ages of the carbon..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: Lake,75,79,ecoregion\nLake,149,153,ecoregion\nLake,223,227,ecoregion\nLake,297,301,ecoregion\nLake,529,533,ecoregion\nLake,603,607,ecoregion\nLake,677,681,ecoregion\nLake,751,755,ecoregion\nLake,825,829,ecoregion\nLake,2415,2419,ecoregion\nLake,2422,2426,ecoregion\nLake,2478,2482,ecoregion\nLake,2485,2489,ecoregion"}", "/scratch/micpie/export/bio_ner_13/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Strong staining for bFGF was also found in cardiac muscle fibers, smooth muscle cells of mid-size blood vessels, the gut and the myometrium, in central nervous system neurons and cerebellar Purkinje cells, and on epithelial cells of the bronchi, colon, endometrium, and 
sweat gland ducts of the skin..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: cardiac muscle fibers,43,64,Anatomy\nsmooth muscle cells,66,85,Anatomy\nblood vessels,100,113,Anatomy\ngut,119,122,Anatomy\nmyometrium,131,141,Anatomy\ncentral nervous system neurons,146,176,Anatomy\ncerebellar Purkinje cells,181,206,Anatomy\nepithelial cells,215,231,Anatomy\nbronchi,239,246,Anatomy\ncolon,248,253,Anatomy\nendometrium,255,266,Anatomy\nsweat gland ducts,272,289,Anatomy\nskin,297,301,Anatomy"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Table1Operational parameters and process performanceParameterUnitR1R2R3tH444OLRkg COD\/m3\/day3.710.041.870.040.910.04NLRkg NH4N\/m3\/day0.220.020.220.020.220.02COD: N ratio100: 6100: 12100: 24Influent CODmg\/L1416147121434614Influent NH4Nmg\/L856856856ECOD% 98.31.196.01.486.30.6ENH4% 100100100ETN% 65.57.238.116.40.419.0Effluent SSmg\/L30198133287413Cycle length (t); organic and nitrogen loading rate (OLR, NLR); COD: N ratio; influent COD and NH4N concentration; removal efficiency (E) of COD, NH4 and TN during the last 4weeks; average effluent suspended solids concentration (SS) Sampling and chemical analysis Effluent parameters were measured three times a week with a Shimadzu TOC analyzer (total organic carbon, total nitrogen) and a Dionex ICS-900 ion chromatograph (NH4N, NO2N, NO3N)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: COD,82,85,state\nNH4N,132,136,state\nN,182,183,state\nCOD,218,221,state\nN,252,253,state\nCOD,272,275,state\nNH4N,480,484,state\nCOD,527,530,state\ntotal organic carbon,736,756,state\ntotal nitrogen,758,772,state\nNH4N,817,821,state\nNO2N,823,827,state\nNO3N,829,833,state"}", 
"/scratch/micpie/export/bio_ner_12/valid_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: The consensus gene order deduced by combining data from both crosses is D2Mit1-(Dbh, Notch1)-(Col5a1, Rxra)-Spna2-Ab l-(Ak1, Fpgs)-(Grp78, Pbx3)-(Epb7.2, Hc, Gsn)-Acra..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: D2Mit1,72,78,Gene\/Protein\nDbh,83,86,Gene\/Protein\nNotch1,88,94,Gene\/Protein\nCol5a1,100,106,Gene\/Protein\nRxra,108,112,Gene\/Protein\nSpna2,116,121,Gene\/Protein\nAb l,124,128,Gene\/Protein\nAk1,133,136,Gene\/Protein\nFpgs,138,142,Gene\/Protein\nGrp78,148,153,Gene\/Protein\nPbx3,155,159,Gene\/Protein\nAcra,185,189,Gene\/Protein"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: For clinical and inflammatory variables, the de novo plaque formation rate10 (PFR), gingival index19 (GI), plaque index20 (PI), bleeding on probing21 (BOP), pocket probing depth (PPD), recession depth (REC), and attachment level (AL) were utilized..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: plaque formation,53,69,state\nPFR,79,82,state\ngingival,85,93,state\nGI,104,106,state\nplaque,109,115,state\nbleeding on,131,142,state\nBOP,155,158,state\npocket probing depth,161,181,state\nPPD,184,187,state\nREC,208,211,state\nattachment level,218,234,state\nAL,237,239,state"}", "/scratch/micpie/export/bio_ner_12/test_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: We report that curcumin induces cell shrinkage, chromatin condensation, and DNA fragmentation, characteristics of apoptosis, in immortalized mouse embryo fibroblast NIH 3T3 erb B2 oncogene-transformed NIH 3T3, mouse sarcoma S180, 
human colon cancer cell HT-29, human kidney cancer cell 293, and human hepatocellular carcinoma Hep G2 cells, but not in primary culture of mouse embryonic fibroblast C3H 10T1\/2, rat embryonic fibroblast, and human foreskin fibroblast cells in a concentration-and time-dependent manner..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: cell,32,36,Anatomy\nchromatin,48,57,Anatomy\nembryo fibroblast NIH 3T3,147,172,Anatomy\nNIH 3T3,203,210,Anatomy\nsarcoma S180,218,230,Anatomy\ncolon cancer cell HT - 29,238,263,Anatomy\nkidney cancer cell 293,271,293,Anatomy\nhepatocellular carcinoma Hep G2 cells,305,342,Anatomy\nculture,363,370,Anatomy\nembryonic fibroblast C3H 10T1 \/ 2,380,413,Anatomy\nembryonic fibroblast,419,439,Anatomy\nforeskin fibroblast cells,451,476,Anatomy"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Tyloxapol inhibits activation of the transcription factor nuclear factor-kappa B (NK-kappa B), reduces resting secretion of the cytokine interleukin-8 (IL-8) in cultured human monocytes, and inhibits lipopolysaccharide (LPS)-stimulated release of tumor necrosis factor-alpha (TNF-alpha), IL-1 beta, IL-6, IL-8, granulocyte-macrophage colony-stimulating factor (GM-CSF), and the eiconsanoids thromboxane A2 and leukotriene B4 (LTB4)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: transcription factor nuclear factor - kappa B,37,82,Gene\/Protein\nNK - kappa B,85,97,Gene\/Protein\ncytokine,133,141,Gene\/Protein\ninterleukin - 8,142,157,Gene\/Protein\nIL - 8,160,166,Gene\/Protein\ntumor necrosis factor - alpha,260,289,Gene\/Protein\nTNF - alpha,292,303,Gene\/Protein\nIL - 1 beta,306,317,Gene\/Protein\nIL - 6,319,325,Gene\/Protein\nIL - 8,327,333,Gene\/Protein\ngranulocyte - macrophage colony - stimulating 
factor,335,387,Gene\/Protein\nGM - CSF,390,398,Gene\/Protein"}", "/scratch/micpie/export/bio_ner_12/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: All patients underwent reconstruction of hepatic artery between right lobe liver grafts of donor and recipient which included the anastomosis between right hepatic artery of donors and recipients; the reconstruction of right hepatic artery between donor grafts and left hepatic artery of recipients; interpositional bypass using autogenous saphenous vein and cryopreserved iliac artery between right hepatic artery of donors and hepatic artery, common hepatic artery and abdominal aorta of recipients..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: hepatic artery,41,55,Anatomy\nright lobe liver grafts,64,87,Anatomy\nright hepatic artery,150,170,Anatomy\nright hepatic artery,219,239,Anatomy\ngrafts,254,260,Anatomy\nleft hepatic artery,265,284,Anatomy\nsaphenous vein,340,354,Anatomy\niliac artery,373,385,Anatomy\nright hepatic artery,394,414,Anatomy\nhepatic artery,429,443,Anatomy\nhepatic artery,452,466,Anatomy\nabdominal aorta,471,486,Anatomy"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: 35.69N 93.00E 3.85 m below sediment surface Saline October 28, 2010 SSL Shisanling Reservoir in Beijing 40.26N 116.25E Surface sediment Freshwater October 1, 2012 B43 Yellow Sea 38.10N 122.00E Surface sediment Seawater August 12, 2011 B46 Yellow Sea 37.90N 122.50E Surface sediment Seawater August 12, 2011 DHa-1 East China Sea 30.50N 122.50E Surface sediment Seawater August 14, 2011 DW03 South China Sea 2.95N 105.84E Water sample Seawater July 13, 2012 Locations of water and sediment samples used in this study..\nConstrain: Please, list the entities in the form NER entity, span start, span end, 
and type in separate lines with a high probability of being in the text.\nResult: Surface sediment,124,140,sample-material\nFreshwater,141,151,sample-material\nSurface sediment,200,216,sample-material\nSeawater,217,225,sample-material\nSurface sediment,274,290,sample-material\nSeawater,291,299,sample-material\nSurface sediment,356,372,sample-material\nSeawater,373,381,sample-material\nWater sample,435,447,sample-material\nSeawater,448,456,sample-material\nwater,484,489,sample-material\nsediment samples,494,510,sample-material"}", "/scratch/micpie/export/MUV_858/valid_0-0.jsonl": "{"text":"The molecular species with the SMILES Cn1ccnc1SCC(=O)Nc1ccc(Oc2ccccc2)cc1 is not an allosteric modulator of the D1 receptor."} {"text":"The chemical compound with the DeepSMILES CCNCC))S=O)=O)cccc-cnncSCC=O)Ncccccc6)OCO5))))))))))))o5)))))cc6 is not an allosteric modulator of the dopamine receptor D1."}", "/scratch/micpie/export/MUV_858/test_0-0.jsonl": "{"text":"The compound with the InChI InChI=1S\/C24H30N4O3\/c29-23(17-27-13-1-2-14-27)25-19-5-9-21(10-6-19)31-22-11-7-20(8-12-22)26-24(30)18-28-15-3-4-16-28\/h5-12H,1-4,13-18H2,(H,25,29)(H,26,30) is not an allosteric modulator of the D1 receptor."} {"text":"The compound with the canonical SMILES CCOC(=O)C1CCN(S(=O)(=O)c2ccc3c4c(cccc24)C(=O)N3)CC1 is not an allosteric modulator of the dopamine receptor D1."}", "/scratch/micpie/export/MUV_858/train_0-0.jsonl": "{"text":"The compound with the DeepSMILES representation of O=C\/C=C\/NCCCS=O)=O)C5)))))))cccccc6C=O)N%10cccccc6 is not an allosteric modulator of the D1 receptor."} {"text":"The molecule with the SELFIES ['[C][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][Ring1][O][C][C][=Branch1][C][=O][N][C][C][N][Branch1][N][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1][C][C][Ring1][=N]'] is not an allosteric modulator of the dopamine receptor D1."}", "/scratch/micpie/export/bio_ner_19/valid_0-0.jsonl": "{"text":"Task: Please carry out the named entity 
recognition task for the the text below.\nText: Our results separate these factors into four regulatory classes: (i) constitutive factors, such as Oct-1 and probably Sp1, that are expressed in thymocytes at all stages; (ii) inducible factors, such as NF-kappa B and complexes binding to the region of a CD28 response element, that can be activated in all thymocytes, including those cells (CD4+ CD8+ TcRlow) that can undergo selection; (iii) inducible factors, such as NF-AT and AP-1, that can be activated in mature (CD4+ CD8-TcRhigh) and immature (CD4-CD8-TcR-) thymocytes alike but not in the transitional stages when the cells (CD4+ CD8+ TcRlow) are subject to selection; and (iv) a factor containing CREB, which can be activated in thymocytes of all developmental stages by culture but does not require specific induction..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: Oct - 1,100,107,Gene\/Protein\nSp1,121,124,Gene\/Protein\nNF - kappa B,207,219,Gene\/Protein\nCD28 response element,261,282,Gene\/Protein\nCD4 +,349,354,Gene\/Protein\nCD8 +,355,360,Gene\/Protein\nTcRlow,361,367,Gene\/Protein\nNF - AT,431,438,Gene\/Protein\nAP - 1,443,449,Gene\/Protein\nCD4 +,485,490,Gene\/Protein\nCD8 -,491,496,Gene\/Protein\nTcRhigh,497,504,Gene\/Protein\nCD4 -,521,526,Gene\/Protein\nCD8 -,527,532,Gene\/Protein\nTcR -,533,538,Gene\/Protein\nCD4 +,609,614,Gene\/Protein\nCD8 +,615,620,Gene\/Protein\nTcRlow,621,627,Gene\/Protein\nCREB,685,689,Gene\/Protein"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Non-human primates, ruminants, wolves (Canis lupus), cougars (Puma concolor) and domestic dogs (C. l. 
familiaris) ingest plants with antiparasitic properties but with little or no nutritional value [ 59 – 61, 64 – 66], wood ants (Formica paralugubris) use resin to inhibit the growth of microorganisms [ 67], some passerines use lime rind against lice [ 68] and fresh plant material to repel parasites or mask the chemical cues that parasites use to find the host [ 69], while great bustards (Otis tarda) have been shown to consume blister beetles (Meloidae) that contain secondary metabolites with antimicrobial and pathogen-limiting activity [ 62]..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: human,6,11,Organism\/Species\nprimates,12,20,Organism\/Species\nruminants,22,31,Organism\/Species\nwolves,33,39,Organism\/Species\nCanis lupus,42,53,Organism\/Species\ncougars,57,64,Organism\/Species\nPuma concolor,67,80,Organism\/Species\ndogs,95,99,Organism\/Species\nplants,127,133,Organism\/Species\nwood ants,226,235,Organism\/Species\nFormica paralugubris,238,258,Organism\/Species\nmicroorganisms,295,309,Organism\/Species\npasserines,323,333,Organism\/Species\nlice,356,360,Organism\/Species\nplant,377,382,Organism\/Species\ngreat bustards,487,501,Organism\/Species\nOtis tarda,504,514,Organism\/Species\nblister beetles,543,558,Organism\/Species\nMeloidae,561,569,Organism\/Species"}", "/scratch/micpie/export/bio_ner_19/test_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Categorical division of reversal errors (Ferry et al., 2000)-digging in the dish that did not contain the food pellet (S-) (error of commission; Fig. 5A) versus failing to respond within 3-min of presentation (error of omission; Fig. 
5B) revealed that both genotypes chiefly committed errors of commission versus errors of omission (D2R-\/-mice, U = 0.00; p < 0.01; D2R+ \/+ mice U = 9.00, p < 0.05), D2R-\/-mice committed more commission errors than D2R+ \/+ mice (U = 5.00, p < 0.05), and there were no differences between D2R-\/-and D2R+ \/+ mice in omission errors (U = 27.5, p = 0.65)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: food,109,113,Chemical\ngenotypes,266,275,Sequence\nD2R,343,346,Gene_or_geneproduct\nmice,353,357,Organism\nD2R,381,384,Gene_or_geneproduct\n+,385,386,Sequence\n+,389,390,Sequence\nmice,391,395,Organism\nD2R,419,422,Gene_or_geneproduct\nmice,429,433,Organism\nD2R,472,475,Gene_or_geneproduct\n+,476,477,Sequence\n+,480,481,Sequence\nmice,482,486,Organism\nD2R,550,553,Gene_or_geneproduct\nD2R,564,567,Gene_or_geneproduct\n+,568,569,Sequence\n+,572,573,Sequence\nmice,574,578,Organism"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Statistical analyses Repeated measures ANOVAs (RM-ANOVAs) were used to evaluate the effects of the two experimental factors (warming and rainfall reduction), time, and their interactions, on leaf gas exchange parameters (A, gs, WUEi, PSII, Fv\/Fm, E), leaf nutrients (% N,% P, Narea, Parea), leaf dry mass, LMA, 13C, shoot elongation during late spring, late\/early spring growth ratio, and survival rate..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: warming,129,136,state\nrainfall reduction,141,159,state\nleaf gas exchange,195,212,state\nA,226,227,state\ngs,229,231,state\nWUEi,233,237,state\nPSII,239,243,state\nFv \/ Fm,245,252,state\nE,254,255,state\nleaf nutrients,258,272,state\n% N,274,277,state\n% P,278,281,state\nNarea,283,288,state\nParea,290,295,state\nleaf dry 
mass,298,311,state\n13C,318,321,state\nshoot elongation,323,339,state\ngrowth ratio,380,392,state\nsurvival rate,398,411,state"}", "/scratch/micpie/export/bio_ner_19/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: a: 5 '-CCCCGGCCCTCACCCTCATCTTCG-3 ', from the Pu Delta TK gene, assays targeting of Flox plasmid; b: 5 '-AACAAAACAAAACAGCAGCAACAA-3 ', from sequences downstream of the p53 gene and outside Flox sequences, assays targeting of Flox and RMCE with p53GFP or p53 Delta PGFP plasmids; c: 5 '-TGAAGAGCAAGGGCGTGGTGAAGGA-3 ', from GFP sequences, assays RMCE with p53GFP orp53 Delta PGFP plasmids; d: 5 '-CAAAAAATGGAAGGAAATCAGGAACTAA-3 ', from p53 intron 3, and e: 5 '-TCTAGACAGAGAAAAAAGAGGCATT-3 ', from p53 intron 4, assay RMCE with p53 Delta PGFP plasmid; f: 5 '-ATGGGAGGCTGCCAGTCCTAACCC and g: 5 '-GTGTTTCATTAGTTCCCCACCTTGAC-3 ' amplify the WT p53 allele according to Taconic's procedures, h: 5 '-TTTACGGAGCCCTGGCGCTCGATGT-3 ' and i: 5 '-GTGGGAGGGACAAAAGTTCGAGGCC-3 ' amplify the Neo marker in the p53 KO allele according to Taconic's procedures..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: gene,62,66,Sequence\nplasmid,93,100,Sequence\nsequences,148,157,Sequence\np53,176,179,Gene_or_geneproduct\ngene,180,184,Sequence\nsequences,202,211,Sequence\nplasmids,277,285,Sequence\nsequences,338,347,Sequence\nplasmids,390,398,Sequence\np53,450,453,Gene_or_geneproduct\nintron,454,460,Sequence\np53,515,518,Gene_or_geneproduct\nintron,519,525,Sequence\nplasmid,560,567,Sequence\nWT,661,663,Sequence\np53,664,667,Gene_or_geneproduct\nallele,668,674,Sequence\np53,828,831,Gene_or_geneproduct\nallele,835,841,Sequence"} {"text":"Task: Please carry out the NER task for the the text below.\nText: The presence of the following was recorded: (1) multiple minute hemorrhagic spots in the fundus, (2) 
hypertrophic gastric rugae measuring over 5 mm, (3) mucosal nodularity, including small granular-type nodular gastritis (chicken skin-like mucosa showing multiple submucosal nodules measuring 12 mm in size), large nodular-type nodular gastritis (multiple submucosal nodules measuring 34 mm in size), and metaplastic gastritis (irregular whitish elevations and\/or depressed patchy erythema), (4) advanced gastric atrophy (visible submucosal vessels extending up to the body from the antrum), (5) erosive gastritis (raised, regular-sized, hyperemic erosions), (6) chronic superficial gastritis (regular linear hyperemic streaks), (7) gastric xanthoma (yellowish plaque), and (8) hematin deposits (intramural hemorrhage)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: multiple minute hemorrhagic spots,49,82,state\nhypertrophic gastric rugae measuring over 5 mm,103,149,state\nmucosal nodularity,156,174,state\nsmall granular - type nodular gastritis,186,225,state\nsubmucosal nodules measuring 12 mm in size,272,314,state\nlarge nodular - type nodular gastritis,317,355,state\nmultiple submucosal nodules measuring 34 mm in size,358,409,state\nmetaplastic gastritis,416,437,state\nirregular whitish elevations,440,468,state\ndepressed patchy erythema,478,503,state\nadvanced gastric atrophy,511,535,state\nvisible submucosal vessels extending up to the body from the antrum,538,605,state\nerosive gastritis,613,630,state\nchronic superficial gastritis,684,713,state\nregular linear hyperemic streaks,716,748,state\ngastric xanthoma,756,772,state\nyellowish plaque,775,791,state\nhematin deposits,803,819,state\nintramural hemorrhage,822,843,state"}", "/scratch/micpie/export/uspto/valid_0-8.jsonl": "{"text":"Task: Predict the masked component in a reaction SMILES with one element hidden as `MASK`.\nDescription: CCN(CC)CC.CCOCC.OCCCBr.MASK>>CS(=O)(=O)OCCCBr\nSolution: 
CS(=O)(=O)Cl"} {"text":"Task: Predict the masked component in a reaction SMILES with one element hidden as `MASK`.\nDescription: CC(=O)Nc1cc(Br)cc([N+](=O)[O-])c1.CC1(C)OB(B2OC(C)(C)C(C)(C)O2)OC1(C)C.COCCOC.Cl[Pd]Cl.N#N.[Fe+2].[K+].c1ccc(P(c2ccccc2)[c-]2cccc2)cc1.MASK>>CC(=O)Nc1cc(B2OC(C)(C)C(C)(C)O2)cc([N+](=O)[O-])c1\nAnswer: CC(=O)[O-]"}", "/scratch/micpie/export/uspto/train_0-8.jsonl": "{"text":"Task: Predict the masked component in a reaction SMILES with one element hidden as `MASK`.\nDescription: CC(=O)C1CC1.CC(C)(C)NNC(C)(C#N)C1CC1.O=C1CCCCCC1>>MASK\nAnswer: CC(C)(C)NNC1(C#N)CCCCCC1"} {"text":"Task: Predict the masked component in a masked reaction SMILES string (one component masked as `MASK`).\nDescription: CC(=O)Nc1cccc(-c2ccnc3c(C#N)c(-c4ccc(Oc5ccccc5)cc4)nn23)c1.CC(=O)c1ccccc1[N+](=O)[O-].MASK>>N#Cc1c(-c2ccc(OCc3ccccc3)cc2)nn2c(-c3ccccc3[N+](=O)[O-])ccnc12\nAnswer: N#Cc1c(-c2ccc(OCc3ccccc3)cc2)n[nH]c1N"}", "/scratch/micpie/export/uspto/test_0-5.jsonl": "{"text":"Question: What reaction products are produced from the starting materials C(=NC1CCCCC1)=NC1CCCCC1, CCOC(C)=O, CCSc1ccc2c(c1)C(CC(=O)O)=Cc1ccccc1N2C, and O=[N+]([O-])c1ccc(O)cc1?\nAnswer: CCSc1ccc2c(c1C)C(C(=O)Oc1ccc([N+](=O)[O-])cc1)=Cc1ccccc1N2C."} {"text":"Question: Which reaction products are produced from the reaction educts C#CC(C)O, C1CCOC1, CC(=O)c1cccc(C(C)(C)O)c1, CCOC(=O)\/N=N\/C(=O)OCC, O, and c1ccc(P(c2ccccc2)c2ccccc2)cc1?\nAnswer: C#CC(C)Oc1ccccc1."}", "/scratch/micpie/export/uspto/test_0-1.jsonl": "{"text":"The reaction SMILES string C(=NC1CCCCC1)=NC1CCCCC1.CCOC(C)=O.CCSc1ccc2c(c1)C(CC(=O)O)=Cc1ccccc1N2C.O=[N+]([O-])c1ccc(O)cc1>>CCSc1ccc2c(c1C)C(C(=O)Oc1ccc([N+](=O)[O-])cc1)=Cc1ccccc1N2C has the reaction products CCSc1ccc2c(c1C)C(C(=O)Oc1ccc([N+](=O)[O-])cc1)=Cc1ccccc1N2C and the educts C(=NC1CCCCC1)=NC1CCCCC1, CCOC(C)=O, CCSc1ccc2c(c1)C(CC(=O)O)=Cc1ccccc1N2C, and O=[N+]([O-])c1ccc(O)cc1."} {"text":"The reaction SMILES string 
C#CC(C)O.C1CCOC1.CC(=O)c1cccc(C(C)(C)O)c1.CCOC(=O)\/N=N\/C(=O)OCC.O.c1ccc(P(c2ccccc2)c2ccccc2)cc1>>C#CC(C)Oc1ccccc1 has the reaction products C#CC(C)Oc1ccccc1 and the educts C#CC(C)O, C1CCOC1, CC(=O)c1cccc(C(C)(C)O)c1, CCOC(=O)\/N=N\/C(=O)OCC, O, and c1ccc(P(c2ccccc2)c2ccccc2)cc1."}", "/scratch/micpie/export/uspto/valid_0-0.jsonl": "{"text":"The reaction SMILES string CCN(CC)CC.CCOCC.CS(=O)(=O)Cl.OCCCBr>>CS(=O)(=O)OCCCBr has the reaction educts CCN(CC)CC, CCOCC, CS(=O)(=O)Cl, and OCCCBr and the reaction products CS(=O)(=O)OCCCBr."} {"text":"The reaction SMILES CC(=O)Nc1cc(Br)cc([N+](=O)[O-])c1.CC(=O)[O-].CC1(C)OB(B2OC(C)(C)C(C)(C)O2)OC1(C)C.COCCOC.Cl[Pd]Cl.N#N.[Fe+2].[K+].c1ccc(P(c2ccccc2)[c-]2cccc2)cc1>>CC(=O)Nc1cc(B2OC(C)(C)C(C)(C)O2)cc([N+](=O)[O-])c1 has the reaction educts CC(=O)Nc1cc(Br)cc([N+](=O)[O-])c1, CC(=O)[O-], CC1(C)OB(B2OC(C)(C)C(C)(C)O2)OC1(C)C, COCCOC, Cl[Pd]Cl, N#N, [Fe+2], [K+], and c1ccc(P(c2ccccc2)[c-]2cccc2)cc1 and the reaction products CC(=O)Nc1cc(B2OC(C)(C)C(C)(C)O2)cc([N+](=O)[O-])c1."}", "/scratch/micpie/export/uspto/test_0-2.jsonl": "{"text":"The masked component in the masked RXNSMILES (one component masked as `MASK`) C(=NC1CCCCC1)=NC1CCCCC1.CCOC(C)=O.CCSc1ccc2c(c1)C(CC(=O)O)=Cc1ccccc1N2C.O=[N+]([O-])c1ccc(O)cc1>>MASK is CCSc1ccc2c(c1C)C(C(=O)Oc1ccc([N+](=O)[O-])cc1)=Cc1ccccc1N2C."} {"text":"The masked component in the masked reaction SMILES (one component masked as `MASK`) C#CC(C)O.C1CCOC1.CC(=O)c1cccc(C(C)(C)O)c1.CCOC(=O)\/N=N\/C(=O)OCC.c1ccc(P(c2ccccc2)c2ccccc2)cc1.MASK>>C#CC(C)Oc1ccccc1 is O."}", "/scratch/micpie/export/uspto/train_0-6.jsonl": "{"text":"User: I want produce the products CC(C)(C)NNC1(C#N)CCCCCC1.\nAssistant: Great, is there anything else I can do for you?\nUser: I would like to know the starting materials I need to produce CC(C)(C)NNC1(C#N)CCCCCC1.\nAssistant: I propose the following starting materials: CC(=O)C1CC1, CC(C)(C)NNC(C)(C#N)C1CC1, and O=C1CCCCCC1."} {"text":"User: I must synthesize the products 
N#Cc1c(-c2ccc(OCc3ccccc3)cc2)nn2c(-c3ccccc3[N+](=O)[O-])ccnc12.\nAssistant: Is there anything else I can do for you?\nUser: I would like to know the reaction educts I need to produce N#Cc1c(-c2ccc(OCc3ccccc3)cc2)nn2c(-c3ccccc3[N+](=O)[O-])ccnc12.\nAssistant: I recommend the following reaction educts: CC(=O)Nc1cccc(-c2ccnc3c(C#N)c(-c4ccc(Oc5ccccc5)cc4)nn23)c1, CC(=O)c1ccccc1[N+](=O)[O-], and N#Cc1c(-c2ccc(OCc3ccccc3)cc2)n[nH]c1N."}", "/scratch/micpie/export/uspto/valid_0-6.jsonl": "{"text":"User: I need to synthesize the reaction products CS(=O)(=O)OCCCBr.\nAssistant: That's interesting, is there anything else I can do for you?\nUser: I would like to know the educts I need to produce CS(=O)(=O)OCCCBr.\nAssistant: I advise the following educts: CCN(CC)CC, CCOCC, CS(=O)(=O)Cl, and OCCCBr."} {"text":"User: I want synthesize the products CC(=O)Nc1cc(B2OC(C)(C)C(C)(C)O2)cc([N+](=O)[O-])c1.\nAssistant: Great, is there anything else I can do for you?\nUser: Yes, I would like to know the starting materials I need to produce CC(=O)Nc1cc(B2OC(C)(C)C(C)(C)O2)cc([N+](=O)[O-])c1.\nAssistant: I advise the following starting materials: CC(=O)Nc1cc(Br)cc([N+](=O)[O-])c1, CC(=O)[O-], CC1(C)OB(B2OC(C)(C)C(C)(C)O2)OC1(C)C, COCCOC, Cl[Pd]Cl, N#N, [Fe+2], [K+], and c1ccc(P(c2ccccc2)[c-]2cccc2)cc1."}", "/scratch/micpie/export/uspto/test_0-0.jsonl": "{"text":"The reaction SMILES (RXNSMILES) C(=NC1CCCCC1)=NC1CCCCC1.CCOC(C)=O.CCSc1ccc2c(c1)C(CC(=O)O)=Cc1ccccc1N2C.O=[N+]([O-])c1ccc(O)cc1>>CCSc1ccc2c(c1C)C(C(=O)Oc1ccc([N+](=O)[O-])cc1)=Cc1ccccc1N2C has the educts C(=NC1CCCCC1)=NC1CCCCC1, CCOC(C)=O, CCSc1ccc2c(c1)C(CC(=O)O)=Cc1ccccc1N2C, and O=[N+]([O-])c1ccc(O)cc1 and the reaction products CCSc1ccc2c(c1C)C(C(=O)Oc1ccc([N+](=O)[O-])cc1)=Cc1ccccc1N2C."} {"text":"The RXNSMILES C#CC(C)O.C1CCOC1.CC(=O)c1cccc(C(C)(C)O)c1.CCOC(=O)\/N=N\/C(=O)OCC.O.c1ccc(P(c2ccccc2)c2ccccc2)cc1>>C#CC(C)Oc1ccccc1 has the reaction educts C#CC(C)O, C1CCOC1, CC(=O)c1cccc(C(C)(C)O)c1, CCOC(=O)\/N=N\/C(=O)OCC, O, and 
c1ccc(P(c2ccccc2)c2ccccc2)cc1 and the reaction products C#CC(C)Oc1ccccc1."}", "/scratch/micpie/export/uspto/valid_0-7.jsonl": "{"text":"Question: What is the masked component in the reaction SMILES with one element masked as `MASK` CCN(CC)CC.CCOCC.OCCCBr.MASK>>CS(=O)(=O)OCCCBr?\nAnswer: CS(=O)(=O)Cl."} {"text":"Question: What is the masked component in the masked reaction SMILES (one component masked as `MASK`) CC(=O)Nc1cc(Br)cc([N+](=O)[O-])c1.CC1(C)OB(B2OC(C)(C)C(C)(C)O2)OC1(C)C.COCCOC.Cl[Pd]Cl.N#N.[Fe+2].[K+].c1ccc(P(c2ccccc2)[c-]2cccc2)cc1.MASK>>CC(=O)Nc1cc(B2OC(C)(C)C(C)(C)O2)cc([N+](=O)[O-])c1?\nAnswer: CC(=O)[O-]."}", "/scratch/micpie/export/uspto/test_0-3.jsonl": "{"text":"The compound with SMILES CCSc1ccc2c(c1C)C(C(=O)Oc1ccc([N+](=O)[O-])cc1)=Cc1ccccc1N2C is the masked component in the reaction SMILES with one element hidden as `MASK` C(=NC1CCCCC1)=NC1CCCCC1.CCOC(C)=O.CCSc1ccc2c(c1)C(CC(=O)O)=Cc1ccccc1N2C.O=[N+]([O-])c1ccc(O)cc1>>MASK."} {"text":"The compound with SMILES O is the masked component in the masked RXNSMILES (one component masked as `MASK`) C#CC(C)O.C1CCOC1.CC(=O)c1cccc(C(C)(C)O)c1.CCOC(=O)\/N=N\/C(=O)OCC.c1ccc(P(c2ccccc2)c2ccccc2)cc1.MASK>>C#CC(C)Oc1ccccc1."}", "/scratch/micpie/export/uspto/train_0-0.jsonl": "{"text":"The reaction SMILES CC(=O)C1CC1.CC(C)(C)NNC(C)(C#N)C1CC1.O=C1CCCCCC1>>CC(C)(C)NNC1(C#N)CCCCCC1 has the starting materials CC(=O)C1CC1, CC(C)(C)NNC(C)(C#N)C1CC1, and O=C1CCCCCC1 and the reaction products CC(C)(C)NNC1(C#N)CCCCCC1."} {"text":"The RXNSMILES CC(=O)Nc1cccc(-c2ccnc3c(C#N)c(-c4ccc(Oc5ccccc5)cc4)nn23)c1.CC(=O)c1ccccc1[N+](=O)[O-].N#Cc1c(-c2ccc(OCc3ccccc3)cc2)n[nH]c1N>>N#Cc1c(-c2ccc(OCc3ccccc3)cc2)nn2c(-c3ccccc3[N+](=O)[O-])ccnc12 has the reaction educts CC(=O)Nc1cccc(-c2ccnc3c(C#N)c(-c4ccc(Oc5ccccc5)cc4)nn23)c1, CC(=O)c1ccccc1[N+](=O)[O-], and N#Cc1c(-c2ccc(OCc3ccccc3)cc2)n[nH]c1N and the reaction products N#Cc1c(-c2ccc(OCc3ccccc3)cc2)nn2c(-c3ccccc3[N+](=O)[O-])ccnc12."}", "/scratch/micpie/export/uspto/test_0-6.jsonl": 
"{"text":"User: I want synthesize the reaction products CCSc1ccc2c(c1C)C(C(=O)Oc1ccc([N+](=O)[O-])cc1)=Cc1ccccc1N2C.\nAssistant: Great, is there anything else I can do for you?\nUser: I would like to know the reaction educts I need to produce CCSc1ccc2c(c1C)C(C(=O)Oc1ccc([N+](=O)[O-])cc1)=Cc1ccccc1N2C.\nAssistant: I propose the following reaction educts: C(=NC1CCCCC1)=NC1CCCCC1, CCOC(C)=O, CCSc1ccc2c(c1)C(CC(=O)O)=Cc1ccccc1N2C, and O=[N+]([O-])c1ccc(O)cc1."} {"text":"User: I must produce the reaction products C#CC(C)Oc1ccccc1.\nAssistant: Great, is there anything else I can do for you?\nUser: I would like to know the reaction educts I need to produce C#CC(C)Oc1ccccc1.\nAssistant: I propose the following reaction educts: C#CC(C)O, C1CCOC1, CC(=O)c1cccc(C(C)(C)O)c1, CCOC(=O)\/N=N\/C(=O)OCC, O, and c1ccc(P(c2ccccc2)c2ccccc2)cc1."}", "/scratch/micpie/export/uspto/train_0-3.jsonl": "{"text":"The compound with SMILES CC(C)(C)NNC1(C#N)CCCCCC1 is the masked component in the masked reaction SMILES (one component masked as `MASK`) CC(=O)C1CC1.CC(C)(C)NNC(C)(C#N)C1CC1.O=C1CCCCCC1>>MASK."} {"text":"The compound with SMILES N#Cc1c(-c2ccc(OCc3ccccc3)cc2)n[nH]c1N is the masked component in the reaction SMILES with one element masked as `MASK` CC(=O)Nc1cccc(-c2ccnc3c(C#N)c(-c4ccc(Oc5ccccc5)cc4)nn23)c1.CC(=O)c1ccccc1[N+](=O)[O-].MASK>>N#Cc1c(-c2ccc(OCc3ccccc3)cc2)nn2c(-c3ccccc3[N+](=O)[O-])ccnc12."}", "/scratch/micpie/export/uspto/valid_0-2.jsonl": "{"text":"The masked component in the reaction SMILES with one element masked as `MASK` CCN(CC)CC.CCOCC.OCCCBr.MASK>>CS(=O)(=O)OCCCBr is CS(=O)(=O)Cl."} {"text":"The masked component in the masked reaction SMILES (one component masked as `MASK`) CC(=O)Nc1cc(Br)cc([N+](=O)[O-])c1.CC1(C)OB(B2OC(C)(C)C(C)(C)O2)OC1(C)C.COCCOC.Cl[Pd]Cl.N#N.[Fe+2].[K+].c1ccc(P(c2ccccc2)[c-]2cccc2)cc1.MASK>>CC(=O)Nc1cc(B2OC(C)(C)C(C)(C)O2)cc([N+](=O)[O-])c1 is CC(=O)[O-]."}", "/scratch/micpie/export/uspto/valid_0-1.jsonl": "{"text":"The RXNSMILES 
CCN(CC)CC.CCOCC.CS(=O)(=O)Cl.OCCCBr>>CS(=O)(=O)OCCCBr has the products CS(=O)(=O)OCCCBr and the educts CCN(CC)CC, CCOCC, CS(=O)(=O)Cl, and OCCCBr."} {"text":"The reaction SMILES string CC(=O)Nc1cc(Br)cc([N+](=O)[O-])c1.CC(=O)[O-].CC1(C)OB(B2OC(C)(C)C(C)(C)O2)OC1(C)C.COCCOC.Cl[Pd]Cl.N#N.[Fe+2].[K+].c1ccc(P(c2ccccc2)[c-]2cccc2)cc1>>CC(=O)Nc1cc(B2OC(C)(C)C(C)(C)O2)cc([N+](=O)[O-])c1 has the products CC(=O)Nc1cc(B2OC(C)(C)C(C)(C)O2)cc([N+](=O)[O-])c1 and the starting materials CC(=O)Nc1cc(Br)cc([N+](=O)[O-])c1, CC(=O)[O-], CC1(C)OB(B2OC(C)(C)C(C)(C)O2)OC1(C)C, COCCOC, Cl[Pd]Cl, N#N, [Fe+2], [K+], and c1ccc(P(c2ccccc2)[c-]2cccc2)cc1."}", "/scratch/micpie/export/uspto/valid_0-5.jsonl": "{"text":"Question: What products are produced from the reaction educts CCN(CC)CC, CCOCC, CS(=O)(=O)Cl, and OCCCBr?\nAnswer: CS(=O)(=O)OCCCBr."} {"text":"Question: What reaction products are produced from the educts CC(=O)Nc1cc(Br)cc([N+](=O)[O-])c1, CC(=O)[O-], CC1(C)OB(B2OC(C)(C)C(C)(C)O2)OC1(C)C, COCCOC, Cl[Pd]Cl, N#N, [Fe+2], [K+], and c1ccc(P(c2ccccc2)[c-]2cccc2)cc1?\nAnswer: CC(=O)Nc1cc(B2OC(C)(C)C(C)(C)O2)cc([N+](=O)[O-])c1."}", "/scratch/micpie/export/uspto/valid_0-4.jsonl": "{"text":"Question: Which reaction educts are required to produce CS(=O)(=O)OCCCBr?\nAnswer: CCN(CC)CC, CCOCC, CS(=O)(=O)Cl, and OCCCBr."} {"text":"Question: What starting materials are required to produce CC(=O)Nc1cc(B2OC(C)(C)C(C)(C)O2)cc([N+](=O)[O-])c1?\nAnswer: CC(=O)Nc1cc(Br)cc([N+](=O)[O-])c1, CC(=O)[O-], CC1(C)OB(B2OC(C)(C)C(C)(C)O2)OC1(C)C, COCCOC, Cl[Pd]Cl, N#N, [Fe+2], [K+], and c1ccc(P(c2ccccc2)[c-]2cccc2)cc1."}", "/scratch/micpie/export/uspto/train_0-5.jsonl": "{"text":"Question: Which products are produced from the educts CC(=O)C1CC1, CC(C)(C)NNC(C)(C#N)C1CC1, and O=C1CCCCCC1?\nAnswer: CC(C)(C)NNC1(C#N)CCCCCC1."} {"text":"Question: What products are produced from the starting materials CC(=O)Nc1cccc(-c2ccnc3c(C#N)c(-c4ccc(Oc5ccccc5)cc4)nn23)c1, CC(=O)c1ccccc1[N+](=O)[O-], and 
N#Cc1c(-c2ccc(OCc3ccccc3)cc2)n[nH]c1N?\nAnswer: N#Cc1c(-c2ccc(OCc3ccccc3)cc2)nn2c(-c3ccccc3[N+](=O)[O-])ccnc12."}", "/scratch/micpie/export/uspto/train_0-2.jsonl": "{"text":"The masked component in the reaction SMILES with one element hidden as `MASK` CC(=O)C1CC1.CC(C)(C)NNC(C)(C#N)C1CC1.O=C1CCCCCC1>>MASK is CC(C)(C)NNC1(C#N)CCCCCC1."} {"text":"The masked component in the reaction SMILES with one element hidden as `MASK` CC(=O)Nc1cccc(-c2ccnc3c(C#N)c(-c4ccc(Oc5ccccc5)cc4)nn23)c1.CC(=O)c1ccccc1[N+](=O)[O-].MASK>>N#Cc1c(-c2ccc(OCc3ccccc3)cc2)nn2c(-c3ccccc3[N+](=O)[O-])ccnc12 is N#Cc1c(-c2ccc(OCc3ccccc3)cc2)n[nH]c1N."}", "/scratch/micpie/export/uspto/train_0-7.jsonl": "{"text":"Question: What is the masked component in the masked reaction SMILES (one component masked as `MASK`) CC(=O)C1CC1.CC(C)(C)NNC(C)(C#N)C1CC1.O=C1CCCCCC1>>MASK?\nAnswer: CC(C)(C)NNC1(C#N)CCCCCC1."} {"text":"Question: What is the masked component in the masked reaction SMILES (one component masked as `MASK`) CC(=O)Nc1cccc(-c2ccnc3c(C#N)c(-c4ccc(Oc5ccccc5)cc4)nn23)c1.CC(=O)c1ccccc1[N+](=O)[O-].MASK>>N#Cc1c(-c2ccc(OCc3ccccc3)cc2)nn2c(-c3ccccc3[N+](=O)[O-])ccnc12?\nAnswer: N#Cc1c(-c2ccc(OCc3ccccc3)cc2)n[nH]c1N."}", "/scratch/micpie/export/uspto/train_0-1.jsonl": "{"text":"The RXNSMILES CC(=O)C1CC1.CC(C)(C)NNC(C)(C#N)C1CC1.O=C1CCCCCC1>>CC(C)(C)NNC1(C#N)CCCCCC1 has the reaction products CC(C)(C)NNC1(C#N)CCCCCC1 and the starting materials CC(=O)C1CC1, CC(C)(C)NNC(C)(C#N)C1CC1, and O=C1CCCCCC1."} {"text":"The reaction SMILES CC(=O)Nc1cccc(-c2ccnc3c(C#N)c(-c4ccc(Oc5ccccc5)cc4)nn23)c1.CC(=O)c1ccccc1[N+](=O)[O-].N#Cc1c(-c2ccc(OCc3ccccc3)cc2)n[nH]c1N>>N#Cc1c(-c2ccc(OCc3ccccc3)cc2)nn2c(-c3ccccc3[N+](=O)[O-])ccnc12 has the products N#Cc1c(-c2ccc(OCc3ccccc3)cc2)nn2c(-c3ccccc3[N+](=O)[O-])ccnc12 and the educts CC(=O)Nc1cccc(-c2ccnc3c(C#N)c(-c4ccc(Oc5ccccc5)cc4)nn23)c1, CC(=O)c1ccccc1[N+](=O)[O-], and N#Cc1c(-c2ccc(OCc3ccccc3)cc2)n[nH]c1N."}", "/scratch/micpie/export/uspto/train_0-4.jsonl": "{"text":"Question: 
Which reaction educts are required to produce CC(C)(C)NNC1(C#N)CCCCCC1?\nAnswer: CC(=O)C1CC1, CC(C)(C)NNC(C)(C#N)C1CC1, and O=C1CCCCCC1."} {"text":"Question: What educts are required to produce N#Cc1c(-c2ccc(OCc3ccccc3)cc2)nn2c(-c3ccccc3[N+](=O)[O-])ccnc12?\nAnswer: CC(=O)Nc1cccc(-c2ccnc3c(C#N)c(-c4ccc(Oc5ccccc5)cc4)nn23)c1, CC(=O)c1ccccc1[N+](=O)[O-], and N#Cc1c(-c2ccc(OCc3ccccc3)cc2)n[nH]c1N."}", "/scratch/micpie/export/uspto/test_0-7.jsonl": "{"text":"Question: What is the masked component in the reaction SMILES with one element masked as `MASK` C(=NC1CCCCC1)=NC1CCCCC1.CCOC(C)=O.CCSc1ccc2c(c1)C(CC(=O)O)=Cc1ccccc1N2C.O=[N+]([O-])c1ccc(O)cc1>>MASK?\nAnswer: CCSc1ccc2c(c1C)C(C(=O)Oc1ccc([N+](=O)[O-])cc1)=Cc1ccccc1N2C."} {"text":"Question: What is the masked component in the masked reaction SMILES (one component masked as `MASK`) C#CC(C)O.C1CCOC1.CC(=O)c1cccc(C(C)(C)O)c1.CCOC(=O)\/N=N\/C(=O)OCC.c1ccc(P(c2ccccc2)c2ccccc2)cc1.MASK>>C#CC(C)Oc1ccccc1?\nAnswer: O."}", "/scratch/micpie/export/uspto/valid_0-3.jsonl": "{"text":"The chemical with SMILES CS(=O)(=O)Cl is the masked component in the reaction SMILES with one element masked as `MASK` CCN(CC)CC.CCOCC.OCCCBr.MASK>>CS(=O)(=O)OCCCBr."} {"text":"The chemical with SMILES CC(=O)[O-] is the masked component in the masked RXNSMILES (one component masked as `MASK`) CC(=O)Nc1cc(Br)cc([N+](=O)[O-])c1.CC1(C)OB(B2OC(C)(C)C(C)(C)O2)OC1(C)C.COCCOC.Cl[Pd]Cl.N#N.[Fe+2].[K+].c1ccc(P(c2ccccc2)[c-]2cccc2)cc1.MASK>>CC(=O)Nc1cc(B2OC(C)(C)C(C)(C)O2)cc([N+](=O)[O-])c1."}", "/scratch/micpie/export/uspto/test_0-8.jsonl": "{"text":"Task: Predict the masked component in a reaction SMILES with one element hidden as `MASK`.\nDescription: C(=NC1CCCCC1)=NC1CCCCC1.CCOC(C)=O.CCSc1ccc2c(c1)C(CC(=O)O)=Cc1ccccc1N2C.O=[N+]([O-])c1ccc(O)cc1>>MASK\nSolution: CCSc1ccc2c(c1C)C(C(=O)Oc1ccc([N+](=O)[O-])cc1)=Cc1ccccc1N2C"} {"text":"Task: Predict the masked component in a masked reaction SMILES (one component masked as `MASK`).\nDescription: 
C#CC(C)O.C1CCOC1.CC(=O)c1cccc(C(C)(C)O)c1.CCOC(=O)\/N=N\/C(=O)OCC.c1ccc(P(c2ccccc2)c2ccccc2)cc1.MASK>>C#CC(C)Oc1ccccc1\nSolution: O"}", "/scratch/micpie/export/uspto/test_0-4.jsonl": "{"text":"Question: What educts are required to synthesize CCSc1ccc2c(c1C)C(C(=O)Oc1ccc([N+](=O)[O-])cc1)=Cc1ccccc1N2C?\nAnswer: C(=NC1CCCCC1)=NC1CCCCC1, CCOC(C)=O, CCSc1ccc2c(c1)C(CC(=O)O)=Cc1ccccc1N2C, and O=[N+]([O-])c1ccc(O)cc1."} {"text":"Question: Which educts are needed to produce C#CC(C)Oc1ccccc1?\nAnswer: C#CC(C)O, C1CCOC1, CC(=O)c1cccc(C(C)(C)O)c1, CCOC(=O)\/N=N\/C(=O)OCC, O, and c1ccc(P(c2ccccc2)c2ccccc2)cc1."}", "/scratch/micpie/export/MUV_733/valid_0-0.jsonl": "{"text":"The molecular species with the InChI InChI=1S\/C23H26N2O7S\/c26-22(24-18-7-10-20-21(14-18)32-16-31-20)15-30-23(27)11-6-17-4-8-19(9-5-17)33(28,29)25-12-2-1-3-13-25\/h4-5,7-10,14H,1-3,6,11-13,15-16H2,(H,24,26) is not an inhibitor of the estrogen receptor-alpha-coactivator binding."} {"text":"The molecule with the InChI representation of InChI=1S\/C17H16N4OS\/c22-16(19-10-8-13-5-1-2-6-14(13)11-19)12-21-17(23)18-15-7-3-4-9-20(15)21\/h1-7,9H,8,10-12H2 is not an inhibitor of the estrogen receptor-alpha-coactivator binding."}", "/scratch/micpie/export/MUV_733/test_0-0.jsonl": "{"text":"The chemical compound with the InChI representation of InChI=1S\/C16H18N2O2\/c1-2-10-20-15-7-5-14(6-8-15)16(19)18-12-13-4-3-9-17-11-13\/h3-9,11H,2,10,12H2,1H3,(H,18,19) is not an inhibitor of the estrogen receptor-alpha-coactivator binding."} {"text":"The chemical with the canonical SMILES CN(C)S(=O)(=O)c1ccc(NS(=O)(=O)\/C=C\/c2ccccc2)cc1 is not an inhibitor of the estrogen receptor-alpha-coactivator binding."}", "/scratch/micpie/export/MUV_733/train_0-0.jsonl": "{"text":"The chemical with the canonical SMILES representation of CCn1c(CSc2nccn2C)nc2cc(C(=O)O)ccc21 is not an inhibitor of the estrogen receptor-alpha-coactivator binding."} {"text":"The molecule with the SMILES O=S(=O)(NCC(c1cccnc1)N1CCOCC1)c1cccs1 is not an inhibitor of 
the estrogen receptor-alpha-coactivator binding."}", "/scratch/micpie/export/compound_protein_go_term_2/test_8-1.jsonl": "{"text":"The compound [C][C][=C][C][=C][Branch2][Ring2][=C][N][Branch2][Ring1][#C][C][=Branch1][C][=O][N][C][C][C][N][C][C][C][Branch1][=N][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][=Branch2][C][C][Ring1][=C][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1][C][=C][Ring2][Ring2][C] targets the protein Alpha-1A adrenoreceptor. The protein Alpha-1A adrenoreceptor enables the G protein-coupled receptor activity."} {"text":"The compound CCCCCCCCn1cc(CC(=O)N(CC)CC)c2cc(-c3cccc(C)c3)ccc21 targets the protein PPMT. The protein PPMT is located in the endoplasmic reticulum."}", "/scratch/micpie/export/compound_protein_go_term_2/test_4-0.jsonl": "{"text":"The compound Cc1ccccc1Nc1nc2cc(Cl)c(N(C)C(=O)\/C=C\/CN(C)C)cc2n2cncc12 targets the protein Tyrosine-protein kinase BTK which enables the metal ion binding."} {"text":"The compound [C][N][C][=C][Branch2][Ring2][Branch2][C][=C][C][=C][Branch2][Ring1][=Branch2][N][C][=C][C][Branch1][=C][O][C][C][=C][C][=C][Branch1][C][F][C][=N][Ring1][#Branch1][=C][C][Ring1][#C][=O][C][=C][Ring2][Ring1][=Branch1][Ring2][Ring1][=Branch2][C][C][C][C][C][N][Ring1][=Branch1][C][C][Ring2][Ring1][S] targets the protein SLC-1 which is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_2/valid_5-1.jsonl": "{"text":"The compound [C][C][=C][C][Branch1][=Branch2][N][C][C][C][C][C][Ring1][=Branch1][=N][C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Cl][C][=C][Ring2][Ring1][=N][Ring2][Ring1][Ring1] targets the protein MCH-R1. 
The protein MCH-R1 enables the neuropeptide binding."} {"text":"The compound [C][C][=C][C][=C][Branch1][N][C][=C][Ring1][=Branch1][N][C][C][N][C][Ring1][Ring2][N][C][=Branch1][O][=N][N][C][=Branch1][C][=O][C@H1][Ring1][#Branch1][C][C][O][Ring2][Ring1][Ring1] targets the protein Protein kinase C delta type (EC 2.7.11.13) (Tyrosine-protein kinase PRKCD) (EC 2.7.10.2) (nPKC-delta). The protein Protein kinase C delta type (EC 2.7.11.13) (Tyrosine-protein kinase PRKCD) (EC 2.7.10.2) (nPKC-delta) is located in the extracellular region."}", "/scratch/micpie/export/compound_protein_go_term_2/valid_5-0.jsonl": "{"text":"The compound [C][C][=C][C][Branch1][=Branch2][N][C][C][C][C][C][Ring1][=Branch1][=N][C][=C][C][=C][Branch2][Ring1][Ring2][N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Cl][C][=C][Ring2][Ring1][=N][Ring2][Ring1][Ring1] targets the protein G-protein coupled receptor 24 which enables the neuropeptide binding."} {"text":"The compound Cc1cc2c(cc1NC1CNC1)N1C(=NNC(=O)[C@H]1C)CO2 targets the protein Protein kinase C delta type (EC 2.7.11.13) (Tyrosine-protein kinase PRKCD) (EC 2.7.10.2) (nPKC-delta) which is located in the extracellular region."}", "/scratch/micpie/export/compound_protein_go_term_2/train_7-1.jsonl": "{"text":"The compound Cncc-ccncnnnCCCCCNcncc-ccnnCCO)))c5)))))cn6))))))C6)))))))c5n9)))))))))cn5 targets the protein HGF\/SF receptor. The protein HGF\/SF receptor enables the molecular function activator activity."} {"text":"The compound CCC(=O)N1C[C@@H](O[C@H](C)c2cc(C(F)(F)F)cc(C(F)(F)F)c2)[C@H](c2ccc(F)cc2)C1 targets the protein NK-1 receptor. 
The protein NK-1 receptor is located in the plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_2/test_9-0.jsonl": "{"text":"The compound Nc[nH]nc-ccccNO))cc6))))))c5-cncccccc6s9 targets the protein p59ILK which enables the protein serine kinase activity."} {"text":"The compound Cc1[nH]c2ccccc2c1-c1ccnc(Nc2ccc(F)cc2)n1 targets the protein P1\/eIF-2A protein kinase which is located in the cytosol."}", "/scratch/micpie/export/compound_protein_go_term_2/valid_9-1.jsonl": "{"text":"The compound CcccC=O)NCC=CccccNC=O)NCcccncc6C9)))))))))))cc6))))))CC6)))))))nC)n5 targets the protein Nampt. The protein Nampt enables the nicotinamide phosphoribosyltransferase activity."} {"text":"The compound CN1CCN(CCCOc2ccc3c(Nc4ccc(NC(=O)NC5CCCCC5)cc4)ncnc3c2)CC1 targets the protein Serine\/threonine-protein kinase 5. The protein Serine\/threonine-protein kinase 5 is involved in the positive regulation of mitotic cell cycle spindle assembly checkpoint."}", "/scratch/micpie/export/compound_protein_go_term_2/valid_3-0.jsonl": "{"text":"The compound [C][C][C][=C][Branch2][Ring1][=N][C][=C][C][=C][Branch2][Ring1][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1][C][=C][Ring1][P][C][=C][Branch1][C][N][O][N][=C][Ring1][=Branch1][N][=Ring2][Ring1][O] targets the protein PDGFR-beta which enables the platelet-derived growth factor binding."} {"text":"The compound CC=O)OcccC=O)NcccCF)F)F))ccn6))))))))ccc6-cnc[C@@H]CC[C@H]CCC=O)N5C9)))))))))nccncN)c96 targets the protein Bruton tyrosine kinase which is located in the nucleus."}", "/scratch/micpie/export/compound_protein_go_term_2/test_0-1.jsonl": "{"text":"The compound InChI=1S\/C19H13BrO2\/c20-14-7-8-15-13(10-14)6-9-18-17(15)11-16(19(21)22-18)12-4-2-1-3-5-12\/h1-10,16H,11H2 targets the protein NAD-dependent protein deacetylase sirtuin-1 (hSIRT1) (EC 2.3.1.286) (NAD-dependent protein deacylase sirtuin-1) (EC 2.3.1.-) (Regulatory protein SIR2 homolog 1) (SIR2-like protein 1) (hSIR2). 
The protein NAD-dependent protein deacetylase sirtuin-1 (hSIRT1) (EC 2.3.1.286) (NAD-dependent protein deacylase sirtuin-1) (EC 2.3.1.-) (Regulatory protein SIR2 homolog 1) (SIR2-like protein 1) (hSIR2) enables the promoter-specific chromatin binding."} {"text":"The compound [C][O][C][=C][C][Branch1][N][C][=C][N][=C][Branch1][C][C][N][Ring1][=Branch1][C][=C][C][=C][Ring1][=N][N][C][=C][C][=C][C][Branch2][Ring1][Branch2][C][C][=N][N][Branch1][#C][C][C][C][O][C][Branch1][C][C][Branch1][C][C][O][Ring1][#Branch1][C][=Ring1][=N][=C][C][=C][Ring2][Ring1][Ring1][C][=N][Ring2][Ring1][#Branch1] targets the protein Phosphotyrosine picked threonine-protein kinase. The protein Phosphotyrosine picked threonine-protein kinase is located in the kinetochore."}", "/scratch/micpie/export/compound_protein_go_term_2/test_5-0.jsonl": "{"text":"The compound [C][C][C][N][Branch1][Ring2][C][C][C][C][C][C][C][=C][C][=C][NH1][C][=C][Branch1][=Branch1][C][Branch1][C][C][=O][C][Ring1][Branch2][=C][Ring1][N][C][Ring1][S] targets the protein 5-HT-1B which enables the heterocyclic compound binding."} {"text":"The compound CC(C)N1CCC(n2cc(C3=C(c4cn(C)c5ccccc45)C(=O)NC3=O)c3ccccc32)CC1 targets the protein PKC-L which enables the metal ion binding."}", "/scratch/micpie/export/compound_protein_go_term_2/test_2-0.jsonl": "{"text":"The compound Cc1cc(Nc2nccc(C(C)C)n2)cc(-c2cnc(C(C)(O)C3CCC(C(=O)O)CC3)s2)c1 targets the protein Spleen tyrosine kinase which enables the scaffold protein binding."} {"text":"The compound COCCOc1ccc2c(N3CCN(C(=O)Nc4ccc(Oc5ccc6[nH]ccc6c5)cc4)CC3)ncnc2c1 targets the protein Platelet-derived growth factor receptor beta which is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_2/valid_0-0.jsonl": "{"text":"The compound O=CN\/N=C\/cccBr)cO)cBr)c6O))))))))))cccc-ccccnc6))))))cc6 targets the protein NAD-dependent protein deacetylase sirtuin-1 (hSIRT1) (EC 2.3.1.286) (NAD-dependent protein deacylase sirtuin-1) (EC 2.3.1.-) 
(Regulatory protein SIR2 homolog 1) (SIR2-like protein 1) (hSIR2) which enables the promoter-specific chromatin binding."} {"text":"The compound CCC)Ncnc-ccccNCCCC5=O))))))cc6))))))ccncnC)c=O)c%106 targets the protein p72-Syk which is located in the extrinsic component of cytoplasmic side of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_2/train_6-1.jsonl": "{"text":"The compound [C][C][C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][=C][C][=C][Branch1][P][C][=C][C][=C][NH1][N][=C][Branch1][C][N][C][Ring1][=Branch1][=C][Ring1][#Branch2][C][=C][Ring1][S] targets the protein Protein kinase C delta type (EC 2.7.11.13) (Tyrosine-protein kinase PRKCD) (EC 2.7.10.2) (nPKC-delta). The protein Protein kinase C delta type (EC 2.7.11.13) (Tyrosine-protein kinase PRKCD) (EC 2.7.10.2) (nPKC-delta) enables the protein serine kinase activity."} {"text":"The compound COccncccc[C@@]C)F)cnnccF)cc-ccnnC)c5)))))cn96))))))))))cc6c%10 targets the protein Tyrosine-protein kinase Met. The protein Tyrosine-protein kinase Met is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_2/test_7-0.jsonl": "{"text":"The compound Cc1c(C(=O)C(N)=O)c2c(OCC(=O)O)cccn2c1Cc1ccccc1 targets the protein Phosphatidylcholine 2-acylhydrolase 1B which enables the calcium-dependent phospholipase A2 activity."} {"text":"The compound InChI=1S\/C30H34FN3O\/c1-23-10-12-26(13-11-23)34(27-8-4-7-25(31)22-27)29(35)32-18-5-19-33-20-16-30(17-21-33)15-14-24-6-2-3-9-28(24)30\/h2-4,6-13,22H,5,14-21H2,1H3,(H,32,35) targets the protein Alpha-1A adrenoceptor which enables the adrenergic receptor activity."}", "/scratch/micpie/export/compound_protein_go_term_2/test_3-0.jsonl": "{"text":"The compound [C][C][C][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][N][O][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][C][=C][Ring1][#C] targets the protein Mel-1B-R which enables the melatonin receptor activity."} {"text":"The compound 
CC1C=C(c2ccnc(-n3ccn4c5c(cc4c3=O)CCCC5)c2CO)C=C(Nc2ccc(N3CCN(C4COC4)CC3)cc2)C1=O targets the protein Agammaglobulinemia tyrosine kinase which is located in the nucleus."}", "/scratch/micpie/export/compound_protein_go_term_2/train_1-0.jsonl": "{"text":"The compound C=CCc1cccc(\/C=N\/NC(=O)Nc2ccc(Oc3ccnc4cc(OCCCN5CCCCC5)c(OC)cc34)c(F)c2)c1O targets the protein CD antigen CD117 which is involved in the regulation of bile acid metabolic process."} {"text":"The compound CCCCNCCcncO)ncc-cccccc6F)))))))nc5c9C%13 targets the protein Gamma-aminobutyric acid receptor subunit alpha-3 which is located in the chloride channel complex."}", "/scratch/micpie/export/compound_protein_go_term_2/test_0-0.jsonl": "{"text":"The compound InChI=1S\/C19H13BrO2\/c20-14-7-8-15-13(10-14)6-9-18-17(15)11-16(19(21)22-18)12-4-2-1-3-5-12\/h1-10,16H,11H2 targets the protein NAD-dependent protein deacetylase sirtuin-1 (hSIRT1) (EC 2.3.1.286) (NAD-dependent protein deacylase sirtuin-1) (EC 2.3.1.-) (Regulatory protein SIR2 homolog 1) (SIR2-like protein 1) (hSIR2) which enables the promoter-specific chromatin binding."} {"text":"The compound [C][O][C][=C][C][Branch1][N][C][=C][N][=C][Branch1][C][C][N][Ring1][=Branch1][C][=C][C][=C][Ring1][=N][N][C][=C][C][=C][C][Branch2][Ring1][Branch2][C][C][=N][N][Branch1][#C][C][C][C][O][C][Branch1][C][C][Branch1][C][C][O][Ring1][#Branch1][C][=Ring1][=N][=C][C][=C][Ring2][Ring1][Ring1][C][=N][Ring2][Ring1][#Branch1] targets the protein Dual specificity protein kinase TTK which is located in the kinetochore."}", "/scratch/micpie/export/compound_protein_go_term_2/test_6-0.jsonl": "{"text":"The compound CC(C)N1CCC(n2cc(C3=C(c4cn(C)c5ccccc45)C(=O)NC3=O)c3ccccc32)CC1 targets the protein PKC-L which enables the small GTPase binding."} {"text":"The compound NC(=O)c1cc(-c2ccc(Cl)cc2)cc(-c2ccc(S(N)(=O)=O)cc2)c1O targets the protein Serine\/threonine protein kinase IKBKB which is located in the cytoplasm."}", 
"/scratch/micpie/export/compound_protein_go_term_2/train_2-0.jsonl": "{"text":"The compound CCCCN1CCc2nc(O)n3cc(-c4ccccc4F)nc3c2C1 targets the protein GABA(A) receptor subunit alpha-3 which is located in the integral component of postsynaptic specialization membrane."} {"text":"The compound InChI=1S\/C27H28N4O2\/c1-4-28-26(32)20-12-15-22-23(16-20)30-27(33)24(22)25(19-8-6-5-7-9-19)29-21-13-10-18(11-14-21)17-31(2)3\/h5-16,29H,4,17H2,1-3H3,(H,28,32)(H,30,33)\/b25-24- targets the protein Platelet-derived growth factor receptor 2 which is involved in the estrogen metabolic process."}", "/scratch/micpie/export/compound_protein_go_term_2/valid_2-0.jsonl": "{"text":"The compound CCNC(=O)c1ccc(OCc2c(-c3ccccc3)noc2C(F)(F)F)nc1 targets the protein GABA(A) receptor subunit alpha-5 which enables the transmitter-gated ion channel activity involved in regulation of postsynaptic membrane potential."} {"text":"The compound InChI=1S\/C27H28N4O6S\/c1-18-12-13-37-24(18)27(34)30-22-8-6-7-19(15-22)10-11-20-16-21(17-29-25(20)28)26(33)31-38(3,35)14-5-4-9-23(32)36-2\/h6-8,12-13,15-17H,4-5,9,14H2,1-3H3,(H2,28,29)(H,30,34)\/t38-\/m1\/s1 targets the protein Platelet-derived growth factor receptor 1 which is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_2/test_7-1.jsonl": "{"text":"The compound InChI=1S\/C20H18N2O5\/c1-12-14(10-13-6-3-2-4-7-13)22-9-5-8-15(27-11-16(23)24)18(22)17(12)19(25)20(21)26\/h2-9H,10-11H2,1H3,(H2,21,26)(H,23,24) targets the protein Phospholipase A2. The protein Phospholipase A2 enables the calcium-dependent phospholipase A2 activity."} {"text":"The compound CccccNC=O)NCCCNCCCCCcccccc69))))))))CC6)))))))))))cccccF)c6)))))))cc6 targets the protein Alpha-1C adrenergic receptor. 
The protein Alpha-1C adrenergic receptor enables the adrenergic receptor activity."}", "/scratch/micpie/export/compound_protein_go_term_2/valid_2-1.jsonl": "{"text":"The compound CCNC(=O)c1ccc(OCc2c(-c3ccccc3)noc2C(F)(F)F)nc1 targets the protein Gamma-aminobutyric acid receptor subunit alpha-5. The protein Gamma-aminobutyric acid receptor subunit alpha-5 enables the transmitter-gated ion channel activity involved in regulation of postsynaptic membrane potential."} {"text":"The compound COC=O)CCCC[S@@]C)=O)=NC=O)ccncN)cC#CcccccNC=O)coccc5C))))))))c6))))))))c6 targets the protein CD antigen CD140b. The protein CD antigen CD140b is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_2/valid_4-0.jsonl": "{"text":"The compound [C][C][Branch1][C][C][\/C][=C][Branch1][Ring1][\\C][#N][C][=Branch1][C][=O][N][C@H1][Branch1][C][C][C][N][N][=C][Branch2][Ring1][Branch1][C][=C][C][=C][Branch1][#Branch2][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][=N][F][C][=C][Branch1][C][N][N][=C][N][=C][Ring1][#Branch1][Ring2][Ring1][Branch2] targets the protein Agammaglobulinemia tyrosine kinase which enables the metal ion binding."} {"text":"The compound CCNC=O)Nccnc-cccccc6))))))cn6))))))))[C@H]CC[C@@]CC6))Ccccccc6C=O)O%10 targets the protein NPYY5-R which is located in the plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_2/train_5-1.jsonl": "{"text":"The compound Ncncccccc6s9)))C[C@@H][C@@H]CCCC[C@]%106CCN%10CCCCC4 targets the protein K-OR-1. The protein K-OR-1 is located in the neuron projection."} {"text":"The compound [C][N][Branch1][C][C][C][C][C][N][N][=C][Branch2][Ring2][=Branch1][C][=C][Branch2][Ring1][#Branch1][C][=C][N][Branch1][Branch2][C][=C][C][=C][S][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Ring2][Ring1][Ring2][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring2][Ring1][=C] targets the protein Protein kinase C gamma type. 
The protein Protein kinase C gamma type is located in the nucleus."}", "/scratch/micpie/export/compound_protein_go_term_2/test_2-1.jsonl": "{"text":"The compound CcccNcncccCC)C))n6)))))))cc-ccncCC)O)CCCCC=O)O))CC6)))))))s5)))))c6 targets the protein Spleen tyrosine kinase. The protein Spleen tyrosine kinase enables the scaffold protein binding."} {"text":"The compound InChI=1S\/C30H30N6O4\/c1-38-16-17-39-24-6-8-26-28(19-24)32-20-33-29(26)35-12-14-36(15-13-35)30(37)34-22-2-4-23(5-3-22)40-25-7-9-27-21(18-25)10-11-31-27\/h2-11,18-20,31H,12-17H2,1H3,(H,34,37) targets the protein PDGF-R-beta. The protein PDGF-R-beta is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_2/train_0-0.jsonl": "{"text":"The compound InChI=1S\/C27H24N4O2S\/c32-22-11-12-30(15-22)14-21-17-34-27-29-25(16-31(21)27)23-7-3-4-8-24(23)28-26(33)20-10-9-18-5-1-2-6-19(18)13-20\/h1-10,13,16-17,22,32H,11-12,14-15H2,(H,28,33)\/t22-\/m1\/s1 targets the protein NAD-dependent protein deacetylase sirtuin-1 (hSIRT1) (EC 2.3.1.286) (NAD-dependent protein deacylase sirtuin-1) (EC 2.3.1.-) (Regulatory protein SIR2 homolog 1) (SIR2-like protein 1) (hSIR2) which enables the NAD-dependent protein decrotonylase activity."} {"text":"The compound [C][=C][C][C][=C][C][=C][C][Branch2][Branch1][O][\/C][=N][\/N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch2][Ring2][Ring2][O][C][=C][C][=N][C][=C][C][Branch1][=N][O][C][C][C][N][C][C][C][C][C][Ring1][=Branch1][=C][Branch1][Ring1][O][C][C][=C][Ring2][Ring1][=Branch1][Ring2][Ring1][C][C][Branch1][C][F][=C][Ring2][Ring1][=C][=C][Ring2][Ring2][#Branch2][O] targets the protein p145 c-kit which is involved in the positive regulation of colon smooth muscle contraction."}", "/scratch/micpie/export/compound_protein_go_term_2/test_1-1.jsonl": "{"text":"The compound 
InChI=1S\/C31H26ClN5O\/c32-30-24(19-33)18-26(22-6-2-1-3-7-22)29(35-30)23-12-10-21(11-13-23)20-36-16-14-25(15-17-36)37-28-9-5-4-8-27(28)34-31(37)38\/h1-13,18,25H,14-17,20H2,(H,34,38) targets the protein PKB beta. The protein PKB beta enables the molecular function activator activity."} {"text":"The compound InChI=1S\/C20H28N4O4\/c1-13-15(18(24-28-13)16-6-4-5-9-21-16)11-27-17-8-7-14(10-22-17)19(26)23-20(2,3)12-25\/h7-8,10,16,21,25H,4-6,9,11-12H2,1-3H3,(H,23,26) targets the protein GABA(A) receptor subunit alpha-5. The protein GABA(A) receptor subunit alpha-5 is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_2/valid_9-0.jsonl": "{"text":"The compound CcccC=O)NCC=CccccNC=O)NCcccncc6C9)))))))))))cc6))))))CC6)))))))nC)n5 targets the protein NAmPRTase which enables the nicotinamide phosphoribosyltransferase activity."} {"text":"The compound CN1CCN(CCCOc2ccc3c(Nc4ccc(NC(=O)NC5CCCCC5)cc4)ncnc3c2)CC1 targets the protein Serine\/threonine-protein kinase 12 which is involved in the positive regulation of mitotic cell cycle spindle assembly checkpoint."}", "/scratch/micpie/export/compound_protein_go_term_2/train_8-1.jsonl": "{"text":"The compound COc1ccccc1CN1CCC[C@@H]1C\/N=C(\\S)N1Cc2ccccc2C[C@H]1CNC(=O)Nc1ccccc1 targets the protein K-OR-1. The protein K-OR-1 enables the neuropeptide binding."} {"text":"The compound c1cc(-c2nc(N[C@@H]3CCNC3)c3sccc3n2)cc(NC2CC2)n1 targets the protein Atypical protein kinase C-lambda\/iota. 
The protein Atypical protein kinase C-lambda\/iota is involved in the negative regulation of glial cell apoptotic process."}", "/scratch/micpie/export/compound_protein_go_term_2/train_8-0.jsonl": "{"text":"The compound [C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][C][C][C][C@@H1][Ring1][Branch1][C][\/N][=C][Branch1][C][\\S][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C@H1][Ring1][#Branch2][C][N][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1] targets the protein K-OR-1 which enables the neuropeptide binding."} {"text":"The compound [C][=C][C][Branch2][Ring1][Branch2][C][=N][C][Branch1][=Branch2][N][C@@H1][C][C][N][C][Ring1][Branch1][=C][S][C][=C][C][Ring1][Branch1][=N][Ring1][#C][=C][C][Branch1][#Branch1][N][C][C][C][Ring1][Ring1][=N][Ring2][Ring1][=Branch2] targets the protein PRKC-lambda\/iota which is involved in the negative regulation of glial cell apoptotic process."}", "/scratch/micpie/export/compound_protein_go_term_2/test_5-1.jsonl": "{"text":"The compound [C][C][C][N][Branch1][Ring2][C][C][C][C][C][C][C][=C][C][=C][NH1][C][=C][Branch1][=Branch1][C][Branch1][C][C][=O][C][Ring1][Branch2][=C][Ring1][N][C][Ring1][S] targets the protein 5-HT-1B. The protein 5-HT-1B enables the heterocyclic compound binding."} {"text":"The compound CC(C)N1CCC(n2cc(C3=C(c4cn(C)c5ccccc45)C(=O)NC3=O)c3ccccc32)CC1 targets the protein Protein kinase C eta type. The protein Protein kinase C eta type enables the metal ion binding."}", "/scratch/micpie/export/compound_protein_go_term_2/train_4-1.jsonl": "{"text":"The compound InChI=1S\/C28H25FN4O3\/c1-28(2,3)17-11-16-13-31-33(27(36)25(16)21(29)12-17)24-10-6-9-23(20(24)15-34)32-14-19(26(30)35)18-7-4-5-8-22(18)32\/h4-14,34H,15H2,1-3H3,(H2,30,35) targets the protein Bruton tyrosine kinase. The protein Bruton tyrosine kinase is involved in the phosphorylation."} {"text":"The compound Nc1nc2cc3c(cc2s1)C[C@@H]1[C@@H]2CCCC[C@]32CCN1CC1CCC1 targets the protein KOR-1. 
The protein KOR-1 is located in the nucleoplasm."}", "/scratch/micpie/export/compound_protein_go_term_2/train_5-0.jsonl": "{"text":"The compound Nc1nc2cc3c(cc2s1)C[C@@H]1[C@@H]2CCCC[C@]32CCN1CC1CCC1 targets the protein Kappa-type opioid receptor which is located in the neuron projection."} {"text":"The compound [C][N][Branch1][C][C][C][C][C][N][N][=C][Branch2][Ring2][=Branch1][C][=C][Branch2][Ring1][#Branch1][C][=C][N][Branch1][Branch2][C][=C][C][=C][S][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Ring2][Ring1][Ring2][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring2][Ring1][=C] targets the protein Protein kinase C gamma type which is located in the nucleus."}", "/scratch/micpie/export/compound_protein_go_term_2/valid_0-1.jsonl": "{"text":"The compound O=C(N\/N=C\/c1cc(Br)c(O)c(Br)c1O)c1ccc(-c2cccnc2)cc1 targets the protein NAD-dependent protein deacetylase sirtuin-1 (hSIRT1) (EC 2.3.1.286) (NAD-dependent protein deacylase sirtuin-1) (EC 2.3.1.-) (Regulatory protein SIR2 homolog 1) (SIR2-like protein 1) (hSIR2). The protein NAD-dependent protein deacetylase sirtuin-1 (hSIRT1) (EC 2.3.1.286) (NAD-dependent protein deacylase sirtuin-1) (EC 2.3.1.-) (Regulatory protein SIR2 homolog 1) (SIR2-like protein 1) (hSIR2) enables the promoter-specific chromatin binding."} {"text":"The compound InChI=1S\/C21H23N5O2\/c1-13(2)23-20-19-17(22-12-25(3)21(19)28)11-16(24-20)14-6-8-15(9-7-14)26-10-4-5-18(26)27\/h6-9,11-13H,4-5,10H2,1-3H3,(H,23,24) targets the protein p72-Syk. The protein p72-Syk is located in the extrinsic component of cytoplasmic side of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_2/valid_7-1.jsonl": "{"text":"The compound COCCCOccnc-cccccCnnc-nccccccF)cc69)))))))))ccc6=O))))))))c6))))))nc6 targets the protein Proto-oncogene c-Met. 
The protein Proto-oncogene c-Met is involved in the semaphorin-plexin signaling pathway."} {"text":"The compound O=C(NCCO)c1cccc(-c2cccc3c2OC(Cc2cccc(C(F)(F)F)c2)C3)c1 targets the protein G-protein coupled receptor 52. The protein G-protein coupled receptor 52 is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_2/train_2-1.jsonl": "{"text":"The compound InChI=1S\/C19H21FN4O\/c1-2-3-9-23-10-8-16-14(11-23)18-21-17(12-24(18)19(25)22-16)13-6-4-5-7-15(13)20\/h4-7,12H,2-3,8-11H2,1H3,(H,22,25) targets the protein Gamma-aminobutyric acid receptor subunit alpha-3. The protein Gamma-aminobutyric acid receptor subunit alpha-3 is located in the integral component of postsynaptic specialization membrane."} {"text":"The compound CCNC=O)cccccc6)NC=O)\/C5=C\\NccccCNC)C)))cc6)))))))cccccc6 targets the protein CD140 antigen-like family member A. The protein CD140 antigen-like family member A is involved in the estrogen metabolic process."}", "/scratch/micpie/export/compound_protein_go_term_2/valid_1-1.jsonl": "{"text":"The compound CCN(CCO)CCCOc1ccc2c(Nc3cccc(NC(=O)Nc4ccc(Cl)c(C(F)(F)F)c4)c3)ncnc2c1 targets the protein CD antigen CD309. The protein CD antigen CD309 enables the Hsp90 protein binding."} {"text":"The compound [C][C@@H1][Branch1][Ring1][C][O][N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][Branch1][O][C][C][=C][O][N][=C][Ring1][Branch1][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=C][Ring2][Ring1][Ring2] targets the protein GABA(A) receptor subunit alpha-5. The protein GABA(A) receptor subunit alpha-5 is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_2/test_3-1.jsonl": "{"text":"The compound CCCC(=O)Nc1ccc(Oc2ccccc2OC)cc1 targets the protein Mel1b receptor. 
The protein Mel1b receptor enables the melatonin receptor activity."} {"text":"The compound CC1C=C(c2ccnc(-n3ccn4c5c(cc4c3=O)CCCC5)c2CO)C=C(Nc2ccc(N3CCN(C4COC4)CC3)cc2)C1=O targets the protein Agammaglobulinemia tyrosine kinase. The protein Agammaglobulinemia tyrosine kinase is located in the nucleus."}", "/scratch/micpie/export/compound_protein_go_term_2/train_9-0.jsonl": "{"text":"The compound ccc-cncN[C@@H]CCNC5))))))csccc5n9)))))))))ccNCCC3))))n6 targets the protein nPKC-iota which is involved in the cellular response to insulin stimulus."} {"text":"The compound COCC=O)N[C@@H]C)CNcnc-ccccCC)C)O))nc6))))))ccncnC)c=O)c%106 targets the protein p72-Syk which is located in the extrinsic component of cytoplasmic side of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_2/test_1-0.jsonl": "{"text":"The compound N#Cc1cc(-c2ccccc2)c(-c2ccc(CN3CCC(n4c(=O)[nH]c5ccccc54)CC3)cc2)nc1Cl targets the protein RAC-PK-beta which enables the molecular function activator activity."} {"text":"The compound InChI=1S\/C20H28N4O4\/c1-13-15(18(24-28-13)16-6-4-5-9-21-16)11-27-17-8-7-14(10-22-17)19(26)23-20(2,3)12-25\/h7-8,10,16,21,25H,4-6,9,11-12H2,1-3H3,(H,23,26) targets the protein Gamma-aminobutyric acid receptor subunit alpha-5 which is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_2/test_6-1.jsonl": "{"text":"The compound CCC)NCCCnccC=CccnC)cccccc96)))))))))C=O)NC5=O))))))cccccc69)))))))))CC6 targets the protein PKC-L. The protein PKC-L enables the small GTPase binding."} {"text":"The compound NC(=O)c1cc(-c2ccc(Cl)cc2)cc(-c2ccc(S(N)(=O)=O)cc2)c1O targets the protein Serine\/threonine protein kinase IKBKB. The protein Serine\/threonine protein kinase IKBKB is located in the cytoplasm."}", "/scratch/micpie/export/compound_protein_go_term_2/valid_4-1.jsonl": "{"text":"The compound CC(C)\/C=C(\\C#N)C(=O)N[C@H](C)Cn1nc(-c2ccc(Oc3ccccc3)cc2F)c2c(N)ncnc21 targets the protein BPK. 
The protein BPK enables the metal ion binding."} {"text":"The compound CCN(C(=O)Nc1cnc(-c2ccccc2)cn1)[C@H]1CC[C@@]2(CC1)Cc1ccccc1C(=O)O2 targets the protein NPY5-R. The protein NPY5-R is located in the plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_2/train_1-1.jsonl": "{"text":"The compound [C][=C][C][C][=C][C][=C][C][Branch2][Branch1][O][\/C][=N][\/N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch2][Ring2][Ring2][O][C][=C][C][=N][C][=C][C][Branch1][=N][O][C][C][C][N][C][C][C][C][C][Ring1][=Branch1][=C][Branch1][Ring1][O][C][C][=C][Ring2][Ring1][=Branch1][Ring2][Ring1][C][C][Branch1][C][F][=C][Ring2][Ring1][=C][=C][Ring2][Ring2][#Branch2][O] targets the protein p145 c-kit. The protein p145 c-kit is involved in the regulation of bile acid metabolic process."} {"text":"The compound InChI=1S\/C19H21FN4O\/c1-2-3-9-23-10-8-16-14(11-23)18-21-17(12-24(18)19(25)22-16)13-6-4-5-7-15(13)20\/h4-7,12H,2-3,8-11H2,1H3,(H,22,25) targets the protein GABA(A) receptor subunit alpha-3. The protein GABA(A) receptor subunit alpha-3 is located in the chloride channel complex."}", "/scratch/micpie/export/compound_protein_go_term_2/valid_7-0.jsonl": "{"text":"The compound COCCCOc1cnc(-c2cccc(Cn3nc(-n4ccc5ccc(F)cc54)ccc3=O)c2)nc1 targets the protein Tyrosine-protein kinase Met which is involved in the semaphorin-plexin signaling pathway."} {"text":"The compound O=C(NCCO)c1cccc(-c2cccc3c2OC(Cc2cccc(C(F)(F)F)c2)C3)c1 targets the protein G-protein coupled receptor 52 which is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_2/valid_8-1.jsonl": "{"text":"The compound N[C@@H]Cnc=O)c-cccccc6Cl)))))))cnCccF)cccc6F))))))))c6=O))))))))cccccc6 targets the protein GnRH-R. 
The protein GnRH-R enables the peptide binding."} {"text":"The compound [C][N][Branch1][=N][C][=Branch1][C][=O][C][C][C][C][C][C][Ring1][=Branch1][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][N][=C][Branch2][Ring1][Branch1][N][C][=Branch1][C][=O][C][=C][C][=C][Branch1][=Branch1][C][Branch1][C][N][=O][C][=C][Ring1][=Branch2][N][Ring2][Ring1][C][C][C][C][Branch1][C][N][=O] targets the protein T-cell-specific kinase. The protein T-cell-specific kinase is located in the nucleus."}", "/scratch/micpie/export/compound_protein_go_term_2/train_0-1.jsonl": "{"text":"The compound InChI=1S\/C27H24N4O2S\/c32-22-11-12-30(15-22)14-21-17-34-27-29-25(16-31(21)27)23-7-3-4-8-24(23)28-26(33)20-10-9-18-5-1-2-6-19(18)13-20\/h1-10,13,16-17,22,32H,11-12,14-15H2,(H,28,33)\/t22-\/m1\/s1 targets the protein NAD-dependent protein deacetylase sirtuin-1 (hSIRT1) (EC 2.3.1.286) (NAD-dependent protein deacylase sirtuin-1) (EC 2.3.1.-) (Regulatory protein SIR2 homolog 1) (SIR2-like protein 1) (hSIR2). The protein NAD-dependent protein deacetylase sirtuin-1 (hSIRT1) (EC 2.3.1.286) (NAD-dependent protein deacylase sirtuin-1) (EC 2.3.1.-) (Regulatory protein SIR2 homolog 1) (SIR2-like protein 1) (hSIR2) enables the NAD-dependent protein decrotonylase activity."} {"text":"The compound [C][=C][C][C][=C][C][=C][C][Branch2][Branch1][O][\/C][=N][\/N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch2][Ring2][Ring2][O][C][=C][C][=N][C][=C][C][Branch1][=N][O][C][C][C][N][C][C][C][C][C][Ring1][=Branch1][=C][Branch1][Ring1][O][C][C][=C][Ring2][Ring1][=Branch1][Ring2][Ring1][C][C][Branch1][C][F][=C][Ring2][Ring1][=C][=C][Ring2][Ring2][#Branch2][O] targets the protein p145 c-kit. 
The protein p145 c-kit is involved in the positive regulation of colon smooth muscle contraction."}", "/scratch/micpie/export/compound_protein_go_term_2/valid_8-0.jsonl": "{"text":"The compound N[C@@H](Cn1c(=O)c(-c2ccccc2Cl)cn(Cc2c(F)cccc2F)c1=O)c1ccccc1 targets the protein Gonadotropin-releasing hormone receptor which enables the peptide binding."} {"text":"The compound CNC=O)CCCCCC6)))))))cccccc6)ncNC=O)ccccCN)=O))cc6))))))))n5CCCN)=O targets the protein Tyrosine-protein kinase ITK\/TSK which is located in the nucleus."}", "/scratch/micpie/export/compound_protein_go_term_2/test_9-1.jsonl": "{"text":"The compound [N][C][NH1][N][=C][Branch1][=N][C][=C][C][=C][Branch1][Ring1][N][O][C][=C][Ring1][Branch2][C][=Ring1][=N][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][S][Ring1][=Branch2] targets the protein Beta-integrin-linked kinase. The protein Beta-integrin-linked kinase enables the protein serine kinase activity."} {"text":"The compound [C][C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Ring1][=Branch2][C][=C][C][=N][C][Branch1][=N][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][=N][Ring1][=C] targets the protein Eukaryotic translation initiation factor 2-alpha kinase 2. 
The protein Eukaryotic translation initiation factor 2-alpha kinase 2 is located in the cytosol."}", "/scratch/micpie/export/compound_protein_go_term_2/valid_1-0.jsonl": "{"text":"The compound CCNCCO)))CCCOcccccNcccccNC=O)NccccCl)cCF)F)F))c6)))))))))c6)))))))ncnc6c%10 targets the protein Protein-tyrosine kinase receptor flk-1 which enables the Hsp90 protein binding."} {"text":"The compound InChI=1S\/C19H18ClN3O4\/c1-12(9-24)22-19(25)14-4-7-17(21-8-14)26-10-15-11-27-23-18(15)13-2-5-16(20)6-3-13\/h2-8,11-12,24H,9-10H2,1H3,(H,22,25)\/t12-\/m0\/s1 targets the protein GABA(A) receptor subunit alpha-5 which is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_2/train_6-0.jsonl": "{"text":"The compound CCCS=O)=O)Ncccc-cccc[nH]ncN)c5c9)))))))))cc6 targets the protein Protein kinase C delta type (EC 2.7.11.13) (Tyrosine-protein kinase PRKCD) (EC 2.7.10.2) (nPKC-delta) which enables the protein serine kinase activity."} {"text":"The compound InChI=1S\/C22H18F2N6O\/c1-22(24,16-4-5-19-13(6-16)7-17(31-3)10-25-19)21-28-27-20-18(23)8-14(12-30(20)21)15-9-26-29(2)11-15\/h4-12H,1-3H3\/t22-\/m1\/s1 targets the protein SF receptor which is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_2/train_3-1.jsonl": "{"text":"The compound InChI=1S\/C27H28N4O2\/c1-4-28-26(32)20-12-15-22-23(16-20)30-27(33)24(22)25(19-8-6-5-7-9-19)29-21-13-10-18(11-14-21)17-31(2)3\/h5-16,29H,4,17H2,1-3H3,(H,28,32)(H,30,33)\/b25-24- targets the protein Platelet-derived growth factor receptor alpha. 
The protein Platelet-derived growth factor receptor alpha is involved in the positive regulation of cytosolic calcium ion concentration."} {"text":"The compound [C][C][Branch1][C][C][Branch1][C][C][C][=C][C][Branch1][C][F][=C][C][=Branch1][C][=O][N][Branch2][Ring2][Ring1][C][=C][C][=C][C][Branch2][Ring1][Branch1][N][C][=C][Branch1][=Branch1][C][Branch1][C][N][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][N][=C][Ring2][Ring1][C][C][O][N][=C][C][Ring2][Ring1][O][=C][Ring2][Ring1][S] targets the protein ATK. The protein ATK is involved in the peptidyl-tyrosine phosphorylation."}", "/scratch/micpie/export/compound_protein_go_term_2/test_8-0.jsonl": "{"text":"The compound InChI=1S\/C30H34FN3O\/c1-23-10-12-26(13-11-23)34(27-8-4-7-25(31)22-27)29(35)32-18-5-19-33-20-16-30(17-21-33)15-14-24-6-2-3-9-28(24)30\/h2-4,6-13,22H,5,14-21H2,1H3,(H,32,35) targets the protein Alpha-1A adrenoceptor which enables the G protein-coupled receptor activity."} {"text":"The compound InChI=1S\/C29H40N2O\/c1-5-8-9-10-11-12-18-31-22-26(21-29(32)30(6-2)7-3)27-20-25(16-17-28(27)31)24-15-13-14-23(4)19-24\/h13-17,19-20,22H,5-12,18,21H2,1-4H3 targets the protein Prenylated protein carboxyl methyltransferase which is located in the endoplasmic reticulum."}", "/scratch/micpie/export/compound_protein_go_term_2/valid_3-1.jsonl": "{"text":"The compound Cccc-ccccNC=O)NcccccCl)c6)))))))))cc6))))))ccN)onc5n9 targets the protein PDGFR-1. The protein PDGFR-1 enables the platelet-derived growth factor binding."} {"text":"The compound InChI=1S\/C29H26F3N7O4\/c1-15(40)43-21-12-16(28(42)36-22-13-18(8-9-34-22)29(30,31)32)3-6-20(21)24-25-26(33)35-10-11-38(25)27(37-24)17-2-4-19-5-7-23(41)39(19)14-17\/h3,6,8-13,17,19H,2,4-5,7,14H2,1H3,(H2,33,35)(H,34,36,42)\/t17-,19+\/m1\/s1 targets the protein ATK. 
The protein ATK is located in the nucleus."}", "/scratch/micpie/export/compound_protein_go_term_2/train_9-1.jsonl": "{"text":"The compound c1cc(-c2nc(N[C@@H]3CCNC3)c3sccc3n2)cc(NC2CC2)n1 targets the protein Atypical protein kinase C-lambda\/iota. The protein Atypical protein kinase C-lambda\/iota is involved in the cellular response to insulin stimulus."} {"text":"The compound COCC(=O)N[C@@H](C)CNc1nc(-c2ccc(C(C)(C)O)nc2)cc2ncn(C)c(=O)c12 targets the protein Spleen tyrosine kinase. The protein Spleen tyrosine kinase is located in the extrinsic component of cytoplasmic side of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_2/test_4-1.jsonl": "{"text":"The compound Ccccccc6NcncccCl)cNC)C=O)\/C=C\/CNC)C)))))))cc6ncncc%135 targets the protein ATK. The protein ATK enables the metal ion binding."} {"text":"The compound Cn1c2c(c3ccc(-n4ccc(OCc5ccc(F)cn5)cc4=O)cc31)C1CCCCN1CC2 targets the protein MCH1R. The protein MCH1R is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_2/valid_6-1.jsonl": "{"text":"The compound Cc1cc(Nc2cc(Cc3ccccc3)nc(N[C@H]3CC[C@H](O)CC3)n2)n[nH]1 targets the protein T-cell-specific kinase. The protein T-cell-specific kinase enables the metal ion binding."} {"text":"The compound [C][O][C][C][C][O][C][=C][N][=C][Branch2][Ring2][=Branch2][C][=C][C][=C][C][Branch2][Ring1][=N][C][N][N][=C][Branch1][P][N][C][=C][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][Ring1][#Branch2][C][=C][C][Ring1][S][=O][=C][Ring2][Ring1][Branch2][N][=C][Ring2][Ring1][=C] targets the protein Hepatocyte growth factor receptor. 
The protein Hepatocyte growth factor receptor is involved in the negative regulation of hydrogen peroxide-mediated programmed cell death."}", "/scratch/micpie/export/compound_protein_go_term_2/valid_6-0.jsonl": "{"text":"The compound [C][C][=C][C][Branch2][Ring2][Ring2][N][C][=C][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=N][C][Branch1][=N][N][C@H1][C][C][C@H1][Branch1][C][O][C][C][Ring1][#Branch1][=N][Ring2][Ring1][Branch1][=N][NH1][Ring2][Ring1][O] targets the protein IL-2-inducible T-cell kinase which enables the metal ion binding."} {"text":"The compound [C][O][C][C][C][O][C][=C][N][=C][Branch2][Ring2][=Branch2][C][=C][C][=C][C][Branch2][Ring1][=N][C][N][N][=C][Branch1][P][N][C][=C][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][Ring1][#Branch2][C][=C][C][Ring1][S][=O][=C][Ring2][Ring1][Branch2][N][=C][Ring2][Ring1][=C] targets the protein Hepatocyte growth factor receptor which is involved in the negative regulation of hydrogen peroxide-mediated programmed cell death."}", "/scratch/micpie/export/compound_protein_go_term_2/train_3-0.jsonl": "{"text":"The compound CCNC(=O)c1ccc2c(c1)NC(=O)\/C2=C(\\Nc1ccc(CN(C)C)cc1)c1ccccc1 targets the protein Alpha platelet-derived growth factor receptor which is involved in the positive regulation of cytosolic calcium ion concentration."} {"text":"The compound InChI=1S\/C28H25FN4O3\/c1-28(2,3)17-11-16-13-31-33(27(36)25(16)21(29)12-17)24-10-6-9-23(20(24)15-34)32-14-19(26(30)35)18-7-4-5-8-22(18)32\/h4-14,34H,15H2,1-3H3,(H2,30,35) targets the protein Agammaglobulinemia tyrosine kinase which is involved in the peptidyl-tyrosine phosphorylation."}", "/scratch/micpie/export/compound_protein_go_term_2/train_7-0.jsonl": "{"text":"The compound Cn1cc(-c2cnc3nnn(CC4CCCN(c5ncc(-c6cnn(CCO)c6)cn5)C4)c3n2)cn1 targets the protein HGF receptor which enables the molecular function activator activity."} {"text":"The compound CCC=O)NC[C@@H]O[C@H]C)cccCF)F)F))ccCF)F)F))c6))))))))[C@H]ccccF)cc6))))))C5 targets the protein 
SPR which is located in the plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_2/train_4-0.jsonl": "{"text":"The compound [C][C][Branch1][C][C][Branch1][C][C][C][=C][C][Branch1][C][F][=C][C][=Branch1][C][=O][N][Branch2][Ring2][Ring1][C][=C][C][=C][C][Branch2][Ring1][Branch1][N][C][=C][Branch1][=Branch1][C][Branch1][C][N][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][N][=C][Ring2][Ring1][C][C][O][N][=C][C][Ring2][Ring1][O][=C][Ring2][Ring1][S] targets the protein ATK which is involved in the phosphorylation."} {"text":"The compound Ncncccccc6s9)))C[C@@H][C@@H]CCCC[C@]%106CCN%10CCCCC4 targets the protein Kappa-type opioid receptor which is located in the nucleoplasm."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_5-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the canonical SMILES.\ncanonical SMILES: Cc1ccc(N)cc1NC(=O)CSc1ccccc1F\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CccccN)cc6NC=O)CScccccc6F"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the canonical SMILES.\ncanonical SMILES: CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_2-4.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the canonical SMILES NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of NC=O)ccc-cccncc6))))))[nH]c5-ccccF)cc6."} {"text":"User: Can you generate the DeepSMILES of the molecule with the 
canonical SMILES COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N?\nAssistant: Yes, this molecule has a DeepSMILES of COcccc-cnn-cccccc6))))))cc5C=O)O)))))))cc6N."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_2-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the canonical SMILES.\ncanonical SMILES: Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: Cl.Fcccccc6)CCCCNCCcccccc6C%10))))))))))))C5"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the canonical SMILES.\nMolecule canonical SMILES: Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: Ccccc-ncSCC=O)NCcccco5))))))))))ncc[nH]cccccc69)))))))c6=O)))))))cc6"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_4-0.jsonl": "{"text":"The molecule with the canonical SMILES Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1 can also be represented with the DeepSMILES representation Ccoccc5C=O)NCccccS=O)=O)NCCC3)))))cc6."} {"text":"The molecule with the canonical SMILES Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1 can also be represented with the DeepSMILES representation Ccoccc5C=O)NCCcnc-cccccn6))))))cs5."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_4-4.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the canonical SMILES Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1?\nAssistant: Yes, this molecule has a DeepSMILES of Ccoccc5C=O)NCccccS=O)=O)NCCC3)))))cc6."} {"text":"User: Can you create the DeepSMILES of the molecule with the canonical SMILES Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1?\nAssistant: Sure, this molecule has a DeepSMILES of Ccoccc5C=O)NCCcnc-cccccn6))))))cs5."}", 
"/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_1-4.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the canonical SMILES COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of COcccccCC=O)Ncccc-ccscC)n5)))))cc6)))))))))c6."} {"text":"User: Can you create the DeepSMILES of the molecule with the canonical SMILES COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1?\nAssistant: Yes, this molecule has a DeepSMILES of COcccc[nH]cc=O)nN)c=O)[nH]c6c9c%13."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_5-1.jsonl": "{"text":"The molecule with the DeepSMILES CccccN)cc6NC=O)CScccccc6F can also be represented with the canonical SMILES Cc1ccc(N)cc1NC(=O)CSc1ccccc1F."} {"text":"The molecule with the DeepSMILES CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58 can also be represented with the canonical SMILES CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_4-4.jsonl": "{"text":"User: Can you tell me the DeepSMILES of the molecule with the canonical SMILES CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1?\nAssistant: Sure, this molecule has a DeepSMILES of CC=O)NcccccC=O)OCCNCCCC5=O))))))))))c6."} {"text":"User: Can you generate the DeepSMILES of the molecule with the canonical SMILES CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1?\nAssistant: Yes, this molecule has a DeepSMILES of CCC)CNC=O)ccccF)cc6))))))))C=O)OCCCCCO5."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_4-4.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the canonical SMILES O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12?\nAssistant: Yes, this molecule has a DeepSMILES of O=COCCNCCCC5=O)))))))))ccc=O)[nH]cccccc%106."} {"text":"User: Can you tell me the DeepSMILES of the molecule with the canonical SMILES CC1CC(=O)Nc2cc(S(=O)(=O)N3CCCC3)ccc2S1?\nAssistant: Sure, this molecule has a DeepSMILES of 
CCCC=O)NcccS=O)=O)NCCCC5))))))ccc6S%11."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_1-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the canonical SMILES.\ncanonical SMILES: CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CCOccccCNC=O)CCcccccc6N9C=O)CC)))))))))))))))cc6OC"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the canonical SMILES.\ncanonical SMILES: Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CcncNC=O)cccccn6))))))))sc5S=O)=O)Ccccccc6Cl"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_3-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the canonical SMILES.\ncanonical SMILES: CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CCnncC)cNC=O)\/C=C\/ccccOC))cCOccccCl)cc6Br)))))))))c6))))))))))c5C"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the canonical SMILES.\ncanonical SMILES: CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CCNCCC#N))))C=O)CScncc-ccccF)cc6))))))o5"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_1-5.jsonl": "{"text":"User: Can you tell me the canonical SMILES of the molecule with the DeepSMILES 
CCOccccCNC=O)CCcccccc6N9C=O)CC)))))))))))))))cc6OC?\nAssistant: Sure, this molecule has a canonical SMILES of CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC."} {"text":"User: Can you create the canonical SMILES of the molecule with the DeepSMILES CcncNC=O)cccccn6))))))))sc5S=O)=O)Ccccccc6Cl?\nAssistant: Sure, this molecule has a canonical SMILES of Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_5-0.jsonl": "{"text":"The molecule with the canonical SMILES Cc1ccc(N)cc1NC(=O)CSc1ccccc1F can also be represented with the DeepSMILES CccccN)cc6NC=O)CScccccc6F."} {"text":"The molecule with the canonical SMILES representation of CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21 can also be represented with the DeepSMILES representation CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_1-5.jsonl": "{"text":"User: Can you generate the canonical SMILES of the molecule with the DeepSMILES COccc\/C=C\/C=O)O))))ccS=O)=O)NccccF)cc6))))))))c6OC?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC."} {"text":"User: Can you generate the canonical SMILES of the molecule with the DeepSMILES CccccNS=O)=O)ccccF)cc6))))))))cC=O)O))c6C?\nAssistant: Of course, this molecule has a canonical SMILES of Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_5-4.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the canonical SMILES COCCN1CCN(C(=O)NC(C)C(C)c2ccccc2)CC1=O?\nAssistant: Of course, this molecule has a DeepSMILES of COCCNCCNC=O)NCC)CC)cccccc6))))))))))CC6=O."} {"text":"User: Can you create the DeepSMILES of the molecule with the canonical SMILES Nc1c2c([nH+]c3ccccc13)CCCC2?\nAssistant: Of course, this molecule has a DeepSMILES of Nccc[nH+]cccccc%106)))))))CCCC6."}", 
"/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_0-5.jsonl": "{"text":"User: Can you tell me the canonical SMILES of the molecule with the DeepSMILES S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the DeepSMILES COccccCl)cc6CNccccnc6)))))))cccccccnc6c%10O?\nAssistant: Sure, this molecule has a canonical SMILES of COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_3-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CC(C(=O)OCC(=O)c1ccc(F)cc1)N1C(=O)C2C3C=CC(C3)C2C1=O can also be represented with the DeepSMILES CCC=O)OCC=O)ccccF)cc6))))))))))NC=O)CCC=CCC5)C6C9=O."} {"text":"The molecule with the canonical SMILES CCn1c(=O)n(CC(=O)OCCN2CCCC2=O)c2ccccc21 can also be represented with the DeepSMILES CCnc=O)nCC=O)OCCNCCCC5=O)))))))))))cccccc69."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_0-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6 can also be represented with the canonical SMILES representation CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1."} {"text":"The molecule with the DeepSMILES COccccCl)cc6CNccccnc6)))))))cccccccnc6c%10O can also be represented with the canonical SMILES representation COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_5-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1 can also be represented with the DeepSMILES 
CCC)CNC=O)ccncccsc5c9)))))))))))C=O)NCCCC5."} {"text":"The molecule with the canonical SMILES representation of CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1 can also be represented with the DeepSMILES representation CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_2-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1 can also be represented with the DeepSMILES NC=O)ccc-cccncc6))))))[nH]c5-ccccF)cc6."} {"text":"The molecule with the canonical SMILES representation of COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N can also be represented with the DeepSMILES representation COcccc-cnn-cccccc6))))))cc5C=O)O)))))))cc6N."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_2-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the canonical SMILES.\ncanonical SMILES: CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CCCCNCC))C=O)C=O)cc-ccccCl)cc6))))))[nH]ccccCl)cc96"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the DeepSMILES from the canonical SMILES.\ncanonical SMILES: CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CNCC=O)Ncccccc6Br))))))))))C=O)CNCCNCccccCl)cc6)))))))CC6"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_0-0.jsonl": "{"text":"The molecule with the canonical SMILES 
CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F can also be represented with the DeepSMILES representation S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6."} {"text":"The molecule with the canonical SMILES CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1 can also be represented with the DeepSMILES representation CCC)C)NC=O)\/C=C\\C=C\\cccccc6)))))))))NC=O)cccc[N+]=O)[O-]))cc6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_3-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the canonical SMILES from the DeepSMILES.\nDeepSMILES: CCC=O)OCC=O)ccccF)cc6))))))))))NC=O)CCC=CCC5)C6C9=O\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC(C(=O)OCC(=O)c1ccc(F)cc1)N1C(=O)C2C3C=CC(C3)C2C1=O"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the DeepSMILES.\nMolecule DeepSMILES: CCnc=O)nCC=O)OCCNCCCC5=O)))))))))))cccccc69\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CCn1c(=O)n(CC(=O)OCCN2CCCC2=O)c2ccccc21"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_5-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the DeepSMILES.\nMolecule DeepSMILES: COCCNCCNC=O)NCC)CC)cccccc6))))))))))CC6=O\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COCCN1CCN(C(=O)NC(C)C(C)c2ccccc2)CC1=O"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the canonical SMILES from the 
DeepSMILES.\nDeepSMILES: Nccc[nH+]cccccc%106)))))))CCCC6\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: Nc1c2c([nH+]c3ccccc13)CCCC2"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_3-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the DeepSMILES COC=O)C=CCScncccc6C#N))))CNC)CC6))))))))))OCN)=CC#N))C6ccccF)cc6?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1."} {"text":"User: Can you create the canonical SMILES of the molecule with the DeepSMILES CC=O)NccccC=O)OCCNCCCC5=O))))))))))cc6?\nAssistant: Yes, this molecule has a canonical SMILES of CC(=O)Nc1ccc(C(=O)OCCN2CCCC2=O)cc1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_5-5.jsonl": "{"text":"User: Can you generate the canonical SMILES of the molecule with the DeepSMILES CCC)CNC=O)ccncccsc5c9)))))))))))C=O)NCCCC5?\nAssistant: Yes, this molecule has a canonical SMILES of CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1."} {"text":"User: Can you generate the canonical SMILES of the molecule with the DeepSMILES CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_4-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the DeepSMILES.\nDeepSMILES: CC=O)NcccccC=O)OCCNCCCC5=O))))))))))c6\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: 
CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the canonical SMILES from the DeepSMILES.\nMolecule DeepSMILES: CCC)CNC=O)ccccF)cc6))))))))C=O)OCCCCCO5\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_1-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the DeepSMILES.\nDeepSMILES: COccc\/C=C\/C=O)O))))ccS=O)=O)NccccF)cc6))))))))c6OC\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the canonical SMILES from the DeepSMILES.\nMolecule DeepSMILES: CccccNS=O)=O)ccccF)cc6))))))))cC=O)O))c6C\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_0-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the canonical SMILES.\ncanonical SMILES: CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the canonical SMILES.\ncanonical SMILES: 
COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: COccccCl)cc6CNccccnc6)))))))cccccccnc6c%10O"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_3-0.jsonl": "{"text":"The molecule with the canonical SMILES CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C can also be represented with the DeepSMILES CCnncC)cNC=O)\/C=C\/ccccOC))cCOccccCl)cc6Br)))))))))c6))))))))))c5C."} {"text":"The molecule with the canonical SMILES CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1 can also be represented with the DeepSMILES CCNCCC#N))))C=O)CScncc-ccccF)cc6))))))o5."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_1-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC can also be represented with the DeepSMILES COccc\/C=C\/C=O)O))))ccS=O)=O)NccccF)cc6))))))))c6OC."} {"text":"The molecule with the canonical SMILES representation of Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C can also be represented with the DeepSMILES representation CccccNS=O)=O)ccccF)cc6))))))))cC=O)O))c6C."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_5-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the canonical SMILES.\ncanonical SMILES: CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CCC)CNC=O)ccncccsc5c9)))))))))))C=O)NCCCC5"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the canonical SMILES.\nMolecule canonical SMILES: CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1\nConstraint: Even if you are uncertain, you 
must answer with a representation without using any other words.\nResult: CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_2-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the DeepSMILES.\nDeepSMILES: Cl.Fcccccc6)CCCCNCCcccccc6C%10))))))))))))C5\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the DeepSMILES.\nDeepSMILES: Ccccc-ncSCC=O)NCcccco5))))))))))ncc[nH]cccccc69)))))))c6=O)))))))cc6\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_1-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the canonical SMILES.\ncanonical SMILES: COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: COcccccCC=O)Ncccc-ccscC)n5)))))cc6)))))))))c6"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the canonical SMILES.\ncanonical SMILES: COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COcccc[nH]cc=O)nN)c=O)[nH]c6c9c%13"}", 
"/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_5-5.jsonl": "{"text":"User: Can you generate the canonical SMILES of the molecule with the DeepSMILES CccccN)cc6NC=O)CScccccc6F?\nAssistant: Of course, this molecule has a canonical SMILES of Cc1ccc(N)cc1NC(=O)CSc1ccccc1F."} {"text":"User: Can you create the canonical SMILES of the molecule with the DeepSMILES CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58?\nAssistant: Of course, this molecule has a canonical SMILES of CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_2-4.jsonl": "{"text":"User: Can you tell me the DeepSMILES of the molecule with the canonical SMILES Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2?\nAssistant: Sure, this molecule has a DeepSMILES of Cl.Fcccccc6)CCCCNCCcccccc6C%10))))))))))))C5."} {"text":"User: Can you tell me the DeepSMILES of the molecule with the canonical SMILES Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1?\nAssistant: Of course, this molecule has a DeepSMILES of Ccccc-ncSCC=O)NCcccco5))))))))))ncc[nH]cccccc69)))))))c6=O)))))))cc6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_3-4.jsonl": "{"text":"User: Can you tell me the DeepSMILES of the molecule with the canonical SMILES CC(C(=O)OCC(=O)c1ccc(F)cc1)N1C(=O)C2C3C=CC(C3)C2C1=O?\nAssistant: Of course, this molecule has a DeepSMILES of CCC=O)OCC=O)ccccF)cc6))))))))))NC=O)CCC=CCC5)C6C9=O."} {"text":"User: Can you tell me the DeepSMILES of the molecule with the canonical SMILES CCn1c(=O)n(CC(=O)OCCN2CCCC2=O)c2ccccc21?\nAssistant: Sure, this molecule has a DeepSMILES of CCnc=O)nCC=O)OCCNCCCC5=O)))))))))))cccccc69."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_0-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1 can also be represented with the DeepSMILES 
representation S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6."} {"text":"The molecule with the canonical SMILES representation of COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O can also be represented with the DeepSMILES COccccCl)cc6CNccccnc6)))))))cccccccnc6c%10O."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_2-5.jsonl": "{"text":"User: Can you tell me the canonical SMILES of the molecule with the DeepSMILES NC=O)ccc-cccncc6))))))[nH]c5-ccccF)cc6?\nAssistant: Of course, this molecule has a canonical SMILES of NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1."} {"text":"User: Can you generate the canonical SMILES of the molecule with the DeepSMILES COcccc-cnn-cccccc6))))))cc5C=O)O)))))))cc6N?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_5-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the DeepSMILES COCCNCCNC=O)NCC)CC)cccccc6))))))))))CC6=O?\nAssistant: Yes, this molecule has a canonical SMILES of COCCN1CCN(C(=O)NC(C)C(C)c2ccccc2)CC1=O."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the DeepSMILES Nccc[nH+]cccccc%106)))))))CCCC6?\nAssistant: Yes, this molecule has a canonical SMILES of Nc1c2c([nH+]c3ccccc13)CCCC2."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_3-5.jsonl": "{"text":"User: Can you tell me the canonical SMILES of the molecule with the DeepSMILES CCnncC)cNC=O)\/C=C\/ccccOC))cCOccccCl)cc6Br)))))))))c6))))))))))c5C?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C."} {"text":"User: Can you create the canonical SMILES of the molecule with the DeepSMILES CCNCCC#N))))C=O)CScncc-ccccF)cc6))))))o5?\nAssistant: Of course, this molecule has a canonical SMILES of 
CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_2-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2 can also be represented with the DeepSMILES Cl.Fcccccc6)CCCCNCCcccccc6C%10))))))))))))C5."} {"text":"The molecule with the canonical SMILES representation of Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1 can also be represented with the DeepSMILES representation Ccccc-ncSCC=O)NCcccco5))))))))))ncc[nH]cccccc69)))))))c6=O)))))))cc6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_1-4.jsonl": "{"text":"User: Can you tell me the DeepSMILES of the molecule with the canonical SMILES COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC?\nAssistant: Yes, this molecule has a DeepSMILES of COccc\/C=C\/C=O)O))))ccS=O)=O)NccccF)cc6))))))))c6OC."} {"text":"User: Can you create the DeepSMILES of the molecule with the canonical SMILES Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C?\nAssistant: Yes, this molecule has a DeepSMILES of CccccNS=O)=O)ccccF)cc6))))))))cC=O)O))c6C."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_2-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12 can also be represented with the DeepSMILES representation CCCCNCC))C=O)C=O)cc-ccccCl)cc6))))))[nH]ccccCl)cc96."} {"text":"The molecule with the canonical SMILES representation of CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1 can also be represented with the DeepSMILES representation CNCC=O)Ncccccc6Br))))))))))C=O)CNCCNCccccCl)cc6)))))))CC6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_4-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the canonical SMILES.\ncanonical SMILES: 
CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CC=O)NcccccC=O)OCCNCCCC5=O))))))))))c6"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the canonical SMILES.\nMolecule canonical SMILES: CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CCC)CNC=O)ccccF)cc6))))))))C=O)OCCCCCO5"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_0-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the canonical SMILES from the DeepSMILES.\nDeepSMILES: S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the DeepSMILES.\nDeepSMILES: COccccCl)cc6CNccccnc6)))))))cccccccnc6c%10O\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_4-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the canonical SMILES.\ncanonical SMILES: Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: Ccoccc5C=O)NCccccS=O)=O)NCCC3)))))cc6"} {"text":"Task: Please create a 
molecule representation based on the description.\nDescription: Create the DeepSMILES from the canonical SMILES.\ncanonical SMILES: Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Ccoccc5C=O)NCCcnc-cccccn6))))))cs5"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_3-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the canonical SMILES.\ncanonical SMILES: CC(C(=O)OCC(=O)c1ccc(F)cc1)N1C(=O)C2C3C=CC(C3)C2C1=O\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CCC=O)OCC=O)ccccF)cc6))))))))))NC=O)CCC=CCC5)C6C9=O"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the canonical SMILES.\nMolecule canonical SMILES: CCn1c(=O)n(CC(=O)OCCN2CCCC2=O)c2ccccc21\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CCnc=O)nCC=O)OCCNCCCC5=O)))))))))))cccccc69"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_2-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of CCCCNCC))C=O)C=O)cc-ccccCl)cc6))))))[nH]ccccCl)cc96 can also be represented with the canonical SMILES CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12."} {"text":"The molecule with the DeepSMILES CNCC=O)Ncccccc6Br))))))))))C=O)CNCCNCccccCl)cc6)))))))CC6 can also be represented with the canonical SMILES representation CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_5-4.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the canonical SMILES CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1?\nAssistant: Yes, this molecule has a DeepSMILES of 
CCC)CNC=O)ccncccsc5c9)))))))))))C=O)NCCCC5."} {"text":"User: Can you tell me the DeepSMILES of the molecule with the canonical SMILES CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1?\nAssistant: Sure, this molecule has a DeepSMILES of CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_4-5.jsonl": "{"text":"User: Can you generate the canonical SMILES of the molecule with the DeepSMILES Ccoccc5C=O)NCccccS=O)=O)NCCC3)))))cc6?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the DeepSMILES Ccoccc5C=O)NCCcnc-cccccn6))))))cs5?\nAssistant: Yes, this molecule has a canonical SMILES of Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_4-0.jsonl": "{"text":"The molecule with the canonical SMILES CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1 can also be represented with the DeepSMILES representation CC=O)NcccccC=O)OCCNCCCC5=O))))))))))c6."} {"text":"The molecule with the canonical SMILES CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1 can also be represented with the DeepSMILES CCC)CNC=O)ccccF)cc6))))))))C=O)OCCCCCO5."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_5-1.jsonl": "{"text":"The molecule with the DeepSMILES COCCNCCNC=O)NCC)CC)cccccc6))))))))))CC6=O can also be represented with the canonical SMILES representation COCCN1CCN(C(=O)NC(C)C(C)c2ccccc2)CC1=O."} {"text":"The molecule with the DeepSMILES Nccc[nH+]cccccc%106)))))))CCCC6 can also be represented with the canonical SMILES representation Nc1c2c([nH+]c3ccccc13)CCCC2."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_2-1.jsonl": "{"text":"The molecule with 
the DeepSMILES NC=O)ccc-cccncc6))))))[nH]c5-ccccF)cc6 can also be represented with the canonical SMILES NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1."} {"text":"The molecule with the DeepSMILES representation of COcccc-cnn-cccccc6))))))cc5C=O)O)))))))cc6N can also be represented with the canonical SMILES COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_0-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1 can also be represented with the DeepSMILES OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C."} {"text":"The molecule with the canonical SMILES COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1 can also be represented with the DeepSMILES COcccc-cccC=O)Ncccccc6SC))))))))))no5)))))cc6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_0-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the DeepSMILES.\nDeepSMILES: OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the DeepSMILES.\nDeepSMILES: COcccc-cccC=O)Ncccccc6SC))))))))))no5)))))cc6\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_1-4.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the canonical SMILES 
CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC?\nAssistant: Of course, this molecule has a DeepSMILES of CCOccccCNC=O)CCcccccc6N9C=O)CC)))))))))))))))cc6OC."} {"text":"User: Can you generate the DeepSMILES of the molecule with the canonical SMILES Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of CcncNC=O)cccccn6))))))))sc5S=O)=O)Ccccccc6Cl."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_2-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the DeepSMILES Cl.Fcccccc6)CCCCNCCcccccc6C%10))))))))))))C5?\nAssistant: Of course, this molecule has a canonical SMILES of Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the DeepSMILES Ccccc-ncSCC=O)NCcccco5))))))))))ncc[nH]cccccc69)))))))c6=O)))))))cc6?\nAssistant: Yes, this molecule has a canonical SMILES of Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_1-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of CCOccccCNC=O)CCcccccc6N9C=O)CC)))))))))))))))cc6OC can also be represented with the canonical SMILES representation CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC."} {"text":"The molecule with the DeepSMILES CcncNC=O)cccccn6))))))))sc5S=O)=O)Ccccccc6Cl can also be represented with the canonical SMILES representation Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_5-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the canonical SMILES.\nMolecule canonical SMILES: COCCN1CCN(C(=O)NC(C)C(C)c2ccccc2)CC1=O\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: COCCNCCNC=O)NCC)CC)cccccc6))))))))))CC6=O"} {"text":"Task: Please generate a 
molecule representation based on the description.\nDescription: Create the DeepSMILES from the canonical SMILES.\nMolecule canonical SMILES: Nc1c2c([nH+]c3ccccc13)CCCC2\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: Nccc[nH+]cccccc%106)))))))CCCC6"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_5-1.jsonl": "{"text":"The molecule with the DeepSMILES CCC)CNC=O)ccncccsc5c9)))))))))))C=O)NCCCC5 can also be represented with the canonical SMILES representation CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1."} {"text":"The molecule with the DeepSMILES CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20 can also be represented with the canonical SMILES representation CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_4-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of O=COCCNCCCC5=O)))))))))ccc=O)[nH]cccccc%106 can also be represented with the canonical SMILES O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12."} {"text":"The molecule with the DeepSMILES representation of CCCC=O)NcccS=O)=O)NCCCC5))))))ccc6S%11 can also be represented with the canonical SMILES CC1CC(=O)Nc2cc(S(=O)(=O)N3CCCC3)ccc2S1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_3-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the DeepSMILES.\nMolecule DeepSMILES: CCnncC)cNC=O)\/C=C\/ccccOC))cCOccccCl)cc6Br)))))))))c6))))))))))c5C\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C"} {"text":"Task: Please generate a molecule representation 
based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the DeepSMILES.\nMolecule DeepSMILES: CCNCCC#N))))C=O)CScncc-ccccF)cc6))))))o5\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_3-4.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the canonical SMILES COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of COC=O)C=CCScncccc6C#N))))CNC)CC6))))))))))OCN)=CC#N))C6ccccF)cc6."} {"text":"User: Can you create the DeepSMILES of the molecule with the canonical SMILES CC(=O)Nc1ccc(C(=O)OCCN2CCCC2=O)cc1?\nAssistant: Of course, this molecule has a DeepSMILES of CC=O)NccccC=O)OCCNCCCC5=O))))))))))cc6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_1-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the DeepSMILES.\nMolecule DeepSMILES: COcccccCC=O)Ncccc-ccscC)n5)))))cc6)))))))))c6\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the DeepSMILES.\nDeepSMILES: COcccc[nH]cc=O)nN)c=O)[nH]c6c9c%13\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_0-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the 
canonical SMILES.\ncanonical SMILES: CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the canonical SMILES.\ncanonical SMILES: CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CCC)C)NC=O)\/C=C\\C=C\\cccccc6)))))))))NC=O)cccc[N+]=O)[O-]))cc6"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_5-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of COCCN1CCN(C(=O)NC(C)C(C)c2ccccc2)CC1=O can also be represented with the DeepSMILES representation COCCNCCNC=O)NCC)CC)cccccc6))))))))))CC6=O."} {"text":"The molecule with the canonical SMILES representation of Nc1c2c([nH+]c3ccccc13)CCCC2 can also be represented with the DeepSMILES Nccc[nH+]cccccc%106)))))))CCCC6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_0-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6 can also be represented with the canonical SMILES representation CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F."} {"text":"The molecule with the DeepSMILES representation of CCC)C)NC=O)\/C=C\\C=C\\cccccc6)))))))))NC=O)cccc[N+]=O)[O-]))cc6 can also be represented with the canonical SMILES representation CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1."}", 
"/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_2-4.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the canonical SMILES CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12?\nAssistant: Yes, this molecule has a DeepSMILES of CCCCNCC))C=O)C=O)cc-ccccCl)cc6))))))[nH]ccccCl)cc96."} {"text":"User: Can you create the DeepSMILES of the molecule with the canonical SMILES CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1?\nAssistant: Sure, this molecule has a DeepSMILES of CNCC=O)Ncccccc6Br))))))))))C=O)CNCCNCccccCl)cc6)))))))CC6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_2-1.jsonl": "{"text":"The molecule with the DeepSMILES Cl.Fcccccc6)CCCCNCCcccccc6C%10))))))))))))C5 can also be represented with the canonical SMILES Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2."} {"text":"The molecule with the DeepSMILES Ccccc-ncSCC=O)NCcccco5))))))))))ncc[nH]cccccc69)))))))c6=O)))))))cc6 can also be represented with the canonical SMILES representation Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_1-1.jsonl": "{"text":"The molecule with the DeepSMILES COcccccCC=O)Ncccc-ccscC)n5)))))cc6)))))))))c6 can also be represented with the canonical SMILES COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1."} {"text":"The molecule with the DeepSMILES COcccc[nH]cc=O)nN)c=O)[nH]c6c9c%13 can also be represented with the canonical SMILES COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_3-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the DeepSMILES CCC=O)OCC=O)ccccF)cc6))))))))))NC=O)CCC=CCC5)C6C9=O?\nAssistant: Yes, this molecule has a canonical SMILES of CC(C(=O)OCC(=O)c1ccc(F)cc1)N1C(=O)C2C3C=CC(C3)C2C1=O."} {"text":"User: Can you create the canonical SMILES of the molecule with the DeepSMILES CCnc=O)nCC=O)OCCNCCCC5=O)))))))))))cccccc69?\nAssistant: Yes, I'm 
happy to help, this molecule has a canonical SMILES of CCn1c(=O)n(CC(=O)OCCN2CCCC2=O)c2ccccc21."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_3-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of CCnncC)cNC=O)\/C=C\/ccccOC))cCOccccCl)cc6Br)))))))))c6))))))))))c5C can also be represented with the canonical SMILES CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C."} {"text":"The molecule with the DeepSMILES representation of CCNCCC#N))))C=O)CScncc-ccccF)cc6))))))o5 can also be represented with the canonical SMILES CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_0-5.jsonl": "{"text":"User: Can you generate the canonical SMILES of the molecule with the DeepSMILES S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6?\nAssistant: Yes, this molecule has a canonical SMILES of CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the DeepSMILES CCC)C)NC=O)\/C=C\\C=C\\cccccc6)))))))))NC=O)cccc[N+]=O)[O-]))cc6?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_0-4.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the canonical SMILES CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6."} {"text":"User: Can you create the DeepSMILES of the molecule with the canonical SMILES CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1?\nAssistant: Of course, this molecule has a DeepSMILES of 
CCC)C)NC=O)\/C=C\\C=C\\cccccc6)))))))))NC=O)cccc[N+]=O)[O-]))cc6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_1-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the DeepSMILES.\nMolecule DeepSMILES: CCOccccCNC=O)CCcccccc6N9C=O)CC)))))))))))))))cc6OC\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the canonical SMILES from the DeepSMILES.\nDeepSMILES: CcncNC=O)cccccn6))))))))sc5S=O)=O)Ccccccc6Cl\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_5-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the canonical SMILES from the DeepSMILES.\nDeepSMILES: CCC)CNC=O)ccncccsc5c9)))))))))))C=O)NCCCC5\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the DeepSMILES.\nMolecule DeepSMILES: CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1"}", 
"/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_0-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the DeepSMILES OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C?\nAssistant: Sure, this molecule has a canonical SMILES of Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1."} {"text":"User: Can you create the canonical SMILES of the molecule with the DeepSMILES COcccc-cccC=O)Ncccccc6SC))))))))))no5)))))cc6?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_4-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the DeepSMILES O=COCCNCCCC5=O)))))))))ccc=O)[nH]cccccc%106?\nAssistant: Sure, this molecule has a canonical SMILES of O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the DeepSMILES CCCC=O)NcccS=O)=O)NCCCC5))))))ccc6S%11?\nAssistant: Sure, this molecule has a canonical SMILES of CC1CC(=O)Nc2cc(S(=O)(=O)N3CCCC3)ccc2S1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_1-0.jsonl": "{"text":"The molecule with the canonical SMILES CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC can also be represented with the DeepSMILES representation CCOccccCNC=O)CCcccccc6N9C=O)CC)))))))))))))))cc6OC."} {"text":"The molecule with the canonical SMILES representation of Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl can also be represented with the DeepSMILES representation CcncNC=O)cccccn6))))))))sc5S=O)=O)Ccccccc6Cl."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_0-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the canonical SMILES.\nMolecule canonical SMILES: Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1\nConstraint: 
Even if you are uncertain, you must answer with a representation without using any other words.\nResult: OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the canonical SMILES.\nMolecule canonical SMILES: COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: COcccc-cccC=O)Ncccccc6SC))))))))))no5)))))cc6"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_4-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the canonical SMILES.\nMolecule canonical SMILES: O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: O=COCCNCCCC5=O)))))))))ccc=O)[nH]cccccc%106"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the canonical SMILES.\ncanonical SMILES: CC1CC(=O)Nc2cc(S(=O)(=O)N3CCCC3)ccc2S1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCCC=O)NcccS=O)=O)NCCCC5))))))ccc6S%11"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_4-1.jsonl": "{"text":"The molecule with the DeepSMILES CC=O)NcccccC=O)OCCNCCCC5=O))))))))))c6 can also be represented with the canonical SMILES representation CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1."} {"text":"The molecule with the DeepSMILES representation of CCC)CNC=O)ccccF)cc6))))))))C=O)OCCCCCO5 can also be represented with the canonical SMILES representation CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_5-3.jsonl": "{"text":"Task: 
Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the DeepSMILES.\nDeepSMILES: CccccN)cc6NC=O)CScccccc6F\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: Cc1ccc(N)cc1NC(=O)CSc1ccccc1F"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the DeepSMILES.\nDeepSMILES: CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_2-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the canonical SMILES.\nMolecule canonical SMILES: NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: NC=O)ccc-cccncc6))))))[nH]c5-ccccF)cc6"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the canonical SMILES.\ncanonical SMILES: COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COcccc-cnn-cccccc6))))))cc5C=O)O)))))))cc6N"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_1-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of COccc\/C=C\/C=O)O))))ccS=O)=O)NccccF)cc6))))))))c6OC can also be represented with the canonical SMILES COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC."} {"text":"The molecule with the DeepSMILES CccccNS=O)=O)ccccF)cc6))))))))cC=O)O))c6C can also be represented with the 
canonical SMILES representation Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_0-1.jsonl": "{"text":"The molecule with the DeepSMILES OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C can also be represented with the canonical SMILES representation Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1."} {"text":"The molecule with the DeepSMILES representation of COcccc-cccC=O)Ncccccc6SC))))))))))no5)))))cc6 can also be represented with the canonical SMILES representation COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_3-4.jsonl": "{"text":"User: Can you tell me the DeepSMILES of the molecule with the canonical SMILES CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C?\nAssistant: Of course, this molecule has a DeepSMILES of CCnncC)cNC=O)\/C=C\/ccccOC))cCOccccCl)cc6Br)))))))))c6))))))))))c5C."} {"text":"User: Can you create the DeepSMILES of the molecule with the canonical SMILES CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1?\nAssistant: Of course, this molecule has a DeepSMILES of CCNCCC#N))))C=O)CScncc-ccccF)cc6))))))o5."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_0-4.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the canonical SMILES Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C."} {"text":"User: Can you create the DeepSMILES of the molecule with the canonical SMILES COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1?\nAssistant: Of course, this molecule has a DeepSMILES of COcccc-cccC=O)Ncccccc6SC))))))))))no5)))))cc6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_1-0.jsonl": "{"text":"The molecule with the canonical SMILES COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1 can 
also be represented with the DeepSMILES COcccccCC=O)Ncccc-ccscC)n5)))))cc6)))))))))c6."} {"text":"The molecule with the canonical SMILES representation of COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1 can also be represented with the DeepSMILES representation COcccc[nH]cc=O)nN)c=O)[nH]c6c9c%13."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_4-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the DeepSMILES CC=O)NcccccC=O)OCCNCCCC5=O))))))))))c6?\nAssistant: Sure, this molecule has a canonical SMILES of CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the DeepSMILES CCC)CNC=O)ccccF)cc6))))))))C=O)OCCCCCO5?\nAssistant: Of course, this molecule has a canonical SMILES of CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_4-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the DeepSMILES.\nDeepSMILES: O=COCCNCCCC5=O)))))))))ccc=O)[nH]cccccc%106\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the DeepSMILES.\nDeepSMILES: CCCC=O)NcccS=O)=O)NCCCC5))))))ccc6S%11\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CC1CC(=O)Nc2cc(S(=O)(=O)N3CCCC3)ccc2S1"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_4-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the canonical SMILES from the DeepSMILES.\nMolecule DeepSMILES: Ccoccc5C=O)NCccccS=O)=O)NCCC3)))))cc6\nConstraint: Even if you are not sure, you must answer with a representation 
without using any other words.\nResult: Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the DeepSMILES.\nMolecule DeepSMILES: Ccoccc5C=O)NCCcnc-cccccn6))))))cs5\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_3-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of COC=O)C=CCScncccc6C#N))))CNC)CC6))))))))))OCN)=CC#N))C6ccccF)cc6 can also be represented with the canonical SMILES COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1."} {"text":"The molecule with the DeepSMILES CC=O)NccccC=O)OCCNCCCC5=O))))))))))cc6 can also be represented with the canonical SMILES CC(=O)Nc1ccc(C(=O)OCCN2CCCC2=O)cc1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_2-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the DeepSMILES.\nDeepSMILES: CCCCNCC))C=O)C=O)cc-ccccCl)cc6))))))[nH]ccccCl)cc96\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the DeepSMILES.\nMolecule DeepSMILES: CNCC=O)Ncccccc6Br))))))))))C=O)CNCCNCccccCl)cc6)))))))CC6\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_3-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of 
CCC=O)OCC=O)ccccF)cc6))))))))))NC=O)CCC=CCC5)C6C9=O can also be represented with the canonical SMILES CC(C(=O)OCC(=O)c1ccc(F)cc1)N1C(=O)C2C3C=CC(C3)C2C1=O."} {"text":"The molecule with the DeepSMILES CCnc=O)nCC=O)OCCNCCCC5=O)))))))))))cccccc69 can also be represented with the canonical SMILES CCn1c(=O)n(CC(=O)OCCN2CCCC2=O)c2ccccc21."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_0-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the DeepSMILES.\nMolecule DeepSMILES: S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the DeepSMILES.\nDeepSMILES: CCC)C)NC=O)\/C=C\\C=C\\cccccc6)))))))))NC=O)cccc[N+]=O)[O-]))cc6\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_1-5.jsonl": "{"text":"User: Can you generate the canonical SMILES of the molecule with the DeepSMILES COcccccCC=O)Ncccc-ccscC)n5)))))cc6)))))))))c6?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1."} {"text":"User: Can you create the canonical SMILES of the molecule with the DeepSMILES COcccc[nH]cc=O)nN)c=O)[nH]c6c9c%13?\nAssistant: Sure, this molecule has a canonical SMILES of COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_4-1.jsonl": 
"{"text":"The molecule with the DeepSMILES Ccoccc5C=O)NCccccS=O)=O)NCCC3)))))cc6 can also be represented with the canonical SMILES Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1."} {"text":"The molecule with the DeepSMILES Ccoccc5C=O)NCCcnc-cccccn6))))))cs5 can also be represented with the canonical SMILES representation Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_5-4.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the canonical SMILES Cc1ccc(N)cc1NC(=O)CSc1ccccc1F?\nAssistant: Yes, this molecule has a DeepSMILES of CccccN)cc6NC=O)CScccccc6F."} {"text":"User: Can you tell me the DeepSMILES of the molecule with the canonical SMILES CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/valid_2-5.jsonl": "{"text":"User: Can you generate the canonical SMILES of the molecule with the DeepSMILES CCCCNCC))C=O)C=O)cc-ccccCl)cc6))))))[nH]ccccCl)cc96?\nAssistant: Of course, this molecule has a canonical SMILES of CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12."} {"text":"User: Can you generate the canonical SMILES of the molecule with the DeepSMILES CNCC=O)Ncccccc6Br))))))))))C=O)CNCCNCccccCl)cc6)))))))CC6?\nAssistant: Yes, this molecule has a canonical SMILES of CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_1-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the canonical SMILES.\ncanonical SMILES: COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COccc\/C=C\/C=O)O))))ccS=O)=O)NccccF)cc6))))))))c6OC"} {"text":"Task: 
Please create a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the canonical SMILES.\nMolecule canonical SMILES: Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CccccNS=O)=O)ccccF)cc6))))))))cC=O)O))c6C"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_0-4.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the canonical SMILES CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1?\nAssistant: Yes, this molecule has a DeepSMILES of S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6."} {"text":"User: Can you create the DeepSMILES of the molecule with the canonical SMILES COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O?\nAssistant: Sure, this molecule has a DeepSMILES of COccccCl)cc6CNccccnc6)))))))cccccccnc6c%10O."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_3-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1 can also be represented with the DeepSMILES representation COC=O)C=CCScncccc6C#N))))CNC)CC6))))))))))OCN)=CC#N))C6ccccF)cc6."} {"text":"The molecule with the canonical SMILES representation of CC(=O)Nc1ccc(C(=O)OCCN2CCCC2=O)cc1 can also be represented with the DeepSMILES CC=O)NccccC=O)OCCNCCCC5=O))))))))))cc6."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/test_2-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the canonical SMILES from the DeepSMILES.\nMolecule DeepSMILES: NC=O)ccc-cccncc6))))))[nH]c5-ccccF)cc6\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1"} {"text":"Task: 
Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the DeepSMILES.\nDeepSMILES: COcccc-cnn-cccccc6))))))cc5C=O)O)))))))cc6N\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_4-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12 can also be represented with the DeepSMILES O=COCCNCCCC5=O)))))))))ccc=O)[nH]cccccc%106."} {"text":"The molecule with the canonical SMILES representation of CC1CC(=O)Nc2cc(S(=O)(=O)N3CCCC3)ccc2S1 can also be represented with the DeepSMILES CCCC=O)NcccS=O)=O)NCCCC5))))))ccc6S%11."}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_3-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the canonical SMILES.\nMolecule canonical SMILES: COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: COC=O)C=CCScncccc6C#N))))CNC)CC6))))))))))OCN)=CC#N))C6ccccF)cc6"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the canonical SMILES.\ncanonical SMILES: CC(=O)Nc1ccc(C(=O)OCCN2CCCC2=O)cc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC=O)NccccC=O)OCCNCCCC5=O))))))))))cc6"}", "/scratch/micpie/export/mol_repr_transl_deepsmiles_canonical/train_3-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the DeepSMILES.\nMolecule 
DeepSMILES: COC=O)C=CCScncccc6C#N))))CNC)CC6))))))))))OCN)=CC#N))C6ccccF)cc6\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the DeepSMILES.\nMolecule DeepSMILES: CC=O)NccccC=O)OCCNCCCC5=O))))))))))cc6\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CC(=O)Nc1ccc(C(=O)OCCN2CCCC2=O)cc1"}", "/scratch/micpie/export/compound_protein_pathway_disease_3/test_4-0.jsonl": "{"text":"The compound [O][=C][O][C][C@H1][C@@H1][Ring1][Branch1][C][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C@@H1][Ring1][O][\/C][=C][\/C][=C][C][=C][Branch1][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1][C][=N][Ring1][=N] targets the protein PAR-1. The protein PAR-1 is involved in Calcium signaling pathway. The Calcium signaling pathway is modulated by the disease Progressive osseous heteroplasia."} {"text":"The compound N#C\/C(=C\\c1cc(O)ccc1O)C(=O)O targets the protein Proto-oncogene c-ErbB-1. The protein Proto-oncogene c-ErbB-1 is involved in Focal adhesion. The Focal adhesion is modulated by the disease Stickler syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/valid_5-0.jsonl": "{"text":"The compound [O][C][=C][Branch1][C][Cl][C][=C][Branch1][C][Cl][C][Branch1][C][Cl][=C][Ring1][=Branch2][C][C][=C][Branch1][C][O][C][Branch1][C][Cl][=C][C][Branch1][C][Cl][=C][Ring1][=Branch2][Cl] targets the protein Proto-oncogene c-ErbB-1. The protein Proto-oncogene c-ErbB-1 is involved in Focal adhesion. The Focal adhesion is modulated by the disease Stickler syndrome."} {"text":"The compound CC(C)(C)OC(=O)N1CCC(C(=O)Nc2ccc(-c3ccc(NC(=O)C4CCN(C(=O)OC(C)(C)C)CC4)cc3)cc2)CC1 targets the protein PTP-1C. 
The protein PTP-1C is involved in Natural killer cell mediated cytotoxicity. The Natural killer cell mediated cytotoxicity is modulated by the disease Cherubism."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/valid_3-0.jsonl": "{"text":"The compound InChI=1S\/C20H14ClN3O2\/c21-16-3-1-2-4-17(16)22-14-9-10-15-18(11-14)23-24-19(15)12-5-7-13(8-6-12)20(25)26\/h1-11,22H,(H,23,24)(H,25,26) targets the protein Stress-activated protein kinase JNK1. The protein Stress-activated protein kinase JNK1 is involved in MAPK signaling pathway. The MAPK signaling pathway is modulated by the disease Atelosteogenesis type I and III."} {"text":"The compound N=C(N)NCCC[C@H](NC(=O)[C@H](CC1CCCCC1)NC(=O)C1=COc2ccccc2O1)C(=O)N[C@@H](Cc1ccccc1)C(N)=O targets the protein Coagulation factor II receptor. The protein Coagulation factor II receptor is involved in Calcium signaling pathway. The Calcium signaling pathway is modulated by the disease Progressive osseous heteroplasia."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/test_5-0.jsonl": "{"text":"The compound N#CC(C#N)=C(C#N)c1cc(O)c(O)c(O)c1 targets the protein Proto-oncogene c-ErbB-1. The protein Proto-oncogene c-ErbB-1 is involved in Focal adhesion. The Focal adhesion is modulated by the disease Stickler syndrome."} {"text":"The compound C[C@H]1[C@H](C)CC[C@]2(C(=O)O)CC[C@]3(C)C(=CC[C@@H]4[C@@]5(C)CC[C@H](O)C(C)(C)[C@@H]5CC[C@]43C)[C@H]12 targets the protein Protein-tyrosine phosphatase 1C. The protein Protein-tyrosine phosphatase 1C is involved in Natural killer cell mediated cytotoxicity. 
The Natural killer cell mediated cytotoxicity is modulated by the disease Cherubism."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/test_2-0.jsonl": "{"text":"The compound InChI=1S\/C30H40N6O\/c1-22(2)29-33-32-23(3)36(29)28-19-26-15-16-27(20-28)34(26)17-10-18-35(25-13-8-5-9-14-25)30(37)31-21-24-11-6-4-7-12-24\/h4-9,11-14,22,26-28H,10,15-21H2,1-3H3,(H,31,37)\/t26-,27+,28+ targets the protein C-C chemokine receptor type 5. The protein C-C chemokine receptor type 5 is involved in Cytokine-cytokine receptor interaction. The Cytokine-cytokine receptor interaction is modulated by the disease Kowarski syndrome."} {"text":"The compound [C][O][C][=Branch1][C][=O][C][=C][S][C][=C][Ring1][Branch1][N][C][=Branch1][C][=O][C][C][=C][C][=C][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1] targets the protein JNK-46. The protein JNK-46 is involved in MAPK signaling pathway. The MAPK signaling pathway is modulated by the disease Atelosteogenesis type I and III."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/valid_0-0.jsonl": "{"text":"The compound [C][C][C][N][Branch1][Ring2][C][C][C][C][=N][C][Branch1][C][N][=N][C][=N][C][=C][Branch1][P][C][=C][C][=C][Branch1][Ring1][O][C][C][Branch1][Ring1][O][C][=C][Ring1][#Branch2][N][=C][Ring2][Ring1][Branch1][Ring1][S] targets the protein Neuronal NOS. The protein Neuronal NOS is involved in Arginine and proline metabolism. The Arginine and proline metabolism is modulated by the disease Snyder-Robinson syndrome."} {"text":"The compound CN(C)CCn1cc(NC(=O)c2cccc(-n3cc(NC(=O)Nc4ccccc4Cl)cn3)c2)cn1 targets the protein MAP kinase 9. The protein MAP kinase 9 is involved in Protein processing in endoplasmic reticulum. 
The Protein processing in endoplasmic reticulum is modulated by the disease Leprosy."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/test_7-0.jsonl": "{"text":"The compound C[C@]12CC[C@@H]3c4ccc(O)cc4C[C@@H](CCCCCCCCC[S+]([O-])CCCC(F)(F)C(F)(F)F)[C@H]3[C@@H]1CC[C@@H]2O targets the protein Farnesoid X-activated receptor. The protein Farnesoid X-activated receptor is involved in Bile secretion. The Bile secretion is modulated by the disease Dubin-Johnson syndrome."} {"text":"The compound [3H]C[3H])[3H])NCCCOcccccc6OC)))))))))cccccc6 targets the protein NET. The protein NET is involved in Synaptic vesicle cycle. The Synaptic vesicle cycle is modulated by the disease Orthostatic intolerance."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/test_3-0.jsonl": "{"text":"The compound [C][O][C][=Branch1][C][=O][C][=C][Branch2][Ring1][Ring2][C][N][C][C][C][Branch1][=Branch2][S][Branch1][C][C][=Branch1][C][=O][=O][C][C][Ring1][#Branch2][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][N][Ring2][Ring1][#Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1] targets the protein MAPK 8. The protein MAPK 8 is involved in MAPK signaling pathway. The MAPK signaling pathway is modulated by the disease Atelosteogenesis type I and III."} {"text":"The compound C[C@H]1OC(=O)[C@@H]2C[C@@H]3C[C@@]4(CC[C@H]3[C@H](\/C=C\/c3ccc(-c5cccc(C(F)(F)F)c5)cn3)[C@H]12)CNC(=O)O4 targets the protein PAR-1. The protein PAR-1 is involved in Calcium signaling pathway. The Calcium signaling pathway is modulated by the disease Progressive osseous heteroplasia."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/train_1-0.jsonl": "{"text":"The compound InChI=1S\/C21H22N4O4\/c1-27-17-10-14(11-18(28-2)20(17)29-3)25-19-12-13(8-9-23-19)24-16-7-5-4-6-15(16)21(22)26\/h4-12H,1-3H3,(H2,22,26)(H2,23,24,25) targets the protein SAPK1a. The protein SAPK1a is involved in Protein processing in endoplasmic reticulum. 
The Protein processing in endoplasmic reticulum is modulated by the disease Leprosy."} {"text":"The compound O=C(O)[C@@H](CC1CCC1)N1C[C@H](CN2CCC(CCCc3ccccc3)CC2)[C@@H](c2cccc(F)c2)C1 targets the protein CC-CKR-5. The protein CC-CKR-5 is involved in Cytokine-cytokine receptor interaction. The Cytokine-cytokine receptor interaction is modulated by the disease Kowarski syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/test_0-0.jsonl": "{"text":"The compound InChI=1S\/C4H9N3\/c5-4-6-2-1-3-7-4\/h1-3H2,(H3,5,6,7) targets the protein Neuronal NOS. The protein Neuronal NOS is involved in Arginine and proline metabolism. The Arginine and proline metabolism is modulated by the disease Snyder-Robinson syndrome."} {"text":"The compound CCNC(=O)N[C@H]1CC[C@H](Nc2ncc3ccc(=O)n(C(C)C)c3n2)CC1 targets the protein c-Jun N-terminal kinase 2. The protein c-Jun N-terminal kinase 2 is involved in Protein processing in endoplasmic reticulum. The Protein processing in endoplasmic reticulum is modulated by the disease Leprosy."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/test_6-0.jsonl": "{"text":"The compound O=C(OCCCCC#Cc1ccc(C(=O)OC2CSSC2)cc1)c1cccc([N+](=O)[O-])c1 targets the protein Protein-tyrosine phosphatase SHP-1. The protein Protein-tyrosine phosphatase SHP-1 is involved in Natural killer cell mediated cytotoxicity. The Natural killer cell mediated cytotoxicity is modulated by the disease Cherubism."} {"text":"The compound [C][C@H1][Branch2][Ring1][O][C][C][N][C][=Branch1][C][=O][O][C][C][C][C][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][C][Ring1][=Branch2][C][Ring1][#Branch1][C][C][C][C@H1][C@H1][C@H1][Branch1][=Branch2][C][C][C@][Ring1][=Branch2][Ring1][=Branch1][C][C@@][Branch1][C][C][C][C][C@@H1][Branch1][C][O][C][C][Ring1][Branch2][C][C@H1][Ring1][S][O] targets the protein Farnesol receptor HRR-1. The protein Farnesol receptor HRR-1 is involved in Bile secretion. 
The Bile secretion is modulated by the disease Dubin-Johnson syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/train_2-0.jsonl": "{"text":"The compound O=C(NCc1cc(C(F)(F)F)cc(C(F)(F)F)c1)[C@@H](CCN1CCC2(C=Cc3ccccc32)CC1)c1ccc(F)cc1 targets the protein C-C CKR-5. The protein C-C CKR-5 is involved in Cytokine-cytokine receptor interaction. The Cytokine-cytokine receptor interaction is modulated by the disease Kowarski syndrome."} {"text":"The compound InChI=1S\/C16H12N6OS\/c23-13(6-10-2-1-3-11-7-17-5-4-12(10)11)21-16-14(19-9-24-16)15-18-8-20-22-15\/h1-5,7-9H,6H2,(H,21,23)(H,18,20,22) targets the protein SAPK1c. The protein SAPK1c is involved in MAPK signaling pathway. The MAPK signaling pathway is modulated by the disease Atelosteogenesis type I and III."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/valid_2-0.jsonl": "{"text":"The compound CC[C@@H](C)[C@H](C(=O)O)N1C[C@H](CN2CCC(c3cc(Cc4ccc(OCC(F)(F)F)cc4)nn3CC)CC2)[C@@H](c2cccc(F)c2)C1 targets the protein CCR-5. The protein CCR-5 is involved in Cytokine-cytokine receptor interaction. The Cytokine-cytokine receptor interaction is modulated by the disease Kowarski syndrome."} {"text":"The compound [O][=C][Branch2][Ring1][N][C][N][C][=Branch1][C][=O][C][=C][N][N][=C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][=C][Ring1][=C][Ring1][=Branch2][N][C][S][C][=C][Branch1][C][Br][C][=Ring1][=Branch1][C][=N][C][=N][NH1][Ring1][Branch1] targets the protein Mitogen-activated protein kinase 8. The protein Mitogen-activated protein kinase 8 is involved in MAPK signaling pathway. 
The MAPK signaling pathway is modulated by the disease Atelosteogenesis type I and III."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/valid_4-0.jsonl": "{"text":"The compound InChI=1S\/C29H31FN2O3\/c1-17-27-24(11-10-21-9-8-19(15-31-21)23-4-2-3-5-26(23)30)22-12-13-32(28(33)18-6-7-18)16-20(22)14-25(27)29(34)35-17\/h2-5,8-11,15,17-18,20,22,24-25,27H,6-7,12-14,16H2,1H3\/b11-10+\/t17-,20-,22-,24+,25-,27+\/m1\/s1 targets the protein Coagulation factor II receptor. The protein Coagulation factor II receptor is involved in Calcium signaling pathway. The Calcium signaling pathway is modulated by the disease Progressive osseous heteroplasia."} {"text":"The compound Ncncncc6ncn5CC=O)ccccnc6 targets the protein Proto-oncogene c-ErbB-1. The protein Proto-oncogene c-ErbB-1 is involved in Focal adhesion. The Focal adhesion is modulated by the disease Stickler syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/train_0-0.jsonl": "{"text":"The compound InChI=1S\/C7H15N3O3S\/c1-5(8)10-2-3-14(13)4-6(9)7(11)12\/h6H,2-4,9H2,1H3,(H2,8,10)(H,11,12)\/t6-,14?\/m1\/s1 targets the protein Nitric oxide synthase, brain. The protein Nitric oxide synthase, brain is involved in Arginine and proline metabolism. The Arginine and proline metabolism is modulated by the disease Snyder-Robinson syndrome."} {"text":"The compound [C][O][C][=C][C][Branch1][=Branch2][N][C][C][O][C][C][Ring1][=Branch1][=C][C][=C][Ring1][N][N][C][=N][C][=C][Branch1][C][Cl][C][Branch1][#C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Branch1][C][N][=O][=N][Ring1][P] targets the protein SAPK1a. The protein SAPK1a is involved in Protein processing in endoplasmic reticulum. The Protein processing in endoplasmic reticulum is modulated by the disease Leprosy."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/train_8-0.jsonl": "{"text":"The compound CNCCCCCcccccc6NcccccF)c6))))))C%10=O targets the protein Solute carrier family 6 member 2. 
The protein Solute carrier family 6 member 2 is involved in Synaptic vesicle cycle. The Synaptic vesicle cycle is modulated by the disease Orthostatic intolerance."} {"text":"The compound InChI=1S\/C4H10N2S\/c1-3(2)7-4(5)6\/h3H,1-2H3,(H3,5,6) targets the protein NOS type I. The protein NOS type I is involved in Arginine biosynthesis. The Arginine biosynthesis is modulated by the disease Hyperargininemia."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/train_5-0.jsonl": "{"text":"The compound [O][=C][Branch2][Branch1][#Branch2][O][C@H1][C][N][C@H1][Branch2][Ring2][=C][C][#C][C][=C][C][=N][C][=N][C][Branch2][Ring1][#Branch2][N][C][=C][C][=C][C][Branch1][#C][C][=C][N][Ring1][Branch1][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring1][S][=C][Ring2][Ring1][#Branch1][S][Ring2][Ring1][#Branch2][C][Ring2][Ring1][P][N][C][C][O][C][C][Ring1][=Branch1] targets the protein Proto-oncogene c-ErbB-1. The protein Proto-oncogene c-ErbB-1 is involved in Focal adhesion. The Focal adhesion is modulated by the disease Stickler syndrome."} {"text":"The compound Cn1c(-c2ccccc2)c(-c2cn(CCC(=O)Nc3ccc(-c4ccccc4)cc3)nn2)c2cc(C(=O)O)c(O)cc21 targets the protein SH-PTP1. The protein SH-PTP1 is involved in Natural killer cell mediated cytotoxicity. The Natural killer cell mediated cytotoxicity is modulated by the disease Cherubism."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/test_1-0.jsonl": "{"text":"The compound [C][C][N][C][=Branch1][C][=O][N][C][C][C][Branch2][Ring1][=Branch2][N][C][=N][C][=C][C][Branch1][=C][C][=C][N][=C][C][=C][C][=C][N][Ring1][=Branch2][Ring1][=Branch1][=N][Ring1][#C][C][C][Ring2][Ring1][=Branch1] targets the protein Stress-activated protein kinase 1a. The protein Stress-activated protein kinase 1a is involved in Protein processing in endoplasmic reticulum. 
The Protein processing in endoplasmic reticulum is modulated by the disease Leprosy."} {"text":"The compound InChI=1S\/C29H32N2O3S2\/c32-35-22-29(27-13-7-8-14-28(27)35)15-17-30(18-16-29)19-24-20-31(21-26(24)23-9-3-1-4-10-23)36(33,34)25-11-5-2-6-12-25\/h1-14,24,26H,15-22H2\/t24-,26-,35?\/m1\/s1 targets the protein CHEMR13. The protein CHEMR13 is involved in Cytokine-cytokine receptor interaction. The Cytokine-cytokine receptor interaction is modulated by the disease Kowarski syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/valid_7-0.jsonl": "{"text":"The compound CCOC(C)(C)[C@H](O)[C@@H](O)C[C@@H](C)C1=C2C[C@H](O)[C@H]3[C@@]4(C)CCC(=O)C(C)(C)[C@@H]4CC[C@]3(C)[C@@]2(C)CC1 targets the protein Bile acid receptor. The protein Bile acid receptor is involved in Bile secretion. The Bile secretion is modulated by the disease Dubin-Johnson syndrome."} {"text":"The compound CNC[C@@H](O)CN1c2ccccc2CCc2ccccc21 targets the protein NET. The protein NET is involved in Synaptic vesicle cycle. The Synaptic vesicle cycle is modulated by the disease Orthostatic intolerance."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/valid_8-0.jsonl": "{"text":"The compound CCOc1ccc(-c2cc(C3CC(C)(C)OC(C)(C)C3)nn2-c2ccccc2OC)cn1 targets the protein Calcium channel, L type, alpha-1 polypeptide isoform 5. The protein Calcium channel, L type, alpha-1 polypeptide isoform 5 is involved in Synaptic vesicle cycle. The Synaptic vesicle cycle is modulated by the disease Orthostatic intolerance."} {"text":"The compound CN1CCCC1CCN1C(=O)CCc2cc(NC(=N)c3cccs3)ccc21.Cl.Cl targets the protein Constitutive NOS. The protein Constitutive NOS is involved in Arginine biosynthesis. 
The Arginine biosynthesis is modulated by the disease Hyperargininemia."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/valid_1-0.jsonl": "{"text":"The compound InChI=1S\/C24H30ClN7O\/c1-16-6-12-32(13-7-16)30-24(33)31-10-8-17(9-11-31)28-23-27-15-20(25)22(29-23)19-14-26-21-5-3-2-4-18(19)21\/h2-5,14-17,26H,6-13H2,1H3,(H,30,33)(H,27,28,29) targets the protein MAP kinase 9. The protein MAP kinase 9 is involved in Protein processing in endoplasmic reticulum. The Protein processing in endoplasmic reticulum is modulated by the disease Leprosy."} {"text":"The compound CC[C@H](C)[C@H](C(=O)O)N1C[C@H](CN2CCC(c3cc(Cc4ccc(C(C)(C)C)cc4)nn3CC)CC2)[C@@H](c2cccc(F)c2)C1 targets the protein CCR-5. The protein CCR-5 is involved in Cytokine-cytokine receptor interaction. The Cytokine-cytokine receptor interaction is modulated by the disease Kowarski syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/train_6-0.jsonl": "{"text":"The compound COcccccC#Ccc-cccccc6))))))occcO)cC=O)O))cc96)))))))))))c6 targets the protein Tyrosine-protein phosphatase non-receptor type 6. The protein Tyrosine-protein phosphatase non-receptor type 6 is involved in Natural killer cell mediated cytotoxicity. The Natural killer cell mediated cytotoxicity is modulated by the disease Cherubism."} {"text":"The compound [C][C][Branch1][C][C][C][=Branch1][C][=O][N][C][C][C][Branch2][Branch1][=Branch2][C][C@@H1][Branch2][Ring1][C][C][=N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][Ring1][#Branch2][C][N][C][=Branch1][C][=O][C][N][Branch2][Ring1][Ring2][C][=C][C][=C][Branch1][#Branch2][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][=N][C][Ring2][Ring1][Ring1][=O][C][C][Ring2][Ring2][#Branch1] targets the protein Farnesoid X-activated receptor. The protein Farnesoid X-activated receptor is involved in Bile secretion. 
The Bile secretion is modulated by the disease Dubin-Johnson syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/test_8-0.jsonl": "{"text":"The compound CNC)CCCNcccccc6Scccccc6%14 targets the protein Solute carrier family 6 member 2. The protein Solute carrier family 6 member 2 is involved in Synaptic vesicle cycle. The Synaptic vesicle cycle is modulated by the disease Orthostatic intolerance."} {"text":"The compound InChI=1S\/C9H15N5O3\/c1-3(15)6(16)4-2-11-7-5(12-4)8(17)14-9(10)13-7\/h3-4,6,12,15-16H,2H2,1H3,(H4,10,11,13,14,17)\/t3-,4+,6-\/m0\/s1 targets the protein nNOS. The protein nNOS is involved in Arginine biosynthesis. The Arginine biosynthesis is modulated by the disease Hyperargininemia."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/valid_6-0.jsonl": "{"text":"The compound CSCC[C@H](NC(=O)[C@H](Cc1ccc(O)cc1)NC(=O)[C@@H](N)Cc1ccccc1)C(=O)N[C@@H](Cc1ccc(OP(=O)(O)O)cc1)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(C)C)C(=O)NCC(=O)N[C@@H](CCC(=O)O)C(=O)O targets the protein Protein-tyrosine phosphatase 1C. The protein Protein-tyrosine phosphatase 1C is involved in Natural killer cell mediated cytotoxicity. The Natural killer cell mediated cytotoxicity is modulated by the disease Cherubism."} {"text":"The compound [C][C][Branch1][C][C][Branch1][C][C][C][=C][C][=C][Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][C][Branch1][#Branch1][C][C][=Branch1][C][=O][O][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][Ring1] targets the protein Retinoid X receptor-interacting protein 14. The protein Retinoid X receptor-interacting protein 14 is involved in Bile secretion. The Bile secretion is modulated by the disease Dubin-Johnson syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/train_3-0.jsonl": "{"text":"The compound O=C(NC1CCNCC1)c1cccc(-c2n[nH]c3cc(Nc4ccccc4Cl)ccc23)c1 targets the protein JNK-46. The protein JNK-46 is involved in MAPK signaling pathway. 
The MAPK signaling pathway is modulated by the disease Atelosteogenesis type I and III."} {"text":"The compound [C][O][C][=Branch1][C][=O][C@@][C][C][Branch1][C][F][Branch1][C][F][C@@H1][Branch1][C][C][C@H1][Branch2][Ring1][Branch2][\/C][=C][\/C][=C][C][=C][Branch1][N][C][=C][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1][C][=N][Ring1][=N][C][Ring2][Ring1][Branch2][C@@H1][Branch1][C][C][O][C][Ring2][Ring1][N][=O] targets the protein Proteinase-activated receptor 1. The protein Proteinase-activated receptor 1 is involved in Calcium signaling pathway. The Calcium signaling pathway is modulated by the disease Progressive osseous heteroplasia."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/train_7-0.jsonl": "{"text":"The compound CCC)OC=O)C=CNC=O)ccccF)cc6)))))))CCC)C)cc7[nH]cccccc96 targets the protein Farnesol receptor HRR-1. The protein Farnesol receptor HRR-1 is involved in Bile secretion. The Bile secretion is modulated by the disease Dubin-Johnson syndrome."} {"text":"The compound CNCCCCCcccccc6Ncccccc6F)))))))S%10=O)=O.Cl targets the protein Norepinephrine transporter. The protein Norepinephrine transporter is involved in Synaptic vesicle cycle. The Synaptic vesicle cycle is modulated by the disease Orthostatic intolerance."}", "/scratch/micpie/export/compound_protein_pathway_disease_3/train_4-0.jsonl": "{"text":"The compound C[C@H]1OC(=O)[C@@H]2C[C@@H]3CCCC[C@H]3[C@H](\/C=C\/c3ccc4cc(O)ccc4n3)[C@H]12 targets the protein PAR-1. The protein PAR-1 is involved in Calcium signaling pathway. The Calcium signaling pathway is modulated by the disease Progressive osseous heteroplasia."} {"text":"The compound CCC(=O)O[C@H]1CN[C@H](C#Cc2cc3ncnc(Nc4ccc(OCc5cccc(F)c5)c(Cl)c4)c3s2)C1 targets the protein Proto-oncogene c-ErbB-1. The protein Proto-oncogene c-ErbB-1 is involved in Focal adhesion. 
The Focal adhesion is modulated by the disease Stickler syndrome."}", "/scratch/micpie/export/bio_ner_16/valid_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: These included the already mentioned vjbR, but also motB (BAB2 _ 1103), malK (BAB1 _ 0241), norC (BAB2 _ 0955), oppA (BAB1 _ 1601), aspB (BAB1 _ 1397), mosA (BAB1 _ 0666) and three genes encoding hypothetical proteins (BAB1 _ 1717, BAB1 _ 0597 y BAB2 _ 1127)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: vjbR,37,41,Protein\nmotB,52,56,Protein\nBAB2 _ 1103,59,70,Protein\nmalK,73,77,Protein\nBAB1 _ 0241,80,91,Protein\nnorC,94,98,Protein\nBAB2 _ 0955,101,112,Protein\noppA,115,119,Protein\nBAB1 _ 1601,122,133,Protein\naspB,136,140,Protein\nBAB1 _ 1397,143,154,Protein\nmosA,157,161,Protein\nBAB1 _ 0666,164,175,Protein\nBAB1 _ 1717,226,237,Protein\nBAB1 _ 0597,239,250,Protein\nBAB2 _ 1127,253,264,Protein"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Our results demonstrated that: (i) PGN induced phosphorylation of the transcription factors ATF-1 and CREB; (ii) ATF-1 and CREB bound DNA as a dimer and induced transcriptional activation of a CRE reporter plasmid, which was inhibited by dominant negative CREB and ATF-1; (iii) PGN induced phosphorylation of c-Jun, protein synthesis of JunB and c-Fos, and transcriptional activation of the AP-1 reporter plasmid, which was inhibited by dominant negative c-Fos; and (iv) PGN-induced activation of CREB\/ATF and AP-1 was mediated through CD14..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: transcription factors,71,92,Gene\/Protein\nATF - 1,93,100,Gene\/Protein\nCREB,105,109,Gene\/Protein\nATF - 
1,117,124,Gene\/Protein\nCREB,129,133,Gene\/Protein\nCRE reporter plasmid,199,219,Gene\/Protein\nCREB,262,266,Gene\/Protein\nATF - 1,271,278,Gene\/Protein\nc - Jun,318,325,Gene\/Protein\nJunB,348,352,Gene\/Protein\nc - Fos,357,364,Gene\/Protein\nAP - 1 reporter plasmid,404,427,Gene\/Protein\nc - Fos,470,477,Gene\/Protein\nCREB \/ ATF,517,527,Gene\/Protein\nAP - 1,532,538,Gene\/Protein\nCD14,560,564,Gene\/Protein"}", "/scratch/micpie/export/bio_ner_16/test_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Thioxodipeptides Gly-thio-Lys (GtK), Ala-thio-Lys (AtK), and Ala-thio-Arg (AtR) in which the amide group has been modified to a thioxoamide were made into dications by electrospray ionization and converted to cation-radicals, (GtK+ 2H) (+ *), (AtK+ 2H) (+ *), and (AtR+ 2H) (+ *), by electron transfer dissociation (ETD) tandem mass spectrometry using fluoranthene anion-radical as an electron donor..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: Thioxodipeptides,0,16,Chemical\/Drug\nGly - thio - Lys,17,33,Chemical\/Drug\nGtK,36,39,Chemical\/Drug\nAla - thio - Lys,42,58,Chemical\/Drug\nAtK,61,64,Chemical\/Drug\nAla - thio - Arg,71,87,Chemical\/Drug\nAtR,90,93,Chemical\/Drug\namide,108,113,Chemical\/Drug\nthioxoamide,143,154,Chemical\/Drug\nGtK,245,248,Chemical\/Drug\n2H,251,253,Chemical\/Drug\nAtK,265,268,Chemical\/Drug\n2H,271,273,Chemical\/Drug\nAtR,289,292,Chemical\/Drug\n2H,295,297,Chemical\/Drug\nfluoranthene anion - radical,379,407,Chemical\/Drug"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Our results demonstrated that: (i) PGN induced phosphorylation of the transcription factors ATF-1 and CREB; (ii) ATF-1 and CREB bound DNA as a dimer and induced transcriptional activation of a CRE reporter plasmid, which was inhibited by dominant 
negative CREB and ATF-1; (iii) PGN induced phosphorylation of c-Jun, protein synthesis of JunB and c-Fos, and transcriptional activation of the AP-1 reporter plasmid, which was inhibited by dominant negative c-Fos; and (iv) PGN-induced activation of CREB\/ATF and AP-1 was mediated through CD14..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: transcription factors,71,92,Gene\/Protein\nATF - 1,93,100,Gene\/Protein\nCREB,105,109,Gene\/Protein\nATF - 1,117,124,Gene\/Protein\nCREB,129,133,Gene\/Protein\nCRE reporter plasmid,199,219,Gene\/Protein\nCREB,262,266,Gene\/Protein\nATF - 1,271,278,Gene\/Protein\nc - Jun,318,325,Gene\/Protein\nJunB,348,352,Gene\/Protein\nc - Fos,357,364,Gene\/Protein\nAP - 1 reporter plasmid,404,427,Gene\/Protein\nc - Fos,470,477,Gene\/Protein\nCREB \/ ATF,517,527,Gene\/Protein\nAP - 1,532,538,Gene\/Protein\nCD14,560,564,Gene\/Protein"}", "/scratch/micpie/export/bio_ner_16/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: We present CCSD (T) interaction energies and the bonding analysis for complexes of Cu, Ag, and Au with the lone-pair ligands, H2O, OF2, OMe2, NH3, NF3, NMe3, H2S, SF2, SMe2, PH3, PF3, PCl3, PMe3 (ML complexes)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: Cu,84,86,Chemical\/Drug\nAg,88,90,Chemical\/Drug\nAu,96,98,Chemical\/Drug\nH2O,129,132,Chemical\/Drug\nOF2,134,137,Chemical\/Drug\nOMe2,139,143,Chemical\/Drug\nNH3,145,148,Chemical\/Drug\nNF3,150,153,Chemical\/Drug\nNMe3,155,159,Chemical\/Drug\nH2S,161,164,Chemical\/Drug\nSF2,166,169,Chemical\/Drug\nSMe2,171,175,Chemical\/Drug\nPH3,177,180,Chemical\/Drug\nPF3,182,185,Chemical\/Drug\nPCl3,187,191,Chemical\/Drug\nPMe3,193,197,Chemical\/Drug"} {"text":"Task: Please carry out the named 
entity recognition (NER) task for the the text below.\nText: (% (n\/N)) 42.5 (31\/73HPV infection (% (n\/N)) Any HPV type42.5 (37\/87) Any high-risk HPV type34.5 (30\/87) Cervical cytology (% (n\/N)) Normal75.6 (62\/82) ASCUS13.4 (11\/82) LSIL11.0 (9\/82) Experienced vaginal discharge in last 6 months (% (n\/N)) 15.4 (12\/78) Experienced genital ulceration in last 6 months (% (n\/N)) 2.3 (2\/87) Findings suggestive of BV on Papanicolaou smear (% (n\/N)) 43.7 (38\/87) Cigarette use (% (n\/N)) Never smoked64.4 (56\/87) Ex-smoker3.4 (3\/87) Current smoker32.2 (28\/87) Abbreviations: HPV human papillomavirus, ASCUS atypical cells of undetermined significance, LSIL low-grade squamous intraepithelial lesion, BV bacterial vaginosis..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: HPV,28,31,state\nhigh - risk HPV,88,103,state\nCervical cytology,125,142,state\nvaginal discharge,233,250,state\ngenital ulceration,310,328,state\nBV,397,399,state\nNever,479,484,state\nEx -,508,512,state\nCurrent,534,541,state\nHPV,580,583,state\nASCUS,606,611,state\natypical cells of undetermined significance,612,655,state\nLSIL,657,661,state\nlow - grade squamous intraepithelial lesion,662,705,state\nBV,707,709,state\nbacterial vaginosis,710,729,state"}", "/scratch/micpie/export/solubility_aqsoldb/test_0-10.jsonl": "{"text":"Task: Please generate a molecule SMILES based on the text description.\nDescription: A molecule that has a aqueous solubility (at room temperature) of -2.177 log(mol\/L).\nResult: O=Cc1ccc(Cl)cc1"} {"text":"Task: Please generate a molecule canonical SMILES based on the description below.\nDescription: A molecule that has a water-solubility at room temperature of -0.391 log(mol\/L).\nResult: O=C([O-])c1ccccc1C(=O)O.[K+]"}", "/scratch/micpie/export/solubility_aqsoldb/valid_0-8.jsonl": "{"text":"Task: Please predict a molecule feature based on the 
description.\nDescription: Predict the solubility in water (at room temperature) in log(mol\/L).\nDeepSMILES: CCCCCCCCC=O)O))))C5\nConstraint: Even if you are not sure, you must answer with a numeric value in log(mol\/L) without using any additional words.\nResult: -3.286 log(mol\/L)"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the solubility in water (at room temperature) in log(mol\/L).\ncanonical SMILES: Fc1ccc(Cc2ccc(F)cc2)cc1\nConstraint: Even if you are uncertain, you must answer with a numeric value in log(mol\/L) without using any other words.\nResult: -6.912 log(mol\/L)"}", "/scratch/micpie/export/solubility_aqsoldb/test_0-15.jsonl": "{"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should have a aqueous solubility (at room temperature) of -2.177 log(mol\/L).\nAssistant: Understood, this canonical SMILES represents a molecule that has a aqueous solubility (at room temperature) of -2.177 log(mol\/L): O=Cc1ccc(Cl)cc1"} {"text":"User: I want to come up with a InChI.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should have a solubility in water (at room temperature) of -0.391 log(mol\/L).\nAssistant: Understood, this InChI represents a molecule that has a solubility in water (at room temperature) of -0.391 log(mol\/L): InChI=1S\/C8H6O4.K\/c9-7(10)5-3-1-2-4-6(5)8(11)12;\/h1-4H,(H,9,10)(H,11,12);\/q;+1\/p-1"}", "/scratch/micpie/export/solubility_aqsoldb/train_0-8.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the water-solubility at room temperature in log(mol\/L).\nInChI: InChI=1S\/C11H7NO\/c13-11-8-5-1-3-7-4-2-6-9(12-11)10(7)8\/h1-6H,(H,12,13)\nConstraint: Even if you are not sure, you must answer with a numeric value in log(mol\/L) without using any additional words.\nResult: -3.255 log(mol\/L)"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the water-solubility at room temperature in log(mol\/L).\ncanonical SMILES: C\/C=C\/C(=O)C1C(C)C=CCC1(C)C\nConstraint: Even if you are uncertain, you must answer with a numeric value in log(mol\/L) without using any additional words.\nResult: -3.396 log(mol\/L)"}", "/scratch/micpie/export/solubility_aqsoldb/test_0-5.jsonl": "{"text":"Based on the SMILES representation of O=Cc1ccc(Cl)cc1, the molecule has a solubility in water (at room temperature) of -2.177 log(mol\/L)."} {"text":"Based on the SELFIES representation of [O][=C][Branch1][C][O-1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][O].[K+1], the molecule has a aqueous solubility (at room temperature) of -0.391 log(mol\/L)."}", "/scratch/micpie/export/solubility_aqsoldb/valid_0-9.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the water-solubility at room temperature in log(mol\/L).\nMolecule canonical SMILES: CCC1CCC(CCC(=O)O)C1\nConstraint: Even if you are uncertain, you must answer with a numeric value in log(mol\/L) without the unit and without 
using any other words.\nResult: -3.286"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the water-solubility at room temperature in log(mol\/L).\nMolecule SELFIES: [F][C][=C][C][=C][Branch1][=N][C][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][=C][Ring1][=C]\nConstraint: Even if you are not sure, you must answer with a numeric value in log(mol\/L) without the unit and without using any other words.\nResult: -6.912"}", "/scratch/micpie/export/solubility_aqsoldb/test_0-1.jsonl": "{"text":"The aqueous solubility (at room temperature) of the compound with the DeepSMILES O=CccccCl)cc6 is -2.177 log(mol\/L)."} {"text":"The water-solubility at room temperature of the compound with the SELFIES [O][=C][Branch1][C][O-1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][O].[K+1] is -0.391 log(mol\/L)."}", "/scratch/micpie/export/solubility_aqsoldb/valid_0-0.jsonl": "{"text":"The name of the molecule with the SMILES string CCCCCCCCC=O)O))))C5 is 3-(3-ethylcyclopentyl)propanoic acid."} {"text":"The name of the chemical with the InChI=1S\/C13H10F2\/c14-12-5-1-10(2-6-12)9-11-3-7-13(15)8-4-11\/h1-8H,9H2 is 1,1'-methylenebis(4-fluorobenzene)."}", "/scratch/micpie/export/solubility_aqsoldb/test_0-2.jsonl": "{"text":"The water-solubility at room temperature of the compound with the drug name 4-chlorobenzaldehyde is -2.177 log(mol\/L)."} {"text":"The water-solubility at room temperature of the chemical with the generic drug name potassium hydrogen benzene-1,2-dicarboxylate is -0.391 log(mol\/L)."}", "/scratch/micpie/export/solubility_aqsoldb/valid_0-10.jsonl": "{"text":"Task: Please give me a molecule canonical SMILES based on the text description below.\nDescription: A molecule that has a solubility in water (at room temperature) of -3.286 log(mol\/L).\nResult: CCC1CCC(CCC(=O)O)C1"} {"text":"Task: Please give me a molecule InChI based on the text description.\nDescription: A molecule that has a solubility in water 
(at room temperature) of -6.912 log(mol\/L).\nResult: InChI=1S\/C13H10F2\/c14-12-5-1-10(2-6-12)9-11-3-7-13(15)8-4-11\/h1-8H,9H2"}", "/scratch/micpie/export/solubility_aqsoldb/train_0-6.jsonl": "{"text":"The SELFIES [O][=C][N][C][=C][C][=C][C][=C][C][=C][C][Ring1][O][=C][Ring1][#Branch2][Ring1][=Branch1] represents a molecule that has a solubility in water (at room temperature) of -3.255 log(mol\/L)."} {"text":"The SMILES C\/C=C\/C(=O)C1C(C)C=CCC1(C)C is representing a molecule that has a aqueous solubility (at room temperature) of -3.396 log(mol\/L)."}", "/scratch/micpie/export/solubility_aqsoldb/valid_0-6.jsonl": "{"text":"The canonical SMILES CCC1CCC(CCC(=O)O)C1 is representing a molecule that has a water-solubility at room temperature of -3.286 log(mol\/L)."} {"text":"The DeepSMILES FccccCccccF)cc6)))))))cc6 represents a molecule that has a water-solubility at room temperature of -6.912 log(mol\/L)."}", "/scratch/micpie/export/solubility_aqsoldb/test_0-9.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the solubility in water (at room temperature) in log(mol\/L).\nMolecule DeepSMILES: O=CccccCl)cc6\nConstraint: Even if you are uncertain, you must answer with a numeric value in log(mol\/L) without the unit and without using any other words.\nResult: -2.177"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the solubility in water (at room temperature) in log(mol\/L).\nMolecule SELFIES: [O][=C][Branch1][C][O-1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][O].[K+1]\nConstraint: Even if you are uncertain, you must answer with a numeric value in log(mol\/L) without the unit and without using any other words.\nResult: -0.391"}", "/scratch/micpie/export/solubility_aqsoldb/test_0-0.jsonl": "{"text":"The name of the molecule with the SMILES [O][=C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1] is 4-chlorobenzaldehyde."} {"text":"The name of the 
molecule with the SMILES O=C([O-])c1ccccc1C(=O)O.[K+] is potassium hydrogen benzene-1,2-dicarboxylate."}", "/scratch/micpie/export/solubility_aqsoldb/valid_0-7.jsonl": "{"text":"The molecule with the DeepSMILES CCCCCCCCC=O)O))))C5 has a water-solubility at room temperature of -3.286 log(mol\/L)."} {"text":"The molecule with the canonical SMILES Fc1ccc(Cc2ccc(F)cc2)cc1 has a water-solubility at room temperature of -6.912 log(mol\/L)."}", "/scratch/micpie/export/solubility_aqsoldb/test_0-3.jsonl": "{"text":"User: I want to identify a molecule with a particular water-solubility at room temperature.\nAssistant: That sounds interesting, I would need to know the water-solubility at room temperature of the molecule you want to identify.\nUser: The water-solubility at room temperature should be -2.177 log(mol\/L).\nAssistant: I recommend the molecule with the InChI InChI=1S\/C7H5ClO\/c8-7-3-1-6(5-9)2-4-7\/h1-5H. \nUser: Yes, I would like to know the generic drug name of the molecule.\nAssistant: The generic drug name of the molecule with the InChI InChI=1S\/C7H5ClO\/c8-7-3-1-6(5-9)2-4-7\/h1-5H is 4-chlorobenzaldehyde."} {"text":"User: I want to find a compound with a particular water-solubility at room temperature.\nAssistant: Great, I would need to know the water-solubility at room temperature of the compound you want to find.\nUser: The water-solubility at room temperature should be -0.391 log(mol\/L).\nAssistant: I advise the compound with the SMILES O=C([O-])c1ccccc1C(=O)O.[K+]. 
Is there anything else I can do for you?\nUser: Yes, I would like to know the compound name of the compound.\nAssistant: The compound name of the compound with the SMILES O=C([O-])c1ccccc1C(=O)O.[K+] is potassium hydrogen benzene-1,2-dicarboxylate."}", "/scratch/micpie/export/solubility_aqsoldb/valid_0-11.jsonl": "{"text":"User: Can you tell me the water-solubility at room temperature in log(mol\/L) of the molecule with the SELFIES [C][C][C][C][C][C][Branch1][Branch2][C][C][C][=Branch1][C][=O][O][C][Ring1][#Branch2]?\nAssistant: Yes, I'm happy to help, this molecule has a water-solubility at room temperature of -3.286 log(mol\/L)."} {"text":"User: Can you tell me the solubility in water (at room temperature) in log(mol\/L) of the molecule with the DeepSMILES FccccCccccF)cc6)))))))cc6?\nAssistant: Sure, this molecule has a solubility in water (at room temperature) of -6.912 log(mol\/L)."}", "/scratch/micpie/export/solubility_aqsoldb/train_0-0.jsonl": "{"text":"The name of the drug with the SMILES string O=CNccccccccc%11c%106 is Benzo[cd]indol-2(1H)-one."} {"text":"The name of the compound with the SMILES string [C][\/C][=C][\/C][=Branch1][C][=O][C][C][Branch1][C][C][C][=C][C][C][Ring1][#Branch1][Branch1][C][C][C] is (2E)-1-(2,6,6-trimethylcyclohex-3-en-1-yl)but-2-en-1-one."}", "/scratch/micpie/export/solubility_aqsoldb/test_0-6.jsonl": "{"text":"The SMILES O=Cc1ccc(Cl)cc1 represents a molecule with a water-solubility at room temperature of -2.177 log(mol\/L)."} {"text":"The DeepSMILES O=C[O-])cccccc6C=O)O.[K+] represents a molecule that has a solubility in water (at room temperature) of -0.391 log(mol\/L)."}", "/scratch/micpie/export/solubility_aqsoldb/train_0-10.jsonl": "{"text":"Task: Please give me a SMILES based on the description below.\nDescription: A molecule that has a water-solubility at room temperature of -3.255 log(mol\/L).\nResult: O=C1Nc2cccc3cccc1c23"} {"text":"Task: Please give me a molecule InChI based on the description below.\nDescription: A 
molecule that has a water-solubility at room temperature of -3.396 log(mol\/L).\nResult: InChI=1S\/C13H20O\/c1-5-7-11(14)12-10(2)8-6-9-13(12,3)4\/h5-8,10,12H,9H2,1-4H3\/b7-5+"}", "/scratch/micpie/export/solubility_aqsoldb/train_0-3.jsonl": "{"text":"User: I want to find a compound with a particular solubility in water (at room temperature).\nAssistant: Cool, I would need to know the solubility in water (at room temperature) of the compound you want to find.\nUser: The solubility in water (at room temperature) should be -3.255 log(mol\/L).\nAssistant: I recommend the compound with the InChI InChI=1S\/C11H7NO\/c13-11-8-5-1-3-7-4-2-6-9(12-11)10(7)8\/h1-6H,(H,12,13). Do you need anything else?\nUser: I would like to know the compound name of the compound.\nAssistant: The compound name of the compound with the InChI InChI=1S\/C11H7NO\/c13-11-8-5-1-3-7-4-2-6-9(12-11)10(7)8\/h1-6H,(H,12,13) is Benzo[cd]indol-2(1H)-one."} {"text":"User: I want to find a molecule with a particular solubility in water (at room temperature).\nAssistant: That sounds interesting, I would need to know the solubility in water (at room temperature) of the molecule you want to find.\nUser: The solubility in water (at room temperature) should be -3.396 log(mol\/L).\nAssistant: I recommend the molecule with the SMILES C\/C=C\/C(=O)C1C(C)C=CCC1(C)C. 
\nUser: I would like to know the generic drug name of the molecule.\nAssistant: The generic drug name of the molecule with the SMILES C\/C=C\/C(=O)C1C(C)C=CCC1(C)C is (2E)-1-(2,6,6-trimethylcyclohex-3-en-1-yl)but-2-en-1-one."}", "/scratch/micpie/export/solubility_aqsoldb/train_0-12.jsonl": "{"text":"User: Can you create the DeepSMILES of a molecule that has a solubility in water (at room temperature) of -3.255 log(mol\/L)?\nAssistant: Yes, here you go: O=CNccccccccc%11c%106"} {"text":"User: Can you generate the DeepSMILES of a molecule that has a aqueous solubility (at room temperature) of -3.396 log(mol\/L)?\nAssistant: Yes, here you go: C\/C=C\/C=O)CCC)C=CCC6C)C"}", "/scratch/micpie/export/solubility_aqsoldb/test_0-13.jsonl": "{"text":"User: I'm looking for the InChI of a molecule that has a aqueous solubility (at room temperature) of -2.177 log(mol\/L).\nAssistant: This is a molecule that has a aqueous solubility (at room temperature) of -2.177 log(mol\/L): InChI=1S\/C7H5ClO\/c8-7-3-1-6(5-9)2-4-7\/h1-5H"} {"text":"User: I'm looking for the SMILES of a molecule that has a water-solubility at room temperature of -0.391 log(mol\/L).\nAssistant: This is a molecule that has a water-solubility at room temperature of -0.391 log(mol\/L): O=C([O-])c1ccccc1C(=O)O.[K+]"}", "/scratch/micpie/export/solubility_aqsoldb/valid_0-2.jsonl": "{"text":"The solubility in water (at room temperature) of the molecule with the generic drug name 3-(3-ethylcyclopentyl)propanoic acid is -3.286 log(mol\/L)."} {"text":"The aqueous solubility (at room temperature) of the drug with the compound name 1,1'-methylenebis(4-fluorobenzene) is -6.912 log(mol\/L)."}", "/scratch/micpie/export/solubility_aqsoldb/train_0-14.jsonl": "{"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should have a aqueous solubility (at room temperature) of -3.255 log(mol\/L).\nAssistant: Ok, this DeepSMILES represents a molecule that has a aqueous solubility (at room temperature) of -3.255 log(mol\/L): O=CNccccccccc%11c%106"} {"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should have a water-solubility at room temperature of -3.396 log(mol\/L).\nAssistant: Ok, this SELFIES represents a molecule that has a water-solubility at room temperature of -3.396 log(mol\/L): [C][\/C][=C][\/C][=Branch1][C][=O][C][C][Branch1][C][C][C][=C][C][C][Ring1][#Branch1][Branch1][C][C][C]"}", "/scratch/micpie/export/solubility_aqsoldb/valid_0-1.jsonl": "{"text":"The water-solubility at room temperature of the drug with the canonical SMILES CCC1CCC(CCC(=O)O)C1 is -3.286 log(mol\/L)."} {"text":"The aqueous solubility (at room temperature) of the drug with the InChI InChI=1S\/C13H10F2\/c14-12-5-1-10(2-6-12)9-11-3-7-13(15)8-4-11\/h1-8H,9H2 is -6.912 log(mol\/L)."}", "/scratch/micpie/export/solubility_aqsoldb/valid_0-13.jsonl": "{"text":"User: I'm searching for the SELFIES of a molecule that has a solubility in water (at room temperature) of -3.286 log(mol\/L).\nAssistant: This is a molecule that has a solubility in water (at room temperature) of -3.286 log(mol\/L): [C][C][C][C][C][C][Branch1][Branch2][C][C][C][=Branch1][C][=O][O][C][Ring1][#Branch2]"} {"text":"User: I'm looking for the InChI of a molecule that has a aqueous solubility (at room temperature) of -6.912 log(mol\/L).\nAssistant: This is a molecule that has a aqueous solubility (at room temperature) of -6.912 log(mol\/L): InChI=1S\/C13H10F2\/c14-12-5-1-10(2-6-12)9-11-3-7-13(15)8-4-11\/h1-8H,9H2"}", "/scratch/micpie/export/solubility_aqsoldb/valid_0-5.jsonl": "{"text":"Based on the DeepSMILES CCCCCCCCC=O)O))))C5, the molecule has a aqueous solubility (at room temperature) of 
-3.286 log(mol\/L)."} {"text":"Based on the InChI representation of InChI=1S\/C13H10F2\/c14-12-5-1-10(2-6-12)9-11-3-7-13(15)8-4-11\/h1-8H,9H2, the molecule has a solubility in water (at room temperature) of -6.912 log(mol\/L)."}", "/scratch/micpie/export/solubility_aqsoldb/train_0-15.jsonl": "{"text":"User: I want to come up with a SELFIES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should have a water-solubility at room temperature of -3.255 log(mol\/L).\nAssistant: Understood, this SELFIES represents a molecule that has a water-solubility at room temperature of -3.255 log(mol\/L): [O][=C][N][C][=C][C][=C][C][=C][C][=C][C][Ring1][O][=C][Ring1][#Branch2][Ring1][=Branch1]"} {"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should have a aqueous solubility (at room temperature) of -3.396 log(mol\/L).\nAssistant: Understood, this SMILES represents a molecule that has a aqueous solubility (at room temperature) of -3.396 log(mol\/L): C\/C=C\/C(=O)C1C(C)C=CCC1(C)C"}", "/scratch/micpie/export/solubility_aqsoldb/valid_0-4.jsonl": "{"text":"The molecule with the SMILES CCC1CCC(CCC(=O)O)C1 has a solubility in water (at room temperature) of -3.286 log(mol\/L)."} {"text":"The molecule with the canonical SMILES Fc1ccc(Cc2ccc(F)cc2)cc1 has a aqueous solubility (at room temperature) of -6.912 log(mol\/L)."}", "/scratch/micpie/export/solubility_aqsoldb/train_0-5.jsonl": "{"text":"Based on the SELFIES [O][=C][N][C][=C][C][=C][C][=C][C][=C][C][Ring1][O][=C][Ring1][#Branch2][Ring1][=Branch1], the molecule has a aqueous solubility (at room temperature) of -3.255 log(mol\/L)."} {"text":"Based on the DeepSMILES representation of C\/C=C\/C=O)CCC)C=CCC6C)C, the molecule has a aqueous solubility (at room temperature) of -3.396 log(mol\/L)."}", "/scratch/micpie/export/solubility_aqsoldb/valid_0-15.jsonl": "{"text":"User: I want to generate a 
molecule SELFIES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should have a water-solubility at room temperature of -3.286 log(mol\/L).\nAssistant: Understood, this SELFIES represents a molecule that has a water-solubility at room temperature of -3.286 log(mol\/L): [C][C][C][C][C][C][Branch1][Branch2][C][C][C][=Branch1][C][=O][O][C][Ring1][#Branch2]"} {"text":"User: I want to generate a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should have a aqueous solubility (at room temperature) of -6.912 log(mol\/L).\nAssistant: Ok, this canonical SMILES represents a molecule that has a aqueous solubility (at room temperature) of -6.912 log(mol\/L): Fc1ccc(Cc2ccc(F)cc2)cc1"}", "/scratch/micpie/export/solubility_aqsoldb/valid_0-12.jsonl": "{"text":"User: Can you give me the canonical SMILES of a molecule that has a solubility in water (at room temperature) of -3.286 log(mol\/L)?\nAssistant: Of course, here you go: CCC1CCC(CCC(=O)O)C1"} {"text":"User: Can you create the SMILES of a molecule that has a solubility in water (at room temperature) of -6.912 log(mol\/L)?\nAssistant: Yes, I'm happy to help, here you go: Fc1ccc(Cc2ccc(F)cc2)cc1"}", "/scratch/micpie/export/solubility_aqsoldb/train_0-2.jsonl": "{"text":"The aqueous solubility (at room temperature) of the compound with the compound name Benzo[cd]indol-2(1H)-one is -3.255 log(mol\/L)."} {"text":"The aqueous solubility (at room temperature) of the chemical with the generic drug name (2E)-1-(2,6,6-trimethylcyclohex-3-en-1-yl)but-2-en-1-one is -3.396 log(mol\/L)."}", "/scratch/micpie/export/solubility_aqsoldb/test_0-11.jsonl": "{"text":"User: Can you estimate the aqueous solubility (at room temperature) in log(mol\/L) of the molecule with the canonical SMILES O=Cc1ccc(Cl)cc1?\nAssistant: Yes, this molecule has a aqueous solubility (at room temperature) of -2.177 log(mol\/L)."} {"text":"User: 
Can you derive the aqueous solubility (at room temperature) in log(mol\/L) of the molecule with the SMILES O=C([O-])c1ccccc1C(=O)O.[K+]?\nAssistant: Yes, this molecule has a aqueous solubility (at room temperature) of -0.391 log(mol\/L)."}", "/scratch/micpie/export/solubility_aqsoldb/train_0-7.jsonl": "{"text":"The molecule with the SMILES O=C1Nc2cccc3cccc1c23 has a aqueous solubility (at room temperature) of -3.255 log(mol\/L)."} {"text":"The molecule with the canonical SMILES C\/C=C\/C(=O)C1C(C)C=CCC1(C)C has a solubility in water (at room temperature) of -3.396 log(mol\/L)."}", "/scratch/micpie/export/solubility_aqsoldb/train_0-11.jsonl": "{"text":"User: Can you estimate the aqueous solubility (at room temperature) in log(mol\/L) of the molecule with the InChI InChI=1S\/C11H7NO\/c13-11-8-5-1-3-7-4-2-6-9(12-11)10(7)8\/h1-6H,(H,12,13)?\nAssistant: Of course, this molecule has a aqueous solubility (at room temperature) of -3.255 log(mol\/L)."} {"text":"User: Can you estimate the water-solubility at room temperature in log(mol\/L) of the molecule with the SELFIES [C][\/C][=C][\/C][=Branch1][C][=O][C][C][Branch1][C][C][C][=C][C][C][Ring1][#Branch1][Branch1][C][C][C]?\nAssistant: Yes, this molecule has a water-solubility at room temperature of -3.396 log(mol\/L)."}", "/scratch/micpie/export/solubility_aqsoldb/train_0-1.jsonl": "{"text":"The aqueous solubility (at room temperature) of the compound with the InChI InChI=1S\/C11H7NO\/c13-11-8-5-1-3-7-4-2-6-9(12-11)10(7)8\/h1-6H,(H,12,13) is -3.255 log(mol\/L)."} {"text":"The water-solubility at room temperature of the molecule with the SMILES C\/C=C\/C(=O)C1C(C)C=CCC1(C)C is -3.396 log(mol\/L)."}", "/scratch/micpie/export/solubility_aqsoldb/train_0-13.jsonl": "{"text":"User: I'm searching for the InChI of a molecule that has a aqueous solubility (at room temperature) of -3.255 log(mol\/L).\nAssistant: This is a molecule that has a aqueous solubility (at room temperature) of -3.255 log(mol\/L): 
InChI=1S\/C11H7NO\/c13-11-8-5-1-3-7-4-2-6-9(12-11)10(7)8\/h1-6H,(H,12,13)"} {"text":"User: I'm looking for the SELFIES of a molecule that has a water-solubility at room temperature of -3.396 log(mol\/L).\nAssistant: This is a molecule that has a water-solubility at room temperature of -3.396 log(mol\/L): [C][\/C][=C][\/C][=Branch1][C][=O][C][C][Branch1][C][C][C][=C][C][C][Ring1][#Branch1][Branch1][C][C][C]"}", "/scratch/micpie/export/solubility_aqsoldb/train_0-4.jsonl": "{"text":"The molecule with the SELFIES representation of [O][=C][N][C][=C][C][=C][C][=C][C][=C][C][Ring1][O][=C][Ring1][#Branch2][Ring1][=Branch1] has a solubility in water (at room temperature) of -3.255 log(mol\/L)."} {"text":"The molecule with the SMILES representation of C\/C=C\/C(=O)C1C(C)C=CCC1(C)C has a aqueous solubility (at room temperature) of -3.396 log(mol\/L)."}", "/scratch/micpie/export/solubility_aqsoldb/test_0-7.jsonl": "{"text":"The molecule with the DeepSMILES O=CccccCl)cc6 has a aqueous solubility (at room temperature) of -2.177 log(mol\/L)."} {"text":"The molecule with the InChI InChI=1S\/C8H6O4.K\/c9-7(10)5-3-1-2-4-6(5)8(11)12;\/h1-4H,(H,9,10)(H,11,12);\/q;+1\/p-1 has a water-solubility at room temperature of -0.391 log(mol\/L)."}", "/scratch/micpie/export/solubility_aqsoldb/train_0-9.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the solubility in water (at room temperature) in log(mol\/L).\nSMILES: O=C1Nc2cccc3cccc1c23\nConstraint: Even if you are uncertain, you must answer with a numeric value in log(mol\/L) without the unit and without using any other words.\nResult: -3.255"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the water-solubility at room temperature in log(mol\/L).\nSMILES: C\/C=C\/C(=O)C1C(C)C=CCC1(C)C\nConstraint: Even if you are not sure, you must answer with a numeric value in log(mol\/L) without the unit and without using any other words.\nResult: 
-3.396"}", "/scratch/micpie/export/solubility_aqsoldb/valid_0-3.jsonl": "{"text":"User: I want to find a chemical with a particular water-solubility at room temperature.\nAssistant: That sounds interesting, I would need to know the water-solubility at room temperature of the chemical you want to find.\nUser: The water-solubility at room temperature should be -3.286 log(mol\/L).\nAssistant: I recommend the chemical with the SELFIES [C][C][C][C][C][C][Branch1][Branch2][C][C][C][=Branch1][C][=O][O][C][Ring1][#Branch2]. \nUser: I would like to know the generic drug name of the chemical.\nAssistant: The generic drug name of the chemical with the SELFIES [C][C][C][C][C][C][Branch1][Branch2][C][C][C][=Branch1][C][=O][O][C][Ring1][#Branch2] is 3-(3-ethylcyclopentyl)propanoic acid."} {"text":"User: I want to find a compound with a particular aqueous solubility (at room temperature).\nAssistant: Cool, I would need to know the aqueous solubility (at room temperature) of the compound you want to find.\nUser: The aqueous solubility (at room temperature) should be -6.912 log(mol\/L).\nAssistant: I advise the compound with the InChI InChI=1S\/C13H10F2\/c14-12-5-1-10(2-6-12)9-11-3-7-13(15)8-4-11\/h1-8H,9H2. 
Anything else?\nUser: I would like to know the generic drug name of the compound.\nAssistant: The generic drug name of the compound with the InChI InChI=1S\/C13H10F2\/c14-12-5-1-10(2-6-12)9-11-3-7-13(15)8-4-11\/h1-8H,9H2 is 1,1'-methylenebis(4-fluorobenzene)."}", "/scratch/micpie/export/solubility_aqsoldb/test_0-8.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the water-solubility at room temperature in log(mol\/L).\nMolecule canonical SMILES: O=Cc1ccc(Cl)cc1\nConstraint: Even if you are uncertain, you must answer with a numeric value in log(mol\/L) without using any other words.\nResult: -2.177 log(mol\/L)"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the solubility in water (at room temperature) in log(mol\/L).\nMolecule InChI: InChI=1S\/C8H6O4.K\/c9-7(10)5-3-1-2-4-6(5)8(11)12;\/h1-4H,(H,9,10)(H,11,12);\/q;+1\/p-1\nConstraint: Even if you are uncertain, you must answer with a numeric value in log(mol\/L) without using any additional words.\nResult: -0.391 log(mol\/L)"}", "/scratch/micpie/export/solubility_aqsoldb/test_0-14.jsonl": "{"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should have a aqueous solubility (at room temperature) of -2.177 log(mol\/L).\nAssistant: Ok, this SMILES represents a molecule that has a aqueous solubility (at room temperature) of -2.177 log(mol\/L): O=Cc1ccc(Cl)cc1"} {"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should have a water-solubility at room temperature of -0.391 log(mol\/L).\nAssistant: Ok, here you go, this DeepSMILES represents a molecule that has a water-solubility at room temperature of -0.391 log(mol\/L): O=C[O-])cccccc6C=O)O.[K+]"}", "/scratch/micpie/export/solubility_aqsoldb/valid_0-14.jsonl": "{"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should have a water-solubility at room temperature of -3.286 log(mol\/L).\nAssistant: Ok, this SELFIES represents a molecule that has a water-solubility at room temperature of -3.286 log(mol\/L): [C][C][C][C][C][C][Branch1][Branch2][C][C][C][=Branch1][C][=O][O][C][Ring1][#Branch2]"} {"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should have a water-solubility at room temperature of -6.912 log(mol\/L).\nAssistant: Got it, here you go, this DeepSMILES represents a molecule that has a water-solubility at room temperature of -6.912 log(mol\/L): FccccCccccF)cc6)))))))cc6"}", "/scratch/micpie/export/solubility_aqsoldb/test_0-4.jsonl": "{"text":"The molecule with the DeepSMILES representation of O=CccccCl)cc6 has a aqueous solubility (at room temperature) of -2.177 log(mol\/L)."} {"text":"The molecule with the SELFIES [O][=C][Branch1][C][O-1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][O].[K+1] has a aqueous solubility (at room temperature) of -0.391 log(mol\/L)."}", "/scratch/micpie/export/solubility_aqsoldb/test_0-12.jsonl": "{"text":"User: Can you generate the SELFIES of a molecule that has a aqueous solubility (at room temperature) of -2.177 log(mol\/L)?\nAssistant: Sure, here you go: [O][=C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1]"} {"text":"User: Can you generate the SELFIES of a molecule that has a solubility 
in water (at room temperature) of -0.391 log(mol\/L)?\nAssistant: Of course, here you go: [O][=C][Branch1][C][O-1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][O].[K+1]"}", "/scratch/micpie/export/mofdscribe/test_0-1.jsonl": "{"text":"Task: Generate a CIF file of a crystal structure with the following description\n(HgCl2)3(C3S2H4)2 is Indium-derived structured and crystallizes in the triclinic P-1 space group. The structure is zero-dimensional and consists of one 1,3-dithiolane, 2-(1,3-dithiolan-2-ylidene)- molecule and one HgCl2 cluster. In the HgCl2 cluster, there are two inequivalent Hg sites. In the first Hg site, Hg(1) is bonded in a distorted square co-planar geometry to two equivalent Cl(1) and two equivalent Cl(3) atoms. Both Hg(1)-Cl(1) bond lengths are 2.37 Å. Both Hg(1)-Cl(3) bond lengths are 2.98 Å. In the second Hg site, Hg(2) is bonded in a linear geometry to one Cl(2) and one Cl(3) atom. The Hg(2)-Cl(2) bond length is 2.37 Å. The Hg(2)-Cl(3) bond length is 2.39 Å. There are three inequivalent Cl sites. In the first Cl site, Cl(1) is bonded in a single-bond geometry to one Hg(1) atom. In the second Cl site, Cl(2) is bonded in a single-bond geometry to one Hg(2) atom. In the third Cl site, Cl(3) is bonded in a distorted water-like geometry to one Hg(1) and one Hg(2) atom. Linkers: 2 [CH]1CSC(=C2SCCS2)S1 ,1 C1CSC(=C2SCCS2)S1. Metal clusters: 3 [Hg]. 
The MOF has largest included sphere 1.80 A, density 3.65 g\/cm3, surface area 2242.82 m2\/g, accessible volume 0.04 cm3\/g.\nAnswer: [CIF]\ndata_Hg3H8C6(S2Cl3)2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.610\n_cell_length_b 7.247\n_cell_length_c 9.920\n_cell_angle_alpha 101.245\n_cell_angle_beta 91.888\n_cell_angle_gamma 91.378\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Hg3H8C6(S2Cl3)2\n_chemical_formula_sum 'Hg3 H8 C6 S4 Cl6'\n_cell_volume 465.602\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Hg Hg0 1 0.000 0.000 0.000 1.0\n Hg Hg1 1 0.507 0.321 0.142 1.0\n Hg Hg2 1 0.493 0.679 0.858 1.0\n H H3 1 0.988 0.830 0.522 1.0\n H H4 1 0.012 0.170 0.478 1.0\n H H5 1 0.232 0.945 0.565 1.0\n H H6 1 0.768 0.055 0.435 1.0\n H H7 1 0.106 0.653 0.301 1.0\n H H8 1 0.894 0.347 0.699 1.0\n H H9 1 0.200 0.890 0.310 1.0\n H H10 1 0.800 0.110 0.690 1.0\n C C11 1 0.435 0.566 0.489 1.0\n C C12 1 0.565 0.434 0.511 1.0\n C C13 1 0.152 0.817 0.510 1.0\n C C14 1 0.848 0.183 0.490 1.0\n C C15 1 0.204 0.767 0.359 1.0\n C C16 1 0.796 0.233 0.641 1.0\n S S17 1 0.464 0.691 0.356 1.0\n S S18 1 0.536 0.309 0.644 1.0\n S S19 1 0.225 0.621 0.589 1.0\n S S20 1 0.775 0.379 0.411 1.0\n Cl Cl21 1 0.732 0.921 0.133 1.0\n Cl Cl22 1 0.268 0.079 0.867 1.0\n Cl Cl23 1 0.771 0.383 0.999 1.0\n Cl Cl24 1 0.229 0.617 0.001 1.0\n Cl Cl25 1 0.218 0.217 0.250 1.0\n Cl Cl26 1 0.782 0.783 0.750 1.0\n[\/CIF]\n"} {"text":"Task: Propose a CIF file of a metal-organic framework with the following description\nCaC6H2SO4 crystallizes in the monoclinic P2_1\/c space group. Ca(1) is bonded in a 5-coordinate geometry to one O(3), two equivalent O(1), and two equivalent O(2) atoms. The Ca(1)-O(3) bond length is 2.36 Å. 
There is one shorter (2.29 Å) and one longer (2.84 Å) Ca(1)-O(1) bond length. There is one shorter (2.35 Å) and one longer (2.44 Å) Ca(1)-O(2) bond length. There are six inequivalent C sites. In the first C site, C(1) is bonded in a distorted bent 120 degrees geometry to one C(6), one O(3), and one O(4) atom. The C(1)-C(6) bond length is 1.50 Å. The C(1)-O(3) bond length is 1.26 Å. The C(1)-O(4) bond length is 1.24 Å. In the second C site, C(2) is bonded in a bent 120 degrees geometry to one C(3), one O(1), and one O(2) atom. The C(2)-C(3) bond length is 1.47 Å. The C(2)-O(1) bond length is 1.26 Å. The C(2)-O(2) bond length is 1.26 Å. In the third C site, C(3) is bonded in a trigonal planar geometry to one C(2), one C(4), and one S(1) atom. The C(3)-C(4) bond length is 1.36 Å. The C(3)-S(1) bond length is 1.72 Å. In the fourth C site, C(4) is bonded in a distorted single-bond geometry to one C(3) and one H(1,2) atom. The C(4)-H(1,2) bond length is 0.93 Å. In the fifth C site, C(5) is bonded in a distorted single-bond geometry to one C(6) and one H(1,2) atom. The C(5)-C(6) bond length is 1.37 Å. The C(5)-H(1,2) bond length is 0.93 Å. In the sixth C site, C(6) is bonded in a trigonal planar geometry to one C(1), one C(5), and one S(1) atom. The C(6)-S(1) bond length is 1.71 Å. H(1,2) is bonded in a single-bond geometry to one C(4) atom. S(1) is bonded in an L-shaped geometry to one C(3) and one C(6) atom. There are four inequivalent O sites. In the first O site, O(1) is bonded in a 2-coordinate geometry to two equivalent Ca(1) and one C(2) atom. In the second O site, O(2) is bonded in a 3-coordinate geometry to two equivalent Ca(1) and one C(2) atom. In the third O site, O(3) is bonded in a bent 120 degrees geometry to one Ca(1) and one C(1) atom. In the fourth O site, O(4) is bonded in a single-bond geometry to one C(1) atom. Linkers: 4 [O]C(=O)c1ccc(C([O])=O)s1. Metal clusters: 4 [Ca]. 
The MOF has largest included sphere 5.44 A, density 1.07 g\/cm3, surface area 4304.84 m2\/g, accessible volume 0.53 cm3\/g.\nAnswer: [CIF]\ndata_CaH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.577\n_cell_length_b 11.123\n_cell_length_c 18.033\n_cell_angle_alpha 90.000\n_cell_angle_beta 97.602\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural CaH2C6SO4\n_chemical_formula_sum 'Ca4 H8 C24 S4 O16'\n_cell_volume 1307.600\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ca Ca0 1 0.253 0.907 0.477 1\n Ca Ca1 1 0.247 0.407 0.023 1\n Ca Ca2 1 0.747 0.093 0.523 1\n Ca Ca3 1 0.753 0.593 0.977 1\n H H4 1 0.615 0.106 0.704 1\n H H5 1 0.582 0.196 0.826 1\n H H6 1 0.885 0.606 0.796 1\n H H7 1 0.918 0.696 0.674 1\n H H8 1 0.385 0.894 0.296 1\n H H9 1 0.418 0.804 0.174 1\n H H10 1 0.115 0.394 0.204 1\n H H11 1 0.082 0.304 0.326 1\n C C12 1 0.195 0.286 0.857 1\n C C13 1 0.276 0.095 0.592 1\n C C14 1 0.313 0.137 0.670 1\n C C15 1 0.493 0.137 0.717 1\n C C16 1 0.474 0.189 0.787 1\n C C17 1 0.279 0.228 0.792 1\n C C18 1 0.305 0.786 0.643 1\n C C19 1 0.225 0.595 0.908 1\n C C20 1 0.187 0.637 0.830 1\n C C21 1 0.007 0.637 0.783 1\n C C22 1 0.026 0.689 0.713 1\n C C23 1 0.221 0.728 0.708 1\n C C24 1 0.805 0.714 0.143 1\n C C25 1 0.725 0.905 0.408 1\n C C26 1 0.687 0.863 0.330 1\n C C27 1 0.507 0.863 0.283 1\n C C28 1 0.526 0.811 0.213 1\n C C29 1 0.721 0.772 0.208 1\n C C30 1 0.695 0.214 0.357 1\n C C31 1 0.775 0.405 0.092 1\n C C32 1 0.813 0.363 0.170 1\n C C33 1 0.993 0.363 0.217 1\n C C34 1 0.974 0.311 0.287 1\n C C35 1 0.779 0.272 0.292 1\n S S36 1 0.117 0.199 0.712 1\n S S37 1 0.383 0.699 0.788 1\n S S38 1 0.883 0.801 0.288 1\n S S39 1 0.617 0.301 0.212 1\n O O40 1 0.096 0.096 0.558 1\n O O41 1 0.425 
0.057 0.562 1\n O O42 1 0.324 0.322 0.910 1\n O O43 1 0.006 0.294 0.852 1\n O O44 1 0.405 0.596 0.942 1\n O O45 1 0.075 0.557 0.938 1\n O O46 1 0.176 0.822 0.590 1\n O O47 1 0.494 0.794 0.648 1\n O O48 1 0.904 0.904 0.442 1\n O O49 1 0.575 0.943 0.438 1\n O O50 1 0.676 0.678 0.090 1\n O O51 1 0.994 0.706 0.148 1\n O O52 1 0.596 0.404 0.058 1\n O O53 1 0.925 0.443 0.062 1\n O O54 1 0.824 0.178 0.410 1\n O O55 1 0.506 0.206 0.352 1\n[\/CIF]\n"}", "/scratch/micpie/export/mofdscribe/valid_0-0.jsonl": "{"text":"Task: Write a description of the structure with the CIF file [CIF]\ndata_MnH4(C2O3)2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.167\n_cell_length_b 7.565\n_cell_length_c 7.814\n_cell_angle_alpha 91.240\n_cell_angle_beta 117.313\n_cell_angle_gamma 118.290\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural MnH4(C2O3)2\n_chemical_formula_sum 'Mn2 H8 C8 O12'\n_cell_volume 315.619\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Mn Mn0 1 0.500 0.000 0.500 1.0\n Mn Mn1 1 0.500 0.500 1.000 1.0\n H H2 1 0.737 0.405 0.446 1.0\n H H3 1 0.387 0.095 0.054 1.0\n H H4 1 0.263 0.595 0.554 1.0\n H H5 1 0.613 0.905 0.946 1.0\n H H6 1 0.917 0.366 0.641 1.0\n H H7 1 0.410 0.134 0.859 1.0\n H H8 1 0.083 0.634 0.359 1.0\n H H9 1 0.590 0.866 0.141 1.0\n C C10 1 0.034 0.770 0.050 1.0\n C C11 1 0.714 0.730 0.450 1.0\n C C12 1 0.966 0.230 0.950 1.0\n C C13 1 0.286 0.270 0.550 1.0\n C C14 1 0.928 0.754 0.839 1.0\n C C15 1 0.835 0.746 0.661 1.0\n C C16 1 0.072 0.246 0.161 1.0\n C C17 1 0.165 0.254 0.339 1.0\n O O18 1 0.765 0.297 0.503 1.0\n O O19 1 0.465 0.203 0.997 1.0\n O O20 1 0.235 0.703 0.497 1.0\n O O21 1 0.535 0.797 0.003 1.0\n O O22 1 0.260 0.924 0.171 1.0\n O O23 1 0.665 0.576 0.329 1.0\n O O24 1 0.740 0.076 0.829 1.0\n O O25 1 
0.335 0.424 0.671 1.0\n O O26 1 0.898 0.633 0.102 1.0\n O O27 1 0.664 0.867 0.398 1.0\n O O28 1 0.102 0.367 0.898 1.0\n O O29 1 0.336 0.133 0.602 1.0\n[\/CIF]\n.\nA: MnH4(C2O3)2 crystallizes in the monoclinic C2\/c space group. Mn(1) is bonded in an octahedral geometry to two equivalent O(1), two equivalent O(2), and two equivalent O(3) atoms. Both Mn(1)-O(1) bond lengths are 2.14 Å. Both Mn(1)-O(2) bond lengths are 2.20 Å. Both Mn(1)-O(3) bond lengths are 2.22 Å. There are two inequivalent C sites. In the first C site, C(1) is bonded in a distorted bent 120 degrees geometry to one C(2), one O(2), and one O(3) atom. The C(1)-C(2) bond length is 1.44 Å. The C(1)-O(2) bond length is 1.28 Å. The C(1)-O(3) bond length is 1.27 Å. In the second C site, C(2) is bonded in a linear geometry to one C(1) and one C(2) atom. The C(2)-C(2) bond length is 1.22 Å. There are two inequivalent H sites. In the first H site, H(1) is bonded in a single-bond geometry to one O(1) and one O(2) atom. The H(1)-O(1) bond length is 1.00 Å. The H(1)-O(2) bond length is 1.75 Å. In the second H site, H(2) is bonded in a single-bond geometry to one O(1) atom. The H(2)-O(1) bond length is 0.99 Å. There are three inequivalent O sites. In the first O site, O(1) is bonded in a distorted trigonal non-coplanar geometry to one Mn(1), one H(1), and one H(2) atom. In the second O site, O(2) is bonded in a 3-coordinate geometry to one Mn(1), one C(1), and one H(1) atom. In the third O site, O(3) is bonded in a distorted bent 120 degrees geometry to one Mn(1) and one C(1) atom. Linkers: 2 [O]C(=O)C#CC([O])=O. Metal clusters: 2 [Mn]. RCSR code: pts. 
The MOF has largest included sphere 1.48 A, density 2.14 g\/cm3, surface area 3172.67 m2\/g, accessible volume 0.06 cm3\/g"} {"text":"Task: Describe the structure with the Crystallographic Information File (CIF) [CIF]\ndata_NiH3C5N2ClO2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.865\n_cell_length_b 10.545\n_cell_length_c 11.200\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural NiH3C5N2ClO2\n_chemical_formula_sum 'Ni4 H12 C20 N8 Cl4 O8'\n_cell_volume 928.900\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ni Ni0 1 0.743 0.215 0.321 1\n Ni Ni1 1 0.243 0.785 0.679 1\n Ni Ni2 1 0.743 0.715 0.179 1\n Ni Ni3 1 0.243 0.285 0.821 1\n H H4 1 0.964 0.038 0.144 1\n H H5 1 0.996 0.428 0.982 1\n H H6 1 0.126 0.072 0.980 1\n H H7 1 0.464 0.962 0.856 1\n H H8 1 0.496 0.572 0.018 1\n H H9 1 0.625 0.928 0.021 1\n H H10 1 0.964 0.538 0.356 1\n H H11 1 0.996 0.928 0.518 1\n H H12 1 0.126 0.572 0.520 1\n H H13 1 0.464 0.462 0.644 1\n H H14 1 0.496 0.072 0.482 1\n H H15 1 0.625 0.428 0.479 1\n C C16 1 0.789 0.429 0.173 1\n C C17 1 0.969 0.119 0.112 1\n C C18 1 0.891 0.326 0.112 1\n C C19 1 0.992 0.347 0.015 1\n C C20 1 0.068 0.140 0.013 1\n C C21 1 0.289 0.571 0.827 1\n C C22 1 0.469 0.881 0.888 1\n C C23 1 0.391 0.674 0.888 1\n C C24 1 0.491 0.653 0.985 1\n C C25 1 0.568 0.860 0.987 1\n C C26 1 0.789 0.929 0.327 1\n C C27 1 0.969 0.619 0.388 1\n C C28 1 0.891 0.826 0.388 1\n C C29 1 0.992 0.847 0.485 1\n C C30 1 0.068 0.640 0.486 1\n C C31 1 0.289 0.071 0.673 1\n C C32 1 0.469 0.381 0.612 1\n C C33 1 0.391 0.174 0.612 1\n C C34 1 0.491 0.153 0.515 1\n C C35 1 0.568 0.360 0.513 1\n N N36 1 0.882 0.211 0.163 1\n N N37 1 0.084 0.255 0.964 1\n N N38 1 0.382 0.789 
0.837 1\n N N39 1 0.584 0.745 0.036 1\n N N40 1 0.882 0.711 0.337 1\n N N41 1 0.084 0.755 0.536 1\n N N42 1 0.382 0.289 0.663 1\n N N43 1 0.584 0.245 0.464 1\n Cl Cl44 1 0.992 0.281 0.439 1\n Cl Cl45 1 0.492 0.719 0.561 1\n Cl Cl46 1 0.992 0.781 0.061 1\n Cl Cl47 1 0.492 0.219 0.939 1\n O O48 1 0.714 0.400 0.267 1\n O O49 1 0.793 0.535 0.124 1\n O O50 1 0.214 0.600 0.733 1\n O O51 1 0.293 0.465 0.876 1\n O O52 1 0.714 0.900 0.233 1\n O O53 1 0.793 0.035 0.376 1\n O O54 1 0.214 0.100 0.767 1\n O O55 1 0.293 0.965 0.624 1\n[\/CIF]\n.\nNiC5N2H3O2Cl crystallizes in the orthorhombic Pna2_1 space group. Ni(1) is bonded in a square pyramidal geometry to one N(1), one N(2), one O(1), one O(2), and one Cl(1) atom. The Ni(1)-N(1) bond length is 2.08 Å. The Ni(1)-N(2) bond length is 2.06 Å. The Ni(1)-O(1) bond length is 2.05 Å. The Ni(1)-O(2) bond length is 2.03 Å. The Ni(1)-Cl(1) bond length is 2.46 Å. There are five inequivalent C sites. In the first C site, C(1) is bonded in a distorted bent 120 degrees geometry to one C(3), one O(1), and one O(2) atom. The C(1)-C(3) bond length is 1.51 Å. The C(1)-O(1) bond length is 1.25 Å. The C(1)-O(2) bond length is 1.25 Å. In the second C site, C(2) is bonded in a distorted bent 120 degrees geometry to one N(1) and one H(1) atom. The C(2)-N(1) bond length is 1.31 Å. The C(2)-H(1) bond length is 0.93 Å. In the third C site, C(3) is bonded in a distorted trigonal planar geometry to one C(1), one C(4), and one N(1) atom. The C(3)-C(4) bond length is 1.37 Å. The C(3)-N(1) bond length is 1.34 Å. In the fourth C site, C(4) is bonded in a distorted trigonal planar geometry to one C(3), one N(2), and one H(2) atom. The C(4)-N(2) bond length is 1.34 Å. The C(4)-H(2) bond length is 0.93 Å. In the fifth C site, C(5) is bonded in a distorted bent 120 degrees geometry to one N(2) and one H(3) atom. The C(5)-N(2) bond length is 1.33 Å. The C(5)-H(3) bond length is 0.93 Å. There are two inequivalent N sites. 
In the first N site, N(1) is bonded in a trigonal planar geometry to one Ni(1), one C(2), and one C(3) atom. In the second N site, N(2) is bonded in a trigonal planar geometry to one Ni(1), one C(4), and one C(5) atom. There are three inequivalent H sites. In the first H site, H(1) is bonded in a single-bond geometry to one C(2) atom. In the second H site, H(2) is bonded in a single-bond geometry to one C(4) atom. In the third H site, H(3) is bonded in a single-bond geometry to one C(5) atom. There are two inequivalent O sites. In the first O site, O(1) is bonded in a bent 120 degrees geometry to one Ni(1) and one C(1) atom. In the second O site, O(2) is bonded in a distorted bent 150 degrees geometry to one Ni(1) and one C(1) atom. Cl(1) is bonded in a single-bond geometry to one Ni(1) atom. Linkers: 3 [O]C(=O)c1cnccn1. Metal clusters: 4 [Ni]. The MOF has largest included sphere 3.78 A, density 1.55 g\/cm3, surface area 3667.09 m2\/g, accessible volume 0.28 cm3\/g"}", "/scratch/micpie/export/mofdscribe/test_0-0.jsonl": "{"text":"Task: Write a description of the structure with the CIF card [CIF]\ndata_Hg3H8C6(S2Cl3)2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.610\n_cell_length_b 7.247\n_cell_length_c 9.920\n_cell_angle_alpha 101.245\n_cell_angle_beta 91.888\n_cell_angle_gamma 91.378\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Hg3H8C6(S2Cl3)2\n_chemical_formula_sum 'Hg3 H8 C6 S4 Cl6'\n_cell_volume 465.602\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Hg Hg0 1 0.000 0.000 0.000 1.0\n Hg Hg1 1 0.507 0.321 0.142 1.0\n Hg Hg2 1 0.493 0.679 0.858 1.0\n H H3 1 0.988 0.830 0.522 1.0\n H H4 1 0.012 0.170 0.478 1.0\n H H5 1 0.232 0.945 0.565 1.0\n H H6 1 0.768 0.055 0.435 1.0\n H H7 1 0.106 0.653 0.301 1.0\n H H8 1 
0.894 0.347 0.699 1.0\n H H9 1 0.200 0.890 0.310 1.0\n H H10 1 0.800 0.110 0.690 1.0\n C C11 1 0.435 0.566 0.489 1.0\n C C12 1 0.565 0.434 0.511 1.0\n C C13 1 0.152 0.817 0.510 1.0\n C C14 1 0.848 0.183 0.490 1.0\n C C15 1 0.204 0.767 0.359 1.0\n C C16 1 0.796 0.233 0.641 1.0\n S S17 1 0.464 0.691 0.356 1.0\n S S18 1 0.536 0.309 0.644 1.0\n S S19 1 0.225 0.621 0.589 1.0\n S S20 1 0.775 0.379 0.411 1.0\n Cl Cl21 1 0.732 0.921 0.133 1.0\n Cl Cl22 1 0.268 0.079 0.867 1.0\n Cl Cl23 1 0.771 0.383 0.999 1.0\n Cl Cl24 1 0.229 0.617 0.001 1.0\n Cl Cl25 1 0.218 0.217 0.250 1.0\n Cl Cl26 1 0.782 0.783 0.750 1.0\n[\/CIF]\n.\nAnswer: (HgCl2)3(C3S2H4)2 is Indium-derived structured and crystallizes in the triclinic P-1 space group. The structure is zero-dimensional and consists of one 1,3-dithiolane, 2-(1,3-dithiolan-2-ylidene)- molecule and one HgCl2 cluster. In the HgCl2 cluster, there are two inequivalent Hg sites. In the first Hg site, Hg(1) is bonded in a distorted square co-planar geometry to two equivalent Cl(1) and two equivalent Cl(3) atoms. Both Hg(1)-Cl(1) bond lengths are 2.37 Å. Both Hg(1)-Cl(3) bond lengths are 2.98 Å. In the second Hg site, Hg(2) is bonded in a linear geometry to one Cl(2) and one Cl(3) atom. The Hg(2)-Cl(2) bond length is 2.37 Å. The Hg(2)-Cl(3) bond length is 2.39 Å. There are three inequivalent Cl sites. In the first Cl site, Cl(1) is bonded in a single-bond geometry to one Hg(1) atom. In the second Cl site, Cl(2) is bonded in a single-bond geometry to one Hg(2) atom. In the third Cl site, Cl(3) is bonded in a distorted water-like geometry to one Hg(1) and one Hg(2) atom. Linkers: 2 [CH]1CSC(=C2SCCS2)S1 ,1 C1CSC(=C2SCCS2)S1. Metal clusters: 3 [Hg]. 
The MOF has largest included sphere 1.80 A, density 3.65 g\/cm3, surface area 2242.82 m2\/g, accessible volume 0.04 cm3\/g"} {"text":"Task: Describe the structure with the Crystallographic Information File (CIF) [CIF]\ndata_CaH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.577\n_cell_length_b 11.123\n_cell_length_c 18.033\n_cell_angle_alpha 90.000\n_cell_angle_beta 97.602\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural CaH2C6SO4\n_chemical_formula_sum 'Ca4 H8 C24 S4 O16'\n_cell_volume 1307.600\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ca Ca0 1 0.253 0.907 0.477 1\n Ca Ca1 1 0.247 0.407 0.023 1\n Ca Ca2 1 0.747 0.093 0.523 1\n Ca Ca3 1 0.753 0.593 0.977 1\n H H4 1 0.615 0.106 0.704 1\n H H5 1 0.582 0.196 0.826 1\n H H6 1 0.885 0.606 0.796 1\n H H7 1 0.918 0.696 0.674 1\n H H8 1 0.385 0.894 0.296 1\n H H9 1 0.418 0.804 0.174 1\n H H10 1 0.115 0.394 0.204 1\n H H11 1 0.082 0.304 0.326 1\n C C12 1 0.195 0.286 0.857 1\n C C13 1 0.276 0.095 0.592 1\n C C14 1 0.313 0.137 0.670 1\n C C15 1 0.493 0.137 0.717 1\n C C16 1 0.474 0.189 0.787 1\n C C17 1 0.279 0.228 0.792 1\n C C18 1 0.305 0.786 0.643 1\n C C19 1 0.225 0.595 0.908 1\n C C20 1 0.187 0.637 0.830 1\n C C21 1 0.007 0.637 0.783 1\n C C22 1 0.026 0.689 0.713 1\n C C23 1 0.221 0.728 0.708 1\n C C24 1 0.805 0.714 0.143 1\n C C25 1 0.725 0.905 0.408 1\n C C26 1 0.687 0.863 0.330 1\n C C27 1 0.507 0.863 0.283 1\n C C28 1 0.526 0.811 0.213 1\n C C29 1 0.721 0.772 0.208 1\n C C30 1 0.695 0.214 0.357 1\n C C31 1 0.775 0.405 0.092 1\n C C32 1 0.813 0.363 0.170 1\n C C33 1 0.993 0.363 0.217 1\n C C34 1 0.974 0.311 0.287 1\n C C35 1 0.779 0.272 0.292 1\n S S36 1 0.117 0.199 0.712 1\n S S37 1 0.383 0.699 0.788 1\n S S38 1 0.883 0.801 0.288 
1\n S S39 1 0.617 0.301 0.212 1\n O O40 1 0.096 0.096 0.558 1\n O O41 1 0.425 0.057 0.562 1\n O O42 1 0.324 0.322 0.910 1\n O O43 1 0.006 0.294 0.852 1\n O O44 1 0.405 0.596 0.942 1\n O O45 1 0.075 0.557 0.938 1\n O O46 1 0.176 0.822 0.590 1\n O O47 1 0.494 0.794 0.648 1\n O O48 1 0.904 0.904 0.442 1\n O O49 1 0.575 0.943 0.438 1\n O O50 1 0.676 0.678 0.090 1\n O O51 1 0.994 0.706 0.148 1\n O O52 1 0.596 0.404 0.058 1\n O O53 1 0.925 0.443 0.062 1\n O O54 1 0.824 0.178 0.410 1\n O O55 1 0.506 0.206 0.352 1\n[\/CIF]\n.\nAnswer: CaC6H2SO4 crystallizes in the monoclinic P2_1\/c space group. Ca(1) is bonded in a 5-coordinate geometry to one O(3), two equivalent O(1), and two equivalent O(2) atoms. The Ca(1)-O(3) bond length is 2.36 Å. There is one shorter (2.29 Å) and one longer (2.84 Å) Ca(1)-O(1) bond length. There is one shorter (2.35 Å) and one longer (2.44 Å) Ca(1)-O(2) bond length. There are six inequivalent C sites. In the first C site, C(1) is bonded in a distorted bent 120 degrees geometry to one C(6), one O(3), and one O(4) atom. The C(1)-C(6) bond length is 1.50 Å. The C(1)-O(3) bond length is 1.26 Å. The C(1)-O(4) bond length is 1.24 Å. In the second C site, C(2) is bonded in a bent 120 degrees geometry to one C(3), one O(1), and one O(2) atom. The C(2)-C(3) bond length is 1.47 Å. The C(2)-O(1) bond length is 1.26 Å. The C(2)-O(2) bond length is 1.26 Å. In the third C site, C(3) is bonded in a trigonal planar geometry to one C(2), one C(4), and one S(1) atom. The C(3)-C(4) bond length is 1.36 Å. The C(3)-S(1) bond length is 1.72 Å. In the fourth C site, C(4) is bonded in a distorted single-bond geometry to one C(3) and one H(1,2) atom. The C(4)-H(1,2) bond length is 0.93 Å. In the fifth C site, C(5) is bonded in a distorted single-bond geometry to one C(6) and one H(1,2) atom. The C(5)-C(6) bond length is 1.37 Å. The C(5)-H(1,2) bond length is 0.93 Å. In the sixth C site, C(6) is bonded in a trigonal planar geometry to one C(1), one C(5), and one S(1) atom. 
The C(6)-S(1) bond length is 1.71 Å. H(1,2) is bonded in a single-bond geometry to one C(4) atom. S(1) is bonded in an L-shaped geometry to one C(3) and one C(6) atom. There are four inequivalent O sites. In the first O site, O(1) is bonded in a 2-coordinate geometry to two equivalent Ca(1) and one C(2) atom. In the second O site, O(2) is bonded in a 3-coordinate geometry to two equivalent Ca(1) and one C(2) atom. In the third O site, O(3) is bonded in a bent 120 degrees geometry to one Ca(1) and one C(1) atom. In the fourth O site, O(4) is bonded in a single-bond geometry to one C(1) atom. Linkers: 4 [O]C(=O)c1ccc(C([O])=O)s1. Metal clusters: 4 [Ca]. The MOF has largest included sphere 5.44 A, density 1.07 g\/cm3, surface area 4304.84 m2\/g, accessible volume 0.53 cm3\/g"}", "/scratch/micpie/export/mofdscribe/train_0-0.jsonl": "{"text":"Task: Write a description of the structure with the Crystallographic Information File (CIF) [CIF]\ndata_CdH6C5N8\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.565\n_cell_length_b 7.897\n_cell_length_c 8.953\n_cell_angle_alpha 72.617\n_cell_angle_beta 88.137\n_cell_angle_gamma 87.528\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural CdH6C5N8\n_chemical_formula_sum 'Cd2 H12 C10 N16'\n_cell_volume 442.489\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cd Cd0 1 0.403 0.635 0.802 1.0\n Cd Cd1 1 0.597 0.365 0.198 1.0\n H H2 1 0.337 0.878 0.463 1.0\n H H3 1 0.663 0.122 0.537 1.0\n H H4 1 0.094 0.399 0.451 1.0\n H H5 1 0.906 0.601 0.549 1.0\n H H6 1 0.045 0.285 0.742 1.0\n H H7 1 0.955 0.715 0.258 1.0\n H H8 1 0.154 0.971 0.881 1.0\n H H9 1 0.846 0.029 0.119 1.0\n H H10 1 0.160 0.304 0.216 1.0\n H H11 1 0.840 0.696 0.784 1.0\n H H12 1 0.287 0.110 0.223 1.0\n H H13 1 0.713 0.890 0.777 1.0\n C 
C14 1 0.276 0.974 0.521 1.0\n C C15 1 0.724 0.026 0.479 1.0\n C C16 1 0.226 0.150 0.434 1.0\n C C17 1 0.774 0.850 0.566 1.0\n C C18 1 0.138 0.263 0.515 1.0\n C C19 1 0.862 0.737 0.485 1.0\n C C20 1 0.110 0.199 0.676 1.0\n C C21 1 0.890 0.801 0.324 1.0\n C C22 1 0.170 0.025 0.754 1.0\n C C23 1 0.830 0.975 0.246 1.0\n N N24 1 0.438 0.644 0.062 1.0\n N N25 1 0.562 0.356 0.938 1.0\n N N26 1 0.400 0.769 0.111 1.0\n N N27 1 0.600 0.231 0.889 1.0\n N N28 1 0.361 0.890 0.158 1.0\n N N29 1 0.639 0.110 0.842 1.0\n N N30 1 0.063 0.528 0.870 1.0\n N N31 1 0.937 0.472 0.130 1.0\n N N32 1 1.000 0.500 0.000 1.0\n N N33 1 0.388 0.576 0.566 1.0\n N N34 1 0.612 0.424 0.434 1.0\n N N35 1 0.500 0.500 0.500 1.0\n N N36 1 0.251 0.915 0.677 1.0\n N N37 1 0.749 0.085 0.323 1.0\n N N38 1 0.269 0.211 0.273 1.0\n N N39 1 0.731 0.789 0.727 1.0\n[\/CIF]\n.\nAnswer: CdH5(CN2)4CH crystallizes in the triclinic P-1 space group. The structure consists of two 02329_fluka molecules inside a CdH5(CN2)4 framework. In the CdH5(CN2)4 framework, Cd(1) is bonded to one N(4), one N(6), one N(8), one N(9), and two equivalent N(1) atoms to form edge-sharing CdN6 octahedra. The Cd(1)-N(4) bond length is 2.41 Å. The Cd(1)-N(6) bond length is 2.30 Å. The Cd(1)-N(8) bond length is 2.35 Å. The Cd(1)-N(9) bond length is 2.49 Å. There is one shorter (2.37 Å) and one longer (2.39 Å) Cd(1)-N(1) bond length. There are four inequivalent C sites. In the first C site, C(1) is bonded in a distorted trigonal planar geometry to one C(2), one N(8), and one H(1) atom. The C(1)-C(2) bond length is 1.40 Å. The C(1)-N(8) bond length is 1.34 Å. The C(1)-H(1) bond length is 1.09 Å. In the second C site, C(2) is bonded in a distorted trigonal planar geometry to one C(1), one C(3), and one N(9) atom. The C(2)-C(3) bond length is 1.40 Å. The C(2)-N(9) bond length is 1.41 Å. In the third C site, C(3) is bonded in a distorted single-bond geometry to one C(2) and one H(2) atom. The C(3)-H(2) bond length is 1.09 Å. 
In the fourth C site, C(5) is bonded in a distorted bent 120 degrees geometry to one N(8) and one H(4) atom. The C(5)-N(8) bond length is 1.35 Å. The C(5)-H(4) bond length is 1.09 Å. There are nine inequivalent N sites. In the first N site, N(1) is bonded in a trigonal planar geometry to two equivalent Cd(1) and one N(2) atom. The N(1)-N(2) bond length is 1.21 Å. In the second N site, N(2) is bonded in a linear geometry to one N(1) and one N(3) atom. The N(2)-N(3) bond length is 1.17 Å. In the third N site, N(3) is bonded in a single-bond geometry to one N(2) atom. In the fourth N site, N(4) is bonded in a bent 120 degrees geometry to one Cd(1) and one N(5) atom. The N(4)-N(5) bond length is 1.19 Å. In the fifth N site, N(5) is bonded in a linear geometry to two equivalent N(4) atoms. In the sixth N site, N(6) is bonded in a distorted bent 120 degrees geometry to one Cd(1) and one N(7) atom. The N(6)-N(7) bond length is 1.18 Å. In the seventh N site, N(7) is bonded in a linear geometry to two equivalent N(6) atoms. In the eighth N site, N(8) is bonded in a trigonal planar geometry to one Cd(1), one C(1), and one C(5) atom. In the ninth N site, N(9) is bonded in a 3-coordinate geometry to one Cd(1), one C(2), one H(5), and one H(6) atom. The N(9)-H(5) bond length is 1.03 Å. The N(9)-H(6) bond length is 1.03 Å. There are five inequivalent H sites. In the first H site, H(1) is bonded in a single-bond geometry to one C(1) atom. In the second H site, H(2) is bonded in a single-bond geometry to one C(3) atom. In the third H site, H(4) is bonded in a single-bond geometry to one C(5) atom. In the fourth H site, H(5) is bonded in a single-bond geometry to one N(9) atom. In the fifth H site, H(6) is bonded in a single-bond geometry to one N(9) atom. Linkers: 4 [N][N][N] ,2 Nc1cccnc1. Metal clusters: 2 [Cd]. 
The MOF has largest included sphere 2.22 A, density 2.18 g\/cm3, surface area 3608.94 m2\/g, accessible volume 0.08 cm3\/g"} {"text":"Task: Write a description of the structure with the Crystallographic Information File (CIF) [CIF]\ndata_FeHC4O3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.843\n_cell_length_b 15.159\n_cell_length_c 15.159\n_cell_angle_alpha 62.223\n_cell_angle_beta 81.346\n_cell_angle_gamma 98.654\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural FeHC4O3\n_chemical_formula_sum 'Fe6 H6 C24 O18'\n_cell_volume 1330.982\n_cell_formula_units_Z 6\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Fe Fe0 1 0.892 0.685 0.586 1\n Fe Fe1 1 0.478 0.586 0.729 1\n Fe Fe2 1 0.207 0.729 0.685 1\n Fe Fe3 1 0.108 0.315 0.414 1\n Fe Fe4 1 0.522 0.414 0.271 1\n Fe Fe5 1 0.793 0.271 0.315 1\n H H6 1 0.646 0.019 0.538 1\n H H7 1 0.184 0.538 0.443 1\n H H8 1 0.628 0.443 0.019 1\n H H9 1 0.353 0.981 0.462 1\n H H10 1 0.816 0.462 0.557 1\n H H11 1 0.372 0.557 0.981 1\n C C12 1 0.663 0.822 0.616 1\n C C13 1 0.832 0.912 0.543 1\n C C14 1 0.033 0.897 0.535 1\n C C15 1 0.805 0.008 0.523 1\n C C16 1 0.279 0.616 0.562 1\n C C17 1 0.374 0.543 0.545 1\n C C18 1 0.569 0.535 0.568 1\n C C19 1 0.328 0.523 0.469 1\n C C20 1 0.841 0.562 0.822 1\n C C21 1 0.919 0.545 0.912 1\n C C22 1 0.136 0.568 0.897 1\n C C23 1 0.797 0.469 0.008 1\n C C24 1 0.337 0.178 0.384 1\n C C25 1 0.169 0.088 0.457 1\n C C26 1 0.967 0.103 0.465 1\n C C27 1 0.195 0.992 0.477 1\n C C28 1 0.721 0.384 0.438 1\n C C29 1 0.626 0.457 0.455 1\n C C30 1 0.431 0.465 0.432 1\n C C31 1 0.672 0.477 0.531 1\n C C32 1 0.159 0.438 0.178 1\n C C33 1 0.081 0.455 0.088 1\n C C34 1 0.864 0.432 0.103 1\n C C35 1 0.203 0.531 0.992 1\n O O36 1 0.680 0.729 0.648 1\n O O37 1 0.479 0.858 0.617 1\n O O38 1 0.071 
0.809 0.569 1\n O O39 1 0.328 0.648 0.623 1\n O O40 1 0.096 0.617 0.525 1\n O O41 1 0.640 0.569 0.623 1\n O O42 1 0.951 0.623 0.729 1\n O O43 1 0.621 0.525 0.858 1\n O O44 1 0.262 0.623 0.809 1\n O O45 1 0.320 0.271 0.352 1\n O O46 1 0.521 0.142 0.384 1\n O O47 1 0.929 0.191 0.431 1\n O O48 1 0.672 0.352 0.377 1\n O O49 1 0.904 0.384 0.475 1\n O O50 1 0.360 0.431 0.377 1\n O O51 1 0.049 0.377 0.271 1\n O O52 1 0.379 0.475 0.142 1\n O O53 1 0.738 0.377 0.191 1\n[\/CIF]\n.\nFeC4HO3 crystallizes in the trigonal R-3 space group. Fe(1) is bonded to one O(2), two equivalent O(1), and two equivalent O(3) atoms to form distorted edge-sharing FeO5 square pyramids. The Fe(1)-O(2) bond length is 2.19 Å. There is one shorter (1.97 Å) and one longer (2.05 Å) Fe(1)-O(1) bond length. There is one shorter (1.96 Å) and one longer (2.07 Å) Fe(1)-O(3) bond length. There are four inequivalent C sites. In the first C site, C(1) is bonded in a distorted bent 120 degrees geometry to one C(2), one O(1), and one O(2) atom. The C(1)-C(2) bond length is 1.47 Å. The C(1)-O(1) bond length is 1.29 Å. The C(1)-O(2) bond length is 1.45 Å. In the second C site, C(2) is bonded in a trigonal planar geometry to one C(1), one C(3), and one C(4) atom. The C(2)-C(3) bond length is 1.43 Å. The C(2)-C(4) bond length is 1.38 Å. In the third C site, C(3) is bonded in a single-bond geometry to one C(2), one C(4), and one O(3) atom. The C(3)-C(4) bond length is 1.46 Å. The C(3)-O(3) bond length is 1.27 Å. In the fourth C site, C(4) is bonded in a distorted trigonal planar geometry to one C(2), one C(3), and one H(1) atom. The C(4)-H(1) bond length is 1.12 Å. H(1) is bonded in a single-bond geometry to one C(4) atom. There are three inequivalent O sites. In the first O site, O(1) is bonded in a distorted trigonal planar geometry to two equivalent Fe(1) and one C(1) atom. In the second O site, O(2) is bonded in a water-like geometry to one Fe(1) and one C(1) atom. 
In the third O site, O(3) is bonded in a distorted trigonal planar geometry to two equivalent Fe(1) and one C(3) atom. Linkers: 3 [O]C(=O)[C]1[CH]C(=O)[C](C([O])=O)[CH]C1=O. Metal clusters: 6 [Fe]. The MOF has largest included sphere 11.48 A, density 1.14 g\/cm3, surface area 2345.11 m2\/g, accessible volume 0.52 cm3\/g"}", "/scratch/micpie/export/mofdscribe/valid_0-1.jsonl": "{"text":"Task: Create a CIF file of a structure with the following description\nMnH4(C2O3)2 crystallizes in the monoclinic C2\/c space group. Mn(1) is bonded in an octahedral geometry to two equivalent O(1), two equivalent O(2), and two equivalent O(3) atoms. Both Mn(1)-O(1) bond lengths are 2.14 Å. Both Mn(1)-O(2) bond lengths are 2.20 Å. Both Mn(1)-O(3) bond lengths are 2.22 Å. There are two inequivalent C sites. In the first C site, C(1) is bonded in a distorted bent 120 degrees geometry to one C(2), one O(2), and one O(3) atom. The C(1)-C(2) bond length is 1.44 Å. The C(1)-O(2) bond length is 1.28 Å. The C(1)-O(3) bond length is 1.27 Å. In the second C site, C(2) is bonded in a linear geometry to one C(1) and one C(2) atom. The C(2)-C(2) bond length is 1.22 Å. There are two inequivalent H sites. In the first H site, H(1) is bonded in a single-bond geometry to one O(1) and one O(2) atom. The H(1)-O(1) bond length is 1.00 Å. The H(1)-O(2) bond length is 1.75 Å. In the second H site, H(2) is bonded in a single-bond geometry to one O(1) atom. The H(2)-O(1) bond length is 0.99 Å. There are three inequivalent O sites. In the first O site, O(1) is bonded in a distorted trigonal non-coplanar geometry to one Mn(1), one H(1), and one H(2) atom. In the second O site, O(2) is bonded in a 3-coordinate geometry to one Mn(1), one C(1), and one H(1) atom. In the third O site, O(3) is bonded in a distorted bent 120 degrees geometry to one Mn(1) and one C(1) atom. Linkers: 2 [O]C(=O)C#CC([O])=O. Metal clusters: 2 [Mn]. RCSR code: pts. 
The MOF has largest included sphere 1.48 A, density 2.14 g\/cm3, surface area 3172.67 m2\/g, accessible volume 0.06 cm3\/g.\nAnswer: [CIF]\ndata_MnH4(C2O3)2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.167\n_cell_length_b 7.565\n_cell_length_c 7.814\n_cell_angle_alpha 91.240\n_cell_angle_beta 117.313\n_cell_angle_gamma 118.290\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural MnH4(C2O3)2\n_chemical_formula_sum 'Mn2 H8 C8 O12'\n_cell_volume 315.619\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Mn Mn0 1 0.500 0.000 0.500 1.0\n Mn Mn1 1 0.500 0.500 1.000 1.0\n H H2 1 0.737 0.405 0.446 1.0\n H H3 1 0.387 0.095 0.054 1.0\n H H4 1 0.263 0.595 0.554 1.0\n H H5 1 0.613 0.905 0.946 1.0\n H H6 1 0.917 0.366 0.641 1.0\n H H7 1 0.410 0.134 0.859 1.0\n H H8 1 0.083 0.634 0.359 1.0\n H H9 1 0.590 0.866 0.141 1.0\n C C10 1 0.034 0.770 0.050 1.0\n C C11 1 0.714 0.730 0.450 1.0\n C C12 1 0.966 0.230 0.950 1.0\n C C13 1 0.286 0.270 0.550 1.0\n C C14 1 0.928 0.754 0.839 1.0\n C C15 1 0.835 0.746 0.661 1.0\n C C16 1 0.072 0.246 0.161 1.0\n C C17 1 0.165 0.254 0.339 1.0\n O O18 1 0.765 0.297 0.503 1.0\n O O19 1 0.465 0.203 0.997 1.0\n O O20 1 0.235 0.703 0.497 1.0\n O O21 1 0.535 0.797 0.003 1.0\n O O22 1 0.260 0.924 0.171 1.0\n O O23 1 0.665 0.576 0.329 1.0\n O O24 1 0.740 0.076 0.829 1.0\n O O25 1 0.335 0.424 0.671 1.0\n O O26 1 0.898 0.633 0.102 1.0\n O O27 1 0.664 0.867 0.398 1.0\n O O28 1 0.102 0.367 0.898 1.0\n O O29 1 0.336 0.133 0.602 1.0\n[\/CIF]\n"} {"text":"Task: Generate a CIF file of a material with the following description\nNiC5N2H3O2Cl crystallizes in the orthorhombic Pna2_1 space group. Ni(1) is bonded in a square pyramidal geometry to one N(1), one N(2), one O(1), one O(2), and one Cl(1) atom. 
The Ni(1)-N(1) bond length is 2.08 Å. The Ni(1)-N(2) bond length is 2.06 Å. The Ni(1)-O(1) bond length is 2.05 Å. The Ni(1)-O(2) bond length is 2.03 Å. The Ni(1)-Cl(1) bond length is 2.46 Å. There are five inequivalent C sites. In the first C site, C(1) is bonded in a distorted bent 120 degrees geometry to one C(3), one O(1), and one O(2) atom. The C(1)-C(3) bond length is 1.51 Å. The C(1)-O(1) bond length is 1.25 Å. The C(1)-O(2) bond length is 1.25 Å. In the second C site, C(2) is bonded in a distorted bent 120 degrees geometry to one N(1) and one H(1) atom. The C(2)-N(1) bond length is 1.31 Å. The C(2)-H(1) bond length is 0.93 Å. In the third C site, C(3) is bonded in a distorted trigonal planar geometry to one C(1), one C(4), and one N(1) atom. The C(3)-C(4) bond length is 1.37 Å. The C(3)-N(1) bond length is 1.34 Å. In the fourth C site, C(4) is bonded in a distorted trigonal planar geometry to one C(3), one N(2), and one H(2) atom. The C(4)-N(2) bond length is 1.34 Å. The C(4)-H(2) bond length is 0.93 Å. In the fifth C site, C(5) is bonded in a distorted bent 120 degrees geometry to one N(2) and one H(3) atom. The C(5)-N(2) bond length is 1.33 Å. The C(5)-H(3) bond length is 0.93 Å. There are two inequivalent N sites. In the first N site, N(1) is bonded in a trigonal planar geometry to one Ni(1), one C(2), and one C(3) atom. In the second N site, N(2) is bonded in a trigonal planar geometry to one Ni(1), one C(4), and one C(5) atom. There are three inequivalent H sites. In the first H site, H(1) is bonded in a single-bond geometry to one C(2) atom. In the second H site, H(2) is bonded in a single-bond geometry to one C(4) atom. In the third H site, H(3) is bonded in a single-bond geometry to one C(5) atom. There are two inequivalent O sites. In the first O site, O(1) is bonded in a bent 120 degrees geometry to one Ni(1) and one C(1) atom. In the second O site, O(2) is bonded in a distorted bent 150 degrees geometry to one Ni(1) and one C(1) atom. 
Cl(1) is bonded in a single-bond geometry to one Ni(1) atom. Linkers: 3 [O]C(=O)c1cnccn1. Metal clusters: 4 [Ni]. The MOF has largest included sphere 3.78 A, density 1.55 g\/cm3, surface area 3667.09 m2\/g, accessible volume 0.28 cm3\/g.\nAnswer: [CIF]\ndata_NiH3C5N2ClO2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.865\n_cell_length_b 10.545\n_cell_length_c 11.200\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural NiH3C5N2ClO2\n_chemical_formula_sum 'Ni4 H12 C20 N8 Cl4 O8'\n_cell_volume 928.900\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ni Ni0 1 0.743 0.215 0.321 1\n Ni Ni1 1 0.243 0.785 0.679 1\n Ni Ni2 1 0.743 0.715 0.179 1\n Ni Ni3 1 0.243 0.285 0.821 1\n H H4 1 0.964 0.038 0.144 1\n H H5 1 0.996 0.428 0.982 1\n H H6 1 0.126 0.072 0.980 1\n H H7 1 0.464 0.962 0.856 1\n H H8 1 0.496 0.572 0.018 1\n H H9 1 0.625 0.928 0.021 1\n H H10 1 0.964 0.538 0.356 1\n H H11 1 0.996 0.928 0.518 1\n H H12 1 0.126 0.572 0.520 1\n H H13 1 0.464 0.462 0.644 1\n H H14 1 0.496 0.072 0.482 1\n H H15 1 0.625 0.428 0.479 1\n C C16 1 0.789 0.429 0.173 1\n C C17 1 0.969 0.119 0.112 1\n C C18 1 0.891 0.326 0.112 1\n C C19 1 0.992 0.347 0.015 1\n C C20 1 0.068 0.140 0.013 1\n C C21 1 0.289 0.571 0.827 1\n C C22 1 0.469 0.881 0.888 1\n C C23 1 0.391 0.674 0.888 1\n C C24 1 0.491 0.653 0.985 1\n C C25 1 0.568 0.860 0.987 1\n C C26 1 0.789 0.929 0.327 1\n C C27 1 0.969 0.619 0.388 1\n C C28 1 0.891 0.826 0.388 1\n C C29 1 0.992 0.847 0.485 1\n C C30 1 0.068 0.640 0.486 1\n C C31 1 0.289 0.071 0.673 1\n C C32 1 0.469 0.381 0.612 1\n C C33 1 0.391 0.174 0.612 1\n C C34 1 0.491 0.153 0.515 1\n C C35 1 0.568 0.360 0.513 1\n N N36 1 0.882 0.211 0.163 1\n N N37 1 0.084 
0.255 0.964 1\n N N38 1 0.382 0.789 0.837 1\n N N39 1 0.584 0.745 0.036 1\n N N40 1 0.882 0.711 0.337 1\n N N41 1 0.084 0.755 0.536 1\n N N42 1 0.382 0.289 0.663 1\n N N43 1 0.584 0.245 0.464 1\n Cl Cl44 1 0.992 0.281 0.439 1\n Cl Cl45 1 0.492 0.719 0.561 1\n Cl Cl46 1 0.992 0.781 0.061 1\n Cl Cl47 1 0.492 0.219 0.939 1\n O O48 1 0.714 0.400 0.267 1\n O O49 1 0.793 0.535 0.124 1\n O O50 1 0.214 0.600 0.733 1\n O O51 1 0.293 0.465 0.876 1\n O O52 1 0.714 0.900 0.233 1\n O O53 1 0.793 0.035 0.376 1\n O O54 1 0.214 0.100 0.767 1\n O O55 1 0.293 0.965 0.624 1\n[\/CIF]\n"}", "/scratch/micpie/export/mofdscribe/train_0-1.jsonl": "{"text":"Task: Create a Crystallographic Information File (CIF) of a material with the following description\nCdH5(CN2)4CH crystallizes in the triclinic P-1 space group. The structure consists of two 02329_fluka molecules inside a CdH5(CN2)4 framework. In the CdH5(CN2)4 framework, Cd(1) is bonded to one N(4), one N(6), one N(8), one N(9), and two equivalent N(1) atoms to form edge-sharing CdN6 octahedra. The Cd(1)-N(4) bond length is 2.41 Å. The Cd(1)-N(6) bond length is 2.30 Å. The Cd(1)-N(8) bond length is 2.35 Å. The Cd(1)-N(9) bond length is 2.49 Å. There is one shorter (2.37 Å) and one longer (2.39 Å) Cd(1)-N(1) bond length. There are four inequivalent C sites. In the first C site, C(1) is bonded in a distorted trigonal planar geometry to one C(2), one N(8), and one H(1) atom. The C(1)-C(2) bond length is 1.40 Å. The C(1)-N(8) bond length is 1.34 Å. The C(1)-H(1) bond length is 1.09 Å. In the second C site, C(2) is bonded in a distorted trigonal planar geometry to one C(1), one C(3), and one N(9) atom. The C(2)-C(3) bond length is 1.40 Å. The C(2)-N(9) bond length is 1.41 Å. In the third C site, C(3) is bonded in a distorted single-bond geometry to one C(2) and one H(2) atom. The C(3)-H(2) bond length is 1.09 Å. In the fourth C site, C(5) is bonded in a distorted bent 120 degrees geometry to one N(8) and one H(4) atom. 
The C(5)-N(8) bond length is 1.35 Å. The C(5)-H(4) bond length is 1.09 Å. There are nine inequivalent N sites. In the first N site, N(1) is bonded in a trigonal planar geometry to two equivalent Cd(1) and one N(2) atom. The N(1)-N(2) bond length is 1.21 Å. In the second N site, N(2) is bonded in a linear geometry to one N(1) and one N(3) atom. The N(2)-N(3) bond length is 1.17 Å. In the third N site, N(3) is bonded in a single-bond geometry to one N(2) atom. In the fourth N site, N(4) is bonded in a bent 120 degrees geometry to one Cd(1) and one N(5) atom. The N(4)-N(5) bond length is 1.19 Å. In the fifth N site, N(5) is bonded in a linear geometry to two equivalent N(4) atoms. In the sixth N site, N(6) is bonded in a distorted bent 120 degrees geometry to one Cd(1) and one N(7) atom. The N(6)-N(7) bond length is 1.18 Å. In the seventh N site, N(7) is bonded in a linear geometry to two equivalent N(6) atoms. In the eighth N site, N(8) is bonded in a trigonal planar geometry to one Cd(1), one C(1), and one C(5) atom. In the ninth N site, N(9) is bonded in a 3-coordinate geometry to one Cd(1), one C(2), one H(5), and one H(6) atom. The N(9)-H(5) bond length is 1.03 Å. The N(9)-H(6) bond length is 1.03 Å. There are five inequivalent H sites. In the first H site, H(1) is bonded in a single-bond geometry to one C(1) atom. In the second H site, H(2) is bonded in a single-bond geometry to one C(3) atom. In the third H site, H(4) is bonded in a single-bond geometry to one C(5) atom. In the fourth H site, H(5) is bonded in a single-bond geometry to one N(9) atom. In the fifth H site, H(6) is bonded in a single-bond geometry to one N(9) atom. Linkers: 4 [N][N][N] ,2 Nc1cccnc1. Metal clusters: 2 [Cd]. 
The MOF has largest included sphere 2.22 A, density 2.18 g\/cm3, surface area 3608.94 m2\/g, accessible volume 0.08 cm3\/g.\n[CIF]\ndata_CdH6C5N8\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.565\n_cell_length_b 7.897\n_cell_length_c 8.953\n_cell_angle_alpha 72.617\n_cell_angle_beta 88.137\n_cell_angle_gamma 87.528\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural CdH6C5N8\n_chemical_formula_sum 'Cd2 H12 C10 N16'\n_cell_volume 442.489\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cd Cd0 1 0.403 0.635 0.802 1.0\n Cd Cd1 1 0.597 0.365 0.198 1.0\n H H2 1 0.337 0.878 0.463 1.0\n H H3 1 0.663 0.122 0.537 1.0\n H H4 1 0.094 0.399 0.451 1.0\n H H5 1 0.906 0.601 0.549 1.0\n H H6 1 0.045 0.285 0.742 1.0\n H H7 1 0.955 0.715 0.258 1.0\n H H8 1 0.154 0.971 0.881 1.0\n H H9 1 0.846 0.029 0.119 1.0\n H H10 1 0.160 0.304 0.216 1.0\n H H11 1 0.840 0.696 0.784 1.0\n H H12 1 0.287 0.110 0.223 1.0\n H H13 1 0.713 0.890 0.777 1.0\n C C14 1 0.276 0.974 0.521 1.0\n C C15 1 0.724 0.026 0.479 1.0\n C C16 1 0.226 0.150 0.434 1.0\n C C17 1 0.774 0.850 0.566 1.0\n C C18 1 0.138 0.263 0.515 1.0\n C C19 1 0.862 0.737 0.485 1.0\n C C20 1 0.110 0.199 0.676 1.0\n C C21 1 0.890 0.801 0.324 1.0\n C C22 1 0.170 0.025 0.754 1.0\n C C23 1 0.830 0.975 0.246 1.0\n N N24 1 0.438 0.644 0.062 1.0\n N N25 1 0.562 0.356 0.938 1.0\n N N26 1 0.400 0.769 0.111 1.0\n N N27 1 0.600 0.231 0.889 1.0\n N N28 1 0.361 0.890 0.158 1.0\n N N29 1 0.639 0.110 0.842 1.0\n N N30 1 0.063 0.528 0.870 1.0\n N N31 1 0.937 0.472 0.130 1.0\n N N32 1 1.000 0.500 0.000 1.0\n N N33 1 0.388 0.576 0.566 1.0\n N N34 1 0.612 0.424 0.434 1.0\n N N35 1 0.500 0.500 0.500 1.0\n N N36 1 0.251 0.915 0.677 1.0\n N N37 1 0.749 0.085 0.323 1.0\n N N38 1 0.269 0.211 0.273 1.0\n N N39 1 0.731 
0.789 0.727 1.0\n[\/CIF]\n"} {"text":"Task: Create a Crystallographic Information File (CIF) of a structure with the following description\nFeC4HO3 crystallizes in the trigonal R-3 space group. Fe(1) is bonded to one O(2), two equivalent O(1), and two equivalent O(3) atoms to form distorted edge-sharing FeO5 square pyramids. The Fe(1)-O(2) bond length is 2.19 Å. There is one shorter (1.97 Å) and one longer (2.05 Å) Fe(1)-O(1) bond length. There is one shorter (1.96 Å) and one longer (2.07 Å) Fe(1)-O(3) bond length. There are four inequivalent C sites. In the first C site, C(1) is bonded in a distorted bent 120 degrees geometry to one C(2), one O(1), and one O(2) atom. The C(1)-C(2) bond length is 1.47 Å. The C(1)-O(1) bond length is 1.29 Å. The C(1)-O(2) bond length is 1.45 Å. In the second C site, C(2) is bonded in a trigonal planar geometry to one C(1), one C(3), and one C(4) atom. The C(2)-C(3) bond length is 1.43 Å. The C(2)-C(4) bond length is 1.38 Å. In the third C site, C(3) is bonded in a single-bond geometry to one C(2), one C(4), and one O(3) atom. The C(3)-C(4) bond length is 1.46 Å. The C(3)-O(3) bond length is 1.27 Å. In the fourth C site, C(4) is bonded in a distorted trigonal planar geometry to one C(2), one C(3), and one H(1) atom. The C(4)-H(1) bond length is 1.12 Å. H(1) is bonded in a single-bond geometry to one C(4) atom. There are three inequivalent O sites. In the first O site, O(1) is bonded in a distorted trigonal planar geometry to two equivalent Fe(1) and one C(1) atom. In the second O site, O(2) is bonded in a water-like geometry to one Fe(1) and one C(1) atom. In the third O site, O(3) is bonded in a distorted trigonal planar geometry to two equivalent Fe(1) and one C(3) atom. Linkers: 3 [O]C(=O)[C]1[CH]C(=O)[C](C([O])=O)[CH]C1=O. Metal clusters: 6 [Fe]. 
The MOF has largest included sphere 11.48 A, density 1.14 g\/cm3, surface area 2345.11 m2\/g, accessible volume 0.52 cm3\/g.\nAnswer: [CIF]\ndata_FeHC4O3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.843\n_cell_length_b 15.159\n_cell_length_c 15.159\n_cell_angle_alpha 62.223\n_cell_angle_beta 81.346\n_cell_angle_gamma 98.654\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural FeHC4O3\n_chemical_formula_sum 'Fe6 H6 C24 O18'\n_cell_volume 1330.982\n_cell_formula_units_Z 6\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Fe Fe0 1 0.892 0.685 0.586 1\n Fe Fe1 1 0.478 0.586 0.729 1\n Fe Fe2 1 0.207 0.729 0.685 1\n Fe Fe3 1 0.108 0.315 0.414 1\n Fe Fe4 1 0.522 0.414 0.271 1\n Fe Fe5 1 0.793 0.271 0.315 1\n H H6 1 0.646 0.019 0.538 1\n H H7 1 0.184 0.538 0.443 1\n H H8 1 0.628 0.443 0.019 1\n H H9 1 0.353 0.981 0.462 1\n H H10 1 0.816 0.462 0.557 1\n H H11 1 0.372 0.557 0.981 1\n C C12 1 0.663 0.822 0.616 1\n C C13 1 0.832 0.912 0.543 1\n C C14 1 0.033 0.897 0.535 1\n C C15 1 0.805 0.008 0.523 1\n C C16 1 0.279 0.616 0.562 1\n C C17 1 0.374 0.543 0.545 1\n C C18 1 0.569 0.535 0.568 1\n C C19 1 0.328 0.523 0.469 1\n C C20 1 0.841 0.562 0.822 1\n C C21 1 0.919 0.545 0.912 1\n C C22 1 0.136 0.568 0.897 1\n C C23 1 0.797 0.469 0.008 1\n C C24 1 0.337 0.178 0.384 1\n C C25 1 0.169 0.088 0.457 1\n C C26 1 0.967 0.103 0.465 1\n C C27 1 0.195 0.992 0.477 1\n C C28 1 0.721 0.384 0.438 1\n C C29 1 0.626 0.457 0.455 1\n C C30 1 0.431 0.465 0.432 1\n C C31 1 0.672 0.477 0.531 1\n C C32 1 0.159 0.438 0.178 1\n C C33 1 0.081 0.455 0.088 1\n C C34 1 0.864 0.432 0.103 1\n C C35 1 0.203 0.531 0.992 1\n O O36 1 0.680 0.729 0.648 1\n O O37 1 0.479 0.858 0.617 1\n O O38 1 0.071 0.809 0.569 1\n O O39 1 0.328 0.648 0.623 1\n O O40 1 0.096 0.617 0.525 1\n O O41 1 0.640 0.569 
0.623 1\n O O42 1 0.951 0.623 0.729 1\n O O43 1 0.621 0.525 0.858 1\n O O44 1 0.262 0.623 0.809 1\n O O45 1 0.320 0.271 0.352 1\n O O46 1 0.521 0.142 0.384 1\n O O47 1 0.929 0.191 0.431 1\n O O48 1 0.672 0.352 0.377 1\n O O49 1 0.904 0.384 0.475 1\n O O50 1 0.360 0.431 0.377 1\n O O51 1 0.049 0.377 0.271 1\n O O52 1 0.379 0.475 0.142 1\n O O53 1 0.738 0.377 0.191 1\n[\/CIF]\n"}", "/scratch/micpie/export/compound_protein_compound_2/valid_5-2.jsonl": "{"text":"User: Can you come up with one example for a compound InChI that targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II)?\nAssistant: Sure, the compound InChI InChI=1S\/C24H25ClFN3O6\/c1-13-11-29(17(12-34-13)9-21(30)31)23(32)15-7-18-22(20(8-15)33-2)35-24(27-18)28-19(10-26)14-4-3-5-16(25)6-14\/h3-8,13,17,19H,9-12H2,1-2H3,(H,27,28)(H,30,31)\/t13?,17-,19?\/m0\/s1 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II).\nUser: Can you create another compound InChI that targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II)?\nAssistant: Yes, the compound InChI InChI=1S\/C24H25ClFN3O6\/c1-13-11-29(17(12-34-13)9-21(30)31)23(32)15-7-18-22(20(8-15)33-2)35-24(27-18)28-19(10-26)14-4-3-5-16(25)6-14\/h3-8,13,17,19H,9-12H2,1-2H3,(H,27,28)(H,30,31)\/t13?,17-,19?\/m0\/s1 targets the compound InChI CC(C)=CC(=O)C[C@@H]1C(=O)O[C@H]2CCCC21."} {"text":"User: Can you give me an example for a compound SMILES that targets the protein Adenosine receptor A3?\nAssistant: Yes, the compound SMILES OC12CCCCC1=Nc1ccccc12 targets the protein Adenosine receptor A3.\nUser: Can you generate another compound SMILES that targets the protein Adenosine receptor A3?\nAssistant: Yes, of course, the compound SMILES OC12CCCCC1=Nc1ccccc12 targets the compound SMILES CNC(=O)[C@H]1O[C@@H](n2cnc3c(NCc4cc(OC)ccc4OC)ncnc32)[C@H](O)[C@@H]1O."}", "/scratch/micpie/export/compound_protein_compound_2/test_8-1.jsonl": "{"text":"The protein GR is targeted by the compound with the SELFIES 
[C][C@][C][C@H1][Branch1][C][O][C@@][Branch1][C][F][C@@H1][Branch2][Ring1][Ring2][C][C@H1][Branch1][C][F][C][=C][C][=Branch1][C][=O][C][=C][C@@][Ring1][#Branch1][Ring1][=N][C][C@@H1][Ring2][Ring1][Ring1][C][C@H1][O][C@@H1][Branch2][Ring1][Branch1][C][=C][C][=C][Branch1][O][C][S][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][=C][O][C@][Ring2][Ring1][Ring1][Ring2][Ring2][#Branch1][C][=Branch1][C][=O][C][O] and COC(=O)c1ccc(NC(=O)O[C@@H]2[C@@H](C)c3c(cc(F)c(-c4cccc5cc[nH]c45)c3F)NC2(C)C)cc1."} {"text":"The protein Tyrosyl-DNA phosphodiesterase 1 is targeted by the compound with the canonical SMILES O=C1CCCc2oc3ccc(NS(=O)(=O)c4cccc5cccnc45)cc3c21 and COc1cc2c(=O)n(CCCN3CCOCC3)c3c(c2cc1OCCCN(C)C)C(=O)c1cc2c(cc1-3)OCO2."}", "/scratch/micpie/export/compound_protein_compound_2/train_2-2.jsonl": "{"text":"User: Can you give me an example for a compound SMILES that targets the protein CAB?\nAssistant: Sure, the compound SMILES NC(=O)Nc1ccc(S(=O)(=O)Nc2cccc(S(N)(=O)=O)c2)cc1 targets the protein CAB.\nUser: Can you generate another compound SMILES that targets the protein CAB?\nAssistant: Yes, of course, the compound SMILES NC(=O)Nc1ccc(S(=O)(=O)Nc2cccc(S(N)(=O)=O)c2)cc1 targets the compound SMILES Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cl.Cl.[Zn+2]."} {"text":"User: Can you give me one example for a compound InChI that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, of course, the compound InChI InChI=1S\/C17H19N3O\/c1-17(2,12-7-4-5-8-13(12)18-3)16-11-14(19-20-16)15-9-6-10-21-15\/h4-11,18H,1-3H3,(H,19,20) targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you generate another compound InChI that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Sure, the compound InChI InChI=1S\/C17H19N3O\/c1-17(2,12-7-4-5-8-13(12)18-3)16-11-14(19-20-16)15-9-6-10-21-15\/h4-11,18H,1-3H3,(H,19,20) targets the compound InChI O=C(O)CC1OCC=C2CN3CCC45C6=CC(=O)C(O)=C([N+](=O)[O-])C6=NC4C1C2CC35."}", 
"/scratch/micpie/export/compound_protein_compound_2/test_4-0.jsonl": "{"text":"The compound SMILES CCCC[C@H](NC(=O)OC(CCC)CCC)C(=O)C(=O)N[C@H](C)c1ccccc1 targets the protein Cathepsin O and which is also targeted by the compound CC(C)CC(C(=O)NCC#N)c1cccc(-c2ccc(C3CNCCO3)cc2)c1."} {"text":"The compound SELFIES [N][=C][Branch1][C][N][C][=C][C][=C][S][C][Branch2][=Branch1][=N][C][=Branch1][C][=O][N][C][C][N][Branch2][Branch1][=N][C][=Branch1][C][=O][C][O][C][=C][C][=C][Branch2][Ring2][O][O][C][C][=Branch1][C][=O][N][C][C][N][Branch2][Ring1][=Branch2][C][=Branch1][C][=O][C][=C][C][=C][C][Branch1][=Branch1][C][=Branch1][C][=N][N][=C][C][=C][Ring1][=Branch2][S][Ring1][N][C][C][Ring2][Ring1][Ring2][C][=C][Ring2][Ring1][=C][C][C][Ring2][Ring2][Branch2][=C][C][Ring2][Ring2][#C][=C][Ring2][Branch1][Ring1] targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) and which is also targeted by the compound CC(C)=CC(=O)C[C@@H]1C(=O)O[C@H]2CCCC21."}", "/scratch/micpie/export/compound_protein_compound_2/valid_5-1.jsonl": "{"text":"The protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) is targeted by the compound with the DeepSMILES COcccC=O)NCCC)OC[C@@H]6CC=O)O))))))))))ccncNCCF))cccccCl)c6))))))))oc95 and CC(C)=CC(=O)C[C@@H]1C(=O)O[C@H]2CCCC21."} {"text":"The protein Adenosine receptor A3 is targeted by the compound with the SMILES OC12CCCCC1=Nc1ccccc12 and CNC(=O)[C@H]1O[C@@H](n2cnc3c(NCc4cc(OC)ccc4OC)ncnc32)[C@H](O)[C@@H]1O."}", "/scratch/micpie/export/compound_protein_compound_2/valid_8-2.jsonl": "{"text":"User: Can you give me an example for a compound canonical SMILES that targets the protein GR?\nAssistant: Of course, the compound canonical SMILES C[C@@H]1C[C@H]2[C@@H]3C[C@H](F)C4=CC(=O)C=C[C@]4(C)[C@@]3(F)[C@@H](O)C[C@]2(C)[C@@]1(O)C(=O)CSc1nc2ccccc2s1 targets the protein GR.\nUser: Can you create another compound canonical SMILES that targets the protein GR?\nAssistant: Yes, of course, the compound canonical SMILES 
C[C@@H]1C[C@H]2[C@@H]3C[C@H](F)C4=CC(=O)C=C[C@]4(C)[C@@]3(F)[C@@H](O)C[C@]2(C)[C@@]1(O)C(=O)CSc1nc2ccccc2s1 targets the compound canonical SMILES COC(=O)c1ccc(NC(=O)O[C@@H]2[C@@H](C)c3c(cc(F)c(-c4cccc5cc[nH]c45)c3F)NC2(C)C)cc1."} {"text":"User: Can you come up with one example for a compound SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Sure, the compound SMILES Cc1ccc(Nc2nnc(SCC(=O)NCc3ccco3)s2)cc1 targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you generate another compound SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, the compound SMILES Cc1ccc(Nc2nnc(SCC(=O)NCc3ccco3)s2)cc1 targets the compound SMILES COc1cc2c(=O)n(CCCN3CCOCC3)c3c(c2cc1OCCCN(C)C)C(=O)c1cc2c(cc1-3)OCO2."}", "/scratch/micpie/export/compound_protein_compound_2/test_1-2.jsonl": "{"text":"User: Can you give me an example for a compound SMILES that targets the protein Tumor antigen HOM-RCC-3.1.3?\nAssistant: Sure, the compound SMILES COC(=O)[C@@H](NS(=O)(=O)NCc1ccc(F)cc1)C(C)C targets the protein Tumor antigen HOM-RCC-3.1.3.\nUser: Can you create another compound SMILES that targets the protein Tumor antigen HOM-RCC-3.1.3?\nAssistant: Yes, of course, the compound SMILES COC(=O)[C@@H](NS(=O)(=O)NCc1ccc(F)cc1)C(C)C targets the compound SMILES NS(=O)(=O)c1cc(NC(=O)Nc2ccccc2)c(Cl)cc1Cl."} {"text":"User: Can you give me an example for a compound SMILES that targets the protein Carbonate dehydratase I?\nAssistant: Yes, of course, the compound SMILES COc1ccccc1C(C)NS(N)(=O)=O targets the protein Carbonate dehydratase I.\nUser: Can you tell another compound SMILES that targets the protein Carbonate dehydratase I?\nAssistant: Yes, of course, the compound SMILES COc1ccccc1C(C)NS(N)(=O)=O targets the compound SMILES Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cl.Cl.[Zn+2]."}", "/scratch/micpie/export/compound_protein_compound_2/test_3-2.jsonl": "{"text":"User: Can you come up with one example for a 
compound SELFIES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, of course, the compound SELFIES [C][=Branch1][#Branch2][=C][\\C][=C][C][=C][C][=C][Ring1][=Branch1][\\C][N][C][C][N][Branch2][Ring1][#Branch1][C][C][O][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][Ring2][Ring1][=Branch1] targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you generate another compound SELFIES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, of course, the compound SELFIES [C][=Branch1][#Branch2][=C][\\C][=C][C][=C][C][=C][Ring1][=Branch1][\\C][N][C][C][N][Branch2][Ring1][#Branch1][C][C][O][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][Ring2][Ring1][=Branch1] targets the compound SELFIES O=C(O)CC1OCC=C2CN3CCC45C6=CC(=O)C(O)=C([N+](=O)[O-])C6=NC4C1C2CC35."} {"text":"User: Can you give me one example for a compound SMILES that targets the protein Cathepsin K?\nAssistant: Sure, the compound SMILES N#CC(NC(=O)[C@@H]1CCCC[C@H]1C(=O)N1CCN(c2ccc(F)cc2)CC1)c1ccccc1 targets the protein Cathepsin K.\nUser: Can you create another compound SMILES that targets the protein Cathepsin K?\nAssistant: Of course, the compound SMILES N#CC(NC(=O)[C@@H]1CCCC[C@H]1C(=O)N1CCN(c2ccc(F)cc2)CC1)c1ccccc1 targets the compound SMILES CC(C)CC(C(=O)NCC#N)c1cccc(-c2ccc(C3CNCCO3)cc2)c1."}", "/scratch/micpie/export/compound_protein_compound_2/valid_5-0.jsonl": "{"text":"The compound canonical SMILES COc1cc(C(=O)N2CC(C)OC[C@@H]2CC(=O)O)cc2nc(NC(CF)c3cccc(Cl)c3)oc12 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) and which is also targeted by the compound CC(C)=CC(=O)C[C@@H]1C(=O)O[C@H]2CCCC21."} {"text":"The compound DeepSMILES OCCCCCC6=Ncccccc6%13 targets the protein Adenosine receptor A3 and which is also targeted by the compound CNC(=O)[C@H]1O[C@@H](n2cnc3c(NCc4cc(OC)ccc4OC)ncnc32)[C@H](O)[C@@H]1O."}", 
"/scratch/micpie/export/compound_protein_compound_2/train_7-1.jsonl": "{"text":"The protein SAPK2a is targeted by the compound with the InChI InChI=1S\/C19H19N5O\/c1-11-6-9-24-16(22-23-18(24)20)15(11)12-4-5-13-14(10-12)21-17(25)19(13)7-2-3-8-19\/h4-6,9-10H,2-3,7-8H2,1H3,(H2,20,23)(H,21,25) and CC(C)Cn1c(N)nc2ccc(-c3[nH]c(-c4c(F)cccc4F)nc3-c3ccccc3)nc21.CS(=O)(=O)O."} {"text":"The protein Nuclear receptor subfamily 3 group C member 1 is targeted by the compound with the InChI InChI=1S\/C22H30O5\/c1-12-8-16-15-5-4-13-9-14(24)6-7-20(13,2)19(15)17(25)10-21(16,3)22(12,27)18(26)11-23\/h6-7,9,12,15-17,19,23,25,27H,4-5,8,10-11H2,1-3H3\/t12-,15+,16+,17+,19-,20+,21+,22+\/m1\/s1 and COC(=O)c1ccc(NC(=O)O[C@@H]2[C@@H](C)c3c(cc(F)c(-c4cccc5cc[nH]c45)c3F)NC2(C)C)cc1."}", "/scratch/micpie/export/compound_protein_compound_2/test_9-0.jsonl": "{"text":"The compound SELFIES [C][O][C][=C][C][=C][C][S][C][Branch2][Ring1][Ring2][N][N][C][=Branch1][C][=O][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][S][Ring1][=Branch2][=N][C][Ring2][Ring1][=Branch1][=Ring2][Ring1][C] targets the protein Tyr-DNA phosphodiesterase 1 and which is also targeted by the compound COc1cc2c(=O)n(CCCN3CCOCC3)c3c(c2cc1OCCCN(C)C)C(=O)c1cc2c(cc1-3)OCO2."} {"text":"The compound canonical SMILES COc1cc(CCC(=O)OCC(=O)Nc2sccc2C(N)=O)cc(OC)c1OC targets the protein Tyrosyl-DNA phosphodiesterase 1 and which is also targeted by the compound C\/C(=N\\NC(=O)c1cc(-c2cccs2)nc2ccccc12)c1ccccn1."}", "/scratch/micpie/export/compound_protein_compound_2/valid_9-1.jsonl": "{"text":"The protein Tyrosyl-DNA phosphodiesterase 1 is targeted by the compound with the InChI InChI=1S\/C24H23N5O7\/c1-4-35-18-11-14(8-9-17(18)36-24(32)15-6-5-7-16(10-15)29(33)34)21(19-12(2)25-27-22(19)30)20-13(3)26-28-23(20)31\/h5-11,21H,4H2,1-3H3,(H2,25,27,30)(H2,26,28,31) and COc1cc2c(=O)n(CCCN3CCOCC3)c3c(c2cc1OCCCN(C)C)C(=O)c1cc2c(cc1-3)OCO2."} {"text":"The protein Tyr-DNA phosphodiesterase 1 is targeted by the compound with the InChI 
InChI=1S\/C25H21NO5\/c1-29-18-6-8-20-21-9-7-19(13-23(21)31-25(28)22(20)12-18)30-15-24(27)26-11-10-16-4-2-3-5-17(16)14-26\/h2-9,12-13H,10-11,14-15H2,1H3 and C\/C(=N\\NC(=O)c1cc(-c2cccs2)nc2ccccc12)c1ccccn1."}", "/scratch/micpie/export/compound_protein_compound_2/valid_3-0.jsonl": "{"text":"The compound SMILES COc1cccc2c1[C@@H]1CN(CCCCn3c(=O)[nH]c4c(sc5ncc(-c6ccccc6)nc54)c3=O)C[C@@H]1CO2.Cl targets the protein Tyr-DNA phosphodiesterase 1 and which is also targeted by the compound O=C(O)CC1OCC=C2CN3CCC45C6=CC(=O)C(O)=C([N+](=O)[O-])C6=NC4C1C2CC35."} {"text":"The compound DeepSMILES COccccNCCNC=O)[C@@H]CCCC[C@H]6C=O)NCC#N))CC3))))))))))))[C@H]C)C6))))))cc6OC targets the protein Cathepsin X and which is also targeted by the compound CC(C)CC(C(=O)NCC#N)c1cccc(-c2ccc(C3CNCCO3)cc2)c1."}", "/scratch/micpie/export/compound_protein_compound_2/test_0-1.jsonl": "{"text":"The protein Carbonic anhydrase I is targeted by the compound with the canonical SMILES CCc1nc2ccccc2n1CC(=O)c1ccc(Cl)c(S(N)(=O)=O)c1 and CSc1ccc(\/C=N\/CCc2ccc(S(N)(=O)=O)cc2)cc1."} {"text":"The protein Carbonic anhydrase 12 is targeted by the compound with the SELFIES [N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][Branch1][C][F][C][Branch1][C][F][=C][Branch2][Ring1][=Branch1][N][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][C][Branch1][C][F][=C][Ring2][Ring1][Ring2][F] and NS(=O)(=O)c1cc(NC(=O)Nc2ccccc2)c(Cl)cc1Cl."}", "/scratch/micpie/export/compound_protein_compound_2/test_5-0.jsonl": "{"text":"The compound InChI InChI=1S\/C19H32BN3O4\/c1-15(2)23(19(25)12-11-16-8-4-3-5-9-16)14-18(24)22-17(20(26)27)10-6-7-13-21\/h3-5,8-9,15,17,26-27H,6-7,10-14,21H2,1-2H3,(H,22,24)\/t17-\/m0\/s1 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) and which is also targeted by the compound CC(C)=CC(=O)C[C@@H]1C(=O)O[C@H]2CCCC21."} {"text":"The compound canonical SMILES COC(=O)\/C=C\\Nc1cc(-c2ccncc2)nc2c1c(C)c(C)n2[C@H](C)c1ccccc1 targets the 
protein Adenosine receptor A3 and which is also targeted by the compound CNC(=O)[C@H]1O[C@@H](n2cnc3c(NCc4cc(OC)ccc4OC)ncnc32)[C@H](O)[C@@H]1O."}", "/scratch/micpie/export/compound_protein_compound_2/test_2-0.jsonl": "{"text":"The compound InChI InChI=1S\/C14H14Cl2N4O7S3\/c1-7-2-4-8(5-3-7)30(26,27)20-14(21)19-13-10(29(18,24)25)6-9(28(17,22)23)11(15)12(13)16\/h2-6H,1H3,(H2,17,22,23)(H2,18,24,25)(H2,19,20,21) targets the protein CAB and which is also targeted by the compound Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cl.Cl.[Zn+2]."} {"text":"The compound SMILES CCCCN(CCCC)CC(O)COc1ccc(Br)cc1 targets the protein Tyr-DNA phosphodiesterase 1 and which is also targeted by the compound O=C(O)CC1OCC=C2CN3CCC45C6=CC(=O)C(O)=C([N+](=O)[O-])C6=NC4C1C2CC35."}", "/scratch/micpie/export/compound_protein_compound_2/valid_2-2.jsonl": "{"text":"User: Can you come up with an example for a compound InChI that targets the protein Carbonic anhydrase 1?\nAssistant: Yes, of course, the compound InChI InChI=1S\/C13H16N4O6S\/c14-24(21,22)8-3-1-7(2-4-8)9-5-17(16-15-9)13-12(20)11(19)10(18)6-23-13\/h1-5,10-13,18-20H,6H2,(H2,14,21,22)\/t10-,11-,12+,13+\/m1\/s1 targets the protein Carbonic anhydrase 1.\nUser: Can you generate another compound InChI that targets the protein Carbonic anhydrase 1?\nAssistant: Sure, the compound InChI InChI=1S\/C13H16N4O6S\/c14-24(21,22)8-3-1-7(2-4-8)9-5-17(16-15-9)13-12(20)11(19)10(18)6-23-13\/h1-5,10-13,18-20H,6H2,(H2,14,21,22)\/t10-,11-,12+,13+\/m1\/s1 targets the compound InChI Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cl.Cl.[Zn+2]."} {"text":"User: Can you come up with an example for a compound DeepSMILES that targets the protein Tyrosyl-DNA phosphodiesterase 1?\nAssistant: Sure, the compound DeepSMILES COC=O)ccccCCCC4)[C@@H]cccccc6))))))NCcccc-cccccc6))))))cc6)))))))C[C@@H]5C8ccccCF)F)F))cc6))))))))))))))cc6 targets the protein Tyrosyl-DNA phosphodiesterase 1.\nUser: 
Can you tell another compound DeepSMILES that targets the protein Tyrosyl-DNA phosphodiesterase 1?\nAssistant: Of course, the compound DeepSMILES COC=O)ccccCCCC4)[C@@H]cccccc6))))))NCcccc-cccccc6))))))cc6)))))))C[C@@H]5C8ccccCF)F)F))cc6))))))))))))))cc6 targets the compound DeepSMILES O=C(O)CC1OCC=C2CN3CCC45C6=CC(=O)C(O)=C([N+](=O)[O-])C6=NC4C1C2CC35."}", "/scratch/micpie/export/compound_protein_compound_2/valid_0-0.jsonl": "{"text":"The compound SMILES CCCC(=O)OC[C@H]1O[C@@H](n2cc(-c3cccc(S(N)(=O)=O)c3)nn2)[C@H](OC(=O)CCC)[C@@H](OC(=O)CCC)[C@H]1OC(=O)CCC targets the protein Carbonic anhydrase I and which is also targeted by the compound CSc1ccc(\/C=N\/CCc2ccc(S(N)(=O)=O)cc2)cc1."} {"text":"The compound SELFIES [N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring1][=C][N][C][=Branch1][C][=O][C][=C][Branch1][C][Br][C][Branch1][C][Br][=C][Branch1][C][Br][C][Branch1][C][Br][=C][Ring1][#Branch2][C][Ring1][=C][=O][=C][Ring2][Ring1][Branch1] targets the protein Tumor antigen HOM-RCC-3.1.3 and which is also targeted by the compound NS(=O)(=O)c1cc(NC(=O)Nc2ccccc2)c(Cl)cc1Cl."}", "/scratch/micpie/export/compound_protein_compound_2/train_6-1.jsonl": "{"text":"The protein Adenosine receptor A3 is targeted by the compound with the SELFIES [O][=C][C][Branch1][C][O][=C][Branch1][=N][C][=C][C][=C][Branch1][C][O][C][=C][Ring1][#Branch1][O][O][C][=C][C][Branch1][C][O][=C][C][Branch1][C][O][=C][Ring2][Ring1][Branch1][Ring1][Branch2] and CNC(=O)[C@H]1O[C@@H](n2cnc3c(NCc4cc(OC)ccc4OC)ncnc32)[C@H](O)[C@@H]1O."} {"text":"The protein MAX-interacting protein 2 is targeted by the compound with the canonical SMILES Cc1cc(OCc2ccc(F)cc2F)c(Br)c(=O)n1-c1cccnc1 and CC(C)Cn1c(N)nc2ccc(-c3[nH]c(-c4c(F)cccc4F)nc3-c3ccccc3)nc21.CS(=O)(=O)O."}", "/scratch/micpie/export/compound_protein_compound_2/train_7-2.jsonl": "{"text":"User: Can you give me one example for a compound canonical SMILES that targets the protein MAPK 14?\nAssistant: Of course, the compound canonical SMILES 
Cc1ccn2c(N)nnc2c1-c1ccc2c(c1)NC(=O)C21CCCC1 targets the protein MAPK 14.\nUser: Can you tell another compound canonical SMILES that targets the protein MAPK 14?\nAssistant: Of course, the compound canonical SMILES Cc1ccn2c(N)nnc2c1-c1ccc2c(c1)NC(=O)C21CCCC1 targets the compound canonical SMILES CC(C)Cn1c(N)nc2ccc(-c3[nH]c(-c4c(F)cccc4F)nc3-c3ccccc3)nc21.CS(=O)(=O)O."} {"text":"User: Can you come up with an example for a compound canonical SMILES that targets the protein Glucocorticoid receptor?\nAssistant: Sure, the compound canonical SMILES C[C@@H]1C[C@H]2[C@@H]3CCC4=CC(=O)C=C[C@]4(C)[C@H]3[C@@H](O)C[C@]2(C)[C@@]1(O)C(=O)CO targets the protein Glucocorticoid receptor.\nUser: Can you create another compound canonical SMILES that targets the protein Glucocorticoid receptor?\nAssistant: Of course, the compound canonical SMILES C[C@@H]1C[C@H]2[C@@H]3CCC4=CC(=O)C=C[C@]4(C)[C@H]3[C@@H](O)C[C@]2(C)[C@@]1(O)C(=O)CO targets the compound canonical SMILES COC(=O)c1ccc(NC(=O)O[C@@H]2[C@@H](C)c3c(cc(F)c(-c4cccc5cc[nH]c45)c3F)NC2(C)C)cc1."}", "/scratch/micpie/export/compound_protein_compound_2/test_7-0.jsonl": "{"text":"The compound InChI InChI=1S\/C26H25FN6O2\/c1-17(34)31-21-4-2-3-5-23(21)35-26-29-15-12-22(32-26)25-24(18-6-8-19(27)9-7-18)30-16-33(25)20-10-13-28-14-11-20\/h2-9,12,15-16,20,28H,10-11,13-14H2,1H3,(H,31,34) targets the protein MAPK 14 and which is also targeted by the compound CC(C)Cn1c(N)nc2ccc(-c3[nH]c(-c4c(F)cccc4F)nc3-c3ccccc3)nc21.CS(=O)(=O)O."} {"text":"The compound InChI InChI=1S\/C22H26O2\/c1-2-22-14-21(24)19(15-6-4-3-5-7-15)13-17(22)9-8-16-12-18(23)10-11-20(16)22\/h3-7,10-12,17,19,21,23-24H,2,8-9,13-14H2,1H3\/t17-,19+,21+,22-\/m1\/s1 targets the protein GR and which is also targeted by the compound COC(=O)c1ccc(NC(=O)O[C@@H]2[C@@H](C)c3c(cc(F)c(-c4cccc5cc[nH]c45)c3F)NC2(C)C)cc1."}", "/scratch/micpie/export/compound_protein_compound_2/test_8-2.jsonl": "{"text":"User: Can you give me one example for a compound SMILES that targets the protein GR?\nAssistant: 
Yes, the compound SMILES C[C@]12C[C@H](O)[C@@]3(F)[C@@H](C[C@H](F)C4=CC(=O)C=C[C@@]43C)[C@@H]1C[C@H]1O[C@@H](c3ccc(CSc4ccccc4)cc3)O[C@]12C(=O)CO targets the protein GR.\nUser: Can you tell another compound SMILES that targets the protein GR?\nAssistant: Sure, the compound SMILES C[C@]12C[C@H](O)[C@@]3(F)[C@@H](C[C@H](F)C4=CC(=O)C=C[C@@]43C)[C@@H]1C[C@H]1O[C@@H](c3ccc(CSc4ccccc4)cc3)O[C@]12C(=O)CO targets the compound SMILES COC(=O)c1ccc(NC(=O)O[C@@H]2[C@@H](C)c3c(cc(F)c(-c4cccc5cc[nH]c45)c3F)NC2(C)C)cc1."} {"text":"User: Can you come up with an example for a compound SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Of course, the compound SMILES O=C1CCCc2oc3ccc(NS(=O)(=O)c4cccc5cccnc45)cc3c21 targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you tell another compound SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, the compound SMILES O=C1CCCc2oc3ccc(NS(=O)(=O)c4cccc5cccnc45)cc3c21 targets the compound SMILES COc1cc2c(=O)n(CCCN3CCOCC3)c3c(c2cc1OCCCN(C)C)C(=O)c1cc2c(cc1-3)OCO2."}", "/scratch/micpie/export/compound_protein_compound_2/test_0-2.jsonl": "{"text":"User: Can you give me an example for a compound SELFIES that targets the protein Carbonic anhydrase B?\nAssistant: Sure, the compound SELFIES [C][C][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][=C][Ring1][O] targets the protein Carbonic anhydrase B.\nUser: Can you generate another compound SELFIES that targets the protein Carbonic anhydrase B?\nAssistant: Sure, the compound SELFIES [C][C][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][=C][Ring1][O] targets the compound SELFIES CSc1ccc(\/C=N\/CCc2ccc(S(N)(=O)=O)cc2)cc1."} {"text":"User: Can you give me an example for a compound canonical 
SMILES that targets the protein Carbonic anhydrase XII?\nAssistant: Yes, the compound canonical SMILES NS(=O)(=O)c1c(F)c(F)c(NC23CC4CC(CC(C4)C2)C3)c(F)c1F targets the protein Carbonic anhydrase XII.\nUser: Can you generate another compound canonical SMILES that targets the protein Carbonic anhydrase XII?\nAssistant: Yes, of course, the compound canonical SMILES NS(=O)(=O)c1c(F)c(F)c(NC23CC4CC(CC(C4)C2)C3)c(F)c1F targets the compound canonical SMILES NS(=O)(=O)c1cc(NC(=O)Nc2ccccc2)c(Cl)cc1Cl."}", "/scratch/micpie/export/compound_protein_compound_2/test_3-0.jsonl": "{"text":"The compound SMILES C(=C\\c1ccccc1)\\CN1CCN(CCOC(c2ccccc2)c2ccccc2)CC1 targets the protein Tyr-DNA phosphodiesterase 1 and which is also targeted by the compound O=C(O)CC1OCC=C2CN3CCC45C6=CC(=O)C(O)=C([N+](=O)[O-])C6=NC4C1C2CC35."} {"text":"The compound SMILES N#CC(NC(=O)[C@@H]1CCCC[C@H]1C(=O)N1CCN(c2ccc(F)cc2)CC1)c1ccccc1 targets the protein Cathepsin O and which is also targeted by the compound CC(C)CC(C(=O)NCC#N)c1cccc(-c2ccc(C3CNCCO3)cc2)c1."}", "/scratch/micpie/export/compound_protein_compound_2/train_1-0.jsonl": "{"text":"The compound canonical SMILES CC(=O)N[C@@H]1[C@@H](O)[C@H](O)[C@@H](CO)O[C@H]1n1cc(COC(=O)c2ccc(S(N)(=O)=O)cc2)nn1 targets the protein CA-XII and which is also targeted by the compound NS(=O)(=O)c1cc(NC(=O)Nc2ccccc2)c(Cl)cc1Cl."} {"text":"The compound SELFIES [N][S][=Branch1][C][=O][=Branch1][C][=O][C][=N][C][=C][C][=C][Branch2][Ring1][=C][O][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][Branch1][C][F][C][Branch1][C][F][=C][Branch1][C][F][C][Branch1][C][F][=C][Ring1][#Branch2][F][C][=C][Ring2][Ring1][Branch1][S][Ring2][Ring1][Branch2] targets the protein Carbonic anhydrase 1 and which is also targeted by the compound Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cl.Cl.[Zn+2]."}", "/scratch/micpie/export/compound_protein_compound_2/test_5-2.jsonl": "{"text":"User: Can you come up with an example for a compound SMILES that targets the 
protein Prothrombin (EC 3.4.21.5) (Coagulation factor II)?\nAssistant: Yes, the compound SMILES CC(C)N(CC(=O)N[C@@H](CCCCN)B(O)O)C(=O)CCc1ccccc1 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II).\nUser: Can you tell another compound SMILES that targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II)?\nAssistant: Of course, the compound SMILES CC(C)N(CC(=O)N[C@@H](CCCCN)B(O)O)C(=O)CCc1ccccc1 targets the compound SMILES CC(C)=CC(=O)C[C@@H]1C(=O)O[C@H]2CCCC21."} {"text":"User: Can you give me an example for a compound SMILES that targets the protein Adenosine receptor A3?\nAssistant: Of course, the compound SMILES COC(=O)\/C=C\\Nc1cc(-c2ccncc2)nc2c1c(C)c(C)n2[C@H](C)c1ccccc1 targets the protein Adenosine receptor A3.\nUser: Can you tell another compound SMILES that targets the protein Adenosine receptor A3?\nAssistant: Yes, of course, the compound SMILES COC(=O)\/C=C\\Nc1cc(-c2ccncc2)nc2c1c(C)c(C)n2[C@H](C)c1ccccc1 targets the compound SMILES CNC(=O)[C@H]1O[C@@H](n2cnc3c(NCc4cc(OC)ccc4OC)ncnc32)[C@H](O)[C@@H]1O."}", "/scratch/micpie/export/compound_protein_compound_2/valid_1-2.jsonl": "{"text":"User: Can you come up with an example for a compound InChI that targets the protein Carbonate dehydratase XII?\nAssistant: Of course, the compound InChI InChI=1S\/C30H42N4O11S\/c1-5-9-23(35)41-18-22-27(43-24(36)10-6-2)28(44-25(37)11-7-3)29(45-26(38)12-8-4)30(42-22)34-17-21(32-33-34)19-13-15-20(16-14-19)46(31,39)40\/h13-17,22,27-30H,5-12,18H2,1-4H3,(H2,31,39,40)\/t22-,27+,28+,29-,30-\/m1\/s1 targets the protein Carbonate dehydratase XII.\nUser: Can you create another compound InChI that targets the protein Carbonate dehydratase XII?\nAssistant: Sure, the compound InChI InChI=1S\/C30H42N4O11S\/c1-5-9-23(35)41-18-22-27(43-24(36)10-6-2)28(44-25(37)11-7-3)29(45-26(38)12-8-4)30(42-22)34-17-21(32-33-34)19-13-15-20(16-14-19)46(31,39)40\/h13-17,22,27-30H,5-12,18H2,1-4H3,(H2,31,39,40)\/t22-,27+,28+,29-,30-\/m1\/s1 targets the compound InChI 
NS(=O)(=O)c1cc(NC(=O)Nc2ccccc2)c(Cl)cc1Cl."} {"text":"User: Can you come up with one example for a compound SMILES that targets the protein Carbonic anhydrase I?\nAssistant: Sure, the compound SMILES CC(C)CC(=O)OC[C@H]1O[C@@H](n2cc(-c3cccc(S(N)(=O)=O)c3)nn2)[C@H](OC(=O)CC(C)C)[C@@H](OC(=O)CC(C)C)[C@@H]1OC(=O)CC(C)C targets the protein Carbonic anhydrase I.\nUser: Can you tell another compound SMILES that targets the protein Carbonic anhydrase I?\nAssistant: Sure, the compound SMILES CC(C)CC(=O)OC[C@H]1O[C@@H](n2cc(-c3cccc(S(N)(=O)=O)c3)nn2)[C@H](OC(=O)CC(C)C)[C@@H](OC(=O)CC(C)C)[C@@H]1OC(=O)CC(C)C targets the compound SMILES Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cl.Cl.[Zn+2]."}", "/scratch/micpie/export/compound_protein_compound_2/valid_7-2.jsonl": "{"text":"User: Can you give me an example for a compound DeepSMILES that targets the protein Mitogen-activated protein kinase p38 alpha?\nAssistant: Sure, the compound DeepSMILES CccccC=O)NcccF)ccNCCOCC6))))))c6))))))))cc6NcncnnccC=O)cccccc6)))))))cC)c95 targets the protein Mitogen-activated protein kinase p38 alpha.\nUser: Can you tell another compound DeepSMILES that targets the protein Mitogen-activated protein kinase p38 alpha?\nAssistant: Sure, the compound DeepSMILES CccccC=O)NcccF)ccNCCOCC6))))))c6))))))))cc6NcncnnccC=O)cccccc6)))))))cC)c95 targets the compound DeepSMILES CC(C)Cn1c(N)nc2ccc(-c3[nH]c(-c4c(F)cccc4F)nc3-c3ccccc3)nc21.CS(=O)(=O)O."} {"text":"User: Can you give me one example for a compound SELFIES that targets the protein GR?\nAssistant: Sure, the compound SELFIES [C][C][C][Branch2][Ring1][#C][C][C][Branch1][C][O][Branch1][S][C][C][=C][C][=N][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][Branch1][C][F][Branch1][C][F][F][C][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring2][Ring1][N] targets the protein GR.\nUser: Can you create another compound SELFIES that targets the protein GR?\nAssistant: Yes, the compound SELFIES 
[C][C][C][Branch2][Ring1][#C][C][C][Branch1][C][O][Branch1][S][C][C][=C][C][=N][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1][C][Branch1][C][F][Branch1][C][F][F][C][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring2][Ring1][N] targets the compound SELFIES COC(=O)c1ccc(NC(=O)O[C@@H]2[C@@H](C)c3c(cc(F)c(-c4cccc5cc[nH]c45)c3F)NC2(C)C)cc1."}", "/scratch/micpie/export/compound_protein_compound_2/test_0-0.jsonl": "{"text":"The compound SMILES CCc1nc2ccccc2n1CC(=O)c1ccc(Cl)c(S(N)(=O)=O)c1 targets the protein Carbonic anhydrase 1 and which is also targeted by the compound CSc1ccc(\/C=N\/CCc2ccc(S(N)(=O)=O)cc2)cc1."} {"text":"The compound SELFIES [N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][Branch1][C][F][C][Branch1][C][F][=C][Branch2][Ring1][=Branch1][N][C][C][C][C][C][Branch1][O][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][Ring1][=Branch2][C][Ring1][#Branch2][C][Branch1][C][F][=C][Ring2][Ring1][Ring2][F] targets the protein Tumor antigen HOM-RCC-3.1.3 and which is also targeted by the compound NS(=O)(=O)c1cc(NC(=O)Nc2ccccc2)c(Cl)cc1Cl."}", "/scratch/micpie/export/compound_protein_compound_2/test_6-0.jsonl": "{"text":"The compound InChI InChI=1S\/C24H25ClN6O4S\/c1-2-31-23(32)20-22(28-24(31)33)27-21(26-20)17-6-8-19(9-7-17)36(34,35)30-12-10-29(11-13-30)15-16-4-3-5-18(25)14-16\/h3-9,14H,2,10-13,15H2,1H3,(H,26,27)(H,28,33) targets the protein Adenosine receptor A3 and which is also targeted by the compound CNC(=O)[C@H]1O[C@@H](n2cnc3c(NCc4cc(OC)ccc4OC)ncnc32)[C@H](O)[C@@H]1O."} {"text":"The compound SELFIES [C][C][=C][C][Branch1][C][C][=C][C][Branch2][Ring2][#Branch2][O][C][=N][C][=C][C][Branch2][Ring1][=N][C][=C][Branch1][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][N][=C][N][Ring1][N][C][C][C][N][C][C][Ring1][=Branch1][=N][Ring2][Ring1][Branch2][=C][Ring2][Ring1][S] targets the protein SAPK2a and which is also targeted by the compound CC(C)Cn1c(N)nc2ccc(-c3[nH]c(-c4c(F)cccc4F)nc3-c3ccccc3)nc21.CS(=O)(=O)O."}", 
"/scratch/micpie/export/compound_protein_compound_2/train_2-0.jsonl": "{"text":"The compound DeepSMILES NC=O)NccccS=O)=O)NcccccSN)=O)=O))c6))))))))cc6 targets the protein CAB and which is also targeted by the compound Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cl.Cl.[Zn+2]."} {"text":"The compound canonical SMILES CNc1ccccc1C(C)(C)c1cc(-c2ccco2)[nH]n1 targets the protein Tyr-DNA phosphodiesterase 1 and which is also targeted by the compound O=C(O)CC1OCC=C2CN3CCC45C6=CC(=O)C(O)=C([N+](=O)[O-])C6=NC4C1C2CC35."}", "/scratch/micpie/export/compound_protein_compound_2/valid_2-0.jsonl": "{"text":"The compound SMILES NS(=O)(=O)c1ccc(-c2cn([C@H]3OC[C@@H](O)[C@@H](O)[C@@H]3O)nn2)cc1 targets the protein Cyanamide hydratase CA1 and which is also targeted by the compound Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cl.Cl.[Zn+2]."} {"text":"The compound InChI InChI=1S\/C42H36F3NO2\/c1-48-39(47)33-18-20-34(21-19-33)40-26-41(27-40)36(37(40)31-16-22-35(23-17-31)42(43,44)45)25-46(38(41)32-10-6-3-7-11-32)24-28-12-14-30(15-13-28)29-8-4-2-5-9-29\/h2-23,36-38H,24-27H2,1H3\/t36-,37?,38-,40?,41?\/m1\/s1 targets the protein Tyrosyl-DNA phosphodiesterase 1 and which is also targeted by the compound O=C(O)CC1OCC=C2CN3CCC45C6=CC(=O)C(O)=C([N+](=O)[O-])C6=NC4C1C2CC35."}", "/scratch/micpie/export/compound_protein_compound_2/test_7-1.jsonl": "{"text":"The protein SAPK2a is targeted by the compound with the SELFIES [C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][=N][C][=C][C][Branch2][Ring1][=N][C][=C][Branch1][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][N][=C][N][Ring1][N][C][C][C][N][C][C][Ring1][=Branch1][=N][Ring2][Ring1][Branch2] and CC(C)Cn1c(N)nc2ccc(-c3[nH]c(-c4c(F)cccc4F)nc3-c3ccccc3)nc21.CS(=O)(=O)O."} {"text":"The protein Glucocorticoid receptor is targeted by the compound with the DeepSMILES CC[C@@]C[C@H]O)[C@H]cccccc6))))))C[C@H]6CCcccO)ccc6%14 and 
COC(=O)c1ccc(NC(=O)O[C@@H]2[C@@H](C)c3c(cc(F)c(-c4cccc5cc[nH]c45)c3F)NC2(C)C)cc1."}", "/scratch/micpie/export/compound_protein_compound_2/valid_4-2.jsonl": "{"text":"User: Can you give me an example for a compound canonical SMILES that targets the protein Cathepsin O2?\nAssistant: Of course, the compound canonical SMILES O=C(N[C@@H](CC1CCCCC1)C(=O)NCCNc1ccc(OC(F)(F)F)cc1)c1ccc(-c2cccc(C(F)(F)F)c2)o1 targets the protein Cathepsin O2.\nUser: Can you tell another compound canonical SMILES that targets the protein Cathepsin O2?\nAssistant: Yes, the compound canonical SMILES O=C(N[C@@H](CC1CCCCC1)C(=O)NCCNc1ccc(OC(F)(F)F)cc1)c1ccc(-c2cccc(C(F)(F)F)c2)o1 targets the compound canonical SMILES CC(C)CC(C(=O)NCC#N)c1cccc(-c2ccc(C3CNCCO3)cc2)c1."} {"text":"User: Can you come up with one example for a compound SMILES that targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II)?\nAssistant: Yes, of course, the compound SMILES N=C(N)c1ccc2oc(C(=O)N3CCN(C(=O)COc4ccccc4OCC(=O)N4CCN(C(=O)c5cc6cc(C(=N)N)ccc6o5)CC4)CC3)cc2c1 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II).\nUser: Can you generate another compound SMILES that targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II)?\nAssistant: Sure, the compound SMILES N=C(N)c1ccc2oc(C(=O)N3CCN(C(=O)COc4ccccc4OCC(=O)N4CCN(C(=O)c5cc6cc(C(=N)N)ccc6o5)CC4)CC3)cc2c1 targets the compound SMILES CC(C)=CC(=O)C[C@@H]1C(=O)O[C@H]2CCCC21."}", "/scratch/micpie/export/compound_protein_compound_2/test_4-2.jsonl": "{"text":"User: Can you give me an example for a compound SELFIES that targets the protein Cathepsin O2?\nAssistant: Of course, the compound SELFIES [C][C][C][C][C@H1][Branch1][S][N][C][=Branch1][C][=O][O][C][Branch1][Ring2][C][C][C][C][C][C][C][=Branch1][C][=O][C][=Branch1][C][=O][N][C@H1][Branch1][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1] targets the protein Cathepsin O2.\nUser: Can you tell another compound SELFIES that targets the protein Cathepsin O2?\nAssistant: Yes, of course, the 
compound SELFIES [C][C][C][C][C@H1][Branch1][S][N][C][=Branch1][C][=O][O][C][Branch1][Ring2][C][C][C][C][C][C][C][=Branch1][C][=O][C][=Branch1][C][=O][N][C@H1][Branch1][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1] targets the compound SELFIES CC(C)CC(C(=O)NCC#N)c1cccc(-c2ccc(C3CNCCO3)cc2)c1."} {"text":"User: Can you give me one example for a compound DeepSMILES that targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II)?\nAssistant: Yes, of course, the compound DeepSMILES N=CN)ccccscC=O)NCCNC=O)COccccOCC=O)NCCNC=O)cccccC=N)N))ccc6s9))))))))))CC6)))))))))cc6)))))))))CC6)))))))cc5c9 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II).\nUser: Can you tell another compound DeepSMILES that targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II)?\nAssistant: Sure, the compound DeepSMILES N=CN)ccccscC=O)NCCNC=O)COccccOCC=O)NCCNC=O)cccccC=N)N))ccc6s9))))))))))CC6)))))))))cc6)))))))))CC6)))))))cc5c9 targets the compound DeepSMILES CC(C)=CC(=O)C[C@@H]1C(=O)O[C@H]2CCCC21."}", "/scratch/micpie/export/compound_protein_compound_2/valid_3-2.jsonl": "{"text":"User: Can you give me one example for a compound SELFIES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, the compound SELFIES [C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C@@H1][C][N][Branch2][Ring2][#Branch2][C][C][C][C][N][C][=Branch1][C][=O][NH1][C][=C][Branch2][Ring1][=Branch1][S][C][=N][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][N][Ring1][#C][C][Ring2][Ring1][Ring2][=O][C][C@@H1][Ring2][Ring1][=C][C][O][Ring2][Ring2][C].[Cl] targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you create another compound SELFIES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Sure, the compound SELFIES 
[C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C@@H1][C][N][Branch2][Ring2][#Branch2][C][C][C][C][N][C][=Branch1][C][=O][NH1][C][=C][Branch2][Ring1][=Branch1][S][C][=N][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][N][Ring1][#C][C][Ring2][Ring1][Ring2][=O][C][C@@H1][Ring2][Ring1][=C][C][O][Ring2][Ring2][C].[Cl] targets the compound SELFIES O=C(O)CC1OCC=C2CN3CCC45C6=CC(=O)C(O)=C([N+](=O)[O-])C6=NC4C1C2CC35."} {"text":"User: Can you come up with one example for a compound SELFIES that targets the protein Cathepsin O2?\nAssistant: Yes, of course, the compound SELFIES [C][O][C][=C][C][=C][Branch2][Ring2][#Branch2][N][C][C][N][Branch2][Ring1][O][C][=Branch1][C][=O][C@@H1][C][C][C][C][C@H1][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][Ring1][C][#N][C][C][Ring1][Branch1][C@H1][Branch1][C][C][C][Ring2][Ring1][#Branch1][C][=C][Ring2][Ring1][=N][O][C] targets the protein Cathepsin O2.\nUser: Can you generate another compound SELFIES that targets the protein Cathepsin O2?\nAssistant: Yes, of course, the compound SELFIES [C][O][C][=C][C][=C][Branch2][Ring2][#Branch2][N][C][C][N][Branch2][Ring1][O][C][=Branch1][C][=O][C@@H1][C][C][C][C][C@H1][Ring1][=Branch1][C][=Branch1][C][=O][N][C][Branch1][Ring1][C][#N][C][C][Ring1][Branch1][C@H1][Branch1][C][C][C][Ring2][Ring1][#Branch1][C][=C][Ring2][Ring1][=N][O][C] targets the compound SELFIES CC(C)CC(C(=O)NCC#N)c1cccc(-c2ccc(C3CNCCO3)cc2)c1."}", "/scratch/micpie/export/compound_protein_compound_2/valid_2-1.jsonl": "{"text":"The protein Carbonic anhydrase B is targeted by the compound with the DeepSMILES NS=O)=O)cccc-ccn[C@H]OC[C@@H]O)[C@@H]O)[C@@H]6O)))))))nn5)))))cc6 and Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cl.Cl.[Zn+2]."} {"text":"The protein Tyr-DNA phosphodiesterase 1 is targeted by the compound with the canonical SMILES COC(=O)c1ccc(C23CC4(C2)[C@H](CN(Cc2ccc(-c5ccccc5)cc2)[C@@H]4c2ccccc2)C3c2ccc(C(F)(F)F)cc2)cc1 and 
O=C(O)CC1OCC=C2CN3CCC45C6=CC(=O)C(O)=C([N+](=O)[O-])C6=NC4C1C2CC35."}", "/scratch/micpie/export/compound_protein_compound_2/valid_4-0.jsonl": "{"text":"The compound InChI InChI=1S\/C30H31F6N3O4\/c31-29(32,33)21-8-4-7-20(18-21)25-13-14-26(42-25)28(41)39-24(17-19-5-2-1-3-6-19)27(40)38-16-15-37-22-9-11-23(12-10-22)43-30(34,35)36\/h4,7-14,18-19,24,37H,1-3,5-6,15-17H2,(H,38,40)(H,39,41)\/t24-\/m0\/s1 targets the protein Cathepsin X and which is also targeted by the compound CC(C)CC(C(=O)NCC#N)c1cccc(-c2ccc(C3CNCCO3)cc2)c1."} {"text":"The compound canonical SMILES N=C(N)c1ccc2oc(C(=O)N3CCN(C(=O)COc4ccccc4OCC(=O)N4CCN(C(=O)c5cc6cc(C(=N)N)ccc6o5)CC4)CC3)cc2c1 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) and which is also targeted by the compound CC(C)=CC(=O)C[C@@H]1C(=O)O[C@H]2CCCC21."}", "/scratch/micpie/export/compound_protein_compound_2/train_5-1.jsonl": "{"text":"The protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) is targeted by the compound with the canonical SMILES COc1ccc2ccc(S(=O)(=O)N(Cc3ccccc3)[C@H]3CCN(Cc4cccc(C(=N)N)c4)C3=O)cc2c1 and CC(C)=CC(=O)C[C@@H]1C(=O)O[C@H]2CCCC21."} {"text":"The protein Adenosine receptor A3 is targeted by the compound with the SELFIES [C][C][C][N][C][=Branch1][C][=O][C][=C][Branch1][=C][N][=C][NH1][C][Branch1][C][C][=C][N][Ring1][=Branch1][Ring1][=Branch2][N][Branch1][#Branch2][C][C][=C][C][=N][C][=C][Ring1][=Branch1][C][Ring2][Ring1][Branch1][=O].[Cl] and CNC(=O)[C@H]1O[C@@H](n2cnc3c(NCc4cc(OC)ccc4OC)ncnc32)[C@H](O)[C@@H]1O."}", "/scratch/micpie/export/compound_protein_compound_2/test_2-1.jsonl": "{"text":"The protein CA-I is targeted by the compound with the InChI InChI=1S\/C14H14Cl2N4O7S3\/c1-7-2-4-8(5-3-7)30(26,27)20-14(21)19-13-10(29(18,24)25)6-9(28(17,22)23)11(15)12(13)16\/h2-6H,1H3,(H2,17,22,23)(H2,18,24,25)(H2,19,20,21) and Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cl.Cl.[Zn+2]."} {"text":"The protein Tyr-DNA phosphodiesterase 1 is targeted by the 
compound with the DeepSMILES CCCCNCCCC))))CCO)COccccBr)cc6 and O=C(O)CC1OCC=C2CN3CCC45C6=CC(=O)C(O)=C([N+](=O)[O-])C6=NC4C1C2CC35."}", "/scratch/micpie/export/compound_protein_compound_2/train_9-2.jsonl": "{"text":"User: Can you come up with one example for a compound SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Sure, the compound SMILES COc1ccc(-c2cn3cccnc3n2)cc1NC(=O)Nc1cc(C)ccc1C targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you create another compound SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, the compound SMILES COc1ccc(-c2cn3cccnc3n2)cc1NC(=O)Nc1cc(C)ccc1C targets the compound SMILES COc1cc2c(=O)n(CCCN3CCOCC3)c3c(c2cc1OCCCN(C)C)C(=O)c1cc2c(cc1-3)OCO2."} {"text":"User: Can you come up with an example for a compound InChI that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, the compound InChI InChI=1S\/C10H7NO3S2\/c12-6-3-1-2-5(8(6)13)4-7-9(15)11-10(14)16-7\/h1-4,12-13H,(H,11,14,15)\/b7-4- targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you tell another compound InChI that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, of course, the compound InChI InChI=1S\/C10H7NO3S2\/c12-6-3-1-2-5(8(6)13)4-7-9(15)11-10(14)16-7\/h1-4,12-13H,(H,11,14,15)\/b7-4- targets the compound InChI C\/C(=N\\NC(=O)c1cc(-c2cccs2)nc2ccccc12)c1ccccn1."}", "/scratch/micpie/export/compound_protein_compound_2/train_0-0.jsonl": "{"text":"The compound InChI InChI=1S\/C8H7BrN4O2S2\/c9-6-3-1-5(2-4-6)7-11-12-8(16-7)13-17(10,14)15\/h1-4H,(H,12,13)(H2,10,14,15) targets the protein Carbonate dehydratase I and which is also targeted by the compound CSc1ccc(\/C=N\/CCc2ccc(S(N)(=O)=O)cc2)cc1."} {"text":"The compound InChI InChI=1S\/C21H25N5O10S\/c1-11(27)34-17-10-33-21(19(36-13(3)29)18(17)35-12(2)28)26-9-15(24-25-26)8-23-20(30)14-4-6-16(7-5-14)37(22,31)32\/h4-7,9,17-19,21H,8,10H2,1-3H3,(H,23,30)(H2,22,31,32)\/t17-,18-,19+,21+\/m1\/s1 targets the protein Carbonic anhydrase 12 and 
which is also targeted by the compound NS(=O)(=O)c1cc(NC(=O)Nc2ccccc2)c(Cl)cc1Cl."}", "/scratch/micpie/export/compound_protein_compound_2/test_1-1.jsonl": "{"text":"The protein Carbonic anhydrase 12 is targeted by the compound with the InChI InChI=1S\/C13H19FN2O4S\/c1-9(2)12(13(17)20-3)16-21(18,19)15-8-10-4-6-11(14)7-5-10\/h4-7,9,12,15-16H,8H2,1-3H3\/t12-\/m0\/s1 and NS(=O)(=O)c1cc(NC(=O)Nc2ccccc2)c(Cl)cc1Cl."} {"text":"The protein Carbonic anhydrase 1 is targeted by the compound with the InChI InChI=1S\/C9H14N2O3S\/c1-7(11-15(10,12)13)8-5-3-4-6-9(8)14-2\/h3-7,11H,1-2H3,(H2,10,12,13) and Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cl.Cl.[Zn+2]."}", "/scratch/micpie/export/compound_protein_compound_2/valid_9-0.jsonl": "{"text":"The compound DeepSMILES CCOcccCccC)n[nH]c5O))))))ccC)n[nH]c5O)))))))ccc6OC=O)ccccc[N+]=O)[O-]))c6 targets the protein Tyr-DNA phosphodiesterase 1 and which is also targeted by the compound COc1cc2c(=O)n(CCCN3CCOCC3)c3c(c2cc1OCCCN(C)C)C(=O)c1cc2c(cc1-3)OCO2."} {"text":"The compound InChI InChI=1S\/C25H21NO5\/c1-29-18-6-8-20-21-9-7-19(13-23(21)31-25(28)22(20)12-18)30-15-24(27)26-11-10-16-4-2-3-5-17(16)14-26\/h2-9,12-13H,10-11,14-15H2,1H3 targets the protein Tyrosyl-DNA phosphodiesterase 1 and which is also targeted by the compound C\/C(=N\\NC(=O)c1cc(-c2cccs2)nc2ccccc12)c1ccccn1."}", "/scratch/micpie/export/compound_protein_compound_2/train_5-2.jsonl": "{"text":"User: Can you give me an example for a compound SELFIES that targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II)?\nAssistant: Yes, of course, the compound SELFIES [C][O][C][=C][C][=C][C][=C][C][Branch2][Ring2][=C][S][=Branch1][C][=O][=Branch1][C][=O][N][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C@H1][C][C][N][Branch1][P][C][C][=C][C][=C][C][Branch1][=Branch1][C][=Branch1][C][=N][N][=C][Ring1][=Branch2][C][Ring1][#C][=O][=C][C][Ring2][Ring1][P][=C][Ring2][Ring2][Branch1] targets the protein Prothrombin (EC 
3.4.21.5) (Coagulation factor II).\nUser: Can you create another compound SELFIES that targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II)?\nAssistant: Yes, the compound SELFIES [C][O][C][=C][C][=C][C][=C][C][Branch2][Ring2][=C][S][=Branch1][C][=O][=Branch1][C][=O][N][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C@H1][C][C][N][Branch1][P][C][C][=C][C][=C][C][Branch1][=Branch1][C][=Branch1][C][=N][N][=C][Ring1][=Branch2][C][Ring1][#C][=O][=C][C][Ring2][Ring1][P][=C][Ring2][Ring2][Branch1] targets the compound SELFIES CC(C)=CC(=O)C[C@@H]1C(=O)O[C@H]2CCCC21."} {"text":"User: Can you come up with one example for a compound DeepSMILES that targets the protein Adenosine receptor A3?\nAssistant: Sure, the compound DeepSMILES CCCnc=O)ccnc[nH]cC)cn58))))))nCcccncc6)))))))c6=O.Cl targets the protein Adenosine receptor A3.\nUser: Can you tell another compound DeepSMILES that targets the protein Adenosine receptor A3?\nAssistant: Yes, of course, the compound DeepSMILES CCCnc=O)ccnc[nH]cC)cn58))))))nCcccncc6)))))))c6=O.Cl targets the compound DeepSMILES CNC(=O)[C@H]1O[C@@H](n2cnc3c(NCc4cc(OC)ccc4OC)ncnc32)[C@H](O)[C@@H]1O."}", "/scratch/micpie/export/compound_protein_compound_2/train_8-1.jsonl": "{"text":"The protein Nuclear receptor subfamily 3 group C member 1 is targeted by the compound with the SMILES CC1(CC(O)(CNc2cccc3cnccc23)C(F)(F)F)CCCc2ccccc21 and COC(=O)c1ccc(NC(=O)O[C@@H]2[C@@H](C)c3c(cc(F)c(-c4cccc5cc[nH]c45)c3F)NC2(C)C)cc1."} {"text":"The protein Tyr-DNA phosphodiesterase 1 is targeted by the compound with the SELFIES [C][\/C][=Branch2][Ring1][Ring2][=N][\\N][C][=N][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][S][Ring1][O][C][=C][C][=C][S][Ring1][Branch1] and COc1cc2c(=O)n(CCCN3CCOCC3)c3c(c2cc1OCCCN(C)C)C(=O)c1cc2c(cc1-3)OCO2."}", "/scratch/micpie/export/compound_protein_compound_2/train_8-0.jsonl": "{"text":"The compound canonical SMILES CC1(CC(O)(CNc2cccc3cnccc23)C(F)(F)F)CCCc2ccccc21 targets the protein 
Nuclear receptor subfamily 3 group C member 1 and which is also targeted by the compound COC(=O)c1ccc(NC(=O)O[C@@H]2[C@@H](C)c3c(cc(F)c(-c4cccc5cc[nH]c45)c3F)NC2(C)C)cc1."} {"text":"The compound InChI InChI=1S\/C15H13N3S2\/c1-11(14-8-5-9-19-14)17-18-15-16-13(10-20-15)12-6-3-2-4-7-12\/h2-10H,1H3,(H,16,18)\/b17-11+ targets the protein Tyr-DNA phosphodiesterase 1 and which is also targeted by the compound COc1cc2c(=O)n(CCCN3CCOCC3)c3c(c2cc1OCCCN(C)C)C(=O)c1cc2c(cc1-3)OCO2."}", "/scratch/micpie/export/compound_protein_compound_2/test_5-1.jsonl": "{"text":"The protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) is targeted by the compound with the InChI InChI=1S\/C19H32BN3O4\/c1-15(2)23(19(25)12-11-16-8-4-3-5-9-16)14-18(24)22-17(20(26)27)10-6-7-13-21\/h3-5,8-9,15,17,26-27H,6-7,10-14,21H2,1-2H3,(H,22,24)\/t17-\/m0\/s1 and CC(C)=CC(=O)C[C@@H]1C(=O)O[C@H]2CCCC21."} {"text":"The protein Adenosine receptor A3 is targeted by the compound with the SELFIES [C][O][C][=Branch1][C][=O][\/C][=C][\\N][C][=C][C][Branch1][=Branch2][C][=C][C][=N][C][=C][Ring1][=Branch1][=N][C][=C][Ring1][N][C][Branch1][C][C][=C][Branch1][C][C][N][Ring1][#Branch1][C@H1][Branch1][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1] and CNC(=O)[C@H]1O[C@@H](n2cnc3c(NCc4cc(OC)ccc4OC)ncnc32)[C@H](O)[C@@H]1O."}", "/scratch/micpie/export/compound_protein_compound_2/train_4-1.jsonl": "{"text":"The protein Cathepsin O is targeted by the compound with the SELFIES [C][C][Branch1][C][F][Branch1][C][F][C][C@H1][Branch2][Ring1][P][N][C][=Branch1][C][=O][N][C][C][C][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][=Branch1][C][=O][N][Branch1][#Branch1][C][C][C][C][Ring1][Ring1][C][Ring1][N][C][=Branch1][C][=O][N][C][Branch1][Ring1][C][#N][C][C][Ring1][Branch1] and CC(C)CC(C(=O)NCC#N)c1cccc(-c2ccc(C3CNCCO3)cc2)c1."} {"text":"The protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) is targeted by the compound with the SMILES CCOC(=O)C(=O)N(Cc1ccccc1)c1ccc2c(c1)OC[C@H](COc1ccc(C(=N)N)cc1)O2 and 
CC(C)=CC(=O)C[C@@H]1C(=O)O[C@H]2CCCC21."}", "/scratch/micpie/export/compound_protein_compound_2/valid_0-2.jsonl": "{"text":"User: Can you give me an example for a compound DeepSMILES that targets the protein Carbonate dehydratase I?\nAssistant: Yes, the compound DeepSMILES CCCC=O)OC[C@H]O[C@@H]ncc-cccccSN)=O)=O))c6))))))nn5)))))[C@H]OC=O)CCC)))))[C@@H]OC=O)CCC)))))[C@H]6OC=O)CCC targets the protein Carbonate dehydratase I.\nUser: Can you tell another compound DeepSMILES that targets the protein Carbonate dehydratase I?\nAssistant: Yes, the compound DeepSMILES CCCC=O)OC[C@H]O[C@@H]ncc-cccccSN)=O)=O))c6))))))nn5)))))[C@H]OC=O)CCC)))))[C@@H]OC=O)CCC)))))[C@H]6OC=O)CCC targets the compound DeepSMILES CSc1ccc(\/C=N\/CCc2ccc(S(N)(=O)=O)cc2)cc1."} {"text":"User: Can you give me an example for a compound SELFIES that targets the protein Tumor antigen HOM-RCC-3.1.3?\nAssistant: Yes, the compound SELFIES [N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring1][=C][N][C][=Branch1][C][=O][C][=C][Branch1][C][Br][C][Branch1][C][Br][=C][Branch1][C][Br][C][Branch1][C][Br][=C][Ring1][#Branch2][C][Ring1][=C][=O][=C][Ring2][Ring1][Branch1] targets the protein Tumor antigen HOM-RCC-3.1.3.\nUser: Can you create another compound SELFIES that targets the protein Tumor antigen HOM-RCC-3.1.3?\nAssistant: Sure, the compound SELFIES [N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring1][=C][N][C][=Branch1][C][=O][C][=C][Branch1][C][Br][C][Branch1][C][Br][=C][Branch1][C][Br][C][Branch1][C][Br][=C][Ring1][#Branch2][C][Ring1][=C][=O][=C][Ring2][Ring1][Branch1] targets the compound SELFIES NS(=O)(=O)c1cc(NC(=O)Nc2ccccc2)c(Cl)cc1Cl."}", "/scratch/micpie/export/compound_protein_compound_2/train_5-0.jsonl": "{"text":"The compound SMILES COc1ccc2ccc(S(=O)(=O)N(Cc3ccccc3)[C@H]3CCN(Cc4cccc(C(=N)N)c4)C3=O)cc2c1 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) and which is also targeted by the compound CC(C)=CC(=O)C[C@@H]1C(=O)O[C@H]2CCCC21."} 
{"text":"The compound SMILES CCCn1c(=O)c2c(nc3[nH]c(C)cn32)n(Cc2ccncc2)c1=O.Cl targets the protein Adenosine receptor A3 and which is also targeted by the compound CNC(=O)[C@H]1O[C@@H](n2cnc3c(NCc4cc(OC)ccc4OC)ncnc32)[C@H](O)[C@@H]1O."}", "/scratch/micpie/export/compound_protein_compound_2/test_6-2.jsonl": "{"text":"User: Can you give me an example for a compound InChI that targets the protein Adenosine receptor A3?\nAssistant: Yes, the compound InChI InChI=1S\/C24H25ClN6O4S\/c1-2-31-23(32)20-22(28-24(31)33)27-21(26-20)17-6-8-19(9-7-17)36(34,35)30-12-10-29(11-13-30)15-16-4-3-5-18(25)14-16\/h3-9,14H,2,10-13,15H2,1H3,(H,26,27)(H,28,33) targets the protein Adenosine receptor A3.\nUser: Can you generate another compound InChI that targets the protein Adenosine receptor A3?\nAssistant: Sure, the compound InChI InChI=1S\/C24H25ClN6O4S\/c1-2-31-23(32)20-22(28-24(31)33)27-21(26-20)17-6-8-19(9-7-17)36(34,35)30-12-10-29(11-13-30)15-16-4-3-5-18(25)14-16\/h3-9,14H,2,10-13,15H2,1H3,(H,26,27)(H,28,33) targets the compound InChI CNC(=O)[C@H]1O[C@@H](n2cnc3c(NCc4cc(OC)ccc4OC)ncnc32)[C@H](O)[C@@H]1O."} {"text":"User: Can you give me an example for a compound SELFIES that targets the protein MAP kinase MXI2?\nAssistant: Of course, the compound SELFIES [C][C][=C][C][Branch1][C][C][=C][C][Branch2][Ring2][#Branch2][O][C][=N][C][=C][C][Branch2][Ring1][=N][C][=C][Branch1][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][N][=C][N][Ring1][N][C][C][C][N][C][C][Ring1][=Branch1][=N][Ring2][Ring1][Branch2][=C][Ring2][Ring1][S] targets the protein MAP kinase MXI2.\nUser: Can you create another compound SELFIES that targets the protein MAP kinase MXI2?\nAssistant: Yes, the compound SELFIES [C][C][=C][C][Branch1][C][C][=C][C][Branch2][Ring2][#Branch2][O][C][=N][C][=C][C][Branch2][Ring1][=N][C][=C][Branch1][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][N][=C][N][Ring1][N][C][C][C][N][C][C][Ring1][=Branch1][=N][Ring2][Ring1][Branch2][=C][Ring2][Ring1][S] targets the compound 
SELFIES CC(C)Cn1c(N)nc2ccc(-c3[nH]c(-c4c(F)cccc4F)nc3-c3ccccc3)nc21.CS(=O)(=O)O."}", "/scratch/micpie/export/compound_protein_compound_2/valid_0-1.jsonl": "{"text":"The protein Carbonic anhydrase B is targeted by the compound with the canonical SMILES CCCC(=O)OC[C@H]1O[C@@H](n2cc(-c3cccc(S(N)(=O)=O)c3)nn2)[C@H](OC(=O)CCC)[C@@H](OC(=O)CCC)[C@H]1OC(=O)CCC and CSc1ccc(\/C=N\/CCc2ccc(S(N)(=O)=O)cc2)cc1."} {"text":"The protein Carbonate dehydratase XII is targeted by the compound with the SELFIES [N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][C][Branch2][Ring1][=C][N][C][=Branch1][C][=O][C][=C][Branch1][C][Br][C][Branch1][C][Br][=C][Branch1][C][Br][C][Branch1][C][Br][=C][Ring1][#Branch2][C][Ring1][=C][=O][=C][Ring2][Ring1][Branch1] and NS(=O)(=O)c1cc(NC(=O)Nc2ccccc2)c(Cl)cc1Cl."}", "/scratch/micpie/export/compound_protein_compound_2/valid_7-1.jsonl": "{"text":"The protein MAX-interacting protein 2 is targeted by the compound with the InChI InChI=1S\/C32H29FN6O3\/c1-20-8-9-23(32(41)36-25-15-24(33)16-26(17-25)38-10-12-42-13-11-38)14-28(20)37-31-29-21(2)27(18-39(29)35-19-34-31)30(40)22-6-4-3-5-7-22\/h3-9,14-19H,10-13H2,1-2H3,(H,36,41)(H,34,35,37) and CC(C)Cn1c(N)nc2ccc(-c3[nH]c(-c4c(F)cccc4F)nc3-c3ccccc3)nc21.CS(=O)(=O)O."} {"text":"The protein Glucocorticoid receptor is targeted by the compound with the DeepSMILES CCCCCO)Ccccncccccc%106)))))))))))CF)F)F))))CCCcccccc6%10 and COC(=O)c1ccc(NC(=O)O[C@@H]2[C@@H](C)c3c(cc(F)c(-c4cccc5cc[nH]c45)c3F)NC2(C)C)cc1."}", "/scratch/micpie/export/compound_protein_compound_2/train_2-1.jsonl": "{"text":"The protein CAB is targeted by the compound with the SMILES NC(=O)Nc1ccc(S(=O)(=O)Nc2cccc(S(N)(=O)=O)c2)cc1 and Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cl.Cl.[Zn+2]."} {"text":"The protein Tyr-DNA phosphodiesterase 1 is targeted by the compound with the InChI InChI=1S\/C17H19N3O\/c1-17(2,12-7-4-5-8-13(12)18-3)16-11-14(19-20-16)15-9-6-10-21-15\/h4-11,18H,1-3H3,(H,19,20) and 
O=C(O)CC1OCC=C2CN3CCC45C6=CC(=O)C(O)=C([N+](=O)[O-])C6=NC4C1C2CC35."}", "/scratch/micpie/export/compound_protein_compound_2/valid_1-1.jsonl": "{"text":"The protein Tumor antigen HOM-RCC-3.1.3 is targeted by the compound with the SMILES CCCC(=O)OC[C@H]1O[C@@H](n2cc(-c3ccc(S(N)(=O)=O)cc3)nn2)[C@H](OC(=O)CCC)[C@@H](OC(=O)CCC)[C@H]1OC(=O)CCC and NS(=O)(=O)c1cc(NC(=O)Nc2ccccc2)c(Cl)cc1Cl."} {"text":"The protein Carbonic anhydrase B is targeted by the compound with the SMILES CC(C)CC(=O)OC[C@H]1O[C@@H](n2cc(-c3cccc(S(N)(=O)=O)c3)nn2)[C@H](OC(=O)CC(C)C)[C@@H](OC(=O)CC(C)C)[C@@H]1OC(=O)CC(C)C and Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cl.Cl.[Zn+2]."}", "/scratch/micpie/export/compound_protein_compound_2/test_3-1.jsonl": "{"text":"The protein Tyrosyl-DNA phosphodiesterase 1 is targeted by the compound with the SELFIES [C][=Branch1][#Branch2][=C][\\C][=C][C][=C][C][=C][Ring1][=Branch1][\\C][N][C][C][N][Branch2][Ring1][#Branch1][C][C][O][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][Ring2][Ring1][=Branch1] and O=C(O)CC1OCC=C2CN3CCC45C6=CC(=O)C(O)=C([N+](=O)[O-])C6=NC4C1C2CC35."} {"text":"The protein Cathepsin K is targeted by the compound with the InChI InChI=1S\/C26H29FN4O2\/c27-20-10-12-21(13-11-20)30-14-16-31(17-15-30)26(33)23-9-5-4-8-22(23)25(32)29-24(18-28)19-6-2-1-3-7-19\/h1-3,6-7,10-13,22-24H,4-5,8-9,14-17H2,(H,29,32)\/t22-,23-,24?\/m1\/s1 and CC(C)CC(C(=O)NCC#N)c1cccc(-c2ccc(C3CNCCO3)cc2)c1."}", "/scratch/micpie/export/compound_protein_compound_2/train_9-0.jsonl": "{"text":"The compound DeepSMILES COcccc-ccncccnc6n9)))))))))cc6NC=O)NcccC)ccc6C targets the protein Tyr-DNA phosphodiesterase 1 and which is also targeted by the compound COc1cc2c(=O)n(CCCN3CCOCC3)c3c(c2cc1OCCCN(C)C)C(=O)c1cc2c(cc1-3)OCO2."} {"text":"The compound DeepSMILES O=CNC=S)\/C=C\/cccccO)c6O))))))))S5 targets the protein Tyrosyl-DNA phosphodiesterase 1 and which is also targeted by the compound 
C\/C(=N\\NC(=O)c1cc(-c2cccs2)nc2ccccc12)c1ccccn1."}", "/scratch/micpie/export/compound_protein_compound_2/valid_9-2.jsonl": "{"text":"User: Can you come up with one example for a compound canonical SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, the compound canonical SMILES CCOc1cc(C(c2c(C)n[nH]c2O)c2c(C)n[nH]c2O)ccc1OC(=O)c1cccc([N+](=O)[O-])c1 targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you generate another compound canonical SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Sure, the compound canonical SMILES CCOc1cc(C(c2c(C)n[nH]c2O)c2c(C)n[nH]c2O)ccc1OC(=O)c1cccc([N+](=O)[O-])c1 targets the compound canonical SMILES COc1cc2c(=O)n(CCCN3CCOCC3)c3c(c2cc1OCCCN(C)C)C(=O)c1cc2c(cc1-3)OCO2."} {"text":"User: Can you give me an example for a compound InChI that targets the protein Tyrosyl-DNA phosphodiesterase 1?\nAssistant: Yes, of course, the compound InChI InChI=1S\/C25H21NO5\/c1-29-18-6-8-20-21-9-7-19(13-23(21)31-25(28)22(20)12-18)30-15-24(27)26-11-10-16-4-2-3-5-17(16)14-26\/h2-9,12-13H,10-11,14-15H2,1H3 targets the protein Tyrosyl-DNA phosphodiesterase 1.\nUser: Can you generate another compound InChI that targets the protein Tyrosyl-DNA phosphodiesterase 1?\nAssistant: Yes, the compound InChI InChI=1S\/C25H21NO5\/c1-29-18-6-8-20-21-9-7-19(13-23(21)31-25(28)22(20)12-18)30-15-24(27)26-11-10-16-4-2-3-5-17(16)14-26\/h2-9,12-13H,10-11,14-15H2,1H3 targets the compound InChI C\/C(=N\\NC(=O)c1cc(-c2cccs2)nc2ccccc12)c1ccccn1."}", "/scratch/micpie/export/compound_protein_compound_2/test_1-0.jsonl": "{"text":"The compound canonical SMILES COC(=O)[C@@H](NS(=O)(=O)NCc1ccc(F)cc1)C(C)C targets the protein CA-XII and which is also targeted by the compound NS(=O)(=O)c1cc(NC(=O)Nc2ccccc2)c(Cl)cc1Cl."} {"text":"The compound InChI InChI=1S\/C9H14N2O3S\/c1-7(11-15(10,12)13)8-5-3-4-6-9(8)14-2\/h3-7,11H,1-2H3,(H2,10,12,13) targets the protein Carbonic anhydrase I and which is also targeted by the compound 
Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cl.Cl.[Zn+2]."}", "/scratch/micpie/export/compound_protein_compound_2/train_0-2.jsonl": "{"text":"User: Can you come up with an example for a compound DeepSMILES that targets the protein Carbonate dehydratase I?\nAssistant: Sure, the compound DeepSMILES NS=O)=O)Ncnnc-ccccBr)cc6))))))s5 targets the protein Carbonate dehydratase I.\nUser: Can you create another compound DeepSMILES that targets the protein Carbonate dehydratase I?\nAssistant: Sure, the compound DeepSMILES NS=O)=O)Ncnnc-ccccBr)cc6))))))s5 targets the compound DeepSMILES CSc1ccc(\/C=N\/CCc2ccc(S(N)(=O)=O)cc2)cc1."} {"text":"User: Can you come up with an example for a compound DeepSMILES that targets the protein Carbonic anhydrase XII?\nAssistant: Sure, the compound DeepSMILES CC=O)O[C@@H][C@@H]nccCNC=O)ccccSN)=O)=O))cc6)))))))))nn5)))))OC[C@@H]OCC)=O)))[C@H]6OCC)=O targets the protein Carbonic anhydrase XII.\nUser: Can you tell another compound DeepSMILES that targets the protein Carbonic anhydrase XII?\nAssistant: Sure, the compound DeepSMILES CC=O)O[C@@H][C@@H]nccCNC=O)ccccSN)=O)=O))cc6)))))))))nn5)))))OC[C@@H]OCC)=O)))[C@H]6OCC)=O targets the compound DeepSMILES NS(=O)(=O)c1cc(NC(=O)Nc2ccccc2)c(Cl)cc1Cl."}", "/scratch/micpie/export/compound_protein_compound_2/train_4-2.jsonl": "{"text":"User: Can you give me an example for a compound SELFIES that targets the protein Cathepsin X?\nAssistant: Sure, the compound SELFIES [C][C][Branch1][C][F][Branch1][C][F][C][C@H1][Branch2][Ring1][P][N][C][=Branch1][C][=O][N][C][C][C][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][=Branch1][C][=O][N][Branch1][#Branch1][C][C][C][C][Ring1][Ring1][C][Ring1][N][C][=Branch1][C][=O][N][C][Branch1][Ring1][C][#N][C][C][Ring1][Branch1] targets the protein Cathepsin X.\nUser: Can you create another compound SELFIES that targets the protein Cathepsin X?\nAssistant: Yes, of course, the compound SELFIES 
[C][C][Branch1][C][F][Branch1][C][F][C][C@H1][Branch2][Ring1][P][N][C][=Branch1][C][=O][N][C][C][C][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][=Branch1][C][=O][N][Branch1][#Branch1][C][C][C][C][Ring1][Ring1][C][Ring1][N][C][=Branch1][C][=O][N][C][Branch1][Ring1][C][#N][C][C][Ring1][Branch1] targets the compound SELFIES CC(C)CC(C(=O)NCC#N)c1cccc(-c2ccc(C3CNCCO3)cc2)c1."} {"text":"User: Can you give me one example for a compound DeepSMILES that targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II)?\nAssistant: Of course, the compound DeepSMILES CCOC=O)C=O)NCcccccc6)))))))cccccc6)OC[C@H]COccccC=N)N))cc6))))))))O6 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II).\nUser: Can you tell another compound DeepSMILES that targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II)?\nAssistant: Sure, the compound DeepSMILES CCOC=O)C=O)NCcccccc6)))))))cccccc6)OC[C@H]COccccC=N)N))cc6))))))))O6 targets the compound DeepSMILES CC(C)=CC(=O)C[C@@H]1C(=O)O[C@H]2CCCC21."}", "/scratch/micpie/export/compound_protein_compound_2/test_6-1.jsonl": "{"text":"The protein Adenosine receptor A3 is targeted by the compound with the DeepSMILES CCnc=O)[nH]cnc-ccccS=O)=O)NCCNCcccccCl)c6)))))))CC6)))))))cc6))))))[nH]c5c9=O and CNC(=O)[C@H]1O[C@@H](n2cnc3c(NCc4cc(OC)ccc4OC)ncnc32)[C@H](O)[C@@H]1O."} {"text":"The protein Mitogen-activated protein kinase 14 is targeted by the compound with the SELFIES [C][C][=C][C][Branch1][C][C][=C][C][Branch2][Ring2][#Branch2][O][C][=N][C][=C][C][Branch2][Ring1][=N][C][=C][Branch1][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][N][=C][N][Ring1][N][C][C][C][N][C][C][Ring1][=Branch1][=N][Ring2][Ring1][Branch2][=C][Ring2][Ring1][S] and CC(C)Cn1c(N)nc2ccc(-c3[nH]c(-c4c(F)cccc4F)nc3-c3ccccc3)nc21.CS(=O)(=O)O."}", "/scratch/micpie/export/compound_protein_compound_2/valid_4-1.jsonl": "{"text":"The protein Cathepsin O is targeted by the compound with the SELFIES 
[O][=C][Branch2][Ring2][=Branch2][N][C@@H1][Branch1][#Branch2][C][C][C][C][C][C][C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][N][C][=C][C][=C][Branch1][#Branch2][O][C][Branch1][C][F][Branch1][C][F][F][C][=C][Ring1][O][C][=C][C][=C][Branch2][Ring1][Ring1][C][=C][C][=C][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][=C][Ring1][#Branch2][O][Ring1][#C] and CC(C)CC(C(=O)NCC#N)c1cccc(-c2ccc(C3CNCCO3)cc2)c1."} {"text":"The protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) is targeted by the compound with the InChI InChI=1S\/C38H38N8O8\/c39-35(40)23-5-7-27-25(17-23)19-31(53-27)37(49)45-13-9-43(10-14-45)33(47)21-51-29-3-1-2-4-30(29)52-22-34(48)44-11-15-46(16-12-44)38(50)32-20-26-18-24(36(41)42)6-8-28(26)54-32\/h1-8,17-20H,9-16,21-22H2,(H3,39,40)(H3,41,42) and CC(C)=CC(=O)C[C@@H]1C(=O)O[C@H]2CCCC21."}", "/scratch/micpie/export/compound_protein_compound_2/test_2-2.jsonl": "{"text":"User: Can you give me an example for a compound SMILES that targets the protein Carbonic anhydrase 1?\nAssistant: Yes, of course, the compound SMILES Cc1ccc(S(=O)(=O)NC(=O)Nc2c(S(N)(=O)=O)cc(S(N)(=O)=O)c(Cl)c2Cl)cc1 targets the protein Carbonic anhydrase 1.\nUser: Can you create another compound SMILES that targets the protein Carbonic anhydrase 1?\nAssistant: Yes, of course, the compound SMILES Cc1ccc(S(=O)(=O)NC(=O)Nc2c(S(N)(=O)=O)cc(S(N)(=O)=O)c(Cl)c2Cl)cc1 targets the compound SMILES Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cl.Cl.[Zn+2]."} {"text":"User: Can you give me one example for a compound InChI that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Of course, the compound InChI InChI=1S\/C17H28BrNO2\/c1-3-5-11-19(12-6-4-2)13-16(20)14-21-17-9-7-15(18)8-10-17\/h7-10,16,20H,3-6,11-14H2,1-2H3 targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you tell another compound InChI that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Sure, the compound InChI 
InChI=1S\/C17H28BrNO2\/c1-3-5-11-19(12-6-4-2)13-16(20)14-21-17-9-7-15(18)8-10-17\/h7-10,16,20H,3-6,11-14H2,1-2H3 targets the compound InChI O=C(O)CC1OCC=C2CN3CCC45C6=CC(=O)C(O)=C([N+](=O)[O-])C6=NC4C1C2CC35."}", "/scratch/micpie/export/compound_protein_compound_2/train_1-1.jsonl": "{"text":"The protein Carbonic anhydrase XII is targeted by the compound with the canonical SMILES CC(=O)N[C@@H]1[C@@H](O)[C@H](O)[C@@H](CO)O[C@H]1n1cc(COC(=O)c2ccc(S(N)(=O)=O)cc2)nn1 and NS(=O)(=O)c1cc(NC(=O)Nc2ccccc2)c(Cl)cc1Cl."} {"text":"The protein Cyanamide hydratase CA1 is targeted by the compound with the DeepSMILES NS=O)=O)cnccccOS=O)=O)ccF)cF)cF)cF)c6F)))))))))cc6s9 and Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cl.Cl.[Zn+2]."}", "/scratch/micpie/export/compound_protein_compound_2/valid_7-0.jsonl": "{"text":"The compound SMILES Cc1ccc(C(=O)Nc2cc(F)cc(N3CCOCC3)c2)cc1Nc1ncnn2cc(C(=O)c3ccccc3)c(C)c12 targets the protein MAP kinase p38 alpha and which is also targeted by the compound CC(C)Cn1c(N)nc2ccc(-c3[nH]c(-c4c(F)cccc4F)nc3-c3ccccc3)nc21.CS(=O)(=O)O."} {"text":"The compound canonical SMILES CCC1(CC(O)(Cc2ccnc3ccccc23)C(F)(F)F)CCCc2ccccc21 targets the protein Nuclear receptor subfamily 3 group C member 1 and which is also targeted by the compound COC(=O)c1ccc(NC(=O)O[C@@H]2[C@@H](C)c3c(cc(F)c(-c4cccc5cc[nH]c45)c3F)NC2(C)C)cc1."}", "/scratch/micpie/export/compound_protein_compound_2/valid_8-1.jsonl": "{"text":"The protein Glucocorticoid receptor is targeted by the compound with the SELFIES [C][C@@H1][C][C@H1][C@@H1][C][C@H1][Branch1][C][F][C][=C][C][=Branch1][C][=O][C][=C][C@][Ring1][#Branch1][Branch1][C][C][C@@][Ring1][=N][Branch1][C][F][C@@H1][Branch1][C][O][C][C@][Ring2][Ring1][Ring1][Branch1][C][C][C@@][Ring2][Ring1][#Branch1][Branch1][C][O][C][=Branch1][C][=O][C][S][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][S][Ring1][=Branch2] and COC(=O)c1ccc(NC(=O)O[C@@H]2[C@@H](C)c3c(cc(F)c(-c4cccc5cc[nH]c45)c3F)NC2(C)C)cc1."} {"text":"The 
protein Tyr-DNA phosphodiesterase 1 is targeted by the compound with the DeepSMILES CccccNcnncSCC=O)NCcccco5))))))))))s5))))))cc6 and COc1cc2c(=O)n(CCCN3CCOCC3)c3c(c2cc1OCCCN(C)C)C(=O)c1cc2c(cc1-3)OCO2."}", "/scratch/micpie/export/compound_protein_compound_2/train_0-1.jsonl": "{"text":"The protein Carbonic anhydrase I is targeted by the compound with the InChI InChI=1S\/C8H7BrN4O2S2\/c9-6-3-1-5(2-4-6)7-11-12-8(16-7)13-17(10,14)15\/h1-4H,(H,12,13)(H2,10,14,15) and CSc1ccc(\/C=N\/CCc2ccc(S(N)(=O)=O)cc2)cc1."} {"text":"The protein Carbonic anhydrase 12 is targeted by the compound with the InChI InChI=1S\/C21H25N5O10S\/c1-11(27)34-17-10-33-21(19(36-13(3)29)18(17)35-12(2)28)26-9-15(24-25-26)8-23-20(30)14-4-6-16(7-5-14)37(22,31)32\/h4-7,9,17-19,21H,8,10H2,1-3H3,(H,23,30)(H2,22,31,32)\/t17-,18-,19+,21+\/m1\/s1 and NS(=O)(=O)c1cc(NC(=O)Nc2ccccc2)c(Cl)cc1Cl."}", "/scratch/micpie/export/compound_protein_compound_2/valid_8-0.jsonl": "{"text":"The compound DeepSMILES C[C@@H]C[C@H][C@@H]C[C@H]F)C=CC=O)C=C[C@]6C)[C@@]%10F)[C@@H]O)C[C@]%14C)[C@@]%17O)C=O)CScncccccc6s9 targets the protein Nuclear receptor subfamily 3 group C member 1 and which is also targeted by the compound COC(=O)c1ccc(NC(=O)O[C@@H]2[C@@H](C)c3c(cc(F)c(-c4cccc5cc[nH]c45)c3F)NC2(C)C)cc1."} {"text":"The compound SMILES Cc1ccc(Nc2nnc(SCC(=O)NCc3ccco3)s2)cc1 targets the protein Tyrosyl-DNA phosphodiesterase 1 and which is also targeted by the compound COc1cc2c(=O)n(CCCN3CCOCC3)c3c(c2cc1OCCCN(C)C)C(=O)c1cc2c(cc1-3)OCO2."}", "/scratch/micpie/export/compound_protein_compound_2/test_9-1.jsonl": "{"text":"The protein Tyrosyl-DNA phosphodiesterase 1 is targeted by the compound with the InChI InChI=1S\/C16H12N4O2S2\/c1-22-10-6-4-8-12-13(10)18-16(24-12)20-19-14(21)15-17-9-5-2-3-7-11(9)23-15\/h2-8H,1H3,(H,18,20)(H,19,21) and COc1cc2c(=O)n(CCCN3CCOCC3)c3c(c2cc1OCCCN(C)C)C(=O)c1cc2c(cc1-3)OCO2."} {"text":"The protein Tyr-DNA phosphodiesterase 1 is targeted by the compound with the InChI 
InChI=1S\/C19H22N2O7S\/c1-25-13-8-11(9-14(26-2)17(13)27-3)4-5-16(23)28-10-15(22)21-19-12(18(20)24)6-7-29-19\/h6-9H,4-5,10H2,1-3H3,(H2,20,24)(H,21,22) and C\/C(=N\\NC(=O)c1cc(-c2cccs2)nc2ccccc12)c1ccccn1."}", "/scratch/micpie/export/compound_protein_compound_2/valid_1-0.jsonl": "{"text":"The compound SELFIES [C][C][C][C][=Branch1][C][=O][O][C][C@H1][O][C@@H1][Branch2][Ring1][=N][N][C][=C][Branch2][Ring1][Ring1][C][=C][C][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][C][=C][Ring1][#Branch2][N][=N][Ring1][#C][C@H1][Branch1][=Branch2][O][C][=Branch1][C][=O][C][C][C][C@@H1][Branch1][=Branch2][O][C][=Branch1][C][=O][C][C][C][C@H1][Ring2][Ring1][P][O][C][=Branch1][C][=O][C][C][C] targets the protein Carbonic anhydrase XII and which is also targeted by the compound NS(=O)(=O)c1cc(NC(=O)Nc2ccccc2)c(Cl)cc1Cl."} {"text":"The compound DeepSMILES CCC)CC=O)OC[C@H]O[C@@H]ncc-cccccSN)=O)=O))c6))))))nn5)))))[C@H]OC=O)CCC)C)))))[C@@H]OC=O)CCC)C)))))[C@@H]6OC=O)CCC)C targets the protein CA-I and which is also targeted by the compound Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cl.Cl.[Zn+2]."}", "/scratch/micpie/export/compound_protein_compound_2/train_6-2.jsonl": "{"text":"User: Can you come up with an example for a compound DeepSMILES that targets the protein Adenosine receptor A3?\nAssistant: Yes, the compound DeepSMILES O=ccO)c-ccccO)cc6O)))))))occcO)ccO)c%106 targets the protein Adenosine receptor A3.\nUser: Can you generate another compound DeepSMILES that targets the protein Adenosine receptor A3?\nAssistant: Sure, the compound DeepSMILES O=ccO)c-ccccO)cc6O)))))))occcO)ccO)c%106 targets the compound DeepSMILES CNC(=O)[C@H]1O[C@@H](n2cnc3c(NCc4cc(OC)ccc4OC)ncnc32)[C@H](O)[C@@H]1O."} {"text":"User: Can you come up with one example for a compound SMILES that targets the protein MAX-interacting protein 2?\nAssistant: Yes, of course, the compound SMILES Cc1cc(OCc2ccc(F)cc2F)c(Br)c(=O)n1-c1cccnc1 targets the protein 
MAX-interacting protein 2.\nUser: Can you tell another compound SMILES that targets the protein MAX-interacting protein 2?\nAssistant: Yes, the compound SMILES Cc1cc(OCc2ccc(F)cc2F)c(Br)c(=O)n1-c1cccnc1 targets the compound SMILES CC(C)Cn1c(N)nc2ccc(-c3[nH]c(-c4c(F)cccc4F)nc3-c3ccccc3)nc21.CS(=O)(=O)O."}", "/scratch/micpie/export/compound_protein_compound_2/valid_6-2.jsonl": "{"text":"User: Can you give me one example for a compound SMILES that targets the protein Adenosine receptor A3?\nAssistant: Of course, the compound SMILES CC(C)NC(=O)c1cccc(Cn2nnc3c(-c4ccco4)nc(N)nc32)c1 targets the protein Adenosine receptor A3.\nUser: Can you generate another compound SMILES that targets the protein Adenosine receptor A3?\nAssistant: Yes, the compound SMILES CC(C)NC(=O)c1cccc(Cn2nnc3c(-c4ccco4)nc(N)nc32)c1 targets the compound SMILES CNC(=O)[C@H]1O[C@@H](n2cnc3c(NCc4cc(OC)ccc4OC)ncnc32)[C@H](O)[C@@H]1O."} {"text":"User: Can you give me one example for a compound SMILES that targets the protein CSAID-binding protein?\nAssistant: Yes, the compound SMILES O=C(Nc1cc(-c2c(-c3ccc(F)cc3)nc3n2CCS3)ccn1)C1CC1c1ccccc1 targets the protein CSAID-binding protein.\nUser: Can you create another compound SMILES that targets the protein CSAID-binding protein?\nAssistant: Yes, the compound SMILES O=C(Nc1cc(-c2c(-c3ccc(F)cc3)nc3n2CCS3)ccn1)C1CC1c1ccccc1 targets the compound SMILES CC(C)Cn1c(N)nc2ccc(-c3[nH]c(-c4c(F)cccc4F)nc3-c3ccccc3)nc21.CS(=O)(=O)O."}", "/scratch/micpie/export/compound_protein_compound_2/train_6-0.jsonl": "{"text":"The compound InChI InChI=1S\/C15H10O7\/c16-6-1-2-8(9(18)3-6)15-14(21)13(20)12-10(19)4-7(17)5-11(12)22-15\/h1-5,16-19,21H targets the protein Adenosine receptor A3 and which is also targeted by the compound CNC(=O)[C@H]1O[C@@H](n2cnc3c(NCc4cc(OC)ccc4OC)ncnc32)[C@H](O)[C@@H]1O."} {"text":"The compound InChI InChI=1S\/C18H13BrF2N2O2\/c1-11-7-16(25-10-12-4-5-13(20)8-15(12)21)17(19)18(24)23(11)14-3-2-6-22-9-14\/h2-9H,10H2,1H3 targets the protein MAPK 14 and which is 
also targeted by the compound CC(C)Cn1c(N)nc2ccc(-c3[nH]c(-c4c(F)cccc4F)nc3-c3ccccc3)nc21.CS(=O)(=O)O."}", "/scratch/micpie/export/compound_protein_compound_2/train_8-2.jsonl": "{"text":"User: Can you give me an example for a compound InChI that targets the protein GR?\nAssistant: Yes, the compound InChI InChI=1S\/C24H25F3N2O\/c1-22(12-5-8-17-6-2-3-9-20(17)22)15-23(30,24(25,26)27)16-29-21-10-4-7-18-14-28-13-11-19(18)21\/h2-4,6-7,9-11,13-14,29-30H,5,8,12,15-16H2,1H3 targets the protein GR.\nUser: Can you create another compound InChI that targets the protein GR?\nAssistant: Yes, the compound InChI InChI=1S\/C24H25F3N2O\/c1-22(12-5-8-17-6-2-3-9-20(17)22)15-23(30,24(25,26)27)16-29-21-10-4-7-18-14-28-13-11-19(18)21\/h2-4,6-7,9-11,13-14,29-30H,5,8,12,15-16H2,1H3 targets the compound InChI COC(=O)c1ccc(NC(=O)O[C@@H]2[C@@H](C)c3c(cc(F)c(-c4cccc5cc[nH]c45)c3F)NC2(C)C)cc1."} {"text":"User: Can you come up with an example for a compound DeepSMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Sure, the compound DeepSMILES C\/C=N\\Ncnc-cccccc6))))))cs5)))))))ccccs5 targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you create another compound DeepSMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, the compound DeepSMILES C\/C=N\\Ncnc-cccccc6))))))cs5)))))))ccccs5 targets the compound DeepSMILES COc1cc2c(=O)n(CCCN3CCOCC3)c3c(c2cc1OCCCN(C)C)C(=O)c1cc2c(cc1-3)OCO2."}", "/scratch/micpie/export/compound_protein_compound_2/train_3-1.jsonl": "{"text":"The protein Tyr-DNA phosphodiesterase 1 is targeted by the compound with the DeepSMILES CCOC=O)Cccc=O)n-cncccccc6s9)))))))))[nH]5 and O=C(O)CC1OCC=C2CN3CCC45C6=CC(=O)C(O)=C([N+](=O)[O-])C6=NC4C1C2CC35."} {"text":"The protein Cathepsin X is targeted by the compound with the canonical SMILES COc1cccc(C(=O)N[C@@H](CCC2CCCCC2)C(=O)N[C@H](CN2CCc3cc(F)ccc32)[C@@H](C)OCc2ccccc2)c1 and CC(C)CC(C(=O)NCC#N)c1cccc(-c2ccc(C3CNCCO3)cc2)c1."}", 
"/scratch/micpie/export/compound_protein_compound_2/test_8-0.jsonl": "{"text":"The compound DeepSMILES C[C@]C[C@H]O)[C@@]F)[C@@H]C[C@H]F)C=CC=O)C=C[C@@]6%10C)))))))))[C@@H]6C[C@H]O[C@@H]ccccCScccccc6))))))))cc6))))))O[C@]5%12C=O)CO targets the protein Glucocorticoid receptor and which is also targeted by the compound COC(=O)c1ccc(NC(=O)O[C@@H]2[C@@H](C)c3c(cc(F)c(-c4cccc5cc[nH]c45)c3F)NC2(C)C)cc1."} {"text":"The compound DeepSMILES O=CCCCcoccccNS=O)=O)ccccccccnc%106))))))))))))cc6c9%13 targets the protein Tyrosyl-DNA phosphodiesterase 1 and which is also targeted by the compound COc1cc2c(=O)n(CCCN3CCOCC3)c3c(c2cc1OCCCN(C)C)C(=O)c1cc2c(cc1-3)OCO2."}", "/scratch/micpie/export/compound_protein_compound_2/valid_3-1.jsonl": "{"text":"The protein Tyr-DNA phosphodiesterase 1 is targeted by the compound with the SMILES COc1cccc2c1[C@@H]1CN(CCCCn3c(=O)[nH]c4c(sc5ncc(-c6ccccc6)nc54)c3=O)C[C@@H]1CO2.Cl and O=C(O)CC1OCC=C2CN3CCC45C6=CC(=O)C(O)=C([N+](=O)[O-])C6=NC4C1C2CC35."} {"text":"The protein Cathepsin O2 is targeted by the compound with the SMILES COc1ccc(N2CCN(C(=O)[C@@H]3CCCC[C@H]3C(=O)NC3(C#N)CC3)[C@H](C)C2)cc1OC and CC(C)CC(C(=O)NCC#N)c1cccc(-c2ccc(C3CNCCO3)cc2)c1."}", "/scratch/micpie/export/compound_protein_compound_2/train_9-1.jsonl": "{"text":"The protein Tyr-DNA phosphodiesterase 1 is targeted by the compound with the SELFIES [C][O][C][=C][C][=C][Branch1][=C][C][=C][N][C][=C][C][=N][C][Ring1][=Branch1][=N][Ring1][=Branch2][C][=C][Ring1][#C][N][C][=Branch1][C][=O][N][C][=C][C][Branch1][C][C][=C][C][=C][Ring1][#Branch1][C] and COc1cc2c(=O)n(CCCN3CCOCC3)c3c(c2cc1OCCCN(C)C)C(=O)c1cc2c(cc1-3)OCO2."} {"text":"The protein Tyr-DNA phosphodiesterase 1 is targeted by the compound with the canonical SMILES O=C1NC(=S)\/C(=C\/c2cccc(O)c2O)S1 and C\/C(=N\\NC(=O)c1cc(-c2cccs2)nc2ccccc12)c1ccccn1."}", "/scratch/micpie/export/compound_protein_compound_2/test_4-1.jsonl": "{"text":"The protein Cathepsin O2 is targeted by the compound with the SMILES 
CCCC[C@H](NC(=O)OC(CCC)CCC)C(=O)C(=O)N[C@H](C)c1ccccc1 and CC(C)CC(C(=O)NCC#N)c1cccc(-c2ccc(C3CNCCO3)cc2)c1."} {"text":"The protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) is targeted by the compound with the InChI InChI=1S\/C38H38N8O6S2\/c39-35(40)23-1-7-29-25(17-23)19-31(53-29)37(49)45-13-9-43(10-14-45)33(47)21-51-27-3-5-28(6-4-27)52-22-34(48)44-11-15-46(16-12-44)38(50)32-20-26-18-24(36(41)42)2-8-30(26)54-32\/h1-8,17-20H,9-16,21-22H2,(H3,39,40)(H3,41,42) and CC(C)=CC(=O)C[C@@H]1C(=O)O[C@H]2CCCC21."}", "/scratch/micpie/export/compound_protein_compound_2/valid_6-1.jsonl": "{"text":"The protein Adenosine receptor A3 is targeted by the compound with the InChI InChI=1S\/C19H19N7O2\/c1-11(2)21-18(27)13-6-3-5-12(9-13)10-26-17-16(24-25-26)15(22-19(20)23-17)14-7-4-8-28-14\/h3-9,11H,10H2,1-2H3,(H,21,27)(H2,20,22,23) and CNC(=O)[C@H]1O[C@@H](n2cnc3c(NCc4cc(OC)ccc4OC)ncnc32)[C@H](O)[C@@H]1O."} {"text":"The protein Cytokine suppressive anti-inflammatory drug-binding protein is targeted by the compound with the SELFIES [O][=C][Branch2][Ring2][#Branch1][N][C][=C][C][Branch2][Ring1][#Branch2][C][=C][Branch1][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][N][=C][N][Ring1][N][C][C][S][Ring1][Branch1][=C][C][=N][Ring2][Ring1][Branch1][C][C][C][Ring1][Ring1][C][=C][C][=C][C][=C][Ring1][=Branch1] and CC(C)Cn1c(N)nc2ccc(-c3[nH]c(-c4c(F)cccc4F)nc3-c3ccccc3)nc21.CS(=O)(=O)O."}", "/scratch/micpie/export/compound_protein_compound_2/train_1-2.jsonl": "{"text":"User: Can you give me one example for a compound DeepSMILES that targets the protein Carbonate dehydratase XII?\nAssistant: Yes, of course, the compound DeepSMILES CC=O)N[C@@H][C@@H]O)[C@H]O)[C@@H]CO))O[C@H]6nccCOC=O)ccccSN)=O)=O))cc6)))))))))nn5 targets the protein Carbonate dehydratase XII.\nUser: Can you tell another compound DeepSMILES that targets the protein Carbonate dehydratase XII?\nAssistant: Yes, of course, the compound DeepSMILES 
CC=O)N[C@@H][C@@H]O)[C@H]O)[C@@H]CO))O[C@H]6nccCOC=O)ccccSN)=O)=O))cc6)))))))))nn5 targets the compound DeepSMILES NS(=O)(=O)c1cc(NC(=O)Nc2ccccc2)c(Cl)cc1Cl."} {"text":"User: Can you come up with an example for a compound SELFIES that targets the protein CAB?\nAssistant: Of course, the compound SELFIES [N][S][=Branch1][C][=O][=Branch1][C][=O][C][=N][C][=C][C][=C][Branch2][Ring1][=C][O][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][Branch1][C][F][C][Branch1][C][F][=C][Branch1][C][F][C][Branch1][C][F][=C][Ring1][#Branch2][F][C][=C][Ring2][Ring1][Branch1][S][Ring2][Ring1][Branch2] targets the protein CAB.\nUser: Can you tell another compound SELFIES that targets the protein CAB?\nAssistant: Yes, of course, the compound SELFIES [N][S][=Branch1][C][=O][=Branch1][C][=O][C][=N][C][=C][C][=C][Branch2][Ring1][=C][O][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][Branch1][C][F][C][Branch1][C][F][=C][Branch1][C][F][C][Branch1][C][F][=C][Ring1][#Branch2][F][C][=C][Ring2][Ring1][Branch1][S][Ring2][Ring1][Branch2] targets the compound SELFIES Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cc1ccc2occ(CNc3ccc(S(N)(=O)=O)cc3)c(=O)c2c1.Cl.Cl.[Zn+2]."}", "/scratch/micpie/export/compound_protein_compound_2/valid_6-0.jsonl": "{"text":"The compound canonical SMILES CC(C)NC(=O)c1cccc(Cn2nnc3c(-c4ccco4)nc(N)nc32)c1 targets the protein Adenosine receptor A3 and which is also targeted by the compound CNC(=O)[C@H]1O[C@@H](n2cnc3c(NCc4cc(OC)ccc4OC)ncnc32)[C@H](O)[C@@H]1O."} {"text":"The compound SMILES O=C(Nc1cc(-c2c(-c3ccc(F)cc3)nc3n2CCS3)ccn1)C1CC1c1ccccc1 targets the protein Stress-activated protein kinase 2a and which is also targeted by the compound CC(C)Cn1c(N)nc2ccc(-c3[nH]c(-c4c(F)cccc4F)nc3-c3ccccc3)nc21.CS(=O)(=O)O."}", "/scratch/micpie/export/compound_protein_compound_2/train_3-0.jsonl": "{"text":"The compound InChI InChI=1S\/C14H13N3O3S\/c1-2-20-13(19)8-9-7-12(18)17(16-9)14-15-10-5-3-4-6-11(10)21-14\/h3-7,16H,2,8H2,1H3 targets the protein Tyr-DNA phosphodiesterase 1 and which is also 
targeted by the compound O=C(O)CC1OCC=C2CN3CCC45C6=CC(=O)C(O)=C([N+](=O)[O-])C6=NC4C1C2CC35."} {"text":"The compound DeepSMILES COcccccC=O)N[C@@H]CCCCCCCC6))))))))C=O)N[C@H]CNCCcccF)ccc69))))))))))[C@@H]C)OCcccccc6)))))))))))))))c6 targets the protein Cathepsin O and which is also targeted by the compound CC(C)CC(C(=O)NCC#N)c1cccc(-c2ccc(C3CNCCO3)cc2)c1."}", "/scratch/micpie/export/compound_protein_compound_2/test_9-2.jsonl": "{"text":"User: Can you give me one example for a compound InChI that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Sure, the compound InChI InChI=1S\/C16H12N4O2S2\/c1-22-10-6-4-8-12-13(10)18-16(24-12)20-19-14(21)15-17-9-5-2-3-7-11(9)23-15\/h2-8H,1H3,(H,18,20)(H,19,21) targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you create another compound InChI that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, the compound InChI InChI=1S\/C16H12N4O2S2\/c1-22-10-6-4-8-12-13(10)18-16(24-12)20-19-14(21)15-17-9-5-2-3-7-11(9)23-15\/h2-8H,1H3,(H,18,20)(H,19,21) targets the compound InChI COc1cc2c(=O)n(CCCN3CCOCC3)c3c(c2cc1OCCCN(C)C)C(=O)c1cc2c(cc1-3)OCO2."} {"text":"User: Can you come up with an example for a compound DeepSMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Of course, the compound DeepSMILES COcccCCC=O)OCC=O)Ncsccc5CN)=O))))))))))))))ccOC))c6OC targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you generate another compound DeepSMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Of course, the compound DeepSMILES COcccCCC=O)OCC=O)Ncsccc5CN)=O))))))))))))))ccOC))c6OC targets the compound DeepSMILES C\/C(=N\\NC(=O)c1cc(-c2cccs2)nc2ccccc12)c1ccccn1."}", "/scratch/micpie/export/compound_protein_compound_2/train_7-0.jsonl": "{"text":"The compound SELFIES 
[C][C][C][=C][N][C][Branch1][C][N][=N][N][=C][Ring1][=Branch1][C][=Ring1][#Branch2][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][Ring1][#Branch1][C][C][C][C][Ring1][Branch1] targets the protein MAPK 14 and which is also targeted by the compound CC(C)Cn1c(N)nc2ccc(-c3[nH]c(-c4c(F)cccc4F)nc3-c3ccccc3)nc21.CS(=O)(=O)O."} {"text":"The compound SELFIES [C][C@@H1][C][C@H1][C@@H1][C][C][C][=C][C][=Branch1][C][=O][C][=C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][N][C@@H1][Branch1][C][O][C][C@][Ring1][P][Branch1][C][C][C@@][Ring2][Ring1][Branch1][Branch1][C][O][C][=Branch1][C][=O][C][O] targets the protein Glucocorticoid receptor and which is also targeted by the compound COC(=O)c1ccc(NC(=O)O[C@@H]2[C@@H](C)c3c(cc(F)c(-c4cccc5cc[nH]c45)c3F)NC2(C)C)cc1."}", "/scratch/micpie/export/compound_protein_compound_2/train_4-0.jsonl": "{"text":"The compound SMILES CC(F)(F)C[C@H](NC(=O)N1CCC2(CC1)CC(=O)N(CC1CC1)C2)C(=O)NC1(C#N)CC1 targets the protein Cathepsin O and which is also targeted by the compound CC(C)CC(C(=O)NCC#N)c1cccc(-c2ccc(C3CNCCO3)cc2)c1."} {"text":"The compound canonical SMILES CCOC(=O)C(=O)N(Cc1ccccc1)c1ccc2c(c1)OC[C@H](COc1ccc(C(=N)N)cc1)O2 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) and which is also targeted by the compound CC(C)=CC(=O)C[C@@H]1C(=O)O[C@H]2CCCC21."}", "/scratch/micpie/export/compound_protein_compound_2/train_3-2.jsonl": "{"text":"User: Can you come up with one example for a compound SMILES that targets the protein Tyrosyl-DNA phosphodiesterase 1?\nAssistant: Sure, the compound SMILES CCOC(=O)Cc1cc(=O)n(-c2nc3ccccc3s2)[nH]1 targets the protein Tyrosyl-DNA phosphodiesterase 1.\nUser: Can you tell another compound SMILES that targets the protein Tyrosyl-DNA phosphodiesterase 1?\nAssistant: Yes, the compound SMILES CCOC(=O)Cc1cc(=O)n(-c2nc3ccccc3s2)[nH]1 targets the compound SMILES O=C(O)CC1OCC=C2CN3CCC45C6=CC(=O)C(O)=C([N+](=O)[O-])C6=NC4C1C2CC35."} {"text":"User: Can you come up with 
an example for a compound canonical SMILES that targets the protein Cathepsin X?\nAssistant: Sure, the compound canonical SMILES COc1cccc(C(=O)N[C@@H](CCC2CCCCC2)C(=O)N[C@H](CN2CCc3cc(F)ccc32)[C@@H](C)OCc2ccccc2)c1 targets the protein Cathepsin X.\nUser: Can you generate another compound canonical SMILES that targets the protein Cathepsin X?\nAssistant: Yes, of course, the compound canonical SMILES COc1cccc(C(=O)N[C@@H](CCC2CCCCC2)C(=O)N[C@H](CN2CCc3cc(F)ccc32)[C@@H](C)OCc2ccccc2)c1 targets the compound canonical SMILES CC(C)CC(C(=O)NCC#N)c1cccc(-c2ccc(C3CNCCO3)cc2)c1."}", "/scratch/micpie/export/compound_protein_compound_2/test_7-2.jsonl": "{"text":"User: Can you come up with one example for a compound DeepSMILES that targets the protein MAP kinase 14?\nAssistant: Sure, the compound DeepSMILES CC=O)Ncccccc6Ocnccc-cc-ccccF)cc6))))))ncn5CCCNCC6)))))))))))n6 targets the protein MAP kinase 14.\nUser: Can you create another compound DeepSMILES that targets the protein MAP kinase 14?\nAssistant: Of course, the compound DeepSMILES CC=O)Ncccccc6Ocnccc-cc-ccccF)cc6))))))ncn5CCCNCC6)))))))))))n6 targets the compound DeepSMILES CC(C)Cn1c(N)nc2ccc(-c3[nH]c(-c4c(F)cccc4F)nc3-c3ccccc3)nc21.CS(=O)(=O)O."} {"text":"User: Can you come up with an example for a compound DeepSMILES that targets the protein GR?\nAssistant: Sure, the compound DeepSMILES CC[C@@]C[C@H]O)[C@H]cccccc6))))))C[C@H]6CCcccO)ccc6%14 targets the protein GR.\nUser: Can you create another compound DeepSMILES that targets the protein GR?\nAssistant: Yes, of course, the compound DeepSMILES CC[C@@]C[C@H]O)[C@H]cccccc6))))))C[C@H]6CCcccO)ccc6%14 targets the compound DeepSMILES COC(=O)c1ccc(NC(=O)O[C@@H]2[C@@H](C)c3c(cc(F)c(-c4cccc5cc[nH]c45)c3F)NC2(C)C)cc1."}", "/scratch/micpie/export/nr_er_tox21/test_0-10.jsonl": "{"text":"User: Can you give me the canonical SMILES of a molecule that is not toxic in the NR-Estrogen receptor alpha assay?\nAssistant: Yes, I'm happy to help, here you go: 
CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"User: Can you create the DeepSMILES of a molecule that is not toxic in the estrogen receptor alpha assay?\nAssistant: Yes, here you go: CNC)cccccc6"}", "/scratch/micpie/export/nr_er_tox21/valid_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the SMILES CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21 is toxic in the estrogen receptor alpha assay?\nAssistant: No, this molecule is not toxic in the estrogen receptor alpha assay."} {"text":"User: Can you tell me if the molecule with the InChI InChI=1S\/C7H10N.CHF3O3S\/c1-2-8-6-4-3-5-7-8;2-1(3,4)8(5,6)7\/h3-7H,2H2,1H3;(H,5,6,7)\/q+1;\/p-1 is toxic in the estrogen receptor alpha assay?\nAssistant: No, this molecule is not toxic in the estrogen receptor alpha assay."}", "/scratch/micpie/export/nr_er_tox21/test_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the estrogen receptor alpha assay?\nConstraint: You must select none, one or more options from A, B, or C without using any additional words.\nOptions:\nA.) CCCBr\nB.) CC1(C)S[C@@H]2[C@H](NC(=O)[C@H](C(=O)[O-])c3ccsc3)C(=O)N2[C@H]1C(=O)[O-]\nC.) 
CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C\nAnswer: A, B, C"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-Estrogen receptor alpha assay?\nConstraint: You must select none, one or more options from A or B without using any additional words.\nOptions:\nA [C][C][N][Branch1][Ring1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][=Branch1][C][=Branch1][C][=O][O][C][=C][Ring1][=Branch2]\nB [C][N][Branch1][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1]\nAnswer: A, B"}", "/scratch/micpie/export/nr_er_tox21/train_0-8.jsonl": "{"text":"User: Can you estimate if the molecule with the canonical SMILES CCN1C(=O)NC(c2ccccc2)C1=O is toxic in the estrogen receptor alpha assay?\nAssistant: No, this molecule is not toxic in the estrogen receptor alpha assay."} {"text":"User: Can you estimate if the molecule with the canonical SMILES OCCN1CCN(CCCN2c3ccccc3C=Cc3ccccc32)CC1 is toxic in the NR-Estrogen receptor alpha assay?\nAssistant: No, this molecule is not toxic in the NR-Estrogen receptor alpha assay."}", "/scratch/micpie/export/nr_er_tox21/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-Estrogen receptor alpha assay.\nMolecule DeepSMILES: CCCNCC))CCC))C=O)NccC)cccc6C\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the estrogen receptor alpha assay.\nInChI: InChI=1S\/C8H11N\/c1-9(2)8-6-4-3-5-7-8\/h3-7H,1-2H3\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"}", "/scratch/micpie/export/nr_er_tox21/valid_0-9.jsonl": "{"text":"User: Is the molecule with the canonical SMILES CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21 toxic in the estrogen receptor alpha assay?\nAssistant: No, 
it is not toxic in the estrogen receptor alpha assay."} {"text":"User: Is the molecule with the SMILES CC[n+]1ccccc1.O=S(=O)([O-])C(F)(F)F toxic in the estrogen receptor alpha assay?\nAssistant: No, it is not toxic in the estrogen receptor alpha assay."}", "/scratch/micpie/export/nr_er_tox21/test_0-1.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C] is not showing toxicity in the NR-ER assay."} {"text":"The molecule with the InChI representation of InChI=1S\/C8H11N\/c1-9(2)8-6-4-3-5-7-8\/h3-7H,1-2H3 is not exhibiting toxicity in the NR-Estrogen receptor alpha assay."}", "/scratch/micpie/export/nr_er_tox21/valid_0-0.jsonl": "{"text":"The molecule with the SELFIES representation of [C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Ring1][S] is not toxic in the estrogen receptor alpha assay."} {"text":"The molecule with the InChI representation of InChI=1S\/C7H10N.CHF3O3S\/c1-2-8-6-4-3-5-7-8;2-1(3,4)8(5,6)7\/h3-7H,2H2,1H3;(H,5,6,7)\/q+1;\/p-1 is not toxic in the NR-Estrogen receptor alpha assay."}", "/scratch/micpie/export/nr_er_tox21/test_0-2.jsonl": "{"text":"Based on the SMILES representation CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C, the molecule has no NR-Estrogen receptor alpha toxicity features."} {"text":"Based on the SMILES representation CN(C)c1ccccc1, the molecule has no estrogen receptor alpha toxicity features."}", "/scratch/micpie/export/nr_er_tox21/valid_0-10.jsonl": "{"text":"User: Can you give me the InChI of a molecule that is not toxic in the estrogen receptor alpha assay?\nAssistant: Yes, I'm happy to help, here you go: InChI=1S\/C19H23ClN2\/c1-21(2)12-5-13-22-18-7-4-3-6-15(18)8-9-16-10-11-17(20)14-19(16)22\/h3-4,6-7,10-11,14H,5,8-9,12-13H2,1-2H3"} {"text":"User: Can you give me the InChI of a molecule that is not toxic in 
the NR-Estrogen receptor alpha assay?\nAssistant: Yes, I'm happy to help, here you go: InChI=1S\/C7H10N.CHF3O3S\/c1-2-8-6-4-3-5-7-8;2-1(3,4)8(5,6)7\/h3-7H,2H2,1H3;(H,5,6,7)\/q+1;\/p-1"}", "/scratch/micpie/export/nr_er_tox21/train_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-Estrogen receptor alpha assay.\nSELFIES: [C][C][N][C][=Branch1][C][=O][N][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][N][=O]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the NR-Estrogen receptor alpha assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-Estrogen receptor alpha assay.\nSMILES: OCCN1CCN(CCCN2c3ccccc3C=Cc3ccccc32)CC1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the NR-Estrogen receptor alpha assay."}", "/scratch/micpie/export/nr_er_tox21/valid_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the estrogen receptor alpha assay.\nMolecule DeepSMILES: CNC)CCCNcccccc6CCccccCl)cc6%15\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the estrogen receptor alpha assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the estrogen receptor alpha assay.\nSELFIES: [C][C][N+1][=C][C][=C][C][=C][Ring1][=Branch1].[O][=S][=Branch1][C][=O][Branch1][C][O-1][C][Branch1][C][F][Branch1][C][F][F]\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the estrogen receptor alpha assay."}", "/scratch/micpie/export/nr_er_tox21/test_0-9.jsonl": "{"text":"User: Is the molecule with the SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C toxic in the NR-ER assay?\nAssistant: No, it is not toxic in the NR-ER assay."} {"text":"User: 
Is the molecule with the SELFIES [C][N][Branch1][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1] toxic in the NR-ER assay?\nAssistant: No, it is not toxic in the NR-ER assay."}", "/scratch/micpie/export/nr_er_tox21/test_0-0.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20) is not toxic in the estrogen receptor alpha assay."} {"text":"The molecule with the canonical SMILES CN(C)c1ccccc1 is not toxic in the NR-Estrogen receptor alpha assay."}", "/scratch/micpie/export/nr_er_tox21/valid_0-7.jsonl": "{"text":"Task: Please generate a canonical SMILES based on the description below.\nDescription: A molecule that is toxic in the NR-Estrogen receptor alpha assay.\nResult: CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21"} {"text":"Task: Please generate a canonical SMILES based on the text description.\nDescription: A molecule that is toxic in the estrogen receptor alpha assay.\nResult: CC[n+]1ccccc1.O=S(=O)([O-])C(F)(F)F"}", "/scratch/micpie/export/nr_er_tox21/test_0-3.jsonl": "{"text":"The SELFIES [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C] represents a molecule that is not identified as toxic in the NR-Estrogen receptor alpha assay."} {"text":"The canonical SMILES CN(C)c1ccccc1 is from a molecule that is not identified as toxic in the estrogen receptor alpha assay."}", "/scratch/micpie/export/nr_er_tox21/valid_0-11.jsonl": "{"text":"User: I'm looking for the DeepSMILES of a molecule that is not toxic in the NR-ER assay?\nAssistant: This is a molecule that is not toxic in the NR-ER assay: CNC)CCCNcccccc6CCccccCl)cc6%15"} {"text":"User: I'm looking for the DeepSMILES of a molecule that is not toxic in the estrogen receptor alpha assay?\nAssistant: This is a molecule that is not toxic in the estrogen receptor alpha assay: CC[n+]ccccc6.O=S=O)[O-])CF)F)F"}", 
"/scratch/micpie/export/nr_er_tox21/train_0-0.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][N][C][=Branch1][C][=O][N][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][N][=O] is not toxic in the NR-ER assay."} {"text":"The molecule with the DeepSMILES representation of OCCNCCNCCCNcccccc6C=Ccccccc6%15))))))))))))))))))CC6 is not toxic in the estrogen receptor alpha assay."}", "/scratch/micpie/export/nr_er_tox21/test_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the estrogen receptor alpha assay.\nMolecule SMILES: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the estrogen receptor alpha assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-Estrogen receptor alpha assay.\nInChI: InChI=1S\/C8H11N\/c1-9(2)8-6-4-3-5-7-8\/h3-7H,1-2H3\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the NR-Estrogen receptor alpha assay."}", "/scratch/micpie/export/nr_er_tox21/train_0-10.jsonl": "{"text":"User: Can you create the InChI of a molecule that is not toxic in the NR-Estrogen receptor alpha assay?\nAssistant: Sure, here you go: InChI=1S\/C11H12N2O2\/c1-2-13-10(14)9(12-11(13)15)8-6-4-3-5-7-8\/h3-7,9H,2H2,1H3,(H,12,15)"} {"text":"User: Can you generate the InChI of a molecule that is not toxic in the NR-Estrogen receptor alpha assay?\nAssistant: Of course, here you go: InChI=1S\/C23H29N3O\/c27-19-18-25-16-14-24(15-17-25)12-5-13-26-22-8-3-1-6-20(22)10-11-21-7-2-4-9-23(21)26\/h1-4,6-11,27H,5,12-19H2"}", "/scratch/micpie/export/nr_er_tox21/train_0-3.jsonl": "{"text":"The InChI InChI=1S\/C11H12N2O2\/c1-2-13-10(14)9(12-11(13)15)8-6-4-3-5-7-8\/h3-7,9H,2H2,1H3,(H,12,15) represents a molecule that is not identified as toxic in the NR-Estrogen receptor alpha assay."} 
{"text":"The canonical SMILES OCCN1CCN(CCCN2c3ccccc3C=Cc3ccccc32)CC1 represents a molecule that is not identified as toxic in the NR-Estrogen receptor alpha assay."}", "/scratch/micpie/export/nr_er_tox21/train_0-12.jsonl": "{"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very curious. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be toxic in the NR-ER assay.\nAssistant: Got it, this SELFIES is not toxic in the NR-ER assay: [C][C][N][C][=Branch1][C][=O][N][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][N][=O]"} {"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be toxic in the estrogen receptor alpha assay.\nAssistant: Got it, this canonical SMILES is not toxic in the estrogen receptor alpha assay: OCCN1CCN(CCCN2c3ccccc3C=Cc3ccccc32)CC1"}", "/scratch/micpie/export/nr_er_tox21/test_0-13.jsonl": "{"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the NR-Estrogen receptor alpha assay.\nAssistant: Understood, this SELFIES is not toxic in the NR-Estrogen receptor alpha assay: [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C]"} {"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the NR-Estrogen receptor alpha assay.\nAssistant: Understood, this SMILES is not toxic in the NR-Estrogen receptor alpha assay: CN(C)c1ccccc1"}", "/scratch/micpie/export/nr_er_tox21/valid_0-2.jsonl": "{"text":"Based on the canonical SMILES representation CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21, the molecule has no NR-ER toxicity features."} {"text":"Based on the canonical SMILES CC[n+]1ccccc1.O=S(=O)([O-])C(F)(F)F, the molecule has no estrogen receptor alpha toxicity characteristics."}", "/scratch/micpie/export/nr_er_tox21/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [C][C][N][C][=Branch1][C][=O][N][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][N][=O] toxic in the NR-ER assay?\nConstraint: Even if you are uncertain, you must pick either a or b without using any other words.\nOptions:\na) True\nb) False\nAnswer: b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [O][C][C][N][C][C][N][Branch2][Ring1][=Branch2][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][#C][C][C][Ring2][Ring1][Branch2] toxic in the NR-ER assay?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any additional words.\nOptions:\n1.) True\n2.) False\nAnswer: 2"}", "/scratch/micpie/export/nr_er_tox21/valid_0-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of CNC)CCCNcccccc6CCccccCl)cc6%15 is not showing toxicity in the NR-ER assay."} {"text":"The molecule with the DeepSMILES representation of CC[n+]ccccc6.O=S=O)[O-])CF)F)F is not showing toxicity in the NR-ER assay."}", "/scratch/micpie/export/nr_er_tox21/valid_0-13.jsonl": "{"text":"User: I want to create a SELFIES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the NR-Estrogen receptor alpha assay.\nAssistant: Got it, this SELFIES is not toxic in the NR-Estrogen receptor alpha assay: [C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Ring1][S]"} {"text":"User: I want to come up with a InChI.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the NR-Estrogen receptor alpha assay.\nAssistant: Ok, this InChI is not toxic in the NR-Estrogen receptor alpha assay: InChI=1S\/C7H10N.CHF3O3S\/c1-2-8-6-4-3-5-7-8;2-1(3,4)8(5,6)7\/h3-7H,2H2,1H3;(H,5,6,7)\/q+1;\/p-1"}", "/scratch/micpie/export/nr_er_tox21/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-ER assay.\nMolecule SELFIES: [C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Ring1][S]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-Estrogen receptor alpha assay.\nDeepSMILES: CC[n+]ccccc6.O=S=O)[O-])CF)F)F\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"}", "/scratch/micpie/export/nr_er_tox21/train_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the estrogen receptor alpha assay?\nConstraint: You must select none, one or more options from 1, 2, 3, 4, or 5 without using any other words.\nOptions:\n1) InChI=1S\/C11H12N2O2\/c1-2-13-10(14)9(12-11(13)15)8-6-4-3-5-7-8\/h3-7,9H,2H2,1H3,(H,12,15)\n2) 
InChI=1S\/C10H14NO6P\/c1-3-15-18(14,16-4-2)17-10-7-5-9(6-8-10)11(12)13\/h5-8H,3-4H2,1-2H3\n3) InChI=1S\/C6H6Br2N2\/c7-4-6(8,5-10)2-1-3-9\/h1-2,4H2\n4) InChI=1S\/C4H10N2\/c1-2-6-4-3-5-1\/h5-6H,1-4H2\n5) InChI=1S\/C6H10O4S2\/c7-5(8)1-3-11-12-4-2-6(9)10\/h1-4H2,(H,7,8)(H,9,10)\nAnswer: 1, 2, 3, 5"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-ER assay?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any additional words.\nOptions:\n1: NC(=O)NC(N)=O\n2: CN[C@@H](C)[C@H](O)c1ccccc1\n3: OCCN1CCN(CCCN2c3ccccc3C=Cc3ccccc32)CC1\n4: O=C1c2ccccc2-c2n[nH]c3cccc1c23\nAnswer: 1, 3, 4"}", "/scratch/micpie/export/nr_er_tox21/valid_0-4.jsonl": "{"text":"The SMILES CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21 is not toxic in the estrogen receptor alpha assay."} {"text":"The molecule canonical SMILES CC[n+]1ccccc1.O=S(=O)([O-])C(F)(F)F is not toxic in the estrogen receptor alpha assay."}", "/scratch/micpie/export/nr_er_tox21/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-Estrogen receptor alpha assay.\ncanonical SMILES: CCN1C(=O)NC(c2ccccc2)C1=O\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-Estrogen receptor alpha assay.\nMolecule DeepSMILES: OCCNCCNCCCNcccccc6C=Ccccccc6%15))))))))))))))))))CC6\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/nr_er_tox21/valid_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-ER assay?\nConstraint: You must select none, one or more options from A, B, or C without using any other 
words.\nOptions:\n(A) [C][N][Branch1][C][C][C][Branch1][C][C][Branch1][C][C][C][O]\n(B) [C][C][Branch1][C][C][C][C@H1][Branch1][Ring1][C][N][C][C][=Branch1][C][=O][O]\n(C) [C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Ring1][S]\nAnswer: C"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-ER assay?\nConstraint: You must select none, one or more options from A, B, C, or D without using any additional words.\nOptions:\n[A] [C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][#Branch1][C][=Branch1][C][=O][O][C][C][=C][Ring1][#Branch2]\n[B] [C][=C][C][=Branch1][C][=O][O][C][C][Branch1][C][O][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1]\n[C] [C][C][=Branch1][C][=O][S][C][C][Branch1][P][C][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][C][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][O]\n[D] [C][C][N+1][=C][C][=C][C][=C][Ring1][=Branch1].[O][=S][=Branch1][C][=O][Branch1][C][O-1][C][Branch1][C][F][Branch1][C][F][F]\nAnswer: A, B, C, D"}", "/scratch/micpie/export/nr_er_tox21/valid_0-12.jsonl": "{"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be toxic in the NR-Estrogen receptor alpha assay.\nAssistant: Got it, this canonical SMILES is not toxic in the NR-Estrogen receptor alpha assay: CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21"} {"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very curious. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should not be toxic in the NR-ER assay.\nAssistant: Got it, here you go, this SMILES is not toxic in the NR-ER assay: CC[n+]1ccccc1.O=S(=O)([O-])C(F)(F)F"}", "/scratch/micpie/export/nr_er_tox21/train_0-2.jsonl": "{"text":"Based on the DeepSMILES CCNC=O)NCcccccc6))))))C5=O, the molecule has no NR-Estrogen receptor alpha toxicity properties."} {"text":"Based on the canonical SMILES OCCN1CCN(CCCN2c3ccccc3C=Cc3ccccc32)CC1, the molecule has no NR-ER toxicity properties."}", "/scratch/micpie/export/nr_er_tox21/test_0-11.jsonl": "{"text":"User: I'm looking for the SELFIES of a molecule that is not toxic in the NR-Estrogen receptor alpha assay?\nAssistant: This is a molecule that is not toxic in the NR-Estrogen receptor alpha assay: [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C]"} {"text":"User: I'm looking for the DeepSMILES of a molecule that is not toxic in the NR-ER assay?\nAssistant: This is a molecule that is not toxic in the NR-ER assay: CNC)cccccc6"}", "/scratch/micpie/export/nr_er_tox21/train_0-7.jsonl": "{"text":"Task: Please generate a molecule canonical SMILES based on the text description.\nDescription: A molecule that is toxic in the NR-ER assay.\nResult: CCN1C(=O)NC(c2ccccc2)C1=O"} {"text":"Task: Please generate a molecule SELFIES based on the description below.\nDescription: A molecule that is toxic in the estrogen receptor alpha assay.\nResult: [O][C][C][N][C][C][N][Branch2][Ring1][=Branch2][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][#C][C][C][Ring2][Ring1][Branch2]"}", "/scratch/micpie/export/nr_er_tox21/train_0-11.jsonl": "{"text":"User: I'm looking for the DeepSMILES of a molecule that is not toxic in the NR-Estrogen receptor alpha assay?\nAssistant: This is a molecule that is not toxic in the NR-Estrogen receptor alpha assay: CCNC=O)NCcccccc6))))))C5=O"} {"text":"User: I'm searching for the 
SELFIES of a molecule that is not toxic in the estrogen receptor alpha assay?\nAssistant: This is a molecule that is not toxic in the estrogen receptor alpha assay: [O][C][C][N][C][C][N][Branch2][Ring1][=Branch2][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][#C][C][C][Ring2][Ring1][Branch2]"}", "/scratch/micpie/export/nr_er_tox21/train_0-1.jsonl": "{"text":"The molecule with the canonical SMILES CCN1C(=O)NC(c2ccccc2)C1=O is not showing toxicity in the NR-ER assay."} {"text":"The molecule with the SELFIES representation of [O][C][C][N][C][C][N][Branch2][Ring1][=Branch2][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][#C][C][C][Ring2][Ring1][Branch2] is not showing toxicity in the NR-ER assay."}", "/scratch/micpie/export/nr_er_tox21/train_0-13.jsonl": "{"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the NR-ER assay.\nAssistant: Understood, this canonical SMILES is not toxic in the NR-ER assay: CCN1C(=O)NC(c2ccccc2)C1=O"} {"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should not be toxic in the NR-ER assay.\nAssistant: Ok, this SMILES is not toxic in the NR-ER assay: OCCN1CCN(CCCN2c3ccccc3C=Cc3ccccc32)CC1"}", "/scratch/micpie/export/nr_er_tox21/train_0-4.jsonl": "{"text":"The canonical SMILES CCN1C(=O)NC(c2ccccc2)C1=O is not toxic in the estrogen receptor alpha assay."} {"text":"The molecule DeepSMILES OCCNCCNCCCNcccccc6C=Ccccccc6%15))))))))))))))))))CC6 is not toxic in the NR-Estrogen receptor alpha assay."}", "/scratch/micpie/export/nr_er_tox21/test_0-7.jsonl": "{"text":"Task: Please generate a molecule SMILES based on the text description below.\nDescription: A molecule that is toxic in the estrogen receptor alpha assay.\nResult: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"Task: Please create a SMILES based on the description below.\nDescription: A molecule that is toxic in the NR-Estrogen receptor alpha assay.\nResult: CN(C)c1ccccc1"}", "/scratch/micpie/export/nr_er_tox21/train_0-9.jsonl": "{"text":"User: Is the molecule with the SMILES CCN1C(=O)NC(c2ccccc2)C1=O toxic in the NR-ER assay?\nAssistant: No, it is not toxic in the NR-ER assay."} {"text":"User: Is the molecule with the canonical SMILES OCCN1CCN(CCCN2c3ccccc3C=Cc3ccccc32)CC1 toxic in the NR-Estrogen receptor alpha assay?\nAssistant: No, it is not toxic in the NR-Estrogen receptor alpha assay."}", "/scratch/micpie/export/nr_er_tox21/valid_0-3.jsonl": "{"text":"The SMILES CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21 represents a molecule that is not identified as toxic in the NR-ER assay."} {"text":"The SELFIES [C][C][N+1][=C][C][=C][C][=C][Ring1][=Branch1].[O][=S][=Branch1][C][=O][Branch1][C][O-1][C][Branch1][C][F][Branch1][C][F][F] represents a molecule that is not identified as toxic in the NR-ER assay."}", "/scratch/micpie/export/nr_er_tox21/test_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the InChI 
InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20) is toxic in the NR-ER assay?\nAssistant: No, this molecule is not toxic in the NR-ER assay."} {"text":"User: Can you figure out if the molecule with the InChI InChI=1S\/C8H11N\/c1-9(2)8-6-4-3-5-7-8\/h3-7H,1-2H3 is toxic in the NR-ER assay?\nAssistant: No, this molecule is not toxic in the NR-ER assay."}", "/scratch/micpie/export/nr_er_tox21/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C toxic in the NR-ER assay?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any additional words.\nOptions:\n1) True\n2) False\nAnswer: 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of CN(C)c1ccccc1 toxic in the NR-ER assay?\nConstraint: Even if you are not sure, you must pick either A or B without using any other words.\nOptions:\nA: True\nB: False\nAnswer: B"}", "/scratch/micpie/export/nr_er_tox21/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Ring1][S] toxic in the NR-ER assay?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any additional words.\nOptions:\n1: True\n2: False\nAnswer: 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES CC[n+]ccccc6.O=S=O)[O-])CF)F)F toxic in the NR-ER assay?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any additional words.\nOptions:\n(1) True\n(2) False\nAnswer: 2"}", "/scratch/micpie/export/nr_er_tox21/test_0-4.jsonl": "{"text":"The SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C 
is not toxic in the NR-Estrogen receptor alpha assay."} {"text":"The SELFIES [C][N][Branch1][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1] is not toxic in the estrogen receptor alpha assay."}", "/scratch/micpie/export/nr_er_tox21/test_0-12.jsonl": "{"text":"User: I want to create a SELFIES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be toxic in the estrogen receptor alpha assay.\nAssistant: Ok, this SELFIES is not toxic in the estrogen receptor alpha assay: [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C]"} {"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very curious. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be toxic in the estrogen receptor alpha assay.\nAssistant: Ok, here you go, this canonical SMILES is not toxic in the estrogen receptor alpha assay: CN(C)c1ccccc1"}", "/scratch/micpie/export/clearance_astrazeneca/test_0-10.jsonl": "{"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should have a drug clearance of 98.000 mL \/ (min g).\nAssistant: Got it, this DeepSMILES represents a molecule that has a drug clearance of 98.000 mL \/ (min g): O=CNCCCCCCNCCccccO)cncO)sc95)))))))))))))))))))Ncccccc6"} {"text":"User: I want to create a InChI.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should have a volume of plasma cleared of a drug over a specified time period of 27.300 mL \/ (min g).\nAssistant: Got it, this InChI represents a molecule that has a volume of plasma cleared of a drug over a specified time period of 27.300 mL \/ (min g): InChI=1S\/C13H14N4O3S2\/c1-7(5-18)14-10-9-11(17-13(19)22-9)16-12(15-10)21-6-8-3-2-4-20-8\/h2-4,7,18H,5-6H2,1H3,(H2,14,15,16,17,19)\/t7-\/m1\/s1"}", "/scratch/micpie/export/clearance_astrazeneca/valid_0-8.jsonl": "{"text":"User: Can you create the canonical SMILES of a molecule that has a volume of plasma cleared of a drug over a specified time period of 33.880 mL \/ (min g)?\nAssistant: Of course, here you go: CCNC(=O)OC1CCN(c2nncc3cc(OC)c(OC)cc23)CC1"} {"text":"User: Can you create the SELFIES of a molecule that has a drug clearance of 8.000 mL \/ (min g)?\nAssistant: Sure, here you go: [N][#C][C][=C][C][=C][C][Branch2][Ring2][Branch1][N][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][Cl][C][=C][C][=N][C][Branch1][O][N][C][C][C@@H1][Branch1][C][O][C][Ring1][=Branch1][=C][C][=C][Ring1][P][Ring1][N][=C][Ring2][Ring1][N]"}", "/scratch/micpie/export/clearance_astrazeneca/train_0-8.jsonl": "{"text":"User: Can you give me the canonical SMILES of a molecule that has a drug clearance of 3.000 mL \/ (min g)?\nAssistant: Sure, here you go: Cc1c(C(=O)NC2C3CC4CC(C3)CC2C4)cnn1-c1ccc(C(=O)O)cc1"} {"text":"User: Can you create the SELFIES of a molecule that has a drug clearance of 12.020 mL \/ (min g)?\nAssistant: Sure, here you go: [C][N][C][C][=C][C][=C][Branch2][Ring1][Branch2][C][NH1][C][=C][C][Branch1][C][F][=C][C][=C][Ring1][#Branch1][C][=Ring1][#Branch2][C][C][N][C][Ring1][#Branch1][=O][C][=C][Ring2][Ring1][Branch1]"}", "/scratch/micpie/export/clearance_astrazeneca/test_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the drug clearance in mL \/ (min g).\nSELFIES: 
[O][=C][Branch2][Ring1][=C][N][C][C][C][C][C][C][N][C][C][C][=C][C][=C][Branch1][C][O][C][N][=C][Branch1][C][O][S][C][Ring1][O][=Ring1][=Branch1][N][C][=C][C][=C][C][=C][Ring1][=Branch1]\nConstraint: Even if you are not sure, you must answer with a numeric value in mL \/ (min g) without the unit and without using any other words.\nResult: 98.000"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the volume of plasma cleared of a drug over a specified time period in mL \/ (min g).\nSELFIES: [C][C@H1][Branch1][Ring1][C][O][N][C][=N][C][Branch1][#Branch2][S][C][C][=C][C][=C][O][Ring1][Branch1][=N][C][NH1][C][=Branch1][C][=O][S][C][Ring1][P][=Ring1][=Branch1]\nConstraint: Even if you are not sure, you must answer with a numeric value in mL \/ (min g) without the unit and without using any other words.\nResult: 27.300"}", "/scratch/micpie/export/clearance_astrazeneca/valid_0-9.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that has a volume of plasma cleared of a drug over a specified time period of 33.880 mL \/ (min g).\nAssistant: This is a molecule that has a volume of plasma cleared of a drug over a specified time period of 33.880 mL \/ (min g): CCNC(=O)OC1CCN(c2nncc3cc(OC)c(OC)cc23)CC1"} {"text":"User: I'm looking for the InChI of a molecule that has a volume of plasma cleared of a drug over a specified time period of 8.000 mL \/ (min g).\nAssistant: This is a molecule that has a volume of plasma cleared of a drug over a specified time period of 8.000 mL \/ (min g): InChI=1S\/C22H20ClN5O2\/c23-18-5-6-19-17(4-7-20(26-19)28-9-8-16(29)13-28)22(18)27-21(30)12-25-15-3-1-2-14(10-15)11-24\/h1-7,10,16,25,29H,8-9,12-13H2,(H,27,30)\/t16-\/m1\/s1"}", "/scratch/micpie/export/clearance_astrazeneca/test_0-1.jsonl": "{"text":"Based on the canonical SMILES O=C(NCCCCCCNCCc1ccc(O)c2nc(O)sc12)Nc1ccccc1, the molecule has a volume of plasma cleared of a drug over a specified time period of 98.000 mL \/ (min g)."} 
{"text":"Based on the SELFIES [C][C@H1][Branch1][Ring1][C][O][N][C][=N][C][Branch1][#Branch2][S][C][C][=C][C][=C][O][Ring1][Branch1][=N][C][NH1][C][=Branch1][C][=O][S][C][Ring1][P][=Ring1][=Branch1], the molecule has a drug clearance of 27.300 mL \/ (min g)."}", "/scratch/micpie/export/clearance_astrazeneca/valid_0-0.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C18H24N4O4\/c1-4-19-18(23)26-13-5-7-22(8-6-13)17-14-10-16(25-3)15(24-2)9-12(14)11-20-21-17\/h9-11,13H,4-8H2,1-3H3,(H,19,23) has a volume of plasma cleared of a drug over a specified time period of 33.880 mL \/ (min g)."} {"text":"The molecule with the canonical SMILES N#Cc1cccc(NCC(=O)Nc2c(Cl)ccc3nc(N4CC[C@@H](O)C4)ccc23)c1 has a drug clearance of 8.000 mL \/ (min g)."}", "/scratch/micpie/export/clearance_astrazeneca/test_0-2.jsonl": "{"text":"The SMILES O=C(NCCCCCCNCCc1ccc(O)c2nc(O)sc12)Nc1ccccc1 represents a molecule that has a drug clearance of 98.000 mL \/ (min g)."} {"text":"The InChI InChI=1S\/C13H14N4O3S2\/c1-7(5-18)14-10-9-11(17-13(19)22-9)16-12(15-10)21-6-8-3-2-4-20-8\/h2-4,7,18H,5-6H2,1H3,(H2,14,15,16,17,19)\/t7-\/m1\/s1 represents a molecule with a volume of plasma cleared of a drug over a specified time period of 27.300 mL \/ (min g)."}", "/scratch/micpie/export/clearance_astrazeneca/valid_0-10.jsonl": "{"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should have a drug clearance of 33.880 mL \/ (min g).\nAssistant: Ok, here you go, this DeepSMILES represents a molecule that has a drug clearance of 33.880 mL \/ (min g): CCNC=O)OCCCNcnnccccOC))cOC))cc%106))))))))))CC6"} {"text":"User: I want to generate a InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should have a volume of plasma cleared of a drug over a specified time period of 8.000 mL \/ (min g).\nAssistant: Ok, here you go, this InChI represents a molecule that has a volume of plasma cleared of a drug over a specified time period of 8.000 mL \/ (min g): InChI=1S\/C22H20ClN5O2\/c23-18-5-6-19-17(4-7-20(26-19)28-9-8-16(29)13-28)22(18)27-21(30)12-25-15-3-1-2-14(10-15)11-24\/h1-7,10,16,25,29H,8-9,12-13H2,(H,27,30)\/t16-\/m1\/s1"}", "/scratch/micpie/export/clearance_astrazeneca/train_0-6.jsonl": "{"text":"Task: Please create a SELFIES based on the text description below.\nDescription: A molecule that has a drug clearance of 3.000 mL \/ (min g).\nResult: [C][C][=C][Branch2][Ring1][Branch2][C][=Branch1][C][=O][N][C][C][C][C][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][C][Ring1][=Branch2][C][Ring1][#Branch1][C][=N][N][Ring2][Ring1][C][C][=C][C][=C][Branch1][=Branch1][C][=Branch1][C][=O][O][C][=C][Ring1][=Branch2]"} {"text":"Task: Please create a molecule DeepSMILES based on the description.\nDescription: A molecule that has a volume of plasma cleared of a drug over a specified time period of 12.020 mL \/ (min g).\nResult: CNCcccc-c[nH]cccF)ccc6c9CCNC7=O))))))))))))))cc6"}", "/scratch/micpie/export/clearance_astrazeneca/valid_0-6.jsonl": "{"text":"Task: Please create a molecule InChI based on the text description.\nDescription: A molecule that has a volume of plasma cleared of a drug over a specified time period of 33.880 mL \/ (min g).\nResult: InChI=1S\/C18H24N4O4\/c1-4-19-18(23)26-13-5-7-22(8-6-13)17-14-10-16(25-3)15(24-2)9-12(14)11-20-21-17\/h9-11,13H,4-8H2,1-3H3,(H,19,23)"} {"text":"Task: Please give me a molecule InChI based on the text description.\nDescription: A molecule that has a volume of plasma cleared of a drug over a specified time period of 8.000 mL \/ (min g).\nResult: 
InChI=1S\/C22H20ClN5O2\/c23-18-5-6-19-17(4-7-20(26-19)28-9-8-16(29)13-28)22(18)27-21(30)12-25-15-3-1-2-14(10-15)11-24\/h1-7,10,16,25,29H,8-9,12-13H2,(H,27,30)\/t16-\/m1\/s1"}", "/scratch/micpie/export/clearance_astrazeneca/test_0-9.jsonl": "{"text":"User: I'm looking for the SELFIES of a molecule that has a volume of plasma cleared of a drug over a specified time period of 98.000 mL \/ (min g).\nAssistant: This is a molecule that has a volume of plasma cleared of a drug over a specified time period of 98.000 mL \/ (min g): [O][=C][Branch2][Ring1][=C][N][C][C][C][C][C][C][N][C][C][C][=C][C][=C][Branch1][C][O][C][N][=C][Branch1][C][O][S][C][Ring1][O][=Ring1][=Branch1][N][C][=C][C][=C][C][=C][Ring1][=Branch1]"} {"text":"User: I'm searching for the SELFIES of a molecule that has a volume of plasma cleared of a drug over a specified time period of 27.300 mL \/ (min g).\nAssistant: This is a molecule that has a volume of plasma cleared of a drug over a specified time period of 27.300 mL \/ (min g): [C][C@H1][Branch1][Ring1][C][O][N][C][=N][C][Branch1][#Branch2][S][C][C][=C][C][=C][O][Ring1][Branch1][=N][C][NH1][C][=Branch1][C][=O][S][C][Ring1][P][=Ring1][=Branch1]"}", "/scratch/micpie/export/clearance_astrazeneca/test_0-0.jsonl": "{"text":"The molecule with the InChI representation of InChI=1S\/C22H28N4O3S\/c27-18-11-10-16(20-19(18)26-22(29)30-20)12-15-23-13-6-1-2-7-14-24-21(28)25-17-8-4-3-5-9-17\/h3-5,8-11,23,27H,1-2,6-7,12-15H2,(H,26,29)(H2,24,25,28) has a volume of plasma cleared of a drug over a specified time period of 98.000 mL \/ (min g)."} {"text":"The molecule with the SMILES C[C@H](CO)Nc1nc(SCc2ccco2)nc2[nH]c(=O)sc12 has a volume of plasma cleared of a drug over a specified time period of 27.300 mL \/ (min g)."}", "/scratch/micpie/export/clearance_astrazeneca/valid_0-7.jsonl": "{"text":"User: Can you estimate the volume of plasma cleared of a drug over a specified time period in mL \/ (min g) of the molecule with the InChI 
InChI=1S\/C18H24N4O4\/c1-4-19-18(23)26-13-5-7-22(8-6-13)17-14-10-16(25-3)15(24-2)9-12(14)11-20-21-17\/h9-11,13H,4-8H2,1-3H3,(H,19,23)?\nAssistant: Yes, I'm happy to help, this molecule has a volume of plasma cleared of a drug over a specified time period of 33.880 mL \/ (min g)."} {"text":"User: Can you estimate the volume of plasma cleared of a drug over a specified time period in mL \/ (min g) of the molecule with the canonical SMILES N#Cc1cccc(NCC(=O)Nc2c(Cl)ccc3nc(N4CC[C@@H](O)C4)ccc23)c1?\nAssistant: Of course, this molecule has a volume of plasma cleared of a drug over a specified time period of 8.000 mL \/ (min g)."}", "/scratch/micpie/export/clearance_astrazeneca/test_0-3.jsonl": "{"text":"The molecule with the DeepSMILES O=CNCCCCCCNCCccccO)cncO)sc95)))))))))))))))))))Ncccccc6 has a volume of plasma cleared of a drug over a specified time period of 98.000 mL \/ (min g)."} {"text":"The molecule with the InChI InChI=1S\/C13H14N4O3S2\/c1-7(5-18)14-10-9-11(17-13(19)22-9)16-12(15-10)21-6-8-3-2-4-20-8\/h2-4,7,18H,5-6H2,1H3,(H2,14,15,16,17,19)\/t7-\/m1\/s1 has a drug clearance of 27.300 mL \/ (min g)."}", "/scratch/micpie/export/clearance_astrazeneca/valid_0-11.jsonl": "{"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should have a volume of plasma cleared of a drug over a specified time period of 33.880 mL \/ (min g).\nAssistant: Got it, this canonical SMILES represents a molecule that has a volume of plasma cleared of a drug over a specified time period of 33.880 mL \/ (min g): CCNC(=O)OC1CCN(c2nncc3cc(OC)c(OC)cc23)CC1"} {"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should have a volume of plasma cleared of a drug over a specified time period of 8.000 mL \/ (min g).\nAssistant: Got it, this canonical SMILES represents a molecule that has a volume of plasma cleared of a drug over a specified time period of 8.000 mL \/ (min g): N#Cc1cccc(NCC(=O)Nc2c(Cl)ccc3nc(N4CC[C@@H](O)C4)ccc23)c1"}", "/scratch/micpie/export/clearance_astrazeneca/train_0-0.jsonl": "{"text":"The molecule with the SMILES representation of Cc1c(C(=O)NC2C3CC4CC(C3)CC2C4)cnn1-c1ccc(C(=O)O)cc1 has a volume of plasma cleared of a drug over a specified time period of 3.000 mL \/ (min g)."} {"text":"The molecule with the InChI InChI=1S\/C19H18FN3O\/c1-21-10-11-2-4-12(5-3-11)18-14-6-7-22-19(24)15-8-13(20)9-16(23-18)17(14)15\/h2-5,8-9,21,23H,6-7,10H2,1H3,(H,22,24) has a volume of plasma cleared of a drug over a specified time period of 12.020 mL \/ (min g)."}", "/scratch/micpie/export/clearance_astrazeneca/test_0-6.jsonl": "{"text":"Task: Please generate a SELFIES based on the description.\nDescription: A molecule that has a volume of plasma cleared of a drug over a specified time period of 98.000 mL \/ (min g).\nResult: [O][=C][Branch2][Ring1][=C][N][C][C][C][C][C][C][N][C][C][C][=C][C][=C][Branch1][C][O][C][N][=C][Branch1][C][O][S][C][Ring1][O][=Ring1][=Branch1][N][C][=C][C][=C][C][=C][Ring1][=Branch1]"} {"text":"Task: Please give me a SELFIES based on the text description.\nDescription: A molecule that has a volume of plasma cleared of a drug over a specified time period of 27.300 mL \/ (min g).\nResult: [C][C@H1][Branch1][Ring1][C][O][N][C][=N][C][Branch1][#Branch2][S][C][C][=C][C][=C][O][Ring1][Branch1][=N][C][NH1][C][=Branch1][C][=O][S][C][Ring1][P][=Ring1][=Branch1]"}", "/scratch/micpie/export/clearance_astrazeneca/train_0-10.jsonl": "{"text":"User: I want to come up with a InChI.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should have a drug clearance of 3.000 mL \/ (min g).\nAssistant: Ok, here you go, this InChI represents a molecule that has a drug clearance of 3.000 mL \/ (min g): InChI=1S\/C22H25N3O3\/c1-12-19(11-23-25(12)18-4-2-15(3-5-18)22(27)28)21(26)24-20-16-7-13-6-14(9-16)10-17(20)8-13\/h2-5,11,13-14,16-17,20H,6-10H2,1H3,(H,24,26)(H,27,28)"} {"text":"User: I want to create a molecule SELFIES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should have a volume of plasma cleared of a drug over a specified time period of 12.020 mL \/ (min g).\nAssistant: Ok, this SELFIES represents a molecule that has a volume of plasma cleared of a drug over a specified time period of 12.020 mL \/ (min g): [C][N][C][C][=C][C][=C][Branch2][Ring1][Branch2][C][NH1][C][=C][C][Branch1][C][F][=C][C][=C][Ring1][#Branch1][C][=Ring1][#Branch2][C][C][N][C][Ring1][#Branch1][=O][C][=C][Ring2][Ring1][Branch1]"}", "/scratch/micpie/export/clearance_astrazeneca/train_0-3.jsonl": "{"text":"The molecule with the canonical SMILES Cc1c(C(=O)NC2C3CC4CC(C3)CC2C4)cnn1-c1ccc(C(=O)O)cc1 has a drug clearance of 3.000 mL \/ (min g)."} {"text":"The molecule with the SELFIES [C][N][C][C][=C][C][=C][Branch2][Ring1][Branch2][C][NH1][C][=C][C][Branch1][C][F][=C][C][=C][Ring1][#Branch1][C][=Ring1][#Branch2][C][C][N][C][Ring1][#Branch1][=O][C][=C][Ring2][Ring1][Branch1] has a volume of plasma cleared of a drug over a specified time period of 12.020 mL \/ (min g)."}", "/scratch/micpie/export/clearance_astrazeneca/valid_0-2.jsonl": "{"text":"The SMILES CCNC(=O)OC1CCN(c2nncc3cc(OC)c(OC)cc23)CC1 represents a molecule with a volume of plasma cleared of a drug over a specified time period of 33.880 mL \/ (min g)."} {"text":"The DeepSMILES N#CcccccNCC=O)NccCl)cccncNCC[C@@H]O)C5)))))ccc%106))))))))))))))c6 represents a molecule with a volume of plasma cleared of a drug over a specified time period of 8.000 mL \/ (min g)."}", 
"/scratch/micpie/export/clearance_astrazeneca/valid_0-1.jsonl": "{"text":"Based on the InChI representation of InChI=1S\/C18H24N4O4\/c1-4-19-18(23)26-13-5-7-22(8-6-13)17-14-10-16(25-3)15(24-2)9-12(14)11-20-21-17\/h9-11,13H,4-8H2,1-3H3,(H,19,23), the molecule has a drug clearance of 33.880 mL \/ (min g)."} {"text":"Based on the canonical SMILES N#Cc1cccc(NCC(=O)Nc2c(Cl)ccc3nc(N4CC[C@@H](O)C4)ccc23)c1, the molecule has a volume of plasma cleared of a drug over a specified time period of 8.000 mL \/ (min g)."}", "/scratch/micpie/export/clearance_astrazeneca/valid_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the drug clearance in mL \/ (min g).\nSELFIES: [C][C][N][C][=Branch1][C][=O][O][C][C][C][N][Branch2][Ring1][#Branch1][C][=N][N][=C][C][=C][C][Branch1][Ring1][O][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][=C][Ring1][#Branch2][C][C][Ring2][Ring1][Ring2]\nConstraint: Even if you are uncertain, you must answer with a numeric value in mL \/ (min g) without the unit and without using any additional words.\nResult: 33.880"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the drug clearance in mL \/ (min g).\nMolecule SMILES: N#Cc1cccc(NCC(=O)Nc2c(Cl)ccc3nc(N4CC[C@@H](O)C4)ccc23)c1\nConstraint: Even if you are uncertain, you must answer with a numeric value in mL \/ (min g) without the unit and without using any additional words.\nResult: 8.000"}", "/scratch/micpie/export/clearance_astrazeneca/valid_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the drug clearance in mL \/ (min g).\nInChI: InChI=1S\/C18H24N4O4\/c1-4-19-18(23)26-13-5-7-22(8-6-13)17-14-10-16(25-3)15(24-2)9-12(14)11-20-21-17\/h9-11,13H,4-8H2,1-3H3,(H,19,23)\nConstraint: Even if you are not sure, you must answer with a numeric value in mL \/ (min g) without using any other words.\nResult: 33.880 mL \/ (min g)"} {"text":"Task: Please 
predict a molecule feature based on the description.\nDescription: Predict the drug clearance in mL \/ (min g).\nSELFIES: [N][#C][C][=C][C][=C][C][Branch2][Ring2][Branch1][N][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][Cl][C][=C][C][=N][C][Branch1][O][N][C][C][C@@H1][Branch1][C][O][C][Ring1][=Branch1][=C][C][=C][Ring1][P][Ring1][N][=C][Ring2][Ring1][N]\nConstraint: Even if you are uncertain, you must answer with a numeric value in mL \/ (min g) without using any other words.\nResult: 8.000 mL \/ (min g)"}", "/scratch/micpie/export/clearance_astrazeneca/train_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the drug clearance in mL \/ (min g).\nSELFIES: [C][C][=C][Branch2][Ring1][Branch2][C][=Branch1][C][=O][N][C][C][C][C][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][C][Ring1][=Branch2][C][Ring1][#Branch1][C][=N][N][Ring2][Ring1][C][C][=C][C][=C][Branch1][=Branch1][C][=Branch1][C][=O][O][C][=C][Ring1][=Branch2]\nConstraint: Even if you are uncertain, you must answer with a numeric value in mL \/ (min g) without the unit and without using any additional words.\nResult: 3.000"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the drug clearance in mL \/ (min g).\ncanonical SMILES: CNCc1ccc(-c2[nH]c3cc(F)cc4c3c2CCNC4=O)cc1\nConstraint: Even if you are uncertain, you must answer with a numeric value in mL \/ (min g) without the unit and without using any other words.\nResult: 12.020"}", "/scratch/micpie/export/clearance_astrazeneca/train_0-2.jsonl": "{"text":"The SELFIES [C][C][=C][Branch2][Ring1][Branch2][C][=Branch1][C][=O][N][C][C][C][C][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][C][Ring1][=Branch2][C][Ring1][#Branch1][C][=N][N][Ring2][Ring1][C][C][=C][C][=C][Branch1][=Branch1][C][=Branch1][C][=O][O][C][=C][Ring1][=Branch2] represents a molecule that has a volume of plasma cleared of a drug over a specified time period of 3.000 mL \/ (min g)."} {"text":"The 
SMILES CNCc1ccc(-c2[nH]c3cc(F)cc4c3c2CCNC4=O)cc1 represents a molecule that has a drug clearance of 12.020 mL \/ (min g)."}", "/scratch/micpie/export/clearance_astrazeneca/test_0-11.jsonl": "{"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should have a drug clearance of 98.000 mL \/ (min g).\nAssistant: Ok, this SELFIES represents a molecule that has a drug clearance of 98.000 mL \/ (min g): [O][=C][Branch2][Ring1][=C][N][C][C][C][C][C][C][N][C][C][C][=C][C][=C][Branch1][C][O][C][N][=C][Branch1][C][O][S][C][Ring1][O][=Ring1][=Branch1][N][C][=C][C][=C][C][=C][Ring1][=Branch1]"} {"text":"User: I want to create a SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should have a volume of plasma cleared of a drug over a specified time period of 27.300 mL \/ (min g).\nAssistant: Got it, this SMILES represents a molecule that has a volume of plasma cleared of a drug over a specified time period of 27.300 mL \/ (min g): C[C@H](CO)Nc1nc(SCc2ccco2)nc2[nH]c(=O)sc12"}", "/scratch/micpie/export/clearance_astrazeneca/train_0-7.jsonl": "{"text":"User: Can you derive the drug clearance in mL \/ (min g) of the molecule with the DeepSMILES CccC=O)NCCCCCCC6)CC8C6)))))))))))cnn5-ccccC=O)O))cc6?\nAssistant: Sure, this molecule has a drug clearance of 3.000 mL \/ (min g)."} {"text":"User: Can you estimate the drug clearance in mL \/ (min g) of the molecule with the DeepSMILES CNCcccc-c[nH]cccF)ccc6c9CCNC7=O))))))))))))))cc6?\nAssistant: Yes, I'm happy to help, this molecule has a drug clearance of 12.020 mL \/ (min g)."}", "/scratch/micpie/export/clearance_astrazeneca/train_0-11.jsonl": "{"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should have a drug clearance of 3.000 mL \/ (min g).\nAssistant: Ok, this canonical SMILES represents a molecule that has a drug clearance of 3.000 mL \/ (min g): Cc1c(C(=O)NC2C3CC4CC(C3)CC2C4)cnn1-c1ccc(C(=O)O)cc1"} {"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should have a volume of plasma cleared of a drug over a specified time period of 12.020 mL \/ (min g).\nAssistant: Understood, this DeepSMILES represents a molecule that has a volume of plasma cleared of a drug over a specified time period of 12.020 mL \/ (min g): CNCcccc-c[nH]cccF)ccc6c9CCNC7=O))))))))))))))cc6"}", "/scratch/micpie/export/clearance_astrazeneca/train_0-1.jsonl": "{"text":"Based on the canonical SMILES representation of Cc1c(C(=O)NC2C3CC4CC(C3)CC2C4)cnn1-c1ccc(C(=O)O)cc1, the molecule has a volume of plasma cleared of a drug over a specified time period of 3.000 mL \/ (min g)."} {"text":"Based on the InChI representation of InChI=1S\/C19H18FN3O\/c1-21-10-11-2-4-12(5-3-11)18-14-6-7-22-19(24)15-8-13(20)9-16(23-18)17(14)15\/h2-5,8-9,21,23H,6-7,10H2,1H3,(H,22,24), the molecule has a volume of plasma cleared of a drug over a specified time period of 12.020 mL \/ (min g)."}", "/scratch/micpie/export/clearance_astrazeneca/train_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the drug clearance in mL \/ (min g).\nInChI: InChI=1S\/C22H25N3O3\/c1-12-19(11-23-25(12)18-4-2-15(3-5-18)22(27)28)21(26)24-20-16-7-13-6-14(9-16)10-17(20)8-13\/h2-5,11,13-14,16-17,20H,6-10H2,1H3,(H,24,26)(H,27,28)\nConstraint: Even if you are uncertain, you must answer with a numeric value in mL \/ (min g) without using any other words.\nResult: 3.000 mL \/ (min g)"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the volume of plasma cleared of a drug over a specified 
time period in mL \/ (min g).\nMolecule InChI: InChI=1S\/C19H18FN3O\/c1-21-10-11-2-4-12(5-3-11)18-14-6-7-22-19(24)15-8-13(20)9-16(23-18)17(14)15\/h2-5,8-9,21,23H,6-7,10H2,1H3,(H,22,24)\nConstraint: Even if you are uncertain, you must answer with a numeric value in mL \/ (min g) without using any other words.\nResult: 12.020 mL \/ (min g)"}", "/scratch/micpie/export/clearance_astrazeneca/test_0-7.jsonl": "{"text":"User: Can you estimate the drug clearance in mL \/ (min g) of the molecule with the InChI InChI=1S\/C22H28N4O3S\/c27-18-11-10-16(20-19(18)26-22(29)30-20)12-15-23-13-6-1-2-7-14-24-21(28)25-17-8-4-3-5-9-17\/h3-5,8-11,23,27H,1-2,6-7,12-15H2,(H,26,29)(H2,24,25,28)?\nAssistant: Sure, this molecule has a drug clearance of 98.000 mL \/ (min g)."} {"text":"User: Can you estimate the drug clearance in mL \/ (min g) of the molecule with the canonical SMILES C[C@H](CO)Nc1nc(SCc2ccco2)nc2[nH]c(=O)sc12?\nAssistant: Of course, this molecule has a drug clearance of 27.300 mL \/ (min g)."}", "/scratch/micpie/export/clearance_astrazeneca/train_0-9.jsonl": "{"text":"User: I'm searching for the SELFIES of a molecule that has a drug clearance of 3.000 mL \/ (min g).\nAssistant: This is a molecule that has a drug clearance of 3.000 mL \/ (min g): [C][C][=C][Branch2][Ring1][Branch2][C][=Branch1][C][=O][N][C][C][C][C][C][C][Branch1][Ring2][C][Ring1][=Branch1][C][C][Ring1][=Branch2][C][Ring1][#Branch1][C][=N][N][Ring2][Ring1][C][C][=C][C][=C][Branch1][=Branch1][C][=Branch1][C][=O][O][C][=C][Ring1][=Branch2]"} {"text":"User: I'm looking for the InChI of a molecule that has a drug clearance of 12.020 mL \/ (min g).\nAssistant: This is a molecule that has a drug clearance of 12.020 mL \/ (min g): InChI=1S\/C19H18FN3O\/c1-21-10-11-2-4-12(5-3-11)18-14-6-7-22-19(24)15-8-13(20)9-16(23-18)17(14)15\/h2-5,8-9,21,23H,6-7,10H2,1H3,(H,22,24)"}", "/scratch/micpie/export/clearance_astrazeneca/valid_0-3.jsonl": "{"text":"The molecule with the InChI 
InChI=1S\/C18H24N4O4\/c1-4-19-18(23)26-13-5-7-22(8-6-13)17-14-10-16(25-3)15(24-2)9-12(14)11-20-21-17\/h9-11,13H,4-8H2,1-3H3,(H,19,23) has a volume of plasma cleared of a drug over a specified time period of 33.880 mL \/ (min g)."} {"text":"The molecule with the SELFIES [N][#C][C][=C][C][=C][C][Branch2][Ring2][Branch1][N][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][Cl][C][=C][C][=N][C][Branch1][O][N][C][C][C@@H1][Branch1][C][O][C][Ring1][=Branch1][=C][C][=C][Ring1][P][Ring1][N][=C][Ring2][Ring1][N] has a drug clearance of 8.000 mL \/ (min g)."}", "/scratch/micpie/export/clearance_astrazeneca/test_0-8.jsonl": "{"text":"User: Can you create the SELFIES of a molecule that has a volume of plasma cleared of a drug over a specified time period of 98.000 mL \/ (min g)?\nAssistant: Yes, here you go: [O][=C][Branch2][Ring1][=C][N][C][C][C][C][C][C][N][C][C][C][=C][C][=C][Branch1][C][O][C][N][=C][Branch1][C][O][S][C][Ring1][O][=Ring1][=Branch1][N][C][=C][C][=C][C][=C][Ring1][=Branch1]"} {"text":"User: Can you create the DeepSMILES of a molecule that has a drug clearance of 27.300 mL \/ (min g)?\nAssistant: Yes, I'm happy to help, here you go: C[C@H]CO))NcncSCcccco5)))))))nc[nH]c=O)sc95"}", "/scratch/micpie/export/clearance_astrazeneca/test_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the volume of plasma cleared of a drug over a specified time period in mL \/ (min g).\nMolecule SELFIES: [O][=C][Branch2][Ring1][=C][N][C][C][C][C][C][C][N][C][C][C][=C][C][=C][Branch1][C][O][C][N][=C][Branch1][C][O][S][C][Ring1][O][=Ring1][=Branch1][N][C][=C][C][=C][C][=C][Ring1][=Branch1]\nConstraint: Even if you are uncertain, you must answer with a numeric value in mL \/ (min g) without using any other words.\nResult: 98.000 mL \/ (min g)"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the volume of plasma cleared of a drug over a specified time period in mL \/ (min 
g).\nMolecule DeepSMILES: C[C@H]CO))NcncSCcccco5)))))))nc[nH]c=O)sc95\nConstraint: Even if you are not sure, you must answer with a numeric value in mL \/ (min g) without using any additional words.\nResult: 27.300 mL \/ (min g)"}", "/scratch/micpie/export/iupac_goldbook/test_0-5.jsonl": "{"text":"User: I'm searching for the chemistry term that can be described as:\nA measuring assembly incorporating a radiation detector and a pulse amplitude, used for determining the energy spectrum of α (β, γ) radiation.\nAssistant: This chemistry term fits this definition: α- (β-, γ-) ray spectrometer"} {"text":"User: I'm looking for the chemistry term that can be described by:\nReversal of coagulation or flocculation, i.e., the dispersion of agglomerates to form a colloidally stablesuspension or emulsion.\nAssistant: This chemistry term fits your definition: peptization"}", "/scratch/micpie/export/iupac_goldbook/test_0-1.jsonl": "{"text":"Task: Please generate a text definition of a chemistry term.\nTerm: α- (β-, γ-) ray spectrometer\nConstraint: Answer the question with full sentences.\nResult: A measuring assembly incorporating a radiation detector and a pulse amplitude, used for determining the energy spectrum of α (β, γ) radiation."} {"text":"Task: Please generate a text definition of a chemistry term.\nTerm: peptization\nConstraint: Answer the question with complete sentences.\nResult: Reversal of coagulation or flocculation, i.e., the dispersion of agglomerates to form a colloidally stablesuspension or emulsion."}", "/scratch/micpie/export/iupac_goldbook/valid_0-0.jsonl": "{"text":"The chemistry term \"α-expulsion\" can be defined by:\nA general reaction by which a group attached to the alpha carbon of an excited chromophore is expelled either as an odd electron species or as an anionic species. 
This reaction should be distinguished from an α- (alpha-) cleavage."} {"text":"The chemistry term \"supercritical drying of a gel\" can be defined as:\nDrying of a gel using a supercritical fluid."}", "/scratch/micpie/export/iupac_goldbook/test_0-2.jsonl": "{"text":"Task: Please generate a chemistry term for the following definition:\nDefinition: A measuring assembly incorporating a radiation detector and a pulse amplitude, used for determining the energy spectrum of α (β, γ) radiation.\nResult: α- (β-, γ-) ray spectrometer"} {"text":"Task: Please create a chemistry term for the following definition:\nDefinition: Reversal of coagulation or flocculation, i.e., the dispersion of agglomerates to form a colloidally stablesuspension or emulsion.\nResult: peptization"}", "/scratch/micpie/export/iupac_goldbook/train_0-6.jsonl": "{"text":"User: I want to create a text definition.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. The chemistry term can be described by:\nα-decay\nAssistant: Got it, this text definition fits your description: Radioactive decay in which an alpha particle is emitted."} {"text":"User: I want to create a text definition.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The chemistry term can be described as:\nviscous sintering\nAssistant: Got it, this text definition fits this description: Sintering process by which it is possible to densify gels to glasses and ceramics at elevated temperatures."}", "/scratch/micpie/export/iupac_goldbook/valid_0-6.jsonl": "{"text":"User: I want to create a definition.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The chemistry term can be described as:\nα-expulsion\nAssistant: Ok, here you go, this definition fits your description: A general reaction by which a group attached to the alpha carbon of an excited chromophore is expelled either as an odd electron species or as an anionic species. This reaction should be distinguished from an α- (alpha-) cleavage."} {"text":"User: I want to generate a definition.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The chemistry term can be described as:\nsupercritical drying of a gel\nAssistant: Got it, this definition fits your description: Drying of a gel using a supercritical fluid."}", "/scratch/micpie/export/iupac_goldbook/test_0-0.jsonl": "{"text":"The chemistry term \"α- (β-, γ-) ray spectrometer\" can be defined by:\nA measuring assembly incorporating a radiation detector and a pulse amplitude, used for determining the energy spectrum of α (β, γ) radiation."} {"text":"The chemistry term \"peptization\" can be described by:\nReversal of coagulation or flocculation, i.e., the dispersion of agglomerates to form a colloidally stablesuspension or emulsion."}", "/scratch/micpie/export/iupac_goldbook/valid_0-7.jsonl": "{"text":"User: I want to create a chemistry term.\nAssistant: How is the chemistry term described?\nUser: The chemistry term can be described as:\nA general reaction by which a group attached to the alpha carbon of an excited chromophore is expelled either as an odd electron species or as an anionic species. 
This reaction should be distinguished from an α- (alpha-) cleavage.\nAssistant: Got it, this chemistry term fits this description: α-expulsion"} {"text":"User: I want to generate a chemistry term.\nAssistant: How is the chemistry term described?\nUser: The chemistry term can be described as:\nDrying of a gel using a supercritical fluid.\nAssistant: Ok, this chemistry term fits your description: supercritical drying of a gel"}", "/scratch/micpie/export/iupac_goldbook/test_0-3.jsonl": "{"text":"User: Can you give me a chemistry term defined by:\nA measuring assembly incorporating a radiation detector and a pulse amplitude, used for determining the energy spectrum of α (β, γ) radiation.\nAssistant: Of course, here you go: α- (β-, γ-) ray spectrometer"} {"text":"User: Can you give me a chemistry term defined by:\nReversal of coagulation or flocculation, i.e., the dispersion of agglomerates to form a colloidally stablesuspension or emulsion.\nAssistant: Of course, here you go: peptization"}", "/scratch/micpie/export/iupac_goldbook/train_0-0.jsonl": "{"text":"The chemistry term \"α-decay\" can be described as:\nRadioactive decay in which an alpha particle is emitted."} {"text":"The chemistry term \"viscous sintering\" can be defined as:\nSintering process by which it is possible to densify gels to glasses and ceramics at elevated temperatures."}", "/scratch/micpie/export/iupac_goldbook/test_0-6.jsonl": "{"text":"User: I want to create a definition.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. The chemistry term can be described as:\nα- (β-, γ-) ray spectrometer\nAssistant: Got it, here you go, this definition fits your description: A measuring assembly incorporating a radiation detector and a pulse amplitude, used for determining the energy spectrum of α (β, γ) radiation."} {"text":"User: I want to come up with a definition.\nAssistant: This sounds very interesting. 
Should I consider any specific points for the generation?\nUser: Yes, please. The chemistry term can be described by:\npeptization\nAssistant: Got it, here you go, this definition fits your description: Reversal of coagulation or flocculation, i.e., the dispersion of agglomerates to form a colloidally stablesuspension or emulsion."}", "/scratch/micpie/export/iupac_goldbook/train_0-3.jsonl": "{"text":"User: Can you give me a chemistry term described by:\nRadioactive decay in which an alpha particle is emitted.\nAssistant: Sure, here you go: α-decay"} {"text":"User: Can you generate a chemistry term defined by:\nSintering process by which it is possible to densify gels to glasses and ceramics at elevated temperatures.\nAssistant: Sure, here you go: viscous sintering"}", "/scratch/micpie/export/iupac_goldbook/valid_0-2.jsonl": "{"text":"Task: Please give me a chemistry term for the definition:\nDefinition: A general reaction by which a group attached to the alpha carbon of an excited chromophore is expelled either as an odd electron species or as an anionic species. This reaction should be distinguished from an α- (alpha-) cleavage.\nResult: α-expulsion"} {"text":"Task: Please create a chemistry term for the definition:\nDefinition: Drying of a gel using a supercritical fluid.\nResult: supercritical drying of a gel"}", "/scratch/micpie/export/iupac_goldbook/valid_0-1.jsonl": "{"text":"Task: Please generate a text definition of a chemistry term.\nTerm: α-expulsion\nConstraint: Answer the question with complete sentences.\nResult: A general reaction by which a group attached to the alpha carbon of an excited chromophore is expelled either as an odd electron species or as an anionic species. 
This reaction should be distinguished from an α- (alpha-) cleavage."} {"text":"Task: Please create a definition of a chemistry term.\nTerm: supercritical drying of a gel\nConstraint: Answer the question with complete sentences.\nResult: Drying of a gel using a supercritical fluid."}", "/scratch/micpie/export/iupac_goldbook/valid_0-5.jsonl": "{"text":"User: I'm searching for the chemistry term that can be described by:\nA general reaction by which a group attached to the alpha carbon of an excited chromophore is expelled either as an odd electron species or as an anionic species. This reaction should be distinguished from an α- (alpha-) cleavage.\nAssistant: This chemistry term fits this definition: α-expulsion"} {"text":"User: I'm searching for the chemistry term that can be described as:\nDrying of a gel using a supercritical fluid.\nAssistant: This chemistry term fits this definition: supercritical drying of a gel"}", "/scratch/micpie/export/iupac_goldbook/valid_0-4.jsonl": "{"text":"User: Can you generate the definition for the following chemistry term:\nα-expulsion\nAssistant: Of course, here you go:\nA general reaction by which a group attached to the alpha carbon of an excited chromophore is expelled either as an odd electron species or as an anionic species. 
This reaction should be distinguished from an α- (alpha-) cleavage."} {"text":"User: Can you generate the text definition for the following chemistry term:\nsupercritical drying of a gel\nAssistant: Yes, here you go:\nDrying of a gel using a supercritical fluid."}", "/scratch/micpie/export/iupac_goldbook/train_0-5.jsonl": "{"text":"User: I'm looking for the chemistry term that can be described as:\nRadioactive decay in which an alpha particle is emitted.\nAssistant: This chemistry term fits your definition: α-decay"} {"text":"User: I'm looking for the chemistry term that can be described by:\nSintering process by which it is possible to densify gels to glasses and ceramics at elevated temperatures.\nAssistant: This chemistry term fits your definition: viscous sintering"}", "/scratch/micpie/export/iupac_goldbook/train_0-2.jsonl": "{"text":"Task: Please create a chemistry term for the text definition:\nDefinition: Radioactive decay in which an alpha particle is emitted.\nResult: α-decay"} {"text":"Task: Please generate a chemistry term for the following text definition:\nDefinition: Sintering process by which it is possible to densify gels to glasses and ceramics at elevated temperatures.\nResult: viscous sintering"}", "/scratch/micpie/export/iupac_goldbook/train_0-7.jsonl": "{"text":"User: I want to generate a chemistry term.\nAssistant: How is the chemistry term described?\nUser: The chemistry term can be described as:\nRadioactive decay in which an alpha particle is emitted.\nAssistant: Ok, here you go, this chemistry term fits this description: α-decay"} {"text":"User: I want to generate a chemistry term.\nAssistant: This sounds very exciting. 
How is the chemistry term described?\nUser: The chemistry term can be described as:\nSintering process by which it is possible to densify gels to glasses and ceramics at elevated temperatures.\nAssistant: Got it, here you go, this chemistry term fits your description: viscous sintering"}", "/scratch/micpie/export/iupac_goldbook/train_0-1.jsonl": "{"text":"Task: Please generate a definition of a chemistry term.\nTerm: α-decay\nConstraint: Answer the question with complete sentences.\nResult: Radioactive decay in which an alpha particle is emitted."} {"text":"Task: Please give me a definition of a chemistry term.\nTerm: viscous sintering\nConstraint: Answer the question with full sentences.\nResult: Sintering process by which it is possible to densify gels to glasses and ceramics at elevated temperatures."}", "/scratch/micpie/export/iupac_goldbook/train_0-4.jsonl": "{"text":"User: Can you create the text definition for the following chemistry term:\nα-decay\nAssistant: Sure, here you go:\nRadioactive decay in which an alpha particle is emitted."} {"text":"User: Can you give me the text definition for the following chemistry term:\nviscous sintering\nAssistant: Yes, here you go:\nSintering process by which it is possible to densify gels to glasses and ceramics at elevated temperatures."}", "/scratch/micpie/export/iupac_goldbook/test_0-7.jsonl": "{"text":"User: I want to generate a chemistry term.\nAssistant: This sounds very interesting. How is the chemistry term described?\nUser: The chemistry term can be described as:\nA measuring assembly incorporating a radiation detector and a pulse amplitude, used for determining the energy spectrum of α (β, γ) radiation.\nAssistant: Ok, here you go, this chemistry term fits your description: α- (β-, γ-) ray spectrometer"} {"text":"User: I want to generate a chemistry term.\nAssistant: This sounds very exciting. 
How is the chemistry term described?\nUser: The chemistry term can be described as:\nReversal of coagulation or flocculation, i.e., the dispersion of agglomerates to form a colloidally stablesuspension or emulsion.\nAssistant: Ok, this chemistry term fits this description: peptization"}", "/scratch/micpie/export/iupac_goldbook/valid_0-3.jsonl": "{"text":"User: Can you give me a chemistry term defined by:\nA general reaction by which a group attached to the alpha carbon of an excited chromophore is expelled either as an odd electron species or as an anionic species. This reaction should be distinguished from an α- (alpha-) cleavage.\nAssistant: Sure, here you go: α-expulsion"} {"text":"User: Can you give me a chemistry term defined by:\nDrying of a gel using a supercritical fluid.\nAssistant: Yes, I'm happy to help, here you go: supercritical drying of a gel"}", "/scratch/micpie/export/iupac_goldbook/test_0-4.jsonl": "{"text":"User: Can you give me the text definition for the following chemistry term:\nα- (β-, γ-) ray spectrometer\nAssistant: Yes, I'm happy to help, here you go:\nA measuring assembly incorporating a radiation detector and a pulse amplitude, used for determining the energy spectrum of α (β, γ) radiation."} {"text":"User: Can you give me the definition for the following chemistry term:\npeptization\nAssistant: Yes, I'm happy to help, here you go:\nReversal of coagulation or flocculation, i.e., the dispersion of agglomerates to form a colloidally stablesuspension or emulsion."}", "/scratch/micpie/export/bio_ner_14/valid_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: The known compounds were identified as 3 beta-hydroxytaraxer-14-en-28-oic acid or aleuritolic acid (5) (McPhail, A. T., McPhail, D. R., Wani, M. C., Wall, M. E. & A. W., Nicholas, A. W. (1989). Identity of maprounic acid with aleuritolic acid. 
Revision of the structure of maprounic acid: X-ray crystal structure of p-bromobenzyl acetylmaprounate. Journal Natural Products, 52, 212), 3 alpha-hydroxytaraxer-14-en-28-oic acid or isoaleuritolic acid (6), 3 alpha-acetyltaraxer-14-en-28-oic acid acetate or aleuritolic acid acetate (7) (Chaudhuri, S. K., Fullas, F., Brown, D. M., Wani, M. C., Wall, M. E., Cai, L.,... Kinghorn, A. D. (1995). Isolation and structural elucidation of pentacyclic triterpenoids from Maprounea africana. Journal of Natural Products, 58, 1-9), 3-oxo-taraxer-14-ene or taraxerone (8) beta-sitosterol (9) and stigmasterol (10) (Kamboj & Saluja, 2011), together with fatty acids..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: 3 beta - hydroxytaraxer - 14 - en - 28 - oic acid,39,88,Chemical\/Drug\naleuritolic acid,92,108,Chemical\/Drug\nmaprounic acid,219,233,Chemical\/Drug\naleuritolic acid,239,255,Chemical\/Drug\nmaprounic acid,286,300,Chemical\/Drug\np - bromobenzyl acetylmaprounate,331,363,Chemical\/Drug\n3 alpha - hydroxytaraxer - 14 - en - 28 - oic acid,401,451,Chemical\/Drug\nisoaleuritolic acid,455,474,Chemical\/Drug\n3 alpha - acetyltaraxer - 14 - en - 28 - oic acid acetate,481,538,Chemical\/Drug\naleuritolic acid acetate,542,566,Chemical\/Drug\n3 - oxo - taraxer - 14 - ene,813,841,Chemical\/Drug\ntaraxerone,845,855,Chemical\/Drug\nbeta - sitosterol,861,878,Chemical\/Drug\nstigmasterol,888,900,Chemical\/Drug"} {"text":"Task: Please carry out the NER task for the the text below.\nText: We demonstrate that: 1) RANTES promoter activity is up-regulated by PMA plus ionomycin, coexpression of the p65 subunit of nuclear factor (NF)-kappa B, the proinflammatory cytokines TNF-alpha and IL-1 beta, and the CD28 costimulatory pathway; 2) the RANTES promoter region contains four NF-kappa B binding sites at positions-30,-44,-213, and-579 relative to the transcription start site; 3) 
one site (-213) is an NF-AT (nuclear factor of activated T cells) binding site that also has weak affinity to NF-kappa B, and the most distal site (-579) also serves as a CD28-responsive element; and 4) mutation on any of those NF-kappa B sites or coexpression of I kappa B alpha (cytoplasmic inhibitor of NF-kappa B) markedly reduced the promoter activity..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: RANTES,24,30,Gene\/Protein\np65 subunit,110,121,Gene\/Protein\nproinflammatory cytokines,161,186,Gene\/Protein\nTNF - alpha,187,198,Gene\/Protein\nIL - 1 beta,203,214,Gene\/Protein\nCD28,224,228,Gene\/Protein\nRANTES promoter region,259,281,Gene\/Protein\nNF - kappa B binding sites,296,322,Gene\/Protein\ntranscription start site,381,405,Gene\/Protein\nNF - kappa B,525,537,Gene\/Protein\nCD28 - responsive element,590,615,Gene\/Protein\nNF - kappa B sites,649,667,Gene\/Protein\nI kappa B alpha,687,702,Gene\/Protein\ncytoplasmic inhibitor of NF - kappa B,705,742,Gene\/Protein"}", "/scratch/micpie/export/bio_ner_14/test_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: OCPs (hexachlorocyclohexan (HCH), aldrin, dieldrin, endosulfan, pp '-dichlorodiphenyldich (pp '-DDE), op '-DDE, pp '-dichlorodiphenyltric (pp '-DDT), op '-DDT, pp '-dichlorodiphenyldich (pp '-DDD) and op '-DDD) were extracted from blood and quantitatively estimated using gas chromatography..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: hexachlorocyclohexan,7,27,Chemical\/Drug\nHCH,30,33,Chemical\/Drug\naldrin,36,42,Chemical\/Drug\ndieldrin,44,52,Chemical\/Drug\nendosulfan,54,64,Chemical\/Drug\npp ' - dichlorodiphenyldich,66,93,Chemical\/Drug\npp ' - DDE,96,106,Chemical\/Drug\nop ' - DDE,109,119,Chemical\/Drug\npp ' - 
dichlorodiphenyltric,121,148,Chemical\/Drug\npp ' - DDT,151,161,Chemical\/Drug\nop ' - DDT,164,174,Chemical\/Drug\npp ' - dichlorodiphenyldich,176,203,Chemical\/Drug\npp ' - DDD,206,216,Chemical\/Drug\nop ' - DDD,222,232,Chemical\/Drug"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Primary antibodies used were: rabbit anti-p75 (1: 1500, generous gift from Louis Reichardt, UCSF [ 26]), mouse IgG2b anti-Hu C\/D, (1: 250, Molecular Probes); mouse IgG1 anti-Islet-1, (1: 10, Developmental Studies Hybridoma Bank); rabbit anti-chicken TrkA (1: 500); rabbit anti-chicken TrkB (1: 500); rabbit anti-chicken TrkC (1: 500) (all Trk antibodies were generous gifts of Dr. Louis Reichardt, UCSF [ 26-28]); mouse anti-HNK-1 (1: 50, Developmental Studies Hybridoma Bank); mouse IgG2a anti-tyrosine hydroxylase (1: 10, Developmental Studies Hybridoma Bank), sheep anti-BrdU (1: 100, Biodesign International), rabbit anti-tyrosine hydroxylase (1: 100, Chemicon), and goat anti-TrkB (1: 1000, R & D Systems)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: rabbit,30,36,Organism\/Species\nmouse,108,113,Organism\/Species\nmouse,166,171,Organism\/Species\nrabbit,243,249,Organism\/Species\nchicken,257,264,Organism\/Species\nrabbit,281,287,Organism\/Species\nchicken,295,302,Organism\/Species\nrabbit,319,325,Organism\/Species\nchicken,333,340,Organism\/Species\nmouse,439,444,Organism\/Species\nmouse,508,513,Organism\/Species\nsheep,596,601,Organism\/Species\nrabbit,650,656,Organism\/Species\ngoat,710,714,Organism\/Species"}", "/scratch/micpie/export/bio_ner_14/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Interaction was apparently determined by the N-terminal splice region of RPDE-6, as the PDE4A splice variant RPDE-39, which differs from RPDE-6 at the extreme N-terminus, failed to 
associate with v-Src-SH3; met26RD1 (where RD1 is rat'dunc-like'PDE), which has the N-terminal splice region deleted, failed to associate with v-Src-SH3, and the association of RPDE-6 and v-Src-SH3 was blocked by a fusion protein formed from the N-terminal splice region..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: RPDE - 6,75,83,Gene\/Protein\nPDE4A,92,97,Gene\/Protein\nRPDE - 39,113,122,Gene\/Protein\nRPDE - 6,143,151,Gene\/Protein\nv - Src,206,213,Gene\/Protein\nSH3,216,219,Gene\/Protein\nmet26RD1,221,229,Gene\/Protein\nRD1,238,241,Gene\/Protein\nrat ' dunc - like ' PDE,245,268,Gene\/Protein\nv - Src,346,353,Gene\/Protein\nSH3,356,359,Gene\/Protein\nRPDE - 6,384,392,Gene\/Protein\nv - Src,397,404,Gene\/Protein\nSH3,407,410,Gene\/Protein"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: NF-kappaB activation by LMP1 (1-231) is likely to be mediated by TRAF1\/TRAF2 heteroaggregates since TRAF1 is unique among the TRAFs in coactivating NF-kappaB with LMP1 (1-231), a TRAF2 dominant-negative mutant can block LMP1 (1-231)-mediated NF-kappaB activation as well as TRAF1 coactivation, and 30% of TRAF2 is associated with TRAF1 in EBV-transformed B cells..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: NF - kappaB,0,11,Gene\/Protein\nLMP1,26,30,Gene\/Protein\nTRAF1,70,75,Gene\/Protein\n\/ TRAF2,76,83,Gene\/Protein\nTRAF1,107,112,Gene\/Protein\nTRAFs,133,138,Gene\/Protein\nNF - kappaB,155,166,Gene\/Protein\nLMP1,172,176,Gene\/Protein\nTRAF2,191,196,Gene\/Protein\nLMP1,234,238,Gene\/Protein\nNF - kappaB,261,272,Gene\/Protein\nTRAF1,295,300,Gene\/Protein\nTRAF2,326,331,Gene\/Protein\nTRAF1,351,356,Gene\/Protein"}", "/scratch/micpie/export/drug_protein_disease/test_0-1.jsonl": 
"{"text":"The drug CC(C)Cc1ccc(C(C)C(=O)O)cc1 targets the protein Apoptosis regulator Bcl-2. The protein Apoptosis regulator Bcl-2 is related to the disease Chronic lymphocytic leukemia."} {"text":"The drug NC1=NC2=C(NC=C2)C(=O)N1 targets the protein HGPRT. The protein HGPRT is related to the disease Lesch-Nyhan syndrome."}", "/scratch/micpie/export/drug_protein_disease/valid_0-0.jsonl": "{"text":"The drug Caffeine targets the protein Serine-protein kinase ATM which is related to the disease Ataxia-telangiectasia."} {"text":"The drug Mercaptopurine targets the protein HGPRT which is related to the disease Lesch-Nyhan syndrome."}", "/scratch/micpie/export/drug_protein_disease/test_0-2.jsonl": "{"text":"User: Can you come up with one example for a protein that is targeted by the drug Ibuprofen?\nAssistant: Yes, the protein Apoptosis regulator Bcl-2 is targeted by the above drug.\nUser: Can you tell me which disease the protein Apoptosis regulator Bcl-2 is related to?\nAssistant: The protein Apoptosis regulator Bcl-2 is related to the Chronic lymphocytic leukemia disease."} {"text":"User: Can you come up with one example for a protein that is targeted by the drug 9-Deazaguanine?\nAssistant: Yes, the protein Hypoxanthine-guanine phosphoribosyltransferase is targeted by this drug.\nUser: Can you tell me which disease the protein Hypoxanthine-guanine phosphoribosyltransferase is related to?\nAssistant: The protein Hypoxanthine-guanine phosphoribosyltransferase is related to the Lesch-Nyhan syndrome disease."}", "/scratch/micpie/export/drug_protein_disease/test_0-0.jsonl": "{"text":"The drug Ibuprofen targets the protein Apoptosis regulator Bcl-2 which is related to the disease Chronic lymphocytic leukemia."} {"text":"The drug 9-Deazaguanine targets the protein Hypoxanthine-guanine phosphoribosyltransferase which is related to the disease Lesch-Nyhan syndrome."}", "/scratch/micpie/export/drug_protein_disease/train_0-0.jsonl": "{"text":"The drug Paclitaxel targets the 
protein Apoptosis regulator Bcl-2 which is related to the disease Chronic lymphocytic leukemia."} {"text":"The drug InChI=1S\/C3H7NO2\/c1-2(4)3(5)6\/h2H,4H2,1H3,(H,5,6)\/t2-\/m0\/s1 targets the protein Alanyl-tRNA synthetase which is related to the disease Leukoencephalopathy, progressive, with ovarian failure."}", "/scratch/micpie/export/drug_protein_disease/valid_0-2.jsonl": "{"text":"User: Can you come up with one example for a protein that is targeted by the drug Caffeine?\nAssistant: Yes, the protein Ataxia telangiectasia mutated is targeted by the above drug.\nUser: Can you tell me which disease the protein Ataxia telangiectasia mutated is related to?\nAssistant: The protein Ataxia telangiectasia mutated is related to the Ataxia-telangiectasia disease."} {"text":"User: Can you give me one example for a protein that is targeted by the drug Mercaptopurine?\nAssistant: Yes, the protein Hypoxanthine-guanine phosphoribosyltransferase is targeted by this drug.\nUser: Can you tell me which disease the protein Hypoxanthine-guanine phosphoribosyltransferase is related to?\nAssistant: The protein Hypoxanthine-guanine phosphoribosyltransferase is related to the Lesch-Nyhan syndrome disease."}", "/scratch/micpie/export/drug_protein_disease/valid_0-1.jsonl": "{"text":"The drug CNC=NC=C5C=O)NC)C=O)N6C targets the protein A-T mutated. The protein A-T mutated is related to the disease Ataxia-telangiectasia."} {"text":"The drug S=CN=CNC=C6NC=N5 targets the protein HGPRTase. 
The protein HGPRTase is related to the disease Lesch-Nyhan syndrome."}", "/scratch/micpie/export/drug_protein_disease/train_0-2.jsonl": "{"text":"User: Can you come up with one example for a protein that is targeted by the drug Paclitaxel?\nAssistant: Of course, the protein Apoptosis regulator Bcl-2 is targeted by the above drug.\nUser: Can you tell me which disease the protein Apoptosis regulator Bcl-2 is related to?\nAssistant: The protein Apoptosis regulator Bcl-2 is related to the Chronic lymphocytic leukemia disease."} {"text":"User: Can you come up with an example for a protein that is targeted by the drug Alanine?\nAssistant: Yes, the protein Alanyl-tRNA synthetase is targeted by this drug.\nUser: Can you tell me which disease the protein Alanyl-tRNA synthetase is related to?\nAssistant: The protein Alanyl-tRNA synthetase is related to the Leukoencephalopathy, progressive, with ovarian failure disease."}", "/scratch/micpie/export/drug_protein_disease/train_0-1.jsonl": "{"text":"The drug Paclitaxel targets the protein Apoptosis regulator Bcl-2. The protein Apoptosis regulator Bcl-2 is related to the disease Chronic lymphocytic leukemia."} {"text":"The drug Alanine targets the protein Alanine--tRNA ligase, mitochondrial. The protein Alanine--tRNA ligase, mitochondrial is related to the disease Leukoencephalopathy, progressive, with ovarian failure."}", "/scratch/micpie/export/physics_stackexchange/test_0-1.jsonl": "{"text":"Task: Create a meaningful title for this question.\nInquiry: In quantum mechanics, the probability, say, that a radioactive atom will decay is well defined. By the Born Rule, it says that the probability of obtaining any possible measurement outcome is equal to the square of the corresponding amplitude. 
However, the actual decay point and the actual measurement outcome are individually unpredictable even if one knew everything there is to know about the world.\nBut if, say, these things were unpredictable and happening for no reason, why does the Born’s rule exist in the first place? Why is it the square of the amplitude and not 1\/4 or 1\/8?\nSecondly, why does the way this work stay constant? If for example, a radioactive atom’s decay point is truly happening for no reason, why doesn’t it just suddenly start decaying with different probabilities?\nBut because this does not happen, does it not still indicate some level of order? If there is order, where does this come from without determinism?\nI am aware that “randomness” can create order through the law of large numbers but that originally interpreted randomness as a mere function of ignorance going back to Laplace’s time. This doesn’t fully explain how or why indeterminism can lead to specific probability functions over others and why they generate particular kinds of order over others.\nIf one cannot in principle predict what will happen, then presumably, what is happening is occurring for no reason. But if what is happening is occurring for no reason, why are there certain probabilistic laws in the first place?\nGiven the lack of explanation here, how are we sure that the universe is indeterministic? Is it possible that our theorems deciding that no local hidden variables can exist are simply incorrect?\nOf course, even if the universe was deterministic, it would beg the question of how those laws came about. 
But laws always existing seem to be more explainable than laws ultimately appearing without cause while also staying constant.\nAssistant: If standard quantum mechanics disproves determinism, why are probabilities the way they are?"} {"text":"Task: Create a meaningful title for this question.\nInquiry: For a project, I recorded the sounds of a boccee ball impacting with some ping pong balls in a container using Audacity. I also used a sound pressure level meter to record the maximum dB C that was produced from the impact. How do I analyse this data to find the energy in the sound wave.\nHow do I find the energy of sound from Audacity?"}", "/scratch/micpie/export/physics_stackexchange/valid_0-0.jsonl": "{"text":"Task: Please answer the question of the user.\n\\nIt is implied, per QM, that the behavior of subatomic particles cannot be precisely predicted. However, these indeterministic effects do have defined probabilities. By the law of large numbers, they can “average” out and result in approximately deterministic laws.\nFor this reason, I presume, we can predict with pinpoint accuracy whether or not atleast some kinds of events will happen in the macro scale even if we can’t know their minute details on a subatomic level.\nThe question then is how fine or loose grained of an event is predictable given all knowledge about antecedent conditions. And how antecedent must these conditions be?\nSuppose I woke up today at 9 AM and ate toast for breakfast. If I were to know **everything** that could be possibly known about the configuration of the universe right after the Big Bang, is this event predictable? Can one say, given that knowledge, with assuredness whether or not this will happen?\nAnswer: When thinking about the entirety of the Universe in terms of QM you will very quickly run into paradoxes. That's why I don't think we are at a point when your question can be meaningfully answered. 
For instance, the Universe is by definition a closed system (there is nothing else but it). So it must be in a pure state. Therefore its entropy must be zero ${\\it always}$. How does this agree with the Second Law, the most obvious physical law out there?"} {"text":"Task: Provide a clear and concise reply to the user's inquiry.\n\\nIn Griffiths electrodynamics, The maxwell stress tensor is used to determine the net force on the northern hemisphere of a uniformly charged solid sphere of radius R and charge Q. To do this, we solve the integral\n$$\\vec{F}=\\int\\_S\\vec{T}\\cdot d\\vec{a}$$\nwhere S is the closed surface enclosing the entire northern hemisphere and $\\vec{T}$ is the Maxwell stress tensor. The result we get is that $$\\vec{F}=\\int\\_S\\vec{T}\\cdot d\\vec{a}=\\frac{1}{4\\pi \\epsilon\\_o}\\frac{3Q^2}{16R^2}\\hat{z} \\, .$$\nThis is clearly a *non-zero net force in the $\\hat{z}$ direction*. Later in the text though, Griffiths goes on to say that $\\int\\_S\\vec{T}\\cdot d\\vec{a}$ represents the \"momentum per unit time flowing in through the surface\". But if we have already calculated that\n$$\\vec{F}=\\int\\_S\\vec{T}\\cdot d\\vec{a}=\\frac{1}{4\\pi \\epsilon\\_o}\\frac{3Q^2}{16R^2}\\hat{z}$$\nin the case of the hemisphere, then that means that a non-zero amount of momentum is flowing in through the surface of the hemisphere at any giving time. But how can this be if we can easily make the assumption that the uniformly charged sphere is static. That is, if we assume that there is some force that is holding the charges together, counteracting their mutual repulsion then the charged sphere remains intact and does not change in time and so its total momentum remains constant. If this is the case, then how can momentum be flowing into the surface enclosing the northern hemisphere? 
It seems to me like the interpretation of $\\vec{F}=\\int\\_S\\vec{T}\\cdot d\\vec{a}$ as the momentum per unit time flowing in through the surface is rendered untenable by this very example? Am I missing something in my interpretation or is griffiths interpretation actually incorrect here?\nAnswer: > \n> That is, if we assume that there is some force that is holding the charges together, counteracting their mutual repulsion then the charged sphere remains intact and does not change in time and so its total momentum remains constant.\n> \nAbsolutely correct. The problem lies in the assumption. The Maxwell stress tensor directly derives from the Lorentz force (or in your simplified example, from the Coulomb force) that a charge distribution exerts on itself. And this force would drive the charge distribution apart *if no other forces are present, that hold it together*. These other forces are, in your example, outside the description of the referenced Maxwell stress tensor. A complete description would also include another stress tensor $\\mathbf{T}\\_2$, that describes the binding forces. In this complete description, the net force over the whole hemisphere would be zero because the configuration is static, and hence, no momentum flux through the surface would occur.\nIf you do not know the whole stress tensor, as is the case in your example (unknown contribution of binding forces), it is not valid to consider the given (partial) stress tensor as a representation of net momentum flux.\nThis can be most lucidly expressed simply by citing Newton's second law (augmented by the superposition principle, \"lex quarta\"):\n$$\\frac{d\\mathbf{p}}{dt}=\\sum\\_i \\mathbf{F}\\_i$$\nIf the sum of all forces on the system equals zero, then the momentum of that system is conserved. 
Nevertheless, nobody keeps you from computing or measuring only $\\mathbf{F}\\_1$, but then you should not label this a momentum change.\nFor example, you fix a rope to the wall, and then you pull at the rope. The rope doesn't move, but if you erroneously ignore the force that the wall exerts on the rope, and just identify the force *you* apply to the rope, you get to the false conclusion that the rope were changing momentum."}", "/scratch/micpie/export/physics_stackexchange/test_0-0.jsonl": "{"text":"Task: Provide a clear and concise reply to the user's inquiry.\nInquiry: In quantum mechanics, the probability, say, that a radioactive atom will decay is well defined. By the Born Rule, it says that the probability of obtaining any possible measurement outcome is equal to the square of the corresponding amplitude. However, the actual decay point and the actual measurement outcome are individually unpredictable even if one knew everything there is to know about the world.\nBut if, say, these things were unpredictable and happening for no reason, why does the Born’s rule exist in the first place? Why is it the square of the amplitude and not 1\/4 or 1\/8?\nSecondly, why does the way this work stay constant? If for example, a radioactive atom’s decay point is truly happening for no reason, why doesn’t it just suddenly start decaying with different probabilities?\nBut because this does not happen, does it not still indicate some level of order? If there is order, where does this come from without determinism?\nI am aware that “randomness” can create order through the law of large numbers but that originally interpreted randomness as a mere function of ignorance going back to Laplace’s time. This doesn’t fully explain how or why indeterminism can lead to specific probability functions over others and why they generate particular kinds of order over others.\nIf one cannot in principle predict what will happen, then presumably, what is happening is occurring for no reason. 
But if what is happening is occurring for no reason, why are there certain probabilistic laws in the first place?\nGiven the lack of explanation here, how are we sure that the universe is indeterministic? Is it possible that our theorems deciding that no local hidden variables can exist are simply incorrect?\nOf course, even if the universe was deterministic, it would beg the question of how those laws came about. But laws always existing seem to be more explainable than laws ultimately appearing without cause while also staying constant.\nAnswer: \\*\\*\\* Why is it the square of the amplitude and not 1\/4 or 1\/8?\nIt's essential that the description of quantum phenomena use probability amplitudes that have the feature that they are complex numbers and can be negative. The amplitudes allow for interference, etc. We take the square modulus for the probability as that is what relates to how measurements are constructed.\n\\*\\*\\* Secondly, why does the way this work stay constant? If for example, a radioactive atom’s decay point is truly happening for no reason, why doesn’t it just suddenly start decaying with different probabilities?\nIt's not true that it isn't happening for no reason. If you look at the deeper theory what you'll find is the quantum vacuum interacts with the excited state to induce the emission."} {"text":"Task: Your role is to respond to the user's question with clarity.\n\\nFor a project, I recorded the sounds of a boccee ball impacting with some ping pong balls in a container using Audacity. I also used a sound pressure level meter to record the maximum dB C that was produced from the impact. How do I analyse this data to find the energy in the sound wave.\nAnswer: I don't know if you can do this directly in Audacity. But you can process the signal to estimate what you want as follows.\n1. Compute the amplitude of the signal from the sound pressure level.\n2. Normalize your signal to have an amplitude according to the previous step.\n3. 
Compute the time integral of the square of your signal.\nIf you use SI units you should obtain a value in Joules."}", "/scratch/micpie/export/physics_stackexchange/train_0-0.jsonl": "{"text":"Task: Offer a concise and informative answer to the user's question.\nInquiry: It is implied, per QM, that the behavior of subatomic particles cannot be precisely predicted. However, these indeterministic effects do have defined probabilities. By the law of large numbers, they can “average” out and result in approximately deterministic laws.\nFor this reason, I presume, we can predict with pinpoint accuracy whether or not atleast some kinds of events will happen in the macro scale even if we can’t know their minute details on a subatomic level.\nThe question then is how fine or loose grained of an event is predictable given all knowledge about antecedent conditions. And how antecedent must these conditions be?\nSuppose I woke up today at 9 AM and ate toast for breakfast. If I were to know **everything** that could be possibly known about the configuration of the universe right after the Big Bang, is this event predictable? Can one say, given that knowledge, with assuredness whether or not this will happen?\nAnswer: When thinking about the entirety of the Universe in terms of QM you will very quickly run into paradoxes. That's why I don't think we are at a point when your question can be meaningfully answered. For instance, the Universe is by definition a closed system (there is nothing else but it). So it must be in a pure state. Therefore its entropy must be zero ${\\it always}$. How does this agree with the Second Law, the most obvious physical law out there?"} {"text":"Task: Please answer the question of the user.\nQuestion: I was reading Peter Mann's Lagrangian & Hamiltonian Dynamics, and I found this equation (page 115):\n$$p\\_i := \\frac{\\partial L}{\\partial \\dot{q}^i}$$\nwhere L is the Lagrangian. 
I understand this is the definition of conjugate momentum, but I wanted to know if there is a particular reason for the momentum index to be a lower index and the coordinate index to be an upper index. Is it simply the author's preference or there is a deeper reason?\nAnswer: Usually, you would write your Lagrangian in some sort of form like:\n$$L = {\\dot q}^{i}{\\dot q}\\_{i} = g^{ij}{\\dot q}\\_{i}{\\dot q}\\_{j}$$, because the lagrangian itself is a scalar. Then, if you took a variation with respect to the \"downed\" version, you'd be left with\n$$\\frac{\\delta L}{\\delta {\\dot q}\\_i} = 2 g^{ij}{\\dot q}\\_{j} = 2 {\\dot q}^{i}$$\nSo, variation of a scalar with respect to a \"downed\" index leaves an \"upped\" index."}", "/scratch/micpie/export/physics_stackexchange/valid_0-1.jsonl": "{"text":"Task: Generate a title for this question.\nInquiry: It is implied, per QM, that the behavior of subatomic particles cannot be precisely predicted. However, these indeterministic effects do have defined probabilities. By the law of large numbers, they can “average” out and result in approximately deterministic laws.\nFor this reason, I presume, we can predict with pinpoint accuracy whether or not atleast some kinds of events will happen in the macro scale even if we can’t know their minute details on a subatomic level.\nThe question then is how fine or loose grained of an event is predictable given all knowledge about antecedent conditions. And how antecedent must these conditions be?\nSuppose I woke up today at 9 AM and ate toast for breakfast. If I were to know **everything** that could be possibly known about the configuration of the universe right after the Big Bang, is this event predictable? 
Can one say, given that knowledge, with assuredness whether or not this will happen?\nIf we were to know everything about the universe right after the Big Bang, can we predict me eating toast today?"} {"text":"Task: Create a meaningful title for this question.\nInquiry: In Griffiths electrodynamics, The maxwell stress tensor is used to determine the net force on the northern hemisphere of a uniformly charged solid sphere of radius R and charge Q. To do this, we solve the integral\n$$\\vec{F}=\\int\\_S\\vec{T}\\cdot d\\vec{a}$$\nwhere S is the closed surface enclosing the entire northern hemisphere and $\\vec{T}$ is the Maxwell stress tensor. The result we get is that $$\\vec{F}=\\int\\_S\\vec{T}\\cdot d\\vec{a}=\\frac{1}{4\\pi \\epsilon\\_o}\\frac{3Q^2}{16R^2}\\hat{z} \\, .$$\nThis is clearly a *non-zero net force in the $\\hat{z}$ direction*. Later in the text though, Griffiths goes on to say that $\\int\\_S\\vec{T}\\cdot d\\vec{a}$ represents the \"momentum per unit time flowing in through the surface\". But if we have already calculated that\n$$\\vec{F}=\\int\\_S\\vec{T}\\cdot d\\vec{a}=\\frac{1}{4\\pi \\epsilon\\_o}\\frac{3Q^2}{16R^2}\\hat{z}$$\nin the case of the hemisphere, then that means that a non-zero amount of momentum is flowing in through the surface of the hemisphere at any giving time. But how can this be if we can easily make the assumption that the uniformly charged sphere is static. That is, if we assume that there is some force that is holding the charges together, counteracting their mutual repulsion then the charged sphere remains intact and does not change in time and so its total momentum remains constant. If this is the case, then how can momentum be flowing into the surface enclosing the northern hemisphere? It seems to me like the interpretation of $\\vec{F}=\\int\\_S\\vec{T}\\cdot d\\vec{a}$ as the momentum per unit time flowing in through the surface is rendered untenable by this very example? 
Am I missing something in my interpretation or is griffiths interpretation actually incorrect here?\nAnswer: What is the correct interpretation of the Maxwell stress tensor?"}", "/scratch/micpie/export/physics_stackexchange/train_0-1.jsonl": "{"text":"Task: Summarize the question in a title.\nQuestion: It is implied, per QM, that the behavior of subatomic particles cannot be precisely predicted. However, these indeterministic effects do have defined probabilities. By the law of large numbers, they can “average” out and result in approximately deterministic laws.\nFor this reason, I presume, we can predict with pinpoint accuracy whether or not atleast some kinds of events will happen in the macro scale even if we can’t know their minute details on a subatomic level.\nThe question then is how fine or loose grained of an event is predictable given all knowledge about antecedent conditions. And how antecedent must these conditions be?\nSuppose I woke up today at 9 AM and ate toast for breakfast. If I were to know **everything** that could be possibly known about the configuration of the universe right after the Big Bang, is this event predictable? Can one say, given that knowledge, with assuredness whether or not this will happen?\nAnswer: If we were to know everything about the universe right after the Big Bang, can we predict me eating toast today?"} {"text":"Task: Generate a title for this question.\nQuestion: I was reading Peter Mann's Lagrangian & Hamiltonian Dynamics, and I found this equation (page 115):\n$$p\\_i := \\frac{\\partial L}{\\partial \\dot{q}^i}$$\nwhere L is the Lagrangian. I understand this is the definition of conjugate momentum, but I wanted to know if there is a particular reason for the momentum index to be a lower index and the coordinate index to be an upper index. 
Is it simply the author's preference or there is a deeper reason?\nAnswer: Conjugate momentum notation"}", "/scratch/micpie/export/peptides_soluble/test_0-10.jsonl": "{"text":"User: Can you create the amino acid sequence of a peptide that is soluble?\nAssistant: Yes, I'm happy to help, here you go: MKPKPSQFKNLERMLGLKTEQLDAVKVTIELKDKILIIENPTVIKMIAQGQEIYSVIGEAKEAQKEEPKVEIKDEDVKFVMEQTGKGEQEVKEALQKANGDIAKAILLLTGQET"} {"text":"User: Can you create the amino acid sequence of a peptide that is not soluble?\nAssistant: Of course, here you go: AVANILNRDYNRAQEILNGIKNPDATTYYLMAVVAARTNNLDVVINSLRESISLDSSMMKKAATDLEFAKYANDGGFKSLLRH"}", "/scratch/micpie/export/peptides_soluble/valid_0-8.jsonl": "{"text":"User: Can you estimate if the peptide with the amino acid sequence GSHMKETKKAGPIELPEELEAQRQRHNDPRRPPWPLLHQRVVLLREGKGAPEDIALMWEQTKHYYPADWLIPLELTQVLKYSSGKYLQTYVADPDEMRKEVLMQLLNVKYGRVSDPNGGRVNKDVEEIISMAVDDLENMDLNPAADAVLIPTHT is soluble?\nAssistant: Yes, this amino acid sequence is soluble."} {"text":"User: Can you derive if the peptide with the amino acid sequence MAKVCYFTGRKTVSANNRSHAMNKTKRVAKPNLQKVTVLIDGKPKKVWASARALKSGKVERVLEHHHHHH is soluble?\nAssistant: No, this amino acid sequence is not soluble."}", "/scratch/micpie/export/peptides_soluble/test_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which amino acid sequences are soluble?\nConstraint: You must select none, one or more options from a, b, c, or d without using any other words.\nOptions:\n[a] MKPKPSQFKNLERMLGLKTEQLDAVKVTIELKDKILIIENPTVIKMIAQGQEIYSVIGEAKEAQKEEPKVEIKDEDVKFVMEQTGKGEQEVKEALQKANGDIAKAILLLTGQET\n[b] MGSDKIHHHHHHADGNDPVENSVSTYLLEKTYGARSVTYEENNSDKLKLSELPAISLSEADHILSVLRKHTDAQEELDIQTATKGEQTWLRIVMKQTIDHKYAFTIQLNMNCYNDGSLYYGGYQAECSSSLIKWYLKGFSLATDNATKNYKFESQSYIYMKVIDNGIKYMQIPVTINGNYNPQNHDAAFSYNL\n[c] MEPRTIACPHKGCTKMFRDNSAMRKHLHTHGPRVHVCAECGKAFVESSKLKRHQLVHTGEKPFQCTFEGCGKRFSLDFNLRTHVRIHTGDRPYVCPFDGCNKKFAQSTNLKSHILTHAKAKNNQ\n[d] 
MMRPDAKVEKVYLYPKPVDFRKSIDGLAALVELDIKVAVFDPVLFVFLNRHRNRVKILYWERNGFCLWLKRLESERFKTSPDETDEAIVLTVQELNWLLDGFDLWRNRPHKVLTPRFVA\nAnswer: a, c"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which amino acid sequences are not soluble?\nConstraint: You must select none, one or more options from A or B without using any other words.\nOptions:\nA MRASEVLQKCLPNSLSGMHALRERALLHAVEALLHGRRLTLMDIARSWPSALRVRAPLKAVDRLLSNRNLQVERSVIDHEMAHWLLRGAQPVIVIDWSDLKPDKSWCLLRAAVPVGGRTLTLLDMVVPGKQQGLEHHHHHH\nB AVANILNRDYNRAQEILNGIKNPDATTYYLMAVVAARTNNLDVVINSLRESISLDSSMMKKAATDLEFAKYANDGGFKSLLRH\nAnswer: B"}", "/scratch/micpie/export/peptides_soluble/train_0-8.jsonl": "{"text":"User: Can you tell me if the peptide with the amino acid sequence MSLIPKFREFDRERHRTDYQKGMSYAEQQDFDMGFTIWFDHIEDLDLIEKDGTINRIVMMSTGLKDKNVKEIYESDIVRNLYGELYVVEWLDGSFVLTEFYNGGYDHYIIDSSTEYEVLGNIYENPELLEDDNHASNEGHHHHHH is soluble?\nAssistant: Yes, this amino acid sequence is soluble."} {"text":"User: Can you derive if the peptide with the amino acid sequence MTARARSALLDAEHIVGYTTYVELLPDEITEGADDIYNTPMCGEVSRTEEAIDRALAGNDVAIIGSGDPNVYALAGLALEIIESKGATATMLDFDVVPGVP is soluble?\nAssistant: No, this amino acid sequence is not soluble."}", "/scratch/micpie/export/peptides_soluble/test_0-5.jsonl": "{"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is soluble.\namino acid sequence : MKPKPSQFKNLERMLGLKTEQLDAVKVTIELKDKILIIENPTVIKMIAQGQEIYSVIGEAKEAQKEEPKVEIKDEDVKFVMEQTGKGEQEVKEALQKANGDIAKAILLLTGQET\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is soluble.\nsequence: AVANILNRDYNRAQEILNGIKNPDATTYYLMAVVAARTNNLDVVINSLRESISLDSSMMKKAATDLEFAKYANDGGFKSLLRH\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other 
words.\nResult: False"}", "/scratch/micpie/export/peptides_soluble/valid_0-9.jsonl": "{"text":"User: Is the peptide with the amino acid sequence GSHMKETKKAGPIELPEELEAQRQRHNDPRRPPWPLLHQRVVLLREGKGAPEDIALMWEQTKHYYPADWLIPLELTQVLKYSSGKYLQTYVADPDEMRKEVLMQLLNVKYGRVSDPNGGRVNKDVEEIISMAVDDLENMDLNPAADAVLIPTHT soluble?\nAssistant: Yes, it is soluble."} {"text":"User: Is the peptide with the amino acid sequence MAKVCYFTGRKTVSANNRSHAMNKTKRVAKPNLQKVTVLIDGKPKKVWASARALKSGKVERVLEHHHHHH soluble?\nAssistant: No, it is not soluble."}", "/scratch/micpie/export/peptides_soluble/test_0-1.jsonl": "{"text":"The amino acid sequence MKPKPSQFKNLERMLGLKTEQLDAVKVTIELKDKILIIENPTVIKMIAQGQEIYSVIGEAKEAQKEEPKVEIKDEDVKFVMEQTGKGEQEVKEALQKANGDIAKAILLLTGQET exhibits soluble properties."} {"text":"The amino acid sequence AVANILNRDYNRAQEILNGIKNPDATTYYLMAVVAARTNNLDVVINSLRESISLDSSMMKKAATDLEFAKYANDGGFKSLLRH exhibits no soluble properties."}", "/scratch/micpie/export/peptides_soluble/valid_0-0.jsonl": "{"text":"The sequence of amino acids GSHMKETKKAGPIELPEELEAQRQRHNDPRRPPWPLLHQRVVLLREGKGAPEDIALMWEQTKHYYPADWLIPLELTQVLKYSSGKYLQTYVADPDEMRKEVLMQLLNVKYGRVSDPNGGRVNKDVEEIISMAVDDLENMDLNPAADAVLIPTHT demonstrates soluble properties."} {"text":"The sequence of AAs MAKVCYFTGRKTVSANNRSHAMNKTKRVAKPNLQKVTVLIDGKPKKVWASARALKSGKVERVLEHHHHHH shows no soluble properties."}", "/scratch/micpie/export/peptides_soluble/test_0-2.jsonl": "{"text":"Based on the amino acid sequence representation MKPKPSQFKNLERMLGLKTEQLDAVKVTIELKDKILIIENPTVIKMIAQGQEIYSVIGEAKEAQKEEPKVEIKDEDVKFVMEQTGKGEQEVKEALQKANGDIAKAILLLTGQET, the peptide has soluble properties."} {"text":"Based on the amino acid sequence representation AVANILNRDYNRAQEILNGIKNPDATTYYLMAVVAARTNNLDVVINSLRESISLDSSMMKKAATDLEFAKYANDGGFKSLLRH, the peptide has no soluble features."}", "/scratch/micpie/export/peptides_soluble/valid_0-10.jsonl": "{"text":"User: Can you generate the amino acid sequence of a peptide that is soluble?\nAssistant: Of course, here you go: 
GSHMKETKKAGPIELPEELEAQRQRHNDPRRPPWPLLHQRVVLLREGKGAPEDIALMWEQTKHYYPADWLIPLELTQVLKYSSGKYLQTYVADPDEMRKEVLMQLLNVKYGRVSDPNGGRVNKDVEEIISMAVDDLENMDLNPAADAVLIPTHT"} {"text":"User: Can you generate the amino acid sequence of a peptide that is not soluble?\nAssistant: Of course, here you go: MAKVCYFTGRKTVSANNRSHAMNKTKRVAKPNLQKVTVLIDGKPKKVWASARALKSGKVERVLEHHHHHH"}", "/scratch/micpie/export/peptides_soluble/train_0-6.jsonl": "{"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is soluble.\nAA sequence: MSLIPKFREFDRERHRTDYQKGMSYAEQQDFDMGFTIWFDHIEDLDLIEKDGTINRIVMMSTGLKDKNVKEIYESDIVRNLYGELYVVEWLDGSFVLTEFYNGGYDHYIIDSSTEYEVLGNIYENPELLEDDNHASNEGHHHHHH\nConstraint: Answer the question in a full sentence.\nResult: This amino acid sequence is soluble."} {"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is soluble.\nAA sequence: MTARARSALLDAEHIVGYTTYVELLPDEITEGADDIYNTPMCGEVSRTEEAIDRALAGNDVAIIGSGDPNVYALAGLALEIIESKGATATMLDFDVVPGVP\nConstraint: Answer the question in a full sentence.\nResult: This amino acid sequence is not soluble."}", "/scratch/micpie/export/peptides_soluble/valid_0-6.jsonl": "{"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is soluble.\nAA sequence: GSHMKETKKAGPIELPEELEAQRQRHNDPRRPPWPLLHQRVVLLREGKGAPEDIALMWEQTKHYYPADWLIPLELTQVLKYSSGKYLQTYVADPDEMRKEVLMQLLNVKYGRVSDPNGGRVNKDVEEIISMAVDDLENMDLNPAADAVLIPTHT\nConstraint: Answer the question in a full sentence.\nResult: This amino acid sequence is soluble."} {"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is soluble.\nAA sequence: MAKVCYFTGRKTVSANNRSHAMNKTKRVAKPNLQKVTVLIDGKPKKVWASARALKSGKVERVLEHHHHHH\nConstraint: Answer the question in a full sentence.\nResult: This amino acid sequence is not soluble."}", 
"/scratch/micpie/export/peptides_soluble/test_0-9.jsonl": "{"text":"User: Is the peptide with the amino acid sequence MKPKPSQFKNLERMLGLKTEQLDAVKVTIELKDKILIIENPTVIKMIAQGQEIYSVIGEAKEAQKEEPKVEIKDEDVKFVMEQTGKGEQEVKEALQKANGDIAKAILLLTGQET soluble?\nAssistant: Yes, it is soluble."} {"text":"User: Is the peptide with the amino acid sequence AVANILNRDYNRAQEILNGIKNPDATTYYLMAVVAARTNNLDVVINSLRESISLDSSMMKKAATDLEFAKYANDGGFKSLLRH soluble?\nAssistant: No, it is not soluble."}", "/scratch/micpie/export/peptides_soluble/test_0-0.jsonl": "{"text":"The sequence of AAs MKPKPSQFKNLERMLGLKTEQLDAVKVTIELKDKILIIENPTVIKMIAQGQEIYSVIGEAKEAQKEEPKVEIKDEDVKFVMEQTGKGEQEVKEALQKANGDIAKAILLLTGQET shows soluble properties."} {"text":"The sequence of amino acids AVANILNRDYNRAQEILNGIKNPDATTYYLMAVVAARTNNLDVVINSLRESISLDSSMMKKAATDLEFAKYANDGGFKSLLRH demonstrates no soluble properties."}", "/scratch/micpie/export/peptides_soluble/valid_0-7.jsonl": "{"text":"Task: Please create a sequence based on the text description below.\nDescription: A amino acid sequence that is soluble.\nResult: GSHMKETKKAGPIELPEELEAQRQRHNDPRRPPWPLLHQRVVLLREGKGAPEDIALMWEQTKHYYPADWLIPLELTQVLKYSSGKYLQTYVADPDEMRKEVLMQLLNVKYGRVSDPNGGRVNKDVEEIISMAVDDLENMDLNPAADAVLIPTHT"} {"text":"Task: Please create a AA sequence based on the description.\nDescription: A amino acid sequence that is soluble.\nResult: MAKVCYFTGRKTVSANNRSHAMNKTKRVAKPNLQKVTVLIDGKPKKVWASARALKSGKVERVLEHHHHHH"}", "/scratch/micpie/export/peptides_soluble/test_0-3.jsonl": "{"text":"The amino acid sequence MKPKPSQFKNLERMLGLKTEQLDAVKVTIELKDKILIIENPTVIKMIAQGQEIYSVIGEAKEAQKEEPKVEIKDEDVKFVMEQTGKGEQEVKEALQKANGDIAKAILLLTGQET is from a peptide that is identified as soluble."} {"text":"The amino acid sequence AVANILNRDYNRAQEILNGIKNPDATTYYLMAVVAARTNNLDVVINSLRESISLDSSMMKKAATDLEFAKYANDGGFKSLLRH is from a peptide that is not identified as soluble."}", "/scratch/micpie/export/peptides_soluble/valid_0-11.jsonl": "{"text":"User: I'm searching for the amino acid sequence of a peptide that is 
soluble?\nAssistant: This is a amino acid sequence that is soluble: GSHMKETKKAGPIELPEELEAQRQRHNDPRRPPWPLLHQRVVLLREGKGAPEDIALMWEQTKHYYPADWLIPLELTQVLKYSSGKYLQTYVADPDEMRKEVLMQLLNVKYGRVSDPNGGRVNKDVEEIISMAVDDLENMDLNPAADAVLIPTHT"} {"text":"User: I'm looking for the amino acid sequence of a peptide that is not soluble?\nAssistant: This is a amino acid sequence that is not soluble: MAKVCYFTGRKTVSANNRSHAMNKTKRVAKPNLQKVTVLIDGKPKKVWASARALKSGKVERVLEHHHHHH"}", "/scratch/micpie/export/peptides_soluble/train_0-0.jsonl": "{"text":"The sequence of amino acids MSLIPKFREFDRERHRTDYQKGMSYAEQQDFDMGFTIWFDHIEDLDLIEKDGTINRIVMMSTGLKDKNVKEIYESDIVRNLYGELYVVEWLDGSFVLTEFYNGGYDHYIIDSSTEYEVLGNIYENPELLEDDNHASNEGHHHHHH demonstrates soluble properties."} {"text":"The sequence of AAs MTARARSALLDAEHIVGYTTYVELLPDEITEGADDIYNTPMCGEVSRTEEAIDRALAGNDVAIIGSGDPNVYALAGLALEIIESKGATATMLDFDVVPGVP exhibits no soluble properties."}", "/scratch/micpie/export/peptides_soluble/test_0-6.jsonl": "{"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is soluble.\nsequence: MKPKPSQFKNLERMLGLKTEQLDAVKVTIELKDKILIIENPTVIKMIAQGQEIYSVIGEAKEAQKEEPKVEIKDEDVKFVMEQTGKGEQEVKEALQKANGDIAKAILLLTGQET\nConstraint: Answer the question in a complete sentence.\nResult: This amino acid sequence is soluble."} {"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is soluble.\namino acid sequence : AVANILNRDYNRAQEILNGIKNPDATTYYLMAVVAARTNNLDVVINSLRESISLDSSMMKKAATDLEFAKYANDGGFKSLLRH\nConstraint: Answer the question in a full sentence.\nResult: This amino acid sequence is not soluble."}", "/scratch/micpie/export/peptides_soluble/train_0-10.jsonl": "{"text":"User: Can you create the amino acid sequence of a peptide that is soluble?\nAssistant: Of course, here you go: MSLIPKFREFDRERHRTDYQKGMSYAEQQDFDMGFTIWFDHIEDLDLIEKDGTINRIVMMSTGLKDKNVKEIYESDIVRNLYGELYVVEWLDGSFVLTEFYNGGYDHYIIDSSTEYEVLGNIYENPELLEDDNHASNEGHHHHHH"} 
{"text":"User: Can you generate the amino acid sequence of a peptide that is not soluble?\nAssistant: Sure, here you go: MTARARSALLDAEHIVGYTTYVELLPDEITEGADDIYNTPMCGEVSRTEEAIDRALAGNDVAIIGSGDPNVYALAGLALEIIESKGATATMLDFDVVPGVP"}", "/scratch/micpie/export/peptides_soluble/train_0-3.jsonl": "{"text":"The amino acid sequence MSLIPKFREFDRERHRTDYQKGMSYAEQQDFDMGFTIWFDHIEDLDLIEKDGTINRIVMMSTGLKDKNVKEIYESDIVRNLYGELYVVEWLDGSFVLTEFYNGGYDHYIIDSSTEYEVLGNIYENPELLEDDNHASNEGHHHHHH represents a peptide that is identified as soluble."} {"text":"The amino acid sequence MTARARSALLDAEHIVGYTTYVELLPDEITEGADDIYNTPMCGEVSRTEEAIDRALAGNDVAIIGSGDPNVYALAGLALEIIESKGATATMLDFDVVPGVP is from a peptide that is not identified as soluble."}", "/scratch/micpie/export/peptides_soluble/train_0-12.jsonl": "{"text":"User: I want to create a AA sequence.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The amino acid sequence should be soluble.\nAssistant: Got it, here you go, this amino acid sequence is soluble: MSLIPKFREFDRERHRTDYQKGMSYAEQQDFDMGFTIWFDHIEDLDLIEKDGTINRIVMMSTGLKDKNVKEIYESDIVRNLYGELYVVEWLDGSFVLTEFYNGGYDHYIIDSSTEYEVLGNIYENPELLEDDNHASNEGHHHHHH"} {"text":"User: I want to create a amino acid sequence .\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The amino acid sequence should not be soluble.\nAssistant: Ok, this amino acid sequence is not soluble: MTARARSALLDAEHIVGYTTYVELLPDEITEGADDIYNTPMCGEVSRTEEAIDRALAGNDVAIIGSGDPNVYALAGLALEIIESKGATATMLDFDVVPGVP"}", "/scratch/micpie/export/peptides_soluble/test_0-13.jsonl": "{"text":"User: I want to generate a AA sequence.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the amino acid sequence should be soluble.\nAssistant: Understood, this amino acid sequence is soluble: MKPKPSQFKNLERMLGLKTEQLDAVKVTIELKDKILIIENPTVIKMIAQGQEIYSVIGEAKEAQKEEPKVEIKDEDVKFVMEQTGKGEQEVKEALQKANGDIAKAILLLTGQET"} {"text":"User: I want to create a AA sequence.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the amino acid sequence should not be soluble.\nAssistant: Got it, this amino acid sequence is not soluble: AVANILNRDYNRAQEILNGIKNPDATTYYLMAVVAARTNNLDVVINSLRESISLDSSMMKKAATDLEFAKYANDGGFKSLLRH"}", "/scratch/micpie/export/peptides_soluble/valid_0-2.jsonl": "{"text":"Based on the amino acid sequence representation GSHMKETKKAGPIELPEELEAQRQRHNDPRRPPWPLLHQRVVLLREGKGAPEDIALMWEQTKHYYPADWLIPLELTQVLKYSSGKYLQTYVADPDEMRKEVLMQLLNVKYGRVSDPNGGRVNKDVEEIISMAVDDLENMDLNPAADAVLIPTHT, the peptide has soluble characteristics."} {"text":"Based on the amino acid sequence representation MAKVCYFTGRKTVSANNRSHAMNKTKRVAKPNLQKVTVLIDGKPKKVWASARALKSGKVERVLEHHHHHH, the peptide has no soluble characteristics."}", "/scratch/micpie/export/peptides_soluble/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the peptide with the amino acid sequence MSLIPKFREFDRERHRTDYQKGMSYAEQQDFDMGFTIWFDHIEDLDLIEKDGTINRIVMMSTGLKDKNVKEIYESDIVRNLYGELYVVEWLDGSFVLTEFYNGGYDHYIIDSSTEYEVLGNIYENPELLEDDNHASNEGHHHHHH soluble?\nConstraint: Even if you are uncertain, you must pick either a or b without using any other words.\nOptions:\na.) True\nb.) 
False\nAnswer: a"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the peptide with the amino acid sequence MTARARSALLDAEHIVGYTTYVELLPDEITEGADDIYNTPMCGEVSRTEEAIDRALAGNDVAIIGSGDPNVYALAGLALEIIESKGATATMLDFDVVPGVP soluble?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any additional words.\nOptions:\n1 False\n2 True\nAnswer: 1"}", "/scratch/micpie/export/peptides_soluble/valid_0-1.jsonl": "{"text":"The amino acid sequence GSHMKETKKAGPIELPEELEAQRQRHNDPRRPPWPLLHQRVVLLREGKGAPEDIALMWEQTKHYYPADWLIPLELTQVLKYSSGKYLQTYVADPDEMRKEVLMQLLNVKYGRVSDPNGGRVNKDVEEIISMAVDDLENMDLNPAADAVLIPTHT shows soluble properties."} {"text":"The amino acid sequence MAKVCYFTGRKTVSANNRSHAMNKTKRVAKPNLQKVTVLIDGKPKKVWASARALKSGKVERVLEHHHHHH shows no soluble properties."}", "/scratch/micpie/export/peptides_soluble/valid_0-13.jsonl": "{"text":"User: I want to come up with a amino acid sequence .\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the amino acid sequence should be soluble.\nAssistant: Ok, this amino acid sequence is soluble: GSHMKETKKAGPIELPEELEAQRQRHNDPRRPPWPLLHQRVVLLREGKGAPEDIALMWEQTKHYYPADWLIPLELTQVLKYSSGKYLQTYVADPDEMRKEVLMQLLNVKYGRVSDPNGGRVNKDVEEIISMAVDDLENMDLNPAADAVLIPTHT"} {"text":"User: I want to generate a AA sequence.\nAssistant: This sounds very interesting. 
Should it be a special amino acid sequence?\nUser: Yes, the amino acid sequence should not be soluble.\nAssistant: Understood, this amino acid sequence is not soluble: MAKVCYFTGRKTVSANNRSHAMNKTKRVAKPNLQKVTVLIDGKPKKVWASARALKSGKVERVLEHHHHHH"}", "/scratch/micpie/export/peptides_soluble/valid_0-5.jsonl": "{"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is soluble.\namino acid sequence : GSHMKETKKAGPIELPEELEAQRQRHNDPRRPPWPLLHQRVVLLREGKGAPEDIALMWEQTKHYYPADWLIPLELTQVLKYSSGKYLQTYVADPDEMRKEVLMQLLNVKYGRVSDPNGGRVNKDVEEIISMAVDDLENMDLNPAADAVLIPTHT\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is soluble.\nAA sequence: MAKVCYFTGRKTVSANNRSHAMNKTKRVAKPNLQKVTVLIDGKPKKVWASARALKSGKVERVLEHHHHHH\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/peptides_soluble/train_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which amino acid sequences are soluble?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any other words.\nOptions:\n1: MSLIPKFREFDRERHRTDYQKGMSYAEQQDFDMGFTIWFDHIEDLDLIEKDGTINRIVMMSTGLKDKNVKEIYESDIVRNLYGELYVVEWLDGSFVLTEFYNGGYDHYIIDSSTEYEVLGNIYENPELLEDDNHASNEGHHHHHH\n2: MQDGMYQRFLRQHVHPEETGGSDRYCNLMMQRRKMTLYHCKRFNTFIHEDIWNIRSICSTTNIQCKNGKMNCHEGVVKVTDCRDTGSSRAPNCRYRAIASTRRVVIACEGNPQVPVHFDG\n3: MAITVSIELNRDLEIPASYDEVFDLLADVPKSASHFPKVDKLVDLGNNAYRWEMEKVGVDKHAIQSVYACTYHADKEAGKITWSPIKGEGNGVVSGSWTLSAKGDNATAVKFQTSAELTVPLPSLLKLAISPVIKHEFNSLVDTYMANLKKAFLEHHHHHH\nAnswer: 1, 2, 3"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which amino acid sequences are not soluble?\nConstraint: You must select none, one or 
more options from a or b without using any additional words.\nOptions:\na.) GSMADTRRRQNHSCDPCRKGKRRCDAPENRNEANENGWVSCSNCKRWNKDCTFNWLSSQRSKNSS\nb.) MTARARSALLDAEHIVGYTTYVELLPDEITEGADDIYNTPMCGEVSRTEEAIDRALAGNDVAIIGSGDPNVYALAGLALEIIESKGATATMLDFDVVPGVP\nAnswer: b"}", "/scratch/micpie/export/peptides_soluble/valid_0-4.jsonl": "{"text":"The sequence of amino acids (AAs) GSHMKETKKAGPIELPEELEAQRQRHNDPRRPPWPLLHQRVVLLREGKGAPEDIALMWEQTKHYYPADWLIPLELTQVLKYSSGKYLQTYVADPDEMRKEVLMQLLNVKYGRVSDPNGGRVNKDVEEIISMAVDDLENMDLNPAADAVLIPTHT is soluble."} {"text":"The amino acid sequence MAKVCYFTGRKTVSANNRSHAMNKTKRVAKPNLQKVTVLIDGKPKKVWASARALKSGKVERVLEHHHHHH is not soluble."}", "/scratch/micpie/export/peptides_soluble/train_0-5.jsonl": "{"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is soluble.\namino acid sequence : MSLIPKFREFDRERHRTDYQKGMSYAEQQDFDMGFTIWFDHIEDLDLIEKDGTINRIVMMSTGLKDKNVKEIYESDIVRNLYGELYVVEWLDGSFVLTEFYNGGYDHYIIDSSTEYEVLGNIYENPELLEDDNHASNEGHHHHHH\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"} {"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is soluble.\nAA sequence: MTARARSALLDAEHIVGYTTYVELLPDEITEGADDIYNTPMCGEVSRTEEAIDRALAGNDVAIIGSGDPNVYALAGLALEIIESKGATATMLDFDVVPGVP\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/peptides_soluble/valid_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which amino acid sequences are soluble?\nConstraint: You must select none, one or more options from 1 or 2 without using any other words.\nOptions:\n[1] LIIGVTGCPAGIAHTYLAAEALEKGAAALGFEIKVETNGSIGVKNSPSAEEIERAEAIVVACDKQVDMARFAGKRLIKTNVKAPIRDAQKLINEALRAPTY\n[2] 
GSHMKETKKAGPIELPEELEAQRQRHNDPRRPPWPLLHQRVVLLREGKGAPEDIALMWEQTKHYYPADWLIPLELTQVLKYSSGKYLQTYVADPDEMRKEVLMQLLNVKYGRVSDPNGGRVNKDVEEIISMAVDDLENMDLNPAADAVLIPTHT\nAnswer: 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which amino acid sequences are not soluble?\nConstraint: You must select none, one or more options from 1, 2, 3, 4, or 5 without using any other words.\nOptions:\n1 MKMQKELSVYLKQILQAAQYIRLYTDKMDYGQFSADTKTVQAVVFNLFLIGENATHILKSYPEFAEETKYLNWIGMRGYCAIGLPMATLK\n2 MNDKKLLVAYDISSNRRRRKVARILEQCGIRINKSVFICSLRELTMDKLVEAVTSQTAKRDKVFFLPLCQHCYTAAWMSGHPTLPKSRRKRKSIVV\n3 MAVRKLTTGKWLCECYPAGRSGRRVRKQFATKGEALAFERHTMEETEAKPWLGESVDRRTLKDVVELWFKLHGKSLTAGQHVYDKLLLMVDALGNPLATDLTSKMFAHYRDKRLTGEIYFSEKWKKGASPVTINLEQSYLSSVFSELSRLGEWSYPNPLENMRKFTIA\n4 MQITEALISEPGEIRRFVQQAVDHWPNLLAFHFTLYSAEGIYGQQIQTFCSSFHRRVHERITEHNHTVSPSAPVVLRWLREQHEGAQIRCLLLLSQTSICHPRVGVMADEECAQLVDLLQQTWSVISAGGQCRVERCFRVARPGSSGQYVALKTAVQSFMSQVIATIIR\n5 MAKVCYFTGRKTVSANNRSHAMNKTKRVAKPNLQKVTVLIDGKPKKVWASARALKSGKVERVLEHHHHHH\nAnswer: 1, 2, 3, 4, 5"}", "/scratch/micpie/export/peptides_soluble/valid_0-12.jsonl": "{"text":"User: I want to create a AA sequence.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. The amino acid sequence should be soluble.\nAssistant: Ok, here you go, this amino acid sequence is soluble: GSHMKETKKAGPIELPEELEAQRQRHNDPRRPPWPLLHQRVVLLREGKGAPEDIALMWEQTKHYYPADWLIPLELTQVLKYSSGKYLQTYVADPDEMRKEVLMQLLNVKYGRVSDPNGGRVNKDVEEIISMAVDDLENMDLNPAADAVLIPTHT"} {"text":"User: I want to generate a amino acid sequence .\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The amino acid sequence should not be soluble.\nAssistant: Got it, this amino acid sequence is not soluble: MAKVCYFTGRKTVSANNRSHAMNKTKRVAKPNLQKVTVLIDGKPKKVWASARALKSGKVERVLEHHHHHH"}", "/scratch/micpie/export/peptides_soluble/train_0-2.jsonl": "{"text":"Based on the amino acid sequence representation MSLIPKFREFDRERHRTDYQKGMSYAEQQDFDMGFTIWFDHIEDLDLIEKDGTINRIVMMSTGLKDKNVKEIYESDIVRNLYGELYVVEWLDGSFVLTEFYNGGYDHYIIDSSTEYEVLGNIYENPELLEDDNHASNEGHHHHHH, the peptide has soluble characteristics."} {"text":"Based on the amino acid sequence MTARARSALLDAEHIVGYTTYVELLPDEITEGADDIYNTPMCGEVSRTEEAIDRALAGNDVAIIGSGDPNVYALAGLALEIIESKGATATMLDFDVVPGVP, the peptide has no soluble features."}", "/scratch/micpie/export/peptides_soluble/test_0-11.jsonl": "{"text":"User: I'm searching for the amino acid sequence of a peptide that is soluble?\nAssistant: This is a amino acid sequence that is soluble: MKPKPSQFKNLERMLGLKTEQLDAVKVTIELKDKILIIENPTVIKMIAQGQEIYSVIGEAKEAQKEEPKVEIKDEDVKFVMEQTGKGEQEVKEALQKANGDIAKAILLLTGQET"} {"text":"User: I'm looking for the amino acid sequence of a peptide that is not soluble?\nAssistant: This is a amino acid sequence that is not soluble: AVANILNRDYNRAQEILNGIKNPDATTYYLMAVVAARTNNLDVVINSLRESISLDSSMMKKAATDLEFAKYANDGGFKSLLRH"}", "/scratch/micpie/export/peptides_soluble/train_0-7.jsonl": "{"text":"Task: Please create a amino acid sequence based on the text description.\nDescription: A amino acid sequence that is soluble.\nResult: MSLIPKFREFDRERHRTDYQKGMSYAEQQDFDMGFTIWFDHIEDLDLIEKDGTINRIVMMSTGLKDKNVKEIYESDIVRNLYGELYVVEWLDGSFVLTEFYNGGYDHYIIDSSTEYEVLGNIYENPELLEDDNHASNEGHHHHHH"} {"text":"Task: Please generate a AA sequence based on the description.\nDescription: A amino acid sequence that is soluble.\nResult: MTARARSALLDAEHIVGYTTYVELLPDEITEGADDIYNTPMCGEVSRTEEAIDRALAGNDVAIIGSGDPNVYALAGLALEIIESKGATATMLDFDVVPGVP"}", "/scratch/micpie/export/peptides_soluble/train_0-11.jsonl": "{"text":"User: I'm looking for the amino acid sequence of a peptide that is soluble?\nAssistant: This is a 
amino acid sequence that is soluble: MSLIPKFREFDRERHRTDYQKGMSYAEQQDFDMGFTIWFDHIEDLDLIEKDGTINRIVMMSTGLKDKNVKEIYESDIVRNLYGELYVVEWLDGSFVLTEFYNGGYDHYIIDSSTEYEVLGNIYENPELLEDDNHASNEGHHHHHH"} {"text":"User: I'm searching for the amino acid sequence of a peptide that is not soluble?\nAssistant: This is a amino acid sequence that is not soluble: MTARARSALLDAEHIVGYTTYVELLPDEITEGADDIYNTPMCGEVSRTEEAIDRALAGNDVAIIGSGDPNVYALAGLALEIIESKGATATMLDFDVVPGVP"}", "/scratch/micpie/export/peptides_soluble/train_0-1.jsonl": "{"text":"The amino acid sequence MSLIPKFREFDRERHRTDYQKGMSYAEQQDFDMGFTIWFDHIEDLDLIEKDGTINRIVMMSTGLKDKNVKEIYESDIVRNLYGELYVVEWLDGSFVLTEFYNGGYDHYIIDSSTEYEVLGNIYENPELLEDDNHASNEGHHHHHH exhibits soluble properties."} {"text":"The amino acid sequence MTARARSALLDAEHIVGYTTYVELLPDEITEGADDIYNTPMCGEVSRTEEAIDRALAGNDVAIIGSGDPNVYALAGLALEIIESKGATATMLDFDVVPGVP displays no soluble properties."}", "/scratch/micpie/export/peptides_soluble/train_0-13.jsonl": "{"text":"User: I want to generate a sequence.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the amino acid sequence should be soluble.\nAssistant: Ok, this amino acid sequence is soluble: MSLIPKFREFDRERHRTDYQKGMSYAEQQDFDMGFTIWFDHIEDLDLIEKDGTINRIVMMSTGLKDKNVKEIYESDIVRNLYGELYVVEWLDGSFVLTEFYNGGYDHYIIDSSTEYEVLGNIYENPELLEDDNHASNEGHHHHHH"} {"text":"User: I want to generate a AA sequence.\nAssistant: This sounds very exciting. 
Should it be a special amino acid sequence?\nUser: Yes, the amino acid sequence should not be soluble.\nAssistant: Ok, this amino acid sequence is not soluble: MTARARSALLDAEHIVGYTTYVELLPDEITEGADDIYNTPMCGEVSRTEEAIDRALAGNDVAIIGSGDPNVYALAGLALEIIESKGATATMLDFDVVPGVP"}", "/scratch/micpie/export/peptides_soluble/train_0-4.jsonl": "{"text":"The amino acid sequence MSLIPKFREFDRERHRTDYQKGMSYAEQQDFDMGFTIWFDHIEDLDLIEKDGTINRIVMMSTGLKDKNVKEIYESDIVRNLYGELYVVEWLDGSFVLTEFYNGGYDHYIIDSSTEYEVLGNIYENPELLEDDNHASNEGHHHHHH is soluble."} {"text":"The sequence of amino acids (AAs) MTARARSALLDAEHIVGYTTYVELLPDEITEGADDIYNTPMCGEVSRTEEAIDRALAGNDVAIIGSGDPNVYALAGLALEIIESKGATATMLDFDVVPGVP is not soluble."}", "/scratch/micpie/export/peptides_soluble/test_0-7.jsonl": "{"text":"Task: Please give me a amino acid sequence based on the text description.\nDescription: A amino acid sequence that is soluble.\nResult: MKPKPSQFKNLERMLGLKTEQLDAVKVTIELKDKILIIENPTVIKMIAQGQEIYSVIGEAKEAQKEEPKVEIKDEDVKFVMEQTGKGEQEVKEALQKANGDIAKAILLLTGQET"} {"text":"Task: Please create a AA sequence based on the text description.\nDescription: A amino acid sequence that is soluble.\nResult: AVANILNRDYNRAQEILNGIKNPDATTYYLMAVVAARTNNLDVVINSLRESISLDSSMMKKAATDLEFAKYANDGGFKSLLRH"}", "/scratch/micpie/export/peptides_soluble/train_0-9.jsonl": "{"text":"User: Is the peptide with the amino acid sequence MSLIPKFREFDRERHRTDYQKGMSYAEQQDFDMGFTIWFDHIEDLDLIEKDGTINRIVMMSTGLKDKNVKEIYESDIVRNLYGELYVVEWLDGSFVLTEFYNGGYDHYIIDSSTEYEVLGNIYENPELLEDDNHASNEGHHHHHH soluble?\nAssistant: Yes, it is soluble."} {"text":"User: Is the peptide with the amino acid sequence MTARARSALLDAEHIVGYTTYVELLPDEITEGADDIYNTPMCGEVSRTEEAIDRALAGNDVAIIGSGDPNVYALAGLALEIIESKGATATMLDFDVVPGVP soluble?\nAssistant: No, it is not soluble."}", "/scratch/micpie/export/peptides_soluble/valid_0-3.jsonl": "{"text":"The amino acid sequence GSHMKETKKAGPIELPEELEAQRQRHNDPRRPPWPLLHQRVVLLREGKGAPEDIALMWEQTKHYYPADWLIPLELTQVLKYSSGKYLQTYVADPDEMRKEVLMQLLNVKYGRVSDPNGGRVNKDVEEIISMAVDDLENMDLNPAADAVLIPTHT is 
from a peptide that is identified as soluble."} {"text":"The amino acid sequence MAKVCYFTGRKTVSANNRSHAMNKTKRVAKPNLQKVTVLIDGKPKKVWASARALKSGKVERVLEHHHHHH represents a peptide that is not identified as soluble."}", "/scratch/micpie/export/peptides_soluble/test_0-8.jsonl": "{"text":"User: Can you estimate if the peptide with the amino acid sequence MKPKPSQFKNLERMLGLKTEQLDAVKVTIELKDKILIIENPTVIKMIAQGQEIYSVIGEAKEAQKEEPKVEIKDEDVKFVMEQTGKGEQEVKEALQKANGDIAKAILLLTGQET is soluble?\nAssistant: Yes, this amino acid sequence is soluble."} {"text":"User: Can you derive if the peptide with the amino acid sequence AVANILNRDYNRAQEILNGIKNPDATTYYLMAVVAARTNNLDVVINSLRESISLDSSMMKKAATDLEFAKYANDGGFKSLLRH is soluble?\nAssistant: No, this amino acid sequence is not soluble."}", "/scratch/micpie/export/peptides_soluble/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the peptide with the amino acid sequence representation of MKPKPSQFKNLERMLGLKTEQLDAVKVTIELKDKILIIENPTVIKMIAQGQEIYSVIGEAKEAQKEEPKVEIKDEDVKFVMEQTGKGEQEVKEALQKANGDIAKAILLLTGQET soluble?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any additional words.\nOptions:\n1.) False\n2.) True\nAnswer: 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the peptide with the amino acid sequence representation of AVANILNRDYNRAQEILNGIKNPDATTYYLMAVVAARTNNLDVVINSLRESISLDSSMMKKAATDLEFAKYANDGGFKSLLRH soluble?\nConstraint: Even if you are uncertain, you must pick either a or b without using any additional words.\nOptions:\na. True\nb. 
False\nAnswer: b"}", "/scratch/micpie/export/peptides_soluble/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the peptide with the amino acid sequence GSHMKETKKAGPIELPEELEAQRQRHNDPRRPPWPLLHQRVVLLREGKGAPEDIALMWEQTKHYYPADWLIPLELTQVLKYSSGKYLQTYVADPDEMRKEVLMQLLNVKYGRVSDPNGGRVNKDVEEIISMAVDDLENMDLNPAADAVLIPTHT soluble?\nConstraint: Even if you are not sure, you must pick either A or B without using any additional words.\nOptions:\n(A) True\n(B) False\nAnswer: A"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the peptide with the amino acid sequence MAKVCYFTGRKTVSANNRSHAMNKTKRVAKPNLQKVTVLIDGKPKKVWASARALKSGKVERVLEHHHHHH soluble?\nConstraint: Even if you are uncertain, you must pick either a or b without using any additional words.\nOptions:\na. True\nb. False\nAnswer: b"}", "/scratch/micpie/export/peptides_soluble/test_0-4.jsonl": "{"text":"The peptide with amino acid sequence MKPKPSQFKNLERMLGLKTEQLDAVKVTIELKDKILIIENPTVIKMIAQGQEIYSVIGEAKEAQKEEPKVEIKDEDVKFVMEQTGKGEQEVKEALQKANGDIAKAILLLTGQET is soluble."} {"text":"The sequence of amino acids (AAs) AVANILNRDYNRAQEILNGIKNPDATTYYLMAVVAARTNNLDVVINSLRESISLDSSMMKKAATDLEFAKYANDGGFKSLLRH is not soluble."}", "/scratch/micpie/export/peptides_soluble/test_0-12.jsonl": "{"text":"User: I want to generate a AA sequence.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The amino acid sequence should be soluble.\nAssistant: Got it, this amino acid sequence is soluble: MKPKPSQFKNLERMLGLKTEQLDAVKVTIELKDKILIIENPTVIKMIAQGQEIYSVIGEAKEAQKEEPKVEIKDEDVKFVMEQTGKGEQEVKEALQKANGDIAKAILLLTGQET"} {"text":"User: I want to generate a amino acid sequence .\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The amino acid sequence should not be soluble.\nAssistant: Got it, this amino acid sequence is not soluble: AVANILNRDYNRAQEILNGIKNPDATTYYLMAVVAARTNNLDVVINSLRESISLDSSMMKKAATDLEFAKYANDGGFKSLLRH"}", "/scratch/micpie/export/mp_self_supervised/test_0-10.jsonl": "{"text":"User: I have a solid with the following CIF card [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.508\n_cell_length_b 5.970\n_cell_length_c 6.879\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural VSiNi\n_chemical_formula_sum 'V4 Si4 Ni4'\n_cell_volume 144.098\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n V V0 1 0.250 0.026 0.334 1.0\n V V1 1 0.250 0.526 0.166 1.0\n V V2 1 0.750 0.974 0.666 1.0\n V V3 1 0.750 0.474 0.834 1.0\n Si Si4 1 0.250 0.271 0.630 1.0\n Si Si5 1 0.250 0.771 0.870 1.0\n Si Si6 1 0.750 0.729 0.370 1.0\n Si Si7 1 0.750 0.229 0.130 1.0\n Ni Ni8 1 0.250 0.144 0.940 1.0\n Ni Ni9 1 0.250 0.644 0.560 1.0\n Ni Ni10 1 0.750 0.856 0.060 1.0\n Ni Ni11 1 0.750 0.356 0.440 1.0\n[\/CIF]\n. What is the density?\nAssistant: The density of the material of the material is 6.35 g\/cm^3. 
\nUser: Indeed, I also want to know the spacegroup of the material of this material.\nAssistant: The spacegroup of the material of the material is Pnma."} {"text":"User: I have a compound with the following CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.342\n_cell_length_b 5.342\n_cell_length_c 3.140\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural TmB2Rh3\n_chemical_formula_sum 'Tm1 B2 Rh3'\n_cell_volume 77.606\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Tm Tm0 1 0.000 0.000 0.500 1.0\n B B1 1 0.667 0.333 0.500 1.0\n B B2 1 0.333 0.667 0.500 1.0\n Rh Rh3 1 0.500 0.000 0.000 1.0\n Rh Rh4 1 0.500 0.500 0.000 1.0\n Rh Rh5 1 0.000 0.500 0.000 1.0\n[\/CIF]\n. What is the density?\nAssistant: The density of the solid is 10.68 g\/cm^3. 
\nUser: Yep, I also want to know the spacegroup of the material of this solid.\nAssistant: The spacegroup of the material of the solid is P6\/mmm."}", "/scratch/micpie/export/mp_self_supervised/valid_0-8.jsonl": "{"text":"Question: What is the spacegroup number of the material with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 10.187\n_cell_length_b 5.762\n_cell_length_c 10.423\n_cell_angle_alpha 87.690\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural BaTaF7\n_chemical_formula_sum 'Ba4 Ta4 F28'\n_cell_volume 611.287\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ba Ba0 1 0.331 0.310 0.509 1.0\n Ba Ba1 1 0.831 0.690 0.991 1.0\n Ba Ba2 1 0.669 0.690 0.491 1.0\n Ba Ba3 1 0.169 0.310 0.009 1.0\n Ta Ta4 1 0.611 0.185 0.185 1.0\n Ta Ta5 1 0.389 0.815 0.815 1.0\n Ta Ta6 1 0.889 0.185 0.685 1.0\n Ta Ta7 1 0.111 0.815 0.315 1.0\n F F8 1 0.449 0.101 0.880 1.0\n F F9 1 0.050 0.331 0.635 1.0\n F F10 1 0.691 0.469 0.241 1.0\n F F11 1 0.309 0.531 0.759 1.0\n F F12 1 0.455 0.880 0.641 1.0\n F F13 1 0.804 0.256 0.523 1.0\n F F14 1 0.762 0.001 0.241 1.0\n F F15 1 0.955 0.120 0.859 1.0\n F F16 1 0.450 0.331 0.135 1.0\n F F17 1 0.191 0.531 0.259 1.0\n F F18 1 0.051 0.101 0.380 1.0\n F F19 1 0.551 0.899 0.120 1.0\n F F20 1 0.262 0.999 0.259 1.0\n F F21 1 0.958 0.388 0.113 1.0\n F F22 1 0.545 0.120 0.359 1.0\n F F23 1 0.045 0.880 0.141 1.0\n F F24 1 0.238 0.999 0.759 1.0\n F F25 1 0.949 0.899 0.620 1.0\n F F26 1 0.458 0.612 0.387 1.0\n F F27 1 0.950 0.669 0.365 1.0\n F F28 1 0.042 0.612 0.887 1.0\n F F29 1 0.738 0.001 0.741 1.0\n F F30 1 0.542 0.388 0.613 1.0\n F F31 1 0.304 0.744 0.977 1.0\n F F32 1 0.196 0.744 0.477 1.0\n F F33 1 0.696 0.256 0.023 1.0\n F F34 1 
0.550 0.669 0.865 1.0\n F F35 1 0.809 0.469 0.741 1.0\n[\/CIF]\n?\nAnswer: 14."} {"text":"Question: What is the spacegroup number of the material of the material with the CIF card [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.885\n_cell_length_b 6.885\n_cell_length_c 8.857\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 99.714\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Na3Al2P2O8F3\n_chemical_formula_sum 'Na6 Al4 P4 O16 F6'\n_cell_volume 413.828\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Na Na0 1 0.525 0.525 0.989 1.0\n Na Na1 1 0.761 0.761 0.222 1.0\n Na Na2 1 0.111 0.111 0.356 1.0\n Na Na3 1 0.239 0.239 0.722 1.0\n Na Na4 1 0.475 0.475 0.489 1.0\n Na Na5 1 0.889 0.889 0.856 1.0\n Al Al6 1 0.426 0.066 0.004 1.0\n Al Al7 1 0.066 0.426 0.004 1.0\n Al Al8 1 0.934 0.574 0.504 1.0\n Al Al9 1 0.574 0.934 0.504 1.0\n P P10 1 0.748 0.251 0.248 1.0\n P P11 1 0.251 0.748 0.248 1.0\n P P12 1 0.252 0.749 0.748 1.0\n P P13 1 0.749 0.252 0.748 1.0\n O O14 1 0.565 0.242 0.148 1.0\n O O15 1 0.737 0.064 0.348 1.0\n O O16 1 0.937 0.258 0.154 1.0\n O O17 1 0.762 0.439 0.348 1.0\n O O18 1 0.242 0.565 0.148 1.0\n O O19 1 0.064 0.737 0.348 1.0\n O O20 1 0.258 0.937 0.154 1.0\n O O21 1 0.439 0.762 0.348 1.0\n O O22 1 0.063 0.742 0.654 1.0\n O O23 1 0.238 0.561 0.848 1.0\n O O24 1 0.435 0.758 0.648 1.0\n O O25 1 0.263 0.936 0.848 1.0\n O O26 1 0.742 0.063 0.654 1.0\n O O27 1 0.561 0.238 0.848 1.0\n O O28 1 0.758 0.435 0.648 1.0\n O O29 1 0.936 0.263 0.848 1.0\n F F30 1 0.250 0.250 0.996 1.0\n F F31 1 0.599 0.888 0.004 1.0\n F F32 1 0.888 0.599 0.004 1.0\n F F33 1 0.750 0.750 0.496 1.0\n F F34 1 0.112 0.401 0.504 1.0\n F F35 1 0.401 0.112 0.504 1.0\n[\/CIF]\n?\nAnswer: 36."}", 
"/scratch/micpie/export/mp_self_supervised/train_0-8.jsonl": "{"text":"Question: What is the number of the spacegroup in the International Tables for Crystallography of the compound with the CIF file [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.748\n_cell_length_b 6.486\n_cell_length_c 5.727\n_cell_angle_alpha 90.915\n_cell_angle_beta 106.375\n_cell_angle_gamma 86.870\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiCr6(OF)4\n_chemical_formula_sum 'Li1 Cr6 O4 F4'\n_cell_volume 204.531\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.034 0.377 0.921 1.0\n Cr Cr1 1 0.003 0.532 0.465 1.0\n Cr Cr2 1 0.990 0.980 0.007 1.0\n Cr Cr3 1 0.476 0.129 0.768 1.0\n Cr Cr4 1 0.514 0.624 0.738 1.0\n Cr Cr5 1 0.498 0.377 0.273 1.0\n Cr Cr6 1 0.497 0.880 0.231 1.0\n O O7 1 0.340 0.379 0.549 1.0\n O O8 1 0.659 0.869 0.956 1.0\n O O9 1 0.298 0.145 0.042 1.0\n O O10 1 0.658 0.637 0.446 1.0\n F F11 1 0.258 0.607 0.951 1.0\n F F12 1 0.183 0.874 0.350 1.0\n F F13 1 0.820 0.136 0.688 1.0\n F F14 1 0.750 0.374 0.057 1.0\n[\/CIF]\n?\nAnswer: 1."} {"text":"Question: What is the number of the spacegroup in the International Tables for Crystallography of the compound with the CIF file [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.306\n_cell_length_b 4.306\n_cell_length_c 14.143\n_cell_angle_alpha 81.244\n_cell_angle_beta 98.756\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Ti8Ga3Co4Si\n_chemical_formula_sum 'Ti8 Ga3 Co4 Si1'\n_cell_volume 223.544\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n 
_atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ti Ti0 1 0.250 0.750 0.750 1.0\n Ti Ti1 1 0.500 0.500 0.501 1.0\n Ti Ti2 1 0.999 0.001 0.998 1.0\n Ti Ti3 1 0.750 0.250 0.249 1.0\n Ti Ti4 1 0.313 0.687 0.938 1.0\n Ti Ti5 1 0.563 0.437 0.690 1.0\n Ti Ti6 1 0.063 0.937 0.189 1.0\n Ti Ti7 1 0.813 0.187 0.438 1.0\n Ga Ga8 1 0.687 0.313 0.062 1.0\n Ga Ga9 1 0.437 0.563 0.312 1.0\n Ga Ga10 1 0.188 0.812 0.563 1.0\n Co Co11 1 0.375 0.625 0.125 1.0\n Co Co12 1 0.125 0.875 0.376 1.0\n Co Co13 1 0.877 0.123 0.630 1.0\n Co Co14 1 0.624 0.376 0.871 1.0\n Si Si15 1 0.936 0.064 0.809 1.0\n[\/CIF]\n?\nAnswer: 160."}", "/scratch/micpie/export/mp_self_supervised/test_0-5.jsonl": "{"text":"Question: What is the density of the material of the material with the CIF card [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.508\n_cell_length_b 5.970\n_cell_length_c 6.879\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural VSiNi\n_chemical_formula_sum 'V4 Si4 Ni4'\n_cell_volume 144.098\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n V V0 1 0.250 0.026 0.334 1.0\n V V1 1 0.250 0.526 0.166 1.0\n V V2 1 0.750 0.974 0.666 1.0\n V V3 1 0.750 0.474 0.834 1.0\n Si Si4 1 0.250 0.271 0.630 1.0\n Si Si5 1 0.250 0.771 0.870 1.0\n Si Si6 1 0.750 0.729 0.370 1.0\n Si Si7 1 0.750 0.229 0.130 1.0\n Ni Ni8 1 0.250 0.144 0.940 1.0\n Ni Ni9 1 0.250 0.644 0.560 1.0\n Ni Ni10 1 0.750 0.856 0.060 1.0\n Ni Ni11 1 0.750 0.356 0.440 1.0\n[\/CIF]\n?\nAnswer: 6.35 g\/cm^3."} {"text":"Question: What is the density of the material of the material with the CIF file [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 
5.342\n_cell_length_b 5.342\n_cell_length_c 3.140\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural TmB2Rh3\n_chemical_formula_sum 'Tm1 B2 Rh3'\n_cell_volume 77.606\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Tm Tm0 1 0.000 0.000 0.500 1.0\n B B1 1 0.667 0.333 0.500 1.0\n B B2 1 0.333 0.667 0.500 1.0\n Rh Rh3 1 0.500 0.000 0.000 1.0\n Rh Rh4 1 0.500 0.500 0.000 1.0\n Rh Rh5 1 0.000 0.500 0.000 1.0\n[\/CIF]\n?\nAnswer: 10.68 g\/cm^3."}", "/scratch/micpie/export/mp_self_supervised/valid_0-9.jsonl": "{"text":"User: I want to design a solid with a density of 4.90 g\/cm^3, and a reduced formula of BaTaF7.\nAssistant: Cool, I suggest the solid with the CIF card [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 10.187\n_cell_length_b 5.762\n_cell_length_c 10.423\n_cell_angle_alpha 87.690\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural BaTaF7\n_chemical_formula_sum 'Ba4 Ta4 F28'\n_cell_volume 611.287\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ba Ba0 1 0.331 0.310 0.509 1.0\n Ba Ba1 1 0.831 0.690 0.991 1.0\n Ba Ba2 1 0.669 0.690 0.491 1.0\n Ba Ba3 1 0.169 0.310 0.009 1.0\n Ta Ta4 1 0.611 0.185 0.185 1.0\n Ta Ta5 1 0.389 0.815 0.815 1.0\n Ta Ta6 1 0.889 0.185 0.685 1.0\n Ta Ta7 1 0.111 0.815 0.315 1.0\n F F8 1 0.449 0.101 0.880 1.0\n F F9 1 0.050 0.331 0.635 1.0\n F F10 1 0.691 0.469 0.241 1.0\n F F11 1 0.309 0.531 0.759 1.0\n F F12 1 0.455 
0.880 0.641 1.0\n F F13 1 0.804 0.256 0.523 1.0\n F F14 1 0.762 0.001 0.241 1.0\n F F15 1 0.955 0.120 0.859 1.0\n F F16 1 0.450 0.331 0.135 1.0\n F F17 1 0.191 0.531 0.259 1.0\n F F18 1 0.051 0.101 0.380 1.0\n F F19 1 0.551 0.899 0.120 1.0\n F F20 1 0.262 0.999 0.259 1.0\n F F21 1 0.958 0.388 0.113 1.0\n F F22 1 0.545 0.120 0.359 1.0\n F F23 1 0.045 0.880 0.141 1.0\n F F24 1 0.238 0.999 0.759 1.0\n F F25 1 0.949 0.899 0.620 1.0\n F F26 1 0.458 0.612 0.387 1.0\n F F27 1 0.950 0.669 0.365 1.0\n F F28 1 0.042 0.612 0.887 1.0\n F F29 1 0.738 0.001 0.741 1.0\n F F30 1 0.542 0.388 0.613 1.0\n F F31 1 0.304 0.744 0.977 1.0\n F F32 1 0.196 0.744 0.477 1.0\n F F33 1 0.696 0.256 0.023 1.0\n F F34 1 0.550 0.669 0.865 1.0\n F F35 1 0.809 0.469 0.741 1.0\n[\/CIF]\n. Do you need anything else?\nUser: Yeah, I also want to know the spacegroup of the material of this solid.\nAssistant: The spacegroup of the material of the solid is P2_1\/c."} {"text":"User: I want to design a solid with a density of 2.97 g\/cm^3, and a composition of Na3Al2P2O8F3.\nAssistant: I suggest the solid with the CIF card [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.885\n_cell_length_b 6.885\n_cell_length_c 8.857\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 99.714\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Na3Al2P2O8F3\n_chemical_formula_sum 'Na6 Al4 P4 O16 F6'\n_cell_volume 413.828\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Na Na0 1 0.525 0.525 0.989 1.0\n Na Na1 1 0.761 0.761 0.222 1.0\n Na Na2 1 0.111 0.111 0.356 1.0\n Na Na3 1 0.239 0.239 0.722 1.0\n Na Na4 1 0.475 0.475 0.489 1.0\n Na Na5 1 0.889 0.889 0.856 1.0\n Al Al6 1 0.426 0.066 0.004 1.0\n Al Al7 1 0.066 0.426 0.004 1.0\n Al Al8 1 0.934 0.574 
0.504 1.0\n Al Al9 1 0.574 0.934 0.504 1.0\n P P10 1 0.748 0.251 0.248 1.0\n P P11 1 0.251 0.748 0.248 1.0\n P P12 1 0.252 0.749 0.748 1.0\n P P13 1 0.749 0.252 0.748 1.0\n O O14 1 0.565 0.242 0.148 1.0\n O O15 1 0.737 0.064 0.348 1.0\n O O16 1 0.937 0.258 0.154 1.0\n O O17 1 0.762 0.439 0.348 1.0\n O O18 1 0.242 0.565 0.148 1.0\n O O19 1 0.064 0.737 0.348 1.0\n O O20 1 0.258 0.937 0.154 1.0\n O O21 1 0.439 0.762 0.348 1.0\n O O22 1 0.063 0.742 0.654 1.0\n O O23 1 0.238 0.561 0.848 1.0\n O O24 1 0.435 0.758 0.648 1.0\n O O25 1 0.263 0.936 0.848 1.0\n O O26 1 0.742 0.063 0.654 1.0\n O O27 1 0.561 0.238 0.848 1.0\n O O28 1 0.758 0.435 0.648 1.0\n O O29 1 0.936 0.263 0.848 1.0\n F F30 1 0.250 0.250 0.996 1.0\n F F31 1 0.599 0.888 0.004 1.0\n F F32 1 0.888 0.599 0.004 1.0\n F F33 1 0.750 0.750 0.496 1.0\n F F34 1 0.112 0.401 0.504 1.0\n F F35 1 0.401 0.112 0.504 1.0\n[\/CIF]\n. Do you need anything else?\nUser: I also want to know the spacegroup of this solid.\nAssistant: The spacegroup of the solid is Cmc2_1."}", "/scratch/micpie/export/mp_self_supervised/test_0-1.jsonl": "{"text":"The density of the material of the compound with the CIF card [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.508\n_cell_length_b 5.970\n_cell_length_c 6.879\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural VSiNi\n_chemical_formula_sum 'V4 Si4 Ni4'\n_cell_volume 144.098\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n V V0 1 0.250 0.026 0.334 1.0\n V V1 1 0.250 0.526 0.166 1.0\n V V2 1 0.750 0.974 0.666 1.0\n V V3 1 0.750 0.474 0.834 1.0\n Si Si4 1 0.250 0.271 0.630 1.0\n Si Si5 1 0.250 0.771 0.870 1.0\n Si Si6 1 0.750 0.729 0.370 1.0\n Si Si7 1 0.750 
0.229 0.130 1.0\n Ni Ni8 1 0.250 0.144 0.940 1.0\n Ni Ni9 1 0.250 0.644 0.560 1.0\n Ni Ni10 1 0.750 0.856 0.060 1.0\n Ni Ni11 1 0.750 0.356 0.440 1.0\n[\/CIF]\n is 6.35 g\/cm^3."} {"text":"The density of the material with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.342\n_cell_length_b 5.342\n_cell_length_c 3.140\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural TmB2Rh3\n_chemical_formula_sum 'Tm1 B2 Rh3'\n_cell_volume 77.606\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Tm Tm0 1 0.000 0.000 0.500 1.0\n B B1 1 0.667 0.333 0.500 1.0\n B B2 1 0.333 0.667 0.500 1.0\n Rh Rh3 1 0.500 0.000 0.000 1.0\n Rh Rh4 1 0.500 0.500 0.000 1.0\n Rh Rh5 1 0.000 0.500 0.000 1.0\n[\/CIF]\n is 10.68 g\/cm^3."}", "/scratch/micpie/export/mp_self_supervised/valid_0-0.jsonl": "{"text":"The spacegroup of the compound with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 10.187\n_cell_length_b 5.762\n_cell_length_c 10.423\n_cell_angle_alpha 87.690\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural BaTaF7\n_chemical_formula_sum 'Ba4 Ta4 F28'\n_cell_volume 611.287\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ba Ba0 1 0.331 0.310 0.509 1.0\n Ba Ba1 1 0.831 0.690 0.991 1.0\n Ba Ba2 1 0.669 0.690 0.491 1.0\n Ba Ba3 1 0.169 0.310 0.009 1.0\n Ta Ta4 1 0.611 0.185 0.185 1.0\n Ta Ta5 1 0.389 0.815 0.815 1.0\n Ta Ta6 1 0.889 0.185 
0.685 1.0\n Ta Ta7 1 0.111 0.815 0.315 1.0\n F F8 1 0.449 0.101 0.880 1.0\n F F9 1 0.050 0.331 0.635 1.0\n F F10 1 0.691 0.469 0.241 1.0\n F F11 1 0.309 0.531 0.759 1.0\n F F12 1 0.455 0.880 0.641 1.0\n F F13 1 0.804 0.256 0.523 1.0\n F F14 1 0.762 0.001 0.241 1.0\n F F15 1 0.955 0.120 0.859 1.0\n F F16 1 0.450 0.331 0.135 1.0\n F F17 1 0.191 0.531 0.259 1.0\n F F18 1 0.051 0.101 0.380 1.0\n F F19 1 0.551 0.899 0.120 1.0\n F F20 1 0.262 0.999 0.259 1.0\n F F21 1 0.958 0.388 0.113 1.0\n F F22 1 0.545 0.120 0.359 1.0\n F F23 1 0.045 0.880 0.141 1.0\n F F24 1 0.238 0.999 0.759 1.0\n F F25 1 0.949 0.899 0.620 1.0\n F F26 1 0.458 0.612 0.387 1.0\n F F27 1 0.950 0.669 0.365 1.0\n F F28 1 0.042 0.612 0.887 1.0\n F F29 1 0.738 0.001 0.741 1.0\n F F30 1 0.542 0.388 0.613 1.0\n F F31 1 0.304 0.744 0.977 1.0\n F F32 1 0.196 0.744 0.477 1.0\n F F33 1 0.696 0.256 0.023 1.0\n F F34 1 0.550 0.669 0.865 1.0\n F F35 1 0.809 0.469 0.741 1.0\n[\/CIF]\n is P2_1\/c."} {"text":"The spacegroup of the solid with the CIF card [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.885\n_cell_length_b 6.885\n_cell_length_c 8.857\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 99.714\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Na3Al2P2O8F3\n_chemical_formula_sum 'Na6 Al4 P4 O16 F6'\n_cell_volume 413.828\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Na Na0 1 0.525 0.525 0.989 1.0\n Na Na1 1 0.761 0.761 0.222 1.0\n Na Na2 1 0.111 0.111 0.356 1.0\n Na Na3 1 0.239 0.239 0.722 1.0\n Na Na4 1 0.475 0.475 0.489 1.0\n Na Na5 1 0.889 0.889 0.856 1.0\n Al Al6 1 0.426 0.066 0.004 1.0\n Al Al7 1 0.066 0.426 0.004 1.0\n Al Al8 1 0.934 0.574 0.504 1.0\n Al Al9 1 0.574 0.934 0.504 1.0\n P P10 1 0.748 0.251 0.248 1.0\n P P11 
1 0.251 0.748 0.248 1.0\n P P12 1 0.252 0.749 0.748 1.0\n P P13 1 0.749 0.252 0.748 1.0\n O O14 1 0.565 0.242 0.148 1.0\n O O15 1 0.737 0.064 0.348 1.0\n O O16 1 0.937 0.258 0.154 1.0\n O O17 1 0.762 0.439 0.348 1.0\n O O18 1 0.242 0.565 0.148 1.0\n O O19 1 0.064 0.737 0.348 1.0\n O O20 1 0.258 0.937 0.154 1.0\n O O21 1 0.439 0.762 0.348 1.0\n O O22 1 0.063 0.742 0.654 1.0\n O O23 1 0.238 0.561 0.848 1.0\n O O24 1 0.435 0.758 0.648 1.0\n O O25 1 0.263 0.936 0.848 1.0\n O O26 1 0.742 0.063 0.654 1.0\n O O27 1 0.561 0.238 0.848 1.0\n O O28 1 0.758 0.435 0.648 1.0\n O O29 1 0.936 0.263 0.848 1.0\n F F30 1 0.250 0.250 0.996 1.0\n F F31 1 0.599 0.888 0.004 1.0\n F F32 1 0.888 0.599 0.004 1.0\n F F33 1 0.750 0.750 0.496 1.0\n F F34 1 0.112 0.401 0.504 1.0\n F F35 1 0.401 0.112 0.504 1.0\n[\/CIF]\n is Cmc2_1."}", "/scratch/micpie/export/mp_self_supervised/test_0-2.jsonl": "{"text":"The chemical formula of the solid with the CIF file [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.508\n_cell_length_b 5.970\n_cell_length_c 6.879\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural VSiNi\n_chemical_formula_sum 'V4 Si4 Ni4'\n_cell_volume 144.098\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n V V0 1 0.250 0.026 0.334 1.0\n V V1 1 0.250 0.526 0.166 1.0\n V V2 1 0.750 0.974 0.666 1.0\n V V3 1 0.750 0.474 0.834 1.0\n Si Si4 1 0.250 0.271 0.630 1.0\n Si Si5 1 0.250 0.771 0.870 1.0\n Si Si6 1 0.750 0.729 0.370 1.0\n Si Si7 1 0.750 0.229 0.130 1.0\n Ni Ni8 1 0.250 0.144 0.940 1.0\n Ni Ni9 1 0.250 0.644 0.560 1.0\n Ni Ni10 1 0.750 0.856 0.060 1.0\n Ni Ni11 1 0.750 0.356 0.440 1.0\n[\/CIF]\n is VSiNi."} {"text":"The chemical formula of the material 
with the CIF file [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.342\n_cell_length_b 5.342\n_cell_length_c 3.140\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural TmB2Rh3\n_chemical_formula_sum 'Tm1 B2 Rh3'\n_cell_volume 77.606\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Tm Tm0 1 0.000 0.000 0.500 1.0\n B B1 1 0.667 0.333 0.500 1.0\n B B2 1 0.333 0.667 0.500 1.0\n Rh Rh3 1 0.500 0.000 0.000 1.0\n Rh Rh4 1 0.500 0.500 0.000 1.0\n Rh Rh5 1 0.000 0.500 0.000 1.0\n[\/CIF]\n is TmB2Rh3."}", "/scratch/micpie/export/mp_self_supervised/valid_0-10.jsonl": "{"text":"User: I have a compound with the following CIF file [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 10.187\n_cell_length_b 5.762\n_cell_length_c 10.423\n_cell_angle_alpha 87.690\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural BaTaF7\n_chemical_formula_sum 'Ba4 Ta4 F28'\n_cell_volume 611.287\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ba Ba0 1 0.331 0.310 0.509 1.0\n Ba Ba1 1 0.831 0.690 0.991 1.0\n Ba Ba2 1 0.669 0.690 0.491 1.0\n Ba Ba3 1 0.169 0.310 0.009 1.0\n Ta Ta4 1 0.611 0.185 0.185 1.0\n Ta Ta5 1 0.389 0.815 0.815 1.0\n Ta Ta6 1 0.889 0.185 0.685 1.0\n Ta Ta7 1 0.111 0.815 0.315 1.0\n F F8 1 0.449 0.101 0.880 1.0\n F F9 1 0.050 0.331 0.635 1.0\n F F10 1 0.691 0.469 0.241 1.0\n F F11 1 0.309 0.531 0.759 1.0\n F F12 1 0.455 0.880 0.641 1.0\n F F13 1 
0.804 0.256 0.523 1.0\n F F14 1 0.762 0.001 0.241 1.0\n F F15 1 0.955 0.120 0.859 1.0\n F F16 1 0.450 0.331 0.135 1.0\n F F17 1 0.191 0.531 0.259 1.0\n F F18 1 0.051 0.101 0.380 1.0\n F F19 1 0.551 0.899 0.120 1.0\n F F20 1 0.262 0.999 0.259 1.0\n F F21 1 0.958 0.388 0.113 1.0\n F F22 1 0.545 0.120 0.359 1.0\n F F23 1 0.045 0.880 0.141 1.0\n F F24 1 0.238 0.999 0.759 1.0\n F F25 1 0.949 0.899 0.620 1.0\n F F26 1 0.458 0.612 0.387 1.0\n F F27 1 0.950 0.669 0.365 1.0\n F F28 1 0.042 0.612 0.887 1.0\n F F29 1 0.738 0.001 0.741 1.0\n F F30 1 0.542 0.388 0.613 1.0\n F F31 1 0.304 0.744 0.977 1.0\n F F32 1 0.196 0.744 0.477 1.0\n F F33 1 0.696 0.256 0.023 1.0\n F F34 1 0.550 0.669 0.865 1.0\n F F35 1 0.809 0.469 0.741 1.0\n[\/CIF]\n. Can you tell me the density?\nAssistant: The density of the solid is 4.90 g\/cm^3. \nUser: Yep, I also want to know the spacegroup of the material of this solid.\nAssistant: The spacegroup of the material of the solid is P2_1\/c."} {"text":"User: I have a solid with the following CIF card [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.885\n_cell_length_b 6.885\n_cell_length_c 8.857\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 99.714\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Na3Al2P2O8F3\n_chemical_formula_sum 'Na6 Al4 P4 O16 F6'\n_cell_volume 413.828\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Na Na0 1 0.525 0.525 0.989 1.0\n Na Na1 1 0.761 0.761 0.222 1.0\n Na Na2 1 0.111 0.111 0.356 1.0\n Na Na3 1 0.239 0.239 0.722 1.0\n Na Na4 1 0.475 0.475 0.489 1.0\n Na Na5 1 0.889 0.889 0.856 1.0\n Al Al6 1 0.426 0.066 0.004 1.0\n Al Al7 1 0.066 0.426 0.004 1.0\n Al Al8 1 0.934 0.574 0.504 1.0\n Al Al9 1 0.574 0.934 0.504 1.0\n P P10 1 0.748 0.251 0.248 
1.0\n P P11 1 0.251 0.748 0.248 1.0\n P P12 1 0.252 0.749 0.748 1.0\n P P13 1 0.749 0.252 0.748 1.0\n O O14 1 0.565 0.242 0.148 1.0\n O O15 1 0.737 0.064 0.348 1.0\n O O16 1 0.937 0.258 0.154 1.0\n O O17 1 0.762 0.439 0.348 1.0\n O O18 1 0.242 0.565 0.148 1.0\n O O19 1 0.064 0.737 0.348 1.0\n O O20 1 0.258 0.937 0.154 1.0\n O O21 1 0.439 0.762 0.348 1.0\n O O22 1 0.063 0.742 0.654 1.0\n O O23 1 0.238 0.561 0.848 1.0\n O O24 1 0.435 0.758 0.648 1.0\n O O25 1 0.263 0.936 0.848 1.0\n O O26 1 0.742 0.063 0.654 1.0\n O O27 1 0.561 0.238 0.848 1.0\n O O28 1 0.758 0.435 0.648 1.0\n O O29 1 0.936 0.263 0.848 1.0\n F F30 1 0.250 0.250 0.996 1.0\n F F31 1 0.599 0.888 0.004 1.0\n F F32 1 0.888 0.599 0.004 1.0\n F F33 1 0.750 0.750 0.496 1.0\n F F34 1 0.112 0.401 0.504 1.0\n F F35 1 0.401 0.112 0.504 1.0\n[\/CIF]\n. Can you tell me the density?\nAssistant: The density of the material is 2.97 g\/cm^3. Is there anything else I can do for you?\nUser: Yeah, I also want to know the spacegroup of this material.\nAssistant: The spacegroup of the material is Cmc2_1."}", "/scratch/micpie/export/mp_self_supervised/train_0-6.jsonl": "{"text":"Question: What is the spacegroup of the material of the material with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.748\n_cell_length_b 6.486\n_cell_length_c 5.727\n_cell_angle_alpha 90.915\n_cell_angle_beta 106.375\n_cell_angle_gamma 86.870\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiCr6(OF)4\n_chemical_formula_sum 'Li1 Cr6 O4 F4'\n_cell_volume 204.531\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.034 0.377 0.921 1.0\n Cr Cr1 1 0.003 0.532 0.465 1.0\n Cr Cr2 1 0.990 0.980 0.007 1.0\n Cr Cr3 1 0.476 0.129 0.768 1.0\n Cr Cr4 1 0.514 0.624 0.738 1.0\n Cr Cr5 1 
0.498 0.377 0.273 1.0\n Cr Cr6 1 0.497 0.880 0.231 1.0\n O O7 1 0.340 0.379 0.549 1.0\n O O8 1 0.659 0.869 0.956 1.0\n O O9 1 0.298 0.145 0.042 1.0\n O O10 1 0.658 0.637 0.446 1.0\n F F11 1 0.258 0.607 0.951 1.0\n F F12 1 0.183 0.874 0.350 1.0\n F F13 1 0.820 0.136 0.688 1.0\n F F14 1 0.750 0.374 0.057 1.0\n[\/CIF]\n?\nAnswer: P1."} {"text":"Question: What is the spacegroup of the material of the material with the CIF card [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.306\n_cell_length_b 4.306\n_cell_length_c 14.143\n_cell_angle_alpha 81.244\n_cell_angle_beta 98.756\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Ti8Ga3Co4Si\n_chemical_formula_sum 'Ti8 Ga3 Co4 Si1'\n_cell_volume 223.544\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ti Ti0 1 0.250 0.750 0.750 1.0\n Ti Ti1 1 0.500 0.500 0.501 1.0\n Ti Ti2 1 0.999 0.001 0.998 1.0\n Ti Ti3 1 0.750 0.250 0.249 1.0\n Ti Ti4 1 0.313 0.687 0.938 1.0\n Ti Ti5 1 0.563 0.437 0.690 1.0\n Ti Ti6 1 0.063 0.937 0.189 1.0\n Ti Ti7 1 0.813 0.187 0.438 1.0\n Ga Ga8 1 0.687 0.313 0.062 1.0\n Ga Ga9 1 0.437 0.563 0.312 1.0\n Ga Ga10 1 0.188 0.812 0.563 1.0\n Co Co11 1 0.375 0.625 0.125 1.0\n Co Co12 1 0.125 0.875 0.376 1.0\n Co Co13 1 0.877 0.123 0.630 1.0\n Co Co14 1 0.624 0.376 0.871 1.0\n Si Si15 1 0.936 0.064 0.809 1.0\n[\/CIF]\n?\nAnswer: R3m."}", "/scratch/micpie/export/mp_self_supervised/valid_0-6.jsonl": "{"text":"Question: What is the spacegroup of the material with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 10.187\n_cell_length_b 5.762\n_cell_length_c 10.423\n_cell_angle_alpha 87.690\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural 
BaTaF7\n_chemical_formula_sum 'Ba4 Ta4 F28'\n_cell_volume 611.287\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ba Ba0 1 0.331 0.310 0.509 1.0\n Ba Ba1 1 0.831 0.690 0.991 1.0\n Ba Ba2 1 0.669 0.690 0.491 1.0\n Ba Ba3 1 0.169 0.310 0.009 1.0\n Ta Ta4 1 0.611 0.185 0.185 1.0\n Ta Ta5 1 0.389 0.815 0.815 1.0\n Ta Ta6 1 0.889 0.185 0.685 1.0\n Ta Ta7 1 0.111 0.815 0.315 1.0\n F F8 1 0.449 0.101 0.880 1.0\n F F9 1 0.050 0.331 0.635 1.0\n F F10 1 0.691 0.469 0.241 1.0\n F F11 1 0.309 0.531 0.759 1.0\n F F12 1 0.455 0.880 0.641 1.0\n F F13 1 0.804 0.256 0.523 1.0\n F F14 1 0.762 0.001 0.241 1.0\n F F15 1 0.955 0.120 0.859 1.0\n F F16 1 0.450 0.331 0.135 1.0\n F F17 1 0.191 0.531 0.259 1.0\n F F18 1 0.051 0.101 0.380 1.0\n F F19 1 0.551 0.899 0.120 1.0\n F F20 1 0.262 0.999 0.259 1.0\n F F21 1 0.958 0.388 0.113 1.0\n F F22 1 0.545 0.120 0.359 1.0\n F F23 1 0.045 0.880 0.141 1.0\n F F24 1 0.238 0.999 0.759 1.0\n F F25 1 0.949 0.899 0.620 1.0\n F F26 1 0.458 0.612 0.387 1.0\n F F27 1 0.950 0.669 0.365 1.0\n F F28 1 0.042 0.612 0.887 1.0\n F F29 1 0.738 0.001 0.741 1.0\n F F30 1 0.542 0.388 0.613 1.0\n F F31 1 0.304 0.744 0.977 1.0\n F F32 1 0.196 0.744 0.477 1.0\n F F33 1 0.696 0.256 0.023 1.0\n F F34 1 0.550 0.669 0.865 1.0\n F F35 1 0.809 0.469 0.741 1.0\n[\/CIF]\n?\nAnswer: P2_1\/c."} {"text":"Question: What is the spacegroup of the material of the material with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.885\n_cell_length_b 6.885\n_cell_length_c 8.857\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 99.714\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Na3Al2P2O8F3\n_chemical_formula_sum 'Na6 Al4 P4 O16 F6'\n_cell_volume 413.828\n_cell_formula_units_Z 2\nloop_\n 
_symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Na Na0 1 0.525 0.525 0.989 1.0\n Na Na1 1 0.761 0.761 0.222 1.0\n Na Na2 1 0.111 0.111 0.356 1.0\n Na Na3 1 0.239 0.239 0.722 1.0\n Na Na4 1 0.475 0.475 0.489 1.0\n Na Na5 1 0.889 0.889 0.856 1.0\n Al Al6 1 0.426 0.066 0.004 1.0\n Al Al7 1 0.066 0.426 0.004 1.0\n Al Al8 1 0.934 0.574 0.504 1.0\n Al Al9 1 0.574 0.934 0.504 1.0\n P P10 1 0.748 0.251 0.248 1.0\n P P11 1 0.251 0.748 0.248 1.0\n P P12 1 0.252 0.749 0.748 1.0\n P P13 1 0.749 0.252 0.748 1.0\n O O14 1 0.565 0.242 0.148 1.0\n O O15 1 0.737 0.064 0.348 1.0\n O O16 1 0.937 0.258 0.154 1.0\n O O17 1 0.762 0.439 0.348 1.0\n O O18 1 0.242 0.565 0.148 1.0\n O O19 1 0.064 0.737 0.348 1.0\n O O20 1 0.258 0.937 0.154 1.0\n O O21 1 0.439 0.762 0.348 1.0\n O O22 1 0.063 0.742 0.654 1.0\n O O23 1 0.238 0.561 0.848 1.0\n O O24 1 0.435 0.758 0.648 1.0\n O O25 1 0.263 0.936 0.848 1.0\n O O26 1 0.742 0.063 0.654 1.0\n O O27 1 0.561 0.238 0.848 1.0\n O O28 1 0.758 0.435 0.648 1.0\n O O29 1 0.936 0.263 0.848 1.0\n F F30 1 0.250 0.250 0.996 1.0\n F F31 1 0.599 0.888 0.004 1.0\n F F32 1 0.888 0.599 0.004 1.0\n F F33 1 0.750 0.750 0.496 1.0\n F F34 1 0.112 0.401 0.504 1.0\n F F35 1 0.401 0.112 0.504 1.0\n[\/CIF]\n?\nAnswer: Cmc2_1."}", "/scratch/micpie/export/mp_self_supervised/test_0-9.jsonl": "{"text":"User: I want to design a material with a density of 6.35 g\/cm^3, and a composition of VSiNi.\nAssistant: Awesome, I suggest the material with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.508\n_cell_length_b 5.970\n_cell_length_c 6.879\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural VSiNi\n_chemical_formula_sum 'V4 Si4 Ni4'\n_cell_volume 
144.098\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n V V0 1 0.250 0.026 0.334 1.0\n V V1 1 0.250 0.526 0.166 1.0\n V V2 1 0.750 0.974 0.666 1.0\n V V3 1 0.750 0.474 0.834 1.0\n Si Si4 1 0.250 0.271 0.630 1.0\n Si Si5 1 0.250 0.771 0.870 1.0\n Si Si6 1 0.750 0.729 0.370 1.0\n Si Si7 1 0.750 0.229 0.130 1.0\n Ni Ni8 1 0.250 0.144 0.940 1.0\n Ni Ni9 1 0.250 0.644 0.560 1.0\n Ni Ni10 1 0.750 0.856 0.060 1.0\n Ni Ni11 1 0.750 0.356 0.440 1.0\n[\/CIF]\n. Is there anything else I can do for you?\nUser: Yep, I also want to know the spacegroup of this material.\nAssistant: The spacegroup of the material is Pnma."} {"text":"User: I want to design a solid with a density of 10.68 g\/cm^3, and a chemical formula of TmB2Rh3.\nAssistant: Awesome, I suggest the solid with the CIF card [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.342\n_cell_length_b 5.342\n_cell_length_c 3.140\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural TmB2Rh3\n_chemical_formula_sum 'Tm1 B2 Rh3'\n_cell_volume 77.606\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Tm Tm0 1 0.000 0.000 0.500 1.0\n B B1 1 0.667 0.333 0.500 1.0\n B B2 1 0.333 0.667 0.500 1.0\n Rh Rh3 1 0.500 0.000 0.000 1.0\n Rh Rh4 1 0.500 0.500 0.000 1.0\n Rh Rh5 1 0.000 0.500 0.000 1.0\n[\/CIF]\n. 
\nUser: Yes, I also want to know the spacegroup of this solid.\nAssistant: The spacegroup of the solid is P6\/mmm."}", "/scratch/micpie/export/mp_self_supervised/test_0-0.jsonl": "{"text":"The spacegroup of the material of the material with the CIF card [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.508\n_cell_length_b 5.970\n_cell_length_c 6.879\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural VSiNi\n_chemical_formula_sum 'V4 Si4 Ni4'\n_cell_volume 144.098\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n V V0 1 0.250 0.026 0.334 1.0\n V V1 1 0.250 0.526 0.166 1.0\n V V2 1 0.750 0.974 0.666 1.0\n V V3 1 0.750 0.474 0.834 1.0\n Si Si4 1 0.250 0.271 0.630 1.0\n Si Si5 1 0.250 0.771 0.870 1.0\n Si Si6 1 0.750 0.729 0.370 1.0\n Si Si7 1 0.750 0.229 0.130 1.0\n Ni Ni8 1 0.250 0.144 0.940 1.0\n Ni Ni9 1 0.250 0.644 0.560 1.0\n Ni Ni10 1 0.750 0.856 0.060 1.0\n Ni Ni11 1 0.750 0.356 0.440 1.0\n[\/CIF]\n is Pnma."} {"text":"The spacegroup of the material of the solid with the CIF file [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.342\n_cell_length_b 5.342\n_cell_length_c 3.140\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural TmB2Rh3\n_chemical_formula_sum 'Tm1 B2 Rh3'\n_cell_volume 77.606\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Tm Tm0 1 0.000 0.000 0.500 1.0\n B B1 1 0.667 0.333 0.500 
1.0\n B B2 1 0.333 0.667 0.500 1.0\n Rh Rh3 1 0.500 0.000 0.000 1.0\n Rh Rh4 1 0.500 0.500 0.000 1.0\n Rh Rh5 1 0.000 0.500 0.000 1.0\n[\/CIF]\n is P6\/mmm."}", "/scratch/micpie/export/mp_self_supervised/valid_0-7.jsonl": "{"text":"Question: What is the reduced formula of the solid with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 10.187\n_cell_length_b 5.762\n_cell_length_c 10.423\n_cell_angle_alpha 87.690\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural BaTaF7\n_chemical_formula_sum 'Ba4 Ta4 F28'\n_cell_volume 611.287\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ba Ba0 1 0.331 0.310 0.509 1.0\n Ba Ba1 1 0.831 0.690 0.991 1.0\n Ba Ba2 1 0.669 0.690 0.491 1.0\n Ba Ba3 1 0.169 0.310 0.009 1.0\n Ta Ta4 1 0.611 0.185 0.185 1.0\n Ta Ta5 1 0.389 0.815 0.815 1.0\n Ta Ta6 1 0.889 0.185 0.685 1.0\n Ta Ta7 1 0.111 0.815 0.315 1.0\n F F8 1 0.449 0.101 0.880 1.0\n F F9 1 0.050 0.331 0.635 1.0\n F F10 1 0.691 0.469 0.241 1.0\n F F11 1 0.309 0.531 0.759 1.0\n F F12 1 0.455 0.880 0.641 1.0\n F F13 1 0.804 0.256 0.523 1.0\n F F14 1 0.762 0.001 0.241 1.0\n F F15 1 0.955 0.120 0.859 1.0\n F F16 1 0.450 0.331 0.135 1.0\n F F17 1 0.191 0.531 0.259 1.0\n F F18 1 0.051 0.101 0.380 1.0\n F F19 1 0.551 0.899 0.120 1.0\n F F20 1 0.262 0.999 0.259 1.0\n F F21 1 0.958 0.388 0.113 1.0\n F F22 1 0.545 0.120 0.359 1.0\n F F23 1 0.045 0.880 0.141 1.0\n F F24 1 0.238 0.999 0.759 1.0\n F F25 1 0.949 0.899 0.620 1.0\n F F26 1 0.458 0.612 0.387 1.0\n F F27 1 0.950 0.669 0.365 1.0\n F F28 1 0.042 0.612 0.887 1.0\n F F29 1 0.738 0.001 0.741 1.0\n F F30 1 0.542 0.388 0.613 1.0\n F F31 1 0.304 0.744 0.977 1.0\n F F32 1 0.196 0.744 0.477 1.0\n F F33 1 0.696 0.256 0.023 1.0\n F 
F34 1 0.550 0.669 0.865 1.0\n F F35 1 0.809 0.469 0.741 1.0\n[\/CIF]\n?\nAnswer: BaTaF7."} {"text":"Question: What is the composition of the material with the CIF file [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.885\n_cell_length_b 6.885\n_cell_length_c 8.857\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 99.714\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Na3Al2P2O8F3\n_chemical_formula_sum 'Na6 Al4 P4 O16 F6'\n_cell_volume 413.828\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Na Na0 1 0.525 0.525 0.989 1.0\n Na Na1 1 0.761 0.761 0.222 1.0\n Na Na2 1 0.111 0.111 0.356 1.0\n Na Na3 1 0.239 0.239 0.722 1.0\n Na Na4 1 0.475 0.475 0.489 1.0\n Na Na5 1 0.889 0.889 0.856 1.0\n Al Al6 1 0.426 0.066 0.004 1.0\n Al Al7 1 0.066 0.426 0.004 1.0\n Al Al8 1 0.934 0.574 0.504 1.0\n Al Al9 1 0.574 0.934 0.504 1.0\n P P10 1 0.748 0.251 0.248 1.0\n P P11 1 0.251 0.748 0.248 1.0\n P P12 1 0.252 0.749 0.748 1.0\n P P13 1 0.749 0.252 0.748 1.0\n O O14 1 0.565 0.242 0.148 1.0\n O O15 1 0.737 0.064 0.348 1.0\n O O16 1 0.937 0.258 0.154 1.0\n O O17 1 0.762 0.439 0.348 1.0\n O O18 1 0.242 0.565 0.148 1.0\n O O19 1 0.064 0.737 0.348 1.0\n O O20 1 0.258 0.937 0.154 1.0\n O O21 1 0.439 0.762 0.348 1.0\n O O22 1 0.063 0.742 0.654 1.0\n O O23 1 0.238 0.561 0.848 1.0\n O O24 1 0.435 0.758 0.648 1.0\n O O25 1 0.263 0.936 0.848 1.0\n O O26 1 0.742 0.063 0.654 1.0\n O O27 1 0.561 0.238 0.848 1.0\n O O28 1 0.758 0.435 0.648 1.0\n O O29 1 0.936 0.263 0.848 1.0\n F F30 1 0.250 0.250 0.996 1.0\n F F31 1 0.599 0.888 0.004 1.0\n F F32 1 0.888 0.599 0.004 1.0\n F F33 1 0.750 0.750 0.496 1.0\n F F34 1 0.112 0.401 0.504 1.0\n F F35 1 0.401 0.112 0.504 1.0\n[\/CIF]\n?\nAnswer: Na3Al2P2O8F3."}", 
"/scratch/micpie/export/mp_self_supervised/test_0-3.jsonl": "{"text":"The number of the spacegroup in the International Tables for Crystallography of the material with the CIF file [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.508\n_cell_length_b 5.970\n_cell_length_c 6.879\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural VSiNi\n_chemical_formula_sum 'V4 Si4 Ni4'\n_cell_volume 144.098\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n V V0 1 0.250 0.026 0.334 1.0\n V V1 1 0.250 0.526 0.166 1.0\n V V2 1 0.750 0.974 0.666 1.0\n V V3 1 0.750 0.474 0.834 1.0\n Si Si4 1 0.250 0.271 0.630 1.0\n Si Si5 1 0.250 0.771 0.870 1.0\n Si Si6 1 0.750 0.729 0.370 1.0\n Si Si7 1 0.750 0.229 0.130 1.0\n Ni Ni8 1 0.250 0.144 0.940 1.0\n Ni Ni9 1 0.250 0.644 0.560 1.0\n Ni Ni10 1 0.750 0.856 0.060 1.0\n Ni Ni11 1 0.750 0.356 0.440 1.0\n[\/CIF]\n is 62."} {"text":"The spacegroup number of the material with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.342\n_cell_length_b 5.342\n_cell_length_c 3.140\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural TmB2Rh3\n_chemical_formula_sum 'Tm1 B2 Rh3'\n_cell_volume 77.606\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Tm Tm0 1 0.000 0.000 0.500 1.0\n B B1 1 0.667 0.333 0.500 1.0\n B B2 1 0.333 0.667 0.500 1.0\n Rh Rh3 1 0.500 0.000 0.000 1.0\n Rh Rh4 1 0.500 
0.500 0.000 1.0\n Rh Rh5 1 0.000 0.500 0.000 1.0\n[\/CIF]\n is 191."}", "/scratch/micpie/export/mp_self_supervised/train_0-0.jsonl": "{"text":"The spacegroup of the material of the compound with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.748\n_cell_length_b 6.486\n_cell_length_c 5.727\n_cell_angle_alpha 90.915\n_cell_angle_beta 106.375\n_cell_angle_gamma 86.870\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiCr6(OF)4\n_chemical_formula_sum 'Li1 Cr6 O4 F4'\n_cell_volume 204.531\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.034 0.377 0.921 1.0\n Cr Cr1 1 0.003 0.532 0.465 1.0\n Cr Cr2 1 0.990 0.980 0.007 1.0\n Cr Cr3 1 0.476 0.129 0.768 1.0\n Cr Cr4 1 0.514 0.624 0.738 1.0\n Cr Cr5 1 0.498 0.377 0.273 1.0\n Cr Cr6 1 0.497 0.880 0.231 1.0\n O O7 1 0.340 0.379 0.549 1.0\n O O8 1 0.659 0.869 0.956 1.0\n O O9 1 0.298 0.145 0.042 1.0\n O O10 1 0.658 0.637 0.446 1.0\n F F11 1 0.258 0.607 0.951 1.0\n F F12 1 0.183 0.874 0.350 1.0\n F F13 1 0.820 0.136 0.688 1.0\n F F14 1 0.750 0.374 0.057 1.0\n[\/CIF]\n is P1."} {"text":"The spacegroup of the material of the compound with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.306\n_cell_length_b 4.306\n_cell_length_c 14.143\n_cell_angle_alpha 81.244\n_cell_angle_beta 98.756\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Ti8Ga3Co4Si\n_chemical_formula_sum 'Ti8 Ga3 Co4 Si1'\n_cell_volume 223.544\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n 
Ti Ti0 1 0.250 0.750 0.750 1.0\n Ti Ti1 1 0.500 0.500 0.501 1.0\n Ti Ti2 1 0.999 0.001 0.998 1.0\n Ti Ti3 1 0.750 0.250 0.249 1.0\n Ti Ti4 1 0.313 0.687 0.938 1.0\n Ti Ti5 1 0.563 0.437 0.690 1.0\n Ti Ti6 1 0.063 0.937 0.189 1.0\n Ti Ti7 1 0.813 0.187 0.438 1.0\n Ga Ga8 1 0.687 0.313 0.062 1.0\n Ga Ga9 1 0.437 0.563 0.312 1.0\n Ga Ga10 1 0.188 0.812 0.563 1.0\n Co Co11 1 0.375 0.625 0.125 1.0\n Co Co12 1 0.125 0.875 0.376 1.0\n Co Co13 1 0.877 0.123 0.630 1.0\n Co Co14 1 0.624 0.376 0.871 1.0\n Si Si15 1 0.936 0.064 0.809 1.0\n[\/CIF]\n is R3m."}", "/scratch/micpie/export/mp_self_supervised/test_0-6.jsonl": "{"text":"Question: What is the spacegroup of the material of the solid with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.508\n_cell_length_b 5.970\n_cell_length_c 6.879\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural VSiNi\n_chemical_formula_sum 'V4 Si4 Ni4'\n_cell_volume 144.098\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n V V0 1 0.250 0.026 0.334 1.0\n V V1 1 0.250 0.526 0.166 1.0\n V V2 1 0.750 0.974 0.666 1.0\n V V3 1 0.750 0.474 0.834 1.0\n Si Si4 1 0.250 0.271 0.630 1.0\n Si Si5 1 0.250 0.771 0.870 1.0\n Si Si6 1 0.750 0.729 0.370 1.0\n Si Si7 1 0.750 0.229 0.130 1.0\n Ni Ni8 1 0.250 0.144 0.940 1.0\n Ni Ni9 1 0.250 0.644 0.560 1.0\n Ni Ni10 1 0.750 0.856 0.060 1.0\n Ni Ni11 1 0.750 0.356 0.440 1.0\n[\/CIF]\n?\nAnswer: Pnma."} {"text":"Question: What is the spacegroup of the material with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.342\n_cell_length_b 5.342\n_cell_length_c 3.140\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 
120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural TmB2Rh3\n_chemical_formula_sum 'Tm1 B2 Rh3'\n_cell_volume 77.606\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Tm Tm0 1 0.000 0.000 0.500 1.0\n B B1 1 0.667 0.333 0.500 1.0\n B B2 1 0.333 0.667 0.500 1.0\n Rh Rh3 1 0.500 0.000 0.000 1.0\n Rh Rh4 1 0.500 0.500 0.000 1.0\n Rh Rh5 1 0.000 0.500 0.000 1.0\n[\/CIF]\n?\nAnswer: P6\/mmm."}", "/scratch/micpie/export/mp_self_supervised/train_0-10.jsonl": "{"text":"User: I have a compound with the following CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.748\n_cell_length_b 6.486\n_cell_length_c 5.727\n_cell_angle_alpha 90.915\n_cell_angle_beta 106.375\n_cell_angle_gamma 86.870\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiCr6(OF)4\n_chemical_formula_sum 'Li1 Cr6 O4 F4'\n_cell_volume 204.531\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.034 0.377 0.921 1.0\n Cr Cr1 1 0.003 0.532 0.465 1.0\n Cr Cr2 1 0.990 0.980 0.007 1.0\n Cr Cr3 1 0.476 0.129 0.768 1.0\n Cr Cr4 1 0.514 0.624 0.738 1.0\n Cr Cr5 1 0.498 0.377 0.273 1.0\n Cr Cr6 1 0.497 0.880 0.231 1.0\n O O7 1 0.340 0.379 0.549 1.0\n O O8 1 0.659 0.869 0.956 1.0\n O O9 1 0.298 0.145 0.042 1.0\n O O10 1 0.658 0.637 0.446 1.0\n F F11 1 0.258 0.607 0.951 1.0\n F F12 1 0.183 0.874 0.350 1.0\n F F13 1 0.820 0.136 0.688 1.0\n F F14 1 0.750 0.374 0.057 1.0\n[\/CIF]\n. Can you tell me the density?\nAssistant: The density of the compound is 3.73 g\/cm^3. 
\nUser: Yep, I also want to know the spacegroup of the material of this compound.\nAssistant: The spacegroup of the material of the compound is P1."} {"text":"User: I have a material with the following CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.306\n_cell_length_b 4.306\n_cell_length_c 14.143\n_cell_angle_alpha 81.244\n_cell_angle_beta 98.756\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Ti8Ga3Co4Si\n_chemical_formula_sum 'Ti8 Ga3 Co4 Si1'\n_cell_volume 223.544\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ti Ti0 1 0.250 0.750 0.750 1.0\n Ti Ti1 1 0.500 0.500 0.501 1.0\n Ti Ti2 1 0.999 0.001 0.998 1.0\n Ti Ti3 1 0.750 0.250 0.249 1.0\n Ti Ti4 1 0.313 0.687 0.938 1.0\n Ti Ti5 1 0.563 0.437 0.690 1.0\n Ti Ti6 1 0.063 0.937 0.189 1.0\n Ti Ti7 1 0.813 0.187 0.438 1.0\n Ga Ga8 1 0.687 0.313 0.062 1.0\n Ga Ga9 1 0.437 0.563 0.312 1.0\n Ga Ga10 1 0.188 0.812 0.563 1.0\n Co Co11 1 0.375 0.625 0.125 1.0\n Co Co12 1 0.125 0.875 0.376 1.0\n Co Co13 1 0.877 0.123 0.630 1.0\n Co Co14 1 0.624 0.376 0.871 1.0\n Si Si15 1 0.936 0.064 0.809 1.0\n[\/CIF]\n. What is the density?\nAssistant: The density of the material of the solid is 6.36 g\/cm^3. 
Is there anything else I can do for you?\nUser: I also want to know the spacegroup of the material of this solid.\nAssistant: The spacegroup of the material of the solid is R3m."}", "/scratch/micpie/export/mp_self_supervised/train_0-3.jsonl": "{"text":"The number of the spacegroup in the International Tables for Crystallography of the solid with the CIF card [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.748\n_cell_length_b 6.486\n_cell_length_c 5.727\n_cell_angle_alpha 90.915\n_cell_angle_beta 106.375\n_cell_angle_gamma 86.870\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiCr6(OF)4\n_chemical_formula_sum 'Li1 Cr6 O4 F4'\n_cell_volume 204.531\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.034 0.377 0.921 1.0\n Cr Cr1 1 0.003 0.532 0.465 1.0\n Cr Cr2 1 0.990 0.980 0.007 1.0\n Cr Cr3 1 0.476 0.129 0.768 1.0\n Cr Cr4 1 0.514 0.624 0.738 1.0\n Cr Cr5 1 0.498 0.377 0.273 1.0\n Cr Cr6 1 0.497 0.880 0.231 1.0\n O O7 1 0.340 0.379 0.549 1.0\n O O8 1 0.659 0.869 0.956 1.0\n O O9 1 0.298 0.145 0.042 1.0\n O O10 1 0.658 0.637 0.446 1.0\n F F11 1 0.258 0.607 0.951 1.0\n F F12 1 0.183 0.874 0.350 1.0\n F F13 1 0.820 0.136 0.688 1.0\n F F14 1 0.750 0.374 0.057 1.0\n[\/CIF]\n is 1."} {"text":"The spacegroup number of the material of the solid with the CIF card [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.306\n_cell_length_b 4.306\n_cell_length_c 14.143\n_cell_angle_alpha 81.244\n_cell_angle_beta 98.756\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Ti8Ga3Co4Si\n_chemical_formula_sum 'Ti8 Ga3 Co4 Si1'\n_cell_volume 223.544\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n 
_atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ti Ti0 1 0.250 0.750 0.750 1.0\n Ti Ti1 1 0.500 0.500 0.501 1.0\n Ti Ti2 1 0.999 0.001 0.998 1.0\n Ti Ti3 1 0.750 0.250 0.249 1.0\n Ti Ti4 1 0.313 0.687 0.938 1.0\n Ti Ti5 1 0.563 0.437 0.690 1.0\n Ti Ti6 1 0.063 0.937 0.189 1.0\n Ti Ti7 1 0.813 0.187 0.438 1.0\n Ga Ga8 1 0.687 0.313 0.062 1.0\n Ga Ga9 1 0.437 0.563 0.312 1.0\n Ga Ga10 1 0.188 0.812 0.563 1.0\n Co Co11 1 0.375 0.625 0.125 1.0\n Co Co12 1 0.125 0.875 0.376 1.0\n Co Co13 1 0.877 0.123 0.630 1.0\n Co Co14 1 0.624 0.376 0.871 1.0\n Si Si15 1 0.936 0.064 0.809 1.0\n[\/CIF]\n is 160."}", "/scratch/micpie/export/mp_self_supervised/valid_0-2.jsonl": "{"text":"The chemical formula of the compound with the CIF card [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 10.187\n_cell_length_b 5.762\n_cell_length_c 10.423\n_cell_angle_alpha 87.690\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural BaTaF7\n_chemical_formula_sum 'Ba4 Ta4 F28'\n_cell_volume 611.287\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ba Ba0 1 0.331 0.310 0.509 1.0\n Ba Ba1 1 0.831 0.690 0.991 1.0\n Ba Ba2 1 0.669 0.690 0.491 1.0\n Ba Ba3 1 0.169 0.310 0.009 1.0\n Ta Ta4 1 0.611 0.185 0.185 1.0\n Ta Ta5 1 0.389 0.815 0.815 1.0\n Ta Ta6 1 0.889 0.185 0.685 1.0\n Ta Ta7 1 0.111 0.815 0.315 1.0\n F F8 1 0.449 0.101 0.880 1.0\n F F9 1 0.050 0.331 0.635 1.0\n F F10 1 0.691 0.469 0.241 1.0\n F F11 1 0.309 0.531 0.759 1.0\n F F12 1 0.455 0.880 0.641 1.0\n F F13 1 0.804 0.256 0.523 1.0\n F F14 1 0.762 0.001 0.241 1.0\n F F15 1 0.955 0.120 0.859 1.0\n F F16 1 0.450 0.331 0.135 
1.0\n F F17 1 0.191 0.531 0.259 1.0\n F F18 1 0.051 0.101 0.380 1.0\n F F19 1 0.551 0.899 0.120 1.0\n F F20 1 0.262 0.999 0.259 1.0\n F F21 1 0.958 0.388 0.113 1.0\n F F22 1 0.545 0.120 0.359 1.0\n F F23 1 0.045 0.880 0.141 1.0\n F F24 1 0.238 0.999 0.759 1.0\n F F25 1 0.949 0.899 0.620 1.0\n F F26 1 0.458 0.612 0.387 1.0\n F F27 1 0.950 0.669 0.365 1.0\n F F28 1 0.042 0.612 0.887 1.0\n F F29 1 0.738 0.001 0.741 1.0\n F F30 1 0.542 0.388 0.613 1.0\n F F31 1 0.304 0.744 0.977 1.0\n F F32 1 0.196 0.744 0.477 1.0\n F F33 1 0.696 0.256 0.023 1.0\n F F34 1 0.550 0.669 0.865 1.0\n F F35 1 0.809 0.469 0.741 1.0\n[\/CIF]\n is BaTaF7."} {"text":"The composition of the compound with the CIF card [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.885\n_cell_length_b 6.885\n_cell_length_c 8.857\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 99.714\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Na3Al2P2O8F3\n_chemical_formula_sum 'Na6 Al4 P4 O16 F6'\n_cell_volume 413.828\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Na Na0 1 0.525 0.525 0.989 1.0\n Na Na1 1 0.761 0.761 0.222 1.0\n Na Na2 1 0.111 0.111 0.356 1.0\n Na Na3 1 0.239 0.239 0.722 1.0\n Na Na4 1 0.475 0.475 0.489 1.0\n Na Na5 1 0.889 0.889 0.856 1.0\n Al Al6 1 0.426 0.066 0.004 1.0\n Al Al7 1 0.066 0.426 0.004 1.0\n Al Al8 1 0.934 0.574 0.504 1.0\n Al Al9 1 0.574 0.934 0.504 1.0\n P P10 1 0.748 0.251 0.248 1.0\n P P11 1 0.251 0.748 0.248 1.0\n P P12 1 0.252 0.749 0.748 1.0\n P P13 1 0.749 0.252 0.748 1.0\n O O14 1 0.565 0.242 0.148 1.0\n O O15 1 0.737 0.064 0.348 1.0\n O O16 1 0.937 0.258 0.154 1.0\n O O17 1 0.762 0.439 0.348 1.0\n O O18 1 0.242 0.565 0.148 1.0\n O O19 1 0.064 0.737 0.348 1.0\n O O20 1 0.258 0.937 0.154 1.0\n O O21 1 
0.439 0.762 0.348 1.0\n O O22 1 0.063 0.742 0.654 1.0\n O O23 1 0.238 0.561 0.848 1.0\n O O24 1 0.435 0.758 0.648 1.0\n O O25 1 0.263 0.936 0.848 1.0\n O O26 1 0.742 0.063 0.654 1.0\n O O27 1 0.561 0.238 0.848 1.0\n O O28 1 0.758 0.435 0.648 1.0\n O O29 1 0.936 0.263 0.848 1.0\n F F30 1 0.250 0.250 0.996 1.0\n F F31 1 0.599 0.888 0.004 1.0\n F F32 1 0.888 0.599 0.004 1.0\n F F33 1 0.750 0.750 0.496 1.0\n F F34 1 0.112 0.401 0.504 1.0\n F F35 1 0.401 0.112 0.504 1.0\n[\/CIF]\n is Na3Al2P2O8F3."}", "/scratch/micpie/export/mp_self_supervised/valid_0-1.jsonl": "{"text":"The density of the compound with the CIF card [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 10.187\n_cell_length_b 5.762\n_cell_length_c 10.423\n_cell_angle_alpha 87.690\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural BaTaF7\n_chemical_formula_sum 'Ba4 Ta4 F28'\n_cell_volume 611.287\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ba Ba0 1 0.331 0.310 0.509 1.0\n Ba Ba1 1 0.831 0.690 0.991 1.0\n Ba Ba2 1 0.669 0.690 0.491 1.0\n Ba Ba3 1 0.169 0.310 0.009 1.0\n Ta Ta4 1 0.611 0.185 0.185 1.0\n Ta Ta5 1 0.389 0.815 0.815 1.0\n Ta Ta6 1 0.889 0.185 0.685 1.0\n Ta Ta7 1 0.111 0.815 0.315 1.0\n F F8 1 0.449 0.101 0.880 1.0\n F F9 1 0.050 0.331 0.635 1.0\n F F10 1 0.691 0.469 0.241 1.0\n F F11 1 0.309 0.531 0.759 1.0\n F F12 1 0.455 0.880 0.641 1.0\n F F13 1 0.804 0.256 0.523 1.0\n F F14 1 0.762 0.001 0.241 1.0\n F F15 1 0.955 0.120 0.859 1.0\n F F16 1 0.450 0.331 0.135 1.0\n F F17 1 0.191 0.531 0.259 1.0\n F F18 1 0.051 0.101 0.380 1.0\n F F19 1 0.551 0.899 0.120 1.0\n F F20 1 0.262 0.999 0.259 1.0\n F F21 1 0.958 0.388 0.113 1.0\n F F22 1 0.545 0.120 0.359 1.0\n F F23 1 0.045 0.880 0.141 1.0\n F 
F24 1 0.238 0.999 0.759 1.0\n F F25 1 0.949 0.899 0.620 1.0\n F F26 1 0.458 0.612 0.387 1.0\n F F27 1 0.950 0.669 0.365 1.0\n F F28 1 0.042 0.612 0.887 1.0\n F F29 1 0.738 0.001 0.741 1.0\n F F30 1 0.542 0.388 0.613 1.0\n F F31 1 0.304 0.744 0.977 1.0\n F F32 1 0.196 0.744 0.477 1.0\n F F33 1 0.696 0.256 0.023 1.0\n F F34 1 0.550 0.669 0.865 1.0\n F F35 1 0.809 0.469 0.741 1.0\n[\/CIF]\n is 4.90 g\/cm^3."} {"text":"The density of the solid with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.885\n_cell_length_b 6.885\n_cell_length_c 8.857\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 99.714\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Na3Al2P2O8F3\n_chemical_formula_sum 'Na6 Al4 P4 O16 F6'\n_cell_volume 413.828\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Na Na0 1 0.525 0.525 0.989 1.0\n Na Na1 1 0.761 0.761 0.222 1.0\n Na Na2 1 0.111 0.111 0.356 1.0\n Na Na3 1 0.239 0.239 0.722 1.0\n Na Na4 1 0.475 0.475 0.489 1.0\n Na Na5 1 0.889 0.889 0.856 1.0\n Al Al6 1 0.426 0.066 0.004 1.0\n Al Al7 1 0.066 0.426 0.004 1.0\n Al Al8 1 0.934 0.574 0.504 1.0\n Al Al9 1 0.574 0.934 0.504 1.0\n P P10 1 0.748 0.251 0.248 1.0\n P P11 1 0.251 0.748 0.248 1.0\n P P12 1 0.252 0.749 0.748 1.0\n P P13 1 0.749 0.252 0.748 1.0\n O O14 1 0.565 0.242 0.148 1.0\n O O15 1 0.737 0.064 0.348 1.0\n O O16 1 0.937 0.258 0.154 1.0\n O O17 1 0.762 0.439 0.348 1.0\n O O18 1 0.242 0.565 0.148 1.0\n O O19 1 0.064 0.737 0.348 1.0\n O O20 1 0.258 0.937 0.154 1.0\n O O21 1 0.439 0.762 0.348 1.0\n O O22 1 0.063 0.742 0.654 1.0\n O O23 1 0.238 0.561 0.848 1.0\n O O24 1 0.435 0.758 0.648 1.0\n O O25 1 0.263 0.936 0.848 1.0\n O O26 1 0.742 0.063 0.654 1.0\n O O27 1 0.561 0.238 0.848 1.0\n O O28 1 0.758 0.435 
0.648 1.0\n O O29 1 0.936 0.263 0.848 1.0\n F F30 1 0.250 0.250 0.996 1.0\n F F31 1 0.599 0.888 0.004 1.0\n F F32 1 0.888 0.599 0.004 1.0\n F F33 1 0.750 0.750 0.496 1.0\n F F34 1 0.112 0.401 0.504 1.0\n F F35 1 0.401 0.112 0.504 1.0\n[\/CIF]\n is 2.97 g\/cm^3."}", "/scratch/micpie/export/mp_self_supervised/valid_0-5.jsonl": "{"text":"Question: What is the density of the material with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 10.187\n_cell_length_b 5.762\n_cell_length_c 10.423\n_cell_angle_alpha 87.690\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural BaTaF7\n_chemical_formula_sum 'Ba4 Ta4 F28'\n_cell_volume 611.287\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ba Ba0 1 0.331 0.310 0.509 1.0\n Ba Ba1 1 0.831 0.690 0.991 1.0\n Ba Ba2 1 0.669 0.690 0.491 1.0\n Ba Ba3 1 0.169 0.310 0.009 1.0\n Ta Ta4 1 0.611 0.185 0.185 1.0\n Ta Ta5 1 0.389 0.815 0.815 1.0\n Ta Ta6 1 0.889 0.185 0.685 1.0\n Ta Ta7 1 0.111 0.815 0.315 1.0\n F F8 1 0.449 0.101 0.880 1.0\n F F9 1 0.050 0.331 0.635 1.0\n F F10 1 0.691 0.469 0.241 1.0\n F F11 1 0.309 0.531 0.759 1.0\n F F12 1 0.455 0.880 0.641 1.0\n F F13 1 0.804 0.256 0.523 1.0\n F F14 1 0.762 0.001 0.241 1.0\n F F15 1 0.955 0.120 0.859 1.0\n F F16 1 0.450 0.331 0.135 1.0\n F F17 1 0.191 0.531 0.259 1.0\n F F18 1 0.051 0.101 0.380 1.0\n F F19 1 0.551 0.899 0.120 1.0\n F F20 1 0.262 0.999 0.259 1.0\n F F21 1 0.958 0.388 0.113 1.0\n F F22 1 0.545 0.120 0.359 1.0\n F F23 1 0.045 0.880 0.141 1.0\n F F24 1 0.238 0.999 0.759 1.0\n F F25 1 0.949 0.899 0.620 1.0\n F F26 1 0.458 0.612 0.387 1.0\n F F27 1 0.950 0.669 0.365 1.0\n F F28 1 0.042 0.612 0.887 1.0\n F F29 1 0.738 0.001 0.741 1.0\n F F30 1 0.542 0.388 0.613 1.0\n 
F F31 1 0.304 0.744 0.977 1.0\n F F32 1 0.196 0.744 0.477 1.0\n F F33 1 0.696 0.256 0.023 1.0\n F F34 1 0.550 0.669 0.865 1.0\n F F35 1 0.809 0.469 0.741 1.0\n[\/CIF]\n?\nAnswer: 4.90 g\/cm^3."} {"text":"Question: What is the density of the material of the solid with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.885\n_cell_length_b 6.885\n_cell_length_c 8.857\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 99.714\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Na3Al2P2O8F3\n_chemical_formula_sum 'Na6 Al4 P4 O16 F6'\n_cell_volume 413.828\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Na Na0 1 0.525 0.525 0.989 1.0\n Na Na1 1 0.761 0.761 0.222 1.0\n Na Na2 1 0.111 0.111 0.356 1.0\n Na Na3 1 0.239 0.239 0.722 1.0\n Na Na4 1 0.475 0.475 0.489 1.0\n Na Na5 1 0.889 0.889 0.856 1.0\n Al Al6 1 0.426 0.066 0.004 1.0\n Al Al7 1 0.066 0.426 0.004 1.0\n Al Al8 1 0.934 0.574 0.504 1.0\n Al Al9 1 0.574 0.934 0.504 1.0\n P P10 1 0.748 0.251 0.248 1.0\n P P11 1 0.251 0.748 0.248 1.0\n P P12 1 0.252 0.749 0.748 1.0\n P P13 1 0.749 0.252 0.748 1.0\n O O14 1 0.565 0.242 0.148 1.0\n O O15 1 0.737 0.064 0.348 1.0\n O O16 1 0.937 0.258 0.154 1.0\n O O17 1 0.762 0.439 0.348 1.0\n O O18 1 0.242 0.565 0.148 1.0\n O O19 1 0.064 0.737 0.348 1.0\n O O20 1 0.258 0.937 0.154 1.0\n O O21 1 0.439 0.762 0.348 1.0\n O O22 1 0.063 0.742 0.654 1.0\n O O23 1 0.238 0.561 0.848 1.0\n O O24 1 0.435 0.758 0.648 1.0\n O O25 1 0.263 0.936 0.848 1.0\n O O26 1 0.742 0.063 0.654 1.0\n O O27 1 0.561 0.238 0.848 1.0\n O O28 1 0.758 0.435 0.648 1.0\n O O29 1 0.936 0.263 0.848 1.0\n F F30 1 0.250 0.250 0.996 1.0\n F F31 1 0.599 0.888 0.004 1.0\n F F32 1 0.888 0.599 0.004 1.0\n F F33 1 0.750 0.750 0.496 1.0\n F F34 1 
0.112 0.401 0.504 1.0\n F F35 1 0.401 0.112 0.504 1.0\n[\/CIF]\n?\nAnswer: 2.97 g\/cm^3."}", "/scratch/micpie/export/mp_self_supervised/valid_0-4.jsonl": "{"text":"User: I want to design a material with a particular density of the material, spacegroup of the material, and chemical formula.\nAssistant: Cool, I would need to know the density of the material, spacegroup of the material, and chemical formula of the material you want to design.\nUser: The density of the material should be 4.90 g\/cm^3, the spacegroup of the material should be P2_1\/c, and the chemical formula should be BaTaF7.\nAssistant: I suggest the solid with the CIF card [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 10.187\n_cell_length_b 5.762\n_cell_length_c 10.423\n_cell_angle_alpha 87.690\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural BaTaF7\n_chemical_formula_sum 'Ba4 Ta4 F28'\n_cell_volume 611.287\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ba Ba0 1 0.331 0.310 0.509 1.0\n Ba Ba1 1 0.831 0.690 0.991 1.0\n Ba Ba2 1 0.669 0.690 0.491 1.0\n Ba Ba3 1 0.169 0.310 0.009 1.0\n Ta Ta4 1 0.611 0.185 0.185 1.0\n Ta Ta5 1 0.389 0.815 0.815 1.0\n Ta Ta6 1 0.889 0.185 0.685 1.0\n Ta Ta7 1 0.111 0.815 0.315 1.0\n F F8 1 0.449 0.101 0.880 1.0\n F F9 1 0.050 0.331 0.635 1.0\n F F10 1 0.691 0.469 0.241 1.0\n F F11 1 0.309 0.531 0.759 1.0\n F F12 1 0.455 0.880 0.641 1.0\n F F13 1 0.804 0.256 0.523 1.0\n F F14 1 0.762 0.001 0.241 1.0\n F F15 1 0.955 0.120 0.859 1.0\n F F16 1 0.450 0.331 0.135 1.0\n F F17 1 0.191 0.531 0.259 1.0\n F F18 1 0.051 0.101 0.380 1.0\n F F19 1 0.551 0.899 0.120 1.0\n F F20 1 0.262 0.999 0.259 1.0\n F F21 1 0.958 0.388 0.113 1.0\n F F22 1 0.545 0.120 0.359 1.0\n F F23 
1 0.045 0.880 0.141 1.0\n F F24 1 0.238 0.999 0.759 1.0\n F F25 1 0.949 0.899 0.620 1.0\n F F26 1 0.458 0.612 0.387 1.0\n F F27 1 0.950 0.669 0.365 1.0\n F F28 1 0.042 0.612 0.887 1.0\n F F29 1 0.738 0.001 0.741 1.0\n F F30 1 0.542 0.388 0.613 1.0\n F F31 1 0.304 0.744 0.977 1.0\n F F32 1 0.196 0.744 0.477 1.0\n F F33 1 0.696 0.256 0.023 1.0\n F F34 1 0.550 0.669 0.865 1.0\n F F35 1 0.809 0.469 0.741 1.0\n[\/CIF]\n."} {"text":"User: I want to design a material with a particular density of the material, spacegroup of the material, and reduced formula.\nAssistant: I would need to know the density of the material, spacegroup of the material, and reduced formula of the material you want to design.\nUser: The density of the material should be 2.97 g\/cm^3, the spacegroup of the material should be Cmc2_1, and the reduced formula should be Na3Al2P2O8F3.\nAssistant: I advise the compound with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.885\n_cell_length_b 6.885\n_cell_length_c 8.857\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 99.714\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Na3Al2P2O8F3\n_chemical_formula_sum 'Na6 Al4 P4 O16 F6'\n_cell_volume 413.828\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Na Na0 1 0.525 0.525 0.989 1.0\n Na Na1 1 0.761 0.761 0.222 1.0\n Na Na2 1 0.111 0.111 0.356 1.0\n Na Na3 1 0.239 0.239 0.722 1.0\n Na Na4 1 0.475 0.475 0.489 1.0\n Na Na5 1 0.889 0.889 0.856 1.0\n Al Al6 1 0.426 0.066 0.004 1.0\n Al Al7 1 0.066 0.426 0.004 1.0\n Al Al8 1 0.934 0.574 0.504 1.0\n Al Al9 1 0.574 0.934 0.504 1.0\n P P10 1 0.748 0.251 0.248 1.0\n P P11 1 0.251 0.748 0.248 1.0\n P P12 1 0.252 0.749 0.748 1.0\n P P13 1 0.749 0.252 0.748 1.0\n O O14 1 0.565 0.242 
0.148 1.0\n O O15 1 0.737 0.064 0.348 1.0\n O O16 1 0.937 0.258 0.154 1.0\n O O17 1 0.762 0.439 0.348 1.0\n O O18 1 0.242 0.565 0.148 1.0\n O O19 1 0.064 0.737 0.348 1.0\n O O20 1 0.258 0.937 0.154 1.0\n O O21 1 0.439 0.762 0.348 1.0\n O O22 1 0.063 0.742 0.654 1.0\n O O23 1 0.238 0.561 0.848 1.0\n O O24 1 0.435 0.758 0.648 1.0\n O O25 1 0.263 0.936 0.848 1.0\n O O26 1 0.742 0.063 0.654 1.0\n O O27 1 0.561 0.238 0.848 1.0\n O O28 1 0.758 0.435 0.648 1.0\n O O29 1 0.936 0.263 0.848 1.0\n F F30 1 0.250 0.250 0.996 1.0\n F F31 1 0.599 0.888 0.004 1.0\n F F32 1 0.888 0.599 0.004 1.0\n F F33 1 0.750 0.750 0.496 1.0\n F F34 1 0.112 0.401 0.504 1.0\n F F35 1 0.401 0.112 0.504 1.0\n[\/CIF]\n."}", "/scratch/micpie/export/mp_self_supervised/train_0-5.jsonl": "{"text":"Question: What is the density of the material of the compound with the CIF file [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.748\n_cell_length_b 6.486\n_cell_length_c 5.727\n_cell_angle_alpha 90.915\n_cell_angle_beta 106.375\n_cell_angle_gamma 86.870\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiCr6(OF)4\n_chemical_formula_sum 'Li1 Cr6 O4 F4'\n_cell_volume 204.531\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.034 0.377 0.921 1.0\n Cr Cr1 1 0.003 0.532 0.465 1.0\n Cr Cr2 1 0.990 0.980 0.007 1.0\n Cr Cr3 1 0.476 0.129 0.768 1.0\n Cr Cr4 1 0.514 0.624 0.738 1.0\n Cr Cr5 1 0.498 0.377 0.273 1.0\n Cr Cr6 1 0.497 0.880 0.231 1.0\n O O7 1 0.340 0.379 0.549 1.0\n O O8 1 0.659 0.869 0.956 1.0\n O O9 1 0.298 0.145 0.042 1.0\n O O10 1 0.658 0.637 0.446 1.0\n F F11 1 0.258 0.607 0.951 1.0\n F F12 1 0.183 0.874 0.350 1.0\n F F13 1 0.820 0.136 0.688 1.0\n F F14 1 0.750 0.374 0.057 1.0\n[\/CIF]\n?\nAnswer: 3.73 g\/cm^3."} {"text":"Question: 
What is the density of the material of the solid with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.306\n_cell_length_b 4.306\n_cell_length_c 14.143\n_cell_angle_alpha 81.244\n_cell_angle_beta 98.756\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Ti8Ga3Co4Si\n_chemical_formula_sum 'Ti8 Ga3 Co4 Si1'\n_cell_volume 223.544\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ti Ti0 1 0.250 0.750 0.750 1.0\n Ti Ti1 1 0.500 0.500 0.501 1.0\n Ti Ti2 1 0.999 0.001 0.998 1.0\n Ti Ti3 1 0.750 0.250 0.249 1.0\n Ti Ti4 1 0.313 0.687 0.938 1.0\n Ti Ti5 1 0.563 0.437 0.690 1.0\n Ti Ti6 1 0.063 0.937 0.189 1.0\n Ti Ti7 1 0.813 0.187 0.438 1.0\n Ga Ga8 1 0.687 0.313 0.062 1.0\n Ga Ga9 1 0.437 0.563 0.312 1.0\n Ga Ga10 1 0.188 0.812 0.563 1.0\n Co Co11 1 0.375 0.625 0.125 1.0\n Co Co12 1 0.125 0.875 0.376 1.0\n Co Co13 1 0.877 0.123 0.630 1.0\n Co Co14 1 0.624 0.376 0.871 1.0\n Si Si15 1 0.936 0.064 0.809 1.0\n[\/CIF]\n?\nAnswer: 6.36 g\/cm^3."}", "/scratch/micpie/export/mp_self_supervised/train_0-2.jsonl": "{"text":"The reduced formula of the solid with the CIF card [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.748\n_cell_length_b 6.486\n_cell_length_c 5.727\n_cell_angle_alpha 90.915\n_cell_angle_beta 106.375\n_cell_angle_gamma 86.870\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiCr6(OF)4\n_chemical_formula_sum 'Li1 Cr6 O4 F4'\n_cell_volume 204.531\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.034 0.377 
0.921 1.0\n Cr Cr1 1 0.003 0.532 0.465 1.0\n Cr Cr2 1 0.990 0.980 0.007 1.0\n Cr Cr3 1 0.476 0.129 0.768 1.0\n Cr Cr4 1 0.514 0.624 0.738 1.0\n Cr Cr5 1 0.498 0.377 0.273 1.0\n Cr Cr6 1 0.497 0.880 0.231 1.0\n O O7 1 0.340 0.379 0.549 1.0\n O O8 1 0.659 0.869 0.956 1.0\n O O9 1 0.298 0.145 0.042 1.0\n O O10 1 0.658 0.637 0.446 1.0\n F F11 1 0.258 0.607 0.951 1.0\n F F12 1 0.183 0.874 0.350 1.0\n F F13 1 0.820 0.136 0.688 1.0\n F F14 1 0.750 0.374 0.057 1.0\n[\/CIF]\n is LiCr6(OF)4."} {"text":"The reduced formula of the solid with the CIF file [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.306\n_cell_length_b 4.306\n_cell_length_c 14.143\n_cell_angle_alpha 81.244\n_cell_angle_beta 98.756\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Ti8Ga3Co4Si\n_chemical_formula_sum 'Ti8 Ga3 Co4 Si1'\n_cell_volume 223.544\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ti Ti0 1 0.250 0.750 0.750 1.0\n Ti Ti1 1 0.500 0.500 0.501 1.0\n Ti Ti2 1 0.999 0.001 0.998 1.0\n Ti Ti3 1 0.750 0.250 0.249 1.0\n Ti Ti4 1 0.313 0.687 0.938 1.0\n Ti Ti5 1 0.563 0.437 0.690 1.0\n Ti Ti6 1 0.063 0.937 0.189 1.0\n Ti Ti7 1 0.813 0.187 0.438 1.0\n Ga Ga8 1 0.687 0.313 0.062 1.0\n Ga Ga9 1 0.437 0.563 0.312 1.0\n Ga Ga10 1 0.188 0.812 0.563 1.0\n Co Co11 1 0.375 0.625 0.125 1.0\n Co Co12 1 0.125 0.875 0.376 1.0\n Co Co13 1 0.877 0.123 0.630 1.0\n Co Co14 1 0.624 0.376 0.871 1.0\n Si Si15 1 0.936 0.064 0.809 1.0\n[\/CIF]\n is Ti8Ga3Co4Si."}", "/scratch/micpie/export/mp_self_supervised/train_0-7.jsonl": "{"text":"Question: What is the composition of the solid with the CIF file [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.748\n_cell_length_b 6.486\n_cell_length_c 
5.727\n_cell_angle_alpha 90.915\n_cell_angle_beta 106.375\n_cell_angle_gamma 86.870\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiCr6(OF)4\n_chemical_formula_sum 'Li1 Cr6 O4 F4'\n_cell_volume 204.531\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.034 0.377 0.921 1.0\n Cr Cr1 1 0.003 0.532 0.465 1.0\n Cr Cr2 1 0.990 0.980 0.007 1.0\n Cr Cr3 1 0.476 0.129 0.768 1.0\n Cr Cr4 1 0.514 0.624 0.738 1.0\n Cr Cr5 1 0.498 0.377 0.273 1.0\n Cr Cr6 1 0.497 0.880 0.231 1.0\n O O7 1 0.340 0.379 0.549 1.0\n O O8 1 0.659 0.869 0.956 1.0\n O O9 1 0.298 0.145 0.042 1.0\n O O10 1 0.658 0.637 0.446 1.0\n F F11 1 0.258 0.607 0.951 1.0\n F F12 1 0.183 0.874 0.350 1.0\n F F13 1 0.820 0.136 0.688 1.0\n F F14 1 0.750 0.374 0.057 1.0\n[\/CIF]\n?\nAnswer: LiCr6(OF)4."} {"text":"Question: What is the composition of the compound with the CIF file [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.306\n_cell_length_b 4.306\n_cell_length_c 14.143\n_cell_angle_alpha 81.244\n_cell_angle_beta 98.756\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Ti8Ga3Co4Si\n_chemical_formula_sum 'Ti8 Ga3 Co4 Si1'\n_cell_volume 223.544\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ti Ti0 1 0.250 0.750 0.750 1.0\n Ti Ti1 1 0.500 0.500 0.501 1.0\n Ti Ti2 1 0.999 0.001 0.998 1.0\n Ti Ti3 1 0.750 0.250 0.249 1.0\n Ti Ti4 1 0.313 0.687 0.938 1.0\n Ti Ti5 1 0.563 0.437 0.690 1.0\n Ti Ti6 1 0.063 0.937 0.189 1.0\n Ti Ti7 1 0.813 0.187 0.438 1.0\n Ga Ga8 1 0.687 0.313 0.062 
1.0\n Ga Ga9 1 0.437 0.563 0.312 1.0\n Ga Ga10 1 0.188 0.812 0.563 1.0\n Co Co11 1 0.375 0.625 0.125 1.0\n Co Co12 1 0.125 0.875 0.376 1.0\n Co Co13 1 0.877 0.123 0.630 1.0\n Co Co14 1 0.624 0.376 0.871 1.0\n Si Si15 1 0.936 0.064 0.809 1.0\n[\/CIF]\n?\nAnswer: Ti8Ga3Co4Si."}", "/scratch/micpie/export/mp_self_supervised/train_0-1.jsonl": "{"text":"The density of the material of the compound with the CIF file [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.748\n_cell_length_b 6.486\n_cell_length_c 5.727\n_cell_angle_alpha 90.915\n_cell_angle_beta 106.375\n_cell_angle_gamma 86.870\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiCr6(OF)4\n_chemical_formula_sum 'Li1 Cr6 O4 F4'\n_cell_volume 204.531\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.034 0.377 0.921 1.0\n Cr Cr1 1 0.003 0.532 0.465 1.0\n Cr Cr2 1 0.990 0.980 0.007 1.0\n Cr Cr3 1 0.476 0.129 0.768 1.0\n Cr Cr4 1 0.514 0.624 0.738 1.0\n Cr Cr5 1 0.498 0.377 0.273 1.0\n Cr Cr6 1 0.497 0.880 0.231 1.0\n O O7 1 0.340 0.379 0.549 1.0\n O O8 1 0.659 0.869 0.956 1.0\n O O9 1 0.298 0.145 0.042 1.0\n O O10 1 0.658 0.637 0.446 1.0\n F F11 1 0.258 0.607 0.951 1.0\n F F12 1 0.183 0.874 0.350 1.0\n F F13 1 0.820 0.136 0.688 1.0\n F F14 1 0.750 0.374 0.057 1.0\n[\/CIF]\n is 3.73 g\/cm^3."} {"text":"The density of the material of the compound with the CIF file [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.306\n_cell_length_b 4.306\n_cell_length_c 14.143\n_cell_angle_alpha 81.244\n_cell_angle_beta 98.756\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Ti8Ga3Co4Si\n_chemical_formula_sum 'Ti8 Ga3 Co4 Si1'\n_cell_volume 223.544\n_cell_formula_units_Z 1\nloop_\n 
_symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ti Ti0 1 0.250 0.750 0.750 1.0\n Ti Ti1 1 0.500 0.500 0.501 1.0\n Ti Ti2 1 0.999 0.001 0.998 1.0\n Ti Ti3 1 0.750 0.250 0.249 1.0\n Ti Ti4 1 0.313 0.687 0.938 1.0\n Ti Ti5 1 0.563 0.437 0.690 1.0\n Ti Ti6 1 0.063 0.937 0.189 1.0\n Ti Ti7 1 0.813 0.187 0.438 1.0\n Ga Ga8 1 0.687 0.313 0.062 1.0\n Ga Ga9 1 0.437 0.563 0.312 1.0\n Ga Ga10 1 0.188 0.812 0.563 1.0\n Co Co11 1 0.375 0.625 0.125 1.0\n Co Co12 1 0.125 0.875 0.376 1.0\n Co Co13 1 0.877 0.123 0.630 1.0\n Co Co14 1 0.624 0.376 0.871 1.0\n Si Si15 1 0.936 0.064 0.809 1.0\n[\/CIF]\n is 6.36 g\/cm^3."}", "/scratch/micpie/export/mp_self_supervised/train_0-4.jsonl": "{"text":"User: I want to design a material with a particular density of the material, spacegroup, and reduced formula.\nAssistant: Cool, I would need to know the density of the material, spacegroup, and reduced formula of the material you want to design.\nUser: The density of the material should be 3.73 g\/cm^3, the spacegroup should be P1, and the reduced formula should be LiCr6(OF)4.\nAssistant: I the material with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.748\n_cell_length_b 6.486\n_cell_length_c 5.727\n_cell_angle_alpha 90.915\n_cell_angle_beta 106.375\n_cell_angle_gamma 86.870\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiCr6(OF)4\n_chemical_formula_sum 'Li1 Cr6 O4 F4'\n_cell_volume 204.531\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.034 0.377 0.921 1.0\n Cr Cr1 1 0.003 0.532 0.465 1.0\n Cr Cr2 1 0.990 0.980 
0.007 1.0\n Cr Cr3 1 0.476 0.129 0.768 1.0\n Cr Cr4 1 0.514 0.624 0.738 1.0\n Cr Cr5 1 0.498 0.377 0.273 1.0\n Cr Cr6 1 0.497 0.880 0.231 1.0\n O O7 1 0.340 0.379 0.549 1.0\n O O8 1 0.659 0.869 0.956 1.0\n O O9 1 0.298 0.145 0.042 1.0\n O O10 1 0.658 0.637 0.446 1.0\n F F11 1 0.258 0.607 0.951 1.0\n F F12 1 0.183 0.874 0.350 1.0\n F F13 1 0.820 0.136 0.688 1.0\n F F14 1 0.750 0.374 0.057 1.0\n[\/CIF]\n."} {"text":"User: I want to design a material with a particular density of the material, spacegroup of the material, and reduced formula.\nAssistant: That sounds interesting, I would need to know the density of the material, spacegroup of the material, and reduced formula of the material you want to design.\nUser: The density of the material should be 6.36 g\/cm^3, the spacegroup of the material should be R3m, and the reduced formula should be Ti8Ga3Co4Si.\nAssistant: I propose the solid with the CIF card [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.306\n_cell_length_b 4.306\n_cell_length_c 14.143\n_cell_angle_alpha 81.244\n_cell_angle_beta 98.756\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Ti8Ga3Co4Si\n_chemical_formula_sum 'Ti8 Ga3 Co4 Si1'\n_cell_volume 223.544\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ti Ti0 1 0.250 0.750 0.750 1.0\n Ti Ti1 1 0.500 0.500 0.501 1.0\n Ti Ti2 1 0.999 0.001 0.998 1.0\n Ti Ti3 1 0.750 0.250 0.249 1.0\n Ti Ti4 1 0.313 0.687 0.938 1.0\n Ti Ti5 1 0.563 0.437 0.690 1.0\n Ti Ti6 1 0.063 0.937 0.189 1.0\n Ti Ti7 1 0.813 0.187 0.438 1.0\n Ga Ga8 1 0.687 0.313 0.062 1.0\n Ga Ga9 1 0.437 0.563 0.312 1.0\n Ga Ga10 1 0.188 0.812 0.563 1.0\n Co Co11 1 0.375 0.625 0.125 1.0\n Co Co12 1 0.125 0.875 0.376 1.0\n Co Co13 1 0.877 0.123 0.630 1.0\n Co 
Co14 1 0.624 0.376 0.871 1.0\n Si Si15 1 0.936 0.064 0.809 1.0\n[\/CIF]\n."}", "/scratch/micpie/export/mp_self_supervised/test_0-7.jsonl": "{"text":"Question: What is the composition of the compound with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.508\n_cell_length_b 5.970\n_cell_length_c 6.879\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural VSiNi\n_chemical_formula_sum 'V4 Si4 Ni4'\n_cell_volume 144.098\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n V V0 1 0.250 0.026 0.334 1.0\n V V1 1 0.250 0.526 0.166 1.0\n V V2 1 0.750 0.974 0.666 1.0\n V V3 1 0.750 0.474 0.834 1.0\n Si Si4 1 0.250 0.271 0.630 1.0\n Si Si5 1 0.250 0.771 0.870 1.0\n Si Si6 1 0.750 0.729 0.370 1.0\n Si Si7 1 0.750 0.229 0.130 1.0\n Ni Ni8 1 0.250 0.144 0.940 1.0\n Ni Ni9 1 0.250 0.644 0.560 1.0\n Ni Ni10 1 0.750 0.856 0.060 1.0\n Ni Ni11 1 0.750 0.356 0.440 1.0\n[\/CIF]\n?\nAnswer: VSiNi."} {"text":"Question: What is the reduced formula of the compound with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.342\n_cell_length_b 5.342\n_cell_length_c 3.140\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural TmB2Rh3\n_chemical_formula_sum 'Tm1 B2 Rh3'\n_cell_volume 77.606\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Tm Tm0 1 0.000 0.000 0.500 1.0\n B B1 1 0.667 0.333 0.500 1.0\n B B2 1 0.333 0.667 0.500 
1.0\n Rh Rh3 1 0.500 0.000 0.000 1.0\n Rh Rh4 1 0.500 0.500 0.000 1.0\n Rh Rh5 1 0.000 0.500 0.000 1.0\n[\/CIF]\n?\nAnswer: TmB2Rh3."}", "/scratch/micpie/export/mp_self_supervised/train_0-9.jsonl": "{"text":"User: I want to design a material with a density of the material of 3.73 g\/cm^3, and a composition of LiCr6(OF)4.\nAssistant: Great, I suggest the material with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.748\n_cell_length_b 6.486\n_cell_length_c 5.727\n_cell_angle_alpha 90.915\n_cell_angle_beta 106.375\n_cell_angle_gamma 86.870\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiCr6(OF)4\n_chemical_formula_sum 'Li1 Cr6 O4 F4'\n_cell_volume 204.531\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.034 0.377 0.921 1.0\n Cr Cr1 1 0.003 0.532 0.465 1.0\n Cr Cr2 1 0.990 0.980 0.007 1.0\n Cr Cr3 1 0.476 0.129 0.768 1.0\n Cr Cr4 1 0.514 0.624 0.738 1.0\n Cr Cr5 1 0.498 0.377 0.273 1.0\n Cr Cr6 1 0.497 0.880 0.231 1.0\n O O7 1 0.340 0.379 0.549 1.0\n O O8 1 0.659 0.869 0.956 1.0\n O O9 1 0.298 0.145 0.042 1.0\n O O10 1 0.658 0.637 0.446 1.0\n F F11 1 0.258 0.607 0.951 1.0\n F F12 1 0.183 0.874 0.350 1.0\n F F13 1 0.820 0.136 0.688 1.0\n F F14 1 0.750 0.374 0.057 1.0\n[\/CIF]\n. 
\nUser: Yeah, I also want to know the spacegroup of the material of this material.\nAssistant: The spacegroup of the material of the material is P1."} {"text":"User: I want to design a solid with a density of 6.36 g\/cm^3, and a chemical formula of Ti8Ga3Co4Si.\nAssistant: Great, I suggest the solid with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.306\n_cell_length_b 4.306\n_cell_length_c 14.143\n_cell_angle_alpha 81.244\n_cell_angle_beta 98.756\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Ti8Ga3Co4Si\n_chemical_formula_sum 'Ti8 Ga3 Co4 Si1'\n_cell_volume 223.544\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ti Ti0 1 0.250 0.750 0.750 1.0\n Ti Ti1 1 0.500 0.500 0.501 1.0\n Ti Ti2 1 0.999 0.001 0.998 1.0\n Ti Ti3 1 0.750 0.250 0.249 1.0\n Ti Ti4 1 0.313 0.687 0.938 1.0\n Ti Ti5 1 0.563 0.437 0.690 1.0\n Ti Ti6 1 0.063 0.937 0.189 1.0\n Ti Ti7 1 0.813 0.187 0.438 1.0\n Ga Ga8 1 0.687 0.313 0.062 1.0\n Ga Ga9 1 0.437 0.563 0.312 1.0\n Ga Ga10 1 0.188 0.812 0.563 1.0\n Co Co11 1 0.375 0.625 0.125 1.0\n Co Co12 1 0.125 0.875 0.376 1.0\n Co Co13 1 0.877 0.123 0.630 1.0\n Co Co14 1 0.624 0.376 0.871 1.0\n Si Si15 1 0.936 0.064 0.809 1.0\n[\/CIF]\n. 
Is there anything else I can do for you?\nUser: Indeed, I also want to know the spacegroup of the material of this solid.\nAssistant: The spacegroup of the material of the solid is R3m."}", "/scratch/micpie/export/mp_self_supervised/valid_0-3.jsonl": "{"text":"The spacegroup number of the solid with the CIF card [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 10.187\n_cell_length_b 5.762\n_cell_length_c 10.423\n_cell_angle_alpha 87.690\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural BaTaF7\n_chemical_formula_sum 'Ba4 Ta4 F28'\n_cell_volume 611.287\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Ba Ba0 1 0.331 0.310 0.509 1.0\n Ba Ba1 1 0.831 0.690 0.991 1.0\n Ba Ba2 1 0.669 0.690 0.491 1.0\n Ba Ba3 1 0.169 0.310 0.009 1.0\n Ta Ta4 1 0.611 0.185 0.185 1.0\n Ta Ta5 1 0.389 0.815 0.815 1.0\n Ta Ta6 1 0.889 0.185 0.685 1.0\n Ta Ta7 1 0.111 0.815 0.315 1.0\n F F8 1 0.449 0.101 0.880 1.0\n F F9 1 0.050 0.331 0.635 1.0\n F F10 1 0.691 0.469 0.241 1.0\n F F11 1 0.309 0.531 0.759 1.0\n F F12 1 0.455 0.880 0.641 1.0\n F F13 1 0.804 0.256 0.523 1.0\n F F14 1 0.762 0.001 0.241 1.0\n F F15 1 0.955 0.120 0.859 1.0\n F F16 1 0.450 0.331 0.135 1.0\n F F17 1 0.191 0.531 0.259 1.0\n F F18 1 0.051 0.101 0.380 1.0\n F F19 1 0.551 0.899 0.120 1.0\n F F20 1 0.262 0.999 0.259 1.0\n F F21 1 0.958 0.388 0.113 1.0\n F F22 1 0.545 0.120 0.359 1.0\n F F23 1 0.045 0.880 0.141 1.0\n F F24 1 0.238 0.999 0.759 1.0\n F F25 1 0.949 0.899 0.620 1.0\n F F26 1 0.458 0.612 0.387 1.0\n F F27 1 0.950 0.669 0.365 1.0\n F F28 1 0.042 0.612 0.887 1.0\n F F29 1 0.738 0.001 0.741 1.0\n F F30 1 0.542 0.388 0.613 1.0\n F F31 1 0.304 0.744 0.977 1.0\n F F32 1 0.196 0.744 0.477 1.0\n F F33 1 0.696 
0.256 0.023 1.0\n F F34 1 0.550 0.669 0.865 1.0\n F F35 1 0.809 0.469 0.741 1.0\n[\/CIF]\n is 14."} {"text":"The spacegroup number of the material with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.885\n_cell_length_b 6.885\n_cell_length_c 8.857\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 99.714\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural Na3Al2P2O8F3\n_chemical_formula_sum 'Na6 Al4 P4 O16 F6'\n_cell_volume 413.828\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Na Na0 1 0.525 0.525 0.989 1.0\n Na Na1 1 0.761 0.761 0.222 1.0\n Na Na2 1 0.111 0.111 0.356 1.0\n Na Na3 1 0.239 0.239 0.722 1.0\n Na Na4 1 0.475 0.475 0.489 1.0\n Na Na5 1 0.889 0.889 0.856 1.0\n Al Al6 1 0.426 0.066 0.004 1.0\n Al Al7 1 0.066 0.426 0.004 1.0\n Al Al8 1 0.934 0.574 0.504 1.0\n Al Al9 1 0.574 0.934 0.504 1.0\n P P10 1 0.748 0.251 0.248 1.0\n P P11 1 0.251 0.748 0.248 1.0\n P P12 1 0.252 0.749 0.748 1.0\n P P13 1 0.749 0.252 0.748 1.0\n O O14 1 0.565 0.242 0.148 1.0\n O O15 1 0.737 0.064 0.348 1.0\n O O16 1 0.937 0.258 0.154 1.0\n O O17 1 0.762 0.439 0.348 1.0\n O O18 1 0.242 0.565 0.148 1.0\n O O19 1 0.064 0.737 0.348 1.0\n O O20 1 0.258 0.937 0.154 1.0\n O O21 1 0.439 0.762 0.348 1.0\n O O22 1 0.063 0.742 0.654 1.0\n O O23 1 0.238 0.561 0.848 1.0\n O O24 1 0.435 0.758 0.648 1.0\n O O25 1 0.263 0.936 0.848 1.0\n O O26 1 0.742 0.063 0.654 1.0\n O O27 1 0.561 0.238 0.848 1.0\n O O28 1 0.758 0.435 0.648 1.0\n O O29 1 0.936 0.263 0.848 1.0\n F F30 1 0.250 0.250 0.996 1.0\n F F31 1 0.599 0.888 0.004 1.0\n F F32 1 0.888 0.599 0.004 1.0\n F F33 1 0.750 0.750 0.496 1.0\n F F34 1 0.112 0.401 0.504 1.0\n F F35 1 0.401 0.112 0.504 1.0\n[\/CIF]\n is 36."}", 
"/scratch/micpie/export/mp_self_supervised/test_0-8.jsonl": "{"text":"Question: What is the number of the spacegroup in the International Tables for Crystallography of the solid with the CIF card [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.508\n_cell_length_b 5.970\n_cell_length_c 6.879\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural VSiNi\n_chemical_formula_sum 'V4 Si4 Ni4'\n_cell_volume 144.098\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n V V0 1 0.250 0.026 0.334 1.0\n V V1 1 0.250 0.526 0.166 1.0\n V V2 1 0.750 0.974 0.666 1.0\n V V3 1 0.750 0.474 0.834 1.0\n Si Si4 1 0.250 0.271 0.630 1.0\n Si Si5 1 0.250 0.771 0.870 1.0\n Si Si6 1 0.750 0.729 0.370 1.0\n Si Si7 1 0.750 0.229 0.130 1.0\n Ni Ni8 1 0.250 0.144 0.940 1.0\n Ni Ni9 1 0.250 0.644 0.560 1.0\n Ni Ni10 1 0.750 0.856 0.060 1.0\n Ni Ni11 1 0.750 0.356 0.440 1.0\n[\/CIF]\n?\nAnswer: 62."} {"text":"Question: What is the number of the spacegroup in the International Tables for Crystallography of the material with the CIF [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.342\n_cell_length_b 5.342\n_cell_length_c 3.140\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural TmB2Rh3\n_chemical_formula_sum 'Tm1 B2 Rh3'\n_cell_volume 77.606\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Tm Tm0 1 0.000 0.000 0.500 1.0\n B B1 1 0.667 0.333 
0.500 1.0\n B B2 1 0.333 0.667 0.500 1.0\n Rh Rh3 1 0.500 0.000 0.000 1.0\n Rh Rh4 1 0.500 0.500 0.000 1.0\n Rh Rh5 1 0.000 0.500 0.000 1.0\n[\/CIF]\n?\nAnswer: 191."}", "/scratch/micpie/export/mp_self_supervised/test_0-4.jsonl": "{"text":"User: I want to design a material with a particular density of the material, spacegroup of the material, and composition.\nAssistant: I would need to know the density of the material, spacegroup of the material, and composition of the material you want to design.\nUser: The density of the material should be 6.35 g\/cm^3, the spacegroup of the material should be Pnma, and the composition should be VSiNi.\nAssistant: I propose the material with the CIF card [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.508\n_cell_length_b 5.970\n_cell_length_c 6.879\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural VSiNi\n_chemical_formula_sum 'V4 Si4 Ni4'\n_cell_volume 144.098\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n V V0 1 0.250 0.026 0.334 1.0\n V V1 1 0.250 0.526 0.166 1.0\n V V2 1 0.750 0.974 0.666 1.0\n V V3 1 0.750 0.474 0.834 1.0\n Si Si4 1 0.250 0.271 0.630 1.0\n Si Si5 1 0.250 0.771 0.870 1.0\n Si Si6 1 0.750 0.729 0.370 1.0\n Si Si7 1 0.750 0.229 0.130 1.0\n Ni Ni8 1 0.250 0.144 0.940 1.0\n Ni Ni9 1 0.250 0.644 0.560 1.0\n Ni Ni10 1 0.750 0.856 0.060 1.0\n Ni Ni11 1 0.750 0.356 0.440 1.0\n[\/CIF]\n."} {"text":"User: I want to design a material with a particular density of the material, spacegroup of the material, and chemical formula.\nAssistant: I would need to know the density of the material, spacegroup of the material, and chemical formula of the material you want to design.\nUser: The 
density of the material should be 10.68 g\/cm^3, the spacegroup of the material should be P6\/mmm, and the chemical formula should be TmB2Rh3.\nAssistant: I advise the compound with the CIF file [CIF]\ndata_cif\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.342\n_cell_length_b 5.342\n_cell_length_c 3.140\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 120.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural TmB2Rh3\n_chemical_formula_sum 'Tm1 B2 Rh3'\n_cell_volume 77.606\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Tm Tm0 1 0.000 0.000 0.500 1.0\n B B1 1 0.667 0.333 0.500 1.0\n B B2 1 0.333 0.667 0.500 1.0\n Rh Rh3 1 0.500 0.000 0.000 1.0\n Rh Rh4 1 0.500 0.500 0.000 1.0\n Rh Rh5 1 0.000 0.500 0.000 1.0\n[\/CIF]\n."}", "/scratch/micpie/export/SIDER/train_0-17.jsonl": "{"text":"The molecular species with the canonical SMILES representation of CC(C)(C)c1cc(C(C)(C)C)c(NC(=O)c2c[nH]c3ccccc3c2=O)cc1O is not a potential cause for psychiatric disorders."} {"text":"The compound with the SELFIES [C][C][C][=Branch1][C][=O][C][Branch1][O][C][C][Branch1][C][C][N][Branch1][C][C][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1] is a potential reason for psychiatric disorders."}", "/scratch/micpie/export/SIDER/train_0-16.jsonl": "{"text":"The chemical compound with the DeepSMILES representation of CCC)C)C=CC=CC=C6NC=O)C=CNC=CC=CC=C6C%10=O)))))))))))))))O))CC)C)C is a potential cause for respiratory, thoracic and mediastinal disorders."} {"text":"The compound with the SMILES CCC(=O)C(CC(C)N(C)C)(C1=CC=CC=C1)C2=CC=CC=C2 is a potential reason for respiratory and thoracic disorders."}", "/scratch/micpie/export/SIDER/test_0-10.jsonl": "{"text":"The molecule with 
the InChI InChI=1S\/C22H30O\/c1-4-21-14-15(3)20-17-9-7-6-8-16(17)10-11-18(20)19(21)12-13-22(21,23)5-2\/h2,8,17-20,23H,3-4,6-7,9-14H2,1H3\/t17-,18-,19-,20+,21-,22-\/m0\/s1 is not a potential cause for surgical and medical procedures."} {"text":"The molecule with the SELFIES [O-1][S][=Branch1][C][=O][=Branch1][C][=O][O-1].[Zn+2] is not a potential reason for medical and surgical procedures."}", "/scratch/micpie/export/SIDER/valid_0-8.jsonl": "{"text":"The molecular species with the canonical SMILES representation of NCCNCCNCCNCCN is a potential cause for general disorders and administration site conditions."} {"text":"The molecular species with the DeepSMILES C=CC=CC=C6CCCC=O)O))))))))NCCCl)))CCCl is a potential reason for general disorders and administration site conditions."}", "/scratch/micpie/export/SIDER/test_0-22.jsonl": "{"text":"The chemical compound with the SMILES representation of CC[C@]12CC(=C)[C@H]3[C@H]([C@@H]1CC[C@]2(C#C)O)CCC4=CCCC[C@H]34 is a potential reason for disorders of the nervous system."} {"text":"The molecule with the InChI InChI=1S\/H2O4S.Zn\/c1-5(2,3)4;\/h(H2,1,2,3,4);\/q;+2\/p-2 is not a potential reason for nervous system disorders."}", "/scratch/micpie/export/SIDER/test_0-16.jsonl": "{"text":"The molecule with the canonical SMILES C#C[C@]1(O)CC[C@H]2[C@@H]3CCC4=CCCC[C@@H]4[C@H]3C(=C)C[C@@]21CC is not a potential cause for respiratory and thoracic disorders."} {"text":"The compound with the canonical SMILES representation of O=S(=O)([O-])[O-].[Zn+2] is not a potential reason for respiratory and thoracic disorders."}", "/scratch/micpie/export/SIDER/test_0-15.jsonl": "{"text":"The chemical compound with the SELFIES [C][C][C@][C][C][=Branch1][C][=C][C@H1][C@H1][Branch1][=C][C@@H1][Ring1][#Branch1][C][C][C@][Ring1][#Branch2][Branch1][Ring1][C][#C][O][C][C][C][=C][C][C][C][C@H1][Ring1][P][Ring1][=Branch1] is not a potential cause for infections and infestations."} {"text":"The molecular species with the InChI representation of 
InChI=1S\/H2O4S.Zn\/c1-5(2,3)4;\/h(H2,1,2,3,4);\/q;+2\/p-2 is not a potential reason for infestations and infections."}", "/scratch/micpie/export/SIDER/train_0-8.jsonl": "{"text":"The molecular species with the DeepSMILES CCC)C)C=CC=CC=C6NC=O)C=CNC=CC=CC=C6C%10=O)))))))))))))))O))CC)C)C is not a potential cause for general health and administration site conditions."} {"text":"The compound with the InChI InChI=1S\/C21H27NO\/c1-5-20(23)21(16-17(2)22(3)4,18-12-8-6-9-13-18)19-14-10-7-11-15-19\/h6-15,17H,5,16H2,1-4H3 is a potential cause for general disorders and administration site conditions."}", "/scratch/micpie/export/SIDER/test_0-5.jsonl": "{"text":"The chemical compound with the InChI InChI=1S\/C22H30O\/c1-4-21-14-15(3)20-17-9-7-6-8-16(17)10-11-18(20)19(21)12-13-22(21,23)5-2\/h2,8,17-20,23H,3-4,6-7,9-14H2,1H3\/t17-,18-,19-,20+,21-,22-\/m0\/s1 is a potential cause for disorders of the immune system."} {"text":"The molecular species with the InChI representation of InChI=1S\/H2O4S.Zn\/c1-5(2,3)4;\/h(H2,1,2,3,4);\/q;+2\/p-2 is not a potential cause for disorders of the immune system."}", "/scratch/micpie/export/SIDER/valid_0-9.jsonl": "{"text":"The molecular species with the SMILES C(CNCCNCCNCCN)N is not a potential cause for endocrine disorders."} {"text":"The chemical with the canonical SMILES O=C(O)CCCc1ccc(N(CCCl)CCCl)cc1 is a potential reason for endocrine system disorders."}", "/scratch/micpie/export/SIDER/test_0-19.jsonl": "{"text":"The molecular species with the DeepSMILES CC[C@]CC=C)[C@H][C@H][C@@H]6CC[C@]9C#C))O)))))CCC=CCCC[C@H]%106 is not a potential cause for pregnancy, puerperium and perinatal conditions."} {"text":"The compound with the DeepSMILES [O-]S=O)=O)[O-].[Zn+2] is not a potential reason for pregnancy, childbirth, and newborn conditions."}", "/scratch/micpie/export/SIDER/test_0-1.jsonl": "{"text":"The molecule with the canonical SMILES C#C[C@]1(O)CC[C@H]2[C@@H]3CCC4=CCCC[C@@H]4[C@H]3C(=C)C[C@@]21CC is a potential cause for metabolic and 
nutritional disorders."} {"text":"The chemical compound with the DeepSMILES [O-]S=O)=O)[O-].[Zn+2] is not a potential cause for metabolism and nutrition disorders."}", "/scratch/micpie/export/SIDER/test_0-18.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C22H30O\/c1-4-21-14-15(3)20-17-9-7-6-8-16(17)10-11-18(20)19(21)12-13-22(21,23)5-2\/h2,8,17-20,23H,3-4,6-7,9-14H2,1H3\/t17-,18-,19-,20+,21-,22-\/m0\/s1 is not a potential cause for kidney and urinary tract disorders."} {"text":"The chemical with the SELFIES representation of [O-1][S][=Branch1][C][=O][=Branch1][C][=O][O-1].[Zn+2] is not a potential reason for renal and urinary disorders."}", "/scratch/micpie/export/SIDER/valid_0-0.jsonl": "{"text":"The compound with the DeepSMILES CCNCCNCCNCCN)))))))))))N is a potential cause for hepatobiliary disorders."} {"text":"The molecular species with the canonical SMILES representation of O=C(O)CCCc1ccc(N(CCCl)CCCl)cc1 is a potential reason for liver and gallbladder disorders."}", "/scratch/micpie/export/SIDER/test_0-21.jsonl": "{"text":"The chemical compound with the InChI representation of InChI=1S\/C22H30O\/c1-4-21-14-15(3)20-17-9-7-6-8-16(17)10-11-18(20)19(21)12-13-22(21,23)5-2\/h2,8,17-20,23H,3-4,6-7,9-14H2,1H3\/t17-,18-,19-,20+,21-,22-\/m0\/s1 is not a potential reason for cardiac disorders."} {"text":"The compound with the SELFIES representation of [O-1][S][=Branch1][C][=O][=Branch1][C][=O][O-1].[Zn+2] is not a potential cause for cardiac disorders."}", "/scratch/micpie/export/SIDER/test_0-2.jsonl": "{"text":"The compound with the SELFIES [C][C][C@][C][C][=Branch1][C][=C][C@H1][C@H1][Branch1][=C][C@@H1][Ring1][#Branch1][C][C][C@][Ring1][#Branch2][Branch1][Ring1][C][#C][O][C][C][C][=C][C][C][C][C@H1][Ring1][P][Ring1][=Branch1] is a potential reason for ophthalmic disorders."} {"text":"The molecular species with the SMILES [O-]S(=O)(=O)[O-].[Zn+2] is not a potential reason for ophthalmic disorders."}", "/scratch/micpie/export/SIDER/train_0-22.jsonl": 
"{"text":"The chemical compound with the SMILES CC(C)(C)C1=CC(=C(C=C1NC(=O)C2=CNC3=CC=CC=C3C2=O)O)C(C)(C)C is a potential reason for disorders of the nervous system."} {"text":"The chemical compound with the DeepSMILES representation of CCC=O)CCCC)NC)C))))C=CC=CC=C6))))))C=CC=CC=C6 is a potential cause for nervous system disorders."}", "/scratch/micpie/export/SIDER/valid_0-10.jsonl": "{"text":"The compound with the SMILES C(CNCCNCCNCCN)N is not a potential cause for medical and surgical procedures."} {"text":"The molecular species with the canonical SMILES representation of O=C(O)CCCc1ccc(N(CCCl)CCCl)cc1 is not a potential cause for surgical and medical procedures."}", "/scratch/micpie/export/SIDER/train_0-6.jsonl": "{"text":"The compound with the SMILES representation of CC(C)(C)C1=CC(=C(C=C1NC(=O)C2=CNC3=CC=CC=C3C2=O)O)C(C)(C)C is a potential reason for disorders of the breasts and the reproductive system."} {"text":"The chemical with the SELFIES representation of [C][C][C][=Branch1][C][=O][C][Branch1][O][C][C][Branch1][C][C][N][Branch1][C][C][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1] is a potential reason for reproductive system and breast disorders."}", "/scratch/micpie/export/SIDER/valid_0-6.jsonl": "{"text":"The molecular species with the SELFIES representation of ['[C][Branch1][N][C][N][C][C][N][C][C][N][C][C][N][N]'] is not a potential cause for reproductive system and breast disorders."} {"text":"The chemical with the canonical SMILES O=C(O)CCCc1ccc(N(CCCl)CCCl)cc1 is a potential cause for disorders of the breasts and the reproductive system."}", "/scratch/micpie/export/SIDER/train_0-21.jsonl": "{"text":"The chemical compound with the DeepSMILES representation of CCC)C)C=CC=CC=C6NC=O)C=CNC=CC=CC=C6C%10=O)))))))))))))))O))CC)C)C is not a potential cause for cardiac disorders."} {"text":"The chemical compound with the SELFIES 
[C][C][C][=Branch1][C][=O][C][Branch1][O][C][C][Branch1][C][C][N][Branch1][C][C][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1] is a potential reason for cardiac disorders."}", "/scratch/micpie/export/SIDER/train_0-19.jsonl": "{"text":"The chemical with the canonical SMILES CC(C)(C)c1cc(C(C)(C)C)c(NC(=O)c2c[nH]c3ccccc3c2=O)cc1O is not a potential reason for pregnancy, childbirth, and newborn conditions."} {"text":"The chemical compound with the DeepSMILES representation of CCC=O)CCCC)NC)C))))C=CC=CC=C6))))))C=CC=CC=C6 is not a potential cause for pregnancy, puerperium and perinatal conditions."}", "/scratch/micpie/export/SIDER/test_0-9.jsonl": "{"text":"The chemical compound with the InChI InChI=1S\/C22H30O\/c1-4-21-14-15(3)20-17-9-7-6-8-16(17)10-11-18(20)19(21)12-13-22(21,23)5-2\/h2,8,17-20,23H,3-4,6-7,9-14H2,1H3\/t17-,18-,19-,20+,21-,22-\/m0\/s1 is not a potential reason for endocrine system disorders."} {"text":"The molecule with the SELFIES [O-1][S][=Branch1][C][=O][=Branch1][C][=O][O-1].[Zn+2] is not a potential cause for endocrine system disorders."}", "/scratch/micpie/export/SIDER/test_0-0.jsonl": "{"text":"The chemical with the InChI representation of InChI=1S\/C22H30O\/c1-4-21-14-15(3)20-17-9-7-6-8-16(17)10-11-18(20)19(21)12-13-22(21,23)5-2\/h2,8,17-20,23H,3-4,6-7,9-14H2,1H3\/t17-,18-,19-,20+,21-,22-\/m0\/s1 is not a potential reason for hepatobiliary disorders."} {"text":"The molecule with the DeepSMILES representation of [O-]S=O)=O)[O-].[Zn+2] is not a potential cause for hepatobiliary disorders."}", "/scratch/micpie/export/SIDER/valid_0-16.jsonl": "{"text":"The chemical with the canonical SMILES NCCNCCNCCNCCN is a potential reason for respiratory, thoracic and mediastinal disorders."} {"text":"The chemical with the DeepSMILES representation of C=CC=CC=C6CCCC=O)O))))))))NCCCl)))CCCl is a potential cause for respiratory, thoracic and mediastinal disorders."}", 
"/scratch/micpie/export/SIDER/valid_0-7.jsonl": "{"text":"The compound with the SELFIES ['[C][Branch1][N][C][N][C][C][N][C][C][N][C][C][N][N]'] is not a potential cause for benign and malignant tumors (including cysts and polyps)."} {"text":"The compound with the SMILES C1=CC(=CC=C1CCCC(=O)O)N(CCCl)CCCl is a potential reason for benign and malignant tumors (including cysts and polyps)."}", "/scratch/micpie/export/SIDER/test_0-3.jsonl": "{"text":"The molecular species with the SMILES CC[C@]12CC(=C)[C@H]3[C@H]([C@@H]1CC[C@]2(C#C)O)CCC4=CCCC[C@H]34 is not a potential cause for musculoskeletal and connective tissue disorders."} {"text":"The molecule with the DeepSMILES [O-]S=O)=O)[O-].[Zn+2] is not a potential cause for muscle and joint disorders."}", "/scratch/micpie/export/SIDER/valid_0-11.jsonl": "{"text":"The compound with the SMILES C(CNCCNCCNCCN)N is not a potential reason for vascular system disorders."} {"text":"The chemical compound with the SELFIES representation of [C][=C][C][=Branch1][=C][=C][C][=C][Ring1][=Branch1][C][C][C][C][=Branch1][C][=O][O][N][Branch1][Ring2][C][C][Cl][C][C][Cl] is not a potential reason for vascular disorders."}", "/scratch/micpie/export/SIDER/train_0-20.jsonl": "{"text":"The chemical with the SMILES representation of CC(C)(C)C1=CC(=C(C=C1NC(=O)C2=CNC3=CC=CC=C3C2=O)O)C(C)(C)C is a potential cause for ear and inner ear disorders."} {"text":"The molecular species with the canonical SMILES representation of CCC(=O)C(CC(C)N(C)C)(c1ccccc1)c1ccccc1 is a potential cause for ear and labyrinth disorders."}", "/scratch/micpie/export/SIDER/valid_0-20.jsonl": "{"text":"The chemical compound with the SELFIES representation of ['[C][Branch1][N][C][N][C][C][N][C][C][N][C][C][N][N]'] is a potential cause for ear and inner ear disorders."} {"text":"The molecule with the DeepSMILES C=CC=CC=C6CCCC=O)O))))))))NCCCl)))CCCl is not a potential cause for ear and labyrinth disorders."}", "/scratch/micpie/export/SIDER/train_0-0.jsonl": "{"text":"The 
molecular species with the DeepSMILES CCC)C)C=CC=CC=C6NC=O)C=CNC=CC=CC=C6C%10=O)))))))))))))))O))CC)C)C is not a potential cause for liver and gallbladder disorders."} {"text":"The chemical with the SELFIES representation of [C][C][C][=Branch1][C][=O][C][Branch1][O][C][C][Branch1][C][C][N][Branch1][C][C][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1] is a potential cause for liver and gallbladder disorders."}", "/scratch/micpie/export/SIDER/test_0-6.jsonl": "{"text":"The chemical compound with the canonical SMILES C#C[C@]1(O)CC[C@H]2[C@@H]3CCC4=CCCC[C@@H]4[C@H]3C(=C)C[C@@]21CC is a potential reason for reproductive system and breast disorders."} {"text":"The chemical compound with the canonical SMILES representation of O=S(=O)([O-])[O-].[Zn+2] is not a potential cause for reproductive system and breast disorders."}", "/scratch/micpie/export/SIDER/train_0-10.jsonl": "{"text":"The compound with the SMILES CC(C)(C)C1=CC(=C(C=C1NC(=O)C2=CNC3=CC=CC=C3C2=O)O)C(C)(C)C is not a potential cause for surgical and medical procedures."} {"text":"The chemical with the SMILES representation of CCC(=O)C(CC(C)N(C)C)(C1=CC=CC=C1)C2=CC=CC=C2 is not a potential cause for medical and surgical procedures."}", "/scratch/micpie/export/SIDER/train_0-3.jsonl": "{"text":"The compound with the SELFIES representation of [C][C][Branch1][C][C][Branch1][C][C][C][=C][C][=Branch2][Ring1][=C][=C][Branch2][Ring1][=Branch2][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][#Branch2][=O][O][C][Branch1][C][C][Branch1][C][C][C] is a potential cause for musculoskeletal and connective tissue disorders."} {"text":"The molecule with the SELFIES [C][C][C][=Branch1][C][=O][C][Branch1][O][C][C][Branch1][C][C][N][Branch1][C][C][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1] is a potential cause for musculoskeletal and connective tissue disorders."}", 
"/scratch/micpie/export/SIDER/train_0-12.jsonl": "{"text":"The chemical compound with the SELFIES representation of [C][C][Branch1][C][C][Branch1][C][C][C][=C][C][=Branch2][Ring1][=C][=C][Branch2][Ring1][=Branch2][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][#Branch2][=O][O][C][Branch1][C][C][Branch1][C][C][C] is not a potential cause for disorders of the blood and lymphatic system."} {"text":"The compound with the DeepSMILES CCC=O)CCCC)NC)C))))C=CC=CC=C6))))))C=CC=CC=C6 is a potential cause for blood and lymphatic system disorders."}", "/scratch/micpie/export/SIDER/test_0-13.jsonl": "{"text":"The chemical compound with the SELFIES representation of [C][C][C@][C][C][=Branch1][C][=C][C@H1][C@H1][Branch1][=C][C@@H1][Ring1][#Branch1][C][C][C@][Ring1][#Branch2][Branch1][Ring1][C][#C][O][C][C][C][=C][C][C][C][C@H1][Ring1][P][Ring1][=Branch1] is a potential reason for skin and subcutaneous tissue disorders."} {"text":"The compound with the SELFIES representation of [O-1][S][=Branch1][C][=O][=Branch1][C][=O][O-1].[Zn+2] is not a potential reason for skin and subcutaneous tissue disorders."}", "/scratch/micpie/export/SIDER/valid_0-2.jsonl": "{"text":"The molecule with the SMILES representation of C(CNCCNCCNCCN)N is not a potential reason for eye disorders."} {"text":"The chemical compound with the canonical SMILES O=C(O)CCCc1ccc(N(CCCl)CCCl)cc1 is a potential cause for ophthalmic disorders."}", "/scratch/micpie/export/SIDER/valid_0-21.jsonl": "{"text":"The molecule with the SMILES representation of C(CNCCNCCNCCN)N is a potential cause for cardiac disorders."} {"text":"The molecular species with the canonical SMILES representation of O=C(O)CCCc1ccc(N(CCCl)CCCl)cc1 is not a potential reason for cardiac disorders."}", "/scratch/micpie/export/SIDER/train_0-14.jsonl": "{"text":"The molecular species with the DeepSMILES representation of CCC)C)C=CC=CC=C6NC=O)C=CNC=CC=CC=C6C%10=O)))))))))))))))O))CC)C)C is not a 
potential reason for congenital, familial and genetic disorders."} {"text":"The compound with the SMILES CCC(=O)C(CC(C)N(C)C)(C1=CC=CC=C1)C2=CC=CC=C2 is not a potential cause for congenital, familial and genetic disorders."}", "/scratch/micpie/export/SIDER/valid_0-1.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C8H23N5\/c9-1-3-11-5-7-13-8-6-12-4-2-10\/h11-13H,1-10H2 is a potential cause for metabolic and nutritional disorders."} {"text":"The molecule with the canonical SMILES O=C(O)CCCc1ccc(N(CCCl)CCCl)cc1 is a potential cause for metabolism and nutrition disorders."}", "/scratch/micpie/export/SIDER/valid_0-13.jsonl": "{"text":"The molecular species with the InChI InChI=1S\/C8H23N5\/c9-1-3-11-5-7-13-8-6-12-4-2-10\/h11-13H,1-10H2 is a potential reason for skin and subcutaneous tissue disorders."} {"text":"The compound with the InChI representation of InChI=1S\/C14H19Cl2NO2\/c15-8-10-17(11-9-16)13-6-4-12(5-7-13)2-1-3-14(18)19\/h4-7H,1-3,8-11H2,(H,18,19) is a potential cause for disorders of the skin and subcutaneous tissue."}", "/scratch/micpie/export/SIDER/valid_0-5.jsonl": "{"text":"The compound with the DeepSMILES representation of CCNCCNCCNCCN)))))))))))N is not a potential cause for immune system disorders."} {"text":"The molecule with the canonical SMILES representation of O=C(O)CCCc1ccc(N(CCCl)CCCl)cc1 is a potential cause for disorders of the immune system."}", "/scratch/micpie/export/SIDER/train_0-15.jsonl": "{"text":"The molecule with the DeepSMILES representation of CCC)C)C=CC=CC=C6NC=O)C=CNC=CC=CC=C6C%10=O)))))))))))))))O))CC)C)C is a potential cause for infestations and infections."} {"text":"The chemical compound with the InChI InChI=1S\/C21H27NO\/c1-5-20(23)21(16-17(2)22(3)4,18-12-8-6-9-13-18)19-14-10-7-11-15-19\/h6-15,17H,5,16H2,1-4H3 is a potential reason for infestations and infections."}", "/scratch/micpie/export/SIDER/valid_0-4.jsonl": "{"text":"The compound with the SMILES C(CNCCNCCNCCN)N is a potential reason for gastrointestinal 
disorders."} {"text":"The molecule with the SMILES C1=CC(=CC=C1CCCC(=O)O)N(CCCl)CCCl is a potential reason for digestive system disorders."}", "/scratch/micpie/export/SIDER/train_0-5.jsonl": "{"text":"The chemical compound with the SELFIES [C][C][Branch1][C][C][Branch1][C][C][C][=C][C][=Branch2][Ring1][=C][=C][Branch2][Ring1][=Branch2][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][#Branch2][=O][O][C][Branch1][C][C][Branch1][C][C][C] is not a potential cause for immune system disorders."} {"text":"The molecule with the SELFIES [C][C][C][=Branch1][C][=O][C][Branch1][O][C][C][Branch1][C][C][N][Branch1][C][C][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1] is a potential reason for immune system disorders."}", "/scratch/micpie/export/SIDER/valid_0-15.jsonl": "{"text":"The molecule with the canonical SMILES representation of NCCNCCNCCNCCN is not a potential cause for infestations and infections."} {"text":"The chemical with the SMILES representation of C1=CC(=CC=C1CCCC(=O)O)N(CCCl)CCCl is a potential reason for infections and infestations."}", "/scratch/micpie/export/SIDER/valid_0-12.jsonl": "{"text":"The compound with the SMILES representation of C(CNCCNCCNCCN)N is not a potential cause for disorders of the blood and lymphatic system."} {"text":"The compound with the SMILES representation of C1=CC(=CC=C1CCCC(=O)O)N(CCCl)CCCl is a potential reason for blood and lymphatic system disorders."}", "/scratch/micpie/export/SIDER/valid_0-18.jsonl": "{"text":"The molecular species with the canonical SMILES representation of NCCNCCNCCNCCN is not a potential cause for kidney and urinary tract disorders."} {"text":"The chemical with the canonical SMILES O=C(O)CCCc1ccc(N(CCCl)CCCl)cc1 is a potential cause for kidney and urinary tract disorders."}", "/scratch/micpie/export/SIDER/train_0-2.jsonl": "{"text":"The compound with the SMILES representation of 
CC(C)(C)C1=CC(=C(C=C1NC(=O)C2=CNC3=CC=CC=C3C2=O)O)C(C)(C)C is not a potential reason for eye disorders."} {"text":"The molecule with the DeepSMILES representation of CCC=O)CCCC)NC)C))))C=CC=CC=C6))))))C=CC=CC=C6 is a potential reason for eye disorders."}", "/scratch/micpie/export/SIDER/test_0-11.jsonl": "{"text":"The chemical compound with the InChI representation of InChI=1S\/C22H30O\/c1-4-21-14-15(3)20-17-9-7-6-8-16(17)10-11-18(20)19(21)12-13-22(21,23)5-2\/h2,8,17-20,23H,3-4,6-7,9-14H2,1H3\/t17-,18-,19-,20+,21-,22-\/m0\/s1 is a potential cause for vascular disorders."} {"text":"The molecular species with the InChI InChI=1S\/H2O4S.Zn\/c1-5(2,3)4;\/h(H2,1,2,3,4);\/q;+2\/p-2 is not a potential cause for vascular disorders."}", "/scratch/micpie/export/SIDER/train_0-7.jsonl": "{"text":"The chemical compound with the DeepSMILES CCC)C)C=CC=CC=C6NC=O)C=CNC=CC=CC=C6C%10=O)))))))))))))))O))CC)C)C is not a potential cause for benign and malignant tumors (including cysts and polyps)."} {"text":"The chemical compound with the InChI InChI=1S\/C21H27NO\/c1-5-20(23)21(16-17(2)22(3)4,18-12-8-6-9-13-18)19-14-10-7-11-15-19\/h6-15,17H,5,16H2,1-4H3 is not a potential cause for neoplasms benign, malignant and unspecified (incl cysts and polyps)."}", "/scratch/micpie/export/SIDER/test_0-17.jsonl": "{"text":"The molecule with the DeepSMILES CC[C@]CC=C)[C@H][C@H][C@@H]6CC[C@]9C#C))O)))))CCC=CCCC[C@H]%106 is a potential cause for mental health and psychiatric disorders."} {"text":"The chemical compound with the canonical SMILES representation of O=S(=O)([O-])[O-].[Zn+2] is not a potential cause for psychiatric disorders."}", "/scratch/micpie/export/SIDER/valid_0-19.jsonl": "{"text":"The chemical compound with the SELFIES representation of ['[C][Branch1][N][C][N][C][C][N][C][C][N][C][C][N][N]'] is not a potential cause for pregnancy, puerperium and perinatal conditions."} {"text":"The chemical with the InChI representation of 
InChI=1S\/C14H19Cl2NO2\/c15-8-10-17(11-9-16)13-6-4-12(5-7-13)2-1-3-14(18)19\/h4-7H,1-3,8-11H2,(H,18,19) is not a potential cause for pregnancy, puerperium and perinatal conditions."}", "/scratch/micpie/export/SIDER/train_0-11.jsonl": "{"text":"The molecule with the canonical SMILES CC(C)(C)c1cc(C(C)(C)C)c(NC(=O)c2c[nH]c3ccccc3c2=O)cc1O is not a potential cause for vascular disorders."} {"text":"The molecule with the canonical SMILES representation of CCC(=O)C(CC(C)N(C)C)(c1ccccc1)c1ccccc1 is a potential cause for vascular system disorders."}", "/scratch/micpie/export/SIDER/train_0-1.jsonl": "{"text":"The molecular species with the InChI representation of InChI=1S\/C24H28N2O3\/c1-23(2,3)16-11-17(24(4,5)6)20(27)12-19(16)26-22(29)15-13-25-18-10-8-7-9-14(18)21(15)28\/h7-13,27H,1-6H3,(H,25,28)(H,26,29) is a potential cause for metabolism and nutrition disorders."} {"text":"The molecular species with the InChI representation of InChI=1S\/C21H27NO\/c1-5-20(23)21(16-17(2)22(3)4,18-12-8-6-9-13-18)19-14-10-7-11-15-19\/h6-15,17H,5,16H2,1-4H3 is a potential reason for metabolic and nutritional disorders."}", "/scratch/micpie/export/SIDER/train_0-13.jsonl": "{"text":"The chemical with the InChI representation of InChI=1S\/C24H28N2O3\/c1-23(2,3)16-11-17(24(4,5)6)20(27)12-19(16)26-22(29)15-13-25-18-10-8-7-9-14(18)21(15)28\/h7-13,27H,1-6H3,(H,25,28)(H,26,29) is a potential reason for skin and subcutaneous tissue disorders."} {"text":"The chemical with the InChI InChI=1S\/C21H27NO\/c1-5-20(23)21(16-17(2)22(3)4,18-12-8-6-9-13-18)19-14-10-7-11-15-19\/h6-15,17H,5,16H2,1-4H3 is a potential reason for skin and subcutaneous tissue disorders."}", "/scratch/micpie/export/SIDER/train_0-4.jsonl": "{"text":"The molecular species with the canonical SMILES CC(C)(C)c1cc(C(C)(C)C)c(NC(=O)c2c[nH]c3ccccc3c2=O)cc1O is a potential reason for digestive system disorders."} {"text":"The chemical with the SELFIES representation of 
[C][C][C][=Branch1][C][=O][C][Branch1][O][C][C][Branch1][C][C][N][Branch1][C][C][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1] is a potential reason for digestive system disorders."}", "/scratch/micpie/export/SIDER/test_0-7.jsonl": "{"text":"The compound with the DeepSMILES CC[C@]CC=C)[C@H][C@H][C@@H]6CC[C@]9C#C))O)))))CCC=CCCC[C@H]%106 is a potential reason for benign and malignant tumors (including cysts and polyps)."} {"text":"The molecule with the SMILES [O-]S(=O)(=O)[O-].[Zn+2] is not a potential reason for benign and malignant tumors (including cysts and polyps)."}", "/scratch/micpie/export/SIDER/train_0-9.jsonl": "{"text":"The chemical compound with the InChI InChI=1S\/C24H28N2O3\/c1-23(2,3)16-11-17(24(4,5)6)20(27)12-19(16)26-22(29)15-13-25-18-10-8-7-9-14(18)21(15)28\/h7-13,27H,1-6H3,(H,25,28)(H,26,29) is not a potential reason for endocrine disorders."} {"text":"The chemical compound with the SMILES CCC(=O)C(CC(C)N(C)C)(C1=CC=CC=C1)C2=CC=CC=C2 is a potential cause for endocrine system disorders."}", "/scratch/micpie/export/SIDER/valid_0-22.jsonl": "{"text":"The molecule with the SMILES representation of C(CNCCNCCNCCN)N is a potential cause for disorders of the nervous system."} {"text":"The compound with the InChI representation of InChI=1S\/C14H19Cl2NO2\/c15-8-10-17(11-9-16)13-6-4-12(5-7-13)2-1-3-14(18)19\/h4-7H,1-3,8-11H2,(H,18,19) is a potential reason for disorders of the nervous system."}", "/scratch/micpie/export/SIDER/train_0-18.jsonl": "{"text":"The chemical compound with the InChI InChI=1S\/C24H28N2O3\/c1-23(2,3)16-11-17(24(4,5)6)20(27)12-19(16)26-22(29)15-13-25-18-10-8-7-9-14(18)21(15)28\/h7-13,27H,1-6H3,(H,25,28)(H,26,29) is not a potential reason for renal and urinary disorders."} {"text":"The chemical compound with the InChI InChI=1S\/C21H27NO\/c1-5-20(23)21(16-17(2)22(3)4,18-12-8-6-9-13-18)19-14-10-7-11-15-19\/h6-15,17H,5,16H2,1-4H3 is a potential reason for kidney and urinary tract 
disorders."}", "/scratch/micpie/export/SIDER/valid_0-3.jsonl": "{"text":"The chemical compound with the InChI representation of InChI=1S\/C8H23N5\/c9-1-3-11-5-7-13-8-6-12-4-2-10\/h11-13H,1-10H2 is a potential cause for muscle and joint disorders."} {"text":"The chemical with the InChI InChI=1S\/C14H19Cl2NO2\/c15-8-10-17(11-9-16)13-6-4-12(5-7-13)2-1-3-14(18)19\/h4-7H,1-3,8-11H2,(H,18,19) is a potential cause for muscle and joint disorders."}", "/scratch/micpie/export/SIDER/test_0-8.jsonl": "{"text":"The chemical with the canonical SMILES representation of C#C[C@]1(O)CC[C@H]2[C@@H]3CCC4=CCCC[C@@H]4[C@H]3C(=C)C[C@@]21CC is not a potential reason for general disorders and administration site conditions."} {"text":"The molecular species with the canonical SMILES representation of O=S(=O)([O-])[O-].[Zn+2] is not a potential reason for general disorders and administration site conditions."}", "/scratch/micpie/export/SIDER/test_0-14.jsonl": "{"text":"The chemical compound with the InChI InChI=1S\/C22H30O\/c1-4-21-14-15(3)20-17-9-7-6-8-16(17)10-11-18(20)19(21)12-13-22(21,23)5-2\/h2,8,17-20,23H,3-4,6-7,9-14H2,1H3\/t17-,18-,19-,20+,21-,22-\/m0\/s1 is not a potential reason for congenital, familial and genetic disorders."} {"text":"The compound with the SMILES [O-]S(=O)(=O)[O-].[Zn+2] is not a potential cause for familial, congenital and genetic disorders."}", "/scratch/micpie/export/SIDER/valid_0-17.jsonl": "{"text":"The chemical with the SMILES C(CNCCNCCNCCN)N is a potential reason for psychiatric disorders."} {"text":"The chemical compound with the SELFIES [C][=C][C][=Branch1][=C][=C][C][=C][Ring1][=Branch1][C][C][C][C][=Branch1][C][=O][O][N][Branch1][Ring2][C][C][Cl][C][C][Cl] is a potential reason for mental health and psychiatric disorders."}", "/scratch/micpie/export/SIDER/valid_0-14.jsonl": "{"text":"The chemical with the canonical SMILES NCCNCCNCCNCCN is not a potential cause for congenital, familial and genetic disorders."} {"text":"The molecular species with the 
SMILES representation of C1=CC(=CC=C1CCCC(=O)O)N(CCCl)CCCl is not a potential reason for familial, congenital and genetic disorders."}", "/scratch/micpie/export/SIDER/test_0-4.jsonl": "{"text":"The molecular species with the DeepSMILES CC[C@]CC=C)[C@H][C@H][C@@H]6CC[C@]9C#C))O)))))CCC=CCCC[C@H]%106 is a potential reason for digestive system disorders."} {"text":"The chemical compound with the InChI InChI=1S\/H2O4S.Zn\/c1-5(2,3)4;\/h(H2,1,2,3,4);\/q;+2\/p-2 is a potential reason for gastrointestinal disorders."}", "/scratch/micpie/export/SIDER/test_0-12.jsonl": "{"text":"The molecular species with the SMILES CC[C@]12CC(=C)[C@H]3[C@H]([C@@H]1CC[C@]2(C#C)O)CCC4=CCCC[C@H]34 is not a potential reason for disorders of the blood and lymphatic system."} {"text":"The chemical compound with the DeepSMILES [O-]S=O)=O)[O-].[Zn+2] is not a potential reason for blood and lymphatic system disorders."}", "/scratch/micpie/export/SIDER/test_0-20.jsonl": "{"text":"The molecule with the canonical SMILES representation of C#C[C@]1(O)CC[C@H]2[C@@H]3CCC4=CCCC[C@@H]4[C@H]3C(=C)C[C@@]21CC is not a potential reason for ear and labyrinth disorders."} {"text":"The molecular species with the InChI InChI=1S\/H2O4S.Zn\/c1-5(2,3)4;\/h(H2,1,2,3,4);\/q;+2\/p-2 is not a potential cause for ear and labyrinth disorders."}", "/scratch/micpie/export/valid.jsonl": "{"text":"The compound with the IUPAC name of [(3R,4S,5S)-5-[[4-amino-3-[(2R)-3-ethoxy-1,1,1-trifluoropropan-2-yl]oxy-5-fluorophenyl]methyl]-4-hydroxy-1,1-dioxothian-3-yl]-[(3-tert-butylphenyl)methyl]azanium displays inhibition of the human beta-secretase 1 (BACE-1)."} {"text":"User: I'm looking for the SMILES of a molecule that has a volume of distribution at steady state (VDss) of 0.370 L\/kg.\nAssistant: This is a molecule that has a volume of distribution at steady state (VDss) of 0.370 L\/kg: CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21"}", "/scratch/micpie/export/ord_rxn_smiles_procedure/test_0-1.jsonl": "{"text":"The reaction 
procedure description of a reaction with the reaction SMILES string CO.COC[O:14][c:3]1[c:2]([Br:1])[cH:7][cH:6][c:5]([O:8][CH:9]([F:10])[F:11])[c:4]1[O:12][CH3:13].Cl>>[Br:1][c:2]1[c:3]([OH:14])[c:4]([O:12][CH3:13])[c:5]([O:8][CH:9]([F:10])[F:11])[cH:6][cH:7]1 is To a stirring solution of 1-bromo-4-(difluoromethoxy)-3-methoxy-2-(methoxymethoxy)benzene (1 g, 3.19 mmol) in methanol (15 mL) was added concentrated hydrochloride (2 mL) and the reaction mixture was heated to 50° C. for 1 h. The reaction mixture was cooled to RT, concentrated under reduced pressure and the obtained residue was basified with sodium bicarbonate solution and extracted with dichloromethane (3×). The combined dichloromethane layers were washed with brine, dried over sodium sulphate and concentrated under reduced pressure to afford 6-Bromo-3-(difluoromethoxy)-2-methoxyphenol as a solid (800 mg, 93%)."} {"text":"The description of reaction procedure of a reaction with the reaction SMILES (RXNSMILES) C1CCC([n:27]2[cH:26][n:25][c:24]3[c:23]([NH:22][CH:20]([c:8]4[cH:7][c:6]5[cH:5][cH:4][cH:3][c:2]([CH3:1])[c:11]5[c:10](=[O:12])[n:9]4-[c:13]4[c:14]([CH3:19])[cH:15][cH:16][cH:17][cH:18]4)[CH3:21])[n:31][cH:30][n:29][c:28]32)OC1.O=C([O-])O.[Na+]>>[CH3:1][c:2]1[cH:3][cH:4][cH:5][c:6]2[cH:7][c:8]([CH:20]([CH3:21])[NH:22][c:23]3[c:24]4[n:25][cH:26][nH:27][c:28]4[n:29][cH:30][n:31]3)[n:9](-[c:13]3[c:14]([CH3:19])[cH:15][cH:16][cH:17][cH:18]3)[c:10](=[O:12])[c:11]12 is 8-Methyl-3-(1-(9-(tetrahydro-2H-pyran-2-yl)-9H-purin-6-ylamino)ethyl)-2-o-tolylisoquinolin-1(2H)-one 4105 (180 mg, 0.36 mmol) was dissolved in MeOH(HCl) (50 mL) and stirred for 2 h. Aqueous NaHCO3 solution was added to the reaction mixture and the pH value was adjusted to 9. 
The mixture was then filtered and the filtrate was concentrated in vacuo to afford the desired product, 3-(1-(9H-purin-6-ylamino)ethyl)-8-methyl-2-o-tolylisoquinolin-1(2H)-one 4106 (80 mg, 54% yield) as a yellow solid."}", "/scratch/micpie/export/ord_rxn_smiles_procedure/valid_0-0.jsonl": "{"text":"The RXNSMILES of a reaction with the procedure below is C1CCOC1.C[Si](C)(C)c1cc(CCC[O:14][C:1]([CH2:2][CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH2:8][CH2:9][CH2:10][CH2:11][CH3:12])=[O:13])co1.O=[O:60].Oc1c(I)cc2c(c1I)Oc1c(cc(I)c(O)c1I)[C:43]21[c:44]2[c:45](c(Cl)[c:47](Cl)[c:48](Cl)[c:49]2Cl)[C:54](=[O:55])[O:56]1>>[C:1]([CH2:2][CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH2:8][CH2:9][CH2:10][CH2:11][CH3:12])(=[O:13])[O:14][CH2:47][CH2:48][CH2:49][C:44]1=[CH:45][CH:54]([OH:55])[O:56][C:43]1=[O:60].\nProcedure: A mixture of 3-(5-trimethylsilyl-3-furyl)propyl dodecanoate (105 mg, 0.27 mmol) and Rose Bengal (5 mg) in tetrahydrofuran (6 ml) was exposed to singlet oxygen for 2.5 hours at -78 degrees. The residue, after solvent removal, was purified by preparative thin layer chromatography (TLC) (20×20 cm, 500u silica plate; developed with 60% ethyl ether\/petroleum ether). 
The title ester was obtained as an off-white solid."} {"text":"The reaction SMILES of a reaction with the reaction procedure description below is CC(=O)OC(C)=O.O=[CH:31][c:24]1[cH:23][cH:22][c:21]([N:20]([CH3:19])[CH3:33])[c:30]2[c:25]1[cH:26][cH:27][cH:28][cH:29]2.[CH2:2]([CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH3:8])[n+:9]1[c:10]([CH3:18])[s:11][c:12]2[c:13]1[cH:14][cH:15][cH:16][cH:17]2.[I-:1]>>[CH2:2]([CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH3:8])[n+:9]1[c:10](\/[CH:18]=[CH:31]\/[c:24]2[cH:23][cH:22][c:21]([N:20]([CH3:19])[CH3:33])[c:30]3[c:25]2[cH:26][cH:27][cH:28][cH:29]3)[s:11][c:12]2[c:13]1[cH:14][cH:15][cH:16][cH:17]2.[I-:1].\nProcedure: A mixture of 3-heptyl-2-methylbenzothiazolium iodide (0.50 g, 1.33 mmol) and 4-dimethylamino-1-naphthaldehyde (0.40 g, 2.01 mmol) in 5 mL of acetic anhydride under nitrogen was heated. Upon refluxing, the mixture turned purple and all solids seemed to be in solution. The solution was refluxed for 15 minutes and on cooling, solid formed. The mixture was filtered and washed with ethyl acetate to give (E)-2-[2-(4-dimethylaminonaphthalen-1-yl)vinyl]-3-heptylbenzothiazol-3-ium iodide as a dark brown solid, mp 195-197° C."}", "/scratch/micpie/export/ord_rxn_smiles_procedure/test_0-2.jsonl": "{"text":"User: I would like to run a reaction with the reaction SMILES string CO.COC[O:14][c:3]1[c:2]([Br:1])[cH:7][cH:6][c:5]([O:8][CH:9]([F:10])[F:11])[c:4]1[O:12][CH3:13].Cl>>[Br:1][c:2]1[c:3]([OH:14])[c:4]([O:12][CH3:13])[c:5]([O:8][CH:9]([F:10])[F:11])[cH:6][cH:7]1.\nAssistant: That's interesting, is there anything else I can do for you?\nUser: Yes, I would like to know the description of reaction procedure I should follow to run the reaction.\nAssistant: I advise the following procedure: To a stirring solution of 1-bromo-4-(difluoromethoxy)-3-methoxy-2-(methoxymethoxy)benzene (1 g, 3.19 mmol) in methanol (15 mL) was added concentrated hydrochloride (2 mL) and the reaction mixture was heated to 50° C. for 1 h. 
The reaction mixture was cooled to RT, concentrated under reduced pressure and the obtained residue was basified with sodium bicarbonate solution and extracted with dichloromethane (3×). The combined dichloromethane layers were washed with brine, dried over sodium sulphate and concentrated under reduced pressure to afford 6-Bromo-3-(difluoromethoxy)-2-methoxyphenol as a solid (800 mg, 93%)."} {"text":"User: I need to run a reaction with the reaction SMILES C1CCC([n:27]2[cH:26][n:25][c:24]3[c:23]([NH:22][CH:20]([c:8]4[cH:7][c:6]5[cH:5][cH:4][cH:3][c:2]([CH3:1])[c:11]5[c:10](=[O:12])[n:9]4-[c:13]4[c:14]([CH3:19])[cH:15][cH:16][cH:17][cH:18]4)[CH3:21])[n:31][cH:30][n:29][c:28]32)OC1.O=C([O-])O.[Na+]>>[CH3:1][c:2]1[cH:3][cH:4][cH:5][c:6]2[cH:7][c:8]([CH:20]([CH3:21])[NH:22][c:23]3[c:24]4[n:25][cH:26][nH:27][c:28]4[n:29][cH:30][n:31]3)[n:9](-[c:13]3[c:14]([CH3:19])[cH:15][cH:16][cH:17][cH:18]3)[c:10](=[O:12])[c:11]12.\nAssistant: Cool, is there anything else I can do for you?\nUser: I would like to know the reaction procedure I should follow to run the reaction.\nAssistant: I recommend the following procedure: 8-Methyl-3-(1-(9-(tetrahydro-2H-pyran-2-yl)-9H-purin-6-ylamino)ethyl)-2-o-tolylisoquinolin-1(2H)-one 4105 (180 mg, 0.36 mmol) was dissolved in MeOH(HCl) (50 mL) and stirred for 2 h. Aqueous NaHCO3 solution was added to the reaction mixture and the pH value was adjusted to 9. 
The mixture was then filtered and the filtrate was concentrated in vacuo to afford the desired product, 3-(1-(9H-purin-6-ylamino)ethyl)-8-methyl-2-o-tolylisoquinolin-1(2H)-one 4106 (80 mg, 54% yield) as a yellow solid."}", "/scratch/micpie/export/ord_rxn_smiles_procedure/test_0-0.jsonl": "{"text":"The reaction SMILES string of a reaction with the reaction procedure description below is CO.COC[O:14][c:3]1[c:2]([Br:1])[cH:7][cH:6][c:5]([O:8][CH:9]([F:10])[F:11])[c:4]1[O:12][CH3:13].Cl>>[Br:1][c:2]1[c:3]([OH:14])[c:4]([O:12][CH3:13])[c:5]([O:8][CH:9]([F:10])[F:11])[cH:6][cH:7]1.\nProcedure: To a stirring solution of 1-bromo-4-(difluoromethoxy)-3-methoxy-2-(methoxymethoxy)benzene (1 g, 3.19 mmol) in methanol (15 mL) was added concentrated hydrochloride (2 mL) and the reaction mixture was heated to 50° C. for 1 h. The reaction mixture was cooled to RT, concentrated under reduced pressure and the obtained residue was basified with sodium bicarbonate solution and extracted with dichloromethane (3×). The combined dichloromethane layers were washed with brine, dried over sodium sulphate and concentrated under reduced pressure to afford 6-Bromo-3-(difluoromethoxy)-2-methoxyphenol as a solid (800 mg, 93%)."} {"text":"The reaction SMILES string of a reaction with the description of reaction procedure below is C1CCC([n:27]2[cH:26][n:25][c:24]3[c:23]([NH:22][CH:20]([c:8]4[cH:7][c:6]5[cH:5][cH:4][cH:3][c:2]([CH3:1])[c:11]5[c:10](=[O:12])[n:9]4-[c:13]4[c:14]([CH3:19])[cH:15][cH:16][cH:17][cH:18]4)[CH3:21])[n:31][cH:30][n:29][c:28]32)OC1.O=C([O-])O.[Na+]>>[CH3:1][c:2]1[cH:3][cH:4][cH:5][c:6]2[cH:7][c:8]([CH:20]([CH3:21])[NH:22][c:23]3[c:24]4[n:25][cH:26][nH:27][c:28]4[n:29][cH:30][n:31]3)[n:9](-[c:13]3[c:14]([CH3:19])[cH:15][cH:16][cH:17][cH:18]3)[c:10](=[O:12])[c:11]12.\nProcedure: 8-Methyl-3-(1-(9-(tetrahydro-2H-pyran-2-yl)-9H-purin-6-ylamino)ethyl)-2-o-tolylisoquinolin-1(2H)-one 4105 (180 mg, 0.36 mmol) was dissolved in MeOH(HCl) (50 mL) and stirred for 2 h. 
Aqueous NaHCO3 solution was added to the reaction mixture and the pH value was adjusted to 9. The mixture was then filtered and the filtrate was concentrated in vacuo to afford the desired product, 3-(1-(9H-purin-6-ylamino)ethyl)-8-methyl-2-o-tolylisoquinolin-1(2H)-one 4106 (80 mg, 54% yield) as a yellow solid."}", "/scratch/micpie/export/ord_rxn_smiles_procedure/test_0-3.jsonl": "{"text":"User: I want to run a reaction with the reaction procedure below and now need to know the RXNSMILES.\nProcedure: To a stirring solution of 1-bromo-4-(difluoromethoxy)-3-methoxy-2-(methoxymethoxy)benzene (1 g, 3.19 mmol) in methanol (15 mL) was added concentrated hydrochloride (2 mL) and the reaction mixture was heated to 50° C. for 1 h. The reaction mixture was cooled to RT, concentrated under reduced pressure and the obtained residue was basified with sodium bicarbonate solution and extracted with dichloromethane (3×). The combined dichloromethane layers were washed with brine, dried over sodium sulphate and concentrated under reduced pressure to afford 6-Bromo-3-(difluoromethoxy)-2-methoxyphenol as a solid (800 mg, 93%).\nAssistant: The RXNSMILES of the reaction is CO.COC[O:14][c:3]1[c:2]([Br:1])[cH:7][cH:6][c:5]([O:8][CH:9]([F:10])[F:11])[c:4]1[O:12][CH3:13].Cl>>[Br:1][c:2]1[c:3]([OH:14])[c:4]([O:12][CH3:13])[c:5]([O:8][CH:9]([F:10])[F:11])[cH:6][cH:7]1."} {"text":"User: I would like to run a reaction with the reaction procedure below and now need to know the reaction SMILES.\nProcedure: 8-Methyl-3-(1-(9-(tetrahydro-2H-pyran-2-yl)-9H-purin-6-ylamino)ethyl)-2-o-tolylisoquinolin-1(2H)-one 4105 (180 mg, 0.36 mmol) was dissolved in MeOH(HCl) (50 mL) and stirred for 2 h. Aqueous NaHCO3 solution was added to the reaction mixture and the pH value was adjusted to 9. 
The mixture was then filtered and the filtrate was concentrated in vacuo to afford the desired product, 3-(1-(9H-purin-6-ylamino)ethyl)-8-methyl-2-o-tolylisoquinolin-1(2H)-one 4106 (80 mg, 54% yield) as a yellow solid.\nAssistant: The reaction SMILES of the reaction is C1CCC([n:27]2[cH:26][n:25][c:24]3[c:23]([NH:22][CH:20]([c:8]4[cH:7][c:6]5[cH:5][cH:4][cH:3][c:2]([CH3:1])[c:11]5[c:10](=[O:12])[n:9]4-[c:13]4[c:14]([CH3:19])[cH:15][cH:16][cH:17][cH:18]4)[CH3:21])[n:31][cH:30][n:29][c:28]32)OC1.O=C([O-])O.[Na+]>>[CH3:1][c:2]1[cH:3][cH:4][cH:5][c:6]2[cH:7][c:8]([CH:20]([CH3:21])[NH:22][c:23]3[c:24]4[n:25][cH:26][nH:27][c:28]4[n:29][cH:30][n:31]3)[n:9](-[c:13]3[c:14]([CH3:19])[cH:15][cH:16][cH:17][cH:18]3)[c:10](=[O:12])[c:11]12."}", "/scratch/micpie/export/ord_rxn_smiles_procedure/train_0-0.jsonl": "{"text":"The reaction SMILES string of a reaction with the procedure below is C[O:15][c:7]1[cH:6][c:5]2[c:10]([c:9]([CH3:13])[c:8]1[CH3:14])[C:11](=[O:12])[C:3]([CH2:1][CH3:2])([CH2:18][CH3:19])[C:4]2=[O:17].Cl.c1ccncc1.O>>[CH2:1]([CH3:2])[C:3]1([CH2:18][CH3:19])[C:4](=[O:17])[c:5]2[cH:6][c:7]([OH:15])[c:8]([CH3:14])[c:9]([CH3:13])[c:10]2[C:11]1=[O:12].\nProcedure: A mixture of 2,2-diethyl-4,5-dimethyl-6-methoxy-indan-1,3-dione (5.5 g., 0.021 mole) and pyridine hydrochloride (50 g.) is heated at 180°C. for six hours then poured into water (1 l) affording 4.0 g. of 2,2-diethyl-4,5-dimethyl-6-hydroxy-indan-1,3-dione which melts at 141°-142°C. 
after recrystallization from methanol-water."} {"text":"The RXNSMILES of a reaction with the reaction procedure description below is Cc1ccccc1.O=C([O-])[O-].[Ag+2].O[C@:49]1(Br)[C@H:48]([O:53][C:54]([CH3:55])=[O:56])[C@@H:47]([O:57][C:58]([CH3:59])=[O:60])[C@H:46]([O:61][C:62]([CH3:63])=[O:64])[C@@H:45]([C:44]([O:43][CH3:42])=[O:65])[O:51]1.[OH:1][CH2:2][C:3]([CH3:4])([CH3:5])[c:6]1[cH:7][c:8]([NH:11][C:12](=[O:13])[NH:14][c:15]2[cH:16][cH:17][c:18](-[c:21]3[n:22][c:23]4[s:24][c:25]5[c:26]([n:27]4[cH:28]3)[cH:29][cH:30][c:31]([O:33][CH2:34][CH2:35][N:36]3[CH2:37][CH2:38][O:39][CH2:40][CH2:41]3)[cH:32]5)[cH:19][cH:20]2)[n:9][o:10]1>>[O:1]([CH2:2][C:3]([CH3:4])([CH3:5])[c:6]1[cH:7][c:8]([NH:11][C:12](=[O:13])[NH:14][c:15]2[cH:16][cH:17][c:18](-[c:21]3[n:22][c:23]4[s:24][c:25]5[c:26]([n:27]4[cH:28]3)[cH:29][cH:30][c:31]([O:33][CH2:34][CH2:35][N:36]3[CH2:37][CH2:38][O:39][CH2:40][CH2:41]3)[cH:32]5)[cH:19][cH:20]2)[n:9][o:10]1)[CH:49]1[CH:48]([O:53][C:54]([CH3:55])=[O:56])[CH:47]([O:57][C:58]([CH3:59])=[O:60])[CH:46]([O:61][C:62]([CH3:63])=[O:64])[CH:45]([C:44]([O:43][CH3:42])=[O:65])[O:51]1.\nProcedure: To a stirred mixture of 1-[5-(2-hydroxy-1,1-dimethyl-ethyl)-isoxazol-3-yl]-3-{4-[7-(2-morpholin-4-yl-ethoxy)-benzo[d]imidazo[2,1-b]thiazol-2-yl]-phenyl}-urea (I-1) (1 equivalent) and silver carbonate (0.5-1 equivalent) in toluene at rt is added 1-bromo-2,3,4-tri-O-acetyl-α-D-glucuronic acid methyl ester (0.8-1.5 equivalents) and the mixture is stirred at rt until the reaction is substantially complete as monitored by LCMS or TLC. The mixture is filtered and the filtrate is partitioned between water and either dichloromethane or a mixture of dichloromethane and isopropanol. The separated aqueous layer is further extracted with dichloromethane or a mixture of dichloromethane and isopropanol. The combined organic layers are washed with 2N aq sodium hydroxide dried over magnesium sulfate, filtered, and concentrated under reduced pressure. 
The residue is purified by preparative reverse-phase HPLC or silica gel flash chromatography to afford 3,4,5-triacetoxy-6-{2-methyl-2-[3-(3-{4-[7-(2-morpholin-4-yl-ethoxy)-benzo[d]imidazo[2,1-b]thiazol-2-yl]-phenyl}-ureido)-isoxazol-5-yl]-propoxy}-tetrahydro-pyran-2-carboxylic acid methyl ester."}", "/scratch/micpie/export/ord_rxn_smiles_procedure/train_0-3.jsonl": "{"text":"User: I want to run a reaction with the reaction procedure below and now need to know the RXNSMILES.\nProcedure: A mixture of 2,2-diethyl-4,5-dimethyl-6-methoxy-indan-1,3-dione (5.5 g., 0.021 mole) and pyridine hydrochloride (50 g.) is heated at 180°C. for six hours then poured into water (1 l) affording 4.0 g. of 2,2-diethyl-4,5-dimethyl-6-hydroxy-indan-1,3-dione which melts at 141°-142°C. after recrystallization from methanol-water.\nAssistant: The RXNSMILES of the reaction is C[O:15][c:7]1[cH:6][c:5]2[c:10]([c:9]([CH3:13])[c:8]1[CH3:14])[C:11](=[O:12])[C:3]([CH2:1][CH3:2])([CH2:18][CH3:19])[C:4]2=[O:17].Cl.c1ccncc1.O>>[CH2:1]([CH3:2])[C:3]1([CH2:18][CH3:19])[C:4](=[O:17])[c:5]2[cH:6][c:7]([OH:15])[c:8]([CH3:14])[c:9]([CH3:13])[c:10]2[C:11]1=[O:12]."} {"text":"User: I want to run a reaction with the description of reaction procedure below and now need to know the reaction SMILES.\nProcedure: To a stirred mixture of 1-[5-(2-hydroxy-1,1-dimethyl-ethyl)-isoxazol-3-yl]-3-{4-[7-(2-morpholin-4-yl-ethoxy)-benzo[d]imidazo[2,1-b]thiazol-2-yl]-phenyl}-urea (I-1) (1 equivalent) and silver carbonate (0.5-1 equivalent) in toluene at rt is added 1-bromo-2,3,4-tri-O-acetyl-α-D-glucuronic acid methyl ester (0.8-1.5 equivalents) and the mixture is stirred at rt until the reaction is substantially complete as monitored by LCMS or TLC. The mixture is filtered and the filtrate is partitioned between water and either dichloromethane or a mixture of dichloromethane and isopropanol. The separated aqueous layer is further extracted with dichloromethane or a mixture of dichloromethane and isopropanol. 
The combined organic layers are washed with 2N aq sodium hydroxide dried over magnesium sulfate, filtered, and concentrated under reduced pressure. The residue is purified by preparative reverse-phase HPLC or silica gel flash chromatography to afford 3,4,5-triacetoxy-6-{2-methyl-2-[3-(3-{4-[7-(2-morpholin-4-yl-ethoxy)-benzo[d]imidazo[2,1-b]thiazol-2-yl]-phenyl}-ureido)-isoxazol-5-yl]-propoxy}-tetrahydro-pyran-2-carboxylic acid methyl ester.\nAssistant: The reaction SMILES of the reaction is Cc1ccccc1.O=C([O-])[O-].[Ag+2].O[C@:49]1(Br)[C@H:48]([O:53][C:54]([CH3:55])=[O:56])[C@@H:47]([O:57][C:58]([CH3:59])=[O:60])[C@H:46]([O:61][C:62]([CH3:63])=[O:64])[C@@H:45]([C:44]([O:43][CH3:42])=[O:65])[O:51]1.[OH:1][CH2:2][C:3]([CH3:4])([CH3:5])[c:6]1[cH:7][c:8]([NH:11][C:12](=[O:13])[NH:14][c:15]2[cH:16][cH:17][c:18](-[c:21]3[n:22][c:23]4[s:24][c:25]5[c:26]([n:27]4[cH:28]3)[cH:29][cH:30][c:31]([O:33][CH2:34][CH2:35][N:36]3[CH2:37][CH2:38][O:39][CH2:40][CH2:41]3)[cH:32]5)[cH:19][cH:20]2)[n:9][o:10]1>>[O:1]([CH2:2][C:3]([CH3:4])([CH3:5])[c:6]1[cH:7][c:8]([NH:11][C:12](=[O:13])[NH:14][c:15]2[cH:16][cH:17][c:18](-[c:21]3[n:22][c:23]4[s:24][c:25]5[c:26]([n:27]4[cH:28]3)[cH:29][cH:30][c:31]([O:33][CH2:34][CH2:35][N:36]3[CH2:37][CH2:38][O:39][CH2:40][CH2:41]3)[cH:32]5)[cH:19][cH:20]2)[n:9][o:10]1)[CH:49]1[CH:48]([O:53][C:54]([CH3:55])=[O:56])[CH:47]([O:57][C:58]([CH3:59])=[O:60])[CH:46]([O:61][C:62]([CH3:63])=[O:64])[CH:45]([C:44]([O:43][CH3:42])=[O:65])[O:51]1."}", "/scratch/micpie/export/ord_rxn_smiles_procedure/valid_0-2.jsonl": "{"text":"User: I want to run a reaction with the reaction SMILES string 
C1CCOC1.C[Si](C)(C)c1cc(CCC[O:14][C:1]([CH2:2][CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH2:8][CH2:9][CH2:10][CH2:11][CH3:12])=[O:13])co1.O=[O:60].Oc1c(I)cc2c(c1I)Oc1c(cc(I)c(O)c1I)[C:43]21[c:44]2[c:45](c(Cl)[c:47](Cl)[c:48](Cl)[c:49]2Cl)[C:54](=[O:55])[O:56]1>>[C:1]([CH2:2][CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH2:8][CH2:9][CH2:10][CH2:11][CH3:12])(=[O:13])[O:14][CH2:47][CH2:48][CH2:49][C:44]1=[CH:45][CH:54]([OH:55])[O:56][C:43]1=[O:60].\nAssistant: That's interesting, is there anything else I can do for you?\nUser: I would like to know the procedure I should follow to run the reaction.\nAssistant: I advise the following procedure: A mixture of 3-(5-trimethylsilyl-3-furyl)propyl dodecanoate (105 mg, 0.27 mmol) and Rose Bengal (5 mg) in tetrahydrofuran (6 ml) was exposed to singlet oxygen for 2.5 hours at -78 degrees. The residue, after solvent removal, was purified by preparative thin layer chromatography (TLC) (20×20 cm, 500u silica plate; developed with 60% ethyl ether\/petroleum ether). The title ester was obtained as an off-white solid."} {"text":"User: I need to run a reaction with the reaction SMILES (RXNSMILES) CC(=O)OC(C)=O.O=[CH:31][c:24]1[cH:23][cH:22][c:21]([N:20]([CH3:19])[CH3:33])[c:30]2[c:25]1[cH:26][cH:27][cH:28][cH:29]2.[CH2:2]([CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH3:8])[n+:9]1[c:10]([CH3:18])[s:11][c:12]2[c:13]1[cH:14][cH:15][cH:16][cH:17]2.[I-:1]>>[CH2:2]([CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH3:8])[n+:9]1[c:10](\/[CH:18]=[CH:31]\/[c:24]2[cH:23][cH:22][c:21]([N:20]([CH3:19])[CH3:33])[c:30]3[c:25]2[cH:26][cH:27][cH:28][cH:29]3)[s:11][c:12]2[c:13]1[cH:14][cH:15][cH:16][cH:17]2.[I-:1].\nAssistant: That's interesting, is there anything else I can do for you?\nUser: Yes, I would like to know the procedure I should follow to run the reaction.\nAssistant: I advise the following procedure: A mixture of 3-heptyl-2-methylbenzothiazolium iodide (0.50 g, 1.33 mmol) and 4-dimethylamino-1-naphthaldehyde (0.40 g, 2.01 mmol) in 5 mL of acetic anhydride under nitrogen was 
heated. Upon refluxing, the mixture turned purple and all solids seemed to be in solution. The solution was refluxed for 15 minutes and on cooling, solid formed. The mixture was filtered and washed with ethyl acetate to give (E)-2-[2-(4-dimethylaminonaphthalen-1-yl)vinyl]-3-heptylbenzothiazol-3-ium iodide as a dark brown solid, mp 195-197° C."}", "/scratch/micpie/export/ord_rxn_smiles_procedure/valid_0-1.jsonl": "{"text":"The procedure of a reaction with the RXNSMILES C1CCOC1.C[Si](C)(C)c1cc(CCC[O:14][C:1]([CH2:2][CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH2:8][CH2:9][CH2:10][CH2:11][CH3:12])=[O:13])co1.O=[O:60].Oc1c(I)cc2c(c1I)Oc1c(cc(I)c(O)c1I)[C:43]21[c:44]2[c:45](c(Cl)[c:47](Cl)[c:48](Cl)[c:49]2Cl)[C:54](=[O:55])[O:56]1>>[C:1]([CH2:2][CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH2:8][CH2:9][CH2:10][CH2:11][CH3:12])(=[O:13])[O:14][CH2:47][CH2:48][CH2:49][C:44]1=[CH:45][CH:54]([OH:55])[O:56][C:43]1=[O:60] is A mixture of 3-(5-trimethylsilyl-3-furyl)propyl dodecanoate (105 mg, 0.27 mmol) and Rose Bengal (5 mg) in tetrahydrofuran (6 ml) was exposed to singlet oxygen for 2.5 hours at -78 degrees. The residue, after solvent removal, was purified by preparative thin layer chromatography (TLC) (20×20 cm, 500u silica plate; developed with 60% ethyl ether\/petroleum ether). 
The title ester was obtained as an off-white solid."} {"text":"The reaction procedure of a reaction with the RXNSMILES CC(=O)OC(C)=O.O=[CH:31][c:24]1[cH:23][cH:22][c:21]([N:20]([CH3:19])[CH3:33])[c:30]2[c:25]1[cH:26][cH:27][cH:28][cH:29]2.[CH2:2]([CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH3:8])[n+:9]1[c:10]([CH3:18])[s:11][c:12]2[c:13]1[cH:14][cH:15][cH:16][cH:17]2.[I-:1]>>[CH2:2]([CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH3:8])[n+:9]1[c:10](\/[CH:18]=[CH:31]\/[c:24]2[cH:23][cH:22][c:21]([N:20]([CH3:19])[CH3:33])[c:30]3[c:25]2[cH:26][cH:27][cH:28][cH:29]3)[s:11][c:12]2[c:13]1[cH:14][cH:15][cH:16][cH:17]2.[I-:1] is A mixture of 3-heptyl-2-methylbenzothiazolium iodide (0.50 g, 1.33 mmol) and 4-dimethylamino-1-naphthaldehyde (0.40 g, 2.01 mmol) in 5 mL of acetic anhydride under nitrogen was heated. Upon refluxing, the mixture turned purple and all solids seemed to be in solution. The solution was refluxed for 15 minutes and on cooling, solid formed. The mixture was filtered and washed with ethyl acetate to give (E)-2-[2-(4-dimethylaminonaphthalen-1-yl)vinyl]-3-heptylbenzothiazol-3-ium iodide as a dark brown solid, mp 195-197° C."}", "/scratch/micpie/export/ord_rxn_smiles_procedure/valid_0-4.jsonl": "{"text":"Task: Extract the RXNSMILES of a reaction based on its reaction procedure.\nProcedure: A mixture of 3-(5-trimethylsilyl-3-furyl)propyl dodecanoate (105 mg, 0.27 mmol) and Rose Bengal (5 mg) in tetrahydrofuran (6 ml) was exposed to singlet oxygen for 2.5 hours at -78 degrees. The residue, after solvent removal, was purified by preparative thin layer chromatography (TLC) (20×20 cm, 500u silica plate; developed with 60% ethyl ether\/petroleum ether). 
The title ester was obtained as an off-white solid.\nAnswer: C1CCOC1.C[Si](C)(C)c1cc(CCC[O:14][C:1]([CH2:2][CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH2:8][CH2:9][CH2:10][CH2:11][CH3:12])=[O:13])co1.O=[O:60].Oc1c(I)cc2c(c1I)Oc1c(cc(I)c(O)c1I)[C:43]21[c:44]2[c:45](c(Cl)[c:47](Cl)[c:48](Cl)[c:49]2Cl)[C:54](=[O:55])[O:56]1>>[C:1]([CH2:2][CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH2:8][CH2:9][CH2:10][CH2:11][CH3:12])(=[O:13])[O:14][CH2:47][CH2:48][CH2:49][C:44]1=[CH:45][CH:54]([OH:55])[O:56][C:43]1=[O:60]"} {"text":"Task: Extract the reaction SMILES string of a reaction based on its reaction procedure.\nProcedure: A mixture of 3-heptyl-2-methylbenzothiazolium iodide (0.50 g, 1.33 mmol) and 4-dimethylamino-1-naphthaldehyde (0.40 g, 2.01 mmol) in 5 mL of acetic anhydride under nitrogen was heated. Upon refluxing, the mixture turned purple and all solids seemed to be in solution. The solution was refluxed for 15 minutes and on cooling, solid formed. The mixture was filtered and washed with ethyl acetate to give (E)-2-[2-(4-dimethylaminonaphthalen-1-yl)vinyl]-3-heptylbenzothiazol-3-ium iodide as a dark brown solid, mp 195-197° C.\nAnswer: CC(=O)OC(C)=O.O=[CH:31][c:24]1[cH:23][cH:22][c:21]([N:20]([CH3:19])[CH3:33])[c:30]2[c:25]1[cH:26][cH:27][cH:28][cH:29]2.[CH2:2]([CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH3:8])[n+:9]1[c:10]([CH3:18])[s:11][c:12]2[c:13]1[cH:14][cH:15][cH:16][cH:17]2.[I-:1]>>[CH2:2]([CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH3:8])[n+:9]1[c:10](\/[CH:18]=[CH:31]\/[c:24]2[cH:23][cH:22][c:21]([N:20]([CH3:19])[CH3:33])[c:30]3[c:25]2[cH:26][cH:27][cH:28][cH:29]3)[s:11][c:12]2[c:13]1[cH:14][cH:15][cH:16][cH:17]2.[I-:1]"}", "/scratch/micpie/export/ord_rxn_smiles_procedure/train_0-2.jsonl": "{"text":"User: I need to run a reaction with the RXNSMILES 
C[O:15][c:7]1[cH:6][c:5]2[c:10]([c:9]([CH3:13])[c:8]1[CH3:14])[C:11](=[O:12])[C:3]([CH2:1][CH3:2])([CH2:18][CH3:19])[C:4]2=[O:17].Cl.c1ccncc1.O>>[CH2:1]([CH3:2])[C:3]1([CH2:18][CH3:19])[C:4](=[O:17])[c:5]2[cH:6][c:7]([OH:15])[c:8]([CH3:14])[c:9]([CH3:13])[c:10]2[C:11]1=[O:12].\nAssistant: Cool, is there anything else I can do for you?\nUser: Yes, I would like to know the reaction procedure description I should follow to run the reaction.\nAssistant: I recommend the following procedure: A mixture of 2,2-diethyl-4,5-dimethyl-6-methoxy-indan-1,3-dione (5.5 g., 0.021 mole) and pyridine hydrochloride (50 g.) is heated at 180°C. for six hours then poured into water (1 l) affording 4.0 g. of 2,2-diethyl-4,5-dimethyl-6-hydroxy-indan-1,3-dione which melts at 141°-142°C. after recrystallization from methanol-water."} {"text":"User: I would like to run a reaction with the reaction SMILES (RXNSMILES) Cc1ccccc1.O=C([O-])[O-].[Ag+2].O[C@:49]1(Br)[C@H:48]([O:53][C:54]([CH3:55])=[O:56])[C@@H:47]([O:57][C:58]([CH3:59])=[O:60])[C@H:46]([O:61][C:62]([CH3:63])=[O:64])[C@@H:45]([C:44]([O:43][CH3:42])=[O:65])[O:51]1.[OH:1][CH2:2][C:3]([CH3:4])([CH3:5])[c:6]1[cH:7][c:8]([NH:11][C:12](=[O:13])[NH:14][c:15]2[cH:16][cH:17][c:18](-[c:21]3[n:22][c:23]4[s:24][c:25]5[c:26]([n:27]4[cH:28]3)[cH:29][cH:30][c:31]([O:33][CH2:34][CH2:35][N:36]3[CH2:37][CH2:38][O:39][CH2:40][CH2:41]3)[cH:32]5)[cH:19][cH:20]2)[n:9][o:10]1>>[O:1]([CH2:2][C:3]([CH3:4])([CH3:5])[c:6]1[cH:7][c:8]([NH:11][C:12](=[O:13])[NH:14][c:15]2[cH:16][cH:17][c:18](-[c:21]3[n:22][c:23]4[s:24][c:25]5[c:26]([n:27]4[cH:28]3)[cH:29][cH:30][c:31]([O:33][CH2:34][CH2:35][N:36]3[CH2:37][CH2:38][O:39][CH2:40][CH2:41]3)[cH:32]5)[cH:19][cH:20]2)[n:9][o:10]1)[CH:49]1[CH:48]([O:53][C:54]([CH3:55])=[O:56])[CH:47]([O:57][C:58]([CH3:59])=[O:60])[CH:46]([O:61][C:62]([CH3:63])=[O:64])[CH:45]([C:44]([O:43][CH3:42])=[O:65])[O:51]1.\nAssistant: That's interesting, is there anything else I can do for you?\nUser: I would like to know the description of 
reaction procedure I should follow to run the reaction.\nAssistant: I recommend the following procedure: To a stirred mixture of 1-[5-(2-hydroxy-1,1-dimethyl-ethyl)-isoxazol-3-yl]-3-{4-[7-(2-morpholin-4-yl-ethoxy)-benzo[d]imidazo[2,1-b]thiazol-2-yl]-phenyl}-urea (I-1) (1 equivalent) and silver carbonate (0.5-1 equivalent) in toluene at rt is added 1-bromo-2,3,4-tri-O-acetyl-α-D-glucuronic acid methyl ester (0.8-1.5 equivalents) and the mixture is stirred at rt until the reaction is substantially complete as monitored by LCMS or TLC. The mixture is filtered and the filtrate is partitioned between water and either dichloromethane or a mixture of dichloromethane and isopropanol. The separated aqueous layer is further extracted with dichloromethane or a mixture of dichloromethane and isopropanol. The combined organic layers are washed with 2N aq sodium hydroxide dried over magnesium sulfate, filtered, and concentrated under reduced pressure. The residue is purified by preparative reverse-phase HPLC or silica gel flash chromatography to afford 3,4,5-triacetoxy-6-{2-methyl-2-[3-(3-{4-[7-(2-morpholin-4-yl-ethoxy)-benzo[d]imidazo[2,1-b]thiazol-2-yl]-phenyl}-ureido)-isoxazol-5-yl]-propoxy}-tetrahydro-pyran-2-carboxylic acid methyl ester."}", "/scratch/micpie/export/ord_rxn_smiles_procedure/train_0-1.jsonl": "{"text":"The procedure of a reaction with the RXNSMILES C[O:15][c:7]1[cH:6][c:5]2[c:10]([c:9]([CH3:13])[c:8]1[CH3:14])[C:11](=[O:12])[C:3]([CH2:1][CH3:2])([CH2:18][CH3:19])[C:4]2=[O:17].Cl.c1ccncc1.O>>[CH2:1]([CH3:2])[C:3]1([CH2:18][CH3:19])[C:4](=[O:17])[c:5]2[cH:6][c:7]([OH:15])[c:8]([CH3:14])[c:9]([CH3:13])[c:10]2[C:11]1=[O:12] is A mixture of 2,2-diethyl-4,5-dimethyl-6-methoxy-indan-1,3-dione (5.5 g., 0.021 mole) and pyridine hydrochloride (50 g.) is heated at 180°C. for six hours then poured into water (1 l) affording 4.0 g. of 2,2-diethyl-4,5-dimethyl-6-hydroxy-indan-1,3-dione which melts at 141°-142°C. 
after recrystallization from methanol-water."} {"text":"The reaction procedure of a reaction with the RXNSMILES Cc1ccccc1.O=C([O-])[O-].[Ag+2].O[C@:49]1(Br)[C@H:48]([O:53][C:54]([CH3:55])=[O:56])[C@@H:47]([O:57][C:58]([CH3:59])=[O:60])[C@H:46]([O:61][C:62]([CH3:63])=[O:64])[C@@H:45]([C:44]([O:43][CH3:42])=[O:65])[O:51]1.[OH:1][CH2:2][C:3]([CH3:4])([CH3:5])[c:6]1[cH:7][c:8]([NH:11][C:12](=[O:13])[NH:14][c:15]2[cH:16][cH:17][c:18](-[c:21]3[n:22][c:23]4[s:24][c:25]5[c:26]([n:27]4[cH:28]3)[cH:29][cH:30][c:31]([O:33][CH2:34][CH2:35][N:36]3[CH2:37][CH2:38][O:39][CH2:40][CH2:41]3)[cH:32]5)[cH:19][cH:20]2)[n:9][o:10]1>>[O:1]([CH2:2][C:3]([CH3:4])([CH3:5])[c:6]1[cH:7][c:8]([NH:11][C:12](=[O:13])[NH:14][c:15]2[cH:16][cH:17][c:18](-[c:21]3[n:22][c:23]4[s:24][c:25]5[c:26]([n:27]4[cH:28]3)[cH:29][cH:30][c:31]([O:33][CH2:34][CH2:35][N:36]3[CH2:37][CH2:38][O:39][CH2:40][CH2:41]3)[cH:32]5)[cH:19][cH:20]2)[n:9][o:10]1)[CH:49]1[CH:48]([O:53][C:54]([CH3:55])=[O:56])[CH:47]([O:57][C:58]([CH3:59])=[O:60])[CH:46]([O:61][C:62]([CH3:63])=[O:64])[CH:45]([C:44]([O:43][CH3:42])=[O:65])[O:51]1 is To a stirred mixture of 1-[5-(2-hydroxy-1,1-dimethyl-ethyl)-isoxazol-3-yl]-3-{4-[7-(2-morpholin-4-yl-ethoxy)-benzo[d]imidazo[2,1-b]thiazol-2-yl]-phenyl}-urea (I-1) (1 equivalent) and silver carbonate (0.5-1 equivalent) in toluene at rt is added 1-bromo-2,3,4-tri-O-acetyl-α-D-glucuronic acid methyl ester (0.8-1.5 equivalents) and the mixture is stirred at rt until the reaction is substantially complete as monitored by LCMS or TLC. The mixture is filtered and the filtrate is partitioned between water and either dichloromethane or a mixture of dichloromethane and isopropanol. The separated aqueous layer is further extracted with dichloromethane or a mixture of dichloromethane and isopropanol. The combined organic layers are washed with 2N aq sodium hydroxide dried over magnesium sulfate, filtered, and concentrated under reduced pressure. 
The residue is purified by preparative reverse-phase HPLC or silica gel flash chromatography to afford 3,4,5-triacetoxy-6-{2-methyl-2-[3-(3-{4-[7-(2-morpholin-4-yl-ethoxy)-benzo[d]imidazo[2,1-b]thiazol-2-yl]-phenyl}-ureido)-isoxazol-5-yl]-propoxy}-tetrahydro-pyran-2-carboxylic acid methyl ester."}", "/scratch/micpie/export/ord_rxn_smiles_procedure/train_0-4.jsonl": "{"text":"Task: Extract the reaction SMILES string of a reaction based on its procedure.\nProcedure: A mixture of 2,2-diethyl-4,5-dimethyl-6-methoxy-indan-1,3-dione (5.5 g., 0.021 mole) and pyridine hydrochloride (50 g.) is heated at 180°C. for six hours then poured into water (1 l) affording 4.0 g. of 2,2-diethyl-4,5-dimethyl-6-hydroxy-indan-1,3-dione which melts at 141°-142°C. after recrystallization from methanol-water.\nAnswer: C[O:15][c:7]1[cH:6][c:5]2[c:10]([c:9]([CH3:13])[c:8]1[CH3:14])[C:11](=[O:12])[C:3]([CH2:1][CH3:2])([CH2:18][CH3:19])[C:4]2=[O:17].Cl.c1ccncc1.O>>[CH2:1]([CH3:2])[C:3]1([CH2:18][CH3:19])[C:4](=[O:17])[c:5]2[cH:6][c:7]([OH:15])[c:8]([CH3:14])[c:9]([CH3:13])[c:10]2[C:11]1=[O:12]"} {"text":"Task: Extract the reaction SMILES string of a reaction based on its reaction procedure description.\nProcedure: To a stirred mixture of 1-[5-(2-hydroxy-1,1-dimethyl-ethyl)-isoxazol-3-yl]-3-{4-[7-(2-morpholin-4-yl-ethoxy)-benzo[d]imidazo[2,1-b]thiazol-2-yl]-phenyl}-urea (I-1) (1 equivalent) and silver carbonate (0.5-1 equivalent) in toluene at rt is added 1-bromo-2,3,4-tri-O-acetyl-α-D-glucuronic acid methyl ester (0.8-1.5 equivalents) and the mixture is stirred at rt until the reaction is substantially complete as monitored by LCMS or TLC. The mixture is filtered and the filtrate is partitioned between water and either dichloromethane or a mixture of dichloromethane and isopropanol. The separated aqueous layer is further extracted with dichloromethane or a mixture of dichloromethane and isopropanol. 
The combined organic layers are washed with 2N aq sodium hydroxide dried over magnesium sulfate, filtered, and concentrated under reduced pressure. The residue is purified by preparative reverse-phase HPLC or silica gel flash chromatography to afford 3,4,5-triacetoxy-6-{2-methyl-2-[3-(3-{4-[7-(2-morpholin-4-yl-ethoxy)-benzo[d]imidazo[2,1-b]thiazol-2-yl]-phenyl}-ureido)-isoxazol-5-yl]-propoxy}-tetrahydro-pyran-2-carboxylic acid methyl ester.\nAnswer: Cc1ccccc1.O=C([O-])[O-].[Ag+2].O[C@:49]1(Br)[C@H:48]([O:53][C:54]([CH3:55])=[O:56])[C@@H:47]([O:57][C:58]([CH3:59])=[O:60])[C@H:46]([O:61][C:62]([CH3:63])=[O:64])[C@@H:45]([C:44]([O:43][CH3:42])=[O:65])[O:51]1.[OH:1][CH2:2][C:3]([CH3:4])([CH3:5])[c:6]1[cH:7][c:8]([NH:11][C:12](=[O:13])[NH:14][c:15]2[cH:16][cH:17][c:18](-[c:21]3[n:22][c:23]4[s:24][c:25]5[c:26]([n:27]4[cH:28]3)[cH:29][cH:30][c:31]([O:33][CH2:34][CH2:35][N:36]3[CH2:37][CH2:38][O:39][CH2:40][CH2:41]3)[cH:32]5)[cH:19][cH:20]2)[n:9][o:10]1>>[O:1]([CH2:2][C:3]([CH3:4])([CH3:5])[c:6]1[cH:7][c:8]([NH:11][C:12](=[O:13])[NH:14][c:15]2[cH:16][cH:17][c:18](-[c:21]3[n:22][c:23]4[s:24][c:25]5[c:26]([n:27]4[cH:28]3)[cH:29][cH:30][c:31]([O:33][CH2:34][CH2:35][N:36]3[CH2:37][CH2:38][O:39][CH2:40][CH2:41]3)[cH:32]5)[cH:19][cH:20]2)[n:9][o:10]1)[CH:49]1[CH:48]([O:53][C:54]([CH3:55])=[O:56])[CH:47]([O:57][C:58]([CH3:59])=[O:60])[CH:46]([O:61][C:62]([CH3:63])=[O:64])[CH:45]([C:44]([O:43][CH3:42])=[O:65])[O:51]1"}", "/scratch/micpie/export/ord_rxn_smiles_procedure/valid_0-3.jsonl": "{"text":"User: I want to run a reaction with the description of reaction procedure below and now need to know the RXNSMILES.\nProcedure: A mixture of 3-(5-trimethylsilyl-3-furyl)propyl dodecanoate (105 mg, 0.27 mmol) and Rose Bengal (5 mg) in tetrahydrofuran (6 ml) was exposed to singlet oxygen for 2.5 hours at -78 degrees. The residue, after solvent removal, was purified by preparative thin layer chromatography (TLC) (20×20 cm, 500u silica plate; developed with 60% ethyl ether\/petroleum ether). 
The title ester was obtained as an off-white solid.\nAssistant: The RXNSMILES of the reaction is C1CCOC1.C[Si](C)(C)c1cc(CCC[O:14][C:1]([CH2:2][CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH2:8][CH2:9][CH2:10][CH2:11][CH3:12])=[O:13])co1.O=[O:60].Oc1c(I)cc2c(c1I)Oc1c(cc(I)c(O)c1I)[C:43]21[c:44]2[c:45](c(Cl)[c:47](Cl)[c:48](Cl)[c:49]2Cl)[C:54](=[O:55])[O:56]1>>[C:1]([CH2:2][CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH2:8][CH2:9][CH2:10][CH2:11][CH3:12])(=[O:13])[O:14][CH2:47][CH2:48][CH2:49][C:44]1=[CH:45][CH:54]([OH:55])[O:56][C:43]1=[O:60]."} {"text":"User: I need to run a reaction with the procedure below and now need to know the reaction SMILES string.\nProcedure: A mixture of 3-heptyl-2-methylbenzothiazolium iodide (0.50 g, 1.33 mmol) and 4-dimethylamino-1-naphthaldehyde (0.40 g, 2.01 mmol) in 5 mL of acetic anhydride under nitrogen was heated. Upon refluxing, the mixture turned purple and all solids seemed to be in solution. The solution was refluxed for 15 minutes and on cooling, solid formed. The mixture was filtered and washed with ethyl acetate to give (E)-2-[2-(4-dimethylaminonaphthalen-1-yl)vinyl]-3-heptylbenzothiazol-3-ium iodide as a dark brown solid, mp 195-197° C.\nAssistant: The reaction SMILES string of the reaction is CC(=O)OC(C)=O.O=[CH:31][c:24]1[cH:23][cH:22][c:21]([N:20]([CH3:19])[CH3:33])[c:30]2[c:25]1[cH:26][cH:27][cH:28][cH:29]2.[CH2:2]([CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH3:8])[n+:9]1[c:10]([CH3:18])[s:11][c:12]2[c:13]1[cH:14][cH:15][cH:16][cH:17]2.[I-:1]>>[CH2:2]([CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH3:8])[n+:9]1[c:10](\/[CH:18]=[CH:31]\/[c:24]2[cH:23][cH:22][c:21]([N:20]([CH3:19])[CH3:33])[c:30]3[c:25]2[cH:26][cH:27][cH:28][cH:29]3)[s:11][c:12]2[c:13]1[cH:14][cH:15][cH:16][cH:17]2.[I-:1]."}", "/scratch/micpie/export/ord_rxn_smiles_procedure/test_0-4.jsonl": "{"text":"Task: Extract the reaction SMILES string of a reaction based on its description of reaction procedure.\nProcedure: To a stirring solution of 
1-bromo-4-(difluoromethoxy)-3-methoxy-2-(methoxymethoxy)benzene (1 g, 3.19 mmol) in methanol (15 mL) was added concentrated hydrochloride (2 mL) and the reaction mixture was heated to 50° C. for 1 h. The reaction mixture was cooled to RT, concentrated under reduced pressure and the obtained residue was basified with sodium bicarbonate solution and extracted with dichloromethane (3×). The combined dichloromethane layers were washed with brine, dried over sodium sulphate and concentrated under reduced pressure to afford 6-Bromo-3-(difluoromethoxy)-2-methoxyphenol as a solid (800 mg, 93%).\nAnswer: CO.COC[O:14][c:3]1[c:2]([Br:1])[cH:7][cH:6][c:5]([O:8][CH:9]([F:10])[F:11])[c:4]1[O:12][CH3:13].Cl>>[Br:1][c:2]1[c:3]([OH:14])[c:4]([O:12][CH3:13])[c:5]([O:8][CH:9]([F:10])[F:11])[cH:6][cH:7]1"} {"text":"Task: Extract the reaction SMILES string of a reaction based on its reaction procedure description.\nProcedure: 8-Methyl-3-(1-(9-(tetrahydro-2H-pyran-2-yl)-9H-purin-6-ylamino)ethyl)-2-o-tolylisoquinolin-1(2H)-one 4105 (180 mg, 0.36 mmol) was dissolved in MeOH(HCl) (50 mL) and stirred for 2 h. Aqueous NaHCO3 solution was added to the reaction mixture and the pH value was adjusted to 9. 
The mixture was then filtered and the filtrate was concentrated in vacuo to afford the desired product, 3-(1-(9H-purin-6-ylamino)ethyl)-8-methyl-2-o-tolylisoquinolin-1(2H)-one 4106 (80 mg, 54% yield) as a yellow solid.\nAnswer: C1CCC([n:27]2[cH:26][n:25][c:24]3[c:23]([NH:22][CH:20]([c:8]4[cH:7][c:6]5[cH:5][cH:4][cH:3][c:2]([CH3:1])[c:11]5[c:10](=[O:12])[n:9]4-[c:13]4[c:14]([CH3:19])[cH:15][cH:16][cH:17][cH:18]4)[CH3:21])[n:31][cH:30][n:29][c:28]32)OC1.O=C([O-])O.[Na+]>>[CH3:1][c:2]1[cH:3][cH:4][cH:5][c:6]2[cH:7][c:8]([CH:20]([CH3:21])[NH:22][c:23]3[c:24]4[n:25][cH:26][nH:27][c:28]4[n:29][cH:30][n:31]3)[n:9](-[c:13]3[c:14]([CH3:19])[cH:15][cH:16][cH:17][cH:18]3)[c:10](=[O:12])[c:11]12"}", "/scratch/micpie/export/qm8/train_0-28.jsonl": "{"text":"User: I have computed the RI-CC2\/def2TZVP-computed S0 -> S1 transition energy, RI-CC2\/def2TZVP-computed S0 -> S2 transition energy, and S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) of a molecule and want to know its canonical SMILES.\nAssistant: That sounds interesting, I would need to know the RI-CC2\/def2TZVP-computed S0 -> S1 transition energy, RI-CC2\/def2TZVP-computed S0 -> S2 transition energy, and S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) of the molecule you want to know the canonical SMILES of.\nUser: The RI-CC2\/def2TZVP-computed S0 -> S1 transition energy is 0.265 a. u., the RI-CC2\/def2TZVP-computed S0 -> S2 transition energy is 0.35008 a. u., and the S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) is 0.06702 a. 
u.\nAssistant: The canonical SMILES of the molecule is N."} {"text":"User: I have computed the RI-CC2\/def2TZVP-computed S0 -> S1 transition energy, S0 -> S2 transition energy computed using RI-CC2\/def2TZVP, and S0 -> S1 transition oscillator strength computed using RI-CC2\/def2TZVP of a molecule and want to know its canonical SMILES.\nAssistant: Cool, I would need to know the RI-CC2\/def2TZVP-computed S0 -> S1 transition energy, S0 -> S2 transition energy computed using RI-CC2\/def2TZVP, and S0 -> S1 transition oscillator strength computed using RI-CC2\/def2TZVP of the molecule you want to know the canonical SMILES of.\nUser: The RI-CC2\/def2TZVP-computed S0 -> S1 transition energy is 0.262 a. u., the S0 -> S2 transition energy computed using RI-CC2\/def2TZVP is 0.28498 a. u., and the S0 -> S1 transition oscillator strength computed using RI-CC2\/def2TZVP is 0.00004 a. u.\nAssistant: The canonical SMILES of the molecule is FC(F)(F)C1COC1."}", "/scratch/micpie/export/qm8/train_0-17.jsonl": "{"text":"Question: What is the LR-TDPBE0\/def2SVP-computed S0 -> S2 transition energy of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 2\t293.60975\t293.54111\t191.39397\t1.6256\t9.46\t-0.257\t0.0829\t0.3399\t26.1563\t0.034358\t-56.525887\t-56.523026\t-56.522082\t-56.544961\t6.316\n ChemNLP 3D\n\n 4 3 0 0 0 0 0 0 0 0999 V2000\n -0.0404 1.0241 0.0626 N 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0173 0.0125 -0.0274 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9158 1.3587 -0.0288 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5203 1.3435 -0.7755 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.34911 a. 
u."} {"text":"Question: What is the S0 -> S2 transition energy computed using linear-response time-dependent density functional theory (LR-TDPBE0\/def2SVP) of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23927\t3.78971\t1.55457\t1.42284\t1.532\t46.48\t-0.2621\t0.067\t0.329\t855.7697\t0.092261\t-530.085764\t-530.078593\t-530.077649\t-530.118327\t24.984\n ChemNLP 3D\n\n 13 13 0 0 0 0 0 0 0 0999 V2000\n 0.1408 0.0536 0.1012 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0125 1.3864 0.0414 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2077 1.9240 -0.1160 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7231 1.6594 -1.0646 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6922 1.9297 1.2733 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9757 1.1953 1.7185 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3649 0.7933 2.9595 O 0 0 0 0 0 0 0 0 0 0 0 0\n -0.1261 1.4488 2.6272 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7843 3.0100 1.1594 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.8483 1.8387 1.8753 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.2640 0.3380 1.1002 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7117 0.7458 2.5622 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.1240 2.2461 3.3356 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 10 1 0\n 6 7 1 0\n 8 7 1 0\n 8 13 1 0\n 9 5 1 0\n 11 6 1 0\n 12 8 1 0\nM END\n[\\V2000].\nAnswer: 0.29260 a. 
u."}", "/scratch/micpie/export/qm8/train_0-16.jsonl": "{"text":"Question: What is the S0 -> S1 transition energy computed using LR-TDPBE0\/def2SVP of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 2\t293.60975\t293.54111\t191.39397\t1.6256\t9.46\t-0.257\t0.0829\t0.3399\t26.1563\t0.034358\t-56.525887\t-56.523026\t-56.522082\t-56.544961\t6.316\n ChemNLP 3D\n\n 4 3 0 0 0 0 0 0 0 0999 V2000\n -0.0404 1.0241 0.0626 N 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0173 0.0125 -0.0274 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9158 1.3587 -0.0288 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5203 1.3435 -0.7755 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.26839 a. u."} {"text":"Question: What is the S0 -> S1 transition energy computed using LR-TDPBE0\/def2SVP of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23927\t3.78971\t1.55457\t1.42284\t1.532\t46.48\t-0.2621\t0.067\t0.329\t855.7697\t0.092261\t-530.085764\t-530.078593\t-530.077649\t-530.118327\t24.984\n ChemNLP 3D\n\n 13 13 0 0 0 0 0 0 0 0999 V2000\n 0.1408 0.0536 0.1012 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0125 1.3864 0.0414 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2077 1.9240 -0.1160 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7231 1.6594 -1.0646 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6922 1.9297 1.2733 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9757 1.1953 1.7185 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3649 0.7933 2.9595 O 0 0 0 0 0 0 0 0 0 0 0 0\n -0.1261 1.4488 2.6272 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7843 3.0100 1.1594 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.8483 1.8387 1.8753 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.2640 0.3380 1.1002 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7117 0.7458 2.5622 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.1240 2.2461 3.3356 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 10 1 0\n 6 7 1 0\n 8 7 1 0\n 8 13 1 0\n 9 5 1 0\n 11 6 1 0\n 12 8 1 0\nM END\n[\\V2000].\nAnswer: 0.27690 a. 
u."}", "/scratch/micpie/export/qm8/test_0-10.jsonl": "{"text":"The S0 -> S1 transition oscillator strength computed using Coulomb-attenuated B3LYP density functional theory (CAM-B3LYP\/def2TZVP) of the compound with the InChI InChI=1S\/H2O\/h1H2 is 0.03330 a. u."} {"text":"The S0 -> S1 transition oscillator strength computed using Coulomb-attenuated B3LYP density functional theory (CAM-B3LYP\/def2TZVP) of the molecule with the DeepSMILES [H]C[H])CF)F)F))C[H])C[H])[H])C3[H])[H] is 0.00990 a. u."}", "/scratch/micpie/export/qm8/valid_0-8.jsonl": "{"text":"The S0 -> S1 transition energy computed using Coulomb-attenuated B3LYP density functional theory (CAM-B3LYP\/def2TZVP) of the compound with the DeepSMILES [H]C[H])[H])[H] is 0.40993 a. u."} {"text":"The S0 -> S1 transition energy computed using Coulomb-attenuated B3LYP density functional theory (CAM-B3LYP\/def2TZVP) of the compound with the SELFIES [H][C][Branch1][C][H][C][Branch1][C][H][Branch1][C][H][C][Branch1][C][H][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Ring1][N][Branch1][C][H][H] is 0.33602 a. u."}", "/scratch/micpie/export/qm8/test_0-22.jsonl": "{"text":"Question: What is the S0 -> S1 transition oscillator strength computed using CAM-B3LYP\/def2TZVP of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 3\t799.58812\t437.90386\t282.94545\t1.8511\t6.31\t-0.2928\t0.0687\t0.3615\t19.0002\t0.021375\t-76.404702\t-76.401867\t-76.400922\t-76.422349\t6.002\n ChemNLP 3D\n\n 3 2 0 0 0 0 0 0 0 0999 V2000\n -0.0344 0.9775 0.0076 O 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0648 0.0206 0.0015 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.8718 1.3008 0.0007 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\nM END\n[\\V2000].\nAnswer: 0.03330 a. 
u."} {"text":"Question: What is the S0 -> S1 transition oscillator strength computed using CAM-B3LYP\/def2TZVP of the molecule with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23920\t4.16555\t1.22902\t1.1911\t2.0686\t53.89\t-0.2927\t0.0895\t0.3822\t1023.2247\t0.115399\t-494.167503\t-494.159866\t-494.158921\t-494.200318\t27.778\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.3774 0.1902 -0.0327 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0267 1.4673 0.0776 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0762 2.2330 0.1036 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6975 1.7727 -1.0459 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8924 1.6752 1.3057 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.2029 1.2594 2.5856 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7820 2.1866 3.2514 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5605 1.9303 3.8875 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.1633 2.7356 1.3332 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.8151 1.1063 1.1484 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0263 0.1997 2.6383 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6634 1.7526 3.7093 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9530 3.1535 2.7915 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.2959 2.7280 3.8623 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6021 1.3189 4.7817 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 3 1 0\n 2 5 1 0\n 4 2 1 0\n 5 9 1 0\n 5 6 1 0\n 6 11 1 0\n 6 7 1 0\n 6 8 1 0\n 7 12 1 0\n 7 8 1 0\n 8 15 1 0\n 10 5 1 0\n 13 7 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.00990 a. 
u."}", "/scratch/micpie/export/qm8/test_0-16.jsonl": "{"text":"Question: What is the LR-TDPBE0\/def2SVP-computed S0 -> S1 transition energy of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 3\t799.58812\t437.90386\t282.94545\t1.8511\t6.31\t-0.2928\t0.0687\t0.3615\t19.0002\t0.021375\t-76.404702\t-76.401867\t-76.400922\t-76.422349\t6.002\n ChemNLP 3D\n\n 3 2 0 0 0 0 0 0 0 0999 V2000\n -0.0344 0.9775 0.0076 O 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0648 0.0206 0.0015 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.8718 1.3008 0.0007 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\nM END\n[\\V2000].\nAnswer: 0.29138 a. u."} {"text":"Question: What is the S0 -> S1 transition energy computed using linear-response time-dependent density functional theory (LR-TDPBE0\/def2SVP) of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23920\t4.16555\t1.22902\t1.1911\t2.0686\t53.89\t-0.2927\t0.0895\t0.3822\t1023.2247\t0.115399\t-494.167503\t-494.159866\t-494.158921\t-494.200318\t27.778\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.3774 0.1902 -0.0327 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0267 1.4673 0.0776 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0762 2.2330 0.1036 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6975 1.7727 -1.0459 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8924 1.6752 1.3057 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.2029 1.2594 2.5856 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7820 2.1866 3.2514 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5605 1.9303 3.8875 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.1633 2.7356 1.3332 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.8151 1.1063 1.1484 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0263 0.1997 2.6383 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6634 1.7526 3.7093 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9530 3.1535 2.7915 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.2959 2.7280 3.8623 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6021 1.3189 4.7817 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 3 1 0\n 2 5 1 0\n 4 2 1 0\n 5 9 1 0\n 5 6 1 0\n 6 11 1 0\n 6 7 1 0\n 6 8 1 0\n 7 
12 1 0\n 7 8 1 0\n 8 15 1 0\n 10 5 1 0\n 13 7 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.32976 a. u."}", "/scratch/micpie/export/qm8/test_0-15.jsonl": "{"text":"Question: What is the S0 -> S2 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 3\t799.58812\t437.90386\t282.94545\t1.8511\t6.31\t-0.2928\t0.0687\t0.3615\t19.0002\t0.021375\t-76.404702\t-76.401867\t-76.400922\t-76.422349\t6.002\n ChemNLP 3D\n\n 3 2 0 0 0 0 0 0 0 0999 V2000\n -0.0344 0.9775 0.0076 O 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0648 0.0206 0.0015 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.8718 1.3008 0.0007 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\nM END\n[\\V2000].\nAnswer: 0.00000 a. u."} {"text":"Question: What is the S0 -> S2 transition oscillator strength computed using RI-CC2\/def2TZVP of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23920\t4.16555\t1.22902\t1.1911\t2.0686\t53.89\t-0.2927\t0.0895\t0.3822\t1023.2247\t0.115399\t-494.167503\t-494.159866\t-494.158921\t-494.200318\t27.778\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.3774 0.1902 -0.0327 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0267 1.4673 0.0776 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0762 2.2330 0.1036 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6975 1.7727 -1.0459 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8924 1.6752 1.3057 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.2029 1.2594 2.5856 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7820 2.1866 3.2514 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5605 1.9303 3.8875 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.1633 2.7356 1.3332 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.8151 1.1063 1.1484 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0263 0.1997 2.6383 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6634 1.7526 3.7093 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9530 3.1535 2.7915 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.2959 2.7280 3.8623 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6021 1.3189 4.7817 H 0 0 0 0 0 0 0 
0 0 0 0 0\n 1 2 1 0\n 2 3 1 0\n 2 5 1 0\n 4 2 1 0\n 5 9 1 0\n 5 6 1 0\n 6 11 1 0\n 6 7 1 0\n 6 8 1 0\n 7 12 1 0\n 7 8 1 0\n 8 15 1 0\n 10 5 1 0\n 13 7 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.00075 a. u."}", "/scratch/micpie/export/qm8/train_0-27.jsonl": "{"text":"User: I want to design a molecule with a particular S0 -> S1 transition energy computed using RI-CC2\/def2TZVP, S0 -> S2 transition energy computed using RI-CC2\/def2TZVP, and RI-CC2\/def2TZVP-computed S0 -> S1 transition oscillator strength.\nAssistant: Great, I would need to know the S0 -> S1 transition energy computed using RI-CC2\/def2TZVP, S0 -> S2 transition energy computed using RI-CC2\/def2TZVP, and RI-CC2\/def2TZVP-computed S0 -> S1 transition oscillator strength of the molecule you want to design.\nUser: The S0 -> S1 transition energy computed using RI-CC2\/def2TZVP should be 0.265 a. u., the S0 -> S2 transition energy computed using RI-CC2\/def2TZVP should be 0.35008 a. u., and the RI-CC2\/def2TZVP-computed S0 -> S1 transition oscillator strength should be 0.06702 a. 
u.\nAssistant: I propose the chemical with the V2000 Molfile with the following content: [V2000]\ngdb 2\t293.60975\t293.54111\t191.39397\t1.6256\t9.46\t-0.257\t0.0829\t0.3399\t26.1563\t0.034358\t-56.525887\t-56.523026\t-56.522082\t-56.544961\t6.316\n ChemNLP 3D\n\n 4 3 0 0 0 0 0 0 0 0999 V2000\n -0.0404 1.0241 0.0626 N 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0173 0.0125 -0.0274 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9158 1.3587 -0.0288 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5203 1.3435 -0.7755 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000]."} {"text":"User: I want to design a molecule with a particular RI-CC2\/def2TZVP-computed S0 -> S1 transition energy, S0 -> S2 transition energy computed using RI-CC2\/def2TZVP, and S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2).\nAssistant: Awesome, I would need to know the RI-CC2\/def2TZVP-computed S0 -> S1 transition energy, S0 -> S2 transition energy computed using RI-CC2\/def2TZVP, and S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) of the molecule you want to design.\nUser: The RI-CC2\/def2TZVP-computed S0 -> S1 transition energy should be 0.262 a. u., the S0 -> S2 transition energy computed using RI-CC2\/def2TZVP should be 0.28498 a. u., and the S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) should be 0.00004 a. 
u.\nAssistant: I recommend the chemical with the V2000 Molfile with the following content: [V2000]\ngdb 23927\t3.78971\t1.55457\t1.42284\t1.532\t46.48\t-0.2621\t0.067\t0.329\t855.7697\t0.092261\t-530.085764\t-530.078593\t-530.077649\t-530.118327\t24.984\n ChemNLP 3D\n\n 13 13 0 0 0 0 0 0 0 0999 V2000\n 0.1408 0.0536 0.1012 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0125 1.3864 0.0414 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2077 1.9240 -0.1160 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7231 1.6594 -1.0646 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6922 1.9297 1.2733 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9757 1.1953 1.7185 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3649 0.7933 2.9595 O 0 0 0 0 0 0 0 0 0 0 0 0\n -0.1261 1.4488 2.6272 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7843 3.0100 1.1594 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.8483 1.8387 1.8753 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.2640 0.3380 1.1002 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7117 0.7458 2.5622 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.1240 2.2461 3.3356 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 10 1 0\n 6 7 1 0\n 8 7 1 0\n 8 13 1 0\n 9 5 1 0\n 11 6 1 0\n 12 8 1 0\nM END\n[\\V2000]."}", "/scratch/micpie/export/qm8/train_0-8.jsonl": "{"text":"The S0 -> S1 transition energy computed using Coulomb-attenuated B3LYP density functional theory (CAM-B3LYP\/def2TZVP) of the molecule with the InChI InChI=1S\/H3N\/h1H3 is 0.25385 a. u."} {"text":"The CAM-B3LYP\/def2TZVP-computed S0 -> S1 transition energy of the compound with the SELFIES [H][C][Branch1][C][H][O][C][Branch1][C][H][Branch1][C][H][C][Ring1][#Branch1][Branch1][C][H][C][Branch1][C][F][Branch1][C][F][F] is 0.26683 a. u."}", "/scratch/micpie/export/qm8/test_0-5.jsonl": "{"text":"The S0 -> S2 transition energy computed using linear-response time-dependent density functional theory (LR-TDPBE0\/def2SVP) of the chemical with the InChI InChI=1S\/H2O\/h1H2 is 0.36209 a. 
u."} {"text":"The S0 -> S2 transition energy computed using LR-TDPBE0\/def2SVP of the compound with the SELFIES [H][C][Branch1][C][H][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][H][C][Branch1][C][H][Branch1][C][H][C][Ring1][=Branch1][Branch1][C][H][H] is 0.33185 a. u."}", "/scratch/micpie/export/qm8/valid_0-25.jsonl": "{"text":"Question: What is the InChI of the molecule with the V3000 Molfile with the following content?\nDescription: The content of the V3000 Molfile is [V3000]\ngdb 1\t157.7118\t157.70997\t157.70699\t0.\t13.21\t-0.3877\t0.1171\t0.5048\t35.3641\t0.044749\t-40.47893\t-40.476062\t-40.475117\t-40.498597\t6.469\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 5 4 0 0 0\nM V30 BEGIN ATOM\nM V30 1 C -0.012700 1.085800 0.008000 0\nM V30 2 H 0.002200 -0.006000 0.002000 0\nM V30 3 H 1.011700 1.463800 0.000300 0\nM V30 4 H -0.540800 1.447500 -0.876600 0\nM V30 5 H -0.523800 1.437900 0.906400 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 5\nM V30 2 1 2 1\nM V30 3 1 3 1\nM V30 4 1 4 1\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000].\nAnswer: InChI=1S\/CH4\/h1H4."} {"text":"Question: What is the SELFIES of the chemical with the V3000 Molfile with the following content?\nDescription: The content of the V3000 Molfile is [V3000]\ngdb 23924\t3.7138\t1.49465\t1.35927\t2.1985\t53.65\t-0.3185\t0.0809\t0.3994\t921.4627\t0.116107\t-494.170733\t-494.163382\t-494.162438\t-494.203105\t26.845\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 15 15 0 0 0\nM V30 BEGIN ATOM\nM V30 1 F 0.209500 0.193800 -0.074700 0\nM V30 2 C -0.013500 1.518700 0.004200 0\nM V30 3 F 1.178800 2.127900 -0.114500 0\nM V30 4 F -0.748900 1.860100 -1.067900 0\nM V30 5 C -0.702000 1.898100 1.289700 0\nM V30 6 C -2.004700 1.139300 1.663200 0\nM V30 7 C -1.499700 1.082500 3.131400 0\nM V30 8 C -0.068900 1.408100 2.620600 0\nM V30 9 H -0.845900 2.981300 1.276600 0\nM V30 10 H -2.950500 1.645500 1.463700 0\nM V30 11 H 
-2.028000 0.149500 1.201200 0\nM V30 12 H -1.911300 1.895700 3.735300 0\nM V30 13 H -1.637400 0.144800 3.673000 0\nM V30 14 H 0.523600 0.503700 2.463300 0\nM V30 15 H 0.526900 2.128200 3.183700 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 2 5\nM V30 3 1 3 2\nM V30 4 1 4 2\nM V30 5 1 5 6\nM V30 6 1 5 8\nM V30 7 1 6 7\nM V30 8 1 7 13\nM V30 9 1 7 12\nM V30 10 1 8 7\nM V30 11 1 8 15\nM V30 12 1 9 5\nM V30 13 1 10 6\nM V30 14 1 11 6\nM V30 15 1 14 8\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000].\nAnswer: [H][C][Branch1][C][H][C][Branch1][C][H][Branch1][C][H][C][Branch1][C][H][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Ring1][N][Branch1][C][H][H]."}", "/scratch/micpie/export/qm8/valid_0-9.jsonl": "{"text":"The CAM-B3LYP\/def2TZVP-computed S0 -> S2 transition energy of the molecule with the canonical SMILES C is 0.40994 a. u."} {"text":"The S0 -> S2 transition energy computed using Coulomb-attenuated B3LYP density functional theory (CAM-B3LYP\/def2TZVP) of the molecule with the InChI InChI=1S\/C5H7F3\/c6-5(7,8)4-2-1-3-4\/h4H,1-3H2 is 0.33788 a. u."}", "/scratch/micpie/export/qm8/test_0-26.jsonl": "{"text":"Question: What is the RI-CC2\/def2TZVP-computed S0 -> S2 transition energy of the chemical with the V3000 Molfile with the following content?\nDescription: The content of the V3000 Molfile is [V3000]\ngdb 3\t799.58812\t437.90386\t282.94545\t1.8511\t6.31\t-0.2928\t0.0687\t0.3615\t19.0002\t0.021375\t-76.404702\t-76.401867\t-76.400922\t-76.422349\t6.002\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 3 2 0 0 0\nM V30 BEGIN ATOM\nM V30 1 O -0.034400 0.977500 0.007600 0\nM V30 2 H 0.064800 0.020600 0.001500 0\nM V30 3 H 0.871800 1.300800 0.000700 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 2 1\nM V30 2 1 3 1\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000].\nAnswer: 0.36358 a. 
u."} {"text":"Question: What is the S0 -> S2 transition energy computed using RI-CC2\/def2TZVP of the chemical with the V3000 Molfile with the following content?\nDescription: The content of the V3000 Molfile is [V3000]\ngdb 23920\t4.16555\t1.22902\t1.1911\t2.0686\t53.89\t-0.2927\t0.0895\t0.3822\t1023.2247\t0.115399\t-494.167503\t-494.159866\t-494.158921\t-494.200318\t27.778\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 15 15 0 0 0\nM V30 BEGIN ATOM\nM V30 1 F 0.377400 0.190200 -0.032700 0\nM V30 2 C -0.026700 1.467300 0.077600 0\nM V30 3 F 1.076200 2.233000 0.103600 0\nM V30 4 F -0.697500 1.772700 -1.045900 0\nM V30 5 C -0.892400 1.675200 1.305700 0\nM V30 6 C -0.202900 1.259400 2.585600 0\nM V30 7 C 0.782000 2.186600 3.251400 0\nM V30 8 C -0.560500 1.930300 3.887500 0\nM V30 9 H -1.163300 2.735600 1.333200 0\nM V30 10 H -1.815100 1.106300 1.148400 0\nM V30 11 H 0.026300 0.199700 2.638300 0\nM V30 12 H 1.663400 1.752600 3.709300 0\nM V30 13 H 0.953000 3.153500 2.791500 0\nM V30 14 H -1.295900 2.728000 3.862300 0\nM V30 15 H -0.602100 1.318900 4.781700 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 2 3\nM V30 3 1 2 5\nM V30 4 1 4 2\nM V30 5 1 5 9\nM V30 6 1 5 6\nM V30 7 1 6 11\nM V30 8 1 6 7\nM V30 9 1 6 8\nM V30 10 1 7 12\nM V30 11 1 7 8\nM V30 12 1 8 15\nM V30 13 1 10 5\nM V30 14 1 13 7\nM V30 15 1 14 8\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000].\nAnswer: 0.33078 a. 
u."}", "/scratch/micpie/export/qm8/test_0-19.jsonl": "{"text":"Question: What is the LR-TDPBE0\/def2SVP-computed S0 -> S2 transition oscillator strength of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 3\t799.58812\t437.90386\t282.94545\t1.8511\t6.31\t-0.2928\t0.0687\t0.3615\t19.0002\t0.021375\t-76.404702\t-76.401867\t-76.400922\t-76.422349\t6.002\n ChemNLP 3D\n\n 3 2 0 0 0 0 0 0 0 0999 V2000\n -0.0344 0.9775 0.0076 O 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0648 0.0206 0.0015 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.8718 1.3008 0.0007 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\nM END\n[\\V2000].\nAnswer: 0.0000 a. u."} {"text":"Question: What is the S0 -> S2 transition oscillator strength computed using LR-TDPBE0\/def2SVP of the molecule with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23920\t4.16555\t1.22902\t1.1911\t2.0686\t53.89\t-0.2927\t0.0895\t0.3822\t1023.2247\t0.115399\t-494.167503\t-494.159866\t-494.158921\t-494.200318\t27.778\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.3774 0.1902 -0.0327 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0267 1.4673 0.0776 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0762 2.2330 0.1036 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6975 1.7727 -1.0459 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8924 1.6752 1.3057 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.2029 1.2594 2.5856 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7820 2.1866 3.2514 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5605 1.9303 3.8875 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.1633 2.7356 1.3332 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.8151 1.1063 1.1484 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0263 0.1997 2.6383 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6634 1.7526 3.7093 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9530 3.1535 2.7915 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.2959 2.7280 3.8623 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6021 1.3189 4.7817 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 3 1 0\n 2 5 1 0\n 4 2 1 0\n 5 9 1 0\n 5 6 1 0\n 6 11 1 0\n 6 7 1 0\n 6 8 1 0\n 7 12 1 0\n 7 8 1 0\n 8 15 1 0\n 10 5 
1 0\n 13 7 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.0184 a. u."}", "/scratch/micpie/export/qm8/valid_0-28.jsonl": "{"text":"User: I have computed the RI-CC2\/def2TZVP-computed S0 -> S1 transition energy, S0 -> S2 transition energy computed using second-order approximate coupled-cluster theory (CC2), and RI-CC2\/def2TZVP-computed S0 -> S1 transition oscillator strength of a compound and want to know its canonical SMILES.\nAssistant: Great, I would need to know the RI-CC2\/def2TZVP-computed S0 -> S1 transition energy, S0 -> S2 transition energy computed using second-order approximate coupled-cluster theory (CC2), and RI-CC2\/def2TZVP-computed S0 -> S1 transition oscillator strength of the molecule you want to know the canonical SMILES of.\nUser: The RI-CC2\/def2TZVP-computed S0 -> S1 transition energy is 0.433 a. u., the S0 -> S2 transition energy computed using second-order approximate coupled-cluster theory (CC2) is 0.43296 a. u., and the RI-CC2\/def2TZVP-computed S0 -> S1 transition oscillator strength is 0.24973 a. u.\nAssistant: The canonical SMILES of the molecule is C."} {"text":"User: I have computed the RI-CC2\/def2TZVP-computed S0 -> S1 transition energy, RI-CC2\/def2TZVP-computed S0 -> S2 transition energy, and S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) of a molecule and want to know its SMILES.\nAssistant: I would need to know the RI-CC2\/def2TZVP-computed S0 -> S1 transition energy, RI-CC2\/def2TZVP-computed S0 -> S2 transition energy, and S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) of the molecule you want to know the SMILES of.\nUser: The RI-CC2\/def2TZVP-computed S0 -> S1 transition energy is 0.344 a. u., the RI-CC2\/def2TZVP-computed S0 -> S2 transition energy is 0.34462 a. u., and the S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) is 0.04496 a. 
u.\nAssistant: The SMILES of the molecule is [H]C1([H])C([H])([H])C([H])(C(F)(F)F)C1([H])[H]."}", "/scratch/micpie/export/qm8/valid_0-24.jsonl": "{"text":"Question: What is the S0 -> S1 transition energy computed using second-order approximate coupled-cluster theory (CC2) of the molecule with the V3000 Molfile with the following content?\nDescription: The content of the V3000 Molfile is [V3000]\ngdb 1\t157.7118\t157.70997\t157.70699\t0.\t13.21\t-0.3877\t0.1171\t0.5048\t35.3641\t0.044749\t-40.47893\t-40.476062\t-40.475117\t-40.498597\t6.469\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 5 4 0 0 0\nM V30 BEGIN ATOM\nM V30 1 C -0.012700 1.085800 0.008000 0\nM V30 2 H 0.002200 -0.006000 0.002000 0\nM V30 3 H 1.011700 1.463800 0.000300 0\nM V30 4 H -0.540800 1.447500 -0.876600 0\nM V30 5 H -0.523800 1.437900 0.906400 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 5\nM V30 2 1 2 1\nM V30 3 1 3 1\nM V30 4 1 4 1\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000].\nAnswer: 0.433 a. 
u."} {"text":"Question: What is the S0 -> S1 transition energy computed using RI-CC2\/def2TZVP of the chemical with the V3000 Molfile with the following content?\nDescription: The content of the V3000 Molfile is [V3000]\ngdb 23924\t3.7138\t1.49465\t1.35927\t2.1985\t53.65\t-0.3185\t0.0809\t0.3994\t921.4627\t0.116107\t-494.170733\t-494.163382\t-494.162438\t-494.203105\t26.845\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 15 15 0 0 0\nM V30 BEGIN ATOM\nM V30 1 F 0.209500 0.193800 -0.074700 0\nM V30 2 C -0.013500 1.518700 0.004200 0\nM V30 3 F 1.178800 2.127900 -0.114500 0\nM V30 4 F -0.748900 1.860100 -1.067900 0\nM V30 5 C -0.702000 1.898100 1.289700 0\nM V30 6 C -2.004700 1.139300 1.663200 0\nM V30 7 C -1.499700 1.082500 3.131400 0\nM V30 8 C -0.068900 1.408100 2.620600 0\nM V30 9 H -0.845900 2.981300 1.276600 0\nM V30 10 H -2.950500 1.645500 1.463700 0\nM V30 11 H -2.028000 0.149500 1.201200 0\nM V30 12 H -1.911300 1.895700 3.735300 0\nM V30 13 H -1.637400 0.144800 3.673000 0\nM V30 14 H 0.523600 0.503700 2.463300 0\nM V30 15 H 0.526900 2.128200 3.183700 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 2 5\nM V30 3 1 3 2\nM V30 4 1 4 2\nM V30 5 1 5 6\nM V30 6 1 5 8\nM V30 7 1 6 7\nM V30 8 1 7 13\nM V30 9 1 7 12\nM V30 10 1 8 7\nM V30 11 1 8 15\nM V30 12 1 9 5\nM V30 13 1 10 6\nM V30 14 1 11 6\nM V30 15 1 14 8\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000].\nAnswer: 0.344 a. 
u."}", "/scratch/micpie/export/qm8/train_0-24.jsonl": "{"text":"Question: What is the S0 -> S1 transition energy computed using RI-CC2\/def2TZVP of the compound with the V3000 Molfile with the following content?\nDescription: The content of the V3000 Molfile is [V3000]\ngdb 2\t293.60975\t293.54111\t191.39397\t1.6256\t9.46\t-0.257\t0.0829\t0.3399\t26.1563\t0.034358\t-56.525887\t-56.523026\t-56.522082\t-56.544961\t6.316\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 4 3 0 0 0\nM V30 BEGIN ATOM\nM V30 1 N -0.040400 1.024100 0.062600 0\nM V30 2 H 0.017300 0.012500 -0.027400 0\nM V30 3 H 0.915800 1.358700 -0.028800 0\nM V30 4 H -0.520300 1.343500 -0.775500 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 2 1\nM V30 2 1 3 1\nM V30 3 1 4 1\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000].\nAnswer: 0.265 a. u."} {"text":"Question: What is the S0 -> S1 transition energy computed using second-order approximate coupled-cluster theory (CC2) of the chemical with the V3000 Molfile with the following content?\nDescription: The content of the V3000 Molfile is [V3000]\ngdb 23927\t3.78971\t1.55457\t1.42284\t1.532\t46.48\t-0.2621\t0.067\t0.329\t855.7697\t0.092261\t-530.085764\t-530.078593\t-530.077649\t-530.118327\t24.984\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 13 13 0 0 0\nM V30 BEGIN ATOM\nM V30 1 F 0.140800 0.053600 0.101200 0\nM V30 2 C -0.012500 1.386400 0.041400 0\nM V30 3 F 1.207700 1.924000 -0.116000 0\nM V30 4 F -0.723100 1.659400 -1.064600 0\nM V30 5 C -0.692200 1.929700 1.273300 0\nM V30 6 C -1.975700 1.195300 1.718500 0\nM V30 7 O -1.364900 0.793300 2.959500 0\nM V30 8 C -0.126100 1.448800 2.627200 0\nM V30 9 H -0.784300 3.010000 1.159400 0\nM V30 10 H -2.848300 1.838700 1.875300 0\nM V30 11 H -2.264000 0.338000 1.100200 0\nM V30 12 H 0.711700 0.745800 2.562200 0\nM V30 13 H 0.124000 2.246100 3.335600 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 2 1\nM V30 2 1 2 5\nM V30 3 1 3 2\nM V30 4 1 4 2\nM V30 5 
1 5 6\nM V30 6 1 5 8\nM V30 7 1 6 10\nM V30 8 1 6 7\nM V30 9 1 8 7\nM V30 10 1 8 13\nM V30 11 1 9 5\nM V30 12 1 11 6\nM V30 13 1 12 8\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000].\nAnswer: 0.262 a. u."}", "/scratch/micpie/export/qm8/test_0-1.jsonl": "{"text":"The S0 -> S2 transition energy computed using RI-CC2\/def2TZVP of the chemical with the DeepSMILES [H]O[H] is 0.36358 a. u."} {"text":"The S0 -> S2 transition energy computed using second-order approximate coupled-cluster theory (CC2) of the chemical with the SELFIES [H][C][Branch1][C][H][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][H][C][Branch1][C][H][Branch1][C][H][C][Ring1][=Branch1][Branch1][C][H][H] is 0.33078 a. u."}", "/scratch/micpie/export/qm8/test_0-18.jsonl": "{"text":"Question: What is the S0 -> S1 transition oscillator strength computed using linear-response time-dependent density functional theory (LR-TDPBE0\/def2SVP) of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 3\t799.58812\t437.90386\t282.94545\t1.8511\t6.31\t-0.2928\t0.0687\t0.3615\t19.0002\t0.021375\t-76.404702\t-76.401867\t-76.400922\t-76.422349\t6.002\n ChemNLP 3D\n\n 3 2 0 0 0 0 0 0 0 0999 V2000\n -0.0344 0.9775 0.0076 O 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0648 0.0206 0.0015 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.8718 1.3008 0.0007 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\nM END\n[\\V2000].\nAnswer: 0.01950 a. 
u."} {"text":"Question: What is the S0 -> S1 transition oscillator strength computed using linear-response time-dependent density functional theory (LR-TDPBE0\/def2SVP) of the molecule with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23920\t4.16555\t1.22902\t1.1911\t2.0686\t53.89\t-0.2927\t0.0895\t0.3822\t1023.2247\t0.115399\t-494.167503\t-494.159866\t-494.158921\t-494.200318\t27.778\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.3774 0.1902 -0.0327 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0267 1.4673 0.0776 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0762 2.2330 0.1036 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6975 1.7727 -1.0459 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8924 1.6752 1.3057 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.2029 1.2594 2.5856 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7820 2.1866 3.2514 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5605 1.9303 3.8875 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.1633 2.7356 1.3332 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.8151 1.1063 1.1484 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0263 0.1997 2.6383 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6634 1.7526 3.7093 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9530 3.1535 2.7915 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.2959 2.7280 3.8623 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6021 1.3189 4.7817 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 3 1 0\n 2 5 1 0\n 4 2 1 0\n 5 9 1 0\n 5 6 1 0\n 6 11 1 0\n 6 7 1 0\n 6 8 1 0\n 7 12 1 0\n 7 8 1 0\n 8 15 1 0\n 10 5 1 0\n 13 7 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.00664 a. u."}", "/scratch/micpie/export/qm8/test_0-29.jsonl": "{"text":"User: I want to design a molecule that has a S0 -> S1 transition energy computed using second-order approximate coupled-cluster theory (CC2) of 0.287 a. u. and a S0 -> S2 transition energy computed using second-order approximate coupled-cluster theory (CC2) of 0.36358 a. 
u.\nAssistant: Great, What else should I take into account?\nUser: I only want to know the canonical SMILES of the molecule.\nAssistant: I suggest the molecule with the canonical SMILES O."} {"text":"User: I want to design a molecule that has a S0 -> S1 transition energy computed using second-order approximate coupled-cluster theory (CC2) of 0.325 a. u. and a RI-CC2\/def2TZVP-computed S0 -> S2 transition energy of 0.33078 a. u.\nAssistant: Do you have any other constraints?\nUser: No, I don't, I only want to know the SELFIES of the molecule.\nAssistant: I recommend the molecule with the SELFIES [H][C][Branch1][C][H][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][H][C][Branch1][C][H][Branch1][C][H][C][Ring1][=Branch1][Branch1][C][H][H]."}", "/scratch/micpie/export/qm8/valid_0-0.jsonl": "{"text":"The S0 -> S1 transition energy computed using second-order approximate coupled-cluster theory (CC2) of the molecule with the SELFIES [H][C][Branch1][C][H][Branch1][C][H][H] is 0.433 a. u."} {"text":"The S0 -> S1 transition energy computed using second-order approximate coupled-cluster theory (CC2) of the molecule with the canonical SMILES FC(F)(F)C1CCC1 is 0.344 a. u."}", "/scratch/micpie/export/qm8/test_0-21.jsonl": "{"text":"Question: What is the S0 -> S2 transition energy computed using CAM-B3LYP\/def2TZVP of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 3\t799.58812\t437.90386\t282.94545\t1.8511\t6.31\t-0.2928\t0.0687\t0.3615\t19.0002\t0.021375\t-76.404702\t-76.401867\t-76.400922\t-76.422349\t6.002\n ChemNLP 3D\n\n 3 2 0 0 0 0 0 0 0 0999 V2000\n -0.0344 0.9775 0.0076 O 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0648 0.0206 0.0015 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.8718 1.3008 0.0007 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\nM END\n[\\V2000].\nAnswer: 0.35007 a. 
u."} {"text":"Question: What is the S0 -> S2 transition energy computed using CAM-B3LYP\/def2TZVP of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23920\t4.16555\t1.22902\t1.1911\t2.0686\t53.89\t-0.2927\t0.0895\t0.3822\t1023.2247\t0.115399\t-494.167503\t-494.159866\t-494.158921\t-494.200318\t27.778\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.3774 0.1902 -0.0327 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0267 1.4673 0.0776 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0762 2.2330 0.1036 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6975 1.7727 -1.0459 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8924 1.6752 1.3057 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.2029 1.2594 2.5856 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7820 2.1866 3.2514 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5605 1.9303 3.8875 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.1633 2.7356 1.3332 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.8151 1.1063 1.1484 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0263 0.1997 2.6383 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6634 1.7526 3.7093 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9530 3.1535 2.7915 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.2959 2.7280 3.8623 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6021 1.3189 4.7817 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 3 1 0\n 2 5 1 0\n 4 2 1 0\n 5 9 1 0\n 5 6 1 0\n 6 11 1 0\n 6 7 1 0\n 6 8 1 0\n 7 12 1 0\n 7 8 1 0\n 8 15 1 0\n 10 5 1 0\n 13 7 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.32280 a. 
u."}", "/scratch/micpie/export/qm8/test_0-27.jsonl": "{"text":"User: I want to design a compound with a particular S0 -> S1 transition energy computed using RI-CC2\/def2TZVP, RI-CC2\/def2TZVP-computed S0 -> S2 transition energy, and S0 -> S1 transition oscillator strength computed using RI-CC2\/def2TZVP.\nAssistant: That sounds interesting, I would need to know the S0 -> S1 transition energy computed using RI-CC2\/def2TZVP, RI-CC2\/def2TZVP-computed S0 -> S2 transition energy, and S0 -> S1 transition oscillator strength computed using RI-CC2\/def2TZVP of the molecule you want to design.\nUser: The S0 -> S1 transition energy computed using RI-CC2\/def2TZVP should be 0.287 a. u., the RI-CC2\/def2TZVP-computed S0 -> S2 transition energy should be 0.36358 a. u., and the S0 -> S1 transition oscillator strength computed using RI-CC2\/def2TZVP should be 0.03776 a. u.\nAssistant: I propose the compound with the V2000 Molfile with the following content: [V2000]\ngdb 3\t799.58812\t437.90386\t282.94545\t1.8511\t6.31\t-0.2928\t0.0687\t0.3615\t19.0002\t0.021375\t-76.404702\t-76.401867\t-76.400922\t-76.422349\t6.002\n ChemNLP 3D\n\n 3 2 0 0 0 0 0 0 0 0999 V2000\n -0.0344 0.9775 0.0076 O 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0648 0.0206 0.0015 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.8718 1.3008 0.0007 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\nM END\n[\\V2000]."} {"text":"User: I want to design a compound with a particular S0 -> S1 transition energy computed using second-order approximate coupled-cluster theory (CC2), S0 -> S2 transition energy computed using second-order approximate coupled-cluster theory (CC2), and S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2).\nAssistant: Great, I would need to know the S0 -> S1 transition energy computed using second-order approximate coupled-cluster theory (CC2), S0 -> S2 transition energy computed using second-order approximate coupled-cluster theory (CC2), and S0 -> S1 transition oscillator 
strength computed using second-order approximate coupled-cluster theory (CC2) of the molecule you want to design.\nUser: The S0 -> S1 transition energy computed using second-order approximate coupled-cluster theory (CC2) should be 0.325 a. u., the S0 -> S2 transition energy computed using second-order approximate coupled-cluster theory (CC2) should be 0.33078 a. u., and the S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) should be 0.01563 a. u.\nAssistant: I recommend the molecule with the V2000 Molfile with the following content: [V2000]\ngdb 23920\t4.16555\t1.22902\t1.1911\t2.0686\t53.89\t-0.2927\t0.0895\t0.3822\t1023.2247\t0.115399\t-494.167503\t-494.159866\t-494.158921\t-494.200318\t27.778\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.3774 0.1902 -0.0327 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0267 1.4673 0.0776 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0762 2.2330 0.1036 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6975 1.7727 -1.0459 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8924 1.6752 1.3057 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.2029 1.2594 2.5856 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7820 2.1866 3.2514 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5605 1.9303 3.8875 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.1633 2.7356 1.3332 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.8151 1.1063 1.1484 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0263 0.1997 2.6383 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6634 1.7526 3.7093 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9530 3.1535 2.7915 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.2959 2.7280 3.8623 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6021 1.3189 4.7817 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 3 1 0\n 2 5 1 0\n 4 2 1 0\n 5 9 1 0\n 5 6 1 0\n 6 11 1 0\n 6 7 1 0\n 6 8 1 0\n 7 12 1 0\n 7 8 1 0\n 8 15 1 0\n 10 5 1 0\n 13 7 1 0\n 14 8 1 0\nM END\n[\\V2000]."}", "/scratch/micpie/export/qm8/test_0-2.jsonl": "{"text":"The S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) of the molecule with the InChI InChI=1S\/H2O\/h1H2 is 0.03776 a. 
u."} {"text":"The S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) of the molecule with the SMILES [H]C([H])(C(F)(F)F)C1([H])C([H])([H])C1([H])[H] is 0.01563 a. u."}", "/scratch/micpie/export/qm8/test_0-30.jsonl": "{"text":"User: I want to design a molecule that has a RI-CC2\/def2TZVP-computed S0 -> S1 transition energy of 0.287 a. u. and a S0 -> S2 transition energy computed using second-order approximate coupled-cluster theory (CC2) of 0.36358 a. u.\nAssistant: Do you have any other constraints?\nUser: Yep, I would like the S0 -> S1 transition oscillator strength computed using RI-CC2\/def2TZVP to be 0.03776 a. u.\nAssistant: I advise the chemical with the SELFIES [H][O][H]."} {"text":"User: I want to design a molecule that has a S0 -> S1 transition energy computed using RI-CC2\/def2TZVP of 0.325 a. u. and a S0 -> S2 transition energy computed using second-order approximate coupled-cluster theory (CC2) of 0.33078 a. u.\nAssistant: Awesome, What else should I take into account?\nUser: Yeah, I want the S0 -> S1 transition oscillator strength computed using RI-CC2\/def2TZVP to be 0.01563 a. 
u.\nAssistant: I recommend the compound with the InChI InChI=1S\/C5H7F3\/c6-5(7,8)3-4-1-2-4\/h4H,1-3H2."}", "/scratch/micpie/export/qm8/train_0-22.jsonl": "{"text":"Question: What is the CAM-B3LYP\/def2TZVP-computed S0 -> S1 transition oscillator strength of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 2\t293.60975\t293.54111\t191.39397\t1.6256\t9.46\t-0.257\t0.0829\t0.3399\t26.1563\t0.034358\t-56.525887\t-56.523026\t-56.522082\t-56.544961\t6.316\n ChemNLP 3D\n\n 4 3 0 0 0 0 0 0 0 0999 V2000\n -0.0404 1.0241 0.0626 N 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0173 0.0125 -0.0274 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9158 1.3587 -0.0288 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5203 1.3435 -0.7755 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.05750 a. u."} {"text":"Question: What is the CAM-B3LYP\/def2TZVP-computed S0 -> S1 transition oscillator strength of the molecule with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23927\t3.78971\t1.55457\t1.42284\t1.532\t46.48\t-0.2621\t0.067\t0.329\t855.7697\t0.092261\t-530.085764\t-530.078593\t-530.077649\t-530.118327\t24.984\n ChemNLP 3D\n\n 13 13 0 0 0 0 0 0 0 0999 V2000\n 0.1408 0.0536 0.1012 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0125 1.3864 0.0414 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2077 1.9240 -0.1160 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7231 1.6594 -1.0646 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6922 1.9297 1.2733 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9757 1.1953 1.7185 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3649 0.7933 2.9595 O 0 0 0 0 0 0 0 0 0 0 0 0\n -0.1261 1.4488 2.6272 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7843 3.0100 1.1594 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.8483 1.8387 1.8753 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.2640 0.3380 1.1002 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7117 0.7458 2.5622 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.1240 2.2461 3.3356 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 10 1 
0\n 6 7 1 0\n 8 7 1 0\n 8 13 1 0\n 9 5 1 0\n 11 6 1 0\n 12 8 1 0\nM END\n[\\V2000].\nAnswer: 0.00010 a. u."}", "/scratch/micpie/export/qm8/valid_0-10.jsonl": "{"text":"The S0 -> S1 transition oscillator strength computed using CAM-B3LYP\/def2TZVP of the compound with the SMILES [H]C([H])([H])[H] is 0.18320 a. u."} {"text":"The S0 -> S1 transition oscillator strength computed using CAM-B3LYP\/def2TZVP of the chemical with the SELFIES [H][C][Branch1][C][H][C][Branch1][C][H][Branch1][C][H][C][Branch1][C][H][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Ring1][N][Branch1][C][H][H] is 0.03760 a. u."}", "/scratch/micpie/export/qm8/train_0-6.jsonl": "{"text":"The S0 -> S1 transition oscillator strength computed using LR-TDPBE0\/def2SVP of the molecule with the DeepSMILES [H]N[H])[H] is 0.04076 a. u."} {"text":"The S0 -> S1 transition oscillator strength computed using linear-response time-dependent density functional theory (LR-TDPBE0\/def2SVP) of the chemical with the SELFIES [H][C][Branch1][C][H][O][C][Branch1][C][H][Branch1][C][H][C][Ring1][#Branch1][Branch1][C][H][C][Branch1][C][F][Branch1][C][F][F] is 0.00194 a. u."}", "/scratch/micpie/export/qm8/valid_0-6.jsonl": "{"text":"The S0 -> S1 transition oscillator strength computed using linear-response time-dependent density functional theory (LR-TDPBE0\/def2SVP) of the chemical with the canonical SMILES C is 0.18144 a. u."} {"text":"The S0 -> S1 transition oscillator strength computed using LR-TDPBE0\/def2SVP of the molecule with the DeepSMILES [H]C[H])C[H])[H])C[H])CF)F)F))C4[H])[H] is 0.00389 a. u."}", "/scratch/micpie/export/qm8/valid_0-30.jsonl": "{"text":"User: I want to design a molecule that has a RI-CC2\/def2TZVP-computed S0 -> S1 transition energy of 0.433 a. u. and a S0 -> S2 transition energy computed using second-order approximate coupled-cluster theory (CC2) of 0.43296 a. 
u.\nAssistant: Great, What else should I take into account?\nUser: I want the S0 -> S1 transition oscillator strength computed using RI-CC2\/def2TZVP to be 0.24973 a. u.\nAssistant: I advise the chemical with the SELFIES [H][C][Branch1][C][H][Branch1][C][H][H]."} {"text":"User: I want to design a molecule that has a S0 -> S1 transition energy computed using RI-CC2\/def2TZVP of 0.344 a. u. and a S0 -> S2 transition energy computed using second-order approximate coupled-cluster theory (CC2) of 0.34462 a. u.\nAssistant: Awesome, Do you have any other constraints?\nUser: I want the RI-CC2\/def2TZVP-computed S0 -> S1 transition oscillator strength to be 0.04496 a. u.\nAssistant: I advise the molecule with the canonical SMILES FC(F)(F)C1CCC1."}", "/scratch/micpie/export/qm8/train_0-21.jsonl": "{"text":"Question: What is the S0 -> S2 transition energy computed using Coulomb-attenuated B3LYP density functional theory (CAM-B3LYP\/def2TZVP) of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 2\t293.60975\t293.54111\t191.39397\t1.6256\t9.46\t-0.257\t0.0829\t0.3399\t26.1563\t0.034358\t-56.525887\t-56.523026\t-56.522082\t-56.544961\t6.316\n ChemNLP 3D\n\n 4 3 0 0 0 0 0 0 0 0999 V2000\n -0.0404 1.0241 0.0626 N 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0173 0.0125 -0.0274 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9158 1.3587 -0.0288 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5203 1.3435 -0.7755 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.33448 a. 
u."} {"text":"Question: What is the CAM-B3LYP\/def2TZVP-computed S0 -> S2 transition energy of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23927\t3.78971\t1.55457\t1.42284\t1.532\t46.48\t-0.2621\t0.067\t0.329\t855.7697\t0.092261\t-530.085764\t-530.078593\t-530.077649\t-530.118327\t24.984\n ChemNLP 3D\n\n 13 13 0 0 0 0 0 0 0 0999 V2000\n 0.1408 0.0536 0.1012 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0125 1.3864 0.0414 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2077 1.9240 -0.1160 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7231 1.6594 -1.0646 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6922 1.9297 1.2733 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9757 1.1953 1.7185 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3649 0.7933 2.9595 O 0 0 0 0 0 0 0 0 0 0 0 0\n -0.1261 1.4488 2.6272 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7843 3.0100 1.1594 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.8483 1.8387 1.8753 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.2640 0.3380 1.1002 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7117 0.7458 2.5622 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.1240 2.2461 3.3356 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 10 1 0\n 6 7 1 0\n 8 7 1 0\n 8 13 1 0\n 9 5 1 0\n 11 6 1 0\n 12 8 1 0\nM END\n[\\V2000].\nAnswer: 0.29204 a. 
u."}", "/scratch/micpie/export/qm8/train_0-19.jsonl": "{"text":"Question: What is the S0 -> S2 transition oscillator strength computed using linear-response time-dependent density functional theory (LR-TDPBE0\/def2SVP) of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 2\t293.60975\t293.54111\t191.39397\t1.6256\t9.46\t-0.257\t0.0829\t0.3399\t26.1563\t0.034358\t-56.525887\t-56.523026\t-56.522082\t-56.544961\t6.316\n ChemNLP 3D\n\n 4 3 0 0 0 0 0 0 0 0999 V2000\n -0.0404 1.0241 0.0626 N 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0173 0.0125 -0.0274 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9158 1.3587 -0.0288 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5203 1.3435 -0.7755 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.0316 a. u."} {"text":"Question: What is the S0 -> S2 transition oscillator strength computed using linear-response time-dependent density functional theory (LR-TDPBE0\/def2SVP) of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23927\t3.78971\t1.55457\t1.42284\t1.532\t46.48\t-0.2621\t0.067\t0.329\t855.7697\t0.092261\t-530.085764\t-530.078593\t-530.077649\t-530.118327\t24.984\n ChemNLP 3D\n\n 13 13 0 0 0 0 0 0 0 0999 V2000\n 0.1408 0.0536 0.1012 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0125 1.3864 0.0414 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2077 1.9240 -0.1160 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7231 1.6594 -1.0646 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6922 1.9297 1.2733 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9757 1.1953 1.7185 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3649 0.7933 2.9595 O 0 0 0 0 0 0 0 0 0 0 0 0\n -0.1261 1.4488 2.6272 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7843 3.0100 1.1594 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.8483 1.8387 1.8753 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.2640 0.3380 1.1002 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7117 0.7458 2.5622 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.1240 2.2461 3.3356 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 
6 1 0\n 5 8 1 0\n 6 10 1 0\n 6 7 1 0\n 8 7 1 0\n 8 13 1 0\n 9 5 1 0\n 11 6 1 0\n 12 8 1 0\nM END\n[\\V2000].\nAnswer: 0.0238 a. u."}", "/scratch/micpie/export/qm8/valid_0-29.jsonl": "{"text":"User: I want to design a molecule that has a RI-CC2\/def2TZVP-computed S0 -> S1 transition energy of 0.433 a. u. and a S0 -> S2 transition energy computed using second-order approximate coupled-cluster theory (CC2) of 0.43296 a. u.\nAssistant: What else should I take into account?\nUser: I only want to know the InChI of the molecule.\nAssistant: I suggest the molecule with the InChI InChI=1S\/CH4\/h1H4."} {"text":"User: I want to design a molecule that has a RI-CC2\/def2TZVP-computed S0 -> S1 transition energy of 0.344 a. u. and a RI-CC2\/def2TZVP-computed S0 -> S2 transition energy of 0.34462 a. u.\nAssistant: That sounds interesting, What else should I take into account?\nUser: I only want to know the SMILES of the molecule.\nAssistant: I recommend the molecule with the SMILES [H]C1([H])C([H])([H])C([H])(C(F)(F)F)C1([H])[H]."}", "/scratch/micpie/export/qm8/test_0-9.jsonl": "{"text":"The S0 -> S2 transition energy computed using CAM-B3LYP\/def2TZVP of the chemical with the canonical SMILES O is 0.35007 a. u."} {"text":"The S0 -> S2 transition energy computed using Coulomb-attenuated B3LYP density functional theory (CAM-B3LYP\/def2TZVP) of the chemical with the canonical SMILES FC(F)(F)CC1CC1 is 0.32280 a. u."}", "/scratch/micpie/export/qm8/test_0-0.jsonl": "{"text":"The S0 -> S1 transition energy computed using second-order approximate coupled-cluster theory (CC2) of the chemical with the SMILES [H]O[H] is 0.287 a. u."} {"text":"The S0 -> S1 transition energy computed using RI-CC2\/def2TZVP of the molecule with the SMILES [H]C([H])(C(F)(F)F)C1([H])C([H])([H])C1([H])[H] is 0.325 a. 
u."}", "/scratch/micpie/export/qm8/test_0-24.jsonl": "{"text":"Question: What is the S0 -> S1 transition energy computed using RI-CC2\/def2TZVP of the molecule with the V3000 Molfile with the following content?\nDescription: The content of the V3000 Molfile is [V3000]\ngdb 3\t799.58812\t437.90386\t282.94545\t1.8511\t6.31\t-0.2928\t0.0687\t0.3615\t19.0002\t0.021375\t-76.404702\t-76.401867\t-76.400922\t-76.422349\t6.002\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 3 2 0 0 0\nM V30 BEGIN ATOM\nM V30 1 O -0.034400 0.977500 0.007600 0\nM V30 2 H 0.064800 0.020600 0.001500 0\nM V30 3 H 0.871800 1.300800 0.000700 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 2 1\nM V30 2 1 3 1\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000].\nAnswer: 0.287 a. u."} {"text":"Question: What is the RI-CC2\/def2TZVP-computed S0 -> S1 transition energy of the compound with the V3000 Molfile with the following content?\nDescription: The content of the V3000 Molfile is [V3000]\ngdb 23920\t4.16555\t1.22902\t1.1911\t2.0686\t53.89\t-0.2927\t0.0895\t0.3822\t1023.2247\t0.115399\t-494.167503\t-494.159866\t-494.158921\t-494.200318\t27.778\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 15 15 0 0 0\nM V30 BEGIN ATOM\nM V30 1 F 0.377400 0.190200 -0.032700 0\nM V30 2 C -0.026700 1.467300 0.077600 0\nM V30 3 F 1.076200 2.233000 0.103600 0\nM V30 4 F -0.697500 1.772700 -1.045900 0\nM V30 5 C -0.892400 1.675200 1.305700 0\nM V30 6 C -0.202900 1.259400 2.585600 0\nM V30 7 C 0.782000 2.186600 3.251400 0\nM V30 8 C -0.560500 1.930300 3.887500 0\nM V30 9 H -1.163300 2.735600 1.333200 0\nM V30 10 H -1.815100 1.106300 1.148400 0\nM V30 11 H 0.026300 0.199700 2.638300 0\nM V30 12 H 1.663400 1.752600 3.709300 0\nM V30 13 H 0.953000 3.153500 2.791500 0\nM V30 14 H -1.295900 2.728000 3.862300 0\nM V30 15 H -0.602100 1.318900 4.781700 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 2 3\nM V30 3 1 2 5\nM V30 4 1 4 2\nM V30 5 1 5 9\nM V30 6 1 
5 6\nM V30 7 1 6 11\nM V30 8 1 6 7\nM V30 9 1 6 8\nM V30 10 1 7 12\nM V30 11 1 7 8\nM V30 12 1 8 15\nM V30 13 1 10 5\nM V30 14 1 13 7\nM V30 15 1 14 8\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000].\nAnswer: 0.325 a. u."}", "/scratch/micpie/export/qm8/valid_0-16.jsonl": "{"text":"Question: What is the LR-TDPBE0\/def2SVP-computed S0 -> S1 transition energy of the molecule with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 1\t157.7118\t157.70997\t157.70699\t0.\t13.21\t-0.3877\t0.1171\t0.5048\t35.3641\t0.044749\t-40.47893\t-40.476062\t-40.475117\t-40.498597\t6.469\n ChemNLP 3D\n\n 5 4 0 0 0 0 0 0 0 0999 V2000\n -0.0127 1.0858 0.0080 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0022 -0.0060 0.0020 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0117 1.4638 0.0003 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5408 1.4475 -0.8766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5238 1.4379 0.9064 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 5 1 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.43022 a. 
u."} {"text":"Question: What is the S0 -> S1 transition energy computed using LR-TDPBE0\/def2SVP of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23924\t3.7138\t1.49465\t1.35927\t2.1985\t53.65\t-0.3185\t0.0809\t0.3994\t921.4627\t0.116107\t-494.170733\t-494.163382\t-494.162438\t-494.203105\t26.845\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.2095 0.1938 -0.0747 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0135 1.5187 0.0042 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.1788 2.1279 -0.1145 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7489 1.8601 -1.0679 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7020 1.8981 1.2897 C 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0047 1.1393 1.6632 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.4997 1.0825 3.1314 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0689 1.4081 2.6206 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8459 2.9813 1.2766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.9505 1.6455 1.4637 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0280 0.1495 1.2012 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9113 1.8957 3.7353 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.6374 0.1448 3.6730 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5236 0.5037 2.4633 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5269 2.1282 3.1837 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 7 1 0\n 7 13 1 0\n 7 12 1 0\n 8 7 1 0\n 8 15 1 0\n 9 5 1 0\n 10 6 1 0\n 11 6 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.34706 a. u."}", "/scratch/micpie/export/qm8/valid_0-7.jsonl": "{"text":"The LR-TDPBE0\/def2SVP-computed S0 -> S2 transition oscillator strength of the compound with the DeepSMILES [H]C[H])[H])[H] is 0.1815 a. u."} {"text":"The S0 -> S2 transition oscillator strength computed using LR-TDPBE0\/def2SVP of the chemical with the DeepSMILES [H]C[H])C[H])[H])C[H])CF)F)F))C4[H])[H] is 0.0570 a. u."}", "/scratch/micpie/export/qm8/test_0-3.jsonl": "{"text":"The RI-CC2\/def2TZVP-computed S0 -> S2 transition oscillator strength of the chemical with the DeepSMILES [H]O[H] is 0.00000 a. 
u."} {"text":"The RI-CC2\/def2TZVP-computed S0 -> S2 transition oscillator strength of the chemical with the SELFIES [H][C][Branch1][C][H][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][H][C][Branch1][C][H][Branch1][C][H][C][Ring1][=Branch1][Branch1][C][H][H] is 0.00075 a. u."}", "/scratch/micpie/export/qm8/valid_0-11.jsonl": "{"text":"The S0 -> S2 transition oscillator strength computed using CAM-B3LYP\/def2TZVP of the compound with the canonical SMILES C is 0.1832 a. u."} {"text":"The S0 -> S2 transition oscillator strength computed using Coulomb-attenuated B3LYP density functional theory (CAM-B3LYP\/def2TZVP) of the molecule with the canonical SMILES FC(F)(F)C1CCC1 is 0.0107 a. u."}", "/scratch/micpie/export/qm8/train_0-20.jsonl": "{"text":"Question: What is the S0 -> S1 transition energy computed using Coulomb-attenuated B3LYP density functional theory (CAM-B3LYP\/def2TZVP) of the molecule with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 2\t293.60975\t293.54111\t191.39397\t1.6256\t9.46\t-0.257\t0.0829\t0.3399\t26.1563\t0.034358\t-56.525887\t-56.523026\t-56.522082\t-56.544961\t6.316\n ChemNLP 3D\n\n 4 3 0 0 0 0 0 0 0 0999 V2000\n -0.0404 1.0241 0.0626 N 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0173 0.0125 -0.0274 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9158 1.3587 -0.0288 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5203 1.3435 -0.7755 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.25385 a. 
u."} {"text":"Question: What is the CAM-B3LYP\/def2TZVP-computed S0 -> S1 transition energy of the molecule with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23927\t3.78971\t1.55457\t1.42284\t1.532\t46.48\t-0.2621\t0.067\t0.329\t855.7697\t0.092261\t-530.085764\t-530.078593\t-530.077649\t-530.118327\t24.984\n ChemNLP 3D\n\n 13 13 0 0 0 0 0 0 0 0999 V2000\n 0.1408 0.0536 0.1012 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0125 1.3864 0.0414 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2077 1.9240 -0.1160 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7231 1.6594 -1.0646 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6922 1.9297 1.2733 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9757 1.1953 1.7185 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3649 0.7933 2.9595 O 0 0 0 0 0 0 0 0 0 0 0 0\n -0.1261 1.4488 2.6272 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7843 3.0100 1.1594 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.8483 1.8387 1.8753 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.2640 0.3380 1.1002 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7117 0.7458 2.5622 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.1240 2.2461 3.3356 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 10 1 0\n 6 7 1 0\n 8 7 1 0\n 8 13 1 0\n 9 5 1 0\n 11 6 1 0\n 12 8 1 0\nM END\n[\\V2000].\nAnswer: 0.26683 a. u."}", "/scratch/micpie/export/qm8/train_0-30.jsonl": "{"text":"User: I want to design a molecule that has a S0 -> S1 transition energy computed using RI-CC2\/def2TZVP of 0.265 a. u. and a RI-CC2\/def2TZVP-computed S0 -> S2 transition energy of 0.35008 a. u.\nAssistant: Cool, Do you have other requirements?\nUser: Yep, I would like the S0 -> S1 transition oscillator strength computed using RI-CC2\/def2TZVP to be 0.06702 a. u.\nAssistant: I suggest the compound with the canonical SMILES N."} {"text":"User: I want to design a compound that has a RI-CC2\/def2TZVP-computed S0 -> S1 transition energy of 0.262 a. u. and a S0 -> S2 transition energy computed using second-order approximate coupled-cluster theory (CC2) of 0.28498 a. 
u.\nAssistant: Do you have other requirements?\nUser: Yeah, I would like the RI-CC2\/def2TZVP-computed S0 -> S1 transition oscillator strength to be 0.00004 a. u.\nAssistant: I propose the molecule with the DeepSMILES [H]C[H])OC[H])[H])C4[H])CF)F)F."}", "/scratch/micpie/export/qm8/valid_0-20.jsonl": "{"text":"Question: What is the CAM-B3LYP\/def2TZVP-computed S0 -> S1 transition energy of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 1\t157.7118\t157.70997\t157.70699\t0.\t13.21\t-0.3877\t0.1171\t0.5048\t35.3641\t0.044749\t-40.47893\t-40.476062\t-40.475117\t-40.498597\t6.469\n ChemNLP 3D\n\n 5 4 0 0 0 0 0 0 0 0999 V2000\n -0.0127 1.0858 0.0080 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0022 -0.0060 0.0020 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0117 1.4638 0.0003 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5408 1.4475 -0.8766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5238 1.4379 0.9064 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 5 1 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.40993 a. 
u."} {"text":"Question: What is the S0 -> S1 transition energy computed using CAM-B3LYP\/def2TZVP of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23924\t3.7138\t1.49465\t1.35927\t2.1985\t53.65\t-0.3185\t0.0809\t0.3994\t921.4627\t0.116107\t-494.170733\t-494.163382\t-494.162438\t-494.203105\t26.845\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.2095 0.1938 -0.0747 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0135 1.5187 0.0042 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.1788 2.1279 -0.1145 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7489 1.8601 -1.0679 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7020 1.8981 1.2897 C 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0047 1.1393 1.6632 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.4997 1.0825 3.1314 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0689 1.4081 2.6206 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8459 2.9813 1.2766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.9505 1.6455 1.4637 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0280 0.1495 1.2012 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9113 1.8957 3.7353 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.6374 0.1448 3.6730 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5236 0.5037 2.4633 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5269 2.1282 3.1837 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 7 1 0\n 7 13 1 0\n 7 12 1 0\n 8 7 1 0\n 8 15 1 0\n 9 5 1 0\n 10 6 1 0\n 11 6 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.33602 a. 
u."}", "/scratch/micpie/export/qm8/train_0-26.jsonl": "{"text":"Question: What is the S0 -> S2 transition energy computed using RI-CC2\/def2TZVP of the molecule with the V3000 Molfile with the following content?\nDescription: The content of the V3000 Molfile is [V3000]\ngdb 2\t293.60975\t293.54111\t191.39397\t1.6256\t9.46\t-0.257\t0.0829\t0.3399\t26.1563\t0.034358\t-56.525887\t-56.523026\t-56.522082\t-56.544961\t6.316\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 4 3 0 0 0\nM V30 BEGIN ATOM\nM V30 1 N -0.040400 1.024100 0.062600 0\nM V30 2 H 0.017300 0.012500 -0.027400 0\nM V30 3 H 0.915800 1.358700 -0.028800 0\nM V30 4 H -0.520300 1.343500 -0.775500 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 2 1\nM V30 2 1 3 1\nM V30 3 1 4 1\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000].\nAnswer: 0.35008 a. u."} {"text":"Question: What is the S0 -> S2 transition energy computed using RI-CC2\/def2TZVP of the compound with the V3000 Molfile with the following content?\nDescription: The content of the V3000 Molfile is [V3000]\ngdb 23927\t3.78971\t1.55457\t1.42284\t1.532\t46.48\t-0.2621\t0.067\t0.329\t855.7697\t0.092261\t-530.085764\t-530.078593\t-530.077649\t-530.118327\t24.984\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 13 13 0 0 0\nM V30 BEGIN ATOM\nM V30 1 F 0.140800 0.053600 0.101200 0\nM V30 2 C -0.012500 1.386400 0.041400 0\nM V30 3 F 1.207700 1.924000 -0.116000 0\nM V30 4 F -0.723100 1.659400 -1.064600 0\nM V30 5 C -0.692200 1.929700 1.273300 0\nM V30 6 C -1.975700 1.195300 1.718500 0\nM V30 7 O -1.364900 0.793300 2.959500 0\nM V30 8 C -0.126100 1.448800 2.627200 0\nM V30 9 H -0.784300 3.010000 1.159400 0\nM V30 10 H -2.848300 1.838700 1.875300 0\nM V30 11 H -2.264000 0.338000 1.100200 0\nM V30 12 H 0.711700 0.745800 2.562200 0\nM V30 13 H 0.124000 2.246100 3.335600 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 2 1\nM V30 2 1 2 5\nM V30 3 1 3 2\nM V30 4 1 4 2\nM V30 5 1 5 6\nM V30 6 1 5 8\nM V30 7 1 6 
10\nM V30 8 1 6 7\nM V30 9 1 8 7\nM V30 10 1 8 13\nM V30 11 1 9 5\nM V30 12 1 11 6\nM V30 13 1 12 8\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000].\nAnswer: 0.28498 a. u."}", "/scratch/micpie/export/qm8/train_0-0.jsonl": "{"text":"The S0 -> S1 transition energy computed using second-order approximate coupled-cluster theory (CC2) of the compound with the canonical SMILES N is 0.265 a. u."} {"text":"The S0 -> S1 transition energy computed using RI-CC2\/def2TZVP of the molecule with the SMILES [H]C1([H])OC([H])([H])C1([H])C(F)(F)F is 0.262 a. u."}", "/scratch/micpie/export/qm8/test_0-6.jsonl": "{"text":"The LR-TDPBE0\/def2SVP-computed S0 -> S1 transition oscillator strength of the molecule with the SMILES [H]O[H] is 0.01950 a. u."} {"text":"The LR-TDPBE0\/def2SVP-computed S0 -> S1 transition oscillator strength of the chemical with the SMILES [H]C([H])(C(F)(F)F)C1([H])C([H])([H])C1([H])[H] is 0.00664 a. u."}", "/scratch/micpie/export/qm8/train_0-10.jsonl": "{"text":"The S0 -> S1 transition oscillator strength computed using Coulomb-attenuated B3LYP density functional theory (CAM-B3LYP\/def2TZVP) of the chemical with the SELFIES [H][N][Branch1][C][H][H] is 0.05750 a. u."} {"text":"The CAM-B3LYP\/def2TZVP-computed S0 -> S1 transition oscillator strength of the compound with the SELFIES [H][C][Branch1][C][H][O][C][Branch1][C][H][Branch1][C][H][C][Ring1][#Branch1][Branch1][C][H][C][Branch1][C][F][Branch1][C][F][F] is 0.00010 a. u."}", "/scratch/micpie/export/qm8/train_0-3.jsonl": "{"text":"The S0 -> S2 transition oscillator strength computed using RI-CC2\/def2TZVP of the compound with the SELFIES [H][N][Branch1][C][H][H] is 0.03005 a. u."} {"text":"The S0 -> S2 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) of the compound with the canonical SMILES FC(F)(F)C1COC1 is 0.02414 a. 
u."}", "/scratch/micpie/export/qm8/train_0-23.jsonl": "{"text":"Question: What is the S0 -> S2 transition oscillator strength computed using CAM-B3LYP\/def2TZVP of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 2\t293.60975\t293.54111\t191.39397\t1.6256\t9.46\t-0.257\t0.0829\t0.3399\t26.1563\t0.034358\t-56.525887\t-56.523026\t-56.522082\t-56.544961\t6.316\n ChemNLP 3D\n\n 4 3 0 0 0 0 0 0 0 0999 V2000\n -0.0404 1.0241 0.0626 N 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0173 0.0125 -0.0274 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9158 1.3587 -0.0288 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5203 1.3435 -0.7755 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.0238 a. u."} {"text":"Question: What is the CAM-B3LYP\/def2TZVP-computed S0 -> S2 transition oscillator strength of the molecule with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23927\t3.78971\t1.55457\t1.42284\t1.532\t46.48\t-0.2621\t0.067\t0.329\t855.7697\t0.092261\t-530.085764\t-530.078593\t-530.077649\t-530.118327\t24.984\n ChemNLP 3D\n\n 13 13 0 0 0 0 0 0 0 0999 V2000\n 0.1408 0.0536 0.1012 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0125 1.3864 0.0414 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2077 1.9240 -0.1160 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7231 1.6594 -1.0646 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6922 1.9297 1.2733 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9757 1.1953 1.7185 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3649 0.7933 2.9595 O 0 0 0 0 0 0 0 0 0 0 0 0\n -0.1261 1.4488 2.6272 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7843 3.0100 1.1594 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.8483 1.8387 1.8753 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.2640 0.3380 1.1002 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7117 0.7458 2.5622 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.1240 2.2461 3.3356 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 10 1 0\n 6 7 1 0\n 8 7 1 0\n 8 13 1 0\n 9 5 1 0\n 11 6 1 0\n 12 8 1 0\nM END\n[\\V2000].\nAnswer: 
0.0266 a. u."}", "/scratch/micpie/export/qm8/train_0-12.jsonl": "{"text":"Question: What is the S0 -> S1 transition energy computed using second-order approximate coupled-cluster theory (CC2) of the molecule with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 2\t293.60975\t293.54111\t191.39397\t1.6256\t9.46\t-0.257\t0.0829\t0.3399\t26.1563\t0.034358\t-56.525887\t-56.523026\t-56.522082\t-56.544961\t6.316\n ChemNLP 3D\n\n 4 3 0 0 0 0 0 0 0 0999 V2000\n -0.0404 1.0241 0.0626 N 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0173 0.0125 -0.0274 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9158 1.3587 -0.0288 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5203 1.3435 -0.7755 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.265 a. u."} {"text":"Question: What is the RI-CC2\/def2TZVP-computed S0 -> S1 transition energy of the molecule with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23927\t3.78971\t1.55457\t1.42284\t1.532\t46.48\t-0.2621\t0.067\t0.329\t855.7697\t0.092261\t-530.085764\t-530.078593\t-530.077649\t-530.118327\t24.984\n ChemNLP 3D\n\n 13 13 0 0 0 0 0 0 0 0999 V2000\n 0.1408 0.0536 0.1012 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0125 1.3864 0.0414 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2077 1.9240 -0.1160 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7231 1.6594 -1.0646 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6922 1.9297 1.2733 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9757 1.1953 1.7185 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3649 0.7933 2.9595 O 0 0 0 0 0 0 0 0 0 0 0 0\n -0.1261 1.4488 2.6272 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7843 3.0100 1.1594 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.8483 1.8387 1.8753 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.2640 0.3380 1.1002 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7117 0.7458 2.5622 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.1240 2.2461 3.3356 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 10 1 0\n 6 7 1 0\n 8 7 1 0\n 8 13 1 0\n 9 5 1 0\n 11 6 1 0\n 12 8 1 0\nM 
END\n[\\V2000].\nAnswer: 0.262 a. u."}", "/scratch/micpie/export/qm8/test_0-28.jsonl": "{"text":"User: I have computed the RI-CC2\/def2TZVP-computed S0 -> S1 transition energy, S0 -> S2 transition energy computed using RI-CC2\/def2TZVP, and RI-CC2\/def2TZVP-computed S0 -> S1 transition oscillator strength of a compound and want to know its SMILES.\nAssistant: That sounds interesting, I would need to know the RI-CC2\/def2TZVP-computed S0 -> S1 transition energy, S0 -> S2 transition energy computed using RI-CC2\/def2TZVP, and RI-CC2\/def2TZVP-computed S0 -> S1 transition oscillator strength of the molecule you want to know the SMILES of.\nUser: The RI-CC2\/def2TZVP-computed S0 -> S1 transition energy is 0.287 a. u., the S0 -> S2 transition energy computed using RI-CC2\/def2TZVP is 0.36358 a. u., and the RI-CC2\/def2TZVP-computed S0 -> S1 transition oscillator strength is 0.03776 a. u.\nAssistant: The SMILES of the molecule is [H]O[H]."} {"text":"User: I have computed the S0 -> S1 transition energy computed using second-order approximate coupled-cluster theory (CC2), RI-CC2\/def2TZVP-computed S0 -> S2 transition energy, and S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) of a molecule and want to know its InChI.\nAssistant: I would need to know the S0 -> S1 transition energy computed using second-order approximate coupled-cluster theory (CC2), RI-CC2\/def2TZVP-computed S0 -> S2 transition energy, and S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) of the molecule you want to know the InChI of.\nUser: The S0 -> S1 transition energy computed using second-order approximate coupled-cluster theory (CC2) is 0.325 a. u., the RI-CC2\/def2TZVP-computed S0 -> S2 transition energy is 0.33078 a. u., and the S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) is 0.01563 a. 
u.\nAssistant: The InChI of the molecule is InChI=1S\/C5H7F3\/c6-5(7,8)3-4-1-2-4\/h4H,1-3H2."}", "/scratch/micpie/export/qm8/test_0-13.jsonl": "{"text":"Question: What is the S0 -> S2 transition energy computed using RI-CC2\/def2TZVP of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 3\t799.58812\t437.90386\t282.94545\t1.8511\t6.31\t-0.2928\t0.0687\t0.3615\t19.0002\t0.021375\t-76.404702\t-76.401867\t-76.400922\t-76.422349\t6.002\n ChemNLP 3D\n\n 3 2 0 0 0 0 0 0 0 0999 V2000\n -0.0344 0.9775 0.0076 O 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0648 0.0206 0.0015 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.8718 1.3008 0.0007 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\nM END\n[\\V2000].\nAnswer: 0.36358 a. u."} {"text":"Question: What is the RI-CC2\/def2TZVP-computed S0 -> S2 transition energy of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23920\t4.16555\t1.22902\t1.1911\t2.0686\t53.89\t-0.2927\t0.0895\t0.3822\t1023.2247\t0.115399\t-494.167503\t-494.159866\t-494.158921\t-494.200318\t27.778\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.3774 0.1902 -0.0327 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0267 1.4673 0.0776 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0762 2.2330 0.1036 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6975 1.7727 -1.0459 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8924 1.6752 1.3057 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.2029 1.2594 2.5856 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7820 2.1866 3.2514 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5605 1.9303 3.8875 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.1633 2.7356 1.3332 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.8151 1.1063 1.1484 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0263 0.1997 2.6383 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6634 1.7526 3.7093 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9530 3.1535 2.7915 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.2959 2.7280 3.8623 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6021 1.3189 4.7817 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 3 1 0\n 2 5 1 0\n 4 2 1 0\n 5 9 1 0\n 5 6 1 0\n 6 11 1 
0\n 6 7 1 0\n 6 8 1 0\n 7 12 1 0\n 7 8 1 0\n 8 15 1 0\n 10 5 1 0\n 13 7 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.33078 a. u."}", "/scratch/micpie/export/qm8/test_0-23.jsonl": "{"text":"Question: What is the S0 -> S2 transition oscillator strength computed using Coulomb-attenuated B3LYP density functional theory (CAM-B3LYP\/def2TZVP) of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 3\t799.58812\t437.90386\t282.94545\t1.8511\t6.31\t-0.2928\t0.0687\t0.3615\t19.0002\t0.021375\t-76.404702\t-76.401867\t-76.400922\t-76.422349\t6.002\n ChemNLP 3D\n\n 3 2 0 0 0 0 0 0 0 0999 V2000\n -0.0344 0.9775 0.0076 O 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0648 0.0206 0.0015 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.8718 1.3008 0.0007 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\nM END\n[\\V2000].\nAnswer: 0.0000 a. u."} {"text":"Question: What is the CAM-B3LYP\/def2TZVP-computed S0 -> S2 transition oscillator strength of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23920\t4.16555\t1.22902\t1.1911\t2.0686\t53.89\t-0.2927\t0.0895\t0.3822\t1023.2247\t0.115399\t-494.167503\t-494.159866\t-494.158921\t-494.200318\t27.778\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.3774 0.1902 -0.0327 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0267 1.4673 0.0776 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0762 2.2330 0.1036 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6975 1.7727 -1.0459 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8924 1.6752 1.3057 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.2029 1.2594 2.5856 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7820 2.1866 3.2514 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5605 1.9303 3.8875 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.1633 2.7356 1.3332 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.8151 1.1063 1.1484 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0263 0.1997 2.6383 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6634 1.7526 3.7093 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9530 3.1535 2.7915 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.2959 2.7280 3.8623 H 0 0 0 0 0 0 0 0 0 0 0 
0\n -0.6021 1.3189 4.7817 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 3 1 0\n 2 5 1 0\n 4 2 1 0\n 5 9 1 0\n 5 6 1 0\n 6 11 1 0\n 6 7 1 0\n 6 8 1 0\n 7 12 1 0\n 7 8 1 0\n 8 15 1 0\n 10 5 1 0\n 13 7 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.0008 a. u."}", "/scratch/micpie/export/qm8/valid_0-2.jsonl": "{"text":"The S0 -> S1 transition oscillator strength computed using RI-CC2\/def2TZVP of the chemical with the InChI InChI=1S\/CH4\/h1H4 is 0.24973 a. u."} {"text":"The S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) of the compound with the canonical SMILES FC(F)(F)C1CCC1 is 0.04496 a. u."}", "/scratch/micpie/export/qm8/valid_0-21.jsonl": "{"text":"Question: What is the S0 -> S2 transition energy computed using Coulomb-attenuated B3LYP density functional theory (CAM-B3LYP\/def2TZVP) of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 1\t157.7118\t157.70997\t157.70699\t0.\t13.21\t-0.3877\t0.1171\t0.5048\t35.3641\t0.044749\t-40.47893\t-40.476062\t-40.475117\t-40.498597\t6.469\n ChemNLP 3D\n\n 5 4 0 0 0 0 0 0 0 0999 V2000\n -0.0127 1.0858 0.0080 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0022 -0.0060 0.0020 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0117 1.4638 0.0003 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5408 1.4475 -0.8766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5238 1.4379 0.9064 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 5 1 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.40994 a. 
u."} {"text":"Question: What is the CAM-B3LYP\/def2TZVP-computed S0 -> S2 transition energy of the molecule with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23924\t3.7138\t1.49465\t1.35927\t2.1985\t53.65\t-0.3185\t0.0809\t0.3994\t921.4627\t0.116107\t-494.170733\t-494.163382\t-494.162438\t-494.203105\t26.845\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.2095 0.1938 -0.0747 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0135 1.5187 0.0042 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.1788 2.1279 -0.1145 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7489 1.8601 -1.0679 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7020 1.8981 1.2897 C 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0047 1.1393 1.6632 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.4997 1.0825 3.1314 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0689 1.4081 2.6206 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8459 2.9813 1.2766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.9505 1.6455 1.4637 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0280 0.1495 1.2012 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9113 1.8957 3.7353 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.6374 0.1448 3.6730 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5236 0.5037 2.4633 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5269 2.1282 3.1837 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 7 1 0\n 7 13 1 0\n 7 12 1 0\n 8 7 1 0\n 8 15 1 0\n 9 5 1 0\n 10 6 1 0\n 11 6 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.33788 a. 
u."}", "/scratch/micpie/export/qm8/train_0-14.jsonl": "{"text":"Question: What is the S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) of the molecule with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 2\t293.60975\t293.54111\t191.39397\t1.6256\t9.46\t-0.257\t0.0829\t0.3399\t26.1563\t0.034358\t-56.525887\t-56.523026\t-56.522082\t-56.544961\t6.316\n ChemNLP 3D\n\n 4 3 0 0 0 0 0 0 0 0999 V2000\n -0.0404 1.0241 0.0626 N 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0173 0.0125 -0.0274 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9158 1.3587 -0.0288 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5203 1.3435 -0.7755 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.06702 a. u."} {"text":"Question: What is the RI-CC2\/def2TZVP-computed S0 -> S1 transition oscillator strength of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23927\t3.78971\t1.55457\t1.42284\t1.532\t46.48\t-0.2621\t0.067\t0.329\t855.7697\t0.092261\t-530.085764\t-530.078593\t-530.077649\t-530.118327\t24.984\n ChemNLP 3D\n\n 13 13 0 0 0 0 0 0 0 0999 V2000\n 0.1408 0.0536 0.1012 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0125 1.3864 0.0414 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2077 1.9240 -0.1160 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7231 1.6594 -1.0646 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6922 1.9297 1.2733 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9757 1.1953 1.7185 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3649 0.7933 2.9595 O 0 0 0 0 0 0 0 0 0 0 0 0\n -0.1261 1.4488 2.6272 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7843 3.0100 1.1594 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.8483 1.8387 1.8753 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.2640 0.3380 1.1002 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7117 0.7458 2.5622 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.1240 2.2461 3.3356 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 10 1 0\n 6 7 1 0\n 8 7 1 0\n 8 13 1 0\n 9 5 1 0\n 11 6 1 0\n 12 8 1 
0\nM END\n[\\V2000].\nAnswer: 0.00004 a. u."}", "/scratch/micpie/export/qm8/valid_0-1.jsonl": "{"text":"The RI-CC2\/def2TZVP-computed S0 -> S2 transition energy of the chemical with the SELFIES [H][C][Branch1][C][H][Branch1][C][H][H] is 0.43296 a. u."} {"text":"The S0 -> S2 transition energy computed using RI-CC2\/def2TZVP of the chemical with the DeepSMILES [H]C[H])C[H])[H])C[H])CF)F)F))C4[H])[H] is 0.34462 a. u."}", "/scratch/micpie/export/qm8/valid_0-13.jsonl": "{"text":"Question: What is the S0 -> S2 transition energy computed using second-order approximate coupled-cluster theory (CC2) of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 1\t157.7118\t157.70997\t157.70699\t0.\t13.21\t-0.3877\t0.1171\t0.5048\t35.3641\t0.044749\t-40.47893\t-40.476062\t-40.475117\t-40.498597\t6.469\n ChemNLP 3D\n\n 5 4 0 0 0 0 0 0 0 0999 V2000\n -0.0127 1.0858 0.0080 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0022 -0.0060 0.0020 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0117 1.4638 0.0003 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5408 1.4475 -0.8766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5238 1.4379 0.9064 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 5 1 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.43296 a. 
u."} {"text":"Question: What is the S0 -> S2 transition energy computed using RI-CC2\/def2TZVP of the molecule with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23924\t3.7138\t1.49465\t1.35927\t2.1985\t53.65\t-0.3185\t0.0809\t0.3994\t921.4627\t0.116107\t-494.170733\t-494.163382\t-494.162438\t-494.203105\t26.845\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.2095 0.1938 -0.0747 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0135 1.5187 0.0042 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.1788 2.1279 -0.1145 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7489 1.8601 -1.0679 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7020 1.8981 1.2897 C 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0047 1.1393 1.6632 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.4997 1.0825 3.1314 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0689 1.4081 2.6206 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8459 2.9813 1.2766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.9505 1.6455 1.4637 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0280 0.1495 1.2012 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9113 1.8957 3.7353 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.6374 0.1448 3.6730 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5236 0.5037 2.4633 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5269 2.1282 3.1837 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 7 1 0\n 7 13 1 0\n 7 12 1 0\n 8 7 1 0\n 8 15 1 0\n 9 5 1 0\n 10 6 1 0\n 11 6 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.34462 a. u."}", "/scratch/micpie/export/qm8/train_0-29.jsonl": "{"text":"User: I want to design a molecule that has a RI-CC2\/def2TZVP-computed S0 -> S1 transition energy of 0.265 a. u. and a RI-CC2\/def2TZVP-computed S0 -> S2 transition energy of 0.35008 a. u.\nAssistant: That sounds interesting, What else should I take into account?\nUser: No, I only want to know the DeepSMILES of the molecule.\nAssistant: I propose the chemical with the DeepSMILES [H]N[H])[H]."} {"text":"User: I want to design a molecule that has a S0 -> S1 transition energy computed using second-order approximate coupled-cluster theory (CC2) of 0.262 a. u. 
and a S0 -> S2 transition energy computed using second-order approximate coupled-cluster theory (CC2) of 0.28498 a. u.\nAssistant: Do you have any other constraints?\nUser: I only want to know the DeepSMILES of the molecule.\nAssistant: I suggest the chemical with the DeepSMILES [H]C[H])OC[H])[H])C4[H])CF)F)F."}", "/scratch/micpie/export/qm8/valid_0-23.jsonl": "{"text":"Question: What is the S0 -> S2 transition oscillator strength computed using CAM-B3LYP\/def2TZVP of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 1\t157.7118\t157.70997\t157.70699\t0.\t13.21\t-0.3877\t0.1171\t0.5048\t35.3641\t0.044749\t-40.47893\t-40.476062\t-40.475117\t-40.498597\t6.469\n ChemNLP 3D\n\n 5 4 0 0 0 0 0 0 0 0999 V2000\n -0.0127 1.0858 0.0080 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0022 -0.0060 0.0020 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0117 1.4638 0.0003 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5408 1.4475 -0.8766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5238 1.4379 0.9064 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 5 1 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.1832 a. 
u."} {"text":"Question: What is the S0 -> S2 transition oscillator strength computed using Coulomb-attenuated B3LYP density functional theory (CAM-B3LYP\/def2TZVP) of the molecule with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23924\t3.7138\t1.49465\t1.35927\t2.1985\t53.65\t-0.3185\t0.0809\t0.3994\t921.4627\t0.116107\t-494.170733\t-494.163382\t-494.162438\t-494.203105\t26.845\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.2095 0.1938 -0.0747 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0135 1.5187 0.0042 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.1788 2.1279 -0.1145 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7489 1.8601 -1.0679 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7020 1.8981 1.2897 C 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0047 1.1393 1.6632 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.4997 1.0825 3.1314 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0689 1.4081 2.6206 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8459 2.9813 1.2766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.9505 1.6455 1.4637 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0280 0.1495 1.2012 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9113 1.8957 3.7353 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.6374 0.1448 3.6730 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5236 0.5037 2.4633 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5269 2.1282 3.1837 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 7 1 0\n 7 13 1 0\n 7 12 1 0\n 8 7 1 0\n 8 15 1 0\n 9 5 1 0\n 10 6 1 0\n 11 6 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.0107 a. u."}", "/scratch/micpie/export/qm8/valid_0-5.jsonl": "{"text":"The S0 -> S2 transition energy computed using LR-TDPBE0\/def2SVP of the chemical with the InChI InChI=1S\/CH4\/h1H4 is 0.43024 a. u."} {"text":"The S0 -> S2 transition energy computed using LR-TDPBE0\/def2SVP of the molecule with the SMILES [H]C1([H])C([H])([H])C([H])(C(F)(F)F)C1([H])[H] is 0.34821 a. 
u."}", "/scratch/micpie/export/qm8/train_0-15.jsonl": "{"text":"Question: What is the RI-CC2\/def2TZVP-computed S0 -> S2 transition oscillator strength of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 2\t293.60975\t293.54111\t191.39397\t1.6256\t9.46\t-0.257\t0.0829\t0.3399\t26.1563\t0.034358\t-56.525887\t-56.523026\t-56.522082\t-56.544961\t6.316\n ChemNLP 3D\n\n 4 3 0 0 0 0 0 0 0 0999 V2000\n -0.0404 1.0241 0.0626 N 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0173 0.0125 -0.0274 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9158 1.3587 -0.0288 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5203 1.3435 -0.7755 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.03005 a. u."} {"text":"Question: What is the S0 -> S2 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) of the molecule with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23927\t3.78971\t1.55457\t1.42284\t1.532\t46.48\t-0.2621\t0.067\t0.329\t855.7697\t0.092261\t-530.085764\t-530.078593\t-530.077649\t-530.118327\t24.984\n ChemNLP 3D\n\n 13 13 0 0 0 0 0 0 0 0999 V2000\n 0.1408 0.0536 0.1012 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0125 1.3864 0.0414 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2077 1.9240 -0.1160 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7231 1.6594 -1.0646 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6922 1.9297 1.2733 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9757 1.1953 1.7185 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3649 0.7933 2.9595 O 0 0 0 0 0 0 0 0 0 0 0 0\n -0.1261 1.4488 2.6272 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7843 3.0100 1.1594 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.8483 1.8387 1.8753 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.2640 0.3380 1.1002 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7117 0.7458 2.5622 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.1240 2.2461 3.3356 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 10 1 0\n 6 7 1 0\n 8 7 1 0\n 8 13 1 0\n 9 5 1 0\n 11 6 1 0\n 12 8 1 
0\nM END\n[\\V2000].\nAnswer: 0.02414 a. u."}", "/scratch/micpie/export/qm8/valid_0-4.jsonl": "{"text":"The S0 -> S1 transition energy computed using LR-TDPBE0\/def2SVP of the chemical with the DeepSMILES [H]C[H])[H])[H] is 0.43022 a. u."} {"text":"The LR-TDPBE0\/def2SVP-computed S0 -> S1 transition energy of the chemical with the DeepSMILES [H]C[H])C[H])[H])C[H])CF)F)F))C4[H])[H] is 0.34706 a. u."}", "/scratch/micpie/export/qm8/train_0-5.jsonl": "{"text":"The S0 -> S2 transition energy computed using linear-response time-dependent density functional theory (LR-TDPBE0\/def2SVP) of the compound with the InChI InChI=1S\/H3N\/h1H3 is 0.34911 a. u."} {"text":"The LR-TDPBE0\/def2SVP-computed S0 -> S2 transition energy of the molecule with the SELFIES [H][C][Branch1][C][H][O][C][Branch1][C][H][Branch1][C][H][C][Ring1][#Branch1][Branch1][C][H][C][Branch1][C][F][Branch1][C][F][F] is 0.29260 a. u."}", "/scratch/micpie/export/qm8/valid_0-15.jsonl": "{"text":"Question: What is the S0 -> S2 transition oscillator strength computed using RI-CC2\/def2TZVP of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 1\t157.7118\t157.70997\t157.70699\t0.\t13.21\t-0.3877\t0.1171\t0.5048\t35.3641\t0.044749\t-40.47893\t-40.476062\t-40.475117\t-40.498597\t6.469\n ChemNLP 3D\n\n 5 4 0 0 0 0 0 0 0 0999 V2000\n -0.0127 1.0858 0.0080 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0022 -0.0060 0.0020 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0117 1.4638 0.0003 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5408 1.4475 -0.8766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5238 1.4379 0.9064 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 5 1 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.24974 a. 
u."} {"text":"Question: What is the RI-CC2\/def2TZVP-computed S0 -> S2 transition oscillator strength of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23924\t3.7138\t1.49465\t1.35927\t2.1985\t53.65\t-0.3185\t0.0809\t0.3994\t921.4627\t0.116107\t-494.170733\t-494.163382\t-494.162438\t-494.203105\t26.845\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.2095 0.1938 -0.0747 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0135 1.5187 0.0042 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.1788 2.1279 -0.1145 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7489 1.8601 -1.0679 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7020 1.8981 1.2897 C 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0047 1.1393 1.6632 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.4997 1.0825 3.1314 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0689 1.4081 2.6206 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8459 2.9813 1.2766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.9505 1.6455 1.4637 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0280 0.1495 1.2012 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9113 1.8957 3.7353 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.6374 0.1448 3.6730 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5236 0.5037 2.4633 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5269 2.1282 3.1837 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 7 1 0\n 7 13 1 0\n 7 12 1 0\n 8 7 1 0\n 8 15 1 0\n 9 5 1 0\n 10 6 1 0\n 11 6 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.01704 a. 
u."}", "/scratch/micpie/export/qm8/valid_0-12.jsonl": "{"text":"Question: What is the S0 -> S1 transition energy computed using second-order approximate coupled-cluster theory (CC2) of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 1\t157.7118\t157.70997\t157.70699\t0.\t13.21\t-0.3877\t0.1171\t0.5048\t35.3641\t0.044749\t-40.47893\t-40.476062\t-40.475117\t-40.498597\t6.469\n ChemNLP 3D\n\n 5 4 0 0 0 0 0 0 0 0999 V2000\n -0.0127 1.0858 0.0080 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0022 -0.0060 0.0020 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0117 1.4638 0.0003 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5408 1.4475 -0.8766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5238 1.4379 0.9064 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 5 1 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.433 a. u."} {"text":"Question: What is the S0 -> S1 transition energy computed using RI-CC2\/def2TZVP of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23924\t3.7138\t1.49465\t1.35927\t2.1985\t53.65\t-0.3185\t0.0809\t0.3994\t921.4627\t0.116107\t-494.170733\t-494.163382\t-494.162438\t-494.203105\t26.845\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.2095 0.1938 -0.0747 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0135 1.5187 0.0042 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.1788 2.1279 -0.1145 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7489 1.8601 -1.0679 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7020 1.8981 1.2897 C 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0047 1.1393 1.6632 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.4997 1.0825 3.1314 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0689 1.4081 2.6206 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8459 2.9813 1.2766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.9505 1.6455 1.4637 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0280 0.1495 1.2012 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9113 1.8957 3.7353 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.6374 0.1448 3.6730 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5236 0.5037 2.4633 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5269 2.1282 3.1837 H 0 0 0 0 0 0 0 0 0 0 0 
0\n 1 2 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 7 1 0\n 7 13 1 0\n 7 12 1 0\n 8 7 1 0\n 8 15 1 0\n 9 5 1 0\n 10 6 1 0\n 11 6 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.344 a. u."}", "/scratch/micpie/export/qm8/valid_0-18.jsonl": "{"text":"Question: What is the S0 -> S1 transition oscillator strength computed using linear-response time-dependent density functional theory (LR-TDPBE0\/def2SVP) of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 1\t157.7118\t157.70997\t157.70699\t0.\t13.21\t-0.3877\t0.1171\t0.5048\t35.3641\t0.044749\t-40.47893\t-40.476062\t-40.475117\t-40.498597\t6.469\n ChemNLP 3D\n\n 5 4 0 0 0 0 0 0 0 0999 V2000\n -0.0127 1.0858 0.0080 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0022 -0.0060 0.0020 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0117 1.4638 0.0003 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5408 1.4475 -0.8766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5238 1.4379 0.9064 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 5 1 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.18144 a. 
u."} {"text":"Question: What is the S0 -> S1 transition oscillator strength computed using linear-response time-dependent density functional theory (LR-TDPBE0\/def2SVP) of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23924\t3.7138\t1.49465\t1.35927\t2.1985\t53.65\t-0.3185\t0.0809\t0.3994\t921.4627\t0.116107\t-494.170733\t-494.163382\t-494.162438\t-494.203105\t26.845\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.2095 0.1938 -0.0747 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0135 1.5187 0.0042 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.1788 2.1279 -0.1145 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7489 1.8601 -1.0679 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7020 1.8981 1.2897 C 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0047 1.1393 1.6632 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.4997 1.0825 3.1314 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0689 1.4081 2.6206 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8459 2.9813 1.2766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.9505 1.6455 1.4637 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0280 0.1495 1.2012 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9113 1.8957 3.7353 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.6374 0.1448 3.6730 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5236 0.5037 2.4633 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5269 2.1282 3.1837 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 7 1 0\n 7 13 1 0\n 7 12 1 0\n 8 7 1 0\n 8 15 1 0\n 9 5 1 0\n 10 6 1 0\n 11 6 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.00389 a. u."}", "/scratch/micpie/export/qm8/train_0-2.jsonl": "{"text":"The RI-CC2\/def2TZVP-computed S0 -> S1 transition oscillator strength of the chemical with the SELFIES [H][N][Branch1][C][H][H] is 0.06702 a. u."} {"text":"The S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) of the molecule with the SELFIES [H][C][Branch1][C][H][O][C][Branch1][C][H][Branch1][C][H][C][Ring1][#Branch1][Branch1][C][H][C][Branch1][C][F][Branch1][C][F][F] is 0.00004 a. 
u."}", "/scratch/micpie/export/qm8/test_0-11.jsonl": "{"text":"The S0 -> S2 transition oscillator strength computed using CAM-B3LYP\/def2TZVP of the molecule with the DeepSMILES [H]O[H] is 0.0000 a. u."} {"text":"The S0 -> S2 transition oscillator strength computed using CAM-B3LYP\/def2TZVP of the molecule with the canonical SMILES FC(F)(F)CC1CC1 is 0.0008 a. u."}", "/scratch/micpie/export/qm8/train_0-7.jsonl": "{"text":"The S0 -> S2 transition oscillator strength computed using LR-TDPBE0\/def2SVP of the compound with the canonical SMILES N is 0.0316 a. u."} {"text":"The S0 -> S2 transition oscillator strength computed using LR-TDPBE0\/def2SVP of the molecule with the SMILES [H]C1([H])OC([H])([H])C1([H])C(F)(F)F is 0.0238 a. u."}", "/scratch/micpie/export/qm8/test_0-17.jsonl": "{"text":"Question: What is the S0 -> S2 transition energy computed using LR-TDPBE0\/def2SVP of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 3\t799.58812\t437.90386\t282.94545\t1.8511\t6.31\t-0.2928\t0.0687\t0.3615\t19.0002\t0.021375\t-76.404702\t-76.401867\t-76.400922\t-76.422349\t6.002\n ChemNLP 3D\n\n 3 2 0 0 0 0 0 0 0 0999 V2000\n -0.0344 0.9775 0.0076 O 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0648 0.0206 0.0015 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.8718 1.3008 0.0007 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\nM END\n[\\V2000].\nAnswer: 0.36209 a. 
u."} {"text":"Question: What is the LR-TDPBE0\/def2SVP-computed S0 -> S2 transition energy of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23920\t4.16555\t1.22902\t1.1911\t2.0686\t53.89\t-0.2927\t0.0895\t0.3822\t1023.2247\t0.115399\t-494.167503\t-494.159866\t-494.158921\t-494.200318\t27.778\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.3774 0.1902 -0.0327 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0267 1.4673 0.0776 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0762 2.2330 0.1036 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6975 1.7727 -1.0459 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8924 1.6752 1.3057 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.2029 1.2594 2.5856 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7820 2.1866 3.2514 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5605 1.9303 3.8875 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.1633 2.7356 1.3332 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.8151 1.1063 1.1484 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0263 0.1997 2.6383 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6634 1.7526 3.7093 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9530 3.1535 2.7915 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.2959 2.7280 3.8623 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6021 1.3189 4.7817 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 3 1 0\n 2 5 1 0\n 4 2 1 0\n 5 9 1 0\n 5 6 1 0\n 6 11 1 0\n 6 7 1 0\n 6 8 1 0\n 7 12 1 0\n 7 8 1 0\n 8 15 1 0\n 10 5 1 0\n 13 7 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.33185 a. 
u."}", "/scratch/micpie/export/qm8/valid_0-27.jsonl": "{"text":"User: I want to design a molecule with a particular S0 -> S1 transition energy computed using RI-CC2\/def2TZVP, S0 -> S2 transition energy computed using second-order approximate coupled-cluster theory (CC2), and S0 -> S1 transition oscillator strength computed using RI-CC2\/def2TZVP.\nAssistant: Cool, I would need to know the S0 -> S1 transition energy computed using RI-CC2\/def2TZVP, S0 -> S2 transition energy computed using second-order approximate coupled-cluster theory (CC2), and S0 -> S1 transition oscillator strength computed using RI-CC2\/def2TZVP of the molecule you want to design.\nUser: The S0 -> S1 transition energy computed using RI-CC2\/def2TZVP should be 0.433 a. u., the S0 -> S2 transition energy computed using second-order approximate coupled-cluster theory (CC2) should be 0.43296 a. u., and the S0 -> S1 transition oscillator strength computed using RI-CC2\/def2TZVP should be 0.24973 a. u.\nAssistant: I propose the compound with the V2000 Molfile with the following content: [V2000]\ngdb 1\t157.7118\t157.70997\t157.70699\t0.\t13.21\t-0.3877\t0.1171\t0.5048\t35.3641\t0.044749\t-40.47893\t-40.476062\t-40.475117\t-40.498597\t6.469\n ChemNLP 3D\n\n 5 4 0 0 0 0 0 0 0 0999 V2000\n -0.0127 1.0858 0.0080 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0022 -0.0060 0.0020 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0117 1.4638 0.0003 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5408 1.4475 -0.8766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5238 1.4379 0.9064 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 5 1 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000]."} {"text":"User: I want to design a compound with a particular S0 -> S1 transition energy computed using second-order approximate coupled-cluster theory (CC2), RI-CC2\/def2TZVP-computed S0 -> S2 transition energy, and S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2).\nAssistant: That sounds interesting, I would need to know the S0 -> S1 transition energy 
computed using second-order approximate coupled-cluster theory (CC2), RI-CC2\/def2TZVP-computed S0 -> S2 transition energy, and S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) of the molecule you want to design.\nUser: The S0 -> S1 transition energy computed using second-order approximate coupled-cluster theory (CC2) should be 0.344 a. u., the RI-CC2\/def2TZVP-computed S0 -> S2 transition energy should be 0.34462 a. u., and the S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) should be 0.04496 a. u.\nAssistant: I advise the molecule with the V2000 Molfile with the following content: [V2000]\ngdb 23924\t3.7138\t1.49465\t1.35927\t2.1985\t53.65\t-0.3185\t0.0809\t0.3994\t921.4627\t0.116107\t-494.170733\t-494.163382\t-494.162438\t-494.203105\t26.845\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.2095 0.1938 -0.0747 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0135 1.5187 0.0042 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.1788 2.1279 -0.1145 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7489 1.8601 -1.0679 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7020 1.8981 1.2897 C 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0047 1.1393 1.6632 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.4997 1.0825 3.1314 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0689 1.4081 2.6206 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8459 2.9813 1.2766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.9505 1.6455 1.4637 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0280 0.1495 1.2012 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9113 1.8957 3.7353 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.6374 0.1448 3.6730 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5236 0.5037 2.4633 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5269 2.1282 3.1837 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 7 1 0\n 7 13 1 0\n 7 12 1 0\n 8 7 1 0\n 8 15 1 0\n 9 5 1 0\n 10 6 1 0\n 11 6 1 0\n 14 8 1 0\nM END\n[\\V2000]."}", "/scratch/micpie/export/qm8/valid_0-19.jsonl": "{"text":"Question: What is the S0 -> S2 transition oscillator strength computed using 
linear-response time-dependent density functional theory (LR-TDPBE0\/def2SVP) of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 1\t157.7118\t157.70997\t157.70699\t0.\t13.21\t-0.3877\t0.1171\t0.5048\t35.3641\t0.044749\t-40.47893\t-40.476062\t-40.475117\t-40.498597\t6.469\n ChemNLP 3D\n\n 5 4 0 0 0 0 0 0 0 0999 V2000\n -0.0127 1.0858 0.0080 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0022 -0.0060 0.0020 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0117 1.4638 0.0003 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5408 1.4475 -0.8766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5238 1.4379 0.9064 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 5 1 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.1815 a. u."} {"text":"Question: What is the LR-TDPBE0\/def2SVP-computed S0 -> S2 transition oscillator strength of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23924\t3.7138\t1.49465\t1.35927\t2.1985\t53.65\t-0.3185\t0.0809\t0.3994\t921.4627\t0.116107\t-494.170733\t-494.163382\t-494.162438\t-494.203105\t26.845\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.2095 0.1938 -0.0747 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0135 1.5187 0.0042 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.1788 2.1279 -0.1145 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7489 1.8601 -1.0679 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7020 1.8981 1.2897 C 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0047 1.1393 1.6632 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.4997 1.0825 3.1314 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0689 1.4081 2.6206 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8459 2.9813 1.2766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.9505 1.6455 1.4637 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0280 0.1495 1.2012 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9113 1.8957 3.7353 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.6374 0.1448 3.6730 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5236 0.5037 2.4633 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5269 2.1282 3.1837 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 7 1 0\n 7 13 1 0\n 7 12 1 
0\n 8 7 1 0\n 8 15 1 0\n 9 5 1 0\n 10 6 1 0\n 11 6 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.0570 a. u."}", "/scratch/micpie/export/qm8/train_0-11.jsonl": "{"text":"The S0 -> S2 transition oscillator strength computed using CAM-B3LYP\/def2TZVP of the molecule with the DeepSMILES [H]N[H])[H] is 0.0238 a. u."} {"text":"The S0 -> S2 transition oscillator strength computed using Coulomb-attenuated B3LYP density functional theory (CAM-B3LYP\/def2TZVP) of the compound with the SMILES [H]C1([H])OC([H])([H])C1([H])C(F)(F)F is 0.0266 a. u."}", "/scratch/micpie/export/qm8/train_0-1.jsonl": "{"text":"The S0 -> S2 transition energy computed using RI-CC2\/def2TZVP of the compound with the InChI InChI=1S\/H3N\/h1H3 is 0.35008 a. u."} {"text":"The RI-CC2\/def2TZVP-computed S0 -> S2 transition energy of the chemical with the DeepSMILES [H]C[H])OC[H])[H])C4[H])CF)F)F is 0.28498 a. u."}", "/scratch/micpie/export/qm8/train_0-13.jsonl": "{"text":"Question: What is the S0 -> S2 transition energy computed using RI-CC2\/def2TZVP of the molecule with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 2\t293.60975\t293.54111\t191.39397\t1.6256\t9.46\t-0.257\t0.0829\t0.3399\t26.1563\t0.034358\t-56.525887\t-56.523026\t-56.522082\t-56.544961\t6.316\n ChemNLP 3D\n\n 4 3 0 0 0 0 0 0 0 0999 V2000\n -0.0404 1.0241 0.0626 N 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0173 0.0125 -0.0274 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9158 1.3587 -0.0288 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5203 1.3435 -0.7755 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.35008 a. 
u."} {"text":"Question: What is the RI-CC2\/def2TZVP-computed S0 -> S2 transition energy of the molecule with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23927\t3.78971\t1.55457\t1.42284\t1.532\t46.48\t-0.2621\t0.067\t0.329\t855.7697\t0.092261\t-530.085764\t-530.078593\t-530.077649\t-530.118327\t24.984\n ChemNLP 3D\n\n 13 13 0 0 0 0 0 0 0 0999 V2000\n 0.1408 0.0536 0.1012 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0125 1.3864 0.0414 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2077 1.9240 -0.1160 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7231 1.6594 -1.0646 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6922 1.9297 1.2733 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9757 1.1953 1.7185 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3649 0.7933 2.9595 O 0 0 0 0 0 0 0 0 0 0 0 0\n -0.1261 1.4488 2.6272 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7843 3.0100 1.1594 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.8483 1.8387 1.8753 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.2640 0.3380 1.1002 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7117 0.7458 2.5622 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.1240 2.2461 3.3356 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 10 1 0\n 6 7 1 0\n 8 7 1 0\n 8 13 1 0\n 9 5 1 0\n 11 6 1 0\n 12 8 1 0\nM END\n[\\V2000].\nAnswer: 0.28498 a. 
u."}", "/scratch/micpie/export/qm8/valid_0-26.jsonl": "{"text":"Question: What is the S0 -> S2 transition energy computed using second-order approximate coupled-cluster theory (CC2) of the compound with the V3000 Molfile with the following content?\nDescription: The content of the V3000 Molfile is [V3000]\ngdb 1\t157.7118\t157.70997\t157.70699\t0.\t13.21\t-0.3877\t0.1171\t0.5048\t35.3641\t0.044749\t-40.47893\t-40.476062\t-40.475117\t-40.498597\t6.469\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 5 4 0 0 0\nM V30 BEGIN ATOM\nM V30 1 C -0.012700 1.085800 0.008000 0\nM V30 2 H 0.002200 -0.006000 0.002000 0\nM V30 3 H 1.011700 1.463800 0.000300 0\nM V30 4 H -0.540800 1.447500 -0.876600 0\nM V30 5 H -0.523800 1.437900 0.906400 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 5\nM V30 2 1 2 1\nM V30 3 1 3 1\nM V30 4 1 4 1\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000].\nAnswer: 0.43296 a. u."} {"text":"Question: What is the S0 -> S2 transition energy computed using second-order approximate coupled-cluster theory (CC2) of the molecule with the V3000 Molfile with the following content?\nDescription: The content of the V3000 Molfile is [V3000]\ngdb 23924\t3.7138\t1.49465\t1.35927\t2.1985\t53.65\t-0.3185\t0.0809\t0.3994\t921.4627\t0.116107\t-494.170733\t-494.163382\t-494.162438\t-494.203105\t26.845\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 15 15 0 0 0\nM V30 BEGIN ATOM\nM V30 1 F 0.209500 0.193800 -0.074700 0\nM V30 2 C -0.013500 1.518700 0.004200 0\nM V30 3 F 1.178800 2.127900 -0.114500 0\nM V30 4 F -0.748900 1.860100 -1.067900 0\nM V30 5 C -0.702000 1.898100 1.289700 0\nM V30 6 C -2.004700 1.139300 1.663200 0\nM V30 7 C -1.499700 1.082500 3.131400 0\nM V30 8 C -0.068900 1.408100 2.620600 0\nM V30 9 H -0.845900 2.981300 1.276600 0\nM V30 10 H -2.950500 1.645500 1.463700 0\nM V30 11 H -2.028000 0.149500 1.201200 0\nM V30 12 H -1.911300 1.895700 3.735300 0\nM V30 13 H -1.637400 0.144800 3.673000 0\nM V30 
14 H 0.523600 0.503700 2.463300 0\nM V30 15 H 0.526900 2.128200 3.183700 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2\nM V30 2 1 2 5\nM V30 3 1 3 2\nM V30 4 1 4 2\nM V30 5 1 5 6\nM V30 6 1 5 8\nM V30 7 1 6 7\nM V30 8 1 7 13\nM V30 9 1 7 12\nM V30 10 1 8 7\nM V30 11 1 8 15\nM V30 12 1 9 5\nM V30 13 1 10 6\nM V30 14 1 11 6\nM V30 15 1 14 8\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000].\nAnswer: 0.34462 a. u."}", "/scratch/micpie/export/qm8/train_0-4.jsonl": "{"text":"The S0 -> S1 transition energy computed using LR-TDPBE0\/def2SVP of the compound with the InChI InChI=1S\/H3N\/h1H3 is 0.26839 a. u."} {"text":"The S0 -> S1 transition energy computed using linear-response time-dependent density functional theory (LR-TDPBE0\/def2SVP) of the compound with the SMILES [H]C1([H])OC([H])([H])C1([H])C(F)(F)F is 0.27690 a. u."}", "/scratch/micpie/export/qm8/test_0-7.jsonl": "{"text":"The S0 -> S2 transition oscillator strength computed using linear-response time-dependent density functional theory (LR-TDPBE0\/def2SVP) of the chemical with the canonical SMILES O is 0.0000 a. u."} {"text":"The S0 -> S2 transition oscillator strength computed using LR-TDPBE0\/def2SVP of the compound with the SELFIES [H][C][Branch1][C][H][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][H][C][Branch1][C][H][Branch1][C][H][C][Ring1][=Branch1][Branch1][C][H][H] is 0.0184 a. u."}", "/scratch/micpie/export/qm8/train_0-9.jsonl": "{"text":"The CAM-B3LYP\/def2TZVP-computed S0 -> S2 transition energy of the molecule with the DeepSMILES [H]N[H])[H] is 0.33448 a. u."} {"text":"The S0 -> S2 transition energy computed using Coulomb-attenuated B3LYP density functional theory (CAM-B3LYP\/def2TZVP) of the chemical with the InChI InChI=1S\/C4H5F3O\/c5-4(6,7)3-1-8-2-3\/h3H,1-2H2 is 0.29204 a. 
u."}", "/scratch/micpie/export/qm8/train_0-25.jsonl": "{"text":"Question: What is the DeepSMILES of the chemical with the V3000 Molfile with the following content?\nDescription: The content of the V3000 Molfile is [V3000]\ngdb 2\t293.60975\t293.54111\t191.39397\t1.6256\t9.46\t-0.257\t0.0829\t0.3399\t26.1563\t0.034358\t-56.525887\t-56.523026\t-56.522082\t-56.544961\t6.316\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 4 3 0 0 0\nM V30 BEGIN ATOM\nM V30 1 N -0.040400 1.024100 0.062600 0\nM V30 2 H 0.017300 0.012500 -0.027400 0\nM V30 3 H 0.915800 1.358700 -0.028800 0\nM V30 4 H -0.520300 1.343500 -0.775500 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 2 1\nM V30 2 1 3 1\nM V30 3 1 4 1\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000].\nAnswer: [H]N[H])[H]."} {"text":"Question: What is the SMILES of the chemical with the V3000 Molfile with the following content?\nDescription: The content of the V3000 Molfile is [V3000]\ngdb 23927\t3.78971\t1.55457\t1.42284\t1.532\t46.48\t-0.2621\t0.067\t0.329\t855.7697\t0.092261\t-530.085764\t-530.078593\t-530.077649\t-530.118327\t24.984\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 13 13 0 0 0\nM V30 BEGIN ATOM\nM V30 1 F 0.140800 0.053600 0.101200 0\nM V30 2 C -0.012500 1.386400 0.041400 0\nM V30 3 F 1.207700 1.924000 -0.116000 0\nM V30 4 F -0.723100 1.659400 -1.064600 0\nM V30 5 C -0.692200 1.929700 1.273300 0\nM V30 6 C -1.975700 1.195300 1.718500 0\nM V30 7 O -1.364900 0.793300 2.959500 0\nM V30 8 C -0.126100 1.448800 2.627200 0\nM V30 9 H -0.784300 3.010000 1.159400 0\nM V30 10 H -2.848300 1.838700 1.875300 0\nM V30 11 H -2.264000 0.338000 1.100200 0\nM V30 12 H 0.711700 0.745800 2.562200 0\nM V30 13 H 0.124000 2.246100 3.335600 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 2 1\nM V30 2 1 2 5\nM V30 3 1 3 2\nM V30 4 1 4 2\nM V30 5 1 5 6\nM V30 6 1 5 8\nM V30 7 1 6 10\nM V30 8 1 6 7\nM V30 9 1 8 7\nM V30 10 1 8 13\nM V30 11 1 9 5\nM V30 12 1 11 6\nM V30 13 1 12 8\nM 
V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000].\nAnswer: [H]C1([H])OC([H])([H])C1([H])C(F)(F)F."}", "/scratch/micpie/export/qm8/valid_0-22.jsonl": "{"text":"Question: What is the CAM-B3LYP\/def2TZVP-computed S0 -> S1 transition oscillator strength of the molecule with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 1\t157.7118\t157.70997\t157.70699\t0.\t13.21\t-0.3877\t0.1171\t0.5048\t35.3641\t0.044749\t-40.47893\t-40.476062\t-40.475117\t-40.498597\t6.469\n ChemNLP 3D\n\n 5 4 0 0 0 0 0 0 0 0999 V2000\n -0.0127 1.0858 0.0080 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0022 -0.0060 0.0020 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0117 1.4638 0.0003 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5408 1.4475 -0.8766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5238 1.4379 0.9064 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 5 1 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.18320 a. u."} {"text":"Question: What is the S0 -> S1 transition oscillator strength computed using Coulomb-attenuated B3LYP density functional theory (CAM-B3LYP\/def2TZVP) of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23924\t3.7138\t1.49465\t1.35927\t2.1985\t53.65\t-0.3185\t0.0809\t0.3994\t921.4627\t0.116107\t-494.170733\t-494.163382\t-494.162438\t-494.203105\t26.845\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.2095 0.1938 -0.0747 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0135 1.5187 0.0042 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.1788 2.1279 -0.1145 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7489 1.8601 -1.0679 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7020 1.8981 1.2897 C 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0047 1.1393 1.6632 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.4997 1.0825 3.1314 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0689 1.4081 2.6206 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8459 2.9813 1.2766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.9505 1.6455 1.4637 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0280 0.1495 1.2012 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9113 1.8957 3.7353 H 0 0 0 0 0 0 0 0 0 0 0 0\n 
-1.6374 0.1448 3.6730 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5236 0.5037 2.4633 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5269 2.1282 3.1837 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 7 1 0\n 7 13 1 0\n 7 12 1 0\n 8 7 1 0\n 8 15 1 0\n 9 5 1 0\n 10 6 1 0\n 11 6 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.03760 a. u."}", "/scratch/micpie/export/qm8/train_0-18.jsonl": "{"text":"Question: What is the S0 -> S1 transition oscillator strength computed using LR-TDPBE0\/def2SVP of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 2\t293.60975\t293.54111\t191.39397\t1.6256\t9.46\t-0.257\t0.0829\t0.3399\t26.1563\t0.034358\t-56.525887\t-56.523026\t-56.522082\t-56.544961\t6.316\n ChemNLP 3D\n\n 4 3 0 0 0 0 0 0 0 0999 V2000\n -0.0404 1.0241 0.0626 N 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0173 0.0125 -0.0274 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9158 1.3587 -0.0288 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5203 1.3435 -0.7755 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.04076 a. 
u."} {"text":"Question: What is the LR-TDPBE0\/def2SVP-computed S0 -> S1 transition oscillator strength of the molecule with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23927\t3.78971\t1.55457\t1.42284\t1.532\t46.48\t-0.2621\t0.067\t0.329\t855.7697\t0.092261\t-530.085764\t-530.078593\t-530.077649\t-530.118327\t24.984\n ChemNLP 3D\n\n 13 13 0 0 0 0 0 0 0 0999 V2000\n 0.1408 0.0536 0.1012 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0125 1.3864 0.0414 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.2077 1.9240 -0.1160 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7231 1.6594 -1.0646 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6922 1.9297 1.2733 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9757 1.1953 1.7185 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.3649 0.7933 2.9595 O 0 0 0 0 0 0 0 0 0 0 0 0\n -0.1261 1.4488 2.6272 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7843 3.0100 1.1594 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.8483 1.8387 1.8753 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.2640 0.3380 1.1002 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7117 0.7458 2.5622 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.1240 2.2461 3.3356 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 10 1 0\n 6 7 1 0\n 8 7 1 0\n 8 13 1 0\n 9 5 1 0\n 11 6 1 0\n 12 8 1 0\nM END\n[\\V2000].\nAnswer: 0.00194 a. u."}", "/scratch/micpie/export/qm8/valid_0-3.jsonl": "{"text":"The S0 -> S2 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) of the chemical with the canonical SMILES C is 0.24974 a. u."} {"text":"The S0 -> S2 transition oscillator strength computed using RI-CC2\/def2TZVP of the molecule with the SELFIES [H][C][Branch1][C][H][C][Branch1][C][H][Branch1][C][H][C][Branch1][C][H][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Ring1][N][Branch1][C][H][H] is 0.01704 a. u."}", "/scratch/micpie/export/qm8/test_0-8.jsonl": "{"text":"The S0 -> S1 transition energy computed using CAM-B3LYP\/def2TZVP of the compound with the SELFIES [H][O][H] is 0.27852 a. 
u."} {"text":"The CAM-B3LYP\/def2TZVP-computed S0 -> S1 transition energy of the molecule with the DeepSMILES [H]C[H])CF)F)F))C[H])C[H])[H])C3[H])[H] is 0.31766 a. u."}", "/scratch/micpie/export/qm8/test_0-14.jsonl": "{"text":"Question: What is the S0 -> S1 transition oscillator strength computed using second-order approximate coupled-cluster theory (CC2) of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 3\t799.58812\t437.90386\t282.94545\t1.8511\t6.31\t-0.2928\t0.0687\t0.3615\t19.0002\t0.021375\t-76.404702\t-76.401867\t-76.400922\t-76.422349\t6.002\n ChemNLP 3D\n\n 3 2 0 0 0 0 0 0 0 0999 V2000\n -0.0344 0.9775 0.0076 O 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0648 0.0206 0.0015 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.8718 1.3008 0.0007 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\nM END\n[\\V2000].\nAnswer: 0.03776 a. u."} {"text":"Question: What is the RI-CC2\/def2TZVP-computed S0 -> S1 transition oscillator strength of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23920\t4.16555\t1.22902\t1.1911\t2.0686\t53.89\t-0.2927\t0.0895\t0.3822\t1023.2247\t0.115399\t-494.167503\t-494.159866\t-494.158921\t-494.200318\t27.778\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.3774 0.1902 -0.0327 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0267 1.4673 0.0776 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0762 2.2330 0.1036 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6975 1.7727 -1.0459 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8924 1.6752 1.3057 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.2029 1.2594 2.5856 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7820 2.1866 3.2514 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5605 1.9303 3.8875 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.1633 2.7356 1.3332 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.8151 1.1063 1.1484 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0263 0.1997 2.6383 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6634 1.7526 3.7093 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9530 3.1535 2.7915 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.2959 2.7280 3.8623 H 0 0 0 
0 0 0 0 0 0 0 0 0\n -0.6021 1.3189 4.7817 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 3 1 0\n 2 5 1 0\n 4 2 1 0\n 5 9 1 0\n 5 6 1 0\n 6 11 1 0\n 6 7 1 0\n 6 8 1 0\n 7 12 1 0\n 7 8 1 0\n 8 15 1 0\n 10 5 1 0\n 13 7 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.01563 a. u."}", "/scratch/micpie/export/qm8/valid_0-17.jsonl": "{"text":"Question: What is the S0 -> S2 transition energy computed using LR-TDPBE0\/def2SVP of the molecule with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 1\t157.7118\t157.70997\t157.70699\t0.\t13.21\t-0.3877\t0.1171\t0.5048\t35.3641\t0.044749\t-40.47893\t-40.476062\t-40.475117\t-40.498597\t6.469\n ChemNLP 3D\n\n 5 4 0 0 0 0 0 0 0 0999 V2000\n -0.0127 1.0858 0.0080 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0022 -0.0060 0.0020 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0117 1.4638 0.0003 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5408 1.4475 -0.8766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5238 1.4379 0.9064 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 5 1 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.43024 a. 
u."} {"text":"Question: What is the S0 -> S2 transition energy computed using LR-TDPBE0\/def2SVP of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23924\t3.7138\t1.49465\t1.35927\t2.1985\t53.65\t-0.3185\t0.0809\t0.3994\t921.4627\t0.116107\t-494.170733\t-494.163382\t-494.162438\t-494.203105\t26.845\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.2095 0.1938 -0.0747 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0135 1.5187 0.0042 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.1788 2.1279 -0.1145 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7489 1.8601 -1.0679 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7020 1.8981 1.2897 C 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0047 1.1393 1.6632 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.4997 1.0825 3.1314 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0689 1.4081 2.6206 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8459 2.9813 1.2766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.9505 1.6455 1.4637 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0280 0.1495 1.2012 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9113 1.8957 3.7353 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.6374 0.1448 3.6730 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5236 0.5037 2.4633 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5269 2.1282 3.1837 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 5 1 0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 7 1 0\n 7 13 1 0\n 7 12 1 0\n 8 7 1 0\n 8 15 1 0\n 9 5 1 0\n 10 6 1 0\n 11 6 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.34821 a. 
u."}", "/scratch/micpie/export/qm8/valid_0-14.jsonl": "{"text":"Question: What is the RI-CC2\/def2TZVP-computed S0 -> S1 transition oscillator strength of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 1\t157.7118\t157.70997\t157.70699\t0.\t13.21\t-0.3877\t0.1171\t0.5048\t35.3641\t0.044749\t-40.47893\t-40.476062\t-40.475117\t-40.498597\t6.469\n ChemNLP 3D\n\n 5 4 0 0 0 0 0 0 0 0999 V2000\n -0.0127 1.0858 0.0080 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0022 -0.0060 0.0020 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0117 1.4638 0.0003 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5408 1.4475 -0.8766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5238 1.4379 0.9064 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 5 1 0\n 2 1 1 0\n 3 1 1 0\n 4 1 1 0\nM END\n[\\V2000].\nAnswer: 0.24973 a. u."} {"text":"Question: What is the RI-CC2\/def2TZVP-computed S0 -> S1 transition oscillator strength of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23924\t3.7138\t1.49465\t1.35927\t2.1985\t53.65\t-0.3185\t0.0809\t0.3994\t921.4627\t0.116107\t-494.170733\t-494.163382\t-494.162438\t-494.203105\t26.845\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.2095 0.1938 -0.0747 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0135 1.5187 0.0042 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.1788 2.1279 -0.1145 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7489 1.8601 -1.0679 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.7020 1.8981 1.2897 C 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0047 1.1393 1.6632 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.4997 1.0825 3.1314 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0689 1.4081 2.6206 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8459 2.9813 1.2766 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.9505 1.6455 1.4637 H 0 0 0 0 0 0 0 0 0 0 0 0\n -2.0280 0.1495 1.2012 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9113 1.8957 3.7353 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.6374 0.1448 3.6730 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5236 0.5037 2.4633 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.5269 2.1282 3.1837 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 5 1 
0\n 3 2 1 0\n 4 2 1 0\n 5 6 1 0\n 5 8 1 0\n 6 7 1 0\n 7 13 1 0\n 7 12 1 0\n 8 7 1 0\n 8 15 1 0\n 9 5 1 0\n 10 6 1 0\n 11 6 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.04496 a. u."}", "/scratch/micpie/export/qm8/test_0-25.jsonl": "{"text":"Question: What is the canonical SMILES of the chemical with the V3000 Molfile with the following content?\nDescription: The content of the V3000 Molfile is [V3000]\ngdb 3\t799.58812\t437.90386\t282.94545\t1.8511\t6.31\t-0.2928\t0.0687\t0.3615\t19.0002\t0.021375\t-76.404702\t-76.401867\t-76.400922\t-76.422349\t6.002\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 3 2 0 0 0\nM V30 BEGIN ATOM\nM V30 1 O -0.034400 0.977500 0.007600 0\nM V30 2 H 0.064800 0.020600 0.001500 0\nM V30 3 H 0.871800 1.300800 0.000700 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 2 1\nM V30 2 1 3 1\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000].\nAnswer: O."} {"text":"Question: What is the SMILES of the chemical with the V3000 Molfile with the following content?\nDescription: The content of the V3000 Molfile is [V3000]\ngdb 23920\t4.16555\t1.22902\t1.1911\t2.0686\t53.89\t-0.2927\t0.0895\t0.3822\t1023.2247\t0.115399\t-494.167503\t-494.159866\t-494.158921\t-494.200318\t27.778\n ChemNLP 3D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 15 15 0 0 0\nM V30 BEGIN ATOM\nM V30 1 F 0.377400 0.190200 -0.032700 0\nM V30 2 C -0.026700 1.467300 0.077600 0\nM V30 3 F 1.076200 2.233000 0.103600 0\nM V30 4 F -0.697500 1.772700 -1.045900 0\nM V30 5 C -0.892400 1.675200 1.305700 0\nM V30 6 C -0.202900 1.259400 2.585600 0\nM V30 7 C 0.782000 2.186600 3.251400 0\nM V30 8 C -0.560500 1.930300 3.887500 0\nM V30 9 H -1.163300 2.735600 1.333200 0\nM V30 10 H -1.815100 1.106300 1.148400 0\nM V30 11 H 0.026300 0.199700 2.638300 0\nM V30 12 H 1.663400 1.752600 3.709300 0\nM V30 13 H 0.953000 3.153500 2.791500 0\nM V30 14 H -1.295900 2.728000 3.862300 0\nM V30 15 H -0.602100 1.318900 4.781700 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 
1 1 1 2\nM V30 2 1 2 3\nM V30 3 1 2 5\nM V30 4 1 4 2\nM V30 5 1 5 9\nM V30 6 1 5 6\nM V30 7 1 6 11\nM V30 8 1 6 7\nM V30 9 1 6 8\nM V30 10 1 7 12\nM V30 11 1 7 8\nM V30 12 1 8 15\nM V30 13 1 10 5\nM V30 14 1 13 7\nM V30 15 1 14 8\nM V30 END BOND\nM V30 END CTAB\nM END\n[\\V3000].\nAnswer: [H]C([H])(C(F)(F)F)C1([H])C([H])([H])C1([H])[H]."}", "/scratch/micpie/export/qm8/test_0-4.jsonl": "{"text":"The LR-TDPBE0\/def2SVP-computed S0 -> S1 transition energy of the compound with the SMILES [H]O[H] is 0.29138 a. u."} {"text":"The S0 -> S1 transition energy computed using LR-TDPBE0\/def2SVP of the compound with the InChI InChI=1S\/C5H7F3\/c6-5(7,8)3-4-1-2-4\/h4H,1-3H2 is 0.32976 a. u."}", "/scratch/micpie/export/qm8/test_0-12.jsonl": "{"text":"Question: What is the S0 -> S1 transition energy computed using second-order approximate coupled-cluster theory (CC2) of the compound with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 3\t799.58812\t437.90386\t282.94545\t1.8511\t6.31\t-0.2928\t0.0687\t0.3615\t19.0002\t0.021375\t-76.404702\t-76.401867\t-76.400922\t-76.422349\t6.002\n ChemNLP 3D\n\n 3 2 0 0 0 0 0 0 0 0999 V2000\n -0.0344 0.9775 0.0076 O 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0648 0.0206 0.0015 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.8718 1.3008 0.0007 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\nM END\n[\\V2000].\nAnswer: 0.287 a. 
u."} {"text":"Question: What is the S0 -> S1 transition energy computed using second-order approximate coupled-cluster theory (CC2) of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23920\t4.16555\t1.22902\t1.1911\t2.0686\t53.89\t-0.2927\t0.0895\t0.3822\t1023.2247\t0.115399\t-494.167503\t-494.159866\t-494.158921\t-494.200318\t27.778\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.3774 0.1902 -0.0327 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0267 1.4673 0.0776 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0762 2.2330 0.1036 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6975 1.7727 -1.0459 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8924 1.6752 1.3057 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.2029 1.2594 2.5856 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7820 2.1866 3.2514 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5605 1.9303 3.8875 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.1633 2.7356 1.3332 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.8151 1.1063 1.1484 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0263 0.1997 2.6383 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6634 1.7526 3.7093 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9530 3.1535 2.7915 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.2959 2.7280 3.8623 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6021 1.3189 4.7817 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 3 1 0\n 2 5 1 0\n 4 2 1 0\n 5 9 1 0\n 5 6 1 0\n 6 11 1 0\n 6 7 1 0\n 6 8 1 0\n 7 12 1 0\n 7 8 1 0\n 8 15 1 0\n 10 5 1 0\n 13 7 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.325 a. 
u."}", "/scratch/micpie/export/qm8/test_0-20.jsonl": "{"text":"Question: What is the S0 -> S1 transition energy computed using Coulomb-attenuated B3LYP density functional theory (CAM-B3LYP\/def2TZVP) of the chemical with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 3\t799.58812\t437.90386\t282.94545\t1.8511\t6.31\t-0.2928\t0.0687\t0.3615\t19.0002\t0.021375\t-76.404702\t-76.401867\t-76.400922\t-76.422349\t6.002\n ChemNLP 3D\n\n 3 2 0 0 0 0 0 0 0 0999 V2000\n -0.0344 0.9775 0.0076 O 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0648 0.0206 0.0015 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.8718 1.3008 0.0007 H 0 0 0 0 0 0 0 0 0 0 0 0\n 2 1 1 0\n 3 1 1 0\nM END\n[\\V2000].\nAnswer: 0.27852 a. u."} {"text":"Question: What is the S0 -> S1 transition energy computed using CAM-B3LYP\/def2TZVP of the molecule with the V2000 Molfile with the following content?\nDescription: The content of the V2000 Molfile is [V2000]\ngdb 23920\t4.16555\t1.22902\t1.1911\t2.0686\t53.89\t-0.2927\t0.0895\t0.3822\t1023.2247\t0.115399\t-494.167503\t-494.159866\t-494.158921\t-494.200318\t27.778\n ChemNLP 3D\n\n 15 15 0 0 0 0 0 0 0 0999 V2000\n 0.3774 0.1902 -0.0327 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.0267 1.4673 0.0776 C 0 0 0 0 0 0 0 0 0 0 0 0\n 1.0762 2.2330 0.1036 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6975 1.7727 -1.0459 F 0 0 0 0 0 0 0 0 0 0 0 0\n -0.8924 1.6752 1.3057 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.2029 1.2594 2.5856 C 0 0 0 0 0 0 0 0 0 0 0 0\n 0.7820 2.1866 3.2514 C 0 0 0 0 0 0 0 0 0 0 0 0\n -0.5605 1.9303 3.8875 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.1633 2.7356 1.3332 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.8151 1.1063 1.1484 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.0263 0.1997 2.6383 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1.6634 1.7526 3.7093 H 0 0 0 0 0 0 0 0 0 0 0 0\n 0.9530 3.1535 2.7915 H 0 0 0 0 0 0 0 0 0 0 0 0\n -1.2959 2.7280 3.8623 H 0 0 0 0 0 0 0 0 0 0 0 0\n -0.6021 1.3189 4.7817 H 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0\n 2 3 1 0\n 2 5 1 0\n 4 2 1 0\n 5 9 1 0\n 5 6 1 0\n 6 11 1 0\n 6 7 1 0\n 6 8 1 0\n 
7 12 1 0\n 7 8 1 0\n 8 15 1 0\n 10 5 1 0\n 13 7 1 0\n 14 8 1 0\nM END\n[\\V2000].\nAnswer: 0.31766 a. u."}", "/scratch/micpie/export/drugchat_liang_zhang_et_al/test_0-1.jsonl": "{"text":"Q: How many hydrogen bond donors does this compound have?\nConstraint: The Compound can be represented with the canonical SMILES CC(C)c1nc(COC(N)=O)n(Cc2ccncc2)c1Sc1cc(Cl)cc(Cl)c1.\nAnswer: 1"} {"text":"Question: How many hydrogen bond donors does this compound have?\nDescription: The Compound has the DeepSMILES CCNCC))C=O)CNC=O)cccOC))cOC))cOC))c6.\nAnswer: 1"}", "/scratch/micpie/export/drugchat_liang_zhang_et_al/valid_0-0.jsonl": "{"text":"Task: Please answer the following question about the molecule with SMILES Cl.Cn1c(=O)c(Oc2ccc(F)cc2F)cc2cnc(NC3CCOCC3)nc21.\nRequest: Determine the type of availability for this drug.\nCompletion: unknown"} {"text":"Task: Please answer the following question about the molecule with InChI InChI=1S\/C25H40N8O2\/c1-15(34)28-18-12-16(30-25(5,6)7)8-9-19(18)32-11-10-17(22(32)35)29-23-27-14-26-21-13-20(24(2,3)4)31-33(21)23\/h13-14,16-19,30H,8-12H2,1-7H3,(H,28,34)(H,26,27,29)\/t16-,17+,18-,19+\/m1\/s1.\nQuestion: Please provide a description of this drug's mechanism of action.\nCompletion: C-C chemokine receptor type 2 antagonist"}", "/scratch/micpie/export/drugchat_liang_zhang_et_al/test_0-2.jsonl": "{"text":"User: I have a question about the molecule with DeepSMILES CCC)cncCOCN)=O))))nCcccncc6)))))))c5ScccCl)ccCl)c6.\nAssistant: How can I help?\nUser: How many hydrogen bond donors does this compound have?\nAssistant: 1"} {"text":"User: I have a question about the molecule with SELFIES [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][C][=Branch1][C][=O][C][=C][C][Branch1][Ring1][O][C][=C][Branch1][Ring1][O][C][C][Branch1][Ring1][O][C][=C][Ring1][N].\nAssistant: Interesting, how can I help?\nUser: How many hydrogen bond donors does this compound have?\nAssistant: The answer is 1"}", 
"/scratch/micpie/export/drugchat_liang_zhang_et_al/test_0-0.jsonl": "{"text":"Task: Please answer the following question about the molecule with InChI InChI=1S\/C20H20Cl2N4O2S\/c1-12(2)18-19(29-16-8-14(21)7-15(22)9-16)26(10-13-3-5-24-6-4-13)17(25-18)11-28-20(23)27\/h3-9,12H,10-11H2,1-2H3,(H2,23,27).\nRequest: How many hydrogen bond donors does this compound have?\nAnswer: 1"} {"text":"Task: Please answer the following question about the molecule with canonical SMILES CCN(CC)C(=O)CNC(=O)c1cc(OC)c(OC)c(OC)c1.\nQuestion: How many hydrogen bond donors does this compound have?\nResult: 1"}", "/scratch/micpie/export/drugchat_liang_zhang_et_al/test_0-3.jsonl": "{"text":"User: I want to know more about the compound with SELFIES [C][C][Branch1][C][C][C][N][=C][Branch1][Branch2][C][O][C][Branch1][C][N][=O][N][Branch1][#Branch2][C][C][=C][C][=N][C][=C][Ring1][=Branch1][C][=Ring1][P][S][C][=C][C][Branch1][C][Cl][=C][C][Branch1][C][Cl][=C][Ring1][Branch2].\nAssistant: Interesting, how can I help?\nUser: How many hydrogen bond donors does this compound have?\nAssistant: The answer is 1"} {"text":"User: I want to know more about the compound with SELFIES [C][C][N][Branch1][Ring1][C][C][C][=Branch1][C][=O][C][N][C][=Branch1][C][=O][C][=C][C][Branch1][Ring1][O][C][=C][Branch1][Ring1][O][C][C][Branch1][Ring1][O][C][=C][Ring1][N].\nAssistant: Interesting, how can I help?\nUser: How many hydrogen bond donors does this compound have?\nAssistant: The answer is 1"}", "/scratch/micpie/export/drugchat_liang_zhang_et_al/train_0-0.jsonl": "{"text":"Task: Please answer the following question about the molecule with SMILES CCOC1=CC=C(C=C1)CC2=C(C=CC(=C2)[C@H]3[C@@H]([C@H]([C@@H]([C@H](O3)CO)O)O)O)Cl.C[C@@H](CO)O.\nQuestion: Please describe this drug.\nCompletion: Dapagliflozin Propanediol is the propanediol form of dapagliflozin, a selective sodium-glucose co-transporter subtype 2 (SGLT2) inhibitor with antihyperglycemic activity. 
Upon administration, dapagliflozin selectively targets and inhibits SGLT2, thereby preventing the reabsorption of glucose by the kidneys."} {"text":"Task: Please answer the following question about the molecule with SELFIES [OH1-1].[O-2].[O-2].[O-2].[O-2].[O-2].[O-1][Si][O][Si][Branch1][Ring2][O][Ring1][Ring2][Branch1][Ring2][O][Ring1][Branch1][O-1].[O-1][Si][O][Si][Branch1][Ring2][O][Ring1][Ring2][Branch1][Ring2][O][Ring1][Branch1][O-1].[O-1][Si][O][Si][Branch1][Ring2][O][Ring1][Ring2][Branch1][Ring2][O][Ring1][Branch1][O-1].[O-1][Si][O][Si][Branch1][Ring2][O][Ring1][Ring2][Branch1][Ring2][O][Ring1][Branch1][O-1].[O-1][Si][O][Si][Branch1][Ring2][O][Ring1][Ring2][Branch1][Ring2][O][Ring1][Branch1][O-1].[O-1][Si][O][Si][Branch1][Ring2][O][Ring1][Ring2][Branch1][Ring2][O][Ring1][Branch1][O-1].[O-1][Si][O][Si][Branch1][Ring2][O][Ring1][Ring2][Branch1][Ring2][O][Ring1][Branch1][O-1].[F-1].[Mg+2].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[K+1].[K+1].[K+1].[Fe+2].\nQuestion: Please describe this drug.\nCompletion: illite is a mineral."}", "/scratch/micpie/export/drugchat_liang_zhang_et_al/train_0-3.jsonl": "{"text":"User: I want to know more about the chemical with InChI InChI=1S\/C21H25ClO6.C3H8O2\/c1-2-27-15-6-3-12(4-7-15)9-14-10-13(5-8-16(14)22)21-20(26)19(25)18(24)17(11-23)28-21;1-3(5)2-4\/h3-8,10,17-21,23-26H,2,9,11H2,1H3;3-5H,2H2,1H3\/t17-,18-,19+,20-,21+;3-\/m10\/s1.\nAssistant: Interesting, how can I help?\nUser: Please describe this drug.\nAssistant: The answer is Dapagliflozin Propanediol is the propanediol form of dapagliflozin, a selective sodium-glucose co-transporter subtype 2 (SGLT2) inhibitor with antihyperglycemic activity. 
Upon administration, dapagliflozin selectively targets and inhibits SGLT2, thereby preventing the reabsorption of glucose by the kidneys."} {"text":"User: I want to know more about the chemical with SMILES [OH-].[O-2].[O-2].[O-2].[O-2].[O-2].[O-][Si]12O[Si](O1)(O2)[O-].[O-][Si]12O[Si](O1)(O2)[O-].[O-][Si]12O[Si](O1)(O2)[O-].[O-][Si]12O[Si](O1)(O2)[O-].[O-][Si]12O[Si](O1)(O2)[O-].[O-][Si]12O[Si](O1)(O2)[O-].[O-][Si]12O[Si](O1)(O2)[O-].[F-].[Mg+2].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[K+].[K+].[K+].[Fe+2].\nAssistant: Sure, what is your question?\nUser: Please describe this drug.\nAssistant: illite is a mineral."}", "/scratch/micpie/export/drugchat_liang_zhang_et_al/valid_0-2.jsonl": "{"text":"User: I have a question about the chemical with SELFIES [Cl].[C][N][C][=Branch1][C][=O][C][Branch1][=C][O][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][F][=C][C][=C][N][=C][Branch1][#Branch2][N][C][C][C][O][C][C][Ring1][=Branch1][N][=C][Ring1][=N][Ring2][Ring1][O].\nAssistant: That sounds interesting, how can I help?\nUser: Determine the type of availability for this drug.\nAssistant: The answer is unknown"} {"text":"User: I have a question about the compound with canonical SMILES CC(=O)N[C@@H]1C[C@H](NC(C)(C)C)CC[C@@H]1N1CC[C@H](Nc2ncnc3cc(C(C)(C)C)nn23)C1=O.\nAssistant: Interesting, how can I help?\nUser: Please provide a description of this drug's mechanism of action.\nAssistant: C-C chemokine receptor type 2 antagonist"}", "/scratch/micpie/export/drugchat_liang_zhang_et_al/valid_0-1.jsonl": "{"text":"Question: Determine the type of availability for this drug.\nDescription: The Chemical can be represented with the DeepSMILES Cl.Cnc=O)cOccccF)cc6F))))))))cccncNCCCOCC6)))))))nc6%10.\nAnswer: unknown"} {"text":"Q: Please provide a description of this drug's mechanism of action.\n The Chemical has the DeepSMILES CC=O)N[C@@H]C[C@H]NCC)C)C)))CC[C@@H]6NCC[C@H]NcncncccCC)C)C))nn95))))))))))C5=O.\nAnswer: C-C chemokine receptor type 2 antagonist"}", 
"/scratch/micpie/export/drugchat_liang_zhang_et_al/valid_0-4.jsonl": "{"text":"Task: Answer the following question about the molecule with SELFIES [Cl].[C][N][C][=Branch1][C][=O][C][Branch1][=C][O][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][F][=C][C][=C][N][=C][Branch1][#Branch2][N][C][C][C][O][C][C][Ring1][=Branch1][N][=C][Ring1][=N][Ring2][Ring1][O].\nDescription: Determine the type of availability for this drug.\nResult: unknown"} {"text":"Task: Answer the following question about the molecule with DeepSMILES CC=O)N[C@@H]C[C@H]NCC)C)C)))CC[C@@H]6NCC[C@H]NcncncccCC)C)C))nn95))))))))))C5=O.\nQuestion: Please provide a description of this drug's mechanism of action.\nAnswer: C-C chemokine receptor type 2 antagonist"}", "/scratch/micpie/export/drugchat_liang_zhang_et_al/train_0-2.jsonl": "{"text":"User: I have a question about the compound with SMILES CCOC1=CC=C(C=C1)CC2=C(C=CC(=C2)[C@H]3[C@@H]([C@H]([C@@H]([C@H](O3)CO)O)O)O)Cl.C[C@@H](CO)O.\nAssistant: Sure, what is your question?\nUser: Please describe this drug.\nAssistant: The answer is Dapagliflozin Propanediol is the propanediol form of dapagliflozin, a selective sodium-glucose co-transporter subtype 2 (SGLT2) inhibitor with antihyperglycemic activity. 
Upon administration, dapagliflozin selectively targets and inhibits SGLT2, thereby preventing the reabsorption of glucose by the kidneys."} {"text":"User: I have a question about the compound with SELFIES [OH1-1].[O-2].[O-2].[O-2].[O-2].[O-2].[O-1][Si][O][Si][Branch1][Ring2][O][Ring1][Ring2][Branch1][Ring2][O][Ring1][Branch1][O-1].[O-1][Si][O][Si][Branch1][Ring2][O][Ring1][Ring2][Branch1][Ring2][O][Ring1][Branch1][O-1].[O-1][Si][O][Si][Branch1][Ring2][O][Ring1][Ring2][Branch1][Ring2][O][Ring1][Branch1][O-1].[O-1][Si][O][Si][Branch1][Ring2][O][Ring1][Ring2][Branch1][Ring2][O][Ring1][Branch1][O-1].[O-1][Si][O][Si][Branch1][Ring2][O][Ring1][Ring2][Branch1][Ring2][O][Ring1][Branch1][O-1].[O-1][Si][O][Si][Branch1][Ring2][O][Ring1][Ring2][Branch1][Ring2][O][Ring1][Branch1][O-1].[O-1][Si][O][Si][Branch1][Ring2][O][Ring1][Ring2][Branch1][Ring2][O][Ring1][Branch1][O-1].[F-1].[Mg+2].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[K+1].[K+1].[K+1].[Fe+2].\nAssistant: How can I help?\nUser: Please describe this drug.\nAssistant: illite is a mineral."}", "/scratch/micpie/export/drugchat_liang_zhang_et_al/train_0-1.jsonl": "{"text":"Q: Please describe this drug.\nConstraint: The Chemical can be represented with the canonical SMILES CCOc1ccc(Cc2cc([C@@H]3O[C@H](CO)[C@@H](O)[C@H](O)[C@H]3O)ccc2Cl)cc1.C[C@H](O)CO.\nAnswer: Dapagliflozin Propanediol is the propanediol form of dapagliflozin, a selective sodium-glucose co-transporter subtype 2 (SGLT2) inhibitor with antihyperglycemic activity. 
Upon administration, dapagliflozin selectively targets and inhibits SGLT2, thereby preventing the reabsorption of glucose by the kidneys."} {"text":"Question: Please describe this drug.\n The Compound can be represented with the SMILES [OH-].[O-2].[O-2].[O-2].[O-2].[O-2].[O-][Si]12O[Si](O1)(O2)[O-].[O-][Si]12O[Si](O1)(O2)[O-].[O-][Si]12O[Si](O1)(O2)[O-].[O-][Si]12O[Si](O1)(O2)[O-].[O-][Si]12O[Si](O1)(O2)[O-].[O-][Si]12O[Si](O1)(O2)[O-].[O-][Si]12O[Si](O1)(O2)[O-].[F-].[Mg+2].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[K+].[K+].[K+].[Fe+2].\nAnswer: illite is a mineral."}", "/scratch/micpie/export/drugchat_liang_zhang_et_al/train_0-4.jsonl": "{"text":"Task: Answer the following question about the molecule with canonical SMILES CCOc1ccc(Cc2cc([C@@H]3O[C@H](CO)[C@@H](O)[C@H](O)[C@H]3O)ccc2Cl)cc1.C[C@H](O)CO.\nDescription: Please describe this drug.\nAnswer: Dapagliflozin Propanediol is the propanediol form of dapagliflozin, a selective sodium-glucose co-transporter subtype 2 (SGLT2) inhibitor with antihyperglycemic activity. 
Upon administration, dapagliflozin selectively targets and inhibits SGLT2, thereby preventing the reabsorption of glucose by the kidneys."} {"text":"Task: Answer the following question about the molecule with SELFIES [OH1-1].[O-2].[O-2].[O-2].[O-2].[O-2].[O-1][Si][O][Si][Branch1][Ring2][O][Ring1][Ring2][Branch1][Ring2][O][Ring1][Branch1][O-1].[O-1][Si][O][Si][Branch1][Ring2][O][Ring1][Ring2][Branch1][Ring2][O][Ring1][Branch1][O-1].[O-1][Si][O][Si][Branch1][Ring2][O][Ring1][Ring2][Branch1][Ring2][O][Ring1][Branch1][O-1].[O-1][Si][O][Si][Branch1][Ring2][O][Ring1][Ring2][Branch1][Ring2][O][Ring1][Branch1][O-1].[O-1][Si][O][Si][Branch1][Ring2][O][Ring1][Ring2][Branch1][Ring2][O][Ring1][Branch1][O-1].[O-1][Si][O][Si][Branch1][Ring2][O][Ring1][Ring2][Branch1][Ring2][O][Ring1][Branch1][O-1].[O-1][Si][O][Si][Branch1][Ring2][O][Ring1][Ring2][Branch1][Ring2][O][Ring1][Branch1][O-1].[F-1].[Mg+2].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[Al+3].[K+1].[K+1].[K+1].[Fe+2].\nDescription: Please describe this drug.\nResult: illite is a mineral."}", "/scratch/micpie/export/drugchat_liang_zhang_et_al/valid_0-3.jsonl": "{"text":"User: I want to know more about the molecule with InChI InChI=1S\/C19H18F2N4O3.ClH\/c1-25-17-11(10-22-19(24-17)23-13-4-6-27-7-5-13)8-16(18(25)26)28-15-3-2-12(20)9-14(15)21;\/h2-3,8-10,13H,4-7H2,1H3,(H,22,23,24);1H.\nAssistant: Interesting, how can I help?\nUser: Determine the type of availability for this drug.\nAssistant: unknown"} {"text":"User: I want to know more about the chemical with SMILES CC(=O)N[C@@H]1C[C@H](NC(C)(C)C)CC[C@@H]1N1CC[C@H](Nc2ncnc3cc(C(C)(C)C)nn23)C1=O.\nAssistant: Interesting, how can I help?\nUser: Please provide a description of this drug's mechanism of action.\nAssistant: The answer is C-C chemokine receptor type 2 antagonist"}", "/scratch/micpie/export/drugchat_liang_zhang_et_al/test_0-4.jsonl": "{"text":"Task: Answer the following question about the molecule with canonical SMILES 
CC(C)c1nc(COC(N)=O)n(Cc2ccncc2)c1Sc1cc(Cl)cc(Cl)c1.\nQuestion: How many hydrogen bond donors does this compound have?\nAnswer: 1"} {"text":"Task: Answer the following question about the molecule with DeepSMILES CCNCC))C=O)CNC=O)cccOC))cOC))cOC))c6.\nQuestion: How many hydrogen bond donors does this compound have?\nCompletion: 1"}", "/scratch/micpie/export/block_polymers_morphology/test_0-1.jsonl": "{"text":"Question: If I have a copolymer with number-average molar mass (Mn) of 12000 g\/mol, volume fraction of block type 1 of 0.50, average molecular mass of 12960.0 g\/mol, and dispersity of 1.1, what phase will it be in at 140 K?\nAnswer: The polymer will be in the disordered phase."} {"text":"Question: If I have a copolymer with Mn of 8400 g\/mol, volume fraction of block type 1 of 0.70, what phase will it be in at 223 K?\nAnswer: The polymer will be in the gyroid phase."}", "/scratch/micpie/export/block_polymers_morphology/valid_0-0.jsonl": "{"text":"The di-block copolymer with BigSMILES {[][$]CC(c1ccccc1)[$][$]}{[>][<][Si](C)(C)O[>][]}, number-average molar mass (Mn) of 66577 g\/mol, volume fraction of block type 1 of 0.16, average molecular mass of 73900.0 g\/mol, and dispersity of 1.1 was studied at 25 K and found to be in the cylinder phase."} {"text":"The polymer with BigSMILES {[][<]CCO[>][<]}{[$][$]CCCC[$],[$]CC(CC)[$][]}, Mn of 8400 g\/mol, volume fraction of block type 1 of 0.70 was measured at 219 K and found to be in the gyroid phase."}", "/scratch/micpie/export/block_polymers_morphology/test_0-2.jsonl": "{"text":"User: I want to design a polymer with a particular number-average molar mass (Mn), volume fraction of block type 1, and phase.\nAssistant: I would need to know the number-average molar mass (Mn), volume fraction of block type 1, and phase of the polymer you want to design.\nUser: The number-average molar mass (Mn) should be 12000 g\/mol, the volume fraction of block type 1 should be 0.50, and the phase should be disordered.\nAssistant: I 
propose the polymer with BigSMILES CCC(C){[$][$]CC(C1CCCCC1)[$][$]}{[$][$]CCCC[$],[$]CC(CC)[$][$]}[H], average molecular mass of 12960.0 g\/mol, and dispersity of 1.1."} {"text":"User: I want to design a copolymer with a particular Mn, volume fraction of block type 1, and experimentally observed phase.\nAssistant: Cool, I would need to know the Mn, volume fraction of block type 1, and experimentally observed phase of the polymer you want to design.\nUser: The Mn should be 8400 g\/mol, the volume fraction of block type 1 should be 0.70, and the experimentally observed phase should be gyroid.\nAssistant: I advise the copolymer with BigSMILES {[][<]CCO[>][<]}{[$][$]CCCC[$],[$]CC(CC)[$][]}."}", "/scratch/micpie/export/block_polymers_morphology/test_0-0.jsonl": "{"text":"The di-block copolymer with BigSMILES CCC(C){[$][$]CC(C1CCCCC1)[$][$]}{[$][$]CCCC[$],[$]CC(CC)[$][$]}[H], number-average molar mass (Mn) of 12000 g\/mol, volume fraction of block type 1 of 0.50, average molecular mass of 12960.0 g\/mol, and dispersity of 1.1 was measured at 140 K and found to be in the disordered phase."} {"text":"The di-block copolymer with BigSMILES {[][<]CCO[>][<]}{[$][$]CCCC[$],[$]CC(CC)[$][]}, number-average molar mass (Mn) of 8400 g\/mol, volume fraction of block type 1 of 0.70 was measured at 223 K and found to be in the gyroid phase."}", "/scratch/micpie/export/block_polymers_morphology/test_0-3.jsonl": "{"text":"User: I want to design a di-block copolymer that is in the disordered phase.\nAssistant: That sounds interesting, Do you have any other constraints?\nUser: The Mn should be 12000 g\/mol, the volume fraction of block type 1 should be 0.50.\nAssistant: I propose the di-block copolymer with BigSMILES CCC(C){[$][$]CC(C1CCCCC1)[$][$]}{[$][$]CCCC[$],[$]CC(CC)[$][$]}[H], average molecular mass of 12960.0 g\/mol, and dispersity of 1.1."} {"text":"User: I want to design a di-block copolymer that is in the gyroid phase.\nAssistant: Awesome, Do you have any other 
constraints?\nUser: The number-average molar mass (Mn) should be 8400 g\/mol, the volume fraction of block type 1 should be 0.70.\nAssistant: I recommend the di-block copolymer with BigSMILES {[][<]CCO[>][<]}{[$][$]CCCC[$],[$]CC(CC)[$][]}."}", "/scratch/micpie/export/block_polymers_morphology/train_0-0.jsonl": "{"text":"The copolymer with BigSMILES {[][$]CC(c1ccccc1)[$][$]}{[>][<][Si](C)(C)O[>][]}, Mn of 60804 g\/mol, volume fraction of block type 1 of 0.08, average molecular mass of 68100.0 g\/mol, and dispersity of 1.1 was measured at 25 K and found to be in the sphere phase."} {"text":"The di-block copolymer with BigSMILES {[][<]CCO[>][<]}{[$][$]CCCC[$],[$]CC(CC)[$][]}, number-average molar mass (Mn) of 8400 g\/mol, volume fraction of block type 1 of 0.70 was studied at 227 K and found to be in the gyroid phase."}", "/scratch/micpie/export/block_polymers_morphology/train_0-3.jsonl": "{"text":"User: I want to design a polymer that is in the sphere phase.\nAssistant: Great, What else should I take into account?\nUser: The number-average molar mass should be 60804 g\/mol, the volume fraction of block type 1 should be 0.08.\nAssistant: I advise the polymer with BigSMILES {[][$]CC(c1ccccc1)[$][$]}{[>][<][Si](C)(C)O[>][]}, average molecular mass of 68100.0 g\/mol, and dispersity of 1.1."} {"text":"User: I want to design a copolymer that is in the gyroid phase.\nAssistant: That sounds interesting, What else should I take into account?\nUser: The Mn should be 8400 g\/mol, the volume fraction of block type 1 should be 0.70.\nAssistant: I recommend the copolymer with BigSMILES {[][<]CCO[>][<]}{[$][$]CCCC[$],[$]CC(CC)[$][]}."}", "/scratch/micpie/export/block_polymers_morphology/valid_0-2.jsonl": "{"text":"User: I want to design a polymer with a particular number-average molar mass (Mn), volume fraction of block type 1, and phase.\nAssistant: Awesome, I would need to know the number-average molar mass (Mn), volume fraction of block type 1, and phase of the polymer you want 
to design.\nUser: The number-average molar mass (Mn) should be 66577 g\/mol, the volume fraction of block type 1 should be 0.16, and the phase should be cylinder.\nAssistant: I advise the polymer with BigSMILES {[][$]CC(c1ccccc1)[$][$]}{[>][<][Si](C)(C)O[>][]}, average molecular mass of 73900.0 g\/mol, and dispersity of 1.1."} {"text":"User: I want to design a polymer with a particular Mn, volume fraction of block type 1, and experimentally observed phase.\nAssistant: Great, I would need to know the Mn, volume fraction of block type 1, and experimentally observed phase of the polymer you want to design.\nUser: The Mn should be 8400 g\/mol, the volume fraction of block type 1 should be 0.70, and the experimentally observed phase should be gyroid.\nAssistant: I advise the polymer with BigSMILES {[][<]CCO[>][<]}{[$][$]CCCC[$],[$]CC(CC)[$][]}."}", "/scratch/micpie/export/block_polymers_morphology/valid_0-1.jsonl": "{"text":"Question: If I have a di-block copolymer with number-average molar mass (Mn) of 66577 g\/mol, volume fraction of block type 1 of 0.16, average molecular mass of 73900.0 g\/mol, and dispersity of 1.1, what phase will it be in at 25 K?\nAnswer: The polymer will be in the cylinder phase."} {"text":"Question: If I have a polymer with number-average molar mass of 8400 g\/mol, volume fraction of block type 1 of 0.70, what phase will it be in at 219 K?\nAnswer: The polymer will be in the gyroid phase."}", "/scratch/micpie/export/block_polymers_morphology/train_0-2.jsonl": "{"text":"User: I want to design a di-block copolymer with a particular number-average molar mass, volume fraction of block type 1, and experimentally observed phase.\nAssistant: I would need to know the number-average molar mass, volume fraction of block type 1, and experimentally observed phase of the polymer you want to design.\nUser: The number-average molar mass should be 60804 g\/mol, the volume fraction of block type 1 should be 0.08, and the experimentally observed phase should be 
sphere.\nAssistant: I recommend the di-block copolymer with BigSMILES {[][$]CC(c1ccccc1)[$][$]}{[>][<][Si](C)(C)O[>][]}, average molecular mass of 68100.0 g\/mol, and dispersity of 1.1."} {"text":"User: I want to design a copolymer with a particular Mn, volume fraction of block type 1, and phase.\nAssistant: Awesome, I would need to know the Mn, volume fraction of block type 1, and phase of the polymer you want to design.\nUser: The Mn should be 8400 g\/mol, the volume fraction of block type 1 should be 0.70, and the phase should be gyroid.\nAssistant: I advise the copolymer with BigSMILES {[][<]CCO[>][<]}{[$][$]CCCC[$],[$]CC(CC)[$][]}."}", "/scratch/micpie/export/block_polymers_morphology/train_0-1.jsonl": "{"text":"Question: If I have a polymer with Mn of 60804 g\/mol, volume fraction of block type 1 of 0.08, average molecular mass of 68100.0 g\/mol, and dispersity of 1.1, what phase will it be in at 25 K?\nAnswer: The polymer will be in the sphere phase."} {"text":"Question: If I have a di-block copolymer with number-average molar mass of 8400 g\/mol, volume fraction of block type 1 of 0.70, what phase will it be in at 227 K?\nAnswer: The polymer will be in the gyroid phase."}", "/scratch/micpie/export/block_polymers_morphology/valid_0-3.jsonl": "{"text":"User: I want to design a copolymer that is in the cylinder phase.\nAssistant: That sounds interesting, What else should I take into account?\nUser: The number-average molar mass (Mn) should be 66577 g\/mol, the volume fraction of block type 1 should be 0.16.\nAssistant: I propose the copolymer with BigSMILES {[][$]CC(c1ccccc1)[$][$]}{[>][<][Si](C)(C)O[>][]}, average molecular mass of 73900.0 g\/mol, and dispersity of 1.1."} {"text":"User: I want to design a polymer that is in the gyroid phase.\nAssistant: Great, Do you have any other constraints?\nUser: The number-average molar mass (Mn) should be 8400 g\/mol, the volume fraction of block type 1 should be 0.70.\nAssistant: I propose the polymer with BigSMILES 
{[][<]CCO[>][<]}{[$][$]CCCC[$],[$]CC(CC)[$][]}."}", "/scratch/micpie/export/mp_anisotropy/test_0-5.jsonl": "{"text":"Task: Please give me a with a elastic anisotropy index derived from DFT simulations with the PBE functional of 0.258.\nResult: Ti2MnIr"} {"text":"Task: Please create a compound with a elastic anisotropy index computed using DFT with the PBE GGA functional of 0.584.\nResult: ScInIr2"}", "/scratch/micpie/export/mp_anisotropy/test_0-1.jsonl": "{"text":"Question: How large is the elastic anisotropy index derived from DFT simulations with the PBE functional of Ti2MnIr?\nAnswer: The elastic anisotropy index derived from DFT simulations with the PBE functional of Ti2MnIr is 0.258."} {"text":"Question: How large is the elastic anisotropy index computed using DFT with the PBE functional of the solid ScInIr2?\nAnswer: The elastic anisotropy index computed using DFT with the PBE functional of the solid ScInIr2 is 0.584."}", "/scratch/micpie/export/mp_anisotropy/valid_0-0.jsonl": "{"text":"The elastic anisotropy index computed using DFT with the PBE GGA functional of the compound ErTlTe2 is 1.080."} {"text":"The elastic anisotropy index computed using DFT with the PBE GGA functional of YCuSi is 0.700."}", "/scratch/micpie/export/mp_anisotropy/test_0-2.jsonl": "{"text":"User: I want to know the elastic anisotropy index derived from DFT simulations with the PBE functional of Ti2MnIr.\nAssistant: The elastic anisotropy index derived from DFT simulations with the PBE functional of Ti2MnIr is 0.258."} {"text":"User: I would like to know the elastic anisotropy index computed using DFT with the PBE functional of the compound ScInIr2.\nAssistant: The elastic anisotropy index computed using DFT with the PBE functional of the compound ScInIr2 is 0.584."}", "/scratch/micpie/export/mp_anisotropy/test_0-0.jsonl": "{"text":"The elastic anisotropy index computed using DFT with the PBE functional of Ti2MnIr is 0.258."} {"text":"The elastic anisotropy index derived from DFT 
simulations with the PBE functional of the solid ScInIr2 is 0.584."}", "/scratch/micpie/export/mp_anisotropy/test_0-3.jsonl": "{"text":"User: I would like to design a solid with a elastic anisotropy index computed using DFT with the PBE GGA functional of 0.258.\nAssistant: Here is a solid with a elastic anisotropy index computed using DFT with the PBE GGA functional of 0.258: Ti2MnIr."} {"text":"User: I want to design a with a elastic anisotropy index computed using DFT with the PBE GGA functional of 0.584.\nAssistant: Here is a with a elastic anisotropy index computed using DFT with the PBE GGA functional of 0.584: ScInIr2."}", "/scratch/micpie/export/mp_anisotropy/train_0-0.jsonl": "{"text":"The elastic anisotropy index computed using DFT with the PBE GGA functional of NbB2 is 0.280."} {"text":"The elastic anisotropy index computed using DFT with the PBE GGA functional of the solid Pt3O4 is 0.165."}", "/scratch/micpie/export/mp_anisotropy/train_0-3.jsonl": "{"text":"User: I would like to design a material with a elastic anisotropy index derived from DFT simulations with the PBE functional of 0.280.\nAssistant: I have found a material with a elastic anisotropy index derived from DFT simulations with the PBE functional of 0.280: NbB2."} {"text":"User: I would like to design a material with a elastic anisotropy index computed using DFT with the PBE GGA functional of 0.165.\nAssistant: Here is a material with a elastic anisotropy index computed using DFT with the PBE GGA functional of 0.165: Pt3O4."}", "/scratch/micpie/export/mp_anisotropy/valid_0-2.jsonl": "{"text":"User: I would like to know the elastic anisotropy index computed using DFT with the PBE GGA functional of the compound ErTlTe2.\nAssistant: The elastic anisotropy index computed using DFT with the PBE GGA functional of the compound ErTlTe2 is 1.080."} {"text":"User: I want to know the elastic anisotropy index derived from DFT simulations with the PBE functional of the solid YCuSi.\nAssistant: The elastic 
anisotropy index derived from DFT simulations with the PBE functional of the solid YCuSi is 0.700."}", "/scratch/micpie/export/mp_anisotropy/valid_0-1.jsonl": "{"text":"Question: How large is the elastic anisotropy index computed using DFT with the PBE GGA functional of ErTlTe2?\nAnswer: The elastic anisotropy index computed using DFT with the PBE GGA functional of ErTlTe2 is 1.080."} {"text":"Question: How large is the elastic anisotropy index computed using DFT with the PBE functional of the solid YCuSi?\nAnswer: The elastic anisotropy index computed using DFT with the PBE functional of the solid YCuSi is 0.700."}", "/scratch/micpie/export/mp_anisotropy/valid_0-5.jsonl": "{"text":"Task: Please give me a solid with a elastic anisotropy index computed using DFT with the PBE GGA functional of 1.080.\nResult: ErTlTe2"} {"text":"Task: Please create a solid with a elastic anisotropy index computed using DFT with the PBE GGA functional of 0.700.\nResult: YCuSi"}", "/scratch/micpie/export/mp_anisotropy/valid_0-4.jsonl": "{"text":"A with a elastic anisotropy index derived from DFT simulations with the PBE functional of 1.080 is ErTlTe2."} {"text":"A compound with a elastic anisotropy index computed using DFT with the PBE GGA functional of 0.700 is YCuSi."}", "/scratch/micpie/export/mp_anisotropy/train_0-5.jsonl": "{"text":"Task: Please give me a material with a elastic anisotropy index computed using DFT with the PBE GGA functional of 0.280.\nResult: NbB2"} {"text":"Task: Please give me a material with a elastic anisotropy index computed using DFT with the PBE functional of 0.165.\nResult: Pt3O4"}", "/scratch/micpie/export/mp_anisotropy/train_0-2.jsonl": "{"text":"User: I would like to know the elastic anisotropy index computed using DFT with the PBE GGA functional of NbB2.\nAssistant: The elastic anisotropy index computed using DFT with the PBE GGA functional of NbB2 is 0.280."} {"text":"User: I want to know the elastic anisotropy index computed using DFT with the PBE 
functional of the solid Pt3O4.\nAssistant: The elastic anisotropy index computed using DFT with the PBE functional of the solid Pt3O4 is 0.165."}", "/scratch/micpie/export/mp_anisotropy/train_0-1.jsonl": "{"text":"Question: How large is the elastic anisotropy index derived from DFT simulations with the PBE functional of the compound NbB2?\nAnswer: The elastic anisotropy index derived from DFT simulations with the PBE functional of the compound NbB2 is 0.280."} {"text":"Question: How large is the elastic anisotropy index computed using DFT with the PBE functional of the compound Pt3O4?\nAnswer: The elastic anisotropy index computed using DFT with the PBE functional of the compound Pt3O4 is 0.165."}", "/scratch/micpie/export/mp_anisotropy/train_0-4.jsonl": "{"text":"A compound with a elastic anisotropy index computed using DFT with the PBE functional of 0.280 is NbB2."} {"text":"A solid with a elastic anisotropy index computed using DFT with the PBE functional of 0.165 is Pt3O4."}", "/scratch/micpie/export/mp_anisotropy/valid_0-3.jsonl": "{"text":"User: I would like to design a material with a elastic anisotropy index computed using DFT with the PBE functional of 1.080.\nAssistant: I have found a material with a elastic anisotropy index computed using DFT with the PBE functional of 1.080: ErTlTe2."} {"text":"User: I would like to design a compound with a elastic anisotropy index computed using DFT with the PBE GGA functional of 0.700.\nAssistant: I have found a compound with a elastic anisotropy index computed using DFT with the PBE GGA functional of 0.700: YCuSi."}", "/scratch/micpie/export/mp_anisotropy/test_0-4.jsonl": "{"text":"A material with a elastic anisotropy index derived from DFT simulations with the PBE functional of 0.258 is Ti2MnIr."} {"text":"A solid with a elastic anisotropy index computed using DFT with the PBE functional of 0.584 is ScInIr2."}", "/scratch/micpie/export/melting_points/test_0-10.jsonl": "{"text":"User: I have a question about 
1-chloro-3-methyl-5-nitro-isoquinoline.\nAssistant: That sounds interesting, how can I help?\nUser: What is the melting point of this compound?\nAssistant: 112.0 deg C. Do you have any other questions for me?\nUser: Indeed, what is the canonical SMILES of this compound?\nAssistant: Cc1cc2c([N+](=O)[O-])cccc2c(Cl)n1"} {"text":"User: I have a question about 7-[4-(Benzyloxy)phenoxy]heptanal.\nAssistant: How can I help?\nUser: What is the melting point of this molecule?\nAssistant: The melting point is 65.0 - 67.0 deg C. Do you have any other questions?\nUser: Yes, what is the InChI of this molecule?\nAssistant: InChI=1S\/C20H24O3\/c21-15-7-2-1-3-8-16-22-19-11-13-20(14-12-19)23-17-18-9-5-4-6-10-18\/h4-6,9-15H,1-3,7-8,16-17H2"}", "/scratch/micpie/export/melting_points/valid_0-8.jsonl": "{"text":"User: I have a question about 1-(2,6-Diethylphenyl)pyrrol-2-yl 2-piperidylmethyl ketone.\nAssistant: Interesting, how can I help?\nUser: What is the melting point of this compound?\nAssistant: The melting point is 163.000 deg C."} {"text":"User: I have a question about N-Fluorenylmethoxycarbonyl-S-[2,3-bis(palmitoyloxy)-(2RS)-propyl]-(R)-cysteine.\nAssistant: Interesting, how can I help?\nUser: What is the melting point of this molecule?\nAssistant: 83.000 deg C."}", "/scratch/micpie/export/melting_points/train_0-8.jsonl": "{"text":"User: I have a question about 1-n-butyl-5-nitro-isoquinoline.\nAssistant: Interesting, how can I help?\nUser: What is the melting point of this compound?\nAssistant: The melting point is 69.250 deg C."} {"text":"User: I have a question about 2,5-Di(2,2-diethoxyethyl)-1,4-diketo-3,6-di(5,7-di-t-butyl-2-benzofuryl)pyrrolo[3,4-c]pyrrole.\nAssistant: Interesting, how can I help?\nUser: What is the melting point of this molecule?\nAssistant: The melting point is 195.000 deg C."}", "/scratch/micpie/export/melting_points/test_0-5.jsonl": "{"text":"Q: What is the melting point of a molecule with the DeepSMILES 
ClC=NC=CC=CC=CC=C%106))))[N+]=O)[O-])))))C?\n112.0 deg C."} {"text":"Question: What is the melting point of a molecule with the DeepSMILES CC=CC=CC=C6))))))OC=CC=COCCCCCCC=O)))))))))C=C6?\nAnswer: 65.0 - 67.0 deg C."}", "/scratch/micpie/export/melting_points/valid_0-9.jsonl": "{"text":"User: I have a question about a molecule with the SELFIES [N][C][Branch1][#Branch1][C][C][C][C][Ring1][=Branch1][C][C][=Branch1][C][=O][C][N][Branch1][=Branch1][C][=C][C][=Ring1][Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][Ring1][=Branch1][C][C][C][C].\nAssistant: That sounds interesting, how can I help?\nUser: What is the melting point of this compound?\nAssistant: The melting point is 163.000 deg C. Is there anything else I can help you with?\nUser: Yes, what is the name of this compound?\nAssistant: 1-(2,6-Diethylphenyl)pyrrol-2-yl 2-piperidylmethyl ketone"} {"text":"User: I have a question about a molecule with the SMILES C1(=CC=CC=2C3=CC=CC=C3CC12)COC(=O)NC(CSCC(COC(CCCCCCCCCCCCCCC)=O)OC(CCCCCCCCCCCCCCC)=O)C(=O)O.\nAssistant: Sure, what is your question?\nUser: What is the melting point of this compound?\nAssistant: 83.000 deg C. 
Is there anything else I can help you with?\nUser: Yes, what is the name of this compound?\nAssistant: N-Fluorenylmethoxycarbonyl-S-[2,3-bis(palmitoyloxy)-(2RS)-propyl]-(R)-cysteine"}", "/scratch/micpie/export/melting_points/test_0-1.jsonl": "{"text":"Predict the melting point of a molecule with the SMILES ClC1=NC(=CC2=C(C=CC=C12)[N+](=O)[O-])C?\nA: The melting point is 112.000 deg C."} {"text":"Task: Predict the melting point of a molecule with the SELFIES [C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][=C][C][=C][Branch1][#Branch2][O][C][C][C][C][C][C][C][=O][C][=C][Ring1][#C]?\nAnswer: The melting point is 66.000 deg C."}", "/scratch/micpie/export/melting_points/valid_0-0.jsonl": "{"text":"Estimate the melting point of 1-(2,6-Diethylphenyl)pyrrol-2-yl 2-piperidylmethyl ketone.\nA: The melting point is 163.000 deg C."} {"text":"Task: Estimate the melting point of N-Fluorenylmethoxycarbonyl-S-[2,3-bis(palmitoyloxy)-(2RS)-propyl]-(R)-cysteine.\nThe melting point is 83.000 deg C."}", "/scratch/micpie/export/melting_points/test_0-2.jsonl": "{"text":"Q: What is the melting point of 1-chloro-3-methyl-5-nitro-isoquinoline?\nAnswer: 112.000 deg C."} {"text":"Q: What is the melting point of 7-[4-(Benzyloxy)phenoxy]heptanal?\nThe melting point is 66.000 deg C."}", "/scratch/micpie/export/melting_points/valid_0-10.jsonl": "{"text":"User: I have a question about 1-(2,6-Diethylphenyl)pyrrol-2-yl 2-piperidylmethyl ketone.\nAssistant: How can I help?\nUser: What is the melting point of this molecule?\nAssistant: 162.0 - 164.0 deg C. 
Do you have any other questions for me?\nUser: Yes, what is the InChI of this molecule?\nAssistant: InChI=1S\/C21H28N2O\/c1-3-16-9-7-10-17(4-2)21(16)23-14-8-12-19(23)20(24)15-18-11-5-6-13-22-18\/h7-10,12,14,18,22H,3-6,11,13,15H2,1-2H3"} {"text":"User: I have a question about N-Fluorenylmethoxycarbonyl-S-[2,3-bis(palmitoyloxy)-(2RS)-propyl]-(R)-cysteine.\nAssistant: How can I help?\nUser: What is the melting point of this molecule?\nAssistant: 82.0 - 84.0 deg C. Do you have any other questions?\nUser: Yes, what is the DeepSMILES of this molecule?\nAssistant: C=CC=CCC=CC=CC=C6CC%13=9))))))))))))COC=O)NCCSCCCOCCCCCCCCCCCCCCCC)))))))))))))))=O))))OCCCCCCCCCCCCCCCC)))))))))))))))=O)))))))C=O)O"}", "/scratch/micpie/export/melting_points/train_0-6.jsonl": "{"text":"Question: What is a compound with a melting point of 69.250 deg C?\nAnswer: 1-n-butyl-5-nitro-isoquinoline"} {"text":"Question: What is a compound with a melting point of 195.000 deg C?\n2,5-Di(2,2-diethoxyethyl)-1,4-diketo-3,6-di(5,7-di-t-butyl-2-benzofuryl)pyrrolo[3,4-c]pyrrole"}", "/scratch/micpie/export/melting_points/valid_0-6.jsonl": "{"text":"Question: What is a compound with a melting point of 163.000 deg C?\nA: 1-(2,6-Diethylphenyl)pyrrol-2-yl 2-piperidylmethyl ketone"} {"text":"Q: What is a compound with a melting point of 83.000 deg C?\nAnswer: N-Fluorenylmethoxycarbonyl-S-[2,3-bis(palmitoyloxy)-(2RS)-propyl]-(R)-cysteine"}", "/scratch/micpie/export/melting_points/test_0-9.jsonl": "{"text":"User: I have a question about a compound with the SMILES ClC1=NC(=CC2=C(C=CC=C12)[N+](=O)[O-])C.\nAssistant: That sounds interesting, how can I help?\nUser: What is the melting point of this molecule?\nAssistant: The melting point is 112.000 deg C. 
Is there anything else I can help you with?\nUser: Indeed, what is the name of this molecule?\nAssistant: 1-chloro-3-methyl-5-nitro-isoquinoline"} {"text":"User: I have a question about a compound with the InChI InChI=1S\/C20H24O3\/c21-15-7-2-1-3-8-16-22-19-11-13-20(14-12-19)23-17-18-9-5-4-6-10-18\/h4-6,9-15H,1-3,7-8,16-17H2.\nAssistant: Sure, what is your question?\nUser: What is the melting point of this compound?\nAssistant: The melting point is 66.000 deg C. Do you have any other questions for me today?\nUser: Yes, what is the name of this compound?\nAssistant: 7-[4-(Benzyloxy)phenoxy]heptanal"}", "/scratch/micpie/export/melting_points/test_0-0.jsonl": "{"text":"Task: Estimate the melting point of 1-chloro-3-methyl-5-nitro-isoquinoline.\nAnswer: The melting point is 112.000 deg C."} {"text":"Task: Estimate the melting point of 7-[4-(Benzyloxy)phenoxy]heptanal.\nAnswer: The melting point is 66.000 deg C."}", "/scratch/micpie/export/melting_points/valid_0-7.jsonl": "{"text":"Q: What is a compound with a melting point in the range 162.0 - 164.0 deg C?\nA: 1-(2,6-Diethylphenyl)pyrrol-2-yl 2-piperidylmethyl ketone"} {"text":"Q: What is a compound with a melting point in the range 82.0 - 84.0 deg C?\nAnswer: N-Fluorenylmethoxycarbonyl-S-[2,3-bis(palmitoyloxy)-(2RS)-propyl]-(R)-cysteine"}", "/scratch/micpie/export/melting_points/test_0-3.jsonl": "{"text":"Question: What is the melting point of a compound with the SMILES ClC1=NC(=CC2=C(C=CC=C12)[N+](=O)[O-])C?\nA: The melting point is 112.000 deg C."} {"text":"Q: What is the melting point of a molecule with the InChI InChI=1S\/C20H24O3\/c21-15-7-2-1-3-8-16-22-19-11-13-20(14-12-19)23-17-18-9-5-4-6-10-18\/h4-6,9-15H,1-3,7-8,16-17H2?\nA: The melting point is 66.000 deg C."}", "/scratch/micpie/export/melting_points/train_0-0.jsonl": "{"text":"Task: Predict the melting point of 1-n-butyl-5-nitro-isoquinoline.\nThe melting point is 69.250 deg C."} {"text":"Predict the melting point of 
2,5-Di(2,2-diethoxyethyl)-1,4-diketo-3,6-di(5,7-di-t-butyl-2-benzofuryl)pyrrolo[3,4-c]pyrrole.\nAnswer: The melting point is 195.000 deg C."}", "/scratch/micpie/export/melting_points/test_0-6.jsonl": "{"text":"Q: What is a compound with a melting point of 112.000 deg C?\nA: 1-chloro-3-methyl-5-nitro-isoquinoline"} {"text":"Q: What is a compound with a melting point of 66.000 deg C?\nA: 7-[4-(Benzyloxy)phenoxy]heptanal"}", "/scratch/micpie/export/melting_points/train_0-10.jsonl": "{"text":"User: I have a question about 1-n-butyl-5-nitro-isoquinoline.\nAssistant: That sounds interesting, how can I help?\nUser: What is the melting point of this molecule?\nAssistant: The melting point is 69.0 - 69.5 deg C. Do you have any other questions for me?\nUser: Indeed, what is the canonical SMILES of this molecule?\nAssistant: CCCCc1nccc2c([N+](=O)[O-])cccc12"} {"text":"User: I have a question about 2,5-Di(2,2-diethoxyethyl)-1,4-diketo-3,6-di(5,7-di-t-butyl-2-benzofuryl)pyrrolo[3,4-c]pyrrole.\nAssistant: That sounds interesting, how can I help?\nUser: What is the melting point of this molecule?\nAssistant: 194.0 - 196.0 deg C. 
Do you have any other questions for me today?\nUser: Yes, what is the SELFIES of this molecule?\nAssistant: [C][Branch1][C][C][O][C][Branch2][Branch2][Branch2][C][N][C][Branch2][#Branch1][P][C][=C][Branch2][Branch1][O][N][Branch2][Ring2][=N][C][Branch2][Ring2][Branch2][C][Ring1][Branch1][=C][Ring1][Branch2][C][O][C][=C][Branch1][Ring2][C][=Ring1][Branch1][C][=C][Branch1][=N][C][=C][Ring1][#Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Branch1][C][C][Branch1][C][C][C][=O][C][C][Branch1][Ring2][O][C][C][O][C][C][C][O][C][=C][Branch1][Ring2][C][=Ring1][Branch1][C][=C][Branch1][=N][C][=C][Ring1][#Branch1][C][Branch1][C][C][Branch1][C][C][C][C][Branch1][C][C][Branch1][C][C][C][=O][O][C][C]"}", "/scratch/micpie/export/melting_points/train_0-3.jsonl": "{"text":"Question: What is the melting point of a molecule with the DeepSMILES CCCC)))C=NC=CC=CC=CC=C%106))))[N+]=O)[O-]?\nA: 69.250 deg C."} {"text":"Q: What is the melting point of a compound with the DeepSMILES CC)OCCNCC=CNCC5=C8COC=CC=5)C=CC=C6CC)C)C))))CC)C)C))))))))))=O))CCOCC)))OCC))))))COC=CC=5)C=CC=C6CC)C)C))))CC)C)C))))))))))=O))))OCC?\n195.000 deg C."}", "/scratch/micpie/export/melting_points/valid_0-2.jsonl": "{"text":"Question: What is the melting point of 1-(2,6-Diethylphenyl)pyrrol-2-yl 2-piperidylmethyl ketone?\nA: 163.000 deg C."} {"text":"Q: What is the melting point of N-Fluorenylmethoxycarbonyl-S-[2,3-bis(palmitoyloxy)-(2RS)-propyl]-(R)-cysteine?\nA: 83.000 deg C."}", "/scratch/micpie/export/melting_points/valid_0-1.jsonl": "{"text":"Task: Estimate the melting point of a molecule with the SELFIES [N][C][Branch1][#Branch1][C][C][C][C][Ring1][=Branch1][C][C][=Branch1][C][=O][C][N][Branch1][=Branch1][C][=C][C][=Ring1][Branch1][C][=C][Branch1][=Branch2][C][=C][C][=C][Ring1][=Branch1][C][C][C][C]?\nA: 163.000 deg C."} {"text":"Estimate the melting point of a compound with the canonical SMILES CCCCCCCCCCCCCCCC(=O)OCC(CSCC(NC(=O)OCc1cccc2c1Cc1ccccc1-2)C(=O)O)OC(=O)CCCCCCCCCCCCCCC?\nAnswer: The melting point 
is 83.000 deg C."}", "/scratch/micpie/export/melting_points/valid_0-5.jsonl": "{"text":"Q: What is the melting point of a compound with the SMILES N1C(CCCC1)CC(=O)C=2N(C=CC2)C3=C(C=CC=C3CC)CC?\nA: The melting point is in the range 162.0 - 164.0 deg C."} {"text":"Question: What is the melting point of a molecule with the canonical SMILES CCCCCCCCCCCCCCCC(=O)OCC(CSCC(NC(=O)OCc1cccc2c1Cc1ccccc1-2)C(=O)O)OC(=O)CCCCCCCCCCCCCCC?\nA: 82.0 - 84.0 deg C."}", "/scratch/micpie/export/melting_points/valid_0-4.jsonl": "{"text":"Q: What is the melting point of 1-(2,6-Diethylphenyl)pyrrol-2-yl 2-piperidylmethyl ketone?\n162.0 - 164.0 deg C."} {"text":"Q: What is the melting point of N-Fluorenylmethoxycarbonyl-S-[2,3-bis(palmitoyloxy)-(2RS)-propyl]-(R)-cysteine?\n82.0 - 84.0 deg C."}", "/scratch/micpie/export/melting_points/train_0-5.jsonl": "{"text":"Question: What is the melting point of a molecule with the DeepSMILES CCCC)))C=NC=CC=CC=CC=C%106))))[N+]=O)[O-]?\nA: 69.0 - 69.5 deg C."} {"text":"Question: What is the melting point of a compound with the InChI InChI=1S\/C50H68N2O8\/c1-17-55-37(56-18-2)27-51-41(35-23-29-21-31(47(5,6)7)25-33(43(29)59-35)49(11,12)13)39-40(45(51)53)42(52(46(39)54)28-38(57-19-3)58-20-4)36-24-30-22-32(48(8,9)10)26-34(44(30)60-36)50(14,15)16\/h21-26,37-38H,17-20,27-28H2,1-16H3?\nThe melting point is in the range 194.0 - 196.0 deg C."}", "/scratch/micpie/export/melting_points/train_0-2.jsonl": "{"text":"Question: What is the melting point of 1-n-butyl-5-nitro-isoquinoline?\nThe melting point is 69.250 deg C."} {"text":"Question: What is the melting point of 2,5-Di(2,2-diethoxyethyl)-1,4-diketo-3,6-di(5,7-di-t-butyl-2-benzofuryl)pyrrolo[3,4-c]pyrrole?\nAnswer: 195.000 deg C."}", "/scratch/micpie/export/melting_points/train_0-7.jsonl": "{"text":"Q: What is a compound with a melting point in the range 69.0 - 69.5 deg C?\nA: 1-n-butyl-5-nitro-isoquinoline"} {"text":"Q: What is a compound with a melting point in the range 194.0 - 196.0 deg C?\nA: 
2,5-Di(2,2-diethoxyethyl)-1,4-diketo-3,6-di(5,7-di-t-butyl-2-benzofuryl)pyrrolo[3,4-c]pyrrole"}", "/scratch/micpie/export/melting_points/train_0-1.jsonl": "{"text":"Estimate the melting point of a molecule with the SMILES C(CCC)C1=NC=CC2=C(C=CC=C12)[N+](=O)[O-]?\nA: The melting point is 69.250 deg C."} {"text":"Task: Estimate the melting point of a molecule with the DeepSMILES CC)OCCNCC=CNCC5=C8COC=CC=5)C=CC=C6CC)C)C))))CC)C)C))))))))))=O))CCOCC)))OCC))))))COC=CC=5)C=CC=C6CC)C)C))))CC)C)C))))))))))=O))))OCC?\n195.000 deg C."}", "/scratch/micpie/export/melting_points/train_0-4.jsonl": "{"text":"Question: What is the melting point of 1-n-butyl-5-nitro-isoquinoline?\n69.0 - 69.5 deg C."} {"text":"Question: What is the melting point of 2,5-Di(2,2-diethoxyethyl)-1,4-diketo-3,6-di(5,7-di-t-butyl-2-benzofuryl)pyrrolo[3,4-c]pyrrole?\nAnswer: 194.0 - 196.0 deg C."}", "/scratch/micpie/export/melting_points/test_0-7.jsonl": "{"text":"Question: What is a compound with a melting point in the range 112.0 deg C?\n1-chloro-3-methyl-5-nitro-isoquinoline"} {"text":"Q: What is a compound with a melting point in the range 65.0 - 67.0 deg C?\nA: 7-[4-(Benzyloxy)phenoxy]heptanal"}", "/scratch/micpie/export/melting_points/train_0-9.jsonl": "{"text":"User: I have a question about a compound with the DeepSMILES CCCC)))C=NC=CC=CC=CC=C%106))))[N+]=O)[O-].\nAssistant: How can I help?\nUser: What is the melting point of this compound?\nAssistant: The melting point is 69.250 deg C. Is there anything else I can help you with?\nUser: Yes, what is the name of this compound?\nAssistant: 1-n-butyl-5-nitro-isoquinoline"} {"text":"User: I have a question about a compound with the DeepSMILES CC)OCCNCC=CNCC5=C8COC=CC=5)C=CC=C6CC)C)C))))CC)C)C))))))))))=O))CCOCC)))OCC))))))COC=CC=5)C=CC=C6CC)C)C))))CC)C)C))))))))))=O))))OCC.\nAssistant: Interesting, how can I help?\nUser: What is the melting point of this molecule?\nAssistant: The melting point is 195.000 deg C. 
Do you have any other questions?\nUser: Indeed, what is the name of this molecule?\nAssistant: 2,5-Di(2,2-diethoxyethyl)-1,4-diketo-3,6-di(5,7-di-t-butyl-2-benzofuryl)pyrrolo[3,4-c]pyrrole"}", "/scratch/micpie/export/melting_points/valid_0-3.jsonl": "{"text":"Q: What is the melting point of a compound with the DeepSMILES NCCCCC6))))CC=O)CNC=CC=5)))C=CC=CC=C6CC))))))CC?\nAnswer: 163.000 deg C."} {"text":"Q: What is the melting point of a compound with the canonical SMILES CCCCCCCCCCCCCCCC(=O)OCC(CSCC(NC(=O)OCc1cccc2c1Cc1ccccc1-2)C(=O)O)OC(=O)CCCCCCCCCCCCCCC?\nThe melting point is 83.000 deg C."}", "/scratch/micpie/export/melting_points/test_0-8.jsonl": "{"text":"User: I have a question about 1-chloro-3-methyl-5-nitro-isoquinoline.\nAssistant: Interesting, how can I help?\nUser: What is the melting point of this compound?\nAssistant: The melting point is 112.000 deg C."} {"text":"User: I have a question about 7-[4-(Benzyloxy)phenoxy]heptanal.\nAssistant: Interesting, how can I help?\nUser: What is the melting point of this molecule?\nAssistant: 66.000 deg C."}", "/scratch/micpie/export/melting_points/test_0-4.jsonl": "{"text":"Question: What is the melting point of 1-chloro-3-methyl-5-nitro-isoquinoline?\nA: 112.0 deg C."} {"text":"Q: What is the melting point of 7-[4-(Benzyloxy)phenoxy]heptanal?\nAnswer: 65.0 - 67.0 deg C."}", "/scratch/micpie/export/flashpoint/test_0-10.jsonl": "{"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should have a flash point of 286.150 K.\nAssistant: Got it, this DeepSMILES represents a molecule that has a flash point of 286.150 K: CCCl))Cl"} {"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should have a flash point of 394.175 K.\nAssistant: Got it, here you go, this SELFIES represents a molecule that has a flash point of 394.175 K: [O][=C][Branch1][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1]"}", "/scratch/micpie/export/flashpoint/valid_0-8.jsonl": "{"text":"User: Can you create the canonical SMILES of a molecule that has a flash point of 350.150 K?\nAssistant: Sure, here you go: CC(O)CN"} {"text":"User: Can you give me the InChI of a molecule that has a flash point of 250.125 K?\nAssistant: Sure, here you go: InChI=1S\/C6H14\/c1-3-5-6-4-2\/h3-6H2,1-2H3"}", "/scratch/micpie/export/flashpoint/train_0-8.jsonl": "{"text":"User: Can you create the InChI of a molecule that has a flash point of 467.150 K?\nAssistant: Sure, here you go: InChI=1S\/C6H3ClN2O4\/c7-5-2-1-4(8(10)11)3-6(5)9(12)13\/h1-3H"} {"text":"User: Can you generate the canonical SMILES of a molecule that has a flash point of 376.300 K?\nAssistant: Yes, here you go: COC(=O)CCCC(=O)OC"}", "/scratch/micpie/export/flashpoint/test_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the flash point in K.\nInChI: InChI=1S\/C2H4Cl2\/c3-1-2-4\/h1-2H2\nConstraint: Even if you are uncertain, you must answer with a numeric value in K without the unit and without using any other words.\nResult: 286.150"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the flash point in K.\nMolecule canonical SMILES: O=C(O)c1ccccc1\nConstraint: Even if you are not sure, you must answer with a numeric value in K without the unit and without using any additional words.\nResult: 394.175"}", "/scratch/micpie/export/flashpoint/valid_0-9.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that has a flash point of 350.150 K.\nAssistant: This is a molecule that has a flash point of 350.150 K: CC(CN)O"} {"text":"User: I'm looking for the canonical SMILES of a molecule that has a flash point of 
250.125 K.\nAssistant: This is a molecule that has a flash point of 250.125 K: CCCCCC"}", "/scratch/micpie/export/flashpoint/test_0-1.jsonl": "{"text":"Based on the DeepSMILES CCCl))Cl, the molecule has a flash point of 286.150 K."} {"text":"Based on the SELFIES [O][=C][Branch1][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1], the molecule has a flash point of 394.175 K."}", "/scratch/micpie/export/flashpoint/valid_0-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CC(O)CN has a flash point of 350.150 K."} {"text":"The molecule with the SMILES representation of CCCCCC has a flash point of 250.125 K."}", "/scratch/micpie/export/flashpoint/test_0-2.jsonl": "{"text":"The SMILES C(CCl)Cl represents a molecule that has a flash point of 286.150 K."} {"text":"The InChI InChI=1S\/C7H6O2\/c8-7(9)6-4-2-1-3-5-6\/h1-5H,(H,8,9) represents a molecule with a flash point of 394.175 K."}", "/scratch/micpie/export/flashpoint/valid_0-10.jsonl": "{"text":"User: I want to create a InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should have a flash point of 350.150 K.\nAssistant: Ok, this InChI represents a molecule that has a flash point of 350.150 K: InChI=1S\/C3H9NO\/c1-3(5)2-4\/h3,5H,2,4H2,1H3"} {"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should have a flash point of 250.125 K.\nAssistant: Ok, this SELFIES represents a molecule that has a flash point of 250.125 K: [C][C][C][C][C][C]"}", "/scratch/micpie/export/flashpoint/train_0-6.jsonl": "{"text":"Task: Please give me a molecule canonical SMILES based on the description below.\nDescription: A molecule that has a flash point of 467.150 K.\nResult: O=[N+]([O-])c1ccc(Cl)c([N+](=O)[O-])c1"} {"text":"Task: Please generate a molecule DeepSMILES based on the text description below.\nDescription: A molecule that has a flash point of 376.300 K.\nResult: COC=O)CCCC=O)OC"}", "/scratch/micpie/export/flashpoint/valid_0-6.jsonl": "{"text":"Task: Please give me a SELFIES based on the text description below.\nDescription: A molecule that has a flash point of 350.150 K.\nResult: [C][C][Branch1][Ring1][C][N][O]"} {"text":"Task: Please give me a SELFIES based on the description.\nDescription: A molecule that has a flash point of 250.125 K.\nResult: [C][C][C][C][C][C]"}", "/scratch/micpie/export/flashpoint/test_0-9.jsonl": "{"text":"User: I'm searching for the InChI of a molecule that has a flash point of 286.150 K.\nAssistant: This is a molecule that has a flash point of 286.150 K: InChI=1S\/C2H4Cl2\/c3-1-2-4\/h1-2H2"} {"text":"User: I'm looking for the SELFIES of a molecule that has a flash point of 394.175 K.\nAssistant: This is a molecule that has a flash point of 394.175 K: [O][=C][Branch1][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1]"}", "/scratch/micpie/export/flashpoint/test_0-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of CCCl))Cl has a flash point of 286.150 K."} {"text":"The molecule with the SELFIES [O][=C][Branch1][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1] has a flash point of 394.175 K."}", "/scratch/micpie/export/flashpoint/valid_0-7.jsonl": "{"text":"User: Can you derive the flash point in K of the molecule with the canonical SMILES CC(O)CN?\nAssistant: Sure, this molecule has a flash point of 350.150 K."} 
{"text":"User: Can you derive the flash point in K of the molecule with the InChI InChI=1S\/C6H14\/c1-3-5-6-4-2\/h3-6H2,1-2H3?\nAssistant: Of course, this molecule has a flash point of 250.125 K."}", "/scratch/micpie/export/flashpoint/test_0-3.jsonl": "{"text":"The molecule with the SMILES C(CCl)Cl has a flash point of 286.150 K."} {"text":"The molecule with the DeepSMILES O=CO)cccccc6 has a flash point of 394.175 K."}", "/scratch/micpie/export/flashpoint/valid_0-11.jsonl": "{"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should have a flash point of 350.150 K.\nAssistant: Ok, this canonical SMILES represents a molecule that has a flash point of 350.150 K: CC(O)CN"} {"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should have a flash point of 250.125 K.\nAssistant: Got it, this SELFIES represents a molecule that has a flash point of 250.125 K: [C][C][C][C][C][C]"}", "/scratch/micpie/export/flashpoint/train_0-0.jsonl": "{"text":"The molecule with the SELFIES representation of [C][=C][C][=Branch2][Ring1][C][=C][Branch1][#Branch2][C][=C][Ring1][=Branch1][N+1][=Branch1][C][=O][O-1][N+1][=Branch1][C][=O][O-1][Cl] has a flash point of 467.150 K."} {"text":"The molecule with the SMILES representation of COC(=O)CCCC(=O)OC has a flash point of 376.300 K."}", "/scratch/micpie/export/flashpoint/test_0-6.jsonl": "{"text":"Task: Please generate a molecule canonical SMILES based on the text description below.\nDescription: A molecule that has a flash point of 286.150 K.\nResult: ClCCCl"} {"text":"Task: Please give me a canonical SMILES based on the text description.\nDescription: A molecule that has a flash point of 394.175 K.\nResult: O=C(O)c1ccccc1"}", "/scratch/micpie/export/flashpoint/train_0-10.jsonl": "{"text":"User: I want to come up with a molecule InChI.\nAssistant: 
This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should have a flash point of 467.150 K.\nAssistant: Got it, this InChI represents a molecule that has a flash point of 467.150 K: InChI=1S\/C6H3ClN2O4\/c7-5-2-1-4(8(10)11)3-6(5)9(12)13\/h1-3H"} {"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should have a flash point of 376.300 K.\nAssistant: Ok, here you go, this SELFIES represents a molecule that has a flash point of 376.300 K: [C][O][C][=Branch1][C][=O][C][C][C][C][=Branch1][C][=O][O][C]"}", "/scratch/micpie/export/flashpoint/train_0-3.jsonl": "{"text":"The molecule with the canonical SMILES O=[N+]([O-])c1ccc(Cl)c([N+](=O)[O-])c1 has a flash point of 467.150 K."} {"text":"The molecule with the canonical SMILES COC(=O)CCCC(=O)OC has a flash point of 376.300 K."}", "/scratch/micpie/export/flashpoint/valid_0-2.jsonl": "{"text":"The SMILES CC(CN)O represents a molecule with a flash point of 350.150 K."} {"text":"The SELFIES [C][C][C][C][C][C] represents a molecule with a flash point of 250.125 K."}", "/scratch/micpie/export/flashpoint/valid_0-1.jsonl": "{"text":"Based on the canonical SMILES CC(O)CN, the molecule has a flash point of 350.150 K."} {"text":"Based on the SELFIES representation of [C][C][C][C][C][C], the molecule has a flash point of 250.125 K."}", "/scratch/micpie/export/flashpoint/valid_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the flash point in K.\nDeepSMILES: CCCN))O\nConstraint: Even if you are not sure, you must answer with a numeric value in K without the unit and without using any additional words.\nResult: 350.150"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the flash point in K.\nMolecule SMILES: CCCCCC\nConstraint: Even if you 
are uncertain, you must answer with a numeric value in K without the unit and without using any additional words.\nResult: 250.125"}", "/scratch/micpie/export/flashpoint/valid_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the flash point in K.\nMolecule InChI: InChI=1S\/C3H9NO\/c1-3(5)2-4\/h3,5H,2,4H2,1H3\nConstraint: Even if you are uncertain, you must answer with a numeric value in K without using any additional words.\nResult: 350.150 K"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the flash point in K.\nMolecule canonical SMILES: CCCCCC\nConstraint: Even if you are not sure, you must answer with a numeric value in K without using any additional words.\nResult: 250.125 K"}", "/scratch/micpie/export/flashpoint/train_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the flash point in K.\nMolecule SMILES: C1=CC(=C(C=C1[N+](=O)[O-])[N+](=O)[O-])Cl\nConstraint: Even if you are uncertain, you must answer with a numeric value in K without the unit and without using any other words.\nResult: 467.150"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the flash point in K.\ncanonical SMILES: COC(=O)CCCC(=O)OC\nConstraint: Even if you are uncertain, you must answer with a numeric value in K without the unit and without using any other words.\nResult: 376.300"}", "/scratch/micpie/export/flashpoint/train_0-2.jsonl": "{"text":"The DeepSMILES C=CC=CC=C6[N+]=O)[O-]))))[N+]=O)[O-])))Cl represents a molecule with a flash point of 467.150 K."} {"text":"The SMILES COC(=O)CCCC(=O)OC represents a molecule with a flash point of 376.300 K."}", "/scratch/micpie/export/flashpoint/test_0-11.jsonl": "{"text":"User: I want to come up with a SELFIES.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should have a flash point of 286.150 K.\nAssistant: Understood, this SELFIES represents a molecule that has a flash point of 286.150 K: [C][Branch1][Ring1][C][Cl][Cl]"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should have a flash point of 394.175 K.\nAssistant: Got it, this InChI represents a molecule that has a flash point of 394.175 K: InChI=1S\/C7H6O2\/c8-7(9)6-4-2-1-3-5-6\/h1-5H,(H,8,9)"}", "/scratch/micpie/export/flashpoint/train_0-7.jsonl": "{"text":"User: Can you estimate the flash point in K of the molecule with the SMILES C1=CC(=C(C=C1[N+](=O)[O-])[N+](=O)[O-])Cl?\nAssistant: Sure, this molecule has a flash point of 467.150 K."} {"text":"User: Can you tell me the flash point in K of the molecule with the InChI InChI=1S\/C7H12O4\/c1-10-6(8)4-3-5-7(9)11-2\/h3-5H2,1-2H3?\nAssistant: Sure, this molecule has a flash point of 376.300 K."}", "/scratch/micpie/export/flashpoint/train_0-11.jsonl": "{"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should have a flash point of 467.150 K.\nAssistant: Got it, this canonical SMILES represents a molecule that has a flash point of 467.150 K: O=[N+]([O-])c1ccc(Cl)c([N+](=O)[O-])c1"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should have a flash point of 376.300 K.\nAssistant: Got it, this InChI represents a molecule that has a flash point of 376.300 K: InChI=1S\/C7H12O4\/c1-10-6(8)4-3-5-7(9)11-2\/h3-5H2,1-2H3"}", "/scratch/micpie/export/flashpoint/train_0-1.jsonl": "{"text":"Based on the SELFIES [C][=C][C][=Branch2][Ring1][C][=C][Branch1][#Branch2][C][=C][Ring1][=Branch1][N+1][=Branch1][C][=O][O-1][N+1][=Branch1][C][=O][O-1][Cl], the molecule has a flash point of 467.150 K."} {"text":"Based on the DeepSMILES representation of COC=O)CCCC=O)OC, the molecule has a flash point of 376.300 K."}", "/scratch/micpie/export/flashpoint/train_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the flash point in K.\nMolecule DeepSMILES: C=CC=CC=C6[N+]=O)[O-]))))[N+]=O)[O-])))Cl\nConstraint: Even if you are uncertain, you must answer with a numeric value in K without using any additional words.\nResult: 467.150 K"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the flash point in K.\nInChI: InChI=1S\/C7H12O4\/c1-10-6(8)4-3-5-7(9)11-2\/h3-5H2,1-2H3\nConstraint: Even if you are not sure, you must answer with a numeric value in K without using any additional words.\nResult: 376.300 K"}", "/scratch/micpie/export/flashpoint/test_0-7.jsonl": "{"text":"User: Can you tell me the flash point in K of the molecule with the canonical SMILES ClCCCl?\nAssistant: Sure, this molecule has a flash point of 286.150 K."} {"text":"User: Can you derive the flash point in K of the molecule with the DeepSMILES O=CO)cccccc6?\nAssistant: Sure, this molecule has a flash point of 394.175 K."}", "/scratch/micpie/export/flashpoint/train_0-9.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that has a flash point of 467.150 K.\nAssistant: This is a molecule that has a flash point of 467.150 K: C1=CC(=C(C=C1[N+](=O)[O-])[N+](=O)[O-])Cl"} {"text":"User: 
I'm looking for the DeepSMILES of a molecule that has a flash point of 376.300 K.\nAssistant: This is a molecule that has a flash point of 376.300 K: COC=O)CCCC=O)OC"}", "/scratch/micpie/export/flashpoint/valid_0-3.jsonl": "{"text":"The molecule with the SMILES CC(CN)O has a flash point of 350.150 K."} {"text":"The molecule with the InChI InChI=1S\/C6H14\/c1-3-5-6-4-2\/h3-6H2,1-2H3 has a flash point of 250.125 K."}", "/scratch/micpie/export/flashpoint/test_0-8.jsonl": "{"text":"User: Can you give me the SELFIES of a molecule that has a flash point of 286.150 K?\nAssistant: Of course, here you go: [C][Branch1][Ring1][C][Cl][Cl]"} {"text":"User: Can you give me the canonical SMILES of a molecule that has a flash point of 394.175 K?\nAssistant: Sure, here you go: O=C(O)c1ccccc1"}", "/scratch/micpie/export/flashpoint/test_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the flash point in K.\ncanonical SMILES: ClCCCl\nConstraint: Even if you are not sure, you must answer with a numeric value in K without using any additional words.\nResult: 286.150 K"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the flash point in K.\nMolecule InChI: InChI=1S\/C7H6O2\/c8-7(9)6-4-2-1-3-5-6\/h1-5H,(H,8,9)\nConstraint: Even if you are uncertain, you must answer with a numeric value in K without using any additional words.\nResult: 394.175 K"}", "/scratch/micpie/export/herg_central_at_10uM/test_0-10.jsonl": "{"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should have a human ether-à-go-go related gene (hERG) inhibition at a concentration of 10uM of 19.031 %.\nAssistant: Got it, here you go, this SMILES represents a molecule that has a human ether-à-go-go related gene (hERG) inhibition at a concentration of 10uM of 19.031 %: COc1cccc(NC(=O)C2C3C=CC4(O3)C2C(=O)N(CCCN2CCCCC2)C4C(=O)NC2CCCCC2)c1"} {"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should have a hERG inhibition at a concentration of 10uM of 13.087 %.\nAssistant: Ok, this DeepSMILES represents a molecule that has a hERG inhibition at a concentration of 10uM of 13.087 %: CcccBr)ccc6SCC=O)OCC=O)Ncccccc6C#N"}", "/scratch/micpie/export/herg_central_at_10uM/valid_0-8.jsonl": "{"text":"User: Can you give me the SELFIES of a molecule that has a human ether-à-go-go related gene (hERG) inhibition at 10uM of 13.059 %?\nAssistant: Yes, here you go: [C][O][C][=C][C][=C][Branch2][Ring1][#Branch1][\/C][=C][\\S][C][=Branch1][C][=S][N][Branch1][=Branch2][N][C][C][O][C][C][Ring1][=Branch1][C][Ring1][N][=O][C][Branch1][Ring1][O][C][=C][Ring2][Ring1][=Branch1]"} {"text":"User: Can you generate the canonical SMILES of a molecule that has a human ether-à-go-go related gene (hERG) inhibition at 10uM of -8.227 %?\nAssistant: Sure, here you go: O=C(\/C=C\/C=C\/c1ccc2c(c1)OCO2)N1CCCCC1"}", "/scratch/micpie/export/herg_central_at_10uM/train_0-8.jsonl": "{"text":"User: Can you generate the IUPAC name of a molecule that has a hERG inhibition at a concentration of 10uM of 17.879 %?\nAssistant: Yes, I'm happy to help, here you go: 8-[1-(4-dimethylaminophenyl)-3-morpholin-4-yl-3-oxopropyl]-5,7-dimethoxychromen-2-one"} {"text":"User: Can you give me the SMILES of a molecule that has a hERG inhibition at a concentration of 10uM of -10.285 %?\nAssistant: Sure, here you go: COc1cccc(-n2nnc3c(=O)n(CC(=O)N4CCCc5ccccc54)cnc32)c1"}", 
"/scratch/micpie/export/herg_central_at_10uM/test_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the human ether-à-go-go related gene (hERG) inhibition at 10uM in %.\nMolecule DeepSMILES: COcccccNC=O)CCC=CCO5)C6C=O)NCCCNCCCCC6)))))))))C5C=O)NCCCCCC6)))))))))))))))))))c6\nConstraint: Even if you are uncertain, you must answer with a numeric value in % without the unit and without using any other words.\nResult: 19.031"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hERG inhibition at a concentration of 10uM in %.\nMolecule SELFIES: [C][C][=C][C][Branch1][C][Br][=C][C][=C][Ring1][#Branch1][S][C][C][=Branch1][C][=O][O][C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][#N]\nConstraint: Even if you are uncertain, you must answer with a numeric value in % without the unit and without using any other words.\nResult: 13.087"}", "/scratch/micpie/export/herg_central_at_10uM/valid_0-9.jsonl": "{"text":"User: I'm looking for the SELFIES of a molecule that has a hERG inhibition at a concentration of 10uM of 13.059 %.\nAssistant: This is a molecule that has a hERG inhibition at a concentration of 10uM of 13.059 %: [C][O][C][=C][C][=C][Branch2][Ring1][#Branch1][\/C][=C][\\S][C][=Branch1][C][=S][N][Branch1][=Branch2][N][C][C][O][C][C][Ring1][=Branch1][C][Ring1][N][=O][C][Branch1][Ring1][O][C][=C][Ring2][Ring1][=Branch1]"} {"text":"User: I'm looking for the canonical SMILES of a molecule that has a hERG inhibition at a concentration of 10uM of -8.227 %.\nAssistant: This is a molecule that has a hERG inhibition at a concentration of 10uM of -8.227 %: O=C(\/C=C\/C=C\/c1ccc2c(c1)OCO2)N1CCCCC1"}", "/scratch/micpie/export/herg_central_at_10uM/test_0-1.jsonl": "{"text":"Based on the DeepSMILES COcccccNC=O)CCC=CCO5)C6C=O)NCCCNCCCCC6)))))))))C5C=O)NCCCCCC6)))))))))))))))))))c6, the molecule has a human ether-à-go-go related gene (hERG) inhibition at 10uM of 
19.031 %."} {"text":"Based on the canonical SMILES representation of Cc1cc(Br)ccc1SCC(=O)OCC(=O)Nc1ccccc1C#N, the molecule has a human ether-à-go-go related gene (hERG) inhibition at a concentration of 10uM of 13.087 %."}", "/scratch/micpie/export/herg_central_at_10uM/valid_0-0.jsonl": "{"text":"The molecule with the SELFIES [C][O][C][=C][C][=C][Branch2][Ring1][#Branch1][\/C][=C][\\S][C][=Branch1][C][=S][N][Branch1][=Branch2][N][C][C][O][C][C][Ring1][=Branch1][C][Ring1][N][=O][C][Branch1][Ring1][O][C][=C][Ring2][Ring1][=Branch1] has a hERG inhibition at 10uM of 13.059 %."} {"text":"The molecule with the SMILES O=C(\/C=C\/C=C\/c1ccc2c(c1)OCO2)N1CCCCC1 has a human ether-à-go-go related gene (hERG) inhibition at a concentration of 10uM of -8.227 %."}", "/scratch/micpie/export/herg_central_at_10uM/test_0-2.jsonl": "{"text":"The canonical SMILES COc1cccc(NC(=O)C2C3C=CC4(O3)C2C(=O)N(CCCN2CCCCC2)C4C(=O)NC2CCCCC2)c1 represents a molecule with a hERG inhibition at a concentration of 10uM of 19.031 %."} {"text":"The SMILES Cc1cc(Br)ccc1SCC(=O)OCC(=O)Nc1ccccc1C#N represents a molecule that has a hERG inhibition at a concentration of 10uM of 13.087 %."}", "/scratch/micpie/export/herg_central_at_10uM/valid_0-10.jsonl": "{"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should have a human ether-à-go-go related gene (hERG) inhibition at 10uM of 13.059 %.\nAssistant: Ok, this canonical SMILES represents a molecule that has a human ether-à-go-go related gene (hERG) inhibition at 10uM of 13.059 %: COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1"} {"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should have a human ether-à-go-go related gene (hERG) inhibition at a concentration of 10uM of -8.227 %.\nAssistant: Got it, this DeepSMILES represents a molecule that has a human ether-à-go-go related gene (hERG) inhibition at a concentration of 10uM of -8.227 %: O=C\/C=C\/C=C\/cccccc6)OCO5))))))))))))NCCCCC6"}", "/scratch/micpie/export/herg_central_at_10uM/train_0-6.jsonl": "{"text":"Task: Please generate a IUPAC name based on the text description below.\nDescription: A molecule that has a hERG inhibition at a concentration of 10uM of 17.879 %.\nResult: 8-[1-(4-dimethylaminophenyl)-3-morpholin-4-yl-3-oxopropyl]-5,7-dimethoxychromen-2-one"} {"text":"Task: Please give me a molecule SELFIES based on the text description.\nDescription: A molecule that has a human ether-à-go-go related gene (hERG) inhibition at 10uM of -10.285 %.\nResult: [C][O][C][=C][C][=C][C][Branch2][Ring2][=Branch2][N][N][=N][C][C][=Branch1][C][=O][N][Branch2][Ring1][Ring2][C][C][=Branch1][C][=O][N][C][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][#Branch2][C][=N][C][=Ring2][Ring1][Ring2][Ring2][Ring1][#Branch1][=C][Ring2][Ring1][=N]"}", "/scratch/micpie/export/herg_central_at_10uM/valid_0-6.jsonl": "{"text":"Task: Please give me a SMILES based on the description.\nDescription: A molecule that has a human ether-à-go-go related gene (hERG) inhibition at 10uM of 13.059 %.\nResult: COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1"} {"text":"Task: Please give me a molecule canonical SMILES based on the text description.\nDescription: A molecule that has a human ether-à-go-go related gene (hERG) inhibition at 10uM of -8.227 %.\nResult: O=C(\/C=C\/C=C\/c1ccc2c(c1)OCO2)N1CCCCC1"}", "/scratch/micpie/export/herg_central_at_10uM/test_0-9.jsonl": "{"text":"User: I'm searching for the SELFIES of a molecule that has a hERG inhibition at a concentration of 10uM of 19.031 %.\nAssistant: This is a molecule that has a hERG inhibition at a concentration of 10uM of 19.031 %: 
[C][O][C][=C][C][=C][C][Branch2][Branch1][Branch1][N][C][=Branch1][C][=O][C][C][C][=C][C][Branch1][Ring2][O][Ring1][Branch1][C][Ring1][#Branch1][C][=Branch1][C][=O][N][Branch1][N][C][C][C][N][C][C][C][C][C][Ring1][=Branch1][C][Ring1][S][C][=Branch1][C][=O][N][C][C][C][C][C][C][Ring1][=Branch1][=C][Ring2][Ring2][=Branch1]"} {"text":"User: I'm searching for the canonical SMILES of a molecule that has a hERG inhibition at 10uM of 13.087 %.\nAssistant: This is a molecule that has a hERG inhibition at 10uM of 13.087 %: Cc1cc(Br)ccc1SCC(=O)OCC(=O)Nc1ccccc1C#N"}", "/scratch/micpie/export/herg_central_at_10uM/test_0-0.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C31H42N4O5\/c1-39-23-13-8-12-22(20-23)33-28(36)25-24-14-15-31(40-24)26(25)30(38)35(19-9-18-34-16-6-3-7-17-34)27(31)29(37)32-21-10-4-2-5-11-21\/h8,12-15,20-21,24-27H,2-7,9-11,16-19H2,1H3,(H,32,37)(H,33,36) has a hERG inhibition at a concentration of 10uM of 19.031 %."} {"text":"The molecule with the canonical SMILES representation of Cc1cc(Br)ccc1SCC(=O)OCC(=O)Nc1ccccc1C#N has a hERG inhibition at a concentration of 10uM of 13.087 %."}", "/scratch/micpie/export/herg_central_at_10uM/valid_0-7.jsonl": "{"text":"User: Can you tell me the human ether-à-go-go related gene (hERG) inhibition at 10uM in % of the molecule with the SMILES COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1?\nAssistant: Yes, I'm happy to help, this molecule has a human ether-à-go-go related gene (hERG) inhibition at 10uM of 13.059 %."} {"text":"User: Can you derive the hERG inhibition at a concentration of 10uM in % of the molecule with the DeepSMILES O=C\/C=C\/C=C\/cccccc6)OCO5))))))))))))NCCCCC6?\nAssistant: Sure, this molecule has a hERG inhibition at a concentration of 10uM of -8.227 %."}", "/scratch/micpie/export/herg_central_at_10uM/test_0-3.jsonl": "{"text":"The molecule with the SMILES COc1cccc(NC(=O)C2C3C=CC4(O3)C2C(=O)N(CCCN2CCCCC2)C4C(=O)NC2CCCCC2)c1 has a human ether-à-go-go related gene (hERG) inhibition at 10uM of 19.031 
%."} {"text":"The molecule with the DeepSMILES CcccBr)ccc6SCC=O)OCC=O)Ncccccc6C#N has a hERG inhibition at a concentration of 10uM of 13.087 %."}", "/scratch/micpie/export/herg_central_at_10uM/valid_0-11.jsonl": "{"text":"User: I want to come up with a SELFIES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should have a hERG inhibition at a concentration of 10uM of 13.059 %.\nAssistant: Ok, this SELFIES represents a molecule that has a hERG inhibition at a concentration of 10uM of 13.059 %: [C][O][C][=C][C][=C][Branch2][Ring1][#Branch1][\/C][=C][\\S][C][=Branch1][C][=S][N][Branch1][=Branch2][N][C][C][O][C][C][Ring1][=Branch1][C][Ring1][N][=O][C][Branch1][Ring1][O][C][=C][Ring2][Ring1][=Branch1]"} {"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should have a hERG inhibition at 10uM of -8.227 %.\nAssistant: Got it, this canonical SMILES represents a molecule that has a hERG inhibition at 10uM of -8.227 %: O=C(\/C=C\/C=C\/c1ccc2c(c1)OCO2)N1CCCCC1"}", "/scratch/micpie/export/herg_central_at_10uM/train_0-0.jsonl": "{"text":"The molecule with the canonical SMILES COc1cc(OC)c2ccc(=O)oc2c1C(CC(=O)N1CCOCC1)c1ccc(N(C)C)cc1 has a human ether-à-go-go related gene (hERG) inhibition at a concentration of 10uM of 17.879 %."} {"text":"The molecule with the InChI InChI=1S\/C22H20N6O3\/c1-31-17-9-4-8-16(12-17)28-21-20(24-25-28)22(30)26(14-23-21)13-19(29)27-11-5-7-15-6-2-3-10-18(15)27\/h2-4,6,8-10,12,14H,5,7,11,13H2,1H3 has a hERG inhibition at a concentration of 10uM of -10.285 %."}", "/scratch/micpie/export/herg_central_at_10uM/test_0-6.jsonl": "{"text":"Task: Please create a SELFIES based on the text description.\nDescription: A molecule that has a human ether-à-go-go related gene (hERG) inhibition at 10uM of 19.031 %.\nResult: 
[C][O][C][=C][C][=C][C][Branch2][Branch1][Branch1][N][C][=Branch1][C][=O][C][C][C][=C][C][Branch1][Ring2][O][Ring1][Branch1][C][Ring1][#Branch1][C][=Branch1][C][=O][N][Branch1][N][C][C][C][N][C][C][C][C][C][Ring1][=Branch1][C][Ring1][S][C][=Branch1][C][=O][N][C][C][C][C][C][C][Ring1][=Branch1][=C][Ring2][Ring2][=Branch1]"} {"text":"Task: Please create a SELFIES based on the description below.\nDescription: A molecule that has a hERG inhibition at a concentration of 10uM of 13.087 %.\nResult: [C][C][=C][C][Branch1][C][Br][=C][C][=C][Ring1][#Branch1][S][C][C][=Branch1][C][=O][O][C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][#N]"}", "/scratch/micpie/export/herg_central_at_10uM/train_0-10.jsonl": "{"text":"User: I want to generate a molecule IUPAC name.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should have a hERG inhibition at a concentration of 10uM of 17.879 %.\nAssistant: Got it, this IUPAC name represents a molecule that has a hERG inhibition at a concentration of 10uM of 17.879 %: 8-[1-(4-dimethylaminophenyl)-3-morpholin-4-yl-3-oxopropyl]-5,7-dimethoxychromen-2-one"} {"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should have a human ether-à-go-go related gene (hERG) inhibition at 10uM of -10.285 %.\nAssistant: Ok, here you go, this canonical SMILES represents a molecule that has a human ether-à-go-go related gene (hERG) inhibition at 10uM of -10.285 %: COc1cccc(-n2nnc3c(=O)n(CC(=O)N4CCCc5ccccc54)cnc32)c1"}", "/scratch/micpie/export/herg_central_at_10uM/train_0-3.jsonl": "{"text":"The molecule with the SELFIES [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][=C][C][=Branch1][C][=O][O][C][Ring1][#Branch1][=C][Ring1][=N][C][Branch1][=C][C][C][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][Branch1][=Branch1][N][Branch1][C][C][C][C][=C][Ring1][=Branch2] has a hERG inhibition at 10uM of 17.879 %."} {"text":"The molecule with the canonical SMILES COc1cccc(-n2nnc3c(=O)n(CC(=O)N4CCCc5ccccc54)cnc32)c1 has a human ether-à-go-go related gene (hERG) inhibition at 10uM of -10.285 %."}", "/scratch/micpie/export/herg_central_at_10uM/valid_0-2.jsonl": "{"text":"The SMILES COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1 is representing a molecule that has a hERG inhibition at a concentration of 10uM of 13.059 %."} {"text":"The DeepSMILES O=C\/C=C\/C=C\/cccccc6)OCO5))))))))))))NCCCCC6 is representing a molecule that has a human ether-à-go-go related gene (hERG) inhibition at a concentration of 10uM of -8.227 %."}", "/scratch/micpie/export/herg_central_at_10uM/valid_0-1.jsonl": "{"text":"Based on the canonical SMILES COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1, the molecule has a human ether-à-go-go related gene (hERG) inhibition at 10uM of 13.059 %."} {"text":"Based on the SELFIES representation of [O][=C][Branch2][Ring1][Ring2][\/C][=C][\/C][=C][\/C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O][Ring1][=Branch1][N][C][C][C][C][C][Ring1][=Branch1], the molecule has a human ether-à-go-go related gene (hERG) inhibition at 10uM of -8.227 %."}", "/scratch/micpie/export/herg_central_at_10uM/valid_0-5.jsonl": "{"text":"Task: Please predict a molecule 
feature based on the description.\nDescription: Predict the hERG inhibition at 10uM in %.\nSMILES: COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1\nConstraint: Even if you are not sure, you must answer with a numeric value in % without the unit and without using any other words.\nResult: 13.059"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the human ether-à-go-go related gene (hERG) inhibition at 10uM in %.\nSELFIES: [O][=C][Branch2][Ring1][Ring2][\/C][=C][\/C][=C][\/C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O][Ring1][=Branch1][N][C][C][C][C][C][Ring1][=Branch1]\nConstraint: Even if you are not sure, you must answer with a numeric value in % without the unit and without using any additional words.\nResult: -8.227"}", "/scratch/micpie/export/herg_central_at_10uM/valid_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the human ether-à-go-go related gene (hERG) inhibition at 10uM in %.\nDeepSMILES: COcccc\/C=C\\SC=S)NNCCOCC6))))))C5=O)))))))cOC))c6\nConstraint: Even if you are not sure, you must answer with a numeric value in % without using any other words.\nResult: 13.059 %"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hERG inhibition at a concentration of 10uM in %.\nInChI: InChI=1S\/C17H19NO3\/c19-17(18-10-4-1-5-11-18)7-3-2-6-14-8-9-15-16(12-14)21-13-20-15\/h2-3,6-9,12H,1,4-5,10-11,13H2\/b6-2+,7-3+\nConstraint: Even if you are not sure, you must answer with a numeric value in % without using any other words.\nResult: -8.227 %"}", "/scratch/micpie/export/herg_central_at_10uM/train_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the human ether-à-go-go related gene (hERG) inhibition at a concentration of 10uM in %.\nMolecule SELFIES: 
[C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][=C][C][=Branch1][C][=O][O][C][Ring1][#Branch1][=C][Ring1][=N][C][Branch1][=C][C][C][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][Branch1][=Branch1][N][Branch1][C][C][C][C][=C][Ring1][=Branch2]\nConstraint: Even if you are not sure, you must answer with a numeric value in % without the unit and without using any other words.\nResult: 17.879"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the human ether-à-go-go related gene (hERG) inhibition at 10uM in %.\nDeepSMILES: COccccc-nnncc=O)nCC=O)NCCCcccccc6%10))))))))))))cnc69)))))))))c6\nConstraint: Even if you are uncertain, you must answer with a numeric value in % without the unit and without using any additional words.\nResult: -10.285"}", "/scratch/micpie/export/herg_central_at_10uM/train_0-2.jsonl": "{"text":"The SMILES COc1cc(OC)c2ccc(=O)oc2c1C(CC(=O)N1CCOCC1)c1ccc(N(C)C)cc1 is representing a molecule that has a human ether-à-go-go related gene (hERG) inhibition at 10uM of 17.879 %."} {"text":"The DeepSMILES COccccc-nnncc=O)nCC=O)NCCCcccccc6%10))))))))))))cnc69)))))))))c6 represents a molecule that has a hERG inhibition at 10uM of -10.285 %."}", "/scratch/micpie/export/herg_central_at_10uM/test_0-11.jsonl": "{"text":"User: I want to create a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should have a hERG inhibition at 10uM of 19.031 %.\nAssistant: Got it, this canonical SMILES represents a molecule that has a hERG inhibition at 10uM of 19.031 %: COc1cccc(NC(=O)C2C3C=CC4(O3)C2C(=O)N(CCCN2CCCCC2)C4C(=O)NC2CCCCC2)c1"} {"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should have a human ether-à-go-go related gene (hERG) inhibition at 10uM of 13.087 %.\nAssistant: Understood, this DeepSMILES represents a molecule that has a human ether-à-go-go related gene (hERG) inhibition at 10uM of 13.087 %: CcccBr)ccc6SCC=O)OCC=O)Ncccccc6C#N"}", "/scratch/micpie/export/herg_central_at_10uM/train_0-7.jsonl": "{"text":"User: Can you tell me the human ether-à-go-go related gene (hERG) inhibition at a concentration of 10uM in % of the molecule with the IUPAC name 8-[1-(4-dimethylaminophenyl)-3-morpholin-4-yl-3-oxopropyl]-5,7-dimethoxychromen-2-one?\nAssistant: Sure, this molecule has a human ether-à-go-go related gene (hERG) inhibition at a concentration of 10uM of 17.879 %."} {"text":"User: Can you tell me the human ether-à-go-go related gene (hERG) inhibition at 10uM in % of the molecule with the SMILES COc1cccc(-n2nnc3c(=O)n(CC(=O)N4CCCc5ccccc54)cnc32)c1?\nAssistant: Yes, this molecule has a human ether-à-go-go related gene (hERG) inhibition at 10uM of -10.285 %."}", "/scratch/micpie/export/herg_central_at_10uM/train_0-11.jsonl": "{"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should have a hERG inhibition at a concentration of 10uM of 17.879 %.\nAssistant: Got it, this SMILES represents a molecule that has a hERG inhibition at a concentration of 10uM of 17.879 %: COc1cc(OC)c2ccc(=O)oc2c1C(CC(=O)N1CCOCC1)c1ccc(N(C)C)cc1"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should have a human ether-à-go-go related gene (hERG) inhibition at 10uM of -10.285 %.\nAssistant: Understood, this InChI represents a molecule that has a human ether-à-go-go related gene (hERG) inhibition at 10uM of -10.285 %: InChI=1S\/C22H20N6O3\/c1-31-17-9-4-8-16(12-17)28-21-20(24-25-28)22(30)26(14-23-21)13-19(29)27-11-5-7-15-6-2-3-10-18(15)27\/h2-4,6,8-10,12,14H,5,7,11,13H2,1H3"}", "/scratch/micpie/export/herg_central_at_10uM/train_0-1.jsonl": "{"text":"Based on the InChI representation of InChI=1S\/C26H30N2O6\/c1-27(2)18-7-5-17(6-8-18)20(15-23(29)28-11-13-33-14-12-28)25-22(32-4)16-21(31-3)19-9-10-24(30)34-26(19)25\/h5-10,16,20H,11-15H2,1-4H3, the molecule has a hERG inhibition at a concentration of 10uM of 17.879 %."} {"text":"Based on the SMILES representation of COc1cccc(-n2nnc3c(=O)n(CC(=O)N4CCCc5ccccc54)cnc32)c1, the molecule has a hERG inhibition at 10uM of -10.285 %."}", "/scratch/micpie/export/herg_central_at_10uM/train_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the human ether-à-go-go related gene (hERG) inhibition at a concentration of 10uM in %.\nSMILES: COc1cc(OC)c2ccc(=O)oc2c1C(CC(=O)N1CCOCC1)c1ccc(N(C)C)cc1\nConstraint: Even if you are not sure, you must answer with a numeric value in % without using any other words.\nResult: 17.879 %"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hERG inhibition at a concentration of 10uM in %.\nSMILES: COc1cccc(-n2nnc3c(=O)n(CC(=O)N4CCCc5ccccc54)cnc32)c1\nConstraint: Even if you are not sure, you must answer with a numeric value in % without using any other words.\nResult: -10.285 %"}", "/scratch/micpie/export/herg_central_at_10uM/test_0-7.jsonl": "{"text":"User: Can you tell me the hERG inhibition at a concentration of 10uM in % of the molecule with the SMILES 
COc1cccc(NC(=O)C2C3C=CC4(O3)C2C(=O)N(CCCN2CCCCC2)C4C(=O)NC2CCCCC2)c1?\nAssistant: Sure, this molecule has a hERG inhibition at a concentration of 10uM of 19.031 %."} {"text":"User: Can you estimate the hERG inhibition at a concentration of 10uM in % of the molecule with the canonical SMILES Cc1cc(Br)ccc1SCC(=O)OCC(=O)Nc1ccccc1C#N?\nAssistant: Sure, this molecule has a hERG inhibition at a concentration of 10uM of 13.087 %."}", "/scratch/micpie/export/herg_central_at_10uM/train_0-9.jsonl": "{"text":"User: I'm looking for the SELFIES of a molecule that has a hERG inhibition at 10uM of 17.879 %.\nAssistant: This is a molecule that has a hERG inhibition at 10uM of 17.879 %: [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][=C][C][=Branch1][C][=O][O][C][Ring1][#Branch1][=C][Ring1][=N][C][Branch1][=C][C][C][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][Branch1][=Branch1][N][Branch1][C][C][C][C][=C][Ring1][=Branch2]"} {"text":"User: I'm searching for the DeepSMILES of a molecule that has a hERG inhibition at a concentration of 10uM of -10.285 %.\nAssistant: This is a molecule that has a hERG inhibition at a concentration of 10uM of -10.285 %: COccccc-nnncc=O)nCC=O)NCCCcccccc6%10))))))))))))cnc69)))))))))c6"}", "/scratch/micpie/export/herg_central_at_10uM/valid_0-3.jsonl": "{"text":"The molecule with the DeepSMILES COcccc\/C=C\\SC=S)NNCCOCC6))))))C5=O)))))))cOC))c6 has a human ether-à-go-go related gene (hERG) inhibition at 10uM of 13.059 %."} {"text":"The molecule with the canonical SMILES O=C(\/C=C\/C=C\/c1ccc2c(c1)OCO2)N1CCCCC1 has a human ether-à-go-go related gene (hERG) inhibition at 10uM of -8.227 %."}", "/scratch/micpie/export/herg_central_at_10uM/test_0-8.jsonl": "{"text":"User: Can you generate the InChI of a molecule that has a human ether-à-go-go related gene (hERG) inhibition at a concentration of 10uM of 19.031 %?\nAssistant: Of course, here you go: 
InChI=1S\/C31H42N4O5\/c1-39-23-13-8-12-22(20-23)33-28(36)25-24-14-15-31(40-24)26(25)30(38)35(19-9-18-34-16-6-3-7-17-34)27(31)29(37)32-21-10-4-2-5-11-21\/h8,12-15,20-21,24-27H,2-7,9-11,16-19H2,1H3,(H,32,37)(H,33,36)"} {"text":"User: Can you create the SMILES of a molecule that has a human ether-à-go-go related gene (hERG) inhibition at a concentration of 10uM of 13.087 %?\nAssistant: Yes, I'm happy to help, here you go: Cc1cc(Br)ccc1SCC(=O)OCC(=O)Nc1ccccc1C#N"}", "/scratch/micpie/export/herg_central_at_10uM/test_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the human ether-à-go-go related gene (hERG) inhibition at 10uM in %.\nMolecule canonical SMILES: COc1cccc(NC(=O)C2C3C=CC4(O3)C2C(=O)N(CCCN2CCCCC2)C4C(=O)NC2CCCCC2)c1\nConstraint: Even if you are uncertain, you must answer with a numeric value in % without using any additional words.\nResult: 19.031 %"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the human ether-à-go-go related gene (hERG) inhibition at 10uM in %.\nMolecule canonical SMILES: Cc1cc(Br)ccc1SCC(=O)OCC(=O)Nc1ccccc1C#N\nConstraint: Even if you are uncertain, you must answer with a numeric value in % without using any other words.\nResult: 13.087 %"}", "/scratch/micpie/export/rhea_db_masked/test_0-1.jsonl": "{"text":"The compound with SMILES *O is the masked component in the reaction SMILES with one element masked as `MASK` *N[C@@H](CS)C(*)=O.*OO>>*N[C@@H](CSSC[C@H](N*)C(*)=O)C(*)=O.O.MASK."} {"text":"The compound with SMILES [Fe+3] is the masked component in the reaction SMILES with one element masked as `MASK` O=O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC.[Fe+2].[H+]>>O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC.MASK."}", "/scratch/micpie/export/rhea_db_masked/valid_0-0.jsonl": "{"text":"The masked component in the masked reaction SMILES string (one component masked as `MASK`) CCCCC(=O)[O-].MASK>>CCCCC(N)=O.O is [NH4+]."} 
{"text":"The masked component in the masked RXNSMILES (one component masked as `MASK`) O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC.MASK>>O=O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC.[Fe+2].[H+] is [Fe+3]."}", "/scratch/micpie/export/rhea_db_masked/test_0-2.jsonl": "{"text":"Question: What is the masked component in the masked reaction SMILES string (one component masked as `MASK`) *N[C@@H](CS)C(*)=O.*OO>>*N[C@@H](CSSC[C@H](N*)C(*)=O)C(*)=O.O.MASK?\nAnswer: *O."} {"text":"Question: What is the masked component in the reaction SMILES with one element hidden as `MASK` O=O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC.[Fe+2].[H+]>>O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC.MASK?\nAnswer: [Fe+3]."}", "/scratch/micpie/export/rhea_db_masked/test_0-0.jsonl": "{"text":"The masked component in the masked RXNSMILES (one component masked as `MASK`) *N[C@@H](CS)C(*)=O.*OO>>*N[C@@H](CSSC[C@H](N*)C(*)=O)C(*)=O.O.MASK is *O."} {"text":"The masked component in the masked reaction SMILES (one component masked as `MASK`) O=O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC.[Fe+2].[H+]>>O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC.MASK is [Fe+3]."}", "/scratch/micpie/export/rhea_db_masked/test_0-3.jsonl": "{"text":"Task: Predict the masked component in a reaction SMILES with one element hidden as `MASK`.\nDescription: *N[C@@H](CS)C(*)=O.*OO>>*N[C@@H](CSSC[C@H](N*)C(*)=O)C(*)=O.O.MASK\nAnswer: *O"} {"text":"Task: Predict the masked component in a masked reaction SMILES (one component masked as `MASK`).\nDescription: O=O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC.[Fe+2].[H+]>>O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC.MASK\nAnswer: [Fe+3]"}", "/scratch/micpie/export/rhea_db_masked/train_0-0.jsonl": "{"text":"The masked component in the masked RXNSMILES (one component masked as `MASK`) CCCCC(N)=O.O>>[NH4+].MASK is CCCCC(=O)[O-]."} {"text":"The masked component in the masked reaction SMILES (one component masked as `MASK`) 
O=C([O-])\/C=C\/C(=O)[O-].MASK>>CC(=O)N[C@@H](CSC(CC(=O)[O-])C(=O)[O-])C(=O)[O-] is CC(=O)N[C@@H](CS)C(=O)[O-]."}", "/scratch/micpie/export/rhea_db_masked/train_0-3.jsonl": "{"text":"Task: Predict the masked component in a masked RXNSMILES (one component masked as `MASK`).\nDescription: CCCCC(N)=O.O>>[NH4+].MASK\nAnswer: CCCCC(=O)[O-]"} {"text":"Task: Predict the masked component in a reaction SMILES with one element hidden as `MASK`.\nDescription: O=C([O-])\/C=C\/C(=O)[O-].MASK>>CC(=O)N[C@@H](CSC(CC(=O)[O-])C(=O)[O-])C(=O)[O-]\nAnswer: CC(=O)N[C@@H](CS)C(=O)[O-]"}", "/scratch/micpie/export/rhea_db_masked/valid_0-2.jsonl": "{"text":"Question: What is the masked component in the reaction SMILES with one element masked as `MASK` CCCCC(=O)[O-].MASK>>CCCCC(N)=O.O?\nAnswer: [NH4+]."} {"text":"Question: What is the masked component in the masked reaction SMILES string (one component masked as `MASK`) O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC.MASK>>O=O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC.[Fe+2].[H+]?\nAnswer: [Fe+3]."}", "/scratch/micpie/export/rhea_db_masked/valid_0-1.jsonl": "{"text":"The compound with SMILES [NH4+] is the masked component in the masked RXNSMILES (one component masked as `MASK`) CCCCC(=O)[O-].MASK>>CCCCC(N)=O.O."} {"text":"The compound with SMILES [Fe+3] is the masked component in the masked reaction SMILES string (one component masked as `MASK`) O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC.MASK>>O=O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC.[Fe+2].[H+]."}", "/scratch/micpie/export/rhea_db_masked/train_0-2.jsonl": "{"text":"Question: What is the masked component in the reaction SMILES with one element hidden as `MASK` CCCCC(N)=O.O>>[NH4+].MASK?\nAnswer: CCCCC(=O)[O-]."} {"text":"Question: What is the masked component in the reaction SMILES with one element hidden as `MASK` O=C([O-])\/C=C\/C(=O)[O-].MASK>>CC(=O)N[C@@H](CSC(CC(=O)[O-])C(=O)[O-])C(=O)[O-]?\nAnswer: CC(=O)N[C@@H](CS)C(=O)[O-]."}", 
"/scratch/micpie/export/rhea_db_masked/train_0-1.jsonl": "{"text":"The compound with SMILES CCCCC(=O)[O-] is the masked component in the reaction SMILES with one element hidden as `MASK` CCCCC(N)=O.O>>[NH4+].MASK."} {"text":"The chemical with SMILES CC(=O)N[C@@H](CS)C(=O)[O-] is the masked component in the masked reaction SMILES (one component masked as `MASK`) O=C([O-])\/C=C\/C(=O)[O-].MASK>>CC(=O)N[C@@H](CSC(CC(=O)[O-])C(=O)[O-])C(=O)[O-]."}", "/scratch/micpie/export/rhea_db_masked/valid_0-3.jsonl": "{"text":"Task: Predict the masked component in a masked reaction SMILES (one component masked as `MASK`).\nDescription: CCCCC(=O)[O-].MASK>>CCCCC(N)=O.O\nAnswer: [NH4+]"} {"text":"Task: Predict the masked component in a reaction SMILES with one element hidden as `MASK`.\nDescription: O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC.MASK>>O=O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC.[Fe+2].[H+]\nAnswer: [Fe+3]"}", "/scratch/micpie/export/sr_p53_tox21/test_0-10.jsonl": "{"text":"User: Can you generate the SMILES of a molecule that is not toxic in the SR-p53 response assay?\nAssistant: Of course, here you go: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"User: Can you generate the SMILES of a molecule that is not toxic in the p53 response assay?\nAssistant: Yes, I'm happy to help, here you go: FC(F)(F)c1ccc(Cl)cc1Cl"}", "/scratch/micpie/export/sr_p53_tox21/valid_0-8.jsonl": "{"text":"User: Can you estimate if the molecule with the SMILES Cc1cnc(C)c(C)n1 is toxic in the SR-p53 assay?\nAssistant: No, this molecule is not toxic in the SR-p53 assay."} {"text":"User: Can you figure out if the molecule with the InChI InChI=1S\/C16H12O\/c1-11(17)14-9-8-13-7-6-12-4-2-3-5-15(12)16(13)10-14\/h2-10H,1H3 is toxic in the SR-p53 response assay?\nAssistant: Yes, this molecule is toxic in the SR-p53 response assay."}", "/scratch/micpie/export/sr_p53_tox21/test_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the p53 
response assay?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any other words.\nOptions:\n1 O=NN1CCCCC1\n2 CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C\n3 CCOC(=O)C1=C(COCCN)NC(C)=C(C(=O)OC)C1c1ccccc1Cl.O=S(=O)(O)c1ccccc1\nAnswer: 1, 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the SR-p53 assay?\nConstraint: You must select none, one or more options from a, b, c, or d without using any additional words.\nOptions:\na.) FC(F)(F)c1ccc(Cl)cc1Cl\nb.) O=C(O)c1ccccc1Nc1cccc(C(F)(F)F)c1\nc.) O=C=Nc1cccc(C(F)(F)F)c1\nd.) CC(C)(C)CC(C)(C)c1ccc(OCCO)cc1\nAnswer: a, c, d"}", "/scratch/micpie/export/sr_p53_tox21/train_0-8.jsonl": "{"text":"User: Can you estimate if the molecule with the SELFIES [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N] is toxic in the SR-p53 assay?\nAssistant: No, this molecule is not toxic in the SR-p53 assay."} {"text":"User: Can you tell me if the molecule with the SMILES CCBr is toxic in the SR-p53 assay?\nAssistant: No, this molecule is not toxic in the SR-p53 assay."}", "/scratch/micpie/export/sr_p53_tox21/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the p53 response assay.\nMolecule canonical SMILES: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the p53 response assay.\nMolecule DeepSMILES: FCF)F)ccccCl)cc6Cl\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"}", "/scratch/micpie/export/sr_p53_tox21/valid_0-9.jsonl": "{"text":"User: Is the molecule with the DeepSMILES CccncC)cC)n6 toxic in the SR-p53 
assay?\nAssistant: No, it is not toxic in the SR-p53 assay."} {"text":"User: Is the molecule with the InChI InChI=1S\/C16H12O\/c1-11(17)14-9-8-13-7-6-12-4-2-3-5-15(12)16(13)10-14\/h2-10H,1H3 toxic in the SR-p53 response assay?\nAssistant: Yes, it is toxic in the SR-p53 response assay."}", "/scratch/micpie/export/sr_p53_tox21/test_0-1.jsonl": "{"text":"The molecule with the SMILES representation of CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C is not showing SR-p53 toxicity."} {"text":"The molecule with the InChI representation of InChI=1S\/C7H3Cl2F3\/c8-4-1-2-5(6(9)3-4)7(10,11)12\/h1-3H is not showing SR-p53 toxicity."}", "/scratch/micpie/export/sr_p53_tox21/valid_0-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of Cc1cnc(C)c(C)n1 is not toxic in the SR-p53 assay."} {"text":"The molecule with the SMILES CC(=O)c1ccc2ccc3ccccc3c2c1 is toxic in the SR-p53 assay."}", "/scratch/micpie/export/sr_p53_tox21/test_0-2.jsonl": "{"text":"Based on the canonical SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C, the molecule has no p53 response toxicity features."} {"text":"Based on the SMILES representation FC(F)(F)c1ccc(Cl)cc1Cl, the molecule has no SR-p53 response toxicity features."}", "/scratch/micpie/export/sr_p53_tox21/valid_0-10.jsonl": "{"text":"User: Can you create the InChI of a molecule that is not toxic in the SR-p53 response assay?\nAssistant: Of course, here you go: InChI=1S\/C7H10N2\/c1-5-4-8-6(2)7(3)9-5\/h4H,1-3H3"} {"text":"User: Can you generate the DeepSMILES of a molecule that is toxic in the p53 response assay?\nAssistant: Of course, here you go: CC=O)cccccccccccc6c%10c%14"}", "/scratch/micpie/export/sr_p53_tox21/train_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-p53 assay.\nMolecule InChI: InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13)\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not 
toxic in the SR-p53 assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-p53 response assay.\nMolecule SMILES: CCBr\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the SR-p53 response assay."}", "/scratch/micpie/export/sr_p53_tox21/valid_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-p53 assay.\nMolecule InChI: InChI=1S\/C7H10N2\/c1-5-4-8-6(2)7(3)9-5\/h4H,1-3H3\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the SR-p53 assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-p53 response assay.\nSMILES: CC(=O)c1ccc2ccc3ccccc3c2c1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is toxic in the SR-p53 response assay."}", "/scratch/micpie/export/sr_p53_tox21/test_0-9.jsonl": "{"text":"User: Is the molecule with the canonical SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C toxic in the SR-p53 response assay?\nAssistant: No, it is not toxic in the SR-p53 response assay."} {"text":"User: Is the molecule with the canonical SMILES FC(F)(F)c1ccc(Cl)cc1Cl toxic in the p53 response assay?\nAssistant: No, it is not toxic in the p53 response assay."}", "/scratch/micpie/export/sr_p53_tox21/test_0-0.jsonl": "{"text":"The molecule with the SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C is not toxic in the SR-p53 assay."} {"text":"The molecule with the canonical SMILES representation of FC(F)(F)c1ccc(Cl)cc1Cl is not toxic in the SR-p53 response assay."}", "/scratch/micpie/export/sr_p53_tox21/valid_0-7.jsonl": "{"text":"Task: Please generate a SMILES based on the description.\nDescription: A molecule that is toxic in the p53 response assay.\nResult: Cc1cnc(C)c(C)n1"} {"text":"Task: Please generate a molecule SMILES based on the text description.\nDescription: A 
molecule that is toxic in the SR-p53 assay.\nResult: CC(=O)c1ccc2ccc3ccccc3c2c1"}", "/scratch/micpie/export/sr_p53_tox21/test_0-3.jsonl": "{"text":"The SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C is from a molecule that is not identified as toxic in the p53 response assay."} {"text":"The canonical SMILES FC(F)(F)c1ccc(Cl)cc1Cl represents a molecule that is not identified as toxic in the SR-p53 response assay."}", "/scratch/micpie/export/sr_p53_tox21/valid_0-11.jsonl": "{"text":"User: I'm searching for the SELFIES of a molecule that is not toxic in the p53 response assay?\nAssistant: This is a molecule that is not toxic in the p53 response assay: [C][C][=C][N][=C][Branch1][C][C][C][Branch1][C][C][=N][Ring1][Branch2]"} {"text":"User: I'm looking for the DeepSMILES of a molecule that is toxic in the SR-p53 response assay?\nAssistant: This is a molecule that is toxic in the SR-p53 response assay: CC=O)cccccccccccc6c%10c%14"}", "/scratch/micpie/export/sr_p53_tox21/train_0-0.jsonl": "{"text":"The molecule with the SELFIES [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N] is not toxic in the SR-p53 response assay."} {"text":"The molecule with the canonical SMILES CCBr is not toxic in the SR-p53 response assay."}", "/scratch/micpie/export/sr_p53_tox21/test_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-p53 assay.\nMolecule DeepSMILES: CCCNCC))CCC))C=O)NccC)cccc6C\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the SR-p53 assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-p53 assay.\nInChI: InChI=1S\/C7H3Cl2F3\/c8-4-1-2-5(6(9)3-4)7(10,11)12\/h1-3H\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the SR-p53 assay."}", 
"/scratch/micpie/export/sr_p53_tox21/train_0-10.jsonl": "{"text":"User: Can you give me the InChI of a molecule that is not toxic in the p53 response assay?\nAssistant: Yes, I'm happy to help, here you go: InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13)"} {"text":"User: Can you give me the DeepSMILES of a molecule that is not toxic in the SR-p53 assay?\nAssistant: Of course, here you go: CCBr"}", "/scratch/micpie/export/sr_p53_tox21/train_0-3.jsonl": "{"text":"The InChI InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13) represents a molecule that is not identified as toxic in the p53 response assay."} {"text":"The InChI InChI=1S\/C2H5Br\/c1-2-3\/h2H2,1H3 represents a molecule that is not identified as toxic in the p53 response assay."}", "/scratch/micpie/export/sr_p53_tox21/train_0-12.jsonl": "{"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be toxic in the p53 response assay.\nAssistant: Ok, here you go, this canonical SMILES is not toxic in the p53 response assay: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} {"text":"User: I want to create a InChI.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be toxic in the SR-p53 assay.\nAssistant: Ok, here you go, this InChI is not toxic in the SR-p53 assay: InChI=1S\/C2H5Br\/c1-2-3\/h2H2,1H3"}", "/scratch/micpie/export/sr_p53_tox21/test_0-13.jsonl": "{"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should not be toxic in the p53 response assay.\nAssistant: Understood, this SMILES is not toxic in the p53 response assay: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the SR-p53 response assay.\nAssistant: Got it, this DeepSMILES is not toxic in the SR-p53 response assay: FCF)F)ccccCl)cc6Cl"}", "/scratch/micpie/export/sr_p53_tox21/valid_0-2.jsonl": "{"text":"Based on the SELFIES [C][C][=C][N][=C][Branch1][C][C][C][Branch1][C][C][=N][Ring1][Branch2], the molecule has no SR-p53 response toxicity characteristics."} {"text":"Based on the canonical SMILES representation CC(=O)c1ccc2ccc3ccccc3c2c1, the molecule has SR-p53 toxicity characteristics."}", "/scratch/micpie/export/sr_p53_tox21/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of CCOc1ccc2nc(S(N)(=O)=O)sc2c1 toxic in the SR-p53 response assay?\nConstraint: Even if you are not sure, you must pick either a or b without using any other words.\nOptions:\na) False\nb) True\nAnswer: a"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [C][C][Br] toxic in the SR-p53 response assay?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any other words.\nOptions:\n1) False\n2) True\nAnswer: 1"}", "/scratch/micpie/export/sr_p53_tox21/valid_0-1.jsonl": "{"text":"The molecule with the SELFIES [C][C][=C][N][=C][Branch1][C][C][C][Branch1][C][C][=N][Ring1][Branch2] is not showing SR-p53 toxicity."} {"text":"The molecule with the SMILES representation of CC(=O)c1ccc2ccc3ccccc3c2c1 is showing SR-p53 toxicity."}", "/scratch/micpie/export/sr_p53_tox21/valid_0-13.jsonl": "{"text":"User: I want to generate a SELFIES.\nAssistant: This 
sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the SR-p53 response assay.\nAssistant: Understood, this SELFIES is not toxic in the SR-p53 response assay: [C][C][=C][N][=C][Branch1][C][C][C][Branch1][C][C][=N][Ring1][Branch2]"} {"text":"User: I want to generate a InChI.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should be toxic in the SR-p53 response assay.\nAssistant: Understood, this InChI is toxic in the SR-p53 response assay: InChI=1S\/C16H12O\/c1-11(17)14-9-8-13-7-6-12-4-2-3-5-15(12)16(13)10-14\/h2-10H,1H3"}", "/scratch/micpie/export/sr_p53_tox21/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the p53 response assay.\nMolecule canonical SMILES: Cc1cnc(C)c(C)n1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-p53 response assay.\nMolecule DeepSMILES: CC=O)cccccccccccc6c%10c%14\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"}", "/scratch/micpie/export/sr_p53_tox21/train_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the p53 response assay?\nConstraint: You must select none, one or more options from A or B without using any other words.\nOptions:\nA.) CCOc1ccc2nc(S(N)(=O)=O)sc2c1\nB.) 
CC(C)(C)CC(=O)OCC(=O)[C@@]12OC(C)(C)O[C@@H]1C[C@H]1[C@@H]3CCC4=CC(=O)C=C[C@]4(C)[C@@]3(F)[C@@H](O)C[C@@]12C\nAnswer: A, B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the p53 response assay?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any other words.\nOptions:\n1: InChI=1S\/C17H21NO4\/c1-11(6-12-2-4-14(19)5-3-12)18-10-17(22)13-7-15(20)9-16(21)8-13\/h2-5,7-9,11,17-22H,6,10H2,1H3\n2: InChI=1S\/C8H11NO\/c9-6-5-7-1-3-8(10)4-2-7\/h1-4,10H,5-6,9H2\n3: InChI=1S\/C2H5Br\/c1-2-3\/h2H2,1H3\nAnswer: 1, 2, 3"}", "/scratch/micpie/export/sr_p53_tox21/valid_0-4.jsonl": "{"text":"The SMILES Cc1cnc(C)c(C)n1 is not toxic in the SR-p53 assay."} {"text":"The molecule canonical SMILES CC(=O)c1ccc2ccc3ccccc3c2c1 is toxic in the SR-p53 assay."}", "/scratch/micpie/export/sr_p53_tox21/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-p53 response assay.\nMolecule DeepSMILES: CCOccccncSN)=O)=O))sc5c9\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-p53 response assay.\nSMILES: CCBr\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/sr_p53_tox21/valid_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the p53 response assay?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any additional words.\nOptions:\n(1) Cc1cnc(C)c(C)n1\n(2) O=C1c2c(O)cccc2[C@H]([C@@H]2O[C@H](CO)[C@@H](O)[C@H](O)[C@H]2O)c2cc(CO)cc(O)c21\n(3) CC(=O)[C@H]1CC[C@H]2[C@@H]3C=CC4=CC(=O)CC[C@@]4(C)[C@@H]3CC[C@]12C\nAnswer: 1, 2, 3"} 
{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are toxic in the SR-p53 assay?\nConstraint: You must select none, one or more options from 1 or 2 without using any additional words.\nOptions:\n1.) CC(=O)c1ccc2ccc3ccccc3c2c1\n2.) O=C(NNCS(=O)(=O)[O-])c1ccncc1\nAnswer: 1"}", "/scratch/micpie/export/sr_p53_tox21/valid_0-12.jsonl": "{"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very curious. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be toxic in the p53 response assay.\nAssistant: Ok, this SMILES is not toxic in the p53 response assay: Cc1cnc(C)c(C)n1"} {"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should be toxic in the SR-p53 response assay.\nAssistant: Got it, here you go, this InChI is toxic in the SR-p53 response assay: InChI=1S\/C16H12O\/c1-11(17)14-9-8-13-7-6-12-4-2-3-5-15(12)16(13)10-14\/h2-10H,1H3"}", "/scratch/micpie/export/sr_p53_tox21/train_0-2.jsonl": "{"text":"Based on the SELFIES representation [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N], the molecule has no SR-p53 toxicity properties."} {"text":"Based on the canonical SMILES CCBr, the molecule has no SR-p53 response toxicity properties."}", "/scratch/micpie/export/sr_p53_tox21/test_0-11.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that is not toxic in the p53 response assay?\nAssistant: This is a molecule that is not toxic in the p53 response assay: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"User: I'm looking for the SELFIES of a molecule that is not toxic in the p53 response assay?\nAssistant: This is a molecule that is not toxic in the p53 response assay: 
[F][C][Branch1][C][F][Branch1][C][F][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Cl]"}", "/scratch/micpie/export/sr_p53_tox21/train_0-7.jsonl": "{"text":"Task: Please give me a SMILES based on the description.\nDescription: A molecule that is toxic in the p53 response assay.\nResult: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} {"text":"Task: Please generate a molecule SMILES based on the text description below.\nDescription: A molecule that is toxic in the SR-p53 assay.\nResult: CCBr"}", "/scratch/micpie/export/sr_p53_tox21/train_0-11.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that is not toxic in the p53 response assay?\nAssistant: This is a molecule that is not toxic in the p53 response assay: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} {"text":"User: I'm searching for the InChI of a molecule that is not toxic in the SR-p53 assay?\nAssistant: This is a molecule that is not toxic in the SR-p53 assay: InChI=1S\/C2H5Br\/c1-2-3\/h2H2,1H3"}", "/scratch/micpie/export/sr_p53_tox21/train_0-1.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N] is not showing SR-p53 toxicity."} {"text":"The molecule with the SMILES representation of CCBr is not showing SR-p53 toxicity."}", "/scratch/micpie/export/sr_p53_tox21/train_0-13.jsonl": "{"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the SR-p53 assay.\nAssistant: Ok, this DeepSMILES is not toxic in the SR-p53 assay: CCOccccncSN)=O)=O))sc5c9"} {"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should not be toxic in the SR-p53 response assay.\nAssistant: Got it, this SMILES is not toxic in the SR-p53 response assay: CCBr"}", "/scratch/micpie/export/sr_p53_tox21/train_0-4.jsonl": "{"text":"The canonical SMILES CCOc1ccc2nc(S(N)(=O)=O)sc2c1 is not toxic in the p53 response assay."} {"text":"The molecule SMILES CCBr is not toxic in the SR-p53 assay."}", "/scratch/micpie/export/sr_p53_tox21/test_0-7.jsonl": "{"text":"Task: Please create a molecule canonical SMILES based on the description below.\nDescription: A molecule that is toxic in the SR-p53 assay.\nResult: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"Task: Please create a InChI based on the text description below.\nDescription: A molecule that is toxic in the SR-p53 assay.\nResult: InChI=1S\/C7H3Cl2F3\/c8-4-1-2-5(6(9)3-4)7(10,11)12\/h1-3H"}", "/scratch/micpie/export/sr_p53_tox21/train_0-9.jsonl": "{"text":"User: Is the molecule with the canonical SMILES CCOc1ccc2nc(S(N)(=O)=O)sc2c1 toxic in the p53 response assay?\nAssistant: No, it is not toxic in the p53 response assay."} {"text":"User: Is the molecule with the canonical SMILES CCBr toxic in the SR-p53 assay?\nAssistant: No, it is not toxic in the SR-p53 assay."}", "/scratch/micpie/export/sr_p53_tox21/valid_0-3.jsonl": "{"text":"The canonical SMILES Cc1cnc(C)c(C)n1 represents a molecule that is not identified as toxic in the SR-p53 assay."} {"text":"The InChI InChI=1S\/C16H12O\/c1-11(17)14-9-8-13-7-6-12-4-2-3-5-15(12)16(13)10-14\/h2-10H,1H3 represents a molecule that is identified as toxic in the SR-p53 assay."}", "/scratch/micpie/export/sr_p53_tox21/test_0-8.jsonl": "{"text":"User: Can you figure out if the molecule with the InChI InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20) is toxic in the p53 response assay?\nAssistant: No, this molecule is not toxic in the p53 response assay."} {"text":"User: Can you tell me if the molecule with 
the SMILES FC(F)(F)c1ccc(Cl)cc1Cl is toxic in the SR-p53 response assay?\nAssistant: No, this molecule is not toxic in the SR-p53 response assay."}", "/scratch/micpie/export/sr_p53_tox21/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C] toxic in the SR-p53 response assay?\nConstraint: Even if you are uncertain, you must pick either a or b without using any additional words.\nOptions:\na True\nb False\nAnswer: b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES FC(F)(F)c1ccc(Cl)cc1Cl toxic in the SR-p53 assay?\nConstraint: Even if you are not sure, you must pick either A or B without using any other words.\nOptions:\nA) False\nB) True\nAnswer: A"}", "/scratch/micpie/export/sr_p53_tox21/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES [C][C][=C][N][=C][Branch1][C][C][C][Branch1][C][C][=N][Ring1][Branch2] toxic in the SR-p53 assay?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any other words.\nOptions:\n1) True\n2) False\nAnswer: 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES [C][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][#Branch2][=C][Ring1][=C] toxic in the p53 response assay?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any additional words.\nOptions:\n1: False\n2: True\nAnswer: 2"}", "/scratch/micpie/export/sr_p53_tox21/test_0-4.jsonl": "{"text":"The molecule SELFIES [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C] is not toxic in the SR-p53 response assay."} 
{"text":"The canonical SMILES FC(F)(F)c1ccc(Cl)cc1Cl is not toxic in the SR-p53 assay."}", "/scratch/micpie/export/sr_p53_tox21/test_0-12.jsonl": "{"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be toxic in the SR-p53 response assay.\nAssistant: Got it, this DeepSMILES is not toxic in the SR-p53 response assay: CCCNCC))CCC))C=O)NccC)cccc6C"} {"text":"User: I want to create a InChI.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be toxic in the p53 response assay.\nAssistant: Got it, here you go, this InChI is not toxic in the p53 response assay: InChI=1S\/C7H3Cl2F3\/c8-4-1-2-5(6(9)3-4)7(10,11)12\/h1-3H"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_5-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SMILES from the canonical SMILES.\nMolecule canonical SMILES: Cc1ccc(N)cc1NC(=O)CSc1ccccc1F\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Cc1ccc(N)cc1NC(=O)CSc1ccccc1F"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the canonical SMILES.\ncanonical SMILES: CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_2-4.jsonl": "{"text":"User: Can you create the SMILES of the molecule with the canonical SMILES NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1?\nAssistant: Of course, this molecule has a SMILES of NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1."} {"text":"User: 
Can you tell me the SMILES of the molecule with the canonical SMILES COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N?\nAssistant: Yes, this molecule has a SMILES of COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_2-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the canonical SMILES.\nMolecule canonical SMILES: Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SMILES from the canonical SMILES.\nMolecule canonical SMILES: Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_4-0.jsonl": "{"text":"The molecule with the canonical SMILES Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1 can also be represented with the SMILES Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1."} {"text":"The molecule with the canonical SMILES Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1 can also be represented with the SMILES Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_4-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the canonical SMILES Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1?\nAssistant: Yes, this molecule has a SMILES of Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1."} {"text":"User: Can you tell me the SMILES of the molecule with the canonical SMILES Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1?\nAssistant: Sure, this molecule has a SMILES of Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1."}", 
"/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_1-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the canonical SMILES COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1."} {"text":"User: Can you generate the SMILES of the molecule with the canonical SMILES COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1?\nAssistant: Sure, this molecule has a SMILES of COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_5-1.jsonl": "{"text":"The molecule with the SMILES Cc1ccc(N)cc1NC(=O)CSc1ccccc1F can also be represented with the canonical SMILES Cc1ccc(N)cc1NC(=O)CSc1ccccc1F."} {"text":"The molecule with the SMILES representation of CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21 can also be represented with the canonical SMILES CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_4-4.jsonl": "{"text":"User: Can you create the SMILES of the molecule with the canonical SMILES CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1?\nAssistant: Sure, this molecule has a SMILES of CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1."} {"text":"User: Can you generate the SMILES of the molecule with the canonical SMILES CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_4-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the canonical SMILES O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12?\nAssistant: Yes, this molecule has a SMILES of O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12."} {"text":"User: Can you create the SMILES of the molecule with the canonical SMILES CC1CC(=O)Nc2cc(S(=O)(=O)N3CCCC3)ccc2S1?\nAssistant: Of course, this molecule has a SMILES of 
CC1CC(=O)Nc2cc(S(=O)(=O)N3CCCC3)ccc2S1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_1-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SMILES from the canonical SMILES.\ncanonical SMILES: CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the canonical SMILES.\ncanonical SMILES: Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_3-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SMILES from the canonical SMILES.\nMolecule canonical SMILES: CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SMILES from the canonical SMILES.\ncanonical SMILES: CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_1-5.jsonl": "{"text":"User: Can you tell me the canonical SMILES of the molecule with the SMILES CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC?\nAssistant: Of course, this molecule has a canonical SMILES of 
CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the SMILES Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_5-0.jsonl": "{"text":"The molecule with the canonical SMILES Cc1ccc(N)cc1NC(=O)CSc1ccccc1F can also be represented with the SMILES representation Cc1ccc(N)cc1NC(=O)CSc1ccccc1F."} {"text":"The molecule with the canonical SMILES CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21 can also be represented with the SMILES representation CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_1-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the SMILES COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC?\nAssistant: Sure, this molecule has a canonical SMILES of COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC."} {"text":"User: Can you create the canonical SMILES of the molecule with the SMILES Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C?\nAssistant: Sure, this molecule has a canonical SMILES of Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_5-4.jsonl": "{"text":"User: Can you create the SMILES of the molecule with the canonical SMILES COCCN1CCN(C(=O)NC(C)C(C)c2ccccc2)CC1=O?\nAssistant: Of course, this molecule has a SMILES of COCCN1CCN(C(=O)NC(C)C(C)c2ccccc2)CC1=O."} {"text":"User: Can you generate the SMILES of the molecule with the canonical SMILES Nc1c2c([nH+]c3ccccc13)CCCC2?\nAssistant: Of course, this molecule has a SMILES of Nc1c2c([nH+]c3ccccc13)CCCC2."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_0-5.jsonl": "{"text":"User: Can you tell me the canonical SMILES of the molecule with the SMILES 
S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the SMILES COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O?\nAssistant: Sure, this molecule has a canonical SMILES of COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_3-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CC(C(=O)OCC(=O)c1ccc(F)cc1)N1C(=O)C2C3C=CC(C3)C2C1=O can also be represented with the SMILES representation CC(C(=O)OCC(=O)c1ccc(F)cc1)N1C(=O)C2C3C=CC(C3)C2C1=O."} {"text":"The molecule with the canonical SMILES representation of CCn1c(=O)n(CC(=O)OCCN2CCCC2=O)c2ccccc21 can also be represented with the SMILES CCn1c(=O)n(CC(=O)OCCN2CCCC2=O)c2ccccc21."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_0-1.jsonl": "{"text":"The molecule with the SMILES S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1 can also be represented with the canonical SMILES representation CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1."} {"text":"The molecule with the SMILES COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O can also be represented with the canonical SMILES COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_5-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1 can also be represented with the SMILES representation CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1."} {"text":"The molecule with the canonical SMILES representation of 
CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1 can also be represented with the SMILES CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_2-0.jsonl": "{"text":"The molecule with the canonical SMILES NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1 can also be represented with the SMILES representation NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1."} {"text":"The molecule with the canonical SMILES COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N can also be represented with the SMILES representation COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_2-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the canonical SMILES.\nMolecule canonical SMILES: CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the canonical SMILES.\nMolecule canonical SMILES: CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_0-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F can also be represented with the SMILES 
S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1."} {"text":"The molecule with the canonical SMILES representation of CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1 can also be represented with the SMILES representation CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_3-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the SMILES.\nSMILES: CC(C(=O)OCC(=O)c1ccc(F)cc1)N1C(=O)C2C3C=CC(C3)C2C1=O\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC(C(=O)OCC(=O)c1ccc(F)cc1)N1C(=O)C2C3C=CC(C3)C2C1=O"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the SMILES.\nSMILES: CCn1c(=O)n(CC(=O)OCCN2CCCC2=O)c2ccccc21\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CCn1c(=O)n(CC(=O)OCCN2CCCC2=O)c2ccccc21"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_5-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the SMILES.\nMolecule SMILES: COCCN1CCN(C(=O)NC(C)C(C)c2ccccc2)CC1=O\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COCCN1CCN(C(=O)NC(C)C(C)c2ccccc2)CC1=O"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the SMILES.\nSMILES: Nc1c2c([nH+]c3ccccc13)CCCC2\nConstraint: Even if you are not sure, you must answer with a representation without using any 
additional words.\nResult: Nc1c2c([nH+]c3ccccc13)CCCC2"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_3-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the SMILES COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1?\nAssistant: Sure, this molecule has a canonical SMILES of COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1."} {"text":"User: Can you generate the canonical SMILES of the molecule with the SMILES CC(=O)Nc1ccc(C(=O)OCCN2CCCC2=O)cc1?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of CC(=O)Nc1ccc(C(=O)OCCN2CCCC2=O)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_5-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the SMILES CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1?\nAssistant: Of course, this molecule has a canonical SMILES of CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the SMILES CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1?\nAssistant: Yes, this molecule has a canonical SMILES of CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_4-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the canonical SMILES from the SMILES.\nSMILES: CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the canonical SMILES from the SMILES.\nMolecule SMILES: CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1\nConstraint: Even if you are not 
sure, you must answer with a representation without using any additional words.\nResult: CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_1-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the SMILES.\nMolecule SMILES: COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the canonical SMILES from the SMILES.\nSMILES: Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_0-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the canonical SMILES.\ncanonical SMILES: CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the canonical SMILES.\ncanonical SMILES: COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O"}", 
"/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_3-0.jsonl": "{"text":"The molecule with the canonical SMILES CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C can also be represented with the SMILES representation CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C."} {"text":"The molecule with the canonical SMILES CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1 can also be represented with the SMILES CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_1-0.jsonl": "{"text":"The molecule with the canonical SMILES COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC can also be represented with the SMILES COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC."} {"text":"The molecule with the canonical SMILES Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C can also be represented with the SMILES Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_5-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the canonical SMILES.\ncanonical SMILES: CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the canonical SMILES.\ncanonical SMILES: CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1"}", 
"/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_2-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the SMILES.\nMolecule SMILES: Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the canonical SMILES from the SMILES.\nMolecule SMILES: Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_1-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the canonical SMILES.\ncanonical SMILES: COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the canonical SMILES.\ncanonical SMILES: COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_5-5.jsonl": "{"text":"User: Can you tell me the canonical SMILES of the molecule with the SMILES Cc1ccc(N)cc1NC(=O)CSc1ccccc1F?\nAssistant: Yes, I'm happy to help, this molecule has a 
canonical SMILES of Cc1ccc(N)cc1NC(=O)CSc1ccccc1F."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the SMILES CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_2-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the canonical SMILES Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2."} {"text":"User: Can you generate the SMILES of the molecule with the canonical SMILES Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1?\nAssistant: Sure, this molecule has a SMILES of Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_3-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the canonical SMILES CC(C(=O)OCC(=O)c1ccc(F)cc1)N1C(=O)C2C3C=CC(C3)C2C1=O?\nAssistant: Of course, this molecule has a SMILES of CC(C(=O)OCC(=O)c1ccc(F)cc1)N1C(=O)C2C3C=CC(C3)C2C1=O."} {"text":"User: Can you tell me the SMILES of the molecule with the canonical SMILES CCn1c(=O)n(CC(=O)OCCN2CCCC2=O)c2ccccc21?\nAssistant: Of course, this molecule has a SMILES of CCn1c(=O)n(CC(=O)OCCN2CCCC2=O)c2ccccc21."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_0-0.jsonl": "{"text":"The molecule with the canonical SMILES CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1 can also be represented with the SMILES S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1."} {"text":"The molecule with the canonical SMILES representation of COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O can also be represented with the SMILES COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O."}", 
"/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_2-5.jsonl": "{"text":"User: Can you tell me the canonical SMILES of the molecule with the SMILES NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1."} {"text":"User: Can you generate the canonical SMILES of the molecule with the SMILES COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_5-5.jsonl": "{"text":"User: Can you tell me the canonical SMILES of the molecule with the SMILES COCCN1CCN(C(=O)NC(C)C(C)c2ccccc2)CC1=O?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of COCCN1CCN(C(=O)NC(C)C(C)c2ccccc2)CC1=O."} {"text":"User: Can you create the canonical SMILES of the molecule with the SMILES Nc1c2c([nH+]c3ccccc13)CCCC2?\nAssistant: Sure, this molecule has a canonical SMILES of Nc1c2c([nH+]c3ccccc13)CCCC2."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_3-5.jsonl": "{"text":"User: Can you generate the canonical SMILES of the molecule with the SMILES CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C?\nAssistant: Sure, this molecule has a canonical SMILES of CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the SMILES CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_2-0.jsonl": "{"text":"The molecule with the canonical SMILES Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2 can also be represented with the SMILES Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2."} {"text":"The molecule with the canonical SMILES 
Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1 can also be represented with the SMILES representation Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_1-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the canonical SMILES COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC?\nAssistant: Of course, this molecule has a SMILES of COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC."} {"text":"User: Can you create the SMILES of the molecule with the canonical SMILES Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C?\nAssistant: Of course, this molecule has a SMILES of Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_2-0.jsonl": "{"text":"The molecule with the canonical SMILES CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12 can also be represented with the SMILES representation CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12."} {"text":"The molecule with the canonical SMILES representation of CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1 can also be represented with the SMILES CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_4-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the canonical SMILES.\nMolecule canonical SMILES: CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SMILES from the canonical SMILES.\nMolecule canonical SMILES: CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: 
CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_0-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the SMILES.\nSMILES: S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the SMILES.\nSMILES: COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_4-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the canonical SMILES.\ncanonical SMILES: Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the canonical SMILES.\ncanonical SMILES: Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_3-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input 
molecule representation and the description.\nDescription: Generate the SMILES from the canonical SMILES.\nMolecule canonical SMILES: CC(C(=O)OCC(=O)c1ccc(F)cc1)N1C(=O)C2C3C=CC(C3)C2C1=O\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CC(C(=O)OCC(=O)c1ccc(F)cc1)N1C(=O)C2C3C=CC(C3)C2C1=O"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SMILES from the canonical SMILES.\ncanonical SMILES: CCn1c(=O)n(CC(=O)OCCN2CCCC2=O)c2ccccc21\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCn1c(=O)n(CC(=O)OCCN2CCCC2=O)c2ccccc21"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_2-1.jsonl": "{"text":"The molecule with the SMILES CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12 can also be represented with the canonical SMILES representation CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12."} {"text":"The molecule with the SMILES representation of CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1 can also be represented with the canonical SMILES representation CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_5-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the canonical SMILES CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1?\nAssistant: Of course, this molecule has a SMILES of CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1."} {"text":"User: Can you create the SMILES of the molecule with the canonical SMILES CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1?\nAssistant: Yes, this molecule has a SMILES of CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1."}", 
"/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_4-5.jsonl": "{"text":"User: Can you generate the canonical SMILES of the molecule with the SMILES Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1?\nAssistant: Yes, this molecule has a canonical SMILES of Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the SMILES Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1?\nAssistant: Yes, this molecule has a canonical SMILES of Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_4-0.jsonl": "{"text":"The molecule with the canonical SMILES CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1 can also be represented with the SMILES representation CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1."} {"text":"The molecule with the canonical SMILES representation of CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1 can also be represented with the SMILES CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_5-1.jsonl": "{"text":"The molecule with the SMILES COCCN1CCN(C(=O)NC(C)C(C)c2ccccc2)CC1=O can also be represented with the canonical SMILES representation COCCN1CCN(C(=O)NC(C)C(C)c2ccccc2)CC1=O."} {"text":"The molecule with the SMILES representation of Nc1c2c([nH+]c3ccccc13)CCCC2 can also be represented with the canonical SMILES Nc1c2c([nH+]c3ccccc13)CCCC2."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_2-1.jsonl": "{"text":"The molecule with the SMILES NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1 can also be represented with the canonical SMILES NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1."} {"text":"The molecule with the SMILES representation of COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N can also be represented with the canonical SMILES representation COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_0-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of 
Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1 can also be represented with the SMILES representation O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C."} {"text":"The molecule with the canonical SMILES representation of COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1 can also be represented with the SMILES COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_0-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the SMILES.\nMolecule SMILES: O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the SMILES.\nMolecule SMILES: COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_1-4.jsonl": "{"text":"User: Can you create the SMILES of the molecule with the canonical SMILES CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC."} {"text":"User: Can you create the SMILES of the molecule with the canonical SMILES Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_2-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the 
molecule with the SMILES Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2?\nAssistant: Sure, this molecule has a canonical SMILES of Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2."} {"text":"User: Can you create the canonical SMILES of the molecule with the SMILES Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_1-1.jsonl": "{"text":"The molecule with the SMILES representation of CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC can also be represented with the canonical SMILES CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC."} {"text":"The molecule with the SMILES Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl can also be represented with the canonical SMILES Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_5-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SMILES from the canonical SMILES.\ncanonical SMILES: COCCN1CCN(C(=O)NC(C)C(C)c2ccccc2)CC1=O\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: COCCN1CCN(C(=O)NC(C)C(C)c2ccccc2)CC1=O"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SMILES from the canonical SMILES.\nMolecule canonical SMILES: Nc1c2c([nH+]c3ccccc13)CCCC2\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: Nc1c2c([nH+]c3ccccc13)CCCC2"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_5-1.jsonl": "{"text":"The molecule with the SMILES representation of CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1 can also be represented with the canonical SMILES representation CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1."} {"text":"The molecule with the 
SMILES representation of CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1 can also be represented with the canonical SMILES representation CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_4-1.jsonl": "{"text":"The molecule with the SMILES representation of O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12 can also be represented with the canonical SMILES O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12."} {"text":"The molecule with the SMILES CC1CC(=O)Nc2cc(S(=O)(=O)N3CCCC3)ccc2S1 can also be represented with the canonical SMILES CC1CC(=O)Nc2cc(S(=O)(=O)N3CCCC3)ccc2S1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_3-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the SMILES.\nSMILES: CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the canonical SMILES from the SMILES.\nMolecule SMILES: CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_3-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the canonical SMILES COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1?\nAssistant: Of course, this molecule has a SMILES of COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1."} {"text":"User: Can you create the SMILES of 
the molecule with the canonical SMILES CC(=O)Nc1ccc(C(=O)OCCN2CCCC2=O)cc1?\nAssistant: Of course, this molecule has a SMILES of CC(=O)Nc1ccc(C(=O)OCCN2CCCC2=O)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_1-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the SMILES.\nSMILES: COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the SMILES.\nSMILES: COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_0-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the canonical SMILES.\ncanonical SMILES: CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SMILES from the canonical SMILES.\nMolecule canonical SMILES: CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1"}", 
"/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_5-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of COCCN1CCN(C(=O)NC(C)C(C)c2ccccc2)CC1=O can also be represented with the SMILES representation COCCN1CCN(C(=O)NC(C)C(C)c2ccccc2)CC1=O."} {"text":"The molecule with the canonical SMILES representation of Nc1c2c([nH+]c3ccccc13)CCCC2 can also be represented with the SMILES Nc1c2c([nH+]c3ccccc13)CCCC2."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_0-1.jsonl": "{"text":"The molecule with the SMILES S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1 can also be represented with the canonical SMILES CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F."} {"text":"The molecule with the SMILES CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1 can also be represented with the canonical SMILES CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_2-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the canonical SMILES CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12?\nAssistant: Of course, this molecule has a SMILES of CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12."} {"text":"User: Can you create the SMILES of the molecule with the canonical SMILES CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1?\nAssistant: Of course, this molecule has a SMILES of CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_2-1.jsonl": "{"text":"The molecule with the SMILES Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2 can also be represented with the canonical SMILES Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2."} {"text":"The molecule with the SMILES representation of Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1 can also be represented with the canonical 
SMILES Cc1ccc(-n2c(SCC(=O)NCc3ccco3)nc3c([nH]c4ccccc43)c2=O)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_1-1.jsonl": "{"text":"The molecule with the SMILES representation of COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1 can also be represented with the canonical SMILES representation COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1."} {"text":"The molecule with the SMILES representation of COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1 can also be represented with the canonical SMILES COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_3-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the SMILES CC(C(=O)OCC(=O)c1ccc(F)cc1)N1C(=O)C2C3C=CC(C3)C2C1=O?\nAssistant: Yes, this molecule has a canonical SMILES of CC(C(=O)OCC(=O)c1ccc(F)cc1)N1C(=O)C2C3C=CC(C3)C2C1=O."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the SMILES CCn1c(=O)n(CC(=O)OCCN2CCCC2=O)c2ccccc21?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of CCn1c(=O)n(CC(=O)OCCN2CCCC2=O)c2ccccc21."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_3-1.jsonl": "{"text":"The molecule with the SMILES representation of CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C can also be represented with the canonical SMILES representation CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C."} {"text":"The molecule with the SMILES representation of CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1 can also be represented with the canonical SMILES CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_0-5.jsonl": "{"text":"User: Can you generate the canonical SMILES of the molecule with the SMILES S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1?\nAssistant: Yes, this molecule has a canonical SMILES of 
CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F."} {"text":"User: Can you create the canonical SMILES of the molecule with the SMILES CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1?\nAssistant: Yes, this molecule has a canonical SMILES of CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_0-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the canonical SMILES CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1."} {"text":"User: Can you tell me the SMILES of the molecule with the canonical SMILES CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_1-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the SMILES.\nSMILES: CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the SMILES.\nMolecule SMILES: Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl"}", 
"/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_5-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the SMILES.\nMolecule SMILES: CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the SMILES.\nSMILES: CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_0-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the SMILES O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1."} {"text":"User: Can you create the canonical SMILES of the molecule with the SMILES COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1?\nAssistant: Of course, this molecule has a canonical SMILES of COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_4-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the SMILES O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12?\nAssistant: Yes, this molecule has a canonical SMILES of O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12."} {"text":"User: Can you generate the canonical SMILES of the molecule with the SMILES 
CC1CC(=O)Nc2cc(S(=O)(=O)N3CCCC3)ccc2S1?\nAssistant: Yes, this molecule has a canonical SMILES of CC1CC(=O)Nc2cc(S(=O)(=O)N3CCCC3)ccc2S1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_1-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC can also be represented with the SMILES CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC."} {"text":"The molecule with the canonical SMILES Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl can also be represented with the SMILES representation Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_0-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the canonical SMILES.\ncanonical SMILES: Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the canonical SMILES.\ncanonical SMILES: COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_4-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the canonical SMILES.\nMolecule canonical SMILES: O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: 
O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the canonical SMILES.\nMolecule canonical SMILES: CC1CC(=O)Nc2cc(S(=O)(=O)N3CCCC3)ccc2S1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC1CC(=O)Nc2cc(S(=O)(=O)N3CCCC3)ccc2S1"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_4-1.jsonl": "{"text":"The molecule with the SMILES CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1 can also be represented with the canonical SMILES CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1."} {"text":"The molecule with the SMILES representation of CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1 can also be represented with the canonical SMILES CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_5-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the SMILES.\nMolecule SMILES: Cc1ccc(N)cc1NC(=O)CSc1ccccc1F\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: Cc1ccc(N)cc1NC(=O)CSc1ccccc1F"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the SMILES.\nSMILES: CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_2-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the canonical SMILES.\ncanonical SMILES: 
NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the canonical SMILES.\nMolecule canonical SMILES: COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_1-1.jsonl": "{"text":"The molecule with the SMILES representation of COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC can also be represented with the canonical SMILES COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC."} {"text":"The molecule with the SMILES representation of Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C can also be represented with the canonical SMILES Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_0-1.jsonl": "{"text":"The molecule with the SMILES O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C can also be represented with the canonical SMILES Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1."} {"text":"The molecule with the SMILES representation of COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1 can also be represented with the canonical SMILES COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_3-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the canonical SMILES CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C."} {"text":"User: Can you tell me the SMILES of the molecule with the canonical SMILES 
CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1?\nAssistant: Yes, this molecule has a SMILES of CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_0-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the canonical SMILES Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1?\nAssistant: Yes, this molecule has a SMILES of O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C."} {"text":"User: Can you tell me the SMILES of the molecule with the canonical SMILES COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1?\nAssistant: Sure, this molecule has a SMILES of COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_1-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1 can also be represented with the SMILES representation COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1."} {"text":"The molecule with the canonical SMILES COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1 can also be represented with the SMILES COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_4-5.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the SMILES CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1?\nAssistant: Yes, this molecule has a canonical SMILES of CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the SMILES CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1?\nAssistant: Yes, this molecule has a canonical SMILES of CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_4-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the SMILES.\nMolecule SMILES: O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12\nConstraint: Even if you 
are uncertain, you must answer with a representation without using any additional words.\nResult: O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the canonical SMILES from the SMILES.\nMolecule SMILES: CC1CC(=O)Nc2cc(S(=O)(=O)N3CCCC3)ccc2S1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC1CC(=O)Nc2cc(S(=O)(=O)N3CCCC3)ccc2S1"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_4-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the canonical SMILES from the SMILES.\nMolecule SMILES: Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the SMILES.\nSMILES: Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_3-1.jsonl": "{"text":"The molecule with the SMILES representation of COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1 can also be represented with the canonical SMILES representation COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1."} {"text":"The molecule with the SMILES representation of CC(=O)Nc1ccc(C(=O)OCCN2CCCC2=O)cc1 can also be represented with the canonical SMILES CC(=O)Nc1ccc(C(=O)OCCN2CCCC2=O)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_2-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation 
and the description.\nDescription: Generate the canonical SMILES from the SMILES.\nMolecule SMILES: CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the SMILES.\nSMILES: CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_3-1.jsonl": "{"text":"The molecule with the SMILES CC(C(=O)OCC(=O)c1ccc(F)cc1)N1C(=O)C2C3C=CC(C3)C2C1=O can also be represented with the canonical SMILES CC(C(=O)OCC(=O)c1ccc(F)cc1)N1C(=O)C2C3C=CC(C3)C2C1=O."} {"text":"The molecule with the SMILES representation of CCn1c(=O)n(CC(=O)OCCN2CCCC2=O)c2ccccc21 can also be represented with the canonical SMILES representation CCn1c(=O)n(CC(=O)OCCN2CCCC2=O)c2ccccc21."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_0-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the SMILES.\nSMILES: S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the canonical SMILES from the SMILES.\nSMILES: 
CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_1-5.jsonl": "{"text":"User: Can you tell me the canonical SMILES of the molecule with the SMILES COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1?\nAssistant: Yes, this molecule has a canonical SMILES of COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1."} {"text":"User: Can you create the canonical SMILES of the molecule with the SMILES COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_4-1.jsonl": "{"text":"The molecule with the SMILES Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1 can also be represented with the canonical SMILES Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1."} {"text":"The molecule with the SMILES Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1 can also be represented with the canonical SMILES Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_5-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the canonical SMILES Cc1ccc(N)cc1NC(=O)CSc1ccccc1F?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of Cc1ccc(N)cc1NC(=O)CSc1ccccc1F."} {"text":"User: Can you create the SMILES of the molecule with the canonical SMILES CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/valid_2-5.jsonl": "{"text":"User: Can you tell me the canonical SMILES of the molecule with the SMILES CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12?\nAssistant: Yes, 
this molecule has a canonical SMILES of CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12."} {"text":"User: Can you create the canonical SMILES of the molecule with the SMILES CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1?\nAssistant: Sure, this molecule has a canonical SMILES of CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_1-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the canonical SMILES.\ncanonical SMILES: COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SMILES from the canonical SMILES.\nMolecule canonical SMILES: Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_0-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the canonical SMILES CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1?\nAssistant: Yes, this molecule has a SMILES of S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1."} {"text":"User: Can you generate the SMILES of the molecule with the canonical SMILES COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O?\nAssistant: Sure, this molecule has a SMILES of COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_3-0.jsonl": "{"text":"The molecule with the canonical SMILES 
COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1 can also be represented with the SMILES COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1."} {"text":"The molecule with the canonical SMILES representation of CC(=O)Nc1ccc(C(=O)OCCN2CCCC2=O)cc1 can also be represented with the SMILES representation CC(=O)Nc1ccc(C(=O)OCCN2CCCC2=O)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/test_2-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the canonical SMILES from the SMILES.\nSMILES: NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the SMILES.\nMolecule SMILES: COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: COc1ccc(-c2nn(-c3ccccc3)cc2C(=O)O)cc1N"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_4-0.jsonl": "{"text":"The molecule with the canonical SMILES O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12 can also be represented with the SMILES O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12."} {"text":"The molecule with the canonical SMILES CC1CC(=O)Nc2cc(S(=O)(=O)N3CCCC3)ccc2S1 can also be represented with the SMILES CC1CC(=O)Nc2cc(S(=O)(=O)N3CCCC3)ccc2S1."}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_3-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the canonical SMILES.\ncanonical SMILES: COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1\nConstraint: Even if you are not sure, you must answer 
with a representation without using any other words.\nResult: COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the canonical SMILES.\nMolecule canonical SMILES: CC(=O)Nc1ccc(C(=O)OCCN2CCCC2=O)cc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CC(=O)Nc1ccc(C(=O)OCCN2CCCC2=O)cc1"}", "/scratch/micpie/export/mol_repr_transl_smiles_canonical/train_3-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the SMILES.\nSMILES: COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the SMILES.\nSMILES: CC(=O)Nc1ccc(C(=O)OCCN2CCCC2=O)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CC(=O)Nc1ccc(C(=O)OCCN2CCCC2=O)cc1"}", "/scratch/micpie/export/herg_karim_et_al/test_0-10.jsonl": "{"text":"User: I'm searching for the DeepSMILES of a molecule that does not block the human ether-à-go-go related gene (hERG) (<10uM)?\nAssistant: This is a molecule that is not a hERG blocker (<10uM): O=CNCCOccccc-cccncCO))c6))))))c6C%10)))))))))))ccccOCCF)F)F))))nc6"} {"text":"User: I'm looking for the SELFIES of a molecule that does not block hERG (<10uM)?\nAssistant: This is a molecule that is not a human ether-à-go-go related gene (hERG) blocker (<10uM): 
[O][=C][Branch2][Ring1][Ring2][N][C][=C][C][=C][Branch1][Branch2][C][=N][N][=N][NH1][Ring1][Branch1][C][=C][Ring1][O][F][C][Branch1][=Branch2][C][C][C][C][C][C][Ring1][=Branch1][N][C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][=N][C][=C][C][Branch1][C][F][=C][Branch1][C][F][C][=C][Ring1][Branch2][Ring2][Ring1][C]"}", "/scratch/micpie/export/herg_karim_et_al/valid_0-8.jsonl": "{"text":"User: Is the molecule with the SMILES N[C@@H](Cn1c(=O)cnc2ccc(F)cc21)C1CCC(NCc2ccc3c(n2)NC(=O)CO3)CC1 a hERG blocker (<10uM)?\nAssistant: No, it is not a hERG blocker (<10uM)."} {"text":"User: Is the molecule with the DeepSMILES O=ccc[C@H]CCN[C@@H]Ccccccc6)))))))C6))))))o[nH]5 a hERG blocker (<10uM)?\nAssistant: No, it is not a hERG blocker (<10uM)."}", "/scratch/micpie/export/herg_karim_et_al/train_0-8.jsonl": "{"text":"User: Is the molecule with the SMILES COc1cc(N2Cc3ccc(Sc4ccc(F)cc4)nc3C2=O)ccc1OCCN1CCCC1 a hERG blocking compound (<10uM)?\nAssistant: No, it is not a hERG blocking compound (<10uM)."} {"text":"User: Is the molecule with the SMILES O=C(Nc1ccc(-c2nnc(NCCCN3CCCCC3)o2)cc1)c1c(F)cccc1F a hERG blocker (<10uM)?\nAssistant: Yes, it is a hERG blocker (<10uM)."}", "/scratch/micpie/export/herg_karim_et_al/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a human ether-à-go-go related gene (hERG) blocker (<10uM).\nDeepSMILES: O=CNCCOccccc-cccncCO))c6))))))c6C%10)))))))))))ccccOCCF)F)F))))nc6\nConstraint: Answer the question in a full sentence.\nResult: This molecule is no human ether-à-go-go related gene (hERG) blocker (<10uM)."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a hERG blocker (<10uM).\ncanonical SMILES: O=C(Nc1ccc(-c2nnn[nH]2)cc1F)C(C1CCCCC1)n1c(-c2ccc(Cl)cc2)nc2cc(F)c(F)cc21\nConstraint: Answer the question in a full sentence.\nResult: This molecule is no hERG blocker (<10uM)."}", 
"/scratch/micpie/export/herg_karim_et_al/valid_0-9.jsonl": "{"text":"User: Can you generate the SMILES of a molecule that is not a hERG blocking compound (<10uM)?\nAssistant: Yes, I'm happy to help, here you go: N[C@@H](Cn1c(=O)cnc2ccc(F)cc21)C1CCC(NCc2ccc3c(n2)NC(=O)CO3)CC1"} {"text":"User: Can you generate the DeepSMILES of a molecule that is not a hERG blocker (<10uM)?\nAssistant: Yes, here you go: O=ccc[C@H]CCN[C@@H]Ccccccc6)))))))C6))))))o[nH]5"}", "/scratch/micpie/export/herg_karim_et_al/test_0-1.jsonl": "{"text":"Based on the SMILES representation O=C(NC1COc2cccc(-c3ccnc(CO)c3)c2C1)c1ccc(OCC(F)(F)F)nc1, the molecule is not a human ether-à-go-go related gene (hERG) blocker (>= 10uM)."} {"text":"Based on the DeepSMILES representation O=CNcccc-cnnn[nH]5)))))cc6F))))))))CCCCCCC6))))))nc-ccccCl)cc6))))))ncccF)cF)cc69, the molecule is not a human ether-à-go-go related gene (hERG) blocker (>= 10uM)."}", "/scratch/micpie/export/herg_karim_et_al/valid_0-0.jsonl": "{"text":"The molecule with the SMILES representation of N[C@@H](Cn1c(=O)cnc2ccc(F)cc21)C1CCC(NCc2ccc3c(n2)NC(=O)CO3)CC1 is not a hERG blocker (>= 10uM)."} {"text":"The molecule with the DeepSMILES O=ccc[C@H]CCN[C@@H]Ccccccc6)))))))C6))))))o[nH]5 is not a hERG blocker (>= 10uM)."}", "/scratch/micpie/export/herg_karim_et_al/test_0-2.jsonl": "{"text":"The InChI InChI=1S\/C23H20F3N3O4\/c24-23(25,26)13-33-21-5-4-15(10-28-21)22(31)29-17-9-19-18(2-1-3-20(19)32-12-17)14-6-7-27-16(8-14)11-30\/h1-8,10,17,30H,9,11-13H2,(H,29,31) is from a molecule that is not a hERG blocker (>= 10uM)."} {"text":"The DeepSMILES O=CNcccc-cnnn[nH]5)))))cc6F))))))))CCCCCCC6))))))nc-ccccCl)cc6))))))ncccF)cF)cc69 is from a molecule that is not a hERG blocker (>= 10uM)."}", "/scratch/micpie/export/herg_karim_et_al/valid_0-10.jsonl": "{"text":"User: I'm searching for the InChI of a molecule that does not block hERG (<10uM)?\nAssistant: This is a molecule that is not a hERG blocking compound (<10uM): 
InChI=1S\/C24H27FN6O3\/c25-15-3-7-19-20(9-15)31(23(33)11-28-19)12-18(26)14-1-4-16(5-2-14)27-10-17-6-8-21-24(29-17)30-22(32)13-34-21\/h3,6-9,11,14,16,18,27H,1-2,4-5,10,12-13,26H2,(H,29,30,32)\/t14?,16?,18-\/m0\/s1"} {"text":"User: I'm looking for the SMILES of a molecule that does not block the human ether-à-go-go related gene (hERG) (<10uM)?\nAssistant: This is a molecule that is not a human ether-à-go-go related gene (hERG) blocker (<10uM): O=c1cc([C@H]2CCN[C@@H](Cc3ccccc3)C2)o[nH]1"}", "/scratch/micpie/export/herg_karim_et_al/train_0-6.jsonl": "{"text":"Task: Please give me a DeepSMILES based on the text description below.\nDescription: A molecule that is a hERG blocker (<10uM).\nResult: COcccNCccccSccccF)cc6)))))))nc6C9=O))))))))))ccc6OCCNCCCC5"} {"text":"Task: Please create a molecule SMILES based on the description below.\nDescription: A molecule that is a hERG blocker (<10uM).\nResult: O=C(Nc1ccc(-c2nnc(NCCCN3CCCCC3)o2)cc1)c1c(F)cccc1F"}", "/scratch/micpie/export/herg_karim_et_al/valid_0-6.jsonl": "{"text":"Task: Please generate a molecule DeepSMILES based on the description.\nDescription: A molecule that is a hERG blocker (<10uM).\nResult: N[C@@H]Cnc=O)cnccccF)cc6%10)))))))))))CCCCNCcccccn6)NC=O)CO6)))))))))))CC6"} {"text":"Task: Please create a molecule InChI based on the text description.\nDescription: A molecule that is a human ether-à-go-go related gene (hERG) blocker (<10uM).\nResult: InChI=1S\/C15H18N2O2\/c18-15-10-14(19-17-15)12-6-7-16-13(9-12)8-11-4-2-1-3-5-11\/h1-5,10,12-13,16H,6-9H2,(H,17,18)\/t12-,13-\/m0\/s1"}", "/scratch/micpie/export/herg_karim_et_al/test_0-9.jsonl": "{"text":"User: Can you give me the SMILES of a molecule that is not a human ether-à-go-go related gene (hERG) blocker (<10uM)?\nAssistant: Yes, I'm happy to help, here you go: O=C(NC1COc2cccc(-c3ccnc(CO)c3)c2C1)c1ccc(OCC(F)(F)F)nc1"} {"text":"User: Can you create the canonical SMILES of a molecule that is not a human ether-à-go-go related gene (hERG) blocker (<10uM)?\nAssistant: 
Yes, here you go: O=C(Nc1ccc(-c2nnn[nH]2)cc1F)C(C1CCCCC1)n1c(-c2ccc(Cl)cc2)nc2cc(F)c(F)cc21"}", "/scratch/micpie/export/herg_karim_et_al/test_0-0.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C23H20F3N3O4\/c24-23(25,26)13-33-21-5-4-15(10-28-21)22(31)29-17-9-19-18(2-1-3-20(19)32-12-17)14-6-7-27-16(8-14)11-30\/h1-8,10,17,30H,9,11-13H2,(H,29,31) is not a hERG blocker (>= 10uM)."} {"text":"The molecule with the SMILES representation of O=C(Nc1ccc(-c2nnn[nH]2)cc1F)C(C1CCCCC1)n1c(-c2ccc(Cl)cc2)nc2cc(F)c(F)cc21 is not a hERG blocker (>= 10uM)."}", "/scratch/micpie/export/herg_karim_et_al/valid_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the InChI InChI=1S\/C24H27FN6O3\/c25-15-3-7-19-20(9-15)31(23(33)11-28-19)12-18(26)14-1-4-16(5-2-14)27-10-17-6-8-21-24(29-17)30-22(32)13-34-21\/h3,6-9,11,14,16,18,27H,1-2,4-5,10,12-13,26H2,(H,29,30,32)\/t14?,16?,18-\/m0\/s1 is a hERG blocking compound (<10uM)?\nAssistant: No, this molecule is not a hERG blocking compound (<10uM)."} {"text":"User: Can you estimate if the molecule with the canonical SMILES O=c1cc([C@H]2CCN[C@@H](Cc3ccccc3)C2)o[nH]1 is a hERG blocking compound (<10uM)?\nAssistant: No, this molecule is not a hERG blocking compound (<10uM)."}", "/scratch/micpie/export/herg_karim_et_al/test_0-3.jsonl": "{"text":"The DeepSMILES O=CNCCOccccc-cccncCO))c6))))))c6C%10)))))))))))ccccOCCF)F)F))))nc6 is not a human ether-à-go-go related gene (hERG) blocker (>= 10uM)."} {"text":"The InChI InChI=1S\/C28H23ClF3N7O\/c29-18-9-6-16(7-10-18)27-33-23-13-19(30)20(31)14-24(23)39(27)25(15-4-2-1-3-5-15)28(40)34-22-11-8-17(12-21(22)32)26-35-37-38-36-26\/h6-15,25H,1-5H2,(H,34,40)(H,35,36,37,38) is not a human ether-à-go-go related gene (hERG) blocker (>= 10uM)."}", "/scratch/micpie/export/herg_karim_et_al/valid_0-11.jsonl": "{"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should not be a human ether-à-go-go related gene (hERG) blocker (<10uM).\nAssistant: Got it, this DeepSMILES is not a human ether-à-go-go related gene (hERG) blocker (<10uM): N[C@@H]Cnc=O)cnccccF)cc6%10)))))))))))CCCCNCcccccn6)NC=O)CO6)))))))))))CC6"} {"text":"User: I want to generate a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be a human ether-à-go-go related gene (hERG) blocker (<10uM).\nAssistant: Ok, this canonical SMILES is not a human ether-à-go-go related gene (hERG) blocker (<10uM): O=c1cc([C@H]2CCN[C@@H](Cc3ccccc3)C2)o[nH]1"}", "/scratch/micpie/export/herg_karim_et_al/train_0-0.jsonl": "{"text":"The molecule with the SELFIES [C][O][C][=C][C][Branch2][Ring1][=N][N][C][C][=C][C][=C][Branch1][=N][S][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][N][=C][Ring1][=C][C][Ring1][P][=O][=C][C][=C][Ring2][Ring1][Branch2][O][C][C][N][C][C][C][C][Ring1][Branch1] is not a hERG blocker (>= 10uM)."} {"text":"The molecule with the SELFIES [O][=C][Branch2][Ring2][Ring1][N][C][=C][C][=C][Branch2][Ring1][=Branch1][C][=N][N][=C][Branch1][=N][N][C][C][C][N][C][C][C][C][C][Ring1][=Branch1][O][Ring1][#C][C][=C][Ring2][Ring1][Branch1][C][=C][Branch1][C][F][C][=C][C][=C][Ring1][#Branch1][F] is a hERG blocker (<10uM)."}", "/scratch/micpie/export/herg_karim_et_al/test_0-6.jsonl": "{"text":"Task: Please generate a DeepSMILES based on the text description below.\nDescription: A molecule that is a hERG blocking compound (<10uM).\nResult: O=CNCCOccccc-cccncCO))c6))))))c6C%10)))))))))))ccccOCCF)F)F))))nc6"} {"text":"Task: Please create a canonical SMILES based on the text description.\nDescription: A molecule that is a hERG blocking compound (<10uM).\nResult: O=C(Nc1ccc(-c2nnn[nH]2)cc1F)C(C1CCCCC1)n1c(-c2ccc(Cl)cc2)nc2cc(F)c(F)cc21"}", "/scratch/micpie/export/herg_karim_et_al/train_0-10.jsonl": "{"text":"User: I'm searching for the SELFIES of a 
molecule that does not block hERG (<10uM)?\nAssistant: This is a molecule that is not a human ether-à-go-go related gene (hERG) blocking compound (<10uM): [C][O][C][=C][C][Branch2][Ring1][=N][N][C][C][=C][C][=C][Branch1][=N][S][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][N][=C][Ring1][=C][C][Ring1][P][=O][=C][C][=C][Ring2][Ring1][Branch2][O][C][C][N][C][C][C][C][Ring1][Branch1]"} {"text":"User: I'm looking for the SELFIES of a molecule that does block hERG (<10uM)?\nAssistant: This is a molecule that is a hERG blocker (<10uM): [O][=C][Branch2][Ring2][Ring1][N][C][=C][C][=C][Branch2][Ring1][=Branch1][C][=N][N][=C][Branch1][=N][N][C][C][C][N][C][C][C][C][C][Ring1][=Branch1][O][Ring1][#C][C][=C][Ring2][Ring1][Branch1][C][=C][Branch1][C][F][C][=C][C][=C][Ring1][#Branch1][F]"}", "/scratch/micpie/export/herg_karim_et_al/train_0-3.jsonl": "{"text":"The molecule InChI InChI=1S\/C26H26FN3O3S\/c1-32-23-16-20(7-10-22(23)33-15-14-29-12-2-3-13-29)30-17-18-4-11-24(28-25(18)26(30)31)34-21-8-5-19(27)6-9-21\/h4-11,16H,2-3,12-15,17H2,1H3 is not a human ether-à-go-go related gene (hERG) blocker (>= 10uM)."} {"text":"The canonical SMILES O=C(Nc1ccc(-c2nnc(NCCCN3CCCCC3)o2)cc1)c1c(F)cccc1F is a human ether-à-go-go related gene (hERG) blocker (<10uM)."}", "/scratch/micpie/export/herg_karim_et_al/train_0-12.jsonl": "{"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be a human ether-à-go-go related gene (hERG) blocker (<10uM).\nAssistant: Got it, this SELFIES is not a human ether-à-go-go related gene (hERG) blocker (<10uM): [C][O][C][=C][C][Branch2][Ring1][=N][N][C][C][=C][C][=C][Branch1][=N][S][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][N][=C][Ring1][=C][C][Ring1][P][=O][=C][C][=C][Ring2][Ring1][Branch2][O][C][C][N][C][C][C][C][Ring1][Branch1]"} {"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should be a hERG blocker (<10uM).\nAssistant: Understood, this DeepSMILES is a hERG blocker (<10uM): O=CNcccc-cnncNCCCNCCCCC6))))))))))o5)))))cc6)))))))ccF)cccc6F"}", "/scratch/micpie/export/herg_karim_et_al/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a hERG blocking compound (<10uM)?\nConstraint: You must select none, one or more options from a, b, c, or d without using any additional words.\nOptions:\na) N[C@H]1CN(c2ccncc2Nc2ncc3ccc(-c4c(F)cccc4F)nn23)C[C@@H](N)[C@@H]1O\nb) CC(c1ccnc(Cl)c1)N1C2CCC1CC(Oc1cccc(C(N)=O)c1)C2\nc) CCN1CC2CCCC[C@]2(c2ccc(Cl)c(Cl)c2)C1\nd) O=C(NC1COc2cccc(-c3ccnc(CO)c3)c2C1)c1ccc(OCC(F)(F)F)nc1\nAnswer: a, d"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a human ether-à-go-go related gene (hERG) blocker (<10uM)?\nConstraint: You must select none, one or more options from a, b, or c without using any additional words.\nOptions:\na O=CNcccc-cnnn[nH]5)))))cc6F))))))))CCCCCCC6))))))nc-ccccCl)cc6))))))ncccF)cF)cc69\nb Ccncsc5-cnncSCCCNCC[C@]C[C@@H]3ccccCF)F)F))cc6))))))))C5)))))))))n5C\nc OcccccncCccccOcccccc6)))))))cc6)))))))[nH]c95\nAnswer: a"}", "/scratch/micpie/export/herg_karim_et_al/valid_0-2.jsonl": "{"text":"The DeepSMILES N[C@@H]Cnc=O)cnccccF)cc6%10)))))))))))CCCCNCcccccn6)NC=O)CO6)))))))))))CC6 is from a molecule that is not a hERG blocker (>= 10uM)."} {"text":"The InChI InChI=1S\/C15H18N2O2\/c18-15-10-14(19-17-15)12-6-7-16-13(9-12)8-11-4-2-1-3-5-11\/h1-5,10,12-13,16H,6-9H2,(H,17,18)\/t12-,13-\/m0\/s1 is from a molecule that is not a hERG blocker (>= 10uM)."}", "/scratch/micpie/export/herg_karim_et_al/valid_0-1.jsonl": "{"text":"Based on the SELFIES representation 
[N][C@@H1][Branch2][Ring1][=Branch1][C][N][C][=Branch1][C][=O][C][=N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][Ring1][N][C][C][C][C][Branch2][Ring1][=Branch1][N][C][C][=C][C][=C][C][=Branch1][Ring2][=N][Ring1][=Branch1][N][C][=Branch1][C][=O][C][O][Ring1][Branch2][C][C][Ring2][Ring1][Ring1], the molecule is not a human ether-à-go-go related gene (hERG) blocker (>= 10uM)."} {"text":"Based on the InChI representation InChI=1S\/C15H18N2O2\/c18-15-10-14(19-17-15)12-6-7-16-13(9-12)8-11-4-2-1-3-5-11\/h1-5,10,12-13,16H,6-9H2,(H,17,18)\/t12-,13-\/m0\/s1, the molecule is not a human ether-à-go-go related gene (hERG) blocker (>= 10uM)."}", "/scratch/micpie/export/herg_karim_et_al/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a human ether-à-go-go related gene (hERG) blocking compound (<10uM)?\nConstraint: You must select none, one or more options from A, B, or C without using any additional words.\nOptions:\n[A] [C][C][=C][C][=C][C][Branch2][Branch1][#Branch1][C@H1][C][C][C@H1][Branch2][Ring2][O][C][N][C][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][=Branch2][C][=N][C][=C][C][Branch1][Ring1][C][#N][=C][C][Branch1][=Branch1][C][Branch1][C][C][C][=C][Ring1][O][O][Ring1][=C][C][=C][Ring2][Ring1][Ring2][C][C][Ring2][Ring1][=C][=C][Ring2][Ring2][Ring2]\n[B] [C][N][C][C][N][Branch2][Ring2][#Branch2][C][=C][C][=C][C][=C][C][=C][Branch2][Ring1][#Branch1][O][C][C][=N][O][C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][=N][Ring1][N][C][=C][Ring2][Ring1][Branch2][Ring2][Ring1][Ring2][C][C][Ring2][Ring1][=C]\n[C] [N][C@@H1][Branch2][Ring1][=Branch1][C][N][C][=Branch1][C][=O][C][=N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][Ring1][N][C][C][C][C][Branch2][Ring1][=Branch1][N][C][C][=C][C][=C][C][=Branch1][Ring2][=N][Ring1][=Branch1][N][C][=Branch1][C][=O][C][O][Ring1][Branch2][C][C][Ring2][Ring1][Ring1]\nAnswer: A, C"} {"text":"Task: Please answer the multiple choice 
question.\nQuestion: Which molecules are not a hERG blocker (<10uM)?\nConstraint: You must select none, one or more options from a, b, c, or d without using any additional words.\nOptions:\na) O=c1cc([C@H]2CCN[C@@H](Cc3ccccc3)C2)o[nH]1\nb) C[n+]1c(CCCOc2ccccc2Cl)cccc1CCCOc1ccccc1Cl\nc) Nc1cc(C(F)(F)F)c(-c2cc(N3CCOCC3)nc(N3CCOCC3)n2)cn1\nd) CC1(c2cc(C(F)(F)F)cc(C(F)(F)F)c2)CCN([C@]2(c3ccccc3)CC[C@@H](N3CCN(c4ccccc4Cl)C(=O)C3)CC2)C1=O\nAnswer: a, c"}", "/scratch/micpie/export/herg_karim_et_al/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a hERG blocking compound (<10uM).\ncanonical SMILES: N[C@@H](Cn1c(=O)cnc2ccc(F)cc21)C1CCC(NCc2ccc3c(n2)NC(=O)CO3)CC1\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is no hERG blocking compound (<10uM)."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a human ether-à-go-go related gene (hERG) blocking compound (<10uM).\nInChI: InChI=1S\/C15H18N2O2\/c18-15-10-14(19-17-15)12-6-7-16-13(9-12)8-11-4-2-1-3-5-11\/h1-5,10,12-13,16H,6-9H2,(H,17,18)\/t12-,13-\/m0\/s1\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is no human ether-à-go-go related gene (hERG) blocking compound (<10uM)."}", "/scratch/micpie/export/herg_karim_et_al/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that does block hERG (<10uM).\nSELFIES: [N][C@@H1][Branch2][Ring1][=Branch1][C][N][C][=Branch1][C][=O][C][=N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][Ring1][N][C][C][C][C][Branch2][Ring1][=Branch1][N][C][C][=C][C][=C][C][=Branch1][Ring2][=N][Ring1][=Branch1][N][C][=Branch1][C][=O][C][O][Ring1][Branch2][C][C][Ring2][Ring1][Ring1]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on 
the description.\nDescription: A molecule that does block hERG (<10uM).\nInChI: InChI=1S\/C15H18N2O2\/c18-15-10-14(19-17-15)12-6-7-16-13(9-12)8-11-4-2-1-3-5-11\/h1-5,10,12-13,16H,6-9H2,(H,17,18)\/t12-,13-\/m0\/s1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/herg_karim_et_al/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a hERG blocking compound (<10uM).\nMolecule InChI: InChI=1S\/C26H26FN3O3S\/c1-32-23-16-20(7-10-22(23)33-15-14-29-12-2-3-13-29)30-17-18-4-11-24(28-25(18)26(30)31)34-21-8-5-19(27)6-9-21\/h4-11,16H,2-3,12-15,17H2,1H3\nConstraint: Answer the question in a full sentence.\nResult: This molecule is no hERG blocking compound (<10uM)."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a hERG blocker (<10uM).\ncanonical SMILES: O=C(Nc1ccc(-c2nnc(NCCCN3CCCCC3)o2)cc1)c1c(F)cccc1F\nConstraint: Answer the question in a full sentence.\nResult: This molecule is hERG blocker (<10uM)."}", "/scratch/micpie/export/herg_karim_et_al/valid_0-12.jsonl": "{"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be a human ether-à-go-go related gene (hERG) blocking compound (<10uM).\nAssistant: Ok, this canonical SMILES is not a human ether-à-go-go related gene (hERG) blocking compound (<10uM): N[C@@H](Cn1c(=O)cnc2ccc(F)cc21)C1CCC(NCc2ccc3c(n2)NC(=O)CO3)CC1"} {"text":"User: I want to generate a InChI.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be a hERG blocking compound (<10uM).\nAssistant: Understood, this InChI is not a hERG blocking compound (<10uM): InChI=1S\/C15H18N2O2\/c18-15-10-14(19-17-15)12-6-7-16-13(9-12)8-11-4-2-1-3-5-11\/h1-5,10,12-13,16H,6-9H2,(H,17,18)\/t12-,13-\/m0\/s1"}", "/scratch/micpie/export/herg_karim_et_al/train_0-2.jsonl": "{"text":"The canonical SMILES COc1cc(N2Cc3ccc(Sc4ccc(F)cc4)nc3C2=O)ccc1OCCN1CCCC1 is from a molecule that is not a hERG blocker (>= 10uM)."} {"text":"The SMILES O=C(Nc1ccc(-c2nnc(NCCCN3CCCCC3)o2)cc1)c1c(F)cccc1F represents a molecule that is a hERG blocker (<10uM)."}", "/scratch/micpie/export/herg_karim_et_al/test_0-11.jsonl": "{"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be a hERG blocking compound (<10uM).\nAssistant: Got it, this SMILES is not a hERG blocking compound (<10uM): O=C(NC1COc2cccc(-c3ccnc(CO)c3)c2C1)c1ccc(OCC(F)(F)F)nc1"} {"text":"User: I want to create a molecule SELFIES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should not be a human ether-à-go-go related gene (hERG) blocking compound (<10uM).\nAssistant: Got it, here you go, this SELFIES is not a human ether-à-go-go related gene (hERG) blocking compound (<10uM): [O][=C][Branch2][Ring1][Ring2][N][C][=C][C][=C][Branch1][Branch2][C][=N][N][=N][NH1][Ring1][Branch1][C][=C][Ring1][O][F][C][Branch1][=Branch2][C][C][C][C][C][C][Ring1][=Branch1][N][C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][=N][C][=C][C][Branch1][C][F][=C][Branch1][C][F][C][=C][Ring1][Branch2][Ring2][Ring1][C]"}", "/scratch/micpie/export/herg_karim_et_al/train_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the DeepSMILES COcccNCccccSccccF)cc6)))))))nc6C9=O))))))))))ccc6OCCNCCCC5 is a hERG blocker (<10uM)?\nAssistant: No, this molecule is not a hERG blocker (<10uM)."} {"text":"User: Can you derive if the molecule with the SMILES O=C(Nc1ccc(-c2nnc(NCCCN3CCCCC3)o2)cc1)c1c(F)cccc1F is a human ether-à-go-go related gene (hERG) blocking compound (<10uM)?\nAssistant: Yes, this molecule is a human ether-à-go-go related gene (hERG) blocking compound (<10uM)."}", "/scratch/micpie/export/herg_karim_et_al/train_0-11.jsonl": "{"text":"User: I want to create a SELFIES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be a human ether-à-go-go related gene (hERG) blocking compound (<10uM).\nAssistant: Got it, here you go, this SELFIES is not a human ether-à-go-go related gene (hERG) blocking compound (<10uM): [C][O][C][=C][C][Branch2][Ring1][=N][N][C][C][=C][C][=C][Branch1][=N][S][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][N][=C][Ring1][=C][C][Ring1][P][=O][=C][C][=C][Ring2][Ring1][Branch2][O][C][C][N][C][C][C][C][Ring1][Branch1]"} {"text":"User: I want to come up with a InChI.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should be a hERG blocking compound (<10uM).\nAssistant: Got it, here you go, this InChI is a hERG blocking compound (<10uM): InChI=1S\/C23H25F2N5O2\/c24-18-6-4-7-19(25)20(18)21(31)27-17-10-8-16(9-11-17)22-28-29-23(32-22)26-12-5-15-30-13-2-1-3-14-30\/h4,6-11H,1-3,5,12-15H2,(H,26,29)(H,27,31)"}", "/scratch/micpie/export/herg_karim_et_al/train_0-1.jsonl": "{"text":"Based on the InChI InChI=1S\/C26H26FN3O3S\/c1-32-23-16-20(7-10-22(23)33-15-14-29-12-2-3-13-29)30-17-18-4-11-24(28-25(18)26(30)31)34-21-8-5-19(27)6-9-21\/h4-11,16H,2-3,12-15,17H2,1H3, the molecule is not a human ether-à-go-go related gene (hERG) blocker (>= 10uM)."} {"text":"Based on the canonical SMILES O=C(Nc1ccc(-c2nnc(NCCCN3CCCCC3)o2)cc1)c1c(F)cccc1F, the molecule is a human ether-à-go-go related gene (hERG) blocker (<10uM)."}", "/scratch/micpie/export/herg_karim_et_al/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a hERG blocking compound (<10uM)?\nConstraint: You must select none, one or more options from a, b, or c without using any additional words.\nOptions:\n[a] InChI=1S\/C28H28F3N5S\/c1-18-7-12-22-23(5-3-6-24(22)32-18)25-33-34-26(35(25)2)37-14-4-13-36-16-21-15-27(21,17-36)19-8-10-20(11-9-19)28(29,30)31\/h3,5-12,21H,4,13-17H2,1-2H3\n[b] InChI=1S\/C26H26FN3O3S\/c1-32-23-16-20(7-10-22(23)33-15-14-29-12-2-3-13-29)30-17-18-4-11-24(28-25(18)26(30)31)34-21-8-5-19(27)6-9-21\/h4-11,16H,2-3,12-15,17H2,1H3\n[c] InChI=1S\/C21H24F3N3S\/c1-25-11-13-26(14-12-25)9-4-10-27-17-5-2-3-6-19(17)28-20-8-7-16(15-18(20)27)21(22,23)24\/h2-3,5-8,15H,4,9-14H2,1H3\nAnswer: b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are a hERG blocker (<10uM)?\nConstraint: You must select none, one or more options from A or B without using any additional words.\nOptions:\nA O=S(=O)(c1ccc(C=Cc2cccc(F)c2)nc1)c1ccccc1F\nB O=C(Nc1ccc(-c2nnc(NCCCN3CCCCC3)o2)cc1)c1c(F)cccc1F\nAnswer: A, B"}", 
"/scratch/micpie/export/herg_karim_et_al/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that does block hERG (<10uM).\nMolecule InChI: InChI=1S\/C26H26FN3O3S\/c1-32-23-16-20(7-10-22(23)33-15-14-29-12-2-3-13-29)30-17-18-4-11-24(28-25(18)26(30)31)34-21-8-5-19(27)6-9-21\/h4-11,16H,2-3,12-15,17H2,1H3\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that does block the human ether-à-go-go related gene (hERG) (<10uM).\nMolecule InChI: InChI=1S\/C23H25F2N5O2\/c24-18-6-4-7-19(25)20(18)21(31)27-17-10-8-16(9-11-17)22-28-29-23(32-22)26-12-5-15-30-13-2-1-3-14-30\/h4,6-11H,1-3,5,12-15H2,(H,26,29)(H,27,31)\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"}", "/scratch/micpie/export/herg_karim_et_al/test_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the SELFIES [O][=C][Branch2][Ring1][#C][N][C][C][O][C][=C][C][=C][C][Branch1][=N][C][=C][C][=N][C][Branch1][Ring1][C][O][=C][Ring1][Branch2][=C][Ring1][=C][C][Ring2][Ring1][C][C][=C][C][=C][Branch1][O][O][C][C][Branch1][C][F][Branch1][C][F][F][N][=C][Ring1][N] is a human ether-à-go-go related gene (hERG) blocking compound (<10uM)?\nAssistant: No, this molecule is not a human ether-à-go-go related gene (hERG) blocking compound (<10uM)."} {"text":"User: Can you estimate if the molecule with the SELFIES [O][=C][Branch2][Ring1][Ring2][N][C][=C][C][=C][Branch1][Branch2][C][=N][N][=N][NH1][Ring1][Branch1][C][=C][Ring1][O][F][C][Branch1][=Branch2][C][C][C][C][C][C][Ring1][=Branch1][N][C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][=N][C][=C][C][Branch1][C][F][=C][Branch1][C][F][C][=C][Ring1][Branch2][Ring2][Ring1][C] is a hERG blocking compound (<10uM)?\nAssistant: No, this 
molecule is not a hERG blocking compound (<10uM)."}", "/scratch/micpie/export/herg_karim_et_al/train_0-9.jsonl": "{"text":"User: Can you give me the SMILES of a molecule that is not a hERG blocking compound (<10uM)?\nAssistant: Yes, here you go: COc1cc(N2Cc3ccc(Sc4ccc(F)cc4)nc3C2=O)ccc1OCCN1CCCC1"} {"text":"User: Can you create the DeepSMILES of a molecule that is a human ether-à-go-go related gene (hERG) blocker (<10uM)?\nAssistant: Yes, here you go: O=CNcccc-cnncNCCCNCCCCC6))))))))))o5)))))cc6)))))))ccF)cccc6F"}", "/scratch/micpie/export/herg_karim_et_al/valid_0-3.jsonl": "{"text":"The InChI InChI=1S\/C24H27FN6O3\/c25-15-3-7-19-20(9-15)31(23(33)11-28-19)12-18(26)14-1-4-16(5-2-14)27-10-17-6-8-21-24(29-17)30-22(32)13-34-21\/h3,6-9,11,14,16,18,27H,1-2,4-5,10,12-13,26H2,(H,29,30,32)\/t14?,16?,18-\/m0\/s1 is not a human ether-à-go-go related gene (hERG) blocker (>= 10uM)."} {"text":"The molecule SELFIES [O][=C][C][=C][Branch2][Ring1][Ring2][C@H1][C][C][N][C@@H1][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][=N][O][NH1][Ring2][Ring1][C] is not a human ether-à-go-go related gene (hERG) blocker (>= 10uM)."}", "/scratch/micpie/export/herg_karim_et_al/test_0-8.jsonl": "{"text":"User: Is the molecule with the DeepSMILES O=CNCCOccccc-cccncCO))c6))))))c6C%10)))))))))))ccccOCCF)F)F))))nc6 a human ether-à-go-go related gene (hERG) blocker (<10uM)?\nAssistant: No, it is not a human ether-à-go-go related gene (hERG) blocker (<10uM)."} {"text":"User: Is the molecule with the InChI InChI=1S\/C28H23ClF3N7O\/c29-18-9-6-16(7-10-18)27-33-23-13-19(30)20(31)14-24(23)39(27)25(15-4-2-1-3-5-15)28(40)34-22-11-8-17(12-21(22)32)26-35-37-38-36-26\/h6-15,25H,1-5H2,(H,34,40)(H,35,36,37,38) a human ether-à-go-go related gene (hERG) blocking compound (<10uM)?\nAssistant: No, it is not a human ether-à-go-go related gene (hERG) blocking compound (<10uM)."}", "/scratch/micpie/export/herg_karim_et_al/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the 
description.\nDescription: A molecule that does block hERG (<10uM).\nMolecule DeepSMILES: O=CNCCOccccc-cccncCO))c6))))))c6C%10)))))))))))ccccOCCF)F)F))))nc6\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that does block the human ether-à-go-go related gene (hERG) (<10uM).\nMolecule InChI: InChI=1S\/C28H23ClF3N7O\/c29-18-9-6-16(7-10-18)27-33-23-13-19(30)20(31)14-24(23)39(27)25(15-4-2-1-3-5-15)28(40)34-22-11-8-17(12-21(22)32)26-35-37-38-36-26\/h6-15,25H,1-5H2,(H,34,40)(H,35,36,37,38)\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/herg_karim_et_al/test_0-12.jsonl": "{"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be a human ether-à-go-go related gene (hERG) blocker (<10uM).\nAssistant: Ok, this DeepSMILES is not a human ether-à-go-go related gene (hERG) blocker (<10uM): O=CNCCOccccc-cccncCO))c6))))))c6C%10)))))))))))ccccOCCF)F)F))))nc6"} {"text":"User: I want to create a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should not be a human ether-à-go-go related gene (hERG) blocking compound (<10uM).\nAssistant: Ok, this canonical SMILES is not a human ether-à-go-go related gene (hERG) blocking compound (<10uM): O=C(Nc1ccc(-c2nnn[nH]2)cc1F)C(C1CCCCC1)n1c(-c2ccc(Cl)cc2)nc2cc(F)c(F)cc21"}", "/scratch/micpie/export/ocp/test_0-1.jsonl": "{"text":"Task: Determine the adsorption energy of the following adsorbate-adsorbent pair.\nText: Adsorbate OH is adsorbed on the catalytic surface Y2Cd6 with a Miller Index of (1, 1, 0). 
The O atom of the adsorbate is placed on the hollow site and is binding to the catalytic surface atoms Cd, Cd, Y, Y. The adsorbate OH molecule is a polar molecule that consists of one oxygen atom and one hydrogen atom. The bonding type of the OH molecule is a covalent bond, where electrons are shared between the oxygen and hydrogen atoms, resulting in a single bond.\n\nThe molecular size of the OH molecule is 0.96 \\r{A}, and its bond angle is 104.5 degrees. The bond length between the oxygen and hydrogen atoms is 0.955 \\r{A}. \n\nThe OH molecule has a lone pair of electrons on the oxygen atom that occupies a non-bonding sp3 hybrid orbital. The oxygen atom also has two unpaired electrons that occupy two different p orbitals.\n\nThe dipole moment of the OH molecule is 1.85 Debye, resulting from the difference in electronegativity between oxygen and hydrogen atoms, where oxygen is more electronegative. \n\nOverall, the OH molecule is a polar molecule with a bent shape and a strong dipole moment, making it a common and important adsorbate in catalytic systems. YCd\\$\\_3\\$ crystallizes in the orthorhombic Cmcm space group. Y is bonded to twelve Cd atoms to form YCd\\$\\_1\\$\\$\\_2\\$ cuboctahedra that share corners with four equivalent CdY\\$\\_4\\$Cd\\$\\_8\\$ cuboctahedra, corners with six equivalent YCd\\$\\_1\\$\\$\\_2\\$ cuboctahedra, edges with six equivalent CdY\\$\\_4\\$Cd\\$\\_8\\$ cuboctahedra, faces with four equivalent CdY\\$\\_4\\$Cd\\$\\_8\\$ cuboctahedra, and faces with eight equivalent YCd\\$\\_1\\$\\$\\_2\\$ cuboctahedra. There are a spread of Y{\\textendash}Cd bond distances ranging from 3.16{\\textendash}3.35 \\r{A}. There are two inequivalent Cd sites. 
In the first Cd site, Cd is bonded to four equivalent Y and eight equivalent Cd atoms to form distorted CdY\\$\\_4\\$Cd\\$\\_8\\$ cuboctahedra that share corners with four equivalent YCd\\$\\_1\\$\\$\\_2\\$ cuboctahedra, corners with six equivalent CdY\\$\\_4\\$Cd\\$\\_8\\$ cuboctahedra, edges with six equivalent YCd\\$\\_1\\$\\$\\_2\\$ cuboctahedra, faces with four equivalent YCd\\$\\_1\\$\\$\\_2\\$ cuboctahedra, and faces with eight equivalent CdY\\$\\_4\\$Cd\\$\\_8\\$ cuboctahedra. There are a spread of Cd{\\textendash}Cd bond distances ranging from 3.05{\\textendash}3.35 \\r{A}. In the second Cd site, Cd is bonded in a 11-coordinate geometry to four equivalent Y and seven Cd atoms. There are two shorter (3.10 \\r{A}) and one longer (3.11 \\r{A}) Cd{\\textendash}Cd bond length.\nAnswer: -2.2244 eV"} {"text":"Task: Compute the adsorption energy of the following adsorbate-adsorbent pair.\nText: Adsorbate CH is adsorbed on the catalytic surface Hf18Re8Se2 with a Miller Index of (0, 0, 1). The C atom of the adsorbate is placed on the atop site and is binding to the catalytic surface atoms Hf. The CH molecule is a neutral molecule composed of one carbon atom and one hydrogen atom. The bond between them is a covalent bond, in which each atom shares one electron to form a shared pair. The bond length of the CH bond is about 1.09 \\r{A}, while the bond angle is around 109.5{\\textdegree}.\n\nThe CH molecule has a tetrahedral geometry, with the carbon atom at the center and the four bonding electrons (one from the H atom and three from the C atom) arranged in a tetrahedral shape. This geometry is a result of the hybridization of the carbon atom orbitals, which combines one s and three p orbitals to form four sp3 hybrid orbitals.\n\nThe CH molecule has a dipole moment of about 0.74 Debye, which indicates a partial separation of positive and negative charges along the CH bond axis. 
This dipole moment arises due to the difference in electronegativity between the carbon and hydrogen atoms, with the carbon atom being slightly more electronegative than the hydrogen atom.\n\nIn summary, the CH molecule has a covalent bond with a bond length of 1.09 \\r{A} and a bond angle of 109.5{\\textdegree}, a tetrahedral geometry resulting from the hybridization of carbon atom orbitals, and a dipole moment of 0.74 Debye. Hf\\$\\_9\\$Re\\$\\_4\\$Se crystallizes in the hexagonal P6\\$\\_3\\$\/mmc space group. There are two inequivalent Hf sites. In the first Hf site, Hf is bonded in a 1-coordinate geometry to four Re and one Se atom. There are a spread of Hf{\\textendash}Re bond distances ranging from 2.84{\\textendash}3.04 \\r{A}. The Hf{\\textendash}Se bond length is 2.75 \\r{A}. In the second Hf site, Hf is bonded in a distorted T-shaped geometry to two equivalent Re and one Se atom. Both Hf{\\textendash}Re bond lengths are 2.90 \\r{A}. The Hf{\\textendash}Se bond length is 3.16 \\r{A}. There are two inequivalent Re sites. In the first Re site, Re is bonded to eight Hf and four Re atoms to form distorted ReHf\\$\\_8\\$Re\\$\\_4\\$ cuboctahedra that share corners with four equivalent ReHf\\$\\_8\\$Re\\$\\_4\\$ cuboctahedra and faces with eight ReHf\\$\\_6\\$Re\\$\\_6\\$ cuboctahedra. There are two shorter (2.72 \\r{A}) and two longer (2.87 \\r{A}) Re{\\textendash}Re bond lengths. In the second Re site, Re is bonded to six equivalent Hf and six equivalent Re atoms to form face-sharing ReHf\\$\\_6\\$Re\\$\\_6\\$ cuboctahedra. Se is bonded in a 9-coordinate geometry to nine Hf atoms.\nAnswer: -1.8567 eV"}", "/scratch/micpie/export/ocp/valid_0-0.jsonl": "{"text":"Question: What is the adsorption energy of the following adsorbate-adsorbent pair?\nText: Adsorbate NH is adsorbed on the catalytic surface Al20Rh8 with a Miller Index of (2, 1, 1). The N atom of the adsorbate is placed on the hollow site and is binding to the catalytic surface atoms Al, Al, Rh. 
The NH molecule is composed of one nitrogen atom and one hydrogen atom and has a single covalent bond. The bond length of NH is approximately 1.015 \\r{A} and the bond angle is roughly 106.7 degrees. NH has a dipole moment of 1.47 debye due to the electronegativity difference between nitrogen and hydrogen atoms. The NH molecule has one lone pair of electrons on the nitrogen atom in the 2p orbital, which can interact with the empty orbitals of the catalyst surface. The molecule also has a sigma bond and a pi bond that can participate in chemical reactions with the catalyst. Additionally, the NH molecule is relatively small with a van der Waals diameter of approximately 2.89 \\r{A}, which may allow it to interact with various atomic or molecular species on the catalyst surface. Al\\$\\_5\\$Rh\\$\\_2\\$ crystallizes in the hexagonal P6\\$\\_3\\$\/mmc space group. There are two inequivalent Rh sites. In the first Rh site, Rh is bonded in a 9-coordinate geometry to nine Al atoms. There are six shorter (2.44 \\r{A}) and three longer (2.67 \\r{A}) Rh{\\textendash}Al bond lengths. In the second Rh site, Rh is bonded in a 10-coordinate geometry to ten Al atoms. There are a spread of Rh{\\textendash}Al bond distances ranging from 2.46{\\textendash}2.76 \\r{A}. There are three inequivalent Al sites. In the first Al site, Al is bonded to six equivalent Rh atoms to form distorted face-sharing AlRh\\$\\_6\\$ cuboctahedra. In the second Al site, Al is bonded in a 3-coordinate geometry to three Rh atoms. In the third Al site, Al is bonded in a 4-coordinate geometry to four Rh atoms.\nAnswer: -0.0256 eV"} {"text":"Question: What is the adsorption energy of the following adsorbate-adsorbent pair?\nText: Adsorbate CH is adsorbed on the catalytic surface AlAu2 with a Miller Index of (1, 0, 0). The C atom of the adsorbate is placed on the hollow site and is binding to the catalytic surface atoms Al, Au, Au, Au. 
The CH molecule is a neutral molecule composed of one carbon atom and one hydrogen atom. The bond between them is a covalent bond, in which each atom shares one electron to form a shared pair. The bond length of the CH bond is about 1.09 \\r{A}, while the bond angle is around 109.5{\\textdegree}.\n\nThe CH molecule has a tetrahedral geometry, with the carbon atom at the center and the four bonding electrons (one from the H atom and three from the C atom) arranged in a tetrahedral shape. This geometry is a result of the hybridization of the carbon atom orbitals, which combines one s and three p orbitals to form four sp3 hybrid orbitals.\n\nThe CH molecule has a dipole moment of about 0.74 Debye, which indicates a partial separation of positive and negative charges along the CH bond axis. This dipole moment arises due to the difference in electronegativity between the carbon and hydrogen atoms, with the carbon atom being slightly more electronegative than the hydrogen atom.\n\nIn summary, the CH molecule has a covalent bond with a bond length of 1.09 \\r{A} and a bond angle of 109.5{\\textdegree}, a tetrahedral geometry resulting from the hybridization of carbon atom orbitals, and a dipole moment of 0.74 Debye. Au\\$\\_2\\$Al is Titanium Disilicide-like structured and crystallizes in the tetragonal I4\/mmm space group. Au is bonded in a 4-coordinate geometry to five equivalent Al atoms. There are four shorter (2.73 \\r{A}) and one longer (3.02 \\r{A}) Au{\\textendash}Al bond length. Al is bonded in a 10-coordinate geometry to ten equivalent Au atoms.\nAnswer: -0.4367 eV"}", "/scratch/micpie/export/ocp/test_0-0.jsonl": "{"text":"Question: What is the adsorption energy of the following adsorbate-adsorbent pair?\nText: Adsorbate OH is adsorbed on the catalytic surface Y2Cd6 with a Miller Index of (1, 1, 0). The O atom of the adsorbate is placed on the hollow site and is binding to the catalytic surface atoms Cd, Cd, Y, Y. 
The adsorbate OH molecule is a polar molecule that consists of one oxygen atom and one hydrogen atom. The bonding type of the OH molecule is a covalent bond, where electrons are shared between the oxygen and hydrogen atoms, resulting in a single bond.\n\nThe molecular size of the OH molecule is 0.96 \\r{A}, and its bond angle is 104.5 degrees. The bond length between the oxygen and hydrogen atoms is 0.955 \\r{A}. \n\nThe OH molecule has a lone pair of electrons on the oxygen atom that occupies a non-bonding sp3 hybrid orbital. The oxygen atom also has two unpaired electrons that occupy two different p orbitals.\n\nThe dipole moment of the OH molecule is 1.85 Debye, resulting from the difference in electronegativity between oxygen and hydrogen atoms, where oxygen is more electronegative. \n\nOverall, the OH molecule is a polar molecule with a bent shape and a strong dipole moment, making it a common and important adsorbate in catalytic systems. YCd\\$\\_3\\$ crystallizes in the orthorhombic Cmcm space group. Y is bonded to twelve Cd atoms to form YCd\\$\\_1\\$\\$\\_2\\$ cuboctahedra that share corners with four equivalent CdY\\$\\_4\\$Cd\\$\\_8\\$ cuboctahedra, corners with six equivalent YCd\\$\\_1\\$\\$\\_2\\$ cuboctahedra, edges with six equivalent CdY\\$\\_4\\$Cd\\$\\_8\\$ cuboctahedra, faces with four equivalent CdY\\$\\_4\\$Cd\\$\\_8\\$ cuboctahedra, and faces with eight equivalent YCd\\$\\_1\\$\\$\\_2\\$ cuboctahedra. There are a spread of Y{\\textendash}Cd bond distances ranging from 3.16{\\textendash}3.35 \\r{A}. There are two inequivalent Cd sites. 
In the first Cd site, Cd is bonded to four equivalent Y and eight equivalent Cd atoms to form distorted CdY\\$\\_4\\$Cd\\$\\_8\\$ cuboctahedra that share corners with four equivalent YCd\\$\\_1\\$\\$\\_2\\$ cuboctahedra, corners with six equivalent CdY\\$\\_4\\$Cd\\$\\_8\\$ cuboctahedra, edges with six equivalent YCd\\$\\_1\\$\\$\\_2\\$ cuboctahedra, faces with four equivalent YCd\\$\\_1\\$\\$\\_2\\$ cuboctahedra, and faces with eight equivalent CdY\\$\\_4\\$Cd\\$\\_8\\$ cuboctahedra. There are a spread of Cd{\\textendash}Cd bond distances ranging from 3.05{\\textendash}3.35 \\r{A}. In the second Cd site, Cd is bonded in a 11-coordinate geometry to four equivalent Y and seven Cd atoms. There are two shorter (3.10 \\r{A}) and one longer (3.11 \\r{A}) Cd{\\textendash}Cd bond length.\nAnswer: -2.2244 eV"} {"text":"Question: What is the adsorption energy of the following adsorbate-adsorbent pair?\nText: Adsorbate CH is adsorbed on the catalytic surface Hf18Re8Se2 with a Miller Index of (0, 0, 1). The C atom of the adsorbate is placed on the atop site and is binding to the catalytic surface atoms Hf. The CH molecule is a neutral molecule composed of one carbon atom and one hydrogen atom. The bond between them is a covalent bond, in which each atom shares one electron to form a shared pair. The bond length of the CH bond is about 1.09 \\r{A}, while the bond angle is around 109.5{\\textdegree}.\n\nThe CH molecule has a tetrahedral geometry, with the carbon atom at the center and the four bonding electrons (one from the H atom and three from the C atom) arranged in a tetrahedral shape. This geometry is a result of the hybridization of the carbon atom orbitals, which combines one s and three p orbitals to form four sp3 hybrid orbitals.\n\nThe CH molecule has a dipole moment of about 0.74 Debye, which indicates a partial separation of positive and negative charges along the CH bond axis. 
This dipole moment arises due to the difference in electronegativity between the carbon and hydrogen atoms, with the carbon atom being slightly more electronegative than the hydrogen atom.\n\nIn summary, the CH molecule has a covalent bond with a bond length of 1.09 \\r{A} and a bond angle of 109.5{\\textdegree}, a tetrahedral geometry resulting from the hybridization of carbon atom orbitals, and a dipole moment of 0.74 Debye. Hf\\$\\_9\\$Re\\$\\_4\\$Se crystallizes in the hexagonal P6\\$\\_3\\$\/mmc space group. There are two inequivalent Hf sites. In the first Hf site, Hf is bonded in a 1-coordinate geometry to four Re and one Se atom. There are a spread of Hf{\\textendash}Re bond distances ranging from 2.84{\\textendash}3.04 \\r{A}. The Hf{\\textendash}Se bond length is 2.75 \\r{A}. In the second Hf site, Hf is bonded in a distorted T-shaped geometry to two equivalent Re and one Se atom. Both Hf{\\textendash}Re bond lengths are 2.90 \\r{A}. The Hf{\\textendash}Se bond length is 3.16 \\r{A}. There are two inequivalent Re sites. In the first Re site, Re is bonded to eight Hf and four Re atoms to form distorted ReHf\\$\\_8\\$Re\\$\\_4\\$ cuboctahedra that share corners with four equivalent ReHf\\$\\_8\\$Re\\$\\_4\\$ cuboctahedra and faces with eight ReHf\\$\\_6\\$Re\\$\\_6\\$ cuboctahedra. There are two shorter (2.72 \\r{A}) and two longer (2.87 \\r{A}) Re{\\textendash}Re bond lengths. In the second Re site, Re is bonded to six equivalent Hf and six equivalent Re atoms to form face-sharing ReHf\\$\\_6\\$Re\\$\\_6\\$ cuboctahedra. Se is bonded in a 9-coordinate geometry to nine Hf atoms.\nAnswer: -1.8567 eV"}", "/scratch/micpie/export/ocp/train_0-0.jsonl": "{"text":"Question: What is the adsorption energy of the following adsorbate-adsorbent pair?\nText: Adsorbate COCHO is adsorbed on the catalytic surface Zr2SeN2 with a Miller Index of (2, 2, 1). 
The C atom of the adsorbate is placed on the atop site and is binding to the catalytic surface atoms Zr. The O atom of the adsorbate is placed on the atop site and is binding to the catalytic surface atoms Zr. The adsorbate molecule COCHO, also known as glyoxal, is a small organic molecule consisting of two carbonyl groups attached to a central carbon atom. The carbon atom is sp2 hybridized and forms double bonds with the two oxygen atoms, resulting in a planar molecule with bond angles of approximately 120 degrees.\n\nThe two carbonyl groups possess a polar C=O bond, which gives the molecule an overall dipole moment of approximately 2.8 Debye. Additionally, the molecule possesses two lone pairs of electrons on the central carbon atom, which can participate in weak hydrogen bonding interactions with neighboring molecules or the catalyst surface.\n\nIn terms of molecular size, COCHO has a bond length of approximately 1.2 \\r{A} for the C=O bond and 1.5 \\r{A} for the C-C bond between the two carbonyl groups. The molecule has a molecular weight of 58.04 g\/mol and a van der Waals radius of approximately 1.66 \\r{A}.\n\nOverall, the bonding type in COCHO involves covalent bonding between the carbon and oxygen atoms within each carbonyl group, while weaker non-covalent interactions such as hydrogen bonding may be present between the molecule and the catalyst surface. The molecule possesses a planar geometry with a significant dipole moment, which may contribute to its adsorption and reactivity on catalyst surfaces. Zr\\$\\_2\\$N\\$\\_2\\$Se crystallizes in the trigonal P\\${\\textasciicircum}-\\$3m1 space group. Zr\\${\\textasciicircum}4\\$\\${\\textasciicircum}+\\$ is bonded in a 4-coordinate geometry to four equivalent N{\\textthreesuperior}\\${\\textasciicircum}-\\$ and three equivalent Se{\\texttwosuperior}\\${\\textasciicircum}-\\$ atoms. There are three shorter (2.15 \\r{A}) and one longer (2.17 \\r{A}) Zr{\\textendash}N bond length. 
All Zr{\\textendash}Se bond lengths are 2.92 \\r{A}. N{\\textthreesuperior}\\${\\textasciicircum}-\\$ is bonded to four equivalent Zr\\${\\textasciicircum}4\\$\\${\\textasciicircum}+\\$ atoms to form a mixture of corner and edge-sharing NZr\\$\\_4\\$ tetrahedra. Se{\\texttwosuperior}\\${\\textasciicircum}-\\$ is bonded in a 6-coordinate geometry to six equivalent Zr\\${\\textasciicircum}4\\$\\${\\textasciicircum}+\\$ atoms.\nAnswer: -1.8371 eV"} {"text":"Question: What is the adsorption energy of the following adsorbate-adsorbent pair?\nText: Adsorbate CH is adsorbed on the catalytic surface Sc2GaPd with a Miller Index of (1, 1, 0). The C atom of the adsorbate is placed on the bridge site and is binding to the catalytic surface atoms Ga, Sc. The CH molecule is a neutral molecule composed of one carbon atom and one hydrogen atom. The bond between them is a covalent bond, in which each atom shares one electron to form a shared pair. The bond length of the CH bond is about 1.09 \\r{A}, while the bond angle is around 109.5{\\textdegree}.\n\nThe CH molecule has a tetrahedral geometry, with the carbon atom at the center and the four bonding electrons (one from the H atom and three from the C atom) arranged in a tetrahedral shape. This geometry is a result of the hybridization of the carbon atom orbitals, which combines one s and three p orbitals to form four sp3 hybrid orbitals.\n\nThe CH molecule has a dipole moment of about 0.74 Debye, which indicates a partial separation of positive and negative charges along the CH bond axis. This dipole moment arises due to the difference in electronegativity between the carbon and hydrogen atoms, with the carbon atom being slightly more electronegative than the hydrogen atom.\n\nIn summary, the CH molecule has a covalent bond with a bond length of 1.09 \\r{A} and a bond angle of 109.5{\\textdegree}, a tetrahedral geometry resulting from the hybridization of carbon atom orbitals, and a dipole moment of 0.74 Debye. 
Sc\\$\\_2\\$PdGa is Heusler structured and crystallizes in the cubic Fm\\${\\textasciicircum}-\\$3m space group. Sc is bonded in a body-centered cubic geometry to four equivalent Pd and four equivalent Ga atoms. All Sc{\\textendash}Pd bond lengths are 2.88 \\r{A}. All Sc{\\textendash}Ga bond lengths are 2.88 \\r{A}. Pd is bonded in a body-centered cubic geometry to eight equivalent Sc atoms. Ga is bonded in a body-centered cubic geometry to eight equivalent Sc atoms.\nAnswer: -1.2897 eV"}", "/scratch/micpie/export/ocp/valid_0-1.jsonl": "{"text":"Task: Estimate the adsorption energy of the following adsorbate-adsorbent pair.\nText: Adsorbate NH is adsorbed on the catalytic surface Al20Rh8 with a Miller Index of (2, 1, 1). The N atom of the adsorbate is placed on the hollow site and is binding to the catalytic surface atoms Al, Al, Rh. The NH molecule is composed of one nitrogen atom and one hydrogen atom and has a single covalent bond. The bond length of NH is approximately 1.015 \\r{A} and the bond angle is roughly 106.7 degrees. NH has a dipole moment of 1.47 debye due to the electronegativity difference between nitrogen and hydrogen atoms. The NH molecule has one lone pair of electrons on the nitrogen atom in the 2p orbital, which can interact with the empty orbitals of the catalyst surface. The molecule also has a sigma bond and a pi bond that can participate in chemical reactions with the catalyst. Additionally, the NH molecule is relatively small with a van der Waals diameter of approximately 2.89 \\r{A}, which may allow it to interact with various atomic or molecular species on the catalyst surface. Al\\$\\_5\\$Rh\\$\\_2\\$ crystallizes in the hexagonal P6\\$\\_3\\$\/mmc space group. There are two inequivalent Rh sites. In the first Rh site, Rh is bonded in a 9-coordinate geometry to nine Al atoms. There are six shorter (2.44 \\r{A}) and three longer (2.67 \\r{A}) Rh{\\textendash}Al bond lengths. 
In the second Rh site, Rh is bonded in a 10-coordinate geometry to ten Al atoms. There are a spread of Rh{\\textendash}Al bond distances ranging from 2.46{\\textendash}2.76 \\r{A}. There are three inequivalent Al sites. In the first Al site, Al is bonded to six equivalent Rh atoms to form distorted face-sharing AlRh\\$\\_6\\$ cuboctahedra. In the second Al site, Al is bonded in a 3-coordinate geometry to three Rh atoms. In the third Al site, Al is bonded in a 4-coordinate geometry to four Rh atoms.\nAnswer: -0.0256 eV"} {"text":"Task: Determine the adsorption energy of the following adsorbate-adsorbent pair.\nText: Adsorbate CH is adsorbed on the catalytic surface AlAu2 with a Miller Index of (1, 0, 0). The C atom of the adsorbate is placed on the hollow site and is binding to the catalytic surface atoms Al, Au, Au, Au. The CH molecule is a neutral molecule composed of one carbon atom and one hydrogen atom. The bond between them is a covalent bond, in which each atom shares one electron to form a shared pair. The bond length of the CH bond is about 1.09 \\r{A}, while the bond angle is around 109.5{\\textdegree}.\n\nThe CH molecule has a tetrahedral geometry, with the carbon atom at the center and the four bonding electrons (one from the H atom and three from the C atom) arranged in a tetrahedral shape. This geometry is a result of the hybridization of the carbon atom orbitals, which combines one s and three p orbitals to form four sp3 hybrid orbitals.\n\nThe CH molecule has a dipole moment of about 0.74 Debye, which indicates a partial separation of positive and negative charges along the CH bond axis. 
This dipole moment arises due to the difference in electronegativity between the carbon and hydrogen atoms, with the carbon atom being slightly more electronegative than the hydrogen atom.\n\nIn summary, the CH molecule has a covalent bond with a bond length of 1.09 \\r{A} and a bond angle of 109.5{\\textdegree}, a tetrahedral geometry resulting from the hybridization of carbon atom orbitals, and a dipole moment of 0.74 Debye. Au\\$\\_2\\$Al is Titanium Disilicide-like structured and crystallizes in the tetragonal I4\/mmm space group. Au is bonded in a 4-coordinate geometry to five equivalent Al atoms. There are four shorter (2.73 \\r{A}) and one longer (3.02 \\r{A}) Au{\\textendash}Al bond length. Al is bonded in a 10-coordinate geometry to ten equivalent Au atoms.\nAnswer: -0.4367 eV"}", "/scratch/micpie/export/ocp/train_0-1.jsonl": "{"text":"Task: Determine the adsorption energy of the following adsorbate-adsorbent pair.\nText: Adsorbate COCHO is adsorbed on the catalytic surface Zr2SeN2 with a Miller Index of (2, 2, 1). The C atom of the adsorbate is placed on the atop site and is binding to the catalytic surface atoms Zr. The O atom of the adsorbate is placed on the atop site and is binding to the catalytic surface atoms Zr. The adsorbate molecule COCHO, also known as glyoxal, is a small organic molecule consisting of two carbonyl groups attached to a central carbon atom. The carbon atom is sp2 hybridized and forms double bonds with the two oxygen atoms, resulting in a planar molecule with bond angles of approximately 120 degrees.\n\nThe two carbonyl groups possess a polar C=O bond, which gives the molecule an overall dipole moment of approximately 2.8 Debye. 
Additionally, the molecule possesses two lone pairs of electrons on the central carbon atom, which can participate in weak hydrogen bonding interactions with neighboring molecules or the catalyst surface.\n\nIn terms of molecular size, COCHO has a bond length of approximately 1.2 \\r{A} for the C=O bond and 1.5 \\r{A} for the C-C bond between the two carbonyl groups. The molecule has a molecular weight of 58.04 g\/mol and a van der Waals radius of approximately 1.66 \\r{A}.\n\nOverall, the bonding type in COCHO involves covalent bonding between the carbon and oxygen atoms within each carbonyl group, while weaker non-covalent interactions such as hydrogen bonding may be present between the molecule and the catalyst surface. The molecule possesses a planar geometry with a significant dipole moment, which may contribute to its adsorption and reactivity on catalyst surfaces. Zr\\$\\_2\\$N\\$\\_2\\$Se crystallizes in the trigonal P\\${\\textasciicircum}-\\$3m1 space group. Zr\\${\\textasciicircum}4\\$\\${\\textasciicircum}+\\$ is bonded in a 4-coordinate geometry to four equivalent N{\\textthreesuperior}\\${\\textasciicircum}-\\$ and three equivalent Se{\\texttwosuperior}\\${\\textasciicircum}-\\$ atoms. There are three shorter (2.15 \\r{A}) and one longer (2.17 \\r{A}) Zr{\\textendash}N bond length. All Zr{\\textendash}Se bond lengths are 2.92 \\r{A}. N{\\textthreesuperior}\\${\\textasciicircum}-\\$ is bonded to four equivalent Zr\\${\\textasciicircum}4\\$\\${\\textasciicircum}+\\$ atoms to form a mixture of corner and edge-sharing NZr\\$\\_4\\$ tetrahedra. Se{\\texttwosuperior}\\${\\textasciicircum}-\\$ is bonded in a 6-coordinate geometry to six equivalent Zr\\${\\textasciicircum}4\\$\\${\\textasciicircum}+\\$ atoms.\nAnswer: -1.8371 eV"} {"text":"Task: Calculate the adsorption energy of the following adsorbate-adsorbent pair.\nText: Adsorbate CH is adsorbed on the catalytic surface Sc2GaPd with a Miller Index of (1, 1, 0). 
The C atom of the adsorbate is placed on the bridge site and is binding to the catalytic surface atoms Ga, Sc. The CH molecule is a neutral molecule composed of one carbon atom and one hydrogen atom. The bond between them is a covalent bond, in which each atom shares one electron to form a shared pair. The bond length of the CH bond is about 1.09 \\r{A}, while the bond angle is around 109.5{\\textdegree}.\n\nThe CH molecule has a tetrahedral geometry, with the carbon atom at the center and the four bonding electrons (one from the H atom and three from the C atom) arranged in a tetrahedral shape. This geometry is a result of the hybridization of the carbon atom orbitals, which combines one s and three p orbitals to form four sp3 hybrid orbitals.\n\nThe CH molecule has a dipole moment of about 0.74 Debye, which indicates a partial separation of positive and negative charges along the CH bond axis. This dipole moment arises due to the difference in electronegativity between the carbon and hydrogen atoms, with the carbon atom being slightly more electronegative than the hydrogen atom.\n\nIn summary, the CH molecule has a covalent bond with a bond length of 1.09 \\r{A} and a bond angle of 109.5{\\textdegree}, a tetrahedral geometry resulting from the hybridization of carbon atom orbitals, and a dipole moment of 0.74 Debye. Sc\\$\\_2\\$PdGa is Heusler structured and crystallizes in the cubic Fm\\${\\textasciicircum}-\\$3m space group. Sc is bonded in a body-centered cubic geometry to four equivalent Pd and four equivalent Ga atoms. All Sc{\\textendash}Pd bond lengths are 2.88 \\r{A}. All Sc{\\textendash}Ga bond lengths are 2.88 \\r{A}. Pd is bonded in a body-centered cubic geometry to eight equivalent Sc atoms. 
Ga is bonded in a body-centered cubic geometry to eight equivalent Sc atoms.\nAnswer: -1.2897 eV"}", "/scratch/micpie/export/peptides_hemolytic/test_0-10.jsonl": "{"text":"User: Can you create the amino acid sequence of a peptide that is hemolytic?\nAssistant: Of course, here you go: KLKLKLKLKLKLKL"} {"text":"User: Can you generate the amino acid sequence of a peptide that is not hemolytic?\nAssistant: Yes, I'm happy to help, here you go: KKKAAFAAWAAFAAKKK"}", "/scratch/micpie/export/peptides_hemolytic/valid_0-8.jsonl": "{"text":"User: Can you tell me if the peptide with the sequence of amino acids FLPILASLAAKLGPKLFCLVTKKC is hemolytic?\nAssistant: Yes, this amino acid sequence is hemolytic."} {"text":"User: Can you tell me if the peptide with the sequence of amino acids KRRVRWIIW is hemolytic?\nAssistant: No, this amino acid sequence is not hemolytic."}", "/scratch/micpie/export/peptides_hemolytic/test_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which amino acid sequences are hemolytic?\nConstraint: You must select none, one or more options from A, B, or C without using any additional words.\nOptions:\nA) KLKLKLKLKLKLKL\nB) LLRHVVKILEKYL\nC) GCRRLCYKQRCVTYCRGR\nAnswer: A"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which amino acid sequences are not hemolytic?\nConstraint: You must select none, one or more options from A or B without using any additional words.\nOptions:\n[A] KFFKFFKFFK\n[B] KKKAAFAAWAAFAAKKK\nAnswer: A, B"}", "/scratch/micpie/export/peptides_hemolytic/train_0-8.jsonl": "{"text":"User: Can you tell me if the peptide with the amino acid sequence AGRQWIAKYLRREIRKRGRKAVIAW is hemolytic?\nAssistant: Yes, this amino acid sequence is hemolytic."} {"text":"User: Can you tell me if the peptide with the amino acid sequence GLLKFIKKLL is hemolytic?\nAssistant: No, this amino acid sequence is not hemolytic."}", "/scratch/micpie/export/peptides_hemolytic/test_0-5.jsonl": "{"text":"Task: Please 
classify a peptide based on the description.\nDescription: A amino acid sequence that is hemolytic.\namino acid sequence : KLKLKLKLKLKLKL\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"} {"text":"Task: Please classify a peptide based on the description.\nDescription: A amino acid sequence that is hemolytic.\namino acid sequence : KKKAAFAAWAAFAAKKK\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/peptides_hemolytic/valid_0-9.jsonl": "{"text":"User: Is the peptide with the sequence of amino acids FLPILASLAAKLGPKLFCLVTKKC hemolytic?\nAssistant: Yes, it is hemolytic."} {"text":"User: Is the peptide with the amino acid sequence KRRVRWIIW hemolytic?\nAssistant: No, it is not hemolytic."}", "/scratch/micpie/export/peptides_hemolytic/test_0-1.jsonl": "{"text":"The amino acid sequence KLKLKLKLKLKLKL displays hemolytic properties."} {"text":"The amino acid sequence KKKAAFAAWAAFAAKKK exhibits no hemolytic properties."}", "/scratch/micpie/export/peptides_hemolytic/valid_0-0.jsonl": "{"text":"The sequence of amino acids FLPILASLAAKLGPKLFCLVTKKC exhibits hemolytic properties."} {"text":"The sequence of AAs KRRVRWIIW shows no hemolytic properties."}", "/scratch/micpie/export/peptides_hemolytic/test_0-2.jsonl": "{"text":"Based on the amino acid sequence KLKLKLKLKLKLKL, the peptide has hemolytic properties."} {"text":"Based on the sequence of amino acids KKKAAFAAWAAFAAKKK, the peptide has no hemolytic characteristics."}", "/scratch/micpie/export/peptides_hemolytic/valid_0-10.jsonl": "{"text":"User: Can you give me the amino acid sequence of a peptide that is hemolytic?\nAssistant: Yes, here you go: FLPILASLAAKLGPKLFCLVTKKC"} {"text":"User: Can you create the amino acid sequence of a peptide that is not hemolytic?\nAssistant: Sure, here you go: KRRVRWIIW"}", 
"/scratch/micpie/export/peptides_hemolytic/train_0-6.jsonl": "{"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is hemolytic.\namino acid sequence : AGRQWIAKYLRREIRKRGRKAVIAW\nConstraint: Answer the question in a full sentence.\nResult: This amino acid sequence is hemolytic."} {"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is hemolytic.\namino acid sequence : GLLKFIKKLL\nConstraint: Answer the question in a complete sentence.\nResult: This amino acid sequence is not hemolytic."}", "/scratch/micpie/export/peptides_hemolytic/valid_0-6.jsonl": "{"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is hemolytic.\nsequence of amino acids : FLPILASLAAKLGPKLFCLVTKKC\nConstraint: Answer the question in a complete sentence.\nResult: This amino acid sequence is hemolytic."} {"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is hemolytic.\namino acid sequence : KRRVRWIIW\nConstraint: Answer the question in a full sentence.\nResult: This amino acid sequence is not hemolytic."}", "/scratch/micpie/export/peptides_hemolytic/test_0-9.jsonl": "{"text":"User: Is the peptide with the sequence of amino acids KLKLKLKLKLKLKL hemolytic?\nAssistant: Yes, it is hemolytic."} {"text":"User: Is the peptide with the amino acid sequence KKKAAFAAWAAFAAKKK hemolytic?\nAssistant: No, it is not hemolytic."}", "/scratch/micpie/export/peptides_hemolytic/test_0-0.jsonl": "{"text":"The sequence of AAs KLKLKLKLKLKLKL exhibits hemolytic properties."} {"text":"The sequence of amino acids KKKAAFAAWAAFAAKKK demonstrates no hemolytic properties."}", "/scratch/micpie/export/peptides_hemolytic/valid_0-7.jsonl": "{"text":"Task: Please give me a sequence of amino acids based on the description.\nDescription: A amino acid sequence that is 
hemolytic.\nResult: FLPILASLAAKLGPKLFCLVTKKC"} {"text":"Task: Please generate a sequence of amino acids based on the description below.\nDescription: A amino acid sequence that is hemolytic.\nResult: KRRVRWIIW"}", "/scratch/micpie/export/peptides_hemolytic/test_0-3.jsonl": "{"text":"The amino acid sequence KLKLKLKLKLKLKL represents a peptide that is identified as hemolytic."} {"text":"The amino acid sequence KKKAAFAAWAAFAAKKK represents a peptide that is not identified as hemolytic."}", "/scratch/micpie/export/peptides_hemolytic/valid_0-11.jsonl": "{"text":"User: I'm searching for the amino acid sequence of a peptide that is hemolytic?\nAssistant: This is a amino acid sequence that is hemolytic: FLPILASLAAKLGPKLFCLVTKKC"} {"text":"User: I'm searching for the amino acid sequence of a peptide that is not hemolytic?\nAssistant: This is a amino acid sequence that is not hemolytic: KRRVRWIIW"}", "/scratch/micpie/export/peptides_hemolytic/train_0-0.jsonl": "{"text":"The sequence of amino acids AGRQWIAKYLRREIRKRGRKAVIAW exhibits hemolytic properties."} {"text":"The sequence of amino acids GLLKFIKKLL exhibits no hemolytic properties."}", "/scratch/micpie/export/peptides_hemolytic/test_0-6.jsonl": "{"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is hemolytic.\nsequence of amino acids : KLKLKLKLKLKLKL\nConstraint: Answer the question in a complete sentence.\nResult: This amino acid sequence is hemolytic."} {"text":"Task: Please classify a amino acid sequence based on the description.\nDescription: A amino acid sequence that is hemolytic.\namino acid sequence : KKKAAFAAWAAFAAKKK\nConstraint: Answer the question in a complete sentence.\nResult: This amino acid sequence is not hemolytic."}", "/scratch/micpie/export/peptides_hemolytic/train_0-10.jsonl": "{"text":"User: Can you create the amino acid sequence of a peptide that is hemolytic?\nAssistant: Yes, I'm happy to help, here you go: 
AGRQWIAKYLRREIRKRGRKAVIAW"} {"text":"User: Can you generate the amino acid sequence of a peptide that is not hemolytic?\nAssistant: Of course, here you go: GLLKFIKKLL"}", "/scratch/micpie/export/peptides_hemolytic/train_0-3.jsonl": "{"text":"The amino acid sequence AGRQWIAKYLRREIRKRGRKAVIAW represents a peptide that is identified as hemolytic."} {"text":"The amino acid sequence GLLKFIKKLL is from a peptide that is not identified as hemolytic."}", "/scratch/micpie/export/peptides_hemolytic/train_0-12.jsonl": "{"text":"User: I want to generate a peptide.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The amino acid sequence should be hemolytic.\nAssistant: Ok, this amino acid sequence is hemolytic: AGRQWIAKYLRREIRKRGRKAVIAW"} {"text":"User: I want to generate a peptide.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The amino acid sequence should not be hemolytic.\nAssistant: Ok, this amino acid sequence is not hemolytic: GLLKFIKKLL"}", "/scratch/micpie/export/peptides_hemolytic/test_0-13.jsonl": "{"text":"User: I want to create a sequence of amino acids.\nAssistant: This sounds very exciting. Should it be a special amino acid sequence?\nUser: Yes, the amino acid sequence should be hemolytic.\nAssistant: Ok, this amino acid sequence is hemolytic: KLKLKLKLKLKLKL"} {"text":"User: I want to create a peptide.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the amino acid sequence should not be hemolytic.\nAssistant: Got it, this amino acid sequence is not hemolytic: KKKAAFAAWAAFAAKKK"}", "/scratch/micpie/export/peptides_hemolytic/valid_0-2.jsonl": "{"text":"Based on the sequence of amino acids FLPILASLAAKLGPKLFCLVTKKC, the peptide has hemolytic properties."} {"text":"Based on the sequence of amino acids KRRVRWIIW, the peptide has no hemolytic features."}", "/scratch/micpie/export/peptides_hemolytic/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the peptide with the amino acid sequence representation of AGRQWIAKYLRREIRKRGRKAVIAW hemolytic?\nConstraint: Even if you are uncertain, you must pick either a or b without using any other words.\nOptions:\na.) False\nb.) True\nAnswer: b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the peptide with the amino acid sequence representation of GLLKFIKKLL hemolytic?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any additional words.\nOptions:\n1. False\n2. True\nAnswer: 1"}", "/scratch/micpie/export/peptides_hemolytic/valid_0-1.jsonl": "{"text":"The amino acid sequence FLPILASLAAKLGPKLFCLVTKKC displays hemolytic properties."} {"text":"The amino acid sequence KRRVRWIIW displays no hemolytic properties."}", "/scratch/micpie/export/peptides_hemolytic/valid_0-13.jsonl": "{"text":"User: I want to create a peptide.\nAssistant: This sounds very interesting. Should it be a special amino acid sequence?\nUser: Yes, the amino acid sequence should be hemolytic.\nAssistant: Understood, this amino acid sequence is hemolytic: FLPILASLAAKLGPKLFCLVTKKC"} {"text":"User: I want to come up with a sequence of amino acids.\nAssistant: This sounds very interesting. 
Should it be a special amino acid sequence?\nUser: Yes, the amino acid sequence should not be hemolytic.\nAssistant: Ok, this amino acid sequence is not hemolytic: KRRVRWIIW"}", "/scratch/micpie/export/peptides_hemolytic/valid_0-5.jsonl": "{"text":"Task: Please classify a peptide based on the description.\nDescription: A amino acid sequence that is hemolytic.\namino acid sequence : FLPILASLAAKLGPKLFCLVTKKC\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a peptide based on the description.\nDescription: A amino acid sequence that is hemolytic.\namino acid sequence : KRRVRWIIW\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/peptides_hemolytic/train_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which amino acid sequences are hemolytic?\nConstraint: You must select none, one or more options from A or B without using any other words.\nOptions:\nA) KKWRKLLKKLKKLL\nB) AGRQWIAKYLRREIRKRGRKAVIAW\nAnswer: B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which amino acid sequences are not hemolytic?\nConstraint: You must select none, one or more options from a, b, c, d, or e without using any additional words.\nOptions:\na. KKKKKKKKKKGIGKFLHSAKKFGKAFVGEIMNS\nb. FSISPGKVLDKFGKIVGKVLKQLKKVSAVAKV\nc. GLLKFIKKLL\nd. GLLKWIKTLL\ne. 
GFLDVVKHIGKAALGAVTHLINQ\nAnswer: a, c, e"}", "/scratch/micpie/export/peptides_hemolytic/valid_0-4.jsonl": "{"text":"The amino acid sequence FLPILASLAAKLGPKLFCLVTKKC is hemolytic."} {"text":"The sequence of amino acids KRRVRWIIW is not hemolytic."}", "/scratch/micpie/export/peptides_hemolytic/train_0-5.jsonl": "{"text":"Task: Please classify a peptide based on the description.\nDescription: A amino acid sequence that is hemolytic.\namino acid sequence : AGRQWIAKYLRREIRKRGRKAVIAW\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a peptide based on the description.\nDescription: A amino acid sequence that is hemolytic.\namino acid sequence : GLLKFIKKLL\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/peptides_hemolytic/valid_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which amino acid sequences are hemolytic?\nConstraint: You must select none, one or more options from A or B without using any other words.\nOptions:\nA.) FLPILASLAAKLGPKLFCLVTKKC\nB.) FLSIIAKVLGSLF\nAnswer: A, B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which amino acid sequences are not hemolytic?\nConstraint: You must select none, one or more options from 1 or 2 without using any additional words.\nOptions:\n(1) ILGPVISTIGNALGGLLKNL\n(2) KRRVRWIIW\nAnswer: 2"}", "/scratch/micpie/export/peptides_hemolytic/valid_0-12.jsonl": "{"text":"User: I want to generate a peptide.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The amino acid sequence should be hemolytic.\nAssistant: Ok, this amino acid sequence is hemolytic: FLPILASLAAKLGPKLFCLVTKKC"} {"text":"User: I want to generate a sequence of amino acids.\nAssistant: This sounds very interesting. 
Should I consider any constraints for the generation?\nUser: Yes, please. The amino acid sequence should not be hemolytic.\nAssistant: Ok, this amino acid sequence is not hemolytic: KRRVRWIIW"}", "/scratch/micpie/export/peptides_hemolytic/train_0-2.jsonl": "{"text":"Based on the sequence of amino acids AGRQWIAKYLRREIRKRGRKAVIAW, the peptide has hemolytic properties."} {"text":"Based on the amino acid sequence GLLKFIKKLL, the peptide has no hemolytic characteristics."}", "/scratch/micpie/export/peptides_hemolytic/test_0-11.jsonl": "{"text":"User: I'm looking for the amino acid sequence of a peptide that is hemolytic?\nAssistant: This is a amino acid sequence that is hemolytic: KLKLKLKLKLKLKL"} {"text":"User: I'm looking for the amino acid sequence of a peptide that is not hemolytic?\nAssistant: This is a amino acid sequence that is not hemolytic: KKKAAFAAWAAFAAKKK"}", "/scratch/micpie/export/peptides_hemolytic/train_0-7.jsonl": "{"text":"Task: Please generate a sequence of amino acids based on the description.\nDescription: A amino acid sequence that is hemolytic.\nResult: AGRQWIAKYLRREIRKRGRKAVIAW"} {"text":"Task: Please create a amino acid sequence based on the text description.\nDescription: A amino acid sequence that is hemolytic.\nResult: GLLKFIKKLL"}", "/scratch/micpie/export/peptides_hemolytic/train_0-11.jsonl": "{"text":"User: I'm looking for the amino acid sequence of a peptide that is hemolytic?\nAssistant: This is a amino acid sequence that is hemolytic: AGRQWIAKYLRREIRKRGRKAVIAW"} {"text":"User: I'm searching for the amino acid sequence of a peptide that is not hemolytic?\nAssistant: This is a amino acid sequence that is not hemolytic: GLLKFIKKLL"}", "/scratch/micpie/export/peptides_hemolytic/train_0-1.jsonl": "{"text":"The amino acid sequence AGRQWIAKYLRREIRKRGRKAVIAW displays hemolytic properties."} {"text":"The amino acid sequence GLLKFIKKLL displays no hemolytic properties."}", "/scratch/micpie/export/peptides_hemolytic/train_0-13.jsonl": 
"{"text":"User: I want to generate a amino acid sequence.\nAssistant: This sounds very interesting. Should it be a special amino acid sequence?\nUser: Yes, the amino acid sequence should be hemolytic.\nAssistant: Understood, this amino acid sequence is hemolytic: AGRQWIAKYLRREIRKRGRKAVIAW"} {"text":"User: I want to generate a amino acid sequence.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the amino acid sequence should not be hemolytic.\nAssistant: Ok, this amino acid sequence is not hemolytic: GLLKFIKKLL"}", "/scratch/micpie/export/peptides_hemolytic/train_0-4.jsonl": "{"text":"The amino acid sequence AGRQWIAKYLRREIRKRGRKAVIAW is hemolytic."} {"text":"The sequence of amino acids GLLKFIKKLL is not hemolytic."}", "/scratch/micpie/export/peptides_hemolytic/test_0-7.jsonl": "{"text":"Task: Please generate a amino acid sequence based on the description.\nDescription: A amino acid sequence that is hemolytic.\nResult: KLKLKLKLKLKLKL"} {"text":"Task: Please give me a sequence of amino acids based on the text description.\nDescription: A amino acid sequence that is hemolytic.\nResult: KKKAAFAAWAAFAAKKK"}", "/scratch/micpie/export/peptides_hemolytic/train_0-9.jsonl": "{"text":"User: Is the peptide with the amino acid sequence AGRQWIAKYLRREIRKRGRKAVIAW hemolytic?\nAssistant: Yes, it is hemolytic."} {"text":"User: Is the peptide with the amino acid sequence GLLKFIKKLL hemolytic?\nAssistant: No, it is not hemolytic."}", "/scratch/micpie/export/peptides_hemolytic/valid_0-3.jsonl": "{"text":"The amino acid sequence FLPILASLAAKLGPKLFCLVTKKC represents a peptide that is identified as hemolytic."} {"text":"The amino acid sequence KRRVRWIIW is from a peptide that is not identified as hemolytic."}", "/scratch/micpie/export/peptides_hemolytic/test_0-8.jsonl": "{"text":"User: Can you estimate if the peptide with the amino acid sequence KLKLKLKLKLKLKL is hemolytic?\nAssistant: Yes, this amino acid sequence is hemolytic."} {"text":"User: Can you 
estimate if the peptide with the sequence of amino acids KKKAAFAAWAAFAAKKK is hemolytic?\nAssistant: No, this amino acid sequence is not hemolytic."}", "/scratch/micpie/export/peptides_hemolytic/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the peptide with the amino acid sequence representation of KLKLKLKLKLKLKL hemolytic?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any additional words.\nOptions:\n(1) True\n(2) False\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the peptide with the amino acid sequence representation of KKKAAFAAWAAFAAKKK hemolytic?\nConstraint: Even if you are uncertain, you must pick either A or B without using any other words.\nOptions:\n(A) True\n(B) False\nAnswer: B"}", "/scratch/micpie/export/peptides_hemolytic/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the peptide with the amino acid sequence representation of FLPILASLAAKLGPKLFCLVTKKC hemolytic?\nConstraint: Even if you are uncertain, you must pick either A or B without using any additional words.\nOptions:\nA.) False\nB.) True\nAnswer: B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the peptide with the amino acid sequence representation of KRRVRWIIW hemolytic?\nConstraint: Even if you are not sure, you must pick either a or b without using any other words.\nOptions:\na. False\nb. True\nAnswer: a"}", "/scratch/micpie/export/peptides_hemolytic/test_0-4.jsonl": "{"text":"The sequence of amino acids KLKLKLKLKLKLKL is hemolytic."} {"text":"The amino acid sequence KKKAAFAAWAAFAAKKK is not hemolytic."}", "/scratch/micpie/export/peptides_hemolytic/test_0-12.jsonl": "{"text":"User: I want to come up with a peptide.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The amino acid sequence should be hemolytic.\nAssistant: Ok, here you go, this amino acid sequence is hemolytic: KLKLKLKLKLKLKL"} {"text":"User: I want to create a sequence of amino acids.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The amino acid sequence should not be hemolytic.\nAssistant: Got it, here you go, this amino acid sequence is not hemolytic: KKKAAFAAWAAFAAKKK"}", "/scratch/micpie/export/nr_ar_lbd_tox21/test_0-10.jsonl": "{"text":"User: Can you create the canonical SMILES of a molecule that is not toxic in the NR-AR-LBD assay?\nAssistant: Yes, here you go: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"User: Can you create the SELFIES of a molecule that is not toxic in the androgen receptor ligand-binding domain assay?\nAssistant: Sure, here you go: [O][=C][Branch1][C][O][C][=C][C][=C][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][=C][Ring1][#Branch2]"}", "/scratch/micpie/export/nr_ar_lbd_tox21/valid_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the SELFIES [C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Ring1][S] is toxic in the NR-AR-LBD assay?\nAssistant: No, this molecule is not toxic in the NR-AR-LBD assay."} {"text":"User: Can you estimate if the molecule with the DeepSMILES CCOP=S)OCC)))Occc-cccccc6))))))on5 is toxic in the androgen receptor ligand-binding domain assay?\nAssistant: No, this molecule is not toxic in the androgen receptor ligand-binding domain assay."}", "/scratch/micpie/export/nr_ar_lbd_tox21/test_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-AR-LBD assay?\nConstraint: You must select none, one or more options from A, B, C, or D without using any other words.\nOptions:\n(A) CC(C)NC[C@@H](O)COc1ccc(CC(N)=O)cc1\n(B) N#CCCC#N\n(C) CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C\n(D) 
O=Cc1c2ccccc2c(Cl)c2ccccc12\nAnswer: A, B, C"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-AR-LBD assay?\nConstraint: You must select none, one or more options from A or B without using any additional words.\nOptions:\nA. [O][=C][Branch1][C][O][C][=C][C][=C][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][=C][Ring1][#Branch2]\nB. [C][C][Branch1][C][C][C][C][C][C][C][C][O][C][=Branch1][C][=O][C][C][C][C][C][=Branch1][C][=O][O][C][C][C][C][C][C][C][Branch1][C][C][C]\nAnswer: A, B"}", "/scratch/micpie/export/nr_ar_lbd_tox21/train_0-8.jsonl": "{"text":"User: Can you estimate if the molecule with the SELFIES [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N] is toxic in the NR-AR-LBD assay?\nAssistant: No, this molecule is not toxic in the NR-AR-LBD assay."} {"text":"User: Can you figure out if the molecule with the InChI InChI=1S\/C5H14NO\/c1-6(2,3)4-5-7\/h7H,4-5H2,1-3H3\/q+1 is toxic in the androgen receptor ligand-binding domain assay?\nAssistant: No, this molecule is not toxic in the androgen receptor ligand-binding domain assay."}", "/scratch/micpie/export/nr_ar_lbd_tox21/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-AR-LBD assay.\nDeepSMILES: CCCNCC))CCC))C=O)NccC)cccc6C\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-AR-LBD assay.\nSMILES: O=C(O)c1cccc(C(F)(F)F)c1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"}", "/scratch/micpie/export/nr_ar_lbd_tox21/valid_0-9.jsonl": "{"text":"User: Is the molecule with the SMILES 
CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21 toxic in the NR-AR-LBD assay?\nAssistant: No, it is not toxic in the NR-AR-LBD assay."} {"text":"User: Is the molecule with the InChI InChI=1S\/C13H16NO4PS\/c1-3-15-19(20,16-4-2)18-13-10-12(17-14-13)11-8-6-5-7-9-11\/h5-10H,3-4H2,1-2H3 toxic in the NR-AR-LBD assay?\nAssistant: No, it is not toxic in the NR-AR-LBD assay."}", "/scratch/micpie/export/nr_ar_lbd_tox21/test_0-1.jsonl": "{"text":"The molecule with the DeepSMILES CCCNCC))CCC))C=O)NccC)cccc6C is not exhibiting toxicity in the NR-androgen-LBD receptor alpha assay."} {"text":"The molecule with the InChI InChI=1S\/C8H5F3O2\/c9-8(10,11)6-3-1-2-5(4-6)7(12)13\/h1-4H,(H,12,13) is not demonstrating toxicity in the NR-androgen-LBD receptor alpha assay."}", "/scratch/micpie/export/nr_ar_lbd_tox21/valid_0-0.jsonl": "{"text":"The molecule with the SMILES representation of CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21 is not toxic in the androgen receptor ligand-binding domain assay."} {"text":"The molecule with the InChI InChI=1S\/C13H16NO4PS\/c1-3-15-19(20,16-4-2)18-13-10-12(17-14-13)11-8-6-5-7-9-11\/h5-10H,3-4H2,1-2H3 is not toxic in the androgen receptor ligand-binding domain assay."}", "/scratch/micpie/export/nr_ar_lbd_tox21/test_0-2.jsonl": "{"text":"Based on the SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C, the molecule has no NR-AR-LBD toxicity features."} {"text":"Based on the InChI representation InChI=1S\/C8H5F3O2\/c9-8(10,11)6-3-1-2-5(4-6)7(12)13\/h1-4H,(H,12,13), the molecule has no NR-AR-LBD toxicity characteristics."}", "/scratch/micpie/export/nr_ar_lbd_tox21/valid_0-10.jsonl": "{"text":"User: Can you generate the InChI of a molecule that is not toxic in the androgen receptor ligand-binding domain assay?\nAssistant: Yes, I'm happy to help, here you go: InChI=1S\/C19H23ClN2\/c1-21(2)12-5-13-22-18-7-4-3-6-15(18)8-9-16-10-11-17(20)14-19(16)22\/h3-4,6-7,10-11,14H,5,8-9,12-13H2,1-2H3"} {"text":"User: Can you generate the InChI of a molecule that is not toxic in the NR-AR-LBD assay?\nAssistant: 
Sure, here you go: InChI=1S\/C13H16NO4PS\/c1-3-15-19(20,16-4-2)18-13-10-12(17-14-13)11-8-6-5-7-9-11\/h5-10H,3-4H2,1-2H3"}", "/scratch/micpie/export/nr_ar_lbd_tox21/train_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the androgen receptor ligand-binding domain assay.\nMolecule SELFIES: [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the androgen receptor ligand-binding domain assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-AR-LBD assay.\nMolecule InChI: InChI=1S\/C5H14NO\/c1-6(2,3)4-5-7\/h7H,4-5H2,1-3H3\/q+1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the NR-AR-LBD assay."}", "/scratch/micpie/export/nr_ar_lbd_tox21/valid_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-AR-LBD assay.\nDeepSMILES: CNC)CCCNcccccc6CCccccCl)cc6%15\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the NR-AR-LBD assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the androgen receptor ligand-binding domain assay.\nSMILES: CCOP(=S)(OCC)Oc1cc(-c2ccccc2)on1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the androgen receptor ligand-binding domain assay."}", "/scratch/micpie/export/nr_ar_lbd_tox21/test_0-9.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20) toxic in the NR-AR-LBD assay?\nAssistant: No, it is not toxic in the NR-AR-LBD assay."} {"text":"User: Is the molecule with 
the canonical SMILES O=C(O)c1cccc(C(F)(F)F)c1 toxic in the androgen receptor ligand-binding domain assay?\nAssistant: No, it is not toxic in the androgen receptor ligand-binding domain assay."}", "/scratch/micpie/export/nr_ar_lbd_tox21/test_0-0.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20) is not toxic in the NR-AR-LBD assay."} {"text":"The molecule with the SMILES representation of O=C(O)c1cccc(C(F)(F)F)c1 is not toxic in the NR-AR-LBD assay."}", "/scratch/micpie/export/nr_ar_lbd_tox21/valid_0-7.jsonl": "{"text":"Task: Please give me a molecule DeepSMILES based on the text description.\nDescription: A molecule that is toxic in the androgen receptor ligand-binding domain assay.\nResult: CNC)CCCNcccccc6CCccccCl)cc6%15"} {"text":"Task: Please create a SELFIES based on the text description.\nDescription: A molecule that is toxic in the androgen receptor ligand-binding domain assay.\nResult: [C][C][O][P][=Branch1][C][=S][Branch1][Ring2][O][C][C][O][C][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][O][N][=Ring1][O]"}", "/scratch/micpie/export/nr_ar_lbd_tox21/test_0-3.jsonl": "{"text":"The canonical SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C represents a molecule that is not identified as toxic in the androgen receptor ligand-binding domain assay."} {"text":"The canonical SMILES O=C(O)c1cccc(C(F)(F)F)c1 is from a molecule that is not identified as toxic in the androgen receptor ligand-binding domain assay."}", "/scratch/micpie/export/nr_ar_lbd_tox21/valid_0-11.jsonl": "{"text":"User: I'm searching for the InChI of a molecule that is not toxic in the androgen receptor ligand-binding domain assay?\nAssistant: This is a molecule that is not toxic in the androgen receptor ligand-binding domain assay: InChI=1S\/C19H23ClN2\/c1-21(2)12-5-13-22-18-7-4-3-6-15(18)8-9-16-10-11-17(20)14-19(16)22\/h3-4,6-7,10-11,14H,5,8-9,12-13H2,1-2H3"} {"text":"User: I'm 
searching for the canonical SMILES of a molecule that is not toxic in the androgen receptor ligand-binding domain assay?\nAssistant: This is a molecule that is not toxic in the androgen receptor ligand-binding domain assay: CCOP(=S)(OCC)Oc1cc(-c2ccccc2)on1"}", "/scratch/micpie/export/nr_ar_lbd_tox21/train_0-0.jsonl": "{"text":"The molecule with the InChI representation of InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13) is not toxic in the NR-AR-LBD assay."} {"text":"The molecule with the InChI InChI=1S\/C5H14NO\/c1-6(2,3)4-5-7\/h7H,4-5H2,1-3H3\/q+1 is not toxic in the androgen receptor ligand-binding domain assay."}", "/scratch/micpie/export/nr_ar_lbd_tox21/test_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-AR-LBD assay.\nMolecule InChI: InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20)\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the NR-AR-LBD assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-AR-LBD assay.\nMolecule SMILES: O=C(O)c1cccc(C(F)(F)F)c1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the NR-AR-LBD assay."}", "/scratch/micpie/export/nr_ar_lbd_tox21/train_0-10.jsonl": "{"text":"User: Can you generate the InChI of a molecule that is not toxic in the androgen receptor ligand-binding domain assay?\nAssistant: Yes, I'm happy to help, here you go: InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13)"} {"text":"User: Can you give me the InChI of a molecule that is not toxic in the androgen receptor ligand-binding domain assay?\nAssistant: Of course, here you go: InChI=1S\/C5H14NO\/c1-6(2,3)4-5-7\/h7H,4-5H2,1-3H3\/q+1"}", 
"/scratch/micpie/export/nr_ar_lbd_tox21/train_0-3.jsonl": "{"text":"The DeepSMILES CCOccccncSN)=O)=O))sc5c9 represents a molecule that is not identified as toxic in the NR-AR-LBD assay."} {"text":"The canonical SMILES C[N+](C)(C)CCO represents a molecule that is not identified as toxic in the NR-AR-LBD assay."}", "/scratch/micpie/export/nr_ar_lbd_tox21/train_0-12.jsonl": "{"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be toxic in the androgen receptor ligand-binding domain assay.\nAssistant: Ok, here you go, this DeepSMILES is not toxic in the androgen receptor ligand-binding domain assay: CCOccccncSN)=O)=O))sc5c9"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be toxic in the NR-AR-LBD assay.\nAssistant: Ok, here you go, this InChI is not toxic in the NR-AR-LBD assay: InChI=1S\/C5H14NO\/c1-6(2,3)4-5-7\/h7H,4-5H2,1-3H3\/q+1"}", "/scratch/micpie/export/nr_ar_lbd_tox21/test_0-13.jsonl": "{"text":"User: I want to generate a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the androgen receptor ligand-binding domain assay.\nAssistant: Understood, this canonical SMILES is not toxic in the androgen receptor ligand-binding domain assay: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should not be toxic in the androgen receptor ligand-binding domain assay.\nAssistant: Understood, this DeepSMILES is not toxic in the androgen receptor ligand-binding domain assay: O=CO)cccccCF)F)F))c6"}", "/scratch/micpie/export/nr_ar_lbd_tox21/valid_0-2.jsonl": "{"text":"Based on the InChI representation InChI=1S\/C19H23ClN2\/c1-21(2)12-5-13-22-18-7-4-3-6-15(18)8-9-16-10-11-17(20)14-19(16)22\/h3-4,6-7,10-11,14H,5,8-9,12-13H2,1-2H3, the molecule has no NR-AR-LBD toxicity properties."} {"text":"Based on the InChI representation InChI=1S\/C13H16NO4PS\/c1-3-15-19(20,16-4-2)18-13-10-12(17-14-13)11-8-6-5-7-9-11\/h5-10H,3-4H2,1-2H3, the molecule has no androgen receptor ligand-binding domain toxicity features."}", "/scratch/micpie/export/nr_ar_lbd_tox21/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES CCOccccncSN)=O)=O))sc5c9 toxic in the androgen receptor ligand-binding domain assay?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any other words.\nOptions:\n[1] False\n[2] True\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES C[N+]C)C)CCO toxic in the androgen receptor ligand-binding domain assay?\nConstraint: Even if you are not sure, you must pick either a or b without using any additional words.\nOptions:\na. True\nb. False\nAnswer: b"}", "/scratch/micpie/export/nr_ar_lbd_tox21/valid_0-1.jsonl": "{"text":"The molecule with the DeepSMILES CNC)CCCNcccccc6CCccccCl)cc6%15 is not demonstrating toxicity in the NR-androgen-LBD receptor alpha assay."} {"text":"The molecule with the SMILES CCOP(=S)(OCC)Oc1cc(-c2ccccc2)on1 is not exhibiting toxicity in the NR-androgen-LBD receptor alpha assay."}", "/scratch/micpie/export/nr_ar_lbd_tox21/valid_0-13.jsonl": "{"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the androgen receptor ligand-binding domain assay.\nAssistant: Got it, this SMILES is not toxic in the androgen receptor ligand-binding domain assay: CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21"} {"text":"User: I want to create a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the androgen receptor ligand-binding domain assay.\nAssistant: Understood, this canonical SMILES is not toxic in the androgen receptor ligand-binding domain assay: CCOP(=S)(OCC)Oc1cc(-c2ccccc2)on1"}", "/scratch/micpie/export/nr_ar_lbd_tox21/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-AR-LBD assay.\ncanonical SMILES: CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-AR-LBD assay.\nInChI: InChI=1S\/C13H16NO4PS\/c1-3-15-19(20,16-4-2)18-13-10-12(17-14-13)11-8-6-5-7-9-11\/h5-10H,3-4H2,1-2H3\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/nr_ar_lbd_tox21/train_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-AR-LBD assay?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any additional words.\nOptions:\n(1) InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13)\n(2) InChI=1S\/C5H13S.C2F6NO4S2\/c1-4-6(3)5-2;3-1(4,5)14(10,11)9-15(12,13)2(6,7)8\/h4-5H2,1-3H3;\/q+1;-1\n(3) InChI=1S\/C11H21N\/c1-10(2)8-5-6-9(7-8)11(10,3)12-4\/h8-9,12H,5-7H2,1-4H3\n(4) 
InChI=1S\/C12H22O4\/c1-9(2)15-11(13)7-5-6-8-12(14)16-10(3)4\/h9-10H,5-8H2,1-4H3\nAnswer: 1, 2, 3, 4"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-AR-LBD assay?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any other words.\nOptions:\n1.) [O][=C][Branch1][C][O][C][C][=N][N][Branch1][=C][C][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1][F][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring2][Ring1][Ring2][Ring1][=Branch1]\n2.) [C][C@][C][C][C@H1][C@@H1][Branch1][P][C][C][C][=C][C][=Branch1][C][=O][C][C][C@@][Ring1][#Branch1][Ring1][O][C][C@@H1][Ring1][S][C][C][C@H1][Ring2][Ring1][Ring1][O]\n3.) [C][N+1][Branch1][C][C][Branch1][C][C][C][C][O]\nAnswer: 1, 3"}", "/scratch/micpie/export/nr_ar_lbd_tox21/valid_0-4.jsonl": "{"text":"The molecule DeepSMILES CNC)CCCNcccccc6CCccccCl)cc6%15 is not toxic in the androgen receptor ligand-binding domain assay."} {"text":"The SELFIES [C][C][O][P][=Branch1][C][=S][Branch1][Ring2][O][C][C][O][C][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][O][N][=Ring1][O] is not toxic in the androgen receptor ligand-binding domain assay."}", "/scratch/micpie/export/nr_ar_lbd_tox21/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-AR-LBD assay.\nDeepSMILES: CCOccccncSN)=O)=O))sc5c9\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the androgen receptor ligand-binding domain assay.\nDeepSMILES: C[N+]C)C)CCO\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"}", "/scratch/micpie/export/nr_ar_lbd_tox21/valid_0-15.jsonl": "{"text":"Task: Please answer the multiple choice 
question.\nQuestion: Which molecules are not toxic in the NR-AR-LBD assay?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any other words.\nOptions:\nA: C=CC(=O)OCCCCOC(=O)C=C\nB: CC(C)C1(C)N=C(c2ncccc2C(=O)O)NC1=O\nC: CCN1CCCC1CNC(=O)c1cc(S(=O)(=O)CC)c(N)cc1OC\nD: CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21\nE: COc1ccccc1N1CCN(CCCNc2cc(=O)n(C)c(=O)n2C)CC1\nAnswer: B, C, D, E"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-AR-LBD assay?\nConstraint: You must select none, one or more options from A, B, C, or D without using any additional words.\nOptions:\nA.) CCCCOC=O)cccccc6C=O)OCcccccc6\nB.) CCCCCCOcccccc6CN)=O\nC.) CNCCcccCl)cO)cc6[C@H]cccccc6CC[C@@H]%10%19\nD.) CCOP=S)OCC)))Occc-cccccc6))))))on5\nAnswer: A, B, C, D"}", "/scratch/micpie/export/nr_ar_lbd_tox21/valid_0-12.jsonl": "{"text":"User: I want to create a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be toxic in the androgen receptor ligand-binding domain assay.\nAssistant: Ok, this canonical SMILES is not toxic in the androgen receptor ligand-binding domain assay: CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21"} {"text":"User: I want to generate a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should not be toxic in the androgen receptor ligand-binding domain assay.\nAssistant: Got it, here you go, this canonical SMILES is not toxic in the androgen receptor ligand-binding domain assay: CCOP(=S)(OCC)Oc1cc(-c2ccccc2)on1"}", "/scratch/micpie/export/nr_ar_lbd_tox21/train_0-2.jsonl": "{"text":"Based on the SELFIES [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N], the molecule has no NR-AR-LBD toxicity properties."} {"text":"Based on the SMILES representation C[N+](C)(C)CCO, the molecule has no androgen receptor ligand-binding domain toxicity properties."}", "/scratch/micpie/export/nr_ar_lbd_tox21/test_0-11.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that is not toxic in the NR-AR-LBD assay?\nAssistant: This is a molecule that is not toxic in the NR-AR-LBD assay: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"User: I'm searching for the InChI of a molecule that is not toxic in the NR-AR-LBD assay?\nAssistant: This is a molecule that is not toxic in the NR-AR-LBD assay: InChI=1S\/C8H5F3O2\/c9-8(10,11)6-3-1-2-5(4-6)7(12)13\/h1-4H,(H,12,13)"}", "/scratch/micpie/export/nr_ar_lbd_tox21/train_0-7.jsonl": "{"text":"Task: Please give me a molecule SMILES based on the description below.\nDescription: A molecule that is toxic in the androgen receptor ligand-binding domain assay.\nResult: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} {"text":"Task: Please create a SELFIES based on the description.\nDescription: A molecule that is toxic in the NR-AR-LBD assay.\nResult: [C][N+1][Branch1][C][C][Branch1][C][C][C][C][O]"}", "/scratch/micpie/export/nr_ar_lbd_tox21/train_0-11.jsonl": "{"text":"User: I'm searching for the InChI of a molecule that is not toxic in the androgen receptor ligand-binding domain assay?\nAssistant: This is a molecule that is not toxic in the androgen receptor ligand-binding domain assay: 
InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13)"} {"text":"User: I'm looking for the InChI of a molecule that is not toxic in the NR-AR-LBD assay?\nAssistant: This is a molecule that is not toxic in the NR-AR-LBD assay: InChI=1S\/C5H14NO\/c1-6(2,3)4-5-7\/h7H,4-5H2,1-3H3\/q+1"}", "/scratch/micpie/export/nr_ar_lbd_tox21/train_0-1.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13) is not displaying toxicity in the NR-AR ligand binding domain assay."} {"text":"The molecule with the canonical SMILES representation of C[N+](C)(C)CCO is not exhibiting toxicity in the NR-androgen-LBD receptor alpha assay."}", "/scratch/micpie/export/nr_ar_lbd_tox21/train_0-13.jsonl": "{"text":"User: I want to generate a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the androgen receptor ligand-binding domain assay.\nAssistant: Ok, this canonical SMILES is not toxic in the androgen receptor ligand-binding domain assay: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} {"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the androgen receptor ligand-binding domain assay.\nAssistant: Got it, this canonical SMILES is not toxic in the androgen receptor ligand-binding domain assay: C[N+](C)(C)CCO"}", "/scratch/micpie/export/nr_ar_lbd_tox21/train_0-4.jsonl": "{"text":"The DeepSMILES CCOccccncSN)=O)=O))sc5c9 is not toxic in the androgen receptor ligand-binding domain assay."} {"text":"The molecule canonical SMILES C[N+](C)(C)CCO is not toxic in the NR-AR-LBD assay."}", "/scratch/micpie/export/nr_ar_lbd_tox21/test_0-7.jsonl": "{"text":"Task: Please give me a canonical SMILES based on the description below.\nDescription: A molecule that is toxic in the androgen receptor ligand-binding domain assay.\nResult: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"Task: Please create a InChI based on the description below.\nDescription: A molecule that is toxic in the NR-AR-LBD assay.\nResult: InChI=1S\/C8H5F3O2\/c9-8(10,11)6-3-1-2-5(4-6)7(12)13\/h1-4H,(H,12,13)"}", "/scratch/micpie/export/nr_ar_lbd_tox21/train_0-9.jsonl": "{"text":"User: Is the molecule with the SMILES CCOc1ccc2nc(S(N)(=O)=O)sc2c1 toxic in the androgen receptor ligand-binding domain assay?\nAssistant: No, it is not toxic in the androgen receptor ligand-binding domain assay."} {"text":"User: Is the molecule with the SMILES C[N+](C)(C)CCO toxic in the androgen receptor ligand-binding domain assay?\nAssistant: No, it is not toxic in the androgen receptor ligand-binding domain assay."}", "/scratch/micpie/export/nr_ar_lbd_tox21/valid_0-3.jsonl": "{"text":"The SMILES CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21 is from a molecule that is not identified as toxic in the androgen receptor ligand-binding domain assay."} {"text":"The SMILES CCOP(=S)(OCC)Oc1cc(-c2ccccc2)on1 is from a molecule that is not identified as toxic in the androgen receptor ligand-binding domain assay."}", "/scratch/micpie/export/nr_ar_lbd_tox21/test_0-8.jsonl": "{"text":"User: Can you estimate if the 
molecule with the DeepSMILES CCCNCC))CCC))C=O)NccC)cccc6C is toxic in the NR-AR-LBD assay?\nAssistant: No, this molecule is not toxic in the NR-AR-LBD assay."} {"text":"User: Can you tell me if the molecule with the InChI InChI=1S\/C8H5F3O2\/c9-8(10,11)6-3-1-2-5(4-6)7(12)13\/h1-4H,(H,12,13) is toxic in the NR-AR-LBD assay?\nAssistant: No, this molecule is not toxic in the NR-AR-LBD assay."}", "/scratch/micpie/export/nr_ar_lbd_tox21/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20) toxic in the NR-AR-LBD assay?\nConstraint: Even if you are not sure, you must pick either a or b without using any additional words.\nOptions:\n(a) False\n(b) True\nAnswer: a"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES representation of O=CO)cccccCF)F)F))c6 toxic in the androgen receptor ligand-binding domain assay?\nConstraint: Even if you are uncertain, you must pick either a or b without using any other words.\nOptions:\na: True\nb: False\nAnswer: b"}", "/scratch/micpie/export/nr_ar_lbd_tox21/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21 toxic in the NR-AR-LBD assay?\nConstraint: Even if you are not sure, you must pick either A or B without using any other words.\nOptions:\nA.) False\nB.) True\nAnswer: A"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES CCOP(=S)(OCC)Oc1cc(-c2ccccc2)on1 toxic in the androgen receptor ligand-binding domain assay?\nConstraint: Even if you are not sure, you must pick either a or b without using any other words.\nOptions:\na. True\nb. 
False\nAnswer: b"}", "/scratch/micpie/export/nr_ar_lbd_tox21/test_0-4.jsonl": "{"text":"The canonical SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C is not toxic in the androgen receptor ligand-binding domain assay."} {"text":"The molecule InChI InChI=1S\/C8H5F3O2\/c9-8(10,11)6-3-1-2-5(4-6)7(12)13\/h1-4H,(H,12,13) is not toxic in the NR-AR-LBD assay."}", "/scratch/micpie/export/nr_ar_lbd_tox21/test_0-12.jsonl": "{"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very curious. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be toxic in the androgen receptor ligand-binding domain assay.\nAssistant: Ok, here you go, this canonical SMILES is not toxic in the androgen receptor ligand-binding domain assay: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be toxic in the androgen receptor ligand-binding domain assay.\nAssistant: Got it, here you go, this DeepSMILES is not toxic in the androgen receptor ligand-binding domain assay: O=CO)cccccCF)F)F))c6"}", "/scratch/micpie/export/uniprot_organisms/test_0-1.jsonl": "{"text":"Task: Identify the organism in which this amino acid sequence can be found.\nAA sequence: MMKMKQQGLVADLLPNIRVMKTFGHFVFNYYNDNSSKYLHKVYCCVNLFMLLLQFGLCAVNLIVESADVDDLTANTITLLFFTHSIVKICYFAIRSKYFYRTWAIWNNPNSHPLFAESNARYHAIALKKMRLLLFLVGGTTMLAAVAWTVLTFFEHPIRKIVDPVTNETEIIELPQLLIRSFYPFDAGKGITHVLVLVYQFYWVLFMLIDANSLDVLFCSWLLFACEQLQHLKQIMKPLMELSATLDTVVPNSSELFKAGSADHLRDGDNPPPPPPPQSDNMLDLDLRNIYSNRQDFTATFRPTAGMTFNGGVGPNGLTKKQEALVRSAIKYWVERHKHIVRLVTAVGDAYGFALLLHMLTTTITLTLLAYQATKVNGINVYAASTIGYILYTFGQVFLFCIFGNRLIEESTSVMEAAYSCHWYDGSEEAKTFVQIVCQQCQKAMSISGAKFFTVSLDLFASVLGAVVTYFMVLVQLK\nResult: Ooceraea biroi"} {"text":"Task: Identify the organism in which the below protein can be found.\nSequence: 
MSAPLSIEQDDLLTDDLKSWLSDIDFSNDNEEAIEMEPSDIEMSSPPIDIETSPPEEADVNLDDTWATVQKNGNNKLNRFILTFFPSDMDTKWLEPETYFENSPNKFDCWTGQYEYCPDTGKLHAHIYIECNHKHRIRFNVFHREIRKYHQSVQLQLAKRASKKQRQSAINYVTADFKRAPGSLVFRWEHNKFPSDFDPKCVNKKSKSDKVSKDEQHETQRLWIESKPRHWTWDQIVHENEESKKLLFGCTAGEKYHKGRHAEDARRTINDVIIFYGAGGTGKTTEAQAWGSEDEPVQECRYYRRNPDDGAFWGGGRTCYKGQRIVHYEEFAGQEAFGRLKEVCDIGKHGPAVNVKNGGALLNHDTVIFTSNIHPAGWFHKLWESDPKQWMPFERRITQVRFYPSHRADGSLNQPDENNPPYFIDQTEEFRQFVGDYDKAKEHAELHWPLKEAPEPTAQVFVPGRSHGVTENTFFEYCKTGRAP\nResult: Chaetoceros protobacilladnavirus 2"}", "/scratch/micpie/export/uniprot_organisms/valid_0-0.jsonl": "{"text":"The protein with the amino acid sequence MSLEQKKGADIISKILQIQNSIGKTTSPSTLKTKLSEISRKEQENARIQSKLSDLQKKKIDIDNKLLKEKQNLIKEEILERKKLEVLTKKQQKDEIEHQKKLKREIDAIKASTQYITDVSISSYNNTIPETEPEYDLFISHASEDKEDFVRPLAETLQQLGVNVWYDEFTLKVGDSLRQKIDSGLRNSKYGTVVLSTDFIKKDWTNYELDGLVAREMNGHKMILPIWHKITKNDVLDYSPNLADKVALNTSVNSIEEIAHQLADVILNR can be found in the organism Acinetobacter baumannii (strain 1295743)."} {"text":"The protein with the amino acid sequence MGWVWKDDDEQGGHVNPSAADISPRLDGDRCSTRKVVRTQCKTEEVEPGKFIRKCEKTEEVLRDCVGRPIEVVQSNKEYTEDDVTDQVMKGSVSFGSADNGAFNFPGLQHDIDEIEHNFLGGLSRFFEAAEDMKNGFFSSFGIPHIFDEGPSTSLPSPRREIPIDSPRQLEAFQKAYGTKSGEVDLSGLARDV can be found in the organism Fragaria ananassa."}", "/scratch/micpie/export/uniprot_organisms/test_0-2.jsonl": "{"text":"User: In what organism can you find the following polypeptide:\\nMMKMKQQGLVADLLPNIRVMKTFGHFVFNYYNDNSSKYLHKVYCCVNLFMLLLQFGLCAVNLIVESADVDDLTANTITLLFFTHSIVKICYFAIRSKYFYRTWAIWNNPNSHPLFAESNARYHAIALKKMRLLLFLVGGTTMLAAVAWTVLTFFEHPIRKIVDPVTNETEIIELPQLLIRSFYPFDAGKGITHVLVLVYQFYWVLFMLIDANSLDVLFCSWLLFACEQLQHLKQIMKPLMELSATLDTVVPNSSELFKAGSADHLRDGDNPPPPPPPQSDNMLDLDLRNIYSNRQDFTATFRPTAGMTFNGGVGPNGLTKKQEALVRSAIKYWVERHKHIVRLVTAVGDAYGFALLLHMLTTTITLTLLAYQATKVNGINVYAASTIGYILYTFGQVFLFCIFGNRLIEESTSVMEAAYSCHWYDGSEEAKTFVQIVCQQCQKAMSISGAKFFTVSLDLFASVLGAVVTYFMVLVQLK\nAssistant: The given polypeptide can be found in Ooceraea biroi."} {"text":"User: In what organism can 
you find the following polypeptide:\\nMSAPLSIEQDDLLTDDLKSWLSDIDFSNDNEEAIEMEPSDIEMSSPPIDIETSPPEEADVNLDDTWATVQKNGNNKLNRFILTFFPSDMDTKWLEPETYFENSPNKFDCWTGQYEYCPDTGKLHAHIYIECNHKHRIRFNVFHREIRKYHQSVQLQLAKRASKKQRQSAINYVTADFKRAPGSLVFRWEHNKFPSDFDPKCVNKKSKSDKVSKDEQHETQRLWIESKPRHWTWDQIVHENEESKKLLFGCTAGEKYHKGRHAEDARRTINDVIIFYGAGGTGKTTEAQAWGSEDEPVQECRYYRRNPDDGAFWGGGRTCYKGQRIVHYEEFAGQEAFGRLKEVCDIGKHGPAVNVKNGGALLNHDTVIFTSNIHPAGWFHKLWESDPKQWMPFERRITQVRFYPSHRADGSLNQPDENNPPYFIDQTEEFRQFVGDYDKAKEHAELHWPLKEAPEPTAQVFVPGRSHGVTENTFFEYCKTGRAP\nAssistant: The given polypeptide can be found in Chaetoceros protobacilladnavirus 2."}", "/scratch/micpie/export/uniprot_organisms/test_0-0.jsonl": "{"text":"The protein with the AA sequence MMKMKQQGLVADLLPNIRVMKTFGHFVFNYYNDNSSKYLHKVYCCVNLFMLLLQFGLCAVNLIVESADVDDLTANTITLLFFTHSIVKICYFAIRSKYFYRTWAIWNNPNSHPLFAESNARYHAIALKKMRLLLFLVGGTTMLAAVAWTVLTFFEHPIRKIVDPVTNETEIIELPQLLIRSFYPFDAGKGITHVLVLVYQFYWVLFMLIDANSLDVLFCSWLLFACEQLQHLKQIMKPLMELSATLDTVVPNSSELFKAGSADHLRDGDNPPPPPPPQSDNMLDLDLRNIYSNRQDFTATFRPTAGMTFNGGVGPNGLTKKQEALVRSAIKYWVERHKHIVRLVTAVGDAYGFALLLHMLTTTITLTLLAYQATKVNGINVYAASTIGYILYTFGQVFLFCIFGNRLIEESTSVMEAAYSCHWYDGSEEAKTFVQIVCQQCQKAMSISGAKFFTVSLDLFASVLGAVVTYFMVLVQLK can be found in the organism Ooceraea biroi."} {"text":"The protein with the AA sequence MSAPLSIEQDDLLTDDLKSWLSDIDFSNDNEEAIEMEPSDIEMSSPPIDIETSPPEEADVNLDDTWATVQKNGNNKLNRFILTFFPSDMDTKWLEPETYFENSPNKFDCWTGQYEYCPDTGKLHAHIYIECNHKHRIRFNVFHREIRKYHQSVQLQLAKRASKKQRQSAINYVTADFKRAPGSLVFRWEHNKFPSDFDPKCVNKKSKSDKVSKDEQHETQRLWIESKPRHWTWDQIVHENEESKKLLFGCTAGEKYHKGRHAEDARRTINDVIIFYGAGGTGKTTEAQAWGSEDEPVQECRYYRRNPDDGAFWGGGRTCYKGQRIVHYEEFAGQEAFGRLKEVCDIGKHGPAVNVKNGGALLNHDTVIFTSNIHPAGWFHKLWESDPKQWMPFERRITQVRFYPSHRADGSLNQPDENNPPYFIDQTEEFRQFVGDYDKAKEHAELHWPLKEAPEPTAQVFVPGRSHGVTENTFFEYCKTGRAP can be found in the organism Chaetoceros protobacilladnavirus 2."}", "/scratch/micpie/export/uniprot_organisms/train_0-0.jsonl": "{"text":"The protein with the amino acid sequence 
MRFQVIVAAATITMITSYIPGVASQSTSDGDDLFVPVSNFDPKSIFPEIKHPFEPMYANTENGKIVPTNSWISNLFYPSADNLAPTTPDPYTLRLLDGYGGNPGLTIRQPSAKVLGSYPPTNDVPYTDAGYMINSVVVDLRLTSSEWSDVVPDRQVTDWDHLSANLRLSTPQDSNSYIDFPIVRGMAYITANYNNLTPQFLSQHAIISVEADEKKSDDNTSTFSGRKFKITMNDDPTSTFIIYSLGDKPLELRKQDNSNLVASKPYTGVIRVAKLPAPEFETLLDASRAVWPTGGDISARSDDNNGASYTIKWKTNSNEAPLLTYAYAHHLTSIDDSNVKRTDMTLQSATKGPMTALVGNEWTLRETELSPVEWLPLQAAPNPTTINEIMTEINKDIASNYTQETAKEDNYFSGKGLQKFAMLALILNKSDQTQLRNPELAQIALDKLKAAFLPYLQNEQADPFRYDTLYKGIVAKAGLPTSMGGTDDLSAEFGHSYYSDHHYHQGYFVVTAAIIHHLDPTWNADRLKAWTEALIRDVNNANDGDEYFAAFRNWDWFAGHSWAGGIKPDGALDGRDQESVPESVNFYWGAKLWGLATGNTPLTKLASLQLAVTKRTTYEYFWMLDGNKNRPENIVRNKVIGIYFEQKTDYTTYFGRFLEYIHGIQQLPMTPELMEYIRTPEFVSQEWDEKLGAIAPTVQSPWAGVLYLNYAIINPAEAYPALRKVQMDDGQTRSYSLYLTATRPHFFRRSLLAALARHGSTRRPSLPSSGDDDKHEDGFLLRFRRLNPFNLKHRIY can be found in Rhizomucor miehei."} {"text":"The protein with the amino acid sequence MGVVLSPHPAPSRREPLAPLAPGTRPGWSPAVSGSSRSALRPSTAGPGPGPGTGWGGTAASGRWVPAPAVHCAAPRAAAGHQQHHGPPLCSPDGAPRRFKRRPGSPAPAAQTGETSLREQPHGGPPAVPFVVPPTLQGRDWVPLHSGEWADAPWDPCPASELLPHTSSGGLGDACMVGAINPELYKFPEDKSETDFPDGCLGRLWFSVEYEQEAERLLVGLIKAQHLQAPSETCSPLVKLYLLPDERRFLQSKTKRKTSNPQFDEHFIFQVSSKTITQRVLKFSVYHVDRQRKHQLLGQVLFPLKNETLVGDCRRVIWRDLEAESLEPPSEFGDLQFCLSYNDYLSRLTVVVLRAKGLRLQEDRGIVSVFVKVSLMNHNKFVKCKKTSAVLGSINPVYNETFSFKADATELDTASLSLTVVQNMEGDKSQQLGRVVVGPYMYTRGRELEHWDEMLSKPKELVKRWHALCRTTEP can be found in Homo sapiens."}", "/scratch/micpie/export/uniprot_organisms/valid_0-2.jsonl": "{"text":"User: In what organism can you find the following AA sequence:\\nMSLEQKKGADIISKILQIQNSIGKTTSPSTLKTKLSEISRKEQENARIQSKLSDLQKKKIDIDNKLLKEKQNLIKEEILERKKLEVLTKKQQKDEIEHQKKLKREIDAIKASTQYITDVSISSYNNTIPETEPEYDLFISHASEDKEDFVRPLAETLQQLGVNVWYDEFTLKVGDSLRQKIDSGLRNSKYGTVVLSTDFIKKDWTNYELDGLVAREMNGHKMILPIWHKITKNDVLDYSPNLADKVALNTSVNSIEEIAHQLADVILNR\nAssistant: The given AA sequence can be found in Acinetobacter baumannii (strain 1295743)."} {"text":"User: In what organism can you find the following 
protein:\\nMGWVWKDDDEQGGHVNPSAADISPRLDGDRCSTRKVVRTQCKTEEVEPGKFIRKCEKTEEVLRDCVGRPIEVVQSNKEYTEDDVTDQVMKGSVSFGSADNGAFNFPGLQHDIDEIEHNFLGGLSRFFEAAEDMKNGFFSSFGIPHIFDEGPSTSLPSPRREIPIDSPRQLEAFQKAYGTKSGEVDLSGLARDV\nAssistant: The given protein can be found in Fragaria ananassa."}", "/scratch/micpie/export/uniprot_organisms/valid_0-1.jsonl": "{"text":"Task: Identify the organism in which this AA sequence can be found.\nAmino acid sequence : MSLEQKKGADIISKILQIQNSIGKTTSPSTLKTKLSEISRKEQENARIQSKLSDLQKKKIDIDNKLLKEKQNLIKEEILERKKLEVLTKKQQKDEIEHQKKLKREIDAIKASTQYITDVSISSYNNTIPETEPEYDLFISHASEDKEDFVRPLAETLQQLGVNVWYDEFTLKVGDSLRQKIDSGLRNSKYGTVVLSTDFIKKDWTNYELDGLVAREMNGHKMILPIWHKITKNDVLDYSPNLADKVALNTSVNSIEEIAHQLADVILNR\nResult: Acinetobacter baumannii (strain 1295743)"} {"text":"Task: Predict the organism in which this protein can be found.\nAmino acid sequence : MGWVWKDDDEQGGHVNPSAADISPRLDGDRCSTRKVVRTQCKTEEVEPGKFIRKCEKTEEVLRDCVGRPIEVVQSNKEYTEDDVTDQVMKGSVSFGSADNGAFNFPGLQHDIDEIEHNFLGGLSRFFEAAEDMKNGFFSSFGIPHIFDEGPSTSLPSPRREIPIDSPRQLEAFQKAYGTKSGEVDLSGLARDV\nResult: Fragaria ananassa"}", "/scratch/micpie/export/uniprot_organisms/train_0-2.jsonl": "{"text":"User: In what organism can you find the following amino acid 
sequence:\\nMRFQVIVAAATITMITSYIPGVASQSTSDGDDLFVPVSNFDPKSIFPEIKHPFEPMYANTENGKIVPTNSWISNLFYPSADNLAPTTPDPYTLRLLDGYGGNPGLTIRQPSAKVLGSYPPTNDVPYTDAGYMINSVVVDLRLTSSEWSDVVPDRQVTDWDHLSANLRLSTPQDSNSYIDFPIVRGMAYITANYNNLTPQFLSQHAIISVEADEKKSDDNTSTFSGRKFKITMNDDPTSTFIIYSLGDKPLELRKQDNSNLVASKPYTGVIRVAKLPAPEFETLLDASRAVWPTGGDISARSDDNNGASYTIKWKTNSNEAPLLTYAYAHHLTSIDDSNVKRTDMTLQSATKGPMTALVGNEWTLRETELSPVEWLPLQAAPNPTTINEIMTEINKDIASNYTQETAKEDNYFSGKGLQKFAMLALILNKSDQTQLRNPELAQIALDKLKAAFLPYLQNEQADPFRYDTLYKGIVAKAGLPTSMGGTDDLSAEFGHSYYSDHHYHQGYFVVTAAIIHHLDPTWNADRLKAWTEALIRDVNNANDGDEYFAAFRNWDWFAGHSWAGGIKPDGALDGRDQESVPESVNFYWGAKLWGLATGNTPLTKLASLQLAVTKRTTYEYFWMLDGNKNRPENIVRNKVIGIYFEQKTDYTTYFGRFLEYIHGIQQLPMTPELMEYIRTPEFVSQEWDEKLGAIAPTVQSPWAGVLYLNYAIINPAEAYPALRKVQMDDGQTRSYSLYLTATRPHFFRRSLLAALARHGSTRRPSLPSSGDDDKHEDGFLLRFRRLNPFNLKHRIY\nAssistant: The given amino acid sequence can be found in Rhizomucor miehei."} {"text":"User: In what organism can you find the following AA sequence:\\nMGVVLSPHPAPSRREPLAPLAPGTRPGWSPAVSGSSRSALRPSTAGPGPGPGTGWGGTAASGRWVPAPAVHCAAPRAAAGHQQHHGPPLCSPDGAPRRFKRRPGSPAPAAQTGETSLREQPHGGPPAVPFVVPPTLQGRDWVPLHSGEWADAPWDPCPASELLPHTSSGGLGDACMVGAINPELYKFPEDKSETDFPDGCLGRLWFSVEYEQEAERLLVGLIKAQHLQAPSETCSPLVKLYLLPDERRFLQSKTKRKTSNPQFDEHFIFQVSSKTITQRVLKFSVYHVDRQRKHQLLGQVLFPLKNETLVGDCRRVIWRDLEAESLEPPSEFGDLQFCLSYNDYLSRLTVVVLRAKGLRLQEDRGIVSVFVKVSLMNHNKFVKCKKTSAVLGSINPVYNETFSFKADATELDTASLSLTVVQNMEGDKSQQLGRVVVGPYMYTRGRELEHWDEMLSKPKELVKRWHALCRTTEP\nAssistant: The given AA sequence can be found in Homo sapiens."}", "/scratch/micpie/export/uniprot_organisms/train_0-1.jsonl": "{"text":"Task: Identify the organism in which the below protein can be found.\nAA sequence: 
MRFQVIVAAATITMITSYIPGVASQSTSDGDDLFVPVSNFDPKSIFPEIKHPFEPMYANTENGKIVPTNSWISNLFYPSADNLAPTTPDPYTLRLLDGYGGNPGLTIRQPSAKVLGSYPPTNDVPYTDAGYMINSVVVDLRLTSSEWSDVVPDRQVTDWDHLSANLRLSTPQDSNSYIDFPIVRGMAYITANYNNLTPQFLSQHAIISVEADEKKSDDNTSTFSGRKFKITMNDDPTSTFIIYSLGDKPLELRKQDNSNLVASKPYTGVIRVAKLPAPEFETLLDASRAVWPTGGDISARSDDNNGASYTIKWKTNSNEAPLLTYAYAHHLTSIDDSNVKRTDMTLQSATKGPMTALVGNEWTLRETELSPVEWLPLQAAPNPTTINEIMTEINKDIASNYTQETAKEDNYFSGKGLQKFAMLALILNKSDQTQLRNPELAQIALDKLKAAFLPYLQNEQADPFRYDTLYKGIVAKAGLPTSMGGTDDLSAEFGHSYYSDHHYHQGYFVVTAAIIHHLDPTWNADRLKAWTEALIRDVNNANDGDEYFAAFRNWDWFAGHSWAGGIKPDGALDGRDQESVPESVNFYWGAKLWGLATGNTPLTKLASLQLAVTKRTTYEYFWMLDGNKNRPENIVRNKVIGIYFEQKTDYTTYFGRFLEYIHGIQQLPMTPELMEYIRTPEFVSQEWDEKLGAIAPTVQSPWAGVLYLNYAIINPAEAYPALRKVQMDDGQTRSYSLYLTATRPHFFRRSLLAALARHGSTRRPSLPSSGDDDKHEDGFLLRFRRLNPFNLKHRIY\nResult: Rhizomucor miehei"} {"text":"Task: Identify the organism in which the below AA sequence can be found.\nAA sequence: MGVVLSPHPAPSRREPLAPLAPGTRPGWSPAVSGSSRSALRPSTAGPGPGPGTGWGGTAASGRWVPAPAVHCAAPRAAAGHQQHHGPPLCSPDGAPRRFKRRPGSPAPAAQTGETSLREQPHGGPPAVPFVVPPTLQGRDWVPLHSGEWADAPWDPCPASELLPHTSSGGLGDACMVGAINPELYKFPEDKSETDFPDGCLGRLWFSVEYEQEAERLLVGLIKAQHLQAPSETCSPLVKLYLLPDERRFLQSKTKRKTSNPQFDEHFIFQVSSKTITQRVLKFSVYHVDRQRKHQLLGQVLFPLKNETLVGDCRRVIWRDLEAESLEPPSEFGDLQFCLSYNDYLSRLTVVVLRAKGLRLQEDRGIVSVFVKVSLMNHNKFVKCKKTSAVLGSINPVYNETFSFKADATELDTASLSLTVVQNMEGDKSQQLGRVVVGPYMYTRGRELEHWDEMLSKPKELVKRWHALCRTTEP\nResult: Homo sapiens"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/test_0-10.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that is not metabolized by CYP3A4?\nAssistant: This is a molecule that is not a CYP3A4 substrate: CCCC(=O)Nc1ccc(OC[C@@H](O)CNC(C)C)c(C(C)=O)c1"} {"text":"User: I'm searching for the SMILES of a molecule that is metabolized by CYP P450 3A4?\nAssistant: This is a molecule that is a CYP3A4 substrate: O=C(O)c1ccccc1O"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/valid_0-8.jsonl": "{"text":"User: Is the molecule with the 
InChI InChI=1S\/C14H18N6O\/c15-14-18-12(17-9-2-3-9)11-13(19-14)20(7-16-11)10-4-1-8(5-10)6-21\/h1,4,7-10,21H,2-3,5-6H2,(H3,15,17,18,19)\/t8-,10+\/m1\/s1 metabolized by CYP3A4?\nAssistant: No, it is not a CYP P450 3A4 substrate."} {"text":"User: Is the molecule with the canonical SMILES CCCN(CCC)C(=O)Cc1c(-c2ccc(Cl)cc2)nc2ccc(Cl)cn12 metabolized by CYP3A4?\nAssistant: Yes, it is a CYP P450 3A4 substrate."}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/train_0-8.jsonl": "{"text":"User: Is the molecule with the DeepSMILES COCccC=O)OCC)C))))ncc[nH]ccccOCcccccc6))))))))cc6c%139 metabolized by CYP P450 3A4?\nAssistant: No, it is not a CYP3A4 substrate."} {"text":"User: Is the molecule with the SELFIES [C][C][O][C][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Branch1][=Branch1][C][Branch1][C][C][C][C][=Branch1][C][=O][NH1][C][Ring1][P][=O] metabolized by CYP P450 3A4?\nAssistant: Yes, it is a CYP P450 3A4 substrate."}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is metabolized by CYP P450 3A4.\nMolecule SMILES: CCCC(=O)Nc1ccc(OC[C@@H](O)CNC(C)C)c(C(C)=O)c1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not a CYP P450 3A4 substrate."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is metabolized by CYP P450 3A4.\nMolecule SMILES: O=C(O)c1ccccc1O\nConstraint: Answer the question in a full sentence.\nResult: This molecule is a CYP P450 3A4 substrate."}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/valid_0-9.jsonl": "{"text":"User: Can you create the InChI of a molecule that is a not CYP3A4 substrate?\nAssistant: Yes, here you go: InChI=1S\/C14H18N6O\/c15-14-18-12(17-9-2-3-9)11-13(19-14)20(7-16-11)10-4-1-8(5-10)6-21\/h1,4,7-10,21H,2-3,5-6H2,(H3,15,17,18,19)\/t8-,10+\/m1\/s1"} {"text":"User: Can you give me the SELFIES of a 
molecule that is a CYP3A4 substrate?\nAssistant: Yes, I'm happy to help, here you go: [C][C][C][N][Branch1][Ring2][C][C][C][C][=Branch1][C][=O][C][C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=C][C][=C][C][Branch1][C][Cl][=C][N][Ring1][P][Ring1][#Branch1]"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/test_0-1.jsonl": "{"text":"Based on the SELFIES [C][C][C][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][=C][O][C][C@@H1][Branch1][C][O][C][N][C][Branch1][C][C][C][C][Branch1][=Branch1][C][Branch1][C][C][=O][=C][Ring2][Ring1][C], the molecule is not metabolized by CYP P450 3A4."} {"text":"Based on the SMILES O=C(O)c1ccccc1O, the molecule is metabolized by CYP P450 3A4."}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/valid_0-0.jsonl": "{"text":"The molecule with the SMILES representation of Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1 is not a substrate for CYP3A4."} {"text":"The molecule with the DeepSMILES representation of CCCNCCC)))C=O)Ccc-ccccCl)cc6))))))nccccCl)cn96 is a substrate for CYP3A4."}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/test_0-2.jsonl": "{"text":"The canonical SMILES CCCC(=O)Nc1ccc(OC[C@@H](O)CNC(C)C)c(C(C)=O)c1 represents a molecule that is not identified as a substrate for CYP3A4."} {"text":"The DeepSMILES O=CO)cccccc6O is from a molecule that is identified as a CYP3A4 substrate."}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/valid_0-10.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that is not metabolized by CYP3A4?\nAssistant: This is a molecule that is not a CYP P450 3A4 substrate: Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1"} {"text":"User: I'm looking for the DeepSMILES of a molecule that is metabolized by CYP3A4?\nAssistant: This is a molecule that is a substrate for CYP3A4: CCCNCCC)))C=O)Ccc-ccccCl)cc6))))))nccccCl)cn96"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/train_0-6.jsonl": "{"text":"Task: Please create a molecule 
SELFIES based on the text description below.\nDescription: A molecule that is a substrate for CYP3A4.\nResult: [C][O][C][C][=C][Branch1][O][C][=Branch1][C][=O][O][C][Branch1][C][C][C][N][=C][C][NH1][C][=C][C][=C][Branch1][O][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][=C][C][Ring2][Ring1][O][=Ring1][P]"} {"text":"Task: Please create a molecule InChI based on the text description.\nDescription: A molecule that is a substrate for CYP3A4.\nResult: InChI=1S\/C17H22N2O3\/c1-4-22-11-19-14(10-13-8-6-5-7-9-13)15(12(2)3)16(20)18-17(19)21\/h5-9,12H,4,10-11H2,1-3H3,(H,18,20,21)"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/valid_0-6.jsonl": "{"text":"Task: Please create a canonical SMILES based on the text description below.\nDescription: A molecule that is a CYP3A4 substrate.\nResult: Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1"} {"text":"Task: Please generate a molecule InChI based on the description below.\nDescription: A molecule that is a CYP3A4 substrate.\nResult: InChI=1S\/C21H23Cl2N3O\/c1-3-11-25(12-4-2)20(27)13-18-21(15-5-7-16(22)8-6-15)24-19-10-9-17(23)14-26(18)19\/h5-10,14H,3-4,11-13H2,1-2H3"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/test_0-9.jsonl": "{"text":"User: Can you generate the DeepSMILES of a molecule that is a not CYP P450 3A4 substrate?\nAssistant: Yes, I'm happy to help, here you go: CCCC=O)NccccOC[C@@H]O)CNCC)C)))))))cCC)=O))c6"} {"text":"User: Can you give me the SMILES of a molecule that is a substrate for CYP3A4?\nAssistant: Yes, I'm happy to help, here you go: O=C(O)c1ccccc1O"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/test_0-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CCCC(=O)Nc1ccc(OC[C@@H](O)CNC(C)C)c(C(C)=O)c1 is not a CYP P450 3A4 substrate."} {"text":"The molecule with the InChI InChI=1S\/C7H6O3\/c8-6-4-2-1-3-5(6)7(9)10\/h1-4,8H,(H,9,10) is a CYP3A4 substrate."}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/valid_0-7.jsonl": 
"{"text":"User: Can you tell me if the molecule with the SMILES Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1 is a substrate for CYP3A4?\nAssistant: No, this molecule is not metabolized by CYP3A4."} {"text":"User: Can you derive if the molecule with the DeepSMILES CCCNCCC)))C=O)Ccc-ccccCl)cc6))))))nccccCl)cn96 is a CYP P450 3A4 substrate?\nAssistant: Yes, this molecule is metabolized by CYP3A4."}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/test_0-3.jsonl": "{"text":"The molecule SELFIES [C][C][C][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][=C][O][C][C@@H1][Branch1][C][O][C][N][C][Branch1][C][C][C][C][Branch1][=Branch1][C][Branch1][C][C][=O][=C][Ring2][Ring1][C] is not metabolized by CYP P450 3A4."} {"text":"The molecule DeepSMILES O=CO)cccccc6O is metabolized by CYP3A4."}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/valid_0-11.jsonl": "{"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be metabolized by CYP3A4.\nAssistant: Got it, this canonical SMILES is not metabolized by CYP3A4: Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1"} {"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should be metabolized by CYP3A4.\nAssistant: Ok, this DeepSMILES is metabolized by CYP3A4: CCCNCCC)))C=O)Ccc-ccccCl)cc6))))))nccccCl)cn96"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/train_0-0.jsonl": "{"text":"The molecule with the InChI representation of InChI=1S\/C24H24N2O4\/c1-15(2)30-24(27)23-19(14-28-3)22-18-11-17(29-13-16-7-5-4-6-8-16)9-10-20(18)26-21(22)12-25-23\/h4-12,15,26H,13-14H2,1-3H3 is not a CYP3A4 substrate."} {"text":"The molecule with the DeepSMILES CCOCncCcccccc6)))))))cCC)C))c=O)[nH]c6=O is a CYP3A4 substrate."}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/test_0-6.jsonl": "{"text":"Task: Please generate a DeepSMILES based on the text description.\nDescription: A molecule that is a CYP3A4 substrate.\nResult: CCCC=O)NccccOC[C@@H]O)CNCC)C)))))))cCC)=O))c6"} {"text":"Task: Please create a SMILES based on the description.\nDescription: A molecule that is a substrate for CYP3A4.\nResult: O=C(O)c1ccccc1O"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/train_0-10.jsonl": "{"text":"User: I'm looking for the SELFIES of a molecule that is not metabolized by CYP P450 3A4?\nAssistant: This is a molecule that is not a substrate for CYP P450 3A4: [C][O][C][C][=C][Branch1][O][C][=Branch1][C][=O][O][C][Branch1][C][C][C][N][=C][C][NH1][C][=C][C][=C][Branch1][O][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][=C][C][Ring2][Ring1][O][=Ring1][P]"} {"text":"User: I'm searching for the SMILES of a molecule that is metabolized by CYP3A4?\nAssistant: This is a molecule that is a substrate for CYP P450 3A4: CCOCn1c(Cc2ccccc2)c(C(C)C)c(=O)[nH]c1=O"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/train_0-3.jsonl": "{"text":"The canonical SMILES COCc1c(C(=O)OC(C)C)ncc2[nH]c3ccc(OCc4ccccc4)cc3c12 is not metabolized by CYP3A4."} {"text":"The SELFIES 
[C][C][O][C][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Branch1][=Branch1][C][Branch1][C][C][C][C][=Branch1][C][=O][NH1][C][Ring1][P][=O] is metabolized by CYP P450 3A4."}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/train_0-12.jsonl": "{"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should not be a substrate for CYP3A4.\nAssistant: Got it, this SMILES is not a substrate for CYP3A4: COCc1c(C(=O)OC(C)C)ncc2[nH]c3ccc(OCc4ccccc4)cc3c12"} {"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should be a substrate for CYP P450 3A4.\nAssistant: Understood, this DeepSMILES is a substrate for CYP P450 3A4: CCOCncCcccccc6)))))))cCC)C))c=O)[nH]c6=O"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES representation of CCCC=O)NccccOC[C@@H]O)CNCC)C)))))))cCC)=O))c6 metabolized by CYP3A4?\nConstraint: Even if you are not sure, you must pick either A or B without using any other words.\nOptions:\nA False\nB True\nAnswer: A"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES O=C(O)c1ccccc1O metabolized by CYP P450 3A4?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any additional words.\nOptions:\n1. False\n2. 
True\nAnswer: 2"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/valid_0-2.jsonl": "{"text":"The InChI InChI=1S\/C14H18N6O\/c15-14-18-12(17-9-2-3-9)11-13(19-14)20(7-16-11)10-4-1-8(5-10)6-21\/h1,4,7-10,21H,2-3,5-6H2,(H3,15,17,18,19)\/t8-,10+\/m1\/s1 is from a molecule that is not identified as a CYP P450 3A4 substrate."} {"text":"The SELFIES [C][C][C][N][Branch1][Ring2][C][C][C][C][=Branch1][C][=O][C][C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=C][C][=C][C][Branch1][C][Cl][=C][N][Ring1][P][Ring1][#Branch1] is from a molecule that is identified as a CYP3A4 substrate."}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a substrate for CYP P450 3A4?\nConstraint: You must select none, one or more options from a, b, c, d, or e without using any additional words.\nOptions:\n[a] C[C@H]1c2cccc(O)c2C(=O)C2=C(O)[C@]3(O)C(=O)C(C(N)=O)=C(O)[C@@H](N(C)C)[C@@H]3[C@@H](O)[C@@H]21\n[b] CS(=O)(=O)c1ccc(C2=C(c3ccccc3)C(=O)OC2)cc1\n[c] COCc1c(C(=O)OC(C)C)ncc2[nH]c3ccc(OCc4ccccc4)cc3c12\n[d] C1CCC(C(C[C@H]2CCCCN2)C2CCCCC2)CC1\n[e] COc1ccc([C@@H]2Sc3ccccc3N(CCN(C)C)C(=O)[C@@H]2OC(C)=O)cc1\nAnswer: b, c, d"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are a CYP P450 3A4 substrate?\nConstraint: You must select none, one or more options from a, b, c, or d without using any other words.\nOptions:\n[a] CCOCn1c(Cc2ccccc2)c(C(C)C)c(=O)[nH]c1=O\n[b] CC[C@]1(O)C[C@H]2CN(CCc3c([nH]c4ccccc34)[C@@](C(=O)OC)(c3cc4c(cc3OC)N(C)[C@H]3[C@@](O)(C(N)=O)[C@H](O)[C@]5(CC)C=CCN6CC[C@]43[C@@H]65)C2)C1\n[c] c1ccc2cc(COC3CCNCC3)ccc2c1\n[d] CNC[C@H](O)c1ccc(O)c(O)c1\nAnswer: a, b"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/valid_0-1.jsonl": "{"text":"Based on the canonical SMILES Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1, the molecule is not metabolized by CYP P450 3A4."} {"text":"Based on 
the SMILES CCCN(CCC)C(=O)Cc1c(-c2ccc(Cl)cc2)nc2ccc(Cl)cn12, the molecule is metabolized by CYP P450 3A4."}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES NcncNCCC3))))cncn[C@H]C=C[C@@H]CO))C5)))))c5n9 metabolized by CYP P450 3A4?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any additional words.\nOptions:\n1 False\n2 True\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES representation of CCCN(CCC)C(=O)Cc1c(-c2ccc(Cl)cc2)nc2ccc(Cl)cn12 metabolized by CYP3A4?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any other words.\nOptions:\n(1) True\n(2) False\nAnswer: 1"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is metabolized by CYP3A4.\nSMILES: Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not a CYP P450 3A4 substrate."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is metabolized by CYP3A4.\nMolecule SMILES: CCCN(CCC)C(=O)Cc1c(-c2ccc(Cl)cc2)nc2ccc(Cl)cn12\nConstraint: Answer the question in a full sentence.\nResult: This molecule is a CYP P450 3A4 substrate."}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a CYP P450 3A4 substrate.\nMolecule SMILES: Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nesult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A 
molecule that is a substrate for CYP3A4.\nSELFIES: [C][C][C][N][Branch1][Ring2][C][C][C][C][=Branch1][C][=O][C][C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=C][C][=C][C][Branch1][C][Cl][=C][N][Ring1][P][Ring1][#Branch1]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nesult: True"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is metabolized by CYP3A4.\nSMILES: COCc1c(C(=O)OC(C)C)ncc2[nH]c3ccc(OCc4ccccc4)cc3c12\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not a CYP P450 3A4 substrate."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is metabolized by CYP3A4.\nDeepSMILES: CCOCncCcccccc6)))))))cCC)C))c=O)[nH]c6=O\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is a substrate for CYP P450 3A4."}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/valid_0-12.jsonl": "{"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be a CYP3A4 substrate.\nAssistant: Ok, this DeepSMILES is not a CYP3A4 substrate: NcncNCCC3))))cncn[C@H]C=C[C@@H]CO))C5)))))c5n9"} {"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should be a CYP3A4 substrate.\nAssistant: Understood, this SELFIES is a CYP3A4 substrate: [C][C][C][N][Branch1][Ring2][C][C][C][C][=Branch1][C][=O][C][C][=C][Branch1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][=C][C][=C][C][Branch1][C][Cl][=C][N][Ring1][P][Ring1][#Branch1]"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/train_0-2.jsonl": "{"text":"The SELFIES [C][O][C][C][=C][Branch1][O][C][=Branch1][C][=O][O][C][Branch1][C][C][C][N][=C][C][NH1][C][=C][C][=C][Branch1][O][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][=C][C][Ring2][Ring1][O][=Ring1][P] is from a molecule that is not identified as a CYP3A4 substrate."} {"text":"The canonical SMILES CCOCn1c(Cc2ccccc2)c(C(C)C)c(=O)[nH]c1=O represents a molecule that is identified as a substrate for CYP P450 3A4."}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/test_0-11.jsonl": "{"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be metabolized by CYP P450 3A4.\nAssistant: Got it, here you go, this InChI is not metabolized by CYP P450 3A4: InChI=1S\/C18H28N2O4\/c1-5-6-18(23)20-14-7-8-17(16(9-14)13(4)21)24-11-15(22)10-19-12(2)3\/h7-9,12,15,19,22H,5-6,10-11H2,1-4H3,(H,20,23)\/t15-\/m0\/s1"} {"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should be metabolized by CYP P450 3A4.\nAssistant: Got it, here you go, this SMILES is metabolized by CYP P450 3A4: O=C(O)c1ccccc1O"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/train_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the DeepSMILES COCccC=O)OCC)C))))ncc[nH]ccccOCcccccc6))))))))cc6c%139 is a CYP3A4 substrate?\nAssistant: No, this molecule is not metabolized by CYP P450 3A4."} {"text":"User: Can you tell me if the molecule with the SELFIES [C][C][O][C][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Branch1][=Branch1][C][Branch1][C][C][C][C][=Branch1][C][=O][NH1][C][Ring1][P][=O] is a CYP3A4 substrate?\nAssistant: Yes, this molecule is metabolized by CYP3A4."}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/train_0-11.jsonl": "{"text":"User: I want to generate a SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be metabolized by CYP P450 3A4.\nAssistant: Got it, this SMILES is not metabolized by CYP P450 3A4: COCc1c(C(=O)OC(C)C)ncc2[nH]c3ccc(OCc4ccccc4)cc3c12"} {"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should be metabolized by CYP P450 3A4.\nAssistant: Ok, this SELFIES is metabolized by CYP P450 3A4: [C][C][O][C][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Branch1][=Branch1][C][Branch1][C][C][C][C][=Branch1][C][=O][NH1][C][Ring1][P][=O]"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/train_0-1.jsonl": "{"text":"Based on the DeepSMILES COCccC=O)OCC)C))))ncc[nH]ccccOCcccccc6))))))))cc6c%139, the molecule is not metabolized by CYP3A4."} {"text":"Based on the SELFIES [C][C][O][C][N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Branch1][=Branch1][C][Branch1][C][C][C][C][=Branch1][C][=O][NH1][C][Ring1][P][=O], the molecule is metabolized by CYP P450 3A4."}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES representation of COCc1c(C(=O)OC(C)C)ncc2[nH]c3ccc(OCc4ccccc4)cc3c12 metabolized by CYP3A4?\nConstraint: Even if you are uncertain, you must pick either A or B without using any other words.\nOptions:\nA) True\nB) False\nAnswer: B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of CCOCn1c(Cc2ccccc2)c(C(C)C)c(=O)[nH]c1=O metabolized by CYP P450 3A4?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any other words.\nOptions:\n1) False\n2) True\nAnswer: 2"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a CYP P450 3A4 substrate.\nMolecule InChI: InChI=1S\/C24H24N2O4\/c1-15(2)30-24(27)23-19(14-28-3)22-18-11-17(29-13-16-7-5-4-6-8-16)9-10-20(18)26-21(22)12-25-23\/h4-12,15,26H,13-14H2,1-3H3\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nesult: False"} {"text":"Task: 
Please classify a molecule based on the description.\nDescription: A molecule that is a substrate for CYP3A4.\nMolecule canonical SMILES: CCOCn1c(Cc2ccccc2)c(C(C)C)c(=O)[nH]c1=O\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nesult: True"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/test_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the DeepSMILES CCCC=O)NccccOC[C@@H]O)CNCC)C)))))))cCC)=O))c6 is a CYP3A4 substrate?\nAssistant: No, this molecule is not metabolized by CYP3A4."} {"text":"User: Can you estimate if the molecule with the InChI InChI=1S\/C7H6O3\/c8-6-4-2-1-3-5(6)7(9)10\/h1-4,8H,(H,9,10) is a substrate for CYP P450 3A4?\nAssistant: Yes, this molecule is metabolized by CYP P450 3A4."}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/train_0-9.jsonl": "{"text":"User: Can you generate the InChI of a molecule that is a not CYP3A4 substrate?\nAssistant: Sure, here you go: InChI=1S\/C24H24N2O4\/c1-15(2)30-24(27)23-19(14-28-3)22-18-11-17(29-13-16-7-5-4-6-8-16)9-10-20(18)26-21(22)12-25-23\/h4-12,15,26H,13-14H2,1-3H3"} {"text":"User: Can you generate the canonical SMILES of a molecule that is a CYP P450 3A4 substrate?\nAssistant: Sure, here you go: CCOCn1c(Cc2ccccc2)c(C(C)C)c(=O)[nH]c1=O"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/valid_0-3.jsonl": "{"text":"The InChI InChI=1S\/C14H18N6O\/c15-14-18-12(17-9-2-3-9)11-13(19-14)20(7-16-11)10-4-1-8(5-10)6-21\/h1,4,7-10,21H,2-3,5-6H2,(H3,15,17,18,19)\/t8-,10+\/m1\/s1 is not metabolized by CYP P450 3A4."} {"text":"The molecule InChI InChI=1S\/C21H23Cl2N3O\/c1-3-11-25(12-4-2)20(27)13-18-21(15-5-7-16(22)8-6-15)24-19-10-9-17(23)14-26(18)19\/h5-10,14H,3-4,11-13H2,1-2H3 is metabolized by CYP P450 3A4."}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/test_0-8.jsonl": "{"text":"User: Is the molecule with the DeepSMILES CCCC=O)NccccOC[C@@H]O)CNCC)C)))))))cCC)=O))c6 metabolized by CYP3A4?\nAssistant: 
No, it is not a CYP3A4 substrate."} {"text":"User: Is the molecule with the canonical SMILES O=C(O)c1ccccc1O metabolized by CYP3A4?\nAssistant: Yes, it is a CYP3A4 substrate."}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a substrate for CYP P450 3A4?\nConstraint: You must select none, one or more options from a or b without using any other words.\nOptions:\n[a] [C][C][C][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][=C][O][C][C@@H1][Branch1][C][O][C][N][C][Branch1][C][C][C][C][Branch1][=Branch1][C][Branch1][C][C][=O][=C][Ring2][Ring1][C]\n[b] [C][O][C][=Branch1][C][=O][C][=C][Branch1][C][C][N][C][Branch1][C][C][=C][Branch2][Ring1][Branch1][C][=Branch1][C][=O][O][C][C][N][Branch1][C][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C@H1][Ring2][Ring1][=Branch1][C][=C][C][=C][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring1][=Branch2]\nAnswer: a"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are a substrate for CYP P450 3A4?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any other words.\nOptions:\nA. Cc1nnc2n1-c1sc(Br)cc1C(c1ccccc1Cl)=NC2\nB. CN(C)CCOc1ccc(\/C(=C(\\CCCl)c2ccccc2)c2ccccc2)cc1\nC. O=C(O)c1ccccc1O\nD. Cc1ncc([N+](=O)[O-])n1CCO\nE. CC1(C)NC(=O)N(c2ccc([N+](=O)[O-])c(C(F)(F)F)c2)C1=O\nAnswer: A, B, C"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a substrate for CYP P450 3A4?\nConstraint: You must select none, one or more options from a, b, or c without using any other words.\nOptions:\na.) CC=O)CCCCnc=O)ccncn5C))))nC)c6=O\nb.) NcncNCCC3))))cncn[C@H]C=C[C@@H]CO))C5)))))c5n9\nc.) 
CNCCOccccC[C@@H]SC=O)NC5=O)))))))cc6)))))))))cccccn6\nAnswer: a, b, c"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are a CYP3A4 substrate?\nConstraint: You must select none, one or more options from A or B without using any other words.\nOptions:\nA) CCN[C@H]1CN(CCCOC)S(=O)(=O)c2sc(S(N)(=O)=O)cc21\nB) CCCN(CCC)C(=O)Cc1c(-c2ccc(Cl)cc2)nc2ccc(Cl)cn12\nAnswer: A, B"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a substrate for CYP3A4.\ncanonical SMILES: CCCC(=O)Nc1ccc(OC[C@@H](O)CNC(C)C)c(C(C)=O)c1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a substrate for CYP P450 3A4.\nInChI: InChI=1S\/C7H6O3\/c8-6-4-2-1-3-5(6)7(9)10\/h1-4,8H,(H,9,10)\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"}", "/scratch/micpie/export/cyp3a4_substrate_carbonmangels/test_0-12.jsonl": "{"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should not be a substrate for CYP P450 3A4.\nAssistant: Understood, this DeepSMILES is not a substrate for CYP P450 3A4: CCCC=O)NccccOC[C@@H]O)CNCC)C)))))))cCC)=O))c6"} {"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should be a substrate for CYP P450 3A4.\nAssistant: Ok, this InChI is a substrate for CYP P450 3A4: InChI=1S\/C7H6O3\/c8-6-4-2-1-3-5(6)7(9)10\/h1-4,8H,(H,9,10)"}", "/scratch/micpie/export/ord_masked/train_2-2.jsonl": "{"text":"Question: What is the masked component in the masked RXNSMILES (one component masked as `MASK`) Cl[CH2:45][CH2:44][CH2:43][CH2:42][CH2:41][CH2:40][N:39]=[CH:47][CH2:48]Cl.N#C[S:21][c:18]1[s:17][c:16]([NH:15][C:13]([N:12]([CH:9]2[CH2:8][CH2:7][N:6]([C:1]([CH2:2][CH2:3][CH3:4])=[O:5])[CH2:11][CH2:10]2)[C@@H:24]2[CH2:25][CH2:26][C@@H:27]([CH3:30])[CH2:28][CH2:29]2)=[O:14])[n:20][cH:19]1.O[C@H](CS)[C@@H](O)CS>>MASK?\nAnswer: [C:1]([CH2:2][CH2:3][CH3:4])(=[O:5])[N:6]1[CH2:7][CH2:8][CH:9]([N:12]([C:13](=[O:14])[NH:15][c:16]2[s:17][c:18]([S:21][CH2:48][CH2:47][N:39]3[CH2:40][CH2:41][CH2:42][CH2:43][CH2:44][CH2:45]3)[cH:19][n:20]2)[C@@H:24]2[CH2:25][CH2:26][C@@H:27]([CH3:30])[CH2:28][CH2:29]2)[CH2:10][CH2:11]1."} {"text":"Question: What is the masked component in the masked RXNSMILES (one component masked as `MASK`) [Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([C:9]#[N:10])[CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[cH:6][cH:7]1.[C:25]([O-:26])([O-:27])=[O:28].[CH3:31][OH:32].[Na+:29].[Na+:30].[OH2:24].[S:19](=[O:20])(=[O:21])([OH:22])[OH:23]>>MASK?\nAnswer: [Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[C:25]([OH:26])=[O:28])[cH:6][cH:7]1."}", "/scratch/micpie/export/ord_masked/test_1-2.jsonl": "{"text":"Question: What is the masked component in the masked reaction SMILES (one component masked as `MASK`) CO.C[O:11][C:9]([CH:8]([c:7]1[c:2]([CH3:1])[n:3][c:4](-[c:23]2[cH:24][n:25][n:26]([CH3:28])[cH:27]2)[n:5][c:6]1-[c:16]1[cH:17][cH:18][c:19]([CH3:22])[cH:20][cH:21]1)[CH2:13][CH2:14][CH3:15])=[O:10].[Na+].[OH-]>>MASK?\nAnswer: 
[CH3:1][c:2]1[n:3][c:4](-[c:23]2[cH:24][n:25][n:26]([CH3:28])[cH:27]2)[n:5][c:6](-[c:16]2[cH:17][cH:18][c:19]([CH3:22])[cH:20][cH:21]2)[c:7]1[CH:8]([C:9](=[O:10])[OH:11])[CH2:13][CH2:14][CH3:15]."} {"text":"Question: What is the masked component in the masked reaction SMILES (one component masked as `MASK`) CCN(C(C)C)C(C)C.CN(C)C=O.ClCCl.O=S(Cl)Cl.O=[C:1]([OH:3])[c:4]1[cH:5][cH:6][c:7]([CH:8]=[O:9])[cH:10][cH:11]1.[C:16]([CH3:17])([CH3:18])([CH3:19])[O:20][C:21]([NH:22][c:23]1[c:24]([NH2:29])[cH:25][cH:26][cH:27][cH:28]1)=[O:30].MASK>>[C:1](=[O:3])([c:4]1[cH:5][cH:6][c:7]([CH:8]=[O:9])[cH:10][cH:11]1)[NH:29][c:24]1[c:23]([NH:22][C:21]([O:20][C:16]([CH3:17])([CH3:18])[CH3:19])=[O:30])[cH:28][cH:27][cH:26][cH:25]1?\nAnswer: O=Cc1ccc(C(=O)Cl)cc1."}", "/scratch/micpie/export/ord_masked/test_0-1.jsonl": "{"text":"The compound with SMILES Oc1c(I)cc2c(c1I)Oc1c(cc(I)c(O)c1I)[C:43]21[c:44]2[c:45](c(Cl)[c:47](Cl)[c:48](Cl)[c:49]2Cl)[C:54](=[O:55])[O:56]1 is the masked component in the reaction SMILES with one element masked as `MASK` C1CCOC1.C[Si](C)(C)c1cc(CCC[O:14][C:1]([CH2:2][CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH2:8][CH2:9][CH2:10][CH2:11][CH3:12])=[O:13])co1.O=[O:60].MASK>>[C:1]([CH2:2][CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH2:8][CH2:9][CH2:10][CH2:11][CH3:12])(=[O:13])[O:14][CH2:47][CH2:48][CH2:49][C:44]1=[CH:45][CH:54]([OH:55])[O:56][C:43]1=[O:60]."} {"text":"The compound with SMILES [c:16]1([CH2:22][CH2:23][C:24]#[CH:25])[cH:17][cH:18][cH:19][cH:20][cH:21]1 is the masked component in the reaction SMILES with one element hidden as `MASK` 
[CH2:34]1[O:35][CH2:36][CH2:37][CH2:38]1.[CH3:26][CH2:27][N:28]([CH2:29][CH3:30])[CH2:31][CH3:32].[I-:33].[NH:1]([c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[c:8]1[n:9][c:10]([Cl:15])[n:11][cH:12][c:13]1[I:14].[cH:39]1[cH:40][cH:41][c:42]([P:43]([Pd:44]([P:45]([c:46]2[cH:47][cH:48][cH:49][cH:50][cH:51]2)([c:52]2[cH:53][cH:54][cH:55][cH:56][cH:57]2)[c:58]2[cH:59][cH:60][cH:61][cH:62][cH:63]2)([P:64]([c:65]2[cH:66][cH:67][cH:68][cH:69][cH:70]2)([c:71]2[cH:72][cH:73][cH:74][cH:75][cH:76]2)[c:77]2[cH:78][cH:79][cH:80][cH:81][cH:82]2)[P:83]([c:84]2[cH:85][cH:86][cH:87][cH:88][cH:89]2)([c:90]2[cH:91][cH:92][cH:93][cH:94][cH:95]2)[c:96]2[cH:97][cH:98][cH:99][cH:100][cH:101]2)([c:102]2[cH:103][cH:104][cH:105][cH:106][cH:107]2)[c:108]2[cH:109][cH:110][cH:111][cH:112][cH:113]2)[cH:114][cH:115]1.MASK>>[NH:1]([c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[c:8]1[n:9][c:10]([Cl:15])[n:11][cH:12][c:13]1[C:25]#[C:24][CH2:23][CH2:22][c:16]1[cH:17][cH:18][cH:19][cH:20][cH:21]1."}", "/scratch/micpie/export/ord_masked/test_2-0.jsonl": "{"text":"The masked component in the reaction SMILES with one element hidden as `MASK` CC#N.O=[N+]([O-])c1ccc([O:43][C:42](=O)[NH:41][c:33]2[s:32][c:36]3[c:35]([n:34]2)[cH:40][cH:39][cH:38][cH:37]3)cc1.[O:1]1[c:2]2[c:3]([cH:7][c:8](-[c:11]3[s:12][c:13]([N:21]([CH2:22][CH2:23][O:24][c:25]4[cH:26][cH:27][cH:28][cH:29][cH:30]4)[CH3:31])[c:14]([C:16](=[O:17])[O:18][CH2:19][CH3:20])[n:15]3)[cH:9][cH:10]2)[NH:4][CH2:5][CH2:6]1>>MASK is [O:1]1[c:2]2[c:3]([cH:7][c:8](-[c:11]3[s:12][c:13]([N:21]([CH2:22][CH2:23][O:24][c:25]4[cH:26][cH:27][cH:28][cH:29][cH:30]4)[CH3:31])[c:14]([C:16](=[O:17])[O:18][CH2:19][CH3:20])[n:15]3)[cH:9][cH:10]2)[N:4]([C:42]([NH:41][c:33]2[s:32][c:36]3[c:35]([n:34]2)[cH:40][cH:39][cH:38][cH:37]3)=[O:43])[CH2:5][CH2:6]1."} {"text":"The masked component in the masked reaction SMILES string (one component masked as `MASK`) 
C1COCCO1.CC[O:42][C:40]([c:4]1[c:3]([C:2]([F:1])([F:45])[F:46])[n:7](-[c:8]2[n:9][c:10](-[c:14]3[c:15]([O:20][CH2:21][c:22]4[cH:23][cH:24][c:25](\/[CH:28]=[CH:29]\/[c:30]5[cH:31][cH:32][c:33]([C:36]([F:37])([F:38])[F:39])[cH:34][cH:35]5)[cH:26][cH:27]4)[cH:16][cH:17][cH:18][cH:19]3)[cH:11][cH:12][cH:13]2)[n:6][cH:5]1)=[O:41].CC[O:54]C(C)=O.Cl.[H][H].[Li+].[OH-:49].MASK>>[F:1][C:2]([C:3](=[O:49])[OH:54])([F:45])[F:46].[F:1][C:2]([c:3]1[c:4]([C:40](=[O:41])[OH:42])[cH:5][n:6][n:7]1-[c:8]1[n:9][c:10](-[c:14]2[c:15]([O:20][CH2:21][c:22]3[cH:23][cH:24][c:25]([CH2:28][CH2:29][c:30]4[cH:31][cH:32][c:33]([C:36]([F:37])([F:38])[F:39])[cH:34][cH:35]4)[cH:26][cH:27]3)[cH:16][cH:17][cH:18][cH:19]2)[cH:11][cH:12][cH:13]1)([F:45])[F:46] is O=[Pt]=O."}", "/scratch/micpie/export/ord_masked/valid_2-2.jsonl": "{"text":"Question: What is the masked component in the masked RXNSMILES (one component masked as `MASK`) [c:1]1([S:7](=[O:8])(=[O:9])[c:10]2[cH:11][cH:12][c:13]([C:14](=[O:15])[OH:16])[cH:17][cH:18]2)[cH:2][cH:3][cH:4][cH:5][cH:6]1.MASK>>[c:1]1([S:7](=[O:8])(=[O:9])[c:10]2[cH:11][cH:12][c:13]([C:14](=[O:16])[N:19]3[CH:20]([CH2:24][N:25]4[CH2:26][CH2:27][CH2:28][CH2:29]4)[CH2:21][CH2:22][CH2:23]3)[cH:17][cH:18]2)[cH:2][cH:3][cH:4][cH:5][cH:6]1?\nAnswer: [NH:19]1[CH:20]([CH2:24][N:25]2[CH2:26][CH2:27][CH2:28][CH2:29]2)[CH2:21][CH2:22][CH2:23]1."} {"text":"Question: What is the masked component in the reaction SMILES with one element hidden as `MASK` O=[c:5]1[nH:4][c:3](=[O:10])[c:2]([Br:1])[c:7]([CH3:8])[nH:6]1.MASK>>[Br:1][c:2]1[c:3](=[O:10])[nH:4][c:5]([Cl:11])[n:6][c:7]1[CH3:8]?\nAnswer: Cc1nc([Cl:11])[nH]c(=O)c1C."}", "/scratch/micpie/export/ord_masked/valid_0-0.jsonl": "{"text":"The masked component in the reaction SMILES with one element hidden as `MASK` C[CH2:1][O:3][CH:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][CH2:15][CH3:16].Cc1ccc(S(=O)(=O)O)cc1.Cc1ccccc1.O=C([O-])O.[Na+].OCC(O)C[Br:17]>>MASK is 
[CH2:1]1[O:3][C@@H:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][C@@H:15]1[CH2:16][Br:17]."} {"text":"The masked component in the reaction SMILES with one element masked as `MASK` CC[O:3][C:4]([CH:5]([OH:6])[CH3:7])=[O:8].Cl[CH2:9][CH:11]1[CH2:12][O:13]1.MASK>>[OH:3][C:4]([CH:5]([O:6][CH2:12][CH:11]([CH2:9][OH:20])[OH:13])[CH3:7])=[O:8] is CC[O:20]CC.FB(F)F."}", "/scratch/micpie/export/ord_masked/train_1-3.jsonl": "{"text":"Task: Predict the masked component in a masked reaction SMILES (one component masked as `MASK`).\nDescription: CC(=O)O.CC([O-])=[O:16].[Na+].MASK>>[C:2]([CH3:3])([c:4]1[cH:5][c:6]2[cH:7][cH:8][cH:9][cH:10][c:11]2[cH:12][cH:13]1)=[O:16]\nAnswer: Br[CH:2]([CH3:3])[c:4]1[cH:5][c:6]2[cH:7][cH:8][cH:9][cH:10][c:11]2[cH:12][cH:13]1"} {"text":"Task: Predict the masked component in a masked reaction SMILES (one component masked as `MASK`).\nDescription: [CH3:20][CH:21]([CH3:22])[NH2:23].MASK>>[OH:1][CH:2]([CH2:3][O:4][c:5]1[c:6]([C:7](=[O:8])[CH2:9][CH2:10][C:11](=[O:12])[O:13][CH3:14])[cH:15][cH:16][cH:17][cH:18]1)[CH2:19][NH:23][CH:21]([CH3:20])[CH3:22]\nAnswer: [O:1]1[CH:2]([CH2:3][O:4][c:5]2[c:6]([C:7](=[O:8])[CH2:9][CH2:10][C:11](=[O:12])[O:13][CH3:14])[cH:15][cH:16][cH:17][cH:18]2)[CH2:19]1"}", "/scratch/micpie/export/ord_masked/test_0-2.jsonl": "{"text":"Question: What is the masked component in the masked RXNSMILES (one component masked as `MASK`) C1CCOC1.C[Si](C)(C)c1cc(CCC[O:14][C:1]([CH2:2][CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH2:8][CH2:9][CH2:10][CH2:11][CH3:12])=[O:13])co1.O=[O:60].MASK>>[C:1]([CH2:2][CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH2:8][CH2:9][CH2:10][CH2:11][CH3:12])(=[O:13])[O:14][CH2:47][CH2:48][CH2:49][C:44]1=[CH:45][CH:54]([OH:55])[O:56][C:43]1=[O:60]?\nAnswer: Oc1c(I)cc2c(c1I)Oc1c(cc(I)c(O)c1I)[C:43]21[c:44]2[c:45](c(Cl)[c:47](Cl)[c:48](Cl)[c:49]2Cl)[C:54](=[O:55])[O:56]1."} {"text":"Question: What is the masked component in the reaction SMILES with one element masked as `MASK` 
[CH2:34]1[O:35][CH2:36][CH2:37][CH2:38]1.[CH3:26][CH2:27][N:28]([CH2:29][CH3:30])[CH2:31][CH3:32].[I-:33].[NH:1]([c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[c:8]1[n:9][c:10]([Cl:15])[n:11][cH:12][c:13]1[I:14].[cH:39]1[cH:40][cH:41][c:42]([P:43]([Pd:44]([P:45]([c:46]2[cH:47][cH:48][cH:49][cH:50][cH:51]2)([c:52]2[cH:53][cH:54][cH:55][cH:56][cH:57]2)[c:58]2[cH:59][cH:60][cH:61][cH:62][cH:63]2)([P:64]([c:65]2[cH:66][cH:67][cH:68][cH:69][cH:70]2)([c:71]2[cH:72][cH:73][cH:74][cH:75][cH:76]2)[c:77]2[cH:78][cH:79][cH:80][cH:81][cH:82]2)[P:83]([c:84]2[cH:85][cH:86][cH:87][cH:88][cH:89]2)([c:90]2[cH:91][cH:92][cH:93][cH:94][cH:95]2)[c:96]2[cH:97][cH:98][cH:99][cH:100][cH:101]2)([c:102]2[cH:103][cH:104][cH:105][cH:106][cH:107]2)[c:108]2[cH:109][cH:110][cH:111][cH:112][cH:113]2)[cH:114][cH:115]1.MASK>>[NH:1]([c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[c:8]1[n:9][c:10]([Cl:15])[n:11][cH:12][c:13]1[C:25]#[C:24][CH2:23][CH2:22][c:16]1[cH:17][cH:18][cH:19][cH:20][cH:21]1?\nAnswer: [c:16]1([CH2:22][CH2:23][C:24]#[CH:25])[cH:17][cH:18][cH:19][cH:20][cH:21]1."}", "/scratch/micpie/export/ord_masked/train_1-0.jsonl": "{"text":"The masked component in the masked reaction SMILES string (one component masked as `MASK`) CC(=O)O.CC([O-])=[O:16].[Na+].MASK>>[C:2]([CH3:3])([c:4]1[cH:5][c:6]2[cH:7][cH:8][cH:9][cH:10][c:11]2[cH:12][cH:13]1)=[O:16] is Br[CH:2]([CH3:3])[c:4]1[cH:5][c:6]2[cH:7][cH:8][cH:9][cH:10][c:11]2[cH:12][cH:13]1."} {"text":"The masked component in the reaction SMILES with one element hidden as `MASK` [CH3:20][CH:21]([CH3:22])[NH2:23].MASK>>[OH:1][CH:2]([CH2:3][O:4][c:5]1[c:6]([C:7](=[O:8])[CH2:9][CH2:10][C:11](=[O:12])[O:13][CH3:14])[cH:15][cH:16][cH:17][cH:18]1)[CH2:19][NH:23][CH:21]([CH3:20])[CH3:22] is [O:1]1[CH:2]([CH2:3][O:4][c:5]2[c:6]([C:7](=[O:8])[CH2:9][CH2:10][C:11](=[O:12])[O:13][CH3:14])[cH:15][cH:16][cH:17][cH:18]2)[CH2:19]1."}", "/scratch/micpie/export/ord_masked/train_2-3.jsonl": "{"text":"Task: Predict the masked component in a reaction SMILES with one element masked 
as `MASK`.\nDescription: Cl[CH2:45][CH2:44][CH2:43][CH2:42][CH2:41][CH2:40][N:39]=[CH:47][CH2:48]Cl.N#C[S:21][c:18]1[s:17][c:16]([NH:15][C:13]([N:12]([CH:9]2[CH2:8][CH2:7][N:6]([C:1]([CH2:2][CH2:3][CH3:4])=[O:5])[CH2:11][CH2:10]2)[C@@H:24]2[CH2:25][CH2:26][C@@H:27]([CH3:30])[CH2:28][CH2:29]2)=[O:14])[n:20][cH:19]1.O[C@H](CS)[C@@H](O)CS>>MASK\nAnswer: [C:1]([CH2:2][CH2:3][CH3:4])(=[O:5])[N:6]1[CH2:7][CH2:8][CH:9]([N:12]([C:13](=[O:14])[NH:15][c:16]2[s:17][c:18]([S:21][CH2:48][CH2:47][N:39]3[CH2:40][CH2:41][CH2:42][CH2:43][CH2:44][CH2:45]3)[cH:19][n:20]2)[C@@H:24]2[CH2:25][CH2:26][C@@H:27]([CH3:30])[CH2:28][CH2:29]2)[CH2:10][CH2:11]1"} {"text":"Task: Predict the masked component in a reaction SMILES with one element masked as `MASK`.\nDescription: [Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([C:9]#[N:10])[CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[cH:6][cH:7]1.[C:25]([O-:26])([O-:27])=[O:28].[CH3:31][OH:32].[Na+:29].[Na+:30].[OH2:24].[S:19](=[O:20])(=[O:21])([OH:22])[OH:23]>>MASK\nAnswer: [Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[C:25]([OH:26])=[O:28])[cH:6][cH:7]1"}", "/scratch/micpie/export/ord_masked/valid_1-2.jsonl": "{"text":"Question: What is the masked component in the reaction SMILES with one element hidden as `MASK` C=CCBr.CN(C)C=O.C[C:24]([O-])([CH3:23])[CH3:25].[K+].O.[Cl:1][c:2]1[cH:3][c:4]2[c:5]([cH:21][cH:22]1)-[c:6]1[c:7]([cH:18][nH:19][cH:20]1)[CH2:8][N:9]=[C:10]2[c:11]1[c:12]([F:17])[cH:13][cH:14][cH:15][cH:16]1>>MASK?\nAnswer: [Cl:1][c:2]1[cH:3][c:4]2[c:5]([cH:21][cH:22]1)-[c:6]1[c:7]([cH:18][n:19]([CH2:25][CH:24]=[CH2:23])[cH:20]1)[CH2:8][N:9]=[C:10]2[c:11]1[c:12]([F:17])[cH:13][cH:14][cH:15][cH:16]1."} {"text":"Question: What is the masked component in the reaction SMILES with one element masked as `MASK` ClCCl.O.O=C(O)C(F)(F)F.O=C([O-])[O-].[K+].[K+].MASK>>[I:8][c:9]1[cH:10][cH:11][c:12]([C:15]2([NH2:18])[CH2:16][CH2:17]2)[cH:13][cH:14]1?\nAnswer: 
CC(C)(C)OC(=O)[NH:18][C:15]1([c:12]2[cH:11][cH:10][c:9]([I:8])[cH:14][cH:13]2)[CH2:16][CH2:17]1."}", "/scratch/micpie/export/ord_masked/test_0-0.jsonl": "{"text":"The masked component in the reaction SMILES with one element masked as `MASK` C1CCOC1.C[Si](C)(C)c1cc(CCC[O:14][C:1]([CH2:2][CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH2:8][CH2:9][CH2:10][CH2:11][CH3:12])=[O:13])co1.O=[O:60].MASK>>[C:1]([CH2:2][CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH2:8][CH2:9][CH2:10][CH2:11][CH3:12])(=[O:13])[O:14][CH2:47][CH2:48][CH2:49][C:44]1=[CH:45][CH:54]([OH:55])[O:56][C:43]1=[O:60] is Oc1c(I)cc2c(c1I)Oc1c(cc(I)c(O)c1I)[C:43]21[c:44]2[c:45](c(Cl)[c:47](Cl)[c:48](Cl)[c:49]2Cl)[C:54](=[O:55])[O:56]1."} {"text":"The masked component in the masked reaction SMILES (one component masked as `MASK`) [CH2:34]1[O:35][CH2:36][CH2:37][CH2:38]1.[CH3:26][CH2:27][N:28]([CH2:29][CH3:30])[CH2:31][CH3:32].[I-:33].[NH:1]([c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[c:8]1[n:9][c:10]([Cl:15])[n:11][cH:12][c:13]1[I:14].[cH:39]1[cH:40][cH:41][c:42]([P:43]([Pd:44]([P:45]([c:46]2[cH:47][cH:48][cH:49][cH:50][cH:51]2)([c:52]2[cH:53][cH:54][cH:55][cH:56][cH:57]2)[c:58]2[cH:59][cH:60][cH:61][cH:62][cH:63]2)([P:64]([c:65]2[cH:66][cH:67][cH:68][cH:69][cH:70]2)([c:71]2[cH:72][cH:73][cH:74][cH:75][cH:76]2)[c:77]2[cH:78][cH:79][cH:80][cH:81][cH:82]2)[P:83]([c:84]2[cH:85][cH:86][cH:87][cH:88][cH:89]2)([c:90]2[cH:91][cH:92][cH:93][cH:94][cH:95]2)[c:96]2[cH:97][cH:98][cH:99][cH:100][cH:101]2)([c:102]2[cH:103][cH:104][cH:105][cH:106][cH:107]2)[c:108]2[cH:109][cH:110][cH:111][cH:112][cH:113]2)[cH:114][cH:115]1.MASK>>[NH:1]([c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[c:8]1[n:9][c:10]([Cl:15])[n:11][cH:12][c:13]1[C:25]#[C:24][CH2:23][CH2:22][c:16]1[cH:17][cH:18][cH:19][cH:20][cH:21]1 is [c:16]1([CH2:22][CH2:23][C:24]#[CH:25])[cH:17][cH:18][cH:19][cH:20][cH:21]1."}", "/scratch/micpie/export/ord_masked/train_2-0.jsonl": "{"text":"The masked component in the reaction SMILES with one element masked as `MASK` 
Cl[CH2:45][CH2:44][CH2:43][CH2:42][CH2:41][CH2:40][N:39]=[CH:47][CH2:48]Cl.N#C[S:21][c:18]1[s:17][c:16]([NH:15][C:13]([N:12]([CH:9]2[CH2:8][CH2:7][N:6]([C:1]([CH2:2][CH2:3][CH3:4])=[O:5])[CH2:11][CH2:10]2)[C@@H:24]2[CH2:25][CH2:26][C@@H:27]([CH3:30])[CH2:28][CH2:29]2)=[O:14])[n:20][cH:19]1.O[C@H](CS)[C@@H](O)CS>>MASK is [C:1]([CH2:2][CH2:3][CH3:4])(=[O:5])[N:6]1[CH2:7][CH2:8][CH:9]([N:12]([C:13](=[O:14])[NH:15][c:16]2[s:17][c:18]([S:21][CH2:48][CH2:47][N:39]3[CH2:40][CH2:41][CH2:42][CH2:43][CH2:44][CH2:45]3)[cH:19][n:20]2)[C@@H:24]2[CH2:25][CH2:26][C@@H:27]([CH3:30])[CH2:28][CH2:29]2)[CH2:10][CH2:11]1."} {"text":"The masked component in the reaction SMILES with one element hidden as `MASK` [Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([C:9]#[N:10])[CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[cH:6][cH:7]1.[C:25]([O-:26])([O-:27])=[O:28].[CH3:31][OH:32].[Na+:29].[Na+:30].[OH2:24].[S:19](=[O:20])(=[O:21])([OH:22])[OH:23]>>MASK is [Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[C:25]([OH:26])=[O:28])[cH:6][cH:7]1."}", "/scratch/micpie/export/ord_masked/valid_2-0.jsonl": "{"text":"The masked component in the masked RXNSMILES (one component masked as `MASK`) [c:1]1([S:7](=[O:8])(=[O:9])[c:10]2[cH:11][cH:12][c:13]([C:14](=[O:15])[OH:16])[cH:17][cH:18]2)[cH:2][cH:3][cH:4][cH:5][cH:6]1.MASK>>[c:1]1([S:7](=[O:8])(=[O:9])[c:10]2[cH:11][cH:12][c:13]([C:14](=[O:16])[N:19]3[CH:20]([CH2:24][N:25]4[CH2:26][CH2:27][CH2:28][CH2:29]4)[CH2:21][CH2:22][CH2:23]3)[cH:17][cH:18]2)[cH:2][cH:3][cH:4][cH:5][cH:6]1 is [NH:19]1[CH:20]([CH2:24][N:25]2[CH2:26][CH2:27][CH2:28][CH2:29]2)[CH2:21][CH2:22][CH2:23]1."} {"text":"The masked component in the masked RXNSMILES (one component masked as `MASK`) O=[c:5]1[nH:4][c:3](=[O:10])[c:2]([Br:1])[c:7]([CH3:8])[nH:6]1.MASK>>[Br:1][c:2]1[c:3](=[O:10])[nH:4][c:5]([Cl:11])[n:6][c:7]1[CH3:8] is Cc1nc([Cl:11])[nH]c(=O)c1C."}", "/scratch/micpie/export/ord_masked/test_0-3.jsonl": 
"{"text":"Task: Predict the masked component in a masked RXNSMILES (one component masked as `MASK`).\nDescription: C1CCOC1.C[Si](C)(C)c1cc(CCC[O:14][C:1]([CH2:2][CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH2:8][CH2:9][CH2:10][CH2:11][CH3:12])=[O:13])co1.O=[O:60].MASK>>[C:1]([CH2:2][CH2:3][CH2:4][CH2:5][CH2:6][CH2:7][CH2:8][CH2:9][CH2:10][CH2:11][CH3:12])(=[O:13])[O:14][CH2:47][CH2:48][CH2:49][C:44]1=[CH:45][CH:54]([OH:55])[O:56][C:43]1=[O:60]\nSolution: Oc1c(I)cc2c(c1I)Oc1c(cc(I)c(O)c1I)[C:43]21[c:44]2[c:45](c(Cl)[c:47](Cl)[c:48](Cl)[c:49]2Cl)[C:54](=[O:55])[O:56]1"} {"text":"Task: Predict the masked component in a masked reaction SMILES string (one component masked as `MASK`).\nDescription: [CH2:34]1[O:35][CH2:36][CH2:37][CH2:38]1.[CH3:26][CH2:27][N:28]([CH2:29][CH3:30])[CH2:31][CH3:32].[I-:33].[NH:1]([c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[c:8]1[n:9][c:10]([Cl:15])[n:11][cH:12][c:13]1[I:14].[cH:39]1[cH:40][cH:41][c:42]([P:43]([Pd:44]([P:45]([c:46]2[cH:47][cH:48][cH:49][cH:50][cH:51]2)([c:52]2[cH:53][cH:54][cH:55][cH:56][cH:57]2)[c:58]2[cH:59][cH:60][cH:61][cH:62][cH:63]2)([P:64]([c:65]2[cH:66][cH:67][cH:68][cH:69][cH:70]2)([c:71]2[cH:72][cH:73][cH:74][cH:75][cH:76]2)[c:77]2[cH:78][cH:79][cH:80][cH:81][cH:82]2)[P:83]([c:84]2[cH:85][cH:86][cH:87][cH:88][cH:89]2)([c:90]2[cH:91][cH:92][cH:93][cH:94][cH:95]2)[c:96]2[cH:97][cH:98][cH:99][cH:100][cH:101]2)([c:102]2[cH:103][cH:104][cH:105][cH:106][cH:107]2)[c:108]2[cH:109][cH:110][cH:111][cH:112][cH:113]2)[cH:114][cH:115]1.MASK>>[NH:1]([c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[c:8]1[n:9][c:10]([Cl:15])[n:11][cH:12][c:13]1[C:25]#[C:24][CH2:23][CH2:22][c:16]1[cH:17][cH:18][cH:19][cH:20][cH:21]1\nAnswer: [c:16]1([CH2:22][CH2:23][C:24]#[CH:25])[cH:17][cH:18][cH:19][cH:20][cH:21]1"}", "/scratch/micpie/export/ord_masked/valid_2-1.jsonl": "{"text":"The compound with SMILES [NH:19]1[CH:20]([CH2:24][N:25]2[CH2:26][CH2:27][CH2:28][CH2:29]2)[CH2:21][CH2:22][CH2:23]1 is the masked component in the reaction SMILES with one element hidden as 
`MASK` [c:1]1([S:7](=[O:8])(=[O:9])[c:10]2[cH:11][cH:12][c:13]([C:14](=[O:15])[OH:16])[cH:17][cH:18]2)[cH:2][cH:3][cH:4][cH:5][cH:6]1.MASK>>[c:1]1([S:7](=[O:8])(=[O:9])[c:10]2[cH:11][cH:12][c:13]([C:14](=[O:16])[N:19]3[CH:20]([CH2:24][N:25]4[CH2:26][CH2:27][CH2:28][CH2:29]4)[CH2:21][CH2:22][CH2:23]3)[cH:17][cH:18]2)[cH:2][cH:3][cH:4][cH:5][cH:6]1."} {"text":"The chemical with SMILES Cc1nc([Cl:11])[nH]c(=O)c1C is the masked component in the masked RXNSMILES (one component masked as `MASK`) O=[c:5]1[nH:4][c:3](=[O:10])[c:2]([Br:1])[c:7]([CH3:8])[nH:6]1.MASK>>[Br:1][c:2]1[c:3](=[O:10])[nH:4][c:5]([Cl:11])[n:6][c:7]1[CH3:8]."}", "/scratch/micpie/export/ord_masked/test_2-1.jsonl": "{"text":"The compound with SMILES [O:1]1[c:2]2[c:3]([cH:7][c:8](-[c:11]3[s:12][c:13]([N:21]([CH2:22][CH2:23][O:24][c:25]4[cH:26][cH:27][cH:28][cH:29][cH:30]4)[CH3:31])[c:14]([C:16](=[O:17])[O:18][CH2:19][CH3:20])[n:15]3)[cH:9][cH:10]2)[N:4]([C:42]([NH:41][c:33]2[s:32][c:36]3[c:35]([n:34]2)[cH:40][cH:39][cH:38][cH:37]3)=[O:43])[CH2:5][CH2:6]1 is the masked component in the reaction SMILES with one element masked as `MASK` CC#N.O=[N+]([O-])c1ccc([O:43][C:42](=O)[NH:41][c:33]2[s:32][c:36]3[c:35]([n:34]2)[cH:40][cH:39][cH:38][cH:37]3)cc1.[O:1]1[c:2]2[c:3]([cH:7][c:8](-[c:11]3[s:12][c:13]([N:21]([CH2:22][CH2:23][O:24][c:25]4[cH:26][cH:27][cH:28][cH:29][cH:30]4)[CH3:31])[c:14]([C:16](=[O:17])[O:18][CH2:19][CH3:20])[n:15]3)[cH:9][cH:10]2)[NH:4][CH2:5][CH2:6]1>>MASK."} {"text":"The chemical with SMILES O=[Pt]=O is the masked component in the masked reaction SMILES (one component masked as `MASK`) 
C1COCCO1.CC[O:42][C:40]([c:4]1[c:3]([C:2]([F:1])([F:45])[F:46])[n:7](-[c:8]2[n:9][c:10](-[c:14]3[c:15]([O:20][CH2:21][c:22]4[cH:23][cH:24][c:25](\/[CH:28]=[CH:29]\/[c:30]5[cH:31][cH:32][c:33]([C:36]([F:37])([F:38])[F:39])[cH:34][cH:35]5)[cH:26][cH:27]4)[cH:16][cH:17][cH:18][cH:19]3)[cH:11][cH:12][cH:13]2)[n:6][cH:5]1)=[O:41].CC[O:54]C(C)=O.Cl.[H][H].[Li+].[OH-:49].MASK>>[F:1][C:2]([C:3](=[O:49])[OH:54])([F:45])[F:46].[F:1][C:2]([c:3]1[c:4]([C:40](=[O:41])[OH:42])[cH:5][n:6][n:7]1-[c:8]1[n:9][c:10](-[c:14]2[c:15]([O:20][CH2:21][c:22]3[cH:23][cH:24][c:25]([CH2:28][CH2:29][c:30]4[cH:31][cH:32][c:33]([C:36]([F:37])([F:38])[F:39])[cH:34][cH:35]4)[cH:26][cH:27]3)[cH:16][cH:17][cH:18][cH:19]2)[cH:11][cH:12][cH:13]1)([F:45])[F:46]."}", "/scratch/micpie/export/ord_masked/train_0-0.jsonl": "{"text":"The masked component in the masked RXNSMILES (one component masked as `MASK`) [CH2:25]([CH:26]([CH3:27])[OH:28])[OH:29].[CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]=[O:22])[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11].[CH3:41][c:42]1[cH:43][cH:44][cH:45][cH:46][cH:47]1.[CH3:48][CH2:49][O:50][CH2:51][CH3:52].[OH2:53].[c:30]1([CH3:31])[cH:32][cH:33][c:34]([S:35]([OH:36])(=[O:37])=[O:38])[cH:39][cH:40]1>>MASK is [CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]4[O:22][CH2:25][CH:26]([CH3:27])[O:28]4)[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11]."} {"text":"The masked component in the masked reaction SMILES string (one component masked as `MASK`) [NH2:25][c:26]1[n:27][cH:28][c:29]([Br:33])[cH:30][c:31]1[CH3:32].MASK>>[F:1][CH:2]([c:3]1[cH:4][c:5](-[c:14]2[cH:15][cH:16][c:17]([C:20]([F:21])([F:22])[F:23])[cH:18][cH:19]2)[n:6][c:7]2[n:8]1[n:9][cH:10][c:11]2[C:12]#[C:13][c:29]1[cH:28][n:27][c:26]([NH2:25])[c:31]([CH3:32])[cH:30]1)[F:24] is 
[F:1][CH:2]([c:3]1[cH:4][c:5](-[c:14]2[cH:15][cH:16][c:17]([C:20]([F:21])([F:22])[F:23])[cH:18][cH:19]2)[n:6][c:7]2[n:8]1[n:9][cH:10][c:11]2[C:12]#[CH:13])[F:24]."}", "/scratch/micpie/export/ord_masked/train_0-3.jsonl": "{"text":"Task: Predict the masked component in a reaction SMILES with one element hidden as `MASK`.\nDescription: [CH2:25]([CH:26]([CH3:27])[OH:28])[OH:29].[CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]=[O:22])[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11].[CH3:41][c:42]1[cH:43][cH:44][cH:45][cH:46][cH:47]1.[CH3:48][CH2:49][O:50][CH2:51][CH3:52].[OH2:53].[c:30]1([CH3:31])[cH:32][cH:33][c:34]([S:35]([OH:36])(=[O:37])=[O:38])[cH:39][cH:40]1>>MASK\nSolution: [CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]4[O:22][CH2:25][CH:26]([CH3:27])[O:28]4)[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11]"} {"text":"Task: Predict the masked component in a reaction SMILES with one element masked as `MASK`.\nDescription: [NH2:25][c:26]1[n:27][cH:28][c:29]([Br:33])[cH:30][c:31]1[CH3:32].MASK>>[F:1][CH:2]([c:3]1[cH:4][c:5](-[c:14]2[cH:15][cH:16][c:17]([C:20]([F:21])([F:22])[F:23])[cH:18][cH:19]2)[n:6][c:7]2[n:8]1[n:9][cH:10][c:11]2[C:12]#[C:13][c:29]1[cH:28][n:27][c:26]([NH2:25])[c:31]([CH3:32])[cH:30]1)[F:24]\nAnswer: [F:1][CH:2]([c:3]1[cH:4][c:5](-[c:14]2[cH:15][cH:16][c:17]([C:20]([F:21])([F:22])[F:23])[cH:18][cH:19]2)[n:6][c:7]2[n:8]1[n:9][cH:10][c:11]2[C:12]#[CH:13])[F:24]"}", "/scratch/micpie/export/ord_masked/test_1-1.jsonl": "{"text":"The chemical with SMILES [CH3:1][c:2]1[n:3][c:4](-[c:23]2[cH:24][n:25][n:26]([CH3:28])[cH:27]2)[n:5][c:6](-[c:16]2[cH:17][cH:18][c:19]([CH3:22])[cH:20][cH:21]2)[c:7]1[CH:8]([C:9](=[O:10])[OH:11])[CH2:13][CH2:14][CH3:15] is the masked component in the masked reaction SMILES string (one component masked as `MASK`) 
CO.C[O:11][C:9]([CH:8]([c:7]1[c:2]([CH3:1])[n:3][c:4](-[c:23]2[cH:24][n:25][n:26]([CH3:28])[cH:27]2)[n:5][c:6]1-[c:16]1[cH:17][cH:18][c:19]([CH3:22])[cH:20][cH:21]1)[CH2:13][CH2:14][CH3:15])=[O:10].[Na+].[OH-]>>MASK."} {"text":"The compound with SMILES O=Cc1ccc(C(=O)Cl)cc1 is the masked component in the masked RXNSMILES (one component masked as `MASK`) CCN(C(C)C)C(C)C.CN(C)C=O.ClCCl.O=S(Cl)Cl.O=[C:1]([OH:3])[c:4]1[cH:5][cH:6][c:7]([CH:8]=[O:9])[cH:10][cH:11]1.[C:16]([CH3:17])([CH3:18])([CH3:19])[O:20][C:21]([NH:22][c:23]1[c:24]([NH2:29])[cH:25][cH:26][cH:27][cH:28]1)=[O:30].MASK>>[C:1](=[O:3])([c:4]1[cH:5][cH:6][c:7]([CH:8]=[O:9])[cH:10][cH:11]1)[NH:29][c:24]1[c:23]([NH:22][C:21]([O:20][C:16]([CH3:17])([CH3:18])[CH3:19])=[O:30])[cH:28][cH:27][cH:26][cH:25]1."}", "/scratch/micpie/export/ord_masked/valid_1-3.jsonl": "{"text":"Task: Predict the masked component in a masked reaction SMILES string (one component masked as `MASK`).\nDescription: C=CCBr.CN(C)C=O.C[C:24]([O-])([CH3:23])[CH3:25].[K+].O.[Cl:1][c:2]1[cH:3][c:4]2[c:5]([cH:21][cH:22]1)-[c:6]1[c:7]([cH:18][nH:19][cH:20]1)[CH2:8][N:9]=[C:10]2[c:11]1[c:12]([F:17])[cH:13][cH:14][cH:15][cH:16]1>>MASK\nSolution: [Cl:1][c:2]1[cH:3][c:4]2[c:5]([cH:21][cH:22]1)-[c:6]1[c:7]([cH:18][n:19]([CH2:25][CH:24]=[CH2:23])[cH:20]1)[CH2:8][N:9]=[C:10]2[c:11]1[c:12]([F:17])[cH:13][cH:14][cH:15][cH:16]1"} {"text":"Task: Predict the masked component in a masked reaction SMILES (one component masked as `MASK`).\nDescription: ClCCl.O.O=C(O)C(F)(F)F.O=C([O-])[O-].[K+].[K+].MASK>>[I:8][c:9]1[cH:10][cH:11][c:12]([C:15]2([NH2:18])[CH2:16][CH2:17]2)[cH:13][cH:14]1\nAnswer: CC(C)(C)OC(=O)[NH:18][C:15]1([c:12]2[cH:11][cH:10][c:9]([I:8])[cH:14][cH:13]2)[CH2:16][CH2:17]1"}", "/scratch/micpie/export/ord_masked/valid_0-2.jsonl": "{"text":"Question: What is the masked component in the masked RXNSMILES (one component masked as `MASK`) 
C[CH2:1][O:3][CH:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][CH2:15][CH3:16].Cc1ccc(S(=O)(=O)O)cc1.Cc1ccccc1.O=C([O-])O.[Na+].OCC(O)C[Br:17]>>MASK?\nAnswer: [CH2:1]1[O:3][C@@H:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][C@@H:15]1[CH2:16][Br:17]."} {"text":"Question: What is the masked component in the reaction SMILES with one element masked as `MASK` CC[O:3][C:4]([CH:5]([OH:6])[CH3:7])=[O:8].Cl[CH2:9][CH:11]1[CH2:12][O:13]1.MASK>>[OH:3][C:4]([CH:5]([O:6][CH2:12][CH:11]([CH2:9][OH:20])[OH:13])[CH3:7])=[O:8]?\nAnswer: CC[O:20]CC.FB(F)F."}", "/scratch/micpie/export/ord_masked/valid_0-1.jsonl": "{"text":"The compound with SMILES [CH2:1]1[O:3][C@@H:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][C@@H:15]1[CH2:16][Br:17] is the masked component in the reaction SMILES with one element masked as `MASK` C[CH2:1][O:3][CH:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][CH2:15][CH3:16].Cc1ccc(S(=O)(=O)O)cc1.Cc1ccccc1.O=C([O-])O.[Na+].OCC(O)C[Br:17]>>MASK."} {"text":"The compound with SMILES CC[O:20]CC.FB(F)F is the masked component in the reaction SMILES with one element hidden as `MASK` CC[O:3][C:4]([CH:5]([OH:6])[CH3:7])=[O:8].Cl[CH2:9][CH:11]1[CH2:12][O:13]1.MASK>>[OH:3][C:4]([CH:5]([O:6][CH2:12][CH:11]([CH2:9][OH:20])[OH:13])[CH3:7])=[O:8]."}", "/scratch/micpie/export/ord_masked/train_2-1.jsonl": "{"text":"The chemical with SMILES [C:1]([CH2:2][CH2:3][CH3:4])(=[O:5])[N:6]1[CH2:7][CH2:8][CH:9]([N:12]([C:13](=[O:14])[NH:15][c:16]2[s:17][c:18]([S:21][CH2:48][CH2:47][N:39]3[CH2:40][CH2:41][CH2:42][CH2:43][CH2:44][CH2:45]3)[cH:19][n:20]2)[C@@H:24]2[CH2:25][CH2:26][C@@H:27]([CH3:30])[CH2:28][CH2:29]2)[CH2:10][CH2:11]1 is the masked component in the masked RXNSMILES (one component masked as `MASK`) 
Cl[CH2:45][CH2:44][CH2:43][CH2:42][CH2:41][CH2:40][N:39]=[CH:47][CH2:48]Cl.N#C[S:21][c:18]1[s:17][c:16]([NH:15][C:13]([N:12]([CH:9]2[CH2:8][CH2:7][N:6]([C:1]([CH2:2][CH2:3][CH3:4])=[O:5])[CH2:11][CH2:10]2)[C@@H:24]2[CH2:25][CH2:26][C@@H:27]([CH3:30])[CH2:28][CH2:29]2)=[O:14])[n:20][cH:19]1.O[C@H](CS)[C@@H](O)CS>>MASK."} {"text":"The compound with SMILES [Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[C:25]([OH:26])=[O:28])[cH:6][cH:7]1 is the masked component in the masked reaction SMILES string (one component masked as `MASK`) [Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([C:9]#[N:10])[CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[cH:6][cH:7]1.[C:25]([O-:26])([O-:27])=[O:28].[CH3:31][OH:32].[Na+:29].[Na+:30].[OH2:24].[S:19](=[O:20])(=[O:21])([OH:22])[OH:23]>>MASK."}", "/scratch/micpie/export/ord_masked/valid_1-1.jsonl": "{"text":"The chemical with SMILES [Cl:1][c:2]1[cH:3][c:4]2[c:5]([cH:21][cH:22]1)-[c:6]1[c:7]([cH:18][n:19]([CH2:25][CH:24]=[CH2:23])[cH:20]1)[CH2:8][N:9]=[C:10]2[c:11]1[c:12]([F:17])[cH:13][cH:14][cH:15][cH:16]1 is the masked component in the masked reaction SMILES string (one component masked as `MASK`) C=CCBr.CN(C)C=O.C[C:24]([O-])([CH3:23])[CH3:25].[K+].O.[Cl:1][c:2]1[cH:3][c:4]2[c:5]([cH:21][cH:22]1)-[c:6]1[c:7]([cH:18][nH:19][cH:20]1)[CH2:8][N:9]=[C:10]2[c:11]1[c:12]([F:17])[cH:13][cH:14][cH:15][cH:16]1>>MASK."} {"text":"The chemical with SMILES CC(C)(C)OC(=O)[NH:18][C:15]1([c:12]2[cH:11][cH:10][c:9]([I:8])[cH:14][cH:13]2)[CH2:16][CH2:17]1 is the masked component in the masked reaction SMILES string (one component masked as `MASK`) ClCCl.O.O=C(O)C(F)(F)F.O=C([O-])[O-].[K+].[K+].MASK>>[I:8][c:9]1[cH:10][cH:11][c:12]([C:15]2([NH2:18])[CH2:16][CH2:17]2)[cH:13][cH:14]1."}", "/scratch/micpie/export/ord_masked/test_1-3.jsonl": "{"text":"Task: Predict the masked component in a reaction SMILES with one element masked as `MASK`.\nDescription: 
CO.C[O:11][C:9]([CH:8]([c:7]1[c:2]([CH3:1])[n:3][c:4](-[c:23]2[cH:24][n:25][n:26]([CH3:28])[cH:27]2)[n:5][c:6]1-[c:16]1[cH:17][cH:18][c:19]([CH3:22])[cH:20][cH:21]1)[CH2:13][CH2:14][CH3:15])=[O:10].[Na+].[OH-]>>MASK\nAnswer: [CH3:1][c:2]1[n:3][c:4](-[c:23]2[cH:24][n:25][n:26]([CH3:28])[cH:27]2)[n:5][c:6](-[c:16]2[cH:17][cH:18][c:19]([CH3:22])[cH:20][cH:21]2)[c:7]1[CH:8]([C:9](=[O:10])[OH:11])[CH2:13][CH2:14][CH3:15]"} {"text":"Task: Predict the masked component in a masked reaction SMILES (one component masked as `MASK`).\nDescription: CCN(C(C)C)C(C)C.CN(C)C=O.ClCCl.O=S(Cl)Cl.O=[C:1]([OH:3])[c:4]1[cH:5][cH:6][c:7]([CH:8]=[O:9])[cH:10][cH:11]1.[C:16]([CH3:17])([CH3:18])([CH3:19])[O:20][C:21]([NH:22][c:23]1[c:24]([NH2:29])[cH:25][cH:26][cH:27][cH:28]1)=[O:30].MASK>>[C:1](=[O:3])([c:4]1[cH:5][cH:6][c:7]([CH:8]=[O:9])[cH:10][cH:11]1)[NH:29][c:24]1[c:23]([NH:22][C:21]([O:20][C:16]([CH3:17])([CH3:18])[CH3:19])=[O:30])[cH:28][cH:27][cH:26][cH:25]1\nAnswer: O=Cc1ccc(C(=O)Cl)cc1"}", "/scratch/micpie/export/ord_masked/test_1-0.jsonl": "{"text":"The masked component in the reaction SMILES with one element hidden as `MASK` CO.C[O:11][C:9]([CH:8]([c:7]1[c:2]([CH3:1])[n:3][c:4](-[c:23]2[cH:24][n:25][n:26]([CH3:28])[cH:27]2)[n:5][c:6]1-[c:16]1[cH:17][cH:18][c:19]([CH3:22])[cH:20][cH:21]1)[CH2:13][CH2:14][CH3:15])=[O:10].[Na+].[OH-]>>MASK is [CH3:1][c:2]1[n:3][c:4](-[c:23]2[cH:24][n:25][n:26]([CH3:28])[cH:27]2)[n:5][c:6](-[c:16]2[cH:17][cH:18][c:19]([CH3:22])[cH:20][cH:21]2)[c:7]1[CH:8]([C:9](=[O:10])[OH:11])[CH2:13][CH2:14][CH3:15]."} {"text":"The masked component in the masked RXNSMILES (one component masked as `MASK`) 
CCN(C(C)C)C(C)C.CN(C)C=O.ClCCl.O=S(Cl)Cl.O=[C:1]([OH:3])[c:4]1[cH:5][cH:6][c:7]([CH:8]=[O:9])[cH:10][cH:11]1.[C:16]([CH3:17])([CH3:18])([CH3:19])[O:20][C:21]([NH:22][c:23]1[c:24]([NH2:29])[cH:25][cH:26][cH:27][cH:28]1)=[O:30].MASK>>[C:1](=[O:3])([c:4]1[cH:5][cH:6][c:7]([CH:8]=[O:9])[cH:10][cH:11]1)[NH:29][c:24]1[c:23]([NH:22][C:21]([O:20][C:16]([CH3:17])([CH3:18])[CH3:19])=[O:30])[cH:28][cH:27][cH:26][cH:25]1 is O=Cc1ccc(C(=O)Cl)cc1."}", "/scratch/micpie/export/ord_masked/train_0-2.jsonl": "{"text":"Question: What is the masked component in the reaction SMILES with one element hidden as `MASK` [CH2:25]([CH:26]([CH3:27])[OH:28])[OH:29].[CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]=[O:22])[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11].[CH3:41][c:42]1[cH:43][cH:44][cH:45][cH:46][cH:47]1.[CH3:48][CH2:49][O:50][CH2:51][CH3:52].[OH2:53].[c:30]1([CH3:31])[cH:32][cH:33][c:34]([S:35]([OH:36])(=[O:37])=[O:38])[cH:39][cH:40]1>>MASK?\nAnswer: [CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]4[O:22][CH2:25][CH:26]([CH3:27])[O:28]4)[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11]."} {"text":"Question: What is the masked component in the reaction SMILES with one element masked as `MASK` [NH2:25][c:26]1[n:27][cH:28][c:29]([Br:33])[cH:30][c:31]1[CH3:32].MASK>>[F:1][CH:2]([c:3]1[cH:4][c:5](-[c:14]2[cH:15][cH:16][c:17]([C:20]([F:21])([F:22])[F:23])[cH:18][cH:19]2)[n:6][c:7]2[n:8]1[n:9][cH:10][c:11]2[C:12]#[C:13][c:29]1[cH:28][n:27][c:26]([NH2:25])[c:31]([CH3:32])[cH:30]1)[F:24]?\nAnswer: [F:1][CH:2]([c:3]1[cH:4][c:5](-[c:14]2[cH:15][cH:16][c:17]([C:20]([F:21])([F:22])[F:23])[cH:18][cH:19]2)[n:6][c:7]2[n:8]1[n:9][cH:10][c:11]2[C:12]#[CH:13])[F:24]."}", "/scratch/micpie/export/ord_masked/test_2-2.jsonl": "{"text":"Question: What is the masked component in the masked reaction SMILES string (one component masked as `MASK`) 
CC#N.O=[N+]([O-])c1ccc([O:43][C:42](=O)[NH:41][c:33]2[s:32][c:36]3[c:35]([n:34]2)[cH:40][cH:39][cH:38][cH:37]3)cc1.[O:1]1[c:2]2[c:3]([cH:7][c:8](-[c:11]3[s:12][c:13]([N:21]([CH2:22][CH2:23][O:24][c:25]4[cH:26][cH:27][cH:28][cH:29][cH:30]4)[CH3:31])[c:14]([C:16](=[O:17])[O:18][CH2:19][CH3:20])[n:15]3)[cH:9][cH:10]2)[NH:4][CH2:5][CH2:6]1>>MASK?\nAnswer: [O:1]1[c:2]2[c:3]([cH:7][c:8](-[c:11]3[s:12][c:13]([N:21]([CH2:22][CH2:23][O:24][c:25]4[cH:26][cH:27][cH:28][cH:29][cH:30]4)[CH3:31])[c:14]([C:16](=[O:17])[O:18][CH2:19][CH3:20])[n:15]3)[cH:9][cH:10]2)[N:4]([C:42]([NH:41][c:33]2[s:32][c:36]3[c:35]([n:34]2)[cH:40][cH:39][cH:38][cH:37]3)=[O:43])[CH2:5][CH2:6]1."} {"text":"Question: What is the masked component in the masked RXNSMILES (one component masked as `MASK`) C1COCCO1.CC[O:42][C:40]([c:4]1[c:3]([C:2]([F:1])([F:45])[F:46])[n:7](-[c:8]2[n:9][c:10](-[c:14]3[c:15]([O:20][CH2:21][c:22]4[cH:23][cH:24][c:25](\/[CH:28]=[CH:29]\/[c:30]5[cH:31][cH:32][c:33]([C:36]([F:37])([F:38])[F:39])[cH:34][cH:35]5)[cH:26][cH:27]4)[cH:16][cH:17][cH:18][cH:19]3)[cH:11][cH:12][cH:13]2)[n:6][cH:5]1)=[O:41].CC[O:54]C(C)=O.Cl.[H][H].[Li+].[OH-:49].MASK>>[F:1][C:2]([C:3](=[O:49])[OH:54])([F:45])[F:46].[F:1][C:2]([c:3]1[c:4]([C:40](=[O:41])[OH:42])[cH:5][n:6][n:7]1-[c:8]1[n:9][c:10](-[c:14]2[c:15]([O:20][CH2:21][c:22]3[cH:23][cH:24][c:25]([CH2:28][CH2:29][c:30]4[cH:31][cH:32][c:33]([C:36]([F:37])([F:38])[F:39])[cH:34][cH:35]4)[cH:26][cH:27]3)[cH:16][cH:17][cH:18][cH:19]2)[cH:11][cH:12][cH:13]1)([F:45])[F:46]?\nAnswer: O=[Pt]=O."}", "/scratch/micpie/export/ord_masked/train_1-1.jsonl": "{"text":"The chemical with SMILES Br[CH:2]([CH3:3])[c:4]1[cH:5][c:6]2[cH:7][cH:8][cH:9][cH:10][c:11]2[cH:12][cH:13]1 is the masked component in the masked reaction SMILES (one component masked as `MASK`) CC(=O)O.CC([O-])=[O:16].[Na+].MASK>>[C:2]([CH3:3])([c:4]1[cH:5][c:6]2[cH:7][cH:8][cH:9][cH:10][c:11]2[cH:12][cH:13]1)=[O:16]."} {"text":"The chemical with SMILES 
[O:1]1[CH:2]([CH2:3][O:4][c:5]2[c:6]([C:7](=[O:8])[CH2:9][CH2:10][C:11](=[O:12])[O:13][CH3:14])[cH:15][cH:16][cH:17][cH:18]2)[CH2:19]1 is the masked component in the masked reaction SMILES string (one component masked as `MASK`) [CH3:20][CH:21]([CH3:22])[NH2:23].MASK>>[OH:1][CH:2]([CH2:3][O:4][c:5]1[c:6]([C:7](=[O:8])[CH2:9][CH2:10][C:11](=[O:12])[O:13][CH3:14])[cH:15][cH:16][cH:17][cH:18]1)[CH2:19][NH:23][CH:21]([CH3:20])[CH3:22]."}", "/scratch/micpie/export/ord_masked/train_0-1.jsonl": "{"text":"The compound with SMILES [CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]4[O:22][CH2:25][CH:26]([CH3:27])[O:28]4)[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11] is the masked component in the reaction SMILES with one element hidden as `MASK` [CH2:25]([CH:26]([CH3:27])[OH:28])[OH:29].[CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]=[O:22])[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11].[CH3:41][c:42]1[cH:43][cH:44][cH:45][cH:46][cH:47]1.[CH3:48][CH2:49][O:50][CH2:51][CH3:52].[OH2:53].[c:30]1([CH3:31])[cH:32][cH:33][c:34]([S:35]([OH:36])(=[O:37])=[O:38])[cH:39][cH:40]1>>MASK."} {"text":"The chemical with SMILES [F:1][CH:2]([c:3]1[cH:4][c:5](-[c:14]2[cH:15][cH:16][c:17]([C:20]([F:21])([F:22])[F:23])[cH:18][cH:19]2)[n:6][c:7]2[n:8]1[n:9][cH:10][c:11]2[C:12]#[CH:13])[F:24] is the masked component in the reaction SMILES with one element masked as `MASK` [NH2:25][c:26]1[n:27][cH:28][c:29]([Br:33])[cH:30][c:31]1[CH3:32].MASK>>[F:1][CH:2]([c:3]1[cH:4][c:5](-[c:14]2[cH:15][cH:16][c:17]([C:20]([F:21])([F:22])[F:23])[cH:18][cH:19]2)[n:6][c:7]2[n:8]1[n:9][cH:10][c:11]2[C:12]#[C:13][c:29]1[cH:28][n:27][c:26]([NH2:25])[c:31]([CH3:32])[cH:30]1)[F:24]."}", "/scratch/micpie/export/ord_masked/valid_1-0.jsonl": "{"text":"The masked component in the reaction SMILES with one element hidden as `MASK` 
C=CCBr.CN(C)C=O.C[C:24]([O-])([CH3:23])[CH3:25].[K+].O.[Cl:1][c:2]1[cH:3][c:4]2[c:5]([cH:21][cH:22]1)-[c:6]1[c:7]([cH:18][nH:19][cH:20]1)[CH2:8][N:9]=[C:10]2[c:11]1[c:12]([F:17])[cH:13][cH:14][cH:15][cH:16]1>>MASK is [Cl:1][c:2]1[cH:3][c:4]2[c:5]([cH:21][cH:22]1)-[c:6]1[c:7]([cH:18][n:19]([CH2:25][CH:24]=[CH2:23])[cH:20]1)[CH2:8][N:9]=[C:10]2[c:11]1[c:12]([F:17])[cH:13][cH:14][cH:15][cH:16]1."} {"text":"The masked component in the masked reaction SMILES (one component masked as `MASK`) ClCCl.O.O=C(O)C(F)(F)F.O=C([O-])[O-].[K+].[K+].MASK>>[I:8][c:9]1[cH:10][cH:11][c:12]([C:15]2([NH2:18])[CH2:16][CH2:17]2)[cH:13][cH:14]1 is CC(C)(C)OC(=O)[NH:18][C:15]1([c:12]2[cH:11][cH:10][c:9]([I:8])[cH:14][cH:13]2)[CH2:16][CH2:17]1."}", "/scratch/micpie/export/ord_masked/valid_2-3.jsonl": "{"text":"Task: Predict the masked component in a reaction SMILES with one element hidden as `MASK`.\nDescription: [c:1]1([S:7](=[O:8])(=[O:9])[c:10]2[cH:11][cH:12][c:13]([C:14](=[O:15])[OH:16])[cH:17][cH:18]2)[cH:2][cH:3][cH:4][cH:5][cH:6]1.MASK>>[c:1]1([S:7](=[O:8])(=[O:9])[c:10]2[cH:11][cH:12][c:13]([C:14](=[O:16])[N:19]3[CH:20]([CH2:24][N:25]4[CH2:26][CH2:27][CH2:28][CH2:29]4)[CH2:21][CH2:22][CH2:23]3)[cH:17][cH:18]2)[cH:2][cH:3][cH:4][cH:5][cH:6]1\nAnswer: [NH:19]1[CH:20]([CH2:24][N:25]2[CH2:26][CH2:27][CH2:28][CH2:29]2)[CH2:21][CH2:22][CH2:23]1"} {"text":"Task: Predict the masked component in a masked reaction SMILES string (one component masked as `MASK`).\nDescription: O=[c:5]1[nH:4][c:3](=[O:10])[c:2]([Br:1])[c:7]([CH3:8])[nH:6]1.MASK>>[Br:1][c:2]1[c:3](=[O:10])[nH:4][c:5]([Cl:11])[n:6][c:7]1[CH3:8]\nSolution: Cc1nc([Cl:11])[nH]c(=O)c1C"}", "/scratch/micpie/export/ord_masked/valid_0-3.jsonl": "{"text":"Task: Predict the masked component in a reaction SMILES with one element hidden as `MASK`.\nDescription: 
C[CH2:1][O:3][CH:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][CH2:15][CH3:16].Cc1ccc(S(=O)(=O)O)cc1.Cc1ccccc1.O=C([O-])O.[Na+].OCC(O)C[Br:17]>>MASK\nSolution: [CH2:1]1[O:3][C@@H:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][C@@H:15]1[CH2:16][Br:17]"} {"text":"Task: Predict the masked component in a reaction SMILES with one element masked as `MASK`.\nDescription: CC[O:3][C:4]([CH:5]([OH:6])[CH3:7])=[O:8].Cl[CH2:9][CH:11]1[CH2:12][O:13]1.MASK>>[OH:3][C:4]([CH:5]([O:6][CH2:12][CH:11]([CH2:9][OH:20])[OH:13])[CH3:7])=[O:8]\nAnswer: CC[O:20]CC.FB(F)F"}", "/scratch/micpie/export/ord_masked/train_1-2.jsonl": "{"text":"Question: What is the masked component in the masked reaction SMILES (one component masked as `MASK`) CC(=O)O.CC([O-])=[O:16].[Na+].MASK>>[C:2]([CH3:3])([c:4]1[cH:5][c:6]2[cH:7][cH:8][cH:9][cH:10][c:11]2[cH:12][cH:13]1)=[O:16]?\nAnswer: Br[CH:2]([CH3:3])[c:4]1[cH:5][c:6]2[cH:7][cH:8][cH:9][cH:10][c:11]2[cH:12][cH:13]1."} {"text":"Question: What is the masked component in the masked RXNSMILES (one component masked as `MASK`) [CH3:20][CH:21]([CH3:22])[NH2:23].MASK>>[OH:1][CH:2]([CH2:3][O:4][c:5]1[c:6]([C:7](=[O:8])[CH2:9][CH2:10][C:11](=[O:12])[O:13][CH3:14])[cH:15][cH:16][cH:17][cH:18]1)[CH2:19][NH:23][CH:21]([CH3:20])[CH3:22]?\nAnswer: [O:1]1[CH:2]([CH2:3][O:4][c:5]2[c:6]([C:7](=[O:8])[CH2:9][CH2:10][C:11](=[O:12])[O:13][CH3:14])[cH:15][cH:16][cH:17][cH:18]2)[CH2:19]1."}", "/scratch/micpie/export/ord_masked/test_2-3.jsonl": "{"text":"Task: Predict the masked component in a reaction SMILES with one element masked as `MASK`.\nDescription: CC#N.O=[N+]([O-])c1ccc([O:43][C:42](=O)[NH:41][c:33]2[s:32][c:36]3[c:35]([n:34]2)[cH:40][cH:39][cH:38][cH:37]3)cc1.[O:1]1[c:2]2[c:3]([cH:7][c:8](-[c:11]3[s:12][c:13]([N:21]([CH2:22][CH2:23][O:24][c:25]4[cH:26][cH:27][cH:28][cH:29][cH:30]4)[CH3:31])[c:14]([C:16](=[O:17])[O:18][CH2:19][CH3:20])[n:15]3)[cH:9][cH:10]2)[NH:4][CH2:5][CH2:6]1>>MASK\nAnswer: 
[O:1]1[c:2]2[c:3]([cH:7][c:8](-[c:11]3[s:12][c:13]([N:21]([CH2:22][CH2:23][O:24][c:25]4[cH:26][cH:27][cH:28][cH:29][cH:30]4)[CH3:31])[c:14]([C:16](=[O:17])[O:18][CH2:19][CH3:20])[n:15]3)[cH:9][cH:10]2)[N:4]([C:42]([NH:41][c:33]2[s:32][c:36]3[c:35]([n:34]2)[cH:40][cH:39][cH:38][cH:37]3)=[O:43])[CH2:5][CH2:6]1"} {"text":"Task: Predict the masked component in a reaction SMILES with one element masked as `MASK`.\nDescription: C1COCCO1.CC[O:42][C:40]([c:4]1[c:3]([C:2]([F:1])([F:45])[F:46])[n:7](-[c:8]2[n:9][c:10](-[c:14]3[c:15]([O:20][CH2:21][c:22]4[cH:23][cH:24][c:25](\/[CH:28]=[CH:29]\/[c:30]5[cH:31][cH:32][c:33]([C:36]([F:37])([F:38])[F:39])[cH:34][cH:35]5)[cH:26][cH:27]4)[cH:16][cH:17][cH:18][cH:19]3)[cH:11][cH:12][cH:13]2)[n:6][cH:5]1)=[O:41].CC[O:54]C(C)=O.Cl.[H][H].[Li+].[OH-:49].MASK>>[F:1][C:2]([C:3](=[O:49])[OH:54])([F:45])[F:46].[F:1][C:2]([c:3]1[c:4]([C:40](=[O:41])[OH:42])[cH:5][n:6][n:7]1-[c:8]1[n:9][c:10](-[c:14]2[c:15]([O:20][CH2:21][c:22]3[cH:23][cH:24][c:25]([CH2:28][CH2:29][c:30]4[cH:31][cH:32][c:33]([C:36]([F:37])([F:38])[F:39])[cH:34][cH:35]4)[cH:26][cH:27]3)[cH:16][cH:17][cH:18][cH:19]2)[cH:11][cH:12][cH:13]1)([F:45])[F:46]\nSolution: O=[Pt]=O"}", "/scratch/micpie/export/bio_ner_25/train_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: These included the first enzyme in gluconeogenesis (pckA, phosphoenolpyruvate carboxykinase, BAB1 _ 2091), four genes involved in TCA cycle and pyruvate metabolism (fumB, fumarate hydratase, BAB1 _ 0977; lpdA, dihydrolipoamide dehydrogenase, BAB2 _ 0712; pyruvate dehydrogenase, BAB2 _ 0032; acetyl-CoA acetyltransferase, BAB2 _ 0443), three genes involved in amino or fatty acid metabolism (aldehyde dehydrogenases, BAB2 _ 1130, BAB2 _ 1114; hydroxymethylglutaryl-CoA lyase, BAB1 _ 0017), and two genes involve in benzoate degradation (pcaC, carboxymuconolactone decarboxylase, BAB2 _ 0597; pcaI, coenzyme A transferase, BAB2 _ 0604)..\nConstrain: Please, list the entities in the form 
NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: pckA,53,57,Protein\nphosphoenolpyruvate,59,78,Chemical\nBAB1 _ 2091,94,105,Protein\npyruvate,145,153,Chemical\nfumB,167,171,Protein\nfumarate,173,181,Chemical\nBAB1 _ 0977,193,204,Protein\nlpdA,206,210,Protein\ndihydrolipoamide,212,228,Chemical\nBAB2 _ 0712,244,255,Protein\npyruvate,257,265,Chemical\nBAB2 _ 0032,281,292,Protein\nacetyl - CoA,294,306,Chemical\nBAB2 _ 0443,326,337,Protein\naldehyde,397,405,Chemical\nBAB2 _ 1130,422,433,Protein\nBAB2 _ 1114,435,446,Protein\nhydroxymethylglutaryl - CoA,448,475,Chemical\nBAB1 _ 0017,483,494,Protein\nbenzoate,522,530,Chemical\npcaC,545,549,Protein\ncarboxymuconolactone,551,571,Chemical\nBAB2 _ 0597,587,598,Protein\npcaI,600,604,Protein\nBAB2 _ 0604,630,641,Protein"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Several genes involved in inflammation and the immune system are located in the regions of the markers identified: TNF superfamily members Tnfsf4, 6, and 8 (chr. 1, 84.9-85 cM) which are involved in T cell activation [ 37, 38]; three selectin genes (Sele, Sell, Selp, chr. 1, 86.6 cM) which are involved in immune cell infiltration into inflamed tissues [ 39]; several members of immune cell surface proteins of the Slam family (slamf1, 2, 5, 6, and 9; chr. 1, 89.5-93.3 cM) [ 40]; the chemokine gene Xcl1 (chr. 1, 87 cM) which is expressed by mast cells and recruits lymphocytes [ 41]; several immunoglobulin Fc receptor genes (Fcrl3, Fcgr2b, and Fcgr3 at chr. 1, 92.3 cM; Fcer1g at chr. 1, 93.3 cM; Fcer1a at chr. 1, 94.2 cM); the flagellin receptor Tlr5 (chr. 1, 98 cM); Mmp3 (chr. 9, 1 cM) which recruits CD4+ lymphocytes [ 42]; Mmp7 (chr. 9, 1 cM) which activates Paneth cell-derived cryptdins (alpha-defensins) [ 43]; Icam1 (chr. 9, 7 cM) which is involved in lymphocyte infiltration into inflamed tissues [ 44]; Kitl (chr. 
10, 57 cM) which is also known as stem cell factor, and is crucial for mast cell differentiation [ 45]; Im5 (chr. 10, 65 cM) which is involved in antibody-responsiveness [ 46]; Lyzs (chr. 10, 66 cM) which is a Paneth cell product that digests cell walls of bacteria [ 47]; Ifng (chr. 10, 67 cM) which is an important inflammatory signal in CF as well as other conditions [ 48]; Il22 (chr. 10, 67 cM), a member of the anti-inflammatory IL-10 interleukin family [ 49]; and the Stat2 and 6 genes (chr. 10, 70 cM) which are important components of intracellular signaling pathways [ 50]..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: TNF superfamily members,115,138,Gene_or_geneproduct\nTnfsf4,139,145,Gene_or_geneproduct\nSele,255,259,Gene_or_geneproduct\nSell,261,265,Gene_or_geneproduct\nSelp,267,271,Gene_or_geneproduct\nslamf1,436,442,Gene_or_geneproduct\nXcl1,512,516,Gene_or_geneproduct\nFcrl3,642,647,Gene_or_geneproduct\nFcgr2b,649,655,Gene_or_geneproduct\nFcgr3,661,666,Gene_or_geneproduct\nFcer1g,688,694,Gene_or_geneproduct\nFcer1a,716,722,Gene_or_geneproduct\nTlr5,768,772,Gene_or_geneproduct\nMmp3,791,795,Gene_or_geneproduct\nCD4,827,830,Gene_or_geneproduct\nMmp7,852,856,Gene_or_geneproduct\nIcam1,949,954,Gene_or_geneproduct\nKitl,1045,1049,Gene_or_geneproduct\nstem cell factor,1091,1107,Gene_or_geneproduct\nIm5,1161,1164,Gene_or_geneproduct\nLyzs,1237,1241,Gene_or_geneproduct\nIfng,1334,1338,Gene_or_geneproduct\nIl22,1440,1444,Gene_or_geneproduct\nIL - 10 interleukin,1500,1519,Gene_or_geneproduct\nStat2,1542,1547,Gene_or_geneproduct"}", "/scratch/micpie/export/compound_protein_compound_3/valid_5-2.jsonl": "{"text":"User: Can you come up with an example for a compound DeepSMILES that targets the protein Monoamine oxidase type B?\nAssistant: Sure, the compound DeepSMILES C\/N=C\\NC=O)\/C=C\\cc[nH]ccccBr)cc96))))))))))N5C targets the protein Monoamine 
oxidase type B.\nUser: Can you create another compound DeepSMILES that targets the protein Monoamine oxidase type B?\nAssistant: Yes, the compound DeepSMILES C\/N=C\\NC=O)\/C=C\\cc[nH]ccccBr)cc96))))))))))N5C targets the compound DeepSMILES Clc1ccc(\/C=N\/Nc2nc3ccccc3nc2Cc2ccccc2)cc1."} {"text":"User: Can you come up with one example for a compound InChI that targets the protein DA transporter?\nAssistant: Sure, the compound InChI InChI=1S\/C21H30FNO2S2\/c1-23-17-7-8-20(23)19(14-25-21(24)9-10-27-12-11-26-2)18(13-17)15-3-5-16(22)6-4-15\/h3-6,17-20H,7-14H2,1-2H3\/t17?,18-,19+,20?\/m1\/s1 targets the protein DA transporter.\nUser: Can you generate another compound InChI that targets the protein DA transporter?\nAssistant: Yes, the compound InChI InChI=1S\/C21H30FNO2S2\/c1-23-17-7-8-20(23)19(14-25-21(24)9-10-27-12-11-26-2)18(13-17)15-3-5-16(22)6-4-15\/h3-6,17-20H,7-14H2,1-2H3\/t17?,18-,19+,20?\/m1\/s1 targets the compound InChI COC[C@@]12CNCC[C@]1(c1ccc3ccccc3c1)C2."}", "/scratch/micpie/export/compound_protein_compound_3/test_8-1.jsonl": "{"text":"The protein Carbonic anhydrase I is targeted by the compound with the canonical SMILES Cc1cc(C)[n+](-c2ccc(S(N)(=O)=O)cc2)c(C)c1C.[O-][Cl+3]([O-])([O-])[O-] and NS(=O)(=O)c1ccc(C(=O)NCc2cn([C@@H]3O[C@H](CO)[C@@H](O[C@H]4O[C@H](CO)[C@@H](O)[C@H](O)[C@H]4O)[C@H](O)[C@H]3O)nn2)cc1."} {"text":"The protein Carbonic anhydrase II is targeted by the compound with the InChI InChI=1S\/C13H20N4O4S\/c14-22(20,21)12-3-1-11(2-4-12)15-13(19)17-7-5-16(6-8-17)9-10-18\/h1-4,18H,5-10H2,(H,15,19)(H2,14,20,21) and NS(=O)(=O)c1ccc(CCOC(=O)CN(CCN(CC(=O)O)CC(=O)O)CC(=O)O)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/train_2-2.jsonl": "{"text":"User: Can you come up with one example for a compound DeepSMILES that targets the protein p110alpha?\nAssistant: Yes, of course, the compound DeepSMILES O=CccccC=O)CF)F)F)))cc6))))))NCCOcccc-ccccccccc6c%10))))))))))cc6C%11 targets the protein p110alpha.\nUser: Can you create another compound 
DeepSMILES that targets the protein p110alpha?\nAssistant: Yes, of course, the compound DeepSMILES O=CccccC=O)CF)F)F)))cc6))))))NCCOcccc-ccccccccc6c%10))))))))))cc6C%11 targets the compound DeepSMILES Nc1cn2nc(-c3cnc(Cl)c(NS(=O)(=O)c4ccc(F)cc4)c3)ccc2n1."} {"text":"User: Can you come up with an example for a compound SELFIES that targets the protein Adenosine receptor A2a?\nAssistant: Yes, the compound SELFIES [O][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][N][Branch1][=N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][=C][Branch1][C][Cl][C][=C][N][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][=N][N][=Ring2][Ring1][C] targets the protein Adenosine receptor A2a.\nUser: Can you create another compound SELFIES that targets the protein Adenosine receptor A2a?\nAssistant: Of course, the compound SELFIES [O][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][N][Branch1][=N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][=C][Branch1][C][Cl][C][=C][N][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][=N][N][=Ring2][Ring1][C] targets the compound SELFIES Cn1c(=O)n(CC2CC2)c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc21."}", "/scratch/micpie/export/compound_protein_compound_3/test_4-0.jsonl": "{"text":"The compound SELFIES [O][=C][Branch1][C][O][C][C][S][C][Branch2][Ring1][#Branch2][C][Branch1][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][=N][C][=Ring2][Ring1][Ring2][C][=C][C][=N][NH1][Ring1][Branch1] targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells and which is also targeted by the compound COc1ccc(CC(=O)O)cc1C1=NCC(=O)N(Cc2ccc(Cl)cc2Cl)c2ccccc21."} {"text":"The compound InChI InChI=1S\/C12H19NS\/c1-3-8-14-12-6-4-11(5-7-12)9-10(2)13\/h4-7,10H,3,8-9,13H2,1-2H3\/t10-\/m0\/s1 targets the protein MAO-B and which is also targeted by the compound 
Clc1ccc(\/C=N\/Nc2nc3ccccc3nc2Cc2ccccc2)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/valid_5-1.jsonl": "{"text":"The protein Amine oxidase [flavin-containing] B is targeted by the compound with the SELFIES [C][\/N][=C][\\N][C][=Branch1][C][=O][\/C][=Branch2][Ring1][C][=C][\\C][=C][NH1][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch2][Ring1][#Branch1][N][Ring1][P][C] and Clc1ccc(\/C=N\/Nc2nc3ccccc3nc2Cc2ccccc2)cc1."} {"text":"The protein Sodium-dependent dopamine transporter is targeted by the compound with the InChI InChI=1S\/C21H30FNO2S2\/c1-23-17-7-8-20(23)19(14-25-21(24)9-10-27-12-11-26-2)18(13-17)15-3-5-16(22)6-4-15\/h3-6,17-20H,7-14H2,1-2H3\/t17?,18-,19+,20?\/m1\/s1 and COC[C@@]12CNCC[C@]1(c1ccc3ccccc3c1)C2."}", "/scratch/micpie/export/compound_protein_compound_3/valid_8-2.jsonl": "{"text":"User: Can you give me an example for a compound DeepSMILES that targets the protein Carbonic anhydrase I?\nAssistant: Yes, of course, the compound DeepSMILES NS=O)=O)ccccCCncS)nccccI)cc6c%10=O)))))))))))))cc6 targets the protein Carbonic anhydrase I.\nUser: Can you generate another compound DeepSMILES that targets the protein Carbonic anhydrase I?\nAssistant: Yes, the compound DeepSMILES NS=O)=O)ccccCCncS)nccccI)cc6c%10=O)))))))))))))cc6 targets the compound DeepSMILES NS(=O)(=O)c1ccc(C(=O)NCc2cn([C@@H]3O[C@H](CO)[C@@H](O[C@H]4O[C@H](CO)[C@@H](O)[C@H](O)[C@H]4O)[C@H](O)[C@H]3O)nn2)cc1."} {"text":"User: Can you give me an example for a compound canonical SMILES that targets the protein Carbonic anhydrase C?\nAssistant: Yes, of course, the compound canonical SMILES CN[C@H]1C[C@H](C)S(=O)(=O)c2sc(S(N)(=O)=O)cc21 targets the protein Carbonic anhydrase C.\nUser: Can you create another compound canonical SMILES that targets the protein Carbonic anhydrase C?\nAssistant: Yes, of course, the compound canonical SMILES CN[C@H]1C[C@H](C)S(=O)(=O)c2sc(S(N)(=O)=O)cc21 targets the compound canonical SMILES 
NS(=O)(=O)c1ccc(CCOC(=O)CN(CCN(CC(=O)O)CC(=O)O)CC(=O)O)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/test_1-2.jsonl": "{"text":"User: Can you give me one example for a compound SMILES that targets the protein Protein-tyrosine phosphatase 1B?\nAssistant: Of course, the compound SMILES N[C@@H](Cc1ccc(C(F)(F)P(=O)(O)O)cc1)C(=O)O targets the protein Protein-tyrosine phosphatase 1B.\nUser: Can you create another compound SMILES that targets the protein Protein-tyrosine phosphatase 1B?\nAssistant: Yes, of course, the compound SMILES N[C@@H](Cc1ccc(C(F)(F)P(=O)(O)O)cc1)C(=O)O targets the compound SMILES O=C(O)[C@H](Oc1ccc(-c2ccc(-c3c(Cc4ccc(O)cc4O)sc4ccccc34)cc2)cc1)c1ccccc1."} {"text":"User: Can you give me an example for a compound DeepSMILES that targets the protein Phosphoinositide-3-kinase catalytic alpha polypeptide?\nAssistant: Sure, the compound DeepSMILES CCcncnc-ccccC=O)NC[C@H]CNC=O)OCC)C)C))))C[C@H]5C8)))))))))cF)c6))))))c6C#CccccN)nc6 targets the protein Phosphoinositide-3-kinase catalytic alpha polypeptide.\nUser: Can you create another compound DeepSMILES that targets the protein Phosphoinositide-3-kinase catalytic alpha polypeptide?\nAssistant: Yes, the compound DeepSMILES CCcncnc-ccccC=O)NC[C@H]CNC=O)OCC)C)C))))C[C@H]5C8)))))))))cF)c6))))))c6C#CccccN)nc6 targets the compound DeepSMILES Nc1cn2nc(-c3cnc(Cl)c(NS(=O)(=O)c4ccc(F)cc4)c3)ccc2n1."}", "/scratch/micpie/export/compound_protein_compound_3/test_3-2.jsonl": "{"text":"User: Can you come up with one example for a compound SMILES that targets the protein Adenosine receptor A2a?\nAssistant: Of course, the compound SMILES N#Cc1sc(NC(=O)c2ccco2)nc1-c1ccccc1 targets the protein Adenosine receptor A2a.\nUser: Can you create another compound SMILES that targets the protein Adenosine receptor A2a?\nAssistant: Of course, the compound SMILES N#Cc1sc(NC(=O)c2ccco2)nc1-c1ccccc1 targets the compound SMILES Cn1c(=O)n(CC2CC2)c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc21."} {"text":"User: Can you give me an 
example for a compound SMILES that targets the protein G-protein coupled receptor 44?\nAssistant: Sure, the compound SMILES C[C@@H]1CN(c2ccc(F)cc2)CCN1C(=O)c1ccc2c(=O)n(-c3ccc(F)cc3)c(CCCCC(=O)NS(=O)(=O)C3CC3)cc2c1 targets the protein G-protein coupled receptor 44.\nUser: Can you create another compound SMILES that targets the protein G-protein coupled receptor 44?\nAssistant: Yes, the compound SMILES C[C@@H]1CN(c2ccc(F)cc2)CCN1C(=O)c1ccc2c(=O)n(-c3ccc(F)cc3)c(CCCCC(=O)NS(=O)(=O)C3CC3)cc2c1 targets the compound SMILES COc1ccc(CC(=O)O)cc1C1=NCC(=O)N(Cc2ccc(Cl)cc2Cl)c2ccccc21."}", "/scratch/micpie/export/compound_protein_compound_3/valid_5-0.jsonl": "{"text":"The compound canonical SMILES C\/N=C1\\NC(=O)\/C(=C\\c2c[nH]c3ccc(Br)cc23)N1C targets the protein Amine oxidase [flavin-containing] B and which is also targeted by the compound Clc1ccc(\/C=N\/Nc2nc3ccccc3nc2Cc2ccccc2)cc1."} {"text":"The compound DeepSMILES CSCCSCCC=O)OC[C@@H]CCCCC[C@@H]7ccccF)cc6))))))))N5C targets the protein DAT and which is also targeted by the compound COC[C@@]12CNCC[C@]1(c1ccc3ccccc3c1)C2."}", "/scratch/micpie/export/compound_protein_compound_3/train_7-1.jsonl": "{"text":"The protein CA-II is targeted by the compound with the SELFIES [N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][N][C][C][N][C][=Branch1][C][=S][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][C][Cl][C][Branch1][C][Cl][=C][Ring1][Branch2][C][=C][Ring2][Ring1][#Branch1] and CC(=O)NC1Cc2ccc(S(N)(=O)=O)cc2C1."} {"text":"The protein CA-I is targeted by the compound with the SMILES CC(C)(C)c1ccc(S(=O)(=O)Nc2cccc(S(N)(=O)=O)c2)cc1 and NS(=O)(=O)c1ccc(C(=O)NCc2cn([C@@H]3O[C@H](CO)[C@@H](O[C@H]4O[C@H](CO)[C@@H](O)[C@H](O)[C@H]4O)[C@H](O)[C@H]3O)nn2)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/test_9-0.jsonl": "{"text":"The compound InChI 
InChI=1S\/C21H35N5O7S\/c1-14(2)13-17(22)19(27)18(23)21(29)26-8-10-33-12-11-32-9-7-25-20(28)15-3-5-16(6-4-15)34(24,30)31\/h3-6,14,17-18H,7-13,22-23H2,1-2H3,(H,25,28)(H,26,29)(H2,24,30,31)\/t17?,18-\/m1\/s1 targets the protein Carbonic anhydrase II and which is also targeted by the compound NS(=O)(=O)c1ccc(CCOC(=O)CN(CCN(CC(=O)O)CC(=O)O)CC(=O)O)cc1."} {"text":"The compound SELFIES [O][=C][Branch1][S][C][N][O][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][N][O] targets the protein Cyanamide hydratase CA1 and which is also targeted by the compound NS(=O)(=O)c1ccc(\/N=C\/c2ccccc2)c(F)c1."}", "/scratch/micpie/export/compound_protein_compound_3/valid_9-1.jsonl": "{"text":"The protein Cyanamide hydratase CA2 is targeted by the compound with the canonical SMILES COc1cccc(OC)c1C(=O)Oc1ccc2nc(S(N)(=O)=O)sc2c1 and NS(=O)(=O)c1ccc(CCOC(=O)CN(CCN(CC(=O)O)CC(=O)O)CC(=O)O)cc1."} {"text":"The protein Carbonic anhydrase 1 is targeted by the compound with the InChI InChI=1S\/C16H15N3O5S2\/c1-23-12-8-7-11-13(14(12)24-2)18-16(25)19(15(11)20)9-3-5-10(6-4-9)26(17,21)22\/h3-8H,1-2H3,(H,18,25)(H2,17,21,22) and NS(=O)(=O)c1ccc(\/N=C\/c2ccccc2)c(F)c1."}", "/scratch/micpie/export/compound_protein_compound_3/valid_3-0.jsonl": "{"text":"The compound SELFIES [C][O][N][C][=N][C][Branch1][O][C][#C][C][=C][C][=N][C][=C][Ring1][=Branch1][=N][C][=C][Ring1][=C][N][=C][N][Ring1][Branch1][C@@H1][O][C@H1][Branch1][Ring1][C][O][C@@H1][Branch1][C][O][C@H1][Ring1][Branch2][O] targets the protein Adenosine receptor A2a and which is also targeted by the compound Cn1c(=O)n(CC2CC2)c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc21."} {"text":"The compound InChI InChI=1S\/C21H22ClNO5S\/c1-14-5-9-18(29(26,27)23-21(2,3)4)12-15(14)6-7-16-11-17(22)8-10-19(16)28-13-20(24)25\/h5,8-12,23H,13H2,1-4H3,(H,24,25) targets the protein CD antigen CD294 and which is also targeted by the compound COc1ccc(CC(=O)O)cc1C1=NCC(=O)N(Cc2ccc(Cl)cc2Cl)c2ccccc21."}", "/scratch/micpie/export/compound_protein_compound_3/test_0-1.jsonl": 
"{"text":"The protein Tyr-DNA phosphodiesterase 1 is targeted by the compound with the SELFIES [C][C][C][=C][C][=N][C][Branch1][#C][C][O][C][=Branch1][C][=O][C][=C][N][=C][C][=N][Ring1][=Branch1][=C][C][=Branch1][C][=O][N][Ring1][P][C][=Ring2][Ring1][Branch1] and C\/C(=N\\NC(=O)c1cc(-c2cccs2)nc2ccccc12)c1ccccn1."} {"text":"The protein PTP-1B is targeted by the compound with the canonical SMILES O=C(O)COc1ccc(S(=O)(=O)N(Cc2ccc(-c3csnn3)cc2)Cc2ccc(C(F)(F)P(=O)(O)O)cc2)cc1 and O=C(O)[C@H](Oc1ccc(-c2ccc(-c3c(Cc4ccc(O)cc4O)sc4ccccc34)cc2)cc1)c1ccccc1."}", "/scratch/micpie/export/compound_protein_compound_3/test_5-0.jsonl": "{"text":"The compound SMILES Cn1c(=O)c2c(cc(\/C=C\/c3cccc(C(F)(F)F)c3)n2C)n(C)c1=O targets the protein MAO-B and which is also targeted by the compound Clc1ccc(\/C=N\/Nc2nc3ccccc3nc2Cc2ccccc2)cc1."} {"text":"The compound InChI InChI=1S\/C15H19Cl2NO\/c1-19-9-11-8-18-12-4-5-15(11,7-12)10-2-3-13(16)14(17)6-10\/h2-3,6,11-12,18H,4-5,7-9H2,1H3\/t11-,12+,15-\/m1\/s1 targets the protein Solute carrier family 6 member 3 and which is also targeted by the compound COC[C@@]12CNCC[C@]1(c1ccc3ccccc3c1)C2."}", "/scratch/micpie/export/compound_protein_compound_3/test_2-0.jsonl": "{"text":"The compound DeepSMILES COC=O)cccCNccccS=O)=O)NcnncC)s5)))))))cc6))))))))cc-ccccOC))nc6))))))c6 targets the protein PI3-kinase subunit alpha and which is also targeted by the compound Nc1cn2nc(-c3cnc(Cl)c(NS(=O)(=O)c4ccc(F)cc4)c3)ccc2n1."} {"text":"The compound SMILES N#Cc1c(NC(=O)C2CC2)nc(-c2ccccc2)nc1-c1ccccc1 targets the protein Adenosine receptor A2a and which is also targeted by the compound Cn1c(=O)n(CC2CC2)c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc21."}", "/scratch/micpie/export/compound_protein_compound_3/valid_2-2.jsonl": "{"text":"User: Can you give me an example for a compound SMILES that targets the protein PtdIns-3-kinase subunit alpha?\nAssistant: Yes, the compound SMILES Clc1ccccc1-c1ocnc1-c1cc2c(s1)-c1ncccc1OCC2 targets the protein PtdIns-3-kinase subunit alpha.\nUser: 
Can you tell another compound SMILES that targets the protein PtdIns-3-kinase subunit alpha?\nAssistant: Yes, the compound SMILES Clc1ccccc1-c1ocnc1-c1cc2c(s1)-c1ncccc1OCC2 targets the compound SMILES Nc1cn2nc(-c3cnc(Cl)c(NS(=O)(=O)c4ccc(F)cc4)c3)ccc2n1."} {"text":"User: Can you give me an example for a compound canonical SMILES that targets the protein Adenosine receptor A2a?\nAssistant: Of course, the compound canonical SMILES S=c1sc2c(ncn3nc(-c4ccco4)nc23)n1-c1ccc(I)cc1 targets the protein Adenosine receptor A2a.\nUser: Can you generate another compound canonical SMILES that targets the protein Adenosine receptor A2a?\nAssistant: Yes, the compound canonical SMILES S=c1sc2c(ncn3nc(-c4ccco4)nc23)n1-c1ccc(I)cc1 targets the compound canonical SMILES Cn1c(=O)n(CC2CC2)c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc21."}", "/scratch/micpie/export/compound_protein_compound_3/valid_0-0.jsonl": "{"text":"The compound InChI InChI=1S\/C13H18N4O3S.BrH\/c1-3-20-9(18)6-4-5-8-11(16-12(14)21-8)10-7(2)15-13(19)17-10;\/h3-6H2,1-2H3,(H2,14,16)(H2,15,17,19);1H targets the protein Tyrosyl-DNA phosphodiesterase 1 and which is also targeted by the compound C\/C(=N\\NC(=O)c1cc(-c2cccs2)nc2ccccc12)c1ccccn1."} {"text":"The compound InChI InChI=1S\/C33H30F4O10P2\/c34-32(35,48(40,41)42)27-15-11-23(12-16-27)19-31(29(38)46-21-25-7-3-1-4-8-25,30(39)47-22-26-9-5-2-6-10-26)20-24-13-17-28(18-14-24)33(36,37)49(43,44)45\/h1-18H,19-22H2,(H2,40,41,42)(H2,43,44,45) targets the protein PTP-1B and which is also targeted by the compound O=C(O)[C@H](Oc1ccc(-c2ccc(-c3c(Cc4ccc(O)cc4O)sc4ccccc34)cc2)cc1)c1ccccc1."}", "/scratch/micpie/export/compound_protein_compound_3/train_6-1.jsonl": "{"text":"The protein DA transporter is targeted by the compound with the DeepSMILES CCNCcccccc6Ccccccn[nH]c5c9)))))))))C%10 and COC[C@@]12CNCC[C@]1(c1ccc3ccccc3c1)C2."} {"text":"The protein CA-II is targeted by the compound with the DeepSMILES NS=O)=O)cnncNC=O)NccccCl)cCl)c6)))))))))s5 and CC(=O)NC1Cc2ccc(S(N)(=O)=O)cc2C1."}", 
"/scratch/micpie/export/compound_protein_compound_3/train_7-2.jsonl": "{"text":"User: Can you come up with an example for a compound SELFIES that targets the protein Carbonate dehydratase II?\nAssistant: Sure, the compound SELFIES [N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][N][C][C][N][C][=Branch1][C][=S][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][C][Cl][C][Branch1][C][Cl][=C][Ring1][Branch2][C][=C][Ring2][Ring1][#Branch1] targets the protein Carbonate dehydratase II.\nUser: Can you tell another compound SELFIES that targets the protein Carbonate dehydratase II?\nAssistant: Yes, the compound SELFIES [N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring1][N][C][C][N][C][=Branch1][C][=S][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][C][Cl][C][Branch1][C][Cl][=C][Ring1][Branch2][C][=C][Ring2][Ring1][#Branch1] targets the compound SELFIES CC(=O)NC1Cc2ccc(S(N)(=O)=O)cc2C1."} {"text":"User: Can you come up with an example for a compound SMILES that targets the protein CAB?\nAssistant: Of course, the compound SMILES CC(C)(C)c1ccc(S(=O)(=O)Nc2cccc(S(N)(=O)=O)c2)cc1 targets the protein CAB.\nUser: Can you tell another compound SMILES that targets the protein CAB?\nAssistant: Yes, of course, the compound SMILES CC(C)(C)c1ccc(S(=O)(=O)Nc2cccc(S(N)(=O)=O)c2)cc1 targets the compound SMILES NS(=O)(=O)c1ccc(C(=O)NCc2cn([C@@H]3O[C@H](CO)[C@@H](O[C@H]4O[C@H](CO)[C@@H](O)[C@H](O)[C@H]4O)[C@H](O)[C@H]3O)nn2)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/test_7-0.jsonl": "{"text":"The compound SELFIES [C][C][=C][C][=C][Branch2][Ring2][=C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][=Branch1][C][=O][N][C][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][C][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][C][Branch1][C][Cl][=C][Ring1][#C][Cl][C][=C][Ring2][Ring1][=N] targets the protein Cyanamide hydratase CA2 and which is also targeted by the compound CC(=O)NC1Cc2ccc(S(N)(=O)=O)cc2C1."} 
{"text":"The compound canonical SMILES NS(=O)(=O)c1ccc(NS(=O)(=O)C(F)(F)F)c(F)c1 targets the protein Carbonic anhydrase 1 and which is also targeted by the compound NS(=O)(=O)c1ccc(C(=O)NCc2cn([C@@H]3O[C@H](CO)[C@@H](O[C@H]4O[C@H](CO)[C@@H](O)[C@H](O)[C@H]4O)[C@H](O)[C@H]3O)nn2)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/test_8-2.jsonl": "{"text":"User: Can you give me one example for a compound canonical SMILES that targets the protein CAB?\nAssistant: Of course, the compound canonical SMILES Cc1cc(C)[n+](-c2ccc(S(N)(=O)=O)cc2)c(C)c1C.[O-][Cl+3]([O-])([O-])[O-] targets the protein CAB.\nUser: Can you create another compound canonical SMILES that targets the protein CAB?\nAssistant: Sure, the compound canonical SMILES Cc1cc(C)[n+](-c2ccc(S(N)(=O)=O)cc2)c(C)c1C.[O-][Cl+3]([O-])([O-])[O-] targets the compound canonical SMILES NS(=O)(=O)c1ccc(C(=O)NCc2cn([C@@H]3O[C@H](CO)[C@@H](O[C@H]4O[C@H](CO)[C@@H](O)[C@H](O)[C@H]4O)[C@H](O)[C@H]3O)nn2)cc1."} {"text":"User: Can you give me one example for a compound InChI that targets the protein CAC?\nAssistant: Of course, the compound InChI InChI=1S\/C13H20N4O4S\/c14-22(20,21)12-3-1-11(2-4-12)15-13(19)17-7-5-16(6-8-17)9-10-18\/h1-4,18H,5-10H2,(H,15,19)(H2,14,20,21) targets the protein CAC.\nUser: Can you create another compound InChI that targets the protein CAC?\nAssistant: Of course, the compound InChI InChI=1S\/C13H20N4O4S\/c14-22(20,21)12-3-1-11(2-4-12)15-13(19)17-7-5-16(6-8-17)9-10-18\/h1-4,18H,5-10H2,(H,15,19)(H2,14,20,21) targets the compound InChI NS(=O)(=O)c1ccc(CCOC(=O)CN(CCN(CC(=O)O)CC(=O)O)CC(=O)O)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/test_0-2.jsonl": "{"text":"User: Can you come up with one example for a compound canonical SMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Yes, of course, the compound canonical SMILES Cc1ccc2nc(COC(=O)c3cnccn3)cc(=O)n2c1 targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you tell another compound canonical SMILES 
that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Sure, the compound canonical SMILES Cc1ccc2nc(COC(=O)c3cnccn3)cc(=O)n2c1 targets the compound canonical SMILES C\/C(=N\\NC(=O)c1cc(-c2cccs2)nc2ccccc12)c1ccccn1."} {"text":"User: Can you come up with an example for a compound DeepSMILES that targets the protein Tyrosine-protein phosphatase non-receptor type 1?\nAssistant: Yes, of course, the compound DeepSMILES O=CO)COccccS=O)=O)NCcccc-ccsnn5)))))cc6)))))))CccccCF)F)P=O)O)O)))cc6)))))))))cc6 targets the protein Tyrosine-protein phosphatase non-receptor type 1.\nUser: Can you generate another compound DeepSMILES that targets the protein Tyrosine-protein phosphatase non-receptor type 1?\nAssistant: Yes, of course, the compound DeepSMILES O=CO)COccccS=O)=O)NCcccc-ccsnn5)))))cc6)))))))CccccCF)F)P=O)O)O)))cc6)))))))))cc6 targets the compound DeepSMILES O=C(O)[C@H](Oc1ccc(-c2ccc(-c3c(Cc4ccc(O)cc4O)sc4ccccc34)cc2)cc1)c1ccccc1."}", "/scratch/micpie/export/compound_protein_compound_3/test_3-0.jsonl": "{"text":"The compound DeepSMILES N#CcscNC=O)cccco5)))))))nc5-cccccc6 targets the protein Adenosine receptor A2a and which is also targeted by the compound Cn1c(=O)n(CC2CC2)c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc21."} {"text":"The compound SELFIES [C][C@@H1][C][N][Branch1][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][C][N][Ring1][=N][C][=Branch1][C][=O][C][=C][C][=C][C][=Branch1][C][=O][N][Branch1][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][Branch2][Ring1][=Branch1][C][C][C][C][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][C][C][Ring1][Ring1][=C][C][Ring2][Ring1][O][=C][Ring2][Ring1][#C] targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells and which is also targeted by the compound COc1ccc(CC(=O)O)cc1C1=NCC(=O)N(Cc2ccc(Cl)cc2Cl)c2ccccc21."}", "/scratch/micpie/export/compound_protein_compound_3/train_1-0.jsonl": "{"text":"The compound SMILES 
NS(=O)(=O)C(F)(F)c1cc2cc(CN(Cc3ccc(-c4csnn4)cc3)S(=O)(=O)c3ccc(OCC(=O)O)cc3)ccc2cc1F targets the protein PTP-1B and which is also targeted by the compound O=C(O)[C@H](Oc1ccc(-c2ccc(-c3c(Cc4ccc(O)cc4O)sc4ccccc34)cc2)cc1)c1ccccc1."} {"text":"The compound SMILES C[C@@H]1CC[C@H]2C(C)(C)CCC[C@]2(C)c2c1oc1c(Br)c(O)ccc21 targets the protein PI3Kalpha and which is also targeted by the compound Nc1cn2nc(-c3cnc(Cl)c(NS(=O)(=O)c4ccc(F)cc4)c3)ccc2n1."}", "/scratch/micpie/export/compound_protein_compound_3/test_5-2.jsonl": "{"text":"User: Can you give me an example for a compound DeepSMILES that targets the protein MAO-B?\nAssistant: Yes, of course, the compound DeepSMILES Cnc=O)cccc\/C=C\/cccccCF)F)F))c6))))))))n5C))))nC)c6=O targets the protein MAO-B.\nUser: Can you create another compound DeepSMILES that targets the protein MAO-B?\nAssistant: Sure, the compound DeepSMILES Cnc=O)cccc\/C=C\/cccccCF)F)F))c6))))))))n5C))))nC)c6=O targets the compound DeepSMILES Clc1ccc(\/C=N\/Nc2nc3ccccc3nc2Cc2ccccc2)cc1."} {"text":"User: Can you give me one example for a compound SELFIES that targets the protein Solute carrier family 6 member 3?\nAssistant: Yes, the compound SELFIES [C][O][C][C@H1][C][N][C@H1][C][C][C@][Ring1][#Branch1][Branch1][#C][C][=C][C][=C][Branch1][C][Cl][C][Branch1][C][Cl][=C][Ring1][Branch2][C][Ring1][=N] targets the protein Solute carrier family 6 member 3.\nUser: Can you create another compound SELFIES that targets the protein Solute carrier family 6 member 3?\nAssistant: Yes, the compound SELFIES [C][O][C][C@H1][C][N][C@H1][C][C][C@][Ring1][#Branch1][Branch1][#C][C][=C][C][=C][Branch1][C][Cl][C][Branch1][C][Cl][=C][Ring1][Branch2][C][Ring1][=N] targets the compound SELFIES COC[C@@]12CNCC[C@]1(c1ccc3ccccc3c1)C2."}", "/scratch/micpie/export/compound_protein_compound_3/valid_1-2.jsonl": "{"text":"User: Can you come up with one example for a compound SMILES that targets the protein PTP-1B?\nAssistant: Of course, the compound SMILES 
NS(=O)(=O)c1cc(-c2ccc(CSCc3ccc(C(F)(F)P(=O)(O)O)c(Br)c3)cc2)ccc1Br targets the protein PTP-1B.\nUser: Can you generate another compound SMILES that targets the protein PTP-1B?\nAssistant: Of course, the compound SMILES NS(=O)(=O)c1cc(-c2ccc(CSCc3ccc(C(F)(F)P(=O)(O)O)c(Br)c3)cc2)ccc1Br targets the compound SMILES O=C(O)[C@H](Oc1ccc(-c2ccc(-c3c(Cc4ccc(O)cc4O)sc4ccccc34)cc2)cc1)c1ccccc1."} {"text":"User: Can you give me an example for a compound SMILES that targets the protein Phosphatidylinositol 4,5-bisphosphate 3-kinase 110 kDa catalytic subunit alpha?\nAssistant: Of course, the compound SMILES CN(C)c1cc(C(=O)O)nc2c(-c3cccc4[nH]ccc34)nc(N3CCOCC3)nc12 targets the protein Phosphatidylinositol 4,5-bisphosphate 3-kinase 110 kDa catalytic subunit alpha.\nUser: Can you generate another compound SMILES that targets the protein Phosphatidylinositol 4,5-bisphosphate 3-kinase 110 kDa catalytic subunit alpha?\nAssistant: Yes, of course, the compound SMILES CN(C)c1cc(C(=O)O)nc2c(-c3cccc4[nH]ccc34)nc(N3CCOCC3)nc12 targets the compound SMILES Nc1cn2nc(-c3cnc(Cl)c(NS(=O)(=O)c4ccc(F)cc4)c3)ccc2n1."}", "/scratch/micpie/export/compound_protein_compound_3/valid_7-2.jsonl": "{"text":"User: Can you give me one example for a compound DeepSMILES that targets the protein CA-II?\nAssistant: Yes, of course, the compound DeepSMILES NS=O)=O)cccc-ccn[C@H]OC[C@@H]O)[C@@H]O)[C@@H]6O)))))))nn5)))))cc6 targets the protein CA-II.\nUser: Can you generate another compound DeepSMILES that targets the protein CA-II?\nAssistant: Of course, the compound DeepSMILES NS=O)=O)cccc-ccn[C@H]OC[C@@H]O)[C@@H]O)[C@@H]6O)))))))nn5)))))cc6 targets the compound DeepSMILES CC(=O)NC1Cc2ccc(S(N)(=O)=O)cc2C1."} {"text":"User: Can you come up with one example for a compound canonical SMILES that targets the protein CA-I?\nAssistant: Of course, the compound canonical SMILES Cc1ccc(S(=O)(=O)n2cc(Br)c(=O)[nH]c2=O)cc1 targets the protein CA-I.\nUser: Can you tell another compound canonical SMILES that targets the protein 
CA-I?\nAssistant: Yes, the compound canonical SMILES Cc1ccc(S(=O)(=O)n2cc(Br)c(=O)[nH]c2=O)cc1 targets the compound canonical SMILES NS(=O)(=O)c1ccc(C(=O)NCc2cn([C@@H]3O[C@H](CO)[C@@H](O[C@H]4O[C@H](CO)[C@@H](O)[C@H](O)[C@H]4O)[C@H](O)[C@H]3O)nn2)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/test_0-0.jsonl": "{"text":"The compound InChI InChI=1S\/C15H12N4O3\/c1-10-2-3-13-18-11(6-14(20)19(13)8-10)9-22-15(21)12-7-16-4-5-17-12\/h2-8H,9H2,1H3 targets the protein Tyrosyl-DNA phosphodiesterase 1 and which is also targeted by the compound C\/C(=N\\NC(=O)c1cc(-c2cccs2)nc2ccccc12)c1ccccn1."} {"text":"The compound InChI InChI=1S\/C25H22F2N3O8PS2\/c26-25(27,39(33,34)35)20-7-3-18(4-8-20)14-30(13-17-1-5-19(6-2-17)23-16-40-29-28-23)41(36,37)22-11-9-21(10-12-22)38-15-24(31)32\/h1-12,16H,13-15H2,(H,31,32)(H2,33,34,35) targets the protein Tyrosine-protein phosphatase non-receptor type 1 and which is also targeted by the compound O=C(O)[C@H](Oc1ccc(-c2ccc(-c3c(Cc4ccc(O)cc4O)sc4ccccc34)cc2)cc1)c1ccccc1."}", "/scratch/micpie/export/compound_protein_compound_3/test_6-0.jsonl": "{"text":"The compound SMILES Clc1ccc([C@]23CNC[C@H]2C3)cc1Cl targets the protein DA transporter and which is also targeted by the compound COC[C@@]12CNCC[C@]1(c1ccc3ccccc3c1)C2."} {"text":"The compound InChI InChI=1S\/C9H14N2O3S\/c1-7(11-15(10,12)13)8-5-3-4-6-9(8)14-2\/h3-7,11H,1-2H3,(H2,10,12,13) targets the protein Cyanamide hydratase CA2 and which is also targeted by the compound CC(=O)NC1Cc2ccc(S(N)(=O)=O)cc2C1."}", "/scratch/micpie/export/compound_protein_compound_3/train_2-0.jsonl": "{"text":"The compound DeepSMILES O=CccccC=O)CF)F)F)))cc6))))))NCCOcccc-ccccccccc6c%10))))))))))cc6C%11 targets the protein PtdIns-3-kinase subunit alpha and which is also targeted by the compound Nc1cn2nc(-c3cnc(Cl)c(NS(=O)(=O)c4ccc(F)cc4)c3)ccc2n1."} {"text":"The compound DeepSMILES O=Ccccccc6))))))NC=O)cccccc6)))))))cncCl)ccnCCcccccc6))))))))nc5n9 targets the protein Adenosine receptor A2a and which is also 
targeted by the compound Cn1c(=O)n(CC2CC2)c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc21."}", "/scratch/micpie/export/compound_protein_compound_3/valid_2-0.jsonl": "{"text":"The compound InChI InChI=1S\/C20H13ClN2O2S\/c21-14-5-2-1-4-13(14)19-18(23-11-25-19)16-10-12-7-9-24-15-6-3-8-22-17(15)20(12)26-16\/h1-6,8,10-11H,7,9H2 targets the protein PI3K-alpha and which is also targeted by the compound Nc1cn2nc(-c3cnc(Cl)c(NS(=O)(=O)c4ccc(F)cc4)c3)ccc2n1."} {"text":"The compound InChI InChI=1S\/C16H8IN5OS2\/c17-9-3-5-10(6-4-9)22-14-12(25-16(22)24)15-19-13(11-2-1-7-23-11)20-21(15)8-18-14\/h1-8H targets the protein Adenosine receptor A2a and which is also targeted by the compound Cn1c(=O)n(CC2CC2)c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc21."}", "/scratch/micpie/export/compound_protein_compound_3/test_7-1.jsonl": "{"text":"The protein Cyanamide hydratase CA2 is targeted by the compound with the SELFIES [C][C][=C][C][=C][Branch2][Ring2][=C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][=Branch1][C][=O][N][C][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][C][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][C][Branch1][C][Cl][=C][Ring1][#C][Cl][C][=C][Ring2][Ring1][=N] and CC(=O)NC1Cc2ccc(S(N)(=O)=O)cc2C1."} {"text":"The protein CAB is targeted by the compound with the SELFIES [N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][P][N][S][=Branch1][C][=O][=Branch1][C][=O][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][=C][Ring1][#C] and NS(=O)(=O)c1ccc(C(=O)NCc2cn([C@@H]3O[C@H](CO)[C@@H](O[C@H]4O[C@H](CO)[C@@H](O)[C@H](O)[C@H]4O)[C@H](O)[C@H]3O)nn2)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/valid_4-2.jsonl": "{"text":"User: Can you give me one example for a compound SELFIES that targets the protein G-protein coupled receptor 44?\nAssistant: Yes, the compound SELFIES 
[C][N][Branch2][Ring1][N][C@@H1][C][C][C][=C][Branch1][#Branch1][C][C][=Branch1][C][=O][O][C][=N][C][=C][C][=C][Ring1][=Branch1][N][Ring1][=N][C][Ring1][P][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1] targets the protein G-protein coupled receptor 44.\nUser: Can you create another compound SELFIES that targets the protein G-protein coupled receptor 44?\nAssistant: Sure, the compound SELFIES [C][N][Branch2][Ring1][N][C@@H1][C][C][C][=C][Branch1][#Branch1][C][C][=Branch1][C][=O][O][C][=N][C][=C][C][=C][Ring1][=Branch1][N][Ring1][=N][C][Ring1][P][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1] targets the compound SELFIES COc1ccc(CC(=O)O)cc1C1=NCC(=O)N(Cc2ccc(Cl)cc2Cl)c2ccccc21."} {"text":"User: Can you give me an example for a compound SMILES that targets the protein Amine oxidase [flavin-containing] B?\nAssistant: Yes, the compound SMILES Cc1cncn1-c1ccc2nc(-c3ccccc3)ncc2c1 targets the protein Amine oxidase [flavin-containing] B.\nUser: Can you generate another compound SMILES that targets the protein Amine oxidase [flavin-containing] B?\nAssistant: Of course, the compound SMILES Cc1cncn1-c1ccc2nc(-c3ccccc3)ncc2c1 targets the compound SMILES Clc1ccc(\/C=N\/Nc2nc3ccccc3nc2Cc2ccccc2)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/test_4-2.jsonl": "{"text":"User: Can you give me one example for a compound SMILES that targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells?\nAssistant: Sure, the compound SMILES O=C(O)Cc1sc(C(c2ccc(F)cc2)c2ccc(F)cc2)nc1-c1ccn[nH]1 targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells.\nUser: Can you create another compound SMILES that targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells?\nAssistant: Sure, the compound SMILES O=C(O)Cc1sc(C(c2ccc(F)cc2)c2ccc(F)cc2)nc1-c1ccn[nH]1 targets the compound SMILES 
COc1ccc(CC(=O)O)cc1C1=NCC(=O)N(Cc2ccc(Cl)cc2Cl)c2ccccc21."} {"text":"User: Can you come up with one example for a compound canonical SMILES that targets the protein MAO-B?\nAssistant: Yes, of course, the compound canonical SMILES CCCSc1ccc(C[C@H](C)N)cc1 targets the protein MAO-B.\nUser: Can you tell another compound canonical SMILES that targets the protein MAO-B?\nAssistant: Yes, the compound canonical SMILES CCCSc1ccc(C[C@H](C)N)cc1 targets the compound canonical SMILES Clc1ccc(\/C=N\/Nc2nc3ccccc3nc2Cc2ccccc2)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/valid_3-2.jsonl": "{"text":"User: Can you come up with an example for a compound SELFIES that targets the protein Adenosine receptor A2a?\nAssistant: Yes, of course, the compound SELFIES [C][O][N][C][=N][C][Branch1][O][C][#C][C][=C][C][=N][C][=C][Ring1][=Branch1][=N][C][=C][Ring1][=C][N][=C][N][Ring1][Branch1][C@@H1][O][C@H1][Branch1][Ring1][C][O][C@@H1][Branch1][C][O][C@H1][Ring1][Branch2][O] targets the protein Adenosine receptor A2a.\nUser: Can you tell another compound SELFIES that targets the protein Adenosine receptor A2a?\nAssistant: Sure, the compound SELFIES [C][O][N][C][=N][C][Branch1][O][C][#C][C][=C][C][=N][C][=C][Ring1][=Branch1][=N][C][=C][Ring1][=C][N][=C][N][Ring1][Branch1][C@@H1][O][C@H1][Branch1][Ring1][C][O][C@@H1][Branch1][C][O][C@H1][Ring1][Branch2][O] targets the compound SELFIES Cn1c(=O)n(CC2CC2)c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc21."} {"text":"User: Can you give me one example for a compound DeepSMILES that targets the protein G-protein coupled receptor 44?\nAssistant: Sure, the compound DeepSMILES CccccS=O)=O)NCC)C)C))))cc6C#CcccCl)ccc6OCC=O)O targets the protein G-protein coupled receptor 44.\nUser: Can you tell another compound DeepSMILES that targets the protein G-protein coupled receptor 44?\nAssistant: Sure, the compound DeepSMILES CccccS=O)=O)NCC)C)C))))cc6C#CcccCl)ccc6OCC=O)O targets the compound DeepSMILES 
COc1ccc(CC(=O)O)cc1C1=NCC(=O)N(Cc2ccc(Cl)cc2Cl)c2ccccc21."}", "/scratch/micpie/export/compound_protein_compound_3/valid_2-1.jsonl": "{"text":"The protein PI3-kinase subunit alpha is targeted by the compound with the canonical SMILES Clc1ccccc1-c1ocnc1-c1cc2c(s1)-c1ncccc1OCC2 and Nc1cn2nc(-c3cnc(Cl)c(NS(=O)(=O)c4ccc(F)cc4)c3)ccc2n1."} {"text":"The protein Adenosine receptor A2a is targeted by the compound with the DeepSMILES S=csccncnnc-cccco5)))))nc95)))))))n5-ccccI)cc6 and Cn1c(=O)n(CC2CC2)c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc21."}", "/scratch/micpie/export/compound_protein_compound_3/valid_4-0.jsonl": "{"text":"The compound SELFIES [C][N][Branch2][Ring1][N][C@@H1][C][C][C][=C][Branch1][#Branch1][C][C][=Branch1][C][=O][O][C][=N][C][=C][C][=C][Ring1][=Branch1][N][Ring1][=N][C][Ring1][P][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1] targets the protein CD antigen CD294 and which is also targeted by the compound COc1ccc(CC(=O)O)cc1C1=NCC(=O)N(Cc2ccc(Cl)cc2Cl)c2ccccc21."} {"text":"The compound DeepSMILES Cccncn5-ccccnc-cccccc6))))))ncc6c%10 targets the protein MAO-B and which is also targeted by the compound Clc1ccc(\/C=N\/Nc2nc3ccccc3nc2Cc2ccccc2)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/train_5-1.jsonl": "{"text":"The protein Monoamine oxidase type B is targeted by the compound with the SMILES COc1ccc(\/C=C2\\COc3ccccc3C2=O)c(O)c1 and Clc1ccc(\/C=N\/Nc2nc3ccccc3nc2Cc2ccccc2)cc1."} {"text":"The protein DAT is targeted by the compound with the SELFIES [N][C][=C][C][=C][Branch2][Ring1][#C][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][N][C][C][Ring1][#Branch1][C][=C][C][=C][Branch1][C][Cl][C][Branch1][C][Cl][=C][Ring1][Branch2][N][=N][Ring2][Ring1][Branch2] and COC[C@@]12CNCC[C@]1(c1ccc3ccccc3c1)C2."}", "/scratch/micpie/export/compound_protein_compound_3/test_2-1.jsonl": "{"text":"The protein PI3K-alpha is targeted by the compound with the SMILES 
COC(=O)c1cc(CNc2ccc(S(=O)(=O)Nc3nnc(C)s3)cc2)cc(-c2ccc(OC)nc2)c1 and Nc1cn2nc(-c3cnc(Cl)c(NS(=O)(=O)c4ccc(F)cc4)c3)ccc2n1."} {"text":"The protein Adenosine receptor A2a is targeted by the compound with the canonical SMILES N#Cc1c(NC(=O)C2CC2)nc(-c2ccccc2)nc1-c1ccccc1 and Cn1c(=O)n(CC2CC2)c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc21."}", "/scratch/micpie/export/compound_protein_compound_3/train_9-2.jsonl": "{"text":"User: Can you come up with one example for a compound SMILES that targets the protein CAC?\nAssistant: Sure, the compound SMILES COc1ccc2c(=O)c(-c3ccc(OC)c(OC)c3)c(N)oc2c1 targets the protein CAC.\nUser: Can you generate another compound SMILES that targets the protein CAC?\nAssistant: Yes, of course, the compound SMILES COc1ccc2c(=O)c(-c3ccc(OC)c(OC)c3)c(N)oc2c1 targets the compound SMILES NS(=O)(=O)c1ccc(CCOC(=O)CN(CCN(CC(=O)O)CC(=O)O)CC(=O)O)cc1."} {"text":"User: Can you give me an example for a compound SMILES that targets the protein Cyanamide hydratase CA1?\nAssistant: Yes, of course, the compound SMILES Cc1c(C(=O)NCCCN(C)C)cc(-c2ccccc2)n1-c1ccc(S(N)(=O)=O)cc1 targets the protein Cyanamide hydratase CA1.\nUser: Can you tell another compound SMILES that targets the protein Cyanamide hydratase CA1?\nAssistant: Sure, the compound SMILES Cc1c(C(=O)NCCCN(C)C)cc(-c2ccccc2)n1-c1ccc(S(N)(=O)=O)cc1 targets the compound SMILES NS(=O)(=O)c1ccc(\/N=C\/c2ccccc2)c(F)c1."}", "/scratch/micpie/export/compound_protein_compound_3/train_0-0.jsonl": "{"text":"The compound SMILES Cn1cccc1\/C=C1\\CCC\/C(=C\\c2cccn2C)C1=O targets the protein Tyrosyl-DNA phosphodiesterase 1 and which is also targeted by the compound C\/C(=N\\NC(=O)c1cc(-c2cccs2)nc2ccccc12)c1ccccn1."} {"text":"The compound SELFIES 
[O][=C][Branch1][C][O][C][=C][C][=C][Branch2][Branch1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][O][C][=C][C][=C][Branch2][Ring2][Branch1][C][=C][C][=C][Branch2][Ring1][=Branch2][C][=C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][=C][C][=C][C][=C][Ring1][S][Ring1][=Branch1][C][=C][Ring2][Ring1][=Branch1][C][=C][Ring2][Ring1][N][C][=C][Ring2][Ring2][=Branch1] targets the protein Protein-tyrosine phosphatase 1B and which is also targeted by the compound O=C(O)[C@H](Oc1ccc(-c2ccc(-c3c(Cc4ccc(O)cc4O)sc4ccccc34)cc2)cc1)c1ccccc1."}", "/scratch/micpie/export/compound_protein_compound_3/test_1-1.jsonl": "{"text":"The protein Protein-tyrosine phosphatase 1B is targeted by the compound with the DeepSMILES N[C@@H]CccccCF)F)P=O)O)O)))cc6)))))))C=O)O and O=C(O)[C@H](Oc1ccc(-c2ccc(-c3c(Cc4ccc(O)cc4O)sc4ccccc34)cc2)cc1)c1ccccc1."} {"text":"The protein Phosphoinositide 3-kinase alpha is targeted by the compound with the SELFIES [C][C][C][=N][C][=N][C][Branch2][Ring2][#C][C][=C][C][=C][Branch2][Ring1][S][C][=Branch1][C][=O][N][C][C@H1][C][N][Branch1][=C][C][=Branch1][C][=O][O][C][Branch1][C][C][Branch1][C][C][C][C][C@H1][Ring1][N][C][Ring1][#C][C][Branch1][C][F][=C][Ring2][Ring1][Branch2][=C][Ring2][Ring1][=C][C][#C][C][=C][C][=C][Branch1][C][N][N][=C][Ring1][#Branch1] and Nc1cn2nc(-c3cnc(Cl)c(NS(=O)(=O)c4ccc(F)cc4)c3)ccc2n1."}", "/scratch/micpie/export/compound_protein_compound_3/valid_9-0.jsonl": "{"text":"The compound SMILES COc1cccc(OC)c1C(=O)Oc1ccc2nc(S(N)(=O)=O)sc2c1 targets the protein Carbonate dehydratase II and which is also targeted by the compound NS(=O)(=O)c1ccc(CCOC(=O)CN(CCN(CC(=O)O)CC(=O)O)CC(=O)O)cc1."} {"text":"The compound InChI InChI=1S\/C16H15N3O5S2\/c1-23-12-8-7-11-13(14(12)24-2)18-16(25)19(15(11)20)9-3-5-10(6-4-9)26(17,21)22\/h3-8H,1-2H3,(H,18,25)(H2,17,21,22) targets the protein Carbonic anhydrase 1 and which is also targeted by the compound NS(=O)(=O)c1ccc(\/N=C\/c2ccccc2)c(F)c1."}", 
"/scratch/micpie/export/compound_protein_compound_3/train_5-2.jsonl": "{"text":"User: Can you come up with one example for a compound SELFIES that targets the protein Amine oxidase [flavin-containing] B?\nAssistant: Of course, the compound SELFIES [C][O][C][=C][C][=C][Branch1][P][\/C][=C][\\C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][#Branch2][=O][C][Branch1][C][O][=C][Ring2][Ring1][Ring1] targets the protein Amine oxidase [flavin-containing] B.\nUser: Can you generate another compound SELFIES that targets the protein Amine oxidase [flavin-containing] B?\nAssistant: Yes, of course, the compound SELFIES [C][O][C][=C][C][=C][Branch1][P][\/C][=C][\\C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][#Branch2][=O][C][Branch1][C][O][=C][Ring2][Ring1][Ring1] targets the compound SELFIES Clc1ccc(\/C=N\/Nc2nc3ccccc3nc2Cc2ccccc2)cc1."} {"text":"User: Can you come up with one example for a compound SMILES that targets the protein DA transporter?\nAssistant: Of course, the compound SMILES Nc1ccc(-c2ccc3c(c2)CNCC3c2ccc(Cl)c(Cl)c2)nn1 targets the protein DA transporter.\nUser: Can you generate another compound SMILES that targets the protein DA transporter?\nAssistant: Of course, the compound SMILES Nc1ccc(-c2ccc3c(c2)CNCC3c2ccc(Cl)c(Cl)c2)nn1 targets the compound SMILES COC[C@@]12CNCC[C@]1(c1ccc3ccccc3c1)C2."}", "/scratch/micpie/export/compound_protein_compound_3/train_8-1.jsonl": "{"text":"The protein Carbonic anhydrase B is targeted by the compound with the InChI InChI=1S\/C15H14Br2N2O3S\/c16-12-7-11(15(20)14(17)8-12)9-19-6-5-10-1-3-13(4-2-10)23(18,21)22\/h1-4,7-9,20H,5-6H2,(H2,18,21,22)\/b19-9- and NS(=O)(=O)c1ccc(C(=O)NCc2cn([C@@H]3O[C@H](CO)[C@@H](O[C@H]4O[C@H](CO)[C@@H](O)[C@H](O)[C@H]4O)[C@H](O)[C@H]3O)nn2)cc1."} {"text":"The protein Carbonic anhydrase C is targeted by the compound with the canonical SMILES N#Cc1cn(-c2nc3ccc(S(N)(=O)=O)cc3s2)nc1-c1ccc(Cl)cc1 and NS(=O)(=O)c1ccc(CCOC(=O)CN(CCN(CC(=O)O)CC(=O)O)CC(=O)O)cc1."}", 
"/scratch/micpie/export/compound_protein_compound_3/train_8-0.jsonl": "{"text":"The compound InChI InChI=1S\/C15H14Br2N2O3S\/c16-12-7-11(15(20)14(17)8-12)9-19-6-5-10-1-3-13(4-2-10)23(18,21)22\/h1-4,7-9,20H,5-6H2,(H2,18,21,22)\/b19-9- targets the protein CAB and which is also targeted by the compound NS(=O)(=O)c1ccc(C(=O)NCc2cn([C@@H]3O[C@H](CO)[C@@H](O[C@H]4O[C@H](CO)[C@@H](O)[C@H](O)[C@H]4O)[C@H](O)[C@H]3O)nn2)cc1."} {"text":"The compound SELFIES [N][#C][C][=C][N][Branch2][Ring1][Branch2][C][=N][C][=C][C][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][C][=C][Ring1][#Branch2][S][Ring1][=N][N][=C][Ring2][Ring1][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1] targets the protein CAC and which is also targeted by the compound NS(=O)(=O)c1ccc(CCOC(=O)CN(CCN(CC(=O)O)CC(=O)O)CC(=O)O)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/test_5-1.jsonl": "{"text":"The protein Amine oxidase [flavin-containing] B is targeted by the compound with the canonical SMILES Cn1c(=O)c2c(cc(\/C=C\/c3cccc(C(F)(F)F)c3)n2C)n(C)c1=O and Clc1ccc(\/C=N\/Nc2nc3ccccc3nc2Cc2ccccc2)cc1."} {"text":"The protein Sodium-dependent dopamine transporter is targeted by the compound with the DeepSMILES COC[C@H]CN[C@H]CC[C@]7ccccCl)cCl)c6))))))C5 and COC[C@@]12CNCC[C@]1(c1ccc3ccccc3c1)C2."}", "/scratch/micpie/export/compound_protein_compound_3/train_4-1.jsonl": "{"text":"The protein Chemoattractant receptor-homologous molecule expressed on Th2 cells is targeted by the compound with the SELFIES [C][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][Branch2][Ring1][=Branch2][C][=C][C][=N][C][=C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][=C][C][=C][Ring1][=C][Ring1][#Branch2][=C][Branch1][C][C][N][Ring2][Ring1][Branch1][C][C][=Branch1][C][=O][O] and COc1ccc(CC(=O)O)cc1C1=NCC(=O)N(Cc2ccc(Cl)cc2Cl)c2ccccc21."} {"text":"The protein Monoamine oxidase type B is targeted by the compound with the canonical SMILES Cc1ccc2oc(=O)c(-c3cccs3)cc2c1 and 
Clc1ccc(\/C=N\/Nc2nc3ccccc3nc2Cc2ccccc2)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/valid_0-2.jsonl": "{"text":"User: Can you come up with one example for a compound DeepSMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Sure, the compound DeepSMILES Br.CCOC=O)CCCcscN)nc5-c[nH]cO)nc5C targets the protein Tyr-DNA phosphodiesterase 1.\nUser: Can you tell another compound DeepSMILES that targets the protein Tyr-DNA phosphodiesterase 1?\nAssistant: Sure, the compound DeepSMILES Br.CCOC=O)CCCcscN)nc5-c[nH]cO)nc5C targets the compound DeepSMILES C\/C(=N\\NC(=O)c1cc(-c2cccs2)nc2ccccc12)c1ccccn1."} {"text":"User: Can you give me an example for a compound SELFIES that targets the protein Protein-tyrosine phosphatase 1B?\nAssistant: Of course, the compound SELFIES [O][=C][Branch1][O][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Branch2][Ring1][O][C][C][=C][C][=C][Branch1][S][C][Branch1][C][F][Branch1][C][F][P][=Branch1][C][=O][Branch1][C][O][O][C][=C][Ring1][=N][Branch2][Ring1][O][C][C][=C][C][=C][Branch1][S][C][Branch1][C][F][Branch1][C][F][P][=Branch1][C][=O][Branch1][C][O][O][C][=C][Ring1][=N][C][=Branch1][C][=O][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1] targets the protein Protein-tyrosine phosphatase 1B.\nUser: Can you tell another compound SELFIES that targets the protein Protein-tyrosine phosphatase 1B?\nAssistant: Yes, of course, the compound SELFIES [O][=C][Branch1][O][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Branch2][Ring1][O][C][C][=C][C][=C][Branch1][S][C][Branch1][C][F][Branch1][C][F][P][=Branch1][C][=O][Branch1][C][O][O][C][=C][Ring1][=N][Branch2][Ring1][O][C][C][=C][C][=C][Branch1][S][C][Branch1][C][F][Branch1][C][F][P][=Branch1][C][=O][Branch1][C][O][O][C][=C][Ring1][=N][C][=Branch1][C][=O][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1] targets the compound SELFIES O=C(O)[C@H](Oc1ccc(-c2ccc(-c3c(Cc4ccc(O)cc4O)sc4ccccc34)cc2)cc1)c1ccccc1."}", "/scratch/micpie/export/compound_protein_compound_3/train_5-0.jsonl": 
"{"text":"The compound SMILES COc1ccc(\/C=C2\\COc3ccccc3C2=O)c(O)c1 targets the protein MAO-B and which is also targeted by the compound Clc1ccc(\/C=N\/Nc2nc3ccccc3nc2Cc2ccccc2)cc1."} {"text":"The compound SELFIES [N][C][=C][C][=C][Branch2][Ring1][#C][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][N][C][C][Ring1][#Branch1][C][=C][C][=C][Branch1][C][Cl][C][Branch1][C][Cl][=C][Ring1][Branch2][N][=N][Ring2][Ring1][Branch2] targets the protein Solute carrier family 6 member 3 and which is also targeted by the compound COC[C@@]12CNCC[C@]1(c1ccc3ccccc3c1)C2."}", "/scratch/micpie/export/compound_protein_compound_3/test_6-2.jsonl": "{"text":"User: Can you give me one example for a compound InChI that targets the protein DAT?\nAssistant: Of course, the compound InChI InChI=1S\/C11H11Cl2N\/c12-9-2-1-7(3-10(9)13)11-4-8(11)5-14-6-11\/h1-3,8,14H,4-6H2\/t8-,11+\/m1\/s1 targets the protein DAT.\nUser: Can you create another compound InChI that targets the protein DAT?\nAssistant: Of course, the compound InChI InChI=1S\/C11H11Cl2N\/c12-9-2-1-7(3-10(9)13)11-4-8(11)5-14-6-11\/h1-3,8,14H,4-6H2\/t8-,11+\/m1\/s1 targets the compound InChI COC[C@@]12CNCC[C@]1(c1ccc3ccccc3c1)C2."} {"text":"User: Can you give me an example for a compound SMILES that targets the protein Carbonic anhydrase II?\nAssistant: Sure, the compound SMILES COc1ccccc1C(C)NS(N)(=O)=O targets the protein Carbonic anhydrase II.\nUser: Can you create another compound SMILES that targets the protein Carbonic anhydrase II?\nAssistant: Sure, the compound SMILES COc1ccccc1C(C)NS(N)(=O)=O targets the compound SMILES CC(=O)NC1Cc2ccc(S(N)(=O)=O)cc2C1."}", "/scratch/micpie/export/compound_protein_compound_3/valid_0-1.jsonl": "{"text":"The protein Tyr-DNA phosphodiesterase 1 is targeted by the compound with the SELFIES [Br].[C][C][O][C][=Branch1][C][=O][C][C][C][C][S][C][Branch1][C][N][=N][C][=Ring1][=Branch1][C][NH1][C][Branch1][C][O][=N][C][=Ring1][=Branch1][C] and C\/C(=N\\NC(=O)c1cc(-c2cccs2)nc2ccccc12)c1ccccn1."} 
{"text":"The protein Protein-tyrosine phosphatase 1B is targeted by the compound with the DeepSMILES O=COCcccccc6))))))))CCccccCF)F)P=O)O)O)))cc6)))))))CccccCF)F)P=O)O)O)))cc6)))))))C=O)OCcccccc6 and O=C(O)[C@H](Oc1ccc(-c2ccc(-c3c(Cc4ccc(O)cc4O)sc4ccccc34)cc2)cc1)c1ccccc1."}", "/scratch/micpie/export/compound_protein_compound_3/valid_7-1.jsonl": "{"text":"The protein Carbonic anhydrase 2 is targeted by the compound with the canonical SMILES NS(=O)(=O)c1ccc(-c2cn([C@H]3OC[C@@H](O)[C@@H](O)[C@@H]3O)nn2)cc1 and CC(=O)NC1Cc2ccc(S(N)(=O)=O)cc2C1."} {"text":"The protein Cyanamide hydratase CA1 is targeted by the compound with the SELFIES [C][C][=C][C][=C][Branch2][Ring1][#Branch1][S][=Branch1][C][=O][=Branch1][C][=O][N][C][=C][Branch1][C][Br][C][=Branch1][C][=O][NH1][C][Ring1][Branch2][=O][C][=C][Ring2][Ring1][C] and NS(=O)(=O)c1ccc(C(=O)NCc2cn([C@@H]3O[C@H](CO)[C@@H](O[C@H]4O[C@H](CO)[C@@H](O)[C@H](O)[C@H]4O)[C@H](O)[C@H]3O)nn2)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/train_2-1.jsonl": "{"text":"The protein Serine\/threonine protein kinase PIK3CA is targeted by the compound with the InChI InChI=1S\/C28H20F3NO3\/c29-28(30,31)26(33)19-6-8-20(9-7-19)27(34)32-13-14-35-25-12-11-23(16-24(25)17-32)22-10-5-18-3-1-2-4-21(18)15-22\/h1-12,15-16H,13-14,17H2 and Nc1cn2nc(-c3cnc(Cl)c(NS(=O)(=O)c4ccc(F)cc4)c3)ccc2n1."} {"text":"The protein Adenosine receptor A2a is targeted by the compound with the canonical SMILES O=C(c1ccccc1)N(C(=O)c1ccccc1)c1nc(Cl)c2cn(CCc3ccccc3)nc2n1 and Cn1c(=O)n(CC2CC2)c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc21."}", "/scratch/micpie/export/compound_protein_compound_3/valid_1-1.jsonl": "{"text":"The protein Tyrosine-protein phosphatase non-receptor type 1 is targeted by the compound with the DeepSMILES NS=O)=O)ccc-ccccCSCccccCF)F)P=O)O)O)))cBr)c6)))))))))cc6))))))ccc6Br and O=C(O)[C@H](Oc1ccc(-c2ccc(-c3c(Cc4ccc(O)cc4O)sc4ccccc34)cc2)cc1)c1ccccc1."} {"text":"The protein PtdIns-3-kinase subunit alpha is targeted by the compound with the 
DeepSMILES CNC)cccC=O)O))ncc-ccccc[nH]ccc95)))))))))ncNCCOCC6))))))nc%106 and Nc1cn2nc(-c3cnc(Cl)c(NS(=O)(=O)c4ccc(F)cc4)c3)ccc2n1."}", "/scratch/micpie/export/compound_protein_compound_3/test_3-1.jsonl": "{"text":"The protein Adenosine receptor A2a is targeted by the compound with the DeepSMILES N#CcscNC=O)cccco5)))))))nc5-cccccc6 and Cn1c(=O)n(CC2CC2)c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc21."} {"text":"The protein CD antigen CD294 is targeted by the compound with the canonical SMILES C[C@@H]1CN(c2ccc(F)cc2)CCN1C(=O)c1ccc2c(=O)n(-c3ccc(F)cc3)c(CCCCC(=O)NS(=O)(=O)C3CC3)cc2c1 and COc1ccc(CC(=O)O)cc1C1=NCC(=O)N(Cc2ccc(Cl)cc2Cl)c2ccccc21."}", "/scratch/micpie/export/compound_protein_compound_3/train_9-0.jsonl": "{"text":"The compound canonical SMILES COc1ccc2c(=O)c(-c3ccc(OC)c(OC)c3)c(N)oc2c1 targets the protein Carbonate dehydratase II and which is also targeted by the compound NS(=O)(=O)c1ccc(CCOC(=O)CN(CCN(CC(=O)O)CC(=O)O)CC(=O)O)cc1."} {"text":"The compound canonical SMILES Cc1c(C(=O)NCCCN(C)C)cc(-c2ccccc2)n1-c1ccc(S(N)(=O)=O)cc1 targets the protein CAB and which is also targeted by the compound NS(=O)(=O)c1ccc(\/N=C\/c2ccccc2)c(F)c1."}", "/scratch/micpie/export/compound_protein_compound_3/valid_9-2.jsonl": "{"text":"User: Can you come up with an example for a compound SMILES that targets the protein Carbonic anhydrase 2?\nAssistant: Yes, of course, the compound SMILES COc1cccc(OC)c1C(=O)Oc1ccc2nc(S(N)(=O)=O)sc2c1 targets the protein Carbonic anhydrase 2.\nUser: Can you create another compound SMILES that targets the protein Carbonic anhydrase 2?\nAssistant: Sure, the compound SMILES COc1cccc(OC)c1C(=O)Oc1ccc2nc(S(N)(=O)=O)sc2c1 targets the compound SMILES NS(=O)(=O)c1ccc(CCOC(=O)CN(CCN(CC(=O)O)CC(=O)O)CC(=O)O)cc1."} {"text":"User: Can you come up with an example for a compound SELFIES that targets the protein Carbonate dehydratase I?\nAssistant: Yes, of course, the compound SELFIES 
[C][O][C][=C][C][=C][C][=Branch1][C][=O][N][Branch2][Ring1][Ring1][C][=C][C][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][C][=C][Ring1][#Branch2][C][Branch1][C][S][=N][C][Ring2][Ring1][C][=C][Ring2][Ring1][=Branch1][O][C] targets the protein Carbonate dehydratase I.\nUser: Can you tell another compound SELFIES that targets the protein Carbonate dehydratase I?\nAssistant: Of course, the compound SELFIES [C][O][C][=C][C][=C][C][=Branch1][C][=O][N][Branch2][Ring1][Ring1][C][=C][C][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][C][=C][Ring1][#Branch2][C][Branch1][C][S][=N][C][Ring2][Ring1][C][=C][Ring2][Ring1][=Branch1][O][C] targets the compound SELFIES NS(=O)(=O)c1ccc(\/N=C\/c2ccccc2)c(F)c1."}", "/scratch/micpie/export/compound_protein_compound_3/test_1-0.jsonl": "{"text":"The compound SMILES N[C@@H](Cc1ccc(C(F)(F)P(=O)(O)O)cc1)C(=O)O targets the protein Protein-tyrosine phosphatase 1B and which is also targeted by the compound O=C(O)[C@H](Oc1ccc(-c2ccc(-c3c(Cc4ccc(O)cc4O)sc4ccccc34)cc2)cc1)c1ccccc1."} {"text":"The compound SELFIES [C][C][C][=N][C][=N][C][Branch2][Ring2][#C][C][=C][C][=C][Branch2][Ring1][S][C][=Branch1][C][=O][N][C][C@H1][C][N][Branch1][=C][C][=Branch1][C][=O][O][C][Branch1][C][C][Branch1][C][C][C][C][C@H1][Ring1][N][C][Ring1][#C][C][Branch1][C][F][=C][Ring2][Ring1][Branch2][=C][Ring2][Ring1][=C][C][#C][C][=C][C][=C][Branch1][C][N][N][=C][Ring1][#Branch1] targets the protein Phosphoinositide-3-kinase catalytic alpha polypeptide and which is also targeted by the compound Nc1cn2nc(-c3cnc(Cl)c(NS(=O)(=O)c4ccc(F)cc4)c3)ccc2n1."}", "/scratch/micpie/export/compound_protein_compound_3/train_0-2.jsonl": "{"text":"User: Can you come up with an example for a compound SMILES that targets the protein Tyrosyl-DNA phosphodiesterase 1?\nAssistant: Sure, the compound SMILES Cn1cccc1\/C=C1\\CCC\/C(=C\\c2cccn2C)C1=O targets the protein Tyrosyl-DNA phosphodiesterase 1.\nUser: Can you create another compound SMILES that targets the 
protein Tyrosyl-DNA phosphodiesterase 1?\nAssistant: Sure, the compound SMILES Cn1cccc1\/C=C1\\CCC\/C(=C\\c2cccn2C)C1=O targets the compound SMILES C\/C(=N\\NC(=O)c1cc(-c2cccs2)nc2ccccc12)c1ccccn1."} {"text":"User: Can you come up with an example for a compound canonical SMILES that targets the protein Protein-tyrosine phosphatase 1B?\nAssistant: Yes, of course, the compound canonical SMILES O=C(O)c1ccc(S(=O)(=O)Oc2ccc(-c3ccc(-c4c(Cc5ccccc5)oc5ccccc45)cc3)cc2)cc1 targets the protein Protein-tyrosine phosphatase 1B.\nUser: Can you tell another compound canonical SMILES that targets the protein Protein-tyrosine phosphatase 1B?\nAssistant: Of course, the compound canonical SMILES O=C(O)c1ccc(S(=O)(=O)Oc2ccc(-c3ccc(-c4c(Cc5ccccc5)oc5ccccc45)cc3)cc2)cc1 targets the compound canonical SMILES O=C(O)[C@H](Oc1ccc(-c2ccc(-c3c(Cc4ccc(O)cc4O)sc4ccccc34)cc2)cc1)c1ccccc1."}", "/scratch/micpie/export/compound_protein_compound_3/train_4-2.jsonl": "{"text":"User: Can you give me one example for a compound InChI that targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells?\nAssistant: Of course, the compound InChI InChI=1S\/C22H17F3N2O2\/c1-12-6-7-18-16(10-12)20(13(2)27(18)11-19(28)29)14-8-9-26-21-15(14)4-3-5-17(21)22(23,24)25\/h3-10H,11H2,1-2H3,(H,28,29) targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells.\nUser: Can you create another compound InChI that targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells?\nAssistant: Of course, the compound InChI InChI=1S\/C22H17F3N2O2\/c1-12-6-7-18-16(10-12)20(13(2)27(18)11-19(28)29)14-8-9-26-21-15(14)4-3-5-17(21)22(23,24)25\/h3-10H,11H2,1-2H3,(H,28,29) targets the compound InChI COc1ccc(CC(=O)O)cc1C1=NCC(=O)N(Cc2ccc(Cl)cc2Cl)c2ccccc21."} {"text":"User: Can you give me one example for a compound SELFIES that targets the protein Monoamine oxidase type B?\nAssistant: Yes, the compound SELFIES 
[C][C][=C][C][=C][O][C][=Branch1][C][=O][C][Branch1][Branch2][C][=C][C][=C][S][Ring1][Branch1][=C][C][Ring1][N][=C][Ring1][S] targets the protein Monoamine oxidase type B.\nUser: Can you generate another compound SELFIES that targets the protein Monoamine oxidase type B?\nAssistant: Yes, the compound SELFIES [C][C][=C][C][=C][O][C][=Branch1][C][=O][C][Branch1][Branch2][C][=C][C][=C][S][Ring1][Branch1][=C][C][Ring1][N][=C][Ring1][S] targets the compound SELFIES Clc1ccc(\/C=N\/Nc2nc3ccccc3nc2Cc2ccccc2)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/test_6-1.jsonl": "{"text":"The protein Sodium-dependent dopamine transporter is targeted by the compound with the SELFIES [Cl][C][=C][C][=C][Branch1][O][C@][C][N][C][C@H1][Ring1][Branch1][C][Ring1][=Branch1][C][=C][Ring1][N][Cl] and COC[C@@]12CNCC[C@]1(c1ccc3ccccc3c1)C2."} {"text":"The protein Carbonic anhydrase II is targeted by the compound with the SELFIES [C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Branch1][C][C][N][S][Branch1][C][N][=Branch1][C][=O][=O] and CC(=O)NC1Cc2ccc(S(N)(=O)=O)cc2C1."}", "/scratch/micpie/export/compound_protein_compound_3/valid_4-1.jsonl": "{"text":"The protein Chemoattractant receptor-homologous molecule expressed on Th2 cells is targeted by the compound with the InChI InChI=1S\/C20H20FN3O4S\/c1-23(29(27,28)15-7-4-13(21)5-8-15)14-6-9-17-16(11-19(25)26)20-18(24(17)12-14)3-2-10-22-20\/h2-5,7-8,10,14H,6,9,11-12H2,1H3,(H,25,26)\/t14-\/m1\/s1 and COc1ccc(CC(=O)O)cc1C1=NCC(=O)N(Cc2ccc(Cl)cc2Cl)c2ccccc21."} {"text":"The protein Monoamine oxidase type B is targeted by the compound with the canonical SMILES Cc1cncn1-c1ccc2nc(-c3ccccc3)ncc2c1 and Clc1ccc(\/C=N\/Nc2nc3ccccc3nc2Cc2ccccc2)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/test_2-2.jsonl": "{"text":"User: Can you come up with an example for a compound DeepSMILES that targets the protein PtdIns-3-kinase subunit alpha?\nAssistant: Yes, of course, the compound DeepSMILES 
COC=O)cccCNccccS=O)=O)NcnncC)s5)))))))cc6))))))))cc-ccccOC))nc6))))))c6 targets the protein PtdIns-3-kinase subunit alpha.\nUser: Can you create another compound DeepSMILES that targets the protein PtdIns-3-kinase subunit alpha?\nAssistant: Sure, the compound DeepSMILES COC=O)cccCNccccS=O)=O)NcnncC)s5)))))))cc6))))))))cc-ccccOC))nc6))))))c6 targets the compound DeepSMILES Nc1cn2nc(-c3cnc(Cl)c(NS(=O)(=O)c4ccc(F)cc4)c3)ccc2n1."} {"text":"User: Can you give me one example for a compound canonical SMILES that targets the protein Adenosine receptor A2a?\nAssistant: Of course, the compound canonical SMILES N#Cc1c(NC(=O)C2CC2)nc(-c2ccccc2)nc1-c1ccccc1 targets the protein Adenosine receptor A2a.\nUser: Can you create another compound canonical SMILES that targets the protein Adenosine receptor A2a?\nAssistant: Yes, the compound canonical SMILES N#Cc1c(NC(=O)C2CC2)nc(-c2ccccc2)nc1-c1ccccc1 targets the compound canonical SMILES Cn1c(=O)n(CC2CC2)c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc21."}", "/scratch/micpie/export/compound_protein_compound_3/train_1-1.jsonl": "{"text":"The protein Protein-tyrosine phosphatase 1B is targeted by the compound with the canonical SMILES NS(=O)(=O)C(F)(F)c1cc2cc(CN(Cc3ccc(-c4csnn4)cc3)S(=O)(=O)c3ccc(OCC(=O)O)cc3)ccc2cc1F and O=C(O)[C@H](Oc1ccc(-c2ccc(-c3c(Cc4ccc(O)cc4O)sc4ccccc34)cc2)cc1)c1ccccc1."} {"text":"The protein PI3Kalpha is targeted by the compound with the SMILES C[C@@H]1CC[C@H]2C(C)(C)CCC[C@]2(C)c2c1oc1c(Br)c(O)ccc21 and Nc1cn2nc(-c3cnc(Cl)c(NS(=O)(=O)c4ccc(F)cc4)c3)ccc2n1."}", "/scratch/micpie/export/compound_protein_compound_3/valid_7-0.jsonl": "{"text":"The compound InChI InChI=1S\/C13H16N4O6S\/c14-24(21,22)8-3-1-7(2-4-8)9-5-17(16-15-9)13-12(20)11(19)10(18)6-23-13\/h1-5,10-13,18-20H,6H2,(H2,14,21,22)\/t10-,11-,12+,13+\/m1\/s1 targets the protein Carbonic anhydrase 2 and which is also targeted by the compound CC(=O)NC1Cc2ccc(S(N)(=O)=O)cc2C1."} {"text":"The compound canonical SMILES Cc1ccc(S(=O)(=O)n2cc(Br)c(=O)[nH]c2=O)cc1 targets the 
protein Cyanamide hydratase CA1 and which is also targeted by the compound NS(=O)(=O)c1ccc(C(=O)NCc2cn([C@@H]3O[C@H](CO)[C@@H](O[C@H]4O[C@H](CO)[C@@H](O)[C@H](O)[C@H]4O)[C@H](O)[C@H]3O)nn2)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/valid_8-1.jsonl": "{"text":"The protein CAB is targeted by the compound with the SMILES NS(=O)(=O)c1ccc(CCn2c(S)nc3ccc(I)cc3c2=O)cc1 and NS(=O)(=O)c1ccc(C(=O)NCc2cn([C@@H]3O[C@H](CO)[C@@H](O[C@H]4O[C@H](CO)[C@@H](O)[C@H](O)[C@H]4O)[C@H](O)[C@H]3O)nn2)cc1."} {"text":"The protein Carbonic anhydrase 2 is targeted by the compound with the InChI InChI=1S\/C9H14N2O4S3\/c1-5-3-7(11-2)6-4-8(18(10,14)15)16-9(6)17(5,12)13\/h4-5,7,11H,3H2,1-2H3,(H2,10,14,15)\/t5-,7-\/m0\/s1 and NS(=O)(=O)c1ccc(CCOC(=O)CN(CCN(CC(=O)O)CC(=O)O)CC(=O)O)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/train_0-1.jsonl": "{"text":"The protein Tyr-DNA phosphodiesterase 1 is targeted by the compound with the SELFIES [C][N][C][=C][C][=C][Ring1][Branch1][\/C][=C][\\C][C][C][\/C][=Branch1][#Branch2][=C][\\C][=C][C][=C][N][Ring1][Branch1][C][C][Ring1][=N][=O] and C\/C(=N\\NC(=O)c1cc(-c2cccs2)nc2ccccc12)c1ccccn1."} {"text":"The protein Protein-tyrosine phosphatase 1B is targeted by the compound with the SELFIES [O][=C][Branch1][C][O][C][=C][C][=C][Branch2][Branch1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][O][C][=C][C][=C][Branch2][Ring2][Branch1][C][=C][C][=C][Branch2][Ring1][=Branch2][C][=C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][=C][C][=C][C][=C][Ring1][S][Ring1][=Branch1][C][=C][Ring2][Ring1][=Branch1][C][=C][Ring2][Ring1][N][C][=C][Ring2][Ring2][=Branch1] and O=C(O)[C@H](Oc1ccc(-c2ccc(-c3c(Cc4ccc(O)cc4O)sc4ccccc34)cc2)cc1)c1ccccc1."}", "/scratch/micpie/export/compound_protein_compound_3/valid_8-0.jsonl": "{"text":"The compound canonical SMILES NS(=O)(=O)c1ccc(CCn2c(S)nc3ccc(I)cc3c2=O)cc1 targets the protein Carbonate dehydratase I and which is also targeted by the compound 
NS(=O)(=O)c1ccc(C(=O)NCc2cn([C@@H]3O[C@H](CO)[C@@H](O[C@H]4O[C@H](CO)[C@@H](O)[C@H](O)[C@H]4O)[C@H](O)[C@H]3O)nn2)cc1."} {"text":"The compound SELFIES [C][N][C@H1][C][C@H1][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][S][C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][=C][C][=Ring1][=Branch2][Ring1][S] targets the protein Carbonic anhydrase C and which is also targeted by the compound NS(=O)(=O)c1ccc(CCOC(=O)CN(CCN(CC(=O)O)CC(=O)O)CC(=O)O)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/test_9-1.jsonl": "{"text":"The protein Carbonate dehydratase II is targeted by the compound with the SELFIES [C][C][Branch1][C][C][C][C][Branch1][C][N][C][=Branch1][C][=O][C@@H1][Branch1][C][N][C][=Branch1][C][=O][N][C][C][O][C][C][O][C][C][N][C][=Branch1][C][=O][C][=C][C][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][C][=C][Ring1][#Branch2] and NS(=O)(=O)c1ccc(CCOC(=O)CN(CCN(CC(=O)O)CC(=O)O)CC(=O)O)cc1."} {"text":"The protein CAB is targeted by the compound with the InChI InChI=1S\/C9H11ClN2O3\/c10-8-3-1-7(2-4-8)6-15-11-5-9(13)12-14\/h1-4,11,14H,5-6H2,(H,12,13) and NS(=O)(=O)c1ccc(\/N=C\/c2ccccc2)c(F)c1."}", "/scratch/micpie/export/compound_protein_compound_3/valid_1-0.jsonl": "{"text":"The compound canonical SMILES NS(=O)(=O)c1cc(-c2ccc(CSCc3ccc(C(F)(F)P(=O)(O)O)c(Br)c3)cc2)ccc1Br targets the protein Protein-tyrosine phosphatase 1B and which is also targeted by the compound O=C(O)[C@H](Oc1ccc(-c2ccc(-c3c(Cc4ccc(O)cc4O)sc4ccccc34)cc2)cc1)c1ccccc1."} {"text":"The compound InChI InChI=1S\/C22H22N6O3\/c1-27(2)17-12-16(21(29)30)24-20-18(14-4-3-5-15-13(14)6-7-23-15)25-22(26-19(17)20)28-8-10-31-11-9-28\/h3-7,12,23H,8-11H2,1-2H3,(H,29,30) targets the protein PtdIns-3-kinase subunit p110-alpha and which is also targeted by the compound Nc1cn2nc(-c3cnc(Cl)c(NS(=O)(=O)c4ccc(F)cc4)c3)ccc2n1."}", "/scratch/micpie/export/compound_protein_compound_3/train_6-2.jsonl": "{"text":"User: Can you come up with one example for a compound InChI that 
targets the protein DAT?\nAssistant: Yes, the compound InChI InChI=1S\/C18H19N3\/c1-2-21-11-15-5-3-4-6-16(15)17(12-21)13-7-8-14-10-19-20-18(14)9-13\/h3-10,17H,2,11-12H2,1H3,(H,19,20) targets the protein DAT.\nUser: Can you generate another compound InChI that targets the protein DAT?\nAssistant: Yes, the compound InChI InChI=1S\/C18H19N3\/c1-2-21-11-15-5-3-4-6-16(15)17(12-21)13-7-8-14-10-19-20-18(14)9-13\/h3-10,17H,2,11-12H2,1H3,(H,19,20) targets the compound InChI COC[C@@]12CNCC[C@]1(c1ccc3ccccc3c1)C2."} {"text":"User: Can you come up with one example for a compound InChI that targets the protein Carbonic anhydrase 2?\nAssistant: Of course, the compound InChI InChI=1S\/C9H7Cl2N5O3S2\/c10-5-2-1-4(3-6(5)11)13-7(17)14-8-15-16-9(20-8)21(12,18)19\/h1-3H,(H2,12,18,19)(H2,13,14,15,17) targets the protein Carbonic anhydrase 2.\nUser: Can you generate another compound InChI that targets the protein Carbonic anhydrase 2?\nAssistant: Yes, of course, the compound InChI InChI=1S\/C9H7Cl2N5O3S2\/c10-5-2-1-4(3-6(5)11)13-7(17)14-8-15-16-9(20-8)21(12,18)19\/h1-3H,(H2,12,18,19)(H2,13,14,15,17) targets the compound InChI CC(=O)NC1Cc2ccc(S(N)(=O)=O)cc2C1."}", "/scratch/micpie/export/compound_protein_compound_3/valid_6-2.jsonl": "{"text":"User: Can you give me one example for a compound DeepSMILES that targets the protein DA transporter?\nAssistant: Of course, the compound DeepSMILES CCCC=O)NCCNCCCC)C)))CccccCl)cCl)c6))))))CCC4 targets the protein DA transporter.\nUser: Can you create another compound DeepSMILES that targets the protein DA transporter?\nAssistant: Yes, of course, the compound DeepSMILES CCCC=O)NCCNCCCC)C)))CccccCl)cCl)c6))))))CCC4 targets the compound DeepSMILES COC[C@@]12CNCC[C@]1(c1ccc3ccccc3c1)C2."} {"text":"User: Can you give me one example for a compound InChI that targets the protein Carbonic anhydrase 2?\nAssistant: Yes, of course, the compound InChI 
InChI=1S\/C34H50N4O11S\/c1-19(2)12-27(39)45-18-26-31(47-28(40)13-20(3)4)32(48-29(41)14-21(5)6)33(49-30(42)15-22(7)8)34(46-26)38-17-25(36-37-38)23-10-9-11-24(16-23)50(35,43)44\/h9-11,16-17,19-22,26,31-34H,12-15,18H2,1-8H3,(H2,35,43,44)\/t26-,31-,32+,33-,34-\/m1\/s1 targets the protein Carbonic anhydrase 2.\nUser: Can you create another compound InChI that targets the protein Carbonic anhydrase 2?\nAssistant: Sure, the compound InChI InChI=1S\/C34H50N4O11S\/c1-19(2)12-27(39)45-18-26-31(47-28(40)13-20(3)4)32(48-29(41)14-21(5)6)33(49-30(42)15-22(7)8)34(46-26)38-17-25(36-37-38)23-10-9-11-24(16-23)50(35,43)44\/h9-11,16-17,19-22,26,31-34H,12-15,18H2,1-8H3,(H2,35,43,44)\/t26-,31-,32+,33-,34-\/m1\/s1 targets the compound InChI CC(=O)NC1Cc2ccc(S(N)(=O)=O)cc2C1."}", "/scratch/micpie/export/compound_protein_compound_3/train_6-0.jsonl": "{"text":"The compound InChI InChI=1S\/C18H19N3\/c1-2-21-11-15-5-3-4-6-16(15)17(12-21)13-7-8-14-10-19-20-18(14)9-13\/h3-10,17H,2,11-12H2,1H3,(H,19,20) targets the protein DAT and which is also targeted by the compound COC[C@@]12CNCC[C@]1(c1ccc3ccccc3c1)C2."} {"text":"The compound SMILES NS(=O)(=O)c1nnc(NC(=O)Nc2ccc(Cl)c(Cl)c2)s1 targets the protein Carbonic anhydrase C and which is also targeted by the compound CC(=O)NC1Cc2ccc(S(N)(=O)=O)cc2C1."}", "/scratch/micpie/export/compound_protein_compound_3/train_8-2.jsonl": "{"text":"User: Can you come up with an example for a compound InChI that targets the protein Carbonic anhydrase I?\nAssistant: Sure, the compound InChI InChI=1S\/C15H14Br2N2O3S\/c16-12-7-11(15(20)14(17)8-12)9-19-6-5-10-1-3-13(4-2-10)23(18,21)22\/h1-4,7-9,20H,5-6H2,(H2,18,21,22)\/b19-9- targets the protein Carbonic anhydrase I.\nUser: Can you create another compound InChI that targets the protein Carbonic anhydrase I?\nAssistant: Sure, the compound InChI InChI=1S\/C15H14Br2N2O3S\/c16-12-7-11(15(20)14(17)8-12)9-19-6-5-10-1-3-13(4-2-10)23(18,21)22\/h1-4,7-9,20H,5-6H2,(H2,18,21,22)\/b19-9- targets the compound InChI 
NS(=O)(=O)c1ccc(C(=O)NCc2cn([C@@H]3O[C@H](CO)[C@@H](O[C@H]4O[C@H](CO)[C@@H](O)[C@H](O)[C@H]4O)[C@H](O)[C@H]3O)nn2)cc1."} {"text":"User: Can you give me one example for a compound canonical SMILES that targets the protein Cyanamide hydratase CA2?\nAssistant: Yes, of course, the compound canonical SMILES N#Cc1cn(-c2nc3ccc(S(N)(=O)=O)cc3s2)nc1-c1ccc(Cl)cc1 targets the protein Cyanamide hydratase CA2.\nUser: Can you generate another compound canonical SMILES that targets the protein Cyanamide hydratase CA2?\nAssistant: Yes, the compound canonical SMILES N#Cc1cn(-c2nc3ccc(S(N)(=O)=O)cc3s2)nc1-c1ccc(Cl)cc1 targets the compound canonical SMILES NS(=O)(=O)c1ccc(CCOC(=O)CN(CCN(CC(=O)O)CC(=O)O)CC(=O)O)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/train_3-1.jsonl": "{"text":"The protein Adenosine receptor A2a is targeted by the compound with the InChI InChI=1S\/C22H17N7O\/c23-22-25-19-17(21-24-20(27-29(21)22)18-9-4-12-30-18)13-28(26-19)11-10-15-7-3-6-14-5-1-2-8-16(14)15\/h1-9,12-13H,10-11H2,(H2,23,25,26) and Cn1c(=O)n(CC2CC2)c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc21."} {"text":"The protein CD antigen CD294 is targeted by the compound with the SELFIES [C][N][Branch1][N][C][=N][C][=C][Branch1][C][F][C][=N][Ring1][#Branch1][C][C][C][C][=C][Branch1][#Branch1][C][C][=Branch1][C][=O][O][C][=C][C][Branch1][C][Cl][=C][C][=C][Ring1][#Branch1][N][Ring1][=C][C][Ring2][Ring1][C] and COc1ccc(CC(=O)O)cc1C1=NCC(=O)N(Cc2ccc(Cl)cc2Cl)c2ccccc21."}", "/scratch/micpie/export/compound_protein_compound_3/test_8-0.jsonl": "{"text":"The compound DeepSMILES CcccC)[n+]-ccccSN)=O)=O))cc6))))))cC)c6C.[O-][Cl+3][O-])[O-])[O-] targets the protein Cyanamide hydratase CA1 and which is also targeted by the compound NS(=O)(=O)c1ccc(C(=O)NCc2cn([C@@H]3O[C@H](CO)[C@@H](O[C@H]4O[C@H](CO)[C@@H](O)[C@H](O)[C@H]4O)[C@H](O)[C@H]3O)nn2)cc1."} {"text":"The compound DeepSMILES NS=O)=O)ccccNC=O)NCCNCCO)))CC6))))))))cc6 targets the protein Carbonic anhydrase II and which is also targeted by the compound 
NS(=O)(=O)c1ccc(CCOC(=O)CN(CCN(CC(=O)O)CC(=O)O)CC(=O)O)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/valid_3-1.jsonl": "{"text":"The protein Adenosine receptor A2a is targeted by the compound with the SMILES CONc1nc(C#Cc2ccncc2)nc2c1ncn2[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O and Cn1c(=O)n(CC2CC2)c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc21."} {"text":"The protein Prostaglandin D2 receptor 2 is targeted by the compound with the SMILES Cc1ccc(S(=O)(=O)NC(C)(C)C)cc1C#Cc1cc(Cl)ccc1OCC(=O)O and COc1ccc(CC(=O)O)cc1C1=NCC(=O)N(Cc2ccc(Cl)cc2Cl)c2ccccc21."}", "/scratch/micpie/export/compound_protein_compound_3/train_9-1.jsonl": "{"text":"The protein CAC is targeted by the compound with the DeepSMILES COccccc=O)c-ccccOC))cOC))c6))))))cN)oc6c%10 and NS(=O)(=O)c1ccc(CCOC(=O)CN(CCN(CC(=O)O)CC(=O)O)CC(=O)O)cc1."} {"text":"The protein CAB is targeted by the compound with the InChI InChI=1S\/C23H28N4O3S\/c1-17-21(23(28)25-14-7-15-26(2)3)16-22(18-8-5-4-6-9-18)27(17)19-10-12-20(13-11-19)31(24,29)30\/h4-6,8-13,16H,7,14-15H2,1-3H3,(H,25,28)(H2,24,29,30) and NS(=O)(=O)c1ccc(\/N=C\/c2ccccc2)c(F)c1."}", "/scratch/micpie/export/compound_protein_compound_3/test_4-1.jsonl": "{"text":"The protein Prostaglandin D2 receptor 2 is targeted by the compound with the InChI InChI=1S\/C21H15F2N3O2S\/c22-14-5-1-12(2-6-14)19(13-3-7-15(23)8-4-13)21-25-20(16-9-10-24-26-16)17(29-21)11-18(27)28\/h1-10,19H,11H2,(H,24,26)(H,27,28) and COc1ccc(CC(=O)O)cc1C1=NCC(=O)N(Cc2ccc(Cl)cc2Cl)c2ccccc21."} {"text":"The protein Monoamine oxidase type B is targeted by the compound with the DeepSMILES CCCSccccC[C@H]C)N)))cc6 and Clc1ccc(\/C=N\/Nc2nc3ccccc3nc2Cc2ccccc2)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/valid_6-1.jsonl": "{"text":"The protein Solute carrier family 6 member 3 is targeted by the compound with the SMILES CCCC(=O)NCCNC(CC(C)C)C1(c2ccc(Cl)c(Cl)c2)CCC1 and COC[C@@]12CNCC[C@]1(c1ccc3ccccc3c1)C2."} {"text":"The protein Carbonic anhydrase II is targeted by the compound with the InChI 
InChI=1S\/C34H50N4O11S\/c1-19(2)12-27(39)45-18-26-31(47-28(40)13-20(3)4)32(48-29(41)14-21(5)6)33(49-30(42)15-22(7)8)34(46-26)38-17-25(36-37-38)23-10-9-11-24(16-23)50(35,43)44\/h9-11,16-17,19-22,26,31-34H,12-15,18H2,1-8H3,(H2,35,43,44)\/t26-,31-,32+,33-,34-\/m1\/s1 and CC(=O)NC1Cc2ccc(S(N)(=O)=O)cc2C1."}", "/scratch/micpie/export/compound_protein_compound_3/train_1-2.jsonl": "{"text":"User: Can you come up with one example for a compound SELFIES that targets the protein Tyrosine-protein phosphatase non-receptor type 1?\nAssistant: Sure, the compound SELFIES [N][S][=Branch1][C][=O][=Branch1][C][=O][C][Branch1][C][F][Branch1][C][F][C][=C][C][=C][C][Branch2][Ring2][S][C][N][Branch2][Ring1][Ring1][C][C][=C][C][=C][Branch1][Branch2][C][=C][S][N][=N][Ring1][Branch1][C][=C][Ring1][O][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][Branch2][O][C][C][=Branch1][C][=O][O][C][=C][Ring1][O][=C][C][=C][Ring2][Ring2][C][C][=C][Ring2][Ring2][=Branch1][F] targets the protein Tyrosine-protein phosphatase non-receptor type 1.\nUser: Can you generate another compound SELFIES that targets the protein Tyrosine-protein phosphatase non-receptor type 1?\nAssistant: Yes, the compound SELFIES [N][S][=Branch1][C][=O][=Branch1][C][=O][C][Branch1][C][F][Branch1][C][F][C][=C][C][=C][C][Branch2][Ring2][S][C][N][Branch2][Ring1][Ring1][C][C][=C][C][=C][Branch1][Branch2][C][=C][S][N][=N][Ring1][Branch1][C][=C][Ring1][O][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][Branch2][O][C][C][=Branch1][C][=O][O][C][=C][Ring1][O][=C][C][=C][Ring2][Ring2][C][C][=C][Ring2][Ring2][=Branch1][F] targets the compound SELFIES O=C(O)[C@H](Oc1ccc(-c2ccc(-c3c(Cc4ccc(O)cc4O)sc4ccccc34)cc2)cc1)c1ccccc1."} {"text":"User: Can you give me one example for a compound InChI that targets the protein Serine\/threonine protein kinase PIK3CA?\nAssistant: Yes, of course, the compound InChI 
InChI=1S\/C21H27BrO2\/c1-12-6-9-15-20(2,3)10-5-11-21(15,4)16-13-7-8-14(23)17(22)19(13)24-18(12)16\/h7-8,12,15,23H,5-6,9-11H2,1-4H3\/t12-,15+,21+\/m1\/s1 targets the protein Serine\/threonine protein kinase PIK3CA.\nUser: Can you create another compound InChI that targets the protein Serine\/threonine protein kinase PIK3CA?\nAssistant: Yes, the compound InChI InChI=1S\/C21H27BrO2\/c1-12-6-9-15-20(2,3)10-5-11-21(15,4)16-13-7-8-14(23)17(22)19(13)24-18(12)16\/h7-8,12,15,23H,5-6,9-11H2,1-4H3\/t12-,15+,21+\/m1\/s1 targets the compound InChI Nc1cn2nc(-c3cnc(Cl)c(NS(=O)(=O)c4ccc(F)cc4)c3)ccc2n1."}", "/scratch/micpie/export/compound_protein_compound_3/valid_6-0.jsonl": "{"text":"The compound DeepSMILES CCCC=O)NCCNCCCC)C)))CccccCl)cCl)c6))))))CCC4 targets the protein DA transporter and which is also targeted by the compound COC[C@@]12CNCC[C@]1(c1ccc3ccccc3c1)C2."} {"text":"The compound SMILES CC(C)CC(=O)OC[C@H]1O[C@@H](n2cc(-c3cccc(S(N)(=O)=O)c3)nn2)[C@H](OC(=O)CC(C)C)[C@@H](OC(=O)CC(C)C)[C@@H]1OC(=O)CC(C)C targets the protein Carbonic anhydrase II and which is also targeted by the compound CC(=O)NC1Cc2ccc(S(N)(=O)=O)cc2C1."}", "/scratch/micpie/export/compound_protein_compound_3/train_3-0.jsonl": "{"text":"The compound InChI InChI=1S\/C22H17N7O\/c23-22-25-19-17(21-24-20(27-29(21)22)18-9-4-12-30-18)13-28(26-19)11-10-15-7-3-6-14-5-1-2-8-16(14)15\/h1-9,12-13H,10-11H2,(H2,23,25,26) targets the protein Adenosine receptor A2a and which is also targeted by the compound Cn1c(=O)n(CC2CC2)c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc21."} {"text":"The compound DeepSMILES CNcnccF)cn6))))))CCCccCC=O)O)))cccCl)ccc6n9C%13 targets the protein Prostaglandin D2 receptor 2 and which is also targeted by the compound COc1ccc(CC(=O)O)cc1C1=NCC(=O)N(Cc2ccc(Cl)cc2Cl)c2ccccc21."}", "/scratch/micpie/export/compound_protein_compound_3/test_9-2.jsonl": "{"text":"User: Can you give me one example for a compound SMILES that targets the protein CA-II?\nAssistant: Of course, the compound SMILES 
CC(C)CC(N)C(=O)[C@@H](N)C(=O)NCCOCCOCCNC(=O)c1ccc(S(N)(=O)=O)cc1 targets the protein CA-II.\nUser: Can you tell another compound SMILES that targets the protein CA-II?\nAssistant: Of course, the compound SMILES CC(C)CC(N)C(=O)[C@@H](N)C(=O)NCCOCCOCCNC(=O)c1ccc(S(N)(=O)=O)cc1 targets the compound SMILES NS(=O)(=O)c1ccc(CCOC(=O)CN(CCN(CC(=O)O)CC(=O)O)CC(=O)O)cc1."} {"text":"User: Can you come up with an example for a compound SMILES that targets the protein Carbonic anhydrase B?\nAssistant: Sure, the compound SMILES O=C(CNOCc1ccc(Cl)cc1)NO targets the protein Carbonic anhydrase B.\nUser: Can you create another compound SMILES that targets the protein Carbonic anhydrase B?\nAssistant: Yes, the compound SMILES O=C(CNOCc1ccc(Cl)cc1)NO targets the compound SMILES NS(=O)(=O)c1ccc(\/N=C\/c2ccccc2)c(F)c1."}", "/scratch/micpie/export/compound_protein_compound_3/train_7-0.jsonl": "{"text":"The compound InChI InChI=1S\/C16H16Cl2N4O3S2\/c17-13-6-3-11(9-14(13)18)21-15(23)22-16(26)20-8-7-10-1-4-12(5-2-10)27(19,24)25\/h1-6,9H,7-8H2,(H2,19,24,25)(H3,20,21,22,23,26) targets the protein Carbonic anhydrase C and which is also targeted by the compound CC(=O)NC1Cc2ccc(S(N)(=O)=O)cc2C1."} {"text":"The compound canonical SMILES CC(C)(C)c1ccc(S(=O)(=O)Nc2cccc(S(N)(=O)=O)c2)cc1 targets the protein CAB and which is also targeted by the compound NS(=O)(=O)c1ccc(C(=O)NCc2cn([C@@H]3O[C@H](CO)[C@@H](O[C@H]4O[C@H](CO)[C@@H](O)[C@H](O)[C@H]4O)[C@H](O)[C@H]3O)nn2)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/train_4-0.jsonl": "{"text":"The compound SMILES Cc1ccc2c(c1)c(-c1ccnc3c(C(F)(F)F)cccc13)c(C)n2CC(=O)O targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells and which is also targeted by the compound COc1ccc(CC(=O)O)cc1C1=NCC(=O)N(Cc2ccc(Cl)cc2Cl)c2ccccc21."} {"text":"The compound SELFIES [C][C][=C][C][=C][O][C][=Branch1][C][=O][C][Branch1][Branch2][C][=C][C][=C][S][Ring1][Branch1][=C][C][Ring1][N][=C][Ring1][S] targets the protein MAO-B and which 
is also targeted by the compound Clc1ccc(\/C=N\/Nc2nc3ccccc3nc2Cc2ccccc2)cc1."}", "/scratch/micpie/export/compound_protein_compound_3/train_3-2.jsonl": "{"text":"User: Can you give me an example for a compound DeepSMILES that targets the protein Adenosine receptor A2a?\nAssistant: Sure, the compound DeepSMILES NcncnnCCcccccccccc%106))))))))))))cc5cnc-cccco5)))))nn%125 targets the protein Adenosine receptor A2a.\nUser: Can you generate another compound DeepSMILES that targets the protein Adenosine receptor A2a?\nAssistant: Yes, of course, the compound DeepSMILES NcncnnCCcccccccccc%106))))))))))))cc5cnc-cccco5)))))nn%125 targets the compound DeepSMILES Cn1c(=O)n(CC2CC2)c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc21."} {"text":"User: Can you come up with an example for a compound SELFIES that targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells?\nAssistant: Sure, the compound SELFIES [C][N][Branch1][N][C][=N][C][=C][Branch1][C][F][C][=N][Ring1][#Branch1][C][C][C][C][=C][Branch1][#Branch1][C][C][=Branch1][C][=O][O][C][=C][C][Branch1][C][Cl][=C][C][=C][Ring1][#Branch1][N][Ring1][=C][C][Ring2][Ring1][C] targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells.\nUser: Can you create another compound SELFIES that targets the protein Chemoattractant receptor-homologous molecule expressed on Th2 cells?\nAssistant: Yes, the compound SELFIES [C][N][Branch1][N][C][=N][C][=C][Branch1][C][F][C][=N][Ring1][#Branch1][C][C][C][C][=C][Branch1][#Branch1][C][C][=Branch1][C][=O][O][C][=C][C][Branch1][C][Cl][=C][C][=C][Ring1][#Branch1][N][Ring1][=C][C][Ring2][Ring1][C] targets the compound SELFIES COc1ccc(CC(=O)O)cc1C1=NCC(=O)N(Cc2ccc(Cl)cc2Cl)c2ccccc21."}", "/scratch/micpie/export/compound_protein_compound_3/test_7-2.jsonl": "{"text":"User: Can you give me one example for a compound SELFIES that targets the protein Carbonic anhydrase 2?\nAssistant: Of course, the compound SELFIES 
[C][C][=C][C][=C][Branch2][Ring2][=C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][=Branch1][C][=O][N][C][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][C][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][C][Branch1][C][Cl][=C][Ring1][#C][Cl][C][=C][Ring2][Ring1][=N] targets the protein Carbonic anhydrase 2.\nUser: Can you create another compound SELFIES that targets the protein Carbonic anhydrase 2?\nAssistant: Of course, the compound SELFIES [C][C][=C][C][=C][Branch2][Ring2][=C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][=Branch1][C][=O][N][C][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][C][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][C][Branch1][C][Cl][=C][Ring1][#C][Cl][C][=C][Ring2][Ring1][=N] targets the compound SELFIES CC(=O)NC1Cc2ccc(S(N)(=O)=O)cc2C1."} {"text":"User: Can you give me an example for a compound InChI that targets the protein Carbonic anhydrase B?\nAssistant: Yes, of course, the compound InChI InChI=1S\/C7H6F4N2O4S2\/c8-5-3-4(18(12,14)15)1-2-6(5)13-19(16,17)7(9,10)11\/h1-3,13H,(H2,12,14,15) targets the protein Carbonic anhydrase B.\nUser: Can you generate another compound InChI that targets the protein Carbonic anhydrase B?\nAssistant: Sure, the compound InChI InChI=1S\/C7H6F4N2O4S2\/c8-5-3-4(18(12,14)15)1-2-6(5)13-19(16,17)7(9,10)11\/h1-3,13H,(H2,12,14,15) targets the compound InChI NS(=O)(=O)c1ccc(C(=O)NCc2cn([C@@H]3O[C@H](CO)[C@@H](O[C@H]4O[C@H](CO)[C@@H](O)[C@H](O)[C@H]4O)[C@H](O)[C@H]3O)nn2)cc1."}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/test_0-10.jsonl": "{"text":"User: I'm looking for the canonical SMILES of a molecule that is not modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: This is a molecule that is not modulating the M1 muscarinic receptor activity in a positive allosteric way: COc1ccc(C(=O)N2CC=C(c3ccccc3)CC2)cc1S(=O)(=O)N1CCOCC1"} {"text":"User: I'm looking for the InChI of a molecule that is not 
modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: This is a molecule that is not modulating the M1 muscarinic receptor activity in a positive allosteric way: InChI=1S\/C16H18ClN3O2\/c1-18-7-9-19(10-8-18)14-13(17)15(21)20(16(14)22)11-12-5-3-2-4-6-12\/h2-6H,7-11H2,1H3"}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/valid_0-8.jsonl": "{"text":"User: Is the molecule with the SMILES S(Cc1c2c3c(ccc2oc(=O)c1)cccc3)c1n(CCC)c(=O)[nH]n1 modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: No, it is not modulating the M1 muscarinic receptor activity in a positive allosteric way."} {"text":"User: Is the molecule with the SELFIES [O][=C][N][Branch2][Ring1][P][C][=Branch1][C][=O][N][Branch2][Ring1][Branch2][C][N][=C][Branch1][#C][N][Branch1][=Branch1][C][Ring1][#Branch2][=Ring1][Branch1][C][C][Branch1][C][C][=C][N][C][C][C][C][C] modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: No, it is not modulating the M1 muscarinic receptor activity in a positive allosteric way."}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/train_0-8.jsonl": "{"text":"User: Is the molecule with the DeepSMILES S=O)=O)CCNCcsccc5))))))C=O)COcccccc6)))C))C)))))))CC5 modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: No, it is not modulating the M1 muscarinic receptor activity in a positive allosteric way."} {"text":"User: Is the molecule with the canonical SMILES COc1ccc(C(C)=O)cc1CN1C(=O)C(C)N(c2ccc(C)cc2)C1=O modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: No, it is not modulating the M1 muscarinic receptor activity in a positive allosteric way."}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor 
activity in a positive allosteric way.\nSMILES: S(=O)(=O)(N1CCOCC1)c1cc(C(=O)N2CCC(=CC2)c2ccccc2)ccc1OC\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not modulating the M1 muscarinic receptor activity in a positive allosteric way."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a positive allosteric way.\nMolecule SELFIES: [Cl][C][=C][Branch1][N][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][=Branch1][C][=O][N][Branch1][Branch1][C][Ring1][=N][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not modulating the M1 muscarinic receptor activity in a positive allosteric way."}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/valid_0-9.jsonl": "{"text":"User: Can you generate the SELFIES of a molecule that is not modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: Of course, here you go: [S][Branch2][Ring1][O][C][C][C][=C][C][=Branch1][=C][=C][C][=C][Ring1][=Branch1][O][C][=Branch1][C][=O][C][=Ring1][O][C][=C][C][=C][Ring1][=N][C][N][Branch1][Ring2][C][C][C][C][=Branch1][C][=O][NH1][N][=Ring1][=Branch2]"} {"text":"User: Can you create the DeepSMILES of a molecule that is not modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: Yes, here you go: O=cnc=O)ncncnc95)CCC)=C))))NCCC)))))))C)))C"}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/test_0-1.jsonl": "{"text":"Based on the canonical SMILES representation COc1ccc(C(=O)N2CC=C(c3ccccc3)CC2)cc1S(=O)(=O)N1CCOCC1, the molecule displays no positive allosteric modulation of the M1 muscarinic receptor activity."} {"text":"Based on the SELFIES representation 
[Cl][C][=C][Branch1][N][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][=Branch1][C][=O][N][Branch1][Branch1][C][Ring1][=N][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1], the molecule exhibits no positive allosteric modulation of the M1 muscarinic receptor activity."}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/valid_0-0.jsonl": "{"text":"The molecule with the SMILES S(Cc1c2c3c(ccc2oc(=O)c1)cccc3)c1n(CCC)c(=O)[nH]n1 displays no positive allosteric modulation of the M1 muscarinic receptor activity."} {"text":"The molecule with the SELFIES representation of [O][=C][N][Branch2][Ring1][P][C][=Branch1][C][=O][N][Branch2][Ring1][Branch2][C][N][=C][Branch1][#C][N][Branch1][=Branch1][C][Ring1][#Branch2][=Ring1][Branch1][C][C][Branch1][C][C][=C][N][C][C][C][C][C] shows no positive allosteric modulation of the M1 muscarinic receptor activity."}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/test_0-2.jsonl": "{"text":"The SMILES S(=O)(=O)(N1CCOCC1)c1cc(C(=O)N2CCC(=CC2)c2ccccc2)ccc1OC represents a molecule that shows no positive allosteric modulation of the M1 muscarinic receptor activity."} {"text":"The DeepSMILES ClC=CNCCNCC6))C)))))C=O)NC5=O))Ccccccc6 is from a molecule that exhibits no positive allosteric modulation of the M1 muscarinic receptor activity."}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/valid_0-10.jsonl": "{"text":"User: I'm looking for the canonical SMILES of a molecule that is not modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: This is a molecule that is not modulating the M1 muscarinic receptor activity in a positive allosteric way: CCCn1c(SCc2cc(=O)oc3ccc4ccccc4c23)n[nH]c1=O"} {"text":"User: I'm looking for the InChI of a molecule that is not modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: This is a molecule that is not modulating the M1 muscarinic receptor activity in a positive allosteric 
way: InChI=1S\/C14H21N5O2\/c1-6-7-15-13-16-11-10(19(13)8-9(2)3)12(20)18(5)14(21)17(11)4\/h2,6-8H2,1,3-5H3,(H,15,16)"}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/train_0-6.jsonl": "{"text":"Task: Please generate a DeepSMILES based on the text description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a positive allosteric way.\nResult: S=O)=O)CCNCcsccc5))))))C=O)COcccccc6)))C))C)))))))CC5"} {"text":"Task: Please create a SMILES based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a positive allosteric way.\nResult: O=C1N(C(=O)N(C1C)c1ccc(cc1)C)Cc1c(OC)ccc(c1)C(=O)C"}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/valid_0-6.jsonl": "{"text":"Task: Please create a molecule canonical SMILES based on the text description below.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a positive allosteric way.\nResult: CCCn1c(SCc2cc(=O)oc3ccc4ccccc4c23)n[nH]c1=O"} {"text":"Task: Please generate a InChI based on the text description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a positive allosteric way.\nResult: InChI=1S\/C14H21N5O2\/c1-6-7-15-13-16-11-10(19(13)8-9(2)3)12(20)18(5)14(21)17(11)4\/h2,6-8H2,1,3-5H3,(H,15,16)"}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/test_0-9.jsonl": "{"text":"User: Can you give me the InChI of a molecule that is not modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: Yes, here you go: InChI=1S\/C23H26N2O5S\/c1-29-21-8-7-20(17-22(21)31(27,28)25-13-15-30-16-14-25)23(26)24-11-9-19(10-12-24)18-5-3-2-4-6-18\/h2-9,17H,10-16H2,1H3"} {"text":"User: Can you give me the SELFIES of a molecule that is not modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: Yes, I'm happy to help, here you go: 
[Cl][C][=C][Branch1][N][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][=Branch1][C][=O][N][Branch1][Branch1][C][Ring1][=N][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]"}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/test_0-0.jsonl": "{"text":"The molecule with the SELFIES [S][=Branch1][C][=O][=Branch1][C][=O][Branch1][=Branch2][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][Branch2][Ring1][#Branch1][C][=Branch1][C][=O][N][C][C][C][=Branch1][Branch1][=C][C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][C][=C][Ring2][Ring1][Ring2][O][C] shows no positive allosteric modulation of the M1 muscarinic receptor activity."} {"text":"The molecule with the SELFIES representation of [Cl][C][=C][Branch1][N][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][=Branch1][C][=O][N][Branch1][Branch1][C][Ring1][=N][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1] shows no positive allosteric modulation of the M1 muscarinic receptor activity."}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/valid_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the DeepSMILES SCccccccc6oc=O)c%10))))))cccc6)))))))))cnCCC)))c=O)[nH]n5 is modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: No, this molecule is not modulating the M1 muscarinic receptor activity in a positive allosteric way."} {"text":"User: Can you derive if the molecule with the SELFIES [O][=C][N][Branch2][Ring1][P][C][=Branch1][C][=O][N][Branch2][Ring1][Branch2][C][N][=C][Branch1][#C][N][Branch1][=Branch1][C][Ring1][#Branch2][=Ring1][Branch1][C][C][Branch1][C][C][=C][N][C][C][C][C][C] is modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: No, this molecule is not modulating the M1 muscarinic receptor activity in a positive allosteric way."}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/test_0-3.jsonl": "{"text":"The canonical SMILES 
COc1ccc(C(=O)N2CC=C(c3ccccc3)CC2)cc1S(=O)(=O)N1CCOCC1 is not modulating the M1 muscarinic receptor activity in a positive allosteric way."} {"text":"The molecule InChI InChI=1S\/C16H18ClN3O2\/c1-18-7-9-19(10-8-18)14-13(17)15(21)20(16(14)22)11-12-5-3-2-4-6-12\/h2-6H,7-11H2,1H3 is not modulating the M1 muscarinic receptor activity in a positive allosteric way."}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/valid_0-11.jsonl": "{"text":"User: I want to create a SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be modulating the M1 muscarinic receptor activity in a positive allosteric way.\nAssistant: Ok, here you go, this SMILES is not modulating the M1 muscarinic receptor activity in a positive allosteric way: S(Cc1c2c3c(ccc2oc(=O)c1)cccc3)c1n(CCC)c(=O)[nH]n1"} {"text":"User: I want to create a SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should not be modulating the M1 muscarinic receptor activity in a positive allosteric way.\nAssistant: Got it, here you go, this SMILES is not modulating the M1 muscarinic receptor activity in a positive allosteric way: O=c1n(c(=O)n(c2nc(n(c12)CC(C)=C)NCCC)C)C"}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/train_0-0.jsonl": "{"text":"The molecule with the InChI representation of InChI=1S\/C19H23NO4S2\/c1-14-5-3-7-18(15(14)2)24-12-19(21)20(11-17-6-4-9-25-17)16-8-10-26(22,23)13-16\/h3-7,9,16H,8,10-13H2,1-2H3 shows no positive allosteric modulation of the M1 muscarinic receptor activity."} {"text":"The molecule with the SELFIES representation of [O][=C][N][Branch2][Ring1][#Branch1][C][=Branch1][C][=O][N][Branch1][Branch1][C][Ring1][=Branch1][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][C][C][=C][Branch1][Ring1][O][C][C][=C][C][=Branch1][Ring2][=C][Ring1][Branch2][C][=Branch1][C][=O][C] shows no positive allosteric modulation of the M1 muscarinic receptor activity."}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/test_0-6.jsonl": "{"text":"Task: Please generate a molecule canonical SMILES based on the text description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a positive allosteric way.\nResult: COc1ccc(C(=O)N2CC=C(c3ccccc3)CC2)cc1S(=O)(=O)N1CCOCC1"} {"text":"Task: Please generate a molecule SELFIES based on the text description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a positive allosteric way.\nResult: [Cl][C][=C][Branch1][N][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][=Branch1][C][=O][N][Branch1][Branch1][C][Ring1][=N][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]"}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/train_0-10.jsonl": "{"text":"User: I'm searching for the InChI of a molecule that is not modulating the M1 muscarinic receptor activity in a positive allosteric 
way?\nAssistant: This is a molecule that is not modulating the M1 muscarinic receptor activity in a positive allosteric way: InChI=1S\/C19H23NO4S2\/c1-14-5-3-7-18(15(14)2)24-12-19(21)20(11-17-6-4-9-25-17)16-8-10-26(22,23)13-16\/h3-7,9,16H,8,10-13H2,1-2H3"} {"text":"User: I'm searching for the DeepSMILES of a molecule that is not modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: This is a molecule that is not modulating the M1 muscarinic receptor activity in a positive allosteric way: O=CNC=O)NC5C))cccccc6))C)))))))CccOC))cccc6)C=O)C"}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/train_0-3.jsonl": "{"text":"The molecule canonical SMILES Cc1cccc(OCC(=O)N(Cc2cccs2)C2CCS(=O)(=O)C2)c1C is not modulating the M1 muscarinic receptor activity in a positive allosteric way."} {"text":"The molecule SMILES O=C1N(C(=O)N(C1C)c1ccc(cc1)C)Cc1c(OC)ccc(c1)C(=O)C is not modulating the M1 muscarinic receptor activity in a positive allosteric way."}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/train_0-12.jsonl": "{"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be modulating the M1 muscarinic receptor activity in a positive allosteric way.\nAssistant: Ok, this SELFIES is not modulating the M1 muscarinic receptor activity in a positive allosteric way: [S][=Branch1][C][=O][=Branch1][C][=O][C][C][Branch2][Ring1][S][N][Branch1][=Branch2][C][C][S][C][=C][C][=Ring1][Branch1][C][=Branch1][C][=O][C][O][C][=C][Branch1][#Branch2][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][#Branch2]"} {"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be modulating the M1 muscarinic receptor activity in a positive allosteric way.\nAssistant: Ok, this canonical SMILES is not modulating the M1 muscarinic receptor activity in a positive allosteric way: COc1ccc(C(C)=O)cc1CN1C(=O)C(C)N(c2ccc(C)cc2)C1=O"}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not modulating the M1 muscarinic receptor activity in a positive allosteric way?\nConstraint: You must select none, one or more options from a, b, or c without using any additional words.\nOptions:\n[a] InChI=1S\/C23H26N2O5S\/c1-29-21-8-7-20(17-22(21)31(27,28)25-13-15-30-16-14-25)23(26)24-11-9-19(10-12-24)18-5-3-2-4-6-18\/h2-9,17H,10-16H2,1H3\n[b] InChI=1S\/C15H11BrN2O2\/c1-19-13-8-7-11(16)9-12(13)14-17-15(20-18-14)10-5-3-2-4-6-10\/h2-9H,1H3\n[c] InChI=1S\/C21H22N4O3S\/c26-20(25-13-11-24(12-14-25)17-7-3-1-4-8-17)16-29-21-23-22-19(28-21)15-27-18-9-5-2-6-10-18\/h1-10H,11-16H2\nAnswer: a, b, c"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not modulating the M1 muscarinic receptor activity in a positive allosteric way?\nConstraint: You must select none, one or more options from 1 or 2 without using any other words.\nOptions:\n1 CN1CCN(C2=C(Cl)C(=O)N(Cc3ccccc3)C2=O)CC1\n2 CCc1ccccc1NC(=O)c1ccoc1C\nAnswer: 1, 2"}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/valid_0-2.jsonl": "{"text":"The canonical SMILES CCCn1c(SCc2cc(=O)oc3ccc4ccccc4c23)n[nH]c1=O represents a molecule that shows no positive allosteric modulation of the M1 muscarinic receptor activity."} {"text":"The SMILES O=c1n(c(=O)n(c2nc(n(c12)CC(C)=C)NCCC)C)C is from a molecule that displays no positive allosteric modulation of the M1 muscarinic receptor activity."}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/valid_0-1.jsonl": 
"{"text":"Based on the InChI representation InChI=1S\/C19H17N3O3S\/c1-2-9-22-18(24)20-21-19(22)26-11-13-10-16(23)25-15-8-7-12-5-3-4-6-14(12)17(13)15\/h3-8,10H,2,9,11H2,1H3,(H,20,24), the molecule displays no positive allosteric modulation of the M1 muscarinic receptor activity."} {"text":"Based on the SMILES representation O=c1n(c(=O)n(c2nc(n(c12)CC(C)=C)NCCC)C)C, the molecule exhibits no positive allosteric modulation of the M1 muscarinic receptor activity."}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not modulating the M1 muscarinic receptor activity in a positive allosteric way?\nConstraint: You must select none, one or more options from A, B, C, or D without using any other words.\nOptions:\n[A] SCccccccc6oc=O)c%10))))))cccc6)))))))))cnCCC)))c=O)[nH]n5\n[B] OcncNCcccccc6))))))))ncNCC)))n6))))))cn[nH]c=O)cc6\n[C] ClccccNCCNCC6))C=O)CCCCNccS=O)=O)N=6))cccc6))))))))))))))))cc6\n[D] ClccccSCC=O)NCC))))C)))cc6\nAnswer: A, B, C, D"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not modulating the M1 muscarinic receptor activity in a positive allosteric way?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any other words.\nOptions:\n[A] InChI=1S\/C17H20N2O4\/c1-3-4-12-22-17(21)13(2)23-15-10-11-16(20)19(18-15)14-8-6-5-7-9-14\/h5-11,13H,3-4,12H2,1-2H3\n[B] InChI=1S\/C15H17N3O3S\/c1-20-11-6-4-10(5-7-11)9-13-17-18-15(22-13)16-14(19)12-3-2-8-21-12\/h4-7,12H,2-3,8-9H2,1H3,(H,16,18,19)\n[C] InChI=1S\/C12H15BrN2O3\/c1-2-18-11(16)4-3-5-15-12(17)9-6-10(13)8-14-7-9\/h6-8H,2-5H2,1H3,(H,15,17)\n[D] InChI=1S\/C20H20N2O4S\/c23-19(22-9-3-4-10-22)12-27-18-6-2-1-5-15(18)20(24)21-14-7-8-16-17(11-14)26-13-25-16\/h1-2,5-8,11H,3-4,9-10,12-13H2,(H,21,24)\n[E] 
InChI=1S\/C14H21N5O2\/c1-6-7-15-13-16-11-10(19(13)8-9(2)3)12(20)18(5)14(21)17(11)4\/h2,6-8H2,1,3-5H3,(H,15,16)\nAnswer: A, B, C, D, E"}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a positive allosteric way.\nInChI: InChI=1S\/C19H17N3O3S\/c1-2-9-22-18(24)20-21-19(22)26-11-13-10-16(23)25-15-8-7-12-5-3-4-6-14(12)17(13)15\/h3-8,10H,2,9,11H2,1H3,(H,20,24)\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not modulating the M1 muscarinic receptor activity in a positive allosteric way."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a positive allosteric way.\nSELFIES: [O][=C][N][Branch2][Ring1][P][C][=Branch1][C][=O][N][Branch2][Ring1][Branch2][C][N][=C][Branch1][#C][N][Branch1][=Branch1][C][Ring1][#Branch2][=Ring1][Branch1][C][C][Branch1][C][C][=C][N][C][C][C][C][C]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not modulating the M1 muscarinic receptor activity in a positive allosteric way."}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a positive allosteric way.\nMolecule InChI: InChI=1S\/C19H17N3O3S\/c1-2-9-22-18(24)20-21-19(22)26-11-13-10-16(23)25-15-8-7-12-5-3-4-6-14(12)17(13)15\/h3-8,10H,2,9,11H2,1H3,(H,20,24)\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a positive allosteric 
way.\ncanonical SMILES: C=C(C)Cn1c(NCCC)nc2c1c(=O)n(C)c(=O)n2C\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a positive allosteric way.\nMolecule InChI: InChI=1S\/C19H23NO4S2\/c1-14-5-3-7-18(15(14)2)24-12-19(21)20(11-17-6-4-9-25-17)16-8-10-26(22,23)13-16\/h3-7,9,16H,8,10-13H2,1-2H3\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not modulating the M1 muscarinic receptor activity in a positive allosteric way."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a positive allosteric way.\nDeepSMILES: O=CNC=O)NC5C))cccccc6))C)))))))CccOC))cccc6)C=O)C\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not modulating the M1 muscarinic receptor activity in a positive allosteric way."}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/valid_0-12.jsonl": "{"text":"User: I want to come up with a InChI.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should not be modulating the M1 muscarinic receptor activity in a positive allosteric way.\nAssistant: Got it, this InChI is not modulating the M1 muscarinic receptor activity in a positive allosteric way: InChI=1S\/C19H17N3O3S\/c1-2-9-22-18(24)20-21-19(22)26-11-13-10-16(23)25-15-8-7-12-5-3-4-6-14(12)17(13)15\/h3-8,10H,2,9,11H2,1H3,(H,20,24)"} {"text":"User: I want to create a SELFIES.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should not be modulating the M1 muscarinic receptor activity in a positive allosteric way.\nAssistant: Ok, this SELFIES is not modulating the M1 muscarinic receptor activity in a positive allosteric way: [O][=C][N][Branch2][Ring1][P][C][=Branch1][C][=O][N][Branch2][Ring1][Branch2][C][N][=C][Branch1][#C][N][Branch1][=Branch1][C][Ring1][#Branch2][=Ring1][Branch1][C][C][Branch1][C][C][=C][N][C][C][C][C][C]"}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/train_0-2.jsonl": "{"text":"The canonical SMILES Cc1cccc(OCC(=O)N(Cc2cccs2)C2CCS(=O)(=O)C2)c1C represents a molecule that exhibits no positive allosteric modulation of the M1 muscarinic receptor activity."} {"text":"The canonical SMILES COc1ccc(C(C)=O)cc1CN1C(=O)C(C)N(c2ccc(C)cc2)C1=O is from a molecule that displays no positive allosteric modulation of the M1 muscarinic receptor activity."}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/test_0-11.jsonl": "{"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be modulating the M1 muscarinic receptor activity in a positive allosteric way.\nAssistant: Got it, this DeepSMILES is not modulating the M1 muscarinic receptor activity in a positive allosteric way: S=O)=O)NCCOCC6))))))cccC=O)NCCC=CC6))cccccc6)))))))))))ccc6OC"} {"text":"User: I want to create a SELFIES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should not be modulating the M1 muscarinic receptor activity in a positive allosteric way.\nAssistant: Ok, here you go, this SELFIES is not modulating the M1 muscarinic receptor activity in a positive allosteric way: [Cl][C][=C][Branch1][N][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][=Branch1][C][=O][N][Branch1][Branch1][C][Ring1][=N][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1]"}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/train_0-7.jsonl": "{"text":"User: Can you derive if the molecule with the DeepSMILES S=O)=O)CCNCcsccc5))))))C=O)COcccccc6)))C))C)))))))CC5 is modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: No, this molecule is not modulating the M1 muscarinic receptor activity in a positive allosteric way."} {"text":"User: Can you derive if the molecule with the SELFIES [O][=C][N][Branch2][Ring1][#Branch1][C][=Branch1][C][=O][N][Branch1][Branch1][C][Ring1][=Branch1][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][C][C][=C][Branch1][Ring1][O][C][C][=C][C][=Branch1][Ring2][=C][Ring1][Branch2][C][=Branch1][C][=O][C] is modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: No, this molecule is not modulating the M1 muscarinic receptor activity in a positive allosteric way."}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/train_0-11.jsonl": "{"text":"User: I want to create a molecule SELFIES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should not be modulating the M1 muscarinic receptor activity in a positive allosteric way.\nAssistant: Ok, here you go, this SELFIES is not modulating the M1 muscarinic receptor activity in a positive allosteric way: [S][=Branch1][C][=O][=Branch1][C][=O][C][C][Branch2][Ring1][S][N][Branch1][=Branch2][C][C][S][C][=C][C][=Ring1][Branch1][C][=Branch1][C][=O][C][O][C][=C][Branch1][#Branch2][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][#Branch2]"} {"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be modulating the M1 muscarinic receptor activity in a positive allosteric way.\nAssistant: Ok, this SMILES is not modulating the M1 muscarinic receptor activity in a positive allosteric way: O=C1N(C(=O)N(C1C)c1ccc(cc1)C)Cc1c(OC)ccc(c1)C(=O)C"}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/train_0-1.jsonl": "{"text":"Based on the SMILES S1(=O)(=O)CC(N(Cc2sccc2)C(=O)COc2c(c(ccc2)C)C)CC1, the molecule shows no positive allosteric modulation of the M1 muscarinic receptor activity."} {"text":"Based on the SELFIES [O][=C][N][Branch2][Ring1][#Branch1][C][=Branch1][C][=O][N][Branch1][Branch1][C][Ring1][=Branch1][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][C][C][=C][Branch1][Ring1][O][C][C][=C][C][=Branch1][Ring2][=C][Ring1][Branch2][C][=Branch1][C][=O][C], the molecule shows no positive allosteric modulation of the M1 muscarinic receptor activity."}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not modulating the M1 muscarinic receptor activity in a positive allosteric way?\nConstraint: You must select none, one or more options from a, b, c, d, or e without using any other words.\nOptions:\na OC=O)NccccNC=O)C)))cc6))))))))C\nb 
S=O)=O)NCCCC5)))))ccccNC=O)CCCOCC)))=O))))))cc6\nc SCCC))C=O)Ncscnn5))CC))))))))CCOCC)))=O\nd S=O)=O)CCNCcsccc5))))))C=O)COcccccc6)))C))C)))))))CC5\ne OC=O)NCCNCC6))ccccNC=O)cnccnc6))C)))))))cc6)))))))))))CC)C)C\nAnswer: a, b, c, d, e"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not modulating the M1 muscarinic receptor activity in a positive allosteric way?\nConstraint: You must select none, one or more options from A, B, or C without using any additional words.\nOptions:\nA) COc1ccc(C(C)=O)cc1CN1C(=O)C(C)N(c2ccc(C)cc2)C1=O\nB) COc1ccc(C(CC(=O)O)NC(=O)OC(C)(C)C)cc1\nC) COC(=O)c1ccc(C(=O)OC)c(NC(=O)COc2ccccc2OC)c1\nAnswer: A, B, C"}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a positive allosteric way.\nMolecule InChI: InChI=1S\/C19H23NO4S2\/c1-14-5-3-7-18(15(14)2)24-12-19(21)20(11-17-6-4-9-25-17)16-8-10-26(22,23)13-16\/h3-7,9,16H,8,10-13H2,1-2H3\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a positive allosteric way.\nMolecule canonical SMILES: COc1ccc(C(C)=O)cc1CN1C(=O)C(C)N(c2ccc(C)cc2)C1=O\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/test_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the SMILES S(=O)(=O)(N1CCOCC1)c1cc(C(=O)N2CCC(=CC2)c2ccccc2)ccc1OC is modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: No, this molecule is not modulating the M1 muscarinic receptor activity in a 
positive allosteric way."} {"text":"User: Can you tell me if the molecule with the DeepSMILES ClC=CNCCNCC6))C)))))C=O)NC5=O))Ccccccc6 is modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: No, this molecule is not modulating the M1 muscarinic receptor activity in a positive allosteric way."}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/train_0-9.jsonl": "{"text":"User: Can you give me the SMILES of a molecule that is not modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: Of course, here you go: S1(=O)(=O)CC(N(Cc2sccc2)C(=O)COc2c(c(ccc2)C)C)CC1"} {"text":"User: Can you generate the canonical SMILES of a molecule that is not modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: Yes, here you go: COc1ccc(C(C)=O)cc1CN1C(=O)C(C)N(c2ccc(C)cc2)C1=O"}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/valid_0-3.jsonl": "{"text":"The SMILES S(Cc1c2c3c(ccc2oc(=O)c1)cccc3)c1n(CCC)c(=O)[nH]n1 is not modulating the M1 muscarinic receptor activity in a positive allosteric way."} {"text":"The molecule SMILES O=c1n(c(=O)n(c2nc(n(c12)CC(C)=C)NCCC)C)C is not modulating the M1 muscarinic receptor activity in a positive allosteric way."}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/test_0-8.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C23H26N2O5S\/c1-29-21-8-7-20(17-22(21)31(27,28)25-13-15-30-16-14-25)23(26)24-11-9-19(10-12-24)18-5-3-2-4-6-18\/h2-9,17H,10-16H2,1H3 modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: No, it is not modulating the M1 muscarinic receptor activity in a positive allosteric way."} {"text":"User: Is the molecule with the SMILES ClC1=C(N2CCN(CC2)C)C(=O)N(C1=O)Cc1ccccc1 modulating the M1 muscarinic receptor activity in a positive allosteric way?\nAssistant: No, it is not modulating the M1 muscarinic receptor activity in a positive 
allosteric way."}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a positive allosteric way.\nSMILES: S(=O)(=O)(N1CCOCC1)c1cc(C(=O)N2CCC(=CC2)c2ccccc2)ccc1OC\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is modulating the M1 muscarinic receptor activity in a positive allosteric way.\nMolecule InChI: InChI=1S\/C16H18ClN3O2\/c1-18-7-9-19(10-8-18)14-13(17)15(21)20(16(14)22)11-12-5-3-2-4-6-12\/h2-6H,7-11H2,1H3\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/m1_muscarinic_receptor_agonists_butkiewicz/test_0-12.jsonl": "{"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be modulating the M1 muscarinic receptor activity in a positive allosteric way.\nAssistant: Understood, this DeepSMILES is not modulating the M1 muscarinic receptor activity in a positive allosteric way: S=O)=O)NCCOCC6))))))cccC=O)NCCC=CC6))cccccc6)))))))))))ccc6OC"} {"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should not be modulating the M1 muscarinic receptor activity in a positive allosteric way.\nAssistant: Ok, this canonical SMILES is not modulating the M1 muscarinic receptor activity in a positive allosteric way: CN1CCN(C2=C(Cl)C(=O)N(Cc3ccccc3)C2=O)CC1"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/test_0-10.jsonl": "{"text":"User: I'm searching for the DeepSMILES of a molecule that is inhibiting CYP3A4?\nAssistant: This is a molecule that is inhibiting CYP3A4: O=c[nH]cccccc6cc%10CNCCCO))))Ccnnnn5CccccF)cc6)))))))))))))))))))OCO5"} {"text":"User: I'm searching for the SELFIES of a molecule that is inhibiting CYP P450 3A4?\nAssistant: This is a molecule that is inhibiting CYP P450 3A4: [O][=C][Branch1][=N][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][=C][C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][C][C][C][Ring1][=Branch1][=C][S][Ring1][=C]"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/valid_0-8.jsonl": "{"text":"User: Is the molecule with the SMILES CCOC(=O)CSC1=C(C#N)C(C)C2=C(CCCC2=O)N1 inhibiting CYP P450 3A4?\nAssistant: No, it is not inhibiting CYP P450 3A4."} {"text":"User: Is the molecule with the SELFIES [C][C][=C][C][=C][Branch2][Ring1][Branch1][S][=Branch1][C][=O][=Branch1][C][=O][N][C][=Branch1][C][=O][N][C][=N][C][=N][NH1][Ring1][Branch1][C][=C][Ring2][Ring1][C] inhibiting CYP3A4?\nAssistant: No, it is not inhibiting CYP3A4."}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/train_0-8.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C13H9NO5S3\/c1-8(15)14(22(17,18)12-3-2-6-20-12)9-4-5-10-11(7-9)21-13(16)19-10\/h2-7H,1H3 inhibiting CYP3A4?\nAssistant: Yes, it is inhibiting CYP3A4."} {"text":"User: Is the molecule with the SELFIES [C][O][C][=N][C][=C][N][=C][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][Branch1][C][C][C][Ring1][S][=N][Ring2][Ring1][Ring2] inhibiting 
CYP3A4?\nAssistant: No, it is not inhibiting CYP3A4."}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 3A4.\nMolecule SELFIES: [O][=C][NH1][C][=C][C][=C][Branch2][Ring2][Branch1][C][=C][Ring1][=Branch1][C][=C][Ring1][#Branch2][C][N][Branch1][Branch1][C][C][C][O][C][C][=N][N][=N][N][Ring1][Branch1][C][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][O][C][O][Ring2][Ring1][=N]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is inhibiting CYP P450 3A4."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 3A4.\ncanonical SMILES: O=C(Nc1ccc(F)cc1)c1cc(S(=O)(=O)N2CCCCC2)cs1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is inhibiting CYP P450 3A4."}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/valid_0-9.jsonl": "{"text":"User: Can you generate the SMILES of a molecule that is not inhibiting CYP P450 3A4?\nAssistant: Yes, here you go: CCOC(=O)CSC1=C(C#N)C(C)C2=C(CCCC2=O)N1"} {"text":"User: Can you give me the SMILES of a molecule that is not inhibiting CYP P450 3A4?\nAssistant: Of course, here you go: Cc1ccc(S(=O)(=O)NC(=O)Nc2ncn[nH]2)cc1"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/test_0-1.jsonl": "{"text":"Based on the IUPAC name 7-[[[1-[(4-fluorophenyl)methyl]tetrazol-5-yl]methyl-(3-hydroxypropyl)amino]methyl]-5H-[1,3]dioxolo[4,5-g]quinolin-6-one, the molecule shows inhibition of CYP3A4."} {"text":"Based on the SELFIES representation [O][=C][Branch1][=N][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][=C][C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][C][C][C][Ring1][=Branch1][=C][S][Ring1][=C], the molecule shows inhibition of CYP P450 3A4."}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/valid_0-0.jsonl": 
"{"text":"The molecule with the SELFIES representation of [C][C][O][C][=Branch1][C][=O][C][S][C][=C][Branch1][Ring1][C][#N][C][Branch1][C][C][C][=C][Branch1][Branch2][C][C][C][C][Ring1][=Branch1][=O][N][Ring1][=C] displays no inhibition of CYP P450 3A4."} {"text":"The molecule with the SMILES representation of Cc1ccc(S(=O)(=O)NC(=O)Nc2ncn[nH]2)cc1 shows no inhibition of CYP P450 3A4."}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/test_0-2.jsonl": "{"text":"The InChI InChI=1S\/C23H23FN6O4\/c24-18-4-2-15(3-5-18)11-30-22(26-27-28-30)13-29(6-1-7-31)12-17-8-16-9-20-21(34-14-33-20)10-19(16)25-23(17)32\/h2-5,8-10,31H,1,6-7,11-14H2,(H,25,32) represents a molecule that displays inhibition of CYP3A4."} {"text":"The InChI InChI=1S\/C16H17FN2O3S2\/c17-12-4-6-13(7-5-12)18-16(20)15-10-14(11-23-15)24(21,22)19-8-2-1-3-9-19\/h4-7,10-11H,1-3,8-9H2,(H,18,20) is from a molecule that shows inhibition of CYP3A4."}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/valid_0-10.jsonl": "{"text":"User: I'm looking for the DeepSMILES of a molecule that is not inhibiting CYP3A4?\nAssistant: This is a molecule that is not inhibiting CYP3A4: CCOC=O)CSC=CC#N))CC)C=CCCCC6=O)))))N6"} {"text":"User: I'm searching for the InChI of a molecule that is not inhibiting CYP3A4?\nAssistant: This is a molecule that is not inhibiting CYP3A4: InChI=1S\/C10H11N5O3S\/c1-7-2-4-8(5-3-7)19(17,18)15-10(16)13-9-11-6-12-14-9\/h2-6H,1H3,(H3,11,12,13,14,15,16)"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/train_0-6.jsonl": "{"text":"Task: Please generate a DeepSMILES based on the description below.\nDescription: A molecule that is inhibiting CYP3A4.\nResult: CC=O)Nccccoc=O)sc5c9)))))))))S=O)=O)ccccs5"} {"text":"Task: Please generate a molecule canonical SMILES based on the description below.\nDescription: A molecule that is inhibiting CYP P450 3A4.\nResult: COc1ncc2nc(CCc3ccccc3)c(=O)n(C)c2n1"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/valid_0-6.jsonl": 
"{"text":"Task: Please generate a SELFIES based on the description below.\nDescription: A molecule that is inhibiting CYP3A4.\nResult: [C][C][O][C][=Branch1][C][=O][C][S][C][=C][Branch1][Ring1][C][#N][C][Branch1][C][C][C][=C][Branch1][Branch2][C][C][C][C][Ring1][=Branch1][=O][N][Ring1][=C]"} {"text":"Task: Please create a canonical SMILES based on the description below.\nDescription: A molecule that is inhibiting CYP3A4.\nResult: Cc1ccc(S(=O)(=O)NC(=O)Nc2ncn[nH]2)cc1"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/test_0-9.jsonl": "{"text":"User: Can you give me the IUPAC name of a molecule that is inhibiting CYP3A4?\nAssistant: Of course, here you go: 7-[[[1-[(4-fluorophenyl)methyl]tetrazol-5-yl]methyl-(3-hydroxypropyl)amino]methyl]-5H-[1,3]dioxolo[4,5-g]quinolin-6-one"} {"text":"User: Can you create the DeepSMILES of a molecule that is inhibiting CYP P450 3A4?\nAssistant: Of course, here you go: O=CNccccF)cc6)))))))cccS=O)=O)NCCCCC6)))))))cs5"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/test_0-0.jsonl": "{"text":"The molecule with the IUPAC name representation of 7-[[[1-[(4-fluorophenyl)methyl]tetrazol-5-yl]methyl-(3-hydroxypropyl)amino]methyl]-5H-[1,3]dioxolo[4,5-g]quinolin-6-one displays inhibition of CYP3A4."} {"text":"The molecule with the DeepSMILES O=CNccccF)cc6)))))))cccS=O)=O)NCCCCC6)))))))cs5 exhibits inhibition of CYP3A4."}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/valid_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the SMILES CCOC(=O)CSC1=C(C#N)C(C)C2=C(CCCC2=O)N1 is inhibiting CYP P450 3A4?\nAssistant: No, this molecule is not inhibiting CYP P450 3A4."} {"text":"User: Can you derive if the molecule with the InChI InChI=1S\/C10H11N5O3S\/c1-7-2-4-8(5-3-7)19(17,18)15-10(16)13-9-11-6-12-14-9\/h2-6H,1H3,(H3,11,12,13,14,15,16) is inhibiting CYP P450 3A4?\nAssistant: No, this molecule is not inhibiting CYP P450 3A4."}", 
"/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/test_0-3.jsonl": "{"text":"The molecule IUPAC name 7-[[[1-[(4-fluorophenyl)methyl]tetrazol-5-yl]methyl-(3-hydroxypropyl)amino]methyl]-5H-[1,3]dioxolo[4,5-g]quinolin-6-one is inhibiting CYP P450 3A4."} {"text":"The SMILES O=C(Nc1ccc(F)cc1)c1cc(S(=O)(=O)N2CCCCC2)cs1 is inhibiting CYP3A4."}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/valid_0-11.jsonl": "{"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be inhibiting CYP3A4.\nAssistant: Ok, this canonical SMILES is not inhibiting CYP3A4: CCOC(=O)CSC1=C(C#N)C(C)C2=C(CCCC2=O)N1"} {"text":"User: I want to generate a InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be inhibiting CYP3A4.\nAssistant: Ok, here you go, this InChI is not inhibiting CYP3A4: InChI=1S\/C10H11N5O3S\/c1-7-2-4-8(5-3-7)19(17,18)15-10(16)13-9-11-6-12-14-9\/h2-6H,1H3,(H3,11,12,13,14,15,16)"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/train_0-0.jsonl": "{"text":"The molecule with the SMILES representation of CC(=O)N(c1ccc2oc(=O)sc2c1)S(=O)(=O)c1cccs1 exhibits inhibition of CYP P450 3A4."} {"text":"The molecule with the canonical SMILES representation of COc1ncc2nc(CCc3ccccc3)c(=O)n(C)c2n1 shows no inhibition of CYP3A4."}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/test_0-6.jsonl": "{"text":"Task: Please create a SMILES based on the text description below.\nDescription: A molecule that is inhibiting CYP3A4.\nResult: O=c1[nH]c2cc3c(cc2cc1CN(CCCO)Cc1nnnn1Cc1ccc(F)cc1)OCO3"} {"text":"Task: Please give me a InChI based on the description.\nDescription: A molecule that is inhibiting CYP P450 3A4.\nResult: 
InChI=1S\/C16H17FN2O3S2\/c17-12-4-6-13(7-5-12)18-16(20)15-10-14(11-23-15)24(21,22)19-8-2-1-3-9-19\/h4-7,10-11H,1-3,8-9H2,(H,18,20)"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/train_0-10.jsonl": "{"text":"User: I'm looking for the InChI of a molecule that is inhibiting CYP P450 3A4?\nAssistant: This is a molecule that is inhibiting CYP P450 3A4: InChI=1S\/C13H9NO5S3\/c1-8(15)14(22(17,18)12-3-2-6-20-12)9-4-5-10-11(7-9)21-13(16)19-10\/h2-7H,1H3"} {"text":"User: I'm searching for the SMILES of a molecule that is not inhibiting CYP P450 3A4?\nAssistant: This is a molecule that is not inhibiting CYP P450 3A4: COc1ncc2nc(CCc3ccccc3)c(=O)n(C)c2n1"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/train_0-3.jsonl": "{"text":"The DeepSMILES CC=O)Nccccoc=O)sc5c9)))))))))S=O)=O)ccccs5 is inhibiting CYP P450 3A4."} {"text":"The molecule canonical SMILES COc1ncc2nc(CCc3ccccc3)c(=O)n(C)c2n1 is not inhibiting CYP3A4."}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/train_0-12.jsonl": "{"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should be inhibiting CYP P450 3A4.\nAssistant: Understood, this DeepSMILES is inhibiting CYP P450 3A4: CC=O)Nccccoc=O)sc5c9)))))))))S=O)=O)ccccs5"} {"text":"User: I want to come up with a SELFIES.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should not be inhibiting CYP3A4.\nAssistant: Got it, this SELFIES is not inhibiting CYP3A4: [C][O][C][=N][C][=C][N][=C][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][Branch1][C][C][C][Ring1][S][=N][Ring2][Ring1][Ring2]"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES O=c1[nH]c2cc3c(cc2cc1CN(CCCO)Cc1nnnn1Cc1ccc(F)cc1)OCO3 inhibiting CYP P450 3A4?\nConstraint: Even if you are uncertain, you must pick either A or B without using any other words.\nOptions:\nA False\nB True\nAnswer: B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES representation of O=C(Nc1ccc(F)cc1)c1cc(S(=O)(=O)N2CCCCC2)cs1 inhibiting CYP P450 3A4?\nConstraint: Even if you are uncertain, you must pick either a or b without using any additional words.\nOptions:\n[a] False\n[b] True\nAnswer: b"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/valid_0-2.jsonl": "{"text":"The InChI InChI=1S\/C15H18N2O3S\/c1-3-20-13(19)8-21-15-10(7-16)9(2)14-11(17-15)5-4-6-12(14)18\/h9,17H,3-6,8H2,1-2H3 is from a molecule that displays no inhibition of CYP3A4."} {"text":"The SMILES Cc1ccc(S(=O)(=O)NC(=O)Nc2ncn[nH]2)cc1 is from a molecule that displays no inhibition of CYP3A4."}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are inhibiting CYP P450 3A4?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any other words.\nOptions:\n1 CC(=O)N(c1ccc2oc(=O)sc2c1)S(=O)(=O)c1cccs1\n2 COC(=O)CNC(c1ccccc1Cl)c1cc(Br)ccc1NC(=O)CCN1CCOCC1\n3 COc1ccc(C(=O)N2CCC[C@@]3(CCN(Cc4ccccc4)C3)C2)cc1\nAnswer: 1, 2, 3"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are 
not inhibiting CYP P450 3A4?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any additional words.\nOptions:\n1: InChI=1S\/C15H13ClN2O4\/c1-9-3-5-11(12(16)7-9)15(19)17-13-6-4-10(22-2)8-14(13)18(20)21\/h3-8H,1-2H3,(H,17,19)\n2: InChI=1S\/C16H16N4O2\/c1-20-14-13(10-17-16(19-14)22-2)18-12(15(20)21)9-8-11-6-4-3-5-7-11\/h3-7,10H,8-9H2,1-2H3\n3: InChI=1S\/C19H18F2N6O\/c20-12-7-11(8-13(21)9-12)16-18(28)27(14-1-2-14)17-15(24-16)10-23-19(25-17)26-5-3-22-4-6-26\/h7-10,14,22H,1-6H2\n4: InChI=1S\/C26H27ClN4O10\/c27-17-7-19-18(38-12-39-19)6-13(17)8-29-25(34)30-4-3-16-20(23-24(41-23)22(33)21(16)31(30)26(29)35)28-40-10-14(32)9-36-11-15-2-1-5-37-15\/h1-2,5-7,14,16,21-24,32-33H,3-4,8-12H2\/b28-20+\/t14-,16-,21+,22-,23+,24+\/m0\/s1\nAnswer: 2"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/valid_0-1.jsonl": "{"text":"Based on the canonical SMILES CCOC(=O)CSC1=C(C#N)C(C)C2=C(CCCC2=O)N1, the molecule exhibits no inhibition of CYP3A4."} {"text":"Based on the canonical SMILES Cc1ccc(S(=O)(=O)NC(=O)Nc2ncn[nH]2)cc1, the molecule shows no inhibition of CYP3A4."}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES representation of CCOC(=O)CSC1=C(C#N)C(C)C2=C(CCCC2=O)N1 inhibiting CYP P450 3A4?\nConstraint: Even if you are uncertain, you must pick either a or b without using any additional words.\nOptions:\n[a] False\n[b] True\nAnswer: a"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of Cc1ccc(S(=O)(=O)NC(=O)Nc2ncn[nH]2)cc1 inhibiting CYP P450 3A4?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any other words.\nOptions:\n(1) True\n(2) False\nAnswer: 2"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the 
description.\nDescription: A molecule that is inhibiting CYP3A4.\nMolecule DeepSMILES: CCOC=O)CSC=CC#N))CC)C=CCCCC6=O)))))N6\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not inhibiting CYP3A4."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 3A4.\nInChI: InChI=1S\/C10H11N5O3S\/c1-7-2-4-8(5-3-7)19(17,18)15-10(16)13-9-11-6-12-14-9\/h2-6H,1H3,(H3,11,12,13,14,15,16)\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not inhibiting CYP P450 3A4."}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 3A4.\nSMILES: CCOC(=O)CSC1=C(C#N)C(C)C2=C(CCCC2=O)N1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP3A4.\nDeepSMILES: CccccS=O)=O)NC=O)Ncncn[nH]5)))))))))cc6\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP3A4.\nInChI: InChI=1S\/C13H9NO5S3\/c1-8(15)14(22(17,18)12-3-2-6-20-12)9-4-5-10-11(7-9)21-13(16)19-10\/h2-7H,1H3\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is inhibiting CYP3A4."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP3A4.\nSELFIES: 
[C][O][C][=N][C][=C][N][=C][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][Branch1][C][C][C][Ring1][S][=N][Ring2][Ring1][Ring2]\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not inhibiting CYP3A4."}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/valid_0-12.jsonl": "{"text":"User: I want to generate a SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be inhibiting CYP P450 3A4.\nAssistant: Got it, this SMILES is not inhibiting CYP P450 3A4: CCOC(=O)CSC1=C(C#N)C(C)C2=C(CCCC2=O)N1"} {"text":"User: I want to generate a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be inhibiting CYP3A4.\nAssistant: Understood, this canonical SMILES is not inhibiting CYP3A4: Cc1ccc(S(=O)(=O)NC(=O)Nc2ncn[nH]2)cc1"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/train_0-2.jsonl": "{"text":"The InChI InChI=1S\/C13H9NO5S3\/c1-8(15)14(22(17,18)12-3-2-6-20-12)9-4-5-10-11(7-9)21-13(16)19-10\/h2-7H,1H3 is from a molecule that shows inhibition of CYP3A4."} {"text":"The canonical SMILES COc1ncc2nc(CCc3ccccc3)c(=O)n(C)c2n1 represents a molecule that shows no inhibition of CYP3A4."}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/test_0-11.jsonl": "{"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should be inhibiting CYP P450 3A4.\nAssistant: Ok, this SMILES is inhibiting CYP P450 3A4: O=c1[nH]c2cc3c(cc2cc1CN(CCCO)Cc1nnnn1Cc1ccc(F)cc1)OCO3"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should be inhibiting CYP3A4.\nAssistant: Got it, here you go, this InChI is inhibiting CYP3A4: InChI=1S\/C16H17FN2O3S2\/c17-12-4-6-13(7-5-12)18-16(20)15-10-14(11-23-15)24(21,22)19-8-2-1-3-9-19\/h4-7,10-11H,1-3,8-9H2,(H,18,20)"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/train_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the DeepSMILES CC=O)Nccccoc=O)sc5c9)))))))))S=O)=O)ccccs5 is inhibiting CYP P450 3A4?\nAssistant: Yes, this molecule is inhibiting CYP P450 3A4."} {"text":"User: Can you tell me if the molecule with the canonical SMILES COc1ncc2nc(CCc3ccccc3)c(=O)n(C)c2n1 is inhibiting CYP3A4?\nAssistant: No, this molecule is not inhibiting CYP3A4."}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/train_0-11.jsonl": "{"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should be inhibiting CYP P450 3A4.\nAssistant: Ok, here you go, this SMILES is inhibiting CYP P450 3A4: CC(=O)N(c1ccc2oc(=O)sc2c1)S(=O)(=O)c1cccs1"} {"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should not be inhibiting CYP3A4.\nAssistant: Got it, here you go, this SELFIES is not inhibiting CYP3A4: [C][O][C][=N][C][=C][N][=C][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][Branch1][C][C][C][Ring1][S][=N][Ring2][Ring1][Ring2]"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/train_0-1.jsonl": "{"text":"Based on the InChI representation InChI=1S\/C13H9NO5S3\/c1-8(15)14(22(17,18)12-3-2-6-20-12)9-4-5-10-11(7-9)21-13(16)19-10\/h2-7H,1H3, the molecule exhibits inhibition of CYP3A4."} {"text":"Based on the SELFIES [C][O][C][=N][C][=C][N][=C][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][Branch1][C][C][C][Ring1][S][=N][Ring2][Ring1][Ring2], the molecule displays no inhibition of CYP P450 3A4."}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES CC(=O)N(c1ccc2oc(=O)sc2c1)S(=O)(=O)c1cccs1 inhibiting CYP3A4?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any other words.\nOptions:\n(1) False\n(2) True\nAnswer: 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI representation of InChI=1S\/C16H16N4O2\/c1-20-14-13(10-17-16(19-14)22-2)18-12(15(20)21)9-8-11-6-4-3-5-7-11\/h3-7,10H,8-9H2,1-2H3 inhibiting CYP P450 3A4?\nConstraint: Even if you are uncertain, you must pick either a or b without using any other words.\nOptions:\n[a] True\n[b] False\nAnswer: b"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 3A4.\nDeepSMILES: CC=O)Nccccoc=O)sc5c9)))))))))S=O)=O)ccccs5\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"} 
{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP3A4.\nInChI: InChI=1S\/C16H16N4O2\/c1-20-14-13(10-17-16(19-14)22-2)18-12(15(20)21)9-8-11-6-4-3-5-7-11\/h3-7,10H,8-9H2,1-2H3\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/test_0-7.jsonl": "{"text":"User: Can you derive if the molecule with the canonical SMILES O=c1[nH]c2cc3c(cc2cc1CN(CCCO)Cc1nnnn1Cc1ccc(F)cc1)OCO3 is inhibiting CYP P450 3A4?\nAssistant: Yes, this molecule is inhibiting CYP P450 3A4."} {"text":"User: Can you derive if the molecule with the SELFIES [O][=C][Branch1][=N][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][=C][C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][C][C][C][Ring1][=Branch1][=C][S][Ring1][=C] is inhibiting CYP P450 3A4?\nAssistant: Yes, this molecule is inhibiting CYP P450 3A4."}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/train_0-9.jsonl": "{"text":"User: Can you generate the InChI of a molecule that is inhibiting CYP P450 3A4?\nAssistant: Sure, here you go: InChI=1S\/C13H9NO5S3\/c1-8(15)14(22(17,18)12-3-2-6-20-12)9-4-5-10-11(7-9)21-13(16)19-10\/h2-7H,1H3"} {"text":"User: Can you give me the SMILES of a molecule that is not inhibiting CYP3A4?\nAssistant: Yes, here you go: COc1ncc2nc(CCc3ccccc3)c(=O)n(C)c2n1"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/valid_0-3.jsonl": "{"text":"The canonical SMILES CCOC(=O)CSC1=C(C#N)C(C)C2=C(CCCC2=O)N1 is not inhibiting CYP P450 3A4."} {"text":"The canonical SMILES Cc1ccc(S(=O)(=O)NC(=O)Nc2ncn[nH]2)cc1 is not inhibiting CYP P450 3A4."}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/test_0-8.jsonl": "{"text":"User: Is the molecule with the DeepSMILES O=c[nH]cccccc6cc%10CNCCCO))))Ccnnnn5CccccF)cc6)))))))))))))))))))OCO5 inhibiting CYP3A4?\nAssistant: Yes, it is 
inhibiting CYP3A4."} {"text":"User: Is the molecule with the SELFIES [O][=C][Branch1][=N][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][=C][C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][C][C][C][Ring1][=Branch1][=C][S][Ring1][=C] inhibiting CYP P450 3A4?\nAssistant: Yes, it is inhibiting CYP P450 3A4."}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are inhibiting CYP P450 3A4?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any additional words.\nOptions:\n(1) [C][C][=C][C][=C][Branch2][Ring2][#C][N][C][=Branch1][C][=O][N][N][C][=Branch1][C][=O][C][N][N][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Branch1][Ring1][C][#N][C][Ring2][Ring1][Ring2][=O][C][=C][Ring2][Ring2][Ring1]\n(2) [C][\/C][=Branch1][O][=C][\\C@@H1][Branch1][C][N][C][=Branch1][C][=O][O][C][P][=Branch1][C][=O][Branch1][C][O][O]\n(3) [O][=C][NH1][C][=C][C][=C][Branch2][Ring2][Branch1][C][=C][Ring1][=Branch1][C][=C][Ring1][#Branch2][C][N][Branch1][Branch1][C][C][C][O][C][C][=N][N][=N][N][Ring1][Branch1][C][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][O][C][O][Ring2][Ring1][=N]\nAnswer: 3"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are inhibiting CYP P450 3A4?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any other words.\nOptions:\n1.) [C][C][O][\/C][Branch1][C][C][=N][\/N][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][C][C][=Branch1][C][=O][N][Branch1][#Branch1][C][C][Branch1][C][C][C][C][Branch1][C][C][=N][C][=Ring1][N][Ring2][Ring1][#Branch1]\n2.) 
[C][O][C][=C][C][=C][Branch2][Ring1][#Branch1][N][C][=Branch1][C][=O][C][=C][C][=C][Branch1][Branch2][N][C][=C][C][=N][Ring1][Branch1][N][=C][Ring1][O][C][Branch1][Ring1][O][C][=C][Ring2][Ring1][=Branch1]\n3.) [C][C][N][Branch1][Ring1][C][C][C][=N][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][Ring1][=Branch2].[Cl]\n4.) [O][=C][Branch1][=N][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][=C][C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][C][C][C][Ring1][=Branch1][=C][S][Ring1][=C]\nAnswer: 1, 4"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting CYP P450 3A4?\nConstraint: You must select none, one or more options from a, b, c, or d without using any other words.\nOptions:\na [O][=C][Branch2][Ring1][=N][C][C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][C][=Branch1][=Branch1][=C][Ring1][#Branch2][Ring1][=Branch1][C][Ring1][=C][=O][N][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][#Branch2]\nb [C][C][N][C][=Branch1][C][=O][NH1][C][=C][C][Branch1][C][Cl][=C][Branch1][C][Cl][C][=C][Ring1][Branch2][Ring1][N]\nc [C][O][C][=C][C][Branch2][Ring1][#C][\/C][=C][Branch1][=C][\/N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][C][=C][Ring2][Ring1][=N][O][C]\nd [C][C][O][C][=Branch1][C][=O][C][S][C][=C][Branch1][Ring1][C][#N][C][Branch1][C][C][C][=C][Branch1][Branch2][C][C][C][C][Ring1][=Branch1][=O][N][Ring1][=C]\nAnswer: d"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting CYP P450 3A4?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any additional words.\nOptions:\n(1) O=S=O)NccccCcccncc6)))))))cc6)))))))ccccBr)cc6\n(2) CccccS=O)=O)NC=O)Ncncn[nH]5)))))))))cc6\n(3) 
CCOccccNCC=O)NccccOCC)))cc6))))))CC6=O)))))))cc6\nAnswer: 2, 3"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP3A4.\nMolecule canonical SMILES: O=c1[nH]c2cc3c(cc2cc1CN(CCCO)Cc1nnnn1Cc1ccc(F)cc1)OCO3\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP3A4.\nMolecule DeepSMILES: O=CNccccF)cc6)))))))cccS=O)=O)NCCCCC6)))))))cs5\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"}", "/scratch/micpie/export/cyp_p450_3a4_inhibition_veith_et_al/test_0-12.jsonl": "{"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should be inhibiting CYP3A4.\nAssistant: Ok, this DeepSMILES is inhibiting CYP3A4: O=c[nH]cccccc6cc%10CNCCCO))))Ccnnnn5CccccF)cc6)))))))))))))))))))OCO5"} {"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should be inhibiting CYP3A4.\nAssistant: Ok, this DeepSMILES is inhibiting CYP3A4: O=CNccccF)cc6)))))))cccS=O)=O)NCCCCC6)))))))cs5"}", "/scratch/micpie/export/train.jsonl": "{"text":"The chemical with the SELFIES of [O][C][C][C@@H1][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C@@H1][Branch2][Ring1][N][C][C][=C][C][=C][C][=Branch1][O][=C][C][=C][Ring1][=Branch1][N][=C][Ring1][#Branch2][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][=C][Branch1][C][C][C] exhibits inhibition of the human beta-secretase 1 (BACE-1)."} {"text":"User: I'm looking for the InChI of a molecule that has a volume of distribution at steady state (VDss) of 11.000 L\/kg.\nAssistant: This is a molecule that has a volume of distribution at steady state (VDss) of 11.000 L\/kg: InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1"}", "/scratch/micpie/export/MUV_810/valid_0-0.jsonl": "{"text":"The chemical with the DeepSMILES O=CO)ccccS=O)=O)NCCCcnc-cccccc6Cl)))))))no5)))))CC6)))))))cc6 is not an inhibitor of the focal adhesion kinase."} {"text":"The chemical compound with the SELFIES representation of ['[C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][Branch1][C][C][=N][N][C][Branch1][P][N][C][C][C][Branch1][Branch1][C][C][Ring1][=Branch1][O][C][C][O][Ring1][#Branch1][=C][C][Branch1][C][C][=N][C][=Ring2][Ring1][Branch1][Ring1][P]'] is not an inhibitor of the focal adhesion kinase."}", "/scratch/micpie/export/MUV_810/test_0-0.jsonl": "{"text":"The chemical with the DeepSMILES CccccNC=O)CCCCNcncccn6))))))C6))))))))cc6F is not an inhibitor of the focal adhesion kinase."} {"text":"The compound with the SELFIES ['[C][C][C][=Branch1][C][=O][N][C][=N][N][=C][Branch1][=Branch2][C][=C][C][=N][C][=C][Ring1][=Branch1][S][Ring1][O]'] is not an inhibitor of FAK."}", "/scratch/micpie/export/MUV_810/train_0-0.jsonl": "{"text":"The molecular species with the DeepSMILES 
CcccccNCCNC=O)CCCCCCCC6)C8)))C6)))))))CC6))))))c6C is not an inhibitor of FAK."} {"text":"The compound with the InChI InChI=1S\/C31H33N7O3\/c1-20-13-15-21(16-14-20)29-34-36-37(35-29)19-27(39)38(22-9-8-10-23(17-22)41-5)28(30(40)33-31(2,3)4)25-18-32-26-12-7-6-11-24(25)26\/h6-18,28,32H,19H2,1-5H3,(H,33,40) is not an inhibitor of the focal adhesion kinase."}", "/scratch/micpie/export/drug_protein_protein/test_0-1.jsonl": "{"text":"The protein Estradiol receptor is targeted by InChI=1S\/C32H47F5O3S\/c1-30-17-15-26-25-12-11-24(38)21-23(25)20-22(29(26)27(30)13-14-28(30)39)10-7-5-3-2-4-6-8-18-41(40)19-9-16-31(33,34)32(35,36)37\/h11-12,21-22,26-29,38-39H,2-10,13-20H2,1H3\/t22-,26-,27+,28+,29-,30+,41?\/m1\/s1. The protein Estradiol receptor interacts with DGAT1."} {"text":"The protein HGPRT is targeted by Nc1nc2cc[nH]c2c(=O)[nH]1. The protein HGPRT is ortholog to HPRT1."}", "/scratch/micpie/export/drug_protein_protein/valid_0-0.jsonl": "{"text":"The drug Cn1c(=O)c2c(ncn2C)n(C)c1=O targets the protein Cone cGMP-specific 3',5'-cyclic phosphodiesterase subunit alpha' which is ortholog to the protein Pde6c."} {"text":"The drug CC1=C(SC=C1)C(=CCCN1CCC[C@H](C1)C(O)=O)C1=C(C)C=CS1 targets the protein GAT-1 which is ortholog to the protein Slc6a1."}", "/scratch/micpie/export/drug_protein_protein/test_0-2.jsonl": "{"text":"User: Can you give me one example for a protein that binds the drug InChI=1S\/C32H47F5O3S\/c1-30-17-15-26-25-12-11-24(38)21-23(25)20-22(29(26)27(30)13-14-28(30)39)10-7-5-3-2-4-6-8-18-41(40)19-9-16-31(33,34)32(35,36)37\/h11-12,21-22,26-29,38-39H,2-10,13-20H2,1H3\/t22-,26-,27+,28+,29-,30+,41?\/m1\/s1?\nAssistant: The drug InChI=1S\/C32H47F5O3S\/c1-30-17-15-26-25-12-11-24(38)21-23(25)20-22(29(26)27(30)13-14-28(30)39)10-7-5-3-2-4-6-8-18-41(40)19-9-16-31(33,34)32(35,36)37\/h11-12,21-22,26-29,38-39H,2-10,13-20H2,1H3\/t22-,26-,27+,28+,29-,30+,41?\/m1\/s1 targets for example the protein ER-alpha.\nUser: Can you tell me a protein that interacts with protein 
ER-alpha?\nAssistant: Yes, of course, the protein ESR1 interacts with protein DGAT1."} {"text":"User: Can you give me one example for a protein that binds the drug 9-Deazaguanine?\nAssistant: The drug 9-Deazaguanine targets for example the protein HGPRT.\nUser: Can you tell me a protein that is ortholog to protein HGPRT?\nAssistant: Of course, the protein HPRT1 is ortholog to protein HPRT1."}", "/scratch/micpie/export/drug_protein_protein/test_0-0.jsonl": "{"text":"The drug Fulvestrant targets the protein ER-alpha which interacts with the protein DGAT1."} {"text":"The drug Nc1nc2cc[nH]c2c(=O)[nH]1 targets the protein Hypoxanthine-guanine phosphoribosyltransferase which is ortholog to the protein HPRT1."}", "/scratch/micpie/export/drug_protein_protein/train_0-0.jsonl": "{"text":"The drug [H][C@](O)(CCC(O)=O)NC1=CC=C(C=C1)N1C(=O)CCC1=O targets the protein Myosin regulatory light chain MRLC3 which is ortholog to the protein Rlc-a."} {"text":"The drug Histidine targets the protein Histidine ammonia-lyase which is ortholog to the protein HAL."}", "/scratch/micpie/export/drug_protein_protein/valid_0-2.jsonl": "{"text":"User: Can you give me one example for a protein that binds the drug Caffeine?\nAssistant: The drug Caffeine targets for example the protein cGMP phosphodiesterase 6C.\nUser: Can you tell me a protein that is ortholog to protein cGMP phosphodiesterase 6C?\nAssistant: Yes, the protein PDE6C is ortholog to protein Pde6c."} {"text":"User: Can you come up with an example for a protein that binds the drug InChI=1S\/C20H25NO2S2\/c1-14-7-11-24-18(14)17(19-15(2)8-12-25-19)6-4-10-21-9-3-5-16(13-21)20(22)23\/h6-8,11-12,16H,3-5,9-10,13H2,1-2H3,(H,22,23)\/t16-\/m1\/s1?\nAssistant: The drug InChI=1S\/C20H25NO2S2\/c1-14-7-11-24-18(14)17(19-15(2)8-12-25-19)6-4-10-21-9-3-5-16(13-21)20(22)23\/h6-8,11-12,16H,3-5,9-10,13H2,1-2H3,(H,22,23)\/t16-\/m1\/s1 targets for example the protein GAT-1.\nUser: Can you tell me a protein that is ortholog to protein GAT-1?\nAssistant: Yes, 
of course, the protein SLC6A1 is ortholog to protein Slc6a1."}", "/scratch/micpie/export/drug_protein_protein/valid_0-1.jsonl": "{"text":"The protein Cone cGMP-specific 3',5'-cyclic phosphodiesterase subunit alpha' is targeted by CNC=NC=C5C=O)NC)C=O)N6C. The protein Cone cGMP-specific 3',5'-cyclic phosphodiesterase subunit alpha' is ortholog to Pde6c."} {"text":"The protein Sodium- and chloride-dependent GABA transporter 1 is targeted by [C][C][=C][Branch1][=Branch1][S][C][=C][Ring1][Branch1][C][=Branch2][Ring1][Ring1][=C][C][C][N][C][C][C][C@H1][Branch1][Ring2][C][Ring1][=Branch1][C][Branch1][C][O][=O][C][=C][Branch1][C][C][C][=C][S][Ring1][=Branch1]. The protein Sodium- and chloride-dependent GABA transporter 1 is ortholog to Slc6a1."}", "/scratch/micpie/export/drug_protein_protein/train_0-2.jsonl": "{"text":"User: Can you come up with one example for a protein that binds the drug [H][C@]O)CCCO)=O))))NC=CC=CC=C6))NC=O)CCC5=O?\nAssistant: The drug [H][C@]O)CCCO)=O))))NC=CC=CC=C6))NC=O)CCC5=O targets for example the protein Epididymis secretory protein Li 24.\nUser: Can you tell me a protein that is ortholog to protein Epididymis secretory protein Li 24?\nAssistant: Yes, of course, the protein MYL12A is ortholog to protein Rlc-a."} {"text":"User: Can you come up with an example for a protein that binds the drug Histidine?\nAssistant: The drug Histidine targets for example the protein Histidine ammonia-lyase.\nUser: Can you tell me a protein that is ortholog to protein Histidine ammonia-lyase?\nAssistant: Yes, the protein HAL is ortholog to protein HAL."}", "/scratch/micpie/export/drug_protein_protein/train_0-1.jsonl": "{"text":"The protein Myosin regulatory light chain 2, nonsarcomeric is targeted by 4-[4-(2,5-DIOXO-PYRROLIDIN-1-YL)-PHENYLAMINO]-4-HYDROXY-BUTYRIC ACID. The protein Myosin regulatory light chain 2, nonsarcomeric is ortholog to Rlc-a."} {"text":"The protein Histidase is targeted by Histidine. 
The protein Histidase is ortholog to HAL."}", "/scratch/micpie/export/carcinogens/test_0-10.jsonl": "{"text":"User: Can you generate the SMILES of a molecule that is carcinogenic?\nAssistant: Of course, here you go: COc1ccc(C(=O)\/C(Br)=C\\C(=O)O)cc1"} {"text":"User: Can you create the canonical SMILES of a molecule that is not carcinogenic?\nAssistant: Of course, here you go: COc1ccc(C(=O)\/C(Br)=C\\C(=O)O)cc1"}", "/scratch/micpie/export/carcinogens/valid_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the InChI InChI=1S\/C16H12N2O7S2\/c19-13-7-6-10-8-12(26(20,21)22)9-14(27(23,24)25)15(10)16(13)18-17-11-4-2-1-3-5-11\/h1-9,19H,(H,20,21,22)(H,23,24,25)\/b18-17+ is carcinogenic?\nAssistant: Yes, this molecule is carcinogenic."} {"text":"User: Can you derive if the molecule with the SMILES C=C[C@]12CN(C)[C@@H]3[C@H]4COC(CC41)[C@]1(C(=O)Nc4ccccc41)[C@@H]32 is carcinogenic?\nAssistant: No, this molecule is not carcinogenic."}", "/scratch/micpie/export/carcinogens/test_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are carcinogenic?\nConstraint: You must select none, one or more options from a, b, c, d, or e without using any additional words.\nOptions:\na) ['[C][O][C][=Branch1][C][=O][N][C][=N][C][=C][C][Branch1][#Branch1][S][C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch2][NH1][Ring1][=N]']\nb) ['[C][C][=Branch1][C][=O][O][C][C][=C][Branch1][=Branch1][C][=Branch1][C][=O][O][N][C][=Branch1][C][=O][C@@H1][Branch1][C][N][C@H1][Ring1][=Branch1][S][C][Ring1][=N]']\nc) ['[C][O][C][=C][C][=C][Branch1][#C][C][=Branch1][C][=O][\/C][Branch1][C][Br][=C][\\\\C][=Branch1][C][=O][O][C][=C][Ring1][=C]']\nd) 
['[C][=C][Branch1][C][C][C@@H1][C][C][C@][Branch1][Ring1][C][O][C][C][C@][Branch1][C][C][C@H1][Branch2][Ring1][P][C][C][C@@H1][C@@][Branch1][C][C][C][C][C@H1][Branch1][C][O][C][Branch1][C][C][Branch1][C][C][C@@H1][Ring1][#Branch2][C][C][C@][Ring1][=C][Ring2][Ring1][Ring1][C][C@@H1][Ring2][Ring1][=N][Ring2][Ring1][#Branch2]']\ne) ['[C][O][C][=C][C][=C][Branch1][O][C][Branch1][Ring1][O][C][=C][Ring1][Branch2][O][C][C][=C][C][=C][Branch1][C][O][C][=Branch1][C][=O][C][=C][Ring1][=Branch2][C@@H1][Branch1][C][N][C][C][Ring2][Ring1][Branch1]']\nAnswer: c"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not carcinogenic?\nConstraint: You must select none, one or more options from 1 or 2 without using any additional words.\nOptions:\n(1) CC=NCCcc6[nH]cccO)ccc96\n(2) COccccC=O)\/CBr)=C\\C=O)O)))))cc6\nAnswer: 1, 2"}", "/scratch/micpie/export/carcinogens/train_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the InChI InChI=1S\/C6H12Cl2O\/c1-5(3-7)9-6(2)4-8\/h5-6H,3-4H2,1-2H3 is carcinogenic?\nAssistant: Yes, this molecule is carcinogenic."} {"text":"User: Can you estimate if the molecule with the InChI InChI=1S\/C20H14N2O10S3\/c23-20-18(35(30,31)32)10-11-9-12(33(24,25)26)5-6-13(11)19(20)22-21-16-7-8-17(34(27,28)29)15-4-2-1-3-14(15)16\/h1-10,23H,(H,24,25,26)(H,27,28,29)(H,30,31,32)\/b22-21+ is carcinogenic?\nAssistant: Yes, this molecule is carcinogenic."}", "/scratch/micpie/export/carcinogens/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is carcinogenic.\nMolecule SELFIES: ['[C][O][C][=C][C][=C][Branch1][#C][C][=Branch1][C][=O][\/C][Branch1][C][Br][=C][\\\\C][=Branch1][C][=O][O][C][=C][Ring1][=C]']\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is 
carcinogenic.\nDeepSMILES: COccccC=O)\/CBr)=C\\C=O)O)))))cc6\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/carcinogens/valid_0-9.jsonl": "{"text":"User: Is the molecule with the SMILES O=S(=O)(O)c1cc(S(=O)(=O)O)c2c(\/N=N\/c3ccccc3)c(O)ccc2c1 carcinogenic?\nAssistant: Yes, it is carcinogenic."} {"text":"User: Is the molecule with the DeepSMILES C=C[C@]CNC)[C@@H][C@H]COCCC6%10))[C@]C=O)Ncccccc69))))))))[C@@H]7%10 carcinogenic?\nAssistant: No, it is not carcinogenic."}", "/scratch/micpie/export/carcinogens/test_0-1.jsonl": "{"text":"Based on the DeepSMILES COccccC=O)\/CBr)=C\\C=O)O)))))cc6, the molecule has carcinogenic properties."} {"text":"Based on the DeepSMILES COccccC=O)\/CBr)=C\\C=O)O)))))cc6, the molecule has no carcinogenic properties."}", "/scratch/micpie/export/carcinogens/valid_0-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of O=S(=O)(O)c1cc(S(=O)(=O)O)c2c(\/N=N\/c3ccccc3)c(O)ccc2c1 shows carcinogenic properties."} {"text":"The molecule with the InChI InChI=1S\/C20H22N2O2\/c1-3-19-10-22(2)16-11-9-24-15(8-13(11)19)20(17(16)19)12-6-4-5-7-14(12)21-18(20)23\/h3-7,11,13,15-17H,1,8-10H2,2H3,(H,21,23)\/t11-,13?,15?,16+,17-,19-,20-\/m0\/s1 shows no carcinogenic properties."}", "/scratch/micpie/export/carcinogens/test_0-2.jsonl": "{"text":"The DeepSMILES COccccC=O)\/CBr)=C\\C=O)O)))))cc6 represents a molecule that is identified as carcinogenic."} {"text":"The SELFIES ['[C][O][C][=C][C][=C][Branch1][#C][C][=Branch1][C][=O][\/C][Branch1][C][Br][=C][\\\\C][=Branch1][C][=O][O][C][=C][Ring1][=C]'] is from a molecule that is not identified as carcinogenic."}", "/scratch/micpie/export/carcinogens/valid_0-10.jsonl": "{"text":"User: Can you create the SMILES of a molecule that is carcinogenic?\nAssistant: Sure, here you go: O=S(=O)(O)c1cc(S(=O)(=O)O)c2c(\/N=N\/c3ccccc3)c(O)ccc2c1"} {"text":"User: Can you generate the SELFIES of a 
molecule that is not carcinogenic?\nAssistant: Sure, here you go: ['[C][=C][C@][C][N][Branch1][C][C][C@@H1][C@H1][C][O][C][Branch1][#Branch1][C][C][Ring1][=Branch1][Ring1][O][C@][Branch1][S][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][#Branch2][C@@H1][Ring2][Ring1][C][Ring2][Ring1][=Branch1]']"}", "/scratch/micpie/export/carcinogens/train_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is carcinogenic.\nSELFIES: ['[C][C][Branch1][Ring1][C][Cl][O][C][Branch1][C][C][C][Cl]']\nConstraint: Answer the question in a full sentence.\nResult: This molecule is carcinogenic."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is carcinogenic.\nInChI: InChI=1S\/C20H14N2O10S3\/c23-20-18(35(30,31)32)10-11-9-12(33(24,25)26)5-6-13(11)19(20)22-21-16-7-8-17(34(27,28)29)15-4-2-1-3-14(15)16\/h1-10,23H,(H,24,25,26)(H,27,28,29)(H,30,31,32)\/b22-21+\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is carcinogenic."}", "/scratch/micpie/export/carcinogens/valid_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is carcinogenic.\nMolecule canonical SMILES: O=S(=O)(O)c1cc(S(=O)(=O)O)c2c(\/N=N\/c3ccccc3)c(O)ccc2c1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is carcinogenic."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is carcinogenic.\nMolecule SELFIES: ['[C][=C][C@][C][N][Branch1][C][C][C@@H1][C@H1][C][O][C][Branch1][#Branch1][C][C][Ring1][=Branch1][Ring1][O][C@][Branch1][S][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][#Branch2][C@@H1][Ring2][Ring1][C][Ring2][Ring1][=Branch1]']\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not carcinogenic."}", "/scratch/micpie/export/carcinogens/test_0-9.jsonl": "{"text":"User: Is the 
molecule with the InChI InChI=1S\/C11H9BrO4\/c1-16-8-4-2-7(3-5-8)11(15)9(12)6-10(13)14\/h2-6H,1H3,(H,13,14)\/b9-6+ carcinogenic?\nAssistant: Yes, it is carcinogenic."} {"text":"User: Is the molecule with the DeepSMILES COccccC=O)\/CBr)=C\\C=O)O)))))cc6 carcinogenic?\nAssistant: No, it is not carcinogenic."}", "/scratch/micpie/export/carcinogens/test_0-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of COccccC=O)\/CBr)=C\\C=O)O)))))cc6 exhibits carcinogenic effects."} {"text":"The molecule with the SELFIES ['[C][O][C][=C][C][=C][Branch1][#C][C][=Branch1][C][=O][\/C][Branch1][C][Br][=C][\\\\C][=Branch1][C][=O][O][C][=C][Ring1][=C]'] displays no carcinogenic effects."}", "/scratch/micpie/export/carcinogens/valid_0-7.jsonl": "{"text":"Task: Please generate a molecule SMILES based on the description below.\nDescription: A molecule that is carcinogenic.\nResult: O=S(=O)(O)c1cc(S(=O)(=O)O)c2c(\/N=N\/c3ccccc3)c(O)ccc2c1"} {"text":"Task: Please generate a molecule DeepSMILES based on the text description below.\nDescription: A molecule that is carcinogenic.\nResult: C=C[C@]CNC)[C@@H][C@H]COCCC6%10))[C@]C=O)Ncccccc69))))))))[C@@H]7%10"}", "/scratch/micpie/export/carcinogens/test_0-3.jsonl": "{"text":"The SELFIES ['[C][O][C][=C][C][=C][Branch1][#C][C][=Branch1][C][=O][\/C][Branch1][C][Br][=C][\\\\C][=Branch1][C][=O][O][C][=C][Ring1][=C]'] is carcinogenic."} {"text":"The canonical SMILES COc1ccc(C(=O)\/C(Br)=C\\C(=O)O)cc1 is not carcinogenic."}", "/scratch/micpie/export/carcinogens/valid_0-11.jsonl": "{"text":"User: I'm looking for the DeepSMILES of a molecule that is carcinogenic?\nAssistant: This is a molecule that is carcinogenic: O=S=O)O)cccS=O)=O)O))cc\/N=N\/cccccc6))))))))cO)ccc6c%10"} {"text":"User: I'm looking for the SELFIES of a molecule that is not carcinogenic?\nAssistant: This is a molecule that is not carcinogenic: 
['[C][=C][C@][C][N][Branch1][C][C][C@@H1][C@H1][C][O][C][Branch1][#Branch1][C][C][Ring1][=Branch1][Ring1][O][C@][Branch1][S][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][#Branch2][C@@H1][Ring2][Ring1][C][Ring2][Ring1][=Branch1]']"}", "/scratch/micpie/export/carcinogens/train_0-0.jsonl": "{"text":"The molecule with the SMILES CC(CCl)OC(C)CCl displays carcinogenic properties."} {"text":"The molecule with the DeepSMILES O=S=O)O)ccccc\/N=N\/ccccS=O)=O)O))cccccc%106))))))))))))cO)cS=O)=O)O))cc6c%10 exhibits carcinogenic properties."}", "/scratch/micpie/export/carcinogens/test_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is carcinogenic.\nMolecule canonical SMILES: COc1ccc(C(=O)\/C(Br)=C\\C(=O)O)cc1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is carcinogenic."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is carcinogenic.\nMolecule SMILES: COc1ccc(C(=O)\/C(Br)=C\\C(=O)O)cc1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not carcinogenic."}", "/scratch/micpie/export/carcinogens/train_0-10.jsonl": "{"text":"User: Can you create the InChI of a molecule that is carcinogenic?\nAssistant: Of course, here you go: InChI=1S\/C6H12Cl2O\/c1-5(3-7)9-6(2)4-8\/h5-6H,3-4H2,1-2H3"} {"text":"User: Can you generate the canonical SMILES of a molecule that is carcinogenic?\nAssistant: Yes, I'm happy to help, here you go: O=S(=O)(O)c1ccc2c(\/N=N\/c3ccc(S(=O)(=O)O)c4ccccc34)c(O)c(S(=O)(=O)O)cc2c1"}", "/scratch/micpie/export/carcinogens/train_0-3.jsonl": "{"text":"The SELFIES ['[C][C][Branch1][Ring1][C][Cl][O][C][Branch1][C][C][C][Cl]'] is carcinogenic."} {"text":"The SELFIES 
['[O][=S][=Branch1][C][=O][Branch1][C][O][C][=C][C][=C][C][Branch2][Ring1][O][\/N][=N][\/C][=C][C][=C][Branch1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][O][C][=C][C][=C][C][=C][Ring1][=C][Ring1][=Branch1][=C][Branch1][C][O][C][Branch1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][O][=C][C][Ring2][Ring1][O][=C][Ring2][Ring1][#C]'] is carcinogenic."}", "/scratch/micpie/export/carcinogens/train_0-12.jsonl": "{"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should be carcinogenic.\nAssistant: Ok, here you go, this InChI is carcinogenic: InChI=1S\/C6H12Cl2O\/c1-5(3-7)9-6(2)4-8\/h5-6H,3-4H2,1-2H3"} {"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should be carcinogenic.\nAssistant: Ok, this DeepSMILES is carcinogenic: O=S=O)O)ccccc\/N=N\/ccccS=O)=O)O))cccccc%106))))))))))))cO)cS=O)=O)O))cc6c%10"}", "/scratch/micpie/export/carcinogens/test_0-13.jsonl": "{"text":"User: I want to create a InChI.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should be carcinogenic.\nAssistant: Got it, this InChI is carcinogenic: InChI=1S\/C11H9BrO4\/c1-16-8-4-2-7(3-5-8)11(15)9(12)6-10(13)14\/h2-6H,1H3,(H,13,14)\/b9-6+"} {"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be carcinogenic.\nAssistant: Understood, this SMILES is not carcinogenic: COc1ccc(C(=O)\/C(Br)=C\\C(=O)O)cc1"}", "/scratch/micpie/export/carcinogens/valid_0-2.jsonl": "{"text":"The InChI InChI=1S\/C16H12N2O7S2\/c19-13-7-6-10-8-12(26(20,21)22)9-14(27(23,24)25)15(10)16(13)18-17-11-4-2-1-3-5-11\/h1-9,19H,(H,20,21,22)(H,23,24,25)\/b18-17+ is from a molecule that is identified as carcinogenic."} {"text":"The SMILES C=C[C@]12CN(C)[C@@H]3[C@H]4COC(CC41)[C@]1(C(=O)Nc4ccccc41)[C@@H]32 represents a molecule that is not identified as carcinogenic."}", "/scratch/micpie/export/carcinogens/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of CC(CCl)OC(C)CCl carcinogenic?\nConstraint: Even if you are not sure, you must pick either a or b without using any other words.\nOptions:\n(a) True\n(b) False\nAnswer: a"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES representation of O=S(=O)(O)c1ccc2c(\/N=N\/c3ccc(S(=O)(=O)O)c4ccccc34)c(O)c(S(=O)(=O)O)cc2c1 carcinogenic?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any additional words.\nOptions:\n(1) True\n(2) False\nAnswer: 1"}", "/scratch/micpie/export/carcinogens/valid_0-1.jsonl": "{"text":"Based on the SMILES representation O=S(=O)(O)c1cc(S(=O)(=O)O)c2c(\/N=N\/c3ccccc3)c(O)ccc2c1, the molecule has carcinogenic properties."} {"text":"Based on the DeepSMILES C=C[C@]CNC)[C@@H][C@H]COCCC6%10))[C@]C=O)Ncccccc69))))))))[C@@H]7%10, the molecule has no carcinogenic effects."}", "/scratch/micpie/export/carcinogens/valid_0-13.jsonl": "{"text":"User: I want to create a SELFIES.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should be carcinogenic.\nAssistant: Understood, this SELFIES is carcinogenic: ['[O][=S][=Branch1][C][=O][Branch1][C][O][C][=C][C][Branch1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][O][=C][C][Branch1][O][\/N][=N][\/C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Branch1][C][O][C][=C][C][Ring1][#C][=C][Ring2][Ring1][#Branch1]']"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be carcinogenic.\nAssistant: Understood, this InChI is not carcinogenic: InChI=1S\/C20H22N2O2\/c1-3-19-10-22(2)16-11-9-24-15(8-13(11)19)20(17(16)19)12-6-4-5-7-14(12)21-18(20)23\/h3-7,11,13,15-17H,1,8-10H2,2H3,(H,21,23)\/t11-,13?,15?,16+,17-,19-,20-\/m0\/s1"}", "/scratch/micpie/export/carcinogens/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is carcinogenic.\nMolecule SELFIES: ['[O][=S][=Branch1][C][=O][Branch1][C][O][C][=C][C][Branch1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][O][=C][C][Branch1][O][\/N][=N][\/C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Branch1][C][O][C][=C][C][Ring1][#C][=C][Ring2][Ring1][#Branch1]']\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is carcinogenic.\nMolecule DeepSMILES: C=C[C@]CNC)[C@@H][C@H]COCCC6%10))[C@]C=O)Ncccccc69))))))))[C@@H]7%10\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/carcinogens/train_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are carcinogenic?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any other words.\nOptions:\n(A) 
CN(C)CCN(CCN=O)C(N)=O\n(B) COc1cc(-c2ccc(\/N=N\/c3ccc4c(S(=O)(=O)O)cc(S(=O)(=O)O)c(N)c4c3O)c(OC)c2)ccc1\/N=N\/c1ccc2c(S(=O)(=O)O)cc(S(=O)(=O)O)c(N)c2c1O\n(C) CN1C2CCC1CC(NC(=O)c1cc(Cl)cc3c1OC(C)(C)C3)C2\n(D) C\/C=C1\/C[C@@H](C)[C@](O)(CO)C(=O)OCC2=CCN3CC[C@@H](OC1=O)[C@@H]23\n(E) CC(CCl)OC(C)CCl\nAnswer: A, C, E"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are carcinogenic?\nConstraint: You must select none, one or more options from 1, 2, 3, 4, or 5 without using any other words.\nOptions:\n1 InChI=1S\/C15H18O4\/c1-6-4-10(17)13-8(3)15(18)19-14(13)12-7(2)5-9(16)11(6)12\/h5,8,10,12-14,17H,4H2,1-3H3\/t8-,10-,12-,13+,14+\/m0\/s1\n2 InChI=1S\/C22H33NO3\/c1-4-23-10-20(3)6-5-17(25)22-15(20)7-13(18(22)23)21-9-12(11(2)19(21)26)14(24)8-16(21)22\/h12-19,24-26H,2,4-10H2,1,3H3\/t12-,13-,14-,15+,16+,17-,18?,19+,20-,21-,22-\/m0\/s1\n3 InChI=1S\/C22H35NO4\/c1-4-23-10-20(2)6-5-16(24)22-12-7-11-14(27-3)9-21(26,17(12)18(11)25)13(19(22)23)8-15(20)22\/h11-19,24-26H,4-10H2,1-3H3\/t11-,12-,13+,14+,15-,16+,17-,18?,19?,20+,21+,22-\/m1\/s1\n4 InChI=1S\/C20H14N2O10S3\/c23-20-18(35(30,31)32)10-11-9-12(33(24,25)26)5-6-13(11)19(20)22-21-16-7-8-17(34(27,28)29)15-4-2-1-3-14(15)16\/h1-10,23H,(H,24,25,26)(H,27,28,29)(H,30,31,32)\/b22-21+\n5 InChI=1S\/C11H14N2\/c1-13(2)8-9-7-12-11-6-4-3-5-10(9)11\/h3-7,12H,8H2,1-2H3\nAnswer: 4"}", "/scratch/micpie/export/carcinogens/valid_0-4.jsonl": "{"text":"The molecule SMILES O=S(=O)(O)c1cc(S(=O)(=O)O)c2c(\/N=N\/c3ccccc3)c(O)ccc2c1 is carcinogenic."} {"text":"The molecule DeepSMILES C=C[C@]CNC)[C@@H][C@H]COCCC6%10))[C@]C=O)Ncccccc69))))))))[C@@H]7%10 is not carcinogenic."}", "/scratch/micpie/export/carcinogens/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is carcinogenic.\ncanonical SMILES: CC(CCl)OC(C)CCl\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"} 
{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is carcinogenic.\nMolecule DeepSMILES: O=S=O)O)ccccc\/N=N\/ccccS=O)=O)O))cccccc%106))))))))))))cO)cS=O)=O)O))cc6c%10\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"}", "/scratch/micpie/export/carcinogens/valid_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are carcinogenic?\nConstraint: You must select none, one or more options from A or B without using any additional words.\nOptions:\n[A] C#C[C@@]O)CC[C@]C)[C@]5C)CC[C@]C)ccccO)cc6CC[C@]%10%14C\n[B] O=S=O)O)cccS=O)=O)O))cc\/N=N\/cccccc6))))))))cO)ccc6c%10\nAnswer: A, B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not carcinogenic?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any additional words.\nOptions:\nA: InChI=1S\/C20H25N3O2\/c1-3-14(11-24)22-20(25)13-7-16-15-5-4-6-17-19(15)12(9-21-17)8-18(16)23(2)10-13\/h4-7,9,13-14,18,21,24H,3,8,10-11H2,1-2H3,(H,22,25)\/t13-,14+,18-\/m1\/s1\nB: InChI=1S\/C20H22N2O2\/c1-3-19-10-22(2)16-11-9-24-15(8-13(11)19)20(17(16)19)12-6-4-5-7-14(12)21-18(20)23\/h3-7,11,13,15-17H,1,8-10H2,2H3,(H,21,23)\/t11-,13?,15?,16+,17-,19-,20-\/m0\/s1\nC: InChI=1S\/C4H12N2\/c1-2-3-4-6-5\/h6H,2-5H2,1H3\nD: 
InChI=1S\/C72H85N19O18S5\/c1-14-26(3)47-63(105)78-30(7)57(99)75-28(5)56(98)76-31(8)58(100)91-72-19-18-40(66-85-43(22-111-66)59(101)77-29(6)55(97)74-27(4)54(73)96)81-52(72)42-21-112-67(83-42)49(34(11)109-69(107)41-20-37(32(9)92)36-16-17-39(79-47)51(95)50(36)80-41)89-60(102)44-24-113-68(86-44)53(71(13,108)35(12)94)90-62(104)45-23-110-65(84-45)38(15-2)82-64(106)48(33(10)93)88-61(103)46-25-114-70(72)87-46\/h15-17,20-22,24-26,30-35,39,45,47-49,51-53,79,92-95,108H,4-6,14,18-19,23H2,1-3,7-13H3,(H2,73,96)(H,74,97)(H,75,99)(H,76,98)(H,77,101)(H,78,105)(H,82,106)(H,88,103)(H,89,102)(H,90,104)(H,91,100)\/b38-15+\/t26?,30-,31-,32-,33+,34+,35+,39+,45+,47-,48-,49-,51-,52-,53+,71-,72+\/m0\/s1\nE: InChI=1S\/C16H23NO6\/c1-9-13(18)23-11-5-7-17-6-4-10(12(11)17)8-22-14(19)16(3,21)15(9,2)20\/h4,9,11-12,20-21H,5-8H2,1-3H3\/t9-,11+,12+,15+,16-\/m0\/s1\nAnswer: A, B, D, E"}", "/scratch/micpie/export/carcinogens/valid_0-12.jsonl": "{"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should be carcinogenic.\nAssistant: Ok, this DeepSMILES is carcinogenic: O=S=O)O)cccS=O)=O)O))cc\/N=N\/cccccc6))))))))cO)ccc6c%10"} {"text":"User: I want to create a InChI.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should not be carcinogenic.\nAssistant: Ok, this InChI is not carcinogenic: InChI=1S\/C20H22N2O2\/c1-3-19-10-22(2)16-11-9-24-15(8-13(11)19)20(17(16)19)12-6-4-5-7-14(12)21-18(20)23\/h3-7,11,13,15-17H,1,8-10H2,2H3,(H,21,23)\/t11-,13?,15?,16+,17-,19-,20-\/m0\/s1"}", "/scratch/micpie/export/carcinogens/train_0-2.jsonl": "{"text":"The canonical SMILES CC(CCl)OC(C)CCl is from a molecule that is identified as carcinogenic."} {"text":"The SELFIES ['[O][=S][=Branch1][C][=O][Branch1][C][O][C][=C][C][=C][C][Branch2][Ring1][O][\/N][=N][\/C][=C][C][=C][Branch1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][O][C][=C][C][=C][C][=C][Ring1][=C][Ring1][=Branch1][=C][Branch1][C][O][C][Branch1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][O][=C][C][Ring2][Ring1][O][=C][Ring2][Ring1][#C]'] represents a molecule that is identified as carcinogenic."}", "/scratch/micpie/export/carcinogens/test_0-11.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that is carcinogenic?\nAssistant: This is a molecule that is carcinogenic: COc1ccc(C(=O)\/C(Br)=C\\C(=O)O)cc1"} {"text":"User: I'm looking for the DeepSMILES of a molecule that is not carcinogenic?\nAssistant: This is a molecule that is not carcinogenic: COccccC=O)\/CBr)=C\\C=O)O)))))cc6"}", "/scratch/micpie/export/carcinogens/train_0-7.jsonl": "{"text":"Task: Please create a molecule SMILES based on the description below.\nDescription: A molecule that is carcinogenic.\nResult: CC(CCl)OC(C)CCl"} {"text":"Task: Please give me a molecule InChI based on the description.\nDescription: A molecule that is carcinogenic.\nResult: InChI=1S\/C20H14N2O10S3\/c23-20-18(35(30,31)32)10-11-9-12(33(24,25)26)5-6-13(11)19(20)22-21-16-7-8-17(34(27,28)29)15-4-2-1-3-14(15)16\/h1-10,23H,(H,24,25,26)(H,27,28,29)(H,30,31,32)\/b22-21+"}", "/scratch/micpie/export/carcinogens/train_0-11.jsonl": "{"text":"User: I'm looking for the InChI of a molecule that is carcinogenic?\nAssistant: This is a molecule that is carcinogenic: 
InChI=1S\/C6H12Cl2O\/c1-5(3-7)9-6(2)4-8\/h5-6H,3-4H2,1-2H3"} {"text":"User: I'm looking for the DeepSMILES of a molecule that is carcinogenic?\nAssistant: This is a molecule that is carcinogenic: O=S=O)O)ccccc\/N=N\/ccccS=O)=O)O))cccccc%106))))))))))))cO)cS=O)=O)O))cc6c%10"}", "/scratch/micpie/export/carcinogens/train_0-1.jsonl": "{"text":"Based on the canonical SMILES representation CC(CCl)OC(C)CCl, the molecule has carcinogenic effects."} {"text":"Based on the SMILES O=S(=O)(O)c1ccc2c(\/N=N\/c3ccc(S(=O)(=O)O)c4ccccc34)c(O)c(S(=O)(=O)O)cc2c1, the molecule has carcinogenic features."}", "/scratch/micpie/export/carcinogens/train_0-13.jsonl": "{"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should be carcinogenic.\nAssistant: Ok, this SELFIES is carcinogenic: ['[C][C][Branch1][Ring1][C][Cl][O][C][Branch1][C][C][C][Cl]']"} {"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should be carcinogenic.\nAssistant: Ok, this SMILES is carcinogenic: O=S(=O)(O)c1ccc2c(\/N=N\/c3ccc(S(=O)(=O)O)c4ccccc34)c(O)c(S(=O)(=O)O)cc2c1"}", "/scratch/micpie/export/carcinogens/train_0-4.jsonl": "{"text":"The SELFIES ['[C][C][Branch1][Ring1][C][Cl][O][C][Branch1][C][C][C][Cl]'] is carcinogenic."} {"text":"The InChI InChI=1S\/C20H14N2O10S3\/c23-20-18(35(30,31)32)10-11-9-12(33(24,25)26)5-6-13(11)19(20)22-21-16-7-8-17(34(27,28)29)15-4-2-1-3-14(15)16\/h1-10,23H,(H,24,25,26)(H,27,28,29)(H,30,31,32)\/b22-21+ is carcinogenic."}", "/scratch/micpie/export/carcinogens/test_0-7.jsonl": "{"text":"Task: Please generate a InChI based on the text description below.\nDescription: A molecule that is carcinogenic.\nResult: InChI=1S\/C11H9BrO4\/c1-16-8-4-2-7(3-5-8)11(15)9(12)6-10(13)14\/h2-6H,1H3,(H,13,14)\/b9-6+"} {"text":"Task: Please give me a molecule DeepSMILES based on the description below.\nDescription: A molecule that is carcinogenic.\nResult: COccccC=O)\/CBr)=C\\C=O)O)))))cc6"}", "/scratch/micpie/export/carcinogens/train_0-9.jsonl": "{"text":"User: Is the molecule with the SMILES CC(CCl)OC(C)CCl carcinogenic?\nAssistant: Yes, it is carcinogenic."} {"text":"User: Is the molecule with the InChI InChI=1S\/C20H14N2O10S3\/c23-20-18(35(30,31)32)10-11-9-12(33(24,25)26)5-6-13(11)19(20)22-21-16-7-8-17(34(27,28)29)15-4-2-1-3-14(15)16\/h1-10,23H,(H,24,25,26)(H,27,28,29)(H,30,31,32)\/b22-21+ carcinogenic?\nAssistant: Yes, it is carcinogenic."}", "/scratch/micpie/export/carcinogens/valid_0-3.jsonl": "{"text":"The DeepSMILES O=S=O)O)cccS=O)=O)O))cc\/N=N\/cccccc6))))))))cO)ccc6c%10 is carcinogenic."} {"text":"The InChI InChI=1S\/C20H22N2O2\/c1-3-19-10-22(2)16-11-9-24-15(8-13(11)19)20(17(16)19)12-6-4-5-7-14(12)21-18(20)23\/h3-7,11,13,15-17H,1,8-10H2,2H3,(H,21,23)\/t11-,13?,15?,16+,17-,19-,20-\/m0\/s1 is not carcinogenic."}", "/scratch/micpie/export/carcinogens/test_0-8.jsonl": "{"text":"User: Can you derive if the molecule 
with the SELFIES ['[C][O][C][=C][C][=C][Branch1][#C][C][=Branch1][C][=O][\/C][Branch1][C][Br][=C][\\\\C][=Branch1][C][=O][O][C][=C][Ring1][=C]'] is carcinogenic?\nAssistant: Yes, this molecule is carcinogenic."} {"text":"User: Can you tell me if the molecule with the SELFIES ['[C][O][C][=C][C][=C][Branch1][#C][C][=Branch1][C][=O][\/C][Branch1][C][Br][=C][\\\\C][=Branch1][C][=O][O][C][=C][Ring1][=C]'] is carcinogenic?\nAssistant: No, this molecule is not carcinogenic."}", "/scratch/micpie/export/carcinogens/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI representation of InChI=1S\/C11H9BrO4\/c1-16-8-4-2-7(3-5-8)11(15)9(12)6-10(13)14\/h2-6H,1H3,(H,13,14)\/b9-6+ carcinogenic?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any additional words.\nOptions:\n1. True\n2. False\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES COc1ccc(C(=O)\/C(Br)=C\\C(=O)O)cc1 carcinogenic?\nConstraint: Even if you are uncertain, you must pick either A or B without using any additional words.\nOptions:\nA: False\nB: True\nAnswer: A"}", "/scratch/micpie/export/carcinogens/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES representation of O=S=O)O)cccS=O)=O)O))cc\/N=N\/cccccc6))))))))cO)ccc6c%10 carcinogenic?\nConstraint: Even if you are uncertain, you must pick either A or B without using any additional words.\nOptions:\nA) True\nB) False\nAnswer: A"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of C=C[C@]12CN(C)[C@@H]3[C@H]4COC(CC41)[C@]1(C(=O)Nc4ccccc41)[C@@H]32 carcinogenic?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any additional words.\nOptions:\n1.) True\n2.) 
False\nAnswer: 2"}", "/scratch/micpie/export/carcinogens/test_0-4.jsonl": "{"text":"The molecule DeepSMILES COccccC=O)\/CBr)=C\\C=O)O)))))cc6 is carcinogenic."} {"text":"The SMILES COc1ccc(C(=O)\/C(Br)=C\\C(=O)O)cc1 is not carcinogenic."}", "/scratch/micpie/export/carcinogens/test_0-12.jsonl": "{"text":"User: I want to create a SELFIES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should be carcinogenic.\nAssistant: Got it, here you go, this SELFIES is carcinogenic: ['[C][O][C][=C][C][=C][Branch1][#C][C][=Branch1][C][=O][\/C][Branch1][C][Br][=C][\\\\C][=Branch1][C][=O][O][C][=C][Ring1][=C]']"} {"text":"User: I want to create a SELFIES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be carcinogenic.\nAssistant: Got it, this SELFIES is not carcinogenic: ['[C][O][C][=C][C][=C][Branch1][#C][C][=Branch1][C][=O][\/C][Branch1][C][Br][=C][\\\\C][=Branch1][C][=O][O][C][=C][Ring1][=C]']"}", "/scratch/micpie/export/buchwald_hartwig/test_0-10.jsonl": "{"text":"User: I would like to run a reaction with the RXNSMILES CN1CCCN2CCCN=C12.COc1ccc(Cl)cc1.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(C)on1.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>COc1ccc(Nc2ccc(C)cc2)cc1. What is the yield I can expect?\nAssistant: The predicted yield is 0.493\\%."} {"text":"User: I would like to run a reaction with the reaction SMILES (RXNSMILES) CCc1ccc(I)cc1.CN(C)C(=NC(C)(C)C)N(C)C.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F.c1ccc(-c2ccno2)cc1>>CCc1ccc(Nc2ccc(C)cc2)cc1. 
What is the reaction yield I can expect?\nAssistant: The estimated reaction yield is 73.740\\%."}", "/scratch/micpie/export/buchwald_hartwig/valid_0-8.jsonl": "{"text":"Task: Predict the masked component in a masked RXNSMILES (one component masked as `MASK`).\nDescription: CCOC(=O)c1cnoc1.CN1CCCN2CCCN=C12.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.FC(F)(F)c1ccc(Cl)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F.MASK>>Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1\nAnswer: Cc1ccc(N)cc1"} {"text":"Task: Predict the masked component in a masked reaction SMILES string (one component masked as `MASK`).\nDescription: COc1ccc(Br)cc1.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(-n2cccc2)no1.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F.MASK>>COc1ccc(Nc2ccc(C)cc2)cc1\nAnswer: CN1CCCN2CCCN=C12"}", "/scratch/micpie/export/buchwald_hartwig/train_0-8.jsonl": "{"text":"Task: Predict the masked component in a masked reaction SMILES (one component masked as `MASK`).\nDescription: CCN=P(N=P(N(C)C)(N(C)C)N(C)C)(N(C)C)N(C)C.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(C)on1.Cc1ccc(N)cc1.Clc1ccccn1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>MASK\nAnswer: Cc1ccc(Nc2ccccn2)cc1"} {"text":"Task: Predict the masked component in a masked RXNSMILES (one component masked as `MASK`).\nDescription: CCOC(=O)c1cnoc1.CN(C)C(=NC(C)(C)C)N(C)C.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.FC(F)(F)c1ccc(Cl)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>MASK\nSolution: Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1"}", "/scratch/micpie/export/buchwald_hartwig/test_0-5.jsonl": "{"text":"Question: Which reaction products are produced from the educts CN1CCCN2CCCN=C12, COc1ccc(Cl)cc1, COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1cc(C)on1, Cc1ccc(N)cc1, and 
O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F?\nAnswer: COc1ccc(Nc2ccc(C)cc2)cc1."} {"text":"Question: Which reaction products are produced from the starting materials CCc1ccc(I)cc1, CN(C)C(=NC(C)(C)C)N(C)C, COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1ccc(N)cc1, O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F, and c1ccc(-c2ccno2)cc1?\nAnswer: CCc1ccc(Nc2ccc(C)cc2)cc1."}", "/scratch/micpie/export/buchwald_hartwig/valid_0-9.jsonl": "{"text":"The reaction yield of a reaction with the reaction SMILES string CCOC(=O)c1cnoc1.CN1CCCN2CCCN=C12.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.FC(F)(F)c1ccc(Cl)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1 is 20.083\\%."} {"text":"The yield of a reaction with the RXNSMILES CN1CCCN2CCCN=C12.COc1ccc(Br)cc1.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(-n2cccc2)no1.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>COc1ccc(Nc2ccc(C)cc2)cc1 is 47.156\\%."}", "/scratch/micpie/export/buchwald_hartwig/test_0-1.jsonl": "{"text":"The reaction SMILES (RXNSMILES) CN1CCCN2CCCN=C12.COc1ccc(Cl)cc1.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(C)on1.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>COc1ccc(Nc2ccc(C)cc2)cc1 has the products COc1ccc(Nc2ccc(C)cc2)cc1 and the reaction educts CN1CCCN2CCCN=C12, COc1ccc(Cl)cc1, COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1cc(C)on1, Cc1ccc(N)cc1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F."} {"text":"The reaction SMILES (RXNSMILES) 
CCc1ccc(I)cc1.CN(C)C(=NC(C)(C)C)N(C)C.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F.c1ccc(-c2ccno2)cc1>>CCc1ccc(Nc2ccc(C)cc2)cc1 has the products CCc1ccc(Nc2ccc(C)cc2)cc1 and the starting materials CCc1ccc(I)cc1, CN(C)C(=NC(C)(C)C)N(C)C, COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1ccc(N)cc1, O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F, and c1ccc(-c2ccno2)cc1."}", "/scratch/micpie/export/buchwald_hartwig/valid_0-0.jsonl": "{"text":"The reaction SMILES CCOC(=O)c1cnoc1.CN1CCCN2CCCN=C12.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.FC(F)(F)c1ccc(Cl)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1 has the reaction educts CCOC(=O)c1cnoc1, CN1CCCN2CCCN=C12, COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1ccc(N)cc1, FC(F)(F)c1ccc(Cl)cc1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F and the reaction products Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1."} {"text":"The RXNSMILES CN1CCCN2CCCN=C12.COc1ccc(Br)cc1.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(-n2cccc2)no1.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>COc1ccc(Nc2ccc(C)cc2)cc1 has the starting materials CN1CCCN2CCCN=C12, COc1ccc(Br)cc1, COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1cc(-n2cccc2)no1, Cc1ccc(N)cc1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F and the reaction products COc1ccc(Nc2ccc(C)cc2)cc1."}", "/scratch/micpie/export/buchwald_hartwig/test_0-2.jsonl": "{"text":"The masked component in the masked reaction SMILES (one component masked as `MASK`) 
CN1CCCN2CCCN=C12.COc1ccc(Cl)cc1.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(C)on1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F.MASK>>COc1ccc(Nc2ccc(C)cc2)cc1 is Cc1ccc(N)cc1."} {"text":"The masked component in the reaction SMILES with one element masked as `MASK` CCc1ccc(I)cc1.CN(C)C(=NC(C)(C)C)N(C)C.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F.c1ccc(-c2ccno2)cc1>>MASK is CCc1ccc(Nc2ccc(C)cc2)cc1."}", "/scratch/micpie/export/buchwald_hartwig/valid_0-10.jsonl": "{"text":"User: I want to run a reaction with the reaction SMILES (RXNSMILES) CCOC(=O)c1cnoc1.CN1CCCN2CCCN=C12.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.FC(F)(F)c1ccc(Cl)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1. What is the yield I should expect?\nAssistant: The expected yield is 20.083\\%."} {"text":"User: I want to run a reaction with the reaction SMILES CN1CCCN2CCCN=C12.COc1ccc(Br)cc1.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(-n2cccc2)no1.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>COc1ccc(Nc2ccc(C)cc2)cc1. 
What is the reaction yield (measured by LCMS) I can get?\nAssistant: The estimated reaction yield (measured by LCMS) is 47.156\\%."}", "/scratch/micpie/export/buchwald_hartwig/train_0-6.jsonl": "{"text":"User: I want synthesize the products Cc1ccc(Nc2ccccn2)cc1.\nAssistant: Is there anything else I can do for you?\nUser: I would like to know the reaction educts I need to produce the products Cc1ccc(Nc2ccccn2)cc1.\nAssistant: I advise the following reaction educts: CCN=P(N=P(N(C)C)(N(C)C)N(C)C)(N(C)C)N(C)C, COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1cc(C)on1, Cc1ccc(N)cc1, Clc1ccccn1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F."} {"text":"User: I need to produce the products Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1.\nAssistant: Cool, is there anything else I can do for you?\nUser: I would like to know the educts I need to produce the products Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1.\nAssistant: I advise the following educts: CCOC(=O)c1cnoc1, CN(C)C(=NC(C)(C)C)N(C)C, COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1ccc(N)cc1, FC(F)(F)c1ccc(Cl)cc1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F."}", "/scratch/micpie/export/buchwald_hartwig/valid_0-6.jsonl": "{"text":"User: I need to produce the products Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1.\nAssistant: Is there anything else I can do for you?\nUser: Yes, I would like to know the reaction educts I need to produce the products Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1.\nAssistant: I suggest the following reaction educts: CCOC(=O)c1cnoc1, CN1CCCN2CCCN=C12, COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1ccc(N)cc1, FC(F)(F)c1ccc(Cl)cc1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F."} {"text":"User: I would like to synthesize the products COc1ccc(Nc2ccc(C)cc2)cc1.\nAssistant: Great, is there anything else I can do for you?\nUser: I would like to know the educts I need to produce the products 
COc1ccc(Nc2ccc(C)cc2)cc1.\nAssistant: I propose the following educts: CN1CCCN2CCCN=C12, COc1ccc(Br)cc1, COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1cc(-n2cccc2)no1, Cc1ccc(N)cc1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F."}", "/scratch/micpie/export/buchwald_hartwig/test_0-9.jsonl": "{"text":"The yield of a reaction with the RXNSMILES CN1CCCN2CCCN=C12.COc1ccc(Cl)cc1.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(C)on1.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>COc1ccc(Nc2ccc(C)cc2)cc1 is 0.493\\%."} {"text":"The reaction yield of a reaction with the reaction SMILES CCc1ccc(I)cc1.CN(C)C(=NC(C)(C)C)N(C)C.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F.c1ccc(-c2ccno2)cc1>>CCc1ccc(Nc2ccc(C)cc2)cc1 is 73.740\\%."}", "/scratch/micpie/export/buchwald_hartwig/test_0-0.jsonl": "{"text":"The RXNSMILES CN1CCCN2CCCN=C12.COc1ccc(Cl)cc1.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(C)on1.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>COc1ccc(Nc2ccc(C)cc2)cc1 has the reaction educts CN1CCCN2CCCN=C12, COc1ccc(Cl)cc1, COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1cc(C)on1, Cc1ccc(N)cc1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F and the reaction products COc1ccc(Nc2ccc(C)cc2)cc1."} {"text":"The reaction SMILES string CCc1ccc(I)cc1.CN(C)C(=NC(C)(C)C)N(C)C.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F.c1ccc(-c2ccno2)cc1>>CCc1ccc(Nc2ccc(C)cc2)cc1 has the educts CCc1ccc(I)cc1, CN(C)C(=NC(C)(C)C)N(C)C, 
COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1ccc(N)cc1, O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F, and c1ccc(-c2ccno2)cc1 and the reaction products CCc1ccc(Nc2ccc(C)cc2)cc1."}", "/scratch/micpie/export/buchwald_hartwig/valid_0-7.jsonl": "{"text":"Question: What is the masked component in the reaction SMILES with one element masked as `MASK` CCOC(=O)c1cnoc1.CN1CCCN2CCCN=C12.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.FC(F)(F)c1ccc(Cl)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F.MASK>>Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1?\nAnswer: Cc1ccc(N)cc1."} {"text":"Question: What is the masked component in the masked RXNSMILES (one component masked as `MASK`) COc1ccc(Br)cc1.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(-n2cccc2)no1.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F.MASK>>COc1ccc(Nc2ccc(C)cc2)cc1?\nAnswer: CN1CCCN2CCCN=C12."}", "/scratch/micpie/export/buchwald_hartwig/test_0-3.jsonl": "{"text":"The chemical with SMILES Cc1ccc(N)cc1 is the masked component in the masked RXNSMILES (one component masked as `MASK`) CN1CCCN2CCCN=C12.COc1ccc(Cl)cc1.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(C)on1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F.MASK>>COc1ccc(Nc2ccc(C)cc2)cc1."} {"text":"The compound with SMILES CCc1ccc(Nc2ccc(C)cc2)cc1 is the masked component in the masked reaction SMILES string (one component masked as `MASK`) CCc1ccc(I)cc1.CN(C)C(=NC(C)(C)C)N(C)C.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F.c1ccc(-c2ccno2)cc1>>MASK."}", "/scratch/micpie/export/buchwald_hartwig/valid_0-11.jsonl": "{"text":"Question: What is reaction yield of a reaction with the reaction SMILES (RXNSMILES) 
CCOC(=O)c1cnoc1.CN1CCCN2CCCN=C12.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.FC(F)(F)c1ccc(Cl)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1?\nAnswer: 20.083\\%."} {"text":"Question: What is reaction yield (measured by LCMS) of a reaction with the RXNSMILES CN1CCCN2CCCN=C12.COc1ccc(Br)cc1.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(-n2cccc2)no1.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>COc1ccc(Nc2ccc(C)cc2)cc1?\nAnswer: 47.156\\%."}", "/scratch/micpie/export/buchwald_hartwig/train_0-0.jsonl": "{"text":"The reaction SMILES (RXNSMILES) CCN=P(N=P(N(C)C)(N(C)C)N(C)C)(N(C)C)N(C)C.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(C)on1.Cc1ccc(N)cc1.Clc1ccccn1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>Cc1ccc(Nc2ccccn2)cc1 has the starting materials CCN=P(N=P(N(C)C)(N(C)C)N(C)C)(N(C)C)N(C)C, COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1cc(C)on1, Cc1ccc(N)cc1, Clc1ccccn1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F and the reaction products Cc1ccc(Nc2ccccn2)cc1."} {"text":"The reaction SMILES (RXNSMILES) CCOC(=O)c1cnoc1.CN(C)C(=NC(C)(C)C)N(C)C.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.FC(F)(F)c1ccc(Cl)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1 has the starting materials CCOC(=O)c1cnoc1, CN(C)C(=NC(C)(C)C)N(C)C, COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1ccc(N)cc1, FC(F)(F)c1ccc(Cl)cc1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F and the reaction products Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1."}", "/scratch/micpie/export/buchwald_hartwig/test_0-6.jsonl": "{"text":"User: I need to produce the products COc1ccc(Nc2ccc(C)cc2)cc1.\nAssistant: Cool, is there anything else I can do for you?\nUser: I would like 
to know the reaction educts I need to produce the products COc1ccc(Nc2ccc(C)cc2)cc1.\nAssistant: I recommend the following reaction educts: CN1CCCN2CCCN=C12, COc1ccc(Cl)cc1, COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1cc(C)on1, Cc1ccc(N)cc1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F."} {"text":"User: I must produce the products CCc1ccc(Nc2ccc(C)cc2)cc1.\nAssistant: Great, is there anything else I can do for you?\nUser: Yes, I would like to know the starting materials I need to produce the products CCc1ccc(Nc2ccc(C)cc2)cc1.\nAssistant: I recommend the following starting materials: CCc1ccc(I)cc1, CN(C)C(=NC(C)(C)C)N(C)C, COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1ccc(N)cc1, O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F, and c1ccc(-c2ccno2)cc1."}", "/scratch/micpie/export/buchwald_hartwig/train_0-10.jsonl": "{"text":"User: I need to run a reaction with the reaction SMILES (RXNSMILES) CCN=P(N=P(N(C)C)(N(C)C)N(C)C)(N(C)C)N(C)C.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(C)on1.Cc1ccc(N)cc1.Clc1ccccn1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>Cc1ccc(Nc2ccccn2)cc1. What is the reaction yield I should get?\nAssistant: The reaction yield is 70.410\\%."} {"text":"User: I would like to run a reaction with the reaction SMILES CCOC(=O)c1cnoc1.CN(C)C(=NC(C)(C)C)N(C)C.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.FC(F)(F)c1ccc(Cl)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1. 
What is the reaction yield (measured by LCMS) I can expect?\nAssistant: The expected reaction yield (measured by LCMS) is 0.702\\%."}", "/scratch/micpie/export/buchwald_hartwig/train_0-3.jsonl": "{"text":"The compound with SMILES Cc1ccc(Nc2ccccn2)cc1 is the masked component in the masked reaction SMILES (one component masked as `MASK`) CCN=P(N=P(N(C)C)(N(C)C)N(C)C)(N(C)C)N(C)C.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(C)on1.Cc1ccc(N)cc1.Clc1ccccn1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>MASK."} {"text":"The chemical with SMILES Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1 is the masked component in the masked RXNSMILES (one component masked as `MASK`) CCOC(=O)c1cnoc1.CN(C)C(=NC(C)(C)C)N(C)C.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.FC(F)(F)c1ccc(Cl)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>MASK."}", "/scratch/micpie/export/buchwald_hartwig/valid_0-2.jsonl": "{"text":"The masked component in the masked reaction SMILES (one component masked as `MASK`) CCOC(=O)c1cnoc1.CN1CCCN2CCCN=C12.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.FC(F)(F)c1ccc(Cl)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F.MASK>>Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1 is Cc1ccc(N)cc1."} {"text":"The masked component in the masked RXNSMILES (one component masked as `MASK`) COc1ccc(Br)cc1.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(-n2cccc2)no1.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F.MASK>>COc1ccc(Nc2ccc(C)cc2)cc1 is CN1CCCN2CCCN=C12."}", "/scratch/micpie/export/buchwald_hartwig/valid_0-1.jsonl": "{"text":"The reaction SMILES CCOC(=O)c1cnoc1.CN1CCCN2CCCN=C12.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.FC(F)(F)c1ccc(Cl)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1 has the reaction products Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1 and the starting materials 
CCOC(=O)c1cnoc1, CN1CCCN2CCCN=C12, COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1ccc(N)cc1, FC(F)(F)c1ccc(Cl)cc1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F."} {"text":"The reaction SMILES string CN1CCCN2CCCN=C12.COc1ccc(Br)cc1.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(-n2cccc2)no1.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>COc1ccc(Nc2ccc(C)cc2)cc1 has the products COc1ccc(Nc2ccc(C)cc2)cc1 and the starting materials CN1CCCN2CCCN=C12, COc1ccc(Br)cc1, COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1cc(-n2cccc2)no1, Cc1ccc(N)cc1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F."}", "/scratch/micpie/export/buchwald_hartwig/valid_0-5.jsonl": "{"text":"Question: What products are produced from the starting materials CCOC(=O)c1cnoc1, CN1CCCN2CCCN=C12, COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1ccc(N)cc1, FC(F)(F)c1ccc(Cl)cc1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F?\nAnswer: Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1."} {"text":"Question: What products are produced from the reaction educts CN1CCCN2CCCN=C12, COc1ccc(Br)cc1, COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1cc(-n2cccc2)no1, Cc1ccc(N)cc1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F?\nAnswer: COc1ccc(Nc2ccc(C)cc2)cc1."}", "/scratch/micpie/export/buchwald_hartwig/valid_0-4.jsonl": "{"text":"Question: What starting materials are needed to synthesize the products Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1?\nAnswer: CCOC(=O)c1cnoc1, CN1CCCN2CCCN=C12, COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1ccc(N)cc1, FC(F)(F)c1ccc(Cl)cc1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F."} {"text":"Question: What starting materials are needed to produce the reaction products COc1ccc(Nc2ccc(C)cc2)cc1?\nAnswer: CN1CCCN2CCCN=C12, COc1ccc(Br)cc1, COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1cc(-n2cccc2)no1, Cc1ccc(N)cc1, and 
O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F."}", "/scratch/micpie/export/buchwald_hartwig/train_0-5.jsonl": "{"text":"Question: What reaction products are produced from the starting materials CCN=P(N=P(N(C)C)(N(C)C)N(C)C)(N(C)C)N(C)C, COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1cc(C)on1, Cc1ccc(N)cc1, Clc1ccccn1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F?\nAnswer: Cc1ccc(Nc2ccccn2)cc1."} {"text":"Question: What reaction products are produced from the starting materials CCOC(=O)c1cnoc1, CN(C)C(=NC(C)(C)C)N(C)C, COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1ccc(N)cc1, FC(F)(F)c1ccc(Cl)cc1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F?\nAnswer: Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1."}", "/scratch/micpie/export/buchwald_hartwig/train_0-2.jsonl": "{"text":"The masked component in the masked reaction SMILES (one component masked as `MASK`) CCN=P(N=P(N(C)C)(N(C)C)N(C)C)(N(C)C)N(C)C.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(C)on1.Cc1ccc(N)cc1.Clc1ccccn1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>MASK is Cc1ccc(Nc2ccccn2)cc1."} {"text":"The masked component in the masked reaction SMILES (one component masked as `MASK`) CCOC(=O)c1cnoc1.CN(C)C(=NC(C)(C)C)N(C)C.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.FC(F)(F)c1ccc(Cl)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>MASK is Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1."}", "/scratch/micpie/export/buchwald_hartwig/test_0-11.jsonl": "{"text":"Question: What is the yield of a reaction with the reaction SMILES string CN1CCCN2CCCN=C12.COc1ccc(Cl)cc1.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(C)on1.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>COc1ccc(Nc2ccc(C)cc2)cc1?\nAnswer: 0.493\\%."} {"text":"Question: What's the 
reaction yield (measured by LCMS) of a reaction with the reaction SMILES CCc1ccc(I)cc1.CN(C)C(=NC(C)(C)C)N(C)C.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F.c1ccc(-c2ccno2)cc1>>CCc1ccc(Nc2ccc(C)cc2)cc1?\nAnswer: 73.740\\%."}", "/scratch/micpie/export/buchwald_hartwig/train_0-7.jsonl": "{"text":"Question: What is the masked component in the masked reaction SMILES (one component masked as `MASK`) CCN=P(N=P(N(C)C)(N(C)C)N(C)C)(N(C)C)N(C)C.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(C)on1.Cc1ccc(N)cc1.Clc1ccccn1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>MASK?\nAnswer: Cc1ccc(Nc2ccccn2)cc1."} {"text":"Question: What is the masked component in the reaction SMILES with one element masked as `MASK` CCOC(=O)c1cnoc1.CN(C)C(=NC(C)(C)C)N(C)C.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.FC(F)(F)c1ccc(Cl)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>MASK?\nAnswer: Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1."}", "/scratch/micpie/export/buchwald_hartwig/train_0-11.jsonl": "{"text":"Question: What is the yield of a reaction with the reaction SMILES (RXNSMILES) CCN=P(N=P(N(C)C)(N(C)C)N(C)C)(N(C)C)N(C)C.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(C)on1.Cc1ccc(N)cc1.Clc1ccccn1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>Cc1ccc(Nc2ccccn2)cc1?\nAnswer: 70.410\\%."} {"text":"Question: What is the reaction yield of a reaction with the RXNSMILES CCOC(=O)c1cnoc1.CN(C)C(=NC(C)(C)C)N(C)C.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.FC(F)(F)c1ccc(Cl)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1?\nAnswer: 0.702\\%."}", "/scratch/micpie/export/buchwald_hartwig/train_0-1.jsonl": "{"text":"The reaction 
SMILES string CCN=P(N=P(N(C)C)(N(C)C)N(C)C)(N(C)C)N(C)C.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(C)on1.Cc1ccc(N)cc1.Clc1ccccn1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>Cc1ccc(Nc2ccccn2)cc1 has the reaction products Cc1ccc(Nc2ccccn2)cc1 and the educts CCN=P(N=P(N(C)C)(N(C)C)N(C)C)(N(C)C)N(C)C, COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1cc(C)on1, Cc1ccc(N)cc1, Clc1ccccn1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F."} {"text":"The reaction SMILES (RXNSMILES) CCOC(=O)c1cnoc1.CN(C)C(=NC(C)(C)C)N(C)C.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.FC(F)(F)c1ccc(Cl)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1 has the products Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1 and the reaction educts CCOC(=O)c1cnoc1, CN(C)C(=NC(C)(C)C)N(C)C, COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1ccc(N)cc1, FC(F)(F)c1ccc(Cl)cc1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F."}", "/scratch/micpie/export/buchwald_hartwig/train_0-4.jsonl": "{"text":"Question: What starting materials are required to produce the reaction products Cc1ccc(Nc2ccccn2)cc1?\nAnswer: CCN=P(N=P(N(C)C)(N(C)C)N(C)C)(N(C)C)N(C)C, COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1cc(C)on1, Cc1ccc(N)cc1, Clc1ccccn1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F."} {"text":"Question: What educts are needed to synthesize the products Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1?\nAnswer: CCOC(=O)c1cnoc1, CN(C)C(=NC(C)(C)C)N(C)C, COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1ccc(N)cc1, FC(F)(F)c1ccc(Cl)cc1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F."}", "/scratch/micpie/export/buchwald_hartwig/test_0-7.jsonl": "{"text":"Question: What is the masked component in the reaction SMILES 
with one element masked as `MASK` CN1CCCN2CCCN=C12.COc1ccc(Cl)cc1.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(C)on1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F.MASK>>COc1ccc(Nc2ccc(C)cc2)cc1?\nAnswer: Cc1ccc(N)cc1."} {"text":"Question: What is the masked component in the masked RXNSMILES (one component masked as `MASK`) CCc1ccc(I)cc1.CN(C)C(=NC(C)(C)C)N(C)C.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F.c1ccc(-c2ccno2)cc1>>MASK?\nAnswer: CCc1ccc(Nc2ccc(C)cc2)cc1."}", "/scratch/micpie/export/buchwald_hartwig/train_0-9.jsonl": "{"text":"The reaction yield of a reaction with the reaction SMILES (RXNSMILES) CCN=P(N=P(N(C)C)(N(C)C)N(C)C)(N(C)C)N(C)C.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(C)on1.Cc1ccc(N)cc1.Clc1ccccn1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>Cc1ccc(Nc2ccccn2)cc1 is 70.410\\%."} {"text":"The reaction yield (measured by LCMS) of a reaction with the reaction SMILES (RXNSMILES) CCOC(=O)c1cnoc1.CN(C)C(=NC(C)(C)C)N(C)C.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.FC(F)(F)c1ccc(Cl)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1 is 0.702\\%."}", "/scratch/micpie/export/buchwald_hartwig/valid_0-3.jsonl": "{"text":"The compound with SMILES Cc1ccc(N)cc1 is the masked component in the reaction SMILES with one element masked as `MASK` CCOC(=O)c1cnoc1.CN1CCCN2CCCN=C12.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.FC(F)(F)c1ccc(Cl)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F.MASK>>Cc1ccc(Nc2ccc(C(F)(F)F)cc2)cc1."} {"text":"The compound with SMILES CN1CCCN2CCCN=C12 is the masked component in the masked reaction SMILES (one component masked as `MASK`) 
COc1ccc(Br)cc1.COc1ccc(OC)c(P(C(C)(C)C)C(C)(C)C)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(-n2cccc2)no1.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F.MASK>>COc1ccc(Nc2ccc(C)cc2)cc1."}", "/scratch/micpie/export/buchwald_hartwig/test_0-8.jsonl": "{"text":"Task: Predict the masked component in a reaction SMILES with one element hidden as `MASK`.\nDescription: CN1CCCN2CCCN=C12.COc1ccc(Cl)cc1.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(C)on1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F.MASK>>COc1ccc(Nc2ccc(C)cc2)cc1\nSolution: Cc1ccc(N)cc1"} {"text":"Task: Predict the masked component in a masked reaction SMILES (one component masked as `MASK`).\nDescription: CCc1ccc(I)cc1.CN(C)C(=NC(C)(C)C)N(C)C.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F.c1ccc(-c2ccno2)cc1>>MASK\nAnswer: CCc1ccc(Nc2ccc(C)cc2)cc1"}", "/scratch/micpie/export/buchwald_hartwig/test_0-4.jsonl": "{"text":"Question: Which reaction educts are needed to produce the reaction products COc1ccc(Nc2ccc(C)cc2)cc1?\nAnswer: CN1CCCN2CCCN=C12, COc1ccc(Cl)cc1, COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1cc(C)on1, Cc1ccc(N)cc1, and O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F."} {"text":"Question: What starting materials are needed to produce the products CCc1ccc(Nc2ccc(C)cc2)cc1?\nAnswer: CCc1ccc(I)cc1, CN(C)C(=NC(C)(C)C)N(C)C, COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C, Cc1ccc(N)cc1, O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F, and c1ccc(-c2ccno2)cc1."}", "/scratch/micpie/export/MUV_859/valid_0-0.jsonl": "{"text":"The chemical with the InChI 
InChI=1S\/C18H17N3O2S\/c1-21-12-11-19-18(21)24-13-17(22)20-14-7-9-16(10-8-14)23-15-5-3-2-4-6-15\/h2-12H,13H2,1H3,(H,20,22) is not an allosteric inhibitor of the M1 receptor."} {"text":"The molecule with the canonical SMILES CCN(CC)S(=O)(=O)c1ccc(-c2nnc(SCC(=O)Nc3ccc4c(c3)OCO4)o2)cc1 is not an allosteric inhibitor of the muscarinic acetylcholine receptor M1."}", "/scratch/micpie/export/MUV_859/test_0-0.jsonl": "{"text":"The chemical with the InChI InChI=1S\/C24H30N4O3\/c29-23(17-27-13-1-2-14-27)25-19-5-9-21(10-6-19)31-22-11-7-20(8-12-22)26-24(30)18-28-15-3-4-16-28\/h5-12H,1-4,13-18H2,(H,25,29)(H,26,30) is not an allosteric inhibitor of the muscarinic acetylcholine receptor M1."} {"text":"The molecule with the SMILES COCCN1C(SCc2cccnc2)=NN\/C1=C1\\C=Nc2ccccc21 is not an allosteric inhibitor of the M1 receptor."}", "/scratch/micpie/export/MUV_859/train_0-0.jsonl": "{"text":"The compound with the InChI representation of InChI=1S\/C26H21N3O5\/c1-33-22-12-17-16-7-3-5-9-20(16)34-21(17)13-19(22)27-23(30)14-29-24(31)26(28-25(29)32)11-10-15-6-2-4-8-18(15)26\/h2-9,12-13H,10-11,14H2,1H3,(H,27,30)(H,28,32) is not an allosteric inhibitor of the M1 receptor."} {"text":"The chemical with the SMILES representation of Cc1cc2ccccc2c(=O)n1CC(=O)N1CCN(c2cccc(Cl)c2)CC1 is not an allosteric inhibitor of the muscarinic acetylcholine receptor M1."}", "/scratch/micpie/export/mp_shear_modulus/test_0-5.jsonl": "{"text":"Task: Please create a solid with a shear modulus computed using DFT with the PBE GGA functional of 77.378 GPa.\nResult: GaNi3"} {"text":"Task: Please give me a solid with a shear modulus computed using DFT with the PBE functional of 33.254 GPa.\nResult: BaSi2O5"}", "/scratch/micpie/export/mp_shear_modulus/test_0-1.jsonl": "{"text":"Question: How large is the shear modulus computed using DFT with the PBE functional of GaNi3?\nAnswer: The shear modulus computed using DFT with the PBE functional of GaNi3 is 77.378 GPa."} {"text":"Question: How large is the shear modulus computed 
using DFT with the PBE GGA functional of BaSi2O5?\nAnswer: The shear modulus computed using DFT with the PBE GGA functional of BaSi2O5 is 33.254 GPa."}", "/scratch/micpie/export/mp_shear_modulus/valid_0-0.jsonl": "{"text":"The shear modulus computed using DFT with the PBE GGA functional of the solid LiAlO2 is 52.475 GPa."} {"text":"The shear modulus computed using DFT with the PBE GGA functional of the compound LiBeSb is 47.197 GPa."}", "/scratch/micpie/export/mp_shear_modulus/test_0-2.jsonl": "{"text":"User: I want to know the shear modulus derived from DFT simulations with the PBE functional of the solid GaNi3.\nAssistant: The shear modulus derived from DFT simulations with the PBE functional of the solid GaNi3 is 77.378 GPa."} {"text":"User: I would like to know the shear modulus computed using DFT with the PBE functional of the compound BaSi2O5.\nAssistant: The shear modulus computed using DFT with the PBE functional of the compound BaSi2O5 is 33.254 GPa."}", "/scratch/micpie/export/mp_shear_modulus/test_0-0.jsonl": "{"text":"The shear modulus computed using DFT with the PBE GGA functional of the solid GaNi3 is 77.378 GPa."} {"text":"The shear modulus derived from DFT simulations with the PBE functional of the solid BaSi2O5 is 33.254 GPa."}", "/scratch/micpie/export/mp_shear_modulus/test_0-3.jsonl": "{"text":"User: I want to design a solid with a shear modulus computed using DFT with the PBE GGA functional of 77.378 GPa.\nAssistant: Here is a solid with a shear modulus computed using DFT with the PBE GGA functional of 77.378 GPa: GaNi3."} {"text":"User: I would like to design a compound with a shear modulus derived from DFT simulations with the PBE functional of 33.254 GPa.\nAssistant: Here is a compound with a shear modulus derived from DFT simulations with the PBE functional of 33.254 GPa: BaSi2O5."}", "/scratch/micpie/export/mp_shear_modulus/train_0-0.jsonl": "{"text":"The shear modulus computed using DFT with the PBE GGA functional of Cs2ZrCl6 is 4.000 
GPa."} {"text":"The shear modulus computed using DFT with the PBE GGA functional of the solid MgNi2Sb is 37.325 GPa."}", "/scratch/micpie/export/mp_shear_modulus/train_0-3.jsonl": "{"text":"User: I want to design a solid with a shear modulus computed using DFT with the PBE GGA functional of 4.000 GPa.\nAssistant: Here is a solid with a shear modulus computed using DFT with the PBE GGA functional of 4.000 GPa: Cs2ZrCl6."} {"text":"User: I want to design a material with a shear modulus computed using DFT with the PBE functional of 37.325 GPa.\nAssistant: Here is a material with a shear modulus computed using DFT with the PBE functional of 37.325 GPa: MgNi2Sb."}", "/scratch/micpie/export/mp_shear_modulus/valid_0-2.jsonl": "{"text":"User: I want to know the shear modulus derived from DFT simulations with the PBE functional of the compound LiAlO2.\nAssistant: The shear modulus derived from DFT simulations with the PBE functional of the compound LiAlO2 is 52.475 GPa."} {"text":"User: I want to know the shear modulus computed using DFT with the PBE functional of the solid LiBeSb.\nAssistant: The shear modulus computed using DFT with the PBE functional of the solid LiBeSb is 47.197 GPa."}", "/scratch/micpie/export/mp_shear_modulus/valid_0-1.jsonl": "{"text":"Question: How large is the shear modulus derived from DFT simulations with the PBE functional of the compound LiAlO2?\nAnswer: The shear modulus derived from DFT simulations with the PBE functional of the compound LiAlO2 is 52.475 GPa."} {"text":"Question: How large is the shear modulus computed using DFT with the PBE functional of the compound LiBeSb?\nAnswer: The shear modulus computed using DFT with the PBE functional of the compound LiBeSb is 47.197 GPa."}", "/scratch/micpie/export/mp_shear_modulus/valid_0-5.jsonl": "{"text":"Task: Please give me a material with a shear modulus derived from DFT simulations with the PBE functional of 52.475 GPa.\nResult: LiAlO2"} {"text":"Task: Please generate a compound with a shear modulus derived 
from DFT simulations with the PBE functional of 47.197 GPa.\nResult: LiBeSb"}", "/scratch/micpie/export/mp_shear_modulus/valid_0-4.jsonl": "{"text":"A material with a shear modulus computed using DFT with the PBE functional of 52.475 GPa is LiAlO2."} {"text":"A solid with a shear modulus computed using DFT with the PBE functional of 47.197 GPa is LiBeSb."}", "/scratch/micpie/export/mp_shear_modulus/train_0-5.jsonl": "{"text":"Task: Please generate a material with a shear modulus computed using DFT with the PBE functional of 4.000 GPa.\nResult: Cs2ZrCl6"} {"text":"Task: Please create a material with a shear modulus derived from DFT simulations with the PBE functional of 37.325 GPa.\nResult: MgNi2Sb"}", "/scratch/micpie/export/mp_shear_modulus/train_0-2.jsonl": "{"text":"User: I would like to know the shear modulus computed using DFT with the PBE functional of the compound Cs2ZrCl6.\nAssistant: The shear modulus computed using DFT with the PBE functional of the compound Cs2ZrCl6 is 4.000 GPa."} {"text":"User: I want to know the shear modulus computed using DFT with the PBE GGA functional of MgNi2Sb.\nAssistant: The shear modulus computed using DFT with the PBE GGA functional of MgNi2Sb is 37.325 GPa."}", "/scratch/micpie/export/mp_shear_modulus/train_0-1.jsonl": "{"text":"Question: How large is the shear modulus computed using DFT with the PBE GGA functional of Cs2ZrCl6?\nAnswer: The shear modulus computed using DFT with the PBE GGA functional of Cs2ZrCl6 is 4.000 GPa."} {"text":"Question: How large is the shear modulus computed using DFT with the PBE functional of MgNi2Sb?\nAnswer: The shear modulus computed using DFT with the PBE functional of MgNi2Sb is 37.325 GPa."}", "/scratch/micpie/export/mp_shear_modulus/train_0-4.jsonl": "{"text":"A material with a shear modulus computed using DFT with the PBE GGA functional of 4.000 GPa is Cs2ZrCl6."} {"text":"A compound with a shear modulus computed using DFT with the PBE GGA functional of 37.325 GPa is MgNi2Sb."}", 
"/scratch/micpie/export/mp_shear_modulus/valid_0-3.jsonl": "{"text":"User: I want to design a with a shear modulus computed using DFT with the PBE GGA functional of 52.475 GPa.\nAssistant: Here is a with a shear modulus computed using DFT with the PBE GGA functional of 52.475 GPa: LiAlO2."} {"text":"User: I would like to design a material with a shear modulus derived from DFT simulations with the PBE functional of 47.197 GPa.\nAssistant: Here is a material with a shear modulus derived from DFT simulations with the PBE functional of 47.197 GPa: LiBeSb."}", "/scratch/micpie/export/mp_shear_modulus/test_0-4.jsonl": "{"text":"A material with a shear modulus computed using DFT with the PBE GGA functional of 77.378 GPa is GaNi3."} {"text":"A with a shear modulus derived from DFT simulations with the PBE functional of 33.254 GPa is BaSi2O5."}", "/scratch/micpie/export/compound_protein_protein/valid_5-2.jsonl": "{"text":"User: Can you come up with an example for a protein that binds the compound CccccC)n5-cccC=O)N\/N=C\\ccccO)cF)c6))))))))))ccc6C=O)O?\nAssistant: The compound CccccC)n5-cccC=O)N\/N=C\\ccccO)cF)c6))))))))))ccc6C=O)O targets for example the protein Class E basic helix-loop-helix protein 39.\nUser: Can you tell me a protein that interacts with protein Class E basic helix-loop-helix protein 39?\nAssistant: Yes, the protein Class E basic helix-loop-helix protein 39 interacts with Voltage-gated calcium channel subunit alpha Cav3.1."} {"text":"User: Can you give me an example for a protein that binds the compound [C][=C][C][N][C][C][C@][C][=C][C][=C][C][Branch1][=Branch1][C][Branch1][C][N][=O][=C][Ring1][=Branch2][O][C@@H1][Ring1][N][C@@H1][Branch1][C][O][C][=C][C@H1][Ring1][P][C@H1][Ring2][Ring1][Branch1][C][Ring1][P]?\nAssistant: The compound [C][=C][C][N][C][C][C@][C][=C][C][=C][C][Branch1][=Branch1][C][Branch1][C][N][=O][=C][Ring1][=Branch2][O][C@@H1][Ring1][N][C@@H1][Branch1][C][O][C][=C][C@H1][Ring1][P][C@H1][Ring2][Ring1][Branch1][C][Ring1][P] targets for 
example the protein MOR-1.\nUser: Can you tell me a protein that is ortholog to protein MOR-1?\nAssistant: Yes, of course, the protein MOR-1 is ortholog to M-OR-1."}", "/scratch/micpie/export/compound_protein_protein/train_10-2.jsonl": "{"text":"User: Can you come up with one example for a protein that binds the compound CnncCN)=O))cc5-cncNcccccc6CF)F)F)))))))))ncc6CC%10?\nAssistant: The compound CnncCN)=O))cc5-cncNcccccc6CF)F)F)))))))))ncc6CC%10 targets for example the protein Polo-like kinase 1.\nUser: Can you tell me a protein that interacts with protein Polo-like kinase 1?\nAssistant: Of course, the protein Polo-like kinase 1 interacts with Meiosis-specific kinetochore protein."} {"text":"User: Can you come up with one example for a protein that binds the compound COc1ccccc1CC(c1cccc(F)c1)N1CCNCC1?\nAssistant: The compound COc1ccccc1CC(c1cccc(F)c1)N1CCNCC1 targets for example the protein Sodium-dependent noradrenaline transporter.\nUser: Can you tell me a protein that is ortholog to protein Sodium-dependent noradrenaline transporter?\nAssistant: Sure, the protein Sodium-dependent noradrenaline transporter is ortholog to Protein fumin."}", "/scratch/micpie/export/compound_protein_protein/test_8-1.jsonl": "{"text":"The protein Bifunctional epoxide hydrolase 2 is targeted by [C][C][=C][N][=C][Branch2][Branch1][C][C][=Branch1][C][=O][N][C][C][C][Branch1][Branch1][C][C][Ring1][=Branch1][C][N][Branch2][Ring1][=Branch2][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][#Branch2][O][C][Branch1][C][F][Branch1][C][F][F][C][=C][Ring1][O][C][C][O][Ring2][Ring1][=Branch1][C][=N][Ring2][Ring1][P]. 
The protein Bifunctional epoxide hydrolase 2 interacts with Catalase."} {"text":"The protein Cell division protein kinase 2 is targeted by [C][C][Branch1][C][C][N][C][C][N][Branch2][Ring2][=C][C][C][=C][C][=C][Branch2][Ring1][P][N][C][=N][C][=C][C][=C][N][Branch1][#Branch1][C][Ring1][Branch1][=N][Ring1][=Branch2][C][Branch1][Branch2][C][C][C][C][C][Ring1][=Branch1][C][N][C][Ring1][=N][=O][N][=C][Ring2][Ring1][#Branch2][C][C][Ring2][Ring1][P]. The protein Cell division protein kinase 2 is ortholog to Cell division control protein 2 homolog A."}", "/scratch/micpie/export/compound_protein_protein/train_2-2.jsonl": "{"text":"User: Can you give me one example for a protein that binds the compound Cn1cc(-c2ccc3nnc(Sc4ccc5ncccc5c4)n3n2)cn1?\nAssistant: The compound Cn1cc(-c2ccc3nnc(Sc4ccc5ncccc5c4)n3n2)cn1 targets for example the protein MAPK 10.\nUser: Can you tell me a protein that interacts with protein MAPK 10?\nAssistant: Yes, the protein MAPK 10 interacts with Transcription factor AP-1 subunit JunD."} {"text":"User: Can you give me one example for a protein that binds the compound CCOccc=O)nC)cc6-cccNCCcccccc6))))))))S=O)=O)CC))))ccc6OccccF)cc6F?\nAssistant: The compound CCOccc=O)nC)cc6-cccNCCcccccc6))))))))S=O)=O)CC))))ccc6OccccF)cc6F targets for example the protein Bromodomain-containing protein 4.\nUser: Can you tell me a protein that is ortholog to protein Bromodomain-containing protein 4?\nAssistant: Sure, the protein Bromodomain-containing protein 4 is ortholog to Bromodomain-containing protein 4A."}", "/scratch/micpie/export/compound_protein_protein/test_4-0.jsonl": "{"text":"The compound 
InChI=1S\/C78H123N21O21\/c1-7-43(6)63(73(116)96-57(76(119)120)37-42(4)5)97-70(113)55(39-45-21-25-47(101)26-22-45)95-72(115)59-18-13-35-99(59)75(118)52(16-11-33-87-78(84)85)90-65(108)49(15-10-32-86-77(82)83)89-71(114)58-17-12-34-98(58)74(117)51(14-8-9-31-79)91-69(112)56(40-60(81)102)94-66(109)50(28-30-62(105)106)88-68(111)54(38-44-19-23-46(100)24-20-44)93-67(110)53(36-41(2)3)92-64(107)48(80)27-29-61(103)104\/h19-26,41-43,48-59,63,100-101H,7-18,27-40,79-80H2,1-6H3,(H2,81,102)(H,88,111)(H,89,114)(H,90,108)(H,91,112)(H,92,107)(H,93,110)(H,94,109)(H,95,115)(H,96,116)(H,97,113)(H,103,104)(H,105,106)(H,119,120)(H4,82,83,86)(H4,84,85,87)\/t43-,48-,49-,50-,51-,52-,53-,54-,55-,56-,57-,58-,59-,63-\/m0\/s1 targets the protein NTR1 which interacts with the protein Dopamine D2 receptor."} {"text":"The compound [C][=C][Branch1][=Branch2][C][N][C][C][C][C][Ring1][Ring1][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][C][C][C@H1][Ring1][#Branch1][N][C][=Branch1][C][=O][C][C@@H1][Branch2][Ring1][O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][=C][Ring1][#Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1] targets the protein B1R which is ortholog to the protein B1 bradykinin receptor."}", "/scratch/micpie/export/compound_protein_protein/valid_5-1.jsonl": "{"text":"The protein Myc proto-oncogene protein is targeted by Cc1ccc(C)n1-c1cc(C(=O)N\/N=C\\c2ccc(O)c(F)c2)ccc1C(=O)O. The protein Myc proto-oncogene protein interacts with NBR13."} {"text":"The protein M-OR-1 is targeted by InChI=1S\/C20H22N2O3\/c1-2-8-22-9-7-20-13-5-6-15(23)18(20)25-17-12(19(21)24)4-3-11(16(17)20)10-14(13)22\/h2-6,13-15,18,23H,1,7-10H2,(H2,21,24)\/t13-,14+,15-,18+,20-\/m0\/s1. 
The protein M-OR-1 is ortholog to MOR-1."}", "/scratch/micpie/export/compound_protein_protein/valid_8-2.jsonl": "{"text":"User: Can you come up with an example for a protein that binds the compound N#C[C@@H]1CCCN1C(=O)CN[C@@]12C[C@@H]3C[C@H](C1)C[C@@](OC(=O)NC1CCCCC1)(C3)C2?\nAssistant: The compound N#C[C@@H]1CCCN1C(=O)CN[C@@]12C[C@@H]3C[C@H](C1)C[C@@](OC(=O)NC1CCCCC1)(C3)C2 targets for example the protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26).\nUser: Can you tell me a protein that interacts with protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26)?\nAssistant: Of course, the protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26) interacts with C-X-C motif chemokine 2 (Growth-regulated protein beta) (Gro-beta) (Macrophage inflammatory protein 2-alpha) (MIP2-alpha)."} {"text":"User: Can you come up with an example for a protein that binds the compound InChI=1S\/C28H29FN8O3\/c1-15-9-23(33-35(15)5)32-26-18(25(30)39)13-36(34-26)21-7-6-8-22(19(21)14-38)37-27(40)24-16(12-31-37)10-17(11-20(24)29)28(2,3)4\/h6-13,38H,14H2,1-5H3,(H2,30,39)(H,32,33,34)?\nAssistant: The compound InChI=1S\/C28H29FN8O3\/c1-15-9-23(33-35(15)5)32-26-18(25(30)39)13-36(34-26)21-7-6-8-22(19(21)14-38)37-27(40)24-16(12-31-37)10-17(11-20(24)29)28(2,3)4\/h6-13,38H,14H2,1-5H3,(H2,30,39)(H,32,33,34) targets for example the protein B-cell progenitor kinase.\nUser: Can you tell me a protein that is ortholog to protein B-cell progenitor kinase?\nAssistant: Of course, the protein B-cell progenitor kinase is ortholog to Agammaglobulinemia tyrosine kinase."}", 
"/scratch/micpie/export/compound_protein_protein/test_1-2.jsonl": "{"text":"User: Can you come up with one example for a protein that binds the compound [C][C][C][=C][O][C][=Ring1][Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][C][Branch2][Ring1][#Branch2][O][C][=C][C][=N][C][C][=C][Branch1][#Branch2][C][=Branch1][C][=O][O][C][C][C][O][S][C][Ring1][S][=Ring1][N][=C][Ring2][Ring1][#Branch1]?\nAssistant: The compound [C][C][C][=C][O][C][=Ring1][Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][C][Branch2][Ring1][#Branch2][O][C][=C][C][=N][C][C][=C][Branch1][#Branch2][C][=Branch1][C][=O][O][C][C][C][O][S][C][Ring1][S][=Ring1][N][=C][Ring2][Ring1][#Branch1] targets for example the protein FLT.\nUser: Can you tell me a protein that interacts with protein FLT?\nAssistant: Of course, the protein FLT interacts with AlaRS."} {"text":"User: Can you come up with an example for a protein that binds the compound [C][C][N][C][Branch1][=Branch2][C][=N][O][N][=C][Ring1][Branch1][N][=N][C][=C][Branch1][O][C][#C][C][Branch1][C][C][Branch1][C][C][O][N][=C][C][Branch1][O][O][C][C@H1][C][C][C][N][C][Ring1][=Branch1][=C][Ring2][Ring1][Ring2][Ring2][Ring1][=N]?\nAssistant: The compound [C][C][N][C][Branch1][=Branch2][C][=N][O][N][=C][Ring1][Branch1][N][=N][C][=C][Branch1][O][C][#C][C][Branch1][C][C][Branch1][C][C][O][N][=C][C][Branch1][O][O][C][C@H1][C][C][C][N][C][Ring1][=Branch1][=C][Ring2][Ring1][Ring2][Ring2][Ring1][=N] targets for example the protein hHIPk2.\nUser: Can you tell me a protein that interacts with protein hHIPk2?\nAssistant: Yes, the protein hHIPk2 interacts with MAP kinase signal-integrating kinase 1."}", "/scratch/micpie/export/compound_protein_protein/test_3-2.jsonl": "{"text":"User: Can you give me one example for a protein that binds the compound COP(=O)(Cc1ccc(C(=O)Nc2cc(-c3cccs3)ccc2N)cc1)c1ccccc1?\nAssistant: The compound COP(=O)(Cc1ccc(C(=O)Nc2cc(-c3cccs3)ccc2N)cc1)c1ccccc1 targets for example the protein Protein decrotonylase HDAC1.\nUser: Can you tell me a protein 
that interacts with protein Protein decrotonylase HDAC1?\nAssistant: Sure, the protein Protein decrotonylase HDAC1 interacts with DNA."} {"text":"User: Can you come up with an example for a protein that binds the compound InChI=1S\/C18H21NO4S2\/c1-24(20,21)15-6-3-7-16(11-15)25(22,23)17-8-9-18-13(10-17)4-2-5-14(18)12-19\/h3,6-11,14H,2,4-5,12,19H2,1H3\/t14-\/m0\/s1?\nAssistant: The compound InChI=1S\/C18H21NO4S2\/c1-24(20,21)15-6-3-7-16(11-15)25(22,23)17-8-9-18-13(10-17)4-2-5-14(18)12-19\/h3,6-11,14H,2,4-5,12,19H2,1H3\/t14-\/m0\/s1 targets for example the protein 5-HT-6.\nUser: Can you tell me a protein that is ortholog to protein 5-HT-6?\nAssistant: Yes, of course, the protein 5-HT-6 is ortholog to 5-HT6."}", "/scratch/micpie/export/compound_protein_protein/valid_5-0.jsonl": "{"text":"The compound CccccC)n5-cccC=O)N\/N=C\\ccccO)cF)c6))))))))))ccc6C=O)O targets the protein Proto-oncogene c-Myc which interacts with the protein Voltage-dependent T-type calcium channel subunit alpha-1G."} {"text":"The compound [C][=C][C][N][C][C][C@][C][=C][C][=C][C][Branch1][=Branch1][C][Branch1][C][N][=O][=C][Ring1][=Branch2][O][C@@H1][Ring1][N][C@@H1][Branch1][C][O][C][=C][C@H1][Ring1][P][C@H1][Ring2][Ring1][Branch1][C][Ring1][P] targets the protein Mu opiate receptor which is ortholog to the protein Mu-type opioid receptor."}", "/scratch/micpie/export/compound_protein_protein/train_7-1.jsonl": "{"text":"The protein Glutathione-dependent formaldehyde dehydrogenase is targeted by CCOcccCNC=O)N=Cccccnc6))))))C6cccsc5)))))))))))cc[N+]=O)[O-]))c6O. The protein Glutathione-dependent formaldehyde dehydrogenase interacts with PQ-loop repeat-containing protein 1."} {"text":"The protein A disintegrin and metalloproteinase with thrombospondin motifs 11 is targeted by [O][=C][Branch1][C][O][C][C][S][C][Branch1][N][\/C][=C][\\N][C][=Branch1][C][=O][C][S][Ring1][=Branch1][=N][C][=Ring1][N][C][=C][C][=C][Branch1][C][Br][S][Ring1][=Branch1]. 
The protein A disintegrin and metalloproteinase with thrombospondin motifs 11 is ortholog to Aggrecanase-2."}", "/scratch/micpie/export/compound_protein_protein/test_9-0.jsonl": "{"text":"The compound O=C(c1cccc(NC(=O)c2ccccn2)c1)c1ccc2c(\/C=C\/c3ccccn3)n[nH]c2c1 targets the protein Fetal liver kinase 1 which interacts with the protein Cx32."} {"text":"The compound O=CCcccccc6)))))))N[C@@H]CccccOP=O)O)O)))cc6)))))))C=O)N[C@@H]CO))cncCccccCl)cCl)c6)))))))no5 targets the protein Proto-oncogene c-Src which is ortholog to the protein Proto-oncogene tyrosine-protein kinase Src."}", "/scratch/micpie/export/compound_protein_protein/valid_9-1.jsonl": "{"text":"The protein Tyrosine-protein kinase ZAP-70 is targeted by CC(=O)N[C@@H](Cc1ccc(OP(=O)(O)O)cc1)C(=O)N[C@@H](C)c1nc(Cc2ccc(C(F)(F)F)cc2)no1. The protein Tyrosine-protein kinase ZAP-70 interacts with WASH complex subunit 1."} {"text":"The protein Leukocyte C-terminal Src kinase is targeted by [C][O][C][=C][C][=N][C][=C][C][Branch2][Ring2][C][O][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][C][N][Ring1][#Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][=C][Ring2][Ring1][O][C][=C][Ring2][Ring1][#C][C][Branch1][C][N][=O]. 
The protein Leukocyte C-terminal Src kinase is ortholog to Lymphocyte cell-specific protein-tyrosine kinase."}", "/scratch/micpie/export/compound_protein_protein/valid_3-0.jsonl": "{"text":"The compound Nc1ccccc1NC(=O)CCCCNC(=O)c1csc(-c2ccncc2)n1 targets the protein HD1 which interacts with the protein DnaJ homolog subfamily C member 6."} {"text":"The compound CCOc1ccc(C[C@@H]2NC(=O)CCSSC[C@@H](C(=O)N3CCCC[C@H]3C(=O)N[C@@H](CC(C)C)C(=O)NCC(N)=O)NC(=O)[C@H](CC(N)=O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H]([C@@H](C)CC)NC2=O)cc1 targets the protein Oxytocin receptor which is ortholog to the protein OT-R."}", "/scratch/micpie/export/compound_protein_protein/test_0-1.jsonl": "{"text":"The protein A-T mutated is targeted by InChI=1S\/C17H30N4O3S\/c1-12(2)13(3)18-16(22)11-21-15(5)17(14(4)19-21)25(23,24)20-9-7-6-8-10-20\/h12-13H,6-11H2,1-5H3,(H,18,22). The protein A-T mutated interacts with PNR-2."} {"text":"The protein FLT is targeted by CCNC(=O)c1cc2nccc(Oc3cccc(C(=O)Nc4cc(C)ccc4F)c3)c2s1. The protein FLT interacts with CLA-1."}", "/scratch/micpie/export/compound_protein_protein/valid_10-0.jsonl": "{"text":"The compound InChI=1S\/C10H6N2O2\/c13-9-6-1-2-7-5(3-4-11-7)8(6)10(14)12-9\/h1-4,11H,(H,12,13,14) targets the protein Poly [ADP-ribose] polymerase 1 (PARP-1) (EC 2.4.2.30) (ADP-ribosyltransferase diphtheria toxin-like 1) (ARTD1) (DNA ADP-ribosyltransferase PARP1) (EC 2.4.2.-) (NAD(+) ADP-ribosyltransferase 1) (ADPRT 1) (Poly[ADP-ribose] synthase 1) (Protein poly-ADP-ribosyltransferase PARP1) (EC 2.4.2.-) which interacts with the protein C\/EBP alpha."} {"text":"The compound InChI=1S\/C20H26N4O2\/c1-2-3-4-8-11-24-17(15-9-6-5-7-10-15)13-18(23-24)22-20(26)16-12-19(25)21-14-16\/h5-7,9-10,13,16H,2-4,8,11-12,14H2,1H3,(H,21,25)(H,22,23,26) targets the protein Sodium\/glucose cotransporter 1 which is ortholog to the protein High-affinity proline transporter PutP."}", "/scratch/micpie/export/compound_protein_protein/test_5-0.jsonl": "{"text":"The compound 
CCcccc\/C=C\\SC=O)NCCCCCNC=O)CNC=O)ccccNcccc[N+]=O)[O-]))cnonc95))))))))))cc6))))))))))))))))C5=O)))))))cc6 targets the protein Myc proto-oncogene protein which interacts with the protein Voltage-dependent T-type calcium channel subunit alpha-1G."} {"text":"The compound CN(C(=O)Cc1ccc(O)cc1)[C@H](Cc1ccc(O)cc1)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N1CC[C@H](C(=O)N[C@@H](CCCNC(=N)N)C(N)=O)C1 targets the protein V1aR which is ortholog to the protein V1aR."}", "/scratch/micpie/export/compound_protein_protein/test_2-0.jsonl": "{"text":"The compound InChI=1S\/C21H27N7O3\/c1-4-28-18-15(30-12-13-6-5-9-23-10-13)11-24-14(7-8-21(2,3)29)16(18)25-20(28)17-19(22)27-31-26-17\/h11,13,23,29H,4-6,9-10,12H2,1-3H3,(H2,22,27)\/t13-\/m0\/s1 targets the protein Homeodomain-interacting protein kinase 2 which interacts with the protein WSB-1."} {"text":"The compound InChI=1S\/C25H23N2O3PS\/c1-30-31(29,21-6-3-2-4-7-21)17-18-9-11-19(12-10-18)25(28)27-23-16-20(13-14-22(23)26)24-8-5-15-32-24\/h2-16H,17,26H2,1H3,(H,27,28) targets the protein HD1 which interacts with the protein hRit1."}", "/scratch/micpie/export/compound_protein_protein/valid_2-2.jsonl": "{"text":"User: Can you give me one example for a protein that binds the compound InChI=1S\/C21H23N3O5\/c1-24(2)13-17-16-5-3-4-6-18(16)29-19(17)21(26)22-11-12-28-15-9-7-14(8-10-15)20(25)23-27\/h3-10,27H,11-13H2,1-2H3,(H,22,26)(H,23,25)?\nAssistant: The compound InChI=1S\/C21H23N3O5\/c1-24(2)13-17-16-5-3-4-6-18(16)29-19(17)21(26)22-11-12-28-15-9-7-14(8-10-15)20(25)23-27\/h3-10,27H,11-13H2,1-2H3,(H,22,26)(H,23,25) targets for example the protein Polyamine deacetylase HDAC10.\nUser: Can you tell me a protein that interacts with protein Polyamine deacetylase HDAC10?\nAssistant: Yes, of course, the protein Polyamine deacetylase HDAC10 interacts with Glycerone-phosphate O-acyltransferase."} {"text":"User: Can you give me one example for a protein that binds the compound 
[C][C][C][Branch1][Ring1][C][C][Branch2][Ring1][=Branch1][C][=C][C][=C][Branch1][=Branch2][O][C][C][=Branch1][C][=O][N][O][C][Branch1][C][C][=C][Ring1][=N][C][=C][C][=C][Branch1][O][O][C][C][Branch1][C][C][Branch1][C][C][O][C][Branch1][C][C][=C][Ring1][=N]?\nAssistant: The compound [C][C][C][Branch1][Ring1][C][C][Branch2][Ring1][=Branch1][C][=C][C][=C][Branch1][=Branch2][O][C][C][=Branch1][C][=O][N][O][C][Branch1][C][C][=C][Ring1][=N][C][=C][C][=C][Branch1][O][O][C][C][Branch1][C][C][Branch1][C][C][O][C][Branch1][C][C][=C][Ring1][=N] targets for example the protein RPD3-2.\nUser: Can you tell me a protein that is ortholog to protein RPD3-2?\nAssistant: Sure, the protein RPD3-2 is ortholog to AtHDAC9."}", "/scratch/micpie/export/compound_protein_protein/valid_0-0.jsonl": "{"text":"The compound CC1C=CCC2C(=O)N(c3cccc(C(=O)Nc4ccc(Br)cc4)c3)C(=O)C12 targets the protein A-T mutated which interacts with the protein Trefoil factor 1."} {"text":"The compound cccNcccc[nH]ccc5c9))))))))))csc-ccccCNCCNCCNCC6))))))))))cc6))))))cc5n9 targets the protein Tyrosine-protein kinase FRT which interacts with the protein CD36 and LIMPII analogous 1."}", "/scratch/micpie/export/compound_protein_protein/train_6-1.jsonl": "{"text":"The protein 5-HT2B is targeted by CNC(=O)[C@@]12C[C@@H]1[C@@H](n1cnc3c(NC(C4CC4)C4CC4)nc(Cl)nc31)[C@H](O)[C@@H]2O. The protein 5-HT2B interacts with Insulin-induced gene 1 protein."} {"text":"The protein GSH-FDH is targeted by InChI=1S\/C21H18N4O5S\/c1-2-30-16-9-14(8-15(20(16)26)25(28)29)19-17(13-5-7-31-11-13)18(23-21(27)24-19)12-4-3-6-22-10-12\/h3-11,17,19,26H,2H2,1H3,(H,24,27). 
The protein GSH-FDH interacts with Dioxin receptor, nuclear translocator."}", "/scratch/micpie/export/compound_protein_protein/train_7-2.jsonl": "{"text":"User: Can you come up with one example for a protein that binds the compound CCOc1cc(C2NC(=O)N=C(c3cccnc3)C2c2ccsc2)cc([N+](=O)[O-])c1O?\nAssistant: The compound CCOc1cc(C2NC(=O)N=C(c3cccnc3)C2c2ccsc2)cc([N+](=O)[O-])c1O targets for example the protein Alcohol dehydrogenase 5.\nUser: Can you tell me a protein that interacts with protein Alcohol dehydrogenase 5?\nAssistant: Yes, the protein Alcohol dehydrogenase 5 interacts with PQ-loop repeat-containing protein 1."} {"text":"User: Can you come up with an example for a protein that binds the compound O=C(O)Cc1sc(\/C=C2\\NC(=O)CS2)nc1-c1ccc(Br)s1?\nAssistant: The compound O=C(O)Cc1sc(\/C=C2\\NC(=O)CS2)nc1-c1ccc(Br)s1 targets for example the protein ADAMTS-5.\nUser: Can you tell me a protein that is ortholog to protein ADAMTS-5?\nAssistant: Sure, the protein ADAMTS-5 is ortholog to A disintegrin and metalloproteinase with thrombospondin motifs 5."}", "/scratch/micpie/export/compound_protein_protein/test_7-0.jsonl": "{"text":"The compound COc1cc2c(cc1-c1cccnc1)CCC(=O)N2 targets the protein Aldosterone synthase which is ortholog to the protein Aldosterone-synthesizing enzyme."} {"text":"The compound CC1(C)OCC(N)=N[C@](C)(c2cc(Nc3ccc(OCC(F)(F)F)nc3)ccc2F)C1(F)F targets the protein Beta-site APP cleaving enzyme 2 which is ortholog to the protein Beta-secretase 2."}", "/scratch/micpie/export/compound_protein_protein/test_8-2.jsonl": "{"text":"User: Can you give me one example for a protein that binds the compound Cc1cnc(C(=O)N2CCC3(CC2)CN(C(=O)Nc2ccc(OC(F)(F)F)cc2)CCO3)cn1?\nAssistant: The compound Cc1cnc(C(=O)N2CCC3(CC2)CN(C(=O)Nc2ccc(OC(F)(F)F)cc2)CCO3)cn1 targets for example the protein Bifunctional epoxide hydrolase 2.\nUser: Can you tell me a protein that interacts with protein Bifunctional epoxide hydrolase 2?\nAssistant: Sure, the protein Bifunctional epoxide 
hydrolase 2 interacts with Catalase."} {"text":"User: Can you come up with an example for a protein that binds the compound CCC)NCCNCccccNcnccccnc5n9))CCCCCC6)))))CNC6=O)))))))))))))nc6)))))))CC6?\nAssistant: The compound CCC)NCCNCccccNcnccccnc5n9))CCCCCC6)))))CNC6=O)))))))))))))nc6)))))))CC6 targets for example the protein Cyclin-dependent kinase 2.\nUser: Can you tell me a protein that is ortholog to protein Cyclin-dependent kinase 2?\nAssistant: Of course, the protein Cyclin-dependent kinase 2 is ortholog to CDC2aAt."}", "/scratch/micpie/export/compound_protein_protein/test_0-2.jsonl": "{"text":"User: Can you come up with an example for a protein that binds the compound Cc1nn(CC(=O)NC(C)C(C)C)c(C)c1S(=O)(=O)N1CCCCC1?\nAssistant: The compound Cc1nn(CC(=O)NC(C)C(C)C)c(C)c1S(=O)(=O)N1CCCCC1 targets for example the protein A-T mutated.\nUser: Can you tell me a protein that interacts with protein A-T mutated?\nAssistant: Of course, the protein A-T mutated interacts with Protein pS2."} {"text":"User: Can you come up with one example for a protein that binds the compound CCNC(=O)c1cc2nccc(Oc3cccc(C(=O)Nc4cc(C)ccc4F)c3)c2s1?\nAssistant: The compound CCNC(=O)c1cc2nccc(Oc3cccc(C(=O)Nc4cc(C)ccc4F)c3)c2s1 targets for example the protein FLT-1.\nUser: Can you tell me a protein that interacts with protein FLT-1?\nAssistant: Of course, the protein FLT-1 interacts with CLA-1."}", "/scratch/micpie/export/compound_protein_protein/test_3-0.jsonl": "{"text":"The compound COP(=O)(Cc1ccc(C(=O)Nc2cc(-c3cccs3)ccc2N)cc1)c1ccccc1 targets the protein Protein decrotonylase HDAC1 which interacts with the protein DNA."} {"text":"The compound CS=O)=O)cccccS=O)=O)cccccc6)CCC[C@H]6CN))))))))))))c6 targets the protein 5-HT-6 which is ortholog to the protein 5-HT6."}", "/scratch/micpie/export/compound_protein_protein/train_1-0.jsonl": "{"text":"The compound COc1cc2ncnc(N3CCN(\/C(S)=N\\Cc4ccc5c(c4)OCO5)CC3)c2cc1OC targets the protein CD antigen CD140a which interacts with the protein 
PRKR-interacting protein 1."} {"text":"The compound [C][N][C][C][N][Branch2][Ring2][Branch2][C][=C][C][=C][N][=C][Branch2][Ring1][=Branch1][C][=C][Branch1][C][N][C][=C][Branch1][C][F][C][=C][C][=C][Ring1][#Branch1][NH1][C][Ring1][N][=O][NH1][C][Ring2][Ring1][C][=C][Ring2][Ring1][=Branch1][C][C][Ring2][Ring1][N] targets the protein Nuclear Dbf2-related kinase 1 which is ortholog to the protein Serine\/threonine-protein kinase 38."}", "/scratch/micpie/export/compound_protein_protein/test_5-2.jsonl": "{"text":"User: Can you give me an example for a protein that binds the compound CCcccc\/C=C\\SC=O)NCCCCCNC=O)CNC=O)ccccNcccc[N+]=O)[O-]))cnonc95))))))))))cc6))))))))))))))))C5=O)))))))cc6?\nAssistant: The compound CCcccc\/C=C\\SC=O)NCCCCCNC=O)CNC=O)ccccNcccc[N+]=O)[O-]))cnonc95))))))))))cc6))))))))))))))))C5=O)))))))cc6 targets for example the protein Class E basic helix-loop-helix protein 39.\nUser: Can you tell me a protein that interacts with protein Class E basic helix-loop-helix protein 39?\nAssistant: Yes, the protein Class E basic helix-loop-helix protein 39 interacts with Voltage-dependent T-type calcium channel subunit alpha-1G."} {"text":"User: Can you come up with an example for a protein that binds the compound CN(C(=O)Cc1ccc(O)cc1)[C@H](Cc1ccc(O)cc1)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N1CC[C@H](C(=O)N[C@@H](CCCNC(=N)N)C(N)=O)C1?\nAssistant: The compound CN(C(=O)Cc1ccc(O)cc1)[C@H](Cc1ccc(O)cc1)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N1CC[C@H](C(=O)N[C@@H](CCCNC(=N)N)C(N)=O)C1 targets for example the protein V1aR.\nUser: Can you tell me a protein that is ortholog to protein V1aR?\nAssistant: Yes, the protein V1aR is ortholog to Vasopressin V1a receptor."}", "/scratch/micpie/export/compound_protein_protein/valid_1-2.jsonl": "{"text":"User: Can you give me one example for a protein that binds the compound 
[C][C][=C][C][=C][Branch2][Ring1][#C][N][C][=N][N][=C][Branch2][Ring1][Ring1][C][=C][C][=C][N][=C][Ring1][=Branch1][C][C][C][=C][C][=N][C][=C][Ring1][=Branch1][O][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][#Branch2][C][Branch1][C][F][Branch1][C][F][F]?\nAssistant: The compound [C][C][=C][C][=C][Branch2][Ring1][#C][N][C][=N][N][=C][Branch2][Ring1][Ring1][C][=C][C][=C][N][=C][Ring1][=Branch1][C][C][C][=C][C][=N][C][=C][Ring1][=Branch1][O][Ring2][Ring1][Ring1][C][=C][Ring2][Ring1][#Branch2][C][Branch1][C][F][Branch1][C][F][F] targets for example the protein Fetal liver kinase 1.\nUser: Can you tell me a protein that interacts with protein Fetal liver kinase 1?\nAssistant: Yes, the protein Fetal liver kinase 1 interacts with Cx32."} {"text":"User: Can you give me an example for a protein that binds the compound Nc1cc(C(F)(F)F)c(-c2cc(N3CCOCC3)nc(N3CCOCC3)n2)cn1?\nAssistant: The compound Nc1cc(C(F)(F)F)c(-c2cc(N3CCOCC3)nc(N3CCOCC3)n2)cn1 targets for example the protein Phosphoinositide-3-kinase catalytic gamma polypeptide.\nUser: Can you tell me a protein that is ortholog to protein Phosphoinositide-3-kinase catalytic gamma polypeptide?\nAssistant: Yes, of course, the protein Phosphoinositide-3-kinase catalytic gamma polypeptide is ortholog to PtdIns-3-kinase subunit p110-gamma."}", "/scratch/micpie/export/compound_protein_protein/valid_7-2.jsonl": "{"text":"User: Can you give me an example for a protein that binds the compound CcccC=O)NNccccF)cc6)))))))))no5?\nAssistant: The compound CcccC=O)NNccccF)cc6)))))))))no5 targets for example the protein hERV1.\nUser: Can you tell me a protein that interacts with protein hERV1?\nAssistant: Yes, the protein hERV1 interacts with Autophagy-related protein 7."} {"text":"User: Can you give me one example for a protein that binds the compound 
InChI=1S\/C24H36N4O3\/c25-14-20-7-4-8-28(20)21(29)15-26-23-10-17-9-18(11-23)13-24(12-17,16-23)31-22(30)27-19-5-2-1-3-6-19\/h17-20,26H,1-13,15-16H2,(H,27,30)\/t17-,18+,20-,23-,24-\/m0\/s1?\nAssistant: The compound InChI=1S\/C24H36N4O3\/c25-14-20-7-4-8-28(20)21(29)15-26-23-10-17-9-18(11-23)13-24(12-17,16-23)31-22(30)27-19-5-2-1-3-6-19\/h17-20,26H,1-13,15-16H2,(H,27,30)\/t17-,18+,20-,23-,24-\/m0\/s1 targets for example the protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26).\nUser: Can you tell me a protein that interacts with protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26)?\nAssistant: Sure, the protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26) interacts with Protachykinin-1 (PPT)."}", "/scratch/micpie/export/compound_protein_protein/test_0-0.jsonl": "{"text":"The compound [C][C][=N][N][Branch1][S][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C][Branch1][C][C][C][C][Branch1][C][C][=C][Ring1][#C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][C][C][C][Ring1][=Branch1] targets the protein Serine-protein kinase ATM which interacts with the protein Breast cancer estrogen-inducible protein."} {"text":"The compound CCNC=O)cccncccOcccccC=O)NcccC)ccc6F)))))))))c6)))))))c6s9 targets the protein Vascular endothelial growth factor receptor 1 which interacts with the protein CD antigen CD36."}", "/scratch/micpie/export/compound_protein_protein/test_6-0.jsonl": "{"text":"The compound InChI=1S\/C14H16F2N6\/c1-21-5-7-22(8-6-21)14-19-12(18-13(17)20-14)11-9(15)3-2-4-10(11)16\/h2-4H,5-8H2,1H3,(H2,17,18,19,20) targets the protein SP9144 which is 
ortholog to the protein HH4R."} {"text":"The compound COc1ccc(C2=NN(C(=O)CN3CCCCC3)C(c3ccccc3O)C2)cc1 targets the protein Amine oxidase [flavin-containing] A which is ortholog to the protein Monoamine oxidase type A."}", "/scratch/micpie/export/compound_protein_protein/train_2-0.jsonl": "{"text":"The compound Cn1cc(-c2ccc3nnc(Sc4ccc5ncccc5c4)n3n2)cn1 targets the protein SAPK1b which interacts with the protein Transcription factor JunD."} {"text":"The compound CCOccc=O)nC)cc6-cccNCCcccccc6))))))))S=O)=O)CC))))ccc6OccccF)cc6F targets the protein Bromodomain-containing protein 4 which is ortholog to the protein Bromodomain-containing protein 4A."}", "/scratch/micpie/export/compound_protein_protein/valid_2-0.jsonl": "{"text":"The compound CN(C)Cc1c(C(=O)NCCOc2ccc(C(=O)NO)cc2)oc2ccccc12 targets the protein Polyamine deacetylase HDAC10 which interacts with the protein Acyl-CoA:dihydroxyacetonephosphateacyltransferase."} {"text":"The compound CCC(CC)(c1ccc(OCC(=O)NO)c(C)c1)c1ccc(OCC(C)(C)O)c(C)c1 targets the protein Histone deacetylase 3 which is ortholog to the protein AtHDAC9."}", "/scratch/micpie/export/compound_protein_protein/test_7-1.jsonl": "{"text":"The protein Aldosterone-synthesizing enzyme is targeted by COcccccc6-ccccnc6))))))))CCC=O)N6. The protein Aldosterone-synthesizing enzyme is ortholog to Steroid 18-hydroxylase."} {"text":"The protein ASP1 is targeted by CC1(C)OCC(N)=N[C@](C)(c2cc(Nc3ccc(OCC(F)(F)F)nc3)ccc2F)C1(F)F. 
The protein ASP1 is ortholog to Beta-site amyloid precursor protein cleaving enzyme 2."}", "/scratch/micpie/export/compound_protein_protein/valid_4-2.jsonl": "{"text":"User: Can you give me one example for a protein that binds the compound CCOc1ccc(C[C@@H]2NC(=O)CCSSC[C@@H](C(=O)N3CCCC[C@H]3C(=O)N[C@@H](CC(C)C)C(=O)NCC(N)=O)NC(=O)[C@H](CC(N)=O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H]([C@@H](C)CC)NC2=O)cc1?\nAssistant: The compound CCOc1ccc(C[C@@H]2NC(=O)CCSSC[C@@H](C(=O)N3CCCC[C@H]3C(=O)N[C@@H](CC(C)C)C(=O)NCC(N)=O)NC(=O)[C@H](CC(N)=O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H]([C@@H](C)CC)NC2=O)cc1 targets for example the protein Oxytocin receptor.\nUser: Can you tell me a protein that is ortholog to protein Oxytocin receptor?\nAssistant: Yes, the protein Oxytocin receptor is ortholog to Oxytocin receptor."} {"text":"User: Can you come up with an example for a protein that binds the compound [C][C][=N][C][Branch2][Ring2][#Branch1][C][=C][Branch1][C][F][C][=C][Branch1][C][Cl][C][=C][Ring1][Branch2][C][=C][N][=C][Branch1][N][C][N][C][=Branch1][C][=O][N][Branch1][C][C][O][C][Branch1][C][F][=C][Ring1][=C][=N][O][Ring2][Ring1][O]?\nAssistant: The compound [C][C][=N][C][Branch2][Ring2][#Branch1][C][=C][Branch1][C][F][C][=C][Branch1][C][Cl][C][=C][Ring1][Branch2][C][=C][N][=C][Branch1][N][C][N][C][=Branch1][C][=O][N][Branch1][C][C][O][C][Branch1][C][F][=C][Ring1][=C][=N][O][Ring2][Ring1][O] targets for example the protein B1 bradykinin receptor.\nUser: Can you tell me a protein that is ortholog to protein B1 bradykinin receptor?\nAssistant: Of course, the protein B1 bradykinin receptor is ortholog to B1 bradykinin receptor."}", "/scratch/micpie/export/compound_protein_protein/test_4-2.jsonl": "{"text":"User: Can you give me one example for a protein that binds the compound 
CC[C@H]C)[C@H]NC=O)[C@H]CccccO)cc6)))))))NC=O)[C@@H]CCCN5C=O)[C@H]CCCNC=N)N))))))NC=O)[C@H]CCCNC=N)N))))))NC=O)[C@@H]CCCN5C=O)[C@H]CCCCN)))))NC=O)[C@H]CCN)=O)))NC=O)[C@H]CCC=O)O))))NC=O)[C@H]CccccO)cc6)))))))NC=O)[C@H]CCC)C)))NC=O)[C@@H]N)CCC=O)O)))))))))))))))))))))))))))))))))))))))))))C=O)N[C@@H]CCC)C)))C=O)O?\nAssistant: The compound CC[C@H]C)[C@H]NC=O)[C@H]CccccO)cc6)))))))NC=O)[C@@H]CCCN5C=O)[C@H]CCCNC=N)N))))))NC=O)[C@H]CCCNC=N)N))))))NC=O)[C@@H]CCCN5C=O)[C@H]CCCCN)))))NC=O)[C@H]CCN)=O)))NC=O)[C@H]CCC=O)O))))NC=O)[C@H]CccccO)cc6)))))))NC=O)[C@H]CCC)C)))NC=O)[C@@H]N)CCC=O)O)))))))))))))))))))))))))))))))))))))))))))C=O)N[C@@H]CCC)C)))C=O)O targets for example the protein NTR1.\nUser: Can you tell me a protein that interacts with protein NTR1?\nAssistant: Sure, the protein NTR1 interacts with Dopamine D2 receptor."} {"text":"User: Can you give me an example for a protein that binds the compound C=C(CNCC1CC1)c1ccc2c(c1)CCC[C@H]2NC(=O)C[C@@H](NS(=O)(=O)c1cccc(C(F)(F)F)c1)c1ccccc1?\nAssistant: The compound C=C(CNCC1CC1)c1ccc2c(c1)CCC[C@H]2NC(=O)C[C@@H](NS(=O)(=O)c1cccc(C(F)(F)F)c1)c1ccccc1 targets for example the protein BK-1 receptor.\nUser: Can you tell me a protein that is ortholog to protein BK-1 receptor?\nAssistant: Yes, the protein BK-1 receptor is ortholog to B1R."}", "/scratch/micpie/export/compound_protein_protein/valid_3-2.jsonl": "{"text":"User: Can you give me one example for a protein that binds the compound InChI=1S\/C20H21N5O2S\/c21-15-5-1-2-6-16(15)24-18(26)7-3-4-10-23-19(27)17-13-28-20(25-17)14-8-11-22-12-9-14\/h1-2,5-6,8-9,11-13H,3-4,7,10,21H2,(H,23,27)(H,24,26)?\nAssistant: The compound InChI=1S\/C20H21N5O2S\/c21-15-5-1-2-6-16(15)24-18(26)7-3-4-10-23-19(27)17-13-28-20(25-17)14-8-11-22-12-9-14\/h1-2,5-6,8-9,11-13H,3-4,7,10,21H2,(H,23,27)(H,24,26) targets for example the protein HD1.\nUser: Can you tell me a protein that interacts with protein HD1?\nAssistant: Of course, the protein HD1 interacts with DnaJ homolog subfamily C member 6."} 
{"text":"User: Can you give me one example for a protein that binds the compound [C][C][O][C][=C][C][=C][Branch2][Branch2][#Branch1][C][C@@H1][N][C][=Branch1][C][=O][C][C][S][S][C][C@@H1][Branch2][Ring2][=Branch1][C][=Branch1][C][=O][N][C][C][C][C][C@H1][Ring1][=Branch1][C][=Branch1][C][=O][N][C@@H1][Branch1][#Branch1][C][C][Branch1][C][C][C][C][=Branch1][C][=O][N][C][C][Branch1][C][N][=O][N][C][=Branch1][C][=O][C@H1][Branch1][#Branch1][C][C][Branch1][C][N][=O][N][C][=Branch1][C][=O][C@H1][Branch1][Branch2][C][C][C][Branch1][C][N][=O][N][C][=Branch1][C][=O][C@H1][Branch1][#Branch1][C@@H1][Branch1][C][C][C][C][N][C][Ring2][Branch1][N][=O][C][=C][Ring2][=Branch1][Ring2]?\nAssistant: The compound [C][C][O][C][=C][C][=C][Branch2][Branch2][#Branch1][C][C@@H1][N][C][=Branch1][C][=O][C][C][S][S][C][C@@H1][Branch2][Ring2][=Branch1][C][=Branch1][C][=O][N][C][C][C][C][C@H1][Ring1][=Branch1][C][=Branch1][C][=O][N][C@@H1][Branch1][#Branch1][C][C][Branch1][C][C][C][C][=Branch1][C][=O][N][C][C][Branch1][C][N][=O][N][C][=Branch1][C][=O][C@H1][Branch1][#Branch1][C][C][Branch1][C][N][=O][N][C][=Branch1][C][=O][C@H1][Branch1][Branch2][C][C][C][Branch1][C][N][=O][N][C][=Branch1][C][=O][C@H1][Branch1][#Branch1][C@@H1][Branch1][C][C][C][C][N][C][Ring2][Branch1][N][=O][C][=C][Ring2][=Branch1][Ring2] targets for example the protein Oxytocin receptor.\nUser: Can you tell me a protein that is ortholog to protein Oxytocin receptor?\nAssistant: Yes, the protein Oxytocin receptor is ortholog to OT-R."}", "/scratch/micpie/export/compound_protein_protein/valid_2-1.jsonl": "{"text":"The protein Histone deacetylase 10 is targeted by CN(C)Cc1c(C(=O)NCCOc2ccc(C(=O)NO)cc2)oc2ccccc12. The protein Histone deacetylase 10 interacts with DHAP-AT."} {"text":"The protein Protein deacetylase HDAC3 is targeted by CCC(CC)(c1ccc(OCC(=O)NO)c(C)c1)c1ccc(OCC(C)(C)O)c(C)c1. 
The protein Protein deacetylase HDAC3 is ortholog to Histone deacetylase 9."}", "/scratch/micpie/export/compound_protein_protein/valid_4-0.jsonl": "{"text":"The compound CCOc1ccc(C[C@@H]2NC(=O)CCSSC[C@@H](C(=O)N3CCCC[C@H]3C(=O)N[C@@H](CC(C)C)C(=O)NCC(N)=O)NC(=O)[C@H](CC(N)=O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H]([C@@H](C)CC)NC2=O)cc1 targets the protein OT-R which is ortholog to the protein Oxytocin receptor."} {"text":"The compound Cc1nc(-c2c(F)cc(Cl)cc2-c2cnc(CNC(=O)N(C)O)c(F)c2)no1 targets the protein BK-1 receptor which is ortholog to the protein B1R."}", "/scratch/micpie/export/compound_protein_protein/train_5-1.jsonl": "{"text":"The protein Nuclear receptor subfamily 2 group B member 2 is targeted by [C][C][Branch1][C][C][C][C][C][Branch1][C][C][Branch1][C][C][C][=C][C][Branch2][Ring2][Branch1][N][Branch1][S][C][=C][C][=C][Branch1][=Branch1][C][=Branch1][C][=O][O][C][=N][Ring1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][=C][C][=C][Ring2][Ring1][#Branch2][Ring2][Ring1][P]. The protein Nuclear receptor subfamily 2 group B member 2 interacts with Nuclear receptor subfamily 1 group H member 4."} {"text":"The protein Serotonin receptor 2B is targeted by CNC(=O)[C@@]12C[C@@H]1[C@@H](n1cnc3c(NC(C4CC4)C4CC4)nc(Cl)nc31)[C@H](O)[C@@H]2O. The protein Serotonin receptor 2B interacts with Solute carrier family 31 member 2."}", "/scratch/micpie/export/compound_protein_protein/test_2-1.jsonl": "{"text":"The protein hHIPk2 is targeted by InChI=1S\/C21H27N7O3\/c1-4-28-18-15(30-12-13-6-5-9-23-10-13)11-24-14(7-8-21(2,3)29)16(18)25-20(28)17-19(22)27-31-26-17\/h11,13,23,29H,4-6,9-10,12H2,1-3H3,(H2,22,27)\/t13-\/m0\/s1. 
The protein hHIPk2 interacts with WD repeat and SOCS box-containing protein 1."} {"text":"The protein Protein decrotonylase HDAC1 is targeted by [C][O][P][=Branch1][C][=O][Branch2][Ring2][Branch1][C][C][=C][C][=C][Branch2][Ring1][Branch2][C][=Branch1][C][=O][N][C][=C][C][Branch1][Branch2][C][=C][C][=C][S][Ring1][Branch1][=C][C][=C][Ring1][O][N][C][=C][Ring2][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]. The protein Protein decrotonylase HDAC1 interacts with COUP-TF-interacting protein 2."}", "/scratch/micpie/export/compound_protein_protein/train_9-2.jsonl": "{"text":"User: Can you give me one example for a protein that binds the compound CC(CO)\/N=C(\\N)c1c(O)nsc1Nc1ccc(Cc2ccccc2)cc1?\nAssistant: The compound CC(CO)\/N=C(\\N)c1c(O)nsc1Nc1ccc(Cc2ccccc2)cc1 targets for example the protein MAPKK 1.\nUser: Can you tell me a protein that is ortholog to protein MAPKK 1?\nAssistant: Of course, the protein MAPKK 1 is ortholog to AtMKK6."} {"text":"User: Can you come up with an example for a protein that binds the compound [C][N][N][=C][Branch1][=Branch1][C][Branch1][C][N][=O][C][=C][Ring1][Branch2][C][=N][C][Branch2][Ring1][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Branch1][C][F][Branch1][C][F][F][=N][C][=C][Ring1][P][C][C][Ring2][Ring1][Branch1]?\nAssistant: The compound [C][N][N][=C][Branch1][=Branch1][C][Branch1][C][N][=O][C][=C][Ring1][Branch2][C][=N][C][Branch2][Ring1][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Branch1][C][F][Branch1][C][F][F][=N][C][=C][Ring1][P][C][C][Ring2][Ring1][Branch1] targets for example the protein Serine\/threonine-protein kinase 13.\nUser: Can you tell me a protein that interacts with protein Serine\/threonine-protein kinase 13?\nAssistant: Sure, the protein Serine\/threonine-protein kinase 13 interacts with p21-activated kinase 3."}", "/scratch/micpie/export/compound_protein_protein/train_0-0.jsonl": "{"text":"The compound 
[C][C][=C][C][=C][Branch2][Ring1][#Branch2][S][C][Branch1][P][C][C][=Branch1][C][=O][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][=Branch1][C][=O][O][C][=C][Ring2][Ring1][Branch1] targets the protein A-T mutated which interacts with the protein hP1.A."} {"text":"The compound [C][O][C][=C][C][=N][C][=N][C][Branch2][Ring2][C][N][C][C][N][Branch2][Ring1][=Branch1][\/C][Branch1][C][S][=N][\\C][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O][Ring1][=Branch1][C][C][Ring2][Ring1][Ring1][=C][Ring2][Ring1][=Branch2][C][=C][Ring2][Ring1][=N][O][C] targets the protein CD140 antigen-like family member A which interacts with the protein Platelet-derived growth factor C (PDGF-C) (Fallotein) (Spinal cord-derived growth factor) (SCDGF) (VEGF-E)."}", "/scratch/micpie/export/compound_protein_protein/test_1-1.jsonl": "{"text":"The protein Vascular permeability factor receptor is targeted by [C][C][C][=C][O][C][=Ring1][Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][C][Branch2][Ring1][#Branch2][O][C][=C][C][=N][C][C][=C][Branch1][#Branch2][C][=Branch1][C][=O][O][C][C][C][O][S][C][Ring1][S][=Ring1][N][=C][Ring2][Ring1][#Branch1]. The protein Vascular permeability factor receptor interacts with Alanyl-tRNA synthetase."} {"text":"The protein Homeodomain-interacting protein kinase 2 is targeted by CCn1c(-c2nonc2N)nc2c(C#CC(C)(C)O)ncc(OC[C@H]3CCCNC3)c21. 
The protein Homeodomain-interacting protein kinase 2 interacts with MAPK signal-integrating kinase 1."}", "/scratch/micpie/export/compound_protein_protein/valid_9-0.jsonl": "{"text":"The compound CC(=O)N[C@@H](Cc1ccc(OP(=O)(O)O)cc1)C(=O)N[C@@H](C)c1nc(Cc2ccc(C(F)(F)F)cc2)no1 targets the protein Syk-related tyrosine kinase which interacts with the protein WAS protein family homolog 1."} {"text":"The compound InChI=1S\/C26H21ClN4O5\/c1-34-23-14-20-18(13-19(23)25(28)32)22(8-9-29-20)36-17-6-7-21-24(12-17)35-11-10-31(21)26(33)30-16-4-2-15(27)3-5-16\/h2-9,12-14H,10-11H2,1H3,(H2,28,32)(H,30,33) targets the protein Proto-oncogene Lck which is ortholog to the protein Proto-oncogene tyrosine-protein kinase LCK."}", "/scratch/micpie/export/compound_protein_protein/train_5-2.jsonl": "{"text":"User: Can you give me one example for a protein that binds the compound CCC)CCCC)C)cccNccccC=O)O))cn6))))))S=O)=O)ccccCl)cc6))))))))ccc6%10?\nAssistant: The compound CCC)CCCC)C)cccNccccC=O)O))cn6))))))S=O)=O)ccccCl)cc6))))))))ccc6%10 targets for example the protein Retinoic acid receptor RXR-beta.\nUser: Can you tell me a protein that interacts with protein Retinoic acid receptor RXR-beta?\nAssistant: Yes, the protein Retinoic acid receptor RXR-beta interacts with Farnesol receptor HRR-1."} {"text":"User: Can you give me an example for a protein that binds the compound CNC(=O)[C@@]12C[C@@H]1[C@@H](n1cnc3c(NC(C4CC4)C4CC4)nc(Cl)nc31)[C@H](O)[C@@H]2O?\nAssistant: The compound CNC(=O)[C@@]12C[C@@H]1[C@@H](n1cnc3c(NC(C4CC4)C4CC4)nc(Cl)nc31)[C@H](O)[C@@H]2O targets for example the protein 5-HT2B.\nUser: Can you tell me a protein that interacts with protein 5-HT2B?\nAssistant: Yes, of course, the protein 5-HT2B interacts with Copper transporter 2."}", "/scratch/micpie/export/compound_protein_protein/train_8-1.jsonl": "{"text":"The protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen 
CD26) (TP103) (CD antigen CD26) is targeted by InChI=1S\/C16H16N4O2\/c17-10-12-3-5-13(6-4-12)16(22)19-8-7-15(21)20-9-1-2-14(20)11-18\/h3-6,14H,1-2,7-9H2,(H,19,22)\/t14-\/m0\/s1. The protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26) interacts with Pro-neuropeptide Y."} {"text":"The protein Dual specificity mitogen-activated protein kinase kinase 1 is targeted by [C][C][Branch1][Ring1][C][O][\/N][=C][Branch1][C][\\N][C][C][Branch1][C][O][=N][S][C][=Ring1][=Branch1][N][C][=C][C][=C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][=N]. The protein Dual specificity mitogen-activated protein kinase kinase 1 is ortholog to MAPK kinase."}", "/scratch/micpie/export/compound_protein_protein/train_8-0.jsonl": "{"text":"The compound N#Cc1ccc(C(=O)NCCC(=O)N2CCC[C@H]2C#N)cc1 targets the protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26) which interacts with the protein Pro-neuropeptide Y."} {"text":"The compound InChI=1S\/C20H22N4O2S\/c1-13(12-25)22-18(21)17-19(26)24-27-20(17)23-16-9-7-15(8-10-16)11-14-5-3-2-4-6-14\/h2-10,13,23,25H,11-12H2,1H3,(H2,21,22)(H,24,26) targets the protein MAPK\/ERK kinase 1 which is ortholog to the protein MAPK kinase."}", "/scratch/micpie/export/compound_protein_protein/test_10-1.jsonl": "{"text":"The protein Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform is targeted by InChI=1S\/C26H25ClN6O2\/c1-3-23-21(7-4-18-5-9-24(28)29-15-18)25(31-16-30-23)19-6-8-20(22(27)14-19)26(35)33-12-10-32(11-13-33)17(2)34\/h5-6,8-9,14-16H,3,10-13H2,1-2H3,(H2,28,29). 
The protein Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform interacts with A proliferation-inducing ligand."} {"text":"The protein Sodium-dependent noradrenaline transporter is targeted by [O][=C][N][C][C][Branch1][N][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1][C][Ring1][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1]. The protein Sodium-dependent noradrenaline transporter is ortholog to Protein fumin."}", "/scratch/micpie/export/compound_protein_protein/test_5-1.jsonl": "{"text":"The protein Proto-oncogene c-Myc is targeted by CCc1ccc(\/C=C2\\SC(=O)N(CCCCCNC(=O)CNC(=O)c3ccc(Nc4ccc([N+](=O)[O-])c5nonc45)cc3)C2=O)cc1. The protein Proto-oncogene c-Myc interacts with Cav3.1c."} {"text":"The protein V1aR is targeted by CNC=O)CccccO)cc6))))))))[C@H]CccccO)cc6)))))))C=O)N[C@@H]Ccccccc6)))))))C=O)N[C@@H]CCCN)=O))))C=O)N[C@@H]CCN)=O)))C=O)N[C@@H]CCCNC=N)N))))))C=O)NCC[C@H]C=O)N[C@@H]CCCNC=N)N))))))CN)=O)))))C5. The protein V1aR is ortholog to Vasopressin V1a receptor."}", "/scratch/micpie/export/compound_protein_protein/train_4-1.jsonl": "{"text":"The protein CX5 is targeted by InChI=1S\/C32H31Cl2N7O\/c1-21(2)37-31(42)39-32(22-8-4-3-5-9-22)16-18-40(19-17-32)29-27-30(36-20-35-29)41(24-14-12-23(33)13-15-24)28(38-27)25-10-6-7-11-26(25)34\/h3-15,20-21H,16-19H2,1-2H3,(H2,37,39,42). The protein CX5 interacts with C-C motif chemokine 4 (G-26 T-lymphocyte-secreted protein) (HC21) (Lymphocyte activation gene 1 protein) (LAG-1) (MIP-1-beta(1-69)) (Macrophage inflammatory protein 1-beta) (MIP-1-beta) (PAT 744) (Protein H400) (SIS-gamma) (Small-inducible cytokine A4) (T-cell activation protein 2) (ACT-2)."} {"text":"The protein Nuclear receptor subfamily 2 group B member 2 is targeted by InChI=1S\/C26H27ClN2O4S\/c1-25(2)13-14-26(3,4)22-15-19(8-11-21(22)25)29(23-12-5-17(16-28-23)24(30)31)34(32,33)20-9-6-18(27)7-10-20\/h5-12,15-16H,13-14H2,1-4H3,(H,30,31). 
The protein Nuclear receptor subfamily 2 group B member 2 interacts with SPC4."}", "/scratch/micpie/export/compound_protein_protein/valid_0-2.jsonl": "{"text":"User: Can you give me an example for a protein that binds the compound CCC=CCCC=O)NcccccC=O)NccccBr)cc6))))))))c6))))))C=O)C95?\nAssistant: The compound CCC=CCCC=O)NcccccC=O)NccccBr)cc6))))))))c6))))))C=O)C95 targets for example the protein Serine-protein kinase ATM.\nUser: Can you tell me a protein that interacts with protein Serine-protein kinase ATM?\nAssistant: Yes, the protein Serine-protein kinase ATM interacts with PNR-2."} {"text":"User: Can you give me an example for a protein that binds the compound c1cc(Nc2ccc3[nH]ccc3c2)c2sc(-c3ccc(CNCCN4CCNCC4)cc3)cc2n1?\nAssistant: The compound c1cc(Nc2ccc3[nH]ccc3c2)c2sc(-c3ccc(CNCCN4CCNCC4)cc3)cc2n1 targets for example the protein Vascular endothelial growth factor receptor 1.\nUser: Can you tell me a protein that interacts with protein Vascular endothelial growth factor receptor 1?\nAssistant: Yes, of course, the protein Vascular endothelial growth factor receptor 1 interacts with CLA-1."}", "/scratch/micpie/export/compound_protein_protein/train_5-0.jsonl": "{"text":"The compound InChI=1S\/C26H27ClN2O4S\/c1-25(2)13-14-26(3,4)22-15-19(8-11-21(22)25)29(23-12-5-17(16-28-23)24(30)31)34(32,33)20-9-6-18(27)7-10-20\/h5-12,15-16H,13-14H2,1-4H3,(H,30,31) targets the protein Nuclear receptor subfamily 2 group B member 2 which interacts with the protein RXR-interacting protein 14."} {"text":"The compound CNC(=O)[C@@]12C[C@@H]1[C@@H](n1cnc3c(NC(C4CC4)C4CC4)nc(Cl)nc31)[C@H](O)[C@@H]2O targets the protein Serotonin receptor 2B which interacts with the protein hCTR2."}", "/scratch/micpie/export/compound_protein_protein/test_6-2.jsonl": "{"text":"User: Can you give me one example for a protein that binds the compound CN1CCN(c2nc(N)nc(-c3c(F)cccc3F)n2)CC1?\nAssistant: The compound CN1CCN(c2nc(N)nc(-c3c(F)cccc3F)n2)CC1 targets for example the protein GPRv53.\nUser: Can you 
tell me a protein that is ortholog to protein GPRv53?\nAssistant: Sure, the protein GPRv53 is ortholog to Histamine H4 receptor."} {"text":"User: Can you come up with an example for a protein that binds the compound COc1ccc(C2=NN(C(=O)CN3CCCCC3)C(c3ccccc3O)C2)cc1?\nAssistant: The compound COc1ccc(C2=NN(C(=O)CN3CCCCC3)C(c3ccccc3O)C2)cc1 targets for example the protein MAO-A.\nUser: Can you tell me a protein that is ortholog to protein MAO-A?\nAssistant: Sure, the protein MAO-A is ortholog to Monoamine oxidase type A."}", "/scratch/micpie/export/compound_protein_protein/valid_0-1.jsonl": "{"text":"The protein Ataxia telangiectasia mutated is targeted by CC1C=CCC2C(=O)N(c3cccc(C(=O)Nc4ccc(Br)cc4)c3)C(=O)C12. The protein Ataxia telangiectasia mutated interacts with PNR-2."} {"text":"The protein FLT is targeted by cccNcccc[nH]ccc5c9))))))))))csc-ccccCNCCNCCNCC6))))))))))cc6))))))cc5n9. The protein FLT interacts with Scavenger receptor class B member 1."}", "/scratch/micpie/export/compound_protein_protein/valid_7-1.jsonl": "{"text":"The protein hERV1 is targeted by [C][C][=C][C][Branch2][Ring1][C][C][=Branch1][C][=O][N][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][=N][O][Ring1][S]. The protein hERV1 interacts with Autophagy-related protein 7."} {"text":"The protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26) is targeted by N#C[C@@H]1CCCN1C(=O)CN[C@@]12C[C@@H]3C[C@H](C1)C[C@@](OC(=O)NC1CCCCC1)(C3)C2. The protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26) interacts with Protachykinin-1 (PPT)."}", "/scratch/micpie/export/compound_protein_protein/train_10-1.jsonl": "{"text":"The protein PLK-1 is targeted by Cn1nc(C(N)=O)c2c1-c1nc(Nc3ccccc3C(F)(F)F)ncc1CC2. 
The protein PLK-1 interacts with Meiosis-specific kinetochore protein."} {"text":"The protein Solute carrier family 6 member 2 is targeted by COcccccc6CCcccccF)c6))))))NCCNCC6. The protein Solute carrier family 6 member 2 is ortholog to Protein fumin."}", "/scratch/micpie/export/compound_protein_protein/train_2-1.jsonl": "{"text":"The protein Mitogen-activated protein kinase 10 is targeted by Cn1cc(-c2ccc3nnc(Sc4ccc5ncccc5c4)n3n2)cn1. The protein Mitogen-activated protein kinase 10 interacts with Transcription factor AP-1 subunit JunD."} {"text":"The protein Protein HUNK1 is targeted by CCOc1cc(=O)n(C)cc1-c1cc(N(CCc2ccccc2)S(=O)(=O)CC)ccc1Oc1ccc(F)cc1F. The protein Protein HUNK1 is ortholog to Bromodomain-containing protein 4A."}", "/scratch/micpie/export/compound_protein_protein/valid_1-1.jsonl": "{"text":"The protein CD antigen CD309 is targeted by Cc1ccc(Nc2nnc(-c3cccnc3CCc3ccncc3)o2)cc1C(F)(F)F. The protein CD antigen CD309 interacts with Gap junction beta-1 protein."} {"text":"The protein Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit gamma isoform is targeted by NcccCF)F)F))c-cccNCCOCC6))))))ncNCCOCC6))))))n6))))))cn6. The protein Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit gamma isoform is ortholog to p120-PI3K."}", "/scratch/micpie/export/compound_protein_protein/test_3-1.jsonl": "{"text":"The protein Protein decrotonylase HDAC1 is targeted by COP=O)CccccC=O)Nccc-ccccs5)))))ccc6N)))))))))cc6)))))))cccccc6. The protein Protein decrotonylase HDAC1 interacts with cytosine-5)-methyltransferase 3-like."} {"text":"The protein Serotonin receptor 6 is targeted by CS(=O)(=O)c1cccc(S(=O)(=O)c2ccc3c(c2)CCC[C@H]3CN)c1. 
The protein Serotonin receptor 6 is ortholog to Serotonin receptor 6."}", "/scratch/micpie/export/compound_protein_protein/train_9-0.jsonl": "{"text":"The compound CC(CO)\/N=C(\\N)c1c(O)nsc1Nc1ccc(Cc2ccccc2)cc1 targets the protein MAP kinase kinase 1 which is ortholog to the protein Protein Arabidopsis NQK1 homolog."} {"text":"The compound Cn1nc(C(N)=O)c2c1-c1nc(Nc3ccccc3C(F)(F)F)ncc1CC2 targets the protein Serine\/threonine-protein kinase PLK1 which interacts with the protein p21-activated kinase 3."}", "/scratch/micpie/export/compound_protein_protein/valid_9-2.jsonl": "{"text":"User: Can you come up with one example for a protein that binds the compound InChI=1S\/C23H24F3N4O7P\/c1-13(22-29-20(30-36-22)12-16-3-7-17(8-4-16)23(24,25)26)27-21(32)19(28-14(2)31)11-15-5-9-18(10-6-15)37-38(33,34)35\/h3-10,13,19H,11-12H2,1-2H3,(H,27,32)(H,28,31)(H2,33,34,35)\/t13-,19-\/m0\/s1?\nAssistant: The compound InChI=1S\/C23H24F3N4O7P\/c1-13(22-29-20(30-36-22)12-16-3-7-17(8-4-16)23(24,25)26)27-21(32)19(28-14(2)31)11-15-5-9-18(10-6-15)37-38(33,34)35\/h3-10,13,19H,11-12H2,1-2H3,(H,27,32)(H,28,31)(H2,33,34,35)\/t13-,19-\/m0\/s1 targets for example the protein Syk-related tyrosine kinase.\nUser: Can you tell me a protein that interacts with protein Syk-related tyrosine kinase?\nAssistant: Yes, of course, the protein Syk-related tyrosine kinase interacts with WASH complex subunit 1."} {"text":"User: Can you come up with one example for a protein that binds the compound [C][O][C][=C][C][=N][C][=C][C][Branch2][Ring2][C][O][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][C][N][Ring1][#Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][=C][Ring2][Ring1][O][C][=C][Ring2][Ring1][#C][C][Branch1][C][N][=O]?\nAssistant: The compound 
[C][O][C][=C][C][=N][C][=C][C][Branch2][Ring2][C][O][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][C][N][Ring1][#Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][=C][Ring2][Ring1][O][C][=C][Ring2][Ring1][#C][C][Branch1][C][N][=O] targets for example the protein Lymphocyte cell-specific protein-tyrosine kinase.\nUser: Can you tell me a protein that is ortholog to protein Lymphocyte cell-specific protein-tyrosine kinase?\nAssistant: Of course, the protein Lymphocyte cell-specific protein-tyrosine kinase is ortholog to p56-LCK."}", "/scratch/micpie/export/compound_protein_protein/test_1-0.jsonl": "{"text":"The compound Cc1ccoc1C(=O)Nc1cccc(Oc2ccnc3cc(C(=O)OCCCO)sc23)c1 targets the protein Fms-like tyrosine kinase 1 which interacts with the protein AlaRS."} {"text":"The compound CCn1c(-c2nonc2N)nc2c(C#CC(C)(C)O)ncc(OC[C@H]3CCCNC3)c21 targets the protein hHIPk2 which interacts with the protein Mnk1."}", "/scratch/micpie/export/compound_protein_protein/train_0-2.jsonl": "{"text":"User: Can you come up with an example for a protein that binds the compound Cc1ccc(SC(CC(=O)c2ccc(F)cc2)C(=O)O)cc1?\nAssistant: The compound Cc1ccc(SC(CC(=O)c2ccc(F)cc2)C(=O)O)cc1 targets for example the protein Serine-protein kinase ATM.\nUser: Can you tell me a protein that interacts with protein Serine-protein kinase ATM?\nAssistant: Yes, the protein Serine-protein kinase ATM interacts with Breast cancer estrogen-inducible protein."} {"text":"User: Can you come up with an example for a protein that binds the compound COc1cc2ncnc(N3CCN(\/C(S)=N\\Cc4ccc5c(c4)OCO5)CC3)c2cc1OC?\nAssistant: The compound COc1cc2ncnc(N3CCN(\/C(S)=N\\Cc4ccc5c(c4)OCO5)CC3)c2cc1OC targets for example the protein PDGFR-2.\nUser: Can you tell me a protein that interacts with protein PDGFR-2?\nAssistant: Yes, the protein PDGFR-2 interacts with Platelet-derived growth factor C (PDGF-C) (Fallotein) (Spinal cord-derived growth factor) (SCDGF) (VEGF-E)."}", 
"/scratch/micpie/export/compound_protein_protein/train_4-2.jsonl": "{"text":"User: Can you come up with an example for a protein that binds the compound CC(C)NC(=O)NC1(c2ccccc2)CCN(c2ncnc3c2nc(-c2ccccc2Cl)n3-c2ccc(Cl)cc2)CC1?\nAssistant: The compound CC(C)NC(=O)NC1(c2ccccc2)CCN(c2ncnc3c2nc(-c2ccccc2Cl)n3-c2ccc(Cl)cc2)CC1 targets for example the protein CB-2.\nUser: Can you tell me a protein that interacts with protein CB-2?\nAssistant: Of course, the protein CB-2 interacts with C-C motif chemokine 4 (G-26 T-lymphocyte-secreted protein) (HC21) (Lymphocyte activation gene 1 protein) (LAG-1) (MIP-1-beta(1-69)) (Macrophage inflammatory protein 1-beta) (MIP-1-beta) (PAT 744) (Protein H400) (SIS-gamma) (Small-inducible cytokine A4) (T-cell activation protein 2) (ACT-2)."} {"text":"User: Can you give me one example for a protein that binds the compound CCC)CCCC)C)cccNccccC=O)O))cn6))))))S=O)=O)ccccCl)cc6))))))))ccc6%10?\nAssistant: The compound CCC)CCCC)C)cccNccccC=O)O))cn6))))))S=O)=O)ccccCl)cc6))))))))ccc6%10 targets for example the protein Retinoid X receptor beta.\nUser: Can you tell me a protein that interacts with protein Retinoid X receptor beta?\nAssistant: Yes, the protein Retinoid X receptor beta interacts with Subtilisin-like proprotein convertase 4."}", "/scratch/micpie/export/compound_protein_protein/test_6-1.jsonl": "{"text":"The protein G-protein coupled receptor 105 is targeted by CN1CCN(c2nc(N)nc(-c3c(F)cccc3F)n2)CC1. The protein G-protein coupled receptor 105 is ortholog to HH4R."} {"text":"The protein Monoamine oxidase type A is targeted by InChI=1S\/C23H27N3O3\/c1-29-18-11-9-17(10-12-18)20-15-21(19-7-3-4-8-22(19)27)26(24-20)23(28)16-25-13-5-2-6-14-25\/h3-4,7-12,21,27H,2,5-6,13-16H2,1H3. 
The protein Monoamine oxidase type A is ortholog to Monoamine oxidase type A."}", "/scratch/micpie/export/compound_protein_protein/valid_4-1.jsonl": "{"text":"The protein OT-R is targeted by CCOc1ccc(C[C@@H]2NC(=O)CCSSC[C@@H](C(=O)N3CCCC[C@H]3C(=O)N[C@@H](CC(C)C)C(=O)NCC(N)=O)NC(=O)[C@H](CC(N)=O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H]([C@@H](C)CC)NC2=O)cc1. The protein OT-R is ortholog to OT-R."} {"text":"The protein B1R is targeted by Cc1nc(-c2c(F)cc(Cl)cc2-c2cnc(CNC(=O)N(C)O)c(F)c2)no1. The protein B1R is ortholog to B1 bradykinin receptor."}", "/scratch/micpie/export/compound_protein_protein/test_2-2.jsonl": "{"text":"User: Can you come up with an example for a protein that binds the compound [C][C][N][C][Branch1][=Branch2][C][=N][O][N][=C][Ring1][Branch1][N][=N][C][=C][Branch1][O][C][#C][C][Branch1][C][C][Branch1][C][C][O][N][=C][C][Branch1][O][O][C][C@H1][C][C][C][N][C][Ring1][=Branch1][=C][Ring2][Ring1][Ring2][Ring2][Ring1][=N]?\nAssistant: The compound [C][C][N][C][Branch1][=Branch2][C][=N][O][N][=C][Ring1][Branch1][N][=N][C][=C][Branch1][O][C][#C][C][Branch1][C][C][Branch1][C][C][O][N][=C][C][Branch1][O][O][C][C@H1][C][C][C][N][C][Ring1][=Branch1][=C][Ring2][Ring1][Ring2][Ring2][Ring1][=N] targets for example the protein hHIPk2.\nUser: Can you tell me a protein that interacts with protein hHIPk2?\nAssistant: Sure, the protein hHIPk2 interacts with WD repeat and SOCS box-containing protein 1."} {"text":"User: Can you come up with one example for a protein that binds the compound COP=O)CccccC=O)Nccc-ccccs5)))))ccc6N)))))))))cc6)))))))cccccc6?\nAssistant: The compound COP=O)CccccC=O)Nccc-ccccs5)))))ccc6N)))))))))cc6)))))))cccccc6 targets for example the protein Protein deacetylase HDAC1.\nUser: Can you tell me a protein that interacts with protein Protein deacetylase HDAC1?\nAssistant: Sure, the protein Protein deacetylase HDAC1 interacts with Radiation-induced tumor suppressor gene 1 protein."}", "/scratch/micpie/export/compound_protein_protein/train_1-1.jsonl": 
"{"text":"The protein CD140a antigen is targeted by [C][O][C][=C][C][=N][C][=N][C][Branch2][Ring2][C][N][C][C][N][Branch2][Ring1][=Branch1][\/C][Branch1][C][S][=N][\\C][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O][Ring1][=Branch1][C][C][Ring2][Ring1][Ring1][=C][Ring2][Ring1][=Branch2][C][=C][Ring2][Ring1][=N][O][C]. The protein CD140a antigen interacts with PRKR-interacting protein 1."} {"text":"The protein NDR1 protein kinase is targeted by CN1CCN(c2ccc3nc(-c4c(N)c5c(F)cccc5[nH]c4=O)[nH]c3c2)CC1. The protein NDR1 protein kinase is ortholog to Serine\/threonine-protein kinase 38."}", "/scratch/micpie/export/compound_protein_protein/valid_7-0.jsonl": "{"text":"The compound Cc1cc(C(=O)NNc2ccc(F)cc2)no1 targets the protein hERV1 which interacts with the protein Ubiquitin-like modifier-activating enzyme ATG7."} {"text":"The compound N#C[C@@H]1CCCN1C(=O)CN[C@]12C[C@H]3C[C@@H](C1)C[C@](OC(=O)NC1CCCCC1)(C3)C2 targets the protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26) which interacts with the protein Protachykinin-1 (PPT)."}", "/scratch/micpie/export/compound_protein_protein/valid_8-1.jsonl": "{"text":"The protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26) is targeted by [N][#C][C@@H1][C][C][C][N][Ring1][Branch1][C][=Branch1][C][=O][C][N][C@@][C][C@@H1][C][C@H1][Branch1][Ring2][C][Ring1][=Branch1][C][C@@][Branch1][#C][O][C][=Branch1][C][=O][N][C][C][C][C][C][C][Ring1][=Branch1][Branch1][Ring2][C][Ring1][P][C][Ring2][Ring1][Ring2]. 
The protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26) interacts with C-X-C motif chemokine 2 (Growth-regulated protein beta) (Gro-beta) (Macrophage inflammatory protein 2-alpha) (MIP2-alpha)."} {"text":"The protein BPK is targeted by Cc1cc(Nc2nn(-c3cccc(-n4ncc5cc(C(C)(C)C)cc(F)c5c4=O)c3CO)cc2C(N)=O)nn1C. The protein BPK is ortholog to Kinase EMB."}", "/scratch/micpie/export/compound_protein_protein/train_0-1.jsonl": "{"text":"The protein Ataxia telangiectasia mutated is targeted by InChI=1S\/C17H15FO3S\/c1-11-2-8-14(9-3-11)22-16(17(20)21)10-15(19)12-4-6-13(18)7-5-12\/h2-9,16H,10H2,1H3,(H,20,21). The protein Ataxia telangiectasia mutated interacts with hP1.A."} {"text":"The protein PDGFR-2 is targeted by COc1cc2ncnc(N3CCN(\/C(S)=N\\Cc4ccc5c(c4)OCO5)CC3)c2cc1OC. The protein PDGFR-2 interacts with Platelet-derived growth factor C (PDGF-C) (Fallotein) (Spinal cord-derived growth factor) (SCDGF) (VEGF-E)."}", "/scratch/micpie/export/compound_protein_protein/valid_8-0.jsonl": "{"text":"The compound N#C[C@@H]1CCCN1C(=O)CN[C@]12C[C@H]3C[C@@H](C1)C[C@](OC(=O)NC1CCCCC1)(C3)C2 targets the protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26) which interacts with the protein C-X-C motif chemokine 2 (Growth-regulated protein beta) (Gro-beta) (Macrophage inflammatory protein 2-alpha) (MIP2-alpha)."} {"text":"The compound Cc1cc(Nc2nn(-c3cccc(-n4ncc5cc(C(C)(C)C)cc(F)c5c4=O)c3CO)cc2C(N)=O)nn1C targets the protein BPK which is ortholog to the protein Kinase EMB."}", "/scratch/micpie/export/compound_protein_protein/test_9-1.jsonl": "{"text":"The protein KDR is targeted by O=C(c1cccc(NC(=O)c2ccccn2)c1)c1ccc2c(\/C=C\/c3ccccn3)n[nH]c2c1. 
The protein KDR interacts with GAP junction 28 kDa liver protein."} {"text":"The protein pp60c-src is targeted by O=CCcccccc6)))))))N[C@@H]CccccOP=O)O)O)))cc6)))))))C=O)N[C@@H]CO))cncCccccCl)cCl)c6)))))))no5. The protein pp60c-src is ortholog to p60-Src."}", "/scratch/micpie/export/compound_protein_protein/valid_1-0.jsonl": "{"text":"The compound InChI=1S\/C22H18F3N5O\/c1-14-4-6-16(13-18(14)22(23,24)25)28-21-30-29-20(31-21)17-3-2-10-27-19(17)7-5-15-8-11-26-12-9-15\/h2-4,6,8-13H,5,7H2,1H3,(H,28,30) targets the protein Fetal liver kinase 1 which interacts with the protein Cx32."} {"text":"The compound [N][C][=C][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][=C][Branch2][Ring1][=C][C][=C][C][Branch1][=Branch2][N][C][C][O][C][C][Ring1][=Branch1][=N][C][Branch1][=Branch2][N][C][C][O][C][C][Ring1][=Branch1][=N][Ring2][Ring1][C][C][=N][Ring2][Ring1][N] targets the protein p110gamma which is ortholog to the protein Phosphoinositide-3-kinase catalytic gamma polypeptide."}", "/scratch/micpie/export/compound_protein_protein/train_6-2.jsonl": "{"text":"User: Can you come up with one example for a protein that binds the compound [C][N][C][=Branch1][C][=O][C@@][C][C@@H1][Ring1][Ring1][C@@H1][Branch2][Ring2][C][N][C][=N][C][=C][Branch1][#C][N][C][Branch1][=Branch1][C][C][C][Ring1][Ring1][C][C][C][Ring1][Ring1][N][=C][Branch1][C][Cl][N][=C][Ring1][#C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Ring2][Ring1][=Branch2][O]?\nAssistant: The compound [C][N][C][=Branch1][C][=O][C@@][C][C@@H1][Ring1][Ring1][C@@H1][Branch2][Ring2][C][N][C][=N][C][=C][Branch1][#C][N][C][Branch1][=Branch1][C][C][C][Ring1][Ring1][C][C][C][Ring1][Ring1][N][=C][Branch1][C][Cl][N][=C][Ring1][#C][Ring2][Ring1][C][C@H1][Branch1][C][O][C@@H1][Ring2][Ring1][=Branch2][O] targets for example the protein Serotonin receptor 2B.\nUser: Can you tell me a protein that interacts with protein Serotonin receptor 2B?\nAssistant: Yes, of course, the protein Serotonin receptor 2B interacts with Insulin-induced 
gene 1 protein."} {"text":"User: Can you give me one example for a protein that binds the compound CCOc1cc(C2NC(=O)N=C(c3cccnc3)C2c2ccsc2)cc([N+](=O)[O-])c1O?\nAssistant: The compound CCOc1cc(C2NC(=O)N=C(c3cccnc3)C2c2ccsc2)cc([N+](=O)[O-])c1O targets for example the protein FALDH.\nUser: Can you tell me a protein that interacts with protein FALDH?\nAssistant: Yes, of course, the protein FALDH interacts with Hypoxia-inducible factor 1-beta."}", "/scratch/micpie/export/compound_protein_protein/valid_6-2.jsonl": "{"text":"User: Can you come up with an example for a protein that binds the compound [C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][Branch2][Ring2][O][C][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][N][C][=Branch1][C][=O][C][N][C][C][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][Ring1][N][C][Branch1][C][C][=O]?\nAssistant: The compound [C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][N][Branch2][Ring2][O][C][C][Branch1][#C][C][C][=C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch2][Ring1][=Branch1][N][C][=Branch1][C][=O][C][N][C][C][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][Ring1][N][C][Branch1][C][C][=O] targets for example the protein NK-1 receptor.\nUser: Can you tell me a protein that interacts with protein NK-1 receptor?\nAssistant: Sure, the protein NK-1 receptor interacts with Protachykinin-1 (PPT)."} {"text":"User: Can you come up with one example for a protein that binds the compound [C][C][=C][C][C][C][=C][C][Branch1][=Branch2][C][=C][C][=C][N][=C][Ring1][=Branch1][=C][C][=C][Ring1][N][N][Ring1][S][C][C][C][Ring2][Ring1][Ring2][=O]?\nAssistant: The compound [C][C][=C][C][C][C][=C][C][Branch1][=Branch2][C][=C][C][=C][N][=C][Ring1][=Branch1][=C][C][=C][Ring1][N][N][Ring1][S][C][C][C][Ring2][Ring1][Ring2][=O] targets for example the protein Steroid 5-alpha-reductase 1.\nUser: Can you tell me a protein that is ortholog to protein Steroid 5-alpha-reductase 1?\nAssistant: 
Yes, the protein Steroid 5-alpha-reductase 1 is ortholog to SR type 1."}", "/scratch/micpie/export/compound_protein_protein/train_6-0.jsonl": "{"text":"The compound CNC(=O)[C@@]12C[C@@H]1[C@@H](n1cnc3c(NC(C4CC4)C4CC4)nc(Cl)nc31)[C@H](O)[C@@H]2O targets the protein 5-hydroxytryptamine receptor 2B which interacts with the protein INSIG-1."} {"text":"The compound InChI=1S\/C21H18N4O5S\/c1-2-30-16-9-14(8-15(20(16)26)25(28)29)19-17(13-5-7-31-11-13)18(23-21(27)24-19)12-4-3-6-22-10-12\/h3-11,17,19,26H,2H2,1H3,(H,24,27) targets the protein Alcohol dehydrogenase class-3 which interacts with the protein HIF1-beta."}", "/scratch/micpie/export/compound_protein_protein/train_8-2.jsonl": "{"text":"User: Can you come up with one example for a protein that binds the compound [N][#C][C][=C][C][=C][Branch2][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][=Branch1][C][=O][N][C][C][C][C@H1][Ring1][Branch1][C][#N][C][=C][Ring2][Ring1][Ring2]?\nAssistant: The compound [N][#C][C][=C][C][=C][Branch2][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][=Branch1][C][=O][N][C][C][C][C@H1][Ring1][Branch1][C][#N][C][=C][Ring2][Ring1][Ring2] targets for example the protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26).\nUser: Can you tell me a protein that interacts with protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26)?\nAssistant: Of course, the protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26) interacts with Pro-neuropeptide Y."} {"text":"User: Can you give me an example for a protein that binds the compound 
[C][C][Branch1][Ring1][C][O][\/N][=C][Branch1][C][\\N][C][C][Branch1][C][O][=N][S][C][=Ring1][=Branch1][N][C][=C][C][=C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][=N]?\nAssistant: The compound [C][C][Branch1][Ring1][C][O][\/N][=C][Branch1][C][\\N][C][C][Branch1][C][O][=N][S][C][=Ring1][=Branch1][N][C][=C][C][=C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][=N] targets for example the protein Dual specificity mitogen-activated protein kinase kinase 1.\nUser: Can you tell me a protein that is ortholog to protein Dual specificity mitogen-activated protein kinase kinase 1?\nAssistant: Yes, the protein Dual specificity mitogen-activated protein kinase kinase 1 is ortholog to MAPKK."}", "/scratch/micpie/export/compound_protein_protein/train_3-1.jsonl": "{"text":"The protein HD6 is targeted by O=C(NO)c1cnc(NC2(c3ccc(Cl)c(Cl)c3)CC2)nc1. The protein HD6 interacts with Zinc finger protein 205."} {"text":"The protein Adenosine receptor A3 is targeted by O=C(Cc1ccccc1)Nc1nc2ccccc2n2c(=O)n(-c3ccccc3)nc12. 
The protein Adenosine receptor A3 is ortholog to Adenosine receptor A3."}", "/scratch/micpie/export/compound_protein_protein/test_8-0.jsonl": "{"text":"The compound [C][C][=C][N][=C][Branch2][Branch1][C][C][=Branch1][C][=O][N][C][C][C][Branch1][Branch1][C][C][Ring1][=Branch1][C][N][Branch2][Ring1][=Branch2][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][#Branch2][O][C][Branch1][C][F][Branch1][C][F][F][C][=C][Ring1][O][C][C][O][Ring2][Ring1][=Branch1][C][=N][Ring2][Ring1][P] targets the protein Bifunctional epoxide hydrolase 2 which interacts with the protein Catalase."} {"text":"The compound CC(C)N1CCN(Cc2ccc(Nc3ncc4cc5n(c4n3)C3(CCCCC3)CNC5=O)nc2)CC1 targets the protein Cell division protein kinase 2 which is ortholog to the protein Cyclin-dependent kinase A-1."}", "/scratch/micpie/export/compound_protein_protein/valid_3-1.jsonl": "{"text":"The protein Histone deacetylase 1 is targeted by InChI=1S\/C20H21N5O2S\/c21-15-5-1-2-6-16(15)24-18(26)7-3-4-10-23-19(27)17-13-28-20(25-17)14-8-11-22-12-9-14\/h1-2,5-6,8-9,11-13H,3-4,7,10,21H2,(H,23,27)(H,24,26). The protein Histone deacetylase 1 interacts with Putative tyrosine-protein phosphatase auxilin."} {"text":"The protein Oxytocin receptor is targeted by [C][C][O][C][=C][C][=C][Branch2][Branch2][#Branch1][C][C@@H1][N][C][=Branch1][C][=O][C][C][S][S][C][C@@H1][Branch2][Ring2][=Branch1][C][=Branch1][C][=O][N][C][C][C][C][C@H1][Ring1][=Branch1][C][=Branch1][C][=O][N][C@@H1][Branch1][#Branch1][C][C][Branch1][C][C][C][C][=Branch1][C][=O][N][C][C][Branch1][C][N][=O][N][C][=Branch1][C][=O][C@H1][Branch1][#Branch1][C][C][Branch1][C][N][=O][N][C][=Branch1][C][=O][C@H1][Branch1][Branch2][C][C][C][Branch1][C][N][=O][N][C][=Branch1][C][=O][C@H1][Branch1][#Branch1][C@@H1][Branch1][C][C][C][C][N][C][Ring2][Branch1][N][=O][C][=C][Ring2][=Branch1][Ring2]. 
The protein Oxytocin receptor is ortholog to OT-R."}", "/scratch/micpie/export/compound_protein_protein/train_10-0.jsonl": "{"text":"The compound InChI=1S\/C18H15F3N6O\/c1-27-15-10(14(26-27)16(22)28)7-6-9-8-23-17(25-13(9)15)24-12-5-3-2-4-11(12)18(19,20)21\/h2-5,8H,6-7H2,1H3,(H2,22,28)(H,23,24,25) targets the protein Serine\/threonine-protein kinase PLK1 which interacts with the protein Meiosis-specific kinetochore protein."} {"text":"The compound COc1ccccc1CC(c1cccc(F)c1)N1CCNCC1 targets the protein Sodium-dependent noradrenaline transporter which is ortholog to the protein Sodium-dependent dopamine transporter."}", "/scratch/micpie/export/compound_protein_protein/test_10-2.jsonl": "{"text":"User: Can you come up with an example for a protein that binds the compound CCcncnc-ccccC=O)NCCNCC)=O))CC6)))))))cCl)c6))))))c6C#CccccN)nc6?\nAssistant: The compound CCcncnc-ccccC=O)NCCNCC)=O))CC6)))))))cCl)c6))))))c6C#CccccN)nc6 targets for example the protein PI3Kalpha.\nUser: Can you tell me a protein that interacts with protein PI3Kalpha?\nAssistant: Sure, the protein PI3Kalpha interacts with TRDL-1."} {"text":"User: Can you come up with one example for a protein that binds the compound O=C1NCC(c2cccc(Cl)c2)C1c1ccc(Cl)cc1?\nAssistant: The compound O=C1NCC(c2cccc(Cl)c2)C1c1ccc(Cl)cc1 targets for example the protein Solute carrier family 6 member 2.\nUser: Can you tell me a protein that is ortholog to protein Solute carrier family 6 member 2?\nAssistant: Of course, the protein Solute carrier family 6 member 2 is ortholog to Protein fumin."}", "/scratch/micpie/export/compound_protein_protein/train_9-1.jsonl": "{"text":"The protein Dual specificity mitogen-activated protein kinase kinase 1 is targeted by CCCO))\/N=C\\N)ccO)nsc5NccccCcccccc6)))))))cc6. 
The protein Dual specificity mitogen-activated protein kinase kinase 1 is ortholog to Protein Arabidopsis NQK1 homolog."} {"text":"The protein Serine\/threonine-protein kinase PLK1 is targeted by InChI=1S\/C18H15F3N6O\/c1-27-15-10(14(26-27)16(22)28)7-6-9-8-23-17(25-13(9)15)24-12-5-3-2-4-11(12)18(19,20)21\/h2-5,8H,6-7H2,1H3,(H2,22,28)(H,23,24,25). The protein Serine\/threonine-protein kinase PLK1 interacts with PAK-3."}", "/scratch/micpie/export/compound_protein_protein/test_4-1.jsonl": "{"text":"The protein NTR1 is targeted by InChI=1S\/C78H123N21O21\/c1-7-43(6)63(73(116)96-57(76(119)120)37-42(4)5)97-70(113)55(39-45-21-25-47(101)26-22-45)95-72(115)59-18-13-35-99(59)75(118)52(16-11-33-87-78(84)85)90-65(108)49(15-10-32-86-77(82)83)89-71(114)58-17-12-34-98(58)74(117)51(14-8-9-31-79)91-69(112)56(40-60(81)102)94-66(109)50(28-30-62(105)106)88-68(111)54(38-44-19-23-46(100)24-20-44)93-67(110)53(36-41(2)3)92-64(107)48(80)27-29-61(103)104\/h19-26,41-43,48-59,63,100-101H,7-18,27-40,79-80H2,1-6H3,(H2,81,102)(H,88,111)(H,89,114)(H,90,108)(H,91,112)(H,92,107)(H,93,110)(H,94,109)(H,95,115)(H,96,116)(H,97,113)(H,103,104)(H,105,106)(H,119,120)(H4,82,83,86)(H4,84,85,87)\/t43-,48-,49-,50-,51-,52-,53-,54-,55-,56-,57-,58-,59-,63-\/m0\/s1. The protein NTR1 interacts with D(2) dopamine receptor."} {"text":"The protein B1R is targeted by C=C(CNCC1CC1)c1ccc2c(c1)CCC[C@H]2NC(=O)C[C@@H](NS(=O)(=O)c1cccc(C(F)(F)F)c1)c1ccccc1. The protein B1R is ortholog to B1 bradykinin receptor."}", "/scratch/micpie/export/compound_protein_protein/valid_6-1.jsonl": "{"text":"The protein SPR is targeted by InChI=1S\/C34H40N4O3\/c1-25(39)38(22-28-12-6-9-15-33(28)41-2)23-30(20-29-21-35-32-14-8-7-13-31(29)32)36-34(40)24-37-18-16-27(17-19-37)26-10-4-3-5-11-26\/h3-15,21,27,30,35H,16-20,22-24H2,1-2H3,(H,36,40). The protein SPR interacts with Protachykinin-1 (PPT)."} {"text":"The protein S5AR 1 is targeted by CC1=C2CCc3cc(-c4cccnc4)ccc3N2CCC1=O. 
The protein S5AR 1 is ortholog to SR type 1."}", "/scratch/micpie/export/compound_protein_protein/train_1-2.jsonl": "{"text":"User: Can you give me one example for a protein that binds the compound COc1cc2ncnc(N3CCN(\/C(S)=N\\Cc4ccc5c(c4)OCO5)CC3)c2cc1OC?\nAssistant: The compound COc1cc2ncnc(N3CCN(\/C(S)=N\\Cc4ccc5c(c4)OCO5)CC3)c2cc1OC targets for example the protein PDGFR-alpha.\nUser: Can you tell me a protein that interacts with protein PDGFR-alpha?\nAssistant: Yes, of course, the protein PDGFR-alpha interacts with PRKR-interacting protein 1."} {"text":"User: Can you give me one example for a protein that binds the compound CN1CCN(c2ccc3nc(-c4c(N)c5c(F)cccc5[nH]c4=O)[nH]c3c2)CC1?\nAssistant: The compound CN1CCN(c2ccc3nc(-c4c(N)c5c(F)cccc5[nH]c4=O)[nH]c3c2)CC1 targets for example the protein Nuclear Dbf2-related kinase 1.\nUser: Can you tell me a protein that is ortholog to protein Nuclear Dbf2-related kinase 1?\nAssistant: Sure, the protein Nuclear Dbf2-related kinase 1 is ortholog to Serine\/threonine-protein kinase 38."}", "/scratch/micpie/export/compound_protein_protein/valid_6-0.jsonl": "{"text":"The compound COcccccc6CNCCCcc[nH]cccccc96))))))))))NC=O)CNCCCcccccc6))))))CC6)))))))))))CC)=O targets the protein NK-1 receptor which interacts with the protein Protachykinin-1 (PPT)."} {"text":"The compound CC1=C2CCc3cc(-c4cccnc4)ccc3N2CCC1=O targets the protein Steroid 5-alpha-reductase 1 which is ortholog to the protein SR type 1."}", "/scratch/micpie/export/compound_protein_protein/valid_10-2.jsonl": "{"text":"User: Can you give me one example for a protein that binds the compound O=C1NC(=O)c2c1ccc1[nH]ccc21?\nAssistant: The compound O=C1NC(=O)c2c1ccc1[nH]ccc21 targets for example the protein Poly [ADP-ribose] polymerase 1 (PARP-1) (EC 2.4.2.30) (ADP-ribosyltransferase diphtheria toxin-like 1) (ARTD1) (DNA ADP-ribosyltransferase PARP1) (EC 2.4.2.-) (NAD(+) ADP-ribosyltransferase 1) (ADPRT 1) (Poly[ADP-ribose] synthase 1) (Protein poly-ADP-ribosyltransferase 
PARP1) (EC 2.4.2.-).\nUser: Can you tell me a protein that interacts with protein Poly [ADP-ribose] polymerase 1 (PARP-1) (EC 2.4.2.30) (ADP-ribosyltransferase diphtheria toxin-like 1) (ARTD1) (DNA ADP-ribosyltransferase PARP1) (EC 2.4.2.-) (NAD(+) ADP-ribosyltransferase 1) (ADPRT 1) (Poly[ADP-ribose] synthase 1) (Protein poly-ADP-ribosyltransferase PARP1) (EC 2.4.2.-)?\nAssistant: Yes, of course, the protein Poly [ADP-ribose] polymerase 1 (PARP-1) (EC 2.4.2.30) (ADP-ribosyltransferase diphtheria toxin-like 1) (ARTD1) (DNA ADP-ribosyltransferase PARP1) (EC 2.4.2.-) (NAD(+) ADP-ribosyltransferase 1) (ADPRT 1) (Poly[ADP-ribose] synthase 1) (Protein poly-ADP-ribosyltransferase PARP1) (EC 2.4.2.-) interacts with C\/EBP alpha."} {"text":"User: Can you give me an example for a protein that binds the compound CCCCCCn1nc(NC(=O)C2CNC(=O)C2)cc1-c1ccccc1?\nAssistant: The compound CCCCCCn1nc(NC(=O)C2CNC(=O)C2)cc1-c1ccccc1 targets for example the protein Na(+)\/glucose cotransporter 1.\nUser: Can you tell me a protein that is ortholog to protein Na(+)\/glucose cotransporter 1?\nAssistant: Sure, the protein Na(+)\/glucose cotransporter 1 is ortholog to High-affinity proline transporter PutP."}", "/scratch/micpie/export/compound_protein_protein/train_3-0.jsonl": "{"text":"The compound O=C(NO)c1cnc(NC2(c3ccc(Cl)c(Cl)c3)CC2)nc1 targets the protein HD6 which interacts with the protein Zinc finger protein 205."} {"text":"The compound O=CCcccccc6)))))))Ncncccccc6nc=O)n-cccccc6))))))nc%135 targets the protein Adenosine receptor A3 which is ortholog to the protein Adenosine receptor A3."}", "/scratch/micpie/export/compound_protein_protein/test_9-2.jsonl": "{"text":"User: Can you come up with an example for a protein that binds the compound O=C(c1cccc(NC(=O)c2ccccn2)c1)c1ccc2c(\/C=C\/c3ccccn3)n[nH]c2c1?\nAssistant: The compound O=C(c1cccc(NC(=O)c2ccccn2)c1)c1ccc2c(\/C=C\/c3ccccn3)n[nH]c2c1 targets for example the protein Kinase insert domain receptor.\nUser: Can you tell me a protein 
that interacts with protein Kinase insert domain receptor?\nAssistant: Of course, the protein Kinase insert domain receptor interacts with Gap junction beta-1 protein."} {"text":"User: Can you come up with an example for a protein that binds the compound O=C(Cc1ccccc1)N[C@@H](Cc1ccc(OP(=O)(O)O)cc1)C(=O)N[C@@H](CO)c1nc(Cc2ccc(Cl)c(Cl)c2)no1?\nAssistant: The compound O=C(Cc1ccccc1)N[C@@H](Cc1ccc(OP(=O)(O)O)cc1)C(=O)N[C@@H](CO)c1nc(Cc2ccc(Cl)c(Cl)c2)no1 targets for example the protein Proto-oncogene tyrosine-protein kinase Src.\nUser: Can you tell me a protein that is ortholog to protein Proto-oncogene tyrosine-protein kinase Src?\nAssistant: Yes, the protein Proto-oncogene tyrosine-protein kinase Src is ortholog to pp60c-src."}", "/scratch/micpie/export/compound_protein_protein/train_7-0.jsonl": "{"text":"The compound CCOc1cc(C2NC(=O)N=C(c3cccnc3)C2c2ccsc2)cc([N+](=O)[O-])c1O targets the protein Alcohol dehydrogenase class chi chain which interacts with the protein Solute carrier family 66 member 2."} {"text":"The compound O=CO)Ccsc\/C=C\\NC=O)CS5))))))nc5-ccccBr)s5 targets the protein ADAM-TS 5 which is ortholog to the protein Aggrecanase-2."}", "/scratch/micpie/export/compound_protein_protein/train_4-0.jsonl": "{"text":"The compound InChI=1S\/C32H31Cl2N7O\/c1-21(2)37-31(42)39-32(22-8-4-3-5-9-22)16-18-40(19-17-32)29-27-30(36-20-35-29)41(24-14-12-23(33)13-15-24)28(38-27)25-10-6-7-11-26(25)34\/h3-15,20-21H,16-19H2,1-2H3,(H2,37,39,42) targets the protein Cannabinoid receptor 2 which interacts with the protein C-C motif chemokine 4 (G-26 T-lymphocyte-secreted protein) (HC21) (Lymphocyte activation gene 1 protein) (LAG-1) (MIP-1-beta(1-69)) (Macrophage inflammatory protein 1-beta) (MIP-1-beta) (PAT 744) (Protein H400) (SIS-gamma) (Small-inducible cytokine A4) (T-cell activation protein 2) (ACT-2)."} {"text":"The compound 
[C][C][Branch1][C][C][C][C][C][Branch1][C][C][Branch1][C][C][C][=C][C][Branch2][Ring2][Branch1][N][Branch1][S][C][=C][C][=C][Branch1][=Branch1][C][=Branch1][C][=O][O][C][=N][Ring1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][=C][C][=C][Ring2][Ring1][#Branch2][Ring2][Ring1][P] targets the protein Nuclear receptor subfamily 2 group B member 2 which interacts with the protein SPC4."}", "/scratch/micpie/export/compound_protein_protein/train_3-2.jsonl": "{"text":"User: Can you come up with an example for a protein that binds the compound InChI=1S\/C14H12Cl2N4O2\/c15-10-2-1-9(5-11(10)16)14(3-4-14)19-13-17-6-8(7-18-13)12(21)20-22\/h1-2,5-7,22H,3-4H2,(H,20,21)(H,17,18,19)?\nAssistant: The compound InChI=1S\/C14H12Cl2N4O2\/c15-10-2-1-9(5-11(10)16)14(3-4-14)19-13-17-6-8(7-18-13)12(21)20-22\/h1-2,5-7,22H,3-4H2,(H,20,21)(H,17,18,19) targets for example the protein HD6.\nUser: Can you tell me a protein that interacts with protein HD6?\nAssistant: Sure, the protein HD6 interacts with Zinc finger protein 205."} {"text":"User: Can you come up with an example for a protein that binds the compound O=C(Cc1ccccc1)Nc1nc2ccccc2n2c(=O)n(-c3ccccc3)nc12?\nAssistant: The compound O=C(Cc1ccccc1)Nc1nc2ccccc2n2c(=O)n(-c3ccccc3)nc12 targets for example the protein Adenosine receptor A3.\nUser: Can you tell me a protein that is ortholog to protein Adenosine receptor A3?\nAssistant: Yes, of course, the protein Adenosine receptor A3 is ortholog to Adenosine receptor A3."}", "/scratch/micpie/export/compound_protein_protein/valid_10-1.jsonl": "{"text":"The protein Poly [ADP-ribose] polymerase 1 (PARP-1) (EC 2.4.2.30) (ADP-ribosyltransferase diphtheria toxin-like 1) (ARTD1) (DNA ADP-ribosyltransferase PARP1) (EC 2.4.2.-) (NAD(+) ADP-ribosyltransferase 1) (ADPRT 1) (Poly[ADP-ribose] synthase 1) (Protein poly-ADP-ribosyltransferase PARP1) (EC 2.4.2.-) is targeted by 
[O][=C][N][C][=Branch1][C][=O][C][=C][Ring1][=Branch1][C][=C][C][NH1][C][=C][C][Ring1][=Branch2][=Ring1][Branch1]. The protein Poly [ADP-ribose] polymerase 1 (PARP-1) (EC 2.4.2.30) (ADP-ribosyltransferase diphtheria toxin-like 1) (ARTD1) (DNA ADP-ribosyltransferase PARP1) (EC 2.4.2.-) (NAD(+) ADP-ribosyltransferase 1) (ADPRT 1) (Poly[ADP-ribose] synthase 1) (Protein poly-ADP-ribosyltransferase PARP1) (EC 2.4.2.-) interacts with C\/EBP alpha."} {"text":"The protein Sodium\/glucose cotransporter 1 is targeted by [C][C][C][C][C][C][N][N][=C][Branch1][S][N][C][=Branch1][C][=O][C][C][N][C][=Branch1][C][=O][C][Ring1][=Branch1][C][=C][Ring1][=C][C][=C][C][=C][C][=C][Ring1][=Branch1]. The protein Sodium\/glucose cotransporter 1 is ortholog to High-affinity proline transporter PutP."}", "/scratch/micpie/export/compound_protein_protein/test_7-2.jsonl": "{"text":"User: Can you give me one example for a protein that binds the compound COc1cc2c(cc1-c1cccnc1)CCC(=O)N2?\nAssistant: The compound COc1cc2c(cc1-c1cccnc1)CCC(=O)N2 targets for example the protein CYPXIB2.\nUser: Can you tell me a protein that is ortholog to protein CYPXIB2?\nAssistant: Yes, of course, the protein CYPXIB2 is ortholog to ALDOS."} {"text":"User: Can you come up with one example for a protein that binds the compound CC1(C)OCC(N)=N[C@](C)(c2cc(Nc3ccc(OCC(F)(F)F)nc3)ccc2F)C1(F)F?\nAssistant: The compound CC1(C)OCC(N)=N[C@](C)(c2cc(Nc3ccc(OCC(F)(F)F)nc3)ccc2F)C1(F)F targets for example the protein Down region aspartic protease.\nUser: Can you tell me a protein that is ortholog to protein Down region aspartic protease?\nAssistant: Of course, the protein Down region aspartic protease is ortholog to Beta-site amyloid precursor protein cleaving enzyme 2."}", "/scratch/micpie/export/compound_protein_protein/test_10-0.jsonl": "{"text":"The compound 
[C][C][C][=N][C][=N][C][Branch2][Ring2][Ring1][C][=C][C][=C][Branch2][Ring1][Ring2][C][=Branch1][C][=O][N][C][C][N][Branch1][=Branch1][C][Branch1][C][C][=O][C][C][Ring1][=Branch2][C][Branch1][C][Cl][=C][Ring2][Ring1][C][=C][Ring2][Ring1][Branch2][C][#C][C][=C][C][=C][Branch1][C][N][N][=C][Ring1][#Branch1] targets the protein Phosphoinositide-3-kinase catalytic alpha polypeptide which interacts with the protein APRIL."} {"text":"The compound O=CNCCcccccCl)c6))))))C5ccccCl)cc6 targets the protein Norepinephrine transporter which is ortholog to the protein Protein fumin."}", "/scratch/micpie/export/compound_protein_ec_number/test_0-1.jsonl": "{"text":"The compound CC=O)NccccC=O)O))cc6NC=N)N targets the protein Membrane sialidase. Furthermore, the compound CC=O)NccccC=O)O))cc6NC=N)N catalyzes the exo-alpha-sialidase (EC 3.2.1.18)."} {"text":"The compound InChI=1S\/C24H27N5O4S\/c1-17(30)28-13-8-19-14-22(6-7-23(19)28)34(32,33)29(21-4-2-18(15-25)3-5-21)16-24(31)27-11-9-20(26)10-12-27\/h2-7,14,20H,8-13,16,26H2,1H3 targets the protein BRAF35-HDAC complex protein BHC110. 
Furthermore, the compound InChI=1S\/C24H27N5O4S\/c1-17(30)28-13-8-19-14-22(6-7-23(19)28)34(32,33)29(21-4-2-18(15-25)3-5-21)16-24(31)27-11-9-20(26)10-12-27\/h2-7,14,20H,8-13,16,26H2,1H3 catalyzes the [histone-H3]-N(6),N(6)-dimethyl-L-lysine(4) FAD-dependent demethylase (EC 1.14.99.66)."}", "/scratch/micpie/export/compound_protein_ec_number/valid_0-0.jsonl": "{"text":"The compound CC(=O)N[C@@H]1[C@@H](N=[N+]=[N-])C=C(C(=O)O)O[C@H]1CNc1ccc(-c2ccccc2)cc1 targets the protein Sialidase-3 and catalyzes the exo-alpha-sialidase (EC 3.2.1.18)."} {"text":"The compound [C][C][=C][C][=C][Branch2][Ring1][=N][C][=C][C][=C][Branch2][Ring1][C][C@H1][C][C@@H1][Ring1][Ring1][N][C][C][C][C][Branch1][C][N][C][C][Ring1][#Branch1][C][=N][Ring1][P][C][=C][Ring2][Ring1][#Branch1][O] targets the protein BRAF35-HDAC complex protein BHC110 and catalyzes the [histone-H3]-N(6),N(6)-dimethyl-L-lysine(4) FAD-dependent demethylase (EC 1.14.99.66)."}", "/scratch/micpie/export/compound_protein_ec_number/test_0-2.jsonl": "{"text":"User: Can you give me an example for a protein that binds the compound [C][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][=Branch1][C][=Branch1][C][=O][O][C][=C][Ring1][=Branch2][N][C][=Branch1][C][=N][N]?\nAssistant: Yes, of course, the compound [C][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][=Branch1][C][=Branch1][C][=O][O][C][=C][Ring1][=Branch2][N][C][=Branch1][C][=N][N] targets the protein Ganglioside sialidasedis.\nUser: Can you tell me which enzyme the protein Ganglioside sialidasedis catalyzes?\nAssistant: The protein Ganglioside sialidasedis catalyzes a exo-alpha-sialidase (EC 3.2.1.18)."} {"text":"User: Can you give me one example for a protein that binds the compound CC=O)NCCcccS=O)=O)NCC=O)NCCCN)CC6))))))))ccccC#N))cc6))))))))ccc69?\nAssistant: Of course, the compound CC=O)NCCcccS=O)=O)NCC=O)NCCCN)CC6))))))))ccccC#N))cc6))))))))ccc69 targets the protein [histone H3]-dimethyl-L-lysine(4) FAD-dependent demethylase 1A.\nUser: Can you tell me which enzyme the 
protein [histone H3]-dimethyl-L-lysine(4) FAD-dependent demethylase 1A catalyzes?\nAssistant: The protein [histone H3]-dimethyl-L-lysine(4) FAD-dependent demethylase 1A catalyzes a [histone-H3]-N(6),N(6)-dimethyl-L-lysine(4) FAD-dependent demethylase (EC 1.14.99.66)."}", "/scratch/micpie/export/compound_protein_ec_number/test_0-0.jsonl": "{"text":"The compound [C][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][=Branch1][C][=Branch1][C][=O][O][C][=C][Ring1][=Branch2][N][C][=Branch1][C][=N][N] targets the protein Membrane sialidase and catalyzes the exo-alpha-sialidase (EC 3.2.1.18)."} {"text":"The compound CC=O)NCCcccS=O)=O)NCC=O)NCCCN)CC6))))))))ccccC#N))cc6))))))))ccc69 targets the protein Lysine-specific histone demethylase 1A and catalyzes the [histone-H3]-N(6),N(6)-dimethyl-L-lysine(4) FAD-dependent demethylase (EC 1.14.99.66)."}", "/scratch/micpie/export/compound_protein_ec_number/train_0-0.jsonl": "{"text":"The compound CCOc1cn(C[C@@H](O)[C@@H](O)[C@@H]2OC(C(=O)O)=C[C@H](O)[C@H]2NC(C)=O)nn1 targets the protein Ganglioside sialidasedis and catalyzes the exo-alpha-sialidase (EC 3.2.1.18)."} {"text":"The compound InChI=1S\/C27H30O16\/c1-8-17(32)20(35)22(37)26(40-8)39-7-15-18(33)21(36)23(38)27(42-15)43-25-19(34)16-13(31)5-10(28)6-14(16)41-24(25)9-2-3-11(29)12(30)4-9\/h2-6,8,15,17-18,20-23,26-33,35-38H,7H2,1H3\/t8-,15+,17-,18+,20+,21-,22+,23+,26+,27-\/m0\/s1 targets the protein Flavin-containing amine oxidase domain-containing protein 2 and catalyzes the [histone-H3]-N(6),N(6)-dimethyl-L-lysine(4) FAD-dependent demethylase (EC 1.14.99.66)."}", "/scratch/micpie/export/compound_protein_ec_number/valid_0-2.jsonl": "{"text":"User: Can you give me an example for a protein that binds the compound [C][C][=Branch1][C][=O][N][C@@H1][C@@H1][Branch1][Ring2][N][=N+1][=N-1][C][=C][Branch1][=Branch1][C][=Branch1][C][=O][O][O][C@H1][Ring1][N][C][N][C][=C][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][N]?\nAssistant: Yes, the compound 
[C][C][=Branch1][C][=O][N][C@@H1][C@@H1][Branch1][Ring2][N][=N+1][=N-1][C][=C][Branch1][=Branch1][C][=Branch1][C][=O][O][O][C@H1][Ring1][N][C][N][C][=C][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][N] targets the protein Membrane sialidase.\nUser: Can you tell me which enzyme the protein Membrane sialidase catalyzes?\nAssistant: The protein Membrane sialidase catalyzes a exo-alpha-sialidase (EC 3.2.1.18)."} {"text":"User: Can you give me one example for a protein that binds the compound [C][C][=C][C][=C][Branch2][Ring1][=N][C][=C][C][=C][Branch2][Ring1][C][C@H1][C][C@@H1][Ring1][Ring1][N][C][C][C][C][Branch1][C][N][C][C][Ring1][#Branch1][C][=N][Ring1][P][C][=C][Ring2][Ring1][#Branch1][O]?\nAssistant: Of course, the compound [C][C][=C][C][=C][Branch2][Ring1][=N][C][=C][C][=C][Branch2][Ring1][C][C@H1][C][C@@H1][Ring1][Ring1][N][C][C][C][C][Branch1][C][N][C][C][Ring1][#Branch1][C][=N][Ring1][P][C][=C][Ring2][Ring1][#Branch1][O] targets the protein [histone H3]-dimethyl-L-lysine(4) FAD-dependent demethylase 1A.\nUser: Can you tell me which enzyme the protein [histone H3]-dimethyl-L-lysine(4) FAD-dependent demethylase 1A catalyzes?\nAssistant: The protein [histone H3]-dimethyl-L-lysine(4) FAD-dependent demethylase 1A catalyzes a [histone-H3]-N(6),N(6)-dimethyl-L-lysine(4) FAD-dependent demethylase (EC 1.14.99.66)."}", "/scratch/micpie/export/compound_protein_ec_number/valid_0-1.jsonl": "{"text":"The compound CC=O)N[C@@H][C@@H]N=[N+]=[N-])))C=CC=O)O))O[C@H]6CNcccc-cccccc6))))))cc6 targets the protein Ganglioside sialidasedis. Furthermore, the compound CC=O)N[C@@H][C@@H]N=[N+]=[N-])))C=CC=O)O))O[C@H]6CNcccc-cccccc6))))))cc6 catalyzes the exo-alpha-sialidase (EC 3.2.1.18)."} {"text":"The compound Cc1ccc(-c2ccc([C@H]3C[C@@H]3NC3CCC(N)CC3)cn2)cc1O targets the protein Lysine-specific histone demethylase 1A. 
Furthermore, the compound Cc1ccc(-c2ccc([C@H]3C[C@@H]3NC3CCC(N)CC3)cn2)cc1O catalyzes the [histone-H3]-N(6),N(6)-dimethyl-L-lysine(4) FAD-dependent demethylase (EC 1.14.99.66)."}", "/scratch/micpie/export/compound_protein_ec_number/train_0-2.jsonl": "{"text":"User: Can you give me an example for a protein that binds the compound CCOc1cn(C[C@@H](O)[C@@H](O)[C@@H]2OC(C(=O)O)=C[C@H](O)[C@H]2NC(C)=O)nn1?\nAssistant: Sure, the compound CCOc1cn(C[C@@H](O)[C@@H](O)[C@@H]2OC(C(=O)O)=C[C@H](O)[C@H]2NC(C)=O)nn1 targets the protein Sialidase-3.\nUser: Can you tell me which enzyme the protein Sialidase-3 catalyzes?\nAssistant: The protein Sialidase-3 catalyzes a exo-alpha-sialidase (EC 3.2.1.18)."} {"text":"User: Can you come up with an example for a protein that binds the compound C[C@@H]1O[C@@H](OC[C@H]2O[C@@H](Oc3c(-c4ccc(O)c(O)c4)oc4cc(O)cc(O)c4c3=O)[C@H](O)[C@@H](O)[C@@H]2O)[C@H](O)[C@H](O)[C@H]1O?\nAssistant: Yes, of course, the compound C[C@@H]1O[C@@H](OC[C@H]2O[C@@H](Oc3c(-c4ccc(O)c(O)c4)oc4cc(O)cc(O)c4c3=O)[C@H](O)[C@@H](O)[C@@H]2O)[C@H](O)[C@H](O)[C@H]1O targets the protein BRAF35-HDAC complex protein BHC110.\nUser: Can you tell me which enzyme the protein BRAF35-HDAC complex protein BHC110 catalyzes?\nAssistant: The protein BRAF35-HDAC complex protein BHC110 catalyzes a [histone-H3]-N(6),N(6)-dimethyl-L-lysine(4) FAD-dependent demethylase (EC 1.14.99.66)."}", "/scratch/micpie/export/compound_protein_ec_number/train_0-1.jsonl": "{"text":"The compound InChI=1S\/C15H22N4O8\/c1-3-26-11-6-19(18-17-11)5-9(22)13(23)14-12(16-7(2)20)8(21)4-10(27-14)15(24)25\/h4,6,8-9,12-14,21-23H,3,5H2,1-2H3,(H,16,20)(H,24,25)\/t8-,9+,12+,13+,14+\/m0\/s1 targets the protein N-acetyl-alpha-neuraminidase 3. 
Furthermore, the compound InChI=1S\/C15H22N4O8\/c1-3-26-11-6-19(18-17-11)5-9(22)13(23)14-12(16-7(2)20)8(21)4-10(27-14)15(24)25\/h4,6,8-9,12-14,21-23H,3,5H2,1-2H3,(H,16,20)(H,24,25)\/t8-,9+,12+,13+,14+\/m0\/s1 catalyzes the exo-alpha-sialidase (EC 3.2.1.18)."} {"text":"The compound C[C@@H]1O[C@@H](OC[C@H]2O[C@@H](Oc3c(-c4ccc(O)c(O)c4)oc4cc(O)cc(O)c4c3=O)[C@H](O)[C@@H](O)[C@@H]2O)[C@H](O)[C@H](O)[C@H]1O targets the protein BRAF35-HDAC complex protein BHC110. Furthermore, the compound C[C@@H]1O[C@@H](OC[C@H]2O[C@@H](Oc3c(-c4ccc(O)c(O)c4)oc4cc(O)cc(O)c4c3=O)[C@H](O)[C@@H](O)[C@@H]2O)[C@H](O)[C@H](O)[C@H]1O catalyzes the [histone-H3]-N(6),N(6)-dimethyl-L-lysine(4) FAD-dependent demethylase (EC 1.14.99.66)."}", "/scratch/micpie/export/sr_atad5_tox21/test_0-10.jsonl": "{"text":"User: Can you generate the SMILES of a molecule that is not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nAssistant: Sure, here you go: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"User: Can you generate the SMILES of a molecule that is not toxic in the Luciferase-tagged ATAD5 assay?\nAssistant: Sure, here you go: COC(=O)c1ccc(C(=O)O)cc1"}", "/scratch/micpie/export/sr_atad5_tox21/valid_0-8.jsonl": "{"text":"User: Can you estimate if the molecule with the SELFIES [C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Ring1][S] is toxic in the Luciferase-tagged ATAD5 assay?\nAssistant: No, this molecule is not toxic in the Luciferase-tagged ATAD5 assay."} {"text":"User: Can you estimate if the molecule with the SELFIES [C][N][C@H1][C][C][C@@H1][Ring1][Branch1][C][C@H1][Branch2][Ring1][C][O][C][=Branch1][C][=O][C][Branch1][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring2][Ring1][C] is toxic in the Luciferase-tagged ATAD5 assay?\nAssistant: No, this molecule is not toxic in the Luciferase-tagged ATAD5 assay."}", "/scratch/micpie/export/sr_atad5_tox21/test_0-15.jsonl": "{"text":"Task: Please 
answer the multiple choice question.\nQuestion: Which molecules are not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nConstraint: You must select none, one or more options from A, B, or C without using any other words.\nOptions:\nA. CC(C)=CCOc1ccc(\/C=C\/C(=O)c2ccc(OCC=C(C)C)cc2OCC(=O)O)cc1\nB. CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C\nC. CC(=NC#N)N(C)Cc1ccc(Cl)nc1\nAnswer: A, B, C"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any other words.\nOptions:\nA CCC)S[C@@H][C@H]NC=O)COcccccc6))))))))))C=O)N4[C@H]7C=O)[O-]\nB CC=C[C@H]OC=O)[C@@H]C)[C@@H]5CC[C@@]9C)C=CC%13=O\nC COC=O)ccccC=O)O))cc6\nD CCNS=O)=O)CF)F)CF)F)CF)F)CF)F)CF)F)CF)F)CF)F)CF)F)F\nE C=CCC)O)CC\/C=C\\C)CC\nAnswer: A, B, C, D, E"}", "/scratch/micpie/export/sr_atad5_tox21/train_0-8.jsonl": "{"text":"User: Can you figure out if the molecule with the canonical SMILES CCOc1ccc2nc(S(N)(=O)=O)sc2c1 is toxic in the Luciferase-tagged ATAD5 assay?\nAssistant: No, this molecule is not toxic in the Luciferase-tagged ATAD5 assay."} {"text":"User: Can you figure out if the molecule with the canonical SMILES CCOC(=O)CC#N is toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nAssistant: No, this molecule is not toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay."}", "/scratch/micpie/export/sr_atad5_tox21/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay.\nSELFIES: [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" 
without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay.\nSELFIES: [C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][=Branch1][C][=Branch1][C][=O][O][C][=C][Ring1][=Branch2]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/sr_atad5_tox21/valid_0-9.jsonl": "{"text":"User: Is the molecule with the SMILES CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21 toxic in the Luciferase-tagged ATAD5 assay?\nAssistant: No, it is not toxic in the Luciferase-tagged ATAD5 assay."} {"text":"User: Is the molecule with the InChI InChI=1S\/C16H21NO3\/c1-17-12-7-8-13(17)10-14(9-12)20-16(19)15(18)11-5-3-2-4-6-11\/h2-6,12-15,18H,7-10H2,1H3\/t12-,13+,14+,15? toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nAssistant: No, it is not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay."}", "/scratch/micpie/export/sr_atad5_tox21/test_0-1.jsonl": "{"text":"The molecule with the canonical SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C is not showing SR-ATAD5 toxicity."} {"text":"The molecule with the InChI representation of InChI=1S\/C9H8O4\/c1-13-9(12)7-4-2-6(3-5-7)8(10)11\/h2-5H,1H3,(H,10,11) is not showing SR-ATAD5 toxicity."}", "/scratch/micpie/export/sr_atad5_tox21/valid_0-0.jsonl": "{"text":"The molecule with the InChI representation of InChI=1S\/C19H23ClN2\/c1-21(2)12-5-13-22-18-7-4-3-6-15(18)8-9-16-10-11-17(20)14-19(16)22\/h3-4,6-7,10-11,14H,5,8-9,12-13H2,1-2H3 is not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay."} {"text":"The molecule with the DeepSMILES CN[C@H]CC[C@@H]5C[C@H]OC=O)CO)cccccc6)))))))))C7 is not toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay."}", "/scratch/micpie/export/sr_atad5_tox21/test_0-2.jsonl": 
"{"text":"Based on the SMILES representation CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C, the molecule has no Luciferase-tagged ATAD5 toxicity properties."} {"text":"Based on the SMILES representation COC(=O)c1ccc(C(=O)O)cc1, the molecule has no SR-Luciferase-tagged ATAD5 in human embryonic kidney cells toxicity properties."}", "/scratch/micpie/export/sr_atad5_tox21/valid_0-10.jsonl": "{"text":"User: Can you generate the SELFIES of a molecule that is not toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nAssistant: Of course, here you go: [C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Ring1][S]"} {"text":"User: Can you give me the SMILES of a molecule that is not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nAssistant: Sure, here you go: CN1[C@H]2CC[C@@H]1C[C@H](OC(=O)C(O)c1ccccc1)C2"}", "/scratch/micpie/export/sr_atad5_tox21/train_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay.\nMolecule SMILES: CCOc1ccc2nc(S(N)(=O)=O)sc2c1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Luciferase-tagged ATAD5 assay.\nMolecule SMILES: CCOC(=O)CC#N\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the Luciferase-tagged ATAD5 assay."}", "/scratch/micpie/export/sr_atad5_tox21/valid_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay.\nSMILES: CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21\nConstraint: Answer the question in 
a full sentence.\nResult: This molecule is not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay.\nDeepSMILES: CN[C@H]CC[C@@H]5C[C@H]OC=O)CO)cccccc6)))))))))C7\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay."}", "/scratch/micpie/export/sr_atad5_tox21/test_0-9.jsonl": "{"text":"User: Is the molecule with the DeepSMILES CCCNCC))CCC))C=O)NccC)cccc6C toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nAssistant: No, it is not toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay."} {"text":"User: Is the molecule with the canonical SMILES COC(=O)c1ccc(C(=O)O)cc1 toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nAssistant: No, it is not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay."}", "/scratch/micpie/export/sr_atad5_tox21/test_0-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of CCCNCC))CCC))C=O)NccC)cccc6C is not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay."} {"text":"The molecule with the InChI InChI=1S\/C9H8O4\/c1-13-9(12)7-4-2-6(3-5-7)8(10)11\/h2-5H,1H3,(H,10,11) is not toxic in the Luciferase-tagged ATAD5 assay."}", "/scratch/micpie/export/sr_atad5_tox21/valid_0-7.jsonl": "{"text":"Task: Please create a molecule SELFIES based on the text description.\nDescription: A molecule that is toxic in the Luciferase-tagged ATAD5 assay.\nResult: [C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Ring1][S]"} {"text":"Task: Please create a molecule DeepSMILES based on the text description below.\nDescription: A molecule that is 
toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay.\nResult: CN[C@H]CC[C@@H]5C[C@H]OC=O)CO)cccccc6)))))))))C7"}", "/scratch/micpie/export/sr_atad5_tox21/test_0-3.jsonl": "{"text":"The SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C is from a molecule that is not identified as toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay."} {"text":"The SMILES COC(=O)c1ccc(C(=O)O)cc1 is from a molecule that is not identified as toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay."}", "/scratch/micpie/export/sr_atad5_tox21/valid_0-11.jsonl": "{"text":"User: I'm searching for the SELFIES of a molecule that is not toxic in the Luciferase-tagged ATAD5 assay?\nAssistant: This is a molecule that is not toxic in the Luciferase-tagged ATAD5 assay: [C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Ring1][S]"} {"text":"User: I'm searching for the DeepSMILES of a molecule that is not toxic in the Luciferase-tagged ATAD5 assay?\nAssistant: This is a molecule that is not toxic in the Luciferase-tagged ATAD5 assay: CN[C@H]CC[C@@H]5C[C@H]OC=O)CO)cccccc6)))))))))C7"}", "/scratch/micpie/export/sr_atad5_tox21/train_0-0.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13) is not toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay."} {"text":"The molecule with the InChI InChI=1S\/C5H7NO2\/c1-2-8-5(7)3-4-6\/h2-3H2,1H3 is not toxic in the Luciferase-tagged ATAD5 assay."}", "/scratch/micpie/export/sr_atad5_tox21/test_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Luciferase-tagged ATAD5 assay.\nMolecule InChI: InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20)\nConstraint: Answer the question in a 
complete sentence.\nResult: This molecule is not toxic in the Luciferase-tagged ATAD5 assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay.\nMolecule SMILES: COC(=O)c1ccc(C(=O)O)cc1\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay."}", "/scratch/micpie/export/sr_atad5_tox21/train_0-10.jsonl": "{"text":"User: Can you give me the SMILES of a molecule that is not toxic in the Luciferase-tagged ATAD5 assay?\nAssistant: Of course, here you go: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} {"text":"User: Can you generate the canonical SMILES of a molecule that is not toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nAssistant: Of course, here you go: CCOC(=O)CC#N"}", "/scratch/micpie/export/sr_atad5_tox21/train_0-3.jsonl": "{"text":"The InChI InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13) is from a molecule that is not identified as toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay."} {"text":"The SMILES CCOC(=O)CC#N is from a molecule that is not identified as toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay."}", "/scratch/micpie/export/sr_atad5_tox21/train_0-12.jsonl": "{"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay.\nAssistant: Ok, this DeepSMILES is not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay: CCOccccncSN)=O)=O))sc5c9"} {"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very exciting. 
Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay.\nAssistant: Ok, here you go, this SELFIES is not toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay: [C][C][O][C][=Branch1][C][=O][C][C][#N]"}", "/scratch/micpie/export/sr_atad5_tox21/test_0-13.jsonl": "{"text":"User: I want to create a molecule SELFIES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay.\nAssistant: Ok, this SELFIES is not toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay: [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C]"} {"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay.\nAssistant: Ok, this SELFIES is not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay: [C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][=Branch1][C][=Branch1][C][=O][O][C][=C][Ring1][=Branch2]"}", "/scratch/micpie/export/sr_atad5_tox21/valid_0-2.jsonl": "{"text":"Based on the canonical SMILES CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21, the molecule has no Luciferase-tagged ATAD5 in human embryonic kidney cells toxicity features."} {"text":"Based on the SMILES CN1[C@H]2CC[C@@H]1C[C@H](OC(=O)C(O)c1ccccc1)C2, the molecule has no Luciferase-tagged ATAD5 toxicity characteristics."}", "/scratch/micpie/export/sr_atad5_tox21/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES representation of CCOc1ccc2nc(S(N)(=O)=O)sc2c1 toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any additional words.\nOptions:\n1 False\n2 True\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of CCOC(=O)CC#N toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nConstraint: Even if you are uncertain, you must pick either A or B without using any additional words.\nOptions:\nA: True\nB: False\nAnswer: B"}", "/scratch/micpie/export/sr_atad5_tox21/valid_0-1.jsonl": "{"text":"The molecule with the SELFIES [C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Ring1][S] is not showing SR-ATAD5 toxicity."} {"text":"The molecule with the SELFIES representation of 
[C][N][C@H1][C][C][C@@H1][Ring1][Branch1][C][C@H1][Branch2][Ring1][C][O][C][=Branch1][C][=O][C][Branch1][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring2][Ring1][C] is not showing SR-ATAD5 toxicity."}", "/scratch/micpie/export/sr_atad5_tox21/valid_0-13.jsonl": "{"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay.\nAssistant: Understood, this DeepSMILES is not toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay: CNC)CCCNcccccc6CCccccCl)cc6%15"} {"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay.\nAssistant: Understood, this canonical SMILES is not toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay: CN1[C@@H]2CC[C@H]1C[C@@H](OC(=O)C(O)c1ccccc1)C2"}", "/scratch/micpie/export/sr_atad5_tox21/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Luciferase-tagged ATAD5 assay.\ncanonical SMILES: CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Luciferase-tagged ATAD5 assay.\ncanonical SMILES: CN1[C@@H]2CC[C@H]1C[C@@H](OC(=O)C(O)c1ccccc1)C2\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/sr_atad5_tox21/train_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the 
Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nConstraint: You must select none, one or more options from a, b, c, d, or e without using any other words.\nOptions:\n[a] [C][C][C][O][C][=Branch1][C][=O][C][C][=C][C][=C][Branch1][=C][O][C][C][=Branch1][C][=O][N][Branch1][Ring1][C][C][C][C][C][Branch1][Ring1][O][C][=C][Ring1][P]\n[b] [C][O][C][=C][C][Branch1][Ring1][O][C][=N][C][Branch2][Ring1][#C][N][C][=Branch1][C][=O][N][S][=Branch1][C][=O][=Branch1][C][=O][C][=N][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][Branch1][C][C][C][=N][Ring2][Ring1][#Branch2]\n[c] [C][=C][C][=C][S][C][Branch1][O][S][N][C][C][C][C][C][C][Ring1][=Branch1][=N][C][Ring1][=N][=C][Ring1][P]\n[d] [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N]\n[e] [C][C][Branch1][C][C][C][C][C][O]\nAnswer: a, b, c, d, e"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any other words.\nOptions:\nA.) InChI=1S\/C5H6OS\/c1-4-5(7)2-3-6-4\/h2-3,7H,1H3\nB.) InChI=1S\/C13H17NO4\/c1-3-4-9-7-10(13(17)14-5-6-15)12(16)11(8-9)18-2\/h3,7-8,15-16H,1,4-6H2,2H3,(H,14,17)\nC.) InChI=1S\/C11H16O\/c1-4-11(2,3)9-5-7-10(12)8-6-9\/h5-8,12H,4H2,1-3H3\nD.) InChI=1S\/C5H7NO2\/c1-2-8-5(7)3-4-6\/h2-3H2,1H3\nE.) 
InChI=1S\/C30H53N3O6\/c1-19(2)22(14-21-10-11-26(38-8)27(15-21)39-13-9-12-37-7)16-24(31)25(34)17-23(20(3)4)28(35)33-18-30(5,6)29(32)36\/h10-11,15,19-20,22-25,34H,9,12-14,16-18,31H2,1-8H3,(H2,32,36)(H,33,35)\/t22-,23-,24-,25-\/m0\/s1\nAnswer: A, B, C, D, E"}", "/scratch/micpie/export/sr_atad5_tox21/valid_0-4.jsonl": "{"text":"The SMILES CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21 is not toxic in the Luciferase-tagged ATAD5 assay."} {"text":"The SMILES CN1[C@H]2CC[C@@H]1C[C@H](OC(=O)C(O)c1ccccc1)C2 is not toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay."}", "/scratch/micpie/export/sr_atad5_tox21/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay.\nMolecule canonical SMILES: CCOc1ccc2nc(S(N)(=O)=O)sc2c1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay.\nSMILES: CCOC(=O)CC#N\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/sr_atad5_tox21/valid_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any additional words.\nOptions:\n(1) InChI=1S\/C16H31N2.C2F6NO4S2\/c1-3-4-5-6-7-8-9-10-11-12-13-18-15-14-17(2)16-18;3-1(4,5)14(10,11)9-15(12,13)2(6,7)8\/h14-16H,3-13H2,1-2H3;\/q+1;-1\n(2) InChI=1S\/C19H23ClN2\/c1-21(2)12-5-13-22-18-7-4-3-6-15(18)8-9-16-10-11-17(20)14-19(16)22\/h3-4,6-7,10-11,14H,5,8-9,12-13H2,1-2H3\n(3) 
InChI=1S\/C5H10O\/c1-5-3-2-4-6-5\/h5H,2-4H2,1H3\n(4) InChI=1S\/C21H29NO2\/c1-3-20-11-9-17-16-8-6-15(22-24)13-14(16)5-7-18(17)19(20)10-12-21(20,23)4-2\/h2,13,16-19,23-24H,3,5-12H2,1H3\/b22-15+\/t16-,17+,18+,19-,20-,21-\/m0\/s1\nAnswer: 1, 2, 3, 4"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the Luciferase-tagged ATAD5 assay?\nConstraint: You must select none, one or more options from a, b, or c without using any additional words.\nOptions:\na: CN1[C@@H]2CC[C@H]1C[C@@H](OC(=O)C(O)c1ccccc1)C2\nb: Cc1cccc(C)c1OCC(C)N\nc: CCC1OC(=O)C[C@@H](O)[C@H](C)[C@@H](O[C@@H]2O[C@H](C)C[C@H](N(C)C)[C@H]2O)[C@@H](CC=O)C[C@@H](C)C(=O)C=C[C@]2(C)OC2C1C\nAnswer: a, b, c"}", "/scratch/micpie/export/sr_atad5_tox21/valid_0-12.jsonl": "{"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very curious. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be toxic in the Luciferase-tagged ATAD5 assay.\nAssistant: Got it, this SELFIES is not toxic in the Luciferase-tagged ATAD5 assay: [C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Ring1][S]"} {"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should not be toxic in the Luciferase-tagged ATAD5 assay.\nAssistant: Ok, here you go, this SMILES is not toxic in the Luciferase-tagged ATAD5 assay: CN1[C@H]2CC[C@@H]1C[C@H](OC(=O)C(O)c1ccccc1)C2"}", "/scratch/micpie/export/sr_atad5_tox21/train_0-2.jsonl": "{"text":"Based on the InChI representation InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13), the molecule has no Luciferase-tagged ATAD5 in human embryonic kidney cells toxicity features."} {"text":"Based on the SMILES CCOC(=O)CC#N, the molecule has no Luciferase-tagged ATAD5 toxicity features."}", "/scratch/micpie/export/sr_atad5_tox21/test_0-11.jsonl": "{"text":"User: I'm looking for the SELFIES of a molecule that is not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nAssistant: This is a molecule that is not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay: [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C]"} {"text":"User: I'm looking for the SMILES of a molecule that is not toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nAssistant: This is a molecule that is not toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay: COC(=O)c1ccc(C(=O)O)cc1"}", "/scratch/micpie/export/sr_atad5_tox21/train_0-7.jsonl": "{"text":"Task: Please generate a molecule SMILES based on the description.\nDescription: A molecule that is toxic in the Luciferase-tagged ATAD5 assay.\nResult: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} {"text":"Task: Please generate a molecule SMILES based on the description.\nDescription: A molecule that is toxic in the Luciferase-tagged ATAD5 assay.\nResult: CCOC(=O)CC#N"}", "/scratch/micpie/export/sr_atad5_tox21/train_0-11.jsonl": "{"text":"User: I'm searching for the DeepSMILES of a molecule that is not toxic in the Luciferase-tagged ATAD5 in human embryonic 
kidney cells assay?\nAssistant: This is a molecule that is not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay: CCOccccncSN)=O)=O))sc5c9"} {"text":"User: I'm looking for the DeepSMILES of a molecule that is not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nAssistant: This is a molecule that is not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay: CCOC=O)CC#N"}", "/scratch/micpie/export/sr_atad5_tox21/train_0-1.jsonl": "{"text":"The molecule with the SMILES CCOc1ccc2nc(S(N)(=O)=O)sc2c1 is not showing SR-ATAD5 toxicity."} {"text":"The molecule with the canonical SMILES CCOC(=O)CC#N is not showing SR-ATAD5 toxicity."}", "/scratch/micpie/export/sr_atad5_tox21/train_0-13.jsonl": "{"text":"User: I want to come up with a InChI.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the Luciferase-tagged ATAD5 assay.\nAssistant: Ok, this InChI is not toxic in the Luciferase-tagged ATAD5 assay: InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13)"} {"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay.\nAssistant: Ok, this canonical SMILES is not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay: CCOC(=O)CC#N"}", "/scratch/micpie/export/sr_atad5_tox21/train_0-4.jsonl": "{"text":"The molecule SELFIES [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N] is not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay."} {"text":"The molecule DeepSMILES CCOC=O)CC#N is not toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay."}", "/scratch/micpie/export/sr_atad5_tox21/test_0-7.jsonl": "{"text":"Task: Please create a SMILES based on the text description.\nDescription: A molecule that is toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay.\nResult: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"Task: Please generate a SELFIES based on the description.\nDescription: A molecule that is toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay.\nResult: [C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][=Branch1][C][=Branch1][C][=O][O][C][=C][Ring1][=Branch2]"}", "/scratch/micpie/export/sr_atad5_tox21/train_0-9.jsonl": "{"text":"User: Is the molecule with the DeepSMILES CCOccccncSN)=O)=O))sc5c9 toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nAssistant: No, it is not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay."} {"text":"User: Is the molecule with the SMILES CCOC(=O)CC#N toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nAssistant: No, it is not toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay."}", "/scratch/micpie/export/sr_atad5_tox21/valid_0-3.jsonl": "{"text":"The SELFIES 
[C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Ring1][S] represents a molecule that is not identified as toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay."} {"text":"The SELFIES [C][N][C@H1][C][C][C@@H1][Ring1][Branch1][C][C@H1][Branch2][Ring1][C][O][C][=Branch1][C][=O][C][Branch1][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring2][Ring1][C] represents a molecule that is not identified as toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay."}", "/scratch/micpie/export/sr_atad5_tox21/test_0-8.jsonl": "{"text":"User: Can you figure out if the molecule with the canonical SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C is toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nAssistant: No, this molecule is not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay."} {"text":"User: Can you figure out if the molecule with the SELFIES [C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][=Branch1][C][=Branch1][C][=O][O][C][=C][Ring1][=Branch2] is toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nAssistant: No, this molecule is not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay."}", "/scratch/micpie/export/sr_atad5_tox21/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20) toxic in the Luciferase-tagged ATAD5 assay?\nConstraint: Even if you are not sure, you must pick either A or B without using any other words.\nOptions:\nA: True\nB: False\nAnswer: B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI InChI=1S\/C9H8O4\/c1-13-9(12)7-4-2-6(3-5-7)8(10)11\/h2-5H,1H3,(H,10,11) toxic in the Luciferase-tagged ATAD5 in human 
embryonic kidney cells assay?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any additional words.\nOptions:\n1 True\n2 False\nAnswer: 2"}", "/scratch/micpie/export/sr_atad5_tox21/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Ring1][S] toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any other words.\nOptions:\n[1] False\n[2] True\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI representation of InChI=1S\/C16H21NO3\/c1-17-12-7-8-13(17)10-14(9-12)20-16(19)15(18)11-5-3-2-4-6-11\/h2-6,12-15,18H,7-10H2,1H3\/t12-,13+,14+,15? toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any other words.\nOptions:\n1 True\n2 False\nAnswer: 2"}", "/scratch/micpie/export/sr_atad5_tox21/test_0-4.jsonl": "{"text":"The SELFIES [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C] is not toxic in the Luciferase-tagged ATAD5 assay."} {"text":"The molecule DeepSMILES COC=O)ccccC=O)O))cc6 is not toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay."}", "/scratch/micpie/export/sr_atad5_tox21/test_0-12.jsonl": "{"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should not be toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay.\nAssistant: Got it, here you go, this InChI is not toxic in the Luciferase-tagged ATAD5 in human embryonic kidney cells assay: InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20)"} {"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay.\nAssistant: Got it, here you go, this SELFIES is not toxic in the SR-Luciferase-tagged ATAD5 in human embryonic kidney cells assay: [C][O][C][=Branch1][C][=O][C][=C][C][=C][Branch1][=Branch1][C][=Branch1][C][=O][O][C][=C][Ring1][=Branch2]"}", "/scratch/micpie/export/drug_protein_domain/test_0-1.jsonl": "{"text":"The drug [H][C@@][C][C][C@H1][Branch1][C][O][C@@][Ring1][=Branch1][Branch1][C][C][C][C][C@][Branch1][C][H][C][=C][C][=C][Branch1][C][O][C][=C][Ring1][#Branch1][C][C@@H1][Branch2][Ring1][S][C][C][C][C][C][C][C][C][C][S][=Branch1][C][=O][C][C][C][C][Branch1][C][F][Branch1][C][F][C][Branch1][C][F][Branch1][C][F][F][C@@][Ring2][Ring2][#Branch2][Ring2][Ring1][P][H] targets the protein Estradiol receptor which has a Oestrogen-type nuclear receptor final C-terminal domain."} {"text":"The drug 9-Deazaguanine targets the protein Hypoxanthine-guanine phosphoribosyltransferase which has a Phosphoribosyltransferase domain."}", "/scratch/micpie/export/drug_protein_domain/valid_0-0.jsonl": "{"text":"CN1C=NC2=C1C(=O)N(C)C(=O)N2C targets the protein Cone cGMP-specific 3',5'-cyclic phosphodiesterase subunit alpha' which has a HD\/PDEase domain."} {"text":"CCN1C(=O)C=CC1=O targets the protein Gal-10 which has a Galectin, carbohydrate recognition domain."}", "/scratch/micpie/export/drug_protein_domain/test_0-2.jsonl": "{"text":"User: Can you give me an example for a protein that 
binds the drug Fulvestrant?\nAssistant: Of course, the drug targets for example the protein ER-alpha.\nUser: Can you tell me a domain of the protein ER-alpha?\nAssistant: The protein ER-alpha has a Oestrogen-type nuclear receptor final C-terminal domain."} {"text":"User: Can you come up with one example for a protein that binds the drug InChI=1S\/C6H6N4O\/c7-6-9-3-1-2-8-4(3)5(11)10-6\/h1-2,8H,(H3,7,9,10,11)?\nAssistant: Of course, the drug targets for example the protein HGPRTase.\nUser: Can you tell me a domain of the protein HGPRTase?\nAssistant: The protein HGPRTase has a Phosphoribosyltransferase domain."}", "/scratch/micpie/export/drug_protein_domain/test_0-0.jsonl": "{"text":"Fulvestrant targets the protein Estrogen receptor which has a Oestrogen-type nuclear receptor final C-terminal domain."} {"text":"NC=NC=CNC=C5)))C=O)N6 targets the protein HGPRT which has a Phosphoribosyltransferase domain."}", "/scratch/micpie/export/drug_protein_domain/train_0-0.jsonl": "{"text":"[H][C@](O)(CCC(O)=O)NC1=CC=C(C=C1)N1C(=O)CCC1=O targets the protein MLC-2B which has a DJBP, EF-hand domain."} {"text":"InChI=1S\/C6H9N3O2\/c7-5(6(10)11)1-4-2-8-3-9-4\/h2-3,5H,1,7H2,(H,8,9)(H,10,11)\/t5-\/m0\/s1 targets the protein HisRS which has a WHEP-TRS domain."}", "/scratch/micpie/export/drug_protein_domain/valid_0-2.jsonl": "{"text":"User: Can you come up with an example for a protein that binds the drug Caffeine?\nAssistant: Yes, the drug targets for example the protein cGMP phosphodiesterase 6C.\nUser: Can you tell me a domain of the protein cGMP phosphodiesterase 6C?\nAssistant: The protein cGMP phosphodiesterase 6C has a HD\/PDEase domain."} {"text":"User: Can you come up with one example for a protein that binds the drug N-Ethylmaleimide?\nAssistant: Yes, the drug targets for example the protein Galectin-10.\nUser: Can you tell me a domain of the protein Galectin-10?\nAssistant: The protein Galectin-10 has a Galectin, carbohydrate recognition domain."}", 
"/scratch/micpie/export/drug_protein_domain/valid_0-1.jsonl": "{"text":"The drug CNC=NC=C5C=O)NC)C=O)N6C targets the protein cGMP phosphodiesterase 6C which has a HD\/PDEase domain."} {"text":"The drug CCN1C(=O)C=CC1=O targets the protein Lysolecithin acylhydrolase which has a Galectin, carbohydrate recognition domain."}", "/scratch/micpie/export/drug_protein_domain/train_0-2.jsonl": "{"text":"User: Can you give me an example for a protein that binds the drug 4-[4-(2,5-DIOXO-PYRROLIDIN-1-YL)-PHENYLAMINO]-4-HYDROXY-BUTYRIC ACID?\nAssistant: Of course, the drug targets for example the protein MLC-2B.\nUser: Can you tell me a domain of the protein MLC-2B?\nAssistant: The protein MLC-2B has a DJBP, EF-hand domain."} {"text":"User: Can you give me one example for a protein that binds the drug N[C@@H](CC1=CNC=N1)C(O)=O?\nAssistant: Sure, the drug targets for example the protein HisRS.\nUser: Can you tell me a domain of the protein HisRS?\nAssistant: The protein HisRS has a WHEP-TRS domain."}", "/scratch/micpie/export/drug_protein_domain/train_0-1.jsonl": "{"text":"The drug O=C(O)CC[C@H](O)Nc1ccc(N2C(=O)CCC2=O)cc1 targets the protein Myosin regulatory light chain MRLC3 which has a DJBP, EF-hand domain."} {"text":"The drug N[C@@H]CC=CNC=N5))))))CO)=O targets the protein HisRS which has a WHEP-TRS domain."}", "/scratch/micpie/export/MUV_652/valid_0-0.jsonl": "{"text":"The molecular species with the SELFIES ['[C][N][C][=C][N][=C][Ring1][Branch1][S][C][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][#Branch2][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][=N]'] is not an inhibitor of HIV RT-RNase."} {"text":"The molecule with the InChI representation of InChI=1S\/C18H28N4O3\/c1-3-25-18(24)21-8-6-15(7-9-21)19-17(23)12-22-11-14-10-13(2)4-5-16(14)20-22\/h11,13,15H,3-10,12H2,1-2H3,(H,19,23) is not an inhibitor of HIV RT-RNase."}", "/scratch/micpie/export/MUV_652/test_0-0.jsonl": "{"text":"The compound with the InChI representation of 
InChI=1S\/C10H12N2O3\/c11-10(15)12-8(9(13)14)6-7-4-2-1-3-5-7\/h1-5,8H,6H2,(H,13,14)(H3,11,12,15) is not an inhibitor of HIV RT-RNase."} {"text":"The chemical with the DeepSMILES CcccccOCC=O)Ncccccc6NCCNC=O)CC)C)))CC6))))))))))))))))c6C is not an inhibitor of HIV RT-RNase."}", "/scratch/micpie/export/MUV_652/train_0-0.jsonl": "{"text":"The molecular species with the DeepSMILES CCCccc=O)ncSCC=O)NCCC)C)))CCCS=O)=O)C5)))))))))[nH]6 is not an inhibitor of HIV RT-RNase."} {"text":"The chemical with the DeepSMILES representation of O=S=O)NCCccccnc6))))))NCCOCC6)))))))))ccccs5 is not an inhibitor of HIV RT-RNase."}", "/scratch/micpie/export/half_life_obach/test_0-10.jsonl": "{"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should have a half life of 6.400 hours.\nAssistant: Got it, here you go, this DeepSMILES represents a molecule that has a half life of 6.400 hours: CCNCC))CCNC=O)ccccNCC)=O)))cc6"} {"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should have a drug half life time of 1.800 hours.\nAssistant: Got it, here you go, this DeepSMILES represents a molecule that has a drug half life time of 1.800 hours: CC=O)NccI)cNCC)=O)))cI)cC=O)O))c6I"}", "/scratch/micpie/export/half_life_obach/valid_0-8.jsonl": "{"text":"User: Can you create the DeepSMILES of a molecule that has a half life time of 4.100 hours?\nAssistant: Yes, I'm happy to help, here you go: CN[C@H]CC[C@@H]5C[C@H]OC=O)CCO))cccccc6)))))))))C7"} {"text":"User: Can you give me the DeepSMILES of a molecule that has a half life time of 5.800 hours?\nAssistant: Yes, I'm happy to help, here you go: COcccccCO)CCCCC6CNC)C)))))))))c6"}", "/scratch/micpie/export/half_life_obach/train_0-8.jsonl": "{"text":"User: Can you give me the SELFIES of a molecule that has a half life of 4.100 hours?\nAssistant: Of course, here you go: [C][C][N][C][C][Branch1][O][C][C][N][C][C][O][C][C][Ring1][=Branch1][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring2][Ring1][=Branch2][=O]"} {"text":"User: Can you generate the InChI of a molecule that has a drug half life time of 3.400 hours?\nAssistant: Sure, here you go: InChI=1S\/C17H25N3O2S\/c1-19(2)10-7-15-12-18-17-6-5-14(11-16(15)17)13-23(21,22)20-8-3-4-9-20\/h5-6,11-12,18H,3-4,7-10,13H2,1-2H3"}", "/scratch/micpie/export/half_life_obach/test_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the drug half life time in hours.\nDeepSMILES: CCNCC))CCNC=O)ccccNCC)=O)))cc6\nConstraint: Even if you are not sure, you must answer with a numeric value in hours without the unit and without using any additional words.\nResult: 6.400"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the half life time in hours.\nSELFIES: 
[C][C][=Branch1][C][=O][N][C][=C][Branch1][C][I][C][Branch1][#Branch1][N][C][Branch1][C][C][=O][=C][Branch1][C][I][C][Branch1][=Branch1][C][=Branch1][C][=O][O][=C][Ring1][#C][I]\nConstraint: Even if you are not sure, you must answer with a numeric value in hours without the unit and without using any other words.\nResult: 1.800"}", "/scratch/micpie/export/half_life_obach/valid_0-9.jsonl": "{"text":"User: I'm searching for the InChI of a molecule that has a drug half life time of 4.100 hours.\nAssistant: This is a molecule that has a drug half life time of 4.100 hours: InChI=1S\/C17H23NO3\/c1-18-13-7-8-14(18)10-15(9-13)21-17(20)16(11-19)12-5-3-2-4-6-12\/h2-6,13-16,19H,7-11H2,1H3\/t13-,14+,15+,16?"} {"text":"User: I'm searching for the SELFIES of a molecule that has a drug half life time of 5.800 hours.\nAssistant: This is a molecule that has a drug half life time of 5.800 hours: [C][O][C][=C][C][=C][C][Branch2][Ring1][C][C][Branch1][C][O][C][C][C][C][C][Ring1][#Branch1][C][N][Branch1][C][C][C][=C][Ring1][P]"}", "/scratch/micpie/export/half_life_obach/test_0-1.jsonl": "{"text":"Based on the SMILES CCN(CC)CCNC(=O)c1ccc(NC(C)=O)cc1, the molecule has a half life of 6.400 hours."} {"text":"Based on the InChI representation of InChI=1S\/C11H9I3N2O4\/c1-3(17)15-9-6(12)5(11(19)20)7(13)10(8(9)14)16-4(2)18\/h1-2H3,(H,15,17)(H,16,18)(H,19,20), the molecule has a half life of 1.800 hours."}", "/scratch/micpie/export/half_life_obach/valid_0-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of CN[C@H]CC[C@@H]5C[C@H]OC=O)CCO))cccccc6)))))))))C7 has a drug half life time of 4.100 hours."} {"text":"The molecule with the SELFIES representation of [C][O][C][=C][C][=C][C][Branch2][Ring1][C][C][Branch1][C][O][C][C][C][C][C][Ring1][#Branch1][C][N][Branch1][C][C][C][=C][Ring1][P] has a drug half life time of 5.800 hours."}", "/scratch/micpie/export/half_life_obach/test_0-2.jsonl": "{"text":"The SMILES CCN(CC)CCNC(=O)c1ccc(NC(C)=O)cc1 represents a molecule with a half 
life time of 6.400 hours."} {"text":"The InChI InChI=1S\/C11H9I3N2O4\/c1-3(17)15-9-6(12)5(11(19)20)7(13)10(8(9)14)16-4(2)18\/h1-2H3,(H,15,17)(H,16,18)(H,19,20) is representing a molecule that has a half life time of 1.800 hours."}", "/scratch/micpie/export/half_life_obach/valid_0-10.jsonl": "{"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should have a drug half life time of 4.100 hours.\nAssistant: Ok, here you go, this SMILES represents a molecule that has a drug half life time of 4.100 hours: CN1[C@H]2CC[C@@H]1C[C@H](OC(=O)C(CO)c1ccccc1)C2"} {"text":"User: I want to generate a molecule InChI.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should have a half life of 5.800 hours.\nAssistant: Got it, this InChI represents a molecule that has a half life of 5.800 hours: InChI=1S\/C16H25NO2\/c1-17(2)12-14-7-4-5-10-16(14,18)13-8-6-9-15(11-13)19-3\/h6,8-9,11,14,18H,4-5,7,10,12H2,1-3H3"}", "/scratch/micpie/export/half_life_obach/train_0-6.jsonl": "{"text":"Task: Please create a molecule SELFIES based on the text description below.\nDescription: A molecule that has a half life time of 4.100 hours.\nResult: [C][C][N][C][C][Branch1][O][C][C][N][C][C][O][C][C][Ring1][=Branch1][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring2][Ring1][=Branch2][=O]"} {"text":"Task: Please give me a DeepSMILES based on the description below.\nDescription: A molecule that has a half life of 3.400 hours.\nResult: CNC)CCcc[nH]ccccCS=O)=O)NCCCC5)))))))cc96"}", "/scratch/micpie/export/half_life_obach/valid_0-6.jsonl": "{"text":"Task: Please generate a molecule SMILES based on the description below.\nDescription: A molecule that has a drug half life time of 4.100 hours.\nResult: 
CN1[C@H]2CC[C@@H]1C[C@H](OC(=O)C(CO)c1ccccc1)C2"} {"text":"Task: Please generate a molecule DeepSMILES based on the description below.\nDescription: A molecule that has a half life of 5.800 hours.\nResult: COcccccCO)CCCCC6CNC)C)))))))))c6"}", "/scratch/micpie/export/half_life_obach/test_0-9.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that has a half life time of 6.400 hours.\nAssistant: This is a molecule that has a half life time of 6.400 hours: CCN(CC)CCNC(=O)c1ccc(NC(C)=O)cc1"} {"text":"User: I'm searching for the SMILES of a molecule that has a half life time of 1.800 hours.\nAssistant: This is a molecule that has a half life time of 1.800 hours: CC(=O)Nc1c(I)c(NC(C)=O)c(I)c(C(=O)O)c1I"}", "/scratch/micpie/export/half_life_obach/test_0-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CCN(CC)CCNC(=O)c1ccc(NC(C)=O)cc1 has a half life time of 6.400 hours."} {"text":"The molecule with the InChI representation of InChI=1S\/C11H9I3N2O4\/c1-3(17)15-9-6(12)5(11(19)20)7(13)10(8(9)14)16-4(2)18\/h1-2H3,(H,15,17)(H,16,18)(H,19,20) has a half life time of 1.800 hours."}", "/scratch/micpie/export/half_life_obach/valid_0-7.jsonl": "{"text":"User: Can you tell me the drug half life time in hours of the molecule with the InChI InChI=1S\/C17H23NO3\/c1-18-13-7-8-14(18)10-15(9-13)21-17(20)16(11-19)12-5-3-2-4-6-12\/h2-6,13-16,19H,7-11H2,1H3\/t13-,14+,15+,16??\nAssistant: Yes, this molecule has a drug half life time of 4.100 hours."} {"text":"User: Can you estimate the drug half life time in hours of the molecule with the SMILES COc1cccc(C2(O)CCCCC2CN(C)C)c1?\nAssistant: Of course, this molecule has a drug half life time of 5.800 hours."}", "/scratch/micpie/export/half_life_obach/test_0-3.jsonl": "{"text":"The molecule with the SMILES CCN(CC)CCNC(=O)c1ccc(NC(C)=O)cc1 has a half life time of 6.400 hours."} {"text":"The molecule with the SELFIES 
[C][C][=Branch1][C][=O][N][C][=C][Branch1][C][I][C][Branch1][#Branch1][N][C][Branch1][C][C][=O][=C][Branch1][C][I][C][Branch1][=Branch1][C][=Branch1][C][=O][O][=C][Ring1][#C][I] has a half life of 1.800 hours."}", "/scratch/micpie/export/half_life_obach/valid_0-11.jsonl": "{"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should have a drug half life time of 4.100 hours.\nAssistant: Ok, this canonical SMILES represents a molecule that has a drug half life time of 4.100 hours: CN1[C@@H]2CC[C@H]1C[C@@H](OC(=O)C(CO)c1ccccc1)C2"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should have a half life time of 5.800 hours.\nAssistant: Ok, this InChI represents a molecule that has a half life time of 5.800 hours: InChI=1S\/C16H25NO2\/c1-17(2)12-14-7-4-5-10-16(14,18)13-8-6-9-15(11-13)19-3\/h6,8-9,11,14,18H,4-5,7,10,12H2,1-3H3"}", "/scratch/micpie/export/half_life_obach/train_0-0.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C24H30N2O2\/c1-2-26-19-22(13-14-25-15-17-28-18-16-25)24(23(26)27,20-9-5-3-6-10-20)21-11-7-4-8-12-21\/h3-12,22H,2,13-19H2,1H3 has a half life of 4.100 hours."} {"text":"The molecule with the DeepSMILES representation of CNC)CCcc[nH]ccccCS=O)=O)NCCCC5)))))))cc96 has a half life of 3.400 hours."}", "/scratch/micpie/export/half_life_obach/test_0-6.jsonl": "{"text":"Task: Please give me a SMILES based on the text description below.\nDescription: A molecule that has a half life time of 6.400 hours.\nResult: CCN(CC)CCNC(=O)c1ccc(NC(C)=O)cc1"} {"text":"Task: Please generate a DeepSMILES based on the description.\nDescription: A molecule that has a drug half life time of 1.800 hours.\nResult: CC=O)NccI)cNCC)=O)))cI)cC=O)O))c6I"}", "/scratch/micpie/export/half_life_obach/train_0-10.jsonl": "{"text":"User: I want to generate a canonical 
SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should have a half life of 4.100 hours.\nAssistant: Ok, this canonical SMILES represents a molecule that has a half life of 4.100 hours: CCN1CC(CCN2CCOCC2)C(c2ccccc2)(c2ccccc2)C1=O"} {"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should have a drug half life time of 3.400 hours.\nAssistant: Ok, here you go, this SELFIES represents a molecule that has a drug half life time of 3.400 hours: [C][N][Branch1][C][C][C][C][C][=C][NH1][C][=C][C][=C][Branch1][S][C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][C][C][Ring1][Branch1][C][=C][Ring2][Ring1][C][Ring1][#C]"}", "/scratch/micpie/export/half_life_obach/train_0-3.jsonl": "{"text":"The molecule with the SMILES CCN1CC(CCN2CCOCC2)C(c2ccccc2)(c2ccccc2)C1=O has a half life of 4.100 hours."} {"text":"The molecule with the InChI InChI=1S\/C17H25N3O2S\/c1-19(2)10-7-15-12-18-17-6-5-14(11-16(15)17)13-23(21,22)20-8-3-4-9-20\/h5-6,11-12,18H,3-4,7-10,13H2,1-2H3 has a half life of 3.400 hours."}", "/scratch/micpie/export/half_life_obach/valid_0-2.jsonl": "{"text":"The canonical SMILES CN1[C@@H]2CC[C@H]1C[C@@H](OC(=O)C(CO)c1ccccc1)C2 is representing a molecule that has a half life time of 4.100 hours."} {"text":"The SELFIES [C][O][C][=C][C][=C][C][Branch2][Ring1][C][C][Branch1][C][O][C][C][C][C][C][Ring1][#Branch1][C][N][Branch1][C][C][C][=C][Ring1][P] is representing a molecule that has a drug half life time of 5.800 hours."}", "/scratch/micpie/export/half_life_obach/valid_0-1.jsonl": "{"text":"Based on the InChI InChI=1S\/C17H23NO3\/c1-18-13-7-8-14(18)10-15(9-13)21-17(20)16(11-19)12-5-3-2-4-6-12\/h2-6,13-16,19H,7-11H2,1H3\/t13-,14+,15+,16?, the molecule has a drug half life time of 4.100 hours."} {"text":"Based on the SELFIES 
[C][O][C][=C][C][=C][C][Branch2][Ring1][C][C][Branch1][C][O][C][C][C][C][C][Ring1][#Branch1][C][N][Branch1][C][C][C][=C][Ring1][P], the molecule has a half life of 5.800 hours."}", "/scratch/micpie/export/half_life_obach/valid_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the half life time in hours.\nSELFIES: [C][N][C@H1][C][C][C@@H1][Ring1][Branch1][C][C@H1][Branch2][Ring1][Ring1][O][C][=Branch1][C][=O][C][Branch1][Ring1][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring2][Ring1][Ring1]\nConstraint: Even if you are uncertain, you must answer with a numeric value in hours without the unit and without using any additional words.\nResult: 4.100"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the half life in hours.\nMolecule canonical SMILES: COc1cccc(C2(O)CCCCC2CN(C)C)c1\nConstraint: Even if you are not sure, you must answer with a numeric value in hours without the unit and without using any additional words.\nResult: 5.800"}", "/scratch/micpie/export/half_life_obach/valid_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the drug half life time in hours.\nMolecule DeepSMILES: CN[C@H]CC[C@@H]5C[C@H]OC=O)CCO))cccccc6)))))))))C7\nConstraint: Even if you are uncertain, you must answer with a numeric value in hours without using any other words.\nResult: 4.100 hours"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the half life in hours.\nMolecule SMILES: COc1cccc(C2(O)CCCCC2CN(C)C)c1\nConstraint: Even if you are uncertain, you must answer with a numeric value in hours without using any other words.\nResult: 5.800 hours"}", "/scratch/micpie/export/half_life_obach/train_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the half life time in hours.\ncanonical SMILES: 
CCN1CC(CCN2CCOCC2)C(c2ccccc2)(c2ccccc2)C1=O\nConstraint: Even if you are uncertain, you must answer with a numeric value in hours without the unit and without using any other words.\nResult: 4.100"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the half life in hours.\nSMILES: CN(C)CCc1c[nH]c2ccc(CS(=O)(=O)N3CCCC3)cc12\nConstraint: Even if you are uncertain, you must answer with a numeric value in hours without the unit and without using any additional words.\nResult: 3.400"}", "/scratch/micpie/export/half_life_obach/train_0-2.jsonl": "{"text":"The SELFIES [C][C][N][C][C][Branch1][O][C][C][N][C][C][O][C][C][Ring1][=Branch1][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring2][Ring1][=Branch2][=O] is representing a molecule with a drug half life time of 4.100 hours."} {"text":"The SELFIES [C][N][Branch1][C][C][C][C][C][=C][NH1][C][=C][C][=C][Branch1][S][C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][C][C][Ring1][Branch1][C][=C][Ring2][Ring1][C][Ring1][#C] is representing a molecule that has a half life of 3.400 hours."}", "/scratch/micpie/export/half_life_obach/test_0-11.jsonl": "{"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should have a half life time of 6.400 hours.\nAssistant: Ok, this canonical SMILES represents a molecule that has a half life time of 6.400 hours: CCN(CC)CCNC(=O)c1ccc(NC(C)=O)cc1"} {"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should have a half life of 1.800 hours.\nAssistant: Understood, this canonical SMILES represents a molecule that has a half life of 1.800 hours: CC(=O)Nc1c(I)c(NC(C)=O)c(I)c(C(=O)O)c1I"}", "/scratch/micpie/export/half_life_obach/train_0-7.jsonl": "{"text":"User: Can you estimate the drug half life time in hours of the molecule with the DeepSMILES CCNCCCCNCCOCC6))))))))Ccccccc6))))))cccccc6))))))C5=O?\nAssistant: Yes, I'm happy to help, this molecule has a drug half life time of 4.100 hours."} {"text":"User: Can you tell me the drug half life time in hours of the molecule with the canonical SMILES CN(C)CCc1c[nH]c2ccc(CS(=O)(=O)N3CCCC3)cc12?\nAssistant: Yes, I'm happy to help, this molecule has a drug half life time of 3.400 hours."}", "/scratch/micpie/export/half_life_obach/train_0-11.jsonl": "{"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should have a half life of 4.100 hours.\nAssistant: Understood, this SELFIES represents a molecule that has a half life of 4.100 hours: [C][C][N][C][C][Branch1][O][C][C][N][C][C][O][C][C][Ring1][=Branch1][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring2][Ring1][=Branch2][=O]"} {"text":"User: I want to create a SMILES.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should have a half life time of 3.400 hours.\nAssistant: Understood, this SMILES represents a molecule that has a half life time of 3.400 hours: CN(C)CCc1c[nH]c2ccc(CS(=O)(=O)N3CCCC3)cc12"}", "/scratch/micpie/export/half_life_obach/train_0-1.jsonl": "{"text":"Based on the SELFIES representation of [C][C][N][C][C][Branch1][O][C][C][N][C][C][O][C][C][Ring1][=Branch1][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring2][Ring1][=Branch2][=O], the molecule has a half life of 4.100 hours."} {"text":"Based on the SELFIES representation of [C][N][Branch1][C][C][C][C][C][=C][NH1][C][=C][C][=C][Branch1][S][C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][C][C][Ring1][Branch1][C][=C][Ring2][Ring1][C][Ring1][#C], the molecule has a half life of 3.400 hours."}", "/scratch/micpie/export/half_life_obach/train_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the drug half life time in hours.\nSELFIES: [C][C][N][C][C][Branch1][O][C][C][N][C][C][O][C][C][Ring1][=Branch1][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring2][Ring1][=Branch2][=O]\nConstraint: Even if you are uncertain, you must answer with a numeric value in hours without using any additional words.\nResult: 4.100 hours"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the half life in hours.\ncanonical SMILES: CN(C)CCc1c[nH]c2ccc(CS(=O)(=O)N3CCCC3)cc12\nConstraint: Even if you are uncertain, you must answer with a numeric value in hours without using any additional words.\nResult: 3.400 hours"}", "/scratch/micpie/export/half_life_obach/test_0-7.jsonl": "{"text":"User: Can you estimate the drug half life time in hours of the molecule with the SELFIES 
[C][C][N][Branch1][Ring1][C][C][C][C][N][C][=Branch1][C][=O][C][=C][C][=C][Branch1][#Branch1][N][C][Branch1][C][C][=O][C][=C][Ring1][#Branch2]?\nAssistant: Of course, this molecule has a drug half life time of 6.400 hours."} {"text":"User: Can you derive the drug half life time in hours of the molecule with the DeepSMILES CC=O)NccI)cNCC)=O)))cI)cC=O)O))c6I?\nAssistant: Of course, this molecule has a drug half life time of 1.800 hours."}", "/scratch/micpie/export/half_life_obach/train_0-9.jsonl": "{"text":"User: I'm searching for the InChI of a molecule that has a half life time of 4.100 hours.\nAssistant: This is a molecule that has a half life time of 4.100 hours: InChI=1S\/C24H30N2O2\/c1-2-26-19-22(13-14-25-15-17-28-18-16-25)24(23(26)27,20-9-5-3-6-10-20)21-11-7-4-8-12-21\/h3-12,22H,2,13-19H2,1H3"} {"text":"User: I'm searching for the SELFIES of a molecule that has a drug half life time of 3.400 hours.\nAssistant: This is a molecule that has a drug half life time of 3.400 hours: [C][N][Branch1][C][C][C][C][C][=C][NH1][C][=C][C][=C][Branch1][S][C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][C][C][Ring1][Branch1][C][=C][Ring2][Ring1][C][Ring1][#C]"}", "/scratch/micpie/export/half_life_obach/valid_0-3.jsonl": "{"text":"The molecule with the SELFIES [C][N][C@H1][C][C][C@@H1][Ring1][Branch1][C][C@H1][Branch2][Ring1][Ring1][O][C][=Branch1][C][=O][C][Branch1][Ring1][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring2][Ring1][Ring1] has a drug half life time of 4.100 hours."} {"text":"The molecule with the SELFIES [C][O][C][=C][C][=C][C][Branch2][Ring1][C][C][Branch1][C][O][C][C][C][C][C][Ring1][#Branch1][C][N][Branch1][C][C][C][=C][Ring1][P] has a half life time of 5.800 hours."}", "/scratch/micpie/export/half_life_obach/test_0-8.jsonl": "{"text":"User: Can you create the InChI of a molecule that has a half life of 6.400 hours?\nAssistant: Sure, here you go: 
InChI=1S\/C15H23N3O2\/c1-4-18(5-2)11-10-16-15(20)13-6-8-14(9-7-13)17-12(3)19\/h6-9H,4-5,10-11H2,1-3H3,(H,16,20)(H,17,19)"} {"text":"User: Can you give me the SELFIES of a molecule that has a half life time of 1.800 hours?\nAssistant: Yes, here you go: [C][C][=Branch1][C][=O][N][C][=C][Branch1][C][I][C][Branch1][#Branch1][N][C][Branch1][C][C][=O][=C][Branch1][C][I][C][Branch1][=Branch1][C][=Branch1][C][=O][O][=C][Ring1][#C][I]"}", "/scratch/micpie/export/half_life_obach/test_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the drug half life time in hours.\nMolecule SELFIES: [C][C][N][Branch1][Ring1][C][C][C][C][N][C][=Branch1][C][=O][C][=C][C][=C][Branch1][#Branch1][N][C][Branch1][C][C][=O][C][=C][Ring1][#Branch2]\nConstraint: Even if you are uncertain, you must answer with a numeric value in hours without using any other words.\nResult: 6.400 hours"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the half life time in hours.\nMolecule InChI: InChI=1S\/C11H9I3N2O4\/c1-3(17)15-9-6(12)5(11(19)20)7(13)10(8(9)14)16-4(2)18\/h1-2H3,(H,15,17)(H,16,18)(H,19,20)\nConstraint: Even if you are uncertain, you must answer with a numeric value in hours without using any other words.\nResult: 1.800 hours"}", "/scratch/micpie/export/chemistry_stackexchange/test_0-1.jsonl": "{"text":"Task: Summarize the question in a title.\nQuestion: This question arises because: by giving classes in thermodynamics, I have observed that students are often confused between the different definitions (or applications) of the enthalpy concept. 
\nThe enthalpy expression is obtained as follows: \nFrom the first law of thermodynamics:\n\\begin{align\\*}\n U=Q+W\n\\end{align\\*}\nRecalling the definition of work:\n\\begin{align\\*}\n W=-PV\n\\end{align\\*}\nVariations added:\n\\begin{align\\*}\n \\delta U=Q - \\delta (PV)\n\\end{align\\*}\nIts obtained that:\n\\begin{align\\*}\n \\delta U&=Q-(V\\delta P+P\\delta V)\n\\end{align\\*}\nConstant volumen:\n\\begin{align\\*}\n \\delta U=Q-V\\delta P\n\\end{align\\*}\nConstant pressure:\n\\begin{align\\*}\n \\delta U=Q-P\\delta V\n\\end{align\\*}\nVariations enlarged:\n\\begin{align\\*}\n (U\\_f-U\\_i)&=Q-P(V\\_f-V\\_i)\\\\\n (U\\_f-U\\_i)&=Q-(PV\\_f-PV\\_i)\\\\\n Q&=(U\\_f-U\\_i)+(PV\\_f-PV\\_i)\\\\\n Q&=(U\\_f+PV\\_f)-(U\\_i+PV\\_i)\\\\\n Q&=H\\_f-H\\_i\n\\end{align\\*}\nTherefore, the definition of enthalpy is:\n$$\n H = U - W\n$$\n**Is this properly proposed?**\nTitle: Equation of enthalpy"} {"text":"Task: Create a meaningful title for this question.\n\\nBelow was question 34 in the USNCO 2017 exam:\n> \n> If $\\pu{0.10 mol}$ of solid $\\ce{NaOH}$ is added to $\\pu{1.00 L}$ of a saturated solution of $\\ce{Ca(OH)2}$ $(K\\_\\mathrm{sp} = \\pu{8.0 \\times 10^-6})$, what percentage of the calcium hydroxide will precipitate at equilibrium?\n> \n> (A) Roughly 50% \n> \n> (B) Roughly 75% \n> \n> (C) Roughly 95% \n> \n> (D) Over 99% \n> \nMy solution is as follows:\n1. Find concentration of $\\ce{Ca^2+}$ $(\\pu{0.02 M})$ and $\\ce{OH-}$ $(\\pu{0.04 M})$ ions from dissolved calcium hydroxide using $K\\_\\mathrm{sp}$.\n2. Add hydroxide ion concentration from sodium hydroxide (assuming full dissolution) to get total hydroxide concentration of $\\pu{0.14 M}$\n3. Find reaction quotient $Q = 0.02 \\times 0.14^2 = 3.92 \\times 10^{-4}$\n4. Find amount of calcium $(x)$ and hydroxide ions $(2x)$ that will precipitate at equilibrium by using algebraic equation: \n$$\n(0.02 - x)(0.14 - 2x)^2 = 8.0 \\times 10^{-6}, x = \\pu{0.019 M}$$\n5. 
Find percentage of calcium hydroxide precipitated: \n$$\\frac{0.019}{0.02} \\times 100\\% = 95\\%,$$\nhence (C)\nI am unsure about step 4, where a cubic equation appears, and would not be able to be solved in exam conditions (use of graphing calculator is not permitted).\nIs there a simpler method?\nAnswer: How much calcium hydroxide will precipitate after addition of sodium hydroxide into saturated calcium hydroxide solution?"}", "/scratch/micpie/export/chemistry_stackexchange/valid_0-0.jsonl": "{"text":"Task: Please answer the question of the user.\nUser: Is it possible to oxidize ethanol to acetic acid with hydrogen peroxide and if yes then \nunder what circumstances? I tried it in room temperature but either concentration was too small (of hydrogen peroxide(3%)) or I couldn't quite precisely read the the results of the universal indicator. I also tried heating it up, but it didn't change anything. The color stayed the same (of indicator). Can someone please explain me if the mistake was in my experiment (if the reaction can happen in room temperature) or the reaction needs some specific catalyst or other conditions.\nThe reaction would be: \n\\begin{align}\n\\ce{\\underset{(ethanol)}{C2H6O} + H2O2 &-> \n \\underset{(aldehyde)}{C2H4O} + 2H2O}\\\\\n\\ce{\\underset{(aldehyde)}{C2H4O} + H2O2 &-> \n \\underset{(acetic acid)}{C2H4O2} + H2O}\n\\end{align} \nOr the reaction without the middle part (since aldehyde will try to oxidize faster then ethanol) would be: \n$$\\ce{C2H6O + 2H2O2 -> C2H4O2 + 3H2O}$$\nAnswer: First things first: Don't mess with higher concentrations of hydrogen peroxide unless you are a trained and well-equipped chemist. We're talking \"steel-reinforced gauntlets\" here. This chemical is as volatile as nitroglycerine.\nAt 3% you have mostly de-ionized water and for good reason. 
Hydrogen peroxide is extremely, violently unstable, and is just as likely to oxidize anything else than what you actually intend to react it with.\nAnything above a concentration of 30% is very dangerous. Above 60% is suicidal in ill-equipped scenarios. 100% is a hypothetical, and is in essence a potential component of rocket fuel.\nSo, yes it will work. You may lose a hand, but it will work."} {"text":"Task: Provide a detailed response to the user's question.\nInquiry: Why do we call a solution that has been centrifuged a supernatant? It seems to me that a \"natant\" should be something that floats (from Latin \"natare\" meaning \"to swim\"), and a \"supernatant\" should be something which lies *above* the natant, not below it. Shouldn't the liquid be called a subnatant? Or maybe a superprecipitate?\nEdit: fixed Latin (see below)\nAssistant: It means \"the liquid that swims above\", from the Latin terms *super* = above and *natare* = to swim.\nSo it is the liquid that remains on top.\nThe subnatant would be the liquid at the bottom, in case you should centrifuge two liquid phases. \nThe term you name, *nato*, means son in Latin."}", "/scratch/micpie/export/chemistry_stackexchange/test_0-0.jsonl": "{"text":"Task: Offer a concise and informative answer to the user's question.\nInquiry: This question arises because: by giving classes in thermodynamics, I have observed that students are often confused between the different definitions (or applications) of the enthalpy concept. 
\nThe enthalpy expression is obtained as follows: \nFrom the first law of thermodynamics:\n\\begin{align\\*}\n U=Q+W\n\\end{align\\*}\nRecalling the definition of work:\n\\begin{align\\*}\n W=-PV\n\\end{align\\*}\nVariations added:\n\\begin{align\\*}\n \\delta U=Q - \\delta (PV)\n\\end{align\\*}\nIts obtained that:\n\\begin{align\\*}\n \\delta U&=Q-(V\\delta P+P\\delta V)\n\\end{align\\*}\nConstant volumen:\n\\begin{align\\*}\n \\delta U=Q-V\\delta P\n\\end{align\\*}\nConstant pressure:\n\\begin{align\\*}\n \\delta U=Q-P\\delta V\n\\end{align\\*}\nVariations enlarged:\n\\begin{align\\*}\n (U\\_f-U\\_i)&=Q-P(V\\_f-V\\_i)\\\\\n (U\\_f-U\\_i)&=Q-(PV\\_f-PV\\_i)\\\\\n Q&=(U\\_f-U\\_i)+(PV\\_f-PV\\_i)\\\\\n Q&=(U\\_f+PV\\_f)-(U\\_i+PV\\_i)\\\\\n Q&=H\\_f-H\\_i\n\\end{align\\*}\nTherefore, the definition of enthalpy is:\n$$\n H = U - W\n$$\n**Is this properly proposed?**\nAnswer: **Definition of internal energy:** U is a function of state, representing the total kinetic and potential energy of the molecules. U = U(T,V)\n**First Law of Thermodynamics:**$$\\Delta U=\\delta Q+\\delta W$$where the symbol $\\Delta$ is used to represent the change in a (path-independent) function of state (like U) between an initial and final thermodynamic equilibrium state of a closed system and the symbol $\\delta$ is used to represent the change in a parameter that depends on the process path between and initial and final thermodynamic equilibrium state of a closed system. $\\delta Q$ is the heat added to the system over the path, and $\\delta W$ is the work done on the system over the path.\n**Relationship for the Work:**$$\\delta W=-\\int{P\\_{ext}dV}$$where, for both for reversible and irreversible process paths, $P\\_{ext}$ is the force per unit area exerted by the gas on the piston face, and, by Newton's 3rd law, the force exerted by the piston face on the gas. 
For an irreversible process path, the pressure typically varies with spatial location within the cylinder, so that the average gas pressure does not match $P\\_{ext}$ at the piston face. For a reversible process, the gas pressure is uniform within the cylinder, so $P\\_{ext}=P$ where P is the gas pressure calculated from the equation of state of the gas (such as the ideal gas law), based on the number of moles in the cylinder, the gas pressure in the cylinder, and the gas temperature in the cylinder.\n**Combining the First Law with the Relationship for Work:**$$\\Delta U=\\delta Q-\\int{P\\_{ext}dV}\\tag{rev and irrev processes}$$\n$$\\Delta U=\\delta Q-\\int{PdV}\\tag{rev processes}$$\n**Constant Volume ($V\\_i=V\\_f)$:**$$\\delta W=0$$\n$$\\Delta U=\\delta Q$$\n**Constant Pressure ($P\\_{ext}=P\\_i=P\\_f=P$):**$$\\delta W=-P\\_{ext}\\Delta V=-P\\Delta V$$\n$$\\Delta U=Q-P\\Delta V$$\nSo,$$\\Delta U+P\\Delta V=\\delta Q$$But, since P is constant,\n$$\\Delta U+\\Delta (PV)=\\delta Q$$\nThe definition of enthalpy is $$H\\equiv U+PV$$\nTherefore, for a constant pressure process (a process in which $P\\_{ext}$ over the entire process path is equal to the equilibrium pressures in both the initial and final equilibrium states of the system) $$\\Delta H=\\delta Q$$"} {"text":"Task: Address the user's query with a well-structured answer.\nUser: Below was question 34 in the USNCO 2017 exam:\n> \n> If $\\pu{0.10 mol}$ of solid $\\ce{NaOH}$ is added to $\\pu{1.00 L}$ of a saturated solution of $\\ce{Ca(OH)2}$ $(K\\_\\mathrm{sp} = \\pu{8.0 \\times 10^-6})$, what percentage of the calcium hydroxide will precipitate at equilibrium?\n> \n> (A) Roughly 50% \n> \n> (B) Roughly 75% \n> \n> (C) Roughly 95% \n> \n> (D) Over 99% \n> \nMy solution is as follows:\n1. Find concentration of $\\ce{Ca^2+}$ $(\\pu{0.02 M})$ and $\\ce{OH-}$ $(\\pu{0.04 M})$ ions from dissolved calcium hydroxide using $K\\_\\mathrm{sp}$.\n2. 
Add hydroxide ion concentration from sodium hydroxide (assuming full dissolution) to get total hydroxide concentration of $\\pu{0.14 M}$\n3. Find reaction quotient $Q = 0.02 \\times 0.14^2 = 3.92 \\times 10^{-4}$\n4. Find amount of calcium $(x)$ and hydroxide ions $(2x)$ that will precipitate at equilibrium by using algebraic equation: \n$$\n(0.02 - x)(0.14 - 2x)^2 = 8.0 \\times 10^{-6}, x = \\pu{0.019 M}$$\n5. Find percentage of calcium hydroxide precipitated: \n$$\\frac{0.019}{0.02} \\times 100\\% = 95\\%,$$\nhence (C)\nI am unsure about step 4, where a cubic equation appears, and would not be able to be solved in exam conditions (use of graphing calculator is not permitted).\nIs there a simpler method?\nAnswer: c"}", "/scratch/micpie/export/chemistry_stackexchange/train_0-0.jsonl": "{"text":"Task: Please answer the question of the user.\nQuestion: Today we have huge computational power (which is even significantly larger with supercomputers). I know that computational chemistry is used sometimes to predict particle properties. As I read on Wikipedia:\n> \n> Present algorithms in computational chemistry can routinely calculate the properties of molecules that contain up to about 40 electrons with sufficient accuracy.\n> \nIf that's so, why bother to try to find chemical interactions and properties experimentally, at least up to 40 electrons? For example, every year new drugs are being discovered. Wouldn't be it easier at least to find new chemical compounds, if not their properties, simply by computer simulation? What are the constraints and where do they come from? (I know that such constraints exist, but I'd like to know why).\nAssistant: Aside from the computational power needed to simulate larger molecules, there is also a lack of knowledge about the exact mechanisms that some drugs could potentially use. 
Think for example of experiments in yeast or *Escherichia coli* cells, which are used to find new biochemical mechanisms that could be exploited for new drugs. Even though we already know a lot about those cells, it would be computationally very demanding to include all known proteins and mechanisms into any kind of simulation. Furthermore, even if we could do such a simulation, there would still be a whole lot of other proteins, genes and mechanisms which we don't really understand yet but which could very well provide new mechanisms that could be used for new drugs. For this reason we would still need (biological) experiments even if we had much greater computational power than we have today."} {"text":"Task: Please answer the question of the user.\n\\n> \n> A $\\pu{3.45 g}$ piece of marble ($\\ce{CaCO3}$) is weighed and dropped into a beaker containing $\\pu{1.00 L}$ of hydrochloric acid. The marble is completely gone $\\pu{4.50 min}$ later. Calculate the average rate of reaction of $\\ce{HCl}$ in $\\pu{mol\/L\/s}$. Note that the volume of the system remains at $\\pu{1.00 L}$ through the entire reaction.\n> \nI'm not very sure why the units are in $\\pu{mol\/L\/s}$ instead of just $\\pu{mol\/s}$. Here is what I did:\n$$\\text{Rate} = \\frac{\\pu{3.45 g}~\\ce{(CaCO3)}}{\\pu{4.5 min}} = \\pu{0.0128 g\/s}$$\n$$\\pu{0.128 g}~\\ce{CaCO3} = \\pu{1.28e-3 mol}$$\n$$n(\\ce{HCl}) = \\pu{2.58e-3 mol}$$\n$$\\text{Rate} = \\pu{2.58e-3 mol\/L\/s}$$\nHowever, the answer given is $\\pu{2.55e-4 mol\/L\/s}$. What am I doing wrong?\nAnswer: You haven't accounted for the stoichiometry of the reaction, and I suppose you wrongly converted minutes to seconds. 
Always start solving problems like this with writing down the chemical reaction:\n$$\\ce{CaCO3 + 2HCl -> CaCl2 + H2O + CO2}$$\nBy definition rate of consumption of hydrochloric acid over time $\\Delta t$ is:\n$$r=\\frac{\\Delta c(\\ce{HCl})}{\\Delta t}$$\nSince all calcium carbonate reacted completely:\n$$\\Delta c(\\ce{HCl}) = \\frac{\\Delta n(\\ce{HCl})}{V} = \\frac{2n(\\ce{CaCO3})}{V} = \\frac{2m(\\ce{CaCO3})}{V M(\\ce{CaCO3})}$$\nwhere $m$ is mass, $M$ - molar mass, $V$ - volume.\nAnd the average rate is:\n$$r = \\frac{2m(\\ce{CaCO3})}{V M(\\ce{CaCO3})\\Delta t} = \\frac{2\\cdot\\pu{3.45 g}}{\\pu{1 L}\\cdot\\pu{100.09 g mol-1}\\cdot\\pu{4.50 min}\\cdot\\pu{60 s min-1}} = \\pu{2.55e-4 mol L-1 s-1}$$\nAlso, be careful with notations. Use proper capitalization, and don't equate moles to grams! This is not tolerable in natural sciences."}", "/scratch/micpie/export/chemistry_stackexchange/valid_0-1.jsonl": "{"text":"Task: Create a meaningful title for this question.\nInquiry: Is it possible to oxidize ethanol to acetic acid with hydrogen peroxide and if yes then \nunder what circumstances? I tried it in room temperature but either concentration was too small (of hydrogen peroxide(3%)) or I couldn't quite precisely read the the results of the universal indicator. I also tried heating it up, but it didn't change anything. The color stayed the same (of indicator). 
Can someone please explain me if the mistake was in my experiment (if the reaction can happen in room temperature) or the reaction needs some specific catalyst or other conditions.\nThe reaction would be: \n\\begin{align}\n\\ce{\\underset{(ethanol)}{C2H6O} + H2O2 &-> \n \\underset{(aldehyde)}{C2H4O} + 2H2O}\\\\\n\\ce{\\underset{(aldehyde)}{C2H4O} + H2O2 &-> \n \\underset{(acetic acid)}{C2H4O2} + H2O}\n\\end{align} \nOr the reaction without the middle part (since aldehyde will try to oxidize faster then ethanol) would be: \n$$\\ce{C2H6O + 2H2O2 -> C2H4O2 + 3H2O}$$\nTitle: Can ethanol be oxidized by hydrogen peroxide?"} {"text":"Task: Create a meaningful title for this question.\nQuestion: Why do we call a solution that has been centrifuged a supernatant? It seems to me that a \"natant\" should be something that floats (from Latin \"natare\" meaning \"to swim\"), and a \"supernatant\" should be something which lies *above* the natant, not below it. Shouldn't the liquid be called a subnatant? Or maybe a superprecipitate?\nEdit: fixed Latin (see below)\nAssistant: etymology of supernatant"}", "/scratch/micpie/export/chemistry_stackexchange/train_0-1.jsonl": "{"text":"Task: Generate a title for this question.\nInquiry: Today we have huge computational power (which is even significantly larger with supercomputers). I know that computational chemistry is used sometimes to predict particle properties. As I read on Wikipedia:\n> \n> Present algorithms in computational chemistry can routinely calculate the properties of molecules that contain up to about 40 electrons with sufficient accuracy.\n> \nIf that's so, why bother to try to find chemical interactions and properties experimentally, at least up to 40 electrons? For example, every year new drugs are being discovered. Wouldn't be it easier at least to find new chemical compounds, if not their properties, simply by computer simulation? What are the constraints and where do they come from? 
(I know that such constraints exist, but I'd like to know why).\nAnswer: Why not simulate every particle properties and interactions?"} {"text":"Task: Generate a title for this question.\nQuestion: > \n> A $\\pu{3.45 g}$ piece of marble ($\\ce{CaCO3}$) is weighed and dropped into a beaker containing $\\pu{1.00 L}$ of hydrochloric acid. The marble is completely gone $\\pu{4.50 min}$ later. Calculate the average rate of reaction of $\\ce{HCl}$ in $\\pu{mol\/L\/s}$. Note that the volume of the system remains at $\\pu{1.00 L}$ through the entire reaction.\n> \nI'm not very sure why the units are in $\\pu{mol\/L\/s}$ instead of just $\\pu{mol\/s}$. Here is what I did:\n$$\\text{Rate} = \\frac{\\pu{3.45 g}~\\ce{(CaCO3)}}{\\pu{4.5 min}} = \\pu{0.0128 g\/s}$$\n$$\\pu{0.128 g}~\\ce{CaCO3} = \\pu{1.28e-3 mol}$$\n$$n(\\ce{HCl}) = \\pu{2.58e-3 mol}$$\n$$\\text{Rate} = \\pu{2.58e-3 mol\/L\/s}$$\nHowever, the answer given is $\\pu{2.55e-4 mol\/L\/s}$. What am I doing wrong?\nAnswer: Calculating Average Rate"}", "/scratch/micpie/export/bio_ner_24/valid_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: The uptake rates of Mitoxantrone by BCRP were examined in the presence of various pharmacological classes of ABC transporter inhibitors, such as antiviral (i. e. Erythromycin, Foscarnet), antibiotic (i. e. Ciprofloxacin, Febendazole, Novobiocin, Quercitin), calcium channel blockers (i. e. Verapamil, Diltiazem, Nifedipine, Qunidine), anticancer (i. e. Mitroxantrone, Acyclovir, FTC, Phenethyl ITC, Raloxifene, Rodamin 123, Saquinavir, Tamoxifene), antifungal agents (i. e. Ketoconazole), hormones (i. 
e., Estradiol) and immunosuppressant (Cyclosporin) [ 16, 23]..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: Mitoxantrone,20,32,Chemical\/Drug\nBCRP,36,40,Gene\/Protein\nABC transporter,109,124,Gene\/Protein\nErythromycin,163,175,Chemical\/Drug\nFoscarnet,177,186,Chemical\/Drug\nCiprofloxacin,209,222,Chemical\/Drug\nFebendazole,224,235,Chemical\/Drug\nNovobiocin,237,247,Chemical\/Drug\nQuercitin,249,258,Chemical\/Drug\ncalcium channel,262,277,Gene\/Protein\nVerapamil,295,304,Chemical\/Drug\nDiltiazem,306,315,Chemical\/Drug\nNifedipine,317,327,Chemical\/Drug\nQunidine,329,337,Chemical\/Drug\nMitroxantrone,360,373,Chemical\/Drug\nAcyclovir,375,384,Chemical\/Drug\nRaloxifene,406,416,Chemical\/Drug\nRodamin 123,418,429,Chemical\/Drug\nSaquinavir,431,441,Chemical\/Drug\nTamoxifene,443,453,Chemical\/Drug\nKetoconazole,483,495,Chemical\/Drug\nhormones,499,507,Gene\/Protein\nEstradiol,518,527,Chemical\/Drug\nCyclosporin,553,564,Chemical\/Drug"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: The uptake rates of Mitoxantrone by BCRP were examined in the presence of various pharmacological classes of ABC transporter inhibitors, such as antiviral (i. e. Erythromycin, Foscarnet), antibiotic (i. e. Ciprofloxacin, Febendazole, Novobiocin, Quercitin), calcium channel blockers (i. e. Verapamil, Diltiazem, Nifedipine, Qunidine), anticancer (i. e. Mitroxantrone, Acyclovir, FTC, Phenethyl ITC, Raloxifene, Rodamin 123, Saquinavir, Tamoxifene), antifungal agents (i. e. Ketoconazole), hormones (i. 
e., Estradiol) and immunosuppressant (Cyclosporin) [ 16, 23]..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: Mitoxantrone,20,32,Chemical\/Drug\nBCRP,36,40,Gene\/Protein\nABC transporter,109,124,Gene\/Protein\nErythromycin,163,175,Chemical\/Drug\nFoscarnet,177,186,Chemical\/Drug\nCiprofloxacin,209,222,Chemical\/Drug\nFebendazole,224,235,Chemical\/Drug\nNovobiocin,237,247,Chemical\/Drug\nQuercitin,249,258,Chemical\/Drug\ncalcium channel,262,277,Gene\/Protein\nVerapamil,295,304,Chemical\/Drug\nDiltiazem,306,315,Chemical\/Drug\nNifedipine,317,327,Chemical\/Drug\nQunidine,329,337,Chemical\/Drug\nMitroxantrone,360,373,Chemical\/Drug\nAcyclovir,375,384,Chemical\/Drug\nRaloxifene,406,416,Chemical\/Drug\nRodamin 123,418,429,Chemical\/Drug\nSaquinavir,431,441,Chemical\/Drug\nTamoxifene,443,453,Chemical\/Drug\nKetoconazole,483,495,Chemical\/Drug\nhormones,499,507,Gene\/Protein\nEstradiol,518,527,Chemical\/Drug\nCyclosporin,553,564,Chemical\/Drug"}", "/scratch/micpie/export/bio_ner_24/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Repeated column chromatography based on bioactivity guided fractionation yielded 10 coumarins (esculetin, esculin, scopolin, isoscopolin, daphnetin, umbelliferone, 7-methoxy coumarin, scoparone, scopoletin, 6-methoxy artemicapin C), 8 flavonoids (hyperoside, quercetin, isorhamnetin, cirsilineol, arcapillin, isorhamnetin 3-robinobioside, linarin, isorhamnetin 3-glucoiside), 6 phenolic compounds (1, 5-dicaffeoylquinic acid, 3, 4-dicaffeoylquinic acid, 3, 5-dicaffeoylquinic acid, 3, 5-dicaffeoylquinic acid methyl ester, 4, 5-dicaffeoylquinic acid, 3-caffeoylquinic acid), and one chromone (capillarisin)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the 
text.\nResult: coumarins,84,93,Chemical\/Drug\nesculetin,96,105,Chemical\/Drug\nesculin,107,114,Chemical\/Drug\nscopolin,116,124,Chemical\/Drug\nisoscopolin,126,137,Chemical\/Drug\ndaphnetin,139,148,Chemical\/Drug\numbelliferone,150,163,Chemical\/Drug\n7 - methoxy coumarin,165,185,Chemical\/Drug\nscoparone,187,196,Chemical\/Drug\nscopoletin,198,208,Chemical\/Drug\n6 - methoxy artemicapin C,210,235,Chemical\/Drug\nflavonoids,240,250,Chemical\/Drug\nhyperoside,253,263,Chemical\/Drug\nquercetin,265,274,Chemical\/Drug\nisorhamnetin,276,288,Chemical\/Drug\ncirsilineol,290,301,Chemical\/Drug\narcapillin,303,313,Chemical\/Drug\nisorhamnetin 3 - robinobioside,315,345,Chemical\/Drug\nlinarin,347,354,Chemical\/Drug\nisorhamnetin 3 - glucoiside,356,383,Chemical\/Drug\nphenolic,388,396,Chemical\/Drug\n3 - caffeoylquinic acid,572,595,Chemical\/Drug\nchromone,606,614,Chemical\/Drug\ncapillarisin,617,629,Chemical\/Drug"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: In particular, the violet module (lowest bar) shows the most upregulated genes in Rad54-\/-\/ Parp-1-\/-\/ Ptc1+\/-mice (i. e., Exo1, P = 0.031; Pttg1, P = 0.001); in the gray module are listed the genes downregulated in Rad54-\/-\/ Ptc1+\/-and Parp-1-\/-\/ Ptc1+\/-mutants that are upregulated in Rad54-\/-\/ Parp-1-\/-\/ Ptc1+\/-mice, of which Xpa was statistically significant (P = 0.033); instead, in the red module, the synergistic effects of combined Rad54 and Parp-1 mutations lead to downregulation of several genes (i. 
e., Atm, P = 0.031; Atr, P = 0.008; Atrx, P = 0.006; Parp-1, P = 0.001; Prkdc, P = 0.029; Rad50, P = 0.028; Wrn, P = 0.016) (Figure 5A and Table S1)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: Rad54,83,88,Gene\/Protein\nParp - 1,94,102,Gene\/Protein\nPtc1,108,112,Gene\/Protein\nmice,117,121,Organism\/Species\nExo1,132,136,Gene\/Protein\nPttg1,150,155,Gene\/Protein\nRad54,228,233,Gene\/Protein\nPtc1,239,243,Gene\/Protein\nParp - 1,252,260,Gene\/Protein\nPtc1,266,270,Gene\/Protein\nRad54,307,312,Gene\/Protein\nParp - 1,318,326,Gene\/Protein\nPtc1,332,336,Gene\/Protein\nmice,341,345,Organism\/Species\nXpa,356,359,Gene\/Protein\nRad54,470,475,Gene\/Protein\nParp - 1,480,488,Gene\/Protein\nAtm,549,552,Gene\/Protein\nAtr,566,569,Gene\/Protein\nAtrx,583,587,Gene\/Protein\nParp - 1,601,609,Gene\/Protein\nPrkdc,623,628,Gene\/Protein\nRad50,642,647,Gene\/Protein\nWrn,661,664,Gene\/Protein"}", "/scratch/micpie/export/chemcaption_fragments/test_0-5.jsonl": "{"text":"User: I have a question about the molecule with SMILES CN(CC1(O)CCC1)C(=O)COCCNC(=O)c1ccccc1.\nAssistant: Sure, what is your question?\nUser: Is a 9-fluorenylmethoxycarbonyl fragment present in the molecule?\nAssistant: No"} {"text":"User: I have a question about the molecule with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22).\nAssistant: Interesting, how can I help?\nUser: Is a carbonyl fragment part of the molecule?\nAssistant: Yes"}", "/scratch/micpie/export/chemcaption_fragments/test_0-1.jsonl": "{"text":"Question: Is a 9-fluorenylmethoxycarbonyl fragment present in the molecule with SMILES CN(CC1(O)CCC1)C(=O)COCCNC(=O)c1ccccc1?\nAnswer: No"} {"text":"Q: Is a carbonyl fragment present in the molecule with InChI 
InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22)?\nAnswer: Yes"}", "/scratch/micpie/export/chemcaption_fragments/valid_0-0.jsonl": "{"text":"Q: Is the fragment with SMARTs [#8]1-[#8]-[#7]-[#6]=[#6]-1 present in the molecule with SELFIES [C][C][N][C][=Branch1][C][=O][C@@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=C][C][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O]?\nAnswer: No"} {"text":"Question: Is the fragment with SMARTs [#7]1-[#6]-[#6]-[#7]-[#6]-[#6]-1 present in the molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2?\nA: Yes"}", "/scratch/micpie/export/chemcaption_fragments/test_0-2.jsonl": "{"text":"A 9-fluorenylmethoxycarbonyl fragment is present in the molecule with SMILES CN(CC1(O)CCC1)C(=O)COCCNC(=O)c1ccccc1."} {"text":"A carbonyl fragment is absent in the molecule with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22)."}", "/scratch/micpie/export/chemcaption_fragments/train_0-6.jsonl": "{"text":"User: I want to know more about the molecule with InChI InChI=1S\/C12H18FN3O4S\/c1-8(20-3)7-15-12(17)16-9-4-5-10(13)11(6-9)21(18,19)14-2\/h4-6,8,14H,7H2,1-3H3,(H2,15,16,17)\/t8-\/m1\/s1.\nAssistant: How can I help?\nUser: Is a thiepin fragment present in the molecule?\nAssistant: No"} {"text":"User: I want to know more about the molecule with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+.\nAssistant: That sounds interesting, how can I help?\nUser: Is a halogen fragment part of of the molecule?\nAssistant: Yes"}", "/scratch/micpie/export/chemcaption_fragments/valid_0-6.jsonl": "{"text":"User: I want to know more about the molecule with SELFIES 
[C][C][N][C][=Branch1][C][=O][C@@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=C][C][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O].\nAssistant: Interesting, how can I help?\nUser: Is a 3H-1,2,3-dioxazole fragment substructure of the molecule?\nAssistant: No"} {"text":"User: I want to know more about the molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2.\nAssistant: How can I help?\nUser: Is a piperazine fragment substructure of the molecule?\nAssistant: Yes"}", "/scratch/micpie/export/chemcaption_fragments/test_0-0.jsonl": "{"text":"Question: Is the fragment with SMARTs [#8]=[#6]-[#8]-[#6]-[#6]1-[#6]2:[#6](-[#6]3:[#6]-1:[#6]:[#6]:[#6]:[#6]:3):[#6]:[#6]:[#6]:[#6]:2 present in the molecule with SMILES CN(CC1(O)CCC1)C(=O)COCCNC(=O)c1ccccc1?\nAnswer: No"} {"text":"Q: Is the fragment with SMARTs [CX3]=[OX1] present in the molecule with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22)?\nA: Yes"}", "/scratch/micpie/export/chemcaption_fragments/test_0-3.jsonl": "{"text":"Task: Answer a question about substructures\nQ: Is the fragment with SMARTS [#8]=[#6]-[#8]-[#6]-[#6]1-[#6]2:[#6](-[#6]3:[#6]-1:[#6]:[#6]:[#6]:[#6]:3):[#6]:[#6]:[#6]:[#6]:2 part of the molecule with SMILES CN(CC1(O)CCC1)C(=O)COCCNC(=O)c1ccccc1?\nAnswer: No"} {"text":"Task: Answer a question about substructures\nQuestion: Is the fragment with SMARTS [CX3]=[OX1] present in the molecule with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22)?\nAnswer: Yes"}", "/scratch/micpie/export/chemcaption_fragments/train_0-0.jsonl": "{"text":"Q: Is the fragment with SMARTs [#16]1-[#6]=[#6]-[#6]=[#6]-[#6]=[#6]-1 present in the molecule with InChI 
InChI=1S\/C12H18FN3O4S\/c1-8(20-3)7-15-12(17)16-9-4-5-10(13)11(6-9)21(18,19)14-2\/h4-6,8,14H,7H2,1-3H3,(H2,15,16,17)\/t8-\/m1\/s1?\nNo"} {"text":"Question: Is the fragment with SMARTs [F,Cl,Br,I] present in the molecule with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+?\nAnswer: Yes"}", "/scratch/micpie/export/chemcaption_fragments/test_0-6.jsonl": "{"text":"User: I want to know more about the molecule with SMILES CN(CC1(O)CCC1)C(=O)COCCNC(=O)c1ccccc1.\nAssistant: That sounds interesting, how can I help?\nUser: Is a 9-fluorenylmethoxycarbonyl fragment present in the molecule?\nAssistant: No"} {"text":"User: I want to know more about the molecule with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22).\nAssistant: That sounds interesting, how can I help?\nUser: Is a carbonyl fragment part of of the molecule?\nAssistant: Yes"}", "/scratch/micpie/export/chemcaption_fragments/train_0-3.jsonl": "{"text":"Task: Answer a question about fragments\nQ: Is the fragment with SMARTS [#16]1-[#6]=[#6]-[#6]=[#6]-[#6]=[#6]-1 present in the molecule with InChI InChI=1S\/C12H18FN3O4S\/c1-8(20-3)7-15-12(17)16-9-4-5-10(13)11(6-9)21(18,19)14-2\/h4-6,8,14H,7H2,1-3H3,(H2,15,16,17)\/t8-\/m1\/s1?\nAnswer: No"} {"text":"Task: Answer a question about fragments\nQuestion: Is the fragment with SMARTS [F,Cl,Br,I] part of the molecule with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+?\nA: Yes"}", "/scratch/micpie/export/chemcaption_fragments/valid_0-2.jsonl": "{"text":"A 3H-1,2,3-dioxazole fragment is present in the molecule with SELFIES 
[C][C][N][C][=Branch1][C][=O][C@@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=C][C][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O]."} {"text":"A piperazine fragment is absent in the molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2."}", "/scratch/micpie/export/chemcaption_fragments/valid_0-1.jsonl": "{"text":"Question: Is a 3H-1,2,3-dioxazole fragment present in the molecule with SELFIES [C][C][N][C][=Branch1][C][=O][C@@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=C][C][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O]?\nNo"} {"text":"Q: Is a piperazine fragment present in the molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2?\nYes"}", "/scratch/micpie/export/chemcaption_fragments/valid_0-5.jsonl": "{"text":"User: I have a question about the molecule with SELFIES [C][C][N][C][=Branch1][C][=O][C@@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=C][C][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O].\nAssistant: How can I help?\nUser: Is a 3H-1,2,3-dioxazole fragment present in the molecule?\nAssistant: No"} {"text":"User: I have a question about the molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2.\nAssistant: Sure, what is your question?\nUser: Is a piperazine fragment present in the molecule?\nAssistant: Yes"}", "/scratch/micpie/export/chemcaption_fragments/valid_0-4.jsonl": "{"text":"User: Is the fragment 3H-1,2,3-dioxazole part of the molecule with SELFIES [C][C][N][C][=Branch1][C][=O][C@@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=C][C][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O]?\nAssistant: No"} {"text":"User: Is the fragment piperazine part of the molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2?\nAssistant: Yes"}", 
"/scratch/micpie/export/chemcaption_fragments/train_0-5.jsonl": "{"text":"User: I have a question about the molecule with InChI InChI=1S\/C12H18FN3O4S\/c1-8(20-3)7-15-12(17)16-9-4-5-10(13)11(6-9)21(18,19)14-2\/h4-6,8,14H,7H2,1-3H3,(H2,15,16,17)\/t8-\/m1\/s1.\nAssistant: Interesting, how can I help?\nUser: Is a thiepin fragment part of the molecule?\nAssistant: No"} {"text":"User: I have a question about the molecule with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+.\nAssistant: Interesting, how can I help?\nUser: Is a halogen fragment part of the molecule?\nAssistant: Yes"}", "/scratch/micpie/export/chemcaption_fragments/train_0-2.jsonl": "{"text":"A thiepin fragment is present in the molecule with InChI InChI=1S\/C12H18FN3O4S\/c1-8(20-3)7-15-12(17)16-9-4-5-10(13)11(6-9)21(18,19)14-2\/h4-6,8,14H,7H2,1-3H3,(H2,15,16,17)\/t8-\/m1\/s1."} {"text":"A halogen fragment is absent in the molecule with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+."}", "/scratch/micpie/export/chemcaption_fragments/train_0-1.jsonl": "{"text":"Q: Is a thiepin fragment present in the molecule with InChI InChI=1S\/C12H18FN3O4S\/c1-8(20-3)7-15-12(17)16-9-4-5-10(13)11(6-9)21(18,19)14-2\/h4-6,8,14H,7H2,1-3H3,(H2,15,16,17)\/t8-\/m1\/s1?\nA: No"} {"text":"Question: Is a halogen fragment present in the molecule with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+?\nA: Yes"}", "/scratch/micpie/export/chemcaption_fragments/train_0-4.jsonl": "{"text":"User: Is the fragment thiepin part of the molecule with InChI 
InChI=1S\/C12H18FN3O4S\/c1-8(20-3)7-15-12(17)16-9-4-5-10(13)11(6-9)21(18,19)14-2\/h4-6,8,14H,7H2,1-3H3,(H2,15,16,17)\/t8-\/m1\/s1?\nAssistant: No"} {"text":"User: Is the fragment halogen present in the molecule with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+?\nAssistant: Yes"}", "/scratch/micpie/export/chemcaption_fragments/valid_0-3.jsonl": "{"text":"Task: Answer a question about substructures\nQuestion: Is the fragment with SMARTS [#8]1-[#8]-[#7]-[#6]=[#6]-1 present in the molecule with SELFIES [C][C][N][C][=Branch1][C][=O][C@@H1][Branch1][=Branch1][C][Branch1][C][C][C][N][C][=Branch1][C][=O][C@H1][Branch1][=C][C][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O]?\nAnswer: No"} {"text":"Task: Answer a question about substructures\nQuestion: Is the fragment with SMARTS [#7]1-[#6]-[#6]-[#7]-[#6]-[#6]-1 part of the molecule with SMILES CNC(=O)N[C@H](C)C(=O)N1CCN(C(=O)c2c(C)ncn2OC)CC12CCC2?\nYes"}", "/scratch/micpie/export/chemcaption_fragments/test_0-4.jsonl": "{"text":"User: Is the fragment 9-fluorenylmethoxycarbonyl part of the molecule with SMILES CN(CC1(O)CCC1)C(=O)COCCNC(=O)c1ccccc1?\nAssistant: No"} {"text":"User: Is the fragment carbonyl part of the molecule with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22)?\nAssistant: Yes"}", "/scratch/micpie/export/bc5chem/test_0-1.jsonl": "{"text":"User: Does the following text contain mentions of chemicals?Please return matches\nMETHODS: The cardiovascular responses to standing and head-up tilt were studied repeatedly in PD patients receiving selegiline and as the drug was withdrawn.\nAssistant: There is selegiline."} {"text":"User: Does the following text contain mentions of chemicals? 
Can you output matches?\nThirteen psoriatic patients with hypertension during the course of cyclosporin A therapy were treated for 25 months with a calcium channel blocker, sustained-release nifedipine, to study the clinical antihypertensive effects and adverse events during treatment with both drugs.\nAssistant: There is cyclosporin A, nifedipine, and calcium."}", "/scratch/micpie/export/bc5chem/valid_0-0.jsonl": "{"text":"Task: Find all the mentions of chemicals in the following text. Return the matching words. If there is no mention of a chemical, return `no match`.\nDescription: Further studies on effects of irrigation solutions on rat bladders.\nAnswer: no match"} {"text":"Task: Find all the mentions of chemical compounds in the following text. Return the matching words. If there is no matching entity, return `no match`.\nDescription: Both systolic and diastolic blood pressures of these 13 patients were decreased significantly after 4 weeks of nifedipine therapy, and blood pressure was maintained within the normal range thereafter for 25 months.\nAnswer: nifedipine"}", "/scratch/micpie/export/bc5chem/test_0-0.jsonl": "{"text":"Task: Find all the mentions of chemical compounds in the subsequent sentence. Return the matching entities. If there is no match, return `no match`.\nDescription: METHODS: The cardiovascular responses to standing and head-up tilt were studied repeatedly in PD patients receiving selegiline and as the drug was withdrawn.\nAnswer: selegiline"} {"text":"Task: Find all the mentions of chemical compounds in the subsequent sentence. Return the matching words. 
If there is no mention of a chemical, return `no match`.\nSentence: Thirteen psoriatic patients with hypertension during the course of cyclosporin A therapy were treated for 25 months with a calcium channel blocker, sustained-release nifedipine, to study the clinical antihypertensive effects and adverse events during treatment with both drugs.\nAnswer: cyclosporin A, nifedipine, and calcium"}", "/scratch/micpie/export/bc5chem/train_0-0.jsonl": "{"text":"Task: Find all the mentions of chemicals in the following text. Return the matching entities. If there is no matching entity, return `no match`.\nDescription: Selegiline-induced postural hypotension in Parkinson's disease: a longitudinal study on the effects of drug withdrawal.\nAnswer: Selegiline"} {"text":"Task: Find all the mentions of chemical substances in the subsequent sentence. Return the matching words. If there is no matching entity, return `no match`.\nDescription: Our findings indicate that sustained-release nifedipine is useful for hypertensive psoriatic patients under long-term treatment with cyclosporin A, but that these patients should be monitored for gingival hyperplasia.\nAnswer: cyclosporin A and nifedipine"}", "/scratch/micpie/export/bc5chem/valid_0-1.jsonl": "{"text":"User: Does the following text contain mentions of chemical substances? Can you return matches?\nText: Further studies on effects of irrigation solutions on rat bladders.\nAssistant: There is no match."} {"text":"User: Does the following text contain mentions of chemical substances?Please return matches\nText: Both systolic and diastolic blood pressures of these 13 patients were decreased significantly after 4 weeks of nifedipine therapy, and blood pressure was maintained within the normal range thereafter for 25 months.\nAssistant: There is nifedipine."}", "/scratch/micpie/export/bc5chem/train_0-1.jsonl": "{"text":"User: Does the following text contain mentions of chemicals? 
Can you output matches?\nText: Selegiline-induced postural hypotension in Parkinson's disease: a longitudinal study on the effects of drug withdrawal.\nAssistant: I found Selegiline."} {"text":"User: Does the following text contain mentions of chemical compounds?Please return matches\nOur findings indicate that sustained-release nifedipine is useful for hypertensive psoriatic patients under long-term treatment with cyclosporin A, but that these patients should be monitored for gingival hyperplasia.\nAssistant: There is cyclosporin A and nifedipine."}", "/scratch/micpie/export/bio_ner_52/train_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: For GA as an outcome, the model contained MMP-2, VEGF, ICAM-1 and ET-1, and the optimal model for the IBW\/GA ratio was composed of maternal plasma BET-1, MMP-2 and-9, VEGF and BMI. Table 2. Maternal markers affecting different birth outcomes as determined by regression model analyses. Infant birth weight (IBW) distributionInfant birth weight for gestational age (IBW-GA) distributionBirth outcomeMaternal factors a Maternal factors b Maternal factors a Maternal factors b IBWICAM-1 (p = 0.012) VEGF (p = 0.002) ET-3 (p = 0.087) VCAM-1 (p = 0.074) VEGF (p < 0.01) ET-3 (p < 0.01) VCAM-1 (p < 0.05) VCAM-1 (p = 0.007) VEGF (p = 0.067) ICAM-1 (p = 0.078) BMI (p = 0.094) DiaBP (p = 0.090) ET-1 (p = 0.179) BET-1 (p = 0.03) ET-2 (p = 0.05) MMP-9 (p = 0.02) VEGF (p = 0.003) GAMMP-2 (p = 0.010) VEGF (p = 0.037) ET-1 (p = 0.107) ICAM-1 (p = 0.076) MMP-2 (p < 0.06) VEGF (p = 0.01) ET-1 (p = 0.01)–– IBW\/GABET-1 (p = 0.032) MMP-2 (p = 0.033) MMP-9 (p = 0.021) VEGF (p < 0.001) BMI (p = 0.106) BET-1 (p < 0.01) MMP-2 (p < 0.05) VEGF (p < 0.01) BMI (p < 0.01)––< 25th percentile IBWBET-1 (p = 0.021), ET-2 (p = 0.014), ISOP (p = 0.074), VCAM-1 (p = 0.077), MMP-9 (p = 0.003) BET-1 (p = 0.01) ET-2 (p = 0.02) MMP-9 (p = 0.01) SysBP (p = 0.035) DiaBP (p = 0.046) BET-1 (p = 0.065) ET-2 (p = 0.084) MMP-7 (p = 0.067) 
MMP-9 (p = 0.036) BET-1 (p = 0.01) ET-2 (p = 0.02) MMP-9 (p = 0.01)> 75th percentile IBWET-3 (p = 0.097), VEGF (p < 0.001) VEGF (p = 0.002) VCAM-1 (p = 0.008) VEGF (p = 0.078) VEGF (p = 0.002) a Best subsets regression results (e. g. < 25th percentile, r = 0.46; > 75th percentile, r = 0.36). b Polytomous logistic regression results..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: MMP - 2,42,49,Gene\/Protein\nVEGF,51,55,Gene\/Protein\nICAM - 1,57,65,Gene\/Protein\nET - 1,70,76,Gene\/Protein\nBET - 1,155,162,Gene\/Protein\nMMP - 2,164,171,Gene\/Protein\nVEGF,181,185,Gene\/Protein\nVEGF,518,522,Gene\/Protein\nET - 3,537,543,Gene\/Protein\nVCAM - 1,558,566,Gene\/Protein\nVEGF,581,585,Gene\/Protein\nET - 3,599,605,Gene\/Protein\nVCAM - 1,619,627,Gene\/Protein\nVCAM - 1,641,649,Gene\/Protein\nVEGF,664,668,Gene\/Protein\nICAM - 1,683,691,Gene\/Protein\nET - 1,744,750,Gene\/Protein\nBET - 1,765,772,Gene\/Protein\nET - 2,786,792,Gene\/Protein\nMMP - 9,806,813,Gene\/Protein\nVEGF,827,831,Gene\/Protein\nVEGF,870,874,Gene\/Protein\nET - 1,889,895,Gene\/Protein\nICAM - 1,910,918,Gene\/Protein\nMMP - 2,933,940,Gene\/Protein\nVEGF,954,958,Gene\/Protein\nET - 1,972,978,Gene\/Protein\nMMP - 2,1025,1032,Gene\/Protein\nMMP - 9,1047,1054,Gene\/Protein\nVEGF,1069,1073,Gene\/Protein\nBET - 1,1106,1113,Gene\/Protein\nMMP - 2,1127,1134,Gene\/Protein\nVEGF,1148,1152,Gene\/Protein\nET - 2,1230,1236,Gene\/Protein\nVCAM - 1,1274,1282,Gene\/Protein\nMMP - 9,1299,1306,Gene\/Protein\nBET - 1,1321,1328,Gene\/Protein\nET - 2,1342,1348,Gene\/Protein\nMMP - 9,1362,1369,Gene\/Protein\nBET - 1,1423,1430,Gene\/Protein\nET - 2,1445,1451,Gene\/Protein\nMMP - 7,1466,1473,Gene\/Protein\nMMP - 9,1488,1495,Gene\/Protein\nBET - 1,1510,1517,Gene\/Protein\nET - 2,1531,1537,Gene\/Protein\nMMP - 9,1551,1558,Gene\/Protein\n- 
3,1596,1599,Gene\/Protein\nVEGF,1616,1620,Gene\/Protein\nVEGF,1635,1639,Gene\/Protein\nVCAM - 1,1654,1662,Gene\/Protein\nVEGF,1677,1681,Gene\/Protein\nVEGF,1696,1700,Gene\/Protein"} {"text":"Task: Please carry out the NER task for the the text below.\nText: For GA as an outcome, the model contained MMP-2, VEGF, ICAM-1 and ET-1, and the optimal model for the IBW\/GA ratio was composed of maternal plasma BET-1, MMP-2 and-9, VEGF and BMI. Table 2. Maternal markers affecting different birth outcomes as determined by regression model analyses. Infant birth weight (IBW) distributionInfant birth weight for gestational age (IBW-GA) distributionBirth outcomeMaternal factors a Maternal factors b Maternal factors a Maternal factors b IBWICAM-1 (p = 0.012) VEGF (p = 0.002) ET-3 (p = 0.087) VCAM-1 (p = 0.074) VEGF (p < 0.01) ET-3 (p < 0.01) VCAM-1 (p < 0.05) VCAM-1 (p = 0.007) VEGF (p = 0.067) ICAM-1 (p = 0.078) BMI (p = 0.094) DiaBP (p = 0.090) ET-1 (p = 0.179) BET-1 (p = 0.03) ET-2 (p = 0.05) MMP-9 (p = 0.02) VEGF (p = 0.003) GAMMP-2 (p = 0.010) VEGF (p = 0.037) ET-1 (p = 0.107) ICAM-1 (p = 0.076) MMP-2 (p < 0.06) VEGF (p = 0.01) ET-1 (p = 0.01)–– IBW\/GABET-1 (p = 0.032) MMP-2 (p = 0.033) MMP-9 (p = 0.021) VEGF (p < 0.001) BMI (p = 0.106) BET-1 (p < 0.01) MMP-2 (p < 0.05) VEGF (p < 0.01) BMI (p < 0.01)––< 25th percentile IBWBET-1 (p = 0.021), ET-2 (p = 0.014), ISOP (p = 0.074), VCAM-1 (p = 0.077), MMP-9 (p = 0.003) BET-1 (p = 0.01) ET-2 (p = 0.02) MMP-9 (p = 0.01) SysBP (p = 0.035) DiaBP (p = 0.046) BET-1 (p = 0.065) ET-2 (p = 0.084) MMP-7 (p = 0.067) MMP-9 (p = 0.036) BET-1 (p = 0.01) ET-2 (p = 0.02) MMP-9 (p = 0.01)> 75th percentile IBWET-3 (p = 0.097), VEGF (p < 0.001) VEGF (p = 0.002) VCAM-1 (p = 0.008) VEGF (p = 0.078) VEGF (p = 0.002) a Best subsets regression results (e. g. < 25th percentile, r = 0.46; > 75th percentile, r = 0.36). 
b Polytomous logistic regression results..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: MMP - 2,42,49,Gene\/Protein\nVEGF,51,55,Gene\/Protein\nICAM - 1,57,65,Gene\/Protein\nET - 1,70,76,Gene\/Protein\nBET - 1,155,162,Gene\/Protein\nMMP - 2,164,171,Gene\/Protein\nVEGF,181,185,Gene\/Protein\nVEGF,518,522,Gene\/Protein\nET - 3,537,543,Gene\/Protein\nVCAM - 1,558,566,Gene\/Protein\nVEGF,581,585,Gene\/Protein\nET - 3,599,605,Gene\/Protein\nVCAM - 1,619,627,Gene\/Protein\nVCAM - 1,641,649,Gene\/Protein\nVEGF,664,668,Gene\/Protein\nICAM - 1,683,691,Gene\/Protein\nET - 1,744,750,Gene\/Protein\nBET - 1,765,772,Gene\/Protein\nET - 2,786,792,Gene\/Protein\nMMP - 9,806,813,Gene\/Protein\nVEGF,827,831,Gene\/Protein\nVEGF,870,874,Gene\/Protein\nET - 1,889,895,Gene\/Protein\nICAM - 1,910,918,Gene\/Protein\nMMP - 2,933,940,Gene\/Protein\nVEGF,954,958,Gene\/Protein\nET - 1,972,978,Gene\/Protein\nMMP - 2,1025,1032,Gene\/Protein\nMMP - 9,1047,1054,Gene\/Protein\nVEGF,1069,1073,Gene\/Protein\nBET - 1,1106,1113,Gene\/Protein\nMMP - 2,1127,1134,Gene\/Protein\nVEGF,1148,1152,Gene\/Protein\nET - 2,1230,1236,Gene\/Protein\nVCAM - 1,1274,1282,Gene\/Protein\nMMP - 9,1299,1306,Gene\/Protein\nBET - 1,1321,1328,Gene\/Protein\nET - 2,1342,1348,Gene\/Protein\nMMP - 9,1362,1369,Gene\/Protein\nBET - 1,1423,1430,Gene\/Protein\nET - 2,1445,1451,Gene\/Protein\nMMP - 7,1466,1473,Gene\/Protein\nMMP - 9,1488,1495,Gene\/Protein\nBET - 1,1510,1517,Gene\/Protein\nET - 2,1531,1537,Gene\/Protein\nMMP - 9,1551,1558,Gene\/Protein\n- 3,1596,1599,Gene\/Protein\nVEGF,1616,1620,Gene\/Protein\nVEGF,1635,1639,Gene\/Protein\nVCAM - 1,1654,1662,Gene\/Protein\nVEGF,1677,1681,Gene\/Protein\nVEGF,1696,1700,Gene\/Protein"}", "/scratch/micpie/export/MUV_644/valid_0-0.jsonl": "{"text":"The chemical with the InChI representation of 
InChI=1S\/C20H11N3O3\/c24-19-14-7-4-10-21-17(14)20(25)23(19)13-6-3-5-12(11-13)18-22-15-8-1-2-9-16(15)26-18\/h1-11H is not an inhibitor of ROCK-2."} {"text":"The chemical compound with the InChI representation of InChI=1S\/C19H17N3O4S\/c1-12-9-17(21-26-12)20-18(23)11-22-13-5-2-3-7-15(13)27-16(10-19(22)24)14-6-4-8-25-14\/h2-9,16H,10-11H2,1H3,(H,20,21,23) is not an inhibitor of Rho-kinase 2 (ROCK-2)."}", "/scratch/micpie/export/MUV_644/test_0-0.jsonl": "{"text":"The chemical with the canonical SMILES representation of Nc1ccc(C(=O)O)c(O)c1 is not an inhibitor of Rho-kinase 2 (ROCK-2)."} {"text":"The molecule with the DeepSMILES CcccccOCC=O)Ncccccc6NCCNC=O)CC)C)))CC6))))))))))))))))c6C is not an inhibitor of Rho-kinase 2 (ROCK-2)."}", "/scratch/micpie/export/MUV_644/train_0-0.jsonl": "{"text":"The molecular species with the SELFIES [C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][Ring1][=Branch1][C][C][=Branch1][C][=O][N][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][Ring1][=C][C][C][C][C][N][Ring1][Branch1][Ring2][Ring1][Ring2] is not an inhibitor of Rho-kinase 2 (ROCK-2)."} {"text":"The molecule with the canonical SMILES CCOC(=O)C(C)n1cnc2c(cnn2-c2ccc(Cl)cc2)c1=O is not an inhibitor of Rho-kinase 2 (ROCK-2)."}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/test_0-10.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that is not metabolized by CYP2D6?\nAssistant: This is a molecule that is not a CYP2D6 substrate: CCCC(=O)Nc1ccc(OC[C@@H](O)CNC(C)C)c(C(C)=O)c1"} {"text":"User: I'm searching for the canonical SMILES of a molecule that is not metabolized by CYP P450 2D6?\nAssistant: This is a molecule that is not a CYP P450 2D6 substrate: CCOC(=O)[C@H](CCc1ccccc1)N[C@@H](C)C(=O)N1CCC[C@H]1C(=O)O"}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/valid_0-8.jsonl": "{"text":"User: Is the molecule with the DeepSMILES NcncNCCC3))))cncn[C@H]C=C[C@@H]CO))C5)))))c5n9 metabolized by 
CYP2D6?\nAssistant: No, it is not a CYP P450 2D6 substrate."} {"text":"User: Is the molecule with the SMILES O=C(N[C@H](Cc1cc(=O)[nH]c2ccccc12)C(=O)O)c1ccc(Cl)cc1 metabolized by CYP P450 2D6?\nAssistant: No, it is not a substrate for CYP P450 2D6."}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/train_0-8.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C24H24N2O4\/c1-15(2)30-24(27)23-19(14-28-3)22-18-11-17(29-13-16-7-5-4-6-8-16)9-10-20(18)26-21(22)12-25-23\/h4-12,15,26H,13-14H2,1-3H3 metabolized by CYP P450 2D6?\nAssistant: No, it is not a substrate for CYP2D6."} {"text":"User: Is the molecule with the SELFIES [O][=C][N][C][=Branch1][C][=O][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][N][Ring2][Ring1][C] metabolized by CYP2D6?\nAssistant: Yes, it is a substrate for CYP P450 2D6."}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is metabolized by CYP2D6.\nDeepSMILES: CCCC=O)NccccOC[C@@H]O)CNCC)C)))))))cCC)=O))c6\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not a CYP P450 2D6 substrate."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is metabolized by CYP P450 2D6.\nSELFIES: [C][C][O][C][=Branch1][C][=O][C@H1][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C@@H1][Branch1][C][C][C][=Branch1][C][=O][N][C][C][C][C@H1][Ring1][Branch1][C][=Branch1][C][=O][O]\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not a substrate for CYP2D6."}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/valid_0-9.jsonl": "{"text":"User: Can you create the SMILES of a molecule that is a not CYP2D6 substrate?\nAssistant: Of course, here you go: Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1"} {"text":"User: Can you give me the SELFIES of 
a molecule that is a not substrate for CYP P450 2D6?\nAssistant: Yes, I'm happy to help, here you go: [O][=C][Branch2][Ring1][=N][N][C@H1][Branch2][Ring1][Ring1][C][C][=C][C][=Branch1][C][=O][NH1][C][=C][C][=C][C][=C][Ring1][O][Ring1][=Branch1][C][=Branch1][C][=O][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1]"}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/test_0-1.jsonl": "{"text":"Based on the canonical SMILES representation CCCC(=O)Nc1ccc(OC[C@@H](O)CNC(C)C)c(C(C)=O)c1, the molecule is not metabolized by CYP2D6."} {"text":"Based on the DeepSMILES CCOC=O)[C@H]CCcccccc6))))))))N[C@@H]C)C=O)NCCC[C@H]5C=O)O, the molecule is not metabolized by CYP P450 2D6."}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/valid_0-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1 is not a CYP2D6 substrate."} {"text":"The molecule with the SMILES O=C(N[C@H](Cc1cc(=O)[nH]c2ccccc12)C(=O)O)c1ccc(Cl)cc1 is not a substrate for CYP P450 2D6."}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/test_0-2.jsonl": "{"text":"The canonical SMILES CCCC(=O)Nc1ccc(OC[C@@H](O)CNC(C)C)c(C(C)=O)c1 is from a molecule that is not identified as a substrate for CYP P450 2D6."} {"text":"The canonical SMILES CCOC(=O)[C@H](CCc1ccccc1)N[C@@H](C)C(=O)N1CCC[C@H]1C(=O)O represents a molecule that is not identified as a CYP P450 2D6 substrate."}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/valid_0-10.jsonl": "{"text":"User: I'm looking for the DeepSMILES of a molecule that is not metabolized by CYP P450 2D6?\nAssistant: This is a molecule that is not a CYP P450 2D6 substrate: NcncNCCC3))))cncn[C@H]C=C[C@@H]CO))C5)))))c5n9"} {"text":"User: I'm searching for the SELFIES of a molecule that is not metabolized by CYP P450 2D6?\nAssistant: This is a molecule that is not a substrate for CYP2D6: 
[O][=C][Branch2][Ring1][=N][N][C@H1][Branch2][Ring1][Ring1][C][C][=C][C][=Branch1][C][=O][NH1][C][=C][C][=C][C][=C][Ring1][O][Ring1][=Branch1][C][=Branch1][C][=O][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1]"}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/train_0-6.jsonl": "{"text":"Task: Please give me a InChI based on the text description.\nDescription: A molecule that is a CYP P450 2D6 substrate.\nResult: InChI=1S\/C24H24N2O4\/c1-15(2)30-24(27)23-19(14-28-3)22-18-11-17(29-13-16-7-5-4-6-8-16)9-10-20(18)26-21(22)12-25-23\/h4-12,15,26H,13-14H2,1-3H3"} {"text":"Task: Please give me a SMILES based on the text description below.\nDescription: A molecule that is a CYP P450 2D6 substrate.\nResult: O=C1NC(=O)C(c2ccccc2)(c2ccccc2)N1"}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/valid_0-6.jsonl": "{"text":"Task: Please give me a molecule InChI based on the description below.\nDescription: A molecule that is a CYP2D6 substrate.\nResult: InChI=1S\/C14H18N6O\/c15-14-18-12(17-9-2-3-9)11-13(19-14)20(7-16-11)10-4-1-8(5-10)6-21\/h1,4,7-10,21H,2-3,5-6H2,(H3,15,17,18,19)\/t8-,10+\/m1\/s1"} {"text":"Task: Please give me a DeepSMILES based on the description.\nDescription: A molecule that is a CYP P450 2D6 substrate.\nResult: O=CN[C@H]Cccc=O)[nH]cccccc%106)))))))))))C=O)O))))ccccCl)cc6"}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/test_0-9.jsonl": "{"text":"User: Can you generate the InChI of a molecule that is a not CYP2D6 substrate?\nAssistant: Sure, here you go: InChI=1S\/C18H28N2O4\/c1-5-6-18(23)20-14-7-8-17(16(9-14)13(4)21)24-11-15(22)10-19-12(2)3\/h7-9,12,15,19,22H,5-6,10-11H2,1-4H3,(H,20,23)\/t15-\/m0\/s1"} {"text":"User: Can you generate the SELFIES of a molecule that is a not substrate for CYP2D6?\nAssistant: Yes, here you go: [C][C][O][C][=Branch1][C][=O][C@H1][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C@@H1][Branch1][C][C][C][=Branch1][C][=O][N][C][C][C][C@H1][Ring1][Branch1][C][=Branch1][C][=O][O]"}", 
"/scratch/micpie/export/cyp2d6_substrate_carbonmangels/test_0-0.jsonl": "{"text":"The molecule with the SMILES CCCC(=O)Nc1ccc(OC[C@@H](O)CNC(C)C)c(C(C)=O)c1 is not a substrate for CYP2D6."} {"text":"The molecule with the InChI representation of InChI=1S\/C20H28N2O5\/c1-3-27-20(26)16(12-11-15-8-5-4-6-9-15)21-14(2)18(23)22-13-7-10-17(22)19(24)25\/h4-6,8-9,14,16-17,21H,3,7,10-13H2,1-2H3,(H,24,25)\/t14-,16-,17-\/m0\/s1 is not a CYP P450 2D6 substrate."}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/valid_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the canonical SMILES Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1 is a CYP2D6 substrate?\nAssistant: No, this molecule is not metabolized by CYP2D6."} {"text":"User: Can you estimate if the molecule with the SMILES O=C(N[C@H](Cc1cc(=O)[nH]c2ccccc12)C(=O)O)c1ccc(Cl)cc1 is a CYP2D6 substrate?\nAssistant: No, this molecule is not metabolized by CYP P450 2D6."}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/test_0-3.jsonl": "{"text":"The SELFIES [C][C][C][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][=C][O][C][C@@H1][Branch1][C][O][C][N][C][Branch1][C][C][C][C][Branch1][=Branch1][C][Branch1][C][C][=O][=C][Ring2][Ring1][C] is not metabolized by CYP P450 2D6."} {"text":"The molecule canonical SMILES CCOC(=O)[C@H](CCc1ccccc1)N[C@@H](C)C(=O)N1CCC[C@H]1C(=O)O is not metabolized by CYP2D6."}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/valid_0-11.jsonl": "{"text":"User: I want to come up with a SELFIES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should not be metabolized by CYP P450 2D6.\nAssistant: Got it, here you go, this SELFIES is not metabolized by CYP P450 2D6: [N][C][=N][C][Branch1][#Branch1][N][C][C][C][Ring1][Ring1][=C][N][=C][N][Branch1][N][C@H1][C][=C][C@@H1][Branch1][Ring1][C][O][C][Ring1][#Branch1][C][Ring1][N][=N][Ring2][Ring1][Ring2]"} {"text":"User: I want to create a SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be metabolized by CYP2D6.\nAssistant: Got it, here you go, this SMILES is not metabolized by CYP2D6: O=C(N[C@H](Cc1cc(=O)[nH]c2ccccc12)C(=O)O)c1ccc(Cl)cc1"}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/train_0-0.jsonl": "{"text":"The molecule with the SELFIES representation of [C][O][C][C][=C][Branch1][O][C][=Branch1][C][=O][O][C][Branch1][C][C][C][N][=C][C][NH1][C][=C][C][=C][Branch1][O][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][=C][C][Ring2][Ring1][O][=Ring1][P] is not a CYP P450 2D6 substrate."} {"text":"The molecule with the SELFIES representation of [O][=C][N][C][=Branch1][C][=O][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][N][Ring2][Ring1][C] is a CYP2D6 substrate."}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/test_0-6.jsonl": "{"text":"Task: Please give me a SMILES based on the text description.\nDescription: A molecule that is a CYP P450 2D6 substrate.\nResult: CCCC(=O)Nc1ccc(OC[C@@H](O)CNC(C)C)c(C(C)=O)c1"} {"text":"Task: Please create a DeepSMILES based on the description below.\nDescription: A molecule that is a substrate for CYP2D6.\nResult: CCOC=O)[C@H]CCcccccc6))))))))N[C@@H]C)C=O)NCCC[C@H]5C=O)O"}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/train_0-10.jsonl": "{"text":"User: I'm searching for the SELFIES of a molecule that is not metabolized by CYP2D6?\nAssistant: This is a molecule that is not a substrate for CYP P450 2D6: 
[C][O][C][C][=C][Branch1][O][C][=Branch1][C][=O][O][C][Branch1][C][C][C][N][=C][C][NH1][C][=C][C][=C][Branch1][O][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][=C][C][Ring2][Ring1][O][=Ring1][P]"} {"text":"User: I'm looking for the InChI of a molecule that is metabolized by CYP2D6?\nAssistant: This is a molecule that is a substrate for CYP P450 2D6: InChI=1S\/C15H12N2O2\/c18-13-15(17-14(19)16-13,11-7-3-1-4-8-11)12-9-5-2-6-10-12\/h1-10H,(H2,16,17,18,19)"}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/train_0-3.jsonl": "{"text":"The molecule SMILES COCc1c(C(=O)OC(C)C)ncc2[nH]c3ccc(OCc4ccccc4)cc3c12 is not metabolized by CYP2D6."} {"text":"The molecule SELFIES [O][=C][N][C][=Branch1][C][=O][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][N][Ring2][Ring1][C] is metabolized by CYP P450 2D6."}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/train_0-12.jsonl": "{"text":"User: I want to generate a InChI.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should not be a CYP2D6 substrate.\nAssistant: Understood, this InChI is not a CYP2D6 substrate: InChI=1S\/C24H24N2O4\/c1-15(2)30-24(27)23-19(14-28-3)22-18-11-17(29-13-16-7-5-4-6-8-16)9-10-20(18)26-21(22)12-25-23\/h4-12,15,26H,13-14H2,1-3H3"} {"text":"User: I want to create a SMILES.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should be a CYP2D6 substrate.\nAssistant: Ok, this SMILES is a CYP2D6 substrate: O=C1NC(=O)C(c2ccccc2)(c2ccccc2)N1"}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI InChI=1S\/C18H28N2O4\/c1-5-6-18(23)20-14-7-8-17(16(9-14)13(4)21)24-11-15(22)10-19-12(2)3\/h7-9,12,15,19,22H,5-6,10-11H2,1-4H3,(H,20,23)\/t15-\/m0\/s1 metabolized by CYP P450 2D6?\nConstraint: Even if you are uncertain, you must pick either A or B without using any additional words.\nOptions:\nA.) True\nB.) False\nAnswer: B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES CCOC(=O)[C@H](CCc1ccccc1)N[C@@H](C)C(=O)N1CCC[C@H]1C(=O)O metabolized by CYP P450 2D6?\nConstraint: Even if you are not sure, you must pick either a or b without using any additional words.\nOptions:\n(a) True\n(b) False\nAnswer: b"}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/valid_0-2.jsonl": "{"text":"The InChI InChI=1S\/C14H18N6O\/c15-14-18-12(17-9-2-3-9)11-13(19-14)20(7-16-11)10-4-1-8(5-10)6-21\/h1,4,7-10,21H,2-3,5-6H2,(H3,15,17,18,19)\/t8-,10+\/m1\/s1 represents a molecule that is not identified as a CYP2D6 substrate."} {"text":"The InChI InChI=1S\/C19H15ClN2O4\/c20-13-7-5-11(6-8-13)18(24)22-16(19(25)26)9-12-10-17(23)21-15-4-2-1-3-14(12)15\/h1-8,10,16H,9H2,(H,21,23)(H,22,24)(H,25,26)\/t16-\/m1\/s1 is from a molecule that is not identified as a substrate for CYP P450 2D6."}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a CYP P450 2D6 substrate?\nConstraint: You must select none, one or more options from 1, 2, 3, 4, or 5 without using any additional words.\nOptions:\n1. 
[C][C][C][=N][C][Branch1][C][N][=N][C][Branch1][C][N][=C][Ring1][Branch2][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1]\n2. [C][O][C][C][=C][Branch1][O][C][=Branch1][C][=O][O][C][Branch1][C][C][C][N][=C][C][NH1][C][=C][C][=C][Branch1][O][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][=C][C][Ring2][Ring1][O][=Ring1][P]\n3. [C][O][C][=C][C][=C][Branch1][#Branch2][C][N][C][C][N][C][C][Ring1][=Branch1][C][Branch1][Ring1][O][C][=C][Ring1][#C][O][C]\n4. [O][=P@@][Branch1][#Branch2][N][Branch1][Ring2][C][C][Cl][C][C][Cl][O][C][C][C][N][Ring1][=N][C][C][Cl]\n5. [C][C][C][Branch1][C][C][Branch1][C][C][N][C][C@H1][Branch1][C][O][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1]\nAnswer: 1, 2, 3, 4"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are a substrate for CYP P450 2D6?\nConstraint: You must select none, one or more options from a, b, c, d, or e without using any additional words.\nOptions:\na) O=CNC=O)Ccccccc6))))))cccccc6))))))N5\nb) OCCOCCNCCN[C@H]cccccc6))))))ccccCl)cc6)))))))CC6\nc) CccccCl)cOC[C@@H]O)CNCC)C)C)))))))c6\nd) CCC[C@@H]C[C@H]C=O)N[C@H][C@H]C)Cl))[C@H]O[C@H]SC))[C@H]O)[C@@H]O)[C@H]6O))))))))))NC)C5\ne) CO[C@H]C=CO[C@@]C)OccC)cO)ccO)cc\/C=N\\NCCNCCCCC5)))))CC6))))))))cO)c6c%10C%13=O))))))NC=O)CC)=CC=C[C@H]C)[C@H]O)[C@@H]C)[C@@H]O)[C@@H]C)[C@H]OCC)=O)))[C@H]%25C\nAnswer: a, c"}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/valid_0-1.jsonl": "{"text":"Based on the SMILES Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1, the molecule is not metabolized by CYP2D6."} {"text":"Based on the SELFIES representation [O][=C][Branch2][Ring1][=N][N][C@H1][Branch2][Ring1][Ring1][C][C][=C][C][=Branch1][C][=O][NH1][C][=C][C][=C][C][=C][Ring1][O][Ring1][=Branch1][C][=Branch1][C][=O][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1], the molecule is not metabolized by CYP P450 2D6."}", 
"/scratch/micpie/export/cyp2d6_substrate_carbonmangels/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI InChI=1S\/C14H18N6O\/c15-14-18-12(17-9-2-3-9)11-13(19-14)20(7-16-11)10-4-1-8(5-10)6-21\/h1,4,7-10,21H,2-3,5-6H2,(H3,15,17,18,19)\/t8-,10+\/m1\/s1 metabolized by CYP P450 2D6?\nConstraint: Even if you are uncertain, you must pick either a or b without using any additional words.\nOptions:\na. False\nb. True\nAnswer: a"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES O=CN[C@H]Cccc=O)[nH]cccccc%106)))))))))))C=O)O))))ccccCl)cc6 metabolized by CYP P450 2D6?\nConstraint: Even if you are not sure, you must pick either A or B without using any other words.\nOptions:\nA. False\nB. True\nAnswer: A"}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is metabolized by CYP P450 2D6.\nMolecule SMILES: Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not a CYP P450 2D6 substrate."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is metabolized by CYP2D6.\ncanonical SMILES: O=C(N[C@H](Cc1cc(=O)[nH]c2ccccc12)C(=O)O)c1ccc(Cl)cc1\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not a substrate for CYP P450 2D6."}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a substrate for CYP2D6.\nDeepSMILES: NcncNCCC3))))cncn[C@H]C=C[C@@H]CO))C5)))))c5n9\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nesult: False"} {"text":"Task: Please classify a molecule based on the 
description.\nDescription: A molecule that is a CYP P450 2D6 substrate.\nDeepSMILES: O=CN[C@H]Cccc=O)[nH]cccccc%106)))))))))))C=O)O))))ccccCl)cc6\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nesult: False"}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is metabolized by CYP P450 2D6.\nMolecule SELFIES: [C][O][C][C][=C][Branch1][O][C][=Branch1][C][=O][O][C][Branch1][C][C][C][N][=C][C][NH1][C][=C][C][=C][Branch1][O][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][=C][C][Ring2][Ring1][O][=Ring1][P]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not a CYP2D6 substrate."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is metabolized by CYP2D6.\nInChI: InChI=1S\/C15H12N2O2\/c18-13-15(17-14(19)16-13,11-7-3-1-4-8-11)12-9-5-2-6-10-12\/h1-10H,(H2,16,17,18,19)\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is a CYP2D6 substrate."}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/valid_0-12.jsonl": "{"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be a CYP2D6 substrate.\nAssistant: Ok, this InChI is not a CYP2D6 substrate: InChI=1S\/C14H18N6O\/c15-14-18-12(17-9-2-3-9)11-13(19-14)20(7-16-11)10-4-1-8(5-10)6-21\/h1,4,7-10,21H,2-3,5-6H2,(H3,15,17,18,19)\/t8-,10+\/m1\/s1"} {"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should not be a CYP P450 2D6 substrate.\nAssistant: Understood, this InChI is not a CYP P450 2D6 substrate: InChI=1S\/C19H15ClN2O4\/c20-13-7-5-11(6-8-13)18(24)22-16(19(25)26)9-12-10-17(23)21-15-4-2-1-3-14(12)15\/h1-8,10,16H,9H2,(H,21,23)(H,22,24)(H,25,26)\/t16-\/m1\/s1"}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/train_0-2.jsonl": "{"text":"The SMILES COCc1c(C(=O)OC(C)C)ncc2[nH]c3ccc(OCc4ccccc4)cc3c12 represents a molecule that is not identified as a substrate for CYP2D6."} {"text":"The InChI InChI=1S\/C15H12N2O2\/c18-13-15(17-14(19)16-13,11-7-3-1-4-8-11)12-9-5-2-6-10-12\/h1-10H,(H2,16,17,18,19) represents a molecule that is identified as a substrate for CYP P450 2D6."}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/test_0-11.jsonl": "{"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be metabolized by CYP P450 2D6.\nAssistant: Ok, this InChI is not metabolized by CYP P450 2D6: InChI=1S\/C18H28N2O4\/c1-5-6-18(23)20-14-7-8-17(16(9-14)13(4)21)24-11-15(22)10-19-12(2)3\/h7-9,12,15,19,22H,5-6,10-11H2,1-4H3,(H,20,23)\/t15-\/m0\/s1"} {"text":"User: I want to create a molecule SELFIES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should not be metabolized by CYP P450 2D6.\nAssistant: Got it, here you go, this SELFIES is not metabolized by CYP P450 2D6: [C][C][O][C][=Branch1][C][=O][C@H1][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C@@H1][Branch1][C][C][C][=Branch1][C][=O][N][C][C][C][C@H1][Ring1][Branch1][C][=Branch1][C][=O][O]"}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/train_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the canonical SMILES COCc1c(C(=O)OC(C)C)ncc2[nH]c3ccc(OCc4ccccc4)cc3c12 is a CYP2D6 substrate?\nAssistant: No, this molecule is not metabolized by CYP2D6."} {"text":"User: Can you estimate if the molecule with the canonical SMILES O=C1NC(=O)C(c2ccccc2)(c2ccccc2)N1 is a substrate for CYP P450 2D6?\nAssistant: Yes, this molecule is metabolized by CYP P450 2D6."}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/train_0-11.jsonl": "{"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be metabolized by CYP P450 2D6.\nAssistant: Ok, here you go, this DeepSMILES is not metabolized by CYP P450 2D6: COCccC=O)OCC)C))))ncc[nH]ccccOCcccccc6))))))))cc6c%139"} {"text":"User: I want to generate a InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should be metabolized by CYP2D6.\nAssistant: Got it, this InChI is metabolized by CYP2D6: InChI=1S\/C15H12N2O2\/c18-13-15(17-14(19)16-13,11-7-3-1-4-8-11)12-9-5-2-6-10-12\/h1-10H,(H2,16,17,18,19)"}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/train_0-1.jsonl": "{"text":"Based on the InChI representation InChI=1S\/C24H24N2O4\/c1-15(2)30-24(27)23-19(14-28-3)22-18-11-17(29-13-16-7-5-4-6-8-16)9-10-20(18)26-21(22)12-25-23\/h4-12,15,26H,13-14H2,1-3H3, the molecule is not metabolized by CYP P450 2D6."} {"text":"Based on the InChI InChI=1S\/C15H12N2O2\/c18-13-15(17-14(19)16-13,11-7-3-1-4-8-11)12-9-5-2-6-10-12\/h1-10H,(H2,16,17,18,19), the molecule is metabolized by CYP P450 2D6."}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of COCc1c(C(=O)OC(C)C)ncc2[nH]c3ccc(OCc4ccccc4)cc3c12 metabolized by CYP2D6?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any other words.\nOptions:\n1) False\n2) True\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI InChI=1S\/C15H12N2O2\/c18-13-15(17-14(19)16-13,11-7-3-1-4-8-11)12-9-5-2-6-10-12\/h1-10H,(H2,16,17,18,19) metabolized by CYP P450 2D6?\nConstraint: Even if you are uncertain, you must pick either a or b without using any other words.\nOptions:\na. False\nb. 
True\nAnswer: b"}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a CYP P450 2D6 substrate.\ncanonical SMILES: COCc1c(C(=O)OC(C)C)ncc2[nH]c3ccc(OCc4ccccc4)cc3c12\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nesult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a substrate for CYP P450 2D6.\nMolecule InChI: InChI=1S\/C15H12N2O2\/c18-13-15(17-14(19)16-13,11-7-3-1-4-8-11)12-9-5-2-6-10-12\/h1-10H,(H2,16,17,18,19)\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nesult: True"}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/test_0-7.jsonl": "{"text":"User: Can you derive if the molecule with the SMILES CCCC(=O)Nc1ccc(OC[C@@H](O)CNC(C)C)c(C(C)=O)c1 is a CYP P450 2D6 substrate?\nAssistant: No, this molecule is not metabolized by CYP2D6."} {"text":"User: Can you derive if the molecule with the canonical SMILES CCOC(=O)[C@H](CCc1ccccc1)N[C@@H](C)C(=O)N1CCC[C@H]1C(=O)O is a CYP P450 2D6 substrate?\nAssistant: No, this molecule is not metabolized by CYP P450 2D6."}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/train_0-9.jsonl": "{"text":"User: Can you give me the SELFIES of a molecule that is a not CYP P450 2D6 substrate?\nAssistant: Of course, here you go: [C][O][C][C][=C][Branch1][O][C][=Branch1][C][=O][O][C][Branch1][C][C][C][N][=C][C][NH1][C][=C][C][=C][Branch1][O][O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Ring1][=C][C][Ring2][Ring1][O][=Ring1][P]"} {"text":"User: Can you give me the canonical SMILES of a molecule that is a substrate for CYP P450 2D6?\nAssistant: Yes, I'm happy to help, here you go: O=C1NC(=O)C(c2ccccc2)(c2ccccc2)N1"}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/valid_0-3.jsonl": 
"{"text":"The molecule SELFIES [N][C][=N][C][Branch1][#Branch1][N][C][C][C][Ring1][Ring1][=C][N][=C][N][Branch1][N][C@H1][C][=C][C@@H1][Branch1][Ring1][C][O][C][Ring1][#Branch1][C][Ring1][N][=N][Ring2][Ring1][Ring2] is not metabolized by CYP P450 2D6."} {"text":"The molecule SELFIES [O][=C][Branch2][Ring1][=N][N][C@H1][Branch2][Ring1][Ring1][C][C][=C][C][=Branch1][C][=O][NH1][C][=C][C][=C][C][=C][Ring1][O][Ring1][=Branch1][C][=Branch1][C][=O][O][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1] is not metabolized by CYP P450 2D6."}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/test_0-8.jsonl": "{"text":"User: Is the molecule with the DeepSMILES CCCC=O)NccccOC[C@@H]O)CNCC)C)))))))cCC)=O))c6 metabolized by CYP P450 2D6?\nAssistant: No, it is not a CYP P450 2D6 substrate."} {"text":"User: Is the molecule with the SMILES CCOC(=O)[C@H](CCc1ccccc1)N[C@@H](C)C(=O)N1CCC[C@H]1C(=O)O metabolized by CYP P450 2D6?\nAssistant: No, it is not a substrate for CYP P450 2D6."}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a CYP P450 2D6 substrate?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any additional words.\nOptions:\n[A] CC(=O)S[C@@H]1CC2=CC(=O)CC[C@]2(C)[C@H]2CC[C@@]3(C)[C@@H](CC[C@@]34CCC(=O)O4)[C@@H]21\n[B] O=[P@@]1(N(CCCl)CCCl)OCCCN1CCCl\n[C] CC(C)Oc1ccc2c(=O)c(-c3ccccc3)coc2c1\n[D] COC(=O)C1=C(C)NC(C)=C(C(=O)O[C@@H]2CCCN(Cc3ccccc3)C2)[C@@H]1c1cccc([N+](=O)[O-])c1\n[E] CCCC(=O)Nc1ccc(OC[C@@H](O)CNC(C)C)c(C(C)=O)c1\nAnswer: A, B, C, D, E"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a CYP P450 2D6 substrate?\nConstraint: You must select none, one or more options from a, b, c, d, or e without using any additional words.\nOptions:\n[a] CCCCN(CCCC)CC[C@H](O)c1cc2c(Cl)cc(Cl)cc2c2cc(C(F)(F)F)ccc12\n[b] 
COCc1c(C(=O)OC(C)C)ncc2[nH]c3ccc(OCc4ccccc4)cc3c12\n[c] CCCCN(CCCC)C[C@@H](O)c1cc(Cl)cc2c1-c1ccc(Cl)cc1\/C2=C\\c1ccc(Cl)cc1\n[d] CCOC(=O)C(C)(C)Oc1ccc(Cl)cc1\n[e] CCOC(=O)[C@H](CCc1ccccc1)N[C@@H](C)C(=O)N1CCC[C@H]1C(=O)O\nAnswer: b, c, d, e"}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a CYP P450 2D6 substrate?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any additional words.\nOptions:\nA.) [N][C][=N][C][Branch1][#Branch1][N][C][C][C][Ring1][Ring1][=C][N][=C][N][Branch1][N][C@H1][C][=C][C@@H1][Branch1][Ring1][C][O][C][Ring1][#Branch1][C][Ring1][N][=N][Ring2][Ring1][Ring2]\nB.) [C][C][C][C][C][=Branch1][C][=O][N][Branch2][Ring1][O][C][C][=C][C][=C][Branch1][S][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][N][=N][NH1][Ring1][Branch1][C][=C][Ring1][P][C@H1][Branch1][=Branch1][C][=Branch1][C][=O][O][C][Branch1][C][C][C]\nC.) [C][C][C][=N][C][Branch1][C][N][=N][C][Branch1][C][N][=C][Ring1][Branch2][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1]\nD.) [N][C][=C][C][=C][Branch2][Ring1][Ring1][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][C][N][C][=C][Ring1][#Branch1][C][=C][Ring1][S]\nE.) 
[C][C][O][C][=Branch1][C][=O][C][=C][N][=C][N][Ring1][Branch1][C@H1][Branch1][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1]\nAnswer: A, B, C, E"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a substrate for CYP2D6?\nConstraint: You must select none, one or more options from a or b without using any other words.\nOptions:\n(a) O=CN[C@H]Cccc=O)[nH]cccccc%106)))))))))))C=O)O))))ccccCl)cc6\n(b) Clcccccc6Ccccccc6))))))cccccc6))))))nccnc5\nAnswer: a, b"}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a substrate for CYP2D6.\nSMILES: CCCC(=O)Nc1ccc(OC[C@@H](O)CNC(C)C)c(C(C)=O)c1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nesult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a CYP2D6 substrate.\nMolecule canonical SMILES: CCOC(=O)[C@H](CCc1ccccc1)N[C@@H](C)C(=O)N1CCC[C@H]1C(=O)O\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nesult: False"}", "/scratch/micpie/export/cyp2d6_substrate_carbonmangels/test_0-12.jsonl": "{"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be a substrate for CYP2D6.\nAssistant: Ok, this SMILES is not a substrate for CYP2D6: CCCC(=O)Nc1ccc(OC[C@@H](O)CNC(C)C)c(C(C)=O)c1"} {"text":"User: I want to generate a molecule canonical SMILES.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should not be a CYP P450 2D6 substrate.\nAssistant: Got it, this canonical SMILES is not a CYP P450 2D6 substrate: CCOC(=O)[C@H](CCc1ccccc1)N[C@@H](C)C(=O)N1CCC[C@H]1C(=O)O"}", "/scratch/micpie/export/uniprot_sentences/test_0-1.jsonl": "{"text":"User: What protein fits the description best?\\nOdorant coreceptor which complexes with conventional odorant receptors (ORs) to form odorant-sensing units, providing sensitive and prolonged odorant signaling and calcium permeability. Obligate coreceptor of all odorant receptors. Orco is a universal and integral part of the functional odorant receptor, involved in the dendritic localization of other olfactory receptors. Can form functional ion channels in the absence of an odor-binding odorant receptor. Plays a central role in the perception of olfactory stimuli in ants and is essential for ant social organization. Required for pheromone sensing. Also required for the development and maintenance of odorant receptor neurons (ORNs) and of antennal lobe glomeruli \nAssistant: A protein that fits the description is:\\nMMKMKQQGLVADLLPNIRVMKTFGHFVFNYYNDNSSKYLHKVYCCVNLFMLLLQFGLCAVNLIVESADVDDLTANTITLLFFTHSIVKICYFAIRSKYFYRTWAIWNNPNSHPLFAESNARYHAIALKKMRLLLFLVGGTTMLAAVAWTVLTFFEHPIRKIVDPVTNETEIIELPQLLIRSFYPFDAGKGITHVLVLVYQFYWVLFMLIDANSLDVLFCSWLLFACEQLQHLKQIMKPLMELSATLDTVVPNSSELFKAGSADHLRDGDNPPPPPPPQSDNMLDLDLRNIYSNRQDFTATFRPTAGMTFNGGVGPNGLTKKQEALVRSAIKYWVERHKHIVRLVTAVGDAYGFALLLHMLTTTITLTLLAYQATKVNGINVYAASTIGYILYTFGQVFLFCIFGNRLIEESTSVMEAAYSCHWYDGSEEAKTFVQIVCQQCQKAMSISGAKFFTVSLDLFASVLGAVVTYFMVLVQLK"} {"text":"User: What amino acid sequence fits the biological description below best?\\nPlays an essential for the replication of viral DNA. 
Presumably cleaves viral genomic dsRNA replicative form to initiate rolling circle replication\nAssistant: A amino acid sequence that fits the description is:\\nMSAPLSIEQDDLLTDDLKSWLSDIDFSNDNEEAIEMEPSDIEMSSPPIDIETSPPEEADVNLDDTWATVQKNGNNKLNRFILTFFPSDMDTKWLEPETYFENSPNKFDCWTGQYEYCPDTGKLHAHIYIECNHKHRIRFNVFHREIRKYHQSVQLQLAKRASKKQRQSAINYVTADFKRAPGSLVFRWEHNKFPSDFDPKCVNKKSKSDKVSKDEQHETQRLWIESKPRHWTWDQIVHENEESKKLLFGCTAGEKYHKGRHAEDARRTINDVIIFYGAGGTGKTTEAQAWGSEDEPVQECRYYRRNPDDGAFWGGGRTCYKGQRIVHYEEFAGQEAFGRLKEVCDIGKHGPAVNVKNGGALLNHDTVIFTSNIHPAGWFHKLWESDPKQWMPFERRITQVRFYPSHRADGSLNQPDENNPPYFIDQTEEFRQFVGDYDKAKEHAELHWPLKEAPEPTAQVFVPGRSHGVTENTFFEYCKTGRAP"}", "/scratch/micpie/export/uniprot_sentences/valid_0-0.jsonl": "{"text":"User: Please briefly describe the biochemical function of the AA sequence: MSLEQKKGADIISKILQIQNSIGKTTSPSTLKTKLSEISRKEQENARIQSKLSDLQKKKIDIDNKLLKEKQNLIKEEILERKKLEVLTKKQQKDEIEHQKKLKREIDAIKASTQYITDVSISSYNNTIPETEPEYDLFISHASEDKEDFVRPLAETLQQLGVNVWYDEFTLKVGDSLRQKIDSGLRNSKYGTVVLSTDFIKKDWTNYELDGLVAREMNGHKMILPIWHKITKNDVLDYSPNLADKVALNTSVNSIEEIAHQLADVILNR\nAssistant: NAD(+) hydrolase (NADase) that catalyzes cleavage of NAD(+) into ADP-D-ribose (ADPR) and nicotinamide. In addition to ADPR, also generates a cyclization variant of cyclic ADPR (cADPR), termed 2'cADPR (v-cADPR). 
Cleaves NADP(+), but does not cyclize the product "} {"text":"User: Briefly describe the function of this protein: MKAVVSKLYCSSMEEVMVVRRRPHVVNGGGFVVTDYKEKIVFKIDGCGVLGTKGELVLRDSDGNDLLLIHKKGGVVQALSIHNKWKGYSYDYQGSPKPVFTLRDPKHSCFSITSSIRISVGPGNCYFDVKGYFPDRDCSIVDSKGNVIAQVKEWIGSRDIYKVVTKPSVDKAFVFGVIAVLDYIYGESTSC\nAssistant: Might be related to the phospholipid scramblase and tubby-like superfamily of membrane tethered transcription factors"}", "/scratch/micpie/export/uniprot_sentences/test_0-2.jsonl": "{"text":"Task: Generate a description of a few sentences for the AA sequence below.\nPolypeptide: MMKMKQQGLVADLLPNIRVMKTFGHFVFNYYNDNSSKYLHKVYCCVNLFMLLLQFGLCAVNLIVESADVDDLTANTITLLFFTHSIVKICYFAIRSKYFYRTWAIWNNPNSHPLFAESNARYHAIALKKMRLLLFLVGGTTMLAAVAWTVLTFFEHPIRKIVDPVTNETEIIELPQLLIRSFYPFDAGKGITHVLVLVYQFYWVLFMLIDANSLDVLFCSWLLFACEQLQHLKQIMKPLMELSATLDTVVPNSSELFKAGSADHLRDGDNPPPPPPPQSDNMLDLDLRNIYSNRQDFTATFRPTAGMTFNGGVGPNGLTKKQEALVRSAIKYWVERHKHIVRLVTAVGDAYGFALLLHMLTTTITLTLLAYQATKVNGINVYAASTIGYILYTFGQVFLFCIFGNRLIEESTSVMEAAYSCHWYDGSEEAKTFVQIVCQQCQKAMSISGAKFFTVSLDLFASVLGAVVTYFMVLVQLK\nResult: Odorant coreceptor which complexes with conventional odorant receptors (ORs) to form odorant-sensing units, providing sensitive and prolonged odorant signaling and calcium permeability. Obligate coreceptor of all odorant receptors. Orco is a universal and integral part of the functional odorant receptor, involved in the dendritic localization of other olfactory receptors. Can form functional ion channels in the absence of an odor-binding odorant receptor. Plays a central role in the perception of olfactory stimuli in ants and is essential for ant social organization. Required for pheromone sensing. 
Also required for the development and maintenance of odorant receptor neurons (ORNs) and of antennal lobe glomeruli "} {"text":"Task: Generate a description for the amino acid sequence below.\nAmino acid sequence: MSAPLSIEQDDLLTDDLKSWLSDIDFSNDNEEAIEMEPSDIEMSSPPIDIETSPPEEADVNLDDTWATVQKNGNNKLNRFILTFFPSDMDTKWLEPETYFENSPNKFDCWTGQYEYCPDTGKLHAHIYIECNHKHRIRFNVFHREIRKYHQSVQLQLAKRASKKQRQSAINYVTADFKRAPGSLVFRWEHNKFPSDFDPKCVNKKSKSDKVSKDEQHETQRLWIESKPRHWTWDQIVHENEESKKLLFGCTAGEKYHKGRHAEDARRTINDVIIFYGAGGTGKTTEAQAWGSEDEPVQECRYYRRNPDDGAFWGGGRTCYKGQRIVHYEEFAGQEAFGRLKEVCDIGKHGPAVNVKNGGALLNHDTVIFTSNIHPAGWFHKLWESDPKQWMPFERRITQVRFYPSHRADGSLNQPDENNPPYFIDQTEEFRQFVGDYDKAKEHAELHWPLKEAPEPTAQVFVPGRSHGVTENTFFEYCKTGRAP\nResult: Plays an essential for the replication of viral DNA. Presumably cleaves viral genomic dsRNA replicative form to initiate rolling circle replication"}", "/scratch/micpie/export/uniprot_sentences/test_0-0.jsonl": "{"text":"User: Please describe the biological function of this protein: MMKMKQQGLVADLLPNIRVMKTFGHFVFNYYNDNSSKYLHKVYCCVNLFMLLLQFGLCAVNLIVESADVDDLTANTITLLFFTHSIVKICYFAIRSKYFYRTWAIWNNPNSHPLFAESNARYHAIALKKMRLLLFLVGGTTMLAAVAWTVLTFFEHPIRKIVDPVTNETEIIELPQLLIRSFYPFDAGKGITHVLVLVYQFYWVLFMLIDANSLDVLFCSWLLFACEQLQHLKQIMKPLMELSATLDTVVPNSSELFKAGSADHLRDGDNPPPPPPPQSDNMLDLDLRNIYSNRQDFTATFRPTAGMTFNGGVGPNGLTKKQEALVRSAIKYWVERHKHIVRLVTAVGDAYGFALLLHMLTTTITLTLLAYQATKVNGINVYAASTIGYILYTFGQVFLFCIFGNRLIEESTSVMEAAYSCHWYDGSEEAKTFVQIVCQQCQKAMSISGAKFFTVSLDLFASVLGAVVTYFMVLVQLK\nAssistant: Odorant coreceptor which complexes with conventional odorant receptors (ORs) to form odorant-sensing units, providing sensitive and prolonged odorant signaling and calcium permeability. Obligate coreceptor of all odorant receptors. Orco is a universal and integral part of the functional odorant receptor, involved in the dendritic localization of other olfactory receptors. Can form functional ion channels in the absence of an odor-binding odorant receptor. 
Plays a central role in the perception of olfactory stimuli in ants and is essential for ant social organization. Required for pheromone sensing. Also required for the development and maintenance of odorant receptor neurons (ORNs) and of antennal lobe glomeruli "} {"text":"User: Please describe the biological function of the protein: MSAPLSIEQDDLLTDDLKSWLSDIDFSNDNEEAIEMEPSDIEMSSPPIDIETSPPEEADVNLDDTWATVQKNGNNKLNRFILTFFPSDMDTKWLEPETYFENSPNKFDCWTGQYEYCPDTGKLHAHIYIECNHKHRIRFNVFHREIRKYHQSVQLQLAKRASKKQRQSAINYVTADFKRAPGSLVFRWEHNKFPSDFDPKCVNKKSKSDKVSKDEQHETQRLWIESKPRHWTWDQIVHENEESKKLLFGCTAGEKYHKGRHAEDARRTINDVIIFYGAGGTGKTTEAQAWGSEDEPVQECRYYRRNPDDGAFWGGGRTCYKGQRIVHYEEFAGQEAFGRLKEVCDIGKHGPAVNVKNGGALLNHDTVIFTSNIHPAGWFHKLWESDPKQWMPFERRITQVRFYPSHRADGSLNQPDENNPPYFIDQTEEFRQFVGDYDKAKEHAELHWPLKEAPEPTAQVFVPGRSHGVTENTFFEYCKTGRAP\nAssistant: Plays an essential for the replication of viral DNA. Presumably cleaves viral genomic dsRNA replicative form to initiate rolling circle replication"}", "/scratch/micpie/export/uniprot_sentences/test_0-3.jsonl": "{"text":"Task: Generate a protein based on the description.\nDescription: Odorant coreceptor which complexes with conventional odorant receptors (ORs) to form odorant-sensing units, providing sensitive and prolonged odorant signaling and calcium permeability. Obligate coreceptor of all odorant receptors. Orco is a universal and integral part of the functional odorant receptor, involved in the dendritic localization of other olfactory receptors. Can form functional ion channels in the absence of an odor-binding odorant receptor. Plays a central role in the perception of olfactory stimuli in ants and is essential for ant social organization. Required for pheromone sensing. 
Also required for the development and maintenance of odorant receptor neurons (ORNs) and of antennal lobe glomeruli \nOutput: MMKMKQQGLVADLLPNIRVMKTFGHFVFNYYNDNSSKYLHKVYCCVNLFMLLLQFGLCAVNLIVESADVDDLTANTITLLFFTHSIVKICYFAIRSKYFYRTWAIWNNPNSHPLFAESNARYHAIALKKMRLLLFLVGGTTMLAAVAWTVLTFFEHPIRKIVDPVTNETEIIELPQLLIRSFYPFDAGKGITHVLVLVYQFYWVLFMLIDANSLDVLFCSWLLFACEQLQHLKQIMKPLMELSATLDTVVPNSSELFKAGSADHLRDGDNPPPPPPPQSDNMLDLDLRNIYSNRQDFTATFRPTAGMTFNGGVGPNGLTKKQEALVRSAIKYWVERHKHIVRLVTAVGDAYGFALLLHMLTTTITLTLLAYQATKVNGINVYAASTIGYILYTFGQVFLFCIFGNRLIEESTSVMEAAYSCHWYDGSEEAKTFVQIVCQQCQKAMSISGAKFFTVSLDLFASVLGAVVTYFMVLVQLK"} {"text":"Task: Create a amino acid sequence based on the description.\nDescription: Plays an essential for the replication of viral DNA. Presumably cleaves viral genomic dsRNA replicative form to initiate rolling circle replication\nResult: MSAPLSIEQDDLLTDDLKSWLSDIDFSNDNEEAIEMEPSDIEMSSPPIDIETSPPEEADVNLDDTWATVQKNGNNKLNRFILTFFPSDMDTKWLEPETYFENSPNKFDCWTGQYEYCPDTGKLHAHIYIECNHKHRIRFNVFHREIRKYHQSVQLQLAKRASKKQRQSAINYVTADFKRAPGSLVFRWEHNKFPSDFDPKCVNKKSKSDKVSKDEQHETQRLWIESKPRHWTWDQIVHENEESKKLLFGCTAGEKYHKGRHAEDARRTINDVIIFYGAGGTGKTTEAQAWGSEDEPVQECRYYRRNPDDGAFWGGGRTCYKGQRIVHYEEFAGQEAFGRLKEVCDIGKHGPAVNVKNGGALLNHDTVIFTSNIHPAGWFHKLWESDPKQWMPFERRITQVRFYPSHRADGSLNQPDENNPPYFIDQTEEFRQFVGDYDKAKEHAELHWPLKEAPEPTAQVFVPGRSHGVTENTFFEYCKTGRAP"}", "/scratch/micpie/export/uniprot_sentences/train_0-0.jsonl": "{"text":"User: Briefly describe the biochemical function of the AA sequence: 
MRFQVIVAAATITMITSYIPGVASQSTSDGDDLFVPVSNFDPKSIFPEIKHPFEPMYANTENGKIVPTNSWISNLFYPSADNLAPTTPDPYTLRLLDGYGGNPGLTIRQPSAKVLGSYPPTNDVPYTDAGYMINSVVVDLRLTSSEWSDVVPDRQVTDWDHLSANLRLSTPQDSNSYIDFPIVRGMAYITANYNNLTPQFLSQHAIISVEADEKKSDDNTSTFSGRKFKITMNDDPTSTFIIYSLGDKPLELRKQDNSNLVASKPYTGVIRVAKLPAPEFETLLDASRAVWPTGGDISARSDDNNGASYTIKWKTNSNEAPLLTYAYAHHLTSIDDSNVKRTDMTLQSATKGPMTALVGNEWTLRETELSPVEWLPLQAAPNPTTINEIMTEINKDIASNYTQETAKEDNYFSGKGLQKFAMLALILNKSDQTQLRNPELAQIALDKLKAAFLPYLQNEQADPFRYDTLYKGIVAKAGLPTSMGGTDDLSAEFGHSYYSDHHYHQGYFVVTAAIIHHLDPTWNADRLKAWTEALIRDVNNANDGDEYFAAFRNWDWFAGHSWAGGIKPDGALDGRDQESVPESVNFYWGAKLWGLATGNTPLTKLASLQLAVTKRTTYEYFWMLDGNKNRPENIVRNKVIGIYFEQKTDYTTYFGRFLEYIHGIQQLPMTPELMEYIRTPEFVSQEWDEKLGAIAPTVQSPWAGVLYLNYAIINPAEAYPALRKVQMDDGQTRSYSLYLTATRPHFFRRSLLAALARHGSTRRPSLPSSGDDDKHEDGFLLRFRRLNPFNLKHRIY\nAssistant: Cleaves internal linkages in 1,3-beta-glucan"} {"text":"User: Please describe the biochemical function of this protein: MTIEKNLSDVQQKYADQFQEDVVKSFQTGYGITPDTQIDAGALRREILDDQITMLTWTNEDLIFYRDISRRPAQSTVVKYDQYLRHGNVGHSRFVKEIGVAPVSDPNIRQKTVSMKYVSDTKNMSIASGLVNNIADPSQILTEDAIAVVAKTIEWASFYGDASLTSEVEGEGLEFDGLAKLIDKNNVINAKGNQLTEKHLNEAAVRIGKGFGTATDAYMPIGVHADFVNSILGRQMQLMQDNSGNVNTGYSVNGFYSSRGFIKLHGSTVMENELILDESLQPLPNAPQPAKVTATVETKQKGAFENEEDRAGLSYKVVVNSDDAQSAPSEEVTATVSNVDDGVKLSISVNAMYQQQPQFVSIYRQGKETGMYFLIKRVPVKDAQEDGTIVFVDKNETLPETADVFVGEMSPQVVHLFELLPMMKLPLAQINASITFAVLWYGALALRAPKKWARIKNVRYIAV\nAssistant: Assembles to form an icosahedral capsid"}", "/scratch/micpie/export/uniprot_sentences/train_0-3.jsonl": "{"text":"Task: Generate a polypeptide based on the description.\nDescription: Cleaves internal linkages in 1,3-beta-glucan\nOutput: 
MRFQVIVAAATITMITSYIPGVASQSTSDGDDLFVPVSNFDPKSIFPEIKHPFEPMYANTENGKIVPTNSWISNLFYPSADNLAPTTPDPYTLRLLDGYGGNPGLTIRQPSAKVLGSYPPTNDVPYTDAGYMINSVVVDLRLTSSEWSDVVPDRQVTDWDHLSANLRLSTPQDSNSYIDFPIVRGMAYITANYNNLTPQFLSQHAIISVEADEKKSDDNTSTFSGRKFKITMNDDPTSTFIIYSLGDKPLELRKQDNSNLVASKPYTGVIRVAKLPAPEFETLLDASRAVWPTGGDISARSDDNNGASYTIKWKTNSNEAPLLTYAYAHHLTSIDDSNVKRTDMTLQSATKGPMTALVGNEWTLRETELSPVEWLPLQAAPNPTTINEIMTEINKDIASNYTQETAKEDNYFSGKGLQKFAMLALILNKSDQTQLRNPELAQIALDKLKAAFLPYLQNEQADPFRYDTLYKGIVAKAGLPTSMGGTDDLSAEFGHSYYSDHHYHQGYFVVTAAIIHHLDPTWNADRLKAWTEALIRDVNNANDGDEYFAAFRNWDWFAGHSWAGGIKPDGALDGRDQESVPESVNFYWGAKLWGLATGNTPLTKLASLQLAVTKRTTYEYFWMLDGNKNRPENIVRNKVIGIYFEQKTDYTTYFGRFLEYIHGIQQLPMTPELMEYIRTPEFVSQEWDEKLGAIAPTVQSPWAGVLYLNYAIINPAEAYPALRKVQMDDGQTRSYSLYLTATRPHFFRRSLLAALARHGSTRRPSLPSSGDDDKHEDGFLLRFRRLNPFNLKHRIY"} {"text":"Task: Create a amino acid sequence based on the description.\nDescription: Assembles to form an icosahedral capsid\nOutput: MTIEKNLSDVQQKYADQFQEDVVKSFQTGYGITPDTQIDAGALRREILDDQITMLTWTNEDLIFYRDISRRPAQSTVVKYDQYLRHGNVGHSRFVKEIGVAPVSDPNIRQKTVSMKYVSDTKNMSIASGLVNNIADPSQILTEDAIAVVAKTIEWASFYGDASLTSEVEGEGLEFDGLAKLIDKNNVINAKGNQLTEKHLNEAAVRIGKGFGTATDAYMPIGVHADFVNSILGRQMQLMQDNSGNVNTGYSVNGFYSSRGFIKLHGSTVMENELILDESLQPLPNAPQPAKVTATVETKQKGAFENEEDRAGLSYKVVVNSDDAQSAPSEEVTATVSNVDDGVKLSISVNAMYQQQPQFVSIYRQGKETGMYFLIKRVPVKDAQEDGTIVFVDKNETLPETADVFVGEMSPQVVHLFELLPMMKLPLAQINASITFAVLWYGALALRAPKKWARIKNVRYIAV"}", "/scratch/micpie/export/uniprot_sentences/valid_0-2.jsonl": "{"text":"Task: Generate a description of a few sentences for the AA sequence below.\nAA sequence: MSLEQKKGADIISKILQIQNSIGKTTSPSTLKTKLSEISRKEQENARIQSKLSDLQKKKIDIDNKLLKEKQNLIKEEILERKKLEVLTKKQQKDEIEHQKKLKREIDAIKASTQYITDVSISSYNNTIPETEPEYDLFISHASEDKEDFVRPLAETLQQLGVNVWYDEFTLKVGDSLRQKIDSGLRNSKYGTVVLSTDFIKKDWTNYELDGLVAREMNGHKMILPIWHKITKNDVLDYSPNLADKVALNTSVNSIEEIAHQLADVILNR\nOutput: NAD(+) hydrolase (NADase) that catalyzes cleavage of NAD(+) into ADP-D-ribose (ADPR) and nicotinamide. 
In addition to ADPR, also generates a cyclization variant of cyclic ADPR (cADPR), termed 2'cADPR (v-cADPR). Cleaves NADP(+), but does not cyclize the product "} {"text":"Task: Generate a description for the amino acid sequence below.\nPolypeptide: MKAVVSKLYCSSMEEVMVVRRRPHVVNGGGFVVTDYKEKIVFKIDGCGVLGTKGELVLRDSDGNDLLLIHKKGGVVQALSIHNKWKGYSYDYQGSPKPVFTLRDPKHSCFSITSSIRISVGPGNCYFDVKGYFPDRDCSIVDSKGNVIAQVKEWIGSRDIYKVVTKPSVDKAFVFGVIAVLDYIYGESTSC\nResult: Might be related to the phospholipid scramblase and tubby-like superfamily of membrane tethered transcription factors"}", "/scratch/micpie/export/uniprot_sentences/valid_0-1.jsonl": "{"text":"User: What AA sequence fits the biological description in the next sentences best?\\nNAD(+) hydrolase (NADase) that catalyzes cleavage of NAD(+) into ADP-D-ribose (ADPR) and nicotinamide. In addition to ADPR, also generates a cyclization variant of cyclic ADPR (cADPR), termed 2'cADPR (v-cADPR). Cleaves NADP(+), but does not cyclize the product \nAssistant: A AA sequence that fits the description is:\\nMSLEQKKGADIISKILQIQNSIGKTTSPSTLKTKLSEISRKEQENARIQSKLSDLQKKKIDIDNKLLKEKQNLIKEEILERKKLEVLTKKQQKDEIEHQKKLKREIDAIKASTQYITDVSISSYNNTIPETEPEYDLFISHASEDKEDFVRPLAETLQQLGVNVWYDEFTLKVGDSLRQKIDSGLRNSKYGTVVLSTDFIKKDWTNYELDGLVAREMNGHKMILPIWHKITKNDVLDYSPNLADKVALNTSVNSIEEIAHQLADVILNR"} {"text":"User: What protein fits the biochemical description best?\\nMight be related to the phospholipid scramblase and tubby-like superfamily of membrane tethered transcription factors\nAssistant: A protein that fits the points is:\\nMKAVVSKLYCSSMEEVMVVRRRPHVVNGGGFVVTDYKEKIVFKIDGCGVLGTKGELVLRDSDGNDLLLIHKKGGVVQALSIHNKWKGYSYDYQGSPKPVFTLRDPKHSCFSITSSIRISVGPGNCYFDVKGYFPDRDCSIVDSKGNVIAQVKEWIGSRDIYKVVTKPSVDKAFVFGVIAVLDYIYGESTSC"}", "/scratch/micpie/export/uniprot_sentences/train_0-2.jsonl": "{"text":"Task: Create a description for the protein.\nPolypeptide: 
MRFQVIVAAATITMITSYIPGVASQSTSDGDDLFVPVSNFDPKSIFPEIKHPFEPMYANTENGKIVPTNSWISNLFYPSADNLAPTTPDPYTLRLLDGYGGNPGLTIRQPSAKVLGSYPPTNDVPYTDAGYMINSVVVDLRLTSSEWSDVVPDRQVTDWDHLSANLRLSTPQDSNSYIDFPIVRGMAYITANYNNLTPQFLSQHAIISVEADEKKSDDNTSTFSGRKFKITMNDDPTSTFIIYSLGDKPLELRKQDNSNLVASKPYTGVIRVAKLPAPEFETLLDASRAVWPTGGDISARSDDNNGASYTIKWKTNSNEAPLLTYAYAHHLTSIDDSNVKRTDMTLQSATKGPMTALVGNEWTLRETELSPVEWLPLQAAPNPTTINEIMTEINKDIASNYTQETAKEDNYFSGKGLQKFAMLALILNKSDQTQLRNPELAQIALDKLKAAFLPYLQNEQADPFRYDTLYKGIVAKAGLPTSMGGTDDLSAEFGHSYYSDHHYHQGYFVVTAAIIHHLDPTWNADRLKAWTEALIRDVNNANDGDEYFAAFRNWDWFAGHSWAGGIKPDGALDGRDQESVPESVNFYWGAKLWGLATGNTPLTKLASLQLAVTKRTTYEYFWMLDGNKNRPENIVRNKVIGIYFEQKTDYTTYFGRFLEYIHGIQQLPMTPELMEYIRTPEFVSQEWDEKLGAIAPTVQSPWAGVLYLNYAIINPAEAYPALRKVQMDDGQTRSYSLYLTATRPHFFRRSLLAALARHGSTRRPSLPSSGDDDKHEDGFLLRFRRLNPFNLKHRIY\nOutput: Cleaves internal linkages in 1,3-beta-glucan"} {"text":"Task: Generate a description of a few sentences for the polypeptide.\nAA sequence: MTIEKNLSDVQQKYADQFQEDVVKSFQTGYGITPDTQIDAGALRREILDDQITMLTWTNEDLIFYRDISRRPAQSTVVKYDQYLRHGNVGHSRFVKEIGVAPVSDPNIRQKTVSMKYVSDTKNMSIASGLVNNIADPSQILTEDAIAVVAKTIEWASFYGDASLTSEVEGEGLEFDGLAKLIDKNNVINAKGNQLTEKHLNEAAVRIGKGFGTATDAYMPIGVHADFVNSILGRQMQLMQDNSGNVNTGYSVNGFYSSRGFIKLHGSTVMENELILDESLQPLPNAPQPAKVTATVETKQKGAFENEEDRAGLSYKVVVNSDDAQSAPSEEVTATVSNVDDGVKLSISVNAMYQQQPQFVSIYRQGKETGMYFLIKRVPVKDAQEDGTIVFVDKNETLPETADVFVGEMSPQVVHLFELLPMMKLPLAQINASITFAVLWYGALALRAPKKWARIKNVRYIAV\nOutput: Assembles to form an icosahedral capsid"}", "/scratch/micpie/export/uniprot_sentences/train_0-1.jsonl": "{"text":"User: What polypeptide fits the biological description best?\\nCleaves internal linkages in 1,3-beta-glucan\nAssistant: A polypeptide that fits the sentences 
is:\\nMRFQVIVAAATITMITSYIPGVASQSTSDGDDLFVPVSNFDPKSIFPEIKHPFEPMYANTENGKIVPTNSWISNLFYPSADNLAPTTPDPYTLRLLDGYGGNPGLTIRQPSAKVLGSYPPTNDVPYTDAGYMINSVVVDLRLTSSEWSDVVPDRQVTDWDHLSANLRLSTPQDSNSYIDFPIVRGMAYITANYNNLTPQFLSQHAIISVEADEKKSDDNTSTFSGRKFKITMNDDPTSTFIIYSLGDKPLELRKQDNSNLVASKPYTGVIRVAKLPAPEFETLLDASRAVWPTGGDISARSDDNNGASYTIKWKTNSNEAPLLTYAYAHHLTSIDDSNVKRTDMTLQSATKGPMTALVGNEWTLRETELSPVEWLPLQAAPNPTTINEIMTEINKDIASNYTQETAKEDNYFSGKGLQKFAMLALILNKSDQTQLRNPELAQIALDKLKAAFLPYLQNEQADPFRYDTLYKGIVAKAGLPTSMGGTDDLSAEFGHSYYSDHHYHQGYFVVTAAIIHHLDPTWNADRLKAWTEALIRDVNNANDGDEYFAAFRNWDWFAGHSWAGGIKPDGALDGRDQESVPESVNFYWGAKLWGLATGNTPLTKLASLQLAVTKRTTYEYFWMLDGNKNRPENIVRNKVIGIYFEQKTDYTTYFGRFLEYIHGIQQLPMTPELMEYIRTPEFVSQEWDEKLGAIAPTVQSPWAGVLYLNYAIINPAEAYPALRKVQMDDGQTRSYSLYLTATRPHFFRRSLLAALARHGSTRRPSLPSSGDDDKHEDGFLLRFRRLNPFNLKHRIY"} {"text":"User: What polypeptide fits the biological description best?\\nAssembles to form an icosahedral capsid\nAssistant: A polypeptide that fits the sentences is:\\nMTIEKNLSDVQQKYADQFQEDVVKSFQTGYGITPDTQIDAGALRREILDDQITMLTWTNEDLIFYRDISRRPAQSTVVKYDQYLRHGNVGHSRFVKEIGVAPVSDPNIRQKTVSMKYVSDTKNMSIASGLVNNIADPSQILTEDAIAVVAKTIEWASFYGDASLTSEVEGEGLEFDGLAKLIDKNNVINAKGNQLTEKHLNEAAVRIGKGFGTATDAYMPIGVHADFVNSILGRQMQLMQDNSGNVNTGYSVNGFYSSRGFIKLHGSTVMENELILDESLQPLPNAPQPAKVTATVETKQKGAFENEEDRAGLSYKVVVNSDDAQSAPSEEVTATVSNVDDGVKLSISVNAMYQQQPQFVSIYRQGKETGMYFLIKRVPVKDAQEDGTIVFVDKNETLPETADVFVGEMSPQVVHLFELLPMMKLPLAQINASITFAVLWYGALALRAPKKWARIKNVRYIAV"}", "/scratch/micpie/export/uniprot_sentences/valid_0-3.jsonl": "{"text":"Task: Create a protein based on the description.\nDescription: NAD(+) hydrolase (NADase) that catalyzes cleavage of NAD(+) into ADP-D-ribose (ADPR) and nicotinamide. In addition to ADPR, also generates a cyclization variant of cyclic ADPR (cADPR), termed 2'cADPR (v-cADPR). 
Cleaves NADP(+), but does not cyclize the product \nResult: MSLEQKKGADIISKILQIQNSIGKTTSPSTLKTKLSEISRKEQENARIQSKLSDLQKKKIDIDNKLLKEKQNLIKEEILERKKLEVLTKKQQKDEIEHQKKLKREIDAIKASTQYITDVSISSYNNTIPETEPEYDLFISHASEDKEDFVRPLAETLQQLGVNVWYDEFTLKVGDSLRQKIDSGLRNSKYGTVVLSTDFIKKDWTNYELDGLVAREMNGHKMILPIWHKITKNDVLDYSPNLADKVALNTSVNSIEEIAHQLADVILNR"} {"text":"Task: Come up with a protein based on the description.\nDescription: Might be related to the phospholipid scramblase and tubby-like superfamily of membrane tethered transcription factors\nOutput: MKAVVSKLYCSSMEEVMVVRRRPHVVNGGGFVVTDYKEKIVFKIDGCGVLGTKGELVLRDSDGNDLLLIHKKGGVVQALSIHNKWKGYSYDYQGSPKPVFTLRDPKHSCFSITSSIRISVGPGNCYFDVKGYFPDRDCSIVDSKGNVIAQVKEWIGSRDIYKVVTKPSVDKAFVFGVIAVLDYIYGESTSC"}", "/scratch/micpie/export/nomad_structure/train_0-17.jsonl": "{"text":"User: I have a compound with the following CIF file [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n. Can you tell me the density?\nAssistant: The density of the solid is 5977.738 kg\/m^3. 
Anything else?\nUser: Yes, I also want to know the spacegroup of the symmetrized version of this solid.\nAssistant: The spacegroup of the solid is I4\/mmm."} {"text":"User: I have a structure with the following CIF file [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n. Can you tell me the density?\nAssistant: The density of the material is 11995.819 kg\/m^3. 
Do you need anything else?\nUser: Yes, I also want to know the spacegroup of the symmetrized version of this material.\nAssistant: The spacegroup of the material is I-4m2."}", "/scratch/micpie/export/nomad_structure/train_0-16.jsonl": "{"text":"User: I want to design a compound with a density of 5977.738 kg\/m^3, and a chemical formula of Se2SnZn2.\nAssistant: Great, I suggest the compound with the CIF [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n. 
Is there anything else I can do for you?\nUser: Yeah, I also want to know the spacegroup of the symmetrized version of this compound.\nAssistant: The spacegroup of the compound is I4\/mmm."} {"text":"User: I want to design a solid with a density of 11995.819 kg\/m^3, and a reduced formula of AlPd2Pt.\nAssistant: Cool, I suggest the solid with the CIF file [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n. 
Is there anything else I can do for you?\nUser: Yep, I also want to know the spacegroup of the symmetrized version of this solid.\nAssistant: The spacegroup of the solid is I-4m2."}", "/scratch/micpie/export/nomad_structure/test_0-10.jsonl": "{"text":"Question: What is the density of the solid with the CIF file [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n?\nAnswer: 5521.174 kg\/m^3."} {"text":"Question: What is the density of the compound with the CIF card [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n?\nAnswer: 7785.437 kg\/m^3."}", 
"/scratch/micpie/export/nomad_structure/valid_0-8.jsonl": "{"text":"Question: What is the density of the compound with the CIF file [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n?\nAnswer: The density of the compound with the CIF file [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n is 7692.713 kg\/m^3."} {"text":"Question: What's the density of the compound with the CIF file [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 
90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n?\nAnswer: The density of the compound with the CIF file [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n is 7130.633 kg\/m^3."}", "/scratch/micpie/export/nomad_structure/test_0-22.jsonl": "{"text":"Question: What's the complete CIF card of the compound with the masked CIF card [CIF]\ndata_NaCuSn2\n[MASK]]\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n[MASK]]\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n[MASK]]\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\n[MASK]]\n _atom_site_type_symbol\n[MASK]]\n[MASK]]\n _atom_site_fract_x\n _atom_site_fract_y\n[MASK]]\n 
_atom_site_occupancy\n[MASK]]\n[MASK]]\n[MASK]]\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n and density 5521.174 kg\/m^3?\nAnswer: [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n"} {"text":"Question: What's the complete CIF of the material with the masked CIF [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n[MASK]]\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n[MASK]]\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\n[MASK]]\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n[MASK]]\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n[MASK]]\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n and density 7785.437 kg\/m^3?\nAnswer: [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 
79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n"}", "/scratch/micpie/export/nomad_structure/test_0-16.jsonl": "{"text":"User: I want to design a solid with a density of 5521.174 kg\/m^3, and a composition of CuNaSn2.\nAssistant: Cool, I suggest the solid with the CIF [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n. 
Do you need anything else?\nUser: Yeah, I also want to know the spacegroup of the symmetrized version of this solid.\nAssistant: The spacegroup of the solid is C2\/m."} {"text":"User: I want to design a material with a density of 7785.437 kg\/m^3, and a reduced formula of BePdTe2.\nAssistant: That sounds interesting, I suggest the material with the CIF card [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n. 
Anything else?\nUser: Yep, I also want to know the spacegroup of the symmetrized version of this material.\nAssistant: The spacegroup of the material is P4\/mmm."}", "/scratch/micpie/export/nomad_structure/test_0-15.jsonl": "{"text":"User: The CIF file of the solid is [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n. Tell me the density.\nAssistant: Sure, density of the solid is 5521.174 kg\/m^3"} {"text":"User: The CIF file of the compound is [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n. 
Give me the density.\nAssistant: Certainly, density of the compound is 7785.437 kg\/m^3"}", "/scratch/micpie/export/nomad_structure/train_0-8.jsonl": "{"text":"Question: What is the density of the compound with the CIF file [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n?\nAnswer: The density of the compound with the CIF file [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n is 5977.738 kg\/m^3."} {"text":"Question: What's the density of the compound with the CIF file 
[CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n?\nAnswer: The density of the compound with the CIF file [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n is 11995.819 kg\/m^3."}", "/scratch/micpie/export/nomad_structure/test_0-5.jsonl": "{"text":"The CIF file of the material with reduced formula CuNaSn2, spacegroup number 12 and density 5521.174 kg\/m^3 is [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 
100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n."} {"text":"The CIF of the material with reduced formula BePdTe2, number of the spacegroup in the International Tables for Crystallography 123 and density 7785.437 kg\/m^3 is [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n."}", "/scratch/micpie/export/nomad_structure/valid_0-9.jsonl": "{"text":"User: I want to design a material with a particular density, spacegroup, and composition.\nAssistant: Awesome, I would need to know the density, spacegroup, and composition of the material you want to design.\nUser: The density should be 7692.713 kg\/m^3, the spacegroup should be P4mm, and the composition should be LiSn2Tc.\nAssistant: I advise the compound with the CIF card 
[CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n."} {"text":"User: I want to design a material with a particular density, spacegroup, and chemical formula.\nAssistant: Great, I would need to know the density, spacegroup, and chemical formula of the material you want to design.\nUser: The density should be 7130.633 kg\/m^3, the spacegroup should be P4mm, and the chemical formula should be In2SrTl.\nAssistant: I propose the compound with the CIF file [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n."}", "/scratch/micpie/export/nomad_structure/test_0-19.jsonl": 
"{"text":"User: I want you to suggest the CIF card of a solid with a density of 5521.174 kg\/m^3, and reduced formula CuNaSn2. Also the spacegroup of the solid should be C2\/m\nAssistant: Sure, the CIF card is [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n."} {"text":"User: I want you to tell the CIF of a material with a density of 7785.437 kg\/m^3, and composition BePdTe2. 
Also the spacegroup of the material should be P4\/mmm\nAssistant: Sure, the CIF is [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n."}", "/scratch/micpie/export/nomad_structure/test_0-1.jsonl": "{"text":"The crystal system of the compound with the CIF [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n is monoclinic."} {"text":"The crystal system of the compound with the CIF file [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 
90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n is tetragonal."}", "/scratch/micpie/export/nomad_structure/test_0-18.jsonl": "{"text":"User: For a solid with spacegroup number C2\/m, can you estimate the density in kg\/m^3, and the pointgroup?\nAssistant: Certainly, the density is 5521.174 kg\/m^3, and the pointgroup is 2\/m."} {"text":"User: For a material with spacegroup number P4\/mmm, can you estimate the density in kg\/m^3, and the pointgroup?\nAssistant: Certainly, the density is 7785.437 kg\/m^3, and the pointgroup is 4\/mmm."}", "/scratch/micpie/export/nomad_structure/valid_0-0.jsonl": "{"text":"The spacegroup of the symmetrized version of the compound with the CIF [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n is P4mm."} {"text":"The 
spacegroup of the symmetrized version of the solid with the CIF [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n is P4mm."}", "/scratch/micpie/export/nomad_structure/test_0-21.jsonl": "{"text":"Task: Fill the rows masked with `[MASK]` in this CIF card to fulfill the given constraints. Return the CIF card with the masked rows filled.\nMasked CIF card: [CIF]\ndata_NaCuSn2\n[MASK]]\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n[MASK]]\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n[MASK]]\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\n[MASK]]\n _atom_site_type_symbol\n[MASK]]\n[MASK]]\n _atom_site_fract_x\n _atom_site_fract_y\n[MASK]]\n _atom_site_occupancy\n[MASK]]\n[MASK]]\n[MASK]]\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n\nConstraint: The density should be 5521.174 kg\/m^3, the composition should be CuNaSn2, and the spacegroup should be C2\/m.\nAnswer: [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 
1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n"} {"text":"Task: Fill the rows masked with `[MASK]` in this CIF card to fulfill the given constraints. Return the CIF card with the masked rows filled.\nMasked CIF card: [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n[MASK]]\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n[MASK]]\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\n[MASK]]\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n[MASK]]\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n[MASK]]\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n\nConstraint: The density should be 7785.437 kg\/m^3, the chemical formula should be BePdTe2, and the spacegroup should be P4\/mmm.\nAnswer: [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n 
_atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n"}", "/scratch/micpie/export/nomad_structure/test_0-2.jsonl": "{"text":"The density of the material with the CIF card [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n is 5521.174 kg\/m^3."} {"text":"The density of the material with the CIF file [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n is 7785.437 kg\/m^3."}", 
"/scratch/micpie/export/nomad_structure/train_0-22.jsonl": "{"text":"Question: What's the complete CIF card of the material with the masked CIF card [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n[MASK]]\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n[MASK]]\n[MASK]]\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n[MASK]]\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n[MASK]]\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n and density 5977.738 kg\/m^3?\nAnswer: [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n"} {"text":"Question: What's the complete CIF card of the compound with the masked CIF card [CIF]\n[MASK]]\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n[MASK]]\n_cell_length_c 4.66238054\n[MASK]]\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 
1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n[MASK]]\n[MASK]]\n[MASK]]\n[MASK]]\n[MASK]]\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[MASK]]\n and density 11995.819 kg\/m^3?\nAnswer: [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n"}", "/scratch/micpie/export/nomad_structure/valid_0-10.jsonl": "{"text":"Question: What is the density of the solid with the CIF [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 
5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n?\nAnswer: 7692.713 kg\/m^3."} {"text":"Question: What is the density of the compound with the CIF card [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n?\nAnswer: 7130.633 kg\/m^3."}", "/scratch/micpie/export/nomad_structure/train_0-6.jsonl": "{"text":"Question: What is the structure of compound with composition Se2SnZn2 and number of the spacegroup in the International Tables for Crystallography 139?\nConstraint: Return a CIF card.\nAnswer: [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 
0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n"} {"text":"Question: What's the structure of material with reduced formula AlPd2Pt and number of the spacegroup in the International Tables for Crystallography 119?\nConstraint: Return a CIF card.\nAnswer: [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n"}", "/scratch/micpie/export/nomad_structure/valid_0-6.jsonl": "{"text":"Question: What is the structure of solid with reduced formula LiSn2Tc and number of the spacegroup in the International Tables for Crystallography 99?\nConstraint: Return a CIF card.\nAnswer: [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 
0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n"} {"text":"Question: What's the structure of compound with composition In2SrTl and number of the spacegroup in the International Tables for Crystallography 99?\nConstraint: Return a CIF file.\nAnswer: [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n"}", "/scratch/micpie/export/nomad_structure/train_0-21.jsonl": "{"text":"Task: Fill the rows masked with `[MASK]` in this CIF file to fulfill the given constraints. 
Return the CIF file with the masked rows filled.\nMasked CIF file: [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n[MASK]]\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n[MASK]]\n[MASK]]\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n[MASK]]\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n[MASK]]\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n\nConstraint: The density should be 5977.738 kg\/m^3, the composition should be Se2SnZn2, and the spacegroup should be I4\/mmm.\nAnswer: [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n"} {"text":"Task: Fill the rows masked with `[MASK]` in this CIF file to fulfill the given constraints. 
Return the CIF file with the masked rows filled.\nMasked CIF file: [CIF]\n[MASK]]\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n[MASK]]\n_cell_length_c 4.66238054\n[MASK]]\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n[MASK]]\n[MASK]]\n[MASK]]\n[MASK]]\n[MASK]]\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[MASK]]\n\nConstraint: The density should be 11995.819 kg\/m^3, the chemical formula should be AlPd2Pt, and the spacegroup should be I-4m2.\nAnswer: [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n"}", "/scratch/micpie/export/nomad_structure/train_0-19.jsonl": "{"text":"User: I want you to tell the CIF of a solid with a density of 5977.738 kg\/m^3, and composition Se2SnZn2. 
Also the spacegroup of the solid should be I4\/mmm\nAssistant: Sure, the CIF is [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n."} {"text":"User: I want you to write the CIF file of a compound with a density of 11995.819 kg\/m^3, and composition AlPd2Pt. 
Also the spacegroup of the compound should be I-4m2\nAssistant: Of course, the CIF file is [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n."}", "/scratch/micpie/export/nomad_structure/test_0-9.jsonl": "{"text":"User: I want to design a material with a particular density, spacegroup, and chemical formula.\nAssistant: Great, I would need to know the density, spacegroup, and chemical formula of the material you want to design.\nUser: The density should be 5521.174 kg\/m^3, the spacegroup should be C2\/m, and the chemical formula should be CuNaSn2.\nAssistant: I propose the solid with the CIF card [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 
0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n."} {"text":"User: I want to design a material with a particular density, spacegroup, and chemical formula.\nAssistant: That sounds interesting, I would need to know the density, spacegroup, and chemical formula of the material you want to design.\nUser: The density should be 7785.437 kg\/m^3, the spacegroup should be P4\/mmm, and the chemical formula should be BePdTe2.\nAssistant: I propose the material with the CIF [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n."}", "/scratch/micpie/export/nomad_structure/test_0-0.jsonl": "{"text":"The spacegroup of the symmetrized version of the solid with the CIF card [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 
0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n is C2\/m."} {"text":"The spacegroup of the symmetrized version of the compound with the CIF card [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n is P4\/mmm."}", "/scratch/micpie/export/nomad_structure/valid_0-16.jsonl": "{"text":"User: I want to design a compound with a density of 7692.713 kg\/m^3, and a reduced formula of LiSn2Tc.\nAssistant: I suggest the compound with the CIF [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 
1.82034239 2.72049964 1\n[\/CIF]\n. Do you need anything else?\nUser: Yeah, I also want to know the spacegroup of the symmetrized version of this compound.\nAssistant: The spacegroup of the compound is P4mm."} {"text":"User: I want to design a solid with a density of 7130.633 kg\/m^3, and a chemical formula of In2SrTl.\nAssistant: Great, I suggest the solid with the CIF [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n. 
Anything else?\nUser: Indeed, I also want to know the spacegroup of the symmetrized version of this solid.\nAssistant: The spacegroup of the solid is P4mm."}", "/scratch/micpie/export/nomad_structure/valid_0-7.jsonl": "{"text":"User: In the solid with the CIF card [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n, what is the pointgroup?\nAssistant: The pointgroup of the symmetrized version of the solid is 4mm."} {"text":"User: In the solid with the CIF file [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n, what is the pointgroup?\nAssistant: The pointgroup of the 
symmetrized version of the solid is 4mm."}", "/scratch/micpie/export/nomad_structure/test_0-3.jsonl": "{"text":"The reduced formula of the compound with the CIF [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n is CuNaSn2."} {"text":"The reduced formula of the material with the CIF [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n is BePdTe2."}", "/scratch/micpie/export/nomad_structure/valid_0-11.jsonl": "{"text":"Question: What is the spacegroup of the symmetrized version of the solid with the CIF [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 
3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n?\nAnswer: P4mm."} {"text":"Question: What is the spacegroup of the symmetrized version of the compound with the CIF card [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n?\nAnswer: P4mm."}", "/scratch/micpie/export/nomad_structure/train_0-20.jsonl": "{"text":"Task: Fill the rows masked with `[MASK]` in this CIF file to fulfill the given constraints. 
Return the CIF file with the masked rows filled.\nMasked CIF file: [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n[MASK]]\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n[MASK]]\n[MASK]]\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n[MASK]]\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n[MASK]]\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n\nConstraint: The density should be 5977.738 kg\/m^3, and the chemical formula should be Se2SnZn2.\nAnswer: [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n"} {"text":"Task: Fill the rows masked with `[MASK]` in this CIF file to fulfill the given constraints. 
Return the CIF file with the masked rows filled.\nMasked CIF file: [CIF]\n[MASK]]\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n[MASK]]\n_cell_length_c 4.66238054\n[MASK]]\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n[MASK]]\n[MASK]]\n[MASK]]\n[MASK]]\n[MASK]]\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[MASK]]\n\nConstraint: The density should be 11995.819 kg\/m^3, and the composition should be AlPd2Pt.\nAnswer: [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n"}", "/scratch/micpie/export/nomad_structure/valid_0-20.jsonl": "{"text":"Task: Fill the rows masked with `[MASK]` in this CIF file to fulfill the given constraints. 
Return the CIF file with the masked rows filled.\nMasked CIF file: [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n[MASK]]\n[MASK]]\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n[MASK]]\n _atom_site_label\n[MASK]]\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n[MASK]]\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n[MASK]]\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n\nConstraint: The density should be 7692.713 kg\/m^3, and the composition should be LiSn2Tc.\nAnswer: [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n"} {"text":"Task: Fill the rows masked with `[MASK]` in this CIF to fulfill the given constraints. 
Return the CIF with the masked rows filled.\nMasked CIF: [CIF]\ndata_SrTlIn2\n[MASK]]\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n[MASK]]\n_cell_formula_units_Z 1\n[MASK]]\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n[MASK]]\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n[MASK]]\n[MASK]]\n[MASK]]\n[MASK]]\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n\nConstraint: The density should be 7130.633 kg\/m^3, and the composition should be In2SrTl.\nAnswer: [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n"}", "/scratch/micpie/export/nomad_structure/train_0-0.jsonl": "{"text":"The spacegroup of the symmetrized version of the solid with the CIF card [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 
1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n is I4\/mmm."} {"text":"The spacegroup of the symmetrized version of the solid with the CIF file [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n is I-4m2."}", "/scratch/micpie/export/nomad_structure/test_0-6.jsonl": "{"text":"Question: What's the structure of material with chemical formula CuNaSn2 and spacegroup number 12?\nConstraint: Return a CIF.\nAnswer: [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n 
_symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n"} {"text":"Question: What's the structure of solid with composition BePdTe2 and spacegroup number 123?\nConstraint: Return a CIF file.\nAnswer: [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n"}", "/scratch/micpie/export/nomad_structure/train_0-10.jsonl": "{"text":"Question: What is the density of the solid with the CIF [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n 
_atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n?\nAnswer: 5977.738 kg\/m^3."} {"text":"Question: What is the density of the material with the CIF card [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n?\nAnswer: 11995.819 kg\/m^3."}", "/scratch/micpie/export/nomad_structure/train_0-3.jsonl": "{"text":"The chemical formula of the solid with the CIF card [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn 
Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n is Se2SnZn2."} {"text":"The reduced formula of the solid with the CIF file [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n is AlPd2Pt."}", "/scratch/micpie/export/nomad_structure/train_0-23.jsonl": "{"text":"Question: What is the complete CIF file of the compound with the masked CIF file [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n[MASK]]\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n[MASK]]\n[MASK]]\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n[MASK]]\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n[MASK]]\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n, density 5977.738 kg\/m^3, and chemical formula Se2SnZn2?\nAnswer: [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 
7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n"} {"text":"Question: What is the complete CIF card of the material with the masked CIF card [CIF]\n[MASK]]\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n[MASK]]\n_cell_length_c 4.66238054\n[MASK]]\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n[MASK]]\n[MASK]]\n[MASK]]\n[MASK]]\n[MASK]]\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[MASK]]\n, density 11995.819 kg\/m^3, and chemical formula AlPd2Pt?\nAnswer: [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n 
_atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n"}", "/scratch/micpie/export/nomad_structure/train_0-12.jsonl": "{"text":"Question: What is the chemical formula of the solid with the CIF [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n?\nAnswer: Se2SnZn2."} {"text":"Question: What is the composition of the compound with the CIF file [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n 
Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n?\nAnswer: AlPd2Pt."}", "/scratch/micpie/export/nomad_structure/test_0-13.jsonl": "{"text":"Question: What is the spacegroup number of the symmetrized version of the solid with the CIF card [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n?\nAnswer: 12."} {"text":"Question: What is the number of the spacegroup in the International Tables for Crystallography of the symmetrized version of the material with the CIF file [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n?\nAnswer: 123."}", 
"/scratch/micpie/export/nomad_structure/test_0-23.jsonl": "{"text":"Question: What is the complete CIF of the compound with the masked CIF [CIF]\ndata_NaCuSn2\n[MASK]]\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n[MASK]]\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n[MASK]]\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\n[MASK]]\n _atom_site_type_symbol\n[MASK]]\n[MASK]]\n _atom_site_fract_x\n _atom_site_fract_y\n[MASK]]\n _atom_site_occupancy\n[MASK]]\n[MASK]]\n[MASK]]\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n, density 5521.174 kg\/m^3, and composition CuNaSn2?\nAnswer: [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n"} {"text":"Question: What is the complete CIF file of the material with the masked CIF file [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n[MASK]]\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n[MASK]]\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\n[MASK]]\n _atom_site_type_symbol\n _atom_site_label\n 
_atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n[MASK]]\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n[MASK]]\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n, density 7785.437 kg\/m^3, and chemical formula BePdTe2?\nAnswer: [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n"}", "/scratch/micpie/export/nomad_structure/valid_0-2.jsonl": "{"text":"The density of the compound with the CIF [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n is 7692.713 kg\/m^3."} {"text":"The 
density of the compound with the CIF [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n is 7130.633 kg\/m^3."}", "/scratch/micpie/export/nomad_structure/valid_0-21.jsonl": "{"text":"Task: Fill the rows masked with `[MASK]` in this CIF file to fulfill the given constraints. Return the CIF file with the masked rows filled.\nMasked CIF file: [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n[MASK]]\n[MASK]]\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n[MASK]]\n _atom_site_label\n[MASK]]\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n[MASK]]\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n[MASK]]\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n\nConstraint: The density should be 7692.713 kg\/m^3, the composition should be LiSn2Tc, and the spacegroup should be P4mm.\nAnswer: [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 
90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n"} {"text":"Task: Fill the rows masked with `[MASK]` in this CIF file to fulfill the given constraints. Return the CIF file with the masked rows filled.\nMasked CIF file: [CIF]\ndata_SrTlIn2\n[MASK]]\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n[MASK]]\n_cell_formula_units_Z 1\n[MASK]]\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n[MASK]]\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n[MASK]]\n[MASK]]\n[MASK]]\n[MASK]]\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n\nConstraint: The density should be 7130.633 kg\/m^3, the reduced formula should be In2SrTl, and the spacegroup should be P4mm.\nAnswer: [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n 
_atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n"}", "/scratch/micpie/export/nomad_structure/train_0-14.jsonl": "{"text":"User: The CIF of the compound is [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n. 
Show me the composition.\nAssistant: Sure, the composition is Se2SnZn2"} {"text":"User: The CIF of the material is [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n. Tell me the chemical formula.\nAssistant: Of course, the composition is AlPd2Pt"}", "/scratch/micpie/export/nomad_structure/valid_0-1.jsonl": "{"text":"The crystal system of the material with the CIF card [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n is tetragonal."} {"text":"The crystal system of the compound with the CIF card [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 
3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n is tetragonal."}", "/scratch/micpie/export/nomad_structure/valid_0-13.jsonl": "{"text":"Question: What is the number of the spacegroup in the International Tables for Crystallography of the symmetrized version of the compound with the CIF [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n?\nAnswer: 99."} {"text":"Question: What is the number of the spacegroup in the International Tables for Crystallography of the symmetrized version of the compound with the CIF file [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 
7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n?\nAnswer: 99."}", "/scratch/micpie/export/nomad_structure/valid_0-23.jsonl": "{"text":"Question: What's the complete CIF card of the solid with the masked CIF card [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n[MASK]]\n[MASK]]\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n[MASK]]\n _atom_site_label\n[MASK]]\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n[MASK]]\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n[MASK]]\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n, density 7692.713 kg\/m^3, and chemical formula LiSn2Tc?\nAnswer: [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n 
_atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n"} {"text":"Question: What's the complete CIF card of the material with the masked CIF card [CIF]\ndata_SrTlIn2\n[MASK]]\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n[MASK]]\n_cell_formula_units_Z 1\n[MASK]]\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n[MASK]]\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n[MASK]]\n[MASK]]\n[MASK]]\n[MASK]]\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n, density 7130.633 kg\/m^3, and reduced formula In2SrTl?\nAnswer: [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n"}", "/scratch/micpie/export/nomad_structure/valid_0-5.jsonl": "{"text":"The CIF of the material with reduced formula LiSn2Tc, spacegroup number 99 and density 
7692.713 kg\/m^3 is [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n."} {"text":"The CIF card of the material with reduced formula In2SrTl, number of the spacegroup in the International Tables for Crystallography 99 and density 7130.633 kg\/m^3 is [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n."}", "/scratch/micpie/export/nomad_structure/train_0-15.jsonl": "{"text":"User: The CIF of the compound is [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 
148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n. Show me the density.\nAssistant: Sure, density of the compound is 5977.738 kg\/m^3"} {"text":"User: The CIF file of the compound is [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n. 
Give me the density.\nAssistant: Certainly, density of the compound is 11995.819 kg\/m^3"}", "/scratch/micpie/export/nomad_structure/valid_0-4.jsonl": "{"text":"The number of the spacegroup in the International Tables for Crystallography of the symmetrized version of the solid with the CIF card [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n is 99."} {"text":"The number of the spacegroup in the International Tables for Crystallography of the symmetrized version of the material with the CIF card [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n is 99."}", 
"/scratch/micpie/export/nomad_structure/train_0-5.jsonl": "{"text":"The CIF card of the material with reduced formula Se2SnZn2, number of the spacegroup in the International Tables for Crystallography 139 and density 5977.738 kg\/m^3 is [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n."} {"text":"The CIF of the material with chemical formula AlPd2Pt, number of the spacegroup in the International Tables for Crystallography 119 and density 11995.819 kg\/m^3 is [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n."}", 
"/scratch/micpie/export/nomad_structure/valid_0-15.jsonl": "{"text":"User: The CIF of the material is [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n. Tell me the density.\nAssistant: Of course, density of the material is 7692.713 kg\/m^3"} {"text":"User: The CIF file of the material is [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n. 
me the density.\nAssistant: Sure, density of the material is 7130.633 kg\/m^3"}", "/scratch/micpie/export/nomad_structure/valid_0-12.jsonl": "{"text":"Question: What is the reduced formula of the material with the CIF card [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n?\nAnswer: LiSn2Tc."} {"text":"Question: What is the composition of the material with the CIF [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n?\nAnswer: In2SrTl."}", "/scratch/micpie/export/nomad_structure/valid_0-18.jsonl": "{"text":"User: For a structure with number of the spacegroup in the 
International Tables for Crystallography P4mm, can you estimate the density in kg\/m^3, and the pointgroup?\nAssistant: Certainly, the density is 7692.713 kg\/m^3, and the pointgroup is 4mm."} {"text":"User: For a solid with spacegroup number P4mm, can you estimate the density in kg\/m^3, and the pointgroup?\nAssistant: Certainly, the density is 7130.633 kg\/m^3, and the pointgroup is 4mm."}", "/scratch/micpie/export/nomad_structure/train_0-2.jsonl": "{"text":"The density of the solid with the CIF card [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n is 5977.738 kg\/m^3."} {"text":"The density of the solid with the CIF card [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 
0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n is 11995.819 kg\/m^3."}", "/scratch/micpie/export/nomad_structure/test_0-11.jsonl": "{"text":"Question: What is the spacegroup of the symmetrized version of the material with the CIF file [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n?\nAnswer: C2\/m."} {"text":"Question: What is the spacegroup of the symmetrized version of the compound with the CIF file [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 
1\n[\/CIF]\n?\nAnswer: P4\/mmm."}", "/scratch/micpie/export/nomad_structure/train_0-7.jsonl": "{"text":"User: In the material with the CIF card [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n, what is the pointgroup?\nAssistant: The pointgroup of the symmetrized version of the material is 4\/mmm."} {"text":"User: In the material with the CIF file [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n, what is the pointgroup?\nAssistant: The pointgroup of the symmetrized version of the material is -42m."}", 
"/scratch/micpie/export/nomad_structure/test_0-17.jsonl": "{"text":"User: I have a material with the following CIF [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n. What is the density?\nAssistant: The density of the material is 5521.174 kg\/m^3. Is there anything else I can do for you?\nUser: Yeah, I also want to know the spacegroup of the symmetrized version of this material.\nAssistant: The spacegroup of the material is C2\/m."} {"text":"User: I have a solid with the following CIF file [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n. 
Can you tell me the density?\nAssistant: The density of the material is 7785.437 kg\/m^3. Do you need anything else?\nUser: Yep, I also want to know the spacegroup of the symmetrized version of this material.\nAssistant: The spacegroup of the material is P4\/mmm."}", "/scratch/micpie/export/nomad_structure/valid_0-19.jsonl": "{"text":"User: I want you to tell the CIF of a compound with a density of 7692.713 kg\/m^3, and chemical formula LiSn2Tc. Also the spacegroup of the compound should be P4mm\nAssistant: Of course, the CIF is [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n."} {"text":"User: I want you to write the CIF of a solid with a density of 7130.633 kg\/m^3, and composition In2SrTl. 
Also the spacegroup of the solid should be P4mm\nAssistant: Of course, the CIF is [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n."}", "/scratch/micpie/export/nomad_structure/train_0-11.jsonl": "{"text":"Question: What is the spacegroup of the symmetrized version of the solid with the CIF card [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n?\nAnswer: I4\/mmm."} {"text":"Question: What is the spacegroup of the symmetrized version of the compound with the CIF card [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 
1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n?\nAnswer: I-4m2."}", "/scratch/micpie/export/nomad_structure/train_0-1.jsonl": "{"text":"The crystal system of the material with the CIF file [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n is tetragonal."} {"text":"The crystal system of the material with the CIF [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 
1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n is tetragonal."}", "/scratch/micpie/export/nomad_structure/train_0-13.jsonl": "{"text":"Question: What is the spacegroup number of the symmetrized version of the material with the CIF file [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n?\nAnswer: 139."} {"text":"Question: What is the spacegroup number of the symmetrized version of the compound with the CIF card [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n 
_symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n?\nAnswer: 119."}", "/scratch/micpie/export/nomad_structure/train_0-4.jsonl": "{"text":"The spacegroup number of the symmetrized version of the material with the CIF [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n is 139."} {"text":"The number of the spacegroup in the International Tables for Crystallography of the symmetrized version of the solid with the CIF card [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n 
_atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n is 119."}", "/scratch/micpie/export/nomad_structure/test_0-7.jsonl": "{"text":"User: In the solid with the CIF file [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n, what is the pointgroup?\nAssistant: The pointgroup of the symmetrized version of the solid is 2\/m."} {"text":"User: In the solid with the CIF [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n 
Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n, what is the pointgroup?\nAssistant: The pointgroup of the symmetrized version of the solid is 4\/mmm."}", "/scratch/micpie/export/nomad_structure/train_0-9.jsonl": "{"text":"User: I want to design a material with a particular density, spacegroup, and composition.\nAssistant: Cool, I would need to know the density, spacegroup, and composition of the material you want to design.\nUser: The density should be 5977.738 kg\/m^3, the spacegroup should be I4\/mmm, and the composition should be Se2SnZn2.\nAssistant: I advise the solid with the CIF card [CIF]\ndata_Zn2SnSe2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 7.50070541\n_cell_length_b 7.50070541\n_cell_length_c 7.50070541\n_cell_angle_alpha 148.75831629\n_cell_angle_beta 148.75831629\n_cell_angle_gamma 44.76717406\n_symmetry_Int_Tables_number 1\n_cell_volume 113.16772285\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Se Se0 1 0.00000000 0.00000000 8.77078160 1\n Se Se1 1 0.00000000 -0.00000000 5.10035028 1\n Sn Sn2 1 0.00000000 0.00000000 0.00000000 1\n Zn Zn3 1 0.00000000 2.01971616 3.46778297 1\n Zn Zn4 1 2.01971616 -0.00000000 3.46778297 1\n[\/CIF]\n."} {"text":"User: I want to design a material with a particular density, spacegroup, and chemical formula.\nAssistant: Cool, I would need to know the density, spacegroup, and chemical formula of the material you want to design.\nUser: The density should be 11995.819 kg\/m^3, the spacegroup should be I-4m2, and the chemical formula should be AlPd2Pt.\nAssistant: I advise the material with the CIF card [CIF]\ndata_AlPd2Pt\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.66238054\n_cell_length_b 4.66238054\n_cell_length_c 
4.66238054\n_cell_angle_alpha 128.61056753\n_cell_angle_beta 128.61056753\n_cell_angle_gamma 75.63799924\n_symmetry_Int_Tables_number 1\n_cell_volume 60.20244592\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 2.02149624 -0.00000000 1.84152778 1\n Pd Pd2 1 0.00000000 -0.00000000 3.68305555 1\n Pt Pt3 1 -0.00000000 2.02149624 1.84152778 1\n[\/CIF]\n."}", "/scratch/micpie/export/nomad_structure/valid_0-22.jsonl": "{"text":"Question: What's the complete CIF file of the material with the masked CIF file [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n[MASK]]\n[MASK]]\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n[MASK]]\n _atom_site_label\n[MASK]]\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n[MASK]]\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n[MASK]]\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n and density 7692.713 kg\/m^3?\nAnswer: [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n 
_atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n"} {"text":"Question: What is the complete CIF file of the compound with the masked CIF file [CIF]\ndata_SrTlIn2\n[MASK]]\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n[MASK]]\n_cell_formula_units_Z 1\n[MASK]]\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n[MASK]]\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n[MASK]]\n[MASK]]\n[MASK]]\n[MASK]]\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n and density 7130.633 kg\/m^3?\nAnswer: [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n"}", "/scratch/micpie/export/nomad_structure/train_0-18.jsonl": "{"text":"User: For a material with spacegroup number I4\/mmm, can you estimate the density in kg\/m^3, and the pointgroup?\nAssistant: Certainly, the density is 
5977.738 kg\/m^3, and the pointgroup is 4\/mmm."} {"text":"User: For a compound with spacegroup number I-4m2, can you estimate the density in kg\/m^3, and the pointgroup?\nAssistant: Certainly, the density is 11995.819 kg\/m^3, and the pointgroup is -42m."}", "/scratch/micpie/export/nomad_structure/valid_0-3.jsonl": "{"text":"The reduced formula of the solid with the CIF file [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n is LiSn2Tc."} {"text":"The reduced formula of the material with the CIF file [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n is In2SrTl."}", 
"/scratch/micpie/export/nomad_structure/test_0-8.jsonl": "{"text":"Question: What's the density of the material with the CIF file [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n?\nAnswer: The density of the material with the CIF file [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n is 5521.174 kg\/m^3."} {"text":"Question: What is the density of the solid with the CIF card [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 
90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n?\nAnswer: The density of the solid with the CIF card [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n is 7785.437 kg\/m^3."}", "/scratch/micpie/export/nomad_structure/test_0-14.jsonl": "{"text":"User: The CIF card of the compound is [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n 
_atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n. Give me the composition.\nAssistant: Certainly, the chemical formula is CuNaSn2"} {"text":"User: The CIF of the solid is [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n. 
Give me the composition.\nAssistant: Of course, the reduced formula is BePdTe2"}", "/scratch/micpie/export/nomad_structure/valid_0-17.jsonl": "{"text":"User: I have a material with the following CIF card [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n. What is the density?\nAssistant: The density of the compound is 7692.713 kg\/m^3. 
Anything else?\nUser: Indeed, I also want to know the spacegroup of the symmetrized version of this compound.\nAssistant: The spacegroup of the compound is P4mm."} {"text":"User: I have a solid with the following CIF [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n. Can you tell me the density?\nAssistant: The density of the solid is 7130.633 kg\/m^3. 
Is there anything else I can do for you?\nUser: Indeed, I also want to know the spacegroup of the symmetrized version of this solid.\nAssistant: The spacegroup of the solid is P4mm."}", "/scratch/micpie/export/nomad_structure/valid_0-14.jsonl": "{"text":"User: The CIF card of the compound is [CIF]\ndata_LiTcSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.64068477\n_cell_length_b 3.64068477\n_cell_length_c 5.59030790\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 74.09721456\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Li Li0 1 0.00000000 0.00000000 4.20619155 1\n Sn Sn1 1 1.82034239 1.82034239 5.42220438 1\n Sn Sn2 1 0.00000000 0.00000000 1.62687419 1\n Tc Tc3 1 1.82034239 1.82034239 2.72049964 1\n[\/CIF]\n. Show me the chemical formula.\nAssistant: Of course, the composition is LiSn2Tc"} {"text":"User: The CIF file of the solid is [CIF]\ndata_SrTlIn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.99739023\n_cell_length_b 3.99739023\n_cell_length_c 7.60218115\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 121.47623062\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n In In0 1 0.00000000 0.00000000 0.05119780 1\n In In1 1 1.99869512 1.99869512 1.60732209 1\n Sr Sr2 1 0.00000000 0.00000000 3.79007600 1\n Tl Tl3 1 1.99869512 1.99869512 5.95467583 1\n[\/CIF]\n. 
me the composition.\nAssistant: Certainly, the chemical formula is In2SrTl"}", "/scratch/micpie/export/nomad_structure/test_0-4.jsonl": "{"text":"The number of the spacegroup in the International Tables for Crystallography of the symmetrized version of the material with the CIF file [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n is 12."} {"text":"The number of the spacegroup in the International Tables for Crystallography of the symmetrized version of the compound with the CIF file [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n is 123."}", 
"/scratch/micpie/export/nomad_structure/test_0-12.jsonl": "{"text":"Question: What is the reduced formula of the material with the CIF card [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n?\nAnswer: CuNaSn2."} {"text":"Question: What is the reduced formula of the compound with the CIF file [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n?\nAnswer: BePdTe2."}", "/scratch/micpie/export/nomad_structure/test_0-20.jsonl": "{"text":"Task: Fill the rows masked with `[MASK]` in this CIF file to fulfill the given constraints. 
Return the CIF file with the masked rows filled.\nMasked CIF file: [CIF]\ndata_NaCuSn2\n[MASK]]\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n[MASK]]\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n[MASK]]\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\n[MASK]]\n _atom_site_type_symbol\n[MASK]]\n[MASK]]\n _atom_site_fract_x\n _atom_site_fract_y\n[MASK]]\n _atom_site_occupancy\n[MASK]]\n[MASK]]\n[MASK]]\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n\nConstraint: The density should be 5521.174 kg\/m^3, and the reduced formula should be CuNaSn2.\nAnswer: [CIF]\ndata_NaCuSn2\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 4.45324203\n_cell_length_b 4.45324203\n_cell_length_c 7.05228224\n_cell_angle_alpha 100.78194165\n_cell_angle_beta 100.78194165\n_cell_angle_gamma 45.35183926\n_symmetry_Int_Tables_number 1\n_cell_volume 97.43239791\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Cu Cu0 1 3.39410418 0.00000000 3.45290960 1\n Na Na1 1 0.00000000 0.00000000 0.00000000 1\n Sn Sn2 1 0.95952377 0.00000000 4.69925783 1\n Sn Sn3 1 5.82868459 0.00000000 2.20656137 1\n[\/CIF]\n"} {"text":"Task: Fill the rows masked with `[MASK]` in this CIF to fulfill the given constraints. 
Return the CIF with the masked rows filled.\nMasked CIF: [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n[MASK]]\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n[MASK]]\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\n[MASK]]\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n[MASK]]\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n[MASK]]\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n\nConstraint: The density should be 7785.437 kg\/m^3, and the reduced formula should be BePdTe2.\nAnswer: [CIF]\ndata_BeTe2Pd\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 3.32841096\n_cell_length_b 3.32841096\n_cell_length_c 7.13567981\n_cell_angle_alpha 90.00000000\n_cell_angle_beta 90.00000000\n_cell_angle_gamma 90.00000000\n_symmetry_Int_Tables_number 1\n_cell_volume 79.05134092\n_cell_formula_units_Z 1\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Be Be0 1 0.00000000 0.00000000 0.00000000 1\n Pd Pd1 1 0.00000000 0.00000000 3.56783991 1\n Te Te2 1 1.66420548 1.66420548 5.46806623 1\n Te Te3 1 1.66420548 1.66420548 1.66761358 1\n[\/CIF]\n"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/test_0-10.jsonl": "{"text":"User: I'm looking for the SELFIES of a molecule that is inhibiting CYP1A2?\nAssistant: This is a molecule that is inhibiting CYP1A2: 
[O][=C][NH1][C][=C][C][=C][Branch2][Ring2][Branch1][C][=C][Ring1][=Branch1][C][=C][Ring1][#Branch2][C][N][Branch1][Branch1][C][C][C][O][C][C][=N][N][=N][N][Ring1][Branch1][C][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][O][C][O][Ring2][Ring1][=N]"} {"text":"User: I'm looking for the SELFIES of a molecule that is inhibiting CYP P450 1A2?\nAssistant: This is a molecule that is inhibiting CYP P450 1A2: [O][=C][Branch2][Ring1][=Branch2][N][C][=N][N][=C][Branch1][#C][S][C][C][=C][Branch1][C][Cl][C][=C][C][=C][Ring1][#Branch1][Cl][S][Ring1][#C][C][C][C][C][O][Ring1][Branch1]"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/valid_0-8.jsonl": "{"text":"User: Is the molecule with the DeepSMILES CCOC=O)CSC=CC#N))CC)C=CCCCC6=O)))))N6 inhibiting CYP1A2?\nAssistant: Yes, it is inhibiting CYP1A2."} {"text":"User: Is the molecule with the SELFIES [C][C][S][C][=N][N][=C][N][Branch1][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][N][Ring1][#C][Ring1][N] inhibiting CYP P450 1A2?\nAssistant: Yes, it is inhibiting CYP P450 1A2."}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/train_0-8.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C17H18N2O3S2\/c1-3-18-12-8-5-4-7-11(12)13(15(18)20)14-16(21)19(17(23)24-14)9-6-10-22-2\/h4-5,7-8H,3,6,9-10H2,1-2H3\/b14-13- inhibiting CYP P450 1A2?\nAssistant: Yes, it is inhibiting CYP P450 1A2."} {"text":"User: Is the molecule with the SELFIES [C][C][=C][C][=C][Branch2][Ring2][#C][S][=Branch1][C][=O][=Branch1][C][=O][O][C][=C][C][=C][Branch2][Ring1][O][C][C][=C][Branch1][Branch2][C][C][C][C][Ring1][=Branch1][=O][O][C][=C][Ring1][O][C][=Branch1][C][=O][C][C][C][Ring1][#Branch1][C][=C][Ring2][Ring1][=Branch1][C][=C][Ring2][Ring1][S] inhibiting CYP P450 1A2?\nAssistant: No, it is not inhibiting CYP P450 1A2."}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that 
is inhibiting CYP1A2.\nMolecule canonical SMILES: O=c1[nH]c2cc3c(cc2cc1CN(CCCO)Cc1nnnn1Cc1ccc(F)cc1)OCO3\nConstraint: Answer the question in a full sentence.\nResult: This molecule is inhibiting CYP1A2."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 1A2.\nSMILES: O=C(Nc1nnc(SCc2c(Cl)cccc2Cl)s1)C1CCCO1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is inhibiting CYP P450 1A2."}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/valid_0-9.jsonl": "{"text":"User: Can you create the DeepSMILES of a molecule that is inhibiting CYP1A2?\nAssistant: Yes, I'm happy to help, here you go: CCOC=O)CSC=CC#N))CC)C=CCCCC6=O)))))N6"} {"text":"User: Can you generate the canonical SMILES of a molecule that is inhibiting CYP1A2?\nAssistant: Sure, here you go: CCSc1nnc2n(N)c(=O)c3ccccc3n12"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/test_0-1.jsonl": "{"text":"Based on the canonical SMILES representation O=c1[nH]c2cc3c(cc2cc1CN(CCCO)Cc1nnnn1Cc1ccc(F)cc1)OCO3, the molecule exhibits inhibition of CYP P450 1A2."} {"text":"Based on the DeepSMILES O=CNcnncSCccCl)cccc6Cl)))))))))s5))))))CCCCO5, the molecule displays inhibition of CYP1A2."}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/valid_0-0.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][O][C][=Branch1][C][=O][C][S][C][=C][Branch1][Ring1][C][#N][C][Branch1][C][C][C][=C][Branch1][Branch2][C][C][C][C][Ring1][=Branch1][=O][N][Ring1][=C] displays inhibition of CYP P450 1A2."} {"text":"The molecule with the InChI InChI=1S\/C11H11N5OS\/c1-2-18-11-14-13-10-15(11)8-6-4-3-5-7(8)9(17)16(10)12\/h3-6H,2,12H2,1H3 exhibits inhibition of CYP P450 1A2."}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/test_0-2.jsonl": "{"text":"The canonical SMILES O=c1[nH]c2cc3c(cc2cc1CN(CCCO)Cc1nnnn1Cc1ccc(F)cc1)OCO3 is from a molecule that shows inhibition of CYP1A2."} 
{"text":"The SMILES O=C(Nc1nnc(SCc2c(Cl)cccc2Cl)s1)C1CCCO1 represents a molecule that exhibits inhibition of CYP P450 1A2."}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/valid_0-10.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that is inhibiting CYP1A2?\nAssistant: This is a molecule that is inhibiting CYP1A2: CCOC(=O)CSC1=C(C#N)C(C)C2=C(CCCC2=O)N1"} {"text":"User: I'm searching for the InChI of a molecule that is inhibiting CYP1A2?\nAssistant: This is a molecule that is inhibiting CYP1A2: InChI=1S\/C11H11N5OS\/c1-2-18-11-14-13-10-15(11)8-6-4-3-5-7(8)9(17)16(10)12\/h3-6H,2,12H2,1H3"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/train_0-6.jsonl": "{"text":"Task: Please give me a InChI based on the description below.\nDescription: A molecule that is inhibiting CYP P450 1A2.\nResult: InChI=1S\/C17H18N2O3S2\/c1-3-18-12-8-5-4-7-11(12)13(15(18)20)14-16(21)19(17(23)24-14)9-6-10-22-2\/h4-5,7-8H,3,6,9-10H2,1-2H3\/b14-13-"} {"text":"Task: Please generate a DeepSMILES based on the description.\nDescription: A molecule that is inhibiting CYP P450 1A2.\nResult: CccccS=O)=O)OccccCC=CCCCC6=O)))))OC=C6C=O)CCC6))))))))))cc6))))))))cc6"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/valid_0-6.jsonl": "{"text":"Task: Please create a SELFIES based on the text description.\nDescription: A molecule that is inhibiting CYP P450 1A2.\nResult: [C][C][O][C][=Branch1][C][=O][C][S][C][=C][Branch1][Ring1][C][#N][C][Branch1][C][C][C][=C][Branch1][Branch2][C][C][C][C][Ring1][=Branch1][=O][N][Ring1][=C]"} {"text":"Task: Please give me a InChI based on the text description below.\nDescription: A molecule that is inhibiting CYP P450 1A2.\nResult: InChI=1S\/C11H11N5OS\/c1-2-18-11-14-13-10-15(11)8-6-4-3-5-7(8)9(17)16(10)12\/h3-6H,2,12H2,1H3"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/test_0-9.jsonl": "{"text":"User: Can you create the SMILES of a molecule that is inhibiting CYP1A2?\nAssistant: Of course, here 
you go: O=c1[nH]c2cc3c(cc2cc1CN(CCCO)Cc1nnnn1Cc1ccc(F)cc1)OCO3"} {"text":"User: Can you give me the SMILES of a molecule that is inhibiting CYP P450 1A2?\nAssistant: Sure, here you go: O=C(Nc1nnc(SCc2c(Cl)cccc2Cl)s1)C1CCCO1"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/test_0-0.jsonl": "{"text":"The molecule with the SMILES representation of O=c1[nH]c2cc3c(cc2cc1CN(CCCO)Cc1nnnn1Cc1ccc(F)cc1)OCO3 displays inhibition of CYP1A2."} {"text":"The molecule with the canonical SMILES representation of O=C(Nc1nnc(SCc2c(Cl)cccc2Cl)s1)C1CCCO1 exhibits inhibition of CYP P450 1A2."}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/valid_0-7.jsonl": "{"text":"User: Can you derive if the molecule with the DeepSMILES CCOC=O)CSC=CC#N))CC)C=CCCCC6=O)))))N6 is inhibiting CYP P450 1A2?\nAssistant: Yes, this molecule is inhibiting CYP P450 1A2."} {"text":"User: Can you derive if the molecule with the canonical SMILES CCSc1nnc2n(N)c(=O)c3ccccc3n12 is inhibiting CYP P450 1A2?\nAssistant: Yes, this molecule is inhibiting CYP P450 1A2."}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/test_0-3.jsonl": "{"text":"The SELFIES [O][=C][NH1][C][=C][C][=C][Branch2][Ring2][Branch1][C][=C][Ring1][=Branch1][C][=C][Ring1][#Branch2][C][N][Branch1][Branch1][C][C][C][O][C][C][=N][N][=N][N][Ring1][Branch1][C][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][O][C][O][Ring2][Ring1][=N] is inhibiting CYP P450 1A2."} {"text":"The molecule SELFIES [O][=C][Branch2][Ring1][=Branch2][N][C][=N][N][=C][Branch1][#C][S][C][C][=C][Branch1][C][Cl][C][=C][C][=C][Ring1][#Branch1][Cl][S][Ring1][#C][C][C][C][C][O][Ring1][Branch1] is inhibiting CYP P450 1A2."}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/valid_0-11.jsonl": "{"text":"User: I want to create a SELFIES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should be inhibiting CYP P450 1A2.\nAssistant: Ok, this SELFIES is inhibiting CYP P450 1A2: [C][C][O][C][=Branch1][C][=O][C][S][C][=C][Branch1][Ring1][C][#N][C][Branch1][C][C][C][=C][Branch1][Branch2][C][C][C][C][Ring1][=Branch1][=O][N][Ring1][=C]"} {"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should be inhibiting CYP P450 1A2.\nAssistant: Got it, here you go, this SMILES is inhibiting CYP P450 1A2: CCSc1nnc2n(N)c(=O)c3ccccc3n12"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/train_0-0.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][N][C][=Branch1][C][=O][\/C][=Branch2][Ring1][Ring1][=C][\\S][C][=Branch1][C][=S][N][Branch1][=Branch1][C][C][C][O][C][C][Ring1][O][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring2][Ring1][=Branch1] exhibits inhibition of CYP1A2."} {"text":"The molecule with the SELFIES representation of [C][C][=C][C][=C][Branch2][Ring2][#C][S][=Branch1][C][=O][=Branch1][C][=O][O][C][=C][C][=C][Branch2][Ring1][O][C][C][=C][Branch1][Branch2][C][C][C][C][Ring1][=Branch1][=O][O][C][=C][Ring1][O][C][=Branch1][C][=O][C][C][C][Ring1][#Branch1][C][=C][Ring2][Ring1][=Branch1][C][=C][Ring2][Ring1][S] displays no inhibition of CYP1A2."}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/test_0-6.jsonl": "{"text":"Task: Please create a molecule SELFIES based on the text description.\nDescription: A molecule that is inhibiting CYP P450 1A2.\nResult: [O][=C][NH1][C][=C][C][=C][Branch2][Ring2][Branch1][C][=C][Ring1][=Branch1][C][=C][Ring1][#Branch2][C][N][Branch1][Branch1][C][C][C][O][C][C][=N][N][=N][N][Ring1][Branch1][C][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][O][C][O][Ring2][Ring1][=N]"} {"text":"Task: Please create a canonical SMILES based on the description.\nDescription: A molecule that is inhibiting CYP P450 1A2.\nResult: 
O=C(Nc1nnc(SCc2c(Cl)cccc2Cl)s1)C1CCCO1"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/train_0-10.jsonl": "{"text":"User: I'm looking for the SELFIES of a molecule that is inhibiting CYP1A2?\nAssistant: This is a molecule that is inhibiting CYP1A2: [C][C][N][C][=Branch1][C][=O][\/C][=Branch2][Ring1][Ring1][=C][\\S][C][=Branch1][C][=S][N][Branch1][=Branch1][C][C][C][O][C][C][Ring1][O][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring2][Ring1][=Branch1]"} {"text":"User: I'm looking for the DeepSMILES of a molecule that is not inhibiting CYP P450 1A2?\nAssistant: This is a molecule that is not inhibiting CYP P450 1A2: CccccS=O)=O)OccccCC=CCCCC6=O)))))OC=C6C=O)CCC6))))))))))cc6))))))))cc6"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/train_0-3.jsonl": "{"text":"The SMILES CCN1C(=O)\/C(=C2\\SC(=S)N(CCCOC)C2=O)c2ccccc21 is inhibiting CYP P450 1A2."} {"text":"The DeepSMILES CccccS=O)=O)OccccCC=CCCCC6=O)))))OC=C6C=O)CCC6))))))))))cc6))))))))cc6 is not inhibiting CYP P450 1A2."}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/train_0-12.jsonl": "{"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should be inhibiting CYP P450 1A2.\nAssistant: Ok, this canonical SMILES is inhibiting CYP P450 1A2: CCN1C(=O)\/C(=C2\\SC(=S)N(CCCOC)C2=O)c2ccccc21"} {"text":"User: I want to create a molecule SELFIES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be inhibiting CYP P450 1A2.\nAssistant: Understood, this SELFIES is not inhibiting CYP P450 1A2: [C][C][=C][C][=C][Branch2][Ring2][#C][S][=Branch1][C][=O][=Branch1][C][=O][O][C][=C][C][=C][Branch2][Ring1][O][C][C][=C][Branch1][Branch2][C][C][C][C][Ring1][=Branch1][=O][O][C][=C][Ring1][O][C][=Branch1][C][=O][C][C][C][Ring1][#Branch1][C][=C][Ring2][Ring1][=Branch1][C][=C][Ring2][Ring1][S]"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI representation of InChI=1S\/C23H23FN6O4\/c24-18-4-2-15(3-5-18)11-30-22(26-27-28-30)13-29(6-1-7-31)12-17-8-16-9-20-21(34-14-33-20)10-19(16)25-23(17)32\/h2-5,8-10,31H,1,6-7,11-14H2,(H,25,32) inhibiting CYP P450 1A2?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any other words.\nOptions:\n1. False\n2. True\nAnswer: 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of O=C(Nc1nnc(SCc2c(Cl)cccc2Cl)s1)C1CCCO1 inhibiting CYP P450 1A2?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any other words.\nOptions:\n1. False\n2. 
True\nAnswer: 2"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/valid_0-2.jsonl": "{"text":"The canonical SMILES CCOC(=O)CSC1=C(C#N)C(C)C2=C(CCCC2=O)N1 is from a molecule that displays inhibition of CYP P450 1A2."} {"text":"The InChI InChI=1S\/C11H11N5OS\/c1-2-18-11-14-13-10-15(11)8-6-4-3-5-7(8)9(17)16(10)12\/h3-6H,2,12H2,1H3 is from a molecule that exhibits inhibition of CYP P450 1A2."}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are inhibiting CYP1A2?\nConstraint: You must select none, one or more options from a, b, c, or d without using any additional words.\nOptions:\na. InChI=1S\/C17H14FN3OS\/c18-11-5-7-13(8-6-11)23-10-12-9-19-17-20-15-4-2-1-3-14(15)16(22)21(12)17\/h1-8,12H,9-10H2,(H,19,20)\nb. InChI=1S\/C11H15NO2\/c1-2-3-8-14-11(13)9-4-6-10(12)7-5-9\/h4-7H,2-3,8,12H2,1H3\nc. InChI=1S\/C14H13N3O\/c1-17(2)14-12-7-10(11-5-6-18-8-11)3-4-13(12)15-9-16-14\/h3-9H,1-2H3\nd. 
InChI=1S\/C17H18N2O3S2\/c1-3-18-12-8-5-4-7-11(12)13(15(18)20)14-16(21)19(17(23)24-14)9-6-10-22-2\/h4-5,7-8H,3,6,9-10H2,1-2H3\/b14-13-\nAnswer: b, c, d"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting CYP1A2?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any other words.\nOptions:\n(1) InChI=1S\/C8H11N5.C7H5NS2\/c9-7(10)13-8(11)12-6-4-2-1-3-5-6;9-7-8-5-3-1-2-4-6(5)10-7\/h1-5H,(H6,9,10,11,12,13);1-4H,(H,8,9)\n(2) InChI=1S\/C26H24O6S\/c1-16-8-14-19(15-9-16)33(29,30)32-18-12-10-17(11-13-18)24-25-20(27)4-2-6-22(25)31-23-7-3-5-21(28)26(23)24\/h8-15,24H,2-7H2,1H3\n(3) InChI=1S\/C14H15NOS\/c1-3-9-10-7-16-8-12(10)15-11-5-4-6-13(17-2)14(9)11\/h4-6H,3,7-8H2,1-2H3\nAnswer: 2"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/valid_0-1.jsonl": "{"text":"Based on the DeepSMILES CCOC=O)CSC=CC#N))CC)C=CCCCC6=O)))))N6, the molecule displays inhibition of CYP1A2."} {"text":"Based on the canonical SMILES CCSc1nnc2n(N)c(=O)c3ccccc3n12, the molecule exhibits inhibition of CYP P450 1A2."}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES CCOC(=O)CSC1=C(C#N)C(C)C2=C(CCCC2=O)N1 inhibiting CYP P450 1A2?\nConstraint: Even if you are uncertain, you must pick either a or b without using any other words.\nOptions:\na True\nb False\nAnswer: a"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES CCSc1nnc2n(N)c(=O)c3ccccc3n12 inhibiting CYP P450 1A2?\nConstraint: Even if you are uncertain, you must pick either a or b without using any additional words.\nOptions:\na True\nb False\nAnswer: a"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting 
CYP1A2.\nMolecule SELFIES: [C][C][O][C][=Branch1][C][=O][C][S][C][=C][Branch1][Ring1][C][#N][C][Branch1][C][C][C][=C][Branch1][Branch2][C][C][C][C][Ring1][=Branch1][=O][N][Ring1][=C]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is inhibiting CYP1A2."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 1A2.\nDeepSMILES: CCScnncnN)c=O)cccccc6n%13%10\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is inhibiting CYP P450 1A2."}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 1A2.\nSMILES: CCOC(=O)CSC1=C(C#N)C(C)C2=C(CCCC2=O)N1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP1A2.\nSMILES: CCSc1nnc2n(N)c(=O)c3ccccc3n12\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP1A2.\nSELFIES: [C][C][N][C][=Branch1][C][=O][\/C][=Branch2][Ring1][Ring1][=C][\\S][C][=Branch1][C][=S][N][Branch1][=Branch1][C][C][C][O][C][C][Ring1][O][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring2][Ring1][=Branch1]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is inhibiting CYP1A2."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 1A2.\ncanonical SMILES: Cc1ccc(S(=O)(=O)Oc2ccc(C3C4=C(CCCC4=O)OC4=C3C(=O)CCC4)cc2)cc1\nConstraint: Answer the question in a 
complete sentence.\nResult: This molecule is not inhibiting CYP P450 1A2."}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/valid_0-12.jsonl": "{"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should be inhibiting CYP1A2.\nAssistant: Got it, this SELFIES is inhibiting CYP1A2: [C][C][O][C][=Branch1][C][=O][C][S][C][=C][Branch1][Ring1][C][#N][C][Branch1][C][C][C][=C][Branch1][Branch2][C][C][C][C][Ring1][=Branch1][=O][N][Ring1][=C]"} {"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should be inhibiting CYP1A2.\nAssistant: Understood, this canonical SMILES is inhibiting CYP1A2: CCSc1nnc2n(N)c(=O)c3ccccc3n12"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/train_0-2.jsonl": "{"text":"The SELFIES [C][C][N][C][=Branch1][C][=O][\/C][=Branch2][Ring1][Ring1][=C][\\S][C][=Branch1][C][=S][N][Branch1][=Branch1][C][C][C][O][C][C][Ring1][O][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring2][Ring1][=Branch1] represents a molecule that exhibits inhibition of CYP1A2."} {"text":"The canonical SMILES Cc1ccc(S(=O)(=O)Oc2ccc(C3C4=C(CCCC4=O)OC4=C3C(=O)CCC4)cc2)cc1 is from a molecule that displays no inhibition of CYP1A2."}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/test_0-11.jsonl": "{"text":"User: I want to generate a InChI.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should be inhibiting CYP P450 1A2.\nAssistant: Got it, this InChI is inhibiting CYP P450 1A2: InChI=1S\/C23H23FN6O4\/c24-18-4-2-15(3-5-18)11-30-22(26-27-28-30)13-29(6-1-7-31)12-17-8-16-9-20-21(34-14-33-20)10-19(16)25-23(17)32\/h2-5,8-10,31H,1,6-7,11-14H2,(H,25,32)"} {"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very exciting. 
Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should be inhibiting CYP1A2.\nAssistant: Ok, this DeepSMILES is inhibiting CYP1A2: O=CNcnncSCccCl)cccc6Cl)))))))))s5))))))CCCCO5"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/train_0-7.jsonl": "{"text":"User: Can you derive if the molecule with the DeepSMILES CCNC=O)\/C=C\\SC=S)NCCCOC)))))C5=O))))))cccccc69 is inhibiting CYP P450 1A2?\nAssistant: Yes, this molecule is inhibiting CYP P450 1A2."} {"text":"User: Can you tell me if the molecule with the canonical SMILES Cc1ccc(S(=O)(=O)Oc2ccc(C3C4=C(CCCC4=O)OC4=C3C(=O)CCC4)cc2)cc1 is inhibiting CYP P450 1A2?\nAssistant: No, this molecule is not inhibiting CYP P450 1A2."}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/train_0-11.jsonl": "{"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should be inhibiting CYP1A2.\nAssistant: Ok, here you go, this DeepSMILES is inhibiting CYP1A2: CCNC=O)\/C=C\\SC=S)NCCCOC)))))C5=O))))))cccccc69"} {"text":"User: I want to come up with a SELFIES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should not be inhibiting CYP1A2.\nAssistant: Got it, this SELFIES is not inhibiting CYP1A2: [C][C][=C][C][=C][Branch2][Ring2][#C][S][=Branch1][C][=O][=Branch1][C][=O][O][C][=C][C][=C][Branch2][Ring1][O][C][C][=C][Branch1][Branch2][C][C][C][C][Ring1][=Branch1][=O][O][C][=C][Ring1][O][C][=Branch1][C][=O][C][C][C][Ring1][#Branch1][C][=C][Ring2][Ring1][=Branch1][C][=C][Ring2][Ring1][S]"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/train_0-1.jsonl": "{"text":"Based on the SMILES representation CCN1C(=O)\/C(=C2\\SC(=S)N(CCCOC)C2=O)c2ccccc21, the molecule displays inhibition of CYP1A2."} {"text":"Based on the canonical SMILES representation Cc1ccc(S(=O)(=O)Oc2ccc(C3C4=C(CCCC4=O)OC4=C3C(=O)CCC4)cc2)cc1, the molecule shows no inhibition of CYP P450 1A2."}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES CCNC=O)\/C=C\\SC=S)NCCCOC)))))C5=O))))))cccccc69 inhibiting CYP1A2?\nConstraint: Even if you are uncertain, you must pick either A or B without using any other words.\nOptions:\nA: False\nB: True\nAnswer: B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES representation of CccccS=O)=O)OccccCC=CCCCC6=O)))))OC=C6C=O)CCC6))))))))))cc6))))))))cc6 inhibiting CYP1A2?\nConstraint: Even if you are uncertain, you must pick either A or B without using any other words.\nOptions:\nA False\nB True\nAnswer: A"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP1A2.\nSMILES: CCN1C(=O)\/C(=C2\\SC(=S)N(CCCOC)C2=O)c2ccccc21\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"} {"text":"Task: Please classify a molecule based on the 
description.\nDescription: A molecule that is inhibiting CYP P450 1A2.\ncanonical SMILES: Cc1ccc(S(=O)(=O)Oc2ccc(C3C4=C(CCCC4=O)OC4=C3C(=O)CCC4)cc2)cc1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/test_0-7.jsonl": "{"text":"User: Can you derive if the molecule with the SELFIES [O][=C][NH1][C][=C][C][=C][Branch2][Ring2][Branch1][C][=C][Ring1][=Branch1][C][=C][Ring1][#Branch2][C][N][Branch1][Branch1][C][C][C][O][C][C][=N][N][=N][N][Ring1][Branch1][C][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][O][C][O][Ring2][Ring1][=N] is inhibiting CYP1A2?\nAssistant: Yes, this molecule is inhibiting CYP1A2."} {"text":"User: Can you tell me if the molecule with the canonical SMILES O=C(Nc1nnc(SCc2c(Cl)cccc2Cl)s1)C1CCCO1 is inhibiting CYP P450 1A2?\nAssistant: Yes, this molecule is inhibiting CYP P450 1A2."}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/train_0-9.jsonl": "{"text":"User: Can you give me the canonical SMILES of a molecule that is inhibiting CYP P450 1A2?\nAssistant: Sure, here you go: CCN1C(=O)\/C(=C2\\SC(=S)N(CCCOC)C2=O)c2ccccc21"} {"text":"User: Can you give me the SELFIES of a molecule that is not inhibiting CYP P450 1A2?\nAssistant: Yes, here you go: [C][C][=C][C][=C][Branch2][Ring2][#C][S][=Branch1][C][=O][=Branch1][C][=O][O][C][=C][C][=C][Branch2][Ring1][O][C][C][=C][Branch1][Branch2][C][C][C][C][Ring1][=Branch1][=O][O][C][=C][Ring1][O][C][=Branch1][C][=O][C][C][C][Ring1][#Branch1][C][=C][Ring2][Ring1][=Branch1][C][=C][Ring2][Ring1][S]"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/valid_0-3.jsonl": "{"text":"The molecule DeepSMILES CCOC=O)CSC=CC#N))CC)C=CCCCC6=O)))))N6 is inhibiting CYP P450 1A2."} {"text":"The DeepSMILES CCScnncnN)c=O)cccccc6n%13%10 is inhibiting CYP P450 1A2."}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/test_0-8.jsonl": "{"text":"User: Is 
the molecule with the SELFIES [O][=C][NH1][C][=C][C][=C][Branch2][Ring2][Branch1][C][=C][Ring1][=Branch1][C][=C][Ring1][#Branch2][C][N][Branch1][Branch1][C][C][C][O][C][C][=N][N][=N][N][Ring1][Branch1][C][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][O][C][O][Ring2][Ring1][=N] inhibiting CYP P450 1A2?\nAssistant: Yes, it is inhibiting CYP P450 1A2."} {"text":"User: Is the molecule with the canonical SMILES O=C(Nc1nnc(SCc2c(Cl)cccc2Cl)s1)C1CCCO1 inhibiting CYP P450 1A2?\nAssistant: Yes, it is inhibiting CYP P450 1A2."}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are inhibiting CYP P450 1A2?\nConstraint: You must select none, one or more options from a, b, or c without using any additional words.\nOptions:\na) C[C@H]NC=O)NCCCCcccccc6%10)))))))))))))C=O)O\nb) O=c[nH]cccccc6cc%10CNCCCO))))Ccnnnn5CccccF)cc6)))))))))))))))))))OCO5\nc) Ccccccc6-cncNCCNC)CC6))))))cccccc6n%10\nAnswer: b, c"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are inhibiting CYP P450 1A2?\nConstraint: You must select none, one or more options from 1, 2, 3, 4, or 5 without using any other words.\nOptions:\n1) [O][C][C][N][C@@H1][C][=C][C][Branch1][C][Cl][=C][C][=C][Ring1][#Branch1][C][=C][Branch1][C][Cl][C][=C][Branch1][C][Cl][C][=C][Ring1][Branch2][Ring1][S]\n2) [C][C][Branch1][C][C][C][C][N][C][=Branch1][C][=S][N][C][C][C][C][C][Ring1][Branch1]\n3) [C][C][=C][C][Branch2][Ring2][=Branch1][N][C][=Branch1][C][=O][C][Branch1][C][O][=C][Branch1][N][C][=Branch1][C][=O][C][=C][C][=C][O][Ring1][Branch1][C][Ring1][=C][C][=C][C][=C][Branch1][C][O][C][=C][Ring1][#Branch1][=N][O][Ring2][Ring1][#Branch2]\n4) [O][=C][Branch1][C][O][C][S][=Branch1][C][=O][=Branch1][C][=O][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]\n5) 
[O][=C][Branch2][Ring1][=Branch2][N][C][=N][N][=C][Branch1][#C][S][C][C][=C][Branch1][C][Cl][C][=C][C][=C][Ring1][#Branch1][Cl][S][Ring1][#C][C][C][C][C][O][Ring1][Branch1]\nAnswer: 1, 2, 5"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are inhibiting CYP P450 1A2?\nConstraint: You must select none, one or more options from a, b, or c without using any other words.\nOptions:\na. Cn1c(=O)[nH]c2ncn(C)c2c1=O\nb. COc1ccc(NC(=O)N2CC3(CCN(C(=O)c4ccncc4)CC3)C2)cc1\nc. CCOC(=O)CSC1=C(C#N)C(C)C2=C(CCCC2=O)N1\nAnswer: c"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are inhibiting CYP P450 1A2?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any other words.\nOptions:\n1 [C][C@@H1][C][C@H1][C@@H1][C][C@H1][Branch1][C][F][C][=C][C][=Branch1][C][=O][C][=C][C@@][Ring1][#Branch1][Branch1][C][C][C@][Ring1][=N][Branch1][C][F][C@H1][Branch1][C][O][C][C@][Ring2][Ring1][Ring1][Branch1][C][C][C@@][Ring2][Ring1][#Branch1][Branch1][C][O][C][=Branch1][C][=O][C][O]\n2 [C][C][S][C][=N][N][=C][N][Branch1][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][N][Ring1][#C][Ring1][N]\n3 [C][C][=C][C][=C][Branch2][Ring1][S][C][S][C][=N][N][=C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][N][Ring1][=Branch2][C][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][=C][Ring2][Ring1][=Branch2]\n4 [C][N][C][C][N][Branch2][Ring2][S][C][=N][C][=C][N][=C][Branch1][#C][C][=C][C][Branch1][C][F][=C][C][Branch1][C][F][=C][Ring1][Branch2][C][=Branch1][C][=O][N][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring2][Ring1][#Branch1][=N][Ring2][Ring1][O][C][C][Ring2][Ring1][P]\nAnswer: 2, 3"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is 
inhibiting CYP P450 1A2.\nMolecule SELFIES: [O][=C][NH1][C][=C][C][=C][Branch2][Ring2][Branch1][C][=C][Ring1][=Branch1][C][=C][Ring1][#Branch2][C][N][Branch1][Branch1][C][C][C][O][C][C][=N][N][=N][N][Ring1][Branch1][C][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][O][C][O][Ring2][Ring1][=N]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP1A2.\nDeepSMILES: O=CNcnncSCccCl)cccc6Cl)))))))))s5))))))CCCCO5\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"}", "/scratch/micpie/export/cyp_p450_1a2_inhibition_veith_et_al/test_0-12.jsonl": "{"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should be inhibiting CYP P450 1A2.\nAssistant: Ok, this canonical SMILES is inhibiting CYP P450 1A2: O=c1[nH]c2cc3c(cc2cc1CN(CCCO)Cc1nnnn1Cc1ccc(F)cc1)OCO3"} {"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should be inhibiting CYP1A2.\nAssistant: Ok, this SELFIES is inhibiting CYP1A2: [O][=C][Branch2][Ring1][=Branch2][N][C][=N][N][=C][Branch1][#C][S][C][C][=C][Branch1][C][Cl][C][=C][C][=C][Ring1][#Branch1][Cl][S][Ring1][#C][C][C][C][C][O][Ring1][Branch1]"}", "/scratch/micpie/export/lipophilicity/test_0-10.jsonl": "{"text":"User: Can you generate the DeepSMILES of a molecule that has a octanol\/water distribution coefficient (logD at pH 7.4) of -1.180 (dimensionless)?\nAssistant: Of course, here you go: COcccOC))ccc6NC=O)CSCC=O)O)))))))))S=O)=O)NCC)CCcccccc%106"} {"text":"User: Can you generate the canonical SMILES of a molecule that has a octanol\/water distribution coefficient (logD at pH 7.4) of 2.100 (dimensionless)?\nAssistant: Of course, here you go: COc1cccc2[nH]ncc12"}", "/scratch/micpie/export/lipophilicity/valid_0-8.jsonl": "{"text":"Task: Please create a molecule canonical SMILES based on the text description.\nDescription: A molecule that has a logD at pH 7.4 of -0.080 (dimensionless).\nResult: O=C1CCCCCN1"} {"text":"Task: Please create a DeepSMILES based on the description.\nDescription: A molecule that has a logD at pH 7.4 of 3.850 (dimensionless).\nResult: OCCccccNC=O)cccccCl)ccc6[nH]9)))))))))))cc6"}", "/scratch/micpie/export/lipophilicity/train_0-8.jsonl": "{"text":"Task: Please generate a molecule DeepSMILES based on the description below.\nDescription: A molecule that has a octanol\/water distribution coefficient (logD at pH 7.4) of 3.540 (dimensionless).\nResult: CncCNCCNCC6))ccccCl)cc6)))))))))))ncccccc96"} {"text":"Task: Please create a DeepSMILES based on the description below.\nDescription: A molecule that has a logD at pH 7.4 of 2.700 (dimensionless).\nResult: CNC=O)C=CCCccccccccc6c%10))))))))))))N=C6N"}", "/scratch/micpie/export/lipophilicity/test_0-5.jsonl": "{"text":"The molecule with the SELFIES 
[C][O][C][=C][C][Branch1][Ring1][O][C][=C][Branch2][Ring1][C][C][=C][Ring1][Branch2][N][C][=Branch1][C][=O][C][S][C][C][=Branch1][C][=O][O][S][=Branch1][C][=O][=Branch1][C][=O][N][C][Branch1][C][C][C][C][C][=C][C][=C][C][=C][Ring1][O][Ring1][=Branch1] has a octanol\/water distribution coefficient of -1.180 (dimensionless)."} {"text":"The molecule with the InChI InChI=1S\/C8H8N2O\/c1-11-8-4-2-3-7-6(8)5-9-10-7\/h2-5H,1H3,(H,9,10) has a octanol\/water distribution coefficient (logD at pH 7.4) of 2.100 (dimensionless)."}", "/scratch/micpie/export/lipophilicity/valid_0-9.jsonl": "{"text":"User: Can you estimate the octanol\/water distribution coefficient (logD at pH 7.4) in (dimensionless) of the molecule with the SMILES O=C1CCCCCN1?\nAssistant: Of course, this molecule has a octanol\/water distribution coefficient (logD at pH 7.4) of -0.080 (dimensionless)."} {"text":"User: Can you derive the octanol\/water distribution coefficient in (dimensionless) of the molecule with the SMILES OCCc1ccc(NC(=O)c2cc3cc(Cl)ccc3[nH]2)cc1?\nAssistant: Yes, this molecule has a octanol\/water distribution coefficient of 3.850 (dimensionless)."}", "/scratch/micpie/export/lipophilicity/test_0-1.jsonl": "{"text":"Question: Please estimate the octanol\/water distribution coefficient of InChI=1S\/C22H26N2O7S2\/c1-14-8-9-15-6-4-5-7-17(15)24(14)33(28,29)20-10-16(18(30-2)11-19(20)31-3)23-21(25)12-32-13-22(26)27\/h4-7,10-11,14H,8-9,12-13H2,1-3H3,(H,23,25)(H,26,27) by picking one choice of a, b, c, d, or e.\nOptions:\na: -1.180\nb: 3.3\nc: 3.38\nd: 3.0\ne: 0.17\nAnswer: a"} {"text":"Question: Please estimate the octanol\/water distribution coefficient of [C][O][C][=C][C][=C][C][NH1][N][=C][C][Ring1][=Branch2][=Ring1][Branch1] by picking one choice of a, b, c, d, or e.\nOptions:\na) 3.0\nb) 3.8\nc) 3.2\nd) 2.100\ne) 1.82\nAnswer: d"}", "/scratch/micpie/export/lipophilicity/valid_0-0.jsonl": "{"text":"Task: Please answer the multiple choice question below with A, B, C, D, or E.\nQuestion: What is the 
octanol\/water distribution coefficient of the canonical SMILES O=C1CCCCCN1?\nOptions:\nA. 0.7\nB. -0.080\nC. 2.28\nD. 4.21\nE. -0.1\nAnswer: B"} {"text":"Task: Please answer the multiple choice question below with A, B, C, or D.\nQuestion: What is the octanol\/water distribution coefficient of the canonical SMILES O=C(Nc1ccc(CCO)cc1)c1cc2cc(Cl)ccc2[nH]1?\nOptions:\nA. 1.36\nB. 3.850\nC. 2.8\nD. 2.48\nAnswer: B"}", "/scratch/micpie/export/lipophilicity/test_0-2.jsonl": "{"text":"The molecule with the DeepSMILES representation of COcccOC))ccc6NC=O)CSCC=O)O)))))))))S=O)=O)NCC)CCcccccc%106 has a logD at pH 7.4 of -1.180 (dimensionless)."} {"text":"The molecule with the canonical SMILES representation of COc1cccc2[nH]ncc12 has a logD at pH 7.4 of 2.100 (dimensionless)."}", "/scratch/micpie/export/lipophilicity/valid_0-10.jsonl": "{"text":"User: Can you give me the InChI of a molecule that has a octanol\/water distribution coefficient of -0.080 (dimensionless)?\nAssistant: Sure, here you go: InChI=1S\/C6H11NO\/c8-6-4-2-1-3-5-7-6\/h1-5H2,(H,7,8)"} {"text":"User: Can you give me the DeepSMILES of a molecule that has a octanol\/water distribution coefficient (logD at pH 7.4) of 3.850 (dimensionless)?\nAssistant: Of course, here you go: OCCccccNC=O)cccccCl)ccc6[nH]9)))))))))))cc6"}", "/scratch/micpie/export/lipophilicity/train_0-6.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the octanol\/water distribution coefficient (logD at pH 7.4) in (dimensionless).\nMolecule SMILES: Cn1c(CN2CCN(CC2)c3ccc(Cl)cc3)nc4ccccc14\nConstraint: Even if you are uncertain, you must answer with a numeric value in (dimensionless) without using any other words.\nResult: 3.540 (dimensionless)"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the octanol\/water distribution coefficient (logD at pH 7.4) in (dimensionless).\nSELFIES: 
[C][N][C][=Branch1][C][=O][C][=C][Branch1][P][C][C][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][N][=C][Ring2][Ring1][Ring1][N]\nConstraint: Even if you are uncertain, you must answer with a numeric value in (dimensionless) without using any other words.\nResult: 2.700 (dimensionless)"}", "/scratch/micpie/export/lipophilicity/valid_0-6.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the octanol\/water distribution coefficient in (dimensionless).\nMolecule DeepSMILES: O=CCCCCCN7\nConstraint: Even if you are uncertain, you must answer with a numeric value in (dimensionless) without using any other words.\nResult: -0.080 (dimensionless)"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the octanol\/water distribution coefficient (logD at pH 7.4) in (dimensionless).\nDeepSMILES: OCCccccNC=O)cccccCl)ccc6[nH]9)))))))))))cc6\nConstraint: Even if you are uncertain, you must answer with a numeric value in (dimensionless) without using any additional words.\nResult: 3.850 (dimensionless)"}", "/scratch/micpie/export/lipophilicity/test_0-9.jsonl": "{"text":"User: Can you derive the logD at pH 7.4 in (dimensionless) of the molecule with the SMILES COc1cc(OC)c(cc1NC(=O)CSCC(=O)O)S(=O)(=O)N2C(C)CCc3ccccc23?\nAssistant: Of course, this molecule has a logD at pH 7.4 of -1.180 (dimensionless)."} {"text":"User: Can you estimate the octanol\/water distribution coefficient in (dimensionless) of the molecule with the canonical SMILES COc1cccc2[nH]ncc12?\nAssistant: Of course, this molecule has a octanol\/water distribution coefficient of 2.100 (dimensionless)."}", "/scratch/micpie/export/lipophilicity/test_0-0.jsonl": "{"text":"Task: Please answer the multiple choice question below with a, b, c, or d.\nQuestion: What is the octanol\/water distribution coefficient (logD at pH 7.4) of the InChI 
InChI=1S\/C22H26N2O7S2\/c1-14-8-9-15-6-4-5-7-17(15)24(14)33(28,29)20-10-16(18(30-2)11-19(20)31-3)23-21(25)12-32-13-22(26)27\/h4-7,10-11,14H,8-9,12-13H2,1-3H3,(H,23,25)(H,26,27)?\nOptions:\n(a) -1.180\n(b) 2.43\n(c) 0.91\n(d) 1.2\nAnswer: a"} {"text":"Task: Please answer the multiple choice question below with A, B, C, D, or E.\nQuestion: What is the octanol\/water distribution coefficient of the SMILES COc1cccc2[nH]ncc12?\nOptions:\nA. 1.91\nB. 2.99\nC. 3.7\nD. 2.100\nE. 3.2\nAnswer: D"}", "/scratch/micpie/export/lipophilicity/valid_0-7.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the logD at pH 7.4 in (dimensionless).\nMolecule canonical SMILES: O=C1CCCCCN1\nConstraint: Even if you are not sure, you must answer with a numeric value in (dimensionless) without the unit and without using any additional words.\nResult: -0.080"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the octanol\/water distribution coefficient (logD at pH 7.4) in (dimensionless).\nDeepSMILES: OCCccccNC=O)cccccCl)ccc6[nH]9)))))))))))cc6\nConstraint: Even if you are not sure, you must answer with a numeric value in (dimensionless) without the unit and without using any additional words.\nResult: 3.850"}", "/scratch/micpie/export/lipophilicity/test_0-3.jsonl": "{"text":"Based on the SELFIES [C][O][C][=C][C][Branch1][Ring1][O][C][=C][Branch2][Ring1][C][C][=C][Ring1][Branch2][N][C][=Branch1][C][=O][C][S][C][C][=Branch1][C][=O][O][S][=Branch1][C][=O][=Branch1][C][=O][N][C][Branch1][C][C][C][C][C][=C][C][=C][C][=C][Ring1][O][Ring1][=Branch1], the molecule has a octanol\/water distribution coefficient (logD at pH 7.4) of -1.180 (dimensionless)."} {"text":"Based on the SMILES COc1cccc2[nH]ncc12, the molecule has a octanol\/water distribution coefficient (logD at pH 7.4) of 2.100 (dimensionless)."}", "/scratch/micpie/export/lipophilicity/valid_0-11.jsonl": "{"text":"User: I'm looking for the 
InChI of a molecule that has a octanol\/water distribution coefficient (logD at pH 7.4) of -0.080 (dimensionless).\nAssistant: This is a molecule that has a octanol\/water distribution coefficient (logD at pH 7.4) of -0.080 (dimensionless): InChI=1S\/C6H11NO\/c8-6-4-2-1-3-5-7-6\/h1-5H2,(H,7,8)"} {"text":"User: I'm looking for the SELFIES of a molecule that has a octanol\/water distribution coefficient (logD at pH 7.4) of 3.850 (dimensionless).\nAssistant: This is a molecule that has a octanol\/water distribution coefficient (logD at pH 7.4) of 3.850 (dimensionless): [O][C][C][C][=C][C][=C][Branch2][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=C][C][=C][C][Branch1][C][Cl][=C][C][=C][Ring1][#Branch1][NH1][Ring1][#Branch2][C][=C][Ring2][Ring1][Ring1]"}", "/scratch/micpie/export/lipophilicity/train_0-0.jsonl": "{"text":"Task: Please answer the multiple choice question below with 1, 2, or 3.\nQuestion: What is the octanol\/water distribution coefficient of the InChI InChI=1S\/C19H21ClN4\/c1-22-18-5-3-2-4-17(18)21-19(22)14-23-10-12-24(13-11-23)16-8-6-15(20)7-9-16\/h2-9H,10-14H2,1H3?\nOptions:\n1.) 2.36\n2.) 1.82\n3.) 3.540\nAnswer: 3"} {"text":"Task: Please answer the multiple choice question below with a, b, or c.\nQuestion: What is the logD at pH 7.4 of the DeepSMILES CNC=O)C=CCCccccccccc6c%10))))))))))))N=C6N?\nOptions:\na.) 0.68\nb.) 2.32\nc.) 
2.700\nAnswer: c"}", "/scratch/micpie/export/lipophilicity/test_0-6.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the logD at pH 7.4 in (dimensionless).\nMolecule SMILES: COc1cc(OC)c(cc1NC(=O)CSCC(=O)O)S(=O)(=O)N2C(C)CCc3ccccc23\nConstraint: Even if you are not sure, you must answer with a numeric value in (dimensionless) without using any additional words.\nResult: -1.180 (dimensionless)"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the logD at pH 7.4 in (dimensionless).\nMolecule DeepSMILES: COccccc[nH]ncc95\nConstraint: Even if you are not sure, you must answer with a numeric value in (dimensionless) without using any additional words.\nResult: 2.100 (dimensionless)"}", "/scratch/micpie/export/lipophilicity/train_0-10.jsonl": "{"text":"User: Can you give me the canonical SMILES of a molecule that has a logD at pH 7.4 of 3.540 (dimensionless)?\nAssistant: Yes, here you go: Cn1c(CN2CCN(c3ccc(Cl)cc3)CC2)nc2ccccc21"} {"text":"User: Can you give me the InChI of a molecule that has a octanol\/water distribution coefficient of 2.700 (dimensionless)?\nAssistant: Yes, here you go: InChI=1S\/C17H17N3O\/c1-20-16(21)11-15(19-17(20)18)9-7-12-6-8-13-4-2-3-5-14(13)10-12\/h2-6,8,10-11H,7,9H2,1H3,(H2,18,19)"}", "/scratch/micpie/export/lipophilicity/train_0-3.jsonl": "{"text":"Based on the canonical SMILES Cn1c(CN2CCN(c3ccc(Cl)cc3)CC2)nc2ccccc21, the molecule has a octanol\/water distribution coefficient (logD at pH 7.4) of 3.540 (dimensionless)."} {"text":"Based on the SMILES CN1C(=O)C=C(CCc2ccc3ccccc3c2)N=C1N, the molecule has a octanol\/water distribution coefficient of 2.700 (dimensionless)."}", "/scratch/micpie/export/lipophilicity/train_0-12.jsonl": "{"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should have a logD at pH 7.4 of 3.540 (dimensionless).\nAssistant: Got it, this canonical SMILES represents a molecule that has a logD at pH 7.4 of 3.540 (dimensionless): Cn1c(CN2CCN(c3ccc(Cl)cc3)CC2)nc2ccccc21"} {"text":"User: I want to generate a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should have a octanol\/water distribution coefficient (logD at pH 7.4) of 2.700 (dimensionless).\nAssistant: Got it, here you go, this canonical SMILES represents a molecule that has a octanol\/water distribution coefficient (logD at pH 7.4) of 2.700 (dimensionless): Cn1c(N)nc(CCc2ccc3ccccc3c2)cc1=O"}", "/scratch/micpie/export/lipophilicity/test_0-13.jsonl": "{"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should have a octanol\/water distribution coefficient of -1.180 (dimensionless).\nAssistant: Ok, this SELFIES represents a molecule that has a octanol\/water distribution coefficient of -1.180 (dimensionless): [C][O][C][=C][C][Branch1][Ring1][O][C][=C][Branch2][Ring1][C][C][=C][Ring1][Branch2][N][C][=Branch1][C][=O][C][S][C][C][=Branch1][C][=O][O][S][=Branch1][C][=O][=Branch1][C][=O][N][C][Branch1][C][C][C][C][C][=C][C][=C][C][=C][Ring1][O][Ring1][=Branch1]"} {"text":"User: I want to come up with a InChI.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should have a logD at pH 7.4 of 2.100 (dimensionless).\nAssistant: Got it, this InChI represents a molecule that has a logD at pH 7.4 of 2.100 (dimensionless): InChI=1S\/C8H8N2O\/c1-11-8-4-2-3-7-6(8)5-9-10-7\/h2-5H,1H3,(H,9,10)"}", "/scratch/micpie/export/lipophilicity/valid_0-2.jsonl": "{"text":"The molecule with the InChI representation of InChI=1S\/C6H11NO\/c8-6-4-2-1-3-5-7-6\/h1-5H2,(H,7,8) has a logD at pH 7.4 of -0.080 (dimensionless)."} {"text":"The molecule with the DeepSMILES OCCccccNC=O)cccccCl)ccc6[nH]9)))))))))))cc6 has a octanol\/water distribution coefficient (logD at pH 7.4) of 3.850 (dimensionless)."}", "/scratch/micpie/export/lipophilicity/valid_0-1.jsonl": "{"text":"Question: Please estimate the octanol\/water distribution coefficient of O=CCCCCCN7 by picking one choice of a, b, or c.\nOptions:\n[a] -0.080\n[b] 3.0\n[c] 0.8\nAnswer: a"} {"text":"Question: Please estimate the octanol\/water distribution coefficient of InChI=1S\/C17H15ClN2O2\/c18-13-3-6-15-12(9-13)10-16(20-15)17(22)19-14-4-1-11(2-5-14)7-8-21\/h1-6,9-10,20-21H,7-8H2,(H,19,22) by picking one choice of a, b, or c.\nOptions:\na) 0.81\nb) 3.850\nc) 0.71\nAnswer: b"}", "/scratch/micpie/export/lipophilicity/valid_0-13.jsonl": "{"text":"User: I want to create a InChI.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should have a octanol\/water distribution coefficient of -0.080 (dimensionless).\nAssistant: Understood, this InChI represents a molecule that has a octanol\/water distribution coefficient of -0.080 (dimensionless): InChI=1S\/C6H11NO\/c8-6-4-2-1-3-5-7-6\/h1-5H2,(H,7,8)"} {"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should have a logD at pH 7.4 of 3.850 (dimensionless).\nAssistant: Understood, this SMILES represents a molecule that has a logD at pH 7.4 of 3.850 (dimensionless): OCCc1ccc(NC(=O)c2cc3cc(Cl)ccc3[nH]2)cc1"}", "/scratch/micpie/export/lipophilicity/valid_0-5.jsonl": "{"text":"The molecule with the SMILES O=C1CCCCCN1 has a logD at pH 7.4 of -0.080 (dimensionless)."} {"text":"The molecule with the canonical SMILES O=C(Nc1ccc(CCO)cc1)c1cc2cc(Cl)ccc2[nH]1 has a octanol\/water distribution coefficient of 3.850 (dimensionless)."}", "/scratch/micpie/export/lipophilicity/valid_0-4.jsonl": "{"text":"The DeepSMILES O=CCCCCCN7 represents a molecule with a octanol\/water distribution coefficient (logD at pH 7.4) of -0.080 (dimensionless)."} {"text":"The DeepSMILES OCCccccNC=O)cccccCl)ccc6[nH]9)))))))))))cc6 represents a molecule with a octanol\/water distribution coefficient (logD at pH 7.4) of 3.850 (dimensionless)."}", "/scratch/micpie/export/lipophilicity/train_0-5.jsonl": "{"text":"The molecule with the SELFIES [C][N][C][Branch2][Ring1][#Branch1][C][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][=N][C][=C][C][=C][C][=C][Ring2][Ring1][#Branch1][Ring1][=Branch1] has a octanol\/water distribution coefficient of 3.540 (dimensionless)."} {"text":"The molecule with the SMILES CN1C(=O)C=C(CCc2ccc3ccccc3c2)N=C1N has a octanol\/water distribution coefficient of 2.700 (dimensionless)."}", "/scratch/micpie/export/lipophilicity/valid_0-12.jsonl": "{"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should have a octanol\/water distribution coefficient of -0.080 (dimensionless).\nAssistant: Got it, here you go, this DeepSMILES represents a molecule that has a octanol\/water distribution coefficient of -0.080 (dimensionless): O=CCCCCCN7"} {"text":"User: I want to create a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should have a octanol\/water distribution coefficient of 3.850 (dimensionless).\nAssistant: Got it, this canonical SMILES represents a molecule that has a octanol\/water distribution coefficient of 3.850 (dimensionless): O=C(Nc1ccc(CCO)cc1)c1cc2cc(Cl)ccc2[nH]1"}", "/scratch/micpie/export/lipophilicity/train_0-2.jsonl": "{"text":"The molecule with the InChI representation of InChI=1S\/C19H21ClN4\/c1-22-18-5-3-2-4-17(18)21-19(22)14-23-10-12-24(13-11-23)16-8-6-15(20)7-9-16\/h2-9H,10-14H2,1H3 has a logD at pH 7.4 of 3.540 (dimensionless)."} {"text":"The molecule with the canonical SMILES Cn1c(N)nc(CCc2ccc3ccccc3c2)cc1=O has a octanol\/water distribution coefficient of 2.700 (dimensionless)."}", "/scratch/micpie/export/lipophilicity/test_0-11.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that has a logD at pH 7.4 of -1.180 (dimensionless).\nAssistant: This is a molecule that has a logD at pH 7.4 of -1.180 (dimensionless): COc1cc(OC)c(cc1NC(=O)CSCC(=O)O)S(=O)(=O)N2C(C)CCc3ccccc23"} {"text":"User: I'm searching for the DeepSMILES of a molecule that has a octanol\/water distribution coefficient of 2.100 (dimensionless).\nAssistant: This is a molecule that has a octanol\/water distribution coefficient of 2.100 (dimensionless): COccccc[nH]ncc95"}", "/scratch/micpie/export/lipophilicity/train_0-7.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the octanol\/water distribution coefficient (logD at pH 7.4) in (dimensionless).\nMolecule canonical SMILES: 
Cn1c(CN2CCN(c3ccc(Cl)cc3)CC2)nc2ccccc21\nConstraint: Even if you are uncertain, you must answer with a numeric value in (dimensionless) without the unit and without using any other words.\nResult: 3.540"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the octanol\/water distribution coefficient in (dimensionless).\nMolecule DeepSMILES: CNC=O)C=CCCccccccccc6c%10))))))))))))N=C6N\nConstraint: Even if you are not sure, you must answer with a numeric value in (dimensionless) without the unit and without using any additional words.\nResult: 2.700"}", "/scratch/micpie/export/lipophilicity/train_0-11.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that has a octanol\/water distribution coefficient of 3.540 (dimensionless).\nAssistant: This is a molecule that has a octanol\/water distribution coefficient of 3.540 (dimensionless): Cn1c(CN2CCN(c3ccc(Cl)cc3)CC2)nc2ccccc21"} {"text":"User: I'm looking for the InChI of a molecule that has a octanol\/water distribution coefficient of 2.700 (dimensionless).\nAssistant: This is a molecule that has a octanol\/water distribution coefficient of 2.700 (dimensionless): InChI=1S\/C17H17N3O\/c1-20-16(21)11-15(19-17(20)18)9-7-12-6-8-13-4-2-3-5-14(13)10-12\/h2-6,8,10-11H,7,9H2,1H3,(H2,18,19)"}", "/scratch/micpie/export/lipophilicity/train_0-1.jsonl": "{"text":"Question: Please estimate the octanol\/water distribution coefficient of InChI=1S\/C19H21ClN4\/c1-22-18-5-3-2-4-17(18)21-19(22)14-23-10-12-24(13-11-23)16-8-6-15(20)7-9-16\/h2-9H,10-14H2,1H3 by picking one choice of a, b, c, d, e, or f.\nOptions:\na) -1.36\nb) 2.93\nc) 2.16\nd) 0.32\ne) 2.08\nf) 3.540\nAnswer: f"} {"text":"Question: Please estimate the logD at pH 7.4 of CN1C(=O)C=C(CCc2ccc3ccccc3c2)N=C1N by picking one choice of a, b, or c.\nOptions:\na: 2.22\nb: 1.88\nc: 2.700\nAnswer: c"}", "/scratch/micpie/export/lipophilicity/train_0-13.jsonl": "{"text":"User: I want to create a InChI.\nAssistant: This 
sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should have a octanol\/water distribution coefficient (logD at pH 7.4) of 3.540 (dimensionless).\nAssistant: Understood, this InChI represents a molecule that has a octanol\/water distribution coefficient (logD at pH 7.4) of 3.540 (dimensionless): InChI=1S\/C19H21ClN4\/c1-22-18-5-3-2-4-17(18)21-19(22)14-23-10-12-24(13-11-23)16-8-6-15(20)7-9-16\/h2-9H,10-14H2,1H3"} {"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should have a logD at pH 7.4 of 2.700 (dimensionless).\nAssistant: Understood, this DeepSMILES represents a molecule that has a logD at pH 7.4 of 2.700 (dimensionless): CNC=O)C=CCCccccccccc6c%10))))))))))))N=C6N"}", "/scratch/micpie/export/lipophilicity/train_0-4.jsonl": "{"text":"The SELFIES [C][N][C][Branch2][Ring1][#Branch1][C][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][=N][C][=C][C][=C][C][=C][Ring2][Ring1][#Branch1][Ring1][=Branch1] is representing a molecule that has a octanol\/water distribution coefficient of 3.540 (dimensionless)."} {"text":"The InChI InChI=1S\/C17H17N3O\/c1-20-16(21)11-15(19-17(20)18)9-7-12-6-8-13-4-2-3-5-14(13)10-12\/h2-6,8,10-11H,7,9H2,1H3,(H2,18,19) represents a molecule with a octanol\/water distribution coefficient (logD at pH 7.4) of 2.700 (dimensionless)."}", "/scratch/micpie/export/lipophilicity/test_0-7.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the octanol\/water distribution coefficient in (dimensionless).\nMolecule SELFIES: [C][O][C][=C][C][Branch1][Ring1][O][C][=C][Branch2][Ring1][C][C][=C][Ring1][Branch2][N][C][=Branch1][C][=O][C][S][C][C][=Branch1][C][=O][O][S][=Branch1][C][=O][=Branch1][C][=O][N][C][Branch1][C][C][C][C][C][=C][C][=C][C][=C][Ring1][O][Ring1][=Branch1]\nConstraint: Even if you are not 
sure, you must answer with a numeric value in (dimensionless) without the unit and without using any other words.\nResult: -1.180"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the octanol\/water distribution coefficient in (dimensionless).\nInChI: InChI=1S\/C8H8N2O\/c1-11-8-4-2-3-7-6(8)5-9-10-7\/h2-5H,1H3,(H,9,10)\nConstraint: Even if you are not sure, you must answer with a numeric value in (dimensionless) without the unit and without using any other words.\nResult: 2.100"}", "/scratch/micpie/export/lipophilicity/train_0-9.jsonl": "{"text":"User: Can you derive the octanol\/water distribution coefficient in (dimensionless) of the molecule with the SELFIES [C][N][C][Branch2][Ring1][#Branch1][C][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][=N][C][=C][C][=C][C][=C][Ring2][Ring1][#Branch1][Ring1][=Branch1]?\nAssistant: Yes, I'm happy to help, this molecule has a octanol\/water distribution coefficient of 3.540 (dimensionless)."} {"text":"User: Can you derive the octanol\/water distribution coefficient (logD at pH 7.4) in (dimensionless) of the molecule with the canonical SMILES Cn1c(N)nc(CCc2ccc3ccccc3c2)cc1=O?\nAssistant: Yes, this molecule has a octanol\/water distribution coefficient (logD at pH 7.4) of 2.700 (dimensionless)."}", "/scratch/micpie/export/lipophilicity/valid_0-3.jsonl": "{"text":"Based on the canonical SMILES representation of O=C1CCCCCN1, the molecule has a octanol\/water distribution coefficient of -0.080 (dimensionless)."} {"text":"Based on the SELFIES [O][C][C][C][=C][C][=C][Branch2][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=C][C][=C][C][Branch1][C][Cl][=C][C][=C][Ring1][#Branch1][NH1][Ring1][#Branch2][C][=C][Ring2][Ring1][Ring1], the molecule has a octanol\/water distribution coefficient (logD at pH 7.4) of 3.850 (dimensionless)."}", "/scratch/micpie/export/lipophilicity/test_0-8.jsonl": "{"text":"Task: Please create a SELFIES 
based on the description.\nDescription: A molecule that has a octanol\/water distribution coefficient of -1.180 (dimensionless).\nResult: [C][O][C][=C][C][Branch1][Ring1][O][C][=C][Branch2][Ring1][C][C][=C][Ring1][Branch2][N][C][=Branch1][C][=O][C][S][C][C][=Branch1][C][=O][O][S][=Branch1][C][=O][=Branch1][C][=O][N][C][Branch1][C][C][C][C][C][=C][C][=C][C][=C][Ring1][O][Ring1][=Branch1]"} {"text":"Task: Please create a InChI based on the text description below.\nDescription: A molecule that has a octanol\/water distribution coefficient of 2.100 (dimensionless).\nResult: InChI=1S\/C8H8N2O\/c1-11-8-4-2-3-7-6(8)5-9-10-7\/h2-5H,1H3,(H,9,10)"}", "/scratch/micpie/export/lipophilicity/test_0-4.jsonl": "{"text":"The canonical SMILES COc1cc(OC)c(S(=O)(=O)N2c3ccccc3CCC2C)cc1NC(=O)CSCC(=O)O represents a molecule that has a logD at pH 7.4 of -1.180 (dimensionless)."} {"text":"The canonical SMILES COc1cccc2[nH]ncc12 represents a molecule that has a octanol\/water distribution coefficient (logD at pH 7.4) of 2.100 (dimensionless)."}", "/scratch/micpie/export/lipophilicity/test_0-12.jsonl": "{"text":"User: I want to create a molecule SELFIES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should have a octanol\/water distribution coefficient (logD at pH 7.4) of -1.180 (dimensionless).\nAssistant: Got it, here you go, this SELFIES represents a molecule that has a octanol\/water distribution coefficient (logD at pH 7.4) of -1.180 (dimensionless): [C][O][C][=C][C][Branch1][Ring1][O][C][=C][Branch2][Ring1][C][C][=C][Ring1][Branch2][N][C][=Branch1][C][=O][C][S][C][C][=Branch1][C][=O][O][S][=Branch1][C][=O][=Branch1][C][=O][N][C][Branch1][C][C][C][C][C][=C][C][=C][C][=C][Ring1][O][Ring1][=Branch1]"} {"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should have a logD at pH 7.4 of 2.100 (dimensionless).\nAssistant: Got it, this DeepSMILES represents a molecule that has a logD at pH 7.4 of 2.100 (dimensionless): COccccc[nH]ncc95"}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/test_0-10.jsonl": "{"text":"User: I'm searching for the DeepSMILES of a molecule that is not inhibiting the activity of cav3 t-type calcium channels?\nAssistant: This is a molecule that is not inhibiting the activity of cav3 t-type calcium channels: S=O)=O)NCCCCC6)))C=O)NccccOCF)F)F)))cc6)))))))))))ccccnc6"} {"text":"User: I'm looking for the InChI of a molecule that is not inhibiting the activity of cav3 t-type calcium channels?\nAssistant: This is a molecule that is not inhibiting the activity of cav3 t-type calcium channels: InChI=1S\/C23H21N3O4S\/c1-30-20-10-9-18-12-19(23(27)25-22(18)13-20)16-26(15-17-6-5-11-24-14-17)31(28,29)21-7-3-2-4-8-21\/h2-14H,15-16H2,1H3,(H,25,27)"}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/valid_0-8.jsonl": "{"text":"User: Is the molecule with the DeepSMILES scccc5N))COCC)C)))=O)))C))C=O)N inhibiting the activity of cav3 t-type calcium channels?\nAssistant: No, it is not inhibiting the activity of cav3 t-type calcium channels."} {"text":"User: Is the molecule with the DeepSMILES S=CNccccNC=O)CC)C))))cc6)))))))NC=O)cccc[N+][O-])=O))cc6 inhibiting the activity of cav3 t-type calcium channels?\nAssistant: No, it is not inhibiting the activity of cav3 t-type calcium channels."}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/train_0-8.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C19H21Cl2N3O3S\/c1-13(2)15-9-7-14(8-10-15)11-22-23-18(25)12-24(28(3,26)27)17-6-4-5-16(20)19(17)21\/h4-11,13H,12H2,1-3H3,(H,23,25)\/b22-11+ inhibiting the activity of cav3 t-type calcium channels?\nAssistant: No, it is not inhibiting the activity of cav3 t-type calcium channels."} {"text":"User: Is the molecule with the SELFIES 
[S][C][C][C][Branch2][Ring2][Ring1][C][C][C][=Ring1][=Branch1][C][=Branch1][Ring2][=C][Ring1][=Branch2][C][Branch2][Ring1][Ring1][O][C][=C][C][Branch1][Branch2][N][N][=N][N][=C][Ring1][Branch1][=C][C][=C][Ring1][O][=O][C] inhibiting the activity of cav3 t-type calcium channels?\nAssistant: No, it is not inhibiting the activity of cav3 t-type calcium channels."}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of cav3 t-type calcium channels.\nMolecule canonical SMILES: O=C(Nc1ccc(OC(F)(F)F)cc1)C1CCCN(S(=O)(=O)c2cccnc2)C1\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not inhibiting the activity of cav3 t-type calcium channels."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of cav3 t-type calcium channels.\nInChI: InChI=1S\/C23H21N3O4S\/c1-30-20-10-9-18-12-19(23(27)25-22(18)13-20)16-26(15-17-6-5-11-24-14-17)31(28,29)21-7-3-2-4-8-21\/h2-14H,15-16H2,1H3,(H,25,27)\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting the activity of cav3 t-type calcium channels."}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/valid_0-9.jsonl": "{"text":"User: Can you generate the SELFIES of a molecule that is not inhibiting the activity of cav3 t-type calcium channels?\nAssistant: Of course, here you go: [S][C][=Branch2][Ring1][#Branch1][=C][Branch2][Ring1][C][C][=Branch1][Branch1][=C][Ring1][Branch1][N][C][Branch1][#Branch1][O][C][Branch1][C][C][C][=O][C][C][=Branch1][C][=O][N]"} {"text":"User: Can you create the InChI of a molecule that is not inhibiting the activity of cav3 t-type calcium channels?\nAssistant: Yes, here you go: 
InChI=1S\/C18H18N4O4S\/c1-11(2)16(23)19-13-5-7-14(8-6-13)20-18(27)21-17(24)12-3-9-15(10-4-12)22(25)26\/h3-11H,1-2H3,(H,19,23)(H2,20,21,24,27)"}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/test_0-1.jsonl": "{"text":"Based on the canonical SMILES O=C(Nc1ccc(OC(F)(F)F)cc1)C1CCCN(S(=O)(=O)c2cccnc2)C1, the molecule exhibits no inhibition of the cav3 t-type calcium channel activity."} {"text":"Based on the InChI representation InChI=1S\/C23H21N3O4S\/c1-30-20-10-9-18-12-19(23(27)25-22(18)13-20)16-26(15-17-6-5-11-24-14-17)31(28,29)21-7-3-2-4-8-21\/h2-14H,15-16H2,1H3,(H,25,27), the molecule displays no inhibition of the cav3 t-type calcium channel activity."}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/valid_0-0.jsonl": "{"text":"The molecule with the canonical SMILES Cc1c(C(N)=O)sc(N)c1C(=O)OC(C)C displays no inhibition of the cav3 t-type calcium channel activity."} {"text":"The molecule with the canonical SMILES representation of CC(C)C(=O)Nc1ccc(NC(=S)NC(=O)c2ccc([N+](=O)[O-])cc2)cc1 exhibits no inhibition of the cav3 t-type calcium channel activity."}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/test_0-2.jsonl": "{"text":"The SMILES S(=O)(=O)(N1CC(CCC1)C(=O)Nc1ccc(OC(F)(F)F)cc1)c1cccnc1 is from a molecule that displays no inhibition of the cav3 t-type calcium channel activity."} {"text":"The InChI InChI=1S\/C23H21N3O4S\/c1-30-20-10-9-18-12-19(23(27)25-22(18)13-20)16-26(15-17-6-5-11-24-14-17)31(28,29)21-7-3-2-4-8-21\/h2-14H,15-16H2,1H3,(H,25,27) is from a molecule that shows no inhibition of the cav3 t-type calcium channel activity."}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/valid_0-10.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that is not inhibiting the activity of cav3 t-type calcium channels?\nAssistant: This is a molecule that is not inhibiting the activity of cav3 t-type calcium channels: s1c(c(c(c1N)C(OC(C)C)=O)C)C(=O)N"} {"text":"User: I'm looking for the 
SMILES of a molecule that is not inhibiting the activity of cav3 t-type calcium channels?\nAssistant: This is a molecule that is not inhibiting the activity of cav3 t-type calcium channels: S=C(Nc1ccc(NC(=O)C(C)C)cc1)NC(=O)c1ccc([N+]([O-])=O)cc1"}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/train_0-6.jsonl": "{"text":"Task: Please generate a molecule SELFIES based on the text description below.\nDescription: A molecule that is inhibiting the activity of cav3 t-type calcium channels.\nResult: [Cl][C][=C][Branch2][Ring2][Ring1][N][Branch1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][C][C][C][=Branch1][C][=O][N][\\N][=C][\\C][=C][C][=C][Branch1][=Branch1][C][Branch1][C][C][C][C][=C][Ring1][=Branch2][C][=C][C][=C][Ring2][Ring1][#Branch2][Cl]"} {"text":"Task: Please create a molecule InChI based on the description below.\nDescription: A molecule that is inhibiting the activity of cav3 t-type calcium channels.\nResult: InChI=1S\/C17H16N4O2S\/c1-11-5-6-14-15(9-24-16(14)7-11)17(22)23-13-4-2-3-12(8-13)21-10-18-19-20-21\/h2-4,8-11H,5-7H2,1H3"}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/valid_0-6.jsonl": "{"text":"Task: Please generate a SMILES based on the text description below.\nDescription: A molecule that is inhibiting the activity of cav3 t-type calcium channels.\nResult: s1c(c(c(c1N)C(OC(C)C)=O)C)C(=O)N"} {"text":"Task: Please create a SMILES based on the description below.\nDescription: A molecule that is inhibiting the activity of cav3 t-type calcium channels.\nResult: S=C(Nc1ccc(NC(=O)C(C)C)cc1)NC(=O)c1ccc([N+]([O-])=O)cc1"}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/test_0-9.jsonl": "{"text":"User: Can you create the SELFIES of a molecule that is not inhibiting the activity of cav3 t-type calcium channels?\nAssistant: Of course, here you go: 
[S][=Branch1][C][=O][=Branch1][C][=O][Branch2][Ring2][Ring1][N][C][C][Branch1][=Branch1][C][C][C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][#Branch2][O][C][Branch1][C][F][Branch1][C][F][F][C][=C][Ring1][O][C][=C][C][=C][N][=C][Ring1][=Branch1]"} {"text":"User: Can you create the canonical SMILES of a molecule that is not inhibiting the activity of cav3 t-type calcium channels?\nAssistant: Yes, here you go: COc1ccc2cc(CN(Cc3cccnc3)S(=O)(=O)c3ccccc3)c(=O)[nH]c2c1"}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/test_0-0.jsonl": "{"text":"The molecule with the SMILES S(=O)(=O)(N1CC(CCC1)C(=O)Nc1ccc(OC(F)(F)F)cc1)c1cccnc1 exhibits no inhibition of the cav3 t-type calcium channel activity."} {"text":"The molecule with the InChI InChI=1S\/C23H21N3O4S\/c1-30-20-10-9-18-12-19(23(27)25-22(18)13-20)16-26(15-17-6-5-11-24-14-17)31(28,29)21-7-3-2-4-8-21\/h2-14H,15-16H2,1H3,(H,25,27) displays no inhibition of the cav3 t-type calcium channel activity."}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/valid_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the SELFIES [S][C][=Branch2][Ring1][#Branch1][=C][Branch2][Ring1][C][C][=Branch1][Branch1][=C][Ring1][Branch1][N][C][Branch1][#Branch1][O][C][Branch1][C][C][C][=O][C][C][=Branch1][C][=O][N] is inhibiting the activity of cav3 t-type calcium channels?\nAssistant: No, this molecule is not inhibiting the activity of cav3 t-type calcium channels."} {"text":"User: Can you estimate if the molecule with the canonical SMILES CC(C)C(=O)Nc1ccc(NC(=S)NC(=O)c2ccc([N+](=O)[O-])cc2)cc1 is inhibiting the activity of cav3 t-type calcium channels?\nAssistant: No, this molecule is not inhibiting the activity of cav3 t-type calcium channels."}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/test_0-3.jsonl": "{"text":"The SELFIES 
[S][=Branch1][C][=O][=Branch1][C][=O][Branch2][Ring2][Ring1][N][C][C][Branch1][=Branch1][C][C][C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][#Branch2][O][C][Branch1][C][F][Branch1][C][F][F][C][=C][Ring1][O][C][=C][C][=C][N][=C][Ring1][=Branch1] is not inhibiting the activity of cav3 t-type calcium channels."} {"text":"The SMILES S(=O)(=O)(N(Cc1cc2c([nH]c1=O)cc(OC)cc2)Cc1cccnc1)c1ccccc1 is not inhibiting the activity of cav3 t-type calcium channels."}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/valid_0-11.jsonl": "{"text":"User: I want to create a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be inhibiting the activity of cav3 t-type calcium channels.\nAssistant: Got it, this canonical SMILES is not inhibiting the activity of cav3 t-type calcium channels: Cc1c(C(N)=O)sc(N)c1C(=O)OC(C)C"} {"text":"User: I want to generate a molecule InChI.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should not be inhibiting the activity of cav3 t-type calcium channels.\nAssistant: Got it, this InChI is not inhibiting the activity of cav3 t-type calcium channels: InChI=1S\/C18H18N4O4S\/c1-11(2)16(23)19-13-5-7-14(8-6-13)20-18(27)21-17(24)12-3-9-15(10-4-12)22(25)26\/h3-11H,1-2H3,(H,19,23)(H2,20,21,24,27)"}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/train_0-0.jsonl": "{"text":"The molecule with the DeepSMILES ClccNS=O)=O)C))CC=O)N\\N=C\\ccccCC)C))cc6))))))))))))cccc6Cl displays no inhibition of the cav3 t-type calcium channel activity."} {"text":"The molecule with the canonical SMILES representation of CC1CCc2c(C(=O)Oc3cccc(-n4cnnn4)c3)csc2C1 shows no inhibition of the cav3 t-type calcium channel activity."}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/test_0-6.jsonl": "{"text":"Task: Please create a molecule canonical SMILES based on the text description below.\nDescription: A molecule that is inhibiting the activity of cav3 t-type calcium channels.\nResult: O=C(Nc1ccc(OC(F)(F)F)cc1)C1CCCN(S(=O)(=O)c2cccnc2)C1"} {"text":"Task: Please give me a molecule canonical SMILES based on the description below.\nDescription: A molecule that is inhibiting the activity of cav3 t-type calcium channels.\nResult: COc1ccc2cc(CN(Cc3cccnc3)S(=O)(=O)c3ccccc3)c(=O)[nH]c2c1"}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/train_0-10.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that is not inhibiting the activity of cav3 t-type calcium channels?\nAssistant: This is a molecule that is not inhibiting the activity of cav3 t-type calcium channels: CC(C)c1ccc(\/C=N\/NC(=O)CN(c2cccc(Cl)c2Cl)S(C)(=O)=O)cc1"} {"text":"User: I'm searching for the canonical SMILES of a molecule that is not inhibiting the activity of cav3 t-type calcium channels?\nAssistant: This is a molecule that is not inhibiting the activity of cav3 t-type calcium channels: CC1CCc2c(C(=O)Oc3cccc(-n4cnnn4)c3)csc2C1"}", 
"/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/train_0-3.jsonl": "{"text":"The canonical SMILES CC(C)c1ccc(\/C=N\/NC(=O)CN(c2cccc(Cl)c2Cl)S(C)(=O)=O)cc1 is not inhibiting the activity of cav3 t-type calcium channels."} {"text":"The molecule SELFIES [S][C][C][C][Branch2][Ring2][Ring1][C][C][C][=Ring1][=Branch1][C][=Branch1][Ring2][=C][Ring1][=Branch2][C][Branch2][Ring1][Ring1][O][C][=C][C][Branch1][Branch2][N][N][=N][N][=C][Ring1][Branch1][=C][C][=C][Ring1][O][=O][C] is not inhibiting the activity of cav3 t-type calcium channels."}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/train_0-12.jsonl": "{"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be inhibiting the activity of cav3 t-type calcium channels.\nAssistant: Got it, this canonical SMILES is not inhibiting the activity of cav3 t-type calcium channels: CC(C)c1ccc(\/C=N\/NC(=O)CN(c2cccc(Cl)c2Cl)S(C)(=O)=O)cc1"} {"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should not be inhibiting the activity of cav3 t-type calcium channels.\nAssistant: Understood, this SELFIES is not inhibiting the activity of cav3 t-type calcium channels: [S][C][C][C][Branch2][Ring2][Ring1][C][C][C][=Ring1][=Branch1][C][=Branch1][Ring2][=C][Ring1][=Branch2][C][Branch2][Ring1][Ring1][O][C][=C][C][Branch1][Branch2][N][N][=N][N][=C][Ring1][Branch1][=C][C][=C][Ring1][O][=O][C]"}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the activity of cav3 t-type calcium channels?\nConstraint: You must select none, one or more options from A, B, C, or D without using any other words.\nOptions:\nA: S(=O)(=O)(N1CC(CCC1)C(=O)Nc1ccc(OC(F)(F)F)cc1)c1cccnc1\nB: O=C(N1CCN(CC1)C(=O)Nc1cc(OC)ccc1)C(NC(=O)C)Cc1cc(OC)c(OC)cc1\nC: O(c1cc(CCNC(=O)Cn2nc(nn2)c2c(OC)c(OC)ccc2)ccc1OC)C\nD: S(=O)(=O)(N1CCCC1)c1ccc(cc1)C(=O)CSc1n(CCOC)cnn1\nAnswer: A, B, C, D"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the activity of cav3 t-type calcium channels?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any additional words.\nOptions:\n(1) SCcccOcccccc6)))))))ccc6)))))))cnc[nH]n5))C\n(2) S=O)=O)NCcccc[nH]c6=O)))ccOC))cc6)))))))))Cccccnc6))))))))cccccc6\n(3) OCCNcnncnc5ncc9CCCC6)C)))))))))))))CC6C)))))C\n(4) ClcccC=O)CON=CC5)cccccc6)))))))))COc6cc%10))))CCNCC6))C=O)CC\nAnswer: 1, 2, 3, 4"}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/valid_0-2.jsonl": "{"text":"The SMILES s1c(c(c(c1N)C(OC(C)C)=O)C)C(=O)N is from a molecule that shows no inhibition of the cav3 t-type calcium channel activity."} {"text":"The InChI InChI=1S\/C18H18N4O4S\/c1-11(2)16(23)19-13-5-7-14(8-6-13)20-18(27)21-17(24)12-3-9-15(10-4-12)22(25)26\/h3-11H,1-2H3,(H,19,23)(H2,20,21,24,27) is from a molecule that 
exhibits no inhibition of the cav3 t-type calcium channel activity."}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/valid_0-1.jsonl": "{"text":"Based on the SELFIES [S][C][=Branch2][Ring1][#Branch1][=C][Branch2][Ring1][C][C][=Branch1][Branch1][=C][Ring1][Branch1][N][C][Branch1][#Branch1][O][C][Branch1][C][C][C][=O][C][C][=Branch1][C][=O][N], the molecule displays no inhibition of the cav3 t-type calcium channel activity."} {"text":"Based on the SELFIES [S][=C][Branch2][Ring1][=Branch1][N][C][=C][C][=C][Branch1][O][N][C][=Branch1][C][=O][C][Branch1][C][C][C][C][=C][Ring1][N][N][C][=Branch1][C][=O][C][=C][C][=C][Branch1][=Branch1][N+1][Branch1][C][O-1][=O][C][=C][Ring1][=Branch2], the molecule exhibits no inhibition of the cav3 t-type calcium channel activity."}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the activity of cav3 t-type calcium channels?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any other words.\nOptions:\n1 [O][=C][Branch2][Ring1][Branch1][N][N][\\C][=C][\\C][=Branch1][=Branch1][=N][N][=C][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][N][=C][Ring1][=Branch1]\n2 [S][C][=Branch2][Ring1][#Branch1][=C][Branch2][Ring1][C][C][=Branch1][Branch1][=C][Ring1][Branch1][N][C][Branch1][#Branch1][O][C][Branch1][C][C][C][=O][C][C][=Branch1][C][=O][N]\n3 [S][=C][N][Branch1][P][C][N][=C][Branch1][Ring2][N][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]\n4 [O][C][=Branch2][Ring1][N][=C][Branch2][Ring1][Ring1][C][=Branch1][C][=O][N][C][=C][Branch1][Ring1][C][C][C][=C][C][=C][Ring1][Branch2][C][C][=C][Ring1][P][C][C]\nAnswer: 1, 2, 3, 4"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the activity of cav3 t-type calcium channels?\nConstraint: 
You must select none, one or more options from 1, 2, or 3 without using any additional words.\nOptions:\n(1) ClCccCCC=6\/C=N\\OC=O)cccccc6))C))))))))))))cccc6\n(2) S=CNccccNC=O)CC)C))))cc6)))))))NC=O)cccc[N+][O-])=O))cc6\n(3) Scnnnn5)))cccNC=O)CC))))ccc6))))))))CC=O)NCcoccc5\nAnswer: 1, 2, 3"}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of cav3 t-type calcium channels.\nMolecule SMILES: s1c(c(c(c1N)C(OC(C)C)=O)C)C(=O)N\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not inhibiting the activity of cav3 t-type calcium channels."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of cav3 t-type calcium channels.\nSMILES: S=C(Nc1ccc(NC(=O)C(C)C)cc1)NC(=O)c1ccc([N+]([O-])=O)cc1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting the activity of cav3 t-type calcium channels."}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of cav3 t-type calcium channels.\nSELFIES: [S][C][=Branch2][Ring1][#Branch1][=C][Branch2][Ring1][C][C][=Branch1][Branch1][=C][Ring1][Branch1][N][C][Branch1][#Branch1][O][C][Branch1][C][C][C][=O][C][C][=Branch1][C][=O][N]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of cav3 t-type calcium channels.\nMolecule SELFIES: 
[S][=C][Branch2][Ring1][=Branch1][N][C][=C][C][=C][Branch1][O][N][C][=Branch1][C][=O][C][Branch1][C][C][C][C][=C][Ring1][N][N][C][=Branch1][C][=O][C][=C][C][=C][Branch1][=Branch1][N+1][Branch1][C][O-1][=O][C][=C][Ring1][=Branch2]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of cav3 t-type calcium channels.\nInChI: InChI=1S\/C19H21Cl2N3O3S\/c1-13(2)15-9-7-14(8-10-15)11-22-23-18(25)12-24(28(3,26)27)17-6-4-5-16(20)19(17)21\/h4-11,13H,12H2,1-3H3,(H,23,25)\/b22-11+\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting the activity of cav3 t-type calcium channels."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of cav3 t-type calcium channels.\nMolecule DeepSMILES: scCCCCc6cc9)COcccnnnnc5)))))ccc6)))))))=O))))))C\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting the activity of cav3 t-type calcium channels."}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/valid_0-12.jsonl": "{"text":"User: I want to generate a InChI.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be inhibiting the activity of cav3 t-type calcium channels.\nAssistant: Got it, this InChI is not inhibiting the activity of cav3 t-type calcium channels: InChI=1S\/C10H14N2O3S\/c1-4(2)15-10(14)6-5(3)7(8(11)13)16-9(6)12\/h4H,12H2,1-3H3,(H2,11,13)"} {"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should not be inhibiting the activity of cav3 t-type calcium channels.\nAssistant: Understood, this canonical SMILES is not inhibiting the activity of cav3 t-type calcium channels: CC(C)C(=O)Nc1ccc(NC(=S)NC(=O)c2ccc([N+](=O)[O-])cc2)cc1"}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/train_0-2.jsonl": "{"text":"The SMILES Clc1c(N(S(=O)(=O)C)CC(=O)N\\N=C\\c2ccc(C(C)C)cc2)cccc1Cl represents a molecule that shows no inhibition of the cav3 t-type calcium channel activity."} {"text":"The canonical SMILES CC1CCc2c(C(=O)Oc3cccc(-n4cnnn4)c3)csc2C1 represents a molecule that exhibits no inhibition of the cav3 t-type calcium channel activity."}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/test_0-11.jsonl": "{"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be inhibiting the activity of cav3 t-type calcium channels.\nAssistant: Got it, this canonical SMILES is not inhibiting the activity of cav3 t-type calcium channels: O=C(Nc1ccc(OC(F)(F)F)cc1)C1CCCN(S(=O)(=O)c2cccnc2)C1"} {"text":"User: I want to create a molecule SELFIES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should not be inhibiting the activity of cav3 t-type calcium channels.\nAssistant: Ok, this SELFIES is not inhibiting the activity of cav3 t-type calcium channels: [S][=Branch1][C][=O][=Branch1][C][=O][Branch2][Ring2][Ring2][N][Branch2][Ring1][#Branch1][C][C][=C][C][=C][Branch1][=Branch1][NH1][C][Ring1][=Branch1][=O][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][O][C][C][=C][C][=C][N][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1]"}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/train_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the canonical SMILES CC(C)c1ccc(\/C=N\/NC(=O)CN(c2cccc(Cl)c2Cl)S(C)(=O)=O)cc1 is inhibiting the activity of cav3 t-type calcium channels?\nAssistant: No, this molecule is not inhibiting the activity of cav3 t-type calcium channels."} {"text":"User: Can you derive if the molecule with the canonical SMILES CC1CCc2c(C(=O)Oc3cccc(-n4cnnn4)c3)csc2C1 is inhibiting the activity of cav3 t-type calcium channels?\nAssistant: No, this molecule is not inhibiting the activity of cav3 t-type calcium channels."}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/train_0-11.jsonl": "{"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be inhibiting the activity of cav3 t-type calcium channels.\nAssistant: Ok, here you go, this SMILES is not inhibiting the activity of cav3 t-type calcium channels: Clc1c(N(S(=O)(=O)C)CC(=O)N\\N=C\\c2ccc(C(C)C)cc2)cccc1Cl"} {"text":"User: I want to create a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should not be inhibiting the activity of cav3 t-type calcium channels.\nAssistant: Ok, this canonical SMILES is not inhibiting the activity of cav3 t-type calcium channels: CC1CCc2c(C(=O)Oc3cccc(-n4cnnn4)c3)csc2C1"}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/train_0-1.jsonl": "{"text":"Based on the canonical SMILES CC(C)c1ccc(\/C=N\/NC(=O)CN(c2cccc(Cl)c2Cl)S(C)(=O)=O)cc1, the molecule displays no inhibition of the cav3 t-type calcium channel activity."} {"text":"Based on the InChI InChI=1S\/C17H16N4O2S\/c1-11-5-6-14-15(9-24-16(14)7-11)17(22)23-13-4-2-3-12(8-13)21-10-18-19-20-21\/h2-4,8-11H,5-7H2,1H3, the molecule exhibits no inhibition of the cav3 t-type calcium channel activity."}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the activity of cav3 t-type calcium channels?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any additional words.\nOptions:\n1.) [Cl][C][=C][Branch2][Ring2][Ring1][N][Branch1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][C][C][C][=Branch1][C][=O][N][\\N][=C][\\C][=C][C][=C][Branch1][=Branch1][C][Branch1][C][C][C][C][=C][Ring1][=Branch2][C][=C][C][=C][Ring2][Ring1][#Branch2][Cl]\n2.) [Cl][C][=N][C][=C][Branch2][Branch1][S][C][C][Branch2][Ring2][#C][C][C][Branch2][Ring2][=Branch1][C][Ring1][Branch1][C][N][Branch1][N][C][Ring1][Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring2][Ring1][O][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][#N][C][=C][Ring2][Ring2][#Branch2]\n3.) 
[Br][C][C][=Branch2][Ring1][=N][=N][N][C][=Ring1][Branch1][N][=C][Branch1][=N][C][=C][Ring1][=Branch1][C][Branch1][C][F][Branch1][C][F][F][C][O][C][=C][C][=Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C][C][=C][Ring1][=Branch1][C][=C][C][=C][Ring1][=Branch1]\n4.) [Br][C][C][O][C][C][O][C][C][O][N][C][=Branch1][C][=O][C][=C][Branch1][Branch1][C][Ring1][=Branch1][=O][C][=C][C][=C][Ring1][Branch2]\nAnswer: 1, 2, 3, 4"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the activity of cav3 t-type calcium channels?\nConstraint: You must select none, one or more options from A or B without using any additional words.\nOptions:\nA: InChI=1S\/C17H16N4O2S\/c1-11-5-6-14-15(9-24-16(14)7-11)17(22)23-13-4-2-3-12(8-13)21-10-18-19-20-21\/h2-4,8-11H,5-7H2,1H3\nB: InChI=1S\/C20H21ClN2O4\/c1-26-17-9-4-8-16(19(17)27-2)20(25)23(13-22-11-5-10-18(22)24)15-7-3-6-14(21)12-15\/h3-4,6-9,12H,5,10-11,13H2,1-2H3\nAnswer: A, B"}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of cav3 t-type calcium channels.\nMolecule SELFIES: [Cl][C][=C][Branch2][Ring2][Ring1][N][Branch1][=Branch2][S][=Branch1][C][=O][=Branch1][C][=O][C][C][C][=Branch1][C][=O][N][\\N][=C][\\C][=C][C][=C][Branch1][=Branch1][C][Branch1][C][C][C][C][=C][Ring1][=Branch2][C][=C][C][=C][Ring2][Ring1][#Branch2][Cl]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of cav3 t-type calcium channels.\nInChI: InChI=1S\/C17H16N4O2S\/c1-11-5-6-14-15(9-24-16(14)7-11)17(22)23-13-4-2-3-12(8-13)21-10-18-19-20-21\/h2-4,8-11H,5-7H2,1H3\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using 
any other words.\nResult: False"}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/test_0-7.jsonl": "{"text":"User: Can you derive if the molecule with the DeepSMILES S=O)=O)NCCCCC6)))C=O)NccccOCF)F)F)))cc6)))))))))))ccccnc6 is inhibiting the activity of cav3 t-type calcium channels?\nAssistant: No, this molecule is not inhibiting the activity of cav3 t-type calcium channels."} {"text":"User: Can you derive if the molecule with the InChI InChI=1S\/C23H21N3O4S\/c1-30-20-10-9-18-12-19(23(27)25-22(18)13-20)16-26(15-17-6-5-11-24-14-17)31(28,29)21-7-3-2-4-8-21\/h2-14H,15-16H2,1H3,(H,25,27) is inhibiting the activity of cav3 t-type calcium channels?\nAssistant: No, this molecule is not inhibiting the activity of cav3 t-type calcium channels."}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/train_0-9.jsonl": "{"text":"User: Can you generate the canonical SMILES of a molecule that is not inhibiting the activity of cav3 t-type calcium channels?\nAssistant: Sure, here you go: CC(C)c1ccc(\/C=N\/NC(=O)CN(c2cccc(Cl)c2Cl)S(C)(=O)=O)cc1"} {"text":"User: Can you create the SELFIES of a molecule that is not inhibiting the activity of cav3 t-type calcium channels?\nAssistant: Yes, I'm happy to help, here you go: [S][C][C][C][Branch2][Ring2][Ring1][C][C][C][=Ring1][=Branch1][C][=Branch1][Ring2][=C][Ring1][=Branch2][C][Branch2][Ring1][Ring1][O][C][=C][C][Branch1][Branch2][N][N][=N][N][=C][Ring1][Branch1][=C][C][=C][Ring1][O][=O][C]"}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/valid_0-3.jsonl": "{"text":"The InChI InChI=1S\/C10H14N2O3S\/c1-4(2)15-10(14)6-5(3)7(8(11)13)16-9(6)12\/h4H,12H2,1-3H3,(H2,11,13) is not inhibiting the activity of cav3 t-type calcium channels."} {"text":"The molecule canonical SMILES CC(C)C(=O)Nc1ccc(NC(=S)NC(=O)c2ccc([N+](=O)[O-])cc2)cc1 is not inhibiting the activity of cav3 t-type calcium channels."}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/test_0-8.jsonl": "{"text":"User: Is the 
molecule with the SELFIES [S][=Branch1][C][=O][=Branch1][C][=O][Branch2][Ring2][Ring1][N][C][C][Branch1][=Branch1][C][C][C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][#Branch2][O][C][Branch1][C][F][Branch1][C][F][F][C][=C][Ring1][O][C][=C][C][=C][N][=C][Ring1][=Branch1] inhibiting the activity of cav3 t-type calcium channels?\nAssistant: No, it is not inhibiting the activity of cav3 t-type calcium channels."} {"text":"User: Is the molecule with the SELFIES [S][=Branch1][C][=O][=Branch1][C][=O][Branch2][Ring2][Ring2][N][Branch2][Ring1][#Branch1][C][C][=C][C][=C][Branch1][=Branch1][NH1][C][Ring1][=Branch1][=O][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][O][C][C][=C][C][=C][N][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1] inhibiting the activity of cav3 t-type calcium channels?\nAssistant: No, it is not inhibiting the activity of cav3 t-type calcium channels."}", "/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of cav3 t-type calcium channels.\nSELFIES: [S][=Branch1][C][=O][=Branch1][C][=O][Branch2][Ring2][Ring1][N][C][C][Branch1][=Branch1][C][C][C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][#Branch2][O][C][Branch1][C][F][Branch1][C][F][F][C][=C][Ring1][O][C][=C][C][=C][N][=C][Ring1][=Branch1]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of cav3 t-type calcium channels.\nMolecule SMILES: S(=O)(=O)(N(Cc1cc2c([nH]c1=O)cc(OC)cc2)Cc1cccnc1)c1ccccc1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", 
"/scratch/micpie/export/export/ype_calcium_channels_butkiewicz/test_0-12.jsonl": "{"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be inhibiting the activity of cav3 t-type calcium channels.\nAssistant: Got it, this DeepSMILES is not inhibiting the activity of cav3 t-type calcium channels: S=O)=O)NCCCCC6)))C=O)NccccOCF)F)F)))cc6)))))))))))ccccnc6"} {"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be inhibiting the activity of cav3 t-type calcium channels.\nAssistant: Got it, this DeepSMILES is not inhibiting the activity of cav3 t-type calcium channels: S=O)=O)NCcccc[nH]c6=O)))ccOC))cc6)))))))))Cccccnc6))))))))cccccc6"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/test_0-10.jsonl": "{"text":"User: Can you generate the canonical SMILES of a molecule that is not active against the tyrosyl-DNA phosphodiesterase receptor 1?\nAssistant: Of course, here you go: O=C1C2CCCCC2C(=O)N1c1cccc(OS(=O)(=O)c2ccccc2)c1"} {"text":"User: Can you generate the SELFIES of a molecule that is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)?\nAssistant: Yes, here you go: [F][C][=N][C][=Branch2][Ring1][Branch1][=N][C][Branch1][=C][O][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=C][Ring1][#C][N]"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/valid_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the DeepSMILES O=cnc=O)ncncnCCCC)C))))c95)))))Ccccccc6)))))))))CC=O)NCCCC)))C is active against the tyrosyl-DNA phosphodiesterase receptor 1?\nAssistant: No, this molecule is not active against the tyrosyl-DNA phosphodiesterase receptor 1."} {"text":"User: Can you figure out if the molecule with the SELFIES 
[O][=C][Branch2][Ring1][Ring1][N][C][Branch1][=C][C][C][C][C][Branch1][Ring2][C][Ring1][Branch1][C][C][Ring1][=Branch1][C][C][=C][N][Branch1][Branch1][N][=C][Ring1][Branch1][C][C] is active against the tyrosyl-DNA phosphodiesterase receptor 1?\nAssistant: No, this molecule is not active against the tyrosyl-DNA phosphodiesterase receptor 1."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/test_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)?\nConstraint: You must select none, one or more options from a or b without using any other words.\nOptions:\na) InChI=1S\/C20H19NO5S\/c22-19-17-11-4-5-12-18(17)20(23)21(19)14-7-6-8-15(13-14)26-27(24,25)16-9-2-1-3-10-16\/h1-3,6-10,13,17-18H,4-5,11-12H2\nb) InChI=1S\/C26H28FN3O3\/c27-19-10-8-18(9-11-19)25(31)28-23-21-6-2-3-7-22(21)33-24(23)26(32)30-16-12-20(13-17-30)29-14-4-1-5-15-29\/h2-3,6-11,20H,1,4-5,12-17H2,(H,28,31)\nAnswer: a, b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)?\nConstraint: You must select none, one or more options from a, b, or c without using any other words.\nOptions:\na.) Clc1ccc(c2nn(c(Nc3c(NC(=O)C(F)(F)F)cccc3)c2)CCC#N)cc1\nb.) S(c1cc(c2nc3n(c2NC2CCCCC2)ccc(c3)C)ccc1OC)CCCCCC\nc.) 
Fc1nc(nc(Oc2ccc(OC)cc2)c1)N\nAnswer: a, b, c"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/train_0-8.jsonl": "{"text":"User: Can you figure out if the molecule with the InChI InChI=1S\/C23H24N2O4\/c1-27-15-9-10-18(22(12-15)28-2)21-13-19(17-7-3-4-8-20(17)25-21)23(26)24-14-16-6-5-11-29-16\/h3-4,7-10,12-13,16H,5-6,11,14H2,1-2H3,(H,24,26) is active against the tyrosyl-DNA phosphodiesterase receptor 1?\nAssistant: No, this molecule is not active against the tyrosyl-DNA phosphodiesterase receptor 1."} {"text":"User: Can you estimate if the molecule with the DeepSMILES ClCCl)CC3)C=O)NCCCcc6cccF)c6))))))))C))))C is active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)?\nAssistant: No, this molecule is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1).\nMolecule InChI: InChI=1S\/C20H19NO5S\/c22-19-17-11-4-5-12-18(17)20(23)21(19)14-7-6-8-15(13-14)26-27(24,25)16-9-2-1-3-10-16\/h1-3,6-10,13,17-18H,4-5,11-12H2\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against the tyrosyl-DNA phosphodiesterase receptor 1.\nInChI: InChI=1S\/C11H10FN3O2\/c1-16-7-2-4-8(5-3-7)17-10-6-9(12)14-11(13)15-10\/h2-6H,1H3,(H2,13,14,15)\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/valid_0-9.jsonl": "{"text":"User: Is the molecule with the InChI 
InChI=1S\/C24H33N5O3\/c1-5-9-18(4)26-20(30)15-29-23(31)21-22(25-16-27(21)13-12-17(2)3)28(24(29)32)14-19-10-7-6-8-11-19\/h6-8,10-11,16-18H,5,9,12-15H2,1-4H3,(H,26,30) active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)?\nAssistant: No, it is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)."} {"text":"User: Is the molecule with the DeepSMILES O=CNCCCCCC5)CC5))))))C)))ccnnc5))CC active against the tyrosyl-DNA phosphodiesterase receptor 1?\nAssistant: No, it is not active against the tyrosyl-DNA phosphodiesterase receptor 1."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/test_0-1.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C20H19NO5S\/c22-19-17-11-4-5-12-18(17)20(23)21(19)14-7-6-8-15(13-14)26-27(24,25)16-9-2-1-3-10-16\/h1-3,6-10,13,17-18H,4-5,11-12H2 shows no inhibiting the human tyrosyl-DNA phosphodiesterase 1 (TDP1)."} {"text":"The molecule with the SMILES representation of Fc1nc(nc(Oc2ccc(OC)cc2)c1)N displays no inhibiting the human tyrosyl-DNA phosphodiesterase 1."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/valid_0-0.jsonl": "{"text":"The molecule with the SMILES representation of O=c1n(c(=O)n(c2ncn(CCC(C)C)c12)Cc1ccccc1)CC(=O)NC(CCC)C is not a tyrosyl-DNA phosphodiesterase 1 (TDP1) inhibitor."} {"text":"The molecule with the InChI representation of InChI=1S\/C15H23N3O\/c1-3-18-9-13(8-16-18)15(19)17-10(2)14-7-11-4-5-12(14)6-11\/h8-12,14H,3-7H2,1-2H3,(H,17,19) is not a tyrosyl-DNA phosphodiesterase 1 (TDP1) inhibitor."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/test_0-2.jsonl": "{"text":"Based on the DeepSMILES SOcccNC=O)CCCCCC6))))C5=O))))))ccc6)))))))=O)=O)cccccc6, the molecule has no active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1) features."} {"text":"Based on the DeepSMILES representation FcncncOccccOC))cc6)))))))c6)))N, the molecule has no active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1) features."}", 
"/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/valid_0-10.jsonl": "{"text":"User: Can you generate the DeepSMILES of a molecule that is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)?\nAssistant: Sure, here you go: O=cnc=O)ncncnCCCC)C))))c95)))))Ccccccc6)))))))))CC=O)NCCCC)))C"} {"text":"User: Can you create the SMILES of a molecule that is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)?\nAssistant: Yes, I'm happy to help, here you go: O=C(NC(C1C2CC(C1)CC2)C)c1cn(nc1)CC"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/train_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against the tyrosyl-DNA phosphodiesterase receptor 1.\nSELFIES: [O][C][Branch1][=Branch1][C][C][C][Ring1][Branch1][C][N][C][=Branch1][C][=O][C][=C][C][=Branch2][Ring1][Branch2][=N][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][Branch1][Ring1][O][C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][#Branch2][C][=C][C][=C][Ring2][Ring1][Ring1]\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not active against the tyrosyl-DNA phosphodiesterase receptor 1."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1).\ncanonical SMILES: CC1CCc2cc(F)ccc2N1C(=O)C1(C)CC1(Cl)Cl\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/valid_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against the tyrosyl-DNA phosphodiesterase receptor 1.\nMolecule SELFIES: 
[O][=C][N][Branch2][Ring2][Ring2][C][=Branch1][C][=O][N][Branch2][Ring1][Ring1][C][N][=C][N][Branch1][Branch2][C][C][C][Branch1][C][C][C][C][Ring1][#C][=Ring1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][=Branch1][C][=O][N][C][Branch1][Ring2][C][C][C][C]\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not active against the tyrosyl-DNA phosphodiesterase receptor 1."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1).\nDeepSMILES: O=CNCCCCCC5)CC5))))))C)))ccnnc5))CC\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/test_0-9.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C20H19NO5S\/c22-19-17-11-4-5-12-18(17)20(23)21(19)14-7-6-8-15(13-14)26-27(24,25)16-9-2-1-3-10-16\/h1-3,6-10,13,17-18H,4-5,11-12H2 active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)?\nAssistant: No, it is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)."} {"text":"User: Is the molecule with the InChI InChI=1S\/C11H10FN3O2\/c1-16-7-2-4-8(5-3-7)17-10-6-9(12)14-11(13)15-10\/h2-6H,1H3,(H2,13,14,15) active against the tyrosyl-DNA phosphodiesterase receptor 1?\nAssistant: No, it is not active against the tyrosyl-DNA phosphodiesterase receptor 1."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/test_0-0.jsonl": "{"text":"The molecule with the SMILES S(Oc1cc(N2C(=O)C3C(CCCC3)C2=O)ccc1)(=O)(=O)c1ccccc1 is not an inhibitor of tyrosyl-DNA phosphodiesterase 1."} {"text":"The molecule with the SMILES representation of Fc1nc(nc(Oc2ccc(OC)cc2)c1)N is not an inhibitor of tyrosyl-DNA phosphodiesterase 1."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/valid_0-7.jsonl": "{"text":"Task: Please 
give me a molecule DeepSMILES based on the text description below.\nDescription: A molecule that is active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1).\nResult: O=cnc=O)ncncnCCCC)C))))c95)))))Ccccccc6)))))))))CC=O)NCCCC)))C"} {"text":"Task: Please give me a canonical SMILES based on the description below.\nDescription: A molecule that is active against the tyrosyl-DNA phosphodiesterase receptor 1.\nResult: CCn1cc(C(=O)NC(C)C2CC3CCC2C3)cn1"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/test_0-3.jsonl": "{"text":"The InChI InChI=1S\/C20H19NO5S\/c22-19-17-11-4-5-12-18(17)20(23)21(19)14-7-6-8-15(13-14)26-27(24,25)16-9-2-1-3-10-16\/h1-3,6-10,13,17-18H,4-5,11-12H2 represents a molecule that is not identified as active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)."} {"text":"The InChI InChI=1S\/C11H10FN3O2\/c1-16-7-2-4-8(5-3-7)17-10-6-9(12)14-11(13)15-10\/h2-6H,1H3,(H2,13,14,15) represents a molecule that is not identified as active against the tyrosyl-DNA phosphodiesterase receptor 1."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/valid_0-11.jsonl": "{"text":"User: I'm looking for the InChI of a molecule that is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)?\nAssistant: This is a molecule that is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1): InChI=1S\/C24H33N5O3\/c1-5-9-18(4)26-20(30)15-29-23(31)21-22(25-16-27(21)13-12-17(2)3)28(24(29)32)14-19-10-7-6-8-11-19\/h6-8,10-11,16-18H,5,9,12-15H2,1-4H3,(H,26,30)"} {"text":"User: I'm looking for the InChI of a molecule that is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)?\nAssistant: This is a molecule that is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1): InChI=1S\/C15H23N3O\/c1-3-18-9-13(8-16-18)15(19)17-10(2)14-7-11-4-5-12(14)6-11\/h8-12,14H,3-7H2,1-2H3,(H,17,19)"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/train_0-0.jsonl": 
"{"text":"The molecule with the InChI representation of InChI=1S\/C23H24N2O4\/c1-27-15-9-10-18(22(12-15)28-2)21-13-19(17-7-3-4-8-20(17)25-21)23(26)24-14-16-6-5-11-29-16\/h3-4,7-10,12-13,16H,5-6,11,14H2,1-2H3,(H,24,26) is not a tyrosyl-DNA phosphodiesterase 1 (TDP1) inhibitor."} {"text":"The molecule with the SMILES ClC1(Cl)C(C1)(C(=O)N1C(CCc2c1ccc(F)c2)C)C is not an inhibitor of tyrosyl-DNA phosphodiesterase 1."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/test_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1).\nSELFIES: [S][Branch2][Ring1][S][O][C][=C][C][Branch2][Ring1][Ring2][N][C][=Branch1][C][=O][C][C][Branch1][#Branch1][C][C][C][C][Ring1][=Branch1][C][Ring1][#Branch2][=O][=C][C][=C][Ring1][P][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1]\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against the tyrosyl-DNA phosphodiesterase receptor 1.\nSMILES: Fc1nc(nc(Oc2ccc(OC)cc2)c1)N\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not active against the tyrosyl-DNA phosphodiesterase receptor 1."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/train_0-10.jsonl": "{"text":"User: Can you generate the InChI of a molecule that is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)?\nAssistant: Of course, here you go: InChI=1S\/C23H24N2O4\/c1-27-15-9-10-18(22(12-15)28-2)21-13-19(17-7-3-4-8-20(17)25-21)23(26)24-14-16-6-5-11-29-16\/h3-4,7-10,12-13,16H,5-6,11,14H2,1-2H3,(H,24,26)"} {"text":"User: Can you create the DeepSMILES of a molecule that is not active against the tyrosyl-DNA phosphodiesterase receptor 
1?\nAssistant: Yes, I'm happy to help, here you go: ClCCl)CC3)C=O)NCCCcc6cccF)c6))))))))C))))C"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/train_0-3.jsonl": "{"text":"The InChI InChI=1S\/C23H24N2O4\/c1-27-15-9-10-18(22(12-15)28-2)21-13-19(17-7-3-4-8-20(17)25-21)23(26)24-14-16-6-5-11-29-16\/h3-4,7-10,12-13,16H,5-6,11,14H2,1-2H3,(H,24,26) is from a molecule that is not identified as active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)."} {"text":"The SELFIES [Cl][C][Branch1][C][Cl][C][Branch1][Ring2][C][Ring1][Ring2][Branch2][Ring1][=Branch2][C][=Branch1][C][=O][N][C][Branch1][S][C][C][C][=C][Ring1][=Branch1][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1][C][C] represents a molecule that is not identified as active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/train_0-12.jsonl": "{"text":"User: I want to come up with a InChI.\nAssistant: This sounds very curious. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be active against the tyrosyl-DNA phosphodiesterase receptor 1.\nAssistant: Ok, here you go, this InChI is not active against the tyrosyl-DNA phosphodiesterase receptor 1: InChI=1S\/C23H24N2O4\/c1-27-15-9-10-18(22(12-15)28-2)21-13-19(17-7-3-4-8-20(17)25-21)23(26)24-14-16-6-5-11-29-16\/h3-4,7-10,12-13,16H,5-6,11,14H2,1-2H3,(H,24,26)"} {"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should not be active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1).\nAssistant: Got it, here you go, this SMILES is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1): ClC1(Cl)C(C1)(C(=O)N1C(CCc2c1ccc(F)c2)C)C"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/test_0-13.jsonl": "{"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be active against the tyrosyl-DNA phosphodiesterase receptor 1.\nAssistant: Ok, this canonical SMILES is not active against the tyrosyl-DNA phosphodiesterase receptor 1: O=C1C2CCCCC2C(=O)N1c1cccc(OS(=O)(=O)c2ccccc2)c1"} {"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1).\nAssistant: Got it, this canonical SMILES is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1): COc1ccc(Oc2cc(F)nc(N)n2)cc1"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/valid_0-2.jsonl": "{"text":"Based on the DeepSMILES representation O=cnc=O)ncncnCCCC)C))))c95)))))Ccccccc6)))))))))CC=O)NCCCC)))C, the molecule has no active against the tyrosyl-DNA phosphodiesterase receptor 1 characteristics."} {"text":"Based on the SMILES representation O=C(NC(C1C2CC(C1)CC2)C)c1cn(nc1)CC, the molecule has no active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1) characteristics."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES COc1ccc(-c2cc(C(=O)NCC3CCCO3)c3ccccc3n2)c(OC)c1 active against the tyrosyl-DNA phosphodiesterase receptor 1?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any 
other words.\nOptions:\n1: False\n2: True\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES ClC1(Cl)C(C1)(C(=O)N1C(CCc2c1ccc(F)c2)C)C active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)?\nConstraint: Even if you are uncertain, you must pick either A or B without using any additional words.\nOptions:\nA.) True\nB.) False\nAnswer: B"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/valid_0-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of O=cnc=O)ncncnCCCC)C))))c95)))))Ccccccc6)))))))))CC=O)NCCCC)))C exhibits no inhibiting the human tyrosyl-DNA phosphodiesterase 1."} {"text":"The molecule with the canonical SMILES CCn1cc(C(=O)NC(C)C2CC3CCC2C3)cn1 shows no inhibiting the human tyrosyl-DNA phosphodiesterase 1 (TDP1)."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/valid_0-13.jsonl": "{"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1).\nAssistant: Ok, this InChI is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1): InChI=1S\/C24H33N5O3\/c1-5-9-18(4)26-20(30)15-29-23(31)21-22(25-16-27(21)13-12-17(2)3)28(24(29)32)14-19-10-7-6-8-11-19\/h6-8,10-11,16-18H,5,9,12-15H2,1-4H3,(H,26,30)"} {"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be active against the tyrosyl-DNA phosphodiesterase receptor 1.\nAssistant: Got it, this DeepSMILES is not active against the tyrosyl-DNA phosphodiesterase receptor 1: O=CNCCCCCC5)CC5))))))C)))ccnnc5))CC"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against the tyrosyl-DNA phosphodiesterase receptor 1.\nMolecule SMILES: O=c1n(c(=O)n(c2ncn(CCC(C)C)c12)Cc1ccccc1)CC(=O)NC(CCC)C\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against the tyrosyl-DNA phosphodiesterase receptor 1.\nSMILES: O=C(NC(C1C2CC(C1)CC2)C)c1cn(nc1)CC\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/train_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any additional words.\nOptions:\n1: [Br][C][=C][C][Branch2][Ring1][=Branch2][O][C][C][C][C][N][C][=Branch1][C][=O][C][=C][Branch1][Branch1][N][=C][Ring1][#Branch1][C][=C][C][=C][Ring1][Branch2][=C][C][=C][Ring2][Ring1][=Branch1]\n2: [O][C][Branch1][=Branch1][C][C][C][Ring1][Branch1][C][N][C][=Branch1][C][=O][C][=C][C][=Branch2][Ring1][Branch2][=N][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][Branch1][Ring1][O][C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][#Branch2][C][=C][C][=C][Ring2][Ring1][Ring1]\n3: 
[O][C][=C][Branch2][Ring1][N][C][=Branch1][C][=O][C][=Branch1][Branch1][=C][Ring1][#Branch1][C][C][=N][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][=C][C][=C][Ring1][Branch2][C][=C][C][Branch1][C][O][=C][Ring2][Ring1][=Branch1][C]\n4: [F][C][=C][C][=C][Branch2][Ring1][S][N][C][=Branch1][C][=O][C][Branch2][Ring1][=Branch1][N][C][=C][Branch1][Ring1][O][C][C][=C][C][Branch1][#Branch1][N][C][=Branch1][C][=O][C][=C][Ring1][N][C][C][=C][Ring2][Ring1][Branch2]\nAnswer: 1, 2, 3, 4"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)?\nConstraint: You must select none, one or more options from 1 or 2 without using any other words.\nOptions:\n(1) COc1cnc(SCc2ccc(Cl)c(Cl)c2)nc1Cl\n(2) CC1CCc2cc(F)ccc2N1C(=O)C1(C)CC1(Cl)Cl\nAnswer: 1, 2"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/valid_0-4.jsonl": "{"text":"The molecule SELFIES [O][=C][N][Branch2][Ring2][Ring2][C][=Branch1][C][=O][N][Branch2][Ring1][Ring1][C][N][=C][N][Branch1][Branch2][C][C][C][Branch1][C][C][C][C][Ring1][#C][=Ring1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][=Branch1][C][=O][N][C][Branch1][Ring2][C][C][C][C] is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)."} {"text":"The SMILES O=C(NC(C1C2CC(C1)CC2)C)c1cn(nc1)CC is not active against the tyrosyl-DNA phosphodiesterase receptor 1."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against the tyrosyl-DNA phosphodiesterase receptor 1.\nDeepSMILES: OCCCC5)))CNC=O)cccncc6)ccOC))ccOC))cc6))))))))cccc6\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is active against 
the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1).\nDeepSMILES: ClCCl)CC3)C=O)NCCCcc6cccF)c6))))))))C))))C\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/valid_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not active against the tyrosyl-DNA phosphodiesterase receptor 1?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any additional words.\nOptions:\n[1] InChI=1S\/C24H33N5O3\/c1-5-9-18(4)26-20(30)15-29-23(31)21-22(25-16-27(21)13-12-17(2)3)28(24(29)32)14-19-10-7-6-8-11-19\/h6-8,10-11,16-18H,5,9,12-15H2,1-4H3,(H,26,30)\n[2] InChI=1S\/C23H22Cl2N4O5S\/c24-18-4-2-1-3-17(18)22-27-26-21(34-22)14-29(16-6-7-16)23(30)15-5-8-19(25)20(13-15)35(31,32)28-9-11-33-12-10-28\/h1-5,8,13,16H,6-7,9-12,14H2\n[3] InChI=1S\/C10H10N2OS2\/c1-5-3-7(6(2)11-5)4-8-9(13)12-10(14)15-8\/h3-4,11H,1-2H3,(H,12,13,14)\/b8-4-\n[4] InChI=1S\/C22H22FN5O2S\/c23-17-8-4-5-9-19(17)28(22(30)18-14-31-27-26-18)20(15-10-12-24-13-11-15)21(29)25-16-6-2-1-3-7-16\/h4-5,8-14,16,20H,1-3,6-7H2,(H,25,29)\nAnswer: 1, 2, 3, 4"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any additional words.\nOptions:\n1: CCOC(=O)c1nnn(-c2nonc2N)c1-c1ccc(Cl)cc1\n2: CCn1cc(C(=O)NC(C)C2CC3CCC2C3)cn1\n3: Cc1ccc(OCC(=O)N(Cc2cccc(F)c2)C2CCS(=O)(=O)C2)cc1\nAnswer: 1, 2, 3"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/valid_0-12.jsonl": "{"text":"User: I want to generate a molecule InChI.\nAssistant: This sounds very curious. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should not be active against the tyrosyl-DNA phosphodiesterase receptor 1.\nAssistant: Ok, this InChI is not active against the tyrosyl-DNA phosphodiesterase receptor 1: InChI=1S\/C24H33N5O3\/c1-5-9-18(4)26-20(30)15-29-23(31)21-22(25-16-27(21)13-12-17(2)3)28(24(29)32)14-19-10-7-6-8-11-19\/h6-8,10-11,16-18H,5,9,12-15H2,1-4H3,(H,26,30)"} {"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1).\nAssistant: Got it, here you go, this DeepSMILES is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1): O=CNCCCCCC5)CC5))))))C)))ccnnc5))CC"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/train_0-2.jsonl": "{"text":"Based on the canonical SMILES COc1ccc(-c2cc(C(=O)NCC3CCCO3)c3ccccc3n2)c(OC)c1, the molecule has no active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1) properties."} {"text":"Based on the canonical SMILES representation CC1CCc2cc(F)ccc2N1C(=O)C1(C)CC1(Cl)Cl, the molecule has no active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1) characteristics."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/test_0-11.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)?\nAssistant: This is a molecule that is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1): S(Oc1cc(N2C(=O)C3C(CCCC3)C2=O)ccc1)(=O)(=O)c1ccccc1"} {"text":"User: I'm searching for the SMILES of a molecule that is not active against the tyrosyl-DNA phosphodiesterase receptor 1?\nAssistant: This is a molecule that is not active against the tyrosyl-DNA phosphodiesterase receptor 1: Fc1nc(nc(Oc2ccc(OC)cc2)c1)N"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/train_0-7.jsonl": 
"{"text":"Task: Please generate a canonical SMILES based on the description below.\nDescription: A molecule that is active against the tyrosyl-DNA phosphodiesterase receptor 1.\nResult: COc1ccc(-c2cc(C(=O)NCC3CCCO3)c3ccccc3n2)c(OC)c1"} {"text":"Task: Please generate a molecule InChI based on the description.\nDescription: A molecule that is active against the tyrosyl-DNA phosphodiesterase receptor 1.\nResult: InChI=1S\/C15H16Cl2FNO\/c1-9-3-4-10-7-11(18)5-6-12(10)19(9)13(20)14(2)8-15(14,16)17\/h5-7,9H,3-4,8H2,1-2H3"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/train_0-11.jsonl": "{"text":"User: I'm searching for the InChI of a molecule that is not active against the tyrosyl-DNA phosphodiesterase receptor 1?\nAssistant: This is a molecule that is not active against the tyrosyl-DNA phosphodiesterase receptor 1: InChI=1S\/C23H24N2O4\/c1-27-15-9-10-18(22(12-15)28-2)21-13-19(17-7-3-4-8-20(17)25-21)23(26)24-14-16-6-5-11-29-16\/h3-4,7-10,12-13,16H,5-6,11,14H2,1-2H3,(H,24,26)"} {"text":"User: I'm searching for the SMILES of a molecule that is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)?\nAssistant: This is a molecule that is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1): ClC1(Cl)C(C1)(C(=O)N1C(CCc2c1ccc(F)c2)C)C"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/train_0-1.jsonl": "{"text":"The molecule with the InChI representation of InChI=1S\/C23H24N2O4\/c1-27-15-9-10-18(22(12-15)28-2)21-13-19(17-7-3-4-8-20(17)25-21)23(26)24-14-16-6-5-11-29-16\/h3-4,7-10,12-13,16H,5-6,11,14H2,1-2H3,(H,24,26) shows no inhibiting the human tyrosyl-DNA phosphodiesterase 1 (TDP1)."} {"text":"The molecule with the SMILES representation of ClC1(Cl)C(C1)(C(=O)N1C(CCc2c1ccc(F)c2)C)C displays no inhibiting the human tyrosyl-DNA phosphodiesterase 1."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/train_0-13.jsonl": "{"text":"User: I want to create a canonical SMILES.\nAssistant: This 
sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1).\nAssistant: Ok, this canonical SMILES is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1): COc1ccc(-c2cc(C(=O)NCC3CCCO3)c3ccccc3n2)c(OC)c1"} {"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be active against the tyrosyl-DNA phosphodiesterase receptor 1.\nAssistant: Ok, this DeepSMILES is not active against the tyrosyl-DNA phosphodiesterase receptor 1: ClCCl)CC3)C=O)NCCCcc6cccF)c6))))))))C))))C"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/train_0-4.jsonl": "{"text":"The molecule SELFIES [O][C][Branch1][=Branch1][C][C][C][Ring1][Branch1][C][N][C][=Branch1][C][=O][C][=C][C][=Branch2][Ring1][Branch2][=N][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][Branch1][Ring1][O][C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][#Branch2][C][=C][C][=C][Ring2][Ring1][Ring1] is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)."} {"text":"The canonical SMILES CC1CCc2cc(F)ccc2N1C(=O)C1(C)CC1(Cl)Cl is not active against the tyrosyl-DNA phosphodiesterase receptor 1."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/test_0-7.jsonl": "{"text":"Task: Please create a canonical SMILES based on the text description below.\nDescription: A molecule that is active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1).\nResult: O=C1C2CCCCC2C(=O)N1c1cccc(OS(=O)(=O)c2ccccc2)c1"} {"text":"Task: Please create a molecule SMILES based on the text description below.\nDescription: A molecule that is active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1).\nResult: Fc1nc(nc(Oc2ccc(OC)cc2)c1)N"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/train_0-9.jsonl": "{"text":"User: Is the molecule with the InChI 
InChI=1S\/C23H24N2O4\/c1-27-15-9-10-18(22(12-15)28-2)21-13-19(17-7-3-4-8-20(17)25-21)23(26)24-14-16-6-5-11-29-16\/h3-4,7-10,12-13,16H,5-6,11,14H2,1-2H3,(H,24,26) active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)?\nAssistant: No, it is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)."} {"text":"User: Is the molecule with the SELFIES [Cl][C][Branch1][C][Cl][C][Branch1][Ring2][C][Ring1][Ring2][Branch2][Ring1][=Branch2][C][=Branch1][C][=O][N][C][Branch1][S][C][C][C][=C][Ring1][=Branch1][C][=C][C][Branch1][C][F][=C][Ring1][#Branch1][C][C] active against the tyrosyl-DNA phosphodiesterase receptor 1?\nAssistant: No, it is not active against the tyrosyl-DNA phosphodiesterase receptor 1."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/valid_0-3.jsonl": "{"text":"The SELFIES [O][=C][N][Branch2][Ring2][Ring2][C][=Branch1][C][=O][N][Branch2][Ring1][Ring1][C][N][=C][N][Branch1][Branch2][C][C][C][Branch1][C][C][C][C][Ring1][#C][=Ring1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][=Branch1][C][=O][N][C][Branch1][Ring2][C][C][C][C] is from a molecule that is not identified as active against the tyrosyl-DNA phosphodiesterase receptor 1."} {"text":"The InChI InChI=1S\/C15H23N3O\/c1-3-18-9-13(8-16-18)15(19)17-10(2)14-7-11-4-5-12(14)6-11\/h8-12,14H,3-7H2,1-2H3,(H,17,19) is from a molecule that is not identified as active against the tyrosyl-DNA phosphodiesterase receptor 1."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/test_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the SMILES S(Oc1cc(N2C(=O)C3C(CCCC3)C2=O)ccc1)(=O)(=O)c1ccccc1 is active against the tyrosyl-DNA phosphodiesterase receptor 1?\nAssistant: No, this molecule is not active against the tyrosyl-DNA phosphodiesterase receptor 1."} {"text":"User: Can you figure out if the molecule with the canonical SMILES COc1ccc(Oc2cc(F)nc(N)n2)cc1 is active against the tyrosyl-DNA phosphodiesterase receptor 1 
(TDP1)?\nAssistant: No, this molecule is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [S][Branch2][Ring1][S][O][C][=C][C][Branch2][Ring1][Ring2][N][C][=Branch1][C][=O][C][C][Branch1][#Branch1][C][C][C][C][Ring1][=Branch1][C][Ring1][#Branch2][=O][=C][C][=C][Ring1][P][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1] active against the tyrosyl-DNA phosphodiesterase receptor 1?\nConstraint: Even if you are uncertain, you must pick either a or b without using any additional words.\nOptions:\na.) True\nb.) False\nAnswer: b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES Fc1nc(nc(Oc2ccc(OC)cc2)c1)N active against the tyrosyl-DNA phosphodiesterase receptor 1?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any additional words.\nOptions:\n(1) False\n(2) True\nAnswer: 1"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [O][=C][N][Branch2][Ring2][Ring2][C][=Branch1][C][=O][N][Branch2][Ring1][Ring1][C][N][=C][N][Branch1][Branch2][C][C][C][Branch1][C][C][C][C][Ring1][#C][=Ring1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][=Branch1][C][=O][N][C][Branch1][Ring2][C][C][C][C] active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any additional words.\nOptions:\n1 True\n2 False\nAnswer: 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES O=CNCCCCCC5)CC5))))))C)))ccnnc5))CC active against the tyrosyl-DNA phosphodiesterase 
receptor 1 (TDP1)?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any additional words.\nOptions:\n1) True\n2) False\nAnswer: 2"}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/test_0-4.jsonl": "{"text":"The molecule SMILES S(Oc1cc(N2C(=O)C3C(CCCC3)C2=O)ccc1)(=O)(=O)c1ccccc1 is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)."} {"text":"The SELFIES [F][C][=N][C][=Branch2][Ring1][Branch1][=N][C][Branch1][=C][O][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=C][Ring1][#C][N] is not active against the tyrosyl-DNA phosphodiesterase receptor 1 (TDP1)."}", "/scratch/micpie/export/export/dna_phosphodiesterase_butkiewicz/test_0-12.jsonl": "{"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be active against the tyrosyl-DNA phosphodiesterase receptor 1.\nAssistant: Ok, here you go, this DeepSMILES is not active against the tyrosyl-DNA phosphodiesterase receptor 1: SOcccNC=O)CCCCCC6))))C5=O))))))ccc6)))))))=O)=O)cccccc6"} {"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should not be active against the tyrosyl-DNA phosphodiesterase receptor 1.\nAssistant: Ok, here you go, this SELFIES is not active against the tyrosyl-DNA phosphodiesterase receptor 1: [F][C][=N][C][=Branch2][Ring1][Branch1][=N][C][Branch1][=C][O][C][=C][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][Branch2][=C][Ring1][#C][N]"}", "/scratch/micpie/export/drug_disease_pathway/valid_0-0.jsonl": "{"text":"The drug InChI=1S\/C19H25NO3\/c21-18(20-12-15-8-4-5-9-16(15)13-20)11-17(19(22)23)10-14-6-2-1-3-7-14\/h1-3,6-7,15-17H,4-5,8-13H2,(H,22,23)\/t15-,16+,17-\/m0\/s1 is indicated for the Type 2 diabetes mellitus disease and modulates the Pancreatic secretion pathway."} {"text":"The drug Imipramine is indicated for the Major depressive disorder disease and modulates the Serotonergic synapse pathway."}", "/scratch/micpie/export/drug_disease_pathway/test_0-0.jsonl": "{"text":"The drug Estradiol is indicated for the Premature ovarian failure disease and modulates the Ovarian steroidogenesis pathway."} {"text":"The drug CC=O)OC=CC=CC=C6CO)=O is indicated for the Ankylosing spondylitis disease and modulates the Antigen processing and presentation pathway."}", "/scratch/micpie/export/drug_disease_pathway/train_0-0.jsonl": "{"text":"The drug OC(=O)CCCC[C@@H]1CCSS1 is indicated for the Leigh syndrome disease and modulates the Oxidative phosphorylation pathway."} {"text":"The drug Phylloquinone is indicated for the Combined deficiency of vitamin K-dependent clotting factors disease and modulates the Ubiquinone and other terpenoid-quinone biosynthesis pathway."}", "/scratch/micpie/export/bio_ner_33/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Cells were exposed to contact allergens (2-bromo-2-bromomethyl glutaronitrile, cinnamaldehyde, citral, diethylmaleate, dinitrochlorobenzene, glyoxal, 2-mercaptobenzothiazol, nickel sulfate, 4-nitrobenzylbromide, oxazolone, penicillin G, resorcinol, 
tetramethylthiuram disulfide), to pre-pro-haptens (cinnamyl alcohol, eugenol, isoeugenol, p-phenylediamine), to respiratory allergens (ammonium hexachloroplatinate, diphenylmethane diisocyanate, glutaraldehyde, hexamethylenediisocy, maleic anhydride, trimellitic anhydride) and to irritants (benzaldehyde, cholorobenzene, diethylphtalate, hydrobenzoic acid, lactic acid, octanoic acid, phenol, salicylic acid, sodium lauryl sulphate, sulfamic acid)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: 2 - bromo - 2 - bromomethyl glutaronitrile,42,84,Chemical\/Drug\ncinnamaldehyde,86,100,Chemical\/Drug\ncitral,102,108,Chemical\/Drug\ndiethylmaleate,110,124,Chemical\/Drug\ndinitrochlorobenzene,126,146,Chemical\/Drug\nglyoxal,148,155,Chemical\/Drug\n2 - mercaptobenzothiazol,157,181,Chemical\/Drug\nnickel sulfate,183,197,Chemical\/Drug\n4 - nitrobenzylbromide,199,221,Chemical\/Drug\noxazolone,223,232,Chemical\/Drug\npenicillin G,234,246,Chemical\/Drug\nresorcinol,248,258,Chemical\/Drug\ntetramethylthiuram disulfide,260,288,Chemical\/Drug\ncinnamyl alcohol,316,332,Chemical\/Drug\neugenol,334,341,Chemical\/Drug\nisoeugenol,343,353,Chemical\/Drug\np - phenylediamine,355,373,Chemical\/Drug\nammonium hexachloroplatinate,403,431,Chemical\/Drug\ndiphenylmethane diisocyanate,433,461,Chemical\/Drug\nglutaraldehyde,463,477,Chemical\/Drug\nhexamethylenediisocy,479,499,Chemical\/Drug\nmaleic anhydride,501,517,Chemical\/Drug\ntrimellitic anhydride,519,540,Chemical\/Drug\nbenzaldehyde,561,573,Chemical\/Drug\ncholorobenzene,575,589,Chemical\/Drug\ndiethylphtalate,591,606,Chemical\/Drug\nhydrobenzoic acid,608,625,Chemical\/Drug\nlactic acid,627,638,Chemical\/Drug\noctanoic acid,640,653,Chemical\/Drug\nphenol,655,661,Chemical\/Drug\nsalicylic acid,663,677,Chemical\/Drug\nsodium lauryl sulphate,679,701,Chemical\/Drug\nsulfamic acid,703,716,Chemical\/Drug"} {"text":"Task: Please carry out the 
named entity recognition (NER) task for the the text below.\nText: Cells were exposed to contact allergens (2-bromo-2-bromomethyl glutaronitrile, cinnamaldehyde, citral, diethylmaleate, dinitrochlorobenzene, glyoxal, 2-mercaptobenzothiazol, nickel sulfate, 4-nitrobenzylbromide, oxazolone, penicillin G, resorcinol, tetramethylthiuram disulfide), to pre-pro-haptens (cinnamyl alcohol, eugenol, isoeugenol, p-phenylediamine), to respiratory allergens (ammonium hexachloroplatinate, diphenylmethane diisocyanate, glutaraldehyde, hexamethylenediisocy, maleic anhydride, trimellitic anhydride) and to irritants (benzaldehyde, cholorobenzene, diethylphtalate, hydrobenzoic acid, lactic acid, octanoic acid, phenol, salicylic acid, sodium lauryl sulphate, sulfamic acid)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: 2 - bromo - 2 - bromomethyl glutaronitrile,42,84,Chemical\/Drug\ncinnamaldehyde,86,100,Chemical\/Drug\ncitral,102,108,Chemical\/Drug\ndiethylmaleate,110,124,Chemical\/Drug\ndinitrochlorobenzene,126,146,Chemical\/Drug\nglyoxal,148,155,Chemical\/Drug\n2 - mercaptobenzothiazol,157,181,Chemical\/Drug\nnickel sulfate,183,197,Chemical\/Drug\n4 - nitrobenzylbromide,199,221,Chemical\/Drug\noxazolone,223,232,Chemical\/Drug\npenicillin G,234,246,Chemical\/Drug\nresorcinol,248,258,Chemical\/Drug\ntetramethylthiuram disulfide,260,288,Chemical\/Drug\ncinnamyl alcohol,316,332,Chemical\/Drug\neugenol,334,341,Chemical\/Drug\nisoeugenol,343,353,Chemical\/Drug\np - phenylediamine,355,373,Chemical\/Drug\nammonium hexachloroplatinate,403,431,Chemical\/Drug\ndiphenylmethane diisocyanate,433,461,Chemical\/Drug\nglutaraldehyde,463,477,Chemical\/Drug\nhexamethylenediisocy,479,499,Chemical\/Drug\nmaleic anhydride,501,517,Chemical\/Drug\ntrimellitic 
anhydride,519,540,Chemical\/Drug\nbenzaldehyde,561,573,Chemical\/Drug\ncholorobenzene,575,589,Chemical\/Drug\ndiethylphtalate,591,606,Chemical\/Drug\nhydrobenzoic acid,608,625,Chemical\/Drug\nlactic acid,627,638,Chemical\/Drug\noctanoic acid,640,653,Chemical\/Drug\nphenol,655,661,Chemical\/Drug\nsalicylic acid,663,677,Chemical\/Drug\nsodium lauryl sulphate,679,701,Chemical\/Drug\nsulfamic acid,703,716,Chemical\/Drug"}", "/scratch/micpie/export/rhea_db_predictions/test_0-1.jsonl": "{"text":"The reaction SMILES COc1cc(\/C=C\/C(=O)OCC[N+](C)(C)C)cc(OC)c1O.O>>COc1cc(\/C=C\/C(=O)[O-])cc(OC)c1O.C[N+](C)(C)CCO.[H+] has the reaction products COc1cc(\/C=C\/C(=O)[O-])cc(OC)c1O, C[N+](C)(C)CCO, and [H+] and the starting materials COc1cc(\/C=C\/C(=O)OCC[N+](C)(C)C)cc(OC)c1O and O."} {"text":"The reaction SMILES (RXNSMILES) CC(C)(COP(=O)([O-])OP(=O)([O-])OC[C@H]1O[C@@H](n2cnc3c(N)ncnc32)[C@H](O)[C@@H]1OP(=O)([O-])[O-])[C@@H](O)C(=O)NCCC(=O)NCCS.[1*]C(=O)N[C@@H](CO)[C@H](O)\/C=C\/CC\/C=C\/CCCCCCCCC.[H+]>>*C(=O)SCCNC(=O)CCNC(=O)[C@H](O)C(C)(C)COP(=O)([O-])OP(=O)([O-])OC[C@H]1O[C@@H](n2cnc3c(N)ncnc32)[C@H](O)[C@@H]1OP(=O)([O-])[O-].CCCCCCCCC\/C=C\/CC\/C=C\/[C@@H](O)[C@@H]([NH3+])CO has the products *C(=O)SCCNC(=O)CCNC(=O)[C@H](O)C(C)(C)COP(=O)([O-])OP(=O)([O-])OC[C@H]1O[C@@H](n2cnc3c(N)ncnc32)[C@H](O)[C@@H]1OP(=O)([O-])[O-] and CCCCCCCCC\/C=C\/CC\/C=C\/[C@@H](O)[C@@H]([NH3+])CO and the reaction educts CC(C)(COP(=O)([O-])OP(=O)([O-])OC[C@H]1O[C@@H](n2cnc3c(N)ncnc32)[C@H](O)[C@@H]1OP(=O)([O-])[O-])[C@@H](O)C(=O)NCCC(=O)NCCS, [1*]C(=O)N[C@@H](CO)[C@H](O)\/C=C\/CC\/C=C\/CCCCCCCCC, and [H+]."}", "/scratch/micpie/export/rhea_db_predictions/valid_0-0.jsonl": "{"text":"The reaction SMILES string S=C=NCc1ccccc1>>N#CSCc1ccccc1 has the starting materials S=C=NCc1ccccc1 and the reaction products N#CSCc1ccccc1."} {"text":"The RXNSMILES O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC.[Fe+3]>>O=O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC.[Fe+2].[H+] has the educts O, 
[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC, and [Fe+3] and the reaction products O=O, [1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC, [Fe+2], and [H+]."}", "/scratch/micpie/export/rhea_db_predictions/test_0-2.jsonl": "{"text":"Question: What reaction educts are needed to produce the reaction products COc1cc(\/C=C\/C(=O)[O-])cc(OC)c1O, C[N+](C)(C)CCO, and [H+]?\nAnswer: COc1cc(\/C=C\/C(=O)OCC[N+](C)(C)C)cc(OC)c1O and O."} {"text":"Question: Which reaction educts are needed to synthesize the reaction products *C(=O)SCCNC(=O)CCNC(=O)[C@H](O)C(C)(C)COP(=O)([O-])OP(=O)([O-])OC[C@H]1O[C@@H](n2cnc3c(N)ncnc32)[C@H](O)[C@@H]1OP(=O)([O-])[O-] and CCCCCCCCC\/C=C\/CC\/C=C\/[C@@H](O)[C@@H]([NH3+])CO?\nAnswer: CC(C)(COP(=O)([O-])OP(=O)([O-])OC[C@H]1O[C@@H](n2cnc3c(N)ncnc32)[C@H](O)[C@@H]1OP(=O)([O-])[O-])[C@@H](O)C(=O)NCCC(=O)NCCS, [1*]C(=O)N[C@@H](CO)[C@H](O)\/C=C\/CC\/C=C\/CCCCCCCCC, and [H+]."}", "/scratch/micpie/export/rhea_db_predictions/test_0-0.jsonl": "{"text":"The RXNSMILES COc1cc(\/C=C\/C(=O)OCC[N+](C)(C)C)cc(OC)c1O.O>>COc1cc(\/C=C\/C(=O)[O-])cc(OC)c1O.C[N+](C)(C)CCO.[H+] has the reaction educts COc1cc(\/C=C\/C(=O)OCC[N+](C)(C)C)cc(OC)c1O and O and the reaction products COc1cc(\/C=C\/C(=O)[O-])cc(OC)c1O, C[N+](C)(C)CCO, and [H+]."} {"text":"The reaction SMILES CC(C)(COP(=O)([O-])OP(=O)([O-])OC[C@H]1O[C@@H](n2cnc3c(N)ncnc32)[C@H](O)[C@@H]1OP(=O)([O-])[O-])[C@@H](O)C(=O)NCCC(=O)NCCS.[1*]C(=O)N[C@@H](CO)[C@H](O)\/C=C\/CC\/C=C\/CCCCCCCCC.[H+]>>*C(=O)SCCNC(=O)CCNC(=O)[C@H](O)C(C)(C)COP(=O)([O-])OP(=O)([O-])OC[C@H]1O[C@@H](n2cnc3c(N)ncnc32)[C@H](O)[C@@H]1OP(=O)([O-])[O-].CCCCCCCCC\/C=C\/CC\/C=C\/[C@@H](O)[C@@H]([NH3+])CO has the educts CC(C)(COP(=O)([O-])OP(=O)([O-])OC[C@H]1O[C@@H](n2cnc3c(N)ncnc32)[C@H](O)[C@@H]1OP(=O)([O-])[O-])[C@@H](O)C(=O)NCCC(=O)NCCS, [1*]C(=O)N[C@@H](CO)[C@H](O)\/C=C\/CC\/C=C\/CCCCCCCCC, and [H+] and the reaction products *C(=O)SCCNC(=O)CCNC(=O)[C@H](O)C(C)(C)COP(=O)([O-])OP(=O)([O-])OC[C@H]1O[C@@H](n2cnc3c(N)ncnc32)[C@H](O)[C@@H]1OP(=O)([O-])[O-] 
and CCCCCCCCC\/C=C\/CC\/C=C\/[C@@H](O)[C@@H]([NH3+])CO."}", "/scratch/micpie/export/rhea_db_predictions/test_0-3.jsonl": "{"text":"Question: Which products are produced from the educts COc1cc(\/C=C\/C(=O)OCC[N+](C)(C)C)cc(OC)c1O and O?\nAnswer: COc1cc(\/C=C\/C(=O)[O-])cc(OC)c1O, C[N+](C)(C)CCO, and [H+]."} {"text":"Question: Which reaction products are produced from the starting materials CC(C)(COP(=O)([O-])OP(=O)([O-])OC[C@H]1O[C@@H](n2cnc3c(N)ncnc32)[C@H](O)[C@@H]1OP(=O)([O-])[O-])[C@@H](O)C(=O)NCCC(=O)NCCS, [1*]C(=O)N[C@@H](CO)[C@H](O)\/C=C\/CC\/C=C\/CCCCCCCCC, and [H+]?\nAnswer: *C(=O)SCCNC(=O)CCNC(=O)[C@H](O)C(C)(C)COP(=O)([O-])OP(=O)([O-])OC[C@H]1O[C@@H](n2cnc3c(N)ncnc32)[C@H](O)[C@@H]1OP(=O)([O-])[O-] and CCCCCCCCC\/C=C\/CC\/C=C\/[C@@H](O)[C@@H]([NH3+])CO."}", "/scratch/micpie/export/rhea_db_predictions/train_0-0.jsonl": "{"text":"The reaction SMILES (RXNSMILES) CCCCC(N)=O.O>>CCCCC(=O)[O-].[NH4+] has the starting materials CCCCC(N)=O and O and the reaction products CCCCC(=O)[O-] and [NH4+]."} {"text":"The reaction SMILES (RXNSMILES) O=O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC.[Fe+2].[H+]>>O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC.[Fe+3] has the starting materials O=O, [1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC, [Fe+2], and [H+] and the products O, [1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC, and [Fe+3]."}", "/scratch/micpie/export/rhea_db_predictions/train_0-3.jsonl": "{"text":"Question: Which products are produced from the reaction educts CCCCC(N)=O and O?\nAnswer: CCCCC(=O)[O-] and [NH4+]."} {"text":"Question: Which reaction products are produced from the starting materials O=O, [1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC, [Fe+2], and [H+]?\nAnswer: O, [1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC, and [Fe+3]."}", "/scratch/micpie/export/rhea_db_predictions/valid_0-2.jsonl": "{"text":"Question: Which starting materials are required to produce the reaction products N#CSCc1ccccc1?\nAnswer: S=C=NCc1ccccc1."} {"text":"Question: What 
educts are required to produce the products O=O, [1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC, [Fe+2], and [H+]?\nAnswer: O, [1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC, and [Fe+3]."}", "/scratch/micpie/export/rhea_db_predictions/valid_0-1.jsonl": "{"text":"The reaction SMILES string S=C=NCc1ccccc1>>N#CSCc1ccccc1 has the products N#CSCc1ccccc1 and the starting materials S=C=NCc1ccccc1."} {"text":"The reaction SMILES (RXNSMILES) O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC.[Fe+3]>>O=O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC.[Fe+2].[H+] has the products O=O, [1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC, [Fe+2], and [H+] and the educts O, [1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC, and [Fe+3]."}", "/scratch/micpie/export/rhea_db_predictions/valid_0-4.jsonl": "{"text":"User: I would like to produce the reaction products N#CSCc1ccccc1.\nAssistant: Cool, is there anything else I can do for you?\nUser: Yes, I would like to know the reaction educts I need to produce the reaction products N#CSCc1ccccc1.\nAssistant: I advise the following reaction educts: S=C=NCc1ccccc1."} {"text":"User: I would like to synthesize the reaction products O=O, [1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC, [Fe+2], and [H+].\nAssistant: Cool, is there anything else I can do for you?\nUser: I would like to know the educts I need to produce the reaction products O=O, [1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC, [Fe+2], and [H+].\nAssistant: I propose the following educts: O, [1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC, and [Fe+3]."}", "/scratch/micpie/export/rhea_db_predictions/train_0-2.jsonl": "{"text":"Question: What reaction educts are required to synthesize the reaction products CCCCC(=O)[O-] and [NH4+]?\nAnswer: CCCCC(N)=O and O."} {"text":"Question: What starting materials are needed to synthesize the products O, [1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC, and [Fe+3]?\nAnswer: O=O, [1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC, [Fe+2], and [H+]."}", 
"/scratch/micpie/export/rhea_db_predictions/train_0-1.jsonl": "{"text":"The RXNSMILES CCCCC(N)=O.O>>CCCCC(=O)[O-].[NH4+] has the reaction products CCCCC(=O)[O-] and [NH4+] and the reaction educts CCCCC(N)=O and O."} {"text":"The reaction SMILES string O=O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC.[Fe+2].[H+]>>O.[1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC.[Fe+3] has the products O, [1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC, and [Fe+3] and the educts O=O, [1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC, [Fe+2], and [H+]."}", "/scratch/micpie/export/rhea_db_predictions/train_0-4.jsonl": "{"text":"User: I would like to produce the reaction products CCCCC(=O)[O-] and [NH4+].\nAssistant: Is there anything else I can do for you?\nUser: Yes, I would like to know the starting materials I need to produce the reaction products CCCCC(=O)[O-] and [NH4+].\nAssistant: I advise the following starting materials: CCCCC(N)=O and O."} {"text":"User: I must produce the reaction products O, [1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC, and [Fe+3].\nAssistant: That's interesting, is there anything else I can do for you?\nUser: Yes, I would like to know the educts I need to produce the reaction products O, [1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC, and [Fe+3].\nAssistant: I recommend the following educts: O=O, [1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC, [Fe+2], and [H+]."}", "/scratch/micpie/export/rhea_db_predictions/valid_0-3.jsonl": "{"text":"Question: Which reaction products are produced from the reaction educts S=C=NCc1ccccc1?\nAnswer: N#CSCc1ccccc1."} {"text":"Question: What reaction products are produced from the educts O, [1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCC\/C=C\\CCC, and [Fe+3]?\nAnswer: O=O, [1*]C(=O)N[C@@H](CO)[C@H](O)CCCCCCCCCCCCCCC, [Fe+2], and [H+]."}", "/scratch/micpie/export/rhea_db_predictions/test_0-4.jsonl": "{"text":"User: I want produce the reaction products COc1cc(\/C=C\/C(=O)[O-])cc(OC)c1O, C[N+](C)(C)CCO, and [H+].\nAssistant: Cool, is 
there anything else I can do for you?\nUser: Yes, I would like to know the starting materials I need to produce the reaction products COc1cc(\/C=C\/C(=O)[O-])cc(OC)c1O, C[N+](C)(C)CCO, and [H+].\nAssistant: I recommend the following starting materials: COc1cc(\/C=C\/C(=O)OCC[N+](C)(C)C)cc(OC)c1O and O."} {"text":"User: I must synthesize the reaction products *C(=O)SCCNC(=O)CCNC(=O)[C@H](O)C(C)(C)COP(=O)([O-])OP(=O)([O-])OC[C@H]1O[C@@H](n2cnc3c(N)ncnc32)[C@H](O)[C@@H]1OP(=O)([O-])[O-] and CCCCCCCCC\/C=C\/CC\/C=C\/[C@@H](O)[C@@H]([NH3+])CO.\nAssistant: That's interesting, is there anything else I can do for you?\nUser: Yes, I would like to know the educts I need to produce the reaction products *C(=O)SCCNC(=O)CCNC(=O)[C@H](O)C(C)(C)COP(=O)([O-])OP(=O)([O-])OC[C@H]1O[C@@H](n2cnc3c(N)ncnc32)[C@H](O)[C@@H]1OP(=O)([O-])[O-] and CCCCCCCCC\/C=C\/CC\/C=C\/[C@@H](O)[C@@H]([NH3+])CO.\nAssistant: I recommend the following educts: CC(C)(COP(=O)([O-])OP(=O)([O-])OC[C@H]1O[C@@H](n2cnc3c(N)ncnc32)[C@H](O)[C@@H]1OP(=O)([O-])[O-])[C@@H](O)C(=O)NCCC(=O)NCCS, [1*]C(=O)N[C@@H](CO)[C@H](O)\/C=C\/CC\/C=C\/CCCCCCCCC, and [H+]."}", "/scratch/micpie/export/thermosol/test_0-10.jsonl": "{"text":"User: Can you tell me the solubility in aqueous pH 7.4 buffer at 20 deg C in log(microM) of the molecule with the SMILES Oc1ncnc2scc(c3ccsc3)c12?\nAssistant: Yes, this molecule has a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.250 log(microM)."} {"text":"User: Can you derive the solubility in aqueous pH 7.4 buffer at 20 deg C in log(microM) of the molecule with the canonical SMILES CCCCOc1nc(N)c2[nH]c(=O)n(Cc3cccc(CC(=O)OC)c3)c2n1?\nAssistant: Yes, this molecule has a solubility in aqueous pH 7.4 buffer at 20 deg C of 3.255 log(microM)."}", "/scratch/micpie/export/thermosol/valid_0-8.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the solubility in aqueous pH 7.4 buffer at 20 deg C in log(microM).\nDeepSMILES: 
C[C@@H]CNC[C@H]C)O6)))cncNCCOC[C@@H]6C)))))))cnccnc6n%10)))cccccc6\nConstraint: Even if you are not sure, you must answer with a numeric value in log(microM) without the unit and without using any other words.\nResult: 4.111"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the aqueous solubility in pH 7.4 buffer at 20 deg C in log(microM).\nMolecule SELFIES: ['[C][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][=N][C][=N][C][NH1][C][=N][C][Ring1][=Branch2][=Ring1][Branch1]']\nConstraint: Even if you are uncertain, you must answer with a numeric value in log(microM) without the unit and without using any additional words.\nResult: 5.700"}", "/scratch/micpie/export/thermosol/train_0-8.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the aqueous solubility in pH 7.4 buffer at 20 deg C in log(microM).\nInChI: InChI=1S\/C23H21ClN6O3\/c1-32-19-12-15(22(31)29-8-10-33-11-9-29)5-6-17(19)27-23-26-13-16(24)21(28-23)18-14-25-20-4-2-3-7-30(18)20\/h2-7,12-14H,8-11H2,1H3,(H,26,27,28)\nConstraint: Even if you are not sure, you must answer with a numeric value in log(microM) without the unit and without using any additional words.\nResult: 3.204"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the solubility in aqueous pH 7.4 buffer at 20 deg C in log(microM).\nMolecule DeepSMILES: CCC)CNC=O)NC)C=O)cc6scCcccnccccF)cc%106)))))))))))c5C=O)NCC[C@@H]O)C5\nConstraint: Even if you are uncertain, you must answer with a numeric value in log(microM) without the unit and without using any additional words.\nResult: 5.940"}", "/scratch/micpie/export/thermosol/test_0-5.jsonl": "{"text":"The DeepSMILES Ocncncscccccsc5)))))c95 represents a molecule with a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.250 log(microM)."} {"text":"The InChI 
InChI=1S\/C19H23N5O4\/c1-3-4-8-28-18-22-16(20)15-17(23-18)24(19(26)21-15)11-13-7-5-6-12(9-13)10-14(25)27-2\/h5-7,9H,3-4,8,10-11H2,1-2H3,(H,21,26)(H2,20,22,23) is representing a molecule that has a aqueous solubility in pH 7.4 buffer at 20 deg C of 3.255 log(microM)."}", "/scratch/micpie/export/thermosol/valid_0-9.jsonl": "{"text":"Task: Please create a SELFIES based on the description.\nDescription: A molecule that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 4.111 log(microM).\nResult: [C][C@@H1][C][N][Branch1][=Branch2][C][C@H1][Branch1][C][C][O][Ring1][#Branch1][C][=N][C][Branch1][#Branch2][N][C][C][O][C][C@@H1][Ring1][=Branch1][C][=C][N][=C][C][=Branch1][Branch2][=N][C][Ring1][=Branch1][=N][Ring1][P][C][=C][C][=C][C][=C][Ring1][=Branch1]"} {"text":"Task: Please give me a molecule SELFIES based on the description below.\nDescription: A molecule that has a aqueous solubility in pH 7.4 buffer at 20 deg C of 5.700 log(microM).\nResult: ['[C][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][=N][C][=N][C][NH1][C][=N][C][Ring1][=Branch2][=Ring1][Branch1]']"}", "/scratch/micpie/export/thermosol/test_0-1.jsonl": "{"text":"Question: What is the aqueous solubility in pH 7.4 buffer at 20 deg C of the chemical with the canonical SMILES Oc1ncnc2scc(-c3ccsc3)c12?\nAnswer: 5.250 log(microM)."} {"text":"Question: What is the aqueous solubility in pH 7.4 buffer at 20 deg C of the compound with the SMILES CCCCOc1nc(N)c2NC(=O)N(Cc3cccc(CC(=O)OC)c3)c2n1?\nAnswer: 3.255 log(microM)."}", "/scratch/micpie/export/thermosol/valid_0-0.jsonl": "{"text":"The aqueous solubility in pH 7.4 buffer at 20 deg C of the compound with the DeepSMILES C[C@@H]CNC[C@H]C)O6)))cncNCCOC[C@@H]6C)))))))cnccnc6n%10)))cccccc6 is 4.111 log(microM)."} {"text":"The solubility in aqueous pH 7.4 buffer at 20 deg C of the compound with the SELFIES ['[C][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][=N][C][=N][C][NH1][C][=N][C][Ring1][=Branch2][=Ring1][Branch1]'] is 5.700 log(microM)."}", 
"/scratch/micpie/export/thermosol/test_0-2.jsonl": "{"text":"User: I want to a drug with a particular solubility in aqueous pH 7.4 buffer at 20 deg C.\nAssistant: Great, I would need to know the solubility in aqueous pH 7.4 buffer at 20 deg C of the drug you want to design.\nUser: The solubility in aqueous pH 7.4 buffer at 20 deg C should be 5.250 log(microM).\nAssistant: I the drug with the DeepSMILES Ocncncscccccsc5)))))c95."} {"text":"User: I want to design a chemical with a particular solubility in aqueous pH 7.4 buffer at 20 deg C.\nAssistant: Great, I would need to know the solubility in aqueous pH 7.4 buffer at 20 deg C of the chemical you want to design.\nUser: The solubility in aqueous pH 7.4 buffer at 20 deg C should be 3.255 log(microM).\nAssistant: I suggest the chemical with the SMILES CCCCOc1nc(N)c2NC(=O)N(Cc3cccc(CC(=O)OC)c3)c2n1."}", "/scratch/micpie/export/thermosol/valid_0-10.jsonl": "{"text":"User: Can you derive the solubility in aqueous pH 7.4 buffer at 20 deg C in log(microM) of the molecule with the InChI InChI=1S\/C23H28N6O2\/c1-15-14-30-10-9-29(15)22-20-21(25-19(11-24-20)18-7-5-4-6-8-18)26-23(27-22)28-12-16(2)31-17(3)13-28\/h4-8,11,15-17H,9-10,12-14H2,1-3H3\/t15-,16-,17+\/m0\/s1?\nAssistant: Of course, this molecule has a solubility in aqueous pH 7.4 buffer at 20 deg C of 4.111 log(microM)."} {"text":"User: Can you tell me the aqueous solubility in pH 7.4 buffer at 20 deg C in log(microM) of the molecule with the SELFIES ['[C][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][=N][C][=N][C][NH1][C][=N][C][Ring1][=Branch2][=Ring1][Branch1]']?\nAssistant: Of course, this molecule has a aqueous solubility in pH 7.4 buffer at 20 deg C of 5.700 log(microM)."}", "/scratch/micpie/export/thermosol/train_0-6.jsonl": "{"text":"The molecule with the SELFIES 
['[C][O][C][=C][C][=Branch2][Ring1][P][=C][C][=C][Ring1][=Branch1][N][C][=N][C][=C][Branch1][C][Cl][C][=Branch1][Ring2][=N][Ring1][#Branch1][C][=C][N][=C][C][=C][C][=C][N][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1]'] has a aqueous solubility in pH 7.4 buffer at 20 deg C of 3.204 log(microM)."} {"text":"The molecule with the InChI InChI=1S\/C26H27FN4O4S\/c1-14(2)12-31-25-22(23(33)29(3)26(31)35)21(24(34)30-9-7-17(32)13-30)20(36-25)10-15-6-8-28-19-5-4-16(27)11-18(15)19\/h4-6,8,11,14,17,32H,7,9-10,12-13H2,1-3H3\/t17-\/m1\/s1 has a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.940 log(microM)."}", "/scratch/micpie/export/thermosol/valid_0-6.jsonl": "{"text":"The molecule with the canonical SMILES C[C@@H]1CN(c2nc(N3CCOC[C@@H]3C)c3ncc(-c4ccccc4)nc3n2)C[C@H](C)O1 has a solubility in aqueous pH 7.4 buffer at 20 deg C of 4.111 log(microM)."} {"text":"The molecule with the InChI InChI=1S\/C10H13N5\/c1-2-4-15(5-3-1)10-8-9(12-6-11-8)13-7-14-10\/h6-7H,1-5H2,(H,11,12,13,14) has a aqueous solubility in pH 7.4 buffer at 20 deg C of 5.700 log(microM)."}", "/scratch/micpie/export/thermosol/test_0-9.jsonl": "{"text":"Task: Please create a molecule InChI based on the description below.\nDescription: A molecule that has a aqueous solubility in pH 7.4 buffer at 20 deg C of 5.250 log(microM).\nResult: InChI=1S\/C10H6N2OS2\/c13-9-8-7(6-1-2-14-3-6)4-15-10(8)12-5-11-9\/h1-5H,(H,11,12,13)"} {"text":"Task: Please give me a DeepSMILES based on the description.\nDescription: A molecule that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 3.255 log(microM).\nResult: CCCCOcncN)cNC=O)NCcccccCC=O)OC))))c6)))))))c5n9"}", "/scratch/micpie/export/thermosol/test_0-0.jsonl": "{"text":"The aqueous solubility in pH 7.4 buffer at 20 deg C of the chemical with the canonical SMILES Oc1ncnc2scc(-c3ccsc3)c12 is 5.250 log(microM)."} {"text":"The aqueous solubility in pH 7.4 buffer at 20 deg C of the compound with the InChI 
InChI=1S\/C19H23N5O4\/c1-3-4-8-28-18-22-16(20)15-17(23-18)24(19(26)21-15)11-13-7-5-6-12(9-13)10-14(25)27-2\/h5-7,9H,3-4,8,10-11H2,1-2H3,(H,21,26)(H2,20,22,23) is 3.255 log(microM)."}", "/scratch/micpie/export/thermosol/valid_0-7.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the aqueous solubility in pH 7.4 buffer at 20 deg C in log(microM).\nMolecule SMILES: C[C@@H]1CN(C[C@H](C)O1)c2nc(N3CCOC[C@@H]3C)c4ncc(nc4n2)c5ccccc5\nConstraint: Even if you are uncertain, you must answer with a numeric value in log(microM) without using any other words.\nResult: 4.111 log(microM)"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the solubility in aqueous pH 7.4 buffer at 20 deg C in log(microM).\nMolecule InChI: InChI=1S\/C10H13N5\/c1-2-4-15(5-3-1)10-8-9(12-6-11-8)13-7-14-10\/h6-7H,1-5H2,(H,11,12,13,14)\nConstraint: Even if you are uncertain, you must answer with a numeric value in log(microM) without using any other words.\nResult: 5.700 log(microM)"}", "/scratch/micpie/export/thermosol/test_0-3.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C10H6N2OS2\/c13-9-8-7(6-1-2-14-3-6)4-15-10(8)12-5-11-9\/h1-5H,(H,11,12,13) has a aqueous solubility in pH 7.4 buffer at 20 deg C of 5.250 log(microM)."} {"text":"The molecule with the SELFIES representation of ['[C][C][C][C][O][C][=N][C][Branch1][C][N][=C][N][C][=Branch1][C][=O][N][Branch2][Ring1][Ring1][C][C][=C][C][=C][C][Branch1][Branch2][C][C][=Branch1][C][=O][O][C][=C][Ring1][O][C][Ring2][Ring1][C][=N][Ring2][Ring1][#Branch1]'] has a aqueous solubility in pH 7.4 buffer at 20 deg C of 3.255 log(microM)."}", "/scratch/micpie/export/thermosol/valid_0-11.jsonl": "{"text":"User: Can you give me the DeepSMILES of a molecule that has a aqueous solubility in pH 7.4 buffer at 20 deg C of 4.111 log(microM)?\nAssistant: Yes, here you go: C[C@@H]CNC[C@H]C)O6)))cncNCCOC[C@@H]6C)))))))cnccnc6n%10)))cccccc6"} {"text":"User: Can you 
generate the canonical SMILES of a molecule that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.700 log(microM)?\nAssistant: Yes, here you go: c1nc(N2CCCCC2)c2nc[nH]c2n1"}", "/scratch/micpie/export/thermosol/train_0-0.jsonl": "{"text":"The aqueous solubility in pH 7.4 buffer at 20 deg C of the compound with the DeepSMILES COcccccc6NcnccCl)cn6)ccncccccn96))))))))))))))))))C=O)NCCOCC6 is 3.204 log(microM)."} {"text":"The aqueous solubility in pH 7.4 buffer at 20 deg C of the chemical with the DeepSMILES CCC)CNC=O)NC)C=O)cc6scCcccnccccF)cc%106)))))))))))c5C=O)NCC[C@@H]O)C5 is 5.940 log(microM)."}", "/scratch/micpie/export/thermosol/test_0-6.jsonl": "{"text":"The molecule with the SMILES Oc1ncnc2scc(c3ccsc3)c12 has a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.250 log(microM)."} {"text":"The molecule with the SMILES CCCCOc1nc(N)c2NC(=O)N(Cc3cccc(CC(=O)OC)c3)c2n1 has a solubility in aqueous pH 7.4 buffer at 20 deg C of 3.255 log(microM)."}", "/scratch/micpie/export/thermosol/train_0-10.jsonl": "{"text":"User: Can you tell me the solubility in aqueous pH 7.4 buffer at 20 deg C in log(microM) of the molecule with the InChI InChI=1S\/C23H21ClN6O3\/c1-32-19-12-15(22(31)29-8-10-33-11-9-29)5-6-17(19)27-23-26-13-16(24)21(28-23)18-14-25-20-4-2-3-7-30(18)20\/h2-7,12-14H,8-11H2,1H3,(H,26,27,28)?\nAssistant: Of course, this molecule has a solubility in aqueous pH 7.4 buffer at 20 deg C of 3.204 log(microM)."} {"text":"User: Can you estimate the solubility in aqueous pH 7.4 buffer at 20 deg C in log(microM) of the molecule with the SMILES CC(C)CN1C(=O)N(C)C(=O)c2c1sc(Cc3ccnc4ccc(F)cc34)c2C(=O)N5CC[C@@H](O)C5?\nAssistant: Sure, this molecule has a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.940 log(microM)."}", "/scratch/micpie/export/thermosol/train_0-3.jsonl": "{"text":"The molecule with the canonical SMILES COc1cc(C(=O)N2CCOCC2)ccc1Nc1ncc(Cl)c(-c2cnc3ccccn23)n1 has a aqueous solubility in pH 7.4 buffer at 20 deg C of 3.204 log(microM)."} {"text":"The 
molecule with the InChI InChI=1S\/C26H27FN4O4S\/c1-14(2)12-31-25-22(23(33)29(3)26(31)35)21(24(34)30-9-7-17(32)13-30)20(36-25)10-15-6-8-28-19-5-4-16(27)11-18(15)19\/h4-6,8,11,14,17,32H,7,9-10,12-13H2,1-3H3\/t17-\/m1\/s1 has a aqueous solubility in pH 7.4 buffer at 20 deg C of 5.940 log(microM)."}", "/scratch/micpie/export/thermosol/train_0-12.jsonl": "{"text":"User: I'm searching for the SELFIES of a molecule that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 3.204 log(microM).\nAssistant: This is a molecule that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 3.204 log(microM): ['[C][O][C][=C][C][=Branch2][Ring1][P][=C][C][=C][Ring1][=Branch1][N][C][=N][C][=C][Branch1][C][Cl][C][=Branch1][Ring2][=N][Ring1][#Branch1][C][=C][N][=C][C][=C][C][=C][N][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1]']"} {"text":"User: I'm looking for the InChI of a molecule that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.940 log(microM).\nAssistant: This is a molecule that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.940 log(microM): InChI=1S\/C26H27FN4O4S\/c1-14(2)12-31-25-22(23(33)29(3)26(31)35)21(24(34)30-9-7-17(32)13-30)20(36-25)10-15-6-8-28-19-5-4-16(27)11-18(15)19\/h4-6,8,11,14,17,32H,7,9-10,12-13H2,1-3H3\/t17-\/m1\/s1"}", "/scratch/micpie/export/thermosol/test_0-13.jsonl": "{"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should have a aqueous solubility in pH 7.4 buffer at 20 deg C of 5.250 log(microM).\nAssistant: Got it, this SMILES represents a molecule that has a aqueous solubility in pH 7.4 buffer at 20 deg C of 5.250 log(microM): Oc1ncnc2scc(c3ccsc3)c12"} {"text":"User: I want to create a SELFIES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should have a aqueous solubility in pH 7.4 buffer at 20 deg C of 3.255 log(microM).\nAssistant: Ok, here you go, this SELFIES represents a molecule that has a aqueous solubility in pH 7.4 buffer at 20 deg C of 3.255 log(microM): ['[C][C][C][C][O][C][=N][C][Branch1][C][N][=C][N][C][=Branch1][C][=O][N][Branch2][Ring1][Ring1][C][C][=C][C][=C][C][Branch1][Branch2][C][C][=Branch1][C][=O][O][C][=C][Ring1][O][C][Ring2][Ring1][C][=N][Ring2][Ring1][#Branch1]']"}", "/scratch/micpie/export/thermosol/valid_0-2.jsonl": "{"text":"User: I want to identify a chemical with a particular solubility in aqueous pH 7.4 buffer at 20 deg C.\nAssistant: That sounds interesting, I would need to know the solubility in aqueous pH 7.4 buffer at 20 deg C of the chemical you want to design.\nUser: The solubility in aqueous pH 7.4 buffer at 20 deg C should be 4.111 log(microM).\nAssistant: I suggest the chemical with the SELFIES [C][C@@H1][C][N][Branch1][=Branch2][C][C@H1][Branch1][C][C][O][Ring1][#Branch1][C][=N][C][Branch1][#Branch2][N][C][C][O][C][C@@H1][Ring1][=Branch1][C][=C][N][=C][C][=Branch1][Branch2][=N][C][Ring1][=Branch1][=N][Ring1][P][C][=C][C][=C][C][=C][Ring1][=Branch1]."} {"text":"User: I want to a drug with a particular solubility in aqueous pH 7.4 buffer at 20 deg C.\nAssistant: Cool, I would need to know the solubility in aqueous pH 7.4 buffer at 20 deg C of the drug you want to design.\nUser: The solubility in aqueous pH 7.4 buffer at 20 deg C should be 5.700 log(microM).\nAssistant: I advise the drug with the canonical SMILES c1nc(N2CCCCC2)c2nc[nH]c2n1."}", "/scratch/micpie/export/thermosol/train_0-14.jsonl": "{"text":"User: I want to create a SELFIES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should have a solubility in aqueous pH 7.4 buffer at 20 deg C of 3.204 log(microM).\nAssistant: Ok, this SELFIES represents a molecule that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 3.204 log(microM): ['[C][O][C][=C][C][=Branch2][Ring1][P][=C][C][=C][Ring1][=Branch1][N][C][=N][C][=C][Branch1][C][Cl][C][=Branch1][Ring2][=N][Ring1][#Branch1][C][=C][N][=C][C][=C][C][=C][N][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1]']"} {"text":"User: I want to generate a molecule InChI.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should have a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.940 log(microM).\nAssistant: Ok, this InChI represents a molecule that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.940 log(microM): InChI=1S\/C26H27FN4O4S\/c1-14(2)12-31-25-22(23(33)29(3)26(31)35)21(24(34)30-9-7-17(32)13-30)20(36-25)10-15-6-8-28-19-5-4-16(27)11-18(15)19\/h4-6,8,11,14,17,32H,7,9-10,12-13H2,1-3H3\/t17-\/m1\/s1"}", "/scratch/micpie/export/thermosol/valid_0-1.jsonl": "{"text":"Question: What is the aqueous solubility in pH 7.4 buffer at 20 deg C of the molecule with the canonical SMILES C[C@@H]1CN(c2nc(N3CCOC[C@@H]3C)c3ncc(-c4ccccc4)nc3n2)C[C@H](C)O1?\nAnswer: 4.111 log(microM)."} {"text":"Question: What is the solubility in aqueous pH 7.4 buffer at 20 deg C of the compound with the InChI InChI=1S\/C10H13N5\/c1-2-4-15(5-3-1)10-8-9(12-6-11-8)13-7-14-10\/h6-7H,1-5H2,(H,11,12,13,14)?\nAnswer: 5.700 log(microM)."}", "/scratch/micpie/export/thermosol/valid_0-13.jsonl": "{"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should have a solubility in aqueous pH 7.4 buffer at 20 deg C of 4.111 log(microM).\nAssistant: Ok, this SELFIES represents a molecule that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 4.111 log(microM): [C][C@@H1][C][N][Branch1][=Branch2][C][C@H1][Branch1][C][C][O][Ring1][#Branch1][C][=N][C][Branch1][#Branch2][N][C][C][O][C][C@@H1][Ring1][=Branch1][C][=C][N][=C][C][=Branch1][Branch2][=N][C][Ring1][=Branch1][=N][Ring1][P][C][=C][C][=C][C][=C][Ring1][=Branch1]"} {"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should have a aqueous solubility in pH 7.4 buffer at 20 deg C of 5.700 log(microM).\nAssistant: Ok, here you go, this DeepSMILES represents a molecule that has a aqueous solubility in pH 7.4 buffer at 20 deg C of 5.700 log(microM): CCCNCC6))cncnc[nH]cnc95"}", "/scratch/micpie/export/thermosol/valid_0-5.jsonl": "{"text":"The InChI InChI=1S\/C23H28N6O2\/c1-15-14-30-10-9-29(15)22-20-21(25-19(11-24-20)18-7-5-4-6-8-18)26-23(27-22)28-12-16(2)31-17(3)13-28\/h4-8,11,15-17H,9-10,12-14H2,1-3H3\/t15-,16-,17+\/m0\/s1 is representing a molecule with a solubility in aqueous pH 7.4 buffer at 20 deg C of 4.111 log(microM)."} {"text":"The DeepSMILES CCCNCC6))cncnc[nH]cnc95 is representing a molecule with a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.700 log(microM)."}", "/scratch/micpie/export/thermosol/valid_0-4.jsonl": "{"text":"Based on the InChI InChI=1S\/C23H28N6O2\/c1-15-14-30-10-9-29(15)22-20-21(25-19(11-24-20)18-7-5-4-6-8-18)26-23(27-22)28-12-16(2)31-17(3)13-28\/h4-8,11,15-17H,9-10,12-14H2,1-3H3\/t15-,16-,17+\/m0\/s1, the molecule has a aqueous solubility in pH 7.4 buffer at 20 deg C of 4.111 log(microM)."} {"text":"Based on the SELFIES representation of ['[C][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][=N][C][=N][C][NH1][C][=N][C][Ring1][=Branch2][=Ring1][Branch1]'], the molecule has a solubility 
in aqueous pH 7.4 buffer at 20 deg C of 5.700 log(microM)."}", "/scratch/micpie/export/thermosol/train_0-5.jsonl": "{"text":"The canonical SMILES COc1cc(C(=O)N2CCOCC2)ccc1Nc1ncc(Cl)c(-c2cnc3ccccn23)n1 is representing a molecule that has a aqueous solubility in pH 7.4 buffer at 20 deg C of 3.204 log(microM)."} {"text":"The DeepSMILES CCC)CNC=O)NC)C=O)cc6scCcccnccccF)cc%106)))))))))))c5C=O)NCC[C@@H]O)C5 represents a molecule that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.940 log(microM)."}", "/scratch/micpie/export/thermosol/valid_0-12.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that has a aqueous solubility in pH 7.4 buffer at 20 deg C of 4.111 log(microM).\nAssistant: This is a molecule that has a aqueous solubility in pH 7.4 buffer at 20 deg C of 4.111 log(microM): C[C@@H]1CN(c2nc(N3CCOC[C@@H]3C)c3ncc(-c4ccccc4)nc3n2)C[C@H](C)O1"} {"text":"User: I'm looking for the DeepSMILES of a molecule that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.700 log(microM).\nAssistant: This is a molecule that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.700 log(microM): CCCNCC6))cncnc[nH]cnc95"}", "/scratch/micpie/export/thermosol/train_0-2.jsonl": "{"text":"User: I want to find a chemical with a particular solubility in aqueous pH 7.4 buffer at 20 deg C.\nAssistant: I would need to know the solubility in aqueous pH 7.4 buffer at 20 deg C of the chemical you want to design.\nUser: The solubility in aqueous pH 7.4 buffer at 20 deg C should be 3.204 log(microM).\nAssistant: I propose the chemical with the SMILES COc1cc(ccc1Nc2ncc(Cl)c(n2)c3cnc4ccccn34)C(=O)N5CCOCC5."} {"text":"User: I want to identify a molecule with a particular aqueous solubility in pH 7.4 buffer at 20 deg C.\nAssistant: That sounds interesting, I would need to know the aqueous solubility in pH 7.4 buffer at 20 deg C of the molecule you want to design.\nUser: The aqueous solubility in pH 7.4 buffer at 20 deg C should be 5.940 
log(microM).\nAssistant: I the molecule with the InChI InChI=1S\/C26H27FN4O4S\/c1-14(2)12-31-25-22(23(33)29(3)26(31)35)21(24(34)30-9-7-17(32)13-30)20(36-25)10-15-6-8-28-19-5-4-16(27)11-18(15)19\/h4-6,8,11,14,17,32H,7,9-10,12-13H2,1-3H3\/t17-\/m1\/s1."}", "/scratch/micpie/export/thermosol/test_0-11.jsonl": "{"text":"User: Can you generate the SELFIES of a molecule that has a aqueous solubility in pH 7.4 buffer at 20 deg C of 5.250 log(microM)?\nAssistant: Sure, here you go: ['[O][C][=N][C][=N][C][S][C][=C][Branch1][Branch2][C][C][=C][S][C][=Ring1][Branch1][C][Ring1][=C][=Ring1][#Branch2]']"} {"text":"User: Can you create the SELFIES of a molecule that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 3.255 log(microM)?\nAssistant: Of course, here you go: ['[C][C][C][C][O][C][=N][C][Branch1][C][N][=C][N][C][=Branch1][C][=O][N][Branch2][Ring1][Ring1][C][C][=C][C][=C][C][Branch1][Branch2][C][C][=Branch1][C][=O][O][C][=C][Ring1][O][C][Ring2][Ring1][C][=N][Ring2][Ring1][#Branch1]']"}", "/scratch/micpie/export/thermosol/train_0-7.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the aqueous solubility in pH 7.4 buffer at 20 deg C in log(microM).\nMolecule canonical SMILES: COc1cc(C(=O)N2CCOCC2)ccc1Nc1ncc(Cl)c(-c2cnc3ccccn23)n1\nConstraint: Even if you are not sure, you must answer with a numeric value in log(microM) without using any other words.\nResult: 3.204 log(microM)"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the aqueous solubility in pH 7.4 buffer at 20 deg C in log(microM).\nMolecule canonical SMILES: CC(C)Cn1c(=O)n(C)c(=O)c2c(C(=O)N3CC[C@@H](O)C3)c(Cc3ccnc4ccc(F)cc34)sc21\nConstraint: Even if you are not sure, you must answer with a numeric value in log(microM) without using any additional words.\nResult: 5.940 log(microM)"}", "/scratch/micpie/export/thermosol/train_0-11.jsonl": "{"text":"User: Can you give me the canonical SMILES of a molecule
that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 3.204 log(microM)?\nAssistant: Yes, here you go: COc1cc(C(=O)N2CCOCC2)ccc1Nc1ncc(Cl)c(-c2cnc3ccccn23)n1"} {"text":"User: Can you give me the SMILES of a molecule that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.940 log(microM)?\nAssistant: Sure, here you go: CC(C)CN1C(=O)N(C)C(=O)c2c1sc(Cc3ccnc4ccc(F)cc34)c2C(=O)N5CC[C@@H](O)C5"}", "/scratch/micpie/export/thermosol/train_0-1.jsonl": "{"text":"Question: What is the aqueous solubility in pH 7.4 buffer at 20 deg C of the compound with the InChI InChI=1S\/C23H21ClN6O3\/c1-32-19-12-15(22(31)29-8-10-33-11-9-29)5-6-17(19)27-23-26-13-16(24)21(28-23)18-14-25-20-4-2-3-7-30(18)20\/h2-7,12-14H,8-11H2,1H3,(H,26,27,28)?\nAnswer: 3.204 log(microM)."} {"text":"Question: What is the solubility in aqueous pH 7.4 buffer at 20 deg C of the chemical with the canonical SMILES CC(C)Cn1c(=O)n(C)c(=O)c2c(C(=O)N3CC[C@@H](O)C3)c(Cc3ccnc4ccc(F)cc34)sc21?\nAnswer: 5.940 log(microM)."}", "/scratch/micpie/export/thermosol/train_0-13.jsonl": "{"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should have a solubility in aqueous pH 7.4 buffer at 20 deg C of 3.204 log(microM).\nAssistant: Ok, here you go, this InChI represents a molecule that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 3.204 log(microM): InChI=1S\/C23H21ClN6O3\/c1-32-19-12-15(22(31)29-8-10-33-11-9-29)5-6-17(19)27-23-26-13-16(24)21(28-23)18-14-25-20-4-2-3-7-30(18)20\/h2-7,12-14H,8-11H2,1H3,(H,26,27,28)"} {"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should have a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.940 log(microM).\nAssistant: Got it, here you go, this DeepSMILES represents a molecule that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.940 log(microM): CCC)CNC=O)NC)C=O)cc6scCcccnccccF)cc%106)))))))))))c5C=O)NCC[C@@H]O)C5"}", "/scratch/micpie/export/thermosol/train_0-4.jsonl": "{"text":"Based on the canonical SMILES COc1cc(C(=O)N2CCOCC2)ccc1Nc1ncc(Cl)c(-c2cnc3ccccn23)n1, the molecule has a solubility in aqueous pH 7.4 buffer at 20 deg C of 3.204 log(microM)."} {"text":"Based on the InChI InChI=1S\/C26H27FN4O4S\/c1-14(2)12-31-25-22(23(33)29(3)26(31)35)21(24(34)30-9-7-17(32)13-30)20(36-25)10-15-6-8-28-19-5-4-16(27)11-18(15)19\/h4-6,8,11,14,17,32H,7,9-10,12-13H2,1-3H3\/t17-\/m1\/s1, the molecule has a aqueous solubility in pH 7.4 buffer at 20 deg C of 5.940 log(microM)."}", "/scratch/micpie/export/thermosol/test_0-7.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the solubility in aqueous pH 7.4 buffer at 20 deg C in log(microM).\nMolecule canonical SMILES: Oc1ncnc2scc(-c3ccsc3)c12\nConstraint: Even if you are uncertain, you must answer with a numeric value in log(microM) without using any other words.\nResult: 5.250 log(microM)"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the aqueous solubility in pH 7.4 buffer at 20 deg C in log(microM).\nMolecule InChI: InChI=1S\/C19H23N5O4\/c1-3-4-8-28-18-22-16(20)15-17(23-18)24(19(26)21-15)11-13-7-5-6-12(9-13)10-14(25)27-2\/h5-7,9H,3-4,8,10-11H2,1-2H3,(H,21,26)(H2,20,22,23)\nConstraint: Even if you are not sure, you must answer with a numeric value in log(microM) without using any additional words.\nResult: 3.255 log(microM)"}", "/scratch/micpie/export/thermosol/train_0-9.jsonl": "{"text":"Task: Please generate a molecule SELFIES based on the text description below.\nDescription: A molecule that has a aqueous solubility in 
pH 7.4 buffer at 20 deg C of 3.204 log(microM).\nResult: ['[C][O][C][=C][C][=Branch2][Ring1][P][=C][C][=C][Ring1][=Branch1][N][C][=N][C][=C][Branch1][C][Cl][C][=Branch1][Ring2][=N][Ring1][#Branch1][C][=C][N][=C][C][=C][C][=C][N][Ring1][=Branch2][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1]']"} {"text":"Task: Please give me a molecule SMILES based on the description below.\nDescription: A molecule that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.940 log(microM).\nResult: CC(C)CN1C(=O)N(C)C(=O)c2c1sc(Cc3ccnc4ccc(F)cc34)c2C(=O)N5CC[C@@H](O)C5"}", "/scratch/micpie/export/thermosol/valid_0-3.jsonl": "{"text":"The molecule with the DeepSMILES C[C@@H]CNC[C@H]C)O6)))cncNCCOC[C@@H]6C)))))))cnccnc6n%10)))cccccc6 has a solubility in aqueous pH 7.4 buffer at 20 deg C of 4.111 log(microM)."} {"text":"The molecule with the InChI InChI=1S\/C10H13N5\/c1-2-4-15(5-3-1)10-8-9(12-6-11-8)13-7-14-10\/h6-7H,1-5H2,(H,11,12,13,14) has a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.700 log(microM)."}", "/scratch/micpie/export/thermosol/test_0-8.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the solubility in aqueous pH 7.4 buffer at 20 deg C in log(microM).\nDeepSMILES: Ocncncscccccsc5)))))c95\nConstraint: Even if you are uncertain, you must answer with a numeric value in log(microM) without the unit and without using any additional words.\nResult: 5.250"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the aqueous solubility in pH 7.4 buffer at 20 deg C in log(microM).\nMolecule canonical SMILES: CCCCOc1nc(N)c2[nH]c(=O)n(Cc3cccc(CC(=O)OC)c3)c2n1\nConstraint: Even if you are uncertain, you must answer with a numeric value in log(microM) without the unit and without using any additional words.\nResult: 3.255"}", "/scratch/micpie/export/thermosol/test_0-14.jsonl": "{"text":"User: I want to create a SELFIES.\nAssistant: This sounds very 
exciting. Should it be a special molecule?\nUser: Yes, the molecule should have a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.250 log(microM).\nAssistant: Got it, this SELFIES represents a molecule that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.250 log(microM): ['[O][C][=N][C][=N][C][S][C][=C][Branch1][Branch2][C][C][=C][S][C][=Ring1][Branch1][C][Ring1][=C][=Ring1][#Branch2]']"} {"text":"User: I want to create a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should have a aqueous solubility in pH 7.4 buffer at 20 deg C of 3.255 log(microM).\nAssistant: Understood, this canonical SMILES represents a molecule that has a aqueous solubility in pH 7.4 buffer at 20 deg C of 3.255 log(microM): CCCCOc1nc(N)c2[nH]c(=O)n(Cc3cccc(CC(=O)OC)c3)c2n1"}", "/scratch/micpie/export/thermosol/valid_0-14.jsonl": "{"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should have a aqueous solubility in pH 7.4 buffer at 20 deg C of 4.111 log(microM).\nAssistant: Understood, this DeepSMILES represents a molecule that has a aqueous solubility in pH 7.4 buffer at 20 deg C of 4.111 log(microM): C[C@@H]CNC[C@H]C)O6)))cncNCCOC[C@@H]6C)))))))cnccnc6n%10)))cccccc6"} {"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should have a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.700 log(microM).\nAssistant: Understood, this SMILES represents a molecule that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.700 log(microM): C1CCN(CC1)c2ncnc3[nH]cnc23"}", "/scratch/micpie/export/thermosol/test_0-4.jsonl": "{"text":"Based on the InChI InChI=1S\/C10H6N2OS2\/c13-9-8-7(6-1-2-14-3-6)4-15-10(8)12-5-11-9\/h1-5H,(H,11,12,13), the molecule has a solubility in aqueous pH 7.4 buffer at 20 deg C of 5.250 log(microM)."} {"text":"Based on the canonical SMILES CCCCOc1nc(N)c2[nH]c(=O)n(Cc3cccc(CC(=O)OC)c3)c2n1, the molecule has a aqueous solubility in pH 7.4 buffer at 20 deg C of 3.255 log(microM)."}", "/scratch/micpie/export/thermosol/test_0-12.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that has a aqueous solubility in pH 7.4 buffer at 20 deg C of 5.250 log(microM).\nAssistant: This is a molecule that has a aqueous solubility in pH 7.4 buffer at 20 deg C of 5.250 log(microM): Oc1ncnc2scc(c3ccsc3)c12"} {"text":"User: I'm searching for the canonical SMILES of a molecule that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 3.255 log(microM).\nAssistant: This is a molecule that has a solubility in aqueous pH 7.4 buffer at 20 deg C of 3.255 log(microM): CCCCOc1nc(N)c2[nH]c(=O)n(Cc3cccc(CC(=O)OC)c3)c2n1"}", "/scratch/micpie/export/nr_ahr_tox21/test_0-10.jsonl": "{"text":"User: Can you generate the SMILES of a molecule that is not toxic in the NR-AhR assay?\nAssistant: Of course, here you go: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"User: Can you create the canonical SMILES of a molecule that is not toxic in the NR-AhR assay?\nAssistant: Of course, here you go: CN(C)S(=O)(=O)N(SC(F)(Cl)Cl)c1ccccc1"}", "/scratch/micpie/export/nr_ahr_tox21/valid_0-8.jsonl": "{"text":"User: Can you figure out if the molecule with the SELFIES [C][C][=C][N][=C][Branch1][C][C][C][Branch1][C][C][=N][Ring1][Branch2] is 
toxic in the aryl hydrocarbon receptor assay?\nAssistant: No, this molecule is not toxic in the aryl hydrocarbon receptor assay."} {"text":"User: Can you figure out if the molecule with the DeepSMILES NcncN)cnc-cccccc6))))))cN)nc6n%10 is toxic in the NR-AhR assay?\nAssistant: Yes, this molecule is toxic in the NR-AhR assay."}", "/scratch/micpie/export/nr_ahr_tox21/test_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-AhR assay?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any additional words.\nOptions:\n1 CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C\n2 O=C(O)c1ccc([Hg]Cl)cc1\n3 Nc1ccc(C(=O)NCC(=O)O)cc1\nAnswer: 1, 2, 3"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-AhR assay?\nConstraint: You must select none, one or more options from a, b, or c without using any additional words.\nOptions:\na: CCOCCNCCNCCC)C=O)cccccc6)))))))))CC6)))))))cccccc6\nb: CNC)S=O)=O)NSCF)Cl)Cl)))cccccc6\nc: CCCCCCCCCCC)C=O\nAnswer: a, b, c"}", "/scratch/micpie/export/nr_ahr_tox21/train_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the DeepSMILES CCOccccncSN)=O)=O))sc5c9 is toxic in the NR-AhR assay?\nAssistant: Yes, this molecule is toxic in the NR-AhR assay."} {"text":"User: Can you figure out if the molecule with the DeepSMILES NCCCCCCCN))C6 is toxic in the aryl hydrocarbon receptor assay?\nAssistant: No, this molecule is not toxic in the aryl hydrocarbon receptor assay."}", "/scratch/micpie/export/nr_ahr_tox21/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the aryl hydrocarbon receptor assay.\nDeepSMILES: CCCNCC))CCC))C=O)NccC)cccc6C\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"} {"text":"Task: Please classify a molecule based on the 
description.\nDescription: A molecule that is toxic in the NR-AhR assay.\nMolecule DeepSMILES: CNC)S=O)=O)NSCF)Cl)Cl)))cccccc6\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"}", "/scratch/micpie/export/nr_ahr_tox21/valid_0-9.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C7H10N2\/c1-5-4-8-6(2)7(3)9-5\/h4H,1-3H3 toxic in the aryl hydrocarbon receptor assay?\nAssistant: No, it is not toxic in the aryl hydrocarbon receptor assay."} {"text":"User: Is the molecule with the InChI InChI=1S\/C12H11N7\/c13-9-7(6-4-2-1-3-5-6)16-8-10(14)18-12(15)19-11(8)17-9\/h1-5H,(H6,13,14,15,17,18,19) toxic in the aryl hydrocarbon receptor assay?\nAssistant: Yes, it is toxic in the aryl hydrocarbon receptor assay."}", "/scratch/micpie/export/nr_ahr_tox21/test_0-1.jsonl": "{"text":"The molecule with the canonical SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C is not demonstrating toxicity in the NR-aryl hydrocarbon receptor assay."} {"text":"The molecule with the InChI representation of InChI=1S\/C9H11Cl2FN2O2S2\/c1-13(2)18(15,16)14(17-9(10,11)12)8-6-4-3-5-7-8\/h3-7H,1-2H3 is not demonstrating toxicity in the NR-aryl hydrocarbon receptor assay."}", "/scratch/micpie/export/nr_ahr_tox21/valid_0-0.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C7H10N2\/c1-5-4-8-6(2)7(3)9-5\/h4H,1-3H3 is not toxic in the aryl hydrocarbon receptor assay."} {"text":"The molecule with the SELFIES [N][C][=N][C][Branch1][C][N][=C][N][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Branch1][C][N][=N][C][Ring1][=N][=N][Ring2][Ring1][C] is toxic in the aryl hydrocarbon receptor assay."}", "/scratch/micpie/export/nr_ahr_tox21/test_0-2.jsonl": "{"text":"Based on the InChI InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20), the molecule has no NR-AhR toxicity characteristics."} {"text":"Based on the DeepSMILES CNC)S=O)=O)NSCF)Cl)Cl)))cccccc6, the 
molecule has no aryl hydrocarbon receptor toxicity characteristics."}", "/scratch/micpie/export/nr_ahr_tox21/valid_0-10.jsonl": "{"text":"User: Can you generate the canonical SMILES of a molecule that is not toxic in the NR-AhR assay?\nAssistant: Of course, here you go: Cc1cnc(C)c(C)n1"} {"text":"User: Can you create the InChI of a molecule that is toxic in the NR-AhR assay?\nAssistant: Of course, here you go: InChI=1S\/C12H11N7\/c13-9-7(6-4-2-1-3-5-6)16-8-10(14)18-12(15)19-11(8)17-9\/h1-5H,(H6,13,14,15,17,18,19)"}", "/scratch/micpie/export/nr_ahr_tox21/train_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-AhR assay.\nMolecule SMILES: CCOc1ccc2nc(S(N)(=O)=O)sc2c1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is toxic in the NR-AhR assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the aryl hydrocarbon receptor assay.\nMolecule canonical SMILES: NCC1CCCC(CN)C1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the aryl hydrocarbon receptor assay."}", "/scratch/micpie/export/nr_ahr_tox21/valid_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the aryl hydrocarbon receptor assay.\nMolecule canonical SMILES: Cc1cnc(C)c(C)n1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the aryl hydrocarbon receptor assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-AhR assay.\nSELFIES: [N][C][=N][C][Branch1][C][N][=C][N][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Branch1][C][N][=N][C][Ring1][=N][=N][Ring2][Ring1][C]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is toxic in the NR-AhR assay."}", 
"/scratch/micpie/export/nr_ahr_tox21/test_0-9.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20) toxic in the aryl hydrocarbon receptor assay?\nAssistant: No, it is not toxic in the aryl hydrocarbon receptor assay."} {"text":"User: Is the molecule with the SELFIES [C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][N][Branch1][#Branch2][S][C][Branch1][C][F][Branch1][C][Cl][Cl][C][=C][C][=C][C][=C][Ring1][=Branch1] toxic in the aryl hydrocarbon receptor assay?\nAssistant: No, it is not toxic in the aryl hydrocarbon receptor assay."}", "/scratch/micpie/export/nr_ahr_tox21/test_0-0.jsonl": "{"text":"The molecule with the SELFIES [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C] is not toxic in the NR-AhR assay."} {"text":"The molecule with the InChI InChI=1S\/C9H11Cl2FN2O2S2\/c1-13(2)18(15,16)14(17-9(10,11)12)8-6-4-3-5-7-8\/h3-7H,1-2H3 is not toxic in the aryl hydrocarbon receptor assay."}", "/scratch/micpie/export/nr_ahr_tox21/valid_0-7.jsonl": "{"text":"Task: Please generate a DeepSMILES based on the text description.\nDescription: A molecule that is toxic in the aryl hydrocarbon receptor assay.\nResult: CccncC)cC)n6"} {"text":"Task: Please give me a molecule DeepSMILES based on the text description.\nDescription: A molecule that is toxic in the NR-AhR assay.\nResult: NcncN)cnc-cccccc6))))))cN)nc6n%10"}", "/scratch/micpie/export/nr_ahr_tox21/test_0-3.jsonl": "{"text":"The InChI InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20) is from a molecule that is not identified as toxic in the NR-AhR assay."} {"text":"The SMILES CN(C)S(=O)(=O)N(SC(F)(Cl)Cl)c1ccccc1 represents a molecule that is not identified as toxic in the NR-AhR assay."}", "/scratch/micpie/export/nr_ahr_tox21/valid_0-11.jsonl": 
"{"text":"User: I'm searching for the SMILES of a molecule that is not toxic in the NR-AhR assay?\nAssistant: This is a molecule that is not toxic in the NR-AhR assay: Cc1cnc(C)c(C)n1"} {"text":"User: I'm looking for the InChI of a molecule that is toxic in the NR-AhR assay?\nAssistant: This is a molecule that is toxic in the NR-AhR assay: InChI=1S\/C12H11N7\/c13-9-7(6-4-2-1-3-5-6)16-8-10(14)18-12(15)19-11(8)17-9\/h1-5H,(H6,13,14,15,17,18,19)"}", "/scratch/micpie/export/nr_ahr_tox21/train_0-0.jsonl": "{"text":"The molecule with the SELFIES [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N] is toxic in the aryl hydrocarbon receptor assay."} {"text":"The molecule with the DeepSMILES representation of NCCCCCCCN))C6 is not toxic in the NR-AhR assay."}", "/scratch/micpie/export/nr_ahr_tox21/test_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the aryl hydrocarbon receptor assay.\ncanonical SMILES: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the aryl hydrocarbon receptor assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the aryl hydrocarbon receptor assay.\ncanonical SMILES: CN(C)S(=O)(=O)N(SC(F)(Cl)Cl)c1ccccc1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the aryl hydrocarbon receptor assay."}", "/scratch/micpie/export/nr_ahr_tox21/train_0-10.jsonl": "{"text":"User: Can you create the SMILES of a molecule that is toxic in the aryl hydrocarbon receptor assay?\nAssistant: Yes, I'm happy to help, here you go: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} {"text":"User: Can you give me the canonical SMILES of a molecule that is not toxic in the NR-AhR assay?\nAssistant: Sure, here you go: NCC1CCCC(CN)C1"}", 
"/scratch/micpie/export/nr_ahr_tox21/train_0-3.jsonl": "{"text":"The SMILES CCOc1ccc2nc(S(N)(=O)=O)sc2c1 represents a molecule that is identified as toxic in the aryl hydrocarbon receptor assay."} {"text":"The canonical SMILES NCC1CCCC(CN)C1 is from a molecule that is not identified as toxic in the NR-AhR assay."}", "/scratch/micpie/export/nr_ahr_tox21/train_0-12.jsonl": "{"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should be toxic in the aryl hydrocarbon receptor assay.\nAssistant: Got it, this SELFIES is toxic in the aryl hydrocarbon receptor assay: [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N]"} {"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very curious. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be toxic in the NR-AhR assay.\nAssistant: Got it, here you go, this canonical SMILES is not toxic in the NR-AhR assay: NCC1CCCC(CN)C1"}", "/scratch/micpie/export/nr_ahr_tox21/test_0-13.jsonl": "{"text":"User: I want to generate a SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the aryl hydrocarbon receptor assay.\nAssistant: Got it, this SMILES is not toxic in the aryl hydrocarbon receptor assay: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"User: I want to generate a molecule InChI.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the aryl hydrocarbon receptor assay.\nAssistant: Ok, this InChI is not toxic in the aryl hydrocarbon receptor assay: InChI=1S\/C9H11Cl2FN2O2S2\/c1-13(2)18(15,16)14(17-9(10,11)12)8-6-4-3-5-7-8\/h3-7H,1-2H3"}", "/scratch/micpie/export/nr_ahr_tox21/valid_0-2.jsonl": "{"text":"Based on the DeepSMILES representation CccncC)cC)n6, the molecule has no NR-AhR toxicity characteristics."} {"text":"Based on the canonical SMILES representation Nc1nc(N)c2nc(-c3ccccc3)c(N)nc2n1, the molecule has NR-AhR toxicity characteristics."}", "/scratch/micpie/export/nr_ahr_tox21/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES CCOc1ccc2nc(S(N)(=O)=O)sc2c1 toxic in the NR-AhR assay?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any additional words.\nOptions:\n1 True\n2 False\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of NCC1CCCC(CN)C1 toxic in the NR-AhR assay?\nConstraint: Even if you are not sure, you must pick either a or b without using any other words.\nOptions:\na. False\nb. True\nAnswer: a"}", "/scratch/micpie/export/nr_ahr_tox21/valid_0-1.jsonl": "{"text":"The molecule with the SMILES representation of Cc1cnc(C)c(C)n1 is not exhibiting toxicity in the NR-AhR assay."} {"text":"The molecule with the InChI InChI=1S\/C12H11N7\/c13-9-7(6-4-2-1-3-5-6)16-8-10(14)18-12(15)19-11(8)17-9\/h1-5H,(H6,13,14,15,17,18,19) is displaying toxicity in the NR-AhR assay."}", "/scratch/micpie/export/nr_ahr_tox21/valid_0-13.jsonl": "{"text":"User: I want to generate a molecule canonical SMILES.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should not be toxic in the aryl hydrocarbon receptor assay.\nAssistant: Got it, this canonical SMILES is not toxic in the aryl hydrocarbon receptor assay: Cc1cnc(C)c(C)n1"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should be toxic in the NR-AhR assay.\nAssistant: Got it, this InChI is toxic in the NR-AhR assay: InChI=1S\/C12H11N7\/c13-9-7(6-4-2-1-3-5-6)16-8-10(14)18-12(15)19-11(8)17-9\/h1-5H,(H6,13,14,15,17,18,19)"}", "/scratch/micpie/export/nr_ahr_tox21/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the aryl hydrocarbon receptor assay.\nMolecule DeepSMILES: CccncC)cC)n6\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the aryl hydrocarbon receptor assay.\nInChI: InChI=1S\/C12H11N7\/c13-9-7(6-4-2-1-3-5-6)16-8-10(14)18-12(15)19-11(8)17-9\/h1-5H,(H6,13,14,15,17,18,19)\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: True"}", "/scratch/micpie/export/nr_ahr_tox21/train_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are toxic in the aryl hydrocarbon receptor assay?\nConstraint: You must select none, one or more options from a or b without using any other words.\nOptions:\na. CCOccccncSN)=O)=O))sc5c9\nb. ClCCCl)CCl\nAnswer: a"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-AhR assay?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any other words.\nOptions:\n1.) NCCCCCCCN))C6\n2.) CCCCCC=CCC=O)OC\n3.) 
CCCcnnnn5CC%10\nAnswer: 1, 2, 3"}", "/scratch/micpie/export/nr_ahr_tox21/valid_0-4.jsonl": "{"text":"The molecule SMILES Cc1cnc(C)c(C)n1 is not toxic in the NR-AhR assay."} {"text":"The canonical SMILES Nc1nc(N)c2nc(-c3ccccc3)c(N)nc2n1 is toxic in the aryl hydrocarbon receptor assay."}", "/scratch/micpie/export/nr_ahr_tox21/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the aryl hydrocarbon receptor assay.\nSELFIES: [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the aryl hydrocarbon receptor assay.\nSELFIES: [N][C][C][C][C][C][C][Branch1][Ring1][C][N][C][Ring1][Branch2]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/nr_ahr_tox21/valid_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-AhR assay?\nConstraint: You must select none, one or more options from 1, 2, 3, 4, or 5 without using any other words.\nOptions:\n1. InChI=1S\/C6H15N\/c1-4-7(5-2)6-3\/h4-6H2,1-3H3\n2. InChI=1S\/C11H15NO2\/c1-8(12-2)5-9-3-4-10-11(6-9)14-7-13-10\/h3-4,6,8,12H,5,7H2,1-2H3\n3. InChI=1S\/C7H10N2\/c1-5-4-8-6(2)7(3)9-5\/h4H,1-3H3\n4. InChI=1S\/Mo.3O\n5. 
InChI=1S\/C22H17N3O5\/c1-27-13-17(22(26)28-2)16-8-4-6-10-19(16)30-21-11-20(24-14-25-21)29-18-9-5-3-7-15(18)12-23\/h3-11,13-14H,1-2H3\/b17-13+\nAnswer: 1, 2, 3, 4, 5"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are toxic in the NR-AhR assay?\nConstraint: You must select none, one or more options from A or B without using any additional words.\nOptions:\nA. Nc1nc(N)c2nc(-c3ccccc3)c(N)nc2n1\nB. Cc1c(O)cccc1C(=O)N[C@@H](CSc1ccccc1)[C@H](O)CN1C[C@H]2CCCC[C@H]2C[C@H]1C(=O)NC(C)(C)C\nAnswer: A"}", "/scratch/micpie/export/nr_ahr_tox21/valid_0-12.jsonl": "{"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very curious. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be toxic in the NR-AhR assay.\nAssistant: Got it, here you go, this SMILES is not toxic in the NR-AhR assay: Cc1cnc(C)c(C)n1"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very curious. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should be toxic in the aryl hydrocarbon receptor assay.\nAssistant: Ok, here you go, this InChI is toxic in the aryl hydrocarbon receptor assay: InChI=1S\/C12H11N7\/c13-9-7(6-4-2-1-3-5-6)16-8-10(14)18-12(15)19-11(8)17-9\/h1-5H,(H6,13,14,15,17,18,19)"}", "/scratch/micpie/export/nr_ahr_tox21/train_0-2.jsonl": "{"text":"Based on the InChI representation InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13), the molecule has aryl hydrocarbon receptor toxicity characteristics."} {"text":"Based on the DeepSMILES NCCCCCCCN))C6, the molecule has no aryl hydrocarbon receptor toxicity features."}", "/scratch/micpie/export/nr_ahr_tox21/test_0-11.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that is not toxic in the aryl hydrocarbon receptor assay?\nAssistant: This is a molecule that is not toxic in the aryl hydrocarbon receptor assay: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"User: I'm searching for the InChI of a molecule that is not toxic in the NR-AhR assay?\nAssistant: This is a molecule that is not toxic in the NR-AhR assay: InChI=1S\/C9H11Cl2FN2O2S2\/c1-13(2)18(15,16)14(17-9(10,11)12)8-6-4-3-5-7-8\/h3-7H,1-2H3"}", "/scratch/micpie/export/nr_ahr_tox21/train_0-7.jsonl": "{"text":"Task: Please create a molecule canonical SMILES based on the description.\nDescription: A molecule that is toxic in the NR-AhR assay.\nResult: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} {"text":"Task: Please create a molecule DeepSMILES based on the text description below.\nDescription: A molecule that is toxic in the aryl hydrocarbon receptor assay.\nResult: NCCCCCCCN))C6"}", "/scratch/micpie/export/nr_ahr_tox21/train_0-11.jsonl": "{"text":"User: I'm looking for the DeepSMILES of a molecule that is toxic in the aryl hydrocarbon receptor assay?\nAssistant: This is a molecule that is toxic in the aryl hydrocarbon receptor assay: CCOccccncSN)=O)=O))sc5c9"} {"text":"User: I'm searching for the canonical SMILES of a molecule that is not 
toxic in the NR-AhR assay?\nAssistant: This is a molecule that is not toxic in the NR-AhR assay: NCC1CCCC(CN)C1"}", "/scratch/micpie/export/nr_ahr_tox21/train_0-1.jsonl": "{"text":"The molecule with the InChI representation of InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13) is exhibiting toxicity in the NR-AhR assay."} {"text":"The molecule with the SELFIES [N][C][C][C][C][C][C][Branch1][Ring1][C][N][C][Ring1][Branch2] is not demonstrating toxicity in the NR-aryl hydrocarbon receptor assay."}", "/scratch/micpie/export/nr_ahr_tox21/train_0-13.jsonl": "{"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should be toxic in the NR-AhR assay.\nAssistant: Got it, this canonical SMILES is toxic in the NR-AhR assay: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} {"text":"User: I want to generate a SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the NR-AhR assay.\nAssistant: Ok, this SMILES is not toxic in the NR-AhR assay: NCC1CCCC(CN)C1"}", "/scratch/micpie/export/nr_ahr_tox21/train_0-4.jsonl": "{"text":"The SELFIES [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N] is toxic in the NR-AhR assay."} {"text":"The molecule InChI InChI=1S\/C8H18N2\/c9-5-7-2-1-3-8(4-7)6-10\/h7-8H,1-6,9-10H2 is not toxic in the NR-AhR assay."}", "/scratch/micpie/export/nr_ahr_tox21/test_0-7.jsonl": "{"text":"Task: Please create a SMILES based on the description.\nDescription: A molecule that is toxic in the aryl hydrocarbon receptor assay.\nResult: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"Task: Please give me a molecule SMILES based on the description.\nDescription: A molecule that is toxic in the NR-AhR assay.\nResult: CN(C)S(=O)(=O)N(SC(F)(Cl)Cl)c1ccccc1"}", "/scratch/micpie/export/nr_ahr_tox21/train_0-9.jsonl": 
"{"text":"User: Is the molecule with the SELFIES [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N] toxic in the NR-AhR assay?\nAssistant: Yes, it is toxic in the NR-AhR assay."} {"text":"User: Is the molecule with the SMILES NCC1CCCC(CN)C1 toxic in the NR-AhR assay?\nAssistant: No, it is not toxic in the NR-AhR assay."}", "/scratch/micpie/export/nr_ahr_tox21/valid_0-3.jsonl": "{"text":"The SELFIES [C][C][=C][N][=C][Branch1][C][C][C][Branch1][C][C][=N][Ring1][Branch2] represents a molecule that is not identified as toxic in the NR-AhR assay."} {"text":"The InChI InChI=1S\/C12H11N7\/c13-9-7(6-4-2-1-3-5-6)16-8-10(14)18-12(15)19-11(8)17-9\/h1-5H,(H6,13,14,15,17,18,19) represents a molecule that is identified as toxic in the aryl hydrocarbon receptor assay."}", "/scratch/micpie/export/nr_ahr_tox21/test_0-8.jsonl": "{"text":"User: Can you estimate if the molecule with the canonical SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C is toxic in the aryl hydrocarbon receptor assay?\nAssistant: No, this molecule is not toxic in the aryl hydrocarbon receptor assay."} {"text":"User: Can you figure out if the molecule with the InChI InChI=1S\/C9H11Cl2FN2O2S2\/c1-13(2)18(15,16)14(17-9(10,11)12)8-6-4-3-5-7-8\/h3-7H,1-2H3 is toxic in the NR-AhR assay?\nAssistant: No, this molecule is not toxic in the NR-AhR assay."}", "/scratch/micpie/export/nr_ahr_tox21/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C] toxic in the NR-AhR assay?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any additional words.\nOptions:\n1: True\n2: False\nAnswer: 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of 
[C][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][N][Branch1][#Branch2][S][C][Branch1][C][F][Branch1][C][Cl][Cl][C][=C][C][=C][C][=C][Ring1][=Branch1] toxic in the aryl hydrocarbon receptor assay?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any additional words.\nOptions:\n1.) True\n2.) False\nAnswer: 2"}", "/scratch/micpie/export/nr_ahr_tox21/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI representation of InChI=1S\/C7H10N2\/c1-5-4-8-6(2)7(3)9-5\/h4H,1-3H3 toxic in the NR-AhR assay?\nConstraint: Even if you are not sure, you must pick either A or B without using any additional words.\nOptions:\n(A) False\n(B) True\nAnswer: A"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES Nc1nc(N)c2nc(-c3ccccc3)c(N)nc2n1 toxic in the NR-AhR assay?\nConstraint: Even if you are uncertain, you must pick either A or B without using any other words.\nOptions:\n[A] True\n[B] False\nAnswer: A"}", "/scratch/micpie/export/nr_ahr_tox21/test_0-4.jsonl": "{"text":"The DeepSMILES CCCNCC))CCC))C=O)NccC)cccc6C is not toxic in the NR-AhR assay."} {"text":"The molecule SMILES CN(C)S(=O)(=O)N(SC(F)(Cl)Cl)c1ccccc1 is not toxic in the aryl hydrocarbon receptor assay."}", "/scratch/micpie/export/nr_ahr_tox21/test_0-12.jsonl": "{"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be toxic in the NR-AhR assay.\nAssistant: Got it, this InChI is not toxic in the NR-AhR assay: InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20)"} {"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should not be toxic in the aryl hydrocarbon receptor assay.\nAssistant: Got it, here you go, this SMILES is not toxic in the aryl hydrocarbon receptor assay: CN(C)S(=O)(=O)N(SC(F)(Cl)Cl)c1ccccc1"}", "/scratch/micpie/export/bio_ner_40/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: (A) Immunoblot and densitometric analysis of p53\/p21 signaling [ total p53, p53-P (ser18) and p21] in cerebellum of Ptc1+\/-, Rad54-\/-\/ Ptc1+\/-, Parp-1-\/-\/ Ptc1+\/-, and Rad54-\/-\/ Parp-1-\/-\/ Ptc1+\/-mice at P1, and (B) in MEFs from Ptc1+\/-and Rad54-\/-\/ Parp-1-\/-\/ Ptc1+\/-mice (two independent cell lines per genotype); β-actin and HSP70 were used as loading controls. (C) Growth kinetics and (D) quantification of SA-β-GAL staining in MEFs from Ptc1+\/-and Rad54-\/-\/ Parp-1-\/-\/ Ptc1+\/-mouse embryos. (E) Representative images of SA-β-GAL staining in MEFs from Ptc1+\/-, and (F) and Rad54-\/-\/ Parp-1-\/-\/ Ptc1+\/-mouse embryos. 
(G-L) Proliferation (Ki67) and (M-R) DNA damage (γ-H2AX) in different tissues from Ptc1+\/-and Rad54-\/-\/ Parp-1-\/-\/ Ptc1+\/-neonatal mice at P1..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: p53,46,49,Gene\/Protein\np21,52,55,Gene\/Protein\np53,74,77,Gene\/Protein\np53,79,82,Gene\/Protein\np21,100,103,Gene\/Protein\nPtc1,122,126,Gene\/Protein\nRad54,132,137,Gene\/Protein\nPtc1,143,147,Gene\/Protein\nParp - 1,153,161,Gene\/Protein\nPtc1,167,171,Gene\/Protein\nRad54,181,186,Gene\/Protein\nParp - 1,192,200,Gene\/Protein\nPtc1,206,210,Gene\/Protein\nmice,215,219,Organism\/Species\nPtc1,249,253,Gene\/Protein\nRad54,262,267,Gene\/Protein\nParp - 1,273,281,Gene\/Protein\nPtc1,287,291,Gene\/Protein\nmice,296,300,Organism\/Species\nβ - actin,346,355,Gene\/Protein\nHSP70,360,365,Gene\/Protein\nβ - GAL,450,457,Gene\/Protein\nPtc1,480,484,Gene\/Protein\nRad54,493,498,Gene\/Protein\nParp - 1,504,512,Gene\/Protein\nPtc1,518,522,Gene\/Protein\nmouse,527,532,Organism\/Species\nβ - GAL,577,584,Gene\/Protein\nPtc1,607,611,Gene\/Protein\nRad54,630,635,Gene\/Protein\nParp - 1,641,649,Gene\/Protein\nPtc1,655,659,Gene\/Protein\nmouse,664,669,Organism\/Species\nKi67,704,708,Gene\/Protein\nH2AX,740,744,Gene\/Protein\nPtc1,772,776,Gene\/Protein\nRad54,785,790,Gene\/Protein\nParp - 1,796,804,Gene\/Protein\nPtc1,810,814,Gene\/Protein\nmice,828,832,Organism\/Species"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: (A) Immunoblot and densitometric analysis of p53\/p21 signaling [ total p53, p53-P (ser18) and p21] in cerebellum of Ptc1+\/-, Rad54-\/-\/ Ptc1+\/-, Parp-1-\/-\/ Ptc1+\/-, and Rad54-\/-\/ Parp-1-\/-\/ Ptc1+\/-mice at P1, and (B) in MEFs from Ptc1+\/-and Rad54-\/-\/ Parp-1-\/-\/ Ptc1+\/-mice (two independent cell lines per genotype); β-actin and HSP70 were used as loading controls. 
(C) Growth kinetics and (D) quantification of SA-β-GAL staining in MEFs from Ptc1+\/-and Rad54-\/-\/ Parp-1-\/-\/ Ptc1+\/-mouse embryos. (E) Representative images of SA-β-GAL staining in MEFs from Ptc1+\/-, and (F) and Rad54-\/-\/ Parp-1-\/-\/ Ptc1+\/-mouse embryos. (G-L) Proliferation (Ki67) and (M-R) DNA damage (γ-H2AX) in different tissues from Ptc1+\/-and Rad54-\/-\/ Parp-1-\/-\/ Ptc1+\/-neonatal mice at P1..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: p53,46,49,Gene\/Protein\np21,52,55,Gene\/Protein\np53,74,77,Gene\/Protein\np53,79,82,Gene\/Protein\np21,100,103,Gene\/Protein\nPtc1,122,126,Gene\/Protein\nRad54,132,137,Gene\/Protein\nPtc1,143,147,Gene\/Protein\nParp - 1,153,161,Gene\/Protein\nPtc1,167,171,Gene\/Protein\nRad54,181,186,Gene\/Protein\nParp - 1,192,200,Gene\/Protein\nPtc1,206,210,Gene\/Protein\nmice,215,219,Organism\/Species\nPtc1,249,253,Gene\/Protein\nRad54,262,267,Gene\/Protein\nParp - 1,273,281,Gene\/Protein\nPtc1,287,291,Gene\/Protein\nmice,296,300,Organism\/Species\nβ - actin,346,355,Gene\/Protein\nHSP70,360,365,Gene\/Protein\nβ - GAL,450,457,Gene\/Protein\nPtc1,480,484,Gene\/Protein\nRad54,493,498,Gene\/Protein\nParp - 1,504,512,Gene\/Protein\nPtc1,518,522,Gene\/Protein\nmouse,527,532,Organism\/Species\nβ - GAL,577,584,Gene\/Protein\nPtc1,607,611,Gene\/Protein\nRad54,630,635,Gene\/Protein\nParp - 1,641,649,Gene\/Protein\nPtc1,655,659,Gene\/Protein\nmouse,664,669,Organism\/Species\nKi67,704,708,Gene\/Protein\nH2AX,740,744,Gene\/Protein\nPtc1,772,776,Gene\/Protein\nRad54,785,790,Gene\/Protein\nParp - 1,796,804,Gene\/Protein\nPtc1,810,814,Gene\/Protein\nmice,828,832,Organism\/Species"}", "/scratch/micpie/export/core_mof_no_topo/train_0-17.jsonl": "{"text":"Question: What is the methane uptake at 298 K and 6500000 Pa as obtained from grand canonical Monte Carlo simulations of the metal-organic framework with the following CIF file 
[CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n 
O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n?\nAnswer: The methane uptake at 298 K and 6500000 Pa as obtained from grand canonical Monte Carlo simulations is 4.4 mol\/kg."} {"text":"Question: What is the CH4 uptake at 298 K and 6500000 Pa as obtained from grand canonical Monte Carlo simulations of the metal-organic framework with the following CIF file [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 'Al4 H12 C16 O20'\n_cell_volume 990.432\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 0.866 1\n C C25 1 0.446 
0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n?\nAnswer: The CH4 uptake at 298 K and 6500000 Pa as obtained from grand canonical Monte Carlo simulations is 6.3 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/train_0-16.jsonl": "{"text":"Question: What is the simulated methane uptake at 298 K and 580000 Pa of the metal-organic framework with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 
0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n?\nAnswer: The simulated methane uptake at 298 K and 580000 Pa is 2.9 mol\/kg."} {"text":"Question: What is the simulated methane uptake at 298 K and 580000 Pa of the MOF with the following CIF file [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 'Al4 H12 C16 O20'\n_cell_volume 990.432\n_cell_formula_units_Z 
4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 0.866 1\n C C25 1 0.446 0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n?\nAnswer: The simulated methane uptake at 298 K and 580000 Pa is 5.0 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/test_0-10.jsonl": "{"text":"Question: What linker 
molecules are used in the reticular material with the following crystal structure in CIF format [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 
0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n?\nAnswer: The SMILES of the linker molecules are [O-]C(=O)c1ccc(s1)C(=O)[O-]."} {"text":"Question: What linker molecules are present in the metal-organic framework (MOF) with the following CIF file [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 
1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n?\nAnswer: The SMILES of the linker molecules are [O-]C(=O)Cc1cc(CC(=O)[O-])cc(c1)CC(=O)[O-]."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-8.jsonl": "{"text":"The metal-organic framework with the following CIF file [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 
0.680 0.660 1\n C C9 1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n has a logarithm of Henry's constant for CH4 obtained from Widom insertion simulations of -4.55 log(mol\/kg\/Pa)."} {"text":"The MOF with the following CIF file [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n 
_symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n has a logarithm of Henry's constant for methane obtained from Widom 
insertion simulations of -4.55 log(mol\/kg\/Pa)."}", "/scratch/micpie/export/core_mof_no_topo/test_0-22.jsonl": "{"text":"User: I just created a metal-organic framework (MOF) with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 
0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n. What is the heat of adsorption of methane (computed using the Widom insertion technique)?\nAssistant: The heat of adsorption of methane (computed using the Widom insertion technique) is -17.43 kJ\/mol. Is there anything else I can do for you?\nUser: Yeah, I would like to know the carbon dioxide uptake at 298 K and 15000 Pa as obtained from grand canonical Monte Carlo simulations.\nAssistant: The carbon dioxide uptake at 298 K and 15000 Pa as obtained from grand canonical Monte Carlo simulations is 3.1 mol\/kg."} {"text":"User: I just made a MOF with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n 
H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n. What is the simulated heat of adsorption of methane (obtained using Widom insertions)?\nAssistant: The simulated heat of adsorption of methane (obtained using Widom insertions) is -21.63 kJ\/mol. 
Is there anything else I can do for you?\nUser: Indeed, I would like to know the simulated carbon dioxide uptake at 298 K and 15000 Pa.\nAssistant: The simulated carbon dioxide uptake at 298 K and 15000 Pa is 5.4 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/test_0-16.jsonl": "{"text":"Question: What is the simulated CH4 uptake at 298 K and 580000 Pa of the reticular material with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 
1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n?\nAnswer: The simulated CH4 uptake at 298 K and 580000 Pa is 2.9 mol\/kg."} {"text":"Question: What is the methane uptake at 298 K and 5.8 bar as obtained from GCMC simulations of the reticular material with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 
0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n?\nAnswer: The methane uptake at 298 K and 5.8 bar as obtained from GCMC simulations is 2.0 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/test_0-15.jsonl": "{"text":"Question: What is the CO2 uptake at 298 K and 1600000 Pa as obtained from grand canonical Monte Carlo simulations of the reticular material with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 
4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n?\nAnswer: The CO2 uptake at 298 K and 1600000 Pa as 
obtained from grand canonical Monte Carlo simulations is 5.6 mol\/kg."} {"text":"Question: What is the carbon dioxide uptake at 298 K and 1600000 Pa as obtained from GCMC simulations of the metal-organic framework with the following CIF file [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C 
C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n?\nAnswer: The carbon dioxide uptake at 298 K and 1600000 Pa as obtained from GCMC simulations is 7.0 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/train_0-8.jsonl": "{"text":"The MOF with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 
0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n has a logarithm of Henry's constant for methane obtained from Widom insertion simulations of -4.70 log(mol\/kg\/Pa)."} {"text":"The reticular material with the following CIF file [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 'Al4 H12 C16 O20'\n_cell_volume 990.432\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 
1 0.730 0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 0.866 1\n C C25 1 0.446 0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n has a logarithm of Henry's constant for methane obtained from Widom insertion simulations of -4.14 log(mol\/kg\/Pa)."}", "/scratch/micpie/export/core_mof_no_topo/test_0-5.jsonl": "{"text":"The reticular material with the following crystal structure in CIF format [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 
'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 1\n O O55 1 0.329 0.881 0.938 
1\n[\/CIF]\n has a CH4 uptake at 298 K and 580000 Pa as obtained from GCMC simulations of 2.9 mol\/kg."} {"text":"The reticular material with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 
1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n has a CH4 uptake at 298 K and 580000 Pa as obtained from GCMC simulations of 2.0 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-9.jsonl": "{"text":"The reticular material with the following crystal structure in CIF format [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 0.660 1\n C C9 1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C 
C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n has a deliverable capacity of CH4 obtained from GCMC simulations between 5.8 bar and 65 bar at 298 K of 31.709 vSTP\/v."} {"text":"The MOF with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 
1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n has a deliverable capacity of CH4 (between 5.8 bar and 65 bar at 298 K) obtained from GCMC simulations of 42.994 vSTP\/v."}", "/scratch/micpie/export/core_mof_no_topo/test_0-19.jsonl": "{"text":"Question: What is the logarithm of Henry's constant for methane obtained from Widom insertion simulations of the metal-organic framework with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 
11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 
0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n?\nAnswer: The logarithm of Henry's constant for methane obtained from Widom insertion simulations is -4.64 log(mol\/kg\/Pa)."} {"text":"Question: What is the logarithm of Henry's constant for methane obtained from Widom insertion simulations of the reticular material with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 
0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n?\nAnswer: The logarithm of Henry's constant for methane obtained from Widom insertion simulations is -4.55 log(mol\/kg\/Pa)."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-24.jsonl": "{"text":"User: I need to synthesize a metal-organic framework with a deliverable capacity of methane (between 5.8 bar and 65 bar at 298 K) obtained from GCMC simulations of 31.709 vSTP\/v. What building blocks do I need to combine?\nAssistant: I'd recommend linkers with SMILES nan and nodes with SMILES nan. Is there anything else I can do for you?\nUser: I would like to know the CO2 uptake at 298 K and 0.15 bar as obtained from GCMC simulations.\nAssistant: The CO2 uptake at 298 K and 0.15 bar as obtained from GCMC simulations is 4.7 mol\/kg. Is there anything else I can do for you?\nUser: I would like to know the methane uptake at 298 K and 580000 Pa as obtained from GCMC simulations.\nAssistant: The methane uptake at 298 K and 580000 Pa as obtained from GCMC simulations is 4.2 mol\/kg. 
Is there anything else I can do for you?\nUser: Nope, that's all I need to know.\nAssistant: Anytime."} {"text":"User: I have to synthesize a metal-organic framework (MOF) with a deliverable capacity of CH4 (between 5.8 bar and 65 bar at 298 K) obtained from GCMC simulations of 42.994 vSTP\/v. What linkers and nodes do I need to combine?\nAssistant: I'd go for linkers with SMILES [O-]C(=O)Cc1cc(CC(=O)[O-])cc(c1)CC(=O)[O-] and nodes with SMILES [Sm]. Is there anything else you want to know?\nUser: Yeah, I would like to know the CO2 uptake at 298 K and 15000 Pa as obtained from grand canonical Monte Carlo simulations.\nAssistant: The CO2 uptake at 298 K and 15000 Pa as obtained from grand canonical Monte Carlo simulations is 5.4 mol\/kg. Is there anything else you want to know?\nUser: Yeah, I would like to know the CH4 uptake at 298 K and 580000 Pa as obtained from GCMC simulations.\nAssistant: The CH4 uptake at 298 K and 580000 Pa as obtained from GCMC simulations is 2.0 mol\/kg. Is there anything else you want to know?\nUser: No thank you, that's all I need to know.\nAssistant: "}", "/scratch/micpie/export/core_mof_no_topo/train_0-24.jsonl": "{"text":"User: I must synthesize a metal-organic framework with a deliverable capacity of CH4 (between 5.8 bar and 65 bar at 298 K) obtained from GCMC simulations of 60.337 vSTP\/v. What linkers and nodes do I need to combine?\nAssistant: I'd go for linkers with SMILES [O-]C(=O)c1ccc(s1)C(=O)[O-] and nodes with SMILES [Sr]. Is there anything else I can do for you?\nUser: Indeed, I would like to know the simulated carbon dioxide uptake at 298 K and 15000 Pa.\nAssistant: The simulated carbon dioxide uptake at 298 K and 15000 Pa is 3.3 mol\/kg. Is there anything else I can do for you?\nUser: Indeed, I would like to know the simulated methane uptake at 298 K and 580000 Pa.\nAssistant: The simulated methane uptake at 298 K and 580000 Pa is 2.9 mol\/kg. 
Is there anything else I can do for you?\nUser: No thank you, that's all I need to know.\nAssistant: Anytime."} {"text":"User: I want to create a MOF with a deliverable capacity (between 5.8 bar and 65 bar at 298 K) of methane obtained from GCMC simulations of 38.406 vSTP\/v. What building blocks do I need to combine?\nAssistant: I'd go for linkers with SMILES [O-]C(=O)C=CC(=O)[O-] and nodes with SMILES [Al], [OH]. Is there anything else you want to know?\nUser: Yes, I would like to know the CO2 uptake at 298 K and 0.15 bar as obtained from GCMC simulations.\nAssistant: The CO2 uptake at 298 K and 0.15 bar as obtained from GCMC simulations is 3.6 mol\/kg. Is there anything else you want to know?\nUser: Yes, I would like to know the methane uptake at 298 K and 580000 Pa as obtained from grand canonical Monte Carlo simulations.\nAssistant: The methane uptake at 298 K and 580000 Pa as obtained from grand canonical Monte Carlo simulations is 5.0 mol\/kg. Is there anything else you want to know?\nUser: No thank you, that's all I need to know.\nAssistant: "}", "/scratch/micpie/export/core_mof_no_topo/test_0-1.jsonl": "{"text":"The reticular material with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 
0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n has a heat of adsorption of carbon dioxide (computed using the Widom insertion technique) of -33.81 kJ\/mol."} {"text":"The MOF with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 
O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n has a heat 
of adsorption of carbon dioxide (computed using the Widom insertion technique) of -53.39 kJ\/mol."}", "/scratch/micpie/export/core_mof_no_topo/test_0-18.jsonl": "{"text":"Question: What is the logarithm of Henry's constant for carbon dioxide obtained from Widom insertion simulations of the metal-organic framework (MOF) with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 
0.281 0.784 0.817 1\n C C35 1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n?\nAnswer: The logarithm of Henry's constant for carbon dioxide obtained from Widom insertion simulations is -2.85 log(mol\/kg\/Pa)."} {"text":"Question: What is the logarithm of Henry's constant for CO2 obtained from Widom insertion simulations of the MOF with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 
1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n?\nAnswer: The logarithm of Henry's constant for CO2 obtained from Widom insertion simulations is -0.78 log(mol\/kg\/Pa)."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-0.jsonl": "{"text":"The reticular material with the following crystal structure in CIF format [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n 
_atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 0.660 1\n C C9 1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n is build from linker molecules with the SMILES nan and nodes with the SMILES nan."} {"text":"The metal-organic framework (MOF) with the following 
CIF file [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 
1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n is build from linker molecules with the SMILES [O-]C(=O)Cc1cc(CC(=O)[O-])cc(c1)CC(=O)[O-] and nodes with the SMILES [Sm]."}", "/scratch/micpie/export/core_mof_no_topo/test_0-21.jsonl": "{"text":"User: I have a metal-organic framework (MOF) with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 
0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n. What is the heat of adsorption of carbon dioxide (computed using the Widom insertion technique)?\nAssistant: The heat of adsorption of carbon dioxide (computed using the Widom insertion technique) is -33.81 kJ\/mol."} {"text":"User: I am working with a metal-organic framework with the following CIF file [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 
0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n. 
What is the heat of adsorption of CO2 (computed using the Widom insertion technique)?\nAssistant: The heat of adsorption of CO2 (computed using the Widom insertion technique) is -53.39 kJ\/mol."}", "/scratch/micpie/export/core_mof_no_topo/test_0-2.jsonl": "{"text":"The reticular material with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 1 0.192 
0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n has a simulated heat of adsorption of methane (obtained using Widom insertions) of -17.43 kJ\/mol."} {"text":"The MOF with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 
0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n has a heat of adsorption of methane (computed using the Widom insertion technique) of -21.63 kJ\/mol."}", "/scratch/micpie/export/core_mof_no_topo/train_0-22.jsonl": "{"text":"User: I just created a metal-organic framework with the following crystal structure in CIF format [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr 
Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n. What is the heat of adsorption of methane (computed using the Widom insertion technique)?\nAssistant: The heat of adsorption of methane (computed using the Widom insertion technique) is -16.83 kJ\/mol. 
Is there anything else I can do for you?\nUser: Indeed, I would like to know the simulated carbon dioxide uptake at 298 K and 15000 Pa.\nAssistant: The simulated carbon dioxide uptake at 298 K and 15000 Pa is 3.3 mol\/kg."} {"text":"User: I just created a metal-organic framework (MOF) with the following crystal structure in CIF format [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 'Al4 H12 C16 O20'\n_cell_volume 990.432\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 0.866 1\n C C25 1 0.446 0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 
1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n. What is the heat of adsorption of CH4 (computed using the Widom insertion technique)?\nAssistant: The heat of adsorption of CH4 (computed using the Widom insertion technique) is -18.98 kJ\/mol. Is there anything else I can do for you?\nUser: Yeah, I would like to know the simulated CO2 uptake at 298 K and 0.15 bar.\nAssistant: The simulated CO2 uptake at 298 K and 0.15 bar is 3.6 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-10.jsonl": "{"text":"Question: What linker molecules are used in the metal-organic framework with the following crystal structure in CIF format [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 0.660 1\n C C9 1 
0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n?\nAnswer: The SMILES of the linker molecules are nan."} {"text":"Question: What linker molecules are present in the metal-organic framework with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n 
_symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n?\nAnswer: The SMILES of the linker molecules are 
[O-]C(=O)Cc1cc(CC(=O)[O-])cc(c1)CC(=O)[O-]."}", "/scratch/micpie/export/core_mof_no_topo/train_0-6.jsonl": "{"text":"The reticular material with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 
0.179 0.093 0.188 1\n O O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n has a simulated methane uptake at 298 K and 6500000 Pa of 4.4 mol\/kg."} {"text":"The metal-organic framework with the following crystal structure in CIF format [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 'Al4 H12 C16 O20'\n_cell_volume 990.432\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 
0.866 1\n C C25 1 0.446 0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n has a methane uptake at 298 K and 6500000 Pa as obtained from GCMC simulations of 6.3 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-6.jsonl": "{"text":"The reticular material with the following CIF file [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 0.660 1\n C C9 1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 
0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n has a CH4 uptake at 298 K and 6500000 Pa as obtained from grand canonical Monte Carlo simulations of 5.1 mol\/kg."} {"text":"The MOF with the following CIF file [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n 
_atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n has a simulated methane uptake at 298 K and 6500000 Pa of 3.0 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/train_0-21.jsonl": "{"text":"User: I'm interested in a MOF 
with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O 
O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n. What is the heat of adsorption of CO2 (computed using the Widom insertion technique)?\nAssistant: The heat of adsorption of CO2 (computed using the Widom insertion technique) is -32.73 kJ\/mol."} {"text":"User: I am working with a metal-organic framework with the following crystal structure in CIF format [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 'Al4 H12 C16 O20'\n_cell_volume 990.432\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 
0.866 1\n C C25 1 0.446 0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n. What is the simulated heat of adsorption of carbon dioxide (obtained using Widom insertions)?\nAssistant: The simulated heat of adsorption of carbon dioxide (obtained using Widom insertions) is -26.21 kJ\/mol."}", "/scratch/micpie/export/core_mof_no_topo/train_0-19.jsonl": "{"text":"Question: What is the logarithm of Henry's constant for methane obtained from Widom insertion simulations of the metal-organic framework with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr 
Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n?\nAnswer: The logarithm of Henry's constant for methane obtained from Widom insertion simulations is -4.70 log(mol\/kg\/Pa)."} {"text":"Question: What is the logarithm of Henry's constant for methane obtained from Widom insertion simulations of the MOF with the following crystal structure in CIF format [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 
12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 'Al4 H12 C16 O20'\n_cell_volume 990.432\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 0.866 1\n C C25 1 0.446 0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 
0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n?\nAnswer: The logarithm of Henry's constant for methane obtained from Widom insertion simulations is -4.14 log(mol\/kg\/Pa)."}", "/scratch/micpie/export/core_mof_no_topo/test_0-9.jsonl": "{"text":"The MOF with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 
1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n has a deliverable capacity of CH4 (between 5.8 bar and 65 bar at 298 K) obtained from GCMC simulations of 57.587 vSTP\/v."} {"text":"The metal-organic framework (MOF) with the following CIF file [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 
0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n has a deliverable capacity of CH4 obtained from GCMC simulations between 5.8 bar and 65 bar at 298 K of 42.994 vSTP\/v."}", "/scratch/micpie/export/core_mof_no_topo/test_0-0.jsonl": "{"text":"The metal-organic framework with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n 
_atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n is built from linker molecules with the SMILES [O-]C(=O)c1ccc(s1)C(=O)[O-] and nodes with the SMILES [Sr]."} {"text":"The reticular material with the following CIF file [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 
10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 
0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n is built from linker molecules with the SMILES [O-]C(=O)Cc1cc(CC(=O)[O-])cc(c1)CC(=O)[O-] and nodes with the SMILES [Sm]."}", "/scratch/micpie/export/core_mof_no_topo/test_0-24.jsonl": "{"text":"User: I want to create a reticular material with a deliverable capacity of methane (between 5.8 bar and 65 bar at 298 K) obtained from GCMC simulations of 57.587 vSTP\/v. What building blocks do I need to combine?\nAssistant: I'd go for linkers with SMILES [O-]C(=O)c1ccc(s1)C(=O)[O-] and nodes with SMILES [Sr]. Is there anything else I can do for you?\nUser: Yes, I would like to know the simulated CO2 uptake at 298 K and 15000 Pa.\nAssistant: The simulated CO2 uptake at 298 K and 15000 Pa is 3.1 mol\/kg. Is there anything else I can do for you?\nUser: Yes, I would like to know the methane uptake at 298 K and 580000 Pa as obtained from GCMC simulations.\nAssistant: The methane uptake at 298 K and 580000 Pa as obtained from GCMC simulations is 2.9 mol\/kg. Is there anything else I can do for you?\nUser: that's all I need to know.\nAssistant: Anytime."} {"text":"User: I want to make a reticular material with a deliverable capacity (between 5.8 bar and 65 bar at 298 K) of methane obtained from GCMC simulations of 42.994 vSTP\/v. What linkers and nodes do I need to combine?\nAssistant: I'd recommend linkers with SMILES [O-]C(=O)Cc1cc(CC(=O)[O-])cc(c1)CC(=O)[O-] and nodes with SMILES [Sm]. Is there anything else you want to know?\nUser: I would like to know the simulated CO2 uptake at 298 K and 0.15 bar.\nAssistant: The simulated CO2 uptake at 298 K and 0.15 bar is 5.4 mol\/kg. Is there anything else you want to know?\nUser: I would like to know the simulated methane uptake at 298 K and 5.8 bar.\nAssistant: The simulated methane uptake at 298 K and 5.8 bar is 2.0 mol\/kg. 
Is there anything else you want to know?\nUser: No, that's all I need to know.\nAssistant: You're welcome."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-16.jsonl": "{"text":"Question: What is the methane uptake at 298 K and 580000 Pa as obtained from grand canonical Monte Carlo simulations of the metal-organic framework (MOF) with the following crystal structure in CIF format [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 0.660 1\n C C9 1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 
0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n?\nAnswer: The methane uptake at 298 K and 580000 Pa as obtained from grand canonical Monte Carlo simulations is 4.2 mol\/kg."} {"text":"Question: What is the methane uptake at 298 K and 580000 Pa as obtained from GCMC simulations of the metal-organic framework with the following CIF file [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 
0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n?\nAnswer: The methane uptake at 298 K and 580000 Pa as obtained from GCMC simulations is 2.0 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-7.jsonl": "{"text":"The metal-organic framework (MOF) with the following crystal structure in CIF format [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n 
_atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 0.660 1\n C C9 1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n has a logarithm of Henry's constant for carbon dioxide obtained from Widom insertion simulations of -2.07 log(mol\/kg\/Pa)."} {"text":"The 
metal-organic framework with the following CIF file [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 
0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n has a logarithm of Henry's constant for carbon dioxide obtained from Widom insertion simulations of -0.78 log(mol\/kg\/Pa)."}", "/scratch/micpie/export/core_mof_no_topo/test_0-3.jsonl": "{"text":"The MOF with the following crystal structure in CIF format [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 
0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n has a simulated CO2 uptake at 298 K and 15000 Pa of 3.1 mol\/kg."} {"text":"The MOF with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H 
H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n has a CO2 uptake at 298 K and 0.15 bar as obtained from GCMC simulations of 5.4 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-11.jsonl": "{"text":"Question: What nodes are present in the metal-organic framework with the following crystal structure in CIF format [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 
4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 0.660 1\n C C9 1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n?\nAnswer: The SMILES of the nodes are nan."} 
{"text":"Question: What nodes are used in the reticular material with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 
0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n?\nAnswer: The SMILES of the nodes are [Sm]."}", "/scratch/micpie/export/core_mof_no_topo/train_0-20.jsonl": "{"text":"Question: What is the deliverable capacity of CH4 (between 5.8 bar and 65 bar at 298 K) obtained from GCMC simulations of the metal-organic framework (MOF) with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 
1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n?\nAnswer: The deliverable capacity of CH4 (between 5.8 bar and 65 bar at 298 K) obtained from GCMC simulations is 60.337 vSTP\/v."} {"text":"Question: What is the deliverable capacity (between 5.8 bar and 65 bar at 298 K) of methane obtained from GCMC simulations of the MOF with the following crystal structure in CIF format [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 'Al4 H12 C16 O20'\n_cell_volume 990.432\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 
0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 0.866 1\n C C25 1 0.446 0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n?\nAnswer: The deliverable capacity (between 5.8 bar and 65 bar at 298 K) of methane obtained from GCMC simulations is 38.406 vSTP\/v."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-20.jsonl": "{"text":"Question: What is the deliverable capacity (between 5.8 bar and 65 bar at 298 K) of methane obtained from GCMC simulations of the metal-organic framework (MOF) with the following CIF file [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 
1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 0.660 1\n C C9 1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N 
N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n?\nAnswer: The deliverable capacity (between 5.8 bar and 65 bar at 298 K) of methane obtained from GCMC simulations is 31.709 vSTP\/v."} {"text":"Question: What is the deliverable capacity of methane (between 5.8 bar and 65 bar at 298 K) obtained from GCMC simulations of the metal-organic framework with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 
0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n?\nAnswer: The deliverable capacity of methane (between 5.8 bar and 65 bar at 298 K) obtained from GCMC simulations is 42.994 vSTP\/v."}", "/scratch/micpie/export/core_mof_no_topo/train_0-0.jsonl": "{"text":"The metal-organic framework (MOF) with the following crystal structure in CIF format [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H 
H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n is built from linker molecules with the SMILES [O-]C(=O)c1ccc(s1)C(=O)[O-] and nodes with the SMILES [Sr]."} {"text":"The metal-organic framework with the following CIF file [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 'Al4 H12 C16 
O20'\n_cell_volume 990.432\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 0.866 1\n C C25 1 0.446 0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n is built from linker molecules with the SMILES [O-]C(=O)C=CC(=O)[O-] and nodes with the SMILES [Al], [OH]."}", 
"/scratch/micpie/export/core_mof_no_topo/test_0-6.jsonl": "{"text":"The MOF with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 
0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n has a CH4 uptake at 298 K and 6500000 Pa as obtained from grand canonical Monte Carlo simulations of 4.3 mol\/kg."} {"text":"The MOF with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 
1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n has a methane uptake at 298 K and 6500000 Pa as obtained from grand canonical Monte Carlo simulations of 3.0 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/train_0-10.jsonl": "{"text":"Question: What linker molecules are used in the metal-organic framework with the following crystal structure in CIF format [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 
0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n?\nAnswer: The SMILES of the linker molecules are [O-]C(=O)c1ccc(s1)C(=O)[O-]."} {"text":"Question: What linker molecules are present in the MOF with the following CIF file [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 
'Al4 H12 C16 O20'\n_cell_volume 990.432\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 0.866 1\n C C25 1 0.446 0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n?\nAnswer: The SMILES of the linker molecules are [O-]C(=O)C=CC(=O)[O-]."}", 
"/scratch/micpie/export/core_mof_no_topo/train_0-3.jsonl": "{"text":"The metal-organic framework with the following crystal structure in CIF format [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O 
O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n has a CO2 uptake at 298 K and 0.15 bar as obtained from GCMC simulations of 3.3 mol\/kg."} {"text":"The MOF with the following crystal structure in CIF format [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 'Al4 H12 C16 O20'\n_cell_volume 990.432\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 0.866 1\n C C25 1 0.446 
0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n has a CO2 uptake at 298 K and 0.15 bar as obtained from GCMC simulations of 3.6 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/train_0-23.jsonl": "{"text":"User: I want to synthesize a metal-organic framework with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n 
H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n.\nAssistant: How can I help?\nUser: What building blocks do I need to combine to make this metal-organic framework?\nAssistant: The SMILES of the linker molecules are [O-]C(=O)c1ccc(s1)C(=O)[O-] and the SMILES of the nodes are [Sr]."} {"text":"User: I am interested in a reticular material with the following crystal structure in CIF format [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 
'Al4 H12 C16 O20'\n_cell_volume 990.432\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 0.866 1\n C C25 1 0.446 0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n.\nAssistant: How can I help?\nUser: What building blocks do I need to combine to make this reticular 
material?\nAssistant: The SMILES of the linker molecules are [O-]C(=O)C=CC(=O)[O-] and the SMILES of the nodes are [Al], [OH]."}", "/scratch/micpie/export/core_mof_no_topo/train_0-12.jsonl": "{"text":"Question: What is the heat of adsorption of carbon dioxide (computed using the Widom insertion technique) of the metal-organic framework (MOF) with the following crystal structure in CIF format [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 
0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n?\nAnswer: The heat of adsorption of carbon dioxide (computed using the Widom insertion technique) is -32.73 kJ\/mol."} {"text":"Question: What is the heat of adsorption of carbon dioxide (computed using the Widom insertion technique) of the reticular material with the following crystal structure in CIF format [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 'Al4 H12 C16 O20'\n_cell_volume 990.432\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H 
H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 0.866 1\n C C25 1 0.446 0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n?\nAnswer: The heat of adsorption of carbon dioxide (computed using the Widom insertion technique) is -26.21 kJ\/mol."}", "/scratch/micpie/export/core_mof_no_topo/test_0-13.jsonl": "{"text":"Question: What is the simulated heat of adsorption of CH4 (obtained using Widom insertions) of the MOF with the following crystal structure in CIF format [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, 
z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n?\nAnswer: The simulated heat of adsorption of CH4 (obtained using Widom insertions) is -17.43 kJ\/mol."} {"text":"Question: What is the 
simulated heat of adsorption of methane (obtained using Widom insertions) of the metal-organic framework with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n 
C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n?\nAnswer: The simulated heat of adsorption of methane (obtained using Widom insertions) is -21.63 kJ\/mol."}", "/scratch/micpie/export/core_mof_no_topo/test_0-23.jsonl": "{"text":"User: I want to synthesize a metal-organic framework (MOF) with the following crystal structure in CIF format [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C 
C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n.\nAssistant: That's interesting. 
How can I be of assistance?\nUser: What building blocks do I need to combine to make this metal-organic framework (MOF)?\nAssistant: The SMILES of the linker molecules are [O-]C(=O)c1ccc(s1)C(=O)[O-] and the SMILES of the nodes are [Sr]."} {"text":"User: I want to make a metal-organic framework (MOF) with the following CIF file [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 
1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n.\nAssistant: Cool. How can I be of assistance?\nUser: What building blocks do I need to combine to make this metal-organic framework (MOF)?\nAssistant: The SMILES of the linker molecules are [O-]C(=O)Cc1cc(CC(=O)[O-])cc(c1)CC(=O)[O-] and the SMILES of the nodes are [Sm]."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-2.jsonl": "{"text":"The metal-organic framework with the following crystal structure in CIF format [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 0.660 1\n C C9 1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 
1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n has a simulated heat of adsorption of methane (obtained using Widom insertions) of -21.10 kJ\/mol."} {"text":"The metal-organic framework with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, 
z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n has a heat of adsorption of CH4 (computed using the Widom insertion technique) of -21.63 kJ\/mol."}", 
"/scratch/micpie/export/core_mof_no_topo/valid_0-21.jsonl": "{"text":"User: I'm interested in a reticular material with the following CIF file [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 0.660 1\n C C9 1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 
0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n. What is the heat of adsorption of CO2 (computed using the Widom insertion technique)?\nAssistant: The heat of adsorption of CO2 (computed using the Widom insertion technique) is -43.74 kJ\/mol."} {"text":"User: I have a metal-organic framework with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 
0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n. 
What is the heat of adsorption of CO2 (computed using the Widom insertion technique)?\nAssistant: The heat of adsorption of CO2 (computed using the Widom insertion technique) is -53.39 kJ\/mol."}", "/scratch/micpie/export/core_mof_no_topo/train_0-14.jsonl": "{"text":"Question: What is the carbon dioxide uptake at 298 K and 15000 Pa as obtained from grand canonical Monte Carlo simulations of the reticular material with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C 
C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n?\nAnswer: The carbon dioxide uptake at 298 K and 15000 Pa as obtained from grand canonical Monte Carlo simulations is 3.3 mol\/kg."} {"text":"Question: What is the carbon dioxide uptake at 298 K and 15000 Pa as obtained from grand canonical Monte Carlo simulations of the MOF with the following crystal structure in CIF format [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 'Al4 H12 C16 O20'\n_cell_volume 990.432\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n 
H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 0.866 1\n C C25 1 0.446 0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n?\nAnswer: The carbon dioxide uptake at 298 K and 15000 Pa as obtained from grand canonical Monte Carlo simulations is 3.6 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-1.jsonl": "{"text":"The MOF with the following CIF file [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n 
_atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 0.660 1\n C C9 1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n has a heat of adsorption of CO2 (computed using the Widom insertion technique) of -43.74 kJ\/mol."} {"text":"The MOF with the following crystal structure in CIF format 
[CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 
1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n has a simulated heat of adsorption of CO2 (obtained using Widom insertions) of -53.39 kJ\/mol."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-13.jsonl": "{"text":"Question: What is the heat of adsorption of methane (computed using the Widom insertion technique) of the metal-organic framework with the following CIF file [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 0.660 1\n C C9 1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 
0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n?\nAnswer: The heat of adsorption of methane (computed using the Widom insertion technique) is -21.10 kJ\/mol."} {"text":"Question: What is the simulated heat of adsorption of CH4 (obtained using Widom insertions) of the metal-organic framework (MOF) with the following CIF file [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 
0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n?\nAnswer: The simulated heat of adsorption of CH4 (obtained using Widom insertions) is -21.63 kJ\/mol."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-23.jsonl": "{"text":"User: I want to make a metal-organic framework (MOF) with the following crystal structure in CIF format [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 
60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 0.660 1\n C C9 1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 
0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n.\nAssistant: Cool. Is there anything I can help you with?\nUser: What building blocks do I need to combine to make this metal-organic framework (MOF)?\nAssistant: The SMILES of the linker molecules are nan and the SMILES of the nodes are nan."} {"text":"User: I am interested in a metal-organic framework with the following CIF file [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 
0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n.\nAssistant: How can I help?\nUser: What building blocks do I need to combine to make this metal-organic framework?\nAssistant: The SMILES of the linker molecules are [O-]C(=O)Cc1cc(CC(=O)[O-])cc(c1)CC(=O)[O-] and the SMILES of the nodes are [Sm]."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-5.jsonl": "{"text":"The metal-organic framework (MOF) with the following CIF file [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 
0.660 1\n C C9 1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n has a methane uptake at 298 K and 5.8 bar as obtained from GCMC simulations of 4.2 mol\/kg."} {"text":"The reticular material with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n 
_symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n has a CH4 uptake at 298 K and 580000 Pa as obtained from GCMC 
simulations of 2.0 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/train_0-15.jsonl": "{"text":"Question: What is the CO2 uptake at 298 K and 1600000 Pa as obtained from grand canonical Monte Carlo simulations of the MOF with the following crystal structure in CIF format [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 
0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n?\nAnswer: The CO2 uptake at 298 K and 1600000 Pa as obtained from grand canonical Monte Carlo simulations is 5.8 mol\/kg."} {"text":"Question: What is the simulated CO2 uptake at 298 K and 16 bar of the reticular material with the following crystal structure in CIF format [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 'Al4 H12 C16 O20'\n_cell_volume 990.432\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 
0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 0.866 1\n C C25 1 0.446 0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n?\nAnswer: The simulated CO2 uptake at 298 K and 16 bar is 9.4 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-4.jsonl": "{"text":"The metal-organic framework (MOF) with the following CIF file [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 
0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 0.660 1\n C C9 1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n has a carbon dioxide uptake at 298 K and 1600000 Pa as obtained from GCMC simulations of 5.5 mol\/kg."} {"text":"The MOF with the following CIF file [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural 
SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O 
O55 1 0.817 0.542 0.139 1\n[\/CIF]\n has a simulated CO2 uptake at 298 K and 16 bar of 7.0 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/train_0-5.jsonl": "{"text":"The MOF with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n 
S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n has a simulated methane uptake at 298 K and 580000 Pa of 2.9 mol\/kg."} {"text":"The metal-organic framework with the following CIF file [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 'Al4 H12 C16 O20'\n_cell_volume 990.432\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 
0.570 0.653 0.866 1\n C C25 1 0.446 0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n has a simulated CH4 uptake at 298 K and 580000 Pa of 5.0 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-15.jsonl": "{"text":"Question: What is the simulated carbon dioxide uptake at 298 K and 1600000 Pa of the metal-organic framework with the following CIF file [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 0.660 1\n C C9 
1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n?\nAnswer: The simulated carbon dioxide uptake at 298 K and 1600000 Pa is 5.5 mol\/kg."} {"text":"Question: What is the carbon dioxide uptake at 298 K and 1600000 Pa as obtained from GCMC simulations of the MOF with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 
O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n?\nAnswer: 
The carbon dioxide uptake at 298 K and 1600000 Pa as obtained from GCMC simulations is 7.0 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-12.jsonl": "{"text":"Question: What is the heat of adsorption of carbon dioxide (computed using the Widom insertion technique) of the metal-organic framework (MOF) with the following CIF file [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 0.660 1\n C C9 1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 
1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n?\nAnswer: The heat of adsorption of carbon dioxide (computed using the Widom insertion technique) is -43.74 kJ\/mol."} {"text":"Question: What is the simulated heat of adsorption of carbon dioxide (obtained using Widom insertions) of the MOF with the following CIF file [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 
0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n?\nAnswer: The simulated heat of adsorption of carbon dioxide (obtained using Widom insertions) is -53.39 kJ\/mol."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-18.jsonl": "{"text":"Question: What is the logarithm of Henry's constant for carbon dioxide obtained from Widom insertion simulations of the metal-organic framework with the following CIF file [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n 
_symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 0.660 1\n C C9 1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n?\nAnswer: The logarithm of Henry's constant for carbon dioxide obtained from Widom insertion 
simulations is -2.07 log(mol\/kg\/Pa)."} {"text":"Question: What is the logarithm of Henry's constant for carbon dioxide obtained from Widom insertion simulations of the MOF with the following CIF file [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 
0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n?\nAnswer: The logarithm of Henry's constant for carbon dioxide obtained from Widom insertion simulations is -0.78 log(mol\/kg\/Pa)."}", "/scratch/micpie/export/core_mof_no_topo/train_0-2.jsonl": "{"text":"The reticular material with the following crystal structure in CIF format [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 
1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n has a heat of adsorption of methane (computed using the Widom insertion technique) of -16.83 kJ\/mol."} {"text":"The MOF with the following CIF file [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 'Al4 H12 C16 O20'\n_cell_volume 990.432\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 
0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 0.866 1\n C C25 1 0.446 0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n has a simulated heat of adsorption of CH4 (obtained using Widom insertions) of -18.98 kJ\/mol."}", "/scratch/micpie/export/core_mof_no_topo/test_0-11.jsonl": "{"text":"Question: What nodes are present in the metal-organic framework with the following crystal structure in CIF format [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural 
SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 
1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n?\nAnswer: The SMILES of the nodes are [Sr]."} {"text":"Question: What nodes are present in the metal-organic framework with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C 
C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n?\nAnswer: The SMILES of the nodes are [Sm]."}", "/scratch/micpie/export/core_mof_no_topo/train_0-7.jsonl": "{"text":"The metal-organic framework (MOF) with the following crystal structure in CIF format [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n 
C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n has a logarithm of Henry's constant for carbon dioxide obtained from Widom insertion simulations of -2.97 log(mol\/kg\/Pa)."} {"text":"The metal-organic framework with the following crystal structure in CIF format [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 'Al4 H12 C16 O20'\n_cell_volume 990.432\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 
0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 0.866 1\n C C25 1 0.446 0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n has a logarithm of Henry's constant for CO2 obtained from Widom insertion simulations of -3.39 log(mol\/kg\/Pa)."}", "/scratch/micpie/export/core_mof_no_topo/test_0-17.jsonl": "{"text":"Question: What is the simulated CH4 uptake at 298 K and 6500000 Pa of the metal-organic framework with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 
1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 
0.584 1\n O O54 1 0.987 0.810 0.906 1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n?\nAnswer: The simulated CH4 uptake at 298 K and 6500000 Pa is 4.3 mol\/kg."} {"text":"Question: What is the CH4 uptake at 298 K and 6500000 Pa as obtained from GCMC simulations of the metal-organic framework (MOF) with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 
0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n?\nAnswer: The CH4 uptake at 298 K and 6500000 Pa as obtained from GCMC simulations is 3.0 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-19.jsonl": "{"text":"Question: What is the logarithm of Henry's constant for CH4 obtained from Widom insertion simulations of the metal-organic framework with the following crystal structure in CIF format [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 0.660 1\n C C9 1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 
1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n?\nAnswer: The logarithm of Henry's constant for CH4 obtained from Widom insertion simulations is -4.55 log(mol\/kg\/Pa)."} {"text":"Question: What is the logarithm of Henry's constant for methane obtained from Widom insertion simulations of the metal-organic framework (MOF) with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 
878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n?\nAnswer: The logarithm of 
Henry's constant for methane obtained from Widom insertion simulations is -4.55 log(mol\/kg\/Pa)."}", "/scratch/micpie/export/core_mof_no_topo/train_0-11.jsonl": "{"text":"Question: What nodes are present in the reticular material with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 
0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n?\nAnswer: The SMILES of the nodes are [Sr]."} {"text":"Question: What nodes are used in the metal-organic framework (MOF) with the following CIF file [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 'Al4 H12 C16 O20'\n_cell_volume 990.432\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C 
C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 0.866 1\n C C25 1 0.446 0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n?\nAnswer: The SMILES of the nodes are [Al], [OH]."}", "/scratch/micpie/export/core_mof_no_topo/train_0-1.jsonl": "{"text":"The reticular material with the following crystal structure in CIF format [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 0.071 0.691 0.719 1\n H H9 
1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n has a simulated heat of adsorption of CO2 (obtained using Widom insertions) of -32.73 kJ\/mol."} {"text":"The metal-organic framework with the following CIF file [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 'Al4 H12 C16 O20'\n_cell_volume 990.432\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n 
_symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 0.866 1\n C C25 1 0.446 0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n has a simulated heat of adsorption of carbon dioxide (obtained using Widom insertions) of -26.21 kJ\/mol."}", "/scratch/micpie/export/core_mof_no_topo/train_0-13.jsonl": "{"text":"Question: What is the heat of 
adsorption of CH4 (computed using the Widom insertion technique) of the metal-organic framework with the following crystal structure in CIF format [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O 
O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n?\nAnswer: The heat of adsorption of CH4 (computed using the Widom insertion technique) is -16.83 kJ\/mol."} {"text":"Question: What is the simulated heat of adsorption of methane (obtained using Widom insertions) of the MOF with the following CIF file [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 'Al4 H12 C16 O20'\n_cell_volume 990.432\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 
0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 0.866 1\n C C25 1 0.446 0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n?\nAnswer: The simulated heat of adsorption of methane (obtained using Widom insertions) is -18.98 kJ\/mol."}", "/scratch/micpie/export/core_mof_no_topo/train_0-4.jsonl": "{"text":"The MOF with the following crystal structure in CIF format [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H 
H8 1 0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n has a carbon dioxide uptake at 298 K and 1600000 Pa as obtained from GCMC simulations of 5.8 mol\/kg."} {"text":"The MOF with the following crystal structure in CIF format [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 'Al4 H12 C16 O20'\n_cell_volume 990.432\n_cell_formula_units_Z 4\nloop_\n 
_symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 0.866 1\n C C25 1 0.446 0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n has a carbon dioxide uptake at 298 K and 1600000 Pa as obtained from grand canonical Monte Carlo simulations of 9.4 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/test_0-7.jsonl": 
"{"text":"The reticular material with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n 
O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n has a logarithm of Henry's constant for carbon dioxide obtained from Widom insertion simulations of -2.85 log(mol\/kg\/Pa)."} {"text":"The metal-organic framework with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 
0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n has a logarithm of Henry's constant for CO2 obtained from Widom insertion simulations of -0.78 log(mol\/kg\/Pa)."}", "/scratch/micpie/export/core_mof_no_topo/train_0-9.jsonl": "{"text":"The reticular material with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 
0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n has a deliverable capacity of CH4 (between 5.8 bar and 65 bar at 298 K) obtained from GCMC simulations of 60.337 vSTP\/v."} {"text":"The reticular material with the following crystal structure in CIF format [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 'Al4 H12 C16 O20'\n_cell_volume 
990.432\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 0.866 1\n C C25 1 0.446 0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n has a deliverable capacity (between 5.8 bar and 65 bar at 298 K) of methane obtained from GCMC simulations of 38.406 vSTP\/v."}", 
"/scratch/micpie/export/core_mof_no_topo/valid_0-22.jsonl": "{"text":"User: I just created a metal-organic framework with the following crystal structure in CIF format [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 0.660 1\n C C9 1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 
0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n. What is the heat of adsorption of methane (computed using the Widom insertion technique)?\nAssistant: The heat of adsorption of methane (computed using the Widom insertion technique) is -21.10 kJ\/mol. Is there anything else I can do for you?\nUser: Yeah, I would like to know the simulated carbon dioxide uptake at 298 K and 15000 Pa.\nAssistant: The simulated carbon dioxide uptake at 298 K and 15000 Pa is 4.7 mol\/kg."} {"text":"User: I just synthesized a reticular material with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 
0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n. What is the heat of adsorption of CH4 (computed using the Widom insertion technique)?\nAssistant: The heat of adsorption of CH4 (computed using the Widom insertion technique) is -21.63 kJ\/mol. 
Is there anything else you want to know?\nUser: Yeah, I would like to know the carbon dioxide uptake at 298 K and 15000 Pa as obtained from GCMC simulations.\nAssistant: The carbon dioxide uptake at 298 K and 15000 Pa as obtained from GCMC simulations is 5.4 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/train_0-18.jsonl": "{"text":"Question: What is the logarithm of Henry's constant for CO2 obtained from Widom insertion simulations of the reticular material with the following crystal structure in CIF format [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.008\n_cell_length_b 16.940\n_cell_length_c 13.006\n_cell_angle_alpha 90.000\n_cell_angle_beta 116.158\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1188.099\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.818 0.499 0.594 1\n Sr Sr1 1 0.182 0.999 0.906 1\n Sr Sr2 1 0.182 0.501 0.406 1\n Sr Sr3 1 0.818 0.001 0.094 1\n H H4 1 0.929 0.309 0.281 1\n H H5 1 0.785 0.186 0.175 1\n H H6 1 0.071 0.809 0.219 1\n H H7 1 0.215 0.686 0.325 1\n H H8 1 0.071 0.691 0.719 1\n H H9 1 0.215 0.814 0.825 1\n H H10 1 0.929 0.191 0.781 1\n H H11 1 0.785 0.314 0.675 1\n C C12 1 0.682 0.377 0.399 1\n C C13 1 0.647 0.306 0.327 1\n C C14 1 0.791 0.282 0.276 1\n C C15 1 0.708 0.210 0.215 1\n C C16 1 0.502 0.181 0.220 1\n C C17 1 0.364 0.106 0.172 1\n C C18 1 0.318 0.877 0.101 1\n C C19 1 0.353 0.806 0.173 1\n C C20 1 0.209 0.782 0.224 1\n C C21 1 0.292 0.710 0.285 1\n C C22 1 0.498 0.681 0.280 1\n C C23 1 0.635 0.606 0.329 1\n C C24 1 0.318 0.623 0.601 1\n C C25 1 0.353 0.694 0.673 1\n C C26 1 0.209 0.718 0.724 1\n C C27 1 0.292 0.790 0.785 1\n C C28 1 0.498 
0.819 0.780 1\n C C29 1 0.635 0.894 0.829 1\n C C30 1 0.682 0.123 0.899 1\n C C31 1 0.647 0.194 0.827 1\n C C32 1 0.791 0.218 0.776 1\n C C33 1 0.708 0.290 0.715 1\n C C34 1 0.502 0.319 0.720 1\n C C35 1 0.364 0.394 0.671 1\n S S36 1 0.407 0.241 0.300 1\n S S37 1 0.593 0.741 0.200 1\n S S38 1 0.593 0.759 0.700 1\n S S39 1 0.407 0.259 0.800 1\n O O40 1 0.179 0.093 0.188 1\n O O41 1 0.448 0.062 0.119 1\n O O42 1 0.516 0.394 0.429 1\n O O43 1 0.881 0.415 0.430 1\n O O44 1 0.821 0.593 0.312 1\n O O45 1 0.552 0.562 0.381 1\n O O46 1 0.484 0.894 0.071 1\n O O47 1 0.119 0.915 0.070 1\n O O48 1 0.821 0.907 0.812 1\n O O49 1 0.552 0.938 0.881 1\n O O50 1 0.484 0.606 0.571 1\n O O51 1 0.119 0.585 0.570 1\n O O52 1 0.179 0.407 0.688 1\n O O53 1 0.448 0.438 0.619 1\n O O54 1 0.516 0.106 0.929 1\n O O55 1 0.881 0.085 0.930 1\n[\/CIF]\n?\nAnswer: The logarithm of Henry's constant for CO2 obtained from Widom insertion simulations is -2.97 log(mol\/kg\/Pa)."} {"text":"Question: What is the logarithm of Henry's constant for carbon dioxide obtained from Widom insertion simulations of the metal-organic framework with the following CIF file [CIF]\ndata_AlH3C4O5\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 6.842\n_cell_length_b 12.088\n_cell_length_c 14.207\n_cell_angle_alpha 90.000\n_cell_angle_beta 122.547\n_cell_angle_gamma 90.000\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural AlH3C4O5\n_chemical_formula_sum 'Al4 H12 C16 O20'\n_cell_volume 990.432\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Al Al0 1 0.500 0.500 0.000 1\n Al Al1 1 0.500 0.000 0.500 1\n Al Al2 1 0.000 0.500 0.000 1\n Al Al3 1 0.000 0.000 0.500 1\n H H4 1 0.730 0.232 0.243 1\n H H5 1 0.278 0.258 0.252 1\n H H6 1 0.923 0.526 0.131 1\n H H7 1 0.270 0.732 0.257 1\n H 
H8 1 0.722 0.758 0.248 1\n H H9 1 0.077 0.026 0.369 1\n H H10 1 0.270 0.768 0.757 1\n H H11 1 0.722 0.742 0.748 1\n H H12 1 0.077 0.474 0.869 1\n H H13 1 0.730 0.268 0.743 1\n H H14 1 0.278 0.242 0.752 1\n H H15 1 0.923 0.974 0.631 1\n C C16 1 0.430 0.347 0.134 1\n C C17 1 0.554 0.265 0.222 1\n C C18 1 0.456 0.228 0.274 1\n C C19 1 0.579 0.141 0.360 1\n C C20 1 0.570 0.847 0.366 1\n C C21 1 0.446 0.765 0.278 1\n C C22 1 0.544 0.728 0.226 1\n C C23 1 0.421 0.641 0.140 1\n C C24 1 0.570 0.653 0.866 1\n C C25 1 0.446 0.735 0.778 1\n C C26 1 0.544 0.772 0.726 1\n C C27 1 0.421 0.859 0.640 1\n C C28 1 0.430 0.153 0.634 1\n C C29 1 0.554 0.235 0.722 1\n C C30 1 0.456 0.272 0.774 1\n C C31 1 0.579 0.359 0.860 1\n O O32 1 0.247 0.400 0.116 1\n O O33 1 0.538 0.369 0.084 1\n O O34 1 0.451 0.093 0.380 1\n O O35 1 0.783 0.113 0.399 1\n O O36 1 0.820 0.483 0.060 1\n O O37 1 0.753 0.900 0.384 1\n O O38 1 0.462 0.869 0.416 1\n O O39 1 0.549 0.593 0.120 1\n O O40 1 0.217 0.613 0.101 1\n O O41 1 0.180 0.983 0.440 1\n O O42 1 0.753 0.600 0.884 1\n O O43 1 0.462 0.631 0.916 1\n O O44 1 0.549 0.907 0.620 1\n O O45 1 0.217 0.887 0.601 1\n O O46 1 0.180 0.517 0.940 1\n O O47 1 0.247 0.100 0.616 1\n O O48 1 0.538 0.131 0.584 1\n O O49 1 0.451 0.407 0.880 1\n O O50 1 0.783 0.387 0.899 1\n O O51 1 0.820 0.017 0.560 1\n[\/CIF]\n?\nAnswer: The logarithm of Henry's constant for carbon dioxide obtained from Widom insertion simulations is -3.39 log(mol\/kg\/Pa)."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-3.jsonl": "{"text":"The MOF with the following crystal structure in CIF format [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n 
_symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 0.660 1\n C C9 1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n has a simulated CO2 uptake at 298 K and 15000 Pa of 4.7 mol\/kg."} {"text":"The metal-organic 
framework with the following CIF file [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 
1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n has a simulated CO2 uptake at 298 K and 0.15 bar of 5.4 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/test_0-8.jsonl": "{"text":"The reticular material with the following crystal structure in CIF format [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 
0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n has a logarithm of Henry's constant for methane obtained from Widom insertion simulations of -4.64 log(mol\/kg\/Pa)."} {"text":"The metal-organic framework with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 
0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n has a logarithm of Henry's constant for methane obtained from Widom insertion simulations of -4.55 log(mol\/kg\/Pa)."}", "/scratch/micpie/export/core_mof_no_topo/test_0-14.jsonl": "{"text":"Question: What is the CO2 uptake at 298 K and 0.15 bar as obtained from GCMC simulations of the reticular material with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural 
SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 
1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n?\nAnswer: The CO2 uptake at 298 K and 0.15 bar as obtained from GCMC simulations is 3.1 mol\/kg."} {"text":"Question: What is the simulated CO2 uptake at 298 K and 0.15 bar of the metal-organic framework (MOF) with the following CIF file [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 
0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n?\nAnswer: The simulated CO2 uptake at 298 K and 0.15 bar is 5.4 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-17.jsonl": "{"text":"Question: What is the simulated methane uptake at 298 K and 65 bar of the metal-organic framework (MOF) with the following CIF file [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 0.660 1\n C C9 1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 
0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n?\nAnswer: The simulated methane uptake at 298 K and 65 bar is 5.1 mol\/kg."} {"text":"Question: What is the methane uptake at 298 K and 6500000 Pa as obtained from GCMC simulations of the metal-organic framework (MOF) with the following CIF file [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n 
_atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n?\nAnswer: The methane uptake at 298 K and 6500000 Pa as obtained from GCMC simulations is 3.0 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/valid_0-14.jsonl": "{"text":"Question: What is the simulated CO2 uptake at 298 K and 15000 Pa of the metal-organic framework 
(MOF) with the following CIF file [CIF]\ndata_LiC6N4F3\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.740\n_cell_length_b 8.740\n_cell_length_c 15.345\n_cell_angle_alpha 85.352\n_cell_angle_beta 85.352\n_cell_angle_gamma 60.318\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural LiC6N4F3\n_chemical_formula_sum 'Li4 C24 N16 F12'\n_cell_volume 1013.897\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n C C0 1 0.348 0.255 0.874 1\n C C1 1 0.300 0.408 0.821 1\n C C2 1 0.320 0.554 0.840 1\n C C3 1 0.246 0.237 0.758 1\n C C4 1 0.420 0.217 0.958 1\n C C5 1 0.191 0.161 0.691 1\n C C6 1 0.745 0.652 0.626 1\n C C7 1 0.592 0.700 0.679 1\n C C8 1 0.446 0.680 0.660 1\n C C9 1 0.763 0.754 0.742 1\n C C10 1 0.783 0.580 0.542 1\n C C11 1 0.839 0.809 0.809 1\n C C12 1 0.652 0.745 0.126 1\n C C13 1 0.700 0.592 0.179 1\n C C14 1 0.680 0.446 0.160 1\n C C15 1 0.754 0.763 0.242 1\n C C16 1 0.580 0.783 0.042 1\n C C17 1 0.809 0.839 0.309 1\n C C18 1 0.255 0.348 0.374 1\n C C19 1 0.408 0.300 0.321 1\n C C20 1 0.554 0.320 0.340 1\n C C21 1 0.237 0.246 0.258 1\n C C22 1 0.217 0.420 0.458 1\n C C23 1 0.161 0.191 0.191 1\n F F24 1 0.175 0.024 0.726 1\n F F25 1 0.308 0.102 0.623 1\n F F26 1 0.037 0.279 0.658 1\n F F27 1 0.976 0.825 0.774 1\n F F28 1 0.898 0.692 0.877 1\n F F29 1 0.721 0.963 0.842 1\n F F30 1 0.825 0.976 0.274 1\n F F31 1 0.692 0.898 0.377 1\n F F32 1 0.963 0.721 0.342 1\n F F33 1 0.024 0.175 0.226 1\n F F34 1 0.102 0.308 0.123 1\n F F35 1 0.279 0.037 0.158 1\n Li Li36 1 0.123 0.610 0.638 1\n Li Li37 1 0.390 0.877 0.862 1\n Li Li38 1 0.877 0.390 0.362 1\n Li Li39 1 0.610 0.123 0.138 1\n N N40 1 0.314 0.144 0.832 1\n N N41 1 0.477 0.192 0.025 1\n N N42 1 0.341 0.669 0.856 1\n N N43 1 0.233 0.397 0.746 1\n N N44 1 0.856 0.686 0.668 1\n N 
N45 1 0.808 0.523 0.475 1\n N N46 1 0.331 0.659 0.644 1\n N N47 1 0.603 0.767 0.754 1\n N N48 1 0.686 0.856 0.168 1\n N N49 1 0.523 0.808 0.975 1\n N N50 1 0.659 0.331 0.144 1\n N N51 1 0.767 0.603 0.254 1\n N N52 1 0.144 0.314 0.332 1\n N N53 1 0.192 0.477 0.525 1\n N N54 1 0.669 0.341 0.356 1\n N N55 1 0.397 0.233 0.246 1\n[\/CIF]\n?\nAnswer: The simulated CO2 uptake at 298 K and 15000 Pa is 4.7 mol\/kg."} {"text":"Question: What is the simulated CO2 uptake at 298 K and 0.15 bar of the MOF with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 
1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n?\nAnswer: The simulated CO2 uptake at 298 K and 0.15 bar is 5.4 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/test_0-4.jsonl": "{"text":"The MOF with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 
0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n has a simulated carbon dioxide uptake at 298 K and 1600000 Pa of 5.6 mol\/kg."} {"text":"The MOF with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n 
_atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n has a carbon dioxide uptake at 298 K and 1600000 Pa as obtained from grand canonical Monte Carlo simulations of 7.0 mol\/kg."}", "/scratch/micpie/export/core_mof_no_topo/test_0-12.jsonl": 
"{"text":"Question: What is the simulated heat of adsorption of carbon dioxide (obtained using Widom insertions) of the metal-organic framework with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 
0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n?\nAnswer: The simulated heat of adsorption of carbon dioxide (obtained using Widom insertions) is -33.81 kJ\/mol."} {"text":"Question: What is the simulated heat of adsorption of CO2 (obtained using Widom insertions) of the metal-organic framework (MOF) with the following CIF file [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 
0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n?\nAnswer: The simulated heat of adsorption of CO2 (obtained using Widom insertions) is -53.39 kJ\/mol."}", "/scratch/micpie/export/core_mof_no_topo/test_0-20.jsonl": "{"text":"Question: What is the deliverable capacity of CH4 obtained from GCMC simulations between 5.8 bar and 65 bar at 298 K of the metal-organic framework with the following CIF file [CIF]\ndata_SrH2C6SO4\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 5.979\n_cell_length_b 11.359\n_cell_length_c 17.058\n_cell_angle_alpha 90.000\n_cell_angle_beta 90.000\n_cell_angle_gamma 91.257\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SrH2C6SO4\n_chemical_formula_sum 'Sr4 H8 C24 S4 O16'\n_cell_volume 1158.361\n_cell_formula_units_Z 4\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n 
_atom_site_fract_z\n _atom_site_occupancy\n Sr Sr0 1 0.722 0.904 0.998 1\n Sr Sr1 1 0.778 0.596 0.498 1\n Sr Sr2 1 0.278 0.096 0.002 1\n Sr Sr3 1 0.222 0.404 0.502 1\n H H4 1 0.116 0.336 0.686 1\n H H5 1 0.156 0.226 0.812 1\n H H6 1 0.384 0.164 0.186 1\n H H7 1 0.344 0.275 0.312 1\n H H8 1 0.884 0.664 0.314 1\n H H9 1 0.844 0.774 0.188 1\n H H10 1 0.616 0.836 0.814 1\n H H11 1 0.656 0.726 0.688 1\n C C12 1 0.784 0.105 0.879 1\n C C13 1 0.821 0.179 0.809 1\n C C14 1 0.017 0.232 0.784 1\n C C15 1 0.995 0.293 0.712 1\n C C16 1 0.781 0.284 0.683 1\n C C17 1 0.692 0.329 0.607 1\n C C18 1 0.716 0.395 0.379 1\n C C19 1 0.679 0.321 0.309 1\n C C20 1 0.483 0.268 0.284 1\n C C21 1 0.505 0.207 0.212 1\n C C22 1 0.719 0.216 0.183 1\n C C23 1 0.808 0.171 0.107 1\n C C24 1 0.216 0.895 0.121 1\n C C25 1 0.179 0.821 0.191 1\n C C26 1 0.983 0.768 0.216 1\n C C27 1 0.005 0.707 0.288 1\n C C28 1 0.219 0.716 0.317 1\n C C29 1 0.308 0.671 0.393 1\n C C30 1 0.284 0.605 0.621 1\n C C31 1 0.321 0.679 0.691 1\n C C32 1 0.517 0.732 0.716 1\n C C33 1 0.494 0.793 0.788 1\n C C34 1 0.281 0.784 0.817 1\n C C35 1 0.192 0.829 0.893 1\n S S36 1 0.608 0.203 0.744 1\n S S37 1 0.892 0.297 0.244 1\n S S38 1 0.393 0.797 0.256 1\n S S39 1 0.107 0.703 0.756 1\n O O40 1 0.587 0.075 0.896 1\n O O41 1 0.954 0.073 0.916 1\n O O42 1 0.487 0.310 0.594 1\n O O43 1 0.829 0.381 0.562 1\n O O44 1 0.913 0.425 0.397 1\n O O45 1 0.546 0.427 0.416 1\n O O46 1 0.013 0.190 0.094 1\n O O47 1 0.671 0.119 0.062 1\n O O48 1 0.413 0.925 0.103 1\n O O49 1 0.046 0.927 0.084 1\n O O50 1 0.513 0.690 0.406 1\n O O51 1 0.171 0.619 0.438 1\n O O52 1 0.087 0.575 0.604 1\n O O53 1 0.454 0.573 0.584 1\n O O54 1 0.987 0.810 0.906 1\n O O55 1 0.329 0.881 0.938 1\n[\/CIF]\n?\nAnswer: The deliverable capacity of CH4 obtained from GCMC simulations between 5.8 bar and 65 bar at 298 K is 57.587 vSTP\/v."} {"text":"Question: What is the deliverable capacity of CH4 (between 5.8 bar and 65 bar at 298 K) obtained from GCMC simulations of the 
reticular material with the following crystal structure in CIF format [CIF]\ndata_SmH9(C2O)6\n_symmetry_space_group_name_H-M 'P 1'\n_cell_length_a 8.431\n_cell_length_b 10.841\n_cell_length_c 11.114\n_cell_angle_alpha 64.430\n_cell_angle_beta 80.440\n_cell_angle_gamma 73.820\n_symmetry_Int_Tables_number 1\n_chemical_formula_structural SmH9(C2O)6\n_chemical_formula_sum 'Sm2 H18 C24 O12'\n_cell_volume 878.807\n_cell_formula_units_Z 2\nloop_\n _symmetry_equiv_pos_site_id\n _symmetry_equiv_pos_as_xyz\n 1 'x, y, z'\nloop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n Sm Sm0 1 0.749 0.004 0.002 1\n Sm Sm1 1 0.251 0.996 0.998 1\n H H2 1 0.751 0.954 0.519 1\n H H3 1 0.098 0.606 0.520 1\n H H4 1 0.763 0.569 0.824 1\n H H5 1 0.527 0.762 0.816 1\n H H6 1 0.509 0.908 0.688 1\n H H7 1 0.023 0.964 0.376 1\n H H8 1 0.119 0.826 0.352 1\n H H9 1 0.067 0.374 0.675 1\n H H10 1 0.930 0.354 0.795 1\n H H11 1 0.249 0.046 0.481 1\n H H12 1 0.902 0.394 0.480 1\n H H13 1 0.237 0.431 0.176 1\n H H14 1 0.472 0.238 0.184 1\n H H15 1 0.491 0.092 0.312 1\n H H16 1 0.977 0.036 0.624 1\n H H17 1 0.881 0.174 0.648 1\n H H18 1 0.933 0.626 0.325 1\n H H19 1 0.070 0.646 0.205 1\n C C20 1 0.731 0.768 0.683 1\n C C21 1 0.797 0.856 0.559 1\n C C22 1 0.932 0.791 0.499 1\n C C23 1 0.002 0.647 0.561 1\n C C24 1 0.934 0.561 0.681 1\n C C25 1 0.805 0.625 0.739 1\n C C26 1 0.587 0.835 0.755 1\n C C27 1 0.636 0.899 0.834 1\n C C28 1 0.009 0.879 0.370 1\n C C29 1 0.887 0.923 0.253 1\n C C30 1 0.017 0.406 0.746 1\n C C31 1 0.149 0.367 0.841 1\n C C32 1 0.269 0.232 0.317 1\n C C33 1 0.203 0.144 0.441 1\n C C34 1 0.068 0.209 0.501 1\n C C35 1 0.998 0.353 0.439 1\n C C36 1 0.066 0.439 0.319 1\n C C37 1 0.195 0.375 0.261 1\n C C38 1 0.413 0.165 0.245 1\n C C39 1 0.364 0.101 0.166 1\n C C40 1 0.991 0.121 0.630 1\n C C41 1 0.113 0.077 0.747 1\n C C42 1 0.983 0.594 0.254 1\n C C43 1 0.851 0.633 0.159 
1\n O O44 1 0.528 0.939 0.914 1\n O O45 1 0.780 0.913 0.824 1\n O O46 1 0.707 0.991 0.228 1\n O O47 1 0.001 0.949 0.131 1\n O O48 1 0.215 0.237 0.888 1\n O O49 1 0.183 0.458 0.861 1\n O O50 1 0.472 0.061 0.086 1\n O O51 1 0.220 0.087 0.176 1\n O O52 1 0.293 0.009 0.772 1\n O O53 1 0.999 0.051 0.869 1\n O O54 1 0.785 0.763 0.112 1\n O O55 1 0.817 0.542 0.139 1\n[\/CIF]\n?\nAnswer: The deliverable capacity of CH4 (between 5.8 bar and 65 bar at 298 K) obtained from GCMC simulations is 42.994 vSTP\/v."}", "/scratch/micpie/export/ames_mutagenicity/test_0-10.jsonl": "{"text":"User: I'm looking for the canonical SMILES of a molecule that is Ames mutagenic?\nAssistant: This is a molecule that is Ames mutagenic: Cc1cccc([N+](=O)[O-])c1C"} {"text":"User: I'm looking for the SELFIES of a molecule that is mutagenic?\nAssistant: This is a molecule that is mutagenic: ['[C][C][=C][C][=C][C][Branch1][=Branch1][C][=Branch1][C][=O][Cl][=C][Ring1][=Branch2]']"}", "/scratch/micpie/export/ames_mutagenicity/valid_0-8.jsonl": "{"text":"User: Is the molecule with the SELFIES ['[N-1][=N+1][=C][C][=N][C][=Branch1][C][=O][N][C][Ring1][#Branch1][=O]'] mutagenic?\nAssistant: Yes, it is mutagenic."} {"text":"User: Is the molecule with the SELFIES ['[C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][O]'] Ames mutagenic?\nAssistant: Yes, it is Ames mutagenic."}", "/scratch/micpie/export/ames_mutagenicity/train_0-8.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C20H15NO2\/c22-21(23)20-16-7-2-1-6-14(16)15-10-8-12-4-3-5-13-9-11-17(20)19(15)18(12)13\/h3-5,8-11H,1-2,6-7H2 mutagenic?\nAssistant: Yes, it is mutagenic."} {"text":"User: Is the molecule with the DeepSMILES C\/C=C\\C)CCl Ames mutagenic?\nAssistant: Yes, it is Ames mutagenic."}", "/scratch/micpie/export/ames_mutagenicity/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is Ames mutagenic.\nMolecule 
SMILES: Cc1cccc([N+](=O)[O-])c1C\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is Ames mutagenic."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is Ames mutagenic.\nInChI: InChI=1S\/C8H7ClO\/c1-6-3-2-4-7(5-6)8(9)10\/h2-5H,1H3\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is Ames mutagenic."}", "/scratch/micpie/export/ames_mutagenicity/valid_0-9.jsonl": "{"text":"User: Can you give me the canonical SMILES of a molecule that is Ames mutagenic?\nAssistant: Of course, here you go: [N-]=[N+]=C1C=NC(=O)NC1=O"} {"text":"User: Can you create the DeepSMILES of a molecule that is mutagenic?\nAssistant: Of course, here you go: cccccc6)cccccccc6%10"}", "/scratch/micpie/export/ames_mutagenicity/test_0-1.jsonl": "{"text":"Based on the canonical SMILES representation Cc1cccc([N+](=O)[O-])c1C, the molecule has mutagenic properties."} {"text":"Based on the SELFIES representation ['[C][C][=C][C][=C][C][Branch1][=Branch1][C][=Branch1][C][=O][Cl][=C][Ring1][=Branch2]'], the molecule has Ames mutagenic features."}", "/scratch/micpie/export/ames_mutagenicity/valid_0-0.jsonl": "{"text":"The molecule with the SELFIES ['[N-1][=N+1][=C][C][=N][C][=Branch1][C][=O][N][C][Ring1][#Branch1][=O]'] exhibits mutagenic properties."} {"text":"The molecule with the SELFIES representation of ['[C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][O]'] exhibits mutagenic properties."}", "/scratch/micpie/export/ames_mutagenicity/test_0-2.jsonl": "{"text":"The DeepSMILES Cccccc[N+]=O)[O-]))c6C represents a molecule that is identified as mutagenic."} {"text":"The DeepSMILES CcccccC=O)Cl))c6 represents a molecule that is identified as mutagenic."}", "/scratch/micpie/export/ames_mutagenicity/valid_0-10.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that is mutagenic?\nAssistant: This is a molecule that is mutagenic: 
[N-]=[N+]=C1C=NC(=O)NC1=O"} {"text":"User: I'm searching for the SELFIES of a molecule that is Ames mutagenic?\nAssistant: This is a molecule that is Ames mutagenic: ['[C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][O]']"}", "/scratch/micpie/export/ames_mutagenicity/train_0-6.jsonl": "{"text":"Task: Please give me a molecule canonical SMILES based on the description.\nDescription: A molecule that is mutagenic.\nResult: O=[N+]([O-])c1c2c(c3ccc4cccc5ccc1c3c45)CCCC2"} {"text":"Task: Please generate a molecule InChI based on the text description below.\nDescription: A molecule that is Ames mutagenic.\nResult: InChI=1S\/C5H9Cl\/c1-3-5(2)4-6\/h3H,4H2,1-2H3\/b5-3+"}", "/scratch/micpie/export/ames_mutagenicity/valid_0-6.jsonl": "{"text":"Task: Please generate a SMILES based on the description.\nDescription: A molecule that is Ames mutagenic.\nResult: [N-]=[N+]=C1C=NC(=O)NC1=O"} {"text":"Task: Please create a canonical SMILES based on the text description.\nDescription: A molecule that is Ames mutagenic.\nResult: c1ccc2c(c1)ccc1ccccc12"}", "/scratch/micpie/export/ames_mutagenicity/test_0-9.jsonl": "{"text":"User: Can you give me the DeepSMILES of a molecule that is Ames mutagenic?\nAssistant: Yes, I'm happy to help, here you go: Cccccc[N+]=O)[O-]))c6C"} {"text":"User: Can you create the SMILES of a molecule that is Ames mutagenic?\nAssistant: Sure, here you go: Cc1cccc(C(=O)Cl)c1"}", "/scratch/micpie/export/ames_mutagenicity/test_0-0.jsonl": "{"text":"The molecule with the DeepSMILES Cccccc[N+]=O)[O-]))c6C exhibits mutagenic properties."} {"text":"The molecule with the InChI InChI=1S\/C8H7ClO\/c1-6-3-2-4-7(5-6)8(9)10\/h2-5H,1H3 shows Ames mutagenic properties."}", "/scratch/micpie/export/ames_mutagenicity/valid_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the DeepSMILES [N-]=[N+]=CC=NC=O)NC6=O is mutagenic?\nAssistant: Yes, this molecule is mutagenic."} {"text":"User: Can you estimate if the 
molecule with the InChI InChI=1S\/C14H10\/c1-3-7-13-11(5-1)9-10-12-6-2-4-8-14(12)13\/h1-10H is mutagenic?\nAssistant: Yes, this molecule is mutagenic."}", "/scratch/micpie/export/ames_mutagenicity/test_0-3.jsonl": "{"text":"The molecule canonical SMILES Cc1cccc([N+](=O)[O-])c1C is mutagenic."} {"text":"The canonical SMILES Cc1cccc(C(=O)Cl)c1 is Ames mutagenic."}", "/scratch/micpie/export/ames_mutagenicity/valid_0-11.jsonl": "{"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should be Ames mutagenic.\nAssistant: Got it, this canonical SMILES is Ames mutagenic: [N-]=[N+]=C1C=NC(=O)NC1=O"} {"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should be mutagenic.\nAssistant: Got it, here you go, this DeepSMILES is mutagenic: cccccc6)cccccccc6%10"}", "/scratch/micpie/export/ames_mutagenicity/train_0-0.jsonl": "{"text":"The molecule with the DeepSMILES O=[N+][O-])cccccccccccccc%14c%12c%106)))))))))))))CCCC6 displays Ames mutagenic properties."} {"text":"The molecule with the SMILES representation of C\/C=C(\\C)CCl displays Ames mutagenic properties."}", "/scratch/micpie/export/ames_mutagenicity/test_0-6.jsonl": "{"text":"Task: Please give me a SMILES based on the description.\nDescription: A molecule that is mutagenic.\nResult: Cc1cccc([N+](=O)[O-])c1C"} {"text":"Task: Please create a molecule DeepSMILES based on the text description below.\nDescription: A molecule that is Ames mutagenic.\nResult: CcccccC=O)Cl))c6"}", "/scratch/micpie/export/ames_mutagenicity/train_0-10.jsonl": "{"text":"User: I'm searching for the SELFIES of a molecule that is mutagenic?\nAssistant: This is a molecule that is mutagenic: 
['[O][=N+1][Branch1][C][O-1][C][=C][C][=Branch2][Ring1][=Branch1][=C][C][=C][C][=C][C][=C][C][=C][C][=C][Ring1][=C][C][Ring1][N][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][Ring1]']"} {"text":"User: I'm looking for the canonical SMILES of a molecule that is mutagenic?\nAssistant: This is a molecule that is mutagenic: C\/C=C(\\C)CCl"}", "/scratch/micpie/export/ames_mutagenicity/train_0-3.jsonl": "{"text":"The canonical SMILES O=[N+]([O-])c1c2c(c3ccc4cccc5ccc1c3c45)CCCC2 is mutagenic."} {"text":"The molecule SELFIES ['[C][\/C][=C][Branch1][C][\\\\C][C][Cl]'] is mutagenic."}", "/scratch/micpie/export/ames_mutagenicity/train_0-12.jsonl": "{"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should be Ames mutagenic.\nAssistant: Got it, this InChI is Ames mutagenic: InChI=1S\/C20H15NO2\/c22-21(23)20-16-7-2-1-6-14(16)15-10-8-12-4-3-5-13-9-11-17(20)19(15)18(12)13\/h3-5,8-11H,1-2,6-7H2"} {"text":"User: I want to come up with a SELFIES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should be mutagenic.\nAssistant: Ok, this SELFIES is mutagenic: ['[C][\/C][=C][Branch1][C][\\\\C][C][Cl]']"}", "/scratch/micpie/export/ames_mutagenicity/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of ['[C][C][=C][C][=C][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring1][=Branch2][C]'] mutagenic?\nConstraint: Even if you are uncertain, you must pick either a or b without using any additional words.\nOptions:\na. False\nb. 
True\nAnswer: b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI InChI=1S\/C8H7ClO\/c1-6-3-2-4-7(5-6)8(9)10\/h2-5H,1H3 Ames mutagenic?\nConstraint: Even if you are not sure, you must pick either a or b without using any additional words.\nOptions:\na: True\nb: False\nAnswer: a"}", "/scratch/micpie/export/ames_mutagenicity/valid_0-2.jsonl": "{"text":"The DeepSMILES [N-]=[N+]=CC=NC=O)NC6=O is from a molecule that is identified as mutagenic."} {"text":"The InChI InChI=1S\/C14H10\/c1-3-7-13-11(5-1)9-10-12-6-2-4-8-14(12)13\/h1-10H represents a molecule that is identified as mutagenic."}", "/scratch/micpie/export/ames_mutagenicity/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are Ames mutagenic?\nConstraint: You must select none, one or more options from A, B, or C without using any other words.\nOptions:\nA: O=[N+][O-])cccccccccccccc%14c%12c%106)))))))))))))CCCC6\nB: CcccnccccNC)C))cc6nc%10cc%14N\nC: O=S=O)O)cccN=NccO)cccccccc%106))))))))))))ccc6-ccccN=NccO)cccccccc%106))))))))))))cc6S=O)=O)O\nAnswer: A, B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are Ames mutagenic?\nConstraint: You must select none, one or more options from 1, 2, 3, 4, or 5 without using any additional words.\nOptions:\n1: O=NNCCCC5ccccnc6\n2: COP=O)OC))OC=CCl)Cl\n3: C\/C=C\\C)CCl\n4: ClccccCccccCl)cc6))))))CCl)Cl)Cl)))cc6\n5: CC=CC=O)Ncccccc6))))))))SCCO6\nAnswer: 1, 2, 3"}", "/scratch/micpie/export/ames_mutagenicity/valid_0-1.jsonl": "{"text":"Based on the SELFIES ['[N-1][=N+1][=C][C][=N][C][=Branch1][C][=O][N][C][Ring1][#Branch1][=O]'], the molecule has mutagenic characteristics."} {"text":"Based on the SELFIES representation ['[C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][O]'], the molecule has mutagenic features."}", "/scratch/micpie/export/ames_mutagenicity/valid_0-13.jsonl": 
"{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES [N-]=[N+]=CC=NC=O)NC6=O Ames mutagenic?\nConstraint: Even if you are not sure, you must pick either A or B without using any additional words.\nOptions:\n(A) False\n(B) True\nAnswer: B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES representation of cccccc6)cccccccc6%10 mutagenic?\nConstraint: Even if you are not sure, you must pick either A or B without using any other words.\nOptions:\n(A) True\n(B) False\nAnswer: A"}", "/scratch/micpie/export/ames_mutagenicity/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is mutagenic.\nDeepSMILES: [N-]=[N+]=CC=NC=O)NC6=O\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is mutagenic."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is Ames mutagenic.\nMolecule DeepSMILES: cccccc6)cccccccc6%10\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is Ames mutagenic."}", "/scratch/micpie/export/ames_mutagenicity/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is mutagenic.\nMolecule DeepSMILES: [N-]=[N+]=CC=NC=O)NC6=O\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is Ames mutagenic.\nMolecule SMILES: c1ccc2c(c1)ccc1ccccc12\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"}", "/scratch/micpie/export/ames_mutagenicity/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is mutagenic.\nMolecule 
SELFIES: ['[O][=N+1][Branch1][C][O-1][C][=C][C][=Branch2][Ring1][=Branch1][=C][C][=C][C][=C][C][=C][C][=C][C][=C][Ring1][=C][C][Ring1][N][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][Ring1]']\nConstraint: Answer the question in a full sentence.\nResult: This molecule is mutagenic."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is mutagenic.\nSMILES: C\/C=C(\\C)CCl\nConstraint: Answer the question in a full sentence.\nResult: This molecule is mutagenic."}", "/scratch/micpie/export/ames_mutagenicity/valid_0-12.jsonl": "{"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should be Ames mutagenic.\nAssistant: Got it, this DeepSMILES is Ames mutagenic: [N-]=[N+]=CC=NC=O)NC6=O"} {"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should be Ames mutagenic.\nAssistant: Understood, this DeepSMILES is Ames mutagenic: cccccc6)cccccccc6%10"}", "/scratch/micpie/export/ames_mutagenicity/train_0-2.jsonl": "{"text":"The InChI InChI=1S\/C20H15NO2\/c22-21(23)20-16-7-2-1-6-14(16)15-10-8-12-4-3-5-13-9-11-17(20)19(15)18(12)13\/h3-5,8-11H,1-2,6-7H2 is from a molecule that is identified as Ames mutagenic."} {"text":"The DeepSMILES C\/C=C\\C)CCl represents a molecule that is identified as Ames mutagenic."}", "/scratch/micpie/export/ames_mutagenicity/test_0-11.jsonl": "{"text":"User: I want to create a molecule SELFIES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should be Ames mutagenic.\nAssistant: Ok, here you go, this SELFIES is Ames mutagenic: ['[C][C][=C][C][=C][C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][=C][Ring1][=Branch2][C]']"} {"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very exciting. 
Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should be mutagenic.\nAssistant: Got it, here you go, this DeepSMILES is mutagenic: CcccccC=O)Cl))c6"}", "/scratch/micpie/export/ames_mutagenicity/train_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the InChI InChI=1S\/C20H15NO2\/c22-21(23)20-16-7-2-1-6-14(16)15-10-8-12-4-3-5-13-9-11-17(20)19(15)18(12)13\/h3-5,8-11H,1-2,6-7H2 is mutagenic?\nAssistant: Yes, this molecule is mutagenic."} {"text":"User: Can you derive if the molecule with the canonical SMILES C\/C=C(\\C)CCl is mutagenic?\nAssistant: Yes, this molecule is mutagenic."}", "/scratch/micpie/export/ames_mutagenicity/train_0-11.jsonl": "{"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should be Ames mutagenic.\nAssistant: Got it, this SELFIES is Ames mutagenic: ['[O][=N+1][Branch1][C][O-1][C][=C][C][=Branch2][Ring1][=Branch1][=C][C][=C][C][=C][C][=C][C][=C][C][=C][Ring1][=C][C][Ring1][N][=C][Ring1][#Branch2][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][Ring1]']"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should be Ames mutagenic.\nAssistant: Ok, here you go, this InChI is Ames mutagenic: InChI=1S\/C5H9Cl\/c1-3-5(2)4-6\/h3H,4H2,1-2H3\/b5-3+"}", "/scratch/micpie/export/ames_mutagenicity/train_0-1.jsonl": "{"text":"Based on the canonical SMILES representation O=[N+]([O-])c1c2c(c3ccc4cccc5ccc1c3c45)CCCC2, the molecule has Ames mutagenic features."} {"text":"Based on the InChI representation InChI=1S\/C5H9Cl\/c1-3-5(2)4-6\/h3H,4H2,1-2H3\/b5-3+, the molecule has mutagenic properties."}", "/scratch/micpie/export/ames_mutagenicity/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES representation of O=[N+]([O-])c1c2c(c3ccc4cccc5ccc1c3c45)CCCC2 Ames mutagenic?\nConstraint: Even if you are not sure, you must pick either a or b without using any other words.\nOptions:\n(a) False\n(b) True\nAnswer: b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES C\/C=C\\C)CCl Ames mutagenic?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any additional words.\nOptions:\n1.) True\n2.) 
False\nAnswer: 1"}", "/scratch/micpie/export/ames_mutagenicity/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is mutagenic.\nMolecule canonical SMILES: O=[N+]([O-])c1c2c(c3ccc4cccc5ccc1c3c45)CCCC2\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is mutagenic.\nSELFIES: ['[C][\/C][=C][Branch1][C][\\\\C][C][Cl]']\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"}", "/scratch/micpie/export/ames_mutagenicity/test_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the canonical SMILES Cc1cccc([N+](=O)[O-])c1C is mutagenic?\nAssistant: Yes, this molecule is mutagenic."} {"text":"User: Can you tell me if the molecule with the DeepSMILES CcccccC=O)Cl))c6 is Ames mutagenic?\nAssistant: Yes, this molecule is Ames mutagenic."}", "/scratch/micpie/export/ames_mutagenicity/train_0-9.jsonl": "{"text":"User: Can you generate the canonical SMILES of a molecule that is Ames mutagenic?\nAssistant: Sure, here you go: O=[N+]([O-])c1c2c(c3ccc4cccc5ccc1c3c45)CCCC2"} {"text":"User: Can you give me the DeepSMILES of a molecule that is Ames mutagenic?\nAssistant: Sure, here you go: C\/C=C\\C)CCl"}", "/scratch/micpie/export/ames_mutagenicity/valid_0-3.jsonl": "{"text":"The molecule InChI InChI=1S\/C4H2N4O2\/c5-8-2-1-6-4(10)7-3(2)9\/h1H,(H,7,9,10) is mutagenic."} {"text":"The molecule SMILES c1ccc2c(c1)ccc1ccccc12 is mutagenic."}", "/scratch/micpie/export/ames_mutagenicity/test_0-8.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C8H9NO2\/c1-6-4-3-5-8(7(6)2)9(10)11\/h3-5H,1-2H3 Ames mutagenic?\nAssistant: Yes, it is Ames mutagenic."} {"text":"User: Is the molecule with the DeepSMILES CcccccC=O)Cl))c6 Ames mutagenic?\nAssistant: Yes, it 
is Ames mutagenic."}", "/scratch/micpie/export/ames_mutagenicity/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are mutagenic?\nConstraint: You must select none, one or more options from 1 or 2 without using any other words.\nOptions:\n[1] Ccccccc6cccccccccccc%14c%12c%106\n[2] Cccccc[N+]=O)[O-]))c6C\nAnswer: 1, 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are mutagenic?\nConstraint: You must select none, one or more options from A or B without using any additional words.\nOptions:\nA.) InChI=1S\/C8H7ClO\/c1-6-3-2-4-7(5-6)8(9)10\/h2-5H,1H3\nB.) InChI=1S\/C12H19N5O4\/c1-12(2,3)21-11(19)16(5)9-8(15(4)7-13-9)10(18)17(6)14-20\/h7H,1-6H3\nAnswer: A, B"}", "/scratch/micpie/export/ames_mutagenicity/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are mutagenic?\nConstraint: You must select none, one or more options from 1, 2, 3, 4, or 5 without using any additional words.\nOptions:\n[1] CCO3\n[2] CCCCOCCOCCOCCCC\n[3] CC=O)CCl)Cl)Cl\n[4] [N-]=[N+]=CC=NC=O)NC6=O\n[5] N=CN)NNccccN=NCN)=S))))cc6\nAnswer: 1, 3, 4, 5"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are mutagenic?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any additional words.\nOptions:\nA: InChI=1S\/C15H12O2\/c16-13(11-7-3-1-4-8-11)15-14(17-15)12-9-5-2-6-10-12\/h1-10,14-15H\/t14-,15-\/m1\/s1\nB: InChI=1S\/C19H18\/c1-12-7-9-17-16-10-8-14-5-3-4-6-15(14)18(16)11-13(2)19(12)17\/h3-6,8,10-12H,7,9H2,1-2H3\nC: InChI=1S\/C14H10\/c1-3-7-13-11(5-1)9-10-12-6-2-4-8-14(12)13\/h1-10H\nD: InChI=1S\/C10H12N2O6\/c13-3-5-2-12(10(17)11-9(5)16)8-1-6(15)7(4-14)18-8\/h2-3,6-8,14-15H,1,4H2,(H,11,16,17)\nE: InChI=1S\/C10H8O3\/c1-2-8(11)7-3-4-9-10(5-7)13-6-12-9\/h2-5H,1,6H2\nAnswer: A, B, C"}", "/scratch/micpie/export/ames_mutagenicity/test_0-4.jsonl": "{"text":"Task: Please classify a 
molecule based on the description.\nDescription: A molecule that is Ames mutagenic.\nInChI: InChI=1S\/C8H9NO2\/c1-6-4-3-5-8(7(6)2)9(10)11\/h3-5H,1-2H3\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is Ames mutagenic.\nDeepSMILES: CcccccC=O)Cl))c6\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"}", "/scratch/micpie/export/ames_mutagenicity/test_0-12.jsonl": "{"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should be Ames mutagenic.\nAssistant: Got it, this SMILES is Ames mutagenic: Cc1cccc([N+](=O)[O-])c1C"} {"text":"User: I want to generate a InChI.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should be Ames mutagenic.\nAssistant: Got it, this InChI is Ames mutagenic: InChI=1S\/C8H7ClO\/c1-6-3-2-4-7(5-6)8(9)10\/h2-5H,1H3"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_5-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the InChI string.\nInChI string: InChI=1S\/C15H15FN2OS\/c1-10-6-7-11(17)8-13(10)18-15(19)9-20-14-5-3-2-4-12(14)16\/h2-8H,9,17H2,1H3,(H,18,19)\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: Cc1ccc(N)cc1NC(=O)CSc1ccccc1F"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the InChI string.\nInChI string: 
InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8-\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_2-4.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the InChI string InChI=1S\/C16H12FN3O\/c17-12-3-1-11(2-4-12)15-13(16(18)21)9-14(20-15)10-5-7-19-8-6-10\/h1-9,20H,(H2,18,21)?\nAssistant: Sure, this molecule has a canonical SMILES of NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the InChI string InChI=1S\/C21H26FN3O2S\/c1-27-19-7-6-16(22)11-18(19)20(26)15-5-4-8-24(13-15)14-17-12-23-21(28-17)25-9-2-3-10-25\/h6-7,11-12,15H,2-5,8-10,13-14H2,1H3?\nAssistant: Yes, this molecule has a canonical SMILES of COc1ccc(F)cc1C(=O)C1CCCN(Cc2cnc(N3CCCC3)s2)C1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_2-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the canonical SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C20H22FN.ClH\/c21-20-6-5-17-11-15(12-19(17)13-20)7-9-22-10-8-16-3-1-2-4-18(16)14-22;\/h1-6,13,15H,7-12,14H2;1H\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C14H14N2O\/c1-11-2-3-13(14(17)8-11)10-16-9-12-4-6-15-7-5-12\/h2-8,10,17H,9H2,1H3\nConstraint: Even if you are not sure, you must answer with a representation without using any 
additional words.\nResult: Cc1ccc(C=NCc2ccncc2)c(O)c1"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_4-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C16H18N2O4S\/c1-11-15(8-9-22-11)16(19)17-10-12-2-6-14(7-3-12)23(20,21)18-13-4-5-13\/h2-3,6-9,13,18H,4-5,10H2,1H3,(H,17,19) can also be represented with the canonical SMILES representation Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1."} {"text":"The molecule with the InChI string InChI=1S\/C16H15N3O2S\/c1-11-12(6-9-21-11)16(20)18-8-5-15-19-14(10-22-15)13-4-2-3-7-17-13\/h2-4,6-7,9-10H,5,8H2,1H3,(H,18,20) can also be represented with the canonical SMILES Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_4-4.jsonl": "{"text":"User: Can you generate the canonical SMILES of the molecule with the InChI string InChI=1S\/C16H18N2O4S\/c1-11-15(8-9-22-11)16(19)17-10-12-2-6-14(7-3-12)23(20,21)18-13-4-5-13\/h2-3,6-9,13,18H,4-5,10H2,1H3,(H,17,19)?\nAssistant: Sure, this molecule has a canonical SMILES of Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1."} {"text":"User: Can you generate the canonical SMILES of the molecule with the InChI string InChI=1S\/C16H15N3O2S\/c1-11-12(6-9-21-11)16(20)18-8-5-15-19-14(10-22-15)13-4-2-3-7-17-13\/h2-4,6-7,9-10H,5,8H2,1H3,(H,18,20)?\nAssistant: Sure, this molecule has a canonical SMILES of Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_1-4.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the InChI string InChI=1S\/C19H18N2O2S\/c1-13-20-18(12-24-13)15-6-8-16(9-7-15)21-19(22)11-14-4-3-5-17(10-14)23-2\/h3-10,12H,11H2,1-2H3,(H,21,22)?\nAssistant: Of course, this molecule has a canonical SMILES of COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the InChI string 
InChI=1S\/C11H10N4O3\/c1-18-5-2-3-7-6(4-5)8-9(13-7)10(16)15(12)11(17)14-8\/h2-4,13H,12H2,1H3,(H,14,17)?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_5-1.jsonl": "{"text":"The molecule with the canonical SMILES representation of Cc1ccc(N)cc1NC(=O)CSc1ccccc1F can also be represented with the InChI string representation InChI=1S\/C15H15FN2OS\/c1-10-6-7-11(17)8-13(10)18-15(19)9-20-14-5-3-2-4-12(14)16\/h2-8H,9,17H2,1H3,(H,18,19)."} {"text":"The molecule with the canonical SMILES representation of CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21 can also be represented with the InChI string InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8-."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_4-4.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the InChI string InChI=1S\/C15H14N4OS2\/c1-8-7-17-15(22-8)19-13(20)10-5-12(9-3-4-9)18-14(21-2)11(10)6-16\/h5,7,9H,3-4H2,1-2H3,(H,17,19,20)?\nAssistant: Yes, this molecule has a canonical SMILES of CSc1nc(C2CC2)cc(C(=O)Nc2ncc(C)s2)c1C#N."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the InChI string InChI=1S\/C17H22FNO4\/c1-11(2)15(17(21)23-10-14-4-3-9-22-14)19-16(20)12-5-7-13(18)8-6-12\/h5-8,11,14-15H,3-4,9-10H2,1-2H3,(H,19,20)?\nAssistant: Of course, this molecule has a canonical SMILES of CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_4-4.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the InChI string InChI=1S\/C17H22N2O3S\/c20-17(14-4-2-1-3-5-14)18-12-13-6-10-16(11-7-13)23(21,22)19-15-8-9-15\/h1-2,6-7,10-11,14-15,19H,3-5,8-9,12H2,(H,18,20)?\nAssistant: Yes, this molecule has a canonical SMILES of 
O=C(NCc1ccc(S(=O)(=O)NC2CC2)cc1)C1CC=CCC1."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the InChI string InChI=1S\/C17H17FN4O\/c1-2-23-17-9-14(18)5-8-16(17)20-10-13-3-6-15(7-4-13)22-12-19-11-21-22\/h3-9,11-12,20H,2,10H2,1H3?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of CCOc1cc(F)ccc1NCc1ccc(-n2cncn2)cc1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_1-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C22H26N2O4\/c1-4-21(25)24-17-9-7-6-8-16(17)13-18(24)22(26)23-14-15-10-11-19(28-5-2)20(12-15)27-3\/h6-12,18H,4-5,13-14H2,1-3H3,(H,23,26)\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the InChI string.\nInChI string: InChI=1S\/C17H14ClN3O3S2\/c1-11-16(26(23,24)10-12-6-2-3-7-13(12)18)25-17(20-11)21-15(22)14-8-4-5-9-19-14\/h2-9H,10H2,1H3,(H,20,21,22)\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_3-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the InChI string.\nInChI string: InChI=1S\/C24H25BrClN3O3\/c1-5-29-16(3)24(15(2)28-29)27-23(30)11-7-17-6-9-21(31-4)18(12-17)14-32-22-10-8-19(26)13-20(22)25\/h6-13H,5,14H2,1-4H3,(H,27,30)\/b11-7+\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: 
CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C16H16FN3O2S\/c1-2-20(9-3-8-18)15(21)11-23-16-19-10-14(22-16)12-4-6-13(17)7-5-12\/h4-7,10H,2-3,9,11H2,1H3\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_1-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the canonical SMILES CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C22H26N2O4\/c1-4-21(25)24-17-9-7-6-8-16(17)13-18(24)22(26)23-14-15-10-11-19(28-5-2)20(12-15)27-3\/h6-12,18H,4-5,13-14H2,1-3H3,(H,23,26)."} {"text":"User: Can you tell me the InChI string of the molecule with the canonical SMILES Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C17H14ClN3O3S2\/c1-11-16(26(23,24)10-12-6-2-3-7-13(12)18)25-17(20-11)21-15(22)14-8-4-5-9-19-14\/h2-9H,10H2,1H3,(H,20,21,22)."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_5-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C15H15FN2OS\/c1-10-6-7-11(17)8-13(10)18-15(19)9-20-14-5-3-2-4-12(14)16\/h2-8H,9,17H2,1H3,(H,18,19) can also be represented with the canonical SMILES Cc1ccc(N)cc1NC(=O)CSc1ccccc1F."} {"text":"The molecule with the InChI string InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8- can also be represented with the canonical SMILES representation CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21."}", 
"/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_1-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the canonical SMILES COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C17H16FNO6S\/c1-24-14-9-11(3-8-16(20)21)10-15(17(14)25-2)26(22,23)19-13-6-4-12(18)5-7-13\/h3-10,19H,1-2H3,(H,20,21)\/b8-3+."} {"text":"User: Can you create the InChI string of the molecule with the canonical SMILES Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C15H14FNO4S\/c1-9-3-8-13(14(10(9)2)15(18)19)17-22(20,21)12-6-4-11(16)5-7-12\/h3-8,17H,1-2H3,(H,18,19)."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_5-4.jsonl": "{"text":"User: Can you tell me the canonical SMILES of the molecule with the InChI string InChI=1S\/C14H15N7S\/c1-2-4-10(5-3-1)8-21-14(18-19-20-21)22-9-12-15-13(17-16-12)11-6-7-11\/h1-5,11H,6-9H2,(H,15,16,17)?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of c1ccc(Cn2nnnc2SCc2nc(C3CC3)n[nH]2)cc1."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the InChI string InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1?\nAssistant: Yes, this molecule has a canonical SMILES of Nc1c2c([nH+]c3ccccc13)CCCC2."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_0-5.jsonl": "{"text":"User: Can you tell me the InChI string of the molecule with the canonical SMILES CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1."} {"text":"User: Can you tell me the InChI 
string of the molecule with the canonical SMILES COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C22H18ClN3O2\/c1-28-19-9-7-15(23)12-18(19)21(26-16-5-3-10-24-13-16)17-8-6-14-4-2-11-25-20(14)22(17)27\/h2-13,21,26-27H,1H3."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_3-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C15H17N5\/c1-3-4-13-9-14(20-15(19-13)16-10-17-20)18-12-7-5-11(2)6-8-12\/h5-10,18H,3-4H2,1-2H3 can also be represented with the canonical SMILES CCCc1cc(Nc2ccc(C)cc2)n2ncnc2n1."} {"text":"The molecule with the InChI string representation of InChI=1S\/C15H18N2O4\/c1-11(18)16-13-5-2-4-12(10-13)15(20)21-9-8-17-7-3-6-14(17)19\/h2,4-5,10H,3,6-9H2,1H3,(H,16,18) can also be represented with the canonical SMILES CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_0-1.jsonl": "{"text":"The molecule with the canonical SMILES representation of CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1 can also be represented with the InChI string InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1."} {"text":"The molecule with the canonical SMILES COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O can also be represented with the InChI string representation InChI=1S\/C22H18ClN3O2\/c1-28-19-9-7-15(23)12-18(19)21(26-16-5-3-10-24-13-16)17-8-6-14-4-2-11-25-20(14)22(17)27\/h2-13,21,26-27H,1H3."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_5-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C17H21N3O2S\/c1-11(2)15(17(22)20-6-3-4-7-20)19-16(21)12-9-14-13(18-10-12)5-8-23-14\/h5,8-11,15H,3-4,6-7H2,1-2H3,(H,19,21) can also be represented with the canonical SMILES 
CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1."} {"text":"The molecule with the InChI string InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1 can also be represented with the canonical SMILES CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_2-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C16H12FN3O\/c17-12-3-1-11(2-4-12)15-13(16(18)21)9-14(20-15)10-5-7-19-8-6-10\/h1-9,20H,(H2,18,21) can also be represented with the canonical SMILES NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1."} {"text":"The molecule with the InChI string representation of InChI=1S\/C21H26FN3O2S\/c1-27-19-7-6-16(22)11-18(19)20(26)15-5-4-8-24(13-15)14-17-12-23-21(28-17)25-9-2-3-10-25\/h6-7,11-12,15H,2-5,8-10,13-14H2,1H3 can also be represented with the canonical SMILES representation COc1ccc(F)cc1C(=O)C1CCCN(Cc2cnc(N3CCCC3)s2)C1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_2-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the canonical SMILES from the InChI string.\nInChI string: InChI=1S\/C22H22Cl2N2O2\/c1-3-5-12-26(4-2)22(28)21(27)19-17-13-16(24)10-11-18(17)25-20(19)14-6-8-15(23)9-7-14\/h6-11,13,25H,3-5,12H2,1-2H3\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the 
canonical SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C22H26BrClN4O2\/c1-26(15-21(29)25-20-5-3-2-4-19(20)23)22(30)16-28-12-10-27(11-13-28)14-17-6-8-18(24)9-7-17\/h2-9H,10-16H2,1H3,(H,25,29)\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_0-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1 can also be represented with the canonical SMILES representation CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F."} {"text":"The molecule with the InChI string InChI=1S\/C22H23N3O4\/c1-22(2,3)24-21(27)19(11-7-10-16-8-5-4-6-9-16)23-20(26)17-12-14-18(15-13-17)25(28)29\/h4-15H,1-3H3,(H,23,26)(H,24,27)\/b10-7+,19-11+ can also be represented with the canonical SMILES representation CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_3-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the InChI string from the canonical SMILES.\ncanonical SMILES: CCCc1cc(Nc2ccc(C)cc2)n2ncnc2n1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C15H17N5\/c1-3-4-13-9-14(20-15(19-13)16-10-17-20)18-12-7-5-11(2)6-8-12\/h5-10,18H,3-4H2,1-2H3"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the canonical SMILES.\ncanonical SMILES: CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1\nConstraint: Even if you are not 
sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C15H18N2O4\/c1-11(18)16-13-5-2-4-12(10-13)15(20)21-9-8-17-7-3-6-14(17)19\/h2,4-5,10H,3,6-9H2,1H3,(H,16,18)"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_5-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the InChI string from the canonical SMILES.\ncanonical SMILES: c1ccc(Cn2nnnc2SCc2nc(C3CC3)n[nH]2)cc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C14H15N7S\/c1-2-4-10(5-3-1)8-21-14(18-19-20-21)22-9-12-15-13(17-16-12)11-6-7-11\/h1-5,11H,6-9H2,(H,15,16,17)"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the canonical SMILES.\nMolecule canonical SMILES: Nc1c2c([nH+]c3ccccc13)CCCC2\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_3-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the canonical SMILES COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C25H22FN5O3S\/c1-31-8-7-19-16(12-31)9-15(10-27)24(30-19)35-13-20-22(25(32)33-2)21(18(11-28)23(29)34-20)14-3-5-17(26)6-4-14\/h3-6,9,21H,7-8,12-13,29H2,1-2H3."} {"text":"User: Can you generate the InChI string of the molecule with the canonical SMILES O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C16H16N2O4\/c19-14-10-12(11-4-1-2-5-13(11)17-14)16(21)22-9-8-18-7-3-6-15(18)20\/h1-2,4-5,10H,3,6-9H2,(H,17,19)."}", 
"/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_5-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the canonical SMILES CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C17H21N3O2S\/c1-11(2)15(17(22)20-6-3-4-7-20)19-16(21)12-9-14-13(18-10-12)5-8-23-14\/h5,8-11,15H,3-4,6-7H2,1-2H3,(H,19,21)."} {"text":"User: Can you generate the InChI string of the molecule with the canonical SMILES CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_4-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the canonical SMILES.\nMolecule canonical SMILES: CSc1nc(C2CC2)cc(C(=O)Nc2ncc(C)s2)c1C#N\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C15H14N4OS2\/c1-8-7-17-15(22-8)19-13(20)10-5-12(9-3-4-9)18-14(21-2)11(10)6-16\/h5,7,9H,3-4H2,1-2H3,(H,17,19,20)"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the InChI string from the canonical SMILES.\nMolecule canonical SMILES: CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: 
InChI=1S\/C17H22FNO4\/c1-11(2)15(17(21)23-10-14-4-3-9-22-14)19-16(20)12-5-7-13(18)8-6-12\/h5-8,11,14-15H,3-4,9-10H2,1-2H3,(H,19,20)"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_1-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the canonical SMILES.\ncanonical SMILES: COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C17H16FNO6S\/c1-24-14-9-11(3-8-16(20)21)10-15(17(14)25-2)26(22,23)19-13-6-4-12(18)5-7-13\/h3-10,19H,1-2H3,(H,20,21)\/b8-3+"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the canonical SMILES.\nMolecule canonical SMILES: Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C15H14FNO4S\/c1-9-3-8-13(14(10(9)2)15(18)19)17-22(20,21)12-6-4-11(16)5-7-12\/h3-8,17H,1-2H3,(H,18,19)"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_0-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1"} {"text":"Task: Please create a molecule representation based 
on the description.\nDescription: Generate the canonical SMILES from the InChI string.\nInChI string: InChI=1S\/C22H18ClN3O2\/c1-28-19-9-7-15(23)12-18(19)21(26-16-5-3-10-24-13-16)17-8-6-14-4-2-11-25-20(14)22(17)27\/h2-13,21,26-27H,1H3\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_3-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C24H25BrClN3O3\/c1-5-29-16(3)24(15(2)28-29)27-23(30)11-7-17-6-9-21(31-4)18(12-17)14-32-22-10-8-19(26)13-20(22)25\/h6-13H,5,14H2,1-4H3,(H,27,30)\/b11-7+ can also be represented with the canonical SMILES CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C."} {"text":"The molecule with the InChI string InChI=1S\/C16H16FN3O2S\/c1-2-20(9-3-8-18)15(21)11-23-16-19-10-14(22-16)12-4-6-13(17)7-5-12\/h4-7,10H,2-3,9,11H2,1H3 can also be represented with the canonical SMILES representation CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_1-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C17H16FNO6S\/c1-24-14-9-11(3-8-16(20)21)10-15(17(14)25-2)26(22,23)19-13-6-4-12(18)5-7-13\/h3-10,19H,1-2H3,(H,20,21)\/b8-3+ can also be represented with the canonical SMILES COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC."} {"text":"The molecule with the InChI string representation of InChI=1S\/C15H14FNO4S\/c1-9-3-8-13(14(10(9)2)15(18)19)17-22(20,21)12-6-4-11(16)5-7-12\/h3-8,17H,1-2H3,(H,18,19) can also be represented with the canonical SMILES representation Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_5-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the InChI string.\nInChI string: 
InChI=1S\/C17H21N3O2S\/c1-11(2)15(17(22)20-6-3-4-7-20)19-16(21)12-9-14-13(18-10-12)5-8-23-14\/h5,8-11,15H,3-4,6-7H2,1-2H3,(H,19,21)\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the InChI string.\nInChI string: InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_2-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the canonical SMILES.\nMolecule canonical SMILES: Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C20H22FN.ClH\/c21-20-6-5-17-11-15(12-19(17)13-20)7-9-22-10-8-16-3-1-2-4-18(16)14-22;\/h1-6,13,15H,7-12,14H2;1H"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the InChI string from the canonical SMILES.\nMolecule canonical SMILES: Cc1ccc(C=NCc2ccncc2)c(O)c1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: 
InChI=1S\/C14H14N2O\/c1-11-2-3-13(14(17)8-11)10-16-9-12-4-6-15-7-5-12\/h2-8,10,17H,9H2,1H3"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_1-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C19H18N2O2S\/c1-13-20-18(12-24-13)15-6-8-16(9-7-15)21-19(22)11-14-4-3-5-17(10-14)23-2\/h3-10,12H,11H2,1-2H3,(H,21,22)\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the InChI string.\nInChI string: InChI=1S\/C11H10N4O3\/c1-18-5-2-3-7-6(4-5)8-9(13-7)10(16)15(12)11(17)14-8\/h2-4,13H,12H2,1H3,(H,14,17)\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_5-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the canonical SMILES Cc1ccc(N)cc1NC(=O)CSc1ccccc1F?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C15H15FN2OS\/c1-10-6-7-11(17)8-13(10)18-15(19)9-20-14-5-3-2-4-12(14)16\/h2-8H,9,17H2,1H3,(H,18,19)."} {"text":"User: Can you generate the InChI string of the molecule with the canonical SMILES CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8-."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_2-4.jsonl": "{"text":"User: Can you tell me 
the canonical SMILES of the molecule with the InChI string InChI=1S\/C20H22FN.ClH\/c21-20-6-5-17-11-15(12-19(17)13-20)7-9-22-10-8-16-3-1-2-4-18(16)14-22;\/h1-6,13,15H,7-12,14H2;1H?\nAssistant: Sure, this molecule has a canonical SMILES of Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2."} {"text":"User: Can you create the canonical SMILES of the molecule with the InChI string InChI=1S\/C14H14N2O\/c1-11-2-3-13(14(17)8-11)10-16-9-12-4-6-15-7-5-12\/h2-8,10,17H,9H2,1H3?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of Cc1ccc(C=NCc2ccncc2)c(O)c1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_3-4.jsonl": "{"text":"User: Can you tell me the canonical SMILES of the molecule with the InChI string InChI=1S\/C15H17N5\/c1-3-4-13-9-14(20-15(19-13)16-10-17-20)18-12-7-5-11(2)6-8-12\/h5-10,18H,3-4H2,1-2H3?\nAssistant: Yes, this molecule has a canonical SMILES of CCCc1cc(Nc2ccc(C)cc2)n2ncnc2n1."} {"text":"User: Can you generate the canonical SMILES of the molecule with the InChI string InChI=1S\/C15H18N2O4\/c1-11(18)16-13-5-2-4-12(10-13)15(20)21-9-8-17-7-3-6-14(17)19\/h2,4-5,10H,3,6-9H2,1H3,(H,16,18)?\nAssistant: Of course, this molecule has a canonical SMILES of CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_0-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1 can also be represented with the canonical SMILES CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1."} {"text":"The molecule with the InChI string representation of InChI=1S\/C22H18ClN3O2\/c1-28-19-9-7-15(23)12-18(19)21(26-16-5-3-10-24-13-16)17-8-6-14-4-2-11-25-20(14)22(17)27\/h2-13,21,26-27H,1H3 can also be represented with the canonical SMILES representation 
COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_2-5.jsonl": "{"text":"User: Can you generate the InChI string of the molecule with the canonical SMILES NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C16H12FN3O\/c17-12-3-1-11(2-4-12)15-13(16(18)21)9-14(20-15)10-5-7-19-8-6-10\/h1-9,20H,(H2,18,21)."} {"text":"User: Can you tell me the InChI string of the molecule with the canonical SMILES COc1ccc(F)cc1C(=O)C1CCCN(Cc2cnc(N3CCCC3)s2)C1?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C21H26FN3O2S\/c1-27-19-7-6-16(22)11-18(19)20(26)15-5-4-8-24(13-15)14-17-12-23-21(28-17)25-9-2-3-10-25\/h6-7,11-12,15H,2-5,8-10,13-14H2,1H3."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_5-5.jsonl": "{"text":"User: Can you generate the InChI string of the molecule with the canonical SMILES c1ccc(Cn2nnnc2SCc2nc(C3CC3)n[nH]2)cc1?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C14H15N7S\/c1-2-4-10(5-3-1)8-21-14(18-19-20-21)22-9-12-15-13(17-16-12)11-6-7-11\/h1-5,11H,6-9H2,(H,15,16,17)."} {"text":"User: Can you create the InChI string of the molecule with the canonical SMILES Nc1c2c([nH+]c3ccccc13)CCCC2?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_3-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the canonical SMILES CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C24H25BrClN3O3\/c1-5-29-16(3)24(15(2)28-29)27-23(30)11-7-17-6-9-21(31-4)18(12-17)14-32-22-10-8-19(26)13-20(22)25\/h6-13H,5,14H2,1-4H3,(H,27,30)\/b11-7+."} {"text":"User: Can you generate the InChI string of the molecule with the canonical SMILES 
CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C16H16FN3O2S\/c1-2-20(9-3-8-18)15(21)11-23-16-19-10-14(22-16)12-4-6-13(17)7-5-12\/h4-7,10H,2-3,9,11H2,1H3."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_2-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C20H22FN.ClH\/c21-20-6-5-17-11-15(12-19(17)13-20)7-9-22-10-8-16-3-1-2-4-18(16)14-22;\/h1-6,13,15H,7-12,14H2;1H can also be represented with the canonical SMILES Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2."} {"text":"The molecule with the InChI string representation of InChI=1S\/C14H14N2O\/c1-11-2-3-13(14(17)8-11)10-16-9-12-4-6-15-7-5-12\/h2-8,10,17H,9H2,1H3 can also be represented with the canonical SMILES representation Cc1ccc(C=NCc2ccncc2)c(O)c1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_1-4.jsonl": "{"text":"User: Can you tell me the canonical SMILES of the molecule with the InChI string InChI=1S\/C17H16FNO6S\/c1-24-14-9-11(3-8-16(20)21)10-15(17(14)25-2)26(22,23)19-13-6-4-12(18)5-7-13\/h3-10,19H,1-2H3,(H,20,21)\/b8-3+?\nAssistant: Yes, this molecule has a canonical SMILES of COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC."} {"text":"User: Can you generate the canonical SMILES of the molecule with the InChI string InChI=1S\/C15H14FNO4S\/c1-9-3-8-13(14(10(9)2)15(18)19)17-22(20,21)12-6-4-11(16)5-7-12\/h3-8,17H,1-2H3,(H,18,19)?\nAssistant: Of course, this molecule has a canonical SMILES of Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_2-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C22H22Cl2N2O2\/c1-3-5-12-26(4-2)22(28)21(27)19-17-13-16(24)10-11-18(17)25-20(19)14-6-8-15(23)9-7-14\/h6-11,13,25H,3-5,12H2,1-2H3 can also be represented with the canonical SMILES CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12."} {"text":"The molecule with the InChI string 
InChI=1S\/C22H26BrClN4O2\/c1-26(15-21(29)25-20-5-3-2-4-19(20)23)22(30)16-28-12-10-27(11-13-28)14-17-6-8-18(24)9-7-17\/h2-9H,10-16H2,1H3,(H,25,29) can also be represented with the canonical SMILES CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_4-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C15H14N4OS2\/c1-8-7-17-15(22-8)19-13(20)10-5-12(9-3-4-9)18-14(21-2)11(10)6-16\/h5,7,9H,3-4H2,1-2H3,(H,17,19,20)\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CSc1nc(C2CC2)cc(C(=O)Nc2ncc(C)s2)c1C#N"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the canonical SMILES from the InChI string.\nInChI string: InChI=1S\/C17H22FNO4\/c1-11(2)15(17(21)23-10-14-4-3-9-22-14)19-16(20)12-5-7-13(18)8-6-12\/h5-8,11,14-15H,3-4,9-10H2,1-2H3,(H,19,20)\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_0-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the canonical SMILES.\ncanonical SMILES: CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1"} 
{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the canonical SMILES.\ncanonical SMILES: COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C22H18ClN3O2\/c1-28-19-9-7-15(23)12-18(19)21(26-16-5-3-10-24-13-16)17-8-6-14-4-2-11-25-20(14)22(17)27\/h2-13,21,26-27H,1H3"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_4-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the canonical SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C16H18N2O4S\/c1-11-15(8-9-22-11)16(19)17-10-12-2-6-14(7-3-12)23(20,21)18-13-4-5-13\/h2-3,6-9,13,18H,4-5,10H2,1H3,(H,17,19)\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the canonical SMILES from the InChI string.\nInChI string: InChI=1S\/C16H15N3O2S\/c1-11-12(6-9-21-11)16(20)18-8-5-15-19-14(10-22-15)13-4-2-3-7-17-13\/h2-4,6-7,9-10H,5,8H2,1H3,(H,18,20)\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_3-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C15H17N5\/c1-3-4-13-9-14(20-15(19-13)16-10-17-20)18-12-7-5-11(2)6-8-12\/h5-10,18H,3-4H2,1-2H3\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: 
CCCc1cc(Nc2ccc(C)cc2)n2ncnc2n1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the InChI string.\nInChI string: InChI=1S\/C15H18N2O4\/c1-11(18)16-13-5-2-4-12(10-13)15(20)21-9-8-17-7-3-6-14(17)19\/h2,4-5,10H,3,6-9H2,1H3,(H,16,18)\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_2-1.jsonl": "{"text":"The molecule with the canonical SMILES CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12 can also be represented with the InChI string InChI=1S\/C22H22Cl2N2O2\/c1-3-5-12-26(4-2)22(28)21(27)19-17-13-16(24)10-11-18(17)25-20(19)14-6-8-15(23)9-7-14\/h6-11,13,25H,3-5,12H2,1-2H3."} {"text":"The molecule with the canonical SMILES CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1 can also be represented with the InChI string InChI=1S\/C22H26BrClN4O2\/c1-26(15-21(29)25-20-5-3-2-4-19(20)23)22(30)16-28-12-10-27(11-13-28)14-17-6-8-18(24)9-7-17\/h2-9H,10-16H2,1H3,(H,25,29)."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_5-4.jsonl": "{"text":"User: Can you tell me the canonical SMILES of the molecule with the InChI string InChI=1S\/C17H21N3O2S\/c1-11(2)15(17(22)20-6-3-4-7-20)19-16(21)12-9-14-13(18-10-12)5-8-23-14\/h5,8-11,15H,3-4,6-7H2,1-2H3,(H,19,21)?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the InChI string 
InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1?\nAssistant: Yes, this molecule has a canonical SMILES of CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_4-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the canonical SMILES Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C16H18N2O4S\/c1-11-15(8-9-22-11)16(19)17-10-12-2-6-14(7-3-12)23(20,21)18-13-4-5-13\/h2-3,6-9,13,18H,4-5,10H2,1H3,(H,17,19)."} {"text":"User: Can you create the InChI string of the molecule with the canonical SMILES Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C16H15N3O2S\/c1-11-12(6-9-21-11)16(20)18-8-5-15-19-14(10-22-15)13-4-2-3-7-17-13\/h2-4,6-7,9-10H,5,8H2,1H3,(H,18,20)."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_4-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C15H14N4OS2\/c1-8-7-17-15(22-8)19-13(20)10-5-12(9-3-4-9)18-14(21-2)11(10)6-16\/h5,7,9H,3-4H2,1-2H3,(H,17,19,20) can also be represented with the canonical SMILES CSc1nc(C2CC2)cc(C(=O)Nc2ncc(C)s2)c1C#N."} {"text":"The molecule with the InChI string representation of InChI=1S\/C17H22FNO4\/c1-11(2)15(17(21)23-10-14-4-3-9-22-14)19-16(20)12-5-7-13(18)8-6-12\/h5-8,11,14-15H,3-4,9-10H2,1-2H3,(H,19,20) can also be represented with the canonical SMILES CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1."}", 
"/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_5-1.jsonl": "{"text":"The molecule with the canonical SMILES c1ccc(Cn2nnnc2SCc2nc(C3CC3)n[nH]2)cc1 can also be represented with the InChI string representation InChI=1S\/C14H15N7S\/c1-2-4-10(5-3-1)8-21-14(18-19-20-21)22-9-12-15-13(17-16-12)11-6-7-11\/h1-5,11H,6-9H2,(H,15,16,17)."} {"text":"The molecule with the canonical SMILES representation of Nc1c2c([nH+]c3ccccc13)CCCC2 can also be represented with the InChI string representation InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_2-1.jsonl": "{"text":"The molecule with the canonical SMILES representation of NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1 can also be represented with the InChI string InChI=1S\/C16H12FN3O\/c17-12-3-1-11(2-4-12)15-13(16(18)21)9-14(20-15)10-5-7-19-8-6-10\/h1-9,20H,(H2,18,21)."} {"text":"The molecule with the canonical SMILES COc1ccc(F)cc1C(=O)C1CCCN(Cc2cnc(N3CCCC3)s2)C1 can also be represented with the InChI string InChI=1S\/C21H26FN3O2S\/c1-27-19-7-6-16(22)11-18(19)20(26)15-5-4-8-24(13-15)14-17-12-23-21(28-17)25-9-2-3-10-25\/h6-7,11-12,15H,2-5,8-10,13-14H2,1H3."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_0-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1 can also be represented with the canonical SMILES Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1."} {"text":"The molecule with the InChI string representation of InChI=1S\/C18H16N2O3S\/c1-22-13-9-7-12(8-10-13)16-11-15(20-23-16)18(21)19-14-5-3-4-6-17(14)24-2\/h3-11H,1-2H3,(H,19,21) can also be represented with the canonical SMILES representation COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1."}", 
"/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_0-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the canonical SMILES.\nMolecule canonical SMILES: Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the InChI string from the canonical SMILES.\ncanonical SMILES: COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C18H16N2O3S\/c1-22-13-9-7-12(8-10-13)16-11-15(20-23-16)18(21)19-14-5-3-4-6-17(14)24-2\/h3-11H,1-2H3,(H,19,21)"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_1-4.jsonl": "{"text":"User: Can you generate the canonical SMILES of the molecule with the InChI string InChI=1S\/C22H26N2O4\/c1-4-21(25)24-17-9-7-6-8-16(17)13-18(24)22(26)23-14-15-10-11-19(28-5-2)20(12-15)27-3\/h6-12,18H,4-5,13-14H2,1-3H3,(H,23,26)?\nAssistant: Of course, this molecule has a canonical SMILES of CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC."} {"text":"User: Can you generate the canonical SMILES of the molecule with the InChI string InChI=1S\/C17H14ClN3O3S2\/c1-11-16(26(23,24)10-12-6-2-3-7-13(12)18)25-17(20-11)21-15(22)14-8-4-5-9-19-14\/h2-9H,10H2,1H3,(H,20,21,22)?\nAssistant: Of course, this molecule has a canonical SMILES of Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_2-5.jsonl": "{"text":"User: Can you create the 
InChI string of the molecule with the canonical SMILES Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C20H22FN.ClH\/c21-20-6-5-17-11-15(12-19(17)13-20)7-9-22-10-8-16-3-1-2-4-18(16)14-22;\/h1-6,13,15H,7-12,14H2;1H."} {"text":"User: Can you tell me the InChI string of the molecule with the canonical SMILES Cc1ccc(C=NCc2ccncc2)c(O)c1?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C14H14N2O\/c1-11-2-3-13(14(17)8-11)10-16-9-12-4-6-15-7-5-12\/h2-8,10,17H,9H2,1H3."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_1-1.jsonl": "{"text":"The molecule with the canonical SMILES CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC can also be represented with the InChI string representation InChI=1S\/C22H26N2O4\/c1-4-21(25)24-17-9-7-6-8-16(17)13-18(24)22(26)23-14-15-10-11-19(28-5-2)20(12-15)27-3\/h6-12,18H,4-5,13-14H2,1-3H3,(H,23,26)."} {"text":"The molecule with the canonical SMILES Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl can also be represented with the InChI string representation InChI=1S\/C17H14ClN3O3S2\/c1-11-16(26(23,24)10-12-6-2-3-7-13(12)18)25-17(20-11)21-15(22)14-8-4-5-9-19-14\/h2-9H,10H2,1H3,(H,20,21,22)."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_5-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C14H15N7S\/c1-2-4-10(5-3-1)8-21-14(18-19-20-21)22-9-12-15-13(17-16-12)11-6-7-11\/h1-5,11H,6-9H2,(H,15,16,17)\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: c1ccc(Cn2nnnc2SCc2nc(C3CC3)n[nH]2)cc1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the InChI string.\nMolecule InChI string: 
InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Nc1c2c([nH+]c3ccccc13)CCCC2"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_5-1.jsonl": "{"text":"The molecule with the canonical SMILES CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1 can also be represented with the InChI string representation InChI=1S\/C17H21N3O2S\/c1-11(2)15(17(22)20-6-3-4-7-20)19-16(21)12-9-14-13(18-10-12)5-8-23-14\/h5,8-11,15H,3-4,6-7H2,1-2H3,(H,19,21)."} {"text":"The molecule with the canonical SMILES representation of CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1 can also be represented with the InChI string representation InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_4-1.jsonl": "{"text":"The molecule with the canonical SMILES O=C(NCc1ccc(S(=O)(=O)NC2CC2)cc1)C1CC=CCC1 can also be represented with the InChI string representation InChI=1S\/C17H22N2O3S\/c20-17(14-4-2-1-3-5-14)18-12-13-6-10-16(11-7-13)23(21,22)19-15-8-9-15\/h1-2,6-7,10-11,14-15,19H,3-5,8-9,12H2,(H,18,20)."} {"text":"The molecule with the canonical SMILES CCOc1cc(F)ccc1NCc1ccc(-n2cncn2)cc1 can also be represented with the InChI string representation InChI=1S\/C17H17FN4O\/c1-2-23-17-9-14(18)5-8-16(17)20-10-13-3-6-15(7-4-13)22-12-19-11-21-22\/h3-9,11-12,20H,2,10H2,1H3."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_3-3.jsonl": "{"text":"Task: Please 
create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the canonical SMILES.\ncanonical SMILES: CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C24H25BrClN3O3\/c1-5-29-16(3)24(15(2)28-29)27-23(30)11-7-17-6-9-21(31-4)18(12-17)14-32-22-10-8-19(26)13-20(22)25\/h6-13H,5,14H2,1-4H3,(H,27,30)\/b11-7+"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the canonical SMILES.\ncanonical SMILES: CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C16H16FN3O2S\/c1-2-20(9-3-8-18)15(21)11-23-16-19-10-14(22-16)12-4-6-13(17)7-5-12\/h4-7,10H,2-3,9,11H2,1H3"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_3-4.jsonl": "{"text":"User: Can you create the canonical SMILES of the molecule with the InChI string InChI=1S\/C25H22FN5O3S\/c1-31-8-7-19-16(12-31)9-15(10-27)24(30-19)35-13-20-22(25(32)33-2)21(18(11-28)23(29)34-20)14-3-5-17(26)6-4-14\/h3-6,9,21H,7-8,12-13,29H2,1-2H3?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the InChI string InChI=1S\/C16H16N2O4\/c19-14-10-12(11-4-1-2-5-13(11)17-14)16(21)22-9-8-18-7-3-6-15(18)20\/h1-2,4-5,10H,3,6-9H2,(H,17,19)?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_1-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the 
description.\nDescription: Generate the InChI string from the canonical SMILES.\ncanonical SMILES: COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C19H18N2O2S\/c1-13-20-18(12-24-13)15-6-8-16(9-7-15)21-19(22)11-14-4-3-5-17(10-14)23-2\/h3-10,12H,11H2,1-2H3,(H,21,22)"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the canonical SMILES.\nMolecule canonical SMILES: COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C11H10N4O3\/c1-18-5-2-3-7-6(4-5)8-9(13-7)10(16)15(12)11(17)14-8\/h2-4,13H,12H2,1H3,(H,14,17)"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_0-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C22H23N3O4\/c1-22(2,3)24-21(27)19(11-7-10-16-8-5-4-6-9-16)23-20(26)17-12-14-18(15-13-17)25(28)29\/h4-15H,1-3H3,(H,23,26)(H,24,27)\/b10-7+,19-11+\nConstraint: Even if you are uncertain, you must answer with a 
representation without using any other words.\nResult: CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_5-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C14H15N7S\/c1-2-4-10(5-3-1)8-21-14(18-19-20-21)22-9-12-15-13(17-16-12)11-6-7-11\/h1-5,11H,6-9H2,(H,15,16,17) can also be represented with the canonical SMILES c1ccc(Cn2nnnc2SCc2nc(C3CC3)n[nH]2)cc1."} {"text":"The molecule with the InChI string representation of InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1 can also be represented with the canonical SMILES representation Nc1c2c([nH+]c3ccccc13)CCCC2."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_0-1.jsonl": "{"text":"The molecule with the canonical SMILES representation of CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F can also be represented with the InChI string representation InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1."} {"text":"The molecule with the canonical SMILES representation of CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1 can also be represented with the InChI string InChI=1S\/C22H23N3O4\/c1-22(2,3)24-21(27)19(11-7-10-16-8-5-4-6-9-16)23-20(26)17-12-14-18(15-13-17)25(28)29\/h4-15H,1-3H3,(H,23,26)(H,24,27)\/b10-7+,19-11+."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_2-4.jsonl": "{"text":"User: Can you tell me the canonical SMILES of the molecule with the InChI string InChI=1S\/C22H22Cl2N2O2\/c1-3-5-12-26(4-2)22(28)21(27)19-17-13-16(24)10-11-18(17)25-20(19)14-6-8-15(23)9-7-14\/h6-11,13,25H,3-5,12H2,1-2H3?\nAssistant: Yes, this molecule has a canonical SMILES of CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12."} 
{"text":"User: Can you create the canonical SMILES of the molecule with the InChI string InChI=1S\/C22H26BrClN4O2\/c1-26(15-21(29)25-20-5-3-2-4-19(20)23)22(30)16-28-12-10-27(11-13-28)14-17-6-8-18(24)9-7-17\/h2-9H,10-16H2,1H3,(H,25,29)?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_2-1.jsonl": "{"text":"The molecule with the canonical SMILES representation of Cl.Fc1ccc2c(c1)CC(CCN1CCc3ccccc3C1)C2 can also be represented with the InChI string representation InChI=1S\/C20H22FN.ClH\/c21-20-6-5-17-11-15(12-19(17)13-20)7-9-22-10-8-16-3-1-2-4-18(16)14-22;\/h1-6,13,15H,7-12,14H2;1H."} {"text":"The molecule with the canonical SMILES representation of Cc1ccc(C=NCc2ccncc2)c(O)c1 can also be represented with the InChI string representation InChI=1S\/C14H14N2O\/c1-11-2-3-13(14(17)8-11)10-16-9-12-4-6-15-7-5-12\/h2-8,10,17H,9H2,1H3."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_1-1.jsonl": "{"text":"The molecule with the canonical SMILES COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1 can also be represented with the InChI string InChI=1S\/C19H18N2O2S\/c1-13-20-18(12-24-13)15-6-8-16(9-7-15)21-19(22)11-14-4-3-5-17(10-14)23-2\/h3-10,12H,11H2,1-2H3,(H,21,22)."} {"text":"The molecule with the canonical SMILES representation of COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1 can also be represented with the InChI string InChI=1S\/C11H10N4O3\/c1-18-5-2-3-7-6(4-5)8-9(13-7)10(16)15(12)11(17)14-8\/h2-4,13H,12H2,1H3,(H,14,17)."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_3-5.jsonl": "{"text":"User: Can you create the InChI string of the molecule with the canonical SMILES CCCc1cc(Nc2ccc(C)cc2)n2ncnc2n1?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C15H17N5\/c1-3-4-13-9-14(20-15(19-13)16-10-17-20)18-12-7-5-11(2)6-8-12\/h5-10,18H,3-4H2,1-2H3."} {"text":"User: Can you tell me the InChI 
string of the molecule with the canonical SMILES CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C15H18N2O4\/c1-11(18)16-13-5-2-4-12(10-13)15(20)21-9-8-17-7-3-6-14(17)19\/h2,4-5,10H,3,6-9H2,1H3,(H,16,18)."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_3-1.jsonl": "{"text":"The molecule with the canonical SMILES CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C can also be represented with the InChI string InChI=1S\/C24H25BrClN3O3\/c1-5-29-16(3)24(15(2)28-29)27-23(30)11-7-17-6-9-21(31-4)18(12-17)14-32-22-10-8-19(26)13-20(22)25\/h6-13H,5,14H2,1-4H3,(H,27,30)\/b11-7+."} {"text":"The molecule with the canonical SMILES CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1 can also be represented with the InChI string InChI=1S\/C16H16FN3O2S\/c1-2-20(9-3-8-18)15(21)11-23-16-19-10-14(22-16)12-4-6-13(17)7-5-12\/h4-7,10H,2-3,9,11H2,1H3."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_0-5.jsonl": "{"text":"User: Can you generate the InChI string of the molecule with the canonical SMILES CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1."} {"text":"User: Can you generate the InChI string of the molecule with the canonical SMILES CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C22H23N3O4\/c1-22(2,3)24-21(27)19(11-7-10-16-8-5-4-6-9-16)23-20(26)17-12-14-18(15-13-17)25(28)29\/h4-15H,1-3H3,(H,23,26)(H,24,27)\/b10-7+,19-11+."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_0-4.jsonl": "{"text":"User: Can you generate the canonical SMILES of the molecule with the InChI string 
InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1?\nAssistant: Yes, this molecule has a canonical SMILES of CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F."} {"text":"User: Can you generate the canonical SMILES of the molecule with the InChI string InChI=1S\/C22H23N3O4\/c1-22(2,3)24-21(27)19(11-7-10-16-8-5-4-6-9-16)23-20(26)17-12-14-18(15-13-17)25(28)29\/h4-15H,1-3H3,(H,23,26)(H,24,27)\/b10-7+,19-11+?\nAssistant: Yes, this molecule has a canonical SMILES of CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_1-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the InChI string from the canonical SMILES.\nMolecule canonical SMILES: CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C22H26N2O4\/c1-4-21(25)24-17-9-7-6-8-16(17)13-18(24)22(26)23-14-15-10-11-19(28-5-2)20(12-15)27-3\/h6-12,18H,4-5,13-14H2,1-3H3,(H,23,26)"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the canonical SMILES.\ncanonical SMILES: Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C17H14ClN3O3S2\/c1-11-16(26(23,24)10-12-6-2-3-7-13(12)18)25-17(20-11)21-15(22)14-8-4-5-9-19-14\/h2-9H,10H2,1H3,(H,20,21,22)"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_5-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: 
Create the InChI string from the canonical SMILES.\nMolecule canonical SMILES: CC(C)C(NC(=O)c1cnc2ccsc2c1)C(=O)N1CCCC1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C17H21N3O2S\/c1-11(2)15(17(22)20-6-3-4-7-20)19-16(21)12-9-14-13(18-10-12)5-8-23-14\/h5,8-11,15H,3-4,6-7H2,1-2H3,(H,19,21)"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the canonical SMILES.\nMolecule canonical SMILES: CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C49H66N10O10S2\/c1-28(61)39(25-60)56-48(68)41-27-71-70-26-40(57-43(63)34(51)21-30-13-5-3-6-14-30)47(67)54-37(22-31-15-7-4-8-16-31)45(65)55-38(23-32-24-52-35-18-10-9-17-33(32)35)46(66)53-36(19-11-12-20-50)44(64)59-42(29(2)62)49(69)58-41\/h3-10,13-18,24,28-29,34,36-42,52,60-62H,11-12,19-23,25-27,50-51H2,1-2H3,(H,53,66)(H,54,67)(H,55,65)(H,56,68)(H,57,63)(H,58,69)(H,59,64)\/p+1"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_0-5.jsonl": "{"text":"User: Can you generate the InChI string of the molecule with the canonical SMILES Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1."} {"text":"User: Can you create the InChI string of the molecule with the canonical SMILES COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1?\nAssistant: Yes, this molecule has a InChI string of 
InChI=1S\/C18H16N2O3S\/c1-22-13-9-7-12(8-10-13)16-11-15(20-23-16)18(21)19-14-5-3-4-6-17(14)24-2\/h3-11H,1-2H3,(H,19,21)."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_4-5.jsonl": "{"text":"User: Can you generate the InChI string of the molecule with the canonical SMILES O=C(NCc1ccc(S(=O)(=O)NC2CC2)cc1)C1CC=CCC1?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C17H22N2O3S\/c20-17(14-4-2-1-3-5-14)18-12-13-6-10-16(11-7-13)23(21,22)19-15-8-9-15\/h1-2,6-7,10-11,14-15,19H,3-5,8-9,12H2,(H,18,20)."} {"text":"User: Can you generate the InChI string of the molecule with the canonical SMILES CCOc1cc(F)ccc1NCc1ccc(-n2cncn2)cc1?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C17H17FN4O\/c1-2-23-17-9-14(18)5-8-16(17)20-10-13-3-6-15(7-4-13)22-12-19-11-21-22\/h3-9,11-12,20H,2,10H2,1H3."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_1-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C22H26N2O4\/c1-4-21(25)24-17-9-7-6-8-16(17)13-18(24)22(26)23-14-15-10-11-19(28-5-2)20(12-15)27-3\/h6-12,18H,4-5,13-14H2,1-3H3,(H,23,26) can also be represented with the canonical SMILES representation CCOc1ccc(CNC(=O)C2Cc3ccccc3N2C(=O)CC)cc1OC."} {"text":"The molecule with the InChI string representation of InChI=1S\/C17H14ClN3O3S2\/c1-11-16(26(23,24)10-12-6-2-3-7-13(12)18)25-17(20-11)21-15(22)14-8-4-5-9-19-14\/h2-9H,10H2,1H3,(H,20,21,22) can also be represented with the canonical SMILES representation Cc1nc(NC(=O)c2ccccn2)sc1S(=O)(=O)Cc1ccccc1Cl."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_0-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the canonical SMILES from the InChI string.\nInChI string: 
InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C18H16N2O3S\/c1-22-13-9-7-12(8-10-13)16-11-15(20-23-16)18(21)19-14-5-3-4-6-17(14)24-2\/h3-11H,1-2H3,(H,19,21)\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_4-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the InChI string.\nInChI string: InChI=1S\/C17H22N2O3S\/c20-17(14-4-2-1-3-5-14)18-12-13-6-10-16(11-7-13)23(21,22)19-15-8-9-15\/h1-2,6-7,10-11,14-15,19H,3-5,8-9,12H2,(H,18,20)\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: O=C(NCc1ccc(S(=O)(=O)NC2CC2)cc1)C1CC=CCC1"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the InChI string.\nInChI string: InChI=1S\/C17H17FN4O\/c1-2-23-17-9-14(18)5-8-16(17)20-10-13-3-6-15(7-4-13)22-12-19-11-21-22\/h3-9,11-12,20H,2,10H2,1H3\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCOc1cc(F)ccc1NCc1ccc(-n2cncn2)cc1"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_4-1.jsonl": "{"text":"The molecule with the canonical SMILES representation of 
CSc1nc(C2CC2)cc(C(=O)Nc2ncc(C)s2)c1C#N can also be represented with the InChI string representation InChI=1S\/C15H14N4OS2\/c1-8-7-17-15(22-8)19-13(20)10-5-12(9-3-4-9)18-14(21-2)11(10)6-16\/h5,7,9H,3-4H2,1-2H3,(H,17,19,20)."} {"text":"The molecule with the canonical SMILES CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1 can also be represented with the InChI string representation InChI=1S\/C17H22FNO4\/c1-11(2)15(17(21)23-10-14-4-3-9-22-14)19-16(20)12-5-7-13(18)8-6-12\/h5-8,11,14-15H,3-4,9-10H2,1-2H3,(H,19,20)."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_5-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the InChI string from the canonical SMILES.\ncanonical SMILES: Cc1ccc(N)cc1NC(=O)CSc1ccccc1F\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C15H15FN2OS\/c1-10-6-7-11(17)8-13(10)18-15(19)9-20-14-5-3-2-4-12(14)16\/h2-8H,9,17H2,1H3,(H,18,19)"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the canonical SMILES.\ncanonical SMILES: CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8-"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_2-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the InChI string.\nInChI string: InChI=1S\/C16H12FN3O\/c17-12-3-1-11(2-4-12)15-13(16(18)21)9-14(20-15)10-5-7-19-8-6-10\/h1-9,20H,(H2,18,21)\nConstraint: Even if you are not sure, you must answer with a representation 
without using any additional words.\nResult: NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the InChI string.\nInChI string: InChI=1S\/C21H26FN3O2S\/c1-27-19-7-6-16(22)11-18(19)20(26)15-5-4-8-24(13-15)14-17-12-23-21(28-17)25-9-2-3-10-25\/h6-7,11-12,15H,2-5,8-10,13-14H2,1H3\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: COc1ccc(F)cc1C(=O)C1CCCN(Cc2cnc(N3CCCC3)s2)C1"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_1-1.jsonl": "{"text":"The molecule with the canonical SMILES COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC can also be represented with the InChI string InChI=1S\/C17H16FNO6S\/c1-24-14-9-11(3-8-16(20)21)10-15(17(14)25-2)26(22,23)19-13-6-4-12(18)5-7-13\/h3-10,19H,1-2H3,(H,20,21)\/b8-3+."} {"text":"The molecule with the canonical SMILES representation of Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C can also be represented with the InChI string representation InChI=1S\/C15H14FNO4S\/c1-9-3-8-13(14(10(9)2)15(18)19)17-22(20,21)12-6-4-11(16)5-7-12\/h3-8,17H,1-2H3,(H,18,19)."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_0-1.jsonl": "{"text":"The molecule with the canonical SMILES Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1 can also be represented with the InChI string representation InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1."} {"text":"The molecule with the canonical SMILES COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1 can also be represented with the InChI string InChI=1S\/C18H16N2O3S\/c1-22-13-9-7-12(8-10-13)16-11-15(20-23-16)18(21)19-14-5-3-4-6-17(14)24-2\/h3-11H,1-2H3,(H,19,21)."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_3-4.jsonl": 
"{"text":"User: Can you tell me the canonical SMILES of the molecule with the InChI string InChI=1S\/C24H25BrClN3O3\/c1-5-29-16(3)24(15(2)28-29)27-23(30)11-7-17-6-9-21(31-4)18(12-17)14-32-22-10-8-19(26)13-20(22)25\/h6-13H,5,14H2,1-4H3,(H,27,30)\/b11-7+?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of CCn1nc(C)c(NC(=O)\/C=C\/c2ccc(OC)c(COc3ccc(Cl)cc3Br)c2)c1C."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the InChI string InChI=1S\/C16H16FN3O2S\/c1-2-20(9-3-8-18)15(21)11-23-16-19-10-14(22-16)12-4-6-13(17)7-5-12\/h4-7,10H,2-3,9,11H2,1H3?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of CCN(CCC#N)C(=O)CSc1ncc(-c2ccc(F)cc2)o1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_0-4.jsonl": "{"text":"User: Can you generate the canonical SMILES of the molecule with the InChI string InChI=1S\/C27H33N3O2\/c1-17-7-5-6-8-23(17)19-9-10-24-20(14-19)15-21(25(28)30-24)13-18(2)26(31)29-22-11-12-32-27(3,4)16-22\/h5-10,14-15,18,22H,11-13,16H2,1-4H3,(H2,28,30)(H,29,31)\/t18-,22-\/m1\/s1?\nAssistant: Sure, this molecule has a canonical SMILES of Cc1ccccc1-c1ccc2nc(N)c(C[C@@H](C)C(=O)N[C@@H]3CCOC(C)(C)C3)cc2c1."} {"text":"User: Can you create the canonical SMILES of the molecule with the InChI string InChI=1S\/C18H16N2O3S\/c1-22-13-9-7-12(8-10-13)16-11-15(20-23-16)18(21)19-14-5-3-4-6-17(14)24-2\/h3-11H,1-2H3,(H,19,21)?\nAssistant: Sure, this molecule has a canonical SMILES of COc1ccc(-c2cc(C(=O)Nc3ccccc3SC)no2)cc1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_1-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C19H18N2O2S\/c1-13-20-18(12-24-13)15-6-8-16(9-7-15)21-19(22)11-14-4-3-5-17(10-14)23-2\/h3-10,12H,11H2,1-2H3,(H,21,22) can also be represented with the canonical SMILES COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1."} {"text":"The molecule with the InChI string representation of 
InChI=1S\/C11H10N4O3\/c1-18-5-2-3-7-6(4-5)8-9(13-7)10(16)15(12)11(17)14-8\/h2-4,13H,12H2,1H3,(H,14,17) can also be represented with the canonical SMILES representation COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_4-5.jsonl": "{"text":"User: Can you generate the InChI string of the molecule with the canonical SMILES CSc1nc(C2CC2)cc(C(=O)Nc2ncc(C)s2)c1C#N?\nAssistant: Of course, this molecule has a InChI string of InChI=1S\/C15H14N4OS2\/c1-8-7-17-15(22-8)19-13(20)10-5-12(9-3-4-9)18-14(21-2)11(10)6-16\/h5,7,9H,3-4H2,1-2H3,(H,17,19,20)."} {"text":"User: Can you tell me the InChI string of the molecule with the canonical SMILES CC(C)C(NC(=O)c1ccc(F)cc1)C(=O)OCC1CCCO1?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C17H22FNO4\/c1-11(2)15(17(21)23-10-14-4-3-9-22-14)19-16(20)12-5-7-13(18)8-6-12\/h5-8,11,14-15H,3-4,9-10H2,1-2H3,(H,19,20)."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_4-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the canonical SMILES.\nMolecule canonical SMILES: O=C(NCc1ccc(S(=O)(=O)NC2CC2)cc1)C1CC=CCC1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C17H22N2O3S\/c20-17(14-4-2-1-3-5-14)18-12-13-6-10-16(11-7-13)23(21,22)19-15-8-9-15\/h1-2,6-7,10-11,14-15,19H,3-5,8-9,12H2,(H,18,20)"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the InChI string from the canonical SMILES.\nMolecule canonical SMILES: CCOc1cc(F)ccc1NCc1ccc(-n2cncn2)cc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C17H17FN4O\/c1-2-23-17-9-14(18)5-8-16(17)20-10-13-3-6-15(7-4-13)22-12-19-11-21-22\/h3-9,11-12,20H,2,10H2,1H3"}", 
"/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_4-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the canonical SMILES.\ncanonical SMILES: Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C16H18N2O4S\/c1-11-15(8-9-22-11)16(19)17-10-12-2-6-14(7-3-12)23(20,21)18-13-4-5-13\/h2-3,6-9,13,18H,4-5,10H2,1H3,(H,17,19)"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the InChI string from the canonical SMILES.\nMolecule canonical SMILES: Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C16H15N3O2S\/c1-11-12(6-9-21-11)16(20)18-8-5-15-19-14(10-22-15)13-4-2-3-7-17-13\/h2-4,6-7,9-10H,5,8H2,1H3,(H,18,20)"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_3-1.jsonl": "{"text":"The molecule with the canonical SMILES representation of COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1 can also be represented with the InChI string InChI=1S\/C25H22FN5O3S\/c1-31-8-7-19-16(12-31)9-15(10-27)24(30-19)35-13-20-22(25(32)33-2)21(18(11-28)23(29)34-20)14-3-5-17(26)6-4-14\/h3-6,9,21H,7-8,12-13,29H2,1-2H3."} {"text":"The molecule with the canonical SMILES O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12 can also be represented with the InChI string InChI=1S\/C16H16N2O4\/c19-14-10-12(11-4-1-2-5-13(11)17-14)16(21)22-9-8-18-7-3-6-15(18)20\/h1-2,4-5,10H,3,6-9H2,(H,17,19)."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_2-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the canonical SMILES.\nMolecule canonical SMILES: 
CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: InChI=1S\/C22H22Cl2N2O2\/c1-3-5-12-26(4-2)22(28)21(27)19-17-13-16(24)10-11-18(17)25-20(19)14-6-8-15(23)9-7-14\/h6-11,13,25H,3-5,12H2,1-2H3"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the InChI string from the canonical SMILES.\ncanonical SMILES: CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C22H26BrClN4O2\/c1-26(15-21(29)25-20-5-3-2-4-19(20)23)22(30)16-28-12-10-27(11-13-28)14-17-6-8-18(24)9-7-17\/h2-9H,10-16H2,1H3,(H,25,29)"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_3-1.jsonl": "{"text":"The molecule with the canonical SMILES CCCc1cc(Nc2ccc(C)cc2)n2ncnc2n1 can also be represented with the InChI string InChI=1S\/C15H17N5\/c1-3-4-13-9-14(20-15(19-13)16-10-17-20)18-12-7-5-11(2)6-8-12\/h5-10,18H,3-4H2,1-2H3."} {"text":"The molecule with the canonical SMILES CC(=O)Nc1cccc(C(=O)OCCN2CCCC2=O)c1 can also be represented with the InChI string InChI=1S\/C15H18N2O4\/c1-11(18)16-13-5-2-4-12(10-13)15(20)21-9-8-17-7-3-6-14(17)19\/h2,4-5,10H,3,6-9H2,1H3,(H,16,18)."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_0-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the canonical SMILES.\ncanonical SMILES: CCOC[C@@H](Oc1cc(C[C@@H]2CS(=O)(=O)C[C@H]([NH2+]Cc3cccc(C(C)(C)C)c3)[C@H]2O)cc(F)c1N)C(F)(F)F\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: 
InChI=1S\/C28H38F4N2O5S\/c1-5-38-14-24(28(30,31)32)39-23-12-18(11-21(29)25(23)33)9-19-15-40(36,37)16-22(26(19)35)34-13-17-7-6-8-20(10-17)27(2,3)4\/h6-8,10-12,19,22,24,26,34-35H,5,9,13-16,33H2,1-4H3\/p+1\/t19-,22+,24-,26+\/m1\/s1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the InChI string from the canonical SMILES.\nMolecule canonical SMILES: CC(C)(C)NC(=O)\/C(=C\\C=C\\c1ccccc1)NC(=O)c1ccc([N+](=O)[O-])cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C22H23N3O4\/c1-22(2,3)24-21(27)19(11-7-10-16-8-5-4-6-9-16)23-20(26)17-12-14-18(15-13-17)25(28)29\/h4-15H,1-3H3,(H,23,26)(H,24,27)\/b10-7+,19-11+"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_1-5.jsonl": "{"text":"User: Can you generate the InChI string of the molecule with the canonical SMILES COc1cccc(CC(=O)Nc2ccc(-c3csc(C)n3)cc2)c1?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C19H18N2O2S\/c1-13-20-18(12-24-13)15-6-8-16(9-7-15)21-19(22)11-14-4-3-5-17(10-14)23-2\/h3-10,12H,11H2,1-2H3,(H,21,22)."} {"text":"User: Can you create the InChI string of the molecule with the canonical SMILES COc1ccc2[nH]c3c(=O)n(N)c(=O)[nH]c3c2c1?\nAssistant: Sure, this molecule has a InChI string of InChI=1S\/C11H10N4O3\/c1-18-5-2-3-7-6(4-5)8-9(13-7)10(16)15(12)11(17)14-8\/h2-4,13H,12H2,1H3,(H,14,17)."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_4-1.jsonl": "{"text":"The molecule with the canonical SMILES representation of Cc1occc1C(=O)NCc1ccc(S(=O)(=O)NC2CC2)cc1 can also be represented with the InChI string InChI=1S\/C16H18N2O4S\/c1-11-15(8-9-22-11)16(19)17-10-12-2-6-14(7-3-12)23(20,21)18-13-4-5-13\/h2-3,6-9,13,18H,4-5,10H2,1H3,(H,17,19)."} {"text":"The molecule with the canonical SMILES Cc1occc1C(=O)NCCc1nc(-c2ccccn2)cs1 can also be represented with the InChI string 
InChI=1S\/C16H15N3O2S\/c1-11-12(6-9-21-11)16(20)18-8-5-15-19-14(10-22-15)13-4-2-3-7-17-13\/h2-4,6-7,9-10H,5,8H2,1H3,(H,18,20)."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_5-4.jsonl": "{"text":"User: Can you tell me the canonical SMILES of the molecule with the InChI string InChI=1S\/C15H15FN2OS\/c1-10-6-7-11(17)8-13(10)18-15(19)9-20-14-5-3-2-4-12(14)16\/h2-8H,9,17H2,1H3,(H,18,19)?\nAssistant: Of course, this molecule has a canonical SMILES of Cc1ccc(N)cc1NC(=O)CSc1ccccc1F."} {"text":"User: Can you generate the canonical SMILES of the molecule with the InChI string InChI=1S\/C22H32O4\/c1-3-4-7-15(2)20(23)11-10-18-19-13-16(8-5-6-9-22(25)26)12-17(19)14-21(18)24\/h8,10-11,15,17-21,23-24H,5-7,9,12-14H2,1-2H3,(H,25,26)\/p-1\/b11-10+,16-8-?\nAssistant: Sure, this molecule has a canonical SMILES of CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/valid_2-5.jsonl": "{"text":"User: Can you generate the InChI string of the molecule with the canonical SMILES CCCCN(CC)C(=O)C(=O)c1c(-c2ccc(Cl)cc2)[nH]c2ccc(Cl)cc12?\nAssistant: Yes, this molecule has a InChI string of InChI=1S\/C22H22Cl2N2O2\/c1-3-5-12-26(4-2)22(28)21(27)19-17-13-16(24)10-11-18(17)25-20(19)14-6-8-15(23)9-7-14\/h6-11,13,25H,3-5,12H2,1-2H3."} {"text":"User: Can you tell me the InChI string of the molecule with the canonical SMILES CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1?\nAssistant: Yes, I'm happy to help, this molecule has a InChI string of InChI=1S\/C22H26BrClN4O2\/c1-26(15-21(29)25-20-5-3-2-4-19(20)23)22(30)16-28-12-10-27(11-13-28)14-17-6-8-18(24)9-7-17\/h2-9H,10-16H2,1H3,(H,25,29)."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_1-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the InChI string.\nMolecule InChI string: 
InChI=1S\/C17H16FNO6S\/c1-24-14-9-11(3-8-16(20)21)10-15(17(14)25-2)26(22,23)19-13-6-4-12(18)5-7-13\/h3-10,19H,1-2H3,(H,20,21)\/b8-3+\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: COc1cc(\/C=C\/C(=O)O)cc(S(=O)(=O)Nc2ccc(F)cc2)c1OC"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the canonical SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C15H14FNO4S\/c1-9-3-8-13(14(10(9)2)15(18)19)17-22(20,21)12-6-4-11(16)5-7-12\/h3-8,17H,1-2H3,(H,18,19)\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: Cc1ccc(NS(=O)(=O)c2ccc(F)cc2)c(C(=O)O)c1C"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_0-4.jsonl": "{"text":"User: Can you tell me the canonical SMILES of the molecule with the InChI string InChI=1S\/C26H31F7N2O3S\/c1-24(2,3)17-6-4-5-14(8-17)11-35-19-13-39(37)12-16(22(19)36)7-15-9-18(27)21(34)20(10-15)38-23(25(28,29)30)26(31,32)33\/h4-6,8-10,16,19,22-23,35-36H,7,11-13,34H2,1-3H3\/p+1\/t16-,19+,22+,39?\/m1\/s1?\nAssistant: Yes, I'm happy to help, this molecule has a canonical SMILES of CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1."} {"text":"User: Can you tell me the canonical SMILES of the molecule with the InChI string InChI=1S\/C22H18ClN3O2\/c1-28-19-9-7-15(23)12-18(19)21(26-16-5-3-10-24-13-16)17-8-6-14-4-2-11-25-20(14)22(17)27\/h2-13,21,26-27H,1H3?\nAssistant: Of course, this molecule has a canonical SMILES of COc1ccc(Cl)cc1C(Nc1cccnc1)c1ccc2cccnc2c1O."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_3-0.jsonl": "{"text":"The molecule with the InChI string representation of InChI=1S\/C25H22FN5O3S\/c1-31-8-7-19-16(12-31)9-15(10-27)24(30-19)35-13-20-22(25(32)33-2)21(18(11-28)23(29)34-20)14-3-5-17(26)6-4-14\/h3-6,9,21H,7-8,12-13,29H2,1-2H3 can also be represented with the canonical 
SMILES representation COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1."} {"text":"The molecule with the InChI string InChI=1S\/C16H16N2O4\/c19-14-10-12(11-4-1-2-5-13(11)17-14)16(21)22-9-8-18-7-3-6-15(18)20\/h1-2,4-5,10H,3,6-9H2,(H,17,19) can also be represented with the canonical SMILES O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12."}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/test_2-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the InChI string from the canonical SMILES.\nMolecule canonical SMILES: NC(=O)c1cc(-c2ccncc2)[nH]c1-c1ccc(F)cc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C16H12FN3O\/c17-12-3-1-11(2-4-12)15-13(16(18)21)9-14(20-15)10-5-7-19-8-6-10\/h1-9,20H,(H2,18,21)"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the InChI string from the canonical SMILES.\nMolecule canonical SMILES: COc1ccc(F)cc1C(=O)C1CCCN(Cc2cnc(N3CCCC3)s2)C1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: InChI=1S\/C21H26FN3O2S\/c1-27-19-7-6-16(22)11-18(19)20(26)15-5-4-8-24(13-15)14-17-12-23-21(28-17)25-9-2-3-10-25\/h6-7,11-12,15H,2-5,8-10,13-14H2,1H3"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_4-0.jsonl": "{"text":"The molecule with the InChI string InChI=1S\/C17H22N2O3S\/c20-17(14-4-2-1-3-5-14)18-12-13-6-10-16(11-7-13)23(21,22)19-15-8-9-15\/h1-2,6-7,10-11,14-15,19H,3-5,8-9,12H2,(H,18,20) can also be represented with the canonical SMILES O=C(NCc1ccc(S(=O)(=O)NC2CC2)cc1)C1CC=CCC1."} {"text":"The molecule with the InChI string InChI=1S\/C17H17FN4O\/c1-2-23-17-9-14(18)5-8-16(17)20-10-13-3-6-15(7-4-13)22-12-19-11-21-22\/h3-9,11-12,20H,2,10H2,1H3 can also be represented with the canonical SMILES CCOc1cc(F)ccc1NCc1ccc(-n2cncn2)cc1."}", 
"/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_3-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the canonical SMILES from the InChI string.\nInChI string: InChI=1S\/C25H22FN5O3S\/c1-31-8-7-19-16(12-31)9-15(10-27)24(30-19)35-13-20-22(25(32)33-2)21(18(11-28)23(29)34-20)14-3-5-17(26)6-4-14\/h3-6,9,21H,7-8,12-13,29H2,1-2H3\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the canonical SMILES from the InChI string.\nMolecule InChI string: InChI=1S\/C16H16N2O4\/c19-14-10-12(11-4-1-2-5-13(11)17-14)16(21)22-9-8-18-7-3-6-15(18)20\/h1-2,4-5,10H,3,6-9H2,(H,17,19)\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12"}", "/scratch/micpie/export/mol_repr_transl_canonical_inchi/train_3-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the InChI string from the canonical SMILES.\ncanonical SMILES: COC(=O)C1=C(CSc2nc3c(cc2C#N)CN(C)CC3)OC(N)=C(C#N)C1c1ccc(F)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: InChI=1S\/C25H22FN5O3S\/c1-31-8-7-19-16(12-31)9-15(10-27)24(30-19)35-13-20-22(25(32)33-2)21(18(11-28)23(29)34-20)14-3-5-17(26)6-4-14\/h3-6,9,21H,7-8,12-13,29H2,1-2H3"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the InChI string from the canonical SMILES.\ncanonical SMILES: O=C(OCCN1CCCC1=O)c1cc(=O)[nH]c2ccccc12\nConstraint: Even if you are uncertain, you must answer with a representation 
without using any additional words.\nResult: InChI=1S\/C16H16N2O4\/c19-14-10-12(11-4-1-2-5-13(11)17-14)16(21)22-9-8-18-7-3-6-15(18)20\/h1-2,4-5,10H,3,6-9H2,(H,17,19)"}", "/scratch/micpie/export/pampa_ncats/test_0-10.jsonl": "{"text":"User: Can you generate the canonical SMILES of a molecule that is permeable in the PAMPA assay?\nAssistant: Yes, I'm happy to help, here you go: CN(C)c1ccnc2sc(C(N)=O)c(N)c12"} {"text":"User: Can you give me the SMILES of a molecule that is not permeable in the PAMPA assay?\nAssistant: Of course, here you go: CC(C)C1=CC(=NN1)C(=O)N2CC[C@H](C2)NC(=O)C3CC3"}", "/scratch/micpie/export/pampa_ncats/valid_0-8.jsonl": "{"text":"User: Can you estimate if the molecule with the DeepSMILES COC=CC=CC=C6))CCNC=CC=O)NC6=S)))))N)))))))OC is permeable in the PAMPA assay?\nAssistant: No, this molecule is not permeable in the PAMPA assay."} {"text":"User: Can you figure out if the molecule with the SELFIES [C][C][=C][Branch1][#Branch2][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][Cl][C][O][C][=C][C][=N][N][Ring1][Branch1][C][=N][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=Branch1][C][=O][O] is permeable in the PAMPA assay?\nAssistant: No, this molecule is not permeable in the PAMPA assay."}", "/scratch/micpie/export/pampa_ncats/test_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are permeable in the PAMPA assay?\nConstraint: You must select none, one or more options from 1, 2, 3, 4, or 5 without using any other words.\nOptions:\n1) InChI=1S\/C19H20FN3O3\/c1-21-7-12-4-11(21)8-22(12)17-6-16-13(5-15(17)20)18(24)14(19(25)26)9-23(16)10-2-3-10\/h5-6,9-12H,2-4,7-8H2,1H3,(H,25,26)\/t11-,12-\/m0\/s1\n2) InChI=1S\/C10H12N4OS\/c1-14(2)5-3-4-13-10-6(5)7(11)8(16-10)9(12)15\/h3-4H,11H2,1-2H3,(H2,12,15)\n3) InChI=1S\/C21H22N4O3\/c1-28-18-11-9-17(10-12-18)25-21(27)22-19(23-25)16-8-5-13-24(14-16)20(26)15-6-3-2-4-7-15\/h2-4,6-7,9-12,16H,5,8,13-14H2,1H3,(H,22,23,27)\n4) 
InChI=1S\/C26H24F3N7O3S\/c27-26(28,29)39-20-9-5-6-17(14-20)15-22(37)31-21-12-11-18(33-34-21)7-1-2-10-24-35-36-25(40-24)32-23(38)16-19-8-3-4-13-30-19\/h3-6,8-9,11-14H,1-2,7,10,15-16H2,(H,31,34,37)(H,32,36,38)\n5) InChI=1S\/C21H29N5O2\/c1-12(2)16-14-10-27-21(3,4)9-13(14)15-17-18(28-20(15)25-16)19(24-11-23-17)22-7-8-26(5)6\/h11-12H,7-10H2,1-6H3,(H,22,23,24)\nAnswer: 2, 4, 5"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not permeable in the PAMPA assay?\nConstraint: You must select none, one or more options from A or B without using any other words.\nOptions:\n[A] N#Cc1ccc(S(=O)(=O)Nc2cnccc2C(=O)Nc2nc(-c3ccccc3)cs2)cc1\n[B] CC(C)c1cc(C(=O)N2CC[C@@H](NC(=O)C3CC3)C2)n[nH]1\nAnswer: B"}", "/scratch/micpie/export/pampa_ncats/train_0-8.jsonl": "{"text":"User: Can you figure out if the molecule with the SELFIES [C][O][C][=C][Branch1][#Branch2][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=C][C][=C][O][Ring1][Branch1] is permeable in the PAMPA assay?\nAssistant: Yes, this molecule is permeable in the PAMPA assay."} {"text":"User: Can you estimate if the molecule with the DeepSMILES CCC=CCN=CN6)NC=NC=CO5)C=CC=C6Cl))))))))))))C=CC=NN5)))Cl))))C=O)C6 is permeable in the PAMPA assay?\nAssistant: Yes, this molecule is permeable in the PAMPA assay."}", "/scratch/micpie/export/pampa_ncats/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is permeable in the PAMPA assay.\nInChI: InChI=1S\/C10H12N4OS\/c1-14(2)5-3-4-13-10-6(5)7(11)8(16-10)9(12)15\/h3-4H,11H2,1-2H3,(H2,12,15)\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is permeable in the PAMPA assay.\nMolecule SELFIES: 
[C][C][Branch1][C][C][C][=C][C][=Branch1][Branch1][=N][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C@H1][Branch1][Ring2][C][Ring1][Branch1][N][C][=Branch1][C][=O][C][C][C][Ring1][Ring1]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"}", "/scratch/micpie/export/pampa_ncats/valid_0-9.jsonl": "{"text":"User: Is the molecule with the DeepSMILES COC=CC=CC=C6))CCNC=CC=O)NC6=S)))))N)))))))OC permeable in the PAMPA assay?\nAssistant: No, it is not permeable in the PAMPA assay."} {"text":"User: Is the molecule with the SMILES CC1=C(C=CC(=C1)Cl)COC2=CC=NN2C3=NC=CC(=C3)C(=O)O permeable in the PAMPA assay?\nAssistant: No, it is not permeable in the PAMPA assay."}", "/scratch/micpie/export/pampa_ncats/test_0-1.jsonl": "{"text":"The molecule with the SMILES representation of CN(C)C1=C2C(=C(SC2=NC=C1)C(=O)N)N is permeating in the PAMPA assay."} {"text":"The molecule with the canonical SMILES representation of CC(C)c1cc(C(=O)N2CC[C@@H](NC(=O)C3CC3)C2)n[nH]1 is not permeating in the PAMPA assay."}", "/scratch/micpie/export/pampa_ncats/valid_0-0.jsonl": "{"text":"The molecule with the SMILES COC1=C(C=C(C=C1)CCN2C(=CC(=O)NC2=S)N)OC is not permeable in the PAMPA assay."} {"text":"The molecule with the SELFIES representation of [C][C][=C][Branch1][#Branch2][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][Cl][C][O][C][=C][C][=N][N][Ring1][Branch1][C][=N][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=Branch1][C][=O][O] is not permeable in the PAMPA assay."}", "/scratch/micpie/export/pampa_ncats/test_0-2.jsonl": "{"text":"Based on the InChI representation InChI=1S\/C10H12N4OS\/c1-14(2)5-3-4-13-10-6(5)7(11)8(16-10)9(12)15\/h3-4H,11H2,1-2H3,(H2,12,15), the molecule has permeability features."} {"text":"Based on the DeepSMILES CCC)C=CC=NN5))C=O)NCC[C@H]C5)NC=O)CCC3, the molecule has no permeability features."}", "/scratch/micpie/export/pampa_ncats/valid_0-10.jsonl": "{"text":"User: Can you give me the 
SMILES of a molecule that is not permeable in the PAMPA assay?\nAssistant: Yes, here you go: COC1=C(C=C(C=C1)CCN2C(=CC(=O)NC2=S)N)OC"} {"text":"User: Can you generate the SELFIES of a molecule that is not permeable in the PAMPA assay?\nAssistant: Yes, I'm happy to help, here you go: [C][C][=C][Branch1][#Branch2][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][Cl][C][O][C][=C][C][=N][N][Ring1][Branch1][C][=N][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=Branch1][C][=O][O]"}", "/scratch/micpie/export/pampa_ncats/train_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is permeable in the PAMPA assay.\nSMILES: COC1=C(C=C(C=C1)Cl)C(=O)NC2=CC=C(C=C2)NC(=O)C3=CC=CO3\nConstraint: Answer the question in a full sentence.\nResult: This molecule is permeable in the PAMPA assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is permeable in the PAMPA assay.\nSMILES: C1CC2=C(C(N=C(N2)NC3=NC4=C(O3)C=CC=C4Cl)C5=C(C=NN5)Cl)C(=O)C1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is permeable in the PAMPA assay."}", "/scratch/micpie/export/pampa_ncats/valid_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is permeable in the PAMPA assay.\nSMILES: COC1=C(C=C(C=C1)CCN2C(=CC(=O)NC2=S)N)OC\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not permeable in the PAMPA assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is permeable in the PAMPA assay.\nDeepSMILES: CC=CC=CC=C6)Cl))))COC=CC=NN5C=NC=CC=C6)C=O)O\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not permeable in the PAMPA assay."}", "/scratch/micpie/export/pampa_ncats/test_0-9.jsonl": "{"text":"User: Is the molecule with the canonical SMILES CN(C)c1ccnc2sc(C(N)=O)c(N)c12 permeable in the PAMPA 
assay?\nAssistant: Yes, it is permeable in the PAMPA assay."} {"text":"User: Is the molecule with the SMILES CC(C)C1=CC(=NN1)C(=O)N2CC[C@H](C2)NC(=O)C3CC3 permeable in the PAMPA assay?\nAssistant: No, it is not permeable in the PAMPA assay."}", "/scratch/micpie/export/pampa_ncats/test_0-0.jsonl": "{"text":"The molecule with the InChI representation of InChI=1S\/C10H12N4OS\/c1-14(2)5-3-4-13-10-6(5)7(11)8(16-10)9(12)15\/h3-4H,11H2,1-2H3,(H2,12,15) is permeable in the PAMPA assay."} {"text":"The molecule with the SELFIES [C][C][Branch1][C][C][C][=C][C][=Branch1][Branch1][=N][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C@H1][Branch1][Ring2][C][Ring1][Branch1][N][C][=Branch1][C][=O][C][C][C][Ring1][Ring1] is not permeable in the PAMPA assay."}", "/scratch/micpie/export/pampa_ncats/valid_0-7.jsonl": "{"text":"Task: Please create a molecule SELFIES based on the text description below.\nDescription: A molecule that is permeable in the PAMPA assay.\nResult: [C][O][C][=C][Branch2][Ring1][#Branch2][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][C][N][C][=Branch1][O][=C][C][=Branch1][C][=O][N][C][Ring1][#Branch1][=S][N][O][C]"} {"text":"Task: Please give me a InChI based on the description below.\nDescription: A molecule that is permeable in the PAMPA assay.\nResult: InChI=1S\/C17H14ClN3O3\/c1-11-8-14(18)3-2-13(11)10-24-16-5-7-20-21(16)15-9-12(17(22)23)4-6-19-15\/h2-9H,10H2,1H3,(H,22,23)"}", "/scratch/micpie/export/pampa_ncats/test_0-3.jsonl": "{"text":"The canonical SMILES CN(C)c1ccnc2sc(C(N)=O)c(N)c12 is from a molecule that is identified as permeable in the PAMPA assay."} {"text":"The SELFIES [C][C][Branch1][C][C][C][=C][C][=Branch1][Branch1][=N][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C@H1][Branch1][Ring2][C][Ring1][Branch1][N][C][=Branch1][C][=O][C][C][C][Ring1][Ring1] represents a molecule that is not identified as permeable in the PAMPA assay."}", "/scratch/micpie/export/pampa_ncats/valid_0-11.jsonl": "{"text":"User: I'm searching for the SELFIES of a 
molecule that is not permeable in the PAMPA assay?\nAssistant: This is a molecule that is not permeable in the PAMPA assay: [C][O][C][=C][Branch2][Ring1][#Branch2][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][C][N][C][=Branch1][O][=C][C][=Branch1][C][=O][N][C][Ring1][#Branch1][=S][N][O][C]"} {"text":"User: I'm looking for the InChI of a molecule that is not permeable in the PAMPA assay?\nAssistant: This is a molecule that is not permeable in the PAMPA assay: InChI=1S\/C17H14ClN3O3\/c1-11-8-14(18)3-2-13(11)10-24-16-5-7-20-21(16)15-9-12(17(22)23)4-6-19-15\/h2-9H,10H2,1H3,(H,22,23)"}", "/scratch/micpie/export/pampa_ncats/train_0-0.jsonl": "{"text":"The molecule with the SMILES representation of COC1=C(C=C(C=C1)Cl)C(=O)NC2=CC=C(C=C2)NC(=O)C3=CC=CO3 is permeable in the PAMPA assay."} {"text":"The molecule with the canonical SMILES O=C1CCCC2=C1C(c1[nH]ncc1Cl)N=C(Nc1nc3c(Cl)cccc3o1)N2 is permeable in the PAMPA assay."}", "/scratch/micpie/export/pampa_ncats/test_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is permeable in the PAMPA assay.\nMolecule SELFIES: [C][N][Branch1][C][C][C][=C][C][=Branch2][Ring1][C][=C][Branch1][#Branch2][S][C][Ring1][Branch1][=N][C][=C][Ring1][=Branch2][C][=Branch1][C][=O][N][N]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is permeable in the PAMPA assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is permeable in the PAMPA assay.\nMolecule SMILES: CC(C)C1=CC(=NN1)C(=O)N2CC[C@H](C2)NC(=O)C3CC3\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not permeable in the PAMPA assay."}", "/scratch/micpie/export/pampa_ncats/train_0-10.jsonl": "{"text":"User: Can you create the DeepSMILES of a molecule that is permeable in the PAMPA assay?\nAssistant: Sure, here you go: COC=CC=CC=C6))Cl)))C=O)NC=CC=CC=C6))NC=O)C=CC=CO5"} {"text":"User: Can you create the SMILES of a 
molecule that is permeable in the PAMPA assay?\nAssistant: Yes, here you go: C1CC2=C(C(N=C(N2)NC3=NC4=C(O3)C=CC=C4Cl)C5=C(C=NN5)Cl)C(=O)C1"}", "/scratch/micpie/export/pampa_ncats/train_0-3.jsonl": "{"text":"The DeepSMILES COC=CC=CC=C6))Cl)))C=O)NC=CC=CC=C6))NC=O)C=CC=CO5 is from a molecule that is identified as permeable in the PAMPA assay."} {"text":"The SMILES C1CC2=C(C(N=C(N2)NC3=NC4=C(O3)C=CC=C4Cl)C5=C(C=NN5)Cl)C(=O)C1 represents a molecule that is identified as permeable in the PAMPA assay."}", "/scratch/micpie/export/pampa_ncats/train_0-12.jsonl": "{"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should be permeable in the PAMPA assay.\nAssistant: Ok, this InChI is permeable in the PAMPA assay: InChI=1S\/C19H15ClN2O4\/c1-25-16-9-4-12(20)11-15(16)18(23)21-13-5-7-14(8-6-13)22-19(24)17-3-2-10-26-17\/h2-11H,1H3,(H,21,23)(H,22,24)"} {"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should be permeable in the PAMPA assay.\nAssistant: Ok, here you go, this SMILES is permeable in the PAMPA assay: C1CC2=C(C(N=C(N2)NC3=NC4=C(O3)C=CC=C4Cl)C5=C(C=NN5)Cl)C(=O)C1"}", "/scratch/micpie/export/pampa_ncats/test_0-13.jsonl": "{"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should be permeable in the PAMPA assay.\nAssistant: Ok, this SELFIES is permeable in the PAMPA assay: [C][N][Branch1][C][C][C][=C][C][=Branch2][Ring1][C][=C][Branch1][#Branch2][S][C][Ring1][Branch1][=N][C][=C][Ring1][=Branch2][C][=Branch1][C][=O][N][N]"} {"text":"User: I want to generate a SMILES.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should not be permeable in the PAMPA assay.\nAssistant: Ok, this SMILES is not permeable in the PAMPA assay: CC(C)C1=CC(=NN1)C(=O)N2CC[C@H](C2)NC(=O)C3CC3"}", "/scratch/micpie/export/pampa_ncats/valid_0-2.jsonl": "{"text":"Based on the canonical SMILES COc1ccc(CCn2c(N)cc(=O)[nH]c2=S)cc1OC, the molecule has no permeability characteristics."} {"text":"Based on the DeepSMILES representation CC=CC=CC=C6)Cl))))COC=CC=NN5C=NC=CC=C6)C=O)O, the molecule has no permeability properties."}", "/scratch/micpie/export/pampa_ncats/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES COc1ccc(Cl)cc1C(=O)Nc1ccc(NC(=O)c2ccco2)cc1 permeable in the PAMPA assay?\nConstraint: Even if you are not sure, you must pick either a or b without using any other words.\nOptions:\n[a] True\n[b] False\nAnswer: a"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of O=C1CCCC2=C1C(c1[nH]ncc1Cl)N=C(Nc1nc3c(Cl)cccc3o1)N2 permeable in the PAMPA assay?\nConstraint: Even if you are not sure, you must pick either a or b without using any other words.\nOptions:\na.) True\nb.) 
False\nAnswer: a"}", "/scratch/micpie/export/pampa_ncats/valid_0-1.jsonl": "{"text":"The molecule with the SELFIES representation of [C][O][C][=C][Branch2][Ring1][#Branch2][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][C][N][C][=Branch1][O][=C][C][=Branch1][C][=O][N][C][Ring1][#Branch1][=S][N][O][C] is not permeating in the PAMPA assay."} {"text":"The molecule with the SELFIES [C][C][=C][Branch1][#Branch2][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][Cl][C][O][C][=C][C][=N][N][Ring1][Branch1][C][=N][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=Branch1][C][=O][O] is not permeating in the PAMPA assay."}", "/scratch/micpie/export/pampa_ncats/valid_0-13.jsonl": "{"text":"User: I want to create a SELFIES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be permeable in the PAMPA assay.\nAssistant: Ok, this SELFIES is not permeable in the PAMPA assay: [C][O][C][=C][Branch2][Ring1][#Branch2][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][C][N][C][=Branch1][O][=C][C][=Branch1][C][=O][N][C][Ring1][#Branch1][=S][N][O][C]"} {"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be permeable in the PAMPA assay.\nAssistant: Got it, this canonical SMILES is not permeable in the PAMPA assay: Cc1cc(Cl)ccc1COc1ccnn1-c1cc(C(=O)O)ccn1"}", "/scratch/micpie/export/pampa_ncats/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is permeable in the PAMPA assay.\nMolecule InChI: InChI=1S\/C14H17N3O3S\/c1-19-10-4-3-9(7-11(10)20-2)5-6-17-12(15)8-13(18)16-14(17)21\/h3-4,7-8H,5-6,15H2,1-2H3,(H,16,18,21)\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is permeable in the PAMPA assay.\nSELFIES: [C][C][=C][Branch1][#Branch2][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][Cl][C][O][C][=C][C][=N][N][Ring1][Branch1][C][=N][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][=Branch1][C][=O][O]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/pampa_ncats/train_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are permeable in the PAMPA assay?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any additional words.\nOptions:\n1.) [C][C][C][N][C][C][=C][Branch1][Branch1][N][C][Ring1][=Branch1][N][Branch1][#Branch2][C][=Branch1][C][=S][N][C][Ring1][=Branch2][=O][C][C][C][=C][Branch1][#Branch2][C][=Branch1][=Branch1][=C][C][=C][Ring1][=Branch1][Cl][Cl]\n2.) [C][O][C][=C][Branch1][#Branch1][C][=C][C][=N][Ring1][=Branch1][C][=N][C][=Branch2][Ring1][S][=C][C][=C][N][Branch1][#Branch1][C][Ring1][Branch1][=C][Ring1][=Branch2][C][C][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][S+1][=Branch1][C][=O][Branch1][C][C][O-1][N][C][C][O][C][C][Ring1][=Branch1]\n3.) 
[C][O][C][=C][Branch1][#Branch2][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=C][C][=C][O][Ring1][Branch1]\n4.) [C][C][O][C][=C][Branch2][Ring2][#Branch1][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][=C][C][=C][Ring1][=Branch1][N][Ring1][=Branch2][C][C][N][C][C][O][C][C][Ring1][=Branch1][O][C][C]\nAnswer: 1, 2, 3, 4"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are permeable in the PAMPA assay?\nConstraint: You must select none, one or more options from A or B without using any other words.\nOptions:\nA [C][C][C][C][=C][Branch1][=N][C][=Branch1][C][=O][N][C][=Branch1][Ring2][=C][Ring1][#Branch1][C][C][N][C][=Branch1][C][=O][C][=C][C][=N][N][Branch2][Ring1][S][C][Ring1][Branch1][=C][C][=Branch1][Ring2][=C][Ring1][=Branch2][C][=C][C][=Branch1][=Branch1][=N][C][=C][Ring1][=Branch1][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][Branch1][C][C][C]\nB [C][C][C][=C][Branch2][Ring2][#Branch1][C][Branch2][Ring1][=Branch2][N][=C][Branch1][Ring2][N][Ring1][=Branch1][N][C][=N][C][=C][Branch1][Ring2][O][Ring1][Branch1][C][=C][C][=C][Ring1][#Branch1][Cl][C][=C][Branch1][=Branch1][C][=N][N][Ring1][Branch1][Cl][C][=Branch1][C][=O][C][Ring2][Ring1][N]\nAnswer: A, B"}", "/scratch/micpie/export/pampa_ncats/valid_0-4.jsonl": "{"text":"The SELFIES [C][O][C][=C][Branch2][Ring1][#Branch2][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][C][N][C][=Branch1][O][=C][C][=Branch1][C][=O][N][C][Ring1][#Branch1][=S][N][O][C] is not permeable in the PAMPA assay."} {"text":"The canonical SMILES Cc1cc(Cl)ccc1COc1ccnn1-c1cc(C(=O)O)ccn1 is not permeable in the PAMPA assay."}", "/scratch/micpie/export/pampa_ncats/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is permeable in the PAMPA assay.\nMolecule DeepSMILES: 
COC=CC=CC=C6))Cl)))C=O)NC=CC=CC=C6))NC=O)C=CC=CO5\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is permeable in the PAMPA assay.\nMolecule SMILES: C1CC2=C(C(N=C(N2)NC3=NC4=C(O3)C=CC=C4Cl)C5=C(C=NN5)Cl)C(=O)C1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"}", "/scratch/micpie/export/pampa_ncats/valid_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not permeable in the PAMPA assay?\nConstraint: You must select none, one or more options from 1, 2, 3, 4, or 5 without using any additional words.\nOptions:\n(1) CCCCNCC6))S=O)=O)C=CC=CC=C6))NC=O)C5C)C)))CC=O)NCC=CC=CO5\n(2) CCNCCC6CCCCNC=O)C=CC=CN=CC=C6)))))))))))))))))C=O)C=CC=CC=C6\n(3) COC=CC=CC=C6))NC=O)NC=N5)CCCCNC6)C=O)C=CC=CC=C6F\n(4) COC=CC=CC=C6))CCNC=CC=O)NC6=S)))))N)))))))OC\n(5) CNCC=CC=CC=C6)))))))C=O)CCCNCC6))CC=CC=CC=CC=CC=C6%10\nAnswer: 4"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not permeable in the PAMPA assay?\nConstraint: You must select none, one or more options from a or b without using any additional words.\nOptions:\na.) CC(=O)c1sc(NC(=O)c2ccco2)nc1-c1ccccc1\nb.) Cc1cc(Cl)ccc1COc1ccnn1-c1cc(C(=O)O)ccn1\nAnswer: b"}", "/scratch/micpie/export/pampa_ncats/valid_0-12.jsonl": "{"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should not be permeable in the PAMPA assay.\nAssistant: Got it, this SELFIES is not permeable in the PAMPA assay: [C][O][C][=C][Branch2][Ring1][#Branch2][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][C][N][C][=Branch1][O][=C][C][=Branch1][C][=O][N][C][Ring1][#Branch1][=S][N][O][C]"} {"text":"User: I want to create a SMILES.\nAssistant: This sounds very curious. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be permeable in the PAMPA assay.\nAssistant: Got it, this SMILES is not permeable in the PAMPA assay: CC1=C(C=CC(=C1)Cl)COC2=CC=NN2C3=NC=CC(=C3)C(=O)O"}", "/scratch/micpie/export/pampa_ncats/train_0-2.jsonl": "{"text":"Based on the InChI representation InChI=1S\/C19H15ClN2O4\/c1-25-16-9-4-12(20)11-15(16)18(23)21-13-5-7-14(8-6-13)22-19(24)17-3-2-10-26-17\/h2-11H,1H3,(H,21,23)(H,22,24), the molecule has permeability features."} {"text":"Based on the canonical SMILES representation O=C1CCCC2=C1C(c1[nH]ncc1Cl)N=C(Nc1nc3c(Cl)cccc3o1)N2, the molecule has permeability characteristics."}", "/scratch/micpie/export/pampa_ncats/test_0-11.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that is permeable in the PAMPA assay?\nAssistant: This is a molecule that is permeable in the PAMPA assay: CN(C)c1ccnc2sc(C(N)=O)c(N)c12"} {"text":"User: I'm searching for the SMILES of a molecule that is not permeable in the PAMPA assay?\nAssistant: This is a molecule that is not permeable in the PAMPA assay: CC(C)C1=CC(=NN1)C(=O)N2CC[C@H](C2)NC(=O)C3CC3"}", "/scratch/micpie/export/pampa_ncats/train_0-7.jsonl": "{"text":"Task: Please give me a molecule DeepSMILES based on the text description below.\nDescription: A molecule that is permeable in the PAMPA assay.\nResult: COC=CC=CC=C6))Cl)))C=O)NC=CC=CC=C6))NC=O)C=CC=CO5"} {"text":"Task: Please give me a molecule InChI based on the text description below.\nDescription: A molecule that is permeable in the PAMPA assay.\nResult: 
InChI=1S\/C18H14Cl2N6O2\/c19-8-3-1-6-12-14(8)24-18(28-12)25-17-22-10-4-2-5-11(27)13(10)16(23-17)15-9(20)7-21-26-15\/h1,3,6-7,16H,2,4-5H2,(H,21,26)(H2,22,23,24,25)"}", "/scratch/micpie/export/pampa_ncats/train_0-11.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that is permeable in the PAMPA assay?\nAssistant: This is a molecule that is permeable in the PAMPA assay: COC1=C(C=C(C=C1)Cl)C(=O)NC2=CC=C(C=C2)NC(=O)C3=CC=CO3"} {"text":"User: I'm searching for the canonical SMILES of a molecule that is permeable in the PAMPA assay?\nAssistant: This is a molecule that is permeable in the PAMPA assay: O=C1CCCC2=C1C(c1[nH]ncc1Cl)N=C(Nc1nc3c(Cl)cccc3o1)N2"}", "/scratch/micpie/export/pampa_ncats/train_0-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of COC=CC=CC=C6))Cl)))C=O)NC=CC=CC=C6))NC=O)C=CC=CO5 is permeating in the PAMPA assay."} {"text":"The molecule with the InChI InChI=1S\/C18H14Cl2N6O2\/c19-8-3-1-6-12-14(8)24-18(28-12)25-17-22-10-4-2-5-11(27)13(10)16(23-17)15-9(20)7-21-26-15\/h1,3,6-7,16H,2,4-5H2,(H,21,26)(H2,22,23,24,25) is permeating in the PAMPA assay."}", "/scratch/micpie/export/pampa_ncats/train_0-13.jsonl": "{"text":"User: I want to create a molecule SELFIES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should be permeable in the PAMPA assay.\nAssistant: Got it, this SELFIES is permeable in the PAMPA assay: [C][O][C][=C][Branch1][#Branch2][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=C][C][=C][O][Ring1][Branch1]"} {"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should be permeable in the PAMPA assay.\nAssistant: Got it, this SMILES is permeable in the PAMPA assay: C1CC2=C(C(N=C(N2)NC3=NC4=C(O3)C=CC=C4Cl)C5=C(C=NN5)Cl)C(=O)C1"}", "/scratch/micpie/export/pampa_ncats/train_0-4.jsonl": "{"text":"The SELFIES [C][O][C][=C][Branch1][#Branch2][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=C][C][=C][O][Ring1][Branch1] is permeable in the PAMPA assay."} {"text":"The SMILES C1CC2=C(C(N=C(N2)NC3=NC4=C(O3)C=CC=C4Cl)C5=C(C=NN5)Cl)C(=O)C1 is permeable in the PAMPA assay."}", "/scratch/micpie/export/pampa_ncats/test_0-7.jsonl": "{"text":"Task: Please give me a DeepSMILES based on the description.\nDescription: A molecule that is permeable in the PAMPA assay.\nResult: CNC)C=CC=CSC5=NC=C9)))))C=O)N)))N"} {"text":"Task: Please create a canonical SMILES based on the text description.\nDescription: A molecule that is permeable in the PAMPA assay.\nResult: CC(C)c1cc(C(=O)N2CC[C@@H](NC(=O)C3CC3)C2)n[nH]1"}", "/scratch/micpie/export/pampa_ncats/train_0-9.jsonl": "{"text":"User: Is the molecule with the SELFIES [C][O][C][=C][Branch1][#Branch2][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][Cl][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=C][C][=C][O][Ring1][Branch1] permeable in the PAMPA assay?\nAssistant: Yes, it is permeable in the PAMPA assay."} {"text":"User: Is the molecule with the SMILES C1CC2=C(C(N=C(N2)NC3=NC4=C(O3)C=CC=C4Cl)C5=C(C=NN5)Cl)C(=O)C1 permeable in the PAMPA assay?\nAssistant: Yes, it is permeable in the PAMPA assay."}", "/scratch/micpie/export/pampa_ncats/valid_0-3.jsonl": "{"text":"The SELFIES [C][O][C][=C][Branch2][Ring1][#Branch2][C][=C][Branch1][Branch1][C][=C][Ring1][=Branch1][C][C][N][C][=Branch1][O][=C][C][=Branch1][C][=O][N][C][Ring1][#Branch1][=S][N][O][C] is from a molecule that is not 
identified as permeable in the PAMPA assay."} {"text":"The InChI InChI=1S\/C17H14ClN3O3\/c1-11-8-14(18)3-2-13(11)10-24-16-5-7-20-21(16)15-9-12(17(22)23)4-6-19-15\/h2-9H,10H2,1H3,(H,22,23) represents a molecule that is not identified as permeable in the PAMPA assay."}", "/scratch/micpie/export/pampa_ncats/test_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the SELFIES [C][N][Branch1][C][C][C][=C][C][=Branch2][Ring1][C][=C][Branch1][#Branch2][S][C][Ring1][Branch1][=N][C][=C][Ring1][=Branch2][C][=Branch1][C][=O][N][N] is permeable in the PAMPA assay?\nAssistant: Yes, this molecule is permeable in the PAMPA assay."} {"text":"User: Can you figure out if the molecule with the canonical SMILES CC(C)c1cc(C(=O)N2CC[C@@H](NC(=O)C3CC3)C2)n[nH]1 is permeable in the PAMPA assay?\nAssistant: No, this molecule is not permeable in the PAMPA assay."}", "/scratch/micpie/export/pampa_ncats/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES representation of CNC)C=CC=CSC5=NC=C9)))))C=O)N)))N permeable in the PAMPA assay?\nConstraint: Even if you are not sure, you must pick either A or B without using any additional words.\nOptions:\nA) False\nB) True\nAnswer: B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES [C][C][Branch1][C][C][C][=C][C][=Branch1][Branch1][=N][N][Ring1][Branch1][C][=Branch1][C][=O][N][C][C][C@H1][Branch1][Ring2][C][Ring1][Branch1][N][C][=Branch1][C][=O][C][C][C][Ring1][Ring1] permeable in the PAMPA assay?\nConstraint: Even if you are not sure, you must pick either a or b without using any other words.\nOptions:\na) True\nb) False\nAnswer: b"}", "/scratch/micpie/export/pampa_ncats/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES COC=CC=CC=C6))CCNC=CC=O)NC6=S)))))N)))))))OC permeable in the PAMPA assay?\nConstraint: Even if you are not sure, you 
must pick either 1 or 2 without using any additional words.\nOptions:\n[1] True\n[2] False\nAnswer: 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES representation of CC1=C(C=CC(=C1)Cl)COC2=CC=NN2C3=NC=CC(=C3)C(=O)O permeable in the PAMPA assay?\nConstraint: Even if you are uncertain, you must pick either a or b without using any additional words.\nOptions:\na True\nb False\nAnswer: b"}", "/scratch/micpie/export/pampa_ncats/test_0-4.jsonl": "{"text":"The molecule canonical SMILES CN(C)c1ccnc2sc(C(N)=O)c(N)c12 is permeable in the PAMPA assay."} {"text":"The canonical SMILES CC(C)c1cc(C(=O)N2CC[C@@H](NC(=O)C3CC3)C2)n[nH]1 is not permeable in the PAMPA assay."}", "/scratch/micpie/export/pampa_ncats/test_0-12.jsonl": "{"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should be permeable in the PAMPA assay.\nAssistant: Ok, this DeepSMILES is permeable in the PAMPA assay: CNC)C=CC=CSC5=NC=C9)))))C=O)N)))N"} {"text":"User: I want to create a molecule canonical SMILES.\nAssistant: This sounds very curious. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should not be permeable in the PAMPA assay.\nAssistant: Ok, here you go, this canonical SMILES is not permeable in the PAMPA assay: CC(C)c1cc(C(=O)N2CC[C@@H](NC(=O)C3CC3)C2)n[nH]1"}", "/scratch/micpie/export/bio_ner_18/valid_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Of these proteins, five have previously been shown to be phosphorylated during mitosis (epithelial-microtubule associated protein-115, Oct91, Elongation factor 1gamma, BRG1 and Ribosomal protein L18A), five are related to proteins postulated to have roles in mitosis (epithelial-microtubule associated protein-115, Schizosaccharomyces pombe Cdc5, innercentrosome protein, BRG1 and the RNA helicase WM6), and nine are related to transcription factors (BRG1, negative co-factor 2alpha, Oct91, S. pombe Cdc5, HoxD1, Sox3, Vent2, and two isoforms of Xbr1b)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: epithelial - microtubule associated protein - 115,89,138,Gene\/Protein\nOct91,140,145,Gene\/Protein\nElongation factor 1gamma,147,171,Gene\/Protein\nBRG1,173,177,Gene\/Protein\nRibosomal protein L18A,182,204,Gene\/Protein\nepithelial - microtubule associated protein - 115,274,323,Gene\/Protein\nSchizosaccharomyces pombe Cdc5,325,355,Gene\/Protein\ninnercentrosome protein,357,380,Gene\/Protein\nBRG1,382,386,Gene\/Protein\nRNA helicase WM6,395,411,Gene\/Protein\nBRG1,462,466,Gene\/Protein\nnegative co - factor 2alpha,468,495,Gene\/Protein\nOct91,497,502,Gene\/Protein\nCdc5,513,517,Gene\/Protein\nHoxD1,519,524,Gene\/Protein\nSox3,526,530,Gene\/Protein\nVent2,532,537,Gene\/Protein\nXbr1b,559,564,Gene\/Protein"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Oil Field information; location API gravity Viscosity (cP) at 20C Designation PNG Conventional oil field; Papua New Guinea 46 9.4 Light 
CPM Shale oil field in the Bakken formation; Saskatchewan, Canada 41 10.8 Light Tundra Shale oil field in the Bakken formation; Manitoba, Canada 38 11.8 Light Gryphon Shale oil field; Alberta, Canada 31 16.6 Light\/Heavy Obigbo Obigbo field; Nigeria 21 50.9 Light\/Heavy MHGC Conventional oil reservoir; Medicine Hat, Alberta, Canada 16 2471 Heavy Quantification of Alkylbenzenes in Oils by Gas Chromatography-Mass Spectrometry (GC-MS) Oils (1 ml) were diluted with 9 ml of DCM..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: PNG,79,82,place\nPapua New Guinea,107,123,place\nCPM,138,141,place\nSaskatchewan,183,195,place\nCanada,197,203,place\nTundra,219,225,place\nManitoba,267,275,place\nCanada,277,283,place\nGryphon,299,306,place\nAlberta,324,331,place\nCanada,333,339,place\nObigbo,363,369,place\nObigbo,370,376,place\nNigeria,384,391,place\nMHGC,415,419,place\nMedicine Hat,448,460,place\nAlberta,462,469,place\nCanada,471,477,place"}", "/scratch/micpie/export/bio_ner_18/test_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: Accordingly, of 17 genes in our list that were previously shown to be up-regulated directly or indirectly by HlyX under anaerobiosis, 12 (dnaX, visC, dsbC, mazG, amiB, cpxC, dacA, rpoZ, APL _ 0086, APL _ 1597, APL _ 1802 and APL _ 2043) were down-regulated in vivo, while only 5 (sohB, mglB, hybB, APL _ 0096 and APL _ 0920) were up-regulated..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: HlyX,111,115,Protein\ndnaX,141,145,Protein\nvisC,147,151,Protein\ndsbC,153,157,Protein\nmazG,159,163,Protein\namiB,165,169,Protein\ncpxC,171,175,Protein\ndacA,177,181,Protein\nrpoZ,183,187,Protein\nAPL _ 0086,189,199,Protein\nAPL _ 1597,201,211,Protein\nAPL _ 1802,213,223,Protein\nAPL _ 
2043,228,238,Protein\nsohB,286,290,Protein\nmglB,292,296,Protein\nhybB,298,302,Protein\nAPL _ 0096,304,314,Protein\nAPL _ 0920,319,329,Protein"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: When the negative binomial regression model was used to assess the relationship between Total HIV DNA, 2-LTR circles, CA-US RNA and the frequency of CD4+ T cells expressing PD-1 and\/or TIGIT and\/or LAG-3, no association show statistically significant, with the exception of the frequency of CD4+ T cells harboring total HIV DNA and the frequency of PD-1 single+ (p = 0.005, 1.10-fold-change in total HIV DNA for 1 point increase in percentage of PD-1 single+ CD4+ T cells) and PD-1\/TIGIT double+ CD4+ T cells (p = 0.017, 1.40-fold-change in total HIV DNA for 1 point increase in percentage of PD-1\/TIGIT double+ CD4+ T cells)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: HIV,94,97,Organism\/Species\nCD4,153,156,Gene\/Protein\nPD - 1,178,184,Gene\/Protein\nTIGIT,194,199,Gene\/Protein\nLAG - 3,209,216,Gene\/Protein\nCD4,304,307,Gene\/Protein\nHIV,334,337,Organism\/Species\nPD - 1,363,369,Gene\/Protein\nHIV,424,427,Organism\/Species\nPD - 1,470,476,Gene\/Protein\nCD4,486,489,Gene\/Protein\nPD - 1,505,511,Gene\/Protein\nTIGIT,514,519,Gene\/Protein\nCD4,529,532,Gene\/Protein\nHIV,588,591,Organism\/Species\nPD - 1,634,640,Gene\/Protein\nTIGIT,643,648,Gene\/Protein\nCD4,658,661,Gene\/Protein"}", "/scratch/micpie/export/bio_ner_18/train_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: Of these proteins, five have previously been shown to be phosphorylated during mitosis (epithelial-microtubule associated protein-115, Oct91, Elongation factor 1gamma, BRG1 and Ribosomal protein L18A), five are related to proteins postulated to have roles in mitosis (epithelial-microtubule associated protein-115, 
Schizosaccharomyces pombe Cdc5, innercentrosome protein, BRG1 and the RNA helicase WM6), and nine are related to transcription factors (BRG1, negative co-factor 2alpha, Oct91, S. pombe Cdc5, HoxD1, Sox3, Vent2, and two isoforms of Xbr1b)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: epithelial - microtubule associated protein - 115,89,138,Gene\/Protein\nOct91,140,145,Gene\/Protein\nElongation factor 1gamma,147,171,Gene\/Protein\nBRG1,173,177,Gene\/Protein\nRibosomal protein L18A,182,204,Gene\/Protein\nepithelial - microtubule associated protein - 115,274,323,Gene\/Protein\nSchizosaccharomyces pombe Cdc5,325,355,Gene\/Protein\ninnercentrosome protein,357,380,Gene\/Protein\nBRG1,382,386,Gene\/Protein\nRNA helicase WM6,395,411,Gene\/Protein\nBRG1,462,466,Gene\/Protein\nnegative co - factor 2alpha,468,495,Gene\/Protein\nOct91,497,502,Gene\/Protein\nCdc5,513,517,Gene\/Protein\nHoxD1,519,524,Gene\/Protein\nSox3,526,530,Gene\/Protein\nVent2,532,537,Gene\/Protein\nXbr1b,559,564,Gene\/Protein"} {"text":"Task: Please carry out the NER task for the the text below.\nText: This study investigated the suitability of a two step real-time RT-PCR melting curve analysis as a tool for the detection and discrimination of nine species in the plant virus family Luteoviridae, being Soybean dwarf virus [ SbDV], Bean leafroll virus [ BLRV], Beet chlorosis virus [ BChV], Beet mild yellowing virus [ BMYV], Beet western yellows virus [ BWYV], Cereal yellow dwarf virus-RPV [ CYDV-RPV], Cucurbit aphid-borne yellows virus [ CABYV], Potato leafroll virus [ PLRV] and Turnip yellows virus [ TuYV]..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: Soybean dwarf virus,207,226,Organism\/Species\nSbDV,229,233,Organism\/Species\nBean leafroll 
virus,236,255,Organism\/Species\nBLRV,258,262,Organism\/Species\nBeet chlorosis virus,265,285,Organism\/Species\nBChV,288,292,Organism\/Species\nBeet mild yellowing virus,295,320,Organism\/Species\nBMYV,323,327,Organism\/Species\nBeet western yellows virus,330,356,Organism\/Species\nBWYV,359,363,Organism\/Species\nCereal yellow dwarf virus - RPV,366,397,Organism\/Species\nCYDV - RPV,400,410,Organism\/Species\nCucurbit aphid - borne yellows virus,413,449,Organism\/Species\nCABYV,452,457,Organism\/Species\nPotato leafroll virus,460,481,Organism\/Species\nPLRV,484,488,Organism\/Species\nTurnip yellows virus,494,514,Organism\/Species\nTuYV,517,521,Organism\/Species"}", "/scratch/micpie/export/MUV_548/valid_0-0.jsonl": "{"text":"The molecular species with the InChI representation of InChI=1S\/C18H17N3O2S\/c1-21-12-11-19-18(21)24-13-17(22)20-14-7-9-16(10-8-14)23-15-5-3-2-4-6-15\/h2-12H,13H2,1H3,(H,20,22) is not an inhibitor of the protein kinase A."} {"text":"The molecular species with the SELFIES ['[C][C][=C][C][Branch2][Ring2][C][N][C][=Branch1][C][=O][C][N][C][=Branch1][C][=O][C][C][Branch1][Branch2][C][=C][C][=C][O][Ring1][Branch1][S][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][P][=N][O][Ring2][Ring1][#Branch2]'] is not an inhibitor of the protein kinase A."}", "/scratch/micpie/export/MUV_548/test_0-0.jsonl": "{"text":"The molecule with the SELFIES representation of ['[O][=C][Branch1][=Branch2][C][C][=C][C][=C][S][Ring1][Branch1][N][C][C][N][Branch1][N][C][=Branch1][C][=O][C][C][C][C][O][Ring1][Branch1][C][C][Ring1][=N]'] is not an inhibitor of the protein kinase A (PKA)."} {"text":"The molecule with the SMILES representation of Cc1cc(C(=O)CN(Cc2ccc(F)cc2)S(=O)(=O)c2ccc3c(c2)OCCO3)c(C)n1C is not an inhibitor of the protein kinase A (PKA)."}", "/scratch/micpie/export/MUV_548/train_0-0.jsonl": "{"text":"The molecule with the SMILES O=C1\/C(=C\/NC2CCS(=O)(=O)C2)c2ccccc2C(=O)N1c1ccccc1 is not an inhibitor of PKA."} {"text":"The molecule with the DeepSMILES 
representation of CCOC=O)CC)ncncccnn5-ccccCl)cc6)))))))))c6=O is not an inhibitor of PKA."}", "/scratch/micpie/export/suzuki_miyaura_sach/test_0-10.jsonl": "{"text":"User: I need to run a reaction with the reaction SMILES string CC#N.CC(=O)O~CC(=O)O~[Pd].CCc1cccc(CC)c1.COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1. What is the reaction yield (measured by LCMS) I should expect?\nAssistant: The estimated reaction yield (measured by LCMS) is 86.262\\%."} {"text":"User: I would like to run a reaction with the reaction SMILES string CC(=O)O~CC(=O)O~[Pd].CC(C)(C)P(C1=CC=C[CH]1)C(C)(C)C.CC(C)(C)P(C1=CC=C[CH]1)C(C)(C)C.CCc1cccc(CC)c1.CO.Cc1ccc2c(cnn2C2CCCCO2)c1[B-](F)(F)F.Clc1ccc2ncccc2c1.O.O=P([O-])([O-])[O-].[Fe].[K+].[K+].[K+].[K+]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1. What is the reaction yield I should expect?\nAssistant: The expected reaction yield is 21.379\\%."}", "/scratch/micpie/export/suzuki_miyaura_sach/valid_0-8.jsonl": "{"text":"Task: Predict the masked component in a masked reaction SMILES (one component masked as `MASK`).\nDescription: CC#N.CC(=O)O.CC(=O)O.[Pd].CCc1cccc(CC)c1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Cc1ccccc1P(c1ccccc1C)c1ccccc1C.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>MASK\nSolution: Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1"} {"text":"Task: Predict the masked component in a masked reaction SMILES (one component masked as `MASK`).\nDescription: CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C.CCCCP(C12CC3CC(CC(C3)C1)C2)C12CC3CC(CC(C3)C1)C2.CN(C)C=O.Cc1ccc2c(cnn2C2CCCCO2)c1Br.O.[Cs+].[F-].MASK>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1\nSolution: CC(=O)O.CC(=O)O.[Pd]"}", "/scratch/micpie/export/suzuki_miyaura_sach/train_0-8.jsonl": "{"text":"Task: Predict the masked component in a masked reaction SMILES (one component masked as `MASK`).\nDescription: 
CC#N.CC(=O)O.CC(=O)O.[Pd].CC(C)(C)P(C(C)(C)C)C(C)(C)C.CCc1cccc(CC)c1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>MASK\nAnswer: Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1"} {"text":"Task: Predict the masked component in a masked RXNSMILES (one component masked as `MASK`).\nDescription: CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C.CN(C)C=O.Cc1ccc2c(cnn2C2CCCCO2)c1Br.O.[Cs+].[F-].[Fe+2].c1ccc(P(c2ccccc2)[c-]2cccc2)cc1.MASK>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1\nAnswer: CC(=O)O.CC(=O)O.[Pd]"}", "/scratch/micpie/export/suzuki_miyaura_sach/test_0-5.jsonl": "{"text":"Question: Which reaction products are produced from the starting materials CC#N, CC(=O)O.CC(=O)O.[Pd], CCc1cccc(CC)c1, COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1, Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O, Clc1ccc2ncccc2c1, O, [Na+], and [OH-]?\nAnswer: Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1."} {"text":"Question: Which reaction products are produced from the starting materials CC(=O)O.CC(=O)O.[Pd], CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C, CN(C)C=O, COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1, Cc1ccc2c(cnn2C2CCCCO2)c1Br, O, [Cs+], and [F-]?\nAnswer: Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1."}", "/scratch/micpie/export/suzuki_miyaura_sach/valid_0-9.jsonl": "{"text":"The yield of a reaction with the reaction SMILES CC#N.CC(=O)O~CC(=O)O~[Pd].CCc1cccc(CC)c1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Cc1ccccc1P(c1ccccc1C)c1ccccc1C.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 is 1.950\\%."} {"text":"The reaction yield of a reaction with the reaction SMILES string CC(=O)O~CC(=O)O~[Pd].CCc1cccc(CC)c1.CO.COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1.Cc1ccc2c(cnn2C2CCCCO2)c1[B-](F)(F)F.Clc1ccc2ncccc2c1.O.O=P([O-])([O-])[O-].[K+].[K+].[K+].[K+]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 is 48.656\\%."}", "/scratch/micpie/export/suzuki_miyaura_sach/test_0-1.jsonl": "{"text":"The reaction SMILES 
CC#N.CC(=O)O~CC(=O)O~[Pd].CCc1cccc(CC)c1.COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 has the products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 and the reaction educts CC#N, CC(=O)O.CC(=O)O.[Pd], CCc1cccc(CC)c1, COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1, Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O, Clc1ccc2ncccc2c1, O, [Na+], and [OH-]."} {"text":"The reaction SMILES CC(=O)O~CC(=O)O~[Pd].CC(C)(C)P(C1=CC=C[CH]1)C(C)(C)C.CC(C)(C)P(C1=CC=C[CH]1)C(C)(C)C.CCc1cccc(CC)c1.CO.Cc1ccc2c(cnn2C2CCCCO2)c1[B-](F)(F)F.Clc1ccc2ncccc2c1.O.O=P([O-])([O-])[O-].[Fe].[K+].[K+].[K+].[K+]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 has the products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 and the starting materials CC(=O)O.CC(=O)O.[Pd], CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C, CN(C)C=O, COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1, Cc1ccc2c(cnn2C2CCCCO2)c1Br, O, [Cs+], and [F-]."}", "/scratch/micpie/export/suzuki_miyaura_sach/valid_0-0.jsonl": "{"text":"The RXNSMILES CC#N.CC(=O)O~CC(=O)O~[Pd].CCc1cccc(CC)c1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Cc1ccccc1P(c1ccccc1C)c1ccccc1C.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 has the reaction educts CC#N, CC(=O)O.CC(=O)O.[Pd], CCc1cccc(CC)c1, Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O, Cc1ccccc1P(c1ccccc1C)c1ccccc1C, Clc1ccc2ncccc2c1, O, [Na+], and [OH-] and the products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1."} {"text":"The reaction SMILES CC(=O)O~CC(=O)O~[Pd].CCc1cccc(CC)c1.CO.COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1.Cc1ccc2c(cnn2C2CCCCO2)c1[B-](F)(F)F.Clc1ccc2ncccc2c1.O.O=P([O-])([O-])[O-].[K+].[K+].[K+].[K+]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 has the reaction educts CC(=O)O.CC(=O)O.[Pd], CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C, CCCCP(C12CC3CC(CC(C3)C1)C2)C12CC3CC(CC(C3)C1)C2, CN(C)C=O, Cc1ccc2c(cnn2C2CCCCO2)c1Br, O, [Cs+], and [F-] and the reaction products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1."}", "/scratch/micpie/export/suzuki_miyaura_sach/test_0-2.jsonl": 
"{"text":"The masked component in the masked reaction SMILES string (one component masked as `MASK`) CC#N.CC(=O)O.CC(=O)O.[Pd].CCc1cccc(CC)c1.COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>MASK is Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1."} {"text":"The masked component in the masked reaction SMILES string (one component masked as `MASK`) CC(=O)O.CC(=O)O.[Pd].CN(C)C=O.COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1.Cc1ccc2c(cnn2C2CCCCO2)c1Br.O.[Cs+].[F-].MASK>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 is CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C."}", "/scratch/micpie/export/suzuki_miyaura_sach/valid_0-10.jsonl": "{"text":"User: I need to run a reaction with the RXNSMILES CC#N.CC(=O)O~CC(=O)O~[Pd].CCc1cccc(CC)c1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Cc1ccccc1P(c1ccccc1C)c1ccccc1C.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1. What is the yield I can get?\nAssistant: The yield is 1.950\\%."} {"text":"User: I would like to run a reaction with the reaction SMILES (RXNSMILES) CC(=O)O~CC(=O)O~[Pd].CCc1cccc(CC)c1.CO.COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1.Cc1ccc2c(cnn2C2CCCCO2)c1[B-](F)(F)F.Clc1ccc2ncccc2c1.O.O=P([O-])([O-])[O-].[K+].[K+].[K+].[K+]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1. 
What is the yield I can get?\nAssistant: The predicted yield is 48.656\\%."}", "/scratch/micpie/export/suzuki_miyaura_sach/train_0-6.jsonl": "{"text":"User: I would like to synthesize the products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1.\nAssistant: Great, is there anything else I can do for you?\nUser: Yes, I would like to know the starting materials I need to produce the products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1.\nAssistant: I propose the following starting materials: CC#N, CC(=O)O.CC(=O)O.[Pd], CC(C)(C)P(C(C)(C)C)C(C)(C)C, CCc1cccc(CC)c1, Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O, Clc1ccc2ncccc2c1, O, [Na+], and [OH-]."} {"text":"User: I want synthesize the products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1.\nAssistant: Is there anything else I can do for you?\nUser: I would like to know the reaction educts I need to produce the products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1.\nAssistant: I propose the following reaction educts: CC(=O)O.CC(=O)O.[Pd], CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C, CN(C)C=O, Cc1ccc2c(cnn2C2CCCCO2)c1Br, O, [Cs+], [F-], [Fe+2], and c1ccc(P(c2ccccc2)[c-]2cccc2)cc1."}", "/scratch/micpie/export/suzuki_miyaura_sach/valid_0-6.jsonl": "{"text":"User: I would like to produce the reaction products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1.\nAssistant: Cool, is there anything else I can do for you?\nUser: I would like to know the educts I need to produce the reaction products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1.\nAssistant: I recommend the following educts: CC#N, CC(=O)O.CC(=O)O.[Pd], CCc1cccc(CC)c1, Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O, Cc1ccccc1P(c1ccccc1C)c1ccccc1C, Clc1ccc2ncccc2c1, O, [Na+], and [OH-]."} {"text":"User: I must synthesize the products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1.\nAssistant: Great, is there anything else I can do for you?\nUser: Yes, I would like to know the educts I need to produce the products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1.\nAssistant: I advise the following educts: CC(=O)O.CC(=O)O.[Pd], CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C, 
CCCCP(C12CC3CC(CC(C3)C1)C2)C12CC3CC(CC(C3)C1)C2, CN(C)C=O, Cc1ccc2c(cnn2C2CCCCO2)c1Br, O, [Cs+], and [F-]."}", "/scratch/micpie/export/suzuki_miyaura_sach/test_0-9.jsonl": "{"text":"The reaction yield of a reaction with the reaction SMILES CC#N.CC(=O)O~CC(=O)O~[Pd].CCc1cccc(CC)c1.COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 is 86.262\\%."} {"text":"The reaction yield of a reaction with the reaction SMILES string CC(=O)O~CC(=O)O~[Pd].CC(C)(C)P(C1=CC=C[CH]1)C(C)(C)C.CC(C)(C)P(C1=CC=C[CH]1)C(C)(C)C.CCc1cccc(CC)c1.CO.Cc1ccc2c(cnn2C2CCCCO2)c1[B-](F)(F)F.Clc1ccc2ncccc2c1.O.O=P([O-])([O-])[O-].[Fe].[K+].[K+].[K+].[K+]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 is 21.379\\%."}", "/scratch/micpie/export/suzuki_miyaura_sach/test_0-0.jsonl": "{"text":"The reaction SMILES string CC#N.CC(=O)O~CC(=O)O~[Pd].CCc1cccc(CC)c1.COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 has the educts CC#N, CC(=O)O.CC(=O)O.[Pd], CCc1cccc(CC)c1, COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1, Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O, Clc1ccc2ncccc2c1, O, [Na+], and [OH-] and the products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1."} {"text":"The RXNSMILES CC(=O)O~CC(=O)O~[Pd].CC(C)(C)P(C1=CC=C[CH]1)C(C)(C)C.CC(C)(C)P(C1=CC=C[CH]1)C(C)(C)C.CCc1cccc(CC)c1.CO.Cc1ccc2c(cnn2C2CCCCO2)c1[B-](F)(F)F.Clc1ccc2ncccc2c1.O.O=P([O-])([O-])[O-].[Fe].[K+].[K+].[K+].[K+]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 has the educts CC(=O)O.CC(=O)O.[Pd], CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C, CN(C)C=O, COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1, Cc1ccc2c(cnn2C2CCCCO2)c1Br, O, [Cs+], and [F-] and the reaction products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1."}", "/scratch/micpie/export/suzuki_miyaura_sach/valid_0-7.jsonl": "{"text":"Question: What is the masked component in the reaction SMILES with one element masked as `MASK` 
CC#N.CC(=O)O.CC(=O)O.[Pd].CCc1cccc(CC)c1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Cc1ccccc1P(c1ccccc1C)c1ccccc1C.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>MASK?\nAnswer: Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1."} {"text":"Question: What is the masked component in the reaction SMILES with one element masked as `MASK` CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C.CCCCP(C12CC3CC(CC(C3)C1)C2)C12CC3CC(CC(C3)C1)C2.CN(C)C=O.Cc1ccc2c(cnn2C2CCCCO2)c1Br.O.[Cs+].[F-].MASK>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1?\nAnswer: CC(=O)O.CC(=O)O.[Pd]."}", "/scratch/micpie/export/suzuki_miyaura_sach/test_0-3.jsonl": "{"text":"The chemical with SMILES Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 is the masked component in the masked reaction SMILES (one component masked as `MASK`) CC#N.CC(=O)O.CC(=O)O.[Pd].CCc1cccc(CC)c1.COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>MASK."} {"text":"The compound with SMILES CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C is the masked component in the reaction SMILES with one element hidden as `MASK` CC(=O)O.CC(=O)O.[Pd].CN(C)C=O.COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1.Cc1ccc2c(cnn2C2CCCCO2)c1Br.O.[Cs+].[F-].MASK>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1."}", "/scratch/micpie/export/suzuki_miyaura_sach/valid_0-11.jsonl": "{"text":"Question: What's the yield of a reaction with the RXNSMILES CC#N.CC(=O)O~CC(=O)O~[Pd].CCc1cccc(CC)c1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Cc1ccccc1P(c1ccccc1C)c1ccccc1C.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1?\nAnswer: 1.950\\%."} {"text":"Question: What is the reaction yield (measured by LCMS) of a reaction with the reaction SMILES (RXNSMILES) CC(=O)O~CC(=O)O~[Pd].CCc1cccc(CC)c1.CO.COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1.Cc1ccc2c(cnn2C2CCCCO2)c1[B-](F)(F)F.Clc1ccc2ncccc2c1.O.O=P([O-])([O-])[O-].[K+].[K+].[K+].[K+]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1?\nAnswer: 48.656\\%."}", "/scratch/micpie/export/suzuki_miyaura_sach/train_0-0.jsonl": "{"text":"The reaction SMILES (RXNSMILES) 
CC#N.CC(=O)O~CC(=O)O~[Pd].CC(C)(C)P(C(C)(C)C)C(C)(C)C.CCc1cccc(CC)c1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 has the educts CC#N, CC(=O)O.CC(=O)O.[Pd], CC(C)(C)P(C(C)(C)C)C(C)(C)C, CCc1cccc(CC)c1, Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O, Clc1ccc2ncccc2c1, O, [Na+], and [OH-] and the products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1."} {"text":"The reaction SMILES string CC(=O)O~CC(=O)O~[Pd].CC1(C)c2cccc(P(c3ccccc3)c3ccccc3)c2Oc2c(P(c3ccccc3)c3ccccc3)cccc21.CCc1cccc(CC)c1.CO.Cc1ccc2c(cnn2C2CCCCO2)c1[B-](F)(F)F.Clc1ccc2ncccc2c1.O.O=P([O-])([O-])[O-].[K+].[K+].[K+].[K+]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 has the educts CC(=O)O.CC(=O)O.[Pd], CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C, CN(C)C=O, Cc1ccc2c(cnn2C2CCCCO2)c1Br, O, [Cs+], [F-], [Fe+2], and c1ccc(P(c2ccccc2)[c-]2cccc2)cc1 and the reaction products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1."}", "/scratch/micpie/export/suzuki_miyaura_sach/test_0-6.jsonl": "{"text":"User: I want synthesize the products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1.\nAssistant: Great, is there anything else I can do for you?\nUser: Yes, I would like to know the reaction educts I need to produce the products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1.\nAssistant: I propose the following reaction educts: CC#N, CC(=O)O.CC(=O)O.[Pd], CCc1cccc(CC)c1, COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1, Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O, Clc1ccc2ncccc2c1, O, [Na+], and [OH-]."} {"text":"User: I want produce the products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1.\nAssistant: That's interesting, is there anything else I can do for you?\nUser: Yes, I would like to know the starting materials I need to produce the products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1.\nAssistant: I advise the following starting materials: CC(=O)O.CC(=O)O.[Pd], CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C, CN(C)C=O, COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1, Cc1ccc2c(cnn2C2CCCCO2)c1Br, O, [Cs+], and [F-]."}", 
"/scratch/micpie/export/suzuki_miyaura_sach/train_0-10.jsonl": "{"text":"User: I want to run a reaction with the RXNSMILES CC#N.CC(=O)O~CC(=O)O~[Pd].CC(C)(C)P(C(C)(C)C)C(C)(C)C.CCc1cccc(CC)c1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1. What is the reaction yield I can expect?\nAssistant: The predicted reaction yield is 4.764\\%."} {"text":"User: I need to run a reaction with the reaction SMILES string CC(=O)O~CC(=O)O~[Pd].CC1(C)c2cccc(P(c3ccccc3)c3ccccc3)c2Oc2c(P(c3ccccc3)c3ccccc3)cccc21.CCc1cccc(CC)c1.CO.Cc1ccc2c(cnn2C2CCCCO2)c1[B-](F)(F)F.Clc1ccc2ncccc2c1.O.O=P([O-])([O-])[O-].[K+].[K+].[K+].[K+]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1. What is the reaction yield (measured by LCMS) I can expect?\nAssistant: The estimated reaction yield (measured by LCMS) is 18.509\\%."}", "/scratch/micpie/export/suzuki_miyaura_sach/train_0-3.jsonl": "{"text":"The chemical with SMILES Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 is the masked component in the reaction SMILES with one element masked as `MASK` CC#N.CC(=O)O.CC(=O)O.[Pd].CC(C)(C)P(C(C)(C)C)C(C)(C)C.CCc1cccc(CC)c1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>MASK."} {"text":"The compound with SMILES CC(=O)O.CC(=O)O.[Pd] is the masked component in the reaction SMILES with one element hidden as `MASK` CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C.CN(C)C=O.Cc1ccc2c(cnn2C2CCCCO2)c1Br.O.[Cs+].[F-].[Fe+2].c1ccc(P(c2ccccc2)[c-]2cccc2)cc1.MASK>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1."}", "/scratch/micpie/export/suzuki_miyaura_sach/valid_0-2.jsonl": "{"text":"The masked component in the reaction SMILES with one element hidden as `MASK` CC#N.CC(=O)O.CC(=O)O.[Pd].CCc1cccc(CC)c1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Cc1ccccc1P(c1ccccc1C)c1ccccc1C.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>MASK is Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1."} {"text":"The masked component in the reaction SMILES with one element hidden as `MASK` 
CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C.CCCCP(C12CC3CC(CC(C3)C1)C2)C12CC3CC(CC(C3)C1)C2.CN(C)C=O.Cc1ccc2c(cnn2C2CCCCO2)c1Br.O.[Cs+].[F-].MASK>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 is CC(=O)O.CC(=O)O.[Pd]."}", "/scratch/micpie/export/suzuki_miyaura_sach/valid_0-1.jsonl": "{"text":"The RXNSMILES CC#N.CC(=O)O~CC(=O)O~[Pd].CCc1cccc(CC)c1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Cc1ccccc1P(c1ccccc1C)c1ccccc1C.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 has the products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 and the educts CC#N, CC(=O)O.CC(=O)O.[Pd], CCc1cccc(CC)c1, Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O, Cc1ccccc1P(c1ccccc1C)c1ccccc1C, Clc1ccc2ncccc2c1, O, [Na+], and [OH-]."} {"text":"The reaction SMILES CC(=O)O~CC(=O)O~[Pd].CCc1cccc(CC)c1.CO.COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1.Cc1ccc2c(cnn2C2CCCCO2)c1[B-](F)(F)F.Clc1ccc2ncccc2c1.O.O=P([O-])([O-])[O-].[K+].[K+].[K+].[K+]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 has the reaction products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 and the educts CC(=O)O.CC(=O)O.[Pd], CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C, CCCCP(C12CC3CC(CC(C3)C1)C2)C12CC3CC(CC(C3)C1)C2, CN(C)C=O, Cc1ccc2c(cnn2C2CCCCO2)c1Br, O, [Cs+], and [F-]."}", "/scratch/micpie/export/suzuki_miyaura_sach/valid_0-5.jsonl": "{"text":"Question: What products are produced from the reaction educts CC#N, CC(=O)O.CC(=O)O.[Pd], CCc1cccc(CC)c1, Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O, Cc1ccccc1P(c1ccccc1C)c1ccccc1C, Clc1ccc2ncccc2c1, O, [Na+], and [OH-]?\nAnswer: Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1."} {"text":"Question: What reaction products are produced from the starting materials CC(=O)O.CC(=O)O.[Pd], CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C, CCCCP(C12CC3CC(CC(C3)C1)C2)C12CC3CC(CC(C3)C1)C2, CN(C)C=O, Cc1ccc2c(cnn2C2CCCCO2)c1Br, O, [Cs+], and [F-]?\nAnswer: Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1."}", "/scratch/micpie/export/suzuki_miyaura_sach/valid_0-4.jsonl": "{"text":"Question: Which starting materials are required to synthesize the reaction products 
Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1?\nAnswer: CC#N, CC(=O)O.CC(=O)O.[Pd], CCc1cccc(CC)c1, Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O, Cc1ccccc1P(c1ccccc1C)c1ccccc1C, Clc1ccc2ncccc2c1, O, [Na+], and [OH-]."} {"text":"Question: Which educts are needed to produce the products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1?\nAnswer: CC(=O)O.CC(=O)O.[Pd], CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C, CCCCP(C12CC3CC(CC(C3)C1)C2)C12CC3CC(CC(C3)C1)C2, CN(C)C=O, Cc1ccc2c(cnn2C2CCCCO2)c1Br, O, [Cs+], and [F-]."}", "/scratch/micpie/export/suzuki_miyaura_sach/train_0-5.jsonl": "{"text":"Question: What reaction products are produced from the reaction educts CC#N, CC(=O)O.CC(=O)O.[Pd], CC(C)(C)P(C(C)(C)C)C(C)(C)C, CCc1cccc(CC)c1, Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O, Clc1ccc2ncccc2c1, O, [Na+], and [OH-]?\nAnswer: Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1."} {"text":"Question: Which reaction products are produced from the reaction educts CC(=O)O.CC(=O)O.[Pd], CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C, CN(C)C=O, Cc1ccc2c(cnn2C2CCCCO2)c1Br, O, [Cs+], [F-], [Fe+2], and c1ccc(P(c2ccccc2)[c-]2cccc2)cc1?\nAnswer: Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1."}", "/scratch/micpie/export/suzuki_miyaura_sach/train_0-2.jsonl": "{"text":"The masked component in the masked reaction SMILES (one component masked as `MASK`) CC#N.CC(=O)O.CC(=O)O.[Pd].CC(C)(C)P(C(C)(C)C)C(C)(C)C.CCc1cccc(CC)c1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>MASK is Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1."} {"text":"The masked component in the masked reaction SMILES (one component masked as `MASK`) CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C.CN(C)C=O.Cc1ccc2c(cnn2C2CCCCO2)c1Br.O.[Cs+].[F-].[Fe+2].c1ccc(P(c2ccccc2)[c-]2cccc2)cc1.MASK>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 is CC(=O)O.CC(=O)O.[Pd]."}", "/scratch/micpie/export/suzuki_miyaura_sach/test_0-11.jsonl": "{"text":"Question: What's yield of a reaction with the reaction SMILES (RXNSMILES) 
CC#N.CC(=O)O~CC(=O)O~[Pd].CCc1cccc(CC)c1.COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1?\nAnswer: 86.262\\%."} {"text":"Question: What is reaction yield (measured by LCMS) of a reaction with the RXNSMILES CC(=O)O~CC(=O)O~[Pd].CC(C)(C)P(C1=CC=C[CH]1)C(C)(C)C.CC(C)(C)P(C1=CC=C[CH]1)C(C)(C)C.CCc1cccc(CC)c1.CO.Cc1ccc2c(cnn2C2CCCCO2)c1[B-](F)(F)F.Clc1ccc2ncccc2c1.O.O=P([O-])([O-])[O-].[Fe].[K+].[K+].[K+].[K+]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1?\nAnswer: 21.379\\%."}", "/scratch/micpie/export/suzuki_miyaura_sach/train_0-7.jsonl": "{"text":"Question: What is the masked component in the masked reaction SMILES string (one component masked as `MASK`) CC#N.CC(=O)O.CC(=O)O.[Pd].CC(C)(C)P(C(C)(C)C)C(C)(C)C.CCc1cccc(CC)c1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>MASK?\nAnswer: Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1."} {"text":"Question: What is the masked component in the reaction SMILES with one element masked as `MASK` CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C.CN(C)C=O.Cc1ccc2c(cnn2C2CCCCO2)c1Br.O.[Cs+].[F-].[Fe+2].c1ccc(P(c2ccccc2)[c-]2cccc2)cc1.MASK>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1?\nAnswer: CC(=O)O.CC(=O)O.[Pd]."}", "/scratch/micpie/export/suzuki_miyaura_sach/train_0-11.jsonl": "{"text":"Question: What is reaction yield of a reaction with the RXNSMILES CC#N.CC(=O)O~CC(=O)O~[Pd].CC(C)(C)P(C(C)(C)C)C(C)(C)C.CCc1cccc(CC)c1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1?\nAnswer: 4.764\\%."} {"text":"Question: What is the yield of a reaction with the RXNSMILES CC(=O)O~CC(=O)O~[Pd].CC1(C)c2cccc(P(c3ccccc3)c3ccccc3)c2Oc2c(P(c3ccccc3)c3ccccc3)cccc21.CCc1cccc(CC)c1.CO.Cc1ccc2c(cnn2C2CCCCO2)c1[B-](F)(F)F.Clc1ccc2ncccc2c1.O.O=P([O-])([O-])[O-].[K+].[K+].[K+].[K+]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1?\nAnswer: 18.509\\%."}", "/scratch/micpie/export/suzuki_miyaura_sach/train_0-1.jsonl": "{"text":"The 
reaction SMILES string CC#N.CC(=O)O~CC(=O)O~[Pd].CC(C)(C)P(C(C)(C)C)C(C)(C)C.CCc1cccc(CC)c1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 has the reaction products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 and the reaction educts CC#N, CC(=O)O.CC(=O)O.[Pd], CC(C)(C)P(C(C)(C)C)C(C)(C)C, CCc1cccc(CC)c1, Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O, Clc1ccc2ncccc2c1, O, [Na+], and [OH-]."} {"text":"The reaction SMILES (RXNSMILES) CC(=O)O~CC(=O)O~[Pd].CC1(C)c2cccc(P(c3ccccc3)c3ccccc3)c2Oc2c(P(c3ccccc3)c3ccccc3)cccc21.CCc1cccc(CC)c1.CO.Cc1ccc2c(cnn2C2CCCCO2)c1[B-](F)(F)F.Clc1ccc2ncccc2c1.O.O=P([O-])([O-])[O-].[K+].[K+].[K+].[K+]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 has the reaction products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 and the starting materials CC(=O)O.CC(=O)O.[Pd], CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C, CN(C)C=O, Cc1ccc2c(cnn2C2CCCCO2)c1Br, O, [Cs+], [F-], [Fe+2], and c1ccc(P(c2ccccc2)[c-]2cccc2)cc1."}", "/scratch/micpie/export/suzuki_miyaura_sach/train_0-4.jsonl": "{"text":"Question: What reaction educts are needed to synthesize the reaction products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1?\nAnswer: CC#N, CC(=O)O.CC(=O)O.[Pd], CC(C)(C)P(C(C)(C)C)C(C)(C)C, CCc1cccc(CC)c1, Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O, Clc1ccc2ncccc2c1, O, [Na+], and [OH-]."} {"text":"Question: What educts are required to synthesize the products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1?\nAnswer: CC(=O)O.CC(=O)O.[Pd], CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C, CN(C)C=O, Cc1ccc2c(cnn2C2CCCCO2)c1Br, O, [Cs+], [F-], [Fe+2], and c1ccc(P(c2ccccc2)[c-]2cccc2)cc1."}", "/scratch/micpie/export/suzuki_miyaura_sach/test_0-7.jsonl": "{"text":"Question: What is the masked component in the reaction SMILES with one element hidden as `MASK` CC#N.CC(=O)O.CC(=O)O.[Pd].CCc1cccc(CC)c1.COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>MASK?\nAnswer: Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1."} {"text":"Question: What is the masked 
component in the masked reaction SMILES string (one component masked as `MASK`) CC(=O)O.CC(=O)O.[Pd].CN(C)C=O.COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1.Cc1ccc2c(cnn2C2CCCCO2)c1Br.O.[Cs+].[F-].MASK>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1?\nAnswer: CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C."}", "/scratch/micpie/export/suzuki_miyaura_sach/train_0-9.jsonl": "{"text":"The reaction yield of a reaction with the reaction SMILES CC#N.CC(=O)O~CC(=O)O~[Pd].CC(C)(C)P(C(C)(C)C)C(C)(C)C.CCc1cccc(CC)c1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 is 4.764\\%."} {"text":"The reaction yield (measured by LCMS) of a reaction with the reaction SMILES CC(=O)O~CC(=O)O~[Pd].CC1(C)c2cccc(P(c3ccccc3)c3ccccc3)c2Oc2c(P(c3ccccc3)c3ccccc3)cccc21.CCc1cccc(CC)c1.CO.Cc1ccc2c(cnn2C2CCCCO2)c1[B-](F)(F)F.Clc1ccc2ncccc2c1.O.O=P([O-])([O-])[O-].[K+].[K+].[K+].[K+]>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 is 18.509\\%."}", "/scratch/micpie/export/suzuki_miyaura_sach/valid_0-3.jsonl": "{"text":"The chemical with SMILES Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1 is the masked component in the reaction SMILES with one element masked as `MASK` CC#N.CC(=O)O.CC(=O)O.[Pd].CCc1cccc(CC)c1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Cc1ccccc1P(c1ccccc1C)c1ccccc1C.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>MASK."} {"text":"The chemical with SMILES CC(=O)O.CC(=O)O.[Pd] is the masked component in the masked reaction SMILES (one component masked as `MASK`) CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C.CCCCP(C12CC3CC(CC(C3)C1)C2)C12CC3CC(CC(C3)C1)C2.CN(C)C=O.Cc1ccc2c(cnn2C2CCCCO2)c1Br.O.[Cs+].[F-].MASK>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1."}", "/scratch/micpie/export/suzuki_miyaura_sach/test_0-8.jsonl": "{"text":"Task: Predict the masked component in a reaction SMILES with one element masked as `MASK`.\nDescription: CC#N.CC(=O)O.CC(=O)O.[Pd].CCc1cccc(CC)c1.COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1.Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O.Clc1ccc2ncccc2c1.O.[Na+].[OH-]>>MASK\nSolution: 
Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1"} {"text":"Task: Predict the masked component in a masked RXNSMILES (one component masked as `MASK`).\nDescription: CC(=O)O.CC(=O)O.[Pd].CN(C)C=O.COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1.Cc1ccc2c(cnn2C2CCCCO2)c1Br.O.[Cs+].[F-].MASK>>Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1\nAnswer: CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C"}", "/scratch/micpie/export/suzuki_miyaura_sach/test_0-4.jsonl": "{"text":"Question: What starting materials are needed to synthesize the reaction products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1?\nAnswer: CC#N, CC(=O)O.CC(=O)O.[Pd], CCc1cccc(CC)c1, COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1, Cc1ccc2c(cnn2C2CCCCO2)c1B(O)O, Clc1ccc2ncccc2c1, O, [Na+], and [OH-]."} {"text":"Question: Which starting materials are needed to produce the reaction products Cc1ccc2c(cnn2C2CCCCO2)c1-c1ccc2ncccc2c1?\nAnswer: CC(=O)O.CC(=O)O.[Pd], CC1(C)OB(c2ccc3ncccc3c2)OC1(C)C, CN(C)C=O, COc1cccc(OC)c1-c1ccccc1P(C1CCCCC1)C1CCCCC1, Cc1ccc2c(cnn2C2CCCCO2)c1Br, O, [Cs+], and [F-]."}", "/scratch/micpie/export/nr_er_lbd_tox21/test_0-10.jsonl": "{"text":"User: Can you generate the DeepSMILES of a molecule that is not toxic in the NR-ER-LBD assay?\nAssistant: Yes, I'm happy to help, here you go: CCCNCC))CCC))C=O)NccC)cccc6C"} {"text":"User: Can you give me the InChI of a molecule that is not toxic in the NR-estrogen receptor alpha ligand binding domain assay?\nAssistant: Of course, here you go: InChI=1S\/C9H12\/c1-3-9-6-4-5-8(2)7-9\/h4-7H,3H2,1-2H3"}", "/scratch/micpie/export/nr_er_lbd_tox21/valid_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the DeepSMILES CccncC)cC)n6 is toxic in the NR-estrogen receptor alpha ligand binding domain assay?\nAssistant: No, this molecule is not toxic in the NR-estrogen receptor alpha ligand binding domain assay."} {"text":"User: Can you figure out if the molecule with the DeepSMILES Occccccccc6c%10\/N=N\/cccccc6 is toxic in the estrogen receptor alpha ligand binding domain assay?\nAssistant: No, 
this molecule is not toxic in the estrogen receptor alpha ligand binding domain assay."}", "/scratch/micpie/export/nr_er_lbd_tox21/test_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the estrogen receptor alpha ligand binding domain assay?\nConstraint: You must select none, one or more options from a, b, or c without using any other words.\nOptions:\n(a) InChI=1S\/C7H4F3NO2\/c8-7(9,10)5-3-1-2-4-6(5)11(12)13\/h1-4H\n(b) InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20)\n(c) InChI=1S\/C22H27ClF2O5\/c1-10-4-11-12-5-15(24)13-6-16(27)14(23)7-19(13,2)21(12,25)17(28)8-20(11,3)22(10,30)18(29)9-26\/h6-7,10-12,15,17,26,28,30H,4-5,8-9H2,1-3H3\/t10-,11+,12+,15+,17+,19+,20+,21+,22+\/m1\/s1\nAnswer: a, b, c"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-estrogen receptor alpha ligand binding domain assay?\nConstraint: You must select none, one or more options from A or B without using any other words.\nOptions:\nA.) InChI=1S\/C9H12\/c1-3-9-6-4-5-8(2)7-9\/h4-7H,3H2,1-2H3\nB.) 
InChI=1S\/C20H17FO2S\/c1-12-17(9-13-3-6-15(24-2)7-4-13)16-8-5-14(21)10-19(16)18(12)11-20(22)23\/h3-10H,11H2,1-2H3,(H,22,23)\/b17-9-\nAnswer: A, B"}", "/scratch/micpie/export/nr_er_lbd_tox21/train_0-8.jsonl": "{"text":"User: Can you estimate if the molecule with the canonical SMILES CCOc1ccc2nc(S(N)(=O)=O)sc2c1 is toxic in the NR-ER-LBD assay?\nAssistant: No, this molecule is not toxic in the NR-ER-LBD assay."} {"text":"User: Can you tell me if the molecule with the SELFIES [C][C][=Branch1][C][=O][O][C][Branch1][C][C][=O] is toxic in the estrogen receptor alpha ligand binding domain assay?\nAssistant: No, this molecule is not toxic in the estrogen receptor alpha ligand binding domain assay."}", "/scratch/micpie/export/nr_er_lbd_tox21/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-ER-LBD assay.\nMolecule canonical SMILES: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the estrogen receptor alpha ligand binding domain assay.\nMolecule SELFIES: [C][C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"}", "/scratch/micpie/export/nr_er_lbd_tox21/valid_0-9.jsonl": "{"text":"User: Is the molecule with the SELFIES [C][C][=C][N][=C][Branch1][C][C][C][Branch1][C][C][=N][Ring1][Branch2] toxic in the NR-ER-LBD assay?\nAssistant: No, it is not toxic in the NR-ER-LBD assay."} {"text":"User: Is the molecule with the SELFIES [O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][\/N][=N][\/C][=C][C][=C][C][=C][Ring1][=Branch1] toxic in the NR-estrogen receptor alpha ligand binding domain assay?\nAssistant: No, it is not toxic in 
the NR-estrogen receptor alpha ligand binding domain assay."}", "/scratch/micpie/export/nr_er_lbd_tox21/test_0-1.jsonl": "{"text":"The molecule with the SELFIES [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C] is not showing toxicity in the NR-ER-LBD assay."} {"text":"The molecule with the canonical SMILES CCc1cccc(C)c1 is not showing toxicity in the NR-ER-LBD assay."}", "/scratch/micpie/export/nr_er_lbd_tox21/valid_0-0.jsonl": "{"text":"The molecule with the SELFIES [C][C][=C][N][=C][Branch1][C][C][C][Branch1][C][C][=N][Ring1][Branch2] is not toxic in the NR-ER-LBD assay."} {"text":"The molecule with the InChI InChI=1S\/C16H12N2O\/c19-15-11-10-12-6-4-5-9-14(12)16(15)18-17-13-7-2-1-3-8-13\/h1-11,19H\/b18-17+ is not toxic in the NR-ER-LBD assay."}", "/scratch/micpie/export/nr_er_lbd_tox21/test_0-2.jsonl": "{"text":"Based on the canonical SMILES representation CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C, the molecule has no NR-ER-LBD toxicity properties."} {"text":"Based on the SELFIES [C][C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1], the molecule has no estrogen receptor alpha ligand binding domain toxicity features."}", "/scratch/micpie/export/nr_er_lbd_tox21/valid_0-10.jsonl": "{"text":"User: Can you create the SELFIES of a molecule that is not toxic in the NR-estrogen receptor alpha ligand binding domain assay?\nAssistant: Of course, here you go: [C][C][=C][N][=C][Branch1][C][C][C][Branch1][C][C][=N][Ring1][Branch2]"} {"text":"User: Can you give me the canonical SMILES of a molecule that is not toxic in the NR-estrogen receptor alpha ligand binding domain assay?\nAssistant: Yes, here you go: Oc1ccc2ccccc2c1\/N=N\/c1ccccc1"}", "/scratch/micpie/export/nr_er_lbd_tox21/train_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-ER-LBD assay.\nMolecule SELFIES: 
[C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the NR-ER-LBD assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-estrogen receptor alpha ligand binding domain assay.\nSMILES: CC(=O)OC(C)=O\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the NR-estrogen receptor alpha ligand binding domain assay."}", "/scratch/micpie/export/nr_er_lbd_tox21/valid_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-ER-LBD assay.\nDeepSMILES: CccncC)cC)n6\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the NR-ER-LBD assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-estrogen receptor alpha ligand binding domain assay.\nMolecule InChI: InChI=1S\/C16H12N2O\/c19-15-11-10-12-6-4-5-9-14(12)16(15)18-17-13-7-2-1-3-8-13\/h1-11,19H\/b18-17+\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the NR-estrogen receptor alpha ligand binding domain assay."}", "/scratch/micpie/export/nr_er_lbd_tox21/test_0-9.jsonl": "{"text":"User: Is the molecule with the canonical SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C toxic in the NR-estrogen receptor alpha ligand binding domain assay?\nAssistant: No, it is not toxic in the NR-estrogen receptor alpha ligand binding domain assay."} {"text":"User: Is the molecule with the canonical SMILES CCc1cccc(C)c1 toxic in the estrogen receptor alpha ligand binding domain assay?\nAssistant: No, it is not toxic in the estrogen receptor alpha ligand binding domain assay."}", "/scratch/micpie/export/nr_er_lbd_tox21/test_0-0.jsonl": "{"text":"The molecule with 
the InChI InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20) is not toxic in the NR-ER-LBD assay."} {"text":"The molecule with the canonical SMILES representation of CCc1cccc(C)c1 is not toxic in the estrogen receptor alpha ligand binding domain assay."}", "/scratch/micpie/export/nr_er_lbd_tox21/valid_0-7.jsonl": "{"text":"Task: Please give me a molecule DeepSMILES based on the description.\nDescription: A molecule that is toxic in the NR-estrogen receptor alpha ligand binding domain assay.\nResult: CccncC)cC)n6"} {"text":"Task: Please create a SMILES based on the text description below.\nDescription: A molecule that is toxic in the NR-ER-LBD assay.\nResult: Oc1ccc2ccccc2c1\/N=N\/c1ccccc1"}", "/scratch/micpie/export/nr_er_lbd_tox21/test_0-3.jsonl": "{"text":"The SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C is from a molecule that is not identified as toxic in the NR-ER-LBD assay."} {"text":"The DeepSMILES CCcccccC)c6 is from a molecule that is not identified as toxic in the estrogen receptor alpha ligand binding domain assay."}", "/scratch/micpie/export/nr_er_lbd_tox21/valid_0-11.jsonl": "{"text":"User: I'm looking for the canonical SMILES of a molecule that is not toxic in the NR-ER-LBD assay?\nAssistant: This is a molecule that is not toxic in the NR-ER-LBD assay: Cc1cnc(C)c(C)n1"} {"text":"User: I'm looking for the DeepSMILES of a molecule that is not toxic in the estrogen receptor alpha ligand binding domain assay?\nAssistant: This is a molecule that is not toxic in the estrogen receptor alpha ligand binding domain assay: Occccccccc6c%10\/N=N\/cccccc6"}", "/scratch/micpie/export/nr_er_lbd_tox21/train_0-0.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N] is not toxic in the estrogen receptor alpha ligand binding domain assay."} {"text":"The molecule with the SELFIES 
representation of [C][C][=Branch1][C][=O][O][C][Branch1][C][C][=O] is not toxic in the NR-estrogen receptor alpha ligand binding domain assay."}", "/scratch/micpie/export/nr_er_lbd_tox21/test_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-estrogen receptor alpha ligand binding domain assay.\nMolecule SELFIES: [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the NR-estrogen receptor alpha ligand binding domain assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-estrogen receptor alpha ligand binding domain assay.\nMolecule DeepSMILES: CCcccccC)c6\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the NR-estrogen receptor alpha ligand binding domain assay."}", "/scratch/micpie/export/nr_er_lbd_tox21/train_0-10.jsonl": "{"text":"User: Can you give me the SMILES of a molecule that is not toxic in the estrogen receptor alpha ligand binding domain assay?\nAssistant: Yes, I'm happy to help, here you go: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} {"text":"User: Can you give me the canonical SMILES of a molecule that is not toxic in the NR-ER-LBD assay?\nAssistant: Yes, I'm happy to help, here you go: CC(=O)OC(C)=O"}", "/scratch/micpie/export/nr_er_lbd_tox21/train_0-3.jsonl": "{"text":"The SELFIES [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N] represents a molecule that is not identified as toxic in the NR-ER-LBD assay."} {"text":"The DeepSMILES CC=O)OCC)=O is from a molecule that is not identified as toxic in the estrogen receptor alpha ligand binding domain assay."}", "/scratch/micpie/export/nr_er_lbd_tox21/train_0-12.jsonl": 
"{"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very curious. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be toxic in the NR-estrogen receptor alpha ligand binding domain assay.\nAssistant: Got it, this SMILES is not toxic in the NR-estrogen receptor alpha ligand binding domain assay: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} {"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be toxic in the NR-estrogen receptor alpha ligand binding domain assay.\nAssistant: Got it, here you go, this canonical SMILES is not toxic in the NR-estrogen receptor alpha ligand binding domain assay: CC(=O)OC(C)=O"}", "/scratch/micpie/export/nr_er_lbd_tox21/test_0-13.jsonl": "{"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the estrogen receptor alpha ligand binding domain assay.\nAssistant: Got it, this InChI is not toxic in the estrogen receptor alpha ligand binding domain assay: InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20)"} {"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should not be toxic in the NR-ER-LBD assay.\nAssistant: Understood, this SELFIES is not toxic in the NR-ER-LBD assay: [C][C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1]"}", "/scratch/micpie/export/nr_er_lbd_tox21/valid_0-2.jsonl": "{"text":"Based on the SMILES Cc1cnc(C)c(C)n1, the molecule has no estrogen receptor alpha ligand binding domain toxicity properties."} {"text":"Based on the SELFIES representation [O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][\/N][=N][\/C][=C][C][=C][C][=C][Ring1][=Branch1], the molecule has no NR-ER-LBD toxicity properties."}", "/scratch/micpie/export/nr_er_lbd_tox21/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N] toxic in the NR-estrogen receptor alpha ligand binding domain assay?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any other words.\nOptions:\n[1] False\n[2] True\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [C][C][=Branch1][C][=O][O][C][Branch1][C][C][=O] toxic in the NR-estrogen receptor alpha ligand binding domain assay?\nConstraint: Even if you are uncertain, you must pick either A or B without using any other words.\nOptions:\n[A] False\n[B] True\nAnswer: A"}", "/scratch/micpie/export/nr_er_lbd_tox21/valid_0-1.jsonl": "{"text":"The molecule with the InChI representation of InChI=1S\/C7H10N2\/c1-5-4-8-6(2)7(3)9-5\/h4H,1-3H3 is not displaying toxicity in the NR-ER ligand binding domain assay."} {"text":"The molecule with the canonical SMILES Oc1ccc2ccccc2c1\/N=N\/c1ccccc1 is not showing toxicity in the NR-ER-LBD assay."}", "/scratch/micpie/export/nr_er_lbd_tox21/valid_0-13.jsonl": "{"text":"User: I 
want to generate a DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the NR-estrogen receptor alpha ligand binding domain assay.\nAssistant: Got it, this DeepSMILES is not toxic in the NR-estrogen receptor alpha ligand binding domain assay: CccncC)cC)n6"} {"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the NR-estrogen receptor alpha ligand binding domain assay.\nAssistant: Ok, this SELFIES is not toxic in the NR-estrogen receptor alpha ligand binding domain assay: [O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][\/N][=N][\/C][=C][C][=C][C][=C][Ring1][=Branch1]"}", "/scratch/micpie/export/nr_er_lbd_tox21/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-ER-LBD assay.\ncanonical SMILES: Cc1cnc(C)c(C)n1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-ER-LBD assay.\nDeepSMILES: Occccccccc6c%10\/N=N\/cccccc6\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"}", "/scratch/micpie/export/nr_er_lbd_tox21/train_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the estrogen receptor alpha ligand binding domain assay?\nConstraint: You must select none, one or more options from a, b, c, d, or e without using any other words.\nOptions:\na NS=O)=O)ccccC=O)O))cc6\nb C=COCCOCCOCCOC=C\nc O=ccccccc6c=O)cc%10ccCl)c[nH]cc[nH]c6%10))cCl)ccc=O)cccccc6c=O)c%10%14\nd CCOccccncSN)=O)=O))sc5c9\ne 
cccc-cnc[nH]c5-cccccc6)))))))))))cc6\nAnswer: a, b, c, d, e"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-ER-LBD assay?\nConstraint: You must select none, one or more options from a or b without using any additional words.\nOptions:\na.) CC(=O)OC(C)=O\nb.) C=C(C)C1CC=C(C)CC1\nAnswer: a, b"}", "/scratch/micpie/export/nr_er_lbd_tox21/valid_0-4.jsonl": "{"text":"The InChI InChI=1S\/C7H10N2\/c1-5-4-8-6(2)7(3)9-5\/h4H,1-3H3 is not toxic in the NR-ER-LBD assay."} {"text":"The InChI InChI=1S\/C16H12N2O\/c19-15-11-10-12-6-4-5-9-14(12)16(15)18-17-13-7-2-1-3-8-13\/h1-11,19H\/b18-17+ is not toxic in the NR-estrogen receptor alpha ligand binding domain assay."}", "/scratch/micpie/export/nr_er_lbd_tox21/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-estrogen receptor alpha ligand binding domain assay.\nMolecule canonical SMILES: CCOc1ccc2nc(S(N)(=O)=O)sc2c1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-estrogen receptor alpha ligand binding domain assay.\nDeepSMILES: CC=O)OCC)=O\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"}", "/scratch/micpie/export/nr_er_lbd_tox21/valid_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-ER-LBD assay?\nConstraint: You must select none, one or more options from A or B without using any additional words.\nOptions:\n[A] Cc1cnc(C)c(C)n1\n[B] COC(=O)C1=C(C)NC(C)=C(C(=O)OCCCN2CCC(c3ccccc3)(c3ccccc3)CC2)C1c1cccc([N+](=O)[O-])c1\nAnswer: A, B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic 
in the estrogen receptor alpha ligand binding domain assay?\nConstraint: You must select none, one or more options from 1 or 2 without using any other words.\nOptions:\n1 Oc1ccc2ccccc2c1\/N=N\/c1ccccc1\n2 C=CCCCCCCCCO\nAnswer: 1, 2"}", "/scratch/micpie/export/nr_er_lbd_tox21/valid_0-12.jsonl": "{"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very curious. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be toxic in the estrogen receptor alpha ligand binding domain assay.\nAssistant: Ok, this InChI is not toxic in the estrogen receptor alpha ligand binding domain assay: InChI=1S\/C7H10N2\/c1-5-4-8-6(2)7(3)9-5\/h4H,1-3H3"} {"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be toxic in the estrogen receptor alpha ligand binding domain assay.\nAssistant: Ok, here you go, this DeepSMILES is not toxic in the estrogen receptor alpha ligand binding domain assay: Occccccccc6c%10\/N=N\/cccccc6"}", "/scratch/micpie/export/nr_er_lbd_tox21/train_0-2.jsonl": "{"text":"Based on the DeepSMILES CCOccccncSN)=O)=O))sc5c9, the molecule has no NR-estrogen receptor alpha ligand binding domain toxicity properties."} {"text":"Based on the InChI representation InChI=1S\/C4H6O3\/c1-3(5)7-4(2)6\/h1-2H3, the molecule has no NR-estrogen receptor alpha ligand binding domain toxicity features."}", "/scratch/micpie/export/nr_er_lbd_tox21/test_0-11.jsonl": "{"text":"User: I'm looking for the InChI of a molecule that is not toxic in the NR-ER-LBD assay?\nAssistant: This is a molecule that is not toxic in the NR-ER-LBD assay: InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20)"} {"text":"User: I'm searching for the canonical SMILES of a molecule that is not toxic in the NR-ER-LBD assay?\nAssistant: This is a molecule that is 
not toxic in the NR-ER-LBD assay: CCc1cccc(C)c1"}", "/scratch/micpie/export/nr_er_lbd_tox21/train_0-7.jsonl": "{"text":"Task: Please create a molecule InChI based on the text description.\nDescription: A molecule that is toxic in the estrogen receptor alpha ligand binding domain assay.\nResult: InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13)"} {"text":"Task: Please give me a SMILES based on the text description.\nDescription: A molecule that is toxic in the NR-estrogen receptor alpha ligand binding domain assay.\nResult: CC(=O)OC(C)=O"}", "/scratch/micpie/export/nr_er_lbd_tox21/train_0-11.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that is not toxic in the NR-estrogen receptor alpha ligand binding domain assay?\nAssistant: This is a molecule that is not toxic in the NR-estrogen receptor alpha ligand binding domain assay: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} {"text":"User: I'm looking for the SELFIES of a molecule that is not toxic in the NR-estrogen receptor alpha ligand binding domain assay?\nAssistant: This is a molecule that is not toxic in the NR-estrogen receptor alpha ligand binding domain assay: [C][C][=Branch1][C][=O][O][C][Branch1][C][C][=O]"}", "/scratch/micpie/export/nr_er_lbd_tox21/train_0-1.jsonl": "{"text":"The molecule with the InChI representation of InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13) is not demonstrating toxicity in the NR-estrogen-LBD receptor alpha assay."} {"text":"The molecule with the InChI representation of InChI=1S\/C4H6O3\/c1-3(5)7-4(2)6\/h1-2H3 is not displaying toxicity in the NR-ER ligand binding domain assay."}", "/scratch/micpie/export/nr_er_lbd_tox21/train_0-13.jsonl": "{"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the NR-ER-LBD assay.\nAssistant: Got it, this DeepSMILES is not toxic in the NR-ER-LBD assay: CCOccccncSN)=O)=O))sc5c9"} {"text":"User: I want to generate a InChI.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the NR-estrogen receptor alpha ligand binding domain assay.\nAssistant: Understood, this InChI is not toxic in the NR-estrogen receptor alpha ligand binding domain assay: InChI=1S\/C4H6O3\/c1-3(5)7-4(2)6\/h1-2H3"}", "/scratch/micpie/export/nr_er_lbd_tox21/train_0-4.jsonl": "{"text":"The molecule SMILES CCOc1ccc2nc(S(N)(=O)=O)sc2c1 is not toxic in the estrogen receptor alpha ligand binding domain assay."} {"text":"The InChI InChI=1S\/C4H6O3\/c1-3(5)7-4(2)6\/h1-2H3 is not toxic in the NR-ER-LBD assay."}", "/scratch/micpie/export/nr_er_lbd_tox21/test_0-7.jsonl": "{"text":"Task: Please create a SELFIES based on the description below.\nDescription: A molecule that is toxic in the NR-ER-LBD assay.\nResult: [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C]"} {"text":"Task: Please give me a molecule SELFIES based on the text description below.\nDescription: A molecule that is toxic in the NR-estrogen receptor alpha ligand binding domain assay.\nResult: [C][C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1]"}", "/scratch/micpie/export/nr_er_lbd_tox21/train_0-9.jsonl": "{"text":"User: Is the molecule with the canonical SMILES CCOc1ccc2nc(S(N)(=O)=O)sc2c1 toxic in the NR-ER-LBD assay?\nAssistant: No, it is not toxic in the NR-ER-LBD assay."} {"text":"User: Is the molecule with the SMILES CC(=O)OC(C)=O toxic in the NR-estrogen receptor alpha ligand binding domain assay?\nAssistant: No, it is not toxic in the NR-estrogen receptor alpha ligand binding domain assay."}", "/scratch/micpie/export/nr_er_lbd_tox21/valid_0-3.jsonl": 
"{"text":"The SMILES Cc1cnc(C)c(C)n1 represents a molecule that is not identified as toxic in the NR-estrogen receptor alpha ligand binding domain assay."} {"text":"The SMILES Oc1ccc2ccccc2c1\/N=N\/c1ccccc1 represents a molecule that is not identified as toxic in the NR-estrogen receptor alpha ligand binding domain assay."}", "/scratch/micpie/export/nr_er_lbd_tox21/test_0-8.jsonl": "{"text":"User: Can you figure out if the molecule with the SELFIES [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C] is toxic in the estrogen receptor alpha ligand binding domain assay?\nAssistant: No, this molecule is not toxic in the estrogen receptor alpha ligand binding domain assay."} {"text":"User: Can you figure out if the molecule with the canonical SMILES CCc1cccc(C)c1 is toxic in the NR-ER-LBD assay?\nAssistant: No, this molecule is not toxic in the NR-ER-LBD assay."}", "/scratch/micpie/export/nr_er_lbd_tox21/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C] toxic in the NR-estrogen receptor alpha ligand binding domain assay?\nConstraint: Even if you are uncertain, you must pick either a or b without using any additional words.\nOptions:\na. False\nb. 
True\nAnswer: a"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of CCc1cccc(C)c1 toxic in the NR-ER-LBD assay?\nConstraint: Even if you are not sure, you must pick either a or b without using any additional words.\nOptions:\n(a) False\n(b) True\nAnswer: a"}", "/scratch/micpie/export/nr_er_lbd_tox21/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES representation of CccncC)cC)n6 toxic in the estrogen receptor alpha ligand binding domain assay?\nConstraint: Even if you are not sure, you must pick either A or B without using any additional words.\nOptions:\nA.) True\nB.) False\nAnswer: B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [O][C][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Ring1][#Branch2][\/N][=N][\/C][=C][C][=C][C][=C][Ring1][=Branch1] toxic in the NR-estrogen receptor alpha ligand binding domain assay?\nConstraint: Even if you are uncertain, you must pick either A or B without using any additional words.\nOptions:\n[A] True\n[B] False\nAnswer: B"}", "/scratch/micpie/export/nr_er_lbd_tox21/test_0-4.jsonl": "{"text":"The SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C is not toxic in the estrogen receptor alpha ligand binding domain assay."} {"text":"The molecule InChI InChI=1S\/C9H12\/c1-3-9-6-4-5-8(2)7-9\/h4-7H,3H2,1-2H3 is not toxic in the NR-ER-LBD assay."}", "/scratch/micpie/export/nr_er_lbd_tox21/test_0-12.jsonl": "{"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should not be toxic in the estrogen receptor alpha ligand binding domain assay.\nAssistant: Got it, this SMILES is not toxic in the estrogen receptor alpha ligand binding domain assay: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be toxic in the NR-ER-LBD assay.\nAssistant: Ok, here you go, this SELFIES is not toxic in the NR-ER-LBD assay: [C][C][C][=C][C][=C][C][Branch1][C][C][=C][Ring1][#Branch1]"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/test_0-10.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that is not inhibiting CYP2C9?\nAssistant: This is a molecule that is not inhibiting CYP2C9: CCOC(=O)C(NC(C)=O)C(OC(C)=O)c1cccc(N(CCO)CCO)c1"} {"text":"User: I'm looking for the InChI of a molecule that is not inhibiting CYP P450 2C9?\nAssistant: This is a molecule that is not inhibiting CYP P450 2C9: InChI=1S\/C14H20N2O4\/c1-3-6-15-7-8-19-9-10-20-14-5-4-12(2)11-13(14)16(17)18\/h3-5,11,15H,1,6-10H2,2H3"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/valid_0-8.jsonl": "{"text":"User: Is the molecule with the SMILES Cc1ccc2cc(C)c3nnc(SCC(=O)NCc4ccco4)n3c2c1 inhibiting CYP2C9?\nAssistant: Yes, it is inhibiting CYP2C9."} {"text":"User: Is the molecule with the InChI InChI=1S\/C23H28N2O2\/c1-27-21-10-8-19(9-11-21)22(26)24-16-13-23(14-17-24)12-5-15-25(18-23)20-6-3-2-4-7-20\/h2-4,6-11H,5,12-18H2,1H3 inhibiting CYP2C9?\nAssistant: Yes, it is inhibiting CYP2C9."}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/train_0-8.jsonl": "{"text":"User: Is the molecule with the DeepSMILES Clcccccc6-cnc-cccccc6))))))n[nH]5 inhibiting CYP2C9?\nAssistant: Yes, it is inhibiting CYP2C9."} {"text":"User: Is the molecule with the SELFIES 
[C][C][C][=N][C][=C][Branch1][Ring1][C][O][N][Ring1][#Branch1][C][C][=C][C][=C][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1] inhibiting CYP P450 2C9?\nAssistant: No, it is not inhibiting CYP P450 2C9."}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2C9.\nMolecule DeepSMILES: CCOC=O)CNCC)=O)))COCC)=O)))cccccNCCO)))CCO))))c6\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting CYP P450 2C9."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP2C9.\nSELFIES: [C][=C][C][N][C][C][O][C][C][O][C][=C][C][=C][Branch1][C][C][C][=C][Ring1][#Branch1][N+1][=Branch1][C][=O][O-1]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting CYP2C9."}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/valid_0-9.jsonl": "{"text":"User: Can you generate the canonical SMILES of a molecule that is inhibiting CYP P450 2C9?\nAssistant: Of course, here you go: Cc1ccc2cc(C)c3nnc(SCC(=O)NCc4ccco4)n3c2c1"} {"text":"User: Can you give me the DeepSMILES of a molecule that is inhibiting CYP P450 2C9?\nAssistant: Yes, here you go: COccccC=O)NCCCCCCNcccccc6))))))C6)))))CC6)))))))cc6"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/test_0-1.jsonl": "{"text":"Based on the SELFIES representation [C][C][O][C][=Branch1][C][=O][C][Branch1][#Branch1][N][C][Branch1][C][C][=O][C][Branch1][#Branch1][O][C][Branch1][C][C][=O][C][=C][C][=C][C][Branch1][#Branch2][N][Branch1][Ring2][C][C][O][C][C][O][=C][Ring1][=N], the molecule shows no inhibition of CYP2C9."} {"text":"Based on the InChI InChI=1S\/C14H20N2O4\/c1-3-6-15-7-8-19-9-10-20-14-5-4-12(2)11-13(14)16(17)18\/h3-5,11,15H,1,6-10H2,2H3, the molecule exhibits no inhibition of CYP2C9."}", 
"/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/valid_0-0.jsonl": "{"text":"The molecule with the SMILES representation of Cc1ccc2cc(C)c3nnc(SCC(=O)NCc4ccco4)n3c2c1 displays inhibition of CYP2C9."} {"text":"The molecule with the SMILES representation of COc1ccc(C(=O)N2CCC3(CCCN(c4ccccc4)C3)CC2)cc1 shows inhibition of CYP P450 2C9."}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/test_0-2.jsonl": "{"text":"The SELFIES [C][C][O][C][=Branch1][C][=O][C][Branch1][#Branch1][N][C][Branch1][C][C][=O][C][Branch1][#Branch1][O][C][Branch1][C][C][=O][C][=C][C][=C][C][Branch1][#Branch2][N][Branch1][Ring2][C][C][O][C][C][O][=C][Ring1][=N] is from a molecule that displays no inhibition of CYP2C9."} {"text":"The DeepSMILES C=CCNCCOCCOccccC)cc6[N+]=O)[O-] is from a molecule that shows no inhibition of CYP2C9."}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/valid_0-10.jsonl": "{"text":"User: I'm searching for the SELFIES of a molecule that is inhibiting CYP P450 2C9?\nAssistant: This is a molecule that is inhibiting CYP P450 2C9: [C][C][=C][C][=C][C][=C][Branch1][C][C][C][=N][N][=C][Branch1][S][S][C][C][=Branch1][C][=O][N][C][C][=C][C][=C][O][Ring1][Branch1][N][Ring1][S][C][Ring2][Ring1][Branch1][=C][Ring2][Ring1][=Branch2]"} {"text":"User: I'm searching for the canonical SMILES of a molecule that is inhibiting CYP P450 2C9?\nAssistant: This is a molecule that is inhibiting CYP P450 2C9: COc1ccc(C(=O)N2CCC3(CCCN(c4ccccc4)C3)CC2)cc1"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/train_0-6.jsonl": "{"text":"Task: Please give me a canonical SMILES based on the text description.\nDescription: A molecule that is inhibiting CYP2C9.\nResult: Clc1ccccc1-c1nc(-c2ccccc2)n[nH]1"} {"text":"Task: Please give me a SELFIES based on the description.\nDescription: A molecule that is inhibiting CYP P450 2C9.\nResult: 
[C][C][C][=N][C][=C][Branch1][Ring1][C][O][N][Ring1][#Branch1][C][C][=C][C][=C][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1]"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/valid_0-6.jsonl": "{"text":"Task: Please generate a molecule InChI based on the description below.\nDescription: A molecule that is inhibiting CYP2C9.\nResult: InChI=1S\/C19H18N4O2S\/c1-12-5-6-14-9-13(2)18-21-22-19(23(18)16(14)8-12)26-11-17(24)20-10-15-4-3-7-25-15\/h3-9H,10-11H2,1-2H3,(H,20,24)"} {"text":"Task: Please generate a molecule canonical SMILES based on the text description below.\nDescription: A molecule that is inhibiting CYP P450 2C9.\nResult: COc1ccc(C(=O)N2CCC3(CCCN(c4ccccc4)C3)CC2)cc1"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/test_0-9.jsonl": "{"text":"User: Can you give me the canonical SMILES of a molecule that is not inhibiting CYP2C9?\nAssistant: Sure, here you go: CCOC(=O)C(NC(C)=O)C(OC(C)=O)c1cccc(N(CCO)CCO)c1"} {"text":"User: Can you create the canonical SMILES of a molecule that is not inhibiting CYP2C9?\nAssistant: Sure, here you go: C=CCNCCOCCOc1ccc(C)cc1[N+](=O)[O-]"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/test_0-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of CCOC=O)CNCC)=O)))COCC)=O)))cccccNCCO)))CCO))))c6 exhibits no inhibition of CYP2C9."} {"text":"The molecule with the InChI InChI=1S\/C14H20N2O4\/c1-3-6-15-7-8-19-9-10-20-14-5-4-12(2)11-13(14)16(17)18\/h3-5,11,15H,1,6-10H2,2H3 shows no inhibition of CYP2C9."}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/valid_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the canonical SMILES Cc1ccc2cc(C)c3nnc(SCC(=O)NCc4ccco4)n3c2c1 is inhibiting CYP P450 2C9?\nAssistant: Yes, this molecule is inhibiting CYP P450 2C9."} {"text":"User: Can you tell me if the molecule with the SELFIES 
[C][O][C][=C][C][=C][Branch2][Ring1][P][C][=Branch1][C][=O][N][C][C][C][Branch2][Ring1][C][C][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][N][C][C][Ring1][P][C][=C][Ring2][Ring1][=Branch2] is inhibiting CYP P450 2C9?\nAssistant: Yes, this molecule is inhibiting CYP P450 2C9."}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/test_0-3.jsonl": "{"text":"The canonical SMILES CCOC(=O)C(NC(C)=O)C(OC(C)=O)c1cccc(N(CCO)CCO)c1 is not inhibiting CYP P450 2C9."} {"text":"The molecule SELFIES [C][=C][C][N][C][C][O][C][C][O][C][=C][C][=C][Branch1][C][C][C][=C][Ring1][#Branch1][N+1][=Branch1][C][=O][O-1] is not inhibiting CYP2C9."}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/valid_0-11.jsonl": "{"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should be inhibiting CYP2C9.\nAssistant: Got it, this SMILES is inhibiting CYP2C9: Cc1ccc2cc(C)c3nnc(SCC(=O)NCc4ccco4)n3c2c1"} {"text":"User: I want to come up with a InChI.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should be inhibiting CYP2C9.\nAssistant: Ok, here you go, this InChI is inhibiting CYP2C9: InChI=1S\/C23H28N2O2\/c1-27-21-10-8-19(9-11-21)22(26)24-16-13-23(14-17-24)12-5-15-25(18-23)20-6-3-2-4-7-20\/h2-4,6-11H,5,12-18H2,1H3"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/train_0-0.jsonl": "{"text":"The molecule with the SELFIES [Cl][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][=N][NH1][Ring1][O] shows inhibition of CYP P450 2C9."} {"text":"The molecule with the InChI representation of InChI=1S\/C17H18N2O\/c1-2-17-18-10-15(12-20)19(17)11-14-8-5-7-13-6-3-4-9-16(13)14\/h3-10,20H,2,11-12H2,1H3 exhibits no inhibition of CYP P450 2C9."}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/test_0-6.jsonl": "{"text":"Task: Please generate a molecule DeepSMILES based on the description.\nDescription: A molecule that is inhibiting CYP P450 2C9.\nResult: CCOC=O)CNCC)=O)))COCC)=O)))cccccNCCO)))CCO))))c6"} {"text":"Task: Please generate a DeepSMILES based on the text description below.\nDescription: A molecule that is inhibiting CYP2C9.\nResult: C=CCNCCOCCOccccC)cc6[N+]=O)[O-]"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/train_0-10.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that is inhibiting CYP2C9?\nAssistant: This is a molecule that is inhibiting CYP2C9: Clc1ccccc1-c1nc(-c2ccccc2)n[nH]1"} {"text":"User: I'm searching for the DeepSMILES of a molecule that is not inhibiting CYP P450 2C9?\nAssistant: This is a molecule that is not inhibiting CYP P450 2C9: CCcnccCO))n5Ccccccccccc%106"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/train_0-3.jsonl": "{"text":"The molecule canonical SMILES Clc1ccccc1-c1nc(-c2ccccc2)n[nH]1 is inhibiting CYP P450 2C9."} {"text":"The InChI InChI=1S\/C17H18N2O\/c1-2-17-18-10-15(12-20)19(17)11-14-8-5-7-13-6-3-4-9-16(13)14\/h3-10,20H,2,11-12H2,1H3 is not inhibiting CYP P450 2C9."}", 
"/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/train_0-12.jsonl": "{"text":"User: I want to generate a molecule InChI.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should be inhibiting CYP P450 2C9.\nAssistant: Got it, this InChI is inhibiting CYP P450 2C9: InChI=1S\/C14H10ClN3\/c15-12-9-5-4-8-11(12)14-16-13(17-18-14)10-6-2-1-3-7-10\/h1-9H,(H,16,17,18)"} {"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be inhibiting CYP2C9.\nAssistant: Ok, this canonical SMILES is not inhibiting CYP2C9: CCc1ncc(CO)n1Cc1cccc2ccccc12"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI InChI=1S\/C19H28N2O7\/c1-4-27-19(26)17(20-13(2)24)18(28-14(3)25)15-6-5-7-16(12-15)21(8-10-22)9-11-23\/h5-7,12,17-18,22-23H,4,8-11H2,1-3H3,(H,20,24) inhibiting CYP P450 2C9?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any additional words.\nOptions:\n1: False\n2: True\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [C][=C][C][N][C][C][O][C][C][O][C][=C][C][=C][Branch1][C][C][C][=C][Ring1][#Branch1][N+1][=Branch1][C][=O][O-1] inhibiting CYP2C9?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any other words.\nOptions:\n1. True\n2. 
False\nAnswer: 2"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/valid_0-2.jsonl": "{"text":"The InChI InChI=1S\/C19H18N4O2S\/c1-12-5-6-14-9-13(2)18-21-22-19(23(18)16(14)8-12)26-11-17(24)20-10-15-4-3-7-25-15\/h3-9H,10-11H2,1-2H3,(H,20,24) is from a molecule that exhibits inhibition of CYP P450 2C9."} {"text":"The SMILES COc1ccc(C(=O)N2CCC3(CCCN(c4ccccc4)C3)CC2)cc1 represents a molecule that shows inhibition of CYP2C9."}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are inhibiting CYP P450 2C9?\nConstraint: You must select none, one or more options from a, b, c, or d without using any additional words.\nOptions:\na.) CCCC[C@@H]C[C@H]3CNC=O)cccccc6))))))))cccccc6CF)F)F\nb.) C[C@@H]Ccccccc6)))))))[C@@H]C)N\nc.) Clcccccc6-cnc-cccccc6))))))n[nH]5\nd.) Clcccccc6-cncccNCcccccc6))))))))n6\nAnswer: a, c"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting CYP2C9?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any additional words.\nOptions:\n1.) COccccC[C@H]N)C=O)N[C@H][C@@H]CO))O[C@@H]ncnccNC)C))ncnc69)))))))))[C@H]5O))))))))))cc6\n2.) CCcnccCO))n5Ccccccccccc%106\n3.) 
O=cc-ccccF)cc6))))))nccncNcccccc6)))))))nc6n%10CCC3\nAnswer: 1, 2, 3"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/valid_0-1.jsonl": "{"text":"Based on the canonical SMILES Cc1ccc2cc(C)c3nnc(SCC(=O)NCc4ccco4)n3c2c1, the molecule displays inhibition of CYP P450 2C9."} {"text":"Based on the SELFIES [C][O][C][=C][C][=C][Branch2][Ring1][P][C][=Branch1][C][=O][N][C][C][C][Branch2][Ring1][C][C][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][N][C][C][Ring1][P][C][=C][Ring2][Ring1][=Branch2], the molecule displays inhibition of CYP P450 2C9."}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES CccccccC)cnncSCC=O)NCcccco5))))))))))n5c9c%13 inhibiting CYP2C9?\nConstraint: Even if you are not sure, you must pick either A or B without using any additional words.\nOptions:\n(A) True\n(B) False\nAnswer: A"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI InChI=1S\/C23H28N2O2\/c1-27-21-10-8-19(9-11-21)22(26)24-16-13-23(14-17-24)12-5-15-25(18-23)20-6-3-2-4-7-20\/h2-4,6-11H,5,12-18H2,1H3 inhibiting CYP2C9?\nConstraint: Even if you are not sure, you must pick either a or b without using any additional words.\nOptions:\na: False\nb: True\nAnswer: b"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP2C9.\nMolecule canonical SMILES: Cc1ccc2cc(C)c3nnc(SCC(=O)NCc4ccco4)n3c2c1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is inhibiting CYP2C9."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP2C9.\nDeepSMILES: COccccC=O)NCCCCCCNcccccc6))))))C6)))))CC6)))))))cc6\nConstraint: Answer the question in a complete 
sentence.\nResult: This molecule is inhibiting CYP2C9."}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2C9.\nMolecule SMILES: Cc1ccc2cc(C)c3nnc(SCC(=O)NCc4ccco4)n3c2c1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2C9.\nMolecule SELFIES: [C][O][C][=C][C][=C][Branch2][Ring1][P][C][=Branch1][C][=O][N][C][C][C][Branch2][Ring1][C][C][C][C][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][N][C][C][Ring1][P][C][=C][Ring2][Ring1][=Branch2]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP2C9.\nMolecule SMILES: Clc1ccccc1-c1nc(-c2ccccc2)n[nH]1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is inhibiting CYP2C9."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP2C9.\nMolecule SMILES: CCc1ncc(CO)n1Cc1cccc2ccccc12\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not inhibiting CYP2C9."}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/valid_0-12.jsonl": "{"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should be inhibiting CYP P450 2C9.\nAssistant: Understood, this canonical SMILES is inhibiting CYP P450 2C9: Cc1ccc2cc(C)c3nnc(SCC(=O)NCc4ccco4)n3c2c1"} {"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should be inhibiting CYP P450 2C9.\nAssistant: Got it, this SMILES is inhibiting CYP P450 2C9: COc1ccc(C(=O)N2CCC3(CCCN(c4ccccc4)C3)CC2)cc1"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/train_0-2.jsonl": "{"text":"The InChI InChI=1S\/C14H10ClN3\/c15-12-9-5-4-8-11(12)14-16-13(17-18-14)10-6-2-1-3-7-10\/h1-9H,(H,16,17,18) is from a molecule that displays inhibition of CYP2C9."} {"text":"The canonical SMILES CCc1ncc(CO)n1Cc1cccc2ccccc12 is from a molecule that displays no inhibition of CYP P450 2C9."}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/test_0-11.jsonl": "{"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should not be inhibiting CYP P450 2C9.\nAssistant: Got it, this DeepSMILES is not inhibiting CYP P450 2C9: CCOC=O)CNCC)=O)))COCC)=O)))cccccNCCO)))CCO))))c6"} {"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should not be inhibiting CYP2C9.\nAssistant: Ok, here you go, this SMILES is not inhibiting CYP2C9: C=CCNCCOCCOc1ccc(C)cc1[N+](=O)[O-]"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/train_0-7.jsonl": "{"text":"User: Can you derive if the molecule with the InChI InChI=1S\/C14H10ClN3\/c15-12-9-5-4-8-11(12)14-16-13(17-18-14)10-6-2-1-3-7-10\/h1-9H,(H,16,17,18) is inhibiting CYP2C9?\nAssistant: Yes, this molecule is inhibiting CYP2C9."} {"text":"User: Can you tell me if the molecule with the SMILES CCc1ncc(CO)n1Cc1cccc2ccccc12 is inhibiting CYP2C9?\nAssistant: No, this molecule is not inhibiting CYP2C9."}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/train_0-11.jsonl": "{"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should be inhibiting CYP2C9.\nAssistant: Ok, here you go, this DeepSMILES is inhibiting CYP2C9: Clcccccc6-cnc-cccccc6))))))n[nH]5"} {"text":"User: I want to create a SMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should not be inhibiting CYP P450 2C9.\nAssistant: Got it, here you go, this SMILES is not inhibiting CYP P450 2C9: CCc1ncc(CO)n1Cc1cccc2ccccc12"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/train_0-1.jsonl": "{"text":"Based on the DeepSMILES Clcccccc6-cnc-cccccc6))))))n[nH]5, the molecule exhibits inhibition of CYP P450 2C9."} {"text":"Based on the SMILES CCc1ncc(CO)n1Cc1cccc2ccccc12, the molecule shows no inhibition of CYP2C9."}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES Clc1ccccc1-c1nc(-c2ccccc2)n[nH]1 inhibiting CYP P450 2C9?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any other words.\nOptions:\n1. False\n2. True\nAnswer: 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES representation of CCc1ncc(CO)n1Cc1cccc2ccccc12 inhibiting CYP P450 2C9?\nConstraint: Even if you are not sure, you must pick either A or B without using any additional words.\nOptions:\nA. False\nB. 
True\nAnswer: A"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP2C9.\ncanonical SMILES: Clc1ccccc1-c1nc(-c2ccccc2)n[nH]1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2C9.\nMolecule DeepSMILES: CCcnccCO))n5Ccccccccccc%106\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/test_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the canonical SMILES CCOC(=O)C(NC(C)=O)C(OC(C)=O)c1cccc(N(CCO)CCO)c1 is inhibiting CYP P450 2C9?\nAssistant: No, this molecule is not inhibiting CYP P450 2C9."} {"text":"User: Can you tell me if the molecule with the canonical SMILES C=CCNCCOCCOc1ccc(C)cc1[N+](=O)[O-] is inhibiting CYP P450 2C9?\nAssistant: No, this molecule is not inhibiting CYP P450 2C9."}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/train_0-9.jsonl": "{"text":"User: Can you give me the DeepSMILES of a molecule that is inhibiting CYP2C9?\nAssistant: Yes, here you go: Clcccccc6-cnc-cccccc6))))))n[nH]5"} {"text":"User: Can you generate the SMILES of a molecule that is not inhibiting CYP P450 2C9?\nAssistant: Sure, here you go: CCc1ncc(CO)n1Cc1cccc2ccccc12"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/valid_0-3.jsonl": "{"text":"The molecule SMILES Cc1ccc2cc(C)c3nnc(SCC(=O)NCc4ccco4)n3c2c1 is inhibiting CYP2C9."} {"text":"The molecule InChI InChI=1S\/C23H28N2O2\/c1-27-21-10-8-19(9-11-21)22(26)24-16-13-23(14-17-24)12-5-15-25(18-23)20-6-3-2-4-7-20\/h2-4,6-11H,5,12-18H2,1H3 is inhibiting CYP P450 2C9."}", 
"/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/test_0-8.jsonl": "{"text":"User: Is the molecule with the DeepSMILES CCOC=O)CNCC)=O)))COCC)=O)))cccccNCCO)))CCO))))c6 inhibiting CYP2C9?\nAssistant: No, it is not inhibiting CYP2C9."} {"text":"User: Is the molecule with the SELFIES [C][=C][C][N][C][C][O][C][C][O][C][=C][C][=C][Branch1][C][C][C][=C][Ring1][#Branch1][N+1][=Branch1][C][=O][O-1] inhibiting CYP P450 2C9?\nAssistant: No, it is not inhibiting CYP P450 2C9."}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting CYP2C9?\nConstraint: You must select none, one or more options from A, B, or C without using any additional words.\nOptions:\nA) CCOC(=O)C(NC(C)=O)C(OC(C)=O)c1cccc(N(CCO)CCO)c1\nB) COc1ccc(-n2c(=O)c(-c3ccc(F)c(F)c3)nc3cncnc32)cc1\nC) Cc1cnc(CNc2ncncc2-c2ccc3c(c2)OCO3)cn1\nAnswer: A, B, C"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting CYP2C9?\nConstraint: You must select none, one or more options from A, B, C, or D without using any additional words.\nOptions:\nA. InChI=1S\/C17H19N5O2\/c1-11-16(23)22(10-12-6-5-7-13(8-12)24-4)15-14(19-11)9-18-17(20-15)21(2)3\/h5-9H,10H2,1-4H3\nB. InChI=1S\/C27H30O5\/c1-26-13-12-18-17-9-7-16(28)14-15(17)6-8-19(18)22(26)10-11-23(26)32-27(31)24(29)20-4-2-3-5-21(20)25(27)30\/h2-5,14,17-19,22-23,31H,6-13H2,1H3\/t17-,18-,19+,22-,23-,26-\/m1\/s1\nC. InChI=1S\/C13H12N2O2\/c16-12(17)6-11-13-9(7-14-11)5-8-3-1-2-4-10(8)15-13\/h1-5,11,14H,6-7H2,(H,16,17)\/t11-\/m1\/s1\nD. 
InChI=1S\/C14H20N2O4\/c1-3-6-15-7-8-19-9-10-20-14-5-4-12(2)11-13(14)16(17)18\/h3-5,11,15H,1,6-10H2,2H3\nAnswer: B, C, D"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are inhibiting CYP2C9?\nConstraint: You must select none, one or more options from a, b, or c without using any other words.\nOptions:\n(a) O=c1c(CCc2ccccc2)nc2cncnc2n1Cc1ccc(F)cc1\n(b) O=c1\/c(=C\\c2ccc(O)cc2)sc2n1C(c1ccccc1)C(c1ccccc1)=NN=2\n(c) Cc1ccc2cc(C)c3nnc(SCC(=O)NCc4ccco4)n3c2c1\nAnswer: a, b, c"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are inhibiting CYP2C9?\nConstraint: You must select none, one or more options from a, b, or c without using any other words.\nOptions:\n(a) CCCCNC(=O)CSc1nc2cc(OC)c(OC)cc2c(=O)n1Cc1ccc(OC)cc1\n(b) COc1ccc(C(=O)N2CCC3(CCCN(c4ccccc4)C3)CC2)cc1\n(c) COc1cccc(-c2cc(N(C)Cc3ccco3)ncn2)c1\nAnswer: a, b"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2C9.\nInChI: InChI=1S\/C19H28N2O7\/c1-4-27-19(26)17(20-13(2)24)18(28-14(3)25)15-6-5-7-16(12-15)21(8-10-22)9-11-23\/h5-7,12,17-18,22-23H,4,8-11H2,1-3H3,(H,20,24)\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting CYP P450 2C9.\nDeepSMILES: C=CCNCCOCCOccccC)cc6[N+]=O)[O-]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/cyp_p450_2c9_inhibition_veith_et_al/test_0-12.jsonl": "{"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be inhibiting CYP P450 2C9.\nAssistant: Ok, this canonical SMILES is not inhibiting CYP P450 2C9: CCOC(=O)C(NC(C)=O)C(OC(C)=O)c1cccc(N(CCO)CCO)c1"} {"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should not be inhibiting CYP P450 2C9.\nAssistant: Got it, this SELFIES is not inhibiting CYP P450 2C9: [C][=C][C][N][C][C][O][C][C][O][C][=C][C][=C][Branch1][C][C][C][=C][Ring1][#Branch1][N+1][=Branch1][C][=O][O-1]"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/test_0-10.jsonl": "{"text":"User: Can you generate the SMILES of a molecule that is not P-glycoprotein inhibitory?\nAssistant: Yes, I'm happy to help, here you go: COc1ccc2c(c1)[C@@]13CCCC[C@H]1[C@@H](C2)N(C)CC3"} {"text":"User: Can you give me the InChI of a molecule that is not Pgp inhibitory?\nAssistant: Yes, I'm happy to help, here you go: InChI=1S\/C19H24N2S\/c1-4-20(5-2)15(3)14-21-16-10-6-8-12-18(16)22-19-13-9-7-11-17(19)21\/h6-13,15H,4-5,14H2,1-3H3\/t15-\/m1\/s1"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/valid_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the SELFIES [C][O][C][=Branch1][C][=O][C@@H1][C@@H1][Branch1][Ring1][O][C][C@@H1][Branch2][Ring1][#Branch2][O][C][=Branch1][C][=O][C][=C][C][Branch1][Ring1][O][C][=C][Branch1][Ring1][O][C][C][Branch1][Ring1][O][C][=C][Ring1][N][C][C@H1][C][N][C][C][C][=C][Branch1][S][NH1][C][=C][C][Branch1][Ring1][O][C][=C][C][=C][Ring1][O][Ring1][Branch2][C@H1][Ring1][#C][C][C@@H1][Ring2][Ring2][Branch2][Ring2][Ring1][Ring1] is Pgp inhibitory?\nAssistant: Yes, this molecule is Pgp inhibitory."} {"text":"User: Can you estimate if the molecule with the canonical SMILES Cn1c(=O)c2c(ncn2C)n(C)c1=O is P-glycoprotein inhibitory?\nAssistant: No, this molecule is not P-glycoprotein inhibitory."}", 
"/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/test_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not P-glycoprotein inhibitory?\nConstraint: You must select none, one or more options from a, b, c, or d without using any additional words.\nOptions:\na. InChI=1S\/C27H26N4O2\/c1-31(2)23-15-11-20(12-16-23)26-25(19-9-13-22(28)14-10-19)29-27(30-26)21-7-4-18(5-8-21)6-17-24(32)33-3\/h4-17H,28H2,1-3H3,(H,29,30)\/b17-6+\nb. InChI=1S\/C15H14FN3O3\/c1-3-22-15(21)13-12-7-18(2)14(20)10-6-9(16)4-5-11(10)19(12)8-17-13\/h4-6,8H,3,7H2,1-2H3\nc. InChI=1S\/C21H26BrNO4\/c1-23-7-6-13-9-18(24-2)20(26-4)11-15(13)17(23)8-14-10-19(25-3)21(27-5)12-16(14)22\/h9-12,17H,6-8H2,1-5H3\/t17-\/m0\/s1\nd. InChI=1S\/C18H25NO\/c1-19-10-9-18-8-4-3-5-15(18)17(19)11-13-6-7-14(20-2)12-16(13)18\/h6-7,12,15,17H,3-5,8-11H2,1-2H3\/t15-,17+,18+\/m0\/s1\nAnswer: b, d"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not Pgp inhibitory?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any additional words.\nOptions:\nA) NccccS=O)=O)NccccCl)nn6))))))))cc6\nB) CCC)NC[C@@H]O)COccccC=O)CCcccccc6)))))))))cc6))))))))))CC)C\nC) NS=O)=O)cccC=O)O))cNCcccco5)))))))cc6Cl\nD) COC=O)ccnccNCCNCCcccccc6))))))))CC6))))))ncnc96))))))CCCC6\nE) CCNCC))[C@H]C)CNcccccc6Scccccc6%14\nAnswer: A, C, E"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/train_0-8.jsonl": "{"text":"User: Can you figure out if the molecule with the InChI InChI=1S\/C18H16O8\/c1-23-12-4-8(5-13(24-2)18(12)25-3)17-16(22)15(21)14-10(20)6-9(19)7-11(14)26-17\/h4-7,19-20,22H,1-3H3 is P-glycoprotein inhibitory?\nAssistant: Yes, this molecule is P-glycoprotein inhibitory."} {"text":"User: Can you figure out if the molecule with the SMILES O=C1OCc2cc3cccc(O[C@@H]4O[C@H](CO)[C@@H](O)[C@@H](O)[C@@H]4O)c3c(O)c21 is Pgp inhibitory?\nAssistant: No, this molecule is not Pgp inhibitory."}", 
"/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is Pgp inhibitory.\nMolecule SELFIES: [C][O][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C@@][C][C][C][C][C@H1][Ring1][=Branch1][C@@H1][Branch1][Ring2][C][Ring1][O][N][Branch1][C][C][C][C][Ring1][N]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is P-glycoprotein inhibitory.\nMolecule SMILES: CCN(CC)[C@H](C)CN1c2ccccc2Sc2ccccc21\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/valid_0-9.jsonl": "{"text":"User: Is the molecule with the canonical SMILES COC(=O)[C@H]1[C@@H]2C[C@@H]3c4[nH]c5cc(OC)ccc5c4CCN3C[C@@H]2C[C@H](OC(=O)c2cc(OC)c(OC)c(OC)c2)[C@@H]1OC P-glycoprotein inhibitory?\nAssistant: Yes, it is P-glycoprotein inhibitory."} {"text":"User: Is the molecule with the DeepSMILES Cnc=O)ccncn5C))))nC)c6=O P-glycoprotein inhibitory?\nAssistant: No, it is not P-glycoprotein inhibitory."}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/test_0-1.jsonl": "{"text":"The molecule with the SELFIES [C][O][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C@@][C][C][C][C][C@H1][Ring1][=Branch1][C@@H1][Branch1][Ring2][C][Ring1][O][N][Branch1][C][C][C][C][Ring1][N] is not showing Pgp inhibition."} {"text":"The molecule with the SELFIES [C][C][N][Branch1][Ring1][C][C][C@H1][Branch1][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][S][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][=C] is not showing Pgp inhibition."}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/valid_0-0.jsonl": "{"text":"The molecule with 
the InChI InChI=1S\/C33H40N2O9\/c1-38-19-7-8-20-21-9-10-35-16-18-13-27(44-32(36)17-11-25(39-2)30(41-4)26(12-17)40-3)31(42-5)28(33(37)43-6)22(18)15-24(35)29(21)34-23(20)14-19\/h7-8,11-12,14,18,22,24,27-28,31,34H,9-10,13,15-16H2,1-6H3\/t18-,22+,24+,27-,28-,31-\/m0\/s1 is P-glycoprotein inhibitory."} {"text":"The molecule with the InChI InChI=1S\/C8H10N4O2\/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2\/h4H,1-3H3 is not P-glycoprotein inhibitory."}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/test_0-2.jsonl": "{"text":"Based on the canonical SMILES representation COc1ccc2c(c1)[C@@]13CCCC[C@H]1[C@@H](C2)N(C)CC3, the molecule has no P-glycoprotein inhibition features."} {"text":"Based on the SELFIES representation [C][C][N][Branch1][Ring1][C][C][C@H1][Branch1][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][S][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][=C], the molecule has no P-glycoprotein inhibition properties."}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/valid_0-10.jsonl": "{"text":"User: Can you create the SMILES of a molecule that is P-glycoprotein inhibitory?\nAssistant: Yes, I'm happy to help, here you go: COC(=O)[C@@H]1[C@@H](OC)[C@@H](OC(=O)c2cc(OC)c(OC)c(OC)c2)C[C@H]2CN3CCc4c([nH]c5cc(OC)ccc45)[C@H]3C[C@@H]12"} {"text":"User: Can you generate the SMILES of a molecule that is not Pgp inhibitory?\nAssistant: Sure, here you go: Cn1c(=O)c2c(ncn2C)n(C)c1=O"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/train_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is P-glycoprotein inhibitory.\nMolecule SMILES: COc1cc(-c2oc3cc(O)cc(O)c3c(=O)c2O)cc(OC)c1OC\nConstraint: Answer the question in a full sentence.\nResult: This molecule is P-glycoprotein inhibitory."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is Pgp inhibitory.\ncanonical SMILES: 
O=C1OCc2cc3cccc(O[C@@H]4O[C@H](CO)[C@@H](O)[C@@H](O)[C@@H]4O)c3c(O)c21\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not Pgp inhibitory."}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/valid_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is Pgp inhibitory.\nMolecule canonical SMILES: COC(=O)[C@H]1[C@@H]2C[C@@H]3c4[nH]c5cc(OC)ccc5c4CCN3C[C@@H]2C[C@H](OC(=O)c2cc(OC)c(OC)c(OC)c2)[C@@H]1OC\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is Pgp inhibitory."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is P-glycoprotein inhibitory.\nMolecule DeepSMILES: Cnc=O)ccncn5C))))nC)c6=O\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not P-glycoprotein inhibitory."}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/test_0-9.jsonl": "{"text":"User: Is the molecule with the DeepSMILES COcccccc6)[C@@]CCCC[C@H]6[C@@H]C%10)NC)CC%10 P-glycoprotein inhibitory?\nAssistant: No, it is not P-glycoprotein inhibitory."} {"text":"User: Is the molecule with the DeepSMILES CCNCC))[C@H]C)CNcccccc6Scccccc6%14 Pgp inhibitory?\nAssistant: No, it is not Pgp inhibitory."}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/test_0-0.jsonl": "{"text":"The molecule with the DeepSMILES COcccccc6)[C@@]CCCC[C@H]6[C@@H]C%10)NC)CC%10 is not P-glycoprotein inhibitory."} {"text":"The molecule with the InChI InChI=1S\/C19H24N2S\/c1-4-20(5-2)15(3)14-21-16-10-6-8-12-18(16)22-19-13-9-7-11-17(19)21\/h6-13,15H,4-5,14H2,1-3H3\/t15-\/m1\/s1 is not P-glycoprotein inhibitory."}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/valid_0-7.jsonl": "{"text":"Task: Please create a molecule SMILES based on the text description.\nDescription: A molecule that is P-glycoprotein inhibitory.\nResult: 
COC(=O)[C@@H]1[C@@H](OC)[C@@H](OC(=O)c2cc(OC)c(OC)c(OC)c2)C[C@H]2CN3CCc4c([nH]c5cc(OC)ccc45)[C@H]3C[C@@H]12"} {"text":"Task: Please give me a SELFIES based on the text description below.\nDescription: A molecule that is Pgp inhibitory.\nResult: [C][N][C][=Branch1][C][=O][C][=C][Branch1][#Branch1][N][=C][N][Ring1][Branch1][C][N][Branch1][C][C][C][Ring1][N][=O]"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/test_0-3.jsonl": "{"text":"The canonical SMILES COc1ccc2c(c1)[C@@]13CCCC[C@H]1[C@@H](C2)N(C)CC3 is from a molecule that is not identified as Pgp inhibitory."} {"text":"The SMILES CCN(CC)[C@H](C)CN1c2ccccc2Sc2ccccc21 represents a molecule that is not identified as P-glycoprotein inhibitory."}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/valid_0-11.jsonl": "{"text":"User: I'm looking for the DeepSMILES of a molecule that is Pgp inhibitory?\nAssistant: This is a molecule that is Pgp inhibitory: COC=O)[C@@H][C@@H]OC))[C@@H]OC=O)cccOC))cOC))cOC))c6))))))))C[C@H]CNCCcc[nH]cccOC))ccc96)))))))[C@H]6C[C@@H]%14%10"} {"text":"User: I'm searching for the SMILES of a molecule that is not P-glycoprotein inhibitory?\nAssistant: This is a molecule that is not P-glycoprotein inhibitory: Cn1c(=O)c2c(ncn2C)n(C)c1=O"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/train_0-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of COc1cc(-c2oc3cc(O)cc(O)c3c(=O)c2O)cc(OC)c1OC is P-glycoprotein inhibitory."} {"text":"The molecule with the SMILES O=C1OCc2cc3cccc(O[C@@H]4O[C@H](CO)[C@@H](O)[C@@H](O)[C@@H]4O)c3c(O)c21 is not Pgp inhibitory."}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/test_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is Pgp inhibitory.\nSMILES: COc1ccc2c(c1)[C@@]13CCCC[C@H]1[C@@H](C2)N(C)CC3\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not Pgp 
inhibitory."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is Pgp inhibitory.\nMolecule SELFIES: [C][C][N][Branch1][Ring1][C][C][C@H1][Branch1][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][S][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][=C]\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not Pgp inhibitory."}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/train_0-10.jsonl": "{"text":"User: Can you generate the canonical SMILES of a molecule that is P-glycoprotein inhibitory?\nAssistant: Sure, here you go: COc1cc(-c2oc3cc(O)cc(O)c3c(=O)c2O)cc(OC)c1OC"} {"text":"User: Can you create the SMILES of a molecule that is not Pgp inhibitory?\nAssistant: Sure, here you go: O=C1OCc2cc3cccc(O[C@@H]4O[C@H](CO)[C@@H](O)[C@@H](O)[C@@H]4O)c3c(O)c21"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/train_0-3.jsonl": "{"text":"The InChI InChI=1S\/C18H16O8\/c1-23-12-4-8(5-13(24-2)18(12)25-3)17-16(22)15(21)14-10(20)6-9(19)7-11(14)26-17\/h4-7,19-20,22H,1-3H3 is from a molecule that is identified as P-glycoprotein inhibitory."} {"text":"The InChI InChI=1S\/C18H18O9\/c19-5-10-13(20)15(22)16(23)18(27-10)26-9-3-1-2-7-4-8-6-25-17(24)12(8)14(21)11(7)9\/h1-4,10,13,15-16,18-23H,5-6H2\/t10-,13-,15-,16+,18-\/m1\/s1 is from a molecule that is not identified as Pgp inhibitory."}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/train_0-12.jsonl": "{"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should be P-glycoprotein inhibitory.\nAssistant: Got it, here you go, this InChI is P-glycoprotein inhibitory: InChI=1S\/C18H16O8\/c1-23-12-4-8(5-13(24-2)18(12)25-3)17-16(22)15(21)14-10(20)6-9(19)7-11(14)26-17\/h4-7,19-20,22H,1-3H3"} {"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very exciting. 
Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be P-glycoprotein inhibitory.\nAssistant: Ok, here you go, this DeepSMILES is not P-glycoprotein inhibitory: O=COCcccccccO[C@@H]O[C@H]CO))[C@@H]O)[C@@H]O)[C@@H]6O))))))))c6cO)c%10%13"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/test_0-13.jsonl": "{"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be Pgp inhibitory.\nAssistant: Understood, this DeepSMILES is not Pgp inhibitory: COcccccc6)[C@@]CCCC[C@H]6[C@@H]C%10)NC)CC%10"} {"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be P-glycoprotein inhibitory.\nAssistant: Got it, this canonical SMILES is not P-glycoprotein inhibitory: CCN(CC)[C@H](C)CN1c2ccccc2Sc2ccccc21"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/valid_0-2.jsonl": "{"text":"Based on the DeepSMILES representation COC=O)[C@@H][C@@H]OC))[C@@H]OC=O)cccOC))cOC))cOC))c6))))))))C[C@H]CNCCcc[nH]cccOC))ccc96)))))))[C@H]6C[C@@H]%14%10, the molecule has P-glycoprotein inhibition properties."} {"text":"Based on the SMILES representation Cn1c(=O)c2c(ncn2C)n(C)c1=O, the molecule has no Pgp inhibition properties."}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI InChI=1S\/C18H16O8\/c1-23-12-4-8(5-13(24-2)18(12)25-3)17-16(22)15(21)14-10(20)6-9(19)7-11(14)26-17\/h4-7,19-20,22H,1-3H3 Pgp inhibitory?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any other words.\nOptions:\n(1) True\n(2) False\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI representation of 
InChI=1S\/C18H18O9\/c19-5-10-13(20)15(22)16(23)18(27-10)26-9-3-1-2-7-4-8-6-25-17(24)12(8)14(21)11(7)9\/h1-4,10,13,15-16,18-23H,5-6H2\/t10-,13-,15-,16+,18-\/m1\/s1 Pgp inhibitory?\nConstraint: Even if you are not sure, you must pick either A or B without using any additional words.\nOptions:\n[A] True\n[B] False\nAnswer: B"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/valid_0-1.jsonl": "{"text":"The molecule with the SMILES representation of COC(=O)[C@@H]1[C@@H](OC)[C@@H](OC(=O)c2cc(OC)c(OC)c(OC)c2)C[C@H]2CN3CCc4c([nH]c5cc(OC)ccc45)[C@H]3C[C@@H]12 is showing Pgp inhibition."} {"text":"The molecule with the SELFIES representation of [C][N][C][=Branch1][C][=O][C][=C][Branch1][#Branch1][N][=C][N][Ring1][Branch1][C][N][Branch1][C][C][C][Ring1][N][=O] is not showing Pgp inhibition."}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/valid_0-13.jsonl": "{"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should be Pgp inhibitory.\nAssistant: Understood, this DeepSMILES is Pgp inhibitory: COC=O)[C@@H][C@@H]OC))[C@@H]OC=O)cccOC))cOC))cOC))c6))))))))C[C@H]CNCCcc[nH]cccOC))ccc96)))))))[C@H]6C[C@@H]%14%10"} {"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be P-glycoprotein inhibitory.\nAssistant: Understood, this SMILES is not P-glycoprotein inhibitory: Cn1c(=O)c2c(ncn2C)n(C)c1=O"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is Pgp inhibitory.\nDeepSMILES: COC=O)[C@@H][C@@H]OC))[C@@H]OC=O)cccOC))cOC))cOC))c6))))))))C[C@H]CNCCcc[nH]cccOC))ccc96)))))))[C@H]6C[C@@H]%14%10\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is P-glycoprotein inhibitory.\ncanonical SMILES: Cn1c(=O)c2c(ncn2C)n(C)c1=O\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/train_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are Pgp inhibitory?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any additional words.\nOptions:\n(1) COccc-cocccO)ccO)c6c=O)c%10O)))))))))))ccOC))c6OC\n(2) O=ccc-ccccOCCOCCOCCOCCO)))))))))))))cc6))))))occcO)ccO)c%106\n(3) CO[C@]C[C@@H]COC=O)ccnccBr)c6)))))))))CNC)[C@@H]6CccnC)ccccc%15c96\n(4) O[C@H]C=CCCNCcccccc6[C@H][C@@H]%15O))[C@H]%13%10))))OCO5\nAnswer: 1, 2, 3"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not Pgp inhibitory?\nConstraint: You must select none, one or more options from 1, 2, 3, 4, or 5 without using any additional words.\nOptions:\n(1) InChI=1S\/C2H6O3S2\/c3-7(4,5)2-1-6\/h6H,1-2H2,(H,3,4,5)\n(2) InChI=1S\/C14H12O3S\/c1-9(14(16)17)10-4-6-11(7-5-10)13(15)12-3-2-8-18-12\/h2-9H,1H3,(H,16,17)\/t9-\/m1\/s1\n(3) 
InChI=1S\/C18H18O9\/c19-5-10-13(20)15(22)16(23)18(27-10)26-9-3-1-2-7-4-8-6-25-17(24)12(8)14(21)11(7)9\/h1-4,10,13,15-16,18-23H,5-6H2\/t10-,13-,15-,16+,18-\/m1\/s1\n(4) InChI=1S\/C30H30N4O4\/c1-37-27-16-21-11-13-34(19-23(21)17-28(27)38-2)14-12-31-29(35)24-9-5-6-10-25(24)33-30(36)26-15-20-7-3-4-8-22(20)18-32-26\/h3-10,15-18H,11-14,19H2,1-2H3,(H,31,35)(H,33,36)\n(5) InChI=1S\/C14H15N\/c15-11-14(12-7-3-1-4-8-12)13-9-5-2-6-10-13\/h1-10,14H,11,15H2\nAnswer: 1, 2, 3, 5"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/valid_0-4.jsonl": "{"text":"The molecule DeepSMILES COC=O)[C@@H][C@@H]OC))[C@@H]OC=O)cccOC))cOC))cOC))c6))))))))C[C@H]CNCCcc[nH]cccOC))ccc96)))))))[C@H]6C[C@@H]%14%10 is Pgp inhibitory."} {"text":"The SELFIES [C][N][C][=Branch1][C][=O][C][=C][Branch1][#Branch1][N][=C][N][Ring1][Branch1][C][N][Branch1][C][C][C][Ring1][N][=O] is not Pgp inhibitory."}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is Pgp inhibitory.\nMolecule canonical SMILES: COc1cc(-c2oc3cc(O)cc(O)c3c(=O)c2O)cc(OC)c1OC\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is P-glycoprotein inhibitory.\nMolecule DeepSMILES: O=COCcccccccO[C@@H]O[C@H]CO))[C@@H]O)[C@@H]O)[C@@H]6O))))))))c6cO)c%10%13\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/valid_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are P-glycoprotein inhibitory?\nConstraint: You must select none, one or more options from a, b, c, or d without using any additional words.\nOptions:\na.) 
COC=O)[C@@H][C@@H]OC))[C@@H]OC=O)cccOC))cOC))cOC))c6))))))))C[C@H]CNCCcc[nH]cccOC))ccc96)))))))[C@H]6C[C@@H]%14%10\nb.) CO[C@@H]CCcccccc6))))))))cccccc6OC[C@H]O)CNCCCcccccc6))))))cccccc6\nc.) C=N[C@@H]C=O)N[C@@H]C=O)[C@H][C@@H]4SCC)C)[C@@H]5C=O)O)))))))))))cccccc6\nd.) C[C@H]NCCcccO)cO)cc6%10\nAnswer: a, b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not Pgp inhibitory?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any additional words.\nOptions:\n[A] O[C@@H](CCc1ccccc1)c1ccccc1OC[C@H](O)CN1CCOCC1\n[B] CC(=O)C[C@@H](c1ccccc1)c1c(O)c2ccccc2oc1=O\n[C] O=c1[nH]c2ccccc2n1C1CCN(CCCC(c2ccc(F)cc2)c2ccc(F)cc2)CC1\n[D] CC(=O)c1ccc(OC[C@@H](O)CN2CCOCC2)cc1\n[E] Cn1c(=O)c2c(ncn2C)n(C)c1=O\nAnswer: B, D, E"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/valid_0-12.jsonl": "{"text":"User: I want to come up with a InChI.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should be Pgp inhibitory.\nAssistant: Ok, this InChI is Pgp inhibitory: InChI=1S\/C33H40N2O9\/c1-38-19-7-8-20-21-9-10-35-16-18-13-27(44-32(36)17-11-25(39-2)30(41-4)26(12-17)40-3)31(42-5)28(33(37)43-6)22(18)15-24(35)29(21)34-23(20)14-19\/h7-8,11-12,14,18,22,24,27-28,31,34H,9-10,13,15-16H2,1-6H3\/t18-,22+,24+,27-,28-,31-\/m0\/s1"} {"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should not be Pgp inhibitory.\nAssistant: Ok, here you go, this SELFIES is not Pgp inhibitory: [C][N][C][=Branch1][C][=O][C][=C][Branch1][#Branch1][N][=C][N][Ring1][Branch1][C][N][Branch1][C][C][C][Ring1][N][=O]"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/train_0-2.jsonl": "{"text":"Based on the canonical SMILES COc1cc(-c2oc3cc(O)cc(O)c3c(=O)c2O)cc(OC)c1OC, the molecule has Pgp inhibition features."} {"text":"Based on the DeepSMILES representation O=COCcccccccO[C@@H]O[C@H]CO))[C@@H]O)[C@@H]O)[C@@H]6O))))))))c6cO)c%10%13, the molecule has no Pgp inhibition features."}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/test_0-11.jsonl": "{"text":"User: I'm searching for the InChI of a molecule that is not P-glycoprotein inhibitory?\nAssistant: This is a molecule that is not P-glycoprotein inhibitory: InChI=1S\/C18H25NO\/c1-19-10-9-18-8-4-3-5-15(18)17(19)11-13-6-7-14(20-2)12-16(13)18\/h6-7,12,15,17H,3-5,8-11H2,1-2H3\/t15-,17+,18+\/m0\/s1"} {"text":"User: I'm looking for the InChI of a molecule that is not Pgp inhibitory?\nAssistant: This is a molecule that is not Pgp inhibitory: InChI=1S\/C19H24N2S\/c1-4-20(5-2)15(3)14-21-16-10-6-8-12-18(16)22-19-13-9-7-11-17(19)21\/h6-13,15H,4-5,14H2,1-3H3\/t15-\/m1\/s1"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/train_0-7.jsonl": "{"text":"Task: Please generate a molecule InChI based on the text description.\nDescription: A molecule that is P-glycoprotein inhibitory.\nResult: InChI=1S\/C18H16O8\/c1-23-12-4-8(5-13(24-2)18(12)25-3)17-16(22)15(21)14-10(20)6-9(19)7-11(14)26-17\/h4-7,19-20,22H,1-3H3"} {"text":"Task: Please create a SELFIES based on the text description below.\nDescription: A molecule that is P-glycoprotein inhibitory.\nResult: 
[O][=C][O][C][C][=C][C][=C][C][=C][C][Branch2][Ring1][Branch1][O][C@@H1][O][C@H1][Branch1][Ring1][C][O][C@@H1][Branch1][C][O][C@@H1][Branch1][C][O][C@@H1][Ring1][#Branch2][O][=C][Ring2][Ring1][C][C][Branch1][C][O][=C][Ring2][Ring1][#Branch1][Ring2][Ring1][#Branch2]"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/train_0-11.jsonl": "{"text":"User: I'm searching for the SELFIES of a molecule that is Pgp inhibitory?\nAssistant: This is a molecule that is Pgp inhibitory: [C][O][C][=C][C][Branch2][Ring1][=Branch2][C][O][C][=C][C][Branch1][C][O][=C][C][Branch1][C][O][=C][Ring1][Branch2][C][=Branch1][C][=O][C][=Ring1][=N][O][=C][C][Branch1][Ring1][O][C][=C][Ring2][Ring1][=Branch1][O][C]"} {"text":"User: I'm searching for the SMILES of a molecule that is not Pgp inhibitory?\nAssistant: This is a molecule that is not Pgp inhibitory: O=C1OCc2cc3cccc(O[C@@H]4O[C@H](CO)[C@@H](O)[C@@H](O)[C@@H]4O)c3c(O)c21"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/train_0-1.jsonl": "{"text":"The molecule with the SMILES COc1cc(-c2oc3cc(O)cc(O)c3c(=O)c2O)cc(OC)c1OC is showing Pgp inhibition."} {"text":"The molecule with the DeepSMILES representation of O=COCcccccccO[C@@H]O[C@H]CO))[C@@H]O)[C@@H]O)[C@@H]6O))))))))c6cO)c%10%13 is not showing Pgp inhibition."}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/train_0-13.jsonl": "{"text":"User: I want to generate a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should be Pgp inhibitory.\nAssistant: Got it, this canonical SMILES is Pgp inhibitory: COc1cc(-c2oc3cc(O)cc(O)c3c(=O)c2O)cc(OC)c1OC"} {"text":"User: I want to create a SMILES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be Pgp inhibitory.\nAssistant: Understood, this SMILES is not Pgp inhibitory: O=C1OCc2cc3cccc(O[C@@H]4O[C@H](CO)[C@@H](O)[C@@H](O)[C@@H]4O)c3c(O)c21"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/train_0-4.jsonl": "{"text":"The molecule DeepSMILES COccc-cocccO)ccO)c6c=O)c%10O)))))))))))ccOC))c6OC is Pgp inhibitory."} {"text":"The molecule InChI InChI=1S\/C18H18O9\/c19-5-10-13(20)15(22)16(23)18(27-10)26-9-3-1-2-7-4-8-6-25-17(24)12(8)14(21)11(7)9\/h1-4,10,13,15-16,18-23H,5-6H2\/t10-,13-,15-,16+,18-\/m1\/s1 is not Pgp inhibitory."}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/test_0-7.jsonl": "{"text":"Task: Please create a molecule InChI based on the description below.\nDescription: A molecule that is Pgp inhibitory.\nResult: InChI=1S\/C18H25NO\/c1-19-10-9-18-8-4-3-5-15(18)17(19)11-13-6-7-14(20-2)12-16(13)18\/h6-7,12,15,17H,3-5,8-11H2,1-2H3\/t15-,17+,18+\/m0\/s1"} {"text":"Task: Please create a InChI based on the description.\nDescription: A molecule that is Pgp inhibitory.\nResult: InChI=1S\/C19H24N2S\/c1-4-20(5-2)15(3)14-21-16-10-6-8-12-18(16)22-19-13-9-7-11-17(19)21\/h6-13,15H,4-5,14H2,1-3H3\/t15-\/m1\/s1"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/train_0-9.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C18H16O8\/c1-23-12-4-8(5-13(24-2)18(12)25-3)17-16(22)15(21)14-10(20)6-9(19)7-11(14)26-17\/h4-7,19-20,22H,1-3H3 P-glycoprotein inhibitory?\nAssistant: Yes, it is P-glycoprotein inhibitory."} {"text":"User: Is the molecule with the DeepSMILES O=COCcccccccO[C@@H]O[C@H]CO))[C@@H]O)[C@@H]O)[C@@H]6O))))))))c6cO)c%10%13 Pgp inhibitory?\nAssistant: No, it is not Pgp inhibitory."}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/valid_0-3.jsonl": "{"text":"The canonical SMILES COC(=O)[C@H]1[C@@H]2C[C@@H]3c4[nH]c5cc(OC)ccc5c4CCN3C[C@@H]2C[C@H](OC(=O)c2cc(OC)c(OC)c(OC)c2)[C@@H]1OC represents 
a molecule that is identified as P-glycoprotein inhibitory."} {"text":"The SMILES Cn1c(=O)c2c(ncn2C)n(C)c1=O represents a molecule that is not identified as Pgp inhibitory."}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/test_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the SMILES COc1ccc2c(c1)[C@@]13CCCC[C@H]1[C@@H](C2)N(C)CC3 is P-glycoprotein inhibitory?\nAssistant: No, this molecule is not P-glycoprotein inhibitory."} {"text":"User: Can you tell me if the molecule with the SELFIES [C][C][N][Branch1][Ring1][C][C][C@H1][Branch1][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][S][C][=C][C][=C][C][=C][Ring1][=Branch1][Ring1][=C] is Pgp inhibitory?\nAssistant: No, this molecule is not Pgp inhibitory."}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [C][O][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C@@][C][C][C][C][C@H1][Ring1][=Branch1][C@@H1][Branch1][Ring2][C][Ring1][O][N][Branch1][C][C][C][C][Ring1][N] Pgp inhibitory?\nConstraint: Even if you are uncertain, you must pick either a or b without using any additional words.\nOptions:\na. False\nb. 
True\nAnswer: a"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES CCN(CC)[C@H](C)CN1c2ccccc2Sc2ccccc21 P-glycoprotein inhibitory?\nConstraint: Even if you are not sure, you must pick either a or b without using any other words.\nOptions:\na True\nb False\nAnswer: b"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI InChI=1S\/C33H40N2O9\/c1-38-19-7-8-20-21-9-10-35-16-18-13-27(44-32(36)17-11-25(39-2)30(41-4)26(12-17)40-3)31(42-5)28(33(37)43-6)22(18)15-24(35)29(21)34-23(20)14-19\/h7-8,11-12,14,18,22,24,27-28,31,34H,9-10,13,15-16H2,1-6H3\/t18-,22+,24+,27-,28-,31-\/m0\/s1 Pgp inhibitory?\nConstraint: Even if you are not sure, you must pick either A or B without using any additional words.\nOptions:\n[A] False\n[B] True\nAnswer: B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [C][N][C][=Branch1][C][=O][C][=C][Branch1][#Branch1][N][=C][N][Ring1][Branch1][C][N][Branch1][C][C][C][Ring1][N][=O] P-glycoprotein inhibitory?\nConstraint: Even if you are uncertain, you must pick either A or B without using any other words.\nOptions:\nA.) True\nB.) False\nAnswer: B"}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/test_0-4.jsonl": "{"text":"The molecule DeepSMILES COcccccc6)[C@@]CCCC[C@H]6[C@@H]C%10)NC)CC%10 is not Pgp inhibitory."} {"text":"The molecule SMILES CCN(CC)[C@H](C)CN1c2ccccc2Sc2ccccc21 is not P-glycoprotein inhibitory."}", "/scratch/micpie/export/p_glycoprotein_inhibition_broccatelli_et_al/test_0-12.jsonl": "{"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should not be P-glycoprotein inhibitory.\nAssistant: Ok, this SMILES is not P-glycoprotein inhibitory: COc1ccc2c(c1)[C@@]13CCCC[C@H]1[C@@H](C2)N(C)CC3"} {"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very curious. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be Pgp inhibitory.\nAssistant: Ok, this SMILES is not Pgp inhibitory: CCN(CC)[C@H](C)CN1c2ccccc2Sc2ccccc21"}", "/scratch/micpie/export/bioavailability_ma_et_al/test_0-10.jsonl": "{"text":"User: Can you give me the InChI of a molecule that has a low oral bioavailability?\nAssistant: Sure, here you go: InChI=1S\/C10H15NO\/c1-4-11(2,3)9-6-5-7-10(12)8-9\/h5-8H,4H2,1-3H3\/p+1"} {"text":"User: Can you give me the InChI of a molecule that has a high oral bioavailability?\nAssistant: Sure, here you go: InChI=1S\/C22H28N6O3S\/c1-15(2)24-19-5-4-8-23-21(19)27-9-11-28(12-10-27)22(29)20-14-16-13-17(26-32(3,30)31)6-7-18(16)25-20\/h4-8,13-15,24-26H,9-12H2,1-3H3"}", "/scratch/micpie/export/bioavailability_ma_et_al/valid_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the DeepSMILES CO[C@@]NC=O)Cccccs5))))))))C=O)NCC=O)O))=CCOCN)=O))))CS[C@@H]68 has a low or high oral bioavailability?\nAssistant: Of course, this molecule has a low oral bioavailability."} {"text":"User: Can you estimate if the molecule with the SMILES Nc1nc(N)c2nc(-c3ccccc3)c(N)nc2n1 has a low or high oral bioavailability?\nAssistant: Yes, I'm happy to help, this molecule has a high oral bioavailability."}", "/scratch/micpie/export/bioavailability_ma_et_al/test_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules have a high oral bioavailability?\nConstraint: You must select none, one or more options from a or b without using any other words.\nOptions:\n[a] CC1(C)S[C@@H]2[C@H](NC(=O)[C@H](N)c3ccc(O)cc3)C(=O)N2[C@H]1C(=O)O\n[b] CC[N+](C)(C)c1cccc(O)c1\nAnswer: b"} {"text":"Task: Please answer the multiple 
choice question.\nQuestion: Which molecules have a high oral bioavailability?\nConstraint: You must select none, one or more options from A, B, or C without using any additional words.\nOptions:\nA: CNC)[C@@H]CO)=CCN)=O))C=O)[C@@]O)CO)=CC=O)ccO)cccCl)c6[C@@H]O)[C@H]%10C[C@@H]%18%14\nB: O=CNCCCCCCN6))))))))cccOCCF)F)F))))ccc6OCCF)F)F\nC: CCC)Nccccnc6NCCNC=O)cccccNSC)=O)=O)))ccc6[nH]9))))))))))CC6\nAnswer: A, B, C"}", "/scratch/micpie/export/bioavailability_ma_et_al/train_0-8.jsonl": "{"text":"User: Can you estimate if the molecule with the DeepSMILES OCCS)CS has a low or high oral bioavailability?\nAssistant: Yes, I'm happy to help, this molecule has a low oral bioavailability."} {"text":"User: Can you tell me if the molecule with the InChI InChI=1S\/C46H62N4O11\/c1-22(2)21-50-18-16-46(17-19-50)48-34-31-32-39(54)28(8)42-33(31)43(56)45(10,61-42)59-20-15-30(58-11)25(5)41(60-29(9)51)27(7)38(53)26(6)37(52)23(3)13-12-14-24(4)44(57)47-36(40(32)55)35(34)49-46\/h12-15,20,22-23,25-27,30,37-38,41,49,52-54H,16-19,21H2,1-11H3,(H,47,57)\/b13-12+,20-15+,24-14-\/t23-,25+,26+,27+,30-,37-,38+,41+,45-\/m0\/s1 has a low or high oral bioavailability?\nAssistant: Yes, this molecule has a low oral bioavailability."}", "/scratch/micpie/export/bioavailability_ma_et_al/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: Predict if the molecule has a low or high oral bioavailability?\nMolecule SMILES: CC[N+](C)(C)c1cccc(O)c1\nConstraint: Even if you are uncertain, you must pick either \"low\" or \"high\" without using any other words.\nResult: low"} {"text":"Task: Please classify a molecule based on the description.\nDescription: Predict if the molecule has a low or high oral bioavailability?\nMolecule InChI: InChI=1S\/C22H28N6O3S\/c1-15(2)24-19-5-4-8-23-21(19)27-9-11-28(12-10-27)22(29)20-14-16-13-17(26-32(3,30)31)6-7-18(16)25-20\/h4-8,13-15,24-26H,9-12H2,1-3H3\nConstraint: Even if you are not sure, you must pick either \"low\" or \"high\" 
without using any additional words.\nResult: high"}", "/scratch/micpie/export/bioavailability_ma_et_al/valid_0-9.jsonl": "{"text":"User: Has the molecule with the InChI InChI=1S\/C16H17N3O7S2\/c1-25-16(18-10(20)5-9-3-2-4-27-9)13(23)19-11(12(21)22)8(6-26-15(17)24)7-28-14(16)19\/h2-4,14H,5-7H2,1H3,(H2,17,24)(H,18,20)(H,21,22)\/t14-,16+\/m1\/s1 a low or high oral bioavailability?\nAssistant: It has a low oral bioavailability."} {"text":"User: Has the molecule with the SELFIES ['[N][C][=N][C][Branch1][C][N][=C][N][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Branch1][C][N][=N][C][Ring1][=N][=N][Ring2][Ring1][C]'] a low or high oral bioavailability?\nAssistant: It has a high oral bioavailability."}", "/scratch/micpie/export/bioavailability_ma_et_al/test_0-1.jsonl": "{"text":"Based on the SELFIES ['[C][C][N+1][Branch1][C][C][Branch1][C][C][C][=C][C][=C][C][Branch1][C][O][=C][Ring1][#Branch1]'], the molecule has a low oral bioavailability."} {"text":"Based on the SELFIES ['[C][C][Branch1][C][C][N][C][=C][C][=C][N][=C][Ring1][=Branch1][N][C][C][N][Branch2][Ring1][=N][C][=Branch1][C][=O][C][=C][C][=C][C][Branch1][#Branch2][N][S][Branch1][C][C][=Branch1][C][=O][=O][=C][C][=C][Ring1][O][NH1][Ring1][=C][C][C][Ring2][Ring1][=Branch1]'], the molecule has a high oral bioavailability."}", "/scratch/micpie/export/bioavailability_ma_et_al/valid_0-0.jsonl": "{"text":"The molecule with the SELFIES representation of ['[C][O][C@@][Branch1][=C][N][C][=Branch1][C][=O][C][C][=C][C][=C][S][Ring1][Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][=Branch1][C][=O][O][=C][Branch1][Branch2][C][O][C][Branch1][C][N][=O][C][S][C@@H1][Ring1][=C][Ring2][Ring1][#Branch2]'] has a low oral bioavailability."} {"text":"The molecule with the InChI representation of InChI=1S\/C12H11N7\/c13-9-7(6-4-2-1-3-5-6)16-8-10(14)18-12(15)19-11(8)17-9\/h1-5H,(H6,13,14,15,17,18,19) has a high oral bioavailability."}", "/scratch/micpie/export/bioavailability_ma_et_al/test_0-2.jsonl": 
"{"text":"The canonical SMILES CC[N+](C)(C)c1cccc(O)c1 represents a molecule that has a low oral bioavailability."} {"text":"The SELFIES ['[C][C][Branch1][C][C][N][C][=C][C][=C][N][=C][Ring1][=Branch1][N][C][C][N][Branch2][Ring1][=N][C][=Branch1][C][=O][C][=C][C][=C][C][Branch1][#Branch2][N][S][Branch1][C][C][=Branch1][C][=O][=O][=C][C][=C][Ring1][O][NH1][Ring1][=C][C][C][Ring2][Ring1][=Branch1]'] represents a molecule that has a high oral bioavailability."}", "/scratch/micpie/export/bioavailability_ma_et_al/valid_0-10.jsonl": "{"text":"User: Can you generate the DeepSMILES of a molecule that has a low oral bioavailability?\nAssistant: Yes, here you go: CO[C@@]NC=O)Cccccs5))))))))C=O)NCC=O)O))=CCOCN)=O))))CS[C@@H]68"} {"text":"User: Can you generate the InChI of a molecule that has a high oral bioavailability?\nAssistant: Yes, here you go: InChI=1S\/C12H11N7\/c13-9-7(6-4-2-1-3-5-6)16-8-10(14)18-12(15)19-11(8)17-9\/h1-5H,(H6,13,14,15,17,18,19)"}", "/scratch/micpie/export/bioavailability_ma_et_al/train_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: Predict if the molecule has a low or high oral bioavailability?\nMolecule InChI: InChI=1S\/C3H8OS2\/c4-1-3(6)2-5\/h3-6H,1-2H2\nConstraint: Answer the question in a full sentence.\nResult: This molecule has a low oral bioavailability."} {"text":"Task: Please classify a molecule based on the description.\nDescription: Predict if the molecule has a low or high oral bioavailability?\nInChI: InChI=1S\/C46H62N4O11\/c1-22(2)21-50-18-16-46(17-19-50)48-34-31-32-39(54)28(8)42-33(31)43(56)45(10,61-42)59-20-15-30(58-11)25(5)41(60-29(9)51)27(7)38(53)26(6)37(52)23(3)13-12-14-24(4)44(57)47-36(40(32)55)35(34)49-46\/h12-15,20,22-23,25-27,30,37-38,41,49,52-54H,16-19,21H2,1-11H3,(H,47,57)\/b13-12+,20-15+,24-14-\/t23-,25+,26+,27+,30-,37-,38+,41+,45-\/m0\/s1\nConstraint: Answer the question in a full sentence.\nResult: This molecule has a low oral bioavailability."}", 
"/scratch/micpie/export/bioavailability_ma_et_al/valid_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: Predict if the molecule has a low or high oral bioavailability?\ncanonical SMILES: CO[C@@]1(NC(=O)Cc2cccs2)C(=O)N2C(C(=O)O)=C(COC(N)=O)CS[C@@H]21\nConstraint: Answer the question in a complete sentence.\nResult: This molecule has a low oral bioavailability."} {"text":"Task: Please classify a molecule based on the description.\nDescription: Predict if the molecule has a low or high oral bioavailability?\nDeepSMILES: NcncN)cnc-cccccc6))))))cN)nc6n%10\nConstraint: Answer the question in a full sentence.\nResult: This molecule has a high oral bioavailability."}", "/scratch/micpie/export/bioavailability_ma_et_al/test_0-9.jsonl": "{"text":"User: Has the molecule with the SELFIES ['[C][C][N+1][Branch1][C][C][Branch1][C][C][C][=C][C][=C][C][Branch1][C][O][=C][Ring1][#Branch1]'] a low or high oral bioavailability?\nAssistant: It has a low oral bioavailability."} {"text":"User: Has the molecule with the SELFIES ['[C][C][Branch1][C][C][N][C][=C][C][=C][N][=C][Ring1][=Branch1][N][C][C][N][Branch2][Ring1][=N][C][=Branch1][C][=O][C][=C][C][=C][C][Branch1][#Branch2][N][S][Branch1][C][C][=Branch1][C][=O][=O][=C][C][=C][Ring1][O][NH1][Ring1][=C][C][C][Ring2][Ring1][=Branch1]'] a low or high oral bioavailability?\nAssistant: It has a high oral bioavailability."}", "/scratch/micpie/export/bioavailability_ma_et_al/test_0-0.jsonl": "{"text":"The molecule with the DeepSMILES CC[N+]C)C)cccccO)c6 has a low oral bioavailability."} {"text":"The molecule with the SELFIES representation of ['[C][C][Branch1][C][C][N][C][=C][C][=C][N][=C][Ring1][=Branch1][N][C][C][N][Branch2][Ring1][=N][C][=Branch1][C][=O][C][=C][C][=C][C][Branch1][#Branch2][N][S][Branch1][C][C][=Branch1][C][=O][=O][=C][C][=C][Ring1][O][NH1][Ring1][=C][C][C][Ring2][Ring1][=Branch1]'] has a high oral bioavailability."}", 
"/scratch/micpie/export/bioavailability_ma_et_al/valid_0-7.jsonl": "{"text":"Task: Please give me a DeepSMILES based on the description below.\nDescription: A molecule that has a low oral bioavailability.\nResult: CO[C@@]NC=O)Cccccs5))))))))C=O)NCC=O)O))=CCOCN)=O))))CS[C@@H]68"} {"text":"Task: Please generate a molecule SELFIES based on the description below.\nDescription: A molecule that has a high oral bioavailability.\nResult: ['[N][C][=N][C][Branch1][C][N][=C][N][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Branch1][C][N][=N][C][Ring1][=N][=N][Ring2][Ring1][C]']"}", "/scratch/micpie/export/bioavailability_ma_et_al/test_0-3.jsonl": "{"text":"The SELFIES ['[C][C][N+1][Branch1][C][C][Branch1][C][C][C][=C][C][=C][C][Branch1][C][O][=C][Ring1][#Branch1]'] has a low oral bioavailability."} {"text":"The canonical SMILES CC(C)Nc1cccnc1N1CCN(C(=O)c2cc3cc(NS(C)(=O)=O)ccc3[nH]2)CC1 has a high oral bioavailability."}", "/scratch/micpie/export/bioavailability_ma_et_al/valid_0-11.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that has a low oral bioavailability?\nAssistant: Ok, this is a molecule that has a low oral bioavailability: CO[C@@]1(NC(=O)Cc2cccs2)C(=O)N2C(C(=O)O)=C(COC(N)=O)CS[C@@H]21"} {"text":"User: I'm looking for the InChI of a molecule that has a high oral bioavailability?\nAssistant: This is a molecule that has a high oral bioavailability: InChI=1S\/C12H11N7\/c13-9-7(6-4-2-1-3-5-6)16-8-10(14)18-12(15)19-11(8)17-9\/h1-5H,(H6,13,14,15,17,18,19)"}", "/scratch/micpie/export/bioavailability_ma_et_al/train_0-0.jsonl": "{"text":"The molecule with the InChI representation of InChI=1S\/C3H8OS2\/c4-1-3(6)2-5\/h3-6H,1-2H2 has a low oral bioavailability."} {"text":"The molecule with the DeepSMILES representation of CO[C@H]\/C=C\/O[C@@]C)OccC)cO)ccc6C9=O)))C=NCCCNCCC)C)))CC6)))))NC5=CNC=O)\/CC)=C\\C=C\\[C@H]C)[C@H]O)[C@@H]C)[C@@H]O)[C@@H]C)[C@H]OCC)=O)))[C@@H]%30C))))))))))))))C9=O has a low oral bioavailability."}", 
"/scratch/micpie/export/bioavailability_ma_et_al/test_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: Predict if the molecule has a low or high oral bioavailability?\nMolecule DeepSMILES: CC[N+]C)C)cccccO)c6\nConstraint: Answer the question in a complete sentence.\nResult: This molecule has a low oral bioavailability."} {"text":"Task: Please classify a molecule based on the description.\nDescription: Predict if the molecule has a low or high oral bioavailability?\nMolecule SELFIES: ['[C][C][Branch1][C][C][N][C][=C][C][=C][N][=C][Ring1][=Branch1][N][C][C][N][Branch2][Ring1][=N][C][=Branch1][C][=O][C][=C][C][=C][C][Branch1][#Branch2][N][S][Branch1][C][C][=Branch1][C][=O][=O][=C][C][=C][Ring1][O][NH1][Ring1][=C][C][C][Ring2][Ring1][=Branch1]']\nConstraint: Answer the question in a complete sentence.\nResult: This molecule has a high oral bioavailability."}", "/scratch/micpie/export/bioavailability_ma_et_al/train_0-10.jsonl": "{"text":"User: Can you generate the SELFIES of a molecule that has a low oral bioavailability?\nAssistant: Yes, here you go: ['[O][C][C][Branch1][C][S][C][S]']"} {"text":"User: Can you generate the SELFIES of a molecule that has a low oral bioavailability?\nAssistant: Sure, here you go: ['[C][O][C@H1][\/C][=C][\/O][C@@][Branch1][C][C][O][C][=C][Branch1][C][C][C][Branch1][C][O][=C][C][=Branch1][Branch2][=C][Ring1][Branch2][C][Ring1][N][=O][C][=N][C][Branch1][S][C][C][N][Branch1][#Branch1][C][C][Branch1][C][C][C][C][C][Ring1][#Branch2][N][C][Ring1][=C][=C][Branch2][Ring2][#C][N][C][=Branch1][C][=O][\/C][Branch1][C][C][=C][\\\\C][=C][\\\\C@H1][Branch1][C][C][C@H1][Branch1][C][O][C@@H1][Branch1][C][C][C@@H1][Branch1][C][O][C@@H1][Branch1][C][C][C@H1][Branch1][#Branch1][O][C][Branch1][C][C][=O][C@@H1][Ring2][Branch1][Branch2][C][C][Ring2][Ring2][=C][=O]']"}", "/scratch/micpie/export/bioavailability_ma_et_al/train_0-3.jsonl": "{"text":"The SMILES OCC(S)CS has a low oral bioavailability."} {"text":"The 
InChI InChI=1S\/C46H62N4O11\/c1-22(2)21-50-18-16-46(17-19-50)48-34-31-32-39(54)28(8)42-33(31)43(56)45(10,61-42)59-20-15-30(58-11)25(5)41(60-29(9)51)27(7)38(53)26(6)37(52)23(3)13-12-14-24(4)44(57)47-36(40(32)55)35(34)49-46\/h12-15,20,22-23,25-27,30,37-38,41,49,52-54H,16-19,21H2,1-11H3,(H,47,57)\/b13-12+,20-15+,24-14-\/t23-,25+,26+,27+,30-,37-,38+,41+,45-\/m0\/s1 has a low oral bioavailability."}", "/scratch/micpie/export/bioavailability_ma_et_al/train_0-12.jsonl": "{"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should have a low oral bioavailability.\nAssistant: Ok, here you go, this SMILES has a low oral bioavailability: OCC(S)CS"} {"text":"User: I want to create a molecule SELFIES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should have a low oral bioavailability.\nAssistant: Got it, here you go, this SELFIES has a low oral bioavailability: ['[C][O][C@H1][\/C][=C][\/O][C@@][Branch1][C][C][O][C][=C][Branch1][C][C][C][Branch1][C][O][=C][C][=Branch1][Branch2][=C][Ring1][Branch2][C][Ring1][N][=O][C][=N][C][Branch1][S][C][C][N][Branch1][#Branch1][C][C][Branch1][C][C][C][C][C][Ring1][#Branch2][N][C][Ring1][=C][=C][Branch2][Ring2][#C][N][C][=Branch1][C][=O][\/C][Branch1][C][C][=C][\\\\C][=C][\\\\C@H1][Branch1][C][C][C@H1][Branch1][C][O][C@@H1][Branch1][C][C][C@@H1][Branch1][C][O][C@@H1][Branch1][C][C][C@H1][Branch1][#Branch1][O][C][Branch1][C][C][=O][C@@H1][Ring2][Branch1][Branch2][C][C][Ring2][Ring2][=C][=O]']"}", "/scratch/micpie/export/bioavailability_ma_et_al/test_0-13.jsonl": "{"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should have a low oral bioavailability.\nAssistant: Understood, this DeepSMILES has a low oral bioavailability: CC[N+]C)C)cccccO)c6"} {"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should have a high oral bioavailability.\nAssistant: Ok, this SMILES has a high oral bioavailability: CC(C)Nc1cccnc1N1CCN(C(=O)c2cc3cc(NS(C)(=O)=O)ccc3[nH]2)CC1"}", "/scratch/micpie/export/bioavailability_ma_et_al/valid_0-2.jsonl": "{"text":"The InChI InChI=1S\/C16H17N3O7S2\/c1-25-16(18-10(20)5-9-3-2-4-27-9)13(23)19-11(12(21)22)8(6-26-15(17)24)7-28-14(16)19\/h2-4,14H,5-7H2,1H3,(H2,17,24)(H,18,20)(H,21,22)\/t14-,16+\/m1\/s1 is from a molecule that has a low oral bioavailability."} {"text":"The InChI InChI=1S\/C12H11N7\/c13-9-7(6-4-2-1-3-5-6)16-8-10(14)18-12(15)19-11(8)17-9\/h1-5H,(H6,13,14,15,17,18,19) is from a molecule that has a high oral bioavailability."}", "/scratch/micpie/export/bioavailability_ma_et_al/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Has the molecule with the SELFIES representation of ['[O][C][C][Branch1][C][S][C][S]'] a high oral bioavailability?\nConstraint: Even if you are uncertain, you must pick either a or b without using any other words.\nOptions:\na: True\nb: False\nAnswer: b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Has the molecule with the SMILES representation of CO[C@H]1\/C=C\/O[C@@]2(C)Oc3c(C)c(O)c4c(c3C2=O)C2=NC3(CCN(CC(C)C)CC3)NC2=C(NC(=O)\/C(C)=C\\C=C\\[C@H](C)[C@H](O)[C@@H](C)[C@@H](O)[C@@H](C)[C@H](OC(C)=O)[C@@H]1C)C4=O a high oral bioavailability?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any other words.\nOptions:\n1.) False\n2.) 
True\nAnswer: 1"}", "/scratch/micpie/export/bioavailability_ma_et_al/valid_0-1.jsonl": "{"text":"Based on the SMILES CO[C@@]1(NC(=O)Cc2cccs2)C(=O)N2C(C(=O)O)=C(COC(N)=O)CS[C@@H]21, the molecule has a low oral bioavailability."} {"text":"Based on the DeepSMILES NcncN)cnc-cccccc6))))))cN)nc6n%10, the molecule has a high oral bioavailability."}", "/scratch/micpie/export/bioavailability_ma_et_al/valid_0-13.jsonl": "{"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should have a low oral bioavailability.\nAssistant: Understood, this SMILES has a low oral bioavailability: CO[C@@]1(NC(=O)Cc2cccs2)C(=O)N2C(C(=O)O)=C(COC(N)=O)CS[C@@H]21"} {"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should have a high oral bioavailability.\nAssistant: Understood, this DeepSMILES has a high oral bioavailability: NcncN)cnc-cccccc6))))))cN)nc6n%10"}", "/scratch/micpie/export/bioavailability_ma_et_al/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: Predict if the molecule has a low or high oral bioavailability?\nDeepSMILES: CO[C@@]NC=O)Cccccs5))))))))C=O)NCC=O)O))=CCOCN)=O))))CS[C@@H]68\nConstraint: Even if you are not sure, you must pick either \"low\" or \"high\" without using any other words.\nResult: low"} {"text":"Task: Please classify a molecule based on the description.\nDescription: Predict if the molecule has a low or high oral bioavailability?\nInChI: InChI=1S\/C12H11N7\/c13-9-7(6-4-2-1-3-5-6)16-8-10(14)18-12(15)19-11(8)17-9\/h1-5H,(H6,13,14,15,17,18,19)\nConstraint: Even if you are uncertain, you must pick either \"low\" or \"high\" without using any other words.\nResult: high"}", "/scratch/micpie/export/bioavailability_ma_et_al/train_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules 
have a high oral bioavailability?\nConstraint: You must select none, one or more options from a or b without using any other words.\nOptions:\na ['[O][C][C][Branch1][C][S][C][S]']\nb ['[O][=C][NH1][C][=C][Branch1][C][F][C][=Branch1][C][=O][NH1][Ring1][Branch2]']\nAnswer: a"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules have a high oral bioavailability?\nConstraint: You must select none, one or more options from A or B without using any other words.\nOptions:\nA.) ['[C][C][Branch1][C][C][Branch1][C][C][N][C][C][Branch1][C][O][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=Branch1][C][=O][N][Ring1][#Branch1]']\nB.) ['[C][O][C@H1][\/C][=C][\/O][C@@][Branch1][C][C][O][C][=C][Branch1][C][C][C][Branch1][C][O][=C][C][=Branch1][Branch2][=C][Ring1][Branch2][C][Ring1][N][=O][C][=N][C][Branch1][S][C][C][N][Branch1][#Branch1][C][C][Branch1][C][C][C][C][C][Ring1][#Branch2][N][C][Ring1][=C][=C][Branch2][Ring2][#C][N][C][=Branch1][C][=O][\/C][Branch1][C][C][=C][\\\\C][=C][\\\\C@H1][Branch1][C][C][C@H1][Branch1][C][O][C@@H1][Branch1][C][C][C@@H1][Branch1][C][O][C@@H1][Branch1][C][C][C@H1][Branch1][#Branch1][O][C][Branch1][C][C][=O][C@@H1][Ring2][Branch1][Branch2][C][C][Ring2][Ring2][=C][=O]']\nAnswer: B"}", "/scratch/micpie/export/bioavailability_ma_et_al/valid_0-4.jsonl": "{"text":"The molecule with the DeepSMILES CO[C@@]NC=O)Cccccs5))))))))C=O)NCC=O)O))=CCOCN)=O))))CS[C@@H]68 has a low oral bioavailability."} {"text":"The molecule with the canonical SMILES Nc1nc(N)c2nc(-c3ccccc3)c(N)nc2n1 has a high oral bioavailability."}", "/scratch/micpie/export/bioavailability_ma_et_al/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: Predict if the molecule has a low or high oral bioavailability?\nSELFIES: ['[O][C][C][Branch1][C][S][C][S]']\nConstraint: Even if you are uncertain, you must pick either \"low\" or \"high\" without using any other words.\nResult: low"} {"text":"Task: Please 
classify a molecule based on the description.\nDescription: Predict if the molecule has a low or high oral bioavailability?\nMolecule canonical SMILES: CO[C@H]1\/C=C\/O[C@@]2(C)Oc3c(C)c(O)c4c(c3C2=O)C2=NC3(CCN(CC(C)C)CC3)NC2=C(NC(=O)\/C(C)=C\\C=C\\[C@H](C)[C@H](O)[C@@H](C)[C@@H](O)[C@@H](C)[C@H](OC(C)=O)[C@@H]1C)C4=O\nConstraint: Even if you are not sure, you must pick either \"low\" or \"high\" without using any other words.\nResult: low"}", "/scratch/micpie/export/bioavailability_ma_et_al/valid_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules have a high oral bioavailability?\nConstraint: You must select none, one or more options from a, b, c, or d without using any other words.\nOptions:\na.) CNCCC(Oc1ccc(C(F)(F)F)cc1)c1ccccc1\nb.) COCCc1ccc(OCC(O)CNC(C)C)cc1\nc.) CCC(C)n1ncn(-c2ccc(N3CCN(c4ccc(OC[C@H]5CO[C@](Cn6cncn6)(c6ccc(Cl)cc6Cl)O5)cc4)CC3)cc2)c1=O\nd.) CO[C@@]1(NC(=O)Cc2cccs2)C(=O)N2C(C(=O)O)=C(COC(N)=O)CS[C@@H]21\nAnswer: d"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules have a high oral bioavailability?\nConstraint: You must select none, one or more options from A, B, or C without using any additional words.\nOptions:\nA) O=C1NS(=O)(=O)c2ccccc21\nB) Nc1nc(N)c2nc(-c3ccccc3)c(N)nc2n1\nC) Cc1cnc(NC(=O)C2=C(O)c3ccccc3S(=O)(=O)N2C)s1\nAnswer: A, B, C"}", "/scratch/micpie/export/bioavailability_ma_et_al/valid_0-12.jsonl": "{"text":"User: I want to generate a molecule InChI.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should have a low oral bioavailability.\nAssistant: Ok, this InChI has a low oral bioavailability: InChI=1S\/C16H17N3O7S2\/c1-25-16(18-10(20)5-9-3-2-4-27-9)13(23)19-11(12(21)22)8(6-26-15(17)24)7-28-14(16)19\/h2-4,14H,5-7H2,1H3,(H2,17,24)(H,18,20)(H,21,22)\/t14-,16+\/m1\/s1"} {"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should have a high oral bioavailability.\nAssistant: Ok, this canonical SMILES has a high oral bioavailability: Nc1nc(N)c2nc(-c3ccccc3)c(N)nc2n1"}", "/scratch/micpie/export/bioavailability_ma_et_al/train_0-2.jsonl": "{"text":"The canonical SMILES OCC(S)CS is from a molecule that has a low oral bioavailability."} {"text":"The canonical SMILES CO[C@H]1\/C=C\/O[C@@]2(C)Oc3c(C)c(O)c4c(c3C2=O)C2=NC3(CCN(CC(C)C)CC3)NC2=C(NC(=O)\/C(C)=C\\C=C\\[C@H](C)[C@H](O)[C@@H](C)[C@@H](O)[C@@H](C)[C@H](OC(C)=O)[C@@H]1C)C4=O is from a molecule that has a low oral bioavailability."}", "/scratch/micpie/export/bioavailability_ma_et_al/test_0-11.jsonl": "{"text":"User: I'm searching for the DeepSMILES of a molecule that has a low oral bioavailability?\nAssistant: Ok, this is a molecule that has a low oral bioavailability: CC[N+]C)C)cccccO)c6"} {"text":"User: I'm looking for the SMILES of a molecule that has a high oral bioavailability?\nAssistant: This is a molecule that has a high oral bioavailability: CC(C)Nc1cccnc1N1CCN(C(=O)c2cc3cc(NS(C)(=O)=O)ccc3[nH]2)CC1"}", "/scratch/micpie/export/bioavailability_ma_et_al/train_0-7.jsonl": "{"text":"Task: Please give me a molecule DeepSMILES based on the description.\nDescription: A molecule that has a low oral bioavailability.\nResult: OCCS)CS"} {"text":"Task: Please create a SMILES based on the description.\nDescription: A molecule that has a low oral bioavailability.\nResult: 
CO[C@H]1\/C=C\/O[C@@]2(C)Oc3c(C)c(O)c4c(c3C2=O)C2=NC3(CCN(CC(C)C)CC3)NC2=C(NC(=O)\/C(C)=C\\C=C\\[C@H](C)[C@H](O)[C@@H](C)[C@@H](O)[C@@H](C)[C@H](OC(C)=O)[C@@H]1C)C4=O"}", "/scratch/micpie/export/bioavailability_ma_et_al/train_0-11.jsonl": "{"text":"User: I'm looking for the canonical SMILES of a molecule that has a low oral bioavailability?\nAssistant: This is a molecule that has a low oral bioavailability: OCC(S)CS"} {"text":"User: I'm searching for the canonical SMILES of a molecule that has a low oral bioavailability?\nAssistant: This is a molecule that has a low oral bioavailability: CO[C@H]1\/C=C\/O[C@@]2(C)Oc3c(C)c(O)c4c(c3C2=O)C2=NC3(CCN(CC(C)C)CC3)NC2=C(NC(=O)\/C(C)=C\\C=C\\[C@H](C)[C@H](O)[C@@H](C)[C@@H](O)[C@@H](C)[C@H](OC(C)=O)[C@@H]1C)C4=O"}", "/scratch/micpie/export/bioavailability_ma_et_al/train_0-1.jsonl": "{"text":"Based on the SMILES OCC(S)CS, the molecule has a low oral bioavailability."} {"text":"Based on the DeepSMILES CO[C@H]\/C=C\/O[C@@]C)OccC)cO)ccc6C9=O)))C=NCCCNCCC)C)))CC6)))))NC5=CNC=O)\/CC)=C\\C=C\\[C@H]C)[C@H]O)[C@@H]C)[C@@H]O)[C@@H]C)[C@H]OCC)=O)))[C@@H]%30C))))))))))))))C9=O, the molecule has a low oral bioavailability."}", "/scratch/micpie/export/bioavailability_ma_et_al/train_0-13.jsonl": "{"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should have a low oral bioavailability.\nAssistant: Got it, this SMILES has a low oral bioavailability: OCC(S)CS"} {"text":"User: I want to create a SMILES.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should have a low oral bioavailability.\nAssistant: Ok, this SMILES has a low oral bioavailability: CO[C@H]1\/C=C\/O[C@@]2(C)Oc3c(C)c(O)c4c(c3C2=O)C2=NC3(CCN(CC(C)C)CC3)NC2=C(NC(=O)\/C(C)=C\\C=C\\[C@H](C)[C@H](O)[C@@H](C)[C@@H](O)[C@@H](C)[C@H](OC(C)=O)[C@@H]1C)C4=O"}", "/scratch/micpie/export/bioavailability_ma_et_al/train_0-4.jsonl": "{"text":"The molecule with the SELFIES ['[O][C][C][Branch1][C][S][C][S]'] has a low oral bioavailability."} {"text":"The molecule with the canonical SMILES CO[C@H]1\/C=C\/O[C@@]2(C)Oc3c(C)c(O)c4c(c3C2=O)C2=NC3(CCN(CC(C)C)CC3)NC2=C(NC(=O)\/C(C)=C\\C=C\\[C@H](C)[C@H](O)[C@@H](C)[C@@H](O)[C@@H](C)[C@H](OC(C)=O)[C@@H]1C)C4=O has a low oral bioavailability."}", "/scratch/micpie/export/bioavailability_ma_et_al/test_0-7.jsonl": "{"text":"Task: Please create a molecule SMILES based on the description.\nDescription: A molecule that has a low oral bioavailability.\nResult: CC[N+](C)(C)c1cccc(O)c1"} {"text":"Task: Please generate a molecule InChI based on the description.\nDescription: A molecule that has a high oral bioavailability.\nResult: InChI=1S\/C22H28N6O3S\/c1-15(2)24-19-5-4-8-23-21(19)27-9-11-28(12-10-27)22(29)20-14-16-13-17(26-32(3,30)31)6-7-18(16)25-20\/h4-8,13-15,24-26H,9-12H2,1-3H3"}", "/scratch/micpie/export/bioavailability_ma_et_al/train_0-9.jsonl": "{"text":"User: Has the molecule with the DeepSMILES OCCS)CS a low or high oral bioavailability?\nAssistant: It has a low oral bioavailability."} {"text":"User: Has the molecule with the canonical SMILES CO[C@H]1\/C=C\/O[C@@]2(C)Oc3c(C)c(O)c4c(c3C2=O)C2=NC3(CCN(CC(C)C)CC3)NC2=C(NC(=O)\/C(C)=C\\C=C\\[C@H](C)[C@H](O)[C@@H](C)[C@@H](O)[C@@H](C)[C@H](OC(C)=O)[C@@H]1C)C4=O a low or high oral bioavailability?\nAssistant: It has a low oral bioavailability."}", "/scratch/micpie/export/bioavailability_ma_et_al/valid_0-3.jsonl": "{"text":"The InChI 
InChI=1S\/C16H17N3O7S2\/c1-25-16(18-10(20)5-9-3-2-4-27-9)13(23)19-11(12(21)22)8(6-26-15(17)24)7-28-14(16)19\/h2-4,14H,5-7H2,1H3,(H2,17,24)(H,18,20)(H,21,22)\/t14-,16+\/m1\/s1 has a low oral bioavailability."} {"text":"The canonical SMILES Nc1nc(N)c2nc(-c3ccccc3)c(N)nc2n1 has a high oral bioavailability."}", "/scratch/micpie/export/bioavailability_ma_et_al/test_0-8.jsonl": "{"text":"User: Can you derive if the molecule with the canonical SMILES CC[N+](C)(C)c1cccc(O)c1 has a low or high oral bioavailability?\nAssistant: Yes, I'm happy to help, this molecule has a low oral bioavailability."} {"text":"User: Can you tell me if the molecule with the canonical SMILES CC(C)Nc1cccnc1N1CCN(C(=O)c2cc3cc(NS(C)(=O)=O)ccc3[nH]2)CC1 has a low or high oral bioavailability?\nAssistant: Yes, I'm happy to help, this molecule has a high oral bioavailability."}", "/scratch/micpie/export/bioavailability_ma_et_al/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Has the molecule with the DeepSMILES representation of CC[N+]C)C)cccccO)c6 a high oral bioavailability?\nConstraint: Even if you are not sure, you must pick either a or b without using any additional words.\nOptions:\na. True\nb. False\nAnswer: b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Has the molecule with the canonical SMILES CC(C)Nc1cccnc1N1CCN(C(=O)c2cc3cc(NS(C)(=O)=O)ccc3[nH]2)CC1 a high oral bioavailability?\nConstraint: Even if you are not sure, you must pick either a or b without using any additional words.\nOptions:\na.) False\nb.) 
True\nAnswer: b"}", "/scratch/micpie/export/bioavailability_ma_et_al/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Has the molecule with the SELFIES ['[C][O][C@@][Branch1][=C][N][C][=Branch1][C][=O][C][C][=C][C][=C][S][Ring1][Branch1][C][=Branch1][C][=O][N][C][Branch1][=Branch1][C][=Branch1][C][=O][O][=C][Branch1][Branch2][C][O][C][Branch1][C][N][=O][C][S][C@@H1][Ring1][=C][Ring2][Ring1][#Branch2]'] a high oral bioavailability?\nConstraint: Even if you are uncertain, you must pick either A or B without using any additional words.\nOptions:\nA) False\nB) True\nAnswer: A"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Has the molecule with the DeepSMILES NcncN)cnc-cccccc6))))))cN)nc6n%10 a high oral bioavailability?\nConstraint: Even if you are uncertain, you must pick either a or b without using any additional words.\nOptions:\na) True\nb) False\nAnswer: a"}", "/scratch/micpie/export/bioavailability_ma_et_al/test_0-4.jsonl": "{"text":"The molecule with the DeepSMILES CC[N+]C)C)cccccO)c6 has a low oral bioavailability."} {"text":"The molecule with the InChI InChI=1S\/C22H28N6O3S\/c1-15(2)24-19-5-4-8-23-21(19)27-9-11-28(12-10-27)22(29)20-14-16-13-17(26-32(3,30)31)6-7-18(16)25-20\/h4-8,13-15,24-26H,9-12H2,1-3H3 has a high oral bioavailability."}", "/scratch/micpie/export/bioavailability_ma_et_al/test_0-12.jsonl": "{"text":"User: I want to come up with a InChI.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should have a low oral bioavailability.\nAssistant: Got it, this InChI has a low oral bioavailability: InChI=1S\/C10H15NO\/c1-4-11(2,3)9-6-5-7-10(12)8-9\/h5-8H,4H2,1-3H3\/p+1"} {"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should have a high oral bioavailability.\nAssistant: Ok, this InChI has a high oral bioavailability: InChI=1S\/C22H28N6O3S\/c1-15(2)24-19-5-4-8-23-21(19)27-9-11-28(12-10-27)22(29)20-14-16-13-17(26-32(3,30)31)6-7-18(16)25-20\/h4-8,13-15,24-26H,9-12H2,1-3H3"}", "/scratch/micpie/export/herg_central_at_1uM/test_0-10.jsonl": "{"text":"User: I want to create a InChI.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should have a hERG inhibition at a concentration of 1uM of 8.340 %.\nAssistant: Got it, here you go, this InChI represents a molecule that has a hERG inhibition at a concentration of 1uM of 8.340 %: InChI=1S\/C31H42N4O5\/c1-39-23-13-8-12-22(20-23)33-28(36)25-24-14-15-31(40-24)26(25)30(38)35(19-9-18-34-16-6-3-7-17-34)27(31)29(37)32-21-10-4-2-5-11-21\/h8,12-15,20-21,24-27H,2-7,9-11,16-19H2,1H3,(H,32,37)(H,33,36)"} {"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the creation?\nUser: Yes, please. 
The molecule should have a hERG inhibition at 1uM of 11.783 %.\nAssistant: Ok, this canonical SMILES represents a molecule that has a hERG inhibition at 1uM of 11.783 %: Cc1cc(Br)ccc1SCC(=O)OCC(=O)Nc1ccccc1C#N"}", "/scratch/micpie/export/herg_central_at_1uM/valid_0-8.jsonl": "{"text":"User: Can you create the SELFIES of a molecule that has a human ether-à-go-go related gene (hERG) inhibition at a concentration of 1uM of 10.226 %?\nAssistant: Of course, here you go: [C][O][C][=C][C][=C][Branch2][Ring1][#Branch1][\/C][=C][\\S][C][=Branch1][C][=S][N][Branch1][=Branch2][N][C][C][O][C][C][Ring1][=Branch1][C][Ring1][N][=O][C][Branch1][Ring1][O][C][=C][Ring2][Ring1][=Branch1]"} {"text":"User: Can you generate the InChI of a molecule that has a human ether-à-go-go related gene (hERG) inhibition at 1uM of 9.122 %?\nAssistant: Yes, here you go: InChI=1S\/C17H19NO3\/c19-17(18-10-4-1-5-11-18)7-3-2-6-14-8-9-15-16(12-14)21-13-20-15\/h2-3,6-9,12H,1,4-5,10-11,13H2\/b6-2+,7-3+"}", "/scratch/micpie/export/herg_central_at_1uM/train_0-8.jsonl": "{"text":"User: Can you give me the SELFIES of a molecule that has a hERG inhibition at 1uM of 21.802 %?\nAssistant: Yes, here you go: [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][=C][C][=Branch1][C][=O][O][C][Ring1][#Branch1][=C][Ring1][=N][C][Branch1][=C][C][C][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][Branch1][=Branch1][N][Branch1][C][C][C][C][=C][Ring1][=Branch2]"} {"text":"User: Can you give me the InChI of a molecule that has a human ether-à-go-go related gene (hERG) inhibition at 1uM of -6.282 %?\nAssistant: Of course, here you go: InChI=1S\/C22H20N6O3\/c1-31-17-9-4-8-16(12-17)28-21-20(24-25-28)22(30)26(14-23-21)13-19(29)27-11-5-7-15-6-2-3-10-18(15)27\/h2-4,6,8-10,12,14H,5,7,11,13H2,1H3"}", "/scratch/micpie/export/herg_central_at_1uM/test_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the human ether-à-go-go related gene (hERG) inhibition at a 
concentration of 1uM in %.\nMolecule canonical SMILES: COc1cccc(NC(=O)C2C3C=CC4(O3)C2C(=O)N(CCCN2CCCCC2)C4C(=O)NC2CCCCC2)c1\nConstraint: Even if you are uncertain, you must answer with a numeric value in % without the unit and without using any additional words.\nResult: 8.340"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hERG inhibition at a concentration of 1uM in %.\nMolecule SMILES: Cc1cc(Br)ccc1SCC(=O)OCC(=O)Nc1ccccc1C#N\nConstraint: Even if you are uncertain, you must answer with a numeric value in % without the unit and without using any additional words.\nResult: 11.783"}", "/scratch/micpie/export/herg_central_at_1uM/valid_0-9.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that has a hERG inhibition at a concentration of 1uM of 10.226 %.\nAssistant: This is a molecule that has a hERG inhibition at a concentration of 1uM of 10.226 %: COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1"} {"text":"User: I'm looking for the canonical SMILES of a molecule that has a human ether-à-go-go related gene (hERG) inhibition at 1uM of 9.122 %.\nAssistant: This is a molecule that has a human ether-à-go-go related gene (hERG) inhibition at 1uM of 9.122 %: O=C(\/C=C\/C=C\/c1ccc2c(c1)OCO2)N1CCCCC1"}", "/scratch/micpie/export/herg_central_at_1uM/test_0-1.jsonl": "{"text":"Based on the InChI InChI=1S\/C31H42N4O5\/c1-39-23-13-8-12-22(20-23)33-28(36)25-24-14-15-31(40-24)26(25)30(38)35(19-9-18-34-16-6-3-7-17-34)27(31)29(37)32-21-10-4-2-5-11-21\/h8,12-15,20-21,24-27H,2-7,9-11,16-19H2,1H3,(H,32,37)(H,33,36), the molecule has a human ether-à-go-go related gene (hERG) inhibition at 1uM of 8.340 %."} {"text":"Based on the SMILES representation of Cc1cc(Br)ccc1SCC(=O)OCC(=O)Nc1ccccc1C#N, the molecule has a hERG inhibition at 1uM of 11.783 %."}", "/scratch/micpie/export/herg_central_at_1uM/valid_0-0.jsonl": "{"text":"The molecule with the SMILES representation of 
COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1 has a hERG inhibition at a concentration of 1uM of 10.226 %."} {"text":"The molecule with the SMILES O=C(\/C=C\/C=C\/c1ccc2c(c1)OCO2)N1CCCCC1 has a human ether-à-go-go related gene (hERG) inhibition at 1uM of 9.122 %."}", "/scratch/micpie/export/herg_central_at_1uM/test_0-2.jsonl": "{"text":"The InChI InChI=1S\/C31H42N4O5\/c1-39-23-13-8-12-22(20-23)33-28(36)25-24-14-15-31(40-24)26(25)30(38)35(19-9-18-34-16-6-3-7-17-34)27(31)29(37)32-21-10-4-2-5-11-21\/h8,12-15,20-21,24-27H,2-7,9-11,16-19H2,1H3,(H,32,37)(H,33,36) is representing a molecule with a human ether-à-go-go related gene (hERG) inhibition at 1uM of 8.340 %."} {"text":"The InChI InChI=1S\/C18H15BrN2O3S\/c1-12-8-14(19)6-7-16(12)25-11-18(23)24-10-17(22)21-15-5-3-2-4-13(15)9-20\/h2-8H,10-11H2,1H3,(H,21,22) is representing a molecule that has a human ether-à-go-go related gene (hERG) inhibition at a concentration of 1uM of 11.783 %."}", "/scratch/micpie/export/herg_central_at_1uM/valid_0-10.jsonl": "{"text":"User: I want to generate a molecule InChI.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should have a hERG inhibition at a concentration of 1uM of 10.226 %.\nAssistant: Got it, here you go, this InChI represents a molecule that has a hERG inhibition at a concentration of 1uM of 10.226 %: InChI=1S\/C16H18N2O4S2\/c1-20-12-4-3-11(13(10-12)21-2)9-14-15(19)18(16(23)24-14)17-5-7-22-8-6-17\/h3-4,9-10H,5-8H2,1-2H3\/b14-9-"} {"text":"User: I want to generate a InChI.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should have a hERG inhibition at 1uM of 9.122 %.\nAssistant: Got it, here you go, this InChI represents a molecule that has a hERG inhibition at 1uM of 9.122 %: InChI=1S\/C17H19NO3\/c19-17(18-10-4-1-5-11-18)7-3-2-6-14-8-9-15-16(12-14)21-13-20-15\/h2-3,6-9,12H,1,4-5,10-11,13H2\/b6-2+,7-3+"}", "/scratch/micpie/export/herg_central_at_1uM/train_0-6.jsonl": "{"text":"Task: Please generate a DeepSMILES based on the text description.\nDescription: A molecule that has a hERG inhibition at a concentration of 1uM of 21.802 %.\nResult: COcccOC))cccc=O)oc6c%10CCC=O)NCCOCC6))))))))ccccNC)C))cc6"} {"text":"Task: Please generate a DeepSMILES based on the description.\nDescription: A molecule that has a hERG inhibition at 1uM of -6.282 %.\nResult: COccccc-nnncc=O)nCC=O)NCCCcccccc6%10))))))))))))cnc69)))))))))c6"}", "/scratch/micpie/export/herg_central_at_1uM/valid_0-6.jsonl": "{"text":"Task: Please generate a canonical SMILES based on the description.\nDescription: A molecule that has a human ether-à-go-go related gene (hERG) inhibition at 1uM of 10.226 %.\nResult: COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1"} {"text":"Task: Please generate a molecule InChI based on the text description below.\nDescription: A molecule that has a human ether-à-go-go related gene (hERG) inhibition at a concentration of 1uM of 9.122 %.\nResult: InChI=1S\/C17H19NO3\/c19-17(18-10-4-1-5-11-18)7-3-2-6-14-8-9-15-16(12-14)21-13-20-15\/h2-3,6-9,12H,1,4-5,10-11,13H2\/b6-2+,7-3+"}", "/scratch/micpie/export/herg_central_at_1uM/test_0-9.jsonl": "{"text":"User: I'm searching for the DeepSMILES of a molecule that has a human ether-à-go-go related gene (hERG) inhibition at 1uM of 8.340 %.\nAssistant: This is a molecule that has a human ether-à-go-go related gene (hERG) inhibition at 1uM of 8.340 %: COcccccNC=O)CCC=CCO5)C6C=O)NCCCNCCCCC6)))))))))C5C=O)NCCCCCC6)))))))))))))))))))c6"} {"text":"User: I'm looking for the SELFIES of a molecule that has a hERG inhibition at 1uM of 11.783 %.\nAssistant: 
This is a molecule that has a hERG inhibition at 1uM of 11.783 %: [C][C][=C][C][Branch1][C][Br][=C][C][=C][Ring1][#Branch1][S][C][C][=Branch1][C][=O][O][C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][#N]"}", "/scratch/micpie/export/herg_central_at_1uM/test_0-0.jsonl": "{"text":"The molecule with the SELFIES representation of [C][O][C][=C][C][=C][C][Branch2][Branch1][Branch1][N][C][=Branch1][C][=O][C][C][C][=C][C][Branch1][Ring2][O][Ring1][Branch1][C][Ring1][#Branch1][C][=Branch1][C][=O][N][Branch1][N][C][C][C][N][C][C][C][C][C][Ring1][=Branch1][C][Ring1][S][C][=Branch1][C][=O][N][C][C][C][C][C][C][Ring1][=Branch1][=C][Ring2][Ring2][=Branch1] has a hERG inhibition at a concentration of 1uM of 8.340 %."} {"text":"The molecule with the InChI representation of InChI=1S\/C18H15BrN2O3S\/c1-12-8-14(19)6-7-16(12)25-11-18(23)24-10-17(22)21-15-5-3-2-4-13(15)9-20\/h2-8H,10-11H2,1H3,(H,21,22) has a human ether-à-go-go related gene (hERG) inhibition at 1uM of 11.783 %."}", "/scratch/micpie/export/herg_central_at_1uM/valid_0-7.jsonl": "{"text":"User: Can you derive the hERG inhibition at 1uM in % of the molecule with the SMILES COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1?\nAssistant: Of course, this molecule has a hERG inhibition at 1uM of 10.226 %."} {"text":"User: Can you estimate the hERG inhibition at a concentration of 1uM in % of the molecule with the InChI InChI=1S\/C17H19NO3\/c19-17(18-10-4-1-5-11-18)7-3-2-6-14-8-9-15-16(12-14)21-13-20-15\/h2-3,6-9,12H,1,4-5,10-11,13H2\/b6-2+,7-3+?\nAssistant: Yes, this molecule has a hERG inhibition at a concentration of 1uM of 9.122 %."}", "/scratch/micpie/export/herg_central_at_1uM/test_0-3.jsonl": "{"text":"The molecule with the SMILES COc1cccc(NC(=O)C2C3C=CC4(O3)C2C(=O)N(CCCN2CCCCC2)C4C(=O)NC2CCCCC2)c1 has a hERG inhibition at a concentration of 1uM of 8.340 %."} {"text":"The molecule with the SMILES Cc1cc(Br)ccc1SCC(=O)OCC(=O)Nc1ccccc1C#N has a human ether-à-go-go related gene (hERG) inhibition at 1uM of 
11.783 %."}", "/scratch/micpie/export/herg_central_at_1uM/valid_0-11.jsonl": "{"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should have a human ether-à-go-go related gene (hERG) inhibition at 1uM of 10.226 %.\nAssistant: Got it, this canonical SMILES represents a molecule that has a human ether-à-go-go related gene (hERG) inhibition at 1uM of 10.226 %: COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1"} {"text":"User: I want to come up with a SELFIES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should have a hERG inhibition at a concentration of 1uM of 9.122 %.\nAssistant: Got it, this SELFIES represents a molecule that has a hERG inhibition at a concentration of 1uM of 9.122 %: [O][=C][Branch2][Ring1][Ring2][\/C][=C][\/C][=C][\/C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O][Ring1][=Branch1][N][C][C][C][C][C][Ring1][=Branch1]"}", "/scratch/micpie/export/herg_central_at_1uM/train_0-0.jsonl": "{"text":"The molecule with the SELFIES representation of [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][=C][C][=Branch1][C][=O][O][C][Ring1][#Branch1][=C][Ring1][=N][C][Branch1][=C][C][C][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][Branch1][=Branch1][N][Branch1][C][C][C][C][=C][Ring1][=Branch2] has a hERG inhibition at 1uM of 21.802 %."} {"text":"The molecule with the SMILES COc1cccc(-n2nnc3c(=O)n(CC(=O)N4CCCc5ccccc54)cnc32)c1 has a human ether-à-go-go related gene (hERG) inhibition at 1uM of -6.282 %."}", "/scratch/micpie/export/herg_central_at_1uM/test_0-6.jsonl": "{"text":"Task: Please generate a canonical SMILES based on the description below.\nDescription: A molecule that has a hERG inhibition at a concentration of 1uM of 8.340 %.\nResult: COc1cccc(NC(=O)C2C3C=CC4(O3)C2C(=O)N(CCCN2CCCCC2)C4C(=O)NC2CCCCC2)c1"} {"text":"Task: Please generate a SMILES based on the description 
below.\nDescription: A molecule that has a human ether-à-go-go related gene (hERG) inhibition at 1uM of 11.783 %.\nResult: Cc1cc(Br)ccc1SCC(=O)OCC(=O)Nc1ccccc1C#N"}", "/scratch/micpie/export/herg_central_at_1uM/train_0-10.jsonl": "{"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should have a hERG inhibition at 1uM of 21.802 %.\nAssistant: Got it, this SELFIES represents a molecule that has a hERG inhibition at 1uM of 21.802 %: [C][O][C][=C][C][Branch1][Ring1][O][C][=C][C][=C][C][=Branch1][C][=O][O][C][Ring1][#Branch1][=C][Ring1][=N][C][Branch1][=C][C][C][=Branch1][C][=O][N][C][C][O][C][C][Ring1][=Branch1][C][=C][C][=C][Branch1][=Branch1][N][Branch1][C][C][C][C][=C][Ring1][=Branch2]"} {"text":"User: I want to generate a molecule InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should have a human ether-à-go-go related gene (hERG) inhibition at a concentration of 1uM of -6.282 %.\nAssistant: Got it, this InChI represents a molecule that has a human ether-à-go-go related gene (hERG) inhibition at a concentration of 1uM of -6.282 %: InChI=1S\/C22H20N6O3\/c1-31-17-9-4-8-16(12-17)28-21-20(24-25-28)22(30)26(14-23-21)13-19(29)27-11-5-7-15-6-2-3-10-18(15)27\/h2-4,6,8-10,12,14H,5,7,11,13H2,1H3"}", "/scratch/micpie/export/herg_central_at_1uM/train_0-3.jsonl": "{"text":"The molecule with the SMILES COc1cc(OC)c2ccc(=O)oc2c1C(CC(=O)N1CCOCC1)c1ccc(N(C)C)cc1 has a hERG inhibition at a concentration of 1uM of 21.802 %."} {"text":"The molecule with the DeepSMILES COccccc-nnncc=O)nCC=O)NCCCcccccc6%10))))))))))))cnc69)))))))))c6 has a human ether-à-go-go related gene (hERG) inhibition at 1uM of -6.282 %."}", "/scratch/micpie/export/herg_central_at_1uM/valid_0-2.jsonl": "{"text":"The canonical SMILES COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1 represents a molecule 
with a human ether-à-go-go related gene (hERG) inhibition at a concentration of 1uM of 10.226 %."} {"text":"The DeepSMILES O=C\/C=C\/C=C\/cccccc6)OCO5))))))))))))NCCCCC6 is representing a molecule that has a human ether-à-go-go related gene (hERG) inhibition at a concentration of 1uM of 9.122 %."}", "/scratch/micpie/export/herg_central_at_1uM/valid_0-1.jsonl": "{"text":"Based on the DeepSMILES COcccc\/C=C\\SC=S)NNCCOCC6))))))C5=O)))))))cOC))c6, the molecule has a human ether-à-go-go related gene (hERG) inhibition at a concentration of 1uM of 10.226 %."} {"text":"Based on the canonical SMILES representation of O=C(\/C=C\/C=C\/c1ccc2c(c1)OCO2)N1CCCCC1, the molecule has a human ether-à-go-go related gene (hERG) inhibition at 1uM of 9.122 %."}", "/scratch/micpie/export/herg_central_at_1uM/valid_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hERG inhibition at a concentration of 1uM in %.\nInChI: InChI=1S\/C16H18N2O4S2\/c1-20-12-4-3-11(13(10-12)21-2)9-14-15(19)18(16(23)24-14)17-5-7-22-8-6-17\/h3-4,9-10H,5-8H2,1-2H3\/b14-9-\nConstraint: Even if you are not sure, you must answer with a numeric value in % without the unit and without using any additional words.\nResult: 10.226"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the human ether-à-go-go related gene (hERG) inhibition at a concentration of 1uM in %.\nMolecule DeepSMILES: O=C\/C=C\/C=C\/cccccc6)OCO5))))))))))))NCCCCC6\nConstraint: Even if you are uncertain, you must answer with a numeric value in % without the unit and without using any other words.\nResult: 9.122"}", "/scratch/micpie/export/herg_central_at_1uM/valid_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hERG inhibition at 1uM in %.\nMolecule InChI: 
InChI=1S\/C16H18N2O4S2\/c1-20-12-4-3-11(13(10-12)21-2)9-14-15(19)18(16(23)24-14)17-5-7-22-8-6-17\/h3-4,9-10H,5-8H2,1-2H3\/b14-9-\nConstraint: Even if you are not sure, you must answer with a numeric value in % without using any other words.\nResult: 10.226 %"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hERG inhibition at 1uM in %.\nMolecule SELFIES: [O][=C][Branch2][Ring1][Ring2][\/C][=C][\/C][=C][\/C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O][Ring1][=Branch1][N][C][C][C][C][C][Ring1][=Branch1]\nConstraint: Even if you are not sure, you must answer with a numeric value in % without using any additional words.\nResult: 9.122 %"}", "/scratch/micpie/export/herg_central_at_1uM/train_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hERG inhibition at a concentration of 1uM in %.\nSMILES: COc1cc(OC)c2ccc(=O)oc2c1C(CC(=O)N1CCOCC1)c1ccc(N(C)C)cc1\nConstraint: Even if you are not sure, you must answer with a numeric value in % without the unit and without using any other words.\nResult: 21.802"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hERG inhibition at 1uM in %.\nSMILES: COc1cccc(-n2nnc3c(=O)n(CC(=O)N4CCCc5ccccc54)cnc32)c1\nConstraint: Even if you are not sure, you must answer with a numeric value in % without the unit and without using any additional words.\nResult: -6.282"}", "/scratch/micpie/export/herg_central_at_1uM/train_0-2.jsonl": "{"text":"The InChI InChI=1S\/C26H30N2O6\/c1-27(2)18-7-5-17(6-8-18)20(15-23(29)28-11-13-33-14-12-28)25-22(32-4)16-21(31-3)19-9-10-24(30)34-26(19)25\/h5-10,16,20H,11-15H2,1-4H3 represents a molecule that has a human ether-à-go-go related gene (hERG) inhibition at 1uM of 21.802 %."} {"text":"The SMILES COc1cccc(-n2nnc3c(=O)n(CC(=O)N4CCCc5ccccc54)cnc32)c1 represents a molecule with a human ether-à-go-go related gene (hERG) inhibition at 1uM 
of -6.282 %."}", "/scratch/micpie/export/herg_central_at_1uM/test_0-11.jsonl": "{"text":"User: I want to create a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should have a human ether-à-go-go related gene (hERG) inhibition at 1uM of 8.340 %.\nAssistant: Ok, this canonical SMILES represents a molecule that has a human ether-à-go-go related gene (hERG) inhibition at 1uM of 8.340 %: COc1cccc(NC(=O)C2C3C=CC4(O3)C2C(=O)N(CCCN2CCCCC2)C4C(=O)NC2CCCCC2)c1"} {"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should have a human ether-à-go-go related gene (hERG) inhibition at a concentration of 1uM of 11.783 %.\nAssistant: Ok, this SMILES represents a molecule that has a human ether-à-go-go related gene (hERG) inhibition at a concentration of 1uM of 11.783 %: Cc1cc(Br)ccc1SCC(=O)OCC(=O)Nc1ccccc1C#N"}", "/scratch/micpie/export/herg_central_at_1uM/train_0-7.jsonl": "{"text":"User: Can you derive the hERG inhibition at 1uM in % of the molecule with the DeepSMILES COcccOC))cccc=O)oc6c%10CCC=O)NCCOCC6))))))))ccccNC)C))cc6?\nAssistant: Sure, this molecule has a hERG inhibition at 1uM of 21.802 %."} {"text":"User: Can you derive the human ether-à-go-go related gene (hERG) inhibition at a concentration of 1uM in % of the molecule with the InChI InChI=1S\/C22H20N6O3\/c1-31-17-9-4-8-16(12-17)28-21-20(24-25-28)22(30)26(14-23-21)13-19(29)27-11-5-7-15-6-2-3-10-18(15)27\/h2-4,6,8-10,12,14H,5,7,11,13H2,1H3?\nAssistant: Sure, this molecule has a human ether-à-go-go related gene (hERG) inhibition at a concentration of 1uM of -6.282 %."}", "/scratch/micpie/export/herg_central_at_1uM/train_0-11.jsonl": "{"text":"User: I want to generate a SMILES.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should have a hERG inhibition at 1uM of 21.802 %.\nAssistant: Understood, this SMILES represents a molecule that has a hERG inhibition at 1uM of 21.802 %: COc1cc(OC)c2ccc(=O)oc2c1C(CC(=O)N1CCOCC1)c1ccc(N(C)C)cc1"} {"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should have a human ether-à-go-go related gene (hERG) inhibition at a concentration of 1uM of -6.282 %.\nAssistant: Understood, this canonical SMILES represents a molecule that has a human ether-à-go-go related gene (hERG) inhibition at a concentration of 1uM of -6.282 %: COc1cccc(-n2nnc3c(=O)n(CC(=O)N4CCCc5ccccc54)cnc32)c1"}", "/scratch/micpie/export/herg_central_at_1uM/train_0-1.jsonl": "{"text":"Based on the InChI InChI=1S\/C26H30N2O6\/c1-27(2)18-7-5-17(6-8-18)20(15-23(29)28-11-13-33-14-12-28)25-22(32-4)16-21(31-3)19-9-10-24(30)34-26(19)25\/h5-10,16,20H,11-15H2,1-4H3, the molecule has a hERG inhibition at a concentration of 1uM of 21.802 %."} {"text":"Based on the DeepSMILES COccccc-nnncc=O)nCC=O)NCCCcccccc6%10))))))))))))cnc69)))))))))c6, the molecule has a hERG inhibition at a concentration of 1uM of -6.282 %."}", "/scratch/micpie/export/herg_central_at_1uM/train_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hERG inhibition at 1uM in %.\nMolecule SMILES: COc1cc(OC)c2ccc(=O)oc2c1C(CC(=O)N1CCOCC1)c1ccc(N(C)C)cc1\nConstraint: Even if you are uncertain, you must answer with a numeric value in % without using any additional words.\nResult: 21.802 %"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hERG inhibition at a concentration of 1uM in %.\nMolecule DeepSMILES: COccccc-nnncc=O)nCC=O)NCCCcccccc6%10))))))))))))cnc69)))))))))c6\nConstraint: Even if you are uncertain, you must answer with a numeric value in % without 
using any additional words.\nResult: -6.282 %"}", "/scratch/micpie/export/herg_central_at_1uM/test_0-7.jsonl": "{"text":"User: Can you tell me the hERG inhibition at 1uM in % of the molecule with the SMILES COc1cccc(NC(=O)C2C3C=CC4(O3)C2C(=O)N(CCCN2CCCCC2)C4C(=O)NC2CCCCC2)c1?\nAssistant: Sure, this molecule has a hERG inhibition at 1uM of 8.340 %."} {"text":"User: Can you tell me the human ether-à-go-go related gene (hERG) inhibition at a concentration of 1uM in % of the molecule with the DeepSMILES CcccBr)ccc6SCC=O)OCC=O)Ncccccc6C#N?\nAssistant: Yes, this molecule has a human ether-à-go-go related gene (hERG) inhibition at a concentration of 1uM of 11.783 %."}", "/scratch/micpie/export/herg_central_at_1uM/train_0-9.jsonl": "{"text":"User: I'm searching for the SMILES of a molecule that has a human ether-à-go-go related gene (hERG) inhibition at a concentration of 1uM of 21.802 %.\nAssistant: This is a molecule that has a human ether-à-go-go related gene (hERG) inhibition at a concentration of 1uM of 21.802 %: COc1cc(OC)c2ccc(=O)oc2c1C(CC(=O)N1CCOCC1)c1ccc(N(C)C)cc1"} {"text":"User: I'm searching for the SMILES of a molecule that has a human ether-à-go-go related gene (hERG) inhibition at 1uM of -6.282 %.\nAssistant: This is a molecule that has a human ether-à-go-go related gene (hERG) inhibition at 1uM of -6.282 %: COc1cccc(-n2nnc3c(=O)n(CC(=O)N4CCCc5ccccc54)cnc32)c1"}", "/scratch/micpie/export/herg_central_at_1uM/valid_0-3.jsonl": "{"text":"The molecule with the canonical SMILES COc1ccc(\/C=C2\\SC(=S)N(N3CCOCC3)C2=O)c(OC)c1 has a hERG inhibition at a concentration of 1uM of 10.226 %."} {"text":"The molecule with the canonical SMILES O=C(\/C=C\/C=C\/c1ccc2c(c1)OCO2)N1CCCCC1 has a hERG inhibition at a concentration of 1uM of 9.122 %."}", "/scratch/micpie/export/herg_central_at_1uM/test_0-8.jsonl": "{"text":"User: Can you give me the DeepSMILES of a molecule that has a human ether-à-go-go related gene (hERG) inhibition at a concentration of 1uM of 8.340 
%?\nAssistant: Yes, here you go: COcccccNC=O)CCC=CCO5)C6C=O)NCCCNCCCCC6)))))))))C5C=O)NCCCCCC6)))))))))))))))))))c6"} {"text":"User: Can you generate the DeepSMILES of a molecule that has a human ether-à-go-go related gene (hERG) inhibition at 1uM of 11.783 %?\nAssistant: Yes, I'm happy to help, here you go: CcccBr)ccc6SCC=O)OCC=O)Ncccccc6C#N"}", "/scratch/micpie/export/herg_central_at_1uM/test_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hERG inhibition at a concentration of 1uM in %.\nMolecule SELFIES: [C][O][C][=C][C][=C][C][Branch2][Branch1][Branch1][N][C][=Branch1][C][=O][C][C][C][=C][C][Branch1][Ring2][O][Ring1][Branch1][C][Ring1][#Branch1][C][=Branch1][C][=O][N][Branch1][N][C][C][C][N][C][C][C][C][C][Ring1][=Branch1][C][Ring1][S][C][=Branch1][C][=O][N][C][C][C][C][C][C][Ring1][=Branch1][=C][Ring2][Ring2][=Branch1]\nConstraint: Even if you are not sure, you must answer with a numeric value in % without using any additional words.\nResult: 8.340 %"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the hERG inhibition at 1uM in %.\nMolecule SMILES: Cc1cc(Br)ccc1SCC(=O)OCC(=O)Nc1ccccc1C#N\nConstraint: Even if you are not sure, you must answer with a numeric value in % without using any additional words.\nResult: 11.783 %"}", "/scratch/micpie/export/drug_induced_liver_injury/test_0-10.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that is not showing a fatal liver disease caused by drugs.\nAssistant: This is a molecule that is not causing a fatal liver disease caused by drugs: O=C(NC(CO)C(O)c1ccc([N+](=O)[O-])cc1)C(Cl)Cl"} {"text":"User: I'm looking for the canonical SMILES of a molecule that is showing a drug-induced liver injury (DILI).\nAssistant: This is a molecule that is causing a drug-induced liver injury (DILI): CS(=O)(=O)Nc1ccc([N+](=O)[O-])cc1Oc1ccccc1"}", 
"/scratch/micpie/export/drug_induced_liver_injury/valid_0-8.jsonl": "{"text":"User: Is the molecule with the canonical SMILES NC(=O)c1cnccn1 causing a drug-induced liver injury?\nAssistant: Yes, it is causing a drug-induced liver injury."} {"text":"User: Is the molecule with the SELFIES [N][C][=C][C][=C][Branch1][S][S][=Branch1][C][=O][=Branch1][C][=O][N][C][=N][C][=C][S][Ring1][Branch1][C][=C][Ring1][#C] causing a drug-induced liver injury?\nAssistant: Yes, it is causing a drug-induced liver injury."}", "/scratch/micpie/export/drug_induced_liver_injury/train_0-8.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C5H11NO2\/c1-6(2,3)4-5(7)8\/h4H2,1-3H3 causing a fatal liver disease caused by drugs?\nAssistant: No, it is not causing a fatal liver disease caused by drugs."} {"text":"User: Is the molecule with the SELFIES [C][#C][C][Branch1][C][O][C][C][C][C][C][C][C][=C][C][=Branch1][C][=O][C][C][C][Ring1][#Branch1][C][Ring1][O][C][C][C][Ring1][#C][Ring2][Ring1][Ring1][C] causing a drug-induced liver injury (DILI)?\nAssistant: No, it is not causing a drug-induced liver injury (DILI)."}", "/scratch/micpie/export/drug_induced_liver_injury/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that causes drug-induced liver injury (DILI).\nDeepSMILES: O=CNCCO))CO)cccc[N+]=O)[O-]))cc6)))))))))CCl)Cl\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not causing drug-induced liver injury (DILI)."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that shows drug-induced liver injury.\nInChI: InChI=1S\/C13H12N2O5S\/c1-21(18,19)14-12-8-7-10(15(16)17)9-13(12)20-11-5-3-2-4-6-11\/h2-9,14H,1H3\nConstraint: Answer the question in a full sentence.\nResult: This molecule is causing drug-induced liver injury."}", "/scratch/micpie/export/drug_induced_liver_injury/valid_0-9.jsonl": "{"text":"User: Can you give me the SMILES of a molecule that is 
showing a drug-induced liver injury?\nAssistant: Yes, here you go: NC(=O)c1cnccn1"} {"text":"User: Can you create the canonical SMILES of a molecule that is showing a drug-induced liver injury?\nAssistant: Yes, I'm happy to help, here you go: Nc1ccc(S(=O)(=O)Nc2nccs2)cc1"}", "/scratch/micpie/export/drug_induced_liver_injury/test_0-1.jsonl": "{"text":"Based on the InChI representation InChI=1S\/C11H12Cl2N2O5\/c12-10(13)11(18)14-8(5-16)9(17)6-1-3-7(4-2-6)15(19)20\/h1-4,8-10,16-17H,5H2,(H,14,18), the molecule causes no fatal liver disease caused by drugs."} {"text":"Based on the canonical SMILES representation CS(=O)(=O)Nc1ccc([N+](=O)[O-])cc1Oc1ccccc1, the molecule causes drug-induced liver injury."}", "/scratch/micpie/export/drug_induced_liver_injury/valid_0-0.jsonl": "{"text":"The molecule with the SELFIES [N][C][=Branch1][C][=O][C][=C][N][=C][C][=N][Ring1][=Branch1] displays drug-induced liver injury."} {"text":"The molecule with the InChI InChI=1S\/C9H9N3O2S2\/c10-7-1-3-8(4-2-7)16(13,14)12-9-11-5-6-15-9\/h1-6H,10H2,(H,11,12) shows drug-induced liver injury (DILI)."}", "/scratch/micpie/export/drug_induced_liver_injury/test_0-2.jsonl": "{"text":"The DeepSMILES O=CNCCO))CO)cccc[N+]=O)[O-]))cc6)))))))))CCl)Cl is from a molecule that is not identified as causing a drug-induced liver injury (DILI)."} {"text":"The SELFIES [C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][=C][C][=C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][C][=C][Ring1][=Branch2][O][C][=C][C][=C][C][=C][Ring1][=Branch1] represents a molecule that is identified as causing a drug-induced liver injury (DILI)."}", "/scratch/micpie/export/drug_induced_liver_injury/valid_0-10.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that is showing a fatal liver disease caused by drugs.\nAssistant: This is a molecule that is causing a fatal liver disease caused by drugs: NC(=O)c1cnccn1"} {"text":"User: I'm looking for the canonical SMILES of a molecule that is causing a drug-induced liver 
injury.\nAssistant: This is a molecule that is causing a drug-induced liver injury: Nc1ccc(S(=O)(=O)Nc2nccs2)cc1"}", "/scratch/micpie/export/drug_induced_liver_injury/train_0-6.jsonl": "{"text":"Task: Please give me a molecule canonical SMILES based on the text description below.\nDescription: A molecule that causes fatal liver disease caused by drugs.\nResult: C[N+](C)(C)CC(=O)[O-]"} {"text":"Task: Please give me a molecule canonical SMILES based on the text description.\nDescription: A molecule that causes fatal liver disease caused by drugs.\nResult: C#CC1(O)CCC2C3CCC4=CC(=O)CCC4C3CCC21C"}", "/scratch/micpie/export/drug_induced_liver_injury/valid_0-6.jsonl": "{"text":"Task: Please create a molecule SMILES based on the description.\nDescription: A molecule that causes drug-induced liver injury (DILI).\nResult: NC(=O)c1cnccn1"} {"text":"Task: Please give me a SMILES based on the text description below.\nDescription: A molecule that causes fatal liver disease caused by drugs.\nResult: Nc1ccc(S(=O)(=O)Nc2nccs2)cc1"}", "/scratch/micpie/export/drug_induced_liver_injury/test_0-9.jsonl": "{"text":"User: Can you give me the SELFIES of a molecule that is not causing a drug-induced liver injury?\nAssistant: Yes, I'm happy to help, here you go: [O][=C][Branch2][Ring1][#Branch2][N][C][Branch1][Ring1][C][O][C][Branch1][C][O][C][=C][C][=C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][C][=C][Ring1][=Branch2][C][Branch1][C][Cl][Cl]"} {"text":"User: Can you create the SMILES of a molecule that is causing a drug-induced liver injury?\nAssistant: Sure, here you go: CS(=O)(=O)Nc1ccc([N+](=O)[O-])cc1Oc1ccccc1"}", "/scratch/micpie/export/drug_induced_liver_injury/test_0-0.jsonl": "{"text":"The molecule with the InChI representation of InChI=1S\/C11H12Cl2N2O5\/c12-10(13)11(18)14-8(5-16)9(17)6-1-3-7(4-2-6)15(19)20\/h1-4,8-10,16-17H,5H2,(H,14,18) shows no drug-induced liver injury (DILI)."} {"text":"The molecule with the DeepSMILES representation of CS=O)=O)Ncccc[N+]=O)[O-]))cc6Occcccc6 
displays drug-induced liver injury."}", "/scratch/micpie/export/drug_induced_liver_injury/valid_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the SMILES NC(=O)c1cnccn1 causes a fatal liver disease caused by drugs?\nAssistant: Yes, this molecule is causing a fatal liver disease caused by drugs."} {"text":"User: Can you tell me if the molecule with the InChI InChI=1S\/C9H9N3O2S2\/c10-7-1-3-8(4-2-7)16(13,14)12-9-11-5-6-15-9\/h1-6H,10H2,(H,11,12) shows a fatal liver disease caused by drugs?\nAssistant: Yes, this molecule is causing a fatal liver disease caused by drugs."}", "/scratch/micpie/export/drug_induced_liver_injury/test_0-3.jsonl": "{"text":"The molecule DeepSMILES O=CNCCO))CO)cccc[N+]=O)[O-]))cc6)))))))))CCl)Cl is causing no drug-induced liver injury."} {"text":"The SELFIES [C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][=C][C][=C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][C][=C][Ring1][=Branch2][O][C][=C][C][=C][C][=C][Ring1][=Branch1] is causing drug-induced liver injury (DILI)."}", "/scratch/micpie/export/drug_induced_liver_injury/valid_0-11.jsonl": "{"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should be causing a drug-induced liver injury (DILI).\nAssistant: Got it, this SELFIES is causing a drug-induced liver injury (DILI): [N][C][=Branch1][C][=O][C][=C][N][=C][C][=N][Ring1][=Branch1]"} {"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should be causing a drug-induced liver injury.\nAssistant: Got it, this SMILES is causing a drug-induced liver injury: Nc1ccc(S(=O)(=O)Nc2nccs2)cc1"}", "/scratch/micpie/export/drug_induced_liver_injury/train_0-0.jsonl": "{"text":"The molecule with the SELFIES [C][N+1][Branch1][C][C][Branch1][C][C][C][C][=Branch1][C][=O][O-1] displays no drug-induced liver injury (DILI)."} {"text":"The molecule with the InChI InChI=1S\/C20H26O2\/c1-3-20(22)11-9-18-17-6-4-13-12-14(21)5-7-15(13)16(17)8-10-19(18,20)2\/h1,12,15-18,22H,4-11H2,2H3 causes no drug-induced liver injury (DILI)."}", "/scratch/micpie/export/drug_induced_liver_injury/test_0-6.jsonl": "{"text":"Task: Please give me a canonical SMILES based on the text description.\nDescription: A molecule that causes fatal liver disease caused by drugs.\nResult: O=C(NC(CO)C(O)c1ccc([N+](=O)[O-])cc1)C(Cl)Cl"} {"text":"Task: Please create a SMILES based on the description.\nDescription: A molecule that shows fatal liver disease caused by drugs.\nResult: CS(=O)(=O)Nc1ccc([N+](=O)[O-])cc1Oc1ccccc1"}", "/scratch/micpie/export/drug_induced_liver_injury/train_0-10.jsonl": "{"text":"User: I'm searching for the DeepSMILES of a molecule that is not showing a drug-induced liver injury (DILI).\nAssistant: This is a molecule that is not causing a drug-induced liver injury (DILI): C[N+]C)C)CC=O)[O-]"} {"text":"User: I'm searching for the canonical SMILES of a molecule that is not showing a drug-induced liver injury.\nAssistant: This is a molecule that is not causing a drug-induced liver injury: C#CC1(O)CCC2C3CCC4=CC(=O)CCC4C3CCC21C"}", "/scratch/micpie/export/drug_induced_liver_injury/train_0-3.jsonl": "{"text":"The molecule DeepSMILES C[N+]C)C)CC=O)[O-] is causing no drug-induced liver injury."} {"text":"The molecule SELFIES [C][#C][C][Branch1][C][O][C][C][C][C][C][C][C][=C][C][=Branch1][C][=O][C][C][C][Ring1][#Branch1][C][Ring1][O][C][C][C][Ring1][#C][Ring2][Ring1][Ring1][C] is causing no drug-induced liver injury."}", 
"/scratch/micpie/export/drug_induced_liver_injury/train_0-12.jsonl": "{"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be causing a drug-induced liver injury (DILI).\nAssistant: Ok, this DeepSMILES is not causing a drug-induced liver injury (DILI): C[N+]C)C)CC=O)[O-]"} {"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be causing a fatal liver disease caused by drugs.\nAssistant: Ok, this SMILES is not causing a fatal liver disease caused by drugs: C#CC1(O)CCC2C3CCC4=CC(=O)CCC4C3CCC21C"}", "/scratch/micpie/export/drug_induced_liver_injury/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of O=C(NC(CO)C(O)c1ccc([N+](=O)[O-])cc1)C(Cl)Cl causing a drug-induced liver injury?\nConstraint: Even if you are uncertain, you must pick either a or b without using any other words.\nOptions:\n[a] False\n[b] True\nAnswer: a"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES CS(=O)(=O)Nc1ccc([N+](=O)[O-])cc1Oc1ccccc1 causing a fatal liver disease caused by drugs?\nConstraint: Even if you are not sure, you must pick either A or B without using any additional words.\nOptions:\nA: True\nB: False\nAnswer: A"}", "/scratch/micpie/export/drug_induced_liver_injury/valid_0-2.jsonl": "{"text":"The canonical SMILES NC(=O)c1cnccn1 is from a molecule that is identified as causing a drug-induced liver injury (DILI)."} {"text":"The InChI InChI=1S\/C9H9N3O2S2\/c10-7-1-3-8(4-2-7)16(13,14)12-9-11-5-6-15-9\/h1-6H,10H2,(H,11,12) represents a molecule that is identified as causing a drug-induced liver injury."}", "/scratch/micpie/export/drug_induced_liver_injury/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice 
question.\nQuestion: Which molecules are not causing a fatal liver disease caused by drugs?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any other words.\nOptions:\n(1) COc1cccc2c1C(=O)c1c(O)c3c(c(O)c1C2=O)CC(O)(C(=O)CO)CC3OC1CC(N)C(O)C(C)O1\n(2) C[N+](C)(C)CC(=O)[O-]\n(3) CC(COc1ccccc1)N(CCCl)Cc1ccccc1\nAnswer: 2, 3"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not causing a drug-induced liver injury?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any other words.\nOptions:\n(1) COCC=O)OCCCNC)CCCcncccccc6[nH]9)))))))))))))))CCcccF)ccc6C%10CC)C\n(2) CCOC=O)Ccccccc6))))))CCNCCCC#N))cccccc6))))))cccccc6)))))))))CC6\n(3) C#CCO)CCCCCCC=CC=O)CCC6C%10CCC%14%17C\n(4) CCCCCCCC=CC=O)C=CC6C)C%10Cl)CO)CC%14C)C%17O)C=O)CO\nAnswer: 1, 2, 3, 4"}", "/scratch/micpie/export/drug_induced_liver_injury/valid_0-1.jsonl": "{"text":"Based on the InChI representation InChI=1S\/C5H5N3O\/c6-5(9)4-3-7-1-2-8-4\/h1-3H,(H2,6,9), the molecule causes drug-induced liver injury (DILI)."} {"text":"Based on the DeepSMILES representation NccccS=O)=O)Ncnccs5)))))))cc6, the molecule causes drug-induced liver injury."}", "/scratch/micpie/export/drug_induced_liver_injury/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI representation of InChI=1S\/C5H5N3O\/c6-5(9)4-3-7-1-2-8-4\/h1-3H,(H2,6,9) causing a drug-induced liver injury?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any additional words.\nOptions:\n1.) False\n2.) 
True\nAnswer: 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI InChI=1S\/C9H9N3O2S2\/c10-7-1-3-8(4-2-7)16(13,14)12-9-11-5-6-15-9\/h1-6H,10H2,(H,11,12) causing a drug-induced liver injury (DILI)?\nConstraint: Even if you are uncertain, you must pick either a or b without using any other words.\nOptions:\na. True\nb. False\nAnswer: a"}", "/scratch/micpie/export/drug_induced_liver_injury/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that causes drug-induced liver injury (DILI).\nInChI: InChI=1S\/C5H5N3O\/c6-5(9)4-3-7-1-2-8-4\/h1-3H,(H2,6,9)\nConstraint: Answer the question in a full sentence.\nResult: This molecule is causing drug-induced liver injury (DILI)."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that causes drug-induced liver injury.\nDeepSMILES: NccccS=O)=O)Ncnccs5)))))))cc6\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is causing drug-induced liver injury."}", "/scratch/micpie/export/drug_induced_liver_injury/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that causes fatal liver disease caused by drugs.\nSMILES: NC(=O)c1cnccn1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: True"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that causes drug-induced liver injury (DILI).\nInChI: InChI=1S\/C9H9N3O2S2\/c10-7-1-3-8(4-2-7)16(13,14)12-9-11-5-6-15-9\/h1-6H,10H2,(H,11,12)\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"}", "/scratch/micpie/export/drug_induced_liver_injury/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that causes 
drug-induced liver injury (DILI).\nSMILES: C[N+](C)(C)CC(=O)[O-]\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not causing drug-induced liver injury (DILI)."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that shows fatal liver disease caused by drugs.\ncanonical SMILES: C#CC1(O)CCC2C3CCC4=CC(=O)CCC4C3CCC21C\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not causing fatal liver disease caused by drugs."}", "/scratch/micpie/export/drug_induced_liver_injury/valid_0-12.jsonl": "{"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should be causing a drug-induced liver injury (DILI).\nAssistant: Ok, this canonical SMILES is causing a drug-induced liver injury (DILI): NC(=O)c1cnccn1"} {"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should be causing a drug-induced liver injury.\nAssistant: Understood, this DeepSMILES is causing a drug-induced liver injury: NccccS=O)=O)Ncnccs5)))))))cc6"}", "/scratch/micpie/export/drug_induced_liver_injury/train_0-2.jsonl": "{"text":"The SELFIES [C][N+1][Branch1][C][C][Branch1][C][C][C][C][=Branch1][C][=O][O-1] is from a molecule that is not identified as causing a drug-induced liver injury (DILI)."} {"text":"The DeepSMILES C#CCO)CCCCCCC=CC=O)CCC6C%10CCC%14%17C represents a molecule that is not identified as causing a drug-induced liver injury."}", "/scratch/micpie/export/drug_induced_liver_injury/test_0-11.jsonl": "{"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should not be causing a drug-induced liver injury.\nAssistant: Ok, this DeepSMILES is not causing a drug-induced liver injury: O=CNCCO))CO)cccc[N+]=O)[O-]))cc6)))))))))CCl)Cl"} {"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should be causing a fatal liver disease caused by drugs.\nAssistant: Ok, this DeepSMILES is causing a fatal liver disease caused by drugs: CS=O)=O)Ncccc[N+]=O)[O-]))cc6Occcccc6"}", "/scratch/micpie/export/drug_induced_liver_injury/train_0-7.jsonl": "{"text":"User: Can you derive if the molecule with the SELFIES [C][N+1][Branch1][C][C][Branch1][C][C][C][C][=Branch1][C][=O][O-1] causes a drug-induced liver injury?\nAssistant: No, this molecule is not causing a drug-induced liver injury."} {"text":"User: Can you derive if the molecule with the DeepSMILES C#CCO)CCCCCCC=CC=O)CCC6C%10CCC%14%17C causes a drug-induced liver injury (DILI)?\nAssistant: No, this molecule is not causing a drug-induced liver injury (DILI)."}", "/scratch/micpie/export/drug_induced_liver_injury/train_0-11.jsonl": "{"text":"User: I want to generate a InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be causing a drug-induced liver injury (DILI).\nAssistant: Got it, this InChI is not causing a drug-induced liver injury (DILI): InChI=1S\/C5H11NO2\/c1-6(2,3)4-5(7)8\/h4H2,1-3H3"} {"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should not be causing a drug-induced liver injury (DILI).\nAssistant: Got it, this InChI is not causing a drug-induced liver injury (DILI): InChI=1S\/C20H26O2\/c1-3-20(22)11-9-18-17-6-4-13-12-14(21)5-7-15(13)16(17)8-10-19(18,20)2\/h1,12,15-18,22H,4-11H2,2H3"}", "/scratch/micpie/export/drug_induced_liver_injury/train_0-1.jsonl": "{"text":"Based on the SELFIES representation [C][N+1][Branch1][C][C][Branch1][C][C][C][C][=Branch1][C][=O][O-1], the molecule causes no drug-induced liver injury (DILI)."} {"text":"Based on the canonical SMILES representation C#CC1(O)CCC2C3CCC4=CC(=O)CCC4C3CCC21C, the molecule causes no drug-induced liver injury."}", "/scratch/micpie/export/drug_induced_liver_injury/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES representation of C[N+](C)(C)CC(=O)[O-] causing a drug-induced liver injury?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any other words.\nOptions:\n1. True\n2. False\nAnswer: 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES representation of C#CCO)CCCCCCC=CC=O)CCC6C%10CCC%14%17C causing a fatal liver disease caused by drugs?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any additional words.\nOptions:\n1. False\n2. 
True\nAnswer: 1"}", "/scratch/micpie/export/drug_induced_liver_injury/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that causes drug-induced liver injury.\nMolecule canonical SMILES: C[N+](C)(C)CC(=O)[O-]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that causes fatal liver disease caused by drugs.\nMolecule InChI: InChI=1S\/C20H26O2\/c1-3-20(22)11-9-18-17-6-4-13-12-14(21)5-7-15(13)16(17)8-10-19(18,20)2\/h1,12,15-18,22H,4-11H2,2H3\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/drug_induced_liver_injury/test_0-7.jsonl": "{"text":"User: Can you tell me if the molecule with the SELFIES [O][=C][Branch2][Ring1][#Branch2][N][C][Branch1][Ring1][C][O][C][Branch1][C][O][C][=C][C][=C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][C][=C][Ring1][=Branch2][C][Branch1][C][Cl][Cl] shows a drug-induced liver injury?\nAssistant: No, this molecule is not causing a drug-induced liver injury."} {"text":"User: Can you derive if the molecule with the SELFIES [C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][=C][C][=C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][C][=C][Ring1][=Branch2][O][C][=C][C][=C][C][=C][Ring1][=Branch1] causes a drug-induced liver injury?\nAssistant: Yes, this molecule is causing a drug-induced liver injury."}", "/scratch/micpie/export/drug_induced_liver_injury/train_0-9.jsonl": "{"text":"User: Can you give me the canonical SMILES of a molecule that is not causing a drug-induced liver injury?\nAssistant: Of course, here you go: C[N+](C)(C)CC(=O)[O-]"} {"text":"User: Can you generate the canonical SMILES of a molecule that is not showing a fatal liver disease caused by drugs?\nAssistant: Yes, here you go: 
C#CC1(O)CCC2C3CCC4=CC(=O)CCC4C3CCC21C"}", "/scratch/micpie/export/drug_induced_liver_injury/valid_0-3.jsonl": "{"text":"The molecule SELFIES [N][C][=Branch1][C][=O][C][=C][N][=C][C][=N][Ring1][=Branch1] is causing fatal liver disease caused by drugs."} {"text":"The molecule canonical SMILES Nc1ccc(S(=O)(=O)Nc2nccs2)cc1 is causing fatal liver disease caused by drugs."}", "/scratch/micpie/export/drug_induced_liver_injury/test_0-8.jsonl": "{"text":"User: Is the molecule with the SELFIES [O][=C][Branch2][Ring1][#Branch2][N][C][Branch1][Ring1][C][O][C][Branch1][C][O][C][=C][C][=C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][C][=C][Ring1][=Branch2][C][Branch1][C][Cl][Cl] causing a drug-induced liver injury (DILI)?\nAssistant: No, it is not causing a drug-induced liver injury (DILI)."} {"text":"User: Is the molecule with the DeepSMILES CS=O)=O)Ncccc[N+]=O)[O-]))cc6Occcccc6 causing a drug-induced liver injury?\nAssistant: Yes, it is causing a drug-induced liver injury."}", "/scratch/micpie/export/drug_induced_liver_injury/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not causing a drug-induced liver injury (DILI)?\nConstraint: You must select none, one or more options from 1 or 2 without using any other words.\nOptions:\n1.) O=C(NC(CO)C(O)c1ccc([N+](=O)[O-])cc1)C(Cl)Cl\n2.) 
CC(=O)Oc1ccc(C(=C2CCCCC2)c2ccc(OC(C)=O)cc2)cc1\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are causing a drug-induced liver injury?\nConstraint: You must select none, one or more options from A, B, C, or D without using any other words.\nOptions:\nA [C][C][Branch1][C][C][C][Branch1][#Branch1][C][=C][Branch1][C][Cl][Cl][C][Ring1][Branch2][C][=Branch1][C][=O][O][C][C][=C][C][=C][C][Branch1][#Branch2][O][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring1][=N]\nB [C][C][N][Branch1][Ring1][C][C][C][C][=C][C][Branch2][Ring1][Ring1][N][C][=C][C][=N][C][=C][C][Branch1][C][Cl][=C][C][=C][Ring1][O][Ring1][#Branch1][=C][C][=C][Ring2][Ring1][C][O]\nC [C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][=C][C][=C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][C][=C][Ring1][=Branch2][O][C][=C][C][=C][C][=C][Ring1][=Branch1]\nD [C][C][C][C][C][C][=Branch1][C][=O][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][N][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring2][Ring1][C][=O]\nAnswer: B, C, D"}", "/scratch/micpie/export/drug_induced_liver_injury/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are causing a drug-induced liver injury?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any additional words.\nOptions:\n1.) NC=O)ccnccn6\n2.) CCC=O)O))cccc-cccccc6))))))cF)c6\n3.) O=CO)C=CC=NNccccS=O)=O)Ncccccn6))))))))cc6))))))))C=CC6=O\nAnswer: 1, 2, 3"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are causing a drug-induced liver injury?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any other words.\nOptions:\nA. CNC(C)Cc1ccccc1\nB. Nc1ccc(S(=O)(=O)Nc2nccs2)cc1\nC. O=c1[nH]cc(F)c(=O)[nH]1\nD. CN(C)C1C(=O)C(C(N)=O)=C(O)C2(O)C(=O)C3=C(O)c4c(O)ccc(Cl)c4C(O)C3CC12\nE. 
Cc1cccc(Nc2ccccc2C(=O)O)c1C\nAnswer: B, C, D, E"}", "/scratch/micpie/export/drug_induced_liver_injury/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that causes drug-induced liver injury.\nSMILES: O=C(NC(CO)C(O)c1ccc([N+](=O)[O-])cc1)C(Cl)Cl\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that causes drug-induced liver injury (DILI).\nSMILES: CS(=O)(=O)Nc1ccc([N+](=O)[O-])cc1Oc1ccccc1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: True"}", "/scratch/micpie/export/drug_induced_liver_injury/test_0-12.jsonl": "{"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be causing a drug-induced liver injury (DILI).\nAssistant: Ok, this SELFIES is not causing a drug-induced liver injury (DILI): [O][=C][Branch2][Ring1][#Branch2][N][C][Branch1][Ring1][C][O][C][Branch1][C][O][C][=C][C][=C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][C][=C][Ring1][=Branch2][C][Branch1][C][Cl][Cl]"} {"text":"User: I want to generate a SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should be causing a drug-induced liver injury.\nAssistant: Ok, this SMILES is causing a drug-induced liver injury: CS(=O)(=O)Nc1ccc([N+](=O)[O-])cc1Oc1ccccc1"}", "/scratch/micpie/export/ncbi_disease/test_0-1.jsonl": "{"text":"User: Does the following text contain mentions of diseases? 
Can you output matches?\nText: Mammalian APC2, which closely resembles APC in overall domain structure, was functionally analyzed and shown to contain two SAMP domains, both of which are required for binding to conductin.\nAssistant: There is no match"} {"text":"User: Does the following text contain mentions of diseases? Can you return matches?\nFurthermore, compared with noncarriers, APC I1307K carriers had increased numbers of adenomas and colorectal cancers per patient (P =. 03), as well as a younger age at diagnosis.\nAssistant: I found colorectal cancers and adenomas"}", "/scratch/micpie/export/ncbi_disease/valid_0-0.jsonl": "{"text":"Task: Find all the mentions of diseases in the following text. Return the matching words. If there is no match, return `no match`.\nDescription: In colon carcinoma cells, loss of APC leads to the accumulation of betacatenin in the nucleus, where it binds to and activates the Tcf-4 transcription factor (reviewed in [ 1] [ 2]).\nAnswer: colon carcinoma"} {"text":"Task: Find all the mentions of diseases in the following text. Return the matching entities. If there is no match, return `no match`.\nDescription: Immunohistochemical staining of human breast specimens also revealed BRCA1 nuclear foci in benign breast, invasive lobular cancers and low-grade ductal carcinomas.\nAnswer: invasive lobular cancers and low-grade ductal carcinomas"}", "/scratch/micpie/export/ncbi_disease/test_0-0.jsonl": "{"text":"Task: Find all the mentions of diseases in the subsequent text. Return the matching words. If there is no matching entity, return `no match`.\nDescription: Mammalian APC2, which closely resembles APC in overall domain structure, was functionally analyzed and shown to contain two SAMP domains, both of which are required for binding to conductin.\nAnswer: no match"} {"text":"Task: Find all the mentions of diseases in the following text. Return the matching words. 
If there is no mention of a disease, return `no match`.\nSentence: Furthermore, compared with noncarriers, APC I1307K carriers had increased numbers of adenomas and colorectal cancers per patient (P =. 03), as well as a younger age at diagnosis.\nAnswer: colorectal cancers and adenomas"}", "/scratch/micpie/export/ncbi_disease/train_0-0.jsonl": "{"text":"Task: Find all the mentions of diseases in the following sentence. Return the matching entities. If there is no mention of a disease, return `no match`.\nDescription: Identification of APC2, a homologue of the adenomatous polyposis coli tumour suppressor.\nAnswer: adenomatous polyposis coli tumour"} {"text":"Task: Find all the mentions of diseases in the following sentence. Return the matching entities. If there is no mention of a disease, return `no match`.\nSentence: Conversely, BRCA1 expression was reduced or undetectable in the majority of high-grade, ductal carcinomas, suggesting that absence of BRCA1 may contribute to the pathogenesis of a significant percentage of sporadic breast cancers..\nAnswer: ductal carcinomas and sporadic breast cancers"}", "/scratch/micpie/export/ncbi_disease/valid_0-1.jsonl": "{"text":"User: Does the following text contain mentions of diseases? Can you output matches?\nText: In colon carcinoma cells, loss of APC leads to the accumulation of betacatenin in the nucleus, where it binds to and activates the Tcf-4 transcription factor (reviewed in [ 1] [ 2]).\nAssistant: There is colon carcinoma"} {"text":"User: Does the following text contain mentions of diseases? Can you output matches?\nImmunohistochemical staining of human breast specimens also revealed BRCA1 nuclear foci in benign breast, invasive lobular cancers and low-grade ductal carcinomas.\nAssistant: There is invasive lobular cancers and low-grade ductal carcinomas"}", "/scratch/micpie/export/ncbi_disease/train_0-1.jsonl": "{"text":"User: Does the following text contain mentions of diseases? 
Can you output matches?\nIdentification of APC2, a homologue of the adenomatous polyposis coli tumour suppressor.\nAssistant: I found adenomatous polyposis coli tumour"} {"text":"User: Does the following text contain mentions of diseases? Can you output matches?\nConversely, BRCA1 expression was reduced or undetectable in the majority of high-grade, ductal carcinomas, suggesting that absence of BRCA1 may contribute to the pathogenesis of a significant percentage of sporadic breast cancers..\nAssistant: I found ductal carcinomas and sporadic breast cancers"}", "/scratch/micpie/export/ord_predictions/test_2-4.jsonl": "{"text":"User: I want synthesize [Br:1][c:2]1[cH:3][c:4]2[c:9]([c:10]([CH:12]3[CH2:13][CH2:20]3)[cH:11]1)[O:8][C:7]([CH3:14])([CH3:15])[CH2:6][C:5]2([CH3:16])[CH3:17].\nAssistant: Great, is there anything else I can do for you?\nUser: I would like to know the reaction educts I need to produce the products [Br:1][c:2]1[cH:3][c:4]2[c:9]([c:10]([CH:12]3[CH2:13][CH2:20]3)[cH:11]1)[O:8][C:7]([CH3:14])([CH3:15])[CH2:6][C:5]2([CH3:16])[CH3:17].\nAssistant: I propose the following reaction educts: [Br:1][c:2]1[cH:3][c:4]2[c:9]([c:10]([CH:12]=[CH2:13])[cH:11]1)[O:8][C:7]([CH3:14])([CH3:15])[CH2:6][C:5]2([CH3:16])[CH3:17], [CH3:21][CH2:22][O:23][CH2:24][CH3:25], and [N+:18](=[N-:19])=[CH2:20]."} {"text":"User: I would like to produce [Br:1][c:2]1[c:3](=[O:10])[nH:4][c:5]([Cl:11])[n:6][c:7]1[CH3:8].\nAssistant: Cool, is there anything else I can do for you?\nUser: I would like to know the starting materials I need to produce the reaction products [Br:1][c:2]1[c:3](=[O:10])[nH:4][c:5]([Cl:11])[n:6][c:7]1[CH3:8].\nAssistant: I propose the following starting materials: Cc1nc([Cl:11])[nH]c(=O)c1C and O=[c:5]1[nH:4][c:3](=[O:10])[c:2]([Br:1])[c:7]([CH3:8])[nH:6]1."}", "/scratch/micpie/export/ord_predictions/train_2-2.jsonl": "{"text":"Question: What starting materials are needed to produce 
[CH2:1]([CH3:2])[n:3]1[n:4][cH:5][c:6]2[c:7]1[n:8][c:9]([CH2:48][CH3:49])[c:10]([CH2:19][NH:20][C:21](=[O:22])[c:23]1[cH:24][c:25]([C:30](=[O:31])[NH:32][CH2:33][c:34]3[cH:35][c:36](-[c:40]4[cH:41][c:42]([CH2:50][N:51]5[CH2:52][CH2:53][N:54]([CH3:58])[CH2:55][CH2:56][CH2:57]5)[cH:43][cH:44][cH:45]4)[cH:37][cH:38][cH:39]3)[cH:26][c:27]([CH3:29])[cH:28]1)[c:11]2[NH:12][CH:13]1[CH2:14][CH2:15][O:16][CH2:17][CH2:18]1?\nAnswer: CC(=O)O[BH-](OC(C)=O)OC(C)=O, CS(C)=O, C[C:58](=O)O, O=C[c:42]1[cH:41][c:40](-[c:36]2[cH:35][c:34]([CH2:33][NH:32][C:30]([c:25]3[cH:24][c:23]([C:21]([NH:20][CH2:19][c:10]4[c:9]([CH2:48][CH3:49])[n:8][c:7]5[n:3]([CH2:1][CH3:2])[n:4][cH:5][c:6]5[c:11]4[NH:12][CH:13]4[CH2:14][CH2:15][O:16][CH2:17][CH2:18]4)=[O:22])[cH:28][c:27]([CH3:29])[cH:26]3)=[O:31])[cH:39][cH:38][cH:37]2)[cH:45][cH:44][cH:43]1, and [CH3:50][N:51]1[CH2:52][CH2:53][NH:54][CH2:55][CH2:56][CH2:57]1."} {"text":"Question: What starting materials are needed to produce [Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[C:25]([OH:26])=[O:28])[cH:6][cH:7]1?\nAnswer: [Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([C:9]#[N:10])[CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[cH:6][cH:7]1, [C:25]([O-:26])([O-:27])=[O:28], [CH3:31][OH:32], [Na+:29], [Na+:30], [OH2:24], and [S:19](=[O:20])(=[O:21])([OH:22])[OH:23]."}", "/scratch/micpie/export/ord_predictions/valid_1-4.jsonl": "{"text":"User: I want produce [CH3:1][O:2][c:3]1[cH:4][c:5]([CH2:6][NH:7][C:15](=[O:16])[NH:17][c:18]2[c:19]3[cH:20][cH:21][n:22][cH:23][c:24]3[cH:25][cH:26][cH:27]2)[cH:8][c:9]([O:11][CH3:12])[cH:10]1.\nAssistant: That's interesting, is there anything else I can do for you?\nUser: I would like to know the starting materials I need to produce the products [CH3:1][O:2][c:3]1[cH:4][c:5]([CH2:6][NH:7][C:15](=[O:16])[NH:17][c:18]2[c:19]3[cH:20][cH:21][n:22][cH:23][c:24]3[cH:25][cH:26][cH:27]2)[cH:8][c:9]([O:11][CH3:12])[cH:10]1.\nAssistant: I advise the following 
starting materials: [CH2:30]1[CH2:31][CH2:32][C:33]2=[N:38][CH2:37][CH2:36][CH2:35][N:34]2[CH2:39][CH2:40]1, [CH3:1][O:2][c:3]1[cH:4][c:5]([CH2:6][NH2:7])[cH:8][c:9]([O:11][CH3:12])[cH:10]1, and [Cl:13][C:14]([C:15](=[O:16])[NH:17][c:18]1[c:19]2[cH:20][cH:21][n:22][cH:23][c:24]2[cH:25][cH:26][cH:27]1)([Cl:28])[Cl:29]."} {"text":"User: I must produce [c:2]1([CH2:14][CH:15]([CH3:16])[CH3:18])[s:3][c:4]2[c:5]([n:6]1)[cH:7][c:8]([C:11]#[N:12])[cH:9][cH:10]2.\nAssistant: Great, is there anything else I can do for you?\nUser: Yes, I would like to know the reaction educts I need to produce the reaction products [c:2]1([CH2:14][CH:15]([CH3:16])[CH3:18])[s:3][c:4]2[c:5]([n:6]1)[cH:7][c:8]([C:11]#[N:12])[cH:9][cH:10]2.\nAssistant: I recommend the following reaction educts: Br[c:2]1[s:3][c:4]2[c:5]([n:6]1)[cH:7][c:8]([C:11]#[N:12])[cH:9][cH:10]2, CN1CCCC1=O, ClCCl, Cn1ccnc1, and [Br-].[Zn+][CH2:16][CH:15]([CH3:14])[CH3:18]."}", "/scratch/micpie/export/ord_predictions/test_1-2.jsonl": "{"text":"Question: Which reaction educts are needed to synthesize [c:1]1([CH2:7][CH2:8][CH:9]([C:10]([O:11][CH3:22])=[O:12])[CH2:16][C:14](=[O:15])[OH:20])[cH:2][cH:3][cH:4][cH:5][cH:6]1?\nAnswer: Cl[CH2:22]Cl, F[C:16](F)(F)[C:14](=[O:15])[OH:20], O, and [c:1]1([CH2:7][CH2:8][CH2:9][C:10](=[O:11])[OH:12])[cH:2][cH:3][cH:4][cH:5][cH:6]1."} {"text":"Question: Which reaction educts are needed to produce [O:1]1[CH2:2][CH2:3][O:4][C:5]12[CH2:6][CH2:7][CH:8]([c:11]1[cH:12][nH:13][c:14]3[cH:15][cH:16][c:17]([F:20])[cH:18][c:19]13)[CH2:9][CH2:10]2?\nAnswer: CCO, [O:1]1[CH2:2][CH2:3][O:4][C:5]12[CH2:6][CH:7]=[C:8]([C:11]1=[c:19]3[c:14]([cH:15][cH:16][c:17]([F:20])[cH:18]3)=[N:13][CH2:12]1)[CH2:9][CH2:10]2, and [Pd]."}", "/scratch/micpie/export/ord_predictions/test_0-1.jsonl": "{"text":"The reaction SMILES string 
[CH2:25]([CH:26]([CH3:27])[OH:28])[OH:29].[CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]=[O:22])[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11].[CH3:41][c:42]1[cH:43][cH:44][cH:45][cH:46][cH:47]1.[CH3:48][CH2:49][O:50][CH2:51][CH3:52].[OH2:53].[c:30]1([CH3:31])[cH:32][cH:33][c:34]([S:35]([OH:36])(=[O:37])=[O:38])[cH:39][cH:40]1>>[CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]4[O:22][CH2:25][CH:26]([CH3:27])[O:28]4)[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11] has the reaction products [CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]4[O:22][CH2:25][CH:26]([CH3:27])[O:28]4)[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11] and the starting materials [CH2:25]([CH:26]([CH3:27])[OH:28])[OH:29], [CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]=[O:22])[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11], [CH3:41][c:42]1[cH:43][cH:44][cH:45][cH:46][cH:47]1, [CH3:48][CH2:49][O:50][CH2:51][CH3:52], [OH2:53], and [c:30]1([CH3:31])[cH:32][cH:33][c:34]([S:35]([OH:36])(=[O:37])=[O:38])[cH:39][cH:40]1."} {"text":"The reaction SMILES string Cl.Cl[C:1]([c:2]1[c:3]([OH:4])[cH:5][cH:6][cH:7][cH:8]1)=[O:9].O[c:13]1[c:12]([C:11](=[O:19])[NH2:20])[cH:18][cH:17][cH:16][cH:15]1>>[c:1]1(-[c:2]2[c:3]([OH:4])[cH:5][cH:6][cH:7][cH:8]2)[o:9][c:18]2[c:12]([c:11](=[O:19])[n:20]1)[cH:13][cH:15][cH:16][cH:17]2 has the products [c:1]1(-[c:2]2[c:3]([OH:4])[cH:5][cH:6][cH:7][cH:8]2)[o:9][c:18]2[c:12]([c:11](=[O:19])[n:20]1)[cH:13][cH:15][cH:16][cH:17]2 and the starting materials Cl, Cl[C:1]([c:2]1[c:3]([OH:4])[cH:5][cH:6][cH:7][cH:8]1)=[O:9], and O[c:13]1[c:12]([C:11](=[O:19])[NH2:20])[cH:18][cH:17][cH:16][cH:15]1."}", "/scratch/micpie/export/ord_predictions/test_2-0.jsonl": "{"text":"The reaction SMILES 
[Br:1][c:2]1[cH:3][c:4]2[c:9]([c:10]([CH:12]=[CH2:13])[cH:11]1)[O:8][C:7]([CH3:14])([CH3:15])[CH2:6][C:5]2([CH3:16])[CH3:17].[CH3:21][CH2:22][O:23][CH2:24][CH3:25].[N+:18](=[N-:19])=[CH2:20]>>[Br:1][c:2]1[cH:3][c:4]2[c:9]([c:10]([CH:12]3[CH2:13][CH2:20]3)[cH:11]1)[O:8][C:7]([CH3:14])([CH3:15])[CH2:6][C:5]2([CH3:16])[CH3:17] has the starting materials [Br:1][c:2]1[cH:3][c:4]2[c:9]([c:10]([CH:12]=[CH2:13])[cH:11]1)[O:8][C:7]([CH3:14])([CH3:15])[CH2:6][C:5]2([CH3:16])[CH3:17], [CH3:21][CH2:22][O:23][CH2:24][CH3:25], and [N+:18](=[N-:19])=[CH2:20] and the reaction products [Br:1][c:2]1[cH:3][c:4]2[c:9]([c:10]([CH:12]3[CH2:13][CH2:20]3)[cH:11]1)[O:8][C:7]([CH3:14])([CH3:15])[CH2:6][C:5]2([CH3:16])[CH3:17]."} {"text":"The reaction SMILES string Cc1nc([Cl:11])[nH]c(=O)c1C.O=[c:5]1[nH:4][c:3](=[O:10])[c:2]([Br:1])[c:7]([CH3:8])[nH:6]1>>[Br:1][c:2]1[c:3](=[O:10])[nH:4][c:5]([Cl:11])[n:6][c:7]1[CH3:8] has the starting materials Cc1nc([Cl:11])[nH]c(=O)c1C and O=[c:5]1[nH:4][c:3](=[O:10])[c:2]([Br:1])[c:7]([CH3:8])[nH:6]1 and the products [Br:1][c:2]1[c:3](=[O:10])[nH:4][c:5]([Cl:11])[n:6][c:7]1[CH3:8]."}", "/scratch/micpie/export/ord_predictions/valid_2-2.jsonl": "{"text":"Question: Which reaction educts are required to produce [CH2:1]([CH:2]([CH3:3])[CH3:4])[c:5]1[cH:6][c:7]2[c:8]([n:9][cH:10][n:11][c:12]2[NH:13][CH:14]2[CH2:15][CH2:16][N:17]([CH:26]([CH3:27])[c:28]3[cH:29][c:30]([F:35])[cH:31][c:32]([F:34])[cH:33]3)[CH2:18][CH2:19]2)[s:20]1?\nAnswer: [CH2:1]([CH:2]([CH3:3])[CH3:4])[c:5]1[cH:6][c:7]2[c:8]([n:9][cH:10][n:11][c:12]2[NH:13][CH:14]2[CH2:15][CH2:16][NH:17][CH2:18][CH2:19]2)[s:20]1 and [CH3:21][S:22]([O:23][CH:26]([CH3:27])[c:28]1[cH:29][c:30]([F:35])[cH:31][c:32]([F:34])[cH:33]1)(=[O:24])=[O:25]."} {"text":"Question: Which reaction educts are required to produce [NH4+:19].[NH4+:30].[O:3]=[P:4]([O-:5])([O-:8])[C:9]([F:10])([F:11])[c:12]1[c:13]([Br:24])[cH:14][c:15]2[cH:16][cH:17][c:18]([C:22]#[N:23])[n:19][c:20]2[cH:21]1?\nAnswer: 
CC[O:3][P:4]([O:5]CC)(=[O:8])[C:9]([F:10])([F:11])[c:12]1[c:13]([Br:24])[cH:14][c:15]2[cH:16][cH:17][c:18]([C:22]#[N:23])[n:19][c:20]2[cH:21]1, CO, C[Si](C)(C)Br, ClCCl, and [NH3:30]."}", "/scratch/micpie/export/ord_predictions/valid_0-0.jsonl": "{"text":"The reaction SMILES (RXNSMILES) C[CH2:1][O:3][CH:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][CH2:15][CH3:16].Cc1ccc(S(=O)(=O)O)cc1.Cc1ccccc1.O=C([O-])O.[Na+].OCC(O)C[Br:17]>>[CH2:1]1[O:3][C@@H:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][C@@H:15]1[CH2:16][Br:17] has the reaction educts C[CH2:1][O:3][CH:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][CH2:15][CH3:16], Cc1ccc(S(=O)(=O)O)cc1, Cc1ccccc1, O=C([O-])O.[Na+], and OCC(O)C[Br:17] and the reaction products [CH2:1]1[O:3][C@@H:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][C@@H:15]1[CH2:16][Br:17]."} {"text":"The reaction SMILES string CCOC(C)=O.CN(C)C=O.CN1CCOCC1.ClCCCl.O=[C:10]([CH2:9][O:8][CH2:1][c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[OH:12].On1nnc2ccccc21.[F:34][C:35]([c:36]1[cH:37][c:38]([NH2:39])[cH:40][cH:41][cH:42]1)([F:43])[F:44]>>[CH2:1]([c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[O:8][CH2:9][C:10](=[O:12])[NH:39][c:38]1[cH:37][c:36]([C:35]([F:34])([F:43])[F:44])[cH:42][cH:41][cH:40]1 has the reaction educts CCOC(C)=O, CN(C)C=O, CN1CCOCC1, ClCCCl, O=[C:10]([CH2:9][O:8][CH2:1][c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[OH:12], On1nnc2ccccc21, and [F:34][C:35]([c:36]1[cH:37][c:38]([NH2:39])[cH:40][cH:41][cH:42]1)([F:43])[F:44] and the reaction products [CH2:1]([c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[O:8][CH2:9][C:10](=[O:12])[NH:39][c:38]1[cH:37][c:36]([C:35]([F:34])([F:43])[F:44])[cH:42][cH:41][cH:40]1."}", "/scratch/micpie/export/ord_predictions/train_1-3.jsonl": "{"text":"Question: Which reaction products are produced from the starting materials CO, [H][H], [Pd], and 
c1ccc(C[NH:15][CH2:14][C@H:11]2[C@H:10]([OH:23])[CH2:9][N:8]([CH2:7][CH2:6][CH:2]3[O:1][CH2:5][CH2:4][O:3]3)[CH2:13][CH2:12]2)cc1?\nAnswer: [O:1]1[CH:2]([CH2:6][CH2:7][N:8]2[CH2:9][C@@H:10]([OH:23])[C@H:11]([CH2:14][NH2:15])[CH2:12][CH2:13]2)[O:3][CH2:4][CH2:5]1."} {"text":"Question: Which reaction products are produced from the educts [C:18](=[O:19])([O-:20])[O-:21], [CH3:24][C:25](=[O:26])[CH3:27], [K+:22], [K+:23], [N+:1](=[O:2])([O-:3])[c:4]1[c:5]([CH2:6][Cl:7])[cH:8][cH:9][cH:10][cH:11]1, and [SH:12][CH2:13][C:14](=[O:15])[O:16][CH3:17]?\nAnswer: [N+:1](=[O:2])([O-:3])[c:4]1[c:5]([CH2:6][S:12][CH2:13][C:14](=[O:15])[O:16][CH3:17])[cH:8][cH:9][cH:10][cH:11]1."}", "/scratch/micpie/export/ord_predictions/test_0-2.jsonl": "{"text":"Question: What educts are needed to synthesize [CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]4[O:22][CH2:25][CH:26]([CH3:27])[O:28]4)[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11]?\nAnswer: [CH2:25]([CH:26]([CH3:27])[OH:28])[OH:29], [CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]=[O:22])[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11], [CH3:41][c:42]1[cH:43][cH:44][cH:45][cH:46][cH:47]1, [CH3:48][CH2:49][O:50][CH2:51][CH3:52], [OH2:53], and [c:30]1([CH3:31])[cH:32][cH:33][c:34]([S:35]([OH:36])(=[O:37])=[O:38])[cH:39][cH:40]1."} {"text":"Question: Which starting materials are required to synthesize [c:1]1(-[c:2]2[c:3]([OH:4])[cH:5][cH:6][cH:7][cH:8]2)[o:9][c:18]2[c:12]([c:11](=[O:19])[n:20]1)[cH:13][cH:15][cH:16][cH:17]2?\nAnswer: Cl, Cl[C:1]([c:2]1[c:3]([OH:4])[cH:5][cH:6][cH:7][cH:8]1)=[O:9], and O[c:13]1[c:12]([C:11](=[O:19])[NH2:20])[cH:18][cH:17][cH:16][cH:15]1."}", "/scratch/micpie/export/ord_predictions/train_1-0.jsonl": "{"text":"The reaction SMILES 
CO.[H][H].[Pd].c1ccc(C[NH:15][CH2:14][C@H:11]2[C@H:10]([OH:23])[CH2:9][N:8]([CH2:7][CH2:6][CH:2]3[O:1][CH2:5][CH2:4][O:3]3)[CH2:13][CH2:12]2)cc1>>[O:1]1[CH:2]([CH2:6][CH2:7][N:8]2[CH2:9][C@@H:10]([OH:23])[C@H:11]([CH2:14][NH2:15])[CH2:12][CH2:13]2)[O:3][CH2:4][CH2:5]1 has the educts CO, [H][H], [Pd], and c1ccc(C[NH:15][CH2:14][C@H:11]2[C@H:10]([OH:23])[CH2:9][N:8]([CH2:7][CH2:6][CH:2]3[O:1][CH2:5][CH2:4][O:3]3)[CH2:13][CH2:12]2)cc1 and the products [O:1]1[CH:2]([CH2:6][CH2:7][N:8]2[CH2:9][C@@H:10]([OH:23])[C@H:11]([CH2:14][NH2:15])[CH2:12][CH2:13]2)[O:3][CH2:4][CH2:5]1."} {"text":"The reaction SMILES (RXNSMILES) [C:18](=[O:19])([O-:20])[O-:21].[CH3:24][C:25](=[O:26])[CH3:27].[K+:22].[K+:23].[N+:1](=[O:2])([O-:3])[c:4]1[c:5]([CH2:6][Cl:7])[cH:8][cH:9][cH:10][cH:11]1.[SH:12][CH2:13][C:14](=[O:15])[O:16][CH3:17]>>[N+:1](=[O:2])([O-:3])[c:4]1[c:5]([CH2:6][S:12][CH2:13][C:14](=[O:15])[O:16][CH3:17])[cH:8][cH:9][cH:10][cH:11]1 has the educts [C:18](=[O:19])([O-:20])[O-:21], [CH3:24][C:25](=[O:26])[CH3:27], [K+:22], [K+:23], [N+:1](=[O:2])([O-:3])[c:4]1[c:5]([CH2:6][Cl:7])[cH:8][cH:9][cH:10][cH:11]1, and [SH:12][CH2:13][C:14](=[O:15])[O:16][CH3:17] and the products [N+:1](=[O:2])([O-:3])[c:4]1[c:5]([CH2:6][S:12][CH2:13][C:14](=[O:15])[O:16][CH3:17])[cH:8][cH:9][cH:10][cH:11]1."}", "/scratch/micpie/export/ord_predictions/train_2-3.jsonl": "{"text":"Question: What products are produced from the educts CC(=O)O[BH-](OC(C)=O)OC(C)=O, CS(C)=O, C[C:58](=O)O, O=C[c:42]1[cH:41][c:40](-[c:36]2[cH:35][c:34]([CH2:33][NH:32][C:30]([c:25]3[cH:24][c:23]([C:21]([NH:20][CH2:19][c:10]4[c:9]([CH2:48][CH3:49])[n:8][c:7]5[n:3]([CH2:1][CH3:2])[n:4][cH:5][c:6]5[c:11]4[NH:12][CH:13]4[CH2:14][CH2:15][O:16][CH2:17][CH2:18]4)=[O:22])[cH:28][c:27]([CH3:29])[cH:26]3)=[O:31])[cH:39][cH:38][cH:37]2)[cH:45][cH:44][cH:43]1, and [CH3:50][N:51]1[CH2:52][CH2:53][NH:54][CH2:55][CH2:56][CH2:57]1?\nAnswer: 
[CH2:1]([CH3:2])[n:3]1[n:4][cH:5][c:6]2[c:7]1[n:8][c:9]([CH2:48][CH3:49])[c:10]([CH2:19][NH:20][C:21](=[O:22])[c:23]1[cH:24][c:25]([C:30](=[O:31])[NH:32][CH2:33][c:34]3[cH:35][c:36](-[c:40]4[cH:41][c:42]([CH2:50][N:51]5[CH2:52][CH2:53][N:54]([CH3:58])[CH2:55][CH2:56][CH2:57]5)[cH:43][cH:44][cH:45]4)[cH:37][cH:38][cH:39]3)[cH:26][c:27]([CH3:29])[cH:28]1)[c:11]2[NH:12][CH:13]1[CH2:14][CH2:15][O:16][CH2:17][CH2:18]1."} {"text":"Question: What products are produced from the starting materials [Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([C:9]#[N:10])[CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[cH:6][cH:7]1, [C:25]([O-:26])([O-:27])=[O:28], [CH3:31][OH:32], [Na+:29], [Na+:30], [OH2:24], and [S:19](=[O:20])(=[O:21])([OH:22])[OH:23]?\nAnswer: [Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[C:25]([OH:26])=[O:28])[cH:6][cH:7]1."}", "/scratch/micpie/export/ord_predictions/valid_1-2.jsonl": "{"text":"Question: What educts are needed to synthesize [CH3:1][O:2][c:3]1[cH:4][c:5]([CH2:6][NH:7][C:15](=[O:16])[NH:17][c:18]2[c:19]3[cH:20][cH:21][n:22][cH:23][c:24]3[cH:25][cH:26][cH:27]2)[cH:8][c:9]([O:11][CH3:12])[cH:10]1?\nAnswer: [CH2:30]1[CH2:31][CH2:32][C:33]2=[N:38][CH2:37][CH2:36][CH2:35][N:34]2[CH2:39][CH2:40]1, [CH3:1][O:2][c:3]1[cH:4][c:5]([CH2:6][NH2:7])[cH:8][c:9]([O:11][CH3:12])[cH:10]1, and [Cl:13][C:14]([C:15](=[O:16])[NH:17][c:18]1[c:19]2[cH:20][cH:21][n:22][cH:23][c:24]2[cH:25][cH:26][cH:27]1)([Cl:28])[Cl:29]."} {"text":"Question: What starting materials are needed to synthesize [c:2]1([CH2:14][CH:15]([CH3:16])[CH3:18])[s:3][c:4]2[c:5]([n:6]1)[cH:7][c:8]([C:11]#[N:12])[cH:9][cH:10]2?\nAnswer: Br[c:2]1[s:3][c:4]2[c:5]([n:6]1)[cH:7][c:8]([C:11]#[N:12])[cH:9][cH:10]2, CN1CCCC1=O, ClCCl, Cn1ccnc1, and [Br-].[Zn+][CH2:16][CH:15]([CH3:14])[CH3:18]."}", "/scratch/micpie/export/ord_predictions/train_2-4.jsonl": "{"text":"User: I must produce 
[CH2:1]([CH3:2])[n:3]1[n:4][cH:5][c:6]2[c:7]1[n:8][c:9]([CH2:48][CH3:49])[c:10]([CH2:19][NH:20][C:21](=[O:22])[c:23]1[cH:24][c:25]([C:30](=[O:31])[NH:32][CH2:33][c:34]3[cH:35][c:36](-[c:40]4[cH:41][c:42]([CH2:50][N:51]5[CH2:52][CH2:53][N:54]([CH3:58])[CH2:55][CH2:56][CH2:57]5)[cH:43][cH:44][cH:45]4)[cH:37][cH:38][cH:39]3)[cH:26][c:27]([CH3:29])[cH:28]1)[c:11]2[NH:12][CH:13]1[CH2:14][CH2:15][O:16][CH2:17][CH2:18]1.\nAssistant: Is there anything else I can do for you?\nUser: I would like to know the starting materials I need to produce the products [CH2:1]([CH3:2])[n:3]1[n:4][cH:5][c:6]2[c:7]1[n:8][c:9]([CH2:48][CH3:49])[c:10]([CH2:19][NH:20][C:21](=[O:22])[c:23]1[cH:24][c:25]([C:30](=[O:31])[NH:32][CH2:33][c:34]3[cH:35][c:36](-[c:40]4[cH:41][c:42]([CH2:50][N:51]5[CH2:52][CH2:53][N:54]([CH3:58])[CH2:55][CH2:56][CH2:57]5)[cH:43][cH:44][cH:45]4)[cH:37][cH:38][cH:39]3)[cH:26][c:27]([CH3:29])[cH:28]1)[c:11]2[NH:12][CH:13]1[CH2:14][CH2:15][O:16][CH2:17][CH2:18]1.\nAssistant: I recommend the following starting materials: CC(=O)O[BH-](OC(C)=O)OC(C)=O, CS(C)=O, C[C:58](=O)O, O=C[c:42]1[cH:41][c:40](-[c:36]2[cH:35][c:34]([CH2:33][NH:32][C:30]([c:25]3[cH:24][c:23]([C:21]([NH:20][CH2:19][c:10]4[c:9]([CH2:48][CH3:49])[n:8][c:7]5[n:3]([CH2:1][CH3:2])[n:4][cH:5][c:6]5[c:11]4[NH:12][CH:13]4[CH2:14][CH2:15][O:16][CH2:17][CH2:18]4)=[O:22])[cH:28][c:27]([CH3:29])[cH:26]3)=[O:31])[cH:39][cH:38][cH:37]2)[cH:45][cH:44][cH:43]1, and [CH3:50][N:51]1[CH2:52][CH2:53][NH:54][CH2:55][CH2:56][CH2:57]1."} {"text":"User: I must synthesize [Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[C:25]([OH:26])=[O:28])[cH:6][cH:7]1.\nAssistant: That's interesting, is there anything else I can do for you?\nUser: Yes, I would like to know the educts I need to produce the reaction products [Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[C:25]([OH:26])=[O:28])[cH:6][cH:7]1.\nAssistant: I suggest the 
following educts: [Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([C:9]#[N:10])[CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[cH:6][cH:7]1, [C:25]([O-:26])([O-:27])=[O:28], [CH3:31][OH:32], [Na+:29], [Na+:30], [OH2:24], and [S:19](=[O:20])(=[O:21])([OH:22])[OH:23]."}", "/scratch/micpie/export/ord_predictions/test_0-0.jsonl": "{"text":"The reaction SMILES (RXNSMILES) [CH2:25]([CH:26]([CH3:27])[OH:28])[OH:29].[CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]=[O:22])[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11].[CH3:41][c:42]1[cH:43][cH:44][cH:45][cH:46][cH:47]1.[CH3:48][CH2:49][O:50][CH2:51][CH3:52].[OH2:53].[c:30]1([CH3:31])[cH:32][cH:33][c:34]([S:35]([OH:36])(=[O:37])=[O:38])[cH:39][cH:40]1>>[CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]4[O:22][CH2:25][CH:26]([CH3:27])[O:28]4)[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11] has the reaction educts [CH2:25]([CH:26]([CH3:27])[OH:28])[OH:29], [CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]=[O:22])[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11], [CH3:41][c:42]1[cH:43][cH:44][cH:45][cH:46][cH:47]1, [CH3:48][CH2:49][O:50][CH2:51][CH3:52], [OH2:53], and [c:30]1([CH3:31])[cH:32][cH:33][c:34]([S:35]([OH:36])(=[O:37])=[O:38])[cH:39][cH:40]1 and the reaction products [CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]4[O:22][CH2:25][CH:26]([CH3:27])[O:28]4)[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11]."} {"text":"The reaction SMILES Cl.Cl[C:1]([c:2]1[c:3]([OH:4])[cH:5][cH:6][cH:7][cH:8]1)=[O:9].O[c:13]1[c:12]([C:11](=[O:19])[NH2:20])[cH:18][cH:17][cH:16][cH:15]1>>[c:1]1(-[c:2]2[c:3]([OH:4])[cH:5][cH:6][cH:7][cH:8]2)[o:9][c:18]2[c:12]([c:11](=[O:19])[n:20]1)[cH:13][cH:15][cH:16][cH:17]2 has the starting materials Cl, 
Cl[C:1]([c:2]1[c:3]([OH:4])[cH:5][cH:6][cH:7][cH:8]1)=[O:9], and O[c:13]1[c:12]([C:11](=[O:19])[NH2:20])[cH:18][cH:17][cH:16][cH:15]1 and the products [c:1]1(-[c:2]2[c:3]([OH:4])[cH:5][cH:6][cH:7][cH:8]2)[o:9][c:18]2[c:12]([c:11](=[O:19])[n:20]1)[cH:13][cH:15][cH:16][cH:17]2."}", "/scratch/micpie/export/ord_predictions/train_2-0.jsonl": "{"text":"The reaction SMILES CC(=O)O[BH-](OC(C)=O)OC(C)=O.CS(C)=O.C[C:58](=O)O.O=C[c:42]1[cH:41][c:40](-[c:36]2[cH:35][c:34]([CH2:33][NH:32][C:30]([c:25]3[cH:24][c:23]([C:21]([NH:20][CH2:19][c:10]4[c:9]([CH2:48][CH3:49])[n:8][c:7]5[n:3]([CH2:1][CH3:2])[n:4][cH:5][c:6]5[c:11]4[NH:12][CH:13]4[CH2:14][CH2:15][O:16][CH2:17][CH2:18]4)=[O:22])[cH:28][c:27]([CH3:29])[cH:26]3)=[O:31])[cH:39][cH:38][cH:37]2)[cH:45][cH:44][cH:43]1.[CH3:50][N:51]1[CH2:52][CH2:53][NH:54][CH2:55][CH2:56][CH2:57]1>>[CH2:1]([CH3:2])[n:3]1[n:4][cH:5][c:6]2[c:7]1[n:8][c:9]([CH2:48][CH3:49])[c:10]([CH2:19][NH:20][C:21](=[O:22])[c:23]1[cH:24][c:25]([C:30](=[O:31])[NH:32][CH2:33][c:34]3[cH:35][c:36](-[c:40]4[cH:41][c:42]([CH2:50][N:51]5[CH2:52][CH2:53][N:54]([CH3:58])[CH2:55][CH2:56][CH2:57]5)[cH:43][cH:44][cH:45]4)[cH:37][cH:38][cH:39]3)[cH:26][c:27]([CH3:29])[cH:28]1)[c:11]2[NH:12][CH:13]1[CH2:14][CH2:15][O:16][CH2:17][CH2:18]1 has the reaction educts CC(=O)O[BH-](OC(C)=O)OC(C)=O, CS(C)=O, C[C:58](=O)O, O=C[c:42]1[cH:41][c:40](-[c:36]2[cH:35][c:34]([CH2:33][NH:32][C:30]([c:25]3[cH:24][c:23]([C:21]([NH:20][CH2:19][c:10]4[c:9]([CH2:48][CH3:49])[n:8][c:7]5[n:3]([CH2:1][CH3:2])[n:4][cH:5][c:6]5[c:11]4[NH:12][CH:13]4[CH2:14][CH2:15][O:16][CH2:17][CH2:18]4)=[O:22])[cH:28][c:27]([CH3:29])[cH:26]3)=[O:31])[cH:39][cH:38][cH:37]2)[cH:45][cH:44][cH:43]1, and [CH3:50][N:51]1[CH2:52][CH2:53][NH:54][CH2:55][CH2:56][CH2:57]1 and the products 
[CH2:1]([CH3:2])[n:3]1[n:4][cH:5][c:6]2[c:7]1[n:8][c:9]([CH2:48][CH3:49])[c:10]([CH2:19][NH:20][C:21](=[O:22])[c:23]1[cH:24][c:25]([C:30](=[O:31])[NH:32][CH2:33][c:34]3[cH:35][c:36](-[c:40]4[cH:41][c:42]([CH2:50][N:51]5[CH2:52][CH2:53][N:54]([CH3:58])[CH2:55][CH2:56][CH2:57]5)[cH:43][cH:44][cH:45]4)[cH:37][cH:38][cH:39]3)[cH:26][c:27]([CH3:29])[cH:28]1)[c:11]2[NH:12][CH:13]1[CH2:14][CH2:15][O:16][CH2:17][CH2:18]1."} {"text":"The RXNSMILES [Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([C:9]#[N:10])[CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[cH:6][cH:7]1.[C:25]([O-:26])([O-:27])=[O:28].[CH3:31][OH:32].[Na+:29].[Na+:30].[OH2:24].[S:19](=[O:20])(=[O:21])([OH:22])[OH:23]>>[Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[C:25]([OH:26])=[O:28])[cH:6][cH:7]1 has the starting materials [Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([C:9]#[N:10])[CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[cH:6][cH:7]1, [C:25]([O-:26])([O-:27])=[O:28], [CH3:31][OH:32], [Na+:29], [Na+:30], [OH2:24], and [S:19](=[O:20])(=[O:21])([OH:22])[OH:23] and the products [Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[C:25]([OH:26])=[O:28])[cH:6][cH:7]1."}", "/scratch/micpie/export/ord_predictions/train_1-4.jsonl": "{"text":"User: I need to produce [O:1]1[CH:2]([CH2:6][CH2:7][N:8]2[CH2:9][C@@H:10]([OH:23])[C@H:11]([CH2:14][NH2:15])[CH2:12][CH2:13]2)[O:3][CH2:4][CH2:5]1.\nAssistant: Is there anything else I can do for you?\nUser: Yes, I would like to know the starting materials I need to produce the reaction products [O:1]1[CH:2]([CH2:6][CH2:7][N:8]2[CH2:9][C@@H:10]([OH:23])[C@H:11]([CH2:14][NH2:15])[CH2:12][CH2:13]2)[O:3][CH2:4][CH2:5]1.\nAssistant: I recommend the following starting materials: CO, [H][H], [Pd], and c1ccc(C[NH:15][CH2:14][C@H:11]2[C@H:10]([OH:23])[CH2:9][N:8]([CH2:7][CH2:6][CH:2]3[O:1][CH2:5][CH2:4][O:3]3)[CH2:13][CH2:12]2)cc1."} {"text":"User: I must 
synthesize [N+:1](=[O:2])([O-:3])[c:4]1[c:5]([CH2:6][S:12][CH2:13][C:14](=[O:15])[O:16][CH3:17])[cH:8][cH:9][cH:10][cH:11]1.\nAssistant: Cool, is there anything else I can do for you?\nUser: Yes, I would like to know the starting materials I need to produce the products [N+:1](=[O:2])([O-:3])[c:4]1[c:5]([CH2:6][S:12][CH2:13][C:14](=[O:15])[O:16][CH3:17])[cH:8][cH:9][cH:10][cH:11]1.\nAssistant: I recommend the following starting materials: [C:18](=[O:19])([O-:20])[O-:21], [CH3:24][C:25](=[O:26])[CH3:27], [K+:22], [K+:23], [N+:1](=[O:2])([O-:3])[c:4]1[c:5]([CH2:6][Cl:7])[cH:8][cH:9][cH:10][cH:11]1, and [SH:12][CH2:13][C:14](=[O:15])[O:16][CH3:17]."}", "/scratch/micpie/export/ord_predictions/valid_2-0.jsonl": "{"text":"The reaction SMILES (RXNSMILES) [CH2:1]([CH:2]([CH3:3])[CH3:4])[c:5]1[cH:6][c:7]2[c:8]([n:9][cH:10][n:11][c:12]2[NH:13][CH:14]2[CH2:15][CH2:16][NH:17][CH2:18][CH2:19]2)[s:20]1.[CH3:21][S:22]([O:23][CH:26]([CH3:27])[c:28]1[cH:29][c:30]([F:35])[cH:31][c:32]([F:34])[cH:33]1)(=[O:24])=[O:25]>>[CH2:1]([CH:2]([CH3:3])[CH3:4])[c:5]1[cH:6][c:7]2[c:8]([n:9][cH:10][n:11][c:12]2[NH:13][CH:14]2[CH2:15][CH2:16][N:17]([CH:26]([CH3:27])[c:28]3[cH:29][c:30]([F:35])[cH:31][c:32]([F:34])[cH:33]3)[CH2:18][CH2:19]2)[s:20]1 has the starting materials [CH2:1]([CH:2]([CH3:3])[CH3:4])[c:5]1[cH:6][c:7]2[c:8]([n:9][cH:10][n:11][c:12]2[NH:13][CH:14]2[CH2:15][CH2:16][NH:17][CH2:18][CH2:19]2)[s:20]1 and [CH3:21][S:22]([O:23][CH:26]([CH3:27])[c:28]1[cH:29][c:30]([F:35])[cH:31][c:32]([F:34])[cH:33]1)(=[O:24])=[O:25] and the reaction products [CH2:1]([CH:2]([CH3:3])[CH3:4])[c:5]1[cH:6][c:7]2[c:8]([n:9][cH:10][n:11][c:12]2[NH:13][CH:14]2[CH2:15][CH2:16][N:17]([CH:26]([CH3:27])[c:28]3[cH:29][c:30]([F:35])[cH:31][c:32]([F:34])[cH:33]3)[CH2:18][CH2:19]2)[s:20]1."} {"text":"The RXNSMILES 
CC[O:3][P:4]([O:5]CC)(=[O:8])[C:9]([F:10])([F:11])[c:12]1[c:13]([Br:24])[cH:14][c:15]2[cH:16][cH:17][c:18]([C:22]#[N:23])[n:19][c:20]2[cH:21]1.CO.C[Si](C)(C)Br.ClCCl.[NH3:30]>>[NH4+:19].[NH4+:30].[O:3]=[P:4]([O-:5])([O-:8])[C:9]([F:10])([F:11])[c:12]1[c:13]([Br:24])[cH:14][c:15]2[cH:16][cH:17][c:18]([C:22]#[N:23])[n:19][c:20]2[cH:21]1 has the educts CC[O:3][P:4]([O:5]CC)(=[O:8])[C:9]([F:10])([F:11])[c:12]1[c:13]([Br:24])[cH:14][c:15]2[cH:16][cH:17][c:18]([C:22]#[N:23])[n:19][c:20]2[cH:21]1, CO, C[Si](C)(C)Br, ClCCl, and [NH3:30] and the reaction products [NH4+:19].[NH4+:30].[O:3]=[P:4]([O-:5])([O-:8])[C:9]([F:10])([F:11])[c:12]1[c:13]([Br:24])[cH:14][c:15]2[cH:16][cH:17][c:18]([C:22]#[N:23])[n:19][c:20]2[cH:21]1."}", "/scratch/micpie/export/ord_predictions/test_0-3.jsonl": "{"text":"Question: What reaction products are produced from the reaction educts [CH2:25]([CH:26]([CH3:27])[OH:28])[OH:29], [CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]=[O:22])[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11], [CH3:41][c:42]1[cH:43][cH:44][cH:45][cH:46][cH:47]1, [CH3:48][CH2:49][O:50][CH2:51][CH3:52], [OH2:53], and [c:30]1([CH3:31])[cH:32][cH:33][c:34]([S:35]([OH:36])(=[O:37])=[O:38])[cH:39][cH:40]1?\nAnswer: [CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]4[O:22][CH2:25][CH:26]([CH3:27])[O:28]4)[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11]."} {"text":"Question: Which products are produced from the educts Cl, Cl[C:1]([c:2]1[c:3]([OH:4])[cH:5][cH:6][cH:7][cH:8]1)=[O:9], and O[c:13]1[c:12]([C:11](=[O:19])[NH2:20])[cH:18][cH:17][cH:16][cH:15]1?\nAnswer: [c:1]1(-[c:2]2[c:3]([OH:4])[cH:5][cH:6][cH:7][cH:8]2)[o:9][c:18]2[c:12]([c:11](=[O:19])[n:20]1)[cH:13][cH:15][cH:16][cH:17]2."}", "/scratch/micpie/export/ord_predictions/valid_2-1.jsonl": "{"text":"The RXNSMILES 
[CH2:1]([CH:2]([CH3:3])[CH3:4])[c:5]1[cH:6][c:7]2[c:8]([n:9][cH:10][n:11][c:12]2[NH:13][CH:14]2[CH2:15][CH2:16][NH:17][CH2:18][CH2:19]2)[s:20]1.[CH3:21][S:22]([O:23][CH:26]([CH3:27])[c:28]1[cH:29][c:30]([F:35])[cH:31][c:32]([F:34])[cH:33]1)(=[O:24])=[O:25]>>[CH2:1]([CH:2]([CH3:3])[CH3:4])[c:5]1[cH:6][c:7]2[c:8]([n:9][cH:10][n:11][c:12]2[NH:13][CH:14]2[CH2:15][CH2:16][N:17]([CH:26]([CH3:27])[c:28]3[cH:29][c:30]([F:35])[cH:31][c:32]([F:34])[cH:33]3)[CH2:18][CH2:19]2)[s:20]1 has the products [CH2:1]([CH:2]([CH3:3])[CH3:4])[c:5]1[cH:6][c:7]2[c:8]([n:9][cH:10][n:11][c:12]2[NH:13][CH:14]2[CH2:15][CH2:16][N:17]([CH:26]([CH3:27])[c:28]3[cH:29][c:30]([F:35])[cH:31][c:32]([F:34])[cH:33]3)[CH2:18][CH2:19]2)[s:20]1 and the starting materials [CH2:1]([CH:2]([CH3:3])[CH3:4])[c:5]1[cH:6][c:7]2[c:8]([n:9][cH:10][n:11][c:12]2[NH:13][CH:14]2[CH2:15][CH2:16][NH:17][CH2:18][CH2:19]2)[s:20]1 and [CH3:21][S:22]([O:23][CH:26]([CH3:27])[c:28]1[cH:29][c:30]([F:35])[cH:31][c:32]([F:34])[cH:33]1)(=[O:24])=[O:25]."} {"text":"The reaction SMILES (RXNSMILES) CC[O:3][P:4]([O:5]CC)(=[O:8])[C:9]([F:10])([F:11])[c:12]1[c:13]([Br:24])[cH:14][c:15]2[cH:16][cH:17][c:18]([C:22]#[N:23])[n:19][c:20]2[cH:21]1.CO.C[Si](C)(C)Br.ClCCl.[NH3:30]>>[NH4+:19].[NH4+:30].[O:3]=[P:4]([O-:5])([O-:8])[C:9]([F:10])([F:11])[c:12]1[c:13]([Br:24])[cH:14][c:15]2[cH:16][cH:17][c:18]([C:22]#[N:23])[n:19][c:20]2[cH:21]1 has the products [NH4+:19].[NH4+:30].[O:3]=[P:4]([O-:5])([O-:8])[C:9]([F:10])([F:11])[c:12]1[c:13]([Br:24])[cH:14][c:15]2[cH:16][cH:17][c:18]([C:22]#[N:23])[n:19][c:20]2[cH:21]1 and the starting materials CC[O:3][P:4]([O:5]CC)(=[O:8])[C:9]([F:10])([F:11])[c:12]1[c:13]([Br:24])[cH:14][c:15]2[cH:16][cH:17][c:18]([C:22]#[N:23])[n:19][c:20]2[cH:21]1, CO, C[Si](C)(C)Br, ClCCl, and [NH3:30]."}", "/scratch/micpie/export/ord_predictions/test_2-1.jsonl": "{"text":"The reaction SMILES 
[Br:1][c:2]1[cH:3][c:4]2[c:9]([c:10]([CH:12]=[CH2:13])[cH:11]1)[O:8][C:7]([CH3:14])([CH3:15])[CH2:6][C:5]2([CH3:16])[CH3:17].[CH3:21][CH2:22][O:23][CH2:24][CH3:25].[N+:18](=[N-:19])=[CH2:20]>>[Br:1][c:2]1[cH:3][c:4]2[c:9]([c:10]([CH:12]3[CH2:13][CH2:20]3)[cH:11]1)[O:8][C:7]([CH3:14])([CH3:15])[CH2:6][C:5]2([CH3:16])[CH3:17] has the reaction products [Br:1][c:2]1[cH:3][c:4]2[c:9]([c:10]([CH:12]3[CH2:13][CH2:20]3)[cH:11]1)[O:8][C:7]([CH3:14])([CH3:15])[CH2:6][C:5]2([CH3:16])[CH3:17] and the reaction educts [Br:1][c:2]1[cH:3][c:4]2[c:9]([c:10]([CH:12]=[CH2:13])[cH:11]1)[O:8][C:7]([CH3:14])([CH3:15])[CH2:6][C:5]2([CH3:16])[CH3:17], [CH3:21][CH2:22][O:23][CH2:24][CH3:25], and [N+:18](=[N-:19])=[CH2:20]."} {"text":"The reaction SMILES Cc1nc([Cl:11])[nH]c(=O)c1C.O=[c:5]1[nH:4][c:3](=[O:10])[c:2]([Br:1])[c:7]([CH3:8])[nH:6]1>>[Br:1][c:2]1[c:3](=[O:10])[nH:4][c:5]([Cl:11])[n:6][c:7]1[CH3:8] has the products [Br:1][c:2]1[c:3](=[O:10])[nH:4][c:5]([Cl:11])[n:6][c:7]1[CH3:8] and the educts Cc1nc([Cl:11])[nH]c(=O)c1C and O=[c:5]1[nH:4][c:3](=[O:10])[c:2]([Br:1])[c:7]([CH3:8])[nH:6]1."}", "/scratch/micpie/export/ord_predictions/train_0-0.jsonl": "{"text":"The RXNSMILES CO.COC[O:14][c:3]1[c:2]([Br:1])[cH:7][cH:6][c:5]([O:8][CH:9]([F:10])[F:11])[c:4]1[O:12][CH3:13].Cl>>[Br:1][c:2]1[c:3]([OH:14])[c:4]([O:12][CH3:13])[c:5]([O:8][CH:9]([F:10])[F:11])[cH:6][cH:7]1 has the starting materials CO, COC[O:14][c:3]1[c:2]([Br:1])[cH:7][cH:6][c:5]([O:8][CH:9]([F:10])[F:11])[c:4]1[O:12][CH3:13], and Cl and the products [Br:1][c:2]1[c:3]([OH:14])[c:4]([O:12][CH3:13])[c:5]([O:8][CH:9]([F:10])[F:11])[cH:6][cH:7]1."} {"text":"The reaction SMILES string 
[CH3:1][c:2]1[cH:3][c:4](-[c:15]2[cH:16][c:17]3[c:18](=[O:35])[n:19]([NH:30][S:31](=[O:32])(=[O:33])[CH3:34])[c:20](=[O:29])[nH:21][c:22]3[cH:23][c:24]2[C:25]([F:26])([F:27])[F:28])[n:5]([CH2:7][CH2:8][O:9][CH2:10][Si:11]([CH3:12])([CH3:13])[CH3:14])[n:6]1.[CH3:37][CH2:38][OH:39].[ClH:36].[O:40]1[CH2:41][CH2:42][O:43][CH2:44][CH2:45]1>>[CH3:1][c:2]1[cH:3][c:4](-[c:15]2[cH:16][c:17]3[c:18](=[O:35])[n:19]([NH:30][S:31](=[O:32])(=[O:33])[CH3:34])[c:20](=[O:29])[nH:21][c:22]3[cH:23][c:24]2[C:25]([F:26])([F:27])[F:28])[nH:5][n:6]1 has the reaction educts [CH3:1][c:2]1[cH:3][c:4](-[c:15]2[cH:16][c:17]3[c:18](=[O:35])[n:19]([NH:30][S:31](=[O:32])(=[O:33])[CH3:34])[c:20](=[O:29])[nH:21][c:22]3[cH:23][c:24]2[C:25]([F:26])([F:27])[F:28])[n:5]([CH2:7][CH2:8][O:9][CH2:10][Si:11]([CH3:12])([CH3:13])[CH3:14])[n:6]1, [CH3:37][CH2:38][OH:39], [ClH:36], and [O:40]1[CH2:41][CH2:42][O:43][CH2:44][CH2:45]1 and the products [CH3:1][c:2]1[cH:3][c:4](-[c:15]2[cH:16][c:17]3[c:18](=[O:35])[n:19]([NH:30][S:31](=[O:32])(=[O:33])[CH3:34])[c:20](=[O:29])[nH:21][c:22]3[cH:23][c:24]2[C:25]([F:26])([F:27])[F:28])[nH:5][n:6]1."}", "/scratch/micpie/export/ord_predictions/train_0-3.jsonl": "{"text":"Question: What products are produced from the reaction educts CO, COC[O:14][c:3]1[c:2]([Br:1])[cH:7][cH:6][c:5]([O:8][CH:9]([F:10])[F:11])[c:4]1[O:12][CH3:13], and Cl?\nAnswer: [Br:1][c:2]1[c:3]([OH:14])[c:4]([O:12][CH3:13])[c:5]([O:8][CH:9]([F:10])[F:11])[cH:6][cH:7]1."} {"text":"Question: What reaction products are produced from the starting materials [CH3:1][c:2]1[cH:3][c:4](-[c:15]2[cH:16][c:17]3[c:18](=[O:35])[n:19]([NH:30][S:31](=[O:32])(=[O:33])[CH3:34])[c:20](=[O:29])[nH:21][c:22]3[cH:23][c:24]2[C:25]([F:26])([F:27])[F:28])[n:5]([CH2:7][CH2:8][O:9][CH2:10][Si:11]([CH3:12])([CH3:13])[CH3:14])[n:6]1, [CH3:37][CH2:38][OH:39], [ClH:36], and [O:40]1[CH2:41][CH2:42][O:43][CH2:44][CH2:45]1?\nAnswer: 
[CH3:1][c:2]1[cH:3][c:4](-[c:15]2[cH:16][c:17]3[c:18](=[O:35])[n:19]([NH:30][S:31](=[O:32])(=[O:33])[CH3:34])[c:20](=[O:29])[nH:21][c:22]3[cH:23][c:24]2[C:25]([F:26])([F:27])[F:28])[nH:5][n:6]1."}", "/scratch/micpie/export/ord_predictions/test_1-4.jsonl": "{"text":"User: I would like to synthesize [c:1]1([CH2:7][CH2:8][CH:9]([C:10]([O:11][CH3:22])=[O:12])[CH2:16][C:14](=[O:15])[OH:20])[cH:2][cH:3][cH:4][cH:5][cH:6]1.\nAssistant: Is there anything else I can do for you?\nUser: I would like to know the reaction educts I need to produce the reaction products [c:1]1([CH2:7][CH2:8][CH:9]([C:10]([O:11][CH3:22])=[O:12])[CH2:16][C:14](=[O:15])[OH:20])[cH:2][cH:3][cH:4][cH:5][cH:6]1.\nAssistant: I propose the following reaction educts: Cl[CH2:22]Cl, F[C:16](F)(F)[C:14](=[O:15])[OH:20], O, and [c:1]1([CH2:7][CH2:8][CH2:9][C:10](=[O:11])[OH:12])[cH:2][cH:3][cH:4][cH:5][cH:6]1."} {"text":"User: I would like to produce [O:1]1[CH2:2][CH2:3][O:4][C:5]12[CH2:6][CH2:7][CH:8]([c:11]1[cH:12][nH:13][c:14]3[cH:15][cH:16][c:17]([F:20])[cH:18][c:19]13)[CH2:9][CH2:10]2.\nAssistant: Cool, is there anything else I can do for you?\nUser: Yes, I would like to know the educts I need to produce the reaction products [O:1]1[CH2:2][CH2:3][O:4][C:5]12[CH2:6][CH2:7][CH:8]([c:11]1[cH:12][nH:13][c:14]3[cH:15][cH:16][c:17]([F:20])[cH:18][c:19]13)[CH2:9][CH2:10]2.\nAssistant: I advise the following educts: CCO, [O:1]1[CH2:2][CH2:3][O:4][C:5]12[CH2:6][CH:7]=[C:8]([C:11]1=[c:19]3[c:14]([cH:15][cH:16][c:17]([F:20])[cH:18]3)=[N:13][CH2:12]1)[CH2:9][CH2:10]2, and [Pd]."}", "/scratch/micpie/export/ord_predictions/test_1-1.jsonl": "{"text":"The reaction SMILES (RXNSMILES) Cl[CH2:22]Cl.F[C:16](F)(F)[C:14](=[O:15])[OH:20].O.[c:1]1([CH2:7][CH2:8][CH2:9][C:10](=[O:11])[OH:12])[cH:2][cH:3][cH:4][cH:5][cH:6]1>>[c:1]1([CH2:7][CH2:8][CH:9]([C:10]([O:11][CH3:22])=[O:12])[CH2:16][C:14](=[O:15])[OH:20])[cH:2][cH:3][cH:4][cH:5][cH:6]1 has the products 
[c:1]1([CH2:7][CH2:8][CH:9]([C:10]([O:11][CH3:22])=[O:12])[CH2:16][C:14](=[O:15])[OH:20])[cH:2][cH:3][cH:4][cH:5][cH:6]1 and the starting materials Cl[CH2:22]Cl, F[C:16](F)(F)[C:14](=[O:15])[OH:20], O, and [c:1]1([CH2:7][CH2:8][CH2:9][C:10](=[O:11])[OH:12])[cH:2][cH:3][cH:4][cH:5][cH:6]1."} {"text":"The reaction SMILES CCO.[O:1]1[CH2:2][CH2:3][O:4][C:5]12[CH2:6][CH:7]=[C:8]([C:11]1=[c:19]3[c:14]([cH:15][cH:16][c:17]([F:20])[cH:18]3)=[N:13][CH2:12]1)[CH2:9][CH2:10]2.[Pd]>>[O:1]1[CH2:2][CH2:3][O:4][C:5]12[CH2:6][CH2:7][CH:8]([c:11]1[cH:12][nH:13][c:14]3[cH:15][cH:16][c:17]([F:20])[cH:18][c:19]13)[CH2:9][CH2:10]2 has the products [O:1]1[CH2:2][CH2:3][O:4][C:5]12[CH2:6][CH2:7][CH:8]([c:11]1[cH:12][nH:13][c:14]3[cH:15][cH:16][c:17]([F:20])[cH:18][c:19]13)[CH2:9][CH2:10]2 and the starting materials CCO, [O:1]1[CH2:2][CH2:3][O:4][C:5]12[CH2:6][CH:7]=[C:8]([C:11]1=[c:19]3[c:14]([cH:15][cH:16][c:17]([F:20])[cH:18]3)=[N:13][CH2:12]1)[CH2:9][CH2:10]2, and [Pd]."}", "/scratch/micpie/export/ord_predictions/valid_1-3.jsonl": "{"text":"Question: Which products are produced from the reaction educts [CH2:30]1[CH2:31][CH2:32][C:33]2=[N:38][CH2:37][CH2:36][CH2:35][N:34]2[CH2:39][CH2:40]1, [CH3:1][O:2][c:3]1[cH:4][c:5]([CH2:6][NH2:7])[cH:8][c:9]([O:11][CH3:12])[cH:10]1, and [Cl:13][C:14]([C:15](=[O:16])[NH:17][c:18]1[c:19]2[cH:20][cH:21][n:22][cH:23][c:24]2[cH:25][cH:26][cH:27]1)([Cl:28])[Cl:29]?\nAnswer: [CH3:1][O:2][c:3]1[cH:4][c:5]([CH2:6][NH:7][C:15](=[O:16])[NH:17][c:18]2[c:19]3[cH:20][cH:21][n:22][cH:23][c:24]3[cH:25][cH:26][cH:27]2)[cH:8][c:9]([O:11][CH3:12])[cH:10]1."} {"text":"Question: What products are produced from the reaction educts Br[c:2]1[s:3][c:4]2[c:5]([n:6]1)[cH:7][c:8]([C:11]#[N:12])[cH:9][cH:10]2, CN1CCCC1=O, ClCCl, Cn1ccnc1, and [Br-].[Zn+][CH2:16][CH:15]([CH3:14])[CH3:18]?\nAnswer: [c:2]1([CH2:14][CH:15]([CH3:16])[CH3:18])[s:3][c:4]2[c:5]([n:6]1)[cH:7][c:8]([C:11]#[N:12])[cH:9][cH:10]2."}", "/scratch/micpie/export/ord_predictions/valid_0-2.jsonl": 
"{"text":"Question: Which starting materials are required to synthesize [CH2:1]1[O:3][C@@H:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][C@@H:15]1[CH2:16][Br:17]?\nAnswer: C[CH2:1][O:3][CH:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][CH2:15][CH3:16], Cc1ccc(S(=O)(=O)O)cc1, Cc1ccccc1, O=C([O-])O.[Na+], and OCC(O)C[Br:17]."} {"text":"Question: Which educts are required to synthesize [CH2:1]([c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[O:8][CH2:9][C:10](=[O:12])[NH:39][c:38]1[cH:37][c:36]([C:35]([F:34])([F:43])[F:44])[cH:42][cH:41][cH:40]1?\nAnswer: CCOC(C)=O, CN(C)C=O, CN1CCOCC1, ClCCCl, O=[C:10]([CH2:9][O:8][CH2:1][c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[OH:12], On1nnc2ccccc21, and [F:34][C:35]([c:36]1[cH:37][c:38]([NH2:39])[cH:40][cH:41][cH:42]1)([F:43])[F:44]."}", "/scratch/micpie/export/ord_predictions/valid_0-1.jsonl": "{"text":"The reaction SMILES (RXNSMILES) C[CH2:1][O:3][CH:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][CH2:15][CH3:16].Cc1ccc(S(=O)(=O)O)cc1.Cc1ccccc1.O=C([O-])O.[Na+].OCC(O)C[Br:17]>>[CH2:1]1[O:3][C@@H:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][C@@H:15]1[CH2:16][Br:17] has the products [CH2:1]1[O:3][C@@H:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][C@@H:15]1[CH2:16][Br:17] and the reaction educts C[CH2:1][O:3][CH:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][CH2:15][CH3:16], Cc1ccc(S(=O)(=O)O)cc1, Cc1ccccc1, O=C([O-])O.[Na+], and OCC(O)C[Br:17]."} {"text":"The reaction SMILES string CCOC(C)=O.CN(C)C=O.CN1CCOCC1.ClCCCl.O=[C:10]([CH2:9][O:8][CH2:1][c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[OH:12].On1nnc2ccccc21.[F:34][C:35]([c:36]1[cH:37][c:38]([NH2:39])[cH:40][cH:41][cH:42]1)([F:43])[F:44]>>[CH2:1]([c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[O:8][CH2:9][C:10](=[O:12])[NH:39][c:38]1[cH:37][c:36]([C:35]([F:34])([F:43])[F:44])[cH:42][cH:41][cH:40]1 has the products 
[CH2:1]([c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[O:8][CH2:9][C:10](=[O:12])[NH:39][c:38]1[cH:37][c:36]([C:35]([F:34])([F:43])[F:44])[cH:42][cH:41][cH:40]1 and the starting materials CCOC(C)=O, CN(C)C=O, CN1CCOCC1, ClCCCl, O=[C:10]([CH2:9][O:8][CH2:1][c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[OH:12], On1nnc2ccccc21, and [F:34][C:35]([c:36]1[cH:37][c:38]([NH2:39])[cH:40][cH:41][cH:42]1)([F:43])[F:44]."}", "/scratch/micpie/export/ord_predictions/valid_2-4.jsonl": "{"text":"User: I need to synthesize [CH2:1]([CH:2]([CH3:3])[CH3:4])[c:5]1[cH:6][c:7]2[c:8]([n:9][cH:10][n:11][c:12]2[NH:13][CH:14]2[CH2:15][CH2:16][N:17]([CH:26]([CH3:27])[c:28]3[cH:29][c:30]([F:35])[cH:31][c:32]([F:34])[cH:33]3)[CH2:18][CH2:19]2)[s:20]1.\nAssistant: Great, is there anything else I can do for you?\nUser: Yes, I would like to know the reaction educts I need to produce the products [CH2:1]([CH:2]([CH3:3])[CH3:4])[c:5]1[cH:6][c:7]2[c:8]([n:9][cH:10][n:11][c:12]2[NH:13][CH:14]2[CH2:15][CH2:16][N:17]([CH:26]([CH3:27])[c:28]3[cH:29][c:30]([F:35])[cH:31][c:32]([F:34])[cH:33]3)[CH2:18][CH2:19]2)[s:20]1.\nAssistant: I suggest the following reaction educts: [CH2:1]([CH:2]([CH3:3])[CH3:4])[c:5]1[cH:6][c:7]2[c:8]([n:9][cH:10][n:11][c:12]2[NH:13][CH:14]2[CH2:15][CH2:16][NH:17][CH2:18][CH2:19]2)[s:20]1 and [CH3:21][S:22]([O:23][CH:26]([CH3:27])[c:28]1[cH:29][c:30]([F:35])[cH:31][c:32]([F:34])[cH:33]1)(=[O:24])=[O:25]."} {"text":"User: I must synthesize [NH4+:19].[NH4+:30].[O:3]=[P:4]([O-:5])([O-:8])[C:9]([F:10])([F:11])[c:12]1[c:13]([Br:24])[cH:14][c:15]2[cH:16][cH:17][c:18]([C:22]#[N:23])[n:19][c:20]2[cH:21]1.\nAssistant: Is there anything else I can do for you?\nUser: I would like to know the starting materials I need to produce the reaction products [NH4+:19].[NH4+:30].[O:3]=[P:4]([O-:5])([O-:8])[C:9]([F:10])([F:11])[c:12]1[c:13]([Br:24])[cH:14][c:15]2[cH:16][cH:17][c:18]([C:22]#[N:23])[n:19][c:20]2[cH:21]1.\nAssistant: I advise the following starting materials: 
CC[O:3][P:4]([O:5]CC)(=[O:8])[C:9]([F:10])([F:11])[c:12]1[c:13]([Br:24])[cH:14][c:15]2[cH:16][cH:17][c:18]([C:22]#[N:23])[n:19][c:20]2[cH:21]1, CO, C[Si](C)(C)Br, ClCCl, and [NH3:30]."}", "/scratch/micpie/export/ord_predictions/train_2-1.jsonl": "{"text":"The reaction SMILES (RXNSMILES) CC(=O)O[BH-](OC(C)=O)OC(C)=O.CS(C)=O.C[C:58](=O)O.O=C[c:42]1[cH:41][c:40](-[c:36]2[cH:35][c:34]([CH2:33][NH:32][C:30]([c:25]3[cH:24][c:23]([C:21]([NH:20][CH2:19][c:10]4[c:9]([CH2:48][CH3:49])[n:8][c:7]5[n:3]([CH2:1][CH3:2])[n:4][cH:5][c:6]5[c:11]4[NH:12][CH:13]4[CH2:14][CH2:15][O:16][CH2:17][CH2:18]4)=[O:22])[cH:28][c:27]([CH3:29])[cH:26]3)=[O:31])[cH:39][cH:38][cH:37]2)[cH:45][cH:44][cH:43]1.[CH3:50][N:51]1[CH2:52][CH2:53][NH:54][CH2:55][CH2:56][CH2:57]1>>[CH2:1]([CH3:2])[n:3]1[n:4][cH:5][c:6]2[c:7]1[n:8][c:9]([CH2:48][CH3:49])[c:10]([CH2:19][NH:20][C:21](=[O:22])[c:23]1[cH:24][c:25]([C:30](=[O:31])[NH:32][CH2:33][c:34]3[cH:35][c:36](-[c:40]4[cH:41][c:42]([CH2:50][N:51]5[CH2:52][CH2:53][N:54]([CH3:58])[CH2:55][CH2:56][CH2:57]5)[cH:43][cH:44][cH:45]4)[cH:37][cH:38][cH:39]3)[cH:26][c:27]([CH3:29])[cH:28]1)[c:11]2[NH:12][CH:13]1[CH2:14][CH2:15][O:16][CH2:17][CH2:18]1 has the reaction products [CH2:1]([CH3:2])[n:3]1[n:4][cH:5][c:6]2[c:7]1[n:8][c:9]([CH2:48][CH3:49])[c:10]([CH2:19][NH:20][C:21](=[O:22])[c:23]1[cH:24][c:25]([C:30](=[O:31])[NH:32][CH2:33][c:34]3[cH:35][c:36](-[c:40]4[cH:41][c:42]([CH2:50][N:51]5[CH2:52][CH2:53][N:54]([CH3:58])[CH2:55][CH2:56][CH2:57]5)[cH:43][cH:44][cH:45]4)[cH:37][cH:38][cH:39]3)[cH:26][c:27]([CH3:29])[cH:28]1)[c:11]2[NH:12][CH:13]1[CH2:14][CH2:15][O:16][CH2:17][CH2:18]1 and the reaction educts CC(=O)O[BH-](OC(C)=O)OC(C)=O, CS(C)=O, C[C:58](=O)O, 
O=C[c:42]1[cH:41][c:40](-[c:36]2[cH:35][c:34]([CH2:33][NH:32][C:30]([c:25]3[cH:24][c:23]([C:21]([NH:20][CH2:19][c:10]4[c:9]([CH2:48][CH3:49])[n:8][c:7]5[n:3]([CH2:1][CH3:2])[n:4][cH:5][c:6]5[c:11]4[NH:12][CH:13]4[CH2:14][CH2:15][O:16][CH2:17][CH2:18]4)=[O:22])[cH:28][c:27]([CH3:29])[cH:26]3)=[O:31])[cH:39][cH:38][cH:37]2)[cH:45][cH:44][cH:43]1, and [CH3:50][N:51]1[CH2:52][CH2:53][NH:54][CH2:55][CH2:56][CH2:57]1."} {"text":"The reaction SMILES (RXNSMILES) [Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([C:9]#[N:10])[CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[cH:6][cH:7]1.[C:25]([O-:26])([O-:27])=[O:28].[CH3:31][OH:32].[Na+:29].[Na+:30].[OH2:24].[S:19](=[O:20])(=[O:21])([OH:22])[OH:23]>>[Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[C:25]([OH:26])=[O:28])[cH:6][cH:7]1 has the reaction products [Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[C:25]([OH:26])=[O:28])[cH:6][cH:7]1 and the reaction educts [Br:1][c:2]1[cH:3][cH:4][c:5]([CH:8]([C:9]#[N:10])[CH2:11][CH:12]2[CH2:13][CH2:14][N:15]([CH3:18])[CH2:16][CH2:17]2)[cH:6][cH:7]1, [C:25]([O-:26])([O-:27])=[O:28], [CH3:31][OH:32], [Na+:29], [Na+:30], [OH2:24], and [S:19](=[O:20])(=[O:21])([OH:22])[OH:23]."}", "/scratch/micpie/export/ord_predictions/valid_1-1.jsonl": "{"text":"The reaction SMILES string [CH2:30]1[CH2:31][CH2:32][C:33]2=[N:38][CH2:37][CH2:36][CH2:35][N:34]2[CH2:39][CH2:40]1.[CH3:1][O:2][c:3]1[cH:4][c:5]([CH2:6][NH2:7])[cH:8][c:9]([O:11][CH3:12])[cH:10]1.[Cl:13][C:14]([C:15](=[O:16])[NH:17][c:18]1[c:19]2[cH:20][cH:21][n:22][cH:23][c:24]2[cH:25][cH:26][cH:27]1)([Cl:28])[Cl:29]>>[CH3:1][O:2][c:3]1[cH:4][c:5]([CH2:6][NH:7][C:15](=[O:16])[NH:17][c:18]2[c:19]3[cH:20][cH:21][n:22][cH:23][c:24]3[cH:25][cH:26][cH:27]2)[cH:8][c:9]([O:11][CH3:12])[cH:10]1 has the products 
[CH3:1][O:2][c:3]1[cH:4][c:5]([CH2:6][NH:7][C:15](=[O:16])[NH:17][c:18]2[c:19]3[cH:20][cH:21][n:22][cH:23][c:24]3[cH:25][cH:26][cH:27]2)[cH:8][c:9]([O:11][CH3:12])[cH:10]1 and the starting materials [CH2:30]1[CH2:31][CH2:32][C:33]2=[N:38][CH2:37][CH2:36][CH2:35][N:34]2[CH2:39][CH2:40]1, [CH3:1][O:2][c:3]1[cH:4][c:5]([CH2:6][NH2:7])[cH:8][c:9]([O:11][CH3:12])[cH:10]1, and [Cl:13][C:14]([C:15](=[O:16])[NH:17][c:18]1[c:19]2[cH:20][cH:21][n:22][cH:23][c:24]2[cH:25][cH:26][cH:27]1)([Cl:28])[Cl:29]."} {"text":"The reaction SMILES Br[c:2]1[s:3][c:4]2[c:5]([n:6]1)[cH:7][c:8]([C:11]#[N:12])[cH:9][cH:10]2.CN1CCCC1=O.ClCCl.Cn1ccnc1.[Br-].[Zn+][CH2:16][CH:15]([CH3:14])[CH3:18]>>[c:2]1([CH2:14][CH:15]([CH3:16])[CH3:18])[s:3][c:4]2[c:5]([n:6]1)[cH:7][c:8]([C:11]#[N:12])[cH:9][cH:10]2 has the products [c:2]1([CH2:14][CH:15]([CH3:16])[CH3:18])[s:3][c:4]2[c:5]([n:6]1)[cH:7][c:8]([C:11]#[N:12])[cH:9][cH:10]2 and the starting materials Br[c:2]1[s:3][c:4]2[c:5]([n:6]1)[cH:7][c:8]([C:11]#[N:12])[cH:9][cH:10]2, CN1CCCC1=O, ClCCl, Cn1ccnc1, and [Br-].[Zn+][CH2:16][CH:15]([CH3:14])[CH3:18]."}", "/scratch/micpie/export/ord_predictions/valid_0-4.jsonl": "{"text":"User: I must produce [CH2:1]1[O:3][C@@H:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][C@@H:15]1[CH2:16][Br:17].\nAssistant: Is there anything else I can do for you?\nUser: Yes, I would like to know the educts I need to produce the products [CH2:1]1[O:3][C@@H:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][C@@H:15]1[CH2:16][Br:17].\nAssistant: I advise the following educts: C[CH2:1][O:3][CH:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][CH2:15][CH3:16], Cc1ccc(S(=O)(=O)O)cc1, Cc1ccccc1, O=C([O-])O.[Na+], and OCC(O)C[Br:17]."} {"text":"User: I must synthesize [CH2:1]([c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[O:8][CH2:9][C:10](=[O:12])[NH:39][c:38]1[cH:37][c:36]([C:35]([F:34])([F:43])[F:44])[cH:42][cH:41][cH:40]1.\nAssistant: Cool, is there anything else 
I can do for you?\nUser: Yes, I would like to know the reaction educts I need to produce the products [CH2:1]([c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[O:8][CH2:9][C:10](=[O:12])[NH:39][c:38]1[cH:37][c:36]([C:35]([F:34])([F:43])[F:44])[cH:42][cH:41][cH:40]1.\nAssistant: I propose the following reaction educts: CCOC(C)=O, CN(C)C=O, CN1CCOCC1, ClCCCl, O=[C:10]([CH2:9][O:8][CH2:1][c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[OH:12], On1nnc2ccccc21, and [F:34][C:35]([c:36]1[cH:37][c:38]([NH2:39])[cH:40][cH:41][cH:42]1)([F:43])[F:44]."}", "/scratch/micpie/export/ord_predictions/test_1-3.jsonl": "{"text":"Question: What products are produced from the reaction educts Cl[CH2:22]Cl, F[C:16](F)(F)[C:14](=[O:15])[OH:20], O, and [c:1]1([CH2:7][CH2:8][CH2:9][C:10](=[O:11])[OH:12])[cH:2][cH:3][cH:4][cH:5][cH:6]1?\nAnswer: [c:1]1([CH2:7][CH2:8][CH:9]([C:10]([O:11][CH3:22])=[O:12])[CH2:16][C:14](=[O:15])[OH:20])[cH:2][cH:3][cH:4][cH:5][cH:6]1."} {"text":"Question: Which reaction products are produced from the educts CCO, [O:1]1[CH2:2][CH2:3][O:4][C:5]12[CH2:6][CH:7]=[C:8]([C:11]1=[c:19]3[c:14]([cH:15][cH:16][c:17]([F:20])[cH:18]3)=[N:13][CH2:12]1)[CH2:9][CH2:10]2, and [Pd]?\nAnswer: [O:1]1[CH2:2][CH2:3][O:4][C:5]12[CH2:6][CH2:7][CH:8]([c:11]1[cH:12][nH:13][c:14]3[cH:15][cH:16][c:17]([F:20])[cH:18][c:19]13)[CH2:9][CH2:10]2."}", "/scratch/micpie/export/ord_predictions/test_1-0.jsonl": "{"text":"The reaction SMILES string Cl[CH2:22]Cl.F[C:16](F)(F)[C:14](=[O:15])[OH:20].O.[c:1]1([CH2:7][CH2:8][CH2:9][C:10](=[O:11])[OH:12])[cH:2][cH:3][cH:4][cH:5][cH:6]1>>[c:1]1([CH2:7][CH2:8][CH:9]([C:10]([O:11][CH3:22])=[O:12])[CH2:16][C:14](=[O:15])[OH:20])[cH:2][cH:3][cH:4][cH:5][cH:6]1 has the reaction educts Cl[CH2:22]Cl, F[C:16](F)(F)[C:14](=[O:15])[OH:20], O, and [c:1]1([CH2:7][CH2:8][CH2:9][C:10](=[O:11])[OH:12])[cH:2][cH:3][cH:4][cH:5][cH:6]1 and the products [c:1]1([CH2:7][CH2:8][CH:9]([C:10]([O:11][CH3:22])=[O:12])[CH2:16][C:14](=[O:15])[OH:20])[cH:2][cH:3][cH:4][cH:5][cH:6]1."} {"text":"The 
reaction SMILES (RXNSMILES) CCO.[O:1]1[CH2:2][CH2:3][O:4][C:5]12[CH2:6][CH:7]=[C:8]([C:11]1=[c:19]3[c:14]([cH:15][cH:16][c:17]([F:20])[cH:18]3)=[N:13][CH2:12]1)[CH2:9][CH2:10]2.[Pd]>>[O:1]1[CH2:2][CH2:3][O:4][C:5]12[CH2:6][CH2:7][CH:8]([c:11]1[cH:12][nH:13][c:14]3[cH:15][cH:16][c:17]([F:20])[cH:18][c:19]13)[CH2:9][CH2:10]2 has the educts CCO, [O:1]1[CH2:2][CH2:3][O:4][C:5]12[CH2:6][CH:7]=[C:8]([C:11]1=[c:19]3[c:14]([cH:15][cH:16][c:17]([F:20])[cH:18]3)=[N:13][CH2:12]1)[CH2:9][CH2:10]2, and [Pd] and the reaction products [O:1]1[CH2:2][CH2:3][O:4][C:5]12[CH2:6][CH2:7][CH:8]([c:11]1[cH:12][nH:13][c:14]3[cH:15][cH:16][c:17]([F:20])[cH:18][c:19]13)[CH2:9][CH2:10]2."}", "/scratch/micpie/export/ord_predictions/train_0-2.jsonl": "{"text":"Question: What reaction educts are required to produce [Br:1][c:2]1[c:3]([OH:14])[c:4]([O:12][CH3:13])[c:5]([O:8][CH:9]([F:10])[F:11])[cH:6][cH:7]1?\nAnswer: CO, COC[O:14][c:3]1[c:2]([Br:1])[cH:7][cH:6][c:5]([O:8][CH:9]([F:10])[F:11])[c:4]1[O:12][CH3:13], and Cl."} {"text":"Question: What starting materials are required to produce [CH3:1][c:2]1[cH:3][c:4](-[c:15]2[cH:16][c:17]3[c:18](=[O:35])[n:19]([NH:30][S:31](=[O:32])(=[O:33])[CH3:34])[c:20](=[O:29])[nH:21][c:22]3[cH:23][c:24]2[C:25]([F:26])([F:27])[F:28])[nH:5][n:6]1?\nAnswer: [CH3:1][c:2]1[cH:3][c:4](-[c:15]2[cH:16][c:17]3[c:18](=[O:35])[n:19]([NH:30][S:31](=[O:32])(=[O:33])[CH3:34])[c:20](=[O:29])[nH:21][c:22]3[cH:23][c:24]2[C:25]([F:26])([F:27])[F:28])[n:5]([CH2:7][CH2:8][O:9][CH2:10][Si:11]([CH3:12])([CH3:13])[CH3:14])[n:6]1, [CH3:37][CH2:38][OH:39], [ClH:36], and [O:40]1[CH2:41][CH2:42][O:43][CH2:44][CH2:45]1."}", "/scratch/micpie/export/ord_predictions/test_2-2.jsonl": "{"text":"Question: Which educts are required to synthesize [Br:1][c:2]1[cH:3][c:4]2[c:9]([c:10]([CH:12]3[CH2:13][CH2:20]3)[cH:11]1)[O:8][C:7]([CH3:14])([CH3:15])[CH2:6][C:5]2([CH3:16])[CH3:17]?\nAnswer: 
[Br:1][c:2]1[cH:3][c:4]2[c:9]([c:10]([CH:12]=[CH2:13])[cH:11]1)[O:8][C:7]([CH3:14])([CH3:15])[CH2:6][C:5]2([CH3:16])[CH3:17], [CH3:21][CH2:22][O:23][CH2:24][CH3:25], and [N+:18](=[N-:19])=[CH2:20]."} {"text":"Question: What reaction educts are required to synthesize [Br:1][c:2]1[c:3](=[O:10])[nH:4][c:5]([Cl:11])[n:6][c:7]1[CH3:8]?\nAnswer: Cc1nc([Cl:11])[nH]c(=O)c1C and O=[c:5]1[nH:4][c:3](=[O:10])[c:2]([Br:1])[c:7]([CH3:8])[nH:6]1."}", "/scratch/micpie/export/ord_predictions/train_1-1.jsonl": "{"text":"The reaction SMILES (RXNSMILES) CO.[H][H].[Pd].c1ccc(C[NH:15][CH2:14][C@H:11]2[C@H:10]([OH:23])[CH2:9][N:8]([CH2:7][CH2:6][CH:2]3[O:1][CH2:5][CH2:4][O:3]3)[CH2:13][CH2:12]2)cc1>>[O:1]1[CH:2]([CH2:6][CH2:7][N:8]2[CH2:9][C@@H:10]([OH:23])[C@H:11]([CH2:14][NH2:15])[CH2:12][CH2:13]2)[O:3][CH2:4][CH2:5]1 has the products [O:1]1[CH:2]([CH2:6][CH2:7][N:8]2[CH2:9][C@@H:10]([OH:23])[C@H:11]([CH2:14][NH2:15])[CH2:12][CH2:13]2)[O:3][CH2:4][CH2:5]1 and the starting materials CO, [H][H], [Pd], and c1ccc(C[NH:15][CH2:14][C@H:11]2[C@H:10]([OH:23])[CH2:9][N:8]([CH2:7][CH2:6][CH:2]3[O:1][CH2:5][CH2:4][O:3]3)[CH2:13][CH2:12]2)cc1."} {"text":"The reaction SMILES (RXNSMILES) [C:18](=[O:19])([O-:20])[O-:21].[CH3:24][C:25](=[O:26])[CH3:27].[K+:22].[K+:23].[N+:1](=[O:2])([O-:3])[c:4]1[c:5]([CH2:6][Cl:7])[cH:8][cH:9][cH:10][cH:11]1.[SH:12][CH2:13][C:14](=[O:15])[O:16][CH3:17]>>[N+:1](=[O:2])([O-:3])[c:4]1[c:5]([CH2:6][S:12][CH2:13][C:14](=[O:15])[O:16][CH3:17])[cH:8][cH:9][cH:10][cH:11]1 has the reaction products [N+:1](=[O:2])([O-:3])[c:4]1[c:5]([CH2:6][S:12][CH2:13][C:14](=[O:15])[O:16][CH3:17])[cH:8][cH:9][cH:10][cH:11]1 and the reaction educts [C:18](=[O:19])([O-:20])[O-:21], [CH3:24][C:25](=[O:26])[CH3:27], [K+:22], [K+:23], [N+:1](=[O:2])([O-:3])[c:4]1[c:5]([CH2:6][Cl:7])[cH:8][cH:9][cH:10][cH:11]1, and [SH:12][CH2:13][C:14](=[O:15])[O:16][CH3:17]."}", "/scratch/micpie/export/ord_predictions/train_0-1.jsonl": "{"text":"The reaction SMILES (RXNSMILES) 
CO.COC[O:14][c:3]1[c:2]([Br:1])[cH:7][cH:6][c:5]([O:8][CH:9]([F:10])[F:11])[c:4]1[O:12][CH3:13].Cl>>[Br:1][c:2]1[c:3]([OH:14])[c:4]([O:12][CH3:13])[c:5]([O:8][CH:9]([F:10])[F:11])[cH:6][cH:7]1 has the reaction products [Br:1][c:2]1[c:3]([OH:14])[c:4]([O:12][CH3:13])[c:5]([O:8][CH:9]([F:10])[F:11])[cH:6][cH:7]1 and the reaction educts CO, COC[O:14][c:3]1[c:2]([Br:1])[cH:7][cH:6][c:5]([O:8][CH:9]([F:10])[F:11])[c:4]1[O:12][CH3:13], and Cl."} {"text":"The reaction SMILES (RXNSMILES) [CH3:1][c:2]1[cH:3][c:4](-[c:15]2[cH:16][c:17]3[c:18](=[O:35])[n:19]([NH:30][S:31](=[O:32])(=[O:33])[CH3:34])[c:20](=[O:29])[nH:21][c:22]3[cH:23][c:24]2[C:25]([F:26])([F:27])[F:28])[n:5]([CH2:7][CH2:8][O:9][CH2:10][Si:11]([CH3:12])([CH3:13])[CH3:14])[n:6]1.[CH3:37][CH2:38][OH:39].[ClH:36].[O:40]1[CH2:41][CH2:42][O:43][CH2:44][CH2:45]1>>[CH3:1][c:2]1[cH:3][c:4](-[c:15]2[cH:16][c:17]3[c:18](=[O:35])[n:19]([NH:30][S:31](=[O:32])(=[O:33])[CH3:34])[c:20](=[O:29])[nH:21][c:22]3[cH:23][c:24]2[C:25]([F:26])([F:27])[F:28])[nH:5][n:6]1 has the reaction products [CH3:1][c:2]1[cH:3][c:4](-[c:15]2[cH:16][c:17]3[c:18](=[O:35])[n:19]([NH:30][S:31](=[O:32])(=[O:33])[CH3:34])[c:20](=[O:29])[nH:21][c:22]3[cH:23][c:24]2[C:25]([F:26])([F:27])[F:28])[nH:5][n:6]1 and the reaction educts [CH3:1][c:2]1[cH:3][c:4](-[c:15]2[cH:16][c:17]3[c:18](=[O:35])[n:19]([NH:30][S:31](=[O:32])(=[O:33])[CH3:34])[c:20](=[O:29])[nH:21][c:22]3[cH:23][c:24]2[C:25]([F:26])([F:27])[F:28])[n:5]([CH2:7][CH2:8][O:9][CH2:10][Si:11]([CH3:12])([CH3:13])[CH3:14])[n:6]1, [CH3:37][CH2:38][OH:39], [ClH:36], and [O:40]1[CH2:41][CH2:42][O:43][CH2:44][CH2:45]1."}", "/scratch/micpie/export/ord_predictions/train_0-4.jsonl": "{"text":"User: I would like to produce [Br:1][c:2]1[c:3]([OH:14])[c:4]([O:12][CH3:13])[c:5]([O:8][CH:9]([F:10])[F:11])[cH:6][cH:7]1.\nAssistant: Is there anything else I can do for you?\nUser: I would like to know the reaction educts I need to produce the products 
[Br:1][c:2]1[c:3]([OH:14])[c:4]([O:12][CH3:13])[c:5]([O:8][CH:9]([F:10])[F:11])[cH:6][cH:7]1.\nAssistant: I propose the following reaction educts: CO, COC[O:14][c:3]1[c:2]([Br:1])[cH:7][cH:6][c:5]([O:8][CH:9]([F:10])[F:11])[c:4]1[O:12][CH3:13], and Cl."} {"text":"User: I want produce [CH3:1][c:2]1[cH:3][c:4](-[c:15]2[cH:16][c:17]3[c:18](=[O:35])[n:19]([NH:30][S:31](=[O:32])(=[O:33])[CH3:34])[c:20](=[O:29])[nH:21][c:22]3[cH:23][c:24]2[C:25]([F:26])([F:27])[F:28])[nH:5][n:6]1.\nAssistant: Cool, is there anything else I can do for you?\nUser: Yes, I would like to know the reaction educts I need to produce the reaction products [CH3:1][c:2]1[cH:3][c:4](-[c:15]2[cH:16][c:17]3[c:18](=[O:35])[n:19]([NH:30][S:31](=[O:32])(=[O:33])[CH3:34])[c:20](=[O:29])[nH:21][c:22]3[cH:23][c:24]2[C:25]([F:26])([F:27])[F:28])[nH:5][n:6]1.\nAssistant: I advise the following reaction educts: [CH3:1][c:2]1[cH:3][c:4](-[c:15]2[cH:16][c:17]3[c:18](=[O:35])[n:19]([NH:30][S:31](=[O:32])(=[O:33])[CH3:34])[c:20](=[O:29])[nH:21][c:22]3[cH:23][c:24]2[C:25]([F:26])([F:27])[F:28])[n:5]([CH2:7][CH2:8][O:9][CH2:10][Si:11]([CH3:12])([CH3:13])[CH3:14])[n:6]1, [CH3:37][CH2:38][OH:39], [ClH:36], and [O:40]1[CH2:41][CH2:42][O:43][CH2:44][CH2:45]1."}", "/scratch/micpie/export/ord_predictions/valid_1-0.jsonl": "{"text":"The reaction SMILES (RXNSMILES) [CH2:30]1[CH2:31][CH2:32][C:33]2=[N:38][CH2:37][CH2:36][CH2:35][N:34]2[CH2:39][CH2:40]1.[CH3:1][O:2][c:3]1[cH:4][c:5]([CH2:6][NH2:7])[cH:8][c:9]([O:11][CH3:12])[cH:10]1.[Cl:13][C:14]([C:15](=[O:16])[NH:17][c:18]1[c:19]2[cH:20][cH:21][n:22][cH:23][c:24]2[cH:25][cH:26][cH:27]1)([Cl:28])[Cl:29]>>[CH3:1][O:2][c:3]1[cH:4][c:5]([CH2:6][NH:7][C:15](=[O:16])[NH:17][c:18]2[c:19]3[cH:20][cH:21][n:22][cH:23][c:24]3[cH:25][cH:26][cH:27]2)[cH:8][c:9]([O:11][CH3:12])[cH:10]1 has the educts [CH2:30]1[CH2:31][CH2:32][C:33]2=[N:38][CH2:37][CH2:36][CH2:35][N:34]2[CH2:39][CH2:40]1, [CH3:1][O:2][c:3]1[cH:4][c:5]([CH2:6][NH2:7])[cH:8][c:9]([O:11][CH3:12])[cH:10]1, and 
[Cl:13][C:14]([C:15](=[O:16])[NH:17][c:18]1[c:19]2[cH:20][cH:21][n:22][cH:23][c:24]2[cH:25][cH:26][cH:27]1)([Cl:28])[Cl:29] and the reaction products [CH3:1][O:2][c:3]1[cH:4][c:5]([CH2:6][NH:7][C:15](=[O:16])[NH:17][c:18]2[c:19]3[cH:20][cH:21][n:22][cH:23][c:24]3[cH:25][cH:26][cH:27]2)[cH:8][c:9]([O:11][CH3:12])[cH:10]1."} {"text":"The reaction SMILES string Br[c:2]1[s:3][c:4]2[c:5]([n:6]1)[cH:7][c:8]([C:11]#[N:12])[cH:9][cH:10]2.CN1CCCC1=O.ClCCl.Cn1ccnc1.[Br-].[Zn+][CH2:16][CH:15]([CH3:14])[CH3:18]>>[c:2]1([CH2:14][CH:15]([CH3:16])[CH3:18])[s:3][c:4]2[c:5]([n:6]1)[cH:7][c:8]([C:11]#[N:12])[cH:9][cH:10]2 has the reaction educts Br[c:2]1[s:3][c:4]2[c:5]([n:6]1)[cH:7][c:8]([C:11]#[N:12])[cH:9][cH:10]2, CN1CCCC1=O, ClCCl, Cn1ccnc1, and [Br-].[Zn+][CH2:16][CH:15]([CH3:14])[CH3:18] and the products [c:2]1([CH2:14][CH:15]([CH3:16])[CH3:18])[s:3][c:4]2[c:5]([n:6]1)[cH:7][c:8]([C:11]#[N:12])[cH:9][cH:10]2."}", "/scratch/micpie/export/ord_predictions/valid_2-3.jsonl": "{"text":"Question: Which products are produced from the educts [CH2:1]([CH:2]([CH3:3])[CH3:4])[c:5]1[cH:6][c:7]2[c:8]([n:9][cH:10][n:11][c:12]2[NH:13][CH:14]2[CH2:15][CH2:16][NH:17][CH2:18][CH2:19]2)[s:20]1 and [CH3:21][S:22]([O:23][CH:26]([CH3:27])[c:28]1[cH:29][c:30]([F:35])[cH:31][c:32]([F:34])[cH:33]1)(=[O:24])=[O:25]?\nAnswer: [CH2:1]([CH:2]([CH3:3])[CH3:4])[c:5]1[cH:6][c:7]2[c:8]([n:9][cH:10][n:11][c:12]2[NH:13][CH:14]2[CH2:15][CH2:16][N:17]([CH:26]([CH3:27])[c:28]3[cH:29][c:30]([F:35])[cH:31][c:32]([F:34])[cH:33]3)[CH2:18][CH2:19]2)[s:20]1."} {"text":"Question: Which products are produced from the starting materials CC[O:3][P:4]([O:5]CC)(=[O:8])[C:9]([F:10])([F:11])[c:12]1[c:13]([Br:24])[cH:14][c:15]2[cH:16][cH:17][c:18]([C:22]#[N:23])[n:19][c:20]2[cH:21]1, CO, C[Si](C)(C)Br, ClCCl, and [NH3:30]?\nAnswer: [NH4+:19].[NH4+:30].[O:3]=[P:4]([O-:5])([O-:8])[C:9]([F:10])([F:11])[c:12]1[c:13]([Br:24])[cH:14][c:15]2[cH:16][cH:17][c:18]([C:22]#[N:23])[n:19][c:20]2[cH:21]1."}", 
"/scratch/micpie/export/ord_predictions/valid_0-3.jsonl": "{"text":"Question: Which products are produced from the reaction educts C[CH2:1][O:3][CH:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][CH2:15][CH3:16], Cc1ccc(S(=O)(=O)O)cc1, Cc1ccccc1, O=C([O-])O.[Na+], and OCC(O)C[Br:17]?\nAnswer: [CH2:1]1[O:3][C@@H:4]([CH2:5][P:6](=[O:7])([O:8][CH2:9][CH3:10])[O:11][CH2:12][CH3:13])[O:14][C@@H:15]1[CH2:16][Br:17]."} {"text":"Question: What products are produced from the reaction educts CCOC(C)=O, CN(C)C=O, CN1CCOCC1, ClCCCl, O=[C:10]([CH2:9][O:8][CH2:1][c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[OH:12], On1nnc2ccccc21, and [F:34][C:35]([c:36]1[cH:37][c:38]([NH2:39])[cH:40][cH:41][cH:42]1)([F:43])[F:44]?\nAnswer: [CH2:1]([c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1)[O:8][CH2:9][C:10](=[O:12])[NH:39][c:38]1[cH:37][c:36]([C:35]([F:34])([F:43])[F:44])[cH:42][cH:41][cH:40]1."}", "/scratch/micpie/export/ord_predictions/train_1-2.jsonl": "{"text":"Question: Which starting materials are needed to synthesize [O:1]1[CH:2]([CH2:6][CH2:7][N:8]2[CH2:9][C@@H:10]([OH:23])[C@H:11]([CH2:14][NH2:15])[CH2:12][CH2:13]2)[O:3][CH2:4][CH2:5]1?\nAnswer: CO, [H][H], [Pd], and c1ccc(C[NH:15][CH2:14][C@H:11]2[C@H:10]([OH:23])[CH2:9][N:8]([CH2:7][CH2:6][CH:2]3[O:1][CH2:5][CH2:4][O:3]3)[CH2:13][CH2:12]2)cc1."} {"text":"Question: What starting materials are needed to produce [N+:1](=[O:2])([O-:3])[c:4]1[c:5]([CH2:6][S:12][CH2:13][C:14](=[O:15])[O:16][CH3:17])[cH:8][cH:9][cH:10][cH:11]1?\nAnswer: [C:18](=[O:19])([O-:20])[O-:21], [CH3:24][C:25](=[O:26])[CH3:27], [K+:22], [K+:23], [N+:1](=[O:2])([O-:3])[c:4]1[c:5]([CH2:6][Cl:7])[cH:8][cH:9][cH:10][cH:11]1, and [SH:12][CH2:13][C:14](=[O:15])[O:16][CH3:17]."}", "/scratch/micpie/export/ord_predictions/test_0-4.jsonl": "{"text":"User: I would like to produce 
[CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]4[O:22][CH2:25][CH:26]([CH3:27])[O:28]4)[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11].\nAssistant: Great, is there anything else I can do for you?\nUser: I would like to know the starting materials I need to produce the reaction products [CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]4[O:22][CH2:25][CH:26]([CH3:27])[O:28]4)[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11].\nAssistant: I advise the following starting materials: [CH2:25]([CH:26]([CH3:27])[OH:28])[OH:29], [CH3:1][n:2]1[c:3](=[O:24])[n:4](-[c:13]2[cH:14][cH:15][c:16]3[c:17]([c:18]([CH:21]=[O:22])[n:19][s:20]3)[cH:23]2)[c:5](=[O:12])[cH:6][c:7]1[C:8]([F:9])([F:10])[F:11], [CH3:41][c:42]1[cH:43][cH:44][cH:45][cH:46][cH:47]1, [CH3:48][CH2:49][O:50][CH2:51][CH3:52], [OH2:53], and [c:30]1([CH3:31])[cH:32][cH:33][c:34]([S:35]([OH:36])(=[O:37])=[O:38])[cH:39][cH:40]1."} {"text":"User: I want synthesize [c:1]1(-[c:2]2[c:3]([OH:4])[cH:5][cH:6][cH:7][cH:8]2)[o:9][c:18]2[c:12]([c:11](=[O:19])[n:20]1)[cH:13][cH:15][cH:16][cH:17]2.\nAssistant: Cool, is there anything else I can do for you?\nUser: Yes, I would like to know the starting materials I need to produce the products [c:1]1(-[c:2]2[c:3]([OH:4])[cH:5][cH:6][cH:7][cH:8]2)[o:9][c:18]2[c:12]([c:11](=[O:19])[n:20]1)[cH:13][cH:15][cH:16][cH:17]2.\nAssistant: I propose the following starting materials: Cl, Cl[C:1]([c:2]1[c:3]([OH:4])[cH:5][cH:6][cH:7][cH:8]1)=[O:9], and O[c:13]1[c:12]([C:11](=[O:19])[NH2:20])[cH:18][cH:17][cH:16][cH:15]1."}", "/scratch/micpie/export/ord_predictions/test_2-3.jsonl": "{"text":"Question: Which products are produced from the reaction educts [Br:1][c:2]1[cH:3][c:4]2[c:9]([c:10]([CH:12]=[CH2:13])[cH:11]1)[O:8][C:7]([CH3:14])([CH3:15])[CH2:6][C:5]2([CH3:16])[CH3:17], [CH3:21][CH2:22][O:23][CH2:24][CH3:25], and [N+:18](=[N-:19])=[CH2:20]?\nAnswer: 
[Br:1][c:2]1[cH:3][c:4]2[c:9]([c:10]([CH:12]3[CH2:13][CH2:20]3)[cH:11]1)[O:8][C:7]([CH3:14])([CH3:15])[CH2:6][C:5]2([CH3:16])[CH3:17]."} {"text":"Question: What products are produced from the reaction educts Cc1nc([Cl:11])[nH]c(=O)c1C and O=[c:5]1[nH:4][c:3](=[O:10])[c:2]([Br:1])[c:7]([CH3:8])[nH:6]1?\nAnswer: [Br:1][c:2]1[c:3](=[O:10])[nH:4][c:5]([Cl:11])[n:6][c:7]1[CH3:8]."}", "/scratch/micpie/export/compound_protein_go_term_1/test_8-1.jsonl": "{"text":"The compound CNCC[C@H](Oc1cccc2cc(O)ccc12)c1cccs1 targets the protein Solute carrier family 6 member 2. The protein Solute carrier family 6 member 2 is located in the neuron projection."} {"text":"The compound Nc1nc2cc(-c3ccccc3)ccc2c(=O)n1C[C@H]1CC[C@@H](c2ccccc2)O1 targets the protein Cathepsin D (EC 3.4.23.5). The protein Cathepsin D (EC 3.4.23.5) is located in the collagen-containing extracellular matrix."}", "/scratch/micpie/export/compound_protein_go_term_1/test_4-0.jsonl": "{"text":"The compound Nc1ncc(-c2cc(N3CC(N4CCOCC4)C3)nc(N3C[C@@H](F)[C@@H](F)C3)n2)cc1OC(F)F targets the protein Mixed lineage kinase which enables the transferase activity."} {"text":"The compound O=c1[nH]c(C2CCC2c2ncccn2)nc2c1cnn2C1CCOCC1 targets the protein High affinity cGMP-specific 3^,5^-cyclic phosphodiesterase 9A which is located in the nucleoplasm."}", "/scratch/micpie/export/compound_protein_go_term_1/valid_5-1.jsonl": "{"text":"The compound InChI=1S\/C17H10F3NO5S2\/c18-10-3-4-12(22)16(20)15(10)17(23)14-6-5-13(27-14)9-2-1-8(7-11(9)19)26-28(21,24)25\/h1-7,22H,(H2,21,24,25) targets the protein Estrone sulfatase. The protein Estrone sulfatase enables the metal ion binding."} {"text":"The compound CcccOccccF)cc6Cl))))))))ccc6NcsncO)c5\/CN)=N\/CC)CO targets the protein MAPKK 1. 
The protein MAPKK 1 is located in the nucleus."}", "/scratch/micpie/export/compound_protein_go_term_1/valid_5-0.jsonl": "{"text":"The compound NS(=O)(=O)Oc1ccc(-c2ccc(C(=O)c3c(F)ccc(O)c3F)s2)c(F)c1 targets the protein Steryl-sulfate sulfohydrolase which enables the metal ion binding."} {"text":"The compound InChI=1S\/C20H20ClFN4O3S\/c1-10-7-13(29-16-6-3-12(22)8-14(16)21)4-5-15(10)25-20-17(19(28)26-30-20)18(23)24-11(2)9-27\/h3-8,11,25,27H,9H2,1-2H3,(H2,23,24)(H,26,28) targets the protein MKK1 which is located in the nucleus."}", "/scratch/micpie/export/compound_protein_go_term_1/train_7-1.jsonl": "{"text":"The compound InChI=1S\/C26H38N6O2\/c1-18(2)32-24-7-5-4-6-23(24)25(28-32)26(34)27-20-16-21-8-9-22(17-20)31(21)15-12-29-10-13-30(14-11-29)19(3)33\/h4-7,18,20-22H,8-17H2,1-3H3,(H,27,34)\/t20-,21+,22- targets the protein Serotonin receptor 4. The protein Serotonin receptor 4 is involved in the signal transduction."} {"text":"The compound O=C(Cc1ccncc1)NCc1cnc(Oc2ccc3c(c2)CCC(c2cccc(F)c2)O3)s1 targets the protein Na(+)\/Ca(2+)-exchange protein 1. The protein Na(+)\/Ca(2+)-exchange protein 1 is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_1/test_9-0.jsonl": "{"text":"The compound InChI=1S\/C29H29ClN6O2\/c1-19-6-5-9-23(16-19)25-26(30)35-28(33-15-14-20-7-3-2-4-8-20)29(38)36(25)18-24(37)34-17-21-10-12-22(13-11-21)27(31)32\/h2-13,16H,14-15,17-18H2,1H3,(H3,31,32)(H,33,35)(H,34,37) targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) which enables the thrombospondin receptor activity."} {"text":"The compound O=C(O)CCc1ccc(COc2ccc3c(c2)CCC3)cc1 targets the protein Nuclear receptor subfamily 2 group B member 2 which is located in the nucleus."}", "/scratch/micpie/export/compound_protein_go_term_1/valid_9-1.jsonl": "{"text":"The compound CO[C@@H](C(=O)N1CCC[C@H]1C(=O)N[C@H](C=O)CCCN=C(N)N)c1ccccc1 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II). 
The protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) enables the thrombospondin receptor activity."} {"text":"The compound InChI=1S\/C18H19FN4O2\/c1-11-3-5-12(6-4-11)14-15-17(20)21-10-22-18(15)23(7-2-8-24)16(14)13(25)9-19\/h3-6,10,24H,2,7-9H2,1H3,(H2,20,21,22) targets the protein LSK. The protein LSK is located in the plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_1/valid_3-0.jsonl": "{"text":"The compound InChI=1S\/C18H10ClF2N3O3S\/c19-15-4-1-12(20)8-17(15)27-16-5-3-14(7-11(16)9-22)28(25,26)24-18-6-2-13(21)10-23-18\/h1-8,10H,(H,23,24) targets the protein Urate anion exchanger 1 which enables the PDZ domain binding."} {"text":"The compound InChI=1S\/C21H17N5O3\/c27-20(24-18-12-16(25-29-18)13-5-2-1-3-6-13)15-8-7-14-11-17-21(28)22-9-4-10-26(17)19(14)23-15\/h1-3,5-8,11-12H,4,9-10H2,(H,22,28)(H,24,27) targets the protein 90 kDa ribosomal protein S6 kinase 3 which is located in the nucleoplasm."}", "/scratch/micpie/export/compound_protein_go_term_1/test_0-1.jsonl": "{"text":"The compound InChI=1S\/C17H30N4O3S\/c1-12(2)13(3)18-16(22)11-21-15(5)17(14(4)19-21)25(23,24)20-9-7-6-8-10-20\/h12-13H,6-11H2,1-5H3,(H,18,22) targets the protein Ataxia telangiectasia mutated. The protein Ataxia telangiectasia mutated enables the protein serine kinase activity."} {"text":"The compound CCC)[C@H]C=O)NO)))NCccccnc6)))))))S=O)=O)ccccF)cc6 targets the protein 72 kDa type IV collagenase (EC 3.4.24.24) (72 kDa gelatinase) (Gelatinase A) (Matrix metalloproteinase-2) (MMP-2) (TBE-1). 
The protein 72 kDa type IV collagenase (EC 3.4.24.24) (72 kDa gelatinase) (Gelatinase A) (Matrix metalloproteinase-2) (MMP-2) (TBE-1) is located in the mitochondrion."}", "/scratch/micpie/export/compound_protein_go_term_1/test_5-0.jsonl": "{"text":"The compound C[C@H](N)Cn1ncc2ccc(O)c(F)c21 targets the protein 5-HT-2B which enables the serotonin binding."} {"text":"The compound CCCCCCCCN(CCCCCCCC)C(=O)[C@H](C)NC(=O)[C@@H](NC(=O)[C@H](Cc1ccc(OP(=O)(O)O)cc1)NC(C)=O)C(C)C targets the protein Abelson tyrosine-protein kinase 1 which is located in the ruffle."}", "/scratch/micpie/export/compound_protein_go_term_1/test_2-0.jsonl": "{"text":"The compound CC(NC(C)(C)C)C(=O)c1ccc(F)c(F)c1 targets the protein DA transporter which enables the heterocyclic compound binding."} {"text":"The compound OC[C@H]1O[C@@H](n2ccc3c(SCc4ccccc4)ncnc32)[C@H](O)[C@@H]1O targets the protein Solute carrier family 29 member 1 which is located in the plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_1/valid_0-0.jsonl": "{"text":"The compound CC1C=CCC2C(=O)N(c3cccc(C(=O)Nc4ccc(Br)cc4)c3)C(=O)C12 targets the protein Serine-protein kinase ATM which enables the protein serine kinase activity."} {"text":"The compound CCC)C)Cccnccc6)[C@@H]NC[C@@H]O)[C@H]CccccF)cc6)))))))NC=O)CF))))))))C[C@]C[C@H]O)C4)))O6 targets the protein Cathepsin D (EC 3.4.23.5) which is located in the collagen-containing extracellular matrix."}", "/scratch/micpie/export/compound_protein_go_term_1/train_6-1.jsonl": "{"text":"The compound O=C(N[C@@H](CO)c1ccccc1)c1ccc2nc(N[C@H]3CC[C@H](O)CC3)c3nccn3c2c1 targets the protein JNK-46. The protein JNK-46 is involved in the cellular response to mechanical stimulus."} {"text":"The compound CC=O)NCCNCCN[C@H]CC[C@@H]5C[C@H]NC=O)cnnCC)C))cccccc96)))))))))))C7))))))))))CC6 targets the protein Serotonin receptor 4. 
The protein Serotonin receptor 4 is involved in the G protein-coupled receptor signaling pathway."}", "/scratch/micpie/export/compound_protein_go_term_1/test_7-0.jsonl": "{"text":"The compound CC=O)NCCNCCNCCCCNC=O)ccccc[nH]cCC)C))nc95))))))))))))CC6))))))))CC6 targets the protein 5-HT3-A which enables the transmitter-gated ion channel activity involved in regulation of postsynaptic membrane potential."} {"text":"The compound InChI=1S\/C18H19NO2S\/c1-19-10-9-17(18-6-3-11-22-18)21-16-5-2-4-13-12-14(20)7-8-15(13)16\/h2-8,11-12,17,19-20H,9-10H2,1H3\/t17-\/m0\/s1 targets the protein Solute carrier family 6 member 2 which is located in the plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_1/test_3-0.jsonl": "{"text":"The compound [O][=C][N][C][C][Branch1][N][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1][C][Ring1][N][C][=C][C][=C][C][Branch1][C][Cl][=C][Ring1][#Branch1] targets the protein Sodium-dependent dopamine transporter which enables the heterocyclic compound binding."} {"text":"The compound Nc1ncc(-c2cc(N3CC(N4CCOCC4)C3)nc(N3C[C@@H](F)[C@@H](F)C3)n2)cc1OC(F)F targets the protein Dual leucine zipper bearing kinase which enables the protein kinase binding."}", "/scratch/micpie/export/compound_protein_go_term_1/train_1-0.jsonl": "{"text":"The compound InChI=1S\/C24H28F2N4O2\/c25-17-6-9-19(10-7-17)27-12-13-28-23(31)21(14-16-4-2-1-3-5-16)30-24-29-20-15-18(26)8-11-22(20)32-24\/h6-11,15-16,21,27H,1-5,12-14H2,(H,28,31)(H,29,30)\/t21-\/m0\/s1 targets the protein Cathepsin S which enables the proteoglycan binding."} {"text":"The compound InChI=1S\/C17H19Cl2N3\/c18-16-6-3-4-13(17(16)19)11-22(15-7-9-20-10-15)12-14-5-1-2-8-21-14\/h1-6,8,15,20H,7,9-12H2\/t15-\/m1\/s1 targets the protein Solute carrier family 6 member 3 which enables the protein phosphatase 2A binding."}", "/scratch/micpie/export/compound_protein_go_term_1/test_0-0.jsonl": "{"text":"The compound 
[C][C][=N][N][Branch1][S][C][C][=Branch1][C][=O][N][C][Branch1][C][C][C][Branch1][C][C][C][C][Branch1][C][C][=C][Ring1][#C][S][=Branch1][C][=O][=Branch1][C][=O][N][C][C][C][C][C][Ring1][=Branch1] targets the protein Serine-protein kinase ATM which enables the protein serine kinase activity."} {"text":"The compound InChI=1S\/C17H20FN3O4S\/c1-12(2)16(17(22)20-23)21(11-13-4-3-9-19-10-13)26(24,25)15-7-5-14(18)6-8-15\/h3-10,12,16,23H,11H2,1-2H3,(H,20,22)\/t16-\/m1\/s1 targets the protein 72 kDa type IV collagenase (EC 3.4.24.24) (72 kDa gelatinase) (Gelatinase A) (Matrix metalloproteinase-2) (MMP-2) (TBE-1) which is located in the mitochondrion."}", "/scratch/micpie/export/compound_protein_go_term_1/test_6-0.jsonl": "{"text":"The compound InChI=1S\/C22H18N6O3\/c1-29-18-10-15-17(11-19(18)30-2)23-13-24-22(15)31-12-21-26-25-20-9-8-16(27-28(20)21)14-6-4-3-5-7-14\/h3-11,13H,12H2,1-2H3 targets the protein Hepatocyte growth factor receptor which enables the molecular function activator activity."} {"text":"The compound O=C(CCCCCCS)Nc1ccccc1 targets the protein HD10 which is located in the cytoplasm."}", "/scratch/micpie/export/compound_protein_go_term_1/train_2-0.jsonl": "{"text":"The compound [Cl][C][=C][C][=C][C][Branch2][Ring1][Branch1][C][N][Branch1][#Branch2][C][C][=C][C][=C][C][=N][Ring1][=Branch1][C@@H1][C][C][N][C][Ring1][Branch1][=C][Ring2][Ring1][Ring2][Cl] targets the protein DA transporter which enables the protein N-terminus binding."} {"text":"The compound CC(C)C(=O)N(Cc1cc(Cl)cc(Cl)c1Cl)[C@H]1CCNC1 targets the protein Sodium-dependent dopamine transporter which is involved in the response to iron ion."}", "/scratch/micpie/export/compound_protein_go_term_1/valid_2-0.jsonl": "{"text":"The compound [N][#C][C][=C][C][=N][N][Ring1][Branch1][C][C@@][C][N][C][C][C@][Ring1][=Branch1][Branch1][#C][C][=C][C][=C][Branch1][C][Cl][C][Branch1][C][Cl][=C][Ring1][Branch2][C][Ring1][#C] targets the protein DAT which enables the heterocyclic compound binding."} {"text":"The 
compound InChI=1S\/C24H26ClNO7S\/c1-2-3-6-32-17-10-15(25)13(9-20-26-11-19(34-20)16-5-4-7-31-16)8-14(17)24-23(30)22(29)21(28)18(12-27)33-24\/h2,4-5,7-8,10-11,18,21-24,27-30H,1,3,6,9,12H2\/t18-,21-,22+,23-,24+\/m1\/s1 targets the protein Low affinity sodium-glucose cotransporter which is located in the plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_1/test_7-1.jsonl": "{"text":"The compound [C][C][=Branch1][C][=O][N][C][C][N][Branch2][Ring2][=Branch2][C][C][N][C][C][C][Branch2][Ring1][O][C][N][C][=Branch1][C][=O][C][=C][C][=C][C][NH1][C][Branch1][=Branch1][C][Branch1][C][C][C][=N][C][Ring1][N][=Ring1][Branch2][C][C][Ring2][Ring1][=Branch1][C][C][Ring2][Ring1][=C] targets the protein 5-HT-3. The protein 5-HT-3 enables the transmitter-gated ion channel activity involved in regulation of postsynaptic membrane potential."} {"text":"The compound CNCC[C@H](Oc1cccc2cc(O)ccc12)c1cccs1 targets the protein Norepinephrine transporter. The protein Norepinephrine transporter is located in the plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_1/valid_2-1.jsonl": "{"text":"The compound N#Cc1ccnn1C[C@@]12CNCC[C@]1(c1ccc(Cl)c(Cl)c1)C2 targets the protein Solute carrier family 6 member 3. The protein Solute carrier family 6 member 3 enables the heterocyclic compound binding."} {"text":"The compound InChI=1S\/C24H26ClNO7S\/c1-2-3-6-32-17-10-15(25)13(9-20-26-11-19(34-20)16-5-4-7-31-16)8-14(17)24-23(30)22(29)21(28)18(12-27)33-24\/h2,4-5,7-8,10-11,18,21-24,27-30H,1,3,6,9,12H2\/t18-,21-,22+,23-,24+\/m1\/s1 targets the protein Sodium\/glucose cotransporter 2. 
The protein Sodium\/glucose cotransporter 2 is located in the plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_1/valid_4-0.jsonl": "{"text":"The compound CCC)NCccccc-cnccs5)))))c6)))))))C=O)NccncCC)C)C))nc6 targets the protein Tropomyosin-related kinase A which enables the nerve growth factor binding."} {"text":"The compound O=c1[nH]c(C(C2CC2)N2CC(Oc3ccccc3)C2)nc2c1cnn2C1CCOCC1 targets the protein High affinity cGMP-specific 3^,5^-cyclic phosphodiesterase 9A which is located in the nucleoplasm."}", "/scratch/micpie/export/compound_protein_go_term_1/train_5-1.jsonl": "{"text":"The compound [O][=C][C][=C][Branch2][Ring1][C][N][C][C][C][=C][C][=C][Branch1][C][O][C][Branch1][C][O][=C][Ring1][Branch2][C][=Branch1][C][=O][C][=N][C][=C][C][=C][Ring1][=Branch1][Ring2][Ring1][=Branch1] targets the protein Ezrin. The protein Ezrin is located in the protein-containing complex."} {"text":"The compound InChI=1S\/C25H27N5O3\/c31-15-21(16-4-2-1-3-5-16)29-25(33)17-6-11-20-22(14-17)30-13-12-26-24(30)23(28-20)27-18-7-9-19(32)10-8-18\/h1-6,11-14,18-19,21,31-32H,7-10,15H2,(H,27,28)(H,29,33)\/t18-,19-,21-\/m0\/s1 targets the protein MAPK 8. The protein MAPK 8 is involved in the cellular response to cadmium ion."}", "/scratch/micpie/export/compound_protein_go_term_1/test_2-1.jsonl": "{"text":"The compound CCNCC)C)C)))C=O)ccccF)cF)c6 targets the protein DA transporter. The protein DA transporter enables the heterocyclic compound binding."} {"text":"The compound [O][C][C@H1][O][C@@H1][Branch2][Ring1][#Branch2][N][C][=C][C][=C][Branch1][O][S][C][C][=C][C][=C][C][=C][Ring1][=Branch1][N][=C][N][=C][Ring1][=C][Ring1][P][C@H1][Branch1][C][O][C@@H1][Ring2][Ring1][#Branch1][O] targets the protein Equilibrative nitrobenzylmercaptopurine riboside-sensitive nucleoside transporter. 
The protein Equilibrative nitrobenzylmercaptopurine riboside-sensitive nucleoside transporter is located in the plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_1/train_0-0.jsonl": "{"text":"The compound Cc1ccc(SC(CC(=O)c2ccc(F)cc2)C(=O)O)cc1 targets the protein Ataxia telangiectasia mutated which enables the protein serine kinase activity."} {"text":"The compound InChI=1S\/C17H20N4O5\/c1-24-14-6-4-11(8-15(14)25-2)5-7-16(22)21-10-12(19-20-18)9-13(21)17(23)26-3\/h4-8,12-13H,9-10H2,1-3H3\/b7-5+\/t12-,13-\/m0\/s1 targets the protein 72 kDa type IV collagenase (EC 3.4.24.24) (72 kDa gelatinase) (Gelatinase A) (Matrix metalloproteinase-2) (MMP-2) (TBE-1) which is located in the collagen-containing extracellular matrix."}", "/scratch/micpie/export/compound_protein_go_term_1/test_1-1.jsonl": "{"text":"The compound [C][C][Branch1][C][C][C@H1][Branch1][#Branch1][C][=Branch1][C][=O][N][O][N][Branch1][#Branch2][C][C][=C][C][=C][N][=C][Ring1][=Branch1][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1] targets the protein 72 kDa type IV collagenase (EC 3.4.24.24) (72 kDa gelatinase) (Gelatinase A) (Matrix metalloproteinase-2) (MMP-2) (TBE-1). The protein 72 kDa type IV collagenase (EC 3.4.24.24) (72 kDa gelatinase) (Gelatinase A) (Matrix metalloproteinase-2) (MMP-2) (TBE-1) is located in the cytoplasm."} {"text":"The compound O=S1(=O)Nc2c(cc(F)c3cccnc23)-c2cc(C(F)(F)F)ccc21 targets the protein Solute carrier family 40 member 1. 
The protein Solute carrier family 40 member 1 is located in the basolateral plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_1/valid_9-0.jsonl": "{"text":"The compound CO[C@@H]C=O)NCCC[C@H]5C=O)N[C@H]C=O))CCCN=CN)N)))))))))))))))cccccc6 targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) which enables the thrombospondin receptor activity."} {"text":"The compound Cc1ccc(-c2c(C(=O)CF)n(CCCO)c3ncnc(N)c23)cc1 targets the protein Lymphocyte cell-specific protein-tyrosine kinase which is located in the plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_1/train_8-1.jsonl": "{"text":"The compound CC(C)(C)\/C=C\/c1cccc(-c2cc(NC(=O)[C@@H]3CNC(=O)C3)nn2-c2ccccc2)c1 targets the protein Sodium\/glucose cotransporter 1. The protein Sodium\/glucose cotransporter 1 enables the D-glucose transmembrane transporter activity."} {"text":"The compound CCC(C)C(N)C(=O)N1CCC(F)C1 targets the protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26). 
The protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26) is located in the membrane raft."}", "/scratch/micpie/export/compound_protein_go_term_1/train_8-0.jsonl": "{"text":"The compound CC(C)(C)\/C=C\/c1cccc(-c2cc(NC(=O)[C@@H]3CNC(=O)C3)nn2-c2ccccc2)c1 targets the protein High affinity sodium-glucose cotransporter which enables the D-glucose transmembrane transporter activity."} {"text":"The compound CCCC)CN)C=O)NCCCF)C5 targets the protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26) which is located in the membrane raft."}", "/scratch/micpie/export/compound_protein_go_term_1/test_5-1.jsonl": "{"text":"The compound [C][C@H1][Branch1][C][N][C][N][N][=C][C][=C][C][=C][Branch1][C][O][C][Branch1][C][F][=C][Ring1][Branch2][Ring1][O] targets the protein Serotonin receptor 2B. The protein Serotonin receptor 2B enables the serotonin binding."} {"text":"The compound CCCCCCCCN(CCCCCCCC)C(=O)[C@H](C)NC(=O)[C@@H](NC(=O)[C@H](Cc1ccc(OP(=O)(O)O)cc1)NC(C)=O)C(C)C targets the protein Abelson tyrosine-protein kinase 1. The protein Abelson tyrosine-protein kinase 1 is located in the ruffle."}", "/scratch/micpie/export/compound_protein_go_term_1/train_4-1.jsonl": "{"text":"The compound CN1CC[C@@](O)(C#Cc2ccc3c(c2)-n2nc(C(N)=O)c(C(F)(F)F)c2[C@@H](F)CO3)C1=O targets the protein Mitogen-activated protein kinase kinase kinase 14. The protein Mitogen-activated protein kinase kinase kinase 14 enables the protein serine kinase activity."} {"text":"The compound [O][=C][C][=C][Branch2][Ring1][C][N][C][C][C][=C][C][=C][Branch1][C][O][C][Branch1][C][O][=C][Ring1][Branch2][C][=Branch1][C][=O][C][=N][C][=C][C][=C][Ring1][=Branch1][Ring2][Ring1][=Branch1] targets the protein Cytovillin. 
The protein Cytovillin is involved in the negative regulation of transcription by RNA polymerase II."}", "/scratch/micpie/export/compound_protein_go_term_1/train_5-0.jsonl": "{"text":"The compound O=CC=CNCCccccO)cO)c6)))))))))C=O)cncccc6%10 targets the protein Ezrin which is located in the protein-containing complex."} {"text":"The compound O=C(N[C@@H](CO)c1ccccc1)c1ccc2nc(N[C@H]3CC[C@H](O)CC3)c3nccn3c2c1 targets the protein MAP kinase 8 which is involved in the cellular response to cadmium ion."}", "/scratch/micpie/export/compound_protein_go_term_1/valid_0-1.jsonl": "{"text":"The compound CCC=CCCC=O)NcccccC=O)NccccBr)cc6))))))))c6))))))C=O)C95 targets the protein Ataxia telangiectasia mutated. The protein Ataxia telangiectasia mutated enables the protein serine kinase activity."} {"text":"The compound CCC)C)Cccnccc6)[C@@H]NC[C@@H]O)[C@H]CccccF)cc6)))))))NC=O)CF))))))))C[C@]C[C@H]O)C4)))O6 targets the protein Cathepsin D (EC 3.4.23.5). The protein Cathepsin D (EC 3.4.23.5) is located in the collagen-containing extracellular matrix."}", "/scratch/micpie/export/compound_protein_go_term_1/valid_7-1.jsonl": "{"text":"The compound [C][C][Branch1][C][C][O][C@@H1][C][C][C][C][C@H1][Branch1][C][F][C][=C][C][=Branch1][C][=O][C][=C][C@][Ring1][#Branch1][Branch1][C][C][C@@][Ring1][=N][Branch1][C][F][C@@H1][Branch1][C][O][C][C@][Ring2][Ring1][Ring1][Branch1][C][C][C@][Ring2][Ring1][#Branch1][Branch1][#C][C][=Branch1][C][=O][C][S][C][C][C][O][C][Ring1][Branch1][=O][O][Ring2][Ring2][Branch1] targets the protein Nuclear receptor subfamily 3 group C member 1. The protein Nuclear receptor subfamily 3 group C member 1 enables the sequence-specific double-stranded DNA binding."} {"text":"The compound CS(=O)(=O)c1ccc(N2CCN(C(=O)c3cc(S(C)(=O)=O)ccc3-c3ccccc3)CC2)c(F)c1 targets the protein Sodium- and chloride-dependent glycine transporter 1. 
The protein Sodium- and chloride-dependent glycine transporter 1 is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_1/train_2-1.jsonl": "{"text":"The compound [Cl][C][=C][C][=C][C][Branch2][Ring1][Branch1][C][N][Branch1][#Branch2][C][C][=C][C][=C][C][=N][Ring1][=Branch1][C@@H1][C][C][N][C][Ring1][Branch1][=C][Ring2][Ring1][Ring2][Cl] targets the protein DA transporter. The protein DA transporter enables the protein N-terminus binding."} {"text":"The compound [C][C][Branch1][C][C][C][=Branch1][C][=O][N][Branch1][P][C][C][=C][C][Branch1][C][Cl][=C][C][Branch1][C][Cl][=C][Ring1][Branch2][Cl][C@H1][C][C][N][C][Ring1][Branch1] targets the protein Sodium-dependent dopamine transporter. The protein Sodium-dependent dopamine transporter is involved in the response to iron ion."}", "/scratch/micpie/export/compound_protein_go_term_1/valid_1-1.jsonl": "{"text":"The compound CCCccccC=O)OccccC=N)N))cc6))))))))s5))))))C=O)NCC=O)O targets the protein Enteropeptidase (EC 3.4.21.9) (Enterokinase) (Serine protease 7) (Transmembrane protease serine 15). The protein Enteropeptidase (EC 3.4.21.9) (Enterokinase) (Serine protease 7) (Transmembrane protease serine 15) enables the hydrolase activity."} {"text":"The compound C[C@@H]O)[C@H]O[C@@H]ccccCl)cCcncc-cccco5)))))s5))))))c6))))))[C@H]O)[C@@H]O)[C@@H]6O targets the protein Na(+)\/glucose cotransporter 2. The protein Na(+)\/glucose cotransporter 2 is located in the plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_1/test_3-1.jsonl": "{"text":"The compound O=C1NCC(c2cccc(Cl)c2)C1c1cccc(Cl)c1 targets the protein DA transporter. The protein DA transporter enables the heterocyclic compound binding."} {"text":"The compound Ncncc-cccNCCNCCOCC6))))))C4))))ncNC[C@H]F)[C@H]F)C5)))))n6))))))cc6OCF)F targets the protein Leucine-zipper protein kinase. 
The protein Leucine-zipper protein kinase enables the protein kinase binding."}", "/scratch/micpie/export/compound_protein_go_term_1/train_9-0.jsonl": "{"text":"The compound [C][C][C][Branch1][C][C][C][Branch1][C][N][C][=Branch1][C][=O][N][C][C][C][Branch1][C][F][C][Ring1][=Branch1] targets the protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26) which is located in the cell projection."} {"text":"The compound O=C(Nc1ccccc1-c1cn2c(CN3CC[C@@H](O)C3)csc2n1)c1ccc2ccccc2c1 targets the protein NAD-dependent protein deacetylase sirtuin-1 (hSIRT1) (EC 2.3.1.286) (NAD-dependent protein deacylase sirtuin-1) (EC 2.3.1.-) (Regulatory protein SIR2 homolog 1) (SIR2-like protein 1) (hSIR2) which enables the NAD-dependent histone decrotonylase activity."}", "/scratch/micpie/export/compound_protein_go_term_1/test_1-0.jsonl": "{"text":"The compound CCC)[C@H]C=O)NO)))NCccccnc6)))))))S=O)=O)ccccF)cc6 targets the protein 72 kDa type IV collagenase (EC 3.4.24.24) (72 kDa gelatinase) (Gelatinase A) (Matrix metalloproteinase-2) (MMP-2) (TBE-1) which is located in the cytoplasm."} {"text":"The compound [O][=S][=Branch1][C][=O][N][C][=C][Branch1][S][C][=C][Branch1][C][F][C][=C][C][=C][N][=C][Ring1][O][Ring1][=Branch1][C][=C][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][=C][C][=C][Ring1][#Branch2][Ring2][Ring1][Branch2] targets the protein Ferroportin-1 which is located in the basolateral plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_1/test_6-1.jsonl": "{"text":"The compound [C][O][C][=C][C][=N][C][=N][C][Branch2][Ring1][#Branch2][O][C][C][=N][N][=C][C][=C][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][=N][N][Ring1][#C][Ring1][N][=C][Ring2][Ring1][#Branch1][C][=C][Ring2][Ring1][O][O][C] targets the protein SF receptor. 
The protein SF receptor enables the molecular function activator activity."} {"text":"The compound [O][=C][Branch1][Branch2][C][C][C][C][C][C][S][N][C][=C][C][=C][C][=C][Ring1][=Branch1] targets the protein Polyamine deacetylase HDAC10. The protein Polyamine deacetylase HDAC10 is located in the cytoplasm."}", "/scratch/micpie/export/compound_protein_go_term_1/valid_4-1.jsonl": "{"text":"The compound CC(C)N(Cc1cccc(-c2nccs2)c1)C(=O)Nc1cnc(C(C)(C)C)nc1 targets the protein Trk-A. The protein Trk-A enables the nerve growth factor binding."} {"text":"The compound InChI=1S\/C23H27N5O3\/c29-23-19-12-24-28(16-8-10-30-11-9-16)22(19)25-21(26-23)20(15-6-7-15)27-13-18(14-27)31-17-4-2-1-3-5-17\/h1-5,12,15-16,18,20H,6-11,13-14H2,(H,25,26,29) targets the protein High affinity cGMP-specific 3^,5^-cyclic phosphodiesterase 9A. The protein High affinity cGMP-specific 3^,5^-cyclic phosphodiesterase 9A is located in the nucleoplasm."}", "/scratch/micpie/export/compound_protein_go_term_1/train_1-1.jsonl": "{"text":"The compound [O][=C][Branch1][S][N][C][C][N][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C@H1][Branch1][#Branch2][C][C][C][C][C][C][C][Ring1][=Branch1][N][C][=N][C][=C][C][Branch1][C][F][=C][C][=C][Ring1][#Branch1][O][Ring1][#Branch2] targets the protein Cathepsin S. The protein Cathepsin S enables the proteoglycan binding."} {"text":"The compound [Cl][C][=C][C][=C][C][Branch2][Ring1][Branch1][C][N][Branch1][#Branch2][C][C][=C][C][=C][C][=N][Ring1][=Branch1][C@@H1][C][C][N][C][Ring1][Branch1][=C][Ring2][Ring1][Ring2][Cl] targets the protein DA transporter. 
The protein DA transporter enables the protein phosphatase 2A binding."}", "/scratch/micpie/export/compound_protein_go_term_1/valid_7-0.jsonl": "{"text":"The compound InChI=1S\/C28H34F2O7S\/c1-24(2)36-22-11-15-16-10-18(29)17-9-14(31)5-7-25(17,3)27(16,30)20(32)12-26(15,4)28(22,37-24)21(33)13-38-19-6-8-35-23(19)34\/h5,7,9,15-16,18-20,22,32H,6,8,10-13H2,1-4H3\/t15?,16?,18-,19?,20-,22+,25-,26-,27-,28+\/m0\/s1 targets the protein Nuclear receptor subfamily 3 group C member 1 which enables the sequence-specific double-stranded DNA binding."} {"text":"The compound CS(=O)(=O)c1ccc(N2CCN(C(=O)c3cc(S(C)(=O)=O)ccc3-c3ccccc3)CC2)c(F)c1 targets the protein Solute carrier family 6 member 9 which is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_1/valid_8-1.jsonl": "{"text":"The compound CS(=O)(=O)c1ccc(C2(O)CCCCC2N2CCC3(CC2)C(=O)NCC3c2ccc(F)cc2)cc1 targets the protein Sodium- and chloride-dependent glycine transporter 1. The protein Sodium- and chloride-dependent glycine transporter 1 enables the transmembrane transporter activity."} {"text":"The compound CNC)\/C=N\/ccC#N))cnn5-cncncsc-cccccc6))))))cc95 targets the protein hRCE1. The protein hRCE1 is located in the integral component of endoplasmic reticulum membrane."}", "/scratch/micpie/export/compound_protein_go_term_1/train_0-1.jsonl": "{"text":"The compound CccccSCCC=O)ccccF)cc6))))))))C=O)O))))cc6 targets the protein A-T mutated. The protein A-T mutated enables the protein serine kinase activity."} {"text":"The compound COC=O)[C@@H]C[C@H]N=[N+]=[N-])))CN5C=O)\/C=C\/ccccOC))cOC))c6 targets the protein 72 kDa type IV collagenase (EC 3.4.24.24) (72 kDa gelatinase) (Gelatinase A) (Matrix metalloproteinase-2) (MMP-2) (TBE-1). 
The protein 72 kDa type IV collagenase (EC 3.4.24.24) (72 kDa gelatinase) (Gelatinase A) (Matrix metalloproteinase-2) (MMP-2) (TBE-1) is located in the collagen-containing extracellular matrix."}", "/scratch/micpie/export/compound_protein_go_term_1/valid_8-0.jsonl": "{"text":"The compound [C][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch2][Ring2][#Branch2][C][Branch1][C][O][C][C][C][C][C][Ring1][#Branch1][N][C][C][C][Branch1][Branch1][C][C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][Ring1][Branch2][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][=C][Ring2][Ring1][#C] targets the protein Solute carrier family 6 member 9 which enables the transmembrane transporter activity."} {"text":"The compound CN(C)\/C=N\/c1c(C#N)cnn1-c1ncnc2sc(-c3ccccc3)cc12 targets the protein Farnesylated proteins-converting enzyme 2 which is located in the integral component of endoplasmic reticulum membrane."}", "/scratch/micpie/export/compound_protein_go_term_1/test_9-1.jsonl": "{"text":"The compound InChI=1S\/C29H29ClN6O2\/c1-19-6-5-9-23(16-19)25-26(30)35-28(33-15-14-20-7-3-2-4-8-20)29(38)36(25)18-24(37)34-17-21-10-12-22(13-11-21)27(31)32\/h2-13,16H,14-15,17-18H2,1H3,(H3,31,32)(H,33,35)(H,34,37) targets the protein Prothrombin (EC 3.4.21.5) (Coagulation factor II). The protein Prothrombin (EC 3.4.21.5) (Coagulation factor II) enables the thrombospondin receptor activity."} {"text":"The compound O=CO)CCccccCOcccccc6)CCC5))))))))))cc6 targets the protein Retinoic acid receptor RXR-beta. 
The protein Retinoic acid receptor RXR-beta is located in the nucleus."}", "/scratch/micpie/export/compound_protein_go_term_1/valid_1-0.jsonl": "{"text":"The compound CC(Cc1ccc(C(=O)Oc2ccc(C(=N)N)cc2)s1)C(=O)NCC(=O)O targets the protein Enteropeptidase (EC 3.4.21.9) (Enterokinase) (Serine protease 7) (Transmembrane protease serine 15) which enables the hydrolase activity."} {"text":"The compound C[C@@H]O)[C@H]O[C@@H]ccccCl)cCcncc-cccco5)))))s5))))))c6))))))[C@H]O)[C@@H]O)[C@@H]6O targets the protein Low affinity sodium-glucose cotransporter which is located in the plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_1/train_6-0.jsonl": "{"text":"The compound InChI=1S\/C25H27N5O3\/c31-15-21(16-4-2-1-3-5-16)29-25(33)17-6-11-20-22(14-17)30-13-12-26-24(30)23(28-20)27-18-7-9-19(32)10-8-18\/h1-6,11-14,18-19,21,31-32H,7-10,15H2,(H,27,28)(H,29,33)\/t18-,19-,21-\/m0\/s1 targets the protein JNK-46 which is involved in the cellular response to mechanical stimulus."} {"text":"The compound CC(=O)N1CCN(CCN2[C@H]3CC[C@@H]2C[C@H](NC(=O)c2nn(C(C)C)c4ccccc24)C3)CC1 targets the protein 5-HT4 which is involved in the G protein-coupled receptor signaling pathway."}", "/scratch/micpie/export/compound_protein_go_term_1/train_3-1.jsonl": "{"text":"The compound [C][C][Branch1][C][C][C][=Branch1][C][=O][N][Branch1][P][C][C][=C][C][Branch1][C][Cl][=C][C][Branch1][C][Cl][=C][Ring1][Branch2][Cl][C@H1][C][C][N][C][Ring1][Branch1] targets the protein DA transporter. The protein DA transporter is involved in the response to xenobiotic stimulus."} {"text":"The compound COc1ccc2c(c1)\/C(=C\\c1ccc3c(\/C=C\/c4ccc(CN(C)C)cc4)n[nH]c3c1)C(=O)N2 targets the protein Polo-like kinase 4. 
The protein Polo-like kinase 4 is located in the spindle pole."}", "/scratch/micpie/export/compound_protein_go_term_1/test_8-0.jsonl": "{"text":"The compound InChI=1S\/C18H19NO2S\/c1-19-10-9-17(18-6-3-11-22-18)21-16-5-2-4-13-12-14(20)7-8-15(13)16\/h2-8,11-12,17,19-20H,9-10H2,1H3\/t17-\/m0\/s1 targets the protein Sodium-dependent noradrenaline transporter which is located in the neuron projection."} {"text":"The compound Nc1nc2cc(-c3ccccc3)ccc2c(=O)n1C[C@H]1CC[C@@H](c2ccccc2)O1 targets the protein Cathepsin D (EC 3.4.23.5) which is located in the collagen-containing extracellular matrix."}", "/scratch/micpie/export/compound_protein_go_term_1/valid_3-1.jsonl": "{"text":"The compound N#CcccS=O)=O)NccccF)cn6))))))))ccc6OcccF)ccc6Cl targets the protein RST. The protein RST enables the PDZ domain binding."} {"text":"The compound O=C(Nc1cc(-c2ccccc2)no1)c1ccc2cc3n(c2n1)CCCNC3=O targets the protein 90 kDa ribosomal protein S6 kinase 3. The protein 90 kDa ribosomal protein S6 kinase 3 is located in the nucleoplasm."}", "/scratch/micpie/export/compound_protein_go_term_1/train_9-1.jsonl": "{"text":"The compound CCC(C)C(N)C(=O)N1CCC(F)C1 targets the protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26). The protein Dipeptidyl peptidase 4 (EC 3.4.14.5) (ADABP) (Adenosine deaminase complexing protein 2) (ADCP-2) (Dipeptidyl peptidase IV) (DPP IV) (T-cell activation antigen CD26) (TP103) (CD antigen CD26) is located in the cell projection."} {"text":"The compound O=CNcccccc6-ccncCNCC[C@@H]O)C5))))))csc5n8)))))))))))))))ccccccccc6c%10 targets the protein NAD-dependent protein deacetylase sirtuin-1 (hSIRT1) (EC 2.3.1.286) (NAD-dependent protein deacylase sirtuin-1) (EC 2.3.1.-) (Regulatory protein SIR2 homolog 1) (SIR2-like protein 1) (hSIR2). 
The protein NAD-dependent protein deacetylase sirtuin-1 (hSIRT1) (EC 2.3.1.286) (NAD-dependent protein deacylase sirtuin-1) (EC 2.3.1.-) (Regulatory protein SIR2 homolog 1) (SIR2-like protein 1) (hSIR2) enables the NAD-dependent histone decrotonylase activity."}", "/scratch/micpie/export/compound_protein_go_term_1/test_4-1.jsonl": "{"text":"The compound [N][C][=N][C][=C][Branch2][Ring2][O][C][=C][C][Branch1][P][N][C][C][Branch1][=Branch2][N][C][C][O][C][C][Ring1][=Branch1][C][Ring1][#Branch2][=N][C][Branch1][=C][N][C][C@H1][Branch1][C][F][C@H1][Branch1][C][F][C][Ring1][#Branch1][=N][Ring2][Ring1][#Branch1][C][=C][Ring2][Ring1][=N][O][C][Branch1][C][F][F] targets the protein DLK. The protein DLK enables the transferase activity."} {"text":"The compound [O][=C][NH1][C][Branch1][#C][C][C][C][C][Ring1][Ring2][C][=N][C][=C][C][=N][Ring1][=Branch1][=N][C][=C][Ring1][S][C][=N][N][Ring1][Branch1][C][C][C][O][C][C][Ring1][=Branch1] targets the protein High affinity cGMP-specific 3^,5^-cyclic phosphodiesterase 9A. The protein High affinity cGMP-specific 3^,5^-cyclic phosphodiesterase 9A is located in the nucleoplasm."}", "/scratch/micpie/export/compound_protein_go_term_1/valid_6-1.jsonl": "{"text":"The compound InChI=1S\/C23H21N3O6\/c1-16-5-7-17(8-6-16)9-10-20-24-15-21(26(28)29)25(20)11-12-32-23(27)18-3-2-4-19-22(18)31-14-13-30-19\/h2-10,15H,11-14H2,1H3\/b10-9- targets the protein FADK 1. The protein FADK 1 enables the molecular function activator activity."} {"text":"The compound CCC)cncC=O)NCCCNCCCOCC6))))))CCO6)))))))))cccccn96 targets the protein Serotonin receptor 4. 
The protein Serotonin receptor 4 is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_1/valid_6-0.jsonl": "{"text":"The compound [C][C][=C][C][=C][Branch2][Ring2][=Branch1][\/C][=C][\\C][=N][C][=C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][N][Ring1][Branch2][C][C][O][C][=Branch1][C][=O][C][=C][C][=C][C][=C][Ring1][=Branch1][O][C][C][O][Ring1][=Branch1][C][=C][Ring2][Ring1][#C] targets the protein Focal adhesion kinase-related nonkinase which enables the molecular function activator activity."} {"text":"The compound CC(C)c1nc(C(=O)NCC2CN(C3CCOCC3)CCO2)c2ccccn12 targets the protein Serotonin receptor 4 which is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_1/train_3-0.jsonl": "{"text":"The compound CC(C)C(=O)N(Cc1cc(Cl)cc(Cl)c1Cl)[C@H]1CCNC1 targets the protein DAT which is involved in the response to xenobiotic stimulus."} {"text":"The compound InChI=1S\/C28H26N4O2\/c1-32(2)17-19-6-4-18(5-7-19)9-12-26-22-11-8-20(15-27(22)31-30-26)14-24-23-16-21(34-3)10-13-25(23)29-28(24)33\/h4-16H,17H2,1-3H3,(H,29,33)(H,30,31)\/b12-9+,24-14+ targets the protein Serine\/threonine-protein kinase PLK4 which is located in the spindle pole."}", "/scratch/micpie/export/compound_protein_go_term_1/train_7-0.jsonl": "{"text":"The compound CC(=O)N1CCN(CCN2[C@H]3CC[C@@H]2C[C@H](NC(=O)c2nn(C(C)C)c4ccccc24)C3)CC1 targets the protein 5-HT4 which is involved in the signal transduction."} {"text":"The compound O=C(Cc1ccncc1)NCc1cnc(Oc2ccc3c(c2)CCC(c2cccc(F)c2)O3)s1 targets the protein Solute carrier family 8 member 1 which is located in the integral component of plasma membrane."}", "/scratch/micpie/export/compound_protein_go_term_1/train_4-0.jsonl": "{"text":"The compound CN1CC[C@@](O)(C#Cc2ccc3c(c2)-n2nc(C(N)=O)c(C(F)(F)F)c2[C@@H](F)CO3)C1=O targets the protein HsNIK which enables the protein serine kinase activity."} {"text":"The compound 
O=C1C=C(NCCc2ccc(O)c(O)c2)C(=O)c2ncccc21 targets the protein Cytovillin which is involved in the negative regulation of transcription by RNA polymerase II."}", "/scratch/micpie/export/sr_hse_tox21/test_0-10.jsonl": "{"text":"User: Can you give me the SMILES of a molecule that is not toxic in the SR-Heat shock response assay?\nAssistant: Of course, here you go: O=C([O-])Cc1cccc2ccccc12"} {"text":"User: Can you create the SELFIES of a molecule that is not toxic in the Heat shock response assay?\nAssistant: Yes, here you go: [O][=N+1][Branch1][C][O-1][C][=C][C][=C][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][=C][Ring1][#Branch2]"}", "/scratch/micpie/export/sr_hse_tox21/valid_0-8.jsonl": "{"text":"User: Can you figure out if the molecule with the SELFIES [C][C][=C][N][=C][Branch1][C][C][C][Branch1][C][C][=N][Ring1][Branch2] is toxic in the SR-HSE assay?\nAssistant: No, this molecule is not toxic in the SR-HSE assay."} {"text":"User: Can you estimate if the molecule with the SMILES Nc1cc[nH]c(=O)n1 is toxic in the Heat shock response assay?\nAssistant: No, this molecule is not toxic in the Heat shock response assay."}", "/scratch/micpie/export/sr_hse_tox21/test_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the Heat shock response assay?\nConstraint: You must select none, one or more options from a or b without using any other words.\nOptions:\n[a] InChI=1S\/C10H8O3S\/c11-14(12,13)10-7-3-5-8-4-1-2-6-9(8)10\/h1-7H,(H,11,12,13)\/p-1\n[b] InChI=1S\/C12H10O2\/c13-12(14)8-10-6-3-5-9-4-1-2-7-11(9)10\/h1-7H,8H2,(H,13,14)\/p-1\nAnswer: a, b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the Heat shock response assay?\nConstraint: You must select none, one or more options from a, b, c, or d without using any additional words.\nOptions:\n[a] [C][C][C][C][Branch1][C][C][C][O]\n[b] 
[O][=N+1][Branch1][C][O-1][C][=C][C][=C][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][=C][Ring1][#Branch2]\n[c] [C][C][C][=C][C][=C][Branch2][Ring2][Ring2][C][=Branch1][C][=O][N][N][Branch2][Ring1][Ring1][C][=Branch1][C][=O][C][=C][C][Branch1][C][C][=C][C][Branch1][C][C][=C][Ring1][Branch2][C][Branch1][C][C][Branch1][C][C][C][C][=C][Ring2][Ring1][Branch2]\n[d] [C][C][=Branch1][C][=O][O][C][C][=Branch1][C][=O][C@@][Branch1][C][O][C][C][C@H1][C@@H1][C][C][C][=C][C][=Branch1][C][=O][C][C][C@][Ring1][#Branch1][Branch1][C][C][C@H1][Ring1][N][C][=Branch1][C][=O][C][C@@][Ring1][P][Ring2][Ring1][Branch1][C]\nAnswer: a, b, c, d"}", "/scratch/micpie/export/sr_hse_tox21/train_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the DeepSMILES CCOccccncSN)=O)=O))sc5c9 is toxic in the SR-HSE assay?\nAssistant: No, this molecule is not toxic in the SR-HSE assay."} {"text":"User: Can you tell me if the molecule with the canonical SMILES CN1C(C(=O)Nc2ccccn2)=C(O)c2ccccc2S1(=O)=O is toxic in the SR-Heat shock response assay?\nAssistant: No, this molecule is not toxic in the SR-Heat shock response assay."}", "/scratch/micpie/export/sr_hse_tox21/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Heat shock response assay.\ncanonical SMILES: O=C([O-])Cc1cccc2ccccc12\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Heat shock response assay.\nMolecule SELFIES: [O][=N+1][Branch1][C][O-1][C][=C][C][=C][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][=C][Ring1][#Branch2]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"}", "/scratch/micpie/export/sr_hse_tox21/valid_0-9.jsonl": "{"text":"User: Is the 
molecule with the SMILES Cc1cnc(C)c(C)n1 toxic in the Heat shock response assay?\nAssistant: No, it is not toxic in the Heat shock response assay."} {"text":"User: Is the molecule with the SMILES Nc1cc[nH]c(=O)n1 toxic in the SR-Heat shock response assay?\nAssistant: No, it is not toxic in the SR-Heat shock response assay."}", "/scratch/micpie/export/sr_hse_tox21/test_0-1.jsonl": "{"text":"The molecule with the SELFIES [O][=C][Branch1][C][O-1][C][C][=C][C][=C][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1] is not showing SR-HSE toxicity."} {"text":"The molecule with the canonical SMILES O=[N+]([O-])c1cccc(C(F)(F)F)c1 is not showing SR-HSE toxicity."}", "/scratch/micpie/export/sr_hse_tox21/valid_0-0.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C7H10N2\/c1-5-4-8-6(2)7(3)9-5\/h4H,1-3H3 is not toxic in the Heat shock response assay."} {"text":"The molecule with the canonical SMILES representation of Nc1cc[nH]c(=O)n1 is not toxic in the SR-Heat shock response assay."}", "/scratch/micpie/export/sr_hse_tox21/test_0-2.jsonl": "{"text":"Based on the canonical SMILES representation O=C([O-])Cc1cccc2ccccc12, the molecule has no SR-HSE toxicity properties."} {"text":"Based on the InChI representation InChI=1S\/C7H4F3NO2\/c8-7(9,10)5-2-1-3-6(4-5)11(12)13\/h1-4H, the molecule has no Heat shock response toxicity properties."}", "/scratch/micpie/export/sr_hse_tox21/valid_0-10.jsonl": "{"text":"User: Can you give me the SELFIES of a molecule that is not toxic in the SR-Heat shock response assay?\nAssistant: Yes, I'm happy to help, here you go: [C][C][=C][N][=C][Branch1][C][C][C][Branch1][C][C][=N][Ring1][Branch2]"} {"text":"User: Can you create the SELFIES of a molecule that is not toxic in the Heat shock response assay?\nAssistant: Sure, here you go: [N][C][C][=C][NH1][C][=Branch1][C][=O][N][=Ring1][#Branch1]"}", "/scratch/micpie/export/sr_hse_tox21/train_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule 
that is toxic in the SR-HSE assay.\nMolecule DeepSMILES: CCOccccncSN)=O)=O))sc5c9\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the SR-HSE assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Heat shock response assay.\nInChI: InChI=1S\/C15H13N3O4S\/c1-18-13(15(20)17-12-8-4-5-9-16-12)14(19)10-6-2-3-7-11(10)23(18,21)22\/h2-9,19H,1H3,(H,16,17,20)\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the Heat shock response assay."}", "/scratch/micpie/export/sr_hse_tox21/valid_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-Heat shock response assay.\nSMILES: Cc1cnc(C)c(C)n1\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the SR-Heat shock response assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-HSE assay.\nMolecule DeepSMILES: Nccc[nH]c=O)n6\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the SR-HSE assay."}", "/scratch/micpie/export/sr_hse_tox21/test_0-9.jsonl": "{"text":"User: Is the molecule with the SELFIES [O][=C][Branch1][C][O-1][C][C][=C][C][=C][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1] toxic in the Heat shock response assay?\nAssistant: No, it is not toxic in the Heat shock response assay."} {"text":"User: Is the molecule with the SMILES O=[N+]([O-])c1cccc(C(F)(F)F)c1 toxic in the SR-Heat shock response assay?\nAssistant: No, it is not toxic in the SR-Heat shock response assay."}", "/scratch/micpie/export/sr_hse_tox21/test_0-0.jsonl": "{"text":"The molecule with the SELFIES [O][=C][Branch1][C][O-1][C][C][=C][C][=C][C][=C][C][=C][C][=C][Ring1][#Branch2][Ring1][=Branch1] is not toxic in the SR-Heat shock response assay."} {"text":"The 
molecule with the canonical SMILES O=[N+]([O-])c1cccc(C(F)(F)F)c1 is not toxic in the Heat shock response assay."}", "/scratch/micpie/export/sr_hse_tox21/valid_0-7.jsonl": "{"text":"Task: Please generate a SELFIES based on the description.\nDescription: A molecule that is toxic in the Heat shock response assay.\nResult: [C][C][=C][N][=C][Branch1][C][C][C][Branch1][C][C][=N][Ring1][Branch2]"} {"text":"Task: Please generate a molecule canonical SMILES based on the description below.\nDescription: A molecule that is toxic in the SR-Heat shock response assay.\nResult: Nc1cc[nH]c(=O)n1"}", "/scratch/micpie/export/sr_hse_tox21/test_0-3.jsonl": "{"text":"The canonical SMILES O=C([O-])Cc1cccc2ccccc12 is from a molecule that is not identified as toxic in the SR-HSE assay."} {"text":"The InChI InChI=1S\/C7H4F3NO2\/c8-7(9,10)5-2-1-3-6(4-5)11(12)13\/h1-4H represents a molecule that is not identified as toxic in the SR-Heat shock response assay."}", "/scratch/micpie/export/sr_hse_tox21/valid_0-11.jsonl": "{"text":"User: I'm looking for the DeepSMILES of a molecule that is not toxic in the SR-Heat shock response assay?\nAssistant: This is a molecule that is not toxic in the SR-Heat shock response assay: CccncC)cC)n6"} {"text":"User: I'm searching for the SMILES of a molecule that is not toxic in the SR-HSE assay?\nAssistant: This is a molecule that is not toxic in the SR-HSE assay: Nc1cc[nH]c(=O)n1"}", "/scratch/micpie/export/sr_hse_tox21/train_0-0.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13) is not toxic in the SR-Heat shock response assay."} {"text":"The molecule with the SMILES representation of CN1C(C(=O)Nc2ccccn2)=C(O)c2ccccc2S1(=O)=O is not toxic in the Heat shock response assay."}", "/scratch/micpie/export/sr_hse_tox21/test_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-Heat shock response 
assay.\nMolecule DeepSMILES: O=C[O-])Ccccccccccc%106\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the SR-Heat shock response assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-Heat shock response assay.\nMolecule SMILES: O=[N+]([O-])c1cccc(C(F)(F)F)c1\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the SR-Heat shock response assay."}", "/scratch/micpie/export/sr_hse_tox21/train_0-10.jsonl": "{"text":"User: Can you give me the canonical SMILES of a molecule that is not toxic in the SR-Heat shock response assay?\nAssistant: Yes, here you go: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} {"text":"User: Can you generate the InChI of a molecule that is not toxic in the SR-HSE assay?\nAssistant: Yes, I'm happy to help, here you go: InChI=1S\/C15H13N3O4S\/c1-18-13(15(20)17-12-8-4-5-9-16-12)14(19)10-6-2-3-7-11(10)23(18,21)22\/h2-9,19H,1H3,(H,16,17,20)"}", "/scratch/micpie/export/sr_hse_tox21/train_0-3.jsonl": "{"text":"The SELFIES [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N] is from a molecule that is not identified as toxic in the SR-HSE assay."} {"text":"The canonical SMILES CN1C(C(=O)Nc2ccccn2)=C(O)c2ccccc2S1(=O)=O is from a molecule that is not identified as toxic in the SR-Heat shock response assay."}", "/scratch/micpie/export/sr_hse_tox21/train_0-12.jsonl": "{"text":"User: I want to create a InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should not be toxic in the Heat shock response assay.\nAssistant: Got it, here you go, this InChI is not toxic in the Heat shock response assay: InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13)"} {"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very curious. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be toxic in the Heat shock response assay.\nAssistant: Got it, here you go, this DeepSMILES is not toxic in the Heat shock response assay: CNCC=O)Ncccccn6))))))))=CO)cccccc6S%10=O)=O"}", "/scratch/micpie/export/sr_hse_tox21/test_0-13.jsonl": "{"text":"User: I want to come up with a SMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the Heat shock response assay.\nAssistant: Ok, this SMILES is not toxic in the Heat shock response assay: O=C([O-])Cc1cccc2ccccc12"} {"text":"User: I want to create a SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the Heat shock response assay.\nAssistant: Ok, this SMILES is not toxic in the Heat shock response assay: O=[N+]([O-])c1cccc(C(F)(F)F)c1"}", "/scratch/micpie/export/sr_hse_tox21/valid_0-2.jsonl": "{"text":"Based on the InChI representation InChI=1S\/C7H10N2\/c1-5-4-8-6(2)7(3)9-5\/h4H,1-3H3, the molecule has no SR-HSE toxicity characteristics."} {"text":"Based on the SMILES Nc1cc[nH]c(=O)n1, the molecule has no SR-Heat shock response toxicity features."}", "/scratch/micpie/export/sr_hse_tox21/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES representation of CCOc1ccc2nc(S(N)(=O)=O)sc2c1 toxic in the SR-HSE assay?\nConstraint: Even if you are uncertain, you must pick either a or b without using any additional words.\nOptions:\na. True\nb. 
False\nAnswer: b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES [C][N][C][Branch1][=C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=N][Ring1][=Branch1][=C][Branch1][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][S][Ring2][Ring1][Ring2][=Branch1][C][=O][=O] toxic in the Heat shock response assay?\nConstraint: Even if you are uncertain, you must pick either A or B without using any additional words.\nOptions:\n(A) False\n(B) True\nAnswer: A"}", "/scratch/micpie/export/sr_hse_tox21/valid_0-1.jsonl": "{"text":"The molecule with the canonical SMILES representation of Cc1cnc(C)c(C)n1 is not showing SR-HSE toxicity."} {"text":"The molecule with the DeepSMILES Nccc[nH]c=O)n6 is not showing SR-HSE toxicity."}", "/scratch/micpie/export/sr_hse_tox21/valid_0-13.jsonl": "{"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the SR-HSE assay.\nAssistant: Ok, this canonical SMILES is not toxic in the SR-HSE assay: Cc1cnc(C)c(C)n1"} {"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the SR-Heat shock response assay.\nAssistant: Got it, this canonical SMILES is not toxic in the SR-Heat shock response assay: Nc1cc[nH]c(=O)n1"}", "/scratch/micpie/export/sr_hse_tox21/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-HSE assay.\nSELFIES: [C][C][=C][N][=C][Branch1][C][C][C][Branch1][C][C][=N][Ring1][Branch2]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-HSE assay.\nMolecule SELFIES: [N][C][C][=C][NH1][C][=Branch1][C][=O][N][=Ring1][#Branch1]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/sr_hse_tox21/train_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the SR-HSE assay?\nConstraint: You must select none, one or more options from a or b without using any other words.\nOptions:\na.) CCOc1ccc2nc(S(N)(=O)=O)sc2c1\nb.) 
O=C1CN(N=Cc2ccc(-c3ccc([N+](=O)[O-])cc3)o2)C(=O)[N-]1\nAnswer: a, b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the SR-HSE assay?\nConstraint: You must select none, one or more options from 1, 2, 3, 4, or 5 without using any additional words.\nOptions:\n1 [C][N][C][Branch1][=C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=N][Ring1][=Branch1][=C][Branch1][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][S][Ring2][Ring1][Ring2][=Branch1][C][=O][=O]\n2 [C][C][O][Si][Branch1][Ring2][O][C][C][Branch1][Ring2][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1]\n3 [C][C][C][C][C][C][O][C][=Branch1][C][=O][C][C][C]\n4 [C][C][Branch1][Ring1][C][O][Branch1][Ring1][C][O][N+1][=Branch1][C][=O][O-1]\n5 [O][C][C@@H1][Branch1][C][O][C@H1][O][C@@H1][O][C@H1][Branch1][=Branch2][C][Branch1][C][Cl][Branch1][C][Cl][Cl][O][C@@H1][Ring1][=Branch2][C@H1][Ring1][N][O]\nAnswer: 1, 2, 3, 4, 5"}", "/scratch/micpie/export/sr_hse_tox21/valid_0-4.jsonl": "{"text":"The InChI InChI=1S\/C7H10N2\/c1-5-4-8-6(2)7(3)9-5\/h4H,1-3H3 is not toxic in the Heat shock response assay."} {"text":"The canonical SMILES Nc1cc[nH]c(=O)n1 is not toxic in the Heat shock response assay."}", "/scratch/micpie/export/sr_hse_tox21/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the Heat shock response assay.\nMolecule InChI: InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13)\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the SR-HSE assay.\nMolecule canonical SMILES: CN1C(C(=O)Nc2ccccn2)=C(O)c2ccccc2S1(=O)=O\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", 
"/scratch/micpie/export/sr_hse_tox21/valid_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the SR-Heat shock response assay?\nConstraint: You must select none, one or more options from A, B, C, or D without using any additional words.\nOptions:\nA) CCC(C)(C)C(=O)O\nB) CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21\nC) Cc1cnc(C)c(C)n1\nD) CCCC(=O)Cl\nAnswer: A, B, C, D"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the SR-HSE assay?\nConstraint: You must select none, one or more options from 1, 2, 3, 4, or 5 without using any other words.\nOptions:\n1. [O][=C][Branch1][Ring1][C][Cl][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Cl]\n2. [S][C][C][C][C][C][C][Ring1][=Branch1]\n3. [C][C][C][C][Branch1][C][C][=O]\n4. [N][C][C][=C][NH1][C][=Branch1][C][=O][N][=Ring1][#Branch1]\n5. [O][=C][C][=C][C][=C][C][=C][C][Ring1][=Branch1][=C][Branch1][C][Cl][C][=C][C][=C][C][=C][Ring1][#C][Ring1][=Branch1]\nAnswer: 2, 3, 4"}", "/scratch/micpie/export/sr_hse_tox21/valid_0-12.jsonl": "{"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be toxic in the SR-Heat shock response assay.\nAssistant: Got it, here you go, this SMILES is not toxic in the SR-Heat shock response assay: Cc1cnc(C)c(C)n1"} {"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should not be toxic in the Heat shock response assay.\nAssistant: Got it, here you go, this SMILES is not toxic in the Heat shock response assay: Nc1cc[nH]c(=O)n1"}", "/scratch/micpie/export/sr_hse_tox21/train_0-2.jsonl": "{"text":"Based on the canonical SMILES representation CCOc1ccc2nc(S(N)(=O)=O)sc2c1, the molecule has no SR-HSE toxicity properties."} {"text":"Based on the InChI representation InChI=1S\/C15H13N3O4S\/c1-18-13(15(20)17-12-8-4-5-9-16-12)14(19)10-6-2-3-7-11(10)23(18,21)22\/h2-9,19H,1H3,(H,16,17,20), the molecule has no SR-Heat shock response toxicity characteristics."}", "/scratch/micpie/export/sr_hse_tox21/test_0-11.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that is not toxic in the SR-Heat shock response assay?\nAssistant: This is a molecule that is not toxic in the SR-Heat shock response assay: O=C([O-])Cc1cccc2ccccc12"} {"text":"User: I'm searching for the SELFIES of a molecule that is not toxic in the SR-HSE assay?\nAssistant: This is a molecule that is not toxic in the SR-HSE assay: [O][=N+1][Branch1][C][O-1][C][=C][C][=C][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][=C][Ring1][#Branch2]"}", "/scratch/micpie/export/sr_hse_tox21/train_0-7.jsonl": "{"text":"Task: Please create a SELFIES based on the text description.\nDescription: A molecule that is toxic in the SR-HSE assay.\nResult: [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N]"} {"text":"Task: Please create a SMILES based on the description below.\nDescription: A molecule that is toxic in the SR-Heat shock response assay.\nResult: CN1C(C(=O)Nc2ccccn2)=C(O)c2ccccc2S1(=O)=O"}", "/scratch/micpie/export/sr_hse_tox21/train_0-11.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that is not toxic in the SR-Heat shock response assay?\nAssistant: This is a molecule that is not toxic in the SR-Heat shock response assay: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} 
{"text":"User: I'm searching for the DeepSMILES of a molecule that is not toxic in the SR-HSE assay?\nAssistant: This is a molecule that is not toxic in the SR-HSE assay: CNCC=O)Ncccccn6))))))))=CO)cccccc6S%10=O)=O"}", "/scratch/micpie/export/sr_hse_tox21/train_0-1.jsonl": "{"text":"The molecule with the DeepSMILES representation of CCOccccncSN)=O)=O))sc5c9 is not showing SR-HSE toxicity."} {"text":"The molecule with the DeepSMILES representation of CNCC=O)Ncccccn6))))))))=CO)cccccc6S%10=O)=O is not showing SR-HSE toxicity."}", "/scratch/micpie/export/sr_hse_tox21/train_0-13.jsonl": "{"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the SR-HSE assay.\nAssistant: Got it, this canonical SMILES is not toxic in the SR-HSE assay: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} {"text":"User: I want to generate a molecule InChI.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the SR-Heat shock response assay.\nAssistant: Got it, this InChI is not toxic in the SR-Heat shock response assay: InChI=1S\/C15H13N3O4S\/c1-18-13(15(20)17-12-8-4-5-9-16-12)14(19)10-6-2-3-7-11(10)23(18,21)22\/h2-9,19H,1H3,(H,16,17,20)"}", "/scratch/micpie/export/sr_hse_tox21/train_0-4.jsonl": "{"text":"The molecule SELFIES [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N] is not toxic in the SR-HSE assay."} {"text":"The SELFIES [C][N][C][Branch1][=C][C][=Branch1][C][=O][N][C][=C][C][=C][C][=N][Ring1][=Branch1][=C][Branch1][C][O][C][=C][C][=C][C][=C][Ring1][=Branch1][S][Ring2][Ring1][Ring2][=Branch1][C][=O][=O] is not toxic in the SR-Heat shock response assay."}", "/scratch/micpie/export/sr_hse_tox21/test_0-7.jsonl": "{"text":"Task: Please give me a canonical SMILES based on the text description below.\nDescription: A molecule that is toxic in the 
Heat shock response assay.\nResult: O=C([O-])Cc1cccc2ccccc12"} {"text":"Task: Please create a molecule DeepSMILES based on the text description below.\nDescription: A molecule that is toxic in the SR-HSE assay.\nResult: O=[N+][O-])cccccCF)F)F))c6"}", "/scratch/micpie/export/sr_hse_tox21/train_0-9.jsonl": "{"text":"User: Is the molecule with the SMILES CCOc1ccc2nc(S(N)(=O)=O)sc2c1 toxic in the SR-HSE assay?\nAssistant: No, it is not toxic in the SR-HSE assay."} {"text":"User: Is the molecule with the DeepSMILES CNCC=O)Ncccccn6))))))))=CO)cccccc6S%10=O)=O toxic in the SR-HSE assay?\nAssistant: No, it is not toxic in the SR-HSE assay."}", "/scratch/micpie/export/sr_hse_tox21/valid_0-3.jsonl": "{"text":"The DeepSMILES CccncC)cC)n6 is from a molecule that is not identified as toxic in the Heat shock response assay."} {"text":"The DeepSMILES Nccc[nH]c=O)n6 represents a molecule that is not identified as toxic in the Heat shock response assay."}", "/scratch/micpie/export/sr_hse_tox21/test_0-8.jsonl": "{"text":"User: Can you figure out if the molecule with the canonical SMILES O=C([O-])Cc1cccc2ccccc12 is toxic in the SR-HSE assay?\nAssistant: No, this molecule is not toxic in the SR-HSE assay."} {"text":"User: Can you figure out if the molecule with the DeepSMILES O=[N+][O-])cccccCF)F)F))c6 is toxic in the SR-Heat shock response assay?\nAssistant: No, this molecule is not toxic in the SR-Heat shock response assay."}", "/scratch/micpie/export/sr_hse_tox21/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of O=C([O-])Cc1cccc2ccccc12 toxic in the SR-Heat shock response assay?\nConstraint: Even if you are uncertain, you must pick either a or b without using any other words.\nOptions:\n[a] True\n[b] False\nAnswer: b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES O=[N+]([O-])c1cccc(C(F)(F)F)c1 toxic in the SR-Heat 
shock response assay?\nConstraint: Even if you are not sure, you must pick either A or B without using any other words.\nOptions:\nA.) False\nB.) True\nAnswer: A"}", "/scratch/micpie/export/sr_hse_tox21/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [C][C][=C][N][=C][Branch1][C][C][C][Branch1][C][C][=N][Ring1][Branch2] toxic in the SR-Heat shock response assay?\nConstraint: Even if you are not sure, you must pick either A or B without using any other words.\nOptions:\n[A] False\n[B] True\nAnswer: A"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI representation of InChI=1S\/C4H5N3O\/c5-3-1-2-6-4(8)7-3\/h1-2H,(H3,5,6,7,8) toxic in the SR-Heat shock response assay?\nConstraint: Even if you are not sure, you must pick either a or b without using any additional words.\nOptions:\na: True\nb: False\nAnswer: b"}", "/scratch/micpie/export/sr_hse_tox21/test_0-4.jsonl": "{"text":"The DeepSMILES O=C[O-])Ccccccccccc%106 is not toxic in the Heat shock response assay."} {"text":"The molecule DeepSMILES O=[N+][O-])cccccCF)F)F))c6 is not toxic in the Heat shock response assay."}", "/scratch/micpie/export/sr_hse_tox21/test_0-12.jsonl": "{"text":"User: I want to generate a InChI.\nAssistant: This sounds very curious. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be toxic in the Heat shock response assay.\nAssistant: Ok, here you go, this InChI is not toxic in the Heat shock response assay: InChI=1S\/C12H10O2\/c13-12(14)8-10-6-3-5-9-4-1-2-7-11(9)10\/h1-7H,8H2,(H,13,14)\/p-1"} {"text":"User: I want to generate a SMILES.\nAssistant: This sounds very curious. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should not be toxic in the Heat shock response assay.\nAssistant: Got it, here you go, this SMILES is not toxic in the Heat shock response assay: O=[N+]([O-])c1cccc(C(F)(F)F)c1"}", "/scratch/micpie/export/test.jsonl": "{"text":"The compound with the canonical SMILES of CC(C)(C)c1cccc(C[NH2+][C@H]2CS(=O)C[C@@H](Cc3cc(F)c(N)c(OC(C(F)(F)F)C(F)(F)F)c3)[C@@H]2O)c1 shows inhibition of the human beta-secretase 1 (BACE-1)."} {"text":"User: I'm searching for the SMILES of a molecule that has a VDss of 0.190 L\/kg.\nAssistant: This is a molecule that has a VDss of 0.190 L\/kg: CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1"}", "/scratch/micpie/export/chembl33_preprocessed_filtered_bioactivity_dataset_w_fullprotnames_smiles/test_0-1.jsonl": "{"text":"Task: Please approximate the bioaffinity of a protein to a molecule.\nProtein name: Maltase-glucoamylase\nDeepSMILES: CO[C@H]O[C@H]C)[C@@H]N[C@H]C=CCO))[C@@H]O)[C@H]O)[C@H]6O))))))))[C@H]O)[C@H]6O\nConstraint: The derived IC50 value should be in uM. Even if you are not sure, you must come up with a IC50 value without using any other words.\nResult: 3.2 uM"} {"text":"Task: Please estimate the bioaffinity of a protein to a molecule.\nProtein name: Retinoid-related orphan receptor-gamma\nMolecule SMILES: CNC(=O)[C@@H]1CCCN1S(=O)(=O)c1ccc(Cl)c(COc2cccc3c(-c4ccncc4)cc(C)nc23)c1Cl\nConstraints: The calculated EC50 value should be in uM. 
Even if you are not sure, you must derive a EC50 value without using any other words.\nResult: 0.63 uM"}", "/scratch/micpie/export/chembl33_preprocessed_filtered_bioactivity_dataset_w_fullprotnames_smiles/valid_0-0.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C@H1][C][C@H1][Branch1][C][N][C@H1][Branch1][C][O][C@@H1][Branch1][C][O][C@@H1][Ring1][=Branch2][N][C@H1][C][C@][Branch1][C][O][Branch1][Ring1][C][O][C@@H1][Branch1][C][O][C@H1][Branch1][C][O][C@H1][Ring1][O][O] displays a affinity for the protein Alpha-1,4-glucosidase with a IC50 of 0.028 uM."} {"text":"The InChI InChI=1S\/C29H27Cl2N3O4S\/c1-18-16-21(19-8-4-3-5-9-19)20-10-6-12-25(28(20)33-18)38-17-22-23(30)13-14-26(27(22)31)39(36,37)34-15-7-11-24(34)29(35)32-2\/h3-6,8-10,12-14,16,24H,7,11,15,17H2,1-2H3,(H,32,35)\/t24-\/m0\/s1 exhibits a affinity for the protein Nuclear receptor ROR-gamma with a EC50 of 0.605 uM."}", "/scratch/micpie/export/chembl33_preprocessed_filtered_bioactivity_dataset_w_fullprotnames_smiles/test_0-2.jsonl": "{"text":"Task: Please create a molecule InChI that has a affinity to Alpha-1,4-glucosidase with a IC50 of 3.2 uM.\nResult: InChI=1S\/C14H25NO8\/c1-5-8(11(19)13(21)14(22-2)23-5)15-7-3-6(4-16)9(17)12(20)10(7)18\/h3,5,7-21H,4H2,1-2H3\/t5-,7+,8-,9-,10+,11+,12+,13-,14+\/m1\/s1"} {"text":"Task: Please create a molecule SMILES that has a affinity to Retinoid-related orphan receptor-gamma with a EC50 value of 0.63 uM.\nResult: CNC(=O)[C@@H]1CCCN1S(=O)(=O)c1ccc(Cl)c(COc2cccc3c(-c4ccncc4)cc(C)nc23)c1Cl"}", "/scratch/micpie/export/chembl33_preprocessed_filtered_bioactivity_dataset_w_fullprotnames_smiles/test_0-0.jsonl": "{"text":"The molecule with the DeepSMILES CO[C@H]O[C@H]C)[C@@H]N[C@H]C=CCO))[C@@H]O)[C@H]O)[C@H]6O))))))))[C@H]O)[C@H]6O exhibits a affinity for the protein Alpha-1,4-glucosidase with a IC50 of 3.2 uM."} {"text":"The molecule with the SELFIES 
[C][N][C][=Branch1][C][=O][C@@H1][C][C][C][N][Ring1][Branch1][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][Branch2][Ring1][=C][C][O][C][=C][C][=C][C][=C][Branch1][=Branch2][C][=C][C][=N][C][=C][Ring1][=Branch1][C][=C][Branch1][C][C][N][=C][Ring1][P][Ring1][=N][=C][Ring2][Ring1][#Branch2][Cl] shows a bioaffinity for RAR-related orphan receptor C with a EC50 of 0.63 uM."}", "/scratch/micpie/export/chembl33_preprocessed_filtered_bioactivity_dataset_w_fullprotnames_smiles/test_0-3.jsonl": "{"text":"User: Can you give me an example of a protein that has a affinity to the SMILES CO[C@H]1O[C@H](C)[C@@H](N[C@H]2C=C(CO)[C@@H](O)[C@H](O)[C@H]2O)[C@H](O)[C@H]1O?\nAssistant: For example, Alpha-1,4-glucosidase has a affinity to the SMILES CO[C@H]1O[C@H](C)[C@@H](N[C@H]2C=C(CO)[C@@H](O)[C@H](O)[C@H]2O)[C@H](O)[C@H]1O.\nUser: Can you estimate the IC50 of this molecule?\nAssistant: Yes, I'm happy to help, the IC50 is 3.2 uM."} {"text":"User: Can you come up with one example of a protein that has a affinity to the canonical SMILES CNC(=O)[C@@H]1CCCN1S(=O)(=O)c1ccc(Cl)c(COc2cccc3c(-c4ccncc4)cc(C)nc23)c1Cl?\nAssistant: For example, RAR-related orphan receptor C has a affinity to the canonical SMILES CNC(=O)[C@@H]1CCCN1S(=O)(=O)c1ccc(Cl)c(COc2cccc3c(-c4ccncc4)cc(C)nc23)c1Cl.\nUser: Can you derive the EC50 of this molecule?\nAssistant: Of course, the EC50 value is 0.63 uM."}", "/scratch/micpie/export/chembl33_preprocessed_filtered_bioactivity_dataset_w_fullprotnames_smiles/train_0-0.jsonl": "{"text":"The SMILES representation of OCC1NC(CO)[C@@H](O)C(O)C1O displays a bioaffinity for Maltase-glucoamylase with a Ki of 3.8 uM."} {"text":"The molecule with the SMILES CN1C(=O)CC[C@H]1C(=O)N1CC[C@@]2(S(=O)(=O)c3ccc(F)cc3)c3ccc(C(F)(C(F)(F)F)C(F)(F)F)cc3CC[C@@H]12 displays a affinity for the protein Nuclear receptor RZR-gamma with a EC50 value of 0.1545 uM."}", 
"/scratch/micpie/export/chembl33_preprocessed_filtered_bioactivity_dataset_w_fullprotnames_smiles/train_0-3.jsonl": "{"text":"User: Can you come up with one example of a protein that has a affinity to the SELFIES [O][C][C][N][C][Branch1][Ring1][C][O][C@@H1][Branch1][C][O][C][Branch1][C][O][C][Ring1][#Branch2][O]?\nAssistant: For example, the protein Maltase-glucoamylase has a affinity to the SELFIES [O][C][C][N][C][Branch1][Ring1][C][O][C@@H1][Branch1][C][O][C][Branch1][C][O][C][Ring1][#Branch2][O].\nUser: Can you derive the Ki of this molecule for me?\nAssistant: Sure, the Ki value is 3.8 uM."} {"text":"User: Can you come up with an example of a protein that has a affinity to the SELFIES [C][N][C][=Branch1][C][=O][C][C][C@H1][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][C@@][Branch2][Ring1][Ring1][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][=C][C][=C][Branch2][Ring1][#Branch1][C][Branch1][C][F][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][C][=C][Ring1][S][C][C][C@@H1][Ring2][Ring1][P][Ring2][Ring1][=C]?\nAssistant: For example, Nuclear receptor subfamily 1 group F member 3 has a affinity to the SELFIES [C][N][C][=Branch1][C][=O][C][C][C@H1][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][C@@][Branch2][Ring1][Ring1][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][=C][C][=C][Branch2][Ring1][#Branch1][C][Branch1][C][F][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][C][=C][Ring1][S][C][C][C@@H1][Ring2][Ring1][P][Ring2][Ring1][=C].\nUser: Can you approximate the EC50 for me?\nAssistant: Yes, the EC50 value is 0.1545 uM."}", "/scratch/micpie/export/chembl33_preprocessed_filtered_bioactivity_dataset_w_fullprotnames_smiles/valid_0-2.jsonl": "{"text":"Task: Please create a molecule SELFIES that has a affinity to the protein Maltase-glucoamylase with a IC50 value of 0.028 uM.\nResult: 
[C][C@H1][C][C@H1][Branch1][C][N][C@H1][Branch1][C][O][C@@H1][Branch1][C][O][C@@H1][Ring1][=Branch2][N][C@H1][C][C@][Branch1][C][O][Branch1][Ring1][C][O][C@@H1][Branch1][C][O][C@H1][Branch1][C][O][C@H1][Ring1][O][O]"} {"text":"Task: Please generate a canonical SMILES that has a affinity to the protein Nuclear receptor RZR-gamma with a EC50 value of 0.605 uM.\nResult: CNC(=O)[C@@H]1CCCN1S(=O)(=O)c1ccc(Cl)c(COc2cccc3c(-c4ccccc4)cc(C)nc23)c1Cl"}", "/scratch/micpie/export/chembl33_preprocessed_filtered_bioactivity_dataset_w_fullprotnames_smiles/valid_0-1.jsonl": "{"text":"Task: Please estimate the bioaffinity of a protein to a molecule.\nProtein name: Maltase-glucoamylase\nInChI: InChI=1S\/C14H28N2O7\/c1-5-2-6(15)9(18)11(20)8(5)16-7-3-14(23,4-17)13(22)12(21)10(7)19\/h5-13,16-23H,2-4,15H2,1H3\/t5-,6-,7-,8+,9-,10-,11-,12+,13-,14-\/m0\/s1\nConstraint: The resulting IC50 value should be in uM. Even if you are not sure, you must estimate a IC50 value without using any other words.\nResult: 0.028 uM"} {"text":"Task: Please estimate the affinity of a molecule to a protein.\nProtein name: Nuclear receptor RZR-gamma\ncanonical SMILES: CNC(=O)[C@@H]1CCCN1S(=O)(=O)c1ccc(Cl)c(COc2cccc3c(-c4ccccc4)cc(C)nc23)c1Cl\nConstraints: The calculated EC50 value should be in uM. 
Even if you are not sure, you must estimate a EC50 value without using any other words.\nResult: 0.605 uM"}", "/scratch/micpie/export/chembl33_preprocessed_filtered_bioactivity_dataset_w_fullprotnames_smiles/valid_0-4.jsonl": "{"text":"User: Can you come up with an example of a protein that has a bioaffinity to the InChI InChI=1S\/C14H28N2O7\/c1-5-2-6(15)9(18)11(20)8(5)16-7-3-14(23,4-17)13(22)12(21)10(7)19\/h5-13,16-23H,2-4,15H2,1H3\/t5-,6-,7-,8+,9-,10-,11-,12+,13-,14-\/m0\/s1?\nAssistant: The protein Maltase-glucoamylase has for example a bioaffinity to the InChI InChI=1S\/C14H28N2O7\/c1-5-2-6(15)9(18)11(20)8(5)16-7-3-14(23,4-17)13(22)12(21)10(7)19\/h5-13,16-23H,2-4,15H2,1H3\/t5-,6-,7-,8+,9-,10-,11-,12+,13-,14-\/m0\/s1.\nUser: Can you derive the IC50 for me?\nAssistant: Of course, the IC50 value is 0.028 uM.\nUser: Can you give additional information on the assay used for this estimation?\nAssistant: Yes, here you go:\nInhibitory activity against porcine maltase"} {"text":"User: Can you give me an example of a protein that has a bioaffinity to the SELFIES [C][N][C][=Branch1][C][=O][C@@H1][C][C][C][N][Ring1][Branch1][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][Branch2][Ring1][=C][C][O][C][=C][C][=C][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][C][C][N][=C][Ring1][P][Ring1][=N][=C][Ring2][Ring1][#Branch2][Cl]?\nAssistant: Nuclear receptor RZR-gamma has for example a bioaffinity to the SELFIES [C][N][C][=Branch1][C][=O][C@@H1][C][C][C][N][Ring1][Branch1][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][C][Cl][C][Branch2][Ring1][=C][C][O][C][=C][C][=C][C][=C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][C][C][N][=C][Ring1][P][Ring1][=N][=C][Ring2][Ring1][#Branch2][Cl].\nUser: Can you approximate the EC50 of this molecule?\nAssistant: Sure, the EC50 value is 0.605 uM.\nUser: Can you give more details about the assay used?\nAssistant: Of course, here you go:\nInverse 
agonist activity at RORgamma in human CD4+ T cells assessed as inhibition of IL17A secretion after 72 hrs by ELISA"}", "/scratch/micpie/export/chembl33_preprocessed_filtered_bioactivity_dataset_w_fullprotnames_smiles/train_0-2.jsonl": "{"text":"Task: Please create a molecule SMILES that has a bioaffinity to Alpha-1,4-glucosidase with a Ki of 3.8 uM.\nResult: OCC1NC(CO)[C@@H](O)C(O)C1O"} {"text":"Task: Please create a SMILES that has a bioaffinity to RAR-related orphan receptor C with a EC50 of 0.1545 uM.\nResult: CN1C(=O)CC[C@H]1C(=O)N1CC[C@@]2(S(=O)(=O)c3ccc(F)cc3)c3ccc(C(F)(C(F)(F)F)C(F)(F)F)cc3CC[C@@H]12"}", "/scratch/micpie/export/chembl33_preprocessed_filtered_bioactivity_dataset_w_fullprotnames_smiles/train_0-1.jsonl": "{"text":"Task: Please derive the bioaffinity of a protein to a molecule.\nProtein: Alpha-1,4-glucosidase\nSELFIES: [O][C][C][N][C][Branch1][Ring1][C][O][C@@H1][Branch1][C][O][C][Branch1][C][O][C][Ring1][#Branch2][O]\nConstraints: The calculated Ki should be in uM. Even if you are uncertain, you must come up with a Ki without using any additional words.\nResult: 3.8 uM"} {"text":"Task: Please derive the bioaffinity of a protein to a molecule.\nProtein name: Nuclear receptor ROR-gamma\nSELFIES: [C][N][C][=Branch1][C][=O][C][C][C@H1][Ring1][=Branch1][C][=Branch1][C][=O][N][C][C][C@@][Branch2][Ring1][Ring1][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][C][F][C][=C][Ring1][#Branch1][C][=C][C][=C][Branch2][Ring1][#Branch1][C][Branch1][C][F][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][Branch1][C][F][Branch1][C][F][F][C][=C][Ring1][S][C][C][C@@H1][Ring2][Ring1][P][Ring2][Ring1][=C]\nConstraints: The derived EC50 value should be in uM. 
Even if you are uncertain, you must estimate a EC50 value without using any additional words.\nResult: 0.1545 uM"}", "/scratch/micpie/export/chembl33_preprocessed_filtered_bioactivity_dataset_w_fullprotnames_smiles/train_0-4.jsonl": "{"text":"User: Can you come up with one example of a protein that has a affinity to the DeepSMILES OCCNCCO))[C@@H]O)CO)C6O?\nAssistant: The protein Alpha-1,4-glucosidase has for example a affinity to the DeepSMILES OCCNCCO))[C@@H]O)CO)C6O.\nUser: Can you derive the Ki for me?\nAssistant: Sure, the Ki is 3.8 uM.\nUser: Can you give me additional information on the assay used?\nAssistant: Sure, here you go:\nCompound was tested for binding affinity against alpha-glucosidase"} {"text":"User: Can you come up with one example of a protein that has a affinity to the canonical SMILES CN1C(=O)CC[C@H]1C(=O)N1CC[C@@]2(S(=O)(=O)c3ccc(F)cc3)c3ccc(C(F)(C(F)(F)F)C(F)(F)F)cc3CC[C@@H]12?\nAssistant: The protein Nuclear receptor subfamily 1 group F member 3 has for example a affinity to the canonical SMILES CN1C(=O)CC[C@H]1C(=O)N1CC[C@@]2(S(=O)(=O)c3ccc(F)cc3)c3ccc(C(F)(C(F)(F)F)C(F)(F)F)cc3CC[C@@H]12.\nUser: Can you estimate the EC50 of this molecule for me?\nAssistant: Sure, the EC50 value is 0.1545 uM.\nUser: Can you give me additional information about the assay used for this estimation?\nAssistant: Yes, here you go:\nInverse agonist activity at RORgammat in human whole blood assessed as IL-17 production by ELISA"}", "/scratch/micpie/export/chembl33_preprocessed_filtered_bioactivity_dataset_w_fullprotnames_smiles/valid_0-3.jsonl": "{"text":"User: Can you give me one example of a protein that has a bioaffinity to the DeepSMILES C[C@H]C[C@H]N)[C@H]O)[C@@H]O)[C@@H]6N[C@H]C[C@]O)CO))[C@@H]O)[C@H]O)[C@H]6O?\nAssistant: Maltase-glucoamylase has a bioaffinity to the DeepSMILES C[C@H]C[C@H]N)[C@H]O)[C@@H]O)[C@@H]6N[C@H]C[C@]O)CO))[C@@H]O)[C@H]O)[C@H]6O.\nUser: Can you derive the IC50 for me?\nAssistant: Sure, the IC50 is 0.028 uM."} {"text":"User: Can you 
come up with an example of a protein that has a affinity to the canonical SMILES CNC(=O)[C@@H]1CCCN1S(=O)(=O)c1ccc(Cl)c(COc2cccc3c(-c4ccccc4)cc(C)nc23)c1Cl?\nAssistant: For example, Nuclear receptor ROR-gamma has a affinity to the canonical SMILES CNC(=O)[C@@H]1CCCN1S(=O)(=O)c1ccc(Cl)c(COc2cccc3c(-c4ccccc4)cc(C)nc23)c1Cl.\nUser: Can you derive the EC50 of this molecule?\nAssistant: Yes, the EC50 is 0.605 uM."}", "/scratch/micpie/export/chembl33_preprocessed_filtered_bioactivity_dataset_w_fullprotnames_smiles/test_0-4.jsonl": "{"text":"User: Can you come up with an example of a protein that has a affinity to the canonical SMILES CO[C@H]1O[C@H](C)[C@@H](N[C@H]2C=C(CO)[C@@H](O)[C@H](O)[C@H]2O)[C@H](O)[C@H]1O?\nAssistant: Maltase-glucoamylase has for example a affinity to the canonical SMILES CO[C@H]1O[C@H](C)[C@@H](N[C@H]2C=C(CO)[C@@H](O)[C@H](O)[C@H]2O)[C@H](O)[C@H]1O.\nUser: Can you estimate the IC50 of this molecule for me?\nAssistant: Of course, the IC50 value is 3.2 uM.\nUser: Can you give me more information on the assay used?\nAssistant: Sure, here you go:\nInhibitory activity against porcine maltase"} {"text":"User: Can you come up with one example of a protein that has a affinity to the InChI InChI=1S\/C28H26Cl2N4O4S\/c1-17-15-20(18-10-12-32-13-11-18)19-5-3-7-24(27(19)33-17)38-16-21-22(29)8-9-25(26(21)30)39(36,37)34-14-4-6-23(34)28(35)31-2\/h3,5,7-13,15,23H,4,6,14,16H2,1-2H3,(H,31,35)\/t23-\/m0\/s1?\nAssistant: The protein Retinoid-related orphan receptor-gamma has for example a affinity to the InChI InChI=1S\/C28H26Cl2N4O4S\/c1-17-15-20(18-10-12-32-13-11-18)19-5-3-7-24(27(19)33-17)38-16-21-22(29)8-9-25(26(21)30)39(36,37)34-14-4-6-23(34)28(35)31-2\/h3,5,7-13,15,23H,4,6,14,16H2,1-2H3,(H,31,35)\/t23-\/m0\/s1.\nUser: Can you approximate the EC50 for me?\nAssistant: Yes, the EC50 is 0.63 uM.\nUser: Can you give more information about the assay used?\nAssistant: Of course, here you go:\nInverse agonist activity at pSG5GAL4-DBD\/LBD fused human RORgamma receptor 
expressed in COS7 cells co-expressing 5 copies of GAL4 response element after 18 hrs by luciferase reporter gene based assay"}", "/scratch/micpie/export/bio_ner_23/valid_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Subsequent studies have revealed that (i) cytoplasmic RelA is stably associated not only with I kappa B alpha but also with other ankyrin motif-rich proteins including the products of the NF-kappa B2 (p100) and NF-kappa B1 (p105) genes; (ii) in contrast to RelA-I kappa B alpha, RelA-p100 cytoplasmic complexes are not dissociated following tumor necrosis factor alpha activation; (iii) p100 functions as a potent inhibitor of RelA-mediated transcription in vivo; (iv) the interaction of RelA and p100 involves the conserved Rel homology domain of both proteins but not the nuclear localization signal of RelA, which is required for I kappa B alpha binding; (v) p100 inhibition of RelA function requires the C-terminal ankyrin motif domain, which mediates cytoplasmic retention of RelA; and (vi) as observed with I kappa B alpha, nuclear RelA stimulates p100 mRNA and protein expression..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: RelA,55,59,Protein\nI kappa B alpha,95,110,Protein\nNF - kappa B2,191,204,Protein\np100,207,211,Protein\nNF - kappa B1,217,230,Protein\np105,233,237,Protein\nRelA,267,271,Protein\nI kappa B alpha,274,289,Protein\nRelA,291,295,Protein\np100,298,302,Protein\ntumor necrosis factor alpha,355,382,Protein\np100,402,406,Protein\nRelA,442,446,Protein\nRelA,506,510,Protein\np100,515,519,Protein\nRelA,623,627,Protein\nI kappa B alpha,651,666,Protein\np100,681,685,Protein\nRelA,700,704,Protein\nRelA,802,806,Protein\nI kappa B alpha,835,850,Protein\nRelA,860,864,Protein\np100,876,880,Protein"} {"text":"Task: Please carry out the named entity recognition (NER) task for the 
the text below.\nText: Subsequent studies have revealed that (i) cytoplasmic RelA is stably associated not only with I kappa B alpha but also with other ankyrin motif-rich proteins including the products of the NF-kappa B2 (p100) and NF-kappa B1 (p105) genes; (ii) in contrast to RelA-I kappa B alpha, RelA-p100 cytoplasmic complexes are not dissociated following tumor necrosis factor alpha activation; (iii) p100 functions as a potent inhibitor of RelA-mediated transcription in vivo; (iv) the interaction of RelA and p100 involves the conserved Rel homology domain of both proteins but not the nuclear localization signal of RelA, which is required for I kappa B alpha binding; (v) p100 inhibition of RelA function requires the C-terminal ankyrin motif domain, which mediates cytoplasmic retention of RelA; and (vi) as observed with I kappa B alpha, nuclear RelA stimulates p100 mRNA and protein expression..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: RelA,55,59,Protein\nI kappa B alpha,95,110,Protein\nNF - kappa B2,191,204,Protein\np100,207,211,Protein\nNF - kappa B1,217,230,Protein\np105,233,237,Protein\nRelA,267,271,Protein\nI kappa B alpha,274,289,Protein\nRelA,291,295,Protein\np100,298,302,Protein\ntumor necrosis factor alpha,355,382,Protein\np100,402,406,Protein\nRelA,442,446,Protein\nRelA,506,510,Protein\np100,515,519,Protein\nRelA,623,627,Protein\nI kappa B alpha,651,666,Protein\np100,681,685,Protein\nRelA,700,704,Protein\nRelA,802,806,Protein\nI kappa B alpha,835,850,Protein\nRelA,860,864,Protein\np100,876,880,Protein"}", "/scratch/micpie/export/bio_ner_23/test_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: The main patient characteristics are provided in Table 1. 
Table 1Main characteristics of patients included in the studyTotalHLB-NHLT-NHLNK\/TCLDLBCLOthers (n = 174)(n = 18)(n = 98)(n = 28)(n = 9)(n = 21) Gender Male107106117613 Female678371138Age (range), years ≤ 60117156116718 > 60573371223Stage I – IIA6924211114 IIB – IV10516561787B-symptoms No916601429 Yes83123814712Bulky disease No153138426921 Yes21514200Lactate dehydrogenase Normal9694817814 Elevated789501117IPI Low (0 or 1) 56 – 39836 Intermediate low (2) 38 – 225110 Intermediate high (3) 27 – 13815 High (4 or 5) 35 – 24740IPS 0 – 2 – 9 –––– ≥ 3 – 9 –––– Data were presented as n HL, Hodgkin ’ s lymphoma; B-NHL, B cell non-Hodgkin ’ s lymphoma; DLBCL, diffuse large B cell lymphoma; Others group included 9 follicular lymphoma, 7 small cell lymphoma\/chronic lymphocytic leukemia, 6 marginal zone lymphoma, 5 mantle cell lymphoma and 1 Burkitt lymphoma; T-NHL, T cell non-Hodgkin ’ s lymphoma, included in 4 peripheral T cell lymphoma, 3 anaplastic large-cell lymphoma, 2 angioimmunoblastic T cell lymphoma; NK\/TCL, extranodal NK\/T cell lymphoma; IPI, international prognostic index; IPS, international prognostic score.\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: - NHLT,128,134,Disease\/Disorder\n- NHLNK,135,142,Disease\/Disorder\n\/ TCLDLBCLOthers,143,159,Disease\/Disorder\nYes21514200Lactate,421,439,Chemical\/Drug\nHL,667,669,Disease\/Disorder\nHodgkin ’ s lymphoma,671,691,Disease\/Disorder\nB - NHL,693,700,Disease\/Disorder\nB cell non - Hodgkin ’ s lymphoma,702,735,Disease\/Disorder\nDLBCL,737,742,Disease\/Disorder\ndiffuse large B cell lymphoma,744,773,Disease\/Disorder\nfollicular lymphoma,799,818,Disease\/Disorder\nsmall cell lymphoma,822,841,Disease\/Disorder\nchronic lymphocytic leukemia,844,872,Disease\/Disorder\nmarginal zone lymphoma,876,898,Disease\/Disorder\nmantle cell lymphoma,902,922,Disease\/Disorder\nBurkitt 
lymphoma,929,945,Disease\/Disorder\nT - NHL,947,954,Disease\/Disorder\nT cell non - Hodgkin ’ s lymphoma,956,989,Disease\/Disorder\nperipheral T cell lymphoma,1005,1031,Disease\/Disorder\nanaplastic large - cell lymphoma,1035,1067,Disease\/Disorder\nangioimmunoblastic T cell lymphoma,1071,1105,Disease\/Disorder\nNK \/ TCL,1107,1115,Disease\/Disorder\nextranodal NK \/ T cell lymphoma,1117,1148,Disease\/Disorder"} {"text":"Task: Please carry out the NER task for the the text below.\nText: BMI, body mass index; ALT, alanine aminotransferase, Normal value: 8 – 50.00 U\/L; AST, aspartate aminotransferase, Normal value: 8 – 40.00 U\/L; ALP, alkaline phosphatase, Normal value: 15 – 112.00 U\/L; GGT, glutamyl transpeptidase, Normal value: 5 – 54.00 U\/L; TP, total protein, Normal value: 60 – 83.00 g\/l; ALB, albumin, Normal value: 35 – 55.00 g\/l; GLB, globulin, Normal value: 20 – 30.00 g\/L; TBIL, total bilirubin, Normal value: 6.8 – 30.00 μmol\/L; DBIL, direct bilirubin, Normal value: 0 – 8.60 μmol\/L; IBIL, indirect bilirubin, Normal value: 5.1 – 21.40 umol\/L; CHE, cholinesterase, Normal value: 4300 – 12000.00 U\/L; BUN, blood urea nitrogen, Normal value: 3.2 – 7.00 mmol\/L; Cr, creatinine, Normal value: 44 – 115.00 μmol\/L; TG, triglycerides, Normal value: 0.28 – 1.80 mmol\/L; TC, total cholesterol, Normal value: 2.6 – 6.00 mmol\/L; GLU, glucose, Normal value: 3.9 – 6.10 mmol\/L; PLT, platelet, Normal value: 100 – 300.00 10 ^ 9\/L; APRI, aspartate aminotransferase to platelet ratio index; FIB-4, fibrosis index based on the four factors score.\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: ALT,22,25,Gene\/Protein\nalanine aminotransferase,27,51,Gene\/Protein\nAST,85,88,Gene\/Protein\naspartate aminotransferase,90,116,Gene\/Protein\nALP,150,153,Gene\/Protein\nalkaline phosphatase,155,175,Gene\/Protein\nGGT,211,214,Gene\/Protein\nglutamyl 
transpeptidase,216,239,Gene\/Protein\nALB,325,328,Gene\/Protein\nalbumin,330,337,Gene\/Protein\nGLB,372,375,Gene\/Protein\nglobulin,377,385,Gene\/Protein\nbilirubin,432,441,Chemical\/Drug\nbilirubin,494,503,Chemical\/Drug\nbilirubin,554,563,Chemical\/Drug\nCHE,603,606,Gene\/Protein\ncholinesterase,608,622,Gene\/Protein\nurea,673,677,Chemical\/Drug\ncreatinine,729,739,Chemical\/Drug\ntriglycerides,782,795,Chemical\/Drug\ncholesterol,845,856,Chemical\/Drug\nglucose,900,907,Chemical\/Drug\naspartate aminotransferase,1007,1033,Gene\/Protein"}", "/scratch/micpie/export/bio_ner_23/train_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: Other bvrR regulated genes related with cell envelope were: three lipoprotein genes (BAB1 _ 0358; BAB1 _ 0589; BAB1 _ 2147), which were down-regulated; six genes for periplasmic proteins and chaperones (htpX, heat shock protein, BAB1 _ 1821; clpA and clpB, stress response proteins, BAB1 _ 1573 and BAB1 _ 1868, respectively; BAB2 _ 1107; BAB1 _ 0505; BAB1 _ 1022), which were all up-regulated; one gene related with LPS biosynthesis (glycosyl transferase, BAB1 _ 1620), which was up-regulated; and five genes for fatty acids biosynthesis (fabG, ketoacyl-acyl-carrier-protein reductase, BAB1 _ 2043; fabF, oxoacyl-acyl-carrier-protein synthase, BAB1 _ 0872; fadD, fatty-acyl-CoA synthase, BAB1 _ 0320; cfa, cyclopropane-fatty-acyl-phospholipid synthase, BAB1 _ 0476; BAB1 _ 1357)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: bvrR,6,10,Protein\nBAB1 _ 0358,86,97,Protein\nBAB1 _ 0589,99,110,Protein\nBAB1 _ 2147,112,123,Protein\nhtpX,207,211,Protein\nBAB1 _ 1821,233,244,Protein\nclpA,246,250,Protein\nclpB,255,259,Protein\nBAB1 _ 1573,287,298,Protein\nBAB1 _ 1868,303,314,Protein\nBAB2 _ 1107,330,341,Protein\nBAB1 _ 0505,343,354,Protein\nBAB1 _ 1022,356,367,Protein\nBAB1 _ 
1620,464,475,Protein\nfabG,550,554,Protein\nBAB1 _ 2043,603,614,Protein\nfabF,616,620,Protein\nBAB1 _ 0872,667,678,Protein\nfadD,680,684,Protein\nBAB1 _ 0320,715,726,Protein\ncfa,728,731,Protein\nBAB1 _ 0476,786,797,Protein\nBAB1 _ 1357,799,810,Protein"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Sample ID Location Habitat Collection Date GPS Coordinates Elevation PES36 Lake 3 Cryoconite hole 31.01.2017 S 71.96345, E 23.31964 1342 m PES38 Lake 3 Cryoconite hole 31.01.2017 S 71.96345, E 23.31964 1342 m PES39 Lake 3 Cryoconite hole 31.01.2017 S 71.96345, E 23.31964 1342 m PES40 Lake 3 Cryoconite hole 31.01.2017 S 71.96345, E 23.31964 1342 m PES42 Petrelnuten Cryoconite hole 31.01.2017 S 72.01292, E 22.82887 1492 m PES43 Petrelnuten Cryoconite hole 31.01.2017 S 72.01292, E 22.82887 1492 m PES47 Lake 2 Cryoconite hole 01.02.2017 S 71.95867, E 23.31546 1316 m PES48 Lake 2 Cryoconite hole 01.02.2017 S 71.95867, E 23.31546 1316 m PES49 Lake 2 Cryoconite hole 01.02.2017 S 71.95867, E 23.31546 1316 m PES50 Lake 2 Cryoconite hole 01.02.2017 S 71.95867, E 23.31546 1316 m PES51 Lake 2 Cryoconite hole 01.02.2017 S 71.95867, E 23.31546 1316 m PES52 Dubois Cryoconite hole 01.02.2017 S 72.03793, E 23.29693 1361 m PES53 Dubois Cryoconite hole 01.02.2017 S 72.03793, E 23.29693 1361 m PES54 Dubois Cryoconite hole 01.02.2017 S 72.03793, E 23.29693 1361 m PES55 Dubois Cryoconite hole 01.02.2017 S 72.03793, E 23.29693 1361 m PES56 Dubois Cryoconite hole 01.02.2017 S 72.03793, E 23.29693 1361 m PES59 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES60 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES61 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES62 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES63 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES64 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 
1492 m PES65 Petrelnuten Cryoconite hole 08.02.2017 S 72.01292, E 22.82887 1492 m PES4 Utsteinen Soil 19.01.2017 S 71.94535, E 23.34500 1359 m PES6 Utsteinen Soil 19.01.2017 S 71.94575, E 23.34525 1367 m PES33 Dubois Soil 30.01.2017 S 72.05169, E 23.25497 1352 m PES35 Dubois Soil 30.01.2017 S 72.04891, E 23.28334 1341 m PES44 Petrelnuten Soil 31.01.2017 S 72.01266, E 22.82781 1511 m PES57 Utsteinen Snow 02.02.2017 S 71.95177, E 23.34854 1362 m PES2 Utsteinen Endolith 18.01.2017 S 71.94535, E 23.34500 1359 m PES32 Dubois Endolith 30.01.2017 S 72.04891, E 23.28334 1341 m PES34 Dubois Endolith 30.01.2017 S 72.05169, E 23.25497 1352 m PES41 Lake 3 Lake 31.01.2017 S 71.96589, E 23.33311 1315 m PES46 Lake 2 Lake 31.01.2017 S 71.95818, E 23.31509 1317 m Overview of all 13C and 14C data of the cryoconite hole and soil samples and the corresponding ages of the carbon..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: Petrelnuten,371,382,place\nPetrelnuten,450,461,place\nDubois,899,905,place\nDubois,973,979,place\nDubois,1047,1053,place\nDubois,1121,1127,place\nDubois,1195,1201,place\nPetrelnuten,1269,1280,place\nPetrelnuten,1348,1359,place\nPetrelnuten,1427,1438,place\nPetrelnuten,1506,1517,place\nPetrelnuten,1585,1596,place\nPetrelnuten,1664,1675,place\nPetrelnuten,1743,1754,place\nUtsteinen,1821,1830,place\nUtsteinen,1886,1895,place\nDubois,1952,1958,place\nDubois,2015,2021,place\nPetrelnuten,2078,2089,place\nUtsteinen,2146,2155,place\nUtsteinen,2211,2220,place\nDubois,2281,2287,place\nDubois,2348,2354,place"}", "/scratch/micpie/export/bio_ner_22/valid_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: G6620 is the 3 ' end of the MOL1 gene coding for a polypeptide similar to stress-inducible proteins from Fusarium; G6630 is the NAT2 gene which encodes a methionine N-acetyltransferase; G6635 is the RPL30B gene coding 
for the ribosomal protein L30; G6658 is RSR1 encoding a ras-related protein; G6667 is CYS4, the gene for cystathionine beta-synthase; G6670 is identical to ORF2 located close to CYS4; G6673 is PEM1\/CHO2 encoding a phosphatidylethanolamine methyltransferase; G7001 is the NSR1 gene coding for a nuclear signal recognition protein..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: G6620,0,5,Gene\/Protein\nMOL1 gene,28,37,Gene\/Protein\nG6630,117,122,Gene\/Protein\nNAT2 gene,130,139,Gene\/Protein\nmethionine N - acetyltransferase,156,188,Gene\/Protein\nG6635,190,195,Gene\/Protein\nRPL30B gene,203,214,Gene\/Protein\nribosomal protein L30,230,251,Gene\/Protein\nG6658,253,258,Gene\/Protein\nRSR1,262,266,Gene\/Protein\nras - related protein,278,299,Gene\/Protein\nG6667,301,306,Gene\/Protein\nCYS4,310,314,Gene\/Protein\ncystathionine beta - synthase,329,358,Gene\/Protein\nG6670,360,365,Gene\/Protein\nCYS4,404,408,Gene\/Protein\nG6673,410,415,Gene\/Protein\nPEM1,419,423,Gene\/Protein\nCHO2,426,430,Gene\/Protein\nphosphatidylethanolamine methyltransferase,442,484,Gene\/Protein\nG7001,486,491,Gene\/Protein\nNSR1 gene,499,508,Gene\/Protein"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Subsequent studies have revealed that (i) cytoplasmic RelA is stably associated not only with I kappa B alpha but also with other ankyrin motif-rich proteins including the products of the NF-kappa B2 (p100) and NF-kappa B1 (p105) genes; (ii) in contrast to RelA-I kappa B alpha, RelA-p100 cytoplasmic complexes are not dissociated following tumor necrosis factor alpha activation; (iii) p100 functions as a potent inhibitor of RelA-mediated transcription in vivo; (iv) the interaction of RelA and p100 involves the conserved Rel homology domain of both proteins but not the nuclear localization signal of RelA, which is required for I kappa B alpha 
binding; (v) p100 inhibition of RelA function requires the C-terminal ankyrin motif domain, which mediates cytoplasmic retention of RelA; and (vi) as observed with I kappa B alpha, nuclear RelA stimulates p100 mRNA and protein expression..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: cytoplasmic RelA,43,59,Gene\/Protein\nI kappa B alpha,95,110,Gene\/Protein\nankyrin motif - rich proteins,131,160,Gene\/Protein\nRelA,267,271,Gene\/Protein\n- I kappa B alpha,272,289,Gene\/Protein\nRelA - p100 cytoplasmic complexes,291,324,Gene\/Protein\ntumor necrosis factor alpha,355,382,Gene\/Protein\np100,402,406,Gene\/Protein\nRelA,442,446,Gene\/Protein\nRelA,506,510,Gene\/Protein\np100,515,519,Gene\/Protein\nconserved Rel homology domain,533,562,Gene\/Protein\nnuclear localization signal,592,619,Gene\/Protein\nRelA,623,627,Gene\/Protein\nI kappa B alpha,651,666,Gene\/Protein\np100,681,685,Gene\/Protein\nRelA,700,704,Gene\/Protein\nC - terminal ankyrin motif domain,727,760,Gene\/Protein\nRelA,802,806,Gene\/Protein\nI kappa B alpha,835,850,Gene\/Protein\nRelA,860,864,Gene\/Protein\np100 mRNA,876,885,Gene\/Protein"}", "/scratch/micpie/export/bio_ner_22/test_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: Configurational-bias Monte Carlo simulations in the Gibbs ensemble were carried out to compute vapor-liquid coexistence curves for fluorobenzene; chlorobenzene; bromobenzene; di-, tri-, and hexachlorobenzene isomers; 2-chlorofuran; 2-chlorothiophene; benzonitrile; phenol; dihydroxybenzene isomers; 1, 4-benzoquinone; naphthalene; naphthalene-2-carbonitrile; naphthalen-2-ol; quinoline; benzo [ b] thiophene; benzo [ c] thiophene; benzoxazole; benzisoxazole; benzimidazole; benzothiazole; indole; isoindole; indazole; purine; anthracene; and phenanthrene..\nConstrain: Please, only list the entities in the form NER entity, span 
start, span end, and type with a high probability of being in the text.\nResult: fluorobenzene,135,148,Chemical\/Drug\nchlorobenzene,150,163,Chemical\/Drug\nbromobenzene,165,177,Chemical\/Drug\n2 - chlorofuran,223,238,Chemical\/Drug\n2 - chlorothiophene,240,259,Chemical\/Drug\nbenzonitrile,261,273,Chemical\/Drug\nphenol,275,281,Chemical\/Drug\ndihydroxybenzene,283,299,Chemical\/Drug\nnaphthalene,330,341,Chemical\/Drug\nnaphthalene - 2 - carbonitrile,343,373,Chemical\/Drug\nnaphthalen - 2 - ol,375,394,Chemical\/Drug\nquinoline,396,405,Chemical\/Drug\nbenzoxazole,451,462,Chemical\/Drug\nbenzisoxazole,464,477,Chemical\/Drug\nbenzimidazole,479,492,Chemical\/Drug\nbenzothiazole,494,507,Chemical\/Drug\nindole,509,515,Chemical\/Drug\nisoindole,517,526,Chemical\/Drug\nindazole,528,536,Chemical\/Drug\npurine,538,544,Chemical\/Drug\nanthracene,546,556,Chemical\/Drug\nphenanthrene,562,574,Chemical\/Drug"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Configurational-bias Monte Carlo simulations in the Gibbs ensemble were carried out to compute vapor-liquid coexistence curves for fluorobenzene; chlorobenzene; bromobenzene; di-, tri-, and hexachlorobenzene isomers; 2-chlorofuran; 2-chlorothiophene; benzonitrile; phenol; dihydroxybenzene isomers; 1, 4-benzoquinone; naphthalene; naphthalene-2-carbonitrile; naphthalen-2-ol; quinoline; benzo [ b] thiophene; benzo [ c] thiophene; benzoxazole; benzisoxazole; benzimidazole; benzothiazole; indole; isoindole; indazole; purine; anthracene; and phenanthrene..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: fluorobenzene,135,148,Chemical\/Drug\nchlorobenzene,150,163,Chemical\/Drug\nbromobenzene,165,177,Chemical\/Drug\n2 - chlorofuran,223,238,Chemical\/Drug\n2 - 
chlorothiophene,240,259,Chemical\/Drug\nbenzonitrile,261,273,Chemical\/Drug\nphenol,275,281,Chemical\/Drug\ndihydroxybenzene,283,299,Chemical\/Drug\nnaphthalene,330,341,Chemical\/Drug\nnaphthalene - 2 - carbonitrile,343,373,Chemical\/Drug\nnaphthalen - 2 - ol,375,394,Chemical\/Drug\nquinoline,396,405,Chemical\/Drug\nbenzoxazole,451,462,Chemical\/Drug\nbenzisoxazole,464,477,Chemical\/Drug\nbenzimidazole,479,492,Chemical\/Drug\nbenzothiazole,494,507,Chemical\/Drug\nindole,509,515,Chemical\/Drug\nisoindole,517,526,Chemical\/Drug\nindazole,528,536,Chemical\/Drug\npurine,538,544,Chemical\/Drug\nanthracene,546,556,Chemical\/Drug\nphenanthrene,562,574,Chemical\/Drug"}", "/scratch/micpie/export/bio_ner_22/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Here we show that PKC and p44\/p42MAPK signalings are required for the HBx-induced Sp1-mediated IGF-II P4 transcriptional activity since (i) PKC activation by PMA or PKC expression vector increases Sp1 phosphorylation and P4 activity in HBx-transfected HepG2 cells; (ii) PKC inhibition by PKC inhibitor Go6976 reduces Sp1 phosphorylation, P4 activity, and IGF-II mRNA in HBx-transfected HepG2 cells; and (iii) the inhibition of MEK activation by U0126 reduces Sp1 phosphorylation, P4 activity and IGF-II mRNA in HBx-transfected HepG2 cells..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: PKC,18,21,Gene\/Protein\np44,26,29,Gene\/Protein\np42MAPK,32,39,Gene\/Protein\nHBx,72,75,Gene\/Protein\nSp1,86,89,Gene\/Protein\nIGF - II P4,101,112,Gene\/Protein\nPKC,149,152,Gene\/Protein\nPKC,174,177,Gene\/Protein\nSp1,206,209,Gene\/Protein\nP4,230,232,Gene\/Protein\nHBx,245,248,Gene\/Protein\nPKC,282,285,Gene\/Protein\nPKC,300,303,Gene\/Protein\nSp1,329,332,Gene\/Protein\nP4,350,352,Gene\/Protein\nIGF - II 
mRNA,367,380,Gene\/Protein\nHBx,384,387,Gene\/Protein\nMEK,444,447,Gene\/Protein\nSp1,476,479,Gene\/Protein\nP4,497,499,Gene\/Protein\nIGF - II mRNA,513,526,Gene\/Protein\nHBx,530,533,Gene\/Protein"} {"text":"Task: Please carry out the NER task for the the text below.\nText: Wall Identifier Formation Sample days after fracturing Bacteria isolated Geochemical analyses performed Microbial analyses performed Figures present Utica-3 Utica-Point Pleasant 38-460 Marinobacter sp. UTICA-S1B6 N-NH3, TN, S2, SO42 n\/a Figure 2 Utica-6 Utica-Point Pleasant 38-460 n\/a N-NH3, TN, S2, SO42 n\/a Figure 2 Utica-7 Utica-Point Pleasant 38-460 n\/a N-NH3, TN, S2, SO42 n\/a Figure 2 Utica 8 Utica-Point Pleasant 38-460 Arcobacter sp. UTICA-S4D1 N-NH3, TN, S2, SO42 n\/a Figure 2 Marcellus-1 Marcellus 4-328 n\/a n\/a 16S EMIRGE Figure 3 Marcellus-4 Marcellus 24-485 Arcobacter sp. MARC-MIP3H16 NH3, TN, S2, SO42, CI, NPOC, CO2 Metagenomics, 16S EMIRGE Figures 2, 3, 5 Marcellus-5 Marcellus 35 496 n\/a NH3, TN, S2, SO42 n\/a Figure 2 Sample Collection Produced fluid samples were collected from six hydraulically fractured natural-gas wells in the northern Appalachian Basin: four from the Utica-Point Pleasant Formation (Utica-3, Utica-6, Utica-7, and Utica-8) and two from the Marcellus Shale (Marcellus-4 and Marcellus-5)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: N - NH3,221,228,state\nTN,230,232,state\nS2,234,236,state\nN - NH3,306,313,state\nTN,315,317,state\nS2,319,321,state\nN - NH3,391,398,state\nTN,400,402,state\nS2,404,406,state\nN - NH3,496,503,state\nTN,505,507,state\nS2,509,511,state\nNH3,660,663,state\nTN,665,667,state\nS2,669,671,state\nCI,679,681,state\nNPOC,683,687,state\nCO2,689,692,state\nNH3,771,774,state\nTN,776,778,state\nS2,780,782,state\nhydraulically fractured,869,892,state"}", "/scratch/micpie/export/train_01.jsonl": "{"text":"The CAS-like IUPAC name of the 
chemical with SMILES C1=CC=C(C(=C1)NC(=O)CNCC2=CC=CO2)N3C=CC=C3 is 2-(2-furanylmethylamino)-N-[2-(1-pyrrolyl)phenyl]acetamide."} {"text":"User: Can you tell me the SELFIES of the molecule with the SMILES [H].[H].[H].[H]C1NC(C([H])(C([H])[H])C([H])([H])[H])N(C([H])([H])C2NC(C3CCC(C([H])([H])[H])C([H])C3[H])OC2[H])C1[H]?\nAssistant: Yes, this molecule has a SELFIES of [H].[H].[H].[H][C][N][C][Branch2][Ring1][Ring2][C][Branch1][C][H][Branch1][=Branch1][C][Branch1][C][H][H][C][Branch1][C][H][Branch1][C][H][H][N][Branch2][Ring2][=Branch2][C][Branch1][C][H][Branch1][C][H][C][N][C][Branch2][Ring1][#Branch1][C][C][C][C][Branch1][=Branch2][C][Branch1][C][H][Branch1][C][H][H][C][Branch1][C][H][C][Ring1][O][H][O][C][Ring1][P][H][C][Ring2][Ring2][Ring1][H]."}", "/scratch/micpie/export/nr_ar_tox21/test_0-10.jsonl": "{"text":"User: Can you generate the SMILES of a molecule that is not toxic in the NR-AR assay?\nAssistant: Yes, here you go: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"User: Can you create the InChI of a molecule that is not toxic in the NR-AR assay?\nAssistant: Yes, I'm happy to help, here you go: InChI=1S\/C14H22ClNO\/c1-11(10-16(3)4)14(2,17)9-12-5-7-13(15)8-6-12\/h5-8,11,17H,9-10H2,1-4H3"}", "/scratch/micpie/export/nr_ar_tox21/valid_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the InChI InChI=1S\/C19H23ClN2\/c1-21(2)12-5-13-22-18-7-4-3-6-15(18)8-9-16-10-11-17(20)14-19(16)22\/h3-4,6-7,10-11,14H,5,8-9,12-13H2,1-2H3 is toxic in the NR-AR assay?\nAssistant: No, this molecule is not toxic in the NR-AR assay."} {"text":"User: Can you estimate if the molecule with the SELFIES [C][C][=C][C][C@H1][C@@H1][Branch1][P][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@@][Ring1][#Branch1][Ring1][O][C][C@@H1][Ring1][#C][C][C][C@][Ring2][Ring1][Ring1][O][C@@H1][C][C@H1][Branch1][C][C][C][N][C@H1][Ring1][#Branch1][C@H1][Ring1][#Branch2][C] is toxic in the NR-AR assay?\nAssistant: No, this molecule is not toxic in the NR-AR assay."}", 
"/scratch/micpie/export/nr_ar_tox21/test_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-AR assay?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any additional words.\nOptions:\n1.) InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20)\n2.) InChI=1S\/C14H16Cl3NO2\/c1-4-14(3,12(19)7-15)18-13(20)9-5-10(16)8(2)11(17)6-9\/h5-6H,4,7H2,1-3H3,(H,18,20)\n3.) InChI=1S\/C10H12N4O3\/c1-14-4-9(16)5-2-6(12-13-10(11)17)8(15)3-7(5)14\/h2-3,9,16H,4H2,1H3,(H3,11,13,17)\/b12-6-\nAnswer: 1, 2, 3"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-AR assay?\nConstraint: You must select none, one or more options from A, B, or C without using any other words.\nOptions:\n[A] [C][C][Branch1][#Branch1][C][N][Branch1][C][C][C][C][Branch1][C][C][Branch1][C][O][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1]\n[B] [N][C][=C][C][=C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][C][=C][Ring1][=Branch2][O]\n[C] [C][C][Branch1][C][C][Branch1][Ring1][C][O][C@@H1][Branch1][C][O][C][=Branch1][C][=O][N][C][C][C][O]\nAnswer: A, B, C"}", "/scratch/micpie/export/nr_ar_tox21/train_0-8.jsonl": "{"text":"User: Can you figure out if the molecule with the SELFIES [C][C][N][C][=Branch1][C][=O][N][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][N][=O] is toxic in the NR-androgen receptor assay?\nAssistant: No, this molecule is not toxic in the NR-androgen receptor assay."} {"text":"User: Can you figure out if the molecule with the SMILES NC(=O)NNC(N)=O is toxic in the NR-androgen receptor assay?\nAssistant: No, this molecule is not toxic in the NR-androgen receptor assay."}", "/scratch/micpie/export/nr_ar_tox21/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-androgen 
receptor assay.\nMolecule SELFIES: [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-androgen receptor assay.\nMolecule SELFIES: [C][C][Branch1][#Branch1][C][N][Branch1][C][C][C][C][Branch1][C][C][Branch1][C][O][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/nr_ar_tox21/valid_0-9.jsonl": "{"text":"User: Is the molecule with the canonical SMILES CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21 toxic in the NR-androgen receptor assay?\nAssistant: No, it is not toxic in the NR-androgen receptor assay."} {"text":"User: Is the molecule with the InChI InChI=1S\/C27H41NO2\/c1-15-11-24-25(28-14-15)17(3)27(30-24)10-8-20-21-6-5-18-12-19(29)7-9-26(18,4)23(21)13-22(20)16(27)2\/h5,15,17,19-21,23-25,28-29H,6-14H2,1-4H3\/t15-,17+,19-,20-,21-,23-,24+,25-,26-,27-\/m0\/s1 toxic in the NR-AR assay?\nAssistant: No, it is not toxic in the NR-AR assay."}", "/scratch/micpie/export/nr_ar_tox21/test_0-1.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C] is not demonstrating toxicity in the NR-androgen assay."} {"text":"The molecule with the SELFIES representation of [C][C][Branch1][#Branch1][C][N][Branch1][C][C][C][C][Branch1][C][C][Branch1][C][O][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1] is not displaying toxicity in the NR-AR assay."}", "/scratch/micpie/export/nr_ar_tox21/valid_0-0.jsonl": "{"text":"The molecule with the canonical SMILES 
representation of CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21 is not toxic in the NR-androgen receptor assay."} {"text":"The molecule with the SELFIES representation of [C][C][=C][C][C@H1][C@@H1][Branch1][P][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@@][Ring1][#Branch1][Ring1][O][C][C@@H1][Ring1][#C][C][C][C@][Ring2][Ring1][Ring1][O][C@@H1][C][C@H1][Branch1][C][C][C][N][C@H1][Ring1][#Branch1][C@H1][Ring1][#Branch2][C] is not toxic in the NR-androgen receptor assay."}", "/scratch/micpie/export/nr_ar_tox21/test_0-2.jsonl": "{"text":"Based on the InChI representation InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20), the molecule has no NR-AR toxicity features."} {"text":"Based on the InChI InChI=1S\/C14H22ClNO\/c1-11(10-16(3)4)14(2,17)9-12-5-7-13(15)8-6-12\/h5-8,11,17H,9-10H2,1-4H3, the molecule has no NR-androgen receptor toxicity characteristics."}", "/scratch/micpie/export/nr_ar_tox21/valid_0-10.jsonl": "{"text":"User: Can you give me the SMILES of a molecule that is not toxic in the NR-androgen receptor assay?\nAssistant: Yes, here you go: CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21"} {"text":"User: Can you create the SMILES of a molecule that is not toxic in the NR-AR assay?\nAssistant: Yes, here you go: CC1=C2C[C@H]3[C@@H](CC=C4C[C@@H](O)CC[C@@]43C)[C@@H]2CC[C@]12O[C@@H]1C[C@H](C)CN[C@H]1[C@H]2C"}", "/scratch/micpie/export/nr_ar_tox21/train_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-androgen receptor assay.\nDeepSMILES: CCNC=O)NCcccccc6))))))C5=O\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the NR-androgen receptor assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-AR assay.\nMolecule InChI: InChI=1S\/C2H6N4O2\/c3-1(7)5-6-2(4)8\/h(H3,3,5,7)(H3,4,6,8)\nConstraint: Answer the question in a complete 
sentence.\nResult: This molecule is not toxic in the NR-AR assay."}", "/scratch/micpie/export/nr_ar_tox21/valid_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-androgen receptor assay.\ncanonical SMILES: CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the NR-androgen receptor assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-androgen receptor assay.\ncanonical SMILES: CC1=C2C[C@H]3[C@@H](CC=C4C[C@@H](O)CC[C@@]43C)[C@@H]2CC[C@]12O[C@@H]1C[C@H](C)CN[C@H]1[C@H]2C\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the NR-androgen receptor assay."}", "/scratch/micpie/export/nr_ar_tox21/test_0-9.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20) toxic in the NR-AR assay?\nAssistant: No, it is not toxic in the NR-AR assay."} {"text":"User: Is the molecule with the canonical SMILES CC(CN(C)C)C(C)(O)Cc1ccc(Cl)cc1 toxic in the NR-AR assay?\nAssistant: No, it is not toxic in the NR-AR assay."}", "/scratch/micpie/export/nr_ar_tox21/test_0-0.jsonl": "{"text":"The molecule with the canonical SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C is not toxic in the NR-AR assay."} {"text":"The molecule with the canonical SMILES representation of CC(CN(C)C)C(C)(O)Cc1ccc(Cl)cc1 is not toxic in the NR-AR assay."}", "/scratch/micpie/export/nr_ar_tox21/valid_0-7.jsonl": "{"text":"Task: Please give me a SMILES based on the description.\nDescription: A molecule that is toxic in the NR-AR assay.\nResult: CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21"} {"text":"Task: Please create a canonical SMILES based on the description.\nDescription: A molecule that is toxic in the NR-AR assay.\nResult: 
CC1=C2C[C@H]3[C@@H](CC=C4C[C@@H](O)CC[C@@]43C)[C@@H]2CC[C@]12O[C@@H]1C[C@H](C)CN[C@H]1[C@H]2C"}", "/scratch/micpie/export/nr_ar_tox21/test_0-3.jsonl": "{"text":"The InChI InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20) is from a molecule that is not identified as toxic in the NR-androgen receptor assay."} {"text":"The SELFIES [C][C][Branch1][#Branch1][C][N][Branch1][C][C][C][C][Branch1][C][C][Branch1][C][O][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1] is from a molecule that is not identified as toxic in the NR-AR assay."}", "/scratch/micpie/export/nr_ar_tox21/valid_0-11.jsonl": "{"text":"User: I'm searching for the InChI of a molecule that is not toxic in the NR-AR assay?\nAssistant: This is a molecule that is not toxic in the NR-AR assay: InChI=1S\/C19H23ClN2\/c1-21(2)12-5-13-22-18-7-4-3-6-15(18)8-9-16-10-11-17(20)14-19(16)22\/h3-4,6-7,10-11,14H,5,8-9,12-13H2,1-2H3"} {"text":"User: I'm looking for the SELFIES of a molecule that is not toxic in the NR-androgen receptor assay?\nAssistant: This is a molecule that is not toxic in the NR-androgen receptor assay: [C][C][=C][C][C@H1][C@@H1][Branch1][P][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@@][Ring1][#Branch1][Ring1][O][C][C@@H1][Ring1][#C][C][C][C@][Ring2][Ring1][Ring1][O][C@@H1][C][C@H1][Branch1][C][C][C][N][C@H1][Ring1][#Branch1][C@H1][Ring1][#Branch2][C]"}", "/scratch/micpie/export/nr_ar_tox21/train_0-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of CCNC=O)NCcccccc6))))))C5=O is not toxic in the NR-AR assay."} {"text":"The molecule with the SELFIES [N][C][=Branch1][C][=O][N][N][C][Branch1][C][N][=O] is not toxic in the NR-AR assay."}", "/scratch/micpie/export/nr_ar_tox21/test_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-androgen receptor assay.\nMolecule InChI: 
InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20)\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the NR-androgen receptor assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-androgen receptor assay.\nMolecule SMILES: CC(CN(C)C)C(C)(O)Cc1ccc(Cl)cc1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the NR-androgen receptor assay."}", "/scratch/micpie/export/nr_ar_tox21/train_0-10.jsonl": "{"text":"User: Can you create the DeepSMILES of a molecule that is not toxic in the NR-androgen receptor assay?\nAssistant: Of course, here you go: CCNC=O)NCcccccc6))))))C5=O"} {"text":"User: Can you generate the SELFIES of a molecule that is not toxic in the NR-AR assay?\nAssistant: Of course, here you go: [N][C][=Branch1][C][=O][N][N][C][Branch1][C][N][=O]"}", "/scratch/micpie/export/nr_ar_tox21/train_0-3.jsonl": "{"text":"The InChI InChI=1S\/C11H12N2O2\/c1-2-13-10(14)9(12-11(13)15)8-6-4-3-5-7-8\/h3-7,9H,2H2,1H3,(H,12,15) is from a molecule that is not identified as toxic in the NR-androgen receptor assay."} {"text":"The DeepSMILES NC=O)NNCN)=O is from a molecule that is not identified as toxic in the NR-androgen receptor assay."}", "/scratch/micpie/export/nr_ar_tox21/train_0-12.jsonl": "{"text":"User: I want to come up with a molecule InChI.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be toxic in the NR-AR assay.\nAssistant: Ok, here you go, this InChI is not toxic in the NR-AR assay: InChI=1S\/C11H12N2O2\/c1-2-13-10(14)9(12-11(13)15)8-6-4-3-5-7-8\/h3-7,9H,2H2,1H3,(H,12,15)"} {"text":"User: I want to generate a SMILES.\nAssistant: This sounds very curious. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should not be toxic in the NR-AR assay.\nAssistant: Ok, here you go, this SMILES is not toxic in the NR-AR assay: NC(=O)NNC(N)=O"}", "/scratch/micpie/export/nr_ar_tox21/test_0-13.jsonl": "{"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the NR-AR assay.\nAssistant: Ok, this SELFIES is not toxic in the NR-AR assay: [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C]"} {"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the NR-AR assay.\nAssistant: Understood, this DeepSMILES is not toxic in the NR-AR assay: CCCNC)C)))CC)O)CccccCl)cc6"}", "/scratch/micpie/export/nr_ar_tox21/valid_0-2.jsonl": "{"text":"Based on the InChI representation InChI=1S\/C19H23ClN2\/c1-21(2)12-5-13-22-18-7-4-3-6-15(18)8-9-16-10-11-17(20)14-19(16)22\/h3-4,6-7,10-11,14H,5,8-9,12-13H2,1-2H3, the molecule has no NR-AR toxicity properties."} {"text":"Based on the SELFIES [C][C][=C][C][C@H1][C@@H1][Branch1][P][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@@][Ring1][#Branch1][Ring1][O][C][C@@H1][Ring1][#C][C][C][C@][Ring2][Ring1][Ring1][O][C@@H1][C][C@H1][Branch1][C][C][C][N][C@H1][Ring1][#Branch1][C@H1][Ring1][#Branch2][C], the molecule has no NR-androgen receptor toxicity features."}", "/scratch/micpie/export/nr_ar_tox21/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES representation of CCN1C(=O)NC(c2ccccc2)C1=O toxic in the NR-AR assay?\nConstraint: Even if you are not sure, you must pick either a or b without using any additional words.\nOptions:\n[a] False\n[b] True\nAnswer: a"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES 
representation of NC(=O)NNC(N)=O toxic in the NR-AR assay?\nConstraint: Even if you are not sure, you must pick either a or b without using any other words.\nOptions:\na False\nb True\nAnswer: a"}", "/scratch/micpie/export/nr_ar_tox21/valid_0-1.jsonl": "{"text":"The molecule with the canonical SMILES representation of CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21 is not demonstrating toxicity in the NR-androgen assay."} {"text":"The molecule with the canonical SMILES CC1=C2C[C@H]3[C@@H](CC=C4C[C@@H](O)CC[C@@]43C)[C@@H]2CC[C@]12O[C@@H]1C[C@H](C)CN[C@H]1[C@H]2C is not displaying toxicity in the NR-AR assay."}", "/scratch/micpie/export/nr_ar_tox21/valid_0-13.jsonl": "{"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the NR-androgen receptor assay.\nAssistant: Got it, this SELFIES is not toxic in the NR-androgen receptor assay: [C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Ring1][S]"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the NR-androgen receptor assay.\nAssistant: Got it, this InChI is not toxic in the NR-androgen receptor assay: InChI=1S\/C27H41NO2\/c1-15-11-24-25(28-14-15)17(3)27(30-24)10-8-20-21-6-5-18-12-19(29)7-9-26(18,4)23(21)13-22(20)16(27)2\/h5,15,17,19-21,23-25,28-29H,6-14H2,1-4H3\/t15-,17+,19-,20-,21-,23-,24+,25-,26-,27-\/m0\/s1"}", "/scratch/micpie/export/nr_ar_tox21/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-AR assay.\nSELFIES: [C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Ring1][S]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-AR assay.\nSMILES: CC1=C2C[C@H]3[C@@H](CC=C4C[C@@H](O)CC[C@@]43C)[C@@H]2CC[C@]12O[C@@H]1C[C@H](C)CN[C@H]1[C@H]2C\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/nr_ar_tox21/train_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-androgen receptor assay?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any other words.\nOptions:\n1: O[C@H]1[C@H](O)[C@@H](O)[C@H](O)[C@@H](O)[C@H]1O\n2: O=C(O)c1ccc2cc(O)ccc2c1\n3: CCN1C(=O)NC(c2ccccc2)C1=O\n4: CC(C(O)c1ccccc1)N(C)CCOC(c1ccccc1)c1ccccc1\nAnswer: 1, 2, 3, 4"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-AR assay?\nConstraint: You must select none, one or more options from A or B without using any additional words.\nOptions:\n[A] NC(=O)NNC(N)=O\n[B] 
C\/N=C(\\NC)NCc1ccccc1.C\/N=C(\\NC)NCc1ccccc1\nAnswer: A, B"}", "/scratch/micpie/export/nr_ar_tox21/valid_0-4.jsonl": "{"text":"The SELFIES [C][N][Branch1][C][C][C][C][C][N][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][=C][C][=C][Branch1][C][Cl][C][=C][Ring1][#Branch1][Ring1][S] is not toxic in the NR-AR assay."} {"text":"The molecule SELFIES [C][C][=C][C][C@H1][C@@H1][Branch1][P][C][C][=C][C][C@@H1][Branch1][C][O][C][C][C@@][Ring1][#Branch1][Ring1][O][C][C@@H1][Ring1][#C][C][C][C@][Ring2][Ring1][Ring1][O][C@@H1][C][C@H1][Branch1][C][C][C][N][C@H1][Ring1][#Branch1][C@H1][Ring1][#Branch2][C] is not toxic in the NR-androgen receptor assay."}", "/scratch/micpie/export/nr_ar_tox21/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-androgen receptor assay.\nMolecule SELFIES: [C][C][N][C][=Branch1][C][=O][N][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][N][=O]\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-androgen receptor assay.\nInChI: InChI=1S\/C2H6N4O2\/c3-1(7)5-6-2(4)8\/h(H3,3,5,7)(H3,4,6,8)\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/nr_ar_tox21/valid_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-AR assay?\nConstraint: You must select none, one or more options from A, B, C, or D without using any other words.\nOptions:\n(A) CC(=O)Oc1cccc(O)c1\n(B) CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21\n(C) CCC1NC(=O)c2cc(S(N)(=O)=O)c(Cl)cc2N1\n(D) Nc1ccc([N+](=O)[O-])c(N)c1\nAnswer: A, B, C, D"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not 
toxic in the NR-AR assay?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any other words.\nOptions:\n1.) CCNcncNCC)))ncSC))n6\n2.) NC=O)OC[C@@H][C@H]NC=O)\/C=N\\OCC=O)[O-])))))ccscN)n5))))))))C=O)N4S=O)=O)[O-]\n3.) COccccCCC=O)ccO)ccO[C@@H]O[C@H]CO))[C@@H]O)[C@H]O)[C@H]6O[C@@H]O[C@@H]C)[C@H]O)[C@@H]O)[C@H]6O)))))))))))))))cc6O))))))))))cc6O\n4.) CC=CC[C@H][C@@H]CC=CC[C@@H]O)CC[C@@]6%10C)))))))))[C@@H]5CC[C@]9O[C@@H]C[C@H]C)CN[C@H]6[C@H]9C\nAnswer: 1, 2, 3, 4"}", "/scratch/micpie/export/nr_ar_tox21/valid_0-12.jsonl": "{"text":"User: I want to generate a SMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be toxic in the NR-AR assay.\nAssistant: Ok, this SMILES is not toxic in the NR-AR assay: CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21"} {"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should not be toxic in the NR-androgen receptor assay.\nAssistant: Ok, this DeepSMILES is not toxic in the NR-androgen receptor assay: CC=CC[C@H][C@@H]CC=CC[C@@H]O)CC[C@@]6%10C)))))))))[C@@H]5CC[C@]9O[C@@H]C[C@H]C)CN[C@H]6[C@H]9C"}", "/scratch/micpie/export/nr_ar_tox21/train_0-2.jsonl": "{"text":"Based on the InChI InChI=1S\/C11H12N2O2\/c1-2-13-10(14)9(12-11(13)15)8-6-4-3-5-7-8\/h3-7,9H,2H2,1H3,(H,12,15), the molecule has no NR-androgen receptor toxicity properties."} {"text":"Based on the canonical SMILES NC(=O)NNC(N)=O, the molecule has no NR-androgen receptor toxicity features."}", "/scratch/micpie/export/nr_ar_tox21/test_0-11.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that is not toxic in the NR-androgen receptor assay?\nAssistant: This is a molecule that is not toxic in the NR-androgen receptor assay: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"User: I'm looking for the canonical SMILES of a molecule that is not toxic in the NR-AR assay?\nAssistant: This is a molecule that is not toxic in the NR-AR assay: CC(CN(C)C)C(C)(O)Cc1ccc(Cl)cc1"}", "/scratch/micpie/export/nr_ar_tox21/train_0-7.jsonl": "{"text":"Task: Please generate a SELFIES based on the text description.\nDescription: A molecule that is toxic in the NR-AR assay.\nResult: [C][C][N][C][=Branch1][C][=O][N][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][N][=O]"} {"text":"Task: Please generate a canonical SMILES based on the description.\nDescription: A molecule that is toxic in the NR-AR assay.\nResult: NC(=O)NNC(N)=O"}", "/scratch/micpie/export/nr_ar_tox21/train_0-11.jsonl": "{"text":"User: I'm looking for the InChI of a molecule that is not toxic in the NR-androgen receptor assay?\nAssistant: This is a molecule that is not toxic in the NR-androgen receptor assay: InChI=1S\/C11H12N2O2\/c1-2-13-10(14)9(12-11(13)15)8-6-4-3-5-7-8\/h3-7,9H,2H2,1H3,(H,12,15)"} {"text":"User: I'm searching for the canonical SMILES of a molecule that is not toxic in the 
NR-androgen receptor assay?\nAssistant: This is a molecule that is not toxic in the NR-androgen receptor assay: NC(=O)NNC(N)=O"}", "/scratch/micpie/export/nr_ar_tox21/train_0-1.jsonl": "{"text":"The molecule with the SELFIES [C][C][N][C][=Branch1][C][=O][N][C][Branch1][=Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][Ring1][N][=O] is not displaying toxicity in the NR-AR assay."} {"text":"The molecule with the SMILES NC(=O)NNC(N)=O is not exhibiting toxicity in the NR-androgen assay."}", "/scratch/micpie/export/nr_ar_tox21/train_0-13.jsonl": "{"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be toxic in the NR-AR assay.\nAssistant: Understood, this SMILES is not toxic in the NR-AR assay: CCN1C(=O)NC(c2ccccc2)C1=O"} {"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the NR-AR assay.\nAssistant: Got it, this SELFIES is not toxic in the NR-AR assay: [N][C][=Branch1][C][=O][N][N][C][Branch1][C][N][=O]"}", "/scratch/micpie/export/nr_ar_tox21/train_0-4.jsonl": "{"text":"The SMILES CCN1C(=O)NC(c2ccccc2)C1=O is not toxic in the NR-androgen receptor assay."} {"text":"The DeepSMILES NC=O)NNCN)=O is not toxic in the NR-AR assay."}", "/scratch/micpie/export/nr_ar_tox21/test_0-7.jsonl": "{"text":"Task: Please create a canonical SMILES based on the text description below.\nDescription: A molecule that is toxic in the NR-AR assay.\nResult: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"Task: Please create a canonical SMILES based on the text description below.\nDescription: A molecule that is toxic in the NR-AR assay.\nResult: CC(CN(C)C)C(C)(O)Cc1ccc(Cl)cc1"}", "/scratch/micpie/export/nr_ar_tox21/train_0-9.jsonl": "{"text":"User: Is the molecule with the DeepSMILES CCNC=O)NCcccccc6))))))C5=O toxic in the NR-androgen receptor assay?\nAssistant: No, it 
is not toxic in the NR-androgen receptor assay."} {"text":"User: Is the molecule with the SMILES NC(=O)NNC(N)=O toxic in the NR-androgen receptor assay?\nAssistant: No, it is not toxic in the NR-androgen receptor assay."}", "/scratch/micpie/export/nr_ar_tox21/valid_0-3.jsonl": "{"text":"The DeepSMILES CNC)CCCNcccccc6CCccccCl)cc6%15 represents a molecule that is not identified as toxic in the NR-AR assay."} {"text":"The DeepSMILES CC=CC[C@H][C@@H]CC=CC[C@@H]O)CC[C@@]6%10C)))))))))[C@@H]5CC[C@]9O[C@@H]C[C@H]C)CN[C@H]6[C@H]9C is from a molecule that is not identified as toxic in the NR-androgen receptor assay."}", "/scratch/micpie/export/nr_ar_tox21/test_0-8.jsonl": "{"text":"User: Can you estimate if the molecule with the canonical SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C is toxic in the NR-androgen receptor assay?\nAssistant: No, this molecule is not toxic in the NR-androgen receptor assay."} {"text":"User: Can you estimate if the molecule with the DeepSMILES CCCNC)C)))CC)O)CccccCl)cc6 is toxic in the NR-AR assay?\nAssistant: No, this molecule is not toxic in the NR-AR assay."}", "/scratch/micpie/export/nr_ar_tox21/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C toxic in the NR-androgen receptor assay?\nConstraint: Even if you are uncertain, you must pick either A or B without using any additional words.\nOptions:\nA) True\nB) False\nAnswer: B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SMILES representation of CC(CN(C)C)C(C)(O)Cc1ccc(Cl)cc1 toxic in the NR-androgen receptor assay?\nConstraint: Even if you are uncertain, you must pick either a or b without using any additional words.\nOptions:\na: True\nb: False\nAnswer: b"}", "/scratch/micpie/export/nr_ar_tox21/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of 
CN(C)CCCN1c2ccccc2CCc2ccc(Cl)cc21 toxic in the NR-androgen receptor assay?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any additional words.\nOptions:\n1. True\n2. False\nAnswer: 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES representation of CC=CC[C@H][C@@H]CC=CC[C@@H]O)CC[C@@]6%10C)))))))))[C@@H]5CC[C@]9O[C@@H]C[C@H]C)CN[C@H]6[C@H]9C toxic in the NR-androgen receptor assay?\nConstraint: Even if you are uncertain, you must pick either a or b without using any additional words.\nOptions:\na.) True\nb.) False\nAnswer: b"}", "/scratch/micpie/export/nr_ar_tox21/test_0-4.jsonl": "{"text":"The InChI InChI=1S\/C17H28N2O\/c1-6-12-19(8-3)15(7-2)17(20)18-16-13(4)10-9-11-14(16)5\/h9-11,15H,6-8,12H2,1-5H3,(H,18,20) is not toxic in the NR-androgen receptor assay."} {"text":"The molecule canonical SMILES CC(CN(C)C)C(C)(O)Cc1ccc(Cl)cc1 is not toxic in the NR-AR assay."}", "/scratch/micpie/export/nr_ar_tox21/test_0-12.jsonl": "{"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very curious. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be toxic in the NR-AR assay.\nAssistant: Got it, this SELFIES is not toxic in the NR-AR assay: [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C]"} {"text":"User: I want to generate a molecule InChI.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should not be toxic in the NR-AR assay.\nAssistant: Got it, this InChI is not toxic in the NR-AR assay: InChI=1S\/C14H22ClNO\/c1-11(10-16(3)4)14(2,17)9-12-5-7-13(15)8-6-12\/h5-8,11,17H,9-10H2,1-4H3"}", "/scratch/micpie/export/bio_ner_48/test_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Liver, kidney and heart samples were sampled under sterile conditions (tools and cabinets) and preserved in 70% ethanol. Table 1Hemoparasites in 19 Psittaciformes from different habitats and climates of the Indo-Malayan, Australasian and Neotropical regionsLocality (locality number) HabitataClimateb n AgecTissuedHemoparasite presence and prevalenceAntiparasitic SM in diet? eReferencesCacatuidae Indo-Malayan Philippine cockatoo Cacatua haematuropygiaRasa I., Palawan, Philippines (1) M, MFAm16pfFTA − Yes [ 140] Psittacidae Australasian New Caledonian rainbow lorikeet Trichoglossus haematodus deplanchiiOuégoa, New Caledonia (2) SaAf2adFTA − No [ 141, 142] 1pfFTA − Parc des Grandes Fougères, New Caledonia (3) Rf, Sa, PPAf5adFTA − Motorpool, Nouméa, New Caledonia (4) SuAf2adFTA − Vallée des Colons, Nouméa, New Caledonia (5) SuAf2adFTA − Forbes ’ parakeet Cyanoramphus forbesiMangere I., Chatham Is., New Zealand (6) CSCfbf30adFTA − No [ 143 – 145] Red-fronted parakeet Cyanoramphus novaezelandiaeRaoul I., New Zealand (7) EPCfag34adlysGenus: Plasmodium; lineage: LIN4 (BELL02); identity 100 %; prevalence 18% No [ 143, 146] Tiritiri Matangi I., New Zealand (8) BF, Gr, NTPCfb24pflys −[ 111] Little Barrier I., New Zealand (9) CKCfb42adlysGenus: Plasmodium; lineage: LIN4 (BELL02); identity 100 %; prevalence 5 %; lineage: CN73; identity 99.7 %; prevalence 5 %[ 102] New Caledonian parakeet Cyanoramphus saissetiParc des Grandes Fougères, New Caledonia (3) Rf, Sa, PPAf1pfFTA − No [ 142] Horned parakeet Eunymphicus cornutusParc des Grandes Fougères, New Caledonia (3) Rf, Sa, PPAf1adheart − No [ 142] Parc Provincial de 
la Rivière Bleue, New Caledonia (10) Rf, MaAf1pfkidney −[ 142] Ouvéa parakeet Eunymphicus uvaeensisGossana, Ouvéa, New Caledonia (11) Rf, CPAf6adFTA − No [ 142] 1juvFTA − Neotropical Blue and yellow macaw Ara araraunaTrinidad, Bolivia (12) PFISAm2adFTA − Yes [ 147] Sachojere, Bolivia (13) PFISAm1pfFTA −[ 147] Blue-throated macaw Ara glaucogularisBeni, Bolivia (14) PFISAm1adlys − Yes [ 147] Trinidad, Bolivia (12) PFISAm5adFTA −[ 147] 2pfFTA − Blue-crowned conure Thectocercus acuticaudatusChaco, Argentina (15) DXFCfa1adlys − Yes [ 148] 4pflys − White-eyed conure Psittacara leucophtalmusSachojere, Bolivia (13) PFISAm2pfFTA − No [ 147] Brown-throated conure Eupsittula pertinaxIsla Margarita, Venezuela (16) CTSAwf9adFTA − Yes [ 149] Nanday conure Aratinga nendayPrincipe Negro, Pantanal, Brazil (17) Gr, Sa, SS, RiAw11pfFTA − Yes [ 150] Burrowing parrot Cyanoliseus patagonusEl Cóndor, Patagonia, Argentina (18) MoBSk32adeth − Yes [ 151] 14adliver − Comallo, Patagonia, Argentina (19) PSCsb5adliver −[ 152] Austral parakeet Enicognathus ferrugineusNavarino, Chile (20) BFNET2adFTAGenus: Leucocytozoon; prevalence 100% No [ 46] Bariloche, Patagonia, Argentina (21) BFNCsb3adFTAGenus: Leucocytozoon; prevalence 100 %[ 153] 1pfFTA − 3adliverGenus: Leucocytozoon; prevalence 33% 1pfliver − Blue-winged parrotlet Forpus xanthopterygiusTrinidad, Bolivia (12) PFISAm9adFTA − Yes [ 147] Yellow-chevroned parakeet Brotogeris chiririTrinidad, Bolivia (12) PFISAm4adFTA − Yes [ 147] Red-tailed Amazon Amazona brasilensisIlha Rasa, Guaraqueçaba, Brazil (22) LAF, MCfa29pfFTA − Yes [ 154] Ilha das Gamelas, Guaraqueçaba, Brazil (23) LAF, MCfa4pfFTA −[ 154] Blue-fronted Amazon Amazona aestivaJujuy, Argentina (24) DXFCwa6adlys − Yes [ 155] Chaco, Argentina (15) DXFCfa13adlys −[ 148] Pantanal, Brasil (25) Gr, Sa, SS, RiAw17pfFTA −[ 150] aHabitats: BF remnants of broadleaf forest, BFN broadleaf forests dominated by Nothofagus spp., CK coastal and kauri (Agathis australis) forests, CS coastal scrub, CP 
coconut plantations, CTS cactus and thorn scrub, DXF deciduous xerophyte forest, EP endemic pohutukawa (Metrosideros kermadecensis) sub-tropical moist forest, Gr grassland with sparse trees, LAF lowland Atlantic forest, M mangrove, Ma maquis, MF monsoon forest, Mo Monte xerophyte forest, NTP native trees planted under a re-vegetation programme, PFIS palm dominated forest islands surrounded by regularly flooded savannah, PP pine plantations, PS Patagonian steppes, Rf rainforest, Ri riparian forest, Sa savannah, SS scrub savannahs, Su sub-urbanbThe diversity of climates following [ 143, 156], except where indicated: Af, tropical rainforest; Am, tropical monsoon; Aw, tropical savannah; BSk, arid, steppe, cold; Cfa, temperate, without dry season, hot summer; Cfb, temperate, without dry season, warm summer; Csb, temperate, dry summer, warm summer; Cwa, temperate, dry winter, hot summer; ET, polar, tundracAge: age of individuals at the time of sampling; ad, adult; pf, pre-fledgling; juv, juveniledTissue: tissue used to obtain DNA; FTA, blood in FTA cards; lys, blood in lysis buffer; eth, blood in ethanol 70% eFood items known for their secondary metabolites (SM) with antimalarial, trypanocidal or general antiparasitic propertiesffollowing [ 143] and WorldClim database (http:\/\/ www. worldclim. org) [ 145], using diva-gis (http:\/\/ www. diva-gis. org \/) gfollowing [ 143] and data from the New Zealand National Climate Database (http:\/\/ cliflo. niwa. co. 
nz) Abbreviation: n, sample size.\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: Psittaciformes,149,163,Organism\/Species\nPhilippine cockatoo,417,436,Organism\/Species\nCacatua haematuropygiaRasa,437,463,Organism\/Species\nPsittacidae,523,534,Organism\/Species\nNew Caledonian rainbow lorikeet,548,579,Organism\/Species\nForbes ’ parakeet,856,873,Organism\/Species\nCyanoramphus forbesiMangere,874,901,Organism\/Species\nRed - fronted parakeet,970,992,Organism\/Species\nCyanoramphus novaezelandiaeRaoul,993,1025,Organism\/Species\nPlasmodium,1068,1078,Organism\/Species\nPlasmodium,1277,1287,Organism\/Species\nNew Caledonian parakeet,1401,1424,Organism\/Species\nHorned parakeet,1523,1538,Organism\/Species\nEunymphicus cornutusParc,1539,1563,Organism\/Species\nOuvéa parakeet,1721,1735,Organism\/Species\nEunymphicus uvaeensisGossana,1736,1764,Organism\/Species\nBlue and yellow macaw,1842,1863,Organism\/Species\nAra araraunaTrinidad,1864,1884,Organism\/Species\nBlue - throated macaw,1972,1993,Organism\/Species\nAra glaucogularisBeni,1994,2015,Organism\/Species\nBlue - crowned conure,2111,2132,Organism\/Species\nWhite - eyed conure,2217,2236,Organism\/Species\nBrown - throated conure,2311,2334,Organism\/Species\nEupsittula pertinaxIsla,2335,2358,Organism\/Species\nNanday conure,2412,2425,Organism\/Species\nBurrowing parrot,2517,2533,Organism\/Species\nCyanoliseus patagonusEl,2534,2557,Organism\/Species\nAustral parakeet,2689,2705,Organism\/Species\nLeucocytozoon,2770,2783,Organism\/Species\nLeucocytozoon,2867,2880,Organism\/Species\nLeucocytozoon,2929,2942,Organism\/Species\nBlue - winged parrotlet,2970,2993,Organism\/Species\nForpus xanthopterygiusTrinidad,2994,3024,Organism\/Species\nYellow - chevroned parakeet,3066,3093,Organism\/Species\nBrotogeris chiririTrinidad,3094,3120,Organism\/Species\nRed - tailed Amazon,3162,3181,Organism\/Species\nBlue - fronted 
Amazon,3338,3359,Organism\/Species\nAmazona aestivaJujuy,3360,3380,Organism\/Species\nNothofagus spp,3603,3617,Organism\/Species\nAgathis australis,3644,3661,Organism\/Species\ncoconut,3693,3700,Organism\/Species\ncactus,3718,3724,Organism\/Species\nxerophyte,3756,3765,Organism\/Species\npohutukawa,3785,3795,Organism\/Species\nMetrosideros kermadecensis,3798,3824,Organism\/Species\ntrees,3880,3885,Organism\/Species\nxerophyte,3967,3976,Organism\/Species\ntrees,3996,4001,Organism\/Species"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Liver, kidney and heart samples were sampled under sterile conditions (tools and cabinets) and preserved in 70% ethanol. Table 1Hemoparasites in 19 Psittaciformes from different habitats and climates of the Indo-Malayan, Australasian and Neotropical regionsLocality (locality number) HabitataClimateb n AgecTissuedHemoparasite presence and prevalenceAntiparasitic SM in diet? eReferencesCacatuidae Indo-Malayan Philippine cockatoo Cacatua haematuropygiaRasa I., Palawan, Philippines (1) M, MFAm16pfFTA − Yes [ 140] Psittacidae Australasian New Caledonian rainbow lorikeet Trichoglossus haematodus deplanchiiOuégoa, New Caledonia (2) SaAf2adFTA − No [ 141, 142] 1pfFTA − Parc des Grandes Fougères, New Caledonia (3) Rf, Sa, PPAf5adFTA − Motorpool, Nouméa, New Caledonia (4) SuAf2adFTA − Vallée des Colons, Nouméa, New Caledonia (5) SuAf2adFTA − Forbes ’ parakeet Cyanoramphus forbesiMangere I., Chatham Is., New Zealand (6) CSCfbf30adFTA − No [ 143 – 145] Red-fronted parakeet Cyanoramphus novaezelandiaeRaoul I., New Zealand (7) EPCfag34adlysGenus: Plasmodium; lineage: LIN4 (BELL02); identity 100 %; prevalence 18% No [ 143, 146] Tiritiri Matangi I., New Zealand (8) BF, Gr, NTPCfb24pflys −[ 111] Little Barrier I., New Zealand (9) CKCfb42adlysGenus: Plasmodium; lineage: LIN4 (BELL02); identity 100 %; prevalence 5 %; lineage: CN73; identity 99.7 %; prevalence 5 %[ 102] New Caledonian parakeet 
Cyanoramphus saissetiParc des Grandes Fougères, New Caledonia (3) Rf, Sa, PPAf1pfFTA − No [ 142] Horned parakeet Eunymphicus cornutusParc des Grandes Fougères, New Caledonia (3) Rf, Sa, PPAf1adheart − No [ 142] Parc Provincial de la Rivière Bleue, New Caledonia (10) Rf, MaAf1pfkidney −[ 142] Ouvéa parakeet Eunymphicus uvaeensisGossana, Ouvéa, New Caledonia (11) Rf, CPAf6adFTA − No [ 142] 1juvFTA − Neotropical Blue and yellow macaw Ara araraunaTrinidad, Bolivia (12) PFISAm2adFTA − Yes [ 147] Sachojere, Bolivia (13) PFISAm1pfFTA −[ 147] Blue-throated macaw Ara glaucogularisBeni, Bolivia (14) PFISAm1adlys − Yes [ 147] Trinidad, Bolivia (12) PFISAm5adFTA −[ 147] 2pfFTA − Blue-crowned conure Thectocercus acuticaudatusChaco, Argentina (15) DXFCfa1adlys − Yes [ 148] 4pflys − White-eyed conure Psittacara leucophtalmusSachojere, Bolivia (13) PFISAm2pfFTA − No [ 147] Brown-throated conure Eupsittula pertinaxIsla Margarita, Venezuela (16) CTSAwf9adFTA − Yes [ 149] Nanday conure Aratinga nendayPrincipe Negro, Pantanal, Brazil (17) Gr, Sa, SS, RiAw11pfFTA − Yes [ 150] Burrowing parrot Cyanoliseus patagonusEl Cóndor, Patagonia, Argentina (18) MoBSk32adeth − Yes [ 151] 14adliver − Comallo, Patagonia, Argentina (19) PSCsb5adliver −[ 152] Austral parakeet Enicognathus ferrugineusNavarino, Chile (20) BFNET2adFTAGenus: Leucocytozoon; prevalence 100% No [ 46] Bariloche, Patagonia, Argentina (21) BFNCsb3adFTAGenus: Leucocytozoon; prevalence 100 %[ 153] 1pfFTA − 3adliverGenus: Leucocytozoon; prevalence 33% 1pfliver − Blue-winged parrotlet Forpus xanthopterygiusTrinidad, Bolivia (12) PFISAm9adFTA − Yes [ 147] Yellow-chevroned parakeet Brotogeris chiririTrinidad, Bolivia (12) PFISAm4adFTA − Yes [ 147] Red-tailed Amazon Amazona brasilensisIlha Rasa, Guaraqueçaba, Brazil (22) LAF, MCfa29pfFTA − Yes [ 154] Ilha das Gamelas, Guaraqueçaba, Brazil (23) LAF, MCfa4pfFTA −[ 154] Blue-fronted Amazon Amazona aestivaJujuy, Argentina (24) DXFCwa6adlys − Yes [ 155] Chaco, Argentina (15) DXFCfa13adlys 
−[ 148] Pantanal, Brasil (25) Gr, Sa, SS, RiAw17pfFTA −[ 150] aHabitats: BF remnants of broadleaf forest, BFN broadleaf forests dominated by Nothofagus spp., CK coastal and kauri (Agathis australis) forests, CS coastal scrub, CP coconut plantations, CTS cactus and thorn scrub, DXF deciduous xerophyte forest, EP endemic pohutukawa (Metrosideros kermadecensis) sub-tropical moist forest, Gr grassland with sparse trees, LAF lowland Atlantic forest, M mangrove, Ma maquis, MF monsoon forest, Mo Monte xerophyte forest, NTP native trees planted under a re-vegetation programme, PFIS palm dominated forest islands surrounded by regularly flooded savannah, PP pine plantations, PS Patagonian steppes, Rf rainforest, Ri riparian forest, Sa savannah, SS scrub savannahs, Su sub-urbanbThe diversity of climates following [ 143, 156], except where indicated: Af, tropical rainforest; Am, tropical monsoon; Aw, tropical savannah; BSk, arid, steppe, cold; Cfa, temperate, without dry season, hot summer; Cfb, temperate, without dry season, warm summer; Csb, temperate, dry summer, warm summer; Cwa, temperate, dry winter, hot summer; ET, polar, tundracAge: age of individuals at the time of sampling; ad, adult; pf, pre-fledgling; juv, juveniledTissue: tissue used to obtain DNA; FTA, blood in FTA cards; lys, blood in lysis buffer; eth, blood in ethanol 70% eFood items known for their secondary metabolites (SM) with antimalarial, trypanocidal or general antiparasitic propertiesffollowing [ 143] and WorldClim database (http:\/\/ www. worldclim. org) [ 145], using diva-gis (http:\/\/ www. diva-gis. org \/) gfollowing [ 143] and data from the New Zealand National Climate Database (http:\/\/ cliflo. niwa. co. 
nz) Abbreviation: n, sample size.\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: Psittaciformes,149,163,Organism\/Species\nPhilippine cockatoo,417,436,Organism\/Species\nCacatua haematuropygiaRasa,437,463,Organism\/Species\nPsittacidae,523,534,Organism\/Species\nNew Caledonian rainbow lorikeet,548,579,Organism\/Species\nForbes ’ parakeet,856,873,Organism\/Species\nCyanoramphus forbesiMangere,874,901,Organism\/Species\nRed - fronted parakeet,970,992,Organism\/Species\nCyanoramphus novaezelandiaeRaoul,993,1025,Organism\/Species\nPlasmodium,1068,1078,Organism\/Species\nPlasmodium,1277,1287,Organism\/Species\nNew Caledonian parakeet,1401,1424,Organism\/Species\nHorned parakeet,1523,1538,Organism\/Species\nEunymphicus cornutusParc,1539,1563,Organism\/Species\nOuvéa parakeet,1721,1735,Organism\/Species\nEunymphicus uvaeensisGossana,1736,1764,Organism\/Species\nBlue and yellow macaw,1842,1863,Organism\/Species\nAra araraunaTrinidad,1864,1884,Organism\/Species\nBlue - throated macaw,1972,1993,Organism\/Species\nAra glaucogularisBeni,1994,2015,Organism\/Species\nBlue - crowned conure,2111,2132,Organism\/Species\nWhite - eyed conure,2217,2236,Organism\/Species\nBrown - throated conure,2311,2334,Organism\/Species\nEupsittula pertinaxIsla,2335,2358,Organism\/Species\nNanday conure,2412,2425,Organism\/Species\nBurrowing parrot,2517,2533,Organism\/Species\nCyanoliseus patagonusEl,2534,2557,Organism\/Species\nAustral parakeet,2689,2705,Organism\/Species\nLeucocytozoon,2770,2783,Organism\/Species\nLeucocytozoon,2867,2880,Organism\/Species\nLeucocytozoon,2929,2942,Organism\/Species\nBlue - winged parrotlet,2970,2993,Organism\/Species\nForpus xanthopterygiusTrinidad,2994,3024,Organism\/Species\nYellow - chevroned parakeet,3066,3093,Organism\/Species\nBrotogeris chiririTrinidad,3094,3120,Organism\/Species\nRed - tailed Amazon,3162,3181,Organism\/Species\nBlue - fronted 
Amazon,3338,3359,Organism\/Species\nAmazona aestivaJujuy,3360,3380,Organism\/Species\nNothofagus spp,3603,3617,Organism\/Species\nAgathis australis,3644,3661,Organism\/Species\ncoconut,3693,3700,Organism\/Species\ncactus,3718,3724,Organism\/Species\nxerophyte,3756,3765,Organism\/Species\npohutukawa,3785,3795,Organism\/Species\nMetrosideros kermadecensis,3798,3824,Organism\/Species\ntrees,3880,3885,Organism\/Species\nxerophyte,3967,3976,Organism\/Species\ntrees,3996,4001,Organism\/Species"}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/test_0-10.jsonl": "{"text":"User: I'm searching for the InChI of a molecule that is not metabolized by CYP P450 2C9?\nAssistant: This is a molecule that is not a CYP2C9 substrate: InChI=1S\/C18H28N2O4\/c1-5-6-18(23)20-14-7-8-17(16(9-14)13(4)21)24-11-15(22)10-19-12(2)3\/h7-9,12,15,19,22H,5-6,10-11H2,1-4H3,(H,20,23)\/t15-\/m0\/s1"} {"text":"User: I'm looking for the DeepSMILES of a molecule that is not metabolized by CYP P450 2C9?\nAssistant: This is a molecule that is not a substrate for CYP P450 2C9: CCOC=O)[C@H]CCcccccc6))))))))N[C@@H]C)C=O)NCCC[C@H]5C=O)O"}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/valid_0-8.jsonl": "{"text":"User: Is the molecule with the canonical SMILES Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1 metabolized by CYP P450 2C9?\nAssistant: No, it is not a CYP2C9 substrate."} {"text":"User: Is the molecule with the SMILES CCCN(CCC)C(=O)Cc1c(-c2ccc(Cl)cc2)nc2ccc(Cl)cn12 metabolized by CYP P450 2C9?\nAssistant: No, it is not a CYP2C9 substrate."}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/train_0-8.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C24H24N2O4\/c1-15(2)30-24(27)23-19(14-28-3)22-18-11-17(29-13-16-7-5-4-6-8-16)9-10-20(18)26-21(22)12-25-23\/h4-12,15,26H,13-14H2,1-3H3 metabolized by CYP P450 2C9?\nAssistant: No, it is not a substrate for CYP2C9."} {"text":"User: Is the molecule with the InChI 
InChI=1S\/C25H38O5\/c1-6-25(4,5)24(28)30-21-12-15(2)11-17-8-7-16(3)20(23(17)21)10-9-19-13-18(26)14-22(27)29-19\/h7-8,11,15-16,18-21,23,26H,6,9-10,12-14H2,1-5H3\/t15-,16-,18+,19+,20-,21-,23-\/m0\/s1 metabolized by CYP2C9?\nAssistant: No, it is not a substrate for CYP2C9."}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is metabolized by CYP P450 2C9.\nInChI: InChI=1S\/C18H28N2O4\/c1-5-6-18(23)20-14-7-8-17(16(9-14)13(4)21)24-11-15(22)10-19-12(2)3\/h7-9,12,15,19,22H,5-6,10-11H2,1-4H3,(H,20,23)\/t15-\/m0\/s1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not a CYP2C9 substrate."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is metabolized by CYP P450 2C9.\ncanonical SMILES: CCOC(=O)[C@H](CCc1ccccc1)N[C@@H](C)C(=O)N1CCC[C@H]1C(=O)O\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not a CYP P450 2C9 substrate."}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/valid_0-9.jsonl": "{"text":"User: Can you create the SELFIES of a molecule that is not a CYP2C9 substrate?\nAssistant: Of course, here you go: [N][C][=N][C][Branch1][#Branch1][N][C][C][C][Ring1][Ring1][=C][N][=C][N][Branch1][N][C@H1][C][=C][C@@H1][Branch1][Ring1][C][O][C][Ring1][#Branch1][C][Ring1][N][=N][Ring2][Ring1][Ring2]"} {"text":"User: Can you give me the SMILES of a molecule that is not a substrate for CYP P450 2C9?\nAssistant: Sure, here you go: CCCN(CCC)C(=O)Cc1c(-c2ccc(Cl)cc2)nc2ccc(Cl)cn12"}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/test_0-1.jsonl": "{"text":"Based on the SELFIES representation [C][C][C][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][=C][O][C][C@@H1][Branch1][C][O][C][N][C][Branch1][C][C][C][C][Branch1][=Branch1][C][Branch1][C][C][=O][=C][Ring2][Ring1][C], the molecule is not metabolized by CYP P450 2C9."} {"text":"Based on the 
InChI representation InChI=1S\/C20H28N2O5\/c1-3-27-20(26)16(12-11-15-8-5-4-6-9-15)21-14(2)18(23)22-13-7-10-17(22)19(24)25\/h4-6,8-9,14,16-17,21H,3,7,10-13H2,1-2H3,(H,24,25)\/t14-,16-,17-\/m0\/s1, the molecule is not metabolized by CYP P450 2C9."}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/valid_0-0.jsonl": "{"text":"The molecule with the DeepSMILES NcncNCCC3))))cncn[C@H]C=C[C@@H]CO))C5)))))c5n9 is not a CYP2C9 substrate."} {"text":"The molecule with the InChI InChI=1S\/C21H23Cl2N3O\/c1-3-11-25(12-4-2)20(27)13-18-21(15-5-7-16(22)8-6-15)24-19-10-9-17(23)14-26(18)19\/h5-10,14H,3-4,11-13H2,1-2H3 is not a substrate for CYP P450 2C9."}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/test_0-2.jsonl": "{"text":"The DeepSMILES CCCC=O)NccccOC[C@@H]O)CNCC)C)))))))cCC)=O))c6 is from a molecule that is not identified as a substrate for CYP P450 2C9."} {"text":"The canonical SMILES CCOC(=O)[C@H](CCc1ccccc1)N[C@@H](C)C(=O)N1CCC[C@H]1C(=O)O represents a molecule that is not identified as a CYP P450 2C9 substrate."}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/valid_0-10.jsonl": "{"text":"User: I'm looking for the canonical SMILES of a molecule that is not metabolized by CYP2C9?\nAssistant: This is a molecule that is not a substrate for CYP2C9: Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1"} {"text":"User: I'm looking for the SMILES of a molecule that is not metabolized by CYP2C9?\nAssistant: This is a molecule that is not a CYP2C9 substrate: CCCN(CCC)C(=O)Cc1c(-c2ccc(Cl)cc2)nc2ccc(Cl)cn12"}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/train_0-6.jsonl": "{"text":"Task: Please generate a SMILES based on the text description below.\nDescription: A molecule that is a CYP P450 2C9 substrate.\nResult: COCc1c(C(=O)OC(C)C)ncc2[nH]c3ccc(OCc4ccccc4)cc3c12"} {"text":"Task: Please generate a canonical SMILES based on the text description below.\nDescription: A molecule that is a substrate for CYP2C9.\nResult: 
CCC(C)(C)C(=O)O[C@H]1C[C@@H](C)C=C2C=C[C@H](C)[C@H](CC[C@@H]3C[C@@H](O)CC(=O)O3)[C@H]21"}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/valid_0-6.jsonl": "{"text":"Task: Please generate a molecule InChI based on the text description.\nDescription: A molecule that is a CYP P450 2C9 substrate.\nResult: InChI=1S\/C14H18N6O\/c15-14-18-12(17-9-2-3-9)11-13(19-14)20(7-16-11)10-4-1-8(5-10)6-21\/h1,4,7-10,21H,2-3,5-6H2,(H3,15,17,18,19)\/t8-,10+\/m1\/s1"} {"text":"Task: Please generate an InChI based on the description below.\nDescription: A molecule that is a substrate for CYP P450 2C9.\nResult: InChI=1S\/C21H23Cl2N3O\/c1-3-11-25(12-4-2)20(27)13-18-21(15-5-7-16(22)8-6-15)24-19-10-9-17(23)14-26(18)19\/h5-10,14H,3-4,11-13H2,1-2H3"}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/test_0-9.jsonl": "{"text":"User: Can you create the SMILES of a molecule that is not a substrate for CYP P450 2C9?\nAssistant: Of course, here you go: CCCC(=O)Nc1ccc(OC[C@@H](O)CNC(C)C)c(C(C)=O)c1"} {"text":"User: Can you give me the InChI of a molecule that is not a CYP2C9 substrate?\nAssistant: Yes, I'm happy to help, here you go: InChI=1S\/C20H28N2O5\/c1-3-27-20(26)16(12-11-15-8-5-4-6-9-15)21-14(2)18(23)22-13-7-10-17(22)19(24)25\/h4-6,8-9,14,16-17,21H,3,7,10-13H2,1-2H3,(H,24,25)\/t14-,16-,17-\/m0\/s1"}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/test_0-0.jsonl": "{"text":"The molecule with the SELFIES [C][C][C][C][=Branch1][C][=O][N][C][=C][C][=C][Branch1][=C][O][C][C@@H1][Branch1][C][O][C][N][C][Branch1][C][C][C][C][Branch1][=Branch1][C][Branch1][C][C][=O][=C][Ring2][Ring1][C] is not a substrate for CYP2C9."} {"text":"The molecule with the DeepSMILES CCOC=O)[C@H]CCcccccc6))))))))N[C@@H]C)C=O)NCCC[C@H]5C=O)O is not a CYP P450 2C9 substrate."}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/valid_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the canonical SMILES Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1 is a CYP2C9 
substrate?\nAssistant: No, this molecule is not metabolized by CYP P450 2C9."} {"text":"User: Can you derive if the molecule with the InChI InChI=1S\/C21H23Cl2N3O\/c1-3-11-25(12-4-2)20(27)13-18-21(15-5-7-16(22)8-6-15)24-19-10-9-17(23)14-26(18)19\/h5-10,14H,3-4,11-13H2,1-2H3 is a CYP2C9 substrate?\nAssistant: No, this molecule is not metabolized by CYP2C9."}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/test_0-3.jsonl": "{"text":"The molecule canonical SMILES CCCC(=O)Nc1ccc(OC[C@@H](O)CNC(C)C)c(C(C)=O)c1 is not metabolized by CYP2C9."} {"text":"The molecule SMILES CCOC(=O)[C@H](CCc1ccccc1)N[C@@H](C)C(=O)N1CCC[C@H]1C(=O)O is not metabolized by CYP P450 2C9."}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/valid_0-11.jsonl": "{"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be metabolized by CYP2C9.\nAssistant: Ok, this canonical SMILES is not metabolized by CYP2C9: Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1"} {"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should not be metabolized by CYP2C9.\nAssistant: Got it, this DeepSMILES is not metabolized by CYP2C9: CCCNCCC)))C=O)Ccc-ccccCl)cc6))))))nccccCl)cn96"}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/train_0-0.jsonl": "{"text":"The molecule with the SMILES COCc1c(C(=O)OC(C)C)ncc2[nH]c3ccc(OCc4ccccc4)cc3c12 is not a substrate for CYP2C9."} {"text":"The molecule with the InChI InChI=1S\/C25H38O5\/c1-6-25(4,5)24(28)30-21-12-15(2)11-17-8-7-16(3)20(23(17)21)10-9-19-13-18(26)14-22(27)29-19\/h7-8,11,15-16,18-21,23,26H,6,9-10,12-14H2,1-5H3\/t15-,16-,18+,19+,20-,21-,23-\/m0\/s1 is not a substrate for CYP P450 2C9."}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/test_0-6.jsonl": "{"text":"Task: Please create a molecule canonical SMILES based on the description.\nDescription: A molecule that is a CYP P450 2C9 substrate.\nResult: CCCC(=O)Nc1ccc(OC[C@@H](O)CNC(C)C)c(C(C)=O)c1"} {"text":"Task: Please generate a molecule canonical SMILES based on the text description below.\nDescription: A molecule that is a CYP P450 2C9 substrate.\nResult: CCOC(=O)[C@H](CCc1ccccc1)N[C@@H](C)C(=O)N1CCC[C@H]1C(=O)O"}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/train_0-10.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that is not metabolized by CYP P450 2C9?\nAssistant: This is a molecule that is not a CYP2C9 substrate: COCc1c(C(=O)OC(C)C)ncc2[nH]c3ccc(OCc4ccccc4)cc3c12"} {"text":"User: I'm searching for the SELFIES of a molecule that is not metabolized by CYP P450 2C9?\nAssistant: This is a molecule that is not a substrate for CYP P450 2C9: [C][C][C][Branch1][C][C][Branch1][C][C][C][=Branch1][C][=O][O][C@H1][C][C@@H1][Branch1][C][C][C][=C][C][=C][C@H1][Branch1][C][C][C@H1][Branch1][P][C][C][C@@H1][C][C@@H1][Branch1][C][O][C][C][=Branch1][C][=O][O][Ring1][Branch2][C@H1][Ring1][P][Ring2][Ring1][=Branch1]"}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/train_0-3.jsonl": "{"text":"The canonical SMILES 
COCc1c(C(=O)OC(C)C)ncc2[nH]c3ccc(OCc4ccccc4)cc3c12 is not metabolized by CYP P450 2C9."} {"text":"The molecule InChI InChI=1S\/C25H38O5\/c1-6-25(4,5)24(28)30-21-12-15(2)11-17-8-7-16(3)20(23(17)21)10-9-19-13-18(26)14-22(27)29-19\/h7-8,11,15-16,18-21,23,26H,6,9-10,12-14H2,1-5H3\/t15-,16-,18+,19+,20-,21-,23-\/m0\/s1 is not metabolized by CYP2C9."}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/train_0-12.jsonl": "{"text":"User: I want to generate a molecule SMILES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be a CYP P450 2C9 substrate.\nAssistant: Understood, this SMILES is not a CYP P450 2C9 substrate: COCc1c(C(=O)OC(C)C)ncc2[nH]c3ccc(OCc4ccccc4)cc3c12"} {"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be a substrate for CYP2C9.\nAssistant: Ok, this SELFIES is not a substrate for CYP2C9: [C][C][C][Branch1][C][C][Branch1][C][C][C][=Branch1][C][=O][O][C@H1][C][C@@H1][Branch1][C][C][C][=C][C][=C][C@H1][Branch1][C][C][C@H1][Branch1][P][C][C][C@@H1][C][C@@H1][Branch1][C][O][C][C][=Branch1][C][=O][O][Ring1][Branch2][C@H1][Ring1][P][Ring2][Ring1][=Branch1]"}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES representation of CCCC=O)NccccOC[C@@H]O)CNCC)C)))))))cCC)=O))c6 metabolized by CYP P450 2C9?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any additional words.\nOptions:\n1) False\n2) True\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI InChI=1S\/C20H28N2O5\/c1-3-27-20(26)16(12-11-15-8-5-4-6-9-15)21-14(2)18(23)22-13-7-10-17(22)19(24)25\/h4-6,8-9,14,16-17,21H,3,7,10-13H2,1-2H3,(H,24,25)\/t14-,16-,17-\/m0\/s1 metabolized by CYP P450 2C9?\nConstraint: Even if you 
are not sure, you must pick either 1 or 2 without using any additional words.\nOptions:\n1) False\n2) True\nAnswer: 1"}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/valid_0-2.jsonl": "{"text":"The DeepSMILES NcncNCCC3))))cncn[C@H]C=C[C@@H]CO))C5)))))c5n9 is from a molecule that is not identified as a substrate for CYP2C9."} {"text":"The SMILES CCCN(CCC)C(=O)Cc1c(-c2ccc(Cl)cc2)nc2ccc(Cl)cn12 represents a molecule that is not identified as a substrate for CYP P450 2C9."}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a CYP2C9 substrate?\nConstraint: You must select none, one or more options from a, b, c, or d without using any other words.\nOptions:\n[a] Cn1c(=O)c2[nH]cnc2n(C)c1=O\n[b] COC(=O)C1=C(C)NC(C)=C(C(=O)OCC(C)C)[C@H]1c1ccccc1[N+](=O)[O-]\n[c] CC[C@H](C)C(=O)O[C@H]1CCC=C2C=C[C@H](C)[C@H](CC[C@@H]3C[C@@H](O)CC(=O)O3)[C@H]21\n[d] COCc1c(C(=O)OC(C)C)ncc2[nH]c3ccc(OCc4ccccc4)cc3c12\nAnswer: b, c, d"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a substrate for CYP P450 2C9?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any additional words.\nOptions:\nA: CNnccC=O)O))c=O)cccF)cNCCNC)CC6))))))cc6%10\nB: CCCC)C)C=O)O[C@H]C[C@@H]C)C=CC=C[C@H]C)[C@H]CC[C@@H]C[C@@H]O)CC=O)O6))))))))[C@H]6%10\nC: O=CCCC[S@@H]=O)cccccc6)))))))))C=O)Ncccccc6))))))N5cccccc6\nD: C[C@@H]CCcccccc6))))))))NC[C@H]O)ccccO)cCN)=O))c6\nE: CCC)C=O)O))cccc[C@@H]O)CCCNCCCCO)cccccc6))))))cccccc6)))))))CC6))))))))))cc6\nAnswer: A, B, D, E"}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/valid_0-1.jsonl": "{"text":"Based on the DeepSMILES representation NcncNCCC3))))cncn[C@H]C=C[C@@H]CO))C5)))))c5n9, the molecule is not metabolized by CYP2C9."} {"text":"Based on the canonical SMILES representation CCCN(CCC)C(=O)Cc1c(-c2ccc(Cl)cc2)nc2ccc(Cl)cn12, the molecule is not 
metabolized by CYP2C9."}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES [N][C][=N][C][Branch1][#Branch1][N][C][C][C][Ring1][Ring1][=C][N][=C][N][Branch1][N][C@H1][C][=C][C@@H1][Branch1][Ring1][C][O][C][Ring1][#Branch1][C][Ring1][N][=N][Ring2][Ring1][Ring2] metabolized by CYP2C9?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any additional words.\nOptions:\n1 True\n2 False\nAnswer: 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI representation of InChI=1S\/C21H23Cl2N3O\/c1-3-11-25(12-4-2)20(27)13-18-21(15-5-7-16(22)8-6-15)24-19-10-9-17(23)14-26(18)19\/h5-10,14H,3-4,11-13H2,1-2H3 metabolized by CYP P450 2C9?\nConstraint: Even if you are uncertain, you must pick either a or b without using any additional words.\nOptions:\n(a) False\n(b) True\nAnswer: a"}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is metabolized by CYP P450 2C9.\nMolecule InChI: InChI=1S\/C14H18N6O\/c15-14-18-12(17-9-2-3-9)11-13(19-14)20(7-16-11)10-4-1-8(5-10)6-21\/h1,4,7-10,21H,2-3,5-6H2,(H3,15,17,18,19)\/t8-,10+\/m1\/s1\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not a CYP P450 2C9 substrate."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is metabolized by CYP2C9.\nMolecule canonical SMILES: CCCN(CCC)C(=O)Cc1c(-c2ccc(Cl)cc2)nc2ccc(Cl)cn12\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not a CYP P450 2C9 substrate."}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a CYP P450 2C9 substrate.\nMolecule 
InChI: InChI=1S\/C14H18N6O\/c15-14-18-12(17-9-2-3-9)11-13(19-14)20(7-16-11)10-4-1-8(5-10)6-21\/h1,4,7-10,21H,2-3,5-6H2,(H3,15,17,18,19)\/t8-,10+\/m1\/s1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a substrate for CYP P450 2C9.\nMolecule SMILES: CCCN(CCC)C(=O)Cc1c(-c2ccc(Cl)cc2)nc2ccc(Cl)cn12\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is metabolized by CYP P450 2C9.\ncanonical SMILES: COCc1c(C(=O)OC(C)C)ncc2[nH]c3ccc(OCc4ccccc4)cc3c12\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not a CYP P450 2C9 substrate."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is metabolized by CYP2C9.\nSELFIES: [C][C][C][Branch1][C][C][Branch1][C][C][C][=Branch1][C][=O][O][C@H1][C][C@@H1][Branch1][C][C][C][=C][C][=C][C@H1][Branch1][C][C][C@H1][Branch1][P][C][C][C@@H1][C][C@@H1][Branch1][C][O][C][C][=Branch1][C][=O][O][Ring1][Branch2][C@H1][Ring1][P][Ring2][Ring1][=Branch1]\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not a substrate for CYP2C9."}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/valid_0-12.jsonl": "{"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be a substrate for CYP2C9.\nAssistant: Got it, this canonical SMILES is not a substrate for CYP2C9: Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1"} {"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be a CYP P450 2C9 substrate.\nAssistant: Ok, this DeepSMILES is not a CYP P450 2C9 substrate: CCCNCCC)))C=O)Ccc-ccccCl)cc6))))))nccccCl)cn96"}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/train_0-2.jsonl": "{"text":"The canonical SMILES COCc1c(C(=O)OC(C)C)ncc2[nH]c3ccc(OCc4ccccc4)cc3c12 represents a molecule that is not identified as a CYP2C9 substrate."} {"text":"The canonical SMILES CCC(C)(C)C(=O)O[C@H]1C[C@@H](C)C=C2C=C[C@H](C)[C@H](CC[C@@H]3C[C@@H](O)CC(=O)O3)[C@H]21 is from a molecule that is not identified as a substrate for CYP2C9."}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/test_0-11.jsonl": "{"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should not be metabolized by CYP2C9.\nAssistant: Got it, here you go, this DeepSMILES is not metabolized by CYP2C9: CCCC=O)NccccOC[C@@H]O)CNCC)C)))))))cCC)=O))c6"} {"text":"User: I want to come up with a SELFIES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should not be metabolized by CYP P450 2C9.\nAssistant: Got it, this SELFIES is not metabolized by CYP P450 2C9: [C][C][O][C][=Branch1][C][=O][C@H1][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C@@H1][Branch1][C][C][C][=Branch1][C][=O][N][C][C][C][C@H1][Ring1][Branch1][C][=Branch1][C][=O][O]"}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/train_0-7.jsonl": "{"text":"User: Can you derive if the molecule with the DeepSMILES COCccC=O)OCC)C))))ncc[nH]ccccOCcccccc6))))))))cc6c%139 is a substrate for CYP P450 2C9?\nAssistant: No, this molecule is not metabolized by CYP2C9."} {"text":"User: Can you tell me if the molecule with the DeepSMILES CCCC)C)C=O)O[C@H]C[C@@H]C)C=CC=C[C@H]C)[C@H]CC[C@@H]C[C@@H]O)CC=O)O6))))))))[C@H]6%10 is a substrate for CYP2C9?\nAssistant: No, this molecule is not metabolized by CYP P450 2C9."}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/train_0-11.jsonl": "{"text":"User: I want to generate a InChI.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be metabolized by CYP P450 2C9.\nAssistant: Got it, this InChI is not metabolized by CYP P450 2C9: InChI=1S\/C24H24N2O4\/c1-15(2)30-24(27)23-19(14-28-3)22-18-11-17(29-13-16-7-5-4-6-8-16)9-10-20(18)26-21(22)12-25-23\/h4-12,15,26H,13-14H2,1-3H3"} {"text":"User: I want to come up with a molecule SELFIES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. 
The molecule should not be metabolized by CYP P450 2C9.\nAssistant: Ok, here you go, this SELFIES is not metabolized by CYP P450 2C9: [C][C][C][Branch1][C][C][Branch1][C][C][C][=Branch1][C][=O][O][C@H1][C][C@@H1][Branch1][C][C][C][=C][C][=C][C@H1][Branch1][C][C][C@H1][Branch1][P][C][C][C@@H1][C][C@@H1][Branch1][C][O][C][C][=Branch1][C][=O][O][Ring1][Branch2][C@H1][Ring1][P][Ring2][Ring1][=Branch1]"}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/train_0-1.jsonl": "{"text":"Based on the DeepSMILES representation COCccC=O)OCC)C))))ncc[nH]ccccOCcccccc6))))))))cc6c%139, the molecule is not metabolized by CYP P450 2C9."} {"text":"Based on the InChI InChI=1S\/C25H38O5\/c1-6-25(4,5)24(28)30-21-12-15(2)11-17-8-7-16(3)20(23(17)21)10-9-19-13-18(26)14-22(27)29-19\/h7-8,11,15-16,18-21,23,26H,6,9-10,12-14H2,1-5H3\/t15-,16-,18+,19+,20-,21-,23-\/m0\/s1, the molecule is not metabolized by CYP2C9."}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the InChI representation of InChI=1S\/C24H24N2O4\/c1-15(2)30-24(27)23-19(14-28-3)22-18-11-17(29-13-16-7-5-4-6-8-16)9-10-20(18)26-21(22)12-25-23\/h4-12,15,26H,13-14H2,1-3H3 metabolized by CYP P450 2C9?\nConstraint: Even if you are not sure, you must pick either A or B without using any other words.\nOptions:\n[A] False\n[B] True\nAnswer: A"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES [C][C][C][Branch1][C][C][Branch1][C][C][C][=Branch1][C][=O][O][C@H1][C][C@@H1][Branch1][C][C][C][=C][C][=C][C@H1][Branch1][C][C][C@H1][Branch1][P][C][C][C@@H1][C][C@@H1][Branch1][C][O][C][C][=Branch1][C][=O][O][Ring1][Branch2][C@H1][Ring1][P][Ring2][Ring1][=Branch1] metabolized by CYP P450 2C9?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any additional words.\nOptions:\n1) False\n2) True\nAnswer: 1"}", 
"/scratch/micpie/export/cyp2c9_substrate_carbonmangels/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a substrate for CYP P450 2C9.\nDeepSMILES: COCccC=O)OCC)C))))ncc[nH]ccccOCcccccc6))))))))cc6c%139\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nesult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a substrate for CYP P450 2C9.\nSMILES: CCC(C)(C)C(=O)O[C@H]1C[C@@H](C)C=C2C=C[C@H](C)[C@H](CC[C@@H]3C[C@@H](O)CC(=O)O3)[C@H]21\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nesult: False"}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/test_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the canonical SMILES CCCC(=O)Nc1ccc(OC[C@@H](O)CNC(C)C)c(C(C)=O)c1 is a substrate for CYP2C9?\nAssistant: No, this molecule is not metabolized by CYP P450 2C9."} {"text":"User: Can you estimate if the molecule with the SELFIES [C][C][O][C][=Branch1][C][=O][C@H1][Branch1][O][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][N][C@@H1][Branch1][C][C][C][=Branch1][C][=O][N][C][C][C][C@H1][Ring1][Branch1][C][=Branch1][C][=O][O] is a substrate for CYP2C9?\nAssistant: No, this molecule is not metabolized by CYP P450 2C9."}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/train_0-9.jsonl": "{"text":"User: Can you generate the SMILES of a molecule that is a not substrate for CYP P450 2C9?\nAssistant: Of course, here you go: COCc1c(C(=O)OC(C)C)ncc2[nH]c3ccc(OCc4ccccc4)cc3c12"} {"text":"User: Can you generate the InChI of a molecule that is a not CYP2C9 substrate?\nAssistant: Yes, I'm happy to help, here you go: 
InChI=1S\/C25H38O5\/c1-6-25(4,5)24(28)30-21-12-15(2)11-17-8-7-16(3)20(23(17)21)10-9-19-13-18(26)14-22(27)29-19\/h7-8,11,15-16,18-21,23,26H,6,9-10,12-14H2,1-5H3\/t15-,16-,18+,19+,20-,21-,23-\/m0\/s1"}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/valid_0-3.jsonl": "{"text":"The molecule SELFIES [N][C][=N][C][Branch1][#Branch1][N][C][C][C][Ring1][Ring1][=C][N][=C][N][Branch1][N][C@H1][C][=C][C@@H1][Branch1][Ring1][C][O][C][Ring1][#Branch1][C][Ring1][N][=N][Ring2][Ring1][Ring2] is not metabolized by CYP2C9."} {"text":"The molecule canonical SMILES CCCN(CCC)C(=O)Cc1c(-c2ccc(Cl)cc2)nc2ccc(Cl)cn12 is not metabolized by CYP2C9."}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/test_0-8.jsonl": "{"text":"User: Is the molecule with the DeepSMILES CCCC=O)NccccOC[C@@H]O)CNCC)C)))))))cCC)=O))c6 metabolized by CYP P450 2C9?\nAssistant: No, it is not a substrate for CYP2C9."} {"text":"User: Is the molecule with the canonical SMILES CCOC(=O)[C@H](CCc1ccccc1)N[C@@H](C)C(=O)N1CCC[C@H]1C(=O)O metabolized by CYP2C9?\nAssistant: No, it is not a CYP2C9 substrate."}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a substrate for CYP P450 2C9?\nConstraint: You must select none, one or more options from A, B, or C without using any other words.\nOptions:\n[A] CCCC(=O)Nc1ccc(OC[C@@H](O)CNC(C)C)c(C(C)=O)c1\n[B] Cc1ccc(O)c([C@@H](CCN(C(C)C)C(C)C)c2ccccc2)c1\n[C] CC1=C(CC(=O)O)c2cc(F)ccc2\/C1=C\\c1ccc([S@@H](C)=O)cc1\nAnswer: A, C"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a CYP2C9 substrate?\nConstraint: You must select none, one or more options from 1, 2, or 3 without using any other words.\nOptions:\n1 CCOC(=O)[C@H](CCc1ccccc1)N[C@@H](C)C(=O)N1CCC[C@H]1C(=O)O\n2 C#C[C@]1(O)C=C[C@H]2[C@@H]3CCC4=CC(=O)CC[C@@H]4[C@H]3CC[C@@]21CC\n3 Nc1ccc(S(N)(=O)=O)cc1\nAnswer: 1, 2, 3"}", 
"/scratch/micpie/export/cyp2c9_substrate_carbonmangels/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a substrate for CYP2C9?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any additional words.\nOptions:\n1) Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1\n2) C#CCN(C)[C@@H](C)Cc1ccccc1\n3) CC(C)(C)NC[C@H](O)COc1cccc2c1CCC(=O)N2\n4) Cc1ccc(Cl)c(OC[C@@H](O)CNC(C)(C)C)c1\nAnswer: 1, 3, 4"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not a CYP2C9 substrate?\nConstraint: You must select none, one or more options from a or b without using any other words.\nOptions:\na CCCN(CCC)C(=O)Cc1c(-c2ccc(Cl)cc2)nc2ccc(Cl)cn12\nb C[C@H]1c2cccc(O)c2C(=O)C2=C(O)[C@]3(O)C(=O)C(C(N)=O)=C(O)[C@@H](N(C)C)[C@@H]3[C@@H](O)[C@@H]21\nAnswer: a, b"}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a substrate for CYP2C9.\nDeepSMILES: CCCC=O)NccccOC[C@@H]O)CNCC)C)))))))cCC)=O))c6\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nesult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is a CYP2C9 substrate.\nInChI: InChI=1S\/C20H28N2O5\/c1-3-27-20(26)16(12-11-15-8-5-4-6-9-15)21-14(2)18(23)22-13-7-10-17(22)19(24)25\/h4-6,8-9,14,16-17,21H,3,7,10-13H2,1-2H3,(H,24,25)\/t14-,16-,17-\/m0\/s1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nesult: False"}", "/scratch/micpie/export/cyp2c9_substrate_carbonmangels/test_0-12.jsonl": "{"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should not be a CYP P450 2C9 substrate.\nAssistant: Ok, this DeepSMILES is not a CYP P450 2C9 substrate: CCCC=O)NccccOC[C@@H]O)CNCC)C)))))))cCC)=O))c6"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be a CYP2C9 substrate.\nAssistant: Understood, this InChI is not a CYP2C9 substrate: InChI=1S\/C20H28N2O5\/c1-3-27-20(26)16(12-11-15-8-5-4-6-9-15)21-14(2)18(23)22-13-7-10-17(22)19(24)25\/h4-6,8-9,14,16-17,21H,3,7,10-13H2,1-2H3,(H,24,25)\/t14-,16-,17-\/m0\/s1"}", "/scratch/micpie/export/chem_caption_smarts/test_0-1.jsonl": "{"text":"Question: How many times does the compound with SMILES CN(CC1(O)CCC1)C(=O)COCCNC(=O)c1ccccc1 contain a 0 substructure?\nAnswer: SMiles ARbitrary Target Specification (SMARTS) [#8]=[#6]-[#8]-[#6]-[#6]1-[#6]2:[#6](-[#6]3:[#6]-1:[#6]:[#6]:[#6]:[#6]:3):[#6]:[#6]:[#6]:[#6]:2"} {"text":"Question: How often does the molecule with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ contain a 1 substructure?\nAnswer: SMARTS [#8]1:[#6]:[#7]:[#6]:[#6]:1"}", "/scratch/micpie/export/chem_caption_smarts/valid_0-0.jsonl": "{"text":"Question: How many times does the molecule with SELFIES [C][C@H1][O][C][C][C@H1][Branch1][Ring2][C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=Branch1][C][=O][N][C][C][C][C@H1][Branch1][Ring2][C][Ring1][=Branch1][N][C][=C][C][=N][Ring1][Branch1] contain the substructure with the SMARTS [#8]=[#6]-[#8]-[#6]-[#6]1:[#6]:[#6]:[#6](-[#7+](-[#8-])=[#8]):[#6]:[#6]:1?\nAnswer: 0"} {"text":"Question: How many times does the molecule with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) contain the substructure with the SMARTS [F,Cl,Br,I]?\nAnswer: 3"}", 
"/scratch/micpie/export/chem_caption_smarts/test_0-2.jsonl": "{"text":"User: I must know how many times the molecule with SMILES CN(CC1(O)CCC1)C(=O)COCCNC(=O)c1ccccc1 contains the substructure with the SMiles ARbitrary Target Specification (SMARTS) [#8]=[#6]-[#8]-[#6]-[#6]1-[#6]2:[#6](-[#6]3:[#6]-1:[#6]:[#6]:[#6]:[#6]:3):[#6]:[#6]:[#6]:[#6]:2.\nAssistant: The molecule with SMILES CN(CC1(O)CCC1)C(=O)COCCNC(=O)c1ccccc1 contains the substructure with the SMiles ARbitrary Target Specification (SMARTS) [#8]=[#6]-[#8]-[#6]-[#6]1-[#6]2:[#6](-[#6]3:[#6]-1:[#6]:[#6]:[#6]:[#6]:3):[#6]:[#6]:[#6]:[#6]:2 0 times."} {"text":"User: I must know how often the molecule with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ contains the substructure with the SMARTS [#8]1:[#6]:[#7]:[#6]:[#6]:1.\nAssistant: The molecule with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ contains the substructure with the SMARTS [#8]1:[#6]:[#7]:[#6]:[#6]:1 1 times."}", "/scratch/micpie/export/chem_caption_smarts/test_0-0.jsonl": "{"text":"Question: How often does the chemical with SMILES CN(CC1(O)CCC1)C(=O)COCCNC(=O)c1ccccc1 contain the substructure with the SMARTS [#8]=[#6]-[#8]-[#6]-[#6]1-[#6]2:[#6](-[#6]3:[#6]-1:[#6]:[#6]:[#6]:[#6]:3):[#6]:[#6]:[#6]:[#6]:2?\nAnswer: 0"} {"text":"Question: How many times does the compound with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ contain the substructure with the SMARTS [#8]1:[#6]:[#7]:[#6]:[#6]:1?\nAnswer: 1"}", "/scratch/micpie/export/chem_caption_smarts/test_0-3.jsonl": "{"text":"User: I must know how many times the chemical structure with SMILES 
CN(CC1(O)CCC1)C(=O)COCCNC(=O)c1ccccc1 contains a 0 substructure.\nAssistant: The chemical structure contains the substructure with the SMiles ARbitrary Target Specification (SMARTS) [#8]=[#6]-[#8]-[#6]-[#6]1-[#6]2:[#6](-[#6]3:[#6]-1:[#6]:[#6]:[#6]:[#6]:3):[#6]:[#6]:[#6]:[#6]:2 0 times."} {"text":"User: I have to know how many times the chemical structure with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ contains a 1 substructure.\nAssistant: The chemical structure contains the substructure with the SMiles ARbitrary Target Specification (SMARTS) [#8]1:[#6]:[#7]:[#6]:[#6]:1 1 times."}", "/scratch/micpie/export/chem_caption_smarts/train_0-0.jsonl": "{"text":"Question: How many times does the molecule with InChI InChI=1S\/C12H18FN3O4S\/c1-8(20-3)7-15-12(17)16-9-4-5-10(13)11(6-9)21(18,19)14-2\/h4-6,8,14H,7H2,1-3H3,(H2,15,16,17)\/t8-\/m1\/s1 contain the substructure with the SMiles ARbitrary Target Specification (SMARTS) [#16]1-[#6]=[#6]-[#6]=[#6]-[#6]=[#6]-1?\nAnswer: 0"} {"text":"Question: How often does the chemical with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ contain the substructure with the SMARTS [F,Cl,Br,I]?\nAnswer: 3"}", "/scratch/micpie/export/chem_caption_smarts/train_0-3.jsonl": "{"text":"User: I would like to know how many times the chemical structure with InChI InChI=1S\/C12H18FN3O4S\/c1-8(20-3)7-15-12(17)16-9-4-5-10(13)11(6-9)21(18,19)14-2\/h4-6,8,14H,7H2,1-3H3,(H2,15,16,17)\/t8-\/m1\/s1 contains a 0 substructure.\nAssistant: The chemical structure contains the substructure with the SMARTS [#16]1-[#6]=[#6]-[#6]=[#6]-[#6]=[#6]-1 0 times."} {"text":"User: I would like to know how many times the compound with InChI 
InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ contains a 3 substructure.\nAssistant: The compound contains the substructure with the SMiles ARbitrary Target Specification (SMARTS) [F,Cl,Br,I] 3 times."}", "/scratch/micpie/export/chem_caption_smarts/valid_0-2.jsonl": "{"text":"User: I would like to know how many times the compound with SELFIES [C][C@H1][O][C][C][C@H1][Branch1][Ring2][C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=Branch1][C][=O][N][C][C][C][C@H1][Branch1][Ring2][C][Ring1][=Branch1][N][C][=C][C][=N][Ring1][Branch1] contains the substructure with the SMARTS [#8]=[#6]-[#8]-[#6]-[#6]1:[#6]:[#6]:[#6](-[#7+](-[#8-])=[#8]):[#6]:[#6]:1.\nAssistant: The compound with SELFIES [C][C@H1][O][C][C][C@H1][Branch1][Ring2][C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=Branch1][C][=O][N][C][C][C][C@H1][Branch1][Ring2][C][Ring1][=Branch1][N][C][=C][C][=N][Ring1][Branch1] contains the substructure with the SMARTS [#8]=[#6]-[#8]-[#6]-[#6]1:[#6]:[#6]:[#6](-[#7+](-[#8-])=[#8]):[#6]:[#6]:1 0 times."} {"text":"User: I must know how many times the molecule with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) contains the substructure with the SMiles ARbitrary Target Specification (SMARTS) [F,Cl,Br,I].\nAssistant: The molecule with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) contains the substructure with the SMiles ARbitrary Target Specification (SMARTS) [F,Cl,Br,I] 3 times."}", "/scratch/micpie/export/chem_caption_smarts/valid_0-1.jsonl": "{"text":"Question: How many times does the chemical with SELFIES 
[C][C@H1][O][C][C][C@H1][Branch1][Ring2][C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=Branch1][C][=O][N][C][C][C][C@H1][Branch1][Ring2][C][Ring1][=Branch1][N][C][=C][C][=N][Ring1][Branch1] contain a 0 substructure?\nAnswer: SMARTS [#8]=[#6]-[#8]-[#6]-[#6]1:[#6]:[#6]:[#6](-[#7+](-[#8-])=[#8]):[#6]:[#6]:1"} {"text":"Question: How many times does the compound with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) contain a 3 substructure?\nAnswer: SMARTS [F,Cl,Br,I]"}", "/scratch/micpie/export/chem_caption_smarts/train_0-2.jsonl": "{"text":"User: I would like to know how often the molecule with InChI InChI=1S\/C12H18FN3O4S\/c1-8(20-3)7-15-12(17)16-9-4-5-10(13)11(6-9)21(18,19)14-2\/h4-6,8,14H,7H2,1-3H3,(H2,15,16,17)\/t8-\/m1\/s1 contains the substructure with the SMiles ARbitrary Target Specification (SMARTS) [#16]1-[#6]=[#6]-[#6]=[#6]-[#6]=[#6]-1.\nAssistant: The molecule with InChI InChI=1S\/C12H18FN3O4S\/c1-8(20-3)7-15-12(17)16-9-4-5-10(13)11(6-9)21(18,19)14-2\/h4-6,8,14H,7H2,1-3H3,(H2,15,16,17)\/t8-\/m1\/s1 contains the substructure with the SMiles ARbitrary Target Specification (SMARTS) [#16]1-[#6]=[#6]-[#6]=[#6]-[#6]=[#6]-1 0 times."} {"text":"User: I must know how often the compound with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ contains the substructure with the SMARTS [F,Cl,Br,I].\nAssistant: The compound with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ contains the substructure with the SMARTS [F,Cl,Br,I] 3 times."}", "/scratch/micpie/export/chem_caption_smarts/train_0-1.jsonl": "{"text":"Question: How many times does the molecule with InChI 
InChI=1S\/C12H18FN3O4S\/c1-8(20-3)7-15-12(17)16-9-4-5-10(13)11(6-9)21(18,19)14-2\/h4-6,8,14H,7H2,1-3H3,(H2,15,16,17)\/t8-\/m1\/s1 contain a 0 substructure?\nAnswer: SMARTS [#16]1-[#6]=[#6]-[#6]=[#6]-[#6]=[#6]-1"} {"text":"Question: How often does the chemical with InChI InChI=1S\/C17H18F3N5O3\/c1-21-16-22-4-12(28-16)14(26)23-13-9-6-25(7-10(9)13)15(27)11-3-8(5-24(11)2)17(18,19)20\/h3-5,9-10,13H,6-7H2,1-2H3,(H,21,22)(H,23,26)\/t9-,10+,13+ contain a 3 substructure?\nAnswer: SMiles ARbitrary Target Specification (SMARTS) [F,Cl,Br,I]"}", "/scratch/micpie/export/chem_caption_smarts/valid_0-3.jsonl": "{"text":"User: I have to know how many times the chemical with SELFIES [C][C@H1][O][C][C][C@H1][Branch1][Ring2][C][Ring1][=Branch1][N][C][=Branch1][C][=O][C][=Branch1][C][=O][N][C][C][C][C@H1][Branch1][Ring2][C][Ring1][=Branch1][N][C][=C][C][=N][Ring1][Branch1] contains a 0 substructure.\nAssistant: The chemical contains the substructure with the SMARTS [#8]=[#6]-[#8]-[#6]-[#6]1:[#6]:[#6]:[#6](-[#7+](-[#8-])=[#8]):[#6]:[#6]:1 0 times."} {"text":"User: I have to know how many times the chemical structure with InChI InChI=1S\/C17H22F3N5O3\/c1-16(2,17(18,19)20)28-9-14(26)24-4-5-25(13-6-12(13)24)15(27)10-7-22-23-8-11(10)21-3\/h7-8,12-13H,4-6,9H2,1-3H3,(H,21,22) contains a 3 substructure.\nAssistant: The chemical structure contains the substructure with the SMiles ARbitrary Target Specification (SMARTS) [F,Cl,Br,I] 3 times."}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/test_0-10.jsonl": "{"text":"User: I'm searching for the SELFIES of a molecule that is not inhibiting the activity of kcnq2 potassium channels?\nAssistant: This is a molecule that is not inhibiting the activity of kcnq2 potassium channels: 
[S][=Branch1][C][=O][=Branch1][C][=O][Branch2][Ring2][Branch2][N][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][#Branch1][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][=C][C][Branch1][Ring1][O][C][=C][C][=C][Ring1][Branch2][C]"} {"text":"User: I'm looking for the SELFIES of a molecule that is not inhibiting the activity of kcnq2 potassium channels?\nAssistant: This is a molecule that is not inhibiting the activity of kcnq2 potassium channels: [Cl][C][=C][Branch1][#C][C][S][C][O][C][=Branch1][Branch1][=N][N][=Ring1][Branch1][C][C][N][C][Branch1][C][F][=C][C][=C][Ring1][P]"}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/valid_0-8.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C20H14N2O2\/c23-18-13-7-6-12-17(18)22-19(14-8-2-1-3-9-14)21-16-11-5-4-10-15(16)20(22)24\/h1-13,23H inhibiting the activity of kcnq2 potassium channels?\nAssistant: No, it is not inhibiting the activity of kcnq2 potassium channels."} {"text":"User: Is the molecule with the SMILES s1c(c2nn(cc2C2NC(=O)NC(=C2C(OCC=C)=O)C)c2ccccc2)ccc1 inhibiting the activity of kcnq2 potassium channels?\nAssistant: No, it is not inhibiting the activity of kcnq2 potassium channels."}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/train_0-8.jsonl": "{"text":"User: Is the molecule with the SMILES O1c2c(OCC1)ccc(Nc1c3c(nc(c1)C)c(ccc3)C)c2 inhibiting the activity of kcnq2 potassium channels?\nAssistant: No, it is not inhibiting the activity of kcnq2 potassium channels."} {"text":"User: Is the molecule with the SELFIES [O][C][Branch2][Ring1][C][C][=Branch1][C][=O][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][O][=C][Branch1][=C][C][=C][Ring1][S][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][C] inhibiting the activity of kcnq2 potassium channels?\nAssistant: No, it is not inhibiting the activity of kcnq2 potassium channels."}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/test_0-5.jsonl": "{"text":"Task: 
Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of kcnq2 potassium channels.\nDeepSMILES: S=O)=O)NCcccccc6)))))))cccccc6))))C=O)NcccOC))ccc6)))))))))))C\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not inhibiting the activity of kcnq2 potassium channels."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of kcnq2 potassium channels.\nSMILES: Clc1c(CSc2oc(nn2)CCN)c(F)ccc1\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not inhibiting the activity of kcnq2 potassium channels."}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/valid_0-9.jsonl": "{"text":"User: Can you generate the InChI of a molecule that is not inhibiting the activity of kcnq2 potassium channels?\nAssistant: Yes, I'm happy to help, here you go: InChI=1S\/C20H14N2O2\/c23-18-13-7-6-12-17(18)22-19(14-8-2-1-3-9-14)21-16-11-5-4-10-15(16)20(22)24\/h1-13,23H"} {"text":"User: Can you generate the canonical SMILES of a molecule that is not inhibiting the activity of kcnq2 potassium channels?\nAssistant: Yes, I'm happy to help, here you go: C=CCOC(=O)C1=C(C)NC(=O)NC1c1cn(-c2ccccc2)nc1-c1cccs1"}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/test_0-1.jsonl": "{"text":"Based on the SELFIES representation [S][=Branch1][C][=O][=Branch1][C][=O][Branch2][Ring2][Branch2][N][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][#Branch1][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][=C][C][Branch1][Ring1][O][C][=C][C][=C][Ring1][Branch2][C], the molecule displays no inhibition of the kcnq2 potassium channel activity."} {"text":"Based on the SMILES representation Clc1c(CSc2oc(nn2)CCN)c(F)ccc1, the molecule displays no inhibition of the kcnq2 potassium channel activity."}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/valid_0-0.jsonl": 
"{"text":"The molecule with the SELFIES [O][=C][N][Branch2][Ring1][#Branch1][C][=Branch1][N][=N][C][=C][Ring1][=Branch1][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][C][O][C][=C][C][=C][Ring1][#Branch1] shows no inhibition of the kcnq2 potassium channel activity."} {"text":"The molecule with the SELFIES [S][C][Branch2][Ring2][=Branch2][C][=N][N][Branch2][Ring1][O][C][=C][Ring1][Branch1][C][N][C][=Branch1][C][=O][N][C][=Branch1][N][=C][Ring1][#Branch1][C][Branch1][Branch1][O][C][C][=C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][C][=C][Ring2][Ring1][=C] displays no inhibition of the kcnq2 potassium channel activity."}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/test_0-2.jsonl": "{"text":"The SMILES S(=O)(=O)(N(Cc1ccccc1)c1c(cccc1)C(=O)Nc1cc(OC)ccc1)C represents a molecule that displays no inhibition of the kcnq2 potassium channel activity."} {"text":"The canonical SMILES NCCc1nnc(SCc2c(F)cccc2Cl)o1 is from a molecule that shows no inhibition of the kcnq2 potassium channel activity."}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/valid_0-10.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that is not inhibiting the activity of kcnq2 potassium channels?\nAssistant: This is a molecule that is not inhibiting the activity of kcnq2 potassium channels: O=c1n(c(nc2c1cccc2)c1ccccc1)c1c(O)cccc1"} {"text":"User: I'm looking for the SMILES of a molecule that is not inhibiting the activity of kcnq2 potassium channels?\nAssistant: This is a molecule that is not inhibiting the activity of kcnq2 potassium channels: s1c(c2nn(cc2C2NC(=O)NC(=C2C(OCC=C)=O)C)c2ccccc2)ccc1"}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/train_0-6.jsonl": "{"text":"Task: Please give me a molecule SMILES based on the text description.\nDescription: A molecule that is inhibiting the activity of kcnq2 potassium channels.\nResult: O1c2c(OCC1)ccc(Nc1c3c(nc(c1)C)c(ccc3)C)c2"} {"text":"Task: Please 
generate a molecule InChI based on the text description below.\nDescription: A molecule that is inhibiting the activity of kcnq2 potassium channels.\nResult: InChI=1S\/C17H22N2O3\/c1-12-3-4-15-14(11-12)13(2)16(22-15)17(21)19-7-5-18(6-8-19)9-10-20\/h3-4,11,20H,5-10H2,1-2H3"}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/valid_0-6.jsonl": "{"text":"Task: Please create a SMILES based on the text description.\nDescription: A molecule that is inhibiting the activity of kcnq2 potassium channels.\nResult: O=c1n(c(nc2c1cccc2)c1ccccc1)c1c(O)cccc1"} {"text":"Task: Please create a SMILES based on the text description.\nDescription: A molecule that is inhibiting the activity of kcnq2 potassium channels.\nResult: s1c(c2nn(cc2C2NC(=O)NC(=C2C(OCC=C)=O)C)c2ccccc2)ccc1"}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/test_0-9.jsonl": "{"text":"User: Can you give me the canonical SMILES of a molecule that is not inhibiting the activity of kcnq2 potassium channels?\nAssistant: Yes, here you go: COc1cccc(NC(=O)c2ccccc2N(Cc2ccccc2)S(C)(=O)=O)c1"} {"text":"User: Can you generate the SELFIES of a molecule that is not inhibiting the activity of kcnq2 potassium channels?\nAssistant: Yes, here you go: [Cl][C][=C][Branch1][#C][C][S][C][O][C][=Branch1][Branch1][=N][N][=Ring1][Branch1][C][C][N][C][Branch1][C][F][=C][C][=C][Ring1][P]"}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/test_0-0.jsonl": "{"text":"The molecule with the DeepSMILES S=O)=O)NCcccccc6)))))))cccccc6))))C=O)NcccOC))ccc6)))))))))))C exhibits no inhibition of the kcnq2 potassium channel activity."} {"text":"The molecule with the canonical SMILES NCCc1nnc(SCc2c(F)cccc2Cl)o1 displays no inhibition of the kcnq2 potassium channel activity."}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/valid_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the InChI 
InChI=1S\/C20H14N2O2\/c23-18-13-7-6-12-17(18)22-19(14-8-2-1-3-9-14)21-16-11-5-4-10-15(16)20(22)24\/h1-13,23H is inhibiting the activity of kcnq2 potassium channels?\nAssistant: No, this molecule is not inhibiting the activity of kcnq2 potassium channels."} {"text":"User: Can you estimate if the molecule with the SELFIES [S][C][Branch2][Ring2][=Branch2][C][=N][N][Branch2][Ring1][O][C][=C][Ring1][Branch1][C][N][C][=Branch1][C][=O][N][C][=Branch1][N][=C][Ring1][#Branch1][C][Branch1][Branch1][O][C][C][=C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][C][=C][Ring2][Ring1][=C] is inhibiting the activity of kcnq2 potassium channels?\nAssistant: No, this molecule is not inhibiting the activity of kcnq2 potassium channels."}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/test_0-3.jsonl": "{"text":"The SMILES S(=O)(=O)(N(Cc1ccccc1)c1c(cccc1)C(=O)Nc1cc(OC)ccc1)C is not inhibiting the activity of kcnq2 potassium channels."} {"text":"The molecule canonical SMILES NCCc1nnc(SCc2c(F)cccc2Cl)o1 is not inhibiting the activity of kcnq2 potassium channels."}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/valid_0-11.jsonl": "{"text":"User: I want to create a InChI.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be inhibiting the activity of kcnq2 potassium channels.\nAssistant: Got it, this InChI is not inhibiting the activity of kcnq2 potassium channels: InChI=1S\/C20H14N2O2\/c23-18-13-7-6-12-17(18)22-19(14-8-2-1-3-9-14)21-16-11-5-4-10-15(16)20(22)24\/h1-13,23H"} {"text":"User: I want to create a molecule SELFIES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should not be inhibiting the activity of kcnq2 potassium channels.\nAssistant: Ok, here you go, this SELFIES is not inhibiting the activity of kcnq2 potassium channels: [S][C][Branch2][Ring2][=Branch2][C][=N][N][Branch2][Ring1][O][C][=C][Ring1][Branch1][C][N][C][=Branch1][C][=O][N][C][=Branch1][N][=C][Ring1][#Branch1][C][Branch1][Branch1][O][C][C][=C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][C][=C][Ring2][Ring1][=C]"}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/train_0-0.jsonl": "{"text":"The molecule with the SELFIES representation of [O][C][=C][Branch1][=Branch1][O][C][C][Ring1][=Branch1][C][=C][C][Branch2][Ring1][Branch2][N][C][=C][C][=Branch1][=Branch2][=N][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][C][=Branch1][=Branch1][=C][C][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch1] shows no inhibition of the kcnq2 potassium channel activity."} {"text":"The molecule with the DeepSMILES representation of ocC=O)NCCNCC6))CCO))))))))ccc5cccc6)C))))))C shows no inhibition of the kcnq2 potassium channel activity."}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/test_0-6.jsonl": "{"text":"Task: Please give me a InChI based on the description below.\nDescription: A molecule that is inhibiting the activity of kcnq2 potassium channels.\nResult: InChI=1S\/C22H22N2O4S\/c1-28-19-12-8-11-18(15-19)23-22(25)20-13-6-7-14-21(20)24(29(2,26)27)16-17-9-4-3-5-10-17\/h3-15H,16H2,1-2H3,(H,23,25)"} {"text":"Task: Please generate a molecule SMILES based on the description.\nDescription: A molecule that is inhibiting the activity of kcnq2 potassium channels.\nResult: Clc1c(CSc2oc(nn2)CCN)c(F)ccc1"}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/train_0-10.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that is not inhibiting the activity of kcnq2 potassium channels?\nAssistant: This is a molecule that is not inhibiting the activity of kcnq2 potassium channels: O1c2c(OCC1)ccc(Nc1c3c(nc(c1)C)c(ccc3)C)c2"} 
{"text":"User: I'm looking for the InChI of a molecule that is not inhibiting the activity of kcnq2 potassium channels?\nAssistant: This is a molecule that is not inhibiting the activity of kcnq2 potassium channels: InChI=1S\/C17H22N2O3\/c1-12-3-4-15-14(11-12)13(2)16(22-15)17(21)19-7-5-18(6-8-19)9-10-20\/h3-4,11,20H,5-10H2,1-2H3"}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/train_0-3.jsonl": "{"text":"The SMILES O1c2c(OCC1)ccc(Nc1c3c(nc(c1)C)c(ccc3)C)c2 is not inhibiting the activity of kcnq2 potassium channels."} {"text":"The molecule SELFIES [O][C][Branch2][Ring1][C][C][=Branch1][C][=O][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][O][=C][Branch1][=C][C][=C][Ring1][S][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][C] is not inhibiting the activity of kcnq2 potassium channels."}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/train_0-12.jsonl": "{"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be inhibiting the activity of kcnq2 potassium channels.\nAssistant: Ok, this SMILES is not inhibiting the activity of kcnq2 potassium channels: O1c2c(OCC1)ccc(Nc1c3c(nc(c1)C)c(ccc3)C)c2"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very interesting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be inhibiting the activity of kcnq2 potassium channels.\nAssistant: Understood, this InChI is not inhibiting the activity of kcnq2 potassium channels: InChI=1S\/C17H22N2O3\/c1-12-3-4-15-14(11-12)13(2)16(22-15)17(21)19-7-5-18(6-8-19)9-10-20\/h3-4,11,20H,5-10H2,1-2H3"}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/test_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the activity of kcnq2 potassium channels?\nConstraint: You must select none, one or more options from 1, 2, 3, 4, or 5 without using any other words.\nOptions:\n(1) O(C(=O)CCCNC(=O)c1cc(O)cnc1)CC\n(2) S(c1c(OCc2onc(C(=O)N(CC3OCCC3)CC)c2)cccc1)C\n(3) OCCC1N(CCN(C1)Cc1c(cccc1)c1occc1)Cc1ccccc1\n(4) Fc1ccc(OC(=O)Cc2ccc(OC)cc2)cc1\n(5) S(=O)(=O)(N(Cc1ccccc1)c1c(cccc1)C(=O)Nc1cc(OC)ccc1)C\nAnswer: 1, 2, 3, 4, 5"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the activity of kcnq2 potassium channels?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any other words.\nOptions:\n[A] S(CCC(NC(OCc1ccccc1)=O)C(O)=O)C\n[B] s1c(N(C(=O)CCCC)CCOC)nnc1c1cccnc1\n[C] Clc1c(CSc2oc(nn2)CCN)c(F)ccc1\n[D] S(c1ccc(\/C=C(\\NC(=O)c2ccc(cc2)C)C(=O)Nc2cc(ccc2)C)cc1)C\n[E] O=C1N(C(=O)CC1NNc1ccccc1)c1ccccc1\nAnswer: A, B, C, D, E"}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/valid_0-2.jsonl": "{"text":"The InChI InChI=1S\/C20H14N2O2\/c23-18-13-7-6-12-17(18)22-19(14-8-2-1-3-9-14)21-16-11-5-4-10-15(16)20(22)24\/h1-13,23H represents a molecule that shows no inhibition of the kcnq2 potassium channel activity."} {"text":"The SMILES s1c(c2nn(cc2C2NC(=O)NC(=C2C(OCC=C)=O)C)c2ccccc2)ccc1 is from a molecule that exhibits no inhibition of the kcnq2 potassium channel activity."}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/valid_0-1.jsonl": "{"text":"Based on the 
DeepSMILES representation O=cncncc6cccc6)))))))cccccc6)))))))ccO)cccc6, the molecule exhibits no inhibition of the kcnq2 potassium channel activity."} {"text":"Based on the SMILES representation s1c(c2nn(cc2C2NC(=O)NC(=C2C(OCC=C)=O)C)c2ccccc2)ccc1, the molecule exhibits no inhibition of the kcnq2 potassium channel activity."}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/valid_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the activity of kcnq2 potassium channels?\nConstraint: You must select none, one or more options from A or B without using any additional words.\nOptions:\nA.) [O][=C][N][Branch2][Ring1][#Branch1][C][=Branch1][N][=N][C][=C][Ring1][=Branch1][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][C][O][C][=C][C][=C][Ring1][#Branch1]\nB.) [S][=Branch1][C][=O][=Branch1][C][=O][Branch2][Ring1][Branch2][N][N][C][=Branch1][C][=O][C][C][=C][C][Branch1][Ring1][O][C][=C][Branch1][Ring1][O][C][C][=C][Ring1][#Branch2][C][=C][Branch1][=Branch1][N+1][Branch1][C][O-1][=O][C][=C][C][=C][Ring1][=Branch2]\nAnswer: A, B"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the activity of kcnq2 potassium channels?\nConstraint: You must select none, one or more options from 1, 2, 3, or 4 without using any additional words.\nOptions:\n1. S(c1nc(N)c(cc1C(=O)Nc1ccccc1)C(=O)Nc1c(OC)cccc1)CC=C\n2. s1c2c(nc1C)ccc(NC(=O)c1c[nH]c(=O)cc1)c2\n3. S(c1nc(cc(c1C#N)COC)C)CC(Oc1ccccc1)=O\n4. 
s1c(c2nn(cc2C2NC(=O)NC(=C2C(OCC=C)=O)C)c2ccccc2)ccc1\nAnswer: 1, 2, 3, 4"}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of kcnq2 potassium channels.\nSMILES: O=c1n(c(nc2c1cccc2)c1ccccc1)c1c(O)cccc1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting the activity of kcnq2 potassium channels."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of kcnq2 potassium channels.\nInChI: InChI=1S\/C22H20N4O3S\/c1-3-11-29-21(27)18-14(2)23-22(28)24-20(18)16-13-26(15-8-5-4-6-9-15)25-19(16)17-10-7-12-30-17\/h3-10,12-13,20H,1,11H2,2H3,(H2,23,24,28)\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not inhibiting the activity of kcnq2 potassium channels."}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/valid_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of kcnq2 potassium channels.\nMolecule SMILES: O=c1n(c(nc2c1cccc2)c1ccccc1)c1c(O)cccc1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of kcnq2 potassium channels.\nMolecule SELFIES: [S][C][Branch2][Ring2][=Branch2][C][=N][N][Branch2][Ring1][O][C][=C][Ring1][Branch1][C][N][C][=Branch1][C][=O][N][C][=Branch1][N][=C][Ring1][#Branch1][C][Branch1][Branch1][O][C][C][=C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][C][=C][Ring2][Ring1][=C]\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", 
"/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/train_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of kcnq2 potassium channels.\nMolecule DeepSMILES: OccOCC6)))cccNcccncc6)C)))cccc6)))C))))))c6\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not inhibiting the activity of kcnq2 potassium channels."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of kcnq2 potassium channels.\nMolecule canonical SMILES: Cc1ccc2oc(C(=O)N3CCN(CCO)CC3)c(C)c2c1\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not inhibiting the activity of kcnq2 potassium channels."}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/valid_0-12.jsonl": "{"text":"User: I want to create a molecule SELFIES.\nAssistant: This sounds very interesting. Should it be a special molecule?\nUser: Yes, the molecule should not be inhibiting the activity of kcnq2 potassium channels.\nAssistant: Ok, this SELFIES is not inhibiting the activity of kcnq2 potassium channels: [O][=C][N][Branch2][Ring1][#Branch1][C][=Branch1][N][=N][C][=C][Ring1][=Branch1][C][=C][C][=C][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][C][O][C][=C][C][=C][Ring1][#Branch1]"} {"text":"User: I want to create a molecule SMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should not be inhibiting the activity of kcnq2 potassium channels.\nAssistant: Ok, this SMILES is not inhibiting the activity of kcnq2 potassium channels: s1c(c2nn(cc2C2NC(=O)NC(=C2C(OCC=C)=O)C)c2ccccc2)ccc1"}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/train_0-2.jsonl": "{"text":"The SMILES O1c2c(OCC1)ccc(Nc1c3c(nc(c1)C)c(ccc3)C)c2 is from a molecule that shows no inhibition of the kcnq2 potassium channel activity."} {"text":"The SELFIES [O][C][Branch2][Ring1][C][C][=Branch1][C][=O][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][O][=C][Branch1][=C][C][=C][Ring1][S][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][C] represents a molecule that shows no inhibition of the kcnq2 potassium channel activity."}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/test_0-11.jsonl": "{"text":"User: I want to come up with a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be inhibiting the activity of kcnq2 potassium channels.\nAssistant: Ok, this canonical SMILES is not inhibiting the activity of kcnq2 potassium channels: COc1cccc(NC(=O)c2ccccc2N(Cc2ccccc2)S(C)(=O)=O)c1"} {"text":"User: I want to create a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should not be inhibiting the activity of kcnq2 potassium channels.\nAssistant: Ok, here you go, this canonical SMILES is not inhibiting the activity of kcnq2 potassium channels: NCCc1nnc(SCc2c(F)cccc2Cl)o1"}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/train_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the InChI InChI=1S\/C19H18N2O2\/c1-12-4-3-5-15-16(10-13(2)20-19(12)15)21-14-6-7-17-18(11-14)23-9-8-22-17\/h3-7,10-11H,8-9H2,1-2H3,(H,20,21) is inhibiting the activity of kcnq2 potassium channels?\nAssistant: No, this molecule is not inhibiting the activity of kcnq2 potassium channels."} {"text":"User: Can you estimate if the molecule with the SELFIES [O][C][Branch2][Ring1][C][C][=Branch1][C][=O][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][O][=C][Branch1][=C][C][=C][Ring1][S][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][C] is inhibiting the activity of kcnq2 potassium channels?\nAssistant: No, this molecule is not inhibiting the activity of kcnq2 potassium channels."}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/train_0-11.jsonl": "{"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the generation?\nUser: Yes, please. The molecule should not be inhibiting the activity of kcnq2 potassium channels.\nAssistant: Ok, here you go, this DeepSMILES is not inhibiting the activity of kcnq2 potassium channels: OccOCC6)))cccNcccncc6)C)))cccc6)))C))))))c6"} {"text":"User: I want to create a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should not be inhibiting the activity of kcnq2 potassium channels.\nAssistant: Got it, here you go, this DeepSMILES is not inhibiting the activity of kcnq2 potassium channels: ocC=O)NCCNCC6))CCO))))))))ccc5cccc6)C))))))C"}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/train_0-1.jsonl": "{"text":"Based on the canonical SMILES Cc1cc(Nc2ccc3c(c2)OCCO3)c2cccc(C)c2n1, the molecule exhibits no inhibition of the kcnq2 potassium channel activity."} {"text":"Based on the canonical SMILES representation Cc1ccc2oc(C(=O)N3CCN(CCO)CC3)c(C)c2c1, the molecule displays no inhibition of the kcnq2 potassium channel activity."}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/train_0-13.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the activity of kcnq2 potassium channels?\nConstraint: You must select none, one or more options from A, B, C, D, or E without using any additional words.\nOptions:\nA.) Clc1cc(n2nnc3c2ncn(CC(=O)NCC(OCC)=O)c3=O)ccc1\nB.) Fc1c(OCCCCn2c(=O)c3c(nc2)cccc3)cccc1\nC.) O=C1N(CCCn2c3nc4c(nc3c(c2N)C(OC(C)(C)C)=O)cccc4)CCC1\nD.) Clc1sc(C(=O)Nc2c(N3CCN(CC3)CC)ccc(S(=O)(=O)N3CCOCC3)c2)cc1\nE.) 
O1c2c(OCC1)ccc(Nc1c3c(nc(c1)C)c(ccc3)C)c2\nAnswer: A, B, C, D, E"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not inhibiting the activity of kcnq2 potassium channels?\nConstraint: You must select none, one or more options from 1 or 2 without using any additional words.\nOptions:\n(1) [O][C][Branch2][Ring1][C][C][=Branch1][C][=O][N][C][C][N][Branch1][Branch1][C][C][Ring1][=Branch1][C][C][O][=C][Branch1][=C][C][=C][Ring1][S][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][C][C]\n(2) [Cl][C][=C][C][=C][Branch2][Ring1][Ring1][C][N][C][S][C][C][=C][Branch1][Ring2][N][=Ring1][=Branch1][C][=C][C][=C][Ring1][#Branch1][C][=C][Ring2][Ring1][C]\nAnswer: 1, 2"}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/train_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of kcnq2 potassium channels.\nMolecule InChI: InChI=1S\/C19H18N2O2\/c1-12-4-3-5-15-16(10-13(2)20-19(12)15)21-14-6-7-17-18(11-14)23-9-8-22-17\/h3-7,10-11H,8-9H2,1-2H3,(H,20,21)\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of kcnq2 potassium channels.\nMolecule DeepSMILES: ocC=O)NCCNCC6))CCO))))))))ccc5cccc6)C))))))C\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/test_0-7.jsonl": "{"text":"User: Can you estimate if the molecule with the InChI InChI=1S\/C22H22N2O4S\/c1-28-19-12-8-11-18(15-19)23-22(25)20-13-6-7-14-21(20)24(29(2,26)27)16-17-9-4-3-5-10-17\/h3-15H,16H2,1-2H3,(H,23,25) is inhibiting the activity of kcnq2 potassium channels?\nAssistant: No, this molecule is not inhibiting the activity of kcnq2 potassium 
channels."} {"text":"User: Can you tell me if the molecule with the DeepSMILES ClccCScocnn5))CCN))))))))cF)ccc6 is inhibiting the activity of kcnq2 potassium channels?\nAssistant: No, this molecule is not inhibiting the activity of kcnq2 potassium channels."}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/train_0-9.jsonl": "{"text":"User: Can you give me the SMILES of a molecule that is not inhibiting the activity of kcnq2 potassium channels?\nAssistant: Yes, I'm happy to help, here you go: O1c2c(OCC1)ccc(Nc1c3c(nc(c1)C)c(ccc3)C)c2"} {"text":"User: Can you give me the InChI of a molecule that is not inhibiting the activity of kcnq2 potassium channels?\nAssistant: Yes, I'm happy to help, here you go: InChI=1S\/C17H22N2O3\/c1-12-3-4-15-14(11-12)13(2)16(22-15)17(21)19-7-5-18(6-8-19)9-10-20\/h3-4,11,20H,5-10H2,1-2H3"}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/valid_0-3.jsonl": "{"text":"The InChI InChI=1S\/C20H14N2O2\/c23-18-13-7-6-12-17(18)22-19(14-8-2-1-3-9-14)21-16-11-5-4-10-15(16)20(22)24\/h1-13,23H is not inhibiting the activity of kcnq2 potassium channels."} {"text":"The SELFIES [S][C][Branch2][Ring2][=Branch2][C][=N][N][Branch2][Ring1][O][C][=C][Ring1][Branch1][C][N][C][=Branch1][C][=O][N][C][=Branch1][N][=C][Ring1][#Branch1][C][Branch1][Branch1][O][C][C][=C][=O][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][C][=C][Ring2][Ring1][=C] is not inhibiting the activity of kcnq2 potassium channels."}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/test_0-8.jsonl": "{"text":"User: Is the molecule with the SELFIES [S][=Branch1][C][=O][=Branch1][C][=O][Branch2][Ring2][Branch2][N][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=C][Branch1][#Branch1][C][=C][C][=C][Ring1][=Branch1][C][=Branch1][C][=O][N][C][=C][C][Branch1][Ring1][O][C][=C][C][=C][Ring1][Branch2][C] inhibiting the activity of kcnq2 potassium channels?\nAssistant: No, it is not inhibiting the activity of kcnq2 potassium channels."} {"text":"User: Is 
the molecule with the InChI InChI=1S\/C11H11ClFN3OS\/c12-8-2-1-3-9(13)7(8)6-18-11-16-15-10(17-11)4-5-14\/h1-3H,4-6,14H2 inhibiting the activity of kcnq2 potassium channels?\nAssistant: No, it is not inhibiting the activity of kcnq2 potassium channels."}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/test_0-4.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of kcnq2 potassium channels.\nMolecule InChI: InChI=1S\/C22H22N2O4S\/c1-28-19-12-8-11-18(15-19)23-22(25)20-13-6-7-14-21(20)24(29(2,26)27)16-17-9-4-3-5-10-17\/h3-15H,16H2,1-2H3,(H,23,25)\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is inhibiting the activity of kcnq2 potassium channels.\nDeepSMILES: ClccCScocnn5))CCN))))))))cF)ccc6\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/kcnq2_potassium_channel_butkiewicz/test_0-12.jsonl": "{"text":"User: I want to create a InChI.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should not be inhibiting the activity of kcnq2 potassium channels.\nAssistant: Understood, this InChI is not inhibiting the activity of kcnq2 potassium channels: InChI=1S\/C22H22N2O4S\/c1-28-19-12-8-11-18(15-19)23-22(25)20-13-6-7-14-21(20)24(29(2,26)27)16-17-9-4-3-5-10-17\/h3-15H,16H2,1-2H3,(H,23,25)"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should not be inhibiting the activity of kcnq2 potassium channels.\nAssistant: Got it, this InChI is not inhibiting the activity of kcnq2 potassium channels: InChI=1S\/C11H11ClFN3OS\/c12-8-2-1-3-9(13)7(8)6-18-11-16-15-10(17-11)4-5-14\/h1-3H,4-6,14H2"}", "/scratch/micpie/export/bio_ner_36/test_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Cytokines included in analyses were FGF-2, interleukin-1β (IL-1β), interleukin-1α (IL-1α), interleukin-10 (IL-10), interleukin-13 (IL-13), interleukin-6 (IL-6), interleukin-12 (IL-12), interleukin (IL-17), macrophage inflammatory protein-1α (MIP-1α), interleukin-5 (IL-5), vascular endothelial growth factor (VEGF), tumor necrosis factor-α (TNF-α), interleukin-2 (IL-2), interleukin-4 (IL-4), monokine induced by interferon-γ (MIG), monocyte chemoattractant protein-1 (MCP-1), keratinocyte chemoattractant (KC), and interferon-γ (IFN-γ)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: Cytokines,0,9,Gene\/Protein\nFGF - 2,36,43,Gene\/Protein\ninterleukin - 1β,45,61,Gene\/Protein\nIL - 1β,64,71,Gene\/Protein\ninterleukin - 1α,75,91,Gene\/Protein\nIL - 1α,94,101,Gene\/Protein\ninterleukin - 10,105,121,Gene\/Protein\nIL - 10,124,131,Gene\/Protein\ninterleukin - 13,135,151,Gene\/Protein\nIL - 13,154,161,Gene\/Protein\ninterleukin - 6,165,180,Gene\/Protein\nIL - 6,183,189,Gene\/Protein\ninterleukin - 12,193,209,Gene\/Protein\nIL - 12,212,219,Gene\/Protein\ninterleukin,223,234,Gene\/Protein\nIL - 17,237,244,Gene\/Protein\nmacrophage inflammatory protein - 1α,248,284,Gene\/Protein\nMIP - 1α,287,295,Gene\/Protein\ninterleukin - 5,299,314,Gene\/Protein\nIL - 5,317,323,Gene\/Protein\nvascular endothelial growth factor,327,361,Gene\/Protein\nVEGF,364,368,Gene\/Protein\ntumor necrosis factor - 
α,372,397,Gene\/Protein\nTNF - α,400,407,Gene\/Protein\ninterleukin - 2,411,426,Gene\/Protein\nIL - 2,429,435,Gene\/Protein\ninterleukin - 4,439,454,Gene\/Protein\nIL - 4,457,463,Gene\/Protein\nmonokine induced by interferon - γ,467,501,Gene\/Protein\nMIG,504,507,Gene\/Protein\nmonocyte chemoattractant protein - 1,511,547,Gene\/Protein\nMCP - 1,550,557,Gene\/Protein\nkeratinocyte chemoattractant,561,589,Gene\/Protein\nKC,592,594,Gene\/Protein\ninterferon - γ,602,616,Gene\/Protein\nIFN - γ,619,626,Gene\/Protein"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Cytokines included in analyses were FGF-2, interleukin-1β (IL-1β), interleukin-1α (IL-1α), interleukin-10 (IL-10), interleukin-13 (IL-13), interleukin-6 (IL-6), interleukin-12 (IL-12), interleukin (IL-17), macrophage inflammatory protein-1α (MIP-1α), interleukin-5 (IL-5), vascular endothelial growth factor (VEGF), tumor necrosis factor-α (TNF-α), interleukin-2 (IL-2), interleukin-4 (IL-4), monokine induced by interferon-γ (MIG), monocyte chemoattractant protein-1 (MCP-1), keratinocyte chemoattractant (KC), and interferon-γ (IFN-γ)..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: Cytokines,0,9,Gene\/Protein\nFGF - 2,36,43,Gene\/Protein\ninterleukin - 1β,45,61,Gene\/Protein\nIL - 1β,64,71,Gene\/Protein\ninterleukin - 1α,75,91,Gene\/Protein\nIL - 1α,94,101,Gene\/Protein\ninterleukin - 10,105,121,Gene\/Protein\nIL - 10,124,131,Gene\/Protein\ninterleukin - 13,135,151,Gene\/Protein\nIL - 13,154,161,Gene\/Protein\ninterleukin - 6,165,180,Gene\/Protein\nIL - 6,183,189,Gene\/Protein\ninterleukin - 12,193,209,Gene\/Protein\nIL - 12,212,219,Gene\/Protein\ninterleukin,223,234,Gene\/Protein\nIL - 17,237,244,Gene\/Protein\nmacrophage inflammatory protein - 1α,248,284,Gene\/Protein\nMIP - 1α,287,295,Gene\/Protein\ninterleukin - 5,299,314,Gene\/Protein\nIL - 
5,317,323,Gene\/Protein\nvascular endothelial growth factor,327,361,Gene\/Protein\nVEGF,364,368,Gene\/Protein\ntumor necrosis factor - α,372,397,Gene\/Protein\nTNF - α,400,407,Gene\/Protein\ninterleukin - 2,411,426,Gene\/Protein\nIL - 2,429,435,Gene\/Protein\ninterleukin - 4,439,454,Gene\/Protein\nIL - 4,457,463,Gene\/Protein\nmonokine induced by interferon - γ,467,501,Gene\/Protein\nMIG,504,507,Gene\/Protein\nmonocyte chemoattractant protein - 1,511,547,Gene\/Protein\nMCP - 1,550,557,Gene\/Protein\nkeratinocyte chemoattractant,561,589,Gene\/Protein\nKC,592,594,Gene\/Protein\ninterferon - γ,602,616,Gene\/Protein\nIFN - γ,619,626,Gene\/Protein"}", "/scratch/micpie/export/bio_ner_36/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: The sequences identified above and the following protein or predicted translations were used for phylogenetic analysis: hsTACC1A (NP _ 006274), hsTACC2l (AAO62630) hsTACC2s (AAO62629), hsTACC3 (NP _ 006333), mmTACC3 (Q9JJ11), xlMaskin (Q9PTG8), dmTACC (AAF52099), ceTAC1 (NP _ 497059), scSPC72 (NP _ 009352), hsRHAMM (NP _ 036616), mmRHAMM (NP _ 038580), rnRHAMM (NP _ 037096), drRHAMM (AAQ97980), hsKeratin (CAB76828), mmKeratin (A61368), rnKeratin (XP _ 235679), hsTPM1 (NP _ 000357), mmTPM1 (NP _ 077745), rnTPM1 (NP _ 62004, drTPM1 (NP _ 571180) dmTPM1 (P06754), ceTPM (NP _ 493540) scTPM1 (P17536), hsKLP2 (BAB03309), rnKIF15 (AAP44513), xlKLP2 (CAA08879), dmKLP2 (NP _ 476818), ceKLP18 (AA034669), hsKIF3A (Q9Y496), mmKIF3A (NP _ 032469), rnKIF3A (XP _ 340797), xlKIF3A (CAA08879), ceKLP11 (NP _ 741473), ciKIF3 (ci0100148992), hsKIF3B (NP _ 004789), mmKIF3B (NP _ 004789), rnKIF3B (XP _ 215883), dmKIF3B (NP _ 524029), hsKIF3C (NP _ 002245), mmKIF3C (NP _ 032471), rnKIF3C (NP _ 445938), dmKIF3C (NP _ 651939)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: 
hsTACC3,188,195,Gene_or_geneproduct\nmmTACC3,212,219,Gene_or_geneproduct\nxlMaskin,231,239,Gene_or_geneproduct\ndmTACC,251,257,Gene_or_geneproduct\nceTAC1,271,277,Gene_or_geneproduct\nscSPC72,294,301,Gene_or_geneproduct\nhsRHAMM,318,325,Gene_or_geneproduct\nmmRHAMM,342,349,Gene_or_geneproduct\nrnRHAMM,366,373,Gene_or_geneproduct\ndrRHAMM,390,397,Gene_or_geneproduct\nhsKeratin,411,420,Gene_or_geneproduct\nhsTPM1,481,487,Gene_or_geneproduct\nmmTPM1,504,510,Gene_or_geneproduct\nrnTPM1,527,533,Gene_or_geneproduct\ndrTPM1,548,554,Gene_or_geneproduct\ndmTPM1,570,576,Gene_or_geneproduct\nceTPM,588,593,Gene_or_geneproduct\nscTPM1,609,615,Gene_or_geneproduct\nhsKLP2,627,633,Gene_or_geneproduct\nrnKIF15,647,654,Gene_or_geneproduct\nxlKLP2,668,674,Gene_or_geneproduct\ndmKLP2,688,694,Gene_or_geneproduct\nceKLP18,711,718,Gene_or_geneproduct\nhsKIF3A,732,739,Gene_or_geneproduct\nmmKIF3A,751,758,Gene_or_geneproduct\nrnKIF3A,775,782,Gene_or_geneproduct\nxlKIF3A,799,806,Gene_or_geneproduct\nceKLP11,820,827,Gene_or_geneproduct\nhsKIF3B,868,875,Gene_or_geneproduct\nmmKIF3B,892,899,Gene_or_geneproduct\nrnKIF3B,916,923,Gene_or_geneproduct\ndmKIF3B,940,947,Gene_or_geneproduct\nhsKIF3C,964,971,Gene_or_geneproduct\nmmKIF3C,988,995,Gene_or_geneproduct\nrnKIF3C,1012,1019,Gene_or_geneproduct\ndmKIF3C,1036,1043,Gene_or_geneproduct"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Ta, Triticum aestivum; Bd, Brachypodium distachyon; Os, Oryza sativa; Zm, Zea mays; Sb, Sorghum bicolor; Gm, Glycine max; Ss, Suaeda salsa; At, Arabidopsis thaliana; Tp, Trifolium pretense; Ol, Ostreococcus lucimarinus; Cs, Coccomyxa subellipsoidea; Cr, Chlamydomonas reinhardtii; Ca, Chloroflexus aurantiacus; Rc, Roseiflexus castenholzii; Ha, Herpetosiphon aurantiacus; Ot, Oscillochloris trichoides; Ti, Thermoanaerobacter italicus; Ec, Escherichia coli..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines 
with a high probability of being in the text.\nResult: Ta,0,2,Organism\/Species\nTriticum aestivum,4,21,Organism\/Species\nBd,23,25,Organism\/Species\nBrachypodium distachyon,27,50,Organism\/Species\nOs,52,54,Organism\/Species\nOryza sativa,56,68,Organism\/Species\nZm,70,72,Organism\/Species\nZea mays,74,82,Organism\/Species\nSb,84,86,Organism\/Species\nSorghum bicolor,88,103,Organism\/Species\nGm,105,107,Organism\/Species\nGlycine max,109,120,Organism\/Species\nSs,122,124,Organism\/Species\nSuaeda salsa,126,138,Organism\/Species\nAt,140,142,Organism\/Species\nArabidopsis thaliana,144,164,Organism\/Species\nTp,166,168,Organism\/Species\nTrifolium pretense,170,188,Organism\/Species\nOl,190,192,Organism\/Species\nOstreococcus lucimarinus,194,218,Organism\/Species\nCs,220,222,Organism\/Species\nCoccomyxa subellipsoidea,224,248,Organism\/Species\nCr,250,252,Organism\/Species\nChlamydomonas reinhardtii,254,279,Organism\/Species\nCa,281,283,Organism\/Species\nChloroflexus aurantiacus,285,309,Organism\/Species\nRc,311,313,Organism\/Species\nRoseiflexus castenholzii,315,339,Organism\/Species\nHa,341,343,Organism\/Species\nHerpetosiphon aurantiacus,345,370,Organism\/Species\nOt,372,374,Organism\/Species\nOscillochloris trichoides,376,401,Organism\/Species\nTi,403,405,Organism\/Species\nThermoanaerobacter italicus,407,434,Organism\/Species\nEc,436,438,Organism\/Species\nEscherichia coli,440,456,Organism\/Species"}", "/scratch/micpie/export/bio_ner_35/valid_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Abbreviations: CAS, CRK-associated substrate; CH, calponin-homology domain; CSK, C-terminal SRC kinase; E6, Papillomavirus E6 protein; FAK, focal adhesion kinase; GIT, GRK interacter; GPCR, heterotrimeric-G-protein-coupled receptor; GRK, G-protein-coupled-receptor kinase; MAPK, mitogen-activated protein kinase (ERK, p38, JNK); PAK, p21-activated kinase; PBS, paxillin-binding subdomain; PIX, PAK-interacting 
exchange factor; PKL, paxillin kinase linker; POR1, partner of Rac; PS, phosphoserine; PT, phosphothreonine; PY, phosphotyrosine; RTK, growth factor receptor tyrosine kinase; SH, SRC-homology domain..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: CAS,15,18,Gene\/Protein\nCRK - associated substrate,20,46,Gene\/Protein\nCH,48,50,Gene\/Protein\ncalponin - homology domain,52,78,Gene\/Protein\nCSK,80,83,Gene\/Protein\nC - terminal SRC kinase,85,108,Gene\/Protein\nE6,110,112,Gene\/Protein\nPapillomavirus E6 protein,114,139,Gene\/Protein\nFAK,141,144,Gene\/Protein\nfocal adhesion kinase,146,167,Gene\/Protein\nGIT,169,172,Gene\/Protein\nGRK interacter,174,188,Gene\/Protein\nGPCR,190,194,Gene\/Protein\nheterotrimeric - G - protein - coupled receptor,196,243,Gene\/Protein\nGRK,245,248,Gene\/Protein\nG - protein - coupled - receptor kinase,250,289,Gene\/Protein\nMAPK,291,295,Gene\/Protein\nmitogen - activated protein kinase,297,331,Gene\/Protein\nERK,334,337,Gene\/Protein\np38,339,342,Gene\/Protein\nJNK,344,347,Gene\/Protein\nPAK,350,353,Gene\/Protein\np21 - activated kinase,355,377,Gene\/Protein\nPBS,379,382,Gene\/Protein\npaxillin - binding subdomain,384,412,Gene\/Protein\nPIX,414,417,Gene\/Protein\nPAK - interacting exchange factor,419,452,Gene\/Protein\nPKL,454,457,Gene\/Protein\npaxillin kinase linker,459,481,Gene\/Protein\nPOR1,483,487,Gene\/Protein\npartner of Rac,489,503,Gene\/Protein\nRTK,567,570,Gene\/Protein\ngrowth factor receptor tyrosine kinase,572,610,Gene\/Protein\nSH,612,614,Gene\/Protein\nSRC - homology domain,616,637,Gene\/Protein"} {"text":"Task: Please carry out the named entity recognition (NER) task for the the text below.\nText: Abbreviations: CAS, CRK-associated substrate; CH, calponin-homology domain; CSK, C-terminal SRC kinase; E6, Papillomavirus E6 protein; FAK, focal adhesion kinase; GIT, GRK interacter; GPCR, 
heterotrimeric-G-protein-coupled receptor; GRK, G-protein-coupled-receptor kinase; MAPK, mitogen-activated protein kinase (ERK, p38, JNK); PAK, p21-activated kinase; PBS, paxillin-binding subdomain; PIX, PAK-interacting exchange factor; PKL, paxillin kinase linker; POR1, partner of Rac; PS, phosphoserine; PT, phosphothreonine; PY, phosphotyrosine; RTK, growth factor receptor tyrosine kinase; SH, SRC-homology domain..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: CAS,15,18,Gene\/Protein\nCRK - associated substrate,20,46,Gene\/Protein\nCH,48,50,Gene\/Protein\ncalponin - homology domain,52,78,Gene\/Protein\nCSK,80,83,Gene\/Protein\nC - terminal SRC kinase,85,108,Gene\/Protein\nE6,110,112,Gene\/Protein\nPapillomavirus E6 protein,114,139,Gene\/Protein\nFAK,141,144,Gene\/Protein\nfocal adhesion kinase,146,167,Gene\/Protein\nGIT,169,172,Gene\/Protein\nGRK interacter,174,188,Gene\/Protein\nGPCR,190,194,Gene\/Protein\nheterotrimeric - G - protein - coupled receptor,196,243,Gene\/Protein\nGRK,245,248,Gene\/Protein\nG - protein - coupled - receptor kinase,250,289,Gene\/Protein\nMAPK,291,295,Gene\/Protein\nmitogen - activated protein kinase,297,331,Gene\/Protein\nERK,334,337,Gene\/Protein\np38,339,342,Gene\/Protein\nJNK,344,347,Gene\/Protein\nPAK,350,353,Gene\/Protein\np21 - activated kinase,355,377,Gene\/Protein\nPBS,379,382,Gene\/Protein\npaxillin - binding subdomain,384,412,Gene\/Protein\nPIX,414,417,Gene\/Protein\nPAK - interacting exchange factor,419,452,Gene\/Protein\nPKL,454,457,Gene\/Protein\npaxillin kinase linker,459,481,Gene\/Protein\nPOR1,483,487,Gene\/Protein\npartner of Rac,489,503,Gene\/Protein\nRTK,567,570,Gene\/Protein\ngrowth factor receptor tyrosine kinase,572,610,Gene\/Protein\nSH,612,614,Gene\/Protein\nSRC - homology domain,616,637,Gene\/Protein"}", "/scratch/micpie/export/bio_ner_35/train_0-0.jsonl": "{"text":"Task: Please carry out the 
named entity recognition (NER) task for the the text below.\nText: Abbreviations: CAS, CRK-associated substrate; CH, calponin-homology domain; CSK, C-terminal SRC kinase; E6, Papillomavirus E6 protein; FAK, focal adhesion kinase; GIT, GRK interacter; GPCR, heterotrimeric-G-protein-coupled receptor; GRK, G-protein-coupled-receptor kinase; MAPK, mitogen-activated protein kinase (ERK, p38, JNK); PAK, p21-activated kinase; PBS, paxillin-binding subdomain; PIX, PAK-interacting exchange factor; PKL, paxillin kinase linker; POR1, partner of Rac; PS, phosphoserine; PT, phosphothreonine; PY, phosphotyrosine; RTK, growth factor receptor tyrosine kinase; SH, SRC-homology domain..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: CAS,15,18,Gene\/Protein\nCRK - associated substrate,20,46,Gene\/Protein\nCH,48,50,Gene\/Protein\ncalponin - homology domain,52,78,Gene\/Protein\nCSK,80,83,Gene\/Protein\nC - terminal SRC kinase,85,108,Gene\/Protein\nE6,110,112,Gene\/Protein\nPapillomavirus E6 protein,114,139,Gene\/Protein\nFAK,141,144,Gene\/Protein\nfocal adhesion kinase,146,167,Gene\/Protein\nGIT,169,172,Gene\/Protein\nGRK interacter,174,188,Gene\/Protein\nGPCR,190,194,Gene\/Protein\nheterotrimeric - G - protein - coupled receptor,196,243,Gene\/Protein\nGRK,245,248,Gene\/Protein\nG - protein - coupled - receptor kinase,250,289,Gene\/Protein\nMAPK,291,295,Gene\/Protein\nmitogen - activated protein kinase,297,331,Gene\/Protein\nERK,334,337,Gene\/Protein\np38,339,342,Gene\/Protein\nJNK,344,347,Gene\/Protein\nPAK,350,353,Gene\/Protein\np21 - activated kinase,355,377,Gene\/Protein\nPBS,379,382,Gene\/Protein\npaxillin - binding subdomain,384,412,Gene\/Protein\nPIX,414,417,Gene\/Protein\nPAK - interacting exchange factor,419,452,Gene\/Protein\nPKL,454,457,Gene\/Protein\npaxillin kinase linker,459,481,Gene\/Protein\nPOR1,483,487,Gene\/Protein\npartner of 
Rac,489,503,Gene\/Protein\nRTK,567,570,Gene\/Protein\ngrowth factor receptor tyrosine kinase,572,610,Gene\/Protein\nSH,612,614,Gene\/Protein\nSRC - homology domain,616,637,Gene\/Protein"} {"text":"Task: Please carry out the NER task for the the text below.\nText: (A-E) Immunohistochemical analyses for Ki-67 showing significant increased expression in Parp-1-\/-\/ Ptc1+\/-compared to Ptc1+\/-mice. (F-J) Immunohistochemical analyses for γ-H2AX showing significantly increased expression in Parp-1-\/-\/ Ptc1+\/-and Rad54-\/-\/ Parp-1-\/-\/ Ptc1+\/-compared to Ptc1+\/-mice. (K-O) Immunohistochemical analyses for cleaved caspase-3 (CC3) showing significantly increased expression in Rad54-\/-\/ Parp-1-\/-\/ Ptc1+\/-mice compared to Rad54-\/-\/ Ptc1+\/-, Parp-1-\/-\/ Ptc1+\/-or Ptc1+\/-mutant mice. (P) The ratio between EGL and IGL area was similar in mice of all genotypes. (Q) Significant decrease in number of mature neurons expressing NeuN in Parp-1-\/-\/ Ptc1+\/-and Rad54-\/-\/ Parp-1-\/-\/ Ptc1+\/-mutants compared to Ptc1+\/-mice. 
(R and S) Expression level of NeuN was determined in a fixed area (in red) from folia VIII..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type with a high probability of being in the text.\nResult: Ki - 67,42,49,Gene\/Protein\nParp - 1,94,102,Gene\/Protein\nPtc1,108,112,Gene\/Protein\nPtc1,129,133,Gene\/Protein\nmice,138,142,Organism\/Species\nH2AX,190,194,Gene\/Protein\nParp - 1,241,249,Gene\/Protein\nPtc1,255,259,Gene\/Protein\nRad54,268,273,Gene\/Protein\nParp - 1,279,287,Gene\/Protein\nPtc1,293,297,Gene\/Protein\nPtc1,314,318,Gene\/Protein\nmice,323,327,Organism\/Species\ncaspase - 3,379,390,Gene\/Protein\nCC3,393,396,Gene\/Protein\nRad54,444,449,Gene\/Protein\nParp - 1,455,463,Gene\/Protein\nPtc1,469,473,Gene\/Protein\nmice,478,482,Organism\/Species\nRad54,495,500,Gene\/Protein\nPtc1,506,510,Gene\/Protein\nParp - 1,516,524,Gene\/Protein\nPtc1,530,534,Gene\/Protein\nPtc1,542,546,Gene\/Protein\nmice,558,562,Organism\/Species\nmice,619,623,Organism\/Species\nNeuN,707,711,Gene\/Protein\nParp - 1,715,723,Gene\/Protein\nPtc1,729,733,Gene\/Protein\nRad54,742,747,Gene\/Protein\nParp - 1,753,761,Gene\/Protein\nPtc1,767,771,Gene\/Protein\nPtc1,796,800,Gene\/Protein\nmice,805,809,Organism\/Species\nNeuN,842,846,Gene\/Protein"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_5-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the DeepSMILES.\nMolecule DeepSMILES: O=CNCCNCCCCCC7=O)))))))))))Ncccccn6\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: O=C(NCCN1CCCCCC1=O)Nc1ccccn1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the DeepSMILES.\nMolecule DeepSMILES: 
CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_2-4.jsonl": "{"text":"User: Can you create the SMILES of the molecule with the DeepSMILES CCO)cccnc-ccccc[nH]ncc95)))))))))ncNCCOCC6))))))c6s9?\nAssistant: Yes, this molecule has a SMILES of CC(O)c1cc2nc(-c3cccc4[nH]ncc34)nc(N3CCOCC3)c2s1."} {"text":"User: Can you tell me the SMILES of the molecule with the DeepSMILES CCC)[C@H]NC=O)cccccc6))))))))C=O)O?\nAssistant: Sure, this molecule has a SMILES of CC(C)[C@H](NC(=O)c1ccccc1)C(=O)O."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_2-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SMILES from the DeepSMILES.\nMolecule DeepSMILES: O=CCnc=O)ccccn5ccccnc6%13))))))))))))))NCCCNCCCCcccccc6)))))))CC6\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: O=C(Cn1c(=O)c2cccn2c2cccnc21)NCCCN1CCC(Cc2ccccc2)CC1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SMILES from the DeepSMILES.\nDeepSMILES: CccscNC=O)cccccS=O)=O)NCCNC)CC6)))))))c6))))))))n5\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: Cc1csc(NC(=O)c2cccc(S(=O)(=O)N3CCN(C)CC3)c2)n1"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_4-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of CcccccNC)S=O)=O)ccC)nnC)c5C))))))))c6C can also be represented with the SMILES Cc1cccc(N(C)S(=O)(=O)c2c(C)nn(C)c2C)c1C."} {"text":"The molecule with the DeepSMILES O=CNccF)cccc6F))))))))cccccOCCF)F))))n6 can also be represented with the SMILES representation 
O=C(Nc1c(F)cccc1F)c1cccc(OCC(F)F)n1."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_4-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the DeepSMILES CcccccNC)S=O)=O)ccC)nnC)c5C))))))))c6C?\nAssistant: Yes, this molecule has a SMILES of Cc1cccc(N(C)S(=O)(=O)c2c(C)nn(C)c2C)c1C."} {"text":"User: Can you tell me the SMILES of the molecule with the DeepSMILES O=CNccF)cccc6F))))))))cccccOCCF)F))))n6?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of O=C(Nc1c(F)cccc1F)c1cccc(OCC(F)F)n1."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_1-4.jsonl": "{"text":"User: Can you create the SMILES of the molecule with the DeepSMILES CccccC)cNCCNC=O)ccccc=O)ncnc6c%10)))CCCCC7)))))))))))))CC6))))))c6?\nAssistant: Yes, this molecule has a SMILES of Cc1ccc(C)c(N2CCN(C(=O)c3ccc4c(=O)n5c(nc4c3)CCCCC5)CC2)c1."} {"text":"User: Can you create the SMILES of the molecule with the DeepSMILES CC[C@H]C)[C@H]NC=O)[C@H]CCCN)=O))))NC=O)[C@H]CCCN)=O))))NC=O)[C@H]C)NC=O)[C@H]CCC=O)O))))NC=O)[C@H]CCCN)=O))))NC=O)[C@@H]NC=O)[C@H]CCC=O)O))))NC=O)[C@H]CCCN)=O))))NC=O)[C@@H]Ccc[nH]cccccc96))))))))))NC=O)[C@@H]CCN)=O)))NC=O)[C@@H]CCCNC=N)N))))))NC=O)[C@@H]CCCCN)))))NC=O)[C@@H]CCC)C)))NC=O)[C@@H]CCC=O)O))))NC=O)[C@@H]CC=O)O)))NC=O)[C@@H]CCC=O)O))))NC=O)[C@@H]C)NC=O)[C@@H]N)CC=O)O)))))))))))))))))))))))))))))))))))))))[C@H]C)CC)))))))))))))))))))))C=O)N[C@@H]CCC)C)))C=O)N[C@@H]CCCN)=O))))C=O)N[C@@H]CCSC))))C=O)N[C@@H]Ccccccc6)))))))C=O)N[C@@H]Ccccccc6)))))))C=O)N[C@@H]CCC=O)O))))C=O)N[C@@H]Ccc[nH]cccccc96))))))))))C=O)N[C@H]Ccc[nH]cccccc96))))))))))C=O)N[C@H]CCCN)=O))))C=O)N[C@@H]CCCCN)))))C=O)N[C@H]CC=O)O)))C=O)O?\nAssistant: Of course, this molecule has a SMILES of 
CC[C@H](C)[C@H](NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](C)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](Cc1c[nH]c2ccccc12)NC(=O)[C@@H](CC(N)=O)NC(=O)[C@@H](CCCNC(=N)N)NC(=O)[C@@H](CCCCN)NC(=O)[C@@H](CC(C)C)NC(=O)[C@@H](CCC(=O)O)NC(=O)[C@@H](CC(=O)O)NC(=O)[C@@H](CCC(=O)O)NC(=O)[C@@H](C)NC(=O)[C@@H](N)CC(=O)O)[C@H](C)CC)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)N[C@H](Cc1c[nH]c2ccccc12)C(=O)N[C@H](CCC(N)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@H](CC(=O)O)C(=O)O."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_5-1.jsonl": "{"text":"The molecule with the SMILES O=C(NCCN1CCCCCC1=O)Nc1ccccn1 can also be represented with the DeepSMILES O=CNCCNCCCCCC7=O)))))))))))Ncccccn6."} {"text":"The molecule with the SMILES CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21 can also be represented with the DeepSMILES representation CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_4-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the DeepSMILES CcnnC)cC)c5S=O)=O)NCCCC5CCcccccc6?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of Cc1nn(C)c(C)c1S(=O)(=O)N1CCCC1CCc1ccccc1."} {"text":"User: Can you generate the SMILES of the molecule with the DeepSMILES CcocC)cC=O)NccccCC#N)))cc6))))))))c5C?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of Cc1oc(C)c(C(=O)Nc2ccc(CC#N)cc2)c1C."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_4-4.jsonl": "{"text":"User: Can you create the SMILES of the molecule with the DeepSMILES CcnnC)cC)c5S=O)=O)NCCCC5CCCCCC6?\nAssistant: Yes, this molecule has a SMILES of Cc1nn(C)c(C)c1S(=O)(=O)N1CCCC1C1CCCCC1."} {"text":"User: Can you tell me the SMILES of the molecule with the DeepSMILES 
COcccCccncN)[nH]c6=O))))))))ccOC))c6OC?\nAssistant: Yes, this molecule has a SMILES of COc1cc(Cc2cnc(N)[nH]c2=O)cc(OC)c1OC."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_1-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SMILES from the DeepSMILES.\nDeepSMILES: CCC=O)NccccNC=O)ccccccBr)cccc%106))))))))))))cc6\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CCC(=O)Nc1ccc(NC(=O)c2cccc3c(Br)cccc23)cc1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SMILES from the DeepSMILES.\nMolecule DeepSMILES: CccncNccccC#N))cc6)))))))nc6CCl)ccccF)cc6\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: Cc1cnc(Nc2ccc(C#N)cc2)nc1C(Cl)c1ccc(F)cc1"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_3-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the DeepSMILES.\nMolecule DeepSMILES: CCOC=O)C=CC)Ncccccc6N\/C%11=N\\S=O)=O)ccccCl)cc6\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CCOC(=O)C1=C(C)Nc2ccccc2N\/C1=N\\S(=O)(=O)c1ccc(Cl)cc1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SMILES from the DeepSMILES.\nMolecule DeepSMILES: CcnnC)cC)c5S=O)=O)NCCccF)cccc6F\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Cc1nn(C)c(C)c1S(=O)(=O)NCCc1c(F)cccc1F"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_1-5.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the SMILES CCC(=O)Nc1ccc(NC(=O)c2cccc3c(Br)cccc23)cc1?\nAssistant: Of course, this molecule has a 
DeepSMILES of CCC=O)NccccNC=O)ccccccBr)cccc%106))))))))))))cc6."} {"text":"User: Can you generate the DeepSMILES of the molecule with the SMILES Cc1cnc(Nc2ccc(C#N)cc2)nc1C(Cl)c1ccc(F)cc1?\nAssistant: Yes, this molecule has a DeepSMILES of CccncNccccC#N))cc6)))))))nc6CCl)ccccF)cc6."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_5-0.jsonl": "{"text":"The molecule with the DeepSMILES O=CNCCNCCCCCC7=O)))))))))))Ncccccn6 can also be represented with the SMILES representation O=C(NCCN1CCCCCC1=O)Nc1ccccn1."} {"text":"The molecule with the DeepSMILES CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58 can also be represented with the SMILES representation CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_1-5.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the SMILES O=C(CCC1CCCN(C(=O)c2ccc3c(c2)OCO3)C1)N1CCN(c2ccccn2)CC1?\nAssistant: Yes, this molecule has a DeepSMILES of O=CCCCCCCNC=O)cccccc6)OCO5)))))))))C6))))))))NCCNcccccn6))))))CC6."} {"text":"User: Can you generate the DeepSMILES of the molecule with the SMILES CCC(C)C(NS(=O)(=O)c1ccc(C)cc1)C(=O)CCl?\nAssistant: Yes, this molecule has a DeepSMILES of CCCC)CNS=O)=O)ccccC)cc6))))))))C=O)CCl."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_5-4.jsonl": "{"text":"User: Can you create the SMILES of the molecule with the DeepSMILES CcncCNCCNccccnncn5n9)))))))))CC6)))))))no5?\nAssistant: Sure, this molecule has a SMILES of Cc1nc(CN2CCN(c3ccc4nncn4n3)CC2)no1."} {"text":"User: Can you create the SMILES of the molecule with the DeepSMILES Nccc[nH+]cccccc%106)))))))CCCC6?\nAssistant: Sure, this molecule has a SMILES of Nc1c2c([nH+]c3ccccc13)CCCC2."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_0-5.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the SMILES 
S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1?\nAssistant: Of course, this molecule has a DeepSMILES of S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6."} {"text":"User: Can you create the DeepSMILES of the molecule with the SMILES O=c1c2ccccc2nc(\/C=C\\c2ccccc2[N+](=O)[O-])n1-c1ccccc1O?\nAssistant: Of course, this molecule has a DeepSMILES of O=ccccccc6nc\/C=C\\cccccc6[N+]=O)[O-]))))))))))n%10-cccccc6O."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_3-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of CC=O)NccccS=O)=O)NC)ccccOCC=O)OCC=O)NccccOCF)F)))cc6)))))))))))))cc6))))))))cc6 can also be represented with the SMILES representation CC(=O)Nc1ccc(S(=O)(=O)N(C)c2ccc(OCC(=O)OCC(=O)Nc3ccc(OC(F)F)cc3)cc2)cc1."} {"text":"The molecule with the DeepSMILES representation of CccccCC)NC)S=O)=O)ccC)nnC)c5C)))))))))cc6 can also be represented with the SMILES representation Cc1ccc(C(C)N(C)S(=O)(=O)c2c(C)nn(C)c2C)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_0-1.jsonl": "{"text":"The molecule with the SMILES S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1 can also be represented with the DeepSMILES S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6."} {"text":"The molecule with the SMILES O=c1c2ccccc2nc(\/C=C\\c2ccccc2[N+](=O)[O-])n1-c1ccccc1O can also be represented with the DeepSMILES representation O=ccccccc6nc\/C=C\\cccccc6[N+]=O)[O-]))))))))))n%10-cccccc6O."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_5-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of CccccCCcccccc6))))))NCN)=NC5=O)))))))o5 can also be represented with the SMILES Cc1ccc(CC2(c3ccccc3)NC(N)=NC2=O)o1."} {"text":"The molecule with the DeepSMILES 
CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20 can also be represented with the SMILES CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_2-0.jsonl": "{"text":"The molecule with the DeepSMILES CCO)cccnc-ccccc[nH]ncc95)))))))))ncNCCOCC6))))))c6s9 can also be represented with the SMILES representation CC(O)c1cc2nc(-c3cccc4[nH]ncc34)nc(N3CCOCC3)c2s1."} {"text":"The molecule with the DeepSMILES CCC)[C@H]NC=O)cccccc6))))))))C=O)O can also be represented with the SMILES representation CC(C)[C@H](NC(=O)c1ccccc1)C(=O)O."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_2-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the DeepSMILES.\nMolecule DeepSMILES: Brcccccc6C=Nncnnc5-cccco5))))))))SC6\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: Brc1ccccc1C1=Nn2c(nnc2-c2ccco2)SC1"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the DeepSMILES.\nMolecule DeepSMILES: CNCC=O)Ncccccc6Br))))))))))C=O)CNCCNCccccCl)cc6)))))))CC6\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_0-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6 can also be represented with the SMILES representation 
S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1."} {"text":"The molecule with the DeepSMILES CcccC)cc-nccnc5SCC=O)Ncnccs5))))))))))))))c6 can also be represented with the SMILES representation Cc1cc(C)cc(-n2ccnc2SCC(=O)Nc2nccs2)c1."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_3-3.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the SMILES.\nSMILES: CC(=O)Nc1ccc(S(=O)(=O)N(C)c2ccc(OCC(=O)OCC(=O)Nc3ccc(OC(F)F)cc3)cc2)cc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC=O)NccccS=O)=O)NC)ccccOCC=O)OCC=O)NccccOCF)F)))cc6)))))))))))))cc6))))))))cc6"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the SMILES.\nSMILES: Cc1ccc(C(C)N(C)S(=O)(=O)c2c(C)nn(C)c2C)cc1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CccccCC)NC)S=O)=O)ccC)nnC)c5C)))))))))cc6"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_5-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SMILES.\nMolecule SMILES: Cc1nc(CN2CCN(c3ccc4nncn4n3)CC2)no1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CcncCNCCNccccnncn5n9)))))))))CC6)))))))no5"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SMILES.\nSMILES: Nc1c2c([nH+]c3ccccc13)CCCC2\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: Nccc[nH+]cccccc%106)))))))CCCC6"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_3-5.jsonl": 
"{"text":"User: Can you generate the DeepSMILES of the molecule with the SMILES Cn1c(=O)c2c([nH]c(=O)n2C)n(Cc2ccccc2)c1=O?\nAssistant: Of course, this molecule has a DeepSMILES of Cnc=O)cc[nH]c=O)n5C))))nCcccccc6)))))))c6=O."} {"text":"User: Can you tell me the DeepSMILES of the molecule with the SMILES Cc1nn(C)c(C)c1S(=O)(=O)N(C)C(C)c1cc(F)ccc1F?\nAssistant: Sure, this molecule has a DeepSMILES of CcnnC)cC)c5S=O)=O)NC)CC)cccF)ccc6F."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_5-5.jsonl": "{"text":"User: Can you tell me the DeepSMILES of the molecule with the SMILES Cc1ccc(CC2(c3ccccc3)NC(N)=NC2=O)o1?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of CccccCCcccccc6))))))NCN)=NC5=O)))))))o5."} {"text":"User: Can you tell me the DeepSMILES of the molecule with the SMILES CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1?\nAssistant: Of course, this molecule has a DeepSMILES of CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_4-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SMILES.\nMolecule SMILES: Cc1nn(C)c(C)c1S(=O)(=O)N1CCCC1CCc1ccccc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CcnnC)cC)c5S=O)=O)NCCCC5CCcccccc6"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the SMILES.\nSMILES: Cc1oc(C)c(C(=O)Nc2ccc(CC#N)cc2)c1C\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CcocC)cC=O)NccccCC#N)))cc6))))))))c5C"}", 
"/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_1-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SMILES.\nMolecule SMILES: O=C(CCC1CCCN(C(=O)c2ccc3c(c2)OCO3)C1)N1CCN(c2ccccn2)CC1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: O=CCCCCCCNC=O)cccccc6)OCO5)))))))))C6))))))))NCCNcccccn6))))))CC6"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the SMILES.\nSMILES: CCC(C)C(NS(=O)(=O)c1ccc(C)cc1)C(=O)CCl\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCCC)CNS=O)=O)ccccC)cc6))))))))C=O)CCl"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_0-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SMILES from the DeepSMILES.\nDeepSMILES: S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the DeepSMILES.\nMolecule DeepSMILES: O=ccccccc6nc\/C=C\\cccccc6[N+]=O)[O-]))))))))))n%10-cccccc6O\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: O=c1c2ccccc2nc(\/C=C\\c2ccccc2[N+](=O)[O-])n1-c1ccccc1O"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_3-0.jsonl": "{"text":"The molecule with the DeepSMILES CCOC=O)C=CC)Ncccccc6N\/C%11=N\\S=O)=O)ccccCl)cc6 
can also be represented with the SMILES CCOC(=O)C1=C(C)Nc2ccccc2N\/C1=N\\S(=O)(=O)c1ccc(Cl)cc1."} {"text":"The molecule with the DeepSMILES representation of CcnnC)cC)c5S=O)=O)NCCccF)cccc6F can also be represented with the SMILES Cc1nn(C)c(C)c1S(=O)(=O)NCCc1c(F)cccc1F."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_1-0.jsonl": "{"text":"The molecule with the DeepSMILES O=CCCCCCCNC=O)cccccc6)OCO5)))))))))C6))))))))NCCNcccccn6))))))CC6 can also be represented with the SMILES representation O=C(CCC1CCCN(C(=O)c2ccc3c(c2)OCO3)C1)N1CCN(c2ccccn2)CC1."} {"text":"The molecule with the DeepSMILES representation of CCCC)CNS=O)=O)ccccC)cc6))))))))C=O)CCl can also be represented with the SMILES CCC(C)C(NS(=O)(=O)c1ccc(C)cc1)C(=O)CCl."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_5-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the DeepSMILES.\nMolecule DeepSMILES: CccccCCcccccc6))))))NCN)=NC5=O)))))))o5\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: Cc1ccc(CC2(c3ccccc3)NC(N)=NC2=O)o1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SMILES from the DeepSMILES.\nDeepSMILES: CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_2-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SMILES.\nMolecule 
SMILES: O=C(Cn1c(=O)c2cccn2c2cccnc21)NCCCN1CCC(Cc2ccccc2)CC1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: O=CCnc=O)ccccn5ccccnc6%13))))))))))))))NCCCNCCCCcccccc6)))))))CC6"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the SMILES.\nMolecule SMILES: Cc1csc(NC(=O)c2cccc(S(=O)(=O)N3CCN(C)CC3)c2)n1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CccscNC=O)cccccS=O)=O)NCCNC)CC6)))))))c6))))))))n5"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_1-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the DeepSMILES.\nMolecule DeepSMILES: CccccC)cNCCNC=O)ccccc=O)ncnc6c%10)))CCCCC7)))))))))))))CC6))))))c6\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: Cc1ccc(C)c(N2CCN(C(=O)c3ccc4c(=O)n5c(nc4c3)CCCCC5)CC2)c1"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SMILES from the DeepSMILES.\nDeepSMILES: 
CC[C@H]C)[C@H]NC=O)[C@H]CCCN)=O))))NC=O)[C@H]CCCN)=O))))NC=O)[C@H]C)NC=O)[C@H]CCC=O)O))))NC=O)[C@H]CCCN)=O))))NC=O)[C@@H]NC=O)[C@H]CCC=O)O))))NC=O)[C@H]CCCN)=O))))NC=O)[C@@H]Ccc[nH]cccccc96))))))))))NC=O)[C@@H]CCN)=O)))NC=O)[C@@H]CCCNC=N)N))))))NC=O)[C@@H]CCCCN)))))NC=O)[C@@H]CCC)C)))NC=O)[C@@H]CCC=O)O))))NC=O)[C@@H]CC=O)O)))NC=O)[C@@H]CCC=O)O))))NC=O)[C@@H]C)NC=O)[C@@H]N)CC=O)O)))))))))))))))))))))))))))))))))))))))[C@H]C)CC)))))))))))))))))))))C=O)N[C@@H]CCC)C)))C=O)N[C@@H]CCCN)=O))))C=O)N[C@@H]CCSC))))C=O)N[C@@H]Ccccccc6)))))))C=O)N[C@@H]Ccccccc6)))))))C=O)N[C@@H]CCC=O)O))))C=O)N[C@@H]Ccc[nH]cccccc96))))))))))C=O)N[C@H]Ccc[nH]cccccc96))))))))))C=O)N[C@H]CCCN)=O))))C=O)N[C@@H]CCCCN)))))C=O)N[C@H]CC=O)O)))C=O)O\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC[C@H](C)[C@H](NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](C)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](Cc1c[nH]c2ccccc12)NC(=O)[C@@H](CC(N)=O)NC(=O)[C@@H](CCCNC(=N)N)NC(=O)[C@@H](CCCCN)NC(=O)[C@@H](CC(C)C)NC(=O)[C@@H](CCC(=O)O)NC(=O)[C@@H](CC(=O)O)NC(=O)[C@@H](CCC(=O)O)NC(=O)[C@@H](C)NC(=O)[C@@H](N)CC(=O)O)[C@H](C)CC)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)N[C@H](Cc1c[nH]c2ccccc12)C(=O)N[C@H](CCC(N)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@H](CC(=O)O)C(=O)O"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_5-5.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the SMILES O=C(NCCN1CCCCCC1=O)Nc1ccccn1?\nAssistant: Sure, this molecule has a DeepSMILES of O=CNCCNCCCCCC7=O)))))))))))Ncccccn6."} {"text":"User: Can you create the DeepSMILES of the molecule with the SMILES CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21?\nAssistant: Yes, this molecule has a DeepSMILES of 
CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_2-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the DeepSMILES O=CCnc=O)ccccn5ccccnc6%13))))))))))))))NCCCNCCCCcccccc6)))))))CC6?\nAssistant: Sure, this molecule has a SMILES of O=C(Cn1c(=O)c2cccn2c2cccnc21)NCCCN1CCC(Cc2ccccc2)CC1."} {"text":"User: Can you create the SMILES of the molecule with the DeepSMILES CccscNC=O)cccccS=O)=O)NCCNC)CC6)))))))c6))))))))n5?\nAssistant: Of course, this molecule has a SMILES of Cc1csc(NC(=O)c2cccc(S(=O)(=O)N3CCN(C)CC3)c2)n1."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_3-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the DeepSMILES CC=O)NccccS=O)=O)NC)ccccOCC=O)OCC=O)NccccOCF)F)))cc6)))))))))))))cc6))))))))cc6?\nAssistant: Yes, this molecule has a SMILES of CC(=O)Nc1ccc(S(=O)(=O)N(C)c2ccc(OCC(=O)OCC(=O)Nc3ccc(OC(F)F)cc3)cc2)cc1."} {"text":"User: Can you generate the SMILES of the molecule with the DeepSMILES CccccCC)NC)S=O)=O)ccC)nnC)c5C)))))))))cc6?\nAssistant: Of course, this molecule has a SMILES of Cc1ccc(C(C)N(C)S(=O)(=O)c2c(C)nn(C)c2C)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_0-0.jsonl": "{"text":"The molecule with the DeepSMILES S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6 can also be represented with the SMILES S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1."} {"text":"The molecule with the DeepSMILES O=ccccccc6nc\/C=C\\cccccc6[N+]=O)[O-]))))))))))n%10-cccccc6O can also be represented with the SMILES O=c1c2ccccc2nc(\/C=C\\c2ccccc2[N+](=O)[O-])n1-c1ccccc1O."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_2-5.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the SMILES CC(O)c1cc2nc(-c3cccc4[nH]ncc34)nc(N3CCOCC3)c2s1?\nAssistant: Yes, this molecule has a 
DeepSMILES of CCO)cccnc-ccccc[nH]ncc95)))))))))ncNCCOCC6))))))c6s9."} {"text":"User: Can you generate the DeepSMILES of the molecule with the SMILES CC(C)[C@H](NC(=O)c1ccccc1)C(=O)O?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of CCC)[C@H]NC=O)cccccc6))))))))C=O)O."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_5-5.jsonl": "{"text":"User: Can you tell me the DeepSMILES of the molecule with the SMILES Cc1nc(CN2CCN(c3ccc4nncn4n3)CC2)no1?\nAssistant: Sure, this molecule has a DeepSMILES of CcncCNCCNccccnncn5n9)))))))))CC6)))))))no5."} {"text":"User: Can you create the DeepSMILES of the molecule with the SMILES Nc1c2c([nH+]c3ccccc13)CCCC2?\nAssistant: Of course, this molecule has a DeepSMILES of Nccc[nH+]cccccc%106)))))))CCCC6."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_3-5.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the SMILES CCOC(=O)C1=C(C)Nc2ccccc2N\/C1=N\\S(=O)(=O)c1ccc(Cl)cc1?\nAssistant: Of course, this molecule has a DeepSMILES of CCOC=O)C=CC)Ncccccc6N\/C%11=N\\S=O)=O)ccccCl)cc6."} {"text":"User: Can you create the DeepSMILES of the molecule with the SMILES Cc1nn(C)c(C)c1S(=O)(=O)NCCc1c(F)cccc1F?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of CcnnC)cC)c5S=O)=O)NCCccF)cccc6F."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_2-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of O=CCnc=O)ccccn5ccccnc6%13))))))))))))))NCCCNCCCCcccccc6)))))))CC6 can also be represented with the SMILES O=C(Cn1c(=O)c2cccn2c2cccnc21)NCCCN1CCC(Cc2ccccc2)CC1."} {"text":"The molecule with the DeepSMILES representation of CccscNC=O)cccccS=O)=O)NCCNC)CC6)))))))c6))))))))n5 can also be represented with the SMILES representation Cc1csc(NC(=O)c2cccc(S(=O)(=O)N3CCN(C)CC3)c2)n1."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_1-4.jsonl": "{"text":"User: Can you create the SMILES of the molecule with the DeepSMILES 
O=CCCCCCCNC=O)cccccc6)OCO5)))))))))C6))))))))NCCNcccccn6))))))CC6?\nAssistant: Of course, this molecule has a SMILES of O=C(CCC1CCCN(C(=O)c2ccc3c(c2)OCO3)C1)N1CCN(c2ccccn2)CC1."} {"text":"User: Can you tell me the SMILES of the molecule with the DeepSMILES CCCC)CNS=O)=O)ccccC)cc6))))))))C=O)CCl?\nAssistant: Of course, this molecule has a SMILES of CCC(C)C(NS(=O)(=O)c1ccc(C)cc1)C(=O)CCl."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_2-0.jsonl": "{"text":"The molecule with the DeepSMILES Brcccccc6C=Nncnnc5-cccco5))))))))SC6 can also be represented with the SMILES Brc1ccccc1C1=Nn2c(nnc2-c2ccco2)SC1."} {"text":"The molecule with the DeepSMILES CNCC=O)Ncccccc6Br))))))))))C=O)CNCCNCccccCl)cc6)))))))CC6 can also be represented with the SMILES representation CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_4-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the DeepSMILES.\nMolecule DeepSMILES: CcnnC)cC)c5S=O)=O)NCCCC5CCcccccc6\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: Cc1nn(C)c(C)c1S(=O)(=O)N1CCCC1CCc1ccccc1"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SMILES from the DeepSMILES.\nMolecule DeepSMILES: CcocC)cC=O)NccccCC#N)))cc6))))))))c5C\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Cc1oc(C)c(C(=O)Nc2ccc(CC#N)cc2)c1C"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_0-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SMILES.\nSMILES: S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1\nConstraint: Even if you are uncertain, you must 
answer with a representation without using any additional words.\nResult: S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the SMILES.\nSMILES: O=c1c2ccccc2nc(\/C=C\\c2ccccc2[N+](=O)[O-])n1-c1ccccc1O\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: O=ccccccc6nc\/C=C\\cccccc6[N+]=O)[O-]))))))))))n%10-cccccc6O"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_4-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the DeepSMILES.\nDeepSMILES: CcccccNC)S=O)=O)ccC)nnC)c5C))))))))c6C\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: Cc1cccc(N(C)S(=O)(=O)c2c(C)nn(C)c2C)c1C"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SMILES from the DeepSMILES.\nMolecule DeepSMILES: O=CNccF)cccc6F))))))))cccccOCCF)F))))n6\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: O=C(Nc1c(F)cccc1F)c1cccc(OCC(F)F)n1"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_3-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the DeepSMILES.\nDeepSMILES: CC=O)NccccS=O)=O)NC)ccccOCC=O)OCC=O)NccccOCF)F)))cc6)))))))))))))cc6))))))))cc6\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CC(=O)Nc1ccc(S(=O)(=O)N(C)c2ccc(OCC(=O)OCC(=O)Nc3ccc(OC(F)F)cc3)cc2)cc1"} {"text":"Task: Please create a molecule representation based on the input molecule 
representation and the description.\nDescription: Generate the SMILES from the DeepSMILES.\nMolecule DeepSMILES: CccccCC)NC)S=O)=O)ccC)nnC)c5C)))))))))cc6\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: Cc1ccc(C(C)N(C)S(=O)(=O)c2c(C)nn(C)c2C)cc1"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_2-1.jsonl": "{"text":"The molecule with the SMILES representation of Brc1ccccc1C1=Nn2c(nnc2-c2ccco2)SC1 can also be represented with the DeepSMILES representation Brcccccc6C=Nncnnc5-cccco5))))))))SC6."} {"text":"The molecule with the SMILES representation of CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1 can also be represented with the DeepSMILES CNCC=O)Ncccccc6Br))))))))))C=O)CNCCNCccccCl)cc6)))))))CC6."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_5-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the DeepSMILES CccccCCcccccc6))))))NCN)=NC5=O)))))))o5?\nAssistant: Of course, this molecule has a SMILES of Cc1ccc(CC2(c3ccccc3)NC(N)=NC2=O)o1."} {"text":"User: Can you generate the SMILES of the molecule with the DeepSMILES CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20?\nAssistant: Of course, this molecule has a SMILES of CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_4-5.jsonl": "{"text":"User: Can you tell me the DeepSMILES of the molecule with the SMILES Cc1cccc(N(C)S(=O)(=O)c2c(C)nn(C)c2C)c1C?\nAssistant: Of course, this molecule has a DeepSMILES of CcccccNC)S=O)=O)ccC)nnC)c5C))))))))c6C."} {"text":"User: Can you tell me the DeepSMILES of the molecule with the SMILES O=C(Nc1c(F)cccc1F)c1cccc(OCC(F)F)n1?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of 
O=CNccF)cccc6F))))))))cccccOCCF)F))))n6."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_4-0.jsonl": "{"text":"The molecule with the DeepSMILES CcnnC)cC)c5S=O)=O)NCCCC5CCcccccc6 can also be represented with the SMILES Cc1nn(C)c(C)c1S(=O)(=O)N1CCCC1CCc1ccccc1."} {"text":"The molecule with the DeepSMILES CcocC)cC=O)NccccCC#N)))cc6))))))))c5C can also be represented with the SMILES Cc1oc(C)c(C(=O)Nc2ccc(CC#N)cc2)c1C."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_5-1.jsonl": "{"text":"The molecule with the SMILES Cc1nc(CN2CCN(c3ccc4nncn4n3)CC2)no1 can also be represented with the DeepSMILES representation CcncCNCCNccccnncn5n9)))))))))CC6)))))))no5."} {"text":"The molecule with the SMILES Nc1c2c([nH+]c3ccccc13)CCCC2 can also be represented with the DeepSMILES Nccc[nH+]cccccc%106)))))))CCCC6."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_2-1.jsonl": "{"text":"The molecule with the SMILES CC(O)c1cc2nc(-c3cccc4[nH]ncc34)nc(N3CCOCC3)c2s1 can also be represented with the DeepSMILES CCO)cccnc-ccccc[nH]ncc95)))))))))ncNCCOCC6))))))c6s9."} {"text":"The molecule with the SMILES CC(C)[C@H](NC(=O)c1ccccc1)C(=O)O can also be represented with the DeepSMILES representation CCC)[C@H]NC=O)cccccc6))))))))C=O)O."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_0-0.jsonl": "{"text":"The molecule with the DeepSMILES OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C can also be represented with the SMILES O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C."} {"text":"The molecule with the DeepSMILES CCC)C)cccC=S)NCCCC5))))))ccCC)C)C))c6O can also be represented with the SMILES CC(C)(C)c1cc(C(=S)N2CCCC2)cc(C(C)(C)C)c1O."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_0-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SMILES.\nSMILES: 
O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SMILES.\nSMILES: CC(C)(C)c1cc(C(=S)N2CCCC2)cc(C(C)(C)C)c1O\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CCC)C)cccC=S)NCCCC5))))))ccCC)C)C))c6O"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_1-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the DeepSMILES CCC=O)NccccNC=O)ccccccBr)cccc%106))))))))))))cc6?\nAssistant: Of course, this molecule has a SMILES of CCC(=O)Nc1ccc(NC(=O)c2cccc3c(Br)cccc23)cc1."} {"text":"User: Can you create the SMILES of the molecule with the DeepSMILES CccncNccccC#N))cc6)))))))nc6CCl)ccccF)cc6?\nAssistant: Of course, this molecule has a SMILES of Cc1cnc(Nc2ccc(C#N)cc2)nc1C(Cl)c1ccc(F)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_2-5.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the SMILES O=C(Cn1c(=O)c2cccn2c2cccnc21)NCCCN1CCC(Cc2ccccc2)CC1?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of O=CCnc=O)ccccn5ccccnc6%13))))))))))))))NCCCNCCCCcccccc6)))))))CC6."} {"text":"User: Can you generate the DeepSMILES of the molecule with the SMILES Cc1csc(NC(=O)c2cccc(S(=O)(=O)N3CCN(C)CC3)c2)n1?\nAssistant: Yes, this molecule has a DeepSMILES of CccscNC=O)cccccS=O)=O)NCCNC)CC6)))))))c6))))))))n5."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_1-1.jsonl": "{"text":"The molecule with the SMILES CCC(=O)Nc1ccc(NC(=O)c2cccc3c(Br)cccc23)cc1 can also be represented with the DeepSMILES representation CCC=O)NccccNC=O)ccccccBr)cccc%106))))))))))))cc6."} {"text":"The 
molecule with the SMILES Cc1cnc(Nc2ccc(C#N)cc2)nc1C(Cl)c1ccc(F)cc1 can also be represented with the DeepSMILES representation CccncNccccC#N))cc6)))))))nc6CCl)ccccF)cc6."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_5-2.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SMILES from the DeepSMILES.\nDeepSMILES: CcncCNCCNccccnncn5n9)))))))))CC6)))))))no5\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Cc1nc(CN2CCN(c3ccc4nncn4n3)CC2)no1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the DeepSMILES.\nDeepSMILES: Nccc[nH+]cccccc%106)))))))CCCC6\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Nc1c2c([nH+]c3ccccc13)CCCC2"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_5-1.jsonl": "{"text":"The molecule with the SMILES representation of Cc1ccc(CC2(c3ccccc3)NC(N)=NC2=O)o1 can also be represented with the DeepSMILES CccccCCcccccc6))))))NCN)=NC5=O)))))))o5."} {"text":"The molecule with the SMILES representation of CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1 can also be represented with the DeepSMILES representation CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_4-1.jsonl": "{"text":"The molecule with the SMILES Cc1nn(C)c(C)c1S(=O)(=O)N1CCCC1C1CCCCC1 can also be represented with the DeepSMILES representation CcnnC)cC)c5S=O)=O)NCCCC5CCCCCC6."} {"text":"The molecule with the SMILES COc1cc(Cc2cnc(N)[nH]c2=O)cc(OC)c1OC can also be represented with the DeepSMILES COcccCccncN)[nH]c6=O))))))))ccOC))c6OC."}", 
"/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_3-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the SMILES.\nSMILES: CCOC(=O)C1=C(C)Nc2ccccc2N\/C1=N\\S(=O)(=O)c1ccc(Cl)cc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCOC=O)C=CC)Ncccccc6N\/C%11=N\\S=O)=O)ccccCl)cc6"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the SMILES.\nSMILES: Cc1nn(C)c(C)c1S(=O)(=O)NCCc1c(F)cccc1F\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CcnnC)cC)c5S=O)=O)NCCccF)cccc6F"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_3-4.jsonl": "{"text":"User: Can you create the SMILES of the molecule with the DeepSMILES Cnc=O)cc[nH]c=O)n5C))))nCcccccc6)))))))c6=O?\nAssistant: Yes, this molecule has a SMILES of Cn1c(=O)c2c([nH]c(=O)n2C)n(Cc2ccccc2)c1=O."} {"text":"User: Can you create the SMILES of the molecule with the DeepSMILES CcnnC)cC)c5S=O)=O)NC)CC)cccF)ccc6F?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of Cc1nn(C)c(C)c1S(=O)(=O)N(C)C(C)c1cc(F)ccc1F."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_1-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the SMILES.\nMolecule SMILES: Cc1ccc(C)c(N2CCN(C(=O)c3ccc4c(=O)n5c(nc4c3)CCCCC5)CC2)c1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CccccC)cNCCNC=O)ccccc=O)ncnc6c%10)))CCCCC7)))))))))))))CC6))))))c6"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the SMILES.\nMolecule SMILES: 
CC[C@H](C)[C@H](NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](C)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](Cc1c[nH]c2ccccc12)NC(=O)[C@@H](CC(N)=O)NC(=O)[C@@H](CCCNC(=N)N)NC(=O)[C@@H](CCCCN)NC(=O)[C@@H](CC(C)C)NC(=O)[C@@H](CCC(=O)O)NC(=O)[C@@H](CC(=O)O)NC(=O)[C@@H](CCC(=O)O)NC(=O)[C@@H](C)NC(=O)[C@@H](N)CC(=O)O)[C@H](C)CC)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)N[C@H](Cc1c[nH]c2ccccc12)C(=O)N[C@H](CCC(N)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@H](CC(=O)O)C(=O)O\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CC[C@H]C)[C@H]NC=O)[C@H]CCCN)=O))))NC=O)[C@H]CCCN)=O))))NC=O)[C@H]C)NC=O)[C@H]CCC=O)O))))NC=O)[C@H]CCCN)=O))))NC=O)[C@@H]NC=O)[C@H]CCC=O)O))))NC=O)[C@H]CCCN)=O))))NC=O)[C@@H]Ccc[nH]cccccc96))))))))))NC=O)[C@@H]CCN)=O)))NC=O)[C@@H]CCCNC=N)N))))))NC=O)[C@@H]CCCCN)))))NC=O)[C@@H]CCC)C)))NC=O)[C@@H]CCC=O)O))))NC=O)[C@@H]CC=O)O)))NC=O)[C@@H]CCC=O)O))))NC=O)[C@@H]C)NC=O)[C@@H]N)CC=O)O)))))))))))))))))))))))))))))))))))))))[C@H]C)CC)))))))))))))))))))))C=O)N[C@@H]CCC)C)))C=O)N[C@@H]CCCN)=O))))C=O)N[C@@H]CCSC))))C=O)N[C@@H]Ccccccc6)))))))C=O)N[C@@H]Ccccccc6)))))))C=O)N[C@@H]CCC=O)O))))C=O)N[C@@H]Ccc[nH]cccccc96))))))))))C=O)N[C@H]Ccc[nH]cccccc96))))))))))C=O)N[C@H]CCCN)=O))))C=O)N[C@@H]CCCCN)))))C=O)N[C@H]CC=O)O)))C=O)O"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_0-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the SMILES from the DeepSMILES.\nMolecule DeepSMILES: S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: 
S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the DeepSMILES.\nMolecule DeepSMILES: CcccC)cc-nccnc5SCC=O)Ncnccs5))))))))))))))c6\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: Cc1cc(C)cc(-n2ccnc2SCC(=O)Nc2nccs2)c1"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_5-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of CcncCNCCNccccnncn5n9)))))))))CC6)))))))no5 can also be represented with the SMILES Cc1nc(CN2CCN(c3ccc4nncn4n3)CC2)no1."} {"text":"The molecule with the DeepSMILES representation of Nccc[nH+]cccccc%106)))))))CCCC6 can also be represented with the SMILES representation Nc1c2c([nH+]c3ccccc13)CCCC2."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_0-1.jsonl": "{"text":"The molecule with the SMILES S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1 can also be represented with the DeepSMILES representation S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6."} {"text":"The molecule with the SMILES Cc1cc(C)cc(-n2ccnc2SCC(=O)Nc2nccs2)c1 can also be represented with the DeepSMILES CcccC)cc-nccnc5SCC=O)Ncnccs5))))))))))))))c6."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_2-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the DeepSMILES Brcccccc6C=Nncnnc5-cccco5))))))))SC6?\nAssistant: Of course, this molecule has a SMILES of Brc1ccccc1C1=Nn2c(nnc2-c2ccco2)SC1."} {"text":"User: Can you tell me the SMILES of the molecule with the DeepSMILES CNCC=O)Ncccccc6Br))))))))))C=O)CNCCNCccccCl)cc6)))))))CC6?\nAssistant: Yes, this molecule has a SMILES of CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1."}", 
"/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_2-1.jsonl": "{"text":"The molecule with the SMILES representation of O=C(Cn1c(=O)c2cccn2c2cccnc21)NCCCN1CCC(Cc2ccccc2)CC1 can also be represented with the DeepSMILES representation O=CCnc=O)ccccn5ccccnc6%13))))))))))))))NCCCNCCCCcccccc6)))))))CC6."} {"text":"The molecule with the SMILES Cc1csc(NC(=O)c2cccc(S(=O)(=O)N3CCN(C)CC3)c2)n1 can also be represented with the DeepSMILES CccscNC=O)cccccS=O)=O)NCCNC)CC6)))))))c6))))))))n5."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_1-1.jsonl": "{"text":"The molecule with the SMILES Cc1ccc(C)c(N2CCN(C(=O)c3ccc4c(=O)n5c(nc4c3)CCCCC5)CC2)c1 can also be represented with the DeepSMILES representation CccccC)cNCCNC=O)ccccc=O)ncnc6c%10)))CCCCC7)))))))))))))CC6))))))c6."} {"text":"The molecule with the SMILES representation of CC[C@H](C)[C@H](NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](C)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](Cc1c[nH]c2ccccc12)NC(=O)[C@@H](CC(N)=O)NC(=O)[C@@H](CCCNC(=N)N)NC(=O)[C@@H](CCCCN)NC(=O)[C@@H](CC(C)C)NC(=O)[C@@H](CCC(=O)O)NC(=O)[C@@H](CC(=O)O)NC(=O)[C@@H](CCC(=O)O)NC(=O)[C@@H](C)NC(=O)[C@@H](N)CC(=O)O)[C@H](C)CC)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)N[C@H](Cc1c[nH]c2ccccc12)C(=O)N[C@H](CCC(N)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@H](CC(=O)O)C(=O)O can also be represented with the DeepSMILES 
CC[C@H]C)[C@H]NC=O)[C@H]CCCN)=O))))NC=O)[C@H]CCCN)=O))))NC=O)[C@H]C)NC=O)[C@H]CCC=O)O))))NC=O)[C@H]CCCN)=O))))NC=O)[C@@H]NC=O)[C@H]CCC=O)O))))NC=O)[C@H]CCCN)=O))))NC=O)[C@@H]Ccc[nH]cccccc96))))))))))NC=O)[C@@H]CCN)=O)))NC=O)[C@@H]CCCNC=N)N))))))NC=O)[C@@H]CCCCN)))))NC=O)[C@@H]CCC)C)))NC=O)[C@@H]CCC=O)O))))NC=O)[C@@H]CC=O)O)))NC=O)[C@@H]CCC=O)O))))NC=O)[C@@H]C)NC=O)[C@@H]N)CC=O)O)))))))))))))))))))))))))))))))))))))))[C@H]C)CC)))))))))))))))))))))C=O)N[C@@H]CCC)C)))C=O)N[C@@H]CCCN)=O))))C=O)N[C@@H]CCSC))))C=O)N[C@@H]Ccccccc6)))))))C=O)N[C@@H]Ccccccc6)))))))C=O)N[C@@H]CCC=O)O))))C=O)N[C@@H]Ccc[nH]cccccc96))))))))))C=O)N[C@H]Ccc[nH]cccccc96))))))))))C=O)N[C@H]CCCN)=O))))C=O)N[C@@H]CCCCN)))))C=O)N[C@H]CC=O)O)))C=O)O."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_3-5.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the SMILES CC(=O)Nc1ccc(S(=O)(=O)N(C)c2ccc(OCC(=O)OCC(=O)Nc3ccc(OC(F)F)cc3)cc2)cc1?\nAssistant: Yes, this molecule has a DeepSMILES of CC=O)NccccS=O)=O)NC)ccccOCC=O)OCC=O)NccccOCF)F)))cc6)))))))))))))cc6))))))))cc6."} {"text":"User: Can you create the DeepSMILES of the molecule with the SMILES Cc1ccc(C(C)N(C)S(=O)(=O)c2c(C)nn(C)c2C)cc1?\nAssistant: Yes, this molecule has a DeepSMILES of CccccCC)NC)S=O)=O)ccC)nnC)c5C)))))))))cc6."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_3-1.jsonl": "{"text":"The molecule with the SMILES CCOC(=O)C1=C(C)Nc2ccccc2N\/C1=N\\S(=O)(=O)c1ccc(Cl)cc1 can also be represented with the DeepSMILES CCOC=O)C=CC)Ncccccc6N\/C%11=N\\S=O)=O)ccccCl)cc6."} {"text":"The molecule with the SMILES representation of Cc1nn(C)c(C)c1S(=O)(=O)NCCc1c(F)cccc1F can also be represented with the DeepSMILES CcnnC)cC)c5S=O)=O)NCCccF)cccc6F."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_0-5.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the SMILES 
S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6."} {"text":"User: Can you create the DeepSMILES of the molecule with the SMILES Cc1cc(C)cc(-n2ccnc2SCC(=O)Nc2nccs2)c1?\nAssistant: Yes, this molecule has a DeepSMILES of CcccC)cc-nccnc5SCC=O)Ncnccs5))))))))))))))c6."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_0-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the DeepSMILES S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1."} {"text":"User: Can you generate the SMILES of the molecule with the DeepSMILES CcccC)cc-nccnc5SCC=O)Ncnccs5))))))))))))))c6?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of Cc1cc(C)cc(-n2ccnc2SCC(=O)Nc2nccs2)c1."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_1-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the SMILES.\nSMILES: CCC(=O)Nc1ccc(NC(=O)c2cccc3c(Br)cccc23)cc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCC=O)NccccNC=O)ccccccBr)cccc%106))))))))))))cc6"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SMILES.\nMolecule SMILES: Cc1cnc(Nc2ccc(C#N)cc2)nc1C(Cl)c1ccc(F)cc1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CccncNccccC#N))cc6)))))))nc6CCl)ccccF)cc6"}", 
"/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_5-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the SMILES.\nMolecule SMILES: Cc1ccc(CC2(c3ccccc3)NC(N)=NC2=O)o1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CccccCCcccccc6))))))NCN)=NC5=O)))))))o5"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the SMILES.\nSMILES: CC(O)C(CO)NC(=O)C1CSSCC(NC(=O)C(N)Cc2ccccc2)C(=O)NC(Cc2ccccc2)C(=O)NC(Cc2c[nH]c3ccccc23)C(=O)NC(CCCC[NH3+])C(=O)NC(C(C)O)C(=O)N1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: CCO)CCO))NC=O)CCSSCCNC=O)CN)Ccccccc6))))))))))C=O)NCCcccccc6)))))))C=O)NCCcc[nH]cccccc96))))))))))C=O)NCCCCC[NH3+])))))C=O)NCCC)O))C=O)N%20"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_0-5.jsonl": "{"text":"User: Can you tell me the DeepSMILES of the molecule with the SMILES O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C."} {"text":"User: Can you generate the DeepSMILES of the molecule with the SMILES CC(C)(C)c1cc(C(=S)N2CCCC2)cc(C(C)(C)C)c1O?\nAssistant: Sure, this molecule has a DeepSMILES of CCC)C)cccC=S)NCCCC5))))))ccCC)C)C))c6O."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_4-5.jsonl": "{"text":"User: Can you tell me the DeepSMILES of the molecule with the SMILES Cc1nn(C)c(C)c1S(=O)(=O)N1CCCC1C1CCCCC1?\nAssistant: Of course, this molecule has a DeepSMILES of CcnnC)cC)c5S=O)=O)NCCCC5CCCCCC6."} {"text":"User: Can you generate the DeepSMILES of the molecule with the SMILES 
COc1cc(Cc2cnc(N)[nH]c2=O)cc(OC)c1OC?\nAssistant: Of course, this molecule has a DeepSMILES of COcccCccncN)[nH]c6=O))))))))ccOC))c6OC."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_1-0.jsonl": "{"text":"The molecule with the DeepSMILES CCC=O)NccccNC=O)ccccccBr)cccc%106))))))))))))cc6 can also be represented with the SMILES representation CCC(=O)Nc1ccc(NC(=O)c2cccc3c(Br)cccc23)cc1."} {"text":"The molecule with the DeepSMILES representation of CccncNccccC#N))cc6)))))))nc6CCl)ccccF)cc6 can also be represented with the SMILES representation Cc1cnc(Nc2ccc(C#N)cc2)nc1C(Cl)c1ccc(F)cc1."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_0-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the DeepSMILES.\nDeepSMILES: OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the DeepSMILES.\nDeepSMILES: CCC)C)cccC=S)NCCCC5))))))ccCC)C)C))c6O\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CC(C)(C)c1cc(C(=S)N2CCCC2)cc(C(C)(C)C)c1O"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_4-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the SMILES from the DeepSMILES.\nMolecule DeepSMILES: CcnnC)cC)c5S=O)=O)NCCCC5CCCCCC6\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: Cc1nn(C)c(C)c1S(=O)(=O)N1CCCC1C1CCCCC1"} 
{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the DeepSMILES.\nMolecule DeepSMILES: COcccCccncN)[nH]c6=O))))))))ccOC))c6OC\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: COc1cc(Cc2cnc(N)[nH]c2=O)cc(OC)c1OC"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_4-1.jsonl": "{"text":"The molecule with the SMILES representation of Cc1nn(C)c(C)c1S(=O)(=O)N1CCCC1CCc1ccccc1 can also be represented with the DeepSMILES CcnnC)cC)c5S=O)=O)NCCCC5CCcccccc6."} {"text":"The molecule with the SMILES representation of Cc1oc(C)c(C(=O)Nc2ccc(CC#N)cc2)c1C can also be represented with the DeepSMILES representation CcocC)cC=O)NccccCC#N)))cc6))))))))c5C."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_5-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the SMILES.\nMolecule SMILES: O=C(NCCN1CCCCCC1=O)Nc1ccccn1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: O=CNCCNCCCCCC7=O)))))))))))Ncccccn6"} {"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the SMILES.\nSMILES: CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_2-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the DeepSMILES.\nDeepSMILES: 
CCO)cccnc-ccccc[nH]ncc95)))))))))ncNCCOCC6))))))c6s9\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC(O)c1cc2nc(-c3cccc4[nH]ncc34)nc(N3CCOCC3)c2s1"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the SMILES from the DeepSMILES.\nDeepSMILES: CCC)[C@H]NC=O)cccccc6))))))))C=O)O\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CC(C)[C@H](NC(=O)c1ccccc1)C(=O)O"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_1-1.jsonl": "{"text":"The molecule with the SMILES representation of O=C(CCC1CCCN(C(=O)c2ccc3c(c2)OCO3)C1)N1CCN(c2ccccn2)CC1 can also be represented with the DeepSMILES O=CCCCCCCNC=O)cccccc6)OCO5)))))))))C6))))))))NCCNcccccn6))))))CC6."} {"text":"The molecule with the SMILES CCC(C)C(NS(=O)(=O)c1ccc(C)cc1)C(=O)CCl can also be represented with the DeepSMILES representation CCCC)CNS=O)=O)ccccC)cc6))))))))C=O)CCl."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_0-1.jsonl": "{"text":"The molecule with the SMILES representation of O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C can also be represented with the DeepSMILES representation OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C."} {"text":"The molecule with the SMILES CC(C)(C)c1cc(C(=S)N2CCCC2)cc(C(C)(C)C)c1O can also be represented with the DeepSMILES representation CCC)C)cccC=S)NCCCC5))))))ccCC)C)C))c6O."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_3-4.jsonl": "{"text":"User: Can you generate the SMILES of the molecule with the DeepSMILES CCOC=O)C=CC)Ncccccc6N\/C%11=N\\S=O)=O)ccccCl)cc6?\nAssistant: Of course, this molecule has a SMILES of CCOC(=O)C1=C(C)Nc2ccccc2N\/C1=N\\S(=O)(=O)c1ccc(Cl)cc1."} {"text":"User: Can you create the SMILES of the molecule with the DeepSMILES CcnnC)cC)c5S=O)=O)NCCccF)cccc6F?\nAssistant: Yes, 
this molecule has a SMILES of Cc1nn(C)c(C)c1S(=O)(=O)NCCc1c(F)cccc1F."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_0-4.jsonl": "{"text":"User: Can you tell me the SMILES of the molecule with the DeepSMILES OCC[C@@H]NC=O)[C@@H]Ccccccccc6nc%10N))))))-cccccc6C)))))))))))))C))))CC6C)C?\nAssistant: Yes, this molecule has a SMILES of O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C."} {"text":"User: Can you tell me the SMILES of the molecule with the DeepSMILES CCC)C)cccC=S)NCCCC5))))))ccCC)C)C))c6O?\nAssistant: Of course, this molecule has a SMILES of CC(C)(C)c1cc(C(=S)N2CCCC2)cc(C(C)(C)C)c1O."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_1-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of CccccC)cNCCNC=O)ccccc=O)ncnc6c%10)))CCCCC7)))))))))))))CC6))))))c6 can also be represented with the SMILES representation Cc1ccc(C)c(N2CCN(C(=O)c3ccc4c(=O)n5c(nc4c3)CCCCC5)CC2)c1."} {"text":"The molecule with the DeepSMILES CC[C@H]C)[C@H]NC=O)[C@H]CCCN)=O))))NC=O)[C@H]CCCN)=O))))NC=O)[C@H]C)NC=O)[C@H]CCC=O)O))))NC=O)[C@H]CCCN)=O))))NC=O)[C@@H]NC=O)[C@H]CCC=O)O))))NC=O)[C@H]CCCN)=O))))NC=O)[C@@H]Ccc[nH]cccccc96))))))))))NC=O)[C@@H]CCN)=O)))NC=O)[C@@H]CCCNC=N)N))))))NC=O)[C@@H]CCCCN)))))NC=O)[C@@H]CCC)C)))NC=O)[C@@H]CCC=O)O))))NC=O)[C@@H]CC=O)O)))NC=O)[C@@H]CCC=O)O))))NC=O)[C@@H]C)NC=O)[C@@H]N)CC=O)O)))))))))))))))))))))))))))))))))))))))[C@H]C)CC)))))))))))))))))))))C=O)N[C@@H]CCC)C)))C=O)N[C@@H]CCCN)=O))))C=O)N[C@@H]CCSC))))C=O)N[C@@H]Ccccccc6)))))))C=O)N[C@@H]Ccccccc6)))))))C=O)N[C@@H]CCC=O)O))))C=O)N[C@@H]Ccc[nH]cccccc96))))))))))C=O)N[C@H]Ccc[nH]cccccc96))))))))))C=O)N[C@H]CCCN)=O))))C=O)N[C@@H]CCCCN)))))C=O)N[C@H]CC=O)O)))C=O)O can also be represented with the SMILES 
CC[C@H](C)[C@H](NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](C)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](Cc1c[nH]c2ccccc12)NC(=O)[C@@H](CC(N)=O)NC(=O)[C@@H](CCCNC(=N)N)NC(=O)[C@@H](CCCCN)NC(=O)[C@@H](CC(C)C)NC(=O)[C@@H](CCC(=O)O)NC(=O)[C@@H](CC(=O)O)NC(=O)[C@@H](CCC(=O)O)NC(=O)[C@@H](C)NC(=O)[C@@H](N)CC(=O)O)[C@H](C)CC)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)N[C@H](Cc1c[nH]c2ccccc12)C(=O)N[C@H](CCC(N)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@H](CC(=O)O)C(=O)O."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_4-5.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the SMILES Cc1nn(C)c(C)c1S(=O)(=O)N1CCCC1CCc1ccccc1?\nAssistant: Yes, this molecule has a DeepSMILES of CcnnC)cC)c5S=O)=O)NCCCC5CCcccccc6."} {"text":"User: Can you create the DeepSMILES of the molecule with the SMILES Cc1oc(C)c(C(=O)Nc2ccc(CC#N)cc2)c1C?\nAssistant: Yes, I'm happy to help, this molecule has a DeepSMILES of CcocC)cC=O)NccccCC#N)))cc6))))))))c5C."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_4-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the SMILES.\nMolecule SMILES: Cc1nn(C)c(C)c1S(=O)(=O)N1CCCC1C1CCCCC1\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CcnnC)cC)c5S=O)=O)NCCCC5CCCCCC6"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SMILES.\nSMILES: COc1cc(Cc2cnc(N)[nH]c2=O)cc(OC)c1OC\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: COcccCccncN)[nH]c6=O))))))))ccOC))c6OC"}", 
"/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_4-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the SMILES.\nMolecule SMILES: Cc1cccc(N(C)S(=O)(=O)c2c(C)nn(C)c2C)c1C\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CcccccNC)S=O)=O)ccC)nnC)c5C))))))))c6C"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SMILES.\nSMILES: O=C(Nc1c(F)cccc1F)c1cccc(OCC(F)F)n1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: O=CNccF)cccc6F))))))))cccccOCCF)F))))n6"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_3-1.jsonl": "{"text":"The molecule with the SMILES representation of Cn1c(=O)c2c([nH]c(=O)n2C)n(Cc2ccccc2)c1=O can also be represented with the DeepSMILES representation Cnc=O)cc[nH]c=O)n5C))))nCcccccc6)))))))c6=O."} {"text":"The molecule with the SMILES Cc1nn(C)c(C)c1S(=O)(=O)N(C)C(C)c1cc(F)ccc1F can also be represented with the DeepSMILES CcnnC)cC)c5S=O)=O)NC)CC)cccF)ccc6F."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_2-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the SMILES.\nMolecule SMILES: Brc1ccccc1C1=Nn2c(nnc2-c2ccco2)SC1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Brcccccc6C=Nncnnc5-cccco5))))))))SC6"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SMILES.\nMolecule SMILES: CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1\nConstraint: Even if you are uncertain, you must answer with a representation without using any other 
words.\nResult: CNCC=O)Ncccccc6Br))))))))))C=O)CNCCNCccccCl)cc6)))))))CC6"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_3-1.jsonl": "{"text":"The molecule with the SMILES CC(=O)Nc1ccc(S(=O)(=O)N(C)c2ccc(OCC(=O)OCC(=O)Nc3ccc(OC(F)F)cc3)cc2)cc1 can also be represented with the DeepSMILES CC=O)NccccS=O)=O)NC)ccccOCC=O)OCC=O)NccccOCF)F)))cc6)))))))))))))cc6))))))))cc6."} {"text":"The molecule with the SMILES representation of Cc1ccc(C(C)N(C)S(=O)(=O)c2c(C)nn(C)c2C)cc1 can also be represented with the DeepSMILES CccccCC)NC)S=O)=O)ccC)nnC)c5C)))))))))cc6."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_0-3.jsonl": "{"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the SMILES.\nMolecule SMILES: S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1\nConstraint: Even if you are not sure, you must answer with a representation without using any other words.\nResult: S=O)=O)C[C@@H]CcccO[C@H]COCC))))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SMILES.\nMolecule SMILES: Cc1cc(C)cc(-n2ccnc2SCC(=O)Nc2nccs2)c1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CcccC)cc-nccnc5SCC=O)Ncnccs5))))))))))))))c6"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_1-5.jsonl": "{"text":"User: Can you generate the DeepSMILES of the molecule with the SMILES Cc1ccc(C)c(N2CCN(C(=O)c3ccc4c(=O)n5c(nc4c3)CCCCC5)CC2)c1?\nAssistant: Yes, this molecule has a DeepSMILES of CccccC)cNCCNC=O)ccccc=O)ncnc6c%10)))CCCCC7)))))))))))))CC6))))))c6."} {"text":"User: Can you tell me the DeepSMILES of the molecule with the SMILES 
CC[C@H](C)[C@H](NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](C)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](Cc1c[nH]c2ccccc12)NC(=O)[C@@H](CC(N)=O)NC(=O)[C@@H](CCCNC(=N)N)NC(=O)[C@@H](CCCCN)NC(=O)[C@@H](CC(C)C)NC(=O)[C@@H](CCC(=O)O)NC(=O)[C@@H](CC(=O)O)NC(=O)[C@@H](CCC(=O)O)NC(=O)[C@@H](C)NC(=O)[C@@H](N)CC(=O)O)[C@H](C)CC)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)N[C@H](Cc1c[nH]c2ccccc12)C(=O)N[C@H](CCC(N)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@H](CC(=O)O)C(=O)O?\nAssistant: Yes, this molecule has a DeepSMILES of CC[C@H]C)[C@H]NC=O)[C@H]CCCN)=O))))NC=O)[C@H]CCCN)=O))))NC=O)[C@H]C)NC=O)[C@H]CCC=O)O))))NC=O)[C@H]CCCN)=O))))NC=O)[C@@H]NC=O)[C@H]CCC=O)O))))NC=O)[C@H]CCCN)=O))))NC=O)[C@@H]Ccc[nH]cccccc96))))))))))NC=O)[C@@H]CCN)=O)))NC=O)[C@@H]CCCNC=N)N))))))NC=O)[C@@H]CCCCN)))))NC=O)[C@@H]CCC)C)))NC=O)[C@@H]CCC=O)O))))NC=O)[C@@H]CC=O)O)))NC=O)[C@@H]CCC=O)O))))NC=O)[C@@H]C)NC=O)[C@@H]N)CC=O)O)))))))))))))))))))))))))))))))))))))))[C@H]C)CC)))))))))))))))))))))C=O)N[C@@H]CCC)C)))C=O)N[C@@H]CCCN)=O))))C=O)N[C@@H]CCSC))))C=O)N[C@@H]Ccccccc6)))))))C=O)N[C@@H]Ccccccc6)))))))C=O)N[C@@H]CCC=O)O))))C=O)N[C@@H]Ccc[nH]cccccc96))))))))))C=O)N[C@H]Ccc[nH]cccccc96))))))))))C=O)N[C@H]CCCN)=O))))C=O)N[C@@H]CCCCN)))))C=O)N[C@H]CC=O)O)))C=O)O."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_4-1.jsonl": "{"text":"The molecule with the SMILES Cc1cccc(N(C)S(=O)(=O)c2c(C)nn(C)c2C)c1C can also be represented with the DeepSMILES CcccccNC)S=O)=O)ccC)nnC)c5C))))))))c6C."} {"text":"The molecule with the SMILES representation of O=C(Nc1c(F)cccc1F)c1cccc(OCC(F)F)n1 can also be represented with the DeepSMILES representation O=CNccF)cccc6F))))))))cccccOCCF)F))))n6."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_5-4.jsonl": "{"text":"User: Can you create the SMILES of the 
molecule with the DeepSMILES O=CNCCNCCCCCC7=O)))))))))))Ncccccn6?\nAssistant: Yes, this molecule has a SMILES of O=C(NCCN1CCCCCC1=O)Nc1ccccn1."} {"text":"User: Can you generate the SMILES of the molecule with the DeepSMILES CC#CCCC)CO)\/C=C\/CCO)CCC\/C=C\/CCCC=O)[O-]))))))CC58?\nAssistant: Of course, this molecule has a SMILES of CC#CCC(C)C(O)\/C=C\/C1C(O)CC2C\/C(=C\/CCCC(=O)[O-])CC21."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/valid_2-5.jsonl": "{"text":"User: Can you create the DeepSMILES of the molecule with the SMILES Brc1ccccc1C1=Nn2c(nnc2-c2ccco2)SC1?\nAssistant: Of course, this molecule has a DeepSMILES of Brcccccc6C=Nncnnc5-cccco5))))))))SC6."} {"text":"User: Can you generate the DeepSMILES of the molecule with the SMILES CN(CC(=O)Nc1ccccc1Br)C(=O)CN1CCN(Cc2ccc(Cl)cc2)CC1?\nAssistant: Yes, this molecule has a DeepSMILES of CNCC=O)Ncccccc6Br))))))))))C=O)CNCCNCccccCl)cc6)))))))CC6."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_1-2.jsonl": "{"text":"Task: Please create a molecule representation based on the input molecule representation and the description.\nDescription: Generate the SMILES from the DeepSMILES.\nMolecule DeepSMILES: O=CCCCCCCNC=O)cccccc6)OCO5)))))))))C6))))))))NCCNcccccn6))))))CC6\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: O=C(CCC1CCCN(C(=O)c2ccc3c(c2)OCO3)C1)N1CCN(c2ccccn2)CC1"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SMILES from the DeepSMILES.\nDeepSMILES: CCCC)CNS=O)=O)ccccC)cc6))))))))C=O)CCl\nConstraint: Even if you are not sure, you must answer with a representation without using any additional words.\nResult: CCC(C)C(NS(=O)(=O)c1ccc(C)cc1)C(=O)CCl"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_0-4.jsonl": "{"text":"User: Can you create the SMILES of the molecule with the DeepSMILES 
S=O)C[C@@H]CcccOCCF)F)F))CF)F)F))))cN)cF)c6)))))))[C@H]O)[C@@H][NH2+]Ccccccc6)))CC)C)C)))))))C6?\nAssistant: Yes, I'm happy to help, this molecule has a SMILES of S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1."} {"text":"User: Can you tell me the SMILES of the molecule with the DeepSMILES O=ccccccc6nc\/C=C\\cccccc6[N+]=O)[O-]))))))))))n%10-cccccc6O?\nAssistant: Yes, this molecule has a SMILES of O=c1c2ccccc2nc(\/C=C\\c2ccccc2[N+](=O)[O-])n1-c1ccccc1O."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_3-0.jsonl": "{"text":"The molecule with the DeepSMILES Cnc=O)cc[nH]c=O)n5C))))nCcccccc6)))))))c6=O can also be represented with the SMILES representation Cn1c(=O)c2c([nH]c(=O)n2C)n(Cc2ccccc2)c1=O."} {"text":"The molecule with the DeepSMILES representation of CcnnC)cC)c5S=O)=O)NC)CC)cccF)ccc6F can also be represented with the SMILES representation Cc1nn(C)c(C)c1S(=O)(=O)N(C)C(C)c1cc(F)ccc1F."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/test_2-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Generate the DeepSMILES from the SMILES.\nMolecule SMILES: CC(O)c1cc2nc(-c3cccc4[nH]ncc34)nc(N3CCOCC3)c2s1\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: CCO)cccnc-ccccc[nH]ncc95)))))))))ncNCCOCC6))))))c6s9"} {"text":"Task: Please generate a molecule representation based on the input molecule representation and the description.\nDescription: Create the DeepSMILES from the SMILES.\nMolecule SMILES: CC(C)[C@H](NC(=O)c1ccccc1)C(=O)O\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CCC)[C@H]NC=O)cccccc6))))))))C=O)O"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_4-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of 
CcnnC)cC)c5S=O)=O)NCCCC5CCCCCC6 can also be represented with the SMILES Cc1nn(C)c(C)c1S(=O)(=O)N1CCCC1C1CCCCC1."} {"text":"The molecule with the DeepSMILES representation of COcccCccncN)[nH]c6=O))))))))ccOC))c6OC can also be represented with the SMILES COc1cc(Cc2cnc(N)[nH]c2=O)cc(OC)c1OC."}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_3-2.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the SMILES from the DeepSMILES.\nDeepSMILES: Cnc=O)cc[nH]c=O)n5C))))nCcccccc6)))))))c6=O\nConstraint: Even if you are uncertain, you must answer with a representation without using any additional words.\nResult: Cn1c(=O)c2c([nH]c(=O)n2C)n(Cc2ccccc2)c1=O"} {"text":"Task: Please create a molecule representation based on the description.\nDescription: Generate the SMILES from the DeepSMILES.\nDeepSMILES: CcnnC)cC)c5S=O)=O)NC)CC)cccF)ccc6F\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: Cc1nn(C)c(C)c1S(=O)(=O)N(C)C(C)c1cc(F)ccc1F"}", "/scratch/micpie/export/mol_repr_transl_smiles_deepsmiles/train_3-3.jsonl": "{"text":"Task: Please generate a molecule representation based on the description.\nDescription: Create the DeepSMILES from the SMILES.\nMolecule SMILES: Cn1c(=O)c2c([nH]c(=O)n2C)n(Cc2ccccc2)c1=O\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: Cnc=O)cc[nH]c=O)n5C))))nCcccccc6)))))))c6=O"} {"text":"Task: Please generate a molecule representation based on the description.\nDescription: Generate the DeepSMILES from the SMILES.\nMolecule SMILES: Cc1nn(C)c(C)c1S(=O)(=O)N(C)C(C)c1cc(F)ccc1F\nConstraint: Even if you are uncertain, you must answer with a representation without using any other words.\nResult: CcnnC)cC)c5S=O)=O)NC)CC)cccF)ccc6F"}", "/scratch/micpie/export/drug_protein_go_term/test_0-1.jsonl": "{"text":"The drug 
[H][C@@]CC[C@H]O)[C@@]5C)CC[C@][H])C=CC=CO)C=C6C[C@@H]CCCCCCCCCS=O)CCCCF)F)CF)F)F))))))))))))))))[C@@]%17%10[H] targets the protein ESR1. The protein ESR1 is located in the transcription preinitiation complex."} {"text":"The drug NC1=NC2=C(NC=C2)C(=O)N1 targets the protein HPRT1. The protein HPRT1 enables the nucleotide binding."}", "/scratch/micpie/export/drug_protein_go_term/valid_0-0.jsonl": "{"text":"The drug Caffeine targets the protein PDE6C which is located in the membrane."} {"text":"The drug CC=CSC=C5)))C=CCCNCCC[C@H]C6)CO)=O))))))))))C=CC)C=CS5 targets the protein SLC6A1 which enables the gamma-aminobutyric acid:sodium symporter activity."}", "/scratch/micpie/export/drug_protein_go_term/test_0-2.jsonl": "{"text":"User: Can you come up with one example for a protein that binds the drug Fulvestrant?\nAssistant: Yes, of course, the drug Fulvestrant targets the protein ESR1.\nUser: Can you tell me more about protein ESR1?\nAssistant: Sure, the protein ESR1 is located in the transcription preinitiation complex."} {"text":"User: Can you give me one example for a protein that binds the drug InChI=1S\/C6H6N4O\/c7-6-9-3-1-2-8-4(3)5(11)10-6\/h1-2,8H,(H3,7,9,10,11)?\nAssistant: Yes, of course, the drug InChI=1S\/C6H6N4O\/c7-6-9-3-1-2-8-4(3)5(11)10-6\/h1-2,8H,(H3,7,9,10,11) targets the protein HPRT1.\nUser: Can you tell me more details about protein HPRT1?\nAssistant: Yes, the protein HPRT1 enables the nucleotide binding."}", "/scratch/micpie/export/drug_protein_go_term/test_0-0.jsonl": "{"text":"The drug Fulvestrant targets the protein ESR1 which is located in the transcription preinitiation complex."} {"text":"The drug 9-Deazaguanine targets the protein HPRT1 which enables the nucleotide binding."}", "/scratch/micpie/export/drug_protein_go_term/train_0-0.jsonl": "{"text":"The drug InChI=1S\/C14H16N2O5\/c17-11(5-8-14(20)21)15-9-1-3-10(4-2-9)16-12(18)6-7-13(16)19\/h1-4,11,15,17H,5-8H2,(H,20,21)\/t11-\/m0\/s1 targets the protein MYL12A which is located in the myosin 
II complex."} {"text":"The drug Histidine targets the protein HAL which enables the catalytic activity."}", "/scratch/micpie/export/drug_protein_go_term/valid_0-2.jsonl": "{"text":"User: Can you give me one example for a protein that binds the drug CNC=NC=C5C=O)NC)C=O)N6C?\nAssistant: Yes, of course, the drug CNC=NC=C5C=O)NC)C=O)N6C targets the protein PDE6C.\nUser: Can you tell me more about protein PDE6C?\nAssistant: Yes, of course, the protein PDE6C is located in the membrane."} {"text":"User: Can you come up with one example for a protein that binds the drug Cc1ccsc1C(=CCCN1CCC[C@@H](C(=O)O)C1)c1sccc1C?\nAssistant: Yes, of course, the drug Cc1ccsc1C(=CCCN1CCC[C@@H](C(=O)O)C1)c1sccc1C targets the protein SLC6A1.\nUser: Can you tell me more about protein SLC6A1?\nAssistant: Yes, of course, the protein SLC6A1 enables the gamma-aminobutyric acid:sodium symporter activity."}", "/scratch/micpie/export/drug_protein_go_term/valid_0-1.jsonl": "{"text":"The drug CN1C=NC2=C1C(=O)N(C)C(=O)N2C targets the protein PDE6C. The protein PDE6C is located in the membrane."} {"text":"The drug Cc1ccsc1C(=CCCN1CCC[C@@H](C(=O)O)C1)c1sccc1C targets the protein SLC6A1. 
The protein SLC6A1 enables the gamma-aminobutyric acid:sodium symporter activity."}", "/scratch/micpie/export/drug_protein_go_term/train_0-2.jsonl": "{"text":"User: Can you give me one example for a protein that binds the drug InChI=1S\/C14H16N2O5\/c17-11(5-8-14(20)21)15-9-1-3-10(4-2-9)16-12(18)6-7-13(16)19\/h1-4,11,15,17H,5-8H2,(H,20,21)\/t11-\/m0\/s1?\nAssistant: Sure, the drug InChI=1S\/C14H16N2O5\/c17-11(5-8-14(20)21)15-9-1-3-10(4-2-9)16-12(18)6-7-13(16)19\/h1-4,11,15,17H,5-8H2,(H,20,21)\/t11-\/m0\/s1 targets the protein MYL12A.\nUser: Can you tell me more about protein MYL12A?\nAssistant: Sure, the protein MYL12A is located in the myosin II complex."} {"text":"User: Can you come up with an example for a protein that binds the drug N[C@@H](CC1=CNC=N1)C(O)=O?\nAssistant: Of course, the drug N[C@@H](CC1=CNC=N1)C(O)=O targets the protein HAL.\nUser: Can you tell me more details about protein HAL?\nAssistant: Of course, the protein HAL enables the catalytic activity."}", "/scratch/micpie/export/drug_protein_go_term/train_0-1.jsonl": "{"text":"The drug InChI=1S\/C14H16N2O5\/c17-11(5-8-14(20)21)15-9-1-3-10(4-2-9)16-12(18)6-7-13(16)19\/h1-4,11,15,17H,5-8H2,(H,20,21)\/t11-\/m0\/s1 targets the protein MYL12A. The protein MYL12A is located in the myosin II complex."} {"text":"The drug Histidine targets the protein HAL. 
The protein HAL enables the catalytic activity."}", "/scratch/micpie/export/chebi_chebi/valid_0-0.jsonl": "{"text":"The Ser-Thr-Ala is a peptide."} {"text":"The 2,5-Dimethyl-3-mercaptotetrahydrofuran is a oxolanes."}", "/scratch/micpie/export/chebi_chebi/test_0-0.jsonl": "{"text":"The Arg-Pro-Lys is a oligopeptide."} {"text":"The Trp-Ile-Asn is a peptide."}", "/scratch/micpie/export/chebi_chebi/train_0-0.jsonl": "{"text":"The 2-[(1S,3R,4aR,9aS)-1-(hydroxymethyl)-6-[[oxo-[4-(trifluoromethyl)anilino]methyl]amino]-3,4,4a,9a-tetrahydro-1H-pyrano[3,4-b]benzofuran-3-yl]-N-(2-methoxyethyl)acetamide is a ureas."} {"text":"The histrelin acetate has the part histrelin."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/test_4-0.jsonl": "{"text":"The compound InChI=1S\/C25H29N7O3\/c1-26-25(34)28-16-4-2-15(3-5-16)22-29-23(31-13-19-10-11-20(14-31)35-19)21-12-27-32(24(21)30-22)17-6-8-18(33)9-7-17\/h2-5,12,17,19-20H,6-11,13-14H2,1H3,(H2,26,28,34) targets the protein PI3Kalpha. The protein PI3Kalpha is involved in Aldosterone-regulated sodium reabsorption. The Aldosterone-regulated sodium reabsorption is modulated by the disease Renal tubular acidosis."} {"text":"The compound InChI=1S\/C18H13FN8\/c19-12-7-20-18(26-11-2-4-14-16(6-11)24-9-22-14)27-17(12)25-10-1-3-13-15(5-10)23-8-21-13\/h1-9H,(H,21,23)(H,22,24)(H2,20,25,26,27) targets the protein Cell division protein kinase 1. The protein Cell division protein kinase 1 is involved in Gap junction. The Gap junction is modulated by the disease Hypoplastic left heart syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/valid_5-0.jsonl": "{"text":"The compound InChI=1S\/C20H29N7O\/c1-6-15(20(4,5)28)24-19-25-17(22-11-14-8-7-9-21-10-14)16-18(26-19)27(12-23-16)13(2)3\/h7-10,12-13,15,28H,6,11H2,1-5H3,(H2,22,24,25,26)\/t15-\/m1\/s1 targets the protein Cyclin-dependent kinase 1. The protein Cyclin-dependent kinase 1 is involved in Gap junction. 
The Gap junction is modulated by the disease Hypoplastic left heart syndrome."} {"text":"The compound CCc1c(F)cncc1-c1cc2c(c(F)c1F)-n1c(C)nnc1CC2 targets the protein Cytochrome P-450Aldo. The protein Cytochrome P-450Aldo is involved in Metabolic pathways. The Metabolic pathways is modulated by the disease Desmosterolosis."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/test_9-0.jsonl": "{"text":"The compound [C][C][=C][C][=C][C][=C][Branch2][Branch1][N][C][=Branch1][C][=O][N][C][Branch2][Ring2][N][C][=Branch1][C][=O][N][C@@H1][Branch2][Ring1][#Branch2][C][C][C][N][C][C][C][Branch1][=N][N][Branch1][C][C][C][C][C][O][C][C][Ring1][=Branch1][C][C][Ring1][=C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][P][S][C][Ring2][Ring2][=Branch2][=C][Ring2][Ring2][=N] targets the protein Substance-K receptor. The protein Substance-K receptor is involved in Calcium signaling pathway. The Calcium signaling pathway is modulated by the disease Hemiplegic migraine."} {"text":"The compound Brcccc[nH]ncc5c9 targets the protein N-NOS. The protein N-NOS is involved in Arginine and proline metabolism. The Arginine and proline metabolism is modulated by the disease Snyder-Robinson syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/valid_3-0.jsonl": "{"text":"The compound [C][C][=C][C][=C][Branch2][Ring1][=C][C][=N][N][Branch1][#Branch1][C][=Branch1][C][=O][C][Br][C][Branch1][=N][C][=C][C][Branch1][C][Cl][=C][C][=C][Ring1][#Branch1][O][C][Ring1][P][C][=C][Ring2][Ring1][#Branch1] targets the protein Amine oxidase [flavin-containing] A. The protein Amine oxidase [flavin-containing] A is involved in Glycine, serine and threonine metabolism. The Glycine, serine and threonine metabolism is modulated by the disease Guanidinoacetate methyltransferase deficiency."} {"text":"The compound Oc1cccc(-c2nc(N3CCCCC3)c3nc[nH]c3n2)c1 targets the protein Phosphatidylinositol 4,5-bisphosphate 3-kinase 110 kDa catalytic subunit alpha. 
The protein Phosphatidylinositol 4,5-bisphosphate 3-kinase 110 kDa catalytic subunit alpha is involved in Aldosterone-regulated sodium reabsorption. The Aldosterone-regulated sodium reabsorption is modulated by the disease Renal tubular acidosis."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/test_5-0.jsonl": "{"text":"The compound InChI=1S\/C17H18ClN7O\/c18-12-3-1-4-13(10-12)22-17-24-15(23-16(19)25-17)11-5-7-21-14(9-11)20-6-2-8-26\/h1,3-5,7,9-10,26H,2,6,8H2,(H,20,21)(H3,19,22,23,24,25) targets the protein p34 protein kinase. The protein p34 protein kinase is involved in Gap junction. The Gap junction is modulated by the disease Hypoplastic left heart syndrome."} {"text":"The compound [C][=C][C][=C][Branch2][Ring1][=Branch1][C][=C][C][=C][C][=C][Ring1][=Branch1][C@H1][C][C][C][C][=C][N][=C][N][Ring1][Branch1][Ring1][=Branch2][C][=C][Ring2][Ring1][Branch1] targets the protein Steroid 11-beta-hydroxylase, CYP11B2. The protein Steroid 11-beta-hydroxylase, CYP11B2 is involved in Metabolic pathways. The Metabolic pathways is modulated by the disease Desmosterolosis."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/test_2-0.jsonl": "{"text":"The compound InChI=1S\/C20H17N5O3S2\/c1-12-3-6-15(7-4-12)30(27,28)25-19-21-10-9-16(23-19)14-5-8-17-18(11-14)29-20(24-17)22-13(2)26\/h3-11H,1-2H3,(H,21,23,25)(H,22,24,26) targets the protein Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform. The protein Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform is involved in JAK-STAT signaling pathway. The JAK-STAT signaling pathway is modulated by the disease Leptin receptor deficiency."} {"text":"The compound c1ccc(CCOc2ccc3c(c2)OCO3)cc1 targets the protein MAO-A. The protein MAO-A is involved in Glycine, serine and threonine metabolism. 
The Glycine, serine and threonine metabolism is modulated by the disease Guanidinoacetate methyltransferase deficiency."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/valid_0-0.jsonl": "{"text":"The compound InChI=1S\/C19H17NO\/c1-19(2)11-17-16(18(21)12-19)10-15(13-20-17)9-8-14-6-4-3-5-7-14\/h3-7,10,13H,11-12H2,1-2H3 targets the protein mGluR5. The protein mGluR5 is involved in Neuroactive ligand-receptor interaction. The Neuroactive ligand-receptor interaction is modulated by the disease Genetic obesity."} {"text":"The compound [C][C][=C][C][=C][Branch2][Ring2][P][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch2][Ring1][#C][C][=N][S][C][Branch1][P][N][C][=Branch1][C][=O][N][C][C][C][N][C][C][C][C][Ring1][Branch1][=C][Ring1][P][C][Branch1][C][N][=O][C][=C][Ring2][Ring1][#Branch2][C][=C][Ring2][Ring2][Ring2] targets the protein CD antigen CD140b. The protein CD antigen CD140b is involved in JAK-STAT signaling pathway. The JAK-STAT signaling pathway is modulated by the disease Castleman disease."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/test_7-0.jsonl": "{"text":"The compound O=CCCccSSc[nH]cccccc6c9CCC=O)Ncccccc6)))))))))))))))))))))[nH]cccccc96)))))))))))Ncccccc6 targets the protein p60-Src. The protein p60-Src is involved in VEGF signaling pathway. The VEGF signaling pathway is modulated by the disease Infantile hemangioma."} {"text":"The compound [C][C][Branch2][Ring1][Branch1][C][=N][C][=C][Branch1][=Branch1][C][Branch1][C][N][=O][C][=C][C][=C][Ring1][=Branch2][NH1][Ring1][N][C][C][N][Ring1][S] targets the protein Poly [ADP-ribose] polymerase 1 (PARP-1) (EC 2.4.2.30) (ADP-ribosyltransferase diphtheria toxin-like 1) (ARTD1) (DNA ADP-ribosyltransferase PARP1) (EC 2.4.2.-) (NAD(+) ADP-ribosyltransferase 1) (ADPRT 1) (Poly[ADP-ribose] synthase 1) (Protein poly-ADP-ribosyltransferase PARP1) (EC 2.4.2.-). 
The protein Poly [ADP-ribose] polymerase 1 (PARP-1) (EC 2.4.2.30) (ADP-ribosyltransferase diphtheria toxin-like 1) (ARTD1) (DNA ADP-ribosyltransferase PARP1) (EC 2.4.2.-) (NAD(+) ADP-ribosyltransferase 1) (ADPRT 1) (Poly[ADP-ribose] synthase 1) (Protein poly-ADP-ribosyltransferase PARP1) (EC 2.4.2.-) is involved in Base excision repair. The Base excision repair is modulated by the disease Ataxia-telangiectasia-like syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/test_3-0.jsonl": "{"text":"The compound O=CCNCCCCC6)))))))NN=CccccCl)cc6))))))CC5cccCl)ccc6O targets the protein MAO-A. The protein MAO-A is involved in Glycine, serine and threonine metabolism. The Glycine, serine and threonine metabolism is modulated by the disease Guanidinoacetate methyltransferase deficiency."} {"text":"The compound Ncncccc-cccccc6))))))cc6s9 targets the protein PI3Kalpha. The protein PI3Kalpha is involved in Aldosterone-regulated sodium reabsorption. The Aldosterone-regulated sodium reabsorption is modulated by the disease Renal tubular acidosis."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/train_1-0.jsonl": "{"text":"The compound InChI=1S\/C19H19FN4O4\/c1-9-15(8-13-12-7-11(20)3-4-14(12)23-18(13)26)22-10(2)17(9)19(27)21-6-5-16(25)24-28\/h3-4,7-8,22,28H,5-6H2,1-2H3,(H,21,27)(H,23,26)(H,24,25)\/b13-8- targets the protein PDGFR-beta. The protein PDGFR-beta is involved in JAK-STAT signaling pathway. The JAK-STAT signaling pathway is modulated by the disease Castleman disease."} {"text":"The compound CC(=O)Nc1nc2ccc(-c3ccnc(Sc4ccccc4)n3)cc2s1 targets the protein PI3Kalpha. The protein PI3Kalpha is involved in JAK-STAT signaling pathway. 
The JAK-STAT signaling pathway is modulated by the disease Leptin receptor deficiency."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/test_0-0.jsonl": "{"text":"The compound [O][=C][Branch1][N][C@H1][C][C][C][C][C][Ring1][=Branch1][C][Ring1][Branch1][N][C][C][C][N][=C][Branch1][O][C][#C][C][=C][C][=C][C][=C][Ring1][=Branch1][S][C][=Ring1][=N][C][Ring1][P] targets the protein Metabotropic glutamate receptor 5. The protein Metabotropic glutamate receptor 5 is involved in Neuroactive ligand-receptor interaction. The Neuroactive ligand-receptor interaction is modulated by the disease Genetic obesity."} {"text":"The compound [C][C][=C][C][=N][C][Branch2][Ring1][O][N][C][=C][C][=C][Branch1][S][C][=Branch1][C][=O][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][C][=C][Ring1][#C][=N][N][=C][Ring2][Ring1][=Branch1][C][=C][Ring2][Ring1][#Branch2][C][=C][C][Branch1][C][O][=C][C][=C][Ring1][#Branch1][Cl] targets the protein CD antigen CD140b. The protein CD antigen CD140b is involved in JAK-STAT signaling pathway. The JAK-STAT signaling pathway is modulated by the disease Castleman disease."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/test_6-0.jsonl": "{"text":"The compound InChI=1S\/C18H23N3O3S\/c1-25(23,24)21-9-7-15(8-10-21)18(22)16-12-19-13-20-17(16)11-14-5-3-2-4-6-14\/h2-6,12-13,15,18,22H,7-11H2,1H3 targets the protein CYPXIB2. The protein CYPXIB2 is involved in Metabolic pathways. The Metabolic pathways is modulated by the disease Desmosterolosis."} {"text":"The compound [C][O][C][=C][C][Branch2][Ring2][P][N][C][=C][Branch1][Ring1][C][#N][C][=N][C][=C][C][=C][C][Branch1][Ring1][O][C][=C][Branch1][#C][O][C][C][N][C][C][N][Branch1][C][C][C][C][Ring1][#Branch1][C][=C][Ring2][Ring1][C][C][=C][Ring2][Ring1][N][Ring2][Ring1][=Branch1][=C][Branch1][C][Cl][C][=C][Ring2][Ring2][Ring2][Cl] targets the protein pp60c-src. The protein pp60c-src is involved in VEGF signaling pathway. 
The VEGF signaling pathway is modulated by the disease Infantile hemangioma."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/train_2-0.jsonl": "{"text":"The compound CC(=O)Nc1nc2ccc(-c3ccnc(NC(C)(C)c4ccccc4)n3)cc2s1 targets the protein PtdIns-3-kinase subunit p110-alpha. The protein PtdIns-3-kinase subunit p110-alpha is involved in JAK-STAT signaling pathway. The JAK-STAT signaling pathway is modulated by the disease Leptin receptor deficiency."} {"text":"The compound O=CCNCCOCC6)))))))NN=CccccF)cc6))))))CC5cccCl)ccc6O targets the protein Monoamine oxidase type A. The protein Monoamine oxidase type A is involved in Glycine, serine and threonine metabolism. The Glycine, serine and threonine metabolism is modulated by the disease Guanidinoacetate methyltransferase deficiency."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/valid_2-0.jsonl": "{"text":"The compound O=S(=O)(Nc1cncc(-c2ccc3ncc(-c4cc[nH]n4)n3c2)c1)c1ccccc1 targets the protein PtdIns-3-kinase subunit alpha. The protein PtdIns-3-kinase subunit alpha is involved in JAK-STAT signaling pathway. The JAK-STAT signaling pathway is modulated by the disease Leptin receptor deficiency."} {"text":"The compound CC1=NN(C(=O)CN2CCC(=O)CC2)C(c2cc(Br)cc(Br)c2O)C1 targets the protein MAO-A. The protein MAO-A is involved in Glycine, serine and threonine metabolism. The Glycine, serine and threonine metabolism is modulated by the disease Guanidinoacetate methyltransferase deficiency."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/valid_4-0.jsonl": "{"text":"The compound CNC(=O)Nc1ccc(-c2nc(N3CC4CCC(C3)O4)c3cnn(C4CCN(C(=O)OC(C)C)CC4)c3n2)cc1 targets the protein Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform. The protein Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform is involved in Aldosterone-regulated sodium reabsorption. 
The Aldosterone-regulated sodium reabsorption is modulated by the disease Renal tubular acidosis."} {"text":"The compound [C][C][C][Branch1][C][O][C@H1][Branch1][Ring1][C][C][N][C][=N][C][Branch1][O][N][C][C][=C][C][=C][N][=C][Ring1][=Branch1][=C][N][=C][N][Branch1][=Branch1][C][Branch1][C][C][C][C][Ring1][Branch2][=N][Ring2][Ring1][Ring2] targets the protein Cell division control protein 2 homolog. The protein Cell division control protein 2 homolog is involved in Gap junction. The Gap junction is modulated by the disease Hypoplastic left heart syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/train_0-0.jsonl": "{"text":"The compound InChI=1S\/C14H13ClN4O\/c1-8-4-11-10(12(20)5-8)7-17-14(19-11)18-9-2-3-13(15)16-6-9\/h2-3,6-8H,4-5H2,1H3,(H,17,18,19) targets the protein mGluR5. The protein mGluR5 is involved in Neuroactive ligand-receptor interaction. The Neuroactive ligand-receptor interaction is modulated by the disease Genetic obesity."} {"text":"The compound InChI=1S\/C28H34N6O5S\/c1-35-24-15-21-22(16-26(24)37-13-10-32-8-11-36-12-9-32)30-18-31-27(21)33-4-6-34(7-5-33)28(40)29-17-20-2-3-23-25(14-20)39-19-38-23\/h2-3,14-16,18H,4-13,17,19H2,1H3,(H,29,40) targets the protein Beta-type platelet-derived growth factor receptor. The protein Beta-type platelet-derived growth factor receptor is involved in JAK-STAT signaling pathway. The JAK-STAT signaling pathway is modulated by the disease Castleman disease."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/valid_9-0.jsonl": "{"text":"The compound [C][C][=C][C][=C][C][=C][Branch2][Branch1][Branch2][C][=Branch1][C][=O][N][C][Branch2][Ring2][Branch2][C][=Branch1][C][=O][N][C@@H1][Branch2][Ring1][=Branch1][C][C][C][N][C][C][C][Branch1][=Branch2][N][C][C][O][C][C][Ring1][=Branch1][C][C][Ring1][N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][#C][S][C][Ring2][Ring2][#Branch1][=C][Ring2][Ring2][O] targets the protein SKR. 
The protein SKR is involved in Calcium signaling pathway. The Calcium signaling pathway is modulated by the disease Hemiplegic migraine."} {"text":"The compound InChI=1S\/C22H26ClN3\/c23-17-6-1-4-16(12-17)14-25-11-3-5-15-9-10-19-18-7-2-8-20(18)22(24)26-21(19)13-15\/h1,4,6,9-10,12-13,18,20,25H,2-3,5,7-8,11,14H2,(H2,24,26)\/t18-,20-\/m1\/s1 targets the protein Constitutive NOS. The protein Constitutive NOS is involved in Arginine and proline metabolism. The Arginine and proline metabolism is modulated by the disease Snyder-Robinson syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/train_8-0.jsonl": "{"text":"The compound O=C(c1cc(Cc2n[nH]c(=O)c3ccccc23)ccc1F)N1CCN(c2ccccc2)CC1 targets the protein Poly [ADP-ribose] polymerase 1 (PARP-1) (EC 2.4.2.30) (ADP-ribosyltransferase diphtheria toxin-like 1) (ARTD1) (DNA ADP-ribosyltransferase PARP1) (EC 2.4.2.-) (NAD(+) ADP-ribosyltransferase 1) (ADPRT 1) (Poly[ADP-ribose] synthase 1) (Protein poly-ADP-ribosyltransferase PARP1) (EC 2.4.2.-). The protein Poly [ADP-ribose] polymerase 1 (PARP-1) (EC 2.4.2.30) (ADP-ribosyltransferase diphtheria toxin-like 1) (ARTD1) (DNA ADP-ribosyltransferase PARP1) (EC 2.4.2.-) (NAD(+) ADP-ribosyltransferase 1) (ADPRT 1) (Poly[ADP-ribose] synthase 1) (Protein poly-ADP-ribosyltransferase PARP1) (EC 2.4.2.-) is involved in Base excision repair. The Base excision repair is modulated by the disease Ataxia-telangiectasia-like syndrome."} {"text":"The compound Cc1ccc2cc(C(=O)NC3(C(=O)N[C@@H](CCCN4CCC(CCO)CC4)Cc4ccccc4)CCCC3)sc2c1 targets the protein NK-2R. The protein NK-2R is involved in Calcium signaling pathway. The Calcium signaling pathway is modulated by the disease Hemiplegic migraine."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/train_5-0.jsonl": "{"text":"The compound CCC)OcncNcccc[nH]cnc5c9))))))))))ncc6CF)F)F targets the protein Cell division control protein 2 homolog. The protein Cell division control protein 2 homolog is involved in Gap junction. 
The Gap junction is modulated by the disease Hypoplastic left heart syndrome."} {"text":"The compound InChI=1S\/C22H23N5O4S\/c23-22(29)27-7-2-4-16-9-19(14-25-21(16)27)18-10-17(12-24-13-18)15-3-1-5-20(11-15)32(30,31)26-6-8-28\/h1,3,5,9-14,26,28H,2,4,6-8H2,(H2,23,29) targets the protein Corticosterone 18-monooxygenase, CYP11B2. The protein Corticosterone 18-monooxygenase, CYP11B2 is involved in Metabolic pathways. The Metabolic pathways is modulated by the disease Desmosterolosis."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/train_9-0.jsonl": "{"text":"The compound [C][C][=C][C][=C][C][=C][Branch2][Branch1][O][C][=Branch1][C][=O][N][C][Branch2][Ring2][O][C][=Branch1][C][=O][N][C@@H1][Branch2][Ring1][=Branch2][C][C][N][C][C][C][Branch1][=N][N][Branch1][C][C][C][C][C][O][C][C][Ring1][=Branch1][C][C][Ring1][=C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][S][S][C][Ring2][Ring2][Branch2][=C][Ring2][Ring2][N] targets the protein SKR. The protein SKR is involved in Calcium signaling pathway. The Calcium signaling pathway is modulated by the disease Hemiplegic migraine."} {"text":"The compound InChI=1S\/C8H16N2\/c9-8-6-4-2-1-3-5-7-10-8\/h1-7H2,(H2,9,10) targets the protein Neuronal NOS. The protein Neuronal NOS is involved in Arginine and proline metabolism. The Arginine and proline metabolism is modulated by the disease Snyder-Robinson syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/test_1-0.jsonl": "{"text":"The compound COc1cc(Nc2ncc3c(n2)CCN(C(=O)c2cccnc2)C3)cc(OC)c1OC targets the protein CD antigen CD140b. The protein CD antigen CD140b is involved in JAK-STAT signaling pathway. 
The JAK-STAT signaling pathway is modulated by the disease Castleman disease."} {"text":"The compound [C][C][=Branch1][C][=O][N][C][=N][C][=C][C][=C][Branch2][Ring2][Ring1][C][=C][C][=N][C][Branch2][Ring1][#Branch1][N][Branch1][C][C][S][=Branch1][C][=O][=Branch1][C][=O][C][=C][C][=C][Branch1][C][C][C][=C][Ring1][#Branch1][=N][Ring2][Ring1][C][C][=C][Ring2][Ring1][Branch2][S][Ring2][Ring1][O] targets the protein Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform. The protein Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform is involved in JAK-STAT signaling pathway. The JAK-STAT signaling pathway is modulated by the disease Leptin receptor deficiency."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/valid_7-0.jsonl": "{"text":"The compound [O][=C][Branch2][Branch1][Ring2][C][C][C][=C][Branch2][Ring1][P][S][S][C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Ring1][=Branch2][C][C][C][=Branch1][C][=O][N][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1][NH1][C][=C][C][=C][C][=C][Ring2][Ring1][P][Ring1][=Branch1][N][C][C][C][=C][C][=C][C][=C][Ring1][=Branch1] targets the protein pp60c-src. The protein pp60c-src is involved in VEGF signaling pathway. The VEGF signaling pathway is modulated by the disease Infantile hemangioma."} {"text":"The compound Ccnn-cccccc6))))))c[nH]c=O)ccc96)CCCC6 targets the protein Poly [ADP-ribose] polymerase 1 (PARP-1) (EC 2.4.2.30) (ADP-ribosyltransferase diphtheria toxin-like 1) (ARTD1) (DNA ADP-ribosyltransferase PARP1) (EC 2.4.2.-) (NAD(+) ADP-ribosyltransferase 1) (ADPRT 1) (Poly[ADP-ribose] synthase 1) (Protein poly-ADP-ribosyltransferase PARP1) (EC 2.4.2.-). 
The protein Poly [ADP-ribose] polymerase 1 (PARP-1) (EC 2.4.2.30) (ADP-ribosyltransferase diphtheria toxin-like 1) (ARTD1) (DNA ADP-ribosyltransferase PARP1) (EC 2.4.2.-) (NAD(+) ADP-ribosyltransferase 1) (ADPRT 1) (Poly[ADP-ribose] synthase 1) (Protein poly-ADP-ribosyltransferase PARP1) (EC 2.4.2.-) is involved in Base excision repair. The Base excision repair is modulated by the disease Ataxia-telangiectasia-like syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/valid_8-0.jsonl": "{"text":"The compound O=C(c1cc(Cc2n[nH]c(=O)c3ccccc23)ccc1F)N1CCCN(C(=O)C2CC2)CC1 targets the protein Poly [ADP-ribose] polymerase 1 (PARP-1) (EC 2.4.2.30) (ADP-ribosyltransferase diphtheria toxin-like 1) (ARTD1) (DNA ADP-ribosyltransferase PARP1) (EC 2.4.2.-) (NAD(+) ADP-ribosyltransferase 1) (ADPRT 1) (Poly[ADP-ribose] synthase 1) (Protein poly-ADP-ribosyltransferase PARP1) (EC 2.4.2.-). The protein Poly [ADP-ribose] polymerase 1 (PARP-1) (EC 2.4.2.30) (ADP-ribosyltransferase diphtheria toxin-like 1) (ARTD1) (DNA ADP-ribosyltransferase PARP1) (EC 2.4.2.-) (NAD(+) ADP-ribosyltransferase 1) (ADPRT 1) (Poly[ADP-ribose] synthase 1) (Protein poly-ADP-ribosyltransferase PARP1) (EC 2.4.2.-) is involved in Base excision repair. The Base excision repair is modulated by the disease Ataxia-telangiectasia-like syndrome."} {"text":"The compound [C][C][=C][C][=C][C][=C][Branch2][Branch1][Branch2][C][=Branch1][C][=O][N][C][Branch2][Ring2][Branch2][C][=Branch1][C][=O][N][C@@H1][Branch2][Ring1][=Branch1][C][C][N][C][C][N][Branch1][#Branch2][C][C][C][C][O][C][C][Ring1][=Branch1][C][C][Ring1][=N][C][C][=C][C][=C][C][=C][Ring1][=Branch1][C][C][C][C][Ring2][Ring1][#C][S][C][Ring2][Ring2][#Branch1][=C][Ring2][Ring2][O] targets the protein SKR. The protein SKR is involved in Calcium signaling pathway. 
The Calcium signaling pathway is modulated by the disease Hemiplegic migraine."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/valid_1-0.jsonl": "{"text":"The compound Cc1c(\/C=C2\\C(=O)Nc3ccc(F)cc32)[nH]c2c1C(=O)N(C[C@H](O)CN1CCOCC1)CCC2 targets the protein Beta platelet-derived growth factor receptor. The protein Beta platelet-derived growth factor receptor is involved in JAK-STAT signaling pathway. The JAK-STAT signaling pathway is modulated by the disease Castleman disease."} {"text":"The compound [C][C][=Branch1][C][=O][N][C][=N][C][=C][C][=C][Branch2][Ring1][Ring2][C][=C][C][=N][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=N][Ring1][=N][C][=C][Ring2][Ring1][Ring1][S][Ring2][Ring1][=Branch1] targets the protein PI3-kinase subunit alpha. The protein PI3-kinase subunit alpha is involved in JAK-STAT signaling pathway. The JAK-STAT signaling pathway is modulated by the disease Leptin receptor deficiency."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/train_6-0.jsonl": "{"text":"The compound O=S=O)ccnccc6NcccccCF)F)F))c6)))))))))))))NCCOCC6 targets the protein Corticosterone 18-monooxygenase, CYP11B2. The protein Corticosterone 18-monooxygenase, CYP11B2 is involved in Metabolic pathways. The Metabolic pathways is modulated by the disease Desmosterolosis."} {"text":"The compound Cn1c(SSc2c(C(N)=O)c3ccccc3n2C)c(C(N)=O)c2ccccc21 targets the protein Proto-oncogene tyrosine-protein kinase Src. The protein Proto-oncogene tyrosine-protein kinase Src is involved in VEGF signaling pathway. 
The VEGF signaling pathway is modulated by the disease Infantile hemangioma."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/test_8-0.jsonl": "{"text":"The compound [N][C][=Branch1][C][=O][C][=C][C][=C][C][NH1][C][Branch1][Branch2][C@@H1][C][C][C][N][Ring1][Branch1][=N][C][Ring1][=C][=Ring1][#Branch2] targets the protein Poly [ADP-ribose] polymerase 1 (PARP-1) (EC 2.4.2.30) (ADP-ribosyltransferase diphtheria toxin-like 1) (ARTD1) (DNA ADP-ribosyltransferase PARP1) (EC 2.4.2.-) (NAD(+) ADP-ribosyltransferase 1) (ADPRT 1) (Poly[ADP-ribose] synthase 1) (Protein poly-ADP-ribosyltransferase PARP1) (EC 2.4.2.-). The protein Poly [ADP-ribose] polymerase 1 (PARP-1) (EC 2.4.2.30) (ADP-ribosyltransferase diphtheria toxin-like 1) (ARTD1) (DNA ADP-ribosyltransferase PARP1) (EC 2.4.2.-) (NAD(+) ADP-ribosyltransferase 1) (ADPRT 1) (Poly[ADP-ribose] synthase 1) (Protein poly-ADP-ribosyltransferase PARP1) (EC 2.4.2.-) is involved in Base excision repair. The Base excision repair is modulated by the disease Ataxia-telangiectasia-like syndrome."} {"text":"The compound InChI=1S\/C36H43N5O2S\/c1-27-14-15-29-26-32(44-31(29)24-27)34(42)39-36(16-6-7-17-36)35(43)38-30(25-28-10-3-2-4-11-28)12-9-19-40-20-22-41(23-21-40)33-13-5-8-18-37-33\/h2-5,8,10-11,13-15,18,24,26,30H,6-7,9,12,16-17,19-23,25H2,1H3,(H,38,43)(H,39,42)\/t30-\/m0\/s1 targets the protein NK-2 receptor. The protein NK-2 receptor is involved in Calcium signaling pathway. The Calcium signaling pathway is modulated by the disease Hemiplegic migraine."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/valid_6-0.jsonl": "{"text":"The compound Fccccccnn5-ccccnc6)))))))))c6 targets the protein Steroid 18-hydroxylase. The protein Steroid 18-hydroxylase is involved in Metabolic pathways. The Metabolic pathways is modulated by the disease Desmosterolosis."} {"text":"The compound Cc1cc(-c2cn(CC(=O)c3ccccc3)nn2)ccc1F targets the protein pp60c-src. The protein pp60c-src is involved in VEGF signaling pathway. 
The VEGF signaling pathway is modulated by the disease Infantile hemangioma."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/train_3-0.jsonl": "{"text":"The compound Cc1ccc(C2=NN(C(=O)CN3CCOCC3)C(c3cc(Cl)ccc3O)C2)cc1 targets the protein MAO-A. The protein MAO-A is involved in Glycine, serine and threonine metabolism. The Glycine, serine and threonine metabolism is modulated by the disease Guanidinoacetate methyltransferase deficiency."} {"text":"The compound O=cccNCCOCC6))))))occ-cccccc6))))))cccc%106 targets the protein PtdIns-3-kinase subunit p110-alpha. The protein PtdIns-3-kinase subunit p110-alpha is involved in Aldosterone-regulated sodium reabsorption. The Aldosterone-regulated sodium reabsorption is modulated by the disease Renal tubular acidosis."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/train_7-0.jsonl": "{"text":"The compound [N][C][=Branch1][C][=O][C][C][C][=C][Branch2][Ring1][#Branch1][S][S][C][NH1][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=Ring1][=Branch2][C][C][C][Branch1][C][N][=O][NH1][C][=C][C][=C][C][=C][Ring2][Ring1][=Branch2][Ring1][=Branch1] targets the protein pp60c-src. The protein pp60c-src is involved in VEGF signaling pathway. The VEGF signaling pathway is modulated by the disease Infantile hemangioma."} {"text":"The compound CC(=O)N1CCN(C(=O)c2cc(Cc3n[nH]c(=O)c4ccccc34)ccc2F)CC1 targets the protein Poly [ADP-ribose] polymerase 1 (PARP-1) (EC 2.4.2.30) (ADP-ribosyltransferase diphtheria toxin-like 1) (ARTD1) (DNA ADP-ribosyltransferase PARP1) (EC 2.4.2.-) (NAD(+) ADP-ribosyltransferase 1) (ADPRT 1) (Poly[ADP-ribose] synthase 1) (Protein poly-ADP-ribosyltransferase PARP1) (EC 2.4.2.-). 
The protein Poly [ADP-ribose] polymerase 1 (PARP-1) (EC 2.4.2.30) (ADP-ribosyltransferase diphtheria toxin-like 1) (ARTD1) (DNA ADP-ribosyltransferase PARP1) (EC 2.4.2.-) (NAD(+) ADP-ribosyltransferase 1) (ADPRT 1) (Poly[ADP-ribose] synthase 1) (Protein poly-ADP-ribosyltransferase PARP1) (EC 2.4.2.-) is involved in Base excision repair. The Base excision repair is modulated by the disease Ataxia-telangiectasia-like syndrome."}", "/scratch/micpie/export/compound_protein_pathway_disease_2/train_4-0.jsonl": "{"text":"The compound [C][N][C][=Branch1][C][=O][N][C][=C][C][=C][Branch2][Ring2][N][C][=N][C][Branch1][#C][N][C][C][C][C][C][Branch1][Ring2][C][Ring1][#Branch1][O][Ring1][=Branch1][=C][C][=N][N][Branch1][N][C@H1][C][C][C@H1][Branch1][C][O][C][C][Ring1][#Branch1][C][Ring1][N][=N][Ring2][Ring1][Branch2][C][=C][Ring2][Ring1][=C] targets the protein Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform. The protein Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform is involved in Aldosterone-regulated sodium reabsorption. The Aldosterone-regulated sodium reabsorption is modulated by the disease Renal tubular acidosis."} {"text":"The compound InChI=1S\/C16H16F3N5O\/c1-9(2)7-25-14-11(16(17,18)19)6-20-15(24-14)23-10-3-4-12-13(5-10)22-8-21-12\/h3-6,8-9H,7H2,1-2H3,(H,21,22)(H,20,23,24) targets the protein Cyclin-dependent kinase 1. The protein Cyclin-dependent kinase 1 is involved in Gap junction. 
The Gap junction is modulated by the disease Hypoplastic left heart syndrome."}", "/scratch/micpie/export/train_03.jsonl": "{"text":"Question: What is the number of hydrogen bond donor sites of the molecule with SMILES CC(C)c1c(nc(s1)NC(=O)N(C)CC2CC2)c3ccccc3?\nAnswer: 1"} {"text":"User: I'm looking for the InChI of a molecule that has a volume of distribution at steady state (VDss) of 11.000 L\/kg.\nAssistant: This is a molecule that has a volume of distribution at steady state (VDss) of 11.000 L\/kg: InChI=1S\/C13H14N2\/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13\/h1,3,5,7H,2,4,6,8H2,(H2,14,15)\/p+1"}", "/scratch/micpie/export/ld50_zhu/test_0-10.jsonl": "{"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very interesting. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should have a rat LD50 of 2.729 log(1\/(mol\/kg)).\nAssistant: Ok, here you go, this SELFIES represents a molecule that has a rat LD50 of 2.729 log(1\/(mol\/kg)): [S][=C][=N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1]"} {"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any specific points for the generation?\nUser: Yes, please. 
The molecule should have a acute toxicity (LD50 in rats) of 3.814 log(1\/(mol\/kg)).\nAssistant: Got it, this canonical SMILES represents a molecule that has a acute toxicity (LD50 in rats) of 3.814 log(1\/(mol\/kg)): CCOP(=S)(OCC)Oc1ccc([N+](=O)[O-])c(Cl)c1"}", "/scratch/micpie/export/ld50_zhu/valid_0-8.jsonl": "{"text":"User: Can you give me the canonical SMILES of a molecule that has a LD50 in rats of 2.210 log(1\/(mol\/kg))?\nAssistant: Of course, here you go: O=C1CN=C(c2ccccn2)c2cc(Br)ccc2N1"} {"text":"User: Can you create the canonical SMILES of a molecule that has a acute toxicity rat LD50 of 2.955 log(1\/(mol\/kg))?\nAssistant: Sure, here you go: OC1(c2ccc(C(F)(F)F)cc2)c2ccccc2C2=NCCN21"}", "/scratch/micpie/export/ld50_zhu/train_0-8.jsonl": "{"text":"User: Can you create the SMILES of a molecule that has a rat LD50 of 2.343 log(1\/(mol\/kg))?\nAssistant: Yes, here you go: BrC(Br)Br"} {"text":"User: Can you give me the InChI of a molecule that has a rat LD50 of 3.730 log(1\/(mol\/kg))?\nAssistant: Sure, here you go: InChI=1S\/C22H26O3\/c1-15(2)10-19-20(22(19,3)4)21(23)25-14-17-12-18(24-13-17)11-16-8-6-5-7-9-16\/h5-10,12-13,19-20H,11,14H2,1-4H3"}", "/scratch/micpie/export/ld50_zhu/test_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the acute toxicity (LD50 in rats) in log(1\/(mol\/kg)).\nMolecule SELFIES: [S][=C][=N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1]\nConstraint: Even if you are uncertain, you must answer with a numeric value in log(1\/(mol\/kg)) without the unit and without using any other words.\nResult: 2.729"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the LD50 in rats in log(1\/(mol\/kg)).\ncanonical SMILES: CCOP(=S)(OCC)Oc1ccc([N+](=O)[O-])c(Cl)c1\nConstraint: Even if you are not sure, you must answer with a numeric value in log(1\/(mol\/kg)) without the unit and without using any additional words.\nResult: 3.814"}", 
"/scratch/micpie/export/ld50_zhu/valid_0-9.jsonl": "{"text":"User: I'm searching for the canonical SMILES of a molecule that has a rat LD50 of 2.210 log(1\/(mol\/kg)).\nAssistant: This is a molecule that has a rat LD50 of 2.210 log(1\/(mol\/kg)): O=C1CN=C(c2ccccn2)c2cc(Br)ccc2N1"} {"text":"User: I'm searching for the canonical SMILES of a molecule that has a LD50 in rats of 2.955 log(1\/(mol\/kg)).\nAssistant: This is a molecule that has a LD50 in rats of 2.955 log(1\/(mol\/kg)): OC1(c2ccc(C(F)(F)F)cc2)c2ccccc2C2=NCCN21"}", "/scratch/micpie/export/ld50_zhu/test_0-1.jsonl": "{"text":"Based on the SMILES S=C=Nc1ccc(Br)cc1, the molecule has a rat LD50 of 2.729 log(1\/(mol\/kg))."} {"text":"Based on the InChI representation of InChI=1S\/C10H13ClNO5PS\/c1-3-15-18(19,16-4-2)17-8-5-6-10(12(13)14)9(11)7-8\/h5-7H,3-4H2,1-2H3, the molecule has a acute toxicity (LD50 in rats) of 3.814 log(1\/(mol\/kg))."}", "/scratch/micpie/export/ld50_zhu/valid_0-0.jsonl": "{"text":"The molecule with the DeepSMILES representation of O=CCN=Ccccccn6))))))cccBr)ccc6N%11 has a acute toxicity rat LD50 of 2.210 log(1\/(mol\/kg))."} {"text":"The molecule with the canonical SMILES OC1(c2ccc(C(F)(F)F)cc2)c2ccccc2C2=NCCN21 has a acute toxicity (LD50 in rats) of 2.955 log(1\/(mol\/kg))."}", "/scratch/micpie/export/ld50_zhu/test_0-2.jsonl": "{"text":"The SMILES S=C=Nc1ccc(Br)cc1 represents a molecule that has a rat LD50 of 2.729 log(1\/(mol\/kg))."} {"text":"The canonical SMILES CCOP(=S)(OCC)Oc1ccc([N+](=O)[O-])c(Cl)c1 represents a molecule with a LD50 in rats of 3.814 log(1\/(mol\/kg))."}", "/scratch/micpie/export/ld50_zhu/valid_0-10.jsonl": "{"text":"User: I want to create a canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should have a rat LD50 of 2.210 log(1\/(mol\/kg)).\nAssistant: Got it, this canonical SMILES represents a molecule that has a rat LD50 of 2.210 log(1\/(mol\/kg)): O=C1CN=C(c2ccccn2)c2cc(Br)ccc2N1"} {"text":"User: I want to create a molecule SELFIES.\nAssistant: This sounds very exciting. Should I consider any constraints for the creation?\nUser: Yes, please. The molecule should have a rat LD50 of 2.955 log(1\/(mol\/kg)).\nAssistant: Ok, this SELFIES represents a molecule that has a rat LD50 of 2.955 log(1\/(mol\/kg)): [O][C][Branch2][Ring1][Ring1][C][=C][C][=C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][=C][Ring1][#Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][C][C][N][Ring1][Branch1][Ring2][Ring1][=Branch1]"}", "/scratch/micpie/export/ld50_zhu/train_0-6.jsonl": "{"text":"Task: Please generate a molecule SELFIES based on the text description below.\nDescription: A molecule that has a LD50 in rats of 2.343 log(1\/(mol\/kg)).\nResult: [Br][C][Branch1][C][Br][Br]"} {"text":"Task: Please create a SMILES based on the text description.\nDescription: A molecule that has a acute toxicity rat LD50 of 3.730 log(1\/(mol\/kg)).\nResult: CC(C)=CC1C(C(=O)OCc2coc(Cc3ccccc3)c2)C1(C)C"}", "/scratch/micpie/export/ld50_zhu/valid_0-6.jsonl": "{"text":"Task: Please give me a molecule SMILES based on the description below.\nDescription: A molecule that has a acute toxicity rat LD50 of 2.210 log(1\/(mol\/kg)).\nResult: O=C1CN=C(c2ccccn2)c2cc(Br)ccc2N1"} {"text":"Task: Please generate a molecule DeepSMILES based on the text description below.\nDescription: A molecule that has a acute toxicity rat LD50 of 2.955 log(1\/(mol\/kg)).\nResult: OCccccCF)F)F))cc6))))))cccccc6C=NCCN5%12"}", "/scratch/micpie/export/ld50_zhu/test_0-9.jsonl": "{"text":"User: I'm looking for the DeepSMILES of a molecule that has a acute toxicity (LD50 in rats) of 2.729 log(1\/(mol\/kg)).\nAssistant: This is a molecule that has a acute toxicity (LD50 in rats) of 2.729 
log(1\/(mol\/kg)): S=C=NccccBr)cc6"} {"text":"User: I'm looking for the DeepSMILES of a molecule that has a acute toxicity (LD50 in rats) of 3.814 log(1\/(mol\/kg)).\nAssistant: This is a molecule that has a acute toxicity (LD50 in rats) of 3.814 log(1\/(mol\/kg)): CCOP=S)OCC)))Occcc[N+]=O)[O-]))cCl)c6"}", "/scratch/micpie/export/ld50_zhu/test_0-0.jsonl": "{"text":"The molecule with the DeepSMILES S=C=NccccBr)cc6 has a LD50 in rats of 2.729 log(1\/(mol\/kg))."} {"text":"The molecule with the DeepSMILES CCOP=S)OCC)))Occcc[N+]=O)[O-]))cCl)c6 has a acute toxicity (LD50 in rats) of 3.814 log(1\/(mol\/kg))."}", "/scratch/micpie/export/ld50_zhu/valid_0-7.jsonl": "{"text":"User: Can you estimate the acute toxicity rat LD50 in log(1\/(mol\/kg)) of the molecule with the canonical SMILES O=C1CN=C(c2ccccn2)c2cc(Br)ccc2N1?\nAssistant: Yes, I'm happy to help, this molecule has a acute toxicity rat LD50 of 2.210 log(1\/(mol\/kg))."} {"text":"User: Can you tell me the acute toxicity rat LD50 in log(1\/(mol\/kg)) of the molecule with the DeepSMILES OCccccCF)F)F))cc6))))))cccccc6C=NCCN5%12?\nAssistant: Of course, this molecule has a acute toxicity rat LD50 of 2.955 log(1\/(mol\/kg))."}", "/scratch/micpie/export/ld50_zhu/test_0-3.jsonl": "{"text":"The molecule with the InChI InChI=1S\/C7H4BrNS\/c8-6-1-3-7(4-2-6)9-5-10\/h1-4H has a acute toxicity (LD50 in rats) of 2.729 log(1\/(mol\/kg))."} {"text":"The molecule with the DeepSMILES CCOP=S)OCC)))Occcc[N+]=O)[O-]))cCl)c6 has a acute toxicity (LD50 in rats) of 3.814 log(1\/(mol\/kg))."}", "/scratch/micpie/export/ld50_zhu/valid_0-11.jsonl": "{"text":"User: I want to come up with a canonical SMILES.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should have a acute toxicity (LD50 in rats) of 2.210 log(1\/(mol\/kg)).\nAssistant: Got it, this canonical SMILES represents a molecule that has a acute toxicity (LD50 in rats) of 2.210 log(1\/(mol\/kg)): O=C1CN=C(c2ccccn2)c2cc(Br)ccc2N1"} {"text":"User: I want to generate a SMILES.\nAssistant: This sounds very exciting. Should it be a special molecule?\nUser: Yes, the molecule should have a LD50 in rats of 2.955 log(1\/(mol\/kg)).\nAssistant: Understood, this SMILES represents a molecule that has a LD50 in rats of 2.955 log(1\/(mol\/kg)): OC1(c2ccc(C(F)(F)F)cc2)c2ccccc2C2=NCCN21"}", "/scratch/micpie/export/ld50_zhu/train_0-0.jsonl": "{"text":"The molecule with the canonical SMILES BrC(Br)Br has a acute toxicity rat LD50 of 2.343 log(1\/(mol\/kg))."} {"text":"The molecule with the InChI InChI=1S\/C22H26O3\/c1-15(2)10-19-20(22(19,3)4)21(23)25-14-17-12-18(24-13-17)11-16-8-6-5-7-9-16\/h5-10,12-13,19-20H,11,14H2,1-4H3 has a LD50 in rats of 3.730 log(1\/(mol\/kg))."}", "/scratch/micpie/export/ld50_zhu/test_0-6.jsonl": "{"text":"Task: Please give me a molecule InChI based on the description.\nDescription: A molecule that has a acute toxicity (LD50 in rats) of 2.729 log(1\/(mol\/kg)).\nResult: InChI=1S\/C7H4BrNS\/c8-6-1-3-7(4-2-6)9-5-10\/h1-4H"} {"text":"Task: Please give me a molecule SMILES based on the text description below.\nDescription: A molecule that has a acute toxicity rat LD50 of 3.814 log(1\/(mol\/kg)).\nResult: CCOP(=S)(OCC)Oc1ccc([N+](=O)[O-])c(Cl)c1"}", "/scratch/micpie/export/ld50_zhu/train_0-10.jsonl": "{"text":"User: I want to create a molecule canonical SMILES.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should have a acute toxicity rat LD50 of 2.343 log(1\/(mol\/kg)).\nAssistant: Got it, here you go, this canonical SMILES represents a molecule that has a acute toxicity rat LD50 of 2.343 log(1\/(mol\/kg)): BrC(Br)Br"} {"text":"User: I want to generate a canonical SMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should have a LD50 in rats of 3.730 log(1\/(mol\/kg)).\nAssistant: Got it, this canonical SMILES represents a molecule that has a LD50 in rats of 3.730 log(1\/(mol\/kg)): CC(C)=CC1C(C(=O)OCc2coc(Cc3ccccc3)c2)C1(C)C"}", "/scratch/micpie/export/ld50_zhu/train_0-3.jsonl": "{"text":"The molecule with the SELFIES [Br][C][Branch1][C][Br][Br] has a acute toxicity (LD50 in rats) of 2.343 log(1\/(mol\/kg))."} {"text":"The molecule with the InChI InChI=1S\/C22H26O3\/c1-15(2)10-19-20(22(19,3)4)21(23)25-14-17-12-18(24-13-17)11-16-8-6-5-7-9-16\/h5-10,12-13,19-20H,11,14H2,1-4H3 has a acute toxicity rat LD50 of 3.730 log(1\/(mol\/kg))."}", "/scratch/micpie/export/ld50_zhu/valid_0-2.jsonl": "{"text":"The canonical SMILES O=C1CN=C(c2ccccn2)c2cc(Br)ccc2N1 is representing a molecule that has a LD50 in rats of 2.210 log(1\/(mol\/kg))."} {"text":"The InChI InChI=1S\/C17H13F3N2O\/c18-17(19,20)12-7-5-11(6-8-12)16(23)14-4-2-1-3-13(14)15-21-9-10-22(15)16\/h1-8,23H,9-10H2 represents a molecule that has a acute toxicity (LD50 in rats) of 2.955 log(1\/(mol\/kg))."}", "/scratch/micpie/export/ld50_zhu/valid_0-1.jsonl": "{"text":"Based on the InChI representation of InChI=1S\/C14H10BrN3O\/c15-9-4-5-11-10(7-9)14(17-8-13(19)18-11)12-3-1-2-6-16-12\/h1-7H,8H2,(H,18,19), the molecule has a LD50 in rats of 2.210 log(1\/(mol\/kg))."} {"text":"Based on the DeepSMILES OCccccCF)F)F))cc6))))))cccccc6C=NCCN5%12, the molecule has a acute toxicity (LD50 in rats) of 2.955 log(1\/(mol\/kg))."}", "/scratch/micpie/export/ld50_zhu/valid_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on 
the description.\nDescription: Predict the acute toxicity rat LD50 in log(1\/(mol\/kg)).\nMolecule InChI: InChI=1S\/C14H10BrN3O\/c15-9-4-5-11-10(7-9)14(17-8-13(19)18-11)12-3-1-2-6-16-12\/h1-7H,8H2,(H,18,19)\nConstraint: Even if you are not sure, you must answer with a numeric value in log(1\/(mol\/kg)) without the unit and without using any additional words.\nResult: 2.210"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the rat LD50 in log(1\/(mol\/kg)).\nSMILES: OC1(c2ccc(C(F)(F)F)cc2)c2ccccc2C2=NCCN21\nConstraint: Even if you are not sure, you must answer with a numeric value in log(1\/(mol\/kg)) without the unit and without using any additional words.\nResult: 2.955"}", "/scratch/micpie/export/ld50_zhu/valid_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the acute toxicity rat LD50 in log(1\/(mol\/kg)).\nInChI: InChI=1S\/C14H10BrN3O\/c15-9-4-5-11-10(7-9)14(17-8-13(19)18-11)12-3-1-2-6-16-12\/h1-7H,8H2,(H,18,19)\nConstraint: Even if you are not sure, you must answer with a numeric value in log(1\/(mol\/kg)) without using any additional words.\nResult: 2.210 log(1\/(mol\/kg))"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the acute toxicity (LD50 in rats) in log(1\/(mol\/kg)).\ncanonical SMILES: OC1(c2ccc(C(F)(F)F)cc2)c2ccccc2C2=NCCN21\nConstraint: Even if you are not sure, you must answer with a numeric value in log(1\/(mol\/kg)) without using any other words.\nResult: 2.955 log(1\/(mol\/kg))"}", "/scratch/micpie/export/ld50_zhu/train_0-5.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the acute toxicity rat LD50 in log(1\/(mol\/kg)).\nSMILES: BrC(Br)Br\nConstraint: Even if you are uncertain, you must answer with a numeric value in log(1\/(mol\/kg)) without the unit and without using any other words.\nResult: 2.343"} {"text":"Task: Please 
predict a molecule feature based on the description.\nDescription: Predict the acute toxicity rat LD50 in log(1\/(mol\/kg)).\nMolecule SELFIES: [C][C][Branch1][C][C][=C][C][C][Branch2][Ring1][=Branch2][C][=Branch1][C][=O][O][C][C][=C][O][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring1][N][C][Ring2][Ring1][Ring1][Branch1][C][C][C]\nConstraint: Even if you are uncertain, you must answer with a numeric value in log(1\/(mol\/kg)) without the unit and without using any other words.\nResult: 3.730"}", "/scratch/micpie/export/ld50_zhu/train_0-2.jsonl": "{"text":"The canonical SMILES BrC(Br)Br is representing a molecule that has a acute toxicity (LD50 in rats) of 2.343 log(1\/(mol\/kg))."} {"text":"The SMILES CC(C)=CC1C(C(=O)OCc2coc(Cc3ccccc3)c2)C1(C)C is representing a molecule with a LD50 in rats of 3.730 log(1\/(mol\/kg))."}", "/scratch/micpie/export/ld50_zhu/test_0-11.jsonl": "{"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should have a acute toxicity rat LD50 of 2.729 log(1\/(mol\/kg)).\nAssistant: Got it, this SMILES represents a molecule that has a acute toxicity rat LD50 of 2.729 log(1\/(mol\/kg)): S=C=Nc1ccc(Br)cc1"} {"text":"User: I want to generate a SELFIES.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should have a acute toxicity rat LD50 of 3.814 log(1\/(mol\/kg)).\nAssistant: Understood, this SELFIES represents a molecule that has a acute toxicity rat LD50 of 3.814 log(1\/(mol\/kg)): [C][C][O][P][=Branch1][C][=S][Branch1][Ring2][O][C][C][O][C][=C][C][=C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][C][Branch1][C][Cl][=C][Ring1][#Branch2]"}", "/scratch/micpie/export/ld50_zhu/train_0-7.jsonl": "{"text":"User: Can you derive the LD50 in rats in log(1\/(mol\/kg)) of the molecule with the canonical SMILES BrC(Br)Br?\nAssistant: Sure, this molecule has a LD50 in rats of 2.343 log(1\/(mol\/kg))."} {"text":"User: Can you tell me the rat LD50 in log(1\/(mol\/kg)) of the molecule with the SELFIES [C][C][Branch1][C][C][=C][C][C][Branch2][Ring1][=Branch2][C][=Branch1][C][=O][O][C][C][=C][O][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring1][N][C][Ring2][Ring1][Ring1][Branch1][C][C][C]?\nAssistant: Yes, I'm happy to help, this molecule has a rat LD50 of 3.730 log(1\/(mol\/kg))."}", "/scratch/micpie/export/ld50_zhu/train_0-11.jsonl": "{"text":"User: I want to create a SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should have a acute toxicity rat LD50 of 2.343 log(1\/(mol\/kg)).\nAssistant: Ok, this SMILES represents a molecule that has a acute toxicity rat LD50 of 2.343 log(1\/(mol\/kg)): BrC(Br)Br"} {"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very exciting. 
Should it be a special molecule?\nUser: Yes, the molecule should have a acute toxicity (LD50 in rats) of 3.730 log(1\/(mol\/kg)).\nAssistant: Got it, this DeepSMILES represents a molecule that has a acute toxicity (LD50 in rats) of 3.730 log(1\/(mol\/kg)): CCC)=CCCC=O)OCccocCcccccc6)))))))c5))))))))C3C)C"}", "/scratch/micpie/export/ld50_zhu/train_0-1.jsonl": "{"text":"Based on the DeepSMILES BrCBr)Br, the molecule has a acute toxicity rat LD50 of 2.343 log(1\/(mol\/kg))."} {"text":"Based on the SELFIES representation of [C][C][Branch1][C][C][=C][C][C][Branch2][Ring1][=Branch2][C][=Branch1][C][=O][O][C][C][=C][O][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring1][N][C][Ring2][Ring1][Ring1][Branch1][C][C][C], the molecule has a acute toxicity rat LD50 of 3.730 log(1\/(mol\/kg))."}", "/scratch/micpie/export/ld50_zhu/train_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the acute toxicity (LD50 in rats) in log(1\/(mol\/kg)).\nSMILES: BrC(Br)Br\nConstraint: Even if you are uncertain, you must answer with a numeric value in log(1\/(mol\/kg)) without using any other words.\nResult: 2.343 log(1\/(mol\/kg))"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the acute toxicity rat LD50 in log(1\/(mol\/kg)).\nMolecule DeepSMILES: CCC)=CCCC=O)OCccocCcccccc6)))))))c5))))))))C3C)C\nConstraint: Even if you are uncertain, you must answer with a numeric value in log(1\/(mol\/kg)) without using any other words.\nResult: 3.730 log(1\/(mol\/kg))"}", "/scratch/micpie/export/ld50_zhu/test_0-7.jsonl": "{"text":"User: Can you tell me the LD50 in rats in log(1\/(mol\/kg)) of the molecule with the InChI InChI=1S\/C7H4BrNS\/c8-6-1-3-7(4-2-6)9-5-10\/h1-4H?\nAssistant: Yes, this molecule has a LD50 in rats of 2.729 log(1\/(mol\/kg))."} {"text":"User: Can you tell me the rat LD50 in log(1\/(mol\/kg)) of the molecule with the SELFIES 
[C][C][O][P][=Branch1][C][=S][Branch1][Ring2][O][C][C][O][C][=C][C][=C][Branch1][=Branch1][N+1][=Branch1][C][=O][O-1][C][Branch1][C][Cl][=C][Ring1][#Branch2]?\nAssistant: Of course, this molecule has a rat LD50 of 3.814 log(1\/(mol\/kg))."}", "/scratch/micpie/export/ld50_zhu/train_0-9.jsonl": "{"text":"User: I'm looking for the canonical SMILES of a molecule that has a acute toxicity rat LD50 of 2.343 log(1\/(mol\/kg)).\nAssistant: This is a molecule that has a acute toxicity rat LD50 of 2.343 log(1\/(mol\/kg)): BrC(Br)Br"} {"text":"User: I'm searching for the SELFIES of a molecule that has a rat LD50 of 3.730 log(1\/(mol\/kg)).\nAssistant: This is a molecule that has a rat LD50 of 3.730 log(1\/(mol\/kg)): [C][C][Branch1][C][C][=C][C][C][Branch2][Ring1][=Branch2][C][=Branch1][C][=O][O][C][C][=C][O][C][Branch1][#Branch2][C][C][=C][C][=C][C][=C][Ring1][=Branch1][=C][Ring1][N][C][Ring2][Ring1][Ring1][Branch1][C][C][C]"}", "/scratch/micpie/export/ld50_zhu/valid_0-3.jsonl": "{"text":"The molecule with the SELFIES [O][=C][C][N][=C][Branch1][=Branch2][C][=C][C][=C][C][=N][Ring1][=Branch1][C][=C][C][Branch1][C][Br][=C][C][=C][Ring1][#Branch1][N][Ring2][Ring1][C] has a rat LD50 of 2.210 log(1\/(mol\/kg))."} {"text":"The molecule with the SELFIES [O][C][Branch2][Ring1][Ring1][C][=C][C][=C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][C][=C][Ring1][#Branch2][C][=C][C][=C][C][=C][Ring1][=Branch1][C][=N][C][C][N][Ring1][Branch1][Ring2][Ring1][=Branch1] has a acute toxicity (LD50 in rats) of 2.955 log(1\/(mol\/kg))."}", "/scratch/micpie/export/ld50_zhu/test_0-8.jsonl": "{"text":"User: Can you generate the SELFIES of a molecule that has a LD50 in rats of 2.729 log(1\/(mol\/kg))?\nAssistant: Yes, here you go: [S][=C][=N][C][=C][C][=C][Branch1][C][Br][C][=C][Ring1][#Branch1]"} {"text":"User: Can you create the InChI of a molecule that has a rat LD50 of 3.814 log(1\/(mol\/kg))?\nAssistant: Yes, here you go: 
InChI=1S\/C10H13ClNO5PS\/c1-3-15-18(19,16-4-2)17-8-5-6-10(12(13)14)9(11)7-8\/h5-7H,3-4H2,1-2H3"}", "/scratch/micpie/export/ld50_zhu/test_0-4.jsonl": "{"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the acute toxicity (LD50 in rats) in log(1\/(mol\/kg)).\nSMILES: S=C=Nc1ccc(Br)cc1\nConstraint: Even if you are uncertain, you must answer with a numeric value in log(1\/(mol\/kg)) without using any additional words.\nResult: 2.729 log(1\/(mol\/kg))"} {"text":"Task: Please predict a molecule feature based on the description.\nDescription: Predict the rat LD50 in log(1\/(mol\/kg)).\nSMILES: CCOP(=S)(OCC)Oc1ccc([N+](=O)[O-])c(Cl)c1\nConstraint: Even if you are uncertain, you must answer with a numeric value in log(1\/(mol\/kg)) without using any additional words.\nResult: 3.814 log(1\/(mol\/kg))"}", "/scratch/micpie/export/bio_ner_20/valid_0-0.jsonl": "{"text":"Task: Please carry out the NER task for the the text below.\nText: Proximate composition, fatty acid profile, cholesterol, alpha-tocoferol content and essential (K, Na, Cl, S, Mg, Ca, Zn, Cu, Fe, Mn, and Se) and contaminant element (Hg\/MeHg, Cd, Pb, and As) levels in silver scabbardfish (Lepidopus caudatus), hake (Merluccius merluccius), and ray (Raja spp.) 
were investigated..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: fatty acid,23,33,Chemical\/Drug\ncholesterol,43,54,Chemical\/Drug\nalpha - tocoferol,56,73,Chemical\/Drug\nK,98,99,Chemical\/Drug\nNa,101,103,Chemical\/Drug\nCl,105,107,Chemical\/Drug\nS,109,110,Chemical\/Drug\nMg,112,114,Chemical\/Drug\nCa,116,118,Chemical\/Drug\nZn,120,122,Chemical\/Drug\nCu,124,126,Chemical\/Drug\nFe,128,130,Chemical\/Drug\nMn,132,134,Chemical\/Drug\nSe,140,142,Chemical\/Drug\nHg,170,172,Chemical\/Drug\nMeHg,175,179,Chemical\/Drug\nCd,181,183,Chemical\/Drug\nPb,185,187,Chemical\/Drug\nAs,193,195,Chemical\/Drug\nsilver,207,213,Chemical\/Drug"} {"text":"Task: Please carry out the NER task for the the text below.\nText: This hybrid clone, designated NR2, was characterized by several methods, including PCR, with eight pairs of oligonucleotides mapped to Chr 20: D20S5, D20S41, D20S42, D20S56, D20S57, D20S58, adenosine deaminase (ADA), and Prion protein (PRIP); Restriction Fragment Length Polymorphism (RFLP) analyses with four genomic anonymous probes (D20S5, cD3H12, D20S17, D20S18); and fluorescent in situ hybridization (FISH) with total human DNA and D20Z1, a sequence specific to the human Chr 20 centromere, as probes..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: Chr 20,135,141,Gene\/Protein\nD20S5,143,148,Gene\/Protein\nD20S41,150,156,Gene\/Protein\nD20S42,158,164,Gene\/Protein\nD20S56,166,172,Gene\/Protein\nD20S57,174,180,Gene\/Protein\nD20S58,182,188,Gene\/Protein\nadenosine deaminase,190,209,Gene\/Protein\nADA,212,215,Gene\/Protein\nPrion protein,222,235,Gene\/Protein\nPRIP,238,242,Gene\/Protein\nRFLP,288,292,Gene\/Protein\ngenomic anonymous 
probes,313,337,Gene\/Protein\nD20S5,340,345,Gene\/Protein\ncD3H12,347,353,Gene\/Protein\nD20S17,355,361,Gene\/Protein\nD20S18,363,369,Gene\/Protein\ntotal human DNA,423,438,Gene\/Protein\nD20Z1,443,448,Gene\/Protein\nhuman Chr 20 centromere,477,500,Gene\/Protein"}", "/scratch/micpie/export/bio_ner_20/test_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: In contrast, prominent Gli1-lacZ reporter expression was observed exclusively around the primary ducts of all MMTV-Wnt1; Gli1+\/ lz mice (n = 8) (Fig. 7D, E), colocalizing with the melanotic aggregates of pigmented MMTV-Wnt1; Gli1+\/ lz mice (n = 4) (Fig. 7E) and within a subset melanocytes identified by S100β staining and many neighboring fibroblasts in albino MMTV-Wnt1; Gli1+\/ lz mice (n = 4) (Fig. 7D, G). Gli1-lacZ expression was absent from corresponding regions of MMTV-ΔN89β-catenin; Gli1+\/ lz (n = 4) and control Gli1+\/ lz (n = 2) glands..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: Gli1,23,27,Gene\/Protein\nlacZ,30,34,Gene\/Protein\nMMTV,112,116,Organism\/Species\nWnt1,119,123,Gene\/Protein\nGli1,125,129,Gene\/Protein\nmice,136,140,Organism\/Species\nMMTV,222,226,Organism\/Species\nWnt1,229,233,Gene\/Protein\nGli1,235,239,Gene\/Protein\nmice,246,250,Organism\/Species\nS100β,317,322,Gene\/Protein\nMMTV,375,379,Organism\/Species\nWnt1,382,386,Gene\/Protein\nGli1,388,392,Gene\/Protein\nmice,399,403,Organism\/Species\nGli1,429,433,Gene\/Protein\nlacZ,436,440,Gene\/Protein\nMMTV,493,497,Organism\/Species\nGli1,517,521,Gene\/Protein\nGli1,549,553,Gene\/Protein"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: In contrast, prominent Gli1-lacZ reporter expression was observed exclusively around the primary ducts of all MMTV-Wnt1; Gli1+\/ lz mice (n = 8) (Fig. 
7D, E), colocalizing with the melanotic aggregates of pigmented MMTV-Wnt1; Gli1+\/ lz mice (n = 4) (Fig. 7E) and within a subset melanocytes identified by S100β staining and many neighboring fibroblasts in albino MMTV-Wnt1; Gli1+\/ lz mice (n = 4) (Fig. 7D, G). Gli1-lacZ expression was absent from corresponding regions of MMTV-ΔN89β-catenin; Gli1+\/ lz (n = 4) and control Gli1+\/ lz (n = 2) glands..\nConstrain: Please, list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: Gli1,23,27,Gene\/Protein\nlacZ,30,34,Gene\/Protein\nMMTV,112,116,Organism\/Species\nWnt1,119,123,Gene\/Protein\nGli1,125,129,Gene\/Protein\nmice,136,140,Organism\/Species\nMMTV,222,226,Organism\/Species\nWnt1,229,233,Gene\/Protein\nGli1,235,239,Gene\/Protein\nmice,246,250,Organism\/Species\nS100β,317,322,Gene\/Protein\nMMTV,375,379,Organism\/Species\nWnt1,382,386,Gene\/Protein\nGli1,388,392,Gene\/Protein\nmice,399,403,Organism\/Species\nGli1,429,433,Gene\/Protein\nlacZ,436,440,Gene\/Protein\nMMTV,493,497,Organism\/Species\nGli1,517,521,Gene\/Protein\nGli1,549,553,Gene\/Protein"}", "/scratch/micpie/export/bio_ner_20/train_0-0.jsonl": "{"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Proximate composition, fatty acid profile, cholesterol, alpha-tocoferol content and essential (K, Na, Cl, S, Mg, Ca, Zn, Cu, Fe, Mn, and Se) and contaminant element (Hg\/MeHg, Cd, Pb, and As) levels in silver scabbardfish (Lepidopus caudatus), hake (Merluccius merluccius), and ray (Raja spp.) 
were investigated..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: fatty acid,23,33,Chemical\/Drug\ncholesterol,43,54,Chemical\/Drug\nalpha - tocoferol,56,73,Chemical\/Drug\nK,98,99,Chemical\/Drug\nNa,101,103,Chemical\/Drug\nCl,105,107,Chemical\/Drug\nS,109,110,Chemical\/Drug\nMg,112,114,Chemical\/Drug\nCa,116,118,Chemical\/Drug\nZn,120,122,Chemical\/Drug\nCu,124,126,Chemical\/Drug\nFe,128,130,Chemical\/Drug\nMn,132,134,Chemical\/Drug\nSe,140,142,Chemical\/Drug\nHg,170,172,Chemical\/Drug\nMeHg,175,179,Chemical\/Drug\nCd,181,183,Chemical\/Drug\nPb,185,187,Chemical\/Drug\nAs,193,195,Chemical\/Drug\nsilver,207,213,Chemical\/Drug"} {"text":"Task: Please carry out the named entity recognition task for the the text below.\nText: Methods Marine samples were collected within the course of the BioMarKs project (http: \/ \/ biomarks. eu \/) at 6 coastal sites along the European coast near Blanes (BBMO station, Balearic Sea, Spain), Gijn (Gulf of Biscay, Spain), Naples (LTER-MC station, Gulf of Naples, Tyrrhenian Sea, Italy), Oslo (Skagerrak, Norway), Roscoff (SOMLIT station, Western English Channel, France) and Varna (Black Sea, Bulgaria) (Table1)..\nConstrain: Please, only list the entities in the form NER entity, span start, span end, and type in separate lines with a high probability of being in the text.\nResult: Blanes,157,163,place\nBBMO station,166,178,place\nBalearic Sea,180,192,place\nSpain,194,199,place\nGijn,202,206,place\nGulf of Biscay,209,223,place\nSpain,225,230,place\nNaples,233,239,place\nLTER - MC station,242,259,place\nNaples,269,275,place\nTyrrhenian Sea,277,291,place\nItaly,293,298,place\nOslo,301,305,place\nSkagerrak,308,317,place\nNorway,319,325,place\nRoscoff,328,335,place\nSOMLIT station,338,352,place\nWestern English Channel,354,377,place\nFrance,379,385,place\nVarna,391,396,place"}", 
"/scratch/micpie/export/nr_ppar_gamma_tox21/test_0-10.jsonl": "{"text":"User: Can you give me the DeepSMILES of a molecule that is not toxic in the NR-PPAR-gamma assay?\nAssistant: Yes, here you go: CCCNCC))CCC))C=O)NccC)cccc6C"} {"text":"User: Can you create the DeepSMILES of a molecule that is not toxic in the NR-PPAR-gamma assay?\nAssistant: Yes, here you go: COcccccOC))c6"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/valid_0-8.jsonl": "{"text":"User: Can you estimate if the molecule with the SELFIES [C][C][=C][N][=C][Branch1][C][C][C][Branch1][C][C][=N][Ring1][Branch2] is toxic in the NR-peroxisome proliferator-activated receptor gamma assay?\nAssistant: No, this molecule is not toxic in the NR-peroxisome proliferator-activated receptor gamma assay."} {"text":"User: Can you estimate if the molecule with the canonical SMILES CSc1ccc(\/C=C2\/C(C)=C(CC(=O)O)c3cc(F)ccc32)cc1 is toxic in the NR-peroxisome proliferator-activated receptor gamma assay?\nAssistant: Yes, this molecule is toxic in the NR-peroxisome proliferator-activated receptor gamma assay."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/test_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the peroxisome proliferator-activated receptor gamma assay?\nConstraint: You must select none, one or more options from a, b, c, d, or e without using any other words.\nOptions:\na CC=O)NccccOCC)=O)))cc6\nb COC=O)cccccc6N\nc CCNCC))C=S)SSC=S)NCC))CC\nd CCC)Ccccccc6\ne CCCNCC))CCC))C=O)NccC)cccc6C\nAnswer: a, b, d, e"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-peroxisome proliferator-activated receptor gamma assay?\nConstraint: You must select none, one or more options from A, B, or C without using any additional words.\nOptions:\n[A] InChI=1S\/C8H10O2\/c1-9-7-4-3-5-8(6-7)10-2\/h3-6H,1-2H3\n[B] 
InChI=1S\/C11H12N2S\/c1-2-4-9(5-3-1)10-8-13-6-7-14-11(13)12-10\/h1-5,10H,6-8H2\/t10-\/m1\/s1\n[C] InChI=1S\/C13H10O4\/c14-10-7-6-9(12(16)13(10)17)11(15)8-4-2-1-3-5-8\/h1-7,14,16-17H\nAnswer: A, B, C"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/train_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the SELFIES [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N] is toxic in the NR-peroxisome proliferator-activated receptor gamma assay?\nAssistant: No, this molecule is not toxic in the NR-peroxisome proliferator-activated receptor gamma assay."} {"text":"User: Can you estimate if the molecule with the DeepSMILES CC=O)CC)CC=CCCCC6C)C)))))CC6C is toxic in the peroxisome proliferator-activated receptor gamma assay?\nAssistant: No, this molecule is not toxic in the peroxisome proliferator-activated receptor gamma assay."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/test_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-peroxisome proliferator-activated receptor gamma assay.\nMolecule canonical SMILES: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-PPAR-gamma assay.\nDeepSMILES: COcccccOC))c6\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/valid_0-9.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C7H10N2\/c1-5-4-8-6(2)7(3)9-5\/h4H,1-3H3 toxic in the NR-peroxisome proliferator-activated receptor gamma assay?\nAssistant: No, it is not toxic in the NR-peroxisome proliferator-activated receptor gamma assay."} {"text":"User: Is 
the molecule with the SMILES CSc1ccc(\/C=C2\/C(C)=C(CC(=O)O)c3cc(F)ccc32)cc1 toxic in the peroxisome proliferator-activated receptor gamma assay?\nAssistant: Yes, it is toxic in the peroxisome proliferator-activated receptor gamma assay."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/test_0-1.jsonl": "{"text":"The molecule with the SELFIES [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C] is not displaying toxicity in the NR-PPAR-gamma assay."} {"text":"The molecule with the InChI representation of InChI=1S\/C8H10O2\/c1-9-7-4-3-5-8(6-7)10-2\/h3-6H,1-2H3 is not exhibiting toxicity in the NR-PPAR-gamma assay."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/valid_0-0.jsonl": "{"text":"The molecule with the SMILES representation of Cc1cnc(C)c(C)n1 is not toxic in the NR-PPAR-gamma assay."} {"text":"The molecule with the SMILES representation of CSc1ccc(\/C=C2\/C(C)=C(CC(=O)O)c3cc(F)ccc32)cc1 is toxic in the NR-peroxisome proliferator-activated receptor gamma assay."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/test_0-2.jsonl": "{"text":"Based on the DeepSMILES CCCNCC))CCC))C=O)NccC)cccc6C, the molecule has no peroxisome proliferator-activated receptor gamma toxicity features."} {"text":"Based on the InChI representation InChI=1S\/C8H10O2\/c1-9-7-4-3-5-8(6-7)10-2\/h3-6H,1-2H3, the molecule has no NR-peroxisome proliferator-activated receptor gamma toxicity properties."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/valid_0-10.jsonl": "{"text":"User: Can you create the canonical SMILES of a molecule that is not toxic in the NR-peroxisome proliferator-activated receptor gamma assay?\nAssistant: Sure, here you go: Cc1cnc(C)c(C)n1"} {"text":"User: Can you give me the SELFIES of a molecule that is toxic in the NR-peroxisome proliferator-activated receptor gamma assay?\nAssistant: Sure, here you go: 
[C][S][C][=C][C][=C][Branch2][Ring1][=N][\/C][=C][\/C][Branch1][C][C][=C][Branch1][#Branch1][C][C][=Branch1][C][=O][O][C][=C][C][Branch1][C][F][=C][C][=C][Ring1][#Branch1][Ring1][#C][C][=C][Ring2][Ring1][=Branch1]"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/train_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-PPAR-gamma assay.\ncanonical SMILES: CCOc1ccc2nc(S(N)(=O)=O)sc2c1\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the NR-PPAR-gamma assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-peroxisome proliferator-activated receptor gamma assay.\nDeepSMILES: CC=O)CC)CC=CCCCC6C)C)))))CC6C\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the NR-peroxisome proliferator-activated receptor gamma assay."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/valid_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-PPAR-gamma assay.\nMolecule SMILES: Cc1cnc(C)c(C)n1\nConstraint: Answer the question in a full sentence.\nResult: This molecule is not toxic in the NR-PPAR-gamma assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the peroxisome proliferator-activated receptor gamma assay.\nSMILES: CSc1ccc(\/C=C2\/C(C)=C(CC(=O)O)c3cc(F)ccc32)cc1\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is toxic in the peroxisome proliferator-activated receptor gamma assay."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/test_0-9.jsonl": "{"text":"User: Is the molecule with the SELFIES [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C] toxic in the NR-peroxisome proliferator-activated 
receptor gamma assay?\nAssistant: No, it is not toxic in the NR-peroxisome proliferator-activated receptor gamma assay."} {"text":"User: Is the molecule with the SELFIES [C][O][C][=C][C][=C][C][Branch1][Ring1][O][C][=C][Ring1][Branch2] toxic in the NR-peroxisome proliferator-activated receptor gamma assay?\nAssistant: No, it is not toxic in the NR-peroxisome proliferator-activated receptor gamma assay."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/test_0-0.jsonl": "{"text":"The molecule with the SELFIES [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C] is not toxic in the NR-PPAR-gamma assay."} {"text":"The molecule with the DeepSMILES COcccccOC))c6 is not toxic in the NR-PPAR-gamma assay."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/valid_0-7.jsonl": "{"text":"Task: Please create a DeepSMILES based on the text description.\nDescription: A molecule that is toxic in the NR-PPAR-gamma assay.\nResult: CccncC)cC)n6"} {"text":"Task: Please generate a canonical SMILES based on the text description below.\nDescription: A molecule that is toxic in the peroxisome proliferator-activated receptor gamma assay.\nResult: CSc1ccc(\/C=C2\/C(C)=C(CC(=O)O)c3cc(F)ccc32)cc1"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/test_0-3.jsonl": "{"text":"The SELFIES [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C] represents a molecule that is not identified as toxic in the NR-PPAR-gamma assay."} {"text":"The InChI InChI=1S\/C8H10O2\/c1-9-7-4-3-5-8(6-7)10-2\/h3-6H,1-2H3 represents a molecule that is not identified as toxic in the peroxisome proliferator-activated receptor gamma assay."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/valid_0-11.jsonl": "{"text":"User: I'm looking for the InChI of a molecule that is not toxic in the peroxisome proliferator-activated receptor gamma assay?\nAssistant: This is a 
molecule that is not toxic in the peroxisome proliferator-activated receptor gamma assay: InChI=1S\/C7H10N2\/c1-5-4-8-6(2)7(3)9-5\/h4H,1-3H3"} {"text":"User: I'm searching for the InChI of a molecule that is toxic in the peroxisome proliferator-activated receptor gamma assay?\nAssistant: This is a molecule that is toxic in the peroxisome proliferator-activated receptor gamma assay: InChI=1S\/C20H17FO2S\/c1-12-17(9-13-3-6-15(24-2)7-4-13)16-8-5-14(21)10-19(16)18(12)11-20(22)23\/h3-10H,11H2,1-2H3,(H,22,23)\/b17-9-"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/train_0-0.jsonl": "{"text":"The molecule with the canonical SMILES representation of CCOc1ccc2nc(S(N)(=O)=O)sc2c1 is not toxic in the NR-peroxisome proliferator-activated receptor gamma assay."} {"text":"The molecule with the canonical SMILES representation of CC(=O)C1(C)CC2=C(CCCC2(C)C)CC1C is not toxic in the NR-peroxisome proliferator-activated receptor gamma assay."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/test_0-6.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-PPAR-gamma assay.\nDeepSMILES: CCCNCC))CCC))C=O)NccC)cccc6C\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the NR-PPAR-gamma assay."} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-PPAR-gamma assay.\nDeepSMILES: COcccccOC))c6\nConstraint: Answer the question in a complete sentence.\nResult: This molecule is not toxic in the NR-PPAR-gamma assay."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/train_0-10.jsonl": "{"text":"User: Can you create the InChI of a molecule that is not toxic in the peroxisome proliferator-activated receptor gamma assay?\nAssistant: Of course, here you go: InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13)"} {"text":"User: Can you create the InChI of a molecule that is not toxic in 
the peroxisome proliferator-activated receptor gamma assay?\nAssistant: Sure, here you go: InChI=1S\/C16H26O\/c1-11-9-13-7-6-8-15(3,4)14(13)10-16(11,5)12(2)17\/h11H,6-10H2,1-5H3"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/train_0-3.jsonl": "{"text":"The SMILES CCOc1ccc2nc(S(N)(=O)=O)sc2c1 represents a molecule that is not identified as toxic in the NR-peroxisome proliferator-activated receptor gamma assay."} {"text":"The InChI InChI=1S\/C16H26O\/c1-11-9-13-7-6-8-15(3,4)14(13)10-16(11,5)12(2)17\/h11H,6-10H2,1-5H3 represents a molecule that is not identified as toxic in the NR-PPAR-gamma assay."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/train_0-12.jsonl": "{"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be toxic in the NR-peroxisome proliferator-activated receptor gamma assay.\nAssistant: Got it, this DeepSMILES is not toxic in the NR-peroxisome proliferator-activated receptor gamma assay: CCOccccncSN)=O)=O))sc5c9"} {"text":"User: I want to generate a DeepSMILES.\nAssistant: This sounds very curious. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be toxic in the NR-peroxisome proliferator-activated receptor gamma assay.\nAssistant: Got it, here you go, this DeepSMILES is not toxic in the NR-peroxisome proliferator-activated receptor gamma assay: CC=O)CC)CC=CCCCC6C)C)))))CC6C"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/test_0-13.jsonl": "{"text":"User: I want to generate a molecule DeepSMILES.\nAssistant: This sounds very interesting. 
Should it be a special one?\nUser: Yes, the molecule should not be toxic in the NR-peroxisome proliferator-activated receptor gamma assay.\nAssistant: Understood, this DeepSMILES is not toxic in the NR-peroxisome proliferator-activated receptor gamma assay: CCCNCC))CCC))C=O)NccC)cccc6C"} {"text":"User: I want to come up with a DeepSMILES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the peroxisome proliferator-activated receptor gamma assay.\nAssistant: Ok, this DeepSMILES is not toxic in the peroxisome proliferator-activated receptor gamma assay: COcccccOC))c6"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/valid_0-2.jsonl": "{"text":"Based on the InChI InChI=1S\/C7H10N2\/c1-5-4-8-6(2)7(3)9-5\/h4H,1-3H3, the molecule has no peroxisome proliferator-activated receptor gamma toxicity features."} {"text":"Based on the SMILES representation CSc1ccc(\/C=C2\/C(C)=C(CC(=O)O)c3cc(F)ccc32)cc1, the molecule has peroxisome proliferator-activated receptor gamma toxicity features."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/train_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES representation of [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N] toxic in the NR-PPAR-gamma assay?\nConstraint: Even if you are not sure, you must pick either 1 or 2 without using any additional words.\nOptions:\n1: False\n2: True\nAnswer: 1"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES [C][C][=Branch1][C][=O][C][Branch1][C][C][C][C][=C][Branch1][O][C][C][C][C][Ring1][=Branch1][Branch1][C][C][C][C][C][Ring1][=N][C] toxic in the peroxisome proliferator-activated receptor gamma assay?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any additional words.\nOptions:\n[1] False\n[2] True\nAnswer: 
1"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/valid_0-1.jsonl": "{"text":"The molecule with the SMILES Cc1cnc(C)c(C)n1 is not displaying toxicity in the NR-PPAR-gamma assay."} {"text":"The molecule with the SELFIES representation of [C][S][C][=C][C][=C][Branch2][Ring1][=N][\/C][=C][\/C][Branch1][C][C][=C][Branch1][#Branch1][C][C][=Branch1][C][=O][O][C][=C][C][Branch1][C][F][=C][C][=C][Ring1][#Branch1][Ring1][#C][C][=C][Ring2][Ring1][=Branch1] is showing toxicity in the NR-PPAR-gamma assay."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/valid_0-13.jsonl": "{"text":"User: I want to create a SMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the NR-PPAR-gamma assay.\nAssistant: Understood, this SMILES is not toxic in the NR-PPAR-gamma assay: Cc1cnc(C)c(C)n1"} {"text":"User: I want to generate a molecule SELFIES.\nAssistant: This sounds very interesting. Should it be a special one?\nUser: Yes, the molecule should be toxic in the NR-peroxisome proliferator-activated receptor gamma assay.\nAssistant: Understood, this SELFIES is toxic in the NR-peroxisome proliferator-activated receptor gamma assay: [C][S][C][=C][C][=C][Branch2][Ring1][=N][\/C][=C][\/C][Branch1][C][C][=C][Branch1][#Branch1][C][C][=Branch1][C][=O][O][C][=C][C][Branch1][C][F][=C][C][=C][Ring1][#Branch1][Ring1][#C][C][=C][Ring2][Ring1][=Branch1]"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/valid_0-5.jsonl": "{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the peroxisome proliferator-activated receptor gamma assay.\nMolecule canonical SMILES: Cc1cnc(C)c(C)n1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any additional words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-peroxisome proliferator-activated receptor gamma assay.\nMolecule 
SMILES: CSc1ccc(\/C=C2\/C(C)=C(CC(=O)O)c3cc(F)ccc32)cc1\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: True"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/train_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-peroxisome proliferator-activated receptor gamma assay?\nConstraint: You must select none, one or more options from a, b, or c without using any additional words.\nOptions:\n(a) [C][O][C][=C][C][=C][N][=C][Branch1][C][N][S][C][Ring1][=Branch1][=C][Ring1][#Branch2]\n(b) [C][C][Branch1][C][O][C][C][Branch1][C][C][Branch1][C][C][O]\n(c) [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N]\nAnswer: a, b, c"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-peroxisome proliferator-activated receptor gamma assay?\nConstraint: You must select none, one or more options from A, B, or C without using any other words.\nOptions:\nA) InChI=1S\/C23H16O6\/c24-20-16(14-7-3-1-5-12(14)9-18(20)22(26)27)11-17-15-8-4-2-6-13(15)10-19(21(17)25)23(28)29\/h1-10,24-25H,11H2,(H,26,27)(H,28,29)\nB) InChI=1S\/C16H26O\/c1-11-9-13-7-6-8-15(3,4)14(13)10-16(11,5)12(2)17\/h11H,6-10H2,1-5H3\nC) InChI=1S\/C18H38O4S\/c1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-22-23(19,20)21\/h2-18H2,1H3,(H,19,20,21)\/p-1\nAnswer: A, B, C"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/valid_0-4.jsonl": "{"text":"The canonical SMILES Cc1cnc(C)c(C)n1 is not toxic in the NR-peroxisome proliferator-activated receptor gamma assay."} {"text":"The molecule InChI InChI=1S\/C20H17FO2S\/c1-12-17(9-13-3-6-15(24-2)7-4-13)16-8-5-14(21)10-19(16)18(12)11-20(22)23\/h3-10H,11H2,1-2H3,(H,22,23)\/b17-9- is toxic in the NR-peroxisome proliferator-activated receptor gamma assay."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/train_0-5.jsonl": 
"{"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-PPAR-gamma assay.\nMolecule canonical SMILES: CCOc1ccc2nc(S(N)(=O)=O)sc2c1\nConstraint: Even if you are not sure, you must pick either \"True\" or \"False\" without using any other words.\nResult: False"} {"text":"Task: Please classify a molecule based on the description.\nDescription: A molecule that is toxic in the NR-peroxisome proliferator-activated receptor gamma assay.\nMolecule DeepSMILES: CC=O)CC)CC=CCCCC6C)C)))))CC6C\nConstraint: Even if you are uncertain, you must pick either \"True\" or \"False\" without using any extra words.\nResult: False"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/valid_0-15.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are not toxic in the NR-peroxisome proliferator-activated receptor gamma assay?\nConstraint: You must select none, one or more options from 1 or 2 without using any additional words.\nOptions:\n1: InChI=1S\/C9H13N2O2\/c1-10(2)9(12)13-8-5-4-6-11(3)7-8\/h4-7H,1-3H3\/q+1\n2: InChI=1S\/C7H10N2\/c1-5-4-8-6(2)7(3)9-5\/h4H,1-3H3\nAnswer: 1, 2"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Which molecules are toxic in the NR-peroxisome proliferator-activated receptor gamma assay?\nConstraint: You must select none, one or more options from 1, 2, 3, 4, or 5 without using any other words.\nOptions:\n1. CScccc\/C=C\/CC)=CCC=O)O)))cccF)ccc69))))))))))cc6\n2. CccccC)cOCCCCC)C)C=O)O)))))))c6\n3. CCC)OC=O)CCC=O)OCC)C\n4. O=CO)cccccc6OP=O)O)O\n5. C[C@]O)CC[C@H][C@@H]CCC=CC=O)CC[C@]6C)[C@@]%10F)[C@@H]O)C[C@@]%14%17C\nAnswer: 1"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/valid_0-12.jsonl": "{"text":"User: I want to create a InChI.\nAssistant: This sounds very exciting. Should I consider any specific points for the creation?\nUser: Yes, please. 
The molecule should not be toxic in the NR-PPAR-gamma assay.\nAssistant: Got it, this InChI is not toxic in the NR-PPAR-gamma assay: InChI=1S\/C7H10N2\/c1-5-4-8-6(2)7(3)9-5\/h4H,1-3H3"} {"text":"User: I want to come up with a molecule DeepSMILES.\nAssistant: This sounds very curious. Should I consider any specific points for the creation?\nUser: Yes, please. The molecule should be toxic in the peroxisome proliferator-activated receptor gamma assay.\nAssistant: Ok, this DeepSMILES is toxic in the peroxisome proliferator-activated receptor gamma assay: CScccc\/C=C\/CC)=CCC=O)O)))cccF)ccc69))))))))))cc6"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/train_0-2.jsonl": "{"text":"Based on the SELFIES representation [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N], the molecule has no peroxisome proliferator-activated receptor gamma toxicity properties."} {"text":"Based on the DeepSMILES representation CC=O)CC)CC=CCCCC6C)C)))))CC6C, the molecule has no NR-peroxisome proliferator-activated receptor gamma toxicity characteristics."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/test_0-11.jsonl": "{"text":"User: I'm looking for the SMILES of a molecule that is not toxic in the NR-PPAR-gamma assay?\nAssistant: This is a molecule that is not toxic in the NR-PPAR-gamma assay: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"User: I'm searching for the InChI of a molecule that is not toxic in the peroxisome proliferator-activated receptor gamma assay?\nAssistant: This is a molecule that is not toxic in the peroxisome proliferator-activated receptor gamma assay: InChI=1S\/C8H10O2\/c1-9-7-4-3-5-8(6-7)10-2\/h3-6H,1-2H3"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/train_0-7.jsonl": "{"text":"Task: Please create a molecule SMILES based on the text description below.\nDescription: A molecule that is toxic in the NR-PPAR-gamma assay.\nResult: CCOc1ccc2nc(S(N)(=O)=O)sc2c1"} {"text":"Task: Please give me a 
canonical SMILES based on the text description below.\nDescription: A molecule that is toxic in the peroxisome proliferator-activated receptor gamma assay.\nResult: CC(=O)C1(C)CC2=C(CCCC2(C)C)CC1C"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/train_0-11.jsonl": "{"text":"User: I'm searching for the SELFIES of a molecule that is not toxic in the peroxisome proliferator-activated receptor gamma assay?\nAssistant: This is a molecule that is not toxic in the peroxisome proliferator-activated receptor gamma assay: [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N]"} {"text":"User: I'm looking for the InChI of a molecule that is not toxic in the NR-PPAR-gamma assay?\nAssistant: This is a molecule that is not toxic in the NR-PPAR-gamma assay: InChI=1S\/C16H26O\/c1-11-9-13-7-6-8-15(3,4)14(13)10-16(11,5)12(2)17\/h11H,6-10H2,1-5H3"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/train_0-1.jsonl": "{"text":"The molecule with the SELFIES representation of [C][C][O][C][=C][C][=C][N][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][S][C][Ring1][=Branch2][=C][Ring1][=N] is not showing toxicity in the NR-PPAR-gamma assay."} {"text":"The molecule with the SELFIES [C][C][=Branch1][C][=O][C][Branch1][C][C][C][C][=C][Branch1][O][C][C][C][C][Ring1][=Branch1][Branch1][C][C][C][C][C][Ring1][=N][C] is not exhibiting toxicity in the NR-PPAR-gamma assay."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/train_0-13.jsonl": "{"text":"User: I want to create a DeepSMILES.\nAssistant: This sounds very exciting. Should it be a special one?\nUser: Yes, the molecule should not be toxic in the NR-peroxisome proliferator-activated receptor gamma assay.\nAssistant: Ok, this DeepSMILES is not toxic in the NR-peroxisome proliferator-activated receptor gamma assay: CCOccccncSN)=O)=O))sc5c9"} {"text":"User: I want to generate a molecule InChI.\nAssistant: This sounds very exciting. 
Should it be a special one?\nUser: Yes, the molecule should not be toxic in the NR-PPAR-gamma assay.\nAssistant: Ok, this InChI is not toxic in the NR-PPAR-gamma assay: InChI=1S\/C16H26O\/c1-11-9-13-7-6-8-15(3,4)14(13)10-16(11,5)12(2)17\/h11H,6-10H2,1-5H3"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/train_0-4.jsonl": "{"text":"The molecule InChI InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13) is not toxic in the peroxisome proliferator-activated receptor gamma assay."} {"text":"The InChI InChI=1S\/C16H26O\/c1-11-9-13-7-6-8-15(3,4)14(13)10-16(11,5)12(2)17\/h11H,6-10H2,1-5H3 is not toxic in the NR-peroxisome proliferator-activated receptor gamma assay."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/test_0-7.jsonl": "{"text":"Task: Please generate a DeepSMILES based on the text description below.\nDescription: A molecule that is toxic in the NR-PPAR-gamma assay.\nResult: CCCNCC))CCC))C=O)NccC)cccc6C"} {"text":"Task: Please create a molecule canonical SMILES based on the description below.\nDescription: A molecule that is toxic in the NR-PPAR-gamma assay.\nResult: COc1cccc(OC)c1"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/train_0-9.jsonl": "{"text":"User: Is the molecule with the InChI InChI=1S\/C9H10N2O3S2\/c1-2-14-6-3-4-7-8(5-6)15-9(11-7)16(10,12)13\/h3-5H,2H2,1H3,(H2,10,12,13) toxic in the peroxisome proliferator-activated receptor gamma assay?\nAssistant: No, it is not toxic in the peroxisome proliferator-activated receptor gamma assay."} {"text":"User: Is the molecule with the canonical SMILES CC(=O)C1(C)CC2=C(CCCC2(C)C)CC1C toxic in the NR-PPAR-gamma assay?\nAssistant: No, it is not toxic in the NR-PPAR-gamma assay."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/valid_0-3.jsonl": "{"text":"The SELFIES [C][C][=C][N][=C][Branch1][C][C][C][Branch1][C][C][=N][Ring1][Branch2] is from a molecule that is not identified as toxic in the NR-PPAR-gamma assay."} {"text":"The canonical SMILES 
CSc1ccc(\/C=C2\/C(C)=C(CC(=O)O)c3cc(F)ccc32)cc1 is from a molecule that is identified as toxic in the peroxisome proliferator-activated receptor gamma assay."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/test_0-8.jsonl": "{"text":"User: Can you tell me if the molecule with the DeepSMILES CCCNCC))CCC))C=O)NccC)cccc6C is toxic in the NR-peroxisome proliferator-activated receptor gamma assay?\nAssistant: No, this molecule is not toxic in the NR-peroxisome proliferator-activated receptor gamma assay."} {"text":"User: Can you tell me if the molecule with the SMILES COc1cccc(OC)c1 is toxic in the NR-PPAR-gamma assay?\nAssistant: No, this molecule is not toxic in the NR-PPAR-gamma assay."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/test_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES [C][C][C][N][Branch1][Ring1][C][C][C][Branch1][Ring1][C][C][C][=Branch1][C][=O][N][C][=C][Branch1][C][C][C][=C][C][=C][Ring1][#Branch1][C] toxic in the NR-peroxisome proliferator-activated receptor gamma assay?\nConstraint: Even if you are not sure, you must pick either a or b without using any additional words.\nOptions:\na) True\nb) False\nAnswer: b"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the SELFIES [C][O][C][=C][C][=C][C][Branch1][Ring1][O][C][=C][Ring1][Branch2] toxic in the NR-peroxisome proliferator-activated receptor gamma assay?\nConstraint: Even if you are uncertain, you must pick either a or b without using any other words.\nOptions:\na) False\nb) True\nAnswer: a"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/valid_0-14.jsonl": "{"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the canonical SMILES representation of Cc1cnc(C)c(C)n1 toxic in the peroxisome proliferator-activated receptor gamma assay?\nConstraint: Even if you are uncertain, you must pick either A or B without using any other words.\nOptions:\nA 
False\nB True\nAnswer: A"} {"text":"Task: Please answer the multiple choice question.\nQuestion: Is the molecule with the DeepSMILES CScccc\/C=C\/CC)=CCC=O)O)))cccF)ccc69))))))))))cc6 toxic in the NR-peroxisome proliferator-activated receptor gamma assay?\nConstraint: Even if you are uncertain, you must pick either 1 or 2 without using any other words.\nOptions:\n1) True\n2) False\nAnswer: 1"}", "/scratch/micpie/export/nr_ppar_gamma_tox21/test_0-4.jsonl": "{"text":"The molecule DeepSMILES CCCNCC))CCC))C=O)NccC)cccc6C is not toxic in the NR-peroxisome proliferator-activated receptor gamma assay."} {"text":"The molecule canonical SMILES COc1cccc(OC)c1 is not toxic in the NR-peroxisome proliferator-activated receptor gamma assay."}", "/scratch/micpie/export/nr_ppar_gamma_tox21/test_0-12.jsonl": "{"text":"User: I want to come up with a molecule SMILES.\nAssistant: This sounds very curious. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be toxic in the NR-PPAR-gamma assay.\nAssistant: Ok, this SMILES is not toxic in the NR-PPAR-gamma assay: CCCN(CC)C(CC)C(=O)Nc1c(C)cccc1C"} {"text":"User: I want to create a molecule InChI.\nAssistant: This sounds very interesting. Should I consider any constraints for the generation?\nUser: Yes, please. The molecule should not be toxic in the NR-PPAR-gamma assay.\nAssistant: Got it, this InChI is not toxic in the NR-PPAR-gamma assay: InChI=1S\/C8H10O2\/c1-9-7-4-3-5-8(6-7)10-2\/h3-6H,1-2H3"}"